[HN Gopher] A reckless introduction to Hindley-Milner type infer...
       ___________________________________________________________________
        
       A reckless introduction to Hindley-Milner type inference (2019)
        
       Author : nequo
       Score  : 79 points
       Date   : 2024-06-24 05:24 UTC (1 days ago)
        
 (HTM) web link (reasonableapproximation.net)
 (TXT) w3m dump (reasonableapproximation.net)
        
       | cies wrote:
       | I encountered HM a lot in my working life as a programmer. I did
       | not expect that when I first encountered the names.
       | 
       | From the article:
       | 
       | > HM type systems can't implement heterogenous lists [...] >
       | [well it does when wrapping the elements in a sum type, but then]
       | > it can only accept types you know about in advance. > [which
       | makes it] a massive pain to work with.
       | 
       | Sorry, I totally disagree... Heterogeneous lists without the sum
       | types are a massive pain! Maybe it is easier for the initial
       | development effort, but if you need to maintain code with
       | heterogeneous lists (or dynamically typed code in general) the
       | pain will come at some point and likely when already in
       | production.
       | 
       | I much rather fix compile errors than runtime errors. And a
       | massive thanks to Hindley and Milner for advancing type systems!
       | 
       | Found their photos in gratitude just now:
       | 
       | https://www.researchgate.net/profile/J-Hindley
       | 
       | https://en.wikipedia.org/wiki/Robin_Milner
        
         | fluoridation wrote:
         | They're only a pain if you need to know the run-time types of
         | the elements. If you're just operating on them through a
         | narrowly defined interface, that's just dynamic polymorphism,
         | which is indeed more flexible with regards to future
         | expandability, as long as you define the interface
         | appropriately.
        
       | skybrian wrote:
       | I think this article would be better if it ignored the issue of
       | proving termination.
       | 
       | Proving that a program terminates _eventually_ is a very weak
       | guarantee, since it could be a program that terminates after a
       | million years. This matters mostly for proof languages, where
       | it's useful to write proofs about functions that you _never run._
       | If the function provably terminates, you can assume that whatever
       | value it returns exists, without calculating it.
       | 
       | For a possibly long-running job like a SQL query, a static
       | guarantee that it terminates eventually is almost certainly not
       | what you want. You're going to need to test its performance, and
       | ideally have a timeout and a way of handling the timeout. If you
       | want to get fancy, a progress bar and cancel button might be
       | nice.
       | 
       | Edit:
       | 
       | Compilers need to run fast for the programs we actually write. A
       | type system that possibly doesn't terminate under unusual
       | conditions isn't that big a deal, as long as people don't write
       | code like that. Hopefully you get a decent error message.
       | 
       | Compare with memory use. We don't want compilers to use excessive
       | amounts of memory, but don't need static guarantees about memory
       | use.
        
         | dmkolobov wrote:
         | The problem with "not writing code like that" is that
         | eventually someone does.
         | 
         | At work we use a home-grown Haskell-like language with "row
         | types" which can be split into disjoint sets of fields in order
         | to represent joins and the like.
         | 
         | It works well in most cases, but it's very easy to forget a
         | type annotation and trigger unmanageable build-times. Often
         | times, it is not obvious where something needs to be annotated.
         | Sure, we can act as human time-outs, but the productivity loss
         | can be substantial, especially for a newcomer to the language.
        
           | gpm wrote:
           | The solution to "eventually someone does" is "add a timeout,
           | and when you hit that timeout give them an error so they can
           | fix their code".
           | 
           | There's no practical difference between exponential and
           | infinite time, you need to do this for bad exponential time
           | cases anyways.
           | 
           | Many langauges type systems (Java, Typescript, C++, Rust,
           | ...) really _aren 't_ guaranteed to terminate in the absence
           | of a timeout either. They're (the type systems) turing
           | complete, you can write infinite loops that the compiler
           | can't know are infinite loops. This makes no practical
           | difference relative to languages which "merely" have
           | exponential time type systems, because with an exponential
           | time type system you're alread accepting the risk that
           | typechecking might not terminate before the heat death of the
           | universe.
        
             | AnimalMuppet wrote:
             | A timeout (or recursion/expansion depth limit) could be
             | useful to give the human some specifics about what went
             | wrong. But humans generally kill compilations that take
             | much longer than usual. If it took 20 seconds before, and
             | now it's run for 10 minutes and hasn't finished, I'm
             | probably going to control-C it (or whatever), and then diff
             | against the previous code to try to figure out what went
             | wrong.
        
           | skybrian wrote:
           | Yes, making it possible to reason locally about compiler
           | performance can be pretty helpful. It's one reason I like to
           | have type declarations for public API's. Also, hitting an
           | implementation-defined limit (that you can raise) for
           | unreasonable constructs might result in better errors than
           | just continuing to run, but slowly.
           | 
           | A termination guarantee for type checking doesn't really
           | help.
        
         | samatman wrote:
         | > _Proving that a program terminates eventually is a very weak
         | guarantee, since it could be a program that terminates after a
         | million years. This matters mostly for proof languages, where
         | it's useful to write proofs about functions that you never run.
         | If the function provably terminates, you can assume that
         | whatever value it returns exists, without calculating it._
         | 
         | In practical algorithms of significance, we can often do much
         | better than this, proving both termination, and the time
         | complexity of that termination. For instance, a DFA of a
         | regular language will terminate in O(n) of the length of the
         | haystack, and we also have a bound for the time it takes to
         | construct that DFA.
         | 
         | Sometimes, as you're alluding to, we know something terminates,
         | but the time complexity is impractical: exhaustive naive search
         | for Traveling Salesman will, in principle, halt. This is,
         | itself, useful information.
         | 
         | The article claims (I believe correctly) not just that H-M
         | inference halts, but that it does so in "near-linear" time.
         | That's a much tighter bound than merely proving termination.
        
           | skybrian wrote:
           | Yes, agreed.
        
           | nextaccountic wrote:
           | Complexity sometimes is still too little, because the
           | constants can overwhelm your O(n) for all n that you care
           | about in practice. Or rather, complexity can't help you to
           | select between two competing O(n) algorithms
           | 
           | However, without a precise cost model of your operations
           | (say, you just know each operation has a bounded time and
           | such they are each O(1 ) but nothing beyond that), that's
           | often the best you can do
           | 
           | There's something called resource aware computation that can
           | do more finer grained bounds on the runtime and memory usage
           | of functions (and other resource usage like stack space so
           | you can prove you always use less than 8MB or whatever
           | number) not only complexity but also the constants. With the
           | right cost model you _could_ model CPU and memory latencies
           | and other things like CPU port usage, cache usage etc, but
           | there 's an interplay with the optimizer here (this could
           | even be used to spot performance regressions of compiler
           | codegen even without running any benchmark)
        
           | gpm wrote:
           | If "near-linear" time was a bound it would be, but it's not.
           | HM type inference is exptime-complete. Near-linear here is
           | just "most programs people actually write type check in
           | something close to linear time".
        
         | bmacho wrote:
         | I second this, for programs to terminate _eventually_ , is not
         | a good goal to pursue. In practice if you need termination, you
         | will need bounded termination, which solves your eventual
         | termination problem completely.
         | 
         | I don't understand why people talk so much about eventual
         | termination.
        
           | mrkeen wrote:
           | Think of it not like proofs but like tests, i.e. it catches
           | bad stuff.
           | 
           | Having static types is a low cost solution that catches a lot
           | of possible bugs. Purity catches another bunch of bugs. No-
           | nulls catches even more. At this point you can kinda start
           | trusting function names and type signatures to do what they
           | say on the box (often said about Haskell). Likewise, if it
           | compiles, it probably does the right thing.
           | 
           | So where's the next gap in Haskell that lets bugs through?
           | I'd argue that (eventual) termination would be that next
           | step. If you say a function has type a->a (for all a), it
           | only admits one possible implementation... except for bad
           | ones, like 'undefined' or an infinite recursion - both of
           | which termination would catch for you.
        
             | chowells wrote:
             | > If you say a function has type a->a (for all a), it only
             | admits one possible implementation... except for bad ones,
             | like 'undefined' or an infinite recursion - both of which
             | termination would catch for you.
             | 
             | To be super-pedantic, it only admits one possible
             | denotation, assuming termination. There are an infinite
             | number of implementations that waste finite time on
             | something useless before returning the argument. We usually
             | don't talk about those, because... Why would you write code
             | that does that? But it's worth keeping in mind as an edge
             | case in this sort of discussion.
        
           | shiandow wrote:
           | It's very easy to write programs solving some unsolvable
           | problem that don't terminate.
        
         | marssaxman wrote:
         | > I think this article would be better if it ignored the issue
         | of proving termination.
         | 
         | I agree; that part went on so long that I got bored and stopped
         | reading before the article actually got around to saying
         | anything about type inference.
        
       | abeppu wrote:
       | Can someone familiar with the territory recommend a better or
       | complementary resource of similar level of approachability?
       | 
       | The intro here flags that the author i) is not happy with the
       | post in its current form, ii) was resynthesized from wikipedia
       | "without particularly putting in the effort to check my own
       | understanding", which doesn't inspire confidence.
       | 
       | And then when you do get into the content, stuff _looks_
       | incorrect, but a non-expert reader, given the caveats in the
       | intro, should be on the lookout for errors but isn 't equipped to
       | identify them with confidence. E.g. the very first inference rule
       | included is described as:
       | 
       | > if you have a variable x declared to have some polytype p, and
       | if the monotype m is a specialisation of p, then x can be judged
       | to have type m
       | 
       | ... but I _think_ this should be  "p is a specialization of m".
       | 
       | Basically, I think this is the kind of material where detail
       | matters, where errors seriously impede understanding, and a
       | resource where the reader/learner has to be on guard second-
       | guessing the author is a bad intro resource. But surely there are
       | better ones out there.
        
         | nathanrf wrote:
         | The quoted paragraph is correct as written. E.g. if
         | 
         | f: [?]A, B. A -> B -> A
         | 
         | Then we have poly type p = "[?]A, B. A -> B -> A".
         | 
         | The monotype m = Str -> Int -> Str is a specialization of p, so
         | we are also permitted to judge that f has type m.
         | 
         | The type specialization operator "looks backwards", where we
         | write sorta "p <= m" because there's a kind of "subtyping"
         | relationship (though most HM type systems don't actually
         | contain subtyping) - anywhere that a term of type m is used, a
         | term of type p can be used instead because it can be
         | specialized to m.
         | 
         | I think this is a good complementary article:
         | https://boxbase.org/entries/2018/mar/5/hindley-milner/ (it also
         | links to three more academic sources for more context/detail)
        
         | shpongled wrote:
         | "Types and Programming Languages" by Benjamin Pierce is an
         | excellent textbook (and approachable, even to people without
         | formal CS education) that covers type inference and introduces
         | a lot of the nomenclature/formalism gradually.
        
         | mrkeen wrote:
         | This is nice and genuinely entertaining
         | https://youtu.be/IOiZatlZtGU (Propositions as types by phil
         | wadler). Its true focus is the 1-1 mapping between type systems
         | and logic systems, but it's well worth the watch.
         | 
         | This is good reading - it demystifies the funny typing rules in
         | their mathematical form. https://legacy-
         | blog.akgupta.ca/blog/2013/05/14/so-you-still-...
         | 
         | (Neither of these tell you how to implement the checker
         | yourself though)
        
         | efnx wrote:
         | It will never be finished but Stephen Diehl's "Write you a
         | Haskell" is great.
         | 
         | http://dev.stephendiehl.com/fun/
        
       | ashton314 wrote:
       | Here are my notes on writing a type checker/type inferrer in case
       | anyone is interested:
       | https://lambdaland.org/posts/2022-07-27_how_to_write_a_type_...
        
       ___________________________________________________________________
       (page generated 2024-06-25 23:00 UTC)