[HN Gopher] Unification in Elixir
       ___________________________________________________________________
        
       Unification in Elixir
        
       Author : weatherlight
       Score  : 169 points
       Date   : 2024-06-30 20:58 UTC (1 days ago)
        
 (HTM) web link (www.ericpfahl.com)
 (TXT) w3m dump (www.ericpfahl.com)
        
       | ReleaseCandidat wrote:
       | Type checking (type inference) is the usual application of
       | unification in programming languages (compilers or interpreters).
       | Pattern matching does not need unification, because, as mentioned
       | in the beginning of post, the "matching" is done one the left
       | side only (no variables to "match" on the right side).
        
         | YeGoblynQueenne wrote:
         | The one-sided pattern matching seems to be a peculiarity of
         | Elixir. The post above discusses an implementation of full
         | unification without that restriction. To clarify, unification
         | is not generally restricted to one-sided matching only.
         | 
         | Is unification unnecessary for pattern matching? I think that
         | depends on what you meany by "pattern matching". In programming
         | cirlces "pattern matching" usually means matching strings. Now,
         | unification is Turing-complete pattern matching, meaning that
         | it goes well beyond e.g. regular expressions that can only
         | match the set of strings recognised by regular automata. For
         | instance, in Prolog, you can unify a variable with an entire
         | Prolog program and call the program at runtime, etc. So if you
         | need "pattern matching" only to match strings then you probably
         | don't need the full power of unification.
         | 
         | If I understand correctly (not an expert) Hindley-Milner type
         | checking, the type checking that uses unification, relies
         | exactly on that ability of unification to match entire
         | programs. Anyway that kind of type checking is indeed pattern
         | matching by unification. So it's not different than pattern
         | matching on strings with regular expressions. The difference is
         | only in what kinds of strings can be matched.
        
           | troupo wrote:
           | > The one-sided pattern matching seems to be a peculiarity of
           | Elixir.
           | 
           | And Erlang. And Haskell. And OCaml. And F#. And...
           | 
           | As far as I understand, only Prolog has unification (out of
           | more-or-less used/known languages). And in languages that
           | require (rather awkward and verbose) workarounds for this
           | functionality, it's hard to see the immediate value of
           | unification.
        
             | YeGoblynQueenne wrote:
             | >> And Erlang. And Haskell. And OCaml. And F#. And..
             | 
             | Thanks, I didn't know that. I took the article to mean that
             | this kind of one-sided pattern matching is unique in
             | Elixir. My misreading.
             | 
             | From what I've seen it's true that most languages that have
             | support for pattern matching have special constructs for
             | it, whereas in Prolog it's baked in to the only data
             | structure in the language, the term, which can be a single
             | variable (a logic variable that can unify with ...
             | anything).
             | 
             | I don't know how useful or not unification would be in a
             | language other than Prolog to be honest. In Prolog it's an
             | integral component of the Resolution-based theorem prover
             | used as an interpreter of Prolog programs: resolution
             | proceeds by unifying Horn goals to the heads of definite
             | clauses, to produce new goals, recursively. More over, when
             | one runs a Prolog "query" the result of the query itself is
             | the substitution of the variables in the query, i.e. the
             | substitution that makes the query true [1]; a substitution
             | derived and propagated by unification.
             | 
             | I like how George W. Lloyd describes the use of unification
             | in Prolog. Apologies in advance for the long-form quote:
             | 
             |  _The idea that first order logic, or at least substantial
             | subsets of it, could be used as a programming language was
             | revolutionary, because, until 1972, logic had only ever
             | been used as a specification or declarative language in
             | computer science. However, what [R. Kowalski, Predicate
             | Logic as a Programming Language, 1974] shows is that logic
             | has a_ procedural interpretation _, which makes it very
             | effective as a programming language. Briefly, a program
             | clause A <- B1, ..., Bn is regarded as a _procedure
             | definition _. If <- C1, ..., Ck is a goal, then each Cj is
             | regarded as a _procedure call _. A program is run by giving
             | it an initial goal. If the current goal is <- C1, ..., Ck,
             | a step in the computation involves unifying some Cj with
             | the head A of a program clause A <- B1, ... Bn and thus
             | reducing the current goal to the goal <-(C1, ..., Cj-1, B1,
             | ..., Bn, Cj+1, ..., Ck)th, where th is the unifying
             | substitution. Unification thus becomes a uniform mechanism
             | for parameter passing, data selection and data
             | construction. The computation terminates when the empty
             | goal is produced._
             | 
             | And that's first-order SLD-Resolution, with unification, in
             | a nutshell! From George Lloyd: Foundations of Logic
             | Programming, second Ed. 1993.
             | 
             | There's a copy of the book here:
             | 
             | https://archive.org/details/foundationsoflog0000lloy_o7l2
             | 
             | But it's a limited preview :/
             | 
             | ________
             | 
             | [1] ... well, the substitution that makes the Horn goal
             | that we call a "query" _false_ , in reality, because Prolog
             | proves things by refutation but we fudge it a bit.
        
               | jerf wrote:
               | When you're using pattern matching to also create
               | variables, it's just really confusing to be able to
               | create variables with both sides of the expression. It
               | can get confusing enough as it is. Haskell has some
               | extensions to pattern matching that AIUI are generally
               | considered just too "powerful" to generally use, as it
               | makes it very difficult to read, for instance.
               | 
               | Plus to a first approximation you can always just swap
               | the binding from the right to the left. A deeper analysis
               | may reveal there's some case you can't do that, there's a
               | lot to pattern matching and I'm not going to try to sit
               | here and construct a case where you can't, but certainly
               | the most common cases (by far) would just require a
               | simple swap, so it's not an onerous requirement. And if
               | you did construct a case where you can't, you could make
               | a very solid software engineering case that you just
               | shouldn't do that and should do it it two parts.
               | 
               | To the extent this doesn't apply to prolog, it would be
               | when prolog is functioning directly as a logic language
               | rather than a programming language, and to be honest I'd
               | still advise against a prolog programmer binding
               | variables on both sides of an expression unless they're
               | forced to for some reason just because if nothing else,
               | you may have to understand your own code a week later.
               | Note this specifically applies to binding variables; I'm
               | not bothered by two already-bound variables being matched
               | against each other, that's just prolog functioning as
               | designed.
        
               | jpt4 wrote:
               | The Curry-Howard Correspondence is more of an observation
               | than a theorem, but insofar as it describes the general
               | concept of relating logics and programming languages,
               | what is the CHC for logic programming languages? If
               | strikes me that LP directly instantiates and manipulates
               | logical expressions, in a "self-dual" sense; or, is it
               | that the various "proof procedures" of the logic side map
               | to "proof search" algorithms on the computing side?
        
             | throwawaymaths wrote:
             | >And Erlang. And Haskell. And OCaml. And F#. And...
             | 
             | JavaScript (maps), Rust, Ruby, Python...
        
               | __jonas wrote:
               | What do you mean by this? Do maps in JavaScript have some
               | kind of pattern matching?
        
               | munificent wrote:
               | JavaScript supports destructuring lists and objects (sort
               | of like maps):
               | 
               | https://developer.mozilla.org/en-
               | US/docs/Web/JavaScript/Refe...
               | 
               | It's not full pattern matching because as far as I know
               | they don't support refutable patterns, but it handles
               | destructuring and binding to variables.
        
               | weatherlight wrote:
               | It's not the same.                   > [a, 2, 2, b] = [1,
               | 2, 2, 3]         [1, 2, 2, 3]         > a         1
               | > b         3         > [foo, 2, bar] = [3,3,3]
               | %MatchError{term: [3, 3, 3]}         > foo
               | %CompileError{description: "undefined function foo/0
               | (there is no such import)", file: "nofile", line: 1}
               | 
               | Ruby does something similar but the variables are bound
               | in the scope of the case block.
        
               | gergo_barany wrote:
               | What's not the same as what? Based on your examples it
               | looks like you're saying that unification is not the same
               | as pattern matching. But nobody said it was.
        
             | weatherlight wrote:
             | its not quite the same. you have to write quiet a bit of
             | OCaml or Haskell to do the following.
             | 
             | [foo, [2, 3, bar], baz, 5] = [1,[2,3,4], 5]
             | 
             | If there's a match, the bindings "foo", "bar", "baz" will
             | be bound to the values on the right hand side.
             | 
             | if not an exception will be thrown
             | 
             | (You are right, though, Erlang and Elixir have the same
             | semantics, both being on the BEAM)
        
           | ReleaseCandidat wrote:
           | > I think that depends on what you meany by "pattern
           | matching"
           | 
           | Oh, yes, that's true. "Pattern matching" as that what
           | normally is used in "functional programming languages". This
           | kind of pattern matching does not need unification. The IMHO
           | best (correct, but understandable and with all needed
           | definitions) explanation of it is in chapters 4 "STRUCTURED
           | TYPES AND THE SEMANTICS OF PATTERN-MATCHING" and 5 "EFFICIENT
           | COMPILATION OF PATTERN-MATCHING" of Peyton Jones et al., "The
           | implementation of functional programming languages".
           | 
           | https://simon.peytonjones.org/slpj-book-1987/
           | 
           | > In programming cirlces "pattern matching" usually means
           | matching strings.
           | 
           | Depends on the "circle" ;)
           | https://en.wikipedia.org/wiki/Pattern_matching
        
             | YeGoblynQueenne wrote:
             | Thanks, I don't really know much about Functional
             | Programming. My expertise is in logic programming which btw
             | is a different paradigm. I've heard it said that they're
             | similar and it seems to me the reason was the support for
             | pattern matching in functional languages.
             | 
             | I agree that what "pattern matching" means depends on a
             | programmer's background. It's interesting that the
             | wikipedia article that you link to does not mention
             | "unification" at all and only has one reference in Prolog
             | as one of a group of example languages with a "pattern
             | matching construct"; which is not how I would describe the
             | use of the unification algorithm in Prolog.
             | 
             | To be honest, I am not sure what is meant by "pattern
             | matching". It seems to be a loosely defined term, like
             | "Finite State Machine" (as opposed to "Finite State
             | _Automaton_ "). I expect that the book you linked to has a
             | more formal treating, perhaps, but from what I can tell
             | from a quick look the "pattern matching" here is based on
             | types. In Prolog, unification is used to bind values to
             | logic variables, which do not have types. So a variable, X,
             | in Prolog can unify, i.e. "match with" absolutely anything.
             | A "pattern" then can be formed by constructing terms (the
             | only data structure in Prolog) with "holes" like p(X, a, b,
             | Y,[1,2,3|Tail], f(Z)) etc. where capital letters are
             | variables.
             | 
             | One use of this ability is to manipulate singly-linked
             | lists to insert in or remove elements from their tail,
             | where normally only the head of such a list can be accessed
             | directly. What I mean is that in Prolog a list is denoted
             | by the special syntax [H|T] where H is the head of a list
             | (and can be any term, including another list) and T is the
             | tail of the list, another list.
             | 
             | Normally one appends to a list in Prolog using the append/3
             | predicate:                 append([], L, L).
             | append([H|T], L, [H|R]) :-         append(T, L, R).
             | 
             | This works on ordinary lists, but it is also possible to
             | define a "difference list" as a term, e.g. like:
             | [a,b,c,d|T]-T
             | 
             | Where T is a variable bound to the end of the tail of the
             | list. Two difference lists like that one can be appended
             | with the following predicate:                 append_dl(Xs-
             | Ys, Ys-Zs, Xs-Zs).
             | 
             | For example:                 ?- Xs = [a,b,c,d|T1]-T1, Ys =
             | [e,f,g,h|T2]-T2, append_dl(Xs,Ys,Zs).       Xs =
             | [a,b,c,d,e,f,g,h|T2]-[e,f,g,h|T2],       T1 = [e,f,g,h|T2],
             | Ys = [e,f,g,h|T2]-T2,       Zs = [a,b,c,d,e,f,g,h|T2]-T2.
             | 
             | The result of appending is in Zs, but you can see the
             | result of the intermediary unification steps in the other
             | variables echoed in the Prolog REPL (the "listener").
             | 
             | Of course, because this is Prolog you can also use the same
             | predicate to split a list:                 ?- Xs =
             | [a,b,c,d|T1]-T1, Zs = [a,b,c,d,e,f,g,h|T2]-T2,
             | append_dl(Xs,Ys,Zs).       Xs =
             | [a,b,c,d,e,f,g,h|T2]-[e,f,g,h|T2],       T1 = [e,f,g,h|T2],
             | Zs = [a,b,c,d,e,f,g,h|T2]-T2,       Ys = [e,f,g,h|T2]-T2.
             | 
             | Where Ys is what remains if we split Xs from Zs.
             | 
             | Appending or splitting lists like this is preferred to
             | append/3 because it appends the two lists in a single
             | operation so it can be used to economically put a list
             | together in a loop. The same technique is used with
             | Prolog's Definite Clause Grammars where the "difference" in
             | a difference list is the start and end of a string accepted
             | by a grammar, acting as a parser.
             | 
             | https://en.wikipedia.org/wiki/Definite_clause_grammar
             | 
             | Sterling and Shapiro's "The Craft of Prolog" has a nice
             | chapter on "Incomplete Data Structures" using difference
             | lists and patterns with "holes" but I don't think there's a
             | free version online, unfortunately.
        
               | ReleaseCandidat wrote:
               | > To be honest, I am not sure what is meant by "pattern
               | matching".
               | 
               | I'd say that "pattern matching" is unification, where
               | there are no free variables on the right side and the
               | left side is the "pattern" to match.
               | 
               | As Prolog code, something like                   ?-
               | [X|Xs] = [a, b, c]
               | 
               | where X = a and Xs = [b, c] would be "pattern matching"
               | the head and the tail of the list [a, b, c], but
               | ?- [X|[b, c]] = [a|Xs]
               | 
               | where X = a and Xs = [b, c] would be "real" unification,
               | if [X|[b, c]] would be valid syntax.
               | 
               | Oh, and both unification for type checking (inference)
               | and pattern matching are compile time actions (or done
               | before evaluating the generated functions).
               | 
               | A (simple) pattern match like                   match
               | list with           | head :: tail -> do_something(head,
               | tail)           | [] -> do_something_else()
               | 
               | is "compiled" to something like                   if list
               | != []           then do_something(head(list), tail(list))
               | else do_something_else()
        
               | YeGoblynQueenne wrote:
               | That sounds right. I guess then "pattern matching" as in
               | Elixir et al. is really just one-sided unification. Which
               | makes me wonder- why don't those languages go all the
               | way?
               | 
               | At a guess, that's something to do with the difficulty to
               | implement unification efficiently. The first description
               | of Unification, by J. Alan Robinson (who first described
               | Resolution) had exponential time complexity and it took a
               | while to find efficient unification algorithms. Perhaps
               | the designers of functional languages simply don't need
               | the hassle and can make do with something more efficient,
               | less complete. That would make good, practical sense in
               | languages that do not use Resolution as an interpreter-
               | it's Resolution that absolutely needs unification
               | (there's ground Resolution, without the need for
               | unification, but it's restricted to propositional terms
               | without first-order variables).
        
               | rdtsc wrote:
               | Pattern matching as is in Erlang, or how it is in most
               | modern languages, was not there at some point and was
               | considered novel and exotic. Now most everyone can do [X,
               | Y | _] = [1, [2], 3, 4] and it's not a seen as something
               | advanced.
               | 
               | I am not sure if it was ever called "unification" in
               | Erlang, it was always called "pattern matching". But the
               | equal sign did retain its name as the "match" operator:
               | https://www.erlang.org/doc/system/expressions.html#the-
               | match.... And that is a legacy of Prolog. I never read
               | X=1 as "make X equal to 1", in my head I always read it
               | as "match X to 1".
               | 
               | At some point, I also wondered: why didn't they go all
               | the way and just bring in Prolog's full unification. They
               | obviously liked Prolog; Erlang was developed in Prolog
               | initially, even. Well, it turns out with unification we'd
               | have to carry the match "hole" around until we might a
               | match. But Erlang has lightweight processes with isolated
               | heaps which communicate via messages, so now we'd need to
               | pass these "holes" around from process to process until
               | say one of the processes somewhere manages to unify the
               | value, and then have it propagate across the system to
               | original query. That breaks isolation. So you see how the
               | two neat features came into a conflict and they picked
               | one over the other, and rightly so it seems.
        
               | jpt4 wrote:
               | YeGoblynQueene, would you be amenable to discussing
               | certain questions of logic programming via e-mail? I am
               | reading about Jean-Yves Girard's "Stellar Resolution"
               | model of computation, and would greatly appreciate a
               | knowledgeable counterpart in the investigation.
        
               | gergo_barany wrote:
               | > I'd say that "pattern matching" is unification, where
               | there are no free variables on the right side and the
               | left side is the "pattern" to match.
               | 
               | Plus the restriction that the left hand side must be
               | linear, that is, must not contain any variable more than
               | once. So you usually can't write a pattern match like
               | match list with         | [x, y, x] -> ...
               | 
               | but languages with pattern matching usually allow you to
               | add side conditions like this:                   match
               | list with         | [x, y, x'] if x = x' -> ...
        
           | gergo_barany wrote:
           | > Now, unification is Turing-complete pattern matching
           | 
           | No. First-order syntactic unification is not Turing complete.
           | The combination of unification and resolution is Turing
           | complete.
           | 
           | > For instance, in Prolog, you can unify a variable with an
           | entire Prolog program and call the program at runtime, etc.
           | 
           | Lots of languages allow you to do this. Lisp had "eval" ten
           | years before Prolog existed. Unification is not necessary for
           | this.
           | 
           | Prolog is great, unification is great. Some other things are
           | great too.
        
       | ReleaseCandidat wrote:
       | While I'm at (free) books: a great example implementation of
       | unification (in Lisp) is in chapter 11 "Logic Programming" of
       | Peter Norvig's "Paradigms of Artificial Intelligence Programming:
       | Case Studies in Common Lisp".
       | 
       | https://github.com/norvig/paip-lisp
       | 
       | There is also an implementation in Python, but I have never
       | tested it, just googled it now:
       | https://github.com/dhconnelly/paip-python/blob/master/prolog...
       | 
       | https://dhconnelly.github.io/paip-python/docs/prolog.html
        
       | contificate wrote:
       | It is worth noting that lots of applications of unification do
       | not reify explicit substitutions in their implementations. You
       | often see introductory type inference articles (usually focused
       | on Hindley-Milner) use algorithm W, which uses a unification
       | procedure that returns substitutions (which must then be composed
       | with other substitutions). All of this is less efficient and
       | arguably more error-prone and more complicated than the
       | destructive unification implied by algorithm J (using the union-
       | find data structure, often implemented invasively into the type
       | representation to imply a union-find forest).
       | 
       | To this end, good coverage of decent (destructive) unification
       | algorithms can be found in any simple resource on type inference
       | (https://okmij.org/ftp/ML/generalization/sound_eager.ml) or the
       | Warren Abstract Machine
       | (https://github.com/a-yiorgos/wambook/blob/master/wambook.pdf).
       | 
       | Of course, there are times where you would want to reify
       | substitutions as a data structure to compose and apply, but most
       | of the time you just want to immediately apply them in a
       | pervasive way.
       | 
       | Despite what another comment says, unification is a valid - and
       | rather convenient - way to implement pattern matching (as in the
       | style of ML) in an interpreter: much like how you rewrite type
       | variables with types you are unifying them with, you can rewrite
       | the pattern variables to refer to the parts of the (evaluated)
       | value you are matching against (which you then use to extend the
       | environment with when evaluating the right hand side).
        
         | ReleaseCandidat wrote:
         | > Despite what another comment says, unification is a valid -
         | and rather convenient - way to implement pattern matching
         | 
         | The article shows that it is a valid and rather convenient way.
         | That's why it is important to tell people that it is not
         | necessary to use it for the "usual" pattern matching. You do
         | (again: for the "usual" pattern matching without "variables",
         | as in free variables, on the right side) _not_ want to use
         | unification but "compile" the pattern matching.
        
           | contificate wrote:
           | If we are talking about the context of a compiler, I agree:
           | you should compile pattern matching to decision trees (DAGs),
           | by way of something like Pettersson's algorithm. In my
           | comment, I specified "in an interpreter", as I think it's the
           | natural way to implement an SML-like match construct in that
           | context.
           | 
           | EDIT: Perhaps I am missing Elixir-specific context and I
           | apologise if this is the case. It was the unification that
           | made me click this post.
        
             | ReleaseCandidat wrote:
             | Well, I'd do that with "real" interpreters (using bytecode)
             | too. But on the other hand also _especially_ for "toy" tree
             | walking interpreters, to learn how to build the pattern
             | matching tree -- as unification is needed for the type
             | checker anyways ;). Although I would start by reading
             | Wadler's version (that I linked in the other post), it's
             | IMHO a bit more accessible than Peterson's https://www.clas
             | ses.cs.uchicago.edu/archive/2011/spring/2262...
             | 
             | But please don't get me wrong: I'm just saying that there
             | exist other, specialised and (well, generally) more
             | performant algorithms for pattern matching and people
             | should have at least heard about them. If somebody decides
             | to use unification for pattern matching, that still could
             | be changed if performance or better error messages or ...
             | matters.
        
               | contificate wrote:
               | Ah, I thought your username was familiar: you recommended
               | Wadler's approach on a previous HN thread concerning my
               | blog post about Pettersson/Maranget's algorithm
               | (https://news.ycombinator.com/item?id=39241776). Yeah, I
               | admit that Pettersson's telling of the algorithm is a bit
               | turgid (probably why Maranget's paper - which is the same
               | technique as its core, at least to my understanding - is
               | cited more often).
        
       ___________________________________________________________________
       (page generated 2024-07-01 23:01 UTC)