[HN Gopher] Unification in Elixir
___________________________________________________________________
Unification in Elixir
Author : weatherlight
Score : 169 points
Date : 2024-06-30 20:58 UTC (1 days ago)
(HTM) web link (www.ericpfahl.com)
(TXT) w3m dump (www.ericpfahl.com)
| ReleaseCandidat wrote:
| Type checking (type inference) is the usual application of
| unification in programming languages (compilers or interpreters).
| Pattern matching does not need unification, because, as mentioned
| in the beginning of post, the "matching" is done one the left
| side only (no variables to "match" on the right side).
| YeGoblynQueenne wrote:
| The one-sided pattern matching seems to be a peculiarity of
| Elixir. The post above discusses an implementation of full
| unification without that restriction. To clarify, unification
| is not generally restricted to one-sided matching only.
|
| Is unification unnecessary for pattern matching? I think that
| depends on what you meany by "pattern matching". In programming
| cirlces "pattern matching" usually means matching strings. Now,
| unification is Turing-complete pattern matching, meaning that
| it goes well beyond e.g. regular expressions that can only
| match the set of strings recognised by regular automata. For
| instance, in Prolog, you can unify a variable with an entire
| Prolog program and call the program at runtime, etc. So if you
| need "pattern matching" only to match strings then you probably
| don't need the full power of unification.
|
| If I understand correctly (not an expert) Hindley-Milner type
| checking, the type checking that uses unification, relies
| exactly on that ability of unification to match entire
| programs. Anyway that kind of type checking is indeed pattern
| matching by unification. So it's not different than pattern
| matching on strings with regular expressions. The difference is
| only in what kinds of strings can be matched.
| troupo wrote:
| > The one-sided pattern matching seems to be a peculiarity of
| Elixir.
|
| And Erlang. And Haskell. And OCaml. And F#. And...
|
| As far as I understand, only Prolog has unification (out of
| more-or-less used/known languages). And in languages that
| require (rather awkward and verbose) workarounds for this
| functionality, it's hard to see the immediate value of
| unification.
| YeGoblynQueenne wrote:
| >> And Erlang. And Haskell. And OCaml. And F#. And..
|
| Thanks, I didn't know that. I took the article to mean that
| this kind of one-sided pattern matching is unique in
| Elixir. My misreading.
|
| From what I've seen it's true that most languages that have
| support for pattern matching have special constructs for
| it, whereas in Prolog it's baked in to the only data
| structure in the language, the term, which can be a single
| variable (a logic variable that can unify with ...
| anything).
|
| I don't know how useful or not unification would be in a
| language other than Prolog to be honest. In Prolog it's an
| integral component of the Resolution-based theorem prover
| used as an interpreter of Prolog programs: resolution
| proceeds by unifying Horn goals to the heads of definite
| clauses, to produce new goals, recursively. More over, when
| one runs a Prolog "query" the result of the query itself is
| the substitution of the variables in the query, i.e. the
| substitution that makes the query true [1]; a substitution
| derived and propagated by unification.
|
| I like how George W. Lloyd describes the use of unification
| in Prolog. Apologies in advance for the long-form quote:
|
| _The idea that first order logic, or at least substantial
| subsets of it, could be used as a programming language was
| revolutionary, because, until 1972, logic had only ever
| been used as a specification or declarative language in
| computer science. However, what [R. Kowalski, Predicate
| Logic as a Programming Language, 1974] shows is that logic
| has a_ procedural interpretation _, which makes it very
| effective as a programming language. Briefly, a program
| clause A <- B1, ..., Bn is regarded as a _procedure
| definition _. If <- C1, ..., Ck is a goal, then each Cj is
| regarded as a _procedure call _. A program is run by giving
| it an initial goal. If the current goal is <- C1, ..., Ck,
| a step in the computation involves unifying some Cj with
| the head A of a program clause A <- B1, ... Bn and thus
| reducing the current goal to the goal <-(C1, ..., Cj-1, B1,
| ..., Bn, Cj+1, ..., Ck)th, where th is the unifying
| substitution. Unification thus becomes a uniform mechanism
| for parameter passing, data selection and data
| construction. The computation terminates when the empty
| goal is produced._
|
| And that's first-order SLD-Resolution, with unification, in
| a nutshell! From George Lloyd: Foundations of Logic
| Programming, second Ed. 1993.
|
| There's a copy of the book here:
|
| https://archive.org/details/foundationsoflog0000lloy_o7l2
|
| But it's a limited preview :/
|
| ________
|
| [1] ... well, the substitution that makes the Horn goal
| that we call a "query" _false_ , in reality, because Prolog
| proves things by refutation but we fudge it a bit.
| jerf wrote:
| When you're using pattern matching to also create
| variables, it's just really confusing to be able to
| create variables with both sides of the expression. It
| can get confusing enough as it is. Haskell has some
| extensions to pattern matching that AIUI are generally
| considered just too "powerful" to generally use, as it
| makes it very difficult to read, for instance.
|
| Plus to a first approximation you can always just swap
| the binding from the right to the left. A deeper analysis
| may reveal there's some case you can't do that, there's a
| lot to pattern matching and I'm not going to try to sit
| here and construct a case where you can't, but certainly
| the most common cases (by far) would just require a
| simple swap, so it's not an onerous requirement. And if
| you did construct a case where you can't, you could make
| a very solid software engineering case that you just
| shouldn't do that and should do it it two parts.
|
| To the extent this doesn't apply to prolog, it would be
| when prolog is functioning directly as a logic language
| rather than a programming language, and to be honest I'd
| still advise against a prolog programmer binding
| variables on both sides of an expression unless they're
| forced to for some reason just because if nothing else,
| you may have to understand your own code a week later.
| Note this specifically applies to binding variables; I'm
| not bothered by two already-bound variables being matched
| against each other, that's just prolog functioning as
| designed.
| jpt4 wrote:
| The Curry-Howard Correspondence is more of an observation
| than a theorem, but insofar as it describes the general
| concept of relating logics and programming languages,
| what is the CHC for logic programming languages? If
| strikes me that LP directly instantiates and manipulates
| logical expressions, in a "self-dual" sense; or, is it
| that the various "proof procedures" of the logic side map
| to "proof search" algorithms on the computing side?
| throwawaymaths wrote:
| >And Erlang. And Haskell. And OCaml. And F#. And...
|
| JavaScript (maps), Rust, Ruby, Python...
| __jonas wrote:
| What do you mean by this? Do maps in JavaScript have some
| kind of pattern matching?
| munificent wrote:
| JavaScript supports destructuring lists and objects (sort
| of like maps):
|
| https://developer.mozilla.org/en-
| US/docs/Web/JavaScript/Refe...
|
| It's not full pattern matching because as far as I know
| they don't support refutable patterns, but it handles
| destructuring and binding to variables.
| weatherlight wrote:
| It's not the same. > [a, 2, 2, b] = [1,
| 2, 2, 3] [1, 2, 2, 3] > a 1
| > b 3 > [foo, 2, bar] = [3,3,3]
| %MatchError{term: [3, 3, 3]} > foo
| %CompileError{description: "undefined function foo/0
| (there is no such import)", file: "nofile", line: 1}
|
| Ruby does something similar but the variables are bound
| in the scope of the case block.
| gergo_barany wrote:
| What's not the same as what? Based on your examples it
| looks like you're saying that unification is not the same
| as pattern matching. But nobody said it was.
| weatherlight wrote:
| its not quite the same. you have to write quiet a bit of
| OCaml or Haskell to do the following.
|
| [foo, [2, 3, bar], baz, 5] = [1,[2,3,4], 5]
|
| If there's a match, the bindings "foo", "bar", "baz" will
| be bound to the values on the right hand side.
|
| if not an exception will be thrown
|
| (You are right, though, Erlang and Elixir have the same
| semantics, both being on the BEAM)
| ReleaseCandidat wrote:
| > I think that depends on what you meany by "pattern
| matching"
|
| Oh, yes, that's true. "Pattern matching" as that what
| normally is used in "functional programming languages". This
| kind of pattern matching does not need unification. The IMHO
| best (correct, but understandable and with all needed
| definitions) explanation of it is in chapters 4 "STRUCTURED
| TYPES AND THE SEMANTICS OF PATTERN-MATCHING" and 5 "EFFICIENT
| COMPILATION OF PATTERN-MATCHING" of Peyton Jones et al., "The
| implementation of functional programming languages".
|
| https://simon.peytonjones.org/slpj-book-1987/
|
| > In programming cirlces "pattern matching" usually means
| matching strings.
|
| Depends on the "circle" ;)
| https://en.wikipedia.org/wiki/Pattern_matching
| YeGoblynQueenne wrote:
| Thanks, I don't really know much about Functional
| Programming. My expertise is in logic programming which btw
| is a different paradigm. I've heard it said that they're
| similar and it seems to me the reason was the support for
| pattern matching in functional languages.
|
| I agree that what "pattern matching" means depends on a
| programmer's background. It's interesting that the
| wikipedia article that you link to does not mention
| "unification" at all and only has one reference in Prolog
| as one of a group of example languages with a "pattern
| matching construct"; which is not how I would describe the
| use of the unification algorithm in Prolog.
|
| To be honest, I am not sure what is meant by "pattern
| matching". It seems to be a loosely defined term, like
| "Finite State Machine" (as opposed to "Finite State
| _Automaton_ "). I expect that the book you linked to has a
| more formal treating, perhaps, but from what I can tell
| from a quick look the "pattern matching" here is based on
| types. In Prolog, unification is used to bind values to
| logic variables, which do not have types. So a variable, X,
| in Prolog can unify, i.e. "match with" absolutely anything.
| A "pattern" then can be formed by constructing terms (the
| only data structure in Prolog) with "holes" like p(X, a, b,
| Y,[1,2,3|Tail], f(Z)) etc. where capital letters are
| variables.
|
| One use of this ability is to manipulate singly-linked
| lists to insert in or remove elements from their tail,
| where normally only the head of such a list can be accessed
| directly. What I mean is that in Prolog a list is denoted
| by the special syntax [H|T] where H is the head of a list
| (and can be any term, including another list) and T is the
| tail of the list, another list.
|
| Normally one appends to a list in Prolog using the append/3
| predicate: append([], L, L).
| append([H|T], L, [H|R]) :- append(T, L, R).
|
| This works on ordinary lists, but it is also possible to
| define a "difference list" as a term, e.g. like:
| [a,b,c,d|T]-T
|
| Where T is a variable bound to the end of the tail of the
| list. Two difference lists like that one can be appended
| with the following predicate: append_dl(Xs-
| Ys, Ys-Zs, Xs-Zs).
|
| For example: ?- Xs = [a,b,c,d|T1]-T1, Ys =
| [e,f,g,h|T2]-T2, append_dl(Xs,Ys,Zs). Xs =
| [a,b,c,d,e,f,g,h|T2]-[e,f,g,h|T2], T1 = [e,f,g,h|T2],
| Ys = [e,f,g,h|T2]-T2, Zs = [a,b,c,d,e,f,g,h|T2]-T2.
|
| The result of appending is in Zs, but you can see the
| result of the intermediary unification steps in the other
| variables echoed in the Prolog REPL (the "listener").
|
| Of course, because this is Prolog you can also use the same
| predicate to split a list: ?- Xs =
| [a,b,c,d|T1]-T1, Zs = [a,b,c,d,e,f,g,h|T2]-T2,
| append_dl(Xs,Ys,Zs). Xs =
| [a,b,c,d,e,f,g,h|T2]-[e,f,g,h|T2], T1 = [e,f,g,h|T2],
| Zs = [a,b,c,d,e,f,g,h|T2]-T2, Ys = [e,f,g,h|T2]-T2.
|
| Where Ys is what remains if we split Xs from Zs.
|
| Appending or splitting lists like this is preferred to
| append/3 because it appends the two lists in a single
| operation so it can be used to economically put a list
| together in a loop. The same technique is used with
| Prolog's Definite Clause Grammars where the "difference" in
| a difference list is the start and end of a string accepted
| by a grammar, acting as a parser.
|
| https://en.wikipedia.org/wiki/Definite_clause_grammar
|
| Sterling and Shapiro's "The Craft of Prolog" has a nice
| chapter on "Incomplete Data Structures" using difference
| lists and patterns with "holes" but I don't think there's a
| free version online, unfortunately.
| ReleaseCandidat wrote:
| > To be honest, I am not sure what is meant by "pattern
| matching".
|
| I'd say that "pattern matching" is unification, where
| there are no free variables on the right side and the
| left side is the "pattern" to match.
|
| As Prolog code, something like ?-
| [X|Xs] = [a, b, c]
|
| where X = a and Xs = [b, c] would be "pattern matching"
| the head and the tail of the list [a, b, c], but
| ?- [X|[b, c]] = [a|Xs]
|
| where X = a and Xs = [b, c] would be "real" unification,
| if [X|[b, c]] would be valid syntax.
|
| Oh, and both unification for type checking (inference)
| and pattern matching are compile time actions (or done
| before evaluating the generated functions).
|
| A (simple) pattern match like match
| list with | head :: tail -> do_something(head,
| tail) | [] -> do_something_else()
|
| is "compiled" to something like if list
| != [] then do_something(head(list), tail(list))
| else do_something_else()
| YeGoblynQueenne wrote:
| That sounds right. I guess then "pattern matching" as in
| Elixir et al. is really just one-sided unification. Which
| makes me wonder- why don't those languages go all the
| way?
|
| At a guess, that's something to do with the difficulty to
| implement unification efficiently. The first description
| of Unification, by J. Alan Robinson (who first described
| Resolution) had exponential time complexity and it took a
| while to find efficient unification algorithms. Perhaps
| the designers of functional languages simply don't need
| the hassle and can make do with something more efficient,
| less complete. That would make good, practical sense in
| languages that do not use Resolution as an interpreter-
| it's Resolution that absolutely needs unification
| (there's ground Resolution, without the need for
| unification, but it's restricted to propositional terms
| without first-order variables).
| rdtsc wrote:
| Pattern matching as is in Erlang, or how it is in most
| modern languages, was not there at some point and was
| considered novel and exotic. Now most everyone can do [X,
| Y | _] = [1, [2], 3, 4] and it's not a seen as something
| advanced.
|
| I am not sure if it was ever called "unification" in
| Erlang, it was always called "pattern matching". But the
| equal sign did retain its name as the "match" operator:
| https://www.erlang.org/doc/system/expressions.html#the-
| match.... And that is a legacy of Prolog. I never read
| X=1 as "make X equal to 1", in my head I always read it
| as "match X to 1".
|
| At some point, I also wondered: why didn't they go all
| the way and just bring in Prolog's full unification. They
| obviously liked Prolog; Erlang was developed in Prolog
| initially, even. Well, it turns out with unification we'd
| have to carry the match "hole" around until we might a
| match. But Erlang has lightweight processes with isolated
| heaps which communicate via messages, so now we'd need to
| pass these "holes" around from process to process until
| say one of the processes somewhere manages to unify the
| value, and then have it propagate across the system to
| original query. That breaks isolation. So you see how the
| two neat features came into a conflict and they picked
| one over the other, and rightly so it seems.
| jpt4 wrote:
| YeGoblynQueene, would you be amenable to discussing
| certain questions of logic programming via e-mail? I am
| reading about Jean-Yves Girard's "Stellar Resolution"
| model of computation, and would greatly appreciate a
| knowledgeable counterpart in the investigation.
| gergo_barany wrote:
| > I'd say that "pattern matching" is unification, where
| there are no free variables on the right side and the
| left side is the "pattern" to match.
|
| Plus the restriction that the left hand side must be
| linear, that is, must not contain any variable more than
| once. So you usually can't write a pattern match like
| match list with | [x, y, x] -> ...
|
| but languages with pattern matching usually allow you to
| add side conditions like this: match
| list with | [x, y, x'] if x = x' -> ...
| gergo_barany wrote:
| > Now, unification is Turing-complete pattern matching
|
| No. First-order syntactic unification is not Turing complete.
| The combination of unification and resolution is Turing
| complete.
|
| > For instance, in Prolog, you can unify a variable with an
| entire Prolog program and call the program at runtime, etc.
|
| Lots of languages allow you to do this. Lisp had "eval" ten
| years before Prolog existed. Unification is not necessary for
| this.
|
| Prolog is great, unification is great. Some other things are
| great too.
| ReleaseCandidat wrote:
| While I'm at (free) books: a great example implementation of
| unification (in Lisp) is in chapter 11 "Logic Programming" of
| Peter Norvig's "Paradigms of Artificial Intelligence Programming:
| Case Studies in Common Lisp".
|
| https://github.com/norvig/paip-lisp
|
| There is also an implementation in Python, but I have never
| tested it, just googled it now:
| https://github.com/dhconnelly/paip-python/blob/master/prolog...
|
| https://dhconnelly.github.io/paip-python/docs/prolog.html
| contificate wrote:
| It is worth noting that lots of applications of unification do
| not reify explicit substitutions in their implementations. You
| often see introductory type inference articles (usually focused
| on Hindley-Milner) use algorithm W, which uses a unification
| procedure that returns substitutions (which must then be composed
| with other substitutions). All of this is less efficient and
| arguably more error-prone and more complicated than the
| destructive unification implied by algorithm J (using the union-
| find data structure, often implemented invasively into the type
| representation to imply a union-find forest).
|
| To this end, good coverage of decent (destructive) unification
| algorithms can be found in any simple resource on type inference
| (https://okmij.org/ftp/ML/generalization/sound_eager.ml) or the
| Warren Abstract Machine
| (https://github.com/a-yiorgos/wambook/blob/master/wambook.pdf).
|
| Of course, there are times where you would want to reify
| substitutions as a data structure to compose and apply, but most
| of the time you just want to immediately apply them in a
| pervasive way.
|
| Despite what another comment says, unification is a valid - and
| rather convenient - way to implement pattern matching (as in the
| style of ML) in an interpreter: much like how you rewrite type
| variables with types you are unifying them with, you can rewrite
| the pattern variables to refer to the parts of the (evaluated)
| value you are matching against (which you then use to extend the
| environment with when evaluating the right hand side).
| ReleaseCandidat wrote:
| > Despite what another comment says, unification is a valid -
| and rather convenient - way to implement pattern matching
|
| The article shows that it is a valid and rather convenient way.
| That's why it is important to tell people that it is not
| necessary to use it for the "usual" pattern matching. You do
| (again: for the "usual" pattern matching without "variables",
| as in free variables, on the right side) _not_ want to use
| unification but "compile" the pattern matching.
| contificate wrote:
| If we are talking about the context of a compiler, I agree:
| you should compile pattern matching to decision trees (DAGs),
| by way of something like Pettersson's algorithm. In my
| comment, I specified "in an interpreter", as I think it's the
| natural way to implement an SML-like match construct in that
| context.
|
| EDIT: Perhaps I am missing Elixir-specific context and I
| apologise if this is the case. It was the unification that
| made me click this post.
| ReleaseCandidat wrote:
| Well, I'd do that with "real" interpreters (using bytecode)
| too. But on the other hand also _especially_ for "toy" tree
| walking interpreters, to learn how to build the pattern
| matching tree -- as unification is needed for the type
| checker anyways ;). Although I would start by reading
| Wadler's version (that I linked in the other post), it's
| IMHO a bit more accessible than Peterson's https://www.clas
| ses.cs.uchicago.edu/archive/2011/spring/2262...
|
| But please don't get me wrong: I'm just saying that there
| exist other, specialised and (well, generally) more
| performant algorithms for pattern matching and people
| should have at least heard about them. If somebody decides
| to use unification for pattern matching, that still could
| be changed if performance or better error messages or ...
| matters.
| contificate wrote:
| Ah, I thought your username was familiar: you recommended
| Wadler's approach on a previous HN thread concerning my
| blog post about Pettersson/Maranget's algorithm
| (https://news.ycombinator.com/item?id=39241776). Yeah, I
| admit that Pettersson's telling of the algorithm is a bit
| turgid (probably why Maranget's paper - which is the same
| technique as its core, at least to my understanding - is
| cited more often).
___________________________________________________________________
(page generated 2024-07-01 23:01 UTC)