[HN Gopher] De Bruijn notation, and why it's useful
___________________________________________________________________
De Bruijn notation, and why it's useful
Author : blueberry87
Score : 106 points
Date : 2025-05-30 15:51 UTC (7 hours ago)
(HTM) web link (blueberrywren.dev)
(TXT) w3m dump (blueberrywren.dev)
| kccqzy wrote:
| While I have used de bruijn indices successfully in the past, a
| part of me wonders, why can't this problem be solved at the
| parser level before we even get a syntax tree? The parser knows
| when we are introducing a new variable in a lambda abstraction,
| and it can keep track of a stack of variables to be able to give
| them new fresh names so that in later passes we won't have to
| worry about name conflicts. I mean when writing a parser I
| basically already need to keep a stack of environments to resolve
| names. I sometimes keep variable names as simply a source code
| range where it is first introduced: so instead of dealing with
| the variable "f" I deal with (1,4)-(1,5) meaning the variable
| introduced in line 1 between columns 4 and 5; here I can still
| recover the string-based name but I also get to differentiate
| between identically named variables defined in different
| locations.
|
| Is this a viable idea?
| thunderseethe wrote:
| Take a look at The Rapier in GHC:
| https://www.researchgate.net/publication/220676611_Secrets_o...
|
| It employs a similar idea. Track the set of in scope variables
| and use them to unique variables on collision.
| cvoss wrote:
| What happens in the evaluator when you have
|
| (\ a . a a) (\ x . \ y . x y)
|
| Variables are uniquely named at parse time, but after one beta
| step, you have two distinct instances each of x and y, albeit
| in different scopes, and after a second beta step, you have two
| distinct variables named y in the same scope.
| kccqzy wrote:
| Interesting example! I think at that time I was either
| working with simply typed lambda calculus or something
| slightly better (maybe Hindley-Milner based system) where the
| omega combinator isn't allowed. Do you know what sort of type
| system restriction would make the above idea sound?
| maplant wrote:
| Yeah, it's viable. I do this for my scheme implementation.
| Defining a new lexical variable generates a new globally unique
| identifier: https://github.com/maplant/scheme-
| rs/blob/fb89f6481443130399...
| gfaster wrote:
| I used De Bruijn indices when attempting to prove correctness of
| a toy language of mine. It's an elegant solution, but I found
| them exceedingly difficult to reason about both intuitively and
| in proofs. That being said, I've yet to find a better system
| (that isn't just augmenting De Bruijn indices) for implementing
| any sort of lambda calculus.
| Joker_vD wrote:
| > Now we're cool! Everything works as expected, and it takes much
| less work
|
| Does it though? It's about the same amount of work as in renaming
| potentially conflicting variables (and more work than renaming
| _actually_ conflicting variables). AFAIK, the main argument for
| them (by B. C. Pierce in his TAPL?) is that if you botch up the
| implementation of de Bruijn indices /levels, it will blow up in
| your face almost immediately, while the conventional renaming
| schemes may have subtle undetected bugs for months.
| gfaster wrote:
| From my perspective, not having to deal with strings is one of
| the big benefits, but I believe the main argument is actually
| that it trivializes alpha equivalence.
|
| As for renaming, if you're going to be looking at every binding
| anyway, changing most of them is no big deal (unless of course
| that entails doing many deep copies)
| hoping1 wrote:
| This and the other comment under this seem to be talking about
| the work the computer is doing at runtime. I believe the point
| is about the developer's work in implementing this. (For
| example, renaming potentially conflicting variables is clearly
| _less_ work than renaming only the actually conflicting
| variables, in this framing.) This kind of work is important for
| (1) compilers whose bugs are potentially very financially
| expensive, and (2) compilers with very intentionally-portable
| implementations (including but not limited to educational
| implementations)
| tromp wrote:
| Another advantage of de-Bruijn notation is in providing a very
| simple binary encoding of lambda terms, as demonstrated in my
| IOCCC submission [1].
|
| [1] https://www.ioccc.org/2012/tromp/
| Disposal8433 wrote:
| To you and OP and the others: you're crazy or geniuses, but
| thanks for posting because that's inspiring whether I
| understand it or not.
| jmount wrote:
| Just ran into your writeup a day ago. Truly amazing!
| biot wrote:
| This has long been my favorite entry! Still on my todo list to
| go through and fully understand it.
| tromp wrote:
| You might find this repo [1] by Melvin Zhang helpful.
|
| [1] https://github.com/melvinzhang/binary-lambda-calculus
| guerrilla wrote:
| I've implemted this several times before (years ago), but somone
| tell me, why don't we just consider variables unique. Each
| instance of a name* should have an absolute (i.e. global)
| sequence number associated with it. That seems simpler than this
| relative sequence numbering and the complexity that comes from
| it.
|
| Scrolling comments, I see somone else nentions this as well. I
| feel like this is something that got stuck in academia hell but
| in practice could be solved in more oractical ways. I remember
| various papers on fresh naming and name calculi from the 00's but
| why? We have a parser and we have numbers.
|
| The reason I bring this up is because I recall it being easy to
| introduce bugs into de Bruijn calculi and also adding symbolic
| debugging to real-world languages was annoying for reasons I have
| no memory of.
|
| * Note I said name not variable. (\ x . x (\x . x)) Would be (\
| x#0 . x#0 (\x#1 . x#1)) similar to de Bruijn indices but could
| differ in other situations.
| hoping1 wrote:
| See cvoss's comment in another thread:
|
| " What happens in the evaluator when you have
|
| (\ a . a a) (\ x . \ y . x y)
|
| Variables are uniquely named at parse time, but after one beta
| step, you have two distinct instances each of x and y, albeit
| in different scopes, and after a second beta step, you have two
| distinct variables named y in the same scope. "
|
| This doesn't come up in most mainstream languages, but Zig and
| Jai are getting people excited about comptime evaluation, and
| dependent types need comptime evaluation as well. Uniqueness is
| fine until you're doing higher-order evaluation in your
| compilation pipeline like this, which will be more and more
| desirable in the future, at least if the current trends are any
| indication.
| guerrilla wrote:
| (\ a#0 . a#0 a#0) (\ x#1 . \ y#2 . x#1 y#2)
|
| -->
|
| (\ x#3 . \ y#4 . x#3 y#4)(\ x#5 . \ y#6 . x#5 y#6)
| SkiFire13 wrote:
| This implies that applying a function is no longer the same
| as a plain substitution. Moreover this can be pretty
| expensive computationally.
| marvinborner wrote:
| Also, once you get used to it, it makes writing large lambda
| terms faster, easier and more intuitive. The terms all just kind
| of magically interconnect in your mind after a while. At least
| that has been my experience after writing a lot of code in pure
| de Bruijn-indexed lambda calculus using bruijn [1]. Otherwise,
| thinking a few reduction steps ahead in complex terms requires
| alpha conversion, which often ends in confusion.
|
| Similarly, it seems like languages with de Bruijn indices are
| immune to LLMs, since they require an internal stack/graph of
| sorts and don't reduce only on a textual basis.
|
| [1]: https://bruijn.marvinborner.de/
| matheist wrote:
| The Faust compiler uses de Bruijn indices internally to reuse
| computations. Anyone else know any other examples?
|
| https://github.com/grame-cncm/faust
| trealira wrote:
| > This is "the capture problem", and it is quite annoying.
| Generally to avoid this, we need to rename everything1 in our
| "substituted" term to names that are free (do not occur) in the
| "subsitutee" so nothing is accidentally captured. What if we
| could introduce a notation that avoids this?
|
| I've heard of this problem, but I've never really gotten why the
| solution in lambda calculus is to rename variables in lower
| scopes with overlapping names so that everything has a globally
| unique name. The way that programming languages usually solve
| this is by keeping a stack of symbol tables for scopes, right?
| Why is that never presented as a possible solution?
|
| Of course, De Bruijn notation seems to sidestep that completely,
| which is convenient. I guess I'm just commenting on lambda
| calculus implementations in general.
| enricozb wrote:
| Lambda variables are also scoped, in the sense that a bound
| variable shadows one with the same name if it is in scope at
| bind-time. When looking at lambda calculus as purely a term
| rewriting system, the rule for substitution is just a bit
| nuanced, that's all. With DeBruijn notation, substitution is a
| much simpler operation.
| torstenvl wrote:
| The idea is that you want functions that can return functions,
| those inner functions being defined in part by the arguments to
| the outer function. This can't happen without intentionally
| leaky namespaces.
|
| Bad pseudocode:
|
| function stepincrementgenerator(step) -> function (num) -> num
| + step
|
| addsix = stepincrementgenerator(6)
|
| addsix(5)
|
| 11
| chriswarbo wrote:
| > I've heard of this problem, but I've never really gotten why
| the solution in lambda calculus is to rename variables in lower
| scopes with overlapping names so that everything has a globally
| unique name. The way that programming languages usually solve
| this is by keeping a stack of symbol tables for scopes, right?
| Why is that never presented as a possible solution?
|
| That's a perfectly fine way to implement this-or-that language
| (and is hence why many/most programming languages do it).
| However, it's not a valid solution _in lambda calculus_ , since
| lambda calculus only has variables, lambdas or applications;
| there's literally no way to write down "a stack of symbol
| tables", or whathaveyou (note that you can _implement_ such a
| thing _using_ lambda calculus, but that would be more like an
| interpreter for some other language; "running" that
| implementation would still require a way to beta-reduce the
| "host" language in a capture-avoiding way)
| hollerith wrote:
| Stack of symbol tables is more elegant, but is only really
| understandable by programmers whereas maybe lambda calc gets
| taught to a lot of non-programmers?
|
| Note that lambda calc was invented ("defined") before the
| invention of the digital computer.
| chriswarbo wrote:
| I used de Bruijn indexes in Plumb[0], which is an alternative
| syntax for Lambda Calculus that embeds nicely in common scripting
| languages like PHP and Python.
|
| Whilst a classic lambda expression like `lx.y` has three parts:
| the argument name `x`, the body/result `y`, and the `l_._` syntax
| to distinguish it from a variable or application; if we instead
| use de Bruijn indices, then we don't need a separate part for the
| argument: all we need is a body wrapped in some distinguishing
| syntax (like `ly` in the article).
|
| In Plumb, that distinguishing syntax is square brackets, so the
| "lambda term" `ly` would be written `[y]`. This wouldn't be
| possible if we had to write argument names, but ends up having
| some nice consequences:
|
| - We can write this as-is in many languages, and the result is a
| data structure (usually a list) which we can easily walk over
| with an interpreter function.
|
| - Variables are numbers, which can also be written as-is in many
| languages, and result in data that an interpreter function can
| handle.
|
| - The final piece of syntax is _applications_ , which necessarily
| has two parts (the function and the input value). Classic Lambda
| Calculus uses juxtaposition, whilst Plumb uses an explicit binary
| operator: the comma `,`. Again, commas can be written as-is in
| many languages, and they interact nicely with square bracket
| notation, (host language) function application, and (in the case
| of Python and Haskell) tuple notation.
|
| The nice thing about using de Bruijn indexen, rather than de
| Bruijn "levels" or some other approaches to anonymous bindings
| (like those mentioned at the end of TFA), is that functions
| defined this way have an "intrinsic meaning"; e.g. `l0` (or `[0]`
| in Plumb) is _always_ the identity function; `lll(2 0 1)` (or
| `[[[2 , 0 , 1]]]` in Plumb) is _always_ boolean NOT; etc. This
| lets us mix and match snippets of Plumb code with snippets of
| host-language code, embed each inside the other, and sprinkle
| around calls to an interpreter function willy-nilly, without
| worrying that the meaning depends on _where_ our code ends up
| (e.g. at which "level" it reaches an interpreter).
|
| [0] http://www.chriswarbo.net/projects/plumb/using.html
| anthk wrote:
| Related: http://t3x.org/clc/index.html
|
| Code: http://t3x.org/clc/code.html
|
| If you know Scheme, you almost don't need the book.
___________________________________________________________________
(page generated 2025-05-30 23:00 UTC)