[HN Gopher] De Bruijn notation, and why it's useful
       ___________________________________________________________________
        
       De Bruijn notation, and why it's useful
        
       Author : blueberry87
       Score  : 106 points
       Date   : 2025-05-30 15:51 UTC (7 hours ago)
        
 (HTM) web link (blueberrywren.dev)
 (TXT) w3m dump (blueberrywren.dev)
        
       | kccqzy wrote:
       | While I have used de bruijn indices successfully in the past, a
       | part of me wonders, why can't this problem be solved at the
       | parser level before we even get a syntax tree? The parser knows
       | when we are introducing a new variable in a lambda abstraction,
       | and it can keep track of a stack of variables to be able to give
       | them new fresh names so that in later passes we won't have to
       | worry about name conflicts. I mean when writing a parser I
       | basically already need to keep a stack of environments to resolve
       | names. I sometimes keep variable names as simply a source code
       | range where it is first introduced: so instead of dealing with
       | the variable "f" I deal with (1,4)-(1,5) meaning the variable
       | introduced in line 1 between columns 4 and 5; here I can still
       | recover the string-based name but I also get to differentiate
       | between identically named variables defined in different
       | locations.
       | 
       | Is this a viable idea?
        
         | thunderseethe wrote:
         | Take a look at The Rapier in GHC:
         | https://www.researchgate.net/publication/220676611_Secrets_o...
         | 
         | It employs a similar idea. Track the set of in scope variables
         | and use them to unique variables on collision.
        
         | cvoss wrote:
         | What happens in the evaluator when you have
         | 
         | (\ a . a a) (\ x . \ y . x y)
         | 
         | Variables are uniquely named at parse time, but after one beta
         | step, you have two distinct instances each of x and y, albeit
         | in different scopes, and after a second beta step, you have two
         | distinct variables named y in the same scope.
        
           | kccqzy wrote:
           | Interesting example! I think at that time I was either
           | working with simply typed lambda calculus or something
           | slightly better (maybe Hindley-Milner based system) where the
           | omega combinator isn't allowed. Do you know what sort of type
           | system restriction would make the above idea sound?
        
         | maplant wrote:
         | Yeah, it's viable. I do this for my scheme implementation.
         | Defining a new lexical variable generates a new globally unique
         | identifier: https://github.com/maplant/scheme-
         | rs/blob/fb89f6481443130399...
        
       | gfaster wrote:
       | I used De Bruijn indices when attempting to prove correctness of
       | a toy language of mine. It's an elegant solution, but I found
       | them exceedingly difficult to reason about both intuitively and
       | in proofs. That being said, I've yet to find a better system
       | (that isn't just augmenting De Bruijn indices) for implementing
       | any sort of lambda calculus.
        
       | Joker_vD wrote:
       | > Now we're cool! Everything works as expected, and it takes much
       | less work
       | 
       | Does it though? It's about the same amount of work as in renaming
       | potentially conflicting variables (and more work than renaming
       | _actually_ conflicting variables). AFAIK, the main argument for
       | them (by B. C. Pierce in his TAPL?) is that if you botch up the
       | implementation of de Bruijn indices /levels, it will blow up in
       | your face almost immediately, while the conventional renaming
       | schemes may have subtle undetected bugs for months.
        
         | gfaster wrote:
         | From my perspective, not having to deal with strings is one of
         | the big benefits, but I believe the main argument is actually
         | that it trivializes alpha equivalence.
         | 
         | As for renaming, if you're going to be looking at every binding
         | anyway, changing most of them is no big deal (unless of course
         | that entails doing many deep copies)
        
         | hoping1 wrote:
         | This and the other comment under this seem to be talking about
         | the work the computer is doing at runtime. I believe the point
         | is about the developer's work in implementing this. (For
         | example, renaming potentially conflicting variables is clearly
         | _less_ work than renaming only the actually conflicting
         | variables, in this framing.) This kind of work is important for
         | (1) compilers whose bugs are potentially very financially
         | expensive, and (2) compilers with very intentionally-portable
         | implementations (including but not limited to educational
         | implementations)
        
       | tromp wrote:
       | Another advantage of de-Bruijn notation is in providing a very
       | simple binary encoding of lambda terms, as demonstrated in my
       | IOCCC submission [1].
       | 
       | [1] https://www.ioccc.org/2012/tromp/
        
         | Disposal8433 wrote:
         | To you and OP and the others: you're crazy or geniuses, but
         | thanks for posting because that's inspiring whether I
         | understand it or not.
        
         | jmount wrote:
         | Just ran into your writeup a day ago. Truly amazing!
        
         | biot wrote:
         | This has long been my favorite entry! Still on my todo list to
         | go through and fully understand it.
        
           | tromp wrote:
           | You might find this repo [1] by Melvin Zhang helpful.
           | 
           | [1] https://github.com/melvinzhang/binary-lambda-calculus
        
       | guerrilla wrote:
       | I've implemted this several times before (years ago), but somone
       | tell me, why don't we just consider variables unique. Each
       | instance of a name* should have an absolute (i.e. global)
       | sequence number associated with it. That seems simpler than this
       | relative sequence numbering and the complexity that comes from
       | it.
       | 
       | Scrolling comments, I see somone else nentions this as well. I
       | feel like this is something that got stuck in academia hell but
       | in practice could be solved in more oractical ways. I remember
       | various papers on fresh naming and name calculi from the 00's but
       | why? We have a parser and we have numbers.
       | 
       | The reason I bring this up is because I recall it being easy to
       | introduce bugs into de Bruijn calculi and also adding symbolic
       | debugging to real-world languages was annoying for reasons I have
       | no memory of.
       | 
       | * Note I said name not variable. (\ x . x (\x . x)) Would be (\
       | x#0 . x#0 (\x#1 . x#1)) similar to de Bruijn indices but could
       | differ in other situations.
        
         | hoping1 wrote:
         | See cvoss's comment in another thread:
         | 
         | " What happens in the evaluator when you have
         | 
         | (\ a . a a) (\ x . \ y . x y)
         | 
         | Variables are uniquely named at parse time, but after one beta
         | step, you have two distinct instances each of x and y, albeit
         | in different scopes, and after a second beta step, you have two
         | distinct variables named y in the same scope. "
         | 
         | This doesn't come up in most mainstream languages, but Zig and
         | Jai are getting people excited about comptime evaluation, and
         | dependent types need comptime evaluation as well. Uniqueness is
         | fine until you're doing higher-order evaluation in your
         | compilation pipeline like this, which will be more and more
         | desirable in the future, at least if the current trends are any
         | indication.
        
           | guerrilla wrote:
           | (\ a#0 . a#0 a#0) (\ x#1 . \ y#2 . x#1 y#2)
           | 
           | -->
           | 
           | (\ x#3 . \ y#4 . x#3 y#4)(\ x#5 . \ y#6 . x#5 y#6)
        
             | SkiFire13 wrote:
             | This implies that applying a function is no longer the same
             | as a plain substitution. Moreover this can be pretty
             | expensive computationally.
        
       | marvinborner wrote:
       | Also, once you get used to it, it makes writing large lambda
       | terms faster, easier and more intuitive. The terms all just kind
       | of magically interconnect in your mind after a while. At least
       | that has been my experience after writing a lot of code in pure
       | de Bruijn-indexed lambda calculus using bruijn [1]. Otherwise,
       | thinking a few reduction steps ahead in complex terms requires
       | alpha conversion, which often ends in confusion.
       | 
       | Similarly, it seems like languages with de Bruijn indices are
       | immune to LLMs, since they require an internal stack/graph of
       | sorts and don't reduce only on a textual basis.
       | 
       | [1]: https://bruijn.marvinborner.de/
        
       | matheist wrote:
       | The Faust compiler uses de Bruijn indices internally to reuse
       | computations. Anyone else know any other examples?
       | 
       | https://github.com/grame-cncm/faust
        
       | trealira wrote:
       | > This is "the capture problem", and it is quite annoying.
       | Generally to avoid this, we need to rename everything1 in our
       | "substituted" term to names that are free (do not occur) in the
       | "subsitutee" so nothing is accidentally captured. What if we
       | could introduce a notation that avoids this?
       | 
       | I've heard of this problem, but I've never really gotten why the
       | solution in lambda calculus is to rename variables in lower
       | scopes with overlapping names so that everything has a globally
       | unique name. The way that programming languages usually solve
       | this is by keeping a stack of symbol tables for scopes, right?
       | Why is that never presented as a possible solution?
       | 
       | Of course, De Bruijn notation seems to sidestep that completely,
       | which is convenient. I guess I'm just commenting on lambda
       | calculus implementations in general.
        
         | enricozb wrote:
         | Lambda variables are also scoped, in the sense that a bound
         | variable shadows one with the same name if it is in scope at
         | bind-time. When looking at lambda calculus as purely a term
         | rewriting system, the rule for substitution is just a bit
         | nuanced, that's all. With DeBruijn notation, substitution is a
         | much simpler operation.
        
         | torstenvl wrote:
         | The idea is that you want functions that can return functions,
         | those inner functions being defined in part by the arguments to
         | the outer function. This can't happen without intentionally
         | leaky namespaces.
         | 
         | Bad pseudocode:
         | 
         | function stepincrementgenerator(step) -> function (num) -> num
         | + step
         | 
         | addsix = stepincrementgenerator(6)
         | 
         | addsix(5)
         | 
         | 11
        
         | chriswarbo wrote:
         | > I've heard of this problem, but I've never really gotten why
         | the solution in lambda calculus is to rename variables in lower
         | scopes with overlapping names so that everything has a globally
         | unique name. The way that programming languages usually solve
         | this is by keeping a stack of symbol tables for scopes, right?
         | Why is that never presented as a possible solution?
         | 
         | That's a perfectly fine way to implement this-or-that language
         | (and is hence why many/most programming languages do it).
         | However, it's not a valid solution _in lambda calculus_ , since
         | lambda calculus only has variables, lambdas or applications;
         | there's literally no way to write down "a stack of symbol
         | tables", or whathaveyou (note that you can _implement_ such a
         | thing _using_ lambda calculus, but that would be more like an
         | interpreter for some other language;  "running" that
         | implementation would still require a way to beta-reduce the
         | "host" language in a capture-avoiding way)
        
         | hollerith wrote:
         | Stack of symbol tables is more elegant, but is only really
         | understandable by programmers whereas maybe lambda calc gets
         | taught to a lot of non-programmers?
         | 
         | Note that lambda calc was invented ("defined") before the
         | invention of the digital computer.
        
       | chriswarbo wrote:
       | I used de Bruijn indexes in Plumb[0], which is an alternative
       | syntax for Lambda Calculus that embeds nicely in common scripting
       | languages like PHP and Python.
       | 
       | Whilst a classic lambda expression like `lx.y` has three parts:
       | the argument name `x`, the body/result `y`, and the `l_._` syntax
       | to distinguish it from a variable or application; if we instead
       | use de Bruijn indices, then we don't need a separate part for the
       | argument: all we need is a body wrapped in some distinguishing
       | syntax (like `ly` in the article).
       | 
       | In Plumb, that distinguishing syntax is square brackets, so the
       | "lambda term" `ly` would be written `[y]`. This wouldn't be
       | possible if we had to write argument names, but ends up having
       | some nice consequences:
       | 
       | - We can write this as-is in many languages, and the result is a
       | data structure (usually a list) which we can easily walk over
       | with an interpreter function.
       | 
       | - Variables are numbers, which can also be written as-is in many
       | languages, and result in data that an interpreter function can
       | handle.
       | 
       | - The final piece of syntax is _applications_ , which necessarily
       | has two parts (the function and the input value). Classic Lambda
       | Calculus uses juxtaposition, whilst Plumb uses an explicit binary
       | operator: the comma `,`. Again, commas can be written as-is in
       | many languages, and they interact nicely with square bracket
       | notation, (host language) function application, and (in the case
       | of Python and Haskell) tuple notation.
       | 
       | The nice thing about using de Bruijn indexen, rather than de
       | Bruijn "levels" or some other approaches to anonymous bindings
       | (like those mentioned at the end of TFA), is that functions
       | defined this way have an "intrinsic meaning"; e.g. `l0` (or `[0]`
       | in Plumb) is _always_ the identity function; `lll(2 0 1)` (or
       | `[[[2 , 0 , 1]]]` in Plumb) is _always_ boolean NOT; etc. This
       | lets us mix and match snippets of Plumb code with snippets of
       | host-language code, embed each inside the other, and sprinkle
       | around calls to an interpreter function willy-nilly, without
       | worrying that the meaning depends on _where_ our code ends up
       | (e.g. at which  "level" it reaches an interpreter).
       | 
       | [0] http://www.chriswarbo.net/projects/plumb/using.html
        
       | anthk wrote:
       | Related: http://t3x.org/clc/index.html
       | 
       | Code: http://t3x.org/clc/code.html
       | 
       | If you know Scheme, you almost don't need the book.
        
       ___________________________________________________________________
       (page generated 2025-05-30 23:00 UTC)