[HN Gopher] I wrote a 231-byte Brainfuck compiler by abusing eve...
       ___________________________________________________________________
        
       I wrote a 231-byte Brainfuck compiler by abusing everything
        
       Author : ingve
       Score  : 143 points
       Date   : 2021-07-12 06:10 UTC (16 hours ago)
        
 (HTM) web link (briancallahan.net)
 (TXT) w3m dump (briancallahan.net)
        
       | xg15 wrote:
       | a 231-byte Brainfuck-to-C compiler*
       | 
       | It's still an impressive feat, and very informative on the
       | assembly details - but doesn't feel as incredible as the headline
       | makes believe as the core logic seems to be a string search-and-
       | replace of the strings in the "reviewing brainfuck" table.
       | 
       | Seems to me, you could write the next brainfuck compiler in sed.
        
         | zelon88 wrote:
         | It seems more like a parser than a compiler. It doesn't write a
         | binary, it just translates brainfuck to C source code. In the
         | OPs defense, they did say "by abusing everything."
        
           | notme77 wrote:
           | Well then,
           | 
           | I can write a tiny C compiler in bash...
        
           | barosl wrote:
           | A compiler doesn't have to emit machine code. It just needs
           | to translate one language into another, which the author's
           | program does.
           | 
           | Also, a parser implies having an internal, structured
           | representation. As the C language is neither internal nor
           | structured, I'd say the program is more likely a compiler.
        
             | recursive wrote:
             | I blame the rise of the word "transpiler" for making people
             | think that compilation is some more magic/narrow/special
             | thing.
        
               | whatshisface wrote:
               | There is a practical, empirical distinction between
               | compilers and transpilers. Most compilers in practice
               | have backends full of deep and well-explored techniques
               | specific to machine code like register allocation, window
               | optimizations, and other stuff like that. Most
               | transpilers you see out in the wild share more in common
               | with compiler frontends in the techniques and the theory
               | they apply.
        
               | mywittyname wrote:
               | The term transpiler arose when there was a need to denote
               | a special kind of compiler whose target is another
               | language or version of the same language.
        
         | DonHopkins wrote:
         | Shouldn't the point of doing something as practically useless
         | as compiling brainfuck be to do it in the most difficult way
         | possible?
         | 
         | I'm more impressed by a brainfuck compiler written in
         | brainfuck.
         | 
         | https://news.ycombinator.com/item?id=697501
         | 
         | https://github.com/matslina/awib
         | 
         | Awib is a brainfuck compiler entirely written in brainfuck.
         | 
         | Awib implements several optimization strategies and its
         | compiled output outperforms that of many other brainfuck
         | compilers
         | 
         | Awib is itself a 4-language polyglot and can be run/compiled as
         | brainfuck, Tcl, C and bash
         | 
         | Awib has 6 separate backends and is capable of compiling
         | brainfuck source code to Linux executables (i386) and five
         | programming languages: C, Tcl, Go, Ruby and Java
        
           | user-the-name wrote:
           | The original point of Brainfuck was actually exactly to make
           | the smallest possible compiler for a Turing-complete
           | language.
           | 
           | The original was 240 bytes, and created an executable
           | directly.
        
       | panic wrote:
       | Here's an even smaller one for 64-bit Linux:
       | https://gist.github.com/ianh/61e6219f0a9866b31a2b864033c1812...
       | $ uname -a         Linux personal-1 4.19.0-6-cloud-amd64 #1 SMP
       | Debian 4.19.67-2 (2019-08-28) x86_64 GNU/Linux         $ as -g -o
       | bf.o bf.S && ld -o bf bf.o         $ size bf.o            text
       | data     bss     dec     hex filename             167       0
       | 0     167      a7 bf.o         $ size bf            text    data
       | bss     dec     hex filename             199       0       0
       | 199      c7 bf
        
       | ndesaulniers wrote:
       | Would a jump table create a smaller compiler? Trade off an
       | indirect call (and more static data) for all of those
       | comparisons.
        
       | mjevans wrote:
       | Isn't a compiler supposed to produce one or more of: machine
       | code, link-able segments of machine code, or at least 'bytecode'
       | for a 'virtual' machine?
       | 
       | The definition of this problem strikes me far more as either a
       | transpiler or a pre-processor for a Domain Specific Language.
        
         | Al-Khwarizmi wrote:
         | In classic compiler theory, the term "transpiler" doesn't even
         | exist, they're all compilers.
        
           | 411111111111111 wrote:
           | They literally didn't exist back then though. so not just the
           | term was unknown back then, the practice itself too...
        
             | KMnO4 wrote:
             | Apparently Intel shipped a transpiler for 8080 to 8086
             | source code in 1978.
             | 
             | https://en.wikipedia.org/wiki/Source-to-
             | source_compiler#Inte...
        
               | cout wrote:
               | Likewise, the first C++ compiler (cfront) might today be
               | considered a transpiler.
        
         | Banana699 wrote:
         | There is no fundamental difference between compilers, code
         | formatters, preprocessors or transpilers. Anything that takes a
         | program and outputs another program related to it by some
         | semantics is a compiler.
         | 
         | That said though, I do agree that calling an assembler like
         | this (brainfuck is a kind of assembly) a compiler stretches
         | things a bit, for an entirely different reason: usually there
         | is some sort of a complexity threshold of the input language.
         | The compiler has to at least maintain a symbol table for named
         | entities (traditional assemblies has variable-like entities,
         | macros and subroutines, the assembler has to keep track of all
         | that). Brainfuck is completely linear, with no named entities
         | at all. The "compiler" looks like a dumb string processor, just
         | iterates over one buffer to transform it into another buffer,
         | and doesn't maintain any sort of structures on the code being
         | translated.
         | 
         | By the strict "compiler : program->program" definition, this
         | doesn't matter. But my intuition holds that dumb string
         | processing is a bit short of "true" compilation.
        
           | xg15 wrote:
           | > _There is no fundamental difference between compilers, code
           | formatters, preprocessors or transpilers. Anything that takes
           | a program and outputs another program related to it by some
           | semantics is a compiler._
           | 
           | That is arguing semantics and, while not wrong, I think it
           | muddies the waters: By that logic, sed and awk are compilers.
           | 
           | In the end, only machine code can be executed, so you'll need
           | something at the end of your chain that produces machine code
           | or at least executes your DSL in an interpreter loop.
           | 
           | So I think a distinction between "compilers" that generate
           | machine code and "compilers" that don't is worthwhile.
        
             | dgb23 wrote:
             | Where do you draw the line? Is javac not a compiler?
        
         | bidirectional wrote:
         | Most researchers I've spoken to loathe the term transpiler.
        
         | TazeTSchnitzel wrote:
         | Plenty of compilers target relatively high-level languages.
         | They're no less compilers for it; "transpiler" feels fairly
         | arbitrary to me.
         | 
         | With a bit of effort and the right add-on
         | (https://github.com/JuliaComputing/llvm-cbe) you can make any
         | LLVM-based compiler produce C code!
        
         | thewakalix wrote:
         | "Compiling to C" is an established strategy; for instance, GHC
         | used to compile to C--, a subset of C.
         | 
         | This doesn't seem wrong to me; after all, assembly is already
         | an abstraction layer over machine code, and outputting assembly
         | would hardly be "uncompilerish" behavior. I suppose it depends
         | on whether you view C as a low-level language.
        
           | DonHopkins wrote:
           | C combines the power of assembly language with the
           | readability and maintainability of assembly language.
        
           | anonymfus wrote:
           | _> GHC used to compile to C--, a subset of C._
           | 
           | C-- is not a subset of C.
        
             | wccrawford wrote:
             | https://en.wikipedia.org/wiki/C--
             | 
             | > The name of the language is an in-joke, indicating that
             | C-- is a reduced form of C, in the same way that C++ is
             | basically an expanded form of C. ("--" and "++" mean
             | "decrement" and "increment".)
        
               | BenjiWiebe wrote:
               | Q&A #5 of https://www.cs.tufts.edu/~nr/c--/faq.html
               | explains why C-- is not a subset of C.
        
             | DonHopkins wrote:
             | "C++++-= is the new language that is a little more than C++
             | and a lot less." -Bill Joy
             | 
             | Bill Joy's Law: 2^(Year-1984) Million Instructions per
             | Second
             | 
             | https://donhopkins.medium.com/bill-joys-
             | law-2-year-1984-mill...
        
             | thewakalix wrote:
             | You're right, and as it happens, GHC compiles _first_ to
             | Cmm and then (as one half-deprecated option) to C.
        
         | nextaccountic wrote:
         | A transpiler is a compiler
        
         | pajko wrote:
         | A compiler compiles something to something. For example, gcc
         | compiles C code to assembly, which is then compiled to machine
         | code by gas. Clang does the same:
         | https://freecompilercamp.org/clang-basics/
        
           | user-the-name wrote:
           | Clang definitely does not output assembly code and invoke an
           | assembler. It generates machine code directly.
           | 
           | It can optionally also generate assembly code if you want to
           | look at it, but it does not do this during normal
           | compilation.
        
             | pajko wrote:
             | Not saving everything to files and not doing things via
             | multiple executables does not mean that a compiler does
             | things in a single step, generating machine code directly
             | from the source code. Both gcc and clang support the -save-
             | temps switch to keep intermediate files during the
             | compilation process. These files are created even without
             | this argument, except that their names are randomly
             | generated somewhere and the files are deleted afterwards.
        
               | user-the-name wrote:
               | -save-temps is for backwards compatibility. The files are
               | not generated if you do not give that flag. clang goes to
               | the extra effort of generating them for you if you
               | request them using that flag, but they are not part of
               | the compilation process.
               | 
               | Machine code is generated "directly from the source code"
               | in the sense that there are no intermediate languages
               | produced. There are multiple stages of compilation, of
               | course, but these all involved in-memory data structures,
               | not textual languages.
        
               | pajko wrote:
               | Was doing a quick check. Looks like the temporary files
               | are not always generated and could not even see any pipes
               | in Process Monitor.
               | 
               | EDIT: clang can tell what's being done:
               | 
               | $ clang -ccc-print-phases -x c t.c 0: input, "t.c", c 1:
               | preprocessor, {0}, cpp-output 2: compiler, {1}, ir 3:
               | backend, {2}, assembler 4: assembler, {3}, object 5:
               | linker, {4}, image
               | 
               | Reference:
               | https://clang.llvm.org/docs/DriverInternals.html
        
               | NavinF wrote:
               | > These files are created even without this argument
               | 
               | That used to be the case in the dark ages of autoconf.
               | 
               | clang does everything in a single address space which is
               | a lot faster now that you aren't doing a million
               | filesystem calls.
               | 
               | btw I remember seeing a talk a few years ago about
               | integrating clang with the build system so that the same
               | compiler process can be reused to compile multiple files.
               | Startup time is significant when you have a lot of source
               | files
               | 
               | Past: >1 process and many fs calls per cpp. Present: 1
               | process and a few fs calls per cpp. Future: <1 process
               | and a few fs calls per cpp (on average)
        
             | misnome wrote:
             | Clang doesn't generate LLVM IR as part of it's normal
             | process?
        
               | user-the-name wrote:
               | Not in a text file, no. It generates LLVM data structures
               | that can be serialised to a text format, but aren't.
               | 
               | The only external binary clang invokes is, optionally,
               | the linker. Otherwise, everything is done in one
               | executable invocation without temporary files.
        
               | aetherspawn wrote:
               | Yes and no. It uses the llvm API rather than pipe to llc
        
         | TheMatten wrote:
         | Adding to thewakalix's answer, many compilers use LLVM as a
         | backend, where LLVM's IR is higher-level-than-bytecode language
         | too. So that would mean that e.g Clang or current GHC wouldn't
         | qualify either.
        
           | user-the-name wrote:
           | LLVM is a static library that you link. Clang does not
           | generate LLVM IR text files, it invokes LLVM's code generator
           | directly.
        
       | user-the-name wrote:
       | The original Brainfuck compiler was 240 bytes, and created
       | executables directly rather than relying on an external
       | toolchain.
        
       | danparsonson wrote:
       | If you'd like to see some tiny BF interpreters, have a look here:
       | https://www.hugi.scene.org/compo/compoold.htm#compo6
       | 
       | 98 bytes, the smallest!
        
         | aetherspawn wrote:
         | The smallest compiler would copy the BF program to the end of
         | this interpreter.
        
           | Brian_K_White wrote:
           | OMG THIS
           | 
           | This is my favorite answer to why this should or should not
           | qualify as a compiler.
           | 
           | If translating to c is a compiler then anything is a compiler
           | and the term has no meaning at all.
        
             | klyrs wrote:
             | The word we need here is "trivial". A compiler is a
             | function (or possibly, nondeterministic process) from
             | source code in a first language, to source in a second
             | language. The languages can be the same, for example, in
             | the case of metaprogramming. The identity function is a
             | _trivial compiler_. Calling it a non-compiler only serves
             | to complicate the definition of compiler, with no
             | discernable benefit.
        
               | tylerhou wrote:
               | How do you define "trivial?"
               | 
               | I've created a new language called Brainfuck2 that is
               | exactly the same as Brainfuck except the + symbol is
               | replaced with a t. Clearly my compiler from Brainfuck2 =>
               | Brainfuck does not implement the identity function, and
               | is less than 50 bytes.
        
         | [deleted]
        
       | Zardoz84 wrote:
       | I find the DLang version far more easy to understand. Also, it
       | could do compiler optimizations :
       | https://theartofmachinery.com/2017/12/31/compile_time_brainf...
        
       | xiphias2 wrote:
       | Wouldn't outputting amd64 instructions and jumping to it create a
       | shorter interpreter?
        
         | ManDeJan wrote:
         | amd64 instructions generally have longer encodings due to
         | having to encode prefixes and 64-bit addresses.
        
           | xiphias2 wrote:
           | sure, it can be x86 as well, right now the program outputs C
           | code, I just thought that compiling to byte code would be
           | more fun
        
       ___________________________________________________________________
       (page generated 2021-07-12 23:02 UTC)