[HN Gopher] I wrote a 231-byte Brainfuck compiler by abusing eve...
___________________________________________________________________
I wrote a 231-byte Brainfuck compiler by abusing everything
Author : ingve
Score : 143 points
Date : 2021-07-12 06:10 UTC (16 hours ago)
(HTM) web link (briancallahan.net)
(TXT) w3m dump (briancallahan.net)
| xg15 wrote:
| a 231-byte Brainfuck-to-C compiler*
|
| It's still an impressive feat, and very informative on the
| assembly details - but doesn't feel as incredible as the headline
| makes believe as the core logic seems to be a string search-and-
| replace of the strings in the "reviewing brainfuck" table.
|
| Seems to me, you could write the next brainfuck compiler in sed.
| zelon88 wrote:
| It seems more like a parser than a compiler. It doesn't write a
| binary, it just translates brainfuck to C source code. In the
| OPs defense, they did say "by abusing everything."
| notme77 wrote:
| Well then,
|
| I can write a tiny C compiler in bash...
| barosl wrote:
| A compiler doesn't have to emit machine code. It just needs
| to translate one language into another, which the author's
| program does.
|
| Also, a parser implies having an internal, structured
| representation. As the C language is neither internal nor
| structured, I'd say the program is more likely a compiler.
| recursive wrote:
| I blame the rise of the word "transpiler" for making people
| think that compilation is some more magic/narrow/special
| thing.
| whatshisface wrote:
| There is a practical, empirical distinction between
| compilers and transpilers. Most compilers in practice
| have backends full of deep and well-explored techniques
| specific to machine code like register allocation, window
| optimizations, and other stuff like that. Most
| transpilers you see out in the wild share more in common
| with compiler frontends in the techniques and the theory
| they apply.
| mywittyname wrote:
| The term transpiler arose when there was a need to denote
| a special kind of compiler whose target is another
| language or version of the same language.
| DonHopkins wrote:
| Shouldn't the point of doing something as practically useless
| as compiling brainfuck be to do it in the most difficult way
| possible?
|
| I'm more impressed by a brainfuck compiler written in
| brainfuck.
|
| https://news.ycombinator.com/item?id=697501
|
| https://github.com/matslina/awib
|
| Awib is a brainfuck compiler entirely written in brainfuck.
|
| Awib implements several optimization strategies and its
| compiled output outperforms that of many other brainfuck
| compilers
|
| Awib is itself a 4-language polyglot and can be run/compiled as
| brainfuck, Tcl, C and bash
|
| Awib has 6 separate backends and is capable of compiling
| brainfuck source code to Linux executables (i386) and five
| programming languages: C, Tcl, Go, Ruby and Java
| user-the-name wrote:
| The original point of Brainfuck was actually exactly to make
| the smallest possible compiler for a Turing-complete
| language.
|
| The original was 240 bytes, and created an executable
| directly.
| panic wrote:
| Here's an even smaller one for 64-bit Linux:
| https://gist.github.com/ianh/61e6219f0a9866b31a2b864033c1812...
| $ uname -a Linux personal-1 4.19.0-6-cloud-amd64 #1 SMP
| Debian 4.19.67-2 (2019-08-28) x86_64 GNU/Linux $ as -g -o
| bf.o bf.S && ld -o bf bf.o $ size bf.o text
| data bss dec hex filename 167 0
| 0 167 a7 bf.o $ size bf text data
| bss dec hex filename 199 0 0
| 199 c7 bf
| ndesaulniers wrote:
| Would a jump table create a smaller compiler? Trade off an
| indirect call (and more static data) for all of those
| comparisons.
| mjevans wrote:
| Isn't a compiler supposed to produce one or more of: machine
| code, link-able segments of machine code, or at least 'bytecode'
| for a 'virtual' machine?
|
| The definition of this problem strikes me far more as either a
| transpiler or a pre-processor for a Domain Specific Language.
| Al-Khwarizmi wrote:
| In classic compiler theory, the term "transpiler" doesn't even
| exist, they're all compilers.
| 411111111111111 wrote:
| They literally didn't exist back then though. so not just the
| term was unknown back then, the practice itself too...
| KMnO4 wrote:
| Apparently Intel shipped a transpiler for 8080 to 8086
| source code in 1978.
|
| https://en.wikipedia.org/wiki/Source-to-
| source_compiler#Inte...
| cout wrote:
| Likewise, the first C++ compiler (cfront) might today be
| considered a transpiler.
| Banana699 wrote:
| There is no fundamental difference between compilers, code
| formatters, preprocessors or transpilers. Anything that takes a
| program and outputs another program related to it by some
| semantics is a compiler.
|
| That said though, I do agree that calling an assembler like
| this (brainfuck is a kind of assembly) a compiler stretches
| things a bit, for an entirely different reason: usually there
| is some sort of a complexity threshold of the input language.
| The compiler has to at least maintain a symbol table for named
| entities (traditional assemblies has variable-like entities,
| macros and subroutines, the assembler has to keep track of all
| that). Brainfuck is completely linear, with no named entities
| at all. The "compiler" looks like a dumb string processor, just
| iterates over one buffer to transform it into another buffer,
| and doesn't maintain any sort of structures on the code being
| translated.
|
| By the strict "compiler : program->program" definition, this
| doesn't matter. But my intuition holds that dumb string
| processing is a bit short of "true" compilation.
| xg15 wrote:
| > _There is no fundamental difference between compilers, code
| formatters, preprocessors or transpilers. Anything that takes
| a program and outputs another program related to it by some
| semantics is a compiler._
|
| That is arguing semantics and, while not wrong, I think it
| muddies the waters: By that logic, sed and awk are compilers.
|
| In the end, only machine code can be executed, so you'll need
| something at the end of your chain that produces machine code
| or at least executes your DSL in an interpreter loop.
|
| So I think a distinction between "compilers" that generate
| machine code and "compilers" that don't is worthwhile.
| dgb23 wrote:
| Where do you draw the line? Is javac not a compiler?
| bidirectional wrote:
| Most researchers I've spoken to loathe the term transpiler.
| TazeTSchnitzel wrote:
| Plenty of compilers target relatively high-level languages.
| They're no less compilers for it; "transpiler" feels fairly
| arbitrary to me.
|
| With a bit of effort and the right add-on
| (https://github.com/JuliaComputing/llvm-cbe) you can make any
| LLVM-based compiler produce C code!
| thewakalix wrote:
| "Compiling to C" is an established strategy; for instance, GHC
| used to compile to C--, a subset of C.
|
| This doesn't seem wrong to me; after all, assembly is already
| an abstraction layer over machine code, and outputting assembly
| would hardly be "uncompilerish" behavior. I suppose it depends
| on whether you view C as a low-level language.
| DonHopkins wrote:
| C combines the power of assembly language with the
| readability and maintainability of assembly language.
| anonymfus wrote:
| _> GHC used to compile to C--, a subset of C._
|
| C-- is not a subset of C.
| wccrawford wrote:
| https://en.wikipedia.org/wiki/C--
|
| > The name of the language is an in-joke, indicating that
| C-- is a reduced form of C, in the same way that C++ is
| basically an expanded form of C. ("--" and "++" mean
| "decrement" and "increment".)
| BenjiWiebe wrote:
| Q&A #5 of https://www.cs.tufts.edu/~nr/c--/faq.html
| explains why C-- is not a subset of C.
| DonHopkins wrote:
| "C++++-= is the new language that is a little more than C++
| and a lot less." -Bill Joy
|
| Bill Joy's Law: 2^(Year-1984) Million Instructions per
| Second
|
| https://donhopkins.medium.com/bill-joys-
| law-2-year-1984-mill...
| thewakalix wrote:
| You're right, and as it happens, GHC compiles _first_ to
| Cmm and then (as one half-deprecated option) to C.
| nextaccountic wrote:
| A transpiler is a compiler
| pajko wrote:
| A compiler compiles something to something. For example, gcc
| compiles C code to assembly, which is then compiled to machine
| code by gas. Clang does the same:
| https://freecompilercamp.org/clang-basics/
| user-the-name wrote:
| Clang definitely does not output assembly code and invoke an
| assembler. It generates machine code directly.
|
| It can optionally also generate assembly code if you want to
| look at it, but it does not do this during normal
| compilation.
| pajko wrote:
| Not saving everything to files and not doing things via
| multiple executables does not mean that a compiler does
| things in a single step, generating machine code directly
| from the source code. Both gcc and clang support the -save-
| temps switch to keep intermediate files during the
| compilation process. These files are created even without
| this argument, except that their names are randomly
| generated somewhere and the files are deleted afterwards.
| user-the-name wrote:
| -save-temps is for backwards compatibility. The files are
| not generated if you do not give that flag. clang goes to
| the extra effort of generating them for you if you
| request them using that flag, but they are not part of
| the compilation process.
|
| Machine code is generated "directly from the source code"
| in the sense that there are no intermediate languages
| produced. There are multiple stages of compilation, of
| course, but these all involved in-memory data structures,
| not textual languages.
| pajko wrote:
| Was doing a quick check. Looks like the temporary files
| are not always generated and could not even see any pipes
| in Process Monitor.
|
| EDIT: clang can tell what's being done:
|
| $ clang -ccc-print-phases -x c t.c 0: input, "t.c", c 1:
| preprocessor, {0}, cpp-output 2: compiler, {1}, ir 3:
| backend, {2}, assembler 4: assembler, {3}, object 5:
| linker, {4}, image
|
| Reference:
| https://clang.llvm.org/docs/DriverInternals.html
| NavinF wrote:
| > These files are created even without this argument
|
| That used to be the case in the dark ages of autoconf.
|
| clang does everything in a single address space which is
| a lot faster now that you aren't doing a million
| filesystem calls.
|
| btw I remember seeing a talk a few years ago about
| integrating clang with the build system so that the same
| compiler process can be reused to compile multiple files.
| Startup time is significant when you have a lot of source
| files
|
| Past: >1 process and many fs calls per cpp. Present: 1
| process and a few fs calls per cpp. Future: <1 process
| and a few fs calls per cpp (on average)
| misnome wrote:
| Clang doesn't generate LLVM IR as part of it's normal
| process?
| user-the-name wrote:
| Not in a text file, no. It generates LLVM data structures
| that can be serialised to a text format, but aren't.
|
| The only external binary clang invokes is, optionally,
| the linker. Otherwise, everything is done in one
| executable invocation without temporary files.
| aetherspawn wrote:
| Yes and no. It uses the llvm API rather than pipe to llc
| TheMatten wrote:
| Adding to thewakalix's answer, many compilers use LLVM as a
| backend, where LLVM's IR is higher-level-than-bytecode language
| too. So that would mean that e.g Clang or current GHC wouldn't
| qualify either.
| user-the-name wrote:
| LLVM is a static library that you link. Clang does not
| generate LLVM IR text files, it invokes LLVM's code generator
| directly.
| user-the-name wrote:
| The original Brainfuck compiler was 240 bytes, and created
| executables directly rather than relying on an external
| toolchain.
| danparsonson wrote:
| If you'd like to see some tiny BF interpreters, have a look here:
| https://www.hugi.scene.org/compo/compoold.htm#compo6
|
| 98 bytes, the smallest!
| aetherspawn wrote:
| The smallest compiler would copy the BF program to the end of
| this interpreter.
| Brian_K_White wrote:
| OMG THIS
|
| This is my favorite answer to why this should or should not
| qualify as a compiler.
|
| If translating to c is a compiler then anything is a compiler
| and the term has no meaning at all.
| klyrs wrote:
| The word we need here is "trivial". A compiler is a
| function (or possibly, nondeterministic process) from
| source code in a first language, to source in a second
| language. The languages can be the same, for example, in
| the case of metaprogramming. The identity function is a
| _trivial compiler_. Calling it a non-compiler only serves
| to complicate the definition of compiler, with no
| discernable benefit.
| tylerhou wrote:
| How do you define "trivial?"
|
| I've created a new language called Brainfuck2 that is
| exactly the same as Brainfuck except the + symbol is
| replaced with a t. Clearly my compiler from Brainfuck2 =>
| Brainfuck does not implement the identity function, and
| is less than 50 bytes.
| [deleted]
| Zardoz84 wrote:
| I find the DLang version far more easy to understand. Also, it
| could do compiler optimizations :
| https://theartofmachinery.com/2017/12/31/compile_time_brainf...
| xiphias2 wrote:
| Wouldn't outputting amd64 instructions and jumping to it create a
| shorter interpreter?
| ManDeJan wrote:
| amd64 instructions generally have longer encodings due to
| having to encode prefixes and 64-bit addresses.
| xiphias2 wrote:
| sure, it can be x86 as well, right now the program outputs C
| code, I just thought that compiling to byte code would be
| more fun
___________________________________________________________________
(page generated 2021-07-12 23:02 UTC)