[HN Gopher] Tiny-C Compiler (2001)
___________________________________________________________________
Tiny-C Compiler (2001)
Author : swatson741
Score : 199 points
Date : 2023-03-13 10:30 UTC (12 hours ago)
(HTM) web link (www.iro.umontreal.ca)
(TXT) w3m dump (www.iro.umontreal.ca)
| WoodenChair wrote:
| This is an interpreter for a super restricted subset of C and it
| looks well written from a pedagogical standpoint (keeps things
| pretty simple, fairly easy to read). But it's slightly awkward to
| strip down a language (what features do you keep, what do you
| lose?). I think it's more fun to build an interpreter for an
| actual tiny language. In my next book I have interpreters for
| Brainfuck [0], an obfuscated kind of joke of a language, and Tiny
| BASIC[1] a real tiny language that was used on early personal
| computers. These are pretty common first projects for folks
| interested in doing an interpreter.
|
| Here's why real languages are better than stripped down
| languages: Anyone with programming knowledge can implement a
| Brainfuck interpreter in a few hours and run any Brainfuck
| program. Anyone with a tiny bit of CS knowledge can implement a
| Tiny BASIC interpreter in just a day and then you can run any
| real Tiny BASIC program from the late 70s. It's cool to run real
| programs people actually used. With this stripped down C, there
| are no pre-made real programs...
|
| 0:https://en.wikipedia.org/wiki/Brainfuck
| 1:https://en.wikipedia.org/wiki/Tiny_BASIC
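|
| To give a sense of how small such an interpreter can be, here is
| a minimal sketch of a Brainfuck interpreter in C. The function
| name bf_run, the 30000-cell tape and the buffer-based I/O are
| assumptions made for this illustration, not anything taken from
| the book mentioned above.
|
```c
#include <assert.h>
#include <stddef.h>

#define BF_TAPE 30000 /* conventional tape size, assumed here */

/* Runs the NUL-terminated program prog. ',' reads from the NUL-terminated
   string input (yielding 0 once it is exhausted) and '.' appends to out,
   which holds at most cap bytes. Returns the number of bytes written. */
size_t bf_run(const char *prog, const char *input, char *out, size_t cap)
{
    unsigned char tape[BF_TAPE] = {0};
    size_t ptr = 0, n = 0;

    for (size_t pc = 0; prog[pc] != '\0'; pc++) {
        switch (prog[pc]) {
        case '>': assert(ptr + 1 < BF_TAPE); ptr++; break;
        case '<': assert(ptr > 0); ptr--; break;
        case '+': tape[ptr]++; break;
        case '-': tape[ptr]--; break;
        case '.': if (n < cap) out[n++] = (char)tape[ptr]; break;
        case ',': tape[ptr] = *input ? (unsigned char)*input++ : 0; break;
        case '[':
            if (tape[ptr] == 0) { /* skip forward to the matching ']' */
                int depth = 1;
                while (depth > 0) {
                    pc++;
                    assert(prog[pc] != '\0'); /* unbalanced brackets */
                    if (prog[pc] == '[') depth++;
                    else if (prog[pc] == ']') depth--;
                }
            }
            break;
        case ']':
            if (tape[ptr] != 0) { /* jump back to the matching '[' */
                int depth = 1;
                while (depth > 0) {
                    assert(pc > 0); /* unbalanced brackets */
                    pc--;
                    if (prog[pc] == ']') depth++;
                    else if (prog[pc] == '[') depth--;
                }
            }
            break;
        default: break; /* every other character is a comment */
        }
    }
    return n;
}
```
|
| Calling bf_run("++++++++[>++++++++<-]>+.", "", buf, sizeof buf)
| writes the single byte 'A' (8 * 8 + 1 = 65) into buf.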
| Gordonjcp wrote:
| FORTH is another language that's quick and easy to write from
| scratch, where you need a couple of dozen words written in
| assembler and then the rest of FORTH can be written in FORTH.
| doodlesdev wrote:
| Another language that's more modern and currently useful but
| which is very tiny to write an interpreter for is Lua [0][1].
| Currently the official Lua interpreter has around 30k LOC which
| I find pretty amusing for a language used so widely in games
| and for scripting purposes [2]. Of course it's still at least
| an order of magnitude larger than a small Tiny BASIC
| interpreter but the fact it's a current language used in so
| many places makes it even more interesting to make your for-fun
| implementation.
|
| Also related to small language implementations I find notable
| PicoC [3] which is a C interpreter written in around 3k LOC of
| C. Past discussion about it here 13 years ago [4].
|
| [0]: https://www.lua.org/about.html
|
| [1]: https://www.lua.org/spe.html
|
| [2]: https://en.wikipedia.org/wiki/Lua_(programming_language)
|
| [3]: https://gitlab.com/zsaleeba/picoc
|
| [4]: https://news.ycombinator.com/item?id=1658890
| benj111 wrote:
| While I appreciate your point.
|
| 1. You use Tiny BASIC as an example of a 'real language', and
| I don't see how Tiny BASIC is a 'real language' but Tiny C is
| a stripped-down language.
|
| 2. You can build on this to make a full c implementation. A
| minimal c implementation that can potentially bootstrap a full
| c environment is more useful than a brainfuck interpreter.
| northernskys30 wrote:
| I did my CS degree at umontreal and this was an assignment in a
| second year class. This was a pretty interesting introduction to
| compilers, and even if this is a toy subset of C, this was
| challenging, at least for me. We would get 0 if there were any
| memory leak, so we were pretty paranoid about it.
|
| The second assignment was writing a Scheme interpreter.
| ndiddy wrote:
| It's kind of surprising that they cared so much about memory use;
| a lot of one-shot C programs such as compilers don't bother
| freeing memory and let the OS clean up after them once they
| exit.
| ComputerGuru wrote:
| I was about to comment and say the same thing, but as a
| graded learning exercise there is certainly value in that
| approach.
| ttvecthrowaway wrote:
| Not to be confused with https://bellard.org/tcc/, which is a tiny
| compiler for the C language.
| Laaas wrote:
| I use tcc for all of my small C "scripts" for doing ioctls,
| etc. Less bloat, suckless. I imagine most software would be
| better off using tcc than gcc/clang. Performance isn't that
| important in most cases.
| notorandit wrote:
| I think you are confusing the work of Fabrice Bellard with
| this very one. The former is a C-language compiler. This one
| is a compiler for a language called "Tiny C". Understandable
| confusion, though.
| [deleted]
| doublepg23 wrote:
| > Performance isn't that important in most cases.
|
| Optimizing for storage space is...better?
| vidarh wrote:
| Since they say "scripts", note that tcc supports being
| invoked in the shebang line. E.g.
| #!/usr/bin/tcc -run
|
| You _can_ do that with gcc/clang too (e.g. #if 0, #endif
| to wrap a block of shell script to compile the current file
| and execute the result) but a primary value of tcc is that
| it _compiles fast_.
|
| On a more philosophical note, the suckless approach is to
| optimise for _simplicity_ not storage. It's perfectly
| valid to disagree with that of course, but if simplicity
| of the system as a whole is a consideration gcc and clang
| don't really fit.
| LukeShu wrote:
| You can only _sort of_ do that with gcc/clang. The #if 0
| trick relies on funny behavior that is in a few common
| shells. When you try to execve(2) a script without a
| proper #! shebang, the kernel will return ENOEXEC. Bash
| will check for ENOEXEC then check a few heuristics to see
| if it looks like a text file, and if it does, then it
| will try to run it as a shell script.
|
| This means that your script will work when run from a
| shell, but won't work when exec()ed from a non-shell
| program, which is a weird foot-gun.
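|
| The kernel-side half of this can be checked with a small C
| sketch. It writes an executable text file that starts with
| "#if 0" rather than "#!", calls execve(2) on it directly, and
| reports the resulting errno. The helper name and the /tmp path
| template are assumptions made for this illustration.
|
```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: returns the errno that execve(2) reports for an
   executable text file without a "#!" line, or -1 if the temporary file
   could not be set up. */
int exec_errno_without_shebang(void)
{
    char path[] = "/tmp/no-shebang-XXXXXX";
    static const char body[] = "#if 0\necho hello\n#endif\n";

    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    ssize_t written = write(fd, body, sizeof body - 1);
    int chmod_rc = fchmod(fd, 0700);
    close(fd); /* no writer may remain open, or exec fails with ETXTBSY */
    if (written != (ssize_t)(sizeof body - 1) || chmod_rc != 0) {
        unlink(path);
        return -1;
    }

    char *argv[] = { path, NULL };
    char *envp[] = { NULL };
    /* execve only returns on failure, so this process keeps running. */
    int rc = execve(path, argv, envp);
    int saved = errno;
    assert(rc == -1);
    unlink(path);
    return saved;
}
```
|
| On a typical Linux system the helper returns ENOEXEC. A shell
| that sees this error may fall back to interpreting the file
| itself, which is why the same file still runs from a prompt.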
| LanternLight83 wrote:
| Thanks for sharing! I've yet to go through my C phase, but
| see it on the horizon, and will remember this and the shebang
| trick.
| kevin_thibedeau wrote:
| This is a recommended practice for scripting with Nim if
| you want a batteries-included language.
| circuit10 wrote:
| I feel like a lot of software written in C is written in C
| for performance reasons. Obviously that's not always the case
| and TCC is useful, but I wouldn't say that most software
| should use it.
| squarefoot wrote:
| It is sad that tcc is unmaintained as it would be really useful
| in small embedded systems. I just tried it on Debian and
| compilation fails without #undefining CONFIG_TCC_MALLOC_HOOKS
| in lib/bcheck.c. After compilation it passes tests, but they
| warn that it could be unreliable.
| jart wrote:
| Try chibicc. It's x86_64 native and so much more readable as
| a codebase than TCC.
| dantrell wrote:
| While Fabrice Bellard is no longer working on TCC [0] and an
| official release tarball hasn't been packaged since version
| 0.9.27 (5 years ago) the project is by no means unmaintained.
|
| For details, check their current working repository [1] and
| mailing list [2].
|
| [0]: https://bellard.org/tcc/
|
| [1]: https://repo.or.cz/tinycc.git
|
| [2]: https://lists.nongnu.org/archive/html/tinycc-devel/
| siliconunit wrote:
| I'm quite confused, not the same project at all? To me tiny c
| compiler always meant the bellard page. Super useful stuff for
| micro hacky projects.
| hawski wrote:
| One could say that the one from this submission is Tiny-C
| Compiler and Bellard's is Tiny C-Compiler.
| Narishma wrote:
| This is a compiler for a language called Tiny-C.
| notorandit wrote:
| I understand the confusion: it is more about "syntax
| associativity"
|
| (tiny C) compiler --> "This is a compiler for the Tiny-C
| language"
|
| vs
|
| Tiny (C compiler) --> "TinyCC [...] is a small but hyper fast
| C compiler"
|
| That's it! ;-)
| moffkalast wrote:
| Now obviously the next step is to make a tiny tiny c
| compiler compiler.
| Koshkin wrote:
| Sigh. I wish people would teach compilers using Oberon as an
| example. One can write a small yet complete compiler for a
| language that turns out to be not so tiny.
| peacefulhat wrote:
| Best to pick languages anybody has heard of.
| stevekemp wrote:
| That's a cute project, thanks for sharing.
|
| I hacked in support for ">", ">=", and "<=" to match the "<"
| support, but I just noticed that ints are truncated, so the
| maximum value stored in a variable is 127.
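|
| One plausible cause, assumed here rather than confirmed from the
| source, is that constant operands are stored in a bytecode
| stream of signed 8-bit slots. The sketch below shows that kind
| of truncation and a two-byte encoding that avoids it. All names
| are hypothetical.
|
```c
#include <assert.h>
#include <stdint.h>

typedef int8_t code; /* assumption: one signed byte per bytecode slot */

/* Stores value in a single code slot and reads it back. Anything
   outside -128..127 cannot survive the trip. */
int operand_roundtrip(int value)
{
    code slot = (code)value;
    return slot;
}

/* Writes value as two little-endian bytes, enough for -32768..32767. */
void put_i16(uint8_t *dst, int value)
{
    assert(value >= -32768 && value <= 32767);
    unsigned u = (unsigned)value; /* well-defined modular conversion */
    dst[0] = (uint8_t)(u & 0xFFu);
    dst[1] = (uint8_t)((u >> 8) & 0xFFu);
}

/* Reads back a value written by put_i16, restoring the sign. */
int get_i16(const uint8_t *src)
{
    int v = src[0] | (src[1] << 8);
    return v >= 0x8000 ? v - 0x10000 : v;
}
```
|
| With this encoding a constant such as 200 survives the round
| trip, at the cost of one extra byte per operand.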
| bitwize wrote:
| Oh, Marc Feeley. Wonder if we'll see a Tiny-C target for Gambit?
| feeley wrote:
| That's not on my TODO! But Gambit does have support for TCC.
| For example you can use TCC to compile a file to a dynamically
| loadable object file (aka shared library). The compilation is
| faster than gcc and the code size is typically smaller too:
| $ cat hello.scm
| (display "hello!\n")
| $ gsc hello.scm
| $ gsi hello.o1
| hello!
| $ ls -l hello.o1  # this is generated by gcc
| -rwxrwxr-x 1 feeley feeley 18152 Mar 13 17:16 hello.o1
| $ rm hello.o1
| $ gsc -cc "tcc -shared" hello.scm
| $ gsi hello.o1
| hello!
| $ ls -l hello.o1  # this is generated by tcc
| -rwxrwxr-x 1 feeley feeley 4432 Mar 13 17:17 hello.o1
| fernly wrote:
| Um, excuse me, but there existed a Tiny-C in 1979. Whatever you
| are talking about creating in 2000 is in no way an original idea.
|
| References:
|
| Dr. Dobb's Journal #32 (Feb 1979) page 41, review of Tiny-C User
| Manual by Ted Shapin [0]
|
| Dr. Dobb's Journal #35 (May 1979) page 37, "Tiny-C Interpreter on
| C-Dos" by Ray Duncan[1]
|
| Tiny-C Associates incorporated in Holmdel, NJ, March 1978 [2]
|
| "Tiny C" trademark application filed 1979, cancelled 1987 [3]
|
| There was also a "Small C", see DDJ #69 (July 1982) p. 66, "Small
| C for the 9900" by Matthew Halfant[4]
|
| [0]
| https://archive.org/details/dr_dobbs_journal_vol_04_201803/p...
|
| [1]
| https://archive.org/details/dr_dobbs_journal_vol_04_201803/p...
|
| [2] https://www.bizapedia.com/nj/tiny-c-associates.html
|
| [3] https://alter.com/trademarks/tiny-c-73219160
|
| [4]
| https://archive.org/details/dr_dobbs_journal_vol_07_201803/p...
| mati365 wrote:
| Recently I've been working on a toy C compiler and x86 assembler
| in TypeScript [1], and I can confirm that the amount of work that
| has to be done to compile and print a simple Hello World is
| astronomically huge (as is the satisfaction).
|
| [1] https://github.com/Mati365/ts-c-compiler
| Narishma wrote:
| This isn't a C compiler though. It's a compiler for a language
| called Tiny-C.
| [deleted]
| jokoon wrote:
| The first assignment would be to add the multiply and divide
| operators...
|
| I admit I have trouble understanding how the VM run() function
| works... can anybody give some insight?
| mav88 wrote:
| The function runs through the program by incrementing the
| program counter (*pc++) and dispatching on the instruction it
| sees. It's a stack-based VM, so operand values are pushed onto
| and popped from the stack depending on the operation. Is there
| anything specific you don't grok? Happy to help.
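|
| A minimal sketch of such a dispatch loop, written for this
| thread rather than copied from tinyc.c. The opcode names, the
| jump encoding (an offset relative to the operand byte) and the
| OP_MUL instruction are assumptions of the sketch.
|
```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical instruction set for the sketch. */
enum { OP_FETCH, OP_STORE, OP_PUSH, OP_POP, OP_ADD, OP_SUB, OP_MUL,
       OP_LT, OP_JZ, OP_JMP, OP_HALT };

/* Executes code until OP_HALT. vars holds the 26 globals a..z.
   The sketch does no bounds checking on the operand stack. */
void vm_run(const int8_t *code, int vars[26])
{
    int stack[256];
    int *sp = stack;          /* points at the next free slot */
    const int8_t *pc = code;  /* program counter */

    for (;;) {
        switch (*pc++) {      /* fetch the opcode, then dispatch on it */
        case OP_FETCH: *sp++ = vars[*pc++]; break;
        case OP_STORE: vars[*pc++] = sp[-1]; break; /* value stays on stack */
        case OP_PUSH:  *sp++ = *pc++; break;
        case OP_POP:   --sp; break;
        case OP_ADD:   sp[-2] += sp[-1]; --sp; break;
        case OP_SUB:   sp[-2] -= sp[-1]; --sp; break;
        case OP_MUL:   sp[-2] *= sp[-1]; --sp; break;
        case OP_LT:    sp[-2] = sp[-2] < sp[-1]; --sp; break;
        case OP_JZ:    if (*--sp == 0) pc += *pc; else pc++; break;
        case OP_JMP:   pc += *pc; break;
        case OP_HALT:  return;
        default:       assert(!"unknown opcode"); return;
        }
    }
}
```
|
| Each binary operator pops two values and pushes one result, so
| adding multiplication is a single extra case. A loop such as
| "i = 0; while (i < 10) i = i + 3;" becomes a JZ that skips the
| body and a JMP with a negative offset that returns to the test.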
| feeley wrote:
| Author here. Just for context tinyc.c was created in 2000 (I
| found the file in my archives and the last modification date is
| January 12, 2001). I was not aware at the time of Fabrice
| Bellard's work which after all won the IOCCC in 2001, so the
| confusion with TCC was not intentional. My tinyc.c was meant to
| teach the basics of compilers in a relatively accessible way,
| from parsing to AST to code generation to bytecode interpreter.
| And yes it is the subset of C that is tiny, not a tiny compiler
| for the full C language.
| bullen wrote:
| I wish I had time to make a list of what would be required to
| bootstrap this.
|
| Either by adding complexity (more features to the compiler) or
| dropping complexity (fewer C features in the implementation).
|
| Did you ever look at that?
|
| Edit: functions, enum, struct, arrays and maybe make all
| variables/functions a-z?
|
| Edit2: https://joyofsource.com/projects/bootstrappable-tcc.html
| userbinator wrote:
| It's unfortunately not self-compiling, but has a structure which
| is very reminiscent of C4 --- another tiny C-subset compiler +
| stack-based VM which is self-compiling:
|
| https://news.ycombinator.com/item?id=8558822
|
| The 26 predefined integer variables make this look like a variant
| of minimal BASIC, except with structured control flow instead of
| only GOTO.
| bakul wrote:
| This doesn't have types, functions, arrays or much error
| checking. It has one char identifiers. I don't think we should
| read into this any more than a tiny example or experiment by the
| author.
| Gordonjcp wrote:
| So, it's the C equivalent of Tiny BASIC?
|
| So, a Tiny C?
| bakul wrote:
| Not even that as it doesn't have function calls or even
| print!
|
| See Feeley's response for the proper context.
| netgusto wrote:
| It's worth noting that this is a compiler for the Tiny-C
| language, and not as one might think a tiny compiler for the C
| language.
| susam wrote:
| Yes, a better title would be:
|
| Compiler for the Tiny-C Language (2001)
|
| In fact, that is exactly how the source code describes itself
| in the comments.
| unwind wrote:
| It's probably better to call it an interpreter, since it will
| also run the program and print the values of all non-zero
| variables afterward.
|
| Calling it a compiler is (to me) really stretching things, I
| can't see any code to emit any other form of the code, it's all
| aimed at evaluating (executing) it.
|
| Edit: oops, I didn't read the code closely enough, it does emit
| code but only internally, that code is what gets executed.
| Thanks for the corrections!
| northernskys30 wrote:
| It compiles to a sort of byte code that is executed by a
| stack based virtual machine.
| userbinator wrote:
| It is a compiler rather than a direct evaluator, since it
| generates bytecode for a stack VM --- and also includes the
| interpreter for that (look at the bottom).
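|
| The compile step can be sketched as a post-order walk over an
| expression tree that appends stack-machine instructions to a
| buffer. The node kinds, opcode names and emit() signature below
| are hypothetical and only illustrate the general technique.
|
```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical AST node kinds and opcodes for the sketch. */
enum node_kind { N_CONST, N_VAR, N_ADD, N_SUB, N_SET };
enum opcode { OP_FETCH, OP_STORE, OP_PUSH, OP_ADD, OP_SUB, OP_HALT };

struct node {
    enum node_kind kind;
    int val;                 /* constant value, or variable index 0..25 */
    const struct node *lhs;  /* left operand, or the value being assigned */
    const struct node *rhs;  /* right operand, unused for N_SET */
};

/* Appends the bytecode for n to out, starting at pos, and returns the
   new position. cap is the capacity of out in bytes. */
size_t emit(const struct node *n, int8_t *out, size_t pos, size_t cap)
{
    switch (n->kind) {
    case N_CONST:
        assert(pos + 2 <= cap && n->val >= -128 && n->val <= 127);
        out[pos++] = OP_PUSH;
        out[pos++] = (int8_t)n->val;
        break;
    case N_VAR:
        assert(pos + 2 <= cap && n->val >= 0 && n->val < 26);
        out[pos++] = OP_FETCH;
        out[pos++] = (int8_t)n->val;
        break;
    case N_ADD:
    case N_SUB:
        /* operands first, operator last: the VM finds both on its stack */
        pos = emit(n->lhs, out, pos, cap);
        pos = emit(n->rhs, out, pos, cap);
        assert(pos + 1 <= cap);
        out[pos++] = (n->kind == N_ADD) ? OP_ADD : OP_SUB;
        break;
    case N_SET:
        pos = emit(n->lhs, out, pos, cap);
        assert(pos + 2 <= cap && n->val >= 0 && n->val < 26);
        out[pos++] = OP_STORE;
        out[pos++] = (int8_t)n->val;
        break;
    }
    return pos;
}
```
|
| For the tree of "a = b + 2" this emits FETCH 1, PUSH 2, ADD,
| STORE 0: seven bytes that a separate VM loop then executes.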
| masklinn wrote:
| That's more or less every interpreter. CPython compiles to
| bytecode before interpreting that, yet nobody would call it
| a compiler.
| Mike_12345 wrote:
| That is definitely a compiler and anyone with a CS degree
| would call it that if they were discussing its
| functionality, because that's technically what it is.
| (Referring specifically to the part which compiles Python
| to bytecode)
|
| Your SQL database also has a compiler. SQL is compiled to
| an execution plan. Compile doesn't only mean "create a
| machine code executable file".
| masklinn wrote:
| > That is definitely a compiler and anyone with a CS
| degree would call it that if they were discussing its
| functionality because that's technically what it is.
|
| None of these assertions is correct.
|
| > (Referring specifically to the part which compiles
| Python to bytecode)
|
| So referring specifically to something different than
| what I explicitly specified, it's called something else.
|
| By that reasoning, a cow is a muscle and you are an acid.
|
| > Your SQL database also has a compiler.
|
| "Has a" and "is a" are rather different relationships.
|
| > Compile doesn't only mean "create a machine code
| executable file".
|
| You're the only person who made that assertion.
| shadowfox wrote:
| In contrast, Java also does that, and I doubt most people
| think of Java as interpreted. So using a bytecode
| interpreter may not be the criterion most people are
| using to decide on this. Truthfully, I think it is all a
| bit arbitrary.
| [deleted]
| zabzonk wrote:
| not sure i understand how enums work here. but interesting.
___________________________________________________________________
(page generated 2023-03-13 23:01 UTC)