[HN Gopher] A toy programming language in 137 lines of Python code
___________________________________________________________________
A toy programming language in 137 lines of Python code
Author : miguelgrinberg
Score : 77 points
Date : 2023-07-02 15:36 UTC (7 hours ago)
(HTM) web link (blog.miguelgrinberg.com)
(TXT) w3m dump (blog.miguelgrinberg.com)
| mr_00ff00 wrote:
| Great article. I've always wanted to try and build a toy language
| for the fun of it, great to see something that breaks it down so
| well.
|
| Curious if anyone has any additional thoughts to add onto this
| for anyone who wants to build one themselves?
|
| I have experience in C++, so will likely use that or rust (I
| imagine rust's built-in methods might make it way easier at least
| for the parser, although idk if ownership rules get painful for
| work like this).
| benhoyt wrote:
| Yes, my thought is to read Bob Nystrom's "Crafting
| Interpreters". It's excellent, in-depth but not academic, and
| the writing style is great. https://craftinginterpreters.com/"
| He also has a lot of good thoughts about language design.
| 29athrowaway wrote:
| Once you have an intuition, you can use yacc.
| laxd wrote:
| And lose touch with your code and having too deal with problems
| through a heavily abstracted layer. Great!
| 29athrowaway wrote:
| You will likely not get a perfect syntax right away. So you
| will need to iterate on it until you are happy with how it
| looks and feels, and doing it while developing the compiler
| yourself will naturally lead to a lot of unnecessary bugs and
| tech debt.
|
| Alternatively, you iterate using a cheaper method and once
| you reach a point in which you are content with the syntax,
| you can consider writing a compiler for it.
|
| You can also avoid iterating on the syntax so you can ship a
| compiler as fast as possible, but users won't like the syntax
| and won't use your language.
| mcluck wrote:
| I have a feeling you haven't worked on parsers or compilers
| that much if you think you design all of the syntax before
| building anything
| remexre wrote:
| I don't really think of yacc as having much more abstraction
| than a regex engine -- like in an (NFA-based) regex engine,
| you're building a big data structure a human wouldn't want to
| hand-engineer from a declarative specification. (And, like in
| regexes, you've got good theory backing it.)
| laxd wrote:
| You got good theory backing hand-written parsers, and the
| code is there right in your face. I don't know how a regex
| engine works.
| remexre wrote:
| By "good theory," I mean being able to know properties of
| the parser as a result of the formalism.
|
| For example, one can prove that any yacc-generated parser
| never infinite loops and runs in O(n) (at least for the
| actual parsing; you can write custom code in actions that
| runs when a construct has been parsed, and of course you
| can write whatever you want there).
|
| If you hand-write a grammar like the following, you'll
| end up with an infinite loop without restructuring the
| grammar; such restructuring is of course much easier if
| you have written the grammar out, rather than having it
| implicitly exist in code. Expr ::= Expr
| '[' Expr ']' Expr ::= Name Expr ::= Int
|
| If you try to naively parse the following, you'll never
| parse the marked production either. Something like yacc
| will warn you about this, while an ordinary compiler
| compiling a hand-written parser won't.
| Stmt ::= Expr ';' Stmt ::= Name '=' Expr ';'
| Stmt ::= Expr '[' Expr ']' '=' Expr ';' Expr ::=
| Expr '[' Expr ']' Expr ::= Name Expr ::=
| Int
|
| [0] is a good introduction to regex matching, [1] is a
| good introduction to the algorithm yacc uses, if you want
| to learn either.
|
| [0]: https://swtch.com/~rsc/regexp/regexp1.html [1]: http
| s://web.archive.org/web/20210507215636/http://web.cs.dal.
| ..
| laxd wrote:
| > For example, one can prove that any yacc-generated
| parser never infinite loops and runs in O(n)
|
| Trivial when actually writing the code. And your example
| of infinite left recursion is basics in parser theory. No
| one in their right mind would write a hanging program
| like that, at least not for long. The problem just seem
| more clever when written in grammars.
| remexre wrote:
| Left recursion is basic, sure, but it's not a problem for
| an (LA)LR parser generator like yacc. (Depending on your
| parse actions, you might even get slightly better memory
| consumption when using left recursion rather than right
| recursion.)
| miguelgrinberg wrote:
| Knowing how to build a parser manually will make you much much
| better at generating, and most importantly debugging,
| automatically generated parsers.
| 29athrowaway wrote:
| Yes. And once you are done learning every aspect of it, one
| day you will look yourself in the mirror and you will be 60
| years old.
| benj111 wrote:
| Why do you think people are writing toy programming
| languages?
|
| It's a learning exercise. Sure, if you want to learn to use
| yacc, or whatever, use yacc. I don't think time efficiency
| is a useful metric when the overwhelming majority of people
| is this space aren't doing it for their job, or even
| because they 'need' to create a new language.
|
| Ultimately you could just use python, it'll be faster and
| more fully featured.
| furyofantares wrote:
| I'm not sure why this intuition is so strong, I had it
| myself before learning this stuff in my early 20s on
| shipping projects (both existing, and from scratch) and at
| the end of that I was still in my early 20s.
|
| Despite this intuition it's all actually very learnable and
| my experience is that using a parser generator makes no
| sense in a professional context and very little sense in a
| hobby/toy project context.
| 3cats-in-a-coat wrote:
| Once you have an intuition, you just write the parser yourself.
|
| I'm not a fan of parser generators. I find them restrictive. It
| is also notable most mainstream languages do not use tools like
| YACC, they have "handmade" parsers and compilers.
|
| This may seem like I'm suggesting pie-in-the-sky approach from
| the ground up for something that may never need such control.
| Could be. But I don't feel like learning YACC is easier than
| learning how to parse anything in general. I think YACC and so
| on are holding people back from truly expressing themselves
| with custom languages, and after all that was the whole point
| of them writing a language or wasn't it?
| 29athrowaway wrote:
| So let's say, you decide to create your own programming
| language, now you can automatically afford to do as much work
| as the organizations behind mainstream programming languages?
| Organizations with thousands of contributors, funding, etc.?
|
| Now, let's omit that for a moment. Let's say that after
| having implemented your language after years of work, now for
| some reason you got 1 prospect user.
|
| That person will ask you about: syntax highlighting, linting,
| code navigation, testing, packaging, documentation, operating
| systems/architecture support, etc. Who is going to contribute
| all that if you are busy writing everything by hand?
|
| Unless you have as much time as Terry Davis, you are better
| off starting by piggybacking on something that exists. Then,
| once you have an idea that scales, you can convince other
| people to help you and have a successful project. Then you
| can have a viable ecosystem that people can comfortably join.
| laxd wrote:
| .... "syntax highlighting, linting, code navigation,
| testing, packaging, documentation, operating
| systems/architecture support"
|
| This has nothing to do with using a parser generator or
| not.
| 29athrowaway wrote:
| It has to do with having time to spend in creating a
| convenient development experience and picking your
| battles.
| laxd wrote:
| That's agreeable. We seems to disagree on how to achieve
| that ideal - on the specific point of whether to use a
| parser generator or not. The correct answer is not :)
| jstimpfle wrote:
| The real problem about parsing is not parsing the formal
| grammar. It's deciding what to store about the parsed input
| (CST to AST) and designing the ancillary datastructures for
| further processing, integrating the parser with the rest of
| the application, etc.
|
| The real problem about _building a compiler_ is everything
| else but not parsing. Parsing is the easiest part. Once you
| 've written a view recursive descent parsers you will
| probably agree.
|
| The parser for syntax highlighting is probably a different
| one than the parser for AST generation and compilation. So
| is a parser for batch compilation different to one for
| incremental compilation. Even the _syntax_ might be
| different.
| 3cats-in-a-coat wrote:
| Using YACC doesn't give you magical integration with IDEs,
| operating systems, testing, documentation and so on. I feel
| as if you took my specific comment about parsers (which are
| not hard to write at all, few hundred lines of code for a
| moderately complex language) in an inappropriately general
| way as in "don't use anything for anything when doing
| anything".
| Tade0 wrote:
| I'm not well versed in this space. Are you suggesting
| designing a language without specifying a formal grammar?
| jstimpfle wrote:
| One does not preclude the other. Most languages that lend
| themselves to hand-parsing via recursive descent are
| probably LL(1).
|
| Syntaxes that are easy to parse via recursive descent are
| probably also the ones that are easiest to parse for
| humans.
| tomp wrote:
| short code review:
|
| - the `tokens` method returning a variable-length tuple is a very
| bad idea; modern Python supports the `datatype` decorator which
| should be used in this case
|
| - I'd strongly recommend using a regex for lexing; much easier to
| get it right than a long if/elif/else block
|
| - I'd recommend refactoring code and implementing the
| `next_token` logic so that it works for _any_ generator; an
| example implementation could be (WARNING: I wrote this in 5min,
| would require extensive unittests!) class Peek:
| def __init__(self, it): self.cur, self.nxt =
| itertools.tee(it, 2) self.current = next(self.cur)
| def __next__(self): self.current = next(self.cur)
| return next(self.nxt)
|
| - statements with a "prefix" (like `print X` or `if Y`) are very
| easy to parse (just use the lookahead token!) but when you get to
| parsing expressions, I strongly recommend using a Pratt parser
| (extensible operator precedence parsing)
| benhoyt wrote:
| > modern Python supports the `datatype` decorator
|
| I presume you mean the @dataclass decorator?
| tomp wrote:
| indeed :D
| intalentive wrote:
| I love Python but after taking PL in Racket (a Lisp-like
| language) it's hard to imagine implementing toy languages
| otherwise. Seems like Python 3.10's match statements might come
| in handy.
| BoppreH wrote:
| This is a really fun exercise, I recommend every programmer try
| it once. You can replace most of the tokenization code with
| re.Scanner[1], which also allows you to have strings without
| worrying about `code.split()` messing them up.
|
| [1] https://news.ycombinator.com/item?id=36517749
| Fraterkes wrote:
| This seems great. When I read these kinds of explanations, it
| always strikes me how the way you're supposed to write a
| programming languages is basically the same way someone who had
| never written one might do it. Just break the input up into
| units, map the units to some category, and connect functionality
| to that. Like if you forced someone who had 2 weeks of
| programming experience to come up with a way to turn code into
| programs, they would probably think of something like the
| lexer/parser/evaluater process if you gave them enough time
| gabrielsroka wrote:
| Actual title "Building a Toy Programming Language in Python"
| codetrotter wrote:
| I think using a different title on HN is fine when OP is the
| author, such as here
| gabrielsroka wrote:
| I noticed that after I posted it. But I still have a "web"
| button in my HN browser, and it doesn't work unless the title
| is correct.
|
| Of course, that author can change the title on the website as
| well.
| frgfm wrote:
| Really great content, and an interesting exercise! A while back,
| I wanted to do something similar but built on top of a low-level
| language to get good performances on a specific use case. I'd
| love to see an article about something built on top of Rust or C
| (like a retrospective analysis on success/mistake of Python over
| the years)
___________________________________________________________________
(page generated 2023-07-02 23:00 UTC)