[HN Gopher] Language and shell in Go with 92% test coverage and ...
___________________________________________________________________
Language and shell in Go with 92% test coverage and instant CI/CD
[video]
Author : todsacerdoti
Score : 154 points
Date : 2024-08-30 15:09 UTC (1 day ago)
(HTM) web link (www.youtube.com)
(TXT) w3m dump (www.youtube.com)
| xiaq wrote:
| Hey, it's my talk, AMA :)
|
| If you're interested in Elvish, you may also be interested in the
| talk on its design - https://www.youtube.com/watch?v=wrl9foNXdgM
| HeralFacker wrote:
| Do you have a link to a copy of the video with captions?
| YouTube autogen doesn't cut it unfortunately. Or perhaps a
| written-form version (slide deck + transcript)?
|
| What's in the 8% not covered by testing?
| xiaq wrote:
| I don't have a version with captions, sorry. You can find the
| slide deck at
| https://github.com/elves/elvish/blob/master/website/slides/2...
|
| The remaining 8% mostly falls into the following categories:
|
| - Code that uses OS functionality that is cumbersome to
| mock in tests
|
| - Code paths that are triggered relatively rarely and I was
| simply too lazy to add tests for them
|
| Nothing is impossible to cover, but for whatever reason it
| was too much work for me when I wrote the code.
|
| However, it's worth mentioning that I only settled on the
| transcript test pattern fairly recently, and if I were to
| rewrite or refactor some of the untested code today I would
| add tests for them, because the cost of adding tests has been
| lowered considerably. So Elvish's test coverage is still
| increasing slowly as the cost of testing decreases.
| zvolsky wrote:
| Hey, thanks again for the talk and for answering my fork bomb
| question with a live demo!
| xiaq wrote:
| Thanks for your question and glad that you enjoyed it!
| hnlmorg wrote:
| I thought you handled the question really well. To be
| honest the whole talk was excellent. I'm gutted I missed it
| in person.
| xiaq wrote:
| Thanks! Murex talk when??? :)
| hnlmorg wrote:
| haha I can't present nearly as well as yourself but maybe
| one day.
|
| It's not easy to present though. I know on HN we see a
| lot of very clever people give some well executed
| presentations and it's sometimes easy to forget how much
| preparation and courage it takes to perform like that.
| And it's great to see how engaged people were with the
| content too.
|
| Sorry, this is less of a question and more just comment
| of appreciation.
| xiaq wrote:
| Thanks, I appreciate the comment of appreciation :)
| heleninboodler wrote:
| There were a lot of aspects of this talk that I thought
| were really great. The willingness to try something
| unscripted, diving into the code repo live (e.g. to show
| where fuzzing is used), and the discussions of the
| reasoning behind the design choices. Great job @xiaq.
| This really makes me want to try elvish out, and I
| usually am quite skeptical of new shells.
| xiaq wrote:
| Thanks! Glad that the talk is working as a marketing
| pitch for Elvish :)
| 0xdeadbeefbabe wrote:
| In vim, vi, or nvim, ":r !date" gives me "shell returned 2"
| xiaq wrote:
| Did you set your login shell to Elvish? Vim unfortunately
| relies on your shell being a POSIX shell, but you can fix
| that with "set shell=/bin/sh" in your rc file.
| xiaq wrote:
| FWIW, I've just added this instruction to
| https://elv.sh/get/default-shell.html#vim-/-neovim
| mpenick wrote:
| Does elvish have a command history limit? Or is it
| configurable? I like a nearly infinite history.
| xiaq wrote:
| History entries are kept indefinitely.
| cdcarter wrote:
| Do you have any written posts or documents about this language
| and your design decisions?
| whereistimbo wrote:
| https://elv.sh/
| xiaq wrote:
| I gave a talk about the design:
| https://www.youtube.com/watch?v=wrl9foNXdgM
|
| As the sibling comment mentioned, you can find documentation on
| Elvish itself on the website https://elv.sh. There are
| tutorials and (not 100% but fairly complete) reference
| documents.
| throwaway2016a wrote:
| This seems like a cool project.
|
| This is meant as additional information, not criticism. I
| skimmed the transcript really fast, so if this is in there and
| I missed it, please correct me. But two things I think are
| helpful for people creating projects like this to be aware of:
|
| - This video seems to combine the concepts of lexing and parsing.
| It is usually beneficial to separate these two steps and lex the
| input into tokens before passing to the parser.
|
| - Go actually has a pure Go implementation of Yacc in the toolset
| and I've used it in several projects to make parsers. Dealing
| with the Yacc file is often much easier than dealing with code
| directly since it takes care of writing the actual parser. There
| is a lot of boilerplate that goes into parsers, and when you use
| Yacc it "just works".
|
| Edit: there are also some tools for writing parsers in a
| Lex/Flex-like syntax (re2c comes to mind), but I've found
| hand-writing lexers to be effective in Go if your language
| doesn't have many different types of tokens.
| ridiculous_fish wrote:
| Shells have somewhat unusual parsing requirements. For example
| "if" is a keyword when used as `if echo` but not `echo if`.
|
| So you either need to implement the lexer hack, or have a
| "string" token type which is disambiguated by the parser (which
| is what fish-shell does).
|
| https://en.wikipedia.org/wiki/Lexer_hack
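| A minimal Go sketch of that second approach (the token and
| function names are invented for illustration, not taken from
| fish-shell): the lexer emits only generic word tokens, and the
| parser decides from position whether a word acts as a keyword.

```go
package main

import (
	"fmt"
	"strings"
)

// keywords that are special only in command position
var keywords = map[string]bool{"if": true, "while": true, "for": true}

// classify assigns each word a role: a word counts as a keyword
// only when it appears in command position (here, simply the
// first word of the line).
func classify(line string) []string {
	words := strings.Fields(line)
	roles := make([]string, len(words))
	for i, w := range words {
		if i == 0 && keywords[w] {
			roles[i] = "KEYWORD"
		} else {
			roles[i] = "WORD"
		}
	}
	return roles
}

func main() {
	fmt.Println(classify("if echo")) // [KEYWORD WORD]
	fmt.Println(classify("echo if")) // [WORD WORD]
}
```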
| radiospiel wrote:
| Unless I miss something, this should not be an issue: the
| lexer could emit "if" as an IF token, and the parser could
| treat arguments as STRING || IF ( || other keywords... )
| duskwuff wrote:
| That seems like it'd get really awkward pretty quickly.
| "if" isn't unique in this regard; there are about a hundred
| shell builtins, and all of them can be used as an argument
| to a command. (For example, "echo then complete command
| while true history" is a valid shell command consisting
| entirely of names of builtins, and the only keyword in it
| is the leading "echo".)
| deathanatos wrote:
| You'd have to `|| EVERY_KEYWORD_IN_LANG`, and then if you
| ever add a keyword, now you're updating that list there,
| _and_ anywhere else you've used it.
|
| As the "Lexer hack" Wiki page says, this is only a problem
| if you're lexing in the first place. If you just parse the
| grammar, this isn't a problem.
| hnlmorg wrote:
| The problem lies with shells extensive usage of barewords.
| If you could eliminate the requirement for any bareword to
| be treated as a string then parsing shell code would then
| become much simpler...but also few people would want to use
| it because nobody wants to write the following in their
| interactive shell: git "commit" "-am"
| "message" ls "-l" etc
| throwaway2016a wrote:
| That's no problem in many modern lexers as they usually have
| a "state" so when you encounter "echo" you can switch to a
| new state and that state may have different token parsing
| rules. So "if" in the "echo" state could be a string literal
| whereas it may be a keyword in the initial state.
|
| Lex/Flex takes care of that mostly for you which is one of
| the benefits of using a well worn lexer generator and not
| rolling your own.
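| As a toy illustration of such a stateful lexer (hand-rolled in
| Go here rather than Lex/Flex, with the states invented for the
| example): after the command word, the lexer switches to an
| argument state in which every word, including "if", lexes as a
| string.

```go
package main

import (
	"fmt"
	"strings"
)

// lex tokenizes a line using two states: in the initial state,
// known keywords lex as KEYWORD and the first non-keyword word
// lexes as COMMAND; after a command, the lexer switches to an
// argument state where every word lexes as STRING.
func lex(line string) []string {
	const (
		stInitial = iota
		stArgs
	)
	state := stInitial
	var toks []string
	for _, w := range strings.Fields(line) {
		switch {
		case state == stInitial && (w == "if" || w == "while"):
			// keywords like "if" are followed by another command,
			// so stay in the initial state
			toks = append(toks, "KEYWORD("+w+")")
		case state == stInitial:
			toks = append(toks, "COMMAND("+w+")")
			state = stArgs // everything after the command is a string
		default:
			toks = append(toks, "STRING("+w+")")
		}
	}
	return toks
}

func main() {
	fmt.Println(lex("echo if then")) // [COMMAND(echo) STRING(if) STRING(then)]
	fmt.Println(lex("if echo hi"))   // [KEYWORD(if) COMMAND(echo) STRING(hi)]
}
```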
| xiaq wrote:
| Right, I may have forgotten to mention that lexerless
| parsers are somewhat unusual.
|
| I didn't have much time in the talk to go into the reason, so
| here it is:
|
| - You'll need a more complex lexer to parse a shell-like
| syntax. For example, one common thing you do with lexers is
| discard whitespace, but shell syntax is whitespace
| sensitive: "a$x" and "a $x" (double quotes not part of the
| code) are different things: the first is a single word
| containing a string concatenation, the second is two
| separate words.
|
| - If your parser backtracks a lot, lexing can improve
| performance: you're not going back characters, only tokens (and
| there are fewer tokens than characters). Elvish's parser
| doesn't backtrack. (It does use lookahead fairly liberally.)
|
| Having a lexerless parser does mean that you have to deal
| with whitespace everywhere, though, and it can get a bit
| annoying. But personally I like the conceptual simplicity
| and not having to deal with silly tokens like LBRACE,
| LPAREN, PIPE.
|
| I have not used parser generators enough to comment about the
| benefits of using them compared to writing a parser by hand.
| The handwritten one works well so far :)
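| A toy Go sketch of that whitespace sensitivity (the segment
| labels are invented for illustration, not Elvish's actual
| parse tree): adjacent segments join into one word, while a
| space starts a new word, so "a$x" parses as one word and
| "a $x" as two.

```go
package main

import (
	"fmt"
	"strings"
)

// words splits a source line into words, and each word into
// bareword ("str:") and variable ("var:") segments. Adjacency
// means implicit concatenation within a single word.
func words(src string) [][]string {
	var out [][]string
	for _, w := range strings.Fields(src) {
		var segs []string
		for len(w) > 0 {
			if i := strings.IndexByte(w, '$'); i == 0 {
				// variable segment: $ followed by a name
				j := 1
				for j < len(w) && w[j] != '$' {
					j++
				}
				segs = append(segs, "var:"+w[1:j])
				w = w[j:]
			} else if i > 0 {
				segs = append(segs, "str:"+w[:i])
				w = w[i:]
			} else {
				segs = append(segs, "str:"+w)
				w = ""
			}
		}
		out = append(out, segs)
	}
	return out
}

func main() {
	fmt.Println(words("a$x"))  // [[str:a var:x]]   one word, a concatenation
	fmt.Println(words("a $x")) // [[str:a] [var:x]] two separate words
}
```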
| throwaway2016a wrote:
| That example you gave could certainly be done in Lex/Flex and
| I assume other lexers/tokenizers as well, for instance, you
| would probably use states and have "$x" in the initial state
| evaluate to a different token type than "$x" in the string
| state.
|
| But I do get your meaning. I've written a lot of tokenizers
| by hand as well; sometimes the hand-written code can be
| easier to follow. Config files for grammars can get
| convoluted fast.
|
| But again, I was not meaning it as criticism. But your talk
| title does start with "How to write a programming language
| and shell in Go" so given the title I think Lexers /
| Tokenizers are worth noting.
| xiaq wrote:
| Yeah, ultimately there's an element of personal taste at
| play.
|
| The authoritative tone of "how to write ..." is meant in
| jest, but obviously by doing that I risk being
| misunderstood. A more accurate title would be "how I wrote
| ...", but it's slightly boring and I was trying hard to get
| my talk proposal accepted you see :)
| throwaway2016a wrote:
| As someone who has given a handful of talks at
| conferences.. 100% relatable.
| lolinder wrote:
| > Dealing with the Yacc file is often much easier than
| dealing with code directly since it takes care of writing
| the actual parser. There is a lot of boilerplate that goes
| into parsers, and when you use Yacc it "just works".
|
| Honestly, I think this is overstating the amount of boilerplate
| in a parser and overstating how well a parser generator "just
| works". I haven't used Yacc, so maybe it's better than ANTLR,
| but having tried ANTLR and written a few recursive descent
| parsers I've been pretty well cured of wanting to ever use a
| parser generator. ANTLR's generated code is verbose, the data
| structures are hard to work with, and error handling leaves a
| lot to be desired.
|
| Parser boilerplate can be reduced to a large extent with a good
| set of helper methods (I often find myself referring back to
| the set used in Crafting Interpreters [0]), and what you get in
| exchange is full control over the data structure generated by
| the parser and over the error handling. For a language that
| you're serious about, that tradeoff is totally worth it.
|
| [0] http://craftinginterpreters.com/
| pianoben wrote:
| Maybe it's just my skill level, but I've used both hand-
| rolled recursive-descent and ANTLR for the same project
| (Thrift parser), and hoo boy I would _never_ go back to
| recursive-descent for that. ANTLR shrank my code by an order
| of magnitude, and cleaned up some bugs too.
|
| I'd be willing to believe that beyond a certain level of
| input complexity, ANTLR no longer pays for itself. In my
| experience, there exists a class of languages for which
| there's no better tool.
| xiaq wrote:
| I would love to see the diff between the hand-rolled
| recursive-descent parser and the ANTLR syntax!
|
| I certainly feel the amount of boilerplate in my hand-
| rolled recursive-descent parser is manageable. Of course
| it's not as succinct as an EBNF grammar:
|
| - For example, you have to write an actual loop (with "for"
| and looping conditions) instead of just * for repetition
|
| - The Go formatter demands a newline in most control flows
|
| - Go is also not the most succinct language in general
|
| So you do end up with many more lines of code. But at the
| end of the day, the structure of each parsing function is
| remarkably similar to a production rule, and for simpler
| ones I can mentally map between them pretty easily, with
| the added benefit of being able to insert code anywhere if
| I need something beyond old-school context-free parsing.
| adastra22 wrote:
| > This video seems to combine the concepts of lexing and
| parsing. It is usually beneficial to separate these two steps
| and lex the input into tokens before passing to the parser.
|
| Historically, yes. In recent years combined lexer-parsers
| have outperformed dedicated lexer + dedicated parser
| combinations, and with modern tooling this isn't the janky
| mess it used to be. Some of the best tools out there are
| combined lexer-parsers.
| eru wrote:
| > - This video seems to combine the concepts of lexing and
| parsing. It is usually beneficial to separate these two steps
| and lex the input into tokens before passing to the parser.
|
| With traditional techniques, yes. But if you e.g. use parser
| combinators (which would admittedly be a bit unusual in Go),
| combining both steps is pretty common.
|
| > - Go actually has a pure Go implementation of Yacc in the
| toolset and I've used it in several projects to make
| parsers. Dealing with the Yacc file is often much easier
| than dealing with code directly since it takes care of
| writing the actual parser. There is a lot of boilerplate
| that goes into parsers, and when you use Yacc it "just
| works".
|
| You are right that it's best to avoid Go when you can. Just
| like Java folks (stereotypically) seemed to avoid writing Java
| at all costs and rather wrote XML config files to drive their
| logic.
|
| Yacc (and lex) are otherwise not a good choice for
| specifying languages these days.
___________________________________________________________________
(page generated 2024-08-31 23:02 UTC)