[HN Gopher] Language and shell in Go with 92% test coverage and ...
       ___________________________________________________________________
        
       Language and shell in Go with 92% test coverage and instant CI/CD
       [video]
        
       Author : todsacerdoti
       Score  : 154 points
       Date   : 2024-08-30 15:09 UTC (1 days ago)
        
 (HTM) web link (www.youtube.com)
 (TXT) w3m dump (www.youtube.com)
        
       | xiaq wrote:
       | Hey, it's my talk, AMA :)
       | 
       | If you're interested in Elvish, you may also be interested in the
       | talk on its design - https://www.youtube.com/watch?v=wrl9foNXdgM
        
         | HeralFacker wrote:
         | Do you have a link to a copy of the video with captions?
         | YouTube autogen doesn't cut it unfortunately. Or perhaps a
         | written-form version (slide deck + transcript)?
         | 
         | What's in the 8% not covered by testing?
        
           | xiaq wrote:
            | I don't have a version with captions, sorry. You can find
            | the slide deck at
            | https://github.com/elves/elvish/blob/master/website/slides/2...
           | 
           | The remaining 8% mostly falls into the following categories:
           | 
            | - Code that uses OS functionality that is cumbersome to
            | mock in tests
           | 
           | - Code paths that are triggered relatively rarely and I was
           | simply too lazy to add tests for them
           | 
           | Nothing is impossible to cover, but for whatever reason it
           | was too much work for me when I wrote the code.
           | 
           | However, it's worth mentioning that I only settled on the
           | transcript test pattern fairly recently, and if I were to
           | rewrite or refactor some of the untested code today I would
           | add tests for them, because the cost of adding tests has been
           | lowered considerably. So Elvish's test coverage is still
           | increasing slowly as the cost of testing decreases.
        
         | zvolsky wrote:
         | Hey, thanks again for the talk and for answering my fork bomb
         | question with a live demo!
        
           | xiaq wrote:
           | Thanks for your question and glad that you enjoyed it!
        
             | hnlmorg wrote:
             | I thought you handled the question really well. To be
             | honest the whole talk was excellent. I'm gutted I missed it
             | in person.
        
               | xiaq wrote:
               | Thanks! Murex talk when??? :)
        
               | hnlmorg wrote:
               | haha I can't present nearly as well as yourself but maybe
               | one day.
               | 
               | It's not easy to present though. I know on HN we see a
               | lot of very clever people give some well executed
               | presentations and it's sometimes easy to forget how much
               | preparation and courage it takes to perform like that.
               | And it's great to see how engaged people were with the
               | content too.
               | 
                | Sorry, this is less of a question and more just a
                | comment of appreciation.
        
               | xiaq wrote:
                | Thanks, I appreciate the comment and the appreciation
                | :)
        
               | heleninboodler wrote:
               | There were a lot of aspects of this talk that I thought
               | were really great. The willingness to try something
               | unscripted, diving into the code repo live (e.g. to show
               | where fuzzing is used), and the discussions of the
               | reasoning behind the design choices. Great job @xiaq.
                | This really makes me want to try Elvish out, even
                | though I'm usually quite skeptical of new shells.
        
               | xiaq wrote:
               | Thanks! Glad that the talk is working as a marketing
               | pitch for Elvish :)
        
         | 0xdeadbeefbabe wrote:
          | In vim, vi, or nvim, ":r !date" gives me "shell returned 2".
        
           | xiaq wrote:
           | Did you set your login shell to Elvish? Vim unfortunately
           | relies on your shell being a POSIX shell, but you can fix
           | that with "set shell=/bin/sh" in your rc file.
        
             | xiaq wrote:
             | FWIW, I've just added this instruction to
             | https://elv.sh/get/default-shell.html#vim-/-neovim
        
         | mpenick wrote:
         | Does elvish have a command history limit? Or is it
         | configurable? I like a nearly infinite history.
        
           | xiaq wrote:
           | History entries are kept indefinitely.
        
       | cdcarter wrote:
       | Do you have any written posts or documents about this language
       | and your design decisions?
        
         | whereistimbo wrote:
         | https://elv.sh/
        
         | xiaq wrote:
         | I gave a talk about the design:
         | https://www.youtube.com/watch?v=wrl9foNXdgM
         | 
         | As the sibling comment mentioned, you can find documentation on
         | Elvish itself on the website https://elv.sh. There are
         | tutorials and (not 100% but fairly complete) reference
         | documents.
        
       | throwaway2016a wrote:
       | This seems like a cool project.
       | 
       | This is meant as additional information not criticism. I skimmed
       | the transcript really fast so if this is in there and I missed
       | it, please correct me, but two things I think are helpful for
       | people creating projects like this to be aware of:
       | 
       | - This video seems to combine the concepts of lexing and parsing.
       | It is usually beneficial to separate these two steps and lex the
       | input into tokens before passing to the parser.
       | 
        | - Go actually has a pure Go implementation of Yacc in the
        | toolset, and I've used it in several projects to build parsers.
        | Dealing with the Yacc file is often much easier than dealing
        | with the code directly, since it takes care of writing the
        | actual parser. There is a lot of boilerplate that goes into
        | parsers, and when you use Yacc it "just works".
       | 
       | Edit: there are also some tools for writing parsers in Lex/Flex
       | like syntax (re2c comes to mind) but I've found hand writing
       | lexers to be effective in Go if your language doesn't have many
       | different types of tokens.
        
         | ridiculous_fish wrote:
         | Shells have somewhat unusual parsing requirements. For example
         | "if" is a keyword when used as `if echo` but not `echo if`.
         | 
         | So you either need to implement the lexer hack, or have a
         | "string" token type which is disambiguated by the parser (which
         | is what fish-shell does).
         | 
         | https://en.wikipedia.org/wiki/Lexer_hack
        
           | radiospiel wrote:
           | unless i miss something this should not be an issue. the
           | lexer could parse if as an IF token, and the parser could
           | treat tags as STRING || IF ( || other keywords... )
        
             | duskwuff wrote:
             | That seems like it'd get really awkward pretty quickly.
             | "if" isn't unique in this regard; there are about a hundred
             | shell builtins, and all of them can be used as an argument
             | to a command. (For example, "echo then complete command
             | while true history" is a valid shell command consisting
             | entirely of names of builtins, and the only keyword in it
             | is the leading "echo".)
        
             | deathanatos wrote:
              | You'd have to `|| EVERY_KEYWORD_IN_LANG`, and then if you
              | ever add a keyword, now you're updating that list there,
              | _and_ anywhere else you've used it.
             | 
             | As the "Lexer hack" Wiki page says, this is only a problem
             | if you're lexing in the first place. If you just parse the
             | grammar, this isn't a problem.
        
             | hnlmorg wrote:
              | The problem lies with shells' extensive use of barewords.
              | If you could eliminate the requirement for any bareword
              | to be treated as a string, then parsing shell code would
              | become much simpler... but also few people would want to
              | use it, because nobody wants to write the following in
              | their interactive shell:
              | 
              |     git "commit" "-am" "message"
              |     ls "-l"
              | 
              | etc
        
           | throwaway2016a wrote:
           | That's no problem in many modern lexers as they usually have
           | a "state" so when you encounter "echo" you can switch to a
           | new state and that state may have different token parsing
           | rules. So "if" in the "echo" state could be a string literal
           | whereas it may be a keyword in the initial state.
           | 
           | Lex/Flex takes care of that mostly for you which is one of
           | the benefits of using a well worn lexer generator and not
           | rolling your own.
        
         | xiaq wrote:
          | Right, I may have forgotten to mention that lexerless
          | parsers are somewhat unusual.
         | 
         | I didn't have much time in the talk to go into the reason, so
         | here it is:
         | 
          | - You'd need a more complex lexer to handle shell-like
          | syntax. For example, one common thing you do with lexers is
          | discard whitespace, but shell syntax is whitespace-sensitive:
          | "a$x" and "a $x" (double quotes not part of the code) are
          | different things: the first is a single word containing a
          | string concatenation, the second is two separate words.
         | 
         | - If your parser backtracks a lot, lexing can improve
         | performance: you're not going back characters, only tokens (and
         | there are fewer tokens than characters). Elvish's parser
         | doesn't backtrack. (It does use lookahead fairly liberally.)
         | 
          | Having a lexerless parser does mean that you have to deal
          | with whitespace everywhere, and it can get a bit annoying.
          | But personally I like the conceptual simplicity and not
          | having to deal with silly tokens like LBRACE, LPAREN, PIPE.
         | 
         | I have not used parser generators enough to comment about the
         | benefits of using them compared to writing a parser by hand.
         | The handwritten one works well so far :)
        
           | throwaway2016a wrote:
           | That example you gave could certainly be done in Lex/Flex and
           | I assume other lexers/tokenizers as well, for instance, you
           | would probably use states and have "$x" in the initial state
           | evaluate to a different token type than "$x" in the string
           | state.
           | 
            | But I do get your meaning. I've written a lot of
            | tokenizers by hand as well; sometimes the hand-written
            | code is easier to follow. Config files for grammars can
            | get convoluted fast.
           | 
            | Again, I didn't mean it as criticism. But your talk title
            | does start with "How to write a programming language and
            | shell in Go", so given the title I think lexers /
            | tokenizers are worth noting.
        
             | xiaq wrote:
             | Yeah, ultimately there's an element of personal taste at
             | play.
             | 
             | The authoritative tone of "how to write ..." is meant in
             | jest, but obviously by doing that I risk being
             | misunderstood. A more accurate title would be "how I wrote
             | ...", but it's slightly boring and I was trying hard to get
             | my talk proposal accepted you see :)
        
               | throwaway2016a wrote:
               | As someone who has given a handful of talks at
               | conferences.. 100% relatable.
        
         | lolinder wrote:
         | > Dealing with the Yacc file is often much easier than dealing
         | with code directly since it takes care of writing the actual
         | parser. There is a lot of boiler plate that goes into parsers
         | that when you use Yacc it "just works".
         | 
         | Honestly, I think this is overstating the amount of boilerplate
         | in a parser and overstating how well a parser generator "just
         | works". I haven't used Yacc, so maybe it's better than ANTLR,
         | but having tried ANTLR and written a few recursive descent
         | parsers I've been pretty well cured of wanting to ever use a
         | parser generator. ANTLR's generated code is verbose, the data
         | structures are hard to work with, and error handling leaves a
         | lot to be desired.
         | 
         | Parser boilerplate can be reduced to a large extent with a good
         | set of helper methods (I often find myself referring back to
         | the set used in Crafting Interpreters [0]), and what you get in
         | exchange is full control over the data structure generated by
         | the parser and over the error handling. For a language that
         | you're serious about, that tradeoff is totally worth it.
         | 
         | [0] http://craftinginterpreters.com/
        
           | pianoben wrote:
           | Maybe it's just my skill level, but I've used both hand-
           | rolled recursive-descent and ANTLR for the same project
           | (Thrift parser), and hoo boy I would _never_ go back to
           | recursive-descent for that. ANTLR shrank my code by an order
           | of magnitude, and cleaned up some bugs too.
           | 
           | I'd be willing to believe that beyond a certain level of
           | input complexity, ANTLR no longer pays for itself. In my
           | experience, there exists a class of languages for which
           | there's no better tool.
        
             | xiaq wrote:
             | I would love to see the diff between the hand-rolled
             | recursive-descent parser and the ANTLR syntax!
             | 
             | I certainly feel the amount of boilerplate in my hand-
             | rolled recursive-descent parser is manageable. Of course
             | it's not as succinct as an EBNF grammar:
             | 
             | - For example, you have to write an actual loop (with "for"
             | and looping conditions) instead of just * for repetition
             | 
             | - The Go formatter demands a newline in most control flows
             | 
             | - Go is also not the most succinct language in general
             | 
             | So you do end up with many more lines of code. But at the
             | end of the day, the structure of each parsing function is
             | remarkably similar to a production rule, and for simpler
             | ones I can mentally map between them pretty easily, with
             | the added benefit of being able to insert code anywhere if
             | I need something beyond old-school context-free parsing.
        
         | adastra22 wrote:
         | > This video seems to combine the concepts of lexing and
         | parsing. It is usually beneficial to separate these two steps
         | and lex the input into tokens before passing to the parser.
         | 
          | Historically, yes. In recent years combined lexer-parsers
          | have outperformed dedicated lexer + dedicated parser
          | combinations, and with modern tooling this isn't the janky
          | mess it used to be. Some of the best tools out there are
          | combined lexer-parsers.
        
         | eru wrote:
         | > - This video seems to combine the concepts of lexing and
         | parsing. It is usually beneficial to separate these two steps
         | and lex the input into tokens before passing to the parser.
         | 
          | With traditional techniques, yes. But if you e.g. use parser
          | combinators (which would admittedly be a bit unusual in Go),
          | combining both steps is pretty common.
         | 
         | > - Go actually has a pure Go implementation of Yacc in the
         | toolset and I've used it in several projects to make parses.
         | Dealing with the Yacc file is often much easier than dealing
         | with code directly since it takes care of writing the actual
         | parser. There is a lot of boiler plate that goes into parsers
         | that when you use Yacc it "just works".
         | 
         | You are right that it's best to avoid Go when you can. Just
         | like Java folks (stereotypically) seemed to avoid writing Java
         | at all costs and rather wrote XML config files to drive their
         | logic.
         | 
          | Yacc (and lex) are otherwise not a good choice for
          | specifying languages these days.
        
       ___________________________________________________________________
       (page generated 2024-08-31 23:02 UTC)