       ___________________________________________________________________
        
       Ohm - A library and language for building parsers, interpreters,
       compilers, etc.
        
       Author : testing_1_2_3_4
       Score  : 210 points
       Date   : 2021-03-27 16:20 UTC (6 hours ago)
        
 (HTM) web link (github.com)
 (TXT) w3m dump (github.com)
        
       | crazypython wrote:
       | This title is misleading. It's a library and language for
       | building parsers. Full stop. Parsing toolkit, as they say
       | themselves.
        
         | exdsq wrote:
         | The title copies the second sentence of their readme:
         | 
         | > You can use it to parse custom file formats or quickly build
         | parsers, interpreters, and compilers for programming languages.
        
           | UncleMeat wrote:
           | I guess it depends on what it means to somebody to build a
           | compiler. Something like yacc says "compiler compiler" in the
           | name but really it is a parser generator. The hard part of
           | industrial compilers is the optimization.
        
       | f430 wrote:
       | If I want to modify GraphQL to support custom syntax, would Ohm
       | work? Or does a solution exist already for my needs?
        
       | fjfaase wrote:
       | I recently wrote a similar parser, maybe less fancy, for a
       | workshop on parsing. It does display the abstract syntax tree
       | and also has a built-in evaluator for a limited set of language
       | constructs.
       | https://fransfaase.github.io/ParserWorkshop/Online_inter_par...
       | It is based on a parser I implemented in C++.
        
       | rkagerer wrote:
       | OHM is also the acronym for Open Hardware Monitor, a great open-
       | source project for monitoring computer temperatures, fan speeds,
       | voltages, etc: https://openhardwaremonitor.org/
        
       | pjmlp wrote:
       | Love it, this is great for teaching purposes.
        
       | tobr wrote:
       | Speaking of - what's the status of HARC? Is it defunct?
        
         | azeirah wrote:
         | Yep, HARC is no more. I don't recall the exact history but iirc
         | SAP withdrew its funding and HARC basically ceased to exist.
         | 
         | Now, ohm survives as an open-source project, Bret Victor
         | continues work with Dynamicland and Vi Hart is currently
         | employed at Microsoft Research.
        
         | jagger27 wrote:
         | Defunct enough to let their TLS cert expire.
        
       | corysama wrote:
       | This is a follow-up to a major component of the
       | http://vpri.org/writings.php project that created a self-
       | contained office suite, OS and compiler suite in something like
       | 100-200k lines of code without external dependencies.
        
         | hobo_mark wrote:
         | Do you have a link to the project? I'm failing to find it on
         | that page.
        
           | beagle3 wrote:
           | Not op, and can't google now but the project was called
           | STEPS, they did a down-to-metal os including network and GUI
           | (and more) in 20k lines.
           | 
           | Don't remember anything about office suite. Related names I
           | remember are Alan Kay, Dan Amelang, Alessandro Warth and Ian
           | Piumarta.
        
             | elgertam wrote:
             | The biggest artifact from STEPS was Frank, which was at the
             | time bootstrapped using Squeak Smalltalk and included the
             | work from Ian Piumarta (IDST/Maru, which was a fully
             | bootstrapped LISP down to the metal), Dan Amelang (Nile,
             | the graphics language, and Gezira, the 2.5D graphics
             | library implemented in Nile, which both depended on Maru),
             | Alex Warth (OMeta, which had some sort of relationship to
             | Ian's work on Maru), Yoshiki Ohshima (a lot of the
             | experimental things from Alan's demos of Frank were made by
             | Yoshiki) and then several other names. I got close to
             | getting Frank working, but honestly, I'm not sure it's
             | worth it at this point. A lot of the work is 10-15 years
             | old, and the last time I dove in, I ran into issues running
             | 32-bit binaries. The individual components are more
             | interesting and could be packaged together in some other
             | way.
             | 
             | Since it was a research project, STEPS never quite achieved
             | a cohesive, unified experience, but they proved that the
             | individual components could be substantially minimized and
             | the cost of developing them amortized over a large project
             | like a full GUI environment. Nile and some of the
             | applications of Maru, like a minimal but functioning TCP/IP
             | stack that can be compiled to bare metal by virtue of being
             | made in Maru, still fascinate me.
             | 
             | Work on Maru is ongoing, albeit run by a community (with
             | some input from Ian), Nile has been somewhat reborn of
             | late, Ohm is again under active development as the
             | successor to OMeta and Alan is still around.
             | 
             | (Source: Dan is a friend and colleague, and I've met a few
             | of the STEPS/VPRI people that way.)
        
               | xkriva11 wrote:
               | Can you publish what you have collected?
        
             | renox wrote:
             | The 'Word' equivalent was called Frank but AFAIK nobody has
             | been able to reproduce what was demonstrated.
             | 
             | Quite painfully ironic for a software research project that
             | they didn't properly use a VCS.
        
               | elgertam wrote:
               | They did use VCS, actually, but a lot of them used SVN
               | and each person in the STEPS project was hosting their
               | own code. Most of those servers have gone dark now,
               | though you can find random ports over to GitHub (rarely
               | with the version history). As far as I can tell, Dan
               | Amelang and Alex Warth were the only two who used git or
               | moved their code over to git.
        
             | hobo_mark wrote:
             | Thank you, funnily enough this led me back to the orange
             | website:
             | 
             | "STEPS Toward the Reinvention of Programming, 2012 Final
             | Report Submitted to the National Science Foundation (NSF)
             | October 2012"
             | 
             | https://news.ycombinator.com/item?id=11686325
        
           | e12e wrote:
           | See:
           | 
           | https://en.m.wikipedia.org/wiki/Ometa (including reference
           | section)
           | 
           | Or go to: http://www.vpri.org/writings.php
           | 
           | If I recall correctly you want: "STEPS Toward the Reinvention
           | of Programming, 2012 Final Report Submitted to the National
           | Science Foundation (NSF) October 2012" (and earlier reports)
           | 
           | Discussed on hn:
           | https://news.ycombinator.com/item?id=11686325
           | 
           | And: https://news.ycombinator.com/item?id=585360
           | 
           | Notable for implementing tcp/ip by parsing the rfc.
           | 
           | "A Tiny TCP/IP Using Non-deterministic Parsing
           | Principal Researcher: Ian Piumarta
           | 
           | For many reasons this has been on our list as a prime target
           | for extreme reduction. (...) See Appendix E for a more
           | complete explanation of how this "Tiny TCP" was realized in
           | well under 200 lines of code, including the definitions of
           | the languages for decoding header format and for controlling
           | the flow of packets."
           | 
           | (...)
           | 
           | "Appendix E: Extended Example: A Tiny TCP/IP Done as a Parser
           | (by Ian Piumarta)
           | 
           | Elevating syntax to a 'first-class citizen' of the programmer's
           | toolset suggests some unusually expressive alternatives to
           | complex, repetitive, opaque and/or error-prone code. Network
           | protocols are a perfect example of the clumsiness of
           | traditional programming languages obfuscating the simplicity
           | of the protocols and the internal structure of the packets
           | they exchange. We thought it would be instructive to see just
           | how transparent we could make a simple TCP/IP implementation.
           | Our first task is to describe the format of network packets.
           | Perfectly good descriptions already exist in the various IETF
           | Requests For Comments (RFCs) in the form of "ASCII-art
           | diagrams". This form was probably chosen because the structure
           | of a packet is immediately obvious just from glancing at the
           | pictogram. For example:
           | 
           |   +-------------+-------------+-------------------------+----------+----------------------------------------+
           |   | 00 01 02 03 | 04 05 06 07 | 08 09 10 11 12 13 14 15 | 16 17 18 | 19 20 21 22 23 24 25 26 27 28 29 30 31 |
           |   +-------------+-------------+-------------------------+----------+----------------------------------------+
           |   |   version   |  headerSize |      typeOfService      |                     length                        |
           |   +-------------+-------------+-------------------------+----------+----------------------------------------+
           |   |                     identification                  |  flags   |                  offset                |
           |   +---------------------------+-------------------------+----------+----------------------------------------+
           |   |       timeToLive          |         protocol        |                    checksum                       |
           |   +---------------------------+-------------------------+---------------------------------------------------+
           |   |                                               sourceAddress                                             |
           |   +---------------------------------------------------------------------------------------------------------+
           |   |                                             destinationAddress                                          |
           |   +---------------------------------------------------------------------------------------------------------+
           | 
           | If we teach our programming language to recognize pictograms
           | as definitions of accessors for bit fields within structures,
           | our program is the clearest of its own meaning. The following
           | expression creates an IS grammar that describes ASCII art
           | diagrams."
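
The idea in the quoted appendix, reading bit-field accessors straight off an ASCII-art diagram, can be sketched in a few lines of Python. This is not the STEPS implementation (which used its own grammar language): `parse_layout` and `read_field` are hypothetical helpers, and the diagram is a cut-down, 16-bit-wide toy layout that only borrows field names from the quoted figure.

```python
import re

# Toy 16-bit-wide layout (an assumption for this sketch, not the real IP header).
DIAGRAM = """
+-------------+-------------+-------------------------+
| 00 01 02 03 | 04 05 06 07 | 08 09 10 11 12 13 14 15 |
+-------------+-------------+-------------------------+
|   version   |  headerSize |      typeOfService      |
+-------------+-------------+-------------------------+
|       timeToLive          |        protocol         |
+---------------------------+-------------------------+
"""

def parse_layout(diagram):
    """Map each field name to (bit offset, bit width), read from the picture."""
    rows = [line for line in diagram.splitlines() if line.startswith("|")]
    ruler, field_rows = rows[0], rows[1:]
    # Column of every two-digit bit number in the ruler row.
    bit_columns = [(m.start(), int(m.group())) for m in re.finditer(r"\d\d", ruler)]
    word_bits = len(bit_columns)
    layout = {}
    for row_index, row in enumerate(field_rows):
        bars = [i for i, ch in enumerate(row) if ch == "|"]
        for left, right in zip(bars, bars[1:]):
            # A field covers the bit numbers whose columns sit between its bars.
            bits = [bit for col, bit in bit_columns if left < col < right]
            name = row[left + 1:right].strip()
            layout[name] = (row_index * word_bits + bits[0], len(bits))
    return layout

def read_field(layout, packet, name):
    """Extract a field from big-endian packet bytes using the parsed layout."""
    offset, width = layout[name]
    total_bits = len(packet) * 8
    value = int.from_bytes(packet, "big")
    return (value >> (total_bits - offset - width)) & ((1 << width) - 1)

layout = parse_layout(DIAGRAM)
packet = bytes([0x45, 0x00, 0x40, 0x06])
print(layout["headerSize"])                      # (4, 4)
print(read_field(layout, packet, "version"))     # 4
print(read_field(layout, packet, "timeToLive"))  # 64
```

The field widths are never written down as numbers; they fall out of where the `|` bars sit relative to the ruler row, which is the point the appendix is making.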
        
       | gklitt wrote:
       | Ohm's key selling point for me is the visual editor environment,
       | which shows how the parser is executing on various sample inputs
       | as you modify the grammar. It makes writing parsers fun rather
       | than tedious. One of the best applications of "live programming"
       | I've seen.
       | 
       | https://ohmlang.github.io/editor/
        
         | Waterluvian wrote:
         | A lot of regex testers do this and I can't imagine writing a
         | regex or a parser without.
        
       | tovej wrote:
       | Compiler compilers are great, I love writing DSLs for my
       | projects. I usually use yacc/lex, or write my own compiler
       | (typically in go these days).
       | 
       | However (and this is just me talking), I don't see the point in a
       | javascript-based compiler. Surely any file format/DSL/programming
       | language you write will be parsed server-side?
        
         | branneman wrote:
         | In that case, may I ask why you are not a Racket user? Sounds
         | like it'll save you a ton of time and keep your implementations
         | high level.
        
         | hansvm wrote:
         | (also just me talking -- here are some potential counterpoints)
         | 
         | The choice of language often matters a lot less than how
         | familiar you are with it (and its ecosystem(s)). I think it's
         | totally reasonable to want to use JS for a compiler in, e.g., a
         | Node project if for no other reason than to not have to learn
         | too many extra things at once to be productive with the new
         | tool.
         | 
         | I also don't think it's fair to assume everything will be
         | parsed, tokenized, etc server-side. Even assuming that data
         | originates server-side (since if it didn't you very well might
         | have a compelling case for handling it client-side if for no
         | other reason than latency), it's moderately popular nowadays to
         | serve a basically static site describing a bunch of dynamic
         | things for the frontend to do. Doing so can make it
         | easier/cheaper to hit any given SLA at the cost of making your
         | site unusable for underpowered clients and pushing those costs
         | to your users, and that tradeoff isn't suitable everywhere, but
         | it does exist.
         | 
         | It's interesting that you seem to implicitly assume the only
         | reason somebody would choose JS is that they're writing
         | frontend code. It's personally not my first choice for most
         | things, but it's not too hard to imagine that some aspect of JS
         | (e.g., npm) might make it a top contender for a particular
         | project despite its other flaws and tradeoffs.
        
         | RodgerTheGreat wrote:
         | There's a great deal of value to making programming
         | environments available in a browser, especially in the context
         | of creative coding and education. I have built and used many
         | such tools which are purely client-side.
         | 
         | There is a world of difference in accessibility between a tool
         | that requires installation and a tool that you can use by
         | following a hyperlink.
        
         | breck wrote:
         | > I don't see the point in a javascript-based compiler
         | 
         | My CC is Javascript based (well it was initially, then
         | TypeScript, now a lot of it is written in itself).
         | 
         | 99% of the time I use the actual languages I make in it server
         | side (nodejs), but I am able to develop the languages in my
         | browser using https://jtree.treenotation.org/designer/. It's
         | super easy and fun (at least for me, UX sucks for most people
         | at the moment). There's something somewhat magical about being
         | able to tweak a language from my iPhone and then send the new
         | lang to someone via text. (Warning: Designer is still hard to
         | use and a big refresh is overdue).
        
         | coldtea wrote:
         | > _Surely any file format/DSL/programming language you write
         | will be parsed server-side?_
         | 
         | Well, Javascript has been used for over a decade heavily on the
         | server side, with Node, WASM and other projects.
         | 
         | And as far as raw speed goes, something like v8 smokes all
         | scripting languages bar maybe LuaJit.
         | 
         | So, there's that...
        
         | chrisseaton wrote:
         | > I don't see the point in a javascript-based compiler
         | 
         | JavaScript is a full programming language. Why wouldn't it be a
         | fine choice to write a compiler in? People have a funny idea
         | that compilers are more complex software or are somehow
         | something low-level? In reality they're conceptually simple -
         | as long as your language lets you write a function from one
         | array of bytes to another array of bytes, then you can write a
         | compiler in it. And for practicalities beyond that you just
         | need basic records or objects or some other kind of structure,
         | and you can have a pleasant experience writing a compiler.
         | 
         | > Surely any file format/DSL/programming language you write
         | will be parsed server-side?
         | 
         | JavaScript can be used user-side, or anywhere else. It's just a
         | regular programming language.
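
As a rough illustration of the "function from one array of bytes to another" framing, here is a minimal Python sketch: a recursive-descent compiler from arithmetic expressions to text for a made-up stack machine. The name `compile_expr` and the instruction names (`PUSH`, `ADD`, `SUB`, `MUL`, `DIV`) are assumptions for the example, and there is no error handling for malformed input.

```python
import re

def compile_expr(source: bytes) -> bytes:
    """Compile an arithmetic expression into stack-machine text (bytes in, bytes out)."""
    tokens = re.findall(r"\d+|[-+*/()]", source.decode("ascii"))
    pos = 0
    out = []

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        pos += 1
        return tokens[pos - 1]

    def factor():
        # factor ::= number | "(" expr ")"
        tok = advance()
        if tok == "(":
            expr()
            advance()  # skip the closing ")"
        else:
            out.append(f"PUSH {tok}")

    def term():
        # term ::= factor (("*" | "/") factor)*
        factor()
        while peek() in ("*", "/"):
            op = advance()
            factor()
            out.append("MUL" if op == "*" else "DIV")

    def expr():
        # expr ::= term (("+" | "-") term)*
        term()
        while peek() in ("+", "-"):
            op = advance()
            term()
            out.append("ADD" if op == "+" else "SUB")

    expr()
    return "\n".join(out).encode("ascii")

print(compile_expr(b"1 + 2 * 3").decode())
```

Nothing here needs more than strings, lists and closures, which any general-purpose language provides.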
        
           | dw-im-here wrote:
           | I'd rather put my hand in boiling water than develop a
           | compiler in a dynamic, weakly typed language.
        
             | chrisseaton wrote:
             | My experience doing both in practice is that the type
             | system helps you with things that aren't really a problem
             | anyway (a compiler doesn't really have complex data
             | structures and you don't often get these basic things
             | wrong) and all but the most sophisticated type systems
             | don't even begin to help you with things you really need
             | help with - maintaining invariants.
        
             | hutzlibu wrote:
             | Well .. I did both and oh surprise, I got a few burnings
             | ...
             | 
             | (actually more from the compiler in Javascript. Putting my
             | hands for a moment in boiling water actually does not burn
             | them)
        
             | pwdisswordfish6 wrote:
             | Write a compiler in a strongly typed language, and then
             | remove all the type annotations. This may come as a shock,
             | but this is what a compiler (or any codebase) could look
             | like when developed in a weakly typed language.
        
               | mintplant wrote:
               | That doesn't work if you're using types for anything
               | beyond correctness-checking. Type-driven dispatch, for
               | example, which tends to be used heavily in big compiler
               | and interpreter projects. And tagged unions (or algebraic
               | datatypes), a natural fit for representing ASTs, become
               | more unwieldy without type-directed features like pattern
               | matching.
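
For readers unfamiliar with the term, type-driven dispatch over an AST can be sketched with Python's `functools.singledispatch`. The node classes (`Num`, `Add`, `Mul`) and the `evaluate` function are hypothetical, chosen only for illustration; a statically typed language with algebraic datatypes would also check that every node kind is handled, which this sketch does not.

```python
from dataclasses import dataclass
from functools import singledispatch

# A tiny AST modelled as one class per node kind (a poor man's tagged union).
@dataclass
class Num:
    value: int

@dataclass
class Add:
    left: object
    right: object

@dataclass
class Mul:
    left: object
    right: object

@singledispatch
def evaluate(node):
    # Fallback for node types with no registered handler.
    raise TypeError(f"unknown node: {node!r}")

@evaluate.register
def _(node: Num):
    return node.value

@evaluate.register
def _(node: Add):
    return evaluate(node.left) + evaluate(node.right)

@evaluate.register
def _(node: Mul):
    return evaluate(node.left) * evaluate(node.right)

tree = Add(Num(1), Mul(Num(2), Num(3)))
print(evaluate(tree))  # 7
```

Here the class of the node, not an explicit `if`/`else` chain, selects which handler runs, so stripping the type information would change behaviour rather than just remove annotations.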
        
               | pwdisswordfish6 wrote:
               | Sounds like a double standard and possibly moving the
               | goalposts. There are strongly typed languages that don't
               | have those features, and compiler codebases that don't
               | use that kind of architecture. Do they get a pass or not?
        
           | e12e wrote:
           | > I don't see the point in a javascript-based compiler
           | 
           | Typescript, sass, jsx... There are a lot of languages running
           | on top of js. Or you might want to do colorizing,
           | autoformating on input in the browser?
           | 
           | Along with all that, there's as mentioned nodejs, deno for
           | running server side.
           | 
           | But at any rate - lots of front-end problems involve various
           | kinds of parsing/validation and transformation (eg:
           | processing.js).
        
         | acarabott wrote:
         | I interned with the PI behind Ohm (Alex Warth) and one of his
         | reasons for using the browser was simple:
         | 
         | "If I send someone an executable, they will never download it.
         | If I send them a URL, they have no excuse."
        
           | BiteCode_dev wrote:
           | We are talking about a compiler here.
           | 
           | If someone interested in a compiler doesn't download it, it's
           | not an excuse, it's a filter. Or a warning sign.
        
             | coolreader18 wrote:
             | I mean it's JavaScript, I don't think it's intended for you
             | to write C compilers in it - but for compile-to-JS
             | languages, it's a real asset to be able to run it in the
             | browser, although more and more that can be done with
             | WebAssembly as well. However, look at the project listed as
             | using it - it may not even be for web languages, but just
             | projects that need to parse something.
        
             | coldtea wrote:
             | Spoken like someone who has never taught real students!
        
             | pwdisswordfish6 wrote:
             | You know all those jokes that people like Linus make about
             | Real Programmers--the ones who have hair on their chests,
             | etc--you know those are all _jokes_, right? Jokes in the
             | laughing-at-them sort of way, the way Colbert did it--not
             | something that you're supposed to unironically buy into.
             | 
             | > If someone interested in a compiler doesn't download it,
             | it's not an excuse, it's a filter. Or a warning sign.
             | 
             | You're so invested in gatekeeping that you're confusing the
             | point of research with technofetishism.
             | 
             | Here's what Joe Armstrong had to say in "The Mess We're
             | In":
             | 
             | "I downloaded this program, and I followed the
             | instructions, and it said I didn't have grunt installed!
             | [...Then] I installed grunt, and it said grunt was
             | installed, and then I ran the script that was gonna make my
             | slides... and it said 'Unable to find local grunt'."
             | 
             | Looks like someone needs to go dig up Joe and let him know
             | that the real problem is that there was a mistake in
             | letting him get past the point where he was supposed to be
             | filtered out. He was never supposed to be playing amongst
             | the Real Programmers.
        
             | d110af5ccf wrote:
             | > doesn't download it, it's not an excuse, it's a filter
             | 
             | If it's a decently large project, sure. But if it's a small
             | project with only a couple contributors who I've never
             | heard of? There's the potential for that to be hiding
             | malicious code. Plus the potential complexity of getting a
             | project that's only ever been built on (say) 2 computers to
             | successfully compile and run on _my_ system. Plus figuring
             | out whatever build system and weird flags they happen to
             | use. And potentially wrangling a bunch of dependencies.
             | 
             | All that just to take a quick look at a language that might
             | not actually be of interest to me in the end. The browser
             | offers huge benefits here - follow a link and play around
             | in a text box. It _just works_. (This is also why I use
             | Godbolt - I don't want to bother with a Windows VM or
             | wrangle ten different versions of Clang and GCC myself.)
        
         | kesava wrote:
         | A ton of front end templating languages/frameworks. They
         | involve compilers to different degrees, don't they?
        
         | TheRealPomax wrote:
         | If your ecosystem is JS, having a JS based compiler is pretty
         | convenient. As long as it's just "slower by some constant",
         | rather than by a runtime order, the fact that it's not as fast
         | as yacc/bison etc. is pretty much irrelevant, so being able to
         | keep everything JS is quite powerful for people new to the idea
         | who started their programming career using JS, as well as
         | seasoned devs working in large JS codebases.
         | 
         | (and you can always decide that you need more speed - if you
         | have a grammar defined, it's almost trivial to feed it to some
         | other parser-generator)
        
         | [deleted]
        
         | peterhunt wrote:
         | There's definitely a use for js based parsing for tooling that
         | runs in the browser (autocomplete, documentation browsing etc).
         | Integration with the Monaco editor is a common use case.
        
       | TheRealPomax wrote:
       | It'd be cool if the online editor dispensed with the need to
       | "write the grammar" entirely. A node based parser-generator in
       | addition to Ohm being yet another grammar based parser-generator
       | would be pretty great.
        
         | ampdepolymerase wrote:
         | Even better would be to generate a parser from examples. See the
         | Microsoft Research Excel Flash Fill paper.
        
       | joshmarinacci wrote:
       | I'm so happy to see this on HN. I've used Ohm for several
       | projects. If you want a tutorial for building a simple
       | programming language using Ohm, check out this series I put on
       | GitHub.
       | 
       | https://github.com/joshmarinacci/meowlang
        
       | j0e1 wrote:
       | This is an example of a library we built using Ohm:
       | https://github.com/Bridgeconn/usfm-grammar [1]
       | 
       | It works great for our use-case though I have been eyeing tree-
       | sitter[2] for its ability to do partial parses.
       | 
       | [1] USFM: https://ubsicap.github.io/usfm/
       | [2] https://tree-sitter.github.io/tree-sitter/
        
       | branneman wrote:
       | When should one use Ohm over Racket?
        
         | coldtea wrote:
         | When they want a library and toolkit for building parsers and
         | languages, rather than a general programming language based on
         | Scheme.
        
           | branneman wrote:
           | ... but racket basically exists to create parsers and
           | languages. It happens to also be a general programming
           | language. But so is JS nowadays with Node.
        
           | dunefox wrote:
           | So, I guess you don't know why OP specifically asked about
           | Racket: https://www.cs.utah.edu/plt/dagstuhl19/
           | https://beautifulracket.com/stacker/why-make-languages.html
        
             | coldtea wrote:
             | Nah, I know about Racket's DSL support and touting itself
             | as friendly to language writing, but it's still not the
             | same as a dedicated parsing toolkit, the same way I
             | wouldn't consider a Lisp with reader macros equivalent
             | either...
        
       | hardwaregeek wrote:
       | I've used PEGs in the past. They're nice since they combine the
       | mental model of LL grammars with the automation of LALR parser
       | generators. However, it is quite easy to accidentally write rules
       | where you never parse the second rule due to the ordering
       | priority for rules. For instance:
       | 
       |     ident ::= name | name ("." name)+
       | 
       | Because with PEGs, the parser tries the first rule, then the
       | second, and because whenever the second rule matches, the first
       | one will also match, we will never parse the second rule. That's
       | kinda annoying.
       | 
       | Of course with PEG tools you could probably solve this by
       | computing the first sets for both rules and noticing that they're
       | the same. Hopefully that's what this tool does.
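
The shadowing described above can be reproduced with a few hand-rolled PEG-style combinators. This is a minimal sketch, not Ohm's API: `regex`, `lit`, `seq`, `choice`, `plus` and `parse_all` are hypothetical helpers, and `name` is assumed to be one or more lowercase letters.

```python
import re

# Each parser takes (text, pos) and returns the new position, or None on failure.

def regex(pattern):
    compiled = re.compile(pattern)
    def parse(text, pos):
        m = compiled.match(text, pos)
        return m.end() if m else None
    return parse

def lit(s):
    def parse(text, pos):
        return pos + len(s) if text.startswith(s, pos) else None
    return parse

def seq(*parsers):
    def parse(text, pos):
        for p in parsers:
            pos = p(text, pos)
            if pos is None:
                return None
        return pos
    return parse

def choice(*parsers):
    # PEG ordered choice: commit to the first alternative that succeeds.
    def parse(text, pos):
        for p in parsers:
            end = p(text, pos)
            if end is not None:
                return end
        return None
    return parse

def plus(parser):
    # One or more repetitions, greedy.
    def parse(text, pos):
        pos = parser(text, pos)
        if pos is None:
            return None
        while True:
            nxt = parser(text, pos)
            if nxt is None:
                return pos
            pos = nxt
    return parse

def parse_all(parser, text):
    # Succeed only if the whole input is consumed.
    return parser(text, 0) == len(text)

name = regex(r"[a-z]+")
dotted = seq(name, plus(seq(lit("."), name)))

shadowed = choice(name, dotted)   # ident ::= name | name ("." name)+
reordered = choice(dotted, name)  # longer alternative listed first

print(parse_all(shadowed, "foo.bar"))   # False: stops after "foo"
print(parse_all(reordered, "foo.bar"))  # True
print(parse_all(reordered, "foo"))      # True
```

With the original ordering the first alternative consumes `foo` and the choice never retries, so `.bar` is left over; listing the longer alternative first is the usual fix.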
        
         | sleavey wrote:
         | This is what's called left-recursion, and there's indeed a way
         | to deal with it in PEG parsers:
         | https://github.com/PhilippeSigaud/Pegged/wiki/Left-Recursion.
        
       ___________________________________________________________________
       (page generated 2021-03-27 23:00 UTC)