[HN Gopher] Ohm - A library and language for building parsers, i...
___________________________________________________________________
Ohm - A library and language for building parsers, interpreters,
compilers, etc.
Author : testing_1_2_3_4
Score : 210 points
Date : 2021-03-27 16:20 UTC (6 hours ago)
(HTM) web link (github.com)
(TXT) w3m dump (github.com)
| crazypython wrote:
| This title is misleading. It's a library and language for
| building parsers. Full stop. Parsing toolkit, as they say
| themselves.
| exdsq wrote:
| The title copies the second sentence of their readme:
|
| > You can use it to parse custom file formats or quickly build
| parsers, interpreters, and compilers for programming languages.
| UncleMeat wrote:
| I guess it depends on what it means to somebody to build a
| compiler. Something like yacc says "compiler compiler" in the
| name but really it is a parser generator. The hard part of
| industrial compilers is the optimization.
| f430 wrote:
| If I want to modify GraphQL to support custom syntax, would Ohm
| work? Or does a solution exist already for my needs?
| fjfaase wrote:
| I recently wrote a similar parser, maybe less fancy, for a
| workshop on parsing. It displays the abstract syntax tree and
| also has a built-in evaluator for a limited set of language
| constructs.
| https://fransfaase.github.io/ParserWorkshop/Online_inter_par...
| It is based on a parser I implemented in C++.
| rkagerer wrote:
| OHM is also the acronym for Open Hardware Monitor, a great open-
| source project for monitoring computer temperatures, fan speeds,
| voltages, etc: https://openhardwaremonitor.org/
| pjmlp wrote:
| Love it, this is great for teaching purposes.
| tobr wrote:
| Speaking of - what's the status of HARC? Is it defunct?
| azeirah wrote:
| Yep, HARC is no more. I don't recall the exact history but iirc
| SAP withdrew its funding and HARC basically ceased to exist.
|
| Now, ohm survives as an open-source project, Bret Victor
| continues work with Dynamicland and Vi Hart is currently
| employed at Microsoft Research.
| jagger27 wrote:
| Defunct enough to let their TLS cert expire.
| corysama wrote:
| This is a follow-up to a major component of the
| http://vpri.org/writings.php project that created a
| self-contained office suite, OS and compiler suite in something like
| 100-200k lines of code without external dependencies.
| hobo_mark wrote:
| Do you have a link to the project? I'm failing to find it on
| that page.
| beagle3 wrote:
| Not op, and can't google now but the project was called
| STEPS, they did a down-to-metal os including network and GUI
| (and more) in 20k lines.
|
| Don't remember anything about office suite. Related names I
| remember are Alan Kay, Dan Amelang, Alessandro Warth and Ian
| Piumarta.
| elgertam wrote:
| The biggest artifact from STEPS was Frank, which was at the
| time bootstrapped using Squeak Smalltalk and included the
| work from Ian Piumarta (IDST/Maru, which was a fully
| bootstrapped LISP down to the metal), Dan Amelang (Nile,
| the graphics language, and Gezira, the 2.5D graphics
| library implemented in Nile, which both depended on Maru),
| Alex Warth (OMeta, which had some sort of relationship to
| Ian's work on Maru), Yoshiki Ohshima (a lot of the
| experimental things from Alan's demos of Frank were made by
| Yoshiki) and then several other names. I got close to
| getting Frank working, but honestly, I'm not sure it's
| worth it at this point. A lot of the work is 10-15 years
| old, and the last time I dove in, I ran into issues running
| 32-bit binaries. The individual components are more
| interesting and could be packaged together in some other
| way.
|
| Since it was a research project, STEPS never quite achieved
| a cohesive, unified experience, but they proved that the
| individual components could be substantially minimized and
| the cost of developing them amortized over a large project
| like a full GUI environment. Nile and some of the
| applications of Maru, like a minimal but functioning TCP/IP
| stack that can be compiled to bare metal by virtue of being
| made in Maru, still fascinate me.
|
| Work on Maru is ongoing, albeit run by a community (with
| some input from Ian), Nile has been somewhat reborn of
| late, Ohm is again under active development as the
| successor to OMeta and Alan is still around.
|
| (Source: Dan is a friend and colleague, and I've met a few
| of the STEPS/VPRI people that way.)
| xkriva11 wrote:
| Can you publish what you have collected?
| renox wrote:
| The 'Word' equivalent was called Frank but AFAIK nobody has
| been able to reproduce what was demonstrated..
|
| Quite painfully ironic for a software research project that
| they didn't properly use a VCS.
| elgertam wrote:
| They did use VCS, actually, but a lot of them used SVN
| and each person in the STEPS project was hosting their
| own code. Most of those servers have gone dark now,
| though you can find random ports over to GitHub (rarely
| with the version history). As far as I can tell, Dan
| Amelang and Alex Warth were the only two who used git or
| moved their code over to git.
| hobo_mark wrote:
| Thank you, funnily enough this led me back to the orange
| website:
|
| "STEPS Toward the Reinvention of Programming, 2012 Final
| Report Submitted to the National Science Foundation (NSF)
| October 2012"
|
| https://news.ycombinator.com/item?id=11686325
| e12e wrote:
| See:
|
| https://en.m.wikipedia.org/wiki/Ometa (including reference
| section)
|
| Or go to: http://www.vpri.org/writings.php
|
| If I recall correctly you want: "STEPS Toward the Reinvention
| of Programming, 2012 Final Report Submitted to the National
| Science Foundation (NSF) October 2012" (and earlier reports)
|
| Discussed on hn:
| https://news.ycombinator.com/item?id=11686325
|
| And: https://news.ycombinator.com/item?id=585360
|
| Notable for implementing tcp/ip by parsing the rfc.
|
| "A Tiny TCP/IP Using Non-deterministic Parsing
| Principal Researcher: Ian Piumarta
|
| For many reasons this has been on our list as a prime target
| for extreme reduction. (...) See Appendix E for a more
| complete explanation of how this "Tiny TCP" was realized in
| well under 200 lines of code, including the definitions of
| the languages for decoding header format and for controlling
| the flow of packets."
|
| (...)
|
| "Appendix E: Extended Example: A Tiny TCP/IP Done as a Parser
| (by Ian Piumarta)
|
| Elevating syntax to a 'first-class citizen' of the programmer's
| toolset suggests some unusually expressive alternatives to
| complex, repetitive, opaque and/or error-prone code. Network
| protocols are a perfect example of the clumsiness of traditional
| programming languages obfuscating the simplicity of the
| protocols and the internal structure of the packets they
| exchange. We thought it would be instructive to see just how
| transparent we could make a simple TCP/IP implementation. Our
| first task is to describe the format of network packets.
| Perfectly good descriptions already exist in the various IETF
| Requests For Comments (RFCs) in the form of "ASCII-art
| diagrams". This form was probably chosen because the structure
| of a packet is immediately obvious just from glancing at the
| pictogram. For example:
|
| +-------------+-------------+-------------------------+----------+----------------------------------------+
| | 00 01 02 03 | 04 05 06 07 | 08 09 10 11 12 13 14 15 | 16 17 18 | 19 20 21 22 23 24 25 26 27 28 29 30 31 |
| +-------------+-------------+-------------------------+----------+----------------------------------------+
| |   version   | headerSize  |      typeOfService      |                      length                       |
| +-------------+-------------+-------------------------+----------+----------------------------------------+
| |                   identification                    |  flags   |                 offset                 |
| +---------------------------+-------------------------+----------+----------------------------------------+
| |        timeToLive         |        protocol         |                     checksum                      |
| +---------------------------+-------------------------+---------------------------------------------------+
| |                                              sourceAddress                                              |
| +---------------------------------------------------------------------------------------------------------+
| |                                           destinationAddress                                            |
| +---------------------------------------------------------------------------------------------------------+
|
| If we teach our programming language to recognize pictograms
| as definitions of accessors for bit fields within structures,
| our program is the clearest of its own meaning. The following
| expression creates an IS grammar that describes ASCII art
| diagrams."
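|
| The pictogram-as-accessor idea quoted above can be sketched in a
| few lines of Python. This is not Piumarta's implementation (the
| report describes a grammar-based one); it is a rough
| illustration that assumes every row of the diagram is one 32-bit
| word. The names parse_pictogram and read_field, and the sample
| packet bytes, are made up for this example.

```python
import re

# The header diagram quoted above; each "|" row after the bit-number row is one 32-bit word.
DIAGRAM = """\
+-------------+-------------+-------------------------+----------+----------------------------------------+
| 00 01 02 03 | 04 05 06 07 | 08 09 10 11 12 13 14 15 | 16 17 18 | 19 20 21 22 23 24 25 26 27 28 29 30 31 |
+-------------+-------------+-------------------------+----------+----------------------------------------+
|   version   | headerSize  |      typeOfService      |                      length                       |
+-------------+-------------+-------------------------+----------+----------------------------------------+
|                   identification                    |  flags   |                 offset                 |
+---------------------------+-------------------------+----------+----------------------------------------+
|        timeToLive         |        protocol         |                     checksum                      |
+---------------------------+-------------------------+---------------------------------------------------+
|                                              sourceAddress                                              |
+---------------------------------------------------------------------------------------------------------+
|                                           destinationAddress                                            |
+---------------------------------------------------------------------------------------------------------+
"""


def parse_pictogram(diagram):
    """Map each field name to (bit offset from packet start, width in bits)."""
    lines = [ln for ln in diagram.splitlines() if ln.startswith("|")]
    header, rows = lines[0], lines[1:]
    # Text column at which each bit number (00..31) starts in the header row.
    bit_cols = [m.start() for m in re.finditer(r"\d\d", header)]
    fields = {}
    for word_index, row in enumerate(rows):
        bars = [i for i, ch in enumerate(row) if ch == "|"]
        for left, right in zip(bars, bars[1:]):
            name = row[left + 1:right].strip()
            # A field covers the bits whose numbers sit between its two bars.
            bits = [b for b, col in enumerate(bit_cols) if left < col < right]
            fields[name] = (word_index * 32 + bits[0], len(bits))
    return fields


def read_field(packet, fields, name):
    """Extract a big-endian bit field from raw packet bytes."""
    offset, width = fields[name]
    value = int.from_bytes(packet, "big")
    shift = len(packet) * 8 - offset - width
    return (value >> shift) & ((1 << width) - 1)


FIELDS = parse_pictogram(DIAGRAM)
# Hypothetical 20-byte IPv4 header, invented for illustration.
PACKET = bytes.fromhex("4500 0054 0000 4000 4001 0000 c0a8 0001 c0a8 00c7")

for field in ("version", "headerSize", "length", "flags", "timeToLive"):
    print(field, read_field(PACKET, FIELDS, field))
```

| The diagram is the only place where field names, order and
| widths are written down; the accessors are derived from it.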
| gklitt wrote:
| Ohm's key selling point for me is the visual editor environment,
| which shows how the parser is executing on various sample inputs
| as you modify the grammar. It makes writing parsers fun rather
| than tedious. One of the best applications of "live programming"
| I've seen.
|
| https://ohmlang.github.io/editor/
| Waterluvian wrote:
| A lot of regex testers do this and I can't imagine writing a
| regex or a parser without.
| tovej wrote:
| Compiler compilers are great, I love writing DSLs for my
| projects. I usually use yacc/lex, or write my own compiler
| (typically in go these days).
|
| However (and this is just me talking), I don't see the point in a
| javascript-based compiler. Surely any file format/DSL/programming
| language you write will be parsed server-side?
| branneman wrote:
| In that case, may I ask why you are not a Racket user? Sounds
| like it'll save you a ton of time and keep your implementations
| high level.
| hansvm wrote:
| (also just me talking -- here are some potential counterpoints)
|
| The choice of language often matters a lot less than how
| familiar you are with it (and its ecosystem(s)). I think it's
| totally reasonable to want to use JS for a compiler in, e.g., a
| Node project if for no other reason than to not have to learn
| too many extra things at once to be productive with the new
| tool.
|
| I also don't think it's fair to assume everything will be
| parsed, tokenized, etc server-side. Even assuming that data
| originates server-side (since if it didn't you very well might
| have a compelling case for handling it client-side if for no
| other reason than latency), it's moderately popular nowadays to
| serve a basically static site describing a bunch of dynamic
| things for the frontend to do. Doing so can make it
| easier/cheaper to hit any given SLA at the cost of making your
| site unusable for underpowered clients and pushing those costs
| to your users, and that tradeoff isn't suitable everywhere, but
| it does exist.
|
| It's interesting that you seem to implicitly assume the only
| reason somebody would choose JS is that they're writing
| frontend code. It's personally not my first choice for most
| things, but it's not too hard to imagine that some aspect of JS
| (e.g., npm) might make it a top contender for a particular
| project despite its other flaws and tradeoffs.
| RodgerTheGreat wrote:
| There's a great deal of value to making programming
| environments available in a browser, especially in the context
| of creative coding and education. I have built and used many
| such tools which are purely client-side.
|
| There is a world of difference in accessibility between a tool
| that requires installation and a tool that you can use by
| following a hyperlink.
| breck wrote:
| > I don't see the point in a javascript-based compiler
|
| My CC is Javascript based (well it was initially, then
| TypeScript, now a lot of it is written in itself).
|
| 99% of the time I use the actual languages I make in it server
| side (nodejs), but I am able to develop the languages in my
| browser using https://jtree.treenotation.org/designer/. It's
| super easy and fun (at least for me, UX sucks for most people
| at the moment). There's something somewhat magical about being
| able to tweak a language from my iPhone and then send the new
| lang to someone via text. (Warning: Designer is still hard to
| use and a big refresh is overdue).
| coldtea wrote:
| > _Surely any file format/DSL/programming language you write
| will be parsed server-side?_
|
| Well, Javascript has been used for over a decade heavily on the
| server side, with Node, WASM and other projects.
|
| And as far as raw speed goes, something like v8 smokes all
| scripting languages bar maybe LuaJit.
|
| So, there's that...
| chrisseaton wrote:
| > I don't see the point in a javascript-based compiler
|
| JavaScript is a full programming language. Why wouldn't it be a
| fine choice to write a compiler in? People have a funny idea
| that compilers are more complex software or are somehow
| something low-level? In reality they're conceptually simple -
| as long as your language lets you write a function from one
| array of bytes to another array of bytes, then you can write a
| compiler in it. And for practicalities beyond that you just
| need basic records or objects or some other kind of structure,
| and you can have a pleasant experience writing a compiler.
|
| > Surely any file format/DSL/programming language you write
| will be parsed server-side?
|
| JavaScript can be used user-side, or anywhere else. It's just a
| regular programming language.
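|
| To make the "function from one array of bytes to another" point
| concrete, here is a toy sketch in Python. The three-opcode stack
| machine, compile_expr and run are all invented for this
| illustration and have nothing to do with Ohm; operands are
| assumed to be integers in 0..255.

```python
PUSH, ADD, MUL = 0x01, 0x02, 0x03  # made-up opcodes for a toy stack machine


def compile_expr(source: bytes) -> bytes:
    """Compile e.g. b"1+2*3" into bytecode: bytes in, bytes out."""
    tokens = source.decode("ascii").replace("+", " + ").replace("*", " * ").split()
    out = bytearray()
    pos = 0

    def factor():
        nonlocal pos
        out.extend([PUSH, int(tokens[pos])])  # operand must fit in one byte
        pos += 1

    def term():
        nonlocal pos
        factor()
        while pos < len(tokens) and tokens[pos] == "*":
            pos += 1
            factor()
            out.append(MUL)

    def expr():
        nonlocal pos
        term()
        while pos < len(tokens) and tokens[pos] == "+":
            pos += 1
            term()
            out.append(ADD)

    expr()
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos]!r}")
    return bytes(out)


def run(code: bytes) -> int:
    """Tiny interpreter for the bytecode above, used to check the compiler."""
    stack = []
    i = 0
    while i < len(code):
        op = code[i]
        if op == PUSH:
            stack.append(code[i + 1])
            i += 2
        else:
            b, a = stack.pop(), stack.pop()
            stack.append(a + b if op == ADD else a * b)
            i += 1
    return stack.pop()


bytecode = compile_expr(b"1+2*3")
print(bytecode.hex(), run(bytecode))
```

| Nothing here needs more than lists, integers and nested
| functions, which any general-purpose language provides.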
| dw-im-here wrote:
| I'd rather put my hand in boiling water than develop a
| compiler in a dynamically and weakly typed language.
| chrisseaton wrote:
| My experience doing both in practice is that the type
| system helps you with things that aren't really a problem
| anyway (a compiler doesn't really have complex data
| structures and you don't often get these basic things
| wrong) and all but the most sophisticated type systems
| don't even begin to help you with things you really need
| help with - maintaining invariants.
| hutzlibu wrote:
| Well .. I did both and oh surprise, I got a few burns
| ...
|
| (actually more from the compiler in Javascript. Putting my
| hands for a moment in boiling water actually does not burn
| them)
| pwdisswordfish6 wrote:
| Write a compiler in a strongly typed language, and then
| remove all the type annotations. This may come as a shock,
| but this is what a compiler (or any codebase) could look
| like when developed in a weakly typed language.
| mintplant wrote:
| That doesn't work if you're using types for anything
| beyond correctness-checking. Type-driven dispatch, for
| example, which tends to be used heavily in big compiler
| and interpreter projects. And tagged unions (or algebraic
| datatypes), a natural fit for representing ASTs, become
| more unwieldy without type-directed features like pattern
| matching.
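|
| A small Python sketch of the contrast described above: the same
| AST evaluated once through type-driven dispatch (the standard
| library's functools.singledispatch) and once through hand-written
| tag checks on plain dicts. Num, Add, Mul, evaluate and
| evaluate_tagged are hypothetical names chosen for this example.

```python
from dataclasses import dataclass
from functools import singledispatch
from typing import Union


@dataclass(frozen=True)
class Num:
    value: int


@dataclass(frozen=True)
class Add:
    left: "Expr"
    right: "Expr"


@dataclass(frozen=True)
class Mul:
    left: "Expr"
    right: "Expr"


Expr = Union[Num, Add, Mul]  # a tagged union: the class is the tag


@singledispatch
def evaluate(node) -> int:
    # Fallback for node types that have no registered handler.
    raise TypeError(f"unknown node: {node!r}")


@evaluate.register
def _(node: Num) -> int:
    return node.value


@evaluate.register
def _(node: Add) -> int:
    return evaluate(node.left) + evaluate(node.right)


@evaluate.register
def _(node: Mul) -> int:
    return evaluate(node.left) * evaluate(node.right)


def evaluate_tagged(node: dict) -> int:
    """Same logic without types: the tag is a string checked by hand."""
    tag = node["tag"]
    if tag == "num":
        return node["value"]
    if tag == "add":
        return evaluate_tagged(node["left"]) + evaluate_tagged(node["right"])
    if tag == "mul":
        return evaluate_tagged(node["left"]) * evaluate_tagged(node["right"])
    raise ValueError(f"unknown tag: {tag!r}")


typed_tree = Add(Num(1), Mul(Num(2), Num(3)))
tagged_tree = {
    "tag": "add",
    "left": {"tag": "num", "value": 1},
    "right": {
        "tag": "mul",
        "left": {"tag": "num", "value": 2},
        "right": {"tag": "num", "value": 3},
    },
}
print(evaluate(typed_tree), evaluate_tagged(tagged_tree))
```

| Both versions work; in the first, which branch runs is decided
| by the node's type rather than by a string comparison.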
| pwdisswordfish6 wrote:
| Sounds like a double standard and possibly moving the
| goalposts. There are strongly typed languages that don't
| have those features, and compiler codebases that don't
| use that kind of architecture. Do they get a pass or not?
| e12e wrote:
| > I don't see the point in a javascript-based compiler
|
| Typescript, sass, jsx... There are a lot of languages running
| on top of js. Or you might want to do colorizing,
| autoformatting on input in the browser?
|
| Along with all that, there's as mentioned nodejs, deno for
| running server side.
|
| But at any rate - lots of front-end problems involve various
| kinds of parsing/validation and transformation (eg:
| processing.js).
| acarabott wrote:
| I interned with the PI behind Ohm (Alex Warth) and one of his
| reasons for using the browser was simple:
|
| "If I send someone an executable, they will never download it.
| If I send them a URL, they have no excuse."
| BiteCode_dev wrote:
| We are talking about a compiler here.
|
| If someone interested in a compiler doesn't download it, it's
| not an excuse, it's a filter. Or a warning sign.
| coolreader18 wrote:
| I mean it's JavaScript, I don't think it's intended for you
| to write C compilers in it - but for compile-to-JS
| languages, it's a real asset to be able to run it in the
| browser, although more and more that can be done with
| WebAssembly as well. However, look at the project listed as
| using it - it may not even be for web languages, but just
| projects that need to parse something.
| coldtea wrote:
| Spoken like someone who has never taught real students!
| pwdisswordfish6 wrote:
| You know all those jokes that people like Linus make about
| Real Programmers--the ones who have hair on their chests,
| etc--you know those are all _jokes_, right? Jokes in the
| laughing-at-them sort of way, the way Colbert did it--not
| something that you're supposed to unironically buy into.
|
| > If someone interested in a compiler doesn't download it,
| it's not an excuse, it's a filter. Or a warning sign.
|
| You're so invested in gatekeeping that you're confusing the
| point of research with technofetishism.
|
| Here's what Joe Armstrong had to say in "The Mess We're
| In":
|
| "I downloaded this program, and I followed the
| instructions, and it said I didn't have grunt installed!
| [...Then] I installed grunt, and it said grunt was
| installed, and then I ran the script that was gonna make my
| slides... and it said 'Unable to find local grunt'."
|
| Looks like someone needs to go dig up Joe and let him know
| that the real problem is that there was a mistake in
| letting him get past the point where he was supposed to be
| filtered out. He was never supposed to be playing amongst
| the Real Programmers.
| d110af5ccf wrote:
| > doesn't download it, it's not an excuse, it's a filter
|
| If it's a decently large project, sure. But if it's a small
| project with only a couple contributors who I've never
| heard of? There's the potential for that to be hiding
| malicious code. Plus the potential complexity of getting a
| project that's only ever been built on (say) 2 computers to
| successfully compile and run on _my_ system. Plus figuring
| out whatever build system and weird flags they happen to
| use. And potentially wrangling a bunch of dependencies.
|
| All that just to take a quick look at a language that might
| not actually be of interest to me in the end. The browser
| offers huge benefits here - follow a link and play around
| in a text box. It _just works_. (This is also why I use
| Godbolt - I don't want to bother with a Windows VM or
| wrangle ten different versions of Clang and GCC myself.)
| kesava wrote:
| A ton of front end templating languages/frameworks. They
| involve compilers to different degrees, don't they?
| TheRealPomax wrote:
| If your ecosystem is JS, having a JS based compiler is pretty
| convenient. As long as it's just "slower by some constant",
| rather than by a runtime order, the fact that it's not as fast
| as yacc/bison etc. is pretty much irrelevant, so being able to
| keep everything JS is quite powerful for people new to the idea
| having started their programming career using JS, as well as
| seasoned devs working in large JS codebases.
|
| (and you can always decide that you need more speed - if you
| have a grammar defined, it's almost trivial to feed it to some
| other parser-generator)
| peterhunt wrote:
| There's definitely a use for js based parsing for tooling that
| runs in the browser (autocomplete, documentation browsing etc).
| Integration with the Monaco editor is a common use case.
| TheRealPomax wrote:
| It'd be cool if the online editor dispensed with the need to
| "write the grammar" entirely. A node based parser-generator in
| addition to Ohm being yet another grammar based parser-generator
| would be pretty great.
| ampdepolymerase wrote:
| Even better would be to generate the parser from examples. See the
| Microsoft Research Excel Flash Fill paper.
| joshmarinacci wrote:
| I'm so happy to see this on HN. I've used Ohm for several
| projects. If you want a tutorial for building a simple
| programming language using Ohm, check out this series I put on
| GitHub.
|
| https://github.com/joshmarinacci/meowlang
| j0e1 wrote:
| This is an example of a library we built using Ohm:
| https://github.com/Bridgeconn/usfm-grammar [1]
|
| It works great for our use-case though I have been eyeing tree-
| sitter[2] for its ability to do partial parses.
|
| [1] USFM: https://ubsicap.github.io/usfm/ [2] https://tree-
| sitter.github.io/tree-sitter/
| branneman wrote:
| When should one use Ohm over Racket?
| coldtea wrote:
| When they want a library and toolkit for building parsers and
| languages, rather than a general programming language based on
| Scheme.
| branneman wrote:
| ... but racket basically exists to create parsers and
| languages. It happens to also be a general programming
| language. But so is JS nowadays with Node.
| dunefox wrote:
| So, I guess you don't know why OP specifically asked about
| Racket: https://www.cs.utah.edu/plt/dagstuhl19/
| https://beautifulracket.com/stacker/why-make-languages.html
| coldtea wrote:
| Nah, I know about Racket's DSL support and touting itself
| as friendly to language writing, but it's still not the
| same as a dedicated parsing toolkit, the same way I
| wouldn't consider a Lisp with reader macros equivalent
| either...
| hardwaregeek wrote:
| I've used PEGs in the past. They're nice since they combine the
| mental model of LL grammars with the automation of LALR parser
| generators. However, it is quite easy to accidentally write rules
| where you never parse the second rule due to the ordering
| priority for rules. For instance:
|
|     ident ::= name | name ("." name)+
|
| Because with PEGs, the parser tries the first rule, then the
| second, and because whenever the second rule matches, the first
| one will also match, we will never parse the second rule. That's
| kinda annoying.
|
| Of course with PEG tools you could probably solve this by
| computing the first sets for both rules and noticing that they're
| the same. Hopefully that's what this tool does.
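|
| The shadowing described above is easy to reproduce with a toy
| PEG-style matcher. This Python sketch is not Ohm and not any
| real PEG library; regex, seq, choice and plus are hypothetical
| helpers written only to show how ordered choice commits to the
| first alternative that succeeds.

```python
import re
from typing import Callable, Optional

# A parser takes (text, pos) and returns the end position, or None on failure.
Parser = Callable[[str, int], Optional[int]]


def regex(pattern: str) -> Parser:
    compiled = re.compile(pattern)

    def parse(text: str, pos: int) -> Optional[int]:
        m = compiled.match(text, pos)
        return m.end() if m else None

    return parse


def seq(*parsers: Parser) -> Parser:
    def parse(text: str, pos: int) -> Optional[int]:
        for p in parsers:
            nxt = p(text, pos)
            if nxt is None:
                return None
            pos = nxt
        return pos

    return parse


def choice(*parsers: Parser) -> Parser:
    # PEG ordered choice: the first alternative that matches wins,
    # and later alternatives are never tried at this position.
    def parse(text: str, pos: int) -> Optional[int]:
        for p in parsers:
            end = p(text, pos)
            if end is not None:
                return end
        return None

    return parse


def plus(parser: Parser) -> Parser:
    # One or more repetitions, greedy.
    def parse(text: str, pos: int) -> Optional[int]:
        end = parser(text, pos)
        if end is None:
            return None
        while True:
            nxt = parser(text, end)
            if nxt is None:
                return end
            end = nxt

    return parse


name = regex(r"[A-Za-z_]\w*")
dotted = seq(name, plus(seq(regex(r"\."), name)))

shadowed = choice(name, dotted)   # ident ::= name | name ("." name)+
reordered = choice(dotted, name)  # ident ::= name ("." name)+ | name

text = "a.b.c"
shadowed_end = shadowed(text, 0)
reordered_end = reordered(text, 0)
print(repr(text[:shadowed_end]))   # 'a'
print(repr(text[:reordered_end]))  # 'a.b.c'
```

| With the original ordering the match stops after "a" and leaves
| ".b.c" unconsumed; putting the longer alternative first consumes
| the whole identifier.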
| sleavey wrote:
| This is what's called left-recursion, and there's indeed a way
| to deal with it in PEG parsers:
| https://github.com/PhilippeSigaud/Pegged/wiki/Left-Recursion.
___________________________________________________________________
(page generated 2021-03-27 23:00 UTC)