[HN Gopher] Design Guidelines for Domain Specific Languages (2014)
___________________________________________________________________
Design Guidelines for Domain Specific Languages (2014)
Author : lr0
Score : 123 points
Date : 2023-11-09 02:37 UTC (20 hours ago)
(HTM) web link (arxiv.org)
(TXT) w3m dump (arxiv.org)
| valenterry wrote:
| Unfortunately, they don't mention the easiest way of defining a
| DSL: by doing it in the host language itself. Not all programming
| languages are capable of allowing that, but some are, and in that
| case it is by far the easiest and safest solution.
|
| It is still much, much easier than combining existing languages,
| like adding OCL into another language, as suggested in the paper;
| and you get compiler/IDE support, tooling support and everything
| else from the "host language" out of the box.
| MrJohz wrote:
| I'd go a step further, and suggest that most languages are able
| to embed some level of DSL, although you might not be able to
| reach quite the expressiveness that you had in mind.
|
| Generally, a combination of block closures and being able to
| chain methods together is enough to create something
| reasonably expressive, although even then there are
| alternatives - I've seen DSLs in Python that use the class
| syntax and metaclasses to define quite complex things. You won't
| be generating your own syntax or anything, but you can often
| get pretty far.
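|
| A minimal sketch of the closures-plus-chaining approach, in Python.
| The Select class and its where/limit/run methods are hypothetical
| names made up for this illustration, not an existing library.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple

Row = dict
Predicate = Callable[[Row], bool]


@dataclass(frozen=True)
class Select:
    """Hypothetical embedded query DSL: every method returns a new Select."""

    predicates: Tuple[Predicate, ...] = ()
    max_rows: Optional[int] = None

    def where(self, predicate: Predicate) -> "Select":
        # Closures (lambdas) carry the user's filtering logic.
        return Select(self.predicates + (predicate,), self.max_rows)

    def limit(self, count: int) -> "Select":
        return Select(self.predicates, count)

    def run(self, rows):
        kept = [row for row in rows if all(p(row) for p in self.predicates)]
        return kept if self.max_rows is None else kept[: self.max_rows]


rows = [
    {"name": "ada", "score": 9},
    {"name": "bob", "score": 4},
    {"name": "cy", "score": 7},
    {"name": "dee", "score": 8},
]

# Method chaining reads like a small query language inside the host language.
result = (
    Select()
    .where(lambda row: row["score"] > 5)
    .where(lambda row: row["name"] != "cy")
    .limit(1)
    .run(rows)
)
```

| Because each call returns a fresh immutable object, partial queries
| can be stored and reused without surprising each other.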
| victorNicollet wrote:
| This is definitely the best idea for "drop down" DSLs (by drop
| down, I mean that you are working in a general-purpose language
| and then, for one specific situation, you drop down into the
| DSL because it is more productive).
|
| I think the article is more oriented towards standalone DSLs
| (where the entry point is the DSL itself), as those get actual
| maintenance benefits from not being able to invoke arbitrary
| code in the host language.
| valenterry wrote:
| Yes. My point is just that I would judge that in 90% of the
| cases, "drop down" DSLs are the best solution for the problem
| at hand. So when talking about DSLs they should be mentioned,
| or at least the term DSL should be defined accordingly, which
| did not happen in the paper.
| chriswarbo wrote:
| Alan Kay describes OOP as defining interactions with/between
| values via mini languages/algebras. In a language like
| Smalltalk (using whitespace and mixfix syntax) this very much
| feels like crafting a lightweight/hosted DSL. Other languages
| require a lot more squinting, to get past the host language's
| punctuation and ceremony.
|
| For example, compare something like the following:
| canvas drawA: circle at: topLeft in: red.
| canvas.draw(circle, at=topLeft, in=red);
| ljm wrote:
| I think the discoverability of Smalltalk code within the IDE
| lends itself nicely to that too. It'd be quite obscure to many these
| days, but there's something to be said about the ability to
| browse and experiment with a catalogue of classes like a
| graphical REPL on steroids.
| Too wrote:
| Doesn't necessarily have to be the host language. Just _any_
| other existing language.
|
| If it's static and declarative, just use yaml. (Yes it sucks,
| but you get the point)
|
| If it's imperative, just use the host language.
|
| Then you have some hybrid solutions. HCL allowing rich
| expression in an otherwise very declarative language. A big
| list of constant dataclasses in python is another quick way to
| define complex data structures without having to reinvent
| anything. This even gives you type safety for free.
|
| Heck, MongoDB managed to build a big business on a database whose
| only query language is super verbose JSON. This isn't just
| negative, because by leveraging an existing language, a lot of
| tooling is ready to plug in on day one.
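|
| A rough sketch of the "list of constant dataclasses" idea. The
| Service class, its fields, and undefined_dependencies are invented
| for this example. Note that the type safety comes from running a
| static type checker over the file; dataclasses do not validate
| field types at runtime.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Service:
    """Hypothetical declarative record; frozen so entries stay constant."""

    name: str
    port: int
    replicas: int = 1
    depends_on: Tuple[str, ...] = ()


# The "DSL" is just a tuple of constants in the host language.
SERVICES = (
    Service("db", port=5432),
    Service("api", port=8080, replicas=3, depends_on=("db",)),
    Service("web", port=443, replicas=2, depends_on=("api",)),
)


def undefined_dependencies(services):
    """Return (service, dependency) pairs that point at unknown services."""
    known = {service.name for service in services}
    return [
        (service.name, dep)
        for service in services
        for dep in service.depends_on
        if dep not in known
    ]


total_replicas = sum(service.replicas for service in SERVICES)
```

| Editors, linters and refactoring tools work on this file with no
| extra effort, which is the "tooling on day one" point made above.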
| victorNicollet wrote:
| A few strong opinions, a lot stronger than they were in my 2017
| talk https://youtu.be/_gmMJwjg0Pk, surely that must be the
| experience speaking :)
|
| - Creating a DSL is a commitment. It is never done, it's either
| in development or abandoned. Plan for this, both in terms of team
| size and knowledge management.
|
| - Don't allow escape hatches or ways to access arbitrary
| functionality in the host language. When (not if) these are used,
| they ruin backwards compatibility and make maintenance of DSL
| code a nightmare.
|
| - Don't use a PEG (parsing expression grammar), they are terrible
| for backwards compatibility. Hand-written parsers are even worse.
| Using an LR parser generator is worth the time investment.
|
| - Support static analysis, or at the very least don't make it
| impossible to implement later. It improves productivity, and
| every error message is an opportunity to teach the DSL to the
| users.
|
| - You need a language server. The bare minimum is: highlighting,
| auto-completion, hints on hover, find references, rename.
|
| - It should be trivial to reproduce a production run on a
| development machine. This means the language should be
| deterministic, and the production context should be preserved for
| later reuse, and the development machine should be safe for
| accessing production data.
|
| - Thousands of unit tests, thousands of regression tests.
|
| It's fine to break these rules, if you can live with the
| consequences.
|
| And here's a brand new one from this year: use RAG to have
| ChatGPT "read" your documentation, and ask it to produce scripts
| for you. When the script doesn't compile, feed the error message
| back to ChatGPT and try again until you get a working script.
| Think deeply about why ChatGPT made some mistakes, as it's often
| the case that humans would have the same misunderstandings.
| Improve your documentation, error messages or syntax to eliminate
| them, and then try again to see if it works. We have been doing
| this for 6 months now, and there have already been some
| significant improvements that are only obvious in hindsight. And
| we've now opened a PhD position to investigate whether we can
| have a separate "LLM-friendly" syntax for the language, dedicated
| to making code generation easier.
| avindroth wrote:
| Hey Victor, these are amazing tips. I am currently building a
| DSL for LLMs, and would love your feedback/consulting. I could
| not find your email online.
|
| Please feel free to reply to this or reach me here:
| @eating_entropy on X joshcho@stanford.edu
|
| Would love advisement from someone who has thought about DSLs
| extensively!
| danielvaughn wrote:
| I'm in the midst of creating a DSL (it's my first try) and this
| sounds like really good advice. I'm not clear on the third item
| though - by LR parser generator are you talking about something
| like ANTLR?
| victorNicollet wrote:
| Yes, something like ANTLR for Java, or the entire list here:
| https://en.wikipedia.org/wiki/Comparison_of_parser_generator...
|
| I've had good experiences with Menhir (for OCaml) and Tree-
| sitter, and implemented my own SLR parser generator for C#
| https://github.com/Lokad/Parsing
|
| In the end, what matters is that they should be able to
| report conflicts and ambiguities.
| _a_a_a_ wrote:
| Are you saying PEGs can't have conflicts and ambiguities? I
| didn't know that.
| victorNicollet wrote:
| Not having ambiguities is actually the main selling point
| of PEGs. If you have two rules A and B that can both
| match the input, then a CFG A|B has an ambiguity (two
| possible derivations), but a PEG A/B explicitly says that
| the A derivation is chosen. The good part is that unlike
| a CFG, the PEG doesn't require you to go and fix anything
| (the / operator already did that for you). This makes the
| initial implementation of the grammar easier.
|
| On the other hand, suppose you already have code in the wild
| that uses the old grammar G1 and, in order to add new
| features, you introduce a new grammar G2 that is a
| superset of G1. You need to know if any of the existing
| code has a derivation in G2 that is different from its
| derivation in G1 (as that would cause backwards
| incompatibility). With a PEG, there's no way to tell, so
| you have to check this by hand (and mistakes are easy).
| With a CFG, you know that backwards incompatibility
| happens if and only if G2 has conflicts, and those
| conflicts are precisely the cases that are not backwards
| compatible.
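|
| To make the PEG side of this concrete, here is a toy sketch with
| hand-rolled combinators (regex, ordered_choice and negation are
| hypothetical helpers, not a real PEG library). G1 only knows
| identifiers; G2 adds a "not" prefix as the first alternative. The
| ordered choice silently re-parses existing input, and nothing warns
| about it.

```python
import re
from typing import Callable, Optional, Tuple

# A parser maps (text, position) to (value, new_position), or None on failure.
Parser = Callable[[str, int], Optional[Tuple[object, int]]]


def regex(pattern: str) -> Parser:
    compiled = re.compile(pattern)

    def parse(text, pos):
        match = compiled.match(text, pos)
        return (match.group(0), match.end()) if match else None

    return parse


def ordered_choice(*alternatives: Parser) -> Parser:
    """PEG '/' operator: the first alternative that matches wins."""

    def parse(text, pos):
        for alternative in alternatives:
            result = alternative(text, pos)
            if result is not None:
                return result
        return None

    return parse


def negation(inner: Parser) -> Parser:
    """Matches the literal 'not' followed directly by inner."""

    def parse(text, pos):
        if text.startswith("not", pos):
            result = inner(text, pos + 3)
            if result is not None:
                value, end = result
                return (("not", value), end)
        return None

    return parse


identifier = regex(r"[a-z]+")

# G1: expr <- identifier
expr_g1 = identifier


# G2: expr <- "not" expr / identifier
def expr_g2(text, pos):
    return ordered_choice(negation(expr_g2), identifier)(text, pos)


old_parse = expr_g1("notx", 0)  # the identifier "notx"
new_parse = expr_g2("notx", 0)  # now a negation applied to "x"
```

| Both grammars accept the input "notx", but with different
| derivations, and the only way to find out is to run old code through
| both parsers and compare the results.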
| _a_a_a_ wrote:
| Excellent answer, thanks
| ulrikrasmussen wrote:
| A PEG is not actually a generative grammar but a domain-
| specific language for specifying top-down parsers. So
| they are free of conflicts and ambiguities by definition
| of their semantics.
|
| PEG is actually just syntactic sugar on top of (G)TDPL:
| https://en.wikipedia.org/wiki/Top-down_parsing_language
| valenterry wrote:
| What host language are you using?
| danielvaughn wrote:
| To be perfectly honest, I'm not sure what that means. I'm
| new to language design. I can describe what I'm trying to
| do.
|
| I'm creating a UI description language for designers. The
| goal is to let them use terminology and mental models
| familiar to them to describe their designs in code, instead
| of a visual tool like Figma.
|
| A trivial example would look something like this:
|     component Button {
|       elements
|         shape btn-container
|         text btn-label
|       style btn-container
|         fill-color: blue
|       style btn-label
|         content: Click Me
|         text-color: white
|     }
|
| The output of the language would be an intermediate
| representation, I'm imagining a JSON object or something
| with a very specific schema. This can then be transpiled
| into any format the developer wants - you could build a
| transpilation target for React, Vue, Svelte, plain old
| html/css, etc etc.
|
| So I'm in a weird spot where I know what I want to make,
| but I don't know any of the conventions or tools common in
| the language design world, because I'm just stepping into
| it.
| victorNicollet wrote:
| The host language is the language that you are using to
| write the compiler/interpreter. I suppose the grandparent
| is asking the question in order to recommend a parser
| generator for your use case. From your GitHub profile, I
| assume it's JS or TypeScript?
| danielvaughn wrote:
| Ah I see, thanks. Probably Typescript, although I'm
| considering Rust as well. I'm using tree-sitter for the
| lexer.
| no_wizard wrote:
| you might want to look at Rascal[0]
|
| [0]: https://www.rascal-mpl.org/
| danielvaughn wrote:
| Nice, this looks interesting. I was building my lexer in
| tree-sitter but not sure what to use after that. I'll check
| this out, thank you.
| FrustratedMonky wrote:
| Is an LR parser better than a parser combinator? Seems like there are
| pluses to rolling your own parser combinator over the time to
| integrate with yacc or another library.
| victorNicollet wrote:
| I am a bit biased (in the sense that I wrote my own SLR
| parser generator twice, in C# and F#!), but the benefit of LR
| is that you can know whether your CFG contains conflicts,
| which is a game changer for preserving backwards
| compatibility when you change the syntax later during the
| life of the DSL.
|
| But a parser combinator library may support converting the
| resulting combined parser into an EBNF grammar, and checking
| whether that grammar contains conflicts.
| marcosdumay wrote:
| Hum... A grammar conflict on parser combinators always
| requires that you step down from the parser abstraction and
| resolve it by hand with basic language functionality,
| doesn't it? It's something quite hard to notice.
|
| Are you concerned with the documentation getting out of
| sync with the actual language?
| victorNicollet wrote:
| In a situation where I'm adding a new feature to the
| language, with a new syntax that causes me to add a new
| rule to the grammar, I'm concerned that the new rule will
| accidentally "capture" some code that already exists in
| the wild, that was previously derived by another part of
| the grammar.
|
| To give a very ugly example, if you have a language with
| function calls f(expr, expr, expr) and you want to add
| tuple syntax to expressions with a brand new rule:
| expr := expr COMMA expr
|
| Then you might have accidentally turned all functions
| into unary functions, as the tuple rule captures the
| "expr, expr, expr" part and leaves you with f(expr).
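|
| A small sketch that counts how many derivations an argument list
| has before and after adding the tuple rule. The function
| count_arg_parses is hypothetical and only models the simplified
| grammar "args := expr | expr COMMA args" with "expr := ID", plus
| optionally "expr := expr COMMA expr". An LR generator would report
| conflicts instead of counting, but the count shows why.

```python
from functools import lru_cache


def count_arg_parses(n_ids: int, tuples_allowed: bool) -> int:
    """Count derivations of 'ID , ID , ... , ID' (n_ids identifiers) as args."""

    @lru_cache(maxsize=None)
    def expr(k: int) -> int:
        # Ways to derive k comma-separated identifiers as a single expr.
        if k == 1:
            return 1  # expr := ID
        if not tuples_allowed:
            return 0  # without tuples, an expr never contains a comma
        # expr := expr COMMA expr, splitting at each possible comma
        return sum(expr(i) * expr(k - i) for i in range(1, k))

    @lru_cache(maxsize=None)
    def args(k: int) -> int:
        # args := expr | expr COMMA args
        total = expr(k)
        total += sum(expr(i) * args(k - i) for i in range(1, k))
        return total

    return args(n_ids)


before = count_arg_parses(3, tuples_allowed=False)  # f(a, b, c), old grammar
after = count_arg_parses(3, tuples_allowed=True)    # same call, tuple rule added
```

| With three arguments the old grammar has exactly one derivation,
| while the extended grammar has five, including the one where all
| three identifiers collapse into a single tuple argument.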
| FrustratedMonky wrote:
| I do love F# for parsing and for DSLs. Maybe it is
| 'functional' programming bias.
|
| If you are using a 'functional' paradigm, you can do your
| own parser combinator.
|
| But if you are using C#, you use something like Yacc and
| LR?
| trealira wrote:
| > Hand-written parsers are even worse. Using an LR parser
| generator is worth the time investment.
|
| I'm curious, what makes you say this? From what I've read
| online secondhand, people tend to recommend handwritten
| recursive descent parsing and Pratt/TDOP parsing, because it's
| easier to add good error messages and later add context
| sensitive productions (e.g. the way Clang resolves class-wide
| declarations while parsing C++:
| https://eli.thegreenplace.net/2012/07/05/how-clang-handles-t...).
| mjul wrote:
| He explains it in the linked video presentation which is
| worth watching: the value being that your parser generator
| will complain about any ambiguity in the grammar (shift-
| reduce conflicts) rather than having that show up as a "wat?"
| at runtime.
|
| You might argue that this argument reduces to checking the
| grammar using static analysis similar to the parser generator.
| But then handwriting the parser is still extra work and risk.
| civilitty wrote:
| What I've seen most nontrivial projects do (and have done
| in a commercial product myself) is to start with a parser
| generator then move to hand written parsers when UX and
| polish become a bigger priority than exploring the problem
| space.
|
| The parser generator grammar spec can then be used to
| generate random test cases and compare the output of the
| new parser with the old one to make sure they're identical.
| bloopernova wrote:
| Good lord yes on the language server. Reduce the friction
| involved with writing code.
|
| I'm frustrated with terraform-ls right now. It doesn't work
| with private module registries, reloading each module in the
| .terraform dir over and over. Which impacts performance by
| maxing out a CPU core.
|
| And yet a feature of Terraform Cloud (TFC) is that you can use
| private registries.
|
| Writing Terraform is already plagued by friction. Adding TFC
| adds to the friction. The language server then not loading
| private modules adds even more.
|
| Hopefully OpenTofu helps in that regard, but I don't think
| they've got anyone working on the language server either.
| cube2222 wrote:
| > but I don't think they've got anyone working on the
| language server either.
|
| Not yet, but it's on our radar. We're focusing on the stable
| release + registry for now (eta mid-December), but we're
| planning to work on the language server eventually.
|
| If you have any more frustrations with the language server,
| feel free to respond here, and it'll help us find avenues for
| improvements.
|
| Disclaimer: Interim Tech Lead of OpenTofu
| starcraft2wol wrote:
| This is ridiculously not practical. For lisp programmers making
| dsls is just how you write programs and you don't need a
| parser.
| Too wrote:
| One more from me
|
| - Any expression that is likely to include a user-generated
| variable should have a strategy for injecting it safely, to
| avoid the SQL injection class of attacks. Native templating is the
| best. If string templating is the only way out, the rules for
| escaping should be clearly defined and ideally functions for
| doing so provided by the reference implementation.
|
| Not like the geniuses at Atlassian who came up with JQL and
| refuse to document how it works, instead delegating all security
| to the user model and "don't run queries with any data that you
| didn't provide yourself".
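|
| As an illustration of native parameter binding versus string
| templating, here is a sketch using SQL and Python's standard sqlite3
| module as a stand-in for any query DSL. The table and data are made
| up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE issues (id INTEGER PRIMARY KEY, title TEXT)")
conn.executemany(
    "INSERT INTO issues (title) VALUES (?)",
    [("login bug",), ("crash on save",)],
)

# Hostile user input that tries to widen the WHERE clause.
user_input = "x' OR '1'='1"

# String templating: the input becomes part of the query's syntax.
unsafe_query = f"SELECT title FROM issues WHERE title = '{user_input}'"
unsafe_rows = conn.execute(unsafe_query).fetchall()

# Native binding: the input is passed as a value and never parsed as SQL.
safe_rows = conn.execute(
    "SELECT title FROM issues WHERE title = ?", (user_input,)
).fetchall()
```

| The templated query returns every row, while the bound query
| returns none, because no issue has that literal title.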
| mejutoco wrote:
| I first saw this talk of DSLs with Ruby, where it is/was popular
| to use method_missing and others to build your own "language".
|
| I do not mean it in a negative way, but at the time it seemed
| like calling a utility class DSL was excessive. It just seemed
| like regular programming, using the language constructs. It also
| was not enforcing many boundaries (there are escape hatches).
|
| Looking at all of this, would it not be easier to encode the DSL
| in something like json, and turn it into a data/format problem?
| Something like terraform. The implementation can then be in any
| language that can read json. One could imagine, for example,
| regex defined in json as a DSL.
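|
| A rough sketch of what "regex defined in JSON" could look like. The
| node types (literal, digit, sequence, either, repeat) are invented
| for this example and do not follow any existing format; the
| compiler simply translates them to Python's re syntax.

```python
import json
import re


def compile_node(node) -> str:
    """Translate one node of the hypothetical JSON regex DSL to re syntax."""
    kind = node["type"]
    if kind == "literal":
        return re.escape(node["value"])
    if kind == "digit":
        return r"\d"
    if kind == "sequence":
        return "".join(compile_node(child) for child in node["items"])
    if kind == "either":
        options = "|".join(compile_node(child) for child in node["options"])
        return "(?:" + options + ")"
    if kind == "repeat":
        return "(?:" + compile_node(node["item"]) + "){%d,}" % node.get("min", 0)
    raise ValueError(f"unknown node type: {kind!r}")


spec = json.loads("""
{"type": "sequence", "items": [
  {"type": "either", "options": [
    {"type": "literal", "value": "v"},
    {"type": "literal", "value": "version "}]},
  {"type": "repeat", "min": 1, "item": {"type": "digit"}},
  {"type": "literal", "value": "."},
  {"type": "repeat", "min": 1, "item": {"type": "digit"}}
]}
""")

pattern = re.compile(compile_node(spec))
```

| Any language with a JSON reader could implement the same
| translation, at the cost of a much more verbose notation than the
| usual regex syntax.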
| victorNicollet wrote:
| I would say JSON + schema, rather than just JSON. Like
| S-expressions, JSON saves you from having to implement a
| tokenizer, and a tokenizer is one of the easier parts of a DSL.
| You still need to implement something like a syntax, because
| things that are handled by the syntax now get pushed into later
| stages of the DSL. However, if you can provide a JSON schema
| for your language, then the problem is mostly solved!
|
| On the other hand, it's not enough to read JSON, you need a
| JSON parser library that provides you with the position
| (line:column) of every JSON value, so that you can emit user-
| friendly errors.
| chriswarbo wrote:
| I think _every_ language (DSL or not) should support
| translation to/from a simple data representation for its
| syntax trees (whether s-expressions, JSON, XML, etc.). Not
| necessarily _the same_ representation, but just something
| structured that existing tools can walk and transform, such
| that (a) we don't resort to grepping through bytes (and
| manually filtering out matches that are comments or strings)
| and (b) adding a keyword to a language won't break all existing
| tooling that's unable to parse it.
|
| I wrote about this at
| http://www.chriswarbo.net/blog/2017-01-31-syntax_trees.html
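|
| Python's standard ast module is one example of such a structured
| representation. The sketch below finds real uses of a variable by
| walking the syntax tree, and compares that with a plain text
| search; the sample source is made up for the example.

```python
import ast

source = '''
# total is mentioned in this comment
label = "total"
total = 1 + 2
print(total)
'''

tree = ast.parse(source)

# Only ast.Name nodes are real references; comments and strings are not.
name_uses = sorted(
    (node.lineno, node.col_offset)
    for node in ast.walk(tree)
    if isinstance(node, ast.Name) and node.id == "total"
)

# A byte-level search also hits the comment and the string literal.
text_hits = source.count("total")
```

| The tree walk reports two occurrences with their positions, while
| the text search reports four and cannot tell them apart.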
| LelouBil wrote:
| I built a DSL for a Visual Novel game I am making, by using ANTLR
| for parsing and then executing it using C#.
|
| Are there generic frameworks out there for implementing static
| analysis?
|
| I did a rudimentary version of it with some simple checks, but I
| have trouble wrapping my head around checking conditional
| branches and stuff like this.
| no_wizard wrote:
| check out Rascal[0]
|
| [0]: https://www.rascal-mpl.org/
| LelouBil wrote:
| Looks interesting but I think I lack a lot of knowledge to be
| able to use it right now..
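|
| For the conditional-branch checks mentioned above, one common
| starting point is a "definitely set" analysis: a variable counts as
| set after an if/else only when both branches set it. A minimal
| sketch, assuming a made-up script representation of nested tuples
| ("set", "use", "if"); the check function is hypothetical.

```python
def check(block, defined=frozenset()):
    """Return (variables definitely set after block, list of error messages)."""
    errors = []
    for stmt in block:
        kind = stmt[0]
        if kind == "set":
            defined = defined | {stmt[1]}
        elif kind == "use":
            if stmt[1] not in defined:
                errors.append(f"'{stmt[1]}' may be used before it is set")
        elif kind == "if":
            # Analyse each branch starting from the same incoming state.
            then_defined, then_errors = check(stmt[1], defined)
            else_defined, else_errors = check(stmt[2], defined)
            errors += then_errors + else_errors
            # Only variables set on BOTH paths are safe afterwards.
            defined = then_defined & else_defined
        else:
            raise ValueError(f"unknown statement kind: {kind!r}")
    return defined, errors


script = [
    ("set", "name"),
    ("if", [("set", "gift")], []),  # 'gift' is set in only one branch
    ("use", "name"),
    ("use", "gift"),
]

defined_after, problems = check(script)
```

| The same shape (walk the tree, carry a state, merge states where
| branches join) extends to other checks such as unreachable scenes
| or unused variables.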
| scottfr wrote:
| I've created a few DSLs and I've regretted not closely emulating
| an existing language that users already know.
|
| When creating a new DSL you should find the language that is most
| familiar to your users. That could be Python, Javascript, or SQL.
| There is a good chance though that it is spreadsheet formulas.
|
| Whatever that language is, align the capabilities you add to your
| DSL with the syntax and function naming of that language, only
| diverging where needed to add the custom capabilities which
| motivated the DSL.
|
| The closer you align with what your users already know, the more
| easily they will be able to adopt it. An added benefit of this
| that emerged in the past year is that GPTs will be able to write
| your DSL with less prompting.
| swader999 wrote:
| I want to build a DSL for a really specific reporting use case.
| It's daunting though. Not sure if what I have in mind is a true
| DSL, perhaps just a souped up builder pattern.
___________________________________________________________________
(page generated 2023-11-09 23:02 UTC)