[HN Gopher] Syntax Design (2014)
___________________________________________________________________
Syntax Design (2014)
Author : memorable
Score : 180 points
Date : 2022-10-18 13:50 UTC (9 hours ago)
(HTM) web link (cs.lmu.edu)
(TXT) w3m dump (cs.lmu.edu)
| samsquire wrote:
| I began designing a language that handled recursion and iteration
| as relations between variables which are topologically sorted to
| determine control flow.
|
| Each function is a topological graph of stream functions so it is
| similar to a data flow language or reactive programming language.
| The goal is that you should express the critical insight of the
| algorithm to work out what to write and the code is not nested so
| there is very little tree structure.
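|
| A minimal Python sketch of the "relations between variables,
| topologically sorted" idea. This is not Algebralang: the names
| run_relations and relations are hypothetical, and the example
| models only a single step of binary search, not a full loop.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def run_relations(relations, inputs):
    """Evaluate each variable once everything it depends on is known.

    relations maps a variable name to (dependency names, function).
    inputs maps the externally supplied variable names to values.
    """
    # Build the dependency graph: variable -> set of variables it needs.
    graph = {name: set(deps) for name, (deps, _) in relations.items()}
    values = dict(inputs)
    # static_order() yields dependencies before the variables that use them.
    for name in TopologicalSorter(graph).static_order():
        if name in values:  # supplied as an input, nothing to compute
            continue
        deps, fn = relations[name]
        values[name] = fn(*(values[d] for d in deps))
    return values


# Relations are deliberately listed out of order; the sort fixes that.
relations = {
    "found": (("probe", "target"), lambda probe, target: probe == target),
    "probe": (("items", "mid"), lambda items, mid: items[mid]),
    "mid": (("lo", "hi"), lambda lo, hi: (lo + hi) // 2),
}
result = run_relations(
    relations, {"items": [1, 3, 5, 7, 9], "lo": 0, "hi": 4, "target": 5}
)
```

| The order in which the relations are written does not matter;
| the evaluation order falls out of the dependency graph.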
|
| Algebralang is rough notes on how it would appear.
|
| https://github.com/samsquire/algebralang
|
| Example programs in the repository are binary search, btrees, a*
| algorithm.
|
| I wrote a multithreaded parallel actor interpreter in Java and it
| uses an invented assembly language which doesn't have a bytecode,
| it's just text.
|
| https://github.com/samsquire/multiversion-concurrency-contro...
|
| I like the ideas behind the ANI language
| https://github.com/waves281/anic
| Jtsummers wrote:
| If possible, I'd add indentation to your examples to make them
| much more readable. As it stands, it's like reading one of my
| math prof's C code (he was an old FORTRAN coder, from back when
| it was shouting, and never learned to indent):
|
|     insert t
|     node = recursive_deepest_first(items=t.children,item=t,
|     lastRecursion=l)( if len(t.children) == 0 t.activate()
|     location = reversed(t.children).find(item=i, item.value >=
|     node.value) output = insert t node if
|     len(t.children) > 3 { replace(t,
|     Node(value=t.children/(point=middle=m) =
|     m.value,children.sort(item=i,sortKey=i.value)=t) output
|     = l.t } else deepest t.children.append(node) )
|
| Assuming this doesn't invalidate the program it reads more
| clearly and only takes one more line:
|
|     insert t
|       node = recursive_deepest_first(items=t.children, item=t,
|                                      lastRecursion=l)(
|         if len(t.children) == 0
|           t.activate()
|         location = reversed(t.children).find(item=i, item.value >= node.value)
|         output = insert t node
|         if len(t.children) > 3 {
|           replace(t, Node(value=t.children/(point=middle=m) = m.value,
|                           children.sort(item=i, sortKey=i.value)=t)
|           output = l.t
|         } else deepest
|           t.children.append(node)
|       )
|
| That original program was hard to decipher for both lack of
| indentation and the odd line breaks, and inconsistent choice of
| space or no space after commas. Another question: why do you
| use `if ... then...` in some examples and not in others? Is
| that a user choice?
| samsquire wrote:
| Wow thanks for reading my page and looking at the examples. I
| appreciate your time.
|
| There are a few bugs in that code. Sorry for presenting
| something that obviously wasn't ready. I didn't use location
| when I insert into that position in the btree. And the node
| splitting code has an error.
|
| And thanks for reformatting the code.
|
| It's a very rough design. The critical insight behind many
| algorithms is hidden in one character or line, such as a
| strategic -1 or +1 or a pattern of recursion, and spotting it
| is what makes the algorithm understandable.
|
| I find when writing code the structure of the code is more
| important than the calculation or addition or subtraction.
| Which is surprising because computers are calculators. The
| structure of traversal, laying out data in memory and
| structure of the jumping around instructions in memory is
| harder than the core insight of a division, or subtraction or
| addition or append or +1 or if statement here or there.
|
| When I write recursive code I often want to refer to the outer
| context of an outer recursion. That's what "deepest" means.
| Jtsummers wrote:
| You may be interested in things like Strand and the work on
| parallel Prologs, which take a similar "let the computer
| system sort out the proper execution order" approach. This
| wouldn't satisfy your syntax desire (Algol family) but may
| help develop your understanding of the problem domain.
|
| A discussion last year:
| https://news.ycombinator.com/item?id=26948351 (wow, 18
| months since that discussion, seemed more recent in my
| memory)
| adamnemecek wrote:
| Seems down
| https://web.archive.org/web/20221018135106/https://cs.lmu.ed...
| Arch-TK wrote:
| > Because C does not have real arrays
|
| C does have real arrays, they just get implicitly converted to
| pointers to their first element in a lot of cases (for a
| multitude of reasons in part having to do with simplifying the
| language). A[B] is defined as such so that it works with normal
| pointers and arrays-converted-to-pointers in the same manner.
|
| Try using an array with sizeof, unary &, or in the form of a
| string literal used to initialise an array. In those situations
| it suddenly stops behaving like a pointer to its first element
| and definitely behaves like something which is unlike anything
| else in C (hint: it's an array).
| djedr wrote:
| Very nice little article! Learned some new terms.
|
| To anybody dabbling as I do in syntax design, who may be looking
| for an extremely minimal representation for trees (even more
| minimal than S-exprs!) I would like to introduce my little
| project called Jevko:
| https://djedr.github.io/posts/jevko-2022-02-22.html
|
| It is pure distilled treeness. Its grammar fits into one line, if
| compressed well:
|
|     Jevko = *("[" Jevko "]" / "`" ("`" / "[" / "]") / %x0-5a / %x5c / %x5e-5f / %x61-10ffff)
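|
| To make the grammar concrete, here is a rough Python sketch of a
| parser for it. The function name parse_jevko and the output
| shape (a list of (prefix, subtree) pairs plus a trailing suffix)
| are assumptions for illustration, not the API of the official
| parsers.

```python
def parse_jevko(text):
    """Parse text according to the one-line grammar above.

    Returns {"subjevkos": [(prefix, subtree), ...], "suffix": str},
    where each subtree has the same shape.
    """
    root = {"subjevkos": [], "suffix": ""}
    stack = [root]  # currently open trees, innermost last
    buf = []        # characters seen since the last bracket
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == "`":
            # Escape: a backtick must be followed by `, [ or ].
            if i + 1 >= len(text) or text[i + 1] not in "`[]":
                raise ValueError(f"invalid escape at index {i}")
            buf.append(text[i + 1])
            i += 2
            continue
        if ch == "[":
            # Text before the bracket becomes the prefix of a new subtree.
            child = {"subjevkos": [], "suffix": ""}
            stack[-1]["subjevkos"].append(("".join(buf), child))
            stack.append(child)
            buf = []
        elif ch == "]":
            if len(stack) == 1:
                raise ValueError(f"unexpected ] at index {i}")
            # Text before the closing bracket is the subtree's suffix.
            stack.pop()["suffix"] = "".join(buf)
            buf = []
        else:
            buf.append(ch)
        i += 1
    if len(stack) != 1:
        raise ValueError("unclosed [")
    root["suffix"] = "".join(buf)
    return root


tree = parse_jevko("a[b]c")
```

| Whitespace is kept verbatim in prefixes and suffixes, which is
| what leaves whitespace rules to a layer on top.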
|
| This took me years of syntax golfing to figure out. I think it's
| turned out pretty nice. It's complete, formally defined, with a
| few simple parsers written, except it has no users. ;D
|
| To relate back to the article, an interesting and AFAIK original
| feature of this syntax is that newlines or other whitespace are
| neither significant nor insignificant nor "possibly significant"
| in Jevko. I'd call it whitespace-agnostic. Various whitespace
| rules can be laid on top of it, producing for example a Lisp-like
| language with native multiword identifiers with spaces, e.g.:
|
|     define [sum primes [[a][b]]
|       accumulate [
|         [+]
|         [0]
|         filter [
|           [prime?]
|           enumerate interval [[a][b]]
|         ]
|       ]
|     ]
|
| here "sum primes" and "enumerate interval" are two double-word
| identifiers. It's the only right_solution to the identifierWars,
| I-tell-you!
| nathell wrote:
| I thought "what a weird name", then silently pronounced it and
| my Polish ear heard "drzewko", meaning "little tree". What a
| fitting name. :)
| djedr wrote:
| :)
|
| For a long time I couldn't find the right name that would
| express the generic nature of it.
|
| An earlier prototype was called TAO, as an acronym for Tree
| Annotation Operator (it had an extra feature called
| operators), and as a reference to the ancient Chinese
| concept, in essence nameless and by design hard to pin down
| -- this seemed to fit perfectly.
|
| However there is about 2^42 cowznofski potrzebie things
| called TAO (kind of ironic, as the original idea was that the
| Tao would be distinct from the countless named things), so it
| turned out to be a bad name after all. So I decided to find a
| more unique one and here we are.
|
| The amount of time spent thinking about this and the lengths
| I went to are better left untold.
|
| In other words, naming is hard.
| abathur wrote:
| This reminds me of Breck Yunits' Tree Notation
| (https://treenotation.org/). Both seem to have a ~totalizing
| energy. Maybe some common cause. :)
| djedr wrote:
| Indeed, it's close. Obviously mine and Breck's levels of
| appreciation for indentation/brackets are very different. ;)
| Although independent, the paths we have taken to arrive at
| these are somewhat similar (somewhere early in there are
| experiments with visual programming). As are the tools of
| thought (minimalism). We were thus taken to similar places.
|
| Before I was aware of the existence of Tree Notation I put my
| syntax online at tree-annotation.org (now defunct), so even
| naming converged. I was initially very confused myself. :D
|
| Ultimately I think that the existence of multiple
| incarnations of this idea suggests that there is (perhaps a
| very niche) need for a minimal syntax like this. Something
| like S-exps, but general-purpose. Trying to satisfy that need
| is the common cause.
|
| The way I imagine it is that it would be supported across
| programming languages, like JSON. It could be a universal
| format for (tree) structured data. There is this piece of the
| Unix philosophy which says that text streams are the
| universal interface. That's true on a certain level. On
| another level not far below binary streams are the universal
| interface. On another level not far above... there was
| nothing universal until XML. But that was overkill, so JSON
| displaced it. But that's still overkill, so...
| abathur wrote:
| I agree that it feels like multiple projects are converging
| on something that is ripe (or close).
|
| I have done some deep-digging for markup languages and came
| across more than one project in this space. (I've added
| Jevko to my list;
| https://twitter.com/abathur/status/1582492437984837632)
|
| You may have already seen it as well, but you might also
| find https://github.com/teamtreesurf/link interesting.
| znkr wrote:
| This is very beautiful, nice work. I wonder if I should use it
| for something...
| djedr wrote:
| Thank you. :)
|
| > I wonder if I should use it for something...
|
| I'd be honored!
|
| A couple of ideas:
|
| How about a simple configuration format?
| https://gist.github.com/djedr/681e0199859874b3324eaa84192c42...
| (I should make a library out of this)
|
| Or you can put it in your query strings to make them more
| humane: https://github.com/jevko/queryjevko.js
|
| Or make up a markup DSL:
| https://github.com/jevko/markup-experiments#asttohtmltable
|
| Or serialize game objects in your indie game. Or make it the
| interface of your experimental app. Or use it to shave off a
| few unnecessary characters off your data:
| https://jevko.github.io/compactness.html
|
| No parser in your favorite language? A basic one should be
| only a couple dozen lines!
| https://github.com/jevko/parsejevko.js /
| https://github.com/jevko/specifications/blob/master/spec-sta...
| thrtythreeforty wrote:
| I find the section on "syntactic salt" interesting:
|
| > The opposite of syntactic sugar, a feature designed to make it
| harder to write bad code. Specifically, syntactic salt is a hoop
| the programmer must jump through just to prove that he knows
| what's going on, rather than to express a program action.
|
| This is perhaps an uncharitable way to describe it, but the
| concept does ring a bell. Rust's unsafe {}, C++'s
| reinterpret_cast<>(), etc - all slightly verbose. More important
| than jumping through hoops, the verbosity helps when _reading_
| code to know that something out of the ordinary is going on.
| DonHopkins wrote:
| And then there's Perl's "syntactic syrup of ipecac".
|
| https://en.wikipedia.org/wiki/Syrup_of_ipecac
| nyanpasu64 wrote:
| What I can't stand about Rust is that the language developers
| think they know better than language users developing software.
| They stack on syntactic salt to make it more unpleasant to
| write the equivalent of correct C++ programs with aliased
| mutation or manual freeing, in situations where the idiomatic
| Rust ways are _also_ unpleasant to write (Cell, RefCell, raw
| pointers), have runtime overhead (RefCell), are busywork to
| implement in working programs (restructuring your entire
| program around an ownership tree with only ephemeral stack-
| allocated cross-linking &mut, which is only sometimes possible
| without reducing performance or increasing memory use), or are
| easy, tempting, and undefined behavior in Rust but not C++
| (casting *mut to &mut in an unsafe block in a safe function).
| athrowaway3z wrote:
| "Man tries to hammer in screw. Angry at hammer manufacturers
| for not being screwdriver manufacturers"
|
| ----
|
| I'll just edit this comment to clarify: Rust does a lot of
| things really well. Non-RC manual memory management for a
| tree/graph structure is absolutely not one of them.
|
| As for the verbose 'unsafe' pointer/memory manipulations, I
| really don't see the issue. I've written my share and I think
| it's fine to add a roadblock if you want to shoulder the
| ability to add segfaults and other issues into a codebase.
| Additionally, it usually helps that you decide to encapsulate
| it into the least number of unsafe functions, instead of
| 'doing it manually' all over a codebase.
| nyanpasu64 wrote:
| Rust is intended to be a systems language capable of
| replacing C++ in its niche, and interfacing with existing
| C++ at a fairly coarse-grained level like Firefox's
| oxidation (though cxx is trying to enable rich interop
| passing richer types than C-ABI ones), so it's trying to be
| a better screwdriver more so than a hammer. So difficulty
| expressing C++ concepts is arguably a flaw, and difficulty
| implementing all software with the same CPU/memory overhead
| as C++ (which I'd argue is the case, though some would
| disagree) is definitely a flaw.
|
| It's like creating a new screwdriver bit or handle, trying
| to convert the world to it, then attracting a legion of
| followers arguing that manufacturing flat-head screwdrivers
| should expose you to legal liability for anyone who slips
| it out of the socket and injures themselves (ignoring that
| flat-head screws existed and will continue to exist).
| Chris_Newton wrote:
| "Left-handed person tries to put screw in with screwdriver
| shaped for right-handed holding. Right-handed people are
| surprised when left-handed person decides screwdriver isn't
| for them and uses something easier instead."
|
| We saw this for years with C++ and the new-style casts. The
| principle of making casting behaviours more specific and
| clarifying the risk of using each of them was fine. In
| practice, if someone asks programmers to start writing
| verbose, syntactically awkward stuff like
| reinterpret_cast<X*>(p) instead of (X*)p and either will
| work, obviously in the real world many will choose the
| latter. Empirically, the in-your-face syntax turned out to
| be a deterrent to adopting the better tools the language
| offered and so devalued those tools for everyone.
| Rusky wrote:
| This is not an accurate characterization of the Rust language
| developers. Neither of these features was designed as
| "syntactic salt"! They are compromises, made on a time budget
| to achieve goals which were higher priority for the project,
| but the door is still open to improve them. This is a far cry
| from "knowing better than language users," which implies that
| they could have simply left that syntax out while still
| achieving their goals. (Or worse, that their goal was
| specifically to annoy people...?)
|
| For instance, they are not satisfied with the current raw
| pointer syntax either, as it interacts poorly with
| lvalue/place syntax in ways that make unsafe code harder to
| audit. There are regular proposals for how to improve things
| like `(*ptr).field` or the use of raw pointers as method
| receivers.
|
| The situation with interior mutability is similar: compile-
| time memory safety inherently requires some limitations on
| programming style, but I regularly see proposals for how to
| improve "field projection" syntax.
|
| > undefined behavior in Rust but not C++ (casting *mut to
| &mut in an unsafe block in a safe function).
|
| The question of "syntactic salt" aside, this is simply false.
| nyanpasu64 wrote:
| Casting *mut to &mut in a safe function is unsound, and UB
| if the result is used alternately with an earlier &mut to
| the same object (I've seen this in a library I tried
| using). In C++ casting a * to a & is sound, and casting a *
| into a __restrict & might be unsound but restrict is so
| rarely used that it doesn't matter, whereas safe Rust
| nearly requires using &mut for mutating through a pointer.
|
| As for "compile-time memory safety inherently requires some
| limitations on programming style", I find compile-time
| lifetime safety to be a tradeoff, and often a net negative
| in not only performance but ease of programming for low-
| level code maintained by the same individual preserving a
| "theory" of the code over time (whereas I don't find
| compile-time bounds checking or thread safety to be a net
| negative to programmer experience nearly as often). And
| when I see people on crusades to stop programmers from
| writing code in unsafe languages (taking away programmers'
| ability to opt out of this tradeoff), I will stop at nothing
| to oppose these people.
| [deleted]
| zppln wrote:
| > syntactic salt
|
| I feel like this describes Rust's lifetime annotations pretty
| well too.
| epage wrote:
| Not just lifetimes but also the types (RefCell, Arc, etc.)
| needed for more complex lifetime patterns that other
| languages treat as normal.
| Karliss wrote:
| I disagree about lifetime annotations being syntax salt. At
| least with my interpretation of what syntax salt is.
|
| From syntax perspective lifetime annotations are almost as
| short as possible assuming you want to explicitly convey this
| information at all ('a is just two symbols and one of them is
| identifier). The alternative of not specifying it at all
| comes with major tradeoffs in either memory safety (like
| in C) or runtime performance (like most programming languages
| with dynamic memory management). In theory there is a third
| option of the compiler fully deducing lifetimes, but that's far
| from trivial, has its own costs, and realistically would even
| further narrow down what programs the compiler considers
| valid while increasing compilation time.
|
| There are strong similarities with typing strategies. Just
| because there are programming languages with dynamic typing
| doesn't mean that explicit static typing is salt. Dynamic
| typing has performance cost, and static inferred typing has
| worse self documentation properties and slightly bigger
| compilation time cost since you can't process each function
| independently.
|
| On the other hand reinterpret_cast<Foo*>(expr) doesn't
| provide the compiler any extra information that couldn't be
| conveyed with simpler, less verbose syntax like R(Foo*)expr or
| (expr as Foo*). Same with unsafe{} blocks: the compiler already
| knows which operations are unsafe.
| tmtvl wrote:
| I've heard it called "syntactic vinegar" instead.
|
| I wonder if there's a term for syntactic useless stuff, like
| commas in Clojure quasiquoted lists.
| cxr wrote:
| > More important [...] the verbosity helps when reading code to
| know that something out of the ordinary is going on
|
| This applies to JS as well with its strict equality check
| (triple equals). Bad practices within the NodeJS ecosystem,
| however, have led to circumstances where triple equals has been
| cargo culted as the "right" thing to do for any equality
| comparison. The consequences of this include code that is more
| verbose, is no more type safe (and often _doesn't_ do the
| right thing for some inputs--whereas with double equals, in
| contrast, it would...), and that the appearance of triple
| equals is no longer a strong signal that there's something
| happening that's worth paying attention to.
| adamddev1 wrote:
| Can you give some examples of inputs where === _doesn't_ do
| the right thing?
| mvf4z7 wrote:
| This reminds me of React's "dangerouslySetInnerHTML" prop.
|
| https://reactjs.org/docs/dom-elements.html#dangerouslysetinn...
| [deleted]
| mncharity wrote:
| Another exploration of syntax:
| http://rigaux.org/language-study/syntax-across-languages.htm...
| xaedes wrote:
| Wow, amazing! This really is a comprehensive overview.
| Basically a syntax Tafelwerk.
| hzhou321 wrote:
| The `infix` syntax is missing from the major items. Without infix
| syntax, all languages are just variations of LISP -- I guess that
| was all the article is about.
| JohnDeHope wrote:
| I enjoy this sort of "one example, multiple different lenses"
| style of discussion. It reminds me of this book... Exercises in
| Programming Style by Cristina Videira Lopes.
| tabtab wrote:
| I actually like the VB-style, but VB did it mostly wrong: if you
| start the block with X, you should always end it with "End X".
| Thus, you'd have While ... End While instead of crap like "Wend"
| and "Next" (in For...Next).
|
| It's more legible to know what block is being ended. C-style
| continually frustrates me in that area. The End-X style just never
| found a nice way to wrap text for longer statements.
|
| C-style also has a problem in that there is no way to define
| arbitrary blocks: it relies too much on key-words. I'm trying to
| remedy this with "Moth" syntax:
|
| https://www.reddit.com/r/ProgrammingLanguages/comments/ky22d...
|
| It's LINQ-esque but without the bloated Lambda conventions, and
| influenced by XML in that you have a simple syntax pattern that
| can "implement" many domains' needs. It started with an attempt
| to merge the best of Lisp and C-style. (Whether it succeeded or
| not is hotly contested. I welcome other attempts.)
| f1shy wrote:
| Note that in the for case, you can have many "next" and having
| many "end" would be silly.
| Jtsummers wrote:
| > I actually like the VB-style, but VB did it mostly wrong: if
| you start the block with X, you should always end it with "End
| X". Thus, you'd have While ... End While instead of crap like
| "Wend" and "Next" (For...).
|
| That's covered in their syntactic salt section, but Ada does,
| mostly, what you describe.
|
|     procedure Proc(...) is
|        -- vars, types, and subprograms defined here
|     begin
|        ...
|     end Proc;
|
|     for I in Some_Array'Range loop
|        ...
|     end loop;
|
|     if condition then
|        ...
|     end if;
| mojifwisi wrote:
| > C-style also has a problem in that there is no way to define
| arbitrary blocks
|
| I might have misunderstood what you mean by "arbitrary blocks",
| but you can definitely do this in C:
|
|     int main() {
|       {
|         /* arbitrary block */
|         return 0;
|       }
|     }
| Jtsummers wrote:
| They seem to mean a computational block (or a closure/lambda)
| that can be passed on to other functions. Try to do this in C
| (it's invalid as presented, but this is the concept):
|
|     int main() {
|       int* collection = ...
|       int* filtered = filter(collection, int func(int item) { return item > 10; });
|       ...
|     }
|
| You have to actually define a function at the top level in
| order to pass that in and there is no notion of closures so
| you can't do the more useful thing that you might have in
| even C++ these days:
|
|     int main() {
|       int* collection = ...
|       int limit = ...
|       // named above_limit so it does not shadow the filter function
|       auto above_limit = [&limit](auto item) { return item > limit; };
|       int* filtered = filter(collection, above_limit); // assuming `filter` is defined
|       ...
|     }
|
| You can come close, but you create a lot of extra bookkeeping
| in your C program to pull it off, and the functions are still
| only defined at the top-level.
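|
| The "extra bookkeeping" can be sketched in Python by contrasting
| the two shapes. The first half mimics the C workaround (a
| top-level function plus an explicit context argument carrying
| the state a closure would capture); the second half is the
| closure form. All names here are hypothetical and chosen only
| for illustration.

```python
# C-style workaround: the predicate lives at the top level and receives
# its "captured" state through an explicit context argument.
def above_limit(item, ctx):
    return item > ctx["limit"]


def filter_with_context(items, predicate, ctx):
    # The caller must thread ctx through every call by hand.
    return [x for x in items if predicate(x, ctx)]


# Closure style: the predicate captures `limit` from the enclosing scope.
def filter_items(items, predicate):
    return [x for x in items if predicate(x)]


collection = [3, 12, 7, 25]
limit = 10

c_style = filter_with_context(collection, above_limit, {"limit": limit})
closure_style = filter_items(collection, lambda item: item > limit)
```

| Both calls select the same items; the difference is only in who
| carries the limit around.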
| wodow wrote:
| I really like the term "Sugary Functional Style" -- sweetening
| pure functional programming with faux procedurality.
|
| Looks like it's a (three word) Google Whack at the time of
| writing:
| https://www.google.com/search?hl=en&q=%22Sugary%20Functional...
| bhauer wrote:
| It's awesome to see an article from Dr. Ray Toal on HN this
| morning! In my biased opinion, the excellent tenured professors
| at LMU's CS program make it a stand out for its size.
| kragen wrote:
| Another interesting weird syntax I ran across a few years ago is
| OGDL: https://ogdl.org/
|
| It's sort of an alternative to S-expressions with much less
| punctuation, but the data model is slightly different -- in
| S-expressions you label the leaves, and in OGDL you label the
| nodes. In other contexts these node-labeled trees are sometimes
| called "rose trees"; they are the basic data model of, for
| example, Prolog. Labeling nodes is almost equivalent to labeling
| arcs, but OGDL does support multiple references, so not quite.
|
| The OGDL proposal was intended for data, like XML, not programs.
| They started out by trying to simplify YAML, which has arrays and
| dicts, and they simplified it by unifying them into a single
| structure.
|
| Here's one of their examples:
|
|     network
|       eth0
|         ip   192.168.0.10
|         mask 255.255.255.0
|         gw   192.168.0.1
|
|     hostname crispin
|
| This is not quite just an edge-labeled digraph because, as in
| S-expressions, the order of arcs within a node is significant;
| you can have multiple edges with the same label in the same node,
| and you can select edges by ordinal rather than, or in addition
| to, label.
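|
| As a rough illustration of that data model, here is a Python
| sketch that reads a small indentation-only subset of OGDL-like
| text into node-labeled trees. It is not an OGDL implementation:
| it ignores quoting, commas, parentheses and references, and it
| assumes that words on one line form a parent-to-child chain and
| that indented lines attach to the first word of the line above.
| The function names are hypothetical.

```python
def parse_ogdl_subset(text):
    """Return a (label, children) tree; the root has an empty label."""
    root = ("", [])
    stack = [(-1, root)]  # (indent, node) pairs, innermost last
    for line in text.splitlines():
        words = line.split()
        if not words:
            continue  # skip blank lines
        indent = len(line) - len(line.lstrip(" "))
        # Close every node that is not a proper ancestor of this line.
        while stack[-1][0] >= indent:
            stack.pop()
        parent = stack[-1][1]
        first = None
        for word in words:
            node = (word, [])
            parent[1].append(node)  # child order is preserved
            if first is None:
                first = node
            parent = node           # the next word nests under this one
        stack.append((indent, first))
    return root


def children_named(node, label):
    """Select children by label; duplicate labels are allowed."""
    return [child for child in node[1] if child[0] == label]


text = """network
  eth0
    ip   192.168.0.10
    mask 255.255.255.0
    gw   192.168.0.1

hostname crispin
"""
tree = parse_ogdl_subset(text)
eth0 = children_named(children_named(tree, "network")[0], "eth0")[0]
second_by_ordinal = eth0[1][1]                # selection by position
ip_by_label = children_named(eth0, "ip")[0]   # selection by label
```

| Selection by ordinal and by label both work because children are
| kept as an ordered list rather than a dictionary.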
|
| This is of course amenable to use as a programming syntax.
| Existenceblinks wrote:
| Fun read. I would love to read about what happens if the syntax
| is not text based: is it going to be different? Visual
| programming seems to suffer from poor composability, and it's
| also bound to use human language anyway. Boxes with borders are
| hard to comprehend and can get messy easily.
|
| I mean something visual but with a text-like theme. That seems
| to be the sweet spot: fix only some downsides/limits of text.
| csmeyer wrote:
| Shameless plug for my hybrid visual/text pl, Pickcode, which
| matches what you're talking about
|
| Demo programs: https://app.pickcode.io/playground
| Vermeulen wrote:
| Wow this is awesome, love this style. I did something really
| similar with our game's scripting language called MBScript:
| https://docs.modboxgame.com/docs/mbscript Same kind of line
| setup, visual add button, etc.
|
| What's Pickcode made for? Web programming?
| csmeyer wrote:
| Pickcode is meant for K-12 education as an alternative to
| block programming. The end goal is to have a WYSIWYG editor
| for web apps with behavior defined using the visual
| programming language.
|
| MBScript looks great and I'd love to talk about your
| learnings from it. My contact is on the pickcode.io landing
| page if you want to chat!
| Existenceblinks wrote:
| Hey, yes! Nice, I'm thinking more serious and ambitious. You
| could go mass by adding "module" and ways to compose. The
| keyboard navigation is ok-ish (honest opinion) because this
| is the hardest ones which is design for every day task in
| long run, at minimum should be as fast as text based
| programming. At least 3-4 devs are comfortable to work on
| this codebase (non-realtime, just normal version branching
| flow)
|
| I really really want this decades old idea to take off. We
| should have grammar files in .. json is fine (have to start
| somewhere), and have spec for editor implementor to spread it
| across platforms. Ideally languages creators only have to
| customize "view" to decide how their lang would look like.
| Probably configure keymaps if they think their lang can be
| developed fast with certain keystroke (akin to emacs but more
| friendly because medium is not text anymore)
| masklinn wrote:
| You could check out Self. It's image-based so the objects are
| "live", and can be interacted with directly via the UI.
| Existenceblinks wrote:
| Is it https://www.youtube.com/watch?v=CCx6Nj_Hr1g ? I've seen
| quite many live programming languages. Though not a single
| one seems to want to go mass. Like able to have at least a
| 3-4 devs team work on it.
___________________________________________________________________
(page generated 2022-10-18 23:00 UTC)