[HN Gopher] Igneous Linearizer: semi-structured source code
___________________________________________________________________
Igneous Linearizer: semi-structured source code
Author : seagreen
Score : 47 points
Date : 2024-06-25 15:44 UTC (2 days ago)
(HTM) web link (domain-j.com)
(TXT) w3m dump (domain-j.com)
| zacgarby wrote:
| this is great! i've been thinking about exactly this (though
| styled after Logseq rather than Obsidian) but not gotten as far
| as implementing anything.
|
| that being said, the thing i haven't been able to convince myself
| of yet is why these are different to just normal (in-line)
| functions? as in, why should i have to write [[foo]]: would it
| not be better to have all identifiers automatically linked?
| fwip wrote:
| As a language-agnostic thing, I suppose you need to prevent the
| machinery from pulling in keywords and variable names
| accidentally (or the insides of strings or comments).
|
| I like the idea (also a fan of Unison's approach to code-in-
| the-db), but I worry about the potential issues that come from
| effectively having a single global namespace. Could be that I
| just don't have the discipline for it, though.
| seagreen wrote:
| > As a language-agnostic thing, I suppose you need to prevent
| the machinery from pulling in keywords and variable names
| accidentally (or the insides of strings or comments).
|
| Exactly. But zacgarby's right that you would want some auto-
| linking, so this is where language-specific plugins come in.
|
| The difference from today's world would be that those plugins
| would leave their results explicitly serialized in the source
| medium, so they wouldn't have to keep being reconstructed by
| every other tool.
|
| > I like the idea (also a fan of Unison's approach to code-
| in-the-db), but I worry about the potential issues that come
| from effectively having a single global namespace. Could be
| that I just don't have the discipline for it, though.
|
| I have lots of thoughts on this. I was initially disappointed
| that Unison kept a unique hierarchy to organize their code--
| that seems so filesystem-ey and 1990s.
|
| However, I'm now a convert. The result of combining a unique
| hierarchy with explicit links between nodes is a 'compound
| graph' (or a 'cluster graph', depending, getting the language
| from https://rtsys.informatik.uni-
| kiel.de/~biblio/downloads/these...). These are very
| respectable data structures! One thing they're good for is
| being able to always give a canonical title to a node, but
| varying what that title is depending on the situation.
|
| I think that for serious work the linearizer would want to
| copy this strategy as well. Right now it's flat because
| that's all I need for my website, but if you were doing big
| projects in it you'd want to follow Unison and have a
| hierarchy. In the `HashMap` folder you'd display
| `HashMap.get` with a link alias that shows plain `get`, but
| if that function is being called from some other folder it
| would appear as the full `HashMap.get`.
|
| You could still do all the other cool stuff like organize by
| tags and attributes using frontmatter, but for the particular
| purpose of display names having a global hierarchy is useful.
|
| EDIT: What matters more than what the linearizer does is what
| Obsidian displays, so it's there that the "take relative
| hierarchical position into account when showing links" logic
| would have to occur. That could be a plugin or maybe
| Obsidian's relative link feature, I haven't used the latter.
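The "relative display name" idea in the comment above can be sketched in a few lines. This is a minimal illustration, not anything the linearizer or Obsidian does today: the function names `display_name` and `obsidian_link` are hypothetical, and namespaces are assumed to be dot-separated as in the `HashMap.get` example. The only existing syntax it relies on is Obsidian's `[[target|alias]]` link alias form.

```python
def display_name(target: str, current_namespace: str) -> str:
    """Return the label for `target` as seen from `current_namespace`.

    Hypothetical helper: inside the target's own namespace the prefix is
    dropped, everywhere else the full dotted name is kept.
    """
    prefix = current_namespace + "."
    if current_namespace and target.startswith(prefix):
        return target[len(prefix):]
    return target


def obsidian_link(target: str, current_namespace: str) -> str:
    """Render a wiki link, adding an alias only when the label is shortened."""
    label = display_name(target, current_namespace)
    if label == target:
        return f"[[{target}]]"
    return f"[[{target}|{label}]]"


inside = obsidian_link("HashMap.get", "HashMap")   # link written in the HashMap folder
outside = obsidian_link("HashMap.get", "Parser")   # link written somewhere else
```

The link target stays canonical in both cases; only the alias shown to the reader depends on where the link is written.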
| seagreen wrote:
| > this is great! i've been thinking about exactly this (though
| styled after Logseq rather than Obsidian) but not gotten as far
| as implementing anything.
|
| Thank you! I think [[links]] will work out of the box with
| Logseq since they're the same as Obsidian. Transclusions will
| be in the wrong format since Obsidian transclusions look like
| `![[this]]`, but it would be quick to modify the linearizer to
| handle them.
|
| You may not want transclusions though since transcluding code
| into other code is... very weird. I'm curious what use cases
| people come up with for it though.
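As a rough sketch of the distinction discussed above, the following separates plain `[[links]]` from Obsidian-style `![[transclusions]]` with a regular expression. The pattern and the `find_links` name are assumptions made for illustration, not the linearizer's actual parser; supporting another tool's transclusion format would mean swapping in a different pattern.

```python
import re

# Optional leading "!" marks an Obsidian transclusion; "|alias" is optional.
WIKILINK = re.compile(r"(!?)\[\[([^\[\]|]+)(?:\|([^\[\]]+))?\]\]")


def find_links(text):
    """Return (kind, target, alias) tuples for every wiki link in `text`."""
    results = []
    for match in WIKILINK.finditer(text):
        bang, target, alias = match.groups()
        kind = "transclusion" if bang else "link"
        results.append((kind, target.strip(), alias))
    return results


sample = "call [[foo]] then ![[bar]] and [[HashMap.get|get]]"
links = find_links(sample)
```

Because only explicitly bracketed names match, keywords, local variables, and the insides of strings are never picked up by accident, which is the language-agnostic property discussed earlier in the thread.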
| arnsholt wrote:
| This is neat, but it does seem like a lot of work to get part of
| the way to what a Smalltalk already gives you.
| skulk wrote:
| or emacs/org-mode
| seagreen wrote:
| I love Smalltalk, and have done a reasonable amount of messing
| around with Cuis (which is awesome and everyone should try it).
|
| However this gives you _two_ things that Smalltalk doesn't:
|
| 1. It's language agnostic (boring I know)
|
| 2. It promotes keeping your code and written texts in the same
| system where they're both first class. That way they can link
| between each other, transclude each other, be published
| together, be organized the same way, etc. I really think this
| is the most interesting thing about the project, it really
| feels important to me.
|
| Caveat: right now my written documents can link to/transclude
| code, but it doesn't work the other way yet. This is because
| the linearizer will see a link from code to documents as
| another definition and try to jam it in the source file. This
| would be an interesting use case for typed links, but Obsidian
| doesn't have them AFAIK. Kind of cool since I haven't seen
| many other use cases for typed links in the wild.
|
| EDIT: It occurs to me that I've never used a Smalltalk
| notetaking or word processing program. Are there any that are
| integrated with the System Browser, so that they can link to
| (or even better embed) code? If anyone has more info please let
| me know!
| couchbed wrote:
| Lepiter is a Pharo-based notetaking app within the Glamorous
| Toolkit. I'm not sure it's mature enough to compete with
| Obsidian/etc., but it does allow linked and embedded code
| like you were thinking.
|
| https://lepiter.io/feenk/introducing-lepiter--knowledge-
| mana...
| seagreen wrote:
| Of course! I should have just guessed they'd already have
| something like this.
|
| We either need to port ALL of Glamorous Toolkit to
| mainstream langs or we need to convince all our employers
| to switch to Smalltalk. I am not certain which of those is
| possible or easier.
| groby_b wrote:
| Sure, but not all of us work in Smalltalk, and "but Smalltalk
| already does it" doesn't move legacy code bases either.
| DannyBee wrote:
| I mean, it doesn't even get you to where VisualAge was with
| Java, C++, and Smalltalk decades ago.
| ralferoo wrote:
| I remember reading an article on Source Code In Database back
| in the early 2000s, and it's been knocking around my brain ever
| since as something I ponder every couple of years. I just can't
| shake the feeling that there's the gem of a future paradigm
| where everyone wonders "why we didn't always do it that way?",
| but then every time I try to follow those thoughts through to a
| conclusion, it always feels like it'd just be re-implementing
| Smalltalk, and then the question is "why isn't Smalltalk more
| popular?"
|
| That said, there's a lot to be said for revisiting old ideas.
| There was so much interesting research done in the 60s and 70s
| in all sorts of random directions, maybe because at that time
| there were no precedents or expectations for how things should
| be done. There are so many untapped resources here, it's crazy.
| Every now and then I re-watch "The Mother of All Demos" [1]
| from 1968 where Douglas Engelbart demonstrates some of the
| research at Stanford or the Sketchpad Demo [2] from 1963 where
| Ivan Sutherland is presenting a GUI-based CAD system.
|
| Fortunately, these ideas have now been picked up again, but to
| me it's interesting to note just how long a time lapsed between
| these ideas and their becoming mainstream. Some of it is obviously
| the cost as the state-of-the-art research machines were
| massively more powerful than the home computers even 2 decades
| later, but I'm sure there were a lot of great ideas that have
| just been forgotten.
|
| Part of the problem, I think, is that we have found solutions
| to some of the easy problems and optimised it to such a degree
| that it's then hard to ever go back and revisit the alternative
| approaches because you'd need to regress so far from the
| current levels of expectations.
|
| [1] https://www.youtube.com/watch?v=yJDv-zdhzMY
| [2] https://www.youtube.com/watch?v=6orsmFndx_o
| ralferoo wrote:
| Decided to split the off-topic part off into a reply so that
| it didn't distract from the answer!
|
| In terms of over-optimisation forcing certain technologies
| to be developed and others to be ignored, one example I'm
| very familiar with is computer graphics. I'd written a TON of
| stuff here, but decided to simplify it as it was labouring too
| specific a point.
|
| But our computer graphics state-of-the-art was roughly along
| these lines: drawing all edges of polygons, hidden-line
| removal (Sutherland), clipping intersecting polygons
| (Hodgman), filling polygons with a single colour, *, Gouraud
| shading, Gouraud shading with smaller triangles, Phong
| shading with bigger triangles, texturing, fixed texture and
| lighting pipelines, pixel shaders, vertex shaders. I'll also
| add compute shaders too, but that was more of a
| generalisation of what people were starting to do with pixel
| shaders operating on data that wasn't really pixel data.
|
| Now, you'll notice my * around the time of single colour
| filled polygons... this might not be the correct place to put
| the *, but around this point some people started
| experimenting with ray-tracing and got amazing results, just
| incredibly slowly. These were seen as the "gold standard",
| but because drawing triangles was much faster, this is where
| the money continued to be poured into, optimising and
| optimising this special case, discovering more techniques to
| "approximate" the right image, but trying to avoid the hard
| work of actually rendering it. Over time, things have got
| closer and closer to ray tracing, except transparent and
| shiny objects have always been the Achilles heel.
|
| Fortunately industry's interest in ray-tracing has resumed,
| and now compute shaders are general enough that they can be
| used, but they're still orders of magnitude slower because
| the renderer needs to consider the entire scene not just a
| triangle at a time, so you need to store the scene in some
| kind of tree that's paged in on demand and different
| latencies for different pixels causes problems for the SIMD
| architectures. We're starting to see more and more consumer-
| level hardware with decent ray-tracing performance now, but
| it's been a decade of lost time in terms of optimisation from
| where it could have been if the entire market hadn't been
| competing only in making triangles rasterise more quickly.
|
| In the ray-tracing space, we still see that it's too slow to
| create perfect images (for very complicated scenes with lots
| of shiny surfaces and few lights, you might need thousands of
| rays per pixel to just get a handful that actually reach a
| light source), so we've invented all sorts of approaches to
| cover it up - whether it's training an ML model to guess the
| real colour for black pixels from neighbouring ones, or re-
| projecting pixels from a previous frame to fill in the gaps,
| etc.
|
| Personally, I can't help but think the real breakthrough in
| performant raytracing will come from tracing light from the
| light sources instead. This wasn't done traditionally because
| potentially it's even more expensive than tracing backwards
| from the pixel, but should be more accurate when there are
| multiple light sources.
|
| But even the latest batch of hardware is all focused on
| raytracing, which I think is missing the biggest trick of all
| - they could be using cone-tracing as a first approximation
| and then subdividing the cone into smaller and smaller chunks
| until they're approximately pixel sized. None of this is new,
| it's just not what the larger industry is doing right now,
| because it's cheaper and easier for them to do rays instead.
| igouy wrote:
| > Source Code In Database
|
| ?
|
| https://www.google.com/books/edition/Mastering_ENVY_Develope...
|
| https://gemtalksystems.com/products/gs64/
|
| https://en.wikipedia.org/wiki/MUMPS
| ralferoo wrote:
| > > Source Code In Database
| > ?
|
| Rather than think about specific closed environments (which
| may be an unavoidable consequence of SCID), I was thinking
| more generically about the issues. At the time, I was
| firmly in the Java large webapp space, so I was mostly
| thinking about how you could target a JVM.
|
| In terms of the actual article, there's a bunch of links on
| Wikipedia [1] but I specifically was referring to an older
| version of this [2] article (I think, I don't remember it
| being as garish colours) and I think I found it via c2 [3].
|
| None of these quite match up with what I thought I
| remembered which informed a lot of my thoughts back then,
| but I was mostly thinking about what the UI for such a
| system might look like because a nice friendly GUI isn't
| necessarily optimal for an experienced programmer who's
| probably happiest writing and seeing their code as big
| chunks of text, and also how sometimes you want code that
| the linter would hate because you've deliberately formatted
| something to make it easier for humans to understand.
|
| I was then thinking about how you could abstract and
| generalise statements and sub-expressions into small mini-
| functions that weren't complete functions per se, but more
| like templates. I spent a long time thinking how one might
| do code de-duplication by copying a graph of code, and then
| changing some of the nodes in a copy-on-write style thing,
| but decided there was no easy way of programmatically
| deciding which parts of the tree were being fixed due to a
| bug and needed to be shared with all copies, and which were
| just modified inputs or local changes. In terms of code,
| it's not that hard to do, but presenting in an intuitive
| way in a UI is much harder, especially if one of the goals
| is to make things easier for a novice programmer.
|
| [1] https://en.wikipedia.org/wiki/Source_Code_in_Database
| [2] https://www.mindprod.com/project/scid.html
| [3] https://wiki.c2.com/?SourceCodeInDatabase
| igouy wrote:
| > experienced programmer who's probably happiest writing
| and seeing their code as a big chunks of text
|
| Not if you're a Smalltalk programmer.
|
| "ENVY/Manager augments this model by providing
| configuration management and version control facilities.
| All code is stored in a central database rather than in
| files associated with a particular image. Developers are
| continuously connected to this database; therefore
| changes are immediately visible to all developers."
|
| https://www.google.com/books/edition/Mastering_ENVY_Develope...
| ralferoo wrote:
| I kind of feel that we're talking at cross-purposes here.
| To me, whether the code is in a database, in-memory,
| serialised to a file isn't all that important as they're
| all just representations of the same AST. For me, the
| label "source code in database" is in comparison to
| "source code in linear files that need not have any
| inherent structure".
|
| Also, perhaps my use of the phrase "experienced
| programmer" is being interpreted negatively. I'm not
| trying to imply that someone who has a lot of experience
| using a graphical programming language is less
| experienced, I'm using it as a shorthand for "experienced
| programmer of a traditional text-based language". I'll
| continue to do so for this reply too, because adding that
| caveat every time I use that shorthand makes the actual
| meaning I was trying to convey much harder to see.
|
| As regards to the writing and seeing code as text
| comment, I meant that experienced programmers will
| probably find writing something like
|
| sin(angle) * radius + offset
|
| in a textual form the quickest way of expressing that
| idea, especially if their IDE supports auto-completion of
| variables. Most programmers generally also prefer to
| visualise their code the same way they wrote it, so
| presenting it back to them as text makes sense.
|
| A novice programmer might prefer to see that as a graph
| of operator nodes because it guides them through the
| process. Even better if they can organise the nodes in
| the layout they want, as some people remember visually
| and can use the distinctive look of each area of "code"
| to navigate when zoomed out.
|
| Certainly, in game development, I've seen fairly non-
| techy people create massive Blueprint graphs for Unreal
| this way, but they wouldn't consider themselves a
| programmer and would be scared by the prospect of a
| screenful of code that does the same thing. On the other
| hand, as someone who's used text to code for decades, I
| find Blueprints to be horrendously slow for me to
| understand when presented with someone else's "code"
| because there are far fewer "social code norms" being
| obeyed and people just do whatever makes sense to them.
|
| I actually think in the above example of a short
| expression, the best solution for a novice programmer
| probably isn't even a graph, but actually closer to the
| text for an experienced programmer. Most people will have
| had exposure to formulas at school, so they might well
| prefer to see something closer to the traditional text
| form of the source code, but with tools to guide them to
| entering that like the "Insert Equation" that's been in
| Word for decades - so for instance, you might insert a
| divide symbol and then fill in the two boxes, etc.
|
| The point is that both code as text and code as a graph
| can both be used to express the same AST at the end of
| the day, and people should be able to use whichever makes
| most sense to them or makes them most productive. The
| tricky part then comes when people have _chosen_ to
| format their code or layout their graph in a particular
| way that adds meaning and aids understanding of the graph
| / source code without actually affecting the AST. I'm
| specifically talking about formatting here rather than
| comments, which is a similar problem of how you would
| bind a code comment to a specific part of the tree in a
| graphical view, and likewise graphical views might have
| "visual sections" of the graph that don't necessarily map
| neatly to a linear sequence of source lines.
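The point that formatting and comments carry meaning for humans without affecting the AST can be checked directly. The sketch below uses Python's standard `ast` module purely as a convenient stand-in (the thread is not about Python specifically): two differently formatted spellings of the expression from the comment above parse to the same tree, and the comment is simply gone.

```python
import ast

compact = "sin(angle)*radius+offset"
spaced = "sin( angle ) * radius  +  offset  # scale, then shift"

# ast.dump omits line/column attributes by default, so only structure is compared.
tree_a = ast.dump(ast.parse(compact, mode="eval"))
tree_b = ast.dump(ast.parse(spaced, mode="eval"))
same_tree = tree_a == tree_b

# The root is the addition; its left operand is the multiplication,
# whose left operand is the call to sin.
root = ast.parse(compact, mode="eval").body
```

Anything a structured editor wants to preserve beyond the tree (spacing, alignment, where a comment attaches, how a graph is laid out) therefore has to be stored alongside the AST rather than recovered from it.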
| igouy wrote:
| > I'm using it as a shorthand for "experienced programmer
| of a traditional text-based language".
|
| And I was telling you that experienced Smalltalk
| programmers don't work with "big chunks of text".
|
| Small snippets of text, presented in context.
|
| Sorry I'm not seeing enough clarity to wish to continue.
| JonChesterfield wrote:
| This is an interesting direction.
|
| One thought is that obsidian can execute web assembly and a
| parser / sema checker written in something that turns into wasm
| can therefore be run on the source files. Can probably tie that
| to a syntax highlighter style thing for in-ide feedback.
|
| The other is that markdown is a tempting format for literate
| programming. I do have some notes in obsidian that are fed to
| cmark to produce html. With some conventions, splitting a
| literate program into executable code embedded in a html document
| is probably doable as an XML pipeline.
|
| In a much simpler vein, I'm experimenting with machine
| configuration from within obsidian. The local DNS server sets
| itself up using a markdown file so editing an IP or adding a new
| machine can be done by changing that markdown.
|
| I hope the author continues down this path and writes more about
| the experience.
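For the literate-programming idea above, the "pull the executable code out of a markdown note" step can be sketched as follows. This is a line-based illustration only: it does not use cmark or an XML pipeline, the `tangle` name is borrowed from literate-programming tradition rather than from the thread, and it assumes code lives in fenced blocks tagged with a language name.

```python
FENCE = "`" * 3  # built programmatically so this sketch can itself sit in a fenced block


def tangle(markdown: str, language: str) -> str:
    """Concatenate the bodies of fenced blocks tagged with `language`."""
    chunks, current, inside = [], [], False
    for line in markdown.splitlines():
        stripped = line.strip()
        if not inside and stripped == FENCE + language:
            inside, current = True, []          # opening fence for our language
        elif inside and stripped == FENCE:
            inside = False                      # closing fence: keep the block
            chunks.append("\n".join(current))
        elif inside:
            current.append(line)
    return "\n\n".join(chunks)


note = "\n".join([
    "# Doubling",
    "First define a value.",
    FENCE + "python", "x = 21", FENCE,
    "A shell aside that should be ignored.",
    FENCE + "sh", "echo hi", FENCE,
    "Then use it.",
    FENCE + "python", "print(x * 2)", FENCE,
])
program = tangle(note, "python")
```

A real pipeline would want a proper CommonMark parser to handle indented fences, tildes, and nested blocks; the sketch ignores all of those.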
| nohat wrote:
| Yeah, I definitely see using this for literate programming. Not
| quite sure the best way to organize it. Maybe use a static site
| compiler to auto host documentation version.
| seagreen wrote:
| I wish you could just use Obsidian Publish to host sites, but
| due to the indentation issue you have to control the
| rendering, which is a bummer.
|
| Obsidian Digital Garden[1] is FOSS, so it might be modifiable
| to parse and output the code pages correctly.
|
| [1] https://github.com/oleeskild/obsidian-digital-garden
| JonChesterfield wrote:
| markdown -> xml -> hack around -> html is essentially a
| static site generator. https://github.com/commonmark/cmark
| works really well.
| nbbaier wrote:
| > In a much simpler vein, I'm experimenting with machine
| configuration from within obsidian. The local DNS server sets
| itself up using a markdown file so editing an IP or adding a
| new machine can be done by changing that markdown.
|
| This sounds really interesting, any code to look at anywhere or
| write up about it?
| JonChesterfield wrote:
| Simple enough to write inline.
|
| There's a file called DNS.md which contains lines like
| `192.168.1.15 milan` in an obsidian vault. Obsidian sync
| copies it around. DNS is by pihole which uses a plain text
| file in that sort of format for the entries.
|
| Then superuser's crontab -l:
|
| 0 * * * * cmp /home/jon/Documents/Obsidian/SystemControl/DNS.md /etc/pihole/custom.list >/dev/null 2>&1 || cat /home/jon/Documents/Obsidian/SystemControl/DNS.md > /etc/pihole/custom.list && sudo -u jon pihole restartdns
|
| Cron has rules about relative paths that I don't remember so
| it's literally written as above.
|
| It seems likely that the idea generalises. I'm considering
| managing public keys for ssh / wireguard in similar fashion
| but haven't done so yet.
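One detail worth noting about the cron line above: in POSIX shell `||` and `&&` have equal precedence and associate left to right, so it groups as `(cmp ... || cat ...) && restart` and the restart runs whenever either the comparison or the copy succeeds, including when nothing changed. If the intent is to restart only after an actual change, the logic could be sketched like this. The function name `sync_if_changed` and the `on_change` callback are assumptions made for illustration; the paths and the `pihole restartdns` command come from the comment.

```python
import filecmp
import shutil
from pathlib import Path


def sync_if_changed(source: Path, target: Path, on_change) -> bool:
    """Copy `source` over `target` and call `on_change` only if contents differ.

    Returns True when a copy happened, False when the files already matched.
    """
    if target.exists() and filecmp.cmp(source, target, shallow=False):
        return False
    shutil.copyfile(source, target)
    on_change()
    return True
```

In the setup described above, `source` would be the vault's `DNS.md`, `target` would be `/etc/pihole/custom.list`, and `on_change` would wrap something like `subprocess.run(["pihole", "restartdns"])`.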
| seagreen wrote:
| > I hope the author continues down this path and writes more
| about the experience.
|
| I appreciate it=) I definitely want to write some more stuff
| up, in particular how code organization changes when you can
| tag and add attributes to definitions.
| nbbaier wrote:
| > The solution I've been waiting for is source-code-in-the-
| database. I'm cheering on multiple projects attempting this.
|
| What are the projects you're especially bullish on?
| DannyBee wrote:
| VisualAge C++, Java, and Smalltalk.
| seagreen wrote:
| Hazel and Unison are two of the big ones. I'm friends with some
| of the Unison folks so I'm biased, but I really like how _few_
| features there are in the language. In general I'm just a huge
| sucker for subtractive improvement: if you can have a small
| number of awesome things (eg abilities) instead of a bunch of
| special case things (exception handling, monad trickery,
| dependency injection machinery) sign me up.
|
| I know less about Hazel, my understanding is that it's source-
| code-in-CRDTs, which is definitely structured source code
| though may not technically be in a database.
| thyrsus wrote:
| Is this the unison you mean? https://www.unison-lang.org/
|
| Unrelated: what has your experience been using igneous-
| linearizer to help understand other people's code?
| seagreen wrote:
| That's it. And the linearizer is only one way-- you write
| text with [[links]] and turn that into plaintext.
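To make the one-way direction concrete, here is a toy sketch of turning linked notes into plain text. It is not the Igneous Linearizer's implementation, whose ordering rules and edge cases may well differ: it assumes one definition per note, held in a hypothetical `notes` dictionary keyed by note title, and emits each linked definition before the code that uses it.

```python
import re

LINK = re.compile(r"\[\[([^\[\]|]+)\]\]")


def linearize(notes: dict[str, str], entry: str) -> str:
    """Emit `entry` and everything it links to, dependencies first.

    Links are flattened to bare names; notes never reached from `entry`
    are left out, and links to unknown notes are flattened but not followed.
    """
    ordered, seen = [], set()

    def visit(name: str) -> None:
        if name in seen or name not in notes:
            return
        seen.add(name)  # marking before recursing also guards against cycles
        for dep in LINK.findall(notes[name]):
            visit(dep)
        ordered.append(LINK.sub(r"\1", notes[name]))

    visit(entry)
    return "\n\n".join(ordered)


notes = {
    "main": "def main():\n    return [[double]](21)",
    "double": "def double(x):\n    return x * 2",
    "unused": "def unused():\n    pass",
}
source = linearize(notes, "main")
```

Going the other way, from existing plain text back to linked notes, would need language-specific parsing to decide which identifiers are links, which is why the tool described here only runs in one direction.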
___________________________________________________________________
(page generated 2024-06-27 23:01 UTC)