[HN Gopher] Igneous Linearizer: semi-structured source code
       ___________________________________________________________________
        
       Igneous Linearizer: semi-structured source code
        
       Author : seagreen
       Score  : 47 points
       Date   : 2024-06-25 15:44 UTC (2 days ago)
        
 (HTM) web link (domain-j.com)
 (TXT) w3m dump (domain-j.com)
        
       | zacgarby wrote:
       | this is great! i've been thinking about exactly this (though
       | styled after Logseq rather than Obsidian) but not gotten as far
       | as implementing anything.
       | 
       | that being said, the thing i haven't been able to convince myself
       | of yet is why these are different to just normal (in-line)
       | functions? as in, why should i have to write [[foo]]: would it
       | not be better to have all identifiers automatically linked?
        
         | fwip wrote:
         | As a language-agnostic thing, I suppose you need to prevent the
         | machinery from pulling in keywords and variable names
         | accidentally (or the insides of strings or comments).
         | 
         | I like the idea (also a fan of Unison's approach to code-in-
         | the-db), but I worry about the potential issues that come from
         | effectively having a single global namespace. Could be that I
         | just don't have the discipline for it, though.
        
           | seagreen wrote:
           | > As a language-agnostic thing, I suppose you need to prevent
           | the machinery from pulling in keywords and variable names
           | accidentally (or the insides of strings or comments).
           | 
           | Exactly. But zacgarby's right that you would want some auto-
           | linking, so this is where language-specific plugins come in.
           | 
           | The difference from today's world would be that those plugins
           | would leave their results explicitly serialized in the source
           | medium, so they wouldn't have to keep being reconstructed by
           | every other tool.
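           | 
           | A minimal sketch of such a plugin, assuming Python as the
           | language being linked: the standard `tokenize` module already
           | separates identifiers from keywords, strings and comments, so
           | only names from a known set of definitions get wrapped in
           | [[...]]. `auto_link` and `definitions` are hypothetical names
           | for illustration, not part of the linearizer.

```python
import io
import keyword
import tokenize


def auto_link(source: str, definitions: set[str]) -> str:
    """Wrap known definition names in [[...]], leaving strings and comments alone."""
    lines = source.splitlines(keepends=True)

    # Collect (row, start_col, end_col) for each identifier that should be linked.
    spans = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if (
            tok.type == tokenize.NAME
            and not keyword.iskeyword(tok.string)
            and tok.string in definitions
        ):
            spans.append((tok.start[0], tok.start[1], tok.end[1]))

    # Rewrite right-to-left so earlier column offsets stay valid.
    for row, start, end in sorted(spans, reverse=True):
        line = lines[row - 1]
        lines[row - 1] = line[:start] + "[[" + line[start:end] + "]]" + line[end:]
    return "".join(lines)


# Only the call site is linked; the comment and the string are untouched.
linked = auto_link('x = foo(1)  # foo here\ns = "foo"\n', {"foo"})
```

           | 
           | Because the links end up in the text itself, a later tool
           | could read them back without re-running the tokenizer.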
           | 
           | > I like the idea (also a fan of Unison's approach to code-
           | in-the-db), but I worry about the potential issues that come
           | from effectively having a single global namespace. Could be
           | that I just don't have the discipline for it, though.
           | 
           | I have lots of thoughts on this. I was initially disappointed
           | that Unison kept a unique hierarchy to organize their code--
           | that seems so filesystem-ey and 1990s.
           | 
           | However, I'm now a convert. The result of combining a unique
           | hierarchy with explicit links between nodes is a 'compound
           | graph' (or a 'cluster graph', depending, getting the language
           | from https://rtsys.informatik.uni-
           | kiel.de/~biblio/downloads/these...). These are very
           | respectable data structures! One thing they're good for is
           | being able to always give a canonical title to a node, but
           | varying what that title is depending on the situation.
           | 
           | I think that for serious work the linearizer would want to
           | copy this strategy as well. Right now it's flat because
           | that's all I need for my website, but if you were doing big
           | projects in it you'd want to follow Unison and have a
           | hierarchy. In the `HashMap` folder you'd display
           | `HashMap.get` with a link alias that shows plain `get`, but
           | if that function is being called from some other folder it
           | would appear as the full `HashMap.get`.
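           | 
           | As a rough sketch of that display rule (the helper names and
           | the dot-separated hierarchy are assumptions for illustration;
           | only the `[[target|alias]]` form is Obsidian's own alias
           | syntax):

```python
def display_name(target: str, current_folder: str) -> str:
    """Return the title to show for `target` when viewed from `current_folder`."""
    prefix = current_folder + "."
    if current_folder and target.startswith(prefix):
        # Same folder: the shared prefix carries no information, so drop it.
        return target[len(prefix):]
    return target


def aliased_link(target: str, current_folder: str) -> str:
    """Build a wiki link, adding an alias only when the short name differs."""
    shown = display_name(target, current_folder)
    if shown == target:
        return f"[[{target}]]"
    return f"[[{target}|{shown}]]"


inside = aliased_link("HashMap.get", "HashMap")   # link seen from the HashMap folder
outside = aliased_link("HashMap.get", "Parser")   # link seen from another folder
```

           | 
           | The canonical title stays `HashMap.get` in both cases; only
           | the alias changes with the viewer's position.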
           | 
           | You could still do all the other cool stuff like organize by
           | tags and attributes using frontmatter, but for the particular
           | purpose of display names having a global hierarchy is useful.
           | 
           | EDIT: What matters more than what the linearizer does is what
           | Obsidian displays, so it's there that the "take relative
           | hierarchical position into account when showing links" logic
           | would have to occur. That could be a plugin or maybe
           | Obsidian's relative link feature, I haven't used the latter.
        
         | seagreen wrote:
         | > this is great! i've been thinking about exactly this (though
         | styled after Logseq rather than Obsidian) but not gotten as far
         | as implementing anything.
         | 
         | Thank you! I think [[links]] will work out of the box with
         | Logseq since they're the same as Obsidian. Transclusions will
         | be in the wrong format since Obsidian transclusions look like
         | `![[this]]`, but it would be quick to modify the linearizer to
         | handle them.
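         | 
         | A small sketch of telling the two apart, assuming only the
         | Obsidian forms mentioned here (`[[name]]` and `![[name]]`,
         | optionally with a `|alias`). The regex and `find_references`
         | are illustrative, not the linearizer's actual parser, and
         | Logseq's embed syntax is deliberately not modelled.

```python
import re

# Optional "!" marks a transclusion; an optional "|alias" part is ignored.
WIKILINK = re.compile(r"(!?)\[\[([^\[\]|]+)(?:\|([^\[\]]+))?\]\]")


def find_references(text: str) -> list[tuple[str, str]]:
    """Return (kind, target) pairs in order of appearance."""
    refs = []
    for match in WIKILINK.finditer(text):
        kind = "transclusion" if match.group(1) else "link"
        refs.append((kind, match.group(2)))
    return refs


refs = find_references("uses [[foo]], embeds ![[bar]], aliases [[baz|b]]")
```

         | 
         | Supporting another tool's embed format would then mean adding
         | a second pattern that maps onto the same (kind, target) pairs.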
         | 
         | You may not want transclusions though since transcluding code
         | into other code is... very weird. I'm curious what use cases
         | people come up with for it though.
        
       | arnsholt wrote:
       | This is neat, but it does seem like a lot of work to get part of
       | the way to what a Smalltalk already gives you.
        
         | skulk wrote:
         | or emacs/org-mode
        
         | seagreen wrote:
         | I love Smalltalk, and have done a reasonable amount of messing
         | around with Cuis (which is awesome and everyone should try it).
         | 
         | However this gives you _two_ things that Smalltalk doesn't:
         | 
         | 1. It's language agnostic (boring I know)
         | 
         | 2. It promotes keeping your code and written texts in the same
         | system where they're both first class. That way they can link
         | between each other, transclude each other, be published
         | together, be organized the same way, etc. I really think this
         | is the most interesting thing about the project, it really
         | feels important to me.
         | 
         | Caveat: right now my written documents can link to/transclude
         | code, but it doesn't work the other way yet. This is because
         | the linearizer will see a link from code to documents as
         | another definition and try to jam it in the source file. This
         | would be an interesting use case for typed links, but Obsidian
         | doesn't have them AFAIK. Kind of cool since I haven't seen
         | many other use cases for typed links in the wild.
         | 
         | EDIT: It occurs to me that I've never used a Smalltalk
         | notetaking or word processing program. Are there any that are
         | integrated with the System Browser, so that they can link to
         | (or even better embed) code? If anyone has more info please let
         | me know!
        
           | couchbed wrote:
           | Lepiter is a Pharo-based notetaking app within the Glamorous
           | Toolkit. I'm not sure it's mature enough to compete with
           | Obsidian/etc., but it does allow linked and embedded code
           | like you were thinking.
           | 
           | https://lepiter.io/feenk/introducing-lepiter--knowledge-
           | mana...
        
             | seagreen wrote:
             | Of course! I should have just guessed they'd already have
             | something like this.
             | 
             | We either need to port ALL of Glamorous Toolkit to
             | mainstream langs or we need to convince all our employers
             | to switch to Smalltalk. I am not certain which of those is
             | possible or easier.
        
         | groby_b wrote:
         | Sure, but not all of us work in Smalltalk, and "but Smalltalk
         | already does it" doesn't move legacy code bases either.
        
         | DannyBee wrote:
         | I mean, it doesn't even get you to where VisualAge was with
         | Java, C++, and Smalltalk decades ago.
        
         | ralferoo wrote:
         | I remember reading an article on Source Code In Database back
         | in the early 2000s, and it's been knocking around my brain ever
         | since as something I ponder every couple of years. I just can't
         | shake the feeling that there's the gem of a future paradigm
         | where everyone wonders "why didn't we always do it that way?",
         | but then every time I try to follow those thoughts through to a
         | conclusion, it always feels like it'd just be re-implementing
         | Smalltalk, and then the question is "why isn't Smalltalk more
         | popular?"
         | 
         | That said, there's a lot to be said for revisiting old ideas.
         | There was so much interesting research done in the 60s and 70s
         | in all sorts of random directions, maybe because at that time
         | there were no precedents or expectations for how things should
         | be done. There are so many untapped resources here, it's crazy.
         | Every now and then I re-watch "The Mother of All Demos" [1]
         | from 1968 where Douglas Engelbart demonstrates some of the
         | research at Stanford or the Sketchpad Demo [2] from 1963 where
         | Ivan Sutherland is presenting a GUI-based CAD system.
         | 
         | Fortunately, these ideas have now been picked up again, but to
         | me it's interesting to note just how much time elapsed between
         | these ideas and their becoming mainstream. Some of it is obviously
         | the cost as the state-of-the-art research machines were
         | massively more powerful than the home computers even 2 decades
         | later, but I'm sure there were a lot of great ideas that have
         | just been forgotten.
         | 
         | Part of the problem, I think, is that we have found solutions
         | to some of the easy problems and optimised them to such a degree
         | that it's then hard to ever go back and revisit the alternative
         | approaches because you'd need to regress so far from the
         | current levels of expectations.
         | 
         | [1] https://www.youtube.com/watch?v=yJDv-zdhzMY
         | [2] https://www.youtube.com/watch?v=6orsmFndx_o
        
           | ralferoo wrote:
           | Decided to split the off-topic part off into a reply so that
           | it didn't distract from the answer!
           | 
           | In terms of over-optimisation forcing certain technologies
           | to be developed and others to be ignored, one example I'm
           | very familiar with is computer graphics. I'd written a TON of
           | stuff here, but decided to simplify it as it was labouring too
           | specific a point.
           | 
           | But our computer graphics state-of-the-art was roughly along
           | these lines: drawing all edges of polygons, hidden-line
           | removal (Sutherland), clipping intersecting polygons
           | (Hodgman), filling polygons with a single colour, *, Gouraud
           | shading, Gouraud shading with smaller triangles, Phong
           | shading with bigger triangles, texturing, fixed texture and
           | lighting pipelines, pixel shaders, vertex shaders. I'll also
           | add compute shaders too, but that was more of a
           | generalisation of what people were starting to do with pixel
           | shaders operating on data that wasn't really pixel data.
           | 
           | Now, you'll notice my * around the time of single colour
           | filled polygons... this might not be the correct place to put
           | the *, but around this point some people started
           | experimenting with ray-tracing and got amazing results, just
           | incredibly slowly. These were seen as the "gold standard",
           | but because drawing triangles was much faster, this is where
           | the money continued to be poured into, optimising and
           | optimising this special case, discovering more techniques to
           | "approximate" the right image, but trying to avoid the hard
           | work of actually rendering it. Over time, things have got
           | closer and closer to ray tracing, except transparent and
           | shiny objects have always been the Achilles heel.
           | 
           | Fortunately industry's interest in ray-tracing has resumed,
           | and now compute shaders are general enough that they can be
           | used, but they're still orders of magnitude slower because
           | the renderer needs to consider the entire scene not just a
           | triangle at a time, so you need to store the scene in some
           | kind of tree that's paged in on demand and different
           | latencies for different pixels cause problems for the SIMD
           | architectures. We're starting to see more and more consumer-
           | level hardware with decent ray-tracing performance now, but
           | it's been a decade of lost time in terms of optimisation from
           | where it could have been if the entire market hadn't been
           | competing only in making triangles rasterise more quickly.
           | 
           | In the ray-tracing space, we still see that it's too slow to
           | create perfect images (for very complicated scenes with lots
           | of shiny surfaces and few lights, you might need thousands of
           | rays per pixel to just get a handful that actually reach a
           | light source), so we've invented all sorts of approaches to
           | cover it up - whether it's training an ML model to guess the
           | real colour for black pixels from neighbouring ones, or re-
           | projecting pixels from a previous frame to fill in the gaps,
           | etc.
           | 
           | Personally, I can't help but think the real breakthrough in
           | performant raytracing will come from tracing light from the
           | light sources instead. This wasn't done traditionally because
           | potentially it's even more expensive than tracing backwards
           | from the pixel, but should be more accurate when there are
           | multiple light sources.
           | 
           | But even the latest batch of hardware is all focused on
           | raytracing, which I think is missing the biggest trick of all
           | - they could be using cone-tracing as a first approximation
           | and then subdividing the cone into smaller and smaller chunks
           | until they're approximately pixel sized. None of this is new,
           | it's just not what the larger industry is doing right now,
           | because it's cheaper and easier for them to do rays instead.
        
           | igouy wrote:
           | > Source Code In Database
           | 
           | ?
           | 
           | https://www.google.com/books/edition/Mastering_ENVY_Develope.
           | ..
           | 
           | https://gemtalksystems.com/products/gs64/
           | 
           | https://en.wikipedia.org/wiki/MUMPS
        
             | ralferoo wrote:
             | > > Source Code In Database
             | >
             | > ?
             | 
             | Rather than think about specific closed environments (which
             | may be an unavoidable consequence of SCID), I was thinking
             | more generically about the issues. At the time, I was
             | firmly in the Java large webapp space, so I was mostly
             | thinking about how you could target a JVM.
             | 
             | In terms of the actual article, there's a bunch of links on
             | Wikipedia [1] but I specifically was referring to an older
             | version of this [2] article (I think; I don't remember the
             | colours being as garish) and I think I found it via c2 [3].
             | 
             | None of these quite match up with what I thought I
             | remembered which informed a lot of my thoughts back then,
             | but I was mostly thinking about what the UI for such a
             | system might look like because a nice friendly GUI isn't
             | necessarily optimal for an experienced programmer who's
             | probably happiest writing and seeing their code as big
             | chunks of text, and also how sometimes you want code that
             | the linter would hate because you've deliberately formatted
             | something to make it easier for humans to understand.
             | 
             | I was then thinking about how you could abstract and
             | generalise statements and sub-expressions into small mini-
             | functions that weren't complete functions per se, but more
             | like templates. I spent a long time thinking how one might
             | do code de-duplication by copying a graph of code, and then
             | changing some of the nodes in a copy-on-write style thing,
             | but decided there was no easy way of programmatically
             | deciding which parts of the tree were being fixed due to a
             | bug and needed to be shared with all copies, and which were
             | just modified inputs or local changes. In terms of code,
             | it's not that hard to do, but presenting in an intuitive
             | way in a UI is much harder, especially if one of the goals
             | is to make things easier for a novice programmer.
             | 
             | [1] https://en.wikipedia.org/wiki/Source_Code_in_Database
             | [2] https://www.mindprod.com/project/scid.html
             | [3] https://wiki.c2.com/?SourceCodeInDatabase
        
               | igouy wrote:
               | > experienced programmer who's probably happiest writing
               | and seeing their code as big chunks of text
               | 
               | Not if you're a Smalltalk programmer.
               | 
               | "ENVY/Manager augments this model by providing
               | configuration management and version control facilities.
               | All code is stored in a central database rather than in
               | files associated with a particular image. Developers are
               | continuously connected to this database; therefore
               | changes are immediately visible to all developers."
               | 
               | https://www.google.com/books/edition/Mastering_ENVY_Devel
               | ope...
        
               | ralferoo wrote:
               | I kind of feel that we're talking at cross-purposes here.
               | To me, whether the code is in a database, in memory, or
               | serialised to a file isn't all that important as they're
               | all just representations of the same AST. For me, the
               | label "source code in database" is in comparison to
               | "source code in linear files that need not have any
               | inherent structure".
               | 
               | Also, perhaps my use of the phrase "experienced
               | programmer" is being interpreted negatively. I'm not
               | trying to imply that someone who has a lot of experience
               | using a graphical programming language is less
               | experienced, I'm using it as a shorthand for "experienced
               | programmer of a traditional text-based language". I'll
               | continue to do so for this reply too, because adding that
               | caveat every time I use that shorthand makes the actual
               | meaning I was trying to convey much harder to see.
               | 
               | As regards to the writing and seeing code as text
               | comment, I meant that experienced programmers will
               | probably find writing something like
               | 
               | sin(angle) * radius + offset
               | 
               | in a textual form the quickest way of expressing that
               | idea, especially if their IDE supports auto-completion of
               | variables. Most programmers generally also prefer to
               | visualise their code the same way they wrote it, so
               | presenting it back to them as text makes sense.
               | 
               | A novice programmer might prefer to see that as a graph
               | of operator nodes because it guides them through the
               | process. Even better if they can organise the nodes in
               | the layout they want, as some people remember visually
               | and can use the distinctive look of each area of "code"
               | to navigate when zoomed out.
               | 
               | Certainly, in game development, I've seen fairly non-
               | techy people create massive Blueprint graphs for Unreal
               | this way, but they wouldn't consider themselves a
               | programmer and would be scared by the prospect of a
               | screenful of code that does the same thing. On the other
               | hand, as someone who's used text to code for decades, I
               | find Blueprints to be horrendously slow for me to
               | understand when presented with someone else's "code"
               | because there are far fewer "social code norms" being
               | obeyed and people just do whatever makes sense to them.
               | 
               | I actually think in the above example of a short
               | expression, the best solution for a novice programmer
               | probably isn't even a graph, but actually closer to the
               | text for an experienced programmer. Most people will have
               | had exposure to formulas at school, so they might well
               | prefer to see something closer to the traditional text
               | form of the source code, but with tools to guide them to
               | entering that like the "Insert Equation" that's been in
               | Word for decades - so for instance, you might insert a
               | divide symbol and then fill in the two boxes, etc.
               | 
               | The point is that both code as text and code as a graph
               | can both be used to express the same AST at the end of
               | the day, and people should be able to use whichever makes
               | most sense to them or makes them most productive. The
               | tricky part then comes when people have _chosen_ to
               | format their code or lay out their graph in a particular
               | way that adds meaning and aids understanding of the graph
               | / source code without actually affecting the AST. I'm
               | specifically talking about formatting here rather than
               | comments, which is a similar problem of how you would
               | build a code comment to a specific part of the tree in a
               | graphical view, and likewise graphical views might have
               | "visual sections" of the graph that don't necessarily map
               | neatly to a linear sequence of source lines.
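               | 
               | To make the "same AST" point concrete, here is a small
               | sketch using Python's standard `ast` module on the
               | expression from earlier. `to_tree` is a hypothetical
               | helper that only handles names, calls and binary
               | operators; it returns nested tuples that a graph view
               | could draw as operator nodes.

```python
import ast


def to_tree(node: ast.AST):
    """Reduce a parsed expression to nested (operator, operand, ...) tuples."""
    if isinstance(node, ast.Expression):
        return to_tree(node.body)
    if isinstance(node, ast.BinOp):
        return (type(node.op).__name__, to_tree(node.left), to_tree(node.right))
    if isinstance(node, ast.Call):
        return ("Call", to_tree(node.func), *[to_tree(arg) for arg in node.args])
    if isinstance(node, ast.Name):
        return node.id
    raise ValueError(f"unsupported node: {type(node).__name__}")


one_line = to_tree(ast.parse("sin(angle) * radius + offset", mode="eval"))

# The same expression, deliberately laid out across two lines for a human reader.
hand_formatted = to_tree(
    ast.parse("(sin(angle) * radius\n    + offset)", mode="eval")
)
```

               | 
               | Both calls give the same tree, which is exactly why the
               | hand-chosen layout has to be stored somewhere other than
               | the AST if it is to survive a round trip.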
        
               | igouy wrote:
               | > I'm using it as a shorthand for "experienced programmer
               | of a traditional text-based language".
               | 
               | And I was telling you that experienced Smalltalk
               | programmers don't work with "big chunks of text".
               | 
               | Small snippets of text, presented in context.
               | 
               | Sorry I'm not seeing enough clarity to wish to continue.
        
       | JonChesterfield wrote:
       | This is an interesting direction.
       | 
       | One thought is that obsidian can execute web assembly and a
       | parser / sema checker written in something that turns into wasm
       | can therefore be run on the source files. Can probably tie that
       | to a syntax highlighter style thing for in-ide feedback.
       | 
       | The other is that markdown is a tempting format for literate
       | programming. I do have some notes in obsidian that are fed to
       | cmark to produce html. With some conventions, splitting a
       | literate program into executable code embedded in a html document
       | is probably doable as an XML pipeline.
       | 
       | In a much simpler vein, I'm experimenting with machine
       | configuration from within obsidian. The local DNS server sets
       | itself up using a markdown file so editing an IP or adding a new
       | machine can be done by changing that markdown.
       | 
       | I hope the author continues down this path and writes more about
       | the experience.
        
         | nohat wrote:
         | Yeah, I definitely see using this for literate programming. Not
         | quite sure the best way to organize it. Maybe use a static site
         | compiler to auto-host a version of the documentation.
        
           | seagreen wrote:
           | I wish you could just use Obsidian Publish to host sites, but
           | due to the indentation issue you have to control the
           | rendering, which is a bummer.
           | 
           | Obsidian Digital Garden[1] is FOSS, so it might be modifiable
           | to parse and output the code pages correctly.
           | 
           | [1] https://github.com/oleeskild/obsidian-digital-garden
        
           | JonChesterfield wrote:
           | markdown -> xml -> hack around -> html is essentially a
           | static site generator. https://github.com/commonmark/cmark
           | works really well.
        
         | nbbaier wrote:
         | > In a much simpler vein, I'm experimenting with machine
         | configuration from within obsidian. The local DNS server sets
         | itself up using a markdown file so editing an IP or adding a
         | new machine can be done by changing that markdown.
         | 
         | This sounds really interesting, any code to look at anywhere or
         | write up about it?
        
           | JonChesterfield wrote:
           | Simple enough to write inline.
           | 
           | There's a file called DNS.md which contains lines like
           | `192.168.1.15 milan` in an obsidian vault. Obsidian sync
           | copies it around. DNS is by pihole which uses a plain text
           | file in that sort of format for the entries.
           | 
           | Then superuser's crontab -l
           | 
           |     0 * * * * cmp /home/jon/Documents/Obsidian/SystemControl/DNS.md /etc/pihole/custom.list >/dev/null 2>&1 || cat /home/jon/Documents/Obsidian/SystemControl/DNS.md > /etc/pihole/custom.list && sudo -u jon pihole restartdns
           | 
           | Cron has rules about relative paths that I don't remember so
           | it's literally written as above.
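           | 
           | For comparison, a sketch of the same compare-copy-restart
           | job as a Python function. `sync_dns` and its parameters are
           | hypothetical names for illustration, not the setup described
           | here. One difference is deliberate: in a POSIX shell
           | `a || b && c` groups as `(a || b) && c`, so the cron entry
           | appears to restart DNS every hour even when the files match,
           | whereas this version restarts only after a copy.

```python
import filecmp
import shutil
import subprocess
from pathlib import Path


def sync_dns(source: Path, target: Path, restart_cmd: list[str]) -> bool:
    """Copy `source` over `target` if they differ, then run `restart_cmd`.

    Returns True when a copy (and restart) happened, False when nothing changed.
    """
    # shallow=False compares file contents rather than just size and mtime.
    if target.exists() and filecmp.cmp(source, target, shallow=False):
        return False
    shutil.copyfile(source, target)
    subprocess.run(restart_cmd, check=True)
    return True


# Intended use from cron, with the paths and command taken from the entry above:
# sync_dns(
#     Path("/home/jon/Documents/Obsidian/SystemControl/DNS.md"),
#     Path("/etc/pihole/custom.list"),
#     ["sudo", "-u", "jon", "pihole", "restartdns"],
# )
```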
           | 
           | It seems likely that the idea generalises. I'm considering
           | managing public keys for ssh / wireguard in similar fashion
           | but haven't done so yet.
        
         | seagreen wrote:
         | > I hope the author continues down this path and writes more
         | about the experience.
         | 
         | I appreciate it=) I definitely want to write some more stuff
         | up, in particular how code organization changes when you can
         | tag and add attributes to definitions.
        
       | nbbaier wrote:
       | > The solution I've been waiting for is source-code-in-the-
       | database. I'm cheering on multiple projects attempting this.
       | 
       | What are the projects you're especially bullish on?
        
         | DannyBee wrote:
         | VisualAge C++, Java, and Smalltalk.
        
         | seagreen wrote:
         | Hazel and Unison are two of the big ones. I'm friends with some
         | of the Unison folks so I'm biased, but I really like how _few_
         | features there are in the language. In general I'm just a huge
         | sucker for subtractive improvement: if you can have a small
         | number of awesome things (eg abilities) instead of a bunch of
         | special case things (exception handling, monad trickery,
         | dependency injection machinery) sign me up.
         | 
         | I know less about Hazel, my understanding is that it's source-
         | code-in-CRDTs, which is definitely structured source code
         | though may not technically be in a database.
        
           | thyrsus wrote:
           | Is this the unison you mean? https://www.unison-lang.org/
           | 
           | Unrelated: what has your experience been using igneous-
           | linearizer to help understand other people's code?
        
             | seagreen wrote:
             | That's it. And the linearizer is only one way-- you write
             | text with [[links]] and turn that into plaintext.
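             | 
             | A toy illustration of that one-way direction, and not
             | the actual Igneous Linearizer algorithm: it assumes one
             | note per definition held in a dict, follows [[links]]
             | depth-first, and emits each linked definition once before
             | the code that uses it. The `linearize` name, the dict
             | layout and the emission order are all assumptions.

```python
import re

LINK = re.compile(r"\[\[([^\[\]|]+)\]\]")


def linearize(notes: dict[str, str], root: str) -> str:
    """Flatten linked notes into one plaintext source, dependencies first."""
    emitted: list[str] = []
    seen: set[str] = set()

    def visit(name: str) -> None:
        if name in seen:
            return  # already emitted, or part of a cycle currently being visited
        seen.add(name)
        body = notes[name]
        for target in LINK.findall(body):
            visit(target)
        # Strip the brackets so the output is ordinary source text.
        emitted.append(LINK.sub(r"\1", body))

    visit(root)
    return "\n\n".join(emitted)


notes = {
    "main": "print([[double]](2))",
    "double": "def double(x): return x * 2",
}
flat = linearize(notes, "main")
```

             | 
             | Going the other way, from existing plaintext back to
             | linked notes, would need the language-aware parsing
             | discussed elsewhere in the thread.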
        
       ___________________________________________________________________
       (page generated 2024-06-27 23:01 UTC)