[HN Gopher] Version Control for Structure Editing
___________________________________________________________________
Version Control for Structure Editing
Author : mepian
Score : 83 points
Date : 2021-10-19 19:03 UTC (3 hours ago)
(HTM) web link (alarmingdevelopment.org)
(TXT) w3m dump (alarmingdevelopment.org)
| lewisjoe wrote:
| The challenge with implementing this is dealing with half a dozen
| types of operations or maybe more. In typical string OT/CRDT we
| are dealing with a minimal set of operations (insert/delete) but
| when it comes to a structure (= semantic trees) the ops are very
| tailored for that semantic structure and could span and evolve
| with the structure.
|
| Even if we get the OT part right, it'd be a huge effort to port
| this to support other semantic structures with a different set of
| ops. Also I can't wrap my head around how transformations and
| conflict detections work under these cases. Will watch out for
| more from this project.
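|
| To make the contrast concrete, here is a minimal Python sketch of
| the string case, assuming only an insert operation and a tie-break
| flag. The names Insert, apply_op and transform are made up for
| illustration and are not taken from the paper. Even this needs one
| transform per pair of op types, so a tree with n op types would
| need on the order of n^2 of them.
|
```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Insert:
    pos: int    # index in the string where the text goes
    text: str


def apply_op(doc, op):
    """Apply a single insert to a string."""
    return doc[:op.pos] + op.text + doc[op.pos:]


def transform(op, other, op_has_priority):
    """Rewrite `op` so it can be applied after the concurrent `other`.

    If `other` inserted at or before our position, shift right by the
    length of its text. Ties are broken by `op_has_priority`.
    """
    if op.pos < other.pos or (op.pos == other.pos and op_has_priority):
        return op
    return Insert(op.pos + len(other.text), op.text)


# Two users insert at the same position concurrently.
doc = "abc"
a = Insert(1, "X")
b = Insert(1, "Y")

# Site 1 applies a, then the transformed b; site 2 does the reverse.
site1 = apply_op(apply_op(doc, a), transform(b, a, op_has_priority=False))
site2 = apply_op(apply_op(doc, b), transform(a, b, op_has_priority=True))
```
|
| Both sites end up with the same string. For trees, every new op
| (move, wrap, rename, ...) would need its own transform against
| every other op.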
| lewisjoe wrote:
| Also, what happens if the structure was edited in some other
| editor and you suddenly get two structures with no history to
| compare against?
| dgb23 wrote:
| Haven't read the paper thoroughly yet, looking forward to it.
| The idea here seems to be very type driven and I think there is
| something to it.
|
| The general goal reminds me of Unison[0], which takes a
| different approach. It sees code as kind of a database where
| the functions are immutable entries. So it is less granular,
| but likely more semantic.
|
| What I immediately thought of reading your comment is paredit.
| I know of the Emacs mode[1] and the Calva VSCode plugin[2]. One
| could work from there, see code evolution as a collection of
| structured editing units.
|
| And then, some languages are extremely terse like APL or Forth.
| Haven't yet found time to study them, but maybe their
| representation and semantics are more suitable for this type of
| thing?
|
| But yeah, just text might just not be the right medium for code
| in the first place. Not when we start thinking about what code
| actually is.
|
| We're manipulating structures indirectly by manipulating text.
| Something is not right here... I know there have been many
| attempts to move away from it, some are successful but only for
| specific use-cases and I don't think anything succeeded in the
| general purpose space. Maybe someone will succeed though. There
| is no reason to believe otherwise. I feel like it would have to
| be a very cross disciplinary collaboration. People who make
| games, databases, art, science. Different perspectives to break
| out of what we think programming is or should be.
|
| I watched this talk[3] some months ago. One of the cool things
| is the discussion near the end of the video at around 1h11m:
| look what Sussman does, when he talks about stratification and
| code structure - he closes his eyes. What is he seeing there?
| He explains it sure, but he _sees_ something. That's what the
| program _is_, not the text, not the bits and bytes. It's a
| deeply connected, complex, flowing structure - I think they
| talk about forests in there.
|
| When we program, we manipulate this structure and the text we
| write is kind of far away from the actual mental model we have.
| Yes, I see code in my inner eye too, but that is when I think
| about implementing it, or when I navigate actually written code
| from memory. But it's not _the thing_.
|
| [0] https://www.unisonweb.org/
|
| [1] https://www.emacswiki.org/emacs/ParEdit
|
| [2] https://calva.io/paredit/
|
| [3] "Stratified Design: A Lisp Tradition"
| https://www.youtube.com/watch?v=BoGb56k2txk
| narush wrote:
| I spent a while working on a generalized version control system
| when I graduated two years ago. It was called Saga [1]. Saga -
| get it? The name was the best bit.
|
| It allowed you to specify a "file representation format," and
| then used some messy 2d-and-above longest-common subsequence
| matching algo [2] I came up with to diff the files, and merge them
| if you wanted. It was a lovely learning experience I tried to
| pass off as a startup, and got two of my friends involved as
| cofounders.
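|
| For reference, the one-dimensional building block looks roughly
| like the Python sketch below: a textbook longest-common-subsequence
| over lines. This is not Saga's algorithm (the linked path is
| truncated), only the classic version that a "2d-and-above" variant
| would presumably generalize.
|
```python
def lcs(a, b):
    """Return one longest common subsequence of sequences a and b."""
    m, n = len(a), len(b)
    # table[i][j] = length of the LCS of a[i:] and b[j:]
    table = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m - 1, -1, -1):
        for j in range(n - 1, -1, -1):
            if a[i] == b[j]:
                table[i][j] = table[i + 1][j + 1] + 1
            else:
                table[i][j] = max(table[i + 1][j], table[i][j + 1])

    # Walk the table from the top-left to recover the subsequence.
    common = []
    i = j = 0
    while i < m and j < n:
        if a[i] == b[j]:
            common.append(a[i])
            i += 1
            j += 1
        elif table[i + 1][j] >= table[i][j + 1]:
            i += 1
        else:
            j += 1
    return common


old_lines = ["a", "b", "c"]
new_lines = ["a", "c", "d"]
kept = lcs(old_lines, new_lines)  # lines present in both versions
```
|
| Everything in old_lines but not in kept was deleted, and everything
| in new_lines but not in kept was added.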
|
| From there, we tried to focus (generalized version control is
| really hard... technically and otherwise), and pivoted to version
| control for Excel spreadsheets. At one point we had branching and
| merging working for XLSX files. But as we began to discover what
| version of Excel customers used, things got a lot less fun. That
| + lack of interest led to another pivot.
|
| Anyways, for the past 1 year (just passed!) we've been building
| Mito [3] with our learnings from all those spreadsheet folks we
| spent time with. Mito is effectively a spreadsheet within your
| Python environment. It's absolutely still getting off the ground,
| but we're pretty proud of the value we're delivering to users
| currently!
|
| [1] https://github.com/saga-vcs/saga
|
| [2] https://github.com/saga-vcs/saga/blob/master/saga/base_file/...
|
| [3] https://trymito.io/hn
| a_c wrote:
| At first glance I thought it was some kind of version control for
| a design tool, like Figma.
|
| In my experience, workflows among designers are highly variable
| and the designs rarely reflect production fidelity. I am hoping
| to have a tool to facilitate the collaboration between visual/UI
| design and engineering. Anyway, I'm getting tangential here.
| morelisp wrote:
| Just a reminder that git stores _files_, not _diffs_, and you
| can replace the merging strategy (e.g. how it handles multiple
| heads), merge driver (e.g. word vs. line based merging), and
| interactive diffing tool with anything you want. In this sense
| git is purely concerned with _version control_ (what instance do
| I have of this data and what is its provenance in regards to
| other instances), and doesn't really give a crap _how_ those
| files got there.
|
| I see a new structured editing project kicking off 3-4 times a
| year and for some reason all of them seem to start by replacing
| git. Thereby they immediately have to contend with storage,
| branching, naming, and distribution, rather than using git as an
| object store and focusing on their new editing algorithms.
|
| (There are also very real workflow issues with the snapshot
| model! But these structure editing projects don't try to address
| those either.)
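|
| As a rough sketch of that route: git lets you register a custom
| merge driver under merge.<name>.driver and select it per path in
| .gitattributes. The driver is called with the ancestor (%O),
| current (%A) and other (%B) versions, and is expected to leave the
| result in %A and exit non-zero on conflict. The Python script
| below is a minimal example under assumptions: the file name
| json_merge.py, the driver name "jsontree" and the choice of JSON
| as the tree format are all made up for illustration.
|
```python
import json
import sys

MISSING = object()  # marks a key that is absent on one side


class Conflict(Exception):
    """Both sides changed the same value in different ways."""


def merge3(base, ours, theirs, path=""):
    """Three-way merge of JSON-like values, recursing into dicts."""
    if ours == theirs:
        return ours            # same change (or no change) on both sides
    if ours == base:
        return theirs          # only their side changed
    if theirs == base:
        return ours            # only our side changed
    if all(isinstance(v, dict) for v in (base, ours, theirs)):
        merged = {}
        for key in dict.fromkeys([*base, *ours, *theirs]):
            value = merge3(base.get(key, MISSING),
                           ours.get(key, MISSING),
                           theirs.get(key, MISSING),
                           f"{path}/{key}")
            if value is not MISSING:   # MISSING means the key was deleted
                merged[key] = value
        return merged
    raise Conflict(path or "/")


def main(argv):
    base_path, ours_path, theirs_path = argv[1:4]
    docs = []
    for name in (base_path, ours_path, theirs_path):
        with open(name) as handle:
            docs.append(json.load(handle))
    try:
        merged = merge3(*docs)
    except Conflict as exc:
        print(f"structural conflict at {exc}", file=sys.stderr)
        return 1               # non-zero tells git the merge failed
    with open(ours_path, "w") as handle:
        json.dump(merged, handle, indent=2)
        handle.write("\n")
    return 0


if __name__ == "__main__" and len(sys.argv) == 4:
    sys.exit(main(sys.argv))
```
|
| It would be wired up roughly like this:
|
|   git config merge.jsontree.driver "python3 json_merge.py %O %A %B"
|   echo '*.json merge=jsontree' >> .gitattributes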
| ftomassetti wrote:
| True, indeed JetBrains MPS has its own git driver
| gnufx wrote:
| Darcs (and Pijul?) can support more patch types than textual
| diffs, but I doubt much use has ever been made of that. I don't
| know about the more general case, but it supports the extra type
| now for identifier replacement, at least as basically s/x/y/g.
| (One place where another type might be useful is changelogs, but
| I never looked at what that might take.)
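|
| The idea behind such a token-replace patch can be sketched in a
| few lines of Python. This illustrates the concept only and is not
| how Darcs implements it; replace_token is a hypothetical helper.
| Because the patch records the intent (rename x to y) rather than
| the changed lines, it can also be applied to text that a
| concurrent edit added.
|
```python
import re


def replace_token(text, old, new):
    """Replace whole-word occurrences of `old` with `new`."""
    pattern = re.compile(rf"\b{re.escape(old)}\b")
    # A function replacement avoids backslash escapes in `new`
    # being interpreted by re.sub.
    return pattern.sub(lambda _match: new, text)


base = "x = 1\nprint(x + max_x)"

# A concurrent textual edit appends a line that still uses `x`.
concurrent = base + "\nprint(x * 2)"

# Applying the rename patch afterwards also covers the new line.
merged = replace_token(concurrent, "x", "y")
```
|
| Note that max_x is left alone, since only whole tokens match.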
|
| The Toolpack tool set for Fortran from the '80s was based around
| parse trees and had a VCS, but I don't remember whether that
| actually operated on trees or just text.
| jayd16 wrote:
| git can support different diff/merge tools. I just wish more of
| git's configuration could be added to the repo itself. As it is,
| if you needed a custom merge tool (like UnityYamlMerge) you
| need each user to configure it separately.
|
| The consequence is every contributor needs to know enough about
| every file type in the repo to know if a custom merge tool
| should be added/updated. You might get surprised with a merge
| conflict in a filetype you never touched if you happen to be
| the one merging down feature branches.
|
| Hopefully some of this stuff and default client githooks are
| fixed one day. Seems easy enough to add a "suggested project
| config" to git.
| escot wrote:
| > Perhaps version control is actually the weak point of the
| textual edifice, where we might be able to win a battle.
|
| It would be interesting because as the paper says textual editing
| has great deployment and collaboration tooling. So if non textual
| could get a foothold in that exact area -- git -- it could draw a
| ton of people who just want to get things shipped.
| bob1029 wrote:
| The answer for successfully applying VCS to higher-dimensional
| spaces will demand more mathematically-elegant intermediate
| representations. Most source code files are highly structured by
| default. Image files are mostly feasible to diff as-is. Typical
| 3d models, not so much. 3d models _with_ animation, even less so.
|
| To be clear - the problem isn't that we can't detect a difference,
| it's that we cannot produce a useful view of the difference such
| that a human can make an informed decision. With
| images/audio/code, you can still extract useful knowledge as long
| as you know the shape of the difference relative to the whole,
| even if the difference itself is a meaningless mesh of colors
| between 2 image files.
|
| Writing a _useful_ diff engine for 3d models represented using
| constructive solid geometry would probably be substantially
| easier than with other approaches. I don't know if CSG is
| actually constrained to 3 dimensions either... I feel like GitHub
| actually tried to do something like this but I don't know if it
| went very far.
| bob1029 wrote:
| Here is the GH blog post I'm thinking of from 2013:
|
| https://github.blog/2013-09-17-3d-file-diffs/
| la4ry wrote:
| Of historical interest was Interlisp-D as a system that did
| structure editing and version management. It was at the beginning
| of time so getting it to work again as a practical development
| environment is a lot of work.
|
| https://github.com/Interlisp/medley/issues/533
| shrimpx wrote:
| Since the beginning of computer time people have been working on
| structure editing, because academically it's very compelling, yet
| in practice text wins out over and over. That said, there's
| probably a lot of opportunity to have "structure under the hood",
| but that's kind of a moot point in general because that's what
| linters, compilers, etc., are.
|
| But maybe his specific point about structural diffs is salient;
| that maybe there are huge wins in structural diffs that we
| haven't tapped into for some reason. Again, there are decades of
| research in structural diffing, so where's the impact?
| [deleted]
| avindroth wrote:
| It works well with lisps at the very least
| hardwaregeek wrote:
| Ooh this is exactly what I've been thinking about. Text is such a
| slow, clunky medium. It'd be interesting if you could think of
| versions as events modifying a tree. Renaming a variable and
| inserting a character would both be an event. Also I wonder if
| structural editing will take over. IDEs are already so powerful
| that if you could create good keybindings, you could do so much
| with just IDE commands (generate expr, rename var, swap args,
| etc.). Then if your editor knows that it will always keep a valid
| AST, what can you do with your tooling?
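|
| A minimal Python sketch of "versions as events modifying a tree",
| under assumptions: the Node shape and the two event types
| (RenameVar, SwapArgs) are invented for illustration and do not
| come from the paper. Each version is the result of replaying a
| list of events over the previous tree.
|
```python
import copy
from dataclasses import dataclass, field


@dataclass
class Node:
    kind: str                  # "call" or "var" in this toy tree
    value: str
    children: list = field(default_factory=list)


@dataclass(frozen=True)
class RenameVar:
    old: str
    new: str


@dataclass(frozen=True)
class SwapArgs:
    path: tuple                # child indices leading to the call node
    i: int
    j: int


def walk(node):
    """Yield the node and all of its descendants."""
    yield node
    for child in node.children:
        yield from walk(child)


def node_at(root, path):
    node = root
    for index in path:
        node = node.children[index]
    return node


def apply_event(root, event):
    """Return a new tree with the event applied; the input is untouched."""
    new_root = copy.deepcopy(root)
    if isinstance(event, RenameVar):
        for node in walk(new_root):
            if node.kind == "var" and node.value == event.old:
                node.value = event.new
    elif isinstance(event, SwapArgs):
        args = node_at(new_root, event.path).children
        args[event.i], args[event.j] = args[event.j], args[event.i]
    else:
        raise TypeError(f"unknown event: {event!r}")
    return new_root


def replay(root, events):
    for event in events:
        root = apply_event(root, event)
    return root


def render(node):
    if node.kind == "call":
        args = ", ".join(render(child) for child in node.children)
        return f"{node.value}({args})"
    return node.value


original = Node("call", "f", [Node("var", "a"), Node("var", "b")])
history = [RenameVar("a", "x"), SwapArgs((), 0, 1)]
latest = replay(original, history)
```
|
| Here render(original) gives f(a, b) and render(latest) gives
| f(b, x). The history list is what a VCS would store instead of a
| text diff.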
| solarkraft wrote:
| I really, really hope so.
|
| Text is so clunky, especially in languages with superfluous
| syntax (semicolon, braces). My tree based outliner allows me to
| easily rearrange arbitrarily large blocks while never creating
| invalid syntax, why the heck doesn't my IDE? Code is just a
| damn tree. Why can't I arbitrarily choose to comment out/in
| code without breaking basically all the IDE tooling (collapsed
| a block? Well too bad!!)?
|
| We should _never_ have to think about syntax. Yet we (or
| certainly I) do a significant portion of the time.
|
| The stuff I'm thinking of should be fairly possible to do as a
| Vscodium/VSCode plugin. Can somebody please tell me it's
| already being done?
| layer8 wrote:
| Does it really make much of a difference whether you press an
| end-of-statement keyboard shortcut vs. typing a semicolon?
|
| Having the latter as part of the source code is more
| explicit, similar to LaTeX vs. invisible formatting marks in
| a word processor.
| ModernMech wrote:
| Those semicolons are redundant but not superfluous. Here are
| some good reasons why you might want to keep them around even
| if they aren't strictly necessary in parsing your program.
|
| https://digitalmars.com/articles/b05.html
| layer8 wrote:
| I don't think that always keeping a valid AST is important.
| Realtime highlighting of syntax errors already resumes parsing
| after invalid code, usually mapping to error nodes internally.
| That is, you still have an AST, just with additional node
| types. Having an interim state with error nodes isn't really
| different from having intermediate states with temporary
| (possibly large) changes in valid code, e.g. where you
| move/cut/paste larger portions of code around, and then maybe
| decide to change it back (or just change back some parts).
| Creating a sensible history of AST operations doesn't really
| depend on whether you have error nodes in your AST grammar or
| not.
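|
| A toy Python sketch of that idea, assuming a made-up language of
| semicolon-separated "name = number" statements: anything that does
| not parse becomes an error node, and parsing resumes at the next
| statement. Real editors use far more elaborate recovery; this only
| shows that the result is still a tree.
|
```python
import re

# One statement: an identifier, "=", and an integer literal.
STATEMENT = re.compile(r"^\s*([A-Za-z_]\w*)\s*=\s*(\d+)\s*$")


def parse(source):
    """Parse statements, keeping invalid ones as ("error", text) nodes."""
    nodes = []
    for chunk in source.split(";"):
        if not chunk.strip():
            continue                      # skip empty statements
        match = STATEMENT.match(chunk)
        if match:
            nodes.append(("assign", match.group(1), int(match.group(2))))
        else:
            nodes.append(("error", chunk.strip()))
    return nodes


# The middle statement is half-typed, but its neighbours still parse.
tree = parse("a = 1; b = ; c = 3")
```
|
| An edit history over such a tree can treat the error node like any
| other node that later gets replaced.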
|
| On the other hand, allowing error nodes (i.e. invalid code) at
| least as an intermediate state arguably allows more freedom and
| creativity when editing code, and feels less coercive. It is
| also unavoidable in certain contexts, such as while typing an
| identifier, the identifier may be invalid in most intermediate
| states until you have finished typing it.
|
| Therefore I'm unconvinced that restricting editing to valid
| ASTs is (a) critical to collaborative editing and versioning,
| and (b) strictly desirable from a usability perspective.
| zwieback wrote:
| Super interesting. Instead of going whole-hog, could we add some
| kind of hinting system to existing text-based systems that would
| make structural changes known to the VCS? Maybe also make it
| clear what's a comment or other insignificant change so that the
| important changes can be tracked separately?
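|
| For Python source, a rough version of the "is this change
| significant?" check can be sketched with the standard ast module:
| comments and formatting never reach the syntax tree, so two
| versions with identical trees differ only in such details. The
| helper name same_structure is made up, and this treats docstrings
| as significant because they are part of the tree.
|
```python
import ast


def same_structure(old_source, new_source):
    """True if both sources parse to the same syntax tree."""
    # ast.dump omits line and column numbers by default, so layout
    # and comments do not affect the comparison.
    return ast.dump(ast.parse(old_source)) == ast.dump(ast.parse(new_source))


before = "x = 1  # set x\n"
reformatted = "x=1\n"
changed = "x = 2\n"

only_cosmetic = same_structure(before, reformatted)
real_change = not same_structure(before, changed)
```
|
| A commit hook or review tool could use a check like this to tag
| commits as cosmetic versus structural.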
___________________________________________________________________
(page generated 2021-10-19 23:00 UTC)