[HN Gopher] Difftastic: A diff that understands syntax
___________________________________________________________________
Difftastic: A diff that understands syntax
Author : tempodox
Score : 741 points
Date : 2022-03-29 11:38 UTC (11 hours ago)
(HTM) web link (github.com)
(TXT) w3m dump (github.com)
| jedisct1 wrote:
| No support for Zig :(
| Wilfred wrote:
| Difftastic has support for ~20 languages, and I'm happy to add
| more if there's a decent tree-sitter parser available :)
| yewenjie wrote:
| Does a `magit` plugin exist for Emacs users? The author of this
| package is also the author of a couple of popular Emacs packages
| but I did not see any mention of Emacs.
| Myrmornis wrote:
| It won't be able to form the basis of a magit plugin because it
| does not target traditional diff format.
| sytelus wrote:
| Is there a VSCode extension for this?
| neves wrote:
| Now I want a 3 way merge version :-)
| einpoklum wrote:
| The documentation says:
|
| > Difftastic output is intended for human consumption
|
| Why not separate the human-consumption part and the underlying
| parsing part? Or at least provide both in the same utility?
| Wilfred wrote:
| The underlying parser is just tree-sitter, which is a reusable
| (and excellent) parsing library.
|
| Difftastic then converts the tree-sitter parse tree to a
| simpler s-expression style format (see
| https://difftastic.wilfred.me.uk/parsing.html#simplified-syn...),
| and computes differences on that.
|
| I'm just trying to clarify that I'm not generating conventional
| 'unified diff' patches, so I can provide a nicer interface
| (e.g. line numbers).
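|
| As a rough illustration of why diffing a parsed tree can ignore
| layout, here is a minimal sketch. It is not difftastic's code: the
| `tokenize` and `parse_sexp` helpers are hypothetical, and they only
| handle plain s-expression text. Two inputs that differ only in
| whitespace parse to equal trees, while a changed atom does not.

```python
import re


def tokenize(text):
    # Split into parentheses and runs of non-space, non-paren characters.
    return re.findall(r"[()]|[^\s()]+", text)


def parse_sexp(text):
    """Parse one s-expression into nested lists (lists) and strings (atoms)."""
    tokens = tokenize(text)
    pos = 0

    def read():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            items = []
            while tokens[pos] != ")":
                items.append(read())
            pos += 1  # consume the closing ")"
            return items
        if tok == ")":
            raise ValueError("unexpected )")
        return tok

    tree = read()
    if pos != len(tokens):
        raise ValueError("trailing tokens after the first expression")
    return tree


before = "(defun add (a b) (+ a b))"
after = "(defun add (a b)\n  (+ a\n     b))"   # same code, new layout
changed = "(defun add (a b) (- a b))"           # operator changed

print(parse_sexp(before) == parse_sexp(after))
print(parse_sexp(before) == parse_sexp(changed))
```

| Comparing the nested lists instead of the raw text is what lets a
| layout-only edit disappear from the result.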
| synergy20 wrote:
| I use meld and it seems syntax aware, plus it can do a merge with a
| click. How will difftastic differ in that regard?
| berkes wrote:
| I use meld too. But afaics, meld's 'syntax aware' is very
| different from difftastic.
|
| Meld takes a diff, and applies syntax highlighting over the
| diffed files. It additionally highlights the changed characters
| in a line. Git diff, vimdiff and probably others, do this as
| well.
|
| From the demo, I understand that Difftastic first applies
| syntax and then rebuilds the patch over that. Being aware of
| line wrapping, changes in nesting, moving codeblocks into
| functions and so on.
| challenger-derp wrote:
| First thing that came to mind is diffing python notebooks.
| gh02t wrote:
| Don't think this tool supports that, but there is
| https://nbdime.readthedocs.io/en/latest/
| cycomanic wrote:
| For Jupyter Notebooks I highly recommend trying out jupytext,
| which converts Notebooks on the fly to a number of formats. It
| really has been a game changer for working with git and
| Notebooks for me. I essentially never want to preserve state of
| the notebooks anyway so converting just makes sense. The best
| thing is it is completely transparent, i.e. it generates a
| notebook file when you open the other file and saves to the
| file every time the notebook is saved. If you want to keep the
| state of the notebook you can always keep that file around as
| well.
| dmarinus wrote:
| Looks nice! Now I only need patchtasic :-)
| dotancohen wrote:
| Actually, the README addresses that!
|
| > Non-goals
| >
| > Patching. Difftastic output is intended for human consumption,
| > and it does not generate patches that you can apply later. Use
| > diff if you need a patch.
| taspeotis wrote:
| I paid and used SemanticMerge quite successfully when we had a
| complex Git workflow with lots of conflicts.
|
| https://semanticmerge.com/
|
| Since moving to short lived feature branches it is less useful to
| me.
| ziml77 wrote:
| I don't need SemanticMerge often, but when I do I'm incredibly
| thankful that I have it.
| Liquid_Fire wrote:
| SemanticMerge sounded interesting enough so I wanted to check
| it out, but to my surprise there is no Buy or Download link
| anywhere on the site. The only thing that might do it is a
| Login link, but I don't want to create an account just to see
| how much the thing costs. Is it only sold in bulk to companies?
| I find it bizarre that there isn't even a "contact sales"
| button.
| ziml77 wrote:
| That's incredibly annoying! They must have changed something
| about their pricing and sales model since the time that I had
| purchased it. I don't understand why companies think that's a
| good idea. I guess I can't recommend it anymore.
| Aeolun wrote:
| There is a 'sales' button at the bottom, but it's just a link
| to an email. I'm really not sure how they're even trying to
| sell this thing.
|
| Maybe they don't want to any more? And this is just their
| subtle way of pushing everyone interested in using it away?
| bifftastic wrote:
| I like the name.
| rednosehacker wrote:
| Any plan for the Scheme programming language?
| Wilfred wrote:
| I'd like to add it, but I haven't found any good tree-sitter
| parsers for Scheme.
| emacsen wrote:
| This looks absolutely amazing.
|
| One thing I do find interesting (and wish were different) is
| that only programming languages are supported, rather than data
| formats as well.
|
| For example, two JSON documents may be valid but formatted
| slightly differently, or a common task for me is comparing two
| YAML files.
|
| Comparing config files that have a well defined syntax and/or can
| be abstracted into a tree (JSON, YAML, TOML, etc.) would be
| absolutely lovely, even and including (if possible) Markdown and
| its ilk.
| simonw wrote:
| I would naively expect that this problem is easiest to solve
| for languages like JSON that have an unambiguous way to be
| pretty printed.
| chockchocschoir wrote:
| Indeed. One could just do `diff <(jq . "$fileOne") <(jq .
| "$fileTwo")` and you'll end up with a "nice enough" diff even
| if $fileOne and $fileTwo were very differently formatted.
| lstamour wrote:
| The problem is when a file also needs to be normalized -
| e.g. object keys in a different order, YAML syntax
| expansion. It can be very useful to indicate when a JSON
| file is identical to another JSON file but some of the
| properties or array items are out of order and that
| requires more in-depth knowledge of the data format. Let's
| not mention that you could UTF-8 encode characters or write
| out the same character using backslash notation, numeric or
| boolean data that might be wrapped in a string in one file
| but not in another, etc. There can still be a lot of
| modelling and interpretation to consider when comparing
| data files rather than code files.
| chockchocschoir wrote:
| I'm not too familiar with YAML, so can't answer to that.
|
| But re JSON:
|
| > object keys in a different order
|
| They can't be "in a different order" as JSON keys are not
| ordered. They can be whatever order, and would still be
| considered the same.
|
| > array items are out of order
|
| Then it's different, as JSON arrays are ordered. ["a",
| "b"] is not the same as ["b", "a"] while {a: 1, b: 1} and
| {b: 1, a: 1} is the same.
|
| > you could UTF-8 encode characters or write out the same
| character using backslash notation, numeric or boolean
| data that might be wrapped in a string in one file but
| not in another
|
| Then again, they are different. If the data inside is
| different, it's different.
|
| I understand that logically, they are the same, but not
| syntax-wise, which is why I included the "differently
| formatted" "disclaimer", it wouldn't obviously understand
| that "one" and "1" is the same, but then again, should
| you? Depends on use case I'd say, hard to generalize.
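|
| The distinctions above can be checked with a parser. This is a
| small sketch using Python's standard `json` module (an assumption
| for illustration; the comments above are about jq and diff, not
| Python). It compares parsed values rather than text.

```python
import json

# Object key order does not affect equality once parsed.
key_order_equal = json.loads('{"a": 1, "b": 1}') == json.loads('{"b": 1, "a": 1}')

# Array order does.
array_order_equal = json.loads('["a", "b"]') == json.loads('["b", "a"]')

# An escaped character and its literal form parse to the same string.
escape_equal = json.loads('"\\u00e9"') == json.loads('"é"')

# A number wrapped in a string is a different value from the number.
string_number_equal = json.loads('"1"') == json.loads('1')

print(key_order_equal, array_order_equal, escape_equal, string_number_equal)
```

| Whether a diff tool should treat the last case as "the same" is
| the judgment call discussed above; a parser alone won't decide it.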
| [deleted]
| stormbrew wrote:
| > They can't be "in a different order" as JSON keys are
| not ordered. They can be whatever order, and would still
| be considered the same.
|
| This is what GP is saying, I'm pretty sure. Object member
| order is non-semantic in json, so in order to do a
| semantic diff (one that understands structure), you need
| to canonicalize the order of the two sides. Simply
| diffing the output of jq doesn't do that, because (afaik)
| jq doesn't alter the order.
|
| Basically, if you want this to come up the same:
|     {"a":"b","c":"d"}
|     {"c":"d","a":"b"}
|
| you need more than just `diff <(jq) <(jq)`.
|
| Can argue about whether a tool like difftastic should do
| that, I guess, but I would personally lean towards that
| it should be smart enough to see this because it's
| precisely the sort of thing that both humans and line-
| based diff can be awful at seeing.
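|
| A minimal sketch of the canonicalize-then-diff idea, assuming
| Python's standard `json` and `difflib` modules. The helper names
| `canonical_lines` and `json_diff` are made up for this example and
| are not part of any tool mentioned in the thread.

```python
import difflib
import json


def canonical_lines(text):
    """Re-serialize JSON with sorted keys and fixed indentation."""
    return json.dumps(json.loads(text), sort_keys=True, indent=2).splitlines()


def json_diff(left, right):
    """Unified diff of the canonical forms; an empty list means no difference."""
    return list(difflib.unified_diff(
        canonical_lines(left), canonical_lines(right),
        fromfile="left", tofile="right", lineterm=""))


# Same members, different order: nothing to report after canonicalizing.
same = json_diff('{"a":"b","c":"d"}', '{"c":"d","a":"b"}')
print(same)

# A real value change still shows up.
different = json_diff('{"a":"b","c":"d"}', '{"c":"x","a":"b"}')
print("\n".join(different))
```

| This only normalizes object key order and formatting. Array order
| is left alone, since arrays are ordered, and the result is still a
| line-based diff of the canonical text.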
| fwip wrote:
| Just an FYI, jq has a flag to sort object keys by name:
| -S (--sort-keys).
| stormbrew wrote:
| Fair enough! I should just never assume jq doesn't have a
| feature.
| autarch wrote:
| I wrote a tool that tidies JSON and can do things like
| re-order keys in a fixed order -
| https://github.com/ActiveState/json-ordered-tidy
| Wilfred wrote:
| https://github.com/andreyvit/json-diff works really well for
| JSON diffing in my experience.
|
| It's more simplistic than difftastic though: it considers `1`
| and `[1]` to have nothing in common.
| [deleted]
| paxys wrote:
| This isn't going to add anything to existing diff tools for
| JSON or YAML though. Those formats barely have any syntax
| highlighting or complex structures.
| Wilfred wrote:
| JSON and CSS are supported today, and I'm interested in adding
| more structured text formats.
|
| If a format has a tree-sitter parser, it can be added to
| difftastic. The TOML tree-sitter parser looks good, but there
| isn't a mature markdown parser for tree-sitter. There are other
| markdown parsers available, so in principle difftastic could
| support markdown that way.
|
| The display logic might need a little tuning for prose-heavy
| formats like markdown though. I'm not happy with how difftastic
| handles block comments yet either.
|
| I'm not sure about formats that contain more prose, such as
| markdown or HTML.
| mark_and_sweep wrote:
| JSON is supported.
|
| HTML and XML are missing, too.
| emacsen wrote:
| You're right. I missed JSON.
|
| Sadly YAML, TOML and the others I mentioned are not there
| (yet?)
| softwarebeware wrote:
| There's always room for contributions!
| alxmrs wrote:
| Similarly, I would love it if Pandoc's AST were supported. Or,
| if this could be extended to compare any documents taking
| formatting into account, or document-to-document conversions.
| linsomniac wrote:
| I would love a great XML diff tool, and after seeing the demo
| of this I was sad to see XML not in there. Would pay for.
| d0gsg0w00f wrote:
| This is kind of like the problem of programmatically analyzing
| AWS IAM roles and policies to understand impact of changes.
| Very difficult to do in JSON format but worth tons of money to
| CISOs if it can be solved.
| LudwigNagasena wrote:
| Is there a good reason why diff tools generally don't use AST?
| skywal_l wrote:
| Performance is one, I guess.
| db48x wrote:
| Also there are a lot of languages out there, each with their
| own special and unique syntaxes.
| danbruc wrote:
| Because it is much easier, you don't have to build and maintain
| parsers for hundreds of languages. And you don't need just
| any parser, you need very robust ones that can deal with
| malformed files well. Or, if you only pick a small set of
| supported languages, your diff tool will not work on most files
| or have to fall back to a structure-agnostic algorithm. Also
| not all text files even follow any useful grammar at all.
|
| Finally, even if you have a syntax tree, that is just part of
| the solution, probably the smaller one. Detecting three lines
| of code wrapped in a new if statement is easy but also doesn't
| benefit much from a syntax-aware algorithm. But once you
| change names and signatures, extract methods, introduce
| constants, and so on it will become progressively harder to
| match subtrees and one is probably quickly approaching the
| territory of NP-hard and undecidable problems.
| RyEgswuCsn wrote:
| > And you don't need just any parser, you need very
| robust ones that can deal with malformed files well.
|
| I very much agree. I feel there has been a trend recently
| where people (re)discovered how cool and useful ASTs are and
| now expect everything to be using them. I suspect old-school
| computer scientists might be secretly laughing at this while
| programming with some Lisp-like languages they invented for
| themselves.
|
| Jokes aside, I do wonder how modern IDEs manage to parse
| broken source code into usable ASTs --- is this trivial (CS
| theory-wise) or is there a lot of engineering secret sauce
| involved to make it work?
| danbruc wrote:
| With only basic knowledge in the domain I would assume it
| is hard and ugly. If the file is malformed, there is almost
| certainly an infinite number of possible edits to make the
| file adhere to the grammar, hence there can not be any
| algorithm that just provides the one and only correct
| syntax tree. This in turn means that you have to come up
| with heuristics that identify reasonable changes which fix
| the file and that is probably not easy. Also, if you do
| this online in an IDE, the problem becomes probably easier
| [1] - if you have a valid file and then make it invalid by
| deleting an operator in the middle of some expression, you
| can still essentially use the syntax tree from just before
| the deletion. If, on the other hand, you get a malformed
| file, you might have a harder time.
|
| [1] And also harder because if you want to parse the file
| after each key stroke, you have to be fast. This probably
| also makes incremental updates to the syntax tree the
| preferred solution and that might align well with using
| prior result for error recovery.
| jhgb wrote:
| "If the file is malformed, there is almost certainly an
| infinite number of possible edits to make the file adhere
| to the grammar, hence there can not be any algorithm that
| just provides the one and only correct syntax tree. This
| in turn means that you have to come up with heuristics
| that identify reasonable changes which fix the file and
| that is probably not easy."
|
| Don't we call such heuristics "test suites"?
| danbruc wrote:
| I don't understand that question. Given the following
| source file that does not parse
|
|     var foo = bar baz
|
| there are many ways to change it and make it parse
| including the following reasonable ones
|
|     var foo = barbaz
|     var foo = "bar baz"
|     var foo = { bar, baz }
|     var foo = bar // baz
|     var foo = bar
|     //var foo = bar baz
|     var foo = bar * baz
|     var foo = bar + baz
|     var foo = bar.baz
|     var foo = bar(baz)
|
| but also unreasonable ones like
|
|     var abc = 123
|
| and therefore a parser that can handle malformed inputs
| has to make educated guesses what the input was actually
| supposed to look like. And don't be fooled by this simple
| example, imagine a long source file with deeply nested
| code in a language with curly braces and randomly
| deleting some of the braces. Now try to figure out where
| classes, methods, if or try statements begin and end in
| order to produce a [partial] syntax tree better than just
| giving up at the position of the first error.
| jhgb wrote:
| My point was that test suites should give you a heuristic
| on what corrections are good and which are bad. A source
| code change that turns a test fail into a test pass
| should be considered an improvement.
| danbruc wrote:
| I am still lost. Test suite for what? We have a parser -
| binary, source code and maybe a test suite if the parser
| developers decided to write tests - and a random text
| file that we throw at the parser and for which the parser
| hopefully generates a useful syntax tree if the content
| is a well-formed or not too badly malformed program in a
| language the parser understands.
| jhgb wrote:
| What "test suite for the parser"? Of course a test suite
| for the faulty program you're trying to correct into a
| working one.
| danbruc wrote:
| So I can only use the diff tool to compare two non-
| compiling versions of a source file if I provide a test
| suite for that file to the diff tool? And how would you
| want to make use of the test suite? Before you can run
| the test suite, the source file must already parse and
| compile which is already more than a diff tool based on a
| syntax tree requires - it must be able to parse the
| source code but it doesn't have to compile. Passing the
| test suite requires even more, not only being able to
| parse and compile but also yield the correct behavior
| which the diff tool doesn't care about.
|
| And you actually jumped over the hard part that requires
| the heuristics, how to modify the input in order to make
| it parse. Take a 10 kB source file and delete 10 random
| characters - how will you figure out which characters to
| put back where? With 100 possible characters, 10,000
| positions to insert a character, and having to insert 10
| characters, you are looking at something like 10^60
| possible modifications. You are certainly not going to
| try them one after another, each time checking if the
| modified source file parses, compiles, and passes the
| test suite.
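|
| The 10^60 estimate can be reproduced with a few lines of
| arithmetic. This sketch assumes, as the comment does, that each of
| the 10 insertions independently picks one of 100 characters and one
| of 10,000 positions; it treats the insertions as an ordered
| sequence and ignores that positions shift as characters are added.

```python
alphabet_size = 100     # possible characters
positions = 10_000      # places to insert in a 10 kB file
insertions = 10         # characters that were deleted

choices_per_insertion = alphabet_size * positions   # 10^6
total_modifications = choices_per_insertion ** insertions

print(choices_per_insertion)
print(total_modifications == 10 ** 60)
```

| Dividing by 10! to ignore the order of insertions still leaves a
| number far too large to enumerate.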
| jhgb wrote:
| > So I can only use the diff tool to compare two non-
| compiling versions of a source file if I provide a test
| suite for that file to the diff tool?
|
| Not sure what this whole straw man is about. I definitely
| didn't suggest anything like that. Of course you can only
| compare two _compiling_ versions of a source file using a
| test-suite-based heuristic. I thought this whole thing
| was about "heuristics that identify reasonable changes
| which fix the file" mentioned above? "Reasonable changes
| that DON'T fix the file" are clearly recognizable by NOT
| passing the test suite, just as if it was a human trying
| to make those changes and finding out that the change
| that he just did didn't in fact yield the desired results
| after running the test suite.
|
| > With 100 possible characters, 10,000 positions to
| insert a character, and having to insert 10 characters,
| you are looking at something like 10^60 possible
| modifications.
|
| If you're working with an AST, you're almost certainly
| not working with characters. That would be immensely
| wasteful. In fact working with an AST is pretty much the
| only way in which the set of changes is sufficiently
| reduced for almost any change to NOT be rejected
| outright. With character-level modifications, you're
| facing the problem that almost every edit will be
| outright rejected as early as at the stage of parsing.
| mekster wrote:
| > Because it is much easier, you don't have to build and
| maintain parsers for hundreds of languages.
|
| Seems there's a good open market for such a lazy reason.
| NateEag wrote:
| This tool is built on tree-sitter
| (https://tree-sitter.github.io/tree-sitter/), so presumably it
| doesn't need to maintain parsers at all.
|
| I've thought before this is how diffing should be done, and
| speculated that tree-sitter would make it more feasible.
|
| At this point, whenever I think some language-aware tool
| ought to exist, my first thought is "Does the language server
| protocol or tree-sitter make this more feasible?"
| danbruc wrote:
| Someone still has to build and maintain the parsers, you
| are just outsourcing this. And I added a bit to my comment,
| I tend to believe that parsing is the easy part, but that
| is admittedly more a gut feeling and not based on any real
| knowledge of that problem space.
| NateEag wrote:
| That's certainly a good point.
|
| Languages usually change slowly, though, so once a good
| baseline grammar is in place, maintenance is unlikely to
| be a huge load.
|
| Furthermore, with tools like tree-sitter and the language
| server protocol, multiple communities benefit from their
| continued existence, so there's a bigger pool of
| contributors to the parser.
| nanochad wrote:
| Wilfred wrote:
| It's really hard! :)
|
| (1) Parsing an arbitrary language is hard. Without tree-sitter,
| difftastic would probably be a lisp-only tool. You also want a
| parser that preserves comments.
|
| (2) Inputs may not be syntactically well formed.
|
| (3) Efficiently comparing trees is extremely difficult
| (difftastic is O(N^2) in time and memory).
|
| (4) Displaying tree diffs is equally difficult. Alignment is
| particularly challenging when your 'unchanged before' and
| 'unchanged after' are syntactically the same, but textually
| different.
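|
| To give a feel for where a quadratic cost can come from, here is
| a sketch of the textbook longest-common-subsequence table over two
| token lists. This is not difftastic's algorithm, which the comment
| above does not describe; `lcs_table` and `lcs_length` are names
| invented for this example. The point is only that the comparison
| fills a table with (N+1) x (M+1) cells.

```python
def lcs_table(a, b):
    """Fill the classic dynamic-programming table for LCS of a and b."""
    rows, cols = len(a) + 1, len(b) + 1
    table = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        for j in range(1, cols):
            if a[i - 1] == b[j - 1]:
                # Matching tokens extend the best alignment of the prefixes.
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                # Otherwise skip a token on one side or the other.
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    return table


def lcs_length(a, b):
    """Number of tokens that can be kept unchanged, in order."""
    return lcs_table(a, b)[-1][-1]


before = "if ( x ) { foo ( ) ; }".split()
after = "if ( x ) { bar ( ) ; }".split()

table = lcs_table(before, after)
cells = len(table) * len(table[0])
print(lcs_length(before, after), cells)
```

| Both time and memory grow with the number of cells, so doubling
| the size of both inputs roughly quadruples the work.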
| aasasd wrote:
| Personally I long for a syntactic merge-tool. Every time
| Syncthing hiccups for some reason, I'm up for a merge session
| with my Org-mode files, in the vein of: 'These properties look
| just like those ones, only with a different timestamp... Oh
| lookie, and the heading is totally changed. Let me merge this new
| heading all over the old one, and then pop in the old one after
| it.' Dammit, it's just a whole new heading added with properties.
| This happens with every language heavy on markup.
|
| However, I'm not sure if Org markup lends itself to structuring
| that would allow proper diffing--even with just the headings.
| teknopaul wrote:
| Be good to have different git merge strategies per file type.
|
| e.g. A merge that knows properties files support the same
| property added in different places but only once is needed. And
| another strategy if order is significant.
|
| Cool to have an HTML merge that recognises the tree structure
| and supports merging tags and having the indentation follow
| some rules.
|
| I believe git supports merge strategies, it's been on my todo
| list forever.
| loxias wrote:
| This looks really cool and I can't wait to try it, tho... a bit
| of a PITA to get running. ;) Took a while to figure out how to
| build, and had to install 400MB of dependencies first....
|
| Edit: And after installing cargo, watching it fail to build, then
| determining I must need a _newer_ version of cargo, so I built
| that from source... it fails. Apparently I need to install
| `rustc-mozilla` and not `rustc`. "obviously".
|
| This is all a testament to how much I want to try this tool...
|
| MOAR EDIT: even with rustc-mozilla cargo fails to build. running
| `cargo install difftastic` gives me an error about my version of
| cargo being too old ;.;
|
| Dear author: Let us run your tool.
| Wilfred wrote:
| The getting started section of the manual should help:
| https://difftastic.wilfred.me.uk/getting_started.html
|
| I've documented the minimum rust version required today,
| although I'm looking at lowering the minimum version.
| gkfasdfasdf wrote:
| Using ubuntu 20.04, I first installed cargo:
| curl https://sh.rustup.rs -sSf | sh
|
| Restart shell to get $HOME/.cargo/bin in PATH, then did:
| cargo install difftastic
|
| And ~4 minutes later, difft executable is ready.
|
| Agree though that some pre-built binaries would be fantastic!
| loxias wrote:
| Ah, well, if you're willing to accept having a frankensystem
| with a mix of packaged and unpackaged software, sure. ;) I
| used to do that, back in Slackware days.
|
| It's considered really sloppy and unmaintainable to admin a
| system like that. Things quickly get out of hand.
|
| That strategy _does_ work if you isolate it to a chroot or a
| container, but littering /usr/local with all sorts of locally
| compiled upstream is just asking for future pain. Security
| updates, library incompatibilities, &c.
|
| Prebuilt binaries might be nice, but I don't expect them for
| random projects. (and I wouldn't have used them if offered) I
| _do_ think it's a reasonable expectation to be able to build
| software w/o essentially setting up a new userland just for
| that tool though. :)
| gkfasdfasdf wrote:
| The method I posted above doesn't write anything to
| /usr/local. Root isn't required. Everything is written
| under ~.
| loxias wrote:
| Whoa really?
|
| I'm sorry, and retract my ignorant assumption! Going to
| try it out now.
| Wilfred wrote:
| There are a few packages available, e.g.
| https://aur.archlinux.org/packages/difftastic and
| https://pkgsrc.se/wip/difftastic.
|
| I've also had requests from Alpine Linux packagers to
| allow dynamic linking to parsers. This is something I
| want to support in future, once I'm happy with the basic
| diffing logic.
| jeremyjh wrote:
| I agree it leads to problems but isn't the entire purpose
| of `/usr/local` to be a dumping ground for locally
| administered (unpackaged) programs?
| YetAnotherNick wrote:
| Used `cargo install difftastic`? Finished in a minute for me.
| lopatin wrote:
| Build errors for me. Apparently I'm on some nightly build of
| cargo, but I need 2021 version. The pain begins...
|
| Edit: Reinstalling Cargo worked!
| skywal_l wrote:
| With rustup, it's pretty easy to update/change your cargo
| version.
| loxias wrote:
| How did you do it? When I tried to rebuild cargo I got
| build errors. I'm starting to suspect the only way to run
| this tool is make a chroot tracking sid or something....
| lopatin wrote:
| I just followed the installation instructions here:
| https://doc.rust-lang.org/cargo/getting-started/installation...
|
| It'll confirm that you want to install it, because it's
| already installed I think, and I just selected 1. for
| Yes.
| loxias wrote:
| > curl https://sh.rustup.rs -sSf | sh
|
| hard pass :)
| a_passable_dev wrote:
| Out of curiosity, what would be an acceptable way for the
| developers to provide a quick way for users to get up and
| running?
|
| A get started guide with all the required commands easily
| copy-pastable? (A popular option these days) Something
| else?
|
| I don't mean to be critical, I'm simply curious.
| adwn wrote:
| > _hard pass_
|
| Why? You're willing to run some random open source
| project, but you're not willing to run the official Rust
| installation script?
| loxias wrote:
| Sure, but first I had to figure out wtf "cargo" is. :P
|
| Also, `cargo install difftastic` AIUI pulls it from a central
| location, if I'm gonna poke at software for the first time, I
| enjoy building it myself first, so I can get my hands dirty
| in the source. :)
|
| EDIT: Also, the build fails. :(
|
|     error: unexpected token: `include_str`
|      --> /home/loxias/.cargo/registry/src/github.com-1ecc6299db9ec823/radix-heap-0.4.2/src/lib.rs:2:10
|       |
|     2 | #![doc = include_str!("../README.md")]
|       |          ^^^^^^^^^^^
|
| error: aborting due to previous error
|
| error: could not compile `radix-heap`.
|
| _sad trombone_
| Wilfred wrote:
| This looks like you're using a version of Rust older than
| the minimum required (1.56).
| vlunkr wrote:
| A huge part of the appeal of Rust and Go tools is that you can
| just ship a binary, it's frustrating that it's not available
| here.
| ducktective wrote:
| Same here. Looked into repo -> no binary in release or Github
| actions
|
| spun up an Ubuntu 18.04 instance -> git clone, git checkout
| 0.24.0
|
| installed rust using curl | sh method
|
| build fails:
|
| https://termbin.com/29xy
|
| removed the instance and gonna check it again 6 months later
| adwn wrote:
| In another comment you're asking about vim support. So let me
| get this straight: You're using vim, yet you're unable to
| resolve the error message
|
|     = note: /usr/bin/ld: cannot find Scrt1.o: No such file or directory
|             /usr/bin/ld: cannot find crti.o: No such file or directory
|
| Have you tried googling for "ubuntu crti.o: No such file or
| directory" ?
| joemi wrote:
| Using vim has nothing to do with one's ability to
| troubleshoot compiler/ubuntu issues. Plus both compiler and
| ubuntu issues can be a massive PITA to solve even if you're
| familiar with them. Personally, if I'm trying to install
| something on a whim to try it out and I start getting "no
| such file or directory" errors I'd be upset that something
| is going wrong.
| ducktective wrote:
| >Have you tried googling for "ubuntu crti.o: No such file
| or directory" ?
|
| Depending on the project, there is a certain threshold of
| trying-to-make-something-work which I'm willing to
| undertake in order to test an app.
|
| But you are right. I'm sorry if my OG comment came across as
| arrogant to the devs who do stuff for free.
|
| [edit]: ok, I tried again, `sudo apt update && sudo apt
| install build-essential` before installing rust and `cargo
| install`ing.
|
| Error again:
|
| https://dpaste.com/FTG7FSRQF
| estebank wrote:
| Funnily enough, the error is in a C dependency providing
| Haskell support.
| vendor/tree-sitter-haskell-src/scanner.cc
| goombacloud wrote:
| For easy git usage I created these two scripts in my PATH instead
| of using git config:
|
| git-difft:
|
|     #!/bin/sh
|     GIT_EXTERNAL_DIFF=difft git diff "$@"
|
| git-showt:
|
|     #!/bin/sh
|     GIT_EXTERNAL_DIFF=difft git show --ext-diff "$@"
|
| Then you can run "git difft ..." or "git showt ..." if you want
| to use it.
| buu700 wrote:
| For everyone wondering, it looks like this will work with git
| diff: https://difftastic.wilfred.me.uk/git.html.
| Starcrunch wrote:
| Exactly what I was looking for. Thanks!
| pvg wrote:
| A previous discussion from 8 months ago, with some comments by
| the author and authors of other diff tools:
|
| https://news.ycombinator.com/item?id=27768861
| dboreham wrote:
| Finally.
| 29athrowaway wrote:
| Today in generation Z rediscovers things: semantic patching.
|
| https://en.wikipedia.org/wiki/Coccinelle_(software)
| vcmiraldo wrote:
| I really like the idea of focusing on producing patches for human
| consumption. I studied the problem of merging AST-level patches
| during my PhD (https://github.com/VictorCMiraldo/hdiff) and can
| confirm: not simple! :)
| narush wrote:
| Can you give a little color on where the difficulties lie? Is
| it an efficiency question, or is determining "which changes"
| hard in the first place?
| scythmic_waves wrote:
| Not OP, but the docs call out some "Tricky Cases" [1].
|
| [1] https://difftastic.wilfred.me.uk/tricky_cases.html
| teeray wrote:
| I'd imagine there are some challenging judgement calls that
| such a tool would have to make. Like, in Go, you can
| reorder the members of a struct definition. In many cases
| this is just diff noise to reviewers. HOWEVER, it does
| impact the layout of the struct in memory, so it can be
| semantically meaningful in performance work.
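|
| As an analogy only: Go's own layout is not specified by the
| language, so this sketch uses Python's `ctypes`, which follows the
| platform's C layout rules, to show how field order can change the
| size of a struct. The `Padded` and `Reordered` types are
| hypothetical, and the exact sizes depend on the platform.

```python
import ctypes


class Padded(ctypes.Structure):
    # A small field before a large one usually forces alignment padding.
    _fields_ = [
        ("flag", ctypes.c_uint8),
        ("count", ctypes.c_uint64),
        ("kind", ctypes.c_uint8),
    ]


class Reordered(ctypes.Structure):
    # Same fields, largest first, so the small fields can share padding.
    _fields_ = [
        ("count", ctypes.c_uint64),
        ("flag", ctypes.c_uint8),
        ("kind", ctypes.c_uint8),
    ]


padded_size = ctypes.sizeof(Padded)
reordered_size = ctypes.sizeof(Reordered)
print(padded_size, reordered_size)
```

| A reviewer who only sees "fields moved" in a diff would have no
| hint that the memory footprint may have changed.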
| gmfawcett wrote:
| A wild nitpicker appears. I understand where you're
| coming from & why this matters. But Go, the language
| spec, doesn't make any guarantees about struct layout at
| all. A layout difference may be meaningful, practically,
| but it's potentially unreliable.
|
| e.g. see
| https://groups.google.com/g/golang-nuts/c/1BlZDNBLiAM
|
| Having said that: if a Go compiler for a given
| architecture decided to change its layout algorithm, I'm
| pretty sure it would earn a changelog entry.
| munk-a wrote:
| PHP long stated that associative array sorting order was
| unstable and not guaranteed (especially when the union
| (+) operator or array_merge function were involved) -
| that doesn't mean ten bazillion websites wouldn't
| instantly break if they ever actually changed the
| ordering to be unpredictable.
|
| Language designers need to contend with the fact that the
| ultimate final say in whether a thing is or not is
| whether that behavior is observed.
| zukzuk wrote:
| I wrote a masters thesis about the more general problem
| here
| (https://tspace.library.utoronto.ca/bitstream/1807/65616/11/Z...).
|
| The tl;dr is that there's an almost infinite number of
| ways to atomize/conceptualize code into meaningful
| "units" (to "register" it, in my supervisor's words), and
| the most appropriate way to do that is largely
| perspectival -- it depends on what you care about after
| the fact, and there is no single maximal way to do it up
| front.
| vanderZwan wrote:
| Early in the linked thesis there is a one-page argument about
| the shortcomings of traditional approaches, which technically
| isn't what you asked but might still answer the side of the
| question that deals with human usage at least:
|
| https://victorcmiraldo.github.io/data/MiraldoPhD.pdf#page=24
| bool3max wrote:
| Should've named that repo "phdiff".
| pdimitar wrote:
| Best pun I've heard in a long time. Well done. <3
| Groxx wrote:
| I'll vote for "diphph"
| wst_ wrote:
| It's tangential but it reminded me of "lighght" poem by
| Aram Saroyan. https://en.wikipedia.org/wiki/Aram_Saroyan#Mi
| nimalism_and_co...
| munk-a wrote:
| To be pronounced "Doctor-iff" in speech?
| einpoklum wrote:
| So I looked at the paper and it seems interesting. Basic idea:
| Instead of the operations to consider being "insert", "delete"
| and "copy", one adds "reorder", "contract subtree" and
| "duplicate" (although I didn't quite get the subtlety of copy
| vs duplicate on a short skim); and even though extra ops
| increase the search space, they actually let you search more
| effectively. I can buy that argument.
|
| The practical problem, though, is that the Haskell compiler is
| limited/buggy, so you couldn't implement this for C, and you
| settled on a small language like Lua. If you _do_ extend this
| to other languages (perhaps port your implementation from
| Haskell to something else?), please post it on HN and
| elsewhere!
| arianvanp wrote:
| Some of the GHC performance bugs that we ran into during the
| research have been fixed as far as I know! Though I'd have to
| double-check
| pluc wrote:
| good idea, but so dangerous though
| mkdirp wrote:
| Why?
| pluc wrote:
| because it's an automated piece of software making decisions
| about what is an "equal diff" and what is a "difference diff"
| because a diff no longer means just a change, it now has to
| be a _meaningful enough change_. If you removed something
| like `if (true)` or whatever, that's _still_ a diff that
| could have some importance and/or unknown consequences. I
| appreciate the value, but the fact that it allows refactoring
| to be a non-diff would worry me in the long run I think.
| Wilfred wrote:
| Difftastic is only ignoring whitespace that isn't
| significant. If you remove `if (true)`, it will get
| highlighted.
|
| With a textual diff today, your only choices are 'highlight
| all whitespace changes' (e.g. the git default) or 'ignore
| all whitespace' (e.g. diff --word-diff).
|
| If difftastic says there are no changes, then both files
| have the same parse tree and the same comments.
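|
| A rough sketch of the "same parse tree" idea, assuming Python's
| built-in ast module rather than tree-sitter. Unlike difftastic,
| ast discards comments, so this only compares structure; the
| helper name same_parse_tree is hypothetical:

```python
import ast


def same_parse_tree(left: str, right: str) -> bool:
    """Return True when both sources parse to structurally equal trees."""
    # ast.dump omits line/column attributes by default, so layout is ignored.
    return ast.dump(ast.parse(left)) == ast.dump(ast.parse(right))


wrapped = "x = [1,2,\n     3]\n"
one_line = "x = [1, 2, 3]\n"
guarded = "if True:\n    x = [1, 2, 3]\n"

print(same_parse_tree(wrapped, one_line))
print(same_parse_tree(guarded, one_line))
```

| Re-wrapping the list leaves the tree unchanged, while adding or
| removing the `if True:` wrapper changes it and would be reported.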
| gwbas1c wrote:
| Now let's get a WASM build into GitHub. :)
| thefaux wrote:
| The nine year old inside me can't unsee the unfortunate choice of
| names used in the basic example :)
| AlexAndScripts wrote:
| I would love it if version control stored an AST that also
| includes comments and dividers (where right now we would leave an
| empty line) and dev machines rendered it out however they wanted.
| They could even change the language of keywords in addition to
| normal formatting.
| pie_flavor wrote:
| This exact project is called JetBrains MPS.
| adolph wrote:
| MPS seems to be a DSL authoring tool. How would this be used
| to make an AST diff tool?
|
| https://www.jetbrains.com/mps/
|
| https://en.wikipedia.org/wiki/Abstract_syntax_tree
| glenjamin wrote:
| To do this requires some standard way of encoding an AST which
| includes comments and dividers.
|
| That standard format is commonly known as source code -
| although it lacks a normal form.
|
| Tools like prettier, gofmt and black can be thought of as a way
| to produce a normal form of source code.
|
| This is (IMO) a reasonable incremental approach towards exactly
| what you describe - if a project checks in only source code
| that's formatted using a standardised format, then you're free
| to work on it using whatever equivalent representation you like
| - as long as it's converted back at commit-time.
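|
| As a small illustration of the "normal form" idea, assuming Python
| 3.9+ and using ast.unparse as a stand-in for a formatter like
| black (it is not one, and it drops comments). The function name
| normal_form is hypothetical:

```python
import ast


def normal_form(source: str) -> str:
    """Parse the source and print it back in ast.unparse's single fixed style."""
    return ast.unparse(ast.parse(source))


messy = "def add( a,b ):\n        return a+b\n"
tidy = "def add(a, b):\n    return a + b\n"

print(normal_form(messy))
```

| Two differently formatted inputs map to the same text, so only
| that text would need to be committed and diffed.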
| Wilfred wrote:
| FWIW VCS for Smalltalk basically does this.
|
| The challenge for a tool like difftastic is that I can't
| guarantee that syntax is well-formed. You might be using new
| syntax that my parser doesn't support, you might have merge
| conflicts, or you might have a plain syntax error in your code.
|
| Tree-sitter handles parse errors gracefully, so difftastic
| handles syntax errors pretty well in my experience.
| tluyben2 wrote:
| Yep, I posted this idea on Reddit recently and people said they
| need a formatted syntax because of diff and version control; we
| do not; get the ast, reformat in the editor as the particular
| user fancies and generate diff and version control artefacts
| also as a particular user sees fit. Our computers are very fast
| so you can make a lot more different views on your code than we
| have now by using the ast instead of text and regexps.
| hyperpallium2 wrote:
| BTW IIRC the tree version of Levenshtein distance has (proven)
| terrible complexity. But so does LCS, and diff itself performs
| great in practice so maybe...
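|
| For reference, a sketch of the textbook dynamic-programming LCS,
| which takes time proportional to len(a) * len(b). This is the
| naive quadratic version, not the algorithm any particular diff
| tool uses; the function name is hypothetical:

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two sequences."""
    # prev[j] holds the LCS length of the items of a seen so far and b[:j].
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


print(lcs_length("ABCBDAB", "BDCABA"))
```

| Real inputs usually share long unchanged runs, which is one reason
| practical diffs finish far faster than the worst case suggests.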
| racl101 wrote:
| Ok I really gotta try this.
| pabs3 wrote:
| A related thing is cregit, which does diffs of tokens:
|
| https://github.com/cregit/cregit
|
| https://lwn.net/Articles/698425/
| Wilfred wrote:
| Ooh, I'd not seen this and I've seen a bunch of diff tools at
| this point! Thanks for sharing.
| ducktective wrote:
| So how can one use this in vim?
| sanity31415 wrote:
| Unfortunately it's closed source, but
| https://www.semanticmerge.com/ has been around for a few years
| and works similarly, but can also merge.
| oauea wrote:
| I just spent a few minutes on that site and I can't even figure
| out how to try it out, or their pricing, or anything other than
| some very superficial docs, really.
|
| Is this just a pretty website, or is the software actually
| available anywhere?
| Pet_Ant wrote:
| That page is just the technology primer. The tools are XDiff
| & XMerge:
|
| https://www.plasticscm.com/pricing
|
| Looks like no locally-run-binary/non-SaaS version. I was
| hoping it'd have a Sublime Text-like model. I have no interest
| in trying to get my team to switch nor having to deal with
| the security team when it turns out I was using a free cloud
| account.
| db48x wrote:
| This is written by the same guy who wrote Helpful, an enhancement
| package for the Emacs Help buffer. I highly recommend checking
| out Helpful if you haven't seen it.
| https://github.com/Wilfred/helpful
| maw wrote:
| He wrote https://github.com/Wilfred/deadgrep too. It's awesome
| and I don't know how I lived without it for so long.
| CodeIsTheEnd wrote:
| EDIT: Wilfred IS the original author [3]; my apologies.
|
| Not to discredit Wilfred (it looks like he's taken over the
| project as the maintainer), but, based on the historical
| contributions [1], it looks like it was originally developed by
| Max Brunsfeld, who also created Tree-sitter. [2]
|
| [1]: https://github.com/Wilfred/difftastic/graphs/contributors
|
| [2]: https://github.com/tree-sitter/tree-sitter
|
| [3]:
| https://github.com/Wilfred/difftastic/commit/958033924a2dea7...
| arxanas wrote:
| I think the contributor graph is misleading, and that he's
| using git-subtree to vendor tree-sitter, which makes it look
| like others have contributed more to the project.
| CodeIsTheEnd wrote:
| Oops, I think you're right! Thank you for pointing that
| out.
|
| My apologies to Wilfred.
| disgruntledphd2 wrote:
| Helpful is (pun fully intended) so very, very helpful.
|
| Honestly, I cannot imagine going back to the standard emacs
| help.
| db48x wrote:
| Agreed. It's so good it feels like it should have been that
| way all along. For example, when you view the help for a
| function Emacs has always given you a link to the source code
| where that function is defined. Helpful shows you the source
| code right in the Help buffer, and shows you a list of
| callers, and gives you buttons that enable tracing or
| debugging for the function.
|
| Once I discovered Helpful, all of those things seemed so
| obviously useful that I can't understand why nobody else
| thought to put them there, including myself.
| disgruntledphd2 wrote:
| The best part is the forget function, for when functions
| are incompatible. As an example, lsp won't work for me
| unless I forget the project-root function from ess-r (I
| have no idea why this hasn't been fixed) and helpful makes
| this a two or three key activity.
| einpoklum wrote:
| Checked out the repository.
|
| Build instructions? Nope.
|
| Minimum system requirements? Nope. But if you check out
| Cargo.toml, you'll see it says it needs Rust 1.56.
|
| My system has 1.48.0. And it's the latest Debian release! I don't
| see how a diff tool can expect you to have a bleeding-edge
| development environment. I mean, ok, you chose a new language - I
| can understand that; I won't demand that it build with just a C
| compiler and Make. But come on, this is not supposed to be just a
| toy for new systems.
|
| Anyway, I still cloned it, tried to build with "cargo build", and
| got stuck with:
|
| error: unexpected token: `include_str`
|
| it couldn't even tell me "get Rust 1.56" :-(
| tzahifadida wrote:
| how do i install on macbook to try? Can you give some
| instructions in the getting started?
| ryanianian wrote:
|     brew install rust
|     cargo install difftastic
|
| Worked for me without any problems.
| sebdufbeau wrote:
| From https://difftastic.wilfred.me.uk/getting_started.html,
| it's installed via Cargo, so if you already have Cargo
| installed it's straightforward, otherwise you can install it via
| https://doc.rust-lang.org/cargo/getting-started/installation...
| yboris wrote:
| My favorite dev tool is _diff2html_ - a CLI that opens up your
| browser with a rich diff. Pro tip: alias `diff` to the command so
| you can launch it quickly ;)
|
| https://diff2html.xyz/
| Aicy wrote:
| Looks really cool, but there were no instructions on how to
| install it.
|
| I would recommend putting an installation guide in your readme,
| and it being a full installation guide.
|
| I followed the link to your manual and then it told me to install
| your tool using a tool called "cargo" with no reference on how to
| install cargo. At this point I gave up. Lazy, maybe, but for a
| convenience tool like this I want a convenient installation.
| conradludgate wrote:
| Cargo is Rust's build tool/package manager and can be installed
| easily using rustup. But I would probably suggest the
| difftastic maintainers add some prebuilt binaries to the
| releases
|
| (I have an example workflow here if anyone from there is
| interested
| https://github.com/conradludgate/wordle/blob/main/.github/wo...)
| jwilk wrote:
| What's rustup and how do I install it?
| asicsp wrote:
| See https://rustup.rs/
| gkfasdfasdf wrote:
| This method worked for me. No root required.
| https://news.ycombinator.com/item?id=30842720
| loxias wrote:
| I think it's wonderful that there's an explosion of new
| exciting languages, it can only improve the quality of all our
| tools. I for one am looking forward to replacing my eons of
| MATLAB experience with Julia.
|
| But I wish there was more of a convention in the F/OSS
| community that if your software isn't written in something
| _universal_ (C, C++, shell and _maybe_ python), then it also
| comes with a container of all that's necessary to run it.
|
| It's frustrating to pollute my nicely packaged managed system
| with hundreds of locally installed python modules just to run
| one tool. Or, in this case, backport and rebuild a _language
| specific build tool_ simply to compile. :)
| andai wrote:
| >shell
|
| >universal
|
| * laughs in Windows, then cries *
| loxias wrote:
| I used to straddle the two worlds, maintained and supported
| a multi-site AD domain _with_ AFS integration for user
| $HOME and some sort of unholy LDAP/kerberos bridge for
| login. About once every year or two I'll miss something
| about the way Windows does things, compared to normal
| (meaning "linux"). Like the NTFS permissions model, that's
| cool.
|
| But it's just once a year :) And the last time I was deep
| in windows was win7, whenever that was. I tried to use a
| win10 machine and gave up.
|
| Besides, I thought the big new feature in modern windows
| was that WSL improved to the point you can run unix tools!
| ;)
| Spivak wrote:
| > and some sort of unholy LDAP/kerberos bridge for login
|
| It's really not that bad, the AD-IPA cross-forest trust
| is really solid as is the native sssd-ad integration if
| IPA is too much. Honestly I can't really imagine it any
| other way now, so much work has been put into AD support
| that it's actually the best login experience on Linux at
| the moment. OpenLDAP is definitely showing its age --
| dgmr I use it for all my personal infra because it's free
| and my use-cases are dead simple but we got to delete so
| much bespoke code after migrating off it at work.
| Wilfred wrote:
| FWIW I've had reports of people using difftastic on Windows
| successfully.
| simonw wrote:
| Have you used pipx? I really like it for installing Python
| tools because it automatically creates a virtual environment
| for them so that their dependencies don't affect anything
| else.
|
| https://pypa.github.io/pipx/
| loxias wrote:
| I agree with all your points.
|
| Only diff is I got to the point where it said I needed "cargo".
| On a whim, I typed "aptitude install cargo", and it did
| something. Now waiting for the >1GB source repo to clone to see
| if it works.... ;)
| childintime wrote:
| Looks like you need to install the Rust programming language
| and compile it. It worked for me. Not sure if I like the
| installation method. It seems the executable is portable
| though.
| fortran77 wrote:
| It supports Elixir and C#! Too bad it doesn't do Erlang and F#
|
| It looks very handy though. I still do a lot of C and C++
| rom1v wrote:
| It might be useful for reviewing merge/pull requests. But is
| there a way to display the diff "interleaved" instead of
| 2-columns side-by-side? (when executing `GIT_EXTERNAL_DIFF=difft
| git log -p --ext-diff` for example)
| Wilfred wrote:
| There's a basic single-column 'inline' display available if you
| do `INLINE=y`, but it's not as mature as the side-by-side
| display yet.
| DarkPlayer wrote:
| We are working on a code review tool which supports unified
| diffs with semantic diffing. If that sounds interesting for
| you, take a look at https://mergeboard.com
| foreigner wrote:
| I LOL'ed at the first page of the manual: "When it works, it's
| fantastic."
| Pet_Ant wrote:
| I was interested in SemanticMerge/XMerge but when I looked they
| didn't have a Mac client and now it looks like they don't have a
| personal edition. I just want to buy a private license and use it
| locally. https://semanticmerge.com
| password4321 wrote:
| They are requesting feedback on the pricing model for the
| latest revision of the technology, maybe HN could change their
| minds:
|
| https://www.gmaster.io/pricing
| Pet_Ant wrote:
| OS X and Linux are "wait & see" again. That describes half of
| our dev team and most of the seniors.
| kid64 wrote:
| This is great. I previously used Code Compare by Devart for this
| purpose, but it has been abandoned without support for modern
| IDEs.
| cjohansson wrote:
| If you have consistent code style and formatting this tool is
| unnecessary. I think that solution is better, you get a more
| consistent code base that is easier to read for humans. (Also
| diffs will be faster to compute)
| mcculley wrote:
| Even if you are consistent, having unchanged indented text show
| up differently is very clever. I often end up reviewing a diff
| that moves a basic block into a conditional branch and have to
| scan each line to see if it changed.
| jlokier wrote:
| If you're using a language that doesn't depend on indentation
| (C, Java, Go, Rust etc), try "diff -b" or "git diff -b".
|
| The indented basic block won't show as a difference, only the
| start and end of the block.
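|
| A sketch of roughly what `-b` does, assuming Python's difflib as
| the line matcher (not the algorithm diff or git use). Runs of
| whitespace are collapsed before comparing, so a block that was
| already indented and moves deeper still matches; the helper names
| are hypothetical:

```python
import difflib
import re


def normalize(line: str) -> str:
    """Drop trailing whitespace and treat any whitespace run as one space."""
    return re.sub(r"\s+", " ", line.rstrip())


def changed_lines(old, new):
    """List added/removed lines, comparing on whitespace-normalized text."""
    a = [normalize(line) for line in old]
    b = [normalize(line) for line in new]
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    out = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag in ("replace", "delete"):
            out += ["-" + line for line in old[i1:i2]]
        if tag in ("replace", "insert"):
            out += ["+" + line for line in new[j1:j2]]
    return out


old = ["void f() {", "    foo();", "    bar();", "}"]
new = [
    "void f() {",
    "    if (ready) {",
    "        foo();",
    "        bar();",
    "    }",
    "}",
]

for line in changed_lines(old, new):
    print(line)
```

| Only the new `if` line and its closing brace are reported; the
| re-indented calls are treated as unchanged.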
| hu3 wrote:
| interesting. Is -b equivalent to -Xignore-all-space in git?
| kortex wrote:
| I run all python through `black` and `isort`; this is still a
| huge step up in my book in terms of readability and ergonomics
| compared to the standard `git diff` or gnu `diff`.
| mkdirp wrote:
| > If you have consistent code style and formatting this tool is
| unnecessary
|
| I disagree. I struggle to replicate it right now using a simple
| test, but I've seen the following rather infuriating and
| counter intuitive behaviour from Git/GNU diff. If you have a
| simple if statement such as:
|
|     if (bla) {
|         // do something
|     }
|
| And you were to add another statement at the end, after the
| closing curly brace, e.g.:
|
|     if (bla) {
|         // do something
|     }
|     if (bla2) {
|         // do something else
|     }
|
| Git/GNU diff will sometimes show the following diff:
|
|     diff --git 1/left 2/right
|     index c2ea6f1..dc0e1c2 100644
|     --- 1/left
|     +++ 2/right
|     @@ -1,3 +1,6 @@
|      if (bla) {
|          // do something
|     +}
|     +if (bla2) {
|     +    // do something else
|      }
|
| This is a basic example, but there are other similar things. For
| a simple change like the above, this isn't a huge issue, but for
| bigger patch sets, it can take a minute to understand what is
| really going on.
| lolc wrote:
| Right, I frequently get angry at just how dumb diff really
| is. How it's greedy and can't recognize the best seams
| between blocks of code. But then when I think of simple rules
| that would improve the results, I see how they would lead to
| other problems in other places. So using syntax seems
| necessary.
| hoseja wrote:
| There _is_ an option [0] to use non-default but still built-
| in git diff algorithms that might yield better results.
|
| [0] https://git-scm.com/docs/git-diff#Documentation/git-diff.txt...
| NateEag wrote:
| I've used a few of the different git diff algorithms and
| still have had problems like these.
| cyberge99 wrote:
| Nice tool! I've used icdiff for this in the terminal, but I'll
| see how this performs in my workflow.
|
| Since I use VSCode as my editor, I created this oneliner in my
| .bash_profile:
|
|     # VS Code Diff
|     # Quote "$@" so filenames containing spaces are passed intact.
|     diffcb () { "/usr/local/bin/code" -n --diff "$@" > /dev/null 2>&1; }
|
| With it, I can "diffcb filename1.json filename2.json" to get a
| visual editor with contextual awareness based on installed lint
| modules.
| db48x wrote:
| Yes.
| rcthompson wrote:
| What does it do for unsupported languages? Just fall back to
| "regular" diff?
| Wilfred wrote:
| Yep! It does a conventional textual diff: run Myers' diff
| algorithm on lines, then word highlighting on changed lines.
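|
| A minimal sketch of that two-stage idea: diff by line first, then
| by word inside the changed lines. It assumes Python's difflib,
| whose matcher is not Myers' algorithm, and the function name is
| hypothetical rather than anything from difftastic itself:

```python
import difflib


def line_then_word_diff(old, new):
    """For each replaced line pair, return (removed_words, added_words)."""
    result = []
    lines = difflib.SequenceMatcher(None, old, new, autojunk=False)
    for tag, i1, i2, j1, j2 in lines.get_opcodes():
        if tag != "replace":
            continue  # pure inserts/deletes have no partner line to compare
        for before, after in zip(old[i1:i2], new[j1:j2]):
            old_words, new_words = before.split(), after.split()
            words = difflib.SequenceMatcher(
                None, old_words, new_words, autojunk=False
            )
            removed, added = [], []
            for wtag, a1, a2, b1, b2 in words.get_opcodes():
                if wtag in ("replace", "delete"):
                    removed += old_words[a1:a2]
                if wtag in ("replace", "insert"):
                    added += new_words[b1:b2]
            result.append((removed, added))
    return result


old = ["x = 1", "return x + 1"]
new = ["x = 1", "return x + 2"]
print(line_then_word_diff(old, new))
```

| The unchanged first line is skipped entirely, and on the changed
| line only the differing word would need highlighting.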
| rcthompson wrote:
| I wonder if it would be possible to do this in a one-column
| format. That would make it more useful in a lot of contexts where
| a super wide view isn't practical.
___________________________________________________________________
(page generated 2022-03-29 23:00 UTC)