[HN Gopher] Difftastic, a structural diff tool that understands ...
       ___________________________________________________________________
        
       Difftastic, a structural diff tool that understands syntax
        
       Author : jiripospisil
       Score  : 1092 points
       Date   : 2024-03-21 13:42 UTC (1 days ago)
        
 (HTM) web link (difftastic.wilfred.me.uk)
 (TXT) w3m dump (difftastic.wilfred.me.uk)
        
       | bloopernova wrote:
       | Related, updating difftastic and friends if you installed via
       | cargo:
       | 
       |     cargo install cargo-update
       |     cargo install-update --list
       |     cargo install-update --all
       | 
       | Other fun Rust projects available via cargo:
       | 
       | https://mise.jdx.dev/ mise-en-place, a drop-in replacement for
       | asdf https://asdf-vm.com/ that is really fast and flexible.
       | 
       | https://github.com/ajeetdsouza/zoxide is a fantastic cd
       | replacement, which stores where you cd to, and you can then do a
       | partial match like "z hel" might take you to
       | "~/projects/helloworld".
       | 
       | https://github.com/bootandy/dust is a complement to "du", shows
       | which directories are using the most disk space.
        
         | kstrauser wrote:
         | I love zoxide! Also for your list: lsd, a prettier ls.
        
           | bloopernova wrote:
           | so... many... colours!
           | 
           | Looks great, thank you for the recommendation.
        
         | IshKebab wrote:
         | ncdu is the best du replacement by far.
        
           | polygamous_bat wrote:
           | I've always used dust as a replacement, and so I am curious
           | to know if you have tried both tools: do you have thoughts on
           | what makes ncdu better?
        
             | IshKebab wrote:
             | Dust is probably the best you can get without
             | interactivity, so it's good for logs.
             | 
             | But ncdu is a fully interactive file browser that lets you
             | navigate through the tree, and crucially it lets you delete
             | things without requiring a full rescan. It's amazing for
             | freeing up disk space by deleting things you don't need
             | anymore, which is probably 95% of the reasons I run `du`.
        
         | qmmmur wrote:
         | Wow, I installed mise-en-place. It's exactly what I wanted asdf
         | to be.
        
           | bloopernova wrote:
           | It's so much faster than asdf, the dev did a really great
           | job.
        
             | drewbitt wrote:
             | The rename to mise broke my workflow for a bit and the
             | library is changing fast & adding new features frequently.
             | That's good, but I hope it can stabilize a bit, so things
             | don't get deprecated again. Still love it.
        
               | bloopernova wrote:
               | Yeah, the name "rtx" was not easy to search for. Mise is
               | a much nicer name too.
               | 
               | I am currently using it with direnv, but there's enough
               | functionality in mise to replace that too. I keep meaning
               | to spend some time making mise work without direnv, but
               | it's not been urgent since everything pretty much just
               | works now.
               | 
               | I like the active development since the dev really seems
               | to care about doing a great job, and I've been lucky
               | enough so far that my (simple) workflows haven't been
               | impacted.
        
               | prh8 wrote:
               | Do you know if there's a way to do per directory aliases
               | with Mise? Every time I see a tool for directory
               | environment, that's the one feature I'm really hoping for.
        
               | bloopernova wrote:
               | direnv + mise does exactly that. When I cd to various
               | directories I get different env vars, it's pretty neat.
               | Setting aliases would just be a case of adding them.
               | 
               | https://github.com/jdx/mise/discussions/1525 for an
               | example of how I use direnv with mise.
               | 
               | https://mise.jdx.dev/direnv.html
               | 
               | https://mise.jdx.dev/templates.html
        
         | arlort wrote:
         | Another three very neat ones are
         | 
         | - https://github.com/eza-community/eza (ls with some added
         | visual sugar)
         | 
         | - https://github.com/ClementTsang/bottom (htop but with graphs)
         | 
         | - https://github.com/sharkdp/bat (cat with syntax highlight)
        
         | satvikpendem wrote:
         | Also, cargo-binstall (cargo binary install) which allows you to
         | not have to compile every single time you cargo install and
         | instead allows you to just install the binaries for a specific
         | program. It also integrates with cargo install-update.
        
         | tomatao wrote:
         | How do these compare to https://github.com/moonrepo/proto ?
        
           | bloopernova wrote:
           | I haven't used proto so unfortunately I can't answer your
           | question, sorry about that.
        
         | letmeinhere wrote:
         | My favorite of the new `du`s is dua-cli, an ncdu clone (an
         | interactive TUI). I went hunting because the latter didn't have
         | a light-mode.
        
       | sanity wrote:
       | Interesting, I found Semantic Merge [1] years ago but it was
       | never open source.
       | 
       | This just does diff but not merge, but at least it's open source
       | - and the diffs look a _lot_ nicer, I've already made it my
       | default.
       | 
       | Any plans to extend it to merging?
       | 
       | [1] https://docs.plasticscm.com/semanticmerge
        
         | rideontime wrote:
         | Was going to suggest this myself, this was a godsend when I was
         | working with a big team on a C# project going through a messy
         | refactor.
        
         | OJFord wrote:
         | > Any plans to extend it to merging?
         | 
         | The GitHub readme:
         | 
         | > Can difftastic do merges?
         | 
         | > No. AST merging is a hard problem that difftastic does not
         | address.
         | 
         | > AST diffing is also a lossy process from the perspective of a
         | text diff. Difftastic will ignore whitespace that isn't
         | syntactically significant, but merging requires tracking
         | whitespace.
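To make the "lossy from the perspective of a text diff" point concrete, here is a minimal sketch in Python. It is not difftastic's algorithm (difftastic works on tree-sitter syntax trees); it only assumes that comparing token streams is a fair stand-in for "ignore whitespace that isn't syntactically significant". The helper name `significant_tokens` is made up for this illustration.

```python
import io
import tokenize


def significant_tokens(source: str):
    """Return (type, text) pairs for a Python source string.

    Spacing between tokens never appears in the result, and blank-line
    tokens (NL) are dropped, so two sources that differ only in that
    kind of whitespace produce identical lists.
    """
    tokens = tokenize.generate_tokens(io.StringIO(source).readline)
    return [(tok.type, tok.string) for tok in tokens if tok.type != tokenize.NL]


def same_modulo_whitespace(a: str, b: str) -> bool:
    """True when the two sources have the same token stream."""
    return significant_tokens(a) == significant_tokens(b)


if __name__ == "__main__":
    print(same_modulo_whitespace("x = 1+2\n", "x   =   1 + 2\n"))
    print(same_modulo_whitespace("x = 1+2\n", "x = 1+3\n"))
```

Once the exact spacing has been thrown away like this, it cannot be reconstructed, which is one way to see why a merge tool (which must write a file back out) has a harder job than a diff viewer.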
        
       | mlavrent wrote:
       | I'm almost not sure why tools like git don't ship with this as
       | default. Been using difft for about a year now, and my main
       | complaint is that it makes it hard to go back and use other diff
       | tools when I don't have difft available :).
       | 
       | I am curious if there's been any work on _semantic_ diff tools as
       | well (for when eg the syntax changes but the meaning is the
       | same). It seems like an intractable problem in general, but
       | maybe it's doable and/or useful for smaller DSLs or subsets of
       | some languages?
        
         | ruined wrote:
         | >I am curious if there's been any work on _semantic_ diff tools
         | as well (for when eg the syntax changes but the meaning is the
         | same).
         | 
         | if you do this your difftool becomes a compiler
        
           | hobs wrote:
           | That's exactly what I have done with diffing SQL in lazy mode
           | - just use a server and diff the AST/plan.
        
             | slotrans wrote:
             | Two semantically equivalent SQL statements can plan
             | differently...
        
               | rrrrrrrrrrrryan wrote:
               | The exact same SQL statement can plan differently if
               | table statistics change.
        
               | hobs wrote:
               | Absolutely and for this case a different plan mattered.
        
           | mlavrent wrote:
           | Sorry, I should've been clearer. I'm interested if there's
           | any tool that does this kind of thing statically, without
           | running the code. I guess a simple approach is to compile
           | both programs and see if the generated code is the same, but
           | I'd guess reasoning at the generated-code level will probably
           | produce a lot more false positives (i.e. tool will report a
           | change when there isn't one) than if you reason about the
           | original program.
        
             | jerf wrote:
             | This gets really hard, really fast. That is, yes,
             | reasonably obviously doing this completely 100% accurately
             | requires a solution to the halting problem, but even
             | getting to "useful" is _really really hard_. Even the
             | Haskell world doesn't try to solve the "equivalence of
             | functions" problem, and it's even more complicated in
             | imperative languages.
             | 
             | You probably have a mental image of catching something
             | really simple, and, yeah, "1 + 1" -> "2" is reasonably
             | easy, but in reality there aren't a lot of those super easy
             | changes. Most of the time there is _something_ confounding
             | the situation.
             | 
             | Truly neutral refactorings are pretty uncommon in their own
             | right. You can see that when someone is discussing semantic
             | versioning and pointing out that if you define a "major
             | version" as "there exists at least one possible use of the
             | code whose behavior will be changed as a result of this
             | library change", almost any API change is automatically a
             | major version change, which isn't really what anyone wants.
             | E.g., in Python, the mere fact that introspecting on an
             | object's methods will show one more method than it used to
             | isn't really what we want a major version change for. In
             | general, proving refactorings are actually 100% safe is
             | equally difficult; even simple arithmetic changes can
             | result in things overflowing at different times or in
             | different ways, it's virtually impossible to rewrite an
             | expression involving floats without the change being
             | witnessable _somehow_, extracting a function could make it
             | so that code that previously didn't overflow the stack now
             | does, memory allocation changes can be the difference
             | between OOMing and not and may interact with GC in
             | unpredictable ways if you get _really_ precise, etc.
        
               | kstrauser wrote:
               | Here's a fun related article on Indistinguishability
               | Obfuscation:
               | https://cacm.acm.org/research/indistinguishability-
               | obfuscati...
               | 
               | TL;DR verifying that 2 functions have the same output is
               | really freaking hard.
        
             | brabel wrote:
             | The Unison language (https://www.unison-lang.org/) knows
             | how to compute whether the semantic meaning of the code has
             | changed (though I don't think it's possible to get the
             | actual diff to visualize it).
             | 
             | You can edit a function you've committed into the Unison
             | code repo, and if you didn't change the semantics of the
             | function, it's actually stored under the exact same hash...
             | All places using the function refer to it by its hash, so
             | nothing needs to be recompiled either, and no tests need to
             | be rerun.
             | 
             | Things like renaming variables, reordering code whose order
             | doesn't matter (common in functional programming) and
             | things like that do NOT change the hash.
             | 
             | I believe this is only possible because Unison is a Pure
             | Functional Language. If it's not, it becomes a NP problem
             | to decide if two programs are exactly equivalent, probably.
             | 
             | I wonder if Unison could provide the actual semantic diff
             | you're thinking of, it's probably not much more complex
             | than actually knowing the meaning of the code did change.
             | Maybe create a Feature Request :)
             | https://github.com/unisonweb/unison
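The idea of a hash that survives renaming can be sketched in a few lines of Python. This is only an illustration under loud assumptions: it is not Unison's hashing scheme, it handles only simple functions, and the names `_Normalizer` and `structural_hash` are invented here. It renames parameters and assigned variables to positional placeholders and then hashes the resulting syntax tree.

```python
import ast
import hashlib


class _Normalizer(ast.NodeTransformer):
    """Replace function, parameter and local names with placeholders."""

    def __init__(self):
        self.names = {}

    def _placeholder(self, name: str) -> str:
        # The first distinct name seen becomes v0, the next v1, and so on.
        return self.names.setdefault(name, f"v{len(self.names)}")

    def visit_FunctionDef(self, node):
        node.name = "_"  # the function's own name is not part of the hash
        self.generic_visit(node)
        return node

    def visit_arg(self, node):
        node.arg = self._placeholder(node.arg)
        return node

    def visit_Name(self, node):
        # Rename names that are being bound, or that were bound earlier.
        # Free names (builtins, globals) are left alone.
        if isinstance(node.ctx, ast.Store) or node.id in self.names:
            node.id = self._placeholder(node.id)
        return node


def structural_hash(source: str) -> str:
    """Hash the normalized AST; comments and formatting never reach it."""
    tree = _Normalizer().visit(ast.parse(source))
    return hashlib.sha256(ast.dump(tree).encode()).hexdigest()


if __name__ == "__main__":
    a = "def add(x, y):\n    total = x + y\n    return total\n"
    b = "def plus(left, right):\n    # same thing\n    s = left + right\n    return s\n"
    print(structural_hash(a) == structural_hash(b))
```

This only catches renames and formatting. Treating reordered but independent definitions as equal, as described above, would need more work, and deciding equivalence of arbitrary programs is a much harder problem.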
        
           | Chris_Newton wrote:
           | _if you do this your difftool becomes a compiler_
           | 
           | Some linters and formatters are effectively compilers
           | already, so that doesn't seem completely implausible in
           | itself. Finding canonical representations of common coding
           | patterns so you can quickly and reliably determine that they
           | are equivalent is a different question, though.
        
         | otherjason wrote:
         | Difftastic is a useful tool, but in my experience, it's far too
         | slow to be suitable as the default selection for a ubiquitous
         | tool like git.
        
           | drcongo wrote:
           | I'm finding it instantaneous here on a large dirty codebase.
           | In what way is it slow for you?
        
             | acdha wrote:
             | Diff a large JSON file - I think it has to do with how
             | large a single structure is, since I notice it most with
             | static test fixtures.
        
               | drcongo wrote:
               | Interesting, thanks, I'll keep an eye out for that. I
               | rarely need to diff particularly large files.
        
               | acdha wrote:
               | Yeah, it's the kind of thing you notice when you enable
               | it as the default git diff helper and it's great almost
               | all of the time until you hit that one weird repo which
               | has gigantic data files.
        
         | kstrauser wrote:
         | I think shipping good ol' diff as the default makes sense. It's
         | going to be there already on any system you might want to run
         | git on, it's fast, it's tiny, and everyone knows the basics of
         | how to use it.
         | 
         | But I'm glad it's easy to change that default.
        
         | rob74 wrote:
         | > _I am curious if there's been any work on _semantic_ diff
         | tools as well (for when eg the syntax changes but the meaning
         | is the same)_
         | 
         | So when using such a diff tool you can spend hours refactoring
         | something, and then git will refuse to commit your changes
         | because your refactoring was successful in not changing the
         | behavior of the code? I understand what you mean, but if we
         | arrive at that point maybe we should stop calling it "diff", to
         | avoid confusion...
        
           | kstrauser wrote:
           | Git doesn't use the output of `diff` to determine whether
           | anything has changed.
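For context on why the diff tool does not matter here: Git names file contents by a hash of the bytes, so "has this changed?" is an ID comparison. A minimal sketch of how a blob ID is computed, assuming a repository using the default SHA-1 object format (the function name `git_blob_id` is just for illustration):

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Compute the object ID Git assigns to file contents (SHA-1 repos).

    Git hashes a small header, "blob <size>" followed by a NUL byte,
    and then the raw file bytes.
    """
    header = b"blob %d\0" % len(content)
    return hashlib.sha1(header + content).hexdigest()


if __name__ == "__main__":
    before = b"x = 1 + 2\n"
    after = b"x = 1+2\n"  # whitespace-only edit
    # Different bytes give different IDs, whatever a diff viewer shows.
    print(git_blob_id(before) != git_blob_id(after))
```

So a whitespace-only or purely "semantic no-op" edit still produces a different blob and can be committed, regardless of which external diff tool is configured for display.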
        
             | samatman wrote:
             | True, although not widely known it would seem.
             | 
             | It does use diff to generate patches, however. I know in
             | today's GitHub-dominated landscape, that's considered a bit
             | of a dusty feature, but it would be a pity to break it.
        
               | viraptor wrote:
               | If you want to generate patches rather than look at local
               | differences, there's a specific command for that -
               | https://git-scm.com/docs/git-format-patch
        
         | DarkPlayer wrote:
         | > I am curious if there's been any work on _semantic_ diff
         | tools as well (for when eg the syntax changes but the meaning
         | is the same)
         | 
         | We are working on https://semanticdiff.com/ which detects basic
         | semantic changes like converting a literal from decimal to hex
         | or reordering keys within JSON objects. It is not a command
         | line utility but a VS Code extension and GitHub App. You can
         | check out
         | https://semanticdiff.com/blog/semanticdiff-vs-difftastic/ if
         | you want to learn more about how it works and
         | how it differs from difftastic.
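The two examples mentioned (reordered JSON keys, decimal vs. hex literals) are easy to illustrate by comparing parsed values instead of text. A rough sketch in Python; this says nothing about how SemanticDiff itself is implemented, and the function names are invented for the example:

```python
import json


def json_equivalent(a: str, b: str) -> bool:
    """Compare two JSON documents by value instead of by text.

    Parsed objects become dicts, whose equality ignores key order;
    arrays become lists, whose order still matters.
    """
    return json.loads(a) == json.loads(b)


def int_literal_equivalent(a: str, b: str) -> bool:
    """Compare integer literals by value; base 0 accepts 0x, 0o and 0b prefixes."""
    return int(a, 0) == int(b, 0)


if __name__ == "__main__":
    print(json_equivalent('{"a": 1, "b": [1, 2]}', '{ "b": [1,2], "a": 1 }'))
    print(int_literal_equivalent("31", "0x1F"))
```

One caveat of this shortcut: Python's `==` also treats `1` and `1.0` (and `true` and `1`) as equal, so a real tool would need a stricter comparison.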
        
           | Izikiel43 wrote:
           | Thank you, you just simplified my life greatly, will use it
           | for a demo tomorrow.
        
         | more-coffee wrote:
         | I'm trying difft for git now and I also really like it. One
         | reason why I think it shouldn't be the default is that it hides
         | whitespace differences. Maybe that's configurable though,
         | haven't looked into that
        
       | pmayrgundter wrote:
       | "Do you know how to read @@ -5,6 +5,7 @@ syntax? Difftastic shows
       | the actual line numbers from your files, both before and after."
       | 
       | Preach!
       | 
       | Just dropped it in and did a git diff.. works like a charm!
        
         | neuromanser wrote:
         | > Do you know how to read @@ -5,6 +5,7 @@ syntax?
         | 
         | Do you _not_?
        
           | wffurr wrote:
           | No, I sure don't.
        
           | pmayrgundter wrote:
           | 20 years staring at it and no, i don't. i usually have to
           | work it out from context. i think if you have vi or ed
           | sensory organs it might work better for ya. which.. i still
           | chuckle that vi is the visual editor, bc ed lol
        
           | teaearlgraycold wrote:
           | Never needed to.
        
           | snthpy wrote:
           | Same here. If there's a great ELI5 explanation somewhere,
           | please post a link. Thanks
        
             | NateEag wrote:
             | The comma-separated number pairs are starting line number
             | of hunk, and the number of lines in the hunk.
             | 
             | The one starting with a minus sign is for the original
             | file, the one with the plus prefix is for the new file.
             | 
             | See https://www.gnu.org/software/diffutils/manual/html_node
             | /Deta... for a canonical source and more detail
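As a small worked example of that explanation, the sketch below parses a hunk header such as `@@ -5,6 +5,7 @@` into its four numbers. The `Hunk` type and `parse_hunk_header` name are made up for illustration; the one rule added beyond the comment above is that the unified format lets the count be omitted when it is 1.

```python
import re
from typing import NamedTuple

# "@@ -<old start>[,<old count>] +<new start>[,<new count>] @@"
HUNK_RE = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")


class Hunk(NamedTuple):
    old_start: int  # first line of the hunk in the original file
    old_count: int  # number of lines the hunk covers in the original file
    new_start: int  # first line of the hunk in the new file
    new_count: int  # number of lines the hunk covers in the new file


def parse_hunk_header(line: str) -> Hunk:
    match = HUNK_RE.match(line)
    if match is None:
        raise ValueError(f"not a hunk header: {line!r}")
    old_start, old_count, new_start, new_count = match.groups()
    # A missing count means the hunk covers exactly one line.
    return Hunk(int(old_start), int(old_count or 1),
                int(new_start), int(new_count or 1))


if __name__ == "__main__":
    print(parse_hunk_header("@@ -5,6 +5,7 @@"))
```

Read that way, `@@ -5,6 +5,7 @@` says: six lines starting at line 5 of the old file correspond to seven lines starting at line 5 of the new file, so the hunk is a net addition of one line.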
        
           | michaelcampbell wrote:
           | No one knows anything until they learn it.
        
       | hrdwdmrbl wrote:
       | It seems like a major lapse in product innovation that Github has
       | not come out with something like this. They don't even have
       | something to help you when the indentation changes, they usually
       | just show it as a giant add & remove. Their diff viewer can and
       | should be smarter.
        
         | sroussey wrote:
         | GitHub has the option to ignore whitespace in a diff.
        
           | mbork_pl wrote:
           | Which is useful, but too crude.
        
         | neuromanser wrote:
         | Github can't even recognize syntax, let alone provide semantic
         | diffs! In fact, Github can't even tell that foo.cpp.in is
         | different from foo.mk.in! Any foo.t is declared to be Perl,
         | with no way to fix it... There are decade-old tickets!
        
         | bPspGiJT8Y wrote:
         | Tree-sitter optimizes for performance (to use in editors), not
         | for correctness. In fact even TS' core developers advocate for
         | not bothering too much with correctness of grammars[1]. I
         | imagine this constraint would be a deal-breaker for GitHub or
         | anyone else in their position.
         | 
         | [1] https://github.com/tree-sitter/tree-
         | sitter/issues/130#issuec...
        
       | asicsp wrote:
       | Previous discussions:
       | 
       | https://news.ycombinator.com/item?id=27768861 _(297 points | 3
       | years ago | 61 comments)_
       | 
       | https://news.ycombinator.com/item?id=32746258 _(698 points | 2
       | years ago | 90 comments)_
       | 
       | https://news.ycombinator.com/item?id=30841244 _(983 points | 2
       | years ago | 219 comments)_
        
       | zokier wrote:
       | There is also gumtree that does ast based diffing
       | https://github.com/GumTreeDiff/gumtree
        
       | kstrauser wrote:
       | For those who don't already know, this is built on tree-sitter
       | (https://tree-sitter.github.io/tree-sitter/) which does for
       | parsing what LSP does for analysis. That is, it provides a
       | standard interface for turning code into an AST and then making
       | that AST available to clients like editors and diff tools.
       | Instead of a neat tool like this having to support dozens of
       | languages, it can just support tree-sitter and automatically work
       | with anything that tree-sitter supports. And if you're developing
       | a new language, you can create a tree-sitter parser for it, and
       | now every tool that speaks tree-sitter knows how to support your
       | language.
       | 
       | Those 2 _massive_ innovations are leading to an explosion of
       | tooling improvements like this. Now every editor, diff tool, or
       | whatever can support dozens or hundreds of languages without
       | having to duplicate all the work of every other similar tool.
       | That's freaking amazing.
        
         | bfrog wrote:
         | While I agree tree-sitter is an amazing tool, writing the
         | grammar out can be incredibly difficult I found. I tried
         | writing out a grammar and highlighting query set for vhdl with
         | tree-sitter, and found that there were a lot of difficulties in
         | expressing vhdl grammar in tree-sitter.
        
           | kstrauser wrote:
           | No argument from me on that. The upside is that one person,
           | somewhere, has to get it right one time and then we can all
           | use it.
        
             | grub5000 wrote:
             | Seems like something LLMs should be useful for, if not now
             | then soon enough
        
               | HumanOstrich wrote:
               | I think many people are exhausted (at least I am) with
               | the constant irrational exuberance of bolting AI onto
               | every technology, product, and service in existence to
               | end all of humanity's problems. It won't work like that.
        
               | germandiago wrote:
               | In fact, reminds me of the time at which they used
               | Blockchain for everything.
               | 
               | Just a bubble right now. It will come back to its natural
               | uses after it. Everyone is doing AI now and I am pretty
               | sure it is to attract investment even if some might know
               | their product will go nowhere.
        
               | dreamcompiler wrote:
               | Correction: Everybody _says_ they're doing AI now
               | because that's the magic buzzword for getting money.
               | 
               | I spent the 1990s building actual AI software, but we had
               | to call it something else because if you even whispered
               | "AI" in the 90s your funding would dry up instantly.
        
               | klabb3 wrote:
               | Someone should build a tool that augments any text with
               | current year tech buzzwords for optimal investor appeal.
               | I wonder what tech could be used for that... wait
        
               | pizza wrote:
               | tree-sitter?
        
               | mehdix wrote:
               | Not quite what you're looking for, but check out
               | bullshit.js[0].
               | 
               | [0]: https://mourner.github.io/bullshit.js/
        
               | germandiago wrote:
               | No... I mean, this is a perfect example, seriously. Made
               | me laugh.
        
         | ievans wrote:
         | Absolutely agreed, and copying from a comment I wrote last
         | year: I think the fact that tree-sitter is dependency-free is
         | worth highlighting. For context, some of my teammates maintain
         | the OCaml tree-sitter bindings and often contribute to grammars
         | as part of our work on Semgrep (Semgrep uses tree-sitter for
         | searching code and parsing queries that are code snippets
         | themselves into AST matchers).
         | 
         | Often when writing a linter, you need to bring along the
         | runtime of the language you're targeting. E.g., in python if
         | you're writing a parser using the builtin `ast` module, you
         | need to match the language version & features. So you can't
         | parse Python 3 code with Pylint running on Python 2.7, for
         | instance. This ends up being more obnoxious than you'd think at
         | first, especially if you're targeting multiple languages.
         | 
         | Before tree-sitter, using a language's built-in AST tooling was
         | often the best approach because it is guaranteed to keep up
         | with the latest syntax. IMO the genius of tree-sitter is that
         | it's made it way easier than with traditional grammars to keep
         | the language parsers updated. Highly recommend Max Brunsfeld's
         | Strange Loop talk if you want to learn more about the design
         | choices behind tree-sitter:
         | https://www.youtube.com/watch?v=Jes3bD6P0To
         | 
         | And this has resulted in a bunch of new tools built on
         | tree-sitter, off the top of my head in addition to difftastic:
         | neovim, Zed, Semgrep, and Github code search!
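The point about a built-in parser being tied to the running interpreter can be seen directly with Python's `ast` module: it only accepts the grammar of the Python version executing it. A minimal sketch (the helper name `parses_ok` is just for this example), assuming it is run under Python 3:

```python
import ast


def parses_ok(source: str) -> bool:
    """True if the running interpreter's own grammar accepts the source."""
    try:
        ast.parse(source)
    except SyntaxError:
        return False
    return True


if __name__ == "__main__":
    # Valid in the Python 3 grammar.
    print(parses_ok("print('hello')"))
    # Python 2 print statement: rejected when this runs on Python 3.
    print(parses_ok("print 'hello'"))
```

A linter built this way has to run on an interpreter whose grammar matches the target code, which is the version-matching burden described above. A standalone grammar, as with tree-sitter, avoids depending on the target language's runtime at all.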
        
           | drcongo wrote:
           | Don't forget Zed! https://zed.dev
        
             | germandiago wrote:
             | Looks great! It has lsp support for code completion?
             | Supports C++?
        
               | drcongo wrote:
               | LSP support is semi-built-in, but lots of improvements to
               | come in that area apparently to support more language
               | servers. With Python, it currently only has Pyright
               | built-in which is more of an annoyance if you're working
               | with code where the venv is inside a container but
               | there's very active tickets on their GitHub about
               | building out the LSP support. I currently use it as my
               | second editor - I have Sublime set up to be pretty much
               | perfect for my usage, but Zed is catching up fast. I find
               | I'm very fussy about editors, I can't get on with VSCode
               | at all, but I feel warm and fuzzy toward Zed - the UX is
               | great, performance superb, external LSP support is
               | probably the one feature stopping me using it as my
               | primary editor.
        
               | germandiago wrote:
               | I tried VS Code a ton of times. It is reasonably good,
               | but I am SO used to Emacs that it is almost impossible to
               | move from there for me.
               | 
               | VS Code is better at debugging and maybe slightly better
               | at remote connections, yes. But for the rest of
               | things I am way more productive with Emacs than anything
               | else.
        
           | ossusermivami wrote:
           | don't forget old man emacs is now using tree sitter
        
           | jrave wrote:
           | helix (https://helix-editor.com/) is using treesitter and LSP
           | as well
        
           | TeMPOraL wrote:
           | Okay, but how does that work with language versions? Like, if
           | I get a "C++ parser" for tree-sitter, how do I know if it's
           | C++03, C++17, C++21 or what? Last time I checked (which was
           | months ago, to be fair), this wasn't documented anywhere, nor
           | were there any apparent mechanisms to support language
           | versions and variants.
        
             | pfdietz wrote:
             | And then there's all the variants of SQL...
        
             | MathMonkeyMan wrote:
             | You can probably rely on backward compatibility of the
             | language and use the "latest." The question is, which
             | version is the grammar written against?
        
             | Arech wrote:
             | That's what I was looking at in the very beginning. Here's
             | how it unfolds: the grammar page
             | (https://github.com/tree-sitter/tree-sitter-cpp) references
             | two documents at the very end:
             | 
             | - Hyperlinked C++ BNF Grammar
             | (https://alx71hub.github.io/hcb/)
             | 
             | - EBNF Syntax: C++ (ISO/IEC 14882:1998(E))
             | https://www.externsoft.ch/download/cpp-iso.html
             | 
             | The second doc has a year in the title, so it's ancient af.
             | The first one has multiple `C++0x` red marks (whatever that
             | means; afair that's how C++11 was named before
             | standardization). It mentions `constexpr`, but doesn't know
             | `consteval`, for example. And doesn't even mention any of
             | C++11 attributes, such as [[noreturn]], so despite the
             | "Last updated: 10-Aug-2021", it's likely pre-C++11 and is
             | also ancient af and has no use in the real world.
             | 
             | Who might have thought. /s
        
         | epistasis wrote:
         | I'm imagining what I could have done in my compilers class with
         | something like tree-sitter...
         | 
         | It feels kind of as foundational as YACC.
        
           | ivanjermakov wrote:
           | It is literally an alternative to YACC and other parser
           | generators.
        
         | duped wrote:
         | I don't believe this is correct - there's no such thing as
         | "speaking tree-sitter." Every tree-sitter parser emits a
         | different concrete syntax tree, not a standard abstract syntax
         | tree.
         | 
         | LSP truly solves the M editors to N languages needing M * N
         | many integrations by using a standard interface for a query
         | oriented compiler. Tree sitter doesn't solve this problem, it
         | just makes it way easier to write N many integrations for your
         | editor/tool.
        
           | kstrauser wrote:
           | That depends on how deep you want to go with the result. I
           | use the Nova editor which uses tree-sitter for syntax
           | highlighting, and I've packaged several languages for it.
           | Each time it goes like this:
           | 
           | 1. Clone someone's tree-sitter grammar off GitHub.
           | 
           | 2. Build it into a Mac .dylib.
           | 
           | 3. Create a Nova extension that says "use this .dylib to
           | highlight that language."
           | 
           | 4. Use it.
           | 
           | I don't have to make any changes to Nova itself, and the
           | amount of configuration I have to write is so tiny that Nova
           | could have a DIY wizard if they wanted it to.
           | 
           | The source for Difftastic discussed here (at
           | https://github.com/Wilfred/difftastic/blob/master/src/parse/...)
           | is also very simple: for each of a list of supported languages,
           | import the tree-sitter parser and wrap a teensy amount of
           | configuration around it.
        
             | duped wrote:
             | > 3. Create a Nova extension that says "use this .dylib to
             | highlight that language."
             | 
             | How is that possible if the different tokens emitted by
             | tree sitter don't have standardized names? Isn't there some
             | kind of configuration that maps the rules in the grammar to
             | whatever convention Nova uses for their token names?
             | 
             | Now tree sitter does make this super easy, but my point was
             | you still have to have some kind of per-language
             | configuration/logic to work, whereas the entire point of
             | LSP is to have _none_.
        
               | kstrauser wrote:
               | That's done with the "highlights.scm" query
               | (https://tree-sitter.github.io/tree-sitter/syntax-
               | highlightin...) that maps nodes to their types with lots
               | of standard names (https://github.com/tree-sitter/tree-
               | sitter/blob/master/highl...).
               | 
               | The maintainer of the tree-sitter grammar is usually one
               | who maintains that mapping. At least, every time I've
               | wanted to use it, it's been the case that all of that was
               | already done and part of the grammar's repo.
        
         | emporas wrote:
         | Was reading about emacs and tree-sitter today [1]. Tree-sitter
         | is a force to be reckoned with.
         | 
         | [1] https://www.masteringemacs.org/article/how-to-get-started-
         | tr...
        
         | chubot wrote:
         | BTW there is interesting feedback from 4 people on a Treesitter
         | post yesterday:
         | 
         | https://news.ycombinator.com/item?id=39762495
         | 
         | (1) The top comment is from the author of difftastic (the
         | subject here), saying that the Treesitter Nim plugin can't be
         | merged, because it's 60 MB of generated C source code. There's
         | a scalability problem supporting multiple languages.
         | 
         | The author of Treesitter proposes using the WASM runtime, which
         | is new.
         | 
         | (2) The original blog post concludes with some Treesitter
         | issues, preferring Syntect (a Rust library that accepts Textmate
         | grammars)
         | 
         |  _Because of these issues I'll evaluate what highlighter to use
         | on a case-by-case basis, with Syntect as the default choice._
         | 
         | https://www.jonashietala.se/blog/2024/03/19/lets_create_a_tr...
         | 
         | Other feedback:
         | 
         | (3) _The idea of a uniform api for querying syntax trees is a
         | good one and tree-sitter deserves credit for popularizing it.
         | It's unfortunately not a great implementation of the idea_
         | 
         | (4) _[It] segfaults constantly ... More than any NPM module
         | I've ever used before. Any syntax that doesn't precisely match
         | the grammar is liable to take down your entire thread._
         | 
         | ---
         | 
         | I think some of the feedback was rude and harsh, and maybe even
         | using Treesitter outside its intended use cases. But as someone
         | who's been interested in Treesitter, but hasn't really used it,
         | it seems real.
         | 
         | One problem I see is that Treesitter is meant to be
         | incremental, so it can be used in an editor/IDE. And that's a
         | significantly harder problem than batch syntax highlighting,
         | parsing, semantic understanding.
         | 
         | ---
         | 
         | That is, difftastic is a batch tool, i.e. you run it with git
         | diff.
         | 
         | So to me the obvious thing for difftastic is to throw out the
         | GLR algorithm, and throw out the heinous external lexers
         | written in C that are constrained by it, and just use normal
         | batch parsers written in whatever language, with whatever
         | algorithm. Recursive descent.
         | 
         | These parsers can output a CST in the TreeSitter format, which
         | looks pretty simple.
         | 
         | They don't even need to be linked into the difftastic binary --
         | you could emit a CST / S-expression format and match it with
         | the text.
         | 
         | Unix style! Parsers can live in different binaries and still be
         | composed.
         | 
         | The blog post use case can also just use batch parsers that
         | output a CST. You don't need Treesitter's incremental features to
         | render HTML for your blog.
        
           | diffxx wrote:
           | As one of the harsh and rude commentators, I would say I
           | basically agree with your interpretation. You also correctly
           | inferred that I have experience with working with it in an
           | area that is arguably outside of its true use case.
           | 
           | At the same time, I believe that there needs to be a
           | corrective about what tree-sitter should and should not be
           | used for. There are companies building security products on
           | top of tree-sitter which I think is an objectively bad idea
           | given its problems and limitations. Difftastic is to me a
           | grey area because it could lead hypothetically to a security
           | issue if it generates an incorrect diff due to an incorrect
           | tree-sitter grammar. Unlikely but not impossible.
           | 
           | Your point about batch vs incremental is spot on, though even
           | for IDEs, I think incremental is usually overkill (I have
           | written a recursive descent parser for a language in C that
           | can do 3 million lines per second on a decent laptop, which is
           | about 60k lines per 20 ms, which is the window I look to for
           | reactivity). How many non-generated source files exceed say
           | 100k lines? Incremental parsing feels like taking on a lot of
           | complexity for rather limited benefit except in fairly niche
           | use cases (granting that one person's niche is another's
           | common case).
           | 
           | That being said, it is impressive that their incremental
           | algorithm works as well as it does but the cost is that
           | grammar writers are forced to mold a language grammar that
           | might not fit into the GLR algorithm. When it doesn't work as
           | expected, which is not uncommon in my experience, the error
           | messages are inscrutable and debugging either the generator
           | or the generated code is nigh impossible.
           | 
           | Most of the happy users have no idea how the sausage is made,
           | they just see the prettier syntax highlighting that works
           | with multiple tools. I get that my criticism is as welcome as
           | a wet blanket, but I just think there is something much
           | better possible which your comment hints at.
        
             | kstrauser wrote:
             | FWIW, as a happy user, I'm mainly happy that it exists at
             | all. In the short term, it reduces the work supporting M
             | editors and N languages from M*N to M+N. That's nice. More
             | importantly, it puts a bug in everyone's ear that this is a
             | good and achievable thing. Maybe the next step will be a
             | tree-sitter-API-compatible replacement that fixes some of
             | those problems and we can all migrate onto that.
             | 
             | That is, the big win is getting people to buy into the
             | concept of syntax (and analysis) as a library and not as a
             | feature of one specific editor. Once we're all spoiled by
             | that, perhaps a better implementation or a nice API will
             | come along and astound us all.
        
             | porker wrote:
             | > Your point about batch vs incremental is spot on, though
             | even for IDEs, I think incremental is usually overkill
             | 
             | I'd understood that incremental was used so that as someone
             | writes code the IDE can syntax highlight the incomplete and
             | syntactically incorrect code with better accuracy. Is that
             | not the case?
        
               | kstrauser wrote:
               | It is, but the counter argument is that parsers are
               | already so fast that streaming and all-at-once parsing
               | are indistinguishably quick on even huge files.
               | 
               | I don't believe that's true, but it's likely correct for
               | the common use case of files a few pages long, written in
               | well supported languages.
        
               | diffxx wrote:
               | I am quite sure that batch will work with good
               | responsiveness for many, if not most, common languages
               | provided source files have fewer than say 30k lines in
               | them. If you just think about the io performance of
               | modern computers, it should not be that difficult to
               | parse at 25MB/sec, which I estimate translates to between
               | 500K and 1M loc per second, which again is in the 15k-30k
               | loc range per 30ms.
               | 
               | I'm not saying that incremental is bad per se, but that
               | the choice of guaranteeing incrementalism complicates
               | things for cases where it isn't necessary. I am not super
               | familiar with lsp, but I can imagine lsp having a syntax
               | highlighting endpoint that has both batch and incremental
               | modes. A naive implementation could just run the batch
               | mode when given an incremental request and later add
               | incremental support as necessary. In other words, I think
               | it would be best if there were another layer of
               | indirection between the editor and the parser (whether
               | that is tree-sitter or another implementation).
               | 
               | Right now though, you have to opt in whole hog to the
               | tree-sitter approach. As mentioned above, incrementalism
               | has no benefit and only cost for a batch tool like
               | difftastic or semgrep to mention two named in this
               | thread.
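
The throughput figures quoted in this sub-thread are easy to sanity-check. The sketch below only redoes the arithmetic from the comments; the bytes-per-line values (25 and 50) are assumptions picked to reproduce the quoted 500K to 1M lines/sec range, and the function names are made up for illustration.

```python
# Back-of-the-envelope check of the parsing budgets quoted in the thread.
# ASSUMPTION: average line lengths of 25 and 50 bytes, chosen only because
# they reproduce the "500K to 1M loc" range mentioned in the comment.


def lines_per_second(bytes_per_second: float, bytes_per_line: float) -> float:
    """Convert a byte throughput into a line throughput."""
    return bytes_per_second / bytes_per_line


def lines_per_window(lines_per_sec: float, window_ms: float) -> float:
    """Lines a batch parser can process inside one latency window."""
    return lines_per_sec * window_ms / 1000.0


THROUGHPUT = 25_000_000  # 25 MB/s, the parse rate assumed in the comment

slow = lines_per_second(THROUGHPUT, 50)  # long lines  -> 500,000 lines/s
fast = lines_per_second(THROUGHPUT, 25)  # short lines -> 1,000,000 lines/s

budget_low = lines_per_window(slow, 30)   # lines parsed in a 30 ms window
budget_high = lines_per_window(fast, 30)

# The earlier claim in the thread: 3 million lines/s and a 20 ms window.
earlier_claim = lines_per_window(3_000_000, 20)

print(f"30 ms budget at 25 MB/s: {budget_low:,.0f} to {budget_high:,.0f} lines")
print(f"20 ms budget at 3M lines/s: {earlier_claim:,.0f} lines")
```

Under those assumptions the numbers in the comments are internally consistent; whether a real parser sustains 25 MB/s on a given machine is a separate question.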
        
               | kstrauser wrote:
               | That makes sense to me. I don't know for sure that you're
               | right but it sure seems plausible.
               | 
               | I do wonder how much of a range there is on non-brand-new
               | computers though. I'm typing this on an M2 Max with 64GB
               | of RAM. I also have a Raspberry Pi in the other room, and
               | I know from hard experience that what runs screamingly
               | fast on my Mac may be painfully slow on the Pi.
               | 
               | I could also imagine power benefits to an incremental
               | model. If I type a single character in the middle of a
               | 30KLOC document, a batch process would need to rescan the
               | entire thing where a smart incremental process could say
               | "yep, you're still in the middle of a string constant".
        
         | abdullahkhalids wrote:
         | Can one write a tree-sitter grammar for English (or any other
         | natural language), that basically labels each sentence as a
         | statement, so I can use difftastic to show changes on sentences
         | rather than visual lines?
         | 
         | This is because visual line diffs for an essay are bonkers.
         | Usually the sentence changed starts in the middle of a visual
         | line.
        
           | fragmede wrote:
           | word diff gets you halfway without that complexity
        
             | abdullahkhalids wrote:
             | Do you mean the diff system inbuilt into Microsoft Word or
             | Google docs etc?
        
               | senknvd wrote:
               | There is a --word-diff flag in git diff. It can also be
               | customized using --word-diff-regex to possibly match
               | sentences.
        
               | abdullahkhalids wrote:
               | I see. From the docs [1]
               | 
               |     --word-diff-regex=<regex>
               |         ... A match that contains a newline is silently
               |         truncated(!) at the newline.
               | 
               | If I understand this correctly, if you use newlines
               | inside a sentence (if you are writing a fixed width
               | document, for example), this won't work.
               | 
               | [1] https://git-scm.com/docs/git-diff
        
               | mzs wrote:
               | pipe through "fmt -sw999999999999" first
        
           | pxeger1 wrote:
           | The common advice[0] is to just write one sentence per line.
           | I usually split at commas etc as well. Then use editor soft
           | wrapping instead of fixing a maximum line length - but if
           | your lines get longer than the screen width that might be a
           | sign your sentences are too complex.
           | 
           | [0]: anyone have a good source for this? I'm not sure where I
           | first encountered it
        
             | meatmanek wrote:
             | https://news.ycombinator.com/item?id=31808093
        
             | bpeebles wrote:
             | https://rhodesmill.org/brandon/2012/one-sentence-per-line/
             | is one possibility. It discusses a Kernighan Unix memo
             | from 1974 advocating for the practice. "UNIX for Beginners"
             | https://web.archive.org/web/20130108163017if_/http://miffy.
             | t... (PDF)
        
             | abdullahkhalids wrote:
             | I am not a slave of the machine. The machine is my tool.
             | The machine will conform to what I want. Not the other way
             | around.
             | 
             | There is no philosophy more important in this age.
        
               | antonvs wrote:
               | In my work I encounter quite a few people beating their
               | heads against walls trying to make the machine "conform
               | to what they want".
               | 
               | You can often achieve your goals much more quickly by
               | using tools the way they best support being used.
        
               | alpaca128 wrote:
               | And often there is a middle ground. In this case one could
               | write a script that outputs a reformatted file with one
               | sentence per line. In Vim this could even be a simple
               | macro as the editor already has a key for jumping to the
               | next sentence.
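
A script along those lines could look roughly like the following. It is a minimal sketch, assuming a sentence ends at `.`, `!` or `?` followed by whitespace; abbreviations such as "e.g." will be split wrongly, and the function name is made up for illustration.

```python
import re

# ASSUMPTION: a sentence ends at '.', '!' or '?' followed by whitespace.
# This is deliberately naive; "e.g. this" would be split in the wrong place.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def one_sentence_per_line(text: str) -> str:
    """Reflow hard-wrapped prose so that each sentence sits on its own line."""
    paragraphs = re.split(r"\n\s*\n", text.strip())
    reflowed = []
    for paragraph in paragraphs:
        flat = " ".join(paragraph.split())  # undo the hard wrapping
        reflowed.append("\n".join(_SENTENCE_END.split(flat)))
    return "\n\n".join(reflowed)  # keep paragraph breaks as blank lines


sample = "The cat sat\non the mat. It purred!  Then it\nleft."
print(one_sentence_per_line(sample))
```

Running both versions of a document through such a filter before diffing means an ordinary line diff reports changes per sentence, without changing how the source file itself is wrapped.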
        
             | sovietswag wrote:
             | It turns out that there is a lot of discourse out there
             | about "semantic newlines", under a few different names. So
             | far the names I've seen are:
             | 
             | - One Sentence Per Line (OSPL)
             | - Semantic Line Breaks (SemBr)
             | - Semantic Linefeeds
             | - Ventilated Prose
             | - Semantic newlines
             | 
             | Reading through the pages below was helpful in getting a
             | better idea of what language people use to discuss this.
             | They're mostly historical retrospectives or arguments for
             | the merit of semantic newlines.
             | 
             | https://rhodesmill.org/brandon/2012/one-sentence-per-line
             | https://ramshankar.org/blog/posts/2019/semantic-line-breaks
             | https://vanemden.wordpress.com/2009/01/01/ventilated-prose
             | https://discuss.python.org/t/semantic-line-breaks/13874
             | https://discuss.python.org/t/one-sentence-per-line-for-
             | peps-...
             | https://sembr.org
             | https://asciidoctor.org/docs/asciidoc-recommended-
             | practices/...
             | 
             | (Actually I think one-sentence-per-line denotes something
             | slightly different from semantic-line-breaks, not that I
             | know what that difference is).
        
             | rokkitmensch wrote:
             | I will write excessively complex sentences whenever I darn
             | please, and will be hogtied before I stop at the whims of a
             | /diff tool/.
             | 
             | Mock outrage aside, whimsy and play in written language is
             | vastly cheaper than in industrial programming environments.
             | Provided, of course, the author can yet communicate while
             | horsing around.
        
               | lupire wrote:
               | Did you misunderstand?
               | 
               | One sentence per line doesn't mean your sentence has to
               | be limited in length.
        
               | rokkitmensch wrote:
               | > but if your lines get longer than the screen width that
               | might be a sign your sentences are too complex
        
         | joshspankit wrote:
         | How close are we to being able to copy a function in to the
         | clipboard, then highlight some lines of code and paste the
         | function around it (like highlight > quote marks)?
        
           | OJFord wrote:
           | What does it mean to paste a function around some lines of
           | code? As in what're the manual steps you do because that's
           | not possible today?
        
             | alpaca128 wrote:
             | I assume auto-wrapping a copied function signature around a
             | selected block as if it were just parentheses or something.
             | I don't think I ever needed that, but a variant of that
             | might be useful for XML where wrapping something in a pair
             | of tags is a common operation.
        
             | joshspankit wrote:
             | Mostly I would find this helpful during refactoring where
             | it's normal to move some code out in to a specific
             | function. It would also be helpful for loops, if/else,
             | validation, try/catch since you could copy the boilerplate
             | and paste it over the code block in one move.
        
           | worksonmine wrote:
           | I don't know what exactly you mean by pasting a function
           | around the selection, but you can paste selections, registers
           | or even files at specific lines with some vim-fu. If it's
           | generic enough you could write a function or even a keyboard
           | shortcut if it's very simple.
           | 
           | I have set ",',(,[,{ in visual mode to cut the selection,
           | insert the pairs, then paste it back as a very hacky solution,
           | but it gets the job done. If you want something more advanced
           | to add or change anything around the selection tpope has
           | solved that with vim-surround[1].
           | 
           | [1]: https://github.com/tpope/vim-surround
        
         | danielvaughn wrote:
         | As soon as you said tree sitter I immediately understood. Yes,
         | I can't believe I never realized that you could totally build a
         | syntax-aware VCS on top of it. That's brilliant.
         | 
         | I just wrote a language parser a few months ago in tree sitter
         | and it's probably the most delightful software I've used apart
         | from ffmpeg.
        
         | fiddlerwoaroof wrote:
         | The main issue I have with tree-sitter is that its approach
         | can't work for many languages I care about: Common Lisp cannot
         | be parsed without a full lisp implementation; Haskell's syntax
         | is complicated enough that the grammar is incomplete; C/C++
         | can't be parsed accurately if only because of the pre-
         | processor; parsing perl is Turing-complete, etc. I think the
         | suggestion elsewhere makes sense: don't make us write parsers
         | in a new ecosystem, but instead define a format for existing
         | parsers to produce as a side-output.
        
           | gwd wrote:
           | > C/C++ can't be parsed accurately if only because of the
           | pre-processor
           | 
           | Yeah, decided to check this out to see if it could help
           | review in our massive C-based project. Unfortunately, in a
           | recent patch, of the 90 "hunks", 88 of them had fallen back
           | to "normal diff" because of "$N C parse errors, exceeded
           | DFT_PARSE_ERROR_LIMIT".
        
           | amelius wrote:
           | C++ also can't be parsed like that because you need to
           | process a declaration before you know what role a symbol
           | plays in the grammar.
        
         | thaumasiotes wrote:
         | This question is coming from a place of total ignorance:
         | 
         | One appeal of the general idea of a structural diff tool, for
         | me, is ignoring the ordering of things for which ordering makes
         | no difference.
         | 
         |     x = 4
         |     y = 7
         | 
         | are independent statements and the code will be no different if
         | I replace those two statements with
         | 
         |     y = 7
         |     x = 4
         | 
         | However, this information is not actually present in the
         | abstract syntax tree. If I instead consider these two
         | statements:
         | 
         |     x += 3
         |     x *= 7
         | 
         | it is apparent that reordering them will cause changes to the
         | meaning of the code. But as far as the AST goes, this is the
         | same thing as the example where reordering was fine.
         | 
         | What kinds of things are we doing with our new AST tooling?
        
           | etbebl wrote:
           | > x = 4
           | > y = 7
           | >
           | > are independent statements and the code will be no different
           | > if I replace those two statements with
           | >
           | > y = 7
           | > x = 4
           | 
           | Not always, e.g. in a multi-threaded situation where x and y
           | are shared atomics. Then, unless we authorize C++ to take more
           | liberties in reordering, in the first example another thread
           | will never see y as 7 while x is not yet 4, but in the second
           | it can. This kind of subtlety can't be determined from syntax
           | alone.
        
             | thaumasiotes wrote:
             | OK, I tended to agree that the AST was inadequate for this
             | task. But what are we doing with it? That's most of what I
             | want from "structural code diff".
        
           | libre-man wrote:
           | What you want to determine this is not an AST, you want a
           | Program Dependence Graph (PDG), which does encode this
           | information. Creating them is nowhere near as simple as
           | creating an AST, and for many languages requires either
           | assumptions that will be broken, or result in something very
           | similar to an AST (every node has a dependency on the
           | previous node).
        
             | thaumasiotes wrote:
             | OK. What good is the AST? Why do I care about "structural
             | diffs" that don't do this?
             | 
             | The page has several examples:
             | 
             | 1. Understand what actually changed.
             | 
             | This appears to show that `guess(path,
             | guess_src).map(tsp::from_language)` has been changed to
             | `language_override.or_else(|| guess(path,
             | guess_src)).map(tsp::from_language)`. The call to `map` is
             | part of a single line of code in the old file, but has been
             | split onto a line of its own in the new file to accommodate
             | the greater complexity of the expression.
             | 
             | The bragging associated with the example is "Unlike a line-
             | oriented text diff, difftastic understands that the inner
             | expression hasn't changed here", but I don't really care
             | about that. I need to pay close attention to which bits of
             | the line have been manipulated into which positions anyway.
             | I'm more impressed by ignoring the splitting of one line
             | into several, which does seem to be a real benefit of
             | basing the diff on an AST.
             | 
             | 2. Ignore formatting changes.
             | 
             | This example shows that when I switch the source from which
             | `mockable` is imported from "../common/mockable.js" to
             | "./internal.js", the diff will actively obscure that
             | information by highlighting `mockable` and pretending that
             | `"./internal.js"` is uninteresting code that was there the
             | whole time (because it was already the source of some other
             | imports). This badly confuses a boring visual change
             | ("let's use the syntax for importing several things,
             | instead of one thing") with a very significant semantic
             | change ("let's import this module from a completely
             | different file"). I'm not comfortable with this; there must
             | be a better way to present this information than by
             | suggesting that I shouldn't be worried about it.
             | 
             | (A textual diff, in this case, has the same problem. But
             | when the pitch is that your new tool is better than a
             | textual diff because it understands the code, failing to
             | highlight an important change to the code is worse than it
             | used to be!)
             | 
             | 3. Visualize wrapping changes.
             | 
             | This shows that when I change the type of some field from
             | `String` to `Option<String>`, the diff will not highlight
             | the text "String", because that part hasn't changed. This
             | is a change from a textual diff, but it doesn't appear to
             | add much value.
             | 
             | There's a second example to do with code that belongs both
             | before and after other code, in this case an
             | opening/closing tag pair in XML, but in that case the
             | structural diff appears to be identical to a textual diff.
             | 
             | 4. Real line numbers.
             | 
             | "Do you know how to read @@ -5,6 +5,7 @@ syntax? Difftastic
             | shows the actual line numbers from your files, both before
             | and after."
             | 
             | I agree that that's a real benefit, but again it doesn't
             | seem to have anything to do with the difference between
             | textual and structural diffs.
             | 
             | ------
             | 
             | I think the conceptual appeal of a "structural diff" is
             | that it fails to highlight changes to the code that don't
             | change the behavior of the software. Difftastic clearly
             | believes something different; in the second example, they
             | are failing to highlight a change to the code that _does_
             | change the behavior of the software. And in the other
             | examples, they are failing to highlight things that haven't
             | changed from some perspectives, but could be argued to
             | have changed from other perspectives -- and that in either
             | case don't derive much benefit from not being highlighted.
             | If changing `String` to `Option<SpecialType>` produced a
             | diff that highlighted `SpecialType` in a separate color
             | from the surrounding `Option<>` wrapping, indicating that
             | the one line of code contained two relevant changes, that
             | might be interesting, but otherwise I don't see the point
             | of not highlighting the inner `String` along with the new
             | wrapping.
             | 
             | So... what _is_ the appeal of structural diffs?
        
           | MathMonkeyMan wrote:
           | In a sense, plain old diff is a structural diff. The grammar
           | is a sequence of lines of characters.
           | 
           | All tree-sitter gives you is a _different_ grammar, so that a
           | structural diff can operate on different trees given the same
           | text as diff.
           | 
           | A parse tree still doesn't know anything about the meaning of
           | a program, which is what you need to know in order to
           | determine that those assignments to x and y are unordered.
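
The distinction drawn in this sub-thread can be made concrete with a small sketch. It uses Python's standard `ast` module purely as a stand-in for "some parser"; it is not how difftastic or tree-sitter work internally, and `shape` and `run` are hypothetical helper names. The point is that a syntax tree sees through layout changes but treats every reordering as a change, whether or not the reordering matters at run time.

```python
import ast


def shape(source: str) -> str:
    """Dump the syntax tree without line/column positions."""
    return ast.dump(ast.parse(source))


def run(source: str) -> dict:
    """Execute a snippet and return the variables it defined."""
    namespace: dict = {}
    exec(source, namespace)
    return {k: v for k, v in namespace.items() if k != "__builtins__"}


# 1. Layout-only change: one expression split over several lines.
compact = "result = guess(path, src).map(convert)\n"
wrapped = "result = (\n    guess(path, src)\n    .map(convert)\n)\n"
same_tree_after_rewrap = shape(compact) == shape(wrapped)

# 2. Reordering independent statements: the tree changes, the behaviour does not.
independent_a = "x = 4\ny = 7\n"
independent_b = "y = 7\nx = 4\n"
tree_differs_independent = shape(independent_a) != shape(independent_b)
behaviour_same = run(independent_a) == run(independent_b)

# 3. Reordering dependent statements: the tree changes and so does the result.
dependent_a = "x = 1\nx += 3\nx *= 7\n"  # (1 + 3) * 7
dependent_b = "x = 1\nx *= 7\nx += 3\n"  # 1 * 7 + 3
tree_differs_dependent = shape(dependent_a) != shape(dependent_b)
behaviour_differs = run(dependent_a) != run(dependent_b)

print(same_tree_after_rewrap, tree_differs_independent, behaviour_same,
      tree_differs_dependent, behaviour_differs)
```

Cases 2 and 3 look identical at the tree level (both are "two statements swapped"); telling them apart needs dependence information that the syntax tree does not carry.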
        
         | pfdietz wrote:
         | Tree-sitter is nice, but I would like parsers that make a
         | better effort on invalid inputs. Something like an Earley parser
         | that maximizes some quality function. This would be useful for
         | parsing (for example) C and C++ where the preprocessor prevents
         | true parsing of unpreprocessed code. I understand that tree-
         | sitter is intended for interactive use in editors where it
         | can't spend too much time parsing.
        
       | adamtaylor_13 wrote:
       | Does anyone know how to enable this for .html.erb files? I found
       | it doesn't work properly in Ruby .erb files which makes it
       | fall back to just regular ol' diff behavior.
        
         | coldbrewed wrote:
         | That may require a tree-sitter implementation for erb templated
         | html; it may exist but if so it's less of a mainstream thing.
         | 
         | Some quick googling turns up
         | https://github.com/tree-sitter/tree-sitter-embedded-template
         | which may or may not meet your needs.
        
       | mihaigalos wrote:
       | Nice tool. Also relevant: https://github.com/dandavison/delta
        
       | adamc wrote:
       | Doesn't seem to have a Debian install.
        
         | pas wrote:
         | https://github.com/Wilfred/difftastic/issues/560 help wanted :)
        
       | Night_Thastus wrote:
       | No MSYS install, sadly. :(
        
         | quasarj wrote:
         | It's just a cargo package. Is there a working rust/cargo
         | toolchain under MSYS?
        
           | Night_Thastus wrote:
           | Yes, there is one here:
           | https://packages.msys2.org/base/mingw-w64-rust
           | 
           | Looks like it wasn't all that hard. Honestly the harder part
           | is getting my shell to remember the changes to PATH.
        
       | mnw21cam wrote:
       | No package for Debian-like systems yet.
        
       | aus10d wrote:
       | Really cool idea!
        
       | airstrike wrote:
       | Fantastic tool. Now we just need the vscode extension ;-)
        
         | modernerd wrote:
         | SemanticDiff is probably the closest for now, although I don't
         | think it uses tree-sitter.
         | 
         | https://semanticdiff.com/
         | 
         | Found via https://github.com/Wilfred/difftastic/issues/194.
        
       | pjturpeau wrote:
       | It seems to be a great tool, however on the few checks I did on
       | big XML files, it shows modified lines in normal green and
       | modified attributes in bold green, which makes them difficult to
       | detect visually.
       | 
       | I didn't find in the documentation how it is possible to change
       | the style of the diff, or to ask for another color in the bold
       | case.
       | 
       | Any idea?
        
         | jez wrote:
         | Unfortunately it doesn't appear to allow customizing colors
         | yet.
         | 
         | I chimed in on this issue[1] to express support for that. You
         | may wish to also chime in on that one, or open a new issue if
         | you think that the feature you're looking for is sufficiently
         | different from the one discussed in that thread.
         | 
         | [1]: https://github.com/Wilfred/difftastic/issues/611
        
       | blackfawn wrote:
       | Difftastic seems really nice! Unfortunately it shows some changed
       | binary files which makes it sort of unusable. `file` reports
       | these files as "ELF 64-bit LSB shared object, x86-64, version 1
       | (SYSV), dynamically linked, stripped" and the MIME type/encoding
       | is "application/x-sharedlib; charset=binary" so not sure why
       | difftastic is trying to show them as thousands of changed lines
       | of text...
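       | 
       | For reference, a minimal sketch of the kind of check that would
       | skip such files. This is an assumption about a reasonable
       | heuristic (ELF magic bytes, or a NUL byte near the start), not a
       | description of what difftastic actually does; the helper names
       | and the 8000-byte window are made up for illustration.

```python
# Hypothetical helpers, not difftastic's actual detection logic.
ELF_MAGIC = b"\x7fELF"
SNIFF_BYTES = 8000  # how much of the file to inspect; an arbitrary choice


def looks_binary(head: bytes) -> bool:
    """Guess whether content is binary from its first bytes."""
    sample = head[:SNIFF_BYTES]
    # ELF objects (like the shared libraries above) start with fixed magic bytes.
    if sample.startswith(ELF_MAGIC):
        return True
    # ASCII/UTF-8 text does not normally contain NUL bytes.
    # (UTF-16 text does, so this heuristic would misclassify it.)
    return b"\x00" in sample


def file_looks_binary(path: str) -> bool:
    """Read only the start of the file and apply the heuristic."""
    with open(path, "rb") as f:
        return looks_binary(f.read(SNIFF_BYTES))
```

       | A wrapper script could call file_looks_binary() on each path
       | and only hand the text files to difft.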
        
       | nibab wrote:
       | This is great! I wish my PR review tools allowed me to plug in
       | something like this. Hopefully one day we will go back to the
       | world of customizable/plugin-based software. Most of my web tools
       | are very prescriptive about the user experience and don't let me
       | tailor my tools.
        
       | sanxchit wrote:
       | What an amazing tool, wish it had a GUI version as well.
        
         | layer8 wrote:
         | From the screenshot examples in the readme, I'm not sure how
         | substantial the benefits are over GUI tools like Kdiff3 or
         | WinMerge that have existed for ages.
        
           | sanxchit wrote:
           | I've used WinMerge, it's very different from the above tool
           | because it's still a text diff. It often gets the diffs wrong
           | when multiple lines are involved. Take this trivial C#
           | example: https://www.diffchecker.com/gr0H7qA1/
           | 
           | Most diff tools throw out something that is quite difficult
           | to parse, but difftastic gave me the most concise diff so
           | far.
        
       | abledon wrote:
       | Only found out about this because it was an option to view diffs
       | when installing git using Nix
        
       | keybored wrote:
       | I think I use this indirectly through the git-delta pager which
       | is a great pager replacement for git.
        
         | Aissen wrote:
         | No, I don't think delta integrates with difftastic:
         | https://github.com/dandavison/delta/issues/535
        
       | akkartik wrote:
       | Is there a way to make the output more familiar to diff users?
       | I've turned on --inline. I also mostly don't care enough about
       | line numbers to want them on _every_ line, so prefer the '<' and
       | '>' leaders.
       | 
       | Also, on Arch there doesn't seem to be a man page.
        
       | xyzelement wrote:
       | I don't write enough code / write it professionally anymore to
       | integrate it into my life BUT MAN this is a great idea.
       | 
       | In general, we're overflowing in TMI which makes it hard to suss
       | out what matters. For example at work I often read docs that
       | describe what we do for customer X vs customer Y and it takes a
       | ton of work to suss out the 1% of text that is _different_
       | between those two, which is really what you want to understand
       | and validate.
       | 
       | So anything that makes just the impactful change stand out is
       | beyond welcome.
        
       | jmholla wrote:
       | I tried switching to this, but I found it noisy, with weird
       | formatting for things that didn't change. I went back to using
       | icdiff[0].
       | 
       | [0]: https://github.com/jeffkaufman/icdiff
        
       | markrages wrote:
       | Does the output work with patch(1)? Or does this use a different
       | patch?
        
       | drcongo wrote:
       | I love this so much. I _hate_ reading cli diffs, but this is
       | instantly understandable.
        
       | throwawaygo wrote:
       | How long until this is just a prompt?
        
       | brainzap wrote:
       | fuck yeah finally, I am so sick of text based diff
        
       | Arrgh wrote:
       | Long, long ago, I used a library called Augeas
       | (https://augeas.net/) in a drift-detection product*, so that if
       | we detected a difference in config files, either on the same
       | server through time, or on different servers that were supposed
       | to be similar, we could de-noise the diffs, and more importantly,
       | let users write fine-grained, but syntax-tolerant, allow-lists
       | like "this particular setting is allowed to differ"... or even
       | "this particular setting can have one of the following list of
       | values". :)
       | 
       | * the company was acquired by Splunk years after we shelved that
       | product
        
       | dsp_person wrote:
       | Interesting that while the Arch Linux package weighs in at 7MB,
       | it extracts to 80MB installed, and the `difft` binary is a
       | whole 78MB. On a ZFS dataset with LZ4 compression du says it's
       | 17MB. I wonder why not just compress whatever is so compressible
       | in the binary? It would probably even load faster to decompress
       | it in ram.
        
         | jiripospisil wrote:
         | Just a data point because you mentioned compression and I got
         | curious - it's about 10MB on btrfs with zstd:1.
         | Processed 1 file, 614 regular extents (614 refs), 0 inline.
         | Type       Perc     Disk Usage   Uncompressed Referenced
         | TOTAL       14%       10M          77M          77M
         | none       100%      1.1M         1.1M         1.1M
         | zstd        12%      9.8M          76M          76M
        
           | dsp_person wrote:
           | Apparently ZFS 2.2.0 added an early abort capability for zstd
           | compression. Maybe I'll switch over to it.
        
         | speed_spread wrote:
         | It's probably related to the comment here about the Nim parser
         | being an unmergeable 60MB C file. It looks like TreeSitter
         | requires lots and lots of code, a lot of which must be
         | redundant / compressible.
        
         | dark-star wrote:
         | > It would probably even load faster to decompress it in ram.
         | 
         | An executable is not loaded into RAM completely and then
         | started. It's memory mapped and only the parts that actually
         | get used are loaded (when they are needed).
         | 
         | This is why large binaries get slower when you compress them:
         | the demand-paging doesn't work anymore.
         | 
         | I did the test with a 280MB Windows EXE some years ago. It
         | compressed down to like 70 megs or so but took multiple seconds
         | longer to start up than the original.
         | 
         | For some scenarios it might make sense (running binaries across
         | the network maybe), but in most cases an uncompressed binary
         | will start up faster.
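         | 
         | A rough illustration of the memory-mapping idea, sketched with
         | Python's standard mmap module. The 8 MiB "binary" and its
         | marker are made up for the demo, and a real executable is
         | mapped by the OS loader rather than like this; the point is
         | only that mapping a file and touching a few bytes does not
         | require reading the whole thing first.

```python
import mmap
import os
import tempfile

# Hypothetical stand-in for a large binary: 8 MiB, with a marker at the very end.
SIZE = 8 * 1024 * 1024
MARKER = b"difft"


def make_fake_binary(path: str) -> None:
    with open(path, "wb") as f:
        f.truncate(SIZE)               # extend the file to SIZE bytes (zero-filled)
        f.seek(SIZE - len(MARKER))
        f.write(MARKER)


def read_tail_via_mmap(path: str) -> bytes:
    # Mapping does not copy the file into memory up front; pages are
    # faulted in when the corresponding bytes are first touched.
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            return m[len(m) - len(MARKER):]   # touches only the last page


def demo() -> bytes:
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        make_fake_binary(path)
        return read_tail_via_mmap(path)
    finally:
        os.remove(path)


if __name__ == "__main__":
    print(demo())
```

         | A compressed executable loses this property: it has to be
         | unpacked before any part of it can run.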
        
       | nialv7 wrote:
       | Difftastic is great, but when I need to create patches I need to
       | go back to good ol' diff...
       | 
       | Is there a patching tool that can apply difftastic diffs?
        
       | replwoacause wrote:
       | I want a nice diff tool on Mac that doesn't cost an arm and a
       | leg like Kaleidoscope. Right now I am using one called
       | CompareMerge2 from the App Store which is pretty good.
        
         | PlunderBunny wrote:
         | I too, settled on CompareMerge2 after casting around
         | fruitlessly, and lusting after Kaleidoscope.
        
           | catlan wrote:
           | Here's a coupon code for 40% off: HN1337
        
             | kstrauser wrote:
             | Thanks for sharing that! I, too, have drooled over
             | Kaleidoscope for ages. But I confess: at the discount rate
             | it's still more expensive than my IDE (Nova). It has a
             | million cool features that alas I don't need. I just want
             | that gorgeous source code diffing, at a lower price point.
             | 
             | (I don't expect you to say "great idea, sir, here's your
             | coupon!". I'm sure you've done your analysis and you're at
             | the price point that works best for you. I'm also not
             | saying it's overpriced, just that it's more than _I'm_
             | willing to pay for it given the functionality _I_ want from
             | it. This is just in the spirit of friendly user feedback.)
        
       | yboris wrote:
       | Related tool: _diff2html_ also as a _CLI_ - with one command
       | opens a browser tab showing HTML diff (side by side or line by
       | line) - a great way to review your work before committing.
       | 
       | https://diff2html.xyz/
        
       | ein0p wrote:
       | I wish someone would make a console 3 way merge with some
       | rudimentary editing support. I know there's vimdiff, but I just
       | don't gel with it. For the lack of a better description I just
       | want a console version of Meld's 3-pane merge.
        
       | gexla wrote:
       | Thanks for this. I'll give it a try. And for diffs in a different
       | context, I really like Beyond Compare.
        
       | domenkozar wrote:
       | In devenv.sh:                 difftastic.enable = true;
        
       | duncan_britt wrote:
       | Can it be made to work with magit in emacs?
        
       | arcastroe wrote:
       | Could this be integrated with something like Gitea or is this
       | purely for CLI based use?
        
       | jerrygoyal wrote:
       | how to set it up as the default git diff in vscode?
        
       | trashymctrash wrote:
       | If you are curious, here is how you can integrate it into a
       | Jetbrains IDE: https://glyphy.com/a/2022/structural-diff-with-
       | difftastic-an...
       | 
       | I tried it, but unfortunately it's not as seamless as it could
       | be, so I reverted back to JetBrains' native diffing, which is
       | quite good anyway.
        
       | mtmk wrote:
       | I'm really pleased to hear about this and looking forward to the
       | updates in my daily tooling. I feel like for decades I haven't
       | seen a fundamental improvement in diff/merge tooling. It will
       | finally be great to not parse the change in my head. For example,
       | one diff problem I keep noticing has been the curly brace shift
       | when you add a new function or a block. Even if it can only
       | identify those, that would be a win for me.
        
       | alkonaut wrote:
       | Is there a way to tell difft that a particular extension is a
       | particular format? Or that it could guess for unknown formats
       | whether they are xml by e.g. scanning the first lines for <tags>
       | before falling back to text?
       | 
       | Because xml files are very often not using the extension xml and
       | treating them as plaintext means losing a lot of the structural
       | goodness.
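       | 
       | The sniffing idea could look roughly like this. It is purely a
       | sketch of the heuristic described above (check whether the
       | content starts with something tag-like); the function name and
       | the regex are made up, and nothing here reflects how difft
       | actually detects languages.

```python
import re

# Hypothetical heuristic: after optional whitespace, the content must open with
# an XML declaration, a DOCTYPE, a comment, or an element start tag.
_XML_START = re.compile(rb"\s*<(\?xml|!DOCTYPE|!--|[A-Za-z_][\w.:-]*[\s/>])")
_UTF8_BOM = b"\xef\xbb\xbf"


def looks_like_xml(head: bytes) -> bool:
    """Guess from the first bytes of a file whether it is XML."""
    if head.startswith(_UTF8_BOM):
        head = head[len(_UTF8_BOM):]
    return _XML_START.match(head) is not None


if __name__ == "__main__":
    print(looks_like_xml(b'<?xml version="1.0"?>\n<settings/>'))
    print(looks_like_xml(b"if a < b { return; }"))
```

       | Only files with an unrecognised extension would need this
       | check, with plain text as the final fallback.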
        
         | planetpluta wrote:
         | Yes there is an `--override` option you can use to specify the
         | language in which a file should be parsed.
         | 
         | https://github.com/Wilfred/difftastic/blob/master/CHANGELOG....
        
           | alkonaut wrote:
           | Perfect, thanks.
        
       | rutchkiwi wrote:
       | Looks cool, but I'm confused about what the different colors
       | mean. What is purple/blue? (green and red is easy to understand)
        
       | tln wrote:
       | Wow, this looks great. Get that man some more Emacs and coffee!
        
       ___________________________________________________________________
       (page generated 2024-03-22 23:01 UTC)