[HN Gopher] The fastest object diff library in JavaScript
___________________________________________________________________
The fastest object diff library in JavaScript
Author : AsyncBanana
Score : 76 points
Date : 2021-11-06 14:52 UTC (8 hours ago)
(HTM) web link (github.com)
(TXT) w3m dump (github.com)
| throwaway413 wrote:
| Is this a true deep diff utility? Didn't check the code, but the
| readme makes no mention of depth capability beyond comparing to
| some other deep diffing libs.
| nathcd wrote:
| Yeah, it's just a single function that recurses. So the limit
| will be your VM's call stack limit.
| parhamn wrote:
| Interesting it's so much faster, the code looks very natural and
| trickless besides the richobject check (at least at first
| glance).
|
| Wonder what everyone else is doing to make it slow.
|
| Also curious what mobx and reacts object diffing looks like now.
| AsyncBanana wrote:
| For a bit of info on why it is often faster, you can look at
| https://github.com/AsyncBanana/microdiff/issues/2. I will
| probably add something to the README about that soon too.
| megous wrote:
| It doesn't handle arrays, for one example.
| megous wrote:
| In all seriousness, what's so noteworthy about this? It's one
| trivial 50 line recursive function. This is kind of stuff devs
| just type out instead of searching for a npm package, because
| it's faster that way.
| peey wrote:
| Would you rather implement `leftPad` or use the npm package?
| hutrdvnj wrote:
| Let's say I maintain dozen of packages, then I would rather
| like to write it once and include it everywhere. I could copy
| and paste all the common functions instead of including, but
| if I notice a bug or have to upgrade the code for some
| reason, then I have to edit the code dozen times. But if I
| only have to maintain one package, then I prefer the
| copy&paste approach.
| megous wrote:
| I'd use standard String.prototype.padStart. For anything more
| complicated, but not too complex (say "libraries" with
| 100-300 lines of code), it's very tedious and time consuming
| to find something on npm that satisfies the specific
| contraints of my app precisely. This certainly fits in that
| category.
|
| For libraries above 1-2k lines of code that can be quite
| complicated (eg. things like database connectors), it starts
| being reasonable to mold expectations of my app around the
| library.
|
| Also using coherent large utility libraries is something
| worthwhile (or was in the past), libs like underscore/lodash,
| etc.
|
| But composing app from random underdocumented 2 function
| "libraries" from npm is just pain if you know what you're
| looking for. 1) you can't trust the code, so 2) you have to
| read it anyway and reading is often slower than writing
| edflsafoiewq wrote:
| For trivial functions, I'd rather just copy them into my
| source code.
| qwertox wrote:
| I wouldn't be able to write this. I mean, I could, but I lack
| the knowledge of Typescript and willingness to dedicate the
| needed time to create this while I have got other things to do.
|
| The code is clean, well organized and nice to look at, easy to
| read through and understand, and it's great that someone has
| made the effort to create this and share it for us to use.
| StevenWaterman wrote:
| You only see the finished product, not the 100 failures and
| incremental improvements over time. You could certainly write
| something yourself from scratch, but I doubt it would be
| competitive against this.
| megous wrote:
| > not the 100 failures and incremental improvements over time
|
| This code has 3 actual code commits over 6 days, none of
| which changed anything fundamental. What are you talking
| about?
|
| It's quite possible to compete against this on multiple
| levels. One would be correctness. The other a bit less
| wasteful interface. Another would be handling arrays in a
| more useful way (or at all). And yet another would be safety.
| Another would be speed and memory use. Yet another would be
| non-recursive implementation. Another would be documentation
| (what's this diffing? enumerable/non-enumerable properties?
| how it deals with getters? how about inherited properties? -
| I can't know any of this without looking long and hard at the
| code)... etc. etc.
| can16358p wrote:
| Note on only the first part: the author might have tested
| bunch of stuff on their local without commiting until
| they've found something worth, perhaps? That would result
| in very few commits.
| engmgrmgr wrote:
| Mmm, agree with parent; can get faster. Unfortunately lots of
| perf critical code is created out of necessity and defaults
| to proprietary.
|
| You could profile this lib and figure out where it's spending
| most of its time and figure out if there any ways to improve
| (starts getting into implications of JS <-> system), but as
| you get more idiosyncratic you lose flexibility and ease of
| use.
|
| Eg type checks, heap additions, recursion, etc can get "slow"
| AsyncBanana wrote:
| I agree it can be pretty easy to set up a basic version of
| this, but eliminating edge cases, optimizing, and unit testing
| can make it take significantly longer.
| ianberdin wrote:
| Please compare with a modern competitor: json-patch-plus
| https://github.com/n1ru4l/graphql-live-query/blob/main/packa...
|
| It's a bit different, but I think it covers more cases.
| kosinus wrote:
| I maintain symmetry[1] and wanted to try the benchmark against
| it, but the results are very inconsistent. Symmetry is anywhere
| between 50% slower to 20% _faster_ than microdiff on the small
| object benchmark.
|
| Symmetry doesn't get a lot of activity, but I've been using it in
| production for many years. It does some more work on array
| diffing, which this benchmark doesn't cover, by implementing part
| of Myers' algorithm.
|
| [1]: https://github.com/Two-Screen/symmetry/
| laurent123456 wrote:
| > type: "CREATE" | "REMOVE" | "CHANGE";
|
| Since performance is a goal, why not define an enum and use
| integer values? I don't know about speed, but the resulting diff
| would be smaller.
|
| Also I wonder what "const t = true" is for.
| wruza wrote:
| Usually it doesn't make a difference neither in speed, nor in
| size, because of "interning" of literals. Unless you jsonify
| it, of course. It could show up, if you'd later compare .type
| with some "CRE"+"ATE" values made funny way, but not for
| literals, e.g.: // x.js export const foo
| = "foo" // y.js if (x.foo === "foo") {
| // ^ as cheap as int } const foo2 =
| "FOO".toLowerCase() if (x.foo === foo2) { // ^
| questionable // depends on jit/optimization }
|
| https://stackoverflow.com/questions/5276915/do-common-javasc...
|
| https://en.wikipedia.org/wiki/String_interning
| drodil wrote:
| Nice one. Have you tried replacing the for loops with forEach? It
| might be even more performant.
| sbr464 wrote:
| Function callbacks are definitely slower. You could probably
| shave another 10% by changing the array.push() calls to direct
| assignments: array[array.length] = value. Same with the nested
| array.map, changing to a for loop.
| Etheryte wrote:
| In Javascript, forEach, map, reduce and other similar array
| methods are considerably slower than a regular for loop. You
| can see [0] and the linked questions for more technical
| details.
|
| [0] https://stackoverflow.com/q/22155280/1470607
| [deleted]
| megous wrote:
| What if object references have loops? That doesn't seem to be
| handled. Also it looks like it will diff not just own properties
| but properties from prototype objects, too.
| tantalor wrote:
| > loops
|
| Don't
| akie wrote:
| Ok, let's pretend they don't exist and potentially hang the
| app.
| tekkk wrote:
| Hey this is really cool! Not sure how this compares to
| https://github.com/benjamine/jsondiffpatch which has been
| unmaintained for some time. But if this could replace it I'd
| happy to start using it.
| AsyncBanana wrote:
| Thanks for your interest! Currently, Microdiff does not have a
| patching functionality by default, and the API is different. If
| there is enough interest, though, I might make a compatibility
| layer to ease the differences, and it probably would not be
| hard to do yourself too.
| wereHamster wrote:
| > const t = true;
|
| That should be the job of a minifier, makes the code less
| readable.
| wruza wrote:
| Fast diffing of objects may be seen as rarely needed to a regular
| user, but it's very important part of handling vdom/vnode diffs.
| Recently I researched few lesser web ui libraries for my own use
| and found out that many of them do not diff vnodes correctly,
| e.g. in case of RegExp or Date arguments. I don't remember exact
| library names vs flaws, but my conclusion was that almost all of
| non-actively supported libs had it in one way or another. If you
| want to explore, check that first, because it's a surprising pain
| point.
|
| Seems that this particular library is aware of these potential
| issues (which motivated me to write this comment):
| https://github.com/AsyncBanana/microdiff/issues/2
| nathcd wrote:
| But vnodes have a predefined shape, whereas this library is for
| diffing (relatively) arbitrary objects, right? If your data
| isn't arbitrarily shaped, I would assume you'd get much better
| performance by just implementing a simple direct diff, no?
| eyelidlessness wrote:
| And, often but probably not always, using a class instance
| rather than a POJO. Seems unintuitive (function call
| overhead), but it can signal to the JIT that the object's
| type is relatively stable.
| wruza wrote:
| Yes. I should have noted for clarity that vdom diff is a
| special case, not a general one like that of subj. But their
| similarity reminded me of the experience.
| nathcd wrote:
| Ah that's my mistake, I read more into your comment than
| you wrote. Sorry!
|
| It's a good point about edge cases in diffing. Dynamic
| value diffing in dynamically-typed languages is an
| interesting problem to think about.
| spankalee wrote:
| I know some diffing libraries are reluctant to do this for perf
| reasons, but many of them do not handle cycles correctly leading
| to potential infinite loops based on the shape of the objects and
| order of arguments.
|
| Fast is great, but unless you know that your data doesn't have
| cycles, infinite loops take a long time :)
___________________________________________________________________
(page generated 2021-11-06 23:01 UTC)