[HN Gopher] How we decreased GitLab repo backup times from 48 ho...
___________________________________________________________________
How we decreased GitLab repo backup times from 48 hours to 41
minutes
Author : nayak
Score : 566 points
Date : 2025-06-06 15:43 UTC (2 days ago)
(HTM) web link (about.gitlab.com)
(TXT) w3m dump (about.gitlab.com)
| judofyr wrote:
| See also: https://www.tumblr.com/accidentallyquadratic
|
| Quadratic complexity sits in an awkward sweet spot: Fast enough
| for medium-sized _n_ to pass first QA, but doomed to fail
| eventually as _n_ grows.
| vjerancrnjak wrote:
| This particular change was not accidental. It was announced as
| quadratic.
| hinkley wrote:
| We have a habit of taking our eye off of old problems by trying
| to juggle several new ones. By the time someone notices that we
| have a problem, the dogleg in the graphs where the n^2 solution
| stopped fitting into the CPU cache has been obvious for months
| but nobody was looking, and we dance around the fact that we
| had time to take a reasonable approach to fix the problem if we
| had noticed it when it became measurable, by adding anxiety to
| the cleanup work.
|
| And then someone learns from this experience, gets the bright
| idea to set up an alert for such things, but the alert doesn't
| factor in things like customer base growth or feature creep
| slowly pushing up the expected runtime. Eventually organic load
| gets close to the alarm and then the fucking thing goes off on
| a three day weekend (why is it always a long weekend or just
| before one?) and then we wage war on alarm overreach and the
| whole cycle repeats itself.
|
| We like to think of ourselves as blazing trails in the
| wilderness but most of the time we are doing laps around the
| parking lot.
| jeffbee wrote:
| Are there any reimplementations of git, by professional
| programmers using real tools? The source in question -- object.c
| -- is "banging rocks together" material.
| landr0id wrote:
| https://github.com/GitoxideLabs/gitoxide
| umanwizard wrote:
| Ignoring your inaccurate flamebait and answering the underlying
| question: there are several reimplementations of git, including
| "got" (Game of Trees) from the OpenBSD developers, jgit
| (reimplementation in Java), the Haskell git package, the
| gitoxide rust crate (and I assume several more half-finished
| Rust hobby projects), and probably others; that's just what I
| found with a cursory google.
| Vilian wrote:
| Git, but as it was intended to be used
| kgeist wrote:
| We use go-git in one of our Go projects that relies on Git for
| version control of its data.
| gonzus wrote:
| https://github.com/libgit2/libgit2
| fHr wrote:
| lmao
| hiddew wrote:
| "fixed it with an algorithmic change, reducing backup times
| exponentially"
|
| If the backup times were O(n^2), are they now O(n^2 / 2^n)? I
| would guess not.
| umanwizard wrote:
| This is not the precise mathematical definition of exponential,
| but rather the colloquial one, where it just means "a lot".
| cvoss wrote:
| You shouldn't use a word that can carry a precise
| mathematical meaning in a sentence that literally uses
| mathematical notation in order to speak precisely and then
| expect readers not to interpret the word in the precise
| mathematical way.
| blharr wrote:
| I somewhat agree, but for lack of a better word, what would
| you use? Quadratically doesn't have the same punch
| umanwizard wrote:
| "Dramatically" ?
| remram wrote:
| "a lot"
| jjmarr wrote:
| "by a factor of `n`" also sounds impressive.
| jrochkind1 wrote:
| If you just mean "a lot" in a non-technical sense, there
| are plenty of words available. enormously. immensely.
| tremendously. remarkably. incredibly. vastly.
| bobbylarrybobby wrote:
| "From quadratic to linear" seems fine.
| bobbylarrybobby wrote:
| "From quadratic to linear" or "... to constant" seems
| fine.
| morepedantic wrote:
| Algorithmic? Big-O? Polynomially? Linear improvement?
| O(n^2) to O(n)? Or if you want to be less mathematically
| precise: enormous improvement?
|
| Using exponential in this way in any context is a faux
| pas, because it's highly ambiguous, and requires context
| for clarification. But in this situation the context
| clearly resolved to the mathematically accurate
| definition, except it was used in the other way.
| globular-toast wrote:
| Runtimes dropped precipitously.
| ndriscoll wrote:
| Quadratically doesn't have the same punch because it is
| _actually_ exponentially less than exponentially. So
| doing it for extra punch (as opposed to not knowing the
| correct word) in a technical context would just be lying.
| It'd be like a paper saying they had a result with p
| less than one in a trillion for "extra punch" when they
| actually had p=0.1.
| IshKebab wrote:
| You should if you expect your readers to be normal humans
| who understand obvious context, and not pedantic HN readers
| who understand obvious context but delight in nit-picking
| it anyway.
| Dylan16807 wrote:
| You _can_, but it's not _should_.
| morepedantic wrote:
| >pedantic
|
| Who the fuck do you think is the intended audience for an
| article about an algorithm in `git bundle create`? I
| spent approximately two minutes of my life trying to
| figure out where the O(n^2) algorithm was being invoked
| in such a way that it influenced an exponential.
|
| Exponential was bolded in the same sentence as a big-O.
| 50/50 troll/author oversight.
| saagarjha wrote:
| Maybe not 'morepedantic
| globular-toast wrote:
| Ah yes because "normal humans" know what O(n^2) means but
| damnit they are going to use exponential wrong.
| tomjakubowski wrote:
| I'm a normal human and I know what O(n^2) means. There
| are dozens of us.
| sneak wrote:
| "Words mean things."
|
| If you can't agree with this, then you shouldn't be
| speaking or writing, IMO.
|
| Those who argue that words that mean different things are
| actually equivalent have no business dealing with language.
| morepedantic wrote:
| I understood every word, phrase, and sentence you wrote.
| But I did not understand your point. Still, I got the
| meaning of your words, so presumably you're satisfied.
| hskalin wrote:
| I find the whole article rather poorly written. Most likely
| using an LLM.
| morepedantic wrote:
| Especially when the colloquial meaning derives from the
| mathematical meaning.
| MyFedora wrote:
| We simplify the big O notation in computer science. This is
| standard practice.
| rovr138 wrote:
| Just drop the constants, it doesn't matter /s
|
| Production systems running and melting here...
| morepedantic wrote:
| >Ultimately, we traced the issue to a 15-year-old Git
| function with O(N^2) complexity and fixed it with an
| algorithmic change, reducing backup times exponentially.
|
| No, not in the exact same sentence as a big-O. That's either
| author error, or an intentional troll. Either way it's egg on
| their faces.
| marcellus23 wrote:
| Meaningless and non-constructive pedantry.
| chrisweekly wrote:
| I'm not the OP you're responding to, but to be fair, in a
| sentence about big-O perf characteristics, which includes the
| word "algorithms", using "exponentially" in a colloquial non-
| technical sense is an absolutely terrible word choice.
| linguistbreaker wrote:
| Exponentially bad word choice even... since we're using
| that word however we want now?
|
| I don't think this is meaningless or non-constructive
| pedantry - we're a technical community and those are
| technical words.
| msgodel wrote:
| I disagree. Misuse of the word "exponential" is a major pet
| peeve of mine. It's a particular case of the much more common
| "use mathematically precise phrasing to sound
| careful/precise" that you often find in less than honest
| writing.
|
| Here they are actually using it to refer to growth functions
| (which is rare for this error) and being honest (which is
| also rare IMO) but it's still wrong. They should have written
| about quadratic or quadratic vs linear.
|
| Regardless sloppy language leads to sloppy thought.
| keybored wrote:
| Sloppy writing is up orders of magnitude lately.
| 0cf8612b2e1e wrote:
| It will decimate readership.
| sneak wrote:
| My personal pet peeve is when the term "exponentially" is
| used to refer to a change between precisely two data
| points.
|
| It's a very specific subset of the one you're describing.
|
| "Tell me you don't know what you're talking about without
| telling me you don't know what you're talking about."
| morepedantic wrote:
| >My personal pet peeve
|
| Just be glad you have only one.
| morepedantic wrote:
| >pedantry
|
| I wasted 2 minutes of my life looking for the exponential
| reduction. So did many others.
|
| Now I'm wasting more of my life shit posting about it, but at
| least that's a conscious choice.
| csnweb wrote:
| If you replace an n^2 algorithm with a log(n) lookup you get an
| exponential speed up. Although a hashmap lookup is usually
| O(1), which is even faster.
| ryao wrote:
| That is not true unless n^C / e^n = log(n) where C is some
| constant, which it is not. The difference between log(n) and
| some polynomial is logarithmic, not exponential.
| csnweb wrote:
| But if you happen to have n=2^c, then an algorithm with
| logarithmic complexity only needs c time. That's why this is
| usually referred to as exponential speedup in complexity
| theory, just like from O(2^n) to O(n). More concretely if
| the first algorithm needs 1024 seconds, the second one will
| need only 10 seconds in both cases, so I think it makes
| sense.
| ryao wrote:
| N is a variable in what I posted, not a constant.
| wasabi991011 wrote:
| It depends if you consider "speedup" to mean dividing the
| runtime or applying a function to the runtime.
|
| I.e. you are saying an f(n) speedup means T(n)/f(n), but
| others would say it means f(T(n)) or some variation of
| that.
| morepedantic wrote:
| The man, or llm, used the mathematically imprecise
| definition of exponential in a sentence with a big-O
| notation. I don't think he's going to be writing entire
| arguments formally.
| ndriscoll wrote:
| They're still using the map in a loop, so it'd be nlogn for a
| tree-based map or n for a hash map.
| sn9 wrote:
| The algorithm complexity went down in the function they patched
| (6x improvement in their benchmark), but in the context of how
| they were actually using the algorithm the impact was much
| larger (it now takes about 1% of the time), which is plausibly
| exponential (and figuring out the actual complexity is neither
| relevant nor an economic use of time).
| ndriscoll wrote:
| > figuring out the actual complexity is neither relevant nor
| an economic use of time
|
| The fix was replacing a nested loop with a map. Figuring out
| that this goes from O(n^2) to O(n) (modulo details like
| bucket count) is immediate if you know what the words mean
| and understand enough to identify the problem and make the
| fix in the first place.
| wasabi991011 wrote:
| I interpreted that as n->log(n) since log and exp are inverses.
|
| Also because I've often heard tha the quantum Fourier transform
| is an exponential speedup over the discrete Fourier transform,
| and there the scaling goes n^2->nlogn.
| morepedantic wrote:
| I was also looking for the exponential algorithm.
| abhashanand1501 wrote:
| In fact O(n^2) is exponentially more than O(log n).
| gre wrote:
| looks like the commit is here:
| https://github.com/git/git/commit/a52d459e72b890c192485002ec...
| mortar wrote:
| Thanks, this helped me find the submission and related
| discussion thread -
| https://lore.kernel.org/git/20250401-488-generating-bundles-...
| SomaticPirate wrote:
| How was the flame graph created? (Not very familiar with C and
| the performance tools around it)
| plorkyeran wrote:
| https://github.com/brendangregg/FlameGraph
|
| You record performance data with `perf`, then use the scripts
| there to turn it into a SVG.
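|
| Roughly, with the FlameGraph scripts checked out locally and a
| placeholder target command:
|
|       perf record -F 99 -g -- git bundle create /tmp/repo.bundle --all
|       perf script | ./stackcollapse-perf.pl | ./flamegraph.pl > flame.svg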
| immortaljoe wrote:
| OP here. In this particular case, I used
| https://github.com/flamegraph-rs/flamegraph
| IshKebab wrote:
| I strongly recommend not using this. Instead use pprof - it
| has a MUCH better interactive flamegraph, plus other nice
| performance visualisations (e.g. a call graph):
|
| https://github.com/google/pprof
|
|       go install github.com/google/pprof@latest
|       pprof -http=: prof.out
|
| I normally collect the profiles with gperftools
| (https://github.com/gperftools/gperftools) and then just
|
|       LD_PRELOAD=/usr/lib/libtcmalloc_and_profiler.so \
|       CPUPROFILE=prof.out <your command>
|
| I've been meaning to try Samply though. Not sure if it works
| with pprof.
| landr0id wrote:
| You can also use https://github.com/mstange/samply to make
| recording and viewing in the Firefox profiler easier.
|
| It will spin up a localhost server after the trace ends, the
| profiler uses the localhost server and nothing is shared with
| Firefox servers unless you explicitly choose to upload the data
| and create a permalink.
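|
| Rough usage, with a placeholder command (the subcommand is from
| samply's README):
|
|       samply record <your command>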
| edflsafoiewq wrote:
| So they replaced a nested for loop for checking for duplicates
| with a set data structure. Surprised such a common thing was in
| git.
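|
| A rough sketch of the before/after pattern in Python (the actual
| patch is in C and uses git's strset type):
|
|       # O(n^2): each new ref is checked with a linear scan
|       def dedupe_quadratic(refs):
|           out = []
|           for ref in refs:
|               if ref not in out:      # linear scan every time
|                   out.append(ref)
|           return out
|
|       # O(n): remember already-seen refs in a hash set
|       def dedupe_with_set(refs):
|           seen, out = set(), []
|           for ref in refs:
|               if ref not in seen:     # expected O(1) lookup
|                   seen.add(ref)
|                   out.append(ref)
|           return out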
| tomxor wrote:
| One of the many reasons I moved to self hosting. I use ZFS to
| backup every 15 minutes, ... could do it even more frequently but
| that seems a little pointless.
|
| Also moved away from Gitlab because it's so damn slow.
| glookler wrote:
| ZFS is great, getting off gitlab would also be great, what did
| you switch to?
| tomxor wrote:
| Gitea, I can't recommend it enough. Lightning fast compared
| to the proprietary offerings, clean simple UI like an older
| Github, and super simple to self host.
| mdaniel wrote:
| So long as you have an existing CI/CD story, or are willing
| to invest in head-to-heading them, to find one that fits
| your needs. Because both it and Forgejo's "we're deploying
| act, what could go wrong" give me the heebie-jeebies. And,
| aside from that, there are _so many_ FLOSS ones they could
| have chosen that legitimately _work_ making that decision
| extra opaque to me
| haroldp wrote:
| This was my thought too, but I figured I just didn't understand
| the problem. Why use git commands to backup when a file system
| copy seems like it would do? zfs snapshot takes less than a
| second on any size repo. zfs send transfers just the changes
| since the last backup, as fast as your network, more or less.
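|
| A hypothetical sketch (pool, dataset, and host names made up):
|
|       zfs snapshot tank/git@2025-06-06
|       zfs send -i tank/git@2025-06-05 tank/git@2025-06-06 \
|           | ssh backuphost zfs receive backup/git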
| tomxor wrote:
| Yup, once you've used snapshots for backups once, doing any
| kind of filesystem level backup just seems absurd. I now use
| it as the basis for every piece of infrastructure I add that
| holds valuable data. The only place it doesn't really fit is
| distributed databases.
| mdaniel wrote:
| It for sure has an excruciatingly slow UI, but I haven't yet
| been able to stomach the Ruby magick enough to dig in and find
| out if it's _all of GitLab_ that's slow, and thus their
| culture of who-cares just propagates to the UI, or it's
| literally just the web frontend that's molasses yet because so
| many folks interact with it over the web that's where the
| conclusion ends up
|
| I know their backend git proxy is written in golang, their
| runner agent is written in golang, it spawns CI jobs using
| containerd, written in golang, and they use postgresql and a
| redis-esque KV -- although for that part I do believe they're
| still using Sidekiq _(ruby)_ for doing job dispatch, so that
| one could very easily lolol back into not actioning tasks
| efficiently
|
| GitHub Actions sure does enjoy just sitting there twiddling its
| thumbs when I push "run job," and is also both Ruby and their
| own crazypants ajax-y web framework, so I don't think it's
| exactly the shining star of performance itself
| treesknees wrote:
| Duplicate of https://news.ycombinator.com/item?id=44197061
| mdaniel wrote:
| Since that one only has one comment, I'm pretty sure that means
| the HN front-page is luck of the draw and I wouldn't have
| bothered even commenting about the other one
| LgWoodenBadger wrote:
| IME, it has always turned out to be the correct decision to
| eliminate any n^2 operation in anything I've written.
|
| I don't write exotic algorithms, but it's always astounding how
| small n needs to be to become observably problematic.
| parthdesai wrote:
| My rule of thumb for 80%-90% of the problems is, if you need
| complicated algorithm, it means your data model isn't right.
| Sure, you do need complicated algorithms for compilers, db
| internals, route planning et all, but all things considered,
| those are minority of the use cases.
| anttihaapala wrote:
| This is not a complicated algorithm. A hash map (dictionary)
| or a hash set is how you would always do deduplication in
| Python, because it is easiest to write / least keystrokes
| anyway. That is not the case in C though, as it is much
| easier to use arrays and nested loops instead of hash maps.
| throwaway2037 wrote:
| > That is not the case in C though, as it is much easier to
| use arrays and nested loops instead of hash maps.
|
| I am confused. There are plenty of open source, fast hash
| map impls in C.
| saagarjha wrote:
| Yes, the problem is getting them into your project.
| paulddraper wrote:
| This isn't knapsack. This is a dict lookup.
| LtWorf wrote:
| I wrote a funny algorithm to group together words that end
| the same way to write them once in my binary wordlist file,
| since there is an array that points to the start character
| and a \0 to end the word. My initial solution was O(n^2) but
| it was too slow on a real wordlist so I had to come up with
| something better. In the end I just sort the list with
| quicksort, but revert the order of the words and then the
| groupable ones end up nearby each other.
| koala_man wrote:
| Good call. O(N^2) is the worst time complexity because it's
| fast enough to be instantaneous in all your testing, but slow
| enough to explode in prod.
|
| I've seen it several times before, and it's exactly what
| happened here.
| david422 wrote:
| We just had this exact problem. Tests ran great, production
| slowed to a crawl.
| bcrl wrote:
| I was just helping out with the network at an event. Worked
| great in testing, but it failed in production due to
| unicast flooding the network core. Turns out that some of
| the PoE Ethernet switches had an insufficiently sized CAM
| for the deployment combined with STP topology changes
| reducing the effective size of the CAM by a factor of 10 on
| the larger switches. Gotta love when packet forwarding goes
| from O(1) to O(n) and O(n^2)! Debugging that in production
| is non-trivial as the needle is in such a large haystack of
| packets so as to be nearly impossible to find in the output
| of tcpdump and wireshark. The horror... The horror...
| hinkley wrote:
| First big project I worked on a couple of us sped up the db
| initialization scripts so we could use a less trivial set
| of test data to stop this sort of shenanigans.
|
| Things like inserting the test data first and turning on
| constraints and possibly indexes afterward.
| SAI_Peregrinus wrote:
| I'd say the exception is when `n` is under about 10, and is
| counting some sort of hardware constrained thing (e.g. some
| operation over all CAN interfaces present on an OBDII connector
| can be O(n^2) since n will always be between 1 and 4). If you
| wouldn't have to physically replace hardware for `n` to
| increase, you _really_ need to avoid n^2 operations. And even
| then consider them carefully, perhaps explicitly failing if `n`
| gets too big to allow for noticing rework is needed before new
| hardware hits the field.
| echelon wrote:
| > perhaps explicitly failing if `n` gets too big
|
| That's the problem. A lot of these quadratic time algorithms
| don't set limits.
|
| Even 'n!' is fine for small 'n'. Real production use cases
| don't have small 'n'.
| lassejansen wrote:
| Or, phrased differently, if n has an upper limit, the
| algorithm is O(1).
| connicpu wrote:
| As long as you have tests that regularly exercise your
| algorithm at n=max where you would notice if they were
| exceptionally slow
| masklinn wrote:
| > Real production use cases don't have small 'n'.
|
| Real production use cases absolutely have small n. You
| don't hear about them, because it's very hard for them to
| cause issues. Unless the use case changes and now the n is
| not small anymore and nobody noticed the trap.
| haiku2077 wrote:
| I have an app that's been running an O(n^2) algorithm in
| "production" (free open source app used by various
| communities) for about half a year now.
|
| It's been fine because "n" is "number of aircraft flying in
| this flight simulator" - and the simulator's engine starts
| to fail above around 2000 anyway. So even in the worst case
| it's still going to run within milliseconds.
| EdwardCoffin wrote:
| Bruce Dawson says: _I like to call this Dawson's first law of
| computing: O(n^2) is the sweet spot of badly scaling
| algorithms: fast enough to make it into production, but slow
| enough to make things fall down once it gets there._
|
| https://bsky.app/profile/randomascii.bsky.social/post/3lk4c6...
| paulddraper wrote:
| The second law is that O(n * log n) is for practical intents
| and purposes O(n).
| EdwardCoffin wrote:
| To be clear though, that isn't _his_ second law, at least
| as of two months ago, according to
| https://bsky.app/profile/randomascii.bsky.social/post/3lk4c6...
| to11mtm wrote:
| Fair, but `n log n` definitely is the historical "good
| enough to actually sleep at night" in my head, every time
| I see it I think of the prof who taught my first CSC
| course and our data structures course due to how often it
| came up.
|
| Also, the wise statement that 'memory is fairly cheap
| compared to CPU for scaling'. It's insane to see how
| often folks would rather manually open and scan a
| 'static-on-deploy' 20-100MB Json file for each request vs
| just parsing it into structures in memory (where, for
| most cases, the in memory usage is a fraction of the json
| itself) and just caching the parsed structure for the
| length of the application.
| hinkley wrote:
| Not often but occasionally I will chose the nlogn
| algorithm which obviously has no bugs over the O(n)
| algorithm with no obvious bugs.
|
| Less brittleness is worth paying a few percent.
| Especially if it unmuddies the waters enough for someone
| to spot other accidental (time) complexity.
| refulgentis wrote:
| Considerably more than a few percent, IMHO. :)
|
| But I also don't dabble in this area _nearly_ enough to
| know whether there's years of tears and toil finding out
| repeatedly that O(n) is ~impossible to implement and
| verify :)
|
|         n | n log n
|         5 |   8.0472
|        10 |  23.0259
|        25 |  80.4719
|        50 | 195.6012
|       100 | 460.5170
| hinkley wrote:
| This is magic thinking about how C, memory hierarchies,
| networking, and system calls work.
| Someone wrote:
| Depends on the constants and on the value of _n_. If the
| constant for the _O(n log n)_ algorithm is five times
| that of the _O(n)_ algorithm, the _O(n)_ algorithm is
| faster for _n < 100_.
|
| If you expect that _n < 100_ will always hold, it may be
| better to implement the _O(n)_ algorithm and add a
| logging warning if _n > 250_ or so (and, maybe, a fatal
| error if _n > 1000_ or so), instead of spending time to
| write both versions of the algorithm _and_ spend time
| finding the cut off value for choosing between the two.
| hinkley wrote:
| Fatal errors tend to blow up in production rather than
| test.
|
| One of the simplest solutions for detecting cyclic graphs
| is instead of collecting a lookup table or doing
| something non-concurrent like marking the nodes, is to
| count nodes and panic if the encountered set is more than
| an order of magnitude more than you expected.
|
| I came onto a project that had done that before and it
| blew up during my tenure. The worst case graph size was
| several times the expected case, and long term customers
| were growing their data sets vertically rather than
| horizontally (eg, ever notice how much friction there is
| to making new web pages versus cramming more data into
| the existing set?) and now instead of 10x never happening
| it was happening every Tuesday.
|
| I was watching the same thing play out on another project
| recently but it got cancelled before we hit that
| threshold for anything other than incorrect queries.
| rcxdude wrote:
| It's also often in the range where constant factors can
| make a big difference over a wide range of n
| paulddraper wrote:
| Yes, that isn't actually Dawson's second law.
| brucedawson wrote:
| Should it be my second law?
|
| https://bsky.app/profile/randomascii.bsky.social/post/3lr24s...
| hinkley wrote:
| But sometimes a big enough C can flip which solution helps
| you hit your margins.
| hansvm wrote:
| In my mind, that's always been the point in dropping log
| factors. The algorithms are comparable enough that the
| actual implementation starts to matter, which is all
| we're really looking for in a Big-O analysis.
| sn9 wrote:
| Skiena has a great table in his algorithms book mapping
| time complexity to hypothetical times for different input
| sizes.
|
| For _n_ of 10^9, where lg _n_ takes 0.03 us and _n_ takes 1
| s, _n_ lg _n_ takes 29.9 s and _n_^2 takes _31.7 years_.
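|
| (Those figures assume roughly 10^9 simple operations per second:
| lg(10^9) is about 30, so _n_ lg _n_ is about 3 x 10^10 operations,
| roughly 30 s, while _n_^2 is 10^18 operations, roughly 10^9 s, or
| about 31.7 years.)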
| swyx wrote:
| more from table please?
| johnisgood wrote:
| I would rather have the table and related content. Name
| of the book?
| hinkley wrote:
| I made the "mistake" in an interview of equating two super-
| quadratic solutions. What I meant was what
| Dawson meant. It doesn't matter because they're both too
| ridiculous to even discuss.
| dieortin wrote:
| They're too ridiculous... unless a more optimal solution
| does not exist
| hinkley wrote:
| Absolutely not.
|
| If the cost of doing something goes above quadratic, you
| shouldn't do it at all. Because essentially every
| customer interaction costs you more than the one before.
| You will never be able to come up with ways to cover that
| cost faster than it ramps. You are digging a hole,
| filling it with cash and lighting it on fire.
|
| If you can't do something well you should consider not
| doing it at all. If you can only do it badly with no hope
| of ever correcting it, you should outsource it.
| Retric wrote:
| Chess engines faced worse than quadratic scaling and came
| out the other side...
|
| Software operates in a crazy number of different domains
| with wildly different constraints.
| crabmusket wrote:
| I believe hinkley was commenting on things that are
| quadratic in the number of users. It doesn't sound like a
| chess engine would have that property.
| tux3 wrote:
| They did make it sound like almost anything would
| necessarily have n scale with new users. That assumption
| is already questionnable
|
| There's a bit of a "What Computational Complexity Taught
| Me About B2B SaaS" bias going.
| delusional wrote:
| All of modern Neural Network AI is based on GEMM which
| are O(n^2) algorithms. There are sub-cubic alternatives,
| but it's my understanding that the cache behavior of
| those variants mean they aren't practically faster when
| memory bound.
|
| n is only rarely related to "customers". As long as n
| doesn't grow, the asymptotic complexity doesn't actually
| matter.
| saagarjha wrote:
| The GEMM is O(n^3) actually. Transformers are quadratic
| in the size of their context window.
| hinkley wrote:
| I read that as a typo given the next sentence.
|
| I'm on the fence about cubic time. I was mostly thinking
| of exponential and factorial problems. I think some very
| clever people can make cubic work despite my warnings.
| But most of us shouldn't. General advice is to be ignored
| by masters when appropriate. That's also the story arc of
| about half of kung fu movies.
|
| Did chess solvers really progress much before there was a
| cubic approximation?
| delusional wrote:
| > I read that as a typo given the next sentence.
|
| Thank you for the courtesy.
|
| > I think some very clever people can make cubic work
| despite my warnings.
|
| I think you're selling yourself short. You don't need to
| be that clever to make these algorithms work, you have
| all the tools necessary. Asymptotic analysis is helpful
| not just because it tells us a growth, but also because
| it limits that growth to being in _n_. If you're doing
| matmul and n is proportional to the size of the input
| matrix, then you know that if your matrix is constant
| then the matmul will always take the same time. It does
| not matter to you what the asymptotic complexity is,
| because you have a fixed n. In your program, it's O(1).
| As long as the runtime is sufficient, you know it will
| never change for the lifetime of the program.
|
| There's absolutely no reason to be scared of that kind of
| work, it's not hard.
| hinkley wrote:
| Right but back up at the top of the chain the assertion
| was that if n grows as your company does then IME you're
| default dead. Because when the VC money runs out you
| can't charge your customers enough to keep the lights on
| and also keep the customers.
| endgame wrote:
| I feel this is too hardline and e.g. eliminates the
| useful things people do with SAT solvers.
| hinkley wrote:
| The first SAT solver case that comes to mind is circuit
| layout, and then you have a k vs n problem. Because you
| don't SAT solve per chip, you SAT solve per model and
| then amortize that cost across the first couple years'
| sales. And they're also "cheating" by copy pasting cores,
| which means the SAT problem is growing much more slowly
| than the number of gates per chip. Probably more like
| n^1/2 these days.
|
| If SAT solvers suddenly got inordinately more expensive
| you'd use a human because they used to do this but the
| solver was better/cheaper.
|
| Edit: checking my math, looks like in a 15 year period
| from around 2005 to 2020, AMD increased the number of
| cores by about 30x and the transistors per core by about
| 10x.
| IshKebab wrote:
| That's quite a contortion to avoid losing the argument!
|
| "Oh well my algorithm isn't _really_ O(N^2) because I 'm
| going to print N copies of the answer!"
|
| Absurd!
| hinkley wrote:
| What I'm saying is that the gate count problem that is
| profitable is in m^3 not n^3. And as long as m < n^(2/3) then
| you are n^2 despite applying a cubic time solution to m.
|
| I would argue that this is essentially part of why Intel
| is flagging now. They had a model of ever increasing
| design costs that was offset by a steady inflation of
| sales quarter after quarter offsetting those costs. They
| introduced the "tick tock" model of biting off a major
| design every second cycle and small refinements in
| between, to keep the slope of the cost line below the
| slope of the sales line. Then they stumbled on that and
| now it's tick tick tock and clearly TSM, AMD and possibly
| Apple (with TSM's help) can now produce a better product
| for a lower cost per gate.
|
| Doesn't TSM's library of existing circuit layouts
| constitute a substantial decrease in the complexity of
| laying out an entire chip? As it grows you introduce more
| precalculated components that are dropped in, bringing
| the slope of the line down.
|
| Meanwhile NVIDIA has an even better model where they spam
| gpu units like mad. What's the doubling interval for gpu
| units?
| crote wrote:
| That only matters when the constants are nontrivial and N
| has a potential to get big.
|
| Not every app is a B2C product intending to grow to
| billions of users. If the costs start out as near-zero
| and are going to grow to still be negligible at 100%
| market share, who cares that it's _technically_
| suboptimal? Sure, you could spend expensive developer-
| hours trying to find a better way of doing it, but YAGNI.
| hinkley wrote:
| I just exited a B2B that discovered they invested in
| luxury features and the market tightened their belts by
| going with cheaper and simpler competitors. Their n
| wasn't really that high but they sure tried their
| damnedest to make it cubic complexity. "Power" and
| "flexibility" outnumbered, "straightforward" and even
| "robust" but at least three to one in conversations. A
| lot of my favorite people saw there was no winning that
| conversation and noped out long before I did.
|
| The devs voted with their feet and the customers with
| their wallets.
| Tainnor wrote:
| Gaussian elimination (for square matrices) is O(n^3)
| arithmetic operations and it's one of the most important
| algorithms in any scientific domain.
| hinkley wrote:
| I'll allow that perhaps I should have said "cubic"
| instead of "quadratic" - there are much worse orders in
| the menagerie than n^3. But it's a constraint we bang
| into over and over again. We use these systems because
| they're cheaper than humans, yes? People are still trying
| to shave off hundredths of the exponent in matrix
| multiplication for instance. It makes the front page of
| HN every time someone makes a "breakthrough".
| marcinzm wrote:
| > no hope of ever correcting it
|
| That's a pretty bold assumption.
|
| Almost every startup that has succeeded was utterly
| unscalable at first in tons of technical and business
| ways. Then they fixed it as they scaled. Over-optimizing
| early has probably killed far more projects and companies
| than the opposite.
| hinkley wrote:
| > That's a pretty bold assumption.
|
| That's not a bold assumption it's the predicate for this
| entire sidebar. The commenter at the top said some things
| can't be done in quadratic time and have to be done
| anyway, and I took exception.
|
| >> unless a more optimal solution does not exist
|
| Dropping into the middle of a conversation and ignoring
| the context so you can treat the participants like they
| are confused or stupid is very bad manners. I'm not
| grumpy at you I'm grumpy that this is the eleventeenth
| time this has happened.
|
| > Almost every startup
|
| Almost every startup fails. Do you model your behavior on
| people who fail >90% of the time? Maybe you, and perhaps
| by extension we, need to reflect on that.
|
| > Then we fixed it as we scaled
|
| Yes, because you picked a problem that _can_ be
| architected to run in reasonable time. You elected to do
| it later. You trusted that you could delay it and turned
| out to be right.
|
| >> unless a more optimal solution does not exist
|
| When the devs discover the entire premise is
| unsustainable or nobody knows how to make it sustainable
| after banging their heads against it, they quickly find
| someplace else to be and everyone wonders what went
| wrong. There was a table of ex employees who knew exactly
| what went wrong but it was impolitic to say. Don't want
| the VCs to wake up.
| patrick451 wrote:
| There are too many obvious exceptions to even start
| taking this seriously. If we all followed this advice, we
| would never even multiply matrices.
| Tepix wrote:
| So, how would you write a solver for tower of Hanoi then?
| Are you saying you wouldn't?
| hinkley wrote:
| As a business? Would you try to sell a product that
| behaved like tower of Hanoi or walk away?
| vlovich123 wrote:
| I have an n^3 operation that's currently a huge bottleneck at
| only 10k elements. Not sure how to fix it.
| IshKebab wrote:
| I once looked into tree diffing algorithms (the literature is
| all about diffing XML even though it's really trees). The
| obvious dumb algorithm (which seems to be what everyone uses)
| is n^4.
| esprehn wrote:
| Modern computers are pretty great at scanning small blocks of
| memory repeatedly, so n^2 can be faster than the alternative
| using a map for small N.
|
| I spent a lot of time fixing n^2 in blink, but there were some
| fun surprises:
|
| https://source.chromium.org/chromium/chromium/src/+/main:thi...
|
| For large N without a cache :nth-child matching would be very
| slow doing n^2 scans of the siblings to compute the index. On
| the other hand for small sibling counts it turned out the cache
| overhead was noticeably worse than just doing an n^2. (See the
| end of the linked comment).
|
| This turns out to be true in a lot of surprising places, both
| where linear search beats constant time maps, and where n^2 is
| better than fancy algorithms to compensate.
|
| Memory latency and instruction timing is the gotcha of many fun
| algorithms in the real world.
| throwaway2037 wrote:
| > where linear search beats constant time maps
|
| Can you give an example? You said lots of good things in your
| post, but I'm struggling to believe this one. Also, it would
| help to see some wall clock times or real world impact.
| hansvm wrote:
| Pick any compiled language and test it. Pick an algorithm
| making heavy use of a small (<10, maybe up to a hundred
| elements) hashset, and try using a linear structure
| instead. The difference will be most apparent with
| complicated keys, but even strings of more than a few
| characters should work.
|
| Some example workloads include:
|
| 1. Tokenization (checking if a word is a keyword)
|
| 2. Interpretation (mapping an instruction name to its
| action)
|
| 3. ML (encoding low-cardinality string features in
| something like catboost)
|
| 4. JSON parsers (usually key count is low, so parse into a
| linear-scan hashmap rather than a general-purpose hashmap)
|
| Details vary in the exact workload, the hardware you're
| using, what other instructions you're mixing in, etc. It's
| a well-known phenomenon though, and when you're doing a
| microoptimization pass it's definitely something to
| consider. 2x speedups are common. 10x or more happen from
| time to time.
|
| It's similar to (but not _quite_ the same as) the reason
| real-world binary search uses linear scans for small
| element counts.
|
| When you go to really optimize the system, you'll also find
| that the linear scan solution is often more amenable to
| performance improvements from batching.
|
| As to how much it matters for your composite program? Even
| at a microoptimization level I think it's much more
| important to pay attention to memory access patterns. When
| we wrote our protobuf parser that's all we really had to
| care about to improve performance (33% less execution time
| for the entire workload, proto parsing being much better
| than that). You're much more likely to be able to write
| sane code that way (contrasted with focusing on
| instructions and whatnot first), and it's easier to layer
| CPU improvements on top of a good memory access pattern
| than to go the other way around.
| crote wrote:
| You've got to keep in mind that computers aren't the
| 1-instruction-at-a-time purely sequential machines anymore.
|
| Let's say you've got a linear array of bytes, and you want
| to see if it contains a specific value. What would a modern
| CPU need to execute? Well, we can actually compare 64
| values at a time with _mm512_mask_cmp_epu8_mask! You still
| need a little bit of setup and a final "did any of them
| match" check, of course. Want to compare 512 values? You
| can probably do that in less than 10 clock cycles with
| modern machines.
|
| Doing the same with a hash set? Better make sure that hash
| algorithm is fast. Sure it's O(1), but if calculating the
| hash takes 20 cycles it doesn't matter.
| ncruces wrote:
| This is a good point.
|
| A string search algorithm that uses SIMD to quickly
| discard a majority of 16, 32 or 64 attempts in parallel,
| and then verify the surviving ones quadratically (again
| 16, 32 or 64 bytes at a time) can go a _very_ long way
| against a sublinear algorithm that understands needle
| structure, but necessarily needs to process the haystack
| one byte at a time.
| kevincox wrote:
| This is true. But unless you are sure that current and future
| inputs will always be small I find it is better to start with
| the algorithm that scales better. Then you can add a special
| case for small sizes if it turns up in a hot path.
|
| This is because performance is typically less important for
| the fast/small case and it is generally acceptable for
| processing twice as much to be twice (or slightly more than
| twice) as slow, but users are far more likely to hit and
| really burned by n^2 algorithms in things you thought would
| almost always be small and you never tested large enough
| sizes in testing to notice.
|
| I wrote more on this topic here
| https://kevincox.ca/2023/05/09/less-than-quadratic/
| hinkley wrote:
| A lot of computations are really higher complexity order
| functions with stair steps at certain intervals based on
| hardware trying to pretend they are constant time. All
| operations cost the same amount until n doubles again and
| then it's slower. If you zoom out toward infinity, the stair
| steps smooth out into a logarithmic curve. It becomes
| logarithmic in the former case and square root in the latter.
| Even dividing two numbers or doing a memory address lookup
| stops being constant time, which is part of why prime factoring worked
| for RSA for so long.
|
| If anyone had made clockless logic work you would see that
| adding 1 + 1 is in fact faster than adding 2^63 + 1.
|
| If you put enough data into a hash table the key length has
| to increase logarithmically to the table size in order to
| have distinct keys per record. Even Knuth points out that
| hash tables are really nlogn - something I'm pretty sure my
| CS professors left out. In multiple classes. Man, did I get
| tired of hash tables, but I see now why they harped on them.
| Case in point, this article.
| solardev wrote:
| For those of us without a formal comp sci background, is there
| a way for the IDE to detect and warn about these automatically?
| Or any other easy telltale signs to look for?
|
| As a self taught dev, when I encounter nested loops, I have to
| mentally go through them and try to see if they iterate through
| each item more than once. But that's not a very foolproof
| method.
| chacha102 wrote:
| Too much domain knowledge for an IDE to catch. I'm self
| taught as well, and it comes down to spending more time
| thinking about the code than writing the code.
|
| It's a fairly simple thought experiment to ask yourself what
| if there was 10x items in this array? 100x? That is
| essentially what the O(n) notation is trying to quantify. You
| just don't need to do it that formally.
| Arcuru wrote:
| 48 hours is a crazy amount of time to spend just to compress a
| git folder, it's only a couple GB. 41 minutes still seems like
| quite a long time.
|
| Why aren't they just snapshotting and archiving the full git
| repo? Does `git bundle` add something over frequent ZFS backups?
| bombela wrote:
| > Be aware that even with these recommendations, syncing in
| this way has some risk since it bypasses Git's normal integrity
| checking for repositories, so having backups is advised. You
| may also wish to do a git fsck to verify the integrity of your
| data on the destination system after syncing.
|
| https://git-scm.com/docs/gitfaq#_transfers
|
| It doesn't tell you how to make a backup safely though.
|
| On a personal scale, Syncthing and Btrfs snapshots work plenty
| good enough. It's as fast as the storage/network too.
| nightfly wrote:
| Syncthing is the only way I've ever corrupted a git repo
| before
| hobofan wrote:
| I think that's why they specified the "BTRFS snapshots"
| part. Yes, directly syncing a .git directory seems like a
| recipe for disaster with how often I've seen individual
| files lagging to sync, but I guess with BTRFS snapshots one
| can ensure that only a consistent view of a git directory
| is being backed up and synced.
| bombela wrote:
| Nah I truly do it the wrong way around. Syncthing on the
| git repos. And one of my devices in the Syncthing cluster
| does btrfs snapshots minutely for recovery and further
| backups.
|
| Because it's at a personal scale, the only time I can
| corrupt a git repo is if I work on the same repo (and
| its workdir) from more than one device in the time it
| takes for Syncthing to replicate the changes.
|
| But even then it's not a big deal because git fsck is
| quick. And I have my snapshots, and the syncthing
| versioning, and git defaults to two weeks before pruning.
| And because of how git works, using hash to identify
| contents, files are not easily overwritten either.
|
| In 10y I only had one git corruption (I ran a command on
| the same repo on a different machine via ssh, yielding a
| syncthing conflict). Syncthing kept copies of the
| conflict file. One commit disappeared from the history
| but not from the database. It was easy to rebase the
| changes. I think I used git fsck to delete the syncthing
| versioned files.
| ndriscoll wrote:
| If filesystem snapshots weren't safe, wouldn't that also mean
| git is prone to corrupting your repo in the event of a power
| loss or crash? That seems like a bad bug.
| unsnap_biceps wrote:
| zfs snapshots are difficult to offsite in non-zfs replicas, say
| like an S3 bucket.
|
| That said, there's another less known feature that bundles help
| out with when used with `git clone --bundle-uri` The client can
| specify a location to a bundle, or the server can send the
| client the bundle location in the clone results and the client
| can fetch the bundle, unpack it, and then update the delta via
| the git server, so it's a lot lighter weight on the server for
| cloning large repos, and _a ton_ faster for the client for
| initial clones.
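|
| For example (URLs hypothetical):
|
|       git clone --bundle-uri=https://cdn.example.com/repo.bundle \
|           https://git.example.com/group/repo.git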
| nightfly wrote:
| ZFS can send to file or whatever you want to pipe to, you can
| have incremental sends, and if you convert to bookmarks on
| the sender you don't have to keep the historical data after
| you send it
| benlivengood wrote:
| I think if you want consistent snapshot backups on non-zfs
| destinations the safest thing is to clone the snapshot and
| rsync from the clone. Not a single-step operation but
| preserves the atomicity of the snapshot.
|
| EDIT: you could also rsync from a .zfs snapshot directory if
| you have them enabled.
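|
| Something like this, with made-up dataset and host names:
|
|       zfs clone tank/git@backup tank/git-clone
|       rsync -a /tank/git-clone/ backuphost:/backups/git/
|       zfs destroy tank/git-clone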
| AtlasBarfed wrote:
| ... so they added caching to things that should have been
| cached?
|
| ... is this really the way people "back up" git repos? I mean,
| it is git, so isn't there some way to mirror changes to the
| repo in another repo and just use ZFS / snapshots / backup
| software / etc to do that? It's a distributed version control
| system. Just make sure the version control information is ...
| distributed?
| broken_broken_ wrote:
| Reading the article I thought exactly the same! I'd be curious
| to know how much time the same would take with zfs.
| gregorvand wrote:
| 15 years also seems like a long time
| DamonHD wrote:
| Here comes an unpopular nitpick: "... we traced the issue to a
| 15-year-old Git function with O(N^2) complexity and fixed it with
| an algorithmic change, reducing backup times exponentially."
|
| Uh no you didn't. Not possible. At most a polynomial reduction is
| possible else complexity theory needs a re-write.
|
| (OK, yes, k could be doing some heavy lifting here, but I doubt
| it.)
|
| If you are going to quote a maths formula then please don't use
| "exponetially" to mean "lots".
|
| I stopped reading there: I don't want to have to read each word
| and wonder if they actually meant it, or it's just bad hype.
| immortaljoe wrote:
| OP here. Feedback is always welcome, I did mean exponentially
| in the colloquial sense. I do see how it is confusing here,
| will change it.
| DamonHD wrote:
| Thank you.
|
| (I don't think that anyone should use "exponentially" that
| way: it is an art term with a specific and particular
| meaning, so find another word if you mean something else!
| Like misusing specific legal or sporting terms...)
| ctxc wrote:
| Art term?
| wizzwizz4 wrote:
| https://en.wiktionary.org/wiki/term_of_art
| ctxc wrote:
| Ah, thanks. I could only think of art as in literal,
| artistic art.
| kccqzy wrote:
| If you have that tendency, you just need to think of
| TAOCP, this industry's best distillation of intellectual
| achievement, with the word "art" in its name.
| taftster wrote:
| Which term is appropriate here? What would you suggest?
| (honest question)
| DamonHD wrote:
| "hugely" or "a lot" or "to O(XXX)" whatever the new XXX
| complexity is.
| serial_dev wrote:
| You can't use exponentially in the colloquial sense in a post
| about software performance.
|
| If my barista says that his new coffee is exponentially
| better, it's ok to use it colloquially.
|
| If my git hosting provider writes about an impactful
| performance improvement, it's not.
| bombela wrote:
| I can forgive exponential as a figure of speech. You missed
| when they described using a map for the side effect of [key]
| deduplication.
|
| Instead of saying "set". The code itself uses the type
| "strset".
| Lammy wrote:
| > What this means for GitLab customers -- [a bunch of stuff about
| how customers can now back up more frequently and more robustly]
|
| Realtalk: They should rewrite this post's headline to be in a
| positive tense instead of leading with a negative word. I'm glad
| I read the post, because it is a cool and good fix, but I saw
| "Decreasing [...] repo backup" and my first thought was that it
| was an announcement of some service downgrade like some sort of
| cost-cutting measure.
| serial_dev wrote:
| I don't think it's unreasonable to expect interested people to
| read five words from the title. I know people don't always do
| that, but complaining about it as if they did something wrong
| is ridiculous.
| robertlagrant wrote:
| No one was complaining.
| divbzero wrote:
| The performance improvement that GitLab contributed to Git is
| slated to be released with v2.50.0:
|
| https://github.com/git/git/commit/bb74c0abbc31da35be52999569...
| Nemo_bis wrote:
| Nice that GitLab is active upstream! Do I see correctly that
| the most active contributors to git are currently GitLab
| employees?
| dfc wrote:
| I don't know how you came to that conclusion. Just looking at
| the top 4 contributors over the last 2 years[1] it looks like
| one works for google(gitster), two work at Github and one
| works at GitLab(pks-t).
|
| [1]: https://github.com/git/git/graphs/contributors?from=6%2F
| 3%2F...
| Nemo_bis wrote:
| I had looked at the monthly activity. Indeed on a longer
| timespan it looks different, thanks.
| dfc wrote:
| That's weird I looked at the monthly and did not see any
| Gitlab employees. Were you looking at a different repo?
| pjmlp wrote:
| A very good example that writing code in C doesn't help for
| performance, when the algorithms or data structures aren't
| properly taken into consideration.
| IshKebab wrote:
| I would say C makes this sort of thing far more likely because
| it's usually a ton of effort to obtain suitable containers. In
| C++ or Rust they have plenty of things like
| `unordered_set`/`HashSet` built in, so people are much more
| likely to use it and not go "eh, I'll use a for loop".
|
| In this case Git already had a string set, but it's still not
| standard so there's a good chance the original author just
| didn't know about it.
| pjmlp wrote:
| Yeah, not something that WG14 will ever care about.
| masklinn wrote:
| > In this case Git already had a string set, but it's still
| not standard so there's a good chance the original author
| just didn't know about it.
|
| The original commit was made in January 2009
| (https://github.com/git/git/commit/b2a6d1c6868b6d5e7d2b4fa912...),
| strmap was added in November 2020
| (https://github.com/git/git/commit/ae20bf1ad98bdc716879a8da99...,
| strset was added a few days later:
| https://github.com/git/git/commit/1201eb628ac753af5751258466...).
| It was first proposed in 2018
| (https://lore.kernel.org/git/20180906191203.GA26184@sigill.in...
| the proposal specifically mentions it fixing possibly quadratic
| sites).
|
| As noted in the comment, git _did_ have a sorted string list
| with bisection search, and that's from 2008 (and it actually
| dates back to 2006 as the "path list" API, before it was
| renamed following the realisation that it was a generalised
| string list). Though as the hashmap proposal notes, it's a
| bit tricky because there's a single type with functions for
| sorted and functions for unsorted operations, you need to
| know whether your list is sorted or not independent of its
| type.
| morepedantic wrote:
| I've seen this exact problem in C code many times in my life,
| especially in kernel space where data structures and memory
| allocations are fun.
|
| Ironically, this is much _faster_ for small sets. Sometimes
| the error is intentional, because the programmer believes
| that all inputs will be small. IME, those programmers were
| wrong, but that's the inverse of survival bias.
| masklinn wrote:
| And in fairness I can understand not considering someone
| might eventually be bundling a repository with tens of
| thousands of refs back in early 2009.
| ashishb wrote:
| O(n^2) is fast enough to end up in production and slow enough to
| cause problems at scale.
|
| The most troublesome cases of inefficiency in production are almost
| always O(n^2).
| hinkley wrote:
| Had a modular monorepo back when UML was still a thing people
| did. People were having trouble opening a TogetherJ project -
| it was taking 30 minutes. I dug into it, and I don't recall how
| I timed it but I kept adding more submodules and the runtime
| went up ridiculously fast. I stopped at a project that took 18
| hours to load. Started it before going home one day and timed
| it the next morning.
|
| When I plotted the runtime, I got n^5 for the fitting curve.
| That's the largest polynomial I've encountered in the wild.
| Second place has always been cubic.
|
| Their response was that it had something to do with processing
| config files per module, and a suggestion to rearrange them as
| a workaround. They fixed the problem in the next patch and the
| load time went from 18 hours to a couple minutes.
| divbzero wrote:
| There seems to be a lesson here about the balance between
| premature _vs._ anticipatory optimization. We're generally warned
| against premature optimization but perhaps, as a rule of thumb,
| we should look for optimizations in frequently-called functions
| that are obvious and not onerous to implement.
| Quekid5 wrote:
| If a set-of-strings was trivially available in the source
| language (at time of implementation) the original programmer
| would probably have done this (relatively) trivial
| optimization... This is a symptom of anemic languages like C.
| prymitive wrote:
| Well done, next maybe make the web ui usable because right now
| there's absolutely no distinction between the UI itself and the
| user content, which IMHO combined with action buttons in the
| middle of the page makes for a really poor ux.
| SoftTalker wrote:
| Cool discovery but the article could have been about 1/10 as long
| and still communicated effectively. At least they didn't post it
| as a video, so it was easy to skim to the important details.
| kokada wrote:
| Yes. I read the whole article thinking that this must have been
| generated by an LLM, because at least the style resembles it.
| ruuda wrote:
| That was also my thought.
| Scaevolus wrote:
| Em dashes and bullet points!
| jaygreco wrote:
| Glad I wasn't the only one who thought this. The post is also
| missing one obvious thing that I expect in any technical
| post: code snippets. Let me see the code.
|
| ChatGPT has ruined bullet points for the rest of us...
|
| No offense but writing this blog post couldn't take more than
| a few minutes, why spoil it with LLM? Shoot, use one to check
| grammar and recommend edits even.
| bearjaws wrote:
| Don't take my bullet points away from me
| jorvi wrote:
| They came for the em dashes, and I did not speak up. Then
| they came for the bullet points, and I did not speak up..
| djdeutschebahn wrote:
| I thought exactly the same. The reading experience of the post
| would definitely have been improved with less text.
| 1oooqooq wrote:
| it could have been longer. I still don't know why they were
| doing backup bundles with two refs :)
| 6LLvveMx2koXfwn wrote:
| They weren't, if you look at the fix [1] the dedupe loop was
| run in _all_ cases, not just those with known dupes, so the
| performance hit was any bundle with lots of refs.
|
| 1. https://github.com/git/git/commit/bb74c0abbc31da35be52999569...
| rjmunro wrote:
| But why couldn't they just dedupe the refs from the command
| line before starting the actual bundling - surely there are
| never more than a couple of hundred of those (one per
| branch)?
| 6LLvveMx2koXfwn wrote:
| The point is the performance hit had nothing to do with
| dupe count (which could be zero), and everything to do
| with ref count.
| davidron wrote:
| For those that haven't read the article yet, scroll down to the
| flame graph and start reading until it starts talking about
| backporting the fix. Then stop.
| wgjordan wrote:
| "How we decreased reading 'How we decreased GitLab repo
| backup times from 48 hours to 41 minutes' times from 4.8
| minutes to 41 seconds"
| hliyan wrote:
| Interesting that multiple people are noticing this same thing.
| For me, this could have been:
|
| "We found that a large portion of the 48 hours taken to backup
| our rails respository was due to a function in `git bundle
| create` that checks for duplicate references entered as command
| line arguments. The check contained a nested for loop ( O(N2)
| ), which we replaced with a map data structure in an upstream
| fix to Git. The patch was accepted, but we also backported fix
| without waiting for the next Git version. With this change,
| backup time dropped to 41 minutes."
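|
| For illustration, here's a minimal sketch of the difference (my
| own hypothetical code, not the actual Git patch): a quadratic
| dedupe that rescans the kept list for every candidate, versus
| one probe into a small open-addressing string set.
|
|   #include <stdio.h>
|   #include <string.h>
|
|   /* Hypothetical demo, not git's code: dedupe ref names. */
|
|   #define NSLOTS (1u << 16)         /* power of two, demo-sized */
|   static const char *slot[NSLOTS];  /* open-addressing string set */
|
|   static unsigned hash(const char *s)            /* FNV-1a */
|   {
|       unsigned h = 2166136261u;
|       while (*s)
|           h = (h ^ (unsigned char)*s++) * 16777619u;
|       return h;
|   }
|
|   /* Returns 1 if s was newly inserted, 0 if already present. */
|   static int set_add(const char *s)
|   {
|       unsigned i = hash(s) & (NSLOTS - 1);
|       while (slot[i]) {
|           if (!strcmp(slot[i], s))
|               return 0;
|           i = (i + 1) & (NSLOTS - 1);            /* linear probe */
|       }
|       slot[i] = s;
|       return 1;
|   }
|
|   /* O(N^2): every candidate rescans the whole kept list. */
|   static int dedupe_slow(const char **in, int n, const char **out)
|   {
|       int kept = 0, i, j, dup;
|       for (i = 0; i < n; i++) {
|           dup = 0;
|           for (j = 0; j < kept; j++)
|               if (!strcmp(in[i], out[j])) { dup = 1; break; }
|           if (!dup)
|               out[kept++] = in[i];
|       }
|       return kept;
|   }
|
|   /* O(N) expected: one set probe per candidate. */
|   static int dedupe_fast(const char **in, int n, const char **out)
|   {
|       int kept = 0, i;
|       for (i = 0; i < n; i++)
|           if (set_add(in[i]))
|               out[kept++] = in[i];
|       return kept;
|   }
|
|   int main(void)
|   {
|       const char *refs[] = {
|           "refs/heads/main", "refs/tags/v1.0",
|           "refs/heads/main", "refs/heads/dev"
|       };
|       const char *out[4];
|       printf("slow: %d unique\n", dedupe_slow(refs, 4, out));
|       printf("fast: %d unique\n", dedupe_fast(refs, 4, out));
|       return 0;
|   }
|
| The upstream fix does the same kind of swap with Git's own map
| structure: the inner rescan becomes a constant-time membership
| check, so the whole dedupe pass is linear in the number of refs.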
| m104 wrote:
| It's the "impact" style of technical write-ups: sell the
| problem and the scale, then present the solution, which is thus
| presented and understood through the lens of business and
| customer success.
|
| Generously, this writing style is supposed to show the business
| value of teams and individuals, for promotions or other
| recognition. But yeah, it can be frustrating to read this
| style.
| bob1029 wrote:
| I'm confused why you wouldn't simply snapshot the block-level
| device if the protocol of the information on top is going to
| cause this much headache. Quiescing git operations for block
| level activity is probably not trivial, but it sounds like an
| easier problem to solve to me.
|
| This is the approach I've taken with SQLite in production
| environments. Turn on WAL and the problem gets even easier to
| solve. Customer configures the VM for snapshots every X minutes.
| Git presumably doesn't have something approximating a WAL, so I
| understand the hesitation with this path. But, I still think the
| overall strategy is much more viable and robust to weird edges
| within git.
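|
| To make the SQLite part concrete, something like this (a
| minimal sketch using the standard sqlite3 C API, error handling
| omitted; checkpointing before the snapshot is my own habit so
| the snapshot stands alone, not a hard requirement):
|
|   #include <sqlite3.h>
|
|   int main(void)
|   {
|       sqlite3 *db;
|       if (sqlite3_open("app.db", &db) != SQLITE_OK)
|           return 1;
|       /* WAL mode: readers don't block the writer and the main
|        * database file stays consistent for snapshot backups. */
|       sqlite3_exec(db, "PRAGMA journal_mode=WAL;", 0, 0, 0);
|       sqlite3_exec(db, "PRAGMA synchronous=NORMAL;", 0, 0, 0);
|       /* Before the block-level snapshot, fold the WAL back
|        * into the main file so the snapshot stands alone. */
|       sqlite3_exec(db, "PRAGMA wal_checkpoint(TRUNCATE);", 0, 0, 0);
|       sqlite3_close(db);
|       return 0;
|   }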
| nonameiguess wrote:
| Gitlab isn't just a managed service. They release the software
| for self-hosted instances as well. There is no guarantee or
| requirement that users all run Gitlab on the same filesystem or
| even a filesystem that supports block level snapshots at all.
| Presumably, they want a universal backup system that works for
| all Gitlabs.
| bob1029 wrote:
| I've never heard of a .git folder that spanned multiple
| filesystems. It sounds like we are now conflating the git
| workspace with everything else in the product.
|
| There are system requirements that a customer would be
| expected to adhere to if they wanted a valid enterprise
| support contract with one of these vendors.
| to11mtm wrote:
| I think GP's point is that the filesystems used by someone
| self-hosting gitlab may not be the same as what gitlab
| themselves are using.
|
| File systems can be weird. Sometimes the OS can be weird
| and fsync type calls may not do what you expect. At least
| at one point MacOS fsync didn't behave the same way as
| Linux (i.e. Linux should ensure the write is truly done and
| not just in cache so long as the drive isn't lying). [0]
|
| > There are system requirements that a customer would be
| expected to adhere to if they wanted a valid enterprise
| support contract with one of these vendors.
|
| Gitlab has a community edition. Not handling data well
| would be bad for their public image.
|
| [0] - https://news.ycombinator.com/item?id=30372218
| nonameiguess wrote:
| I can't reply to the other reply for some reason, but what they
| said is indeed what I meant. gitlab.com might be running their
| Gitaly servers on btrfs or zfs or lvm volumes or whatever, but
| other customers may be using ext2. Gitlab the company _could_
| require customers to only run Gitaly on a specific filesystem,
| but up to now they never have, and it would be pretty shitty to
| suddenly change their minds after a decade-plus of establishing
| one expectation. Besides, whoever the developer is who
| submitted the patch to upstream Git and got a technical blog
| post out of it has absolutely no power to dictate contract
| terms to enterprise customers personally; instead, they did
| what was actually in their power to do.
| amtamt wrote:
| > This is the approach I've taken with SQLite in production
| environments. Turn on WAL and the problem gets even easier to
| solve.
|
| A few months back a better solution was provided by SQLite:
| sqlite3_rsync
|
| https://www.sqlite.org/rsync.html
| lbotos wrote:
| > Git presumably doesn't have something approximating a WAL, so
| I understand the hesitation with this path.
|
| Bingo. One of the worst problems is helping a client piece back
| together a corrupted repo when they are using snapshots. Check
| my profile to see how I know. :)
|
| It's usually an OMG down scenario, and then you are adding in
| the "oh no, now the restore is corrupted."
|
| It's fixable but it's definitely annoying.
| einpoklum wrote:
| I had a somewhat similar experience when writing a "remove
| duplicates" extension for Thunderbird:
|
| https://addons.thunderbird.net/en-US/thunderbird/addon/remov...
|
| I used a hash to begin with, applying a simplistic digest
| function to the message headers I was comparing, which got me a
| 4-byte hash key.
|
| That worked, but was kind of slow.
|
| Finally, the idea came to me to not apply _any_ digesting, and
| use the combined concatenated headers of the messages as hash
| keys. About 2k bytes per hash key!
|
| The result: About 20x perf improvement if memory serves.
|
| How is that possible? The reason is that the code all runs in a
| JavaScript VM, and applying the digest was not a built-in
| function: it meant looping over the headers and doing the
| arithmetic in script, thousands upon thousands of JS abstract
| machine steps. The use of the large hash key may be
| inefficient, but it's just one JS object / dictionary
| operation, and one of the most heavily optimized operations in
| any implementation.
| sneak wrote:
| ...and they say leetcode-style problems are out of date.
|
| While perhaps implementing low-level sorts is silly, knowing
| which data structures to use and when, and what underlying
| big-O performance they have, is critical, as demonstrated here.
| kristianp wrote:
| It would have been nice if the post included some details, such
| as before and after code.
| niffydroid wrote:
| Why use this method over just having a script do a clone or pull
| elsewhere? Git is about distribution, right?
| iamcreasy wrote:
| Cool! Please post another flame graph after applying the patch.
| tcdent wrote:
| TLDR: if you only add objects to an array that are not already
| contained in said array, you won't have to iterate back through
| it to remove the duplicates you created.
|
| wild that a pattern like this would be part of git-core, but I
| guess we all overlook stuff on a regular basis
| katella wrote:
| I don't know much about git backups, but why would there be race
| conditions if you just create the backup off the local repo
| instead of remote?
| KETpXDDzR wrote:
| With git it should be straightforward to implement an
| incremental, deduplicating backup solution, since all objects
| are stored with their hashes as filenames.
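|
| A naive sketch of that idea in C (loose objects only; it
| ignores packfiles and git-gc repacking, which the reply below
| gets into): copy any object file the backup directory doesn't
| already have, relying on content addressing to make a filename
| match mean a content match.
|
|   #include <dirent.h>
|   #include <stdio.h>
|   #include <string.h>
|   #include <sys/stat.h>
|
|   /* Copy src to dst byte for byte (demo, no error handling). */
|   static void copy_file(const char *src, const char *dst)
|   {
|       FILE *in = fopen(src, "rb"), *out = fopen(dst, "wb");
|       char buf[8192];
|       size_t n;
|       if (!in || !out) {
|           if (in) fclose(in);
|           if (out) fclose(out);
|           return;
|       }
|       while ((n = fread(buf, 1, sizeof buf, in)) > 0)
|           fwrite(buf, 1, n, out);
|       fclose(in);
|       fclose(out);
|   }
|
|   /* Walk the xx/ fan-out dirs under .git/objects and copy any
|    * loose object the backup doesn't have yet. */
|   static void backup_loose(const char *objdir, const char *bakdir)
|   {
|       DIR *top = opendir(objdir), *d;
|       struct dirent *fan, *obj;
|       char sdir[4096], ddir[4096], src[4200], dst[4200];
|       struct stat st;
|
|       if (!top)
|           return;
|       while ((fan = readdir(top))) {
|           if (fan->d_name[0] == '.' || strlen(fan->d_name) != 2)
|               continue;           /* skip ., .., info/, pack/ */
|           snprintf(sdir, sizeof sdir, "%s/%s", objdir, fan->d_name);
|           snprintf(ddir, sizeof ddir, "%s/%s", bakdir, fan->d_name);
|           mkdir(ddir, 0755);      /* ok if it already exists */
|           if (!(d = opendir(sdir)))
|               continue;
|           while ((obj = readdir(d))) {
|               if (obj->d_name[0] == '.')
|                   continue;
|               snprintf(src, sizeof src, "%s/%s", sdir, obj->d_name);
|               snprintf(dst, sizeof dst, "%s/%s", ddir, obj->d_name);
|               if (stat(dst, &st) != 0)   /* only copy new objects */
|                   copy_file(src, dst);
|           }
|           closedir(d);
|       }
|       closedir(top);
|   }
|
|   int main(int argc, char **argv)
|   {
|       if (argc != 3) {
|           fprintf(stderr, "usage: %s <objects-dir> <backup-dir>\n",
|                   argv[0]);
|           return 1;
|       }
|       mkdir(argv[2], 0755);       /* ok if it already exists */
|       backup_loose(argv[1], argv[2]);
|       return 0;
|   }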
| mdaniel wrote:
| Run `man git-repack` or its `man git-gc` friend and recall that
| most filesystems _hate_ dealing with a bazillion small files
|
| I think there have been several attempts to use S3-ish
| blobstores as git backends but of the two major git hosters,
| only one of them is MIT licensed and they don't attempt that
| stunt, so safe to assume it's not a viable approach
| DeepYogurt wrote:
| I miss a good tech blog post. Cheers for this
| ianks wrote:
| Reminds me of the timeless advice I got for acing leet code
| interviews: "remember to use a hash map." Tends to work out
| pretty well.
| SkiFire13 wrote:
| > Ultimately, we traced the issue to a 15-year-old Git function
| with O(N^2) complexity and fixed it with an algorithmic change,
| reducing backup times exponentially.
|
| I have yet to finish the article, but this means they improved
| the complexity to something like O(log N), _right_? I hate when
| people confuse quadratic improvements for exponential ones.
| globular-toast wrote:
| Really annoying to see someone use exponential wrong when
| they're talking about performance of algorithms. We're supposed
| to know what this means! They went from quadratic to linear.
| jas39 wrote:
| What a poor write-up, neither defining the problem nor the
| solution with any clarity. Did anyone come away with any
| information that can be used for anything?
| fHr wrote:
| absolute chads, nice!
| hinkley wrote:
| This feels like some of the problems I've ended up tackling. The
| people too close to the problem don't see it so someone has to
| reach across the aisle and make something happen in code they
| normally wouldn't touch.
|
| Even with the quadratic complexity I suspect this problem has
| been brewing for a while. This commit has been biding its time
| for 16 years waiting to ruin someone's day (let's not confuse the
| utility of Make it Work early in a project with justifying
| _retaining_ that code in perpetuity. Make it Right, Make it
| Fast.) Backups have probably been going over six hours for years
| and steadily climbing.
|
| "We" feels a lot like one or two people jumping on a grenade.
| gbacon wrote:
| Algorithmic improvements beat micro-optimizations.
| James_K wrote:
| Perhaps nested for loops should be some kind of compiler warning.
| abhashanand1501 wrote:
| A lot of comments are complaining that going from O(n^2) to
| O(log n) is not an exponential improvement, but it is indeed an
| exponential improvement. In fact, O(n) is exponentially more
| than O(log n).
___________________________________________________________________
(page generated 2025-06-08 23:02 UTC)