[HN Gopher] "Bugs are 100x more expensive to fix in production" ...
       ___________________________________________________________________
        
       "Bugs are 100x more expensive to fix in production" study might not
       exist (2021)
        
       Author : rafaepta
       Score  : 19 points
       Date   : 2025-06-01 21:02 UTC (1 hours ago)
        
 (HTM) web link (www.theregister.com)
 (TXT) w3m dump (www.theregister.com)
        
       | pdimitar wrote:
       | > _Laurent Bossavit, an Agile methodology expert_
       | 
       | Congratulations, you got me to stop reading just at the start of
       | the article.
       | 
       | On topic, I don't think any good engineer ever claimed the title
       | of the article. The "more expensive" part stems from having to
       | rush and maybe do a sloppy job, introducing regressions, higher
       | hosting costs or other maladies.
       | 
       | So the "higher cost" might just be a compounding value borne out
       | of panicky measures. Sometimes you really do have to roll up your
       | sleeves, timebox whatever fix you have in mind, and either make
       | progress or actually kill the problem. Often, though, you just
       | deflect the problem temporarily to somewhere the "bleeding" will
       | not be as significant, which buys you the time to do a better
       | job.
       | 
       | Titles like this article's are highly dramatized. I am
       | surprised any serious working person ever took them seriously.
        
         | tobyjsullivan wrote:
         | > I don't think any good engineer ever claimed the title
         | 
         | I don't claim to be a good engineer but I have made the claim
         | in the title many times. Though it's usually in the form of a
         | more nuanced statement.
         | 
         | It's about time, rather than money. If you can change a line of
         | code to fix a bug before making your commit, that's a lot
         | faster than all the rigmarole of shipping the same fix later
         | (new PR, code review, wait for CI, merge, deploy, etc.). Not to
         | mention troubleshooting and debugging effort.
         | 
         | The multiplier depends on your context but the bar isn't high.
         | 100x a 30-second fix is about an hour of effort. I've worked in
         | several teams where the average effort to change a line on prod
         | approached that (with honest measurement, including
         | context-switching costs).
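The time arithmetic in the comment above can be checked with a small Python sketch. The 30-second baseline and the 100x multiplier come from the comment; the function names are hypothetical, and the second function simply inverts the first to show what multiplier a given end-to-end fix time implies.

```python
# Sketch of the arithmetic behind "100x a 30-second fix is about an hour".
# Function names are hypothetical; the inputs are the comment's figures.


def scaled_effort_minutes(base_fix_seconds: float, multiplier: float) -> float:
    """Effort, in minutes, of a fix costing `multiplier` times the baseline."""
    return base_fix_seconds * multiplier / 60


def implied_multiplier(base_fix_seconds: float, late_fix_minutes: float) -> float:
    """Multiplier implied when a fix that would take `base_fix_seconds` up front
    instead takes `late_fix_minutes` end to end (new PR, review, CI, deploy)."""
    return late_fix_minutes * 60 / base_fix_seconds


if __name__ == "__main__":
    print(f"{scaled_effort_minutes(30, 100):.0f} minutes")  # prints "50 minutes"
    print(f"{implied_multiplier(30, 60):.0f}x")             # prints "120x"
```

Under these figures, a one-hour ship-and-review loop already exceeds 100x for a 30-second edit, which is the commenter's point that "the bar isn't high".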
        
           | TeMPOraL wrote:
           | I believe in a soft form of that too (i.e. no specific
           | numbers); the severity really depends on the type of project.
           | 
           | In a few industrial and enterprise projects I worked on, once
           | you got past "testing", fixing a bug involved coordinating
           | with another team that was doing a test deployment or
           | evaluation at the customer site; at that point, extra process
           | would kick in, and if the bug was severe enough (or you were
           | unlucky enough) that the customer side got wind of it, you
           | could expect some extra e-mail rounds and possibly a meeting.
           | 
           | Now, if your bug didn't get noticed then, and failed in
           | actual production... the time and cost multiplier was
           | effectively unbounded. A fix for a simple and low-impact bug
           | could take a week to get from commit to being ready to
           | release, and then wait a month in limbo, because there are
           | schedules to these kinds of projects (you can't really do
           | continuous delivery if each release triggers a validation and
           | sign-off process on the customer's end that engages a team of
           | people for a day or more). A fix for a more complex or
           | impactful bug could become... the last thing you release, if
           | the customer gives up on your product over it. Etc.
           | 
           | People like to focus on technical aspects (restoring
           | databases, hard-to-reach hardware, etc.) when discussing this
           | concept, but there are whole classes of projects where the
           | driving aspect is bureaucracy - coordinating business side,
           | altering long-term project plans, getting sign-off on
           | testing, re-evaluating regulatory compliance, etc. That can
           | quickly get arbitrarily expensive.
        
       | tobyjsullivan wrote:
       | Subtitle:
       | 
       | > It's probably still true, though, says formal methods expert
       | 
       | Seems like clickbait. The thesis is predicated on the idea that
       | people claim this is the result of some study. I've never once
       | heard it presented that way. It's a rule of thumb.
        
         | TeMPOraL wrote:
         | I'm quite sure that, over the years, I've seen this claim
         | presented many times with a citation, or at least a reference,
         | pointing at a study somewhere; can't find any particular
         | example right now, unfortunately.
         | 
         | (This claim sits in my memory adjacent to things like "fixed
         | number of bugs per 1000 lines of code", in a bucket labeled
         | "seen multiple times, supposedly came out of some study on
         | software engineering, something IBM or ACM or such".)
        
       | tanseydavid wrote:
       | I don't get the hairsplitting here--it seems obvious to me that
       | if you build the wrong feature, you have to replace it with
       | something else which needs building as well as something akin to
       | demolition of the first feature.
       | 
       | Repeat this cycle more than once for the same feature and it
       | clearly adds up to real impact.
       | 
       | The 100x may be exaggerated, but that's beside the point to me;
       | I think even 2x or 3x on a feature is regrettable and often
       | avoidable.
        
       | igouy wrote:
       | 2021
        
       | michaelmrose wrote:
       | Fixing it in production means it could have affected your
       | production users, who in turn couldn't do whatever it is that they
       | do to actually make or give you money, with an unknown but
       | potentially significant effect on your bottom line.
       | 
       | It also involves more people, and often more senior people who
       | are paid more, as the bug must be triaged, assigned, and managed.
       | 
       | Whilst it is unlikely that this falls neatly on distinct orders
       | of magnitude (e.g. exactly 10x and 100x more), if it's taken to
       | mean that it's substantially and very substantially more
       | expensive, this seems fine.
        
         | jerlam wrote:
         | Get a reputation for buggy, unreliable software and soon you
         | won't have a lot of paying customers. Doesn't really fall under
         | the definition of "fixing bugs" but a lot more impactful.
        
       | janice1999 wrote:
       | If you ship firmware to devices, it could be far more expensive.
       | [1]
       | 
       | [1] https://www.bleepingcomputer.com/news/hardware/botched-
       | firmw...
        
       | bravesoul2 wrote:
       | Actual title:
       | 
       | Everyone cites that 'bugs are 100x more expensive to fix in
       | production' research, but the study might not even exist
        
       | 0xbadcafebee wrote:
       | Forget the study, let's just do a simple thought experiment. Your
       | developer gets paid $140k/yr (let's round up to ~$70/hr). Let's
       | say a given bug found in testing takes 1 hour to fix; that's $70
       | (not counting the costs of CI/CD, etc.). If they miss it in test,
       | and it hits production, would it cost $7,000 to fix? Depends what
       | you mean by "bug", what it affects, and what you mean by "fix in
       | production".
       | 
       | - Did you screw up the font size on some text you just published?
       | Ok, you can fix that in about 5 seconds, and it affects pretty
       | much nothing. Doesn't cost 100x.
       | 
       | - Did your SQL migration just delete all records in the
       | production database? Ok, that's going to take longer than 5
       | seconds to fix. People's data is gone, apps stop working, the
       | lack of or bad data fed to other systems causes larger downstream
       | issues, there's the reputational harm, the money you'll have to
       | pay back to advertisers for their ads / your content being down,
       | and all of that multiplied by however long it takes you to
       | restore the database from backup (um... you do test restoring
       | your backups... right?). That's closer to 100x more expensive to
       | fix in production.
       | 
       | - Did you release a car, airplane, satellite, etc with a bug?
       | We're looking at potentially millions in losses. Way more than
       | 1000x.
       | 
       | And those are just the easy ones. What about a bug you release
       | that is then adopted (and depended on) by downstream API
       | consumers, and that you then spend decades patching over and
       | engineering around? How about when production bugs cause your
       | product team to lose confidence in deployments, so they spend
       | weeks and weeks to "get ready" for a single deploy, afraid of it
       | failing and not being able to respond quickly? That fear will
       | dramatically slow down the pace of development/shipping.
       | 
       | The "long tail" of fixing bugs in production involves a lot more
       | complexity than in non-production; that's where the extra cost
       | comes from. These could end up costing 10,000x over the
       | long term, when all is said and done. Security bugs, reliability
       | bugs, performance bugs, user interface bugs, etc. There's a
       | universe of bugs which are much harder/costlier to fix in
       | production.
       | 
       | But you know what is certain? It always costs more to fix in
       | production. 1.2x, 10x, 1000x, that's not the point; the point is,
       | fix your bugs before they go to production. ("Shift Left" is how
       | we refer to this in the DevOps space, but it applies to
       | everything in the world that has to do with quality. Improve
       | quality before it gets shipped to customers, and you save money
       | in the long run.)
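The dollar figures in the thought experiment above reduce to a few multiplications. This Python sketch assumes a 2,080-hour work year (40 hours x 52 weeks), which the comment does not state; under that assumption $140k/yr is about $67/hr before being rounded up to ~$70/hr, and the $70 and $7,000 figures follow. The function names are hypothetical, and the sketch counts labour only, whereas the comment argues that downtime, data loss, and reputational harm dominate the production cases.

```python
# Back-of-the-envelope version of the $140k/yr thought experiment.
# ASSUMPTION: 2,080 paid hours per year (40 h/week x 52 weeks); the comment
# itself only says "round up to ~$70/hr". Function names are hypothetical.

HOURS_PER_YEAR = 40 * 52  # 2,080 hours; assumed, not from the comment


def hourly_rate(annual_salary: float, hours_per_year: int = HOURS_PER_YEAR) -> float:
    """Convert an annual salary to an hourly rate."""
    return annual_salary / hours_per_year


def fix_cost(rate_per_hour: float, hours: float, multiplier: float = 1.0) -> float:
    """Labour cost of a fix; `multiplier` scales it for a later stage (e.g. 100x)."""
    return rate_per_hour * hours * multiplier


if __name__ == "__main__":
    exact = hourly_rate(140_000)  # about 67.31 under the 2,080-hour assumption
    rounded = 70.0                # the comment's rounded-up rate
    print(f"exact rate: ${exact:.2f}/hr")
    print(f"fix in test: ${fix_cost(rounded, 1):,.0f}")
    print(f"same fix at 100x: ${fix_cost(rounded, 1, 100):,.0f}")
```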
        
         | Supermancho wrote:
         | Bugs are more like viruses in practice. The cost is useful to
         | measure in negative lifespan, not cost to fix, per se. This is
         | why many bugs are never fixed. Those cost nothing to fix,
         | because they don't have to be.
         | 
         | > Did your sql migration just delete all records in the
         | production database? That's closer to 100x more expensive to
         | fix in production.
         | 
         | Companies that do this often don't stay in business. It's not
         | 100x more expensive if you're not in business. Survivorship
         | ensures that classes of bugs don't have a consistent negative
         | return, because they are often fatal.
        
         | TuringNYC wrote:
         | >> Did you screw up the font size on some text you just
         | published? Ok, you can fix that in about 5 seconds, and it
         | affects pretty much nothing. Doesn't cost 100x.
         | 
         | Actually, I find these to be worse, because I've been in Scrum
         | meetings where 6 people spend 2 minutes talking about this bug,
         | then another 2 minutes talking about the QA of it the next day.
         | Tiny issues are very expensive to fix if you have formulaic
         | team members who aren't taking the reins.
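Counting person-time makes the comment above concrete. This Python sketch assumes the same six people sit through both two-minute discussions and takes the 5-second fix time from the quoted comment; the function names are hypothetical.

```python
# Person-time arithmetic for the stand-up scenario.
# ASSUMPTION: the same six people attend both discussions; the 5-second fix
# time comes from the quoted comment. Function names are hypothetical.
from typing import Sequence


def meeting_person_seconds(attendees: int, minutes_per_discussion: Sequence[float]) -> float:
    """Total person-seconds spent discussing a bug across several meetings."""
    return attendees * sum(minutes_per_discussion) * 60


def overhead_ratio(discussion_seconds: float, fix_seconds: float) -> float:
    """How many times larger the discussion cost is than the fix itself."""
    return discussion_seconds / fix_seconds


if __name__ == "__main__":
    talk = meeting_person_seconds(6, [2, 2])  # 6 people x (2 + 2) minutes
    print(f"{talk / 60:.0f} person-minutes of discussion")  # prints "24 person-minutes..."
    print(f"{overhead_ratio(talk, 5):.0f}x the 5-second fix")  # prints "288x the 5-second fix"
```

Under those assumptions the discussion alone costs roughly 288 times the edit itself, before any QA or deployment time is counted.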
        
       | faizshah wrote:
       | I have some thoughts on this (in the context of modern SaaS
       | companies).
       | 
       | The most expensive parts of fixing a bug are
       | discovering/diagnosing/triaging the bug, cleaning up corrupted
       | records, and customer communication. If you discover a bug in
       | development or even better while you are coding the function or
       | during a code review you get to bypass triaging, customer calls,
       | escalations, RCAs, etc. At a SaaS company with enterprise
       | customers each of those steps involves multiple meetings with
       | your Support, Account Manager, Senior Engineer, Product Manager,
       | Engineering Manager, Department Manager, sometimes Legal or a
       | Security Engineer and then finally the actual coder. So of course
       | if you can resolve an issue (at a modern SaaS company) during
       | development it can be 10-100x less expensive just because of how
       | much bureaucracy is involved in running a large scale enterprise
       | SaaS company.
       | 
       | It also raises an interesting side effect of companies adopting
       | non-deterministic coding (AI code): bugs that a human engineer
       | could have caught during design/development, while writing the
       | code, can now leak all the way into prod.
        
       | cpeterso wrote:
       | HN discussion about the Register article from 2021:
       | https://news.ycombinator.com/item?id=27917595
       | 
       | HN discussion about the original blog post from 2021:
       | https://news.ycombinator.com/item?id=27892615
        
       | dlcarrier wrote:
       | Waterfall development never existed, either.
       | 
       | Business management and self-help publishing long predates
       | research, and nothing has changed. For some reason, software
       | development has been extra susceptible to their nonsense.
        
       ___________________________________________________________________
       (page generated 2025-06-01 23:00 UTC)