[HN Gopher] Spending 3 months investigating a 7-year old bug and...
       ___________________________________________________________________
        
       Spending 3 months investigating a 7-year old bug and fixing it in 1
       line of code
        
       Author : asicsp
       Score  : 185 points
       Date   : 2024-06-21 13:49 UTC (9 hours ago)
        
 (HTM) web link (lemmy.world)
 (TXT) w3m dump (lemmy.world)
        
       | m3kw9 wrote:
       | These one-line fixes always seem like a stupid bug, but in reality
       | most bugs are like this and the fix is in the discovery.
        
         | xeromal wrote:
         | One of the reasons I struggle to give ETAs on fixing a bug. The
         | moment I know what the issue is, the solution to fix it is
         | usually already figured out barring a rearchitecture of some
         | services or infrastructure.
        
       | shermantanktop wrote:
       | This kind of bug is always an emotional rollercoaster of
       | anticipation, discovery, disappointment, angst, self-criticality,
       | and satisfaction.
        
       | TehShrike wrote:
       | I particularly liked this part:
       | 
       | > Knowing very little about USB audio processing, but having cut
       | my teeth in college on 8-bit 8051 processors, I knew what kind of
       | functions tended to be slow. I did a Ctrl+F for "%" and found a
       | 16-bit modulo right in the audio processing code.
       | 
       | That feeling of saving days of work because you remember a clue
       | from previous experience is so good.
        
         | djoldman wrote:
         | Those are the times one gets the opposite of imposter syndrome.
        
           | ASalazarMX wrote:
           | Fortunately it's a temporary state, otherwise there's risk of
           | entering the Dunning-Kruger effect.
           | 
           | "You did awesome, but don't let it go to your head."
        
             | dylan604 wrote:
             | F-that! That's one of those times where I re-enact the
             | scene from the Bond film GoldenEye where the guy jumps up
             | extending both arms yelling "Yes! I am invincible!" Of
             | course I totally expect the hubris to be short-lived, just
             | maybe not with liquid nitrogen.
             | 
             | https://www.youtube.com/watch?v=fXW02XmBGQw
        
               | squigz wrote:
               | I alternate between "I am the best programmer to ever
               | exist" and "I am completely incompetent at this and I
               | should quit" while debugging.
        
               | dylan604 wrote:
               | I've been known to inform people that the person that
               | wrote the incredibly horrendous code that caused whatever
               | problems to occur should be fired immediately knowing
               | good and well that I was the only dev to write any of the
               | code.
        
         | nikanj wrote:
         | This essentially is why senior engineers get much bigger
         | salaries
        
           | mulmen wrote:
           | As a total newbie I saved my company a quarter million
           | dollars in Oracle licensing in a single afternoon by
           | rewriting a PL/SQL function. That change was a few lines of
           | SQL. Seniors don't have a monopoly on good ideas.
           | 
           | Salary is driven by market conditions and nothing else. It is
           | not an approximation of merit or even delivered value.
        
             | close04 wrote:
             | Statistically speaking a senior (more experienced) engineer
             | is more likely to consistently deliver time saving results,
             | while a junior is more likely to occasionally do it, if
             | ever.
             | 
             | Proving it's not a one time thing is what pushes you in the
             | salary and seniority ranking.
        
               | nashashmi wrote:
               | Senior engineers have less opportunity to write time
               | consumingly careful code because they get paid so much.
               | Much easier to throw new great hardware at it.
        
               | fifilura wrote:
               | Senior engineers have less time to write code period.
               | 
               | And this is what saves the day.
               | 
               | Code is a liability.
        
               | JonChesterfield wrote:
               | The corporate structures that reward people who prove
               | especially good at building the product with more
               | meetings and less time building the product are perhaps
               | not optimal in their deployment of resources.
               | 
               | Maximising the fraction of the product built by people
               | who don't know what they're doing would however explain
               | the emergent properties of modern software.
        
               | dgfitz wrote:
               | Senior engineers can write time-consuming, careful code
               | efficiently. This is why they are seniors.
        
             | stavros wrote:
             | This is laughably false. The highly-paid, experienced
             | seniors produce so much more value than juniors that it's
             | not even in the same ballpark. It's also usually the kind
             | of value that juniors don't even notice, because it's not
             | measured in lines of code.
             | 
             | A good junior will write a hundred lines of code in a day.
             | A good senior will delete a hundred because they realize
             | the user dictated a solution instead of detailing their
             | problem, asked them, and figured out that they can solve
             | that problem by changing a config variable somewhere.
        
             | johnnyanmac wrote:
             | Not a monopoly, but a majority. Many juniors who do have
             | that potential don't ever get put in such a situation.
             | 
             | Junior/senior isn't necessarily about skill level; I'm sure
             | many can find a senior with 1YOE ten times over. It's about
             | trust both in technical and sociopolitical navigation
             | through the job. That's only really gained with time and
             | experience (and yes, isn't perfect. Hence, the
             | aforementioned 1x10 senior. Still "trusted" more than a 1
             | year junior).
        
         | drewg123 wrote:
         | If modulus is expensive, and he's checking a power of 2, why
         | not just use a bitwise AND?
         | 
         | E.g., for positive integers, x % 16 == x & 15. That should be
         | trivially cheap.
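
A minimal sketch of the identity in the comment above, in C. It holds
only when the divisor is a power of two and the operand is unsigned; as
the reply below points out, the article's modulo had a variable divisor,
so this trick would not have applied there. The function names are made
up for illustration.

```c
#include <stdint.h>

/* Remainder of x divided by 16, written with the modulo operator. */
uint16_t mod16_div(uint16_t x) {
    return (uint16_t)(x % 16u);
}

/* Same remainder via a mask: valid because 16 is a power of two
   and x is unsigned. */
uint16_t mod16_mask(uint16_t x) {
    return (uint16_t)(x & 15u);
}

/* Returns 1 if both forms agree for every 16-bit input, else 0. */
int mod16_forms_agree(void) {
    for (uint32_t x = 0; x <= UINT16_MAX; ++x) {
        if (mod16_div((uint16_t)x) != mod16_mask((uint16_t)x)) {
            return 0;
        }
    }
    return 1;
}
```

Calling mod16_forms_agree() checks all 65536 inputs, which is cheap
enough to run as a unit test.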
        
           | ladberg wrote:
           | It wasn't `x % 16` it was `x % y` where x and y are 16-bit
           | integers. A compiler would also have taken care of it if it
           | were just a literal.
        
             | drewg123 wrote:
             | Whoops.. I misread what he was doing.
        
         | beebmam wrote:
         | why is this not considered a compiler optimization and/or
         | language problem? it seems to me that compiler optimizations
         | for expressive programming languages should be able to handle
         | something like this
        
           | JonChesterfield wrote:
           | What would you hope a compiler to optimise x % y into?
           | 
           | Higher level change-the-algorithm aspirations haven't really
           | been met by sufficiently smart compilers yet, with the
           | possible exception of scalar evolution turning loops into
           | direct calculation of the result. E.g. I don't know of any
           | that would turn bubble sort into a more reasonable sort
           | routine.
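
To illustrate the "loops into direct calculation" case mentioned above,
here is a sketch in C of a loop and the closed form an optimizer might
derive from it. Whether a particular compiler actually performs this
rewrite depends on the compiler and the optimization level; the function
names are invented for the example.

```c
#include <stdint.h>

/* Loop form: adds 0 + 1 + ... + (n - 1) one term at a time. */
uint64_t sum_loop(uint32_t n) {
    uint64_t total = 0;
    for (uint32_t i = 0; i < n; ++i) {
        total += i;
    }
    return total;
}

/* Closed form of the same sum: n * (n - 1) / 2, with no loop. */
uint64_t sum_closed(uint32_t n) {
    if (n == 0) {
        return 0;
    }
    return (uint64_t)n * (uint64_t)(n - 1) / 2;
}
```

Both functions return the same value for every n; the second is the
kind of result a loop analysis can produce without the programmer
changing the source.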
        
       | pelagicAustral wrote:
       | Some of the stuff I've struggled with the most over the years
       | has been SQL constraints that are not documented. I remember
       | (probably like 10 years ago), I deployed an update to an ancient
       | Windows Forms implementation that deprecated some login and
       | instead made use of Windows Authentication. It worked like a
       | charm for all users, but one! Checked everything, replicated the
       | machine, tried so much weird stuff, and in the end, what was
       | happening is that the "Users" table had a constraint on the
       | number of characters for the username. This username was over the
       | limit and was not being validated... Another one was a report
       | that was giving the wrong amount, but getting the data from the
       | database seemed to do the math right... it was the damn Money
       | datatype, changed to decimal, done...
        
       | readthenotes1 wrote:
       | "...it was based on a USB product we had already been making for
       | PCs for almost a decade.
       | 
       | This product was so old in fact that nobody knew how to compile
       | the source code. "
       | 
       | I think you mean "Management was so bad, nobody knew how to
       | compile the source code".
       | 
       | There are plenty of systems out there that can and plenty
       | that cannot be reproduced from source. The biggest difference is
       | the care taken to do so, not the age.
        
       | brogrammernot wrote:
       | This exact type of thing is why when I switched to the dark side
       | (product) and sat in management meetings where often non-
       | technical folks would go "we could measure by lines of code or
       | similar" for productivity I often pointed out how that was a bad
       | idea.
       | 
       | Did I win? Of course not, it's hard for non-technical people to
       | fully appreciate these things and any sort of larger
       | infrastructure work, esp. for developer productivity, because it
       | goes back to, well, how are you going to measure that ROI?
       | 
       | Anyways, this was fun to read and brought back good engineering
       | memories. I'd also like to say, as it brought back a bug I chased
       | forever, fuck you ChannelFactory in C#.
        
         | jonathanlydall wrote:
         | I really miss working with WCF, said no one ever.
        
         | neonsunset wrote:
         | Troubleshooting vendor WCF SDK version mismatch was not fun,
         | and the guy who had to reverse engineer it to attempt a .NET
         | Core port probably lost a few years off his lifespan (this was
         | before CoreWCF was a thing).
         | 
         | When people bash gRPC today, they don't know of the horrors of
         | the past.
        
           | brogrammernot wrote:
           | Yeah, I've lived the life of straddling .NET Core and ASP.NET
           | while also dealing with React vs Angular2+ and having half of
           | the system in the script bundling hell that was razor views
           | and all sorts of craziness.
           | 
           | That experience is actually what led me to switch over to
           | Product among other things, I get it when people joke (half
           | joke) about considering retirement rather than going through
           | that again.
        
             | neonsunset wrote:
             | At the time, we had already been using React for front-end
             | widgets so migrating most other parts to the then-latest .NET
             | Core 3.1 went surprisingly smoothly. There were a couple of
             | EF queries that stopped working as EF Core disabled
             | application side evaluation by default, but that was
             | ultimately a good thing as the intention wasn't to pull
             | more data than needed.
             | 
             | Instead, the actual source of problems was K8S and the huge
             | amount of institutional knowledge it required that wasn't
             | there. I still don't think K8S is that good, it's useful
             | but it and containerized environments in general to this
             | day have a lot of rough edges and poorly implemented design
             | aspects - involved runtimes like .NET CLR and OpenJDK end
             | up having to do special handling for them because reporting
             | of core count and available memory is still scuffed while
             | the storage is likely to be a network drive. The latter is
             | not an issue in C# where pretty much all I/O code is non-
             | blocking so there is no impact on application
             | responsiveness, but it still violates many expectations.
             | Aspects of easy horizontal scaling and focus on lean
             | deployments are primarily more useful for worse languages
             | with weaker runtimes that cannot scale as well within a
             | single process.
             | 
             | I suppose, a silver lining to your situation on the other
             | hand is that developers get to have a PO/PM with strong
             | technical background which makes so many communication
             | issues go away.
        
         | Swizec wrote:
         | Have you ever suggested that management/leadership should
         | measure productivity by lines of document text written? They
         | might better grok how that's a bad idea. Especially since many
         | of them much prefer to communicate in bullet-pointed slides
         | than documents.
        
       | pvaldes wrote:
       | _" I also ended up needing to find a Perl script that was buried
       | deep in some university website. I still don't know anything
       | about Perl, but I got it to run"_
       | 
       | Find dusty Perl script forgotten for years. Still works
       | 
       | Not the first time that I hear that
        
         | nikanj wrote:
         | Outside of javascript, it's a pretty reasonable assumption that
         | if you have the sources, you can get them to run
        
       | creeble wrote:
       | Ha, coincidentally, I designed and built an 8051-based MIDI
       | switch in the early 90's. There weren't that many good tools at
       | the time, and I designed everything from the software and UI to
       | the circuit board and rack-mount case.
       | 
       | I even wrote an 8051 assembler in C, but found a good tiny-C
       | compiler for it before it went into production.
       | 
       | You are not a programmer unless you've written key-debounce code
       | :)
       | 
       | (OTOH, some of the worst programmers I've ever had the
       | displeasure of working with were amazing low-level code hackers.
       | In olden times, it seems like you were either good at that level
       | of abstraction, or you were good at a much different ["higher"]
       | level, seldom both.)
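
For readers who have not written one, a key-debounce routine can be
sketched in C roughly as follows. This is one common counter-based
approach, not the code from the MIDI switch described above; the type,
function name, and the threshold of 4 samples are all assumptions chosen
for the example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed threshold: consecutive differing samples needed to accept
   a change. Real values depend on the switch and the sample rate. */
#define DEBOUNCE_SAMPLES 4u

typedef struct {
    bool stable;    /* last accepted (debounced) key state */
    uint8_t count;  /* consecutive samples that disagree with `stable` */
} debounce_t;

/* Feed one raw sample, e.g. from a periodic timer tick.
   Returns the debounced state after taking this sample into account. */
bool debounce_update(debounce_t *d, bool raw) {
    if (raw == d->stable) {
        d->count = 0;       /* nothing changed, or a bounce ended */
    } else if (++d->count >= DEBOUNCE_SAMPLES) {
        d->stable = raw;    /* the change persisted long enough */
        d->count = 0;
    }
    return d->stable;
}
```

A single stray sample resets the counter, so a contact that bounces for
a few ticks never flips the reported state.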
        
       | winrid wrote:
       | Reminds me of fixing an ~11yr old bug in Enemy Territory. I had
       | to spend a night debugging the C code only to realize the issue
       | was in the UI config:
       | https://github.com/etlegacy/etlegacy-deprecated/pull/100/fil...
       | 
       | (IIRC UI scrolled twice for every mouse movement + you couldn't
       | select items in server browser with mouse wheel as it would skip
       | every other one)
        
         | lostlogin wrote:
         | That was such a great game but sadly it seemed to fizzle out.
         | There were lots of neat exploits which made it even better. I
         | also liked the communication style, with pre-canned messages you
         | could give with certain key combos.
        
           | winrid wrote:
           | There's usually a full server or two. ETLegacy has plenty of
           | players for me.
        
           | Terr_ wrote:
           | In a similar vein, the voice tree from Starsiege: Tribes
           | (1998) was mind-blowing for the dialup era.
           | 
           | Ex: VIAB -> "I am attacking the enemy base!"
        
       | magwa101 wrote:
       | Similarly, I spent 6 weeks on a kernel token-ring driver
       | intermittent initialization issue. This required kernel restarts
       | over and over to observe the issue. Breakpoints were useless as
       | they hid the issue. Turns out initialization in a specific step
       | was not synchronous and reading the status was a race condition.
       | It took weeks of staring, joking around, thinking, bs'ing, then
       | suddenly, voila. Changed the order of the code, worked.
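
The bug class described above (reading a status before an asynchronous
initialization step has finished) can be illustrated with a small C
simulation. The comment's actual fix was a reordering that is not shown
in the thread; the sketch below instead uses bounded polling, and
fake_adapter_t and every function name in it are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical stand-in for an adapter whose initialization finishes
   some number of status reads after it is started. */
typedef struct {
    int polls_until_ready;
    bool started;
} fake_adapter_t;

void adapter_start_init(fake_adapter_t *a) {
    a->started = true;
}

/* Each call models one read of the hardware status register. */
bool adapter_ready(fake_adapter_t *a) {
    if (!a->started) {
        return false;
    }
    if (a->polls_until_ready > 0) {
        a->polls_until_ready--;
        return false;
    }
    return true;
}

/* Racy: treats init as synchronous and reads the status exactly once.
   Succeeds only when the device happens to be ready immediately. */
bool init_racy(fake_adapter_t *a) {
    adapter_start_init(a);
    return adapter_ready(a);
}

/* Safer: polls until ready, giving up after max_polls reads. */
bool init_wait(fake_adapter_t *a, int max_polls) {
    adapter_start_init(a);
    for (int i = 0; i < max_polls; ++i) {
        if (adapter_ready(a)) {
            return true;
        }
    }
    return false;
}
```

With polls_until_ready set to 0 the racy version works, which matches
the intermittent behavior in the story: the same code passes or fails
depending on timing. It also suggests why breakpoints hid the issue,
since pausing gives the device time to finish.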
        
       | rented_mule wrote:
       | The worst I experienced in this direction was also on a consumer
       | device about 15 years ago. Performance was degraded and we
       | couldn't explain it. A team of 5 of us was assembled to figure it
       | out.
       | 
       | We spent over three months on it before finding a root cause. It
       | was over two months before we could even understand how to
       | measure it - we were seeing parts of the automated overnight test
       | suite run taking longer, but every night it would be different
       | tests that were slow. A key finding was that almost everything
       | was slow on some boots of the device and fast on other boots of
       | the device, and there was a reboot before each test was run.
       | Doing some manual testing showed it being close to a 50% chance
       | of a boot leading to slowness. Now what?
       | 
       | I eventually got frustrated and took the brute force / mindless
       | approach... binary search over commits. Unfortunately, that
       | wasn't easy because our build was 45-60 minutes, and then there
       | was a heavily manual installation process that took 10-20
       | minutes, followed by several reboots to see if anything was slow.
       | And there were several thousand commits since the last known good
       | build (the previously shipped version of the device). The
       | build/install/testing process was not easily automated, and we
       | were not on git, otherwise using git-bisect would have been nice.
       | Instead, I spent weeks doing the binary search manually.
       | 
       | That yielded the offending commit. The problem was that it was a
       | massive commit (tens of thousands of lines of code) from a group
       | in another part of the company. It was a snapshot of all of their
       | development over the course of a couple of years. The commit
       | message, and the authors, stated that the commit was a no-op with
       | everything behind a disabled feature flag.
       | 
       | So now it was on to code-level binary search. Keep deleting about
       | half of the code in the commit, in this case by chunks that are
       | intended to be inactive. After eventually deleting all the
       | inactive code, there were still a few dozen lines of changes in a
       | Linux subsystem that did window compositing. Those lines of code
       | were all quite interdependent, so it was hard to delete much and
       | keep things functional, so now on to walking through code. At
       | least I could use my brain again!
       | 
       | Using the clue that the problem was happening about half the time
       | and given that this code was in C, I started looking for
       | uninitialized booleans. Sure enough, there was one called
       | something like `enable_transparency`. Disabled code was setting
       | it to `true`, but nothing was setting it to `false` when their
       | system was disabled. Before their commit, there was no variable -
       | `false` was being passed into the initializer call directly.
       | Adding `= false` to the declaration was the fix.
       | 
       | So, well over a year of engineering hours spent to figure out the
       | issue. The upside is that some people on the team didn't know how
       | to proceed, so they spent their time speeding up random things
       | that were slow. So the device ended up being noticeably faster
       | when we were done. But it was pretty stressful as we were closing
       | in on our launch date with little visibility into whether we'd
       | figure it out or not.
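
The manual binary search over commits described above follows the same
logic that git-bisect automates. A sketch in C, assuming commits are
numbered in order and that once the problem appears it stays in every
later commit; first_bad_commit, is_bad_fn, and bad_from are names
invented for this example.

```c
#include <stddef.h>

/* Caller-supplied check: returns nonzero if the build at `commit`
   shows the problem. In the story this was a manual
   build/install/reboot cycle, not a function call. */
typedef int (*is_bad_fn)(size_t commit, void *ctx);

/* Preconditions: `good` is known good, `bad` is known bad, good < bad,
   and every commit after the first bad one is also bad.
   Returns the index of the first bad commit. */
size_t first_bad_commit(size_t good, size_t bad,
                        is_bad_fn is_bad, void *ctx) {
    while (bad - good > 1) {
        size_t mid = good + (bad - good) / 2;
        if (is_bad(mid, ctx)) {
            bad = mid;   /* problem already present at mid */
        } else {
            good = mid;  /* problem introduced after mid */
        }
    }
    return bad;
}

/* Demonstration predicate: ctx points at the index of the commit
   that introduced the problem. */
int bad_from(size_t commit, void *ctx) {
    return commit >= *(const size_t *)ctx;
}
```

Because the slowdown only showed up on roughly half of the boots, a
single fast boot could not mark a commit as good. If boots are
independent, k fast boots in a row on a bad commit still happen with
probability 1/2^k, which is why each step needed several reboots.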
        
         | hoten wrote:
         | Oh man, that sounds rough. I salute you.
         | 
         | This probably wasn't an option back then with your toolchain,
         | but it's so reassuring to know modern compilers / ASAN are
         | amazing at catching this class of bugs today.
        
           | leni536 wrote:
           | AFAIK ASAN does not catch uninitialized variables, MSAN does.
           | MSAN is significantly harder to set up.
        
             | JonChesterfield wrote:
             | Branch on uninit lights up beautifully in valgrind which
             | has no set up, just run valgrind ./a.out
        
               | leni536 wrote:
               | Good point, although sometimes valgrind is too slow.
        
         | namrog84 wrote:
         | C++ senior dev here. Of the few teams I've been on, one of the
         | first things I make sure is set up right is cranking up
         | warnings, and warnings as errors, which include things like
         | uninitialized variables. I then fix up the errors and make sure
         | they are part of build gates.
         | 
         | These types of problems (undefined behavior and/or
         | uninitialized variables) are often hard (time-consuming) to
         | diagnose and fairly common.
         | 
         | Lots of places overlook simple static analysis or built-in
         | compiler features.
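
As a concrete picture of the uninitialized-boolean bug from the parent
comment, here is a sketch in C of the fixed shape. The variable name
enable_transparency comes from that comment; compositor_t,
compositor_init, and setup are hypothetical stand-ins. GCC and Clang
accept -Wall -Wextra -Werror, but whether an uninitialized-use warning
fires for a given piece of code depends on the compiler and the
optimization level, so this is not a guarantee that the original bug
would have been flagged.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the window compositor's state. */
typedef struct {
    bool transparency;
} compositor_t;

/* Hypothetical stand-in for the initializer call in the story. */
void compositor_init(compositor_t *c, bool enable_transparency) {
    c->transparency = enable_transparency;
}

/* feature_enabled models the disabled feature flag. */
void setup(compositor_t *c, bool feature_enabled) {
    /* The fix: an explicit initial value. Without "= false" the
       variable is indeterminate whenever feature_enabled is false. */
    bool enable_transparency = false;
    if (feature_enabled) {
        enable_transparency = true;
    }
    compositor_init(c, enable_transparency);
}
```

With the initializer in place, the disabled path always passes false,
which is what the code did before the variable was introduced.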
        
       | halifaxbeard wrote:
       | Reminds me of a bug I fixed in yamux, simply because of how long
       | I've had to deal with it. Bug existed for as long as yamux did.
       | (yamux is used by hashicorp for stream muxing _everywhere_ in
       | their products.)
       | 
       | If yamux's keepalive fails/times out, and you're calling Read on
       | a demuxed stream, it blocks forever.
       | 
       | https://github.com/hashicorp/yamux/pull/127
        
       | tommiegannert wrote:
       | Kudos also to the original author for not doing premature
       | optimization, of course. It wasn't until the iPad that it was
       | needed. However, a TODO might have been useful. ;)
        
         | tedunangst wrote:
         | Only for users that didn't use both features at the same time.
         | Users who did probably experienced the same bug, but it took
         | until a critical mass of users reported the bug to get it
         | fixed. At which point the fix probably took four times longer
         | than necessary because the developer was unfamiliar with the
         | design and the toolchain had decayed.
        
       | omoikane wrote:
       | > given a fixed denominator, any 16-bit modulo can be rewritten
       | as three 8-bit modulos
       | 
       | Anybody know what's the exact transformation here? I searched
       | around and found this answer, but it doesn't work:
       | 
       | https://stackoverflow.com/a/10441333
        
         | o11c wrote:
         | If the denominator is a constant, wouldn't it be faster to use
         | the divmod identity to turn it into (divide, multiply,
         | subtract), then use the usual constant-divide-is-multiply-and-
         | shift optimization?
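
A sketch in C of the approach suggested above: replace the division by
a multiply and shift, then recover the remainder with the divmod
identity x % d == x - (x / d) * d. The divisor 10 is an arbitrary choice
for illustration, since the thread does not say what the article's
denominator was, and whether this beats a modulo on an 8-bit part such
as the 8051 would need measuring.

```c
#include <stdint.h>

/* x / 10 for 16-bit x without a divide:
   multiply by ceil(2^19 / 10) = 52429, then shift right by 19.
   The product fits in 32 bits for every 16-bit x. */
uint16_t div10(uint16_t x) {
    return (uint16_t)(((uint32_t)x * 52429u) >> 19);
}

/* Divmod identity: x % 10 == x - (x / 10) * 10. */
uint16_t mod10(uint16_t x) {
    return (uint16_t)(x - div10(x) * 10u);
}

/* Returns 1 if div10 and mod10 match the built-in operators
   for every 16-bit input, else 0. */
int mod10_matches_operator(void) {
    for (uint32_t x = 0; x <= UINT16_MAX; ++x) {
        uint16_t v = (uint16_t)x;
        if (div10(v) != v / 10u || mod10(v) != v % 10u) {
            return 0;
        }
    }
    return 1;
}
```

The exhaustive check in mod10_matches_operator() is what justifies the
magic constant here; a different divisor needs its own multiplier and
shift, verified the same way.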
        
       | figassis wrote:
       | The number of times I bumped my head against a desk, after
       | missing multiple deadlines and then out of nowhere having a
       | random moment of clarity such as "this gives me X vibes, but it
       | would be insane if this was actually the case", and then I do a
       | quick string search and there it is.
        
       ___________________________________________________________________
       (page generated 2024-06-21 23:01 UTC)