[HN Gopher] Try to fix it one level deeper
___________________________________________________________________
Try to fix it one level deeper
Author : Smaug123
Score : 98 points
Date : 2024-10-15 21:05 UTC (1 day ago)
(HTM) web link (matklad.github.io)
(TXT) w3m dump (matklad.github.io)
| andai wrote:
| I was reading about NASA's software engineering practices.
|
| When they find a bug, they don't just fix the bug, they fix the
| engineering process that allowed the bug to occur in the first
| place.
| anotherhue wrote:
| Maintenance is never as rewarded as new features; there's
| probably some MBA logic behind it to do with avoiding
| commoditisation.
|
| It's true in software, it's true in physical infrastructure
| (read about the sorry state of most dams).
|
| Until we root-cause that process, I don't see much progress
| coming from this direction. On the plus side, CS principles
| are making their way into compilers. We're a long way from C.
| giantg2 wrote:
| "Maintenance is never as rewarded as new features,"
|
| And security work is rewarded even less!
| riknos314 wrote:
| > And security work is rewarded even less
|
| While I do recognize that this is a pervasive problem, it
| seems counter-intuitive to me based on the tendency of the
| human brain to be risk averse.
|
| It raises an interesting question of "why doesn't the risk
| of security breaches trigger the emotions associated with
| risk in those making the decision of how much to invest in
| security?".
|
| Downstream of that is likely "Can we communicate the
| security risk story in a way that more appropriately
| triggers the associated risk emotions?"
| giantg2 wrote:
| The people making the decision don't face a direct
| negative impact. Someone's head might roll, but that's
| usually far up the chain, where the comp and connections
| are high enough not to care. The POs making the day-to-day
| decisions are under more pressure for new features
| than they are for security.
| SAI_Peregrinus wrote:
| What is the consequence for security breaches? Usually
| some negative press everyone forgets in a week. Maybe a
| lost sale or two, but that's hard to measure. If you're
| exceedingly unlucky, an inconsequential fine. At worst
| paying for two years of credit monitoring for your users.
|
| What's the risk? The stock price will be back up by next
| week.
| amonon wrote:
| It's easier to think of people as tending toward the
| conservative rather than the risk-averse. If we were truly
| risk averse, society would be very different.
| hedvig23 wrote:
| Speaking of digging deeper, can you expand on that theory of
| why focus/man-hours spent on maintenance lead to
| commoditization, and why a company wants to avoid that?
| anotherhue wrote:
| Off the top of my head: new things have unbounded potential,
| existing ones have known potential. We assume the new will
| be better.
|
| I think it's part of the reason stocks almost always dip
| after positive earnings reports. No matter how positive
| it's always less than idealised.
|
| You might think there's a trick where you can sell
| maintenance as a new thing but you've just invented the
| unnecessary rewrite.
|
| To answer your question more directly: once something has
| been achieved, it's safe to assume someone else can achieve
| it too, so the focus turns to the new thing. Why else
| would we develop hydrogen or neutron bombs when we already
| had perfectly good fission ones? (They got commoditised.)
| asdff wrote:
| Given enough iteration with the same incentives, two
| engineering teams might end up with the same sort of
| product overall. We see this with airframes. We established
| the prototypical airframe for the commercial airliner in
| the 1950s and haven't changed it in the 70 years since. This
| is good for the airline but bad for the aircraft
| manufacturer. The airline can now choose between Boeing or
| Airbus or anyone else for their product. If Boeing had some
| novel plane design that wasn't copied the world over, then
| the airline company would be beholden to them alone.
| xelxebar wrote:
| This is such a powerful frame of mind. Bugs, software
| architecture, tooling choices, _etc._ all happen within
| organizational, social, political, and market machinery. A bug
| isn't just a technical failure, but a potential issue with the
| meta-structures in which the software is embedded.
|
| Code review is one example of addressing the engineering
| process, but I also find it very helpful to consider business
| and political processes as well. Granted, NASA's concerns are
| very different from those of most companies, but as engineers
| and consultants, we have leeway to choose where and how to
| address bugs, beyond just the technical and immediate dev
| habits.
|
| Soft skills matter hard.
| asdff wrote:
| It makes you wonder if there's been work designing software
| that is resilient to bugs. Maybe you can test this by writing
| a given function in a variety of different ways, simulate
| some type of bug (fat fingering is probably easiest), and
| compare outputs. Some of these functions might not work at
| all. Some might spit out the wrong result. But then there
| will probably be a few that are written in such a way to get
| very close to the true result, and maybe that variance is
| acceptable for your purposes. Given how we currently write
| code (in English, in a way a human can read it), maybe it's not
| so realistic. But if we get to the point with our generative
| code where you can generate good quality machine code without
| having it transmuted to human readable code for human
| verification, then this is how we would be operating: looking
| at distributions of results from a billion putative
| functions.
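|
| A minimal sketch of the idea (in Rust; the function names and
| the hand-injected fault standing in for the fat finger are
| invented):
|
|     // Run several variants of the same function, one with an
|     // injected fault, and compare outputs against a reference.
|     fn mean_v1(xs: &[f64]) -> f64 {
|         xs.iter().sum::<f64>() / xs.len() as f64
|     }
|
|     fn mean_v2(xs: &[f64]) -> f64 {
|         let mut total = 0.0;
|         for &x in xs {
|             total += x;
|         }
|         total / xs.len() as f64
|     }
|
|     // Simulated fat finger: off-by-one in the denominator.
|     fn mean_buggy(xs: &[f64]) -> f64 {
|         xs.iter().sum::<f64>() / (xs.len() - 1) as f64
|     }
|
|     fn main() {
|         let data = [1.0, 2.0, 3.0, 4.0];
|         let reference = mean_v1(&data);
|         let variants: [(&str, fn(&[f64]) -> f64); 3] =
|             [("v1", mean_v1), ("v2", mean_v2),
|              ("buggy", mean_buggy)];
|         for (name, f) in variants {
|             let r = f(&data);
|             println!("{name}: {r} (error {})",
|                      (r - reference).abs());
|         }
|     }
|
| On large inputs the buggy variant lands close to the true
| mean, which is exactly the "acceptable variance" case
| described above.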
| toolz wrote:
| To that example though, is NASA really the pinnacle of
| achievement in their field? Sure, it's not a very competitive
| field (e.g. compared to something like the restaurant industry)
| and most of their existence has been about R&D for tech there
| wasn't really a market for yet. But still, SpaceX comes along
| and, in a fraction of the time, is landing and reusing
| rockets, making space launches more attainable and
| significantly cheaper.
|
| I'm hoping that example holds up, but I'm not well versed in
| that area so it may be a terrible counter-example. My
| overarching point is this: overly engineered code often
| produces less value than quickly executed code. We're not in
| the business of making computers do things artfully just for
| the beauty of the rigor and correctness of our systems. We're
| doing it to make computers do useful things for humanity.
|
| You may think that spending an extra year perfecting a
| pacemaker might end up saving lives, but what if more people
| die in the year before you go to market than would've ended up
| dying had you launched with something almost perfect, but with
| potential defects?
|
| Time is expensive in so many more ways than just capital spent.
| the_other wrote:
| SpaceX came along decades after NASA's most famous projects.
| Would SpaceX have been able to do what they did if NASA
| hadn't engineered to their standard earlier on?
|
| My argument (and I'm just thought-experimenting here) is that
| without NASA's rigor, their programmes would have failed.
| Public support, and thus the market for space projects, would
| have dried up before SpaceX was able to "do it faster".
|
| (Feel free to shoot this down: I wasn't there and I haven't
| read any deep histories of the companies. I'm just
| brainstorming to explore the problem space.)
| sfn42 wrote:
| The fallacy here is that you're assuming that doing things
| right takes more time.
|
| Doing things right takes less time in my experience. You
| spend a little more time up front to figure out the right way
| to do something, and a lot of the time that investment pays
| dividends. The alternative is to just choose the quickest fix
| every time until eventually your code is so riddled with
| quick fixes that nobody knows how it works and it's
| impossible to get anything done.
| dsego wrote:
| It's tough to sell leaders and managers on the idea that
| there could be more benefit in quality and stability, at the
| cost of cutting scope and losing a few oh-so-indispensable
| features. But their incentive is to dream up imaginative
| OKRs and come up with deadlines to show visible progress
| and justify their roles until the next quarter.
| sendfoods wrote:
| Which blog/post/book was this? Thanks
| brody_hamer wrote:
| I learned a similar mantra that I keep returning to: "there's
| never just one problem."
|
| - How did this bug make it to production? Where's the missing
| unit test (sketched below)? Code review?
|
| - Could the error have been handled automatically? Or more
| gracefully?
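|
| To make the first point concrete, a hypothetical example of
| the "missing unit test", written after the fix so the bug
| can't silently return (`parse_port` is an invented example):
|
|     fn parse_port(s: &str) -> Option<u16> {
|         // The fix: inputs arrived whitespace-padded
|         // (" 8080\n") and used to fail; trim before parsing.
|         s.trim().parse().ok()
|     }
|
|     #[cfg(test)]
|     mod tests {
|         use super::*;
|
|         // The regression test that was missing when the bug
|         // shipped.
|         #[test]
|         fn accepts_whitespace_padded_input() {
|             assert_eq!(parse_port(" 8080\n"), Some(8080));
|         }
|     }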
| niccl wrote:
| In the course of interviewing a bunch of developers, and
| employing a few of them, I've concluded that this
| ability/inclination/something to do this deeper digging is one of
| the things I prize most in a developer. They have to know when to
| go deep and when not to, though, and that's sometimes a hard
| balancing act.
|
| I've never found a good way of screening for the ability, and
| even more so for knowing when not to go deep, because everyone
| will come up with some example if you ask, and it's not the
| sort of thing that I
| can see highlighting in a coding test (and _certainly_ not in a
| leet-code test!). If anyone has any suggestions on how to uncover
| it during the hiring process I'd be ecstatic!
| giantg2 wrote:
| "I've concluded that this ability/inclination/something to do
| this deeper digging is one of the things I prize most in a
| developer."
|
| Where have you been all my life? It seems most of the teams
| I've been on value speed over future-proofing bugs. The
| systems-thinking approach is rare.
|
| If you want to test for this, you can create a PR for a fake
| project. Make sure the project runs but has errors, code
| smells, etc. Include a few things like they talk about in the
| article, such as an "out of disk space" message but missing
| critical message/logging infrastructure to cover other
| scenarios. The best part is, you can use the same PR for all
| levels that you're hiring for by expecting seniors to catch X%
| of the bugs, mids X/2%, and noobs X/4%.
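|
| A hypothetical seed for such a PR (all names invented),
| echoing the article's disk-space example:
|
|     use std::fs;
|     use std::io::Write;
|
|     fn save_report(path: &str, data: &[u8]) {
|         match fs::File::create(path)
|             .and_then(|mut f| f.write_all(data))
|         {
|             Ok(()) => {}
|             // Seeded smell: every failure -- permissions,
|             // missing directory, or a genuinely full disk --
|             // is collapsed into one misleading message, and
|             // the real error `_e` is never logged. Candidates
|             // should flag both problems.
|             Err(_e) => eprintln!("error: out of disk space"),
|         }
|     }
|
|     fn main() {
|         save_report("/nonexistent/dir/report.txt", b"hello");
|     }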
| niccl wrote:
| That's a really good idea. Thanks
| jerf wrote:
| "It seems most of teams I've been on value speed over future
| proofing bugs."
|
| So, obviously, if one team is future-proofing bugs, and the
| other team just blasts out localized short-term fixes as
| quickly as possible, there will come a point where the first
| team will overtake the second, because the second team's
| velocity will by necessity have to slow down more than the
| first's as the code base grows.
|
| If the crossover point is ten years hence, then it only makes
| sense to be the second team.
|
| However, what I find a bit horrifying as a developer is that
| my estimate of the crossover point keeps moving closer. When
| I'm working by myself on greenfield code, I'd put it at about
| _three weeks_; yes, I'll go somewhat faster today if I just
| blast out code and skip the unit tests, but it's only _weeks_
| before I'm getting bitten by that. Bigger teams may have a
| somewhat farther crossover point, but it's still likely to
| be small single-digit months.
|
| There is of course overdoing it and being too perfectionist,
| and that does get some people. But the people, teams,
| managers, and companies who _always_ vote for short-term
| code blasting simply have no idea how much performance they
| are leaving on the table almost immediately.
|
| Established code bases are slower to turn, naturally. But
| even so, I still think the constant short-term focus is
| vastly more expensive than those who choose it understand.
| And I don't even mean obvious stuff like "oh, you'll have
| more bugs" or "oh, it's so much harder to onboard", even if
| that's true... no, I mean, even by _the only metric you seem
| to care about_, the team that takes the time to fix
| fundamental issues and invests in better logging and metrics
| and all those things you think just slow you down can _also_
| smoke you on dev speed after a couple of months... _and_
| they'll have the solid code base, too!
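|
| As a toy model of that crossover (all constants invented):
| give the careful team a steady velocity and the code-blasting
| team a faster start that decays as debt accumulates, and see
| when the totals cross:
|
|     fn main() {
|         let (mut careful, mut blaster) = (0.0_f64, 0.0_f64);
|         for week in 1..=52 {
|             careful += 8.0; // steady velocity
|             blaster += 10.0 * 0.95_f64.powi(week); // decaying
|             if careful > blaster {
|                 println!("crossover at week {week}");
|                 return;
|             }
|         }
|     }
|
| With these made-up numbers the crossover lands at week 8,
| i.e. small single-digit months, as above.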
|
| "Make sure the project runs but has error, code smells, etc."
|
| It is a hard problem to construct a test for this but it
| would be interesting to provide the candidate some code that
| compiles with warnings and just watch them react to the
| warnings. You may not learn everything you need but it'll
| certainly teach you something.
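|
| For instance (a made-up seed), something like this compiles
| fine but emits two warnings, and watching whether the
| candidate reads them is the whole exercise:
|
|     // rustc warns about both issues below; neither is an
|     // error, so the build "succeeds".
|     fn classify(n: i32) -> &'static str {
|         let label = "unused"; // warning: unused variable
|         match n {
|             0 => "zero",
|             _ => "nonzero",
|             1 => "one", // warning: unreachable pattern
|         }
|     }
|
|     fn main() {
|         println!("{}", classify(2));
|     }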
| daelon wrote:
| Slow is smooth, smooth is fast.
| ozim wrote:
| Unfortunately, I believe there is no crossover point, even at
| 10 years out.
|
| If a quick fix works, it is most likely a proper fix; if it
| doesn't work, then you dig deeper. There is also the question
| of whether the feature being fixed is even worth spending so
| much time on.
| rocqua wrote:
| A quick fix works now. It makes the next fix or change
| much harder because it just added a special case, or
| ignored an edge case that wasn't possible in the
| configuration at that time.
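|
| A sketch of that pattern (domain and numbers invented): the
| quick fix pins down the one reported case instead of the rule
| behind it, so the next report lands in the same place:
|
|     fn shipping_cost(weight_kg: f64, country: &str) -> f64 {
|         // Quick fix for "Norway is overcharged":
|         // special-case it.
|         if country == "NO" {
|             return 9.0;
|         }
|         // Hypothetical root cause: the discounted zone table
|         // is missing the EEA countries; fixing the table
|         // would cover NO, IS, and LI at once instead of one
|         // patch per bug report.
|         12.0 + weight_kg * 0.5
|     }
|
|     fn main() {
|         // Iceland hits the same bug the quick fix ignored.
|         println!("{}", shipping_cost(2.0, "IS"));
|     }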
| ozim wrote:
| My main point is that that's a false dichotomy.
|
| There is a bunch of stuff that could be "fixed better" or
| "properly" if someone took a closer look, but a lot of the
| time it is just good enough and is not somehow magically
| impeding a proper fix.
| jerf wrote:
| It is and it isn't a false dichotomy.
|
| It is a false dichotomy in that, if you read "X -> Y" in the
| Aristotelian sense -- absolutely, positively every X must with
| 100% probability lead to Y -- then it is absolutely true that
| "this is a quick fix -> this is not the best fix" is false.
| Sometimes the quick fix is correct. A quick example: I'm doing
| some math of some sort and literally typed minus instead of
| plus. The quick fix to change minus to plus is reasonable.
|
| (If you're wondering about testing, well, let's say I
| wrote unit tests to assert the wrong code. I've written
| plenty of unit tests that turn out to be asserting the
| wrong thing. So the quick fix may involve fixing those
| too.)
|
| It is true in the sense that if you plot the quickness of
| the fix versus the correctness of the fix, you're not
| going to get a perfectly uniformly random two dimensional
| graph that would indicate they are uncorrelated. You'll
| get some sort of Pareto-optimal[1] front that will
| develop, becoming more pronounced as the problem and
| minimum size fix become larger (and they can get pretty
| large in programming). It'll be a bit loose; you'll get
| occasional outliers where you have otherwise fantastic
| code that just happened to have this tiny screw loose
| that caused a lot of problems everywhere and one quick
| fix can fix a lot of issues at once; I think a lot of us
| will see those once or twice a decade or so, but for the
| most part, there will develop a definite trend that once
| you eliminate all the fixes that are neither terribly
| fast nor terribly good for the long term, there will
| develop a fairly normal "looks like 1/x" curve of
| tradeoffs between speed and long-term value.
|
| This is a very common pattern across many combinations of
| X and Y that don't literally, 100% oppose each other, but
| in the real world, with many complicated interrelated
| factors interacting with each other and many different
| distributions of effort and value interacting, do
| contradict each other... but _only if_ you are actually
| on the Pareto frontier! For practical purposes in this
| case I think we usually are, at least relative to the
| local developers fixing the bug; nobody deliberately sets
| out to make a fix that is visibly obviously harder than
| it needs to be _and_ less long-term valuable than it
| needs to be.
|
| My favorite "false dichotomy" that arises is the supposed
| contradiction between security and usability. It's true
| they oppose each other... but only if your program is
| already roughly optimally usable and secure on the Pareto
| frontier and now you really can't improve one without
| diminishing the other. Most programs aren't actually
| there, and thus there _are_ both usability and security
| improvements that can be made without affecting the
| other.
|
| I'm posting this because this is one of those things that
| sounds really academic and abstruse and irrelevant, but
| if you learn to see it, becomes very practical and
| powerful for your own engineering.
|
| [1]: https://en.wikipedia.org/wiki/Pareto_front
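|
| For the curious, that front is cheap to compute (a sketch
| with invented fixes and scores): a fix is dropped if some
| other fix is at least as good on both axes and strictly
| better on one:
|
|     fn pareto_front(fixes: &[(&str, f64, f64)]) -> Vec<&str> {
|         // Keep fixes not dominated on both (speed, value)
|         // axes by any other fix.
|         fixes
|             .iter()
|             .filter(|(_, s, v)| {
|                 !fixes.iter().any(|(_, s2, v2)| {
|                     s2 >= s && v2 >= v && (s2 > s || v2 > v)
|                 })
|             })
|             .map(|(name, _, _)| *name)
|             .collect()
|     }
|
|     fn main() {
|         let fixes = [
|             ("one-line patch", 0.9, 0.2),
|             ("deep refactor", 0.3, 0.9),
|             ("slow and shallow", 0.2, 0.1), // dominated
|         ];
|         println!("{:?}", pareto_front(&fixes));
|     }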
| marcosdumay wrote:
| My impression is that bigger teams have a shorter crossover
| point.
|
| Weirdly, teams seem to adapt better to bad code. But that
| adaptation occurs through meetings. And meetings just
| destroy a team's productivity.
| ozim wrote:
| I have seen enough BSers who claimed that they needed to "do
| the proper fix", doing analysis and wasting everyone's time.
|
| They would be vocal about it and then spend weeks delivering
| nothing, "tweaking db indexes", while I could immediately see
| the code was crap and needed slight changes. But I also don't
| have time to fight all the fights in the company.
| giantg2 wrote:
| That's the thing, my comment wasn't about that long
| analysis or doing the proper fix. It's all about asking
| whether this is the root cause or not, or whether there is a
| similar related bug not yet identified. You could find a root
| cause and bring it back to the team if it's going to take
| weeks. At that point the team has the say on whether that fix
| is necessary.
| bongodongobob wrote:
| Knowing when to go down the rabbit hole is probably more about
| experience/age than anything. I work with a very intelligent
| junior who is _constantly_ going down rabbit holes. His heart
| is in the right spot, but sometimes you just need to make
| things work/get things done.
|
| I used to do it a lot too and I kind of had a "shit, I'm
| getting old" moment the other day when I was telling him
| something along the lines of "yeah, we could probably fix that
| deeper but it's going to take 6 weeks of meetings and 3
| departments to approve this. Is that really what you want to
| spend your time on?"
|
| Like you said, it's definitely a balancing act and the older I
| get, the less I care about "doing things the right way" when no
| one actually cares or will know.
|
| I get paid to knock out tickets, so that's what I'm going to
| do. I'll let the juniors spin their wheels and burn mental CPU
| on the deep dives and I'm around to lend a hand when they need
| it.
| layer8 wrote:
| However, you have to overdo it a sufficient number of times
| when you're still inexperienced, in order to gain the
| experience of when it's worth it and when it's not. You have
| to make mistakes in order to learn from them.
| giantg2 wrote:
| When it's worth it and when it's not seems to be more of a
| business question for the product owner. It's all opinion.
|
| I've been on a team where I had 2 weeks left and they didn't
| want me working on anything high priority during that time,
| so it wouldn't be half finished when I left. I had a couple
| small stories I was assigned. Then I decided to cherry-pick
| the backlog to see how much tech debt I could close for the
| team before I left. I cleared something like 11 stories out
| of 100. I was then chewed out by the product owner because
| she "would have assigned [me] other higher priority
| stories". But the whole point was that I wasn't supposed to
| be on high-priority tasks because I was leaving...
| seadan83 wrote:
| Why the product owner? (Rather than, say, the team lead?)
|
| Are these deeply technical product owners? Which ones would
| be best placed to make this decision, and which less so?
| giantg2 wrote:
| In a non-technical company with IT being a cost center,
| it seems that the product owner gets the final say. My TL
| supported me, but the PO was still upset.
| layer8 wrote:
| The product owner often isn't technical enough, or into
| the technical weeds enough, to be able to assess how long
| it might take. You need the technical experience to have
| a feeling for the effort/risk/benefit profile. You also
| may have to start going down the hole to assess the
| situation in the first place.
|
| The product owner can decide how much time would be worth
| it given a probable timeline, risks and benefits, but the
| experienced developer is needed to provide that input
| information. The developer has to present the case to the
| product owner, who can then make the decision about if,
| when, and how to proceed. Or, if the developer has
| sufficient slack and leeway, they can make the decision
| themselves within the latitude they've been given.
| giantg2 wrote:
| Yeah. The team agreed I should just do the two stories,
| which was what was committed to in that sprint. I got
| that done and then ripped through those other 11 stories
| in the slack time before I left the team. My TL supported
| that I didn't do anything wrong in picking up the
| stories. The PO still didn't like it.
| rocqua wrote:
| Regardless, these deep dives are so valuable in teaching
| yourself that they can be worth it just for that.
| userbinator wrote:
| Have you been asked "why do we never have the time to do it
| right, but always time to do it twice?"
| sqeaky wrote:
| His response is likely something like "I am an hourly
| contractor, I have however much time they want", or
| something with the same no-longer-gives-a-shit energy.
|
| But their manager likely believes that deeper fixes aren't
| possible or useful for some shortsighted bean-counter
| reason. Not that bean counting isn't important, but the
| beans are often counted early and wrong.
| bongodongobob wrote:
| Yeah, don't get me wrong, I'm not saying "don't care about
| anything and do a shitty job", but sometimes the extra
| effort just isn't worth it. I'm a perfectionist at heart,
| but I have to weigh the cost of meeting my manager's
| goals against getting behind because I want it to be perfect.
| Then 6 months later my perfect thing gets hacked apart by
| a new request/change. Knowing when and where to go deeper
| and when to polish things is a learned skill, and has more
| to do with politics and the internal workings of your
| company than with some ideal. Everything is in constant
| flux, and having insight into smart deep dives isn't some
| black-and-white general issue. It's completely context
| dependent.
| thelostdragon wrote:
| _" yeah, we could probably fix that deeper but it's going to
| take 6 weeks of meetings and 3 departments to approve this.
| Is that really what you want to spend your time on?"_
|
| This is where a developer goes from junior to senior.
| atoav wrote:
| Such qualities can sometimes be unearthed when you ask
| candidates to deal with a problem they can't know the answer
| to. In the end, the ability to go deep has a lot to do with
| their confidence in being able to understand things that are
| new to them.
|
| Most people can go into a deep dive if you force them to do it,
| but how they conduct themselves while doing it can show you if
| this is a thing they would do on their own.
| Cpoll wrote:
| This kind of reminds me of
| https://en.m.wikipedia.org/wiki/Five_whys.
| peter_d_sherman wrote:
| >"There's a _bug_! And it is sort-of obvious how to fix it. But
| if you don't laser-focus on that, and _try to perceive the
| surrounding context_ , it turns out that the _bug is valuable_ ,
| and it is _pointing in the direction of a bigger related
| problem_. "
|
| That is an absolutely stellar quote!
|
| It's also more broadly applicable to life / problem solving /
| goal setting (if we replace the word 'bug' with 'problem' in the
| above quote):
|
| "There's a _problem_! And it is sort-of obvious how to fix it.
| But if you don't laser-focus on that, and _try to perceive the
| surrounding context_ , it turns out that the _problem is
| valuable_ , and it is _pointing in the direction of a bigger
| related problem_. "
|
| In other words, in life / problem solving / goal setting --
| smaller problems can be really valuable, because they can be
| pointers/signs/omens/subcases/indicators of/to larger surrounding
| problems in larger surrounding contexts...
|
| (Just like bugs can be, in Software Engineering!)
|
| Now if only our political classes (on both sides!) could see
| the problems that they typically see as problems -- as
| _effects_ not _causes_ (because that's what they all are,
| _effects_), of as-of-yet unseen larger problems, of which
| those smaller problems are pointers to, "hints at", subcases
| of, "indicators of" (use whatever terminology you prefer...)
|
| Phrased another way, in life/legislation/problem
| solving/Software Engineering -- you always have to nail down
| first causes -- otherwise you're always in
| _"Effectsville"_... :-)
|
| You don't want to live in _"Effectsville"_ -- because anything
| you change will be changed back to what it was previously in
| the shortest time possible, because everything is an _effect_
| in _Effectsville_! :-)
|
| Legislating against something that is seen, but that is the
| _effect_ of another, greater, as-of-yet unseen problem, will
| not fix the seen problem!
|
| Finally, _all problems are always valuable_ -- but _if and only
| if their surrounding context is properly perceived_...
|
| So, an excellent observation by the author, in the context of
| Software Engineering!
| raphlinus wrote:
| The title immediately brings to mind the Ousterhout classic,
| "Always Measure One Level Deeper" [1], and I imagine it was
| probably inspired by it. Also worth revisiting.
|
| [1]: https://cacm.acm.org/research/always-measure-one-level-
| deepe...
| matklad wrote:
| I was not actually aware of the paper, and it is indeed pure
| gold, thanks for linking it. It is _also_ extremely timely, as
| with my current project (TigerBeetle), we are exactly at the
| point of transition from first-principles back-of-the-envelope
| performance architecture to measurement-driven performance
| engineering, and all the advice here is directly applicable!
| WillAdams wrote:
| See the Book review: A Philosophy of Software Design (2020)
| (johz.bearblog.dev)
|
| https://news.ycombinator.com/item?id=27686818
|
| for more in this vein.
| Terr_ wrote:
| IMO it may be worth distinguishing between:
|
| 1. Diagnosing the "real causes" one level deeper
|
| 2. Implementing a "real fix" one level deeper
|
| Sometimes they have huge overlap, but the first is much more
| consistently desirable.
|
| For example, it might be that the most practical fix is to add
| some "if this happens, just retry" logic, but it would be
| beneficial to _know_ -- and leave a comment -- that it occurs
| because of a race condition.
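|
| A sketch of that shape (`try_reserve` is a made-up stand-in
| for whatever call loses the race):
|
|     fn reserve_with_retry(mut retries: u32)
|         -> Result<(), String>
|     {
|         loop {
|             match try_reserve() {
|                 Ok(()) => return Ok(()),
|                 // The "real cause" comment: two clients can
|                 // pass the availability check before either
|                 // commits. The deep fix is a transactional
|                 // reserve; until then, retrying is the
|                 // practical fix.
|                 Err(_) if retries > 0 => retries -= 1,
|                 Err(e) => return Err(e),
|             }
|         }
|     }
|
|     fn try_reserve() -> Result<(), String> {
|         Err("seat already taken".to_string()) // placeholder
|     }
|
|     fn main() {
|         let _ = reserve_with_retry(3);
|     }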
| KaiserPro wrote:
| You need to choose your rabbit holes carefully.
|
| In large and complex codebases, it's often more pragmatic to
| build a guard in your local area against that bug than to
| follow the bug all the way down the stack.
|
| It's not optimal, and it doesn't make the system better as a
| whole, but it's the only way to get things done.
|
| That doesn't mean you should be silent, though; you do need to
| contact the team that looks after that part of the system.
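|
| A sketch of such a local guard (names and the bad value are
| invented):
|
|     // Upstream sometimes sends percentages over 100 (bug
|     // filed with the owning team); clamp at our boundary so
|     // this module stays correct even before the deep fix
|     // lands.
|     fn render_discount(pct: f64) -> String {
|         let pct = pct.clamp(0.0, 100.0);
|         format!("{pct:.0}% off")
|     }
|
|     fn main() {
|         println!("{}", render_discount(250.0)); // "100% off"
|     }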
| hoherd wrote:
| This seems like the code implementation way of shifting left.
| https://news.ycombinator.com/item?id=38187879
| cantSpellSober wrote:
| In enterprise monorepos I find this hard because "one level
| deeper" is often code you don't own.
|
| Fun article, good mantra!
___________________________________________________________________
(page generated 2024-10-16 23:01 UTC)