[HN Gopher] You're missing your near misses
___________________________________________________________________
You're missing your near misses
Author : azhenley
Score : 57 points
Date : 2025-02-05 15:45 UTC (3 days ago)
(HTM) web link (surfingcomplexity.blog)
(TXT) w3m dump (surfingcomplexity.blog)
| antithesis-nl wrote:
| Oh, well, surprisingly, it seems this article hadn't been posted
| here yet?
|
| Do enjoy the discussion, and, whatever you do, _please_ don't
| let the apparent incongruity of "near miss" when it's _clear_
| that it should be "near accident" derail the conversation...
| (insert-innocent-smiley-here)
| AlotOfReading wrote:
| The word "accident" is generally inappropriate in this context.
| For example, it's banned from FMCSA and NHTSA communications [0]
| among others because it implies that the incident was random or
| unpreventable. This article is talking about incidents that can
| be prevented and were narrowly avoided as opposed to
| "accidents".
|
| The FAA and the NTSB continue to use "accident", but they're
| somewhat unique in that and they have very specific technical
| definitions that don't match popular connotations.
|
| [0] https://www.fmcsa.dot.gov/newsroom/crash-not-accident
| AznHisoka wrote:
| Why do we call it "near miss" when it's more like a "near hit"?
| aqueueaqueue wrote:
| Planes were near.
|
| Planes missed.
| rzzzt wrote:
| Out of respect for George Carlin.
| bananaflag wrote:
| It's a miss that is near [the hit].
| renewiltord wrote:
| In _The Field Guide to Human Error Investigations_ Dekker talks
| about how "number of minor incidents" correlates inversely with
| "number of fatal incidents" for airlines (scaled per flight hour
| or whatever). I have forgotten whether this was all airlines or
| Western only. I wonder if it still holds.
|
| The rest of the book is also quite a good read including a fun
| take on Murphy's Law that goes "Everything that can go wrong will
| go right", which is the basis for normalization of deviance:
| where a group of people (an org, whatever) slowly drifts away
| from their performance standards as they "get away with it".
|
| I wonder how modern organizations fight this. Most critically,
| I imagine warfighting ability can see massive multipliers from
| adherence to design, and civilian performance can too, to a
| lesser extent (the outcome is often less catastrophically
| binary).
|
| Anyway, I got a lot of mileage out of the safety book wrt
| software engineering.
| aqueueaqueue wrote:
| Are chaos monkeys relevant here? And at some level: testing!
| You definitely find more severe issues if you actually test
| stuff. In production.
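|
| (A rough sketch of the fault-injection idea, assuming a made-up
| failure rate and a hypothetical call site rather than any
| particular tool:)
|
|     import random
|
|     FAILURE_RATE = 0.05  # inject a fault into ~5% of calls
|
|     def chaos_call(fn, *args, **kwargs):
|         # Wrap a real service call and occasionally fail it on
|         # purpose, so the failure-handling path gets exercised
|         # before a real outage exercises it for you.
|         if random.random() < FAILURE_RATE:
|             raise TimeoutError("injected fault (chaos test)")
|         return fn(*args, **kwargs)
|
|     # usage (hypothetical call site):
|     # result = chaos_call(payment_service.charge, order_id)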
| AlotOfReading wrote:
| Safety literature focuses on straightforward ways to build
| highly reliable systems involving humans. That applies to
| almost everyone on HN.
|
| What's funny is that the suggestions are usually pretty "common
| sense". You already know most of the information in Sidney
| Dekker's books and the NASA guidelines. They're essentially the
| same principles we all like to see in code.
|
| Things like: Consider the human factors. Make doing the right
| thing easy. Make wrong things obvious. Trust, but verify. Keep
| the signal to noise ratio high in communication. Etc.
| terribleperson wrote:
| One big lesson, I think (at least the way aviation does safety), is
| that even if the ideas are common sense, you have to take the
| common sense and turn it into rules and processes, and make
| sure people stick to those. And if it's a role where time
| matters, you have to drill those so people stick to them when
| seconds count.
| AlotOfReading wrote:
| There's a lot of value in the books. Applying "common
| sense" consistently and intentionally is extremely
| difficult. It's not a deep magic entirely divorced from the
| practices we already know and agree with, but rather the
| organizational equivalent of going to the gym.
| YZF wrote:
| This is part of the Israeli Air Force safety culture:
| https://www.talentgrow.com/podcast/episode92
|
| "By implementing a unique culture and methodology, the Israeli
| Air Force became one of the best in the world in terms of
| quality, safety, and training, effectively cutting accidents by
| 95%."
|
| Near misses and mistakes are investigated, there's a culture
| supporting the reporting of them, and that has resulted in a
| huge change to the overall incident rate.
|
| Software is a little like this as well. The observable quality
| issues are sort of the tip of the iceberg. Many bugs and
| architectural issues can lurk underneath the surface and only
| some random subset of those will have impact. Focusing only on
| the impactful ones without dealing with what's under the
| surface may not have a material impact on quality.
| ryandrake wrote:
| > Software is a little like this as well. The observable
| quality issues are sort of the tip of the iceberg.
|
| Most of the places I've worked cannot even resolve all of their
| _observable_ quality issues, let alone the hidden ones. Hell,
| at most places I've worked, either it's a P0 emergency, a P1
| must-do, or the defect will basically never be fixed, no matter
| how visible it is or how easy it is to diagnose and correct. The
| bug list always grows far faster than things get resolved,
| until it gets to a certain size and someone decides to "declare
| bankruptcy" and mass-close everything older than N days. Then,
| the cycle continues.
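|
| (The "bankruptcy" step is usually just a bulk close of stale
| tickets; a toy sketch with made-up issue fields, purely
| illustrative:)
|
|     from datetime import datetime, timedelta, timezone
|
|     STALE_AFTER = timedelta(days=365)  # "older than N days"
|
|     def declare_bankruptcy(issues):
|         # Mass-close every open issue not touched within
|         # STALE_AFTER, regardless of how visible or easy to fix
|         # it is -- and then the cycle starts over.
|         cutoff = datetime.now(timezone.utc) - STALE_AFTER
|         closed = []
|         for issue in issues:
|             stale = issue["updated_at"] < cutoff
|             if issue["state"] == "open" and stale:
|                 issue["state"] = "closed"
|                 issue["resolution"] = "wontfix (bug bankruptcy)"
|                 closed.append(issue["id"])
|         return closed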
| cadamsdotcom wrote:
| For better or worse, a near miss has zero cost to the org as a
| whole and thus justifies zero org-level investment.
|
| That is okay as long as someone is noticing! As stated in the
| article, these types of near misses are noticed within the team
| and mitigated at that level so the org doesn't need to respond.
|
| That's a cost effective way to deal with them, so I would argue
| everything works the way it should.
| AlotOfReading wrote:
| A near miss means your processes need work. If you have any
| sort of reliable process, actual misses should be vanishingly
| rare and you'll need to look at proxies anyway to monitor for
| improvement, so why not use the best one available?
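|
| Concretely, the proxy can be as simple as a near-miss rate per
| unit of exposure, watched as a trend (a toy sketch, field names
| made up):
|
|     def near_miss_rate(events, exposure_hours):
|         # Near misses per 1,000 hours of exposure: a leading
|         # indicator you can track long before actual misses
|         # show up in the data.
|         count = sum(1 for e in events if e["kind"] == "near_miss")
|         return 1000 * count / exposure_hours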
| fambalamboni wrote:
| The article is spot on. It's pretty much what happened when
| Maersk was hacked within an inch of bankruptcy.
|
| The flaw was identified, flagged and acknowledged before it
| happened:
|
| "In 2016, one group of IT executives had pushed for a preemptive
| security redesign of Maersk's entire global network. They called
| attention to Maersk's less-than-perfect software patching,
| outdated operating systems, and above all insufficient network
| segmentation. That last vulnerability in particular, they warned,
| could allow malware with access to one part of the network to
| spread wildly beyond its initial foothold, exactly as NotPetya
| would the next year."
|
| But:
|
| "The security revamp was green-lit and budgeted. But its success
| was never made a so-called key performance indicator for Maersk's
| most senior IT overseers, so implementing it wouldn't contribute
| to their bonuses."
|
| Basically, a near miss that no one was incentivised to fix.
|
| If you're interested in this type of story, it's an absolute
| thriller to read:
| https://archive.is/Gyu2T#selection-3563.0-3563.212
___________________________________________________________________
(page generated 2025-02-08 23:00 UTC)