[HN Gopher] You're missing your near misses
       ___________________________________________________________________
        
       You're missing your near misses
        
       Author : azhenley
       Score  : 57 points
       Date   : 2025-02-05 15:45 UTC (3 days ago)
        
 (HTM) web link (surfingcomplexity.blog)
 (TXT) w3m dump (surfingcomplexity.blog)
        
       | antithesis-nl wrote:
       | Oh, well, surprisingly, it seems this article hadn't been posted
       | here yet?
       | 
        | Do enjoy the discussion, and, whatever you do, _please_ don't
        | let the apparent incongruity of "near miss" when it's _clear_
        | that it should be "near accident" derail the conversation...
       | (insert-innocent-smiley-here)
        
         | AlotOfReading wrote:
         | The word "accident" is generally inappropriate in this context.
          | For example, it's banned from FMCSA and NHTSA communications [0]
         | among others because it implies that the incident was random or
         | unpreventable. This article is talking about incidents that can
         | be prevented and were narrowly avoided as opposed to
         | "accidents".
         | 
         | The FAA and the NTSB continue to use "accident", but they're
         | somewhat unique in that and they have very specific technical
         | definitions that don't match popular connotations.
         | 
         | [0] https://www.fmcsa.dot.gov/newsroom/crash-not-accident
        
         | AznHisoka wrote:
         | Why do we call it "near miss" when it's more like a "near hit"?
        
           | aqueueaqueue wrote:
           | Planes were near.
           | 
           | Planes missed.
        
           | rzzzt wrote:
           | Out of respect for George Carlin.
        
           | bananaflag wrote:
           | It's a miss that is near [the hit].
        
       | renewiltord wrote:
       | In _The Field Guide to Human Error Investigations_ Dekker talks
       | about how "number of minor incidents" correlates inversely with
       | "number of fatal incidents" for airlines (scaled per flight hour
       | or whatever). I have forgotten whether this was all airlines or
       | Western only. I wonder if it still holds.
       | 
       | The rest of the book is also quite a good read including a fun
        | take on Murphy's Law that goes "Everything that can go wrong
        | will go right", which is the basis for normalization of
        | deviance: a group of people (an org, whatever) slowly drifts
        | away from its own standards as it "gets away with it".
       | 
        | I wonder how modern organizations fight this. Most critically, I
        | imagine warfighting ability can see massive multipliers from
        | adherence to design, and civilian performance can too, to a
        | lesser extent (the outcome is often less catastrophically
        | binary).
       | 
       | Anyway, I got a lot of mileage out of the safety book wrt
       | software engineering.
        
         | aqueueaqueue wrote:
         | Are chaos monkeys relevant here. And at some level: testing!
         | You definitely find more severe issues of you actually test
         | stuff. In production.
        
         | AlotOfReading wrote:
         | Safety literature focuses on straightforward ways to build
         | highly reliable systems involving humans. That applies to
         | almost everyone on HN.
         | 
         | What's funny is that the suggestions are usually pretty "common
         | sense". You already know most of the information in Sidney
         | Dekker's books and the NASA guidelines. They're essentially the
         | same principles we all like to see in code.
         | 
         | Things like: Consider the human factors. Make doing the right
         | thing easy. Make wrong things obvious. Trust, but verify. Keep
         | the signal to noise ratio high in communication. Etc.
        
           | terribleperson wrote:
            | One big lesson, I think (at least in how aviation does
            | safety), is that even if the ideas are common sense, you
            | have to take the common sense and turn it into rules and
            | processes, and make sure people stick to those. And if it's
            | a role where time matters, you have to drill those rules so
            | people stick to them when seconds count.
        
             | AlotOfReading wrote:
             | There's a lot of value in the books. Applying "common
             | sense" consistently and intentionally is extremely
             | difficult. It's not a deep magic entirely divorced from the
             | practices we already know and agree with, but rather the
             | organizational equivalent of going to the gym.
        
       | YZF wrote:
       | This is part of the Israeli Air Force safety culture:
       | https://www.talentgrow.com/podcast/episode92
       | 
       | "By implementing a unique culture and methodology, the Israeli
       | Air Force became one of the best in the world in terms of
       | quality, safety, and training, effectively cutting accidents by
       | 95%."
       | 
        | Near misses and mistakes are investigated, and there's a culture
        | that supports reporting them, which has resulted in a huge
        | reduction in the overall incident rate.
       | 
       | Software is a little like this as well. The observable quality
       | issues are sort of the tip of the iceberg. Many bugs and
       | architectural issues can lurk underneath the surface and only
        | some random subset of those will have impact. Focusing only on
        | the impactful ones without dealing with what's under the surface
        | may not materially improve quality.
        
         | ryandrake wrote:
         | > Software is a little like this as well. The observable
         | quality issues are sort of the tip of the iceberg.
         | 
         | Most of the places I've worked cannot even resolve all of their
         | _observable_ quality issues, let alone the hidden ones. Hell,
          | at most places I've worked, either it's a P0 emergency, a P1
          | must-do, or the defect will basically never be fixed, no matter
          | how visible it is or how easy it is to diagnose and correct. The
         | bug list always grows far faster than things get resolved,
         | until it gets to a certain size and someone decides to "declare
         | bankruptcy" and mass-close everything older than N days. Then,
         | the cycle continues.
        
       | cadamsdotcom wrote:
        | For better or worse, a near-miss has zero cost to the org as a
        | whole and thus justifies zero investment at the org level.
       | 
       | That is okay as long as someone is noticing! As stated in the
       | article, these types of near misses are noticed within the team
       | and mitigated at that level so the org doesn't need to respond.
       | 
       | That's a cost effective way to deal with them, so I would argue
       | everything works the way it should.
        
         | AlotOfReading wrote:
         | A near miss means your processes need work. If you have any
         | sort of reliable process, actual misses should be vanishingly
         | rare and you'll need to look at proxies anyway to monitor for
         | improvement, so why not use the best one available?
        
       | fambalamboni wrote:
       | The article is spot on. It's pretty much what happened when
       | Maersk was hacked within an inch of bankruptcy.
       | 
       | The flaw was identified, flagged and acknowledged before it
       | happened:
       | 
       | "In 2016, one group of IT executives had pushed for a preemptive
       | security redesign of Maersk's entire global network. They called
       | attention to Maersk's less-than-perfect software patching,
       | outdated operating systems, and above all insufficient network
       | segmentation. That last vulnerability in particular, they warned,
       | could allow malware with access to one part of the network to
       | spread wildly beyond its initial foothold, exactly as NotPetya
       | would the next year."
       | 
       | But:
       | 
       | "The security revamp was green-lit and budgeted. But its success
       | was never made a so-called key performance indicator for Maersk's
       | most senior IT overseers, so implementing it wouldn't contribute
       | to their bonuses."
       | 
        | Basically, a near miss that no one was incentivised to fix.
       | 
       | If you're interested in this type of story, it's an absolute
       | thriller to read:
       | https://archive.is/Gyu2T#selection-3563.0-3563.212
        
       ___________________________________________________________________
       (page generated 2025-02-08 23:00 UTC)