[HN Gopher] What Went Wrong?
___________________________________________________________________
What Went Wrong?
Author : headalgorithm
Score : 154 points
Date : 2021-07-17 11:34 UTC (11 hours ago)
(HTM) web link (queue.acm.org)
(TXT) w3m dump (queue.acm.org)
| nixpulvis wrote:
| I would gladly work for a prolific IRB.
| MrStonedOne wrote:
| In Washington state, the state superior court ruled the police
| department was not liable for the impound fee paid by somebody
| who had their car impounded for 90* days for driving on what the
| computer reported was a suspended license, because the department
| is exempt from liability for mistakes made by trusting its own
| computer system.
|
| This was the second time the department had wrongfully impounded
| his car, and they made no attempt to fix the mistake from the
| first time; this didn't impact the ruling.
|
| It's gonna get much worse before it gets any better.
| spaetzleesser wrote:
| It will get much worse I think. More and more companies are
| hiding behind algorithms and other computer systems while
| cutting support staff. If you are wronged, you have nobody to
| talk to, and they make no effort to correct the situation. The
| only recourse is a lawsuit, which is way too expensive for most
| people. And even when they are caught, the fines are usually
| only nominal.
|
| I think we are building up the ultimate faceless bureaucracies.
| [deleted]
| verytrivial wrote:
| I agree with nearly everything in this article, but the
| following question stumped me: when exactly would a software
| disaster investigation board be employed?
|
| Plane goes down, train goes off rails or passes a signal at
| danger: easy. But at exactly what point did the UK Post
| Office's Horizon system "fail" enough for an investigation?
| andersource wrote:
| I would say at the latest when people convicted because of it
| had their names cleared -
| https://www.bbc.com/news/business-56859357
| ashton314 wrote:
| > Personal information is the helium of IT systems--it leaks out
| of every crack or imperfection faster than seems possible.
|
| Might as well call it the _hydrogen_ of IT systems--get too much
| of it concentrated in one place, and all it takes is one little
| spark for it all to go up in flames. Boom!
| dgb23 wrote:
| Large amounts of money spent on government systems that never
| ship are a tragedy, but software projects like these tend to
| have a lot of open questions.
|
| We often understand software development as a discovery process
| (evolving requirements), especially when projects are large or
| disruptive. So one critical output of any such project has to be
| knowledge that can be built upon, in the form of open, clearly
| specified, written papers. This should be done regardless of
| whether the project failed or succeeded.
| openthc wrote:
| In Washington State we have a system to track cannabis, and the
| enforcement officers are supposed to be able to get reports from
| this system. The system is super buggy and also doesn't have
| meaningful reports. So there is a secondary system for officers
| to export to Excel documents. In one of the trainings they've
| been instructed to look for anomalies -- not real analysis, not
| even a pivot table. One thing they find is "negative quantities"
| -- but how can that be? (Hint: it's bugs in the tracking
| software.) Then enforcement shows up at the cannabis business to
| audit these negative numbers (or to demand the business try to
| correct the data, which it cannot do due to the bugs).
|
| So, crappy software gets law enforcement officers to review data
| "anomalies" created by bugs by visiting a business in person.
| That's the second most expensive method of data sanitization I
| can imagine. It's a poor use of their time and disruptive to the
| business.
|
| The system in WA is so buggy that the agency has opted to freeze
| the software rather than try to fix the issues. The future of
| government software is bleak -- so long as they keep using
| closed-source packages from low-cost bidders.
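|
| A cheap automated check would surface the impossible rows
| without dispatching anyone. A minimal sketch in Python, assuming
| the export lands as a CSV (the "quantity" column name is a
| hypothetical stand-in for whatever the export actually uses):
|
|     import csv
|
|     def impossible_rows(path):
|         # A negative on-hand quantity is physically impossible,
|         # so any hit is almost certainly a bug in the tracking
|         # software, not diversion at the business.
|         with open(path, newline="") as f:
|             for row in csv.DictReader(f):
|                 if float(row["quantity"]) < 0:  # hypothetical name
|                     yield row
|
|     for row in impossible_rows("export.csv"):
|         print("flag for the vendor, not the licensee:", row)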
| laurent92 wrote:
| Why isn't all software created for the government required to
| be open source? Would that really drive the costs up, if the
| providers don't have a choice?
| openthc wrote:
| The vendor claimed that if the code were out it would be a
| security risk. The agency claims the vendor needs to protect
| their intellectual property rights. We have (some) visibility
| into other things our taxes pay for -- the software should
| absolutely be one of them -- especially the regulatory
| compliance systems that drive enforcement action.
|
| Edit: also, they were breached anyway shortly after launch
| (2018), and then an email went around offering to sell the
| code and data from their entire system.
| foobiekr wrote:
| Part of my job is to help the executives that I report to
| understand why things went wrong from the security perspective in
| our business unit. These are purely internal discussions, not
| even investigations. There are no real penalties, even for
| things as egregious as hard-coded passwords. As will become
| clear in a moment, the fact that my executives care is quite
| unusual.
|
| Culturally the result is coverups and lies.
|
| Engineers lie, managers lie, test people lie, directors lie,
| senior directors lie, vice presidents lie, external
| investigating teams are negotiated into minimizing certain
| critical failures, and so on. Managers don't want to hear it so
| that they can't be accused of lying, vice presidents don't
| wanna know, and SVPs just want green squares on the cross-BU
| PowerPoint.
|
| This is the internal discussion of revenue-impacting incidents.
| Do you know what executives do care about? Revenue. Lost deals.
| If the people who care about money, including the account
| teams, don't care about security and severe quality issues
| enough to be honest and get to improvement, how could an
| external board accomplish anything for the very few incidents
| that actually become publicly visible?
|
| This isn't like the NTSB; I've spent my life reading NTSB
| accident reports. They have actual, real authority, and there
| are potential consequences that can impact someone more than
| being caught distorting things would.
| slyall wrote:
| I think you are overestimating the importance of
| "revenue-impacting incidents" to company employees.
|
| If the company makes a couple of million extra or less this
| year, it doesn't affect the majority of workers. Their bonus
| isn't going up or down, etc. And remember, this incident has
| already happened.
|
| By contrast, if a report comes out blaming the loss on a worker,
| department, or division, then that could have major
| consequences. No matter how "blameless" it is, come the next
| round of bonuses, promotions, or layoffs, everybody knows it'll
| be factored into the decisions.
|
| So people don't have an incentive to make themselves look bad,
| and unlike with the NTSB, there are no legal powers or fear of
| causing deaths behind the investigation.
| foobiekr wrote:
| I didn't say "employees" so much as "executives"; and the
| executives I'm referring to go beyond owning P&L. They
| actually do care about revenue, which is why everyone lies to
| them.
| laurent92 wrote:
| I understand, but when we do that, it sounds like we are
| digging ourselves into the same hole as USSR workers, who were
| not incentivized to deliver working products. It's a
| civilizational peril. How do we solve cooperation at large
| scale? Is the only way to watch large companies accumulate
| bored employees and constantly recreate "the small guy", the
| startup, which will finally make things right, until it
| becomes too big to be incentivized?
| izacus wrote:
| I also wonder if "blameless postmortem" culture perhaps
| actively works against preventing these kinds of incidents. It
| doesn't seem that anyone in IT is ever responsible for damage
| they cause.
|
| But yes, lying, "not seeing", and covering up documentation is
| pretty much standard corporate behaviour I've seen around
| plenty of companies as well.
| nanis wrote:
| In my negative experiences, "blameless" turned into "nobody
| did anything wrong", which, of course, undermines the whole
| point of finding out what actually happened so we can see if
| there is anything we can do to reduce the likelihood of it
| happening again.
|
| Sometimes, the root cause is indeed someone with the
| privilege, but not the good sense, ignoring warning signs. If
| we can't identify that problem, then we can't improve our
| odds for the next time.
| foobiekr wrote:
| I no longer believe in blameless post mortems as a general
| rule. I have, through experience, come to believe that the
| contexts where blameless post mortems work are the contexts
| where literally anything works, because they are organizations
| that have high hiring bars and high expectations. My current
| employer is not one of them; we are a mountain of mediocrity,
| and all blameless post mortems do is act as an excuse to
| avoid raising the bar.
| _jal wrote:
| > and all blameless post mortems do is act as an excuse to
| avoid raising the bar
|
| "Well, there's your problem, right there."
|
| The entire point of doing blameless post-mortems is to
| correctly identify problems for resolution. If management
| doesn't drive changes in response (process, training,
| communication, whatever), you have a different problem to
| solve before they'll do any good.
| jolux wrote:
| The principle of blameless postmortems is not supposed to
| absolve anyone of the responsibility to change anything,
| it's supposed to foreground that serious failures are
| organizational failures first and foremost, because it's
| the organization that has an obligation not to fail, not
| individuals, who fail all the time as a rule.
| torgard wrote:
| A post-mortem should not necessarily blame the individual, but
| rather the circumstances the individual finds themselves in.
|
| Yes, a hard-coded password is bad practice. But does the
| company have a bad culture of keeping configs in repos?
| Maybe management thinks it's easier to commit configs with
| sensitive data than to set up proper deployment shit. And
| after all, the repos are private, so it should be fine,
| yeah?
|
| Bad code ending up in production is something you'll see
| often. Does the company have nice test suites for
| everything? Continuous integration pipelines? E2E tests? Or
| is upper management pushing everyone to their limits,
| because "fuck it, ship it"?
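|
| The alternative to the committed password is cheap, which is
| what makes the culture question the interesting part. A minimal
| sketch in Python, assuming the deploy pipeline injects the
| secret as an environment variable (the name DB_PASSWORD is
| hypothetical):
|
|     import os
|
|     def db_password() -> str:
|         # Read the secret at startup instead of baking it into
|         # the repo; the deployment tooling sets the variable.
|         secret = os.environ.get("DB_PASSWORD")
|         if secret is None:
|             # Fail fast: refusing to start beats silently
|             # falling back to a default credential in the code.
|             raise RuntimeError("DB_PASSWORD is not set")
|         return secret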
| Scoundreller wrote:
| > In 2017 the motor of an airplane exploded over the southern
| part of the Greenland icecap. Part of the engine landed on the
| ice while the plane continued to the first suitable airport way
| up north in Canada.
|
| eh, Happy Valley-Goose Bay isn't that far north as far as Canada
| goes. 53 degrees north.
|
| The actual droppings in Greenland were around 61 degrees N.
|
| Nuuk would have been ~60% closer, but not a chance it could
| handle an A380.
| ithkuil wrote:
| Well written article
| ldarby wrote:
| It's known what went wrong; Computerphile has a video with some
| details: https://www.youtube.com/watch?v=hBJm9ZYqL10 but it
| doesn't address any of the judicial and cultural failures, and
| that's what needs to be fixed. Software bugs are a fact of
| life, and people know this, except, apparently, the judges in
| this case.
| HarryHirsch wrote:
| Bugs are a fact of life because of sloppy practices. The
| experience from SQLite is instructive: after a test suite had
| been written, matters improved immensely.
|
| Why was the test suite written? Because it was in the client's
| list of requirements: aerospace standards demand that every
| possible branch be covered by a test.
|
| We choose to write bad software.
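|
| To make "every possible branch" concrete, here is a toy
| illustration in Python (not SQLite's actual suite, which is
| written in C and TCL): each arm of the function gets at least
| one test that exercises it, and a tool such as coverage.py run
| with --branch reports any arm that no test reaches.
|
|     def clamp(x, lo, hi):
|         if x < lo:      # branch 1
|             return lo
|         if x > hi:      # branch 2
|             return hi
|         return x        # branch 3
|
|     def test_clamp_branches():
|         assert clamp(-1, 0, 10) == 0   # exercises branch 1
|         assert clamp(99, 0, 10) == 10  # exercises branch 2
|         assert clamp(5, 0, 10) == 5    # exercises branch 3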
| II2II wrote:
| One could argue that faults in engineering and construction
| are also a fact of life, yet that doesn't mean we excuse them,
| and it doesn't mean that we assume a failure is due to those
| faults. Investigations are performed in order to ascertain the
| truth.
|
| I think the author's comparison to the historical development
| of trains is appropriate. Investigating IT failures wasn't as
| important 50 years ago because IT infrastructure was not as
| critical. Investigating IT failures today is critical because
| the functioning of society depends upon it.
| ChrisMarshallNY wrote:
| I really enjoyed this.
|
| Like most things, it's a matter of scale. If a train derails, we
| call in the NTSB, but they don't investigate car crashes.
|
| The issue that I see is that the software industry seems to be
| absolutely _obsessed_ with scale. Small applications are
| actively sneered at. Go big or go home.
|
| So that means that _every_ accident is a train wreck.
| hamilyon2 wrote:
| The industry fails to listen to the lessons written in "The
| Mythical Man-Month" 50 years ago -- half a century. Of course
| some reports on why systems are being designed and coded
| poorly won't change anything. We know why; we just ignored the
| knowledge to the point of absurdity.
| torgard wrote:
| Companies could be held liable for gross misconduct. Although
| GDPR is not exactly a shining example of IT regulation, I think
| it's a good example of liability.
|
| Companies get fined for breaking GDPR.
|
| Governmental projects should have similar requirements in
| place, and companies and people should be held accountable for
| breaking them.
| Scoundreller wrote:
| Would also like to point out the fantastic videos created by the
| US Chemical Safety Board: https://www.youtube.com/user/USCSB
___________________________________________________________________
(page generated 2021-07-17 23:00 UTC)