[HN Gopher] CrowdStrike ex-employees: 'Quality control was not p...
___________________________________________________________________
CrowdStrike ex-employees: 'Quality control was not part of our
process'
Author : everybodyknows
Score : 119 points
Date : 2024-09-13 20:17 UTC (2 hours ago)
(HTM) web link (www.semafor.com)
(TXT) w3m dump (www.semafor.com)
| Alupis wrote:
| > "Speed was the most important thing," said Jeff Gardner, a
| senior user experience designer at CrowdStrike who said he was
| laid off in January 2023 after two years at the company. "Quality
| control was not really part of our process or our conversation."
|
| This type of article - built upon disgruntled former employees -
| is worth about as much as the apology GrubHub gift card.
|
| Look, I think just as poorly about CrowdStrike as anyone else out
| there... but you can find someone to say anything, especially
| when they have an axe to grind and a chance at some spotlight.
| Not to mention this guy was a designer and wouldn't be involved
| in QC anyway.
|
| > Of the 24 former employees who spoke to Semafor, 10 said they
| were laid off or fired and 14 said they left on their own. One
| was at the company as recently as this summer. Three former
| employees disagreed with the accounts of the others. Joey
| Victorino, who spent a year at the company before leaving in
| 2023, said CrowdStrike was "meticulous about everything it was
| doing."
|
| So basically we have nothing.
| nyc_data_geek1 wrote:
| >>So basically we have nothing.
|
| Except the biggest IT outage ever. And a postmortem showing
| their validation checks were insufficient. And a rollout
| process that did not stage at all, just rawdogged straight to
| global prod. And no lab where the new code was actually
| installed and run prior to global rawdogging.
|
| I'd say there's smoke, and numerous accounts of fire, which
| this can be taken in the context of.
| mewpmewp2 wrote:
| There definitely was a huge outage, but based on the given
| information we still can't know for sure how much they
| invested in testing and quality control.
|
| There's always a chance of failure even for the most
| meticulous companies.
|
| Now I'm not defending or excusing the company, but a singular
| event like this can happen to anyone and nothing is 100%.
|
| If thorough investigation revealed poor quality control
| investment compared to what would be appropriate for a
| company like this, then we can say for sure.
| daedrdev wrote:
| Two things are clear though
|
| Nobody ran this update
|
| The update was pushed globally to all computers
|
| With that alone we know they have failed the simplest of
| quality control methods for a piece of software as
| widespread as theirs. This is even excluding that there
| should have been some kind of error handling to allow the
| computer to boot if they did push bad code.
| busterarm wrote:
| Also it's the _second_ time that they had done this in a
| few short months.
|
| They had previously bricked Linux hosts with a
| similar type of update.
|
| So we also know that they don't learn from their
| mistakes.
| rblatz wrote:
| The blame for the Linux situation isn't as clear cut as
| you make it out to be. Red Hat rolled out a breaking
| change to BPF which was likely a regression. That wasn't
| caused directly by a CrowdStrike update.
| idkwhatimdoin wrote:
| > If thorough investigation revealed poor quality control
| investment compared to what would be appropriate for a
| company like this, then we can say for sure.
|
| We don't really need that thorough of an investigation.
| They had no staged deploys when servicing millions of
| machines. That alone is enough to say they're not running
| the company correctly.
| dartos wrote:
| Totally agree.
|
| I'd consider staggering a rollout to be the absolute
| basics of due diligence.
|
| Especially when you're building a critical part of
| millions of customer machines.
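
A minimal sketch of the staggered rollout discussed above, assuming a
fleet identified by string host IDs. The stage fractions, the failure
threshold, and the `install` callback are hypothetical values chosen for
illustration; nothing here reflects CrowdStrike's actual deployment
tooling.

```python
import hashlib

# Hypothetical stage sizes: cumulative fraction of the fleet eligible per stage.
STAGES = [0.01, 0.10, 0.50, 1.00]
# Assumed abort threshold: halt if more than 2% of a stage fails.
MAX_FAILURE_RATE = 0.02


def bucket(host_id: str) -> float:
    """Map a host ID to a stable value in [0, 1) so stage membership is deterministic."""
    digest = hashlib.sha256(host_id.encode("utf-8")).digest()
    # 48 bits fit exactly in a float, so the result is always below 1.0.
    return int.from_bytes(digest[:6], "big") / 2**48


def staged_rollout(hosts, install):
    """Install stage by stage and stop as soon as one stage fails too often.

    `install(host)` is a caller-supplied function returning True on success.
    Returns (updated_hosts, index_of_halting_stage_or_None).
    """
    updated = []
    attempted = set()
    for stage, fraction in enumerate(STAGES):
        batch = [h for h in hosts if h not in attempted and bucket(h) < fraction]
        failures = 0
        for host in batch:
            attempted.add(host)
            if install(host):
                updated.append(host)
            else:
                failures += 1
        if batch and failures / len(batch) > MAX_FAILURE_RATE:
            return updated, stage  # halt: later stages never receive the update
    return updated, None
```

With an update that fails everywhere, only the first non-empty stage is
touched and the rest of the fleet is left alone, which is the property
the commenters are asking for.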
| mewpmewp2 wrote:
| I would say that canary release is an absolute must 100%.
| Except I can think of cases where it might still not be
| enough. So, I just don't feel comfortable judging them
| out of the box. Does all the evidence seem to point
| against them? For sure. But I just don't feel comfortable
| giving that final verdict without knowing for sure.
|
| Specifically because this is about fighting against
| malicious actors, where time can be of the essence to deploy
| some sort of protection against a novel threat.
|
| If there are deadlines that you can go over, and nothing
| bad happens, for sure. Always have canary releases, and
| perfect QA, monitoring everything thoroughly, but I'm
| just saying, there can be cases where damage that could
| be done if you don't act fast enough, is just so much
| worse.
|
| And I don't know that it wasn't the case for them. I just
| don't know.
| dartos wrote:
| In this case, they pretty much caused a worst case
| scenario...
| quietbritishjim wrote:
| The sentence you quoted clearly meant, from the context,
| "clearly we have nothing [to learn from the opinions of these
| former employees]". Nothing in your comment is really
| anything to do with that.
| tomrod wrote:
| Triangulation versus new signal.
| sundvor wrote:
| "Everyone" piles on Tesla all the time; a worthwhile
| comparison would be how Tesla roll out vehicle updates.
|
| Sometimes people are up in arms "where's my next version" (eg
| when adaptive headlights were introduced), yet Tesla
| prioritise a safe, slow roll out. Sometimes the updates fail
| (and get resolved individually), but never on a global scale.
| (None experienced myself, as a TM3 owner on the "advanced"
| update preference).
|
| I understand the premise of Crowdstrike's model is to have up
| to date protection everywhere but clearly they didn't think
| this through enough times, if at all.
| kccqzy wrote:
| You can also say the same thing about Google. Just go look
| at the release notes on the App Store for the Google Home
| app. There was a period of more than six months where every
| single release said "over the next few weeks we're rolling
| out the totally redesigned Google Home app: new easier to
| navigate 5-tab layout."
|
| When I read the same release notes so often I begin to
| question whether this redesign is really taking more than
| six months to roll out. And then I read the Sonos app
| disaster and I thought that was the other extreme.
| sonofhans wrote:
| If design isn't involved in QC you're not doing QC very well.
| If design isn't plugged into development process enough to
| understand QC then you're not doing design very well.
| tw04 wrote:
| Why would a UX designer be involved in any way, shape, or
| form in kernel level code patches? They would literally never
| ship an update if they had that many hands in the pot for
| something completely unrelated. Should they also have their
| sales reps and marketing folks pre-brief before they make any
| code changes?
| darby_nine wrote:
| I feel like crowdstrike is perfectly capable of mounting its
| own defense
| JumpCrisscross wrote:
| > _This type of article - built upon disgruntled former
| employees - is worth about as much as the apology GrubHub gift
| card_
|
| To you and me, maybe. To the insurers and airlines paying out
| over the problem, maybe not.
| bdcravens wrote:
| I'm going with principle of least astonishment, where
| productivity is more highly valued in most companies than
| quality control.
| insane_dreamer wrote:
| > So basically we have nothing.
|
| Except the fact that CrowdStrike fucked up the one thing they
| weren't supposed to fuck up.
|
| So yeah, at this point I'm taking the ex-employees' word,
| because it confirms the results that we already know -- there
| is no way that update could have gone out had there been proper
| "safety first" protocols in place and CrowdStrike was
| "meticulous".
| theideaofcoffee wrote:
| I just don't think a company like Crowdstrike has a leg to
| stand on when leveling the "disgruntled" label in the face of
| their, let's face it, astoundingly epic fuck up. It's the
| disgruntled employees that I think would have the most clear
| picture of what was going on, regardless of them being in QA/QC
| or not because they, at that point, don't really care any more
| and will be more forthright with their thoughts. I'd certainly
| trust their info more than a company yes-man which is probably
| where some of that opposing messaging came from.
| Sarkie wrote:
| It was shown in the RCA that their QA processes were shit
| monksy wrote:
| No shit.
| nine_zeros wrote:
| Typical of tech companies these days. Quality is considered
| immaterial - or worse - put on low level managers and engineers
| who don't have the time to clearly examine quality and good roll
| out practices.
|
| C-Suite and investors don't seem to want to spend on quality.
| They should just price in that their stock investment could
| collapse any day.
| dgfitz wrote:
| I wonder if their QA budget went to DEI initiatives, or both
| were just vapid proclivities.
|
| https://www.crowdstrike.com/careers/diversity-equity-and-inc...
| 0xbadcafebee wrote:
| Critical software infrastructure should be regulated the way
| critical physical infrastructure is. We don't trust the people
| who make buildings and bridges to "do the right thing" - we
| mandate it with regulations and inspections. (When your software
| not working strands millions of people around the globe, it's
| critical) And this was just a regular old "accident"; imagine the
| future, when a war has threat actors _trying_ to knock things
| out.
| theideaofcoffee wrote:
| "We can't regulate the industry because then the US loses to
| China" or "regulation will kill the US competitive advantage!"
| responses I've had to suggesting the same and I just can't. But
| I agree with you 100%. If it's safety critical, it should be
| under even more scrutiny than other things, it shouldn't be
| left to self-regulating QA-like processes in profit seeking
| companies and has to have a bit more scrutiny before the big
| button gets pressed.
| janalsncm wrote:
| > then the US loses to China
|
| Yeah it makes no sense. Was the US not losing to China when
| we own-goaled the biggest cybersecurity incident in history?
| Zigurd wrote:
| Not to mention humans going extinct because regulators are to
| blame for there being no city on Mars. Because that's
| definitely the reason there's no city on Mars.
| pclmulqdq wrote:
| Everything that we know about CrowdStrike stinks of Knight
| Capital to me. A minor culture problem snowballed into complete
| dysfunction, eventually resulting in a company-ending bug.
| bb88 wrote:
| Most interesting quote in the article: "It was hard to get people
| to do sufficient testing sometimes," said Preston Sego, who worked
| at CrowdStrike from 2019 to 2023. His job was to review the tests
| completed by user experience developers that alerted engineers to
| bugs before proposed coding changes were released to customers.
| Sego said he was fired in February 2023 as an "insider threat"
| after he criticized the company's return-to-work policy on an
| internal Slack channel.
|
| Okay clearly that company has a culture issue. Imagine
| criticizing a policy and then getting labeled "insider threat".
| seanw444 wrote:
| And everybody gasped in surprise.
| tamimio wrote:
| I think the whole world knew that already.
| insane_dreamer wrote:
| > CrowdStrike disputed much of Semafor's reporting
|
| I expect some ex-employees to be disgruntled and present things
| in a way that makes CrowdStrike look bad. That happens with every
| company.
|
| BUT, CrowdStrike has ZERO credibility at this point. I don't
| believe a word they say.
| Zigurd wrote:
| At some companies, like Boeing, the shorter list would be the
| gruntled employees.
| chaps wrote:
| Worked on a team that deployed crowdstrike agents to organize
| and... Yeah. One of the biggest problems we had was that the
| daemon would log a massive amount of stuff... But had no config
| for it to stop or reduce it.
| st3fan wrote:
| Found out that the CrowdStrike Mac agent (Falcon) sends all your
| secrets from environment variables to their cloud hosted SIEM. In
| plain text.
|
| Anyone with access to your CS SIEM can search for GitHub, aws,
| etc creds. Anything your devs, ops and sec teams use on their
| Macs.
|
| Only the Mac version does this. There is no way to disable this
| behaviour or a way to redact things.
|
| Another really odd design decision. They probably have many many
| thousands of plain text secrets from their customers stored in
| their SIEM.
| x3n0ph3n3 wrote:
| Can you provide some more info on this? How do you know? Is
| this documented somewhere?
|
| I'm sure this is going to raise red-flags in my IT department.
| st3fan wrote:
| Ask them to search for the usual env var names like
| GITHUB_TOKEN or AWS_ACCESS_KEY_ID.
| apimade wrote:
| Is this really a criticism? Because this has been the case
| forever with all security and SIEM tools. It's one of the
| reasons why the SIEM is one of the most locked down pieces of
| software in the business.
|
| Realistically, secrets alone shouldn't allow an attacker access
| - they should need access to infrastructure or certificates
| on machines as well. But unfortunately that's not the case for
| many SaaS vendors.
| st3fan wrote:
| I think some configurability would be great. I would like to
| provide an allow list or the ability to redact.
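
A minimal sketch of the allow-list redaction asked for above, assuming
the environment is available as a name-to-value mapping before it is
shipped anywhere. The function name `redact_env` and the `[REDACTED]`
placeholder are hypothetical; this is not a Falcon feature or API.

```python
REDACTED = "[REDACTED]"


def redact_env(env, allow):
    """Return a copy of `env` in which only names listed in `allow` keep their values."""
    return {
        name: (value if name in allow else REDACTED)
        for name, value in env.items()
    }
```

For example, `redact_env({"PATH": "/usr/bin", "GITHUB_TOKEN": "secret"}, {"PATH"})`
keeps `PATH` and masks `GITHUB_TOKEN`. Masking by default, rather than
matching known secret names, means a newly introduced variable stays
hidden until someone explicitly allows it.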
| avree wrote:
| "Speed was the most important thing," said Jeff Gardner, a
| senior user experience designer at CrowdStrike who said he was
| laid off in January 2023 after two years at the company. "Quality
| control was not really part of our process or our conversation."
|
| Their 'expert' on engineering process is a senior UX designer?
| Somehow, I doubt they were very close to the kernel patch
| deployment process.
___________________________________________________________________
(page generated 2024-09-13 23:00 UTC)