[HN Gopher] CrowdStrike ex-employees: 'Quality control was not p...
       ___________________________________________________________________
        
       CrowdStrike ex-employees: 'Quality control was not part of our
       process'
        
       Author : everybodyknows
       Score  : 119 points
       Date   : 2024-09-13 20:17 UTC (2 hours ago)
        
 (HTM) web link (www.semafor.com)
 (TXT) w3m dump (www.semafor.com)
        
       | Alupis wrote:
       | > "Speed was the most important thing," said Jeff Gardner, a
       | senior user experience designer at CrowdStrike who said he was
       | laid off in January 2023 after two years at the company. "Quality
       | control was not really part of our process or our conversation."
       | 
       | This type of article - built upon disgruntled former employees -
       | is worth about as much as the apology GrubHub gift card.
       | 
       | Look, I think just as poorly about CrowdStrike as anyone else out
       | there... but you can find someone to say anything, especially
       | when they have an axe to grind and a chance at some spotlight.
       | Not to mention this guy was a designer and wouldn't be involved
       | in QC anyway.
       | 
       | > Of the 24 former employees who spoke to Semafor, 10 said they
       | were laid off or fired and 14 said they left on their own. One
       | was at the company as recently as this summer. Three former
       | employees disagreed with the accounts of the others. Joey
       | Victorino, who spent a year at the company before leaving in
       | 2023, said CrowdStrike was "meticulous about everything it was
       | doing."
       | 
       | So basically we have nothing.
        
         | nyc_data_geek1 wrote:
         | >>So basically we have nothing.
         | 
         | Except the biggest IT outage ever. And a postmortem showing
         | their validation checks were insufficient. And a rollout
         | process that did not stage at all, just rawdogged straight to
         | global prod. And no lab where the new code was actually
         | installed and run prior to global rawdogging.
         | 
         | I'd say there's smoke, and numerous accounts of fire, which
         | this can be taken in the context of.
        
           | mewpmewp2 wrote:
           | There definitely was a huge outage, but based on the given
           | information we still can't know for sure how much they
           | invested in testing and quality control.
           | 
           | There's always a chance of failure even for the most
           | meticulous companies.
           | 
           | Now I'm not defending or excusing the company, but a singular
           | event like this can happen to anyone and nothing is 100%.
           | 
           | If thorough investigation revealed poor quality control
           | investment compared to what would be appropriate for a
           | company like this, then we can say for sure.
        
             | daedrdev wrote:
             | Two things are clear though
             | 
             | Nobody ran this update
             | 
             | The update was pushed globally to all computers
             | 
             | With that alone we know they have failed the simplest of
             | quality control methods for a piece of software as
             | widespread as theirs. This is even excluding that there
             | should have been some kind of error handling to allow the
             | computer to boot if they did push bad code.
        
               | busterarm wrote:
               | Also it's the _second_ time that they had done this in a
               | few short months.
               | 
               | They had bricked Linux hosts earlier with a similar type
               | of update.
               | 
               | So we also know that they don't learn from their
               | mistakes.
        
               | rblatz wrote:
               | The blame for the Linux situation isn't as clear-cut as
               | you make it out to be. Red Hat rolled out a breaking
               | change to BPF which was likely a regression. That wasn't
               | caused directly by a CrowdStrike update.
        
             | idkwhatimdoin wrote:
             | > If thorough investigation revealed poor quality control
             | investment compared to what would be appropriate for a
             | company like this, then we can say for sure.
             | 
             | We don't really need that thorough of an investigation.
             | They had no staged deploys when servicing millions of
             | machines. That alone is enough to say they're not running
             | the company correctly.
        
               | dartos wrote:
               | Totally agree.
               | 
               | I'd consider staggering a rollout to be the absolute
               | basics of due diligence.
               | 
               | Especially when you're building a critical part of
               | millions of customer machines.
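
A minimal sketch of the staggered rollout discussed in this subthread, assuming a fleet identified by host ID strings. Everything here is hypothetical and for illustration only: the stage fractions, the `deploy` and `healthy` callbacks, and the failure threshold are assumptions, not anything CrowdStrike is known to use.

```python
import hashlib

# Hypothetical stages: cumulative fraction of the fleet eligible at each step.
STAGES = [0.01, 0.10, 0.50, 1.00]


def bucket(host_id: str) -> float:
    """Map a host ID to a stable value in [0, 1)."""
    digest = hashlib.sha256(host_id.encode("utf-8")).digest()
    # 48 bits divide exactly in a float, so the result is always below 1.0.
    return int.from_bytes(digest[:6], "big") / 2**48


def hosts_for_stage(hosts, stage):
    """Return the hosts eligible at the given stage index."""
    limit = STAGES[stage]
    return [h for h in hosts if bucket(h) < limit]


def staged_rollout(hosts, deploy, healthy, max_failure_rate=0.01):
    """Deploy stage by stage and halt when a batch looks unhealthy.

    `deploy(host)` and `healthy(host)` are caller-supplied callbacks
    (assumed for this sketch). Returns (stages_completed, updated_hosts).
    """
    updated = set()
    for stage in range(len(STAGES)):
        batch = [h for h in hosts_for_stage(hosts, stage) if h not in updated]
        for host in batch:
            deploy(host)
            updated.add(host)
        failures = sum(1 for h in batch if not healthy(h))
        if batch and failures / len(batch) > max_failure_rate:
            return stage, updated  # halted before reaching the whole fleet
    return len(STAGES), updated
```

Hashing the host ID keeps each machine in the same bucket across releases, so a bad update that fails its health check in the first batch stops there instead of reaching every machine.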
        
               | mewpmewp2 wrote:
               | I would say that canary release is an absolute must 100%.
               | Except I can think of cases where it might still not be
               | enough. So, I just don't feel comfortable judging them
               | out of the box. Does all the evidence seem to point
               | against them? For sure. But I just don't feel comfortable
               | giving that final verdict without knowing for sure.
               | 
               | Specifically because this is about fighting against
               | malicious actors, where time can be of essence to deploy
               | some sort of protection against a novel threat.
               | 
               | If there's deadlines that you can go over, and nothing
               | bad happens, for sure. Always have canary releases, and
               | perfect QA, monitoring everything thoroughly, but I'm
               | just saying, there can be cases where damage that could
               | be done if you don't act fast enough, is just so much
               | worse.
               | 
               | And I don't know that it wasn't the case for them. I just
               | don't know.
        
               | dartos wrote:
               | In this case, they pretty much caused a worst case
               | scenario...
        
           | quietbritishjim wrote:
           | The sentence you quoted clearly meant, from the context,
           | "clearly we have nothing [to learn from the opinions of these
           | former employees]". Nothing in your comment is really
           | anything to do with that.
        
             | tomrod wrote:
             | Triangulation versus new signal.
        
           | sundvor wrote:
           | "Everyone" piles on Tesla all the time; a worthwhile
           | comparison would be how Tesla roll out vehicle updates.
           | 
           | Sometimes people are up in arms "where's my next version" (e.g.
           | when adaptive headlights were introduced), yet Tesla
           | prioritise a safe, slow roll out. Sometimes the updates fail
           | (and get resolved individually), but never on a global scale.
           | (None experienced myself, as a TM3 owner on the "advanced"
           | update preference).
           | 
           | I understand the premise of Crowdstrike's model is to have up
           | to date protection everywhere but clearly they didn't think
           | this through enough times, if at all.
        
             | kccqzy wrote:
             | You can also say the same thing about Google. Just go look
             | at the release notes on the App Store for the Google Home
             | app. There was a period of more than six months where every
             | single release said "over the next few weeks we're rolling
             | out the totally redesigned Google Home app: new easier to
             | navigate 5-tab layout."
             | 
             | When I read the same release notes so often I begin to
             | question whether this redesign is really taking more than
             | six months to roll out. And then I read the Sonos app
             | disaster and I thought that was the other extreme.
        
         | sonofhans wrote:
         | If design isn't involved in QC you're not doing QC very well.
         | If design isn't plugged into development process enough to
         | understand QC then you're not doing design very well.
        
           | tw04 wrote:
           | Why would a UX designer be involved in any way, shape, or
           | form in kernel level code patches? They would literally never
           | ship an update if they had that many hands in the pot for
           | something completely unrelated. Should they also have their
           | sales reps and marketing folks pre-brief before they make any
           | code changes?
        
         | darby_nine wrote:
         | I feel like crowdstrike is perfectly capable of mounting its
         | own defense
        
         | JumpCrisscross wrote:
         | > _This type of article - built upon disgruntled former
         | employees - is worth about as much as the apology GrubHub gift
         | card_
         | 
         | To you and me, maybe. To the insurers and airlines paying out
         | over the problem, maybe not.
        
         | bdcravens wrote:
         | I'm going with principle of least astonishment, where
         | productivity is more highly valued in most companies than
         | quality control.
        
         | insane_dreamer wrote:
         | > So basically we have nothing.
         | 
         | Except the fact that CrowdStrike fucked up the one thing they
         | weren't supposed to fuck up.
         | 
         | So yeah, at this point I'm taking the ex-employees' word,
         | because it confirms the results that we already know -- there
         | is no way that update could have gone out had there been proper
         | "safety first" protocols in place and CrowdStrike was
         | "meticulous".
        
         | theideaofcoffee wrote:
         | I just don't think a company like Crowdstrike has a leg to
         | stand on when leveling the "disgruntled" label in the face of
         | their, let's face it, astoundingly epic fuck up. It's the
         | disgruntled employees that I think would have the most clear
         | picture of what was going on, regardless of them being in QA/QC
         | or not because they, at that point, don't really care any more
         | and will be more forthright with their thoughts. I'd certainly
         | trust their info more than a company yes-man which is probably
         | where some of that opposing messaging came from.
        
       | Sarkie wrote:
       | It was shown in the RCA that their QA processes were shit
        
       | monksy wrote:
       | No shit.
        
       | nine_zeros wrote:
       | Typical of tech companies these days. Quality is considered
       | immaterial - or worse - put on low level managers and engineers
       | who don't have the time to clearly examine quality and good roll
       | out practices.
       | 
       | C-Suite and investors don't seem to want to spend on quality.
       | They should just price in that their stock investment could
       | collapse any day.
        
         | dgfitz wrote:
         | I wonder if their QA budget went to DEI initiatives, or both
         | were just vapid proclivities.
         | 
         | https://www.crowdstrike.com/careers/diversity-equity-and-inc...
        
       | 0xbadcafebee wrote:
       | Critical software infrastructure should be regulated the way
       | critical physical infrastructure is. We don't trust the people
       | who make buildings and bridges to "do the right thing" - we
       | mandate it with regulations and inspections. (When your software
       | not working strands millions of people around the globe, it's
       | critical) And this was just a regular old "accident"; imagine the
       | future, when a war has threat actors _trying_ to knock things
       | out.
        
         | theideaofcoffee wrote:
         | "We can't regulate the industry because then the US loses to
         | China" or "regulation will kill the US competitive advantage!"
         | responses I've had to suggesting the same and I just can't. But
         | I agree with you 100%. If it's safety critical, it should be
         | under even more scrutiny than other things, it shouldn't be
         | left to self-regulating QA-like processes in profit seeking
         | companies and has to have a bit more scrutiny before the big
         | button gets pressed.
        
           | janalsncm wrote:
           | > then the US loses to China
           | 
           | Yeah it makes no sense. Was the US not losing to China when
           | we own-goaled the biggest cybersecurity incident in history?
        
           | Zigurd wrote:
           | Not to mention humans going extinct because regulators are to
           | blame for there being no city on Mars. Because that's
           | definitely the reason there's no city on Mars.
        
       | pclmulqdq wrote:
       | Everything that we know about CrowdStrike stinks of Knight
       | Capital to me. A minor culture problem snowballed into complete
       | dysfunction, eventually resulting in a company-ending bug.
        
       | bb88 wrote:
       | Most interesting quote in the article:
       | 
       | > "It was hard to get people to do sufficient testing
       | sometimes," said Preston Sego, who worked at CrowdStrike from
       | 2019 to 2023. His job was to review the tests completed by user
       | experience developers that alerted engineers to bugs before
       | proposed coding changes were released to customers. Sego said he
       | was fired in February 2023 as an "insider threat" after he
       | criticized the company's return-to-work policy on an internal
       | Slack channel.
       | 
       | Okay clearly that company has a culture issue. Imagine
       | criticizing a policy and then getting labeled "insider threat".
        
       | seanw444 wrote:
       | And everybody gasped in surprise.
        
       | tamimio wrote:
       | I think the whole world knew that already.
        
       | insane_dreamer wrote:
       | > CrowdStrike disputed much of Semafor's reporting
       | 
       | I expect some ex-employees to be disgruntled and present things
       | in a way that makes CrowdStrike look bad. That happens with every
       | company.
       | 
       | BUT, CrowdStrike has ZERO credibility at this point. I don't
       | believe a word they say.
        
         | Zigurd wrote:
         | At some companies, like Boeing, the shorter list would be the
         | gruntled employees.
        
       | chaps wrote:
       | Worked on a team that deployed crowdstrike agents to organize
       | and... Yeah. One of the biggest problems we had was that the
       | daemon would log a massive amount of stuff... But had no config
       | for it to stop or reduce it.
        
       | st3fan wrote:
       | Found out that the CrowdStrike Mac agent (Falcon) sends all your
       | secrets from environment variables to their cloud hosted SIEM. In
       | plain text.
       | 
       | Anyone with access to your CS SIEM can search for GitHub, aws,
       | etc creds. Anything your devs, ops and sec teams use on their
       | Macs.
       | 
       | Only the Mac version does this. There is no way to disable this
       | behaviour or a way to redact things.
       | 
       | Another really odd design decision. They probably have many many
       | thousands of plain text secrets from their customers stored in
       | their SIEM.
        
         | x3n0ph3n3 wrote:
         | Can you provide some more info on this? How do you know? Is
         | this documented somewhere?
         | 
         | I'm sure this is going to raise red-flags in my IT department.
        
           | st3fan wrote:
           | Ask them to search for the usual env var names like
           | GITHUB_TOKEN or AWS_ACCESS_KEY_ID.
        
         | apimade wrote:
         | Is this really a criticism? Because this has been the case
         | forever with all security and SIEM tools. It's one of the
         | reasons why the SIEM is one of the most locked-down pieces of
         | software in the business.
         | 
         | Realistically, secrets alone shouldn't allow an attacker access
         | - they should need access to infrastructure or certificates
         | on machines as well. But unfortunately that's not the case for
         | many SaaS vendors.
        
           | st3fan wrote:
           | I think some configurability would be great. I would like to
           | provide an allow list or the ability to redact.
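
A rough sketch of the kind of redaction asked for above, assuming telemetry is collected as a name-to-value mapping before it leaves the machine. The pattern list, the `redact_env` function, and the `allow` parameter are all hypothetical; nothing here reflects the Falcon agent's real configuration or API.

```python
import fnmatch

# Hypothetical deny patterns for variable names that usually hold credentials.
SENSITIVE_PATTERNS = ["*TOKEN*", "*SECRET*", "*PASSWORD*", "AWS_*", "*_KEY"]


def redact_env(env, allow=(), patterns=SENSITIVE_PATTERNS):
    """Return a copy of `env` with sensitive values replaced.

    Names matching any pattern are redacted unless explicitly listed
    in `allow` (an assumed operator-supplied allow list).
    """
    redacted = {}
    for name, value in env.items():
        sensitive = any(fnmatch.fnmatchcase(name.upper(), p) for p in patterns)
        if sensitive and name not in allow:
            redacted[name] = "[REDACTED]"
        else:
            redacted[name] = value
    return redacted
```

For example, `redact_env({"GITHUB_TOKEN": "abc", "PATH": "/usr/bin"})` keeps `PATH` as is and replaces the token value with `[REDACTED]`. Name-based matching is only a heuristic and would miss secrets stored under innocuous names.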
        
       | avree wrote:
       | > "Speed was the most important thing," said Jeff Gardner, a
       | senior user experience designer at CrowdStrike who said he was
       | laid off in January 2023 after two years at the company. "Quality
       | control was not really part of our process or our conversation."
       | 
       | Their 'expert' on engineering process is a senior UX designer?
       | Somehow, I doubt they were very close to the kernel patch
       | deployment process.
        
       ___________________________________________________________________
       (page generated 2024-09-13 23:00 UTC)