[HN Gopher] CrowdStrike ex-employees: 'Quality control was not p...
___________________________________________________________________
CrowdStrike ex-employees: 'Quality control was not part of our
process'
Author : everybodyknows
Score : 528 points
Date : 2024-09-13 20:17 UTC (1 day ago)
(HTM) web link (www.semafor.com)
(TXT) w3m dump (www.semafor.com)
| Alupis wrote:
| > "Speed was the most important thing," said Jeff Gardner, a
| senior user experience designer at CrowdStrike who said he was
| laid off in January 2023 after two years at the company. "Quality
| control was not really part of our process or our conversation."
|
| This type of article - built upon disgruntled former employees -
| is worth about as much as the apology GrubHub gift card.
|
| Look, I think just as poorly about CrowdStrike as anyone else out
| there... but you can find someone to say anything, especially
| when they have an axe to grind and a chance at some spotlight.
| Not to mention this guy was a designer and wouldn't be involved
| in QC anyway.
|
| > Of the 24 former employees who spoke to Semafor, 10 said they
| were laid off or fired and 14 said they left on their own. One
| was at the company as recently as this summer. Three former
| employees disagreed with the accounts of the others. Joey
| Victorino, who spent a year at the company before leaving in
| 2023, said CrowdStrike was "meticulous about everything it was
| doing."
|
| So basically we have nothing.
| nyc_data_geek1 wrote:
| >>So basically we have nothing.
|
| Except the biggest IT outage ever. And a postmortem showing
| their validation checks were insufficient. And a rollout
| process that did not stage at all, just rawdogged straight to
| global prod. And no lab where the new code was actually
| installed and run prior to global rawdogging.
|
| I'd say there's smoke, and numerous accounts of fire; this
| reporting can be read in that context.
| mewpmewp2 wrote:
| There definitely was a huge outage, but based on the given
| information we still can't know for sure how much they
| invested in testing and quality control.
|
| There's always a chance of failure even for the most
| meticulous companies.
|
| Now I'm not defending or excusing the company, but a singular
| event like this can happen to anyone and nothing is 100%.
|
| If a thorough investigation revealed quality control investment
| that was poor compared to what would be appropriate for a
| company like this, then we could say so for sure.
| daedrdev wrote:
| Two things are clear though
|
| Nobody ran this update
|
| The update was pushed globally to all computers
|
| With that alone we know they failed the simplest of quality
| control methods for a piece of software as widespread as
| theirs. And that's before considering that there should have
| been some kind of error handling to allow the computer to boot
| even if they did push bad code.
| busterarm wrote:
| Also it's the _second_ time they had done this in a few
| short months.
|
| They had previously bricked Linux hosts with a similar type
| of update.
|
| So we also know that they don't learn from their
| mistakes.
| rblatz wrote:
| The blame for the Linux situation isn't as clear cut as
| you make it out to be. Red Hat rolled out a breaking
| change to BPF which was likely a regression. That wasn't
| caused directly by a CrowdStrike update.
| IcyWindows wrote:
| At least one of the incidents involved Debian machines,
| so I don't understand how Red Hat's change would be
| related.
| rblatz wrote:
| Sorry, that's correct, it was Debian, but Debian did apply
| a RHEL-specific patch to their kernel. That's the
| relationship to Red Hat.
| busterarm wrote:
| It's not about the blame, it's about how you respond to
| incidents and what mitigation steps you take. Even if
| they aren't directly responsible, they clearly didn't
| take proper mitigation steps when they encountered the
| problem.
| roblabla wrote:
| How do you mitigate the OS breaking an API below you in
| an update? Test the updates before they come out? Even if
| you could, you'd still need to deploy a fix before the OS
| update hits the customers, and anyone that didn't update
| would still be affected.
|
| The Linux case is just _very_ different from the Windows
| case. The mitigation steps that could have been taken to
| avoid the Linux problem would not have helped with the
| Windows outage anyway; the problems are just too
| different. The Linux incident was about an OS update
| breaking their program, while the Windows issue was about
| a configuration change they made triggering crashes in
| their driver.
| busterarm wrote:
| You're missing the forest for the trees.
|
| It's: a) an update, b) pushed out globally without proper
| testing, c) that bricked the OS.
|
| It's an obvious failure mode that a proper incident
| response process would have surfaced from that specific
| incident and flagged as needing mitigation.
|
| I do this specific thing for a living. You don't just
| address the exact failure that happened but try to
| identify classes of risk in your platform.
|
| > Even if you could, you'd still need to deploy a fix
| before the OS update hits the customers, and anyone that
| didn't update would still be affected.
|
| And yet the problem would still only affect CrowdStrike's
| paying customers. No matter how much you blame upstream,
| your paying customers are only ever going to blame their
| vendor, because the vendor had the discretion to test and
| not release the update. As their customers should.
| ScottBurson wrote:
| > there should have been some kind of error handling
|
| This is the point I would emphasize. A kernel module that
| parses configuration files must defend itself against a
| failed parse.
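|
| A minimal sketch of the kind of guard meant here, in C (the
| struct and names are illustrative, not CrowdStrike's code):
|
|       #include <stdbool.h>
|       #include <stddef.h>
|
|       #define MAX_FIELDS 20  /* capacity actually allocated */
|
|       struct rule {
|           const char *fields[MAX_FIELDS];
|           size_t      nfields;
|       };
|
|       /* Refuse any field index beyond the reserved space
|        * instead of reading or writing past the array's end. */
|       static bool rule_set_field(struct rule *r, size_t idx,
|                                  const char *value)
|       {
|           if (idx >= MAX_FIELDS)
|               return false;  /* failed parse: reject the file */
|           r->fields[idx] = value;
|           if (idx + 1 > r->nfields)
|               r->nfields = idx + 1;
|           return true;
|       }
|
| On a false return, the driver should skip the rule (or fall
| back to the previous content file) rather than take the
| machine down with it.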
| hn_throwaway_99 wrote:
| While I agree with this, from a software engineering
| perspective I think it's more useful to look at the
| lessons learned. I think it's too easy to just throw
| "Crowdstrike is a bunch of idiots" against the wall, and
| I don't think that's true.
|
| It's clear to me that CrowdStrike saw this as a _data_
| update vs. a _code_ update, and that they had much more
| stringent QA procedures for code updates than they did for
| data updates. It's very easy for organizations to lull
| themselves into this false sense of security when they
| make these kinds of delineations (sometimes even
| subconsciously at first), and then over time they lose
| sight of the fact that a bad data update can be just as
| catastrophic as a bad code update. I've seen shades of
| this issue elsewhere many times.
|
| So all that said, I think your point is valid. I know
| CrowdStrike had the posture that they wanted to get
| vulnerability files deployed globally as fast as possible
| upon a new threat detection in order to protect their
| clients, but it wouldn't have been that hard to build some
| simple checks into their release process (first deploy to
| a test bed, then deploy globally) even if they felt a
| slower staged rollout would have left too many of their
| clients unprotected for too long.
|
| Hindsight is always 20/20, but I think the most important
| lesson is that this code vs data dichotomy can be
| dangerous if the implications are not fully understood.
| llm_trw wrote:
| I'm sorry but there comes a point where you have to call
| a spade a spade.
|
| When you have the trifecta of regex, *argv packing and
| uninitialized memory, you're reaching levels of
| incompetence that require being actively malicious and
| not just stupid.
| abraae wrote:
| > It's clear to me that CrowdStrike saw this as a data
| update vs. a code update, and that they had much more
| stringent QA procedures for code updates than they did
| for data updates.
|
| It cannot have been a surprise to CrowdStrike that
| pushing bad data had the potential to bork the target
| computer. So if they had such an attitude, that would
| indicate striking incompetence. So perhaps you are right.
| Comma2976 wrote:
| Crowdstrike is a bunch of idiots
| mavhc wrote:
| If they weren't idiots they wouldn't be parsing data in
| the kernel level module
| GuB-42 wrote:
| It could have been OK to expedite data updates, provided
| the code treated configuration data as untrusted input,
| as if it could have been written by an attacker. That
| means fuzz testing and all that.
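|
| A minimal fuzzing sketch (libFuzzer style; the toy parser
| is an illustrative stand-in for the real one), built with
| clang -fsanitize=fuzzer,address:
|
|       #include <stddef.h>
|       #include <stdint.h>
|
|       /* Toy stand-in for the config parser under test. */
|       static int parse_channel_file(const uint8_t *buf,
|                                     size_t len)
|       {
|           size_t fields = 0;
|           for (size_t i = 0; i < len; i++)
|               if (buf[i] == ',')
|                   fields++;
|           return fields > 20 ? -1 : 0;  /* reject oversize */
|       }
|
|       /* libFuzzer feeds this arbitrary byte strings; the
|        * sanitizers flag any crash, overflow, or bad read. */
|       int LLVMFuzzerTestOneInput(const uint8_t *data,
|                                  size_t size)
|       {
|           parse_channel_file(data, size);
|           return 0;
|       }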
|
| Obviously the system wasn't very robust, as a simple,
| within-spec change could break it. A company like
| CrowdStrike, which routinely deals with memory exploits
| and claims to do "zero trust", should know better.
|
| As is often the case, there is a good chance it is an
| organizational problem. The team in charge of the parsing
| expected that the team in charge of the data did their
| tests and made sure the files weren't broken, while on the
| other side, they expected the parser to be robust and
| that, at worst, a quick rollback could fix the problem.
| This may indeed be the sign of a broken company culture,
| which would give some credit to the ex-employees.
| Izkata wrote:
| > Obviously the system wasn't very robust, as a simple,
| within-spec change could break it.
|
| From my limited understanding, the file was corrupted in
| some way. Lots of NULL bytes, something like that.
| GuB-42 wrote:
| From the report, it seems the problem is that they added
| a feature that could use 21 arguments, but there was only
| enough space for 20. Until then, no configuration had used
| all 21 (the last one was a wildcard regex, which
| apparently didn't count), but when one finally did, it
| caused an out-of-bounds read and crashed.
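|
| A toy illustration of that mismatch (the counts come from
| the public reports; the code is not CrowdStrike's):
|
|       #include <stdio.h>
|
|       #define ALLOCATED_PARAMS 20
|
|       int main(void)
|       {
|           const char *params[ALLOCATED_PARAMS] = {0};
|           size_t supplied = 21;  /* template uses one more */
|
|           /* params[20] is out of bounds: undefined behavior
|            * in user space, a machine-halting fault in a
|            * boot-time kernel driver. */
|           for (size_t i = 0; i < supplied; i++)
|               printf("param %zu = %p\n", i,
|                      (void *)params[i]);
|           return 0;
|       }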
| acdha wrote:
| That rumor floated around Twitter but the company quickly
| disavowed it. The problem was that they added an extra
| parameter to a common function but never tested it with a
| non-wildcard value, revealing a gap in their code
| coverage review:
|
| https://www.crowdstrike.com/wp-
| content/uploads/2024/08/Chann...
| RaftPeople wrote:
| > _It's clear to me that CrowdStrike saw this as a data
| update vs. a code update_
|
| > _Hindsight is always 20/20, but I think the most
| important lesson is that this code vs data dichotomy can
| be dangerous if the implications are not fully
| understood._
|
| But this isn't some new condition; the industry has been
| dealing with it for many, many decades (i.e. code vs.
| config vs. data vs. any other type of change to a system,
| etc.).
|
| There are known strategies to reduce the risk.
| idkwhatimdoin wrote:
| > If thorough investigation revealed poor quality control
| investment compared to what would be appropriate for a
| company like this, then we can say for sure.
|
| We don't really need that thorough of an investigation.
| They had no staged deploys when servicing millions of
| machines. That alone is enough to say they're not running
| the company correctly.
| dartos wrote:
| Totally agree.
|
| I'd consider staggering a rollout to be the absolute
| basics of due diligence.
|
| Especially when you're building a critical part of
| millions of customer machines.
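|
| The mechanics are small; a sketch of the deterministic
| bucketing a staggered rollout typically uses (the host ID
| and stage percentages are illustrative):
|
|       #include <stdint.h>
|       #include <stdio.h>
|
|       /* FNV-1a: a cheap, stable hash, so each host lands
|        * in the same bucket on every evaluation. */
|       static uint32_t fnv1a(const char *s)
|       {
|           uint32_t h = 2166136261u;
|           while (*s) {
|               h ^= (uint8_t)*s++;
|               h *= 16777619u;
|           }
|           return h;
|       }
|
|       /* A host gets the update only once the rollout
|        * percentage reaches its bucket, e.g. stepping
|        * 1% -> 10% -> 50% -> 100% with a health check
|        * gating each step. */
|       static int in_rollout(const char *host_id,
|                             unsigned percent)
|       {
|           return fnv1a(host_id) % 100 < percent;
|       }
|
|       int main(void)
|       {
|           printf("%d\n",
|                  in_rollout("host-0042.example.com", 10));
|           return 0;
|       }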
| mewpmewp2 wrote:
| I would say that a canary release is an absolute must,
| 100%. Except I can think of cases where it might still not
| be enough. So, I just don't feel comfortable judging them
| out of hand. Does all the evidence seem to point against
| them? For sure. But I just don't feel comfortable giving
| that final verdict without knowing for sure.
|
| Specifically because this is about fighting against
| malicious actors, where time can be of the essence in
| deploying some sort of protection against a novel threat.
|
| If there are deadlines you can miss without anything bad
| happening, then sure: always have canary releases, perfect
| QA, and thorough monitoring. But I'm just saying there can
| be cases where the damage done by not acting fast enough
| is just so much worse.
|
| And I don't know that it wasn't the case for them. I just
| don't know.
| dartos wrote:
| In this case, they pretty much caused a worst case
| scenario...
| acdha wrote:
| > Specifically because this is about fighting against
| malicious actors, where time can be of essence to deploy
| some sort of protection against a novel threat.
|
| This is severely overstating the problem: an extra few
| minutes is not going to make the difference in whether
| their customers are compromised. Most of the devices they
| run on are never compromised, because anyone remotely
| serious has defense in depth.
|
| If that were true, or even close to true, it would make
| the criticism stronger rather than weaker. If time is of
| the essence, you invest in things like reviewing test
| coverage (their most glaring lapse), fuzz testing, and
| common reliability engineering techniques like having the
| system roll back to the last known good configuration
| after it fails to load. We think of progressive rollouts
| as common now, but they became mainstream in large part
| because the Google Chrome team realized rapid updates were
| important and then asked what they needed to do to make
| them safe. CrowdStrike's report suggests that they wanted
| rapid but weren't willing to invest in the implementation,
| because that isn't a customer-visible feature - until it
| very painfully became one.
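|
| The "last known good" trick is simple enough to sketch
| (the paths and loader are illustrative stand-ins):
|
|       #include <stdio.h>
|
|       /* Stub standing in for the real content loader. */
|       static int load_config(const char *path)
|       {
|           printf("loading %s\n", path);
|           return 0;
|       }
|
|       int main(void)
|       {
|           const char *mark = "/var/lib/sensor/loading";
|           FILE *f = fopen(mark, "r");
|
|           if (f) {  /* prior attempt never finished */
|               fclose(f);
|               remove(mark);
|               return load_config("config.last-good");
|           }
|           if ((f = fopen(mark, "w")))  /* mark the attempt */
|               fclose(f);
|           if (load_config("config.new") != 0) {
|               remove(mark);
|               return load_config("config.last-good");
|           }
|           remove(mark);  /* clean load: clear the mark */
|           return 0;
|       }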
| wlonkly wrote:
| I also fall on the side of "stagger the rollout" (or
| "give customers tools to stagger the rollout"), but at
| the same time I recognize that a lot of customers would
| not accept delays on the latest malware data.
|
| _Before_ the incident, if you asked a customer if they
| would like to get updates faster even if it means that
| there is a remote chance of a problem with them... I bet
| they'd still want to get updates faster.
| canucker2016 wrote:
| They literally half-assed their deployment process - one
| part enterprisey, one part "move fast and break things".
|
| Guess which part took down much of the corporate world?
|
| from Preliminary Post Incident Review at
| https://www.crowdstrike.com/falcon-content-update-
| remediatio... :
|
| "CrowdStrike delivers security content configuration
| updates to our sensors in two ways: Sensor Content that
| is shipped with our sensor directly, and Rapid Response
| Content that is designed to respond to the changing
| threat landscape at operational speed.
|
| ...
|
| The sensor release process begins with automated testing,
| both prior to and after merging into our code base. This
| includes unit testing, integration testing, performance
| testing and stress testing. This culminates in a staged
| sensor rollout process that starts with dogfooding
| internally at CrowdStrike, followed by early adopters. It
| is then made generally available to customers. Customers
| then have the option of selecting which parts of their
| fleet should install the latest sensor release ('N'), or
| one version older ('N-1') or two versions older ('N-2')
| through Sensor Update Policies.
|
| The event of Friday, July 19, 2024 was not triggered by
| Sensor Content, which is only delivered with the release
| of an updated Falcon sensor. Customers have complete
| control over the deployment of the sensor -- which
| includes Sensor Content and Template Types.
|
| ...
|
| Rapid Response Content is used to perform a variety of
| behavioral pattern-matching operations on the sensor
| using a highly optimized engine.
|
| Newly released Template Types are stress tested across
| many aspects, such as resource utilization, system
| performance impact and event volume. For each Template
| Type, a specific Template Instance is used to stress test
| the Template Type by matching against any possible value
| of the associated data fields to identify adverse system
| interactions.
|
| Template Instances are created and configured through the
| use of the Content Configuration System, which includes
| the Content Validator that performs validation checks on
| the content before it is published.
|
| On July 19, 2024, two additional IPC Template Instances
| were deployed. Due to a bug in the Content Validator, one
| of the two Template Instances passed validation despite
| containing problematic content data.
|
| Based on the testing performed before the initial
| deployment of the Template Type (on March 05, 2024),
| trust in the checks performed in the Content Validator,
| and previous successful IPC Template Instance
| deployments, these instances were deployed into
| production."
| hello_moto wrote:
| > one part enterprisey, one part "move fast and break
| things".
|
| When there's a 0day, how enterprisey would you like to be
| about catching it?
| tsimionescu wrote:
| Not sure, but definitely more enterprisey than "release a
| patch to the entire world at once before running it on a
| _single machine_ in-house".
| mewpmewp2 wrote:
| So it would be preferable to have your data encrypted,
| taken hostage unless you pay, and to be down for days,
| instead of just six hours of downtime?
| xeromal wrote:
| That's a false dichotomy
| tsimionescu wrote:
| Do you seriously believe that _all_ of CrowdStrike's
| Windows customers were at such imminent risk of ransomware
| that taking an hour or two to run this on one internal
| setup, catching the critical error they released, would
| have been dangerous?
|
| This is a ludicrous position, and it has been proven
| obviously false by subsequent events: the systems crashed
| by this critical failure were not, in fact, attacked with
| ransomware once the CS agent was uninstalled (at great
| pain).
| Aeolun wrote:
| Nonsense. You don't need any staged deploys if you simply
| make no mistakes.
|
| /s
| quietbritishjim wrote:
| The sentence you quoted clearly meant, from the context,
| "clearly we have nothing [to learn from the opinions of these
| former employees]". Nothing in your comment really has
| anything to do with that.
| tomrod wrote:
| Triangulation versus new signal.
| sundvor wrote:
| "Everyone" piles on Tesla all the time; a worthwhile
| comparison would be how Tesla roll out vehicle updates.
|
| Sometimes people are up in arms "where's my next version" (eg
| when adaptive headlights was introduced), yet Tesla
| prioritise a safe, slow roll out. Sometimes the updates fail
| (and get resolved individually), but never on a global scale.
| (None experienced myself, as a TM3 owner on the "advanced"
| update preference).
|
| I understand that the premise of CrowdStrike's model is to
| have up-to-date protection everywhere, but clearly they
| didn't think this through enough, if at all.
| kccqzy wrote:
| You can also say the same thing about Google. Just go look
| at the release notes on the App Store for the Google Home
| app. There was a period of more than six months where every
| single release said "over the next few weeks we're rolling
| out the totally redesigned Google Home app: new easier to
| navigate 5-tab layout."
|
| When I read the same release notes so often, I begin to
| question whether this redesign is really taking more than
| six months to roll out. And then I read about the Sonos app
| disaster and thought that was the other extreme.
| cesarb wrote:
| > Just go look at the release notes on the App Store for
| the Google Home app. [...] When I read the same release
| notes so often I begin to question whether this redesign
| is really taking more than six months to roll out.
|
| Google is terrible at release notes. For several years
| now, the release notes for the "Google" app on the
| Android app store have shown the exact same four
| unchanging entries, loosely translating from Portuguese:
| "enhanced search page appearance", "new doodles designed
| for app experience", "offline voice actions (play music,
| enable Wi-Fi, enable flashlight) - available only in the
| USA", "web pages opened directly within the app". I
| highly doubt it's taking this many years to roll out
| these changes; they probably simply don't care anymore,
| and never update these app store release notes.
| hello_moto wrote:
| > And no lab where the new code was actually installed and
| run prior to global rawdogging.
|
| I thought the new code was actually installed; the running
| part depends on the script input...?
| sonofhans wrote:
| If design isn't involved in QC you're not doing QC very well.
| If design isn't plugged into development process enough to
| understand QC then you're not doing design very well.
| tw04 wrote:
| Why would a UX designer be involved in any way, shape, or
| form in kernel level code patches? They would literally never
| ship an update if they had that many hands in the pot for
| something completely unrelated. Should they also have their
| sales reps and marketing folks pre-brief before they make any
| code changes?
| sonofhans wrote:
| A UX designer might have told them it was a bad idea to
| deploy the patch widely without testing a smaller cohort,
| for instance. That's an obvious measure that they skipped
| this time.
| newshackr wrote:
| But that doesn't have anything to do with what UX
| designers typically do
| hello_moto wrote:
| The person you're replying to will not accept any sane
| argument once they've decided that UX must be involved in
| kernel technical decisions...
| sonofhans wrote:
| Pfft, I never said that at all. I'm not talking about
| technical decisions. OP was talking about QC, which is
| verifying software for human use. If you don't have user-
| centered people involved (UX or product or proserve) then
| you end up with user-hostile decisions like these people
| made.
| sigseg1v wrote:
| How would it not be related? Jamming untested code down
| the pipe with no way for users to configure when it's
| deployed and then rendering their machines inoperable is
| an extremely bad user experience and I would absolutely
| expect a UX expert to step in to try to avoid that.
| diatone wrote:
| Not true; UX designers typically are responsible for
| advocating for a robust, intuitive experience for users.
| The fact that kernel updates don't have a user interface
| doesn't make them exempt from asking the simple question:
| how will this affect users? And the subsequent question:
| is there a chance that deploying this eviscerates the
| user experience?
|
| Granted, a company that isn't focused on the user
| experience as much as it is on other things might not
| prioritise this as much in the first place.
| fzeroracer wrote:
| I can't believe people on HN are posting this stuff over
| and over again. Either you are wholly disconnected from
| what proper software development should look like, or you
| are outright creating the same environments that resulted
| in the CrowdStrike issue.
|
| Software security and quality is the responsibility of
| everyone on the team. A good UX designer should be
| thinking of ways a user can escape the typical flow or
| operate in unintended ways and express that to testers.
| And in decisions where management is forcing untested
| patches everyone should chime in.
| zipy124 wrote:
| I would agree if it was a UI designer, but a good UX
| designer designs for the users, which in this case
| includes the system admins who will be updating kernel-level
| code patches. Ensuring they have a good experience, e.g. no
| crashes, is their job. A likely recommendation would be, for
| example, small roll-outs to minimise the number of people
| having a bad user experience when a roll-out goes wrong.
| darby_nine wrote:
| I feel like CrowdStrike is perfectly capable of mounting its
| own defense
| JumpCrisscross wrote:
| > _This type of article - built upon disgruntled former
| employees - is worth about as much as the apology GrubHub gift
| card_
|
| To you and me, maybe. To the insurers and airlines paying out
| over the problem, maybe not.
| bdcravens wrote:
| I'm going with the principle of least astonishment:
| productivity is more highly valued in most companies than
| quality control.
| insane_dreamer wrote:
| > So basically we have nothing.
|
| Except the fact that CrowdStrike fucked up the one thing they
| weren't supposed to fuck up.
|
| So yeah, at this point I'm taking the ex-employees' word,
| because it fits the outcome we already know -- there is no
| way that update could have gone out had proper "safety
| first" protocols been in place and had CrowdStrike been
| "meticulous".
| theideaofcoffee wrote:
| I just don't think a company like CrowdStrike has a leg to
| stand on when leveling the "disgruntled" label in the face of
| their, let's face it, astoundingly epic fuck up. It's the
| disgruntled employees who I think would have the clearest
| picture of what was going on, regardless of whether they were
| in QA/QC, because at that point they don't really care any
| more and will be more forthright with their thoughts. I'd
| certainly trust their info more than a company yes-man, which
| is probably where some of that opposing messaging came from.
| paulcole wrote:
| Why would you trust a company no-man any more than a company
| yes-man? They both have agendas and biases. Is it just that
| you personally prefer one set of biases (anti-company) more
| than the other (pro-company)?
| theideaofcoffee wrote:
| Yes, I am very much biased toward being anti-company and I
| make no apologies for that. I've been in the corporate
| world long enough to know first-hand the sins that PR and
| corporate management commit on the company's behalf and
| the harm they do. I find information coming from the
| individual more reliable than having it filtered through
| corpo PR, legal, ass-covering nonsense, the latter group
| often wanting to preserve the status quo rather than
| getting out actual info.
| paulcole wrote:
| OK just checking. Nice that you at least acknowledge your
| bias.
| noisy_boy wrote:
| Because there is still an off chance that an employee
| who has been let go isn't speaking out of spite and is
| merely stating the facts - it depends on a combination of
| their honesty and the feelings they harbor about being let
| go. Not everyone who is let go is bitter and/or a liar.
|
| However, every company yes-man is paid to be a yes-man and
| will speak in favor of the company without exception - that
| literally is the job. Otherwise they will be fired and will
| join the ranks of the aforementioned people.
|
| So logically it makes more sense for me to believe the
| former rather than the latter. The two sides are not
| equivalent (as you may have alluded) in terms of
| trustworthiness.
| nullvoxpopuli wrote:
| Agreed. As a data point, I'm not disgruntled (I'm quoted
| in this article).
|
| Mostly disappointed.
| insane_dreamer wrote:
| Well, in this case, we know one side (pro-company) fucked
| up big time. The other side (anti-company) may or may not
| have fucked up.
|
| That makes it easier to trust one side over another.
| paulcole wrote:
| You've kind of set yourself up in a no-lose situation
| here.
|
| If the employees fucked up then you'll say the company
| still fucked up because it wasn't managing the employees
| well.
|
| And then in that situation you'll still believe the lying
| employees who say it's the company's fault while leaving
| out their own culpability.
| tooltower wrote:
| This is like online reviews. If you selectively take positive
| or negative reviews and somehow censor the rest, the reviews
| are worthless. Yet, if you report on all the ones you find,
| it's still useful.
|
| Yes, I'm more likely to leave reviews if I'm unsatisfied. Yes,
| people are more likely to leave CS if they were unhappy. Biased
| data, but still useful data.
| denkmoon wrote:
| Well they certainly don't care about the speed of the endpoints
| their malware runs on. Shit has ruined my macos laptop's
| performance.
| nullvoxpopuli wrote:
| All EDR software does (at least on macOS).
|
| Source: me, a developer who also codes in my free time and
| notices especially how bad fs perf is.
|
| I've had the CrowdStrike sensor, and my current company is
| using cyberhaven.
|
| So.. while 2 data points don't technically make a pattern,
| they do begin to raise suspicion.
| Aeolun wrote:
| Honestly, this article describes nearly all companies (from the
| perspective of the engineers), so I don't find it hard to
| believe this one is the same.
| zik wrote:
| Here's some anecdotal evidence - a friend worked at CrowdStrike
| and was horrified at how incredibly disorganised the whole
| place was. They said it was completely unsurprising to them
| that the outage occurred. More surprising to them was that it
| hadn't happened more often given what a clusterfrock the place
| was.
| wpietri wrote:
| > So basically we have nothing.
|
| No, what we have is a publication who is claiming that the
| people they talked to were credible and had points that were
| interesting and tended to match one another and/or other
| evidence.
|
| You can make the claim that Semafor is bad at their jobs, or
| even that they're malicious. But that's a hard claim to make
| given that in the paragraph you've quoted they are giving you
| the contrary evidence that they found.
|
| And this is a process many of us have done informally. When we
| talk to one ex-employee of a company, well maybe it was just
| that guy, or just where he was in the company. But when a bunch
| of people have the same complaint, it's worth taking it much
| more seriously.
| lr4444lr wrote:
| In principle yes, I agree that former employees' sentiments
| have an obvious bias, but if they all trend in the same
| direction - people who worked at different times, in different
| functions, and didn't know each other while on the job - that
| points to a likely underlying truth.
| _heimdall wrote:
| I do agree with having to expect bias there, but who else do
| you really expect to speak out? Any current employee would
| very quickly become an ex-employee if they spoke out with any
| specifics.
|
| I would expect any contractor that may have worked for
| CrowdStrike, or done something like a third-party audit, would
| be under an NDA covering their work.
|
| Who's left to speak out with any meaningful details?
| _fat_santa wrote:
| > Quality control was not really part of our process or our
| conversation.
|
| Is anyone really surprised, or did anyone learn any new
| information? For those of us who have worked for tech
| companies, this is one of those repeating complaints you hear
| across orgs that indicates a less-than-stellar engineering
| culture.
|
| I've worked with numerous F500 orgs, and I would say in 3 out
| of 5 of the orgs I worked in, the code was so bad that it made
| me wonder how they hadn't had a major incident yet.
| skenderbeu wrote:
| The truly disgruntled are the CrowdStrike customers who had to
| deal with the outage. These employees have a lot of reputation
| to lose by coming forward. CrowdStrike is a disgrace of a
| company, and many others like it are doing the same things but
| just haven't been caught yet. Software development became a
| disgrace when the bottom line of squeezing margins to please
| investors took over.
| iudqnolq wrote:
| There are some very specific accusations backed up by
| non-denials from CrowdStrike.
|
| Ex-employees said bugs caused the log monitor to drop entries.
| CrowdStrike responded that the project was never designed to
| alert in real time. But CrowdStrike's website currently
| advertises it as working in real time.
|
| Ex-employees said people trained to monitor laptops were
| assigned to monitor AWS accounts with no extra training.
| Crowdstrike replied that "there were no experienced 'cloud
| threat hunters' to be had" in 2022 and that optional training
| was available to the employees.
| Sarkie wrote:
| It was shown in the RCA that their QA processes were shit
| monksy wrote:
| No shit.
| nine_zeros wrote:
| Typical of tech companies these days. Quality is considered
| immaterial - or worse - pushed onto low-level managers and
| engineers who don't have the time to properly examine quality
| and good rollout practices.
|
| C-Suite and investors don't seem to want to spend on quality.
| They should just price in that their stock investment could
| collapse any day.
| 0xbadcafebee wrote:
| Critical software infrastructure should be regulated the way
| critical physical infrastructure is. We don't trust the people
| who make buildings and bridges to "do the right thing" - we
| mandate it with regulations and inspections. (When your software
| not working strands millions of people around the globe, it's
| critical) And this was just a regular old "accident"; imagine the
| future, when a war has threat actors _trying_ to knock things
| out.
| theideaofcoffee wrote:
| "We can't regulate the industry because then the US loses to
| China" or "regulation will kill the US competitive advantage!"
| responses I've had to suggesting the same and I just can't. But
| I agree with you 100%. If it's safety critical, it should be
| under even more scrutiny than other things, it shouldn't be
| left to self-regulating QA-like processes in profit seeking
| companies and has to have a bit more scrutiny before the big
| button gets pressed.
|
| Edit: Disclaimer: The quotes aren't mine, just retorts I've
| received from others when I suggest the R-word.
| janalsncm wrote:
| > then the US loses to China
|
| Yeah it makes no sense. Was the US not losing to China when
| we own-goaled the biggest cybersecurity incident in history?
| worik wrote:
| > then the US loses to China
|
| Such a silly meme, too. Economics 101: China and the USA
| would both benefit by halting the conflict and trading with
| each other.
| Zigurd wrote:
| Not to mention humans going extinct because regulators are to
| blame for there being no city on Mars. Because that's
| definitely the reason there's no city on Mars.
| owl57 wrote:
| Did you notice that the piece of software in question was
| apparently installed mostly in companies where regulations and
| inspections already override sysadmins' common sense? Are you
| sure the answer is simply more of the same?
| acdha wrote:
| It's not true that "common sense" is being overridden: most
| companies and sysadmins do need that baseline to avoid
| "forgetting" about things which aren't trivial to implement
| (if you didn't work in the field 10+ years ago, it was common
| to see systems getting patched annually or worse, people
| opening up SSH/Remote Desktop to the internet for
| convenience, shared/short passwords even for privileged
| accounts, and vendors requiring horribly insecure
| configurations because they didn't want to hire anyone who
| knew how to do things better, etc.). There are drawbacks to
| compliance security, but it has been useful for flushing all
| of that mess out.
|
| Even if it wasn't wrong, that's still the wrong reaction.
| We're in this situation because so many companies were
| negligent in the past and the status quo was obviously
| untenable. If there is a problem with a given standard the
| solution is to make a better system (e.g. like Apple did)
| rather than to say one of the most important industries in
| the world can't be improved because that'd require a small
| fraction of its budget.
| sitkack wrote:
| I sure noticed how much snark you packed into two sentences!
| 0xbadcafebee wrote:
| I've worked in these enterprise organizations for a long
| time. They don't run on common sense, or even what one might
| consider "business sense". Their existing incentives create
| bizarre behavior.
|
| For example, you might think _"if a big security exploit
| happens, the stock price might tank"_. So if they value the
| stock price, they'll focus on security, right? In reality
| what they do is focus on _burying the evidence_ of security
| exploits. Because if nobody finds out, the stock price won't
| tank. Much easier than doing the work of actually securing
| things. And apparently it's often legal.
|
| And when it's not a bizarre incentive, often people just
| ignore risks, or even low-level failures, until it's too
| late. Four-way intersections can pile up accidents for years
| until a school bus full of kids gets T-boned by a dump truck.
| We can't expect people to do the right thing even if they
| notice a problem. Something has to force the right thing.
|
| The _only_ thing I have ever seen force an executive to do
| the right thing is a law that says they will be held liable
| if they don't. That's still not a guarantee it will actually
| happen correctly, of course. But they will put pressure on
| their underlings to at least try to make it happen.
|
| On top of that, I would have standards that they are required
| to follow, the way building codes specify the standard
| tolerances, sizes, engineering diagrams, etc that need to be
| followed and inspected before someone is allowed into the
| building. This would enforce the quality control (and someone
| impartial to check it) that was lacking recently.
|
| This will have similar results to building codes - increased
| bureaucracy, cost, complexity, time... but also more safety.
| I think for _critical_ things, we really do need it.
| Industrial controls, like those used for water, power
| (nuclear...), gas, etc., need it. Tanker and container ships,
| trains/subways, airlines, elevators, fire suppression,
| military/defense, etc. The few, but very, very important,
| systems.
|
| If somebody else has better ideas, believe me, I am happy to
| hear them....
| abbadadda wrote:
| There should probably be an independent body that oversees
| postmortems on tech issues, with the ability to suggest
| changes. This is what airlines face during crash
| investigations, and often new rules are put in place (e.g.,
| don't let the shift manager self-certify his own work, from
| the incident where the pilot's window popped off). What this
| would look like for software companies, and what the bar is
| for being subject to this rigor, I don't know (I suspect not
| a Candy Crush outage, though).
|
| In general, the biggest problem I see with late stage
| capitalism, and a lack of accountability in general, is
| that given the right incentives people will "fuck things
| up" faster than you can stop them. For example, say
| CrowdStrike was skirting QA - what's my incentive as an
| individual employee versus the incentive of an executive at
| the company? If the exec can't tell the difference between
| good QA and bad QA, but can visually see the accounting
| numbers go up when QA is underfunded, he's going to
| optimize for stock price. And as an IC there's not much you
| can do unless you're willing to fight the good fight day in
| and day out. But when management repeatedly communicates
| they do not reward that behavior, and indeed may not care
| at all about software quality over a 5 year time horizon,
| what do you do? The key lies in finding ways to convince
| executives or, short of that, holding them to account like
| you say.
| theideaofcoffee wrote:
| I've commented on this before, but in this case I think
| it starts to fall into the laps of the individual
| employees themselves by way of licensing, or at least
| some sort of certification system. Sure, you could skirt
| a test here or there, but then you'd only be shortchanging
| yourself when shit hits the fan. It'd be your license and
| essentially your livelihood on the line.
|
| "Proper" engineering disciplines have similar systems
| like the Professional Engineer cert via the NSPE that
| requires designs be signed off. If you had the
| requirement that all software engineers (now with the
| certification actually bestowing them the proper title of
| 'engineer') sign off on their design, you could prevent
| the company from just finding someone else more
| unscrupulous to push that update or whatever through. If
| the entirety of the department or company is employing
| properly certificated people, they'd be stuck actually
| doing it the right way.
|
| That's their incentive to do it correctly: sign your name
| to it or lose your license, and, just for drama's sake,
| don't collect $200, go directly to jail. For the
| companies: employ properly licensed engineers, or risk
| unlimited downside liability when shit goes sideways,
| similar to what might happen if an engineering firm built
| a shoddy bridge.
|
| Would a firm that peddles some sort of CRUD app need to
| go through all of this? If it handles toxic data like
| payments or health data or other PII, sure. Otherwise,
| probably not, just like you have small contracting
| outfits that build garden sheds or whatever being a bit
| different than those that maintain, say, cooling systems
| for nuclear plants. Perhaps a law might be written to
| include companies that work in certain industries or
| business lines to compel them to do this.
| chii wrote:
| While good, those ideas will all increase costs.
|
| Would you pay 10x (or more, even) for these systems? That
| means 10x the price of water, utilities, transport, etc.,
| which then accumulates up the chain to make other things
| which don't have criticality but do depend on the ones
| that do.
|
| The thing is, what exists today exists because it's the
| path of least resistance.
| solidninja wrote:
| No, it exists because all must bow to the deity of
| increasing shareholder value. Remember that a good product
| is not necessarily equal to, or even a subset of, an
| easy-to-sell product. Only once the incentives are aligned
| towards building quality software that lasts will we see
| change.
| duckmysick wrote:
| You're right (not sure about the exact factor though) -
| and there are also additional costs when those systems
| fail. Someone, somewhere lost money when all those planes
| were grounded and services suspended.
|
| At some point - maybe it has already happened, I don't
| know - spending more on preventive measures and
| maintenance will be the path of least resistance.
| tempodox wrote:
| Cars without seat belts were the path of least resistance
| for a long time. I wonder how that changed.
| insane_dreamer wrote:
| > Would you pay 10x (or more, even) for these systems?
|
| if it's critical to your business, then yes; but you
| quickly find out whether or not it's actually critical to
| your business or whether it's something you can do
| without
| Vegenoid wrote:
| Consumer costs would not go up 10x to put more care into
| ensuring the continuous operation of critical IT
| infrastructure. Things like "an update to the software or
| configuration of critical systems must first be performed
| on a test system".
| tedk-42 wrote:
| Like everything, the cheap, quick, or good rule applies
| (pick 2).
|
| Software is pretty much always made cheaply and quickly.
| Even NASA has software blunders and rockets exploding
| mid-flight.
| TiredOfLife wrote:
| The regulations were the reason the companies were running
| Crowdstrike in the first place.
| 0xbadcafebee wrote:
| I'm saying that a (different) regulation, standard, and
| inspection, should apply to the whole software bill of
| materials, as it relates to the critical-ness of the product.
| Like, if security is important, the security-critical
| components should be inspected/tested. That's how you build a
| building safely: the nails are built to a certain
| specification and the nail vendor signs off on that.
| pclmulqdq wrote:
| Everything that we know about CrowdStrike stinks of Knight
| Capital to me. A minor culture problem snowballed into complete
| dysfunction, eventually resulting in a company-ending bug.
| ForOldHack wrote:
| Knight Capital:
|
| "$10 million a minute.
|
| That's about how much the trading problem that set off turmoil
| on the stock market on Wednesday morning is already costing the
| trading firm.
|
| The Knight Capital Group announced on Thursday that it lost
| $440 million when it sold all the stocks it accidentally
| bought Wednesday morning because of a computer glitch."
|
| Glitch. Oh...
|
| https://en.wikipedia.org/wiki/Therac-25
| 0cf8612b2e1e wrote:
| I do not work in finance, but surely every trading company
| has had an algorithm go wild at some point. Just becomes a
| matter of how fast someone can pull the circuit breaker
| before the expensive failure becomes public.
| worik wrote:
| > surely every trading company has had an algorithm go wild
| at some point.
|
| You would think so.
|
| Cynical me.
|
| But no. When money is at stake much more care is taken than
| when lives are at stake.
| pclmulqdq wrote:
| Shamelessly plugging my own blog post on this:
| https://specbranch.com/posts/knight-capital/
|
| The TL;DR of Knight is that Knight had several things go
| wrong at the same time, and had no circuit breaker for
| the problem short of stopping trading for the whole firm
| for the day. Most trading firms have had things go badly,
| but the holes in the Swiss cheese aligned for Knight (and
| they were larger than at many other firms). This all
| comes from a sort of culture of carelessness.
| odyssey7 wrote:
| I always thought the Swiss cheese model was used to
| suggest that no one party could possibly be responsible
| for a bad thing that happened. Interesting to see the
| company's culture blamed for the cheese itself.
| pclmulqdq wrote:
| Personally, I think there are too many things in modern
| American society that involve diffusion of
| responsibility, presumably so that people avoid negative
| consequences. If you're going to suggest that a system
| gives 1/10th of the responsibility to 10 different
| people, the one who made the system is the enabler of
| that and IMO should suffer the consequences.
| odyssey7 wrote:
| The Swiss cheese model fits better as a rebuttal when the
| cheese comprises both the finger-pointer and the finger-
| pointee. Think: sure, our software had a bug that said up
| was down, but what about all of your own employees who
| used the software, had certifications, and should have
| known better than to accept its conclusions?
|
| Your usage, in assigning blame rather than diffusing it,
| was novel to me.
| bitcharmer wrote:
| We have circuit breakers for that very purpose. Everyone on
| the street does. It's just that theirs seems to have failed
| for some reason.
| pclmulqdq wrote:
| Theirs didn't fail, and they did have one. The circuit
| breaker they had that would have worked was a big red
| button that killed all of their trading processes, which
| would have meant spending the rest of the day figuring
| out and unwinding their positions.
|
| They were unwilling to push that button in the short time
| they had. If you read the reports to the SEC or the
| articles about it, you will note that. The follow-ups
| recommended that all firms adopt a big red button that is
| less catastrophic.
| bitcharmer wrote:
| Gotcha, thanks for correcting me, I need to read up more
| about the incident.
| bb88 wrote:
| Most interesting quote in the article: "It was hard to get
| people to do sufficient testing sometimes," said Preston
| Sego, who worked at CrowdStrike from 2019 to 2023. His job
| was to review the tests completed by user experience
| developers that alerted engineers to bugs before proposed
| coding changes were released to customers. Sego said he was
| fired in February 2023 as an "insider threat" after he
| criticized the company's return-to-work policy on an
| internal Slack channel.
|
| Okay clearly that company has a culture issue. Imagine
| criticizing a policy and then getting labeled "insider threat".
| nullvoxpopuli wrote:
| I'd like to clarify that my job was also to educate,
| modernize, and improve developer velocity through tooling and
| framework updates/changes (impacting every team in my
| department, UX / frontend engineering).
|
| Reviewing tests is part of PR review.
|
| --- and before anyone asks, this is my statement on CrowdStrike
| calling everyone disgruntled:
|
| "I'm not disgruntled.
|
| But as a shareholder (and probably more primarily, someone who
| cares about coworkers), I am disappointed.
|
| For the most part, I'm still mourning the loss of working with
| the UX/Platform team."
| bb88 wrote:
| I mourn the fact that your ex co-workers are still working
| for a shitty company.
| nullvoxpopuli wrote:
| The market for jobs isn't great, so I don't blame them.
|
| At the same time, I feel like big profit-chasing software
| companies are _all_ like how CrowdStrike is.
|
| Many may be in the same type of company, but situations
| have not arisen that reveal how leadership really feels
| about employees.
| Aeolun wrote:
| > Imagine criticizing a policy and then getting labeled
| "insider threat".
|
| Especially because that's incredibly dumb. A true insider
| threat would play nice while your confidential data quietly
| leaks out.
| bb88 wrote:
| I mean, that's just insanely true. I think this is maybe the
| most dystopian company I've ever heard of so far.
| wesselbindt wrote:
| > return to work
|
| I know you're just quoting the phrase, but what a gross and
| dishonest way of phrasing "return to office". Implies working
| remotely doesn't count as work. Smacks of PR. Yuck.
| seanw444 wrote:
| And everybody gasped in surprise.
| tamimio wrote:
| I think the whole world knew that already.
| insane_dreamer wrote:
| > CrowdStrike disputed much of Semafor's reporting
|
| I expect some ex-employees to be disgruntled and present things
| in a way that makes CrowdStrike look bad. That happens with
| every company.
|
| BUT, CrowdStrike has ZERO credibility at this point. I don't
| believe a word they say.
| Zigurd wrote:
| At some companies, like Boeing, the shorter list would be the
| gruntled employees.
| insane_dreamer wrote:
| > gruntled
|
| I have never heard that word used in a non-negative way.
| beng-nl wrote:
| Off-Topic, but do I have a story for you
|
| https://www.ling.upenn.edu/~beatrice/humor/how-i-met-my-
| wife...
| tsimionescu wrote:
| Fun linguistics fact, but gruntled as the antonym of
| disgruntled is a back-formation. The word disgruntled is a
| bit strange, in that it uses "dis-" not as a reversal
| prefix (such as in dissatisfied or dissimilar), but as an
| intensifier. The original "gruntle" was related to grunt,
| grunting, it was similar to "grumble", denoting the sounds
| an annoyed crowd might make. But this old sense of gruntle,
| gruntling, gruntled has not been used since the 16th
| century. And in the past century, people have started back-
| forming a new "gruntle" by analyzing "dis-gruntled" as
| using the more common meaning of "dis-".
|
| A similar use of dis- as an intensifier apparently happened
| in "dismayed" (here from an Old French verb, esmaier,
| which meant to trouble, to disturb), and in "disturbed"
| (from a Latin word, turba, meaning turmoil). I haven't
| heard anyone say they are "mayed" or "turbed", but people
| would probably treat them the same as "gruntled" if you
| used them.
| dbattaglia wrote:
| I've only heard it from Michael Scott: "Everyone here is
| extremely gruntled".
| chaps wrote:
| Worked on a team that deployed CrowdStrike agents across our
| organization and... yeah. One of the biggest problems we had
| was that the daemon would log a massive amount of stuff...
| but had no config to stop or reduce it.
| st3fan wrote:
| Found out that the CrowdStrike Mac agent (Falcon) sends all your
| secrets from environment variables to their cloud hosted SIEM. In
| plain text.
|
| Anyone with access to your CS SIEM can search for GitHub, aws,
| etc creds. Anything your devs, ops and sec teams use on their
| Macs.
|
| Only the Mac version does this. There is no way to disable this
| behaviour or a way to redact things.
|
| Another really odd design decision. They probably have many many
| thousands of plain text secrets from their customers stored in
| their SIEM.
| x3n0ph3n3 wrote:
| Can you provide some more info on this? How do you know? Is
| this documented somewhere?
|
| I'm sure this is going to raise red-flags in my IT department.
| st3fan wrote:
| Ask them to search for the usual env var names like
| GITHUB_TOKEN or AWS_ACCESS_KEY_ID.
| skewer99 wrote:
| AKIDs... ugh. They'll be there if you use AWS + Mac.
|
| Again, the plaintext is the problem.
|
| These environment variables get loaded from the command line,
| scripts, etc. - CrowdStrike and all of the best EDRs also
| collect and send home all of that, but probably in an
| encrypted stream?
| zxexz wrote:
| I usually remote dev on an instance in a VPC because of
| crap like this. If you like terrible ideas (I don't use
| this except for debugging IAM stuff, occasionally), you
| can use the IMDS as if you were an AWS instance: give a
| local loopback device the link-local IPv4 address
| 169.254.169.254/32, bind traffic from the instance's
| 169.254.169.254:80 to your lo's port 80, and a local AWS
| SDK will use the IAM instance profile of the instance
| you're connected to. I'll repeat, this is not a good idea.
| apimade wrote:
| Is this really a criticism? This has been the case forever
| with all security and SIEM tools. It's one of the reasons
| why the SIEM is among the most locked-down pieces of
| software in the business.
|
| Realistically, secrets alone shouldn't allow an attacker
| access - they should also need access to infrastructure or
| to certificates on machines. But unfortunately that's not
| the case for many SaaS vendors.
| st3fan wrote:
| But why is this forced only on macOS?
|
| I think some configurability would be great. I would like
| to provide an allow list, or the ability to redact, or to
| exclude specific host groups.
|
| We all have different levels of acceptable risk
| btilly wrote:
| Conspiracy theory time. Because Apple is the only OS
| company that has reliably proven that it won't decrypt hard
| drives at government request.
| vagrantJin wrote:
| This is a true conspiracy.
| jordanb wrote:
| Seriously? CrowdStrike is obviously NSA, just like
| Kaspersky is obviously KGB and Wiz is obviously Mossad.
| Why else are countries so anxious about local businesses
| not using agents made by foreign actors?
| smolder wrote:
| KGB is not even a thing. Modern equivalent is FSB, no?
| I'm skeptical. I don't think it's obvious that these are
| all basically fronts, as much as I'm willing to believe
| that IC tentacles reach wide and deep.
| iml7 wrote:
| It depends on the country it is in: it rejects the US
| government's requests but fully complies with any request
| from the Chinese government.
| EE84M3i wrote:
| I'd be interested to learn more about that.
|
| My mental model was that Apple provides backdoor
| decryption keys to China _in advance_ for devices sold in
| China /Chinese iCloud accounts, but that they cannot/will
| not bypass device encryption for China for devices sold
| outside of the country/foreign iCloud accounts.
| throwaway48476 wrote:
| The venn diagram of users who don't want the government
| to access their data and crowdstrike customers is two
| circles in different galaxies.
| xnyan wrote:
| It's probably being run on an enterprise-managed mac. The
| only person who can be locked out via encryption is the
| user.
| immibis wrote:
| The certificate private key is also a secret.
| meowface wrote:
| All SIEM instances certainly contain a lot of sensitive data
| in events, but I'm not sure if most agents forward all
| environment variables to a SIEM.
| hello_moto wrote:
| Agents don't just read env vars and send them to the SIEM.
|
| There's a triggering action that caused the env vars to be
| used by another ... ehem... process ... that any EDR
| software on this beautiful planet would have tracked.
| st3fan wrote:
| No, it logs every command macOS runs or that you type in a
| terminal, either directly or indirectly - from macOS's
| internal periodic tasks to you running "ls".
| worik wrote:
| > Because this has been the case forever with all security
| and SIEM tools.
|
| Why?
|
| There is no need to send your environment variables.
| gruez wrote:
| Otherwise malware can hide in environment variables
| llm_trw wrote:
| Ok, suppose you're right.
|
| Why are they only doing it for macs then?
| st3fan wrote:
| It may depend a bit on your organization but I bet most
| folks using an EDR solution can tell you that Macs are
| probably very low on the list when it comes to malware.
| You can guess which OS you will spend time on every day
| ...
| llm_trw wrote:
| So because macs are not the targets of malware ... we're
| locking them down tighter than any other system?
| namaria wrote:
| No, see, they're leveling the playing field by storing
| all secrets they find on macs in plaintext
| batch12 wrote:
| I don't think this is limited to just Macs based on my
| experience with the tool. It also sends command line
| arguments for processes which sometimes contain secrets.
| The client can see everything and run commands on the
| endpoints. What isn't sent automatically can be collected
| for review as needed.
| st3fan wrote:
| It does redact secrets passed as command line arguments.
| This is what makes it so inconsistent. It does recognize
| a GitHub token as an argument and blanks it out before
| sending it. But then it doesn't do that if the GitHub
| token appears in an env var.
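|
| Not CrowdStrike's actual logic, but a rough sketch of the
| kind of pattern-based redaction described above -- the same
| detector could just as easily be applied to env var values
| (the token pattern follows GitHub's published prefixes):
|
|     import re
|
|     # GitHub tokens carry recognizable prefixes (ghp_, gho_,
|     # ghu_, ghs_, ghr_), which is what makes redaction easy.
|     GITHUB_TOKEN = re.compile(r"\bgh[pousr]_[A-Za-z0-9]{36,}\b")
|
|     def redact(text: str) -> str:
|         return GITHUB_TOKEN.sub("<REDACTED:github_token>", text)
|
|     token = "ghp_" + "A" * 36  # fake token for illustration
|     argv = ["curl", "-H", f"Authorization: token {token}"]
|     env = {"GITHUB_TOKEN": token}
|
|     safe_argv = [redact(a) for a in argv]              # redacted
|     safe_env = {k: redact(v) for k, v in env.items()}  # also works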
| worik wrote:
| They do not need to take the data off the computer to do
| that
| cma wrote:
| Malware can hide in the frame buffer at modern
| resolutions. They could keep a full copy of it and each
| frame transition too.
| chelmzy wrote:
| Most sane SIEM engineers would implement masking for this.
| Not sure if CS still uses Splunk but they did at one point.
| No excuse really.
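|
| For what it's worth, in Splunk that masking is typically a
| SEDCMD applied at index time in props.conf -- a sketch, with
| the stanza name and pattern made up for illustration:
|
|     # props.conf on the indexer or heavy forwarder
|     [crowdstrike:events]
|     SEDCMD-mask_github_tokens = s/gh[pousr]_[A-Za-z0-9]{36,}/<REDACTED>/g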
| wbl wrote:
| What do you think grants the access to the infra or ability
| to get a certificate?
| ants_everywhere wrote:
| Ideally secrets never leave secure enclaves and humans at the
| organization can't even access them.
|
| It's totally insane to send them to a remote service
| controlled by another organization.
| Natsu wrote:
| I mean it's right there in the name. They're not really
| secrets any longer if you're sharing them in plaintext with
| another company.
| cj wrote:
| Essentially, it's straddling two extremes:
|
| 1) employees are trusted with secrets, so we have to audit
| that employees are treating those secrets securely (via
| tracking, monitoring, etc)
|
| 2) we don't allow employees to have access to secrets
| whatsoever, therefore we don't need any auditing or
| monitoring
| ants_everywhere wrote:
| You give employees the ability to _use_ the secrets, and
| that usage is tracked and audited.
|
| It works the same way for biometrics like face unlock on
| mobile phones
| stogot wrote:
| Exporting to a SIEM does not correlate to either of those
| extremes. It's stupidity and makes auditing worse
| cj wrote:
| SIEM = Security Information & Event Management
|
| Factually, it is necessary for auditing and absolutely
| correlates with the extreme of needing to monitor the
| "usage" of "secrets".
|
| In a highly auditable/"secure" environment, you can't
| give secrets to employees with no tracking of when the
| secrets are used.
| davorak wrote:
| > In a highly auditable/"secure" environment, you can't
| give secrets to employees with no tracking of when the
| secrets are used.
|
| This does not seem to require regularly exporting secrets
| from the employees' machines, though, which is the main
| complaint I am reading. You would log when the secret is
| used to access something, presumably remote to the user's
| machine.
| halayli wrote:
| That's far from factual and you are making things up. You
| don't need to send the actual keys to a SIEM service to
| monitor the usage of those secrets. You can use a
| cryptographic hash and send the hash instead. And they
| definitely don't need to dump env values and send them
| all.
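|
| A minimal sketch of that idea (names are made up; an unkeyed
| hash of a low-entropy secret could be brute-forced, hence
| HMAC with a pepper the SIEM never sees):
|
|     import hashlib, hmac, os
|
|     PEPPER = b"org-wide-pepper-never-sent-to-the-siem"
|
|     def fingerprint(secret: str) -> str:
|         # Stable identifier for correlating usage of a secret
|         # without ever revealing the secret's value.
|         return hmac.new(PEPPER, secret.encode(),
|                         hashlib.sha256).hexdigest()[:16]
|
|     event = {k: fingerprint(v) for k, v in os.environ.items()
|              if k.endswith(("_TOKEN", "_KEY", "_SECRET"))}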
|
| Sending env vars of all your employees to one place
| doesn't improve anything. In fact, one can argue the
| company is now more vulnerable.
|
| It feels like a decision made by a clueless school
| principal instead of a security expert.
| Too wrote:
| In a highly secure environment, don't use long lived
| secrets in the first place. You use 2FA and only give out
| short-lived tokens. The IdP (Identity Provider) refreshing the
| token for you provides the audit trail.
|
| Repeat after me: Security is not a bolt on tool.
| defrost wrote:
| More like a triple lock steel core reinforced door laying
| on its side in an open field?
|
| Good start, might need a little more work around the
| edges.
| smolder wrote:
| A secure environment doesn't involve software
| exfiltrating secrets to a 3rd party. It shouldn't even
| centralize secrets in plaintext. The thing to collect and
| monitor is behavior: so-and-so logged into a dashboard
| using credentials user+passhash and spun up a server
| which connected to X Y and Z over ports whatever... And
| those monitored barriers should be integral to an
| architecture, such that every _behavior_ in need of
| auditing is provably recorded.
|
| If you lean in the direction of keylogging all your
| employees, that's not only lazy but ineffective on
| account of the unnecessary noise collected, and it's
| counterproductive in that it creates a juicy central
| target that you can hardly trust anyone with. Good
| auditing is minimally useful to an adversary, IMO.
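|
| A toy sketch of what I mean -- a structured event recording
| the behavior but carrying no credential material (names and
| fields invented for illustration):
|
|     import json, logging, time
|
|     logging.basicConfig(level=logging.INFO)
|     audit_log = logging.getLogger("audit")
|
|     def audit(actor: str, action: str, **details: str) -> None:
|         # Who did what, from where, to what -- never the secret.
|         audit_log.info(json.dumps({"ts": time.time(), "actor": actor,
|                                    "action": action, **details}))
|
|     audit("jdoe", "server.spawn", via="prod-dashboard",
|           instance="web-42", egress="10.0.3.7:443")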
| userbinator wrote:
| _employees are trusted with secrets, so we have to audit
| that employees are treating those secrets securely_
|
| IMHO needing to be monitored constantly is not being
| "trusted" in any sense of the word.
| fragmede wrote:
| I can trust you enough to let you borrow my car and not
| crash it, but still want to know where my car is with an
| Airtag.
|
| Similarly employees can be trusted enough with access to
| prod, while the company wants to protect itself from
| someone getting phished or from running the wrong "curl |
| bash" command, so the company doesn't get pwned.
| cortesoft wrote:
| > Ideally secrets never leave secure enclaves and humans at
| the organization can't even access them.
|
| Right, but doesn't that mean there is no risk from sending
| employee laptop ENV variables, since they shouldn't have
| any secrets on their laptops?
| AmericanChopper wrote:
| Keeping secrets and other sensitive data out of your SIEM is
| a very important part of SIEM design. Depending on what
| you're dealing with you might want to tokenize it, or redact
| it, but you absolutely don't want to just ingest them in
| plaintext.
|
| If you're a PCI company then ending up with a credit card
| number in your SIEM can be a massive disaster. Because you're
| never allowed to store that in plaintext, and your SIEM data
| is supposed to be immutable. In theory that puts you out of
| compliance for a minimum of one year with no way to fix it,
| in reality your QSAs will spend some time debating what to do
| about it and then require you to figure out some way to
| delete it, which might be incredibly onerous. But I have no
| idea what they'd do if your SIEM somehow became full of
| credit card numbers, that probably is unfixable...
| ronsor wrote:
| > But I have no idea what they'd do if your SIEM somehow
| became full of credit card numbers, that probably is
| unfixable...
|
| You'd get rid of it.
| AmericanChopper wrote:
| If that's straightforward then congratulations, you've
| failed your assessment for not having immutable log
| retention.
|
| They certainly wouldn't let you keep it there, but if
| your SIEM was absolutely full of cardholder data, I
| imagine they'd require you to extract ALL of it, redact
| the cardholder data, and then import it into a new instance,
| nuking the old one. But for a QSA to sign off on that
| they'd be expecting to see a lot of evidence that
| removing the cardholder data was the only thing you
| changed.
| Aeolun wrote:
| If my security software exfiltrates my secrets _by design_,
| I'm just going to give up on keeping anything secure now.
| benreesman wrote:
| Arbitrary bad practices as status quo without criticism, far
| from absolving more of the same, demand scrutiny.
|
| Arbitrarily high levels of market penetration by sloppy
| vendors in high-stakes activities, far from being an argument
| for functioning markets, demand regulation.
|
| Arbitrarily high profile failures of the previous two, far
| from indicating a tolerable norm, demand criminal
| prosecution.
|
| It was only recently that this seemingly ubiquitous vendor,
| with zero-day access to a critical kernel space that any red
| team adversary would kill for, said "lgtm shipit" instead of
| running a test suite, with consequences and costs (depending
| on who you listen to) ranging from billions in lost treasure
| to loss of innocent life.
|
| We know who fucked up, and we have an idea of what kind of
| corrupt-ass market-failure crony capitalism could admit such
| a thing.
|
| The only thing we don't know is how much worse it would have
| to be before anyone involved suffers any consequences.
| kmacdough wrote:
| "Oh, but our system is _so secure,_ you don 't need other
| layers."
| lolinder wrote:
| > Realistically, secrets alone shouldn't allow an attacker
| access - they should need access to infrastructure or
| certificates in machines as well.
|
| This isn't realistic, it's idealistic. In the real world
| secrets are enough to grant access, and even if they weren't,
| exposing one half of the equation in clear text by design is
| still _really bad_ for security.
|
| Two factor auth with one factor known to be compromised is
| actually only one factor. The same applies here.
| skewer99 wrote:
| The monitoring and collection isn't the problem, that's what
| modern EDR does - collect, analyze, compare, and do statistics
| on all of the things.
|
| The plaintext part is not okay.
| notepad0x90 wrote:
| Thank you, that's a sound perspective, but it is the
| responsibility of the security staff who deploy EDRs like
| Crowdstrike to scrub any data at ingestion time into their
| SIEM. But within CS's platform, it makes little sense to talk
| about scrubbing, since CS doesn't know what you want scrubbed
| unless it is standardized data (like SSNs, credit cards,
| etc.).
|
| Another way to look at it is that the CS cloud environment
| is effectively part of your environment. The secrets can get
| scrubbed, but CS still has access to your devices; they can
| remotely access them and get those secrets at any time
| without your knowledge. That is the product. The security
| boundary of OP's Mac is inclusive of the CS cloud.
| st3fan wrote:
| Unfortunately the software doesn't allow for scrubbing or
| redacting to be configured. Those features simply do not
| exist.
| notepad0x90 wrote:
| For their own cloud, yeah, you basically accept their
| cloud as an extension of your devices. But the back-end
| they use(d?), Splunk, does have scrubbing capability they
| could expose to customers, if actual customers requested
| it.
|
| In reality, you can take steps to prevent PII from being
| logged by Crowdstrike, but credentials are too non-
| standard to meaningfully scrub. It would be an exercise
| in futility. If you trust them to have unrestricted
| access to the credential, the fact that they're
| inadvertently logging it because of the way your
| applications work should not be considered an increase in
| risk.
| SoftTalker wrote:
| Secrets in clear text in environment variables are never a
| good idea, though.
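|
| One alternative on macOS is to keep the token in the
| Keychain and fetch it at runtime, so it never sits in the
| process environment for an agent snapshot to pick up (the
| service and account names here are placeholders):
|
|     import subprocess
|
|     # `security find-generic-password -w` prints just the secret.
|     token = subprocess.run(
|         ["security", "find-generic-password",
|          "-s", "my-service", "-a", "deploy", "-w"],
|         capture_output=True, text=True, check=True,
|     ).stdout.strip()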
| dchftcs wrote:
| There are secrets like passwords, but there are also secrets
| like "these are the parameters for running a server for our
| assembly line for X big corp".
| brundolf wrote:
| Do you have a source?
| madcadmium wrote:
| Does it also monitor the contents of your copy/paste buffer? It
| would scoop up a ton of privileged data if so.
| hiddencost wrote:
| This kind of information seems like it should have a CVE and a
| responsible disclosure process.
|
| Kidding, mostly, but wow that's a hell of a vulnerability.
| notepad0x90 wrote:
| It is not a vulnerability, you literally pay for this
| feature. I really don't want to defend Crowdstrike but HN
| keeps making it hard not to.
| hiddencost wrote:
| Storing secrets in unsecured environments in plaintext is
| literally a vulnerability.
|
| One of the most famous examples can be seen in the NSA
| slide at the top of this article:
|
| https://www.washingtonpost.com/world/national-
| security/nsa-i...
| notepad0x90 wrote:
| The security tools' storage system is always considered a
| secured environment.
| j4coh wrote:
| Without even having to secure it?
| throw_a_grenade wrote:
| Yes, but also No.
|
| So there's this thing called a "threat model", and it
| includes some assumptions about some moving parts of the
| infra. It very often includes an assertion that a
| particular environment (like the IDS log, or the signing
| infra surrounding an HSM) is "secure", meaning outside
| the scope of that particular threat model. So it often
| gets papered over, and it takes some reflex to say "hey,
| how will we secure that other part?". There needs to be
| some consciousness about it, because it's not part of the
| model under discussion, so it's not part of the agenda of
| this meeting...
|
| And it gets lost.
|
| That's how shit happens in compliance-oriented security.
| notepad0x90 wrote:
| That's what EDRs do. Anyone with access to your SIEM or CS
| data should also be trusted with response access (i.e., the
| ability to remotely access those machines).
|
| If you want this redacted, it is SIEM functionality, not
| Crowdstrike's. It depends on the SIEM, but even
| older-generation SIEMs have a data scrubbing feature.
|
| This isn't a Crowdstrike design decision as you've put it.
| _Any_ endpoint monitoring tool, including the free and open
| source ones, behaves just as you described. You won't just
| see env vars from Macs but things like domain admin creds
| and PKI root signing private keys. If you give someone
| access to an EDR, or they are incident responders with SIEM
| access, you've trusted them with full -- yet auditable and
| monitored -- access to that deployment.
| pmlnr wrote:
| Don't downvote this, this is the sad truth.
| Fnoord wrote:
| Sure, storage. Networking though? SIEMs receive and send data
| unencrypted? They should not. By sending the data in plain
| text you open up an attack surface to anyone sniffing the
| network.
| notepad0x90 wrote:
| Crowdstrike like many EDRs uses mutually authenticated TLS
| to send the data over the network to their cloud.
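|
| Roughly what mutual TLS looks like from an agent's side (the
| endpoint and file paths are invented; CrowdStrike's actual
| transport is their own):
|
|     import requests
|
|     resp = requests.post(
|         "https://telemetry.example-edr.test/v1/events",
|         json={"host": "mac-1234", "event": "process_start"},
|         # Client cert/key prove the agent's identity to the cloud...
|         cert=("/etc/edr/client.crt", "/etc/edr/client.key"),
|         # ...and pinning the vendor CA authenticates the cloud back.
|         verify="/etc/edr/vendor-ca.pem",
|         timeout=10,
|     )
|     resp.raise_for_status()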
| jgtrosh wrote:
| Did somebody say GDPR?
| pmlnr wrote:
| Companies believe GDPR doesn't apply to their human
| resources.
| riedel wrote:
| They have IT policies to make sure it largely does not
| apply. Even in our policy officially any personal use is
| forbidden. Funnily there is also agreement with our
| employee board, that any personal use will not be
| sanctioned. So guess what happens. This done to circumvent
| not only GPR but also TTDSG in germany (which is harsher on
| 'spying' as it applies to telecoms. For any 'officially'
| gathered personal information though typical very specific
| agreements with our employee board exist though (reporting
| of illness, etc). Wonder how such information which is also
| sensitive in a workplace is handled. Also I see those
| systems used in hospitals etc, if other peoples data is
| pumped through this systems GDPR definitively applies and
| auditors may find it (I only know such auditing in finance
| though). In the future NIS2 will also apply so exactly the
| people that use such systems will be put under additional
| scrutiny. Hope this triggers also some auditing of the
| systems used and not just the use of more of such systems.
| unilynx wrote:
| What would you expect the GDPR to say? This is allowed as
| long as the GDPR's requirements are followed.
| raverbashing wrote:
| Not applicable. It is not related to personal data
| philshem wrote:
| SIEM = Security information and event management
|
| https://en.wikipedia.org/wiki/Security_information_and_event...
| debarshri wrote:
| It is common in the world of SIEM. Logs with secrets and PII
| data are often sent and stay in the SIEM for years until an
| incident occurs.
| MasterIdiot wrote:
| Having worked for a SIEM vendor, I can say that all security
| software is extremely invasive, and most security people can
| probably track every action you make on company-issued devices,
| and that includes HTTPS decryption.
| firtoz wrote:
| Reminds me of a guy I know openly bragging that he can watch
| all of his customers who installed his company's security
| cameras. I won't reveal his details but just imagine any
| cloud security camera company doing the same and you would
| probably be right.
|
| I guess it's pretty much the same principle.
| blablabla123 wrote:
| Yeah the question is always whether the cure is worse than the
| disease. I'm quite ambivalent on this. On the one hand I tend
| to agree with the "Anti AV camp" that a sufficiently
| maintained machine can do well when following best practices.
| Of course that includes SIEM which can also be run on-premise
| and doesn't necessarily have to decrypt traffic if it just
| consumes properly formatted logs.
|
| On the other hand there was e.g. WannaCry in 2017 where
| 200,000 systems across 150 countries running Windows XP and
| other unsupported Windows versions were hit by ransomware.
| It shows that companies world-wide had trouble
| properly maintaining the life cycle of their systems. I think
| it's too easy to only accuse security vendors of quality
| problems.
| batch12 wrote:
| Anyone with the right level of access to your Falcon instance
| can run commands on your endpoints (using RTR) and collect any
| data not already being collected.
| avree wrote:
| ""Speed was the most important thing," said Jeff Gardner, a
| senior user experience designer at CrowdStrike who said he was
| laid off in January 2023 after two years at the company. "Quality
| control was not really part of our process or our conversation."
|
| Their 'expert' on engineering process is a senior UX designer?
| Somehow, I doubt they were very close to the kernel patch
| deployment process.
| acdha wrote:
| They probably weren't, but that still speaks to their general
| culture and is compatible with what we know about their kernel
| engineering culture (limited testing, no review, no use of
| common fail safe mechanisms).
| hello_moto wrote:
| A company can have different business units with different
| culture/mentality.
|
| I bet my ass anyone working in low-level code don't ship the
| way you do in Cloud.
| acdha wrote:
| > I bet my ass anyone working in low-level code don't ship
| the way you do in Cloud.
|
| Their technical report says otherwise - and we know they
| didn't adopt the common cloud practices of doing real
| testing before shipping or having a progressive deployment.
| esperent wrote:
| > is compatible with what we know
|
| In other words, it confirms our biases and we're willing to
| accept it at face value despite there being only a single
| anecdotal piece of evidence.
| acdha wrote:
| It sounds like you might want to read their technical
| report. That's neither anecdotal nor a single point, and it
| showed a pretty large gap in engineering leadership with
| numerous areas well behind the state of the art.
|
| That's why I said it was compatible: both these former
| employees and their own report showed an emphasis on
| shipping rapidly but not the willingness to invest serious
| money in the safeguards needed to do so safely. If you want
| to construct another theory, feel free to do so.
| panic wrote:
| Why would it matter? The absolute worst case scenario happened
| and their stock is still up 50% YoY, beating the S&P 500.
| 0cf8612b2e1e wrote:
| I thought you were joking. The stock market is incredible.
|
| Everyone must realize that crowdstrike has a captive audience
| with no alternatives that can meet corporate compliance.
| intelVISA wrote:
| Can't think of a bigger flex of how locked-in their market
| share is.
|
| On the plus side this should spur some disruptors into gear,
| assuming VCs are willing to pivot from wasting money funding
| LLM wrappers.
| hyperpape wrote:
| It's down 30% since the incident, and flat since 3 years ago.
|
| If it runs up a huge amount in the first half of the year and
| then the incident knocks 30% off their market cap, that still
| means the incident was really bad.
| hello_moto wrote:
| Their stock has always been volatile but you can't ignore the
| fact that it hasn't been that bad after the incident.
| goralph wrote:
| What are some alternatives to CrowdStrike?
| taspeotis wrote:
| Personal: Nothing - Windows Defender is built into Windows.
|
| Business: Nothing - Windows Defender Advanced Threat Protection
| is built into the higher Microsoft 365 license tiers.
|
| It amazes me people chose to pay money to have all their PCs
| bluescreen.
| neverrroot wrote:
| This is a good example of very limited thinking.
| Aeolun wrote:
| mdatp is also a virus. So slow...
| taspeotis wrote:
| It can record some telemetry to help you understand why
| it's slow: https://learn.microsoft.com/en-us/defender-
| endpoint/troubles...
| digitalsushi wrote:
| If you had used 'some' before 'people' I could agree, but
| some industries have to use a SIEM or they can be fined. So,
| I mean, if there's a list of SIEMs that are definitely never
| going to crash by messing around in the kernel, let's get a
| list going.
| taspeotis wrote:
| Microsoft Sentinel seems like a pretty unlikely candidate
| for SIEM to crash every machine it's receiving data from.
| qaq wrote:
| large orgs want something that will run across all of their
| fleet: Linux servers, Macs, etc.
| taspeotis wrote:
| Linux: https://learn.microsoft.com/en-us/defender-
| endpoint/microsof...
|
| macOS: https://learn.microsoft.com/en-us/defender-
| endpoint/microsof...
|
| It does iOS and Android too.
|
| Again, if you're an organisation big enough to care about
| single-pane-of-glass-monitoring you probably already have
| access to this via the Microsoft 365 license tier you're
| on.
| worik wrote:
| > What are some alternatives to CrowdStrike?
|
| In house competence
| rnts08 wrote:
| But then you can't blame anyone else when shit hits the fan!
| Isn't that what you're really paying for with EDR? No one is
| safe from a targeted attack, regardless of software.
|
| /s
| duckmysick wrote:
| Insurers often require you to have Endpoint Detection and
| Response for all devices, from a third party. In-house
| often won't cut it, even if it makes more practical sense.
| strunz wrote:
| Carbon Black was, though now they're owned by Broadcom and
| folded into Symantec
| TillE wrote:
| Everything that describes itself as "endpoint security".
| iamhamm wrote:
| SentinelOne
| ramesh31 wrote:
| If their (or your) shop is anything like mine, it's been a
| constant whittling of ancillary support roles (SDET, QA, SRE) and
| a shoving of all of the above into the sole responsibility of
| devs over the last few years. None of this is surprising at all.
| nittanymount wrote:
| Does it have competitors?
| xyst wrote:
| Switch off CrowdStrike junk. Those companies renewing contracts
| with them have idiots for leaders.
|
| There are many competing platforms that can be a drop-in
| replacement for ClownStrike.
| paulcole wrote:
| Well if they say that QA was part of the process then they'll
| look like idiots because they sucked at the process.
|
| Don't find this particularly interesting news.
| hinkley wrote:
| I have only just begun to consider this question: when does risk
| taking become thrill seeking?
|
| At some point you go past questions of laziness or discipline and
| it becomes a neurosis. Like an addiction.
| Cyclone_ wrote:
| Not justifying what they did with QC, but QC is missing
| from quite a few places in software development that I've
| been a part of. People might get the impression from the
| article that every software project is well tested, whereas
| in my experience most are rushed out.
| Borborygymus wrote:
| Exactly.
|
| Much of the discourse around this topic has described ideal
| testing and deployment practise. Maybe it's different in
| Silicon Valley or investment banks, but for the sorts of
| companies I work for (telco mostly) things are very far from
| that ideal.
|
| My view of the industry is one of shocking technical ineptitude
| from all but a minority of very competent people who actually
| keep things running... Of management who prioritize short term
| cost reduction over quality at every opportunity, leading to
| appalling technical debt and demoralized, over-worked staff who
| rapidly stop giving a damn about quality, because speaking out
| about quality problems is penalized.
| padjo wrote:
| I've worked for several multi billion dollar software
| companies. None of them had a dedicated QA function by design.
| Everything is about moving fast. That culture is ok if you're
| making entertainment software or low criticality business
| software. It's a very bad idea for critical software.
| Unfortunately the "move fast" attitude has metastasised to
| places where it has no place .
| mattfrommars wrote:
| Side effect of the old adage, "move fast, fail fast"?
| Timber-6539 wrote:
| Doesn't matter now. CRWD didn't go to zero. Meaning they get the
| chance to do this again.
| noisy_boy wrote:
| Would be interesting to hear from their employees whether
| there have been any tangible changes in the aftermath of
| this fiasco: less blind pursuit of velocity, better QA, etc.
| jokoon wrote:
| We need laws and regulations on software the same way we have for
| toys, cars, airplanes, boats, buildings.
|
| This Silicon Valley libertarian nonsense needs to stop.
| jrm4 wrote:
| Does anyone have a logical reason why this company should _not_
| be sued into oblivion?
| superposeur wrote:
| Yes, because in point of fact this company is the best at what
| it does -- preventing security breaches. The outage --
| disruptive as it was -- was not a breach. This elemental fact
| is lost amidst all the knee jerk HN hate, but goes a long way
| toward explaining why the stock only took a modest hit.
| hun3 wrote:
| That's a somewhat narrow definition of "security."
|
| The third component of the CIA triad is often overlooked,
| yet availability is what makes the _protected_ asset--and,
| transitively, the _protection_ itself--useful in the first
| place.
|
| The disruption is effectively a Denial of Service.
| nailer wrote:
| It's a UX designer. I don't particularly like CrowdStrike, but
| this person will know very little about their kernel drivers.
| ricardobayes wrote:
| I believe one of the biggest bad trends of the software industry
| as a whole is cutting down on QA/testing effort. A buggy product
| is almost always an unsuccessful one.
| breadwinner wrote:
| Blame Facebook and Google for that. They became successful
| without QA engineers, so the rest of the industry decided to
| follow suit in an effort to stay modern.
| bitcharmer wrote:
| Another company that got MBA-ified
| sersi wrote:
| Crowdstrike was heavily pushed on us at a previous company both
| for compliance reason by some of our clients (BCG were the ones
| pushing us to use crowdstrike) and from our liability insurance
| company.
|
| It was really an uphill battle to convince everyone not to use
| Crowdstrike. Eventually I managed to, but only after many
| meetings where I had to spend a significant amount of time
| convincing different stakeholders. I'm sure a lot of people
| just fold and go with them.
| mikeocool wrote:
| Curious -- did you go with a different EDR solution? Or were
| you able to convince people not to roll one out at all?
| wesselbindt wrote:
| What made you unwilling to use CS at the time?
| manvillej wrote:
| Anyone feel like this and Boeing sound remarkably similar?
|
| It's almost like there is a lesson for executives here. Hmmmm.
| bitcharmer wrote:
| The only lesson for these people is loss of bonuses. This will
| keep happening for as long as golden parachutes are a thing.
| wesselbindt wrote:
| How can we get rid of golden parachutes?
| bmitc wrote:
| Has anyone _actually_ worked at a place where quality control was
| treated as important? I wouldn't consider this exactly
| surprising.
| 6h6n56 wrote:
| Nope. Did everyone forget the tech motto "move fast and break
| things"? Where is the room for quality control in that
| philosophy?
|
| Corps won't even put resources into anti-fraud efforts if they
| believe the millions being stolen from their bottom line isn't
| worth the effort. I have seen this attitude working in FAANGS.
|
| None of this will change until tech workers stop being
| masochists and actually unionize.
| sudosysgen wrote:
| Yes, at a trading company, where important central systems had
| a multiweek testing process (unless the change was marked as
| urgent, in which case it was faster) with a dedicated team and
| a full replica environment which would replay historical
| functions 1:1 (or in some cases live), and every change needed
| to have an automated rollback process. Unsurprising since it
| directly affects the bottom line.
| bmitc wrote:
| Very interesting. Thanks for sharing.
|
| > every change needed to have an automated rollback process
|
| How did you accomplish that?
| m3047 wrote:
| Yes. It was a manufacturing facility and since the products
| were photosensitive the entire line operated in total darkness.
| It was two months before they turned the lights on and I could
| see what I was programming for.
|
| This was the first place I saw standups. [Edit: this was the
| 1990s] They were run by and for the "meat", the people running
| the line. "Level 2" only got to speak if we were blocked, or to
| briefly describe any new investigations we would be
| undertaking.
|
| Weirdly (maybe?) they didn't drug test. Of all the places
| I've worked, I thought they would. But they didn't. They were
| firmly committed to the "no SPOFs" doctrine and had a "tap out"
| policy: if anyone felt you were distracted, they could "tap you
| out" for the day. It was no fault. I was there for six months
| and three or four times I was tapped out and (after the first
| time, because they asked what I did with my time off the first
| time) told to "go climb a rock". I tapped somebody out once,
| for what later gossip suggested was a family issue.
| insane_dreamer wrote:
| I haven't worked there but I would presume that systems running
| nuclear reactors or ICBM launchers have a strong emphasis on
| QC.
| hitekker wrote:
| I was surprised by how dismissive these comments are. Former
| staff members, engineers included, are claiming that their former
| company's unsafe development culture contributed to a colossal
| world-wide outage & other previous outages. These employees'
| allegations ought to be seen as credible, or at least as
| informative. Instead, many seem to be attacking the UX designer
| commenting on 'Quality control was not part of our process'.
|
| My guess is that people are identifying with the sentence
| said just before: "Speed [of shipping] is everything." Aka
| "Move fast and break things."
|
| The culture described by this article must mirror many of our
| lived experiences. The pure pleasure of shipping code, putting
| out fires, making an impact (positive or negative)... and then
| leaving it to the next engineers & managers to sort out, ignoring
| the mess until it explodes. Even when it does, no one gets blamed
| for the outage and soon everyone goes back to building features
| that get them promoted, regardless of quality.
|
| Through that ZIRP lens, these process failures must look like a
| feature, not a bug. The emphasis on "quality" must also look like
| annoying roadblocks in the way of having fun on the customer's
| dime.
| wesselbindt wrote:
| There's folks out there who enjoy putting out proverbial fires?
| I find rework like that quite frustrating
| MichaelZuo wrote:
| Well there are a handful of expert consultants who do, since
| they charge an eye watering price per hour for putting out
| fires.
| hitekker wrote:
| Absolutely. Some people are born firefighters. Nothing wrong
| with that.
|
| I once worked with a senior engineer who loved running
| incidents. He felt it was _real_ engineering. He loved
| debugging thorny problems on a strict timeline, getting every
| engineer in a room and ordering them about, while also
| communicating widely to the company. Then, there's the rush
| of the all-clear and the kudos from stakeholders.
|
| Specific to his situation, I think he enjoyed the inflated
| ownership that the sudden urgency demanded. The system we
| owned was largely taken for granted by the org; a dead-end
| for a career. Calling incidents was a good way to get
| visibility at low-cost, i.e., no one would follow-up on our
| postmortem action items.
|
| It eventually became a problem, though, when the system we
| owned was essentially put into maintenance mode, aka zero
| development velocity. Then, by my estimate (balancing for
| other variables), the rate at which he called incidents for
| non-incidents went up by 3x...
| oooyay wrote:
| That's called hero culture and there's definitely something
| wrong with it.
| wesselbindt wrote:
| I agree that enjoying firefighting is not inherently
| harmful. However, the situation you describe afterward irks
| me in some way I can't quite put my finger on. A lot of
| words (toxic, dishonest, marketing, counterproductive, bus
| factor) come to mind, but none of them quite fit.
| jamesmotherway wrote:
| Some people rise to the occasion during crises and find it
| rewarding. There's a lot of pop science around COMT (the
| "warrior gene" associated with stress resilience), which I
| take with a grain of salt. There does seem to be something
| there, though, and it overlaps with my personal experience
| that many great security operations people tend to have ADHD
| traits.
| 1000100_1000101 wrote:
| I've volunteered to fight a share of fires from people who
| check things in untested, change infrastructure randomly,
| etc.
|
| What I've learned is that fixing things for these people (and
| even having entire teams fixing things for weeks) just leads
| to a continued lax attitude to testing, and leaving the
| fallout for others to deal with. To them, it all worked out
| in the end, and they get kudos for rapidly getting a solution
| in place.
|
| I'm done fixing their work. I'd rather work on my own tasks
| than fix all the problems with theirs. I'm strongly
| considering moving on, as this has become an entrenched
| pattern.
| righthand wrote:
| Former QA engineer here, and I can confirm quality is seen as
| an annoying roadblock in the way of self-interested workers,
| disguised as being in the way of having fun on the customer's
| dime.
|
| My favorite repeated reorg strategy over the years is "that we
| will train everyone in engineering to be hot swappable in their
| domains". Talk about spinning wheels.
| ClickedUp wrote:
| This is not a game. I would normally agree but not when it
| comes to low-level kernel drivers. They're a cyber security
| company, which makes it even worse.
|
| Not very long ago we had this client who ordered a custom high
| security solution (using a kernel driver). I can't reveal too
| much but basically they had this offline computer running this
| critical database and they needed a way to account for every
| single system call to guarantee that any data could have not
| been changed without the security system alerting and logging
| the exact change. No backups etc were allowed to leave the
| computer ever. We were even required to check ntdll (this was
| on Windows) for hooks before installing the driver on-site,
| among other safety precautions. Exceptions, freezes or a
| deadlock? No way. Any system call missed = disaster.
|
| We took this seriously. Whenever we made a change to the driver
| code we had to re-test the driver on 7 different computers (in-
| office) running completely different hardware doing a set test
| procedure. Last test before release entailed an even more
| extensive test procedure.
|
| This may sound harsh, but CrowdStrike are total amateurs, and
| always have been. Besides, what have they contributed to the
| cyber security community? - Nothing! Their research is at the
| level of a junior cyber security researcher. They are willing
| to outright lie and jump to wild conclusions, which is very
| frowned upon in the community. I've also heard others comment
| on how CS really doesn't fit the mold of a standard cyber
| security company.
|
| Nah, CS should take a close look at true professional companies
| like Kaspersky and Checkpoint; industry leaders who've created
| proven top-notch security solutions (software/services) and,
| not least, actually contributed their valuable research to the
| community for free, catching zero-days and reporting them
| before anyone even had a chance of exploiting them.
|
| They deserve some criticism.
| musicale wrote:
| I don't trust Kaspersky and Checkpoint either. But CS should
| exit the market.
| addled wrote:
| Yesterday morning I learned that someone I was acquainted with
| had just passed away and the funeral is scheduled for next week.
|
| They recently had a stroke at home just days after spending over
| a month in the hospital.
|
| Then I remembered that they were originally supposed to be
| getting an important surgery, but it was delayed because of the
| CrowdStrike outage. It took weeks for the stars to align again
| and the surgery to happen.
|
| It makes me wonder what the outcome would have been if they had
| gotten the surgery done that day, and not spent those extra weeks
| in the hospital with their condition and stressing about their
| future?
| oehpr wrote:
| I appreciate your post here and I'm glad you shared, because
| it's an example of a distributed harm. One of millions to shake
| out of this incident, that doesn't have a dollar figure, so it
| doesn't really "count".
|
| To illustrate:
|
| If I were to do something horrible like kick a 3-year-old's knee
| out and cripple them for life, I would be rightly labeled a
| monster.
|
| But if I were to, say, advocate for education reform to push
| American Sign Language out of schools, so that deaf children
| grow up without a developmental language? We don't have words
| for that, and if we did, none of them would get near the
| cumulative scope and harm of that act.
|
| We simply do not address distributed harms correctly. And a big
| part of it is that we don't, we _can't_, see all the tangible
| harms they cause.
| namdnay wrote:
| Not to defend Crowdstrike in any way, but it's a bit unfair to
| only look at the downside. What if his hospital hadn't bought
| an antivirus, and got hit by ransomware?
| SlightlyLeftPad wrote:
| Just another example of technical leadership being completely
| irresponsible, and of tech companies prioritizing the wrong
| things. For a security company, this completely blows their
| credibility. I'm not convinced they learned anything from
| this, and I don't expect this event to change anything. This
| is a culture issue, not a technical one. One RCA isn't going
| to change it.
|
| Reliability is a critical facet of security from a business
| continuity standpoint. Any business still using crowdstrike is
| out of their mind.
___________________________________________________________________
(page generated 2024-09-14 23:01 UTC)