[HN Gopher] Initial details about why CrowdStrike's CSAgent.sys ...
       ___________________________________________________________________
        
       Initial details about why CrowdStrike's CSAgent.sys crashed
        
       Author : pilfered
       Score  : 466 points
       Date   : 2024-07-21 00:17 UTC (22 hours ago)
        
 (HTM) web link (twitter.com)
 (TXT) w3m dump (twitter.com)
        
       | blirio wrote:
       | So is unmapped address another way of saying null pointer?
        
         | two_handfuls wrote:
         | It's an invalid pointer yes, but it doesn't say whether it's
         | null specifically.
        
           | blirio wrote:
           | Oh wait, I just remembered null is normally 0 in C and C++.
           | So probably not that if it is not 0.
        
             | taspeotis wrote:
              | What? If you have a null pointer to a class, and try to
              | reference the member that starts 156 bytes from the start
              | of the class, you'll dereference 0x9c (0 + 156).
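              | 
              | As a sketch (hypothetical struct, not CrowdStrike's
              | actual layout):
              | 
              |     #include <stddef.h>
              |     #include <stdio.h>
              | 
              |     struct widget {
              |         char header[156]; /* the first 156 bytes */
              |         int  flags;       /* at offset 156 == 0x9c */
              |     };
              | 
              |     int main(void) {
              |         struct widget *w = NULL;
              |         printf("%zu\n", offsetof(struct widget, flags));
              |         return w->flags; /* read of 0 + 0x9c: faults at
              |                             address 0x9c */
              |     }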
        
               | emmelaich wrote:
               | Strangely, not necessarily on every implementation on
               | every processor.
               | 
               | It's not guaranteed that NULL is 0.
               | 
               | Still, I don't think you'd find a counterexample in the
               | wild these days.
        
             | chongli wrote:
             | NULL isn't always the integer 0 in C. It's implementation-
             | defined.
        
               | loeg wrote:
               | In every real world implementation anyone cares about,
               | it's zero. Also I believe it is defined to compare equal
               | to zero in the standard, but don't quote me on that.
        
               | tzs wrote:
               | > Also I believe it is defined to compare equal to zero
               | in the standard, but don't quote me on that.
               | 
               | That's true for the literal constant 0. For 0 in a
               | variable it is not necessarily true. Basically when a
               | literal 0 is assigned to a pointer or compared to a
               | pointer the compiler takes that 0 to mean whatever bit
               | pattern represents the null pointer on the target system.
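                | 
                | A sketch of the distinction (only exotic or historical
                | targets would ever behave differently):
                | 
                |     #include <string.h>
                | 
                |     int main(void) {
                |         int *p = 0; /* literal 0: null pointer constant */
                |         int *q;
                |         memset(&q, 0, sizeof q); /* all-bits-zero: NOT
                |                                     guaranteed null */
                |         /* (p == 0) is always true: the 0 converts to a
                |            null pointer before comparing. (q == 0) is
                |            only true where null happens to be all-bits-
                |            zero - i.e. on every mainstream platform. */
                |         return (p == 0) && (q == 0);
                |     }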
        
             | cmpxchg8b wrote:
             | If you have a page mapped at address 0, accessing address 0
             | is valid.
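              | 
              | On Linux, for example, this is possible (a sketch; it
              | needs vm.mmap_min_addr=0 and the right privileges, which
              | is exactly why address 0 is normally unmapped):
              | 
              |     #include <stdio.h>
              |     #include <sys/mman.h>
              | 
              |     int main(void) {
              |         void *p = mmap((void *)0, 4096,
              |                        PROT_READ | PROT_WRITE,
              |                        MAP_PRIVATE | MAP_ANONYMOUS |
              |                        MAP_FIXED, -1, 0);
              |         if (p == (void *)0) /* normally mmap refuses */
              |             puts("page at 0; *(int *)0 is now valid");
              |         return 0;
              |     }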
        
           | cratermoon wrote:
           | Looks like a null pointer error to me
           | https://www.youtube.com/watch?v=pCxvyIx922A
        
             | jeffbee wrote:
             | "Attempt to read from address 0x9c" doesn't strike me as
             | "null pointer". It's an invalid address and it doesn't
             | really matter if it was null or not.
        
               | GeneralMayhem wrote:
               | 0x9c (156 dec) is still a very small number, all things
               | considered. To me that sounds like attempting to access
               | an offset from null - for instance, using a null pointer
               | to a struct type, and trying to access one of its member
               | fields.
        
               | Aloisius wrote:
               | Could just as easily be accessing an uninitialized
               | pointer, especially given there is a null check
               | immediately before.
        
               | Dwedit wrote:
               | 9C means that it's a NULL address plus some offset of 9C.
               | Like a particular field of a struct.
        
               | loeg wrote:
               | It is pretty common for null pointers to structures to
               | have members dereferenced at small offsets, and people
               | usually consider those null dereferences despite not
               | literally being 0. (However, the assembly generated in
               | this case does not match that access pattern, and in fact
               | there was an explicit null check before the dereference.)
        
               | jmb99 wrote:
               | As an example to illustrate the sibling comments'
               | explanations:
               | 
                |     char *array = NULL;
                | 
                |     int pos = 0x9C;
                | 
                |     char a = array[pos]; // equivalent to *(array + 0x9C):
                |                          // dereferencing NULL+0x9C = 0x9C
                | 
                | This will segfault (or equivalent) due to reading
                | invalid memory at address 0x9C. Most people would
                | casually call array[pos] a null pointer dereference,
                | even though it's actually a 0x9C pointer dereference,
                | because there's very little effective difference
                | between them.
               | 
               | Now, whether this case was actually something like this
               | (dereferencing some element of a null array pointer) or
               | something like type confusion (value 0x9C was supposed to
               | be loaded into an int, or char, or some other non-pointer
               | type) isn't clear to me. But I haven't dug into it
               | really, someone smarter than me could probably figure out
               | which it is.
        
               | UncleMeat wrote:
               | Except we don't see the instructions you'd expect to see
               | if the code was as you describe.
               | 
               | https://x.com/taviso/status/1814762302337654829
        
               | jeffbee wrote:
               | What we are witnessing quite starkly in this thread is
               | that the majority of HN commenters are the kinds of
               | people exposed to anti-woke/DEI culture warriors on
               | Twitter.
        
               | stravant wrote:
               | Such an invalid access of a very small address probably
               | does result from a nullptr error:
                |     struct BigObject {
                |         char stuff[0x9c]; // random fields
                |         int field;
                |     };
                | 
                |     BigObject* object = nullptr;
                |     printf("%d", object->field);
                | 
                | That will result in "Attempt to read from address 0x9c".
                | Just because it's not trying to read from literal
                | address 0x0 doesn't mean it's not a nullptr error.
        
             | phire wrote:
             | Probably not.
             | 
             | R8 is 0x9c in that example, which is somewhat typical for
             | null+offset, but in the twitter thread it's
             | 0xffff9c8e0000008a.
             | 
             | So the actual bug is further back. It's not a null pointer
             | dereference, but it somehow results in the mov r8,
             | [rax+r11*8] instruction reading random data (could be
             | anything) into r8, which then gets used as a pointer.
             | 
             | Maybe this is a use-after-free?
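              | 
              | In C terms, the pattern would be something like this
              | (hypothetical names, just to illustrate the access):
              | 
              |     #include <stdint.h>
              | 
              |     struct entry { int some_field; };
              | 
              |     /* mov r8, [rax+r11*8] is an indexed load from a
              |        table of pointers... */
              |     int read_field(struct entry **table, uint64_t i) {
              |         struct entry *e = table[i]; /* stale table or bad
              |                                        index: e ends up as
              |                                        garbage like
              |                                        0xffff9c8e0000008a */
              |         return e->some_field; /* ...then used as a
              |                                  pointer: wild read of an
              |                                  unmapped address */
              |     }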
        
         | saagarjha wrote:
         | It seems unlikely that it's a null pointer:
         | https://twitter.com/taviso/status/1814762302337654829
        
         | leeter wrote:
          | No, this is kernelspace, and so while all addresses are
          | 'virtual', an unmapped address is an address that hasn't been
          | mapped in the page tables. Normally critical kernel drivers
          | and data are marked as non-pageable (note: the Linux kernel
          | doesn't page kernel memory; the NT kernel does, a legacy of
          | when it was first written and the memory constraints of the
          | time). So a driver that needs to access pageable data must
          | not be part of the storage flow (and Crowdstrike is almost
          | certainly part of it), and must be at the correct IRQL (the
          | interrupt priority level; anything above dispatch, AKA the
          | scheduler, has severe restrictions on what can happen there).
          | 
          | So no, an unmapped address is a completely different BSOD,
          | usually PAGE_FAULT_IN_NONPAGED_AREA, which is a very bad sign.
        
           | jkrejcha wrote:
           | PAGE_FAULT_IN_NONPAGED_AREA[1]... was the BSOD that occurred
           | in this case. That's basically the first sign that it was a
           | bad pointer dereference in the first place.
           | 
           | (DRIVER_)IRQL_NOT_LESS_OR_EQUAL[2][3] is not this case, but
           | it's probably one of the most common reasons drivers crash
           | the system generally. Like you said it's basically attempting
           | to access pageable memory at a time that paging isn't allowed
           | (i.e. when at DISPATCH_LEVEL or higher).
           | 
           | [1]: https://learn.microsoft.com/en-us/windows-
           | hardware/drivers/d...
           | 
           | [2]: https://learn.microsoft.com/en-us/windows-
           | hardware/drivers/d...
           | 
           | [3]: https://learn.microsoft.com/en-us/windows-
           | hardware/drivers/d...
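            | 
            | A sketch of that second failure mode in a toy driver (not
            | what happened here - CSAgent.sys faulted on an unmapped
            | address, per above):
            | 
            |     #include <ntddk.h>
            | 
            |     VOID Demo(VOID)
            |     {
            |         KIRQL oldIrql;
            |         /* PagedPool memory may be paged out at any time */
            |         PVOID buf = ExAllocatePoolWithTag(PagedPool, 64,
            |                                           'omeD');
            |         if (buf == NULL) return;
            |         KeRaiseIrql(DISPATCH_LEVEL, &oldIrql);
            |         /* At DISPATCH_LEVEL the pager cannot run, so if
            |            buf is paged out this bugchecks with
            |            (DRIVER_)IRQL_NOT_LESS_OR_EQUAL: */
            |         RtlZeroMemory(buf, 64);
            |         KeLowerIrql(oldIrql);
            |         ExFreePoolWithTag(buf, 'omeD');
            |     }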
        
         | loeg wrote:
         | No; lots of virtual addresses are not mapped. Null is a subset
         | of all unmapped addresses.
        
       | qmarchi wrote:
        | Meta conversation: X hid both of the responses under "Show
        | probable spam" even though they were pretty valid, with one
        | even getting a reply from the thread's creator.
        | 
        | I just don't understand how they still have users.
        
         | honeybadger1 wrote:
          | I believe that is dependent on your account settings. As an
          | example, I block all comments from accounts that do not have
          | a verified phone number, and those get dropped into that
          | bucket.
        
         | fireflies_ wrote:
         | > I just don't understand how they still have users.
         | 
         | Because this post is here and not somewhere else. Strong
         | network effects.
        
         | hipadev23 wrote:
         | There's literally not a better alternative and nobody seems to
         | be earnestly trying to fill that gap. Threads is boomer chat
         | with an instagram requirement. Every Mastodon instance is slow
         | beyond reason and it's still confusing to regular users in
         | terms of how it works. And is Bluesky still invite only?
         | Honestly haven't heard about it in a long time.
        
           | honeybadger1 wrote:
            | It is the best internet social feed to me as well. I use
            | Pro a lot for following different communities, and there
            | is nothing today that comes close to being on the edge of
            | change online.
        
           | ric2b wrote:
            | Mastodon doesn't feel any slower to me than Twitter. Maybe
            | I got lucky, going by your description?
        
             | MBCook wrote:
             | Same. I have no issues at all on Mastodon. I'm quite happy
             | with it.
        
             | r2vcap wrote:
              | Maybe the experience varies depending on where the user
              | is located. Users near the Mastodon servers (possibly on
              | the US East or West Coast) may not feel the slowness as
              | much as users in other parts of the world. I see
              | noticeably slower response times when I use Mastodon
              | from my location (Korea).
        
               | robjan wrote:
               | I think a lot of people use Hetzner. I notice slowness,
               | especially with media, in Hong Kong. A workaround I've
               | found is to use VPNs which seem to utilise networks with
               | better peering with local ISPs
        
           | cageface wrote:
           | All the people I know that are still active on Twitter
           | because they need to be "informed" are constantly sending me
           | alarmist "news" that breaks on Twitter that, far more often
           | than not, turns out to be wrong.
        
           | lutoma wrote:
           | > Every Mastodon instance is slow beyond reason and it's
           | still confusing to regular users in terms of how it works.
           | 
           | I'll concede the confusing part but all the major Mastodon
           | servers I interact with regularly are pretty quick so I'm not
           | sure where that part comes from.
        
             | Lt_Riza_Hawkeye wrote:
              | It is not so bad with Mastodon, but much fedi software
              | gets slower the longer it's been running. "Akkoma Rot" is
              | the one that's typically most talked about, but the
              | universe of Misskey forks experiences the same problems,
              | and Mastodon can sometimes absolutely crunch to a halt on
              | 4GB of RAM even for a single-user instance.
        
           | add-sub-mul-div wrote:
           | > And is Bluesky still invite only?
           | 
           | Not since February. But it's for the best that the Eternal
           | September has remained quarantined on Twitter.
        
           | TechSquidTV wrote:
            | Mastodon is a PERFECT replacement. But it'll never win
            | because there isn't a business propping it up and there is
            | inherent complexity, mixed with the biggest problem: cost.
            | 
            | No one wants to pay for anything, and that's the true root
            | of every issue around this. People complain YouTube has
            | ads, but won't buy Premium. People hate Elon and Twitter
            | but won't take even an ounce of temporary inconvenience to
            | try and solve it.
            | 
            | Threads exists, and I'm happy they integrate with
            | ActivityPub, which should give us the best of both worlds.
            | Why don't people use Threads? It's a little more popular
            | outside the US, but personally I think the "algorithm"
            | pushes a lot of engagement-bait nonsense.
        
             | doodlebugging wrote:
             | >No one wants to pay for anything, and that's the true root
             | of every issue around this. People complain YouTube has
             | ads, but wont buy premium.
             | 
             | Perhaps if buying into a service guaranteed that they would
             | not be sold out then there would be more engagement. When
             | someone signs up it is pretty much a rock-hard guarantee
             | that their personal information will be marketed and sold
             | to any entity with the money and interest to buy it -
             | paying customers, free-loaders, etc.
             | 
             | When someone chooses to buy your app or SaaS then they
             | should be excluded from the list of users that you sell or
             | trade between "business partners".
             | 
              | When paying for a service still means that all details of
              | your engagement with that service are sold to unrelated
              | business entities, you have a disincentive to pay.
             | 
             | People are wising up to all this PII harvesting and those
             | clowns who sold everyone out need to find a different model
             | or quit bitching when real people choose to avoid their
             | "services" since most of these things are not necessary for
             | people to enjoy life anyway. They are distractions.
             | 
             | EDIT: This is not intended as a personal attack on you but
             | is instead a general observation from the perspective of
             | someone who does not use or pay for any apps or SaaS
             | services and who actively avoids handing out accurate
             | personal information when the opportunity arises.
        
             | jnurmine wrote:
             | Mastodon - mixed feelings.
             | 
             | In my experience, Mastodon is nice until you want to
             | partake in discussions. To do so, you need an account.
             | 
             | With an account you can engage in civilized discussions.
             | Some people don't agree with you, and you don't agree with
             | some people. That's fine, maybe you'll learn something new.
             | It's a discussion.
             | 
             | And then, suddenly, a secret court convenes and kills your
             | account just like that; no reason will be given, no
             | recourse will be available, admins won't reply, and you can
             | do two things: go away for good, or try again on a
             | different server.
             | 
             | I'm happy with a read-only Mastodon via a web interface.
             | 
             | But read-write? Never again, I probably don't have the
             | correct ideology for it.
        
           | fragmede wrote:
           | > Threads is boomer chat with an instagram requirement.
           | 
           | You're being too dismissive of Threads. It's fine, there are
           | adults there.
           | 
           | What weirdo doesn't have an insta?
        
             | macintux wrote:
             | Some of us stay far, far away from Facebook.
        
             | II2II wrote:
             | _raises hand_
             | 
             | Some people don't jump on every fad out there. Most of the
             | people who miss out on fads quickly realize that they
             | aren't losing out on much simply because fads are so
             | ephemeral. As far as I can tell, this is normal (though
             | different people will come to that realization at different
             | stages of their life).
        
               | fragmede wrote:
                | Facebook is going to run Threads for as long as it
                | wants; time will tell if it's a fad or not. Is ChatGPT
                | a fad?
        
               | II2II wrote:
               | While a fad (in this context) depends upon a company
               | maintaining a product, the act of maintaining a product
               | is not a measure of how long the fad lasts. Take
               | Facebook, the product. I'm fairly certain that it is long
               | past its peak as a communications tool between family,
               | friends, and colleagues. Facebook, the company, remains
               | relevant for other reasons.
               | 
               | As for ChatGPT, I'm sure time will prove it is a fad.
               | That doesn't mean that LLMs are a fad (though it is too
               | early to tell).
        
             | zdragnar wrote:
             | I don't have any social media of any kind, unless you count
             | HN.
             | 
             | My wife only uses Facebook, and even then pretty sparingly.
        
             | shzhdbi09gv8ioi wrote:
             | I never had insta. Why would anyone use that.
        
             | mardifoufs wrote:
             | Sadly enough the "average" instagram user doesn't use
             | threads. It's just a weird subset of them that use it, and
             | imo it's not the subset that makes Instagram great lol.
             | (It's a lot of pre 2021 twitter refugees, and that's an
             | incredibly obnoxious and self centered crowd in my
             | experience)
        
           | shzhdbi09gv8ioi wrote:
            | Strange take... Mastodon is where a lot of the IT
            | discussion happens these days.
            | 
            | The quality-to-crap ratio is stellar on Mastodon. Not so
            | much anywhere else.
        
         | ants_everywhere wrote:
         | Relatedly, it's crazy to me how many people still get their
         | news from X. I mean serious people, not just Joe Schmoe.
         | 
         | The probable spam thing was nuts to me too. My guess was it's
         | maybe trying to detect users with lower engagement. Like people
         | who aren't moving the investigation forward but are trying to
         | follow it and be in the discussion.
        
           | pyinstallwoes wrote:
           | Relatedly, it's crazy to me how many people still get news
           | from the Sunday times!
        
             | jen729w wrote:
             | Relatedly, it's crazy to me how many people still read the
             | news!
        
           | AnthonyMouse wrote:
           | One of the things to keep in mind is that Twitter had most of
           | these misfeatures before Musk bought it.
           | 
           | The basic problem is, no moderation results in a deluge of
           | spam and algorithmic moderation is hot garbage that can only
           | filter out the bulk of the spam by also filtering out like
           | half of the legitimate comments. Human moderation is
           | prohibitively expensive unless you want to hire Mechanical
           | Turk-level moderators and not give them enough time to do a
           | good job, in which case you're back to hot garbage.
           | 
           | Nobody really knows how to solve it outside of the knob
           | everybody knows about that can improve the false negative
           | rate at the expense of the false positive rate or vice versa.
           | Do you want less ham or more spam?
        
             | ants_everywhere wrote:
             | I agree the problem is hard from a technical level.
             | 
             | The problem is also getting significantly worse because
             | it's trivial to generate entire pages of inorganic content
             | with LLMs.
             | 
             | The backstories of inorganic accounts are also much more
             | convincing now that they can be generated by LLMs. Before
             | LLMs, backstories all focused on a small handful of topics
             | (e.g. sports, games) because humans had to generate them
          | from playbooks of best practices. Now they can be into
             | almost anything.
        
               | pyinstallwoes wrote:
               | If you can't tell, is it spam?
        
           | ungreased0675 wrote:
           | When something big happens, Twitter is probably the best
           | place to get real time information from people on location.
           | 
           | Most everything else goes through a filter and pasteurization
           | before public consumption.
        
         | dclowd9901 wrote:
         | I had to log in to see responses. Pretty sure that's how they
         | still have users.
        
           | pyinstallwoes wrote:
           | How's that logic work when the platform depends upon content?
        
         | Jimmc414 wrote:
         | I use X solely for the AI discussions and I actively curate who
         | I follow, but where is there a better platform to join in
         | conversations with the top 500 people in a particular field?
         | 
          | I always assumed that legit answers often fall under "Show
          | probable spam" because of the inevitable reports coming in
          | on controversial topics. It seems like the community notes
          | feature works well most of the time.
        
         | wrycoder wrote:
         | When I see that, I usually upvote it.
        
         | mardifoufs wrote:
          | If bad spam detection were such a big issue for a social
          | platform, YouTube wouldn't be used by anyone ;). In fact it's
          | even worse on YouTube: it's the same pattern of accounts with
          | weird profile pictures copy-pasting an existing comment as-is
          | and posting it, across thousands of videos, and it's been
          | going on for a year now. It's actually so basic that I really
          | wonder if there's some other secret sauce to those bots that
          | makes them undetectable.
        
           | omoikane wrote:
           | Well if it's just the comments, I think a lot of people just
           | don't read those. In fact, it's a fair bit of effort just to
           | read the descriptions with the YouTube app on some devices
           | (e.g. smart TVs), and it's really not worth the effort to
           | read the comments when users can just move on to the next
           | video.
        
             | mardifoufs wrote:
              | I don't necessarily think that's true anymore. YouTube
              | comments are important to the algorithm, so creators are
              | more and more active in the comment section, and the
              | comments in general have become a lot more alive and
              | often add a lot of context or info for some types of
              | videos. YouTube has also started giving the comments a
              | lot more visibility in the layout (more than, say, the
              | video description). But you're probably right w.r.t.
              | platforms like TVs.
              | 
              | Before this wave of insane bot spam, the comments had
              | started to be so much better than what they used to be
              | (low-effort boomer spam). In fact I think they were much
              | better than the absolute cringy mess that comments on
              | dedicated forums like Reddit turned into.
        
         | ascorbic wrote:
          | I'd go so far as to say that almost all responses I see
          | under "probable spam" are legitimate. Meanwhile real spam is
          | everywhere in the replies, and most ads are dropshipped crap
          | and crypto scams with community notes. It's far worse than
          | it's ever been before.
        
       | js2 wrote:
       | https://threadreaderapp.com/thread/1814343502886477857.html
        
       | MBCook wrote:
       | https://twitter-thread.com/t/1814343502886477857
        
       | Fr0styMatt88 wrote:
       | The scarier thought I've had -- if a black hat had discovered
       | this crash case, could it have been turned into a widely deployed
       | code execution vulnerability?
        
         | MBCook wrote:
         | I had that same one. If loading a file crashed the kernel
         | module, could it have been exploitable? Or was there a
         | different exploitable bug in there?
         | 
         | Did any nation states/other groups have 0-days on this?
         | 
         | Did this event reveal something known to the public, or did
         | this screw up accidentally protect us from someone finding +
         | exploiting this in the future?
        
         | plorkyeran wrote:
         | Shockingly it turns out that installing a rootkit can have some
         | negative security implications.
        
           | llm_trw wrote:
           | Trying to explain to execs that giving someone root access to
           | your computers means they have root access to your computers
           | is surprisingly difficult.
        
             | tonetegeatinst wrote:
              | I mean kernel-level access does provide features not
              | accessible in userspace. Is it also overused when other
              | solutions exist? You bet.
              | 
              | Most people don't need this stuff. Just keep shit up to
              | date - no, not on the nightly build branch, but like
              | installing Windows updates at least a day or two after
              | they come out. Or maybe regular antivirus scans.
              | 
              | But let's be honest, your kernel drivers are useless if
              | your employees fall for phishing or social engineering.
              | Then it's not malware, it's an authorized user on the
              | system... just copying data onto a USB drive, or a rogue
              | employee taking your customer list to your competition.
              | That fancy-pants kernel driver might be really good at
              | stopping sophisticated threats, and I'm sure the
              | marketing majors at any company cram products full of
              | buzzwords. But remember, you can't fix incompetent or
              | malicious employees unless you're taking steps to
              | prevent it.
              | 
              | What's more likely: some foreign government hacking
              | Kohl's? Or a script kiddie social-engineering some poor
              | worker by pretending to be the support desk?
              | 
              | Not here to shit on this product, it has its place and it
              | obviously does a good job... (heard it's expensive but
              | most XDR/EDR is)
              | 
              | Seems like we are learning how vulnerable certain things
              | are once again. As a fellow security person, I must say
              | that Jia Tan must be so envious that he couldn't have
              | this level of market impact.
        
             | rdtsc wrote:
             | Start a story for them: "and then, the hackers managed to
             | install a rootkit which runs in kernel mode. The rootkit
              | has a sophisticated C2 mechanism, with configuration
              | files pretending to be drivers suffixed with .sys
              | extensions. And
             | then, they used that to prevent hospitals and 911 systems
             | around the world from working, resulting in delayed
             | emergency responses, injuries, possibly deaths".
             | 
             | After they cuss the hackers under their breath exclaiming
             | something like: "they should be locked up in jail for the
             | rest of their lives!...", tell them that's exactly what
             | happened, but CS were the hackers, and maybe they should
             | reconsider mandating installing that crap everywhere.
        
         | naveen99 wrote:
          | The hard part is the deploying. Yes, if you can get control
          | of the CrowdStrike deployment machinery, you can do whatever
          | you want on hundreds of millions of machines. But you don't
          | need any vulnerabilities in the CrowdStrike-deployed
          | software for that, only in the deploying servers.
        
           | tranceylc wrote:
           | Call me crazy but that is a real worry for me, and has been
           | for a while. How long until we see some large corporate
           | software have their deployment process hijacked, and have it
           | affect a ton of computers that auto-update?
        
             | spydum wrote:
              | I mean, isn't that roughly the SolarWinds story? There is
              | no real shortage of supply-chain incidents in the last
              | few years. The reality is we are all mostly okay with
              | that tradeoff.
        
             | jen20 wrote:
             | Around -4 years? [1]
             | 
             | [1]: https://en.wikipedia.org/wiki/2020_United_States_feder
             | al_gov...
        
             | alsodumb wrote:
             | You mean like the SolarWinds hack that happened a lil while
             | ago?
             | 
             | https://www.techtarget.com/whatis/feature/SolarWinds-hack-
             | ex...
        
             | btown wrote:
             | One of the most dangerous versions of this IMO is someone
             | who compromises a NPM/Pypi package that's widely used as a
             | dependency. If you can make it so that the original
             | developer doesn't know you've compromised their accounts
             | (spear-phished SIM swap + email compromise while the target
             | is traveling, for instance, or simply compromising the
             | developer themselves), you don't need every downstream user
             | to manually update - you just need enough projects that
             | aren't properly configured with lockfiles, and you've got
             | code execution on a huge number of servers.
             | 
             | I'm hopeful that the fallout from Crowdstrike will be a
             | larger emphasis on software BOM risk - when your systems
             | regularly phone home for updates, you're at the mercy of
             | the weakest link in that chain, and that applies to CI/CD
             | and end user devices alike.
        
               | IncreasePosts wrote:
                | It makes me wonder how many software libraries core to
                | modern infrastructure could be compromised by merely
                | threatening a single person.
        
               | jmb99 wrote:
               | As always, a relevant xkcd[1]. I would not be surprised
               | if the answer to "how many machines can be compromised in
               | 24 hours by threatening one person" was less than 8
               | figures. If you can find the right person, probably 9+.
               | 
               | [1] https://xkcd.com/2347/
        
               | leni536 wrote:
               | Just compromise one popular vim plugin and you have dev
               | access to half of the industry.
        
           | inferiorhuman wrote:
            | > if you can get control of the crowdstrike deployment
            | > machinery
           | 
           | Or combine a lack of certificate pinning with BGP hijacking.
        
         | Murky3515 wrote:
          | Probably would've been used to mine bitcoin before it was
          | patched.
        
         | phire wrote:
         | No.
         | 
         | To trigger the crash, you need to write a bad file into
         | C:\Windows\System32\drivers\CrowdStrike\
         | 
         | You need Administrator permissions to write a file there, which
         | means you already have code execution permissions, and don't
         | need an exploit.
         | 
          | The only people who can trigger it over the network are
          | CrowdStrike themselves... or a malicious entity inside their
          | system who controls both their update signing keys and the
          | update endpoint.
        
           | cyrnel wrote:
           | Anyone know if the updates use outbound HTTPS requests? If
           | so, those companies that have crappy TLS terminating outbound
           | proxies are looking juicy. And if they aren't pinning certs
           | or using CAA, I'm sure a $5 wrench[1] could convince one of
           | the lesser certificate authorities to sign a cert for
           | whatever domain they're using.
           | 
           | [1]: https://xkcd.com/538/
        
             | phire wrote:
             | The update files are almost certainly signed.
             | 
             | Even if the HTTPS channel is compromised with a man-in-the-
             | middle attack, the attacker shouldn't be able to craft a
              | valid update, unless they also compromised CrowdStrike's
             | keys.
             | 
             | However, the fact that this update apparently managed to
             | bypass any internal testing or staging release channels
             | makes me question how good CrowdStrike's procedures are
             | about securing those update keys.
        
               | cyrnel wrote:
               | Depends when/how the signature is checked. I could
               | imagine a signature being embedded in the file itself, or
               | the file could be partially parsed before the signature
               | is checked.
               | 
                | It's wild to me that it's so normal to install software
                | like this on critical infrastructure, but the details
                | of how they do code signing are a closely
                | guarded/obfuscated secret.
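                | 
                | i.e. the difference between these orderings (a sketch
                | with invented helper names):
                | 
                |     #include <stddef.h>
                | 
                |     /* Hypothetical stand-ins; the point is the order,
                |        not the API. */
                |     static int signature_ok(const unsigned char *b,
                |                             size_t n)
                |     { (void)b; (void)n; return 0; }
                |     static void parse_update(const unsigned char *b,
                |                              size_t n)
                |     { (void)b; (void)n; }
                | 
                |     void apply_update(const unsigned char *buf,
                |                       size_t len)
                |     {
                |         /* Safe: verify the signature over the raw
                |            bytes first, so no attacker-controlled
                |            field is ever trusted... */
                |         if (!signature_ok(buf, len))
                |             return;
                |         /* ...and only then parse. Parsing first hands
                |            raw bytes to a (kernel-mode!) parser before
                |            any check. */
                |         parse_update(buf, len);
                |     }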
        
               | jmb99 wrote:
                | Kind of a side tangent, but I'm currently (begrudgingly)
               | working on a project with a Fortune 20 company that
               | involves a complicated mess of PKI management, custom
               | (read: non-standard) certificates, a variety of
               | management/logging/debugging keys, and (critically) code
               | signing. It's taken me months of pulling teeth just to
               | get details about the hierarchy and how the PKI is
               | supposed to work from my own coworkers in a different
               | department (who are in charge of the project), let alone
               | from the client. I still have absolutely 0 idea how they
               | perform code signing, how it's validated, or how I can
               | test that the non-standard certificates can validate this
               | black-hole-box code signing process. So yeah, companies
               | really don't like sharing details about code signing.
        
               | phire wrote:
               | Sure, it's certainly possible.
               | 
               | Though, I prefer to give people benefit of doubt for this
               | type of thing. IMO, the level of incompetence to parse a
               | binary file before checking the signature is
               | significantly higher (or at least different) than simply
               | pushing out a bad update (even if the latter produces a
               | much more spectacular result).
               | 
               | Besides, we don't need to speculate. We have the driver.
               | We have the signature files [1]. Because of the
               | publicity, I bet thousands of people are throwing it into
               | Binary RE tools right now, and if they are doing
               | something as stupid as parsing a binary file before
                | checking its signature (or not checking a signature at
               | all), I'm sure we will hear about it.
               | 
               | We can't see how it was signed because that's happening
                | on CrowdStrike's infrastructure, but checking the
               | signature verification code is trivial.
               | 
               | [1] Both in this zip file: https://drive.google.com/file/
               | d/1OVIWLDMN9xzYv8L391V1ob2ghp8...
        
               | emmelaich wrote:
               | See my speculation above.
               | 
               | https://news.ycombinator.com/item?id=41022110
        
             | gruez wrote:
              | That's assuming they don't do cert pinning. Moreover,
              | despite all the evil things you can supposedly do with a $5
             | wrench, I'm not aware of any documented cases of this sort
             | of attack happening. The closest we've seen are
             | misissuances seemingly caused by buggy code.
        
             | emmelaich wrote:
              | My speculation is that the bit of code/data that was
              | broken is added after the build and testing, _precisely
              | to avoid_ the $5 wrench attack.
             | 
             | That is, the data is signed and they don't want to use the
             | real signing key during testing / in the continuous build
             | because then it is too exposed.
             | 
             | So it's added after as something that "could not break".
             | But it of course did.
        
               | phire wrote:
               | I can think of a bunch of different answers:
               | 
               | This wasn't a code update, just a configuration update.
                | Maybe they don't put config updates through QA at all,
               | assuming they are safe.
               | 
               | It's possible that QA is different enough from production
               | (for example debug builds, or signature checking
               | disabled) that it didn't detect this bug.
               | 
               | Might be an ordering issue, and that they tested applying
               | update A then update B, but pushed out update B first.
               | 
               | The fact that it instantly went out to all channels is
               | interesting. Maybe they tested it for the beta channel it
               | was meant for (and it worked, because that version of the
               | driver knew how to cope with that config) but then
               | accidentally pushed it out to all channels, and the older
               | versions had no idea what to do wiht it.
               | 
               | Or maybe they though they were only sending it to their
               | QA systems but pushed the wrong button and sent it out
               | everywhere.
        
               | emmelaich wrote:
               | > _This wasn 't a code update, just a configuration
               | update_
               | 
               | Configuration is data, data is code.
        
           | Animats wrote:
           | How does it validate the updates, exactly?
           | 
           | Microsoft supposedly has source IP addresses known by their
           | update clients, so that DNS spoofing won't work.
        
             | FreakLegion wrote:
             | Microsoft signs its updates. There's no restriction on
             | where you can get them from.
        
               | ffhhj wrote:
               | Microsoft has previously leaked their keys.
        
               | FreakLegion wrote:
               | Not that I recall.
               | 
               | Microsoft has leaked keys that weren't used for code
               | signing. I've been on the receiving end of this actually,
               | when someone from the Microsoft Active Protections
               | Program accidentally sent me the program's email private
               | key.
               | 
               | Microsoft has been tricked into signing bad code
               | themselves, just like Apple, Google, and everyone else
               | who does centralized review and signing.
               | 
               | Microsoft has had certificates forged, basically, through
               | MD5 collisions. Trail of Bits did a good write-up of this
               | years ago.
               | 
               | But I can't think of a case of Microsoft losing control
               | of a code signing key. What are you referring to?
        
             | Randor wrote:
             | As a former member of the Windows Update software
             | engineering team, I can say this is absolutely false. The
             | updates are signed.
        
               | Animats wrote:
               | I know they are signed. But is that enough?
               | 
               | Attackers today may be willing to spend a few million
               | dollars to access those keys.
        
           | jackjeff wrote:
            | If you have a privilege escalation vulnerability there are
            | worse things you can do: make the system unbootable by
            | destroying the boot sector/EFI partition and overwriting
            | system files. No more rebooting into safe mode and no more
            | deleting a single file to fix the boot.
            | 
            | This would probably be classified as a terrorist attack,
            | and frankly it's just a matter of time until we get one
            | some day. A small dedicated team could pull it off. It just
            | so happens that the people with the skills currently either
            | opt for cyber criminality (crypto lockers and such), work
            | for a state actor (think Stuxnet), or play defense at a
            | cyber security firm.
        
       | canistel wrote:
        | Out of curiosity: in the old days, SoftICE, a kernel-mode
        | debugger, could have been used. What tool can be used these
        | days?
        
         | mauvehaus wrote:
          | SoftICE predates me, but when I was doing filesystem filter
         | driver work, the tool of choice was WinDbg. Been out of the
         | trade for a bit, but it looks to still be in use. We had it set
         | up between a couple of VMs on VMware.
        
         | Dwedit wrote:
         | You'd use WinDBG today. It allows you to do remote kernel
         | debugging over a network. This also includes running Windows in
         | a virtual machine, and debugging it through the private network
         | connection.
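          | 
          | The documented KDNET flow is roughly this (values are
          | placeholders):
          | 
          |     rem On the target machine:
          |     bcdedit /debug on
          |     bcdedit /dbgsettings net hostip:192.168.0.2 port:50000
          | 
          |     rem bcdedit prints a key; then on the host machine:
          |     windbg -k net:port=50000,key=<key from the target>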
        
           | gonesilent wrote:
           | FireWire is also still used to dump out kernel debug.
        
             | the8472 wrote:
             | Shouldn't IOMMUs block that these days?
        
         | swdunlop wrote:
         | https://qemu-project.gitlab.io/qemu/system/gdb.html
        
       | golemiprague wrote:
        | But how come they didn't catch it in the testing deployments?
        | What was the difference that caused it to happen when they
        | deployed to the outside world? I find it hard to believe that
        | they didn't test it before deployment. I also think companies
        | should all have a testing environment before deploying 3rd-
        | party components. I mean, we all install packages during
        | development that fail or cause problems, but nobody thinks it
        | is a good idea to do that directly in their production
        | environment before testing, so how is this different?
        
         | someonehere wrote:
          | That's what a lot of us are wondering. There's a lot of
          | thinking outside of the box about this right now in certain
          | circles.
        
           | IAmGraydon wrote:
           | There's no point in leaving vague allusions. Can you expand
           | on this?
        
             | kbar13 wrote:
             | security industry's favorite language is nothingspeak
        
         | jmb99 wrote:
         | > I find it hard to believe that they didn't test it before
         | deployment.
         | 
         | I'm not sure why you find that hard to believe - based on the
         | (admittedly fairly limited) evidence we have right now, it's
         | highly unlikely that this deployment was tested much, if at
         | all. It seems much more likely to me that they were playing
         | fast and loose with definition updates to meet some arbitrary
         | SLAs[1] on zero-day prevention, and it finally caught up with
          | them. Much more likely than every single real-world PC
          | running their software being affected while their test
          | machines were somehow all impervious.
         | 
         | [1] When my company was considering getting into endpoint
         | security and network anomaly detection, we were required on
         | multiple occasions by multiple potential clients to provide a
         | 4-hour SLA on a wide number of CVE types and severities. That
         | would mean 24/7 on-call security engineers and a sub-4-hour
         | definition creation and deployment. Yes, that 4 hours was for
         | the deployment being available on 100% of the targets. Good
         | luck writing and deploying a high-quality definition for a zero
         | day in 4 hours, let alone running it through a test pipeline,
         | let alone writing new tests to actually cover it. We very
         | quickly noped out of the space, because that was considered
         | "normal" (at least to the potential clients we were
         | discussing). It wouldn't shock me if CS was working in roughly
         | the same way here.
        
           | drooopy wrote:
           | This whole f*up was a failure of management and processes at
           | Crowdstrike. "Intern Steve" pushing faulty code to production
           | on a Friday is only a couple of cm of the tip of an enormous
           | iceberg.
        
             | chronid wrote:
              | I wrote this in another thread already, but the fuck-up
              | was both at Crowdstrike (they borked a release) but
              | _also_, and more importantly, at their customers. Shit
              | happens even with the best testing in the world.
              | 
              | You do not deploy _anything_, _ever_, on your entire
              | production fleet at the same time, and you do not buy
              | software that does that. It's madness, and we're not
              | talking about small companies with tiny IT departments
              | here.
        
               | perbu wrote:
               | Shit might happen with the best testing, but with decent
               | testing it would not be this serious.
        
               | wazzaps wrote:
               | Apparently CrowdStrike bypassed clients' staging areas
               | with this update.
               | 
               | Source:
               | https://x.com/patrickwardle/status/1814367918425079934
        
               | owl57 wrote:
               | _> you do not buy software that does that_
               | 
               | Note how the incident disproportionally affected highly
               | regulated industries, where businesses don't have a
               | choice to screw "best practice".
        
               | TeMPOraL wrote:
               | Only highlighting that "best practice" of cybersecurity
               | is, charitably, total bullshit; less charitably, a
               | racket. This is apparent if you look at the costs to the
               | day-to-day ability of employees to do work, but maybe
               | it'll be more apparent now that people got killed because
               | of it.
        
               | badgersnake wrote:
               | It's absolutely a racket.
        
               | d1sxeyes wrote:
               | That's a tricky one. CrowdStrike is cybersecurity. Wait
               | until the first customer complains that they were hit by
               | WannaCry v2 because CrowdStrike wanted to wait a few days
               | after they updated a canary fleet.
               | 
                | The problem here is that this type of update (a content
                | update) should _never_ be able to cause this, however
                | badly it goes. If the software receives a bad
               | content update, it should fail back to the last known
               | good content update (potentially with a warning fired off
               | to CS, the user, or someone else about the failed
               | update).
               | 
               | In principle, updates that _could_ go wrong and cause
               | this kind of issue should absolutely be deployed slowly,
               | but per my understanding, that's already the practice for
               | non-content updates at CrowdStrike.
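                | 
                | A sketch of that fail-back (hypothetical names; not how
                | the sensor actually behaves, which is exactly the
                | complaint):
                | 
                |     #include <stdio.h>
                | 
                |     typedef struct { int version; } Config;
                | 
                |     /* Hypothetical: parse and bounds-check every
                |        field, returning NULL on any failure. */
                |     static Config *parse_and_validate(const char *path)
                |     { (void)path; return NULL; }
                | 
                |     static Config *load_content_update(void)
                |     {
                |         Config *fresh =
                |             parse_and_validate("update.bin");
                |         if (fresh != NULL)
                |             return fresh; /* well-formed: use it */
                |         fprintf(stderr, "bad update, reverting\n");
                |         /* fail back instead of crashing on garbage: */
                |         return parse_and_validate("last_known_good.bin");
                |     }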
        
               | chronid wrote:
                | Windows updates are also cybersecurity, but the
                | customer has (had?) a choice in how to roll those out
                | (with Intune nowadays?). The customer should decide
                | when to update; they own the fleet, not the vendor!
                | 
                | You do not know if a content update will screw you over
                | and mark all the files of your company as malware. The
                | "it should never happen" situations are the thing you
                | need to prepare for - the reason we talk about security
                | as an onion, the reason we still do staggered
                | production releases with baking times even after tests
                | and QA have passed...
                | 
                | "But it's cybersecurity" is _not_ a justification. I
                | know that security departments and IT departments and
                | companies in general love dropping the "responsibility"
                | part on someone else, but at the end of the day the
                | thing getting screwed over is the company fleet. You
                | should retain control and make sure things work
                | properly; the fact that those billion-dollar-revenue
                | companies are unable to do so is a joke. A terrible
                | one, since IT underpins everything nowadays.
        
               | chrisjj wrote:
               | > The customer should decide when to update, they own the
               | fleet not the vendor!
               | 
                | The CS customer has decided to update whenever CS says,
                | 24/7. The alternative is to arrive on Monday morning to
                | an infected fleet.
        
               | chronid wrote:
                | Sorry, this is untrue. Enterprises have SOCs and
                | on-call staff; if there is a high risk they can do at
                | least minimal testing (which would have found this
                | issue, as it has a 100% BSOD rate) and then a fleet
                | rollout. In this case it would have been rolled out by
                | Friday evening without crashing hundreds of thousands
                | of servers.
                | 
                | The CS customer has decided to offload the
                | responsibility for its fleet to CS. In my opinion
                | that's bullshit and negligence (it doesn't mean I don't
                | understand why they did it), particularly at the scale
                | of some of the customers :)
        
               | chrisjj wrote:
               | > they can do at least minimal testing (which would have
               | found this issue as it has a 100% bsod rate)
               | 
               | Incorrect, I believe, given they did not and could not
               | get advance sight of the offending forced update.
        
               | Kwpolska wrote:
               | I doubt CrowdStrike had done any testing of the update.
        
               | d1sxeyes wrote:
               | It _is_ a justification, just not necessarily one you
               | agree with.
               | 
                | Companies choose to work with Crowdstrike. One of the
                | reasons they do that is 'hands-off' administration: let
                | a trusted partner do it for you. There are absolutely
                | risks of doing it this way. But there are also risks of
                | doing it the other way.
               | 
               | The difference is, if you hand over to Crowdstrike,
               | you're not on your own if something goes wrong. If you
               | manage it yourself, you've only got yourself working on
               | the problem if something goes wrong.
               | 
               | Or worse, something goes wrong and your vendor says "yes,
               | we knew about this issue and released the fix in the
               | patch last Tuesday. Only 5% of your fleet took the patch?
               | Oh. Sounds like your IT guys have got a lot of work on
               | their hands to fix the remaining 95% then!".
        
               | stef25 wrote:
               | You'd think that the software would sit in a kind of
               | sandbox so that it couldn't nuke the whole device but
               | only itself. It's crazy that this is possible.
        
               | echoangle wrote:
               | The software basically works as a kernel module as far as
               | I understand, I don't think there's a good way to
               | separate that from the OS while still allowing it to have
               | the capabilities it needs to have to surveil all other
               | processes.
        
               | temac wrote:
                | Something like eBPF.
        
               | layer8 wrote:
               | And even then, you wouldn't want the system to continue
               | running if the security software crashes. Such a crash
               | might indicate a successful security breach.
        
               | KaiserPro wrote:
               | > You do not deploy anything, ever on your entire
               | production fleet at the same time and you do not buy
               | software that does that
               | 
                | I am sympathetic to that, but it's only possible if
                | both policy and staffing allow.
               | 
               | for policy, there are lots of places that demand CVEs be
               | patched within x hours depending on severity. A lot of
               | times, that policy comes from the payment integration
               | systems provider/third party.
               | 
                | However you are also dependent on programs you install
                | not auto-updating. Now, most have an option to flip
                | that off, but it's not always 100% effective.
        
               | chronid wrote:
                | > I am sympathetic to that, but it's only possible if both
               | policy and staffing allow.
               | 
                | We are not talking about small companies here. We're
                | talking about massive billion-dollar-revenue
                | enterprises with enormous IT teams and in some cases
                | multiple NOCs and SOCs and probably thousands of
                | consultants all around at minimum.
               | 
                | I find it hard to be sympathetic to this complete
                | disregard of ownership just to ship responsibility
                | somewhere else (because this is the need at the end of
                | the day, let's not joke around). I can understand it,
                | sure, and I can believe - to a point - someone did a
                | risk calculation (possibility of a crowdstrike upgrade
                | killing all systems vs a hack if we don't patch a CVE
                | in <4h), but it's still madness from a reliability
                | standpoint.
               | 
               | > for policy, there are lots of places that demand CVEs
               | be patched within x hours depending on severity.
               | 
               | I'm pretty sure leadership when they need to choose
               | between production being down for an unspecified amount
               | of time and taking the risk of delaying (of hours in this
               | case) the patching will choose the delay. Partners and
               | payment integration providers can be reasoned with,
               | contracts are not code. A BSOD you cannot talk away.
               | 
                | Sure, leadership is also now saying "but we were doing
                | the same thing as everyone else, the consultants told
                | us to, and how could we have known this random
                | software with root on every machine we own could kill
                | us?!" to cover their asses. The problem is solved
                | already, since it impacted everyone, and they're not
                | the ones spending their weekend hammering systems back
                | to life.
               | 
               | > However you are also dependent on programs you install
               | not autoupdating. Now, most have an option to flip that
                | off, but it's not always 100% effective.
               | 
               | You choose what to install on your systems, and you have
               | the option to refuse to engage with companies that don't
               | provide such options. If you don't, you accept the risk.
        
               | sateesh wrote:
                | Disagree with the part where you put the onus on the
                | customer. As has been mentioned in another HN thread
                | [1], this update was pushed ignoring whatever settings
                | the customer had configured. The original mistake of
                | the customer, if any, was not reading this in the fine
                | print of the contract (if this point about updates was
                | explicitly mentioned there).
                | 
                | 1. https://news.ycombinator.com/item?id=41003390
        
               | chrisjj wrote:
               | > You do not deploy anything, ever on your entire
               | production fleet at the same time
               | 
               | And if an attacker does??
        
             | jmb99 wrote:
             | Oh absolutely. There's many levels of failure here. A few
             | that I see as being likely:
             | 
              | - Lack of testing of a deployment
              | - Lack of required procedures to validate a deployment
              | - Engineering management prioritizing release pace over
              |   stability/testing
              | - Management prioritizing tech debt/pentests/etc far too
              |   low
              | - Sales/etc promising fast turnarounds that can't be
              |   feasibly met while following proper standards
              | - Lack of top-down company culture of security and
              |   stability first, which should be a must for _any_
              |   security company
             | 
             | This outage wasn't caused only by "the intern pushing
             | release." It was caused by a poor company culture (read:
             | incorrect direction from the top) resulting in a lack of
             | testing of the program code, lack of testing environment
             | for deployments, lack of formal deployment process, and
             | someone messing up a definition file that was caught by 0
             | other employees or automated systems.
        
           | _moof wrote:
           | I can't speak to its veracity but there's a screenshot making
           | its way around in which Crowdstrike discouraged sites from
           | testing due to the urgency of the update.
        
             | AmericanChopper wrote:
             | I don't work with CS products atm, but my experience with a
             | big CS deployment was exactly like this. They were openly
             | quite hostile to any suggestion of testing their products,
              | we were frequently rebuked for running our prod sensors on
             | version n-1. I talked about it a bit in this comment.
             | 
              | https://news.ycombinator.com/item?id=41002864
             | 
             | Very much not surprised to see this now.
        
             | jmb99 wrote:
             | It's kind of hard to pitch "zero-day prevention" if you
             | suggest people roll out definitions slowly, over the course
             | of days/weeks. Thus making it a lot harder to charge to the
             | moon for your service.
             | 
             | Now, if these sorts of things were battle tested before
              | release, and had an (ideally decade-plus-long) history of
             | stability with well-documented processes to ensure that
             | stability, you can more easily make the argument that it's
             | worth it. None of those things are close to true though
             | (and more than likely will never be for any AV/endpoint
             | solution), so it is very hard to justify this sort of
             | configuration.
        
           | qaq wrote:
            | While true, the agent should roll back to the previous
            | content version if it keeps crashing.
        
             | Kwpolska wrote:
             | Detecting system crashes would be hard. You could try
             | logging and comparing timestamps on agent startups and see
             | if the difference is 5 minutes or less. Buggy kernel
             | drivers crash Windows hard and fast.
        
               | qaq wrote:
                | Loading content is a pretty specific step, so your
                | solution is more or less valid.
        
               | kchr wrote:
               | > Detecting system crashes would be hard.
               | 
               | Store something like an `attemptingUpdate` flag before
               | updating, and remove it if the update was successful.
               | Upon system startup, if the flag is present, revert to
               | the previous config and mark the new config bad.
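                | 
                | A minimal user-space sketch of that (file names and
                | load_config() are invented; a real driver would keep
                | the flag in the registry or on disk):
                | 
                |     #include <stdio.h>
                |     #include <stdlib.h>
                |     
                |     #define FLAG "attempting_update.flag"
                |     
                |     /* Stand-in for applying a content file. */
                |     static int load_config(const char *p) {
                |         printf("loading %s\n", p);
                |         return 0;
                |     }
                |     
                |     int main(void) {
                |         /* Flag still present: the last attempt
                |            never finished (machine crashed
                |            mid-load), so fall back. */
                |         FILE *f = fopen(FLAG, "r");
                |         int crashed = (f != NULL);
                |         if (f) fclose(f);
                |     
                |         const char *cfg = crashed ? "config.prev"
                |                                   : "config.new";
                |     
                |         /* Set the flag before the risky load. */
                |         f = fopen(FLAG, "w");
                |         if (f) fclose(f);
                |     
                |         if (load_config(cfg) != 0)
                |             exit(1);
                |     
                |         /* Clear it only after success. */
                |         remove(FLAG);
                |         return 0;
                |     }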
        
         | treflop wrote:
         | I've seen places where failed releases are just "part of normal
         | engineering." Because no one is perfect, they say.
        
           | slenk wrote:
           | I really dislike this mentality. Don't even get me started on
           | celebrating when your rocket blows up
        
             | galangalalgol wrote:
             | If it is a standard production rocket, I agree. If it is a
             | first of kind or even third of kind launch, celebrating the
             | lessons learned from a failure is a healthy attitude. This
             | production software is not the same thing at all.
        
             | Heliosmaster wrote:
              | SpaceX celebrating when their rocket blows up _after a
              | certain milestone_ is like us devs celebrating when our
              | branch with that new big feature only fails a few tests.
              | Did it pass? No. But are you satisfied as a first try?
              | Probably.
        
           | photonthug wrote:
           | Even on hn, comments advocating engineering excellence or
           | just quality in general are frequently looked down on, which
           | probably also tells you a lot about the wider world.
           | 
           | This is why we can't have nice things, but maybe we just
           | don't want them anyway? "Mistakes will be made" is way less
           | true if you actually put the effort in to prevent them, but I
           | am beginning to think this has become code for quiet-quitters
            | to telegraph an "I want to get paid for no effort and
           | sympathize with others who feel the same" sentiment and
           | appear compassionate and grimly realistic all at the same
           | time.
           | 
            | Yes, billion-dollar companies are going to make mistakes, but
           | almost always because of cost cutting, willful ignorance, or
           | negligence. If average people are apologizing for them and
           | excusing that, there has to be some reason that it's good for
           | them.
        
             | treflop wrote:
             | Personally while I value excellence, I reduce the frequency
             | of errors through process and procedure because I'm lazy.
             | 
             | I don't mind meetings but being in a 4 hour emergency
             | meeting because some due diligence wasn't done is a waste
             | of my time.
             | 
             | Life is easier when you do good work.
        
         | usrusr wrote:
          | One possible explanation: the automated test deployments
          | for definition updates don't run the current version of the
          | definition consumer, and the old version they do run is
          | unaffected.
        
         | itronitron wrote:
         | for all we know, the deployment was the test
        
           | owl57 wrote:
           | As the old saying goes, everyone has a test environment, and
           | some also have a separate production one.
        
         | albert_e wrote:
         | My guess -- there are two separate pipelines one for code
         | changes and one for data files.
         | 
         | Pipeline 1 --
         | 
         | Code updates to their software are treated as material changes
         | that require non-production and canary testing before global
         | roll-out of a new "Version".
         | 
         | Pipeline 2 --
         | 
         | Content / channel updates are handled differently -- via a
         | separate pipeline -- because only new malware signatures and
          | the like are distributed via this route. The new files are just
         | data files -- they are supposed to be in a standard format and
         | only read, not "executed".
         | 
          | This pipeline itself must have been tested originally and
          | found to be working satisfactorily -- but inside the
          | pipeline there is no "test" stage that verifies the
          | integrity of the data file so generated, nor - more
          | importantly - checks that this new data file works without
          | errors when deployed to the latest versions of the software
          | in use.
         | 
         | The agent software that reads these daily channel files must
         | have been "thoroughly" tested (as part of pipeline 1) for all
         | conceivable data file sizes and simulated contents before
         | deployment. (any invalid data files should simply be rejected
         | with an error ... "obviously")
         | 
         | But the exact scenario here -- possibly caused by a broken
         | pipeline in the second path (pipeline 2) -- created invalid
         | data files with some quirks. And THAT specific scenario was not
         | imagined or tested in the software version dev-test-deploy
          | pipeline (pipeline 1).
         | 
         | If this is true --
         | 
         | The lesson obviously is that even for "data" only distributions
         | and roll-outs, however standardized and stable their pipelines
         | may be, testing is still an essential part before large scale
         | roll-outs. It will increase cost and add latency sure, but we
         | have to live with it. (similar to how people pay for "security"
         | software in the first place)
         | 
         | Same lesson for enterprise customers as well -- test new
         | distributions on non-production within your IT setup, or have a
         | canary deployment in place before allowing full roll-outs into
         | production fleets.
        
           | sateesh wrote:
           | _Same lesson for enterprise customers as well -- test new
           | distributions on non-production within your IT setup, or have
           | a canary deployment in place before allowing full roll-outs
           | into production fleets._
           | 
            | It was mentioned in one of the HN threads that the update
            | was pushed overriding the settings the customer had [1].
            | What recourse does a customer have in such a case?
           | 
           | 1. https://news.ycombinator.com/item?id=41003390
        
             | perryizgr8 wrote:
             | > What recourse any customer can have in in such a case ?
             | 
             | Sue them and use something else.
        
             | teeheelol wrote:
             | Ah that was me. We don't accept "content updates" and they
             | are staged.
             | 
             | We got this update pushed right through.
        
           | rramadass wrote:
           | Nice.
           | 
           | But the problem here is that _the code runs in kernel mode_.
           | As such any data that it may consume should have been tested
            | with the same care as the code itself, which has never been
           | the case in this industry.
        
           | Wytwwww wrote:
           | > It will increase cost
           | 
            | And of course that cost would be absolutely insignificant
           | relative to the potential risk...
        
         | masfuerte wrote:
         | I find it hard to believe they didn't do any testing. I wonder
         | if they tested the virus signatures against the engine, but
         | didn't check the final release artefact (the .sys file) and the
         | bug was somehow introduced in the packaging step.
         | 
         | This would have been poor, but to have released it with no
         | testing would have been the most staggering negligence.
        
       | andix wrote:
        | How sure are we that this was not a cyberattack?
       | 
        | It seems really scary to me that CrowdStrike is able to push
        | updates in real time to most of their customers' systems. I
        | don't know of any other system that would provide a similar
        | method to inject code at kernel level. Not even Windows
        | updates, as they always roll out with some delay and not to
        | all computers at the same time.
       | 
       | If you want to attack high profile systems, crowdstrike would be
       | one of the best possible targets.
        
         | Grimblewald wrote:
          | The amount of self-pwning that goes on in both corporate and
          | personal devices these days is insane. The number of games
          | that want you to install kernel-level anti-cheat is
          | astounding. The number of companies that have centralized
          | remote surveillance and control of all devices, where access
          | to this is through a great number of sloppily managed
          | accounts, is beyond spooky.
        
           | padjo wrote:
           | I mean centralized control of devices is great for the far
           | more common occurrence of Bob from accounting leaving his
           | laptop on the train with his password on post-it note stuck
           | to the screen.
        
           | andix wrote:
            | Exactly. It's ridiculous to open up all or most of a
            | company's systems to such a single point of failure. We
            | install redundant PSUs, backup networks, generators, and
            | many more things. But one single automatic update can
            | bring down all systems within minutes. Without any
            | redundancy.
        
       | Anonymityisdead wrote:
       | Where is a good place and way to start practicing disassembly in
       | 2024?
        
         | nophunphil wrote:
         | Take this with a grain of salt as I'm not an SME, but there is
         | a need for volunteers on reverse-engineering projects such as
         | the Zelda decompilation projects[1]. This would probably give
         | you some level of exposure, particularly if you have an
         | interest in videogames.
         | 
         | [1] https://zelda64.dev/
        
         | Scene_Cast2 wrote:
         | Try solving some crackme's. They're binary executables of
         | various difficulty (with rated difficulty), where the goal
         | ranges from finding a hardcoded password to making a keygen to
         | patching the executable. They used to be more popular, but I'm
         | guessing you can still find tutorials on how to get started and
         | solve a simple one.
        
         | commandersaki wrote:
         | I found https://pwn.college to be excellent, even though they
         | mostly focus on exploitation, pretty much everything involves
         | disassembly.
        
         | 13of40 wrote:
         | Writing your own simple programs and debugging/disassembling
         | them is a solid option. Windbg and Ida are good tools to start
         | with. Reading a disassembly is a lot easier than coding in
         | assembly, and once you know what things like function calls and
          | switch statements, etc. look like, you can get a feel for what
         | the original program was doing.
        
         | mauvia wrote:
          | First you need to learn assembly. Second, you can start by
          | downloading Ghidra and decompiling some simple programs you
          | use to see what they do.
        
         | 0xDEADFED5 wrote:
         | you can compile your own hello world and look at the executable
         | with x64dbg. press space on any instruction and you can
          | assemble your own instruction in its place (optionally filling
         | the leftover bytes with NOPs)
        
         | CodeArtisan wrote:
         | As a very first step, you may start playing with
         | https://godbolt.org/ to see how code is translated into lower-
         | level instructions.
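          | 
          | For instance a toy like this (hypothetical, just small
          | enough that the whole disassembly fits on one screen, with
          | one call and one branch to see how each is lowered):
          | 
          |     /* Paste into godbolt.org, gcc or clang -O2. */
          |     int square(int x) { return x * x; }
          |     
          |     int clamp_square(int x) {
          |         int s = square(x);
          |         return s > 100 ? 100 : s;
          |     }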
        
       | m0llusk wrote:
       | Ended up being forced because it was a "content update". This is
       | the update of our discontent!
        
       | brcmthrowaway wrote:
       | How did it pass CI?
        
         | voidfunc wrote:
         | I suspect some engineer has discovered their CI scripts were
         | just "exit 0"
        
           | 01HNNWZ0MV43FF wrote:
           | Ah, the French mutation testing. Has never been celebrated
           | for its excellence. </orson>
        
             | dehugger wrote:
              | What is French mutation testing? A casual Kagi search
              | seems to imply it's a type of genetic testing, or
              | perhaps just tests that have been done in France?
        
               | zerocrates wrote:
               | They're referencing an (in)famous video of a
               | drunk/drugged/tired Orson Welles attempting to do a
               | commercial; his line is "Ahhh, the... French... champagne
               | has always been celebrated for its excellence..."
               | 
               | I don't think there's anything more to the inclusion of
               | "French" in their comment beyond it being in the original
               | line.
               | 
               | https://www.youtube.com/watch?v=VFevH5vP32s
               | 
               | and the successful version:
               | https://www.youtube.com/watch?v=qb1KndrrXsY
        
           | Too wrote:
           | lol, I've lost count of how many CI systems I've seen that
           | are essentially no-ops, letting through all errors, because
           | somewhere there was a bash script without set -o errexit.
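            | 
            | The same failure mode sketched in C instead of shell
            | ("./run_tests" is a hypothetical test binary):
            | 
            |     #include <stdio.h>
            |     #include <stdlib.h>
            |     
            |     int main(void) {
            |         int status = system("./run_tests");
            |         if (status != 0)
            |             fprintf(stderr, "tests failed\n");
            |         /* Bug: CI sees green regardless. Should
            |            be:  return status != 0;  */
            |         return 0;
            |     }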
        
         | emmelaich wrote:
          | It is added after CI/testing, at least according to
          | something I read previously on HN. See the comment which
          | speculates why.
         | 
         | https://news.ycombinator.com/item?id=41022110
        
         | xyst wrote:
         | Bold of you to assume there is CI to begin with
        
         | Osiris wrote:
          | It wasn't a code update. It was a data file update. It
          | certainly seems that they don't include adequate testing
          | for data file updates.
        
           | bni wrote:
            | In my experience, testing data and config is very rare in
            | the whole industry. Feeding software corrupted config
            | files or corrupted content from its own database often
            | makes it crash. Most often this content is "trusted" to be
            | "correct".
        
       | nickm12 wrote:
       | It's really difficult to evaluate the risk the CrowdStrike system
        | posed. Was this a confluence of improbable events or an
       | inevitable disaster waiting to happen?
       | 
       | Some still-open questions in my mind:
       | 
       | - was the broken rule in the config file (C-00000291-...32.sys)
       | human authored and reviewed or machine-generated?
       | 
       | - was the config file syntactically or semantically invalid
       | according to its spec?
       | 
       | - what is the intended failure mode of the kernel driver that
       | encounters an invalid config (presumably it's not "go into a boot
       | loop")?
       | 
       | - what automated testing was done on both the file going out and
       | the kernel driver code? Where would we have expected to catch
       | this bug?
       | 
       | - what release strategy, if any, was in place to limit the blast
       | radius of a bug? Was there a bug in the release gates or were
       | there simply no release gates?
       | 
       | Given what we know so far, it seems much more likely that this
       | was a "disaster waiting to happen" but I still think there's a
       | lot more to know. I look forward to the public post-mortem.
        
         | refulgentis wrote:
          | Would any of these, or even a collection of these,
          | resolving in some direction make it highly improbable that
          | it'll ever happen again?
         | 
         | Seems to me 3rd party code, running in the kernel, on parsed
         | inputs, that can be remotely updated is enough to be disaster
         | waiting to happen _gestures breezily at Friday_
         | 
          | That's, in the Taleb parlance, a Fat Tony argument, but
          | barring it being a cosmic ray causing an uncorrected bit
          | flip during deploy, I don't think there's room to call it
          | anything but "a disaster waiting to happen".
        
           | slt2021 wrote:
            | The kernel driver could have a data check on the channel
            | file and fail gracefully/ignore the bad file instead of
            | BSODing.
            | 
            | This code is executed only once, during driver
            | initialization, so there shouldn't be much overhead, but
            | it would greatly improve reliability against a broken
            | channel file.
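            | 
            | Something along these lines, as a sketch (the header
            | fields and CHANNEL_MAGIC are invented here, not
            | CrowdStrike's actual format):
            | 
            |     #include <stdint.h>
            |     #include <stddef.h>
            |     #include <string.h>
            |     
            |     #define CHANNEL_MAGIC "CSCF"
            |     
            |     struct channel_header {
            |         char     magic[4];
            |         uint32_t version;
            |         uint32_t payload_len;  /* bytes after header */
            |     };
            |     
            |     /* 0 = usable; -1 = reject and keep the previous
            |        channel file. */
            |     int validate_channel_file(const uint8_t *buf,
            |                               size_t len) {
            |         struct channel_header h;
            |         if (len < sizeof h)
            |             return -1;         /* truncated */
            |         memcpy(&h, buf, sizeof h);
            |         if (memcmp(h.magic, CHANNEL_MAGIC, 4) != 0)
            |             return -1;         /* wrong file type */
            |         if (h.payload_len != len - sizeof h)
            |             return -1;         /* size field disagrees */
            |         return 0;
            |     }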
        
             | refulgentis wrote:
             | This is going to code as radical, but I always assumed it
             | was derivable from bog-standard first principles that would
             | fit in any economics class I sat in for my 40 credits:
             | 
             | the natural cost of these bits we sell is zero, so in the
             | long run, if the bar is "just write a good & tested kernel
             | driver", there will always be one more subsequent market
             | entrant who will go too cheap on engineering. Then, they
             | touch the hot wire and burn down the establishment.
             | 
             | That doesn't mean capitalism bad, but it does mean I expect
             | only Microsoft is capable of writing and maintaining this
             | type of software in the long run.
             | 
              | Ex. The dentist and dental hygienist were asking me who
              | was attacking Microsoft on Friday, and they were not
              | going to get through to the subtleties of 3rd party
              | kernel driver release gating strategy.
             | 
             | MS has a very strong incentive to fix this. I don't know
             | how they will. But I love when incentives align and assume
             | they always will, in the long run.
        
           | nickm12 wrote:
           | Yes, if CrowdStrike was following industry best practices and
           | this happened, it would teach us something novel about
           | industry practices that we could learn from and use to reduce
           | the risk of a similar scale outage happening again.
           | 
           | If they weren't following these practices, this is kind of a
           | boring incident with not much to be learned, despite how
           | dramatic the scale is. Practices like staged rollout of
           | changes exist precisely because we've learned these lessons
           | before.
        
           | YZF wrote:
           | Well, kernel code is kernel code, and kernel code in general
           | takes input from outside the kernel. An audio driver takes
           | audio data, a video driver might take drawing instructions, a
           | file system interacts with files, etc. Microsoft, and others,
           | have been releasing kernel code since forever and for the
           | most part, not crashlooping their entire install base.
           | 
           | My Tesla remote updates ... hmph.
           | 
           | It doesn't feel like this is inherently impossible. It feels
           | more like not enough design/process to mitigate the risks.
        
         | hdhshdhshdjd wrote:
         | Was somebody trying to install an exploit or back door and
         | fucked up?
        
           | TechDebtDevin wrote:
           | Everything is a conspiracy now eh?
        
             | choppaface wrote:
              | To be fair, the xz backdoor wasn't immediately obvious
             | https://www.wired.com/story/xz-backdoor-everything-you-
             | need-...
        
             | hdhshdhshdjd wrote:
             | You do remember Solarwinds right? This is an obvious high
             | value target, so it is reasonable to entertain malicious
             | causes.
             | 
             | Given the number of systems infected, if you could push
             | code that rebooted every client into a compromised state
             | you'd still have run of some % of the lot until it was
             | halted. That time window could be invaluable.
             | 
             | Now, imagine if you screw up the code and just boot loop
             | everything.
             | 
                | I'd say business-wise it's better for CrowdStrike to
                | let people think it's an own-goal.
             | 
             | The truth may be mundane but a hack is as reasonable a
             | theory as "oops we pushed boot loop code to world+dog".
        
               | saagarjha wrote:
               | > The truth may be mundane but a hack is as reasonable a
               | theory as "oops we pushed boot loop code to world+dog".
               | 
               | No it's not. There are many signs that point to this
               | being a mistake. There are very few that point to it
               | being a hack. You can't just go "oh it being a hack is
               | one of the options therefore it is also something worth
               | considering".
        
               | azinman2 wrote:
                | Especially because if it was, CrowdStrike wouldn't be
                | apologizing and accepting blame.
        
               | owl57 wrote:
               | Why? They are in a very specific business and have more
               | incentive to cover up successful attacks than most other
               | companies.
               | 
               | And while I'm 99% for Hanlon's razor here, I don't see a
               | reason to be sure it wasn't even a _completely
               | successful_ DoS attack.
        
               | hdhshdhshdjd wrote:
               | "Our employee pushed bad code by accident" is _VASTLY_
               | better for them than "we didn't secure the infra that
               | pushes updates to millions of machines".
        
               | Huggernaut wrote:
               | Look there are two options on the table so it's 50/50.
               | Ipso facto.
        
               | bunabhucan wrote:
               | I believe the flying spaghetti monster touched the file
               | with His invisible noodly appendage so now it's a three
               | way split.
        
               | hdhshdhshdjd wrote:
               | I didn't say it was 50/50, but an accurate enumeration of
               | options does include a failed attempt at a hack.
               | 
               | I fail to see why this is so difficult to understand.
        
         | Guthur wrote:
         | The glaring question is how and why it was rolled out
         | everywhere all at once?
         | 
         | Many corporations have pretty strict rules on system update
         | scheduling so as to ensure business continuity in case of
         | situations like this but all of those were completely
         | circumvented and we had fully synchronised global failure. It
          | really does not seem like a business-as-usual situation.
        
           | chii wrote:
           | > strict rules on system update scheduling
           | 
            | which crowdstrike gets to bypass because they claim to be
            | an antivirus and malware detection platform - at least,
            | this is what the executives they've wined and dined into
            | the purchase contracts have been told. The update schedule
            | is independently controlled by crowdstrike, rather than by
            | a system admin, I believe.
        
           | xvector wrote:
           | CrowdStrike's reasoning is that an instantaneous global
           | rollout helps them protect against rapidly spreading malware.
           | 
           | However, I doubt they need an instantaneous rollout for every
           | deployment.
        
             | slenk wrote:
              | I feel like they need to at least roll it out to
              | themselves first.
        
             | kijin wrote:
             | Well, millions of PCs bluescreening at the same time does
             | help stop a rapidly spreading malware.
             | 
             | Only this time, crowdstrike itself has become
             | indistinguishable from malware.
        
               | imtringued wrote:
                | When I first saw news about the outage I was wondering
               | what this malware "CrowdStrike" was. I mean, the name
               | kind of sounds hostile.
        
             | TeMPOraL wrote:
             | They say that, but all I hear is immune system triggering a
             | cytokine storm and killing you because it was worried you
             | may catch a cold.
        
           | inejge wrote:
           | _The glaring question is how and why it was rolled out
           | everywhere all at once?_
           | 
           | Because the point of these updates is to be rolled out
           | quickly and globally. It wasn't a system/driver update, but a
           | data file update: think antivirus signature file. (Yes, I
           | know it can get complicated, and that AV signatures can be
           | dynamic... not the point here.)
           | 
           | Why those data updates skipped validity testing at the source
           | is another question, and one that CrowdStrike better be
           | prepared to answer; but the tempo of redistribution can't be
           | changed.
        
             | Brybry wrote:
             | But is there a need for quick global releases?
             | 
             | Is it realistic that there's a threat actor that will be
             | attacking every computer on the whole planet at once?
             | 
             | I can understand that it's most practical to update
             | _everyone_ when pushing an update to protect _a few_
              | actively under attack, but I can also imagine policies
              | where that isn't how it's done, while still getting
              | urgent updates to those under attack.
        
               | padjo wrote:
               | Is there a need? Maybe, possibly, depends on
               | circumstances.
               | 
               | Is this what people are paying CS for? Absolutely.
        
               | RowanH wrote:
               | After this I imagine there will be an option "do you want
               | updates immediately, or updates when released - n, or
               | n+2, n+6, n+24, n+48 hrs?"
               | 
                | Given the choice, I bet there's going to be a
                | surprisingly large number of orgs going "we'll take
                | n+24hrs thanks".
        
             | maeil wrote:
             | A customer should be able to test an update, whether a
             | signature file or literally any kind of update, before
             | rolling it out to production systems. Anything else is
             | madness. Being "vulnerable" for an extra few hours carries
             | less risk than auto-updates (of any kind) on production
             | systems. As we've seen here. If you can point to hard
             | evidence to the contrary, where many companies were saved
             | just in time because of a signature update and would have
             | been exploited if they'd waited a few hours, I'd love to
             | read about it. It would have to have happened on a rather
             | large scale for all of the instances combined to have had a
             | larger positive impact than this single instance.
        
           | hmottestad wrote:
           | From the article on The Verge it seems that this kind of
           | update is downloaded automatically even if you disable
           | automatic updates. So those users who took this kind of issue
           | seriously would have thought that everything was configured
           | correctly to not automatically update.
        
           | danielPort9 wrote:
           | > The glaring question is how and why it was rolled out
           | everywhere all at once?
           | 
            | Because it has worked well for them so far? There are plenty of
           | companies that do the same and we don't hear about them until
           | something goes wrong.
        
         | YZF wrote:
         | It seems like a none of the above situation because each of
         | those should have really minimized the chances of something
         | like this happening. But this is pure speculation. Even the
         | most perfect organization engineering culture can still have
         | one thing get through... (Wasn't there some Linux incident a
          | little while back though?)
         | 
         | Quality starts with good design, good people, etc. the process
         | parts come much after that. I'd like to think that if you do
         | this "right" then this sort of stuff simply can't happen.
         | 
         | If we have organization/culture/engineering/process issues then
          | we're likely not going to get an in-depth public post-mortem.
         | I'd love to get one just for all of us to learn from it. Let's
          | see. Given the cost/impact, having something like the Challenger
         | investigation with some smart uninvolved people would be good.
        
         | 7952 wrote:
          | In a world of complex systems a "confluence of improbable
          | events" is the same thing as "a disaster waiting to happen".
          | It's the swiss cheese model of failure.
        
           | k8sToGo wrote:
           | Every system can only survive so many improbable events. Even
           | in aviation.
        
       | mianos wrote:
       | A 'channel file' is a file interpreted by their signature
       | detection system. How far is this from a bytecode compiled domain
       | specific language? Javascript anyone?
       | 
       | eBPF, much the same thing, is actually thought about and well
        | designed. If it wasn't, it would be easy to crash Linux.
       | 
        | This is what they do, and they are doing it badly. I bet it's
        | just shit on shit under the hood, developed by somewhat
        | competent engineers, all gone or promoted to management.
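        | 
        | For comparison, roughly what a defensively written in-kernel
        | interpreter has to look like (a toy; the opcodes and format
        | are invented for illustration):
        | 
        |     #include <stdint.h>
        |     #include <stddef.h>
        |     #include <stdio.h>
        |     
        |     enum { OP_HALT = 0, OP_PUSH = 1, OP_ADD = 2 };
        |     
        |     int run(const uint8_t *code, size_t len) {
        |         int64_t stack[32];
        |         size_t sp = 0, pc = 0;
        |         while (pc < len) {   /* never read past the end */
        |             switch (code[pc++]) {
        |             case OP_HALT:
        |                 return 0;
        |             case OP_PUSH:
        |                 if (pc >= len || sp >= 32)
        |                     return -1;   /* reject, don't crash */
        |                 stack[sp++] = code[pc++];
        |                 break;
        |             case OP_ADD:
        |                 if (sp < 2)
        |                     return -1;
        |                 stack[sp - 2] += stack[sp - 1];
        |                 sp--;
        |                 break;
        |             default:
        |                 return -1;   /* unknown op: fail closed */
        |             }
        |         }
        |         return -1;           /* ran off the end */
        |     }
        |     
        |     int main(void) {
        |         const uint8_t good[] = { OP_PUSH, 2, OP_PUSH, 3,
        |                                  OP_ADD, OP_HALT };
        |         const uint8_t bad[] = { OP_PUSH };  /* truncated */
        |         printf("%d %d\n", run(good, sizeof good),
        |                run(bad, sizeof bad));
        |         return 0;
        |     }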
        
         | broknbottle wrote:
         | Oddly enough, there was an issue last month with CrowdStrike
         | and RHEL 9 kernel where they were triggering a kernel panic
         | when attempting to load a bpf program from their newer bpf
         | sensor. One of the workarounds was to switch to their kernel
         | driver mode.
         | 
         | This was obviously a bug in RHEL kernel because even if the bpf
         | program was bunk it should not cause the kernel to panic.
         | However, it's almost like CrowdStrike does zero testing of
         | their software and looks at their end users as Test/QA.
         | 
         | https://access.redhat.com/solutions/7068083
         | 
         | > 4bb7ea946a37 bpf: fix precision backtracking instruction
         | iteration
        
           | CaliforniaKarl wrote:
           | The kernel update in question was released as part of a RHEL
           | point release (9.3 or 9.4, I forget which).
           | 
           | I'm not sure how much early warning RH gives to folks when a
           | kernel change comes in via a point release. Looking at
           | https://www.redhat.com/en/blog/upcoming-improvements-red-
           | hat..., it seems like it's changing for 9.5. I hope
           | CrowdStrike will be able to start testing against those beta
           | kernels.
        
       | Taniwha wrote:
       | Really the underlying problem here is that their software is
       | loading external data into their kernel driver and not correctly
       | sanitising their inputs
        
         | xvector wrote:
         | I find it absolutely insane they wouldn't be doing this. At the
         | level their software operates, it's sheer negligence to not
         | sanitize inputs.
        
           | blackeyeblitzar wrote:
           | I wonder if it's for performance reasons.
        
             | prisenco wrote:
             | Maybe, maybe, but if it's not in a hot loop, why would the
             | performance gain be worth it?
        
             | silisili wrote:
             | I'm not overly familiar with crowdstrike processes, but
             | assume they are long running. If it's all loaded to memory,
             | eg a config, I can't see how you'd get any performance gain
             | at all. It just seems lazy.
        
             | 0xDEADFED5 wrote:
             | wild speculation aside, i'd say a little less performance
             | is preferable to this outcome.
        
             | dboreham wrote:
             | It's for incompetence reasons.
        
         | Taniwha wrote:
          | The other issue is that they push to everyone. As someone
          | who at my last job had a million boxes in the wild, and was
          | very aware that bricking them all would kill the company,
          | we would NEVER push to them all at once. We'd push to a few
          | 'friends and family' (ie practice each release on ourselves
          | first), then do a few % of the customer base and wait for
          | problems, then maybe 10%, wait again, then the rest.
         | 
          | Of course we didn't have any third party loading code into
          | our boxes out of our control (and we ran linux).
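          | 
          | Roughly the standard trick (a sketch; the hash and the
          | percentages are arbitrary): hash each device ID into a
          | stable percentile and compare it to a global rollout knob
          | you ratchet up over days.
          | 
          |     #include <stdint.h>
          |     #include <stdio.h>
          |     
          |     /* FNV-1a, to spread device IDs uniformly. */
          |     static uint32_t fnv1a(const char *s) {
          |         uint32_t h = 2166136261u;
          |         while (*s) {
          |             h ^= (uint8_t)*s++;
          |             h *= 16777619u;
          |         }
          |         return h;
          |     }
          |     
          |     /* Update only once the rollout percentage has
          |        passed this device's (stable) percentile. */
          |     int should_update(const char *id, int pct) {
          |         return (int)(fnv1a(id) % 100) < pct;
          |     }
          |     
          |     int main(void) {
          |         /* 10% ring */
          |         printf("%d\n", should_update("host-1234", 10));
          |         return 0;
          |     }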
        
           | szundi wrote:
            | Same here. Also, before the first phase, we test whether
            | we can remotely downgrade after an upgrade.
        
       | anothername12 wrote:
        | I found Windows confusing. In Linux speak, was this some kind of
       | kernel module thing that CS installed? It's all I can think of
       | for why the machines BSOD
        
         | G3rn0ti wrote:
          | It was a binary data file (supposedly invalid) that caused
          | the actual CS driver component to BSOD. However, they used
          | the ".sys" suffix to make it look just like a driver,
          | supposedly to get Windows to protect it from a malicious
          | actor just deleting it. AFAIU.
        
           | stevekemp wrote:
           | Windows filesystem protection doesn't rely upon the filename,
           | but on the location.
           | 
           | They could have named their files "foo.cfg", "foo.dat",
           | "foo.bla" and been equally protected.
           | 
           | The use of ".sys" here is probably related to the fact it is
           | used by their system driver. I don't think anybody was trying
           | to pretend the files there are system drivers themselves, and
           | a quick look at the exports/disassembly would make that
           | apparent anyway.
        
       | G3rn0ti wrote:
        | Bypassing the discussion of whether one actually needs
        | rootkit-powered endpoint surveillance software such as CS:
        | perhaps an open-source solution would be the key to moving
        | this whole sector to more ethical standards. The main tool
        | would be open source, so it would be transparent what exactly
        | it does and that it is free of backdoors or really bad bugs.
        | It could be audited by the public. On the other hand, it
        | could still be a business model for a security team to supply
        | malware signatures feeding this system.
        
         | imiric wrote:
          | I'd say no. Kolide is one such attempt, and its practices,
          | and how it's used in companies, are as insidious as those
          | of a proprietary product. As a user, it gives me no
          | assurance that
         | an open source surveillance rootkit is better tested and
         | developed, or that it has my best interests in mind.
         | 
         | The problem is the entire category of surveillance software. It
         | should not exist. Companies that use it don't understand
         | security, and don't trust their employees. They're not good
         | places to work at.
        
           | pxc wrote:
           | I'm curious about this bad 'news' about Kolide. Could you
           | tell me more about your experience with it?
        
             | imiric wrote:
             | I don't have first-hand experience with Kolide, as I
             | refused to install it when it was pushed upon everyone in a
             | company I worked for.
             | 
             | Complaints voiced by others included false positives
             | (flagging something as a threat when it wasn't, or alerting
             | that a system wasn't in place when it was), being too
             | intrusive and affecting their workflow, and privacy
             | concerns (reading and reporting all files, web browsing
             | history, etc.). There were others I'm not remembering, as I
             | mostly tried to stay away from the discussion, but it was
             | generally disliked by the (mostly technical) workforce.
             | Everyone just accepted it as the company deemed it
             | necessary to secure some enterprise customers.
             | 
             | Also, Kolide's whole spiel about "honest security"[1] reeks
             | of PR mumbo jumbo whose only purpose is to distance
             | themselves from other "bad" solutions in the same space,
             | when in reality they're not much different. It's built by
             | Facebook alumni, after all, and relies on FB software
             | (osquery).
             | 
             | [1]: https://honest.security/
        
               | DrRobinson wrote:
               | I think some of the information here is misleading and a
               | bit unfair.
               | 
               | > being too intrusive and affecting their workflow
               | 
                | Kolide is a reporting tool; it doesn't, for example, remove
               | files or put them in quarantine. You also cannot execute
               | commands remotely like in Crowdstrike. As you mentioned,
               | it's based on osquery which makes it possible to query
               | machine information using SQL. Usually, Kolide is
               | configured to send a Slack message or email if there is a
               | finding, which I guess can be seen as intrusive but IMO
               | not very.
               | 
               | > reading and reporting all files
               | 
               | It does not read and report all files as far as I know,
               | but I think it's possible to make SQL queries to read
               | specific files. But all files or file names aren't stored
               | in Kolide or anything like that. And that live query
                | feature is audited (end users can see all queries run
               | against their machines) and can be disabled by
               | administrators.
               | 
               | > web browsing history
               | 
                | This is not directly possible as far as I know --
                | maybe via a file read query -- but it's not something
                | built-in out of the box/default. And again, custom
                | queries are transparent to users and can be disabled.
               | 
               | > Kolide's whole spiel about "honest security"[1] reeks
               | of PR mumbo jumbo whose only purpose is to distance
               | themselves from other "bad" solutions in the same space
               | 
               | While it's definitely a PR thing, they might still
               | believe in it and practice what they preach. To me it
               | sounds like a good thing to differentiate oneself from
               | bad actors.
               | 
               | Kolide gives users full transparency of what data is
               | collected via their Privacy Center, and they allow end
               | users to make decisions about what to do about findings
               | (if anything) rather than enforcing them.
               | 
               | > It's built by Facebook alumni, after all, and relies on
               | FB software (osquery).
               | 
                | For example, React and Semgrep are also built by
                | Facebook/Facebook alumni, but I don't really see the
                | relevance other than some ad-hominem.
               | 
               | Full disclosure: No association with Kolide, just a happy
               | user.
        
               | madeofpalk wrote:
               | Great news - Kolide has a new integration with Okta
               | that'll prevent you from logging into anything if Kolide
               | has a problem with your device!
        
               | imiric wrote:
               | I concede that I may be unreasonably biased against
               | Kolide because of the type of software it is, but I think
               | you're minimizing some of these issues. My memory may be
               | vague on the specifics, but there were certainly many
               | complaints in the areas I mentioned in the company I
               | worked at.
               | 
               | That said, since Kolide/osquery is a very flexible
               | product, the complaints might not have been directed at
               | the product itself, but at how it was configured by the
               | security department as well. There are definitely some
               | growing pains until the company finds the right balance
               | of features that everyone finds acceptable.
               | 
               | Re: intrusiveness, it doesn't matter that Kolide is a
                | report-only tool. Although it's also possible to
                | install extensions[1,2] that give it deeper control
                | over the system.
               | 
               | The problem is that the policies it enforces can
               | negatively affect people's workflow. For example, forcing
               | screen locking after a short period of inactivity has
               | dubious security benefits if I'm working from a trusted
               | environment like my home, yet it's highly disruptive.
               | (No, the solution is not to track my location, or give me
               | a setting I have to manage...) Forcing automatic system
               | updates is also disruptive, since I want to update and
               | reboot at my own schedule. Things like this add up, and
               | the combination of all of them is equivalent to working
               | in a babyproofed environment where I'm constantly
               | monitored and nagged about issues that don't take any
               | nuance into account, and at the end of the day do not
               | improve security in the slightest.
               | 
               | Re: web browsing history, I do remember one engineer
               | looking into this and noticing that Kolide read their
               | browser's profile files, and coming up with a way to read
               | the contents of the history data in SQLite files. But I
               | am very vague on the details, so I won't claim that this
               | is something that Kolide enables by default. osquery
               | developers are clearly against this kind of use case[3].
               | It is concerning that the product can, in theory, be
               | exploited to do this. It's also technically possible to
               | pull any file from endpoints[4], so even if this is not
               | directly possible, it could easily be done outside of
               | Kolide/osquery itself.
               | 
               | > Kolide gives users full transparency of what data is
               | collected via their Privacy Center
               | 
               | Honestly, why should I trust what that says? Facebook and
               | Google also have privacy policies, yet have been caught
               | violating their users' privacy numerous times. Trust is
               | earned, not assumed based on "trust me, bro" statements.
               | 
               | > For example React and Semgrep is also built by
               | Facebook/Facebook alumni, but I don't really see the
               | relevance other than some ad-hominem.
               | 
               | Facebook has historically abused their users' privacy,
               | and even has a Wikipedia article about it.[5] In the
               | context of an EDR system, ensuring trust from users and
               | handling their data with the utmost care w.r.t. their
               | privacy are two of the most paramount features. Actually,
               | it's a bit silly that Kolide/osquery is so vocal in favor
               | of preserving user privacy, when this goes against
               | working with employer-owned devices where employee
               | privacy is definitely not expected. In any case, the fact
               | this product is made by people who worked at a company
               | built by exploiting its users is very relevant
               | considering the type of software it is. React and Semgrep
               | have an entirely different purpose.
               | 
               | [1]: https://github.com/trailofbits/osquery-extensions
               | 
               | [2]: https://github.com/hippwn/osquery-exec
               | 
               | [3]: https://github.com/osquery/osquery/issues/7177
               | 
               | [4]:
               | https://osquery.readthedocs.io/en/stable/deployment/file-
               | car...
               | 
               | [5]: https://en.wikipedia.org/wiki/Privacy_concerns_with_
               | Facebook
        
           | chii wrote:
           | whether you morally agree with surveillance software's
           | purpose is not the same as whether a particular piece of
            | surveillance software works well or not.
           | 
           | I would imagine an open source version of crowdstrike would
           | not have had such a bad outcome.
        
             | imiric wrote:
             | I disagree with the concept of surveillance altogether.
             | Computer users should be educated about security, given
             | control of their devices, and trusted that they will do the
             | right thing. If a company can't do that, that's a sign that
             | they don't have good security practices to begin with, and
             | don't do a good job at hiring and training.
             | 
             | The only reason this kind of software is used is so that
             | companies can tick a certification checkbox that gives the
             | appearance of running a tight ship.
             | 
             | I realize it's the easy way out, and possibly the only
             | practical solution for a large corporation, but then this
             | type of issues is unavoidable. Whether the product is free
             | or proprietary makes no difference.
        
               | sooper wrote:
               | Most people do not understand, or care to understand,
               | what "security" means.
               | 
               | You highlight training as a control. Training is
                | expensive - to reduce cost and enhance effectiveness,
               | how do you focus training on those that need it without
               | any method to identify those that do things in insecure
               | ways?
               | 
               | Additionally, I would say a major function of these
               | systems is not surveillance at all - it is preventive
               | controls to prevent compromise of your systems.
               | 
                | Overall, your comment strikes me as naive and not based on
               | operational experience.
        
               | TeMPOraL wrote:
               | This type of software is notorious for severely degrading
               | employees' ability to do their jobs, occasionally
               | preventing it entirely. It's a main reason why "shadow
               | IT" is a thing - bullshit IT restrictions and endpoint
               | security malware can't reach third-party SaaS' servers.
               | 
               | This is to say, there are costs and threats caused by
               | deploying these systems too, and they should be
               | considered when making security decisions.
        
               | jpc0 wrote:
               | Explain exactly how any AV prevents a user from checking
                | e-mails and opening Word?
               | 
                | In the years I spent doing IT at that level, every
                | time - every single time - I got a request for admin
                | privileges to be granted to a user or for software to
                | be installed on an endpoint, we already had a solution
                | in place for exactly what the user wanted, installed
                | and tested on their workstation, taught in onboarding,
                | and simply "forgotten".
               | 
                | Just like the users whose passwords I had to reset
                | every Monday because they forgot them. It's an
                | irritation, but that doesn't mean they didn't do their
                | jobs well. They met all performance expectations; they
                | just needed hand-holding with technology.
               | 
               | The real world isn't black and white and this isn't
               | Reddit.
        
               | TeMPOraL wrote:
               | > _Explain exactly how any AV prevents a user from
               | checking e-mails and opening word?_
               | 
               | For example by doing continuous scans that consume so
               | much CPU the machine stays thermally throttled at all
               | times.
               | 
               | (Yes, really. I've seen a colleague raising a ticket
               | about AV making it near-impossible to do dev work, to
               | which IT replied the company will reimburse them for a
               | cooling pad for the laptop, and closed the issue as
               | solved.)
               | 
               | The problem is so bad that Microsoft, despite Defender
               | being by far the lightest and least bullshit AV solution,
               | created "dev drive", a designated drive that's excluded
               | by design from Defender scanning, as a blatant workaround
               | for corporate policies preventing users and admins from
               | setting custom Defender exclusions. Before that, your
               | only alternative was to run WSL2 or a regular VM, which
               | are opaque to AVs, but that tends to be restricted by
               | corporate too, because "sekhurity".
               | 
               | And yes, people in these situations invent workarounds,
               | such as VMs, unauthorized third-party SaaS, or using
               | personal devices, because at the end of the day, the work
               | still needs to be done. So all those security measures do
               | is _reduce_ actual security.
        
               | kchr wrote:
               | Most AV and EDR solutions support exceptions, either on
               | specific assets or fleets of assets. You can make
               | exceptions for some employees (for example developers or
               | IT) while keeping (sane) defaults for everybody else.
               | Exceptions are usually applied on file paths, executable
               | image names, file hashes, signature certificates or the
               | complete asset. It sounds like people are applying these
               | solutions wrong, which of course has a negative outcome
               | for everybody and builds distrust.
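                | 
                | (As a rough illustration of what such an exception
                | check reduces to - the rule kinds and field names here
                | are invented for the sketch, not any vendor's actual
                | policy schema:)
                | 
                |     /* skip scanning when any exclusion rule matches */
                |     #include <stdbool.h>
                |     #include <string.h>
                |     
                |     enum kind { PATH_PREFIX, IMAGE_NAME, SHA256 };
                |     struct rule { enum kind kind; const char *v; };
                |     
                |     bool excluded(const struct rule *r, int n,
                |                   const char *path,
                |                   const char *image,
                |                   const char *hash) {
                |         for (int i = 0; i < n; i++) {
                |             const char *v = r[i].v;
                |             bool hit =
                |                 (r[i].kind == PATH_PREFIX &&
                |                  strncmp(path, v, strlen(v)) == 0) ||
                |                 (r[i].kind == IMAGE_NAME &&
                |                  strcmp(image, v) == 0) ||
                |                 (r[i].kind == SHA256 &&
                |                  strcmp(hash, v) == 0);
                |             if (hit) return true;
                |         }
                |         return false;
                |     }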
        
               | TeMPOraL wrote:
               | In theory, those solutions could be used right. In
               | practice, they never are.
               | 
               | People making decisions about purchasing, deploying and
               | configuring those systems are separated by many layers
               | from rank-and-file employees. The impact on business
               | downstream is diffuse and doesn't affect them directly,
               | while the direct incentives they have are not aligned
               | with the overall business operations. The top doesn't
               | _feel_ the damage this is doing, and the bottom has no
               | way of communicating it in a way that will be heard.
               | 
               | It does build distrust, but not necessarily in the sense
               | that "company thinks I'm a potential criminal" - rather,
               | just the mundane expectation that work will continue to
               | get more difficult to perform with every new announcement
               | from the security team.
        
               | jpc0 wrote:
               | I'm going to just echo my sibling comment here. This
               | seems like a management issue. If IT wouldn't help it was
               | up to your management to intervene and say that it needs
               | to be addressed.
               | 
               | Also I'm unsure I've ever seen an AV even come close to
               | stressing a machine I would spec for dev work. Likely
               | misconfigured for the use case but I've been there and
               | definitely understand the other side of the coin,
               | sometimes a beer or pizza with someone high up at IT gets
               | you much further than barking. We all live in a society
               | with other people.
               | 
                | I would also hazard a guess that the dev drive is more
                | a matter of making it easier for IT to do the right
                | thing, and was more than likely requested by IT
                | departments. I personally have my entire dev tree
                | excluded from AV, purely because of false positives on
                | binaries and unnecessary scans, since the files change
                | content so regularly. That can be annoying to do with
                | group policy if where that data is stored isn't
                | mandated - you get engineers who act like babies about
                | "I really want my data in %USERPROFILE%/documents
                | instead of %USERPROFILE%/source". Now IT can much more
                | easily say that the Microsoft-blessed solution is X and
                | you need to use it.
               | 
                | Regarding WSL, if it's needed for your job then go for
                | it and have your manager put in a request. However if
                | you are only doing it to circumvent IT restrictions,
                | well, don't expect anyone to play nice.
               | 
                | On the personal devices note: if there's company data
                | on your device, it and all its contents can be
                | subpoenaed in a court case. You really want that? Keep
                | work and personal separate, it really is better for all
                | parties involved.
        
               | TeMPOraL wrote:
               | > _sometimes a beer or pizza with someone high up at IT
               | gets you much further than barking. We all live in a
               | society with other people._
               | 
               | That's true, but it gets tricky in a large multinational,
               | when the rules are set by some team in a different
               | country, whose responsibilities are to the corporate HQ,
               | and the IT department of the merged-in company I worked
               | for has zero authority on the issue. I tried, I've also
               | sent tickets up the chain, they all got politely ignored.
               | 
               | From the POV of all the regular employees, it looks like
               | this: there are some annoying restrictions here and
               | there, and you learn how to navigate the CPU-eating AV
               | scans; you adapt and learn how to do your work. Then one
               | day, some sneaky group policy update kills one of your
               | workarounds and you notice this by observing that
               | compilation takes 5x as long as it used to, and git
               | operations take 20x as long as they should. You find a
               | way to deal (goodbye small commits). Then one day, you
               | get an e-mail from corporate IT saying that they just
                | partnered with ESET or CrowdStrike or ZScaler or
                | whatnot, and they'll be deploying the new software to
                | everyone. Then
               | they do, and everything goes to shit, and you need to
               | start to triple every estimate from now on, as the new
               | software noticeably slows down everything across the
               | board. You think to yourself, at least corporate gave you
               | top-of-the-line laptops with powerful CPUs and absurd
               | amount of RAM; too bad for sales and managers who are
               | likely using much weaker machines. And then you realize
               | that sales and management were doing half their work in
               | random third-party SaaS, and there is an ongoing process
               | to reluctantly in-house some of the shadow IT that's been
               | going on.
               | 
               | Fortunately for me, in my various corporate jobs, I've
               | always managed to cope by using Ubuntu VMs or (later)
                | WSL2, and this always managed to stay "in the clear"
               | with company security rules. Even if it meant I had to
               | figure out some nasty hacks to operate Windows compilers
               | from inside Linux, or to stop the newest and bestest
               | corporate VPN from blackholing all network traffic
               | to/from WSL2 (was worth it, at least my work wasn't
               | disrupted by the Docker Desktop licensing fiasco...). I
               | never had to use personal devices, and I learned long ago
               | to keep firm separation between private and work
               | hardware, but for many people, this is a fuzzy boundary.
               | 
                | There was one job where corporate installed a blatant
                | keylogger on everyone's machines, and for a while, with
                | our office IT's and our manager's blessing, our team
                | managed to stave it off - and keep local admin rights -
                | by conveniently forgetting to sign the relevant consent
                | forms. The bad taste this left was a major factor in me
                | quitting that job a few months later, though.
               | 
               | Anyway, the point to these stories is, I've experienced
               | first-hand how security in medium and large enterprises
               | impacts day-to-day work. I fought both alongside and
               | against IT departments over these. I know that most of
               | the time, from the corporate HQ's perspective, it's
               | difficult to quantify the impact of various security
                | practices on everyone's day-to-day work (and I briefly
                | worked _in_ cybersecurity, so I also know this isn't
                | even obvious to the people who should be considering
                | it!). I also know that large organizations can eat _a
                | lot_ of inefficiency without noticing it, because at
                | that size, they have huge inertia. The corporate may
                | not notice the work slowing down 2x across the board,
                | when it's still
               | completing million-dollar contracts on time (negotiated
               | accordingly). It just really sucks to work in this
               | environment; the inefficiency has a way of touching your
               | soul.
               | 
               | EDIT:
               | 
               | The worst is the learned helplessness. One day, you get
               | fed up with Git taking 2+ minutes to make a goddamn
               | commit, and you whine a bit on the team channel. You hope
               | someone will point out you're just stupid and holding it
               | wrong, but no - you get couple people saying "yeah,
               | that's how it is", and one saying "yeah, I tried to get
               | IT to fix that; they told me a cooling stand for the
               | laptop should speed things a bit". You eventually learn
               | that security people just don't care, or can't care, and
               | you can only try to survive it.
               | 
               | (And then you go through several mandatory cybersecurity
               | trainings, and then you discover a dumb SQL injection bug
               | in a new flagship project after 2 hours of playing with
               | it, and start questioning your own sanity.)
        
               | chrisjj wrote:
               | > Computer users should be educated about security, given
               | control of their devices, and trusted that they will do
               | the right thing.
               | 
               | Imagine you are a bank. Imagine you have no way to ensure
               | no employee is a crook.
               | 
               | It does happen.
        
               | matwood wrote:
               | > Imagine you have no way to ensure no employee is a
               | crook.
               | 
               | Wait, are you saying we have gotten rid of all the crooks
               | in a bank/or those that handle money?
        
           | WA wrote:
           | > Companies that use it don't understand security
           | 
           | What should these companies understand about security
           | exactly?
           | 
           | And aren't they kinda right to not trust their employees if
           | they employ 50,000 people with different skills and
           | intentions?
        
             | Voultapher wrote:
              | Security is a process, not a product. Anyone selling you
              | security as a product is scamming you.
              | 
              | These endpoint security companies latch onto the people
              | making decisions: those people want security, and these
              | software vendors promise to make the process as easy as
              | possible. No need to change the way a company operates,
              | just buy our stuff and you're good. That's the scam.
        
               | imiric wrote:
               | Exactly, well said.
               | 
                | Truthfully, it must be practically infeasible to
                | transform the security practices of a large company
                | overnight. Most of the time they buy into these
                | products because they're chasing a security
                | certification (ISO 27001, SOC2, etc.), and by just
                | deploying this to their entire fleet they get to
                | sidestep the actually difficult part.
                | 
                | The irony is that at the end of this they're no more
                | "secure" than they were before, but since they have the
                | certification, their customers trust that they are.
                | It's security theater 101.
        
             | InsideOutSanta wrote:
             | "And aren't they kinda right to not trust their employees
             | if they employ 50,000 people with different skills and
             | intentions?"
             | 
             | Yes, in a 50k employee company, the CEO won't know every
             | single employee and be able to vouch for their skills and
             | intentions.
             | 
             | But in a non-dysfunctional company, you have a hierarchy of
             | trust, where each management level knows and trusts the
             | people above and below them. You also have siloed data,
             | where people have access to the specific things they need
             | to do their jobs. And you have disaster mitigation
             | mechanisms for when things go wrong.
             | 
             | Having worked in companies of different sizes and with
             | different trust cultures, I do think that problems start to
             | arise when you add things like individual monitoring and
             | control. You're basically telling people that you don't
             | trust them, which makes them see their employer in an
              | adversarial role, which actually makes them start to
              | behave in less trustworthy ways, which further
              | diminishes trust across the
             | company, harms collaboration, and eventually harms
             | productivity and security.
        
               | snotrockets wrote:
               | That's a lie we tell children so they think the world is
               | fair.
               | 
               | A Marxist reading would suggest alienation, but a more
               | modern one would realize that it is a bit more than that:
               | to enable modern business practices (both good and bad!)
                | we designed systems of management to remove or reduce
                | trust and accountability in the org, yet maintain
                | results similar to those of a world more in line with
                | the one you believe is possible.
                | 
                | A security professional, though, would tell you that
                | even in such a world, you cannot expect even the most
                | diligent folks to be able to identify all risks (e.g.
                | phishing became so good, even professionals can't
                | always discern the real from the fake), or practice
                | perfect opsec (which probably requires one to be a
                | psychopath).
        
               | protomolecule wrote:
               | "But in a non-dysfunctional company, you have a hierarchy
               | of trust, where each management level knows and trusts
               | the people above and below them. "
               | 
                | Even in a company of two, sometimes a husband or wife
                | betrays the trust. Now multiply that probability by
                | 50,000.
        
               | TeMPOraL wrote:
                | Yet we don't apply total surveillance to people. The
                | reason isn't just ethics and the US constitution, but
                | also that it's just not possible without destroying
                | society. The same perhaps applies to computer systems.
        
               | protomolecule wrote:
               | Which is a completely different argument
        
               | TeMPOraL wrote:
                | I don't think it is. I think that the kind of security
                | the likes of CrowdStrike promise is fundamentally
                | impossible to have, and pursuing it is a fool's errand.
        
               | kemotep wrote:
               | Setting aside the possibility of deploying an EDR like
               | Crowdstrike just being a box ticking exercise for
                | compliance or insurance purposes, can something like an
                | EDR be used not because of a lack of trust but out of a
                | desire to protect the environment?
               | 
               | A user doesn't have to do anything wrong for the computer
               | to become compromised, or even if they do, being able to
               | limit the blast radius and lock down the computer or at
               | least after the fact have collected the data to be able
               | to identify what went wrong seems important.
               | 
               | How would you secure a network of computers without an
               | agent that can do anti-virus, detect anomalies, and
               | remediate them? That is to say, how would you manage to
               | secure it without doing something that has monitoring and
               | lockdown capabilities? In your words, signaling that you
               | do not trust the users?
        
               | kchr wrote:
               | This. From all the comments I've seen in the multiple
               | posts and threads about the incident, this simple fact
               | seems to be the least discussed. How else to protect a
               | complex IT environment with thousands of assets in form
               | of servers and workstations, without some kind of
               | endpoint protection? Sure, these solutions like
               | CrowdStrike et al are box-checking and risk transferring
               | exercises in one sense, but they actually work as
                | intended when it comes to protecting endpoints from
                | novel malware and TTPs. As long as they don't botch
                | their own software, that is :D
        
               | imiric wrote:
               | > How else to protect a complex IT environment with
               | thousands of assets in form of servers and workstations,
               | without some kind of endpoint protection?
               | 
               | There is no straightforward answer to this question.
               | Assuming that your infrastructure is "secure" because you
               | deployed an EDR solution is wrong. It only gives you a
               | false sense of security.
               | 
               | The reality is that security takes a lot of effort from
               | everyone involved, and it starts by educating people.
               | There is no quick bandaid solution to these problems,
               | and, as with anything in IT, any approach has tradeoffs.
               | In this case, and particularly after the recent events,
               | it's evident that an EDR system is as much of a liability
               | as it is an asset--perhaps even more so. You give away
               | control of your systems to a 3rd party, and expect them
               | to work flawlessly 100% of the time. The alarming thing
               | is how much this particular vendor was trusted with
               | critical parts of our civil infrastructure. It not only
               | exposes us to operational failures due to negligence, but
               | to attacks from actors who will seek to exploit that 3rd
               | party.
        
               | matwood wrote:
               | > starts by educating people
               | 
               | Any security certification has a section on regularly
               | educating employees on the topic.
               | 
               | To your point, I agree that companies are attempting to
               | bypass the hard work by deploying a tool and thinking
               | they are done.
        
               | kchr wrote:
               | Absolutely, training is key. Alas, managers don't seem to
               | want their employees spending time on anything other than
               | delivering profit and so the training courses are zipped
               | through just to mark them as completed.
               | 
               | Personally, I don't know how to solve that problem.
        
               | kchr wrote:
               | I totally agree. In my current work environment, we do
               | deploy EDR but it is primarily for assets critical for
               | delivering our main service to customers. Ironically,
               | this incident caused them all to be unavailable and there
               | is for sure a lesson to be learned here!
               | 
               | It is not considered a silver bullet by the security
               | team, rather a last-resort detection mechanism for
               | suspicious behavior (for example if the network
                | segmentation or access control fails, or someone
                | managed to get a foothold by other means). It also
                | helps them
               | identify which employees need more training as they keep
               | downloading random executables from the web.
        
               | morning-coffee wrote:
                | It is a good question. Is there a possibility of
                | fundamentally fixing software/hardware to eliminate the
                | vectors that malware exploits to gain a foothold at
                | all? E.g. not storing the return address on the stack,
                | or not letting it be manipulated by the callee? Memory
                | bounds enforcement, either statically at compile time
                | or with the help of hardware, to prevent writing past
                | memory that isn't yours? (Not asking about the
                | feasibility of coexisting with or migrating from the
                | current world, just about the possibility of
                | fundamentally solving this at all...)
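                | 
                | (To make the bounds idea concrete, here is a minimal
                | sketch in C of the bug class that enforcement removes
                | - whether the check comes from the compiler or from
                | hardware. The function names are mine, for
                | illustration only.)
                | 
                |     /* classic vector: no bounds check lets input
                |        overwrite adjacent stack memory, including
                |        the saved return address */
                |     #include <stdio.h>
                |     #include <string.h>
                |     
                |     void vulnerable(const char *input) {
                |         char buf[16];
                |         strcpy(buf, input); /* may write past buf */
                |     }
                |     
                |     /* checked variant: refuse out-of-range writes;
                |        enforcement could be automatic rather than
                |        by programmer convention */
                |     int checked_copy(char *dst, size_t cap,
                |                      const char *src) {
                |         if (strlen(src) + 1 > cap)
                |             return -1; /* reject, don't clobber */
                |         memcpy(dst, src, strlen(src) + 1);
                |         return 0;
                |     }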
        
               | com wrote:
               | Economic drivers spring to mind, possibly connected with
               | civil or criminal liability in some cases.
               | 
                | But this will be the work of at least two human
                | generations; our tools and work practices are woefully
                | inadequate, so even if the pointy-haired bosses
                | (fearing imprisonment for gratuitous failure) and the
                | grasping, greedy investors (fearing the destruction of
                | "hard earned" capital) push for it, it's not going to
                | be done in the snap of our fingers, not least because
                | the people occupying the technology industry - and
                | this is an overgeneralisation, but I'm pretty angry so
                | I'm going to let it stand - Just Don't Care Enough.
               | 
                | If we cared, it would be nigh on impossible for my
                | granny to get tricked into popping her Windows desktop
                | by opening an attachment in her email client.
               | 
               | It wouldn't be possible to sell (or buy!) cloud services
               | for which we don't get security data in real time and
               | signal about what our vendor advises to do if worst comes
               | to worst.
               | 
               | And on and on.
        
               | mylastattempt wrote:
               | I disagree. You seem to start from a premise that all
               | people are honest, except those that aren't, but you
               | don't work with or meet dishonest people, unless the
               | employer sets himself up in an adversarial role?
               | 
               | As the other reply to your comment said: the world is not
               | 'fair' or 'honest', that's just a lie told to children.
                | Apart from genuinely evil people, there are unlimited
                | variables that dictate people's behavior: culture,
                | personality, nutrition, financial situation, mood,
                | stress, bullying coworkers, intrinsic values, etc. To
                | think people are all fair and honest "unless" is a
                | really harmful worldview to have, and in my opinion the
                | reason a lot of bad things are allowed to happen and
                | continue (throughout all of society, not just work).
               | 
               | Zero-trust in IT is just the digitized version of "trust
               | is earned". In computers you can be more crude and direct
               | about it, but it should be the same for social
               | connections and interactions.
        
               | matwood wrote:
               | > You seem to start from a premise that all people are
               | honest
               | 
               | You have to start with that premise otherwise
               | organizations and society fail. Every hour of every day,
               | even people in high security organizations have
               | opportunities to betray the trust bestowed on them.
               | Software and processes are about keeping honest people
                | honest. The dishonest ones you cannot do much about,
                | except hope to limit the damage they can cause.
               | 
               | If everyone is treated as dishonest then there will
               | eventually be an organizational breakdown. Creativity,
               | high productivity, etc... do not work in a low/zero trust
               | environment.
        
           | echoangle wrote:
           | If your company is large enough, you can't really trust your
            | employees. Do you really think Google can trust that not
            | a single one of their employees will do something stupid
            | or even be actively malicious?
        
             | iforgotpassword wrote:
              | Limit their abilities using OS features? Have the vendor
              | fix security issues rather than a third party
              | incompetently slapping on a band-aid?
             | 
              | It's like letting one company build your office building
              | and then bringing in another contractor to randomly add
              | walls and remove others without ever looking at the
              | blueprints - and then one day, "whoopsie, that was a
              | supporting wall I guess".
             | 
             | Why is it not just completely normal but even expected that
             | an OS vendor can't build an OS properly, or that the admins
             | can't properly configure it, but instead you need to
             | install a bunch of crap that fucks around with OS internals
             | in batshit crazy ways? I guess because it has a nice
             | dashboard somewhere that says "you're protected". Checkbox
             | software.
        
               | lyu07282 wrote:
                | The sensor basically monitors everything that's
                | happening on the system, then uses heuristics about
                | known attack vectors and behavior to, for example,
                | lock compromised systems down. Think of fileless
                | malware that connects to a C&C server, begins
                | uploading all local documents and stored passwords,
                | then slowly enumerates every service the employee has
                | access to, looking for vulnerabilities.
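                | 
                | (A toy version of that kind of behavioral rule, in C -
                | the event types and the threshold are invented for
                | illustration and have nothing to do with the real
                | sensor's logic:)
                | 
                |     /* flag a process that reads many credential-like
                |        files and then opens an outbound connection */
                |     #include <stdbool.h>
                |     
                |     enum ev { READ_CREDENTIAL_FILE, NET_CONNECT_OUT };
                |     
                |     struct proc_state {
                |         int cred_reads;
                |         bool flagged;
                |     };
                |     
                |     void observe(struct proc_state *s, enum ev e) {
                |         if (e == READ_CREDENTIAL_FILE)
                |             s->cred_reads++;
                |         else if (e == NET_CONNECT_OUT &&
                |                  s->cred_reads > 10)
                |             s->flagged = true; /* lock it down */
                |     }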
               | 
               | If you manage a fleet of tens of thousands of systems and
               | you need to protect against well funded organized crime?
               | Employees running malicious code under their user is a
               | given and can't be prevented. Buying crowdstrike sensor
               | doesn't seem like such a bad idea to me. What would you
               | do instead?
        
               | iforgotpassword wrote:
               | > What would you do instead?
               | 
                | As said, limit the user's abilities as much as possible
                | with features of the OS and software in use. Maybe if
                | you want those other metrics, use a firewall - not a
                | TLS-breaking virus-scanning abomination that has all
                | the same problems, but a simple one that can warn you
                | about unusual traffic patterns. If someone from
                | accounting starts uploading a lot of data, or connects
                | to Google Cloud when you don't use any of their
                | products, that should be odd.
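                | 
                | (Even a dumb per-host baseline catches that. A sketch
                | in C - the multiplier and the smoothing factor are
                | made-up numbers:)
                | 
                |     /* alert when a host uploads far more than its
                |        trailing average; seed the average during a
                |        warm-up period before alerting for real */
                |     #include <stdbool.h>
                |     
                |     struct host { double avg_bytes_out; };
                |     
                |     bool unusual_egress(struct host *h, double out) {
                |         bool odd = out > 10.0 * h->avg_bytes_out;
                |         /* exponential moving average */
                |         h->avg_bytes_out =
                |             0.9 * h->avg_bytes_out + 0.1 * out;
                |         return odd;
                |     }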
               | 
               | If we're talking about organized crime, I'm not convinced
               | crowdstrike in particular doesn't actually enlarge the
               | attack surface. So we had what now as the cause, a
               | malformed binary ruleset that the parser, running with
               | kernel privileges, choked on and crashed the system.
                | Because of course the parsing needs to happen in
                | kernel space and not in a sandboxed process. That's
                | enough for me
               | to make assumptions about the quality of the rest of the
               | software, and answer the question regarding attack
               | surface.
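                | 
                | (For contrast, a defensive parser validates everything
                | before dereferencing it. A sketch in C - the
                | length-prefixed layout here is invented, not the
                | actual ruleset format:)
                | 
                |     /* parse a record table without trusting input */
                |     #include <stdint.h>
                |     #include <stddef.h>
                |     
                |     struct hdr { uint32_t count; uint32_t rec_size; };
                |     
                |     int parse(const uint8_t *buf, size_t len) {
                |         const struct hdr *h = (const void *)buf;
                |         if (len < sizeof(*h))
                |             return -1;          /* truncated */
                |         if (h->rec_size == 0 ||
                |             h->count > (len - sizeof(*h)) / h->rec_size)
                |             return -1;          /* header lies */
                |         /* only now walk the records */
                |         return 0;
                |     }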
               | 
                | Before this incident nobody ever really looked at this
                | product at all from a security standpoint, maybe
                | because it is (supposed to be) a security product and
                | thus cannot have any flaws. But now, it seems, security
                | researchers all over the planet are looking at this
                | thing and having a field day.
               | 
                | Bill Gates sent that infamous email in the early 2000s,
                | I think after Sasser hit the world, saying that security
                | should be made the no. 1 priority for Windows. As much
                | as I dislike Windows for various reasons, I think
                | overall Microsoft does a rather good job on this. Maybe
                | it's time those companies behind these security
                | products started taking security seriously too?
        
               | lyu07282 wrote:
               | > Before this incident nobody ever really looked at this
               | product at all from a security standpoint
               | 
               | If you only knew how absurd of a statement that is. But
                | in any case, there are just too many threats that
                | network IDS/IPS solutions won't help you with; any
                | decent C2 will make it trivial to circumvent them. You
                | can't limit the permissions of your employees to the
                | point of being effective against such attacks while
                | still letting them do their jobs.
        
               | iforgotpassword wrote:
               | > If you only knew how absurd of a statement that is.
               | 
               | You don't seem to know either since you don't elaborate
               | on this. As said, people are picking this apart on
               | Twitter and mastodon right now. Give it a week or two and
               | I bet we'll see a couple CVEs from this.
               | 
               | For the rest of your post you seem to ignore the argument
               | regarding attack surface, as well as the fact that there
               | are companies not using this kind of software and
               | apparently doing fine. But I guess we can just claim they
               | are fully infiltrated and just don't know because they
               | don't use crowdstrike. Are you working for crowdstrike by
               | any chance?
               | 
               | But sure, at the end of the day you're just gonna weigh
               | the damage this outage did to your bottom line and the
               | frequency you expect this to happen with, against a
               | potential hack - however you even come up with the
               | numbers here, maybe crowdstrike salespeople will help you
               | out - and maybe tell yourself it's still worth it.
        
               | 7952 wrote:
               | In a sense the secure platform already exists. You use
               | web apps as much as possible. You store data in cloud
               | storage. You restrict local file access and execute
               | permissions. Authenticate using passkeys.
               | 
               | The trouble is that people still need local file access,
               | and use network file shares. You have hundreds of apps
               | used by a handful of users that need to run locally. And
               | a few intranet apps that are mission critical and have
                | dubious security. That creates the necessity for
                | wrapping users in firewalls, VPNs, TLS interception,
                | endpoint security, etc. And the less well it all works,
                | the more you need to fill the gaps.
        
           | ironbound wrote:
            | Next you'll be saying "I don't need an immune system..."
            | 
            | Fun fact: an attacker only needs to steal credentials
            | from the home directory to jump into a company's AWS
            | account where all the juicy customer data lives, so there
            | are reasons we want this control.
           | 
           | Frankly I'd like to see the smart people complaining help
           | write better solutions rather than hinder.
        
             | pavel_pt wrote:
              | If that's all it takes for an attacker, you're doing AWS
              | wrong.
        
               | snotrockets wrote:
               | Problem is that many do.
               | 
               | Doing it right requires very capable individuals and a
               | significant effort. Less than it used to take, more than
               | most companies are ready to invest.
        
               | ironbound wrote:
                | People get lazy.
        
               | hello_moto wrote:
               | This is the real world, everyone is doing something
               | wrong.
               | 
                | The alternative is to replace you with AI, yes?
        
         | matheusmoreira wrote:
         | There are no "ethical standards" to move to. Nobody should be
         | able to usurp control of our computers. That should simply be
         | declared illegal. Creating contractual obligations that require
         | people to cede control of their computers should also be
         | prohibited. Anything that does this is _malware_ and malware
         | does not become justified or  "ethical" when some corporation
         | does it. Open source malware is still malware.
        
           | callalex wrote:
            | What does "our computer" mean when it is not owned by you,
            | but issued to you by your employer to perform a task with?
           | Does that also apply to the operator at a switchboard in a
           | nuclear missile launch facility?
        
             | z3phyr wrote:
             | Does the switchboard in a nuclear missile launch facility
             | run Crowdstrike? I picture it as a high quality analog
             | circuit board that does 1 thing and 1 thing only. No way to
             | run anything else.
             | 
              | Globally networked personal computers were a kind of
              | cultural revolution against the setting you describe.
              | Everyone had
             | their own private compute and compute time and everyone
             | could share their own opinion. Computers became our
             | personal extensions. This is what IBM, Atari, Commodore,
             | Be, Microsoft and Apple (and later desktop Linux) sold. Now
             | given this ideology, can a company own my limbs? If not,
             | they can't own my computers.
        
             | derefr wrote:
             | > What does "our computer" mean when it is not owned by
             | you, but issued to you to perform a task with by your
             | employer?
             | 
             | Well, presuming that:
             | 
              | 1. the employee is issued a computer that they have
              | _possession_ of, even if not _ownership_ (i.e. they bring
             | the computer home with them, etc.)
             | 
             | 2. and the employee is required to perform
             | creative/intellectual labor activities on this computer --
             | implying that they do things like connecting their online
             | accounts to this computer; installing software on this
             | computer (whether themselves or by asking IT to do it);
             | doing general web-browsing on this computer; etc.
             | 
             | 3. and where the extent of their job duties, blurs the line
             | between "work" and "not work" (most salaried intellectual-
             | labor jobs are like this) such that the employee basically
             | "lives in" this computer, even when not at work...
             | 
             | 4. ...to the point that the employee could reasonably
             | conclude that it'd be silly for them to maintain a separate
             | "personal" computer -- and so would potentially _sell_ any
             | such devices (if they owned any), leaving them _dependent_
             | on this employer-issued computer for all their computing
             | needs...
             | 
             | ...then I would argue that, by the same chain of reasoning
             | as in the GP post, employers _should not be legally
             | permitted_ to "issue" employees such devices.
             | 
             | Instead, the employer should either _purchase_ such
             | equipment for the employee, giving it to them permanently
             | as a taxable benefit; or they should require that the
             | employee purchase it themselves, and recompense them for
             | doing so.
             | 
             | Cyberpunk analogy: imagine you are a brain in a vat. Should
             | your employer be able to purchase an arbitrary android body
             | for you; make you use it while at work; and stuff it full
             | of monitoring and DRM? No, that'd be awful.
             | 
             | Same analogy, but with the veil stripped off: imagine you
             | are paraplegic. Should your employer be allowed to issue
              | you an arbitrary specific _wheelchair_, and require you to
             | use it at work, and then monitor everything you do with it
             | / limit what you can do with it because it's "theirs"? No,
             | that'd be ridiculous. And _humanity already knows that_ --
              | employers _already_ can't do that, in any country with
             | even a shred of awareness about accessibility devices. The
             | employer -- or very much more likely, the employer's
             | insurance provider -- just buys the person the chair. And
              | then it's _the employee's_ chair.
             | 
             | And yes, by exactly the same logic, this also means that
             | issuing an employee a _company car_ should be illegal -- at
             | least in cases where the employee lives in a non-walkable
              | area, and doesn't already have another car (that they
             | could afford to keep + maintain + insure); and/or where
             | their commute is long enough that they'd do most non-
             | employment-related car-requiring things around work and
             | thus using their company car. Just buy them a car. (Or, if
             | you're worried they might run away with it, then _lease-to-
              | own_ them a car -- i.e. where their "equity in the car" is
             | in the form of options that vest over time, right along-
             | side any equity they have in the company itself.)
             | 
             | > Does that also apply to the operator at a switchboard...
             | 
             | Actually, no! Because an operator of a switchboard is not a
             | "user" of the computer that powers the switchboard, in the
             | same sense that a regular person sitting at a workstation
             | is a "user" of the workstation.
             | 
             | The system in this case is a "kiosk computer", and the
             | operator is performing a prescribed domain-specific
             | function through a limited UX they're locked into by said
             | system. The operator of a nuclear power plant is akin to a
             | customer ordering food from a fast-food kiosk -- just
             | providing slightly more mission-critical inputs. (Or, for a
             | maybe better analogy: they're akin to a transit security
             | officer using one of those scanner kiosk-handhelds to check
             | people's tickets.)
             | 
             | If the "computer" the nuclear-plant operator was operating,
             | exposed a purely electromechanical UX rather than a digital
             | one -- switches and knobs and LEDs rather than screens and
             | keyboards[1] -- then nothing about the operator's workflow
             | would change. Which means that the operator isn't truly
              | _computing_ with the computer; they're just _interacting
             | with an interface_ that _happens_ to be a computer.
             | 
             | [1] ...which, in fact, "modern" nuclear plants are. The UX
             | for a nuclear power plant control-center has not changed
             | much since the 1960s; the sort of "just make it a
             | touchscreen"-ification that has infected e.g. automotive
             | has thankfully not made its way into these more mission-
             | critical systems yet. (I believe it's all computers _under
             | the hood_ now, but those computers are GPIO-relayed up to
             | panels with lots and lots of analogue controls. Or maybe
              | those panels are USB HID devices these days; I dunno, I'm
             | not a nuclear control-systems engineer.)
             | 
             | Anyway, in the general case, you can recognize these "the
             | operator is just interacting with an interface, not
             | computing on a computer" cases because:
             | 
             | * The machine has separate system administrators who log
             | onto it frequently -- less like a workstation, more like a
             | server.
             | 
             | * The machine is never allowed to run anything other than
             | the kiosk app (which might be some kind of custom launcher
             | providing several kiosk apps, but where these are all
             | business-domain specific apps, with none of them being
             | general-purpose "use this device as a computer" apps.)
             | 
             | * The machine is set up to use domain login rather than
             | local login, and keeps no local per-user state; or, more
             | often, the machine is configured to auto-login to an "app
             | user" account (in modern Windows, this would be a Mandatory
             | User Profile) -- and then the actual user authentication
             | mechanism is built into the kiosk app itself.
             | 
              | _Hopefully_, the machine is using an embedded version of
             | the OS, which has had all general-purpose software stripped
             | out of it to remove vulnerability surface.
        
               | derefr wrote:
               | Tangent -- a question you didn't ask, but I'll pretend
               | you did:
               | 
               | > If employers allowed employees to "bring their own
               | devices", and then _didn 't_ force said employees to run
               | MDM software on those devices, then how in the world
               | could the employer guarantee the integrity of any line-
               | of-business software the employee must run on the device;
               | impose controls to stop PII + customer-shared data +
               | trade secrets from being leaked outside the domain; and
               | so forth?
               | 
               | My answer to that question: it's safe to say that most
               | people in the modern day _are_ fine with the compromise
               | that your device might be 100% yours most of the time;
               | but, when necessary -- _when you decide it to be so_ --
                | 99% yours, 1% someone else's.
               | 
               | For example, anti-cheat software in online games.
               | 
                | The anti-cheat logic in online games is this little
                | nugget of code that runs on a little sub-computer within
               | your computer (Intel SGX or equivalent.) This sub-
               | computer acts as a "black box" -- it's something the root
               | user of the PC can't introspect or tamper with. However:
               | 
               | * Whenever you're not playing a game, the anti-cheat
                | software _isn't loaded_. So most of the time, your
               | computer is _entirely_ yours.
               | 
               | * _You_ get to decide when to play an online game, and
               | you are explicitly aware of doing so.
               | 
               | * When you _are_ playing an online game, most of your
               | computer -- the CPU 's "application cores", and 99% of
               | the RAM -- is still 100% under your control. The anti-
               | cheat software isn't _actually_ a rootkit (despite what
                | some people say); it can't affect any app that doesn't
               | explicitly hook into it.
               | 
               | * In a brute-force sense, you still "control" the little
               | sub-computer as well -- in that you can _force it to stop
                | running whatever it's running_ whenever you want. SGX
               | and the like aren't like Intel's Management Engine (which
               | really _could_ be used by a state actor to plant a non-
                | removable "ring -3" rootkit on your PC); instead, SGX is
               | more like a TPM, or an FPGA: it's something that's
               | ultimately controlled _by_ the CPU from ring 0, just with
                | a very circumscribed API that doesn't give the CPU the
               | ability to "get in the way" of a workload once the CPU
               | has deployed that workload to it, other than by shutting
               | that workload off.
               | 
               | As much as people like Richard Stallman might freak out
                | at the above design, it really _isn't_ the same thing as
               | your employer having root on your wheelchair. It's more
               | like how someone in a wheelchair knows that if they get
               | on a plane, then they're not allowed to wheel their own
               | wheelchair around on the plane, and a flight attendant
               | will instead be doing that for them.
               | 
               | How does that translate to employer MDM software?
               | 
               | Well, there's no clear translation currently, because
               | we're currently in a paradigm that favors employer-issued
               | devices.
               | 
               | But here's what we _could_ do:
               | 
               | * Modern PCs are powerful enough that anything a
                | corporation wants you to do can be done in a
               | corporation-issued VM that runs on the computer.
               | 
                | * The employer could then require the installation of
                | an integrity-verification extension (essentially "anti-
                | cheat for VMs") that ensures that the VM itself, and
                | the hypervisor software that runs it, and the host
                | kernel the hypervisor is running on top of, all haven't
                | been tampered with. (If any of them were, then the
                | extension wouldn't be able to sign a remote-attestation
                | packet, and the employer's server in turn wouldn't
                | return a decryption key for the VM, so the VM wouldn't
                | start. A sketch of that key-release check follows at
                | the end of this comment.)
               | 
               | * The employer could feel free to MDM the _VM guest
                | kernel_ -- but they likely wouldn't _need_ to, as they
               | could instead just lock it down in much-more-severe ways
               | (the sorts of approaches you use to lock down a server!
               | or a kiosk computer!) that would make a general-purpose
               | PC next-to-useless, but which would be fine in the
               | context of a VM running only line-of-business software.
                | (Remember, all your general-purpose "personal computer"
               | software would be running _outside_ the VM. Web browsing?
               | Outside the VM. The VM is just for interacting with
               | Intranet apps, reading secure email, etc.)
               | 
               | (Why yes, I _am_ describing
               | https://en.wikipedia.org/wiki/Multilevel_security.)
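                | 
                | (The server side of that key release is simple in
                | outline. A sketch in C - the struct, the function
                | names, and the 32-byte measurement are all stand-ins
                | for a real remote-attestation protocol, and signature
                | verification over the quote is omitted:)
                | 
                |     /* release the VM disk key only if the reported
                |        measurement of kernel + hypervisor + VM image
                |        matches the known-good value */
                |     #include <stdbool.h>
                |     #include <string.h>
                |     
                |     struct quote { unsigned char m[32]; };
                |     
                |     bool release_key(const struct quote *q,
                |                      const unsigned char good[32],
                |                      const unsigned char key[32],
                |                      unsigned char out[32]) {
                |         if (memcmp(q->m, good, 32) != 0)
                |             return false; /* tampered: no key */
                |         memcpy(out, key, 32);
                |         return true;
                |     }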
        
               | matheusmoreira wrote:
               | > For example, anti-cheat software in online games
               | 
               | > The anti-cheat software isn't actually a rootkit
               | (despite what some people say); it can't affect any app
               | that doesn't explicitly hook into it.
               | 
                | Out of all the examples you could have cited, you chose
                | this one.
               | 
               | https://www.theregister.com/2016/09/23/capcom_street_figh
               | ter...
               | 
               | https://twitter.com/TheWack0lian/status/77939784076224512
               | 4
               | 
               | There you go. An anti-cheat rootkit so ineptly coded it
               | serves as literal privilege escalation as a service. Can
               | we stop normalizing this stuff already?
               | 
               | My computer is my computer, and your computer is your
               | computer.
               | 
                | The game company owns _their servers_, not my computer.
               | If their game runs on my machine, then cheating is my
                | prerogative. It is quite literally an exercise of my
               | computer freedom if I decide to change the game's state
               | to give myself infinite health or see through walls or
               | whatever. It's not their business what software I run on
               | my computer. I can do whatever I want.
               | 
               | It's my machine. I am the _god_ of this domain. The game
                | doesn't get to protect itself from me. It _will_ bend
                | to my will if I so decide. It doesn't have a choice in
                | the matter. Anything that strips me of this divine
                | power
               | should be straight up illegal. I don't care what the
               | consequences are for corporations, they should not get to
               | usurp me. They don't get to create little
               | extraterritorial islands in our domains where they have
               | higher power and control than we do.
               | 
               | I don't try to own their servers and mess with the code
               | running on them. They owe me the exact same respect in
               | return.
        
               | valicord wrote:
               | > the employee could reasonably conclude that it'd be
               | silly for them to maintain a separate "personal" computer
               | -- and so would potentially sell any such devices
               | 
               | What a bizarre leap of logic. Can Fedex employees
               | reasonably sell their non-uniform clothes? Just because
               | the employer in this scenario didn't 100% lock down the
               | computer (which is a good thing because the alternative
               | would be incredibly annoying for day-to-day work),
                | doesn't mean the employee can treat it as their own.
               | Even from the privacy perspective, it would be pretty
               | silly. Are you going to use the employer provided
               | computer to apply to your next job?
        
               | derefr wrote:
               | People do _do_ it, though. Especially poor people, who
               | might not use their personal computers very often.
               | 
               | Also, many people don't own a separate "personal"
               | computer in the first place. Especially, again, poor
               | people. (I know many people who, if needing to use "a PC"
               | for something, would go to a public library to use the
               | computers there.)
               | 
               | Not every job is a software dev position in the Bay Area,
               | where everyone has enough disposable income to have a
               | pile of old technology laying around. Many jobs for which
               | you might be issued a work laptop still might not pay
               | enough to get you above the poverty line. McDonald's
               | managers are issued work laptops, for instance.
               | 
               | (Also, disregarding economic class for a moment: in the
               | modern day, most people who aren't in tech solve most of
                | their computing problems by owning _a smartphone_, and
               | so are unlikely to have a full _PC_ at home. But their
                | phone can't do everything, so if they have a work
               | computer they happen to be sat in front of for hours each
               | day -- whether one issued to them, or a fixed workstation
                | _at work_ -- then they'll default to doing their rare
               | personal "productivity" tasks on that work computer. And
               | yes, this _does_ include updating their CV!)
               | 
               | ---
               | 
               | Maybe you can see it more clearly with the case of
               | company cars.
               | 
               | People sometimes don't own any other car (that actually
               | works) until they get issued a company car; so they end
               | up using their company car for everything. (Think
               | especially: tradespeople using their company-logo-branded
               | work box-truck for everything. Where I live, every third
               | vehicle in any parking lot is one of those.)
               | 
               | And people -- especially poorer people -- also often sell
               | their personal vehicle when they are issued a company
               | car, because this 1. releases them from the need to pay a
               | lease + insurance on that vehicle, and 2. gets them
               | possibly tens of thousands of dollars in a lump sum (that
               | they _don't_ need to immediately reinvest into another
               | car, because they can now rely on the company car.)
        
               | valicord wrote:
               | The point is that if you do do it, it's on you to
               | understand the limitations of using someone else's
               | property. Just like the difference between rental vs
               | owned housing.
               | 
               | There are also fairly obvious differences between work-
               | issued computers and all of your other analogies:
               | 
               | 1. A car (and presumably the cyberpunk android body) is
               | much more expensive than a computer, so the downside of
               | owning both a personal and a work one is much higher.
               | 
               | 2. A chair or a wheelchair doesn't need security
               | monitoring because it's a chair (I guess you could come
               | up with an incredibly convoluted scenario where it would
               | make sense to put GPS tracking in a wheelchair, but come
               | on).
               | 
               | > just buys the person the chair. And then it's the
               | employee's chair.
               | 
               | It's not because there's a law against loaning chairs,
               | it's because the chair is likely customized for a
               | specific person and can't be reused. Or if you're talking
               | about WFH scenarios, they just don't want to bother with
               | return shipping.
        
               | derefr wrote:
               | No, it's the difference between owned housing vs renting
               | from _a landlord who is also your boss in a company town_
               | , where the landlord has a vested interest in e.g.
               | preventing you from using your apartment to also do work
               | for a competitor.
               | 
               | Which is, again, a situation _so_ shitty that we've
               | outlawed it entirely! And then also imposed further
               | regulations on regular, non-employer landlords, about
               | what kinds of conditions they can impose on tenants.
               | (E.g. in most jurisdictions, your landlord can't restrict
               | you from having guests stay the night in your room.)
               | 
               | Tenants' rights are actually a great analogy for what I'm
               | talking about here. A company-issued laptop is very much
               | like an apartment, in that you're "living in it"
               | (literally and figuratively, respectively), and that you
               | therefore _should_ deserve certain rights to autonomous
               | possession/use, privacy, freedom from
               | restriction/compromise in use, etc.
               | 
               | While you don't literally own an apartment you're
               | renting, the law tries to, as much as possible, give
               | tenants the rights of someone who _does_ own that
               | property; and to restrict the set of legal justifications
               | that a landlord can use to punish someone for exercising
               | those (temporary) rights over their property.
               | 
               | IMHO having the equivalent of "tenants' rights" for
               | something like a laptop is silly, because that'd be a lot
               | of additional legal edifice for not-much gain. But,
               | unlike with real-estate rental, it'd actually be quite
               | practical to just make the "tenancy" case of company IT
               | equipment use impossible/illegal -- forcing employers to
               | do something else instead -- something that _doesn't_
               | force employees into the sort of legal area that would
               | make "tenants' rights" considerations applicable in the
               | first place.
        
               | valicord wrote:
               | No, that would be more like sleeping at the office
               | (purely because of employee preferences, not because the
               | employer forces you to or anything like that) and
               | complaining about security cameras.
        
           | eptcyka wrote:
           | Yes, that is why the owners of the computers (corps) use
           | these tools - to maintain control over their hardware (and IP
           | accessible on it). The end user is not the customer or user
           | here.
        
           | cqqxo4zV46cp wrote:
           | Oh stop it. It's not your machine, it's your employer's
           | machine. You're the user of the machine. You're cargo-culting
           | some ideological take that doesn't apply here at all.
        
             | imiric wrote:
             | > It's not your machine, it's your employer's machine.
             | 
             | Agreed. I'm fine with this, as long as the employer also
             | accepts that I will never use a personal device for work,
             | that I will never use a minute of personal time for work,
             | and that my productivity is significantly affected by
             | working on devices and systems provided and configured by
             | the employer. This knife cuts both ways.
        
               | fragmede wrote:
               | If only that were possible. Luckily for my employer, I
               | end up thinking about problems to be solved during my off
               | hours like when I'm sleeping and in the shower. Then
               | again, I also think about non-work life problems sitting
               | at my desk when I'm supposed to be working, so
               | (hopefully) it evens out.
        
               | imiric wrote:
               | I don't think it's possible either. But the moment my
               | employer forces me to install a surveillance rootkit on
               | the machine I use for work--regardless of who owns the
               | machine--any trust that existed in the relationship is
               | broken. And trust is paramount, even in professional
               | settings.
        
               | valicord wrote:
               | Setting aside the question whether these security tools
               | are effective at their stated goal, what does this have
               | to do with trust at all? Does the existence of a bank
               | vault break the trust between the bank and the tellers?
               | What is the mechanism that would prevent your computer
               | from getting infected by a 0-day if only your employer
               | trusted you?
        
               | imiric wrote:
               | > Does the existence of a bank vault break the trust
               | between the bank and the tellers?
               | 
               | That's a strange analogy, since the vault is meant to
               | safeguard customer assets from the public, not from bank
               | employees. Besides, the vault doesn't make the teller's
               | job more difficult.
               | 
               | > What is the mechanism that would prevent your computer
               | from getting infected by a 0-day if only your employer
               | trusted you?
               | 
               | There isn't one. What my employer does is trust that I
               | take care of their assets and follow good security
               | practices to the best of my abilities. Making me install
               | monitoring software is an explicit admission that they
               | don't trust me to do this, and with that they also break
               | my trust in them.
        
               | valicord wrote:
               | You mean like AV software is meant to safeguard the
               | computer from malware? I'm sure banks have a lot of
               | annoying security-related processes that make tellers'
               | jobs more difficult.
        
               | mr_mitm wrote:
               | If you don't already have an antivirus on your work
               | machine, you're in an extremely small minority. As a
               | consultant with projects that run about a week, I've
               | experienced the onboarding process of over a hundred orgs
               | first hand. They almost all hand out a Windows laptop,
               | and every single Windows laptop had an AV on it. It's
               | considered negligent not to have some AV solution in the
               | corporate world. And these days, almost all the fancy AVs
               | live in the kernel.
        
               | imiric wrote:
               | I don't doubt that to be the case, but I'm happy to not
               | work in corporate environments (anymore...). :)
        
               | kchr wrote:
               | My experience is that in these workplaces where EDR is
               | enforced on all devices used for work, your hypothetical
               | is true (i.e. you are not expected to work on devices not
               | provided by your employer - on the contrary, that is most
               | likely forbidden).
        
         | plantain wrote:
         | There is an open source alternative. GRR:
         | 
         | https://github.com/google/grr
         | 
         | Every Google client device has it.
        
           | G3rn0ti wrote:
           | It sounds really interesting. But the one thing it does not
           | do is scan for viruses/malware, although this could be
           | implemented using GRR, I guess. How does Google mitigate
           | malware threats in-house?
        
         | giantpotato wrote:
         | > _By-passing the discussion whether one actually needs root
         | kit powered endpoint surveillance software such as CS perhaps
         | an open-source solution would be a killer to move this whole
         | sector to more ethical standards._
         | 
         | As a red teamer developing malware for my team to evade EDR
         | solutions we come across, I can tell you that EDR systems are
         | essential. The phrase "root kit powered endpoint surveillance"
         | is a mischaracterization, often fueled by misconceptions from
         | the gaming community. These tools provide essential protection
         | against sophisticated threats, and they catch them. Without
         | them, my job would be 90% easier when doing a test where
         | Windows boxes are included.
         | 
         | > _So the main tool would be open source and it would be
         | transparent what it does exactly and that it is free of
         | backdoors or really bad bugs._
         | 
         | Open-source EDR solutions, like OpenEDR [1], exist but are
         | outdated and offer poor telemetry. Assembling the various
         | GitHub POCs that exist into a production EDR is impractical
         | and insecure.
         | 
         | The EDR sensor itself becomes the target. As a threat
         | actor, the EDR is the only thing in your way most of the time.
         | Open sourcing them increases the risk of attackers contributing
         | malicious code to slow down development or introduce
         | vulnerabilities. It becomes a nightmare for development, as you
         | can't be sure who is on the other side of the pull request. TAs
         | will do everything to slow down the development of a security
         | sensor. It is a very adversarial atmosphere.
         | 
         | > _On the other hand it could still be a business model to
         | supply malware signatures as a security team feeding this
         | system._
         | 
         | It is actually the other way around. Open-source malware
         | heuristic rules do exist, such as Elastic Security's detection
         | rules [2]. Elastic also provides EDR solutions that include
         | kernel drivers and is, in my experience, the harder one to
         | bypass. Again, please make an EDR without drivers for Windows,
         | it makes my job easier.
         | 
         | > _It could be audited by the public._
         | 
         | The EDR sensors already do get "audited" by security
         | researchers and the threat actors themselves, who reverse
         | engineer and debug the sensors to spot weaknesses that can
         | be "abused." If I spot things like the EDR just
         | plainly accepting kernel mode shellcode and executing it, I
         | will, of course, publicly disclose that. EDR sensors are under
         | a lot of scrutiny.
         | 
         | [1] https://github.com/ComodoSecurity/openedr
         | 
         | [2] https://github.com/elastic/detection-rules
        
           | manquer wrote:
           | > Open sourcing them increases the risk of attackers
           | contributing malicious code to slow down development or
           | introduce vulnerabilities.
           | 
           | This is such a tired non-sequitur argument, with no evidence
           | whatsoever to back up the claim that the risk is actually
           | higher for open source than for closed source.
           | 
           | I could just as easily argue that a state or non-state actor
           | could buy[1], bribe or simply threaten to get weak code into
           | a proprietary system, without users having any means to ever
           | find out. On the other hand, it is always easier (easier, not
           | easy) to discover a compromise in open source, as happened
           | with xz[2], and to verify such reports independently.
           | 
           | If there is no proof that compromise is less likely with
           | closed source, and it is far easier to discover compromises
           | in open source, the logical conclusion is simply that open
           | source is better for security libraries.
           | 
           | Funding defensive security infrastructure that is open
           | source and freely available for everyone to use, even with
           | 1/100th of the (effectively offensive-only) NSA budget,
           | would improve info-security enormously for everyone -- not
           | just against nation state actors, but also against scammers
           | etc. Instead we get companies like CS that have an enormous
           | vested interest in seeing that this never happens, and that
           | try to scare the rest of us into believing open source is
           | bad for security.
           | 
           | [1] https://en.wikipedia.org/wiki/Dual_EC_DRBG
           | 
           | [2] https://en.wikipedia.org/wiki/XZ_Utils_backdoor
        
             | jpc0 wrote:
             | I have a different take on this.
             | 
             | I feel having the solution open sourced isn't bad from a
             | code security standpoint, but rather that it is simply not
             | economically viable. To my knowledge most of the major open
             | source technologies are currently funded by FAANG, purely
             | because they are needed to conduct business, and the moment
             | supporting one becomes inconvenient they fork it or develop
             | their own -- see Terraform/Redis...
             | 
             | I also cannot get behind a government funding model,
             | purely because it will simply become a design-by-committee
             | nightmare -- this isn't flashy tech. Just see how many
             | private companies have beaten NASA to market in a pretty
             | well funded and very flashy industry. The very government
             | you want to fund these solutions is currently running on
             | private companies' infrastructure for all its IT needs.
             | 
             | Yes, open-sourcing is definitely amazing and, if executed
             | well, will be better -- just like communism.
        
               | manquer wrote:
               | Plenty of fundamental research and development happens in
               | academia fairly effectively.
               | 
               | Government has to fund it, not run it, just like any
               | other grant works today. The existing foundations and
               | non-profits like Apache, or even mixed ones like
               | Mozilla, are fairly capable of handling the grants.
               | 
               | Expecting private companies or dedicated volunteers to
               | maintain mission-critical libraries like xz, as we do
               | now, is not a viable option.
        
               | jpc0 wrote:
               | Seems like we agree then. There is a middle point and I
               | would actually prefer for it to be some sort of open
               | source one.
        
             | mardifoufs wrote:
             | I could see an open source solution with "private" or
             | vendor specific definition files. But I think I'd disagree
             | with the statement that open sourcing everything wouldn't
             | cause any problem. Engineering isn't necessarily about peer
             | reviewed studies, it's about empirical observations and
             | applying the engineering method (which can be complemented
             | by a more scientific one but shouldn't be confused for it).
             | It's clear that this type of stuff is a game of cat and
             | mouse. Attackers search for any possible vulnerability,
             | bypass etc. It does make sense that exposing one side's
             | machinery will make it easier for the other side to see how
             | it works. A good example of that is how active hackers are
             | at finding different ways to bypass Windows Defender by
             | using certain types of Office file formats, or certain
             | combinations of file conversions to execute code. Exposing
             | the code would just make all of those immediately visible
             | to everyone.
             | 
             | Eventually that's something that gets exposed anyways, but
             | I think the crucial part is timing and being a few steps
             | ahead in the cat and mouse game. Otherwise I'm not sure
             | what kind of proof would even be meaningful here.
        
               | manquer wrote:
               | > open sourcing everything wouldn't cause any problem
               | 
               | That is not what I am saying. I am saying open sourcing
               | doesn't cause more problems than proprietary systems do,
               | contrary to the argument OP was making.
               | 
               | Open source is not a panacea, it is just not objectively
               | worse as OP implies.
        
               | aforwardslash wrote:
               | I actually agree there is no intrinsic advantage in
               | having this piece of software as open source - closed
               | teams tend to have a more contained collaborator "blast
               | radius", and you don't have 500 forks with patches that
               | may modify behaviour in a subtle way and that are somehow
               | conflated with the original project.
               | 
               | On the other hand, anyone serious about malware
               | development already has "the actual source code",
               | whether for defensive or offensive operations.
        
           | sudosysgen wrote:
           | > The phrase "root kit powered endpoint surveillance" is a
           | mischaracterization, often fueled by misconceptions from the
           | gaming community.
           | 
           | How exactly is this a mischaracterization? Technically these
           | EDR tools are identical to kernel level anticheat and they
           | are identical to rootkits, because fundamentally they're all
           | the same thing just with a different owner. If you disagree
           | it would be nice if you explained why.
           | 
           | As for open source EDRs becoming the target, this is just as
           | true of closed source EDR. Cortex for example was hilariously
           | easy to exploit for years and years until someone was nice
           | enough to tell them as much. This event from CrowdStrike
           | means that it's probably just as true here.
           | 
           | The fact that the EDR is 90% of the work of attacking a
           | Windows network isn't a sign that we should continue using
           | EDRs. It means that nothing privileged should be in a Windows
           | network. This isn't that complicated, I've administered such
           | a network where everything important was on Linux while end
           | users could run Windows clients, and if anything it's easier
           | than doing a modern Windows/AD deployment. Good luck pivoting
           | from one computer to another when they're completely isolated
           | through a Linux server you have no credentials for. No
           | endpoint should have any credentials that are valid anywhere
           | except on the endpoint itself and no two endpoints should be
           | talking to each other directly: this is in fact not very
           | restrictive to end users and completely shuts down lateral
           | movement - it's a far better solution than convoluted and
           | insecure EDR schemes that claim to provide zero-trust but
           | fundamentally can't, whereas following this simple rule
           | actually provides you zero-trust.
           | 
           | Look at it this way - if you (and other redteamers) can
           | economically get past EDR systems for the cost of a pentest,
           | what do you think competent hackers with economies of scale
           | and million dollar payouts can do? For now there are enough
           | systems without EDRs that many just won't bother, but as
           | they spread they will just be exploited more. This is true as
           | well of the technical analogue in kernel anticheat, which you
           | and I can bypass in a couple days of work.
           | 
           | Where we are is that we're using EDRs as a patch over a
           | fundamentally insecure security model in a misguided attempt
           | to keep the convenience that insecurity brings.
        
         | ndr_ wrote:
         | There used to be Winpooch Watchguard, based on ClamAV. Stopped
         | using it when it caused Bluescreens. A "Killer" indeed.
        
         | intelVISA wrote:
         | Security isn't really a product you can just buy or outsource,
         | but here we are.
        
           | kemotep wrote:
           | Crowdstrike is a gun. A tool. But not the silver bullet. Or
           | training to be able to fire it accurately under pressure at
           | the werewolf.
           | 
           | You can very easily shoot your own foot off instead of
           | slaying the monster, use the wrong ammunition to be
           | effective, or in this case a poorly crafted gun can explode
           | in your hand when you are holding it.
        
         | cedws wrote:
         | The value CrowdStrike provides is the maintenance of the
         | signature database, and being able to monitor attack campaigns
         | worldwide. That takes a fair amount of resources that an open
         | source project wouldn't have. It's a bit more complicated than
         | a basic hash lookup program.
        
         | ymck wrote:
         | There are a number of OSS EDRs. They all suck.
         | 
         | DAT-style content updates and signature-based prevention are
         | very archaic. Directly loading content into memory and a hard-
         | coded list of threats? I was honestly shocked that CS was still
         | doing DAT-style updates in an age of ML and real-time threat
         | feeds. There are a number of vendors who've offered it for
         | almost a decade. We use one. We have to run updates a couple of
         | times a year.
         | 
         | SMH. The '90s want their endpoint tech back.
        
       | iwontberude wrote:
       | Crowdstrike isn't a company anymore, this is probably their end.
       | The litigation will be death by a thousand cuts.
        
         | t0mas88 wrote:
         | Has anyone looked into their terms and conditions? Usually any
         | resulting damage from software malfunctioning is excluded. Only
         | the software itself being unavailable may be an SLA breach.
         | 
         | Typically there would also be some clauses where CS is the only
         | one that is allowed to determine an SLA breach, SLA breaches
         | only result in future licence credits (no cash), and if you
         | disagree it's limited to mandatory arbitration...
         | 
         | The biggest impact is probably only their reputation taking a
         | huge hit. Losing some customers over this and making it harder
         | to win future business.
        
           | iwontberude wrote:
           | They will still need to hire lawyers to prove this. Thousands
           | of litigants. I am sure there is some tort which is not
           | covered by the arbitration agreement that would give
           | plaintiffs standing, no?
           | 
           | A commenter on Stack Exchange had an interesting counter: In
           | some jurisdictions, any attempt to sidestep consumer law may
           | be interpreted by the courts as conspiracy, which can prove
           | more serious than merely accepting the original penalties.
        
             | chii wrote:
             | > Thousands of litigants
             | 
             | i would imagine a class action suit instead of individual
             | cases if this were to happen.
        
               | iwontberude wrote:
               | Potentially we will see some, but this occurred in many
               | jurisdictions across the world.
        
               | disgruntledphd2 wrote:
               | They'll be sued by the insurance companies probably.
        
           | clwg wrote:
           | No big company is going to agree to the terms and conditions
           | that are listed on their website, they'll have their own
           | schedules for indemnification that CS would agree to, not the
           | other way around. Those 300 Fortune 500 companies are
           | going to rip CS apart.
        
         | joelthelion wrote:
         | The stock market disagrees:
         | https://www.google.com/finance/quote/CRWD:NASDAQ?window=5Y
         | 
         | To be clear, I feel investors are a bit delusional, I just
         | thought it was an interesting perspective to share.
        
           | asynchronous wrote:
           | They really are delusional. As a security person, I think
           | CrowdStrike was overvalued before this event, and to everyone
           | in tech this shows how bad their engineering practices are.
        
             | chii wrote:
             | but they are able to insert themselves into this many
             | enterprise machines! So regardless of your security
             | credentials, they made good business decisions.
             | 
             | On the other hand, this may open the door for a lot of
             | companies to dump them.
        
               | bni wrote:
               | For another similar product from a competitor that there
               | is no reason to believe is any better.
        
           | Osiris wrote:
           | Wow. Cause a global meltdown and only lose 18% of your stock
           | value? They must be doing something that investors like.
        
             | imtringued wrote:
             | They are probably pivoting to charging ransoms aka
             | "consulting fees" to fix crashing systems and those are
             | priced in.
        
           | aflag wrote:
           | The stock market only had a day to react and they were also
           | heavily affected by the issue. Let's see where the stock
           | price goes in the following week.
        
         | markus_zhang wrote:
         | I'd bet $100 that Crowdstrike won't pay out more than $100m for
         | those dozens of billions in damage.
        
         | ai4ever wrote:
         | software vendors should be required to face consequences of
         | shipping a poor product.
         | 
         | one possibility is: clawback or refunds for past payments equal
         | to business damage caused by the flawed product.
        
           | hypeatei wrote:
           | I would say the companies compelling others to buy and
           | install this shitty security software, e.g. cyber insurance,
           | should also be punished.
        
       | system2 wrote:
       | Maybe one day people will learn what a blog is.
        
       | delta_p_delta_x wrote:
       | The moment I read 'it is a _content update_ that causes the BSOD,
       | deleting it solves the problem', I was immediately willing to
       | bet a hundred quid (for the non-British, that's £100) that it
       | was a combination of said bad binary data and a poorly-written
       | parser that didn't error out correctly upon reading invalid data
       | (in this case, read an array of pointers, didn't verify that all
       | of them were both non-null and pointed to valid data/code).
       | 
       | In the past ten years or so of having done somewhat serious
       | computing and zero cybersecurity whatsoever, I have made up my
       | mind; feel free to disagree.
       | 
       | Approximately _100%_ of CVEs, crashes, bugs, slowdowns, and pain
       | points of computing have to do with various forms of
       | deserialising binary data back into in-memory data
       | structures. All because a) human programmers forget to account
       | for edge cases, and b) imperative programming languages allow us
       | to do so.
       | 
       | This includes everything from: decompression algorithms; font
       | outline readers; image, video, and audio parsers; video game data
       | parsers; XML and HTML parsers; the various
       | certificate/signature/key parsers in OpenSSL (and derivatives);
       | and now, this CrowdStrike content parser in its EDR program.
       | 
       | That wager stands, by the way, and I'm happy to up the ante by
       | £50 to account for my second theory.
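       | 
       | To make that concrete, here's a minimal sketch in C of the
       | failure mode I'm describing (the struct layout, field names,
       | and offsets are my own assumptions for illustration, not
       | CrowdStrike's actual code):
       | 
       |     #include <stdint.h>
       |     
       |     /* Hypothetical content-file record: some header bytes,
       |      * then a field that happens to sit at offset 0x9c. */
       |     struct record {
       |         uint8_t  header[0x9c];
       |         uint32_t flags;            /* offsetof == 0x9c */
       |     };
       |     
       |     struct content {
       |         uint32_t       count;
       |         struct record *entries[8]; /* filled in by the parser */
       |     };
       |     
       |     uint32_t read_flags(const struct content *c, unsigned i)
       |     {
       |         /* BUG: entries[i] is never validated. If the file
       |          * was truncated or zeroed, entries[i] is NULL, and
       |          * this dereference reads 0x0 + 0x9c == 0x9c -- an
       |          * unmapped address, which in kernel mode is an
       |          * instant bugcheck. */
       |         return c->entries[i]->flags;
       |     }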
        
         | bostik wrote:
         | > _Approximately 100% of CVEs, crashes, bugs, [...],
         | deserialising binary data_
         | 
         | I'd make that 98%. Outside of rounding errors in the margins,
         | the remaining two percent is made up of logic bugs,
         | configuration errors, bad defaults, and outright insecure
         | design choices.
         | 
         | Disclosure: infosec for more than three decades.
        
           | epanchin wrote:
           | They forgot to account for those edge cases
        
             | delta_p_delta_x wrote:
             | Heh, touche.
        
           | delta_p_delta_x wrote:
           | I feel vindicated but also a bit surprised that my gut
           | feeling was this accurate.
        
             | bostik wrote:
             | Not really a surprise, to be honest. "Deserialisation"
             | encapsulates most forms of injection attacks.
             | 
             | OWASP top-10 was dominated by those for a very long time.
             | They have only recently been overtaken by authorization
             | failures.
        
         | smackeyacky wrote:
         | Hmmm. Most common problems these days are certificate related,
         | I would have thought. Binary data transfers are pretty rare in
         | an age of base64 JSON bloat.
        
           | madaxe_again wrote:
           | There are plenty of binary serialisation protocols out there,
           | many proprietary - maybe you'll stuff that base64'd in a json
           | container for transit, but you're still dealing with a binary
           | decoder.
        
         | Sakos wrote:
         | I can't decide what's more damning. The fact that there was
         | effectively no error/failure handling, or this:
         | 
         | > Note "channel updates ...bypassed client's staging controls
         | and was rolled out to everyone regardless"
         | 
         | > A few IT folks who had set the CS policy to ignore latest
         | version confirmed this was, ya, bypassed, as this was "content"
         | update (vs. a version update)
         | 
         | If your content updates can break clients, they should not be
         | able to bypass staging controls or policies.
        
           | vladvasiliu wrote:
           | The way I understand it, the policies the users can configure
           | are about "agent versions". I don't think there's a setting
           | for "content versions" you can toggle.
        
             | sateesh wrote:
             | Maybe there isn't a switch that says "content version", but
             | from the end user's perspective it is a new version. Whether
             | it was a content change, or just a fix for a typo in
             | documentation (say), the change being pushed is different
             | from what currently exists. And for the end user the
             | configuration implies that they have a chance to decide
             | whether to accept any new change being pushed or not.
        
           | SoftTalker wrote:
           | > If your content updates can break clients
           | 
           | This is going to be what most customers did not realize. I'm
           | sure Crowdstrike assured them that content updates were
           | completely safe "it's not a change to the software" etc.
           | 
           | Well they know differently now.
        
         | miohtama wrote:
         | I was immediately willing to bet a hundred quid this was C/C++
         | code :)
        
           | formerly_proven wrote:
           | Not that interesting a bet considering we know it's a Windows
           | driver.
        
         | fire_lake wrote:
         | Yes indeed. If you are doing this kind of job, reach for a
         | parser generator framework and fuzz your program.
         | 
         | Also go read Parse Don't Validate https://lexi-
         | lambda.github.io/blog/2019/11/05/parse-don-t-va...
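         | 
         | The gist, sketched in C (the record layout and names here
         | are made up for illustration): the parser is the only code
         | that touches raw bytes, and it either produces a fully
         | validated value or an error -- callers never see a half-
         | parsed state:
         | 
         |     #include <stdint.h>
         |     #include <string.h>
         |     
         |     typedef struct {
         |         uint32_t       kind;
         |         uint32_t       body_len;
         |         const uint8_t *body;
         |     } record_t;
         |     
         |     /* On success *out is fully validated; on failure it is
         |      * untouched. There is no partially-valid state for the
         |      * rest of the program to trip over. */
         |     int parse_record(const uint8_t *buf, size_t len,
         |                      record_t *out)
         |     {
         |         if (len < 8) return -1;            /* short header */
         |         uint32_t kind, body_len;
         |         memcpy(&kind, buf, 4);
         |         memcpy(&body_len, buf + 4, 4);
         |         if (kind == 0 || kind > 0xff) return -1;
         |         if (body_len > len - 8) return -1; /* bounds check */
         |         out->kind     = kind;
         |         out->body_len = body_len;
         |         out->body     = buf + 8;
         |         return 0;
         |     }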
        
           | teeheelol wrote:
           | Yep.
           | 
           | Looking at how this whole thing is pasted together, there's
           | probably a regex engine in one of those sys files somewhere
           | that was doing the "parsing"...
        
           | lolinder wrote:
           | > reach for a parser generator framework and fuzz your
           | program
           | 
           | I agree to the second but disagree on the first. Parser
           | generator frameworks produce a lot of code that is hard to
           | read and understand and they don't necessarily do a better
           | job of error handling than you would. A hand-written
           | recursive descent parser will usually be more legible, will
           | clearly line up with the grammar that you're supposed to be
           | parsing, and will be easier to add _better_ error handling
           | to.
           | 
           | Once you're aware of the risks of a bad parser you're halfway
           | there. Write a parser with proper parsing theory in mind and
           | in a language that forces you to handle all cases. Then fuzz
           | the program, turn bad inputs that turn up into permanent
           | regression tests, and write your own tests with your
           | knowledge of the inner workings of your parser in mind.
           | 
           | This isn't like rolling your own crypto because the
           | alternative isn't a battle-tested open source library, it's a
           | framework that generates a brand new library that only you
           | will use and maintain. If you're going to end up with a
           | bespoke library anyway, you ought to understand it well.
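           | 
           | For flavour, a toy hand-rolled recursive descent parser in
           | C for the grammar "list := '(' item* ')'", where an item is
           | either 'a' or a nested list. Each function maps to one
           | production, and every failure propagates as an explicit
           | error:
           | 
           |     #include <stdbool.h>
           |     
           |     typedef struct { const char *p, *end; } cursor;
           |     
           |     static bool parse_item(cursor *c);
           |     
           |     static bool parse_list(cursor *c)
           |     {
           |         if (c->p >= c->end || *c->p != '(') return false;
           |         c->p++;
           |         while (c->p < c->end && *c->p != ')')
           |             if (!parse_item(c)) return false; /* bubble up */
           |         if (c->p >= c->end) return false; /* unterminated */
           |         c->p++;                           /* consume ')'  */
           |         return true;
           |     }
           |     
           |     static bool parse_item(cursor *c)
           |     {
           |         if (c->p < c->end && *c->p == 'a') {
           |             c->p++;
           |             return true;
           |         }
           |         return parse_list(c);  /* 'a' or a nested list */
           |     }
           | 
           | (A real parser would also cap recursion depth; the point is
           | that the structure mirrors the grammar line for line.)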
        
         | mtlmtlmtlmtl wrote:
         | There's at least five different things that went wrong
         | simultaneously.
         | 
         | 1. Poorly written code in the kernel module crashed the whole
         | OS, and kept trying to parse the corrupted files on every
         | boot, causing a boot loop, instead of handling the error
         | gracefully and deleting/marking the files as corrupt (see the
         | sketch after this list).
         | 
         | 2. Either the corrupted files slipped through internal testing,
         | or there is no internal testing.
         | 
         | 3. Individual settings for when to apply such updates were
         | apparently ignored. It's unclear whether this was a glitch or
         | standard practice. Either way I consider it a bug (it's just a
         | matter of whether it's a software bug or a bug in their
         | procedures).
         | 
         | 4. This was pushed out everywhere simultaneously instead of
         | staggered to limit any potential damage.
         | 
         | 5. Whatever caused the corruption in the first place, which is
         | anyone's guess.
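         | 
         | On point 1, the graceful option is cheap. A sketch of a load
         | path (all helper names here are hypothetical, not
         | CrowdStrike's code): if a content file doesn't parse,
         | quarantine it and come up degraded instead of bugchecking
         | and re-parsing the same bad file on every boot:
         | 
         |     #include <stdio.h>
         |     
         |     typedef struct { int version; } content_t; /* stand-in */
         |     
         |     /* Assumed helpers, stand-ins for whatever the agent
         |      * actually does. */
         |     int  parse_content(const char *path, content_t *out);
         |     void quarantine_file(const char *path); /* e.g. *.bad */
         |     void activate_content(const content_t *c);
         |     
         |     int load_content_file(const char *path)
         |     {
         |         content_t content;
         |         if (parse_content(path, &content) != 0) {
         |             /* Move the bad file aside and log, so the agent
         |              * comes up degraded but the OS keeps running and
         |              * the same file is never re-parsed at boot. */
         |             quarantine_file(path);
         |             fprintf(stderr, "corrupt content: %s\n", path);
         |             return -1; /* fall back to last-known-good */
         |         }
         |         activate_content(&content);
         |         return 0;
         |     }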
        
           | rwmj wrote:
           | Zero effort to fuzz test the parser too. I mean, we _know_
           | how to harden parsers against bugs and attacks, and any semi-
           | competent fuzzer would have caught such a trivial bug.
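           | 
           | For a sense of the effort involved, a libFuzzer harness is
           | about five lines (parse_content_buffer is a hypothetical
           | stand-in for the parser's entry point; build with clang
           | -fsanitize=fuzzer,address):
           | 
           |     #include <stdint.h>
           |     #include <stddef.h>
           |     
           |     /* Whatever entry point consumes the untrusted bytes. */
           |     int parse_content_buffer(const uint8_t *buf, size_t len);
           |     
           |     /* libFuzzer calls this with coverage-guided inputs;
           |      * trivial cases like all-zero buffers show up within
           |      * seconds of starting a run. */
           |     int LLVMFuzzerTestOneInput(const uint8_t *data,
           |                                size_t size)
           |     {
           |         parse_content_buffer(data, size);
           |         return 0; /* non-zero is reserved by libFuzzer */
           |     }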
        
             | chrisjj wrote:
             | The triggering file was all zeros.
             | 
             | Is it not possible that only this pattern caused the crash,
             | and fuzzing omitted to try this unfuzzy pattern?
        
               | gliptic wrote:
               | No, it wasn't. Crowdstrike denied it had to do with zeros
               | in the files.
        
               | jojobas wrote:
               | At this point I wouldn't be paying too much attention to
               | what Crowdstrike is saying.
        
               | hello_moto wrote:
               | They have to speak the truth, albeit at a minimum, in
               | case of legal action...
        
               | kchr wrote:
               | Which also explains why they, only if needed to cover
               | their back legally, confirm or deny details being shared
               | on social and mass media.
        
               | watwut wrote:
               | Possible? Yes. Likely? No.
        
               | monsieurbanana wrote:
               | In my limited experience, I thought any serious fuzzing
               | program does test for all "standard" patterns like only
               | null bytes, empty strings, etc...
        
               | formerly_proven wrote:
               | Instrumented fuzzing (like AFL and friends) tweaks the
               | input to traverse unseen code paths in the target, so
               | they're super quick to find stuff like "heyyyyy, nobody
               | is actually checking if this offset is in bounds before
               | loading from that address".
        
               | omeid2 wrote:
               | The files in question have a magic number, "0xAAAAAAAA",
               | so it is not possible that the file was all zeros.
        
               | Retr0id wrote:
               | Competent fuzzers don't just use random bytes, they
               | systematically explore the state-space of the target
               | program. If there's a crash state to be found by feeding
               | in a file full of null bytes, it's probably going to be
               | found quickly.
               | 
               | A fun example is that if you point AFL at a JPEG parser,
               | it will eventually "learn" to produce valid JPEG files as
               | test cases, without ever having been told what a JPEG
               | file is supposed to look like.
               | https://lcamtuf.blogspot.com/2014/11/pulling-jpegs-out-
               | of-th...
        
               | rwmj wrote:
               | AFL is really "magical". It finds bugs very quickly and
               | with little effort on our part except to leave it running
               | and look at the results occasionally. We use it to fuzz
               | test a variety of file formats and network interfaces,
               | including QEMU image parsing, nbdkit, libnbd, hivex. We
               | also use clang's libfuzzer with QEMU which is another
               | good fuzzing solution. There's really no excuse for
               | CrowdStrike not to have been using fuzzing.
        
               | layer8 wrote:
               | No, it wasn't all zeros:
               | https://x.com/patrickwardle/status/1814782404583936170
        
             | mavhc wrote:
             | AV software is a great target for malware, badly written,
             | probably runs too much stuff in the kernel, tries to parse
             | everything
        
               | Comfy-Tinwork wrote:
               | And at the very least straight to system level access if
               | not more.
        
               | MyFedora wrote:
               | Anti-cheats also whitelist legit AV drivers, even though
               | cheaters exploit them to no end.
        
               | londons_explore wrote:
               | AV software needs kernel privileges to have access to
               | everything it needs to inspect, but the actual inspection
               | of that data should be done with no privileges.
               | 
               | I think most AV companies now have a helper process to do
               | that.
               | 
               | If you successfully exploit the helper process, the worst
               | damage you ought to be able to do is falsely find files
               | to be clean.
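               | 
               | Sketched as a POSIX-flavoured helper (the device
               | path and scan_buffer are made up for
               | illustration): the kernel side only ferries
               | bytes, and the risky parsing happens after
               | privileges are dropped:
               | 
               |     #include <fcntl.h>
               |     #include <stdint.h>
               |     #include <unistd.h>
               |     
               |     /* The parser under test, elsewhere. */
               |     int scan_buffer(const uint8_t *b, ssize_t n);
               |     
               |     int main(void)
               |     {
               |         /* Event stream from the kernel driver
               |          * (hypothetical device node). */
               |         int ev = open("/dev/edr_events", O_RDONLY);
               |         if (ev < 0) return 1;
               |     
               |         /* Drop to nobody before touching
               |          * untrusted bytes. */
               |         if (setgid(65534) || setuid(65534))
               |             return 1;
               |     
               |         uint8_t buf[65536];
               |         for (;;) {
               |             ssize_t n = read(ev, buf, sizeof buf);
               |             if (n <= 0) break;
               |             int bad = scan_buffer(buf, n);
               |             /* Report the verdict; an exploited
               |              * parser can at worst lie about it. */
               |             write(STDOUT_FILENO, &bad, sizeof bad);
               |         }
               |         return 0;
               |     }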
        
             | jatins wrote:
             | You are seriously overestimating the engineering practices
             | at these companies. I have worked in "enterprise security"
             | previously, though not at this scale. In a previous life I
             | worked with one of the engineering leaders currently at
             | Crowdstrike.
             | 
             | I'll bet you this company has some arbitrary unit test
             | coverage requirements for PRs, which developers game by
             | mocking the heck out of dependencies. I am sure they have
             | some vanity sonarqube integration to ensure great "code
             | quality". This likely also went through manual QA.
             | 
             | However I am sure the topic of fuzz testing would not have
             | come up once. These companies sell checkbox compliance, and
             | they themselves develop their software the same way.
             | Checking all the "quality engineering" boxes with very
             | little regard for long-term engineering initiatives that
             | would provide real value.
             | 
             | And I am not trying to kick Crowdstrike when they are down.
             | It's the state of any software company run by suits with
             | myopic vision. Their engineering blogs and their codebases
             | are poles apart.
        
           | simonh wrote:
           | There is a story out that the problem was introduced in a
           | post processing step after testing. That makes more sense
           | than that there was no testing. If true it means they thought
           | they'd tested the update, but actually hadn't.
        
           | hulitu wrote:
           | 6. No development process, no testing.
        
             | krisoft wrote:
             | How is that different from point 2?
        
           | ratorx wrote:
           | I'd also maybe add another one on the Windows end:
           | 
           | 6) some form of sandboxing/error handling/API changes to make
           | it possible to write safer kernel modules (not sure if it
           | already exists and was just not used). It seems like the
           | design could be better if a bad kernel module can cause a
           | boot loop in the OS...
        
             | leosarev wrote:
             | There is a sandboxing API in Windows. It's called running
             | programs in userspace.
        
               | hello_moto wrote:
               | Run what in userspace?
        
             | layer8 wrote:
             | It's a tough problem, because you also don't want the
             | system to start without the CrowdStrike protection. Or more
             | generally, a kernel driver is supposedly installed for a
             | reason, and presumably you don't want to keep the system
             | running if it doesn't work. So the alternative would be to
             | shut down the system upon detection of the faulty driver
             | without rebooting, which wouldn't be much of an improvement
             | in the present case.
        
               | ratorx wrote:
               | I can imagine better defaults. Assuming the threat vector
               | is malicious programs running in userspace (malicious
               | programs in kernel space are probably game over anyway,
               | right?), then you could simply boot into safe mode or
               | something instead of crashlooping.
               | 
               | One of the problems with this outage was that you
               | couldn't even boot into safe mode without having the
               | BitLocker recovery key.
        
               | layer8 wrote:
               | You don't want to boot into safe mode with networking
               | enabled if the software that is supposed to detect
               | attacks from the network isn't running. Safe mode doesn't
               | protect you from malicious code in userspace, it only
               | "protects" you from faulty drivers. Safe mode is for
               | troubleshooting system components, not for increasing
               | security.
               | 
               | I don't know the exact reasoning why safe mode requires
               | the BitLocker recovery key, but presumably not doing so
               | would open up an attack vector defeating the BitLocker
               | protection.
        
               | Uvix wrote:
               | Normally BitLocker gets the key from the TPM, which will
               | have its own driver that's likely disabled in Safe Mode.
        
               | discostrings wrote:
               | The BitLocker configurations I've seen over the last few
               | days don't require the recovery key to enter safe mode.
        
             | sm_1024 wrote:
             | Doesn't Microsoft support eBPF on Windows?
             | 
             | https://github.com/microsoft/ebpf-for-windows
        
           | dartos wrote:
           | Bugs happen.
           | 
           | Not staggering the updates is what blew my mind.
        
             | londons_explore wrote:
             | Since the issue manifested at 04:09 UTC, which is 11pm
             | where CrowdStrike's HQ is, I would guess someone was working
             | late at night and skipped the proper process so they could
             | get the update done and go to bed.
             | 
             | They probably considered it low risk, had done similar
             | things hundreds of times before, etc.
        
               | dartos wrote:
               | > They probably considered it low risk
               | 
               | Wild that anyone would consider anything in the "critical
               | path" low risk. I would bet that they just don't do
               | rolling releases normally since it never caused issues
               | before.
        
               | hello_moto wrote:
               | Companies these days are global btw.
               | 
               | Not everyone is working on the same timezone.
        
               | londons_explore wrote:
               | They don't appear to have engineering jobs in any
               | location where that would be considered regular office
               | hours...
        
               | hello_moto wrote:
               | https://crowdstrike.wd5.myworkdayjobs.com/crowdstrikecare
               | ers
               | 
               | I see remote, Israel, Canada.
               | 
               | https://crowdstrike.wd5.myworkdayjobs.com/en-
               | US/crowdstrikec...
               | 
               | This one specifically Spain and Romania
               | 
               | I know they bought companies all over the globe from
               | Denmark to other locations.
        
               | londons_explore wrote:
               | 0409UTC is 07:09 AM in Israel. Doubt an engineer was
               | doing a push then either...
               | 
               | All the other engineering locations seem even less
               | likely.
        
               | vitus wrote:
               | On Friday, no less. (Israel's weekend is Friday /
               | Saturday instead of the usual Saturday / Sunday.)
        
               | kchr wrote:
               | A good reminder of the fact that your Thursday might be
               | someone else's Friday.
        
           | rco8786 wrote:
           | Number 4 continues to be the most surprising bit to me. I
           | could not fathom having a process that involves deploying to
           | 8.5 million remote machines simultaneously.
           | 
           | Bugs in code I can almost always understand and forgive, even
           | the ones that seem like they'd be obvious with hindsight. But
           | this is just an egregious lack of the most basic rollout
           | standards.
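           | 
           | And the standard fix is nearly trivial. A sketch of
           | deterministic bucketing (illustrative only, not anyone's
           | actual rollout code): hash each machine ID into a bucket
           | from 0-99 and gate the update on a percentage you ramp up
           | -- 1%, 5%, 25%, 100% -- while watching crash telemetry
           | between steps:
           | 
           |     #include <stdbool.h>
           |     #include <stdint.h>
           |     
           |     /* FNV-1a; any stable hash works. What matters is
           |      * that a machine lands in the same bucket on every
           |      * check. */
           |     static uint32_t fnv1a(const char *s)
           |     {
           |         uint32_t h = 2166136261u;
           |         while (*s) {
           |             h ^= (uint8_t)*s++;
           |             h *= 16777619u;
           |         }
           |         return h;
           |     }
           |     
           |     bool update_enabled(const char *machine_id,
           |                         uint32_t rollout_pct)
           |     {
           |         return fnv1a(machine_id) % 100 < rollout_pct;
           |     }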
        
             | gitfan86 wrote:
             | They probably don't get to claim agile story points until
             | the ticket is in a finished state. And they probably have a
             | culture where vanity metrics like "velocity" are
             | prioritized.
        
               | nmg wrote:
               | This would answer the question that I've not heard anyone
               | asking:
               | 
               | what incentivized the bad decisions that led to this
               | catastrophic failure?
        
               | phs318u wrote:
               | My understanding is that the culture (as reported by some
               | customers) is quite aggressive and pushy. They are quite
               | vocal when customers don't turn on automatic updates.
               | 
               | It makes sense in a way - given their fast growth
               | strategy (from nowhere to top 3) and desire to "do things
               | differently" - the iconoclast upstarts that redefine the
               | industry.
               | 
               | Or to summarise - hubris.
        
               | hello_moto wrote:
               | To catch 0day quickly, EDR needs to know "how".
               | 
               | The "how" here is AV definition or a way to identify the
               | attack. In CS-speak: content.
               | 
               | Catching 0day quickly results in good reputation that
               | your EDR works well.
               | 
               | If people turn off their AV definition auto-update, they
               | are at risk. Why use EDR if folks don't want to stop
               | attacks quickly?
        
               | LtWorf wrote:
               | In theory you're correct. In practice it seems that
               | crowdstrike has crashed systems with their updates much
               | more often than 0day attacks.
        
               | 77pt77 wrote:
               | > They are quite vocal when customers don't turn in
               | automatic updates.
               | 
               | I'm sorry but this is the customer's fault.
               | 
               | If I'm using your services you work for me and you don't
               | get to bully me into doing whatever you think needs to be
               | done.
               | 
               | People that chose this solution need to be penalized, but
               | they won't.
        
               | mbreese wrote:
               | Customers don't always have a choice here. They could be
               | restricted by compliance programs (PCI, et al) and be
               | required under those terms to have auto updates on.
               | 
               | Compliance also has to share some of the blame here, if
               | best practices (local testing) aren't allowed to be
               | followed in the name of "security".
        
               | nerdjon wrote:
               | This needs to keep being repeated anytime someone wants
               | to blame the company.
               | 
               | Many don't have a choice; a lot of compliance is doing x
               | to satisfy a checkbox, and you don't have a lot of
               | flexibility in that, or you may not be able to do things
               | like process credit cards, which is kinda unacceptable
               | depending on your company. (Note: I didn't say all.)
               | 
               | CrowdStrike automatic update happens to satisfy some of
               | those checkboxes.
        
               | cruffle_duffle wrote:
               | Oh the games I have to play with story points that have
               | personal performance metrics attached to them. Splitting
               | tickets to span sprints so there aren't holes in some
               | dude's "effort" because they didn't complete some task
               | they committed to.
               | 
               | I never thought such stories were real until I
               | encountered them...
        
             | thundershart wrote:
             | Surely, CrowdStrike's safety posture for update rollouts is
             | in serious need of improvement. No argument there.
             | 
             | But is there any responsibility for the clients consuming
             | the data to have verified these updates prior to taking
             | them in production? I haven't worn the sysadmin hat in a
             | while now, but back when I was responsible for the upkeep
             | of many thousands of machines, we'd never have blindly
             | consumed updates without at least a basic smoke test in a
             | production-adjacent UAT type environment. Core OS updates,
             | firmware updates, third party software, whatever -- all of
             | it would get at least some cursory smoke testing before
             | allowing it to hit production.
             | 
             | On the other hand, given EDR's real-world purpose and the
             | speed at which novel attacks propagate, there's probably a
             | compelling argument for always taking the latest
             | definition/signature updates as soon as they're available,
             | even in your production environments.
             | 
             | I'm certainly not saying that CrowdStrike did nothing wrong
             | here, that's clearly not the case. But if conventional
             | wisdom says that you should kick the tires on the latest
             | batch of OS updates from Microsoft in a test environment,
             | maybe that same rationale should apply to EDR agents?
        
               | stoolpigeon wrote:
               | I think point 3 of the grandparent indicates admins were
               | not given an opportunity to test this.
               | 
                | My company had a lot of Azure VMs impacted by this, and
                | I'm not sure who the admin was who should have tested
                | it. Microsoft? I don't think we have anything to do
                | with CrowdStrike software on our VMs. (I think - I'm
                | sure I'll find out this week.)
                | 
                | Edit: I just learned the Azure central region failure
                | wasn't related to the larger event - and we weren't
                | impacted by the CrowdStrike issue - I didn't know they
                | were two different things. So the second part of my
                | comment is irrelevant.
        
               | thundershart wrote:
               | Oh, I'd missed point #3 somehow. If individual consumers
               | weren't even given the opportunity to test this, whether
               | by policy or by bug, then ... yeesh. Even worse than I'd
               | thought.
               | 
               | Exactly which team owns the testing is probably left up
               | to each individual company to determine. But ultimately,
               | if you have a team of admins supporting the production
               | deployment of the machines that enable your business,
               | then someone's responsible for ensuring the availability
               | of those machines. Given how impactful this CrowdStrike
               | incident was, maybe these kinds of third-party auto-
               | update postures need to be reviewed and potentially
               | brought back into the fold of admin-reviewed updates.
        
               | kiitos wrote:
               | > But is there any responsibility for the clients
               | consuming the data to have verified these updates prior
               | to taking them in production
               | 
               | In the boolean sense, yes. United Airlines (for example)
               | is ultimately responsible for their own production
               | uptime, so any change they apply without validation is a
               | risk vector.
               | 
               | In pragmatic terms, it's a bit fuzzier. Does CrowdStrike
               | provide any _practical_ way for customers to validate,
               | canary-deploy, etc. changes before applying them to
               | production? And not just changes with type=important, but
               | _all_ changes? From what I understand, the answer to that
               | question is no, at least for the type=channel-update
               | change that triggered this outage. In which case I think
               | the blame ultimately falls almost entirely on
               | CrowdStrike.
        
               | cozzyd wrote:
                | Arguably United Airlines shouldn't have chosen a product
               | they can't test updates of, though maybe there are no
               | good options.
        
               | suzzer99 wrote:
                | Yeah, one of the major problems seems to be CrowdStrike's
                | assumption that channel files are benign. Which isn't
                | true if there's a bug in your code that only gets
                | triggered by the right virus definition.
               | 
               | I don't know how you could assert that this is
               | impossible, hence channel files should be treated as
               | code.
        
               | thundershart wrote:
               | > From what I understand, the answer to that question is
               | no, at least for the type=channel-update change that
               | triggered this outage. In which case I think the blame
               | ultimately falls almost entirely on CrowdStrike.
               | 
                | Honestly, it hadn't even occurred to me that software
                | like this, marketed at enterprise customers, _wouldn't_
                | have this kind of control already available. It seems
                | like such an obvious thing for any big organization to
                | insist on that I just took it for granted that it
                | existed.
               | 
               | Whoops.
        
               | volkl48 wrote:
                | It's not an option. While the admins at the customer
                | have the ability to control when/how revisions of the
                | client software go out (and thus can, and generally do,
                | run their own testing, can decide to stay one rev back
                | as the default, etc.), there is no control over updates
                | to the kind of update/definition files that were the
                | primary cause here.
                | 
                | Which is also why you see every single customer affected
                | - what you are suggesting is simply not something they
                | can do at present.
               | 
               | At least for now - I imagine that some kind of
               | staggered/slowed/ringed option will have to be
               | implemented in the future if they want to retain
               | customers.
        
             | mbreese wrote:
             | For me, number 1 is the worst of the bunch. You should
             | always expect that there will be bugs in processes, input
             | files, etc... the fact that their code wasn't robust enough
             | to recognize a corrupted file and not crash is inexcusable.
             | Especially in kernel code that is so widely deployed.
             | 
             | If any one of the five points above hadn't happened, this
             | event would have been avoided. However, if number 1 had
             | been addressed - any of the others could have happened (or
             | all at the same time) and it would have been fine.
             | 
              | I understand that we should assume that bugs will be
              | present anywhere, which is why staggered deployments are
              | also important. If there had been staggered deployments,
              | the damage would still have happened, but it would have
              | been localized. I think security people would argue
              | against a staged deployment though: if it were discovered
              | what the new definitions protected against, an exploit
              | could be developed quickly to put the servers that aren't
              | in the "canary" group at risk. (At least in theory -- I
              | can't see how staggering deployment over a 6-12 hour
              | window would have been that risky.)
        
               | timmytokyo wrote:
               | They're all terrible, but I agree #1 is particularly
               | egregious for a company ostensibly dedicated to security.
               | A simple fuzz tester would have caught this type of bug,
               | so they clearly don't perform even a minimal amount of
               | testing on their code.
        
               | nsguy wrote:
                | Totally agree. Not only would a coverage-guided fuzzer
                | catch this, they should also be adding every single file
                | they send out to the corpus of that automated fuzz
                | testing, so they get somewhat increased coverage of
                | their parser.
                | 
                | There may not be out-of-the-box fuzzers that test device
                | drivers, so you hoist all the parser code, build it into
                | a stand-alone application, and fuzz that.
                | 
                | Likely this is a form of technical debt: I can
                | understand not doing all of this on day #1 when you have
                | 5 customers, but at some point, as you scale up, you
                | need to change the way you look at risk.
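                | 
                | To make that concrete, a minimal libFuzzer-style harness
                | might look like the sketch below (parse_channel_file is
                | a hypothetical name for the hoisted parser entry point):
                | 
                |     // fuzz_parser.c -- build with:
                |     //   clang -g -fsanitize=fuzzer,address fuzz_parser.c parser.c
                |     #include <stddef.h>
                |     #include <stdint.h>
                |     
                |     // Hypothetical user-space entry point for the hoisted parser.
                |     int parse_channel_file(const uint8_t *data, size_t len);
                |     
                |     // libFuzzer calls this with mutated inputs; seed the corpus
                |     // with every channel file ever shipped, as suggested above.
                |     int LLVMFuzzerTestOneInput(const uint8_t *data, size_t len) {
                |         parse_channel_file(data, len);  // must reject bad input, never crash
                |         return 0;
                |     }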
        
               | jayd16 wrote:
                | You admit that bugs are inevitable and then claim a
                | bug-free parser as the most important bullet. That
                | seems flawed to me. It would certainly be nice, but is
                | that achievable?
               | 
               | Policy changes seem more reliable and would catch other,
               | as of yet unknown classes of bugs.
        
               | throwaway5752 wrote:
               | I disagree. Has to be 4, something will always go wrong,
               | so you have to deliver in cohorts.
               | 
                | That goes equally if it was a Windows Update rolled out
                | in one motion that broke the Falcon agent/driver, or if
                | it was CrowdStrike. There is almost no excuse for a
                | global rollout without telemetry checks, whether it's
                | security agent updates or OS patches.
        
             | layer8 wrote:
             | Malware signature updates are supposed to be deployed ASAP,
             | because every minute may count when a new attack is
             | spreading. The mistake may have been to apply that policy
             | indiscriminately.
        
             | mrbombastic wrote:
             | And here I thought shipping a new version on the app store
             | was scary.
             | 
             | Is there anything we can take from other
             | professions/tradecraft/unions/legislation to ensure shops
             | can't skip the basic best practices we are aware of in the
              | industry, like staged rollouts? How do we set incentives
              | to prevent this? Seriously, the App Store was raking in
              | $$ from us for years with no support for staged rollouts
              | and no other options.
        
             | avree wrote:
             | A lot of snarky replies to this comment, but the reality is
             | that if you were selling an anti-virus, identified a
             | malicious virus, and then chose not to update millions of
             | your machines with that virus's signature, you'd also be in
             | the wrong.
        
               | naasking wrote:
               | > identified a malicious virus, and then chose not to
               | update millions of your machines with that virus's
               | signature, you'd also be in the wrong.
               | 
               | No, for exactly the reason we just saw, and the same
               | reason why vaccines are tested before widespread rollout.
        
               | aforwardslash wrote:
                | On the other hand, diseases vaccines prevent don't have
                | almost-instantaneous propagation; that's why they are
                | effective at containing propagation.
                | 
                | As an example, reaction time is paramount to countering
                | many kinds of attacks - that's why blocklists are so
                | popular, and AS blackholing is a viable option.
        
             | VirusNewbie wrote:
             | > But this is just an egregious lack of the most basic
             | rollout standards.
             | 
             | Agreed. It's crazy that the top tech companies enforce this
             | in a biblical fashion, despite all sorts of pressure to
             | ship and all that. Crowdstrike went YOLO at a _global_
             | scale.
        
             | alsetmusic wrote:
             | I worked at one of the big ones and we always shipped live
             | to all consumer devices at the same time. But this was for
             | a popular suite of products that generate a lot of consumer
             | demand, so we had a rigorous QA process to make sure this
             | wouldn't be a problem. As I was typing this, it occurred to
             | me that zero people would have cared if this update was
              | staggered, making it pretty silly not to.
        
               | trhway wrote:
               | As the QA manager said in our recent product meeting -
               | "as the canary doesn't work we roll out and test on the
               | production cloud".
        
             | robomc wrote:
             | I wonder if there's a concern that staggering the malware
             | signatures would open them up to lawsuits if somebody was
             | hacked in between other customers getting the data and them
             | getting the data.
        
               | thundershart wrote:
               | > I wonder if there's a concern that staggering the
               | malware signatures would open them up to lawsuits if
               | somebody was hacked in between other customers getting
               | the data and them getting the data.
               | 
               | I'd assume that sort of thing would be covered in the
               | EULA and contract -- but even if it weren't, it seems
               | like allowing customers to define their own definition
               | update strategy would give them a pretty compelling
               | avenue to claim non-liability. If CrowdStrike can
               | credibly claim "hey, we made the definitions available,
               | you chose to wait for 2 weeks to apply them, that's on
               | you", then it becomes much less of a concern.
        
           | rainsford wrote:
           | > 2. Either the corrupted files slipped through internal
           | testing, or there is no internal testing.
           | 
           | This is the most interesting question to me because it
           | doesn't seem like there is an obviously guessable answer. It
            | seems very unlikely to me that a company like CrowdStrike
           | pushes out updates of any kind without doing some sort of
           | testing, but the widespread nature of the outage would also
           | seem to suggest any sort of testing setup should have caught
           | the issue. Unless it's somehow possible for CrowdStrike to
           | test an update that was different than what was deployed,
           | it's not obvious what went wrong here.
        
             | bloopernova wrote:
             | I had read somewhere that the definition file was corrupted
             | after testing, during the final CI/CD pipeline.
        
           | shrimp_emoji wrote:
           | Well, Microsoft led by example with #2:
           | https://news.ycombinator.com/item?id=20557488
        
           | pclmulqdq wrote:
           | Number 4 is what everyone will fixate on, but I have the
           | biggest problem with number 1. Anything like this sort of
           | file should have (1) validation on all its pointers and (2)
           | probably >2 layers of checksumming/signing. They should
           | generally expect these files to get corrupted in transit once
           | in a while, but they didn't seem to plan for anything other
           | than exactly perfect communication between their intent and
           | their _kernel driver_.
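            | 
            | As a sketch of the cheap kind of check meant here
            | (hypothetical layout; the real channel-file format isn't
            | public):
            | 
            |     #include <stdbool.h>
            |     #include <stddef.h>
            |     #include <stdint.h>
            |     
            |     // Reject any offset-table entry that points outside the file
            |     // before anything in the kernel dereferences it.
            |     bool offsets_in_bounds(const uint32_t *offsets, size_t count,
            |                            size_t file_size) {
            |         for (size_t i = 0; i < count; i++) {
            |             if (offsets[i] >= file_size)
            |                 return false;  // treat the file as corrupt and bail out
            |         }
            |         return true;
            |     }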
        
           | dcuthbertson wrote:
           | I wonder if it was pushed anywhere that didn't crash, as an
           | extension of "It works on my machine. Ship it!"
           | 
            | I've built a couple of kernel drivers over the years, and
            | what I know is that ".sys" files are to the kernel what
            | ".dll" files are to user-space programs, in that the ones
            | with code in them run only after they are loaded and a
            | desired function is called (assuming the boilerplate
            | initialization code is good).
           | 
           | I've never made a data-only .sys file, but I don't see why
           | someone couldn't. In that case, I'd guess that no one ever
           | checked it was correct, and the service/program that loads it
            | didn't do any verification either -- why would it? The
            | developers of said service/program would tend to trust that
            | their own data .sys file was valid, never thinking they'd
            | release a broken file or consider that files sometimes get
           | corrupted -- another failure mode waiting to happen on some
           | unfortunate soul's computer.
        
             | kchr wrote:
              | The file extension is `sys` by convention; there's
              | nothing magical about it, and it's not handled in any
              | special way by the OS. In the case of CrowdStrike, there
              | seems to be some confusion as to why they use this file
              | extension, since it's only supposed to be a config/data
              | file used by the real kernel driver.
        
               | dcuthbertson wrote:
               | Thanks. I understand that '.sys' is a naming convention.
                | I'd guess that they used it because those config/data
                | files are used by their kernel driver, which makes
                | kernel vs. user-space files easier to distinguish.
        
           | cynicalsecurity wrote:
           | I'm betting on them having no internal testing.
        
           | fhub wrote:
            | #1 could be split into two parts, I think: the Microsoft
            | kernel side and the CrowdStrike module side.
        
           | LtWorf wrote:
           | > 4. This was pushed out everywhere simultaneously instead of
           | staggered to limit any potential damage.
           | 
           | Most importantly it was never tested at all :D
        
           | spike021 wrote:
           | 6. Companies using CS have no testing to verify that new
           | updates won't break anything.
           | 
            | At any SWE job I've worked over my entire career, nothing
            | was deployed with new versions of dependencies without
            | testing them against a staging environment first.
        
             | strunz wrote:
             | Crowdstrike doesn't give that option. Updates happen
             | without a choice to "keep you safe".
        
           | bryant wrote:
            | Of all of these, I think #3 leaves CrowdStrike the most
            | exposed, legally. Companies with robust update and config
            | management protocols got burned by this as well, including
            | places like hospitals and others with mission-critical
            | systems where config management is more strictly enforced.
            | 
            | If the CrowdStrike selloff continues, I'm betting this will
            | be why.
           | 
           | (There's a chance I'll make trading decisions based on this
           | rationale in the next 72 hours, though I'm not certain yet)
        
           | cratermoon wrote:
           | > Individual settings for when to apply such updates were
           | apparently ignored.
           | 
           | I've heard that said elsewhere, but I haven't found a source
           | for it at all. Are you able to point to one for me?
        
         | bradley13 wrote:
         | No bet. There are two failures here. (1) Failing to check the
         | data for validity, and (2) Failing to handle an error
         | gracefully.
         | 
         | Both of these are undergraduate-level techniques. Heck, they
         | are covered in most first-semester programming courses. Either
         | of these failures is inexcusable in a professional product,
         | much less one that is running with kernel-level privileges.
         | 
         | Bet: CrowdStrike has outsourced much of its development work.
        
           | ahoka wrote:
           | What do you mean by outsourced?
        
             | Rinzler89 wrote:
             | He probably means work was sent offshore to offices with
             | cheaper labor that's less skilled or less vested into
             | delivering quality work. Though there's no proof of that
             | yet, people just like to throw the blame on offshoring
             | whoever $BIG_CORP fucks up, as if all programmers in the US
             | are John Carmack and they can never cause catastrophic
             | fuckups with their code or processes.
        
               | jojobas wrote:
                | Not everyone in the US might be Carmack, but it's
                | ridiculously nearsighted to assert that cultural
                | differences don't play into people's desire and ability
                | to Do It Right.
        
               | Rinzler89 wrote:
                | It's not cultural differences that make the difference
                | in output quality, it's pay and the quality standards
                | set by the team/management, which are also mostly a
                | function of pay, since underpaid and unhappy developers
                | tend not to care at all beyond doing the bare minimum
                | to not get fired (#notmyjob, the lying-flat movement,
                | etc).
                | 
                | You think everyone writing code in the US would give two
                | shits about the quality of their output if they saw the
                | CEO pocketing another private jet while they can barely
                | make big-city rent?
               | 
               | Hell, even well paid devs at top companies in the US can
               | be careless and lazy if their company doesn't care about
               | quality. Have you seen some of the vulnerabilities and
               | bugs that make it into the Android source code and on
               | Pixel devices? And guess what, that code was written by
               | well paid developers in the US, hired at Google leetcode
               | standards, yet would give far-east sweatshops a run for
               | their money in terms of carelessness. It's what you get
               | when you have a high barrier of entry but a low barrier
               | of output quality where devs just care about "rest and
               | vest".
        
               | bradley13 wrote:
               | I was talking about outsourcing (and not necessarily
               | offshoring). Too many companies like CrowdStrike are run
               | by managers who think that management, sales, and
               | marketing are the important activities. Software
               | development is just an unpleasant expense that needs to
               | be minimized. Hence: outsourcing.
               | 
               | That said, I have had some experience with classic
               | offshoring. Cultural differences make a huge difference!
               | 
               | My experience with "typical" programmers from India,
               | China, et al is that they do _exactly_ what they are
               | told. Their boss makes the design decisions down to the
                | last detail, and the "programmers" are little more than
               | typists. I specifically remember one sweatshop where the
               | boss looped continually among the desks, giving each
               | person very specific instructions of what they were to do
               | next. The individual programmers implemented his
               | instructions literally, with zero thought and zero
               | knowledge of the big picture.
               | 
               | Even if the boss was good enough to actually keep the big
               | picture of a dozen simultaneous activities in his head,
               | his non-thinking minions certainly made mistakes. I have
               | no idea how this all got integrated and tested, and I
               | probably don't want to know.
        
               | Rinzler89 wrote:
                | _> That said, I have had some experience with classic
                | offshoring. Cultural differences make a huge
                | difference!_
                | 
                | Sure, but there's no proof yet that that was the case
                | here. That's just massive speculation based on
                | anecdotes on your side. There are plenty of offshore
                | devs who can run rings around Western devs.
        
               | Spooky23 wrote:
                | Staff trained at outsourcers have a different type of
                | focus. My experience is more operational, and usually
                | the training for those guys is about restoration to
                | hit SLA, period. That makes root cause harder to ID
                | sometimes.
                | 
                | It doesn't mean 'Murica better, just that the origin
                | story of staff matters, especially if you don't have
                | good processes around things like RCA.
        
               | jojobas wrote:
               | Western slacker movements never came close to deadma or
               | the dedicated indifference in the face of samsara. You
               | seem to have a lot of experience with the former and
               | little of the latter two, but what do I know.
               | 
               | Every stereotype exists for a reason.
        
               | ahoka wrote:
                | Offshoring and outsourcing are very different. It would
                | also be very hard to talk about offshoring at a company
                | claiming to provide services in 170 countries.
        
             | spotplay wrote:
             | It's probably just the common US-centric bias that external
             | development teams, particularly those overseas, may deliver
             | subpar software quality. This notion is often veiled under
             | seemingly intellectual critiques to avoid overt xenophobic
             | rhetoric like "They're taking our jobs!".
             | 
             | Alternatively, there might be a general assumption that
             | lower development costs equate to inferior quality, which
             | is a flawed yet prevalent human bias.
        
               | chuckadams wrote:
               | "You get what you pay for" is still a reasonable metric,
               | even if it is more a relative scale than an absolute one.
        
           | danielPort9 wrote:
           | > Either of these failures is inexcusable in a professional
           | product
           | 
            | Don't we have those kinds of failures in almost every
            | professional product? I've been working in the industry for
            | over a decade, and every single company I worked at had
            | those bugs. The only difference was that none of those
            | companies were developing kernel modules or whatever --
            | simple SaaS. And no, none of the bugs were outsourced (the
            | companies I worked for hired only locals and people within
            | a +-2h time-zone range).
        
         | variadix wrote:
          | More or less. Binary parsers are the easiest place to find
          | exploits because of how hard they are to get right: bounds
          | checks, overflow checks, pointer checks, etc. Especially
          | when the data format is complicated.
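          | 
          | For example, even the bounds check itself is easy to get
          | wrong: `if (ptr + len > end)` can overflow the pointer,
          | while a subtraction-based check can't. A cursor-style reader
          | along these lines (a sketch, not CrowdStrike's actual code):
          | 
          |     #include <stdbool.h>
          |     #include <stdint.h>
          |     #include <string.h>
          |     
          |     typedef struct { const uint8_t *cur, *end; } reader;
          |     
          |     // Copy n bytes out of the input only if they actually remain;
          |     // the subtraction can't overflow, unlike `cur + n > end`.
          |     bool read_bytes(reader *r, void *out, size_t n) {
          |         if (n > (size_t)(r->end - r->cur))
          |             return false;  // truncated/corrupt input: fail gracefully
          |         memcpy(out, r->cur, n);
          |         r->cur += n;
          |         return true;
          |     }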
        
         | praptak wrote:
         | > imperative languages allow us to do so
         | 
         | This problem has a promising solution, WUFFS, "a memory-safe
         | programming language (and a standard library written in that
         | language) for Wrangling Untrusted File Formats Safely."
         | 
         | HN discussion: https://news.ycombinator.com/item?id=40378433
         | 
         | HN discussion of Wuffs implementation of PNG parser:
         | https://news.ycombinator.com/item?id=26714831
        
         | noobermin wrote:
          | So, I also have near-zero cybersecurity expertise (I took an
          | online intro course on cryptography out of curiosity) and no
          | expertise in writing kernel modules either, but why, if ever,
          | would you parse an array of pointers...in a file...instead of
          | any other way of serializing data that doesn't include
          | hardcoded array offsets in an on-disk file...
          | 
          | Even ignoring this failure, which was catastrophic, this was
          | a bad design asking to be exploited by criminals.
        
           | Jare wrote:
           | Performance, I assume. Right now it may look like the wrong
           | tradeoff, but every day in between incidents like this we're
           | instead complaining that software is slow.
           | 
           | Of course it doesn't have to be either/or; you can have fast
           | + secure, but it costs a lot more to design, develop,
           | maintain and validate. What you can't have is a "why don't
           | they just" simple and obvious solution that makes it cheap
           | without making it either less secure, less performant, or
           | both.
           | 
           | Given all the other mishaps in this story, it is very well
           | possible that the software is insecure (we know that), slow
           | and also still very expensive. There's a limit to how high
            | you can push the triangle, but there's no bottom to how bad
           | it can get.
        
           | deaddodo wrote:
           | I'm curious, how else would you store direct memory offsets?
           | No matter how you store/transmit them, eventually you're
           | going to need those same offsets.
           | 
           | The problem wasn't storing raw memory offsets, it was not
           | having some way to validate the data at runtime.
        
         | lol768 wrote:
          | > I'm happy to up the ante by £50 to account for my second
          | theory
          | 
          | What's that, three pints in a pub inside the M25? :P
         | 
         | Completely agree with this sentiment though, we've known that
          | handling of binary data in memory-unsafe languages has been
         | risky for yonks. At the very least, fuzzing should've been
         | employed here to try and detect these sorts of issues. More
         | fundamentally though, where was their QA? These "channel files"
         | just went out of the door without any idea as to their
         | validity? Was there no continuous integration check to just ..
         | ensure they parsed with the same parser as was deployed to the
         | endpoints? And why were the channel files not deployed
         | gradually?
        
           | TeMPOraL wrote:
           | FWIW, before someone brings up JSON, GP's bet only makes
           | sense when "binary" includes parsing text as well. In fact,
           | most notorious software bugs are related to misuse of textual
           | formats like SQL or JS.
        
         | 1992spacemovie wrote:
          | Interesting observation. As a non-developer, what can one do
          | to enhance coverage for these types of scenarios? Fuzz
          | testing?
        
           | rwmj wrote:
           | Fuzz testing absolutely should be used whenever you parse
           | anything.
        
             | SoftTalker wrote:
             | Yeah, even if you are only parsing "safe" inputs such as
             | ones you created yourself. Other bugs and sometimes even
             | truly random events can corrupt data.
        
         | throw0101d wrote:
          | > _Approximately 100% of CVEs, crashes, bugs, slowdowns, and
          | pain points of computing have to do with various forms of
          | deserialising binary data back into machine-readable data
          | structures._
         | 
         | For the record, the top 25 common weaknesses for 2023 are
         | listed at:
         | 
          | * https://cwe.mitre.org/top25/archive/2023/2023_top25_list.htm...
         | 
         | Deserialization of Untrusted Data (CWE-502) was number fifteen.
         | Number one was Out-of-bounds Write (CWE-787), Use After Free
         | (CWE-416) was number four.
         | 
         | CWEs that have been in every list since they started doing this
         | (2019):
         | 
          | * https://cwe.mitre.org/top25/archive/2023/2023_stubborn_weakn...
        
           | lioeters wrote:
           | # Top Stubborn Software Weaknesses (2019-2023)
           | 
           | Out-of-bounds Write
           | 
           | Improper Neutralization of Input During Web Page Generation
           | ('Cross-site Scripting')
           | 
           | Improper Neutralization of Special Elements used in an SQL
           | Command ('SQL Injection')
           | 
           | Use After Free
           | 
           | Improper Neutralization of Special Elements used in an OS
           | Command ('OS Command Injection')
           | 
           | Improper Input Validation
           | 
           | Out-of-bounds Read
           | 
           | Improper Limitation of a Pathname to a Restricted Directory
           | ('Path Traversal')
           | 
           | Cross-Site Request Forgery (CSRF)
           | 
           | NULL Pointer Dereference
           | 
           | Improper Authentication
           | 
           | Integer Overflow or Wraparound
           | 
           | Deserialization of Untrusted Data
           | 
           | Improper Restriction of Operations within Bounds of a Memory
           | Buffer
           | 
           | Use of Hard-coded Credentials
        
             | TeMPOraL wrote:
              | Yup. Almost all of them are various flavors of fucking up
              | a parser or misusing it (in particular, all the injection
              | cases are typically caused by writing stupid code that
              | glues strings together instead of doing proper parsing).
        
               | lolinder wrote:
               | That's not parsing, that's the inverse of parsing. It's
               | taking untrusted data and injecting it into a string that
               | will later be parsed into code without treating the data
               | as untrusted and adapting accordingly. It's compiling, of
               | a sort.
               | 
               | Parsing is the reverse--taking an untrusted string (or
               | binary string) that is meant to be code and converting it
               | into a data structure.
               | 
               | Both are the result of taking untrusted data and assuming
               | it'll look like what you expect, but both are not parsing
               | issues.
        
               | TeMPOraL wrote:
                | > _It's taking untrusted data and injecting it into a
                | string that will later be parsed into code without
                | treating the data as untrusted and adapting accordingly._
               | 
               | Which is precisely why parsing should've been used here
               | instead. The correct way to do this is to work at the
               | level after parsing, not before it. "SELECT * FROM foo
               | WHERE bar LIKE ${untrusted input}" is dumb. Parsing the
               | query with a placeholder in it, replacing it as an
               | abstract node in the parsed form with data, and then
               | serializing to string if needed to be sent elsewhere, is
               | the correct way to do it, and is immune to injection
               | attacks.
        
               | lolinder wrote:
               | For SQL we tend to use prepared statements as the answer,
               | which probably do some parsing under the hood but that's
               | not visible to the programmer. I'd raise a lot of
               | questions if I saw someone breaking out a parser to
               | handle a SQL injection risk.
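                | 
                | For anyone unfamiliar, a sketch with SQLite's C API
                | (the untrusted value is bound as data, so it can never
                | change the statement's structure):
                | 
                |     #include <sqlite3.h>
                |     #include <stdio.h>
                |     
                |     // The ?1 placeholder is parsed as part of the statement;
                |     // the bound value is pure data and cannot inject SQL.
                |     int find_user(sqlite3 *db, const char *untrusted_name) {
                |         sqlite3_stmt *stmt;
                |         if (sqlite3_prepare_v2(db,
                |                 "SELECT id FROM users WHERE name = ?1",
                |                 -1, &stmt, NULL) != SQLITE_OK)
                |             return -1;
                |         sqlite3_bind_text(stmt, 1, untrusted_name, -1,
                |                           SQLITE_TRANSIENT);
                |         while (sqlite3_step(stmt) == SQLITE_ROW)
                |             printf("id=%d\n", sqlite3_column_int(stmt, 0));
                |         return sqlite3_finalize(stmt);
                |     }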
        
               | TeMPOraL wrote:
               | That's because prepared statements were developed before
                | understanding of langsec was mature enough. They provide a
               | very simple API, but it's at (or above) the right level -
               | you just get to use special symbols to mark "this node
               | will be provided separately", and provide it separately,
               | while the API makes sure it's correctly integrated into
               | the whole according to the rules of the language.
               | 
               | (Probably one other factor is that SQL was designed in a
               | peculiar way, for "readability to non-programmers", which
                | tends to result in languages that don't map well to
               | simple data structures. Still, there are tools that let
               | you construct a tree, and will generate a valid SQL from
               | that.)
               | 
               | HTML is a better example, because it's inherently tree-
               | structured, and trees tend to be convenient to work with
               | in code. There it's more obvious when you're crossing
               | from dumb string to parsed representation, and then back.
        
           | stouset wrote:
           | > Number one was Out-of-bounds Write (CWE-787)
           | 
           | Surely many of these originate from deserialization of
           | untrusted data (e.g., trusting a supplied length). It's
           | probably documented but I'm passively curious how they
           | disambiguate these cases.
        
         | eru wrote:
         | > Approximately 100% of CVEs, crashes, bugs, slowdowns, and
         | pain points of computing have to do with various forms of
         | deserialising binary data back into machine-readable data
         | structures. All because a) human programmers forget to account
         | for edge cases, and b) imperative programming languages allow
         | us to do so.
         | 
         | I wouldn't blame imperative programming.
         | 
         | Eg Rust is imperative, and pretty good at telling you off when
         | you forgot a case in your switch.
         | 
         | By contrast the variant of Scheme I used twenty years ago was
         | functional, but didn't have checks for covering all cases. (And
          | Haskell's ghc didn't have that check turned on by default a
         | few years ago. Not sure if they changed that.)
        
         | seymore_12 wrote:
         | >Approximately 100% of CVEs, crashes, bugs, slowdowns, and pain
         | points of computing have to do with various forms of
         | deserialising binary data back into machine-readable data
         | structures. All because a) human programmers forget to account
         | for edge cases, and b) imperative programming languages allow
         | us to do so.
         | 
          | This. One year ago, UK air traffic control collapsed due to
          | an inability to properly parse a "faulty" flight plan:
         | https://news.ycombinator.com/item?id=37461695
        
         | cedws wrote:
         | I'd say that it is a bug by definition if your program
         | ungracefully crashes when it's passed malformed data at
         | runtime.
        
         | stefan_ wrote:
          | People are target-fixating too much. Sure, this parser
          | crashed and caused the system to go down. But in an
          | alternative universe they push a definition file that
          | rejects every openat() or connect() syscall. Your system is
          | now equally dead, except it probably won't even have the
          | grace to restart.
          | 
          | The whole concept of "we fuck with the system in the kernel
          | based on data downloaded from the internet" is just not very
          | sound or safe.
        
           | hello_moto wrote:
            | It's not, and that's the sad state of AV on Windows.
        
         | xxs wrote:
          | > (for the non-British, that's £100)
          | 
          | next time you might want to add /s to your posts
        
         | back_to_basics wrote:
         | "human programmers forget to account for edge cases"
         | 
         | Which is precisely the rationale which led to Standard
         | Operating Procedures and Best Practices (much like any other
         | Sector of business has developed).
         | 
          | I submit to you, respectfully, that a corporation shall
          | never rise to a $75 billion market cap without bullet-proof
          | adherence to such, and thus this "event" should properly be
          | characterized and viewed as a very suspicious anomaly, at
          | the least.
         | 
         | https://news.ycombinator.com/item?id=41023539 fleshes out the
         | proper context.
        
         | divan wrote:
         | Related talk:
         | 
         | 28c3: The Science of Insecurity (2011)
         | 
         | https://www.youtube.com/watch?v=3kEfedtQVOY
        
           | nonrandomstring wrote:
           | Excellent talk. So long ago and what since?
        
         | smsm42 wrote:
         | > combination of said bad binary data and a poorly-written
         | parser that didn't error out correctly upon reading invalid
         | data
         | 
         | By now, if you write any parser that deals with any outside
         | data and don't fuzz the heck out of it, you are willfully
         | negligent. Fuzzers are pretty easy to use, automatic and would
          | likely catch any such problem pretty soon. So, did they fuzz
          | and get very, very unlucky, or do they just like to live
          | dangerously?
        
       | hannasm wrote:
        | Do these customers of CrowdStrike even have a say in these
        | updates going out, or do they all just bend over and let
        | CrowdStrike have full RCE on every machine in their
        | enterprise?
        | 
        | I sure hope the certificate authorities and other crypto folks
        | get to keep that stuff off their systems, at least.
        
         | Centigonal wrote:
         | I don't know if there's a way to outsource ongoing endpoint
         | security to a third party like Crowdstrike _without_ giving
         | them RCE (and ring 0 too) on all endpoints to be secured.
         | Having Crowdstrike automate that part is kind of the point of
         | their product.
        
         | Kwpolska wrote:
         | Auto-updates of "content" (what it thinks is malware) are
         | mandatory and bypass the option to delay updates:
         | https://twitter.com/patrickwardle/status/1814367918425079934
        
         | raincole wrote:
          | In our lifetime we'll see an auto-update to self-driving
          | cars that kills millions.
          | 
          | Well, it's likely we won't see it, because we might be among
          | the millions.
        
       | JSDevOps wrote:
       | Hasn't this been debunked?
        
       | codeulike wrote:
        | 'Analysis' of the null pointer is completely missing the
        | point. The simple fact of the matter is they didn't do
        | anywhere near enough testing before pushing the files out.
        | Auto-update comes with big responsibility; this was criminally
        | reckless.
        
         | CaliforniaKarl wrote:
         | There are enough people in the world that some can examine how
         | this happened while others simultaneously examine why this
         | happened.
        
       | mkl95 wrote:
        | How feasible would it be to implement blue-green deployments
        | in that kind of system?
        
       | hatsunearu wrote:
       | So was the totally empty channel file just a red herring?
        
         | Kwpolska wrote:
         | I think the file with all zeros was the fix that CS pushed out
         | after they learned of their mistake.
        
       | donatj wrote:
       | I am genuinely curious what their CI process that passed this
       | looks like, as well as if they're doing any sort of dogfooding or
       | manual QA? Are changes just CI/CD'd out to production right away?
        
       | webprofusion wrote:
        | The girl on the supermarket checkout said she hoped her
        | computer wouldn't be affected. I knowingly laughed and said,
        | "you probably don't have it on your own computer unless
        | you're a bank".
        | 
        | She said, "I installed it before for my cybersecurity course,
        | but I think it was just a trial."
        | 
        | Assumptions, eh.
        
       | hatsunearu wrote:
       | I see a paradox that the null bytes are "not related" to the
       | current situation and yet deleting the file seems to cure the
       | issue. Perhaps the CS official statement that "This is not
       | related to null bytes contained within Channel File 291 or any
       | other Channel File." is poorly worded.
       | 
       | My opinion is that CS is trying to say the null bytes themselves
       | aren't the actual root cause of the issue, but merely a trigger
       | for the actual root cause, which is that CSAgent.sys has a
        | problem where malformed input vectors can cause it to crash.
        | Well-designed programs should error out gracefully on
        | foreseeable errors, like corrupted config files.
       | 
       | If we interpret that quoted sentence such that "this" is
       | referring to "the logical error", and that "the logical error" is
       | the error in CSAgent.sys that causes it to crash upon reading a
       | bad channel file, then that statement makes sense.
       | 
       | This is a bit of a stretch, but so far my impression with CS
       | corporate communication regarding this issue has been nothing but
       | abject chaos, so this is totally on-brand for them.
        
         | chrisjj wrote:
         | > My opinion is that CS is trying to say the null bytes
         | themselves aren't the actual root cause of the issue, but
         | merely a trigger for the actual root cause,
         | 
         | My opinion is they say "unrelated" because they are trying to
         | say unrelated - and hence no, this was not a trigger.
        
           | hatsunearu wrote:
            | Then are the null bytes just a coincidence? Why does
            | deleting it fix the issue then, and why is it missing the
            | 0xAAA... file signature?
        
       | peter_retief wrote:
       | I don't do windows either.
        
       | throwyhrowghjj wrote:
        | This is a pretty brief 'analysis'. The poster traces back one
        | stack frame in assembler; it basically amounts to just reading
        | out a stack dump from gdb. It's a good starting point, I
        | guess.
        
       | siscia wrote:
        | The thing I don't understand about all of this is a different
        | one, much less technical and much more important.
        | 
        | Why was the blast radius so huge?
        | 
        | I have deployed much less important services much more slowly,
        | with automatic monitoring and rollback in place.
        | 
        | You first deploy to beta, where you don't get customer
        | traffic; if everything goes right, to a small part of your
        | fleet; and then you slowly increase the percentage of hosts
        | that receives the update.
        | 
        | This would have stopped the issue immediately, and somehow I
        | thought it was common practice...
        
         | moogly wrote:
         | They don't seem to dogfood their own software. They don't seem
         | to think it's very useful software in their own org, I guess.
        
         | INTPenis wrote:
          | Considering the impact this incident had, they definitely
          | should have a large staging environment of Windows clients
          | to deploy to first.
          | 
          | There are so many ways to avoid this issue, or at least
          | minimize the risk of it happening, but as always profits
          | come before people.
        
         | andy81 wrote:
         | Even if there was a canary release process for code updates,
         | the config updates seem to have been on a separate channel.
         | 
         | The expectation being that people want up-to-date virus
         | detection rules constantly even if they don't want potentially
         | breaking changes.
         | 
         | The missed edge case being an untested config that breaks
         | existing code.
         | 
         | Source: Pure speculation, don't quote this in news articles.
        
         | vbezhenar wrote:
          | It wasn't a software update. It was a signature database
          | update. It's supposed to roll out as fast as possible: when
          | you learn about a new virus, it's already in the wild, so
          | every minute counts. You don't want to delay the update for
          | a day just to find out that your servers were breached 20
          | hours ago.
        
           | TeMPOraL wrote:
           | We can see clearly now that this is a stupid approach.
           | Viruses don't move that fast.
           | 
           | This situation is akin to the immune system overreacting and
           | melting the patient in response to a papercut. This sometimes
           | happens, but it's considered a serious medical condition, and
           | I believe the treatment is to nuke someone's immune system
           | entirely with hard radiation, and reinstall a less aggressive
           | copy. Take from that analogy what you want.
        
             | orf wrote:
             | > Viruses don't move that fast
             | 
             | Yes they do? And it's more akin to a shared immune system
             | than a single organism.
             | 
             | In this case, it's not like viruses move fast relative to
             | the total population of machines, but within the population
             | of machines being targeted they do move fast.
        
               | TeMPOraL wrote:
                | Still, better to let them spread a bit and deal with
                | the localized damage than risk nuking everything. There
                | is such a thing as a treatment that's very effective
                | but not used, because of a low-probability risk of
                | terminal damage.
        
               | proveitbh wrote:
                | Cite one virus that crashed the supposed 10 or 100
                | million machines in 70 minutes.
               | 
               | Just one.
        
               | orf wrote:
                | Microsoft puts the count at 8.5 million computers. So,
                | percentage-wise, the MyDoom virus in 2004 infected a
                | far greater % of computers in a month, which in the
                | context of internet penetration, availability and
                | speeds (40 kb/s average, 450 kb/s fastest) in 2004 was
                | about as fast as it could have spread. So it might as
                | well have been 70 minutes, given that downloading a
                | 50 MB file on dial-up would take way longer than 70
                | minutes.
               | 
               | To the smart people below:
               | 
               | It's clear to everyone that 70 minutes is not 1 month.
               | The point is that it's not a fair comparison: it would
               | simply not have been possible to infect that many
               | computers in 70 minutes: the internet infrastructure just
               | wasn't there.
               | 
                | It's like saying "the Spanish flu didn't do that much
                | damage because there were fewer people on the planet" -
                | it's a meaningless absolute comparison, whereas the
                | relative comparison is what matters.
        
               | smartpeoplebelw wrote:
                | There are also orders of magnitude more machines today
                | than 20 years ago -- so it should be easier to infect
                | more machines now than before, and yet no one can cite
                | a virus that was as quickly moving and damaging as what
                | CrowdStrike did through gross negligence.
                | 
                | Be better.
        
               | orf wrote:
               | This entire thread is stupid.
               | 
               | Computer security as a whole has improved, whilst the
               | complexity of interconnected systems has exponentially
               | increased.
               | 
               | This has made the barrier to entry for malware higher,
               | and so means we no longer have the same historic examples
               | of large scale worms targeting consumer machines _that we
               | used to_.
               | 
               | At the same time the financial rewards for finding and
               | exploiting a vulnerability within an organisations
               | complex stack have greatly increased. The rewards are
               | coupled to the time it takes to execute on the
               | vulnerability.
               | 
               | This leads to what we have today: localised, and often
               | specialised attacks against valuable targets that are
               | executed as fast as possible in order to minimise the
               | chance a target has to respond or the vulnerability they
               | are exploiting to be burned.
               | 
               | Of course the "smart people belw" must know this, so it's
               | unclear why they are pretending to be dumb.
        
               | TeMPOraL wrote:
               | > _This leads to what we have today: localised, and often
               | specialised attacks against valuable targets that are
               | executed as fast as possible in order to minimise the
               | chance a target has to respond or the vulnerability they
               | are exploiting to be burned._
               | 
               | Yup, exactly that.
               | 
               | So what I'm saying it, it's beyond idiotic to combat this
               | with a kernel-level backdoor managed by one entity and
               | deployed across half the Internet. If anyone manages to
                | breach _that_, they have a way to make their attack much
               | simpler and much less localized (though they're unlikely
               | to be prepared to capitalize on that). A fuckup on the
               | defense side, on the other hand, can kill everything
               | everywhere all at once. Which is what just happened.
               | 
                | It's a "cure" for the disease that happens to both
                | boost the potency of the disease _and_, once in a blue
                | moon, randomly kills the patient for no reason.
        
               | orf wrote:
               | But now you run into the tragedy of the commons.
               | 
               | The fact is that this _does_ help organisations.
               | Definitely not all of the orgs that buy Crowdstrike, but
               | rapid defence against evolving threats is a valuable
               | thing for companies.
               | 
               | So, individually it's good for a company. But as a whole,
               | and as currently implemented, it's not good for everyone.
               | 
               | However that doesn't matter. Because individually it's a
               | benefit.
        
               | TeMPOraL wrote:
               | That's right.
               | 
               | Which is why I'm hoping that this incident will make both
               | security professionals and regulators reconsider the idea
               | of endpoint security as it's currently done, and that
               | there will be some cultural and regulatory pushback.
               | Maybe this will incentivize people to come up with other
               | ideas on how to secure systems and companies, that don't
               | look like a police state on steroids.
        
               | 8organicbits wrote:
               | ILOVEYOU is a pretty decent contender, although the
               | Internet was smaller back then and it didn't "crash"
               | computers, it did different damage. Computer viruses and
               | worms can spread extremely quickly.
               | 
               | > infected millions of Windows computers worldwide within
               | a few hours of its release
               | 
                | See: https://en.wikipedia.org/wiki/Timeline_of_computer_viruses_a...
        
               | echoangle wrote:
               | Can you explain why you find this idea of fast moving
               | viruses so improbable? Just from the way the internet
               | works, I wouldn't be surprised if every reachable host
               | could be infected in a few hours if the virus can infect
               | a machine in a short time (a few seconds) and would then
               | begin infecting other machines. Why is that so hard to
               | imagine?
        
               | SoftTalker wrote:
               | Proper firewalling for one. "Every reachable host" should
               | be a fairly small set, ideally an empty set, when you're
               | on the outside looking in.
               | 
                | And operating systems aren't _that_ bad anymore. You
                | don't have services out of the box opening ports on all
                | the interfaces, no firewalls, accepting connections from
                | everywhere, and using well-known default (or no)
                | credentials.
               | 
               | Even stuff like the recent OpenSSH bug that is remotely
               | exploitable and grants root access wasn't anything close
               | to this kind of disaster because (a) most computers are
               | not running SSH servers on the public internet (b) the
               | exploit is rather difficult to actually execute.
               | Eventually it might not be, but that gives people a bit
               | of breathing space to react.
               | 
               | Most cyberattacks use old, unpatched vulnerabilities
               | against unprotected systems, combined with social
               | engineering to get the payload past the network boundary.
               | If you are within a pretty broad window of "up to date"
               | on your OS and antivirus updates, you are pretty safe.
        
               | echoangle wrote:
               | The focus seems to have been the time limit though. All
               | the reasons you mention are just that there aren't even
               | that many targets.
        
               | hello_moto wrote:
               | The malware doesn't need to infect 100 million machines.
               | 
               | It just needs to infect 200k devices to get to the
               | pot: a hundred million dollars of ransom.
        
               | TeMPOraL wrote:
               | It's a trivial cost to pay if the alternative is
               | CrowdStrike inflicting billions of dollars of damage and
               | loss of life across several countries.
               | 
               | (I expect this to tally up to double-digit billions and
               | thousands of lives lost directly to the outages when the
               | dust settles.)
        
               | hello_moto wrote:
               | Trivial cost to pay from which side?
               | 
               | Organizations like MGM and London Drugs?
        
               | nullindividual wrote:
               | https://www.caida.org/catalog/papers/2003_sapphire/
               | 
               | [SQL] Slammer spread incredibly quickly, even though the
               | vulnerability was patched in the prior year.
               | 
               | > As it began spreading throughout the Internet, it
               | doubled in size every 8.5 seconds. It infected more than
               | 90 percent of vulnerable hosts within 10 minutes.
               | 
               | Worms are not technically viruses, but they can have
               | similar impacts/perform similar tasks on an infected
               | host.
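               | 
               | As a rough sanity check on that rate (assuming a
               | single initial host and unchecked growth): doubling
               | every 8.5 seconds reaches the ~75,000 vulnerable
               | hosts Slammer ultimately hit after log2(75000) ~ 16.2
               | doublings, i.e. about 16.2 * 8.5 ~ 138 seconds. The
               | observed 10 minutes is slower, reportedly because the
               | worm's own scan traffic saturated the links it was
               | spreading over.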
        
               | smartpeoplebelw wrote:
               | You are off by several orders of magnitude.
               | 
               | Also keep in mind that 8.5 million is likely the count
               | of machines fully impacted, and does not count the
               | machines that were impacted but automatically
               | recovered.
        
               | nullindividual wrote:
               | > You are off by several orders of magnitude
               | 
               | Can you cite something? This is HN, not reddit.
               | 
               | > Also keep in mind that 8.5 million is likely the
               | count of machines fully impacted, and does not count
               | the machines that were impacted but automatically
               | recovered.
               | 
               | Do you have evidence of this? Please bring sources with
               | you.
        
               | aldousd666 wrote:
               | No, they're in the wrong. They didn't test adequately,
               | regardless of their motive for not doing so. Obviously,
               | reality is not backing up your theory there.
        
               | orf wrote:
               | FYI, both the following statements can be true:
               | 
               | 1. Crowdstrike didn't test adequately
               | 
               | 2. Viruses can move pretty fast once a foothold is gained
        
             | UncleMeat wrote:
             | https://en.wikipedia.org/wiki/SQL_Slammer
             | 
             | There is no real "speed limit" on malware spread.
        
               | TeMPOraL wrote:
               | No, but there are impenetrable barriers. 0days in
               | particular are usually very specific and affect few
               | systems directly, but even the broader ones aren't
               | usually followed by a blanket attack that pwns
               | everything and steals all the data or monies. Just
               | about the only way to achieve this kind of blast
               | radius is to have a kernel-level backdoor installed in
               | every other computer on the planet - which is
               | _exactly_ what those endpoint "security" systems are.
        
           | pyeri wrote:
           | But why does a signature database update have to mess
           | with the kernel in any way? Shouldn't such a database
           | stay in userland?
        
             | vbezhenar wrote:
             | Because the kernel needs to parse the data in some way,
             | and that parser apparently was broken enough. Whether
             | it could be done in a more resilient manner, I don't
             | know; you have to remember that an antivirus works in a
             | hostile environment and can't necessarily trust
             | userspace, so they probably need to verify signatures
             | and parse the payload in kernel space.
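             | 
             | For illustration, the defensive shape of such a parser
             | might look like this (a minimal sketch; the header
             | layout and names are invented, not the real channel-
             | file format):
             | 
             |   #include <stdint.h>
             |   #include <stddef.h>
             |   #include <string.h>
             |   
             |   #define ENTRY_SIZE 24u /* hypothetical */
             |   
             |   /* Invented header layout, for illustration only. */
             |   struct chan_hdr {
             |       uint32_t magic; /* file identifier */
             |       uint32_t count; /* entries that follow */
             |   };
             |   
             |   /* 0 = acceptable, -1 = reject before anything
             |    * dereferences into the blob. */
             |   int parse_channel(const uint8_t *buf, size_t len)
             |   {
             |       struct chan_hdr hdr;
             |       if (buf == NULL || len < sizeof hdr)
             |           return -1; /* too short to be valid */
             |       memcpy(&hdr, buf, sizeof hdr);
             |       if (hdr.magic != 0xC5A6E147u)
             |           return -1; /* all-zero file dies here */
             |       if (hdr.count > (len - sizeof hdr) / ENTRY_SIZE)
             |           return -1; /* table would overrun blob */
             |       return 0; /* safe to walk the entries */
             |   }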
        
             | theshrike79 wrote:
             | The scanner is a Ring 0[0] program. Windows only has
             | two options, 0 and 3, and Ring 3 won't work for any
             | kind of security scanner, so they're forced to use 0.
             | 
             | The proper place would be Ring 1, which Windows doesn't
             | use.
             | 
             | And being a kernel-level operation, it has the capability
             | to crash the whole system before the actual OS has any
             | chance to intervene.
             | 
             | [0] https://en.wikipedia.org/wiki/Protection_ring
        
               | leosarev wrote:
               | Why is that so?
        
               | hello_moto wrote:
               | That's a question for Microsoft OS architects
        
               | benchloftbrunch wrote:
               | Historical reasons. Windows NT was designed to support
               | architectures with only two privilege rings.
        
               | layer8 wrote:
               | All modern OSes only use ring 0 and 3. Intel is
               | considering removing rings 1 and 2 in a future revision
               | for that reason: https://www.intel.com/content/www/us/en/
               | developer/articles/t...
        
           | siscia wrote:
           | Thanks for the clarification, this makes more sense.
        
           | LeonB wrote:
           | It's quite impressive really -- crowdstrike were deploying a
           | content update to all of their servers to warn them of the
           | "nothing but nulls, anti-crowdstrike virus"
           | 
           | Their precognitive intelligence suggested that a world wide
           | attack was only moments away. The same precognitive system
           | showed that the virus was so totally incapacitating that the
           | only safe response was to incapacitate the server.
           | 
           | Knowing that the virus was capable of taking down _every_
           | crowdstrike server, they didn't waste time trying it on a
           | subset of servers.
           | 
           | When you know you know.
        
           | Ensorceled wrote:
           | Surely there is a happy medium between zero
           | (nil, none, nada, zilch) staging and 24 hours of rolling
           | updates? A single 30-second or so VM test would have
           | revealed this issue.
        
             | layer8 wrote:
             | There should have been a test catching the error before
             | rollout. However, this doesn't require a staged rollout
             | as suggested by the GP comment, i.e. testing the update
             | on some customers (who would still be hosed in that
             | case); it only requires executing the test before the
             | rollout.
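             | 
             | A minimal sketch of such a gate, reusing the
             | hypothetical parse_channel() sketched upthread: run the
             | release candidate through the exact parser the driver
             | ships, and block the publish step on failure.
             | 
             |   #include <stdint.h>
             |   #include <stdio.h>
             |   #include <stdlib.h>
             |   
             |   /* the hypothetical parser sketched upthread */
             |   int parse_channel(const uint8_t *buf, size_t len);
             |   
             |   /* exit 0 = publish, nonzero = block the release */
             |   int main(int argc, char **argv)
             |   {
             |       if (argc != 2) return 2;
             |       FILE *f = fopen(argv[1], "rb");
             |       if (!f) return 2;
             |       fseek(f, 0, SEEK_END);
             |       long n = ftell(f);
             |       rewind(f);
             |       uint8_t *buf = malloc((size_t)n);
             |       if (!buf || fread(buf, 1, n, f) != (size_t)n)
             |           return 2;
             |       /* a zeroed candidate fails here, in CI - not
             |        * in ring 0 on a few million customer boxes */
             |       return parse_channel(buf, (size_t)n) ? 1 : 0;
             |   }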
        
           | jrochkind1 wrote:
           | Yup. If they were delaying the update to half of their
           | customers for 24 hours, and in that 24 hours some of
           | their customers got hacked by a zero day, say leading to
           | ransomware, the comment threads would be demanding their
           | heads for _that_!
        
             | sateesh wrote:
             | Even if it is a staged rollout, why would one do it in
             | 24-hour phases? It could be an hourly (say) staggered
             | rollout too.
        
               | jrochkind1 wrote:
               | Sure. And if someone showed up here with a story about
               | how they got attacked and ransomwared enterprise-wide in
               | the however many several hours that they were waiting for
               | their turn to rollout, what do you think HN response
               | would be?
               | 
               | Hmm, maybe you could have companies pay more to be in the
               | first rollout group? That'd go over well too.
        
           | sateesh wrote:
           | It doesn't matter what kind of update it was: signature,
           | content, etc. The only thing that matters is whether the
           | update has the potential to disrupt the user's normal
           | activity (let alone brick the host); if yes, ensure it
           | either works or have a staged rollout with a remediation
           | plan.
        
           | aldousd666 wrote:
           | You do want to fuzz test it like crazy. It can be
           | automated. Takes minutes, saves billions.
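           | 
           | For instance, a minimal libFuzzer harness over the
           | hypothetical parse_channel() sketched upthread - exactly
           | the "automated, takes minutes" kind of testing meant
           | here:
           | 
           |   /* build: clang -g -fsanitize=fuzzer,address \
           |    *        fuzz.c parse.c */
           |   #include <stdint.h>
           |   #include <stddef.h>
           |   
           |   int parse_channel(const uint8_t *buf, size_t len);
           |   
           |   /* libFuzzer calls this millions of times with
           |    * mutated inputs; any crash or sanitizer report
           |    * is a finding. */
           |   int LLVMFuzzerTestOneInput(const uint8_t *data,
           |                              size_t size)
           |   {
           |       parse_channel(data, size);
           |       return 0;
           |   }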
        
         | robxorb wrote:
         | "Blast radius" seems... apt.
         | 
         | It would be rather easier to understand and explain if it were
         | intentional. Likely not able to be discussed though.
         | 
         | Anyone able to do that here?
        
         | rplnt wrote:
         | It's answered in the post (in the thread) as well. But for
         | comparison, when I worked for an AV vendor we pushed maybe 4
         | updates a day to a much bigger customer base (if the numbers
         | reported by MS are true).
        
           | kchr wrote:
           | I'm curious, what did your deployment plan look like?
           | Phased/staggered, if so how?
        
       | heraldgeezer wrote:
       | How strange to cite ResetEra, a gaming forum with a certain
       | kind of community, which may not be considered a reliable
       | source.
        
       | switch007 wrote:
       | Is there commercial pressure to push out "content" updates asap
       | so you can say you're quicker than your competition at responding
       | to emerging threats?
        
       | minhoryang wrote:
       | Can we find an uptime (availability) graph for the
       | CrowdStrike agent? Don't you think this graph should be
       | included in the postmortem?
        
       | wasabinator wrote:
       | I wonder what privilege level this service runs at. If it's
       | less than ring 0, I think some blame needs to go to Windows
       | itself. If it's ring 0, did it really need to be that high??
       | 
       | Surely an OS doesn't have to go completely kaput due to one
       | service crashing.
        
         | Kwpolska wrote:
         | It's not a service, it's a driver. "Anti"malware drivers
         | typically run with a lot of permissions to allow spying on all
         | processes. Driver failures likely mean the kernel state is
         | borked as well, so Windows errs on the side of caution and
         | halts.
        
       | FergusArgyll wrote:
       | Why wasn't it caught?
       | 
       | https://manifold.markets/ChrisGreene/why-didnt-the-crowdstri...
        
       | dallas wrote:
       | Those who have spent time writing NDIS/TDI drivers are those who
       | know the minefield!
        
       | MuffinFlavored wrote:
       | Did this cause the Azure outage
       | (https://status.dev.azure.com/_event/524064579) that happened
       | like 12 hours before, or were they separate?
        
       | cedws wrote:
       | Does anybody know if these "channel files" are signed and
       | verified by the CS driver? Because if not, that seems like a
       | gaping hole for a ring 0 rootkit. Yeah, you need privileges to
       | install the channel files, but once you have it you can hide
       | yourself much deeper in the system. If the channel files can
       | cause a segfault, they can probably do more.
       | 
       | Any input for something that runs at such high privilege should
       | be at least integrity checked. That's the basics.
       | 
       | And the fact that you can simply delete these channel files
       | suggests there isn't even an anti-tamper mechanism.
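       | 
       | For what it's worth, the "verify before parse" ordering being
       | asked about is small to express. A minimal sketch, with the
       | signature primitive and file layout invented for
       | illustration:
       | 
       |   #include <stdint.h>
       |   #include <stddef.h>
       |   
       |   /* ed25519_verify() stands in for whatever primitive is
       |    * really used; parse_channel() is any channel parser.
       |    * Layout (64-byte detached signature up front) is
       |    * invented for illustration. */
       |   int ed25519_verify(const uint8_t sig[64],
       |                      const uint8_t *msg, size_t msg_len,
       |                      const uint8_t pubkey[32]);
       |   int parse_channel(const uint8_t *buf, size_t len);
       |   
       |   int load_channel(const uint8_t *blob, size_t len,
       |                    const uint8_t pubkey[32])
       |   {
       |       if (len < 64)
       |           return -1; /* no room for a signature */
       |       const uint8_t *body = blob + 64;
       |       if (!ed25519_verify(blob, body, len - 64, pubkey))
       |           return -1; /* reject BEFORE parsing */
       |       return parse_channel(body, len - 64);
       |   }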
        
       | calrain wrote:
       | This reminds me of the vulnerability that hit JWT tokens a
       | few years ago, when you could set the 'alg' to 'none'.
       | 
       | Surely CrowdStrike encrypts and signs their channel files,
       | and I'm wondering if a file full of 0's inadvertently
       | signaled to the validating software that a 'null' or 'none'
       | encryption algo was being used.
       | 
       | This could imply the file full of zeros is just fine, as the null
       | encryption passes, because it's not encrypted.
       | 
       | That could explain why it tried to reference the null memory
       | location, because the null encryption file full of zeroes just
       | forced it to run to memory location zero.
       | 
       | The risk is, if this is true, then their channel loading
       | verification system is critically exposed by being able to load
       | malicious channel drivers through disabled encryption on channel
       | files.
       | 
       | Just a hunch.
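       | 
       | If the hunch is right, the missing guard would be something
       | as small as this allowlist (a sketch; algorithm IDs invented
       | for illustration):
       | 
       |   /* algorithm IDs invented for illustration */
       |   enum { ALG_NONE = 0, ALG_ED25519 = 1, ALG_RSA_PSS = 2 };
       |   
       |   /* The verifier, not the file, decides what is
       |    * acceptable. A channel file of all zeros would select
       |    * ALG_NONE (0) - exactly the 'alg: none' failure mode
       |    * if this check were missing. */
       |   static int alg_allowed(unsigned alg_id)
       |   {
       |       switch (alg_id) {
       |       case ALG_ED25519:
       |       case ALG_RSA_PSS:
       |           return 1;
       |       default:
       |           return 0; /* ALG_NONE and unknown: reject */
       |       }
       |   }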
        
         | kachapopopow wrote:
         | That was the first thing I thought about when I started
         | analyzing this file.
        
       | Gazoche wrote:
       | What really blew my mind about this story is learning that a
       | single company (CrowdStrike) has the power to push random kernel
       | code to a large part of the world's IT infrastructure, at any
       | time, at their will.
       | 
       | Correct me if I'm wrong but isn't kernel-level access essentially
       | God Mode on every computer their software is installed on?
       | Including spying on the entire memory, running any code, deleting
       | data, installing ransomware? This feels like an insane amount of
       | power concentrated into the hands of a single entity, on the
       | level of a nuclear submarine. Wouldn't that make them a prime
       | target for all sorts of nation-state actors?
       | 
       | This time the damage was (likely) unintentional and no data was
       | lost (save for lost BitLocker keys), but were we really all this
       | time one compromised employee away from the largest-ever
       | ransomware attack, or even worse?
        
         | andix wrote:
         | It's not perfectly clear yet if CrowdStrike is able to push
         | executable code via those updates. It looks like they updated
         | some definition files and not the kernel driver itself.
         | 
         | But the kernel driver obviously contains some bugs, so it's
         | possible that those definition updates can inject code. There
         | might be a bug inside the driver that allows code execution (it
         | happens all the time that some file parsing code can be tricked
         | into executing parts of the data). I'm not sure, but I guess a
         | lot of kernel memory is not fully protected by NX bits.
         | 
         | I still have the gut feeling that this incident was
         | connected to some kind of attack. Maybe a distraction from
         | another attack while everyone is busy fixing all the
         | clients. During this incident security measures were for
         | sure lowered, with lists of BitLocker keys printed out for
         | service technicians to fix the systems. Even the fix itself
         | was to remove some parts of the CrowdStrike protection. I
         | would really like to know what was inside the
         | C-00000291*.sys file before the update replaced it with all
         | zeros. Maybe it was a cleanup job to remove something
         | concerning that went wrong. But Hanlon's razor tells me not
         | to trust my gut: "Never attribute to malice that which is
         | adequately explained by stupidity."
        
           | neom wrote:
           | For what it's worth, I 10000% agree with your gut
           | feeling, and mine is a gut feeling too, so I didn't
           | mention it on HN, because we typically don't talk about
           | these types of gut feelings, given the directions they
           | become speculative in (+ the razor). But what you wrote
           | is _exactly_ what is in my head, fwiw.
        
           | milkshakes wrote:
           | Falcon absolutely has a remote code execution function
           | as a part of Falcon Response.
        
             | andix wrote:
             | So CrowdStrike has direct access to a lot of critical
             | infrastructure? LOL.
        
         | neom wrote:
         | Well, kernel agents and drivers are not uncommon, but
         | anyone doing anything at scale where something touches a
         | kernel typically understands well the system they're
         | implementing it on. That aside, I gather from skimming
         | around (so might be wrong here) that people were
         | specifically deploying this because of a business case,
         | not a technical case - I read it's mostly used to create
         | compliance (I think via shifted liability). So I think it
         | was probably too easy for this to happen, and so it
         | happened - in that someone in the bizniz dept said "if we
         | run this software we are compliant with whatever, enabling
         | XYZ multiple of new revenue, clear business case!!!" and
         | the tech people probably went "bizniz people want this,
         | bizniz case is clear, this seems like a relatively advanced
         | business who know what they're doing, it doesn't really do
         | much on my system and I'm mostly deploying it to innocuous
         | edge user systems, so seems fine _shrug_" - and then a bad
         | push happened, and lots and lots of IT departments had had
         | the same convo described above.
         | 
         | Could be wrong here, so if anyone knows better and can
         | correct me... plz do!
        
           | hello_moto wrote:
           | A lot of people, especially the non cybersecurity ones, are
           | way off the mark so you're not the only one.
        
           | lyu07282 wrote:
           | > implementing this because of a business case not a
           | technical case
           | 
           | There are some certification requirements to do
           | pentests/red teaming, and those security folk will all
           | tell them to install an EDR, so they picked CrowdStrike -
           | but the security people have a very valid technical case
           | for that recommendation.
           | 
           | It doesn't shift liability to CrowdStrike, that's not how
           | this works. In this specific case they are very likely
           | liable due to gross negligence, but that is different.
        
         | nightowl_games wrote:
         | > no data was lost
         | 
         | Data was lost in the knock on effects of this, I assure you.
         | 
         | > largest-ever ransomware attack
         | 
         | A ransomware attack would be a terrible use of this power. A
         | terrorist attack or cover while a country invades another
         | country is a more appropriate scale of potential damage here.
         | Perhaps even worse.
        
         | wellknownfakts wrote:
         | It is a well known fact that these companies, which hold
         | huge sway over the world's IT landscape, are commonly
         | infiltrated at the top levels by intelligence agents.
        
         | ChoGGi wrote:
         | "What really blew my mind about this story is learning that a
         | single company (CrowdStrike) has the power to push random
         | kernel code to a large part of the world's IT infrastructure,
         | at any time, at their will."
         | 
         | Isn't that every antivirus software and game anticheat?
        
         | SoftTalker wrote:
         | The OS vendors themselves (Microsoft, Apple, all the linux
         | distros) have this power as well via their automatic update
         | channels. As do many others who have automatically-updating
         | applications. So it's not a single company, it's many
         | companies.
        
           | Gazoche wrote:
           | That's true; I suppose it doesn't feel as bad because
           | they're much larger companies and more in the public's
           | eye. It's still scary to think about the amount of power
           | they wield.
        
         | Shorel wrote:
         | What blew my mind is that a single company has such a good
         | sales team to sell an unnecessary product to a large part of
         | the world's IT.
         | 
         | And if any part of it is necessary, then that's a failure of
         | the operating system. It should be a feature of Active
         | Directory or Windows.
         | 
         | So, great job sales team, you earned your commissions, now get
         | ready to jump ship, 'cause this one is sinking.
        
         | vimbtw wrote:
         | This is the mini existential crisis I have randomly. The
         | attack surface of a modern IT computer is mind-bogglingly
         | massive. Computers are pulling and executing code from a
         | vast array of "trusted" sources without a sandbox. If any
         | one of those "trusted" sources is compromised (package
         | managers, CDNs, OS updates, security software updates, app
         | updates in general, even specific utilities like xz), then
         | you're absolutely screwed.
         | 
         | It's hard not to be a little nihilistic about security.
        
       | flappyeagle wrote:
       | The only thing I know about crowdstrike is they hired a large
       | percentage of the underperforming engineers we fired at multiple
       | companies I've worked at
        
         | lizknope wrote:
         | https://www.zdnet.com/article/defective-mcafee-update-causes...
         | 
         | April 21, 2010
         | 
         | In 2010, McAfee caused a global IT meltdown due to a faulty
         | update. The CTO at the time was George Kurtz. Now he is the
         | CEO of CrowdStrike.
        
       | nesas wrote:
       | Nesa
        
       | taormina wrote:
       | Imagine if Microsoft sold you a secure operating system like
       | Apple does. A staggering portion of the existing
       | cybersecurity industry would be irrelevant if that ever
       | happened.
        
         | natdempk wrote:
         | Most enterprises these days also run stuff like Crowdstrike (or
         | literally Crowdstrike) on their macOS deployments. Similarly
         | Windows these days is bundled with OS-level antivirus which is
         | sufficient for non-enterprise users.
         | 
         | Not in the security industry, but my take is that basically the
         | desktop OS permissions and security model is wrong for a lot of
         | these devices, but there is no alternative that is suitable or
         | that companies are willing to invest in. Probably many of the
         | highest-profile affected machines (airport terminals, signage,
         | medical systems, etc.) should just resemble a
         | phone/iPad/Chromebook in terms of security/trust, but for
         | historical/cost/practical reasons are Windows PCs with
         | Crowdstrike.
        
           | kchr wrote:
           | CrowdStrike uses eBPF on Linux and System Extensions on
           | macOS, neither of which needs kernel-level presence.
           | Microsoft should move towards offering these kinds of
           | solutions to make AV and EDR more resilient on Windows
           | devices, without jeopardising system integrity and
           | availability.
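           | 
           | For a sense of the difference, here is a minimal eBPF
           | sketch (nothing like a real sensor, just the shape of the
           | approach) that watches process execution on Linux:
           | 
           |   /* build: clang -O2 -target bpf -c probe.c */
           |   #include <linux/bpf.h>
           |   #include <bpf/bpf_helpers.h>
           |   
           |   /* Logs every execve(). The in-kernel verifier
           |    * proves the program cannot crash or hang the
           |    * kernel before it may run - unlike a .sys
           |    * driver, which is trusted blindly. */
           |   SEC("tracepoint/syscalls/sys_enter_execve")
           |   int watch_exec(void *ctx)
           |   {
           |       char comm[16];
           |       bpf_get_current_comm(comm, sizeof(comm));
           |       bpf_printk("exec by %s", comm);
           |       return 0;
           |   }
           |   
           |   char LICENSE[] SEC("license") = "GPL";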
        
       | cybervegan wrote:
       | Boy is crowdstrike's software going to get seriously fuzz tested
       | now. All their vulns will be on public display in the next week
       | or so.
        
       | ai4ever wrote:
       | Why is OpenAI/Anthropic letting this crisis go to waste?
       | 
       | Where are the tweets from sama and amodei on how AGI is going
       | to fix these issues?
        
       | ok123456 wrote:
       | When your snake oil is poisonous.
        
       | meindnoch wrote:
       | Because it wasn't written in Rust!
        
         | qingcharles wrote:
         | https://tenor.com/view/oh-boy-here-we-go-again-oh-dear-omg-a...
        
       | kachapopopow wrote:
       | These "channel files" sound like they could be used to execute
       | arbitrary code... Would be a big embarrassment if it shows up in
       | KDU as a provider...
       | 
       | (This is just an early guess from looking at some of the csagent
       | in ida decompiler, haven't validated that all the sanity checks
       | can be bypassed as these channel files appear to have some kind
       | of signature attached to them.)
        
       | jonhohle wrote:
       | I don't run CrowdStrike and, to the best of my knowledge,
       | haven't had it installed on any of my systems (something
       | similar ran on my machine at the last corporate job I had),
       | so correct me if I'm wrong.
       | 
       | It seems great pains are made to ensure the CS driver is
       | installed first _and_ cannot be uninstalled (presumably the
       | remote monitor will notice) or tampered with (signed driver).
       | 
       | Then the driver goes and loads unsigned data files that can be
       | arbitrarily deleted by end users? Can these files also be
       | arbitrarily added by end users to get the driver to behave in
       | ways that it shouldn't? What prevents a malicious actor from
       | writing a malicious data file and starting another cascade of
       | failing machines or worse, getting kernel privileges?
        
         | mr_mitm wrote:
         | These files cannot be deleted or modified by the user, even
         | with admin privs. That would make it trivial to disable the
         | antivirus. It's only possible by mounting the file system in a
         | different OS, which is typically prevented by Bitlocker.
        
           | jonhohle wrote:
           | The files are deletable through safe mode, no? I'm assuming
           | they are writable by a program outside of the driver, right?
        
             | mr_mitm wrote:
             | Yes, but you need the Bitlocker key to get into safe mode
        
               | discostrings wrote:
               | Not in the BitLocker configurations I've seen over the
               | last few days. The file is deletable as a local
               | administrator in safe mode without the BitLocker recovery
               | key in at least some configurations.
        
       | fifteen1506 wrote:
       | Parsers, verifiers, whatever?
       | 
       | User space downloads the file.
       | 
       | User space sets up a probation dir.
       | 
       | User space requests the kernel to load the new file once.
       | 
       | After a successful boot, or after 36 hours, the file is
       | marked as safe and set to autoload.
       | 
       | Or, you know, just load it. It will be cheaper. The ROI on
       | loading it immediately is far greater, and that's what
       | counts.
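       | 
       | A minimal userland sketch of the probation scheme above, in
       | the spirit of GRUB's boot-once flag (path and names made up):
       | 
       |   #include <stdio.h>
       |   #include <unistd.h>
       |   
       |   #define PENDING "/var/lib/edr/pending.flag" /* made up */
       |   
       |   int main(void)
       |   {
       |       if (access(PENDING, F_OK) == 0) {
       |           /* flag survived a reboot: the last trial
       |            * load never completed - quarantine it */
       |           puts("trial failed; keeping known-good set");
       |           return 1;
       |       }
       |       FILE *f = fopen(PENDING, "w");
       |       if (!f) return 2;
       |       fclose(f); /* arm the trial marker */
       |       /* ...ask the kernel to load the new file once;
       |        * if the box bluescreens we never get here... */
       |       unlink(PENDING); /* survived: mark the file safe */
       |       return 0;
       |   }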
        
       | apatheticonion wrote:
       | One thing I am surprised no one has been discussing is the
       | role Microsoft played in this, and how they set the stage for
       | the CrowdStrike outage through a lack of incentive (profit,
       | competition) to make Windows resilient to this sort of
       | situation.
       | 
       | While they were not directly responsible for the bug that
       | caused the crashes, Microsoft does hold an effective monopoly
       | position over the workstation computing space (I'd consider
       | it infrastructure at this point) and therefore has a duty of
       | care to ensure the security, reliability, and capabilities of
       | their product.
       | 
       | Without competition, Microsoft has been asleep at the wheel
       | on innovations to Windows - some of which could have
       | prevented this outage.
       | 
       | For example: CrowdStrike runs in user space on macOS and
       | Linux - does Windows not provide the capabilities needed to
       | run CrowdStrike in user space?
       | 
       | What about innovations in application sandboxing, which could
       | mitigate the need for the level of control CrowdStrike
       | requires?
       | 
       | The fact is, Microsoft is largely uncontested in holding the
       | keys to the world's computing infrastructure, and they have
       | virtually no oversight.
       | 
       | Windows has fallen from making over 80% of Microsoft's
       | revenue to 10% today - there is nothing wrong with being a
       | private company chasing money - but when your product is
       | critical to the operation of hospitals, airlines, and
       | critical infrastructure, you can't be out there tickling your
       | undercarriage with AI assistants and advertisements to
       | increase the product's profitability.
       | 
       | IMO Microsoft has dropped the ball on its duty of care to
       | consumers, and CrowdStrike is a symptom of that. Governments
       | need to seriously consider encouraging competition in the
       | desktop workspace market. That, or regulate Microsoft's
       | Windows product.
        
       ___________________________________________________________________
       (page generated 2024-07-21 23:03 UTC)