[HN Gopher] Initial details about why CrowdStrike's CSAgent.sys ...
___________________________________________________________________
Initial details about why CrowdStrike's CSAgent.sys crashed
Author : pilfered
Score : 466 points
Date : 2024-07-21 00:17 UTC (22 hours ago)
(HTM) web link (twitter.com)
(TXT) w3m dump (twitter.com)
| blirio wrote:
| So is unmapped address another way of saying null pointer?
| two_handfuls wrote:
| It's an invalid pointer, yes, but it doesn't say whether it's
| null specifically.
| blirio wrote:
| Oh wait, I just remembered null is normally 0 in C and C++.
| So it's probably not that if the address is not 0.
| taspeotis wrote:
| What? If you have a null pointer to a class and try to
| reference the member that starts 156 bytes from the start
| of the class, you'll dereference 0x9c (0 + 156).
| emmelaich wrote:
| Strangely, not necessarily on every implementation on
| every processor.
|
| It's not guaranteed that NULL is 0.
|
| Still, I don't think you'd find a counterexample in the
| wild these days.
| chongli wrote:
| NULL isn't always the integer 0 in C. It's implementation-
| defined.
| loeg wrote:
| In every real world implementation anyone cares about,
| it's zero. Also I believe it is defined to compare equal
| to zero in the standard, but don't quote me on that.
| tzs wrote:
| > Also I believe it is defined to compare equal to zero
| in the standard, but don't quote me on that.
|
| That's true for the literal constant 0. For 0 in a
| variable it is not necessarily true. Basically when a
| literal 0 is assigned to a pointer or compared to a
| pointer the compiler takes that 0 to mean whatever bit
| pattern represents the null pointer on the target system.
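|
| A minimal C sketch of that distinction (assuming a
| hypothetical platform where the null pointer is not
| all-zero bits):
|
|     #include <stdint.h>
|
|     void example(void) {
|         int *p = 0;        /* 0 here is a null pointer constant */
|         if (p == 0) { }    /* always true: compares against null */
|
|         intptr_t z = 0;
|         int *q = (int *)z; /* converting a runtime zero is
|                               implementation-defined; q need not
|                               be a null pointer */
|         (void)q;
|     }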
| cmpxchg8b wrote:
| If you have a page mapped at address 0, accessing address 0
| is valid.
| cratermoon wrote:
| Looks like a null pointer error to me
| https://www.youtube.com/watch?v=pCxvyIx922A
| jeffbee wrote:
| "Attempt to read from address 0x9c" doesn't strike me as
| "null pointer". It's an invalid address and it doesn't
| really matter if it was null or not.
| GeneralMayhem wrote:
| 0x9c (156 dec) is still a very small number, all things
| considered. To me that sounds like attempting to access
| an offset from null - for instance, using a null pointer
| to a struct type, and trying to access one of its member
| fields.
| Aloisius wrote:
| Could just as easily be accessing an uninitialized
| pointer, especially given there is a null check
| immediately before.
| Dwedit wrote:
| 9C means that it's a NULL address plus some offset of 9C.
| Like a particular field of a struct.
| loeg wrote:
| It is pretty common for null pointers to structures to
| have members dereferenced at small offsets, and people
| usually consider those null dereferences despite not
| literally being 0. (However, the assembly generated in
| this case does not match that access pattern, and in fact
| there was an explicit null check before the dereference.)
| jmb99 wrote:
| As an example to illustrate the sibling comments'
| explanations:
|
|     char *array = NULL;
|
|     int pos = 0x9C;
|
|     char a = array[pos]; // equivalent to *(array + 0x9C):
|                          // dereferencing NULL+0x9C, i.e. 0x9C
|
| This will segfault (or equivalent) due to reading invalid
| memory at address 0x9C. Most people would call array[pos]
| a null pointer dereference casually, even though it's
| actually a 0x9C pointer dereference, because there's very
| little effective difference between them.
|
| Now, whether this case was actually something like this
| (dereferencing some element of a null array pointer) or
| something like type confusion (value 0x9C was supposed to
| be loaded into an int, or char, or some other non-pointer
| type) isn't clear to me. But I haven't dug into it
| really, someone smarter than me could probably figure out
| which it is.
| UncleMeat wrote:
| Except we don't see the instructions you'd expect to see
| if the code was as you describe.
|
| https://x.com/taviso/status/1814762302337654829
| jeffbee wrote:
| What we are witnessing quite starkly in this thread is
| that the majority of HN commenters are the kinds of
| people exposed to anti-woke/DEI culture warriors on
| Twitter.
| stravant wrote:
| Such an invalid access of a very small address probably
| does result from a nullptr error:
|     struct BigObject {
|         char stuff[0x9c]; // random fields
|         int field;
|     };
|
|     BigObject* object = nullptr;
|     printf("%d", object->field);
|
| That will result in "Attempt to read from address 0x9c".
| Just because it's not trying to read from literal address
| 0x0 doesn't mean it's not nullptr error.
| phire wrote:
| Probably not.
|
| R8 is 0x9c in that example, which is somewhat typical for
| null+offset, but in the twitter thread it's
| 0xffff9c8e0000008a.
|
| So the actual bug is further back. It's not a null pointer
| dereference, but it somehow results in the mov r8,
| [rax+r11*8] instruction reading random data (could be
| anything) into r8, which then gets used as a pointer.
|
| Maybe this is a use-after-free?
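|
| A minimal C sketch of that failure pattern (illustrative
| only, not the actual CSAgent.sys logic): an out-of-range
| index into a pointer table loads garbage bytes, which are
| then dereferenced - roughly mov r8, [rax+r11*8] followed by
| a read through r8.
|
|     #include <stddef.h>
|     #include <stdint.h>
|
|     uint8_t read_entry(void **table, size_t idx) {
|         void *p = table[idx];  /* idx out of range: p is garbage */
|         return *(uint8_t *)p;  /* faults at an arbitrary address */
|     }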
| saagarjha wrote:
| It seems unlikely that it's a null pointer:
| https://twitter.com/taviso/status/1814762302337654829
| leeter wrote:
| No, this is kernelspace, and so while all addresses are
| 'virtual', an unmapped address is an address that hasn't been
| mapped in the page tables. Normally critical kernel drivers and
| data are marked as non-pageable (note: the Linux kernel doesn't
| page its own memory; the NT kernel does, a legacy of when it
| was first written and of the memory constraints of the time).
| So if a driver needs to access pageable data it must not be
| part of the storage flow (and CrowdStrike is almost certainly
| part of it), and it must be at the correct IRQL (the interrupt
| priority level; anything above dispatch, AKA the scheduler, has
| severe restrictions on what can happen there).
|
| So no, an unmapped address is a completely different BSOD,
| usually PAGE_FAULT_IN_NONPAGED_AREA, which is a very bad sign.
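|
| A minimal WDK-style sketch of that IRQL rule (illustrative,
| not CrowdStrike's code):
|
|     #include <ntddk.h>
|
|     VOID TouchPageable(PUCHAR PageableBuffer) {
|         if (KeGetCurrentIrql() < DISPATCH_LEVEL) {
|             /* Safe: a page fault here can be serviced. */
|             volatile UCHAR b = *PageableBuffer;
|             (void)b;
|         }
|         /* At DISPATCH_LEVEL or above, touching pageable memory
|            risks an IRQL_NOT_LESS_OR_EQUAL bugcheck instead. */
|     }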
| jkrejcha wrote:
| PAGE_FAULT_IN_NONPAGED_AREA[1]... was the BSOD that occurred
| in this case. That's basically the first sign that it was a
| bad pointer dereference in the first place.
|
| (DRIVER_)IRQL_NOT_LESS_OR_EQUAL[2][3] is not this case, but
| it's probably one of the most common reasons drivers crash
| the system generally. Like you said, it's basically attempting
| to access pageable memory at a time when paging isn't allowed
| (i.e. when at DISPATCH_LEVEL or higher).
|
| [1]: https://learn.microsoft.com/en-us/windows-
| hardware/drivers/d...
|
| [2]: https://learn.microsoft.com/en-us/windows-
| hardware/drivers/d...
|
| [3]: https://learn.microsoft.com/en-us/windows-
| hardware/drivers/d...
| loeg wrote:
| No; lots of virtual addresses are not mapped. Null is a subset
| of all unmapped addresses.
| qmarchi wrote:
| Meta conversation: X has a "Show Probable Spam" filter, and yet
| both of the hidden responses were pretty valid, with one even
| getting a reply from the thread's author.
|
| I just don't understand how they still have users.
| honeybadger1 wrote:
| I believe that depends on your account settings. As an example,
| I block all comments from accounts that don't have a verified
| phone number, and those get dropped into that section.
| fireflies_ wrote:
| > I just don't understand how they still have users.
|
| Because this post is here and not somewhere else. Strong
| network effects.
| hipadev23 wrote:
| There's literally not a better alternative and nobody seems to
| be earnestly trying to fill that gap. Threads is boomer chat
| with an instagram requirement. Every Mastodon instance is slow
| beyond reason and it's still confusing to regular users in
| terms of how it works. And is Bluesky still invite only?
| Honestly haven't heard about it in a long time.
| honeybadger1 wrote:
| It is the best internet social feed to me as well. I use Pro
| a lot for following different communities and there is
| nothing that comes close today to being on the edge of
| change online.
| ric2b wrote:
| Mastodon doesn't feel any slower to me than Twitter, maybe I
| got lucky, according to you?
| MBCook wrote:
| Same. I have no issues at all on Mastodon. I'm quite happy
| with it.
| r2vcap wrote:
| Maybe the experience varies depending on where the user is
| located. Users near Mastodon servers (possibly on the US
| East or West Coast) may not feel the slowness as much as
| users in other parts of the world. I see noticeably
| slower response times when I use Mastodon from my location
| (Korea).
| robjan wrote:
| I think a lot of people use Hetzner. I notice slowness,
| especially with media, in Hong Kong. A workaround I've
| found is to use VPNs, which seem to utilise networks with
| better peering to local ISPs.
| cageface wrote:
| All the people I know that are still active on Twitter
| because they need to be "informed" are constantly sending me
| alarmist "news" that breaks on Twitter that, far more often
| than not, turns out to be wrong.
| lutoma wrote:
| > Every Mastodon instance is slow beyond reason and it's
| still confusing to regular users in terms of how it works.
|
| I'll concede the confusing part but all the major Mastodon
| servers I interact with regularly are pretty quick so I'm not
| sure where that part comes from.
| Lt_Riza_Hawkeye wrote:
| It is not so bad with Mastodon but much fedi software gets
| slower the longer it's been running. "Akkoma Rot" is the
| one that's typically most talked about but the universe of
| misskey forks experiences the same problems, and Mastodon
| can sometimes absolutely crunch to a halt on 4GB of ram
| even for a single user instance.
| add-sub-mul-div wrote:
| > And is Bluesky still invite only?
|
| Not since February. But it's for the best that the Eternal
| September has remained quarantined on Twitter.
| TechSquidTV wrote:
| Mastodon is a PERFECT replacement. But it'll never win, because
| there isn't a business propping it up and there is inherent
| complexity, mixed with the biggest problem: cost.
|
| No one wants to pay for anything, and that's the true root of
| every issue around this. People complain YouTube has ads, but
| won't buy Premium. People hate Elon and Twitter but won't take
| even an ounce of temporary inconvenience to try and solve it.
|
| Threads exists, and I'm happy they integrate with ActivityPub,
| which should give us the best of both worlds. Why don't
| people use Threads? It's a little more popular outside the US,
| but personally, I think the "algorithm" pushes a lot of
| engagement-bait nonsense.
| doodlebugging wrote:
| > No one wants to pay for anything, and that's the true root
| of every issue around this. People complain YouTube has
| ads, but won't buy Premium.
|
| Perhaps if buying into a service guaranteed that they would
| not be sold out then there would be more engagement. When
| someone signs up it is pretty much a rock-hard guarantee
| that their personal information will be marketed and sold
| to any entity with the money and interest to buy it -
| paying customers, free-loaders, etc.
|
| When someone chooses to buy your app or SaaS then they
| should be excluded from the list of users that you sell or
| trade between "business partners".
|
| When paying for a service still means that all details of your
| engagement with that service are sold to unrelated business
| entities, you have a disincentive to pay.
|
| People are wising up to all this PII harvesting and those
| clowns who sold everyone out need to find a different model
| or quit bitching when real people choose to avoid their
| "services" since most of these things are not necessary for
| people to enjoy life anyway. They are distractions.
|
| EDIT: This is not intended as a personal attack on you but
| is instead a general observation from the perspective of
| someone who does not use or pay for any apps or SaaS
| services and who actively avoids handing out accurate
| personal information when the opportunity arises.
| jnurmine wrote:
| Mastodon - mixed feelings.
|
| In my experience, Mastodon is nice until you want to
| partake in discussions. To do so, you need an account.
|
| With an account you can engage in civilized discussions.
| Some people don't agree with you, and you don't agree with
| some people. That's fine, maybe you'll learn something new.
| It's a discussion.
|
| And then, suddenly, a secret court convenes and kills your
| account just like that; no reason will be given, no
| recourse will be available, admins won't reply, and you can
| do two things: go away for good, or try again on a
| different server.
|
| I'm happy with a read-only Mastodon via a web interface.
|
| But read-write? Never again, I probably don't have the
| correct ideology for it.
| fragmede wrote:
| > Threads is boomer chat with an instagram requirement.
|
| You're being too dismissive of Threads. It's fine, there are
| adults there.
|
| What weirdo doesn't have an insta?
| macintux wrote:
| Some of us stay far, far away from Facebook.
| II2II wrote:
| _raises hand_
|
| Some people don't jump on every fad out there. Most of the
| people who miss out on fads quickly realize that they
| aren't losing out on much simply because fads are so
| ephemeral. As far as I can tell, this is normal (though
| different people will come to that realization at different
| stages of their life).
| fragmede wrote:
| Facebook is going to run Threads for as long as it wants;
| time will tell if it's a fad or not. Is ChatGPT a fad?
| II2II wrote:
| While a fad (in this context) depends upon a company
| maintaining a product, the act of maintaining a product
| is not a measure of how long the fad lasts. Take
| Facebook, the product. I'm fairly certain that it is long
| past its peak as a communications tool between family,
| friends, and colleagues. Facebook, the company, remains
| relevant for other reasons.
|
| As for ChatGPT, I'm sure time will prove it is a fad.
| That doesn't mean that LLMs are a fad (though it is too
| early to tell).
| zdragnar wrote:
| I don't have any social media of any kind, unless you count
| HN.
|
| My wife only uses Facebook, and even then pretty sparingly.
| shzhdbi09gv8ioi wrote:
| I never had insta. Why would anyone use that.
| mardifoufs wrote:
| Sadly enough the "average" instagram user doesn't use
| threads. It's just a weird subset of them that use it, and
| imo it's not the subset that makes Instagram great lol.
| (It's a lot of pre 2021 twitter refugees, and that's an
| incredibly obnoxious and self centered crowd in my
| experience)
| shzhdbi09gv8ioi wrote:
| Strange take... Mastodon is where a lot of the IT discussion
| happens these days.
|
| The quality vs crap ratio is stellar on Mastodon. Not so much
| anywhere else.
| ants_everywhere wrote:
| Relatedly, it's crazy to me how many people still get their
| news from X. I mean serious people, not just Joe Schmoe.
|
| The probable spam thing was nuts to me too. My guess was it's
| maybe trying to detect users with lower engagement. Like people
| who aren't moving the investigation forward but are trying to
| follow it and be in the discussion.
| pyinstallwoes wrote:
| Relatedly, it's crazy to me how many people still get news
| from the Sunday times!
| jen729w wrote:
| Relatedly, it's crazy to me how many people still read the
| news!
| AnthonyMouse wrote:
| One of the things to keep in mind is that Twitter had most of
| these misfeatures before Musk bought it.
|
| The basic problem is, no moderation results in a deluge of
| spam and algorithmic moderation is hot garbage that can only
| filter out the bulk of the spam by also filtering out like
| half of the legitimate comments. Human moderation is
| prohibitively expensive unless you want to hire Mechanical
| Turk-level moderators and not give them enough time to do a
| good job, in which case you're back to hot garbage.
|
| Nobody really knows how to solve it outside of the knob
| everybody knows about that can improve the false negative
| rate at the expense of the false positive rate or vice versa.
| Do you want less ham or more spam?
| ants_everywhere wrote:
| I agree the problem is hard from a technical level.
|
| The problem is also getting significantly worse because
| it's trivial to generate entire pages of inorganic content
| with LLMs.
|
| The backstories of inorganic accounts are also much more
| convincing now that they can be generated by LLMs. Before
| LLMs, backstories all focused on a small handful of topics
| (e.g. sports, games) because humans had to generate them
| from playbooks of best practices. Now they can be into
| almost anything.
| pyinstallwoes wrote:
| If you can't tell, is it spam?
| ungreased0675 wrote:
| When something big happens, Twitter is probably the best
| place to get real time information from people on location.
|
| Most everything else goes through a filter and pasteurization
| before public consumption.
| dclowd9901 wrote:
| I had to log in to see responses. Pretty sure that's how they
| still have users.
| pyinstallwoes wrote:
| How's that logic work when the platform depends upon content?
| Jimmc414 wrote:
| I use X solely for the AI discussions and I actively curate who
| I follow, but where is there a better platform to join in
| conversations with the top 500 people in a particular field?
|
| I always assumed that the reason legit answers often fall under
| "Show probable spam" is because of the inevitable reports
| coming in on controversial topics. It seems like the community
| notes feature works well most of the time.
| wrycoder wrote:
| When I see that, I usually upvote it.
| mardifoufs wrote:
| If bad spam detection was such a big issue for a social
| platform, YouTube wouldn't be used by anyone ;). In fact it's
| even worse on YouTube, it's the same pattern of accounts with
| weird profile pictures copy pasting an existing comment as is
| and posting it, for thousands of videos, and it's been going on
| for a year now. It's actually so basic that I really wonder if
| there's some other secret sauce to those bots to make them
| undetectable.
| omoikane wrote:
| Well if it's just the comments, I think a lot of people just
| don't read those. In fact, it's a fair bit of effort just to
| read the descriptions with the YouTube app on some devices
| (e.g. smart TVs), and it's really not worth the effort to
| read the comments when users can just move on to the next
| video.
| mardifoufs wrote:
| I don't necessarily think that's true anymore. YouTube
| comments are important to the algorithm, so creators are
| more and more active in the comment section, and the
| comments in general have become a lot more alive and often
| add a lot of context or info for some types of videos.
| YouTube has also started giving the comments a lot more
| visibility in the layout (more than, say, the video
| description). But you're probably right w.r.t. platforms
| like TVs.
|
| Before this wave of insane bot spam, the comments had
| started to be so much better than what they used to be
| (low-effort boomer spam). In fact I think they were much
| better than the absolute cringy mess that comments on
| dedicated forums like Reddit turned into.
| ascorbic wrote:
| I'd go so far as to say that almost all responses I see under
| "probable spam" are legitimate. Meanwhile real spam is
| everywhere in replies, and most ads are dropshipped crap and
| crypto scams with community notes. It's far worse than it's
| ever been before.
| js2 wrote:
| https://threadreaderapp.com/thread/1814343502886477857.html
| MBCook wrote:
| https://twitter-thread.com/t/1814343502886477857
| Fr0styMatt88 wrote:
| The scarier thought I've had -- if a black hat had discovered
| this crash case, could it have been turned into a widely deployed
| code execution vulnerability?
| MBCook wrote:
| I had that same one. If loading a file crashed the kernel
| module, could it have been exploitable? Or was there a
| different exploitable bug in there?
|
| Did any nation states/other groups have 0-days on this?
|
| Did this event reveal something known to the public, or did
| this screw up accidentally protect us from someone finding +
| exploiting this in the future?
| plorkyeran wrote:
| Shockingly it turns out that installing a rootkit can have some
| negative security implications.
| llm_trw wrote:
| Trying to explain to execs that giving someone root access to
| your computers means they have root access to your computers
| is surprisingly difficult.
| tonetegeatinst wrote:
| I mean, kernel-level access does provide features not
| accessible in userspace. Is it also overused when other
| solutions exist? You bet.
|
| Most people don't need this stuff. Just keep things up to
| date (no, not on the nightly build branch, but, like,
| installing Windows updates at least a day or two after they
| come out), or maybe run regular antivirus scans.
|
| But let's be honest, your kernel drivers are useless if
| your employees fall for phishing or social engineering. Then
| it's not malware, it's an authorized user on the
| system... just copying data onto a USB drive, or a rogue
| employee taking your customer list to your competition.
| That fancy-pants kernel driver might be really good at
| stopping sophisticated threats, and I'm sure the marketing
| majors at any company cram products full of buzzwords. But
| remember, you can't fix incompetent or malicious employees
| unless you're taking steps to prevent it.
|
| What's more likely: some foreign government hacking Kohl's?
| Or a script kiddie social-engineering some poor worker by
| pretending to be the support desk?
|
| Not here to shit on this product, it has its place and it
| obviously does a good job... (heard it's expensive, but most
| XDR/EDR is).
|
| Seems like we are learning how vulnerable certain things
| are once again. As a fellow security person, I must say
| that Jia Tan must be so envious that he couldn't have this
| level of market impact.
| rdtsc wrote:
| Start a story for them: "and then, the hackers managed to
| install a rootkit which runs in kernel mode. The rootkit
| has a sophisticated C2 mechanism, with configuration files
| pretending to be drivers suffixed with .sys extensions. And
| then, they used that to prevent hospitals and 911 systems
| around the world from working, resulting in delayed
| emergency responses, injuries, possibly deaths".
|
| After they cuss the hackers under their breath exclaiming
| something like: "they should be locked up in jail for the
| rest of their lives!...", tell them that's exactly what
| happened, but CS were the hackers, and maybe they should
| reconsider mandating installing that crap everywhere.
| naveen99 wrote:
| The hard part is the deployment. Yes, if you can get control of
| the CrowdStrike deployment machinery, you can do whatever you
| want on hundreds of millions of machines. But you don't need
| any vulnerabilities in the CrowdStrike-deployed software for
| that, only in the deployment servers.
| tranceylc wrote:
| Call me crazy but that is a real worry for me, and has been
| for a while. How long until we see some large corporate
| software have their deployment process hijacked, and have it
| affect a ton of computers that auto-update?
| spydum wrote:
| I mean, isn't that roughly the SolarWinds story? There is
| no real shortage of supply chain incidents in the last few
| years. The reality is we are all mostly okay with that
| tradeoff.
| jen20 wrote:
| Around -4 years? [1]
|
| [1]: https://en.wikipedia.org/wiki/2020_United_States_feder
| al_gov...
| alsodumb wrote:
| You mean like the SolarWinds hack that happened a lil while
| ago?
|
| https://www.techtarget.com/whatis/feature/SolarWinds-hack-
| ex...
| btown wrote:
| One of the most dangerous versions of this IMO is someone
| who compromises a NPM/Pypi package that's widely used as a
| dependency. If you can make it so that the original
| developer doesn't know you've compromised their accounts
| (spear-phished SIM swap + email compromise while the target
| is traveling, for instance, or simply compromising the
| developer themselves), you don't need every downstream user
| to manually update - you just need enough projects that
| aren't properly configured with lockfiles, and you've got
| code execution on a huge number of servers.
|
| I'm hopeful that the fallout from Crowdstrike will be a
| larger emphasis on software BOM risk - when your systems
| regularly phone home for updates, you're at the mercy of
| the weakest link in that chain, and that applies to CI/CD
| and end user devices alike.
| IncreasePosts wrote:
| It makes me wonder how many core software libraries to
| modern infrastructure could be compromised by merely
| threatening a single person.
| jmb99 wrote:
| As always, a relevant xkcd[1]. I would not be surprised
| if the answer to "how many machines can be compromised in
| 24 hours by threatening one person" was less than 8
| figures. If you can find the right person, probably 9+.
|
| [1] https://xkcd.com/2347/
| leni536 wrote:
| Just compromise one popular vim plugin and you have dev
| access to half of the industry.
| inferiorhuman wrote:
| > _if you can get control of the crowdstrike deployment
| machinery_
|
| Or combine a lack of certificate pinning with BGP hijacking.
| Murky3515 wrote:
| Probably would've been used to mine bitcoin before it was
| patched.
| phire wrote:
| No.
|
| To trigger the crash, you need to write a bad file into
| C:\Windows\System32\drivers\CrowdStrike\
|
| You need Administrator permissions to write a file there, which
| means you already have code execution permissions, and don't
| need an exploit.
|
| The only people who can trigger it over network are CrowdStrike
| themselves... Or a malicious entity inside their system who
| controls both their update signing keys, and the update
| endpoint.
| cyrnel wrote:
| Anyone know if the updates use outbound HTTPS requests? If
| so, those companies that have crappy TLS terminating outbound
| proxies are looking juicy. And if they aren't pinning certs
| or using CAA, I'm sure a $5 wrench[1] could convince one of
| the lesser certificate authorities to sign a cert for
| whatever domain they're using.
|
| [1]: https://xkcd.com/538/
| phire wrote:
| The update files are almost certainly signed.
|
| Even if the HTTPS channel is compromised with a man-in-the-
| middle attack, the attacker shouldn't be able to craft a
| valid update, unless they also compromised CrowdStrike's
| keys.
|
| However, the fact that this update apparently managed to
| bypass any internal testing or staging release channels
| makes me question how good CrowdStrike's procedures are
| about securing those update keys.
| cyrnel wrote:
| Depends when/how the signature is checked. I could
| imagine a signature being embedded in the file itself, or
| the file could be partially parsed before the signature
| is checked.
|
| It's wild to me that it's so normal to install software
| like this on critical infrastructure, but the question of
| how they do code signing is a closely guarded/obfuscated
| secret.
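|
| A sketch of the ordering that matters here (verify_signature
| and parse_config are hypothetical stand-ins, not
| CrowdStrike's API):
|
|     #include <stdbool.h>
|     #include <stddef.h>
|
|     bool verify_signature(const unsigned char *buf, size_t len);
|     bool parse_config(const unsigned char *buf, size_t len);
|
|     bool load_update(const unsigned char *buf, size_t len) {
|         if (!verify_signature(buf, len)) /* authenticate the raw */
|             return false;                /* bytes before parsing */
|         return parse_config(buf, len);   /* parse only after     */
|     }
|
| The risky pattern is the reverse: parsing (even partially)
| before the signature check, so malformed or attacker-crafted
| bytes reach the parser unauthenticated.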
| jmb99 wrote:
| Kind of a side tangent, but I'm currently (begrudgingly)
| working on a project with a Fortune 20 company that
| involves a complicated mess of PKI management, custom
| (read: non-standard) certificates, a variety of
| management/logging/debugging keys, and (critically) code
| signing. It's taken me months of pulling teeth just to
| get details about the hierarchy and how the PKI is
| supposed to work from my own coworkers in a different
| department (who are in charge of the project), let alone
| from the client. I still have absolutely 0 idea how they
| perform code signing, how it's validated, or how I can
| test that the non-standard certificates can validate this
| black-hole-box code signing process. So yeah, companies
| really don't like sharing details about code signing.
| phire wrote:
| Sure, it's certainly possible.
|
| Though, I prefer to give people the benefit of the doubt
| for this type of thing. IMO, the level of incompetence
| required to parse a binary file before checking the
| signature is significantly higher (or at least different)
| than simply pushing out a bad update (even if the latter
| produces a much more spectacular result).
|
| Besides, we don't need to speculate. We have the driver.
| We have the signature files [1]. Because of the
| publicity, I bet thousands of people are throwing it into
| binary RE tools right now, and if CrowdStrike is doing
| something as stupid as parsing a binary file before
| checking its signature (or not checking a signature at
| all), I'm sure we will hear about it.
|
| We can't see how it was signed, because that happens on
| CrowdStrike's infrastructure, but checking the signature
| verification code is trivial.
|
| [1] Both in this zip file: https://drive.google.com/file/
| d/1OVIWLDMN9xzYv8L391V1ob2ghp8...
| emmelaich wrote:
| See my speculation above.
|
| https://news.ycombinator.com/item?id=41022110
| gruez wrote:
| that's assuming they don't do cert pinning. Moreover
| despite all the evil things you can supposedly do with a $5
| wrench, I'm not aware of any documented cases of this sort
| of attack happening. The closest we've seen are
| misissuances seemingly caused by buggy code.
| emmelaich wrote:
| My speculation is the bit of code/data that was broken, is
| added after the build and testing _precisely to avoid_ the
| $5 wrench attack.
|
| That is, the data is signed and they don't want to use the
| real signing key during testing / in the continuous build
| because then it is too exposed.
|
| So it's added after as something that "could not break".
| But it of course did.
| phire wrote:
| I can think of a bunch of different answers:
|
| This wasn't a code update, just a configuration update.
| Maybe they don't put config updates through QA at all,
| assuming they are safe.
|
| It's possible that QA is different enough from production
| (for example debug builds, or signature checking
| disabled) that it didn't detect this bug.
|
| Might be an ordering issue, and that they tested applying
| update A then update B, but pushed out update B first.
|
| The fact that it instantly went out to all channels is
| interesting. Maybe they tested it for the beta channel it
| was meant for (and it worked, because that version of the
| driver knew how to cope with that config) but then
| accidentally pushed it out to all channels, and the older
| versions had no idea what to do with it.
|
| Or maybe they thought they were only sending it to their
| QA systems but pushed the wrong button and sent it out
| everywhere.
| emmelaich wrote:
| > _This wasn't a code update, just a configuration
| update_
|
| Configuration is data, data is code.
| Animats wrote:
| How does it validate the updates, exactly?
|
| Microsoft supposedly has source IP addresses known by their
| update clients, so that DNS spoofing won't work.
| FreakLegion wrote:
| Microsoft signs its updates. There's no restriction on
| where you can get them from.
| ffhhj wrote:
| Microsoft has previously leaked their keys.
| FreakLegion wrote:
| Not that I recall.
|
| Microsoft has leaked keys that weren't used for code
| signing. I've been on the receiving end of this actually,
| when someone from the Microsoft Active Protections
| Program accidentally sent me the program's email private
| key.
|
| Microsoft has been tricked into signing bad code
| themselves, just like Apple, Google, and everyone else
| who does centralized review and signing.
|
| Microsoft has had certificates forged, basically, through
| MD5 collisions. Trail of Bits did a good write-up of this
| years ago.
|
| But I can't think of a case of Microsoft losing control
| of a code signing key. What are you referring to?
| Randor wrote:
| As a former member of the Windows Update software
| engineering team, I can say this is absolutely false. The
| updates are signed.
| Animats wrote:
| I know they are signed. But is that enough?
|
| Attackers today may be willing to spend a few million
| dollars to access those keys.
| jackjeff wrote:
| If you have a privilege escalation vulnerability, there are
| worse things you can do. Just make the system unbootable by
| destroying the boot sector/EFI partition and overwriting
| system files. No more rebooting into safe mode and no more
| deleting a single file to fix the boot.
|
| This would probably be classified as a terrorist attack, and
| frankly it's just a matter of time until we get one some day.
| A small dedicated team could pull it off. It just so
| happens that the people with the skills currently either opt
| for cyber criminality (crypto lockers and such), work for a
| state actor (think Stuxnet), or play defense in a cyber
| security firm.
| canistel wrote:
| Out of curiosity: in the old days, SoftICE could have been used,
| which was a kernel-mode debugger. What tool can be used these
| days?
| mauvehaus wrote:
| SoftICE predates me, but when I was doing filesystem filter
| driver work, the tool of choice was WinDbg. I've been out of the
| trade for a bit, but it looks to still be in use. We had it set
| up between a couple of VMs on VMware.
| Dwedit wrote:
| You'd use WinDBG today. It allows you to do remote kernel
| debugging over a network. This also includes running Windows in
| a virtual machine, and debugging it through the private network
| connection.
| gonesilent wrote:
| FireWire is also still used to dump kernel debug output.
| the8472 wrote:
| Shouldn't IOMMUs block that these days?
| swdunlop wrote:
| https://qemu-project.gitlab.io/qemu/system/gdb.html
| golemiprague wrote:
| But how come they didn't catch it in the testing deployments?
| What was the difference that caused it to happen only when they
| deployed to the outside world? I find it hard to believe that
| they didn't test it before deployment. I also think companies
| should all have a testing environment before deploying 3rd-
| party components. I mean, we all install some packages during
| development that fail or cause some problems, but nobody thinks
| it is a good idea to do that directly in their production
| environment before testing, so how is this different?
| someonehere wrote:
| That's what a lot of us are wondering. There's a lot of
| outside-the-box thinking about this right now in certain
| circles.
| IAmGraydon wrote:
| There's no point in leaving vague allusions. Can you expand
| on this?
| kbar13 wrote:
| security industry's favorite language is nothingspeak
| jmb99 wrote:
| > I find it hard to believe that they didn't test it before
| deployment.
|
| I'm not sure why you find that hard to believe - based on the
| (admittedly fairly limited) evidence we have right now, it's
| highly unlikely that this deployment was tested much, if at
| all. It seems much more likely to me that they were playing
| fast and loose with definition updates to meet some arbitrary
| SLAs[1] on zero-day prevention, and it finally caught up with
| them. Much more likely than every single real-world PC
| running their software somehow being affected while their
| test machines were all impervious.
|
| [1] When my company was considering getting into endpoint
| security and network anomaly detection, we were required on
| multiple occasions by multiple potential clients to provide a
| 4-hour SLA on a wide number of CVE types and severities. That
| would mean 24/7 on-call security engineers and a sub-4-hour
| definition creation and deployment. Yes, that 4 hours was for
| the deployment being available on 100% of the targets. Good
| luck writing and deploying a high-quality definition for a zero
| day in 4 hours, let alone running it through a test pipeline,
| let alone writing new tests to actually cover it. We very
| quickly noped out of the space, because that was considered
| "normal" (at least to the potential clients we were
| discussing). It wouldn't shock me if CS was working in roughly
| the same way here.
| drooopy wrote:
| This whole f*up was a failure of management and processes at
| Crowdstrike. "Intern Steve" pushing faulty code to production
| on a Friday is only a couple of cm of the tip of an enormous
| iceberg.
| chronid wrote:
| I wrote this in another thread already, but the fuck-up was
| both at CrowdStrike (they borked a release) but _also_, and
| more importantly, at their customers. Shit happens even with
| the best testing in the world.
|
| You do not deploy _anything_, _ever_, on your entire
| production fleet at the same time, and you do not buy
| software that does that. It's madness, and we're not
| talking about small companies with tiny IT departments
| here.
| perbu wrote:
| Shit might happen with the best testing, but with decent
| testing it would not be this serious.
| wazzaps wrote:
| Apparently CrowdStrike bypassed clients' staging areas
| with this update.
|
| Source:
| https://x.com/patrickwardle/status/1814367918425079934
| owl57 wrote:
| _> you do not buy software that does that_
|
| Note how the incident disproportionally affected highly
| regulated industries, where businesses don't have a
| choice to screw "best practice".
| TeMPOraL wrote:
| Only highlighting that "best practice" of cybersecurity
| is, charitably, total bullshit; less charitably, a
| racket. This is apparent if you look at the costs to the
| day-to-day ability of employees to do work, but maybe
| it'll be more apparent now that people got killed because
| of it.
| badgersnake wrote:
| It's absolutely a racket.
| d1sxeyes wrote:
| That's a tricky one. CrowdStrike is cybersecurity. Wait
| until the first customer complains that they were hit by
| WannaCry v2 because CrowdStrike wanted to wait a few days
| after they updated a canary fleet.
|
| The problem here is that this type of update (a content
| update) should _never_ be able to cause this however
| badly it goes. In case the software receives a bad
| content update, it should fail back to the last known
| good content update (potentially with a warning fired off
| to CS, the user, or someone else about the failed
| update).
|
| In principle, updates that _could_ go wrong and cause
| this kind of issue should absolutely be deployed slowly,
| but per my understanding, that's already the practice for
| non-content updates at CrowdStrike.
| chronid wrote:
| Windows updates are also cybersecurity, but the customer
| has (had?) a choice to how to roll those out (with Intune
| nowadays?). The customer should decide when to update,
| they own the fleet not the vendor!
|
| You do not know if a content update will screw you over
| and mark all the files of your company as malware. The
| "It should never happen" situations are the thing you
| need to prepare for, the reason we talk about security as
| an onion, the reason we still do staggered production
| releases with baking times even after tests and QA have
| passed...
|
| "But it's cybersecurity" is _not_ a justification. I know
| that security departments and IT departments and
| companies in general love dropping the "responsibility"
| part on someone else, but at the end of the day the thing
| getting screwed over is the company fleet. You should
| retain control and make sure things work properly, the
| fact those billion dollar revenue companies are unable to
| do so is a joke. A terrible one, since IT underpins
| everything nowadays.
| chrisjj wrote:
| > The customer should decide when to update, they own the
| fleet not the vendor!
|
| The CS customer has decided to update whenever 24/7 CS
| says. The alternative is to arrive on Monday morning to
| an infected fleet.
| chronid wrote:
| Sorry, this is untrue. Enterprises have SOCs and oncalls,
| if there is a high risk they can do at least minimal
| testing (which would have found this issue as it has a
| 100% bsod rate) and then fleet rollout. It would have
| been rolled out by Friday evening in this case without
| crashing hundreds of thousands of servers.
|
| The CS customer has decided to offload the responsibility
| of its fleet to CS. In my opinion that's bullshit and
| negligence (it doesn't mean I don't understand why they
| did it), particularly at the scale of some of the
| customers :)
| chrisjj wrote:
| > they can do at least minimal testing (which would have
| found this issue as it has a 100% bsod rate)
|
| Incorrect, I believe, given they did not and could not
| get advance sight of the offending forced update.
| Kwpolska wrote:
| I doubt CrowdStrike had done any testing of the update.
| d1sxeyes wrote:
| It _is_ a justification, just not necessarily one you
| agree with.
|
| Companies choose to work with Crowdstrike. One of the
| reasons they do that is 'hands-off' administration: let a
| trusted partner do it for you. There are absolutely risks
| of doing it this way. But there are also risks of doing
| it the other way.
|
| The difference is, if you hand over to Crowdstrike,
| you're not on your own if something goes wrong. If you
| manage it yourself, you've only got yourself working on
| the problem if something goes wrong.
|
| Or worse, something goes wrong and your vendor says "yes,
| we knew about this issue and released the fix in the
| patch last Tuesday. Only 5% of your fleet took the patch?
| Oh. Sounds like your IT guys have got a lot of work on
| their hands to fix the remaining 95% then!".
| stef25 wrote:
| You'd think that the software would sit in a kind of
| sandbox so that it couldn't nuke the whole device but
| only itself. It's crazy that this is possible.
| echoangle wrote:
| The software basically works as a kernel module as far as
| I understand, I don't think there's a good way to
| separate that from the OS while still allowing it to have
| the capabilities it needs to have to surveil all other
| processes.
| temac wrote:
| Something like ebpf.
| layer8 wrote:
| And even then, you wouldn't want the system to continue
| running if the security software crashes. Such a crash
| might indicate a successful security breach.
| KaiserPro wrote:
| > You do not deploy anything, ever on your entire
| production fleet at the same time and you do not buy
| software that does that
|
| I am sympathetic to that, but it's only possible if both
| policy and staffing allow.
|
| for policy, there are lots of places that demand CVEs be
| patched within x hours depending on severity. A lot of
| times, that policy comes from the payment integration
| systems provider/third party.
|
| However, you are also dependent on programs you install
| not auto-updating. Now, most have an option to flip that
| off, but it's not always 100% effective.
| chronid wrote:
| > I am sympathetic to that, but its only possible if both
| policy and staffing allow.
|
| We are not talking about small companies here. We're
| talking about massive billion-dollar-revenue enterprises
| with enormous IT teams, in some cases multiple NOCs and
| SOCs, and probably thousands of consultants all around at
| minimum.
|
| I find it hard to be sympathetic to this complete
| disregard of ownership just to ship responsibility
| somewhere else (because that is the real need at the end
| of the day, let's not joke around). I can understand it,
| sure, and I can believe - to a point - someone did a risk
| calculation (possibility of a CrowdStrike upgrade killing
| all systems vs. a hack if we don't patch a CVE in <4h),
| but it's still madness from a reliability standpoint.
|
| > for policy, there are lots of places that demand CVEs
| be patched within x hours depending on severity.
|
| I'm pretty sure leadership when they need to choose
| between production being down for an unspecified amount
| of time and taking the risk of delaying (of hours in this
| case) the patching will choose the delay. Partners and
| payment integration providers can be reasoned with,
| contracts are not code. A BSOD you cannot talk away.
|
| Sure, leadership is also now saying "but we were doing
| the same thing as everyone else, the consultants told us
| to, and how could we have known this random software
| with root on every machine we own could kill us?!" to
| cover their asses. The problem is solved already, since
| it impacted everyone, and they're not the ones spending
| their weekend hammering systems back to life.
|
| > However you are also dependent on programs you install
| not autoupdating. Now, most have an option to flip that
| off, but its not always 100% effective.
|
| You choose what to install on your systems, and you have
| the option to refuse to engage with companies that don't
| provide such options. If you don't, you accept the risk.
| sateesh wrote:
| Disagree with the part where you put the onus on the
| customer. As has been mentioned in another HN thread [1],
| this update was pushed ignoring whatever settings the
| customer had configured. The original mistake of the
| customer, if any, was that they didn't read this in the
| fine print of the contract (if this point about updates
| was explicitly mentioned in the contract at all).
|
| [1]: https://news.ycombinator.com/item?id=41003390
| chrisjj wrote:
| > You do not deploy anything, ever on your entire
| production fleet at the same time
|
| And if an attacker does??
| jmb99 wrote:
| Oh absolutely. There's many levels of failure here. A few
| that I see as being likely:
|
| - Lack of testing of a deployment
|
| - Lack of required procedures to validate a deployment
|
| - Engineering management prioritizing release pace over
| stability/testing
|
| - Management prioritizing tech debt/pentests/etc far too low
|
| - Sales/etc promising fast turnarounds that can't be
| feasibly met while following proper standards
|
| - Lack of a top-down company culture of security and
| stability first, which should be a must for _any_ security
| company
|
| This outage wasn't caused only by "the intern pushing
| release." It was caused by a poor company culture (read:
| incorrect direction from the top) resulting in a lack of
| testing of the program code, lack of testing environment
| for deployments, lack of formal deployment process, and
| someone messing up a definition file that was caught by 0
| other employees or automated systems.
| _moof wrote:
| I can't speak to its veracity but there's a screenshot making
| its way around in which Crowdstrike discouraged sites from
| testing due to the urgency of the update.
| AmericanChopper wrote:
| I don't work with CS products atm, but my experience with a
| big CS deployment was exactly like this. They were openly
| quite hostile to any suggestion of testing their products;
| we were frequently rebuked for running our prod sensors on
| version n-1. I talked about it a bit in this comment.
|
| https://news.ycombinator.com/item?id=41002864
|
| Very much not surprised to see this now.
| jmb99 wrote:
| It's kind of hard to pitch "zero-day prevention" if you
| suggest people roll out definitions slowly, over the course
| of days/weeks. Thus making it a lot harder to charge to the
| moon for your service.
|
| Now, if these sorts of things were battle tested before
| release, and had a (ideally decade+-long) history of
| stability with well-documented processes to ensure that
| stability, you can more easily make the argument that it's
| worth it. None of those things are close to true though
| (and more than likely will never be for any AV/endpoint
| solution), so it is very hard to justify this sort of
| configuration.
| qaq wrote:
| While true, the agent should roll back to the previous
| content version if it keeps crashing.
| Kwpolska wrote:
| Detecting system crashes would be hard. You could try
| logging and comparing timestamps on agent startups and see
| if the difference is 5 minutes or less. Buggy kernel
| drivers crash Windows hard and fast.
| qaq wrote:
| Loading content is a pretty specific step, so your solution
| is more or less valid.
| kchr wrote:
| > Detecting system crashes would be hard.
|
| Store something like an `attemptingUpdate` flag before
| updating, and remove it if the update was successful.
| Upon system startup, if the flag is present, revert to
| the previous config and mark the new config bad.
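|
| A minimal user-mode sketch of that flag dance (paths and
| names here are illustrative, not CrowdStrike's):
|
|     #include <stdbool.h>
|     #include <stdio.h>
|
|     #define FLAG "attemptingUpdate"
|
|     static bool flag_set(void) {
|         FILE *f = fopen(FLAG, "r");
|         if (f) { fclose(f); return true; }
|         return false;
|     }
|
|     void begin_update(void) {   /* set flag, then apply update */
|         FILE *f = fopen(FLAG, "w");
|         if (f) fclose(f);
|     }
|
|     void finish_update(void) { remove(FLAG); } /* clear on success */
|
|     void on_startup(void) {
|         if (flag_set()) {       /* died mid-update last boot:     */
|             remove(FLAG);       /* revert to last known good and  */
|                                 /* quarantine the new config      */
|             /* revert_to_last_known_good();  (hypothetical) */
|         }
|     }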
| treflop wrote:
| I've seen places where failed releases are just "part of normal
| engineering." Because no one is perfect, they say.
| slenk wrote:
| I really dislike this mentality. Don't even get me started on
| celebrating when your rocket blows up
| galangalalgol wrote:
| If it is a standard production rocket, I agree. If it is a
| first of kind or even third of kind launch, celebrating the
| lessons learned from a failure is a healthy attitude. This
| production software is not the same thing at all.
| Heliosmaster wrote:
| SpaceX celebrating when their rocket blows up _after a
| certain milestone_ is like us devs celebrating when our
| branch with that new big feature only fails a few tests.
| Did it pass? No. Are you satisfied with that as a first
| try? Probably.
| photonthug wrote:
| Even on hn, comments advocating engineering excellence or
| just quality in general are frequently looked down on, which
| probably also tells you a lot about the wider world.
|
| This is why we can't have nice things, but maybe we just
| don't want them anyway? "Mistakes will be made" is way less
| true if you actually put the effort in to prevent them, but I
| am beginning to think this has become code for quiet-quitters
| to telegraph a "I want to get paid for no effort and
| sympathize with others who feel the same" sentiment and
| appear compassionate and grimly realistic all at the same
| time.
|
| yes, billion dollar companies are going to make mistakes, but
| almost always because of cost cutting, willful ignorance, or
| negligence. If average people are apologizing for them and
| excusing that, there has to be some reason that it's good for
| them.
| treflop wrote:
| Personally while I value excellence, I reduce the frequency
| of errors through process and procedure because I'm lazy.
|
| I don't mind meetings but being in a 4 hour emergency
| meeting because some due diligence wasn't done is a waste
| of my time.
|
| Life is easier when you do good work.
| usrusr wrote:
| One possible explanation could be automated testing
| deployments for definition updates that don't run the
| current version of the definition consumer, where the old
| version they do run is unaffected.
| itronitron wrote:
| for all we know, the deployment was the test
| owl57 wrote:
| As the old saying goes, everyone has a test environment, and
| some also have a separate production one.
| albert_e wrote:
| My guess -- there are two separate pipelines one for code
| changes and one for data files.
|
| Pipeline 1 --
|
| Code updates to their software are treated as material changes
| that require non-production and canary testing before global
| roll-out of a new "Version".
|
| Pipeline 2 --
|
| Content / channel updates are handled differently -- via a
| separate pipeline -- because only new malware signatures and
| the like are distributed via this route. The new files are just
| data files -- they are supposed to be in a standard format and
| only read, not "executed".
|
| This pipeline itself must have been tested originally and found
| to be working satisfactorily -- but inside the pipeline there
| is no "test" stage that verifies the integrity of the data file
| so generated, nor - more importantly - one that checks whether
| the new data file works without errors when deployed to the
| latest versions of the software in use.
|
| The agent software that reads these daily channel files must
| have been "thoroughly" tested (as part of pipeline 1) for all
| conceivable data file sizes and simulated contents before
| deployment. (Any invalid data files should simply be rejected
| with an error... "obviously".)
|
| But the exact scenario here -- possibly caused by a broken
| pipeline in the second path (pipeline 2) -- created invalid
| data files with some quirks. And THAT specific scenario was not
| imagined or tested in the software version dev-test-deploy
| pipeline (pipeline 1).
|
| If this is true --
|
| The lesson obviously is that even for "data" only distributions
| and roll-outs, however standardized and stable their pipelines
| may be, testing is still an essential part before large scale
| roll-outs. It will increase cost and add latency sure, but we
| have to live with it. (similar to how people pay for "security"
| software in the first place)
|
| Same lesson for enterprise customers as well -- test new
| distributions on non-production within your IT setup, or have a
| canary deployment in place before allowing full roll-outs into
| production fleets.
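|
| Even a crude integrity gate in pipeline 2 would have
| helped; a hedged sketch (the file-format details here are
| invented, not CrowdStrike's actual channel-file layout):
|
|     #include <stdbool.h>
|     #include <stdint.h>
|     #include <string.h>
|
|     bool channel_file_sane(const uint8_t *buf, size_t len) {
|         if (len < 16) return false;          /* truncated file   */
|         if (memcmp(buf, "CSCF", 4) != 0)     /* bad magic        */
|             return false;                    /* (invented value) */
|         uint32_t declared;
|         memcpy(&declared, buf + 4, sizeof declared);
|         return declared == len;              /* size must match  */
|     }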
| sateesh wrote:
| _Same lesson for enterprise customers as well -- test new
| distributions on non-production within your IT setup, or have
| a canary deployment in place before allowing full roll-outs
| into production fleets._
|
| It was mentioned in one of the HN threads that the update
| was pushed overriding the settings the customer had [1]. What
| recourse can any customer have in such a case?
|
| [1]: https://news.ycombinator.com/item?id=41003390
| perryizgr8 wrote:
| > What recourse any customer can have in in such a case ?
|
| Sue them and use something else.
| teeheelol wrote:
| Ah that was me. We don't accept "content updates" and they
| are staged.
|
| We got this update pushed right through.
| rramadass wrote:
| Nice.
|
| But the problem here is that _the code runs in kernel mode_.
| As such any data that it may consume should have been tested
| with the same care as the code itself which has never been
| the case in this industry.
| Wytwwww wrote:
| > It will increase cost
|
| And of course that cost would be absolutely insignificant
| relative to the potential risk...
| masfuerte wrote:
| I find it hard to believe they didn't do any testing. I wonder
| if they tested the virus signatures against the engine, but
| didn't check the final release artefact (the .sys file) and the
| bug was somehow introduced in the packaging step.
|
| This would have been poor, but to have released it with no
| testing would have been the most staggering negligence.
| andix wrote:
| How sure are we that this was not a cyberattack?
|
| It seems really scary to me that CrowdStrike is able to push
| updates in real time to most of their customers' systems. I
| don't know of any other system that provides a similar method
| to inject code at kernel level. Not even Windows updates, as
| they always roll out with some delay and not to all computers
| at the same time.
|
| If you want to attack high-profile systems, CrowdStrike would
| be one of the best possible targets.
| Grimblewald wrote:
| The amount of self-pwning that goes on in both corporate and
| personal devices these days is insane. The number of games that
| want you to install kernel-level anti-cheat is astounding. The
| number of companies that have centralized remote surveillance
| and control of all devices, where access to this is through a
| great number of sloppily managed accounts, is beyond spooky.
| padjo wrote:
| I mean centralized control of devices is great for the far
| more common occurrence of Bob from accounting leaving his
| laptop on the train with his password on post-it note stuck
| to the screen.
| andix wrote:
| Exactly. It's ridiculous to open up all/most of a company's
| systems to such a single point of failure. We install
| redundant PSUs, backup networks, generators, and many more
| things. But one single automatic update can bring down all
| systems within minutes. Without any redundancy.
| Anonymityisdead wrote:
| Where is a good place and way to start practicing disassembly in
| 2024?
| nophunphil wrote:
| Take this with a grain of salt as I'm not an SME, but there is
| a need for volunteers on reverse-engineering projects such as
| the Zelda decompilation projects[1]. This would probably give
| you some level of exposure, particularly if you have an
| interest in videogames.
|
| [1] https://zelda64.dev/
| Scene_Cast2 wrote:
| Try solving some crackme's. They're binary executables of
| various difficulty (with rated difficulty), where the goal
| ranges from finding a hardcoded password to making a keygen to
| patching the executable. They used to be more popular, but I'm
| guessing you can still find tutorials on how to get started and
| solve a simple one.
| commandersaki wrote:
| I found https://pwn.college to be excellent, even though they
| mostly focus on exploitation, pretty much everything involves
| disassembly.
| 13of40 wrote:
| Writing your own simple programs and debugging/disassembling
| them is a solid option. WinDbg and IDA are good tools to start
| with. Reading a disassembly is a lot easier than coding in
| assembly, and once you know what things like function calls and
| switch statements, etc. look like, you can get a feel for what
| the original program was doing.
| mauvia wrote:
| First you need to learn assembly. Second, you can download
| Ghidra and start decompiling some simple programs you use to
| see what they do.
| 0xDEADFED5 wrote:
| You can compile your own hello world and look at the
| executable with x64dbg. Press space on any instruction and you
| can assemble your own instruction in its place (optionally
| filling the leftover bytes with NOPs).
| CodeArtisan wrote:
| As a very first step, you may start playing with
| https://godbolt.org/ to see how code is translated into lower-
| level instructions.
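| For example, a snippet like this (my own, chosen to echo the
| 0x9c discussion above) shows how a struct field access turns
| into a constant offset from a base pointer:
|
| struct sensor {
|     char pad[0x9c];   /* padding so the field lands at 0x9c */
|     int  flags;       /* lives at offset 0x9c */
| };
|
| int read_flags(struct sensor *s)
| {
|     /* compiles to a load from [s + 0x9c]; with s == NULL
|      * that's exactly a read from address 0x9c */
|     return s->flags;
| }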
| m0llusk wrote:
| Ended up being forced because it was a "content update". This is
| the update of our discontent!
| brcmthrowaway wrote:
| How did it pass CI?
| voidfunc wrote:
| I suspect some engineer has discovered their CI scripts were
| just "exit 0"
| 01HNNWZ0MV43FF wrote:
| Ah, the French mutation testing. Has never been celebrated
| for its excellence. </orson>
| dehugger wrote:
| What is French mutation testing? A casual Kagi search seems to
| imply it's a type of genetic testing, or perhaps just tests
| that have been done in France?
| zerocrates wrote:
| They're referencing an (in)famous video of a
| drunk/drugged/tired Orson Welles attempting to do a
| commercial; his line is "Ahhh, the... French... champagne
| has always been celebrated for its excellence..."
|
| I don't think there's anything more to the inclusion of
| "French" in their comment beyond it being in the original
| line.
|
| https://www.youtube.com/watch?v=VFevH5vP32s
|
| and the successful version:
| https://www.youtube.com/watch?v=qb1KndrrXsY
| Too wrote:
| lol, I've lost count of how many CI systems I've seen that
| are essentially no-ops, letting through all errors, because
| somewhere there was a bash script without set -o errexit.
| emmelaich wrote:
| Testing comes after CI, apparently. At least according to
| something I read previously on HN. See the comment which
| speculates why.
|
| https://news.ycombinator.com/item?id=41022110
| xyst wrote:
| Bold of you to assume there is CI to begin with
| Osiris wrote:
| It wasn't a code update. It was a data file update. It
| certainly seems that they don't include adequate testing for
| data file updates.
| bni wrote:
| In my experience, testing data and config is very rare in the
| whole industry. Feeding software corrupted config files or
| corrupted content from its own database often makes it crash.
| Most often this content is "trusted" to be "correct".
| nickm12 wrote:
| It's really difficult to evaluate the risk the CrowdStrike system
| imposed. Was this a confluence of improbable events or an
| inevitable disaster waiting to happen?
|
| Some still-open questions in my mind:
|
| - was the broken rule in the config file (C-00000291-...32.sys)
| human authored and reviewed or machine-generated?
|
| - was the config file syntactically or semantically invalid
| according to its spec?
|
| - what is the intended failure mode of the kernel driver that
| encounters an invalid config (presumably it's not "go into a boot
| loop")?
|
| - what automated testing was done on both the file going out and
| the kernel driver code? Where would we have expected to catch
| this bug?
|
| - what release strategy, if any, was in place to limit the blast
| radius of a bug? Was there a bug in the release gates or were
| there simply no release gates?
|
| Given what we know so far, it seems much more likely that this
| was a "disaster waiting to happen" but I still think there's a
| lot more to know. I look forward to the public post-mortem.
| refulgentis wrote:
| Would any of these, or even a collection of these, resolving
| in some direction make it highly improbable that it'll happen
| again?
|
| Seems to me 3rd-party code, running in the kernel, on parsed
| inputs, that can be remotely updated is enough to be a
| disaster waiting to happen _gestures breezily at Friday_
|
| That's, in the Taleb parlance, a Fat Tony argument, but
| barring it being a cosmic ray causing an uncorrected bit flip
| during deploy, I don't think there's room to call it anything
| but "a disaster waiting to happen"
| slt2021 wrote:
| The kernel driver could have validated the channel file and
| failed gracefully / ignored the broken file instead of
| BSODing.
|
| This code is executed only once during driver initialization,
| so there shouldn't be much overhead, but it would greatly
| improve reliability against a broken channel file.
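| A minimal sketch of the idea in C (the header layout here is
| entirely hypothetical; the real channel-file format is not
| public):
|
| #include <stdint.h>
| #include <stddef.h>
|
| /* Hypothetical channel-file header, for illustration only. */
| struct channel_header {
|     uint32_t magic;        /* expected constant */
|     uint32_t version;
|     uint32_t entry_count;
|     uint32_t payload_len;  /* bytes following the header */
| };
|
| #define CHANNEL_MAGIC 0xC5A6E147u  /* made-up value */
| #define MAX_ENTRIES   65536u
|
| /* Returns 0 if the file looks sane, -1 otherwise. On failure
|  * the driver would keep the previous known-good rules loaded
|  * instead of crashing. */
| int channel_file_validate(const uint8_t *buf, size_t len)
| {
|     const struct channel_header *h;
|
|     if (buf == NULL || len < sizeof(*h))
|         return -1;                    /* missing or truncated */
|     h = (const struct channel_header *)buf;
|     if (h->magic != CHANNEL_MAGIC)
|         return -1;                    /* wrong or corrupt file */
|     if (h->entry_count > MAX_ENTRIES)
|         return -1;                    /* implausible count */
|     if (h->payload_len != len - sizeof(*h))
|         return -1;                    /* size mismatch */
|     return 0;
| }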
| refulgentis wrote:
| This is going to read as radical, but I always assumed it was
| derivable from bog-standard first principles that would fit in
| any economics class I sat in for my 40 credits:
|
| the natural cost of these bits we sell is zero, so in the
| long run, if the bar is "just write a good & tested kernel
| driver", there will always be one more subsequent market
| entrant who will go too cheap on engineering. Then, they
| touch the hot wire and burn down the establishment.
|
| That doesn't mean capitalism bad, but it does mean I expect
| only Microsoft is capable of writing and maintaining this
| type of software in the long run.
|
| Ex. The dentist and dental hygienist were asking me who was
| attacking Microsoft on Friday, and they were not going to get
| through to the subtleties of 3rd-party kernel driver release
| gating strategy.
|
| MS has a very strong incentive to fix this. I don't know
| how they will. But I love when incentives align and assume
| they always will, in the long run.
| nickm12 wrote:
| Yes, if CrowdStrike was following industry best practices and
| this happened, it would teach us something novel about
| industry practices that we could learn from and use to reduce
| the risk of a similar scale outage happening again.
|
| If they weren't following these practices, this is kind of a
| boring incident with not much to be learned, despite how
| dramatic the scale is. Practices like staged rollout of
| changes exist precisely because we've learned these lessons
| before.
| YZF wrote:
| Well, kernel code is kernel code, and kernel code in general
| takes input from outside the kernel. An audio driver takes
| audio data, a video driver might take drawing instructions, a
| file system interacts with files, etc. Microsoft, and others,
| have been releasing kernel code since forever and for the
| most part, not crashlooping their entire install base.
|
| My Tesla remote updates ... hmph.
|
| It doesn't feel like this is inherently impossible. It feels
| more like not enough design/process to mitigate the risks.
| hdhshdhshdjd wrote:
| Was somebody trying to install an exploit or back door and
| fucked up?
| TechDebtDevin wrote:
| Everything is a conspiracy now eh?
| choppaface wrote:
| To be fair, the xz backdoor wasn't immediately obvious
| https://www.wired.com/story/xz-backdoor-everything-you-
| need-...
| hdhshdhshdjd wrote:
| You do remember SolarWinds, right? This is an obvious high-
| value target, so it is reasonable to entertain malicious
| causes.
|
| Given the number of systems infected, if you could push
| code that rebooted every client into a compromised state
| you'd still have run of some % of the lot until it was
| halted. That time window could be invaluable.
|
| Now, imagine if you screw up the code and just boot loop
| everything.
|
| I'd say business-wise it's better for CrowdStrike to let
| people think it's an own-goal.
|
| The truth may be mundane but a hack is as reasonable a
| theory as "oops we pushed boot loop code to world+dog".
| saagarjha wrote:
| > The truth may be mundane but a hack is as reasonable a
| theory as "oops we pushed boot loop code to world+dog".
|
| No it's not. There are many signs that point to this
| being a mistake. There are very few that point to it
| being a hack. You can't just go "oh it being a hack is
| one of the options therefore it is also something worth
| considering".
| azinman2 wrote:
| Especially because if it was, CrowdStrike wouldn't be
| apologizing and accepting blame.
| owl57 wrote:
| Why? They are in a very specific business and have more
| incentive to cover up successful attacks than most other
| companies.
|
| And while I'm 99% for Hanlon's razor here, I don't see a
| reason to be sure it wasn't even a _completely
| successful_ DoS attack.
| hdhshdhshdjd wrote:
| "Our employee pushed bad code by accident" is _VASTLY_
| better for them than "we didn't secure the infra that
| pushes updates to millions of machines".
| Huggernaut wrote:
| Look there are two options on the table so it's 50/50.
| Ipso facto.
| bunabhucan wrote:
| I believe the flying spaghetti monster touched the file
| with His invisible noodly appendage so now it's a three
| way split.
| hdhshdhshdjd wrote:
| I didn't say it was 50/50, but an accurate enumeration of
| options does include a failed attempt at a hack.
|
| I fail to see why this is so difficult to understand.
| Guthur wrote:
| The glaring question is how and why it was rolled out
| everywhere all at once?
|
| Many corporations have pretty strict rules on system update
| scheduling so as to ensure business continuity in case of
| situations like this, but all of those were completely
| circumvented and we had fully synchronised global failure. It
| really does not seem like a business-as-usual situation.
| chii wrote:
| > strict rules on system update scheduling
|
| which CrowdStrike gets to bypass because they claim to be an
| antivirus and malware detection platform - at least, this is
| what the executives they've wined and dined into the purchase
| contracts have been told. The update schedule is independently
| controlled by CrowdStrike, rather than by a system admin, I
| believe.
| xvector wrote:
| CrowdStrike's reasoning is that an instantaneous global
| rollout helps them protect against rapidly spreading malware.
|
| However, I doubt they need an instantaneous rollout for every
| deployment.
| slenk wrote:
| I feel like they need to at least roll it out to themselves
| first
| kijin wrote:
| Well, millions of PCs bluescreening at the same time does
| help stop a rapidly spreading malware.
|
| Only this time, crowdstrike itself has become
| indistinguishable from malware.
| imtringued wrote:
| When I first saw news about the outage I was wondering what
| this malware "CrowdStrike" was. I mean, the name kind of
| sounds hostile.
| TeMPOraL wrote:
| They say that, but all I hear is immune system triggering a
| cytokine storm and killing you because it was worried you
| may catch a cold.
| inejge wrote:
| _The glaring question is how and why it was rolled out
| everywhere all at once?_
|
| Because the point of these updates is to be rolled out
| quickly and globally. It wasn't a system/driver update, but a
| data file update: think antivirus signature file. (Yes, I
| know it can get complicated, and that AV signatures can be
| dynamic... not the point here.)
|
| Why those data updates skipped validity testing at the source
| is another question, and one that CrowdStrike better be
| prepared to answer; but the tempo of redistribution can't be
| changed.
| Brybry wrote:
| But is there a need for quick global releases?
|
| Is it realistic that there's a threat actor that will be
| attacking every computer on the whole planet at once?
|
| I can understand that it's most practical to update
| _everyone_ when pushing an update to protect _a few_ actively
| under attack, but I can also imagine policies where that
| isn't how it's done, while still getting urgent updates to
| those under attack.
| padjo wrote:
| Is there a need? Maybe, possibly, depends on
| circumstances.
|
| Is this what people are paying CS for? Absolutely.
| RowanH wrote:
| After this I imagine there will be an option "do you want
| updates immediately, or updates when released - n, or n+2,
| n+6, n+24, n+48 hrs?"
|
| Given the choice I bet there's going to be a surprisingly
| large number of orgs that go "we'll take n+24hrs thanks"
| maeil wrote:
| A customer should be able to test an update, whether a
| signature file or literally any kind of update, before
| rolling it out to production systems. Anything else is
| madness. Being "vulnerable" for an extra few hours carries
| less risk than auto-updates (of any kind) on production
| systems. As we've seen here. If you can point to hard
| evidence to the contrary, where many companies were saved
| just in time because of a signature update and would have
| been exploited if they'd waited a few hours, I'd love to
| read about it. It would have to have happened on a rather
| large scale for all of the instances combined to have had a
| larger positive impact than this single instance.
| hmottestad wrote:
| From the article on The Verge it seems that this kind of
| update is downloaded automatically even if you disable
| automatic updates. So those users who took this kind of issue
| seriously would have thought that everything was configured
| correctly to not automatically update.
| danielPort9 wrote:
| > The glaring question is how and why it was rolled out
| everywhere all at once?
|
| Because it has worked well for them so far? There are plenty
| of companies that do the same and we don't hear about them
| until something goes wrong.
| YZF wrote:
| It seems like a none-of-the-above situation, because each of
| those should have really minimized the chances of something
| like this happening. But this is pure speculation. Even the
| most perfect organization and engineering culture can still
| have one thing get through... (Wasn't there some Linux
| incident a little while back though?)
|
| Quality starts with good design, good people, etc.; the
| process parts come much after that. I'd like to think that if
| you do this "right" then this sort of stuff simply can't
| happen.
|
| If we have organization/culture/engineering/process issues
| then we're likely not going to get an in-depth public post-
| mortem. I'd love to get one, just for all of us to learn from
| it. Let's see. Given the cost/impact, having something like
| the Challenger investigation with some smart uninvolved
| people would be good.
| 7952 wrote:
| In a world of complex systems, a "confluence of improbable
| events" is the same thing as "a disaster waiting to happen".
| It's the Swiss cheese model of failure.
| k8sToGo wrote:
| Every system can only survive so many improbable events. Even
| in aviation.
| mianos wrote:
| A 'channel file' is a file interpreted by their signature
| detection system. How far is this from a bytecode-compiled
| domain-specific language? JavaScript, anyone?
|
| eBPF, much the same thing, is actually thought about and well
| designed. If it weren't, it would be easy to crash Linux.
|
| This is what they do, and they are doing it badly. I bet it's
| just shit on shit under the hood, developed by somewhat
| competent engineers, all gone or promoted to management.
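| The eBPF approach, in toy form (this is my own illustration,
| nothing to do with CrowdStrike's actual format): verify every
| instruction up front, so a corrupt file can never drive the
| interpreter out of bounds.
|
| #include <stdint.h>
| #include <stddef.h>
|
| enum { OP_HALT, OP_LOAD, OP_MATCH };   /* toy opcodes */
|
| struct insn { uint8_t op; uint8_t reg; uint16_t arg; };
|
| /* Reject the whole program unless every opcode is known and
|  * every operand is in range. Only verified programs ever
|  * reach the interpreter. */
| int verify(const struct insn *prog, size_t n, size_t data_len)
| {
|     for (size_t i = 0; i < n; i++) {
|         switch (prog[i].op) {
|         case OP_HALT:
|             break;
|         case OP_LOAD:                  /* reads data[arg] */
|             if (prog[i].arg >= data_len)
|                 return -1;
|             break;
|         case OP_MATCH:
|             if (prog[i].reg >= 8)      /* only 8 registers */
|                 return -1;
|             break;
|         default:
|             return -1;                 /* unknown opcode */
|         }
|     }
|     return 0;
| }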
| broknbottle wrote:
| Oddly enough, there was an issue last month with CrowdStrike
| and RHEL 9 kernel where they were triggering a kernel panic
| when attempting to load a bpf program from their newer bpf
| sensor. One of the workarounds was to switch to their kernel
| driver mode.
|
| This was obviously a bug in the RHEL kernel, because even if
| the bpf program was bunk it should not cause the kernel to
| panic. However, it's almost like CrowdStrike does zero testing
| of their software and treats their end users as Test/QA.
|
| https://access.redhat.com/solutions/7068083
|
| > 4bb7ea946a37 bpf: fix precision backtracking instruction
| iteration
| CaliforniaKarl wrote:
| The kernel update in question was released as part of a RHEL
| point release (9.3 or 9.4, I forget which).
|
| I'm not sure how much early warning RH gives to folks when a
| kernel change comes in via a point release. Looking at
| https://www.redhat.com/en/blog/upcoming-improvements-red-
| hat..., it seems like it's changing for 9.5. I hope
| CrowdStrike will be able to start testing against those beta
| kernels.
| Taniwha wrote:
| Really the underlying problem here is that their software is
| loading external data into their kernel driver and not correctly
| sanitising their inputs
| xvector wrote:
| I find it absolutely insane they wouldn't be doing this. At the
| level their software operates, it's sheer negligence to not
| sanitize inputs.
| blackeyeblitzar wrote:
| I wonder if it's for performance reasons.
| prisenco wrote:
| Maybe, maybe, but if it's not in a hot loop, why would the
| performance gain be worth it?
| silisili wrote:
| I'm not overly familiar with crowdstrike processes, but
| assume they are long running. If it's all loaded to memory,
| eg a config, I can't see how you'd get any performance gain
| at all. It just seems lazy.
| 0xDEADFED5 wrote:
| Wild speculation aside, I'd say a little less performance is
| preferable to this outcome.
| dboreham wrote:
| It's for incompetence reasons.
| Taniwha wrote:
| The other issue is that they push to everyone - as someone
| who, at my last job, had a million boxes in the wild and was
| very aware that bricking them all would kill the company, we
| would NEVER push to them all at once. We'd push to a few
| 'friends and family' (i.e. practice each release on ourselves
| first), then do a few % of the customer base and wait for
| problems, then maybe 10%, wait again, then the rest.
|
| Of course we didn't have any third party loading code into
| our boxes out of our control (and we ran Linux)
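| The percentage gate itself is trivial to build. A minimal
| sketch (hash-bucketing on a machine ID; names are mine):
|
| #include <stdint.h>
|
| /* FNV-1a hash of a machine ID: cheap, stable, uniform-ish
|  * bucketing, so a machine stays in the same bucket across
|  * releases. */
| static uint32_t fnv1a(const char *s)
| {
|     uint32_t h = 2166136261u;
|     while (*s) { h ^= (uint8_t)*s++; h *= 16777619u; }
|     return h;
| }
|
| /* Returns 1 if this machine is in the current wave. percent
|  * ramps e.g. 1 -> 10 -> 100 as confidence in the release
|  * grows. */
| int in_rollout(const char *machine_id, uint32_t percent)
| {
|     return fnv1a(machine_id) % 100u < percent;
| }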
| szundi wrote:
| Same here. Also, before the first phase, we test whether we
| can remotely downgrade after an upgrade.
| anothername12 wrote:
| I find Windows confusing. In Linux speak, was this some kind
| of kernel module thing that CS installed? It's all I can think
| of for why the machines BSOD
| G3rn0ti wrote:
| It was a binary data file (supposedly invalid) that caused the
| actual CS driver component to BSOD. However, they used the
| ".sys" suffix to make it look just like a driver, supposedly
| to get Windows protection against a malicious actor just
| deleting it. AFAIU.
| stevekemp wrote:
| Windows filesystem protection doesn't rely upon the filename,
| but on the location.
|
| They could have named their files "foo.cfg", "foo.dat",
| "foo.bla" and been equally protected.
|
| The use of ".sys" here is probably related to the fact it is
| used by their system driver. I don't think anybody was trying
| to pretend the files there are system drivers themselves, and
| a quick look at the exports/disassembly would make that
| apparent anyway.
| G3rn0ti wrote:
| Bypassing the discussion of whether one actually needs
| rootkit-powered endpoint surveillance software such as CS:
| perhaps an open-source solution would be a way to move this
| whole sector to more ethical standards. The main tool would be
| open source, so it would be transparent what exactly it does
| and that it is free of backdoors or really bad bugs. It could
| be audited by the public. On the other hand, it could still be
| a business model to supply malware signatures as a security
| team feeding this system.
| imiric wrote:
| I'd say no. Kolide is one such attempt, and their practices,
| and how it's used in companies, are as insidious as those from
| a proprietary product. As a user, it gives me no assurance that
| an open source surveillance rootkit is better tested and
| developed, or that it has my best interests in mind.
|
| The problem is the entire category of surveillance software. It
| should not exist. Companies that use it don't understand
| security, and don't trust their employees. They're not good
| places to work at.
| pxc wrote:
| I'm curious about this bad 'news' about Kolide. Could you
| tell me more about your experience with it?
| imiric wrote:
| I don't have first-hand experience with Kolide, as I
| refused to install it when it was pushed upon everyone in a
| company I worked for.
|
| Complaints voiced by others included false positives
| (flagging something as a threat when it wasn't, or alerting
| that a system wasn't in place when it was), being too
| intrusive and affecting their workflow, and privacy
| concerns (reading and reporting all files, web browsing
| history, etc.). There were others I'm not remembering, as I
| mostly tried to stay away from the discussion, but it was
| generally disliked by the (mostly technical) workforce.
| Everyone just accepted it as the company deemed it
| necessary to secure some enterprise customers.
|
| Also, Kolide's whole spiel about "honest security"[1] reeks
| of PR mumbo jumbo whose only purpose is to distance
| themselves from other "bad" solutions in the same space,
| when in reality they're not much different. It's built by
| Facebook alumni, after all, and relies on FB software
| (osquery).
|
| [1]: https://honest.security/
| DrRobinson wrote:
| I think some of the information here is misleading and a
| bit unfair.
|
| > being too intrusive and affecting their workflow
|
| Kolide is a reporting tool, it doesn't for example remove
| files or put them in quarantine. You also cannot execute
| commands remotely like in Crowdstrike. As you mentioned,
| it's based on osquery which makes it possible to query
| machine information using SQL. Usually, Kolide is
| configured to send a Slack message or email if there is a
| finding, which I guess can be seen as intrusive but IMO
| not very.
|
| > reading and reporting all files
|
| It does not read and report all files as far as I know,
| but I think it's possible to make SQL queries to read
| specific files. But all files or file names aren't stored
| in Kolide or anything like that. And that live query
| feature is audited (end users can see all queries run
| against their machines) and can be disabled by
| administrators.
|
| > web browsing history
|
| This is not directly possible as far as I know, but maybe
| via a file read query but it's not something built-in out
| of the box/default. And again, custom queries are
| transparent to users and can be disabled.
|
| > Kolide's whole spiel about "honest security"[1] reeks
| of PR mumbo jumbo whose only purpose is to distance
| themselves from other "bad" solutions in the same space
|
| While it's definitely a PR thing, they might still
| believe in it and practice what they preach. To me it
| sounds like a good thing to differentiate oneself from
| bad actors.
|
| Kolide gives users full transparency of what data is
| collected via their Privacy Center, and they allow end
| users to make decisions about what to do about findings
| (if anything) rather than enforcing them.
|
| > It's built by Facebook alumni, after all, and relies on
| FB software (osquery).
|
| For example, React and Semgrep are also built by
| Facebook/Facebook alumni, but I don't really see the
| relevance other than some ad hominem.
|
| Full disclosure: No association with Kolide, just a happy
| user.
| madeofpalk wrote:
| Great news - Kolide has a new integration with Okta
| that'll prevent you from logging into anything if Kolide
| has a problem with your device!
| imiric wrote:
| I concede that I may be unreasonably biased against
| Kolide because of the type of software it is, but I think
| you're minimizing some of these issues. My memory may be
| vague on the specifics, but there were certainly many
| complaints in the areas I mentioned in the company I
| worked at.
|
| That said, since Kolide/osquery is a very flexible
| product, the complaints might not have been directed at
| the product itself, but at how it was configured by the
| security department as well. There are definitely some
| growing pains until the company finds the right balance
| of features that everyone finds acceptable.
|
| Re: intrusiveness, it doesn't matter that Kolide is a
| report-only tool. Although, it's also possible to install
| extensions[1,2] that give it a deeper control over the
| system.
|
| The problem is that the policies it enforces can
| negatively affect people's workflow. For example, forcing
| screen locking after a short period of inactivity has
| dubious security benefits if I'm working from a trusted
| environment like my home, yet it's highly disruptive.
| (No, the solution is not to track my location, or give me
| a setting I have to manage...) Forcing automatic system
| updates is also disruptive, since I want to update and
| reboot at my own schedule. Things like this add up, and
| the combination of all of them is equivalent to working
| in a babyproofed environment where I'm constantly
| monitored and nagged about issues that don't take any
| nuance into account, and at the end of the day do not
| improve security in the slightest.
|
| Re: web browsing history, I do remember one engineer
| looking into this and noticing that Kolide read their
| browser's profile files, and coming up with a way to read
| the contents of the history data in SQLite files. But I
| am very vague on the details, so I won't claim that this
| is something that Kolide enables by default. osquery
| developers are clearly against this kind of use case[3].
| It is concerning that the product can, in theory, be
| exploited to do this. It's also technically possible to
| pull any file from endpoints[4], so even if this is not
| directly possible, it could easily be done outside of
| Kolide/osquery itself.
|
| > Kolide gives users full transparency of what data is
| collected via their Privacy Center
|
| Honestly, why should I trust what that says? Facebook and
| Google also have privacy policies, yet have been caught
| violating their users' privacy numerous times. Trust is
| earned, not assumed based on "trust me, bro" statements.
|
| > For example React and Semgrep is also built by
| Facebook/Facebook alumni, but I don't really see the
| relevance other than some ad-hominem.
|
| Facebook has historically abused their users' privacy,
| and even has a Wikipedia article about it.[5] In the
| context of an EDR system, ensuring trust from users and
| handling their data with the utmost care w.r.t. their
| privacy are two of the most paramount features. Actually,
| it's a bit silly that Kolide/osquery is so vocal in favor
| of preserving user privacy, when this goes against
| working with employer-owned devices where employee
| privacy is definitely not expected. In any case, the fact
| this product is made by people who worked at a company
| built by exploiting its users is very relevant
| considering the type of software it is. React and Semgrep
| have an entirely different purpose.
|
| [1]: https://github.com/trailofbits/osquery-extensions
|
| [2]: https://github.com/hippwn/osquery-exec
|
| [3]: https://github.com/osquery/osquery/issues/7177
|
| [4]:
| https://osquery.readthedocs.io/en/stable/deployment/file-
| car...
|
| [5]: https://en.wikipedia.org/wiki/Privacy_concerns_with_
| Facebook
| chii wrote:
| Whether you morally agree with surveillance software's purpose
| is not the same as whether a particular piece of surveillance
| software works well or not.
|
| I would imagine an open-source version of CrowdStrike would
| not have had such a bad outcome.
| imiric wrote:
| I disagree with the concept of surveillance altogether.
| Computer users should be educated about security, given
| control of their devices, and trusted that they will do the
| right thing. If a company can't do that, that's a sign that
| they don't have good security practices to begin with, and
| don't do a good job at hiring and training.
|
| The only reason this kind of software is used is so that
| companies can tick a certification checkbox that gives the
| appearance of running a tight ship.
|
| I realize it's the easy way out, and possibly the only
| practical solution for a large corporation, but then this type
| of issue is unavoidable. Whether the product is free or
| proprietary makes no difference.
| sooper wrote:
| Most people do not understand, or care to understand,
| what "security" means.
|
| You highlight training as a control. Training is expensive -
| to reduce cost and enhance effectiveness, how do you focus
| training on those that need it without any method to identify
| those that do things in insecure ways?
|
| Additionally, I would say a major function of these systems is
| not surveillance at all - it is preventive controls to prevent
| compromise of your systems.
|
| Overall, your comment strikes me as naive and not based on
| operational experience.
| TeMPOraL wrote:
| This type of software is notorious for severely degrading
| employees' ability to do their jobs, occasionally
| preventing it entirely. It's a main reason why "shadow
| IT" is a thing - bullshit IT restrictions and endpoint
| security malware can't reach third-party SaaS' servers.
|
| This is to say, there are costs and threats caused by
| deploying these systems too, and they should be
| considered when making security decisions.
| jpc0 wrote:
| Explain exactly how any AV prevents a user from checking
| e-mails and opening Word?
|
| In the years I spent doing IT at that level, every time, every
| single time I got a request for admin privileges to be granted
| to a user or for software to be installed on an endpoint, we
| already had a solution in place for exactly what the user
| wanted, installed and tested on their workstation, that was
| taught in onboarding and they simply "forgot".
|
| Just like the users whose passwords I had to reset every
| Monday because they forgot them. It's an irritation, but that
| doesn't mean they didn't do their job well. They met all
| performance expectations, they just needed to be hand-held
| with technology.
|
| The real world isn't black and white and this isn't Reddit.
| TeMPOraL wrote:
| > _Explain exactly how any AV prevents a user from
| checking e-mails and opening word?_
|
| For example by doing continuous scans that consume so
| much CPU the machine stays thermally throttled at all
| times.
|
| (Yes, really. I've seen a colleague raising a ticket
| about AV making it near-impossible to do dev work, to
| which IT replied the company will reimburse them for a
| cooling pad for the laptop, and closed the issue as
| solved.)
|
| The problem is so bad that Microsoft, despite Defender
| being by far the lightest and least bullshit AV solution,
| created "dev drive", a designated drive that's excluded
| by design from Defender scanning, as a blatant workaround
| for corporate policies preventing users and admins from
| setting custom Defender exclusions. Before that, your
| only alternative was to run WSL2 or a regular VM, which
| are opaque to AVs, but that tends to be restricted by
| corporate too, because "sekhurity".
|
| And yes, people in these situations invent workarounds,
| such as VMs, unauthorized third-party SaaS, or using
| personal devices, because at the end of the day, the work
| still needs to be done. So all those security measures do
| is _reduce_ actual security.
| kchr wrote:
| Most AV and EDR solutions support exceptions, either on
| specific assets or fleets of assets. You can make
| exceptions for some employees (for example developers or
| IT) while keeping (sane) defaults for everybody else.
| Exceptions are usually applied on file paths, executable
| image names, file hashes, signature certificates or the
| complete asset. It sounds like people are applying these
| solutions wrong, which of course has a negative outcome
| for everybody and builds distrust.
| TeMPOraL wrote:
| In theory, those solutions could be used right. In
| practice, they never are.
|
| People making decisions about purchasing, deploying and
| configuring those systems are separated by many layers
| from rank-and-file employees. The impact on business
| downstream is diffuse and doesn't affect them directly,
| while the direct incentives they have are not aligned
| with the overall business operations. The top doesn't
| _feel_ the damage this is doing, and the bottom has no
| way of communicating it in a way that will be heard.
|
| It does build distrust, but not necessarily in the sense
| that "company thinks I'm a potential criminal" - rather,
| just the mundane expectation that work will continue to
| get more difficult to perform with every new announcement
| from the security team.
| jpc0 wrote:
| I'm going to just echo my sibling comment here. This
| seems like a management issue. If IT wouldn't help it was
| up to your management to intervene and say that it needs
| to be addressed.
|
| Also I'm unsure I've ever seen an AV even come close to
| stressing a machine I would spec for dev work. Likely
| misconfigured for the use case but I've been there and
| definitely understand the other side of the coin,
| sometimes a beer or pizza with someone high up at IT gets
| you much further than barking. We all live in a society
| with other people.
|
| I would also hazard a guess that the dev drive is more a
| matter of just making it easier for IT to do the right thing,
| requested by IT departments more than likely. I personally
| have my entire dev tree excluded from AV, purely because of
| false positives on binaries and just unnecessary scans because
| the files change content so regularly. That can be annoying to
| do with group policy if where that data is stored isn't
| mandated, and then you have engineers who would be babies
| about "I really want my data in %USERPROFILE%/documents
| instead of %USERPROFILE%/source". Now IT can much more easily
| just say that the Microsoft-blessed solution is X and you need
| to use it.
|
| Regarding WSL, if it's needed for your job then go for it and
| have your manager put in a request. However, if you are only
| doing it to circumvent IT restrictions, well, don't expect
| anyone to play nice.
|
| On the personal devices note: if there's company data on your
| device, it and all its content can be subpoenaed in a court
| case. You really want that? Keep work and personal separate,
| it really is better for all parties involved.
| TeMPOraL wrote:
| > _sometimes a beer or pizza with someone high up at IT
| gets you much further than barking. We all live in a
| society with other people._
|
| That's true, but it gets tricky in a large multinational,
| when the rules are set by some team in a different
| country, whose responsibilities are to the corporate HQ,
| and the IT department of the merged-in company I worked
| for has zero authority on the issue. I tried, I've also
| sent tickets up the chain, they all got politely ignored.
|
| From the POV of all the regular employees, it looks like
| this: there are some annoying restrictions here and
| there, and you learn how to navigate the CPU-eating AV
| scans; you adapt and learn how to do your work. Then one
| day, some sneaky group policy update kills one of your
| workarounds and you notice this by observing that
| compilation takes 5x as long as it used to, and git
| operations take 20x as long as they should. You find a
| way to deal (goodbye small commits). Then one day, you
| get an e-mail from corporate IT saying that they just
| partnered with ESET or CrowdStrike or ZScaler or whatnot, and
| they'll be deploying the new software to everyone. Then
| they do, and everything goes to shit, and you need to
| start to triple every estimate from now on, as the new
| software noticeably slows down everything across the
| board. You think to yourself, at least corporate gave you
| top-of-the-line laptops with powerful CPUs and absurd
| amount of RAM; too bad for sales and managers who are
| likely using much weaker machines. And then you realize
| that sales and management were doing half their work in
| random third-party SaaS, and there is an ongoing process
| to reluctantly in-house some of the shadow IT that's been
| going on.
|
| Fortunately for me, in my various corporate jobs, I've
| always managed to cope by using Ubuntu VMs or (later)
| WSL2, and that this always managed to stay "in the clear"
| with company security rules. Even if it meant I had to
| figure out some nasty hacks to operate Windows compilers
| from inside Linux, or to stop the newest and bestest
| corporate VPN from blackholing all network traffic
| to/from WSL2 (was worth it, at least my work wasn't
| disrupted by the Docker Desktop licensing fiasco...). I
| never had to use personal devices, and I learned long ago
| to keep firm separation between private and work
| hardware, but for many people, this is a fuzzy boundary.
|
| There was one job where corporate installed a blatant
| keylogger on everyones' machines, and for a while, with
| our office IT's and our manager's blessing, our team
| managed to stave it off - and keep local admin rights -
| by conveniently forgetting to sign relevant consent
| forms. The bad taste this left was a major factor in me
| quitting that job few months later, though.
|
| Anyway, the point to these stories is, I've experienced
| first-hand how security in medium and large enterprises
| impacts day-to-day work. I fought both alongside and
| against IT departments over these. I know that most of
| the time, from the corporate HQ's perspective, it's
| difficult to quantify the impact of various security
| practices on everyone's day-to-day work (and I briefly worked
| _in_ cybersecurity, so I also know it isn't even obvious to
| people that this should be considered!). I also know that
| large organizations can eat _a lot_ of inefficiency without
| noticing it, because at that size, they have huge inertia. The
| corporate may not notice the work slowing down 2x across the
| board when it's still completing million-dollar contracts on
| time (negotiated accordingly). It just really sucks to work in
| this environment; the inefficiency has a way of touching your
| soul.
|
| EDIT:
|
| The worst is the learned helplessness. One day, you get
| fed up with Git taking 2+ minutes to make a goddamn
| commit, and you whine a bit on the team channel. You hope
| someone will point out you're just stupid and holding it
| wrong, but no - you get couple people saying "yeah,
| that's how it is", and one saying "yeah, I tried to get
| IT to fix that; they told me a cooling stand for the
| laptop should speed things a bit". You eventually learn
| that security people just don't care, or can't care, and
| you can only try to survive it.
|
| (And then you go through several mandatory cybersecurity
| trainings, and then you discover a dumb SQL injection bug
| in a new flagship project after 2 hours of playing with
| it, and start questioning your own sanity.)
| chrisjj wrote:
| > Computer users should be educated about security, given
| control of their devices, and trusted that they will do
| the right thing.
|
| Imagine you are a bank. Imagine you have no way to ensure
| no employee is a crook.
|
| It does happen.
| matwood wrote:
| > Imagine you have no way to ensure no employee is a
| crook.
|
| Wait, are you saying we have gotten rid of all the crooks
| in a bank/or those that handle money?
| WA wrote:
| > Companies that use it don't understand security
|
| What should these companies understand about security
| exactly?
|
| And aren't they kinda right to not trust their employees if
| they employ 50,000 people with different skills and
| intentions?
| Voultapher wrote:
| Security is a process not a product. Anyone selling you
| security as a product is scamming you.
|
| These endpoint security companies latch onto people making
| decisions, those people want security and these software
| vendors promise to make the process as easy as possible. No
| need to change the way a company operates, just buy our
| stuff and you're good. That's the scam.
| imiric wrote:
| Exactly, well said.
|
| Truthfully, it must be practically infeasible to
| transform security practices of a large company
| overnight. Most of the time they buy into these products
| because they're chasing a security certification (ISO
| 27001, SOC2, etc.), and by just deploying this to their
| entire fleet they get to sidestep the actually difficult
| part.
|
| The irony is that at the end of this they're not any more
| "secure" than they were before, but since they have the
| certification, their customers trust that they are. It's
| security theater 101.
| InsideOutSanta wrote:
| "And aren't they kinda right to not trust their employees
| if they employ 50,000 people with different skills and
| intentions?"
|
| Yes, in a 50k employee company, the CEO won't know every
| single employee and be able to vouch for their skills and
| intentions.
|
| But in a non-dysfunctional company, you have a hierarchy of
| trust, where each management level knows and trusts the
| people above and below them. You also have siloed data,
| where people have access to the specific things they need
| to do their jobs. And you have disaster mitigation
| mechanisms for when things go wrong.
|
| Having worked in companies of different sizes and with
| different trust cultures, I do think that problems start to
| arise when you add things like individual monitoring and
| control. You're basically telling people that you don't
| trust them, which makes them see their employer in an
| adversarial role, which actually makes them start to behave
| less trustworthy, which further diminishes trust across the
| company, harms collaboration, and eventually harms
| productivity and security.
| snotrockets wrote:
| That's a lie we tell children so they think the world is
| fair.
|
| A Marxist reading would suggest alienation, but a more
| modern one would realize that it is a bit more than that:
| to enable modern business practices (both good and bad!)
| we designed systems of management to remove or reduce
| trust and accountability in the org, yet maintain as
| similar results to a world that is more in line with the
| one you believe is possible.
|
| A security professional though would tell you that even
| in such a world, you can not expect even the most
| diligent folks to be able to identify all risks (e.g.
| phishing became so good, even professionals can't always
| discern the real from fake), or practice perfect opsec
| (which probably requires one to be a psychopath).
| protomolecule wrote:
| "But in a non-dysfunctional company, you have a hierarchy
| of trust, where each management level knows and trusts
| the people above and below them. "
|
| Even in a company of two sometimes a husband or a wife
| betrays the trust. Now multiply that probability by
| 50000.
| TeMPOraL wrote:
| Yet we don't apply total surveillance to people. The reason
| isn't just ethics and the US constitution, but also that it's
| just not possible without destroying society. Perhaps the same
| applies to computer systems.
| protomolecule wrote:
| Which is a completely different argument
| TeMPOraL wrote:
| I think it doesn't. I think that the kind of security the
| likes of CrowdStrike promise is fundamentally impossible
| to have, and pursuing it is a fool's errand.
| kemotep wrote:
| Setting aside the possibility of deploying an EDR like
| Crowdstrike just being a box ticking exercise for
| compliance or insurance purposes, can something like an
| EDR be used not because of a lack of trust but a desire
| to protect the environment?
|
| A user doesn't have to do anything wrong for the computer
| to become compromised, or even if they do, being able to
| limit the blast radius and lock down the computer or at
| least after the fact have collected the data to be able
| to identify what went wrong seems important.
|
| How would you secure a network of computers without an
| agent that can do anti-virus, detect anomalies, and
| remediate them? That is to say, how would you manage to
| secure it without doing something that has monitoring and
| lockdown capabilities? In your words, signaling that you
| do not trust the users?
| kchr wrote:
| This. From all the comments I've seen in the multiple
| posts and threads about the incident, this simple fact
| seems to be the least discussed. How else to protect a
| complex IT environment with thousands of assets in form
| of servers and workstations, without some kind of
| endpoint protection? Sure, these solutions like
| CrowdStrike et al are box-checking and risk transferring
| exercises in one sense, but they actually work as
| intended when it comes to protecting endpoints from novel
| malware and TTP:s. As long as they don't botch their own
| software, that is :D
| imiric wrote:
| > How else to protect a complex IT environment with
| thousands of assets in form of servers and workstations,
| without some kind of endpoint protection?
|
| There is no straightforward answer to this question.
| Assuming that your infrastructure is "secure" because you
| deployed an EDR solution is wrong. It only gives you a
| false sense of security.
|
| The reality is that security takes a lot of effort from
| everyone involved, and it starts by educating people.
| There is no quick bandaid solution to these problems,
| and, as with anything in IT, any approach has tradeoffs.
| In this case, and particularly after the recent events,
| it's evident that an EDR system is as much of a liability
| as it is an asset--perhaps even more so. You give away
| control of your systems to a 3rd party, and expect them
| to work flawlessly 100% of the time. The alarming thing
| is how much this particular vendor was trusted with
| critical parts of our civil infrastructure. It not only
| exposes us to operational failures due to negligence, but
| to attacks from actors who will seek to exploit that 3rd
| party.
| matwood wrote:
| > starts by educating people
|
| Any security certification has a section on regularly
| educating employees on the topic.
|
| To your point, I agree that companies are attempting to
| bypass the hard work by deploying a tool and thinking
| they are done.
| kchr wrote:
| Absolutely, training is key. Alas, managers don't seem to
| want their employees spending time on anything other than
| delivering profit and so the training courses are zipped
| through just to mark them as completed.
|
| Personally, I don't know how to solve that problem.
| kchr wrote:
| I totally agree. In my current work environment, we do
| deploy EDR but it is primarily for assets critical for
| delivering our main service to customers. Ironically,
| this incident caused them all to be unavailable and there
| is for sure a lesson to be learned here!
|
| It is not considered a silver bullet by the security
| team, rather a last-resort detection mechanism for
| suspicious behavior (for example if the network
| segmentation or access control fails, or someone managed to
| get a foothold by other means). It also helps them identify
| which employees need more training, as they keep downloading
| random executables from the web.
| morning-coffee wrote:
| It is a good question. Is there a possibility of fundamentally
| fixing software/hardware to eliminate the vectors that malware
| exploits to gain a foothold at all? E.g. not storing the
| return address on the stack, or not letting it be manipulated
| by the callee? Memory-bounds enforcement, either statically at
| compile time or with the help of hardware, to prevent writing
| past memory not yours? (Not asking about feasibility of
| coexisting with or migrating from the current world, just
| about the possibility of fundamentally solving this at all...)
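| For reference, the classic vector being alluded to looks like
| this (a deliberately broken toy, not code from any real
| product):
|
| #include <string.h>
|
| /* Classic stack-smash shape: an unchecked copy into a fixed
|  * stack buffer. Input longer than 16 bytes overwrites
|  * adjacent stack memory, including the saved return address,
|  * which is exactly what the hardware/compiler mitigations
|  * mentioned above try to make impossible. */
| void vulnerable(const char *input)
| {
|     char buf[16];
|     strcpy(buf, input);   /* no bounds check */
| }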
| com wrote:
| Economic drivers spring to mind, possibly connected with
| civil or criminal liability in some cases.
|
| But this will be the work of at least two human generations;
| our tools and work practices are woefully inadequate. So even
| if the pointy-haired bosses fear imprisonment for gratuitous
| failure, and the grasping, greedy investors fear the
| destruction of "hard earned" capital, it's not going to be
| done in the snap of our fingers, not least because the people
| occupying the technology industry - and this is an
| overgeneralisation, but I'm pretty angry so I'm going to let
| it stand - Just Don't Care Enough.
|
| If we cared, it would be nigh on impossible for my granny to
| get tricked into popping her Windows desktop by opening an
| attachment in her email client.
|
| It wouldn't be possible to sell (or buy!) cloud services
| for which we don't get security data in real time and
| signal about what our vendor advises to do if worst comes
| to worst.
|
| And on and on.
| mylastattempt wrote:
| I disagree. You seem to start from the premise that all people
| are honest except those that aren't, but that you won't work
| with or meet dishonest people unless the employer sets himself
| up in an adversarial role?
|
| As the other reply to your comment said: the world is not
| 'fair' or 'honest', that's just a lie told to children. Apart
| from genuinely evil people, there are unlimited variables that
| dictate people's behavior. Culture, personality, nutrition,
| financial situation, mood, stress, bully coworkers, intrinsic
| values, etc. To think people are all fair and honest "unless"
| is a really harmful worldview to have and, in my opinion, the
| reason a lot of bad things are allowed to happen and continue
| (throughout all of society, not just work).
|
| Zero-trust in IT is just the digitized version of "trust
| is earned". In computers you can be more crude and direct
| about it, but it should be the same for social
| connections and interactions.
| matwood wrote:
| > You seem to start from a premise that all people are
| honest
|
| You have to start with that premise otherwise
| organizations and society fail. Every hour of every day,
| even people in high security organizations have
| opportunities to betray the trust bestowed on them.
| Software and processes are about keeping honest people
| honest. The dishonest ones you cannot do too much about
| but hope you limit the damage they can cause.
|
| If everyone is treated as dishonest then there will
| eventually be an organizational breakdown. Creativity,
| high productivity, etc... do not work in a low/zero trust
| environment.
| echoangle wrote:
| If your company is large enough, you can't really trust your
| employees. Do you really think Google can trust that not a
| single one of their employees does something stupid or is
| actively malicious?
| iforgotpassword wrote:
| Limit their abilities using OS features? Have the vendor
| fix security issues rather than a third party incompetently
| slapping on band-aid?
|
| It's like you let one company build your office building
| and then bring in another contractor to randomly add walls
| and have others removed while having never looked at the
| blueprints and then one day "whoopsie, that was a
| supporting wall I guess".
|
| Why is it not just completely normal but even expected that
| an OS vendor can't build an OS properly, or that the admins
| can't properly configure it, but instead you need to
| install a bunch of crap that fucks around with OS internals
| in batshit crazy ways? I guess because it has a nice
| dashboard somewhere that says "you're protected". Checkbox
| software.
| lyu07282 wrote:
| The sensor basically monitors everything that's happening on
| the system and then uses heuristics and known attack vectors
| and behaviors to, for example, lock compromised systems down.
| For example, a fileless malware that connects to a C&C and
| then begins to upload all local documents and stored
| passwords, then slowly enumerates every service the employee
| has access to for vulnerabilities.
|
| If you manage a fleet of tens of thousands of systems and
| you need to protect against well funded organized crime?
| Employees running malicious code under their user is a
| given and can't be prevented. Buying crowdstrike sensor
| doesn't seem like such a bad idea to me. What would you
| do instead?
| iforgotpassword wrote:
| > What would you do instead?
|
| As said, limit the user's abilities as much as possible with
| features of the OS and software in use. Maybe if you want
| those other metrics, use a firewall - but not a TLS-breaking
| virus-scanning abomination that has all the same problems, a
| simple one that can warn you of unusual traffic patterns. If
| someone from accounting starts uploading a lot of data, or
| connects to Google cloud when you don't use any of their
| products, that should be odd.
|
| If we're talking about organized crime, I'm not convinced
| crowdstrike in particular doesn't actually enlarge the
| attack surface. So we had what now as the cause, a
| malformed binary ruleset that the parser, running with
| kernel privileges, choked on and crashed the system.
| Because of course the parsing needs to happen in kernel
| space and not a sandboxed process. That's enough for me
| to make assumptions about the quality of the rest of the
| software, and answer the question regarding attack
| surface.
|
| Before this incident nobody ever really looked at this
| product at all from a security standpoint, maybe because
| it is (supposed to be) a security product and thus cannot
| have any flaws. But it seems now security researchers all
| over the planet start looking at this thing and are
| having a field day.
|
| Bill Gates sent that infamous email in the early 2000s, I
| think after Sasser hit the world, that security should be made
| the No. 1 priority for Windows. As much as I dislike Windows
| for various reasons, I think overall Microsoft does a rather
| good job on this. Maybe it's time the companies behind these
| security products start taking security seriously too?
| lyu07282 wrote:
| > Before this incident nobody ever really looked at this
| product at all from a security standpoint
|
| If you only knew how absurd of a statement that is. But
| in any case, there are just too many threats network
| IDS/IPS solutions won't help you with, any decent C2 will
| make it trivial to circumvent them. You can't limit the
| permissions of your employees to the point of being
| effective against such attacks while still being able to
| do their job.
| iforgotpassword wrote:
| > If you only knew how absurd of a statement that is.
|
| You don't seem to know either since you don't elaborate
| on this. As said, people are picking this apart on
| Twitter and mastodon right now. Give it a week or two and
| I bet we'll see a couple CVEs from this.
|
| For the rest of your post you seem to ignore the argument
| regarding attack surface, as well as the fact that there
| are companies not using this kind of software and
| apparently doing fine. But I guess we can just claim they
| are fully infiltrated and just don't know because they
| don't use crowdstrike. Are you working for crowdstrike by
| any chance?
|
| But sure, at the end of the day you're just gonna weigh
| the damage this outage did to your bottom line and the
| frequency you expect this to happen with, against a
| potential hack - however you even come up with the
| numbers here, maybe crowdstrike salespeople will help you
| out - and maybe tell yourself it's still worth it.
| 7952 wrote:
| In a sense the secure platform already exists. You use
| web apps as much as possible. You store data in cloud
| storage. You restrict local file access and execute
| permissions. Authenticate using passkeys.
|
| The trouble is that people still need local file access,
| and use network file shares. You have hundreds of apps
| used by a handful of users that need to run locally. And
| a few intranet apps that are mission critical and have
| dubious security. That creates the necessity for wrapping
| users in firewalls, vpns, tls interception, end point
| security etc. And the less well it all works the more you
| need to fill the gaps.
| ironbound wrote:
| Next you'll be saying "I don't need an immune system..."
|
| Fun fact: an attacker only needs to steal credentials from the
| home directory to jump into a company's AWS account where all
| the juicy customer data lives, so there are reasons we want
| this control.
|
| Frankly I'd like to see the smart people complaining help
| write better solutions rather than hinder.
| pavel_pt wrote:
| If that's all it takes for an attacker, you're doing AWS
| wrong.
| snotrockets wrote:
| Problem is that many do.
|
| Doing it right requires very capable individuals and a
| significant effort. Less than it used to take, more than
| most companies are ready to invest.
| ironbound wrote:
| people get lazy
| hello_moto wrote:
| This is the real world, everyone is doing something
| wrong.
|
| The alternative is to replace you with AI yes?
| matheusmoreira wrote:
| There are no "ethical standards" to move to. Nobody should be
| able to usurp control of our computers. That should simply be
| declared illegal. Creating contractual obligations that require
| people to cede control of their computers should also be
| prohibited. Anything that does this is _malware_ and malware
| does not become justified or "ethical" when some corporation
| does it. Open source malware is still malware.
| callalex wrote:
| What does "our computer" mean when it is not owned by you,
| but issued to you to perform a task with by your employer?
| Does that also apply to the operator at a switchboard in a
| nuclear missile launch facility?
| z3phyr wrote:
| Does the switchboard in a nuclear missile launch facility
| run Crowdstrike? I picture it as a high quality analog
| circuit board that does 1 thing and 1 thing only. No way to
| run anything else.
|
| Globally networked personal computers were a kind of cultural
| revolution against the setting you describe. Everyone had
| their own private compute and compute time and everyone
| could share their own opinion. Computers became our
| personal extensions. This is what IBM, Atari, Commodore,
| Be, Microsoft and Apple (and later desktop Linux) sold. Now
| given this ideology, can a company own my limbs? If not,
| they can't own my computers.
| derefr wrote:
| > What does "our computer" mean when it is not owned by
| you, but issued to you to perform a task with by your
| employer?
|
| Well, presuming that:
|
| 1. the employee is issued a computer, that they have
| _possession_ of even if not _ownership_ (i.e. they bring
| the computer home with them, etc.)
|
| 2. and the employee is required to perform
| creative/intellectual labor activities on this computer --
| implying that they do things like connecting their online
| accounts to this computer; installing software on this
| computer (whether themselves or by asking IT to do it);
| doing general web-browsing on this computer; etc.
|
| 3. and where the extent of their job duties, blurs the line
| between "work" and "not work" (most salaried intellectual-
| labor jobs are like this) such that the employee basically
| "lives in" this computer, even when not at work...
|
| 4. ...to the point that the employee could reasonably
| conclude that it'd be silly for them to maintain a separate
| "personal" computer -- and so would potentially _sell_ any
| such devices (if they owned any), leaving them _dependent_
| on this employer-issued computer for all their computing
| needs...
|
| ...then I would argue that, by the same chain of reasoning
| as in the GP post, employers _should not be legally
| permitted_ to "issue" employees such devices.
|
| Instead, the employer should either _purchase_ such
| equipment for the employee, giving it to them permanently
| as a taxable benefit; or they should require that the
| employee purchase it themselves, and recompense them for
| doing so.
|
| Cyberpunk analogy: imagine you are a brain in a vat. Should
| your employer be able to purchase an arbitrary android body
| for you; make you use it while at work; and stuff it full
| of monitoring and DRM? No, that'd be awful.
|
| Same analogy, but with the veil stripped off: imagine you
| are paraplegic. Should your employer be allowed to issue
| you an arbitrary specific _wheelchair_, and require you to
| use it at work, and then monitor everything you do with it
| / limit what you can do with it because it's "theirs"? No,
| that'd be ridiculous. And _humanity already knows that_ --
| employers _already_ can't do that, in any country with
| even a shred of awareness about accessibility devices. The
| employer -- or very much more likely, the employer's
| insurance provider -- just buys the person the chair. And
| then it's _the employee's_ chair.
|
| And yes, by exactly the same logic, this also means that
| issuing an employee a _company car_ should be illegal -- at
| least in cases where the employee lives in a non-walkable
| area, and doesn't already have another car (that they
| could afford to keep + maintain + insure); and/or where
| their commute is long enough that they'd do most non-
| employment-related car-requiring things around work and
| thus use their company car. Just buy them a car. (Or, if
| you're worried they might run away with it, then _lease-to-
| own_ them a car -- i.e. where their "equity in the car" is
| in the form of options that vest over time, right along-
| side any equity they have in the company itself.)
|
| > Does that also apply to the operator at a switchboard...
|
| Actually, no! Because an operator of a switchboard is not a
| "user" of the computer that powers the switchboard, in the
| same sense that a regular person sitting at a workstation
| is a "user" of the workstation.
|
| The system in this case is a "kiosk computer", and the
| operator is performing a prescribed domain-specific
| function through a limited UX they're locked into by said
| system. The operator of a nuclear power plant is akin to a
| customer ordering food from a fast-food kiosk -- just
| providing slightly more mission-critical inputs. (Or, for a
| maybe better analogy: they're akin to a transit security
| officer using one of those scanner kiosk-handhelds to check
| people's tickets.)
|
| If the "computer" the nuclear-plant operator was operating,
| exposed a purely electromechanical UX rather than a digital
| one -- switches and knobs and LEDs rather than screens and
| keyboards[1] -- then nothing about the operator's workflow
| would change. Which means that the operator isn't truly
| _computing_ with the computer; they're just _interacting
| with an interface_ that _happens_ to be a computer.
|
| [1] ...which, in fact, "modern" nuclear plants are. The UX
| for a nuclear power plant control-center has not changed
| much since the 1960s; the sort of "just make it a
| touchscreen"-ification that has infected e.g. automotive
| has thankfully not made its way into these more mission-
| critical systems yet. (I believe it's all computers _under
| the hood_ now, but those computers are GPIO-relayed up to
| panels with lots and lots of analogue controls. Or maybe
| those panels are USB HID devices these days; I dunno, I'm
| not a nuclear control-systems engineer.)
|
| Anyway, in the general case, you can recognize these "the
| operator is just interacting with an interface, not
| computing on a computer" cases because:
|
| * The machine has separate system administrators who log
| onto it frequently -- less like a workstation, more like a
| server.
|
| * The machine is never allowed to run anything other than
| the kiosk app (which might be some kind of custom launcher
| providing several kiosk apps, but where these are all
| business-domain specific apps, with none of them being
| general-purpose "use this device as a computer" apps.)
|
| * The machine is set up to use domain login rather than
| local login, and keeps no local per-user state; or, more
| often, the machine is configured to auto-login to an "app
| user" account (in modern Windows, this would be a Mandatory
| User Profile) -- and then the actual user authentication
| mechanism is built into the kiosk app itself.
|
| _Hopefully_, the machine is using an embedded version of
| the OS, which has had all general-purpose software stripped
| out of it to remove vulnerability surface.
| derefr wrote:
| Tangent -- a question you didn't ask, but I'll pretend
| you did:
|
| > If employers allowed employees to "bring their own
| devices", and then _didn 't_ force said employees to run
| MDM software on those devices, then how in the world
| could the employer guarantee the integrity of any line-
| of-business software the employee must run on the device;
| impose controls to stop PII + customer-shared data +
| trade secrets from being leaked outside the domain; and
| so forth?
|
| My answer to that question: it's safe to say that most
| people in the modern day _are_ fine with the compromise
| that your device might be 100% yours most of the time;
| but, when necessary -- _when you decide it to be so_ --
| 99% yours, 1% someone else's.
|
| For example, anti-cheat software in online games.
|
| The anti-cheat logic in online games is this little
| nugget of code that runs on a little sub-computer within
| your computer (Intel SGX or equivalent.) This sub-
| computer acts as a "black box" -- it's something the root
| user of the PC can't introspect or tamper with. However:
|
| * Whenever you're not playing a game, the anti-cheat
| software _isn't loaded_. So most of the time, your
| computer is _entirely_ yours.
|
| * _You_ get to decide when to play an online game, and
| you are explicitly aware of doing so.
|
| * When you _are_ playing an online game, most of your
| computer -- the CPU's "application cores", and 99% of
| the RAM -- is still 100% under your control. The anti-
| cheat software isn't _actually_ a rootkit (despite what
| some people say); it can't affect any app that doesn't
| explicitly hook into it.
|
| * In a brute-force sense, you still "control" the little
| sub-computer as well -- in that you can _force it to stop
| running whatever it's running_ whenever you want. SGX
| and the like aren't like Intel's Management Engine (which
| really _could_ be used by a state actor to plant a non-
| removable "ring -3" rootkit on your PC); instead, SGX is
| more like a TPM, or an FPGA: it's something that's
| ultimately controlled _by_ the CPU from ring 0, just with
| a very circumscribed API that doesn't give the CPU the
| ability to "get in the way" of a workload once the CPU
| has deployed that workload to it, other than by shutting
| that workload off.
|
| As much as people like Richard Stallman might freak out
| at the above design, it really _isn't_ the same thing as
| your employer having root on your wheelchair. It's more
| like how someone in a wheelchair knows that if they get
| on a plane, then they're not allowed to wheel their own
| wheelchair around on the plane, and a flight attendant
| will instead be doing that for them.
|
| How does that translate to employer MDM software?
|
| Well, there's no clear translation currently, because
| we're currently in a paradigm that favors employer-issued
| devices.
|
| But here's what we _could_ do:
|
| * Modern PCs are powerful enough that anything a
| corporation wants you to do, can be done in a
| corporation-issued VM that runs on the computer.
|
| * The employer could then require the installation of an
| integrity-verification extension (essentially "anti-cheat
| for VMs") that ensures that the VM itself, and the
| hypervisor software that runs it, and the host kernel the
| hypervisor is running on top of, all haven't been
| tampered with. (If any of them were, then the extension
| wouldn't be able to sign a remote-attestation packet, and
| the employer's server in turn wouldn't return a
| decryption key for the VM, so the VM wouldn't start.)
|
| * The employer could feel free to MDM the _VM guest
| kernel_ -- but they likely wouldn't _need_ to, as they
| could instead just lock it down in much-more-severe ways
| (the sorts of approaches you use to lock down a server!
| or a kiosk computer!) that would make a general-purpose
| PC next-to-useless, but which would be fine in the
| context of a VM running only line-of-business software.
| (Remember, all your general-purpose "personal computer"
| software would be running _outside_ the VM. Web browsing?
| Outside the VM. The VM is just for interacting with
| Intranet apps, reading secure email, etc.)
|
| (Why yes, I _am_ describing
| https://en.wikipedia.org/wiki/Multilevel_security.)
| matheusmoreira wrote:
| > For example, anti-cheat software in online games
|
| > The anti-cheat software isn't actually a rootkit
| (despite what some people say); it can't affect any app
| that doesn't explicitly hook into it.
|
| Out of all examples you could have cited, you chose this
| one.
|
| https://www.theregister.com/2016/09/23/capcom_street_figh
| ter...
|
| https://twitter.com/TheWack0lian/status/779397840762245124
|
| There you go. An anti-cheat rootkit so ineptly coded it
| serves as literal privilege escalation as a service. Can
| we stop normalizing this stuff already?
|
| My computer is my computer, and your computer is your
| computer.
|
| The game company owns _their servers_, not my computer.
| If their game runs on my machine, then cheating is my
| prerogative. It is quite literally an exercise of my
| computer freedom if I decide to change the game's state
| to give myself infinite health or see through walls or
| whatever. It's not their business what software I run on
| my computer. I can do whatever I want.
|
| It's my machine. I am the _god_ of this domain. The game
| doesn't get to protect itself from me. It _will_ bend to
| my will if I so decide. It doesn't have a choice in the
| matter. Anything that strips me of this divine power
| should be straight up illegal. I don't care what the
| consequences are for corporations, they should not get to
| usurp me. They don't get to create little
| extraterritorial islands in our domains where they have
| higher power and control than we do.
|
| I don't try to own their servers and mess with the code
| running on them. They owe me the exact same respect in
| return.
| valicord wrote:
| > the employee could reasonably conclude that it'd be
| silly for them to maintain a separate "personal" computer
| -- and so would potentially sell any such devices
|
| What a bizarre leap of logic. Can Fedex employees
| reasonably sell their non-uniform clothes? Just because
| the employer in this scenario didn't 100% lock down the
| computer (which is a good thing because the alternative
| would be incredibly annoying for day-to-day work),
| doesn't mean the employee can treat it as their own.
| Even from the privacy perspective, it would be pretty
| silly. Are you going to use the employer provided
| computer to apply to your next job?
| derefr wrote:
| People do _do_ it, though. Especially poor people, who
| might not use their personal computers very often.
|
| Also, many people don't own a separate "personal"
| computer in the first place. Especially, again, poor
| people. (I know many people who, if needing to use "a PC"
| for something, would go to a public library to use the
| computers there.)
|
| Not every job is a software dev position in the Bay Area,
| where everyone has enough disposable income to have a
| pile of old technology laying around. Many jobs for which
| you might be issued a work laptop still might not pay
| enough to get you above the poverty line. McDonald's
| managers are issued work laptops, for instance.
|
| (Also, disregarding economic class for a moment: in the
| modern day, most people who aren't in tech solve most of
| their computing problems by owning _a smartphone_, and
| so are unlikely to have a full _PC_ at home. But their
| phone can't do everything, so if they have a work
| computer they happen to be sat in front of for hours each
| day -- whether one issued to them, or a fixed workstation
| _at work_ -- then they'll default to doing their rare
| personal "productivity" tasks on that work computer. And
| yes, this _does_ include updating their CV!)
|
| ---
|
| Maybe you can see it more clearly with the case of
| company cars.
|
| People sometimes don't own any other car (that actually
| works) until they get issued a company car; so they end
| up using their company car for everything. (Think
| especially: tradespeople using their company-logo-branded
| work box-truck for everything. Where I live, every third
| vehicle in any parking lot is one of those.)
|
| And people -- especially poorer people -- also often sell
| their personal vehicle when they are issued a company
| car, because this 1. releases them from the need to pay a
| lease + insurance on that vehicle, and 2. gets them
| possibly tens of thousands of dollars in a lump sum (that
| they _don't_ need to immediately reinvest into another
| car, because they can now rely on the company car.)
| valicord wrote:
| The point is that if you do do it, it's on you to
| understand the limitations of using someone else's
| property. Just like the difference between rental vs
| owned housing.
|
| There are also fairly obvious differences between work-
| issued computers and all of your other analogies:
|
| 1. A car (and presumably the cyberpunk android body) is
| much more expensive than a computer, so the downside of
| owning both a personal and a work one is much higher.
|
| 2. A chair or a wheelchair doesn't need security
| monitoring because it's a chair (I guess you could come
| up with an incredibly convoluted scenario where it would
| make sense to put GPS tracking in a wheelchair, but come
| on).
|
| > just buys the person the chair. And then it's the
| employee's chair.
|
| It's not because there's a law against loaning chairs,
| it's because the chair is likely customized for a
| specific person and can't be reused. Or if you're talking
| about WFH scenarios, they just don't want to bother with
| return shipping.
| derefr wrote:
| No, it's the difference between owned housing vs renting
| from _a landlord who is also your boss in a company town_,
| where the landlord has a vested interest in e.g.
| preventing you from using your apartment to also do work
| for a competitor.
|
| Which is, again, a situation _so_ shitty that we've
| outlawed it entirely! And then also imposed further
| regulations on regular, non-employer landlords, about
| what kinds of conditions they can impose on tenants.
| (E.g. in most jurisdictions, your landlord can't restrict
| you from having guests stay the night in your room.)
|
| Tenants' rights are actually a great analogy for what I'm
| talking about here. A company-issued laptop is very much
| like an apartment, in that you're "living in it"
| (literally and figuratively, respectively), and that you
| therefore _should_ deserve certain rights to autonomous
| possession/use, privacy, freedom from
| restriction/compromise in use, etc.
|
| While you don't literally own an apartment you're
| renting, the law tries to, as much as possible, give
| tenants the rights of someone who _does_ own that
| property; and to restrict the set of legal justifications
| that a landlord can use to punish someone for exercising
| those (temporary) rights over their property.
|
| IMHO having the equivalent of "tenants' rights" for
| something like a laptop is silly, because that'd be a lot
| of additional legal edifice for not-much gain. But,
| unlike with real-estate rental, it'd actually be quite
| practical to just make the "tenancy" case of company IT
| equipment use impossible/illegal -- forcing employers to
| do something else instead -- something that _doesn't_
| force employees into the sort of legal area that would
| make "tenants' rights" considerations applicable in the
| first place.
| valicord wrote:
| No, that would be more like sleeping at the office
| (purely because of employee preferences, not because the
| employer forces you to or anything like that) and
| complaining about security cameras.
| eptcyka wrote:
| Yes, that is why the owners of the computers (corps) use
| these tools - to maintain control over their hardware (and IP
| accessible on it). The end user is not the customer or user
| here.
| cqqxo4zV46cp wrote:
| Oh stop it. It's not your machine, it's your employer's
| machine. You're the user of the machine. You're cargo-culting
| some ideological take that doesn't apply here at all.
| imiric wrote:
| > It's not your machine, it's your employer's machine.
|
| Agreed. I'm fine with this, as long as the employer also
| accepts that I will never use a personal device for work,
| that I will never use a minute of personal time for work,
| and that my productivity is significantly affected by
| working on devices and systems provided and configured by
| the employer. This knife cuts both ways.
| fragmede wrote:
| If only that were possible. Luckily for my employer, I
| end up thinking about problems to be solved during my off
| hours like when I'm sleeping and in the shower. Then
| again, I also think about non-work life problems sitting
| at my desk when I'm supposed to be working, so
| (hopefully) it evens out.
| imiric wrote:
| I don't think it's possible either. But the moment my
| employer forces me to install a surveillance rootkit on
| the machine I use for work--regardless of who owns the
| machine--any trust that existed in the relationship is
| broken. And trust is paramount, even in professional
| settings.
| valicord wrote:
| Setting aside the question whether these security tools
| are effective at their stated goal, what does this have
| to do with trust at all? Does the existence of a bank
| vault break the trust between the bank and the tellers?
| What is the mechanism that would prevent your computer
| from getting infected by a 0-day if only your employer
| trusted you?
| imiric wrote:
| > Does the existence of a bank vault break the trust
| between the bank and the tellers?
|
| That's a strange analogy, since the vault is meant to
| safeguard customer assets from the public, not from bank
| employees. Besides, the vault doesn't make the teller's
| job more difficult.
|
| > What is the mechanism that would prevent your computer
| from getting infected by a 0-day if only your employer
| trusted you?
|
| There isn't one. What my employer does is trust that I
| take care of their assets and follow good security
| practices to the best of my abilities. Making me install
| monitoring software is an explicit admission that they
| don't trust me to do this, and with that they also break
| my trust in them.
| valicord wrote:
| You mean like AV software is meant to safeguard the
| computer from malware? I'm sure banks have a lot of
| annoying security-related processes that make tellers'
| jobs more difficult.
| mr_mitm wrote:
| If you don't already have an antivirus on your work
| machine, you're in an extremely small minority. As a
| consultant with projects that last about a week, I've
| experienced the onboarding process of over a hundred orgs
| first hand. They almost all hand out a Windows laptop,
| and every single Windows laptop had an AV on it. It's
| considered negligent not to have some AV solution in the
| corporate world. And these days, almost all the fancy AVs
| live in the kernel.
| imiric wrote:
| I don't doubt that to be the case, but I'm happy to not
| work in corporate environments (anymore...). :)
| kchr wrote:
| My experience is that in these workplaces where EDR is
| enforced on all devices used for work, your hypothetical
| is true (i.e. you are not expected to work on devices not
| provided by your employer - on the contrary, that is most
| likely forbidden).
| plantain wrote:
| There is an open source alternative. GRR:
|
| https://github.com/google/grr
|
| Every Google client device has it.
| G3rn0ti wrote:
| It sounds really interesting. But the only thing it does not
| do is scan for viruses/malware, although this could be
| implemented using GRR I guess. How does Google mitigate
| malware threats in-house?
| giantpotato wrote:
| > _By-passing the discussion whether one actually needs root
| kit powered endpoint surveillance software such as CS perhaps
| an open-source solution would be a killer to move this whole
| sector to more ethical standards._
|
| As a red teamer developing malware for my team to evade EDR
| solutions we come across, I can tell you that EDR systems are
| essential. The phrase "root kit powered endpoint surveillance"
| is a mischaracterization, often fueled by misconceptions from
| the gaming community. These tools provide essential protection
| against sophisticated threats, and they catch them. Without
| them, my job would be 90% easier when doing a test where
| Windows boxes are included.
|
| > _So the main tool would be open source and it would be
| transparent what it does exactly and that it is free of
| backdoors or really bad bugs._
|
| Open-source EDR solutions, like OpenEDR [1], exist but are
| outdated and offer poor telemetry. Assembling the various
| GitHub POCs that exist into a production EDR is impractical
| and insecure.
|
| The EDR sensor itself becomes the target. As a threat
| actor, the EDR is the only thing in your way most of the time.
| Open sourcing them increases the risk of attackers contributing
| malicious code to slow down development or introduce
| vulnerabilities. It becomes a nightmare for development, as you
| can't be sure who is on the other side of the pull request. TAs
| will do everything to slow down the development of a security
| sensor. It is a very adversarial atmosphere.
|
| > _On the other hand it could still be a business model to
| supply malware signatures as a security team feeding this
| system._
|
| It is actually the other way around. Open-source malware
| heuristic rules do exist, such as Elastic Security's detection
| rules [2]. Elastic also provides EDR solutions that include
| kernel drivers and is, in my experience, the harder one to
| bypass. Again, please make an EDR without drivers for Windows,
| it makes my job easier.
|
| > *It could be audited by the public."
|
| The EDR sensors already do get "audited" by security
| researchers and the threat actors themselves, who reverse
| engineer and debug the EDR sensors to spot weaknesses
| that can be "abused." If I spot things like the EDR just
| plainly accepting kernel mode shellcode and executing it, I
| will, of course, publicly disclose that. EDR sensors are under
| a lot of scrutiny.
|
| [1] https://github.com/ComodoSecurity/openedr [2]
| https://github.com/elastic/detection-rules
| manquer wrote:
| > Open sourcing them increases the risk of attackers
| contributing malicious code to slow down development or
| introduce vulnerabilities.
|
| This is such a tired non-sequitur argument, with no evidence
| whatsoever to back it up that the risk is actually higher for
| open source versus closed source.
|
| I can just easily argue that a state or non-state actor could
| buy[1], bribe or simply threaten to get weak code in a
| proprietary system, without users having any means to ever
| find out. On the other hand, it is always easier (easier, not
| easy) to discover compromise in open-source like it happened
| with xz[2] and verify such reports independently.
|
| If there is no proof that compromise is less likely with
| closed source and it is far easier to discover them in open-
| source, the logical conclusion is simply open source is
| better for security libraries.
|
| Funding defensive security infrastructure which is open
| source and freely available for everyone to use, even with
| 1/100th of the NSA budget (which is effectively only
| offensive), would improve info-security enormously for
| everyone, not just against nation state actors but also
| against scammers etc. Instead
| we get companies like CS that have enormous vested interest
| in seeing that never happens and trying to scare the rest of
| us that open-source is bad for security.
|
| [1] https://en.wikipedia.org/wiki/Dual_EC_DRBG
|
| [2] https://en.wikipedia.org/wiki/XZ_Utils_backdoor
| jpc0 wrote:
| I have a different take on this.
|
| I feel having the solution open sourced isn't bad from a
| code security standpoint, but rather that it is simply not
| economically viable. To my knowledge most of the major open
| source technologies are currently funded by FAANG and
| purely because it's needed by them to conduct business and
| the moment it becomes inconvenient for them to support it
| they fork it or develop their own, see Terraform/Redis...
|
| I also cannot get behind a government funding model purely
| because it will simply become a design by committee
| nightmare because this isn't flashy tech. Just see how many
| private companies have beaten NASA to market in a pretty
| well funded and very flashy industry. The very government
| you want to fund these solutions are currently running on
| private companies infrastructure for all their IT needs.
|
| Yes, open sourcing is definitely amazing and if executed well
| will be better, just like communism.
| manquer wrote:
| Plenty of fundamental research and development happens in
| academia fairly effectively.
|
| Government has to fund it, not run it, the way any other
| grant works today. The existing foundations and non-profits
| like Apache or even mixed ones like Mozilla are fairly
| capable of handling the grants.
|
| Expecting private companies or dedicated volunteers to
| maintain mission critical libraries like xz is not a
| viable option as we are doing it now.
| jpc0 wrote:
| Seems like we agree then. There is a middle point and I
| would actually prefer for it to be some sort of open
| source one.
| mardifoufs wrote:
| I could see an open source solution with "private" or
| vendor specific definition files. But I think I'd disagree
| with the statement that open sourcing everything wouldn't
| cause any problem. Engineering isn't necessarily about peer
| reviewed studies, it's about empirical observations and
| applying the engineering method (which can be complemented
| by a more scientific one but shouldn't be confused for it).
| It's clear that this type of stuff is a game of cat and
| mouse. Attackers search for any possible vulnerability,
| bypass etc. It does make sense that exposing one side's
| machinery will make it easier for the other side to see how
| it works. A good example of that is how active hackers are
| at finding different ways to bypass Windows Defender by
| using certain types of Office file formats, or certain
| combinations of file conversions to execute code. Exposing
| the code would just make all of those immediately visible
| to everyone.
|
| Eventually that's something that gets exposed anyways, but
| I think the crucial part is timing and being a few steps
| ahead in the cat and mouse game. Otherwise I'm not sure
| what kind of proof would even be meaningful here.
| manquer wrote:
| > open sourcing everything wouldn't cause any problem
|
| That is not what I am saying; I am saying open sourcing
| doesn't cause more problems than proprietary systems,
| which is the argument OP was making.
|
| Open source is not a panacea, it is just not objectively
| worse, as OP implies.
| aforwardslash wrote:
| I actually agree there is no intrinsic advantage in
| having this piece of software as opensource - closed
| teams tend to have a more contained collaborator "blast
| radius", and you don't have 500 forks with patches that
| may modify behaviour in a subtle way and that are somehow
| conflated with the original project.
|
| On the other hand, anyone serious about malware
| development already has "the actual source code", either
| for defensive operations and offensive operations.
| sudosysgen wrote:
| > The phrase "root kit powered endpoint surveillance" is a
| mischaracterization, often fueled by misconceptions from the
| gaming community.
|
| How exactly is this a mischaracterization? Technically these
| EDR tools are identical to kernel level anticheat and they
| are identical to rootkits, because fundamentally they're all
| the same thing just with a different owner. If you disagree
| it would be nice if you explained why.
|
| As for open source EDRs becoming the target, this is just as
| true of closed source EDR. Cortex for example was hilariously
| easy to exploit for years and years until someone was nice
| enough to tell them as much. This event from CrowdStrike
| means that it's probably just as true here.
|
| The fact that the EDR is 90% of the work of attacking a
| Windows network isn't a sign that we should continue using
| EDRs. It means that nothing privileged should be in a Windows
| network. This isn't that complicated, I've administered such
| a network where everything important was on Linux while end
| users could run Windows clients, and if anything it's easier
| than doing a modern Windows/AD deployment. Good luck pivoting
| from one computer to another when they're completely isolated
| through a Linux server you have no credentials for. No
| endpoint should have any credentials that are valid anywhere
| except on the endpoint itself and no two endpoints should be
| talking to each other directly: this is in fact not very
| restrictive to end users and completely shuts down lateral
| movement - it's a far better solution than convoluted and
| insecure EDR schemes that claim to provide zero-trust but
| fundamentally can't, while following this simple rule
| actually provides you zero-trust.
|
| Look at it this way - if you (and other redteamers) can
| economically get past EDR systems for the cost of a pentest,
| what do you think competent hackers with economies of scale
| and million dollar payouts can do? For now there's enough
| systems without EDRs that many just won't bother, but as it
| spread more they will just be exploited more. This is true as
| well of the technical analogue in kernel anticheat, which you
| and I can bypass in a couple days of work.
|
| Where we are is that we're using EDRs as a patch over a
| fundamentally insecure security model in a misguided attempt
| to keep the convenience that insecurity brings.
| ndr_ wrote:
| There used to be Winpooch Watchguard, based on ClamAV. Stopped
| using it when it caused Bluescreens. A "Killer" indeed.
| intelVISA wrote:
| Security isn't really a product you can just buy or outsource,
| but here we are.
| kemotep wrote:
| Crowdstrike is a gun. A tool. But not the silver bullet. Or
| training to be able to fire it accurately under pressure at
| the werewolf.
|
| You can very easily shoot your own foot off instead of
| slaying the monster, use the wrong ammunition to be
| effective, or in this case a poorly crafted gun can explode
| in your hand when you are holding it.
| cedws wrote:
| The value CrowdStrike provides is the maintenance of the
| signature database, and being able to monitor attack campaigns
| worldwide. That takes a fair amount of resources that an open
| source project wouldn't have. It's a bit more complicated than
| a basic hash lookup program.
| ymck wrote:
| There are a number of OSS EDRs. They all suck.
|
| DAT-style content updates and signature-based prevention are
| very archaic. Directly loading content into memory and a hard-
| coded list of threats? I was honestly shocked that CS was still
| doing DAT-style updates in an age of ML and real-time threat
| feeds. There are a number of vendors who've offered it for
| almost a decade. We use one. We have to run updates a couple of
| times a year.
|
| SMH. The 90's want their endpoint tech back.
| iwontberude wrote:
| Crowdstrike isn't a company anymore, this is probably their end.
| The litigation will be death by a thousand cuts.
| t0mas88 wrote:
| Has anyone looked into their terms and conditions? Usually any
| resulting damage from software malfunctioning is excluded. Only
| the software itself being unavailable may be an SLA breach.
|
| Typically there would also be some clauses where CS is the only
| one that is allowed to determine an SLA breach, SLA breaches
| only result in future licence credits, not cash, and if you
| disagree it's limited to mandatory arbitration...
|
| The biggest impact is probably only their reputation taking a
| huge hit. Losing some customers over this and making it harder
| to win future business.
| iwontberude wrote:
| They will still need to hire lawyers to prove this. Thousands
| of litigants. I am sure there is some tort which is not
| covered by the arbitration agreement that would give a
| plaintiff standing, no?
|
| A commenter on Stack Exchange had an interesting counter: in
| some jurisdictions, any attempt to sidestep consumer law may
| be interpreted by the courts as conspiracy, which can prove
| more serious than merely accepting the original penalties.
| chii wrote:
| > Thousands of litigants
|
| i would imagine a class action suit instead of individual
| cases if this were to happen.
| iwontberude wrote:
| Potentially we will see some, but this occurred in many
| jurisdictions across the world.
| disgruntledphd2 wrote:
| They'll be sued by the insurance companies probably.
| clwg wrote:
| No big company is going to agree to the terms and conditions
| that are listed on their website, they'll have their own
| schedules for indemnification that CS would agree to, not the
| other way around. Those 300 of the Fortune 500 companies are
| going to rip CS apart.
| joelthelion wrote:
| The stock market disagrees:
| https://www.google.com/finance/quote/CRWD:NASDAQ?window=5Y
|
| To be clear, I feel investors are a bit delusional, I just
| thought it was an interesting perspective to share.
| asynchronous wrote:
| They really are delusional, as a security person crowdstrike
| was overvalued before this event, and to everyone in tech
| this shows how bad their engineering practices are.
| chii wrote:
| but they are able to insert themselves into this many
| enterprise machines! So regardless of your security
| credentials, they made good business decisions.
|
| On the other hand, this may open the door for a lot of
| companies to dump them.
| bni wrote:
| For another similar product from a competitor that there
| is no reason to believe is any better.
| Osiris wrote:
| Wow. Cause a global meltdown and only lose 18% of your stock
| value? They must be doing something that investors like.
| imtringued wrote:
| They are probably pivoting to charging ransoms aka
| "consulting fees" to fix crashing systems and those are
| priced in.
| aflag wrote:
| The stock market only had a day to react and they were also
| heavily affected by the issue. Let's see where the stock
| price goes in the following week.
| markus_zhang wrote:
| I'd bet $100 that Crowdstrike won't pay out more than $100m for
| the dozens of billions in damage.
| ai4ever wrote:
| software vendors should be required to face consequences of
| shipping a poor product.
|
| one possibility is: clawback or refunds for past payments equal
| to business damage caused by the flawed product.
| hypeatei wrote:
| I would say the companies compelling others to buy and
| install this shitty security software, e.g. cyber insurance,
| should also be punished.
| system2 wrote:
| Maybe one day people will learn what a blog is.
| delta_p_delta_x wrote:
| The moment I read 'it is a _content update_ that causes the BSOD,
| deleting it solves the problem', I was immediately willing to
| bet a hundred quid (for the non-British, that's £100) that it
| was a combination of said bad binary data and a poorly-written
| parser that didn't error out correctly upon reading invalid data
| (in this case, read an array of pointers, didn't verify that all
| of them were both non-null and pointed to valid data/code).
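|
| To make that concrete, a minimal sketch in C (hypothetical
| code, not CrowdStrike's actual parser; the "rule" type and
| handler shape are made up for illustration):
|
| #include <stddef.h>
|
| typedef struct {
|     void (*handler)(const unsigned char *, size_t);
| } rule;
|
| /* Unsafe: trusts that every slot described by the content
|    file holds a valid rule. A NULL entry faults at a small
|    offset, much like the 0x9c read discussed above. */
| void run_rules_unsafe(rule **table, size_t count,
|                       const unsigned char *data, size_t len) {
|     for (size_t i = 0; i < count; i++)
|         table[i]->handler(data, len);
| }
|
| /* Safer: validate the whole table first and reject the
|    entire content file on any bad entry instead of crashing
|    mid-parse. (Non-NULL alone doesn't prove validity; a
|    kernel parser would also check the pointer lands inside
|    the loaded image.) */
| int run_rules_checked(rule **table, size_t count,
|                       const unsigned char *data, size_t len) {
|     for (size_t i = 0; i < count; i++)
|         if (table[i] == NULL || table[i]->handler == NULL)
|             return -1;
|     for (size_t i = 0; i < count; i++)
|         table[i]->handler(data, len);
|     return 0;
| }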
|
| In the past ten years or so of having done somewhat serious
| computing and zero cybersecurity whatsoever, here is what I
| have concluded; feel free to disagree.
|
| Approximately _100%_ of CVEs, crashes, bugs, slowdowns, and pain
| points of computing have to do with various forms of
| deserialising binary data back into machine-readable data
| structures. All because a) human programmers forget to account
| for edge cases, and b) imperative programming languages allow us
| to do so.
|
| This includes everything from: decompression algorithms; font
| outline readers; image, video, and audio parsers; video game data
| parsers; XML and HTML parsers; the various
| certificate/signature/key parsers in OpenSSL (and derivatives);
| and now, this CrowdStrike content parser in its EDR program.
|
| That wager stands, by the way, and I'm happy to up the ante by
| £50 to account for my second theory.
| bostik wrote:
| > _Approximately 100% of CVEs, crashes, bugs, [...],
| deserialising binary data_
|
| I'd make that 98%. Outside of rounding errors in the margins,
| the remaining two percent is made up of logic bugs,
| configuration errors, bad defaults, and outright insecure
| design choices.
|
| Disclosure: infosec for more than three decades.
| epanchin wrote:
| They forgot to account for those edge cases
| delta_p_delta_x wrote:
| Heh, touché.
| delta_p_delta_x wrote:
| I feel vindicated but also a bit surprised that my gut
| feeling was this accurate.
| bostik wrote:
| Not really a surprise, to be honest. "Deserialisation"
| encapsulates most forms of injection attacks.
|
| OWASP top-10 was dominated by those for a very long time.
| They have only recently been overtaken by authorization
| failures.
| smackeyacky wrote:
| Hmmm. Most common problems these days are certificate related I
| would have thought. Binary data transfers are pretty rare in an
| age of base64 json bloat
| madaxe_again wrote:
| There are plenty of binary serialisation protocols out there,
| many proprietary - maybe you'll stuff that base64'd in a json
| container for transit, but you're still dealing with a binary
| decoder.
| Sakos wrote:
| I can't decide what's more damning. The fact that there was
| effectively no error/failure handling or this:
|
| > Note "channel updates ...bypassed client's staging controls
| and was rolled out to everyone regardless"
|
| > A few IT folks who had set the CS policy to ignore latest
| version confirmed this was, ya, bypassed, as this was "content"
| update (vs. a version update)
|
| If your content updates can break clients, they should not be
| able to bypass staging controls or policies.
| vladvasiliu wrote:
| The way I understand it, the policies users can configure
| are about "agent versions". I don't think there's a setting
| for "content versions" you can toggle.
| sateesh wrote:
| Maybe there isn't a switch that says "content version", but
| from the end user's perspective it is a new version. Whether
| it was a content change, or just a fix for a typo in
| documentation (say), the change being pushed is different
| from what currently exists. And for the end user, the
| configuration implies that they have a chance to decide
| whether to accept any new change being pushed or not.
| SoftTalker wrote:
| > If your content updates can break clients
|
| This is going to be what most customers did not realize. I'm
| sure Crowdstrike assured them that content updates were
| completely safe "it's not a change to the software" etc.
|
| Well they know differently now.
| miohtama wrote:
| I was immediately willing to bet a hundred quid this was C/C++
| code :)
| formerly_proven wrote:
| Not that interesting a bet considering we know it's a Windows
| driver.
| fire_lake wrote:
| Yes indeed. If you are doing this kind of job, reach for a
| parser generator framework and fuzz your program.
|
| Also go read Parse Don't Validate https://lexi-
| lambda.github.io/blog/2019/11/05/parse-don-t-va...
| teeheelol wrote:
| Yep.
|
| Looking at how this whole thing is pasted together, there's
| probably a regex engine in one of those sys files somewhere
| that was doing the "parsing"...
| lolinder wrote:
| > reach for a parser generator framework and fuzz your
| program
|
| I agree to the second but disagree on the first. Parser
| generator frameworks produce a lot of code that is hard to
| read and understand and they don't necessarily do a better
| job of error handling than you would. A hand-written
| recursive descent parser will usually be more legible, will
| clearly line up with the grammar that you're supposed to be
| parsing, and will be easier to add _better_ error handling
| to.
|
| Once you're aware of the risks of a bad parser you're halfway
| there. Write a parser with proper parsing theory in mind and
| in a language that forces you to handle all cases. Then fuzz
| the program, turn bad inputs that turn up into permanent
| regression tests, and write your own tests with your
| knowledge of the inner workings of your parser in mind.
|
| This isn't like rolling your own crypto because the
| alternative isn't a battle-tested open source library, it's a
| framework that generates a brand new library that only you
| will use and maintain. If you're going to end up with a
| bespoke library anyway, you ought to understand it well.
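|
| For illustration, a hand-written recursive descent parser
| for a toy grammar, list := num ("," num)*, might look like
| this sketch in C; note how each function maps to one
| grammar rule and every failure path is explicit:
|
| #include <ctype.h>
|
| typedef struct { const char *p; int ok; } parser;
|
| /* num := digit+ */
| static long num(parser *s) {
|     if (!isdigit((unsigned char)*s->p)) { s->ok = 0; return 0; }
|     long v = 0;
|     while (isdigit((unsigned char)*s->p))
|         v = v * 10 + (*s->p++ - '0');
|     return v;
| }
|
| /* list := num ("," num)* */
| static int list(parser *s, long *out, int max) {
|     int n = 0;
|     out[n++] = num(s);
|     while (s->ok && *s->p == ',' && n < max) {
|         s->p++;
|         out[n++] = num(s);
|     }
|     if (*s->p != '\0') s->ok = 0;  /* trailing junk: error */
|     return n;
| }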
| mtlmtlmtlmtl wrote:
| There's at least five different things that went wrong
| simultaneously.
|
| 1. Poorly written code in the kernel module crashed the whole
| OS, and kept trying to parse the corrupted files, causing a
| boot loop, instead of handling the error gracefully and
| deleting/marking the files as corrupt (a sketch of this
| follows the list).
|
| 2. Either the corrupted files slipped through internal testing,
| or there is no internal testing.
|
| 3. Individual settings for when to apply such updates were
| apparently ignored. It's unclear whether this was a glitch or
| standard practice. Either way I consider it a bug(it's just a
| matter of whether it's a software bug or a bug in their
| procedures).
|
| 4. This was pushed out everywhere simultaneously instead of
| staggered to limit any potential damage.
|
| 5. Whatever caused the corruption in the first place, which is
| anyone's guess.
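|
| On point 1, a common way to break such a loop is a crash
| marker: record that a parse is in progress before starting
| it, and quarantine the file if the marker survives a
| reboot. A rough userspace sketch (hypothetical; the file
| names and parse_content are made up, and a kernel driver
| would use boot-counter machinery instead):
|
| #include <stdio.h>
|
| int parse_content(const char *file);  /* assumed parser */
|
| int load_content(const char *file, const char *marker) {
|     FILE *m = fopen(marker, "r");
|     if (m) {            /* previous attempt never finished */
|         fclose(m);
|         remove(file);   /* or rename it to .quarantine */
|         remove(marker);
|         return -1;      /* run without this update */
|     }
|     if (!(m = fopen(marker, "w"))) return -1;
|     fclose(m);
|     if (parse_content(file) != 0) {
|         remove(file);
|         remove(marker);
|         return -1;
|     }
|     remove(marker);     /* success: clear the mark */
|     return 0;
| }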
| rwmj wrote:
| Zero effort to fuzz test the parser too. I mean, we _know_
| how to harden parsers against bugs and attacks, and any semi-
| competent fuzzer would have caught such a trivial bug.
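|
| For scale, a libFuzzer harness is only a few lines (a
| sketch; parse_content stands in for the real parser entry
| point):
|
| #include <stdint.h>
| #include <stddef.h>
|
| int parse_content(const uint8_t *data, size_t size);
|
| /* libFuzzer calls this with millions of mutated inputs;
|    any crash or sanitizer report is a finding. */
| int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
|     parse_content(data, size);
|     return 0;
| }
|
| Build with clang -g -fsanitize=fuzzer,address and just let
| it run.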
| chrisjj wrote:
| The triggering file was all zeros.
|
| Is it not possible that only this pattern caused the crash,
| and that fuzzing omitted to try this unfuzzy pattern?
| gliptic wrote:
| No, it wasn't. Crowdstrike denied it had to do with zeros
| in the files.
| jojobas wrote:
| At this point I wouldn't be paying too much attention to
| what Crowdstrike is saying.
| hello_moto wrote:
| Have to speak the truth albeit at minimum, in case
| legal...
| kchr wrote:
| Which also explains why they, only if needed to cover
| their back legally, confirm or deny details being shared
| on social and mass media.
| watwut wrote:
| Possible? Yes. Likely? No.
| monsieurbanana wrote:
| In my limited experience, I thought any serious fuzzing
| program does test for all "standard" patters like only
| null bytes, empty strings, etc...
| formerly_proven wrote:
| Instrumented fuzzing (like AFL and friends) tweaks the
| input to traverse unseen code paths in the target, so
| they're super quick to find stuff like "heyyyyy, nobody
| is actually checking if this offset is in bounds before
| loading from that address".
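|
| i.e. exactly the kind of check whose absence a coverage-
| guided fuzzer surfaces almost immediately (illustrative C,
| not CrowdStrike's code):
|
| #include <stdint.h>
| #include <stddef.h>
| #include <string.h>
|
| /* Read a u32 at offset "off" from an untrusted buffer,
|    failing instead of reading out of bounds. Written so the
|    bounds test itself can't overflow. */
| int read_u32(const uint8_t *buf, size_t len,
|              size_t off, uint32_t *out) {
|     if (len < sizeof(uint32_t) || off > len - sizeof(uint32_t))
|         return -1;
|     memcpy(out, buf + off, sizeof(uint32_t));
|     return 0;
| }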
| omeid2 wrote:
| The files in question have a magic number of "0xAAAAAAAA",
| so it is not possible that the file was all zeros.
| Retr0id wrote:
| Competent fuzzers don't just use random bytes, they
| systematically explore the state-space of the target
| program. If there's a crash state to be found by feeding
| in a file full of null bytes, it's probably going to be
| found quickly.
|
| A fun example is that if you point AFL at a JPEG parser,
| it will eventually "learn" to produce valid JPEG files as
| test cases, without ever having been told what JPEG file
| is supposed to look like.
| https://lcamtuf.blogspot.com/2014/11/pulling-jpegs-out-
| of-th...
| rwmj wrote:
| AFL is really "magical". It finds bugs very quickly and
| with little effort on our part except to leave it running
| and look at the results occasionally. We use it to fuzz
| test a variety of file formats and network interfaces,
| including QEMU image parsing, nbdkit, libnbd, hivex. We
| also use clang's libfuzzer with QEMU which is another
| good fuzzing solution. There's really no excuse for
| CrowdStrike not to have been using fuzzing.
| layer8 wrote:
| No, it wasn't all zeros:
| https://x.com/patrickwardle/status/1814782404583936170
| mavhc wrote:
| AV software is a great target for malware, badly written,
| probably runs too much stuff in the kernel, tries to parse
| everything
| Comfy-Tinwork wrote:
| And at the very least straight to system level access if
| not more.
| MyFedora wrote:
| Anti-cheats also whitelist legit AV drivers, even though
| cheaters exploit them to no end.
| londons_explore wrote:
| AV software needs kernel privileges to have access to
| everything it needs to inspect, but the actual inspection
| of that data should be done with no privileges.
|
| I think most AV companies now have a helper process to do
| that.
|
| If you successfully exploit the helper process, the worst
| damage you ought to be able to do is falsely find files
| to be clean.
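|
| On a POSIX system that split might look like the sketch
| below (Windows would use a separate service with a
| restricted token, but the shape is the same; inspect() is
| a stand-in for the scanning logic):
|
| #include <sys/types.h>
| #include <sys/wait.h>
| #include <unistd.h>
|
| int inspect(const char *path);  /* assumed scanner */
|
| /* Scan in a child that has dropped privileges, so a bug in
|    the complex, parser-heavy scanner can't be turned into
|    root-level code execution. */
| int scan_sandboxed(const char *path) {
|     pid_t pid = fork();
|     if (pid == 0) {
|         /* 65534 is conventionally "nobody"; bail out if we
|            can't shed privileges. */
|         if (setgid(65534) != 0 || setuid(65534) != 0)
|             _exit(2);
|         _exit(inspect(path) ? 1 : 0);
|     }
|     int status = 0;
|     if (pid < 0 || waitpid(pid, &status, 0) < 0) return -1;
|     return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
| }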
| jatins wrote:
| You are seriously overestimating the engineering practices
| at these companies. I have worked in "enterprise security"
| previously, though not at this scale. In a previous life I
| worked with one of the engineering leaders currently at
| Crowdstrike.
|
| I'll bet you this company has some arbitrary unit test
| coverage requirements for PRs which developers game by
| mocking the heck out of dependencies. I am sure they have
| some vanity sonarqube integration to ensure great "code
| quality". This likely also went through manual QA.
|
| However I am sure the topic of fuzz testing would not have
| come up once. These companies sell checkbox compliance, and
| they themselves develop their software the same way.
| Checking all the "quality engineering" boxes with very
| little regards for long term engineering initiatives that
| would provide real value.
|
| And I am not trying to kick Crowdstrike when they are down.
| It's the state of any software company run by suits with
| myopic vision. Their engineering blogs and their codebases
| are poles apart.
| simonh wrote:
| There is a story out that the problem was introduced in a
| post processing step after testing. That makes more sense
| than that there was no testing. If true it means they thought
| they'd tested the update, but actually hadn't.
| hulitu wrote:
| 6. No development process, no testing.
| krisoft wrote:
| How is that different from point 2?
| ratorx wrote:
| I'd also maybe add another one on the Windows end:
|
| 6) some form of sandboxing/error handling/api changes to make
| it possible to write safer kernel modules (not sure if it
| already exists and was just not used). It seems like the
| design could be better if a bad kernel module can cause a
| boot loop in the OS...
| leosarev wrote:
| There is sandboxing API in Windows. It's called running
| programs in userspace.
| hello_moto wrote:
| Run what in userspace?
| layer8 wrote:
| It's a tough problem, because you also don't want the
| system to start without the CrowdStrike protection. Or more
| generally, a kernel driver is supposedly installed for a
| reason, and presumably you don't want to keep the system
| running if it doesn't work. So the alternative would be to
| shut down the system upon detection of the faulty driver
| without rebooting, which wouldn't be much of an improvement
| in the present case.
| ratorx wrote:
| I can imagine better defaults. Assuming the threat vector
| is malicious programs running in userspace (probably
| malicious programs in kernel space is game over anyway
| right?), then you could simply boot into safe mode or
| something instead of crashlooping.
|
| One of the problems with this outage was that you
| couldn't even boot into safe mode without having the
| BitLocker recovery key.
| layer8 wrote:
| You don't want to boot into safe mode with networking
| enabled if the software that is supposed to detect
| attacks from the network isn't running. Safe mode doesn't
| protect you from malicious code in userspace, it only
| "protects" you from faulty drivers. Safe mode is for
| troubleshooting system components, not for increasing
| security.
|
| I don't know the exact reasoning why safe mode requires
| the BitLocker recovery key, but presumably not doing so
| would open up an attack vector defeating the BitLocker
| protection.
| Uvix wrote:
| Normally BitLocker gets the key from the TPM, which will
| have its own driver that's likely disabled in Safe Mode.
| discostrings wrote:
| The BitLocker configurations I've seen over the last few
| days don't require the recovery key to enter safe mode.
| sm_1024 wrote:
| Doesn't Microsoft support eBPF on Windows?
|
| https://github.com/microsoft/ebpf-for-windows
| dartos wrote:
| Bugs happen.
|
| Not staggering the updates is what blew my mind.
| londons_explore wrote:
| Since the issue manifested at 04:09 UTC, which is 11pm
| where CrowdStrike's HQ is, I would guess someone was working
| late at night and skipped the proper process so they could
| get the update done and go to bed.
|
| They probably considered it low risk, had done similar
| things hundreds of times before, etc.
| dartos wrote:
| > They probably considered it low risk
|
| Wild that anyone would consider anything in the "critical
| path" low risk. I would bet that they just don't do
| rolling releases normally since it never caused issues
| before.
| hello_moto wrote:
| Companies these days are global btw.
|
| Not everyone is working in the same timezone.
| londons_explore wrote:
| They don't appear to have engineering jobs in any
| location where that would be considered regular office
| hours...
| hello_moto wrote:
| https://crowdstrike.wd5.myworkdayjobs.com/crowdstrikecareers
|
| I see remote, Israel, Canada.
|
| https://crowdstrike.wd5.myworkdayjobs.com/en-
| US/crowdstrikec...
|
| This one specifically Spain and Romania
|
| I know they bought companies all over the globe from
| Denmark to other locations.
| londons_explore wrote:
| 04:09 UTC is 07:09 AM in Israel. Doubt an engineer was
| doing a push then either...
|
| All the other engineering locations seem even less
| likely.
| vitus wrote:
| On Friday, no less. (Israel's weekend is Friday /
| Saturday instead of the usual Saturday / Sunday.)
| kchr wrote:
| A good reminder of the fact that your Thursday might be
| someone else's Friday.
| rco8786 wrote:
| Number 4 continues to be the most surprising bit to me. I
| could not fathom having a process that involves deploying to
| 8.5 million remote machines simultaneously.
|
| Bugs in code I can almost always understand and forgive, even
| the ones that seem like they'd be obvious with hindsight. But
| this is just an egregious lack of the most basic rollout
| standards.
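|
| The machinery for a basic staged rollout is tiny. A sketch
| of the client-side bucketing (hypothetical: FNV-1a over a
| stable machine ID, with the server publishing a rollout
| percentage that ramps 1% -> 10% -> 100% as health reports
| come back clean):
|
| #include <stdint.h>
|
| /* Deterministically place a machine in 1 of 100 buckets. */
| static unsigned bucket(const char *machine_id) {
|     uint64_t h = 0xcbf29ce484222325ULL;       /* FNV-1a */
|     for (const char *p = machine_id; *p; p++) {
|         h ^= (unsigned char)*p;
|         h *= 0x100000001b3ULL;
|     }
|     return (unsigned)(h % 100);
| }
|
| int should_apply(const char *id, unsigned rollout_pct) {
|     return bucket(id) < rollout_pct;
| }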
| gitfan86 wrote:
| They probably don't get to claim agile story points until
| the ticket is in a finished state. And they probably have a
| culture where vanity metrics like "velocity" are
| prioritized
| nmg wrote:
| This would answer the question that I've not heard anyone
| asking:
|
| what incentivized the bad decisions that led to this
| catastrophic failure?
| phs318u wrote:
| My understanding is that the culture (as reported by some
| customers) is quite aggressive and pushy. They are quite
| vocal when customers don't turn on automatic updates.
|
| It makes sense in a way - given their fast growth
| strategy (from nowhere to top 3) and desire to "do things
| differently" - the iconoclast upstarts that redefine the
| industry.
|
| Or to summarise - hubris.
| hello_moto wrote:
| To catch 0day quickly, EDR needs to know "how".
|
| The "how" here is AV definition or a way to identify the
| attack. In CS-speak: content.
|
| Catching 0day quickly results in good reputation that
| your EDR works well.
|
| If people turn off their AV definition auto-update, they
| are at risk. Why use EDR if folks don't want to stop
| attack quickly?
| LtWorf wrote:
| In theory you're correct. In practice it seems that
| crowdstrike has crashed systems with their updates much
| more often than 0day attacks.
| 77pt77 wrote:
| > They are quite vocal when customers don't turn on
| automatic updates.
|
| I'm sorry but this is the customer's fault.
|
| If I'm using your services you work for me and you don't
| get to bully me into doing whatever you think needs to be
| done.
|
| People that chose this solution need to be penalized, but
| they won't be.
| mbreese wrote:
| Customers don't always have a choice here. They could be
| restricted by compliance programs (PCI, et al) and be
| required under those terms to have auto updates on.
|
| Compliance also has to share some of the blame here, if
| best practices (local testing) aren't allowed to be
| followed in the name of "security".
| nerdjon wrote:
| This needs to keep being repeated anytime someone wants
| to blame the company.
|
| Many don't have a choice; a lot of compliance is doing x
| to satisfy a checkbox, and you don't have a lot of
| flexibility in that, or you may not be able to do things
| like process credit cards, which is kinda unacceptable
| depending on your company. (Note: I didn't say all.)
|
| CrowdStrike automatic update happens to satisfy some of
| those checkboxes.
| cruffle_duffle wrote:
| Oh the games I have to play with story points that have
| personal performance metrics attached to them. Splitting
| tickets to span sprints so there aren't holes in some
| dude's "effort" because they didn't complete some task they
| committed to.
|
| I never thought such stories were real until I
| encountered them...
| thundershart wrote:
| Surely, CrowdStrike's safety posture for update rollouts is
| in serious need of improvement. No argument there.
|
| But is there any responsibility for the clients consuming
| the data to have verified these updates prior to taking
| them in production? I haven't worn the sysadmin hat in a
| while now, but back when I was responsible for the upkeep
| of many thousands of machines, we'd never have blindly
| consumed updates without at least a basic smoke test in a
| production-adjacent UAT type environment. Core OS updates,
| firmware updates, third party software, whatever -- all of
| it would get at least some cursory smoke testing before
| allowing it to hit production.
|
| On the other hand, given EDR's real-world purpose and the
| speed at which novel attacks propagate, there's probably a
| compelling argument for always taking the latest
| definition/signature updates as soon as they're available,
| even in your production environments.
|
| I'm certainly not saying that CrowdStrike did nothing wrong
| here, that's clearly not the case. But if conventional
| wisdom says that you should kick the tires on the latest
| batch of OS updates from Microsoft in a test environment,
| maybe that same rationale should apply to EDR agents?
| stoolpigeon wrote:
| I think point 3 of the grandparent indicates admins were
| not given an opportunity to test this.
|
| My company had a lot of Azure VMs impacted by this and
| I'm not sure who the admin was who should have tested it.
| Microsoft? I don't think we have anything to do with
| CrowdStrike software on our VMs. (I think - I'm sure
| I'll find out this week.)
|
| Edit: I just learned the Azure central region failure
| wasn't related to the larger event - and we weren't
| impacted by the CrowdStrike issue - I didn't know they were
| two different things. So the second part of my comment is
| irrelevant.
| thundershart wrote:
| Oh, I'd missed point #3 somehow. If individual consumers
| weren't even given the opportunity to test this, whether
| by policy or by bug, then ... yeesh. Even worse than I'd
| thought.
|
| Exactly which team owns the testing is probably left up
| to each individual company to determine. But ultimately,
| if you have a team of admins supporting the production
| deployment of the machines that enable your business,
| then someone's responsible for ensuring the availability
| of those machines. Given how impactful this CrowdStrike
| incident was, maybe these kinds of third-party auto-
| update postures need to be reviewed and potentially
| brought back into the fold of admin-reviewed updates.
| kiitos wrote:
| > But is there any responsibility for the clients
| consuming the data to have verified these updates prior
| to taking them in production?
|
| In the boolean sense, yes. United Airlines (for example)
| is ultimately responsible for their own production
| uptime, so any change they apply without validation is a
| risk vector.
|
| In pragmatic terms, it's a bit fuzzier. Does CrowdStrike
| provide any _practical_ way for customers to validate,
| canary-deploy, etc. changes before applying them to
| production? And not just changes with type=important, but
| _all_ changes? From what I understand, the answer to that
| question is no, at least for the type=channel-update
| change that triggered this outage. In which case I think
| the blame ultimately falls almost entirely on
| CrowdStrike.
| cozzyd wrote:
| Arguably United Airlines shouldn't have chosen a product
| they can't test updates of, though maybe there are no
| good options.
| suzzer99 wrote:
| Yeah, one of the major problems seems to be CrowdStrike's
| assumption that channel files are benign. Which isn't
| true if there's a bug in your code that only gets
| triggered by the right virus definition.
|
| I don't know how you could assert that this is
| impossible, hence channel files should be treated as
| code.
| thundershart wrote:
| > From what I understand, the answer to that question is
| no, at least for the type=channel-update change that
| triggered this outage. In which case I think the blame
| ultimately falls almost entirely on CrowdStrike.
|
| Honestly, it hadn't even occurred to me that software
| like this, marketed at enterprise customers, _wouldn't_
| have this kind of control already available. It seemed
| like such an obvious thing for any big organization to
| insist on that I just took it for granted that it
| existed.
|
| Whoops.
| volkl48 wrote:
| It's not an option. While the admins at the customer have
| the ability to control when/how revisions of the client
| software go out (and thus can, and generally do, do their
| own testing, can decide to stay one rev back as default,
| etc.), there is no control over updates to the kind of
| update/definition files that were the primary cause here.
|
| Which is also why you see every single customer affected
| - what you are suggesting is simply not available to them
| at present.
|
| At least for now - I imagine that some kind of
| staggered/slowed/ringed option will have to be
| implemented in the future if they want to retain
| customers.
| mbreese wrote:
| For me, number 1 is the worst of the bunch. You should
| always expect that there will be bugs in processes, input
| files, etc... the fact that their code wasn't robust enough
| to recognize a corrupted file and not crash is inexcusable.
| Especially in kernel code that is so widely deployed.
|
| If any one of the five points above hadn't happened, this
| event would have been avoided. However, if number 1 had
| been addressed - any of the others could have happened (or
| all at the same time) and it would have been fine.
|
| I understand that we should assume that bugs will be
| present anywhere, which is why staggered deployments are
| also important. If there had been staggered deployments,
| the damage would still have happened, but it would have
| been localized. I think security people would argue
| against a staged deployment though, as if it were
| discovered what the new definitions protected against, an
| exploit could be developed quickly to put those servers
| that aren't in the "canary" group at risk. (At least in
| theory -- I can't see how staggering deployment over a
| 6-12 hour window would have been that risky.)
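|
| For illustration, the gating logic for such a stagger can
| be tiny. A sketch using an FNV-1a hash over a stable
| machine ID - an assumption for illustration, not
| CrowdStrike's (nonexistent) mechanism:
|
|     #include <stdbool.h>
|     #include <stdint.h>
|
|     /* FNV-1a: a stable hash, so each machine always lands
|        in the same bucket (0-99) across rollouts. */
|     static uint32_t fnv1a(const char *s) {
|         uint32_t h = 2166136261u;
|         while (*s) { h ^= (uint8_t)*s++; h *= 16777619u; }
|         return h;
|     }
|
|     /* Apply the update only once the server-side rollout
|        percentage reaches this machine's bucket; ramping
|        rollout_percent 1 -> 10 -> 100 over hours bounds the
|        blast radius of a bad update. */
|     bool update_enabled(const char *machine_id,
|                         uint32_t rollout_percent) {
|         return fnv1a(machine_id) % 100 < rollout_percent;
|     }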
| timmytokyo wrote:
| They're all terrible, but I agree #1 is particularly
| egregious for a company ostensibly dedicated to security.
| A simple fuzz tester would have caught this type of bug,
| so they clearly don't perform even a minimal amount of
| testing on their code.
| nsguy wrote:
| Totally agree. Not only would a coverage-guided fuzzer
| catch this; they should also be adding every single file
| they send out to the corpus of that automated fuzz
| testing, so they get somewhat increased coverage on
| their parser.
|
| There may not be out-of-the-box fuzzers that test device
| drivers, so you hoist all the parser code, build it into a
| stand-alone application, and fuzz that.
|
| This is likely a form of technical debt. I can understand
| not doing all of this on day #1 when you have 5
| customers, but at some point as you scale up you need to
| change the way you look at risk.
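|
| A minimal libFuzzer-style harness for that hoisted parser
| could be this small (parse_channel_file is a hypothetical
| name for the extracted entry point):
|
|     /* build: clang -g -fsanitize=fuzzer,address \
|               harness.c parser.c */
|     #include <stddef.h>
|     #include <stdint.h>
|
|     /* Hypothetical userspace build of the driver's
|        channel-file parser. */
|     int parse_channel_file(const uint8_t *buf, size_t len);
|
|     int LLVMFuzzerTestOneInput(const uint8_t *data,
|                                size_t size) {
|         /* Must reject any input without crashing; ASan
|            flags OOB reads like the 0x9c dereference. */
|         parse_channel_file(data, size);
|         return 0;
|     }
|
| Seeding the corpus with every channel file ever shipped,
| as suggested above, gives the mutator realistic structure
| to start from.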
| jayd16 wrote:
| You admit that bugs are inevitable and then claim a
| bug-free parser as the most important bullet. That seems
| flawed to me. It would certainly be nice, but is it
| achievable?
|
| Policy changes seem more reliable and would catch other,
| as of yet unknown classes of bugs.
| throwaway5752 wrote:
| I disagree. Has to be 4, something will always go wrong,
| so you have to deliver in cohorts.
|
| That goes equally if it was a Windows Update rolled out
| in one motion that broke the Falcon agent/driver, or if
| it was CrowdStrike. There is almost no excuse for a
| global rollout without telemetry checks, whether it's
| security agent updates or OS patches.
| layer8 wrote:
| Malware signature updates are supposed to be deployed ASAP,
| because every minute may count when a new attack is
| spreading. The mistake may have been to apply that policy
| indiscriminately.
| mrbombastic wrote:
| And here I thought shipping a new version on the app store
| was scary.
|
| Is there anything we can take from other
| professions/tradecraft/unions/legislation to ensure shops
| can't skip the basic best practices we are aware of in the
| industry like staged rollouts? How do we set incentives to
| prevent this? Seriously the App Store was raking in $$ from
| us for years with no support for staged rollouts and no
| other options.
| avree wrote:
| A lot of snarky replies to this comment, but the reality is
| that if you were selling an anti-virus, identified a
| malicious virus, and then chose not to update millions of
| your machines with that virus's signature, you'd also be in
| the wrong.
| naasking wrote:
| > identified a malicious virus, and then chose not to
| update millions of your machines with that virus's
| signature, you'd also be in the wrong.
|
| No, for exactly the reason we just saw, and the same
| reason why vaccines are tested before widespread rollout.
| aforwardslash wrote:
| On the other hand, the diseases vaccines prevent don't
| have almost-instantaneous propagation; that's why vaccines
| are effective at containing them.
|
| As an example, reaction time is paramount to countering
| many kinds of attacks - that's why blocklists are so
| popular, and AS blackholing is a viable option.
| VirusNewbie wrote:
| > But this is just an egregious lack of the most basic
| rollout standards.
|
| Agreed. It's crazy that the top tech companies enforce this
| in a biblical fashion, despite all sorts of pressure to
| ship and all that. Crowdstrike went YOLO at a _global_
| scale.
| alsetmusic wrote:
| I worked at one of the big ones and we always shipped live
| to all consumer devices at the same time. But this was for
| a popular suite of products that generate a lot of consumer
| demand, so we had a rigorous QA process to make sure this
| wouldn't be a problem. As I was typing this, it occurred to
| me that zero people would have cared if this update had
| been staggered, making it pretty silly not to.
| trhway wrote:
| As the QA manager said in our recent product meeting -
| "as the canary doesn't work we roll out and test on the
| production cloud".
| robomc wrote:
| I wonder if there's a concern that staggering the malware
| signatures would open them up to lawsuits if somebody was
| hacked in between other customers getting the data and them
| getting the data.
| thundershart wrote:
| > I wonder if there's a concern that staggering the
| malware signatures would open them up to lawsuits if
| somebody was hacked in between other customers getting
| the data and them getting the data.
|
| I'd assume that sort of thing would be covered in the
| EULA and contract -- but even if it weren't, it seems
| like allowing customers to define their own definition
| update strategy would give them a pretty compelling
| avenue to claim non-liability. If CrowdStrike can
| credibly claim "hey, we made the definitions available,
| you chose to wait for 2 weeks to apply them, that's on
| you", then it becomes much less of a concern.
| rainsford wrote:
| > 2. Either the corrupted files slipped through internal
| testing, or there is no internal testing.
|
| This is the most interesting question to me because it
| doesn't seem like there is an obviously guessable answer. It
| seems very unlikely to me that a company like CrowdStrike
| pushes out updates of any kind without doing some sort of
| testing, but the widespread nature of the outage would also
| seem to suggest any sort of testing setup should have caught
| the issue. Unless it's somehow possible for CrowdStrike to
| test an update that was different than what was deployed,
| it's not obvious what went wrong here.
| bloopernova wrote:
| I had read somewhere that the definition file was corrupted
| after testing, during the final CI/CD pipeline.
| shrimp_emoji wrote:
| Well, Microsoft led by example with #2:
| https://news.ycombinator.com/item?id=20557488
| pclmulqdq wrote:
| Number 4 is what everyone will fixate on, but I have the
| biggest problem with number 1. Anything like this sort of
| file should have (1) validation on all its pointers and (2)
| probably >2 layers of checksumming/signing. They should
| generally expect these files to get corrupted in transit once
| in a while, but they didn't seem to plan for anything other
| than exactly perfect communication between their intent and
| their _kernel driver_.
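|
| As a sketch of what (1) might look like - with a
| hypothetical header layout, not the real channel-file
| format - every offset read from disk gets checked against
| the mapped length before any dereference:
|
|     #include <stddef.h>
|     #include <stdint.h>
|
|     /* Hypothetical on-disk layout: a header followed by a
|        table of offsets into the file body. */
|     typedef struct {
|         uint32_t magic;     /* expected file signature */
|         uint32_t count;     /* number of table entries */
|         uint32_t offsets[]; /* offsets into the file */
|     } table_hdr;
|
|     /* Returns a validated pointer into buf, or NULL if the
|        header, table, or offset doesn't fit in the file.
|        (Assumes buf is suitably aligned for table_hdr.) */
|     static const void *entry_at(const uint8_t *buf,
|                                 size_t len, uint32_t idx) {
|         const table_hdr *h = (const table_hdr *)buf;
|         if (len < sizeof *h) return NULL;
|         if (h->magic != 0x900DF11Eu) return NULL; /* made-up */
|         if (h->count > (len - sizeof *h) / sizeof(uint32_t))
|             return NULL;       /* table itself in bounds? */
|         if (idx >= h->count) return NULL;
|         if (h->offsets[idx] >= len) return NULL;
|         return buf + h->offsets[idx];
|     }
|
| Cheap checks like these turn a wild read at 0x9c into a
| clean "reject this file" path.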
| dcuthbertson wrote:
| I wonder if it was pushed anywhere that didn't crash, as an
| extension of "It works on my machine. Ship it!"
|
| I've built a couple of kernel drivers over the years and what
| I know is that ".sys" files are to the kernel as ".dll" files
| are to user-space programs in that the ones with code in them
| run only after they are loaded and a desired function is run
| (assuming boilerplate initialization code is good).
|
| I've never made a data-only .sys file, but I don't see why
| someone couldn't. In that case, I'd guess that no one ever
| checked that it was correct, and the service/program that
| loads it didn't do any verification either -- why would it?
| The developers of said service/program would tend to trust
| that their own data .sys file is valid, never thinking
| they'd release a broken file or considering that files
| sometimes get corrupted -- another failure mode waiting to
| happen on some unfortunate soul's computer.
| kchr wrote:
| The file extension is `sys` by convention; there's nothing
| magical about it and it's not handled in any special way by
| the OS. In the case of CrowdStrike, there seems to be some
| confusion as to why they use this file extension, since it's
| only supposed to be a config/data file to be used by the
| real kernel driver.
| dcuthbertson wrote:
| Thanks. I understand that '.sys' is a naming convention.
| I'd guess that they used it because those config/data
| files are used by their kernel driver, which makes kernel
| vs user-space files easier to distinguish.
| cynicalsecurity wrote:
| I'm betting on them having no internal testing.
| fhub wrote:
| #1 could be split into two parts, I think: the Microsoft
| kernel side and the CrowdStrike module side.
| LtWorf wrote:
| > 4. This was pushed out everywhere simultaneously instead of
| staggered to limit any potential damage.
|
| Most importantly it was never tested at all :D
| spike021 wrote:
| 6. Companies using CS have no testing to verify that new
| updates won't break anything.
|
| At any SWE job I've worked over my entire career, nothing
| was deployed with new versions of dependencies without
| testing them against a staging environment first.
| strunz wrote:
| Crowdstrike doesn't give that option. Updates happen
| without a choice to "keep you safe".
| bryant wrote:
| Of all of these, I think #3 has CrowdStrike the most
| exposed, legally. Companies with robust update and config
| management protocols got burned by this as well, including
| places like hospitals and others with mission-critical
| systems where config management is more strictly enforced.
|
| If the CrowdStrike selloff continues, I'm betting this will
| be why.
|
| (There's a chance I'll make trading decisions based on this
| rationale in the next 72 hours, though I'm not certain yet)
| cratermoon wrote:
| > Individual settings for when to apply such updates were
| apparently ignored.
|
| I've heard that said elsewhere, but I haven't found a source
| for it at all. Are you able to point to one for me?
| bradley13 wrote:
| No bet. There are two failures here. (1) Failing to check the
| data for validity, and (2) Failing to handle an error
| gracefully.
|
| Both of these are undergraduate-level techniques. Heck, they
| are covered in most first-semester programming courses. Either
| of these failures is inexcusable in a professional product,
| much less one that is running with kernel-level privileges.
|
| Bet: CrowdStrike has outsourced much of its development work.
| ahoka wrote:
| What do you mean by outsourced?
| Rinzler89 wrote:
| He probably means work was sent offshore to offices with
| cheaper labor that's less skilled or less invested in
| delivering quality work. There's no proof of that yet,
| though; people just like to throw the blame on offshoring
| whenever $BIG_CORP fucks up, as if all programmers in the
| US are John Carmack and can never cause catastrophic
| fuckups with their code or processes.
| jojobas wrote:
| Not everyone in the US might be Carmack, but it's
| ridiculously nearsighted to assert that cultural
| differences don't play into people's desire and ability to
| Do It Right.
| Rinzler89 wrote:
| It's not cultural differences that make the difference in
| output quality, it's pay and the quality standards of the
| output set by the team/management, which is also mostly a
| function of pay, since underpaid and unhappy developers
| tend not to care at all beyond doing the bare minimum to
| not get fired (#notmyjob, the lying-flat movement, etc.).
|
| You think everyone writing code in the US would give two
| shits about the quality of their output if they see the
| CEO pocketing another private jet while they can barely
| make big-city rent?
|
| Hell, even well-paid devs at top companies in the US can
| be careless and lazy if their company doesn't care about
| quality. Have you seen some of the vulnerabilities and
| bugs that make it into the Android source code and onto
| Pixel devices? And guess what, that code was written by
| well-paid developers in the US, hired at Google leetcode
| standards, yet it would give far-east sweatshops a run for
| their money in terms of carelessness. It's what you get
| when you have a high barrier of entry but a low barrier
| of output quality, where devs just care about "rest and
| vest".
| bradley13 wrote:
| I was talking about outsourcing (and not necessarily
| offshoring). Too many companies like CrowdStrike are run
| by managers who think that management, sales, and
| marketing are the important activities. Software
| development is just an unpleasant expense that needs to
| be minimized. Hence: outsourcing.
|
| That said, I have had some experience with classic
| offshoring. Cultural differences make a huge difference!
|
| My experience with "typical" programmers from India,
| China, et al is that they do _exactly_ what they are
| told. Their boss makes the design decisions down to the
| last detail, and the "programmers" are little more than
| typists. I specifically remember one sweatshop where the
| boss looped continually among the desks, giving each
| person very specific instructions of what they were to do
| next. The individual programmers implemented his
| instructions literally, with zero thought and zero
| knowledge of the big picture.
|
| Even if the boss was good enough to actually keep the big
| picture of a dozen simultaneous activities in his head,
| his non-thinking minions certainly made mistakes. I have
| no idea how this all got integrated and tested, and I
| probably don't want to know.
| Rinzler89 wrote:
| > That said, I have had some experience with classic
| offshoring. Cultural differences make a huge difference!
|
| Sure, but there's no proof yet that was the case here.
| That's just massive speculation based on anecdotes on
| your side. There are plenty of offshore devs who can run
| rings around western devs.
| Spooky23 wrote:
| Staff trained at outsourcers have a different type of
| focus. My experience is more operational, and usually the
| training for those guys is about restoration to hit SLA,
| period. That makes root cause harder to ID sometimes.
|
| It doesn't mean 'Murica better, just that the origin
| story of staff matters, especially if you don't have good
| processes around things like RCA.
| jojobas wrote:
| Western slacker movements never came close to deadma or
| the dedicated indifference in the face of samsara. You
| seem to have a lot of experience with the former and
| little of the latter two, but what do I know.
|
| Every stereotype exists for a reason.
| ahoka wrote:
| Offshoring and outsourcing are very different. It would
| also be very hard to talk about offshoring at a company
| claiming to provide services in 170 countries.
| spotplay wrote:
| It's probably just the common US-centric bias that external
| development teams, particularly those overseas, may deliver
| subpar software quality. This notion is often veiled under
| seemingly intellectual critiques to avoid overt xenophobic
| rhetoric like "They're taking our jobs!".
|
| Alternatively, there might be a general assumption that
| lower development costs equate to inferior quality, which
| is a flawed yet prevalent human bias.
| chuckadams wrote:
| "You get what you pay for" is still a reasonable metric,
| even if it is more a relative scale than an absolute one.
| danielPort9 wrote:
| > Either of these failures is inexcusable in a professional
| product
|
| Don't we have those kinds of failures in almost every
| professional product? I've been working in the industry for
| over a decade, and in every single company we had those bugs.
| The only difference was that none of those companies were
| developing kernel modules or whatever. Simple saas. And no,
| none of the bugs were outsourced (the companies I worked for
| hired only locals and people in the range of +- 2h time zone)
| variadix wrote:
| More or less. Binary parsers are the easiest place to find
| exploits because of how hard they are to get right: bounds
| checks, overflow checks, pointer checks, etc. Especially when
| the data format is complicated.
| praptak wrote:
| > imperative languages allow us to do so
|
| This problem has a promising solution, WUFFS, "a memory-safe
| programming language (and a standard library written in that
| language) for Wrangling Untrusted File Formats Safely."
|
| HN discussion: https://news.ycombinator.com/item?id=40378433
|
| HN discussion of Wuffs implementation of PNG parser:
| https://news.ycombinator.com/item?id=26714831
| noobermin wrote:
| So, I also have near-zero cybersecurity expertise (I took an
| online intro course on cryptography out of curiosity) and no
| expertise in writing kernel modules either, but why, if ever,
| would you parse an array of pointers...in a file...instead of
| any other way of serializing data that doesn't include
| hardcoded array offsets in an on-disk file...
|
| Even ignoring this failure, which was catastrophic, this was
| a bad design asking to be exploited by criminals.
| Jare wrote:
| Performance, I assume. Right now it may look like the wrong
| tradeoff, but every day in between incidents like this we're
| instead complaining that software is slow.
|
| Of course it doesn't have to be either/or; you can have fast
| + secure, but it costs a lot more to design, develop,
| maintain and validate. What you can't have is a "why don't
| they just" simple and obvious solution that makes it cheap
| without making it either less secure, less performant, or
| both.
|
| Given all the other mishaps in this story, it is very well
| possible that the software is insecure (we know that), slow,
| and also still very expensive. There's a limit to how high
| you can push the triangle, but there's no bottom to how bad
| it can get.
| deaddodo wrote:
| I'm curious, how else would you store direct memory offsets?
| No matter how you store/transmit them, eventually you're
| going to need those same offsets.
|
| The problem wasn't storing raw memory offsets, it was not
| having some way to validate the data at runtime.
| lol768 wrote:
| > I'm happy to up the ante by £50 to account for my second
| theory
|
| What's that, three pints in a pub inside the M25? :P
|
| Completely agree with this sentiment though; we've known that
| handling of binary data in memory-unsafe languages has been
| risky for yonks. At the very least, fuzzing should've been
| employed here to try and detect these sorts of issues. More
| fundamentally though, where was their QA? These "channel files"
| just went out the door without any idea as to their
| validity? Was there no continuous integration check to just
| ensure they parsed with the same parser as was deployed to the
| endpoints? And why were the channel files not deployed
| gradually?
| TeMPOraL wrote:
| FWIW, before someone brings up JSON, GP's bet only makes
| sense when "binary" includes parsing text as well. In fact,
| most notorious software bugs are related to misuse of textual
| formats like SQL or JS.
| 1992spacemovie wrote:
| Interesting observation. As a non-developer, what can one do to
| enhance coverage for these types of scenarios? Fuzz testing?
| rwmj wrote:
| Fuzz testing absolutely should be used whenever you parse
| anything.
| SoftTalker wrote:
| Yeah, even if you are only parsing "safe" inputs such as
| ones you created yourself. Other bugs and sometimes even
| truly random events can corrupt data.
| throw0101d wrote:
| > _Approximately_ 100% _of CVEs, crashes, bugs, slowdowns, and
| pain points of computing have to do with various forms of
| deserialising binary data back into machine-readable data
| structures._
|
| For the record, the top 25 common weaknesses for 2023 are
| listed at:
|
| *
| https://cwe.mitre.org/top25/archive/2023/2023_top25_list.htm...
|
| Deserialization of Untrusted Data (CWE-502) was number fifteen.
| Number one was Out-of-bounds Write (CWE-787), Use After Free
| (CWE-416) was number four.
|
| CWEs that have been in every list since they started doing this
| (2019):
|
| *
| https://cwe.mitre.org/top25/archive/2023/2023_stubborn_weakn...
| lioeters wrote:
| # Top Stubborn Software Weaknesses (2019-2023)
|
| Out-of-bounds Write
|
| Improper Neutralization of Input During Web Page Generation
| ('Cross-site Scripting')
|
| Improper Neutralization of Special Elements used in an SQL
| Command ('SQL Injection')
|
| Use After Free
|
| Improper Neutralization of Special Elements used in an OS
| Command ('OS Command Injection')
|
| Improper Input Validation
|
| Out-of-bounds Read
|
| Improper Limitation of a Pathname to a Restricted Directory
| ('Path Traversal')
|
| Cross-Site Request Forgery (CSRF)
|
| NULL Pointer Dereference
|
| Improper Authentication
|
| Integer Overflow or Wraparound
|
| Deserialization of Untrusted Data
|
| Improper Restriction of Operations within Bounds of a Memory
| Buffer
|
| Use of Hard-coded Credentials
| TeMPOraL wrote:
| Yup. Almost all of them are various flavors of fucking up a
| parser or misusing it (in particular, all the injection
| cases are typically caused by writing stupid code that
| glues strings together instead of properly parsing).
| lolinder wrote:
| That's not parsing, that's the inverse of parsing. It's
| taking untrusted data and injecting it into a string that
| will later be parsed into code without treating the data
| as untrusted and adapting accordingly. It's compiling, of
| a sort.
|
| Parsing is the reverse--taking an untrusted string (or
| binary string) that is meant to be code and converting it
| into a data structure.
|
| Both are the result of taking untrusted data and assuming
| it'll look like what you expect, but both are not parsing
| issues.
| TeMPOraL wrote:
| > _It 's taking untrusted data and injecting it into a
| string that will later be parsed into code without
| treating the data as untrusted and adapting accordingly._
|
| Which is precisely why parsing should've been used here
| instead. The correct way to do this is to work at the
| level after parsing, not before it. "SELECT * FROM foo
| WHERE bar LIKE ${untrusted input}" is dumb. Parsing the
| query with a placeholder in it, replacing it as an
| abstract node in the parsed form with data, and then
| serializing to string if needed to be sent elsewhere, is
| the correct way to do it, and is immune to injection
| attacks.
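|
| Concretely, with SQLite's C API the placeholder approach
| looks like this (a sketch; foo/bar are stand-in names from
| the example above):
|
|     #include <sqlite3.h>
|
|     /* The untrusted value is bound as data into the parsed
|        statement; it can never change the query's structure. */
|     int find_rows(sqlite3 *db, const char *untrusted) {
|         sqlite3_stmt *stmt;
|         int rc = sqlite3_prepare_v2(db,
|             "SELECT * FROM foo WHERE bar LIKE ?1",
|             -1, &stmt, NULL);
|         if (rc != SQLITE_OK) return rc;
|         sqlite3_bind_text(stmt, 1, untrusted, -1,
|                           SQLITE_TRANSIENT);
|         while ((rc = sqlite3_step(stmt)) == SQLITE_ROW) {
|             /* consume the row */
|         }
|         sqlite3_finalize(stmt);
|         return rc == SQLITE_DONE ? SQLITE_OK : rc;
|     }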
| lolinder wrote:
| For SQL we tend to use prepared statements as the answer,
| which probably do some parsing under the hood but that's
| not visible to the programmer. I'd raise a lot of
| questions if I saw someone breaking out a parser to
| handle a SQL injection risk.
| TeMPOraL wrote:
| That's because prepared statements were developed before
| understanding of langsec was mature enough. They provide a
| very simple API, but it's at (or above) the right level -
| you just get to use special symbols to mark "this node
| will be provided separately", and provide it separately,
| while the API makes sure it's correctly integrated into
| the whole according to the rules of the language.
|
| (Probably one other factor is that SQL was designed in a
| peculiar way, for "readability to non-programmers", which
| tends to result with languages that don't map well to
| simple data structures. Still, there are tools that let
| you construct a tree, and will generate a valid SQL from
| that.)
|
| HTML is a better example, because it's inherently tree-
| structured, and trees tend to be convenient to work with
| in code. There it's more obvious when you're crossing
| from dumb string to parsed representation, and then back.
| stouset wrote:
| > Number one was Out-of-bounds Write (CWE-787)
|
| Surely many of these originate from deserialization of
| untrusted data (e.g., trusting a supplied length). It's
| probably documented but I'm passively curious how they
| disambiguate these cases.
| eru wrote:
| > Approximately 100% of CVEs, crashes, bugs, slowdowns, and
| pain points of computing have to do with various forms of
| deserialising binary data back into machine-readable data
| structures. All because a) human programmers forget to account
| for edge cases, and b) imperative programming languages allow
| us to do so.
|
| I wouldn't blame imperative programming.
|
| Eg Rust is imperative, and pretty good at telling you off when
| you forgot a case in your switch.
|
| By contrast, the variant of Scheme I used twenty years ago was
| functional, but didn't have checks for covering all cases. (And
| Haskell's GHC didn't have that check turned on by default a
| few years ago. Not sure if they changed that.)
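|
| (Even C compilers can approximate the exhaustiveness check:
| with -Wswitch, gcc and clang warn when an enum switch misses
| a case. A tiny illustration:)
|
|     enum kind { KIND_A, KIND_B, KIND_C };
|
|     int handle(enum kind k) {
|         switch (k) {           /* -Wswitch warns here: */
|         case KIND_A: return 1; /* KIND_C not handled */
|         case KIND_B: return 2;
|         }
|         return -1; /* fallback for invalid values */
|     }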
| seymore_12 wrote:
| >Approximately 100% of CVEs, crashes, bugs, slowdowns, and pain
| points of computing have to do with various forms of
| deserialising binary data back into machine-readable data
| structures. All because a) human programmers forget to account
| for edge cases, and b) imperative programming languages allow
| us to do so.
|
| This. One year ago UK air traffic control collapsed due to
| inability to properly parse "faulty" flight plan:
| https://news.ycombinator.com/item?id=37461695
| cedws wrote:
| I'd say that it is a bug by definition if your program
| ungracefully crashes when it's passed malformed data at
| runtime.
| stefan_ wrote:
| People are target-fixating too much. Sure, this parser crashed
| and caused the system to go down. But in an alternative
| universe they push a definition file that rejects every
| openat() or connect() syscall. Your system is now equally
| dead, except it probably won't even have the grace to restart.
|
| The whole concept of "we fuck with the system in the kernel
| based on data downloaded from the internet" is just not very
| sound or safe.
| hello_moto wrote:
| It's not, and that's the sad state of AV in Windows.
| xxs wrote:
| >(for the non-British, that's £100)
|
| next time you'd be adding /s to your posts
| back_to_basics wrote:
| "human programmers forget to account for edge cases"
|
| Which is precisely the rationale which led to Standard
| Operating Procedures and Best Practices (much like any other
| Sector of business has developed).
|
| I submit to you, respectfully, that a corporation shall never
| rise to a $75 billion market cap without bullet-proof
| adherence to such, and thus, this "event" should be properly
| characterized and viewed as a very suspicious anomaly, at the
| least.
|
| https://news.ycombinator.com/item?id=41023539 fleshes out the
| proper context.
| divan wrote:
| Related talk:
|
| 28c3: The Science of Insecurity (2011)
|
| https://www.youtube.com/watch?v=3kEfedtQVOY
| nonrandomstring wrote:
| Excellent talk. So long ago and what since?
| smsm42 wrote:
| > combination of said bad binary data and a poorly-written
| parser that didn't error out correctly upon reading invalid
| data
|
| By now, if you write any parser that deals with any outside
| data and don't fuzz the heck out of it, you are willfully
| negligent. Fuzzers are pretty easy to use, automatic and would
| likely catch any such problem pretty soon. So, did they fuzz
| and get very, very unlucky, or do they just like to live
| dangerously?
| hannasm wrote:
| Do these customers of CrowdStrike even have a say in these
| updates going out, or do they all just bend over and let
| CrowdStrike have full RCE on every machine in their
| enterprise?
|
| I sure hope the certificate authorities and other crypto folks
| get to keep that stuff off their systems at least.
| Centigonal wrote:
| I don't know if there's a way to outsource ongoing endpoint
| security to a third party like Crowdstrike _without_ giving
| them RCE (and ring 0 too) on all endpoints to be secured.
| Having Crowdstrike automate that part is kind of the point of
| their product.
| Kwpolska wrote:
| Auto-updates of "content" (what it thinks is malware) are
| mandatory and bypass the option to delay updates:
| https://twitter.com/patrickwardle/status/1814367918425079934
| raincole wrote:
| In our lifetime we'll see an auto-update to self-driving cars
| that kills millions.
|
| Well, it's likely we won't see it, because we might be among
| the millions.
| JSDevOps wrote:
| Hasn't this been debunked?
| codeulike wrote:
| 'Analysis' of the null pointer is completely missing the point.
| The simple fact of the matter is they didn't do anywhere near
| enough testing before pushing the files out. Auto-update comes
| with big responsibility; this was criminally reckless.
| CaliforniaKarl wrote:
| There are enough people in the world that some can examine how
| this happened while others simultaneously examine why this
| happened.
| mkl95 wrote:
| How feasible would it be to implement blue-green deployments in
| that kind of system?
| hatsunearu wrote:
| So was the totally empty channel file just a red herring?
| Kwpolska wrote:
| I think the file with all zeros was the fix that CS pushed out
| after they learned of their mistake.
| donatj wrote:
| I am genuinely curious what their CI process that passed this
| looks like, as well as if they're doing any sort of dogfooding or
| manual QA? Are changes just CI/CD'd out to production right away?
| webprofusion wrote:
| The girl at the supermarket checkout said she hoped her computer
| wouldn't be affected. I knowingly laughed and said "you probably
| don't have it on your own computer unless you're a bank".
|
| She said, "I installed it before for my cybersecurity course but
| I think it was just a trial."
|
| Assumptions, eh.
| hatsunearu wrote:
| I see a paradox that the null bytes are "not related" to the
| current situation and yet deleting the file seems to cure the
| issue. Perhaps the CS official statement that "This is not
| related to null bytes contained within Channel File 291 or any
| other Channel File." is poorly worded.
|
| My opinion is that CS is trying to say the null bytes themselves
| aren't the actual root cause of the issue, but merely a trigger
| for the actual root cause, which is that CSAgent.sys has a
| problem where malformed input vectors can cause it to crash.
| Well-designed programs should error out gracefully on
| foreseeable errors, like corrupted config files.
|
| If we interpret that quoted sentence such that "this" is
| referring to "the logical error", and that "the logical error" is
| the error in CSAgent.sys that causes it to crash upon reading a
| bad channel file, then that statement makes sense.
|
| This is a bit of a stretch, but so far my impression with CS
| corporate communication regarding this issue has been nothing but
| abject chaos, so this is totally on-brand for them.
| chrisjj wrote:
| > My opinion is that CS is trying to say the null bytes
| themselves aren't the actual root cause of the issue, but
| merely a trigger for the actual root cause,
|
| My opinion is they say "unrelated" because they are trying to
| say unrelated - and hence no, this was not a trigger.
| hatsunearu wrote:
| Then are the null bytes just a coincidence? Why does deleting
| it fix the issue then, and why is it that it is missing the
| 0xAAA... file signature?
| peter_retief wrote:
| I don't do windows either.
| throwyhrowghjj wrote:
| This is a pretty brief 'analysis'. The poster traces back one
| stack frame in assembly; it basically amounts to reading out
| a stack dump from gdb. It's a good starting point, I guess.
| siscia wrote:
| The thing I don't understand about all of this is another, much
| less technical and much more important one.
|
| Why was the blast radius so huge?
|
| I have deployed much less important services much more slowly,
| with automatic monitoring and rollback in place.
|
| You first deploy to beta, where you don't get customer traffic;
| if everything goes right, to a small part of your fleet; and
| then slowly increase the percentage of hosts that receives the
| updates.
|
| This would have stopped the issue immediately, and somehow I
| thought it was common practice...
| moogly wrote:
| They don't seem to dogfood their own software. They don't seem
| to think it's very useful software in their own org, I guess.
| INTPenis wrote:
| Considering the impact this incident had, they definitely should
| have a large staging environment of Windows clients to deploy to
| first.
|
| There are so many ways to avoid this issue, or at least
| minimize the risk of it happening, but as always profits come
| before people.
| andy81 wrote:
| Even if there was a canary release process for code updates,
| the config updates seem to have been on a separate channel.
|
| The expectation being that people want up-to-date virus
| detection rules constantly even if they don't want potentially
| breaking changes.
|
| The missed edge case being an untested config that breaks
| existing code.
|
| Source: Pure speculation, don't quote this in news articles.
| vbezhenar wrote:
| It wasn't a software update. It was a signature database update.
| It's supposed to roll out as fast as possible. When you learn
| about new virus, it's already in the wild, so every minute
| counts. You don't want to delay update for a day just to find
| out that your servers were breached 20 hours ago.
| TeMPOraL wrote:
| We can see clearly now that this is a stupid approach.
| Viruses don't move that fast.
|
| This situation is akin to the immune system overreacting and
| melting the patient in response to a papercut. This sometimes
| happens, but it's considered a serious medical condition, and
| I believe the treatment is to nuke someone's immune system
| entirely with hard radiation, and reinstall a less aggressive
| copy. Take from that analogy what you want.
| orf wrote:
| > Viruses don't move that fast
|
| Yes they do? And it's more akin to a shared immune system
| than a single organism.
|
| In this case, it's not like viruses move fast relative to
| the total population of machines, but within the population
| of machines being targeted they do move fast.
| TeMPOraL wrote:
| Still, better to let them spread a bit and deal with the
| localized damage than risk nuking everything. There is
| such a thing as treatment that's very effective, but not
| used because of a low probability risk of terminal
| damage.
| proveitbh wrote:
| Cite one virus that crashed the supposed 10 or 100
| million machines in 70 minutes.
|
| Just one.
| orf wrote:
| Microsoft puts the count at 8.5 million computers. So,
| percentage-wise, the MyDoom virus in 2004 infected a far
| greater % of computers in a month - which, in the context
| of internet penetration, availability and speeds (40kb/s
| average, 450kb/s fastest) in 2004, was about as fast as it
| could have spread. So it might as well have been 70 minutes,
| given that downloading a 50mb file on dial-up would take way
| longer than 70 minutes.
|
| To the smart people below:
|
| It's clear to everyone that 70 minutes is not 1 month.
| The point is that it's not a fair comparison: it would
| simply not have been possible to infect that many
| computers in 70 minutes; the internet infrastructure just
| wasn't there.
|
| It's like saying "the Spanish flu didn't do that much
| damage because there were fewer people on the planet" -
| it's a meaningless absolute comparison, whereas the
| relative comparison is what matters.
| smartpeoplebelw wrote:
| There are also orders of magnitude more machines today
| than 20 years ago -- so it should be easier to infect
| more machines now than before, and yet no one can cite a
| virus that was as quickly moving and damaging as what
| CrowdStrike did through gross negligence.
|
| Be better.
| orf wrote:
| This entire thread is stupid.
|
| Computer security as a whole has improved, whilst the
| complexity of interconnected systems has exponentially
| increased.
|
| This has made the barrier to entry for malware higher,
| and so means we no longer have the same historic examples
| of large scale worms targeting consumer machines _that we
| used to_.
|
| At the same time the financial rewards for finding and
| exploiting a vulnerability within an organisations
| complex stack have greatly increased. The rewards are
| coupled to the time it takes to execute on the
| vulnerability.
|
| This leads to what we have today: localised, and often
| specialised attacks against valuable targets that are
| executed as fast as possible in order to minimise the
| chance a target has to respond or the vulnerability they
| are exploiting to be burned.
|
| Of course the "smart people belw" must know this, so it's
| unclear why they are pretending to be dumb.
| TeMPOraL wrote:
| > _This leads to what we have today: localised, and often
| specialised attacks against valuable targets that are
| executed as fast as possible in order to minimise the
| chance a target has to respond or the vulnerability they
| are exploiting to be burned._
|
| Yup, exactly that.
|
| So what I'm saying is, it's beyond idiotic to combat this
| with a kernel-level backdoor managed by one entity and
| deployed across half the Internet. If anyone manages to
| breach _that_ , they have a way to make their attack much
| simpler and much less localized (though they're unlikely
| to be prepared to capitalize on that). A fuckup on the
| defense side, on the other hand, can kill everything
| everywhere all at once. Which is what just happened.
|
| It's a "cure" for disease that happens to both boost the
| potency of the disease, _and_ , once in blue moon,
| randomly kills the patient for no reason.
| orf wrote:
| But now you run into the tragedy of the commons.
|
| The fact is that this _does_ help organisations.
| Definitely not all of the orgs that buy Crowdstrike, but
| rapid defence against evolving threats is a valuable
| thing for companies.
|
| So, individually it's good for a company. But as a whole,
| and as currently implemented, it's not good for everyone.
|
| However that doesn't matter. Because individually it's a
| benefit.
| TeMPOraL wrote:
| That's right.
|
| Which is why I'm hoping that this incident will make both
| security professionals and regulators reconsider the idea
| of endpoint security as it's currently done, and that
| there will be some cultural and regulatory pushback.
| Maybe this will incentivize people to come up with other
| ideas on how to secure systems and companies, that don't
| look like a police state on steroids.
| 8organicbits wrote:
| ILOVEYOU is a pretty decent contender, although the
| Internet was smaller back then and it didn't "crash"
| computers, it did different damage. Computer viruses and
| worms can spread extremely quickly.
|
| > infected millions of Windows computers worldwide within
| a few hours of its release
|
| See: https://en.wikipedia.org/wiki/Timeline_of_computer_viruses_a...
| echoangle wrote:
| Can you explain why you find this idea of fast moving
| viruses so improbable? Just from the way the internet
| works, I wouldn't be surprised if every reachable host
| could be infected in a few hours if the virus can infect
| a machine in a short time (a few seconds) and would then
| begin infecting other machines. Why is that so hard to
| imagine?
| SoftTalker wrote:
| Proper firewalling for one. "Every reachable host" should
| be a fairly small set, ideally an empty set, when you're
| on the outside looking in.
|
| And operating systems aren't _that_ bad anymore. You
| don't have services out of the box opening ports on all the
| interfaces, no firewalls, accepting connections from
| everywhere, and using well-known default (or no)
| credentials.
|
| Even stuff like the recent OpenSSH bug that is remotely
| exploitable and grants root access wasn't anything close
| to this kind of disaster because (a) most computers are
| not running SSH servers on the public internet (b) the
| exploit is rather difficult to actually execute.
| Eventually it might not be, but that gives people a bit
| of breathing space to react.
|
| Most cyberattacks use old, unpatched vulnerabilites
| against unprotected systems combined with social
| engineering to get the payload past the network boundary.
| If you are within a pretty broad window of "up to date"
| on your OS and antivirus updates, you are pretty safe.
| echoangle wrote:
| The focus seems to have been the time limit though. All
| the reasons you mention are just that there aren't even
| that many targets.
| hello_moto wrote:
| The malware doesn't need to infect 100 million machines.
|
| It just needs to infect 200k devices to get to the pot: a
| hundred million dollars of ransomware.
| TeMPOraL wrote:
| It's a trivial cost to pay if the alternative is
| CrowdStrike inflicting billions of dollars of damage and
| loss of life across several countries.
|
| (I expect this to tally up to double-digit billions and
| thousands of lives lost directly to the outages when the
| dust settles.)
| hello_moto wrote:
| Trivial cost to pay from which side?
|
| The organization like MGM and London Drugs?
| nullindividual wrote:
| https://www.caida.org/catalog/papers/2003_sapphire/
|
| [SQL] Slammer spread incredibly quickly, even though the
| vulnerability was patched in the prior year.
|
| > As it began spreading throughout the Internet, it
| doubled in size every 8.5 seconds. It infected more than
| 90 percent of vulnerable hosts within 10 minutes.
|
| Worms are not technically viruses, but they can have
| similar impacts/perform similar tasks on an infected
| host.
| smartpeoplebelw wrote:
| You are off by several orders of magnitude.
|
| Also keep in mind 8.5 million is likely the count of
| machines fully impacted, not counting the machines that
| were impacted but able to be automatically recovered.
| nullindividual wrote:
| > You are off by several orders of magnitude
|
| Can you cite something? This is HN, not reddit.
|
| > Also keep in mind 8.5 million is likely the count of
| machines fully impacted and are not counting the machines
| impacted but were able to be automatically recovered.
|
| Do you have evidence of this? Please bring sources with
| you.
| aldousd666 wrote:
| No, they're in the wrong. They didn't test adequately,
| regardless of their motive for not doing so. Obviously
| reality is not backing up your theory there.
| orf wrote:
| FYI, both the following statements can be true:
|
| 1. Crowdstrike didn't test adequately
|
| 2. Viruses can move pretty fast once a foothold is gained
| UncleMeat wrote:
| https://en.wikipedia.org/wiki/SQL_Slammer
|
| There is no real "speed limit" on malware spread.
| TeMPOraL wrote:
| No, but there are impenetrable barriers. 0days in
| particular are usually very specific and affect few
| systems directly, but even the broader ones aren't usually
| followed by a blanket attack that pwns everything and
| steals all the data or monies. Just about the only way to
| achieve this kind of blast radius is to have a kernel-
| level backdoor installed in every other computer on the
| planet - which is _exactly_ what those endpoint
| "security" systems are.
| pyeri wrote:
| But why does a signature database update have to mess with
| the kernel in any way? Shouldn't such a database stay in
| userland?
| vbezhenar wrote:
| Because the kernel needs to parse the data in some way, and
| that parser apparently was broken enough. Whether it could be
| done in a more resilient manner, I don't know; you need to
| remember that antivirus works in a hostile environment and
| can't necessarily trust userspace, so they probably need to
| verify signatures and parse the payload in kernel space.
| theshrike79 wrote:
| The scanner is a Ring 0[0] program. Windows only has 2
| options: 0 and 3. Ring 3 won't work for any kind of security
| scanner, so they're forced to use 0.
|
| The proper place would be Ring 1, which doesn't exist on
| Windows.
|
| And being a kernel-level operation, it has the capability
| to crash the whole system before the actual OS has any
| chance to intervene.
|
| [0] https://en.wikipedia.org/wiki/Protection_ring
| leosarev wrote:
| Why is that so?
| hello_moto wrote:
| That's a question for Microsoft OS architects
| benchloftbrunch wrote:
| Historical reasons. Windows NT was designed to support
| architectures with only two privilege rings.
| layer8 wrote:
| All modern OSes only use rings 0 and 3. Intel is
| considering removing rings 1 and 2 in a future revision
| for that reason:
| https://www.intel.com/content/www/us/en/developer/articles/t...
| siscia wrote:
| Thanks for the clarification, this makes more sense.
| LeonB wrote:
| It's quite impressive really -- CrowdStrike were deploying a
| content update to all of their servers to warn them of the
| "nothing but nulls, anti-CrowdStrike virus".
|
| Their precognitive intelligence suggested that a world wide
| attack was only moments away. The same precognitive system
| showed that the virus was so totally incapacitating that the
| only safe response was to incapacitate the server.
|
| Knowing that the virus was capable of taking down _every_
| crowdstrike server, they didn't waste time trying it on a
| subset of servers.
|
| When you know you know.
| Ensorceled wrote:
| Surely there is a happy medium between zero
| (nil, none, nada, zilch) staging and 24 hours of rolling
| updates? A single 30-second-or-so VM test would have revealed
| this issue.
| layer8 wrote:
| There should have been a test catching the error before
| rollout. However, this doesn't require a staged rollout as
| suggested by the GP comment (testing the update on some
| customers, who would still be hosed in that case); it
| only requires executing the test before the rollout.
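|
| As a sketch of such a pre-rollout gate - assuming the
| parser can be built for userspace, with parse_channel_file
| as a hypothetical name for its entry point - the release
| pipeline simply refuses to ship any artifact the endpoint
| parser rejects:
|
|     #include <stdint.h>
|     #include <stdio.h>
|     #include <stdlib.h>
|
|     /* Hypothetical: the exact parser the endpoints run. */
|     int parse_channel_file(const uint8_t *buf, size_t len);
|
|     int main(int argc, char **argv) {
|         if (argc != 2) {
|             fprintf(stderr, "usage: %s channel-file\n",
|                     argv[0]);
|             return 2;
|         }
|         FILE *f = fopen(argv[1], "rb");
|         if (!f) { perror("fopen"); return 2; }
|         fseek(f, 0, SEEK_END);
|         long len = ftell(f);
|         rewind(f);
|         uint8_t *buf = malloc((size_t)len);
|         if (!buf ||
|             fread(buf, 1, (size_t)len, f) != (size_t)len) {
|             perror("read");
|             return 2;
|         }
|         fclose(f);
|         /* Nonzero exit fails the CI job, blocking the push. */
|         return parse_channel_file(buf, (size_t)len) == 0
|                    ? 0 : 1;
|     }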
| jrochkind1 wrote:
| Yup. If they were delaying updates to half of their customers
| for 24 hours, and in those 24 hours some of their customers
| got hacked by a zero day, say leading to ransomware, the
| comment threads would be demanding their heads for _that_!
| sateesh wrote:
| Even if it is a staged rollout, why would one do it in
| 24-hour phases? It could be an hourly (say) staggered
| rollout too.
| jrochkind1 wrote:
| Sure. And if someone showed up here with a story about
| how they got attacked and ransomwared enterprise-wide in
| the however many several hours that they were waiting for
| their turn to rollout, what do you think HN response
| would be?
|
| Hmm, maybe you could have companies pay more to be in the
| first rollout group? That'd go over well too.
| sateesh wrote:
| It doesn't matter what kind of update it was: signature,
| content, etc. The only thing that matters is whether the
| update has the potential to disrupt the user's normal
| activity (leave alone bricking the host). If yes, ensure it
| either works or has a staged rollout with a remediation plan.
| aldousd666 wrote:
| You do want to fuzz test it like crazy. It can be automated.
| Takes minutes, saves billions.
| robxorb wrote:
| "Blast radius" seems... apt.
|
| It would be rather easier to understand and explain if it were
| intentional. Likely not able to be discussed though.
|
| Anyone able to do that here?
| rplnt wrote:
| It's answered in the post (in the thread) as well. But for
| comparison, when I worked for an AV vendor we pushed maybe 4
| updates a day to a much bigger customer base (if the numbers
| reported by MS are true).
| kchr wrote:
| I'm curious, what did your deployment plan look like?
| Phased/staggered, if so how?
| heraldgeezer wrote:
| How strange to cite ResetEra, a gaming forum with a
| certain community, which may not be considered a reliable
| source.
| switch007 wrote:
| Is there commercial pressure to push out "content" updates asap
| so you can say you're quicker than your competition at responding
| to emerging threats?
| minhoryang wrote:
| Can we find an uptime (availability) graph for the CrowdStrike
| agent? Don't you think this graph should be included in the
| postmortem?
| wasabinator wrote:
| I wonder what privilege level this service runs at. If it's less
| than ring 0, I think some blame needs to go to Windows itself. If
| it's ring 0, did it really need to be that high??
|
| Surely an OS doesn't have to go completely kaput due to one
| service crashing.
| Kwpolska wrote:
| It's not a service, it's a driver. "Anti"malware drivers
| typically run with a lot of permissions to allow spying on all
| processes. Driver failures likely mean the kernel state is
| borked as well, so Windows errs on the side of caution and
| halts.
| FergusArgyll wrote:
| Why wasn't it caught?
|
| https://manifold.markets/ChrisGreene/why-didnt-the-crowdstri...
| dallas wrote:
| Those who have spent time writing NDIS/TDI drivers are those who
| know the minefield!
| MuffinFlavored wrote:
| Did this cause the Azure outage
| https://status.dev.azure.com/_event/524064579 that happened like
| 12 hours before or were they separate?
| cedws wrote:
| Does anybody know if these "channel files" are signed and
| verified by the CS driver? Because if not, that seems like a
| gaping hole for a ring 0 rootkit. Yeah, you need privileges to
| install the channel files, but once you have it you can hide
| yourself much deeper in the system. If the channel files can
| cause a segfault, they can probably do more.
|
| Any input for something that runs at such high privilege should
| be at least integrity checked. That's the basics.
|
| And the fact that you can simply delete these channel files
| suggests there isn't even an anti-tamper mechanism.
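|
| A minimal integrity gate before loading - assuming OpenSSL
| is available and that the expected digest arrives via a
| manifest signed out-of-band (both assumptions) - could look
| like:
|
|     #include <openssl/evp.h>
|     #include <string.h>
|
|     /* Compare the blob's SHA-256 against the digest from
|        the signed manifest; refuse to load on mismatch. */
|     int digest_matches(const unsigned char *buf, size_t len,
|                        const unsigned char expected[32]) {
|         unsigned char md[EVP_MAX_MD_SIZE];
|         unsigned int md_len = 0;
|         if (!EVP_Digest(buf, len, md, &md_len,
|                         EVP_sha256(), NULL))
|             return 0;
|         return md_len == 32 &&
|                memcmp(md, expected, 32) == 0;
|     }
|
| A digest check alone only catches corruption; authenticity
| still needs a signature over the manifest itself.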
| calrain wrote:
| This reminds me of the vulnerability that hit JWT tokens a few
| years ago, when you could set the 'alg' to 'none'.
|
| Surely CrowdStrike encrypts and signs their channel files, and
| I'm wondering if a file full of 0's inadvertently signaled to the
| validating software that a 'null' or 'none' encryption algo was
| being used.
|
| This could imply the file full of zeros is just fine, as the null
| encryption check passes, because it's not encrypted.
|
| That could explain why it tried to reference the null memory
| location: the null-encryption file full of zeroes just pointed
| it at memory location zero.
|
| The risk, if this is true, is that their channel-loading
| verification system is critically exposed by being able to load
| malicious channel files through disabled encryption on channel
| files.
|
| Just a hunch.
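|
| For reference, the JWT pitfall being alluded to: a verifier
| that honors the token's own (attacker-controlled) 'alg'
| header will happily accept alg=none tokens. A sketch, not
| any particular library:
|
|     #include <string.h>
|
|     typedef enum { REJECT = 0, CHECK_SIGNATURE = 1 } verdict;
|
|     /* The verifier must pin the algorithm it expects rather
|        than trust the header; "none" then fails for free. */
|     verdict choose_alg(const char *header_alg) {
|         if (strcmp(header_alg, "RS256") != 0) /* pinned */
|             return REJECT;                    /* incl. "none" */
|         return CHECK_SIGNATURE;
|     }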
| kachapopopow wrote:
| That was the first thing I thought about when I started
| analyzing this file.
| Gazoche wrote:
| What really blew my mind about this story is learning that a
| single company (CrowdStrike) has the power to push random kernel
| code to a large part of the world's IT infrastructure, at any
| time, at their will.
|
| Correct me if I'm wrong but isn't kernel-level access essentially
| God Mode on every computer their software is installed on?
| Including spying on the entire memory, running any code, deleting
| data, installing ransomware? This feels like an insane amount of
| power concentrated into the hands of a single entity, on the
| level of a nuclear submarine. Wouldn't that make them a prime
| target for all sorts of nation-state actors?
|
| This time the damage was (likely) unintentional and no data was
| lost (save for lost BitLocker keys), but were we really all this
| time one compromised employee away from the largest-ever
| ransomware attack, or even worse?
| andix wrote:
| It's not perfectly clear yet if CrowdStrike is able to push
| executable code via those updates. It looks like they updated
| some definition files and not the kernel driver itself.
|
| But the kernel driver obviously contains some bugs, so it's
| possible that those definition updates can inject code. There
| might be a bug inside the driver that allows code execution (it
| happens all the time that some file parsing code can be tricked
| into executing parts of the data). I'm not sure, but I guess a
| lot of kernel memory is not fully protected by NX bits.
|
| I still have a gut feeling that this incident was connected
| to some kind of attack. Maybe a distraction from another attack
| while everyone is busy fixing all the clients. During this
| incident security measures were certainly lowered: lists of
| BitLocker keys were printed out for service technicians to fix
| the systems, and even the fix itself was to remove part of
| the CrowdStrike protection. I would really like to know what
| was inside the C-00000291*.sys file before the update replaced
| it with all zeros. Maybe it was a cleanup job to remove
| something concerning that went wrong. But Hanlon's razor tells
| me not to trust my gut: "Never attribute to malice that which
| is adequately explained by stupidity."
| neom wrote:
| For what it's worth, I 10000% agree with your gut feeling.
| Mine is only a gut feeling too, so I didn't mention it on HN;
| we typically don't air these kinds of gut feelings because of
| the speculative directions they lead in (+the razor). But what
| you wrote is _exactly_ what is in my head, fwiw.
| milkshakes wrote:
| Falcon absolutely has a remote code execution function as a
| part of Falcon Response.
| andix wrote:
| So CrowdStrike has direct access to a lot of critical
| infrastructure? LOL.
| neom wrote:
| Well, kernel agents and drivers are not uncommon; however,
| anything touching a kernel at scale is typically something
| well understood on the system you're implementing it on. That
| aside, I gather from skimming around (so I might be wrong
| here) that people were implementing this because of a business
| case, not a technical case - I read it's mostly used to create
| compliance (I think via shifted liability). So I think it was
| probably too easy to happen, and so it happened: someone in
| the bizniz dept said "if we run this software we are compliant
| with whatever, enabling XYZ multiple of new revenue, clear
| business case!!!" and the tech people probably went "bizniz
| people want this, bizniz case is clear, this seems like a
| relatively advanced business that knows what it's doing, it
| doesn't really do much on my system and I'm mostly deploying
| it to innocuous edge user systems, so seems fine _shrug_" -
| and then a bad push happened, and lots and lots of IT
| departments had had the same conversation.
|
| Could be wrong here so if anyone knows better and can correct
| me...plz do!
| hello_moto wrote:
| A lot of people, especially the non-cybersecurity ones, are
| way off the mark, so you're not the only one.
| lyu07282 wrote:
| > implementing this because of a business case not a
| technical case
|
| There are certification requirements to do pentests/red
| teaming, and the security folk doing those will all tell you
| to install an EDR, so they picked CrowdStrike. But the
| security people have a very valid technical case for that
| recommendation.
|
| It doesn't shift liability to CrowdStrike; that's not how this
| works. In this specific case they are very likely liable due
| to gross negligence, but that is different.
| nightowl_games wrote:
| > no data was lost
|
| Data was lost in the knock-on effects of this, I assure you.
|
| > largest-ever ransomware attack
|
| A ransomware attack would be a terrible use of this power. A
| terrorist attack or cover while a country invades another
| country is a more appropriate scale of potential damage here.
| Perhaps even worse.
| wellknownfakts wrote:
| It is a well-known fact that these companies that hold huge
| sway over the world's IT landscape are commonly infiltrated at
| the top levels by intelligence agents.
| ChoGGi wrote:
| "What really blew my mind about this story is learning that a
| single company (CrowdStrike) has the power to push random
| kernel code to a large part of the world's IT infrastructure,
| at any time, at will."
|
| Isn't that every antivirus software and game anticheat?
| SoftTalker wrote:
| The OS vendors themselves (Microsoft, Apple, all the linux
| distros) have this power as well via their automatic update
| channels. As do many others who have automatically-updating
| applications. So it's not a single company, it's many
| companies.
| Gazoche wrote:
| That's true; I suppose it doesn't feel as bad because they're
| much larger companies and more in the public's eye. It's
| still scary to think about the amount of power they wield.
| Shorel wrote:
| What blew my mind is that a single company has such a good
| sales team to sell an unnecessary product to a large part of
| the world's IT.
|
| And if any part of it is necessary, then that's a failure of
| the operating system. It should be a feature of Active
| Directory or Windows.
|
| So, great job sales team, you earned your commissions, now get
| ready to jump ship, 'cause this one is sinking.
| vimbtw wrote:
| This is the mini existential crisis I have randomly. The attack
| surface of a modern IT computer is mind-bogglingly massive.
| Computers are pulling and executing code from a vast array of
| "trusted" sources without a sandbox. If any one of those
| "trusted" sources is compromised (package managers, CDNs, OS
| updates, security software updates, app updates in general,
| even specific utilities like xz) then you're absolutely
| screwed.
|
| It's hard not to be a little nihilistic about security.
| flappyeagle wrote:
| The only thing I know about crowdstrike is they hired a large
| percentage of the underperforming engineers we fired at multiple
| companies I've worked at
| lizknope wrote:
| https://www.zdnet.com/article/defective-mcafee-update-causes...
|
| April 21, 2010
|
| In 2010, McAfee caused a global IT meltdown due to a faulty
| update. Its CTO at the time was George Kurtz. Now he is CEO of
| CrowdStrike.
| nesas wrote:
| Nesa
| taormina wrote:
| Imagine if Microsoft sold you a secure operating system like
| Apple does. A staggering portion of the existing cybersecurity
| industry would be irrelevant if that ever happened.
| natdempk wrote:
| Most enterprises these days also run stuff like CrowdStrike (or
| literally CrowdStrike) on their macOS deployments. Similarly,
| Windows these days is bundled with OS-level antivirus, which is
| sufficient for non-enterprise users.
|
| I'm not in the security industry, but my take is that the
| desktop OS permissions and security model is basically wrong
| for a lot of these devices, and there is no alternative that is
| suitable or that companies are willing to invest in.
| highest-profile affected machines (airport terminals, signage,
| medical systems, etc.) should just resemble a
| phone/iPad/Chromebook in terms of security/trust, but for
| historical/cost/practical reasons are Windows PCs with
| Crowdstrike.
| kchr wrote:
| CrowdStrike uses eBPF on Linux and System Extensions on
| macOS, neither of which needs a kernel-level presence.
| Microsoft should move towards offering these kinds of
| solutions to make AV and EDR more resilient on Windows
| devices, without jeopardising system integrity and
| availability.
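|
| For context, a minimal eBPF sketch of the kind of hook this
| model allows (assumes clang, libbpf headers and a recent
| kernel; the probe target is illustrative, not what CrowdStrike
| actually attaches to):
|
| #include <linux/bpf.h>
| #include <bpf/bpf_helpers.h>
|
| /* Observe file opens from a sandboxed program. The in-kernel
|    verifier checks it before it runs; a buggy program fails to
|    load instead of crashing the machine. */
| SEC("kprobe/do_sys_openat2")
| int observe_open(void *ctx)
| {
|     bpf_printk("open observed");   /* trace output only */
|     return 0;
| }
|
| char LICENSE[] SEC("license") = "GPL";
|
| That load-time rejection is the point: the worst case for a
| bad update is a hook that doesn't load, not a boot loop.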
| cybervegan wrote:
| Boy, is CrowdStrike's software going to get seriously
| fuzz-tested now. All their vulns will be on public display in
| the next week or so.
| ai4ever wrote:
| why are openai/anthropic letting this crisis go to waste?
|
| where are the tweets from sama and amodei on how agi is going
| to fix these issues?
| ok123456 wrote:
| When your snake oil is poisonous.
| meindnoch wrote:
| Because it wasn't written in Rust!
| qingcharles wrote:
| https://tenor.com/view/oh-boy-here-we-go-again-oh-dear-omg-a...
| kachapopopow wrote:
| These "channel files" sound like they could be used to execute
| arbitrary code... Would be a big embarrassment if it shows up in
| KDU as a provider...
|
| (This is just an early guess from looking at some of the
| csagent code in the IDA decompiler; I haven't validated that
| all the sanity checks can be bypassed, as these channel files
| appear to have some kind of signature attached to them.)
| jonhohle wrote:
| I don't run CrowdStrike and to the best of my knowledge haven't
| had it installed on one of my systems (something similar ran on
| my machine at the last corporate job I had), so correct me if
| I'm wrong.
|
| It seems great pains are taken to ensure the CS driver is
| installed first _and_ cannot be uninstalled (presumably the
| remote monitor will notice) or tampered with (signed driver).
|
| Then the driver goes and loads unsigned data files that can be
| arbitrarily deleted by end users? Can these files also be
| arbitrarily added by end users to get the driver to behave in
| ways that it shouldn't? What prevents a malicious actor from
| writing a malicious data file and starting another cascade of
| failing machines or worse, getting kernel privileges?
| mr_mitm wrote:
| These files cannot be deleted or modified by the user, even
| with admin privs. That would make it trivial to disable the
| antivirus. It's only possible by mounting the file system in a
| different OS, which is typically prevented by BitLocker.
| jonhohle wrote:
| The files are deletable through safe mode, no? I'm assuming
| they are writable by a program outside of the driver, right?
| mr_mitm wrote:
| Yes, but you need the BitLocker key to get into safe mode.
| discostrings wrote:
| Not in the BitLocker configurations I've seen over the
| last few days. The file is deletable as a local
| administrator in safe mode without the BitLocker recovery
| key in at least some configurations.
| fifteen1506 wrote:
| Parsers, verifiers, whatever?
|
| User space downloads the file.
|
| User space sets up a probation dir.
|
| User space asks the kernel to load the new file once.
|
| After that, after a successful boot or 36 hours, the file is
| marked as safe and set to autoload (see the sketch below).
|
| Or, you know, just load it. It will be cheaper. The ROI on
| loading it immediately is far greater, and that's what counts.
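|
| A sketch of that staged-load idea in C (entirely hypothetical:
| file_state, channel_file and promote_if_safe are made-up
| names, not any real driver API):
|
| #include <stdbool.h>
| #include <time.h>
|
| enum file_state { STAGED, PROBATION, TRUSTED };
|
| struct channel_file {
|     enum file_state state;
|     time_t loaded_at;   /* when the trial load happened */
| };
|
| /* Called periodically and after each clean boot: a file on
|    probation is promoted to autoload only once it has survived
|    a boot or its 36-hour trial window. */
| void promote_if_safe(struct channel_file *f, bool survived_boot)
| {
|     const double PROBATION_SECS = 36.0 * 3600.0;
|     if (f->state != PROBATION)
|         return;
|     if (survived_boot ||
|         difftime(time(NULL), f->loaded_at) > PROBATION_SECS)
|         f->state = TRUSTED;
| }
|
| A file that crashes the machine during its trial never reaches
| TRUSTED, so it is never set to autoload on the next boot.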
| apatheticonion wrote:
| One thing I am surprised no one has been discussing is the role
| Microsoft has played in this and how it set the stage for the
| CrowdStrike outage through a lack of incentive (profit,
| competition) to make Windows resilient to this sort of situation.
|
| While it was not directly responsible for the bug that caused
| the crashes, Microsoft holds an effective monopoly over the
| workstation computing space (I'd consider it infrastructure at
| this point) and therefore has a duty of care to ensure the
| security, reliability, and capabilities of its product.
|
| Without competition, Microsoft has been asleep at the wheel on
| innovations to Windows - some of which could have prevented this
| outage.
|
| For example: CrowdStrike runs in user space on macOS and Linux
| - does Windows not provide the capabilities needed to run
| CrowdStrike in user space?
|
| What about innovations in application sandboxing that could
| mitigate the need for the level of control CrowdStrike
| requires?
|
| The fact is: Microsoft is largely uncontested in holding the
| keys to the world's computing infrastructure, and it has
| virtually no oversight.
|
| Windows has fallen from making over 80% of Microsoft's revenue to
| 10% today - there is nothing wrong with being a private company
| chasing money - but when your product is critical to the
| operation of hospitals, airlines, critical infrastructure, you
| can't be out there tickling your undercarriage on AI assistants
| and advertisements to increase the product's profitability.
|
| IMO Microsoft has dropped the ball on its duty of care to
| consumers, and CrowdStrike is a symptom of that. Governments
| need to seriously consider encouraging competition in the
| desktop workstation market. That, or regulate Microsoft's
| Windows product.
___________________________________________________________________
(page generated 2024-07-21 23:03 UTC)