[HN Gopher] Microsoft technical breakdown of CrowdStrike incident
___________________________________________________________________
Microsoft technical breakdown of CrowdStrike incident
Author : nar001
Score : 163 points
Date : 2024-07-28 19:55 UTC (3 hours ago)
(HTM) web link (www.microsoft.com)
(TXT) w3m dump (www.microsoft.com)
| ldjkfkdsjnv wrote:
| The true story is that I bet some major divisions of Crowdstrike
| are run by non-technical people who got there through non-
| meritocratic means. There have generally been no repercussions for
| their underperformance, much like Boeing. Crowdstrike's business is
| built on relationships, not technical supremacy. And bada bing
| bada boom, we have a complete failure of basic technical
| competency (no rigorous rollout process).
| Paianni wrote:
| All businesses are built on relationships; technical competency
| can be, but doesn't have to be, a means to that end.
| Wytwwww wrote:
| > technical competency
|
| In a more fair world (that also valued economic
| productivity/growth more) companies which completely ignore
| that wouldn't survive, though.
| wiseowise wrote:
| > The true story is that I bet some major divisions of
| Crowdstrike are run by non-technical people who got there
| through non-meritocratic means.
|
| Lmao.
|
| > There have generally been no repercussions for their
| underperformance, much like Boeing. Crowdstrike's business is
| built on relationships, not technical supremacy. And bada bing
| bada boom, we have a complete failure of basic technical
| competency (no rigorous rollout process).
|
| Hope you don't say anything like that in real life.
| ldjkfkdsjnv wrote:
| I try not to; catch me on the wrong day and it slips out.
| jacobgorm wrote:
| I used to work on Control Flow Integrity (CFI/XFI) research at
| places like MSR Silicon Valley and VMware, as far back as 2006.
| Back then, sandboxing a kernel module like ramdisk.sys was doable
| with a lot of binary rewriting magic, and later with custom LLVM
| passes, but nowadays it should be a simple matter of compiling
| the code with clang and the appropriate flags, to completely rule
| out this type of memory safety error, turning a BSOD into a
| polite log message and disabling the faulty driver.
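
A minimal sketch of the "polite log message" behaviour described in the comment above, written in user-space Python purely as an illustration. The `Sensor` class, `read_field` helper, and record layout are hypothetical and do not come from CrowdStrike, Windows, or XFI. The point is only that an explicit bounds check turns an out-of-range read into a logged error and a disabled component rather than a crash of the host.

```python
from __future__ import annotations

import logging

logger = logging.getLogger("sensor")


class FaultyContentError(Exception):
    """Raised when a content record does not have the expected shape."""


def read_field(record: list[str], index: int) -> str:
    # Explicit bounds check: refuse the access instead of reading out of range.
    if not 0 <= index < len(record):
        raise FaultyContentError(
            f"field {index} requested, record has {len(record)} fields"
        )
    return record[index]


class Sensor:
    """Hypothetical component that evaluates content records."""

    def __init__(self) -> None:
        self.enabled = True

    def evaluate(self, record: list[str], wanted_field: int) -> str | None:
        if not self.enabled:
            return None
        try:
            return read_field(record, wanted_field)
        except FaultyContentError as exc:
            # Contain the fault: log it and disable only this component.
            logger.error("disabling sensor: %s", exc)
            self.enabled = False
            return None
```

After one bad record the sensor stays disabled, so later calls return `None` instead of retrying the faulty content; whether a real security product should fail open like this is a separate policy question.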
| pcwalton wrote:
| I mean, this is basically what eBPF accomplishes in Linux.
| gclawes wrote:
| There is eBPF for Windows:
| https://github.com/microsoft/ebpf-for-windows
|
| I'd hope security products in the future leverage this more
| than custom kernel-mode sensors.
| capitainenemo wrote:
| Was discussed on HN last week. Top comment notes the
| Windows support is still very limited.
| https://news.ycombinator.com/item?id=41033579
| torginus wrote:
| From what I understand, CrowdStrike has essentially put a
| Turing-complete interpreter for their scripting language into
| the kernel. I doubt you can do much when something is that
| general purpose.
| capitainenemo wrote:
| Do you have more information on that? Hadn't read anything
| about the CS kernel module running arbitrary code. Was it a
| factor in the crash?
|
| 'course, Microsoft also put Turing-complete scripting in ring
| 0 years ago for performance reasons (TTFs - XML/HTML parsing
| and GUI rendering too - to beat other OSes, apparently) and
| that certainly did lead to exploited vulnerabilities...
|
| https://googleprojectzero.blogspot.com/2016/07/a-year-of-win...
| https://gist.github.com/Nevor/ed3719dad0cf66893e42a9ba024c91...
| https://learn.microsoft.com/en-us/security-updates/securityb...
| https://www.fortinet.com/blog/threat-research/one-bit-to-rul...
| https://learn.microsoft.com/en-us/security-updates/SecurityA...
| https://news.ycombinator.com/item?id=9769099 (this comment in
| particular https://news.ycombinator.com/item?id=9783863)
| jacobgorm wrote:
| It doesn't matter if you are doing full Fault Isolation with
| XFI. I recommend reading the paper here:
| https://www.usenix.org/legacy/event/osdi06/tech/full_papers/...
| magicalhippo wrote:
| Lua has been used in Linux kernel modules[1][2]. At least for
| the ZFS case I know they were satisfied with the ability to
| limit what the Lua scripts could do to avoid issues.
|
| [1]: https://lwn.net/Articles/830154/
|
| [2]: https://openzfs.github.io/openzfs-docs/man/master/8/zfs-prog...
| dmattia wrote:
| I suppose I was expecting something more authoritative here. They
| confirm that there was an attempted read-out-of-bounds, as
| CrowdStrike said, but that's not really new information at this
| point. I suppose we'll need to wait for more detailed analysis
| from CrowdStrike at some point.
|
| This post explains why security software has historically run in
| kernel-mode, and really seems to be pushing new technology that
| Microsoft has that would push security vendors into user-mode
| (with APIs that attempt to assist with many of the reasons why
| they have historically used kernel-mode).
|
| Crowdstrike already runs in user-mode on both Mac and Linux (from
| what I can tell), and it seems like running in user-mode on
| Windows would significantly lessen the risk of catastrophic
| failures like a blue-screen-of-death. I know the bulk of the
| failures here belong to CrowdStrike, but I can't help but think
| about the fact that Apple kicked security vendors out of kernel-
| mode a ways back, and that if Windows had done similarly, an
| issue like this probably wouldn't have been possible. By even
| offering kernel-mode options to external vendors, I believe
| Microsoft is creating risk for themselves.
| Rinzler89 wrote:
| _> I can't help but think about the fact that Apple kicked
| security vendors out of kernel-mode a ways back, and that if
| Windows had done similarly, an issue like this probably
| wouldn't have been possible_
|
| Like others already said, Microsoft already tried to do that
| with PatchGuard in 2006 with the launch of Windows Vista, and
| the likes of Symantec and McAfee complained to the EU that
| this would harm the sales of their products, so the EU told
| Microsoft not to do it in 2009[1].
|
| Apple has the luxury of a small market share on the desktop PC
| space to not attract the attention of the regulators, plus a
| user base that's used to Apple constantly rewriting the OS,
| deprecating APIs, switching CPU architectures, etc. without
| giving a fuck about breaking backwards compatibility or cutting
| off developers' access to OS features their products use and
| getting away with it, luxuries that Microsoft doesn't have.
|
| IMHO, sticking with Windows' default security and not using
| third-party anti-malware has made Windows vastly more secure
| and reliable than it was in the days when you'd be looking at
| installing the likes of Symantec or McAfee for your
| "protection", which ended up acting like malware after a while,
| throwing dark patterns at you to milk more subscription fees. So
| as much as it hurts their sales, it's important for the
| regulators to understand that security is far more important
| than the regulations they put on Windows for Internet Explorer
| and Media Player, and, just like Apple's App Store, it's
| sometimes better to let the original product maker handle
| security and not leave the product open at all points just so
| some of these bandits can make a living selling security for
| it. It's like foxes complaining to regulators that chicken wire
| is a threat to their existence.
|
| [1] https://stratechery.com/2024/crashes-and-competition/
| rrix2 wrote:
| No, they engaged in malicious compliance (which many here,
| like yourself, have bought into): rather than
| rearchitecting their own security software to not rely on
| trusted kernel-level access, they forced every PC user into a
| less secure ecosystem where these things must run in the
| kernel.
| Rinzler89 wrote:
| That's an interesting theory. Do you have any sources for
| this? Because so far there have been no technical arguments
| to support your PoV.
| spott wrote:
| Wasn't the whole regulatory argument that Microsoft was
| using kernel mode in their security software, while
| trying to relegate third party security software to user
| land? In that case, regulators stepped in and made
| Microsoft open up kernel mode to level the playing field.
| foota wrote:
| I don't see the malicious part of the compliance here.
| Maybe lazy compliance?
| feyman_r wrote:
| Lots of allegations here. Can you share examples with
| sources of other operating systems following practices
| which you mention here? I presume Mac allows the same level
| of access for CRWD through user mode access only and that's
| the only way they do it too. Same goes for Linux.
|
| I genuinely want to understand this - how everyone else got
| it right and this entity got it wrong.
| whimsicalism wrote:
| The EU requires MS to provide kernel-level access to security
| vendors due to their crazy anti-compete provisions
| dmattia wrote:
| This seems to be only partially true when I read into it. The
| EU said that Microsoft would need to move their security
| tools into user-space (or at least to use the same APIs as
| are available in user-space). If they did that (like Apple
| has done), they could kick everyone out of kernel-space if
| they wanted.
| TillE wrote:
| > pushing new technology that Microsoft has that would push
| security vendors into user-mode
|
| This doesn't exist. It's briefly hinted at in their conclusion,
| but right now it's simply not there.
|
| There is no userspace equivalent of filesystem minifilters,
| ObRegisterCallbacks, etc.
| dmattia wrote:
| This is fascinating, thank you for the info! If I am
| understanding, it would have then been difficult/impossible
| for CrowdStrike to create a user-mode only sensor without
| these equivalent APIs.
|
| So I guess I'm not sure I see validity in the claims of those
| blaming the EU here. It seems as though the EU would have
| allowed Microsoft to kick users out of kernel-space if they
| had APIs that allowed making security products in user-space.
| Like Linux/Mac already appear to have.
| extraduder_ire wrote:
| I don't think they would have had to provide those APIs in
| the EU, so long as their own security products were "kicked
| out" as well. That's kind of complicated to achieve in a
| permanent and provable way. Though, windows has had support
| for eBPF for about two years now.
| TillE wrote:
| Windows eBPF support is experimental and currently
| provides hooks for packet filtering stuff and nothing
| else.
|
| I would be delighted if their long-term solution is eBPF
| which provides full anti-malware hooks, but again it's
| unfortunately not there yet.
| __MatrixMan__ wrote:
| I agree. Microsoft's core competency has traditionally been
| backwards compatibility, but if each security vendor can tamper
| with Windows at the deepest level and is allowed to continue to
| explore all of the ways that they can leverage that... What you
| end up with is a fleet of different Windowses, each diverging
| further with time. It dilutes the benefits brought by
| investment into the stability of the system because whatever
| fights are won in one fragment must be refought in others
| before you can have confidence in the stability of all
| fragments.
|
| It seems like madness to me.
| michaelt wrote:
| _> Crowdstrike already runs in user-mode on both Mac and Linux
| (from what I can tell),_
|
| Crowdstrike provides a Linux kernel module, and expects users
| to manually install an extra Secure Boot key for it, as part of
| their corporate laptop setup procedure.
|
| This has always seemed inadvisable to me, but checkbox checkers
| gotta check checkboxes I guess.
| GordonS wrote:
| For one thing, being difficult to kill is a huge selling point
| for EDR - move it to user space and it's a lot easier to kill.
| pas wrote:
| A kernel-space watchdog (that checks integrity of the image)
| would be much easier than a filter that updates from the
| internet.
|
| Sure, the whole thing is definitely a hard problem, but CS
| fucking up even the most basic QA **and** error handling ...
| it just shows how ridiculous their whole claim to having
| super fancy _technology_ is.
| akira2501 wrote:
| > where security and availability are non-negotiable.
|
| Yep. You just have to pretend that everyone who deployed Windows
| had an actual competitive choice available to them.
|
| > A second benefit of loading into kernel mode is tamper
| resistance.
|
| I guess availability is negotiable after all.
| qsdf38100 wrote:
| > Yep. You just have to pretend that everyone who deployed
| Windows had an actual competitive choice available to them.
|
| Could you elaborate? How is that related to security and
| availability being non negotiable?
| akira2501 wrote:
| Microsoft's statement implies that people choose Windows
| because of its security and availability. Whereas most
| people end up with Windows because the software they want to
| run only operates on that single platform.
|
| The security and availability, to the extent they even exist,
| are clearly not part of the market's decision making process.
| janice1999 wrote:
| At least they're not blaming the European Union in this breakdown
| (as they did earlier).
| zh3 wrote:
| Even this is written after multiple reviews by corporate
| lawyers.
| whimsicalism wrote:
| they're right though...
| DarkNova6 wrote:
| Yes. Only Microsoft should be allowed to crash their
| operating system. Like back in the good old days when only MS
| could use their secret high-performance APIs.
| graeme wrote:
| Why exactly _should_ security vendors have the ability to
| crash the operating system?
| dmattia wrote:
| They shouldn't. Microsoft should have APIs that enable
| security vendors to work in userspace.
|
| The EU didn't say that Microsoft couldn't kick vendors
| out of the kernel, just that they couldn't do so without
| having the APIs available that would let security vendors
| operate outside the kernel.
|
| Mac and Linux have such APIs, so CrowdStrike operates in
| user-mode on those platforms, so those platforms do not
| give security vendors the ability to crash the operating
| system.
| strombofulous wrote:
| Would this still have happened if the EU had not ruled against
| Microsoft?
| PlutoIsAPlanet wrote:
| Microsoft can kick security vendors out the kernel, but they
| can't sell a product that uses APIs not accessible to other
| vendors.
| strombofulous wrote:
| Sure, but my question still stands - would this have
| happened if the EU had not made that ruling?
| mort96 wrote:
| Probably
| Tuna-Fish wrote:
| Yes. There were kernel mode drivers before that ruling,
| it is essentially entirely irrelevant to this outage.
| holsta wrote:
| It's not about kernel access, it's about equal access to
| avoid yet another monopoly.
|
| Microsoft could have come up with a kernel API that their own
| malware (and everyone else's) product could make use of. They
| did not.
| extraduder_ire wrote:
| Probably not, but in more of a butterfly-effect or this
| product not existing way.
| ziml77 wrote:
| But the blame wasn't misplaced before. People keep saying that
| macOS does things better by forcing third parties out of the
| kernel and instead offering APIs to do the same work in
| userspace. Microsoft tried to do exactly this for security
| software in Windows, but the EU didn't like that this change
| meant that any Microsoft-developed solutions would have an
| advantage over third party ones.
| ronsor wrote:
| I really, _really_ wish Microsoft would force third parties
| out of the kernel.
| Khaine wrote:
| No, the EU didn't like MS having their malware protection in
| kernel while kicking out third parties.
|
| If Defender was also kicked out, it would have been fine, but
| it wasn't.
| tacticus wrote:
| > Microsoft tried to do exactly this for security software in
| Windows
|
| Using a monopoly in one industry to capture the market in
| another industry is what anti monopoly laws are meant to
| prevent.
|
| Microsoft was prevented because they wanted to retain a
| commercial business in their security products having special
| access while locking out everyone else.
| rdtsc wrote:
| > We plan to work with the anti-malware ecosystem to take
| advantage of these integrated features to modernize their
| approach, helping to support and even increase security along
| with reliability.
|
| > Providing safe rollout guidance, best practices, and
| technologies to make it safer to perform updates to security
| products.
|
| > Reducing the need for kernel drivers to access important
| security data.
|
| They are being as diplomatic as they can, but it's definitely a
| slap to CS. Read as "they don't know how to roll things out, they
| need guidance on basic QA practices, we'll happily teach
| them...". Then, they list a set of facilities running in user-
| mode to avoid needing to run as many things in kernel mode.
|
| I would be interested what the water cooler discussion about CS
| was like inside Microsoft. Especially in teams needed to respond
| to customers about "Your windows OS is broken, our hospital
| patients are suffering...".
| notepad0x90 wrote:
| I must disagree with that take. Your last quoted sentence is in
| response to all the supposed self-proclaimed experts asking
| "why does it need kernel access"; the ones before that are to
| limit their own liability.
|
| What I've heard from people in the industry is not this silly
| "oh no, crowdstrike is so incompetent" b.s. that is being
| spread on sites like HN and reddit but more of an empathic "it
| could have been us" sentiment. In this write up as well,
| Microsoft knows they have caused their share of outages, it is
| a technical write-up but in part, it is to cover their bases
| for government investigations and lawsuits that will arise from
| this incident.
|
| And in part, they are also responsible for recovering from
| third-party driver errors and repeated boot failures caused by
| faulty drivers.
| retrochameleon wrote:
| CrowdStrike blamed their test software, but in the same
| breath revealed that they haven't been using any canary
| deployments. The bug that caused all this was present in
| their kernel driver for a long time.
|
| For being such a large cybersecurity player and deploying
| updates to 8.5 million devices, their quality control
| practices are embarrassingly lacking.
| rvnx wrote:
| Clearly incompetence to deploy from 0 to 8 million devices
| without any gradual rollout.
|
| That goes even further, because apparently they were fully
| blind and didn't have crash metrics.
|
| "Ok we push the update, and pray".
| galangalalgol wrote:
| I think it is past incompetence, and on into negligence.
| Given the stories we have heard here about emergency
| service failures it is likely that people died. When
| people die due to negligence isn't that usually criminal?
| rvnx wrote:
| Can't agree more, you found the right words.
| binkHN wrote:
| And this is how the lawsuits will start.
| SoftTalker wrote:
| Who is negligent though? Crowdstrike, or the emergency
| services that are using an OS that requires third party
| endpoint security right out of the box in order to be
| safely used, or the company that makes and sells that OS?
| crazygringo wrote:
| Why not both?
|
| Crowdstrike, for negligently not rolling out updates
| gradually.
|
| And emergency services, if they don't have robust
| fallback procedures/systems for when their IT system goes
| down. I mean it's totally fine if regular doctor's visits
| get postponed, but 911 should never go down just because
| their computers are down. Just like aircraft have redundant
| systems, so too should 911.
|
| (The company that makes and sells the OS -- I don't see
| any negligence there, in this case. If security software
| fundamentally requires running at the kernel level and
| Microsoft allows that, I don't see how Microsoft can be
| at fault.)
| jmb99 wrote:
| Yeah, I don't see how one can blame Microsoft in this
| scenario. If you choose to run buggy kernel-level code,
| that's on you, not the publisher of the kernel/OS.
| Especially when the code you're running is a replacement
| for functionality already provided by the OS. It's hard
| to argue that MS could be negligent for "not having a
| good enough AV/endpoint protection solution" or "allowing
| customers to run kernel-level code."
| mort96 wrote:
| Every company I've ever been at rolls out updates slowly.
| Rolling out a change to 8.5 million computers at the same
| time seems ridiculous. Even the most cash-strapped start-
| ups with every incentive to cut corners tend to get staged
| roll-outs more or less right. It's crazy.
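
A rough sketch of the staged (canary) roll-out the comments here describe, assuming a hypothetical `deploy(host)` callback that reports whether a host is healthy after the update. The stage percentages and the failure threshold are illustrative choices, not anything CrowdStrike or Microsoft has published.

```python
from __future__ import annotations

from typing import Callable, Sequence


def staged_rollout(
    hosts: Sequence[str],
    deploy: Callable[[str], bool],
    stage_percents: Sequence[int] = (1, 10, 50, 100),
    max_failure_rate: float = 0.02,
) -> tuple[bool, list[str]]:
    """Deploy to growing fractions of `hosts`; halt if a stage looks unhealthy.

    `deploy(host)` returns True when the host is healthy after the update.
    Returns (completed, hosts_that_received_the_update).
    """
    updated: list[str] = []
    for percent in stage_percents:
        # Cumulative number of hosts that should be updated after this stage;
        # at least one, so the first stage is always a real canary.
        target = max(1, len(hosts) * percent // 100)
        batch = list(hosts[len(updated):target])
        if not batch:
            continue
        failures = sum(1 for host in batch if not deploy(host))
        updated.extend(batch)
        if failures / len(batch) > max_failure_rate:
            # Stop before the next, larger stage is touched.
            return False, updated
    return True, updated
```

With 1,000 hosts and an update that breaks every machine, this stops after the first stage, so only the 10 canary hosts are affected. A real pipeline would also wait between stages long enough for crash telemetry to arrive.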
| binkHN wrote:
| Beyond crazy. I even have a small app that never makes it
| to production before being rolled out to internal and
| open testing first. And, even then, it's slowly rolled
| out to a percentage at each stage before being fully
| deployed. One would think a major company with kernel
| level access would do this at minimum.
| geon wrote:
| I had a fleet of only maybe 200 computers I updated
| remotely. I did canary staged roll outs.
| doubled112 wrote:
| When I managed ~15 developers' Arch Linux workstations,
| I found it very beneficial to be the canary, and then
| roll out to a couple of the devs more capable of
| troubleshooting, and then the rest. I can always fix
| my own box.
|
| 8.5M all at once feels insane.
| duskwuff wrote:
| > CrowdStrike blamed their test software, but in the same
| breath revealed that they haven't been using any canary
| deployments.
|
| Their post-incident report [1] also stated that they intend
| to improve testing by "using testing types such as: local
| developer testing". One has to wonder what, if any, testing
| they were doing beforehand.
|
| [1]: https://www.crowdstrike.com/blog/falcon-content-update-preli...
| gjsman-1000 wrote:
| Microsoft should be sued, for literally having blood on their
| hands. There was an easily mitigated design flaw in Windows
| that would have greatly blunted the impact.
|
| https://news.ycombinator.com/item?id=41095788
| freehorse wrote:
| If "it could have been them", then I would like to read such
| professionals write exactly about how to avoid having a
| global outage like this again, rather than "showing empathy"
| with a corporation. Or do we just leave it up to luck, and if
| "it happens to them too" in a month or year, oopsies? What
| about which practices could be improved?
| michaelt wrote:
| Anyone in the industry could have a bug get through testing.
|
| Some companies could have a severe and readily reproducible
| bug get through testing.
|
| A few of those companies have a hand-rolled update mechanism,
| and can accidentally break their ability to roll back a bad
| release.
|
| A few of _those_ companies are in a position to push a
| release that breaks not only their own software, but the
| entire OS.
|
| Very few companies in that position would roll out to 100% of
| client machines in a single worldwide deployment.
| gnfargbl wrote:
| It didn't read as _particularly_ diplomatic to me. In
| particular, this paragraph..
|
| _> It is possible today for security tools to balance security
| and reliability. For example, security vendors can use minimal
| sensors that run in kernel mode for data collection and
| enforcement limiting exposure to availability issues. The
| remainder of the key product functionality includes managing
| updates, parsing content, and other operations can occur
| isolated within user mode where recoverability is possible._
|
| ...was about as close to tetchy as a post like this would ever
| get. Basically they are saying "there was no good reason at
| all why CrowdStrike had to put so much code inside the actual
| kernel." And with the benefit of hindsight, it's a strong
| point.
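
The split quoted above (a small privileged core, with content parsing isolated where a failure is recoverable) can be loosely illustrated with ordinary processes. This is only an analogy in user-space Python, not a kernel/user-mode design: the parser script and `parse_isolated` helper are hypothetical, and the "bug" is a deliberate unchecked index.

```python
from __future__ import annotations

import subprocess
import sys

# Hypothetical parser, run in its own process. It indexes field 2 without
# checking, so malformed input makes the child exit with an error.
PARSER_SOURCE = """
import sys
fields = sys.stdin.read().strip().split(",")
print(fields[2])
"""


def parse_isolated(content: str, timeout: float = 30.0) -> str | None:
    """Parse `content` in a child process; return None if the parser fails."""
    try:
        result = subprocess.run(
            [sys.executable, "-c", PARSER_SOURCE],
            input=content,
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return None
    if result.returncode != 0:
        # The child died, but the calling process keeps running and can
        # log the failure or fall back to the previous content.
        return None
    return result.stdout.strip()
```

A call such as `parse_isolated("a,b")` fails inside the child only; the caller sees `None` and carries on, which is the recoverability property the quoted paragraph is pointing at.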
| ffhhj wrote:
| > there was no good reason at all why CrowdStrike
|
| Their business is corporate spyware to surveil employees,
| of course they'll use any tactic to make it work, that's the
| why. And their EULA states there is no liability for the
| company:
|
| https://www.crowdstrike.com/terms-conditions/
|
| Dirty policies on top of dirty practices.
| Rinzler89 wrote:
| _> Their business is corporate spyware to surveil
| employees_
|
| What?! Anything you do on your corporate provided laptop is
| always gonna be logged by IT for security in every large
| company everywhere, that's news to nobody, but your company
| doesn't care that you use your corpo laptop to book your
| vacation, IT has better things to do than narc on you for
| that.
|
| If your boss wants to actually spy on you they don't need
| Crowdstrike, there's other SW dedicated for that depending
| on the laws in your jurisdiction but that's not what
| Crowdstrike is for.
|
| If you want complete privacy from your employer, just use
| your personal machine for your private activities instead
| of your work laptop, why is this so hard?
| userbinator wrote:
| Speak for yourself. There are still companies who don't
| treat their employees like idiots and actually trust
| them. Let's not normalise pervasive surveillance.
| Rinzler89 wrote:
| _> There are still companies who don't treat their
| employees like idiots and actually trust them._
|
| Yeah sure, but how many of those are large non-tech
| companies?
|
| You massively overestimate the tech competency of the
| average PC user if you think it's normal in most
| companies to not have security monitoring solutions in
| place over internet activity. In our latest
| phishing test IT did, several users fell for the trap,
| despite it being a tech company. There's always gonna be
| someone careless one day and companies want insurance
| policies against that.
|
| Having such solutions in place doesn't mean the company
| doesn't trust you, it's more like that old Russian
| proverb, "trust but verify", and for ticking security
| compliance boxes as an insurance policy.
|
| Everyone makes mistakes, it's only human. So more like,
| speak for yourself, if you think your internet activity
| at work isn't logged anywhere.
| holsta wrote:
| > they need guidance on basic QA practices
|
| Microsoft has a loooong history of botched (security) updates,
| so I'm not hopeful they can teach Crowdstrike much.
| SoftTalker wrote:
| Yes, quite the epitome of throwing stones from a glass house.
| Rinzler89 wrote:
| Do you happen to have a list of that "loooong history" of
| botched (security) updates?
|
| I can only find a couple of examples after googling, which is a
| bit smaller than the "loooong history" you're talking about, so
| unless Microsoft is paying Google to delete results, maybe
| you're mistaken.
| SoftTalker wrote:
| This is a company whose OS could not even be installed on a
| live network without getting rooted within a few minutes.
| Anybody who was paying attention knew that you didn't use
| any new Windows release until at least the first service
| pack had come out.
|
| Granted that was a while back but painful memories die
| hard.
| Rinzler89 wrote:
| _> This is a company whose OS could not even be installed
| on a live network without getting rooted within a few
| minutes. _
|
| That was Windows XP 20 years ago. Please bring arguments
| about modern Windows 11 security, which is the current up-
| to-date product they're selling and supporting, not
| scenarios that haven't happened in 20 years.
| clwg wrote:
| First thing that comes to mind is that Recall stuff from
| a month ago, they also release updates[0] that crash
| machines.
|
| [0] https://www.tomsguide.com/news/windows-11-update-causing-blu...
| TeMPOraL wrote:
| Recall actually is a brilliant idea, and I dreamed of
| something like it for a long time, and so did plenty
| people here. It's just not something you can trust a
| third-party business with, whether it's a fly-by-night
| startup or an international megacorporation known to be
| openly promiscuous with advertisers.
|
| This is basically "take a screenshot every 30 seconds and
| compile it into a timelapse", but on steroids, and the
| same appeal, and arguments wrt. who gets to run it on
| whose machines, all apply.
| clwg wrote:
| The functionality does seem intriguing, but that doesn't
| change its security profile, which was poorly thought out
| and implemented.
| feyman_r wrote:
| Ignoring Windows Insider reports is bad. However, how
| many endpoints having issues (out of a billion+) is
| 'acceptable' after an update? We live in a news hype
| cycle so clearly even the one wrong failure will make it
| up somewhere.
|
| However, without metrics that show BSoDs from patches
| (which MS will likely never share), it's hard to see if
| things have improved or regressed. If they regressed,
| someone up in their leadership chain is hopefully
| following the constructive discussion here.
| Eduard wrote:
| for a loooong history, you have to look in the past
| Rinzler89 wrote:
| Ah, well, if only things of the past were useful today,
| I'd still have hair, and probably millions made from the
| right investments, but unfortunately, it's what's
| happening today that actually matters.
| echoangle wrote:
| So you asked for proof of a long history and are now
| surprised that the examples are all from the past?
| squigz wrote:
| GP is absolutely correct. You can't ask for examples of a
| long history of something, then dismiss examples from,
| you know, history.
| tacticus wrote:
| The company that let every db server have global admin
| creds and 0 logging on their cloud platform?
|
| That didn't run their own enhanced visibility on their
| own cloud platform.
| lightedman wrote:
| Vulnerabilities present in 2000 are showing up still in
| modern Windows versions.
|
| https://www.csoonline.com/article/564499/3-leaked-nsa-exploi...
|
| You have no idea the cruft and technical debt Windows has
| in order to maintain its backwards compatibility.
| TeMPOraL wrote:
| That's a bit disingenuous, though. That was, as
| 'Rinzler89 points out, some 20 years ago. Back then, any
| Linux distro would've definitely been a much safer option,
| because after installing _you couldn't even connect it
| to the network_, because it had no support for your cable
| modem or wireless card, and that's assuming you didn't
| fuck up your MBR with LiLo for the 20th time. Ask me how
| I know.
|
| Both OS families have changed much since that time.
| rvnx wrote:
| Oh sweet, this laptop has a PCMCIA Wi-Fi card!
|
| That'd be cool if one day I can get the laptop running on
| battery and not just on mains power.
|
| Let me just set it up.
|
| Wait a second, how do I wake up the screen again and get
| out of this hibernation state?
|
| Why are all the fans stuck at 100% now?
|
| Errr, first let's see if I can get the trackpad working.
| feyman_r wrote:
| Agree. I also remember those days when it was so hard to
| get Linux to just boot up and get your display working
| correctly- it was almost like a rite of passage. It was
| just proving grounds for how much of an expert you were
| and the number of hours you spent in front of the PC,
| just to get things working.
|
| My point is, good and bad memories will always stand out.
| system2 wrote:
| Anyone who worked in IT knows this, it is not something
| rare. Literally every month, for example one from last
| month:
|
| https://www.techradar.com/computing/windows/windows-11-updat...
|
| This is the main reason every IT professional I know
| disables auto updates of Windows and manually triggers
| updates after testing (hopefully) on multiple dummy
| machines on the network.
|
| I personally remember booting to safe mode to remove
| Windows updates to rescue the computers more times than I can
| count.
| Rinzler89 wrote:
| Examples like that one I also found, but that's not
| really a "looooong list". If people can only show one
| single example as an argument it's kind of a moot point.
| system2 wrote:
| You'd experience at least 3-5 per year if you work in IT.
| There really is a long list but since it is not my
| argument, I won't list them after searching for an hour.
| The list starts in the early 2000s; it is not recent.
|
| EDIT: Whatever, I will do the search for you since you
| cannot use Google:
|
| https://www.pcgamer.com/an-odd-bug-in-this-months-
| windows-10...
|
| https://www.windowslatest.com/2023/10/22/windows-11-octob
| er-...
|
| https://www.bleepingcomputer.com/news/microsoft/windows-1
| 0-e...
|
| https://www.windowslatest.com/2023/02/09/microsoft-
| confirms-...
|
| https://www.windowslatest.com/2023/07/16/windows-11-kb502
| 818...
|
| These are just the last quarter of 2023. There are over
| 2000 news items, but I won't link them. Use the keywords
| "Windows Update" and "Crash", and use the date option on
| Google to go back before 2023.
| GordonS wrote:
| There have only been a few _really_ bad ones, but Microsoft
| botch Windows updates quite regularly.
| Rinzler89 wrote:
| _> but Microsoft botch Windows updates quite regularly_
|
| OK, please show us the proof then. If it's indeed as
| regular as you claim, then it must be documented
| somewhere as a greppable list.
|
| Tech blogs would have a field day getting traffic on
| their sites by keeping track of and documenting such
| regular mistakes if they existed.
| Brybry wrote:
| It's frequent enough that people pay money for
| AskWoody[1] to tell them when it's safe to patch or what
| patches to skip.
|
| [1] https://www.askwoody.com/ms-defcon-system/
| Rinzler89 wrote:
| Quote, from the website:
|
| _" In general, I apply Windows Defender updates as soon
| as they're available. Why? Microsoft hasn't screwed up
| any of them too badly. You're better off applying those
| updates than letting them slide for a week or two."_
| Brybry wrote:
| Yep, Microsoft does a good job with Windows Defender
| (antivirus) updates.
|
| It's the other Windows Updates that they botch frequently
| enough to make people wary of patching immediately.
| oxygen_crisis wrote:
| Here's >100 of them in the past ~8 months:
|
| https://www.manageengine.com/patch-
| management/resources/micr...
| feyman_r wrote:
| Where can I find a list for all OSes? I'd assume such a
| list would have known issues with X11 etc. I want to
| ensure it's not a case of survivorship bias.
| mrj wrote:
| Well, from the news this morning:
|
| https://www.forbes.com/sites/daveywinder/2024/07/27/microso
| f...
| drdec wrote:
| >> they need guidance on basic QA practices
|
| > Microsoft has a loooong history of botched (security)
| updates, so I'm not hopeful they can teach Crowdstrike much.
|
| Experience is the best teacher
| cogman10 wrote:
| And they've learned a lot from it. For example, MS no longer
| universally deploys updates across the world, they have a
| slower rollout to avoid just such an incident.
| f001 wrote:
| I can tell you they're quite unhappy about it. Have a friend
| working there who frustratedly says it wasn't their fault
| every time it comes up. Which is quite often and at every
| social occasion since.
| fishywang wrote:
| but it's kind of their fault? they designed the api that way,
| they decided what can be done in userland and what must be
| done via kernel. they at least _allowed_ it to happen every
| time.
| lozenge wrote:
| You can't just let people do anything from userland, the
| performance would tank. As for restricting kernelland, EU
| competition regulators would not be happy if MS was the
| only one able to write anti virus software that runs in
| kernelland.
| justinclift wrote:
| Or perhaps MS could actually try to think of a working
| solution, rather than blame legislation they don't like?
|
| "Don't blame us! Blame the EU for stopping our monopoly!"
|
| Yeah, good luck with that. ;)
| gjsman-1000 wrote:
| Reminder that Microsoft _could_ have programmed Windows to notice
| if a driver has caused a blue screen three times in a row, and
| prompt if you want to disable the driver on boot. After all,
| Windows _already_ collects how many times a driver causes a
| crash. This would have made recovery one click instead of heading
| into Safe Mode and needing BitLocker keys.
|
| But they didn't.
|
| And Microsoft, I argue, _also_ has blood on their hands for every
| hospital this hit. Giving users a prompt to disable the driver,
| after three successive failed boots, would have saved lives.
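The mechanism proposed in the comment above (count consecutive boot failures per driver, then offer to disable the offender) could look roughly like the sketch below. This is a hypothetical model only: the class name, the threshold constant and the driver name string are assumptions made for illustration, and it does not describe how Windows actually records crashes.

```python
from collections import defaultdict

# Assumed threshold, taken from the "three times in a row" in the comment.
CRASH_THRESHOLD = 3


class BootCrashTracker:
    """Counts consecutive boot failures blamed on each driver (hypothetical)."""

    def __init__(self, threshold=CRASH_THRESHOLD):
        self.threshold = threshold
        self.consecutive = defaultdict(int)

    def record_failed_boot(self, driver):
        """Record a failed boot blamed on `driver`.

        Returns True once the driver's streak reaches the threshold, which is
        the point where a "disable this driver?" prompt would be shown.
        """
        # A failure blamed on a different driver breaks the other streaks.
        for name in list(self.consecutive):
            if name != driver:
                self.consecutive[name] = 0
        self.consecutive[driver] += 1
        return self.consecutive[driver] >= self.threshold

    def record_successful_boot(self):
        """A clean boot resets every streak."""
        self.consecutive.clear()


tracker = BootCrashTracker()
results = [tracker.record_failed_boot("csagent.sys") for _ in range(3)]
print(results)  # [False, False, True]
```

The sketch deliberately leaves out the hard parts raised in the replies: who is allowed to answer the prompt, and whether a boot-critical security driver should ever be skipped.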
| t-writescode wrote:
| How would that have helped the server farms that were
| experiencing the issue?
| gjsman-1000 wrote:
| Oh I don't know, the server's down, you go and look as a
| technician, and you simply see a screen saying:
|
| "CSAgent.sys has caused a failure to boot three times in a
| row. Do you want to disable this driver? <Yes> <No>."
|
| You click "Yes." Server reboots with CrowdStrike driver
| disabled. The day is saved in 5 minutes instead of building a
| custom ISO image or going on a BitLocker key recovery spree.
| politelemon wrote:
| It would still have required on-site presence and
| interaction, during which there is still downtime, so this
| accomplishes only marginal gains.
| gjsman-1000 wrote:
| At the same time though, imagine you woke up and
| CrowdStrike hit your organization.
|
| For most users, they'll try clicking "Yes." And then it's
| back to work. After all, "No" just causes a blue screen
| again, might as well try the other path.
|
| This would have been the difference between the IT
| department handling 10,000+ calls or a few hundred (plus
| sending out a bulletin) in many, many organizations. It
| also could have saved billions at this point.
|
| Heck, it would have saved _lives_ in hospitals.
| jonathantf2 wrote:
| But then you have millions of endpoints booting without
| malware protection
| echoangle wrote:
| Can you cite some reports of deaths caused by the outage?
| morkalork wrote:
| Instead of prompting on the screen, disable the driver and
| boot directly into a recovery state that has networking
| enabled so sysadmins can push scripts and fixes? As long as
| it's not a network driver you'd be okay.
| t-writescode wrote:
| Disable the driver that is explicitly there to protect from
| malware and attacks?
|
| Wouldn't malware just use that as an attack vector?
| danlitt wrote:
| Nooo you don't understaaaand kernel code is special :'(
| actually BSOD was a desired feature because CrowdStrike is a
| Security (TM) application.
|
| (sorry, just simulating the replies I get when I post this
| sentiment anywhere else)
| gjsman-1000 wrote:
| That's very easily mitigated - write the security software so
| it can't crash. Like, you know, drivers should be written.
|
| Malware can't crash a well-written or memory-safe driver, so
| it will never be unloaded. Problem solved.
| echoangle wrote:
| Writing the driver so it can't crash is the hard part, I
| think the developers knew that this was the goal.
| Uvix wrote:
| Those hospitals chose to deploy software that didn't support
| testing. The blood is on their own hands.
| galangalalgol wrote:
| I think suing MS for the behavior that ensued when people
| installed a rootkit directly into the kernel and opened all the
| ports on their network to let that rootkit get used, is...
| excessive. Both MS and CS should have had the ability to fall
| back to a previous known-good kernel, but the negligence here is
| clearly with CS for not even trying a blank data file in the
| automated tests for a piece of safety-critical software, and
| then not using canary deployments before pushing to millions of
| devices.
| crazygringo wrote:
| Do I like your idea for that?
|
| Yes, absolutely. It's a clever idea.
|
| But do I think Microsoft was _negligent_ in not building that?
|
| No, I think that's going too far. Windows already has Safe Mode
| -- as you note -- to allow for manual recovery, which is what
| people are using.
|
| I don't think it makes sense for it to be Microsoft's legal
| responsibility to protect its users from software with a
| critical bug that wasn't written by Microsoft. Otherwise, where
| would it end? If a third-party program tries to delete all your
| user data, is it Microsoft's legal responsibility to check
| whenever a process is deleting a lot of data, and intervene
| with a confirmation dialog? Is it Microsoft's responsibility to
| protect you from all malware and ransomware, no matter how
| cleverly written? Is it Microsoft's responsibility to
| constantly cache program state on disk so that when a third-
| party program crashes, you don't lose your data since your last
| save?
|
| I think that's going too far, in terms of legal obligation.
| grumpyprole wrote:
| Microsoft may be negligent in selling a product unsuitable
| for these applications. Windows is unsuitable precisely
| because it can be brought down by third party updates, such
| that it cannot recover without manual intervention by
| technical experts. Third party vendors are forced into
| writing unsafe kernel drivers because Microsoft does not
| provide sufficient user mode APIs.
|
| Windows has a dated design and a security model no longer fit
| for purpose. As for your other example, it _could_ be
| protecting users from malicious programs that may delete
| data, simply by having a better security model, like Android
| and iOS.
| crazygringo wrote:
| I don't think Microsoft can be negligent here, because
| Windows isn't being brought down by _Microsoft_ updates.
|
| Somebody bought Windows, and bought CrowdStrike.
| CrowdStrike is negligent, and possibly also the person/org
| who chose to rely on Windows+CrowdStrike without a backup
| plan if that resulted in further damages to others.
|
| Third party vendors are absolutely not "forced into writing
| unsafe kernel drivers". They can properly test things to
| write safer code (which CrowdStrike infamously didn't). And
| kernel mode is fundamentally required for security software
| like this, as far as I understand.
|
| And using app-based mobile OS's is not necessarily a useful
| comparison point. They are limited in all sorts of ways
| that desktop OS's are not -- and don't you hear people here
| on HN constantly _complaining_ about that? A better
| comparison point is macOS and Linux. CrowdStrike also
| crashed Linux, and macOS still lets you bypass SIP if you
| want to.
| Khaine wrote:
| AFAIK Windows does do that, except for drivers that are marked
| as required for boot. CrowdStrike's drivers are marked as
| required for boot.
| ziml77 wrote:
| Imagine I've installed CrowdStrike under the assumption that it
| makes my system more secure. Why would I want the OS to allow
| the system to boot up in a less secure state by providing a
| prompt for that? Most users will just click whichever option
| gets them back up and running and IT will have no control over
| that.
| nerdjon wrote:
| This is very much an "easier said than done" situation that I
| would think Hacker News of all places would be better about
| when it comes to "just" doing something in code.
|
| First, Windows already does something similar. After 3 failed
| boots it is supposed to boot into WindowsRE, which gives you
| options to revert to a previous version, uninstall updates, and
| I believe also reverts configurations like recent driver
| installations.
|
| The problem here though, CrowdStrike itself didn't update. It
| updated a definition file (last I saw at least) and that likely
| would not have been caught by Windows as a new version.
|
| Also frankly, not super thrilled at the idea of Windows just
| deciding to disable/uninstall something except for rolling back
| (so a previously working config) due to how things could
| interact. This situation could have been far worse and harder
| to recover from.
|
| In this case maybe Windows could have noticed that the
| configuration update is what was causing it and rolled that
| back, but it's possible it would have just re-downloaded the
| file when it started back up anyways.
|
| Regarding saved lives, do we actually know that anyone's lives
| were lost due to this? My local hospitals were still performing
| emergency surgery.
| zh3 wrote:
| I do have to wonder how many agonising layers of review this went
| through with the marketing and legal departments as part of
| shifting the blame.
|
| If you want to decide which OS/distros to avoid for critical
| stuff, look to see who's learning from the incident (even if not
| bitten by it) compared to those saying "it wasn't our fault" (and
| that's not just MS).
| EasyMark wrote:
| Oh I like this breakdown a lot. Fairly technical, links to
| resources used, flow of debug process, didn't get lost in the
| weeds of details and how clever they were. I wish more debug
| retrospectives were like this. It seems like you end up with 100
| pages of analysis or a couple of vague paragraphs.
| userbinator wrote:
| I'm going to be the controversial one here and say that, as bad
| as CrowdStrike was, the alternative of having only Microsoft be
| able to decide what people can do is far worse. I've already seen
| many others trying to use this incident to advocate for digital
| totalitarianism.
| superposeur wrote:
| I'm surprised no one has yet noted that Microsoft itself is a
| chief CrowdStrike competitor.
| tonymet wrote:
| i thought crowdstrike provided features that go beyond windows
| defender. is there another MS product that competes?
| superposeur wrote:
| FWIW, here is CrowdStrike's own comparison of features:
|
| https://www.crowdstrike.com/compare/crowdstrike-vs-
| microsoft...
| tonymet wrote:
| Did either release from MS or Crowdstrike explain how this crash
| bypassed QC? I'm still baffled that a 100% repro crash even made
| it anywhere near the later stages of QC. This is something easily
| caught by the earliest CI phases, at the developer and at least
| first build automation phase, let alone human QC.
| magicalhippo wrote:
| From what I read in the previous thread, their test environment
| didn't actually test what was deployed.
|
| That is, there was a post-test pre-distribution packaging
| stage, and that's where the distributed file(s) got f'ed up.
|
| If true that would explain how it got past their testing, but
| would also be an incredible lack of competence IMHO.
|
| But yeah, curious if there's been some more concrete details
| there.
| tonymet wrote:
| I heard something similar: that they deploy content
| separately from code, but they don't test all of the
| combinations of code + content. This crash was from "stable"
| code in the driver mixed with a corrupt or incomplete content
| file (config, etc.), triggering the null-ptr exception.
|
| Sounds like one of those companies where you get hired and
| are shocked by the sausage factory you just stepped into
| rvnx wrote:
| In February they added new code that allows it to spy on/block
| named pipes.
|
| Named pipes are pipes of communication that processes can
| use to talk to each other, as an alternative to sockets.
|
| For example Chrome uses them between the user interface and
| the actual page renderer.
|
| In March they tested it in staging, said it was fine,
| pushed to prod with few rules in April, still looked fine.
|
| In July they added a new rule, which was deployed to 100%
| immediately, as from their perspective, a new entry in a
| database definition needs neither testing nor a canary deploy
|
| (which is still irresponsible, because bad rules could
| cause damage as well like any security/antivirus software,
| even if the parser didn't crash, but it could have blocked
| legitimate actions or files)
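As a contrast to the "deployed to 100% immediately" step described above, a staged (canary) rollout can be sketched as a deterministic percentage gate. This is a minimal illustration under assumptions: the function names, the host ID format and the update ID `rule-update-001` are all invented here and say nothing about CrowdStrike's or Microsoft's actual deployment systems.

```python
import hashlib


def rollout_bucket(host_id: str, update_id: str) -> int:
    """Map a host deterministically to a bucket in 0..99 for a given update."""
    digest = hashlib.sha256(f"{update_id}:{host_id}".encode()).digest()
    return int.from_bytes(digest[:8], "big") % 100


def should_receive(host_id: str, update_id: str, percent: int) -> bool:
    """True if this host falls inside the current rollout percentage."""
    return rollout_bucket(host_id, update_id) < percent


# Hypothetical fleet; widen the gate only after the earlier stage stays healthy.
hosts = [f"host-{i}" for i in range(1000)]
for percent in (1, 10, 100):
    selected = [h for h in hosts if should_receive(h, "rule-update-001", percent)]
    print(percent, len(selected))
```

Because the bucket depends only on the host and the update, every host that received the update at 1% still receives it at 10%, so each stage is a superset of the one before it.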
| pas wrote:
| lack of fuzzing for their "parser + updater"
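To make the fuzzing point concrete, here is a small sketch of a mutation fuzzer run against a toy content-file parser. Everything in it is assumed for illustration: the `RULE` magic, the length-prefixed layout and the function names are invented and are not CrowdStrike's channel file format. The property being checked is that malformed input is always rejected with a controlled `ValueError` rather than an unexpected crash.

```python
import random

MAGIC = b"RULE"  # hypothetical file magic, invented for this sketch


def parse_rules(data: bytes) -> list:
    """Parse MAGIC, a one-byte entry count, then length-prefixed entries."""
    if len(data) < 5 or data[:4] != MAGIC:
        raise ValueError("bad header")
    count, pos, entries = data[4], 5, []
    for _ in range(count):
        if pos >= len(data):
            raise ValueError("truncated entry header")
        length = data[pos]
        pos += 1
        if pos + length > len(data):
            raise ValueError("truncated entry body")
        entries.append(data[pos:pos + length])
        pos += length
    if pos != len(data):
        raise ValueError("trailing bytes")
    return entries


def fuzz(iterations: int = 2000, seed: int = 0) -> int:
    """Feed mutated inputs to the parser; return how many were rejected cleanly."""
    rng = random.Random(seed)
    valid = MAGIC + bytes([2, 3]) + b"abc" + bytes([1]) + b"z"
    rejected = 0
    for _ in range(iterations):
        sample = bytearray(valid)
        for _ in range(rng.randint(1, 4)):
            choice = rng.random()
            if choice < 0.4 and sample:
                # Overwrite one byte with a random value.
                sample[rng.randrange(len(sample))] = rng.randrange(256)
            elif choice < 0.7 and sample:
                # Truncate the file at a random offset.
                del sample[rng.randrange(len(sample)):]
            else:
                # Append random trailing bytes.
                sample += bytes(rng.randrange(256) for _ in range(rng.randint(1, 8)))
        try:
            parse_rules(bytes(sample))
        except ValueError:
            rejected += 1  # controlled rejection is the desired failure mode
        # Any other exception (e.g. IndexError) propagates and fails the run.
    return rejected


print(fuzz(), "inputs rejected cleanly")
```

The same harness also covers the "blank data file" case mentioned earlier in the thread: `parse_rules(bytes(64))` is rejected at the header check instead of being read as rules.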
| DeathMetal3000 wrote:
| "Windows has announced a commitment around the Rust programming
| language as part of Microsoft's Secure Future Initiative (SFI)
| and has recently expanded the Windows kernel to support Rust."
___________________________________________________________________
(page generated 2024-07-28 23:02 UTC)