[HN Gopher] No More Blue Fridays
___________________________________________________________________
No More Blue Fridays
Author : moreati
Score : 369 points
Date : 2024-07-22 12:21 UTC (10 hours ago)
(HTM) web link (www.brendangregg.com)
(TXT) w3m dump (www.brendangregg.com)
| xg15 wrote:
| > _In the future, computers will not crash due to bad software
| updates, even those updates that involve kernel code. In the
| future, these updates will push eBPF code._
|
| Assuming every security critical system will be on a recent
| enough kernel to support this...
| efee22 wrote:
| I think with a LTS distribution you should get very far these
| days when it comes to implementing such sensors.
| chasil wrote:
| On rhel8 variants, you can use the Oracle UEK to get eBPF.
|
| https://blogs.oracle.com/linux/post/oracle-linux-and-bpf
| $ cat /etc/redhat-release /etc/oracle-release /proc/version
| Red Hat Enterprise Linux release 8.10 (Ootpa)
| Oracle Linux Server release 8.10
| Linux version 5.15.0-203.146.5.1.el8uek.x86_64
| (mockbuild@host-100-100-224-48) (gcc (GCC) 11.2.1 20220127
| (Red Hat 11.2.1-9.2.0.1), GNU ld version 2.36.1-4.0.1.el8_6)
| #2 SMP Thu Feb 8 17:14:39 PST 2024
| dijit wrote:
| And assuming there are no bugs in the BPF code...
|
| Oh wait: https://news.ycombinator.com/item?id=41031699
| efee22 wrote:
| RHEL kernel... right. IMHO, I'd trust an upstream stable
| kernel in production far more than a RHEL one, which has
| dozens of feature backports and an internal kABI to maintain.
| Granted, RH has a QA team, but it is still impossible to test
| everything beforehand.
| worthless-trash wrote:
| On the upside, non-root users can't insert eBPF code, so
| it's a privileged operation, unlike on other distros.
| nequo wrote:
| Isn't it tied to CAP_BPF on every distro since the 5.8
| kernel?
|
| https://mdaverde.com/posts/cap-bpf/
| dredmorbius wrote:
| Considering the number of systems running very obsolete OSes
| these days: WinNT (4x or 3x), Windows, DOS, or various
| proprietary Unixen, stale Linux flavours, etc., etc., ... yes,
| quite.
| usrme wrote:
| Does anyone know how far along the eBPF implementation for
| Windows actually is? In the sense that it could start feasibly
| replacing existing kernel drivers.
| CoastalCoder wrote:
| > If your company is paying for commercial software that includes
| kernel drivers or kernel modules, you can make eBPF a
| requirement.
|
| Are they saying that device drivers should be written in eBPF?
|
| Or maybe their drivers should expose an eBPF API?
|
| I assume _some_ driver code still needs to reside in the actual
| kernel.
| prmoustache wrote:
| These tools wouldn't need kernel drivers, only to target the
| eBPF userspace API:
| https://www.kernel.org/doc/html/latest/userspace-api/ebpf/in...
| asynchronous wrote:
| Is there a reason for the lack of naming+shaming Crowdstrike in
| this blogpost? Was it to not give them any more publicity, good
| or bad?
| StevenWaterman wrote:
| If you consider kernel programming to be inherently unsafe,
| then you would consider this to be inevitable, meaning it's not
| really the specific company's fault. They were just the unlucky
| ones.
| efee22 wrote:
| Agree, Crowdstrike was an unlucky one, but it is more about
| the issue in general. If I remember correctly, others like
| sysdig also use their own kernel modules for collection.
| asynchronous wrote:
| I still hold true that testing even improperly would have
| caught this before it hit worldwide. But I suppose you are
| right, that doesn't help the argument being made here.
| ForOldHack wrote:
| Wasn't that the job of AI/co-pilot/clippy/D.E.P? "Would you
| like me to try and execute a random blank file?"
|
| And of course QA.
|
| I was unaffected, but was fielding calls from customers.
|
| My update Tuesday is the week after, so in-between MS and
| my updates, I am very suspicious of everything.
|
| I was also unaffected by 22H2, and spent time fielding
| calls.
| lordnacho wrote:
| They could have helped their luck by doing some of the common
| sense things suggested in the article.
|
| For instance, why not find a subset of your customers that
| are low risk, push it out to them, and see what happens? Or
| perhaps have your own fleet of example installations to run
| things on first. None of which depends on any specific
| technology.
| hello_moto wrote:
| "find a subset of low risk customers" and use them as test
| subject?
|
| Repeat that a few times to understand the repercussions.
|
| If I were the customers and I found out that I was used as
| test subject, how would I feel?
| whynotminot wrote:
| Canary deployments are already an industry accepted
| practice and it's shocking Crowdstrike apparently doesn't
| do them.
| hello_moto wrote:
| Which industry? Cybersecurity or Cloud software?
| whynotminot wrote:
| Any industry that wants to reliably deliver software that
| doesn't brick systems at scale? I'm confused by your
| question.
|
| Are you telling me the cybersecurity scene is special and
| shouldn't follow best practices for software deployment?
| hello_moto wrote:
| Canary deployment for a subset of Salesforce customers
| won't see much of a revolt from customers compared to an AV
| definition rollout (not software, but AV definition) in
| Cybersecurity, where gaps between 0day and rollout mean
| you're exposed.
|
| If customers found out that some are getting roll out
| faster than the others, essentially splitting the group
| into 2, there will be a need for customer opt-in/opt-out.
|
| If everyone is opting-out because of Friday, your Canary
| deployment becomes meaningless.
|
| Any proof that other Cybersecurity vendors do Canary
| deployment for their AV definition? :)
|
| PS: not to say that the company should test more
| internally...
| whynotminot wrote:
| Canary deployment doesn't necessarily mean massive gaps
| between deployment waves. You can fast-follow. Sure,
| there may be scenarios with especially severe
| vulnerabilities where time is of the essence. I'm out of
| the loop if this crowdstrike update was such a scenario
| where best practices for software deployment were worth
| bypassing.
|
| If this is just how they roll with regular definition
| updates, then their deployment practices are garbage and
| this kind of large scale disaster was inevitable.
| hello_moto wrote:
| Let's walk this through: Canary deployment to Windows
| machines. If those Windows machines got hit with BSOD,
| they will go offline. How do you determine if they go
| offline because of Canary or because of regular
| maintenance by the customer's IT cycle?
|
| You can guess, but you cannot be 100% sure.
|
| What if the targeted canary deployments are Employees
| desktops that are OFFLINE during the time of rollout?
|
| >I'm out of the loop if this crowdstrike update was such
| a scenario where best practices for software deployment
| were worth bypassing.
|
| I did post a question: what about other Cybersecurity
| vendors? Do you think they do canary deployment on their
| AV definitions?
|
| Here's more context to understand Cybersecurity:
| https://radixweb.com/blog/what-is-mean-time-to-detect
|
| Cybersecurity companies participate in annual security
| evaluations that measure and grade their performance. That
| grade is an input for organizations selecting vendors, in
| addition to their own metrics/measurements.
|
| I don't know if MTTD is included in the contract/SLA. If
| it is, you have some answer as to why certain decisions
| are made.
|
| It's definitely interesting to see Software developers of
| HN giving out their 2c for a niche Cybersecurity
| industry.
| whynotminot wrote:
| > You can guess, but you cannot be 100% sure.
|
| I worked in the cyber security space for a decent chunk
| of my career, and the most frustrating part was cyber
| security engineers thinking their problems were unique
| and being completely unaware of the lessons software
| engineering teams have already learned.
|
| Yes, you need to tune your canary deployment groups to be
| large and diverse enough to give a reliable indicator of
| deployment failure, while still keeping them small enough
| that they achieve their purpose of limiting blast radius.
|
| Again, if you follow industry best practices for software
| deployment, this is already something that should be
| considered. This is a relatively solved problem -- this
| is not new.
|
| > I did post a question: what about other Cybersecurity
| vendors? Do you think they do canary deployment on their
| AV definitions?
|
| I think that question is being asked right now by every
| company using Crowdstrike -- what vendors are actually
| doing proper release engineering and how fast can we
| switch to them so that this never happens to us again?
| hello_moto wrote:
| >if you follow industry best practices for software
| deployment, this is already something that should be
| considered. This is a relatively solved problem -- this
| is not new.
|
| You have to ask the customer if they're okay with that
| citing "our software might fail and brick your
| machine".
|
| I'd like to see any Sales and Marketing folks say that ;)
|
| > I think that question is being asked right now by every
| company using Crowdstrike -- what vendors are actually
| doing proper release engineering and how fast can we
| switch to them so that this never happens to us again?
|
| Uber valid question and this BSOD incident might be a
| turning point for customers to pay up more for their IT
| infrastructure.
|
| It's like: previously Cybersecurity vendors are shy to
| ask customers to setup Canary systems because that's just
| "one-more-thing-to-do". After BSOD: customers will
| smarten up and do it without being asked and to the point
| where they would ask Vendors to _support_ that type of
| deployment (unless they continue to be cheap and lazy).
| whynotminot wrote:
| > You have to ask the customer if they're okay with that
| citing "our software might fail and brick your
| machine".
|
| I think you're still missing the point of Canary
| deployments. The question your sales team should ask is
| "would you like a 5% chance of a bug harming your system,
| or a 100% chance?"
|
| > It's like: previously Cybersecurity vendors are shy to
| ask customers to setup Canary systems because that's just
| "one-more-thing-to-do"
|
| You should be shy because it is not your customer's job
| to set up canary deployments. Crowdstrike owns the
| software and the deployment process. They should be
| deploying to a subset of machines, measuring the results,
| and deciding whether to roll forward or roll back. It is
| not the customer's job to implement good release
| engineering controls for Crowdstrike (although after this
| debacle you may well see customers try).
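The "deploy to a subset, measure, then roll forward or roll back" loop described above can be sketched in a few lines. This is a minimal illustration, not a description of any vendor's pipeline: the function name `staged_rollout`, the wave percentages, and the failure threshold are all assumptions chosen for the example. The `deploy` callback is expected to report a host that never checks in as a failure, since a crashed machine and an offline one look the same from the outside.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class WaveResult:
    wave: int
    hosts: int
    failures: int


def staged_rollout(
    hosts: Sequence[str],
    deploy: Callable[[str], bool],
    wave_percents: Sequence[int] = (1, 5, 25, 100),  # assumed wave sizes
    max_failure_rate: float = 0.02,                  # assumed threshold
) -> tuple[bool, list[WaveResult]]:
    """Deploy in growing waves and stop as soon as one wave is unhealthy.

    `deploy(host)` returns True if the host reports healthy afterwards.
    """
    results: list[WaveResult] = []
    done = 0
    for wave, percent in enumerate(wave_percents, start=1):
        # Cumulative number of hosts that should be updated after this wave.
        target = max(done + 1, len(hosts) * percent // 100)
        target = min(target, len(hosts))
        batch = hosts[done:target]
        if not batch:
            break
        failures = sum(1 for host in batch if not deploy(host))
        results.append(WaveResult(wave, len(batch), failures))
        done = target
        if failures / len(batch) > max_failure_rate:
            return False, results  # halt: later waves never get the update
    return True, results
```

With 1,000 hosts and an update that breaks every machine it touches, this sketch stops after the first wave of 10 hosts; the other 990 are never touched. How short the pause between waves can be is a separate tuning decision.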
| hello_moto wrote:
| If you refer Canary deployment as the vendor's internal
| deployment? I definitely agree.
|
| What I find it hard is those in Software that suggested
| to roll it to a few customers first because this isn't
| cloud deployment doing A/B test when it comes to Virus
| Definition.
|
| Customers must know what's going on when it comes to
| virus definition and the implication of them whether
| they're being part of the rollout group or not.
| whynotminot wrote:
| > If you refer Canary deployment as the vendor's internal
| deployment? I definitely agree.
|
| No, I'm talking about external deployment to customers.
| They clearly also had a massive failure in their internal
| processes too, since a bug this egregious should never
| make it to the release stage. But that is not what I am
| talking about right now.
|
| > What I find it hard is those in Software that suggested
| to roll it to a few customers first because this isn't
| cloud deployment doing A/B test when it comes to Virus
| Definition.
|
| I don't care what you're releasing to customers--
| application binary, configuration change, virus
| definition, etc, if it has the chance of doing this much
| damage it must be deployed in a controlled, phased way.
| You cannot 100% one-shot deploy any change that has the
| potential to boot-loop a massive amount of systems like
| this. This current process is unacceptable.
|
| > Customers must know what's going on when it comes to
| virus definition and the implication of them whether
| they're being part of the rollout group or not.
|
| Who says they don't have to know? Telling your customers
| that an update is planned and giving them a time window
| for their update seems reasonable to me.
| hello_moto wrote:
| If it's virus defn, what's the process here?
|
| * 0day is happening
|
| * Cybersecurity vendors preparing virus definition
|
| * Vendors send update => new virus definition is about to
| go down in 1 hour, get ready.
|
| Folks are asleep, nobody reads it?
|
| Let's say now let's do Canary: let's deploy to a few
| customers (this is unclear how this started: should this
| be opt-in? opt-out?)
|
| Some customers got it, others... who knows, unclear what
| the processes are here.
|
| Between here and there, 0day exploited customers because
| AV defn is not there. What now?
|
| I'm not sure how this plays out tbh.
| lordnacho wrote:
| > If I were the customers and I found out that I was used
| as test subject, how would I feel?
|
| In reality, every business has relationships that it
| values more than others. If I wasn't paying a lot for it,
| and if I was running something that wasn't critical (like
| my side project) then why not? You can price according to
| what level of service you want to provide.
| hello_moto wrote:
| Customers will ask to opt-out.
| ahtihn wrote:
| Customers will _pay_ to opt out.
| gtsop wrote:
| Why even do that? We have virtualization, they could
| emulate real clients and networks of clients. This
| particular bug would have been prevented for sure
| lordnacho wrote:
| Yeah I thought maybe the VM thing might not catch the bug
| for some reason, but it seems like the natural thing to
| do. Spin up VM, see if there's a crash. I heard the
| technical reason had something to do with a file being
| full of nulls, but that sort of thing you should catch.
|
| Honestly, the most generous excuse I can think of is that
| CS were informed of some sort of vulnerability that would
| have profound consequences immediately, and that
| necessitated a YOLO push. But even that doesn't seem too
| likely.
| brendangregg wrote:
| Right, and we wanted to talk about all security solutions and
| not make this about one company. We also wanted to avoid
| shaming since they have been seriously working on eBPF
| adoption, so in that regard they are at the forefront of
| doing the right thing.
| hiddencost wrote:
| I think the article isn't about crowd strike. It's about ebpf.
| pimlottc wrote:
| The second paragraph is 100% about Crowdstrike. It even links
| to the Wikipedia article:
|
| https://en.m.wikipedia.org/wiki/2024_CrowdStrike_incident
| hiddencost wrote:
| CrowdStrike is mentioned, but the goal of the article is to
| promote eBPF. CrowdStrike is tangentially related because
| it draws attention to a platform that Gregg has put a lot
| into.
| kayo_20211030 wrote:
| This isn't right. If I need a system to run _with_ a piece of
| code, then it shouldn't run at all if that piece of code is
| broken. Ignoring the failure is perverse. Let's say that the
| driver code ensures that some medical machine has safety locks
| (safeguards) in place to make sure that piece of equipment won't
| fry you to a crisp; I'd prefer that the whole thing not run at
| all rather than blithely operate with the safeguards disabled.
| It's turtles all the way down.
| Smaug123 wrote:
| I think the premise is false? It's up to the eBPF implementor
| what to do in the case of invalid input; the kernel could
| choose to perform a controlled shutdown in that case. (I have
| no idea what e.g. Linux actually does here, but one could
| imagine worlds where the action it takes on invalid input is
| configurable.)
|
| Also your statement is _sometimes_ not true, although I
| certainly sympathise in the mainline case. In some contexts you
| really do need to keep on trucking. The first example to spring
| to mind is "the guidance computers on an automated Mars
| lander"; the round-trip to Earth is simply too long to defer
| responsibility in that case. If you shut down then you _will_
| crash, but if you do your best from a corrupted state then you
| merely _probably_ crash, which is presumably better.
| umanwizard wrote:
| > I have no idea what e.g. Linux actually does here
|
| If you attempt to load an eBPF program that the verifier
| rejects, the syscall to load it fails with EINVAL or E2BIG.
| What your user-space program then does is up to you, of
| course.
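One possible answer to "what your user-space program then does" is to fall back to the last program that did load. The sketch below assumes a hypothetical `load` callable that wraps the program-load syscall and raises `OSError` carrying the kernel's errno; it is not a real library API. Only the two errno values named above are treated as a verifier verdict, and anything else is re-raised.

```python
import errno
from typing import Callable, Optional


def load_with_fallback(
    load: Callable[[bytes], int],      # hypothetical syscall wrapper
    candidate: bytes,
    last_known_good: Optional[bytes],
) -> tuple[str, Optional[int]]:
    """Try the new program; if it is rejected, fall back to the old one.

    Returns which program ended up loaded and its file descriptor.
    """
    try:
        return "candidate", load(candidate)
    except OSError as exc:
        if exc.errno not in (errno.EINVAL, errno.E2BIG):
            raise  # e.g. EPERM is not a rejection of the program itself
    if last_known_good is not None:
        return "last_known_good", load(last_known_good)
    # Nothing loadable: run without the sensor and report it upstream.
    return "disabled", None
```

Whether "disabled" is an acceptable end state, or whether the machine should refuse to continue, is exactly the policy question debated in this subthread; the point is only that the decision is made in user space with the kernel still running.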
| phartenfeller wrote:
| The medical machine software should just refuse to run with an
| error message if a critical driver was not loaded. The OS
| bricking is causing way more trouble where an IT technician now
| needs to fix something where it otherwise would just be
| updating the faulty driver... Also does your car not start if
| you are missing water for the wiper?
| jve wrote:
| Water for the wiper is userland feature.
|
| 3rd party hooking into kernel is 3rd party responsibility. It
| is like equipping your car with LPG - THAT hooks into the engine
| (kernel). And when I had a faulty gas pressure sensor, my
| car actually halted (BSOD if you will) instead of
| automatically failing over to gasoline as it is designed to.
|
| You can argue that car had no means to continue execution but
| kernel has, however invalid kernel state can cause more
| corruption down the road. Or as parent even points out -
| carry out lethal doses of something.
| pinebox wrote:
| Initially I was inclined to disagree ("these things should
| always fail safe") however with more and more stuff being
| pushed into the kernel it's hard to say that you're wrong
| or exactly where a line needs to be drawn between
| "minimally functional system" and "dangerously out of
| control system".
|
| I think until we discover a technology that forces
| commercial software vendors to employ functioning QA
| departments none of this will really solve anything.
| ChrisMarshallNY wrote:
| _> Ignoring the failure is perverse._
|
| If the failed system is a security module, I think that's
| absolutely correct. If the system runs, without the security
| module, well, that's like forgetting to pack condoms on Shore
| Leave. You'll likely be bringing something back to the ship
| with you.
|
| _Someone_ needs to be testing the module, and the enclosing
| system, to make sure it doesn't cause problems.
|
| I suspect that it got a great deal of automated unit testing,
| but maybe not so much fuzz and monkey (especially "Chaos
| Monkey"-style) testing.
|
| It's a fuzzy, monkey-filled world out there...
| kayo_20211030 wrote:
| Interesting analogy, but yes. If the module *is* necessary,
| well, it's necessary and nothing should work without it.
| Testing must have been a mess here.
| __MatrixMan__ wrote:
| I like how Unison works for this reason. You call functions by
| cryptographic hash, so you have some assurance that you're
| calling the same function you called yesterday.
|
| Updates would require the caller to call different functions
| which means putting the responsibility in the hands of the
| caller, where it should be, instead of on whoever has a side
| channel to tamper with the kernel.
|
| You end up with the work-perfectly-or-not-at-all behavior that
| you're after because if the function that goes with the
| indicated hash is not present, you can't call it, and if it is
| present you can't call it in any way besides how it was
| intended
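The call-by-hash idea can be illustrated with a toy registry. This is only a sketch of the concept: Unison's actual hashing scheme is more involved, and hashing raw source text, the `HashRegistry` class, and its method names are all simplifications invented for this example.

```python
import hashlib
from typing import Callable


class HashRegistry:
    """Toy registry in which functions are looked up by a hash of their source."""

    def __init__(self) -> None:
        self._functions: dict[str, Callable[..., object]] = {}

    def register(self, source: str, name: str) -> str:
        namespace: dict[str, object] = {}
        exec(source, namespace)  # illustration only: never exec untrusted text
        digest = hashlib.sha256(source.encode("utf-8")).hexdigest()
        self._functions[digest] = namespace[name]  # type: ignore[assignment]
        return digest

    def call(self, digest: str, *args: object) -> object:
        # An unknown hash raises KeyError: there is no "closest match" to run.
        return self._functions[digest](*args)
```

Registering a changed definition yields a different digest, so a caller holding the old digest keeps getting the old behavior until it deliberately switches; a digest that is not present cannot be called at all.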
| enragedcacti wrote:
| I agree that some system components should be treated as
| critical no matter what, but the software at issue in this case
| (Falcon Sensor or Antivirus more generally) is precautionary
| and only best effort anyways. I would wager the vast majority
| of the orgs affected on Friday would have preferred the
| marginally increased risk of a malware attack or unauthorized
| use over a 24 hour period instead of the total IT collapse they
| experienced. Further, there's no reason the bug HAD to cause a
| BSOD, it's possible the systems could have kept on trucking but
| with an undefined state and limitless consequences. At least
| with eBPF you get to detect a subset of possible errors and
| make a risk management decision based on the result.
| kayo_20211030 wrote:
| I'm with you. What's critical, and what's not? Is it a big
| thing, or not a big thing? Is this particular machine more
| critical than the one over there? Security systems need to be
| at the lowest level, or else some shifty bastard will find a
| path around them. If it's at the lowest level, the downside
| of a failure is catastrophic, as we experienced last Friday.
| The carnage here is ultimately on CrowdStrike. The testing
| must have been slapdash at best, and missing at worst. eBPF
| changes nothing. The question is: should we fail, or carry
| on? eBPF doesn't help with that decision, it only determines
| the outcome from a system perspective. Any decision is a
| value judgement; it might be right or wrong, and its outcome
| either benign or deadly. Choices!
| emn13 wrote:
| The system clearly already behaves that way (i.e. ignores
| failure) - after all, the fix was to simply delete the
| offending file. If that's an option, then loader can do that
| too. It can and perhaps even is smarter, such as "fallback onto
| previous version".
|
| Furthermore, the reaction to a malformed state need not be
| "ignore". It could disable restricted user login; or turn off
| the screen.
|
| If the worry is that this is viable to abuse by malware, well,
| if the malware can already rewrite the on-disk files for the
| AV, I wonder whether it's really a good idea to trust the
| system itself to be able to deal with that. It'd probably be
| safer to just report that up the security foodchain, and
| potentially let some external system take measures such as
| disable or restrict network access. Better yet, such measures
| don't even require the same capabilities to intervene in the
| system, merely to observe - which makes the AV system less
| likely to serve as a malware vector itself or to cause bugs
| like this.
| shrx wrote:
| From the article:
|
| > If the verifier finds any unsafe code, the program is rejected
| and not executed. The verifier is rigorous -- the Linux
| implementation has over 20,000 lines of code [0] -- with
| contributions from industry (e.g., Meta, Isovalent, Google) and
| academia (e.g., Rutgers University, University of Washington).
|
| [0] links to
| https://github.com/torvalds/linux/blob/master/kernel/bpf/ver...
| which has this interesting comment at the top:
| /* bpf_check() is a static code analyzer that walks eBPF program
|  * instruction by instruction and updates register/stack state.
|  * All paths of conditional branches are analyzed until 'bpf_exit' insn.
|  *
|  * The first pass is depth-first-search to check that the program is a DAG.
|  * It rejects the following programs:
|  * - larger than BPF_MAXINSNS insns
|  * - if loop is present (detected via back-edge)
|  ...
|
| I haven't inspected the code, but I thought that checking for
| infinite loops would imply solving the halting problem. Where's
| the catch?
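The "detected via back-edge" check in the quoted comment is a property of the control-flow graph, not of the program's runtime behavior, so it does not require deciding whether an arbitrary program halts. A minimal sketch of that kind of check follows; it is an illustration in Python with an assumed graph representation (a mapping from node to successor nodes), not the kernel's implementation.

```python
from typing import Mapping, Sequence

WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the DFS stack, finished


def has_back_edge(cfg: Mapping[int, Sequence[int]], entry: int = 0) -> bool:
    """Return True if some edge reachable from `entry` points at an ancestor.

    Every successor must itself be a key of `cfg`.
    """
    colour = {node: WHITE for node in cfg}
    colour[entry] = GREY
    stack = [(entry, iter(cfg[entry]))]
    while stack:
        node, successors = stack[-1]
        for nxt in successors:
            if colour[nxt] == GREY:
                return True  # edge into a node still on the stack: a loop
            if colour[nxt] == WHITE:
                colour[nxt] = GREY
                stack.append((nxt, iter(cfg[nxt])))
                break  # descend into the child first
        else:
            colour[node] = BLACK  # all successors explored
            stack.pop()
    return False
```

A diamond-shaped graph such as `{0: [1, 2], 1: [3], 2: [3], 3: []}` has no back-edge, while `{0: [1], 1: [2], 2: [1, 3], 3: []}` does. The price is that every loop is flagged, including ones that would obviously terminate.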
| dtx1 wrote:
| I have no insight into this particular project but you could
| work around the halting problem by only allowing loops you can
| prove will not go infinite. That would of course imply
| rejecting loops that won't go infinite but can't be proven not
| to.
| hiddencost wrote:
| Unterminated loops might be a better phrasing.
| efee22 wrote:
| Infinite loops are not possible and would get rejected by the
| verifier since it cannot solve the halting problem. Here is a
| good overview of the options available:
| https://ebpf-docs.dylanreimerink.nl/linux/concepts/loops/
| skywhopper wrote:
| If the verifier can't determine that the loop will halt, the
| program is disallowed. Also, if the program gets passed and
| then runs too long anyway, it's force-halted. So... I guess
| that solves the halting problem.
| neaanopri wrote:
| It's more accurate to say that in principle, there could be
| programs that would halt, but that the verifier will deny.
| lucianbr wrote:
| So this "solves" the halting problem by creating a new class
| "might-not-halt-but-not-sure" and lumping it with
| "does-not-halt". I find it hard to believe the new class is small
| enough for this to be useful, in the sense that it will avoid
| all kernel crashes.
|
| I rather expect useful or needed code would be rejected due
| to "not-sure-it-halts", and then people will use some kind of
| exception or not use the verifier at all, and then we are
| back to square one.
| umanwizard wrote:
| Well it is useful in practice, there are some pretty useful
| products based on eBPF on Linux, most notably Cilium (and,
| shameless plug for the one I'm working on: Parca, an
| eBPF-based CPU profiler).
| lucianbr wrote:
| Bad wording on my part, and I still don't know how to
| word it better. I'm sure this thing is useful, I don't
| think everyone who contributed code was just clueless.
|
| However, the claim "in the future, computers will not
| crash due to bad software updates, even those updates
| that involve kernel code" must be false. There is no way
| it is true. Whatever Cilium is, I cannot believe it
| generally prevents kernel crashes.
| umanwizard wrote:
| Correct, you will never be able to write any possible
| arbitrary code and have it run in eBPF. It necessarily
| constrains the class of programs you can write. But the
| constrained set is still quite useful and probably
| includes the crowdstrike agent.
|
| Also, although this isn't the case now, it's possible to
| imagine that the verifier could be relaxed to allow a
| Turing-complete subset of C that supports infinite loops
| while still rejecting sources of UB/crashes like
| dereferencing an invalid pointer. I suspect from reading
| this post that that is the future Mr. Gregg has in mind.
|
| > Whatever Cilium is, I cannot believe it generally
| prevents kernel crashes.
|
| It doesn't magically prevent all kernel crashes from
| unrelated code. But what we can say is that Cilium itself
| can't crash the kernel unless there are bugs in the eBPF
| verifier.
| lucianbr wrote:
| If the verifier allowed a Turing-complete language, it
| would solve the halting problem, which is impossible.
| umanwizard wrote:
| My point is that the verifier could be relaxed to accept
| programs that never halt, thus not needing to solve the
| halting problem. You could then have the kernel just kill
| it after running over a certain maximum amount of time.
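The "accept it, then kill it if it runs too long" idea can be shown with a toy interpreter that enforces an instruction budget. This is a hypothetical sketch of the proposal, not how any existing eBPF runtime works; the four-opcode instruction set and the names `run` and `BudgetExceeded` are made up for the example.

```python
from typing import Sequence, Tuple

# ("add", n), ("jmp", target), ("jnz", target) or ("exit", 0)
Instr = Tuple[str, int]


class BudgetExceeded(Exception):
    """Raised when a program uses up its instruction budget."""


def run(program: Sequence[Instr], budget: int = 1000) -> int:
    """Interpret a one-accumulator toy program for at most `budget` steps."""
    acc = 0
    pc = 0
    for _ in range(budget):
        op, arg = program[pc]
        if op == "exit":
            return acc
        if op == "add":
            acc += arg
            pc += 1
        elif op == "jmp":
            pc = arg
        elif op == "jnz":  # jump if the accumulator is non-zero
            pc = arg if acc != 0 else pc + 1
        else:
            raise ValueError(f"unknown opcode {op!r}")
    raise BudgetExceeded(f"no exit after {budget} instructions")
```

A countdown loop such as `[("add", 3), ("add", -1), ("jnz", 1), ("exit", 0)]` finishes well within the budget, while `[("jmp", 0)]` is cut off. The budget bounds running time only; it says nothing about whether state the program was updating is left consistent when it is stopped.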
| lucianbr wrote:
| Why do you think the kernel crashes when crowdstrike
| attempts to reference some unavailable address (or
| whatever it does) instead of just denying that operation
| and continuing on? That would be the solution using this
| philosophy "just kill long running program". And no need
| for eBPF or anything complicated. But it doesn't work
| that way in practice.
|
| This is just such a naive view. "We can prevent programs
| from crashing by just taking care to stop them when they
| do bad things". Well, sure, that's why you have a kernel
| and userland. But it turns out, some things need to run
| in the kernel. Or "just deny permission". Then it turns
| out some programs need to run as admin. And so on.
|
| There is a generality in the halting problem, and saying
| "we'll just kill long running programs" just misses the
| point entirely.
|
| Likely what will happen is that you will kill useful
| long-running programs, then an exception mechanism will
| be invented so some programs will not be killed, because
| they need to run longer, then one of those programs will
| go into an infinite loop despite all your mechanisms
| preventing it. Just like the crowdstrike driver managed
| to bring down the OS despite all the work that is
| supposed to prevent the entire computer crashing if a
| single program tries something stupid.
| umanwizard wrote:
| > Why do you think the kernel crashes when crowdstrike
| attempts to reference some unavailable address (or
| whatever it does) instead of just denying that operation
| and continuing on?
|
| Linux and windows are completely monolithic kernels; the
| crowdstrike agent isn't running in a sandbox and has
| complete unfettered access to the entire kernel address
| space. There is no separate "the kernel" to detect when
| the agent does something wrong; once a kernel module is
| loaded, IT IS the kernel.
|
| Lots of people have indeed realized this is undesirable
| and that there should be a sandboxed way to run kernel
| code such that bugs in it can't cause arbitrarily bad
| undefined behavior. Thus they invented eBPF. That's
| precisely what eBPF is.
|
| I don't know whether it's literally true that someday you
| will be able to write all possibly useful kernel-mode
| code in eBPF. But the spirit of the claim is true:
| there's a huge amount of useful software that could be
| written in eBPF today on Linux instead of as kernel
| modules, and this includes crowdstrike. Thus Windows
| supporting eBPF, and crowdstrike choosing to use it,
| would have solved this problem. That set of software will
| increase as the eBPF verifier is enhanced to accept a
| wider variety of programs.
|
| Just like you can write pretty much any useful program in
| JavaScript today -- a sandboxed language.
|
| You're also correct that due to the halting problem,
| we'll either have to accept that eBPF will never be
| Turing complete, OR accept that some eBPF programs will
| never halt and deal with the issues in other ways. Just
| like Chrome's JavaScript engine has to do. I don't really
| view this as a fundamentally unsolvable issue with the
| nature of eBPF.
| tptacek wrote:
| The claim isn't that eBPF generally prevents kernel
| crashes. It's that it prevents crashes in the subset of
| programs it's designed for, in particular for
| instrumentation, which Crowdstrike is (in this author's
| conception) an instance of.
| lucianbr wrote:
| I have quoted the claim verbatim from the article. It is
| obviously the claim of the article.
| tptacek wrote:
| It's referring to _Windows security software_. If you
| have a lot of context with eBPF, which Gregg obviously
| does, the notion that eBPF will subsume the entire kernel
| doesn't even need to be said: you can't express
| arbitrary programs in eBPF. eBPF is safe because the
| verifier rejects the vast majority of valid programs.
| tptacek wrote:
| Lots of useful code is rejected due to "not-sure-it-halts".
| That's the premise.
| pkhuong wrote:
| The basic logic flags _any_ loop ("back-edge").
| rezonant wrote:
| This, others have said it less concisely, but a program
| without loops and arbitrary jumps is guaranteed to halt if we
| assume the external functions it calls into will halt.
| atrus wrote:
| The halting problem is exhaustive, there isn't an algorithm
| that is valid for all programs. You can still check for some
| kinds of infinite loops though!
| roywiggins wrote:
| More specifically, you can accept a set of programs that you
| are certain do halt, and reject all others, at the expense of
| rejecting some that will halt. As long as that set is large
| enough to be practical, the result can be useful. If you eg
| forbid code paths that jump "backwards", you can't really
| loop at all. Or require loops to be bounded by constants.
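The "loops bounded by constants" option can be made concrete with a checker that computes a worst-case step count and accepts only programs that provably fit within a limit. The tiny structured language, the function names, and the default limit below are assumptions for illustration; no real verifier uses this representation.

```python
from typing import Sequence, Union

# A statement is either the string "op" (one instruction) or a tuple
# ("loop", count, body), where count must be an integer constant.
Stmt = Union[str, tuple]


def worst_case_steps(body: Sequence[Stmt]) -> int:
    """Upper bound on executed instructions; raises on a non-constant bound."""
    total = 0
    for stmt in body:
        if stmt == "op":
            total += 1
        elif isinstance(stmt, tuple) and len(stmt) == 3 and stmt[0] == "loop":
            _, count, inner = stmt
            if isinstance(count, bool) or not isinstance(count, int) or count < 0:
                raise ValueError("loop bound must be a non-negative integer constant")
            total += count * worst_case_steps(inner)
        else:
            raise ValueError(f"unsupported statement: {stmt!r}")
    return total


def accept(body: Sequence[Stmt], limit: int = 1_000_000) -> bool:
    """Accept only programs whose worst case provably fits within `limit`."""
    try:
        return worst_case_steps(body) <= limit
    except ValueError:
        return False
```

A program like `[("loop", "n", ["op"])]` is rejected even if `n` would always be small at run time, which is the "rejecting some that will halt" cost mentioned above.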
| aksdlf wrote:
| I'm glad to hear that Meta and Google code is "rigorous". I'd
| prefer INRIA, universities that fund theorem provers,
| industries where correctness matters like aerospace or
| semiconductors.
| chc4 wrote:
| Windows doesn't use the Linux eBPF verifier, they have their
| own implementation named PREVAIL[0] that is based on an
| abstract interpretation model that has formal small step
| semantics. The actual implementation isn't formally proven,
| however.
|
| 0: https://github.com/vbpf/ebpf-verifier
| auspiv wrote:
| Correctness as defined by Boeing? Or another definition?
|
| "The Maneuvering Characteristics Augmentation System (MCAS)
| is a flight stabilizing [software] feature developed by
| Boeing that became notorious for its role in two fatal
| accidents of the 737 MAX in 2018 and 2019, which killed all
| 346 passengers and crew among both flights."
|
| https://en.wikipedia.org/wiki/Maneuvering_Characteristics_Au.
| ..
|
| "The Boeing Orbital Flight Test (OFT) was an uncrewed orbital
| flight test launched on December 20, 2019, but after
| deployment, an [incorrect] 11-hour offset in the mission
| clock of Starliner caused the spacecraft to compute that "it
| was in an orbital insertion burn", when it was not. This
| caused the attitude control thrusters to consume more fuel
| than planned, precluding a docking with the International
| Space Station.[79][80]"
|
| [79] https://spacenews.com/starliner-suffers-off-nominal-
| orbital-... "Starliner suffers "off-nominal" orbital
| insertion after launch". SpaceNews. December 20, 2019.
| Archived from the original on June 6, 2024. Retrieved
| December 20, 2019.
|
| [80] https://www.cnbc.com/2019/12/20/boeings-starliner-flies-
| into... Sheetz, Michael (December 20, 2019). "Boeing
| Starliner fails mission, can't reach space station after
| flying into wrong orbit". CNBC. Archived from the original on
| February 8, 2021. Retrieved December 20, 2019.
| SoftTalker wrote:
| Also that lines of code is a proxy for rigor, something new I
| learned today. /s
| sunnyps wrote:
| I think they mean that the code base is small enough to be
| audited thoroughly. Maybe they should reword it to be
| clearer.
| umanwizard wrote:
| eBPF is not Turing complete. Writing it is very annoying
| compared to writing normal C code for exactly this reason.
| Retr0id wrote:
| The halting problem cannot be solved in the general case, but
| in many cases you _can_ prove that a program halts. eBPF only
| allows verifiably-halting programs to run.
| lolinder wrote:
| I'm not able to comment on what this code is doing, but as for
| the theory:
|
| The halting problem is only unsolvable in the general case. You
| cannot prove that any arbitrary piece of code will stop, but
| you can prove that specific types of code will stop and reject
| anything that you're unable to prove. The trivial case is "no
| jumps"--if your code executes strictly linearly and is itself
| finite then you know it will terminate. More advanced cases can
| also be proven, like a loop over a very specific bound, as long
| as you can place constraints on how the code can be structured.
|
| As an example, take a look at Dafny, which places a lot of
| restrictions on loops [0], only allowing the subset that it can
| effectively analyze.
|
| [0] https://ece.uwaterloo.ca/~agurfink/stqam/rise4fun-
| Dafny/#h25
| jkrejcha wrote:
| Adding on (and it's not terribly relevant to eBPF), it's also
| worth noting that there are trivial programs you can prove
| DON'T halt.
|
| A trivial example[1]:
|
|     int main() {
|         while (true) {}
|         int x = foo();
|         return x;
|     }
|
| This program trivially runs forever[2], and indeed many
| static code analyzers will point out that everything after
| the `while (true) {}` line is unreachable.
|
| I feel like the halting problem is incredibly widely
| misunderstood as being about "ANY program" when it
| really talks about "ALL programs".
|
| [1]: In _C++_, this is undefined behavior technically, but C
| and most other programming languages define the behavior of
| this (or equivalent) function.
|
| [2]: Fun relevant xkcd: https://xkcd.com/1266/
| fwip wrote:
| EDIT: I am incorrect, please ignore. (Original text below,
| for posterity).
|
| Nit: In many languages, doesn't this depend on what foo()
| does? e.g: foo() { exit(0); }
| loeg wrote:
| No? The foo() invocation is never reached because the
| while loop never terminates.
| fwip wrote:
| Apologies; I misread the function call as being inside
| the loop.
| dathinab wrote:
| the halting problem is only true for _arbitrary_ programs
|
| but there are always sets of programs for which it is clearly
| possible to guarantee their termination
|
| e.g. the program `return 1+1;` is guaranteed to halt
|
| e.g. a program like `while condition(&mut state) { ... }`,
| where `condition()` is guaranteed to halt but otherwise
| unknown, is not guaranteed to halt, but if you turn it into `for
| _ in 0..1000 { if !condition(&mut state) { break; } ... }` then
| it is guaranteed to halt after at most 1000 iterations
|
| or in other words eBPF only accepts programs which it can prove
| will halt in at most maxins "instructions" (though it's
| stricter than my example, i.e. you would need to unroll the for-
| loop to make it pass validation)
|
| the thing with programs which are provable halting is that they
| tend to also not be very convenient to write and/or quite
| limited in what you can do with them, i.e. they are not
| suitable as general purpose programming languages at all
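
The loop rewrite described above can be sketched as follows. This is only an illustration of the transformation: `run_bounded`, `still_positive`, and `countdown` are hypothetical names, and a real eBPF verifier reasons about bytecode rather than Python callables.

```python
def run_bounded(condition, body, state, max_iters=1000):
    """Run body while condition holds, but never more than max_iters times.

    Assumes condition and body each halt; the fixed bound then
    guarantees that the whole loop halts as well.
    """
    iterations = 0
    for _ in range(max_iters):
        if not condition(state):
            break
        body(state)
        iterations += 1
    return iterations


def still_positive(state):
    return state["n"] > 0


def countdown(state):
    state["n"] -= 1


# A loop that ends on its own stops early ...
print(run_bounded(still_positive, countdown, {"n": 5}))
# ... and one whose condition never becomes false is cut off at the bound.
print(run_bounded(lambda s: True, lambda s: None, {}, max_iters=1000))
```

The price of the guarantee is visible in the second call: a computation that genuinely needs more than `max_iters` steps is truncated rather than completed.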
| red_admiral wrote:
| eBPF is not Turing-complete, I suppose.
| lizxrice wrote:
| In this talk we demo Conway's Game of Life implemented in
| eBPF: https://www.youtube.com/watch?v=tClsqnZMN6I
| lizxrice wrote:
| I should clarify that individual eBPF programs have to
| terminate, but more complex problems can be solved with
| multiple eBPF programs, and can be "scheduled" indefinitely
| using BPF timers
| javierhonduco wrote:
| It is not, programs that are accepted are proved to
| terminate. Large and more complex programs are accepted by
| BPF as of now, which might give the impression that it's now
| Turing complete, when it is definitely not the case.
| skywhopper wrote:
| The implicit assumption of the article is that eBPF code can't
| crash a kernel, but the article itself eventually admits that it
| can and has done, including last month. eBPF is a safer way of
| providing kernel-extension functionality, for sure, but
| presenting it as the perfect solution is just asking to have your
| argument dismissed. eBPF is not perfect. And there's plenty of
| things it can't do. The very sandbox rules that limit how long
| its programs may run and what they can do also make it entirely
| inappropriate for certain tasks. Let's please stop pretending
| there's a silver bullet.
| efee22 wrote:
| It's not a silver bullet, however, it is still better to
| push all the panicable bugs into one community-maintained
| section (e.g. eBPF verifier). All vendors have an incentive to
| help get it right and this is much better than every vendor
| shipping their own panicable bugs in their own out of tree
| kernel modules. Additionally, it's not just the industry
| looking at eBPF, but also academia in terms of formally
| verifying these critical sections.
| lucianbr wrote:
| "Improves kernel stability" is great. "Prevents kernel
| crashes" is a plain lie.
|
| > In the future, computers will not crash due to bad software
| updates, even those updates that involve kernel code.
|
| Come on. Computers will continue to crash in the future, even
| when using eBPF. I am quite certain.
| lucianbr wrote:
| It's casually claiming to have solved the halting problem, at
| least within some limited but useful context. That should be
| impossible, and it turns out, it is.
|
| I expect it can be solved within some limited contexts, but
| those contexts are not useful, at least not at the level of
| "generic kernel code".
| red_admiral wrote:
| It solves the halting problem by not being Turing complete. I
| presume each eBPF runs in a context with bounded memory,
| requested up front, for one thing; it also disallows jumps
| unless you can prove the code still halts.
| michaelt wrote:
| eBPF started out as Berkeley Packet Filters. People wanted to
| be able to set up complex packet filters. Things like 'udp
| and src host 192.168.0.3 and udp[4:2]=0x0034 and
| udp[8:2]=0x0000 and udp[12]=0x01 and udp[18:2]=0x0001 and not
| src port 3956'
|
| So BPF introduced a very limited bytecode, which is complex
| enough that it can express long filters with lots of
| and/or/brackets - but which is limited enough it's easy to
| check the program terminates and is crash-free. It's still
| quite limited - prior to ~2019, all loops had to be fully
| unrolled at compile time as the checker didn't support loops.
|
| It turned out that, although limited, this worked pretty well
| for filtering packets - so later, when people wanted a way to
| filter all system calls they realised they could extend the
| battle-tested BPF system.
|
| Nobody is claiming to have solved the halting problem.
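
To make the byte-offset terms in that filter concrete, here is a rough sketch of what expressions like udp[4:2]=0x0034 test: a fixed number of big-endian bytes at a fixed offset into the UDP segment. It models only those four comparisons (not the host or port terms), the helper names are made up for illustration, and it is plain Python rather than BPF bytecode.

```python
def load(buf, offset, size):
    """Read size big-endian bytes at offset, or None if out of bounds."""
    if offset < 0 or offset + size > len(buf):
        return None
    return int.from_bytes(buf[offset:offset + size], "big")


def matches(udp_segment, checks):
    """Return True only if every (offset, size, expected) check passes."""
    for offset, size, expected in checks:
        if load(udp_segment, offset, size) != expected:
            return False
    return True


# udp[4:2]=0x0034 and udp[8:2]=0x0000 and udp[12]=0x01 and udp[18:2]=0x0001
CHECKS = [(4, 2, 0x0034), (8, 2, 0x0000), (12, 1, 0x01), (18, 2, 0x0001)]

# Hand-built sample: 8-byte UDP header (length field 0x0034 = 52 bytes)
# followed by a 44-byte payload laid out to satisfy the checks.
header = bytes.fromhex("12340f7400340000")
payload = bytes([0, 0, 0, 0, 1]) + bytes(5) + bytes([0, 1]) + bytes(32)
segment = header + payload

print(matches(segment, CHECKS))       # full segment
print(matches(segment[:10], CHECKS))  # truncated: reads fall out of bounds
```

Each check is a bounded read plus a comparison, and the list of checks is fixed up front, which is the property that makes this style of filter easy to show terminating.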
| lucianbr wrote:
| Did you read the article? It says computers will not crash
| in the future due to updates. It literally says that in the
| very first line of the article.
|
| > In the future, computers will not crash due to bad
| software updates, even those updates that involve kernel
| code. In the future, these updates will push eBPF code.
|
| What you are claiming is completely different. A kind of
| "firewall" for syscalls. But updates to drivers and
| software must contain code and data. The author is not
| talking about updates to the firewall between drivers and
| the kernel, they talk about updating drivers themselves. It
| literally says "updates that involve kernel code". Will the
| kernel only consist of eBPF filtering bytecode? How could
| that possibly work?
| vfclists wrote:
| Yep, another fix to all our problems, a new bandwagon to be
| jumped on by all EDR vendors, until ...
|
| Here I am using the term "EDR". Until this CrowdStrike debacle
| I'd never heard it.
|
| Only tells how seriously you should take my opinions.
| blinkingled wrote:
| Ok. But the good old push code to staging / canary it before
| mainstream updates was a simpler way of solving the same problem.
|
| Crowdstrike knows the computers they're running on, it is trivial
| to implement a system where only few designated computers
| download and install the update and report metrics before the
| update controller decides to push it to next set.
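
A rough sketch of the staged rollout described above, under stated assumptions: `install` and `healthy` are hypothetical callbacks standing in for whatever deployment and telemetry mechanism a vendor actually has, and the 1% / 10% / 100% stages are arbitrary.

```python
def staged_rollout(hosts, install, healthy, stages=(0.01, 0.10, 1.0)):
    """Push an update in growing waves; stop after the first bad wave.

    stages are cumulative fractions of the fleet and must be increasing.
    """
    updated = []
    for fraction in stages:
        target = max(1, int(len(hosts) * fraction))
        wave = hosts[len(updated):target]
        for host in wave:
            install(host)
            updated.append(host)
        # Only widen the rollout if every host in this wave reports healthy.
        if not all(healthy(host) for host in wave):
            return {"completed": False, "updated": updated}
    return {"completed": True, "updated": updated}


fleet = [f"host{i}" for i in range(200)]

bad = staged_rollout(fleet, lambda h: None, lambda h: False)
print(bad["completed"], len(bad["updated"]))

good = staged_rollout(fleet, lambda h: None, lambda h: True)
print(good["completed"], len(good["updated"]))
```

With a 200-host fleet, an update that breaks every machine stops after the first wave of two hosts instead of reaching all 200. As the replies note, this limits the blast radius but does not help with a fault that only shows up after the rollout has finished.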
| Archelaos wrote:
| It would mitigate the problem, but not solve it. You can still
| imagine a condition that only occurs after the update has been
| rolled out everywhere. Furthermore, such a bug would still be
| extremely problematic for the concerned customers, even if not
| all of them were affected. In addition, it would be necessary
| to react very quickly in the case of zero-day vulnerabilities.
| tantalor wrote:
| (semantic argument warning)
|
| "Mitigation" is dealing with an outage/breakage after it
| occurs, to reduce the impact or get system healthy again.
|
| You're talking about "prevention" which keeps it from
| happening at all.
|
| Canarying is generic approach to prevention, and should not
| be skipped.
|
| Avoiding the risk entirely (eBPF) would also help prevent
| outage, but I think we're deluding ourselves to say it
| "solves" the problem once and for all; systems will still go
| down due to bad deploys.
| blinkingled wrote:
| Yes, I am not arguing against having the ability to deal with
| it quickly - I am saying canary/staging helps you do exactly
| that. Because as we see in the case of Intel CPUs and
| Crowdstrike some problems or scale of some problems is best
| prevented.
| phartenfeller wrote:
| Why trust somebody else not messing up? With that in place for
| windows and crowdstrike billions of dollars would be saved and
| many lives not negatively impacted ...
| mrpippy wrote:
| > Once Microsoft's eBPF support for Windows becomes production-
| ready, Windows security software can be ported to eBPF as well.
|
| This doesn't seem grounded in reality. If you follow the link to
| the "hooks" that Windows eBPF makes available [1], it's just for
| incoming packets and socket operations. IOW, MS is expecting you
| to use the Berkeley Packet Filter for packet filtering. Not for
| filtering I/O, or object creation/use, or any of the other
| million places a driver like Crowdstrike's hooks into the NT
| kernel.
|
| In addition, they need to be in the kernel in order to monitor
| all the other 3rd party garbage running in kernel-space. ELAM
| (early-launch anti-malware) loads anti-malware drivers first so
| they can monitor everything that other drivers do. I highly doubt
| this is available to eBPF.
|
| If Microsoft intends eBPF to be used to replace kernel-space
| anti-malware drivers, they have a long, long way to go.
|
| [1]: https://microsoft.github.io/ebpf-for-
| windows/ebpf__structs_8...
| shahahqq wrote:
| I hope though that Microsoft will double down on their eBPF
| support for Windows after this incident.
| stackskipton wrote:
| Doubt it. Microsoft is clearly over Windows. They continue to
| produce it but every release feels like "Ugh, fine, since you
| are paying me a ton of money."
|
| Internally, Microsoft is running more and more workloads on
| Linux and externally, I've had .Net team tell me more than
| once that Linux is preferred environment for .Net. SQL Server
| team continues to push hard for Linux compatibility with
| every release.
|
| EDIT: Windows Desktop gets more love because they clearly see
| that as important market. I'm talking more Windows Server.
| kevincox wrote:
| They aren't over windows. They continue to be incredibly
| interested in and actively developing how much money they
| can suck from their users. Especially via various forms of
| ads.
|
| But yeah, kernel features are few and far between.
| rob74 wrote:
| See also: https://en.wikipedia.org/wiki/Cash_cow
| queuebert wrote:
| I believe the term you are looking for is "rent seeking".
| Other than visual changes, what new functionality does
| Windows 11 actually have that Windows XP didn't have?
| (I'm being generous with XP, because actually 95 was
| already mostly internet ready.) Yet how many times have
| many of us paid for a Windows license on a new computer
| or because the old version stopped getting updates?
| pcwalton wrote:
| > Other than visual changes, what new functionality does
| Windows 11 actually have that Windows XP didn't have?
|
| Off the top of my head, limiting myself to just NT kernel
| stuff: WSL and Hyper-V, pseudo-terminals, condvars, WDDM,
| DWM, elevated privilege programs on the same desktop,
| font driver isolation, and limiting access to win32k for
| sandboxing.
| recursive wrote:
| > what new functionality does Windows 11 actually have
| that Windows XP didn't have?
|
| Off the top of my head, built-in bluetooth support, an
| OS-level volume mixer, and more support for a wider
| variety of class-compliant devices. I'm sure there are a
| lot more, and if you actually care about the answer, I
| don't think it would be hard to find.
| queuebert wrote:
| All of this could've been added to XP, right?
| recursive wrote:
| I don't know.
|
| If it could, then XP would just be Windows 11. What's the
| objection here?
| vitus wrote:
| > Other than visual changes, what new functionality does
| Windows 11 actually have that Windows XP didn't have?
|
| Modern crypto ciphersuites that aren't utterly broken?
| Your best options for symmetric crypto with XP are 3DES
| (officially retired by NIST as of this year) and RC4
| (prohibited in TLS as of RFC 7465).
|
| (And if you think 3DES isn't totally broken by itself,
| you're right... except for the part where the ciphersuite
| in question is in CBC mode and is vulnerable to BEAST.
| Thanks, mandated ciphersuites.)
| wolrah wrote:
| > Other than visual changes, what new functionality does
| Windows 11 actually have that Windows XP didn't have?
|
| XP->Vista alone brought a bunch of huge changes that
| massively improved security (UAC), capability (64 bit
| desktops), and future-proofing (UEFI) among many many
| other things.
|
| Some helpful Wikipedia editors have answered this
| question in excessive detail, so I'm just going to link
| those for more info. Also I'm going to start with what XP
| changed from 2003 both because it makes a good comparison
| and I'd argue 2000/NT 5.0 is the root of the modern
| Windows era. Your next sentence after the quote implies
| you probably won't have a problem with that.
|
| * XP/2003:
| https://en.wikipedia.org/wiki/Features_new_to_Windows_XP
|
| * 2003R2: https://en.wikipedia.org/wiki/Windows_Server_20
| 03#Windows_Se...
|
| * Vista: https://en.wikipedia.org/wiki/Features_new_to_Wi
| ndows_Vista
|
| * 2008: https://en.wikipedia.org/wiki/Windows_Server_2008
| #Features
|
| * 7:
| https://en.wikipedia.org/wiki/Features_new_to_Windows_7
|
| * 2008R2: https://en.wikipedia.org/wiki/Windows_Server_20
| 08_R2#New_fea...
|
| * 8:
| https://en.wikipedia.org/wiki/Features_new_to_Windows_8
|
| * 2012: https://en.wikipedia.org/wiki/Windows_Server_2012
| #Features
|
| * 8.1: https://en.wikipedia.org/wiki/Windows_8.1#New_and_
| changed_fe...
|
| * 2012R2: https://en.wikipedia.org/wiki/Windows_Server_20
| 12_R2#Feature...
|
| * 10:
| https://en.wikipedia.org/wiki/Features_new_to_Windows_10
|
| * 2016: https://en.wikipedia.org/wiki/Windows_Server_2016
| #Features
|
| * 2019: https://en.wikipedia.org/wiki/Windows_Server_2019
| #Features
|
| * 2022: https://en.wikipedia.org/wiki/Windows_Server_2022
| #Features
|
| * 11:
| https://en.wikipedia.org/wiki/Features_new_to_Windows_11
|
| * 2025: https://learn.microsoft.com/en-us/windows-
| server/get-started...
|
| Obviously some of this will be "fluff" and that's up to
| your own personal definitions, but to act like there
| haven't been significant changes in every major revision
| is just nonsense.
| throwaway2037 wrote:
| This claim about SQL Server: Is it due to disk access being
| slower from NT kernel compared to Linux kernel?
| stackskipton wrote:
| It's just easier for everyone involved (outside Windows
| GUI clicker admins) if it runs on Linux. Containerization
| is easier, configuration is easier and operating system
| is much more robust.
| marcosdumay wrote:
| There's something very wrong with Windows disk access,
| you can see it easily by trying to run a Windows desktop
| with rotating disks.
|
| But SQL Server is in the unique position of being able to
| optimize Windows for their own needs. So they shouldn't
| have this kind of problem.
| riskable wrote:
| I had read previously from an unverified SQL Server
| engineer that the thing they wanted most (with Linux
| support) was proper containerization (from a developer
| perspective). Apparently containers on Windows just don't
| cut it (which is why nobody uses them in production).
| Take it with a grain of salt though.
|
| I don't think they'd ever admit that filesystem
| performance was an issue (though we all know it is; NTFS
| is over 30 years old!).
| shawnz wrote:
| > though we all know it is; NTFS is over 30 years old!
|
| ext2, which is forwards compatible with ext3 and ext4, is
| slightly older than NTFS
| mosburger wrote:
| > SQL Server team continues to push hard for Linux
| compatibility with every release.
|
| It's kinda funny that the DB that was once a fork of Sybase
| that was ported to Windows is trying to make its way back
| to Unix.
| benfortuna wrote:
| Keep in mind they don't just allow any old code to execute in
| the kernel.
|
| They do have rigorous tests (WHQL), it's just Crowdstrike
| decided that was too burdensome for their frequent updates,
| and decided to inject code from config files (thus bypassing
| the control).
|
| The fault here is entirely with Crowdstrike.
| capitainenemo wrote:
| Is there any evidence that the config files had arbitrary
| code in them? The only analysis I'd seen so far indicated a
| parsing error loading a viral signature database that was
| routinely updated, but in this case was full of garbage
| data.
| benfortuna wrote:
| Perhaps not verified, but some smart people do have
| convincing arguments:
|
| https://youtu.be/wAzEJxOo1ts?si=UNNxAN27VV1E6mcP&t=505
| capitainenemo wrote:
| Any article/blog/text-that-can-be-read?
| alecco wrote:
| Don't bother. He just repeats a tweet saying a
| null+offset dereference and also the speculation of that
| null picked from the sys file.
| remram wrote:
| How rigorous are the tests if faulty data can brick the
| machine?
| dwattttt wrote:
| Not rigorous enough to have detected this flaw in the
| kernel sensor, although effectively any bug in this
| situation (an AV driver) can brick a machine. I imagine
| WHQL isn't able to find every possible bug in a driver
| you submit to them, they're not your QA team.
| brendangregg wrote:
| Yes, we know eBPF must attach to equivalent events to Linux,
| but given there are already many event sources and consumers in
| Windows, the work is to make eBPF another consumer -- not to
| invent instrumentation frameworks from scratch.
|
| Just to use an analogy: Imagine people do their banking on
| JavaScript websites with Google Chrome, but if they use
| Microsoft Edge it says "JavaScript isn't supported, please
| download and run this .EXE". I'm not sure we'd be asking "if"
| Microsoft would support JavaScript (or eBPF), but "when."
| surajrmal wrote:
| This assumes eBPF becomes the standard. It's not clear
| Microsoft wants that. They could create something else which
| integrates with dot net and push for that instead.
|
| Also this problem of too much software running in the kernel
| in an unbounded manner has long existed. Why should Microsoft
| suddenly invest in solving it on Windows?
| brendangregg wrote:
| Microsoft have been driving the work to make eBPF an IETF
| industry standard.
| riskable wrote:
| ...just like they did with Kerberos! And just like with
| Kerberos they'll define a standard _then refuse to follow
| it_. Instead, they will implement subtle changes to the
| Windows implementation that make solutions that use
| Windows eBPF incompatible with anything else, making it
| much more difficult to write software that works with all
| platforms' eBPF (or even just its output).
|
| Everything's gotta be different in Windows land.
| Otherwise, migrating _off_ of Windows land would be too
| easy!
|
| In case you were wondering what Microsoft refused to
| implement with its Kerberos implementation it's the DNS
| records. Instead of following the standard (they wrote!)
| they decided that all Windows clients will use AD's
| Global Catalog to figure out which KDC to talk to (e.g.
| which one is "local" or closest to the client). Since
| nothing but Windows uses the Global Catalog they
| effectively locked out other platforms from being able to
| integrate with Windows Kerberos implementation _as
| effectively_ (it'll still work, just extremely
| inefficiently as the clients won't know which KDC is
| local so you either have to hard-code them into the
| krb5.conf on every single device/server/endpoint and hope
| for the best or DNS-and-pray you don't get a Domain
| Controller/KDC that's on an ISDN line in some other
| country).
| MawKKe wrote:
| Embrace, extend, ...
| jrockway wrote:
| This doesn't really seem like their strategy anymore.
| It's not like Edge directly interprets Typescript, for
| example. While they embraced and extended Javascript, any
| extinguishing seems to be on the technical merits rather
| than corporate will.
|
| In the case of security scanners that run in the kernel,
| we learned this weekend that a market need exists. The
| mainstream media blamed Crowdstrike's bugs on "Windows".
| Microsoft would likely like to wash its hands of future
| events of this class. Linux-like eBPF is a path forward
| for them that allows people to run the software they want
| (work-slowers like Crowdstrike) while isolating their
| reputation from this software.
| philistine wrote:
| Apple took the lead on this front. It has closed easy
| access to the kernel by apps, and made a list of APIs to
| try and replace the lost functionality. Anyone maintaining
| a kernel module on macOS is stuck in the past.
|
| Of course, the target area of macOS is much smaller than
| Windows, but it is absolutely possible to kick all code,
| malware and parasitic security services alike, from
| accessing the kernel.
|
| The safest kernel is the one that cannot be touched at
| runtime.
| Xunjin wrote:
| > The safest kernel is the one that cannot be touched at
| runtime.
|
| Can you expand what you mean here? Because depending on
| the application you are running, you will need to at least
| talk with some APIs to get privileged access?
| odo1242 wrote:
| Yeah, Apple doesn't allow any user code to run in kernel
| mode without significant hoops (the kernel is code
| signed) and tries to provide a user space API (e.g.
| DriverKit) as an alternative for the missing
| functionality.
|
| Some things (FUSE) are still annoying though.
| Agingcoder wrote:
| Being allowed to talk to the kernel to get info and
| running with the same privileges ( basically being able
| to read / write any memory ) is different.
| nullindividual wrote:
| I don't think Microsoft has a choice with regards to
| kernel access. Hell, individuals currently use
| undocumented NT APIs. I can't imagine what happens to
| backwards compat if kernel access is closed.
|
| Apple's closed ecosystem is entirely different. They'll
| change architectures on a whim and users will go with the
| flow (myself included).
| becurious wrote:
| But Apple doesn't have the industrial and commercial uses
| that Linux and Windows have. Where you can't suddenly
| switch out to a new architecture without massive amounts
| of validation costs.
|
| At my previous job they used to use Macs to control
| scientific instrumentation that needed a data acquisition
| card. Eventually most of the newer product lines moved
| over to Windows but one that was used in a validated FDA
| regulated environment stayed on the Mac. Over time
| supporting that got harder and harder: they managed
| through the PowerPC to Intel transition but eventually
| the Macs with PCIe slots went away. I think they looked
| at putting the PCIe card in a Thunderbolt enclosure. But
| the bigger problem is guaranteeing supply of a specific
| computer for a reasonable amount of time. Very difficult
| to do these days with Macs.
| nullindividual wrote:
| > validated FDA regulated environment stayed on the Mac
|
| Given how long it takes to validate in a GxP environment,
| and the cost, this makes sense.
| adolph wrote:
| Sounds like they need a nice Hackintosh for that
| validated FDA regulation app-OS-HW combo.
| becurious wrote:
| Good luck getting that through a regulated company's
| Quality Management System or their legal department. Way
| too much business risk and the last thing you want is a
| yellow or red flag to an inspector who can stop ship on
| your product until all the recall and remediation is
| done.
| numbsafari wrote:
| > Why should Microsoft suddenly invest in solving it on
| Windows?
|
| If they can continue to avoid commercial repercussions for
| failing to provide a stable and secure system, then society
| should begin to hold them to account and force them to.
|
| I'm not necessarily advocating for eBPF here, either. If
| they want to get there through some "proprietary" means, so
| be it. Apple is doing much the same on their end by locking
| down kexts and providing APIs for user mode system
| extensions instead. If MS wants to do this with some kind
| of .net-based solution (or some other fever dream out of
| MSR) then cool. The only caveat would seem to be that they
| are under a number of "consent decree" type agreements that
| would require that their own extensions be implemented on a
| level playing field.
|
| So what. Windows Defender shouldn't be in the kernel any
| more than CrowdStrike. Add an API. If that means being able
| to send eBPF type "programs" into kernel space, cool. If
| that means some user mode APIs, cool.
|
| But lock it down already.
| wongarsu wrote:
| Microsoft has invested in solving this for at least two
| decades, probably longer. They are just using a different
| (arguably worse) approach to this than the Unix world.
|
| In Windows 9x anti-malware would just run arbitrary code in
| the kernel that hooked whatever it wanted. In Windows XP a
| lot of these things got proper interfaces (like the file
| system filter drivers to facilitate scanning files before
| they are accessed, later replaced by minifilters), and the
| 64 bit edition of XP introduced PatchGuard [1] to prevent
| drivers from modifying Microsoft's kernel code.
| Additionally Microsoft is requiring ever more static and
| dynamic analysis to allow drivers to be signed (and thus
| easily deployed).
|
| This is a very leaky security barrier. Instead of a
| hardware-enforced barrier like the kernel-userspace barrier
| it's an effort to get software running at the same
| protection level to behave. PatchGuard is a cat-and-mouse
| game Microsoft is always losing, and the analysis mostly
| helps against memory bugs but can't catch everything. But
| MS has invested a lot of work over the years in attempts to
| make this path work. So expecting future actions isn't
| unreasonable.
|
| [1] https://en.wikipedia.org/wiki/Kernel_Patch_Protection
| Analemma_ wrote:
| This is a weird reading of history. Microsoft has spent
| tons of effort getting as much code out of the kernel as
| possible: Windows drivers used to be almost all kernel-
| mode, now they're nearly all in userspace and you almost
| never need to write a kernel-mode Windows driver unless
| you're doing something with deep OS hooks (like CS was,
| although apparently even that wasn't actually necessary).
| The safeguards on kernel code are for the tiny sliver of
| use cases left that need it, it is not Microsoft patching
| individual holes on the leaky ship.
|
| They haven't yet gone as far as Apple in banning third-
| party kernel-mode code entirely, but I wouldn't be
| surprised if it's coming.
| tptacek wrote:
| A thing I think a lot of people don't include in their
| premises about Crowdstrike is that they're probably the
| most significant aftermarket endpoint security product in
| the world (they are what Norton and McAfee were in 2000),
| which means they're more than large enough for malware to
| target their code directly, which creates interesting
| constraints for where their code can run.
|
| I'm not saying I'd run it (I would not), just that I can
| see why they have a lot of kernel-resident code.
| nullindividual wrote:
| Microsoft already has an extensible file system filter
| capability in place, which is what current AV uses. Does it
| make sense to add eBPF on top of that and if so, are there any
| performance downsides, like we see with file system filters?
| mauvehaus wrote:
| They've done a technology transition once already from legacy
| file system filter drivers to the minifilter model. If they
| see enough benefit to another change, it wouldn't be
| unprecedented.
|
| Mind you, it looks like after 20-ish years Windows still
| supports loading legacy filter drivers. Given the
| considerable work that goes into getting even a simple
| filesystem minifilter driver working reliably, it's safe to
| assume that we'd be looking at a similarly protracted
| transition period.
|
| As to the performance, I don't think the raw infrastructure
| to support minifilters is the major performance hit. The work
| the drivers themselves end up doing tends to be the bigger
| hit in my experience.
|
| Some background for the curious:
|
| https://www.osr.com/nt-insider/2019-issue1/the-state-of-
| wind...
| Scene_Cast2 wrote:
| How much extra security does this provide on top of HLK?
| xyzzy123 wrote:
| So many problems though! including commercial monocultures, lack
| of update consent, blast radius issues, etc etc. There's a
| commons in our pockets but that is very difficult to regulate
| for. They will keep putting the gun to your head until you keep
| choosing the monoculture.
| shahahqq wrote:
| worrisome indeed that now the world knows how many users are
| affected by crowdstrike so the bad guys just need to poke
| deeper there
| kevin_nisbet wrote:
| I hate to disagree with someone like Brendan Gregg, but I'm hoping
| vendors in this space take a more holistic approach to
| investigating the complete failure chain. I personally tend to
| get cautious when there is a proposal that x will solve the
| problem that occurred on y date, especially 3 days after the
| failure. It may be true, but if we don't do the analysis we could
| leave ourselves open to blindspots. There may also be plenty of
| alternative approaches that should be considered and
| appropriately discarded.
|
| I think the part I specifically dispute is the claim that the only
| negative outcome is wasted CPU cycles. That's likely the case for
| this class of bug, but there are plenty of failure modes where a bad
| ruleset could badly brick a system and make it hard to recover.
|
| That's not to say eBPF-based security modules aren't the right
| choice for many vendors, just that let's understand what risks
| they do and do not avoid, and what part of the failure chain they
| particularly address.
| mirashii wrote:
| Just because you have not been aware of the discussions on this
| topic that have been happening for years, doesn't mean that
| they haven't been happening. This isn't some new analysis
| formed 3 days after an incident, this is the generally accepted
| consensus among many experts who have been working in the
| space, introducing these new APIs specifically to improve
| stability, security, etc. of systems.
| ohmyiv wrote:
| > I personally tend to get cautious when there is a proposal
| that x will solve the problem that occurred on y date,
| especially 3 days after the failure.
|
| Microsoft has been working on eBPF for a few years at least.
|
| https://opensource.microsoft.com/blog/2021/05/10/making-ebpf...
|
| https://lwn.net/Articles/857215/
|
| If you're really concerned, they have discussions and
| communication channels where you're invited to air your
| concerns. They're listed on their github:
|
| https://github.com/microsoft/ebpf-for-windows
|
| Who knows, maybe they already have answers to your concerns. If
| not, they can address them there.
| the8472 wrote:
| If the filters are loaded at boot and hook into everything then a
| bug can still lock down the system to a point where it can't be
| operated or patched anymore (e.g. because you loaded an empty
| whitelist). So it could end up replacing a boot loop with another
| form of DoS.
|
| If Microsoft includes a hardcoded whitelist that covers some
| essentials needed for recovery, that could make a bug in such a
| tool easier to fix, but could still cause effective downtime
| (system running but unusable) until such a fix is delivered.
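|
| To make the recovery-whitelist idea concrete, here is a minimal toy
| sketch in Python. It is not a Windows or eBPF API: the program
| names, `RECOVERY_ALLOWLIST`, and `is_allowed` are all made up for
| illustration. It only shows that a hardcoded set of essentials keeps
| the repair path open even when a buggy update ships an empty list.

```python
# Toy model of an allow-list filter with a built-in recovery set.
# Every name here is hypothetical; this is not a real OS interface.

# Hardcoded essentials that stay runnable no matter what policy is loaded.
RECOVERY_ALLOWLIST = frozenset({"updater.exe", "cmd.exe", "netsvc.exe"})


def is_allowed(program, loaded_allowlist):
    """Return True if `program` may run under the currently loaded policy."""
    return program in RECOVERY_ALLOWLIST or program in loaded_allowlist


# A buggy update ships an empty allow-list.
broken_policy = set()

print(is_allowed("updater.exe", broken_policy))       # recovery tool can still run
print(is_allowed("pos_terminal.exe", broken_policy))  # ordinary app is blocked
```

| The ordinary application stays blocked until a corrected list
| arrives, so the machine is up and patchable but not doing its job.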
| twen_ty wrote:
| Can someone tell me what's the advantage of eBPF over a user mode
| driver? The article makes it look like eBPF is a
| have-your-cake-and-eat-it-too solution, which sounds too good to be
| true. Can you run graphics drivers in eBPF, for example?
| tptacek wrote:
| No, you can't run arbitrary general-purpose programs in eBPF,
| and you cannot run graphics drivers in it. You generally can't
| run programs with unprovably bounded loops in eBPF, and your
| program can interact with the kernel only through a small
| series of explicitly enumerated "helpers" (for any given type
| of eBPF program, you probably have about 20 of these in total).
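|
| A rough way to picture those two restrictions is a toy static
| checker. This is a sketch in Python, not the real verifier: the
| instruction tuples and the helper names in `ALLOWED_HELPERS` are
| invented. It is also stricter than what is described above, since it
| rejects every backward jump rather than trying to prove a loop bound.

```python
# Toy static checker in the spirit of the eBPF verifier. NOT the real
# verifier: the instruction format and helper names are made up.

ALLOWED_HELPERS = {"map_lookup", "map_update", "get_time_ns"}  # hypothetical


def verify(program):
    """Return (ok, reason) for a list of (opcode, argument) tuples."""
    for pc, (op, arg) in enumerate(program):
        if op == "jmp":
            target = pc + 1 + arg  # offset is relative to the next instruction
            if target <= pc:
                return False, f"insn {pc}: backward jump (possible unbounded loop)"
            if target >= len(program):
                return False, f"insn {pc}: jump out of range"
        elif op == "call":
            if arg not in ALLOWED_HELPERS:
                return False, f"insn {pc}: helper {arg!r} not allowed"
        elif op not in {"mov", "add", "exit"}:
            return False, f"insn {pc}: unknown opcode {op!r}"
    if not program or program[-1][0] != "exit":
        return False, "program does not end with exit"
    return True, "ok"


straight_line = [("mov", 1), ("call", "map_lookup"), ("jmp", 0), ("exit", None)]
looping = [("mov", 1), ("jmp", -2), ("exit", None)]
bad_helper = [("call", "kmalloc"), ("exit", None)]

for name, prog in [("straight_line", straight_line),
                   ("looping", looping),
                   ("bad_helper", bad_helper)]:
    print(name, verify(prog))
```

| The point of the sketch is that the check happens before the program
| ever runs: a rejected program is never loaded at all.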
| chasil wrote:
| This is the wiki. I haven't kept up, but this isn't a kernel
| module.
|
| "eBPF is a technology that can run programs in a privileged
| context such as the operating system kernel. It is the
| successor to the Berkeley Packet Filter (BPF, with the "e"
| originally meaning "extended") filtering mechanism in Linux
| _and is also used in non-networking parts of the Linux kernel
| as well._ "
|
| https://en.wikipedia.org/wiki/EBPF
| bewo001 wrote:
| AFAIK, an ebpf function can only access memory it got handed as
| an argument or as result from a very limited number of kernel
| functions. Your function will not load if you don't have
| boundary checks. Fighting the ebpf validator is a bit like
| fighting Rust's borrow checker; annoying, at times it's too
| conservative and rejects perfectly correct code, but it will
| protect you from panics. Loops will only be accepted if the
| validator can prove they'll end in time; this means it can be a
| pain to get the validator to accept a loop. Also, ebpf is a
| processor-independent byte code, so vectorizing code is not
| possible (unless the byte code interpreter itself does it).
|
| Given all its restrictions, I doubt something complex like a
| graphics driver would be possible. But then, I know nothing
| about graphics driver programming.
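|
| The boundary-check rule can be sketched with a small toy model. This
| is Python, not real eBPF, and the two instruction kinds (`check` and
| `load`) are invented for the example. The checker accepts a load only
| if an earlier check has already proven that many bytes exist.

```python
# Toy model of the bounds-check rule. NOT the real verifier: the two
# instruction kinds below are invented for this sketch.
#   ("check", n)           -> program bails out unless the buffer has >= n bytes
#   ("load", offset, size) -> read `size` bytes starting at `offset`


def verify_loads(program):
    """Accept a straight-line program only if every load is provably in bounds."""
    proven_len = 0  # number of buffer bytes proven to exist so far
    for pc, insn in enumerate(program):
        if insn[0] == "check":
            proven_len = max(proven_len, insn[1])
        elif insn[0] == "load":
            _, offset, size = insn
            if offset < 0 or size <= 0 or offset + size > proven_len:
                return False, f"insn {pc}: load not covered by a bounds check"
        else:
            return False, f"insn {pc}: unknown instruction {insn[0]!r}"
    return True, "ok"


with_check = [("check", 14), ("load", 12, 2)]
without_check = [("load", 12, 2)]

print(verify_loads(with_check))
print(verify_loads(without_check))
```

| Note that `without_check` would be harmless on any buffer of 14 bytes
| or more, yet it is still refused. That is the "too conservative" side
| of the trade-off.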
| umanwizard wrote:
| > Fighting the ebpf validator is a bit like fighting Rust's
| borrow checker
|
| I think this undersells how annoying it is. There's a bit of
| an impedance mismatch. Typically you write code in C and
| compile it with clang to eBPF bytecode, which is then checked
| by the kernel's eBPF verifier. In some cases clang is smart
| enough to optimize away bounds checks, but the eBPF verifier
| isn't smart enough to realize the bounds checks aren't
| needed. This requires manual hacking to trick clang
| into not optimizing things in a way that will confuse the
| verifier, and sometimes you just can't get the C code to work
| and need to write things in eBPF bytecode by hand using
| inline assembly. All of these problems are massively
| compounded if you need to support several different kernel
| versions. At least with the Rust borrow checker there is a
| clearly defined set of rules you can follow.
| WaitWaitWha wrote:
| eBPF == extended Berkeley Packet Filter
|
| https://en.wikipedia.org/wiki/Berkeley_Packet_Filter
| kayge wrote:
| Thanks! This was not a familiar acronym to me... and after some
| digging[0] apparently it's no longer an acronym:
|
| "BPF originally stood for Berkeley Packet Filter, but now that
| eBPF (extended BPF) can do so much more than packet filtering,
| the acronym no longer makes sense. eBPF is now considered a
| standalone term that doesn't stand for anything."
|
| [0] https://ebpf.io/what-is-ebpf/
| CodeWriter23 wrote:
| > an unprecedented example of the inherent dangers of kernel
| programming
|
| I take issue with that. Kernel programming was not to blame;
| looking up addresses from a file and accessing those memory
| locations without any validation is. The same technique would
| yield the same result at any Ring.
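|
| As an illustration of the validation step being argued for, here is
| a minimal Python sketch. It assumes, as the comment does, that the
| failure came from trusting values read from a data file. The 4-byte
| rule format, `HANDLERS`, and `parse_rule` are hypothetical and say
| nothing about the actual file format involved.

```python
import struct

# Hypothetical dispatch table that a rule file indexes into.
HANDLERS = ["allow", "log", "block"]


def parse_rule(blob):
    """Decode one rule: a little-endian uint32 index into HANDLERS.

    Both the size and the index are validated before the table is touched.
    """
    if len(blob) != 4:
        raise ValueError(f"rule must be 4 bytes, got {len(blob)}")
    (index,) = struct.unpack("<I", blob)
    if index >= len(HANDLERS):
        raise ValueError(f"handler index {index} out of range")
    return HANDLERS[index]


print(parse_rule(struct.pack("<I", 2)))

for bad in (b"", struct.pack("<I", 0xFFFFFFFF)):
    try:
        parse_rule(bad)
    except ValueError as err:
        print("rejected:", err)
```

| In Python a bad index raises an exception either way; in C the
| unchecked version is an out-of-bounds read, which is where the
| privilege level of the code starts to matter.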
| lucianbr wrote:
| Obviously in userspace it would only crash the running program
| and not the entire operating system? It's a significant
| difference.
|
| All of the service interruptions would have been just "computer
| temporarily not protected by crowdstrike agent". Not the same
| thing at all.
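|
| The userspace case can be demonstrated with ordinary processes. In
| this Python sketch the "agent" is a child process that reads from
| address 0 via `ctypes`, standing in for a buggy security agent. The
| names `CRASHING_AGENT` and `run_agent` are made up, and the exact
| exit status depends on the platform.

```python
import subprocess
import sys

# Child code that reads from address 0, standing in for a buggy agent.
CRASHING_AGENT = "import ctypes; ctypes.string_at(0)"


def run_agent(code):
    """Run the agent in its own process and return its exit status."""
    result = subprocess.run(
        [sys.executable, "-c", code],
        stderr=subprocess.DEVNULL,  # hide the child's crash output
    )
    return result.returncode


status = run_agent(CRASHING_AGENT)
print("agent exit status:", status)  # non-zero: the agent died
print("supervisor still running")    # the parent process is unaffected
```

| The supervisor sees a failed child and can restart it or fetch a
| fix. The same bad read inside the kernel has no such parent to fall
| back on.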
| CodeWriter23 wrote:
| > It's a significant difference.
|
| When various apps running the world are crashing, unable to
| execute because malware protection is failing, there is no
| difference.
| macobrien wrote:
| _No_ difference oversells it, IMO -- the fact that the
| entire OS crashed is what made fixing the bug so arduous,
| since it required in-person intervention. To be sure,
| running the code in userspace would still cause
| unacceptable service interruptions, but the fix could be
| applied remotely.
| nine_k wrote:
| At Ring 3 it would crash an app, not the entire OS.
|
| Yes, the kernel is fine and is not to blame. But running
| basically a rootkit controlled by a third party indeed _is_ to
| blame.
| CodeWriter23 wrote:
| > At Ring 3 it would crash an app, not the entire OS.
|
| That's still an outage for those key systems.
| nequo wrote:
| It is an outage for the monitoring system, not the system
| that it monitors.
| dwattttt wrote:
| FWIW their configuration files can't be holding addresses;
| those have been randomised in the kernel for at least a decade
| nkozyra wrote:
| I don't do any kernel stuff so I'm out of my element, but doesn't
| the fact that Crowdstrike & Linux kernel eBPF already caused
| kernel crashes[1] sort of downplay the rosiness of the state of
| things?
|
| [1]: https://access.redhat.com/solutions/7068083
| guipsp wrote:
| This is specifically addressed in the post you are replying to
| nkozyra wrote:
| Can you elaborate? What I see about Linux is that Crowdstrike
| was in the process of adopting eBPF which is ostensibly
| immune to kernel panics, but that issue shows their eBPF
| implementation specifically causing a kernel panic.
| mschuster91 wrote:
| > If your company is paying for commercial software that includes
| kernel drivers or kernel modules, you can make eBPF a
| requirement. It's possible for Linux today, and Windows soon.
| While some vendors have already proactively adopted eBPF (thank
| you), others might need a little encouragement from their paying
| customers.
|
| How about Microsoft's large government and commercial customers
| make it a requirement that MS does not develop a single new
| feature for the next two fucking years or however long it takes
| to go through the entirety of the Windows+Office+Exchange code
| base and to make sure there are no security issues in there?
|
| We don't need ads in the start menu, we don't need telemetry, we
| don't need desktop Outlook becoming a rotten slow and useless web
| app, we don't need AI, we certainly don't need Recall. We need an
| OS environment that doesn't need a Patch Tuesday where we have to
| check if the update doesn't break half the canary machines.
|
| And while MS is _at that_ they can also take the goddamn time and
| rework the entire configuration stack. I swear to god, it drives
| me nuts. There's stuff that's only accessible via the registry
| (and there is no comprehensive documentation showing exactly what
| _any_ key in the registry can do - large parts of that are MS-
| internal!), there's stuff only accessible via GPO, there's stuff
| hidden in CPLs dating back to Windows 3.11, and there's stuff in
| Windows' newest UI/settings framework.
| throwaway2037 wrote:
| The blog post says:
|
| > eBPF, which is immune to such crashes.
|
| I tried to Google about this, but I cannot find anything
| definitive. It looks like you can still break things. Can an
| expert on eBPF please comment on this claim? This is the best
| that I could find:
| https://stackoverflow.com/questions/70403212/why-is-ebpf-sai...
| umanwizard wrote:
| eBPF programs cannot crash the kernel, assuming there are no
| bugs in the eBPF verifier. There have been such bugs in the
| past but they seem to be getting more and more rare.
| javierhonduco wrote:
| Or in other parts of the kernel. It's been the case on
| multiple occasions that buggy locking (or, more generally, a
| missing 'resource' release) has caused problems for perfectly
| safe BPF programs. For example, see
| https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1033398 and
| the fix https://git.kernel.org/pub/scm/linux/kernel/git/torva
| lds/lin...
| umanwizard wrote:
| This is actually exactly the bug I was thinking of, so fair
| point! (I work at PS now and am aware you worked on
| debugging it a while back).
| rwmj wrote:
| This isn't really true. eBPF programs in Linux have access to
| a large set of helper functions written in plain C.
| https://lwn.net/Articles/856005/
| umanwizard wrote:
| I don't see how this contradicts what I said. Indeed, there
| are helpers, but the verifier is supposed to check that the
| eBPF program isn't calling them with invalid arguments.
| queuebert wrote:
| I would be very hesitant to say "cannot" in a million-line C
| code base.
| umanwizard wrote:
| Yes, bugs in Linux are possible, so there might be some
| eBPF code that crashes the kernel. Just like bugs in Chrome
| are possible, so there might be some JavaScript that
| crashes the browser. Still, JavaScript is much safer than
| native code, because fixing the bugs in one implementation
| is a tractable problem, whereas fixing the bugs in all user
| code is not.
| __MatrixMan__ wrote:
| Maybe we should start taking Fridays off to commemorate the
| event, which probably would have been less bad if more people
| spent less time with their nose to the grindstone and had more
| time to stop and think about how it all was shaping up and how
| they could influence that shape.
| ReleaseCandidat wrote:
| Sorry, but neither eBPF nor Rust nor formal verification nor ...
| is going to solve that problem. Repeat after me: there are no
| technical solutions to social problems. As long as the result of
| such an outage is basically an "oh, a software problem! _shrug_",
| _nothing_ will change.
| Yawrehto wrote:
| 1. How does eBPF solve this? It makes it more difficult, sure,
| but it'll almost always be _possible_ to cause a crash, if you
| try hard enough.
|
| 2. More importantly, the problem is rarely fixable by changing
| technology, because typically, problems are caused by people and
| their connections: social/corporate pressures, profit-seeking,
| mental health being treated as unimportant, et cetera. eBPF can't
| fix those, and as long as corporations have social structures that
| penalize thoroughness and caution, and incentivize getting 'the
| most stuff' done, this will persist as a problem.
| umanwizard wrote:
| > it'll almost always be possible to cause a crash, if you try
| hard enough.
|
| If you think you know a way to crash the Linux kernel by
| loading and running an eBPF program, you should report a bug.
| uticus wrote:
| > eBPF programs cannot crash the entire system because they are
| safety-checked by a software verifier and are effectively run in
| a sandbox.
|
| Isn't one of the purposes of an OS to police software? I get that
| this has to do with the OS itself, but what does watching the
| watchers accomplish other than adding a layer which must then be
| watched?
|
| Why not reduce complexity instead of naively trusting that the
| new complexity will be better long term?
| MetaWhirledPeas wrote:
| Right? I might spend a few minutes seeing if an AI chatbot can
| explain all the justifications that lead to using something
| like CrowdStrike in the first place.
| riskable wrote:
| eBPF isn't "watching the watchers"; it's just a tool that lets
| _other_ tools access low-level things in the kernel via a very
| picky sandbox. Think of it like this:
|
| Old way: Load kernel driver, hook into bazillions of system
| calls (doing whatever it is you want to do), pray you don't
| screw anything up (otherwise you _can_ get a panic though not
| necessarily--Linux is quite robust).
|
| eBPF way: Just ask eBPF to tell you what you want by giving it
| some eBPF-specific instructions.
|
| There's a rundown on how it works here:
| https://ebpf.io/what-is-ebpf/
| risenshinetech wrote:
| Thank God some superheroes have finally come along to make sure
| code never crashes any computers ever again! /s
| klooney wrote:
| First io_uring, now eBPF. Kind of wild.
| tracker1 wrote:
| I don't buy it... didn't a bug from RedHat + Crowdstrike have a
| similar panic issue? I understand in that case it was because of
| RedHat, but still. I don't think this, by itself, will change
| much.
| kaliszad wrote:
| "These security agents will then be safe and unable to cause a
| Windows kernel crash."
|
| Unless of course there is a bug in eBPF
| (https://access.redhat.com/solutions/7068083) @brendangregg and
| the kernel panics/BSoDs anyway, which you do mention later in the
| article, of course.
| ec109685 wrote:
| The benefit of fixing that bug is that all eBPF programs benefit,
| versus every security vendor needing to ensure they write
| perfect C code.
| throw0101d wrote:
| Meta:
|
| > _eBPF (no longer an acronym)_ [...]
|
| Any reason why the official acronym was done away with?
| riskable wrote:
| Because it used to stand for extended Berkeley Packet Filter
| and it has since moved far, far beyond just packets. It now
| hooks into the _entire_ network stack, security, and does
| observability/tracing for nearly anything and everything in
| the kernel ("nearly" because some stuff runs when the kernel
| boots up--before eBPF is loaded--and never again after that).
| sandywaffles wrote:
| Because eBPF is no longer _just_ packet filtering? It's now
| used in loads of hook points unrelated to packets or filtering
| at all.
| bfrog wrote:
| I wonder if microkernels ever had this kind of bullshit. Had it
| been a microkernel, would we all be sitting twiddling our thumbs
| on Friday? Hot take: No.
| dveeden2 wrote:
| So eBPF is giving us eBFP (enhanced Blue Friday Protection)?
| muth02446 wrote:
| > The verifier is rigorous -- the Linux implementation has over
| 20,000 lines of code -- with contributions from industry (e.g.,
| Meta, Isovalent, Google) and academia (e.g., Rutgers University,
| University of Washington). The safety this provides is a key
| benefit of eBPF, along with heightened security and lower
| resource usage.
|
| Wow, 20k is not exactly encouraging. Besides the extra attack
| surface, who can vouch for such a large code base?
| haberman wrote:
| I had exactly the same thought. I don't know if that 20k number
| was supposed to inspire confidence, but for me it did the
| opposite. It would have inspired confidence if it was 300 lines
| of code.
|
| My impression is that the WebAssembly verifier is much simpler.
| brundolf wrote:
| This sounds like a cool technology, but this was the really
| egregious problem:
|
| > There are other ways to reduce risks during software deployment
| that can be employed as well: canary testing, staged rollouts,
| and "resilience engineering" in general
|
| You don't need a new technology to implement basic industry-
| standard quality control
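|
| For reference, a staged rollout with a halt condition fits in a few
| lines. This Python sketch is only an illustration: `staged_rollout`,
| the stage fractions, and the 2% failure threshold are assumptions
| chosen for the example, and `deploy` stands for whatever pushes the
| update to one host and reports whether it stayed healthy.

```python
def staged_rollout(hosts, deploy, stages=(0.01, 0.10, 1.0), max_failure_rate=0.02):
    """Deploy to growing fractions of hosts; halt if a stage fails too often.

    `deploy(host)` is supplied by the caller and returns True when the host
    is healthy after the update. Returns (completed, hosts_touched).
    """
    done = 0
    for fraction in stages:
        # Cumulative number of hosts that should be updated after this stage.
        target = min(len(hosts), max(1, int(len(hosts) * fraction)))
        batch = hosts[done:target]
        failures = sum(1 for host in batch if not deploy(host))
        done = target
        if batch and failures / len(batch) > max_failure_rate:
            return False, done  # stop before the bad update spreads further
    return True, done


fleet = list(range(1000))

# An update that breaks every host is stopped at the canary stage.
print(staged_rollout(fleet, lambda host: False))

# A healthy update proceeds through every stage.
print(staged_rollout(fleet, lambda host: True))
```

| With these numbers a fully broken update reaches 10 of 1000 hosts
| before the rollout stops, instead of all of them at once.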
| odyssey7 wrote:
| "The verifier is rigorous"
|
| But the appeal-to-authority evidence that the article presents is
| not.
|
| "-- the Linux implementation has over 20,000 lines of code --
| with contributions from industry (e.g., Meta, Isovalent, Google)
| and academia (e.g., Rutgers University, University of
| Washington). The safety this provides is a key benefit of eBPF,
| along with heightened security and lower resource usage."
| lazycog512 wrote:
| "The major difference between a thing that might go wrong and a
| thing that cannot possibly go wrong is that when a thing that
| cannot possibly go wrong goes wrong it usually turns out to be
| impossible to get at and repair."
|
| - Douglas Adams
| rezonant wrote:
| > the company behind this outage was already in the process of
| adopting eBPF, which is immune to such crashes
|
| Oh I'm sure they'll find a way.
| egorfine wrote:
| One option to prevent this is to not run corporate spyware. But I
| guess for some industries this isn't an option.
| 0xbadcafebee wrote:
| > In the future, computers will not crash due to bad software
| updates
|
| I'm still waiting on my flying car...
| tgtweak wrote:
| Even if Microsoft rolls out eBPF and mainstreams it - it will be
| years before everything is ported over and it still won't address
| legacy Windows versions (which appear to be a good chunk of what
| was impacted).
|
| It's a move in the right direction but it probably won't fully
| mitigate issues like this for another 5+ years.
| ksec wrote:
| The article mentions Windows and Linux. Does anyone know if there
| will be eBPF for FreeBSD?
| titzer wrote:
| WebAssembly is a better choice for sandboxing kernel code. It has
| a full formal specification with a mechanized proof of type
| safety, many high-performance implementations, broad toolchain
| support, is targetable from many languages, and a capability
| security model.
| datadeft wrote:
| It is great that we need a Linux kernel feature to be ported to
| Windows so we don't have blue Fridays
| 7e wrote:
| eBPF will be an improvement, I'm sure, but it does not mean the end
| of bugs/DoS in software.
| wiresurfer wrote:
| Hey Brendan,
|
| > If your company is paying for commercial software that includes
| kernel drivers or kernel modules, you can make eBPF a
| requirement.
|
| "Windows soon" may still be at least a year away. Would that be a
| fair statement? "At least" being the operative phrase here.
|
| Specifically in the context of network security software, for
| eBPF programs to be portable across Windows/Linux, we would need
| MSFT to add a lot more hooks and expose internal kernel structs,
| hopefully via a common libbpf definition. Otherwise, I fear,
| having two versions of the same product across two OSs would
| mean more security and quality issues.
|
| I guess the point I am trying to make is, we would get there, but
| we are more than a few years away. I would love to see something
| like cilium on vanilla Windows for a software-defined company-wide
| network. We can then start building enterprise network
| security into it. Baby steps!
|
| ---
|
| btw, your talks and blog posts about bpftools are a godsend!
| fullspectrumdev wrote:
| This puts an awful lot of stock in the robustness of eBPF.
|
| Which is odd, given there's been a bunch of kernel privesc bugs
| using eBPF...
___________________________________________________________________
(page generated 2024-07-22 23:07 UTC)