[HN Gopher] Mandatory enforcement of indirect branch targets
___________________________________________________________________
Mandatory enforcement of indirect branch targets
Author : peter_hansteen
Score : 218 points
Date : 2023-07-14 12:19 UTC (10 hours ago)
(HTM) web link (undeadly.org)
(TXT) w3m dump (undeadly.org)
| SoftTalker wrote:
| Theo had to get his digs in against Linux in that announcement.
| Why not just focus on what OpenBSD is doing, and maybe contrast
| it to what Linux does without the speculation that they will
| still be doing the same thing in 20 years.
|
| He's unquestionably brilliant, but I've had a few encounters with
| him on the mailing lists and he is _so_ quick to take offense
| where none was meant and drop into name-calling and insults. I
| don't really get it. He may have some deep insecurities.
| VancouverMan wrote:
| That part doesn't look like a "dig" or an insult to me.
|
| It seems like a reasonable, relevant, and plausible assessment
| of how the long-term outcomes may likely differ between
| OpenBSD's stricter approach versus a looser approach,
| specifically when it comes to the degree of security offered
| (which is one of OpenBSD's main focuses), based on a past
| situation that's similar.
|
| How do you know that you aren't being, to use your words,
| "quick to take offense where none was meant" in this case?
| jacquesm wrote:
| > How do you know that you aren't being, to use your words,
| "quick to take offense where none was meant" in this case?
|
| Past knowledge about Theo?
| Ericson2314 wrote:
| Are Theo and Linux more alike than OpenBSD and Linux?
| PrimeMcFly wrote:
| Now, yes. Linus wasn't always so abrasive though. At some
| point he caught up to Theo.
| LexiMax wrote:
| Linus has been trying to calm down in recent years, in
| large part because he decided he no longer wanted to be
| lumped in with the crowd that endlessly complains about
| political correctness.
|
| https://www.bbc.com/news/technology-45664640
| Ericson2314 wrote:
| Yeah this is good stuff, and why I felt bad about making
| the comparison. Not saying Theo is in that camp, but
| Linus is trying to be less abrasive in general, and Theo
| is not.
| NoZebra120vClip wrote:
| Perhaps we're reading into their personalities more than
| we should, based on public social-media appearances.
|
| Egos tend to become exaggerated when benevolent dictator
| types make public statements. Their candor and bluntness
| on a mailing list or Twitter may be completely different
| than their demeanor and their kindness toward
| collaborators in private.
|
| Now we have the very public drama that happened between
| Theo and that "other BSD" team to create the original
| schism. But have we had any subsequent drama that caused
| breakups or forks? I don't know. OpenBSD manages to plug
| away and push releases out the door on schedule, right?
|
| Linus doesn't seem to have a lot of internal contributor
| drama, judging by the way they also push releases out the
| door and merge pull requests and add features.
|
| Really, if either Theo or Linus were unreasonable men,
| their teams would fall apart and they would cease to be
| leaders of anything. I think their leadership abilities
| speak for themselves: they've both been committed and
| dedicated to the same project since decades ago, and
| they've both built and maintained cohesive teams of
| contributors who seem to mostly stick around long enough
| to make a difference.
|
| They are "thought leaders", if you will; perhaps not
| charismatic ones, but canny businessmen who know how to
| nurture their pet projects.
| NoZebra120vClip wrote:
| > Are Theo and Linux more alike than OpenBSD and Linux?
|
| Is a Canadian kernel developer more like a POSIX operating
| system than a POSIX operating system is like a POSIX
| operating system?
|
| I'm not sure I understand. Perhaps you meant to write "Linus"
| since Linus is also a kernel developer? That seems more like
| apples to apples.
| redundantly wrote:
| I wouldn't have it any other way. I love the OpenBSD mailing
| lists. Always an entertaining read when Theo gets involved.
| teknopurge wrote:
| upvoted and +1. Theo has been an important leader in OSS for
| decades: his brevity and impatience is a net positive. also
| he is usually correct.
| ris wrote:
| This is my main takeaway too. As a one time OpenBSD enthusiast
| (and still admirer), now I'm a bit older I find the continual
| smugness starts to grate.
|
| Truth is, Linux has a lot more constraints on how it can
| implement something because it has _users_. Users that have all
| sorts of different ways they need it to work.
| microtherion wrote:
| I'd just like to interject for a moment. What you're referring
| to as Linux, is in fact, NotOpenBSD/Linux, or as I've recently
| taken to calling it, Linux as opposed to OpenBSD...
| Joker_vD wrote:
| That's the problem with many brilliant people: what they
| perceive as their interlocutors being deliberately obtuse on
| some completely obvious point is actually their interlocutors
| being just as smart as they always are on some point that is
| not obvious at all to them.
| rkangel wrote:
| Perception of relative intelligence or sensible decision
| making is irrelevant. Just because you think you're doing a
| better job doesn't mean you need to shit on the other person.
|
| You could not mention Linux at all, or you could even say "we
| think this is better than Linux's approach because of X" and
| it would be a great improvement.
|
| I have always found it interesting that Rust purposefully
| avoided doing language comparisons - "we're better than
| Python like this and better than C like that". Their message
| purposefully avoided any positioning of it as a competition,
| instead focusing just on articulating Rust's value. It was an
| eye opening approach given our instinct is normally to pit
| things against each other.
| selectodude wrote:
| I think parent agrees with you.
| brynet wrote:
| It's an important comparison of the mechanisms, even in 2023,
| you can still find binaries on modern Linux distributions with
| executable stacks due to the fail-open design, 20 years later.
|
| The fact that Linux hasn't learned the right lessons in 20
| years, and has chosen to "double down" in respect to IBT/BTI,
| does not inspire confidence that they will ever fix it. I'd say
| his 20 year estimate was in fact being pretty generous given
| the evidence available.
|
| https://news.ycombinator.com/item?id=21554975
| jacquesm wrote:
| The funny thing is that this attitude towards breaking
| changes is one of the reasons why Theo is able to make this
| comment at all. If he would allow breaking changes then
| OpenBSD adoption likely would be higher and that in turn
| would cause him to resist the kind of things that Linux would
| not be able to get away with.
|
| It's clearly different philosophies leading to different
| outcomes with neither of them clearly better than the other,
| it just depends on what you need. It would be possible to
| make that statement in a more graceful way.
| sillywalk wrote:
| "I have altered the ABI. Pray I do not alter it further."
| -- Theo de Raadt
|
| https://marc.info/?l=openbsd-tech&m=157489277318829&w=2
| binkHN wrote:
| Theo himself considers OpenBSD a "research" OS, so I don't
| think he'll ever consider OpenBSD going mainstream,
| especially as it allows stuff like this to happen.
| jacquesm wrote:
| Indeed, so it's apples-to-oranges.
| mananaysiempre wrote:
| > It's an important comparison of the mechanisms, even in
| 2023, you can still find binaries on modern Linux
| distributions with executable stacks due to the fail-open
| design, 20 years later.
|
| Unfortunately, for C code using GCC's nested functions
| extension (or for languages that want to be ABI-compatible
| with C and support nested functions, like that paragon of
| advanced features called Pascal /s ), there's no other
| compilation strategy in current ABIs. The patches to switch C
| (and not just Ada) to function descriptors[1] with an ABI
| break have been sitting on the GCC mailing list since
| approximately forever[2], but it doesn't seem like there's
| been any progress.
|
| [1] The strategy is basically to compile (*fp)() not as
|
|       call *%rax
|
| but as (untested)
|
|       test $1, %rax
|       jz 1f
|       mov 8(%rax), %r10
|       mov (%rax), %rax
|   1:  call *%rax
|
| thus essentially inlining the (currently stack-allocated)
| closure calling thunk at all indirect call sites. It is ABI-
| compatible on x86 and x86-64 with all code that does not
| involve nested functions, place functions at odd addresses,
| or tag function pointers itself (and I think with all arm64
| and riscv code, although arm32's usage of the low pointer bit
| for Thumb interworking is bound to make this trickier).
|
| [2] https://gcc.gnu.org/legacy-ml/gcc-patches/2019-01/msg00735.h...
| brynet wrote:
| That strategy won't fly with IBT.
|
| Now all software must pay the price and miss out on
| important mitigations, for all eternity, just because of
| some largely unused feature in one compiler?
| mananaysiempre wrote:
| IBT is already further along here. The hypothetical
| solution for executable stacks is to recompile all of
| your nested-function-using or -calling code with
| -ftrampolines (except that won't work without the patch
| above--silently, really GCC?..). The _already real and
| working_ solution for IBT is to recompile all of your
| indirect-branch-using code with -fcf-protection=branch.
| So, ignoring the fact that nested functions are in
| practice much rarer, if you accept the former as valid
| you'll need to accept the latter as well, as far as logic
| is concerned.
|
| I wouldn't characterize this as a "largely unused feature
| in one compiler" screwing things up, but rather as the
| ABI on most Linux and -adjacent platforms (except SysV
| Itanium and FDPIC IIRC) being incapable of supporting
| closures (without executable stacks). That these are
| missing from standard C, and only present in languages
| that are either niche (Pascal, Ada) or don't care about
| following the platform ABI (Rust, Go, C++'s lambdas), is
| a defect of C (and that's at least a somewhat popular
| opinion among ISO C committee members[1]).
|
| Of course, OpenBSD essentially does not _have_ a stable
| ABI, so it's much freer to experiment here.
|
| [1] https://thephd.dev/lambdas-nested-functions-block-expression...
| whoopdedo wrote:
| It's the price you pay for never-break-userspace. OpenBSD is
| fine with the very small probability that an executable which
| doesn't do branch tracking will fail to run under the
| enforced rules. The answer to that is to recompile because
| you've still got the source, and if not, well, tough cookies.
| ndesaulniers wrote:
| > the very small probability that an executable which
| doesn't do branch tracking will fail to run under the
| enforced rules
|
| Isn't it any indirect branch in any program that will trip
| BTI/IBT? So most programs? I guess I disagree with the
| `small probability` part.
| jacquesm wrote:
| Tough cookies translates for many people into: OpenBSD is
| not for me. The 'very small probability' likely approaches
| '1' for sufficiently old stuff. And even if you do
| have the source, does it still build without substantial
| work? Backwards compatibility is not something to toss out
| the window without thinking through the consequences.
| loeg wrote:
| > OpenBSD is fine with the very small probability that an
| executable which doesn't do branch tracking will fail to
| run under the enforced rules.
|
| To clarify slightly, OpenBSD is fine with the very _high_
| probability that an executable will fail under new rules.
| Otherwise, yes.
| [deleted]
| WalterBright wrote:
| I'm working on adding ENDBR support to the DMD D compiler
| backend.
| ntfAX wrote:
| A software solution provided by the OS or language can make this
| hardware solution irrelevant.
| wongarsu wrote:
| Windows has done this in software for approximately 8 years.
|
| An advantage of the software solution is that you don't need to
| have the feature compiled into every library for it to work,
| you just lose protection in those parts. That makes for a much
| quicker rollout. Also faster iteration times, in the Windows
| Insider Preview you can get the extended version that also
| checks that the hashed function signature matches.
|
| 1: https://learn.microsoft.com/en-us/windows/win32/secbp/contro...
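|
| A rough sketch of the software approach, not Windows' actual
| implementation: every indirect call first checks its target against a
| set of known-good entry points. The allow-list, `is_valid_target`, and
| `guarded_call` are hypothetical names; real Control Flow Guard consults
| an OS-maintained bitmap indexed by address rather than a linear list.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef int (*handler_fn)(int);

int double_it(int x)      { return 2 * x; }
int negate(int x)         { return -x; }
int not_registered(int x) { return x + 1; }

/* Hypothetical allow-list standing in for the OS-maintained bitmap. */
static const handler_fn valid_targets[] = { double_it, negate };

/* True if fn is one of the registered entry points. */
bool is_valid_target(handler_fn fn)
{
    size_t count = sizeof valid_targets / sizeof valid_targets[0];

    for (size_t i = 0; i < count; i++) {
        if (valid_targets[i] == fn)
            return true;
    }
    return false;
}

/*
 * Check the target before making the indirect call. Returns true and
 * stores the result in *out if the call was allowed, false otherwise.
 * A real implementation would terminate the process instead.
 */
bool guarded_call(handler_fn fn, int arg, int *out)
{
    assert(out != NULL);
    if (!is_valid_target(fn))
        return false;
    *out = fn(arg);
    return true;
}
```

| Code that is never routed through `guarded_call` simply goes
| unchecked, which matches the point above about losing protection only
| in the parts that were not rebuilt.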
| josephcsible wrote:
| You've got it backwards: this hardware solution makes the
| software solutions irrelevant.
| tialaramex wrote:
| Nope. Here's the actual problem, in these crappy languages
| it's really easy for mistakes to result in a stack smash, so,
| these types of hacks aim to make it harder for the bad guys
| to turn that into arbitrary remote code execution. Not
| impossible, just harder. Specifically in this case the idea
| is that they won't be able to abuse arbitrary bits of
| function without calling the whole function, at a cost of
| some hardware changes and emitting unnecessary code. So maybe
| they can't find a whole function which works for them and
| they give up.
|
| Using better languages makes the entire problem disappear.
| You don't get a stack smash, the resulting opportunities for
| remote code execution disappear.
|
| It suggests that maybe the "C magically shouldn't have
| Undefined Behaviour" people were onto something after all.
| Maybe C programmers really are so wedded to this awful
| language that just being much slower than Python wouldn't
| deter them. There is still the problem that none of them can
| agree how this should work, but if they'll fund it maybe it's
| worth pursuing to find out how much they will put up with to
| keep writing C.
| yakubin wrote:
| I'm always amused by how many of OpenBSD's mitigations are
| patching over something as basic as lack of bounds
| checking, yet they'll never add bounds checking. And, as
| you said, those are all just speed bumps, not fixes.
| dundarious wrote:
| I think one could argue that all the software mitigations
| that aren't based on compile time proofs result in quite a
| bit more "emitting unnecessary code", if "unnecessary" is
| taken to mean "not strictly intrinsic to the task of the
| program". And undefined behavior is bad, but getting rid of
| it wouldn't be a silver bullet for this problem in C, I
| think. All undefined behavior could become "implementation
| defined" tomorrow, where the C compiler becomes more like a
| high-level assembler (again), and you could still jump the
| instruction pointer into arbitrary program text.
| tialaramex wrote:
| > All undefined behavior could become "implementation
| defined" tomorrow, where the C compiler becomes more like
| a high-level assembler (again), and you could still jump
| the instruction pointer into arbitrary program text.
|
| Try to work this through in your head. Imagine how you
| need to specify the working of the abstract machine in
| order to allow this. How do we talk about an "instruction
| pointer" on the abstract machine? What are the
| instructions it's pointing to? Am I defining an entire
| bytecode VM?
|
| Nah, instead you're going to do one of two things. One:
| "Undefined Behaviour" which we explicitly took off the
| table, or Two: "If this happens the program aborts". And
| with that the big problem evaporates. Does it make those
| C programmers happy? I expect not.
| dundarious wrote:
| Implementation defined means the compiler must _specify_
| the behavior, but it has near total freedom, and it can
| define it specific to the target system. There is no
| abstract machine. If I use GCC on Linux x86-64, then
| there very much is an instruction pointer.
| tialaramex wrote:
| In the real world, compilers just specify that the
| behaviour is undefined and tell you to suck it up. But
| we're talking about a hypothetical where we aren't
| allowing Undefined Behaviour. Saying "Oh, but we can if
| we say it's the implementation choosing" is a get out
| which is meaningless for the hypothetical. Just refuse to
| engage with the hypothetical instead if you don't like
| it.
| dundarious wrote:
| I'm using specific, standards defined language, that's
| relatively well known. For example, sizeof(int) is
| implementation defined, meaning it must have a documented
| definition, specific to the implementation (e.g., gcc
| x86_64-linux-gnu, it's 4).
|
| In languages like C that are closer to the machine, not
| everything has to be specified strictly in terms of a
| generic abstract machine.
|
| I'm not trying to be hostile or evasive or derisive, I'm
| just genuinely responding to your original comment, that
| I think missed on some important info. And my point was
| that _if we imagine a different world from the real world
| we 're in right now_, where in this new world, all
| undefined behavior became implementation defined
| behavior, then there would _still_ be a need for
| mitigations like endbr64. So I'm not painting a rosy
| picture for C. I just think undefined behavior is a red
| herring. Assembly doesn't have undefined behavior, but
| obviously you can have all sorts of issues there.
| tialaramex wrote:
| > Assembly doesn't have undefined behavior, but obviously
| you can have all sorts of issues there.
|
| The machine is in the real world and is thus obliged to
| have some actual behaviour, _but_ it is not always
| practical to discern what that behaviour would be let
| alone make it reliable across a product line and document
| it in an understandable way. As a result actually your
| CPU's documentation does in effect include "Undefined
| Behaviour".
| dundarious wrote:
| True, when writing my comment I wanted to qualify it to
| the same effect, but thought it would be an unnecessary
| subtlety to the general thrust of my point. That is, we
| can ignore this kind of "undefined behavior in the
| machine itself" for the purposes of this particular
| discussion.
| tialaramex wrote:
| I don't see how to ignore it though. If we're defining
| the behaviour but then our "definition" just doesn't
| specify the actual behaviour because it's specified in
| terms of hardware with no clearly defined behaviour for
| that situation then it's just word play, we're not really
| doing what I set out.
| tremon wrote:
| It's only irrelevant if the hardware solution is available on
| all the supported architectures/systems. As long as it's not,
| the software version must be maintained anyway, and might
| suffer from bitrot if it's no longer exercised on the major
| architectures.
| nullc wrote:
| Is this protection really all that helpful? Surely there are
| functions you can call into the top of to do your diabolical
| deeds for you.
|
| It would be more helpful if callers would store some machine
| specific hash of the function prototype and the function itself
| would check the hash, so that you could only redirect to calling
| a function with the right signature.
|
| But that would also increase the overhead further. Already this
| is bad enough that it makes jump tables unattractive (which is
| too bad, considering that jump tables usually have little to no
| risk of control flow redirection).
| tedunangst wrote:
| The entire field of ROP exploits would basically never have
| been developed if it were as simple as just calling the
| function you want.
| messe wrote:
| For anybody unfamiliar with this, as I was, this appears to refer
| to Intel's Indirect Branch Tracking feature[1] (and the
| equivalent on ARM, BTI). The idea is that an indirect branch can
| only pass control to a location that starts with an "end branch"
| instruction. An indirect branch is one that jumps to a location
| whose value is loaded or computed from either a register or
| memory address: think calling a function pointer in C.
|
| Without IBT, you'd have this equivalence between C and assembly:
|
|     main() {
|         void (*f)();
|         f = foo;
|         f();
|     }
|     void foo() { }
|
|     ---
|
|     main:
|         movl $foo, %edx
|         call *%edx
|         ret
|     foo:
|         ret
|
| If IBT is enabled, the above code triggers an exception because
| foo doesn't begin with an "end branch" instruction. When IBT is
| enabled by the compiler, the above code gets assembled as:
|
|     main:
|         endbr64
|         movl $foo, %edx
|         call *%edx
|         ret
|     foo:
|         endbr64
|         ret
|
| Now the compiler inserts endbr64 at the start of each function. The
| feature serves as defense in depth against JOP and COP attacks: the
| only "gadgets" available to an attacker are entire functions, which
| can be far harder to exploit and chain.
|
| [1]:
| https://www.intel.com/content/dam/develop/external/us/en/doc...
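|
| The tracking rule can be modelled as a two-state machine: after an
| indirect call the CPU waits for an "end branch", and anything else at
| the landing site faults. This toy interpreter is only an illustration.
| The instruction set, `run_with_ibt`, and the two sample programs are
| made up for the sketch, and returns, nested calls, and real fault
| delivery are all ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Made-up miniature instruction set for the sketch. */
enum op { OP_ENDBR, OP_CALL_INDIRECT, OP_OTHER, OP_RET };

struct insn {
    enum op op;
    size_t target;   /* only meaningful for OP_CALL_INDIRECT */
};

/*
 * Run prog from index 0. Returns true if it reaches a RET (or runs off
 * the end) cleanly, false if an indirect call lands on something other
 * than OP_ENDBR, which stands in for the control-protection fault.
 */
bool run_with_ibt(const struct insn *prog, size_t len)
{
    bool wait_for_endbr = false;
    size_t pc = 0;

    assert(prog != NULL);
    /* The step cap only guards against accidental endless loops. */
    for (size_t steps = 0; pc < len && steps < 1000; steps++) {
        const struct insn *in = &prog[pc];

        if (wait_for_endbr && in->op != OP_ENDBR)
            return false;            /* simulated fault */
        wait_for_endbr = false;

        switch (in->op) {
        case OP_CALL_INDIRECT:
            assert(in->target < len);
            wait_for_endbr = true;   /* next instruction must be ENDBR */
            pc = in->target;
            break;
        case OP_RET:
            return true;             /* toy model: first RET ends the run */
        default:
            pc++;
            break;
        }
    }
    return true;
}

/* main: endbr; mov; call *reg; ret   foo: endbr; ret */
const struct insn prog_with_endbr[] = {
    { OP_ENDBR, 0 }, { OP_OTHER, 0 }, { OP_CALL_INDIRECT, 4 }, { OP_RET, 0 },
    { OP_ENDBR, 0 }, { OP_RET, 0 },
};

/* main: mov; call *reg; ret   foo: ret */
const struct insn prog_without_endbr[] = {
    { OP_OTHER, 0 }, { OP_CALL_INDIRECT, 3 }, { OP_RET, 0 },
    { OP_RET, 0 },
};
```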
| asveikau wrote:
| It was an old joke that the opposite of "goto" is "come from",
| or that if goto is considered harmful, nobody said anything
| about a "come from". Marking something as a branch target
| reminds me of this.
|
| https://en.m.wikipedia.org/wiki/COMEFROM
| dejj wrote:
| > GOTO considered harmful
|
| COMEFROM considered harm-mitigating
|
| It ingeniously makes Return Oriented Programming (ROP) a lot
| harder.
| messe wrote:
| > COMEFROM considered harm-mitigating
|
| You know, that'd be a fantastic OpenBSD release name.
|
| Here's hoping a dev sees this comment; there's already been
| a few commenting in this thread.
| wongarsu wrote:
| Interesting. Seems like enforcement on Intel CPUs is supported
| since Tiger Lake (so ~2020). Windows has basically the same
| feature implemented in software since 2015, called Control Flow
| Guard [1]. I wonder what the story there is, and if Windows has
| any plans to (get everyone to) switch to the hardware version
| once those CPUs have sufficient market share.
|
| 1: https://learn.microsoft.com/en-us/windows/win32/secbp/contro...
| andersa wrote:
| Windows also recently implemented a far better version of
| this called Extended Flow Guard (XFG) that not only checks
| whether the location is a valid destination, but also whether
| it's a valid destination for that specific source.
|
| For example, for any virtual function call or function
| pointer call, the destination must have a correct tag with
| the hash of the arguments. It's much more secure, and also
| faster, since loading the tag from memory can be merged with
| loading the actual code after it.
|
| I wish this was the one implemented in hardware..
| simcop2387 wrote:
| That does sound like it would be more robust, but
| definitely sounds like it'd require a lot more silicon than
| the IBT that they did implement. Something like it might be
| something that comes in some future revisions.
| rwmj wrote:
| The fun fact being that older CPUs decode ENDBR64 as a slightly
| weird NOP (with no architectural effects), but it'll fault on
| original Pentiums:
| https://stackoverflow.com/questions/56120231/how-do-old-cpus...
| rollcat wrote:
| Various architectures do other interesting things with NOPs,
| IIRC one convention on PowerPC had something vaguely related
| to debugging or tracing (I can't remember the details or find
| any references right now).
| Someone wrote:
| https://www.ibm.com/docs/en/aix/7.3?topic=h-hpmstat-command:
|
| "random_samp_ele_crit=name
|
| Specifies the random criteria for selecting the
| instructions for sampling. Valid values for this option are
| as follows:
|
| ALL_INSTR
|
| All instructions are eligible. This value is the default
| setting.
|
| LOAD_STORE
|
| The operation is routed to the Load Store Unit (LSU); for
| example, load, store.
|
| PROB_NOP
|
| Sample only special no-operation instructions, which are
| called Probe NOP events.
|
| [...]"
| aidenn0 wrote:
| Some MIPS cores had a superscalar NOP that would stall
| every ALU by one cycle, which was necessary because they
| lacked synchronization instructions.
| monocasa wrote:
| RISC-V has a whole HINT space that's basically just morphs
| of load immediate into zero register.
|
| AArch64 has a similar space:
| https://developer.arm.com/documentation/ddi0596/2020-12/Base...
|
| And yes, PowerPC has a similar space as well holding hints
| like 'give priority to the other hardware threads on this
| core' and the like.
| https://utcc.utoronto.ca/~cks/space/blog/tech/PowerPCInstruc...
| rollcat wrote:
| I was wondering where I read about PowerPC, and this
| is exactly the article! So, it was for thread priority.
| Strikes me as an odd design choice, this probably
| should've been something to be managed by the OS more
| explicitly.
| messe wrote:
| Not just architectures, but different OSes and ABIs have
| found ways to repurpose no-ops. One example[1] is Windows
| using the 2-byte "MOV EDI, EDI" as a hot-patch point: it
| gets replaced by a "JMP $-5" instruction which jumps 5
| bytes before the start of a function into a spot reserved
| for patching. That 5 bytes is enough to contain a full jump
| instruction that can then jump wherever you need it to.
|
| [1]: "Why do Windows functions all begin with a pointless MOV
| EDI, EDI instruction?"
| https://devblogs.microsoft.com/oldnewthing/20110921-00/?p=95...
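|
| To make the byte arithmetic concrete, here is a sketch that applies
| the same patch to a plain byte buffer instead of live code. The
| `hotpatch` function and its offset-based interface are invented for
| illustration; the encodings used are 8B FF for MOV EDI, EDI, EB F9 for
| the short jump back to five bytes before the function, and E9 plus a
| 32-bit displacement for the long jump.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { PATCH_PAD = 5 };   /* bytes reserved in front of the function */

/*
 * image:    buffer holding the (simulated) code
 * func_off: offset of the function's first byte, at least PATCH_PAD
 * hook_off: offset the function should be redirected to
 * Returns 0 on success, -1 if the function does not begin with the
 * two-byte MOV EDI, EDI (8B FF).
 */
int hotpatch(unsigned char *image, size_t func_off, size_t hook_off)
{
    unsigned char *f;
    uint32_t rel;

    assert(image != NULL && func_off >= PATCH_PAD);
    f = image + func_off;
    if (f[0] != 0x8B || f[1] != 0xFF)
        return -1;

    /*
     * The long jump occupies f[-5..-1], so it ends exactly at f and its
     * displacement is measured from there.
     */
    rel = (uint32_t)(int32_t)((int64_t)hook_off - (int64_t)func_off);
    f[-5] = 0xE9;
    f[-4] = (unsigned char)(rel & 0xFF);
    f[-3] = (unsigned char)((rel >> 8) & 0xFF);
    f[-2] = (unsigned char)((rel >> 16) & 0xFF);
    f[-1] = (unsigned char)((rel >> 24) & 0xFF);

    /*
     * Written last: the short jump ends at f + 2 and must reach f - 5,
     * a displacement of -7, which is 0xF9 as a signed byte.
     */
    f[0] = 0xEB;
    f[1] = 0xF9;
    return 0;
}
```

| Writing the five padding bytes first and the two-byte jump last means
| a thread entering the function mid-patch sees either the old no-op or
| a complete redirect, at least in this simplified model.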
| pclmulqdq wrote:
| Intel Vtune will do this with 5-byte NOPs directly. I
| think LLVM's x-ray tracing suite did this with a much
| bigger NOP, also, to capture more information.
| gcoakes wrote:
| Good read. Thank you.
|
| This just worsens my fear of changing "unnecessary" code
| when I don't know the original motivation for it.
| jeffbee wrote:
| Interesting, thanks for pointing this out! Just yesterday
| I was gazing at some program containing two consecutive
| xor rax, rax. I thought what's the point? But as you
| point out it might be a NOP sled designed to be that
| specific length.
| jchw wrote:
| I wonder if this is still true. Whenever I go to hook
| Win32 API functions, I use an off-the-shelf length
| disassembler to create a trampoline with the first n
| bytes of instructions and a jmp back, and then just patch
| in a jmp to my hook, but if this hot-patch point exists
| it'd be a lot less painful since you can avoid basically
| all of that.
|
| Though, I guess even if it was, it'd be silly to rely on
| it even on x86 only. Maybe it would still make for a nice
| fast-path? Dunno.
| mattgreenrocks wrote:
| That's really clever use of the opcode space. Thanks for
| passing that along.
| SomeRndName11 wrote:
| NOP on intels is in fact xchg eax, eax
| dataflow wrote:
| There's a good question in the comments there that I still
| don't see the answer to. How does this work if there's an
| interrupt between the branch and the endbranch? Does the OS
| need to save/restore the "branchness" bit?
| drdrey wrote:
| there is no branchness bit, if there's an endbranch you can
| jump to it
| dataflow wrote:
| Ah so when you return from an interrupt, the check is no
| longer done?
| simcop2387 wrote:
| I'd assume so since it wouldn't be a call/jmp coming from
| a computed address in a register. That said I haven't
| read the documentation for any of this. But interrupts
| should be having a stack pointer change and other things
| happening that would be different, which is why they use
| the IRET instruction and not the RET one.
| muricula wrote:
| Yes, on arm the branch type is saved in SPSR_EL1 in the
| BTYPE field. That stands for Saved Program Status Register
| for Kernel Mode (Exception Level 1) and Branch Type.
| https://developer.arm.com/documentation/ddi0595/2021-12/AArc...
| __failbit wrote:
| Thank you for the explanation!
| haberman wrote:
| Interesting. I was able to get Clang to generate this using
| `-fcf-protection=branch`: https://godbolt.org/z/rooP8vPsM
|
| It looks like endbr64 is a 4-byte instruction. That could be a
| significant code size overhead for jump tables with lots of
| targets: https://godbolt.org/z/xTPToaddh
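|
| For reference, endbr64 encodes as the four bytes F3 0F 1E FA. A small
| sketch that recognises the marker in a byte buffer and works out the
| size cost, assuming every jump-table target needs its own marker (a
| compiler may avoid that, for example by not emitting a table at all).
| The function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Encoding of endbr64. */
static const unsigned char ENDBR64[4] = { 0xF3, 0x0F, 0x1E, 0xFA };

/* True if the code bytes begin with an endbr64 marker. */
bool starts_with_endbr64(const unsigned char *code, size_t len)
{
    assert(code != NULL);
    return len >= sizeof ENDBR64 &&
           memcmp(code, ENDBR64, sizeof ENDBR64) == 0;
}

/* Extra bytes if each of `targets` landing sites carries one marker. */
size_t endbr64_overhead_bytes(size_t targets)
{
    return targets * sizeof ENDBR64;
}
```

| Under that assumption, a dense 256-way switch would carry 1 KiB of
| markers on top of the table and the case bodies.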
| notaplumber1 wrote:
| OpenBSD disables jump tables in Clang on amd64 due to IBT,
| some architectures also had jump tables disabled as part of
| the switch to --execute-only ("xonly") binaries by default,
| e.g: powerpc64/sparc64/hppa.
|
| https://marc.info/?l=openbsd-cvs&m=168254711511764&w=2
|
| E.g: https://marc.info/?l=openbsd-cvs&m=167337396024167&w=2
| cratermoon wrote:
| In case anyone wants a very simple introduction to JOP/COP
| exploits and mitigations of this type:
| <https://www.theregister.com/2020/06/15/intel_cet_tiger_lake/>
| codedokode wrote:
| Why should every function start with endbr64 command? Aren't
| functions usually called directly?
|
| Also, is it required to insert endbr64 command after function
| calls (for return address)?
| eklitzke wrote:
| As to why they're not always called directly, imagine some
| code like this:
|
|     int FooWithoutChecks(void *p);
|
|     int Foo(void *p) {
|         if (p == NULL) return -1;
|         return FooWithoutChecks(p);
|     }
|
| In general the caller is expected to call Foo if they aren't
| sure if the pointer is nullable, or if they already know that
| pointer is not null (e.g. because they already checked it
| themselves) they can call FooWithoutChecks and avoid a null
| check that they know will never be true.
|
| The naive way to emit assembly for this is to actually emit
| two separate functions, and have Foo call FooWithoutChecks
| the usual way. But notice that the FooWithoutChecks function
| call is a tail call, so the compiler can use tail call
| optimization. To do this it would inline FooWithoutChecks
| into Foo itself, so the compiler just emits code for Foo with
| the logic in FooWithoutChecks inlined into Foo. This is nice
| because now when you call Foo, you avoid a call/ret
| instruction, so you save two instructions on every call to
| Foo. But what if someone calls FooWithoutChecks? Simple, you
| just call at the offset into Foo just past the pointer
| comparison. This actually just works because Foo already has
| a ret instruction, so the call to FooWithoutChecks will just
| reuse the existing ret. This optimization also saves some
| space in the binary which has various benefits in and of
| itself.
|
| The example here with the null pointer check is kind of
| contrived, but this kind of pattern happens a LOT in real
| code when you have a small wrapper function that does a tail
| call to another function, and isn't specific to pointer
| checks.
| aidenn0 wrote:
| A traditional compiler needs to insert them for all external
| functions, because other compilation units may make an
| indirect call.
| messe wrote:
| C allows for any function to be called via a function
| pointer, and functions can be in different translation units,
| so the compiler can't simply assume that a function will
| never be called indirectly and has to pessimistically insert
| endbr64 in order to maintain a reasonable ABI.
|
| And no, as I understand it, this is only for branch/calls not
| returns.
| Joker_vD wrote:
| Well, if the function is marked "static", the compiler can
| actually check whether the function's address is taken in
| the current compilation unit or not and omit/emit ENDBR64
| accordingly (passing pointers to static functions to code
| in another compilation units is legal, and should still
| work).
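|
| A small example of the two cases, with made-up function names.
| `square` is static and only ever called directly, so a compiler is
| free to leave off its landing pad; `cmp_int` is static too, but its
| address is handed to qsort, so it can be reached by an indirect call
| and needs one. Whether a particular compiler actually omits the marker
| for `square` is an assumption worth checking in its assembly output
| (for instance with -S and -fcf-protection=branch).

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Static and never address-taken: only direct calls can reach it. */
static int square(int x)
{
    return x * x;
}

/* Static, but its address escapes to qsort below. */
static int cmp_int(const void *a, const void *b)
{
    int x = *(const int *)a;
    int y = *(const int *)b;

    return (x > y) - (x < y);
}

/* Sort v ascending, then replace each element with its square. */
void sort_then_square(int *v, size_t n)
{
    assert(v != NULL);
    qsort(v, n, sizeof *v, cmp_int);   /* indirect calls into cmp_int */
    for (size_t i = 0; i < n; i++)
        v[i] = square(v[i]);           /* direct call */
}
```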
| messe wrote:
| Good catch. Yeah, as long as the function's address is
| never taken the compiler has a lot of leeway with static
| functions; it can even avoid emitting code for them
| entirely if it can prove they're never called or if it's
| able to compute their results at compile-time.
| josephg wrote:
| Yep. Or inline them at every call site if that makes
| sense to do based on the optimization level and flags.
| MobiusHorizons wrote:
| Is this theoretically something lto could remove?
| tedunangst wrote:
| If you disable dlopen and ld_preload.
| codedokode wrote:
| Dlopen() "sees" only functions marked as exported (with
| macro like DLLEXPORT on Windows), not every function or
| am I wrong? Is C that bad?
| tedunangst wrote:
| On openbsd at least, every global symbol is exported
| unless you use an explicit symbol list. It's unusual for
| executables.
| josephcsible wrote:
| > Why should every function start with endbr64 command?
| Aren't functions usually called directly?
|
| They're _usually_ called directly, but unless the compiler
| can prove that they _always_ are (e.g., if they're static
| and nothing in the same file takes the address), endbr64 is
| required.
|
| > Also, is it required to insert endbr64 command after
| function calls (for return address)?
|
| No, IBT is only for jmp and call. SS (shadow stack) is the
| equivalent mechanism for ret.
| derefr wrote:
| > but unless the compiler can prove that they always are
| (e.g., if they're static and nothing in the same file takes
| the address), endbr64 is required
|
| Then why not just have the compiler break down every non-
| static function into two blocks: a static function that
| contains all the logic, and a non-static function that just
| contains an IBT and a direct jump to the static function?
| (Or, better yet, place the non-static label just before the
| static one, and have the non-static fall through into the
| body of the static.) Then the static direct callsites won't
| have to pay the overhead of executing the IBT NOP.
| Joker_vD wrote:
| That's absolutely doable, just... How much slower/faster is a
| predicted unconditional jump than ENDBR64? What's the
| ratio of virtual/static calls in real-world programs? And
| while your last proposal ("foo: endbr64; foo_internal:
| <code>") evades those questions, it raises questions
| about maintaining function alignment (16 bytes IIRC? Is
| this even necessary today?) and restructuring the
| compiler to distinguish the inner/external symbol
| addresses. Plus, of course, somebody has to actually sit
| down and write the code to implement that, as opposed to
| just adding "if (func->is_escaping) emit_endbr(...);" at
| the beginning of the code that emits the object code for
| a function body.
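|
| To make the "one extra if" point concrete, here is a toy sketch of a
| prologue emitter. It is not real compiler code: struct func, its
| is_escaping field, and emit_function are hypothetical names that only
| mirror the snippet quoted above, and the "object code" is a short
| text listing.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical, minimal description of a function being compiled. */
struct func {
    const char *name;
    bool is_escaping;  /* true if the address may be taken or exported */
};

/* Writes a toy assembly listing for `f` into `out`. The landing pad is
 * emitted only when the function may be an indirect-branch target.
 * Returns whatever snprintf returns (the length it wanted to write). */
int emit_function(const struct func *f, char *out, size_t out_size)
{
    return snprintf(out, out_size, "%s:\n%s\tret\n",
                    f->name,
                    f->is_escaping ? "\tendbr64\n" : "");
}
```

| The two-entry-point layout discussed above would need more than this:
| a second label per function and a way for callers to pick between
| them.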
| 95014_refugee wrote:
| The IBT NOP is "free" in that it will evaporate in the
| pipeline; it still has to be fetched and decoded to some
| extent, but it does not consume execution resources.
|
| From a tooling perspective, what you're describing (two
| entrypoints for a function, the jump you mention is
| pointless) would require changes up and down the
| toolchain; it would affect the compiler, all linkers, all
| debuggers, etc. By contrast, just adding an additional
| instruction to the function prolog is relatively low-
| impact.
|
| It's also worth noting that at the time code for a
| function is emitted, the compiler is not aware of whether
| the symbol will be exported and thus discoverable in some
| other module, or by symbol table lookup, so emitting the
| target instruction is essentially mandatory.
| dzaima wrote:
| Doesn't seem like it'd be that difficult to make the
| change the other direction, i.e. keep endbr64 as-is as
| the default case, but if there's a direct jump/call to
| anywhere that starts with endbr64, offset the immediate
| by 4 bytes; could be done in any single stage of
| toolchain that has that info with no extra help. But
| yeah, quite low impact, might not even affect decode
| throughput & cache usage for at least one of the direct
| or indirect cases.
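|
| The retargeting idea above can be sketched as a small helper that a
| linker-like tool might run on each direct call or jump. This is an
| illustration under assumptions, not an existing tool: direct_target
| and its parameters are invented names, and the code works on a plain
| byte buffer rather than a real object file. The only hard fact used
| is the ENDBR64 encoding, F3 0F 1E FA.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Encoding of ENDBR64 on x86-64: four bytes, F3 0F 1E FA. */
static const uint8_t ENDBR64[4] = { 0xF3, 0x0F, 0x1E, 0xFA };

/* Given code bytes and the offset a direct branch points at, return
 * the offset it could use instead: just past a leading ENDBR64 if one
 * is present at the target, otherwise the target unchanged. */
size_t direct_target(const uint8_t *code, size_t code_len, size_t target)
{
    if (target <= code_len &&
        code_len - target >= sizeof ENDBR64 &&
        memcmp(code + target, ENDBR64, sizeof ENDBR64) == 0)
        return target + sizeof ENDBR64;
    return target;
}
```

| Indirect branches would keep using the original address, so the
| marker still sits where IBT expects it.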
| tedunangst wrote:
| What is the overhead of executing the IBT NOP?
| 95014_refugee wrote:
| It's not "executed" per se. It consumes space in the
| cache hierarchy, and a slot in the front-end decoder. It
| won't ever be issued, but depending on the
| microarchitecture in question it might result in an issue
| cycle having less occupancy than it might have had in the
| case where the subsequent instruction was available.
|
| With that said, the first few instructions of a called
| function often stall due to stack pointer dependencies,
| etc. so the true execution cost is likely to be even
| smaller than the above might suggest.
| [deleted]
| binkHN wrote:
| I still run OpenBSD where I can, especially where security is
| more important. Yes, it's still missing A LOT of functionality
| compared to other UNIX-like systems, but security bases tend to
| be well covered.
| PrimeMcFly wrote:
| I don't really buy their approach to security honestly. Trying
| to fix all bugs is great, but they provide little to prevent
| unknown bugs being exploited (pledge is nice for software that
| opts in to use it, but otherwise not so much). I'd love to see
| them implement something like AppArmor with their approach, it
| would probably be amazing.
|
| I actually think NetBSD is a pretty interesting alternative, it
| has some nice security features like veriexec that don't get
| talked about much.
| binkHN wrote:
| I think in the past they tried to fix all the bugs, and
| realized they couldn't, so they started to build all sorts of
| mitigations in the same vein as the one you see posted here
| today. As for pledge, and the related mitigations, yes,
| they're not useful if you don't use them, but I see this as
| them innovating in the space and giving application
| developers more tools to build hardened applications.
|
| I see tools like AppArmor as band-aids to fix problems that
| shouldn't exist in the first place. The problem with these
| approaches is that the band-aids tend to break things in
| unexpected ways and when that happens they simply get removed
| and unused.
| PrimeMcFly wrote:
| > I see tools like AppArmor as band-aids to fix problems
| that shouldn't exist in the first place.
|
| I fundamentally disagree on that. I think tools like that
| are amazing at protecting against unknown threats/exploits.
| They let you lock down software and protect against future
| unknown exploits, badly behaving software, malicious
| employees etc. I think something similar should be a part
| of any OS claiming to be security focused. Basic DAC is
| woefully insufficient.
|
| On the other hand, the industry has largely found other
| solutions like sandboxing, but I still think MAC or RBAC or
| whichever has a place, certainly as part of a defense in
| depth strategy.
| anthk wrote:
| OpenBSD has these _on_ when compiling.
| [deleted]
| carlosrg wrote:
| > they provide little to prevent unknown bugs being exploited
|
| They provide plenty of mitigations
| (https://www.openbsd.org/innovations.html). In fact OP's
| article is for preventing unknown bugs from being exploited.
| PrimeMcFly wrote:
| They don't provide _any_ mitigations of the sort I was
| clearly referencing. Specifically, for restricting
| malicious code or users that already have access to the
| system, exploiting insecure software that was _not_
| compiled with pledge support.
| MuffinFlavored wrote:
| > Yes, it's still missing A LOT of functionality compared to
| other UNIX-like systems
|
| Could you give some examples/samples of things you have run
| into off the top of your head?
| binkHN wrote:
| Sure. Poor SMP support (but this has improved heavily over
| the years), ancient file system, no Bluetooth (not important
| if you don't need this), reduced performance (due to a lack
| of optimizations and security mitigations overhead), limited
| Wi-Fi support (this is for numerous reasons, but it's better
| than other BSDs)...
|
| I could go on, but, for my needs, it works very well and some
| of its simplicities are a godsend.
| dark-star wrote:
| I find OpenBSD's hardware support especially lacking. It
| doesn't really work that well on at least 3 devices where I
| tried it (all Dell laptops from various generations, 3-10
| years old), whereas Linux runs perfectly out-of-the-box on all
| three.
|
| Which is sad, as I kinda like the *BSD approach to things.
| carlosrg wrote:
| Not my experience at all, it works very well with a new Acer
| laptop I own: the graphics work (Intel Xe - 12th gen
| processor), audio, touchpad, keyboard (and special keyboard
| keys like brightness), wifi... All I had to do was download
| the firmware with fw_update, nothing more.
|
| Also I was pleasantly surprised to hear they support Apple
| M1/M2 Macs. Asahi Linux gets a lot of press around here but I
| had no idea OpenBSD supported it.
___________________________________________________________________
(page generated 2023-07-14 23:00 UTC)