[HN Gopher] Constant-time support coming to LLVM: Protecting cry...
___________________________________________________________________
Constant-time support coming to LLVM: Protecting cryptographic code
https://archive.is/tIXt7
Author : ahlCVA
Score : 105 points
Date : 2025-11-25 13:02 UTC (1 days ago)
(HTM) web link (blog.trailofbits.com)
(TXT) w3m dump (blog.trailofbits.com)
| frabert wrote:
| This has been a sore point in a lot of discussions regarding
| compiler optimizations and cryptographic code, how compilers and
| compiler engineers are sabotaging the efforts of cryptographers
| in making sure there are no side-channels in their code. The
| issue has never been the compiler, and has always been the
| language: there was never a way to express the right intention
| from within C (or most other languages, really).
|
| This primitive we're trying to introduce is meant to make up for
| this shortcoming without having to introduce additional rules in
| the standard.
| jfindper wrote:
| > _how compilers and compiler engineers are sabotaging the
| efforts of cryptographers_
|
| I'm not exposed to this space very often, so maybe you or
| someone else could give me some context. "Sabotage" is a
| _deliberate_ effort to ruin /hinder something. Are compiler
| engineers _deliberately_ hindering the efforts of
| cryptographers? If yes... is there a reason why? Some long-
| running feud or something?
|
| Or, through the course of their efforts to make compilers
| faster/etc, are cryptographers just getting the "short end of
| the stick" so to speak? Perhaps forgotten about because the
| number of cryptographers is dwarfed by the number of non-
| cryptographers? (Or any other explanation that I'm unaware of?)
| chowells wrote:
| It's more a viewpoint thing. Any construct cryptographers
| find that runs in constant time is something that could be
| optimized to run faster for non-cryptographic code. Constant-
| time constructs essentially are optimizer bug reports. There
| is always the danger that by popularizing a technique you are
| drawing the attention of a compiler contributor who wants to
| speed up a benchmark of that same construct in non-
| cryptographic code. So maybe it's not intended as sabotage,
| but it can sure feel that way when everything you do is
| explicitly targeted to be changed after you do it.
| stouset wrote:
| It's not intentional. The motivations of CPU designers,
| compiler writers, and optimizers are at odds with those of
| cryptographers. The former want to use every trick possible
| to squeeze out additional performance in the most common
| cases, while the latter absolutely require indistinguishable
| performance across all possibilities.
|
| CPUs love to do branch prediction to have computation already
| performed in the case where it guesses the branch correctly,
| but cryptographic code needs equal performance no matter the
| input.
|
| When a programmer asks for some register or memory location
| to be zeroed, they generally just want to be able to use a
| zero in some later operation and so it doesn't really matter
| that a previous value was really overwritten. When a
| cryptographer does, they generally are trying to make it
| impossible to read the previous value. And they want to be
| able to have some guarantee that it wasn't implicitly copied
| somewhere else in the interim.
| layer8 wrote:
| "Sabotage" can be used in a figurative sense that doesn't
| insinuate intent. An adjacent example is "self-sabotage",
| which doesn't imply intent.
| jfindper wrote:
| Every dictionary I've looked at, wikipedia, etc. all
| immediately and prominently highlight the intent part. It
| really seems like the defining characteristic of "sabotage"
| vs. other similar verbs. But, language is weird, so,
 | ¯\_(ツ)_/¯.
| layer8 wrote:
| Since the sibling comment is dead and thus I can't reply to
| it: Search for "unintentional sabotage", which should
| illustrate the usage. Despite appearances, it isn't an
| oxymoron. See also meaning 3a on https://www.merriam-
| webster.com/dictionary/sabotage.
| colmmacc wrote:
| I don't think it's nefarious but it is sabotage. There's long
| been an implicit assumption that optimization should be more
| important than safety.
|
| Yes, languages do lack good mechanisms to mark variables or
| sections as needing constant-time operation ... but compiler
| maintainers could have taken the view that that means all
 | code should be compiled that way. Now instead we're marking
 | data and sections as "secret" so that they can be left
 | unoptimized. But why not the other way around?
|
| I understand how we get here; speed and size are trivial to
| measure and they each result in real-world cost savings. I
| don't think any maintainer could withstand this pressure. But
| it's still deliberate.
| aw1621107 wrote:
 | > Now instead we're marking data and sections as "secret"
 | so that they can be left unoptimized. But why not the other
 | way around?
|
| Worse cost-benefit tradeoff, perhaps? I'd imagine the
| amount of code that cares more about size/speed than
| constant-time operation far outnumbers the amount of code
| which prioritizes the opposite, and given the real-world
| benefits you mention and the relative newness of concerns
| about timing attacks I think it makes sense that compiler
| writers have defaulted to performance over constant-time
| performance.
|
| In addition, I think a complicating factor is that
| compilers can't infer intent from code. The exact same
| pattern may be used in both performance- and timing-
| sensitive code, so absent some external signal the compiler
| has to choose whether it prioritizes speed or timing. If
| you think more code will benefit from speed than timing,
| then that is a reasonable default to go with.
| soulbadguy wrote:
 | As compilers have become more sophisticated, and hardware
 | architectures more complicated, there has been a growing
 | sentiment that some of the code transformations done by
 | modern compilers make the code hard to reason about and to
 | predict.
 |
 | A lot of software engineers see this as compiler engineers
 | only caring about performance as opposed to other aspects
 | such as debuggability, safety, compile time, productivity,
 | etc. I think that's where the "sabotage" comes from:
 | basically, the focus on performance to the detriment of
 | other things.
 |
 | My 2 cents: the core problem is programmers expecting
 | invariants and properties not defined in the language
 | standard. The compiler only guarantees things as defined in
 | the standard; expecting anything else is problematic.
| fooker wrote:
| > making sure there are no side-channels in their code
|
| Any side effect is a side channel. There are always going to be
| side channels in real code running on real hardware.
|
 | Sure you can change your code, compiler, or even hardware
 | to account for this, but at its core that is security by
 | obscurity.
| fanf2 wrote:
| What happened to the blog post? It was moved and now it has
| disappeared :-(
| Meneth wrote:
| Archived copy here: https://archive.is/tIXt7
| foresto wrote:
| https://web.archive.org/web/20251125224147/https://blog.trai.
| ..
| frabert wrote:
| Sorry, it was published early and we have to wait for some
| approval checks to clear
| Asooka wrote:
| There really ought to be a subset of C that lets you write
| portable assembly. One where only a defined set of
| optimisations are allowed and required to be performed,
| "inline" means always inline, the "register" and "auto"
| keywords have their original meanings, every stack variable is
| allocated unless otherwise indicated, every expression has
| defined evaluation order, every read/write from/to an address
| is carried out, nothing is ever reordered, and undefined
| behaviour is switched to machine-specific behaviour. Currently
| if you need that level of control, your only option is writing
| it in assembly, which gets painful when you need to support
| multiple architectures, or want fancy features like
| autocomplete or structs and functions.
| saagarjha wrote:
| Why would you like register and auto to have meaning?
| thyristan wrote:
| Because for timing-sensitive code, those are important. If
| a variable is really a register, cache-based timing attacks
| just don't happen, because there is no cache in between.
| delta_p_delta_x wrote:
| > want fancy features like autocomplete or structs and
| functions
|
| I would argue that given a certain ISA, it's probably easier
| to write an autocomplete extension for assembly targeting
| that ISA, rather than autocomplete for C, or goodness forbid,
| C++.
|
| Likewise for structs, functions, jump targets, etc. One could
| probably set up snippets corresponding to different sorts of
| conditional execution--loops, if/else/while, switch, etc.
| GhosT078 wrote:
| SPARK is a very expressive language for implementing
| cryptographic applications. It is available for some LLVM
| targets (e.g. x86-64).
| zzo38computer wrote:
| I think __builtin_ct_select and __builtin_ct_expr would be good
| ideas. (They could also be implemented in GCC in future, as well
| as LLVM.)
|
| In some cases it might be necessary to consider the possibility
| of invalid memory accesses (and avoid the side-channels when
| doing so). (The example given in the article works around this
| issue, but I don't know if there are any situations where this
| will not help.)
| connicpu wrote:
 | The side channel from memory access timings is exactly why
 | cmov is its own instruction on x86_64. It retrieves the memory
| regardless of the condition value. Anything else would change
| the timings based on condition. If you're going to segfault
| that's going to be visible to an attacker regardless because
| you're going to hang up.
| zzo38computer wrote:
| I mean the possibility that the rest of the program
| guarantees that the address is valid if the condition is true
| but otherwise it might be valid or invalid. This is probably
| not important for most applications, but I don't know if
| there are some unusual ones where it would matter.
| wahern wrote:
| AFAIU, cmov wasn't originally intended to be a guaranteed
| constant-time operation, Intel and AMD won't commit to
| keeping it constant-time in the future, but it just so
| happened that at one point it was implemented in constant-
| time across CPUs, cryptographers picked up on this and began
| using it, and now Intel and AMD tacitly recognize this
| dependency. See, e.g., https://www.intel.com/content/www/us/e
| n/developer/articles/t...
|
| > The CMOVcc instruction runs in time independent of its
| arguments in all current x86 architecture processors. This
| includes variants that load from memory. The load is
| performed before the condition is tested. Future versions of
| the architecture may introduce new addressing modes that do
| not exhibit this property.
| adrian_b wrote:
| At your link there is a link to the list of instructions
| that guarantee constant execution time, independent of the
| operands.
|
| The list includes CMOV.
|
| However, the instructions from the list are guaranteed to
| have constant execution time, even on any future CPUs, only
| if the operating system sets a certain CPU control bit.
|
| So on recent and future Intel/AMD CPUs, one may need to
| verify that the correct choice has been made between secure
| execution mode and fastest execution mode.
| codedokode wrote:
 | The names are pretty unattractive (which is not uncommon for
 | C), to the point that one would want to avoid them in code.
| anematode wrote:
| The solution is always another macro :)
|
| #define ct_select __builtin_ct_select
| ethin wrote:
| So this makes me curious: is there a reason we don't do something
| like a __builtin_ct_begin()/__builtin_ct_end() set of intrinsics?
| Where the begin intrinsic begins a constant-time code region, and
| all code within that region must be constant-time, and that
| region must be ended with an end() call? I'm not too familiar
| with compiler intrinsics or how these things work so thought I'd
| ask. The intrinsic could be scoped such that the compiler can use
 | its implementation-defined behavior freedom to enforce the
| begin/end pairs. But Idk, maybe this isn't feasible?
| zzo38computer wrote:
| Maybe it might be better to be implemented as a function
| attribute instead
| ethin wrote:
| Or a pragma? Like how OpenMP did it?
| grogers wrote:
| It'd be very hard for the compiler to enforce constant-time
| execution for generic code. As an example, if you wrote the
| naive password checking where the first byte that doesn't match
| returns false, is that a compiler error if it can't transform
| it into a constant time version?
| amluto wrote:
| Too bad that Intel chips more or less reserve the right to take
| LLVM's nice output and make it non-constant-time anyway. See:
|
| https://www.intel.com/content/www/us/en/developer/articles/t...
|
| Sure, you could run on some hypothetical OS that supports DOITM
| and insert syscalls around every manipulation of secret data.
| Yeah, right.
| JoshTriplett wrote:
| Last I saw, it seemed like the plan was to unconditionally
| _enable_ it, and on the off chance there 's ever a piece of
| hardware where it's a substantial performance win, offer a way
| to opt _out_ of it.
| amluto wrote:
| I advocated for that, and then I completely lost track of the
| status.
|
| The whole design is ridiculous.
| matu3ba wrote:
| What would be more sane alternatives, when it becomes
| obvious that any side-effect of timing is a potential
| attack vector? See https://www.hertzbleed.com/ for
| frequency side channels. I do only see dedicated security
| cores as options with fast data lanes to the CPU similar to
| what Apple is doing with Secure Enclave or do you have
| better suggestions that still allow performance and power
| savings?
| amluto wrote:
| A design such that it would actually make sense for a
| compiler to mark code that should permit data-dependent
| CPU optimizations differently from code that should not.
|
| This could be done using an opcode prefix, which would
| bloat code but would work perfectly. Or it could use an
| RFLAGS bit or a bit in MXCSR or a new register, etc.
|
| Almost anything would be better than an MSR that is only
| accessible to privileged code.
| cesarb wrote:
| > Or it could use an RFLAGS bit or a bit in MXCSR or a
| new register, etc.
|
| > Almost anything would be better than an MSR that is
| only accessible to privileged code.
|
| ARM does that: their flag (DIT) is accessible by non-
| privileged code. If you know the architecture has that
| flag, either because your -march= is recent enough or
| because the operating system told you so through the
| hwcaps or the emulated id registers, you can use it
| freely without needing to switch to privileged mode
| through a syscall.
| JoshTriplett wrote:
| The one reason I can _imagine_ to make it privileged-only
| is that it could be high-overhead to switch: _if_ a CPU
| conditioned various kinds of predictors on it, it might
| have to flush those predictors when toggling it.
| amluto wrote:
| IMO the best design would be to keep the flag with the
| data. Give each register an extra bit indicating whether
| it's sensitive. Any data-dependent-timing operation can't
| possibly leak the data until the data is available to it,
| and that's exactly when the ALU would find out that the
| data is sensitive anyway. No pipeline stalls.
| stingraycharles wrote:
| Sorry, I may be missing the point here, but reading that page
| doesn't immediately make it obvious to me what that feature is.
| Is it some constant time execution mechanism that you can
| enable / disable on a per-thread basis to do... what exactly?
| wat10000 wrote:
| It turns off CPU features that could cause execution time to
| vary in a way that depends on the data being operated on.
| seanhunter wrote:
 | As a concrete example, say I have a (very naive and bad)
 | password-checker that works like this pseudocode:
 |
           for i = 1 to len(real_password) {
               if entered_password[i] != real_password[i] {
                   return FAILURE
               }
           }
           return SUCCESS
|
| OK now an alert attacker with the ability to very accurately
| record the time it takes to check the password can determine
| the length at least of the real password, because the time
| complexity of this check is O(length of the real password),
| and they could also gradually determine the password itself
| because the check would take longer as the attacker got each
| successive character correct.
|
| Taking this general idea and expanding it, there are lots of
| places where the timing of branches of code can leak
| information about some secret, so in cryptographic code in
| particular, it's often beneficial to be able to ensure that
| two branches (the success and failure branches in the above)
| take exactly the same amount of time so the timing doesn't
| leak information. So to fix the above you would probably want
| to do two things. Firstly set a boolean to failure and still
| continue the checking to ensure the "return failure quickly"
| problem doesn't leak information and also change your
| password check to check against a fixed-width hash or
| something so the length of the password itself wasn't a
| factor.
|
| The problem is lots of performance optimizations (pipelining,
| branch prediction etc) work specifically against this goal-
| they aim to take branches quickly in the happy path of the
| code because normally that's what you want to ensure optimal
| performance.
|
 | So say instead of the above I do:
 |
           bool status = SUCCESS
           for i = 1 to hash_length {
               if hash_of_entered_password[i] != hash_of_real_password[i] {
                   status = FAILURE
               }
           }
           return status
|
| ...I don't want the optimizer to realize that when status
| becomes FAILURE it can never become SUCCESS again and the
| loop doesn't do anything else so just return early. I want it
| to actually run the pointless comparison of the rest of the
| hash so the timing is exactly the same each time.
|
| But now my check is constant time but I've shifted the burden
| onto the person who writes the hash function. That has to run
| in constant time or my check will once again leak. So in
| general people want the ability to tell the compiler that
| they want a particular piece of code to run in constant time.
| At the moment, in the general case I think you have to break
| into inline assembly to achieve this.
| rkagerer wrote:
| Why not just always spin until a fixed number of ticks (or
| microseconds for slewing clocks) have passed (starting from
| function entry), prior to returning?
|
| Obviously this doesn't mitigate power usage side channel
| attacks, but that's not the point here.
|
| It's time-bound, so let's check _time_.
| littlestymaar wrote:
| Wouldn't that be much less efficient?
| seanhunter wrote:
| You could totally do that, but in the exact same way as
| the above, you'd want the compiler not to optimize your
| spinlock away thinking it wasn't needed. My understanding
| is in lots of real applications, the asm code that I
| mentioned is in part making sure it waits specific
| numbers of clocks in each branch to ensure they all
| exactly balance.
| codedokode wrote:
| By the way, for small arrays removing branches also
| improves performance (according to my tests). So if you do
| a constant-time comparison like this:
           match = True
           for a, b in pad(entered_pass, real_pass):
               match = match and a == b
           return match
|
| Then it will be faster as well. I was surprised to find
| this.
| charcircuit wrote:
| These are meaningless without guarantees that the processor will
| run the instructions in constant time and not run the code as
| fast as possible. Claims like cmov on x86 always being constant
| time are dangerous because a microcode update could change that
| to not be the case anymore. Programmers want an actual guarantee
| that the code will take the same amount of time.
|
| We should be asking our CPU vendors to support enabling a
| constant time mode of some sort for sensitive operations.
| Gibbon1 wrote:
| That's been one of my counters to the bitch that C isn't safe.
| The underlying architecture isn't safe.
|
 | That said, WG21 and WG14 don't seem to be able to get the memo
 | that safety is more important than single-core speed. Or, as I
 | suspect, a bunch of members are actually malicious.
| derangedHorse wrote:
| I agree. For use cases where side channel attacks are likely to
| be attempted, the security of the system ultimately depends on
| both the software and hardware used.
| adrian_b wrote:
| Nowadays both Intel/AMD CPUs and Arm-based CPUs guarantee that
| a certain subset of the instructions are executed in constant
| time.
|
| For an example of a list of such instructions see:
|
| https://www.intel.com/content/www/us/en/developer/articles/t...
|
| However, cooperation from the operating system is necessary, as
| the constant-time execution mode may need to be enabled by
| setting certain CPU-control bits in protected registers (e.g.
| IA32_UARCH_MISC_CTL[DOITM]).
|
| See for instance:
|
| https://www.intel.com/content/www/us/en/developer/articles/t...
|
| CMOV is on the list of instructions with constant-time
| execution, but the list is valid only with the corresponding
| control bit set correctly.
| cesarb wrote:
| > However, cooperation from the operating system is
| necessary, as the constant-time execution mode may need to be
| enabled by setting certain CPU-control bits in protected
| registers (e.g. IA32_UARCH_MISC_CTL[DOITM]).
|
| The way ARM does this is way better, since it doesn't need
| help from the operating system: user-space can directly set
| and clear the DIT bit. Operating system cooperation is
| necessary only to know whether that bit exists (because the
| ID registers are not directly readable by user mode).
| rurban wrote:
| Or just disable the fucking broken optimizer for such a function.
| #pragma GCC optimize ("O0")
| adrian_b wrote:
| Disabling optimizations does not necessarily result in more
| deterministic execution.
|
| With "-O0", the generated code normally retains a huge number
| of useless register loads and stores, which lead to non-
| deterministic timing due to contention in the use of caches and
| of the main memory interface. Optimized code may run only
| inside registers, being thus executed in constant time
| regardless of what other CPU cores do.
|
| The only good part is that this non-deterministic timing will
| not normally depend on the data values. The main danger of the
| non-constant execution time is when this time depends on the
| values of the processed data, which provides information about
| those values.
|
| There are cases when disabling optimization may cause data-
| dependent timing, e.g. if with optimization the compiler would
| have chosen a conditional move and without optimization it
| chooses a data-dependent branch.
|
| The only certain way of achieving data-independent timing is to
| use either assembly language or appropriate compiler
| intrinsics.
| anematode wrote:
| I'd love if this could make it into Rust... but I'm wondering if
| it'd be a bit of a burden on the creators of alternative backends
| (e.g. Cranelift), since if they implemented it naively, they
| would be unsuitable for compiling cryptographic code using it.
| aw1621107 wrote:
| > but I'm wondering if it'd be a bit of a burden on the
| creators of alternative backends (e.g. Cranelift)
|
| Technically _any_ new feature that requires backend support is
| an additional burden on backend devs. There 's nothing special
| about constant-time builtins in this respect.
|
| > since if they implemented it naively
|
| Strictly speaking, whether an implementation is naive is
| independent of whether it is correct. An implementation that
| purports to be constant time while not actually being constant
| time is wrong, no matter how naive or sophisticated the
| implementation may be.
___________________________________________________________________
(page generated 2025-11-26 23:01 UTC)