[HN Gopher] Constant-time support coming to LLVM: Protecting cry...
       ___________________________________________________________________
        
       Constant-time support coming to LLVM: Protecting cryptographic code
        
       https://archive.is/tIXt7
        
       Author : ahlCVA
       Score  : 105 points
       Date   : 2025-11-25 13:02 UTC (1 day ago)
        
 (HTM) web link (blog.trailofbits.com)
 (TXT) w3m dump (blog.trailofbits.com)
        
       | frabert wrote:
       | This has been a sore point in a lot of discussions regarding
       | compiler optimizations and cryptographic code: the claim that
       | compilers and compiler engineers are sabotaging the efforts of
       | cryptographers in making sure there are no side-channels in
       | their code. The issue has never been the compiler; it has
       | always been the language: there was never a way to express the
       | right intention from within C (or most other languages,
       | really).
       | 
       | This primitive we're trying to introduce is meant to make up for
       | this shortcoming without having to introduce additional rules in
       | the standard.
        
         | jfindper wrote:
         | > _how compilers and compiler engineers are sabotaging the
         | efforts of cryptographers_
         | 
         | I'm not exposed to this space very often, so maybe you or
         | someone else could give me some context. "Sabotage" is a
         | _deliberate_ effort to ruin/hinder something. Are compiler
         | engineers _deliberately_ hindering the efforts of
         | cryptographers? If yes... is there a reason why? Some long-
         | running feud or something?
         | 
         | Or, through the course of their efforts to make compilers
         | faster/etc, are cryptographers just getting the "short end of
         | the stick" so to speak? Perhaps forgotten about because the
         | number of cryptographers is dwarfed by the number of non-
         | cryptographers? (Or any other explanation that I'm unaware of?)
        
           | chowells wrote:
           | It's more a viewpoint thing. Any construct cryptographers
           | find that runs in constant time is something that could be
           | optimized to run faster for non-cryptographic code. Constant-
           | time constructs are essentially optimizer bug reports. There
           | is always the danger that by popularizing a technique you are
           | drawing the attention of a compiler contributor who wants to
           | speed up a benchmark of that same construct in non-
           | cryptographic code. So maybe it's not intended as sabotage,
           | but it can sure feel that way when everything you do is
           | explicitly targeted to be changed after you do it.
        
           | stouset wrote:
           | It's not intentional. The motivations of CPU designers,
           | compiler writers, and optimizers are at odds with those of
           | cryptographers. The former want to use every trick possible
           | to squeeze out additional performance in the most common
           | cases, while the latter absolutely require indistinguishable
           | performance across all possibilities.
           | 
           | CPUs love to do branch prediction to have computation already
           | performed in the case where it guesses the branch correctly,
           | but cryptographic code needs equal performance no matter the
           | input.
           | 
           | When a programmer asks for some register or memory location
           | to be zeroed, they generally just want to be able to use a
           | zero in some later operation and so it doesn't really matter
           | that a previous value was really overwritten. When a
           | cryptographer does, they generally are trying to make it
           | impossible to read the previous value. And they want to be
           | able to have some guarantee that it wasn't implicitly copied
           | somewhere else in the interim.
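           | 
           | To make the zeroing point concrete, here's a minimal C
           | sketch (my illustration, not from the article): the plain
           | memset is a dead store the optimizer may legally delete,
           | while a loop of volatile writes survives. Even this says
           | nothing about copies spilled to registers or stack slots
           | in the interim.
           | 
           |       #include <stddef.h>
           |       #include <string.h>
           |       
           |       /* Volatile writes may not be elided, so this
           |          zeroing survives dead-store elimination. */
           |       static void secure_zero(void *p, size_t n)
           |       {
           |           volatile unsigned char *vp = p;
           |           while (n--)
           |               *vp++ = 0;
           |       }
           |       
           |       void handle_key(void)
           |       {
           |           unsigned char key[32];
           |           /* ... derive and use key ... */
           |           memset(key, 0, sizeof key);   /* often removed */
           |           secure_zero(key, sizeof key); /* kept */
           |       }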
        
           | layer8 wrote:
           | "Sabotage" can be used in a figurative sense that doesn't
           | insinuate intent. An adjacent example is "self-sabotage",
           | which doesn't imply intent.
        
             | jfindper wrote:
             | Every dictionary I've looked at, wikipedia, etc. all
             | immediately and prominently highlight the intent part. It
             | really seems like the defining characteristic of "sabotage"
             | vs. other similar verbs. But, language is weird, so,
             | ¯\_(ツ)_/¯.
        
             | layer8 wrote:
             | Since the sibling comment is dead and thus I can't reply to
             | it: Search for "unintentional sabotage", which should
             | illustrate the usage. Despite appearances, it isn't an
             | oxymoron. See also meaning 3a on
             | https://www.merriam-webster.com/dictionary/sabotage.
        
           | colmmacc wrote:
           | I don't think it's nefarious but it is sabotage. There's long
           | been an implicit assumption that optimization should be more
           | important than safety.
           | 
           | Yes, languages do lack good mechanisms to mark variables or
           | sections as needing constant-time operation ... but compiler
           | maintainers could have taken the view that that means all
           | code should be compiled that way. Now instead we're marking
           | data and sections as "secret" so that they can be left
           | unoptimized. But why not the other way around?
           | 
           | I understand how we got here; speed and size are trivial to
           | measure and they each result in real-world cost savings. I
           | don't think any maintainer could withstand this pressure. But
           | it's still deliberate.
        
             | aw1621107 wrote:
             | > Now instead we're marking data and sections as "secret" so
             | that they can be left unoptimized. But why not the other
             | way around?
             | 
             | Worse cost-benefit tradeoff, perhaps? I'd imagine the
             | amount of code that cares more about size/speed than
             | constant-time operation far outnumbers the amount of code
             | which prioritizes the opposite, and given the real-world
             | benefits you mention and the relative newness of concerns
             | about timing attacks I think it makes sense that compiler
             | writers have defaulted to performance over constant-time
             | performance.
             | 
             | In addition, I think a complicating factor is that
             | compilers can't infer intent from code. The exact same
             | pattern may be used in both performance- and timing-
             | sensitive code, so absent some external signal the compiler
             | has to choose whether it prioritizes speed or timing. If
             | you think more code will benefit from speed than timing,
             | then that is a reasonable default to go with.
        
           | soulbadguy wrote:
           | As compilers have become more sophisticated, and hardware
           | architectures more complicated, there has been a growing
           | sentiment that some of the code transformations done by
           | modern compilers make the code hard to reason about and to
           | predict.
           | 
           | A lot of software engineers see this as compiler engineers
           | caring only about performance as opposed to other aspects
           | such as debuggability, safety, compile time, productivity,
           | etc. I think that's where the "sabotage" comes from:
           | basically, a focus on performance to the detriment of other
           | things.
           | 
           | My 2 cents: the core problem is programmers expecting
           | invariants and properties not defined in the language
           | standard. The compiler only guarantees things as defined in
           | the standard; expecting anything else is problematic.
        
         | fooker wrote:
         | > making sure there are no side-channels in their code
         | 
         | Any side effect is a side channel. There are always going to be
         | side channels in real code running on real hardware.
         | 
         | Sure, you can change your code, compiler, or even hardware
         | to account for this, but at its core that is security by
         | obscurity.
        
         | fanf2 wrote:
         | What happened to the blog post? It was moved and now it has
         | disappeared :-(
        
           | Meneth wrote:
           | Archived copy here: https://archive.is/tIXt7
        
           | foresto wrote:
           | https://web.archive.org/web/20251125224147/https://blog.trai.
           | ..
        
           | frabert wrote:
           | Sorry, it was published early and we have to wait for some
           | approval checks to clear
        
         | Asooka wrote:
         | There really ought to be a subset of C that lets you write
         | portable assembly. One where only a defined set of
         | optimisations are allowed and required to be performed,
         | "inline" means always inline, the "register" and "auto"
         | keywords have their original meanings, every stack variable is
         | allocated unless otherwise indicated, every expression has
         | defined evaluation order, every read/write from/to an address
         | is carried out, nothing is ever reordered, and undefined
         | behaviour is switched to machine-specific behaviour. Currently
         | if you need that level of control, your only option is writing
         | it in assembly, which gets painful when you need to support
         | multiple architectures, or want fancy features like
         | autocomplete or structs and functions.
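         | 
         | Until something like that exists, the usual approximation
         | is GCC/Clang-dialect C with empty inline-asm optimization
         | barriers; a sketch of the common idioms (not standard C):
         | 
         |       /* Compiler-level memory barrier: pending loads and
         |          stores must be materialized and cannot be
         |          reordered across this point. */
         |       #define BARRIER() __asm__ __volatile__("" ::: "memory")
         |       
         |       /* "Launder" a value: the compiler must assume x may
         |          have changed, so it cannot constant-fold it or
         |          turn it back into a branch. */
         |       #define LAUNDER(x) __asm__("" : "+r"(x))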
        
           | saagarjha wrote:
           | Why would you like register and auto to have meaning?
        
             | thyristan wrote:
             | Because for timing-sensitive code, those are important. If
             | a variable is really a register, cache-based timing attacks
             | just don't happen, because there is no cache in between.
        
           | delta_p_delta_x wrote:
           | > want fancy features like autocomplete or structs and
           | functions
           | 
           | I would argue that given a certain ISA, it's probably easier
           | to write an autocomplete extension for assembly targeting
           | that ISA, rather than autocomplete for C, or goodness forbid,
           | C++.
           | 
           | Likewise for structs, functions, jump targets, etc. One could
           | probably set up snippets corresponding to different sorts of
           | conditional execution--loops, if/else/while, switch, etc.
        
         | GhosT078 wrote:
         | SPARK is a very expressive language for implementing
         | cryptographic applications. It is available for some LLVM
         | targets (e.g. x86-64).
        
       | zzo38computer wrote:
       | I think __builtin_ct_select and __builtin_ct_expr would be good
       | ideas. (They could also be implemented in GCC in future, as well
       | as LLVM.)
       | 
       | In some cases it might be necessary to consider the possibility
       | of invalid memory accesses (and avoid the side-channels when
       | doing so). (The example given in the article works around this
       | issue, but I don't know if there are any situations where this
       | will not help.)
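       | 
       | A sketch of how the builtin might be used, assuming a
       | __builtin_ct_select(cond, a, b) shape that returns a when
       | cond is nonzero and b otherwise, lowered to branch-free code
       | (the exact signature is still being settled upstream):
       | 
       |       #include <stdint.h>
       |       
       |       /* Constant-time conditional swap of two limbs, the
       |          building block of e.g. a Montgomery ladder. */
       |       static void ct_cswap(uint64_t *a, uint64_t *b, int swap)
       |       {
       |           uint64_t x = *a, y = *b;
       |           *a = __builtin_ct_select(swap, y, x);
       |           *b = __builtin_ct_select(swap, x, y);
       |       }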
        
         | connicpu wrote:
         | The side channel from memory access timings is exactly why
         | cmov is its own instruction on x86_64. It retrieves the
         | memory regardless of the condition value; anything else
         | would change the timing based on the condition. If you're
         | going to segfault, that's going to be visible to an attacker
         | regardless, because you're going to crash.
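         | 
         | For reference, this is roughly the hand-rolled idiom
         | libraries use today to force a cmov (GCC/Clang inline asm,
         | x86_64 only; a sketch, with no architectural guarantee that
         | future silicon keeps cmov constant-time):
         | 
         |       #include <stdint.h>
         |       
         |       /* Returns a if cond is nonzero, else b, via cmov, so
         |          the selection never becomes a branch. */
         |       static uint64_t ct_select_u64(uint64_t cond,
         |                                     uint64_t a, uint64_t b)
         |       {
         |           uint64_t r = b;
         |           __asm__("test %1, %1\n\t"
         |                   "cmovne %2, %0"
         |                   : "+r"(r)
         |                   : "r"(cond), "r"(a)
         |                   : "cc");
         |           return r;
         |       }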
        
           | zzo38computer wrote:
           | I mean the possibility that the rest of the program
           | guarantees that the address is valid if the condition is true
           | but otherwise it might be valid or invalid. This is probably
           | not important for most applications, but I don't know if
           | there are some unusual ones where it would matter.
        
           | wahern wrote:
           | AFAIU, cmov wasn't originally intended to be a guaranteed
           | constant-time operation, and Intel and AMD won't commit to
           | keeping it constant-time in the future. But it just so
           | happened that at one point it was implemented in constant
           | time across CPUs, cryptographers picked up on this and began
           | using it, and now Intel and AMD tacitly recognize this
           | dependency. See, e.g., https://www.intel.com/content/www/us/e
           | n/developer/articles/t...
           | 
           | > The CMOVcc instruction runs in time independent of its
           | arguments in all current x86 architecture processors. This
           | includes variants that load from memory. The load is
           | performed before the condition is tested. Future versions of
           | the architecture may introduce new addressing modes that do
           | not exhibit this property.
        
             | adrian_b wrote:
             | At your link there is a link to the list of instructions
             | that guarantee constant execution time, independent of the
             | operands.
             | 
             | The list includes CMOV.
             | 
             | However, the instructions from the list are guaranteed to
             | have constant execution time, even on any future CPUs, only
             | if the operating system sets a certain CPU control bit.
             | 
             | So on recent and future Intel/AMD CPUs, one may need to
             | verify that the correct choice has been made between secure
             | execution mode and fastest execution mode.
        
         | codedokode wrote:
         | The names are pretty unattractive (which is not uncommon for
         | C), to the point that one would want to avoid them in code.
        
           | anematode wrote:
           | The solution is always another macro :)
           | 
           | #define ct_select __builtin_ct_select
        
       | ethin wrote:
       | So this makes me curious: is there a reason we don't do something
       | like a __builtin_ct_begin()/__builtin_ct_end() set of intrinsics?
       | Where the begin intrinsic begins a constant-time code region, and
       | all code within that region must be constant-time, and that
       | region must be ended with an end() call? I'm not too familiar
       | with compiler intrinsics or how these things work so thought I'd
       | ask. The intrinsic could be scoped such that the compiler can use
       | its implementation-defined behavior freedom to enforce the
       | begin/end pairs. But Idk, maybe this isn't feasible?
        
         | zzo38computer wrote:
         | Maybe it would be better implemented as a function attribute
         | instead
        
           | ethin wrote:
           | Or a pragma? Like how OpenMP did it?
        
         | grogers wrote:
         | It'd be very hard for the compiler to enforce constant-time
         | execution for generic code. As an example, if you wrote the
         | naive password check that returns false at the first byte
         | that doesn't match (see the sketch below), is that a compiler
         | error if it can't transform it into a constant-time version?
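         | 
         | Something like this, say; a hypothetical constant-time
         | region could only reject it, since the early return is the
         | function's whole meaning:
         | 
         |       #include <stddef.h>
         |       
         |       int naive_check(const char *guess, const char *real,
         |                       size_t n)
         |       {
         |           for (size_t i = 0; i < n; i++)
         |               if (guess[i] != real[i])
         |                   return 0; /* early exit leaks position */
         |           return 1;
         |       }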
        
       | amluto wrote:
       | Too bad that Intel chips more or less reserve the right to take
       | LLVM's nice output and make it non-constant-time anyway. See:
       | 
       | https://www.intel.com/content/www/us/en/developer/articles/t...
       | 
       | Sure, you could run on some hypothetical OS that supports DOITM
       | and insert syscalls around every manipulation of secret data.
       | Yeah, right.
        
         | JoshTriplett wrote:
         | Last I saw, it seemed like the plan was to unconditionally
         | _enable_ it, and on the off chance there's ever a piece of
         | hardware where it's a substantial performance win, offer a way
         | to opt _out_ of it.
        
           | amluto wrote:
           | I advocated for that, and then I completely lost track of the
           | status.
           | 
           | The whole design is ridiculous.
        
             | matu3ba wrote:
               | What would be a more sane alternative, when it becomes
               | obvious that any timing side-effect is a potential
               | attack vector? See https://www.hertzbleed.com/ for
               | frequency side channels. I only see dedicated security
               | cores with fast data lanes to the CPU as an option,
               | similar to what Apple is doing with the Secure Enclave.
               | Or do you have better suggestions that still allow
               | performance and power savings?
        
               | amluto wrote:
               | A design such that it would actually make sense for a
               | compiler to mark code that should permit data-dependent
               | CPU optimizations differently from code that should not.
               | 
               | This could be done using an opcode prefix, which would
               | bloat code but would work perfectly. Or it could use an
               | RFLAGS bit or a bit in MXCSR or a new register, etc.
               | 
               | Almost anything would be better than an MSR that is only
               | accessible to privileged code.
        
               | cesarb wrote:
               | > Or it could use an RFLAGS bit or a bit in MXCSR or a
               | new register, etc.
               | 
               | > Almost anything would be better than an MSR that is
               | only accessible to privileged code.
               | 
               | ARM does that: their flag (DIT) is accessible by non-
               | privileged code. If you know the architecture has that
               | flag, either because your -march= is recent enough or
               | because the operating system told you so through the
               | hwcaps or the emulated id registers, you can use it
               | freely without needing to switch to privileged mode
               | through a syscall.
        
               | JoshTriplett wrote:
               | The one reason I can _imagine_ to make it privileged-only
               | is that it could be high-overhead to switch: _if_ a CPU
               | conditioned various kinds of predictors on it, it might
               | have to flush those predictors when toggling it.
        
               | amluto wrote:
               | IMO the best design would be to keep the flag with the
               | data. Give each register an extra bit indicating whether
               | it's sensitive. Any data-dependent-timing operation can't
               | possibly leak the data until the data is available to it,
               | and that's exactly when the ALU would find out that the
               | data is sensitive anyway. No pipeline stalls.
        
         | stingraycharles wrote:
         | Sorry, I may be missing the point here, but reading that page
         | doesn't immediately make it obvious to me what that feature is.
         | Is it some constant time execution mechanism that you can
         | enable / disable on a per-thread basis to do... what exactly?
        
           | wat10000 wrote:
           | It turns off CPU features that could cause execution time to
           | vary in a way that depends on the data being operated on.
        
           | seanhunter wrote:
           | As a concrete example, say I have a (very naive and bad)
           | password-checker that works like this pseudocode:
           | > for i = 1 to len(real_password) {
           | >   if entered_password[i] != real_password[i] {
           | >     return FAILURE
           | >   }
           | > }
           | >
           | > return SUCCESS
           | 
           | OK now an alert attacker with the ability to very accurately
           | record the time it takes to check the password can determine
           | at least the length of the real password, because the time
           | complexity of this check is O(length of the real password),
           | and they could also gradually determine the password itself
           | because the check would take longer as the attacker got each
           | successive character correct.
           | 
           | Taking this general idea and expanding it, there are lots of
           | places where the timing of branches of code can leak
           | information about some secret, so in cryptographic code in
           | particular, it's often beneficial to be able to ensure that
           | two branches (the success and failure branches in the above)
           | take exactly the same amount of time so the timing doesn't
           | leak information. So to fix the above you would probably
           | want to do two things: firstly, set a boolean to failure
           | and still continue the checking, so the "return failure
           | quickly" problem doesn't leak information; and secondly,
           | change your password check to compare against a fixed-width
           | hash or something, so the length of the password itself
           | isn't a factor.
           | 
           | The problem is lots of performance optimizations (pipelining,
           | branch prediction, etc.) work specifically against this goal:
           | they aim to take branches quickly in the happy path of the
           | code because normally that's what you want to ensure optimal
           | performance.
           | 
           | So say instead of the above I do
           | 
           | > bool status = SUCCESS
           | > for i = 1 to hash_length {
           | >   if hash_of_entered_password[i] != hash_of_real_password[i] {
           | >     status = FAILURE
           | >   }
           | > }
           | >
           | > return status
           | 
           | ...I don't want the optimizer to realize that when status
           | becomes FAILURE it can never become SUCCESS again and the
           | loop doesn't do anything else so just return early. I want it
           | to actually run the pointless comparison of the rest of the
           | hash so the timing is exactly the same each time.
           | 
           | But now my check is constant time but I've shifted the burden
           | onto the person who writes the hash function. That has to run
           | in constant time or my check will once again leak. So in
           | general people want the ability to tell the compiler that
           | they want a particular piece of code to run in constant time.
           | At the moment, in the general case I think you have to break
           | into inline assembly to achieve this.
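           | 
           | In real C, the comparison itself is usually written
           | branch-free in the first place (similar in spirit to
           | libsodium's sodium_memcmp), so there is no early exit for
           | the optimizer to rediscover; a sketch:
           | 
           |       #include <stddef.h>
           |       
           |       /* Accumulate all byte differences; diff is 0 iff
           |          every byte matched. */
           |       int ct_equal(const unsigned char *a,
           |                    const unsigned char *b, size_t n)
           |       {
           |           unsigned char diff = 0;
           |           for (size_t i = 0; i < n; i++)
           |               diff |= a[i] ^ b[i];
           |           return diff == 0;
           |       }
           | 
           | But even this is just a convention compilers happen to
           | respect today, which is exactly the fragility the new
           | builtins are meant to remove.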
        
             | rkagerer wrote:
             | Why not just always spin until a fixed number of ticks (or
             | microseconds for slewing clocks) have passed (starting from
             | function entry), prior to returning?
             | 
             | Obviously this doesn't mitigate power usage side channel
             | attacks, but that's not the point here.
             | 
             | It's time-bound, so let's check _time_.
        
               | littlestymaar wrote:
               | Wouldn't that be much less efficient?
        
               | seanhunter wrote:
               | You could totally do that, but in the exact same way as
               | the above, you'd want the compiler not to optimize your
               | spinlock away thinking it wasn't needed. My understanding
               | is in lots of real applications, the asm code that I
               | mentioned is in part making sure it waits specific
               | numbers of clocks in each branch to ensure they all
               | exactly balance.
        
             | codedokode wrote:
             | By the way, for small arrays removing branches also
             | improves performance (according to my tests). So if you do
             | a constant-time comparison like this:
             | match = True
             | for a, b in pad(entered_pass, real_pass):
             |     match = match and a == b
             | return match
             | 
             | Then it will be faster as well. I was surprised to find
             | this.
        
       | charcircuit wrote:
       | These are meaningless without guarantees that the processor will
       | run the instructions in constant time and not run the code as
       | fast as possible. Claims like cmov on x86 always being constant
       | time are dangerous because a microcode update could change that
       | to not be the case anymore. Programmers want an actual guarantee
       | that the code will take the same amount of time.
       | 
       | We should be asking our CPU vendors to support enabling a
       | constant time mode of some sort for sensitive operations.
        
         | Gibbon1 wrote:
         | That's been one of my counters to the bitching that C isn't
         | safe. The underlying architecture isn't safe.
         | 
         | That said WG21 and WG14 don't seem to be able to get the memo
         | that safety is more important than single-core speed. Or, as
         | I suspect, a bunch of members are actually malicious.
        
         | derangedHorse wrote:
         | I agree. For use cases where side channel attacks are likely to
         | be attempted, the security of the system ultimately depends on
         | both the software and hardware used.
        
         | adrian_b wrote:
         | Nowadays both Intel/AMD CPUs and Arm-based CPUs guarantee that
         | a certain subset of the instructions are executed in constant
         | time.
         | 
         | For an example of a list of such instructions see:
         | 
         | https://www.intel.com/content/www/us/en/developer/articles/t...
         | 
         | However, cooperation from the operating system is necessary, as
         | the constant-time execution mode may need to be enabled by
         | setting certain CPU-control bits in protected registers (e.g.
         | IA32_UARCH_MISC_CTL[DOITM]).
         | 
         | See for instance:
         | 
         | https://www.intel.com/content/www/us/en/developer/articles/t...
         | 
         | CMOV is on the list of instructions with constant-time
         | execution, but the list is valid only with the corresponding
         | control bit set correctly.
        
           | cesarb wrote:
           | > However, cooperation from the operating system is
           | necessary, as the constant-time execution mode may need to be
           | enabled by setting certain CPU-control bits in protected
           | registers (e.g. IA32_UARCH_MISC_CTL[DOITM]).
           | 
           | The way ARM does this is way better, since it doesn't need
           | help from the operating system: user-space can directly set
           | and clear the DIT bit. Operating system cooperation is
           | necessary only to know whether that bit exists (because the
           | ID registers are not directly readable by user mode).
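           | 
           | Concretely, toggling it is a single unprivileged
           | instruction; a sketch for AArch64 GCC/Clang, assuming the
           | file is built with -march=armv8.4-a or later and that
           | real code has already checked HWCAP_DIT:
           | 
           |       static inline void set_dit(int on)
           |       {
           |           if (on)
           |               __asm__ volatile("msr dit, #1");
           |           else
           |               __asm__ volatile("msr dit, #0");
           |       }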
        
       | rurban wrote:
       | Or just disable the fucking broken optimizer for such a function.
       | #pragma GCC optimize ("O0")
        
         | adrian_b wrote:
         | Disabling optimizations does not necessarily result in more
         | deterministic execution.
         | 
         | With "-O0", the generated code normally retains a huge number
         | of useless register loads and stores, which lead to non-
         | deterministic timing due to contention in the use of caches and
         | of the main memory interface. Optimized code may run only
         | inside registers, being thus executed in constant time
         | regardless of what other CPU cores do.
         | 
         | The only good part is that this non-deterministic timing will
         | not normally depend on the data values. The main danger of the
         | non-constant execution time is when this time depends on the
         | values of the processed data, which provides information about
         | those values.
         | 
         | There are cases when disabling optimization may cause data-
         | dependent timing, e.g. if with optimization the compiler would
         | have chosen a conditional move and without optimization it
         | chooses a data-dependent branch.
         | 
         | The only certain way of achieving data-independent timing is to
         | use either assembly language or appropriate compiler
         | intrinsics.
        
       | anematode wrote:
       | I'd love it if this could make it into Rust... but I'm wondering
       | it'd be a bit of a burden on the creators of alternative backends
       | (e.g. Cranelift), since if they implemented it naively, they
       | would be unsuitable for compiling cryptographic code using it.
        
         | aw1621107 wrote:
         | > but I'm wondering if it'd be a bit of a burden on the
         | creators of alternative backends (e.g. Cranelift)
         | 
         | Technically _any_ new feature that requires backend support is
         | an additional burden on backend devs. There's nothing special
         | about constant-time builtins in this respect.
         | 
         | > since if they implemented it naively
         | 
         | Strictly speaking, whether an implementation is naive is
         | independent of whether it is correct. An implementation that
         | purports to be constant time while not actually being constant
         | time is wrong, no matter how naive or sophisticated the
         | implementation may be.
        
       ___________________________________________________________________
       (page generated 2025-11-26 23:01 UTC)