[HN Gopher] Intel to disable TSX by default on more CPUs with ne...
       ___________________________________________________________________
        
       Intel to disable TSX by default on more CPUs with new microcode
        
       Author : pella
       Score  : 101 points
       Date   : 2021-06-28 17:36 UTC (5 hours ago)
        
 (HTM) web link (www.phoronix.com)
 (TXT) w3m dump (www.phoronix.com)
        
       | tyingq wrote:
       | Wikipedia says:
       | 
       | _"According to different benchmarks, TSX/TSX-NI can provide
       | around 40% faster applications execution in specific workloads,
       | and 4-5 times more database transactions per second (TPS)"_[1]
       | 
       | That sounds like a pretty big deal to turn off. Though it seems
       | to be based on microbenchmarks. I wonder what the real world
       | impact for a typical RDBMS is.
       | 
       | [1]
       | https://en.wikipedia.org/wiki/Transactional_Synchronization_...
        
         | rodgerd wrote:
         | Attempting to do transactional memory (and, I've been told, the
         | 1.2 kW power requirement!) was what sunk Sun's Rock processors.
        
         | uniqueuid wrote:
         | From past benchmarks of Spectre and Meltdown mitigations[1], we
         | know that synchronization primitives and context switches were
         | especially impacted. The same seems to be the case here, so I'd
         | expect RDBMS to be heavily affected unless they attempt to
         | specifically circumvent the deactivation.
         | 
         | Curiously, postgres seems to perform okay (<10% drop) two years
         | after the initial mitigations - perhaps due to newer versions
         | of kernel-level fixes having less of a performance impact.
         | 
         | [edit] According to phoronix, the small penalty is mostly due
         | to hardware-based mitigations in newer architectures.
         | 
         | [1]
         | https://www.phoronix.com/scan.php?page=article&item=spectre-...
        
         | Palomides wrote:
         | I don't think any major DBs use TSX
        
           | tyingq wrote:
           | It shows up in glibc a couple of times around pthreads, if
           | you search around for "elision", "RTM", "HLE", etc.
        
             | rrss wrote:
             | yes, glibc tried to use TSX in pthreads, and it was a
             | disaster and all distros patched glibc to not use it.
             | eventually upstream glibc turned it off by default too.
        
               | aj3 wrote:
               | Why was it a disaster? Is it buggy or what?
        
               | rrss wrote:
               | At first glibc's use of TSX/RTM for mutexes was behind a
               | compile-time opt-in, and Intel was hyping up how great
               | hardware transactional memory was, so most distros
               | flipped the switch to enable it. But glibc's use of
               | TSX/RTM for rwlocks was not behind a compile-time guard,
               | and was used whenever CPUID said it was enabled.
               | 
               | I think the first issue was that pthread locks using TSX
               | behaved differently for incorrect usage, so that
               | applications that had double-unlock bugs worked fine with
               | the normal implementation but crashed with TSX glibc.
               | These were app bugs, but a lot of people were unhappy
               | that a glibc update caused their applications to crash,
               | so some distros patched glibc to make time to fix the
               | apps.
               | 
               | Then TSX was broken in Haswell and Intel disabled it, but
               | it was a total mess, such that the bit in CPUID
               | indicating support for TSX/RTM did not reliably indicate
               | support for TSX/RTM (i.e. bit was set indicating support,
               | but any TSX/RTM instruction resulted in SIGILL), so
               | various versions of glibc ended up with various lists of
               | CPU model numbers to determine when to avoid using TSX.
               | 
               | repeat for broadwell.
               | 
               | At this point more distros decided to turn off TSX in
               | glibc entirely. But most did this by just removing the
               | compile-time opt-in, which did not do anything for
               | rwlocks. A couple distros noticed that rwlocks were still
               | using TSX, and patched that too to remove lock elision
               | entirely - but many did not, so rwlocks on haswell or
               | early broadwell would still cause issues.
               | 
               | In my experience, a lot of the time RTM was not useful
               | for performance because the transactions would be too
               | large and abort. It was worse because in these cases
               | glibc would try the transaction again several times
               | before giving up, which could cause massive performance
               | hits compared to just doing the normal lock to start
               | with.
               | 
               | There were errata about TSX in skylake too, but I can't
               | remember if Intel actually turned RTM off with microcode
               | before now or just left it as is because no one was using
               | it.
               | 
               | Eventually glibc was changed to always compile in support
               | for TSX but require a runtime opt-in using an environment
               | variable.
               | 
               | (this is from memory, the timeline is probably not quite
               | right)
               | 
               | EDIT: I had a bit of deja-vu writing this, and it turns
               | out I have whined about this on HN before lol:
               | https://news.ycombinator.com/item?id=22694546
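
A back-of-the-envelope model of the retry cost described in the comment above: if a transaction is too large it aborts every time, and retrying it several times before taking the lock only adds wasted work. The cycle counts, the abort probability, and the retry budget below are made-up illustrative parameters, not measurements of glibc or of any CPU.

```python
def expected_cost(p_abort, retries, txn_cost, lock_cost):
    """Expected cost of one critical section under elide-then-retry.

    Simplifying assumptions (not from the thread): every attempt costs
    txn_cost whether it commits or aborts, and after `retries` aborted
    attempts the code falls back to the real lock and pays lock_cost.
    """
    cost = 0.0
    p_reach = 1.0  # probability that execution gets as far as this attempt
    for _ in range(retries):
        cost += p_reach * txn_cost
        p_reach *= p_abort
    # p_reach is now the probability that every attempt aborted
    cost += p_reach * lock_cost
    return cost


# A transaction that never fits: three wasted attempts, then the lock anyway.
always_aborts = expected_cost(p_abort=1.0, retries=3, txn_cost=40, lock_cost=100)
# A transaction that always commits on the first try.
never_aborts = expected_cost(p_abort=0.0, retries=3, txn_cost=40, lock_cost=100)
```

With these invented numbers the always-aborting case costs more than twice as much as simply taking the lock, which is the shape of the problem the comment describes.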
        
               | aj3 wrote:
               | Thank you, this is very insightful.
        
               | topspin wrote:
               | Early TSX hardware from Intel had unfixable flaws. The
               | problems were fixed in later CPUs but that left the
               | burden of detection on developers. Then along came side
               | channel attacks and TSX became a vector for that. Plus
               | it's Intel only, so you have to use another approach on
               | AMD, which has completely distinct instructions, and this
               | is an area with extreme subtleties where you really don't
               | want or can't afford to deal with multiple
               | implementations.
               | 
               | In other words it's been a total shit show. It doesn't
               | have to be, but until Intel takes it seriously and
               | produces an unflawed and secure realization it's going to
               | continue to be a shit show. Even then you're still left
               | with the Intel != AMD problem...
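
For the "burden of detection" mentioned above, a minimal sketch of the first check a developer on Linux might do: look for the `rtm` and `hle` feature flags in `/proc/cpuinfo`. The helper names are hypothetical. As the thread points out, an advertised flag did not reliably mean working TSX on some parts, so this is a necessary check rather than a sufficient one.

```python
def tsx_flags(cpuinfo_text):
    """Report whether the first 'flags' line advertises RTM and HLE."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            flags = set(value.split())
            return {"rtm": "rtm" in flags, "hle": "hle" in flags}
    # No flags line at all (for example, a non-x86 machine).
    return {"rtm": False, "hle": False}


def read_tsx_flags(path="/proc/cpuinfo"):
    """Convenience wrapper for a live Linux system."""
    with open(path) as f:
        return tsx_flags(f.read())


# Hand-written sample in the /proc/cpuinfo layout, for illustration only.
sample = "processor\t: 0\nflags\t\t: fpu vme sse2 hle avx2 rtm\n"
result = tsx_flags(sample)
```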
        
               | patrakov wrote:
               | A lot of applications were subtly buggy (unlocked an
               | already-unlocked mutex sometimes). Unlike the previous
               | implementation, with TSX, unlocking an already unlocked
               | mutex is an instant crash. And the applications were
               | buggy exactly because nothing bad happened before due to
               | this bug, and there were no widespread-enough tools that
               | would have helped recognize this as an application
               | problem during development.
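
A toy model of why a double unlock that used to go unnoticed becomes a crash under lock elision. It rests on an assumption about the mechanism: the elided unlock path commits the transaction whenever it finds the lock word free, and committing with no transaction in flight faults. The classes below are illustrative and single-threaded; they are not glibc's code.

```python
class Crash(Exception):
    """Stands in for the fault raised by a commit outside a transaction."""


class PlainMutex:
    def __init__(self):
        self.locked = False

    def lock(self):
        self.locked = True

    def unlock(self):
        # A non-checking mutex just clears the lock word, so a second
        # unlock is silently accepted.
        self.locked = False


class ElidedMutex:
    def __init__(self):
        self.locked = False          # the lock word in memory
        self.in_transaction = False  # really CPU state; kept here for the toy

    def lock(self):
        # Elision: start a transaction and leave the lock word untouched.
        self.in_transaction = True

    def unlock(self):
        if self.locked:
            self.locked = False      # the lock was really taken: release it
            return
        # Lock word is free, so assume it was elided and commit.
        if not self.in_transaction:
            raise Crash("commit with no transaction in flight")
        self.in_transaction = False
```

Calling `unlock()` twice on `PlainMutex` does nothing visible, while the second `unlock()` on `ElidedMutex` raises `Crash`.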
        
         | monocasa wrote:
         | I really liked Cliff Click's talk on why transactional memory
         | didn't live up to its promises (at least for accelerating
         | general purpose locking of Java code not written specifically
         | for transactional memory).
         | 
         | Cliff Click -- The Azul Hardware Transactional Memory
         | experience
         | 
         | https://www.youtube.com/watch?v=GEkeOHw87Sg
         | 
         | TL;DW: Transactional memory sort of promised lock elision to
         | speed up gratuitously lock-heavy code, particularly for
         | read-heavy workloads. Think of a caching hash map that only has
         | the occasional writer. It didn't live up to that, since the max
         | transaction size is really easy to overflow (far easier than
         | you may think), and general code has a nasty habit of writing
         | metrics even in read cases. That means transactions all write
         | to the same address, which means constant transaction
         | conflicts, which means slower code than just using a lock.
         | 
         | Cliff Click then argues that the right move in the vast
         | majority of cases, if you're having lock contention, is to
         | rewrite the algorithm so that threads (or at least cores)
         | communicate via messages and share nothing, rather than trying
         | to use transactional memory. That way you can cleanly fork out
         | across a cluster instead of still being constrained to a
         | single box after the rewrite. For context, he comes to that
         | conclusion while working on massive 768-core boxes where the
         | value add is "we sell you a giant non-NUMA single box that
         | absolutely eats Java for breakfast, lunch, and dinner".
         | 
         | Even if you read the TL;DW, I still encourage you to listen to
         | his talk; he's an incredibly smart engineer and I learn
         | something pretty much every time I listen to him.
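
The "metrics in the read path" point from the summary above can be sketched with read and write sets. This is a simplified conflict model, not a description of any real hardware: two transactions are assumed to conflict if either one writes a location the other reads or writes. The `lookup` helper and the `hit_counter` location are hypothetical.

```python
from itertools import combinations


def conflicts(a, b):
    """True if either transaction writes something the other touches."""
    a_reads, a_writes = a
    b_reads, b_writes = b
    return bool(a_writes & (b_reads | b_writes)) or bool(b_writes & a_reads)


def count_conflicts(txns):
    """Number of conflicting pairs among concurrently running transactions."""
    return sum(conflicts(a, b) for a, b in combinations(txns, 2))


def lookup(key, with_metrics):
    """Read/write sets of a hypothetical cache lookup for `key`."""
    reads = {("bucket", key)}
    # The 'harmless' statistics update turns a pure read into a write
    # to one location shared by every lookup.
    writes = {"hit_counter"} if with_metrics else set()
    return reads, writes


pure_reads = [lookup(k, with_metrics=False) for k in range(4)]
with_counter = [lookup(k, with_metrics=True) for k in range(4)]
```

Four lookups on four different keys produce no conflicting pairs when they only read, and all six possible pairs conflict once each lookup also bumps the shared counter.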
        
           | rayiner wrote:
           | Cliff Click, aside from being a brilliant programmer, is a
           | great technical communicator. He came out of a fantastic
           | department at Rice that's produced some of the most readable
           | academic papers on compilers and VMs I've encountered.
        
         | cpleppert wrote:
         | The research papers linked in wikipedia don't really support
         | what wikipedia claims even though they are very favorable to
         | TSX. In one paper [1] the only way they could demonstrate a
         | performance improvement in a real world benchmark was changing
         | the underlying lock strategy entirely. The rest are just
         | synthetic benchmarks that transactional memory can look good on
         | due to the lack of actual contention but which required lots of
         | changes to take advantage of. In [2] they used what appears to
         | be a very naive b-tree implementation and an index tree
         | combined with a dictionary. Even though this should probably be
         | a great use case for transactional memory they once again have to
         | make major changes to the implementation.
         | 
         | In a real database, you will run into transaction aborts much
         | more frequently to say nothing of the correctness concerns
         | raised by others in this thread.
         | 
         | [1]: https://web.archive.org/web/20161110144922/http://pcl.intel-...
         | [2]: Improving In-Memory Database Index Performance
        
         | alfalfasprout wrote:
         | I've wanted to use TSX for years... it's great as a way to get
         | atomicity for non-primitive types in a lock-free way. Could be
         | super useful in trading.
         | 
         | The issue is it's been YEARS now that intel claims it's ready
         | then turns it off in chips shortly thereafter. It's basically
         | vaporware at this point.
        
           | Syonyk wrote:
           | I spent a week or so a couple years back messing around with
           | TSX as well to see if I could improve some performance of
           | ringbuffers without having to do some more complex locking.
           | It would have required a lot of rework, and someone else on
           | the project pointed out that no TSX implementation seemed to
           | survive extended contact with reality without being disabled.
           | 
           | So I just wrote the more complex reader/writer lock system
           | and went on my way. Turns out to have been the right
           | answer...
           | 
           | At this point, TSX is like a new Google product launch -
           | "This, too, shall pass." I don't know why anyone would bother
           | spending much time with it after all these generations of
           | "TSX! Wait, no... uh..."
        
           | ploxiln wrote:
           | I've disliked the idea of TSX since the beginning. Having
           | studied some circuit design, computer architecture, and
           | operating systems in college, though I work in user-space
           | software now I do like to understand how all the layers
           | beneath me work. TSX has always given me the heebie-jeebies.
           | Sure it might give great performance uplift if you have a
           | team of skilled scientists and engineers and servers
           | dedicated to a single application. But for general computing,
           | it's really crazy. Good for academic papers, good for
           | marketing to the director level, bad for general engineering
           | work.
           | 
           | It turns out that it doesn't really work. Surprise! not
           | really
        
             | lallysingh wrote:
             | Why? The implementation sounded simple: bus snoop on
             | addresses used by the load unit, invalidate if anyone
             | writes to those lines.
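
The scheme described in the comment above, tracking the lines a transaction has loaded and aborting when someone else writes to one of them, can be sketched in a few lines. The 64-byte line size is an assumed typical value and the class is a toy. It also shows one of the tricky parts: tracking happens per cache line, so a write to a different variable on the same line still aborts the transaction.

```python
LINE_SIZE = 64  # bytes per cache line; an assumed, typical value


class ReadSetTracker:
    """Toy read-set tracker working at cache-line granularity."""

    def __init__(self):
        self.lines = set()
        self.aborted = False

    def load(self, addr):
        # Remember the line, not the exact address.
        self.lines.add(addr // LINE_SIZE)

    def snoop_write(self, addr):
        # Another core wrote to addr: abort if it hits a tracked line.
        if addr // LINE_SIZE in self.lines:
            self.aborted = True


txn = ReadSetTracker()
txn.load(0x1000)
txn.snoop_write(0x2000)          # unrelated line, transaction survives
survived_unrelated = not txn.aborted
txn.snoop_write(0x1008)          # different address, same 64-byte line
```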
        
             | alfalfasprout wrote:
             | I'm not sure TSX was really intended to be used everywhere
             | though. Admittedly the glibc team tried to use it in
             | pthreads and it was a disaster (a lot of code broke and of
             | course it kept getting disabled in CPUs).
             | 
             | Their issue IMO was not really scoping down TSX... the
             | project was far too ambitious.
        
               | ploxiln wrote:
               | Intel employees added it to glibc. I think Intel wanted
               | to show everyone "don't be scared, silly, you're already
               | using it!"
               | 
               | https://lwn.net/Articles/534761/
               | http://halobates.de/adding-lock-elision-to-linux.pdf
        
           | alfalfasprout wrote:
           | I agree, the implementation isn't great. The problem IMO is
           | that it tried to do way too much and be way too general which
           | leads to a bunch of tricky edge cases they need to handle
           | (and significantly complicates uses of caches).
           | 
           | That said, hardware support for atomic use of larger memory
           | regions than is currently supported is something that is
           | worthy as a problem to solve.
        
         | profmonocle wrote:
         | Yikes, this is one time I _really hope_ benchmarks don't
         | reflect reality. Imagine you have a DB server that typically
         | peaks around 30% CPU. You reboot one night to install patches,
         | the next peak the server is suddenly completely maxed out due
         | to this performance hit, causing application timeouts. Your
         | update logs might not even help if you're on a cloud VM, since
         | microcode updates are handled by the hypervisor.
        
           | ikiris wrote:
           | To be blunt, if you're doing cloud VMs that way, you set
           | yourself up for failure. The whole point of cloud is
           | autoscale distributed workflows.
        
           | astrange wrote:
           | My understanding is that literally no one uses TSX except for
           | possibly one PS3 emulator.
        
             | monocasa wrote:
             | I wouldn't be surprised if a few more emulators use it too.
             | It's the cleanest way to emulate ll/sc style archs on x86.
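
One plausible reading of the ll/sc remark above, as a toy model: a store-conditional must fail if anything was stored to the location since the load-linked, even if the old value has been written back. A plain compare-and-swap only compares values and cannot see that. The `Memory` class and its version counter are illustrative assumptions, not how any emulator is implemented.

```python
class Memory:
    """Single-word memory with a toy LL/SC reservation (single-threaded)."""

    def __init__(self, value):
        self.value = value
        self.version = 0  # bumped on every store

    def store(self, value):
        self.value = value
        self.version += 1

    def load_linked(self):
        # The version number plays the role of the reservation.
        return self.value, self.version

    def store_conditional(self, reservation, value):
        if reservation != self.version:
            return False  # something was stored in between
        self.store(value)
        return True

    def compare_and_swap(self, expected, value):
        if self.value != expected:
            return False
        self.store(value)
        return True


mem = Memory(1)
seen, reservation = mem.load_linked()
mem.store(2)  # another agent changes the word...
mem.store(1)  # ...and then puts the old value back
sc_ok = mem.store_conditional(reservation, 5)
cas_ok = mem.compare_and_swap(seen, 5)
```

Here the store-conditional fails while the compare-and-swap succeeds. A hardware transaction that aborts on any intervening write behaves like the former.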
        
       | no_time wrote:
       | sudo apt-mark hold intel-microcode
       | 
       | I will reconsider once a PoC comes out that affects either nginx
       | or sshd.
        
         | [deleted]
        
       | syntaxing wrote:
       | Does this affect 4th gen i-series? I noticed this past week that
       | my Mac M1 is almost twice as fast as my Intel desktop. Wasn't
       | sure if it's because of this.
        
       | hashhar wrote:
       | TSX has always been a correctness nightmare and nobody uses it
       | AFAIK. glibc also had some support for it but they disabled it by
       | default long ago due to correctness issues.
       | 
       | Do people even read past the headlines these days?
        
         | necheffa wrote:
         | I actually dealt with TSX issues just a couple years ago on a
         | supported and patched SUSE cluster.
        
       | jnwatson wrote:
       | Is Intel planning on fixing TSX in future CPUs?
        
         | necheffa wrote:
         | Sort of. Sapphire Rapids is supposed to include new
         | instructions that fix it by extending TSX.
         | 
         | But I doubt we'll see microcode patches that fix older CPUs. I
         | have a Haswell where TSX was permanently disabled via microcode
         | and a Comet Lake that straight up came from the foundry without
         | TSX.
        
       | profmonocle wrote:
       | I wonder what the reaction would be if similar post-sale
       | downgrades happened in other industries? Imagine if a car company
       | issued a critical safety recall, and the fix cut the vehicle's
       | MPG in half. Would the manufacturer be forced to buy back the
       | vehicles? Issue partial refunds? Replace them with newer models
       | that perform similar to how the original was meant to?
       | 
       | (Not a perfect analogy I admit - opting out of a CPU security
       | mitigation typically isn't going to risk anyone's life. At least
       | not on _most_ servers.)
        
         | mhh__ wrote:
         | I have read (no proof but it was on a tech forum and seemed
         | simple enough to be real) that Intel _will_ refund at least
         | some customers (the wording wasn't clear whether it was
         | someone who bought a CPU or bought a HPC cluster...) if the CPU
         | was sold to you as having a feature and it's then disabled _and_
         | you can show them you used it.
        
           | r00fus wrote:
           | Those conditions sound onerous for the customer to provide.
        
             | mhh__ wrote:
             | Maybe, but it was the same with FDIV, from what I saw.
             | The bug was bad but actually bumping into it was quite
             | rare.
        
               | myself248 wrote:
               | Didn't that backfire? Having to prove that you used FDIV,
               | and that errors in the fourth-or-beyond decimal place
               | would actually affect you, was such bad press for Intel,
               | that they eventually offered replacements to anyone upon
               | request. As Wikipedia tells it:
               | 
               | > Intel's response to the FDIV bug has been cited as a
               | case of the public relations impact of a problem
               | eclipsing the practical impact of said problem on
               | customers. While most users were unlikely to encounter
               | the flaw in their day-to-day computing, the company's
               | initial reaction to not replace chips unless customers
               | could guarantee they were affected caused pushback from a
               | vocal minority of industry experts. The subsequent
               | publicity generated shook consumer confidence in the
               | CPUs, and led to a demand for action even from people
               | unlikely to be affected by the issue. Andrew Grove,
               | Intel's CEO at the time was quoted in the Wall Street
               | Journal as saying "I think the kernel of the issue we
               | missed [...] was that we presumed to tell somebody what
               | they should or shouldn't worry about, or should or
               | shouldn't do".
        
         | slavboj wrote:
         | That literally happened with the VW turbodiesels.
        
           | dev_tty01 wrote:
           | Big difference though. VW created the issue and then ran a
           | program to develop and implement a way to hide the problem.
           | Malicious intent. I don't think we can say the same for Intel
           | with a security bug that is uncovered after the sale. Now, if
           | someone could show Intel knew about Spectre or similar bugs
           | and hid them, then you are off to the races.
        
           | donalhunt wrote:
           | * in some markets. :(
        
           | schmichael wrote:
           | And the repercussions were substantial:
           | https://en.wikipedia.org/wiki/Volkswagen_emissions_scandal
           | 
           | (I'm not arguing equivalence between these two "post-consumer
           | downgrades" -- just pointing out there's not only prior art
           | for it occurring, but also prior art for
           | repercussions/compensation.)
        
           | monocasa wrote:
           | And they were subject to a $15B class action settlement
           | because of it in the US alone.
           | 
           | https://www.classaction.com/volkswagen/settlement/
        
           | jhenkens wrote:
           | First thing I thought of. I believe the trade off was getting
           | a warranty extension on the engine/emissions system.
        
             | gambiting wrote:
             | In some US states VW was forced to buy the vehicles back
             | from owners if they wanted to sell them. I know most people
             | jumped on the offer as VW had to pay a certain pre-agreed
             | price and it was higher than market value.
        
           | collsni wrote:
           | It absolutely did. I owned a VW for 3 years and it was bought
           | back from me for 3k more than I originally paid for it.
           | https://en.wikipedia.org/wiki/Volkswagen_emissions_scandal
        
       | NewNetNow wrote:
       | Good news. ARM's CCA [1] seems revolutionary in comparison.
       | 
       | [1] https://www.arm.com/why-arm/architecture/security-features/a...
        
         | zamadatix wrote:
         | CCA seems more like SGX than TSX
        
         | uniqueuid wrote:
         | From Wikipedia:
         | 
         | >Transactional Synchronization Extensions (TSX), also called
         | Transactional Synchronization Extensions New Instructions
         | (TSX-NI), is an extension to the x86 instruction set architecture
         | (ISA) that adds hardware transactional memory support, speeding
         | up execution of multi-threaded software through lock elision.
         | 
         | How is that related to CCA, which seems to be a security
         | feature?
        
       | anonymousiam wrote:
       | I read this article earlier this morning and I have one question
       | that I haven't found the answer for: Are the old Intel firmware
       | blobs still available somewhere, and can we choose to not load
       | this "upgrade"? If yes, how do we revert?
        
         | mjg59 wrote:
         | Unless your firmware vendor updates the microcode embedded in
         | your firmware and you update to that newer firmware, the new
         | microcode is loaded at runtime and won't persist over power
         | cycles. The old microcode is still available, yes.
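
Since the microcode may come either from the firmware or from a runtime load, it helps to check which revision is actually running. A minimal sketch for Linux on x86, assuming the `microcode` field that `/proc/cpuinfo` prints per logical CPU. The function name and the sample text are illustrative.

```python
def microcode_revisions(cpuinfo_text):
    """Return the set of microcode revisions reported across all CPUs."""
    revisions = set()
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "microcode":
            # The field is printed in hex, e.g. "0xde".
            revisions.add(int(value.strip(), 16))
    return revisions


# Hand-written sample in the /proc/cpuinfo layout, for illustration only.
sample = (
    "processor\t: 0\nmicrocode\t: 0xde\n\n"
    "processor\t: 1\nmicrocode\t: 0xde\n"
)
revs = microcode_revisions(sample)
```

On a live system the input would be the contents of `/proc/cpuinfo`. Comparing the result before and after a package update or a firmware flash shows whether the revision changed.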
        
           | anonymousiam wrote:
           | Yes, I understand that the upgraded firmware is loaded after
           | boot (unless burned into the MB ROM). What I'm looking for is
           | availability of the old firmware blobs. Intel probably has
           | copyright on them and can control their distribution.
        
             | michaellarabel wrote:
             | Yes, some former releases at least, via
             | https://github.com/intel/Intel-Linux-Processor-Microcode-Dat...
             | which is what I use during some of my past/present Linux
             | testing.
        
       | intricatedetail wrote:
       | Should such crippling warrant a recall and refund? Not sure how a
       | small company could get away with something like this, but for
       | Intel it is just a patch?
        
         | wmf wrote:
         | If you use TSX don't apply the update. If you never used TSX it
         | doesn't matter.
        
           | monocasa wrote:
           | It sounds like they're disabling it because it's
           | fundamentally broken on those microarchs, as they're citing
           | memory ordering issues. So it's still a case of Intel
           | admitting that their product as advertised is defective even
           | if you don't apply the update.
        
       | sheepdestroyer wrote:
       | All these performance neutering patches, made in the name of
       | 'security' that in certain situations has no value whatsoever,
       | should only be accepted if also conditioned by the "mitigations="
       | Kernel boot option.
        
         | jeffbee wrote:
         | To be clear, TSX does not work and has never worked. Anyone
         | depending on TSX "for performance" is actually getting
         | incorrectness.
        
         | PixelOfDeath wrote:
         | Why not go all the way and let everything run in ring 0? Then
         | Intel can prefetch beyond kernel entry calls again! And we can
         | also see how AMD will perform under the same conditions!
         | 
         | With Intel's wave of TSX-disabling patches going over all the
         | previous CPU generations, AMD apparently got it right by never
         | bringing their own version into production at all.
        
           | DaiPlusPlus wrote:
           | > Why not go all the way and let everything run in ring 0?
           | 
           | How would you get memory protection or debugger support then?
           | 
           | Or the very-likely use-case of rendering live (i.e.
           | untrusted) web-content in-game? That would require a modern
           | engine like Chromium which in-turn necessitates per-process
           | isolation.
           | 
           | Also, no-one wants to have to reboot their entire computer
           | system just because some application code crashed. Even if
           | preemptive scheduling and the MMU works in ring 0 all
           | processes in ring 0 can manipulate the MMU registers, so a
           | misbehaving process can still bring down the entire system.
           | Bad idea.
           | 
           | Even dedicated HPC applications need it - remember that the
           | HPC/scientific-computing/batch-job-processing model is the
           | whole reason that operating-systems exist today: they're
           | highly evolved successors to job-control programs (e.g.
           | Chippewa Operating System). So I don't think they'd want to
           | give up all the improvements of the past 50+ years.
           | 
           | Ultimately we're just arguing about the principle of how the
           | user who owns the computer should be in _ultimate control_
           | over the hardware - but unfortunately Apple has demonstrated
           | that principle is unnecessary to make obscene profits and
           | also runs contrary to the realities of running a large secure
           | platform today. We shall see how this ends-up...
        
           | alerighi wrote:
           | In theory, wouldn't it make sense to run all the code that runs
           | as the root user in ring zero?
           | 
           | The root user can still read and write the whole memory of
           | the system anyway via /dev/mem, load kernel modules, and do
           | practically anything on the system. In that context, a change
           | of privilege is useless in practice, and can lead to
           | performance degradation.
           | 
           | This would lead to performance benefits in most system daemons,
           | notably programs that interact closely with the kernel like
           | systemd.
        
           | Bancakes wrote:
           | Cool, like 6th/7th gen consoles used to be. Simple,
           | single-app OSs.
        
         | ikiris wrote:
         | Yeah it's much cheaper just to pay the ransomware and have your
         | data stolen. No one seems to care if your data is leaked
         | anyway.
        
           | vbezhenar wrote:
           | Are there any examples of real-world attacks that happened
           | because those mitigations were turned off?
        
         | alfalfasprout wrote:
         | Agreed. I'm all for mitigations when used somewhere where the
         | security could be an issue but for use cases where raw
         | computation and performance matters the most it should be easy
         | to disable this stuff
        
         | rrss wrote:
         | This isn't in the name of 'security.' TSX has been functionally
         | broken in every architecture that implements it.
         | 
         | From the article:
         | 
         | > A memory ordering issue is what is reportedly leading Intel
         | to now deprecate TSX on various processors.
         | 
         | And from the Intel document linked in the article:
         | 
         | > The default RTM force-abort behavior can be optionally
         | disabled by setting MSR bit TSX_FORCE_ABORT.SDV_ENABLE_RTM=1.
         | However, when RTM force abort is disabled in this way, RTM
         | usage may be subject to memory-ordering correctness issues. Due
         | to these issues, this unsupported mode should not be enabled
         | for production use.
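
A sketch of how the MSR named in the quoted Intel text could be inspected on Linux. The register and bit names come from the quote, but the MSR address (0x10F) and the bit positions used here are assumptions that should be checked against Intel's documentation before relying on them. Reading an MSR this way also assumes the `msr` kernel module is loaded and the process runs as root.

```python
import os
import struct

MSR_TSX_FORCE_ABORT = 0x10F  # assumed address; verify against Intel docs

# Assumed bit positions within TSX_FORCE_ABORT.
TFA_BITS = {
    "RTM_FORCE_ABORT": 0,
    "TSX_CPUID_CLEAR": 1,
    "SDV_ENABLE_RTM": 2,
}


def decode_tsx_force_abort(value):
    """Split a raw 64-bit MSR value into the named flag bits."""
    return {name: bool((value >> bit) & 1) for name, bit in TFA_BITS.items()}


def read_msr(msr, cpu=0):
    """Read one MSR through /dev/cpu/N/msr (needs root and the msr module)."""
    fd = os.open(f"/dev/cpu/{cpu}/msr", os.O_RDONLY)
    try:
        return struct.unpack("<Q", os.pread(fd, 8, msr))[0]
    finally:
        os.close(fd)


# Decoding a made-up raw value in which only the third bit is set.
decoded = decode_tsx_force_abort(0b100)
```

On a machine that exposes the register, `decode_tsx_force_abort(read_msr(MSR_TSX_FORCE_ABORT))` would show whether the unsupported mode from the quote has been switched on.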
        
       ___________________________________________________________________
       (page generated 2021-06-28 23:01 UTC)