[HN Gopher] Intel to disable TSX by default on more CPUs with ne...
___________________________________________________________________
Intel to disable TSX by default on more CPUs with new microcode
Author : pella
Score : 101 points
Date : 2021-06-28 17:36 UTC (5 hours ago)
(HTM) web link (www.phoronix.com)
(TXT) w3m dump (www.phoronix.com)
| tyingq wrote:
| Wikipedia says:
|
| _" According to different benchmarks, TSX/TSX-NI can provide
| around 40% faster applications execution in specific workloads,
| and 4-5 times more database transactions per second (TPS)"_[1]
|
| That sounds like a pretty big deal to turn off. Though it seems
| to be based on microbenchmarks. I wonder what the real world
| impact for a typical RDBMS is.
|
| [1]
| https://en.wikipedia.org/wiki/Transactional_Synchronization_...
| rodgerd wrote:
| Attempting to do transactional memory (and, I've been told, the
| 1.2 kW power requirement!) was what sank Sun's Rock processors.
| uniqueuid wrote:
| From past benchmarks of Spectre and Meltdown mitigations[1], we
| know that synchronization primitives and context switches were
| especially impacted. The same seems to be the case here, so I'd
| expect RDBMS to be heavily affected unless they attempt to
| specifically circumvent the deactivation.
|
| Curiously, postgres seems to perform okay (<10% drop) two years
| after the initial mitigations - perhaps due to newer versions
| of kernel-level fixes having less of a performance impact.
|
| [edit] According to phoronix, the small penalty is mostly due
| to hardware-based mitigations in newer architectures.
|
| [1]
| https://www.phoronix.com/scan.php?page=article&item=spectre-...
| Palomides wrote:
| I don't think any major DBs use TSX
| tyingq wrote:
| It shows up in glibc a couple of times around pthreads, if
| you search around for "elision", "RTM", "HLE", etc.
| rrss wrote:
| Yes, glibc tried to use TSX in pthreads, and it was a
| disaster; all distros patched glibc to not use it, and
| eventually upstream glibc turned it off by default too.
| aj3 wrote:
| Why was it a disaster? Is it buggy or what?
| rrss wrote:
| At first glibc's use of TSX/RTM for mutexes was behind a
| compile-time opt-in, and Intel was hyping up how great
| hardware transactional memory was, so most distros
| flipped the switch to enable it. But glibc's use of
| TSX/RTM for rwlocks was not behind a compile-time guard,
| and was used whenever CPUID said it was enabled.
|
| I think the first issue was that pthread locks using TSX
| behaved differently for incorrect usage, so that
| applications that had double-unlock bugs worked fine with
| the normal implementation but crashed with TSX glibc.
| These were app bugs, but a lot of people were unhappy
| that a glibc update caused their applications to crash,
| so some distros patched glibc to buy time to fix the
| apps.
|
| Then TSX was broken in Haswell and Intel disabled it, but
| it was a total mess, such that the bit in CPUID
| indicating support for TSX/RTM did not reliably indicate
| support for TSX/RTM (i.e. bit was set indicating support,
| but any TSX/RTM instruction resulted in SIGILL), so
| various versions of glibc ended up with various lists of
| CPU model numbers to determine when to avoid using TSX.
|
| repeat for broadwell.
|
| At this point more distros decided to turn off TSX in
| glibc entirely. But most did this by just removing the
| compile-time opt-in, which did not do anything for
| rwlocks. A couple distros noticed that rwlocks were still
| using TSX, and patched that too to remove lock elision
| entirely - but many did not, so rwlocks on haswell or
| early broadwell would still cause issues.
|
| In my experience, a lot of the time RTM was not useful
| for performance because the transactions would be too
| large and abort. It was worse because in these cases
| glibc would try the transaction again several times
| before giving up, which could cause massive performance
| hits compared to just doing the normal lock to start
| with.
|
| There were errata about TSX in skylake too, but I can't
| remember if Intel actually turned RTM off with microcode
| before now or just left it as is because no one was using
| it.
|
| Eventually glibc was changed to always compile in support
| for TSX but require a runtime opt-in using an environment
| variable.
|
| (this is from memory, the timeline is probably not quite
| right)
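|
| To make the mechanism concrete, here is a minimal sketch of
| RTM-based lock elision (not glibc's actual code; the
| elided_lock type and retry count are made up), using the
| _xbegin/_xend/_xabort intrinsics from <immintrin.h> and
| compiled with -mrtm:
|
|     #include <immintrin.h>  /* _xbegin, _xend, _xabort */
|     #include <stdatomic.h>
|
|     #define ELISION_RETRIES 3
|
|     typedef struct { atomic_int locked; } elided_lock;
|
|     static void lock(elided_lock *l)
|     {
|         for (int i = 0; i < ELISION_RETRIES; i++) {
|             if (_xbegin() == _XBEGIN_STARTED) {
|                 /* Reading the lock word adds it to our read
|                    set: if another thread really acquires it,
|                    our transaction aborts. */
|                 if (atomic_load_explicit(&l->locked,
|                                          memory_order_relaxed) == 0)
|                     return;     /* critical section runs
|                                    transactionally, no store */
|                 _xabort(0xff);  /* lock is busy, don't elide */
|             }
|             /* transaction aborted: retry or fall through */
|         }
|         /* Fallback: take the lock for real. */
|         while (atomic_exchange_explicit(&l->locked, 1,
|                                         memory_order_acquire))
|             ;
|     }
|
|     static void unlock(elided_lock *l)
|     {
|         if (atomic_load_explicit(&l->locked,
|                                  memory_order_relaxed) == 0)
|             _xend();  /* we were elided: commit */
|         else
|             atomic_store_explicit(&l->locked, 0,
|                                   memory_order_release);
|     }
|
| Note how a double unlock on the elided path ends up calling
| _xend() outside a transaction, which faults - the "worked
| fine before, crashes with TSX glibc" behavior described
| above.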
|
| EDIT: I had a bit of deja-vu writing this, and it turns
| out I have whined about this on HN before lol:
| https://news.ycombinator.com/item?id=22694546
| aj3 wrote:
| Thank you, this is very insightful.
| topspin wrote:
| Early TSX hardware from Intel had unfixable flaws. The
| problems were fixed in later CPUs but that left the
| burden of detection on developers. Then along came side
| channel attacks and TSX became a vector for that. Plus
| it's Intel only, so you have to use another approach on
| AMD, which has completely distinct instructions, and this
| is an area with extreme subtleties where you really don't
| want or can't afford to deal with multiple
| implementations.
|
| In other words it's been a total shit show. It doesn't
| have to be, but until Intel takes it seriously and
| produces an unflawed and secure realization, it's going to
| continue to be a shit show. Even then you're still left
| with the Intel != AMD problem...
| patrakov wrote:
| A lot of applications were subtly buggy (they sometimes
| unlocked an already-unlocked mutex). Unlike with the
| previous implementation, with TSX, unlocking an already
| unlocked mutex is an instant crash. And the applications
| were buggy exactly because nothing bad had happened before
| due to this bug, and there were no widespread-enough tools
| that would have helped recognize this as an application
| problem during development.
| monocasa wrote:
| I really liked Cliff Click's talk on why transactional memory
| didn't live up to its promises (at least for accelerating
| general purpose locking of Java code not written specifically
| for transactional memory).
|
| Cliff Click -- The Azul Hardware Transactional Memory
| experience
|
| https://www.youtube.com/watch?v=GEkeOHw87Sg
|
| TL;DW: Transactional memory sort of promised lock elision to
| speed up gratuitously lock heavy code, particularly for read
| heavy workloads. Think a caching hash map that only has the
| occasional writer. It didn't live up to that since the max
| transaction size is really easy to overflow (far easier than
| you may think), and general code has a nasty habit of writing
| metrics even in read cases which means that transactions all
| write to the same address, which means constant transaction
| conflicts, which means slower code than just using a lock.
| Cliff Click then argues that in the vast majority of cases, if
| you're having lock contention, the right move is to rewrite the
| algorithm so that threads (or at least cores) communicate via
| messages and share nothing, rather than trying to use
| transactional memory; that way you can cleanly fork out across
| a cluster instead of still being constrained to a single box
| after the rewrite. And for context he comes to that
| conclusion while working on massive 768 core boxes where the
| value add is "we sell you a giant non-NUMA single box that
| absolutely eats Java for breakfast, lunch, and dinner".
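|
| A contrived sketch of that failure mode (illustrative names,
| not from the talk): a "read-only" lookup that bumps a shared
| hit counter makes every elided reader write the same cache
| line, so concurrent transactions conflict, abort, and fall
| back to the lock they were supposed to elide.
|
|     #include <string.h>
|
|     struct entry { const char *key; const char *val; };
|
|     static struct entry table[64];  /* read-mostly shared map */
|     static long hits;               /* innocent metrics counter */
|
|     const char *cache_get(const char *key)
|     {
|         hits++;   /* every "reader" now writes one shared line */
|         for (int i = 0; i < 64; i++)
|             if (table[i].key && strcmp(table[i].key, key) == 0)
|                 return table[i].val;  /* the real work is pure reads */
|         return 0;
|     }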
|
| Even if you read the TL;DW, I still encourage you to listen to
| his talk; he's an incredibly smart engineer and I learn
| something pretty much every time I listen to him.
| rayiner wrote:
| Cliff Click, aside from being a brilliant programmer, is a
| great technical communicator. He came out of a fantastic
| department at Rice that's produced some of the most readable
| academic papers on compilers and VMs I've encountered.
| cpleppert wrote:
| The research papers linked in wikipedia don't really support
| what wikipedia claims even though they are very favorable to
| TSX. In one paper [1] the only way they could demonstrate a
| performance improvement in a real world benchmark was changing
| the underlying lock strategy entirely. The rest are just
| synthetic benchmarks that transactional memory can look good on
| due to the lack of actual contention but which required lots of
| changes to take advantage of. In [2] they used what appears to
| be a very naive b-tree implementation and an index tree
| combined with a dictionary. Even though this should probably be
| a great use case for transactional memory, they once again have to
| make major changes to the implementation.
|
| In a real database, you will run into transaction aborts much
| more frequently, to say nothing of the correctness concerns
| raised by others in this thread.
|
| [1]:https://web.archive.org/web/20161110144922/http://pcl.intel
| -... [2]:Improving In-Memory Database Index Performance
| alfalfasprout wrote:
| I've wanted to use TSX for years... it's great as a way to get
| atomicity for non-primitive types in a lock-free way. Could be
| super useful in trading.
|
| The issue is it's been YEARS now that intel claims it's ready
| then turns it off in chips shortly thereafter. It's basically
| vaporware at this point.
| Syonyk wrote:
| I spent a week or so a couple years back messing around with
| TSX as well, to see if I could improve the performance of some
| ringbuffers without having to do more complex locking.
| It would have required a lot of rework, and someone else on
| the project pointed out that no TSX implementation seemed to
| survive extended contact with reality without being disabled.
|
| So I just wrote the more complex reader/writer lock system
| and went on my way. Turns out to have been the right
| answer...
|
| At this point, TSX is like a new Google product launch -
| "This, too, shall pass." I don't know why anyone would bother
| spending much time with it after all these generations of
| "TSX! Wait, no... uh..."
| ploxiln wrote:
| I've disliked the idea of TSX since the beginning. I studied
| some circuit design, computer architecture, and operating
| systems in college, and though I work in user-space software
| now, I do like to understand how all the layers beneath me
| work. TSX has always given me the heebie-jeebies.
| Sure it might give great performance uplift if you have a
| team of skilled scientists and engineers and servers
| dedicated to a single application. But for general computing,
| it's really crazy. Good for academic papers, good for
| marketing to the director level, bad for general engineering
| work.
|
| It turns out that it doesn't really work. Surprise! not
| really
| lallysingh wrote:
| Why? The implementation sounded simple: bus snoop on
| addresses used by the load unit, invalidate if anyone
| writes to those lines.
| alfalfasprout wrote:
| I'm not sure TSX was really intended to be used everywhere
| though. Admittedly the glibc team tried to use it in
| pthreads and it was a disaster (a lot of code broke and of
| course it kept getting disabled in CPUs).
|
| Their issue IMO was not really scoping down TSX... the
| project was far too ambitious.
| ploxiln wrote:
| Intel employees added it to glibc. I think Intel wanted
| to show everyone "don't be scared, silly, you're already
| using it!"
|
| https://lwn.net/Articles/534761/
| http://halobates.de/adding-lock-elision-to-linux.pdf
| alfalfasprout wrote:
| I agree, the implementation isn't great. The problem IMO is
| that it tried to do way too much and be way too general, which
| leads to a bunch of tricky edge cases they need to handle
| (and significantly complicates use of the caches).
|
| That said, hardware support for atomically operating on larger
| memory regions than is currently possible is a problem worth
| solving.
| profmonocle wrote:
| Yikes, this is one time I _really hope_ benchmarks don't
| reflect reality. Imagine you have a DB server that typically
| peaks around 30% CPU. You reboot one night to install patches,
| and at the next peak the server is suddenly completely maxed
| out due to this performance hit, causing application timeouts.
| Your update logs might not even help if you're on a cloud VM,
| since microcode updates are handled by the hypervisor.
| ikiris wrote:
| To be blunt, if you're doing cloud VMs that way, you set
| yourself up for failure. The whole point of cloud is
| autoscaled, distributed workflows.
| astrange wrote:
| My understanding is that literally no one uses TSX except for
| possibly one PS3 emulator.
| monocasa wrote:
| I wouldn't be surprised if a few more emulators use it too.
| It's the cleanest way to emulate ll/sc style archs on x86.
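|
| A sketch of the idea (illustrative, not any particular
| emulator's code): replay the guest's LL..SC sequence inside
| one RTM transaction, so a conflicting write by another thread
| aborts it and the store-conditional reports failure. Assumes
| the <immintrin.h> intrinsics and -mrtm; guest_ll_sc_add is a
| made-up helper standing in for an emulated atomic increment.
|
|     #include <immintrin.h>
|     #include <stdbool.h>
|     #include <stdint.h>
|
|     static bool guest_ll_sc_add(uint32_t *addr, uint32_t delta)
|     {
|         if (_xbegin() == _XBEGIN_STARTED) {
|             uint32_t v = *addr;    /* load-linked */
|             *addr = v + delta;     /* store-conditional */
|             _xend();
|             return true;           /* SC succeeded */
|         }
|         /* Conflict/capacity abort: SC fails and the guest's
|            own LL/SC retry loop runs again.  A real emulator
|            still needs a non-TSX fallback (e.g. a cmpxchg
|            loop), which is exactly what these microcode
|            updates force everyone back onto. */
|         return false;
|     }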
| no_time wrote:
| sudo apt-mark hold intel-microcode
|
| I will reconsider once a PoC comes out that affects either nginx
| or sshd.
| [deleted]
| syntaxing wrote:
| Does this affect 4th gen i-series? I noticed this past week that
| my Mac M1 is almost twice as fast as my Intel desktop. Wasn't
| sure if it's because of this.
| hashhar wrote:
| TSX has always been a correctness nightmare and nobody uses it
| AFAIK. glibc also had some support for it but they disabled it by
| default long ago due to correctness issues.
|
| Do people even read past the headlines these days?
| necheffa wrote:
| I actually dealt with TSX issues just a couple years ago on a
| supported and patched SUSE cluster.
| jnwatson wrote:
| Is Intel planning on fixing TSX in future CPUs?
| necheffa wrote:
| Sort of. Sapphire Rapids is supposed to include new
| instructions that fix things by extending TSX.
|
| But I doubt we'll see microcode patches that fix older CPUs. I
| have a Haswell where TSX was permanently disabled via microcode
| and a Comet Lake that straight up came from the foundry without
| TSX.
| profmonocle wrote:
| I wonder what the reaction would be if similar post-sale
| downgrades happened in other industries? Imagine if a car company
| issued a critical safety recall, and the fix cut the vehicle's
| MPG in half. Would the manufacturer be forced to buy back the
| vehicles? Issue partial refunds? Replace them with newer models
| that perform similarly to how the original was meant to?
|
| (Not a perfect analogy I admit - opting out of a CPU security
| mitigation typically isn't going to risk anyone's life. At least
| not on _most_ servers.)
| mhh__ wrote:
| I have read (no proof, but it was on a tech forum and seemed
| simple enough to be real) that Intel _will_ refund at least
| some customers (the wording wasn't clear whether it was
| someone who bought a CPU or bought an HPC cluster...) if the
| CPU was sold to you as having a feature that's then disabled
| _and_ you can show them you used it.
| r00fus wrote:
| Those conditions sound onerous for the customer to provide.
| mhh__ wrote:
| Maybe, but it was the same with FDIV from what I saw.
| The bug was bad, but actually bumping into it was quite
| rare.
| myself248 wrote:
| Didn't that backfire? Having to prove that you used FDIV,
| and that errors in the fourth-or-beyond decimal place
| would actually affect you, was such bad press for Intel
| that they eventually offered replacements to anyone upon
| request. As Wikipedia tells it:
|
| > Intel's response to the FDIV bug has been cited as a
| case of the public relations impact of a problem
| eclipsing the practical impact of said problem on
| customers. While most users were unlikely to encounter
| the flaw in their day-to-day computing, the company's
| initial reaction to not replace chips unless customers
| could guarantee they were affected caused pushback from a
| vocal minority of industry experts. The subsequent
| publicity generated shook consumer confidence in the
| CPUs, and led to a demand for action even from people
| unlikely to be affected by the issue. Andrew Grove,
| Intel's CEO at the time was quoted in the Wall Street
| Journal as saying "I think the kernel of the issue we
| missed [...] was that we presumed to tell somebody what
| they should or shouldn't worry about, or should or
| shouldn't do".
| slavboj wrote:
| That literally happened with the VW turbodiesels.
| dev_tty01 wrote:
| Big difference though. VW created the issue and then ran a
| program to develop and implement a way to hide the problem.
| Malicious intent. I don't think we can say the same for Intel
| with a security bug that is uncovered after the sale. Now, if
| someone could show Intel knew about Spectre or similar bugs
| and hid them, then you are off to the races.
| donalhunt wrote:
| * in some markets. :(
| schmichael wrote:
| And the repercussions were substantial:
| https://en.wikipedia.org/wiki/Volkswagen_emissions_scandal
|
| (I'm not arguing equivalence between these two "post-consumer
| downgrades" -- just pointing out there's not only prior art
| for it occurring, but also prior art for
| repercussions/compensation.)
| monocasa wrote:
| And they were subject to a $15B class action settlement
| because of it in the US alone.
|
| https://www.classaction.com/volkswagen/settlement/
| jhenkens wrote:
| First thing I thought of. I believe the trade-off was getting
| a warranty extension on the engine/emissions system.
| gambiting wrote:
| In some US states VW was forced to buy the vehicles back
| from owners if they wanted to sell them. I know most people
| jumped on the offer as VW had to pay a certain pre-agreed
| price and it was higher than market value.
| collsni wrote:
| It absolutely did. I owned a VW for 3 years and it was bought
| back from me for 3k more than I originally paid for it.
| https://en.wikipedia.org/wiki/Volkswagen_emissions_scandal
| NewNetNow wrote:
| Good news. ARM's CCA [1] seems revolutionary in comparison.
|
| [1] https://www.arm.com/why-arm/architecture/security-
| features/a...
| zamadatix wrote:
| CCA seems more like SGX than TSX
| uniqueuid wrote:
| From Wikipedia:
|
| >Transactional Synchronization Extensions (TSX), also called
| Transactional Synchronization Extensions New Instructions (TSX-
| NI), is an extension to the x86 instruction set architecture
| (ISA) that adds hardware transactional memory support, speeding
| up execution of multi-threaded software through lock elision.
|
| How is that related to CCA, which seems to be a security
| feature?
| anonymousiam wrote:
| I read this article earlier this morning and I have one question
| that I haven't found the answer for: Are the old Intel firmware
| blobs still available somewhere, and can we choose to not load
| this "upgrade"? If yes, how do we revert?
| mjg59 wrote:
| Unless your firmware vendor updates the microcode embedded in
| your firmware and you update to that newer firmware, the new
| microcode is loaded at runtime and won't persist over power
| cycles. The old microcode is still available, yes.
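|
| (If you want to verify which revision is actually live, the
| kernel reports the loaded revision in /proc/cpuinfo; a quick
| sketch - Linux-specific, and on a cloud VM it shows whatever
| the host loaded:)
|
|     #include <stdio.h>
|     #include <string.h>
|
|     int main(void)
|     {
|         char line[256];
|         FILE *f = fopen("/proc/cpuinfo", "r");
|         if (!f)
|             return 1;
|         while (fgets(line, sizeof line, f)) {
|             if (strncmp(line, "microcode", 9) == 0) {
|                 fputs(line, stdout);  /* e.g. "microcode : 0xea" */
|                 break;  /* normally identical on every logical CPU */
|             }
|         }
|         fclose(f);
|         return 0;
|     }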
| anonymousiam wrote:
| Yes, I understand that the upgraded firmware is loaded after
| boot (unless burned into the MB ROM). What I'm looking for is
| availability of the old firmware blobs. Intel probably has
| copyright on them and can control their distribution.
| michaellarabel wrote:
| Yes, at least some former releases are available via
| https://github.com/intel/Intel-Linux-Processor-Microcode-
| Dat... which is what I use during some of my past/present
| Linux testing
| intricatedetail wrote:
| Should such crippling warrant a recall and refund? Not sure how a
| small company could get away with something like this, but for
| Intel it is just a patch?
| wmf wrote:
| If you use TSX don't apply the update. If you never used TSX it
| doesn't matter.
| monocasa wrote:
| It sounds like they're disabling it because it's
| fundamentally broken on those microarchs, as they're citing
| memory ordering issues. So it's still a case of Intel
| admitting that their product as advertised is defective even
| if you don't apply the update.
| sheepdestroyer wrote:
| All these performance-neutering patches, made in the name of
| 'security' that in certain situations has no value whatsoever,
| should only be accepted if they are also conditioned on the
| "mitigations=" kernel boot option.
| jeffbee wrote:
| To be clear, TSX does not work and has never worked. Anyone
| depending on TSX "for performance" is actually getting
| incorrectness.
| PixelOfDeath wrote:
| Why not go all the way and let everything run in ring 0? Then
| Intel can prefetch beyond kernel entry calls again! And we can
| also see how AMD would perform under the same conditions!
|
| With Intel's wave of TSX-disabling patches going over all the
| previous CPU generations, AMD apparently got it right by never
| bringing their own version into production at all.
| DaiPlusPlus wrote:
| > Why not go all the way and let everything run in ring 0?
|
| How would you get memory protection or debugger support then?
|
| Or the very-likely use-case of rendering live (i.e.
| untrusted) web-content in-game? That would require a modern
| engine like Chromium which in-turn necessitates per-process
| isolation.
|
| Also, no-one wants to have to reboot their entire computer
| system just because some application code crashed. Even if
| preemptive scheduling and the MMU work in ring 0, all
| processes in ring 0 can manipulate the MMU registers, so a
| misbehaving process can still bring down the entire system.
| Bad idea.
|
| Even dedicated HPC applications need it - remember that the
| HPC/scientific-computing/batch-job-processing model is the
| whole reason that operating-systems exist today: they're
| highly evolved successors to job-control programs (e.g.
| Chippewa Operating System). So I don't think they'd want to
| give up all the improvements of the past 50+ years.
|
| Ultimately we're just arguing about the principle of how the
| user who owns the computer should be in _ultimate control_
| over the hardware - but unfortunately Apple has demonstrated
| that principle is unnecessary to make obscene profits and
| also runs contrary to the realities of running a large secure
| platform today. We shall see how this ends-up...
| alerighi wrote:
| In theory, wouldn't it make sense to run all the code that
| runs as the root user in ring zero?
|
| The root user can still read/write the whole memory of the
| system anyway via /dev/mem, load kernel modules, and do
| practically anything on the system. In that context, a change
| of privilege level is useless in practice, and can lead to
| performance degradation.
|
| This would bring performance benefits to most system daemons,
| notably programs that interact closely with the kernel like
| systemd.
| Bancakes wrote:
| Cool, like 6th/7th gen consoles used to be. Simple, single-
| app OSs.
| ikiris wrote:
| Yeah it's much cheaper just to pay the ransomware and have your
| data stolen. No one seems to care if your data is leaked
| anyway.
| vbezhenar wrote:
| Are there any examples of real-world attacks that happened
| because those mitigations were turned off?
| alfalfasprout wrote:
| Agreed. I'm all for mitigations when used somewhere where
| security could be an issue, but for use cases where raw
| computation and performance matter the most, it should be
| easy to disable this stuff.
| rrss wrote:
| This isn't in the name of 'security.' TSX has been functionally
| broken in every architecture that implements it.
|
| From the article:
|
| > A memory ordering issue is what is reportedly leading Intel
| to now deprecate TSX on various processors.
|
| And from the Intel document linked in the article:
|
| > The default RTM force-abort behavior can be optionally
| disabled by setting MSR bit TSX_FORCE_ABORT.SDV_ENABLE_RTM=1.
| However, when RTM force abort is disabled in this way, RTM
| usage may be subject to memory-ordering correctness issues. Due
| to these issues, this unsupported mode should not be enabled
| for production use.
___________________________________________________________________
(page generated 2021-06-28 23:01 UTC)