[HN Gopher] Transactional memory similarity to garbage collection
___________________________________________________________________
Transactional memory similarity to garbage collection
Author : signa11
Score : 67 points
Date : 2021-05-05 10:49 UTC (2 days ago)
(HTM) web link (www.realworldtech.com)
(TXT) w3m dump (www.realworldtech.com)
| pcdavid wrote:
| This analogy is not a new idea; see "The transactional memory /
| garbage collection analogy" by Dan Grossman at OOPSLA 2007:
| https://dl.acm.org/doi/abs/10.1145/1297027.1297080
|
| PDF available here:
| https://homes.cs.washington.edu/~djg/papers/analogy_oopsla07...
| amelius wrote:
| > Transactional memory is currently one of the hottest topics
| in computer-science research, having attracted the focus of
| researchers in programming languages, computer architecture,
| and parallel programming, as well as the attention of
| development groups at major software and hardware companies.
| The fundamental source of the excitement is the belief that by
| replacing locks and condition variables with transactions we
| can make it easier to write correct and efficient shared-memory
| parallel programs.
|
| Has this research come to fruition? What languages/environments
| use such an approach today?
| cma wrote:
| Hardware support is only very recent. Intel added support at
| various times and then had to disable it due to security and
| correctness issues, but I think it is now finally available:
|
| > Intel's Transactional Synchronization Extensions (TSX) is
| available in some of the Skylake processors. It was earlier
| implemented in Haswell and Broadwell processors as well, but
| the implementations turned out both times to be defective and
| support for TSX was disabled. The TSX specification describes
| the transactional memory API for use by software developers,
| but withholds details on technical implementation.[11] ARM
| architecture has a similar extension.[12]
|
| https://en.wikipedia.org/wiki/Transactional_memory
|
| I believe even with it available, OSes turn it off because of
| all the other Intel vulnerabilities and because it makes them
| even more expensive to work around.
| The_rationalist wrote:
| TSX is disabled on RHEL :/
| https://www.phoronix.com/scan.php?page=news_item&px=Red-
| Hat-...
| whateveracct wrote:
| Haskell has wonderful STM in the stdlib and a pretty good
| collection of community libraries on top of that.
| jasonwatkinspdx wrote:
| Well, I'd say it's a bit muddled. Pure STM systems have seen
| limited success, mostly on the JVM, particularly in Clojure.
| But as a technology it has not spread widely in industry,
| despite the fundamental advantage of composability. I would
| personally attribute this to a sort of gulf created by two
| common approaches:
|
| 1. It's become very common to structure software as a
| stateless service cluster backed by a stateful database
| technology. Most of the databases used in this approach
| would not be described as STM, but the net effect for the
| application is similar, even if it involves more layers and
| complexity.
|
| 2. People have gotten pretty comfortable with simple mutex
| patterns connected by queues of various sorts. This is a
| sort of worse-is-better situation, where a mutex-protected
| whatever in a toy test or microbenchmark is simpler and far
| more efficient than STM. However, a system composed of many
| of these mutex-protected islands proves a whole different
| beast indeed. STM has largely been judged from the
| perspective of the former rather than the latter.
|
| There are many people who have observed that transactional
| concurrency control, state-of-the-art garbage collection,
| and even file systems have been converging on similar
| features. This is driven by common constraints on both
| sides: what the hardware can do and what humans expect.
| In particular with persistent memory, I think you'll see all
| three of these unify into a single design, because systems
| that attempt to solve these problems separately will be a
| poor match for the hardware.
| [deleted]
| thesz wrote:
| SPJ of Haskell fame once discussed perceived STM failure in
| .Net: https://haskell-cafe.haskell.narkive.com/t6LSdcoE/is-
| there-a...
|
| Basically, poor STM performance in .NET came down to too
| much mutation and mutation-induced logging. This is why
| Clojure's STM is much more widely used.
|
| I have particularly deep experience in various Haskell
| projects, and I must insist that no (1) relatively big
| Haskell project (2) with a hint of concurrency and (3)
| developed by several people can go without use of STM.
|
| I can use MVars myself for problem-space discovery. When
| there are at least two of us, I will use TVars and insist
| others should too.
| barrkel wrote:
| I think trying to scale out compute + mutable state across
| multiple CPUs on a single box has somewhat fallen by the
| wayside outside of specialized applications like databases
| and distributed query engines, as a local maximum avoided
| in favour of scalability via more boxes.
|
| There's several forces behind this. Applications are more
| likely to be web-based or services - i.e. inherently
| distributed - and less likely to be desktop or traditional
| client/server, where almost all compute happened on a
| single server. As distributed services, maximizing
| statelessness and avoiding mutable shared memory is key to
| solving a lot of problems: scaling out (no shared memory),
| redundancy (keep a second copy of the application logic
| running somewhere, no sync required), recovery and
| idempotence (if something fails, try again until it
| succeeds - repeated attempts are safe).
|
| Reliable persistent queues are part of that. They let you
| bring services up and down and switch over without down
| time, or restart after a failure and resume where they left
| off.
|
| The problems of shared mutable state are best kept in
| specialized applications: databases, queuing systems,
| distributed caches, consistent key-value stores. Keeping
| state consistent in a distributed system is a genuinely
| hard problem, and STM isn't much help, except perhaps as an
| implementation detail in some of those specialized
| applications.
| jandrewrogers wrote:
| For what it's worth, scaling mutable shared state across
| multiple CPUs on a single box has fallen by the wayside
| for databases too. Thread-per-core style software
| architecture has become idiomatic, in part because it
| efficiently scales up on machines with a very large
| number of cores. Nothing about this precludes scale-out,
| it just allows you to serve 10-100x the load on a single
| box as the more typical naive design.
|
| Web applications aren't special in this regard. We've
| just normalized being incredibly wasteful of computing
| resources when designing them, spinning up huge clusters
| to serve a workload that could be easily satisfied with a
| single machine (or a few machines for availability).
|
| Premature scale-out, because we've forgotten how to scale
| up, is the root of much evil.
| kvhdude wrote:
| This has been my experience as well. I briefly shifted from
| embedded systems programming to the web services world. I
| found that the scale-up approach has been so maligned that
| it is considered okay to spin up several hundred EC2
| instances running a (poorly written) Java application
| instead of a single multi-core instance running Erlang that
| performed much better. Some frameworks even ran web
| services in PHP and spent all their effort on not having
| the request reach the PHP code, since it was doing
| something like 10 reqs/sec.
| foobiekr wrote:
| I think (1) is really about where you land it.
|
| I am not free to name the product, but I worked on a
| product that has done billions of dollars in revenue that
| designed a bunch of state management checkpointing using a
| custom STM implementation for C (yes, C). It made so, so
| many things straightforward in terms of corner cases; we
| also extended transactionality to sync over the wire in
| some cases.
|
| I also think STM-adjacent designs are gaining some traction
| - for example, FoundationDB looks an awful lot like an STM
| system from the point of view of applications, much more
| than it looks like a traditional database.
| pjlegato wrote:
| Clojure famously makes extensive use of a software
| transactional memory (STM) to provide painless
| multithreading:
|
| * https://clojure.org/reference/refs
| * https://sw1nn.com/blog/2012/04/11/clojure-stm-what-why-how/
| twoodfin wrote:
| It's widely regarded as a dead end. Conceptually, it's
| quite appealing, so who knows, it might find new life
| someday. Garbage collection was a niche technology for
| decades, as was JIT compilation.
|
| Joe Duffy's write-up on how STM failed at Microsoft with .NET
| is illustrative:
|
| http://joeduffyblog.com/2010/01/03/a-brief-retrospective-
| on-...
| PaulHoule wrote:
| right... it's not hard to invent some scheme that scales to
| a 2 core computer but it is not crazy to have 64 or more
| cores these days and even if you have 2% serial overhead
| from communications you are not going to scale past 50.
|
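The 2%-serial-overhead figure above is just Amdahl's law, speedup = 1 / (s + (1 - s) / n) for serial fraction s on n cores. A small sketch (mine, not from the thread) that computes the bound:

```java
// Amdahl's law: with serial fraction s, speedup on n cores is
// 1 / (s + (1 - s) / n). Shows why 2% serial overhead caps scaling
// below 50x no matter how many cores you add.
public class Amdahl {
    static double speedup(double serialFraction, int cores) {
        return 1.0 / (serialFraction + (1.0 - serialFraction) / cores);
    }

    public static void main(String[] args) {
        for (int n : new int[] {2, 8, 64, 1024}) {
            System.out.printf("%4d cores -> %.1fx%n", n, speedup(0.02, n));
        }
        // As n grows, speedup approaches but never reaches 1/0.02 = 50x.
    }
}
```

With s = 0.02 the curve flattens quickly: doubling cores past a few dozen buys almost nothing, which is the commenter's point about serial communication overhead.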
| The problems of making a system really scale as opposed to
| just scale are bad enough that the best we know how to do
| is give people simpler primitives that work -- each layer
| you add to the system is another opportunity to add another
| limitation to it.
|
| Modern garbage collectors are fairly scalable (e.g. don't
| break outright as the heap size increases) and any manual
| memory management would also have the risk of scaling
| problems, to the point where you might say that garbage
| collection is more "scalable" when the thing being scaled
| is the kind of arbitrary complexity that appears in
| business applications.
|
| What people complain about with garbage collection are: (1)
| a fixed constant overhead they think is too high and will
| probably always think is too high, (2) long pauses that are
| manageable in practice unless you are timing events with
| millisecond precision on a microcontroller.
|
| Thus one of these things is mainstream and the other isn't.
| tomp wrote:
| > What people complain about with garbage collection are:
| (1) a fixed constant overhead they think is too high and
| will probably always think is too high, (2) long pauses
| that are manageable in practice unless you are timing
| events with millisecond precision on a microcontroller.
|
| I keep hearing this about "modern" GC and it keeps being
| very much untrue in practice. I'm running IntelliJ IDEA
| (Java) on a laptop with 8GB of RAM, and it keeps freezing
| every so often. I suspect GC is causing paging. This is
| something that is obviously avoided with reference
| counting or manual memory management.
| jhgb wrote:
| > This is something that is obviously avoided with
| reference counting
|
| You'd prefer your data being larger in memory due to
| refcount fields and slower to access due to atomic
| operations on those fields?
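To make that cost concrete, here is a hypothetical hand-rolled refcount in Java (the `Refcounted` name and shape are mine, not from the thread): every copy of a reference is an atomic read-modify-write that contends across cores, and the counter itself adds a field to every object.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative sketch of reference counting's per-object costs:
// an extra count field, plus an atomic op on every retain/release.
final class Refcounted<T> {
    private final AtomicInteger refs = new AtomicInteger(1);
    private final T value;

    Refcounted(T value) { this.value = value; }

    T retain() {                 // atomic increment for each new reference
        refs.incrementAndGet();
        return value;
    }

    void release() {             // atomic decrement; reclaim at zero
        if (refs.decrementAndGet() == 0) {
            // a real runtime would free `value` here
        }
    }

    int count() { return refs.get(); }
}
```

The trade-off both commenters are circling: tracing GC pays in pauses and heap headroom, refcounting pays on every pointer copy.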
| tomp wrote:
| It's not like tracing GC is without memory overhead...
|
| Slower is a question of priorities. I'd definitely trade
| slower _overall_ performance (I don't really need my IDE
| to be _that_ performant... it's not like it's using 100%
| CPU, or even 10%!) for a huge improvement in _latency_
| (i.e. no catastrophic slowdowns / freezes).
| jhgb wrote:
| There are very few ways of having zero memory overhead,
| though.
|
| Also, most of the flaws of Java's memory management seem
| to be concentrated in the part of the language that says
| that almost everything has to be an object with a header
| and on the heap. Remove that and your graph of objects to
| trace shrinks very substantially.
| withinboredom wrote:
| Interestingly, I was running out of memory, so I gave it
| more RAM (like 20GB) and it got even worse. Apparently it
| takes a long time to GC that much memory. I'd also rather
| amortize that pause. The loss of productivity due to a
| sudden pause is real.
| twoodfin wrote:
| Yeah. As is pointed out down-thread of the linked post,
| even when it was confined to a niche, garbage collection
| did what it claimed it would (eliminate the need for
| error-prone manual memory management) at an acceptable
| cost for many systems & applications. JITs were similar.
|
| So far, STM and HTM have followed a path more like VLIW
| instruction sets: Conceptually appealing for their
| promise of simplification, but in practice only
| demonstrating success in specialized uses, with the
| tradeoffs proving unattractive seemingly everywhere else
| they've been tried.
| sitkack wrote:
| I think it is a false equivalence to say that STM and HTM
| are like VLIW (which has been a resounding success for DSP
| workloads). I am _not_ defending VLIW.
|
| Transactions are extremely powerful conceptually, but being
| composable without causing flapping requires a higher level
| of coordination.
| signa11 wrote:
| honestly, i found torvalds' opinion quite useful:
| https://www.realworldtech.com/forum/?threadid=201184&curpost...
| RcouF1uZ4gsC wrote:
| I thought the big win for Transactional Memory over locking was
| composability. Locks don't compose. Whenever accessing data,
| you have to think about all the locks or else you end up with
| dead(live)lock.
|
| The promise of transactional memory was that it composes:
| you only have to think about your own data access, without
| worrying about running into deadlocks.
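The composition problem fits in a few lines. This is an illustrative Java sketch (the `Account`/`transfer` names are mine, not from the thread): each account's lock is fine on its own, but composing two of them forces every caller to know a global lock order.

```java
import java.util.concurrent.locks.ReentrantLock;

// Each Account is individually thread-safe, yet combining two
// naively can deadlock -- the classic "locks don't compose" problem.
class Account {
    final ReentrantLock lock = new ReentrantLock();
    long balance;
    Account(long balance) { this.balance = balance; }
}

class Transfers {
    // DEADLOCK HAZARD: transferNaive(a, b, ...) and transferNaive(b, a, ...)
    // on two threads can each take their first lock and wait forever
    // on the second.
    static void transferNaive(Account from, Account to, long amount) {
        from.lock.lock();
        try {
            to.lock.lock();
            try {
                from.balance -= amount;
                to.balance += amount;
            } finally { to.lock.unlock(); }
        } finally { from.lock.unlock(); }
    }

    // The usual workaround: a global lock order (here by identity hash,
    // ignoring ties). This is exactly the cross-module knowledge a
    // transaction would spare you.
    static void transferOrdered(Account from, Account to, long amount) {
        Account first = System.identityHashCode(from) <= System.identityHashCode(to)
                ? from : to;
        Account second = (first == from) ? to : from;
        first.lock.lock();
        try {
            second.lock.lock();
            try {
                from.balance -= amount;
                to.balance += amount;
            } finally { second.lock.unlock(); }
        } finally { first.lock.unlock(); }
    }
}
```

In an STM, the whole transfer would just be one atomic block and the ordering concern disappears; that is the composability win the comment describes.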
| hinkley wrote:
| Borrow checkers feel like a bottom-up approach to STM to me.
|
| There's a lot of static analysis in the JVM that reduces
| locking, speeds up garbage collection, and removes the need
| to declare the type of a generic variable.
|
| I expect over time we will see inference in borrow checkers
| reduce the brain cycles you need to spend thinking about
| these things, and borrow semantics will be "good enough" for
| a long time.
|
| Or who knows, maybe STM systems will be able to infer borrow
| semantics and benefit from aggressive lock elision.
| slver wrote:
| Memory is allocated in pages, which are split into subpages,
| etc., and then you throw heaps, queues, linked lists and so
| on at it.
|
| None of this structure exists in the memory itself. So you
| need transactions either at the page level, which is too
| crude, or at the byte level, which is too granular.
|
| Hardware is simply the wrong level to solve this problem at,
| unless the hardware starts understanding how we use memory
| way, way, way, way better than it does right now.
| drivebycomment wrote:
| You have misunderstood what transactional memory is. Think
| of it as database transactions applied to memory operations.
| Just as a database transaction makes a collection of reads,
| writes, and computation based on those reads and writes
| appear atomic w.r.t. other transactions and reads/writes,
| transactional memory generally aims to give a section of
| instructions transactional semantics for all the loads,
| stores, and other instructions in the transaction. Most
| transactional memory proposals are not page-level
| granularity (though obviously most proposals have certain
| limits), and "byte level being too granular" doesn't even
| make any sense.
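The read/validate/commit cycle described above can be sketched, shrunk to a single memory word, as a compare-and-set retry loop in Java (an illustrative analogy of mine, not an actual TM implementation):

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongUnaryOperator;

// Degenerate one-location "transaction": snapshot, compute off to the
// side, then commit only if no other writer intervened; else retry.
class OptimisticCell {
    private final AtomicLong cell = new AtomicLong();

    long update(LongUnaryOperator f) {
        while (true) {
            long snapshot = cell.get();               // begin: read set = {cell}
            long proposed = f.applyAsLong(snapshot);  // speculative work
            if (cell.compareAndSet(snapshot, proposed)) {
                return proposed;                      // validate + commit atomically
            }
            // conflict: another "transaction" committed first; abort and retry
        }
    }

    long get() { return cell.get(); }
}
```

`java.util.concurrent` already offers this single-word case as `AtomicLong.updateAndGet`; transactional memory generalizes the same optimistic scheme to arbitrary sets of loads and stores, which is where the hard part lives.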
| bigmattystyles wrote:
| Does this all go back to a derivative of the halting
| problem? You have an abstraction, and a computer must make
| a decision about what you might ever want to do; but while
| _you_ may never intend for a set of conditions to occur,
| the computer cannot 'know' that. And the abstraction
| doesn't offer a way for you to specify your conditions in a
| more optimal manner.
| exabrial wrote:
| Quarkus on the JVM has an interesting implementation of TX
| Memory, and it's integrated with the JTA global XA manager, which
| really gives it some cool capabilities.
___________________________________________________________________
(page generated 2021-05-07 23:01 UTC)