[HN Gopher] Transactional memory similarity to garbage collection
       ___________________________________________________________________
        
       Transactional memory similarity to garbage collection
        
       Author : signa11
       Score  : 67 points
       Date   : 2021-05-05 10:49 UTC (2 days ago)
        
 (HTM) web link (www.realworldtech.com)
 (TXT) w3m dump (www.realworldtech.com)
        
       | pcdavid wrote:
        | This analogy is not a new idea; see "The transactional memory /
       | garbage collection analogy" by Dan Grossman at OOPSLA 2007:
       | https://dl.acm.org/doi/abs/10.1145/1297027.1297080
       | 
       | PDF available here:
       | https://homes.cs.washington.edu/~djg/papers/analogy_oopsla07...
        
         | amelius wrote:
         | > Transactional memory is currently one of the hottest topics
         | in computer-science research, having attracted the focus of
         | researchers in programming languages, computer architecture,
         | and parallel programming, as well as the attention of
         | development groups at major software and hardware companies.
         | The fundamental source of the excitement is the belief that by
         | replacing locks and condition variables with transactions we
         | can make it easier to write correct and efficient shared-memory
         | parallel programs.
         | 
         | Has this research come to fruition? What languages/environments
         | use such an approach today?
        
           | cma wrote:
            | Hardware support is only very recent. Intel added support at
           | various times and then had to remove it due to security and
           | correctness issues, but I think it is now finally available:
           | 
           | > Intel's Transactional Synchronization Extensions (TSX) is
           | available in some of the Skylake processors. It was earlier
           | implemented in Haswell and Broadwell processors as well, but
           | the implementations turned out both times to be defective and
           | support for TSX was disabled. The TSX specification describes
           | the transactional memory API for use by software developers,
           | but withholds details on technical implementation.[11] ARM
           | architecture has a similar extension.[12]
           | 
           | https://en.wikipedia.org/wiki/Transactional_memory
           | 
            | I believe that even where it is available, OSes turn it off
            | because of all the other Intel vulnerabilities, and because
            | it makes them even more expensive to work around.
        
           | The_rationalist wrote:
            | TSX is disabled on RHEL :/
           | https://www.phoronix.com/scan.php?page=news_item&px=Red-
           | Hat-...
        
           | whateveracct wrote:
           | Haskell has wonderful STM in the stdlib and a pretty good
           | collection of community libraries on top of that.
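            | 
            | For anyone who hasn't seen it, a minimal sketch of what that
            | looks like (a hypothetical account-transfer example, not from
            | this thread; uses the stm package's Control.Concurrent.STM):
            | 
            |     import Control.Concurrent.STM
            |     
            |     -- Move 'amount' between two balances. Because the whole
            |     -- thing runs inside one 'atomically', other threads never
            |     -- observe the intermediate state where the money has left
            |     -- one account but not yet arrived in the other.
            |     transfer :: TVar Int -> TVar Int -> Int -> STM ()
            |     transfer from to amount = do
            |       modifyTVar' from (subtract amount)
            |       modifyTVar' to   (+ amount)
            |     
            |     main :: IO ()
            |     main = do
            |       a <- newTVarIO 100
            |       b <- newTVarIO 0
            |       atomically (transfer a b 25)
            |       mapM_ (\v -> readTVarIO v >>= print) [a, b]  -- 75, 25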
        
           | jasonwatkinspdx wrote:
            | Well, I'd say it's a bit muddled. Pure STM systems have seen
            | limited success, mostly on the JVM, particularly Clojure. But
            | as a technology it hasn't spread widely in the industry,
            | despite the fundamental advantage of composability. I
            | personally would attribute this to a sort of gulf created by
            | two common approaches:
           | 
           | 1. It's become very common to structure software as a
           | stateless service cluster backed by a stateful database
           | technology. Most of the databases used in this approach would
            | not be described as STM, but the net effect for the
            | application is similar, even if it involves more layers and
            | complexity.
           | 
            | 2. People have gotten pretty comfortable with simple mutex
            | patterns connected by queues of various sorts. This is a sort
            | of worse-is-better situation: a mutex-protected whatever in a
            | toy test or microbenchmark is simpler and far more efficient
            | than STM. However, a system composed of many of these
            | mutex-protected islands proves a whole different beast
            | indeed. STM has largely been criticized from the perspective
            | of the former rather than the latter.
           | 
           | There are many people who have made the observation that
           | transactional concurrency control, state of the art garbage
           | collection, and even file systems have been converging on
            | similar features. This is being driven by common constraints
            | on both sides: what the hardware provides and what humans
            | expect. In particular with persistent memory, I think you'll
            | see all three of these unify into a single design, because
            | systems that attempt to solve these problems separately will
            | be a very poor match for the hardware.
        
             | [deleted]
        
             | thesz wrote:
              | SPJ of Haskell fame once discussed the perceived STM
              | failure in .NET: https://haskell-cafe.haskell.narkive.com/t6LSdcoE/is-
              | there-a...
              | 
              | Basically, the poor STM performance in .NET comes down to
              | too much mutation and mutation-induced logging. This is why
              | Clojure's STM is much more widely used.
             | 
              | I have particularly deep experience in various Haskell
              | projects and I must insist that no Haskell project that is
              | (1) relatively big, (2) has even a hint of concurrency, and
              | (3) is developed by several people can go without STM.
             | 
              | I can use MVars myself for problem-space discovery. When
              | there are at least two of us, I will use TVars and insist
              | others should too.
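              | 
              | (A tiny sketch of the difference, using a hypothetical
              | bounded counter rather than anything from a real project:
              | with a TVar, the "wait until there is room" logic is just
              | another STM expression that composes with whatever else the
              | transaction does, whereas with MVars you end up
              | hand-rolling the blocking and the rollback.)
              | 
              |     import Control.Concurrent.STM
              |     
              |     -- Block (retry) until the counter is below the limit,
              |     -- then bump it. Callers can combine this with other
              |     -- STM actions inside a single 'atomically' block.
              |     incrementBounded :: TVar Int -> Int -> STM ()
              |     incrementBounded counter limit = do
              |       n <- readTVar counter
              |       check (n < limit)
              |       writeTVar counter (n + 1)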
        
             | barrkel wrote:
             | I think trying to scale out compute + mutable state across
             | multiple CPUs on a single box has somewhat fallen by the
             | wayside outside of specialized applications like databases
             | and distributed query engines, as a local maximum avoided
             | in favour of scalability via more boxes.
             | 
              | There are several forces behind this. Applications are more
             | likely to be web-based or services - i.e. inherently
             | distributed - and less likely to be desktop or traditional
             | client/server, where almost all compute happened on a
             | single server. As distributed services, maximizing
             | statelessness and avoiding mutable shared memory is key to
             | solving a lot of problems: scaling out (no shared memory),
             | redundancy (keep a second copy of the application logic
             | running somewhere, no sync required), recovery and
             | idempotence (if something fails, try again until it
             | succeeds - repeated attempts are safe).
             | 
             | Reliable persistent queues are part of that. They let you
             | bring services up and down and switch over without down
             | time, or restart after a failure and resume where they left
             | off.
             | 
             | The problems of shared mutable state are best kept in
             | specialized applications: databases, queuing systems,
             | distributed caches, consistent key-value stores. Keeping
             | state consistent in a distributed system is a genuinely
             | hard problem, and STM isn't much help, except perhaps as an
             | implementation detail in some of those specialized
             | applications.
        
               | jandrewrogers wrote:
               | For what it's worth, scaling mutable shared state across
               | multiple CPUs on a single box has fallen by the wayside
               | for databases too. Thread-per-core style software
               | architecture has become idiomatic, in part because it
               | efficiently scales up on machines with a very large
                | number of cores. Nothing about this precludes scale-out;
                | it just allows you to serve 10-100x the load on a single
                | box compared to the more typical naive design.
               | 
               | Web applications aren't special in this regard. We've
               | just normalized being incredibly wasteful of computing
               | resources when designing them, spinning up huge clusters
               | to serve a workload that could be easily satisfied with a
               | single machine (or a few machines for availability).
               | 
               | Premature scale-out, because we've forgotten how to
               | scale-up, is the root of much evil.
        
               | kvhdude wrote:
                | This has been my experience as well. I briefly shifted
                | from embedded systems programming to the web services
                | world. I found that the scale-up approach has been so
                | maligned that it is considered OK to spin up several
                | hundred EC2 instances running a (poorly written) Java
                | application instead of a single multi-core instance
                | running Erlang that performed much better. Some
                | frameworks even ran web services in PHP and spent all
                | their effort on not having the request reach the PHP
                | code, since it was only doing something like 10 reqs/sec.
        
             | foobiekr wrote:
             | I think (1) is really about where you land it.
             | 
              | I am not free to name the product, but I worked on a
              | product that has done billions of dollars in revenue and
              | that built a bunch of its state-management checkpointing on
              | a custom STM implementation for C (yes, C). It made so, so
              | many things straightforward in terms of corner cases; we
              | also extended transactionality to sync over the wire in
              | some cases.
             | 
             | I also think STM-adjacent designs are gaining some traction
             | - for example, FoundationDB looks an awful lot like an STM
             | system from the point of view of applications, much more
             | than it looks like a traditional database.
        
           | pjlegato wrote:
            | Clojure famously makes extensive use of software
            | transactional memory (STM) to provide painless
            | multithreading:
           | 
            | * https://clojure.org/reference/refs
            | * https://sw1nn.com/blog/2012/04/11/clojure-stm-what-why-how/
        
           | twoodfin wrote:
           | It's widely regarded as a dead end. Conceptually, it's quite
           | appealing so who knows, it might find new life someday.
           | Garbage collection was a niche technology for decades, as was
           | JIT compilation.
           | 
           | Joe Duffy's write-up on how STM failed at Microsoft with .NET
           | is illustrative:
           | 
           | http://joeduffyblog.com/2010/01/03/a-brief-retrospective-
           | on-...
        
             | PaulHoule wrote:
              | Right... it's not hard to invent some scheme that scales to
              | a 2-core computer, but it is not crazy to have 64 or more
              | cores these days, and even if you have just 2% serial
              | overhead from communication you are not going to scale past
              | 50x.
             | 
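              | (To make that bound explicit, as a quick Amdahl's-law
              | check, taking the 2% as the serial fraction s:
              | 
              |     speedup(N) = 1 / (s + (1 - s)/N)  <=  1/s = 1/0.02 = 50
              | 
              | so no number of cores N gets you past a 50x speedup.)
              | 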
             | The problems of making a system really scale as opposed to
             | just scale are bad enough that the best we know how to do
             | is give people simpler primitives that work -- each layer
             | you add to the system is another opportunity to add another
             | limitation to it.
             | 
             | Modern garbage collectors are fairly scalable (e.g. don't
             | break outright as the heap size increases) and any manual
             | memory management would also have the risk of scaling
             | problems, to the point where you might say that garbage
             | collection is more "scalable" when the thing being scaled
             | is the kind of arbitrary complexity that appears in
             | business applications.
             | 
              | What people complain about with garbage collection are: (1)
              | a fixed constant overhead they think is too high and will
              | probably always think too high, and (2) long pauses that
              | are manageable in practice unless you are timing events
              | with millisecond precision on a microcontroller.
             | 
              | Thus one of these things is mainstream and the other isn't.
        
               | tomp wrote:
                | > What people complain about with garbage collection are:
                | (1) a fixed constant overhead they think is too high and
                | will probably always think too high, and (2) long pauses
                | that are manageable in practice unless you are timing
                | events with millisecond precision on a microcontroller.
               | 
               | I keep hearing this about "modern" GC and it keeps being
               | very much not true in practice. I'm running IntelliJ IDEA
               | (Java) on a laptop with 8GB of RAM, and it keeps freezing
                | every so often. I suspect the GC is causing paging. This
                | is
               | something that is obviously avoided with reference
               | counting or manual memory management.
        
               | jhgb wrote:
               | > This is something that is obviously avoided with
               | reference counting
               | 
               | You'd prefer your data being larger in memory due to
               | refcount fields and slower to access due to atomic
               | operations on those fields?
        
               | tomp wrote:
               | It's not like tracing GC is without memory overhead...
               | 
                | Slower is a question of priorities. I'd definitely trade
                | slower _overall_ performance (I don't really need my IDE
                | to be _that_ performant... it's not like it's using 100%
                | CPU, or even 10%!) for a huge improvement in _latency_
                | (i.e. no catastrophic slowdowns / freezes).
        
               | jhgb wrote:
               | There are very few ways of having zero memory overhead,
               | though.
               | 
               | Also, most of the flaws of Java's memory management seem
               | to be concentrated in the part of the language that says
               | that almost everything has to be an object with a header
               | and on the heap. Remove that and your graph of objects to
               | trace shrinks very substantially.
        
               | withinboredom wrote:
               | Interestingly, I was running out of memory, so I gave it
               | more RAM (like 20GB) and it got even worse. Apparently it
               | takes a long time to GC that much memory. I'd also rather
               | amortize that pause. The loss of productivity due to a
               | sudden pause is real.
        
               | twoodfin wrote:
               | Yeah. As is pointed out down-thread of the linked post,
               | even when it was confined to a niche, garbage collection
               | did what it claimed it would (eliminate the need for
               | error-prone manual memory management) at an acceptable
               | cost for many systems & applications. JITs were similar.
               | 
               | So far, STM and HTM have followed a path more like VLIW
               | instruction sets: Conceptually appealing for their
               | promise of simplification, but in practice only
               | demonstrating success in specialized uses, with the
               | tradeoffs proving unattractive seemingly everywhere else
               | they've been tried.
        
               | sitkack wrote:
               | I think it is a false equivalence to say that STM and HTM
                | are like VLIW (which has been a resounding success
               | for DSP workloads). I am _not_ defending VLIW.
               | 
                | Transactions are extremely powerful conceptually, but
                | being composable without causing flapping requires a
                | higher level of coordination.
        
       | signa11 wrote:
        | Honestly, I found Torvalds' opinion quite useful:
       | https://www.realworldtech.com/forum/?threadid=201184&curpost...
        
         | RcouF1uZ4gsC wrote:
         | I thought the big win for Transactional Memory over locking was
         | composability. Locks don't compose. Whenever accessing data,
          | you have to think about all the locks, or else you end up with
          | a deadlock (or livelock).
         | 
         | The promise of transactional memory was that it composes and
         | you only have to think about your data access and not have to
         | worry about running into deadlocks.
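          | 
          | (A small sketch of that composability, in Haskell STM since
          | that's the most widely used implementation; the queue-draining
          | example is hypothetical, not from the thread: two independently
          | written transactions combine into one atomic action with
          | orElse, without knowing each other's locking discipline.)
          | 
          |     import Control.Concurrent.STM
          |     
          |     -- Pop from a list held in a TVar, blocking if it is empty.
          |     takeItem :: TVar [a] -> STM a
          |     takeItem q = do
          |       xs <- readTVar q
          |       case xs of
          |         []       -> retry
          |         (x:rest) -> writeTVar q rest >> return x
          |     
          |     -- Take from whichever queue has an item, as one transaction.
          |     takeEither :: TVar [a] -> TVar [a] -> STM a
          |     takeEither q1 q2 = takeItem q1 `orElse` takeItem q2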
        
           | hinkley wrote:
           | Borrow checkers feel like a bottom up approach to STM to me.
           | 
           | There's a lot of static analysis in the JVM that reduces
           | locking, speeds up garbage collection, and removes the need
           | to declare the type of a generic variable.
           | 
           | I expect over time we will see inference in borrow checkers
           | reduce the brain cycles you need to spend thinking about
           | these things, and borrow semantics will be "good enough" for
           | a long time.
           | 
           | Or who knows, maybe STM systems will be able to infer borrow
           | semantics and benefit from aggressive lock elision.
        
       | slver wrote:
        | Memory is allocated in pages, which are split into subpages,
        | etc., and then you throw heaps, queues, linked lists and so on
        | on top of that.
        | 
        | None of this structure exists in the memory itself. So you need
        | transactions either at the page level, which is too crude, or at
        | the byte level, which is too granular.
       | 
       | Hardware is simply the wrong level to solve this problem at.
       | Unless the hardware starts understanding how we use memory way,
       | way, way, way better than it does right now.
        
         | drivebycomment wrote:
          | You have misunderstood what transactional memory is. Think of
          | it as a database transaction applied to memory operations. Just
          | as a database transaction makes a collection of reads, writes,
          | and computation based on them appear atomic with respect to
          | other transactions and reads/writes, transactional memory
          | generally aims to give a section of instructions transactional
          | semantics for all the loads, stores, and other instructions in
          | the transaction. Most transactional memory proposals are not at
          | page-level granularity (though obviously most proposals have
          | certain limits), and byte level being "too granular" doesn't
          | even make any sense.
        
       | bigmattystyles wrote:
        | Does this all go back to a derivative of the halting problem? You
        | have an abstraction, and a computer must make a decision about
        | what you might ever want to do; but while _you_ may never intend
        | for a set of conditions to occur, the computer cannot 'know'
        | that. And the abstraction doesn't offer a way for you to specify
        | your conditions in a more optimal manner.
        
       | exabrial wrote:
       | Quarkus on the JVM has an interesting implementation of TX
       | Memory, and it's integrated with the JTA global XA manager, which
       | really gives it some cool capabilities.
        
       ___________________________________________________________________
       (page generated 2021-05-07 23:01 UTC)