[HN Gopher] Synchronization Is Bad for Scale
       ___________________________________________________________________
        
       Synchronization Is Bad for Scale
        
       Author : thunderbong
       Score  : 54 points
       Date   : 2024-07-07 17:03 UTC (5 hours ago)
        
 (HTM) web link (wippler.dev)
 (TXT) w3m dump (wippler.dev)
        
       | kccqzy wrote:
       | The author abandoned the distributed lock service he/she was
       | writing, and might benefit from reading the original Google
       | paper on their distributed lock service Chubby to understand
       | why it is so enduring at Google:
       | https://static.googleusercontent.com/media/research.google.c...
       | 
       | > I abandoned the project after it very quickly become apparent
       | that despite having written the service in this super fast, brand
       | new language called golang, the service just wasn't fast enough
       | to handle the scale we threw at it.
       | 
       | This makes me think the author wishes to use the distributed lock
       | service for some purpose that's not well served by distributed
       | locks. It's not that distributed locks are bad, it's just that
       | the author seems to have a particular use case already in mind
       | that's poorly suited to a distributed lock service.
        
         | mgaunard wrote:
         | The mistake was using golang apparently.
         | 
         | If you need high-performance and fine control of
         | synchronization, just use a low-level systems programming
         | language.
        
           | Animats wrote:
           | You can get into that type of lock congestion trouble in any
           | language. It's an algorithm problem, not a language problem.
           | 
           | I discovered last year that Wine has terrible internal lock
           | problems inside its user-side storage allocator. That's in C.
           | If you have enough threads calling "realloc", the allocator
           | goes into futex congestion collapse and performance drops by
           | two orders of magnitude. My graphics program went from 60 FPS
           | to 0.5 FPS. They optimized too hard for the no-congestion
           | case.
           | 
           | This is a Wine-only problem; Microsoft's own code doesn't
           | have this problem.
           | 
           | I've had lock congestion problems in Rust. Sometimes you need
           | a fair mutex, or something gets frozen out. Both fair and
           | non-fair mutexes are available; see the "parking_lot" crate.
           | 
           | There's a place inside WGPU that has a lock congestion
           | problem in one of three locks, and I'm going to have to add
           | more profiling to someone else's code to find that. I can see
           | the problem with Tracy, but need to add more profiling scopes
           | to narrow it down.
           | 
           | But that is high-performance graphics stuff, where
           | microseconds count. Sending spam (OK, bulk marketing emails)
           | doesn't need to be that tightly coupled. Mailing list removal
           | runs on a timescale of days, not milliseconds. What else in
           | that space has to be tightly interlocked?
        
             | forrestthewoods wrote:
             | > I can see the problem with Tracy, but need to add more
             | profiling scopes to narrow it down.
             | 
             | If you can run on Windows try Superluminal.
             | 
             | https://superluminal.eu/
        
             | mgaunard wrote:
             | A good language just gives you the necessary tooling to do
             | whatever you want, it doesn't magically fix problems.
             | 
             | Only languages like C++ have a memory model that allows you
             | to do lock-free programming for example (C and Rust copied
             | the C++ model).
             | 
             | Also, what kind of serious person allocates memory from the
             | system allocator in a real-time loop? Your problems seem
             | self-inflicted. Regardless there are many allocators that
             | optimize for concurrent allocations: tcmalloc, jemalloc,
             | mimalloc...
        
             | rowanG077 wrote:
             | Considering the vast number of programs that wine works
             | extremely well with I'm not so sure they spent too much
             | optimizing the no-congestion case. You are just doing
             | something extremely quirky in your program.
        
           | faitswulff wrote:
           | I suspect that when it was chosen at Mailgun, Golang was
           | still being billed as a systems programming language.
        
             | neonsunset wrote:
             | Exactly.
             | 
             | Go makes you think you control the details except you
             | don't. Hackernews makes you think you don't control the
             | details in C# except you do.
             | 
             | Yet another project that would have been able to solve its
             | woes if it had picked a better option.
        
           | bee_rider wrote:
           | Does the language make a huge difference here? In a
           | distributed system a signal to be sent over the network
           | travels at the same speed whether it was transmitted by a C
           | or Python program, right?
        
           | jeffbee wrote:
           | It would take you 1 minute to write a Go application that
           | exploits per-CPU data structures and probably years to debug
           | your attempt to do so in C.
        
             | convolvatron wrote:
             | actually it's really _hard_ in go to make cpu bound control
             | flow, state, and allocation. do goroutines have any notion
             | of locality? i've been looking and haven't been able to
             | find anything
        
               | jeffbee wrote:
               | Go comes out of the box with sync.Pool
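
A minimal sketch of the sync.Pool usage mentioned above, for readers who
have not used it. The render helper and the worker names are hypothetical.
sync.Pool keeps per-P caches internally, which is probably the per-CPU
aspect being referred to, but it does not pin goroutines to CPUs, so it
only partly answers the locality question.

```go
package main

import (
	"bytes"
	"fmt"
	"sync"
)

// bufPool hands out reusable *bytes.Buffer values. New is called only
// when the pool has nothing cached for the calling goroutine to reuse.
var bufPool = sync.Pool{
	New: func() any { return new(bytes.Buffer) },
}

// render is a hypothetical helper: it builds a greeting in a pooled
// buffer instead of allocating a fresh buffer on every call.
func render(name string) string {
	buf := bufPool.Get().(*bytes.Buffer)
	buf.Reset() // a pooled buffer may still hold data from a previous use
	defer bufPool.Put(buf)

	buf.WriteString("hello, ")
	buf.WriteString(name)
	return buf.String() // String copies, so returning the buffer is safe
}

func main() {
	var wg sync.WaitGroup
	for i := 0; i < 4; i++ {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			fmt.Println(render(fmt.Sprintf("worker-%d", id)))
		}(i)
	}
	wg.Wait()
}
```

Pooled objects can be dropped by the garbage collector at any time, so
this suits scratch buffers, not state that must survive between calls.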
        
         | candiddevmike wrote:
         | What's an OSS equivalent to Chubby? Etcd?
        
           | antoinealb wrote:
           | etcd is fairly similar to Chubby indeed, or Zookeeper.
        
           | zX41ZdbW wrote:
           | ZooKeeper or ClickHouse Keeper, Etcd, Consul.
        
           | chucky_z wrote:
           | Consul has an entire locking system that's based on Chubby in
           | the KV system. I've used it at pretty big scales and it seems
           | fine.
           | 
           | If you want actual OSS you can use an older release, the KV
           | system guts haven't changed much in a while iirc.
        
         | jeffbee wrote:
         | > author wishes to use the distributed lock service for some
         | purpose that's not well served by distributed locks
         | 
         | Exactly. It should be clear that a distributed lock service has
         | a finite, and low, overall rate of progress. It obviously
         | cannot be in the critical path of every transaction globally.
         | But when using it for events which rarely happen, such as
         | electing a new master of a partition of a database, or some
         | other thing that happens once a week, then the low throughput
         | is not an issue.
        
         | jimmaswell wrote:
         | I don't know why they'd expect Go to be a panacea. It has
         | everything to do with architecture.
        
       | neonsunset wrote:
       | Because Go is ill-suited for projects like these :)
       | 
       | Just like synchronization, Go scales poorly with project
       | complexity and aspirations. It is unable to efficiently take
       | advantage of beefy many-core nodes, its GC is bad at achieving
       | high throughput and the language itself is incapable of providing
       | zero-cost abstractions and non-zero cost abstractions end up
       | being more expensive than in other compiled languages.
       | 
       | Should have picked .NET for scalability and low-level tuning. Or
       | Rust if higher engineering cost is acceptable.
        
         | thanhhaimai wrote:
         | These are a lot of extraordinary claims. I would love to learn
         | more about the evidence and what makes you think this way. There
         | are many successful companies that built scalable services with
         | golang, so the burden of proof for the above claims is high.
        
           | neonsunset wrote:
           | "Set 50 instead of 4 to replica count" and "continues scaling
           | when you give it something that isn't anemic 2CPU 512Mi
           | container" are two very different things :)
           | 
           | In either case, seeing undeserved praise of Go is the most
           | expected outcome from the audience that used to praise e.g.
           | Elizabeth Holmes.
           | 
           | Please do look at the way its internals work, and compare the
           | compiler output (and I mean not the useless ASM that Go's
           | disasm outputs but what is shown by e.g. Ghidra) and
           | primitive overhead with Rust, C# and even Kotlin.
           | 
           | Also hearing about Go's concurrent primitives makes me laugh.
           | CSP is a 40-year-old concept, and Go bolted itself to it,
           | while also learning nothing from other languages to be
           | modern, therefore can't be effectively used for more advanced
           | concurrency scenarios nor enables you to remove overhead when
           | needed.
           | 
           | Try writing a concurrent data structure in Go that performs
           | as fast as in C++. C# lets you do that. Go - not so much.
        
             | kbolino wrote:
             | A language that officially recommends "don't communicate by
             | sharing memory; share memory by communicating" is obviously
             | not meant to implement concurrent data structures well.
             | This means, in some cases, it is indeed the wrong language
             | to use. Go is no longer positioned at the low-level
             | "systems programming" segment and hasn't been since not
             | long after its release. It's better to think of it as
             | "compiled concurrent Python" than "a competitor to Rust".
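
As a rough illustration of the "share memory by communicating" style
quoted above, here is a small sketch in which one goroutine owns a map and
workers send it words over a channel. The countWords function and its
parameters are invented for this example. The owner goroutine is still a
single serialization point, so this changes how the synchronization is
expressed; it does not remove it.

```go
package main

import (
	"fmt"
	"sync"
)

// countWords is a hypothetical example. Only the owner goroutine ever
// touches the counts map, so the map itself needs no mutex.
func countWords(words []string, workers int) map[string]int {
	in := make(chan string)
	out := make(chan map[string]int)

	// Owner goroutine: receives words until the channel is closed,
	// then hands the finished map back.
	go func() {
		counts := make(map[string]int)
		for w := range in {
			counts[w]++
		}
		out <- counts
	}()

	// Workers: each takes every workers-th word and sends it to the owner.
	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func(start int) {
			defer wg.Done()
			for j := start; j < len(words); j += workers {
				in <- words[j]
			}
		}(i)
	}
	wg.Wait()
	close(in)
	return <-out
}

func main() {
	counts := countWords([]string{"a", "b", "a", "c", "a"}, 3)
	fmt.Println(counts["a"], counts["b"], counts["c"])
}
```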
        
           | guilhas wrote:
           | Even the article solves the problem with Go still
        
         | BobbyJo wrote:
         | > Just like synchronization, Go scales poorly with project
         | complexity and aspirations.
         | 
         | I've had the opposite experience. Compared to C++ codebases
         | I've worked on, Golang has way less "complexity per feature".
         | 
         | > It is unable to efficiently take advantage of beefy many-core
         | nodes
         | 
         | Go had the best built-in concurrency primitives of any language
         | I've used (python, java, rust, c++/c).
         | 
         | > its GC is bad at achieving high throughput
         | 
         | Isn't this just the nature of using GC? You give up control
         | over when some work gets done, of course it affects throughput.
         | 
         | > incapable of providing zero-cost abstractions
         | 
         | ime, bad abstractions slow systems of any appreciable
         | complexity a lot more than the cost of the abstractions in
         | latency.
         | 
         | > non-zero cost abstractions end up being more expensive than
         | in other compiled languages.
         | 
         | Do you consider Java a compiled language? I mean, of course it's
         | slower than rust and c/++, as it has a more complex/abstracted
         | runtime. But last I heard, it was on par, or even faster, than
         | java.
        
           | TylerE wrote:
           | Plus the usual mistake of assuming (re: GC) that the system
           | (de)allocators are free. They aren't. I strongly suspect
           | that GCs are extremely competitive in most non-trivial
           | applications.
           | 
           | Don't judge every GC based on experience with the Sun JRE 20
           | years ago.
        
             | neonsunset wrote:
             | They are, but not so much in the case of Go - it is
             | optimized for small containers with few CPU cores, that do
             | not need to sustain (inevitable) high allocation
             | throughput.
             | 
             | This is what happens when it's in different conditions, in
             | a test that is still very friendly to Go using the stack
             | which it's supposed to be very strong at:
             | https://github.com/LesnyRumcajs/grpc_bench/discussions/441
             | Note the CPU % usage. The numbers only get more interesting
             | once you start using 64 core hosts.
             | 
             | You could also look at a write
             | barrier/synchronization/allocation sensitive microbenchmark
             | instead, which paints a way less positive picture:
             | https://benchmarksgame-team.pages.debian.net/benchmarksgame/...
             | (special props to OpenJDK's write barrier elision, this is
             | something .NET needs to do better at still)
        
           | jpgvm wrote:
           | > it was on par, or even faster, than java.
           | 
           | This is rarely the case outside of very simplified cases.
           | 
           | Golang prioritises compilation speed over optimisation.
           | Unlike Rust/C/C++, which use one of the heavily
           | optimising IR compilers (gcc or LLVM), Golang uses neither,
           | preferring to implement piecemeal the optimisations that work
           | well on relatively simple code - in order to keep the
           | compiler very fast and simple.
           | 
           | Java is the opposite of this. It has multiple compilation
           | stages, the first of which is the bytecode compiler javac but
           | then also the runtime JIT compiler, both of which favor peak
           | performance (i.e optimisation).
           | 
           | Golang can appear faster on many micro-benchmarks, but once
           | any highly numeric load or remotely generic code using
           | interfaces etc. becomes involved, Java is going to come out
           | on top and it won't be particularly close.
           | 
           | This isn't to say you can't write very fast Golang code, you
           | can. If you understand (or simply inspect generated
           | code/profile) what the compiler can and can't do for you you
           | can coax it into making fast machine code but you need to be
           | vigilant and modifying tight loops can be more troublesome
           | than equivalent fast Java code.
           | 
           | I have written very fast stuff in both languages and if I
           | needed to pick a language for peak runtime perf it would be
           | Java out of those two every time.
           | 
           | Note this is completely ignoring the very weak throughput of
           | the Golang GC which is another easy way to get into very bad
           | performance problems with Golang that can be hard to resolve.
        
       | 23B1 wrote:
       | I'm curious if HN readers see articles like this and transpose
       | the logic to other areas - like management or leadership. Open
       | question, just curious.
        
         | tinix wrote:
         | yep this is a physical problem as much as it's logical. but I'm
         | very much an abstract systems thinker.
         | 
         | it reminds me of lean wastes... kinda.
         | 
         | organizing better and having autonomous units of work would be
         | more efficient than bottlenecking decisions to a shared
         | resource.
         | 
         | this applies to social systems the same as physical ones, for
         | sure.
        
       | dig1 wrote:
       | Without more details about the project requirements, my initial
       | impression is that the author could get most of these features
       | from Zookeeper, which is designed for stuff like distributed
       | locks and synchronization.
       | 
       | However, I've seen the pattern of using the database for
       | complete system synchronization in multiple projects. I don't
       | have to say how bad this is - probably easy to start with but a
       | nightmare to scale later, just like the author mentioned in the
       | article.
        
       | ibash wrote:
       | Did I miss it, or did he never say what problem he was trying
       | to solve?
       | 
       | Any solution can be bad at scale if it doesn't fit your
       | problem... unfortunately there's no way to know without stating
       | the problem.
        
       | sitkack wrote:
       | I would use Aerospike, it has strong consistency or session
       | consistency and will soon get multirecord transactions.
       | 
       | https://dbdb.io/db/aerospike
       | 
       | And also see the second bullet point from this Guy Steele talk
       | from 2010 Strange Loop. Synchronization and ordering and a lack
       | of idempotency need to be heavily justified.
       | 
       | "How to Think about Parallel Programming: Not!" - Guy L. Steele
       | Jr. (Strange Loop 2010)
       | https://www.youtube.com/live/dPK6t7echuA?app=desktop&t=1747s
       | 
       | transcript
       | https://github.com/matthiasn/talk-transcripts/blob/master/St...
       | thanks matthiasn!
       | 
       | Previous discussion https://news.ycombinator.com/item?id=2105661
        
       | tigron wrote:
       | Synchronization scales for redundancy purposes unless you are
       | single core.
        
         | mrkeen wrote:
         | or your readers are blocked by your writer
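
One common way to keep readers from being blocked by a writer is to
publish immutable snapshots and swap a pointer atomically. The sketch
below assumes Go 1.19 or later for atomic.Pointer; the Store and Config
types are hypothetical. Every write copies the whole map, so this only
makes sense for read-mostly data.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// Config is a hypothetical read-mostly value. A published Config is
// never mutated again.
type Config struct {
	Limits map[string]int
}

// Store lets readers do a single atomic load without waiting on writers.
// Writers serialize among themselves with mu.
type Store struct {
	mu  sync.Mutex
	cur atomic.Pointer[Config]
}

func NewStore() *Store {
	s := &Store{}
	s.cur.Store(&Config{Limits: map[string]int{}})
	return s
}

// Get reads from whichever snapshot is current at the time of the load.
func (s *Store) Get(key string) (int, bool) {
	v, ok := s.cur.Load().Limits[key]
	return v, ok
}

// Set copies the current snapshot, applies the change to the copy, and
// publishes the copy. Readers holding the old snapshot are unaffected.
func (s *Store) Set(key string, val int) {
	s.mu.Lock()
	defer s.mu.Unlock()

	old := s.cur.Load()
	next := &Config{Limits: make(map[string]int, len(old.Limits)+1)}
	for k, v := range old.Limits {
		next.Limits[k] = v
	}
	next.Limits[key] = val
	s.cur.Store(next)
}

func main() {
	s := NewStore()
	s.Set("rps", 100)
	v, ok := s.Get("rps")
	fmt.Println(v, ok)
}
```

The trade-off is that readers may briefly see a slightly stale snapshot,
which is often acceptable for configuration-style data.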
        
       ___________________________________________________________________
       (page generated 2024-07-07 23:00 UTC)