[HN Gopher] Synchronization Is Bad for Scale
___________________________________________________________________
Synchronization Is Bad for Scale
Author : thunderbong
Score : 54 points
Date : 2024-07-07 17:03 UTC (5 hours ago)
(HTM) web link (wippler.dev)
(TXT) w3m dump (wippler.dev)
| kccqzy wrote:
| The author abandoned the distributed lock service he/she was
| writing; it might benefit them to read the original Google paper
| on Chubby, Google's distributed lock service, to understand why
| it has endured there:
| https://static.googleusercontent.com/media/research.google.c...
|
| > I abandoned the project after it very quickly became apparent
| that despite having written the service in this super fast, brand
| new language called golang, the service just wasn't fast enough
| to handle the scale we threw at it.
|
| This makes me think the author wanted to use the distributed
| lock service for some purpose that's not well served by
| distributed locks. It's not that distributed locks are bad;
| it's that the author seems to have had a particular use case
| in mind that a distributed lock service is poorly suited to.
| mgaunard wrote:
| The mistake was using golang, apparently.
|
| If you need high performance and fine control over
| synchronization, just use a low-level systems programming
| language.
| Animats wrote:
| You can get into that type of lock congestion trouble in any
| language. It's an algorithm problem, not a language problem.
|
| I discovered last year that Wine has terrible internal lock
| problems inside its user-side storage allocator. That's in C.
| If you have enough threads calling "realloc", the allocator
| goes into futex congestion collapse and performance drops by
| two orders of magnitude. My graphics program went from 60 FPS
| to 0.5 FPS. They optimized too hard for the no-congestion
| case.
|
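| The shape of that collapse is easy to reproduce in any
| language. A minimal Go sketch (illustrative only, nothing to
| do with Wine's code): hammer one shared mutex from n
| goroutines and watch aggregate throughput fall once n
| exceeds the core count.
|
|     package main
|
|     import (
|         "fmt"
|         "sync"
|         "time"
|     )
|
|     // contend hammers one shared mutex from n goroutines for
|     // a fixed interval and returns total operations done.
|     func contend(n int) int64 {
|         var (
|             mu  sync.Mutex
|             ops int64
|             wg  sync.WaitGroup
|         )
|         deadline := time.Now().Add(500 * time.Millisecond)
|         for i := 0; i < n; i++ {
|             wg.Add(1)
|             go func() {
|                 defer wg.Done()
|                 for time.Now().Before(deadline) {
|                     mu.Lock()
|                     ops++ // trivial critical section
|                     mu.Unlock()
|                 }
|             }()
|         }
|         wg.Wait()
|         return ops
|     }
|
|     func main() {
|         // Past the core count, most time goes to futex
|         // parking/unparking rather than useful work.
|         for _, n := range []int{1, 4, 16, 64, 256} {
|             fmt.Printf("goroutines=%3d ops=%d\n", n, contend(n))
|         }
|     }
|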
| This is a Wine-only problem; Microsoft's own code doesn't
| suffer from it.
|
| I've had lock congestion problems in Rust. Sometimes you need
| a fair mutex, or something gets frozen out. Both fair and
| non-fair mutexes are available; see the "parking_lot" crate.
|
| There's a place inside WGPU that has a lock congestion
| problem in one of three locks, and I'm going to have to add
| more profiling to someone else's code to find that. I can see
| the problem with Tracy, but need to add more profiling scopes
| to narrow it down.
|
| But that is high-performance graphics stuff, where
| microseconds count. Sending spam (OK, bulk marketing emails)
| doesn't need to be that tightly coupled. Mailing list removal
| runs on a timescale of days, not milliseconds. What else in
| that space has to be tightly interlocked?
| forrestthewoods wrote:
| > I can see the problem with Tracy, but need to add more
| profiling scopes to narrow it down.
|
| If you can run on Windows try Superluminal.
|
| https://superluminal.eu/
| mgaunard wrote:
| A good language just gives you the necessary tooling to do
| whatever you want; it doesn't magically fix problems.
|
| Only languages like C++ have a memory model that lets you
| do lock-free programming, for example (C and Rust copied
| the C++ model).
|
| Also, what kind of serious person allocates memory from the
| system allocator in a real-time loop? Your problems seem
| self-inflicted. Regardless, there are many allocators that
| optimize for concurrent allocation: tcmalloc, jemalloc,
| mimalloc...
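|
| To make "lock-free programming" concrete, here is the
| classic Treiber stack in CAS-retry style - sketched in Go
| for consistency with the rest of the thread, purely as an
| illustration of the technique:
|
|     package main
|
|     import (
|         "fmt"
|         "sync/atomic"
|     )
|
|     // node is one cell of a Treiber stack, the classic
|     // lock-free structure built on compare-and-swap.
|     type node struct {
|         val  int
|         next *node
|     }
|
|     type stack struct {
|         head atomic.Pointer[node]
|     }
|
|     // push retries CAS until it installs the new head.
|     func (s *stack) push(v int) {
|         n := &node{val: v}
|         for {
|             old := s.head.Load()
|             n.next = old
|             if s.head.CompareAndSwap(old, n) {
|                 return
|             }
|         }
|     }
|
|     // pop removes the head, retrying if another goroutine
|     // won the race between Load and CAS.
|     func (s *stack) pop() (int, bool) {
|         for {
|             old := s.head.Load()
|             if old == nil {
|                 return 0, false
|             }
|             if s.head.CompareAndSwap(old, old.next) {
|                 return old.val, true
|             }
|         }
|     }
|
|     func main() {
|         var s stack
|         s.push(1)
|         s.push(2)
|         v, _ := s.pop()
|         fmt.Println(v) // prints 2
|     }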
| rowanG077 wrote:
| Considering the vast number of programs that Wine works
| extremely well with, I'm not so sure they spent too much
| time optimizing the no-congestion case. You are just doing
| something extremely quirky in your program.
| faitswulff wrote:
| I suspect that when it was chosen at Mailgun, Golang was
| still being billed as a systems programming language.
| neonsunset wrote:
| Exactly.
|
| Go makes you think you control the details, except you
| don't. Hacker News makes you think you don't control the
| details in C#, except you do.
|
| Yet another project that would have been able to solve its
| woes if it had picked a better option.
| bee_rider wrote:
| Does the language make a huge difference here? In a
| distributed system, a signal sent over the network travels
| at the same speed whether it was transmitted by a C or a
| Python program, right?
| jeffbee wrote:
| It would take you 1 minute to write a Go application that
| exploits per-CPU data structures and probably years to debug
| your attempt to do so in C.
| convolvatron wrote:
| Actually it's really _hard_ in Go to make CPU-bound control
| flow, state, and allocation. Do goroutines have any notion
| of locality? I've been looking and haven't been able to
| find anything.
| jeffbee wrote:
| Go comes out of the box with sync.Pool
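|
| A minimal sketch of how that is typically used, assuming a
| hypothetical request handler: sync.Pool shards its free
| lists per P (roughly per CPU), so concurrent Get/Put calls
| mostly stay core-local and rarely contend.
|
|     package main
|
|     import (
|         "bytes"
|         "fmt"
|         "sync"
|     )
|
|     // bufPool hands out reusable buffers; sync.Pool shards
|     // its storage per P, so Gets rarely cross cores.
|     var bufPool = sync.Pool{
|         New: func() any { return new(bytes.Buffer) },
|     }
|
|     func handle(msg string) string {
|         buf := bufPool.Get().(*bytes.Buffer)
|         defer func() {
|             buf.Reset()      // make the buffer safe to reuse
|             bufPool.Put(buf) // return it to the local shard
|         }()
|         buf.WriteString("processed: ")
|         buf.WriteString(msg)
|         return buf.String()
|     }
|
|     func main() {
|         fmt.Println(handle("hello"))
|     }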
| candiddevmike wrote:
| What's an OSS equivalent to Chubby? Etcd?
| antoinealb wrote:
| etcd is indeed fairly similar to Chubby; so is ZooKeeper.
| zX41ZdbW wrote:
| ZooKeeper or ClickHouse Keeper, Etcd, Consul.
| chucky_z wrote:
| Consul's KV system has an entire locking system based on
| Chubby. I've used it at pretty big scales and it seems
| fine.
|
| If you want actual OSS you can use an older release; the KV
| system's guts haven't changed much in a while, IIRC.
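|
| For illustration, acquiring one of those locks from Go
| looks roughly like this (a sketch against the
| github.com/hashicorp/consul/api client; the key name is
| made up):
|
|     package main
|
|     import (
|         "log"
|
|         "github.com/hashicorp/consul/api"
|     )
|
|     func main() {
|         client, err := api.NewClient(api.DefaultConfig())
|         if err != nil {
|             log.Fatal(err)
|         }
|
|         // A lock built around a single KV key (name made up).
|         lock, err := client.LockKey("service/demo/leader")
|         if err != nil {
|             log.Fatal(err)
|         }
|
|         // Lock blocks until acquired; the returned channel
|         // closes if the lock is lost (e.g. session expiry),
|         // so real code should watch it.
|         lostCh, err := lock.Lock(nil)
|         if err != nil {
|             log.Fatal(err)
|         }
|         _ = lostCh
|
|         log.Println("holding lock; do leader-only work here")
|
|         if err := lock.Unlock(); err != nil {
|             log.Fatal(err)
|         }
|     }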
| jeffbee wrote:
| > author wishes to use the distributed lock service for some
| purpose that's not well served by distributed locks
|
| Exactly. It should be clear that a distributed lock service has
| a finite, and low, overall rate of progress. It obviously
| cannot be in the critical path of every transaction globally.
| But when it is used for rare events, such as electing a new
| master for a database partition or something else that happens
| once a week, the low throughput is not an issue.
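|
| To make the rare-event use concrete, here is a
| leader-election sketch against etcd's
| go.etcd.io/etcd/client/v3 concurrency package (the endpoint
| and names are made up):
|
|     package main
|
|     import (
|         "context"
|         "log"
|
|         clientv3 "go.etcd.io/etcd/client/v3"
|         "go.etcd.io/etcd/client/v3/concurrency"
|     )
|
|     func main() {
|         cli, err := clientv3.New(clientv3.Config{
|             Endpoints: []string{"localhost:2379"},
|         })
|         if err != nil {
|             log.Fatal(err)
|         }
|         defer cli.Close()
|
|         // A session is a lease that dies with this process.
|         sess, err := concurrency.NewSession(cli)
|         if err != nil {
|             log.Fatal(err)
|         }
|         defer sess.Close()
|
|         // Campaign blocks until this node becomes leader;
|         // the lock service is only touched on (rare)
|         // elections, never per transaction.
|         e := concurrency.NewElection(sess, "/election/db7")
|         err = e.Campaign(context.Background(), "node-1")
|         if err != nil {
|             log.Fatal(err)
|         }
|         log.Println("elected master; doing leader-only work")
|     }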
| jimmaswell wrote:
| I don't know why they'd expect Go to be a panacea. It has
| everything to do with architecture.
| neonsunset wrote:
| Because Go is ill-suited for projects like these :)
|
| Just like synchronization, Go scales poorly with project
| complexity and aspirations. It is unable to take efficient
| advantage of beefy many-core nodes, its GC is bad at achieving
| high throughput, and the language itself is incapable of
| providing zero-cost abstractions; the non-zero-cost abstractions
| it does provide end up more expensive than in other compiled
| languages.
|
| Should have picked .NET for scalability and low-level tuning. Or
| Rust if higher engineering cost is acceptable.
| thanhhaimai wrote:
| These are a lot of extraordinary claims. I would love to learn
| more about the evidence and what makes you think this way. Many
| successful companies have built scalable services with golang,
| so the burden of proof for the above claims is high.
| neonsunset wrote:
| "Set 50 instead of 4 to replica count" and "continues scaling
| when you give it something that isn't anemic 2CPU 512Mi
| container" are two very different things :)
|
| In either case, seeing undeserved praise of Go is the most
| expected outcome from the audience that used to praise e.g.
| Elizabeth Holmes.
|
| Please do look at the way its internals work, and compare the
| compiler output (and I mean not the useless ASM that Go's
| disasm outputs but what is shown by e.g. Ghidra) and
| primitive overhead with Rust, C# and even Kotlin.
|
| Also hearing about Go's concurrent primitives makes me laugh.
| CSP is a 40 years old concept, and Go bolted itself to it,
| while also learning nothing from other languages to be
| modern, therefore can't be effectively used for more advanced
| concurrency scenarios nor enables you to remove overhead when
| needed.
|
| Try writing a concurrent data structure in Go that performs
| as fast as in C++. C# lets you do that. Go - not so much.
| kbolino wrote:
| A language that officially recommends "don't communicate by
| sharing memory; share memory by communicating" is obviously
| not meant to implement concurrent data structures well.
| This means, in some cases, it is indeed the wrong language
| to use. Go is no longer positioned in the low-level
| "systems programming" segment, and hasn't been since
| shortly after its release. It's better to think of it as
| "compiled concurrent Python" than as "a competitor to
| Rust".
| guilhas wrote:
| Even the article still ends up solving the problem with Go.
| BobbyJo wrote:
| > Just like synchronization, Go scales poorly with project
| complexity and aspirations.
|
| I've had the opposite experience. Compared to C++ codebases
| I've worked on, Golang has way less "complexity per feature".
|
| > It is unable to efficiently take advantage of beefy many-core
| nodes
|
| Go has the best built-in concurrency primitives of any language
| I've used (Python, Java, Rust, C/C++).
|
| > its GC is bad at achieving high throughput
|
| Isn't this just the nature of using a GC? You give up control
| over when some work gets done; of course that affects
| throughput.
|
| > incapable of providing zero-cost abstractions
|
| IME, bad abstractions slow systems of any appreciable
| complexity down far more than the latency cost of the
| abstractions themselves.
|
| > non-zero cost abstractions end up being more expensive than
| in other compiled languages.
|
| Do you consider Java a compiled language? I mean, of course
| Go is slower than Rust and C/C++, as it has a more complex,
| abstracted runtime. But last I heard, it was on par with, or
| even faster than, Java.
| TylerE wrote:
| Plus the usual mistake of assuming (re: GC) that the system
| (de)allocators are free. They aren't. I strongly suspect
| that GCs are extremely competitive in most non-trivial
| applications.
|
| Don't judge every GC based on experience with the Sun JRE 20
| years ago.
| neonsunset wrote:
| They are, but not so much in the case of Go - it is
| optimized for small containers with few CPU cores, which do
| not need to sustain the (inevitable) high allocation
| throughput.
|
| This is what happens under different conditions, in a test
| that is still very friendly to Go's supposed strength of
| keeping data on the stack:
| https://github.com/LesnyRumcajs/grpc_bench/discussions/441
| Note the CPU % usage. The numbers only get more interesting
| once you start using 64-core hosts.
|
| You could also look at a write-barrier-, synchronization-
| and allocation-sensitive microbenchmark instead, which
| paints a far less positive picture:
| https://benchmarksgame-team.pages.debian.net/benchmarksgame/...
| (special props to OpenJDK's write-barrier elision; this is
| something .NET still needs to do better at)
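|
| For completeness, Go does expose a couple of knobs for
| trading memory against GC throughput; a minimal sketch (the
| values here are arbitrary):
|
|     package main
|
|     import (
|         "fmt"
|         "runtime/debug"
|     )
|
|     func main() {
|         // Let the heap grow 400% between collections instead
|         // of the default 100%, trading memory for fewer GC
|         // cycles and thus higher throughput.
|         oldPct := debug.SetGCPercent(400)
|
|         // Cap the total heap at 4 GiB so the looser GOGC
|         // target cannot run away on a big host (Go 1.19+).
|         oldLimit := debug.SetMemoryLimit(4 << 30)
|
|         fmt.Println("was GOGC:", oldPct, "limit:", oldLimit)
|     }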
| jpgvm wrote:
| > it was on par with, or even faster than, Java.
|
| This is rarely true outside of very simplified cases.
|
| Golang prioritises compilation speed over optimisation.
| Unlike Rust/C/C++, which use one of the heavily optimising
| IR-based compilers (GCC or LLVM), Golang uses neither,
| preferring to implement piecemeal the optimisations that
| work well on relatively simple code, in order to keep the
| compiler very fast and simple.
|
| Java is the opposite of this. It has multiple compilation
| stages - first the bytecode compiler javac, then the
| runtime JIT compiler - and the pipeline as a whole favours
| peak performance (i.e. optimisation).
|
| Golang can appear faster on many micro-benchmarks, but once
| a heavily numeric load or remotely generic code using
| interfaces etc. becomes involved, Java is going to come out
| on top, and it won't be particularly close.
|
| This isn't to say you can't write very fast Golang code -
| you can. If you understand (or simply inspect the generated
| code and profile) what the compiler can and can't do for
| you, you can coax it into emitting fast machine code, but
| you need to be vigilant, and modifying tight loops can be
| more troublesome than in equivalent fast Java code.
|
| I have written very fast stuff in both languages, and if I
| needed to pick one of the two for peak runtime perf, it
| would be Java every time.
|
| Note this completely ignores the very weak throughput of
| the Golang GC, which is another easy way to get into bad
| performance problems with Golang that can be hard to
| resolve.
| 23B1 wrote:
| I'm curious if HN readers see articles like this and transpose
| the logic to other areas - like management or leadership. Open
| question, just curious.
| tinix wrote:
| Yep, this is a physical problem as much as a logical one. But
| I'm very much an abstract-systems thinker.
|
| It reminds me of the lean wastes... kinda.
|
| Organizing better and having autonomous units of work would be
| more efficient than bottlenecking decisions on a shared
| resource.
|
| This applies to social systems the same as physical ones, for
| sure.
| dig1 wrote:
| Without more details about the project requirements, my initial
| impression is that the author could get most of these features
| from ZooKeeper, which is designed for exactly this kind of
| thing: distributed locks and synchronization.
|
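| A distributed lock there takes only a few lines with a
| client such as github.com/go-zookeeper/zk (a sketch; the
| address and path are made up):
|
|     package main
|
|     import (
|         "log"
|         "time"
|
|         "github.com/go-zookeeper/zk"
|     )
|
|     func main() {
|         servers := []string{"127.0.0.1:2181"}
|         conn, _, err := zk.Connect(servers, time.Second)
|         if err != nil {
|             log.Fatal(err)
|         }
|         defer conn.Close()
|
|         // zk.NewLock implements the classic recipe:
|         // ephemeral sequential znodes under a path, where
|         // the lowest sequence number holds the lock.
|         lock := zk.NewLock(conn, "/locks/demo",
|             zk.WorldACL(zk.PermAll))
|
|         if err := lock.Lock(); err != nil {
|             log.Fatal(err)
|         }
|         log.Println("holding lock; do the critical work")
|
|         if err := lock.Unlock(); err != nil {
|             log.Fatal(err)
|         }
|     }
|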
| However, I've seen the pattern of using the database for
| whole-system synchronization in multiple projects. I hardly
| need to say what a bad idea this is - probably easy to start
| with, but a nightmare to scale later, just as the author
| mentions in the article.
| ibash wrote:
| Did I miss it, or did he never say what problem he was
| trying to solve?
|
| Any solution can be bad at scale if it doesn't fit your
| problem... unfortunately there's no way to know without stating
| the problem.
| sitkack wrote:
| I would use Aerospike; it offers strong or session consistency
| and will soon get multi-record transactions.
|
| https://dbdb.io/db/aerospike
|
| Also see the second bullet point from Guy Steele's 2010
| Strange Loop talk: synchronization, ordering, and a lack of
| idempotency all need to be heavily justified.
|
| "How to Think about Parallel Programming: Not!" - Guy L. Steele
| Jr. (Strange Loop 2010)
| https://www.youtube.com/live/dPK6t7echuA?app=desktop&t=1747s
|
| transcript: https://github.com/matthiasn/talk-transcripts/blob/master/St...
| (thanks matthiasn!)
|
| Previous discussion https://news.ycombinator.com/item?id=2105661
| tigron wrote:
| Synchronization scales for redundancy purposes, unless you
| are single-core.
| mrkeen wrote:
| or your readers are blocked by your writer
___________________________________________________________________
(page generated 2024-07-07 23:00 UTC)