[HN Gopher] Must be this tall to write multi-threaded code (2015)
___________________________________________________________________
Must be this tall to write multi-threaded code (2015)
Author : luu
Score : 81 points
Date : 2022-06-09 18:53 UTC (1 day ago)
(HTM) web link (bholley.net)
(TXT) w3m dump (bholley.net)
| bena wrote:
| I think warnings like this are good and necessary. A lot of
| people are going to ignore this warning simply because they
| wrongly believe they're good enough to ignore it. And they're
| going to try and write this massive project simply to prove that
| the warning doesn't apply to them. And they will serve as great
| examples of why that warning is there in the first place.
|
| But there's another group. This group heeds the warning. They do
| their best to not violate the warning. But then they run into an
| issue. They exhaust every possible option trying to resolve the
| issue. But eventually, they realize that they can violate the
| warning, just a little bit. And they're careful. They violate
| the warning in the least way possible and take every action
| deliberately. And it works.
|
| Because the warning doesn't mean it's impossible, it just means
| that it is incredibly difficult and should be approached with the
| utmost care.
|
| Eventually, we do learn the pitfalls, internalize them, abstract
| them away, etc. And then we can even do away with the warning.
| Because it's no longer the edge of the universe, it's just
| another stop along the way.
| bee_rider wrote:
| Just sprinkle in some OMP PARALLEL DO's it'll be fine.
| bitwize wrote:
| I knew I was right to trust Steve Summit with multithreaded stuff
| back when I worked with him!
| robocat wrote:
| > Buggy multi-threaded code creates race conditions, which are
| the most dangerous and time-consuming class of bugs in software.
|
| > safe and scalable parallelism is achievable by minimizing or
| eliminating concurrent access to shared mutable state.
|
| > Rust is designed from the ground up to facilitate this
|
| Does async code with mutable state sometimes result in race
| conditions in Rust?
|
| I have had that experience with JavaScript. For example, old-
| school short timeouts with a function written as a closure
| implicitly expect the state to remain the same, but a race
| condition asynchronously changes the state; then the closure
| is called, and the closure code fails. A problem of unintuitive
| surprise to the programmer that can happen with async code
| (even without thread concurrency).
| buybackoff wrote:
| I would change it to _must be this tall to edit_ multi-
| threaded code. It usually starts with a clear picture in
| mind... but even one's own code quickly becomes cryptic, full
| of hidden assumptions, without lots of comments. I used to
| have this picture printed and put above a server in a small
| startup-like team. The whole thing ended up unrelated to any
| lock or interlocked call, though.
| dang wrote:
| Discussed at the time:
|
| _Must Be This Tall to Write Multi-Threaded Code_ -
| https://news.ycombinator.com/item?id=9905374 - July 2015 (112
| comments)
| eftychis wrote:
| The problem with concurrency and parallelism is ownership --
| mainly write access, rather than coherence. I have seen again
| and again the bad habit of people bypassing well-defined
| setups of data ownership because they feel they are smarter,
| or it is unnecessary, or we will figure it out down the road,
| or, my favourite: it doesn't solve deadlocks, so why use
| anything, since some bug families remain.
|
| The issue is not one of frameworks or type systems but of
| companies and people. Engineers know this stuff by training,
| but they are either new and commit hubris, or management
| pushes them into it (or maybe some of them are simply
| untrained -- and somehow never went to a bank (queues)).
|
| P.S. For a fun instance: I have seen unsafe used to bypass
| Rust's type and ownership system to keep track of db snapshots
| in a "nice setup." Rust was not wrong.
| YesBox wrote:
| I recently wrote some multi-threaded code (for the first time)
| for a game I'm working on. Luckily the data I need to process is
| in a (C++) vector, so I was able to break the data up by dividing
| the vector size by available cores. I certainly felt 10 ft. tall
| when it worked!
|
| Another area of my code that would greatly benefit from
| concurrency requires updating an unordered_map. Since it's
| position-based and not unit_id-based, I'll need to figure out
| a way to prevent data races. The best option I've thought of
| so far is assigning thread numbers to each edge in my graph,
| and storing the units in each edge in a queue, which should
| prevent data races and car collisions. (edit: ha)
|
| That being said, I'm the sole developer on this project and I
| cannot imagine working on a multi-threaded application with
| dozens of other developers. It is nice dipping my toes in the
| water though.
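| A minimal sketch of that chunk-per-core split, written here in
| Rust rather than the commenter's C++ (`parallel_sum` and the
| chunk arithmetic are illustrative; `workers` must be >= 1):

```rust
use std::thread;

// Split a slice into one chunk per worker and sum each chunk on
// its own scoped thread. Scoped threads may borrow `data`
// directly, so no copying or reference counting is needed.
fn parallel_sum(data: &[u64], workers: usize) -> u64 {
    let chunk_len = ((data.len() + workers - 1) / workers).max(1); // ceiling divide
    thread::scope(|s| {
        let handles: Vec<_> = data
            .chunks(chunk_len)
            .map(|part| s.spawn(move || part.iter().sum::<u64>()))
            .collect(); // spawn every worker before joining any
        handles.into_iter().map(|h| h.join().unwrap()).sum()
    })
}
```

| Each thread reads a disjoint region of the vector, so there is
| no shared mutable state to race on.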
| kerblang wrote:
| Most of us work in information systems, which is to say, doing
| database stuff all day, usually with an RDBMS like MySQL,
| Postgres etc. And what are we doing? Grabbing a bunch of
| transactional locks. Against what? Shared mutable state, and not
| in-memory state, but database state counting into the thousands,
| millions, billions or more records. And if we do it wrong?
| Deadlock, often deadlock across multiple processes. And what do
| we do about that? We deal with it, because what else can you do?
|
| You could of course throw transactions away and declare them more
| trouble than they're worth, but that probably won't go over well.
| I've found it helps to have a "consistent first lock", so for
| example in a consumer-based system you'd lock the customer record
| first, because most transactional operations happen in that
| context. If you always have that consistent first lock then
| deadlock can't happen.
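| The same "consistent first lock" idea applies to in-memory
| locks; a sketch in Rust (the account array and the `transfer`
| helper are hypothetical):

```rust
use std::sync::Mutex;

// Always acquire the lower-indexed lock first. Every thread
// agrees on this order, so no cycle of waiting threads can
// form, and deadlock is impossible.
fn transfer(accts: &[Mutex<i64>], from: usize, to: usize, amount: i64) {
    assert_ne!(from, to, "accounts must be distinct");
    let (lo, hi) = if from < to { (from, to) } else { (to, from) };
    let mut lo_g = accts[lo].lock().unwrap(); // consistent first lock
    let mut hi_g = accts[hi].lock().unwrap();
    let (src, dst) = if from == lo {
        (&mut *lo_g, &mut *hi_g)
    } else {
        (&mut *hi_g, &mut *lo_g)
    };
    *src -= amount;
    *dst += amount;
}
```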
|
| My point is that if I assert "multithreading is unacceptable!",
| most of business information systems goes out the window on the
| same principle, because the locking is even more dastardly -
| multi-process and persistent state instead of in-process in-
| memory state. I don't think you could usefully throw
| actors/schmactors/coroutines/goroutines/etc. at it either.
| And if you said "Well only the best programmers need
| apply," well BIS is not exactly hotshot heaven, by and large.
| It's a bunch of grunts.
|
| So I agree multithreading is hard when you are locking many
| things at once, which I try to avoid. But I just don't get this
| thing of trying to banish it or make it the exclusive domain of
| geniuses.
| quantified wrote:
| 2015
| gumby wrote:
| Not that anything has improved since then.
| xwdv wrote:
| Aye, it feels like we've mostly reached a plateau and arrived
| here through short-termism.
| hedora wrote:
| Well, Rust is replacing C++ incredibly quickly.
| spacechild1 wrote:
| Citation needed.
| hedora wrote:
| Amazon somewhat quietly replaced the S3 service
| implementation with a rust reimplementation, then
| announced it after the transition landed without making
| any bad headlines.
|
| I've heard Microsoft has moved to Rust for all new
| projects (there is an exception process for C++, of
| course).
|
| Parts of Chrome use it now.
|
| Support for using Rust in the mainline Linux kernel
| should land soon.
| Kab1r wrote:
| Go, Erlang/Elixir maybe, but as much as I like Rust, it has
| not made writing _concurrent_ programs easier imo.
| hedora wrote:
| I write concurrent systems code for a living. Rust is
| much easier to get right than go, because it uses the
| borrow checker to statically check for correct
| synchronization. (Check out tokio.)
|
| I haven't used erlang/elixir.
| ntoskrnl wrote:
| Seems to be a theme at Mozilla.
|
| > Must be this tall to use `unsafe`
|
| https://i.stack.imgur.com/w9XMh.jpg
| khuey wrote:
| That's more or less the same spot in Mozilla's SF office.
| kcl wrote:
| I've written on this problem before:
| https://kevinlawler.com/parallel
|
| One issue is that the pthread model is too low level (think AND
| and OR gates), but you must include it for completeness.
|
| I agree with the comments here that say higher-level
| multithreading support needs to be "baked in" to the language.
| Grand Central Dispatch did a good job of raising the bar here,
| but has its own problems, and is arguably too low level.
|
| Another issue is that existing, widespread languages are too old
| to have cooked higher-level threading support in. You really have
| to do it from the start. C++ bolt-on parallelism might as well be
| syntactic sugar. Newer languages, like Rust, have attempted to
| solve concurrency problems but failed - it's not clear that the
| designers genuinely understand concurrency at all. Goroutines
| solve one limited issue of many and seem about as difficult as
| asking programmers to write threaded programs.
|
| To do this correctly you need a new language that cooks it in
| from the start. There is a laundry list of things that have to be
| done precisely. It might be possible to do this by rewriting a
| reference implementation of an existing popular language, but the
| development overhead and performance concerns involved in
| backwards-compatibility make that probably infeasible. (Python
| can't even escape its global interpreter lock.) So likely the
| language that solves this problem is either unwritten or unknown.
|
| Other mentions in the comments here:
|
| ThreadSanitizer: It's not very good, nor can it be.
|
| Erlang: solves the problem, but the actor formulation is too
| awkward for all but Erlang-specific use-cases.
| lostcolony wrote:
| >> Erlang: solves the problem, but the actor formulation is too
| awkward for all but Erlang-specific use-cases.
|
| Err? I mean, it requires rethinking a bunch of the lessons you
| learn when concurrency is hard and expensive and to be
| minimized, sure. But when concurrency is easy and cheap and to
| be embraced, a bunch of things become a lot easier and more
| natural to write.
|
| My go to example, from a production system, was a complicated
| task scheduling system (with user and hardware interrupts to
| handle). Sure, we could have written it using threads and a
| priority queue or something...instead we just had one Erlang
| process per task, with interrupts routed to that process as
| messages (and linked timer processes that fired messages to
| those processes for the scheduled stuff, essentially just
| another type of interrupt/message). Super simple to write,
| super simple to reason about, never had any sort of race
| condition or locking issues, across years of development and
| production use.
| waynesonfire wrote:
| > Erlang: solves the problem, but the actor formulation is too
| awkward for all but Erlang-specific use-cases.
|
| Play with Erlang a bit and you'll find the modern band-aid
| pattern of async / await awkward.
| tialaramex wrote:
| > it's critical to explore ways to incrementally add safe
| concurrency in C++.
|
| I don't buy it. Some things are foundational and attempts to add
| them incrementally are doomed because your foundations won't
| support the work. After building them easy things seem easy - but
| then hard things tear out your inadequately founded "incremental
| addition" and make a huge mess everywhere.
|
| Google has a team doing this sort of thing (indeed David Baron
| might even be on it for all I know), for exactly the same reason,
| they wrote a web browser in C++ and it has threading problems and
| they wish they could fix that. So Mozilla isn't exploring terra
| incognita here, and if there's something to be found I hope they
| find it, but I expect nothing.
| nemothekid wrote:
| _So Mozilla isn't exploring terra incognita here, and if
| there's something to be found I hope they find it, but I
| expect nothing._
|
| This post was written in 2015. Looking back I think we can
| infer:
|
| 1. They found a solution (create a new language)
|
| 2. From the success of that language, it seems to deliver on
| its promise (although it has its warts)
|
| 3. They failed to incorporate that into the web browser
| because rewriting a web browser was a momentous task that had
| little measurable business impact, and decided to lay off the
| engineers instead.
|
| Google is a wealthier company, but they may decide that trying
| to "fix" C++ is a more realistic endeavor.
| tialaramex wrote:
| Rust is mentioned in the blog post.
|
| Firefox has a whole bunch of Rust in it, much more than it
| did when that blog post was written. Mozilla let go some core
| Rust people, but of course it still employs numerous Rust
| programmers.
| dblohm7 wrote:
| (Former Mozilla employee here)
|
| All developers writing native code are encouraged to use Rust
| (it's not supposed to be just the domain of a select few). As
| I recall, the message was, "Not
| knowing Rust must not be an excuse for why you're not using
| Rust." Of course developers must still contend with the
| practicalities of integrating Rust into an existing C++
| codebase, so some guidelines were developed:
|
| https://wiki.mozilla.org/Oxidation
| steveklabnik wrote:
| > They failed to incorporate that into the web browser
|
| Rust is just over 10% of Firefox at the time of writing this
| comment https://4e6.github.io/firefox-lang-stats/
|
| > and decided to layoff the engineers instead.
|
| They didn't lay off people writing Rust in Firefox, they laid
| off people working on Rust itself.
| nlewycky wrote:
| I don't have a photo but we used to have the same sign at Google,
| with an added note "unless accompanied by ThreadSanitizer".
|
| At runtime: https://clang.llvm.org/docs/ThreadSanitizer.html
|
| At compile-time:
| https://clang.llvm.org/docs/ThreadSafetyAnalysis.html
|
| The article points out that the resulting invariants end up
| being documented in comments:
|
|     // These variables are protected by monitor X:
|
| but you can write those in C++ code and have the compiler check
| them. That's what thread safety annotations are for.
| kubanczyk wrote:
| TIL that, so predictably, "[Go's] race detector is based on the
| C/C++ ThreadSanitizer runtime library". So, I can definitely
| confirm its usefulness. https://go.dev/blog/race-detector
| new_stranger wrote:
| Assuming you don't need bare-metal performance and Go is
| okay, I love how easy and approachable parallel code is with
| Go. Locks and channels each offer trade-offs you can easily
| benchmark using the built-in go test.
|
| I've written so many different concurrent approaches to workers,
| pipelines, and multi-stage processing systems and found it always
| easy to reason through ways to limit the number of workers,
| prevent duplicate work, and implement backoffs.
| anonymousDan wrote:
| I don't know, maybe it's the sadist in me but I enjoy the
| challenge of writing fast and correct multi-threaded code. It's a
| valuable skill if you can master it.
| SilasX wrote:
| Skill? It can't be reduced to following easily checkable
| heuristics and rules?
| anonymousDan wrote:
| I don't claim to have mastered it!
| layer8 wrote:
| You mean masochist? ;)
|
| I'm similar, though I wish programming languages would give
| us more facilities to statically double-check our correctness
| reasoning.
| oriolid wrote:
| Masochist? Think about the poor programmer who has to
| maintain GP's code :)
| tanyar wrote:
| This is the way. That code breaks when someone new comes along
| and makes several modifications with implications that take a
| long time to rear their ugly head.
|
| Broken multi-threaded code behaves a lot like perfectly
| coded multi-threaded code....until you get a lot of traffic
| through that code, you know, like on a busy holiday weekend
| when lots of revenue is flowing through the business!
| ChrisMarshallNY wrote:
| Low-level thread programming is a minefield. I've been doing
| IRQ/concurrent programming since the 1980s, and still hate it.
| Thread bugs are a nightmare.
|
| Concurrency needs to be "baked into" higher-level languages,
| development APIs, and standard libraries. Basically, covered in
| bubble-wrap. That's starting to happen. It won't be as
| efficient as low-level coding, but it is still likely to
| improve the performance of most code significantly.
|
| And folks that are good at low-level threading would be well-
| served to work to support this.
| hinkley wrote:
| I studied distributed computing in college, and I spent a lot
| of time not quite internalizing the fact that since this was an
| elective at one of the highest ranked state schools in the
| nation, probably most other people didn't have the same
| information I did.
|
| I ended up doing a lot of concurrency work early on because I
| was good at it, but over time the evidence just kept piling up
| that nobody else could really follow it, and so I've used it
| less and less over time, and in particular try very hard to
| keep it away from the main flow of the application. It's more
| something I pull out for special occasions, or to explain bugs.
|
| Where a lot of frameworks fail is that while re-entrance and
| concurrency are different problems, they share a lot of
| qualities, both computationally and with the limits of human
| reasoning. Recursive code with side effects looks and acts a
| lot like concurrent code, because here I am looking at some
| piece of data and the moment I look away some other asshole
| changed it out from under me. Most frameworks end up being a
| bad study in recursion, usually in the name of reducing code
| duplication.
|
| Pure functional people love recursive code, but it's the pure
| part that avoids the problems with trying to interlace multiple
| activities at once. Without that, you're trying to eat your
| dessert without eating your vegetables first.
| [deleted]
| nick_ wrote:
| Pony has concurrency baked in via a high-performance actor
| implementation. It's really nice. I believe Go has this as
| well, though it also exposes a low-level concurrency API?
| skohan wrote:
| Idk I think it is not as difficult as it is sometimes framed. I
| think the key is to design your program/separation of concerns
| in such a way as to make it easy to reason about concurrency,
| and minimize the need for synchronization points.
|
| I think a lot of the problems arise when you just try to write
| normal synchronous code, or to parallelize code which was
| originally synchronous, and don't realize the implicit
| constraints you had been relying on which no longer hold when
| concurrency is introduced.
| ChrisMarshallNY wrote:
| FP is good for adapting to threading, but it has difficulties
| when applied to things like GUI programming, async
| communications, or device control (places that need
| threading).
|
| A lot of languages are getting things like async/await/actor,
| etc., but even that is still too low-level for a lot of
| programmers. It can easily turn into a lot of single-threaded
| code.
|
| It needs to be completely under the surface. Swift does that
| with generics (I suspect other languages do it, as well). For
| example, you can say Array<Int>, or [Int]. They mean the same
| thing, but one does not have the generic syntax.
|
| If we can do similar stuff with things like mutexes and
| syncing, then it will go a long way towards highly
| performant, safe, code.
| skohan wrote:
| It's interesting that you choose Swift as an example,
| because I think in some ways Swift does a poor job of
| accommodating async using the "bubble wrap" approach.
|
| Specifically I am talking about the default of using atomic
| reference counting on all references. This makes it
| somewhat idiot proof at least with respect to avoiding
| NPE's, but it comes at a huge performance cost (especially
| on x86 which I guess is becoming less of a concern for
| Swift code).
|
| I think this is a fundamental issue: a big part of the
| benefit of concurrency and parallelism is supposed to be
| increased performance. However a lot of the ways to make
| concurrency "generally safe" tend to hamstring performance,
| because you have to protect against all the different types
| of errors programmers can make, which basically means a lot
| of expensive synchronization.
|
| Maybe there is a way to hide this completely, as you say,
| in a performant way. To do this, I think you would almost
| have to have either some kind of really clever lock-free
| data structures, or else a very smart compiler.
|
| Another approach might be to keep concurrency at arms
| length from the average programmer. I.e. hide 90% of it
| inside of libraries, and expose safe interfaces which the
| programmer interacts with in a mostly synchronous way.
| ChrisMarshallNY wrote:
| _> I think in some ways Swift does a poor job of
| accommodating async using the "bubble wrap" approach._
|
| I agree. Async/await has a "kludgy" feel to it (to me).
| It's "added on," not "embedded in." But I'm not a
| language writer, so I don't have alternatives to offer.
|
| I was talking about how it does generics. I think it does
| that quite well.
| dgb23 wrote:
| To me it feels like I'm folding and knotting code.
| skohan wrote:
| I agree with you there. Swift has some really excellent
| syntactic sugar.
| hinkley wrote:
| Based on a non-scientific study, I think the spatial thinkers
| do great with concurrency, the visual thinkers do okay, and
| everyone else is in real trouble. Which reminds me, I need to
| interview my developer friend with aphantasia about how he
| feels about concurrency.
|
| Concurrency should be the sizzle and not the steak, otherwise
| you're reducing the diversity of your team rather
| substantially. Good people are hard enough to find. Driving
| the people you have away doesn't generally end well.
| [deleted]
| SleepyMyroslav wrote:
| just my 2c Friday rant on old article.
|
| Games had to adapt to threading before I even joined. I got lucky
| to work with some very efficient threaded systems. They were
| never correct or safe xD.
|
| But I can share that if you see the comment "these variables
| are protected by this lock" more than a few times per million
| lines of code, performance is not the reason the code is
| threaded.
|
| From public videos, I think the best explanation available is
| Sean Parent's "Better Code: Concurrency" [1].
|
| I really would like to see higher-level or safer languages or
| systems come to future game platforms. It should be possible.
| It just needs economic factors to kick in, like people finally
| noticing that slow and safe is not going to get any faster
| next year with new hardware.
|
| 1. https://youtu.be/32f6JrQPV8c
| muglug wrote:
| I'm glad that Rust has changed this, at least in my very specific
| case: I wrote a command-line application in PHP and adding
| parallelisation was a long journey of trial and error and
| concurrency bugs that took over a week to get right.
|
| I rewrote the same application in Rust. I followed the Arc/Mutex
| documentation and parallelised it over a few hours. It worked
| perfectly.
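| The Arc/Mutex pattern the commenter followed looks roughly
| like this (a minimal sketch; `collect_parallel` and the
| per-worker computation are invented for illustration):

```rust
use std::sync::{Arc, Mutex};
use std::thread;

// Each worker clones the Arc (a shared, reference-counted
// handle), briefly locks the Mutex, and pushes its result.
fn collect_parallel(workers: u64) -> Vec<u64> {
    let results = Arc::new(Mutex::new(Vec::new()));
    let handles: Vec<_> = (0..workers)
        .map(|id| {
            let results = Arc::clone(&results);
            thread::spawn(move || {
                results.lock().unwrap().push(id * 10); // short critical section
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap(); // propagate worker panics
    }
    // All clones are dropped once the threads finish, so we can
    // take the Vec back out. Sort because completion order varies.
    let mut v = Arc::try_unwrap(results).unwrap().into_inner().unwrap();
    v.sort();
    v
}
```

| The compiler rejects any attempt to touch the shared Vec
| without taking the lock, which is a big part of why this tends
| to work on the first try.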
| chasil wrote:
| "If the fastest chips have N cores, a mostly-single-threaded
| program can only harness (1/N)th of the available resources. As N
| grows (and it is growing), parallel programs will win."
|
| This win will be constrained by Amdahl's law. A single
| program/process that is multithreaded can effectively use only a
| finite number of cores.
|
| To use more resources, the task must be broken into multiple
| processes.
|
| It can be faster to run $(nproc) number of gzip processes in
| parallel than to use pigz, for example.
|
| https://en.wikipedia.org/wiki/Amdahl's_law
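| Stated formally: if a fraction p of a program's work can be
| parallelized across N cores, Amdahl's law bounds the speedup at

```latex
S(N) = \frac{1}{(1 - p) + \frac{p}{N}}, \qquad
\lim_{N \to \infty} S(N) = \frac{1}{1 - p}
```

| so even with p = 0.95 no number of cores yields more than a
| 20x speedup, which is why a pool of independent processes (one
| gzip per file) can beat a single parallelized process (pigz)
| on some workloads.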
| spacechild1 wrote:
| > This win will be constrained by Amdahl's law. A single
| program/process that is multithreaded can effectively use only
| a finite number of cores.
|
| This assumes that the workload is fixed. A classical example
| would be compiling source code: you have a fixed number of
| source files and the compiler is only able to parallelize so
| much.
|
| On the other hand, there are applications where more threads
| allow you to increase the workload itself. For example, a
| modern DAW (digital audio workstation) is able to render tracks
| in parallel; if you have more threads, you can use more tracks
| with more plugins. In a computer game, more threads might mean
| more complex simulations, more objects, better graphics, etc.
| cortesoft wrote:
| Also, the major change with having so many cores is the ability
| to do lots of different things at once on a single machine.
| You don't need any single process to utilize all the cores on
| a machine; you take advantage of all those cores by having a
| single machine do a lot of things.
| ajuc wrote:
| Doesn't help you at all if you want the game to run at 60 fps
| and instead you can run it 8 times in parallel at 30 fps
| each.
| chasil wrote:
| It remains important not to buy cores that you cannot use,
| especially if there are other constrained components that
| would have a larger performance impact.
|
| Sophie Wilson (creator of the ARM instruction set) spends
| some time on these issues, also stressing core throttling
| and shutdown due to heat.
|
| https://www.youtube.com/watch?v=zX4ZNfvw1cw
| JoeAltmaier wrote:
| There was a paper long ago that showed duality between
| semaphore/locking code and message-queuing code. So folks figured
| they were the same.
|
| Not so! semaphore/locking is very hard if more than one semaphore
| exists. Look at the 'dining philosophers problem' and so on.
|
| But queuing! Done right, that can be proved correct through
| static analysis. Do threads that consume one queue block on
| another queue? Draw the graph of queue-blocking: does it loop
| anywhere? If so, you could have a problem.
|
| I.e. if your message-consumers don't block at all, then you
| cannot have a problem with deadlock.
|
| You CAN have a problem with messages stalling however -
| languishing on a list waiting for something to complete that
| might never complete. But at runtime this can be debugged fairly
| easily - if all your queues are visible to the debugger.
|
| In the semaphore implementation, the locking situation depends on
| the state of threads and their stacks, which all have to be
| frisked and back-executed to find out who holds what lock etc.
| Not always debugger-friendly to do.
|
| I favor a multi-threading environment of threads dedicated to
| queues, and all the queues visible to the debugger. That kind of
| setup has never done me dirty.
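| A sketch of that dedicated-thread-per-queue shape in Rust,
| using mpsc channels as the queues (the two-stage `pipeline`
| function is a hypothetical example):

```rust
use std::sync::mpsc;
use std::thread;

// Two stages in a straight line: an acyclic graph of queues,
// where each worker blocks only on its own input queue, so the
// queue-blocking graph trivially has no loop.
fn pipeline(inputs: Vec<u32>) -> Vec<u32> {
    let (tx1, rx1) = mpsc::channel::<u32>();
    let (tx2, rx2) = mpsc::channel::<u32>();
    let stage1 = thread::spawn(move || {
        for n in rx1 {
            tx2.send(n * n).unwrap(); // stage 1: square each item
        }
        // Dropping tx2 here closes stage 2's input.
    });
    for n in inputs {
        tx1.send(n).unwrap();
    }
    drop(tx1); // close stage 1's input so its loop terminates
    let out: Vec<u32> = rx2.iter().collect(); // stage 2: collect
    stage1.join().unwrap();
    out
}
```

| No consumer ever blocks on anything but its own input channel,
| so the deadlock argument above holds by construction, and a
| stalled message simply sits on a channel where it can be
| inspected.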
| btilly wrote:
| _I favor a multi-threading environment of threads dedicated to
| queues, and all the queues visible to the debugger. That kind
| of setup has never done me dirty._
|
| You mean like Go does with channels? (Not sure how good their
| visibility is to the debugger though.)
| fmajid wrote:
| It's based on Tony Hoare's _Communicating Sequential
| Processes_ , and is far more comprehensible and possible to
| reason about than primitives like mutexes and semaphores that
| are too close to the underlying hardware implementation.
|
| http://www.usingcsp.com
| lostcolony wrote:
| Probably more like Elixir/Erlang, if we're talking
| programming language. Though even there, BEAM processes are
| the unit of execution, and are multiplexed on threads. Parent
| references OS development elsewhere.
|
| Go has a couple of deviations: channels aren't queues unless
| you are careful to size them larger than the number of things
| you'll ever put in them (otherwise the goroutine trying to put
| something on a channel blocks; you can of course time out, but
| that introduces tight coupling, since the sender has to care
| about the state of the receiver); goroutines aren't dedicated
| to channels (that is, there is an m-n relationship between
| channels and goroutines, which can lead to a lot of incidental
| complexity); and you can see what is on a channel if you have
| a breakpoint, but that assumes you can get the code to break
| inside of something containing a reference to the channel.
| convolvatron wrote:
| It's a _little_ more costly, depending, but otherwise I agree
| with you. It also clearly delineates what data is shared, and
| everything else can be assumed private... so it helps make the
| overall architecture more explicit.
| Etheryte wrote:
| This sounds interesting, and I've implemented simple ideas
| similar to the pattern you describe before; however, I haven't
| read about its use in depth. Do you happen to know of an
| article/book/resource that describes this along with real-
| world experiences? If not, would you mind writing a blog post
| or article on it, please?
| JoeAltmaier wrote:
| I had the benefit of cutting my teeth at my first job,
| working on a message-passing OS. CTOS, based on RMX/86, used
| messages for everything. It was a very early networked
| computing system. Diskless workstations where file messages
| simply got forwarded to a server etc. And all the messages
| and queues were visible in the debugger!
|
| So I learned good thread hygiene right out of school.
| dekhn wrote:
| the only model I use with threads is either pre-static (start a
| function on each thread with unique parameters, result is
| returned to a driving function at the end) or message-queue
| (each thread is blocking, waiting for a message). For the
| former, I use shared memory, but it's read only and guaranteed
| to live longer than any of the worker threads that use it
| (passing a consted ref_ptr). For the latter, I don't share any
| memory at all; the message itself contains the parameters,
| fully materialized.
|
| I can ensure all my queues are loop-free (directed acyclic
| graph of workers reading from their parent nodes and writing to
| their children nodes). IIRC one of the complaints about the
| petri net model is that it was unprovable that any problem
| could finish, even trivial ones.
|
| This has worked pretty well for me and I usually don't have to
| debug any truly nasty thread problems. My biggest problem tends
| to be thread pools where one of the workers gets wedged, and
| there's no way to cleanly kill the worker thread and steal the
| work onto another thread.
| JoeAltmaier wrote:
| Particularly network activity! At the lowest levels, the
| creaky old TCP/IP stack is not amenable to non-blocking use.
| anonymousDan wrote:
| One trick I used to use when writing multithreaded java code
| for debugging purposes was to never use the built in
| synchronized/implicit object lock, since that made it quite
| difficult to understand from a stack trace what object was
| being locked. Instead, we would define an explicit inner lock
| and lock on that explicitly. The class name would then show up
| in stack traces making it much easier to track down.
| hinkley wrote:
| In cases where not all operations on the objects necessitate
| the same lock semantics, this is also handy because you can
| split or nest locks to reduce write-blocks-read and read-
| blocks-write situations.
|
| But if you're finding yourself using this more than a couple
| of times in a project, you're probably uncovering violations
| of the Single Responsibility Principle. Especially if you're
| splitting locks.
| itsmemattchung wrote:
| One of my _favorite_ threading papers, published in 1982 and
| written by Andrew Birrell:
|
| https://s3.amazonaws.com/content.udacity-data.com/courses/ud...
|
| Reading about that paper in graduate school cleared up a lot of
| misconceptions I had about threads
| morelisp wrote:
| I mean that's a nice theoretical argument, but the empirical
| results are not so clear.
|
| https://dl.acm.org/doi/10.1145/3297858.3304069
|
| _Surprisingly, our study shows that it is as easy to make
| concurrency bugs with message passing as with shared memory,
| sometimes even more. For example, around 58% of blocking bugs
| are caused by message passing._
|
| ...
|
| _Our study found that message passing does not necessarily
| make multithreaded programs less error-prone than shared
| memory. In fact, message passing is the main cause of blocking
| bugs. To make it worse, when combined with traditional
| synchronization primitives or with other new language features
| and libraries, message passing can cause blocking bugs that are
| very hard to detect._
|
| I'd also add that in the modern world we've "solved" many
| multi-threading problems by splitting the queues and data
| stores out into multiple services. Now we don't have any
| threading problems - only "distributed system" problems, and if
| there's a bug there that's the architect's fault, not the
| programmer's!
| JoeAltmaier wrote:
| Yeah, so many service APIs are blocking that it's hard to keep
| threads non-blocking. Often there is a way, but you have to
| dig for it.
| syrrim wrote:
| Dining philosophers is solved by always acquiring locks in the
| same order.
| [deleted]
| chrisseaton wrote:
| ...if it's possible to do that for your application! It isn't
| always, if which lock you need next depends on data read under
| a previously acquired lock.
| JoeAltmaier wrote:
| Right, perhaps in a nested call or library, maybe even a
| semaphore you didn't know about!
|
| Multithreading with semaphores/locks is about as dirty as
| it can get.
| syrrim wrote:
| The top-level comment's strategy with queues seems similar,
| in that it isn't always possible. It requires that the queues
| form a DAG, which is another word for a partial ordering.
| Thus, requiring the locks to form a partial ordering doesn't
| seem like a comparatively excessive restriction.
| chrisseaton wrote:
| > doesn't seem like a comparatively excessive restriction
|
| I think there are some applications and algorithms for
| which we just don't know a way to create an ordering over
| realistically fine-grained locks.
|
| For example airline seat reservation. Someone clicking on
| seats to reserve them. You can't apply locks in a total
| order because... you can't read the person's mind on what
| seat they're going to click next.
|
| Or Lee's algorithm.
|
| (I wrote the first half of a PhD on this.)
|
| https://chrisseaton.com/research-symposium-2012/seaton-
| irreg...
| jerf wrote:
| I think it's gotten easier over time, actually, and even when
| this was written it was theoretically out of date.
|
| There is a famous paper about how bad threads are from the 1990s.
| I believe it is valid on its own terms but misdiagnoses the
| underlying issue. The underlying issue is this: Taking multiple
| locks at once is not a sane operation at scale. Any system built
| on this will fail.
|
| There is no magic bullet that solves all concurrency problems,
| but a combination of techniques, used with discipline, take it
| from insanely impossible to merely challenging:
|
| 1. Assign data to a thread, with only that thread able to access
| it. Offer concurrency via message passing to that thread alone,
| and don't allow references to escape from the thread (pointers,
| whatever you want to call it). Language support for this is nice
| (Erlang and Pony do it one way, Haskell another, Rust yet a
| third) but not technically necessary.
|
| 2. Using locks is valid, and even quite helpful and useful, but
| never let yourself take more than one. If you need transactions,
| see #1. Don't write code that encourages people to take more than
| one. Don't forget about things that may take locks without you
| thinking about it (e.g., in Go, taking a mutex _and then_ trying
| to send on a channel is taking two locks); this is one thing that
| can be a bit of a challenge.
|
| 3. For more complicated things, use libraries as much as possible
| that have solved the super complicated cases already. For
| instance, databases that offer transactional guarantees. You can do
| this even if you don't otherwise "need" such databases. Or
| libraries that offer a "parallel map" within your language, or
| other canned solutions to parallelism/concurrency.
|
| 4. Use whatever support your language provides for "linting" your
| conditions. The Go race condition checker is theoretically
| insufficient but practically useful (passing it doesn't mean that
| you have a safe program but it's a good start). Your language's
| mileage may vary but take advantage of what you can.
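A hedged sketch of point 1 in Java (all names here are hypothetical): a single owner thread holds the map, nothing else ever touches it, and other threads communicate only by sending messages through a queue and waiting on a reply future.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CompletableFuture;

// One owner thread owns the map; no reference to it ever escapes.
// Other threads interact only via message passing through `inbox`.
public class KvOwner {
    record Msg(String key, String value, CompletableFuture<String> reply) {}

    private final Map<String, String> data = new HashMap<>(); // owned
    private final BlockingQueue<Msg> inbox = new ArrayBlockingQueue<>(64);

    public KvOwner() {
        Thread owner = new Thread(() -> {
            try {
                while (true) {
                    Msg m = inbox.take();
                    if (m.value() != null) data.put(m.key(), m.value());
                    m.reply().complete(data.get(m.key()));
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // exit the owner loop
            }
        });
        owner.setDaemon(true);
        owner.start();
    }

    public String put(String key, String value) {
        CompletableFuture<String> f = new CompletableFuture<>();
        if (!inbox.offer(new Msg(key, value, f)))
            throw new IllegalStateException("inbox full");
        return f.join(); // wait for the owner thread's reply
    }

    public String get(String key) {
        CompletableFuture<String> f = new CompletableFuture<>();
        if (!inbox.offer(new Msg(key, null, f)))
            throw new IllegalStateException("inbox full");
        return f.join();
    }
}
```

The point of the design is that no lock on the map is ever needed: only the owner thread reads or writes it, and the queue serializes all access.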
|
| Honestly, just these things taken together can take you quite
| far. Again, these aren't magical solutions. There is an
| irreducible sense in which threading is harder than not
| threading, and I don't expect that to ever change. There will
| always be programmers that are more expert in it than others. But
| it doesn't have to be an impossible challenge anymore. And to a
| large degree, the challenge becomes less "solving the multithread
| problem" and more just figuring out how to architect systems
| under the restrictions of the patterns that work and compose
| together.
|
| Although all that said, at a browser scale concurrency is always
| going to be a problem. They gotta go fast, but that fights the
| safe patterns pretty hard because they do involve a certain
| amount of runtime slowdown (e.g., passing a message is always
| going to be slower than just _not_ passing a message). Rust is
| the only practical solution at that scale. But just as You Are
| Not Google, You Are Not Writing A Browser. More normal programs
| can go a long way with just what I laid out above.
| dllthomas wrote:
| > Language support for this is nice (Erlang and Pony do it one
| way, Haskell another, Rust yet a third) but not technically
| necessary.
|
| At one point I did some name mangling magic with the C
| preprocessor for "this can only be accessed from the thread
| (statically) named foo" for a project where that was relevant.
| It was pleasant, although limited in how much it could
| generalize - all the threads very much needed to be statically
| known.
| dandelany wrote:
| This is an excellent comment that clearly comes from a lot of
| experience. Can you say any more about why having multiple
| locks is such a red flag? Is it because you start getting into
| deadlock/hungry philosophers territory? Are there rules of
| thumb for using multiple locks that make it more tractable or
| is it just always a bad idea?
| dgb23 wrote:
| Not OP, but that might be an implication. Another, more
| general reason is that this kind of concurrency requires
| design. It can't be thought of as a defensive programming
| technique. Locks are too powerful and they are by definition
| a point of contention.
|
| I think providing a lock is like giving up control to someone
| else. It's inherent coupling. To contrast, when I first
| learned about them I thought of them as a protection
| mechanism of an isolated part, but they are an execution
| contract that everyone participates in.
| jerf wrote:
| "It's inherent coupling."
|
| This matches well with one of the beliefs I've been
| polishing up over the past few years which I don't think is
| well understood, which is that structured programming tends
| to create more coupling than we realize in deep call
| stacks. When you end up with a 2000-line call stack in
| Java, that's 2000 function invocations coupled together by
| the way structured programming works; for instance, at a
| bare minimum, that's 1999 functions that can't progress
| until the innermost one does, and that's not the only form
| of coupling.
|
| I think the coupling induced by structured programming is
| fairly light per stack frame on average, but they tend to
| add up quickly because they're just so darned easy to add
| more of.
|
| Locks are another thing that makes this really come to
| light. The more stack frames between some function and
| something above it in the stack that has taken a lock, the
| more likely it is for execution to eventually wander back
| into something trying to take a lock inadvisably.
|
| I mean, I've had a number of deadlocks just within an
| object where it has some internal lock, and a method takes
| that lock, then tries to call a method that takes the same
| lock. Whoops. Easy to fix, of course, but that's the easy
| case. The cases get harder in real code.
| ntoskrnl wrote:
| The rule of thumb for multiple locks is always acquire them
| in the same order. Lock A, lock B, unlock B, unlock A.
| Otherwise you risk a deadlock. If another thread locks B,
| then tries to lock A, you're in trouble.
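A minimal sketch of that rule (account/transfer names are made up): every code path that needs both locks acquires them by a single global key, here the account id, so two opposing transfers can never deadlock each other.

```java
import java.util.concurrent.locks.ReentrantLock;

// Both directions of a transfer acquire locks in one global order
// (lower id first), so A->B and B->A transfers cannot deadlock.
public class Account {
    final long id;
    long balance;
    final ReentrantLock lock = new ReentrantLock();

    Account(long id, long balance) { this.id = id; this.balance = balance; }

    static void transfer(Account from, Account to, long amount) {
        Account first = from.id < to.id ? from : to; // fixed order by id
        Account second = first == from ? to : from;
        first.lock.lock();
        try {
            second.lock.lock();
            try {
                from.balance -= amount;
                to.balance += amount;
            } finally {
                second.lock.unlock();
            }
        } finally {
            first.lock.unlock();
        }
    }
}
```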
| mandevil wrote:
| And keeping that discipline over many developers, over many
| years, is in practice impossible, which is why multiple
| locks seem attractive but are actually a major problem for
| complex systems.
| layer8 wrote:
| You can, of course, encapsulate such a lock order with
| dedicated classes (or functions/closures) modeling linear
| types enforcing the sequence of lock A > lock B > unlock
| B > unlock A. The resulting objects can still be
| misapplied, but should fail immediately with a runtime
| error, making it much harder to inadvertently misuse.
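One hedged way to sketch that encapsulation in Java (the class name is invented): the pair of locks is only reachable through a method that enforces the A > B > unlock B > unlock A sequence, and out-of-order re-entry fails immediately at runtime.

```java
import java.util.concurrent.locks.ReentrantLock;

// Encapsulates the fixed order "lock A, then B; unlock B, then A".
// Callers cannot touch the locks directly, and re-entering while
// either lock is already held fails fast with a runtime error.
public final class OrderedPair {
    private final ReentrantLock a = new ReentrantLock();
    private final ReentrantLock b = new ReentrantLock();

    /** Runs the action while holding both locks in the fixed order. */
    public void withBoth(Runnable action) {
        if (a.isHeldByCurrentThread() || b.isHeldByCurrentThread())
            throw new IllegalStateException("locks already held: out-of-order use");
        a.lock();
        try {
            b.lock();
            try {
                action.run();
            } finally {
                b.unlock();
            }
        } finally {
            a.unlock();
        }
    }
}
```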
| jerf wrote:
| But that still doesn't scale up to more locks of that
| size very well, because you start trying to take even
| more of them, and in dynamic orders, and etc etc.
|
| I know about the theoretical solution of taking locks
| always in a defined order, but I don't consider it a
| practical solution. If taking locks willy-nilly lets you
| scale to program size X, this might give you 2X or 3X,
| but that's not nearly large enough in practice.
|
| & dandelany, the community has chipped in and answered
| for me. I endorse this subthread, at least, as I write
| this comment. :)
| anarazel wrote:
| IME there are plenty of problems where nested locks lead
| to a simpler solution than trying to avoid them. It's
| important to avoid them when reasonably doable, but it's also
| not uncommon to take it too far and end up with a slower
| and far more complicated solution.
| secondcoming wrote:
| What is 'browser scale'? I'd imagine most browsers spend most
| of their time idle.
| Slartie wrote:
| If you want to see some of the most complex multithreading
| systems, take a look inside modern browsers. That's what he
| means by "browser scale".
|
| I have some experience working with Chromium's internals,
| and it is crazy what's going on inside there. Several
| different job and message queue systems interlocking with
| each other, tasks spread across multiple processes getting
| split and regrouped and synced and recombined, practically
| everything running async, just so a few pixels are rendered a
| little faster when your multicore machine updates a small
| section of a website.
|
| Browsers are indeed idling a lot, but if you want them to do
| something, they must be as crazy fast as possible. The
| competition in that space is ridiculous.
| jerf wrote:
| In the tens of millions to hundreds of millions of lines of
| code. Browsers are some of the largest things out there. See
| also desktop office suites, multi-decade CAD programs, a few
| other things.
|
| Few programs are at that scale. IIRC, even the Linux kernel
| isn't _really_ at that scale, once you clean away a few files
| in the drivers that are just literally millions of #defines
| or something.
|
| If you're writing at that scale, by all means copy the
| necessary practices, but don't just casually assume your need
| is super large and you've just gotta use the hard-core lock-
| based concurrency because it's the only option for you. You
| can get a _long_ way with much nicer architecture that is
| only slightly slower.
|
| For instance, I write a lot of Go. I've sometimes gotten a
| bit nervous about writing a goroutine (/thread) that owns
| some particular data structure because I'm like "wow, a lot
| of things are going to access this, is this going to become a
| bottleneck in the system?" But every time I've run the
| analysis carefully, the answer generally comes back not just
| "no" but "heck no, not even close". Your intuition can
| deceive you very easily. Locking a single structure behind
| the limit that "we can only dedicate at most 1 CPU full time
| to maintaining this structure" is, in practice, far less
| scary than it sounds. 1 CPU is a lot of power and you
| generally need a lot of other CPUs trying to access it
| simultaneously to overwhelm it.
|
| Amdahl's Law kinda works in your favor on this matter (for
| once); for something like this to be the bottleneck, it needs
| to approach accounting for (1/(CPUs - 1)) of your execution
| time, and for that to be the case, those other processes
| hammering this 1-CPU-locked data structure need to be doing
| whatever else they are doing very quickly. As with anything
| in discussions like this, it absolutely can happen! But
| always check your intuition concretely before going crazy
| adopting something more complicated and "higher performance";
| odds are good this isn't a problem for a given "random" data
| structure.
| macintux wrote:
| > In this approach, threads own their data, and communicate with
| message-passing. This is easier said than done, because the
| language constructs, primitives, and design patterns for building
| system software this way are still in their infancy.
|
| "Infancy"? Erlang was invented in the 80s.
| tonnydourado wrote:
| OP does say system software, and although Erlang can have
| amazing uptime, I'm not so sure about its raw performance
| story (on the other hand, I do remember seeing some insane
| benchmarks from an Elixir framework, so maybe I'm wrong).
|
| Also, other than Erlang and Elixir, which other reasonably
| mainstream language has this as a first-class feature? Even Rust
| doesn't really put queues and message passing front and center;
| it just makes it easier to get away with doing _traditional_
| thread programming.
|
| Things can be old in years since discovery/invention, but still
| in their infancy in usability and adoption.
| Lammy wrote:
| > Also, other than Erlang and Elixir, which other reasonably
| mainstream language has this as first class features?
|
| Ruby 3.0+ with Ractors: https://docs.ruby-
| lang.org/en/master/ractor_md.html#label-Co...
| kubanczyk wrote:
| Go has channels (`chan`) which it considers more native than
| e.g. Mutex. The latter is in a stdlib, the former is a
| feature of the language proper.
|
| Alas, the Go ecosystem could use more channels. I'd say that
| the adoption is still at the infancy stage. I wonder whether
| there are any practical shortcomings or is it just a
| readability tradeoff (a Mutex tends to be super-readable,
| whereas channels often aren't).
| moonchild wrote:
| And the CSP paper dates to 1978.
| bjourne wrote:
| Truth be told, it is trivially easy to create deadlocks even in
| a pure message-passing environment such as Erlang. Message-
| passing is way oversold and doesn't solve as many problems as
| people think.
| TheMagicHorsey wrote:
| Erlang made it pretty easy.
|
| But then not everyone wants to learn Erlang. And if you start
| writing NIFs all crazy, you might get yourself into trouble
| again.
|
| But it seems tractable.
|
| Am I being naive?
| hereforphone wrote:
| No, you're absolutely right: Erlang is magnificent and
| superior.
| ghufran_syed wrote:
| I have very little experience of writing concurrent code, but
| this seems relevant:
| https://clojure.org/about/concurrent_programming
| xigency wrote:
| I don't really support this view.
|
| While doing anything creative with threading is in fact
| challenging, in Java it's easy to safely boost performance with
| threading using parallel streams and concurrent data structures.
|
| Even in other languages like C/C++, it's mostly worth it to
| design something up-front that is naturally parallelizable and
| find a safe and correct way to do it.
|
| And the incentive to parallelize is only increasing with the
| increasing thread counts of newer hardware. Sure, it's scary, but
| there's really no excuse to avoid running with threading,
| multiple processes, or multiple services.
| jayd16 wrote:
| In my experience I saw a bunch of novice devs using parallel
| streams for N<=100 lists and such, for much worse performance.
| It's certainly not foolproof either, or it would just be the
| default.
| marginalia_nu wrote:
| To be fair, whether parallel streams are useful isn't a
| function of N alone, but also of the time each operation takes.
| If it's a nontrivial calculation, then by all means.
|
| Although parallel streams are seldom the best option in terms
| of performance even for large N. They may often be faster
| than sequential processing for large N, but a more thought
| out approach is almost always faster still, often
| significantly so.
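A small sketch of that tradeoff (the workload and limit are invented for illustration): `parallel()` pays off here only because each element does nontrivial work, a deliberately naive trial-division primality test.

```java
import java.util.stream.LongStream;

// Parallel streams pay off when per-element work is nontrivial;
// here each element runs an expensive (naive) primality test.
public class ParallelDemo {
    static boolean isPrime(long n) {
        if (n < 2) return false;
        for (long d = 2; d * d <= n; d++)
            if (n % d == 0) return false;
        return true;
    }

    static long countPrimes(long limit) {
        return LongStream.rangeClosed(2, limit)
                .parallel() // worthwhile: each isPrime call is costly
                .filter(ParallelDemo::isPrime)
                .count();
    }

    public static void main(String[] args) {
        System.out.println(countPrimes(100_000)); // prints 9592
    }
}
```

For a small list of cheap operations, the fork/join overhead of `.parallel()` can easily exceed the work itself, which is the failure mode described above.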
___________________________________________________________________
(page generated 2022-06-10 23:00 UTC)