[HN Gopher] The free lunch is over: a fundamental turn toward co...
       ___________________________________________________________________
        
       The free lunch is over: a fundamental turn toward concurrency in
       software (2005)
        
       Author : pcr910303
       Score  : 105 points
        Date   : 2021-06-27 07:14 UTC (1 day ago)
        
 (HTM) web link (www.gotw.ca)
 (TXT) w3m dump (www.gotw.ca)
        
       | amelius wrote:
       | If only GPUs were more common and easier to work with.
        
         | Const-me wrote:
         | > If only GPUs were more common
         | 
         | They are very common.
         | 
          | In 2021 on Windows, it is safe to assume the GPU supports at
          | least D3D 11.0. The last GPU which does not is Intel Sandy
          | Bridge, discontinued in 2013. D3D 11.0 supports lots of GPGPU
          | features. The main issue with D3D11 is that FP64 support is
          | optional; the hardware may or may not have it.
         | 
         | > and easier to work with
         | 
          | I don't find either DirectCompute 11 or CUDA exceptionally
          | hard to work with. D3D12 and Vulkan are indeed hard, IMO.
        
           | amelius wrote:
            | > I don't find either DirectCompute 11 or CUDA
            | exceptionally hard to work with.
           | 
            | It still sucks that, if you're not careful, or just
            | unlucky, GPUs may interfere with normal video operation
            | during setup.
           | 
           | Also on one machine I'm getting "GPU lost" error messages
           | every once in a while during a long computation, and I have
           | no other option than to reboot my machine.
           | 
           | Further, the form factor sucks. Every card comes with 4 or
           | more DVI connectors which I don't need.
           | 
           | If parallel is the future, then give me something that fits
           | on the mainboard directly, and which is more or less a
           | commodity. Not the proprietary crap that vendors are
           | currently shoving at us.
        
             | Const-me wrote:
             | > on one machine I'm getting "GPU lost" error messages
             | every once in a while during a long computation, and I have
             | no other option than to reboot my machine.
             | 
              | If that's your computer, just change the registry setting
              | that disables the TDR. By default, Windows uses 2 seconds
              | to limit max. pipeline latency. When that's exceeded, the
              | OS resets the GPU, restarts the driver, and logs that
              | "device lost" in the event log.
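              | 
              | For reference, the documented knobs live under this key
              | (TdrLevel 0 turns timeout detection off entirely,
              | TdrDelay just raises the limit; a reboot is required -
              | double-check against Microsoft's TDR docs):
              | 
              |     HKLM\System\CurrentControlSet\Control\GraphicsDrivers
              |       TdrLevel (REG_DWORD): 0 = timeout detection off
              |       TdrDelay (REG_DWORD): timeout in seconds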
             | 
              | If that's your customer's computer, you have to be more
              | creative and fix your compute shaders. When you have a lot
              | of things to compute and/or you detect underpowered
              | hardware, split your Dispatch() calls into a series of
              | smaller ones. As a nice side effect, this minimizes the
              | effect of your application on any 3D rendering done on
              | the same GPU by other programs.
             | 
              | Don't submit them all at once. Submit two initially, then
              | one at a time, using ID3D11Query to track completion of
              | these compute shaders. ID3D11Fence can do that better (it
              | can sleep on WaitForSingleObject, saving electricity),
              | but fences are unfortunately less compatible: that's an
              | optional feature introduced post-release in some Win10
              | update, and e.g. VMWare SVGA 3D doesn't support fences.
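              | 
              | For illustration, a rough C++ sketch of that pattern with
              | an event query standing in for a fence (the chunk size is
              | made up, shader/buffer setup and real error handling are
              | omitted, and it waits per chunk instead of keeping two in
              | flight):
              | 
              |     #include <d3d11.h>
              | 
              |     // Split one big compute workload into smaller
              |     // Dispatch() calls so no single submission comes
              |     // anywhere near the 2-second TDR limit.
              |     void dispatchInChunks( ID3D11Device* dev,
              |         ID3D11DeviceContext* ctx, UINT totalGroups )
              |     {
              |         const UINT chunk = 1024;  // made up; tune per GPU
              |         D3D11_QUERY_DESC qd = {};
              |         qd.Query = D3D11_QUERY_EVENT;
              |         for( UINT i = 0; i < totalGroups; i += chunk )
              |         {
              |             ctx->Dispatch( min( chunk, totalGroups - i ),
              |                 1, 1 );
              |             // The event query completes once the GPU has
              |             // finished everything submitted before End()
              |             ID3D11Query* q = nullptr;
              |             if( FAILED( dev->CreateQuery( &qd, &q ) ) )
              |                 return;
              |             ctx->End( q );
              |             BOOL done = FALSE;
              |             while( S_FALSE == ctx->GetData( q, &done,
              |                     sizeof( done ), 0 ) )
              |                 Sleep( 1 );  // a fence could sleep here
              |             q->Release();
              |         }
              |     }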
             | 
             | > the form factor sucks. Every card comes with 4 or more
             | DVI connectors which I don't need.
             | 
              | Some datacenter GPUs and newer mining GPUs come without
              | connectors. The mainstream commodity cards do have
              | connectors because the primary market for them is supposed
              | to be gamers.
             | 
             | > something that fits on the mainboard directly
             | 
              | Unlikely to happen, because of thermals. Modern high-end
              | GPUs consume more electricity than comparable CPUs: e.g.
              | an RTX 3090 needs 350W, while a similarly priced EPYC
              | 7453 needs 225W.
        
               | amelius wrote:
               | Thanks for the advice (I'm on Linux), but my point is
               | that the GPU really is not the "first-class citizen" that
               | the CPU is.
        
               | Const-me wrote:
               | > my point is that the GPU really is not the "first-class
               | citizen" that the CPU is.
               | 
                | On Windows, the GPU has been a first-class citizen
                | since Vista, where Microsoft started to use D3D10 for
                | their desktop compositor. In Windows 7 they upgraded to
                | Direct3D 11.
                | 
                | The transition wasn't smooth. Only gamers had 3D GPUs
                | before Vista, so many people needed new computers.
                | Technically, Microsoft had to change the driver model
                | to support a few required features.
                | 
                | On the bright side, the XP->Win7 transition is now long
                | in the past, and 3D GPUs are used for everything on
                | Windows. All web browsers use D3D to render content,
                | albeit not directly but through higher-level libraries
                | like Direct2D and DirectWrite.
               | 
                | Linux doesn't even have these higher-level libraries.
                | They are possible to implement on top of whichever GPU
                | API is available
                | (https://github.com/Const-me/Vrmac#vector-graphics-engine)
                | but so far nobody has done it well enough.
                | 
                | It's a very similar situation with GPU compute on
                | Linux. The kernel and driver support has arrived by
                | now, but the higher-level user-mode pieces are still
                | missing.
               | 
               | P.S. If you're on Linux, pretty sure you can still
               | refactor your compute shaders the same way and they will
               | work fine afterwards. You obviously don't have
               | ID3D11Query/ID3D11Fence but most GPU APIs have
               | replacements: VkFence(3) in Vulkan,
               | EGL_KHR_fence_sync/GL_OES_EGL_sync/VG_KHR_EGL_sync
               | extensions for GL and GLES, etc.
        
             | theandrewbailey wrote:
             | > Further, the form factor sucks. Every card comes with 4
             | or more DVI connectors which I don't need.
             | 
             | DVI connectors are huge! I've never seen a card with more
             | than 2 of them.
             | 
             | You working with something more exotic than a PCI-e card?
        
       | snickerer wrote:
       | I agree with this article a lot.
       | 
        | The layers upon layers of software bloat have grown along with
        | hardware speed. The general trade-off is: we make the software
        | as slow as possible in exchange for the shortest development
        | time with the least educated programmers. "Hey web dev intern,
        | click me a new UI in the next two hours! Hurry!"
        | 
        | This intern-clicks-a-new-UI approach works because we have a
        | dozen layers of incomprehension-supporting technologies. You
        | don't need to know how most parts of the machine work; several
        | libraries, VMs, and frameworks will make you a bed of roses.
        | 
        | My point is that we are overdoing it with the convenience for
        | developers. Today there is way too much complexity and bloat
        | in our systems. And there are not enough programmers trained
        | to handle memory management and similar low-level tasks. Or
        | their bosses wouldn't allow it, because of the deadline, you
        | know.
       | 
       | I think the general trend is bad because there is no free lunch.
       | No silver bullet. Everything is a trade-off. For example C is
       | still important because C's trade-off between programming
       | convenience and runtime efficiency is very good. You pay a lot
       | and you get a lot.
       | 
       | This is also true for parallel programming. To write highly
       | efficient parallel code you need skill and education. No silver
       | bullet tooling will make this less "hard".
       | 
        | And here I see the irony. Faster CPUs were used to let lower
        | educated devs deliver quicker. More parallel CPUs need higher
        | skilled devs working slower to utilize the chip's full
        | potential.
        
         | kaliszad wrote:
          | The whole software vs hardware and developer vs runtime
          | efficiency discussion is way more nuanced than that. No, the
          | layers as they currently stand are at least partially
          | hindering productivity, because they all have bugs, have
          | rather painful limitations, and are constantly changing under
          | your feet. There is so much complexity that you don't have a
          | chance to understand it (and its implications) completely,
          | from the click in a client's web browser all the way to the
          | instructions executed on the server. This is not convenient,
          | and nobody (not even Google, Facebook and others) can handle
          | the complexity, which manifests itself in bugs, weird
          | behaviour, surprising security holes, other rare defects and
          | more every day.
         | 
          | You don't want to manage memory manually unless you
          | absolutely have to. It is almost certainly going to be way
          | more demanding, you will make mistakes, and the performance
          | will not really be much better in whole-system benchmarks
          | for something like 95% of applications. It is like using
          | mining equipment and explosives to hang a picture on your
          | concrete wall: totally over the top. I am also not denying
          | that there are use cases where most of the responsibility is
          | on you, e.g. embedded or infrastructure software. Most
          | people just aren't aware that they need the skill and
          | understanding to reliably and, most importantly, responsibly
          | wield it. In both cases, there is considerable room for
          | better, more robust tools, better methods/practices, and a
          | great reduction in developer hubris.
         | 
          | Some have used better hardware to save on development effort
          | to some degree. There are applications that really need the
          | extra performance to help users do their work in a more
          | convenient way - not just developers. Some of the
          | performance is lost to systemic inefficiencies such as the
          | web browser and most related technologies. E.g. SVG
          | rendering in web browsers is a minefield of bugs and bad
          | performance, so you do the rendering yourself using canvas
          | and JavaScript. Visually that is the same thing, but it is a
          | lot more work for the developer, and e.g. printing/export to
          | PDF (where you cannot just recalculate) is where you still
          | kind of want the vector graphics capability in some form.
          | Virtualization has enabled slicing of cores to serve more
          | customers with low-cost and readily available compute. We
          | also have way more users and different security expectations
          | than we used to, so a single machine has to handle and check
          | more stuff - or we can use a single machine for things that
          | would have needed a data centre, or wouldn't have been
          | possible/feasible before.
        
         | vmchale wrote:
         | > More parallel CPUs need higher skilled devs working slower to
         | utilize the chip's full potential.
         | 
         | Parallelism isn't so bad, you can use a functional style a lot
         | of the time! E.g. Futhark, the accelerate framework, APLs....
        
           | dragontamer wrote:
            | From what I can tell... the programmers who made huge
            | contributions to parallel compute are those who understood
            | the hardware, at least in broad strokes.
            | 
            | It didn't matter if they were Java programmers (Azul), Lisp
            | programmers (the 1980s CM-2: Steele / Hillis), GPU
            | programmers, or C programmers. If you understood the
            | machine, you figured out a fast and concurrent solution.
        
         | alfalfasprout wrote:
         | The point about skill is absolutely key.
         | 
         | Two types of engineer now exist: Those that are largely "code
         | monkeys" implementing features using high level tooling and
         | guard rails and those that are responsible for building that
         | tooling and low level components. FWIW I've met plenty of
         | engineers even from FAANG companies with fairly limited
         | exposure to writing performant software.
         | 
          | Over time, the former bucket has grown significantly as
          | barriers to entry have dropped, but this has led to extremely
          | bloated and inefficient codebases. Deployment patterns like
          | microservices running on container orchestration platforms
          | mean that it's pretty easy to scale bad code horizontally,
          | and these "high level frameworks" are generally "good enough"
          | in terms of per-request latency. So the only time efficiency
          | ever becomes a concern for most companies is when cost
          | becomes an issue.
         | 
          | It'll be interesting to see how this all unfolds. I wouldn't
          | be surprised if the huge incoming supply of unskilled
          | engineers causes compensation to drop significantly in
          | general.
        
           | thewarrior wrote:
           | To save money on infrastructure you need to pay your
           | engineers more. Companies are choosing to hire cheaper
           | engineers as they are the ones that are harder to replace.
        
       | rustybolt wrote:
        | People have been saying this for decades, and while it's true,
        | concurrency is still widely regarded as 'too hard'.
        | 
        | I'm not sure if this is justified (i.e. concurrency is
        | inherently too hard to be viable), or due to a lack of
        | tooling/conventions/education.
        
         | nicoburns wrote:
          | > I'm not sure if this is justified (i.e. concurrency is
          | inherently too hard to be viable), or due to a lack of
          | tooling/conventions/education.
         | 
          | I think it's the tooling. Rust's modelling of concurrency in
          | its type system (the Send and Sync traits) makes concurrency
          | pretty straightforward for most use cases. You still have to
          | be super-careful when creating the core abstractions using
          | unsafe code, but once you have them they can easily be
          | shared as libraries, and it's a compile error to violate
          | their invariants. This means that most projects never have
          | to write the hard parts themselves, and get concurrency
          | close to free.
        
         | pjmlp wrote:
          | I will go with lack of education as the main issue.
        
           | eternalban wrote:
            | Memory models are subtle. Temporal reasoning in the
            | context of h/w (e.g. multi-core coherence) and of the
            | language runtime's memory model is non-trivial. So there
            | is a baseline level of difficulty baked into the domain.
            | Education certainly is necessary, but it can only inform
            | you of what needs to be considered: known pitfalls,
            | patterns of concurrency, etc.
            | 
            | As to the OP: well, it had better be viable, because we
            | certainly need to deal with it. So, better tooling, and
            | conventions encapsulated in expertly developed libraries.
            | The education level required will naturally split into two
            | categories: for those who develop the tools/libraries,
            | and for those who use them.
        
             | rbanffy wrote:
              | > Temporal reasoning in the context of h/w (e.g.
              | multi-core coherence) and of the language runtime's
              | memory model is non-trivial.
              | 
              | Don't OSes expose that, in the sense that you can pin a
              | thread to the closest cores according to the memory
              | access pattern you need?
        
           | kwhitefoot wrote:
           | Lack of real need I think.
           | 
           | Most of the computers in the world are either dedicated
           | embedded controllers or end user devices. Concurrency in
           | embedded controllers is pretty much an ordinary thing and has
           | been since the days of the 6502/Z80/8080. For end user
           | devices the kind of concurrency that matters to the end user
           | is also not extraordinary, plenty of things happen in the
           | background when one is browsing, word processing, listening
           | to music, etc.
           | 
           | So that leaves concurrency inside applications and that just
           | isn't something that affects most of the end users. There
           | really isn't much for a word processor to actually do while
           | the user is thinking about which key to press so it can do
           | those few things that there was not time for during the
           | keypress.
           | 
            | Mostly what is needed is more efficient code. Niklaus
            | Wirth was already complaining in 1995 that code was
            | getting slower more quickly than hardware was getting
            | faster, and it seems that he is still right.
            | 
            | See
            | https://blog.frantovo.cz/s/1576/Niklaus%20Wirth%20-%20A%20Pl...
        
             | dvfjsdhgfv wrote:
             | > So that leaves concurrency inside applications and that
             | just isn't something that affects most of the end users.
             | There really isn't much for a word processor to actually do
             | while the user is thinking about which key to press so it
             | can do those few things that there was not time for during
             | the keypress.
             | 
             | This obviously depends on the application. Whenever you
             | need to wait for an app to finish an operation that is not
             | related to I/O, there is some potential for improvement. If
             | a CPU-bound operation makes you wait for more than, say, a
             | minute, it's almost definitely a candidate for
             | optimization. Whether multithreading is a good solution or
             | not depends on each case - when you need to
             | communicate/lock a lot, it might not make sense. A good
             | part of the solution is figuring out if it makes sense, how
             | to partition the work etc.; the other part of this hard
             | work is implementing and debugging it.
        
             | pjmlp wrote:
              | Interesting that you mention Wirth, since all his
              | programming languages from Modula-2 onwards do expose
              | concurrency and parallelism.
        
         | [deleted]
        
         | spacechild1 wrote:
         | > concurrency is still widely regarded as 'too hard'.
         | 
         | The question is: by whom?
         | 
         | High performance software, like game engines, DAWs or video
         | editors, has been heavily multithreaded for a while now.
         | 
         | Maybe it's consumer or business software that could profit from
         | more multithreading? I don't know, because I don't work in
         | those areas.
        
           | criddell wrote:
            | There's been a lot of work done on C++ concurrency. Some
            | higher-level libraries that leverage recently added
            | primitives are emerging, and they look pretty cool.
           | 
           | For example: https://github.com/David-Haim/concurrencpp
        
           | tuyiown wrote:
            | The hard part is improving perceived performance or
            | resource overhead for general software, where each task's
            | outcome is tightly coupled with app state - e.g.
            | parallelizing a slow task on a button click while
            | guaranteeing that the final state stays coherent. Nothing
            | out of reach, but doing that without compromising too much
            | on maintainability is still an open question.
        
           | bartread wrote:
           | > The question is: by whom?
           | 
           | I do feel similarly, even though I wouldn't classify myself
           | as a great engineer.
           | 
           | I've been writing concurrent software in managed languages,
           | such as Java and C#[0], from the very beginning of my career,
           | up until today. The level of multithreading has varied, but
           | it's always been there. For anything beyond basic CRUD it
           | pretty much becomes a requirement, both on desktop and on the
           | web[1].
           | 
           | That doesn't mean I've never had a tricky race condition to
           | debug (and, yes, they're hard to debug) during development,
           | but I've never shipped a concurrency related bug to
           | production[2].
           | 
           | The canonical examples of concurrency gone wrong are things
           | like giving somebody a deadly radiation dose from a medical
           | device but, in terms of serious software bugs, I do wonder
           | how common concurrency bugs are relative to other types of
           | bug, and whether they're really more serious in aggregate
           | than those other types of bug.
           | 
           |  _[0] Admittedly these languages make it a lot easier to
           | avoid shooting yourself in the foot than C and C++ do._
           | 
            |  _[1] Also worth bearing in mind that an inherent property
            | of most, if not all, distributed software is that it's
            | also concurrent: the moment you have multiple processes
            | running independently or interdependently you also usually
            | have a concurrent system, with the potential for
            | "distributed race conditions" to occur. I.e., if you have
            | a SPA that also has a non-trivial back-end, you have a
            | concurrent system - just spread across different
            | processes, generally on different machines._
           | 
           |  _[2] In the context of in-process concurrency. Distributed
           | concurrency is a different matter._
        
         | dmytroi wrote:
          | My 5 cents would be wrong abstractions and wrong tooling;
          | let me elaborate a bit on them:
          | 
          | - Current abstractions are mostly based on POSIX threads and
          | the C++/Java memory models. I think they represent poorly
          | what actually happens in hardware. For example, an acquire
          | barrier in C++ makes it really hard to see that in hardware
          | it amounts to flushing the invalidation queue of the core.
          | To sanity-check your understanding, try answering the
          | question: "do you need memory barriers in a multithreaded
          | application (2+ threads) running a lock-free algorithm on a
          | single core?". The correct answer is no, because the same
          | core always sees its own writes as if they happened in
          | program order, even if the OOO pipeline reorders them. Or
          | threads: they seem to be entities that can either run or
          | stop, but in hardware there are no threads; the CPU just
          | jumps to a different point in memory (albeit through rings
          | 3->0->3). Heck, even the whole memory allocation story: we
          | have generations of developers thinking about memory
          | allocation and its safety, yet hardware doesn't have that
          | concept at all; memory range mapping would be the closest to
          | what the MMU actually does. This impedance mismatch between
          | hardware and the current low-level abstractions has created
          | a lot of developers who "know" how all of this works but
          | don't actually know, and are a bit afraid to crash through
          | the layers. I want more engineers to not be afraid, and to
          | be comfortable with all the low-level bits even if they
          | never touch them, because one day you will find a bug like a
          | broken compare-exchange implementation in LLVM or similar
          | (a small sketch of the barrier point follows at the end of
          | this comment).
         | 
          | - Tooling is way off. The main problem with multithreading
          | is that it's all "at runtime" and dynamic. For example, if
          | I'm making a lock-free hashmap, the only way for me to get
          | into the edge cases of my algorithm (like two threads trying
          | to acquire the same token or something) is to run a
          | brute-force test across multiple threads and wait until it
          | actually happens. Brute-force-test development scales very
          | poorly, and testing something like consensus algorithms
          | across hundreds of threads is just a nightmare of
          | test-fixture complexity. Then you get into: OK, so how much
          | testing is enough? How do you measure coverage? Lines of
          | code? Branches? Threads-per-line? When are you sure that
          | your algorithm is correct? Don't get me wrong, I've seen it
          | multiple times: a simple 100 lines of code passes tons of
          | reviews, only for me to find a race condition (an
          | algorithmic one) half a year later, and by then it's
          | deployed everywhere and very costly to fix. Another way
          | would be to skip all of that and model your algorithms
          | first; TLA+ is one of the better tools out there for that:
          | prove that your model is correct, and then implement it.
          | Using something like TLA+ can make your multithreaded coding
          | a breeze in any language.
         | 
          | And probably the absence of transactional memory also
          | contributes greatly: passing 8 or 16 bytes around atomically
          | is easy, but try 24 or 32? Now you need to build an insanely
          | complicated algorithm that involves a lot of mathematics
          | just to prove that it's correct.
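          | 
          | To ground the barrier point above in code, here is the
          | textbook release/acquire pairing in C++ - a minimal sketch,
          | nothing more:
          | 
          |     #include <atomic>
          |     #include <cassert>
          |     #include <thread>
          | 
          |     int payload = 0;            // plain, non-atomic data
          |     std::atomic<bool> ready{ false };
          | 
          |     int main()
          |     {
          |         std::thread producer( []{
          |             payload = 42;       // 1: write the data
          |             // 2: publish; the release store orders the
          |             // write above before the flag flip
          |             ready.store( true, std::memory_order_release );
          |         } );
          |         std::thread consumer( []{
          |             // 3: once the acquire load sees true, every
          |             // write made before the release is visible
          |             while( !ready.load( std::memory_order_acquire ) )
          |                 ;
          |             assert( payload == 42 );  // holds on any core
          |         } );
          |         producer.join();
          |         consumer.join();
          |     }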
        
         | mkl95 wrote:
          | The rise of async programming in backend web dev is making
          | some people even more confused about concurrency models. For
          | instance, many senior engineers out there don't understand
          | the difference between sync multithreaded and async
          | single-threaded.
        
           | Xunjin wrote:
            | I kind of doubted what I knew, so I looked for a nice SO
            | link for those who don't know the difference :)
           | https://stackoverflow.com/a/34681101
        
         | jerf wrote:
         | "concurrency is still widely regarded as 'too hard'."
         | 
         | Is it? There isn't going to be an official declaration from the
         | Masters of Computer Science that "2019 was the year concurrency
         | ceased being Too Hard." or anything.
         | 
         | My perception is that it is steadily becoming less and less
         | notable for a program to be "concurrent". Tooling is becoming
         | better. Common practices are becoming better. (In fact, you
         | could arguably take common practices back to the 1990s and even
         | with the tools of the day, tame concurrency. While the tooling
         | had its issues too, I would assert the problem was _more_ the
         | practices than the tooling.) Understanding of how to use it
         | reasonably safely is steadily spreading, and runtimes that make
         | it safer yet are starting to get attention.
         | 
          | I'm not sure I've seen a case in a long time where there was
          | a problem that _ought_ to be using concurrency, but nobody
          | involved could figure out any way to deal with it, or was
          | too afraid to open that door. There's still plenty of cases
          | where it doesn't matter even now, of course, because one
          | core is a lot of computing power on its own. But it seems to
          | me that everyone out there who would benefit from
          | concurrency is _mostly_ capable of using it nowadays. Not
          | necessarily without issue, but that's an unfair bar; you can
          | still get yourself in concurrency trouble in Haskell or
          | Erlang, but it's a lot easier than it used to be to get it
          | right.
        
         | samuell wrote:
         | I see a lot of potential in pipeline concurrency, as seen in
         | dataflow (DF) and flow-based programming (FBP). That is,
         | modeling computation as pipelines where one component sends
         | data to the next component via message passing. As long as
         | there is enough data it will be possible for multiple
         | components in the chain to work concurrently.
         | 
          | The benefits are that no synchronization is needed other
          | than the data sent between processes, and race conditions
          | are ruled out as long as only one process is allowed to work
          | on a data item at a time (this is the rule in FBP).
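          | 
          | To show the shape (a toy sketch, in C++ for once, with a
          | minimal queue standing in for a Go channel; stages and names
          | are made up):
          | 
          |     #include <condition_variable>
          |     #include <iostream>
          |     #include <mutex>
          |     #include <queue>
          |     #include <thread>
          | 
          |     // Minimal channel: just enough for a pipeline demo.
          |     template <typename T> class Chan {
          |         std::queue<T> q;
          |         std::mutex m;
          |         std::condition_variable cv;
          |         bool closed = false;
          |     public:
          |         void send( T v ) {
          |             { std::lock_guard<std::mutex> g( m );
          |               q.push( std::move( v ) ); }
          |             cv.notify_one();
          |         }
          |         void close() {
          |             { std::lock_guard<std::mutex> g( m );
          |               closed = true; }
          |             cv.notify_all();
          |         }
          |         bool recv( T& out ) {
          |             std::unique_lock<std::mutex> l( m );
          |             cv.wait( l, [&]{ return !q.empty() || closed; } );
          |             if( q.empty() ) return false;  // closed, drained
          |             out = std::move( q.front() );
          |             q.pop();
          |             return true;
          |         }
          |     };
          | 
          |     int main() {
          |         Chan<int> raw, squared;
          |         // Stage 1: produce.
          |         std::thread gen( [&]{
          |             for( int i = 1; i <= 5; i++ ) raw.send( i );
          |             raw.close();
          |         } );
          |         // Stage 2: transform; overlaps with stage 1 as
          |         // soon as data starts flowing.
          |         std::thread sq( [&]{
          |             int v;
          |             while( raw.recv( v ) ) squared.send( v * v );
          |             squared.close();
          |         } );
          |         // Stage 3: sink, on the main thread.
          |         int v;
          |         while( squared.recv( v ) ) std::cout << v << "\n";
          |         gen.join();
          |         sq.join();
          |     }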
         | 
          | The main blocker, I think, is that it requires quite a
          | rethink of the architecture of software. I see this rethink
          | happening in larger, especially distributed, systems, which
          | are already modeled a lot around these principles, using
          | systems such as Kafka and message queues to communicate -
          | which more or less forces people to model computations
          | around the data flow.
         | 
         | I think the same could happen inside monolithic applications
         | too, with the right tooling. The concurrency primitives in Go
         | are superbly suited to this in my experience, given that you
         | work with the right paradigm, which I've been writing about
         | before [1, 2], and started making a micro-unframework for [3]
         | (though the latter one will be possible to make so much nicer
         | after we get generics in Go).
         | 
         | But then, I also think there are some lessons to be learned
         | about the right granularity for processes and data in the
         | pipeline. Due to the overhead of message passing, it will not
         | make sense performance-wise to use dataflow for the very
         | finest-grain data.
         | 
         | Perhaps this in a sense parallels what we see with distributed
         | computing, where there is a certain breaking point before which
         | it isn't really worth it to go with distributed computing,
         | because of all the overhead, both performance-wise and
         | complexity-wise.
         | 
          | [1] https://blog.gopheracademy.com/composable-pipelines-pattern/
          | 
          | [2] https://blog.gopheracademy.com/advent-2015/composable-pipeli...
          | 
          | [3] https://flowbase.org
        
       | jmchuster wrote:
       | It seems to me that where we really ended up was distributed
       | systems. We solve problems by not just making our code concurrent
       | to use more cores, but by also making it use more computers.
        
         | r00fus wrote:
          | There are security and latency issues that make distributing
          | over multiple cores != multiple processors != multiple
          | computers.
         | 
         | Each step adds more complexity.
        
       | rkangel wrote:
       | I find it interesting that the language that best deals with
       | parallelism (IMO) was invented significantly before we moved in
       | the multi-core direction.
       | 
       | The programming language landscape evolved and developed in the
        | face of multi-core - async being a classic example. But the
       | language that's often held up as the best solution to any given
       | parallelism problem is Erlang. Erlang was built as a good
       | programming model for concurrency on a single core, and then when
       | multi-core came along, SMP was 'just' a VM enhancement and no
       | programs needed changing at all to get the full advantage of all
       | the cores (barring some special cases).
        
         | chrisseaton wrote:
         | > no programs needed changing at all to get the full advantage
         | of all the cores (barring some special cases)
         | 
         | Some programs just don't have any available parallelism in them
         | - no matter how many processes you split them up into. You need
         | to build parallel algorithms and data structures. That's still
         | often a research-level topic.
         | 
         | Erlang's not going to magic parallelism out of thin air in a
         | program that doesn't have any.
        
           | lostcolony wrote:
           | Kind of strawmanning here, no? Obviously sequential tasks are
           | run sequentially. The OP isn't saying "magic" happens. Just,
           | Erlang was written from the beginning to encourage
           | parallelized code. Its whole programming model is around
           | that. Unlike pretty much every language it was contemporary
           | with, which assumed that the default was a single unit of
           | execution (and that code running in parallel was the
           | exception).
           | 
           | Think of task scheduling; the normal Algol/C family of
           | languages would say "create a priority queue for your tasks,
           | wait until the top of the queue is ready to run, pull it off,
           | run it, repeat". Very sequential thinking. Of course, you
           | then end up with issues around what happens if the processes
           | take non-negligible amounts of time, what happens if you need
           | to add new items into the queue, etc. At the lowest level
           | that might be solved with interrupts, at a higher level it
           | might be threads or separate processes, but you still have to
           | worry about concurrent data access (locks and the like around
            | the queue). But both the general approach and the 'best
            | practice' guidelines, especially prior to multi-core (due
            | to the difficulties of shared data), were/are to minimize
            | concurrency.
           | 
           | Erlang, you just spin up every task as its own process. The
           | process sleeps until it's time to do something, then it does
           | it. The language encourages you to solve the problem in a
           | parallel manner, and it did so from the beginning.
           | 
           | So, yes, you "need to build parallel algorithms and data
           | structures". The point is, Erlang took advantage of them, as
           | a language, even before the hardware could. The way you would
           | write Erlang for a single core was the way you would write
           | Erlang for many cores, which was unique for a language at the
           | time, and the OP is arguing that not only was Erlang unique
           | in doing that, but that there really isn't a better model for
           | handling parallelism from a language and correctness
           | standpoint (presumably; from a performance perspective m:n
           | threading, of which Erlang actors are a form, is generally
           | agreed upon to lose some efficiency).
        
             | chrisseaton wrote:
             | > Kind of strawmanning here, no?
             | 
             | I don't think so? I'm responding to:
             | 
             | > no programs needed changing at all to get the full
             | advantage of all the cores (barring some special cases)
             | 
             | Those 'special cases' are all the algorithms which aren't
             | inherently parallel, which is most of them! How many
             | algorithms inherently have a large amount of natural
             | parallelism? I'd say it's very few.
             | 
              | For example, no matter how you write MD5 in Erlang...
              | it's not going to run in parallel, is it?
        
               | dTal wrote:
               | Who says "most" algorithms aren't inherently parallel?
               | How do you even enumerate that set? What counts as an
               | "algorithm" anyway? Parallel processing is the norm in
               | biology. And even if we grant that most of the algorithms
               | humans have devised to run on our von Neumann machines
               | are sequential - doesn't that say more about us, than
               | about some intrinsic property of algorithms?
        
               | chrisseaton wrote:
               | I mean just take a look in any book of algorithms and
               | think for yourself how many have inherent parallelism.
        
               | lostcolony wrote:
               | As I said before - 'Obviously sequential tasks are run
               | sequentially.'
               | 
               | Innately sequential algorithms will still be sequential.
               | Erlang already gave developers the tools, and incentive,
               | to write things in a parallelized fashion. If a developer
               | had a need that was directly sequential, it will, of
               | course, still be sequential. If they had a need that
               | would benefit from parallelism, they had the tools, the
               | context, and the encouragement from the language itself
               | to write it in parallel.
               | 
               | Again, I think you're strawmanning here. You kind of
               | implied it yourself with your earlier statement,
               | "Erlang's not going to magic parallelism out of thin air
               | in a program that doesn't have any". Saying that implies
               | you think that's the OP's claim.
               | 
               | As I said before, the OP wasn't claiming magic happens,
               | and algorithms written serially suddenly become parallel.
               | You can, of course, still write sequential code in
               | Erlang, same as every language, because execution takes
               | place in time. Just that the natural, ergonomic way of
               | writing the language, of solving problems in the
               | language, has always been parallel.
               | 
               | And if you mean "well, I run MD5 checksums on individual
               | blocks of the file, and parallelize it that way" - yes,
               | that is going to be one of those special cases the OP
               | mentioned, if you didn't already optimize it that way due
               | to memory pressure concerns. But if it's "run an md5
               | checksum against every file in this directory and check
               | it against a manifest", the ergonomic approach in Erlang
               | pre-multicore is to scatter/gather it across actors
               | (since any one file might fail in strange ways and you
               | want to isolate that failure and handle it predictably
               | with a supervisor), and that would parallelize "for
               | free", with no concurrency concerns. Unlike every other
               | contemporary language.
        
               | chrisseaton wrote:
               | > they had the tools, the context, and the encouragement
               | from the language itself to write it in parallel
               | 
               | No I think Erlang gives you the tools, context, and
               | encouragement to write it concurrently. After you have
               | concurrency, you may or may not have parallelism.
               | 
               | And if you don't you will have to restructure your
               | program to get it, even if your program was already
               | highly concurrent. That's contrary to what the person I
               | was replying to said.
        
               | lostcolony wrote:
               | If you actually have concurrency, and you have the
               | hardware able to handle it, why do you not have
               | parallelism?
               | 
               | I.e., simply spinning up an actor obviously does not
               | imply concurrency; the classic Erlang 'ping pong' example
               | shows processes waiting to receive a message and
               | responding when they do. Not actually concurrent, despite
               | multiple actors and multiple processor cores it's a
               | sequential program, not a concurrent one.
               | 
               | Likewise, if you have 5 processes all adding a list of
               | numbers (independently and sequentially, one element at a
               | time), but only 4 cores, you would have only 4 processes
               | running in parallel at a time; a hardware limit. Up the
               | cores, you can have all 5.
               | 
               | Can you give me an example of a concurrent program that
               | is not also parallel, due to non-hardware reasons, in
                | Erlang? Because this is switching the goalposts
                | around - as you
               | say, Erlang encourages you to write concurrent programs,
               | even before CPUs allowed true parallelism, and because of
               | that, once the CPU allowed true parallelism, it only
               | required a VM tweak and suddenly Erlang programs were
               | running in parallel. But you say no, that that wasn't the
               | case; just because you wrote things to be concurrent, and
               | now the hardware has changed to allow them to be in
               | parallel, they aren't necessarily parallel. I'd love an
               | example of that.
        
               | Jtsummers wrote:
               | Concurrency produces parallelism when the concurrent
               | portions can _actually_ run at the same time. That is,
               | between synchronization points which force sequential
               | behavior.
               | 
               | It's very feasible to write a concurrent program that
               | overuses (versus what is strictly needed) synchronization
               | points in order to produce something which offers a
               | better codebase (by some measure) but provides no
               | parallelism. You have to remove/minimize these
               | synchronization points in order to obtain parallelism.
               | It's also possible that by design these synchronization
               | points can't be eliminated.
               | 
               | Erlang's message passing is inherently asynchronous, but
               | you could conceive of similar synchronization points in a
               | program that will reduce the potential parallelism or
               | eliminate it depending on the overall structure. For
                | instance:
                | 
                |     PID ! {self(), Message},  % forgot self() initially
                |     receive
                |         {PID, Response} -> Response
                |     end,
                |     ...
               | 
               | You still have a concurrent design (which can be easier
               | to reason about), but because you're waiting on a
               | response you end up without any parallelism no matter how
               | many cores you throw at it.
        
               | lostcolony wrote:
               | Right, when I said "actually concurrent", I meant the
               | code -could- run simultaneously (or more formally, that
               | the items of execution aren't necessarily causally
               | related). Any sequential operation could be spread out
               | over an arbitrary number of Erlang processes, without it
               | actually being concurrent. That design isn't concurrent,
               | you just are involving a bunch of processes that wait.
               | 
               | That said, you are right that a synch point around a
               | controlled resource (i.e., "we only want to support
               | processing 5 requests at a time") would still have
               | concurrent tasks (i.e., ten requests come in; there is no
                | causal relation between when they run). Of
               | course...that's an intentional point of synchronization
               | necessary for correctness; it still strikes me as a
               | strawman to say that you don't get parallelization for
               | concurrent code, just because you forced a bottleneck
               | somewhere.
        
               | chrisseaton wrote:
               | > If you actually have concurrency, and you have the
               | hardware able to handle it, why do you not have
               | parallelism?
               | 
               | You can have concurrent tasks but with dependencies
               | between them which means there is no parallelism
               | available.
               | 
               | > Can you give me an example of a concurrent program that
               | is not also parallel, due to non-hardware reasons, in
               | Erlang?
               | 
               | For example....
               | 
               | > the classic Erlang 'ping pong' example shows processes
               | waiting to receive a message and responding when they do.
               | Not actually concurrent, despite multiple actors and
               | multiple processor cores it's a sequential program, not a
               | concurrent one.
               | 
                | No, that _is_ actually concurrent. These are two
                | concurrent tasks. They just don't have any parallelism
               | between them because there is a sequential dependency and
               | only one is able to make progress at any given time.
               | 
               | Concurrent and parallel are not the same thing, if you
               | didn't know. Concurrency is having work in progress on
               | multiple tasks at the same time. You don't need to
               | actually be advancing more than one task at the same time
               | for it to be concurrent.
        
               | lostcolony wrote:
               | "You can have concurrent tasks but with dependencies
               | between them which means there is no parallelism
               | available." - that isn't really concurrent then, is it?
               | Concurrency implies running two things concurrently;
               | you're just creating things to wait. No useful work is
               | being done. You broke a sequential task across processes.
               | Just because those processes are stuck on a receive loop
               | doesn't mean they're doing anything. Might as well say
               | your Java program is concurrent because you spun up a
               | second thread that just waits.
               | 
               | That was what I was getting at with my question;
               | arbitrary processes that do their work in a serial
               | fashion...isn't a concurrent program. It's a badly
               | written serial one.
        
               | chrisseaton wrote:
               | > that isn't really concurrent then
               | 
               | It is, using standard industry terminology.
               | 
               | Concurrency does not require that concurrent tasks are
               | able to make independent progress at the same time, just
               | that they are in some point of progress at the same time.
               | 
                | Back in 2012, when I was working on my PhD, I wrote a
                | paper about an example algorithm with very little
                | inherent parallelism, where even if you have multiple
                | concurrent tasks all making forward progress you may
                | find it extremely hard to actually produce results in
                | parallel - read it if you want to do a deep dive into
                | this problem.
                | 
                | https://chrisseaton.com/multiprog2012/seaton-multiprog2012.p...
                | 
                | I wrote a poster about it as well, for an easier
                | introduction.
                | 
                | https://chrisseaton.com/research-symposium-2012/seaton-irreg...
        
               | lostcolony wrote:
               | Thank you; the example helped clarify for me.
               | 
               | Yes, any time there are shared resources, concurrent
               | tasks can hit synchronization points, and end up
               | bottlenecking on them. Of course, I'd contend that a
               | point of synchronization is a form of serialization, but
               | I agree that the tasks being synchronized would still be
               | described as concurrent (since they don't causally affect
               | their ordering). But such a synchronization is a
               | necessary part of the algorithm you're choosing to use
               | (even prior to SMP support), or it would be completely
               | unnatural to include it. I don't know that it really
               | invalidates the OP's point, that the language didn't
               | remove the bottlenecks you introduced?
        
               | Twisol wrote:
               | > It is, using standard industry terminology.
               | 
               | Just to add some support here, concurrency definitely has
               | very little to do with actual independence of progress.
               | It has far more to do with encapsulation of knowledge
               | (for an active agent, not a passive entity like an object
               | [^]), and how different agents coordinate to exchange
               | that knowledge.
               | 
               | An echo server + client is a concurrent system, despite
               | having only one meaningful scheduling of actions over
               | time (i.e. no interleaving or parallelism). Serialization
               | and parallelism are both global properties, as they
               | require a perspective "from above" to observe, while
               | concurrency is a property local to each task.
               | 
                | I can appreciate what everyone is saying upthread, but
                | I've definitely found it valuable to think about
                | concurrency in these terms.
               | 
               | [^] Even this is overselling it. You can easily have two
               | logical tasks and one physical agent responsible for
               | executing both of them, and this is still a concurrent
               | system. Dataflow programming is a lot like this -- you're
               | not really concerned with the global parallelism of the
               | whole system, you're just describing a series of tasks
               | over local knowledge that propagate results downstream.
        
         | kitd wrote:
          |  _But the language that's often held up as the best solution
          | to any given parallelism problem is Erlang._
         | 
         | Occam implements a lot of similar ideas very neatly too. The
         | code looks a lot like programming with Go channels and CSP. I
         | am surprised it hasn't caught on better in the multi-core
         | world, it being older than even Erlang.
        
           | Jtsummers wrote:
            | I had to double check: the last Occam release was in 1994.
            | I think the language was way too niche, and global
            | communication way too poor still (compared to the '00s and
            | '10s with ubiquitous or near-ubiquitous internet access)
            | to catch on.
           | It was nearly a decade later that the first consumer
           | multicore x86 processors started coming out (2003 if I've
            | tracked down the right dates). And multicore didn't become
            | common for a few more years, and didn't become the
            | baseline until near the end of the '00s, beginning of the
            | '10s.
           | 
            | That time gap and the relative obscurity of Occam pretty
            | much doomed it, even if it would have been wildly useful.
            | By that point lots of other languages had come out, and
            | Google was pushing Go.
        
           | acchow wrote:
           | > I am surprised it hasn't caught on better in the multi-core
           | world, it being older than even Erlang.
           | 
           | Because multicore support in Ocaml has been "just a couple
           | years away" for many years now.
        
             | krylon wrote:
             | Occam, not Ocaml. They are two different languages.
             | 
             | Although I have no idea if Occam can utilize multiple
             | cores.
        
               | TickleSteve wrote:
               | Occam was designed for the Transputer, which was intended
               | to be used in arrays of processors, hence it being based
               | on CSP principles.
               | 
               | So, yes. it can.
        
               | krylon wrote:
               | TIL
               | 
               | Thank you!
        
               | acchow wrote:
               | Oops misread that.
        
         | [deleted]
        
         | ashleyn wrote:
         | The writing was on the wall as early as the beginning of the
         | Pentium 4 era.
         | 
          | There was this assumption that clock speed would just
          | continue to grow at the insane rates it did in the 80s and
          | 90s, and then reality caught up when the Netburst
          | architecture couldn't scale well enough to avoid heat issues
          | and insanely deep pipelines. When Intel rolled out the Core
          | architecture, they went back to a P6-derived design, a
          | microarchitecture dating back to 1995 - effectively an
          | admission that Netburst had failed.
        
           | lostcolony wrote:
           | Erlang was begun in 1986. It left the lab and was used for
           | production products first in 1996. The Pentium 4 came out in
           | 2000.
           | 
           | So, an interesting aside, but not sure it's relevant to the
           | parent comment.
        
           | dleslie wrote:
            | And initial forays into many cores came decades earlier
            | than that. Aside from big iron going back to the early
            | 60s, there were business-class devices like those made by
            | Sequent:
           | 
           | https://en.m.wikipedia.org/wiki/Sequent_Computer_Systems
        
       | flakiness wrote:
        | 2005: when the most important platform was the desktop, when
        | virtualization was yet to mature, and when the dominant
        | systems programming language was C++.
        | 
        | Today CPU cores are sliced up by cloud vendors to be sold in
        | smaller portions, and phones are hesitant to go many-core as
        | it would eat your battery at light speed. Dark silicon is
        | spent on domain-specific circuits like AI, media or networking
        | instead of generic cores.
        | 
        | Parallelism is still a very hard problem in theory, but its
        | practical need isn't as prevalent as we thought a decade-plus
        | ago, partly thanks to the cloud and mobile. For most of us,
        | parallelism is a kind of solved-by-someone-else problem, left
        | to a small number of experts.
       | 
       | Concurrency is still there, but the situation is much better than
       | before (async/await, immutable data types, actors...)
        
       | dragontamer wrote:
        | The free lunch isn't quite over. Although significant
        | advancements were made in parallel computing... it turns out
        | that CPUs have been able to "auto-parallelize" your sequential
        | code all along - just not nearly as efficiently as explicitly
        | parallel methodologies.
       | 
       | In 2005, your typical CPU was a 2.2GHz Athlon 64 3700+. In 2021,
       | your typical CPU is a Ryzen 5700x at 3.8 GHz.
       | 
       | Single-threaded performance improved by far more than that
       | 72% clock difference, however. The Ryzen 5700x has far more
       | L3 cache, far more instructions-per-clock, and far more
       | execution resources than the ol' 2005-era Athlon.
       | 
       | In fact, server-class EPYC systems commonly run in the 2GHz
       | range, because lower frequencies save a lot of power and
       | servers want lower power usage. Today's EPYCs are still far
       | faster per-core than the Athlons of old.
       | 
       | -------------
       | 
       | This is because your "single thread" is executed more and
       | more in parallel today. Thanks to the magic of "dependency
       | cutting" compilers, the compiler + CPU auto-parallelizes
       | your code and runs it on the 8+ execution pipelines found on
       | modern CPU "cores".
       | 
       | Traditionally, CPUs in the 90s had a single pipeline. But
       | the 90s and 00s brought forth out-of-order execution, as
       | well as parallel execution pipelines (aka: superscalar
       | execution). That means 2 or more pipelines execute your
       | "sequential" code, yes, in parallel. Modern cores have more
       | than 8 pipelines and are more than capable of 4+ or 6+
       | operations per clock tick.
       | 
       | This is less efficient than explicit, programmer-given
       | parallelism, but it is far easier to accomplish. The Apple
       | M1 continues this tradition of wider execution. I'm not sure
       | "sequential" is dead quite yet: even though there's a huge
       | amount of machinery working to auto-translate sequential
       | code into parallel execution, our code is still largely
       | written in a "sequential" fashion.
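       | 
       | To illustrate what that dependency cutting buys (Python used
       | purely as notation here - the speedup only appears in
       | compiled code on a superscalar core): both functions below
       | compute the same sum, but the first is one long dependency
       | chain, while the second has four independent chains that an
       | out-of-order core can issue side by side.
       | 
       |     def serial_sum(xs):
       |         acc = 0
       |         for x in xs:
       |             acc += x   # every add waits on the previous
       |         return acc
       | 
       |     def four_way_sum(xs):   # assumes len(xs) % 4 == 0
       |         a = b = c = d = 0
       |         it = iter(xs)
       |         for w, x, y, z in zip(it, it, it, it):
       |             a += w   # these four adds are mutually
       |             b += x   # independent, so parallel pipelines
       |             c += y   # can execute them in the same cycle
       |             d += z
       |         return a + b + c + d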
       | 
       | -------------
       | 
       | But the big advancement after 2005 was the rise of GPGPU
       | compute. It was always known that SIMD machines (aka: GPUs)
       | were the most parallel systems; from the late 1980s and
       | early 90s onward, the SIMD supercomputers always had the
       | most FLOPs.
       | 
       | OpenCL and CUDA really took SIMD parallelism mainstream. And
       | indeed: these SIMD systems (be it AMD MI100 or NVidia A100)
       | are far more efficient and have far higher compute
       | capability than anything else.
       | 
       | The only "competitor" on the supercomputer scale is the Fugaku
       | supercomputer, with SVE (512-bit SIMD) ARM using HBM RAM (same as
       | the high-end GPUs). SIMD seems like the obvious parallel compute
       | methodology if you really need tons and tons of compute power.
        
       | limoce wrote:
       | There seems to be no way to efficiently replay concurrent
       | programs in a deterministic fashion on multiple cores.
       | Nondeterminism makes parallelism and concurrency inherently
       | hard and unfriendly to newcomers. It has become even more
       | difficult in recent years due to architectural decisions:
       | weak memory ordering makes things worse.
       | 
       | Suppose you are going to write a nontrivial concurrent
       | program like a toy Raft: I believe that combing through RPC
       | logs will be the most painful part.
       | 
       | In contrast, on a single core, gdb is good enough. And
       | there are also advanced examples like VMware's fault-
       | tolerant VMs and FoundationDB's deterministic simulation.
       | If we could debug concurrent programs without dirty tricks,
       | just like single-threaded ones, I guess utilizing
       | concurrency would be as handy as calling a function.
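       | 
       | (A minimal sketch of the deterministic-simulation idea - not
       | FoundationDB's actual machinery: model tasks as generators
       | and draw every scheduling decision from a seeded RNG, so the
       | same seed replays the same interleaving.)
       | 
       |     import random
       | 
       |     def worker(name, steps, log):
       |         for i in range(steps):
       |             log.append(f"{name}:{i}")
       |             yield   # hand control to the scheduler
       | 
       |     def simulate(seed):
       |         log = []
       |         tasks = [worker("a", 3, log),
       |                  worker("b", 3, log)]
       |         rng = random.Random(seed)  # sole nondeterminism
       |         while tasks:
       |             t = rng.choice(tasks)
       |             try:
       |                 next(t)
       |             except StopIteration:
       |                 tasks.remove(t)
       |         return log
       | 
       |     assert simulate(42) == simulate(42)  # exact replay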
        
         | aseure wrote:
         | > There seems to be no way to efficiently replay concurrent
         | programs in a deterministic fashion
         | 
         | I wish there was something like Jepsen [1] broadly available to
         | mainstream programming languages to do just that.
         | 
         | [1] https://jepsen.io/consistency
        
         | Const-me wrote:
         | > no way to efficiently replay concurrent programs in a
         | deterministic fashion on multiple cores
         | 
         | OpenMP with static scheduler and hardcoded count of threads?
         | 
         | OpenMP ain't a silver bullet, but for many practically useful
         | scenarios it's indeed as simple as calling a function.
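         | 
         | The determinism comes from the fixed work assignment. A
         | rough Python analogue of schedule(static) with a hardcoded
         | thread count (not OpenMP itself, and CPython threads won't
         | actually run bytecode in parallel, but the scheduling idea
         | is the same):
         | 
         |     import threading
         | 
         |     def map_static(f, xs, nthreads=4):
         |         out = [None] * len(xs)
         |         def run(tid):
         |             # static round-robin: thread tid always
         |             # handles indices tid, tid+nthreads, ...
         |             for i in range(tid, len(xs), nthreads):
         |                 out[i] = f(xs[i])
         |         ts = [threading.Thread(target=run, args=(t,))
         |               for t in range(nthreads)]
         |         for t in ts: t.start()
         |         for t in ts: t.join()
         |         return out
         | 
         |     print(map_static(lambda x: x * x, list(range(8))))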
        
           | hderms wrote:
           | What if you throw async IO into the mix? Seems like
           | you'd have to have some kind of VMware-like
           | virtualization where you record all IO interrupts into a
           | linearizable log.
        
             | Const-me wrote:
             | When you throw async I/O into the mix it's no longer
             | just your application you're debugging; it's the whole
             | OS with other processes, the OS kernel with its
             | drivers and DMA, and external hardware that's not code
             | at all.
             | 
             | It can be tricky to debug or implement, but when I
             | need that I normally use async-await in C#. The
             | debugger in the IDE is multi-threaded, and does
             | support these tasks.
        
       | kvhdude wrote:
       | I recall transactional memory being pitched to take a bite
       | out of the lock overheads associated with multithreading (as
       | opposed to lockless algorithms). Has it become mainstream?
        
         | kaliszad wrote:
         | If you use Clojure, then yes, software transactional
         | memory is readily available.
         | https://clojure.org/reference/refs
        
       | typon wrote:
       | I blame Python and Javascript. Two of the most popular languages
       | without proper concurrency support.
        
         | maattdd wrote:
         | You mean parallelism? JS has good concurrency support
         | (events, async...)
        
           | criddell wrote:
           | Did it in 2005 though?
           | 
           | I think Python also has async / await support today.
        
             | RegW wrote:
             | Perhaps in 2005 we were still calling across browser
             | frames and abusing the one-thread-per-frame model.
             | 
             | async / await isn't really concurrency. It's a
             | mechanism for picking up a job after something else
             | has been done (a bit like a callback). In a way this
             | has been an advantage of Javascript: one or two
             | threads do all the work in a timely way without all
             | that messy thread scheduling and context switching.
        
               | spenczar5 wrote:
               | That's concurrency without in-process parallelism.
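               | 
               | In Python terms - one thread, no parallelism, yet
               | both tasks make interleaved progress:
               | 
               |     import asyncio
               | 
               |     async def tick(name, n):
               |         for i in range(n):
               |             print(name, i)
               |             await asyncio.sleep(0)  # yield
               | 
               |     async def main():
               |         await asyncio.gather(tick("a", 3),
               |                              tick("b", 3))
               | 
               |     asyncio.run(main())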
        
           | gnarbarian wrote:
           | Parallelism in JavaScript is simple using web workers.
           | Of course, the tricky part, as with all parallel
           | applications, is managing the complexity you create
           | yourself. The only languages that seem to do a good job
           | of handling this for you, like Haskell and Erlang, are
           | challenging in other ways.
        
             | kaliszad wrote:
             | I don't think web workers are a good design. They are
             | maybe simple, but they lack flexibility and
             | capability. E.g. if I am not mistaken, you cannot
             | control animations from a web worker. You also have
             | to communicate with them through serialized messages,
             | so there is considerable overhead.
             | 
             | JavaScript and the related browser ecosystem lack a
             | number of things that are then patched over with
             | additional complexity: WebAssembly, bug/feature
             | workarounds in applications, and mobile apps you
             | basically have to build because the web isn't usable
             | or performant enough for some things. Of course we
             | have more experience now and it is easy to criticize
             | in hindsight. E.g. making JavaScript less LISPy was
             | perhaps a good decision for its broad adoption, but
             | longer term the homoiconicity could have been used
             | for easier code generation instead of e.g. stuff like
             | WebAssembly. We also know that stuff like
             | Clojure(Script) is possible now. We also know that we
             | really want a native 64-bit integer type instead of
             | hodge-podge solutions with 53 bits of reduced
             | precision, or big decimals with worse and more
             | complicated performance characteristics.
             | 
             | Over the last 12 or so years, the web and related
             | technologies have become an application platform and
             | a video delivery network with demands that rival
             | native applications. We even use JavaScript on the
             | server side through Node.js. This has enabled
             | tremendous things, but it is also perhaps the least
             | stable platform to develop for. The web actively
             | breaks some of the more complex applications it has
             | enabled, precisely because the foundations are not
             | solid enough. The current state of affairs is that we
             | keep piling on new and shiny APIs and technologies,
             | which perhaps have considerable merit, but we don't
             | really fix the broken things in a way normal devs can
             | keep up with. I mean, how do you expect to keep up,
             | if even GMail, Google Maps, YouTube, Facebook and all
             | its web apps, even major news sites and other
             | applications backed by big companies, still have
             | rather obvious bugs? I guess "move fast and break
             | things [for later generations to fix all this mess]"
             | it is.
        
           | jerf wrote:
           | I find the distinction far less interesting than most people.
           | I think it's easier to think of parallelism simply as an
           | interesting special case of concurrency, and to think of
           | runtimes and systems that are "concurrent" but can't run
           | literally simultaneously, like Javascript or Python, as
           | simply accidents of history not worth specially writing into
           | the definitions of our terms. Every year "concurrent" code
           | that can't be scheduled to run on multiple CPUs
           | simultaneously is less and less interesting. And I don't
           | anticipate any new language coming out in the future that
           | will try to be "concurrent but can only run on one CPU", so
           | this meaning is just going to fade into history as a
           | particular quirk of some legacy runtimes.
           | 
           | No, I do not consider any runtime that can't run on multiple
           | CPUs simultaneously to have "good" concurrency support. It
           | would, at best, be _bad_ concurrency support. Better than
           | "no" support, sure, but not _good_ in 2021. If a new language
           | came out with that support, nobody would call it  "good"
           | support, they'd call it failing to even raise the table
           | stakes a new language needs nowadays.
        
             | Spivak wrote:
             | What's your definition of running simultaneously?
             | CPython's approach to running on multiple cores is
             | multiprocessing, which honestly works fine. The
             | tooling for it is pretty slick, and it lets you
             | ignore a lot of the typical pain of IPC.
             | 
             | Because if "good" concurrency support means single-
             | process, multi-threaded on multiple cores, with
             | enough locks that you can have multiple threads
             | executing code simultaneously in a shared memory
             | space, then a lot of languages are going to fall
             | down, or punt all responsibility for doing that
             | safely to the programmer, which might as well be no
             | support.
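             | 
             | For instance, multiprocessing.Pool hides the IPC: the
             | function and its arguments are pickled over to worker
             | processes and the results shipped back, so it reads
             | like an ordinary map:
             | 
             |     from multiprocessing import Pool
             | 
             |     def f(x):
             |         return x * x   # runs in a worker process
             | 
             |     if __name__ == "__main__":
             |         with Pool(4) as p:
             |             # each worker gets a slice of the range
             |             print(p.map(f, range(10)))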
        
               | jerf wrote:
               | Shared memory spaces. Your "slick tooling" is one of the
               | hacks I mentioned that history will forget, by the
               | standard of "if a new language emerged that tried to call
               | that 'concurrency' nobody would take it seriously".
               | 
               | I should say that "hack" here isn't necessarily a
               | pejorative. There are reasons for communities to
               | create and deploy them. There are plenty of cases
               | where existing code can be leveraged to work
               | _better_ than it could without one, and that's the
               | relevant standard for
               | whether something is useful, not whether or not in a
               | parallel universe the code could have been written in
               | some completely different way or whether in some
               | hypothetical sense if you could push a button and rewrite
               | something for free you'd end up with something better.
               | Hacks can be good, and every language will pick them up
               | at some point as history evolves around it and some of
               | the core assumptions a language/runtime made go out of
               | date. But it's still a hack.
               | 
               | While OS process boundaries do provide some nice
               | features, they are also in general overkill. See Erlang
               | and Pony for some alternate riffs on the idea, and if you
               | look at it hard enough and understand it deeply enough,
               | even what Rust does can provide some similar benefits
               | without raising a full OS process boundary between bits
               | of code.
        
               | catern wrote:
               | You're saying that _shared memory concurrency_ is the
               | future? I think you've got it completely wrong. Shared
               | memory concurrency is the past. It was thought to be a
               | good idea in the 80s and 90s, but we now know that it's
               | both hard to program, and that it works poorly with
               | highly-parallel, high-performance hardware. In the
               | future, we'll see more and more programming systems which
               | don't provide shared memory concurrency at all.
        
               | bottled_poe wrote:
               | I believe there will always be situations (albeit
               | specific ones) where a shared-memory concurrency
               | model is the most efficient and performant use of
               | available resources.
               | For this reason alone, shared-memory concurrency will
               | always have a place. That said, I generally agree that
               | the isolated parallel memory model is preferable for
               | simplicity's sake.
        
             | simiones wrote:
             | I have usually seen the distinction between concurrency and
             | parallelism being drawn for precisely the opposite reason:
             | parallelism without concurrency is a relatively common
             | niche; it is much simpler than concurrency, and it has
             | special tools that make sense just for it.
             | 
             | For example, CUDA exposes a parallel programming model with
             | low support for actual concurrency. Similarly, OpenMP is
             | mostly useful for parallel computations that rarely need
             | coordination. C#'s Parallel package is another example.
             | 
             | By contrast, concurrency often rears its ugly head even in
             | single-threaded workflows, even ones that are part of
             | larger multi-threaded programs.
        
         | Spivak wrote:
         | What is Python missing in your eyes? Cooperative multitasking
         | is a relatively recent thing to be in vogue again, and Python
         | has had good support for multithreading and multiprocessing
         | since forever.
        
           | typon wrote:
           | Without in-process multi-threading with OS threads,
           | writing performant parallel code is nearly impossible.
           | Most of the time you have to drop down to C/C++, release
           | the GIL, and implement parallelism there.
        
             | aisengard wrote:
             | Every time someone brings up the GIL and performance, the
             | answer is pretty much always "use a different language for
             | the places you need especially high performance". All
             | this Sturm und Drang about Python not being performant
             | enough is missing the point. It's not supposed to be
             | the answer to special cases where maximum performance
             | is required! It's a general-purpose language built
             | around community, easy readability, and elimination of
             | boilerplate, and it is good enough for the vast
             | majority of use cases. Just like any language, there
             | are use cases where it is not appropriate.
             | Clearly, it has succeeded given its significant reach and
             | longevity.
        
               | Spivak wrote:
               | Most high-level languages implement their performance
               | critical functions at a lower level. However, languages
               | that rely on a GIL (Python and Ruby) or isolated memory
               | spaces (V8) have an additional hurdle that if you want a
               | model of concurrency with many threads acting
               | simultaneously on a single shared memory space you have
               | to do additional work.
               | 
               | For Python, your library either has to pause
               | execution of Python bytecode and do multithreaded
               | things on your memory space; or allocate an
               | isolated memory arena for the multithreaded work,
               | copy data in and out of it, and then spawn non-
               | Python threads (see PEP 554 for an IRL example of
               | this idea with the Python interpreter itself); or
               | copy data to another process which can do the
               | multithreaded things.
               | 
               | With PEP 554 (right now using the C API) _I_
               | personally have no issue with multithreaded Python,
               | since the overhead of copying memory between
               | interpreters is low enough for my work - but it is
               | overhead.
        
               | typon wrote:
               | There is nothing about the Python language spec (as
               | gleaned from looking at how CPython works) that
               | forces it to be slow or unable to handle
               | parallelism. In fact, the massive amount of library
               | code and language changes added to support async
               | shows that Python isn't even averse to taking on
               | added complexity.
        
           | N1H1L wrote:
           | Also, dask works pretty well all the way up to 200-250
           | CPU cores. It's pretty straightforward to adapt your
           | code to use dask rather than numpy or pandas.
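           | 
           | For example, dask.array mirrors the numpy API, so the
           | change is often just the import, a chunk size, and a
           | final .compute():
           | 
           |     import dask.array as da
           | 
           |     # 100M samples in 100 chunks; dask schedules the
           |     # chunks across cores (or a cluster)
           |     x = da.random.random(100_000_000,
           |                          chunks=1_000_000)
           |     print(x.mean().compute())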
        
       | rbanffy wrote:
       | One thing I'd love to do is a Smalltalk implementation where
       | every message is processed in a separate thread. Could be a nice
       | educational tool, as well as a great excuse to push workstations
       | with hundreds of cores.
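       | 
       | A toy Python version of the idea, just for flavor: wrap an
       | object so every "message" (method call) is submitted to a
       | thread pool and answers with a future.
       | 
       |     from concurrent.futures import ThreadPoolExecutor
       | 
       |     pool = ThreadPoolExecutor()
       | 
       |     class Actorish:
       |         def __init__(self, target):
       |             self._target = target
       |         def __getattr__(self, name):
       |             m = getattr(self._target, name)
       |             # every message becomes a pool task
       |             return lambda *a: pool.submit(m, *a)
       | 
       |     class Counter:
       |         def __init__(self):
       |             self.n = 0
       |         def incr(self):
       |             self.n += 1
       |             return self.n
       | 
       |     c = Actorish(Counter())
       |     print(c.incr().result())  # ran on a pool thread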
        
       | lincpa wrote:
       | The ultimate solution:
       | 
       | The Grand Unified Programming Theory: The Pure Function
       | Pipeline Data Flow with Principle-based Warehouse/Workshop
       | Model
       | 
       | The Apple M1 chip is the best case.
       | 
       | https://github.com/linpengcheng/PurefunctionPipelineDataflow
        
       | dang wrote:
       | Past related threads:
       | 
       |  _The Free Lunch Is Over - A Fundamental Turn Toward Concurrency
       | in Software_ - https://news.ycombinator.com/item?id=15415039 -
       | Oct 2017 (1 comment)
       | 
       |  _The Free Lunch Is Over: A Fundamental Turn Toward Concurrency
       | in Software (2005)_ -
       | https://news.ycombinator.com/item?id=10096100 - Aug 2015 (2
       | comments)
       | 
       |  _The Moore 's Law free lunch is over. Now welcome to the
       | hardware jungle._ - https://news.ycombinator.com/item?id=3502223
       | - Jan 2012 (75 comments)
       | 
       |  _The Free Lunch Is Over_ -
       | https://news.ycombinator.com/item?id=1441820 - June 2010 (24
       | comments)
       | 
       | Others?
        
       ___________________________________________________________________
       (page generated 2021-06-28 23:01 UTC)