[HN Gopher] The free lunch is over: a fundamental turn toward co...
___________________________________________________________________
The free lunch is over: a fundamental turn toward concurrency in
software (2005)
Author : pcr910303
Score : 105 points
Date : 2021-06-27 07:14 UTC (1 day ago)
(HTM) web link (www.gotw.ca)
(TXT) w3m dump (www.gotw.ca)
| amelius wrote:
| If only GPUs were more common and easier to work with.
| Const-me wrote:
| > If only GPUs were more common
|
| They are very common.
|
| In 2021 on Windows, it is safe to assume the GPU does at least
| D3D 11.0. The last GPU which does not is Intel Sandy Bridge,
| discontinued in 2013. D3D 11.0 supports plenty of GPGPU
| features. The main issue with D3D11 is that FP64 support is
| optional; the hardware may or may not support it.
|
| > and easier to work with
|
| I don't find either DirectCompute 11 or CUDA exceptionally
| hard to work with. D3D12 and Vulkan are indeed hard, IMO.
| amelius wrote:
| > I don't find either DirectCompute 11 or CUDA exceptionally
| hard to work with.
|
| It still sucks that, if you're not careful, or just unlucky,
| GPUs may interfere with normal video operation during setup.
|
| Also on one machine I'm getting "GPU lost" error messages
| every once in a while during a long computation, and I have
| no other option than to reboot my machine.
|
| Further, the form factor sucks. Every card comes with 4 or
| more DVI connectors which I don't need.
|
| If parallel is the future, then give me something that fits
| on the mainboard directly, and which is more or less a
| commodity. Not the proprietary crap that vendors are
| currently shoving at us.
| Const-me wrote:
| > on one machine I'm getting "GPU lost" error messages
| every once in a while during a long computation, and I have
| no other option than to reboot my machine.
|
| If that's your computer, just change the registry setting
| that disables TDR. By default, Windows uses 2 seconds to
| limit max. pipeline latency. When that is exceeded, the OS
| resets the GPU, restarts the driver, and logs that "device
| lost" in the event log.
|
| If that's your customer's computer, you have to be more
| creative and fix your compute shaders. When you have a lot
| of things to compute and/or you detect underpowered
| hardware, split your Dispatch() calls into a series of
| smaller ones. As a nice side effect, this minimizes the
| effect of your application on the 3D rendering that other
| programs are doing on the same GPU.
|
| Don't submit them all at once. Submit two initially, then
| one at a time, using ID3D11Query to track completion of
| these compute shaders. ID3D11Fence can do that better (it
| can sleep on WaitForSingleObject, saving electricity), but
| fences are unfortunately less compatible. That's an
| optional feature introduced in a post-release Win10 update,
| and e.g. VMWare SVGA 3D doesn't support fences.
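|
| A minimal sketch of that chunked-Dispatch pattern in C++
| (names like dev, ctx and the chunk list are placeholders;
| shader/resource setup and error handling omitted). It keeps
| at most two dispatches in flight and polls an event query
| before submitting the next one:
|
|     // Sketch only: assumes an initialized ID3D11Device* dev and
|     // ID3D11DeviceContext* ctx with the compute shader already bound.
|     #include <windows.h>
|     #include <d3d11.h>
|     #include <vector>
|     #include <wrl/client.h>
|     using Microsoft::WRL::ComPtr;
|
|     // Submit one slice of the workload and return an event query
|     // that completes when the GPU has finished it.
|     ComPtr<ID3D11Query> dispatchChunk(ID3D11Device* dev,
|                                       ID3D11DeviceContext* ctx,
|                                       UINT groupsX) {
|         D3D11_QUERY_DESC qd = { D3D11_QUERY_EVENT, 0 };
|         ComPtr<ID3D11Query> q;
|         dev->CreateQuery(&qd, &q);
|         ctx->Dispatch(groupsX, 1, 1);  // a slice of the full grid
|         ctx->End(q.Get());             // signalled after the dispatch
|         ctx->Flush();                  // make sure the GPU starts now
|         return q;
|     }
|
|     bool isDone(ID3D11DeviceContext* ctx, ID3D11Query* q) {
|         BOOL done = FALSE;
|         return ctx->GetData(q, &done, sizeof(done), 0) == S_OK && done;
|     }
|
|     // Keep at most two chunks in flight, so each dispatch can stay
|     // below the TDR limit (given small enough chunks).
|     void runChunked(ID3D11Device* dev, ID3D11DeviceContext* ctx,
|                     const std::vector<UINT>& chunks) {
|         std::vector<ComPtr<ID3D11Query>> inFlight;
|         size_t next = 0;
|         while (next < chunks.size() || !inFlight.empty()) {
|             while (next < chunks.size() && inFlight.size() < 2)
|                 inFlight.push_back(dispatchChunk(dev, ctx, chunks[next++]));
|             if (isDone(ctx, inFlight.front().Get()))
|                 inFlight.erase(inFlight.begin());
|             else
|                 Sleep(1);  // crude wait; ID3D11Fence could sleep properly
|         }
|     }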
|
| > the form factor sucks. Every card comes with 4 or more
| DVI connectors which I don't need.
|
| Some datacenter GPUs, and newer mining GPUs, come without
| connectors. The mainstream commodity cards do have
| connectors because the primary market for them is supposed
| to be gamers.
|
| > something that fits on the mainboard directly
|
| Unlikely to happen because of thermals. Modern high-end GPUs
| consume more electricity than comparable CPUs, e.g. an RTX
| 3090 needs 350W while a similarly priced EPYC 7453 needs 225W.
| amelius wrote:
| Thanks for the advice (I'm on Linux), but my point is
| that the GPU really is not the "first-class citizen" that
| the CPU is.
| Const-me wrote:
| > my point is that the GPU really is not the "first-class
| citizen" that the CPU is.
|
| On Windows, the GPU has been a first-class citizen since
| Vista. In Vista, Microsoft started to use D3D10 for their
| desktop compositor. In Windows 7 they upgraded to Direct3D
| 11.
|
| The transition wasn't smooth. Only gamers had 3D GPUs
| before Vista, so many people needed new computers.
| Technically, Microsoft had to change the driver model to
| support a few required features.
|
| On the bright side, the XP->Win7 transition is now long
| in the past, and 3D GPUs are used for everything on
| Windows. All web browsers use D3D to render stuff, albeit
| not directly but through higher-level libraries like
| Direct2D and DirectWrite.
|
| Linux doesn't even have these higher-level libraries.
| They are possible to implement on top of whichever GPU
| API is available
| https://github.com/Const-me/Vrmac#vector-graphics-engine
| but so far nobody has done it well enough.
|
| It's a very similar situation with GPU compute on Linux.
| The kernel and driver support has arrived by now, but the
| higher-level user-mode pieces are still missing.
|
| P.S. If you're on Linux, pretty sure you can still
| refactor your compute shaders the same way and they will
| work fine afterwards. You obviously don't have
| ID3D11Query/ID3D11Fence but most GPU APIs have
| replacements: VkFence(3) in Vulkan,
| EGL_KHR_fence_sync/GL_OES_EGL_sync/VG_KHR_EGL_sync
| extensions for GL and GLES, etc.
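|
| A minimal sketch of the same chunked-submission pattern with
| VkFence (the device, queue, and pre-recorded per-chunk
| VkSubmitInfo structs are assumed to exist; error handling
| omitted):
|
|     #include <vulkan/vulkan.h>
|     #include <cstdint>
|     #include <vector>
|
|     // Submit the chunks one at a time, sleeping on a fence between
|     // them, so no single submission runs long enough to trip a GPU
|     // watchdog/reset.
|     void runChunked(VkDevice device, VkQueue queue,
|                     const std::vector<VkSubmitInfo>& chunks) {
|         VkFenceCreateInfo fci = {};
|         fci.sType = VK_STRUCTURE_TYPE_FENCE_CREATE_INFO;
|         VkFence fence = VK_NULL_HANDLE;
|         vkCreateFence(device, &fci, nullptr, &fence);
|
|         for (const VkSubmitInfo& si : chunks) {
|             vkQueueSubmit(queue, 1, &si, fence);           // one chunk
|             vkWaitForFences(device, 1, &fence, VK_TRUE,
|                             UINT64_MAX);                   // blocks, no busy wait
|             vkResetFences(device, 1, &fence);              // reuse next iteration
|         }
|         vkDestroyFence(device, fence, nullptr);
|     }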
| theandrewbailey wrote:
| > Further, the form factor sucks. Every card comes with 4
| or more DVI connectors which I don't need.
|
| DVI connectors are huge! I've never seen a card with more
| than 2 of them.
|
| You working with something more exotic than a PCI-e card?
| snickerer wrote:
| I agree with this article a lot.
|
| The layers over layers of software bloat increased with the
| hardware speed. The general trade-off is: We make the software as
| slow as possible to have the shortest development time with the
| least educated programmers. "Hey web dev intern, click me a new
| UI in the next two hours! Hurry!"
|
| This intern-clicks-a-new-UI works because we have a dozen layers
| of incomprehension-supporting technologies. You don't need to
| know how most parts of the machine work; several libraries, VMs,
| and frameworks will make you a bed of roses.
|
| My point is that we are overdoing it with the convenience for
| the developers. Today there is way too much complexity and bloat
| in our systems. And there are not enough programmers trained to
| handle memory management and similar low-level tasks. Or their
| bosses wouldn't allow it, because of the deadline, you know.
|
| I think the general trend is bad because there is no free lunch.
| No silver bullet. Everything is a trade-off. For example C is
| still important because C's trade-off between programming
| convenience and runtime efficiency is very good. You pay a lot
| and you get a lot.
|
| This is also true for parallel programming. To write highly
| efficient parallel code you need skill and education. No silver
| bullet tooling will make this less "hard".
|
| And here I see the irony. Faster CPUs were used to let less
| educated devs deliver more quickly. More parallel CPUs need higher
| skilled devs working slower to utilize the chip's full potential.
| kaliszad wrote:
| The whole software vs. hardware and developer vs. runtime
| efficiency discussion is way more nuanced. No, the layers as
| they currently exist are at least partially hindering
| productivity, because they all have bugs and rather painful
| limitations, and are constantly changing under your feet. There
| is so much complexity, you don't have a chance to understand it
| (and the implications of it) completely from the click in a
| client's web browser all the way to the instructions executed
| on the server. This is not convenient and nobody (not even
| Google, Facebook and others) can handle the complexity, which
| manifests itself in bugs, weird behaviour, surprising security
| holes, other rare defects and more every day.
|
| You don't want to manage memory manually unless you absolutely
| have to. It is almost certainly going to be way more demanding,
| you will make mistakes, and for maybe 95% of applications the
| performance will not really be much better if you do whole-
| system benchmarks. It is like using mining equipment and
| explosives to hang a picture on your concrete wall; it is just
| totally over the top. I am also not denying that there are use
| cases where most of the responsibility is on you, e.g. embedded
| or infrastructure software. Most people just aren't aware that
| they need the skill and understanding to reliably and, most
| importantly, responsibly wield it. In both cases, there is
| considerable room for better, more robust tools, better
| methods/practices, and a great reduction in developer hubris.
|
| Some have used better hardware to save on development effort
| to some degree. There are applications that really need the
| extra performance to help users - not just developers - do
| their work in a more convenient way. Some of the performance
| is lost to systemic inefficiencies such as the web browser
| and most related technologies. E.g. SVG rendering in web
| browsers is a minefield of bugs and bad performance, so you do
| the rendering yourself using canvas and JavaScript. Visually
| that is the same thing, but it is a lot more work for the
| developer, and for e.g. printing/export into PDF (where you
| cannot just recalculate) you still kind of want the vector
| graphics capability in some form. Virtualization has enabled
| slicing of the cores to serve more customers with low-cost and
| readily available compute. We also have way more users and
| different security expectations than we used to in the past,
| so a single machine has to handle and check more stuff, or we
| can use a single machine for stuff that would have needed a
| data centre or wouldn't have been possible/feasible before.
| vmchale wrote:
| > More parallel CPUs need higher skilled devs working slower to
| utilize the chip's full potential.
|
| Parallelism isn't so bad, you can use a functional style a lot
| of the time! E.g. Futhark, the accelerate framework, APLs....
| dragontamer wrote:
| From what I can tell... the programmers who made huge
| contributions to parallel compute are those programmers who
| grossly understood the hardware.
|
| It didn't matter if they were Java programmers (Azul), Lisp
| programmers (1980s CM-2: Steele / Hillis), GPU programmers, or C
| programmers. If you understood the machine, you figured out a
| fast and concurrent solution.
| alfalfasprout wrote:
| The point about skill is absolutely key.
|
| Two types of engineer now exist: Those that are largely "code
| monkeys" implementing features using high level tooling and
| guard rails and those that are responsible for building that
| tooling and low level components. FWIW I've met plenty of
| engineers even from FAANG companies with fairly limited
| exposure to writing performant software.
|
| Over time, the former bucket has grown significantly as
| barriers to entry have dropped but this has led to extremely
| bloated and inefficient codebases. Deployment patterns like
| micro services running on container orchestration platforms
| mean that it's easier to scale bad code horizontally pretty
| easily, and these "high level frameworks" are generally "good
| enough" in terms of per-request latency. So the only time
| efficiency ever becomes a concern for most companies is when
| cost becomes an issue.
|
| It'll be interesting to see how this all unfolds. I wouldn't be
| surprised if the huge incoming supply of unskilled engineers
| causes compensation to drop significantly in general.
| thewarrior wrote:
| To save money on infrastructure you need to pay your
| engineers more. Companies are choosing to hire cheaper
| engineers as they are the ones that are harder to replace.
| rustybolt wrote:
| People have been saying this for decades, and while it's true,
| concurrency is still widely regarded as 'too hard'.
|
| I'm not sure if this is justified (i.e. concurrency is inherently
| too hard to be viable), or due to a lack of
| tooling/conventions/education.
| nicoburns wrote:
| > I'm not sure if this is justified (i.e. concurrency is
| inherently too hard to be viable), or due to a lack of
| tooling/conventions/education.
|
| I think it's the tooling. Rust's modelling of concurrency using
| its type system (the Send and Sync traits) makes concurrency
| pretty straightforward for most use cases. You still have to be
| super-careful when creating the core abstractions using unsafe
| code, but once you have them they can easily be shared as
| libraries, and it's a compile error to violate the invariants.
| This means that most projects never have to write the hard
| parts themselves and get concurrency for close to free.
| pjmlp wrote:
| I will go with lack of education as main issue.
| eternalban wrote:
| Memory models are subtle. Temporal reasoning in context of
| h/w, e.g. multi-core coherence, and language runtime MM is
| non-trivial. So there is a baseline level of difficulty baked
| into the domain. Education certainly is necessary, but here
| can only inform of what needs to be considered, known
| pitfalls, patterns of concurrency, etc.
|
| As to OP, well it better be viable, because we certainly need
| to deal with it. So better tooling and conventions
| encapsulated in expert developed libraries. The education
| level required will naturally fall into the categories for
| those who will develop the tools/libraries, and those that
| will use them.
| rbanffy wrote:
| > Temporal reasoning in context of h/w, e.g. multi-core
| coherence, and language runtime MM is non-trivial.
|
| Don't OSs expose that, in the sense that you can pin a thread to
| the closest cores according to the memory access you need?
| kwhitefoot wrote:
| Lack of real need I think.
|
| Most of the computers in the world are either dedicated
| embedded controllers or end user devices. Concurrency in
| embedded controllers is pretty much an ordinary thing and has
| been since the days of the 6502/Z80/8080. For end user
| devices the kind of concurrency that matters to the end user
| is also not extraordinary, plenty of things happen in the
| background when one is browsing, word processing, listening
| to music, etc.
|
| So that leaves concurrency inside applications and that just
| isn't something that affects most of the end users. There
| really isn't much for a word processor to actually do while
| the user is thinking about which key to press so it can do
| those few things that there was not time for during the
| keypress.
|
| Mostly what is needed is more efficient code. Niklaus Wirth
| was already complaining back in 1995 that code was getting
| slower more quickly than hardware was getting faster, and it
| seems that he is still right.
|
| See
| https://blog.frantovo.cz/s/1576/Niklaus%20Wirth%20-%20A%20Pl...
| dvfjsdhgfv wrote:
| > So that leaves concurrency inside applications and that
| just isn't something that affects most of the end users.
| There really isn't much for a word processor to actually do
| while the user is thinking about which key to press so it
| can do those few things that there was not time for during
| the keypress.
|
| This obviously depends on the application. Whenever you
| need to wait for an app to finish an operation that is not
| related to I/O, there is some potential for improvement. If
| a CPU-bound operation makes you wait for more than, say, a
| minute, it's almost definitely a candidate for
| optimization. Whether multithreading is a good solution or
| not depends on each case - when you need to
| communicate/lock a lot, it might not make sense. A good
| part of the solution is figuring out if it makes sense, how
| to partition the work etc.; the other part of this hard
| work is implementing and debugging it.
| pjmlp wrote:
| Interesting that you mention Wirth, since all his
| programming languages from Modula-2 onwards do expose
| concurrency and parallelism.
| [deleted]
| spacechild1 wrote:
| > concurrency is still widely regarded as 'too hard'.
|
| The question is: by whom?
|
| High performance software, like game engines, DAWs or video
| editors, has been heavily multithreaded for a while now.
|
| Maybe it's consumer or business software that could profit from
| more multithreading? I don't know, because I don't work in
| those areas.
| criddell wrote:
| There's been a lot of work done in C++ concurrency. Some
| higher level libraries that leverage recent (primitive)
| additions are emerging and they look pretty cool.
|
| For example: https://github.com/David-Haim/concurrencpp
| tuyiown wrote:
| The hard part is to improve perceived performance or resource
| overhead for general software, where each task's outcome is
| tightly coupled with app state, e.g. parallelizing a slow task
| on a button click while guaranteeing that the final state is
| coherent. Nothing out of reach, but doing that without
| compromising too much on maintainability is still an open
| question.
| bartread wrote:
| > The question is: by whom?
|
| I do feel similarly, even though I wouldn't classify myself
| as a great engineer.
|
| I've been writing concurrent software in managed languages,
| such as Java and C#[0], from the very beginning of my career,
| up until today. The level of multithreading has varied, but
| it's always been there. For anything beyond basic CRUD it
| pretty much becomes a requirement, both on desktop and on the
| web[1].
|
| That doesn't mean I've never had a tricky race condition to
| debug (and, yes, they're hard to debug) during development,
| but I've never shipped a concurrency related bug to
| production[2].
|
| The canonical examples of concurrency gone wrong are things
| like giving somebody a deadly radiation dose from a medical
| device but, in terms of serious software bugs, I do wonder
| how common concurrency bugs are relative to other types of
| bug, and whether they're really more serious in aggregate
| than those other types of bug.
|
| _[0] Admittedly these languages make it a lot easier to
| avoid shooting yourself in the foot than C and C++ do._
|
| _[1] Also worth bearing in mind that an inherent property of
| most, if not all, distributed software is that it's also
| concurrent: the moment you have multiple processes running
| independently or interdependently you also usually have a
| concurrent system, with the potential for "distributed race
| conditions" to occur. I.e., if you have a SPA that also has a
| non-trivial back-end, you have a concurrent system - just
| spread across different processes, generally on different
| machines._
|
| _[2] In the context of in-process concurrency. Distributed
| concurrency is a different matter._
| dmytroi wrote:
| My 5 cents would be: wrong abstractions and wrong tooling. Let
| me elaborate a bit on both:
|
| - Current abstractions are mostly based on POSIX threads and
| the C++/Java memory models. I think they poorly represent
| what is actually happening in hardware. For example, an
| acquire barrier in C++ makes it really hard to understand
| that in hardware it amounts to flushing the core's
| invalidation queue. To sanity-check your understanding, try
| answering the question "do you need memory barriers if you
| have a multithreaded application (2+ threads) running a
| lock-free algorithm on a single core?" The correct answer is
| no, because the same core always sees its own writes as if
| they happened in program order, even if the OOO pipeline
| reorders them. Or threads: they seem to be an entity that can
| either run or stop, but in hardware there are no threads; the
| CPU just jumps to a different point in memory (albeit through
| rings 3->0->3). Heck, even the whole memory allocation story:
| we have generations of developers thinking about memory
| allocation and its safety, yet hardware doesn't have that
| concept at all; memory range mapping would be the closest to
| what the MMU actually does. This impedance mismatch between
| hardware and the current low-level abstractions created a lot
| of developers who "know" how all of this works but don't
| actually know, and are a bit afraid to crash down through the
| layers. I want more engineers to not be afraid and to be
| comfortable with all the low-level bits even if they never
| touch them, because one day you will find a bug like a broken
| compare-exchange implementation in LLVM or similar.
|
| - Tooling is way off. The main problem with multithreading is
| that it's all "at runtime" and dynamic. For example, if I'm
| making a lock-free hashmap, the only way for me to get into
| the edge cases of my algorithm (like two threads trying to
| acquire the same token or something) is to run a brute-force
| test across multiple threads and wait until it actually
| happens. Brute-force-test development scales very poorly, and
| testing something like consensus algorithms across hundreds of
| threads is just a nightmare of test-fixture complexity. Then
| you get into: OK, so how much testing is enough? How do you
| measure coverage? Lines of code? Branches? Threads-per-line?
| When are you sure that your algorithm is correct? Don't get me
| wrong, I've seen it multiple times: simple 100 lines of code
| passing tons of reviews, only for me to find a race condition
| (an algorithmic one) half a year later, and now it's deployed
| everywhere and very costly to fix. Another way would be to
| skip all of that and model your algorithms first; TLA+ is one
| of the better tools out there for that: prove that your model
| is correct, and then implement it. Using something like TLA+
| can make your multithreaded coding a breeze in any language.
|
| And probably the absence of transactional memory also
| contributes greatly: passing 8 or 16 bytes around atomically is
| easy, but try 24 or 32? Now you need to build out an insanely
| complicated algorithm that involves a lot of mathematics just
| to prove that it's correct. (See the sketch below.)
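|
| A small C++ sketch of both points (illustrative only; the
| payload structs are made up, and checking 16/32-byte payloads
| may need linking with -latomic on GCC): the acquire/release
| pair is the publish pattern behind those barriers, and
| is_lock_free() shows how quickly lock-freedom runs out once
| the payload grows past what one atomic instruction can move.
|
|     #include <atomic>
|     #include <cstdio>
|     #include <thread>
|
|     struct Payload8  { long a; };           // 8 bytes
|     struct Payload16 { long a, b; };        // 16 bytes
|     struct Payload32 { long a, b, c, d; };  // 32 bytes
|
|     std::atomic<bool> ready{false};
|     int data = 0;
|
|     void producer() {
|         data = 42;                                     // plain write
|         ready.store(true, std::memory_order_release);  // publish it
|     }
|
|     void consumer() {
|         while (!ready.load(std::memory_order_acquire)) {}  // wait for publish
|         std::printf("data = %d\n", data);  // guaranteed to print 42
|     }
|
|     int main() {
|         std::thread t1(producer), t2(consumer);
|         t1.join(); t2.join();
|
|         std::atomic<Payload8>  a8{};
|         std::atomic<Payload16> a16{};
|         std::atomic<Payload32> a32{};
|         // 8 bytes is lock-free everywhere; 16 sometimes; 32 almost never.
|         std::printf("%d %d %d\n", (int)a8.is_lock_free(),
|                     (int)a16.is_lock_free(), (int)a32.is_lock_free());
|     }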
| mkl95 wrote:
| The rise of async programming in backend web dev is making some
| people even more confused about models. For instance, many
| senior engineers out there don't understand the difference
| between sync multithreaded and async single threaded.
| Xunjin wrote:
| I kind of doubt what I know, so I looked for a nice SO link for
| those who don't know the difference :)
|
| https://stackoverflow.com/a/34681101
| jerf wrote:
| "concurrency is still widely regarded as 'too hard'."
|
| Is it? There isn't going to be an official declaration from the
| Masters of Computer Science that "2019 was the year concurrency
| ceased being Too Hard." or anything.
|
| My perception is that it is steadily becoming less and less
| notable for a program to be "concurrent". Tooling is becoming
| better. Common practices are becoming better. (In fact, you
| could arguably take common practices back to the 1990s and even
| with the tools of the day, tame concurrency. While the tooling
| had its issues too, I would assert the problem was _more_ the
| practices than the tooling.) Understanding of how to use it
| reasonably safely is steadily spreading, and runtimes that make
| it safer yet are starting to get attention.
|
| I'm not sure I've seen a case in a long time where there was a
| problem that _ought_ to be using concurrency, but nobody
| involved could figure out any way to deal with it or was too
| afraid to open that door. There's still plenty of cases where
| it doesn't matter even now, of course, because one core is a
| lot of computing power on its own. But it seems to me that
| everyone out there who would benefit from concurrency is
| _mostly_ capable of using it nowadays. Not necessarily without
| issue, but that's an unfair bar; you can still get yourself
| into concurrency trouble in Haskell or Erlang, but it's a lot
| easier than it used to be to get it right.
| samuell wrote:
| I see a lot of potential in pipeline concurrency, as seen in
| dataflow (DF) and flow-based programming (FBP). That is,
| modeling computation as pipelines where one component sends
| data to the next component via message passing. As long as
| there is enough data it will be possible for multiple
| components in the chain to work concurrently.
|
| The benefits are that no other synchronization is needed than
| the data sent between processes, and race conditions are ruled
| out as long as only one process is allowed to process a data
| item at a time (this is the rule in FBP).
|
| The main blocker, I think, is that it requires quite a rethink
| of the architecture of software. I see this rethink happening
| in larger, especially distributed systems, which are already
| modeled a lot around these principles, using systems such as
| Kafka and message queues to communicate, which more or less
| forces people to model computations around the data flow.
|
| I think the same could happen inside monolithic applications
| too, with the right tooling. The concurrency primitives in Go
| are superbly suited to this in my experience, given that you
| work with the right paradigm, which I've been writing about
| before [1, 2], and started making a micro-unframework for [3]
| (though the latter one will be possible to make so much nicer
| after we get generics in Go).
|
| But then, I also think there are some lessons to be learned
| about the right granularity for processes and data in the
| pipeline. Due to the overhead of message passing, it will not
| make sense performance-wise to use dataflow for the very
| finest-grain data.
|
| Perhaps this in a sense parallels what we see with distributed
| computing, where there is a certain breaking point before which
| it isn't really worth it to go with distributed computing,
| because of all the overhead, both performance-wise and
| complexity-wise.
|
| [1] https://blog.gopheracademy.com/composable-pipelines-pattern/
|
| [2] https://blog.gopheracademy.com/advent-2015/composable-pipeli...
|
| [3] https://flowbase.org
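|
| A minimal sketch of the pipeline idea (in C++ for
| illustration, with a tiny hand-rolled channel standing in for
| Go channels; all names are made up). Each stage owns a data
| item exclusively while processing it, so the only
| synchronization is the queue between stages:
|
|     #include <condition_variable>
|     #include <cstdio>
|     #include <mutex>
|     #include <optional>
|     #include <queue>
|     #include <thread>
|
|     template <typename T>
|     class Channel {                 // unbounded channel, close() to finish
|         std::queue<T> q;
|         std::mutex m;
|         std::condition_variable cv;
|         bool closed = false;
|     public:
|         void send(T v) {
|             { std::lock_guard<std::mutex> lk(m); q.push(std::move(v)); }
|             cv.notify_one();
|         }
|         void close() {
|             { std::lock_guard<std::mutex> lk(m); closed = true; }
|             cv.notify_all();
|         }
|         std::optional<T> recv() {   // empty optional == channel closed
|             std::unique_lock<std::mutex> lk(m);
|             cv.wait(lk, [&] { return !q.empty() || closed; });
|             if (q.empty()) return std::nullopt;
|             T v = std::move(q.front()); q.pop();
|             return v;
|         }
|     };
|
|     int main() {
|         Channel<int> raw, squared;
|
|         std::thread producer([&] {   // stage 1: generate data
|             for (int i = 1; i <= 5; ++i) raw.send(i);
|             raw.close();
|         });
|         std::thread worker([&] {     // stage 2: transform
|             while (auto v = raw.recv()) squared.send(*v * *v);
|             squared.close();
|         });
|         std::thread sink([&] {       // stage 3: consume
|             while (auto v = squared.recv()) std::printf("%d\n", *v);
|         });
|
|         producer.join(); worker.join(); sink.join();
|     }
|
| With enough items in flight, all three stages run concurrently
| on different cores, and no stage ever touches an item another
| stage is still working on.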
| jmchuster wrote:
| It seems to me that where we really ended up was distributed
| systems. We solve problems by not just making our code concurrent
| to use more cores, but by also making it use more computers.
| r00fus wrote:
| There are security and latency issues that make distributing
| over multiple cores != multiple processors != multiple
| computers.
|
| Each step adds more complexity.
| rkangel wrote:
| I find it interesting that the language that best deals with
| parallelism (IMO) was invented significantly before we moved in
| the multi-core direction.
|
| The programming language landscape evolved and developed in the
| face of multi-core - async being a classic example. But the
| language that's often held up as the best solution to any given
| parallelism problem is Erlang. Erlang was built as a good
| programming model for concurrency on a single core, and then when
| multi-core came along, SMP was 'just' a VM enhancement and no
| programs needed changing at all to get the full advantage of all
| the cores (barring some special cases).
| chrisseaton wrote:
| > no programs needed changing at all to get the full advantage
| of all the cores (barring some special cases)
|
| Some programs just don't have any available parallelism in them
| - no matter how many processes you split them up into. You need
| to build parallel algorithms and data structures. That's still
| often a research-level topic.
|
| Erlang's not going to magic parallelism out of thin air in a
| program that doesn't have any.
| lostcolony wrote:
| Kind of strawmanning here, no? Obviously sequential tasks are
| run sequentially. The OP isn't saying "magic" happens. Just,
| Erlang was written from the beginning to encourage
| parallelized code. Its whole programming model is around
| that. Unlike pretty much every language it was contemporary
| with, which assumed that the default was a single unit of
| execution (and that code running in parallel was the
| exception).
|
| Think of task scheduling; the normal Algol/C family of
| languages would say "create a priority queue for your tasks,
| wait until the top of the queue is ready to run, pull it off,
| run it, repeat". Very sequential thinking. Of course, you
| then end up with issues around what happens if the processes
| take non-negligible amounts of time, what happens if you need
| to add new items into the queue, etc. At the lowest level
| that might be solved with interrupts, at a higher level it
| might be threads or separate processes, but you still have to
| worry about concurrent data access (locks and the like around
| the queue). But both the general approach, and the 'best
| practice' guidelines, especially prior to multi-core, (due to
| the difficulties of shared data), were/are to minimize
| concurrency.
|
| Erlang, you just spin up every task as its own process. The
| process sleeps until it's time to do something, then it does
| it. The language encourages you to solve the problem in a
| parallel manner, and it did so from the beginning.
|
| So, yes, you "need to build parallel algorithms and data
| structures". The point is, Erlang took advantage of them, as
| a language, even before the hardware could. The way you would
| write Erlang for a single core was the way you would write
| Erlang for many cores, which was unique for a language at the
| time, and the OP is arguing that not only was Erlang unique
| in doing that, but that there really isn't a better model for
| handling parallelism from a language and correctness
| standpoint (presumably; from a performance perspective m:n
| threading, of which Erlang actors are a form, is generally
| agreed upon to lose some efficiency).
| chrisseaton wrote:
| > Kind of strawmanning here, no?
|
| I don't think so? I'm responding to:
|
| > no programs needed changing at all to get the full
| advantage of all the cores (barring some special cases)
|
| Those 'special cases' are all the algorithms which aren't
| inherently parallel, which is most of them! How many
| algorithms inherently have a large amount of natural
| parallelism? I'd say it's very few.
|
| For example, no matter how you write MD5 in Erlang... it's
| not going to run in parallel, is it?
| dTal wrote:
| Who says "most" algorithms aren't inherently parallel?
| How do you even enumerate that set? What counts as an
| "algorithm" anyway? Parallel processing is the norm in
| biology. And even if we grant that most of the algorithms
| humans have devised to run on our von Neumann machines
| are sequential - doesn't that say more about us, than
| about some intrinsic property of algorithms?
| chrisseaton wrote:
| I mean just take a look in any book of algorithms and
| think for yourself how many have inherent parallelism.
| lostcolony wrote:
| As I said before - 'Obviously sequential tasks are run
| sequentially.'
|
| Innately sequential algorithms will still be sequential.
| Erlang already gave developers the tools, and incentive,
| to write things in a parallelized fashion. If a developer
| had a need that was directly sequential, it will, of
| course, still be sequential. If they had a need that
| would benefit from parallelism, they had the tools, the
| context, and the encouragement from the language itself
| to write it in parallel.
|
| Again, I think you're strawmanning here. You kind of
| implied it yourself with your earlier statement,
| "Erlang's not going to magic parallelism out of thin air
| in a program that doesn't have any". Saying that implies
| you think that's the OP's claim.
|
| As I said before, the OP wasn't claiming magic happens,
| and algorithms written serially suddenly become parallel.
| You can, of course, still write sequential code in
| Erlang, same as every language, because execution takes
| place in time. Just that the natural, ergonomic way of
| writing the language, of solving problems in the
| language, has always been parallel.
|
| And if you mean "well, I run MD5 checksums on individual
| blocks of the file, and parallelize it that way" - yes,
| that is going to be one of those special cases the OP
| mentioned, if you didn't already optimize it that way due
| to memory pressure concerns. But if it's "run an md5
| checksum against every file in this directory and check
| it against a manifest", the ergonomic approach in Erlang
| pre-multicore is to scatter/gather it across actors
| (since any one file might fail in strange ways and you
| want to isolate that failure and handle it predictably
| with a supervisor), and that would parallelize "for
| free", with no concurrency concerns. Unlike every other
| contemporary language.
| chrisseaton wrote:
| > they had the tools, the context, and the encouragement
| from the language itself to write it in parallel
|
| No I think Erlang gives you the tools, context, and
| encouragement to write it concurrently. After you have
| concurrency, you may or may not have parallelism.
|
| And if you don't you will have to restructure your
| program to get it, even if your program was already
| highly concurrent. That's contrary to what the person I
| was replying to said.
| lostcolony wrote:
| If you actually have concurrency, and you have the
| hardware able to handle it, why do you not have
| parallelism?
|
| I.e., simply spinning up an actor obviously does not
| imply concurrency; the classic Erlang 'ping pong' example
| shows processes waiting to receive a message and
| responding when they do. Not actually concurrent, despite
| multiple actors and multiple processor cores it's a
| sequential program, not a concurrent one.
|
| Likewise, if you have 5 processes all adding a list of
| numbers (independently and sequentially, one element at a
| time), but only 4 cores, you would have only 4 processes
| running in parallel at a time; a hardware limit. Up the
| cores, you can have all 5.
|
| Can you give me an example of a concurrent program that
| is not also parallel, due to non-hardware reasons, in
| Erlang? Because this is switching the goalposts around - as
| you say, Erlang encourages you to write concurrent programs,
| even before CPUs allowed true parallelism, and because of
| that, once the CPU allowed true parallelism, it only
| required a VM tweak and suddenly Erlang programs were
| running in parallel. But you say no, that wasn't the
| case; just because you wrote things to be concurrent, and
| now the hardware has changed to allow them to be in
| parallel, they aren't necessarily parallel. I'd love an
| example of that.
| Jtsummers wrote:
| Concurrency produces parallelism when the concurrent
| portions can _actually_ run at the same time. That is,
| between synchronization points which force sequential
| behavior.
|
| It's very feasible to write a concurrent program that
| overuses (versus what is strictly needed) synchronization
| points in order to produce something which offers a
| better codebase (by some measure) but provides no
| parallelism. You have to remove/minimize these
| synchronization points in order to obtain parallelism.
| It's also possible that by design these synchronization
| points can't be eliminated.
|
| Erlang's message passing is inherently asynchronous, but
| you could conceive of similar synchronization points in a
| program that will reduce the potential parallelism or
| eliminate it depending on the overall structure. For
| instance: PID ! {self(), Message}, ;;
| forgot self() initially receive {PID,
| Response} -> end, ...
|
| You still have a concurrent design (which can be easier
| to reason about), but because you're waiting on a
| response you end up without any parallelism no matter how
| many cores you throw at it.
| lostcolony wrote:
| Right, when I said "actually concurrent", I meant the
| code -could- run simultaneously (or more formally, that
| the items of execution aren't necessarily causally
| related). Any sequential operation could be spread out
| over an arbitrary number of Erlang processes, without it
| actually being concurrent. That design isn't concurrent,
| you just are involving a bunch of processes that wait.
|
| That said, you are right that a synch point around a
| controlled resource (i.e., "we only want to support
| processing 5 requests at a time") would still have
| concurrent tasks (i.e., ten requests come in; there is no
| causal relation between when they run). Of
| course...that's an intentional point of synchronization
| necessary for correctness; it still strikes me as a
| strawman to say that you don't get parallelization for
| concurrent code, just because you forced a bottleneck
| somewhere.
| chrisseaton wrote:
| > If you actually have concurrency, and you have the
| hardware able to handle it, why do you not have
| parallelism?
|
| You can have concurrent tasks but with dependencies
| between them which means there is no parallelism
| available.
|
| > Can you give me an example of a concurrent program that
| is not also parallel, due to non-hardware reasons, in
| Erlang?
|
| For example....
|
| > the classic Erlang 'ping pong' example shows processes
| waiting to receive a message and responding when they do.
| Not actually concurrent, despite multiple actors and
| multiple processor cores it's a sequential program, not a
| concurrent one.
|
| No, that _is_ actually concurrent. These are two
| concurrent tasks. They just don't have any parallelism
| between them because there is a sequential dependency and
| only one is able to make progress at any given time.
|
| Concurrent and parallel are not the same thing, if you
| didn't know. Concurrency is having work in progress on
| multiple tasks at the same time. You don't need to
| actually be advancing more than one task at the same time
| for it to be concurrent.
| lostcolony wrote:
| "You can have concurrent tasks but with dependencies
| between them which means there is no parallelism
| available." - that isn't really concurrent then, is it?
| Concurrency implies running two things concurrently;
| you're just creating things to wait. No useful work is
| being done. You broke a sequential task across processes.
| Just because those processes are stuck on a receive loop
| doesn't mean they're doing anything. Might as well say
| your Java program is concurrent because you spun up a
| second thread that just waits.
|
| That was what I was getting at with my question;
| arbitrary processes that do their work in a serial
| fashion...isn't a concurrent program. It's a badly
| written serial one.
| chrisseaton wrote:
| > that isn't really concurrent then
|
| It is, using standard industry terminology.
|
| Concurrency does not require that concurrent tasks are
| able to make independent progress at the same time, just
| that they are in some point of progress at the same time.
|
| If you want to do a deep dive into this problem: back in
| 2012, when I was working on my PhD, I wrote a paper about an
| example algorithm with very little inherent parallelism,
| where even if you have multiple concurrent tasks all making
| forward progress you may find it extremely hard to actually
| produce results in parallel.
|
| https://chrisseaton.com/multiprog2012/seaton-multiprog2012.p...
|
| I wrote a poster about it as well for an easier
| introduction.
|
| https://chrisseaton.com/research-symposium-2012/seaton-irreg...
| lostcolony wrote:
| Thank you; the example helped clarify for me.
|
| Yes, any time there are shared resources, concurrent
| tasks can hit synchronization points, and end up
| bottlenecking on them. Of course, I'd contend that a
| point of synchronization is a form of serialization, but
| I agree that the tasks being synchronized would still be
| described as concurrent (since they don't causally affect
| their ordering). But such a synchronization is a
| necessary part of the algorithm you're choosing to use
| (even prior to SMP support), or it would be completely
| unnatural to include it. I don't know that it really
| invalidates the OP's point, that the language didn't
| remove the bottlenecks you introduced?
| Twisol wrote:
| > It is, using standard industry terminology.
|
| Just to add some support here, concurrency definitely has
| very little to do with actual independence of progress.
| It has far more to do with encapsulation of knowledge
| (for an active agent, not a passive entity like an object
| [^]), and how different agents coordinate to exchange
| that knowledge.
|
| An echo server + client is a concurrent system, despite
| having only one meaningful scheduling of actions over
| time (i.e. no interleaving or parallelism). Serialization
| and parallelism are both global properties, as they
| require a perspective "from above" to observe, while
| concurrency is a property local to each task.
|
| I can appreciate what everyone is saying upthread, but I've
| definitely found it valuable to think about concurrency
| in these terms.
|
| [^] Even this is overselling it. You can easily have two
| logical tasks and one physical agent responsible for
| executing both of them, and this is still a concurrent
| system. Dataflow programming is a lot like this -- you're
| not really concerned with the global parallelism of the
| whole system, you're just describing a series of tasks
| over local knowledge that propagate results downstream.
| kitd wrote:
| _But the language that's often held up as the best solution to
| any given parallelism problem is Erlang._
|
| Occam implements a lot of similar ideas very neatly too. The
| code looks a lot like programming with Go channels and CSP. I
| am surprised it hasn't caught on better in the multi-core
| world, it being older than even Erlang.
| Jtsummers wrote:
| I had to double check; the last Occam release was in 1994. I
| think the language was way too niche, and global communication
| way too poor still (compared to the '00s and '10s with
| ubiquitous or near ubiquitous internet access) to catch on.
| It was nearly a decade later that the first consumer
| multicore x86 processors started coming out (2003 if I've
| tracked down the right dates). And multicore didn't become
| common for a few more years and then the baseline near the
| end of the '00s, beginning of the '10s.
|
| That time gap and relative unknown of Occam pretty much
| doomed it even if it would have been wildly useful. By that
| point you had lots of other languages to come out, and Google
| pushing Go.
| acchow wrote:
| > I am surprised it hasn't caught on better in the multi-core
| world, it being older than even Erlang.
|
| Because multicore support in Ocaml has been "just a couple
| years away" for many years now.
| krylon wrote:
| Occam, not Ocaml. They are two different languages.
|
| Although I have no idea if Occam can utilize multiple
| cores.
| TickleSteve wrote:
| Occam was designed for the Transputer, which was intended
| to be used in arrays of processors, hence it being based
| on CSP principles.
|
| So, yes. it can.
| krylon wrote:
| TIL
|
| Thank you!
| acchow wrote:
| Oops misread that.
| [deleted]
| ashleyn wrote:
| The writing was on the wall as early as the beginning of the
| Pentium 4 era.
|
| There was this assumption that clock speed would just continue
| to grow at the insane rates it did in the 80s and 90s, and then
| reality caught up when the Netburst architecture didn't scale
| well enough to avoid heat issues and crazy insane pipelines.
| When Intel rolled out the Core architecture, they went back to
| a P6-derived design, a microarchitecture that dated back to
| 1995 - effectively an admission that Netburst had failed.
| lostcolony wrote:
| Erlang was begun in 1986. It left the lab and was used for
| production products first in 1996. The Pentium 4 came out in
| 2000.
|
| So, an interesting aside, but not sure it's relevant to the
| parent comment.
| dleslie wrote:
| And initial forays into many cores were decades earlier than
| that. Aside from big iron, going back to the early 60s, there
| were some business-class devices like those made by Sequent:
|
| https://en.m.wikipedia.org/wiki/Sequent_Computer_Systems
| flakiness wrote:
| 2005: when the most important platform was the desktop, when
| virtualization was yet to mature, and when the dominant system
| programming language was C++.
|
| Today CPU cores are sliced by cloud vendors to be sold in
| smaller portions, and phones are hesitant to go many-core as
| it would eat your battery at light speed. Dark silicon is spent
| on domain-specific circuits like AI, media or networking instead
| of generic cores.
|
| Parallelism is still a very hard problem in theory, but its
| practical need isn't as prevalent as we thought a decade-plus
| ago, partly thanks to the cloud and mobile. For most of us, at
| least, parallelism is kind of a solved-by-someone-else problem.
| It is left to a small number of experts.
|
| Concurrency is still there, but the situation is much better than
| before (async/await, immutable data types, actors...)
| dragontamer wrote:
| The free lunch isn't quite over, although significant
| advancements were made in parallel computing... it turns out that
| CPUs have been able to "auto-parallelize" your sequential code
| all along. Just not nearly as efficiently as explicitly parallel
| methodologies.
|
| In 2005, your typical CPU was a 2.2GHz Athlon 64 3700+. In 2021,
| your typical CPU is a Ryzen 5700x at 3.8 GHz.
|
| Single-threaded performance has improved by far more than that
| 72% clock increase, however. The Ryzen 5700x has far more L3
| cache, far more instructions-per-clock, and far more execution
| resources than the ol' 2005-era Athlon.
|
| In fact, server-class EPYC systems are commonly in the 2GHz
| range, because low-frequency saves a lot on power and servers
| want lower power usage. Today's EPYCs are still far faster per-
| core than the Athlons of old.
|
| -------------
|
| This is because your "single thread" is executed more and more
| in parallel today. Thanks to the magic of "dependency cutting"
| compilers, the compiler + CPU auto-parallelize your code and
| run it on the 8+ execution pipelines found on modern CPU
| "cores".
|
| Traditionally, CPUs in the early 90s had a single pipeline. But
| the 90s and 00s brought forth out of order execution, as well as
| parallel execution pipelines (aka: superscalar execution). That
| means 2 or more pipelines execute your "sequential" code, yes, in
| parallel. Modern cores have more than 8 pipelines and are more
| than capable of 4+ or 6+ operations per clock tick.
|
| This is less efficient than explicit, programmer-given
| parallelism. But it is far easier to accomplish. The Apple M1
| continues this tradition of wider execution. I'm not sure if
| "sequential" is dead quite yet (even if, technically, there's a
| huge amount of machinery working to auto-translate sequential
| into parallel... our code is largely written in a "sequential"
| fashion).
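|
| A tiny, made-up illustration of that dependency-cutting idea:
| the second loop splits one serial dependency chain into four
| independent accumulators, so a superscalar out-of-order core
| can keep several add units busy even though the source code
| is still "sequential".
|
|     #include <cstdio>
|     #include <vector>
|
|     double sum_serial(const std::vector<double>& v) {
|         double s = 0.0;
|         for (double x : v) s += x;     // every add waits on the previous one
|         return s;
|     }
|
|     double sum_ilp(const std::vector<double>& v) {
|         double s0 = 0, s1 = 0, s2 = 0, s3 = 0;
|         size_t i = 0, n = v.size();
|         for (; i + 4 <= n; i += 4) {   // four independent chains
|             s0 += v[i];     s1 += v[i + 1];
|             s2 += v[i + 2]; s3 += v[i + 3];
|         }
|         for (; i < n; ++i) s0 += v[i]; // leftover elements
|         return (s0 + s1) + (s2 + s3);
|     }
|
|     int main() {
|         std::vector<double> v(1 << 20, 1.0);
|         std::printf("%f %f\n", sum_serial(v), sum_ilp(v));
|     }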
|
| -------------
|
| But the big advancement after 2005 was the rise of GPGPU
| compute. It was always known that SIMD (aka GPUs) was the most
| parallel approach; from the late 1980s and early 90s the SIMD
| supercomputers always had the most FLOPs.
|
| OpenCL and CUDA really took parallelism / SIMD mainstream. And
| indeed: these SIMD systems (be it AMD MI100 or NVidia A100) are
| far more efficient and have far higher compute capabilities
| than anything else.
|
| The only "competitor" on the supercomputer scale is the Fugaku
| supercomputer, with SVE (512-bit SIMD) ARM using HBM RAM (same as
| the high-end GPUs). SIMD seems like the obvious parallel compute
| methodology if you really need tons and tons of compute power.
| limoce wrote:
| There seems to be no way to efficiently replay concurrent
| programs in a deterministic fashion on multiple cores.
| Nondeterminism makes parallelism and concurrency inherently hard
| and unfriendly to newcomers. It has become even more difficult in
| recent years due to architecture decisions: weak memory ordering
| makes things worse.
|
| Supposing you are going to write a nontrivial concurrent program
| like a toy Raft, I believe that looking through RPC logs will be
| the most painful thing.
|
| In contrast, on a single core, gdb is good enough. And there are
| also advanced examples like VMware's fault-tolerant VM and
| FoundationDB's deterministic simulation. If we can debug
| concurrent programs without dirty tricks, just like single-
| threaded ones, I guess utilizing concurrency will be as handy as
| calling a function.
| aseure wrote:
| > There seems to be no way to efficiently replay concurrent
| programs in a deterministic fashion
|
| I wish there was something like Jepsen [1] broadly available to
| mainstream programming languages to do just that.
|
| [1] https://jepsen.io/consistency
| Const-me wrote:
| > no way to efficiently replay concurrent programs in a
| deterministic fashion on multiple cores
|
| OpenMP with a static scheduler and a hardcoded thread count?
|
| OpenMP ain't a silver bullet, but for many practically useful
| scenarios it's indeed as simple as calling a function.
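|
| A minimal sketch of that suggestion (made-up example): with
| schedule(static) and a hardcoded thread count, every run
| hands the same iteration ranges to the same threads, so the
| work distribution is reproducible.
|
|     #include <cstdio>
|     #include <vector>
|
|     int main() {
|         const int n = 1000;
|         std::vector<double> in(n, 2.0), out(n);
|
|         // Compile with -fopenmp (or equivalent).
|         #pragma omp parallel for schedule(static) num_threads(4)
|         for (int i = 0; i < n; ++i)
|             out[i] = in[i] * in[i];  // thread k always owns the same i range
|
|         std::printf("out[0]=%f out[%d]=%f\n", out[0], n - 1, out[n - 1]);
|     }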
| hderms wrote:
| what about if you throw in async IO into the mix? seems like
| you'd have to have some kind of vmware-like virtualization
| where you record all IO interrupts into a linearalizable log
| Const-me wrote:
| When you throw async I/O into the mix it's no longer just
| your application you're debugging, it's the whole OS with
| other processes, the OS kernel with its drivers and DMA,
| and external hardware that's not code at all.
|
| Can be tricky to debug or implement, but when I need that I
| normally use async-await in C#. The debugger in the IDE is
| multi-threaded, and does support these tasks.
| kvhdude wrote:
| I recall transactional memory being pitched to take a bite out of
| lock overheads associated with multithreading. (as opposed to
| lockless algorithms). Has it become mainstream?
| kaliszad wrote:
| If you use Clojure then yes, software transactional memory is
| readily available. https://clojure.org/reference/refs
| typon wrote:
| I blame Python and Javascript. Two of the most popular languages
| without proper concurrency support.
| maattdd wrote:
| You mean parallelism ? JS has good concurrency support (event,
| async..)
| criddell wrote:
| Did it in 2005 though?
|
| I think Python also has async / await support today.
| RegW wrote:
| Perhaps in 2005 we were still calling across browser
| frames and abusing the one-thread-per-frame model.
|
| async / await isn't really concurrency. It's a mechanism
| for picking up the job after something else has been done
| (perhaps like a callback). In a way it's been an advantage
| of Javascript: one or two threads do all the work in a
| timely way without all that messy thread scheduling and
| context switches.
| spenczar5 wrote:
| That's concurrency without in-process parallelism.
| gnarbarian wrote:
| Parallelism in JavaScript is simple using web workers. Of
| course the tricky part, as with all parallel applications, is
| managing the complexity you create yourself. The only
| languages that seem to do a good job of handling this for you
| are challenging in other ways, like Haskell and Erlang.
| kaliszad wrote:
| I don't think web workers are a good design. They are maybe
| simple, but they lack flexibility and capability. E.g., if I
| am not mistaken, you cannot control animations from a web
| worker. You also have to communicate with them using strings,
| so there is considerable overhead.
|
| JavaScript and the related ecosystem in the browser lack a
| number of things that are then patched over using additional
| complexity such as WebAssembly, bug handling/feature
| workarounds in applications, and mobile apps that you
| basically have to develop because the web isn't usable/
| performant enough for some stuff. Of course we have some
| extra experience now and it is easy to criticize in
| hindsight. E.g. the decision to make JavaScript less LISPy
| was perhaps a good decision for its wide adoption, but longer
| term the homoiconicity could have been used for easier code
| generation instead of e.g. stuff like WebAssembly. We also
| know that stuff like Clojure(Script) is possible now. We also
| know that we really want a native 64-bit integer type instead
| of hodge-podge solutions with a reduced precision of 53 bits,
| or big decimals with worse/complicated performance
| characteristics.
|
| Over the last 12 or so years, the web and related
| technologies have become an application platform and a video
| delivery network with demands that rival native
| applications. We even use JavaScript on the server side
| through Node.JS. This has enabled tremendous things to
| happen, but it is also perhaps the least stable platform to
| develop for. The web actively breaks some of the more
| complex applications that it has enabled, precisely because
| the foundations are not solid enough. The current state of
| affairs is that we keep piling on new and shiny APIs and
| technologies, which perhaps have considerable merit, but we
| don't really fix the broken things in a way normal devs can
| keep up with. I mean, how are you supposed to keep up, if
| even GMail, Google Maps, YouTube, Facebook and all its web
| apps, even major news sites and other applications backed
| by big companies, still have rather obvious bugs in these
| products? I guess "move fast and break things [for the
| later generations to fix all this mess]" it is.
| jerf wrote:
| I find the distinction far less interesting than most people
| do. I think it's easier to think of parallelism simply as an
| interesting special case of concurrency, and to think of
| runtimes and systems that are "concurrent" but can't run
| literally simultaneously, like Javascript or Python, as
| simply accidents of history not worth specially writing into
| the definitions of our terms. Every year "concurrent" code
| that can't be scheduled to run on multiple CPUs
| simultaneously is less and less interesting. And I don't
| anticipate any new language coming out in the future that
| will try to be "concurrent but can only run on one CPU", so
| this meaning is just going to fade into history as a
| particular quirk of some legacy runtimes.
|
| No, I do not consider any runtime that can't run on multiple
| CPUs simultaneously to have "good" concurrency support. It
| would, at best, be _bad_ concurrency support. Better than
| "no" support, sure, but not _good_ in 2021. If a new language
| came out with that support, nobody would call it "good"
| support, they'd call it failing to even raise the table
| stakes a new language needs nowadays.
| Spivak wrote:
| What's your definition of running simultaneously because
| CPython's approach to running on multiple cores is
| multiprocessing which works honestly fine. The tooling to
| do it is pretty slick where you can ignore a lot of the
| typical pain of IPC.
|
| Because if "good" concurrency support means single-process,
| multi-threaded on multiple cores, with enough locks that you
| can have multiple threads executing code simultaneously in a
| shared memory space, then a lot of languages are going to
| fall down or punt all responsibility for doing that safely to
| the programmer, which might as well be no support.
| jerf wrote:
| Shared memory spaces. Your "slick tooling" is one of the
| hacks I mentioned that history will forget, by the
| standard of "if a new language emerged that tried to call
| that 'concurrency' nobody would take it seriously".
|
| I should say that "hack" here isn't necessarily a
| pejorative. There are reasons for communities to create
| and deploy those. There are plenty of cases where
| existing code can be leveraged to work _better_ than it
| could without it, and that's the relevant standard for
| whether something is useful, not whether or not in a
| parallel universe the code could have been written in
| some completely different way or whether in some
| hypothetical sense if you could push a button and rewrite
| something for free you'd end up with something better.
| Hacks can be good, and every language will pick them up
| at some point as history evolves around it and some of
| the core assumptions a language/runtime made go out of
| date. But it's still a hack.
|
| While OS process boundaries do provide some nice
| features, they are also in general overkill. See Erlang
| and Pony for some alternate riffs on the idea, and if you
| look at it hard enough and understand it deeply enough,
| even what Rust does can provide some similar benefits
| without raising a full OS process boundary between bits
| of code.
| catern wrote:
| You're saying that _shared memory concurrency_ is the
| future? I think you 've got it completely wrong. Shared
| memory concurrency is the past. It was thought to be a
| good idea in the 80s and 90s, but we now know that it's
| both hard to program, and that it works poorly with
| highly-parallel, high-performance hardware. In the
| future, we'll see more and more programming systems which
| don't provide shared memory concurrency at all.
| bottled_poe wrote:
| I believe there will always be situations (though
| specific) where a shared-memory concurrency model is the
| most efficient and performant use of available resources.
| For this reason alone, shared-memory concurrency will
| always have a place. That said, I generally agree that
| the isolated parallel memory model is preferable for
| simplicity's sake.
| simiones wrote:
| I have usually seen the distinction between concurrency and
| parallelism being drawn for precisely the opposite reason:
| parallelism without concurrency is a relatively commonly
| used niche, it is much simpler than concurrency, and it has
| special tools that make sense just for it.
|
| For example, CUDA exposes a parallel programming model with
| low support for actual concurrency. Similarly, OpenMP is
| mostly useful for parallel computations that rarely need
| coordination. C#'s Parallel package is another example.
|
| By contrast, concurrency often rears its ugly head even in
| single-threaded workflows, even ones that are part of
| larger multi-threaded programs.
| Spivak wrote:
| What is Python missing in your eyes? Cooperative multitasking
| is a relatively recent thing to be in vogue again, and python
| has had good support for multithreading and multiprocessing
| since forever.
| typon wrote:
| Without in-process multi-threading with OS threads, writing
| performant parallel code is nearly impossible. Most of the
| time you have to drop down to C/C++, disable the GIL and
| implement parallelism there.
| aisengard wrote:
| Every time someone brings up the GIL and performance, the
| answer is pretty much always "use a different language for
| the places you need especially high performance". All this
| sturm and drang about Python not being performant enough is
| missing the point. It's not supposed to be the answer to
| special cases where maximum performance is required! It's a
| general-purpose language that is built around community, easy
| readability, elimination of boilerplate, and being good
| enough for the vast majority of use cases. Just like any
| language, there are usecases where it is not appropriate.
| Clearly, it has succeeded given its significant reach and
| longevity.
| Spivak wrote:
| Most high-level languages implement their performance
| critical functions at a lower level. However, languages
| that rely on a GIL (Python and Ruby) or isolated memory
| spaces (V8) have an additional hurdle that if you want a
| model of concurrency with many threads acting
| simultaneously on a single shared memory space you have
| to do additional work.
|
| For Python you either have to have your library pause
| execution of Python bytecode and do multithreaded things
| on your memory space, allocate an isolated memory arena
| for your multithreaded work, copy data in and out of it,
| and then spawn non-Python threads (see PEP 554 for an IRL
| example of this idea with the Python interpreter itself),
| or copy data to another process which can do the
| multithreaded things.
|
| With PEP 554 (right now using the C API) _I_ personally
| have no issue with multithreaded Python, since copying
| memory between interpreters is fast enough for my work,
| but it is overhead.
| typon wrote:
| There is nothing about the Python language spec (as
| gleaned from looking at how CPython works) that forces it
| to be slow or unable to handle parallelism. In fact, the
| addition of massive amounts of library code and language
| changes to support async shows that Python isn't even
| immune to added complexity.
| N1H1L wrote:
| Also, dask works pretty well all the way up to 200-250 CPU
| cores. It's pretty straightforward to build your code to use
| dask rather than numpy or pandas.
| rbanffy wrote:
| One thing I'd love to do is a Smalltalk implementation where
| every message is processed in a separate thread. Could be a nice
| educational tool, as well as a great excuse to push workstations
| with hundreds of cores.
| lincpa wrote:
| The ultimate solution:
|
| The Grand Unified Programming Theory: The Pure Function Pipeline
| Data Flow with Principle-based Warehouse/Workshop Model
|
| Apple M1 chip is the best case.
|
| https://github.com/linpengcheng/PurefunctionPipelineDataflow
| dang wrote:
| Past related threads:
|
| _The Free Lunch Is Over - A Fundamental Turn Toward Concurrency
| in Software_ - https://news.ycombinator.com/item?id=15415039 -
| Oct 2017 (1 comment)
|
| _The Free Lunch Is Over: A Fundamental Turn Toward Concurrency
| in Software (2005)_ -
| https://news.ycombinator.com/item?id=10096100 - Aug 2015 (2
| comments)
|
| _The Moore 's Law free lunch is over. Now welcome to the
| hardware jungle._ - https://news.ycombinator.com/item?id=3502223
| - Jan 2012 (75 comments)
|
| _The Free Lunch Is Over_ -
| https://news.ycombinator.com/item?id=1441820 - June 2010 (24
| comments)
|
| Others?
___________________________________________________________________
(page generated 2021-06-28 23:01 UTC)