[HN Gopher] Show HN: Coros - A Modern C++ Library for Task Paral...
___________________________________________________________________
Show HN: Coros - A Modern C++ Library for Task Parallelism
Hello Hacker News. I'm Martin, a graduate student from Prague, and
I've been working on Coros, a C++ library for task-based
parallelism. After spending some time with OpenMP and oneTBB, I
wanted to try building a library using modern features from the C++
standard library. I've used coroutines for task encapsulation and
C++23 expected for exception handling, while trying to maintain
good performance. Additionally, I've implemented monadic-like
behavior to allow easy chaining of tasks, similar to the monadic
operations in std::expected. You can check out the project here:
https://github.com/mtmucha/coros While this library isn't
fully-fledged or production-ready, I'd really appreciate your feedback!
Author : singledigits
Score : 64 points
Date : 2024-09-25 13:05 UTC (9 hours ago)
(HTM) web link (github.com)
(TXT) w3m dump (github.com)
| throwaway17_17 wrote:
| I am pretty okay with the code (I'm essentially talking about the
| usage syntax for the library and its type) shown in the examples.
| However, at this point any parallel computing implementations
| must address the baseline issues presented in "Scalability! But
| at what COST?" (McSherry, Isard, Murray 2015), a paper whose central
| question is whether a parallel system has a Configuration
| that Outperforms a Single Thread (the COST in the title). [1]
| There is a good discussion of the paper and its applicability to
| parallel (and distributed) computation implementations in Richard
| Feldman's 2024 Distributed Systems talk "Distributed Pure
| Functions". [2]
|
| At this point in the life-cycle of the concept of parallel
| computation, I think it has become somewhat imperative that devs
| in the area begin to honestly evaluate the practicality and
| benefits/drawbacks of using the techniques for a given
| application area and attempt to 'sell' their libraries,
| techniques, idioms, etc using a more transparent approach. Also,
| I generally think that people that argue for more prevalence of
| parallel code, especially those arguing for the default being
| parallel (or concurrent), have to wrestle with and address these
| same issues.
|
| Again, I don't dislike the premise of the library, think the
| usage examples seem very sensible and well designed, and I really
| like parallel computation as an area of study in general.
| Further, I really think that setting out a task for one's self
|
| 'to try building a library using modern features from the C++
| standard library. I've used coroutines for task encapsulation and
| C++23 expected for exception handling, while trying to maintain
| good performance.'
|
| after taking inspiration from two well respected and frequently
| utilized libraries in the space is great and the internals of the
| library I saw look clean and well architected.
|
| 1 - https://www.usenix.org/system/files/conference/hotos15/hotos...
| 2 - https://youtu.be/ztY1YRiaSiE?si=npBREw9vdF5dHcJh&t=350
| singledigits wrote:
| Thank you for your thoughtful feedback.
|
| I've just skimmed through the paper, and it raises an interesting
| and valid point about scalability in parallel computing. I'll
| definitely look into it more thoroughly, as well as the talk
| you mentioned.
|
| I'm glad you find the usage examples well-designed and
| appreciate your positive remarks about the library's
| architecture. Thank you again for your insights.
| SolarNet wrote:
| I think you are misapplying that paper? This as a library is
| the "batteries" to C++'s no-batteries-included standard library
| which does not implement asynchronous coroutines at all.
|
| The paper is much more on the side of application and system
| performance. But you couldn't even write such a system without
| a library like this providing you the tools to do so. This is
| much more in the domain of "basic tool for ecosystem" than
| "library for specific tasks". It's on the user of the tool to
| address the paper's question, not the builder of the tools.
| throwaway17_17 wrote:
| You are not incorrect in stating that the primary focus of
| the paper is more on the application side. However, I think
| providers of a parallel computation infrastructure would
| benefit from profiling a wide range of potential use cases
| across several work load sizes. This could then lead to a
| section in a README where the baseline overhead was broken
| down per workload/worksize measurements and a back of the
| envelope estimate by an application developer would be more
| particularly motivated when deciding which infrastructure
| tool may be the best fit for their application's specific
| requirements.
| tlb wrote:
| In your deque/circular buffer implementation, how is it able to
| grow the queue without locking?
|
| The code seems to rely on atomics for head & tail, but grows the
| queue without any special provisions I can see.
|
| https://github.com/mtmucha/coros/blob/ee30d3c1d0602c3071aa26...
| singledigits wrote:
| The concept behind the deque is explained in Correct and
| Efficient Work-Stealing for Weak Memory Models [1].
|
| The idea is that only the owning thread can push tasks into the
| deque. If the owning thread detects that the deque is full, it
| creates a new one and copies the original values. Once the copy
| is ready, the owning thread "publishes" it by storing it in the
| buffer variable. Pointers to the deque are atomic, as well as
| the indices. Other threads can manipulate only the indices, and
| even if a stealing thread has an old pointer, it still points
| to valid data.
|
| I hope I understood your question correctly and that this
| answer is helpful. You can find more details in the paper
| mentioned above.
|
| [1] https://inria.hal.science/hal-00802885/document
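|
| A minimal sketch of the "copy, then publish through an atomic
| pointer" step described above. This is not the Coros code: the
| names Ring and GrowableDeque are hypothetical, every atomic uses
| the default sequentially consistent ordering instead of the weaker
| orderings the paper derives, and the owner-side pop is left out.
| The point it tries to show is that replaced rings stay allocated,
| so a thief holding an old pointer still reads valid memory.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <optional>
#include <utility>
#include <vector>

// Fixed-size ring of slots; capacity must be a power of two so that
// (index & mask) wraps around.
struct Ring {
    explicit Ring(std::size_t capacity)
        : mask(capacity - 1), slots(new std::atomic<int>[capacity]) {}

    std::size_t capacity() const { return mask + 1; }
    void put(std::int64_t i, int v) { slots[static_cast<std::size_t>(i) & mask].store(v); }
    int get(std::int64_t i) const { return slots[static_cast<std::size_t>(i) & mask].load(); }

    std::size_t mask;
    std::unique_ptr<std::atomic<int>[]> slots;
};

class GrowableDeque {
public:
    explicit GrowableDeque(std::size_t capacity = 2) {
        rings_.push_back(std::make_unique<Ring>(capacity));
        buffer_.store(rings_.back().get());
    }

    // Owner thread only: append at the bottom, growing first if full.
    void push(int v) {
        const std::int64_t b = bottom_.load();
        const std::int64_t t = top_.load();
        Ring* ring = buffer_.load();
        if (b - t >= static_cast<std::int64_t>(ring->capacity())) {
            ring = grow(ring, t, b);
        }
        ring->put(b, v);
        bottom_.store(b + 1);  // makes the new element visible to thieves
    }

    // Any thread: try to take the oldest element from the top.
    std::optional<int> steal() {
        std::int64_t t = top_.load();
        const std::int64_t b = bottom_.load();
        if (t >= b) return std::nullopt;  // empty
        Ring* ring = buffer_.load();      // may already have been replaced
        const int v = ring->get(t);       // still valid: old rings are kept alive
        if (!top_.compare_exchange_strong(t, t + 1)) return std::nullopt;  // lost a race
        return v;
    }

private:
    // Owner thread only: copy live elements into a ring twice the size.
    Ring* grow(Ring* old, std::int64_t t, std::int64_t b) {
        auto bigger = std::make_unique<Ring>(old->capacity() * 2);
        for (std::int64_t i = t; i < b; ++i) bigger->put(i, old->get(i));
        Ring* fresh = bigger.get();
        rings_.push_back(std::move(bigger));  // the old ring stays in rings_
        buffer_.store(fresh);                 // "publish" the finished copy
        return fresh;
    }

    std::atomic<std::int64_t> top_{0};
    std::atomic<std::int64_t> bottom_{0};
    std::atomic<Ring*> buffer_{nullptr};
    std::vector<std::unique_ptr<Ring>> rings_;  // touched by the owner only
};

// Single-threaded demonstration: start tiny so push() must grow repeatedly.
std::vector<int> push_then_steal_all(int count) {
    GrowableDeque dq(2);
    for (int i = 0; i < count; ++i) dq.push(i);
    std::vector<int> out;
    while (auto v = dq.steal()) out.push_back(*v);
    return out;
}
```

| Calling push_then_steal_all(10) forces three grow steps and hands
| the values back oldest first. A real implementation would also
| need a policy for reclaiming the retired rings.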
| Koshkin wrote:
| There's also a high-quality, sophisticated Threading Building
| Blocks by Intel (which I wish would become a part of the C++
| standard library).
|
| https://en.wikipedia.org/wiki/Threading_Building_Blocks
| Zitrax wrote:
| You can see in the repository that it was benchmarked against
| oneTBB.
| jcelerier wrote:
| TBB was already far from the state-of-the-art 7/8 years ago,
| and there are continuously new approaches that outperform it
| such as https://github.com/taskflow/taskflow ;
| https://github.com/google/marl ; and the most recent contender
| https://github.com/dpuyda/scheduling
| throwaway_94404 wrote:
| I just can't get my brain around coroutines.
|
| Can anyone recommend a good tutorial or resource for me to read?
|
| I find it so frustrating as I don't think it's necessarily a
| complex subject but my brain just doesn't get it.
|
| Related perhaps but many (many, many) years ago, when learning
| BASIC, I assumed GOSUB went off and started executing the code in
| the subroutine as well as the rest of the inline code. That
| suggests to me that I should perhaps have a deeper understanding
| of this but I really don't...
| dataflow wrote:
| Do you mean C++ coroutines, or coroutines in general? If you're
| new to the concept I would try to start with Python's, then
| Javascript or C#. C++'s is way more complicated.
| pjmlp wrote:
| Note that C# and C++ are quite similar; the biggest
| differences are the lifetime gotchas and not having a coroutine
| runtime in the standard library.
|
| Their design has a common source, and the magic methods for
| awaitables as well.
| SolarNet wrote:
| Co-routines can be a nebulous sort of concept because it means
| different things in different places and not all of them have
| the same features. But some of the big points are:
|
| - Heap allocated call frame. Instead of being pushed onto the
| stack, co-routines tend to have their call frame (local
| variables, arguments, etc.) placed into heap memory (or at
| least _may_ be place-able into heap memory). This often enables
| the other features.
|
| - Control can leave co-routines in more ways than standard
| function calls. Generally this means returning (often called
| "yield") to the caller without completing the whole function.
| It can then be later resumed, returning to where the function
| originally left off. Generators are a common pattern enabled by
| co-routines that rely on only this part (and so many systems
| can optimize out the heap usage, for example).
|
| - A co-routine is usually an object with an interface that
| allows you to move it around and resume it in different places
| than it was originally called. This can include on different
| threads, or depending on the sophistication of the system,
| different processes or machines.
|
| Those are the three big points in my mind. I'd recommend trying
| Lua coroutines, personally (I like minimalist engines like
| defold to use it in) to really get a feel for how these are on
| the edge between "language feature" and "library feature".
| singledigits wrote:
| I feel you! Coroutines can be tricky at first. I recommend
| Lewis Baker's blog about coroutines [1], which is detailed and
| insightful. Additionally, cppreference [2] is a great resource
| to understand how coroutines work in C++.
|
| In a nutshell, C++ coroutines are almost like regular
| functions, except that they can be "paused" (suspended), and
| their state is stored on the heap so they can be resumed later.
| When you resume a coroutine, its state is loaded back, and
| execution continues from where it left off.
|
| The complicated part comes from the interfaces through which
| you use coroutines in C++. Each coroutine needs to be
| associated with a promise object, which defines how the
| coroutine behaves (for example, what happens when you co_return
| a value). Then, there are awaiters, which define what happens
| when you co_await them. For example, C++ provides a built-in
| awaiter called suspend_always{}, which you can co_await to
| pause the coroutine.
|
| If you take your time and go thoroughly through the blog and
| Cppreference, you'll definitely get the hang of it.
|
| Hope this helps.
|
| [1] https://lewissbaker.github.io/
| [2] https://en.cppreference.com/w/cpp/language/coroutines
| loeg wrote:
| They're just green threads with some nice syntax sugar,
| right? Instead of an OS-level "pause" with a futex-wait or
| sleep (resumed by the kernel scheduler), they do an
| application-level pause and require some application-level
| scheduler. (But coroutines can still call library or kernel
| functions that block/sleep, breaking userspace scheduling?)
| singledigits wrote:
| Yes, exactly. Coroutines are one possible implementation of
| green threads. Once they are scheduled/loaded on an OS
| thread, they behave just like regular functions with their
| own call stack. This means they can indeed call blocking
| operations at the OS level. A possible approach to handle
| such operations would be to wrap the blocking call, suspend
| the coroutine, and then resume it once the operation is
| complete, perhaps by polling (checking for completion).
| jpc0 wrote:
| Dumbed down way too far.
|
| A coroutine is a function that can remember where it is in its
| own execution, so when it is called later it continues
| execution where it left off.
|
| There are many many ways of implementing that functionality,
| C++ standard coroutines are only one such implementation.
|
| What you do with them is whatever you want, it's pretty common
| to handle IO using them but generators are also a pretty common
| example. But that is generally high level.
|
| C++ coroutines are basic building blocks and are very low
| level, there is no executor ( rust tokio / python asyncio ) so
| don't be worried if it seems hard to use, it is hard to use.
|
| Look at std::generator for how coroutines are used to implement
| a generator, cppcoro is also a pretty popular library that
| builds abstractions on top of coroutines and also has some
| executors if I remember correctly.
| baq wrote:
| imagine a virtual (green) thread which the kernel doesn't run
| in parallel until you tell it it's ok to do so (when you
| explicitly yield control) and then can continue from that place
| when you explicitly tell it to.
|
| you can even try to run those virtual threads on real threads.
| much fun to be had.
| Koshkin wrote:
| One way to get a sense of coroutines is to consider the
| behavior presented by the async/await design pattern [1], where
| 'await' suspends the execution of the currently running code
| and yields control to the 'async' task. (As an adage goes,
| "async is not asynchronous, and await does not await
| anything.") Yet another pattern is "promise/future", where the
| code execution is (or may be) suspended as soon as the code
| tries to obtain the promised result.
|
| [1] https://learn.microsoft.com/en-us/dotnet/csharp/asynchronous...
| dxuh wrote:
| Coroutines themselves are a really simple concept. But in
| practice they give you all the headaches async stuff generally
| gives you. And in C++ there is a _ton_ of extra complication,
| especially because there is no support library. I wrote this in
| a tutorial a while ago:
|
| > they are functions that can suspend themselves, meaning they
| stop themselves without returning, even in the middle of their
| body, and can later be resumed, continuing execution at
| the point they suspended from earlier.
|
| If you want to use coroutines in C++ specifically you can have
| a look at this tutorial, if you want:
| https://theshoemaker.de/posts/yet-another-cpp-coroutine-tuto...
| I don't know of anyone that read it, but I spent a lot of time
| on it.
|
| It essentially tries to explain how to build a coroutine
| support library yourself, but if you don't care about that,
| skip it and just use libcoro or cppcoro. They have examples
| too. My little async io library has some examples as well if
| you want to get an idea.
| marhee wrote:
| Coroutines are just multiple call stacks. If coroutine A calls
| coroutine B then B executes on its own stack and can 'yield' a
| value back to A. Yielding is just a return across stacks
| without destroying the current stack. So A continues with the
| yielded value on its own stack and when ready calls B again
| which continues on its own stack with the next statement after
| the previous yield. Etc.
|
| Notice that this does not necessarily involve parallelism,
| although it can. For example, Lua has non-parallel
| (cooperative) coroutines. Go has parallel coroutines, called
| goroutines, but theoretically only if they use channels to
| exchange values. Otherwise, if they're not exchanging
| information, they would not be coroutines in the sense that they
| work together in solving something.
| binary132 wrote:
| I didn't understand C++ coroutines until I learned to use Lua
| coroutines. It's basically not that different from gotos, if
| goto saved local state.
| IshKebab wrote:
| Yeah there aren't many good resources on it unfortunately. One
| thing to note is C++20's coroutine support is _really_ low
| level. It's designed for library authors so that they can
| build the kinds of things "normal" people want - tasks,
| generators, futures, promises, etc.
|
| This video is the best intro I've found. It actually explains
| what is happening in memory, which is the only way to really
| understand anything in C++.
|
| https://youtu.be/aibjUHx7vew
|
| Also this is decent:
|
| https://www.scs.stanford.edu/~dm/blog/c++-coroutines.html
|
| But don't try and write a coroutine library yourself. Use
| something like libcoro.
| germandiago wrote:
| How is this library different from Boost.Cobalt and cppcoro?
| singledigits wrote:
| Thank you for your question.
|
| I've included a link to Lewis Baker's blog (the author of
| CppCoro) in my repository as an excellent explanation of
| coroutines. From my understanding, after reviewing his library,
| it is no longer in active development and hasn't been updated
| for a couple of years. CppCoro was an experimental library
| intended to explore coroutines while they were still an
| experimental feature. For example, CppCoro uses a custom type
| for storing values, similar to std::optional from the standard
| library (if I'm not mistaken).
|
| For my implementation, I've opted to leverage std::expected
| from C++23 for storing values. I've also implemented monadic-
| like chaining. CppCoro, however, seems to focus more on
| asynchronous operations, whereas my library focuses more on
| task-based parallelism.
|
| I don't have experience with Boost.Cobalt, so I can't provide
| insights there, but I will definitely look into it now that
| you've mentioned it.
|
| Hope this helps.
| feverzsj wrote:
| I think Op's lib is for fork-join style parallel algorithms.
| It's like TBB but is based on continuation stealing.
| Boost.Cobalt and cppcoro are general coroutine libs. They are
| mostly used for async IO programming.
| neonsunset wrote:
| This looks _exactly_ like .NET's task abstraction.
|
| If it works anywhere near as well, I'm definitely giving this a
| try next time I need to work on a C++ project. Thanks!
| pjmlp wrote:
| As a historical note for those that don't follow C++, C++20
| coroutines grew out of the work done with asynchronous
| programming on WinRT for C# and C++, inspired by Midori and
| .NET async/await.
|
| Most of the magic methods expected by C++ compilers in
| awaitable types, are also present in the structured typing used
| by C# for awaitables.
|
| The preview implementations for VC++ and clang were done by a
| Microsoft employee, Gor Nishanov; his talks are always quite
| interesting.
| singledigits wrote:
| Hopefully, you find it useful! If you have any ideas or
| suggestions for improvement, feel free to open an issue or let
| me know. Thanks for considering it!
| viralsink wrote:
| Is there a way to prevent callback hell in C++ when doing
| asynchronous communication with C++ before 20? Coroutines seem to
| be the only clean solution. Promises can work, but they tend to
| be difficult to reason about if branching is involved.
| darknavi wrote:
| Traditionally the way to prevent "callback hell" is to use
| something like async/await syntax. Without that there aren't a
| ton of good options. Like you mentioned, you could switch to
| promises with polling.
| gpderetta wrote:
| There are library-only stackful coroutines options, like in
| boost.
| gsliepen wrote:
| It would be nice if there was a function to wait for tasks and to
| return the results at the same time, so that you could write
| something like:
|
|     auto [a, b] = co_await coros::wait_tasks(fib(n - 1), fib(n - 2));
|     return a + b;
| singledigits wrote:
| Thank you for your feedback.
|
| I understand that working with tasks and retrieving values can
| feel a bit clunky. The main reason I've structured it this way
| is that individual tasks are RAII objects, and their coroutine
| state is destroyed once they go out of scope. However, I could
| modify the awaitable returned from wait_tasks to store tasks,
| and then return values directly to the user. This could
| definitely be a more ergonomic overload for the function. I'll
| look into it!
| fooker wrote:
| If you need this interface, use threads.
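|
| For comparison, a sketch of that "wait and get the values back"
| shape built on plain threads via std::async. wait_all, fib_seq and
| fib_parallel are hypothetical helpers, not part of Coros or the
| standard library, and only the top-level split runs in parallel,
| to avoid spawning a thread per recursive call.

```cpp
#include <cassert>
#include <cstdint>
#include <future>
#include <tuple>

// Blocks until every future is ready and returns the values as a tuple,
// so the caller can unpack them with structured bindings.
template <class... Ts>
std::tuple<Ts...> wait_all(std::future<Ts>... futures) {
    // Braced initialization evaluates the get() calls left to right.
    return std::tuple<Ts...>{futures.get()...};
}

// Plain sequential Fibonacci, used as the unit of work.
std::uint64_t fib_seq(unsigned n) {
    return n < 2 ? n : fib_seq(n - 1) + fib_seq(n - 2);
}

// Runs the two top-level branches on separate threads and joins them.
std::uint64_t fib_parallel(unsigned n) {
    if (n < 2) return n;
    auto [a, b] = wait_all(std::async(std::launch::async, fib_seq, n - 1),
                           std::async(std::launch::async, fib_seq, n - 2));
    return a + b;
}
```

| fib_parallel(20) gives 6765. Unlike a co_await, wait_all blocks
| the calling thread until both results are in.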
| leeter wrote:
| Looks like a good start. I'm not actually sure I'd use it on
| windows however. CPPWinrt has a really decent coroutine support
| library with tools like winrt::resume_background() [1], I use it
| extensively even in desktop apps because it makes using the
| windows threadpool (which is active by default for all windows
| processes since at least windows 7) trivial. I've basically moved
| most of my threading code onto that unless I need a dedicated
| thread to hold a context for some reason. But, that's a windows
| specific thing as far as I know.
|
| [1] https://learn.microsoft.com/en-us/uwp/cpp-ref-for-winrt/resu...
| singledigits wrote:
| Thank you for your response!
|
| I don't have experience with WinRT, but it does seem quite
| similar at first glance. One of the key reasons I focused on
| modern C++ was to ensure cross-platform compatibility. However,
| I completely understand that if you're working on Windows and
| are already familiar with WinRT, sticking with it makes perfect
| sense. I'll take a closer look at WinRT to see if there are any
| significant differences.
| leeter wrote:
| My suggestion is to aim for compatibility with cppwinrt, but not
| anything else. That way devs can freely intermix and get the
| best of the utilities of both.
| cxx wrote:
| This looks very promising, it's refreshing to see a library with
| a sane interface.
|
| One thing I'd like to see is the possibility to run the
| coroutines in the main thread, without spawning any new threads
| in the thread pool. It might seem strange but sometimes you just
| need to do I/O stuff concurrently in a place where you're not
| allowed to spawn other threads.
|
| Other than that congrats on the release, I hope you keep working
| on it!
| OnlyMortal wrote:
| Have you had a look at SeaStar and how it works with coroutines?
___________________________________________________________________
(page generated 2024-09-25 23:01 UTC)