[HN Gopher] Show HN: Coros - A Modern C++ Library for Task Paral...
       ___________________________________________________________________
        
       Show HN: Coros - A Modern C++ Library for Task Parallelism
        
       Hello Hacker News.  I'm Martin, a graduate student from Prague, and
       I've been working on Coros, a C++ library for task-based
       parallelism.  After spending some time with OpenMP and oneTBB, I
       wanted to try building a library using modern features from the C++
       standard library. I've used coroutines for task encapsulation and
       C++23 expected for exception handling, while trying to maintain
       good performance.  Additionally, I've implemented monadic-like
       behavior to allow easy chaining of tasks, similar to the monadic
       operations in std::expected.  You can check out the project here:
       https://github.com/mtmucha/coros  While this library isn't fully-
       fledged or production-ready, I'd really appreciate your feedback!
        
       Author : singledigits
       Score  : 64 points
       Date   : 2024-09-25 13:05 UTC (9 hours ago)
        
 (HTM) web link (github.com)
 (TXT) w3m dump (github.com)
        
       | throwaway17_17 wrote:
       | I am pretty okay with the code (I'm essentially talking about the
       | usage syntax for the library and its type) shown in the examples.
       | However, at this point any parallel computing implementation
       | must address the baseline issues presented in "Scalability! But
       | at what COST?" (McSherry, Isard, Murray 2015), a paper whose
       | central question is whether a parallel system has a
       | Configuration that Outperforms a Single Thread (the COST in the
       | title). [1]
       | There is a good discussion of the paper and its applicability to
       | parallel (and distributed) computation implementations in Richard
       | Feldman's 2024 Distributed Systems talk "Distributed Pure
       | Functions". [2]
       | 
       | At this point in the life-cycle of the concept of parallel
       | computation, I think it has become somewhat imperative that devs
       | in the area begin to honestly evaluate the practicality and
       | benefits/drawbacks of using the techniques for a given
       | application area and attempt to 'sell' their libraries,
       | techniques, idioms, etc using a more transparent approach. Also,
       | I generally think that people that argue for more prevalence of
       | parallel code, especially those arguing for the default being
       | parallel (or concurrent), have to wrestle with and address these
       | same issues.
       | 
       | Again, I don't dislike the premise of the library, think the
       | usage examples seem very sensible and well designed, and I really
       | like parallel computation as an area of study in general.
       | Further, I really think that setting out a task for one's self
       | 
       | 'to try building a library using modern features from the C++
       | standard library. I've used coroutines for task encapsulation and
       | C++23 expected for exception handling, while trying to maintain
       | good performance.'
       | 
       | after taking inspiration from two well respected and frequently
       | utilized libraries in the space is great and the internals of the
       | library I saw look clean and well architected.
       | 
       | 1 -
       | https://www.usenix.org/system/files/conference/hotos15/hotos...
       | 2 - https://youtu.be/ztY1YRiaSiE?si=npBREw9vdF5dHcJh&t=350
        
         | singledigits wrote:
         | Thank you for your thoughtful feedback.
         | 
         | I've just skimmed through the paper, and it raises an
         | interesting and valid point about scalability in parallel
         | computing. I'll definitely look into it more thoroughly, as
         | well as the talk you mentioned.
         | 
         | I'm glad you find the usage examples well-designed and
         | appreciate your positive remarks about the library's
         | architecture. Thank you again for your insights.
        
         | SolarNet wrote:
         | I think you are misapplying that paper? This as a library is
         | the "batteries" to C++'s no-batteries-included standard library
         | which does not implement asynchronous coroutines at all.
         | 
         | The paper is much more on the side of application and system
         | performance. But you couldn't even write such a system without
         | a library like this providing you the tools to do so. This is
         | much more in the domain of "basic tool for ecosystem" than
         | "library for specific tasks". It's on the user of the tool to
         | address the paper's question, not the builder of the tools.
        
           | throwaway17_17 wrote:
           | You are not incorrect in stating that the primary focus of
           | the paper is more on the application side. However, I think
           | providers of a parallel computation infrastructure would
           | benefit from profiling a wide range of potential use cases
           | across several workload sizes. This could then lead to a
           | README section that breaks down the baseline overhead per
           | workload and work size, so that an application developer
           | can make a better-grounded back-of-the-envelope estimate
           | when deciding which infrastructure tool best fits their
           | application's specific requirements.
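           |
           | A measurement of that kind could be sketched roughly as
           | below. The helper names (sum_sequential, sum_parallel,
           | time_ms) are made up for illustration, and std::async is
           | only a stand-in for whichever task library is being
           | evaluated; this is not how Coros benchmarks itself.

```cpp
#include <algorithm>
#include <chrono>
#include <cstddef>
#include <future>
#include <numeric>
#include <vector>

// Single-threaded baseline: the number any parallel version has to beat.
long long sum_sequential(const std::vector<int>& v) {
    return std::accumulate(v.begin(), v.end(), 0LL);
}

// Naive parallel version: split the input into `parts` chunks and sum each
// chunk in its own std::async task (a stand-in for a real task library).
long long sum_parallel(const std::vector<int>& v, std::size_t parts) {
    if (parts == 0) parts = 1;
    const std::size_t chunk = (v.size() + parts - 1) / parts;
    std::vector<std::future<long long>> futures;
    for (std::size_t begin = 0; begin < v.size(); begin += chunk) {
        const std::size_t end = std::min(begin + chunk, v.size());
        futures.push_back(std::async(std::launch::async, [&v, begin, end] {
            const auto first = v.begin() + static_cast<std::ptrdiff_t>(begin);
            const auto last = v.begin() + static_cast<std::ptrdiff_t>(end);
            return std::accumulate(first, last, 0LL);
        }));
    }
    long long total = 0;
    for (auto& f : futures) total += f.get();
    return total;
}

// Wall-clock time of a single call to `f`, in milliseconds.
template <typename F>
double time_ms(F&& f) {
    const auto start = std::chrono::steady_clock::now();
    f();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

           | Timing both functions over a range of input sizes and
           | task counts gives the per-workload overhead table, and
           | the smallest configuration where the parallel version
           | wins is the COST figure from the paper.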
        
       | tlb wrote:
       | In your deque/circular buffer implementation, how is it able to
       | grow the queue without locking?
       | 
       | The code seems to rely on atomics for head & tail, but grows the
       | queue without any special provisions I can see.
       | 
       | https://github.com/mtmucha/coros/blob/ee30d3c1d0602c3071aa26...
        
         | singledigits wrote:
         | The concept behind the deque is explained in Correct and
         | Efficient Work-Stealing for Weak Memory Models [1].
         | 
         | The idea is that only the owning thread can push tasks into the
         | deque. If the owning thread detects that the deque is full, it
         | creates a new one and copies the original values. Once the copy
         | is ready, the owning thread "publishes" it by storing it in the
         | buffer variable. Pointers to the deque are atomic, as well as
         | the indices. Other threads can manipulate only the indices, and
         | even if a stealing thread has an old pointer, it still points
         | to valid data.
         | 
         | I hope I understood your question correctly and that this
         | answer is helpful. You can find more details in the paper
         | mentioned above.
         | 
         | [1] https://inria.hal.science/hal-00802885/document
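         |
         | A minimal sketch of the growth scheme described above, not
         | the actual Coros code. Ring and WorkStealingDeque are
         | illustrative names, the deque stores plain ints instead of
         | tasks, every atomic uses the default sequentially consistent
         | ordering (the paper derives weaker orderings), and retired
         | buffers are simply kept alive until the deque is destroyed.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <optional>
#include <utility>
#include <vector>

// Fixed-size ring buffer. Capacity must be a power of two so that an
// ever-growing index can be wrapped with a mask.
struct Ring {
    explicit Ring(std::size_t cap) : mask(cap - 1), slots(new std::atomic<int>[cap]) {}
    std::size_t capacity() const { return mask + 1; }
    int load(std::int64_t i) const { return slots[static_cast<std::size_t>(i) & mask].load(); }
    void store(std::int64_t i, int v) { slots[static_cast<std::size_t>(i) & mask].store(v); }
    std::size_t mask;
    std::unique_ptr<std::atomic<int>[]> slots;
};

class WorkStealingDeque {
public:
    explicit WorkStealingDeque(std::size_t cap = 2) {  // cap: power of two
        rings_.push_back(std::make_unique<Ring>(cap));
        ring_.store(rings_.back().get());
    }

    // Owner thread only.
    void push(int value) {
        const std::int64_t b = bottom_.load();
        const std::int64_t t = top_.load();
        Ring* ring = ring_.load();
        if (b - t >= static_cast<std::int64_t>(ring->capacity())) {
            // Full: copy the live range [t, b) into a ring twice as large.
            auto bigger = std::make_unique<Ring>(ring->capacity() * 2);
            for (std::int64_t i = t; i < b; ++i) bigger->store(i, ring->load(i));
            ring = bigger.get();
            rings_.push_back(std::move(bigger));  // old ring stays valid for late thieves
            ring_.store(ring);                    // "publish" the new buffer
        }
        ring->store(b, value);
        bottom_.store(b + 1);
    }

    // Owner thread only: take the most recently pushed item.
    std::optional<int> pop() {
        const std::int64_t b = bottom_.load() - 1;
        Ring* ring = ring_.load();
        bottom_.store(b);
        std::int64_t t = top_.load();
        if (t > b) {  // deque was empty
            bottom_.store(b + 1);
            return std::nullopt;
        }
        const int value = ring->load(b);
        if (t == b) {  // last item: race against thieves for it
            const bool won = top_.compare_exchange_strong(t, t + 1);
            bottom_.store(b + 1);
            if (!won) return std::nullopt;
        }
        return value;
    }

    // Any thread: take the oldest item. Thieves only ever modify top_.
    std::optional<int> steal() {
        std::int64_t t = top_.load();
        const std::int64_t b = bottom_.load();
        if (t >= b) return std::nullopt;
        Ring* ring = ring_.load();  // may be an older ring; slot t is still intact there
        const int value = ring->load(t);
        if (!top_.compare_exchange_strong(t, t + 1)) return std::nullopt;  // lost the race
        return value;
    }

private:
    std::atomic<std::int64_t> top_{0};
    std::atomic<std::int64_t> bottom_{0};
    std::atomic<Ring*> ring_{nullptr};
    std::vector<std::unique_ptr<Ring>> rings_;  // owner-only; keeps retired rings alive
};
```

         | The owner never writes into a ring after replacing it, so a
         | thief holding a stale pointer reads an unchanged slot, and
         | the compare-and-swap on the top index decides whether the
         | steal actually counts.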
        
       | Koshkin wrote:
       | There's also a high-quality, sophisticated Threading Building
       | Blocks by Intel (which I wish would become a part of the C++
       | standard library).
       | 
       | https://en.wikipedia.org/wiki/Threading_Building_Blocks
        
         | Zitrax wrote:
         | You can see in the repository that it was benchmarked against
         | oneTBB.
        
         | jcelerier wrote:
         | TBB was already far from the state-of-the-art 7/8 years ago,
         | and there are continuously new approaches that outperform it
         | such as https://github.com/taskflow/taskflow ;
         | https://github.com/google/marl ; and the most recent contender
         | https://github.com/dpuyda/scheduling
        
       | throwaway_94404 wrote:
       | I just can't get my brain around coroutines.
       | 
       | Can anyone recommend a good tutorial or resource for me to read?
       | 
       | I find it so frustrating as I don't think it's necessarily a
       | complex subject but my brain just doesn't get it.
       | 
       | Related perhaps but many (many, many) years ago, when learning
       | BASIC, I assumed GOSUB went off and started executing the code in
       | the subroutine as well as the rest of the inline code. That
       | suggests to me that I should perhaps have a deeper understanding
       | of this but I really don't...
        
         | dataflow wrote:
         | Do you mean C++ coroutines, or coroutines in general? If you're
         | new to the concept I would try to start with Python's, then
         | Javascript or C#. C++'s is way more complicated.
        
           | pjmlp wrote:
           | Note that C# and C++ are quite similar; the biggest
           | differences are the lifetime gotchas and not having a
           | coroutine runtime in the standard library.
           | 
           | Their design has a common source, and the magic methods for
           | awaitables as well.
        
         | SolarNet wrote:
         | Co-routines can be a nebulous sort of concept because the term
         | means different things in different places and not all of them
         | have the same features. But some of the big points are:
         | 
         | - Heap allocated call frame. Instead of being pushed onto the
         | stack, co-routines tend to have their call frame (local
         | variables, arguments, etc.) placed into heap memory (or at
         | least _may_ be place-able into heap memory). This often enables
         | the other features.
         | 
         | - Control can leave co-routines in more ways than standard
         | function calls. Generally this means returning (often called
         | "yield") to the caller without completing the whole function.
         | It can then be later resumed, returning to where the function
         | originally left off. Generators are a common pattern enabled by
         | co-routines that rely on only this part (and so many systems
         | can optimize out the heap usage, for example).
         | 
         | - A co-routine is usually an object with an interface that
         | allows you to move it around and resume it in different places
         | than it was originally called. This can include on different
         | threads, or depending on the sophistication of the system,
         | different processes or machines.
         | 
         | Those are the three big points in my mind. I'd recommend trying
         | Lua coroutines, personally (I like minimalist engines like
         | Defold to use it in) to really get a feel for how these are on
         | the edge between "language feature" and "library feature".
        
         | singledigits wrote:
         | I feel you! Coroutines can be tricky at first. I recommend
         | Lewis Baker's blog about coroutines [1], which is detailed and
         | insightful. Additionally, cppreference [2] is a great resource
         | to understand how coroutines work in C++.
         | 
         | In a nutshell, C++ coroutines are almost like regular
         | functions, except that they can be "paused" (suspended), and
         | their state is stored on the heap so they can be resumed later.
         | When you resume a coroutine, its state is loaded back, and
         | execution continues from where it left off.
         | 
         | The complicated part comes from the interfaces through which
         | you use coroutines in C++. Each coroutine needs to be
         | associated with a promise object, which defines how the
         | coroutine behaves (for example, what happens when you co_return
         | a value). Then, there are awaiters, which define what happens
         | when you co_await them. For example, C++ provides a built-in
         | awaiter called suspend_always{}, which you can co_await to
         | pause the coroutine.
         | 
         | If you take your time and go thoroughly through the blog and
         | Cppreference, you'll definitely get the hang of it.
         | 
         | Hope this helps.
         | 
         | [1] https://lewissbaker.github.io/
         | [2] https://en.cppreference.com/w/cpp/language/coroutines
        
           | loeg wrote:
           | They're just green threads with some nice syntax sugar,
           | right? Instead of an OS-level "pause" with a futex-wait or
           | sleep (resumed by the kernel scheduler), they do an
           | application-level pause and require some application-level
           | scheduler. (But coroutines can still call library or kernel
           | functions that block/sleep, breaking userspace scheduling?)
        
             | singledigits wrote:
             | Yes, exactly. Coroutines are one possible implementation of
             | green threads. Once they are scheduled/loaded on an OS
             | thread, they behave just like regular functions with their
             | own call stack. This means they can indeed call blocking
             | operations at the OS level. A possible approach to handle
             | such operations would be to wrap the blocking call, suspend
             | the coroutine, and then resume it once the operation is
             | complete, perhaps by polling (checking for completion).
        
         | jpc0 wrote:
         | Dumbed down way too far.
         | 
         | A coroutine is a function that can remember where it is in its
         | own execution, so when it is called later it continues
         | execution where it left off.
         | 
         | There are many many ways of implementing that functionality,
         | C++ standard coroutines are only one such implementation.
         | 
         | What you do with them is whatever you want, it's pretty common
         | to handle IO using them but generators are also a pretty common
         | example. But that is generally high level.
         | 
         | C++ coroutines are basic building blocks and are very low
         | level, there is no executor ( rust tokio / python asyncio ) so
         | don't be worried if it seems hard to use, it is hard to use.
         | 
         | Look at std::generator for how coroutines are used to implement
         | a generator, cppcoro is also a pretty popular library that
         | builds abstractions on top of coroutines and also has some
         | executors if I remember correctly.
        
         | baq wrote:
         | imagine a virtual (green) thread which the kernel doesn't run
         | in parallel until you tell it it's ok to do so (when you
         | explicitly yield control) and then can continue from that place
         | when you explicitly tell it to.
         | 
         | you can even try to run those virtual threads on real threads.
         | much fun to be had.
        
         | Koshkin wrote:
         | One way to get a sense of coroutines is to consider the
         | behavior presented by the async/await design pattern [1], where
         | 'await' suspends the execution of the currently running code
         | and yields control to the 'async' task. (As an adage goes,
         | "async is not asynchronous, and await does not await
         | anything.") Yet another pattern is "promise/future", where the
         | code execution is (or may be) suspended as soon as the code
         | tries to obtain the promised result.
         | 
         | [1] https://learn.microsoft.com/en-
         | us/dotnet/csharp/asynchronous...
        
         | dxuh wrote:
         | Coroutines themselves are a really simple concept. But in
         | practice they give you all the headaches async stuff generally
         | gives you. And in C++ there is a _ton_ of extra complication,
         | especially because there is no support library. I wrote this in
         | a tutorial a while ago:
         | 
         | > they are functions that can suspend themselves, meaning they
         | stop themselves without returning, even in the middle of their
         | body, and can later be resumed, continuing execution at the
         | point they suspended from earlier.
         | 
         | If you want to use coroutines in C++ specifically you can have
         | a look at this tutorial, if you want:
         | https://theshoemaker.de/posts/yet-another-cpp-coroutine-tuto...
         | I don't know of anyone that read it, but I spent a lot of time
         | on it.
         | 
         | It essentially tries to explain how to build a coroutine
         | support library yourself, but if you don't care about that,
         | skip it and just use libcoro or cppcoro. They have examples
         | too. My little async io library has some examples as well if
         | you want to get an idea.
        
         | marhee wrote:
         | Coroutines are just multiple call stacks. If coroutine A calls
         | coroutine B, then B executes on its own stack and can 'yield' a
         | value back to A. Yielding is just a return across stacks
         | without destroying the current stack. So A continues with the
         | yielded value on its own stack and, when ready, calls B again,
         | which continues on its own stack with the next statement after
         | the previous yield. Etc.
         | 
         | Notice that this does not necessarily involve parallelism,
         | although it can. For example, Lua has non-parallel
         | (cooperative) coroutines. Go has parallel coroutines, called
         | goroutines, but theoretically only if they use channels to
         | exchange values. Otherwise, if they're not exchanging
         | information, they would not be coroutines in the sense that
         | they work together in solving something.
        
         | binary132 wrote:
         | I didn't understand C++ coroutines until I learned to use Lua
         | coroutines. It's basically not that different from gotos, if
         | goto saved local state.
        
         | IshKebab wrote:
         | Yeah there aren't many good resources on it unfortunately. One
         | thing to note is C++20's coroutine support is _really_ low
         | level. It's designed for library authors so that they can
         | build the kinds of things "normal" people want - tasks,
         | generators, futures, promises, etc.
         | 
         | This video is the best intro I've found. It actually explains
         | what is happening in memory, which is the only way to really
         | understand anything in C++.
         | 
         | https://youtu.be/aibjUHx7vew
         | 
         | Also this is decent:
         | 
         | https://www.scs.stanford.edu/~dm/blog/c++-coroutines.html
         | 
         | But don't try and write a coroutine library yourself. Use
         | something like libcoro.
        
       | germandiago wrote:
       | How is this library different from Boost.Cobalt and cppcoro?
        
         | singledigits wrote:
         | Thank you for your question.
         | 
         | I've included a link to Lewis Baker's blog (the author of
         | CppCoro) in my repository as an excellent explanation of
         | coroutines. From my understanding, after reviewing his library,
         | it is no longer in active development and hasn't been updated
         | for a couple of years. CppCoro was an experimental library
         | intended to explore coroutines while they were still an
         | experimental feature. For example, CppCoro uses a custom type
         | for storing values, similar to std::optional from the standard
         | library (if I'm not mistaken).
         | 
         | For my implementation, I've opted to leverage std::expected
         | from C++23 for storing values. I've also implemented monadic-
         | like chaining. CppCoro, however, seems to focus more on
         | asynchronous operations, whereas my library focuses more on
         | task-based parallelism.
         | 
         | I don't have experience with Boost.Cobalt, so I can't provide
         | insights there, but I will definitely look into it now that
         | you've mentioned it.
         | 
         | Hope this helps.
        
         | feverzsj wrote:
         | I think Op's lib is for fork-join style parallel algorithms.
         | It's like TBB but is based on continuation stealing.
         | Boost.Cobalt and cppcoro are general coroutine libs. They are
         | mostly used for async IO programming.
        
       | neonsunset wrote:
       | This looks _exactly_ like .NET's task abstraction.
       | 
       | If it works anywhere near as good, I'm definitely giving this a
       | try next time I need to work on a C++ project. Thanks!
        
         | pjmlp wrote:
         | As a historical note for those that don't follow C++, C++20
         | coroutines grew out of the work done on asynchronous
         | programming on WinRT for C# and C++, inspired by Midori and
         | .NET async/await.
         | 
         | Most of the magic methods expected by C++ compilers in
         | awaitable types, are also present in the structured typing used
         | by C# for awaitables.
         | 
         | The preview implementations for VC++ and clang were done by a
         | Microsoft employee, Gor Nishanov; his talks are always quite
         | interesting.
        
         | singledigits wrote:
         | Hopefully, you find it useful! If you have any ideas or
         | suggestions for improvement, feel free to open an issue or let
         | me know. Thanks for considering it!
        
       | viralsink wrote:
       | Is there a way to prevent callback hell in C++ when doing
       | asynchronous communication with C++ before 20? Coroutines seem to
       | be the only clean solution. Promises can work, but they tend to
       | be difficult to reason about if branching is involved.
        
         | darknavi wrote:
         | Traditionally the way to prevent "callback hell" is to use
         | something like async/await syntax. Without that there aren't a
         | ton of good options. Like you mentioned, you could switch to
         | promises with polling.
        
         | gpderetta wrote:
         | There are library-only stackful coroutines options, like in
         | boost.
        
       | gsliepen wrote:
       | It would be nice if there was a function to wait for tasks and to
       | return the results at the same time, so that you could write
       | something like:
       | 
       |     auto [a, b] = co_await coros::wait_tasks(fib(n - 1), fib(n - 2));
       |     co_return a + b;
        
         | singledigits wrote:
         | Thank you for your feedback.
         | 
         | I understand that working with tasks and retrieving values can
         | feel a bit clunky. The main reason I've structured it this way
         | is that individual tasks are RAII objects, and their coroutine
         | state is destroyed once they go out of scope. However, I could
         | modify the awaitable returned from wait_tasks to store tasks,
         | and then return values directly to the user. This could
         | definitely be a more ergonomic overload for the function. I'll
         | look into it!
        
         | fooker wrote:
         | If you need this interface, use threads.
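         |
         | With plain threads, that interface could look roughly like
         | the sketch below. run_all is a hypothetical helper written
         | for this example (it is not part of Coros or the standard
         | library), and std::async with std::launch::async starts one
         | thread per task, so the recursion needs a sequential cutoff.

```cpp
#include <future>
#include <tuple>
#include <utility>

// Hypothetical helper: run every callable as its own std::async task and
// return all results together as a tuple.
template <typename... Fs>
auto run_all(Fs&&... fs) {
    // Launch everything first so the tasks can overlap...
    auto futures = std::make_tuple(std::async(std::launch::async, std::forward<Fs>(fs))...);
    // ...then wait for each one and collect the results.
    return std::apply([](auto&... f) { return std::make_tuple(f.get()...); }, futures);
}

long fib(int n) {
    if (n < 2) return n;
    if (n < 20) return fib(n - 1) + fib(n - 2);  // small inputs: not worth a thread
    auto [a, b] = run_all([n] { return fib(n - 1); }, [n] { return fib(n - 2); });
    return a + b;
}
```

         | The structured binding reads the same as in the proposed
         | wait_tasks overload; the difference is that each branch here
         | costs a whole OS thread instead of a cheap coroutine frame.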
        
       | leeter wrote:
       | Looks like a good start. I'm not actually sure I'd use it on
       | windows however. CPPWinrt has a really decent coroutine support
       | library with tools like winrt::resume_background() [1], I use it
       | extensively even in desktop apps because it makes using the
       | windows threadpool (which is active by default for all windows
       | processes since at least windows 7) trivial. I've basically moved
       | most of my threading code onto that unless I need a dedicated
       | thread to hold a context for some reason. But, that's a windows
       | specific thing as far as I know.
       | 
       | [1] https://learn.microsoft.com/en-us/uwp/cpp-ref-for-
       | winrt/resu...
        
         | singledigits wrote:
         | Thank you for your response!
         | 
         | I don't have experience with WinRT, but it does seem quite
         | similar at first glance. One of the key reasons I focused on
         | modern C++ was to ensure cross-platform compatibility. However,
         | I completely understand that if you're working on Windows and
         | are already familiar with WinRT, sticking with it makes perfect
         | sense. I'll take a closer look at WinRT to see if there are any
         | significant differences.
        
           | leeter wrote:
           | My suggestion is aim for compatibility with cppwinrt, but not
           | anything else. That way devs can freely intermix and get the
           | best of the utilities of both.
        
       | cxx wrote:
       | This looks very promising, it's refreshing to see a library with
       | a sane interface.
       | 
       | One thing I'd like to see is the possibility to run the
       | coroutines in the main thread, without spawning any new threads
       | in the thread pool. It might seem strange but sometimes you just
       | need to do I/O stuff concurrently in a place where you're not
       | allowed to spawn other threads.
       | 
       | Other than that congrats on the release, I hope you keep working
       | on it!
        
       | OnlyMortal wrote:
       | Have you had a look at SeaStar and how it works with coroutines?
        
       ___________________________________________________________________
       (page generated 2024-09-25 23:01 UTC)