[HN Gopher] Coroutines in C (2000)
       ___________________________________________________________________
        
       Coroutines in C (2000)
        
       Author : ColinWright
       Score  : 164 points
       Date   : 2024-02-25 16:45 UTC (6 hours ago)
        
 (HTM) web link (www.chiark.greenend.org.uk)
 (TXT) w3m dump (www.chiark.greenend.org.uk)
        
       | pieterr wrote:
       | From the same author: Simon Tatham's Portable Puzzle Collection
       | 
       | https://www.chiark.greenend.org.uk/~sgtatham/puzzles/
        
       | anymouse123456 wrote:
       | I assume I'm missing that this a joke, it's honestly hard for me
       | to tell.
       | 
       | But in the conclusion, the author talks about actually making
       | this work by providing a context object to hold all of the
       | intermediate state and providing this context object to the
       | callee.
       | 
       | Once this is required, how does this approach compare to simply
       | using an external iterator?
       | 
       | Seems to me like an iterator solves the lion's share of the
       | problem here. It moves the state into the caller's stack (or
       | above them), it's easy to understand, simple to implement and
       | doesn't involve unenclosed and context-dependent macros.
        
         | syncurrent wrote:
         | Proto-Activities have this context to store the state in the
         | caller.
         | 
         | https://github.com/frameworklabs/proto_activities
        
         | ajross wrote:
         | > Seems to me like an iterator solves the lion's share of the
         | problem here.
         | 
         | Iterator APIs are indeed aimed at the same kind of problem, but
         | they're not the same solution. And often they're harder to
         | write. If you have a component with a big list of stuff, it's
         | generally easier to write and understand the idea of "iterate
         | over my big list of stuff and emit one item at a time to my
         | consumer" than it is "what state do I need to remember such
         | that when I get called again I can emit the correct next item
         | given the one I just emitted?".
         | 
         | Coroutines are a way of expressing the former. Iterators are
         | the latter. If all you do is write the outer loop, iterators
         | are absolutely just as good. If you need to write the iterator
         | itself, it's more of a discussion.
        
         | btilly wrote:
         | Why would you assume that this is a joke?
         | 
         | C (particularly back when this was written) was a low level
         | language. You could not simply use an external iterator - they
         | didn't exist. And if you try to roll your own, you'll wind up
         | dealing with a lot of complications around resource management
         | in a language which lacks basic memory management.
         | 
         | But the proof is in the pudding. Back then it was common to
         | want to telnet into a Unix machine from Windows. And the only
         | two solutions that worked well enough to consider were
         | installing Cygwin, or installing PuTTY. Cygwin was better if
         | you needed a Unix environment on your Windows machine.
         | Otherwise PuTTY was your answer. As the article comments, PuTTY
         | was written with this technique.
         | 
         | When you've solved a problem that a lot of people had, and your
         | solution is widely acknowledged as the best one out there,
         | people get interested in how you think it should be solved.
         | Which is why this article interested me when I first saw in
         | many years ago on Slashdot.
         | 
         | So absolutely not a joke.
        
       | rwmj wrote:
       | Coroutines are fun, but in real code please consider using actual
       | threads. Modern processors have many cores, but coroutines will
       | (often) only use a single core.
       | 
       | Edit to add: This is a real world problem too. Until recently
       | qemu, which extensively uses coroutines, would put a lot of its
       | block device I/O through a single thread. This caused some
       | performance issues. Kevin Wolf and others have spent years of
       | effort fixing this so modern qemu will use many threads for I/O
       | (this work will appear in RHEL 9.4).
        
         | Quekid5 wrote:
         | Not just that, but the scaling problems with threads are
         | usually massively overstated. It's true that thread switching
         | has quite a bit more overhead, but it's been optimized a _lot_
         | since the bad old days of 15+ years ago. (Plus, unless you 're
         | using a massive number of threads it's very unlikely that
         | thread switching is going to be your bottleneck.)
        
           | jeffreygoesto wrote:
           | Unless you're on QNX 7 of course...
        
         | ajross wrote:
         | > coroutines will (often) only use a single core
         | 
         | That's generally the desired behavior. If you have decoupled,
         | parallel workloads they're going to naturally be working on
         | disjoint data. The idea behind coroutines is that you have some
         | kind of _local_ workload with synchronous data that, for
         | whatever reason, is easiest to express  "inside out" with a
         | function that gets to loop over something and "push" the
         | results to its abstracted consumer whose code lives somewhere
         | else, vs. the natural functional paradigm where the inner loop
         | is a conceptual "pull" controlled by the caller.
        
           | repelsteeltje wrote:
           | Thank you for eloquently expressing an observation I probably
           | should have learned years ago.
        
         | pengaru wrote:
         | There's often a sweet spot to be had in mixing threads and
         | coroutines, where you have a coroutine scheduler instance per
         | thread, and a thread created per core.
         | 
         | Then rarely, if ever, migrate coroutines across schedulers, and
         | rarely, if ever, share data between coroutines on different
         | schedulers.
         | 
         | Coroutines can enable an ergonomic concurrent programming style
         | while avoiding the need for any locking at all via cooperative
         | scheduling. You generally end up with higher scheduling
         | latencies, but potentially quite high throughput by removing
         | any need for atomics/locking overheads, and no timer constantly
         | interrupting execution for preemptive scheduling.
        
           | rwmj wrote:
           | Right, that's what qemu has ended up with.
        
         | c-smile wrote:
         | > please consider using actual threads.
         | 
         | Bad advice in general.
         | 
         | Why would you run separate thread if you only want is to
         | iterate over nodes in a tree (as an example of non flat
         | collection).
        
           | cylon13 wrote:
           | It's never bad advice to consider something.
        
             | klyrs wrote:
             | To the contrary, consideration takes time, and rules of
             | thumb are valuable to mitigate overthinking.
        
             | a1369209993 wrote:
             | No, it's _frequently_ bad advice to consider something. See
             | eg https://www.xkcd.com/1445/.
        
           | rwmj wrote:
           | Real world and toy examples are very different. The example
           | isn't like what people are using coroutines for in the real
           | world. I'd urge you to look at how coroutines are used for
           | inversion of control (quite correctly) in qemu.
        
         | DinaCoder99 wrote:
         | That seems like an orthogonal concern to structuring control
         | flow, though it is much more difficult if you intend to use
         | coroutines across multiple threads. There's nothing stopping
         | you from using both threading and coroutines.
        
         | hgs3 wrote:
         | > Coroutines are fun, but in real code please consider using
         | actual threads.
         | 
         | Coroutines are lightweight and trivial to synchronize. They are
         | perfect for small bits of incremental computation, like
         | iterators and tokenizers. Maybe you're thinking of green
         | threads?
        
         | ot1138 wrote:
         | This is out of the question for real time apps. Co-routines are
         | an elegant solution to implement cooperative multitasking in
         | such cases.
        
         | narag wrote:
         | _Coroutines are fun, but in real code please consider using
         | actual threads. Modern processors have many cores, but
         | coroutines will (often) only use a single core._
         | 
         | Threads and coroutines have different purposes. Coroutines are
         | more about logical structure.
        
         | samatman wrote:
         | The only connection between threads and coroutines is that some
         | single-threaded language runtimes only have coroutines, so you
         | might occasionally use them where threads would be a better
         | choice.
         | 
         | Coroutines are a way of structuring single-threaded execution,
         | and a useful one. The example in the Fine Article of a
         | producer-consumer pattern is a good one, attaching a stream to
         | a parser isn't a parallel algorithm so threads are useless for
         | writing it.
         | 
         | Naturally, using a single-threaded paradigm for work which
         | could be performed in parallel is inefficient, but coroutines
         | aren't a poor man's parallelism, they're a control structure
         | which functions on its own terms. They can be combined
         | productively with threads, such as using an event loop in a web
         | server to thread (as in needle) coroutines through various
         | blocking events with a dispatcher, and the runtime can spin up
         | a thread per core to parallelize this, which reduces per-thread
         | coordination to checking the depth of each thread's work queue
         | and farming the request to the least congested one.
        
           | Thorrez wrote:
           | > attaching a stream to a parser isn't a parallel algorithm
           | so threads are useless for writing it.
           | 
           | Couldn't it be done in 2 threads? The output of the
           | decompressor thread feeds to the input of the parser thread.
        
         | tiberius_p wrote:
         | Coroutines are good for modelling concurrency which is
         | different from parallelism. Concurrency is useful for
         | abstraction and expressiveness. Parallelism is useful for
         | making your code run faster by running parts of it in parallel
         | on multiple cores. You could make concurrent programs run
         | faster on multiple cores by distributing the coroutines which
         | don't share state on multiple working threads in a thread pool,
         | thus mixing concurrency and parallelism...but they are still
         | two different things with different purposes.
        
       | omoikane wrote:
       | > no commonly used high level language supports the coroutine
       | 
       | This might have been the case back in 2000, but these days many
       | languages do support it, including C++20, Lua, Python, Ruby, etc.
        
         | perbu wrote:
         | fwiw, Simula67 had coroutines. Not the first to do so, but IIRC
         | it was the first major language to do so.
        
       | paulddraper wrote:
       | (2000)
        
       | DriftRegion wrote:
       | I've found myself at this webpage multiple times while trying to
       | minimize the complexity of APIs in my C projects. I think it does
       | a lovely job explaining control flow and it has helped me to
       | think more explicitly about storage of state on and off the stack
       | as well as the readability consequences of different approaches.
       | 
       | My conclusion for now is that the choice to use C coroutines is
       | best left to the library user. For example: Mongoose
       | (https://github.com/cesanta/mongoose) uses event callbacks to
       | deal with asynchronousness. It is much more pleasant to wrap a
       | library like this in whatever thread/task primitives your system
       | has rather than try to port the mythical cross-platform C
       | couroutine or worse, std::thread.
        
         | chongli wrote:
         | It's Simon Tatham's website. He's well known for being the
         | author of PuTTY [1] and his puzzle collection [2]!
         | 
         | [1] https://www.chiark.greenend.org.uk/~sgtatham/putty/
         | 
         | [2] https://www.chiark.greenend.org.uk/~sgtatham/puzzles/
        
           | hnfong wrote:
           | I've known about the two projects for literally 20+ years,
           | but wow I never knew it was the same person behind them....
        
           | abhgh wrote:
           | Oh wow... I have had the Android port of his puzzles (your
           | second reference links to it) on my phone for a while. Had no
           | idea the developer of Putty had anything to do with it!
        
       | DenisM wrote:
       | Setjmp/longjump are the built-in coroutines in C, no?
        
         | dkjaudyeqooe wrote:
         | These are stackless coroutines, if you use longjump you have to
         | create a stack for the coroutine.
         | 
         | There are pros and cons for each style.
        
         | ajross wrote:
         | You can absolutely build coroutines out of a generalized
         | context switch. So yes, in some sense. But note that the linked
         | article doesn't use setjmp/longjmp, which is what makes it so
         | clever.
         | 
         | FWIW: would I personally actually use this trick? Almost
         | certainly not. C APIs aren't well suited to that level of
         | abstraction IMHO, if you have an app that needs it leave the C
         | stuff to the stuff C is good at and wrap a C++ or Rust or
         | whatever layer on top for the subtleties.
        
         | fweimer wrote:
         | Some longjmp implementations unwind the stack, so they can't be
         | used for coroutine switching. Even if it works (it's
         | technically undefined), you need to get a suitable stack from
         | somewhere.
         | 
         | The next issue is that usually, applications want to resume
         | coroutines on a thread different from the one it on which it
         | was suspended. That runs into trouble because on some systems,
         | compilers cache the address of thread-local variables in the
         | local stack frame, assuming that the thread does not switch in
         | a function mid-execution.
        
           | DenisM wrote:
           | The only platform I've seen stack unwind was VAX/VMS :)
           | 
           | But yes, you do need to allocate the stack which could take
           | up a lot of ram.
           | 
           | It's odd not to mention it in the article though.
        
             | fweimer wrote:
             | Current glibc unwinds the shadow stack if it is active: htt
             | ps://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/x86..
             | .
             | 
             | It makes longjmp useless for coroutine switching, although
             | it does not result in other effects of stack unwinding
             | (such as invoking C++ destructors).
             | 
             | On Windows, longjmp really unwinds the stack (and maybe
             | this is something influenced by VMS):
             | https://learn.microsoft.com/en-us/cpp/c-runtime-
             | library/refe... "In Microsoft C++ code on Windows, longjmp
             | uses the same stack-unwinding semantics as exception-
             | handling code. It's safe to use in the same places that C++
             | exceptions can be raised."
        
               | DenisM wrote:
               | Well, things have changed since I looked last. Thanks for
               | explaining.
               | 
               | FWIW, back in the nineties we just wrote our own
               | setjmp/longjmp for VMS to avoid stack unwind - save
               | registers / restore registers. We used it to implement
               | coroutines in Modula 2, iirc.
        
         | josephcsible wrote:
         | No. The C standard says this about longjmp: "if the function
         | containing the invocation of the setjmp macro has terminated
         | execution in the interim [...] the behavior is undefined". So
         | while you can longjmp out of functions, you can't longjmp back
         | into them.
        
       | gkbrk wrote:
       | I've used this for some embedded/IoT projects before. They work
       | really well.
        
       | c-smile wrote:
       | C++ version of the approach:
       | https://www.codeproject.com/Tips/29524/Generators-in-C
       | 
       | I am using this in my Sciter, just in case. Works quite well and
       | convenient.
        
       | vinay_ys wrote:
       | Ah, this page again! it's been more than two decades? since I saw
       | this page last? It was fun to learn about coroutines from the
       | author of PuttY the ssh client of choice on Windows those days.
        
       | utopcell wrote:
       | Coroutines. What a lovely concept! It's a joy to watch all the
       | CppCon videos about C++ coroutines, primarily by Microsoft folks.
       | "Negative-cost abstraction" is such a nice hook phrase.
       | 
       | Friends at Meta mentioned to me a couple years ago that they
       | started using c++ coroutines, which ended up being a big mistake
       | because they had to face compiler implementation bugs, which must
       | have been nasty to track down. At Google, we are eagerly waiting
       | for the brilliant folks that are working on properly integrating
       | them in google3/ to tell us when the time has come to use them.
       | 
       | This article uses Duff's device [1] to motivate structured gotos
       | via macros as an implementation strategy for C coroutines. Duff
       | wanted to loop-unroll this:                   do {
       | *to = *from++;         } while (--count > 0);
       | 
       | which he did in this way (shortened for brevity) :
       | int n = (count + 3) / 4;         switch (count % 4) {
       | case 0: do { *to = *from++;         case 3:      *to = *from++;
       | case 2:      *to = *from++;         case 1:      *to = *from++;
       | } while (--n > 0);         }
       | 
       | That is to say, he realized that he could use `case` statements
       | (almost) anywhere in a `switch` block. The connection with
       | coroutines is simple: One can wrap the whole function body with a
       | switch statement, use a static variable for holding the location
       | of the latest coroutine return, and label all co-returns with a
       | `case` statement:                 #define coBegin static int
       | state = 0; switch (state) { case 0:       #define coReturn(x) do
       | { state = __LINE_; return x; case __LINE:; } while (0)
       | #define coFinish }            int function(void) {
       | static int i;  // function state can't be local anymore.
       | coBegin;           for (i = 0; i < 10; ++i)
       | coReturn(i);           coFinish;       }
       | 
       | Sustrik's take on C coroutines might also be an interesting read
       | [2].
       | 
       | [1] https://en.wikipedia.org/wiki/Duff%27s_device
       | 
       | [2] https://250bpm.com/blog:48/index.html
        
         | dividuum wrote:
         | The alternative is to use the ,,labels as values" feature of
         | GCC. You can take the address of a label and later jump to it.
         | I contributed the code that's now in lc-addrlabels.h back in
         | 2005 :-)
         | 
         | I also used the GCC local labels feature to completely avoid
         | using __LINE__ anywhere, so you could have multiple coReturns
         | in a single code line:
         | 
         | #define LC_SET(s) do { ({ __label__ resume; resume: (s) =
         | &&resume; }); }while(0)
        
           | bxparks wrote:
           | Definitely, "labels as values" (aka "computed gotos",
           | https://gcc.gnu.org/onlinedocs/gcc/Labels-as-Values.html) is
           | so much better than Duff's device.
           | 
           | Unfortunately, computed-gotos is not a C language standard. I
           | don't understand why. I think FORTRAN had it in the 60s. It
           | is so useful in some situations, like a coroutine, or a byte-
           | code interpreter. Is it because some obscure DSP chip with a
           | sizeof(char)==32 using 1's complement arithmetic can't
           | support it? Then maybe make it implementation-defined and
           | allow the rest of the world get nice things.
        
         | benlivengood wrote:
         | Oh come on, just rewrite it all in Go! It should only be a few
         | billion line CR. Your SREs will thank you (eventually).
        
         | Scubabear68 wrote:
         | Ah the C pre-processor, the gift that keeps on giving after all
         | these years :-(
        
         | codemac wrote:
         | As someone who moved from google3 -> fbcode in the last few
         | years, I think there are weird upsides AND downsides to having
         | async code littered through your C++ (aka co_yield, co_return,
         | co_await, etc).
         | 
         | The advantage, compared to the internal stuff google3 was
         | using, was that as you read code, the async nature of various
         | parts was obvious. Some programmers at G would spend entire
         | quarters+ not knowing what the threading model was, and cause
         | serious bugs in retrospect.
         | 
         | The disadvantage is actually much dumber - a lot of code
         | "could" be async, and over time becomes entirely async because
         | that's the mode the programmer is in when writing the program.
         | 
         | The choice to use a spinlock vs. a mutex w/yields should be one
         | based on the size of the critical section and the threading
         | going on at the time. Unfortunately to make code more
         | readable/uniform/etc you end up with entire projects doing one
         | or the other.
         | 
         | I'd love to learn more about language implementations of
         | threading that do not default either way, but instead could
         | take a profile of the previous run, and make the next run more
         | optimal, without having to change the code or causing bugs.
        
       | dang wrote:
       | Related:
       | 
       |  _Coroutines in C (2000)_ -
       | https://news.ycombinator.com/item?id=37357673 - Sept 2023 (1
       | comment)
       | 
       |  _Coroutines in C (2000)_ -
       | https://news.ycombinator.com/item?id=36639879 - July 2023 (2
       | comments)
       | 
       |  _Coroutines in C_ -
       | https://news.ycombinator.com/item?id=23293835 - May 2020 (1
       | comment)
       | 
       |  _Coroutines in C (2000)_ -
       | https://news.ycombinator.com/item?id=19106796 - Feb 2019 (59
       | comments)
       | 
       |  _Coroutines in C, revisited_ -
       | https://news.ycombinator.com/item?id=13199245 - Dec 2016 (36
       | comments)
       | 
       |  _Coroutines in C_ -
       | https://news.ycombinator.com/item?id=13138673 - Dec 2016 (1
       | comment)
       | 
       |  _Coroutines in C (2000)_ -
       | https://news.ycombinator.com/item?id=11051004 - Feb 2016 (11
       | comments)
       | 
       |  _Show HN: Libconcurrent - Coroutines in C_ -
       | https://news.ycombinator.com/item?id=10887071 - Jan 2016 (24
       | comments)
       | 
       |  _Coroutines in C with Arbitrary Arguments_ -
       | https://news.ycombinator.com/item?id=9402314 - April 2015 (22
       | comments)
       | 
       |  _Coroutines in C (2000)_ -
       | https://news.ycombinator.com/item?id=8615501 - Nov 2014 (27
       | comments)
       | 
       |  _Coroutines in C (2000)_ -
       | https://news.ycombinator.com/item?id=6244994 - Aug 2013 (1
       | comment)
       | 
       |  _Coroutines in one page of C_ -
       | https://news.ycombinator.com/item?id=6243946 - Aug 2013 (60
       | comments)
       | 
       |  _Coroutines in C (Simon Tatham, 2000)_ -
       | https://news.ycombinator.com/item?id=1380044 - May 2010 (16
       | comments)
       | 
       |  _Coroutines in C_ - https://news.ycombinator.com/item?id=835849
       | - Sept 2009 (16 comments)
       | 
       |  _Co-routines in C_ - https://news.ycombinator.com/item?id=794157
       | - Aug 2009 (1 comment)
        
         | anfilt wrote:
         | Bunki, a C Coroutine library
         | https://news.ycombinator.com/item?id=35133095
        
       | mtlmtlmtlmtl wrote:
       | If you think this is some C black magic, try reading this by the
       | same author on creating arbitrary control structures with macros:
       | https://www.chiark.greenend.org.uk/~sgtatham/mp/
        
         | o11c wrote:
         | Note that the underscore prefix thing often is still prone to
         | shadowing. You need pretty ugly mangled names to avoid that,
         | and for external-block macros (unlike expression-ish/statement-
         | ish macros) it can't be avoided with GNU's/C23's hygienic macro
         | hack.
        
       | mid-kid wrote:
       | The "switch" method isn't too uncommon, but usually people have
       | an init function and "state" pointer that's passed into the
       | coroutine function. I've used this method a lot in embedded
       | projects, where one coroutine was handling motor
       | acceleration/deceleration while the other would simply tell it
       | what direction to go, but I've also used it for networked
       | libraries[1]. Even the standard library has a coroutine function
       | like this in "strtok()"[2]
       | 
       | You don't really need to introduce macro hell for it to be
       | manageable, though I've never found reading switch/case flow to
       | be very enjoyable.
       | 
       | [1]:
       | https://github.com/REONTeam/libmobile/blob/master/relay.c#L3...
       | 
       | [2]: https://manpages.debian.org/bookworm/manpages-
       | dev/strtok.3.e...
        
       | rurban wrote:
       | Also related, the C++ lambda fuckup:
       | https://news.ycombinator.com/item?id=33084431
        
       | whiterknight wrote:
       | UNIX pipes solve this problem. Both reader and writer are driving
       | their respective process.
        
       | anfilt wrote:
       | I honestly like stackful coroutines if you don't mind allocating
       | memory for a stack.
       | 
       | https://github.com/Keith-Cancel/Bunki
        
       ___________________________________________________________________
       (page generated 2024-02-25 23:00 UTC)