[HN Gopher] Coroutines in C (2000)
___________________________________________________________________
Coroutines in C (2000)
Author : ColinWright
Score : 164 points
Date : 2024-02-25 16:45 UTC (6 hours ago)
(HTM) web link (www.chiark.greenend.org.uk)
(TXT) w3m dump (www.chiark.greenend.org.uk)
| pieterr wrote:
| From the same author: Simon Tatham's Portable Puzzle Collection
|
| https://www.chiark.greenend.org.uk/~sgtatham/puzzles/
| anymouse123456 wrote:
| I assume I'm missing that this a joke, it's honestly hard for me
| to tell.
|
| But in the conclusion, the author talks about actually making
| this work by providing a context object to hold all of the
| intermediate state and providing this context object to the
| callee.
|
| Once this is required, how does this approach compare to simply
| using an external iterator?
|
| Seems to me like an iterator solves the lion's share of the
| problem here. It moves the state into the caller's stack (or
| above them), it's easy to understand, simple to implement and
| doesn't involve unenclosed and context-dependent macros.
| syncurrent wrote:
| Proto-Activities have this context to store the state in the
| caller.
|
| https://github.com/frameworklabs/proto_activities
| ajross wrote:
| > Seems to me like an iterator solves the lion's share of the
| problem here.
|
| Iterator APIs are indeed aimed at the same kind of problem, but
| they're not the same solution. And often they're harder to
| write. If you have a component with a big list of stuff, it's
| generally easier to write and understand the idea of "iterate
| over my big list of stuff and emit one item at a time to my
| consumer" than it is "what state do I need to remember such
| that when I get called again I can emit the correct next item
| given the one I just emitted?".
|
| Coroutines are a way of expressing the former. Iterators are
| the latter. If all you do is write the outer loop, iterators
| are absolutely just as good. If you need to write the iterator
| itself, it's more of a discussion.
| btilly wrote:
| Why would you assume that this is a joke?
|
| C (particularly back when this was written) was a low level
| language. You could not simply use an external iterator - they
| didn't exist. And if you try to roll your own, you'll wind up
| dealing with a lot of complications around resource management
| in a language which lacks basic memory management.
|
| But the proof is in the pudding. Back then it was common to
| want to telnet into a Unix machine from Windows. And the only
| two solutions that worked well enough to consider were
| installing Cygwin, or installing PuTTY. Cygwin was better if
| you needed a Unix environment on your Windows machine.
| Otherwise PuTTY was your answer. As the article comments, PuTTY
| was written with this technique.
|
| When you've solved a problem that a lot of people had, and your
| solution is widely acknowledged as the best one out there,
| people get interested in how you think it should be solved.
| Which is why this article interested me when I first saw in
| many years ago on Slashdot.
|
| So absolutely not a joke.
| rwmj wrote:
| Coroutines are fun, but in real code please consider using actual
| threads. Modern processors have many cores, but coroutines will
| (often) only use a single core.
|
| Edit to add: This is a real world problem too. Until recently
| qemu, which extensively uses coroutines, would put a lot of its
| block device I/O through a single thread. This caused some
| performance issues. Kevin Wolf and others have spent years of
| effort fixing this so modern qemu will use many threads for I/O
| (this work will appear in RHEL 9.4).
| Quekid5 wrote:
| Not just that, but the scaling problems with threads are
| usually massively overstated. It's true that thread switching
| has quite a bit more overhead, but it's been optimized a _lot_
| since the bad old days of 15+ years ago. (Plus, unless you 're
| using a massive number of threads it's very unlikely that
| thread switching is going to be your bottleneck.)
| jeffreygoesto wrote:
| Unless you're on QNX 7 of course...
| ajross wrote:
| > coroutines will (often) only use a single core
|
| That's generally the desired behavior. If you have decoupled,
| parallel workloads they're going to naturally be working on
| disjoint data. The idea behind coroutines is that you have some
| kind of _local_ workload with synchronous data that, for
| whatever reason, is easiest to express "inside out" with a
| function that gets to loop over something and "push" the
| results to its abstracted consumer whose code lives somewhere
| else, vs. the natural functional paradigm where the inner loop
| is a conceptual "pull" controlled by the caller.
| repelsteeltje wrote:
| Thank you for eloquently expressing an observation I probably
| should have learned years ago.
| pengaru wrote:
| There's often a sweet spot to be had in mixing threads and
| coroutines, where you have a coroutine scheduler instance per
| thread, and a thread created per core.
|
| Then rarely, if ever, migrate coroutines across schedulers, and
| rarely, if ever, share data between coroutines on different
| schedulers.
|
| Coroutines can enable an ergonomic concurrent programming style
| while avoiding the need for any locking at all via cooperative
| scheduling. You generally end up with higher scheduling
| latencies, but potentially quite high throughput by removing
| any need for atomics/locking overheads, and no timer constantly
| interrupting execution for preemptive scheduling.
| rwmj wrote:
| Right, that's what qemu has ended up with.
| c-smile wrote:
| > please consider using actual threads.
|
| Bad advice in general.
|
| Why would you run separate thread if you only want is to
| iterate over nodes in a tree (as an example of non flat
| collection).
| cylon13 wrote:
| It's never bad advice to consider something.
| klyrs wrote:
| To the contrary, consideration takes time, and rules of
| thumb are valuable to mitigate overthinking.
| a1369209993 wrote:
| No, it's _frequently_ bad advice to consider something. See
| eg https://www.xkcd.com/1445/.
| rwmj wrote:
| Real world and toy examples are very different. The example
| isn't like what people are using coroutines for in the real
| world. I'd urge you to look at how coroutines are used for
| inversion of control (quite correctly) in qemu.
| DinaCoder99 wrote:
| That seems like an orthogonal concern to structuring control
| flow, though it is much more difficult if you intend to use
| coroutines across multiple threads. There's nothing stopping
| you from using both threading and coroutines.
| hgs3 wrote:
| > Coroutines are fun, but in real code please consider using
| actual threads.
|
| Coroutines are lightweight and trivial to synchronize. They are
| perfect for small bits of incremental computation, like
| iterators and tokenizers. Maybe you're thinking of green
| threads?
| ot1138 wrote:
| This is out of the question for real time apps. Co-routines are
| an elegant solution to implement cooperative multitasking in
| such cases.
| narag wrote:
| _Coroutines are fun, but in real code please consider using
| actual threads. Modern processors have many cores, but
| coroutines will (often) only use a single core._
|
| Threads and coroutines have different purposes. Coroutines are
| more about logical structure.
| samatman wrote:
| The only connection between threads and coroutines is that some
| single-threaded language runtimes only have coroutines, so you
| might occasionally use them where threads would be a better
| choice.
|
| Coroutines are a way of structuring single-threaded execution,
| and a useful one. The example in the Fine Article of a
| producer-consumer pattern is a good one, attaching a stream to
| a parser isn't a parallel algorithm so threads are useless for
| writing it.
|
| Naturally, using a single-threaded paradigm for work which
| could be performed in parallel is inefficient, but coroutines
| aren't a poor man's parallelism, they're a control structure
| which functions on its own terms. They can be combined
| productively with threads, such as using an event loop in a web
| server to thread (as in needle) coroutines through various
| blocking events with a dispatcher, and the runtime can spin up
| a thread per core to parallelize this, which reduces per-thread
| coordination to checking the depth of each thread's work queue
| and farming the request to the least congested one.
| Thorrez wrote:
| > attaching a stream to a parser isn't a parallel algorithm
| so threads are useless for writing it.
|
| Couldn't it be done in 2 threads? The output of the
| decompressor thread feeds to the input of the parser thread.
| tiberius_p wrote:
| Coroutines are good for modelling concurrency which is
| different from parallelism. Concurrency is useful for
| abstraction and expressiveness. Parallelism is useful for
| making your code run faster by running parts of it in parallel
| on multiple cores. You could make concurrent programs run
| faster on multiple cores by distributing the coroutines which
| don't share state on multiple working threads in a thread pool,
| thus mixing concurrency and parallelism...but they are still
| two different things with different purposes.
| omoikane wrote:
| > no commonly used high level language supports the coroutine
|
| This might have been the case back in 2000, but these days many
| languages do support it, including C++20, Lua, Python, Ruby, etc.
| perbu wrote:
| fwiw, Simula67 had coroutines. Not the first to do so, but IIRC
| it was the first major language to do so.
| paulddraper wrote:
| (2000)
| DriftRegion wrote:
| I've found myself at this webpage multiple times while trying to
| minimize the complexity of APIs in my C projects. I think it does
| a lovely job explaining control flow and it has helped me to
| think more explicitly about storage of state on and off the stack
| as well as the readability consequences of different approaches.
|
| My conclusion for now is that the choice to use C coroutines is
| best left to the library user. For example: Mongoose
| (https://github.com/cesanta/mongoose) uses event callbacks to
| deal with asynchronousness. It is much more pleasant to wrap a
| library like this in whatever thread/task primitives your system
| has rather than try to port the mythical cross-platform C
| couroutine or worse, std::thread.
| chongli wrote:
| It's Simon Tatham's website. He's well known for being the
| author of PuTTY [1] and his puzzle collection [2]!
|
| [1] https://www.chiark.greenend.org.uk/~sgtatham/putty/
|
| [2] https://www.chiark.greenend.org.uk/~sgtatham/puzzles/
| hnfong wrote:
| I've known about the two projects for literally 20+ years,
| but wow I never knew it was the same person behind them....
| abhgh wrote:
| Oh wow... I have had the Android port of his puzzles (your
| second reference links to it) on my phone for a while. Had no
| idea the developer of Putty had anything to do with it!
| DenisM wrote:
| Setjmp/longjump are the built-in coroutines in C, no?
| dkjaudyeqooe wrote:
| These are stackless coroutines, if you use longjump you have to
| create a stack for the coroutine.
|
| There are pros and cons for each style.
| ajross wrote:
| You can absolutely build coroutines out of a generalized
| context switch. So yes, in some sense. But note that the linked
| article doesn't use setjmp/longjmp, which is what makes it so
| clever.
|
| FWIW: would I personally actually use this trick? Almost
| certainly not. C APIs aren't well suited to that level of
| abstraction IMHO, if you have an app that needs it leave the C
| stuff to the stuff C is good at and wrap a C++ or Rust or
| whatever layer on top for the subtleties.
| fweimer wrote:
| Some longjmp implementations unwind the stack, so they can't be
| used for coroutine switching. Even if it works (it's
| technically undefined), you need to get a suitable stack from
| somewhere.
|
| The next issue is that usually, applications want to resume
| coroutines on a thread different from the one it on which it
| was suspended. That runs into trouble because on some systems,
| compilers cache the address of thread-local variables in the
| local stack frame, assuming that the thread does not switch in
| a function mid-execution.
| DenisM wrote:
| The only platform I've seen stack unwind was VAX/VMS :)
|
| But yes, you do need to allocate the stack which could take
| up a lot of ram.
|
| It's odd not to mention it in the article though.
| fweimer wrote:
| Current glibc unwinds the shadow stack if it is active: htt
| ps://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/x86..
| .
|
| It makes longjmp useless for coroutine switching, although
| it does not result in other effects of stack unwinding
| (such as invoking C++ destructors).
|
| On Windows, longjmp really unwinds the stack (and maybe
| this is something influenced by VMS):
| https://learn.microsoft.com/en-us/cpp/c-runtime-
| library/refe... "In Microsoft C++ code on Windows, longjmp
| uses the same stack-unwinding semantics as exception-
| handling code. It's safe to use in the same places that C++
| exceptions can be raised."
| DenisM wrote:
| Well, things have changed since I looked last. Thanks for
| explaining.
|
| FWIW, back in the nineties we just wrote our own
| setjmp/longjmp for VMS to avoid stack unwind - save
| registers / restore registers. We used it to implement
| coroutines in Modula 2, iirc.
| josephcsible wrote:
| No. The C standard says this about longjmp: "if the function
| containing the invocation of the setjmp macro has terminated
| execution in the interim [...] the behavior is undefined". So
| while you can longjmp out of functions, you can't longjmp back
| into them.
| gkbrk wrote:
| I've used this for some embedded/IoT projects before. They work
| really well.
| c-smile wrote:
| C++ version of the approach:
| https://www.codeproject.com/Tips/29524/Generators-in-C
|
| I am using this in my Sciter, just in case. Works quite well and
| convenient.
| vinay_ys wrote:
| Ah, this page again! it's been more than two decades? since I saw
| this page last? It was fun to learn about coroutines from the
| author of PuttY the ssh client of choice on Windows those days.
| utopcell wrote:
| Coroutines. What a lovely concept! It's a joy to watch all the
| CppCon videos about C++ coroutines, primarily by Microsoft folks.
| "Negative-cost abstraction" is such a nice hook phrase.
|
| Friends at Meta mentioned to me a couple years ago that they
| started using c++ coroutines, which ended up being a big mistake
| because they had to face compiler implementation bugs, which must
| have been nasty to track down. At Google, we are eagerly waiting
| for the brilliant folks that are working on properly integrating
| them in google3/ to tell us when the time has come to use them.
|
| This article uses Duff's device [1] to motivate structured gotos
| via macros as an implementation strategy for C coroutines. Duff
| wanted to loop-unroll this: do {
| *to = *from++; } while (--count > 0);
|
| which he did in this way (shortened for brevity) :
| int n = (count + 3) / 4; switch (count % 4) {
| case 0: do { *to = *from++; case 3: *to = *from++;
| case 2: *to = *from++; case 1: *to = *from++;
| } while (--n > 0); }
|
| That is to say, he realized that he could use `case` statements
| (almost) anywhere in a `switch` block. The connection with
| coroutines is simple: One can wrap the whole function body with a
| switch statement, use a static variable for holding the location
| of the latest coroutine return, and label all co-returns with a
| `case` statement: #define coBegin static int
| state = 0; switch (state) { case 0: #define coReturn(x) do
| { state = __LINE_; return x; case __LINE:; } while (0)
| #define coFinish } int function(void) {
| static int i; // function state can't be local anymore.
| coBegin; for (i = 0; i < 10; ++i)
| coReturn(i); coFinish; }
|
| Sustrik's take on C coroutines might also be an interesting read
| [2].
|
| [1] https://en.wikipedia.org/wiki/Duff%27s_device
|
| [2] https://250bpm.com/blog:48/index.html
| dividuum wrote:
| The alternative is to use the ,,labels as values" feature of
| GCC. You can take the address of a label and later jump to it.
| I contributed the code that's now in lc-addrlabels.h back in
| 2005 :-)
|
| I also used the GCC local labels feature to completely avoid
| using __LINE__ anywhere, so you could have multiple coReturns
| in a single code line:
|
| #define LC_SET(s) do { ({ __label__ resume; resume: (s) =
| &&resume; }); }while(0)
| bxparks wrote:
| Definitely, "labels as values" (aka "computed gotos",
| https://gcc.gnu.org/onlinedocs/gcc/Labels-as-Values.html) is
| so much better than Duff's device.
|
| Unfortunately, computed-gotos is not a C language standard. I
| don't understand why. I think FORTRAN had it in the 60s. It
| is so useful in some situations, like a coroutine, or a byte-
| code interpreter. Is it because some obscure DSP chip with a
| sizeof(char)==32 using 1's complement arithmetic can't
| support it? Then maybe make it implementation-defined and
| allow the rest of the world get nice things.
| benlivengood wrote:
| Oh come on, just rewrite it all in Go! It should only be a few
| billion line CR. Your SREs will thank you (eventually).
| Scubabear68 wrote:
| Ah the C pre-processor, the gift that keeps on giving after all
| these years :-(
| codemac wrote:
| As someone who moved from google3 -> fbcode in the last few
| years, I think there are weird upsides AND downsides to having
| async code littered through your C++ (aka co_yield, co_return,
| co_await, etc).
|
| The advantage, compared to the internal stuff google3 was
| using, was that as you read code, the async nature of various
| parts was obvious. Some programmers at G would spend entire
| quarters+ not knowing what the threading model was, and cause
| serious bugs in retrospect.
|
| The disadvantage is actually much dumber - a lot of code
| "could" be async, and over time becomes entirely async because
| that's the mode the programmer is in when writing the program.
|
| The choice to use a spinlock vs. a mutex w/yields should be one
| based on the size of the critical section and the threading
| going on at the time. Unfortunately to make code more
| readable/uniform/etc you end up with entire projects doing one
| or the other.
|
| I'd love to learn more about language implementations of
| threading that do not default either way, but instead could
| take a profile of the previous run, and make the next run more
| optimal, without having to change the code or causing bugs.
| dang wrote:
| Related:
|
| _Coroutines in C (2000)_ -
| https://news.ycombinator.com/item?id=37357673 - Sept 2023 (1
| comment)
|
| _Coroutines in C (2000)_ -
| https://news.ycombinator.com/item?id=36639879 - July 2023 (2
| comments)
|
| _Coroutines in C_ -
| https://news.ycombinator.com/item?id=23293835 - May 2020 (1
| comment)
|
| _Coroutines in C (2000)_ -
| https://news.ycombinator.com/item?id=19106796 - Feb 2019 (59
| comments)
|
| _Coroutines in C, revisited_ -
| https://news.ycombinator.com/item?id=13199245 - Dec 2016 (36
| comments)
|
| _Coroutines in C_ -
| https://news.ycombinator.com/item?id=13138673 - Dec 2016 (1
| comment)
|
| _Coroutines in C (2000)_ -
| https://news.ycombinator.com/item?id=11051004 - Feb 2016 (11
| comments)
|
| _Show HN: Libconcurrent - Coroutines in C_ -
| https://news.ycombinator.com/item?id=10887071 - Jan 2016 (24
| comments)
|
| _Coroutines in C with Arbitrary Arguments_ -
| https://news.ycombinator.com/item?id=9402314 - April 2015 (22
| comments)
|
| _Coroutines in C (2000)_ -
| https://news.ycombinator.com/item?id=8615501 - Nov 2014 (27
| comments)
|
| _Coroutines in C (2000)_ -
| https://news.ycombinator.com/item?id=6244994 - Aug 2013 (1
| comment)
|
| _Coroutines in one page of C_ -
| https://news.ycombinator.com/item?id=6243946 - Aug 2013 (60
| comments)
|
| _Coroutines in C (Simon Tatham, 2000)_ -
| https://news.ycombinator.com/item?id=1380044 - May 2010 (16
| comments)
|
| _Coroutines in C_ - https://news.ycombinator.com/item?id=835849
| - Sept 2009 (16 comments)
|
| _Co-routines in C_ - https://news.ycombinator.com/item?id=794157
| - Aug 2009 (1 comment)
| anfilt wrote:
| Bunki, a C Coroutine library
| https://news.ycombinator.com/item?id=35133095
| mtlmtlmtlmtl wrote:
| If you think this is some C black magic, try reading this by the
| same author on creating arbitrary control structures with macros:
| https://www.chiark.greenend.org.uk/~sgtatham/mp/
| o11c wrote:
| Note that the underscore prefix thing often is still prone to
| shadowing. You need pretty ugly mangled names to avoid that,
| and for external-block macros (unlike expression-ish/statement-
| ish macros) it can't be avoided with GNU's/C23's hygienic macro
| hack.
| mid-kid wrote:
| The "switch" method isn't too uncommon, but usually people have
| an init function and "state" pointer that's passed into the
| coroutine function. I've used this method a lot in embedded
| projects, where one coroutine was handling motor
| acceleration/deceleration while the other would simply tell it
| what direction to go, but I've also used it for networked
| libraries[1]. Even the standard library has a coroutine function
| like this in "strtok()"[2]
|
| You don't really need to introduce macro hell for it to be
| manageable, though I've never found reading switch/case flow to
| be very enjoyable.
|
| [1]:
| https://github.com/REONTeam/libmobile/blob/master/relay.c#L3...
|
| [2]: https://manpages.debian.org/bookworm/manpages-
| dev/strtok.3.e...
| rurban wrote:
| Also related, the C++ lambda fuckup:
| https://news.ycombinator.com/item?id=33084431
| whiterknight wrote:
| UNIX pipes solve this problem. Both reader and writer are driving
| their respective process.
| anfilt wrote:
| I honestly like stackful coroutines if you don't mind allocating
| memory for a stack.
|
| https://github.com/Keith-Cancel/Bunki
___________________________________________________________________
(page generated 2024-02-25 23:00 UTC)