[HN Gopher] Async I/O for Dummies (2018)
___________________________________________________________________
Async I/O for Dummies (2018)
Author : mpweiher
Score : 37 points
Date : 2021-12-05 18:12 UTC (4 hours ago)
(HTM) web link (www.alwaysrightinstitute.com)
(TXT) w3m dump (www.alwaysrightinstitute.com)
| Zababa wrote:
| > "Manual" async programming with callback functions/closures is
| not the only way to do cooperative multi-tasking. Google's Go
| language takes another approach known as "green threads", or
| "user level" threads (vs operating system / kernel managed
| threads).
|
| Doesn't Go use preemptive multi-tasking instead of cooperative
| multi-tasking?
| littlestymaar wrote:
| IIRC it used to be cooperative (the compiler inserting yield
| points in specific places), but that caused trouble with tight
| loops and they eventually added preemption.
| nayuki wrote:
| Java's new preview of virtual threads is relevant. It promises
| the ease-of-use and ease-of-debugging of synchronous blocking
| threads, but the scalability and low memory usage of asynchrony.
| https://openjdk.java.net/jeps/8277131
| https://news.ycombinator.com/item?id=29236375
| defanor wrote:
| Not sure if it's a Swift NIO introduction or supposed to be a
| general asynchronous I/O introduction (as the title suggests).
| But if it's the latter, it seems quite narrow, covering only a
| few related/similar ways of using it.
| ai-dev wrote:
| I'm quite confused about the benefits of async I/O over a
| blocking thread pool. I can't reconcile the many claims I've
| read. Everyone does seem to agree that _most_ people don't need
| async, but past that...
|
| - The overhead of a thread context switch is expensive due to
| cache thrashing
|
| - If the context switch is due to IO readiness, the overhead is
| equivalent for both
|
| - The main benefit of async is the coding model and the power it
| provides (cancellation, selection)
|
| - Threads won't scale past tens of thousands of connections
|
| - Async is about better utilizing memory, not performance
|
| - A normal-sized Linux server can handle a million threads
| without much trouble, async isn't worth it (harder coding model,
| colored functions)
|
| Is the context switch overhead the same for both? If so, why
| can't threads scale to 100k+ connections?
| pdhborges wrote:
| I also have some doubts about some specific scenarios:
|
| - Imagine you have a monolith that mostly talks to the
| database. You have a primary with multiple read replicas. How
| are you going to take advantage of millions of async tasks if
| your IO concurrency ends up being capped at a couple of thousand
| database connections?
|
| - The second question is: even if you can have a million
| requests in flight, do you really want to have such a large
| blast radius on a single server?
| wtallis wrote:
| Whether a single server can handle a million threads depends on
| how often those threads need to wake up, and how much work they
| need to do before going back to sleep. Thread pools don't even
| come close to working well for storage IO, because dozens or
| hundreds of threads per core each blocking on just a few kB of
| IO that the SSDs can handle in milliseconds or less means you
| spend far too much time on context switches. Network IO is
| usually lower in aggregate throughput and much higher in
| latency, so each thread spends relatively more time sleeping
| and therefore you can fit more of them into a single machine's
| available CPU time. But a million threads on one box imposes
| significant constraints on how active those threads can be.
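| For rough scale (illustrative numbers, not measurements): if
| a context switch costs on the order of 1-2 microseconds, a
| million threads each waking only 100 times per second would
| demand ~10^8 switches per second, i.e. on the order of a
| hundred CPU-seconds of switching per wall-clock second before
| any real work gets done.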
| ai-dev wrote:
| > But a million threads on one box imposes significant
| constraints on how active those threads can be.
|
| Wouldn't the same constraints be imposed on a million async
| tasks? What allows async tasks to be more active than
| threads? Is it the scheduling model, the overhead of context
| switching, or something else?
|
| EDIT: I reread and saw that you mentioned context switching.
| I guess my question would then be the same as here:
| https://news.ycombinator.com/item?id=29452759. Is the claim
| true that the context-switching overhead is the same when it's
| due to I/O readiness? I'm mostly thinking about network
| servers, where I think most context switches would be due to
| I/O readiness.
| vlmutolo wrote:
| I'm only familiar with Rust's async story, though I think the
| following probably applies to other languages as well.
|
| Switching async tasks should have a smaller overhead than
| switching threads. A context switch involves giving control to
| the OS and then the OS giving it back at some point. Both
| involve lots of cache thrashing, as you said, and both switches
| involve bookkeeping that the OS has to do. This probably also
| involves instruction-cache evictions.
|
| Switching async tasks involves loading the new task from memory
| onto the stack. That's it. The program can immediately start
| doing useful work again.
|
| So, the short answer is that switching threads involves:
|
| - more cache evictions
| - bigger, slower cache evictions
| - OS-level bookkeeping for threads
|
| Also, creating threads involves setting up a new, expandable
| stack in the process's memory space. Creating a new task
| involves allocating a fixed amount of memory on the heap once.
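|
| As a rough sketch in C (the task layout here is invented for
| illustration, not any particular runtime's): resuming a task
| is just an indirect function call on heap state, with no
| kernel transition and no stack swap.
|
|   /* Hypothetical user-space task: heap state plus a resume
|      function. */
|   struct task {
|       int state;                    /* where to resume from */
|       void (*step)(struct task *);  /* resume = plain call */
|   };
|
|   /* The scheduler "switches tasks" by calling the next
|      ready task's step function; no syscall is involved. */
|   static void run_ready(struct task **ready, int n) {
|       for (int i = 0; i < n; i++)
|           ready[i]->step(ready[i]);
|   }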
| ai-dev wrote:
| > Switching async tasks involves loading the new task from
| memory onto the stack. That's it.
|
| One of the points on the list was that if the context switch
| is due to I/O readiness, then there's more work for the async
| task to do [0]:
|
| > Think about it this way--if you have a user-space thread
| which wakes up due to I/O readiness, then this means that the
| relevant kernel thread woke up from epoll_wait() or something
| similar. With blocking I/O, you call read(), and the kernel
| wakes up your thread when the read() completes. With non-
| blocking I/O, you call read(), get EAGAIN, call epoll_wait(),
| the kernel wakes up your thread when data is ready, and then
| you call read() a second time.
|
| > In both scenarios, you're calling a blocking system call
| and waking up the thread later.
|
| 0: https://news.ycombinator.com/item?id=26110699
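|
| A rough sketch in C of the two sequences quoted above
| (assuming "fd" is a non-blocking socket already registered
| for EPOLLIN on "epfd"; error handling omitted):
|
|   #include <sys/epoll.h>
|   #include <unistd.h>
|   #include <errno.h>
|
|   /* Blocking I/O: one syscall; the kernel parks the thread
|      until data arrives. */
|   ssize_t blocking_read(int fd, char *buf, size_t len) {
|       return read(fd, buf, len);
|   }
|
|   /* Non-blocking I/O: read() may return EAGAIN, so we block
|      in epoll_wait() instead and retry read() once ready. */
|   ssize_t nonblocking_read(int epfd, int fd, char *buf,
|                            size_t len) {
|       for (;;) {
|           ssize_t n = read(fd, buf, len);
|           if (n >= 0 || errno != EAGAIN)
|               return n;  /* data, EOF, or a real error */
|           struct epoll_event ev;
|           epoll_wait(epfd, &ev, 1, -1);  /* wait for ready */
|       }
|   }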
| roblabla wrote:
| > this means that the relevant kernel thread woke up from
| epoll_wait() or something similar. With blocking I/O, you
| call read(), and the kernel wakes up your thread when the
| read() completes. With non-blocking I/O, you call read(),
| get EAGAIN, call epoll_wait(), the kernel wakes up your
| thread when data is ready, and then you call read() a
| second time.
|
| This is true for readiness-based IO (which, admittedly,
| most current async IO loops are using), but completion-
| based IO (such as IOCP or io_uring) doesn't suffer from
| this problem: you just add your IO operation to the queue,
| and do a single syscall that returns once one of the
| operations in your queue has completed, AFAIK.
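|
| A minimal liburing sketch of the completion-based version
| (assuming liburing is available; error handling omitted):
| the read itself goes into the queue, and a single wait
| returns once the data has already been copied in.
|
|   #include <liburing.h>
|
|   ssize_t completion_read(int fd, char *buf, size_t len) {
|       struct io_uring ring;
|       io_uring_queue_init(8, &ring, 0);
|
|       /* Describe the read and hand it to the kernel. */
|       struct io_uring_sqe *sqe = io_uring_get_sqe(&ring);
|       io_uring_prep_read(sqe, fd, buf, len, 0);
|       io_uring_submit(&ring);
|
|       /* One wait; when it returns, the read is done. */
|       struct io_uring_cqe *cqe;
|       io_uring_wait_cqe(&ring, &cqe);
|       ssize_t n = cqe->res;  /* bytes read, or -errno */
|       io_uring_cqe_seen(&ring, cqe);
|
|       io_uring_queue_exit(&ring);
|       return n;
|   }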
| ai-dev wrote:
| > This is true for readiness-based IO (which, admittedly,
| most current async IO loops are using)
|
| Right, so if most async I/O frameworks use readiness-
| based IO (epoll, kqueue), then the context switch
| overhead is similar. So then most of the performance
| arguments for async I/O don't stand. That's where I'm
| confused :)
| drran wrote:
| Async programs do lazy evaluation, so they are better at
| discovering the critical path of execution. It's similar to
| out-of-order execution of instructions in a CPU, but at a
| higher level.
|
| In theory, a compiler can (should) rearrange the order of
| execution for non-async programs as well.
___________________________________________________________________
(page generated 2021-12-05 23:01 UTC)