[HN Gopher] Async I/O for Dummies (2018)
       ___________________________________________________________________
        
       Async I/O for Dummies (2018)
        
       Author : mpweiher
       Score  : 37 points
       Date   : 2021-12-05 18:12 UTC (4 hours ago)
        
 (HTM) web link (www.alwaysrightinstitute.com)
 (TXT) w3m dump (www.alwaysrightinstitute.com)
        
       | Zababa wrote:
       | > "Manual" async programming with callback functions/closures is
       | not the only way to do cooperative multi-tasking. Google's Go
       | language takes another approach known as "green threads", or
       | "user level" threads (vs operating system / kernel managed
       | threads).
       | 
       | Isn't Go preemptive multi-tasking instead of cooperative multi-
       | tasking?
        
         | littlestymaar wrote:
          | IIRC it used to be cooperative (yield points inserted at
          | specific places), but that caused trouble with tight loops,
          | and they eventually added preemption.
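
The tight-loop problem is easy to reproduce in any cooperative scheduler. A minimal sketch using Python's asyncio (cooperative, like Go's old scheduler): a loop that never awaits starves its peer until yield points are inserted by hand:

```python
import asyncio

order = []

async def tight_loop(insert_yields):
    # In a cooperative scheduler nothing can interrupt this loop;
    # it runs to completion unless it yields voluntarily.
    for i in range(10_000):
        if insert_yields and i % 100 == 0:
            await asyncio.sleep(0)  # a manually inserted yield point
    order.append("loop done")

async def peer():
    order.append("peer ran")

async def main(insert_yields):
    order.clear()
    await asyncio.gather(tight_loop(insert_yields), peer())
    return list(order)

no_yields = asyncio.run(main(False))   # ['loop done', 'peer ran']
with_yields = asyncio.run(main(True))  # ['peer ran', 'loop done']
```

Without the yield point the peer only runs after the loop finishes; with it, the scheduler interleaves the two, which is exactly the behavior compiler-inserted yield points (and later, preemption) were meant to guarantee.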
        
       | nayuki wrote:
       | Java's new preview of virtual threads is relevant. It promises
       | the ease-of-use and ease-of-debugging of synchronous blocking
       | threads, but the scalability and low memory usage of asynchrony.
        | https://openjdk.java.net/jeps/8277131
        | https://news.ycombinator.com/item?id=29236375
        
       | defanor wrote:
        | Not sure if it's a Swift NIO introduction or meant to be a
        | general asynchronous I/O introduction (as the title
        | suggests). If it's the latter, it seems quite narrow,
        | covering only a few related/similar ways to use it.
        
       | ai-dev wrote:
        | I'm quite confused about the benefits of async I/O over a
        | blocking thread pool. I can't reconcile the many claims I've
        | read. Everyone does seem to agree that _most_ people don't
        | need async, but past that...
       | 
       | - The overhead of a thread context switch is expensive due to
       | cache thrashing
       | 
       | - If the context switch is due to IO readiness, the overhead is
       | equivalent for both
       | 
       | - The main benefit of async is the coding model and the power it
       | provides (cancellation, selection)
       | 
       | - Threads won't scale past tens of thousands of connections
       | 
       | - Async is about better utilizing memory, not performance
       | 
       | - A normal-sized Linux server can handle a million threads
       | without much trouble, async isn't worth it (harder coding model,
       | colored functions)
       | 
       | Is the context switch overhead the same for both? If so, why
       | can't threads scale to 100k+ connections?
        
         | pdhborges wrote:
         | I also have some doubts for some specific scenarios:
         | 
          | - Imagine you have a monolith that mostly talks to the
          | database, with a primary and multiple read replicas. How
          | are you going to take advantage of millions of async tasks
          | if your IO concurrency ends up being capped at a couple of
          | thousand database connections?
          | 
          | - The second question: even if you can have a million
          | requests in flight, do you really want such a large blast
          | radius on a single server?
        
         | wtallis wrote:
         | Whether a single server can handle a million threads depends on
         | how often those threads need to wake up, and how much work they
         | need to do before going back to sleep. Thread pools don't even
         | come close to working well for storage IO, because dozens or
         | hundreds of threads per core each blocking on just a few kB of
         | IO that the SSDs can handle in milliseconds or less means you
         | spend far too much time on context switches. Network IO is
         | usually lower in aggregate throughput and much higher in
         | latency, so each thread spends relatively more time sleeping
         | and therefore you can fit more of them into a single machine's
         | available CPU time. But a million threads on one box imposes
         | significant constraints on how active those threads can be.
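
The "how active those threads can be" constraint can be put into rough numbers. A back-of-envelope sketch, where the wake rate and the ~5 µs context-switch cost are assumed round numbers for illustration, not measurements:

```python
# Back-of-envelope estimate (all numbers are assumptions, not
# measurements): how much CPU does merely *switching* a million
# threads consume?
threads = 1_000_000
wakeups_per_thread_per_sec = 10   # assumed wake rate per thread
switch_cost_sec = 5e-6            # assumed ~5 microseconds per switch

switches_per_sec = threads * wakeups_per_thread_per_sec
cpu_seconds_per_sec = switches_per_sec * switch_cost_sec

# 50 CPU-seconds of switching per wall-clock second: ~50 cores
# doing nothing but context switches. Drop the wake rate to
# 0.1/sec and the same box spends only half a core on switching.
print(cpu_seconds_per_sec)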
        
           | ai-dev wrote:
           | > But a million threads on one box imposes significant
           | constraints on how active those threads can be.
           | 
           | Wouldn't the same constraints be imposed on a million async
           | tasks? What allows async tasks to be more active than
           | threads? Is it the scheduling model, the overhead of context
           | switching, or something else?
           | 
            | EDIT: I reread and saw that you mentioned context
            | switching. I guess my question is then the same as here:
            | https://news.ycombinator.com/item?id=29452759. Is the
            | claim true that the context-switch overhead is the same
            | when the switch is due to I/O readiness? I'm mostly
            | thinking about network servers, where I'd expect most
            | context switches to be due to I/O readiness.
        
         | vlmutolo wrote:
         | I'm only familiar with Rust's async story, though I think the
         | following probably applies to other languages as well.
         | 
          | Switching async tasks should have a smaller overhead than
          | switching threads. A thread context switch involves handing
          | control to the OS and the OS giving it back at some point.
          | Both switches involve lots of cache thrashing, as you said,
          | and bookkeeping that the OS has to do. This probably also
          | involves instruction-cache evictions.
         | 
         | Switching async tasks involves loading the new task from memory
         | onto the stack. That's it. The program can immediately start
         | doing useful work again.
         | 
          | So, the short answer is that switching threads involves:
          | 
          | - more cache evictions
          | - bigger, slower cache evictions
          | - OS-level bookkeeping for threads
         | 
         | Also creating threads involves setting up a new, expandable
         | stack in the process's memory space. Creating a new task
         | involves allocating a fixed amount of memory on the heap once.
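
As a rough illustration of that cost difference, spawning tens of thousands of asyncio tasks (each a small heap-allocated object) is cheap. A minimal Python sketch:

```python
import asyncio

async def worker(i):
    # Each task is a small heap object; no OS thread or
    # guard-paged stack is created per task.
    await asyncio.sleep(0)
    return i

async def main():
    tasks = [asyncio.ensure_future(worker(i)) for i in range(50_000)]
    results = await asyncio.gather(*tasks)
    return len(results)

print(asyncio.run(main()))  # 50000
```

Spawning 50,000 OS threads, by contrast, would reserve a stack for each (commonly 8 MB of virtual address space per thread on Linux), on top of the kernel's per-thread bookkeeping.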
        
           | ai-dev wrote:
           | > Switching async tasks involves loading the new task from
           | memory onto the stack. That's it.
           | 
           | One of the points on the list was that if the context switch
           | is due to I/O readiness, then there's more work for the async
           | task to do [0]:
           | 
           | > Think about it this way--if you have a user-space thread
           | which wakes up due to I/O readiness, then this means that the
           | relevant kernel thread woke up from epoll_wait() or something
           | similar. With blocking I/O, you call read(), and the kernel
           | wakes up your thread when the read() completes. With non-
           | blocking I/O, you call read(), get EAGAIN, call epoll_wait(),
           | the kernel wakes up your thread when data is ready, and then
           | you call read() a second time.
           | 
           | > In both scenarios, you're calling a blocking system call
           | and waking up the thread later.
           | 
           | 0: https://news.ycombinator.com/item?id=26110699
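
That readiness-based sequence (read, EAGAIN, epoll_wait, read again) can be sketched concretely with Python's stdlib selectors module (which wraps epoll on Linux) and a socketpair standing in for a network connection; this is an illustrative sketch, not any framework's actual loop:

```python
import selectors
import socket

# The sequence from the quote: read() -> EAGAIN, then
# epoll_wait(), then a second read().
a, b = socket.socketpair()
b.setblocking(False)

try:
    b.recv(4096)            # nothing buffered yet
except BlockingIOError:     # errno EAGAIN / EWOULDBLOCK
    pass

sel = selectors.DefaultSelector()  # epoll on Linux, kqueue on BSD/macOS
sel.register(b, selectors.EVENT_READ)

a.sendall(b"hello")         # now data is in flight

events = sel.select(timeout=5)     # the "epoll_wait()" step
data = b.recv(4096)                # the second read() succeeds
print(data)                        # b'hello'

sel.close(); a.close(); b.close()
```

Note the two syscalls on the read path (recv + select/epoll_wait + recv) versus the single blocking recv a thread-per-connection server would make, which is the overhead equivalence being debated above.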
        
             | roblabla wrote:
             | > this means that the relevant kernel thread woke up from
             | epoll_wait() or something similar. With blocking I/O, you
             | call read(), and the kernel wakes up your thread when the
             | read() completes. With non-blocking I/O, you call read(),
             | get EAGAIN, call epoll_wait(), the kernel wakes up your
             | thread when data is ready, and then you call read() a
             | second time.
             | 
              | This is true for readiness-based IO (which, admittedly,
              | most current async IO loops are using), but completion-
              | based IO (such as IOCP or io_uring) doesn't suffer from
              | this problem: you just add your IO operations to the
              | queue and make a single syscall that returns once one
              | of the operations in your queue has completed, AFAIK.
        
               | ai-dev wrote:
               | > This is true for readiness-based IO (which, admittedly,
               | most current async IO loops are using)
               | 
                | Right, so if most async I/O frameworks use readiness-
                | based IO (epoll, kqueue), then the context-switch
                | overhead is similar, and most of the performance
                | arguments for async I/O don't hold. That's where I'm
                | confused :)
        
             | drran wrote:
          | Async programs do lazy evaluation, so they are better at
          | discovering the critical path of execution. It's similar to
          | out-of-order execution of instructions in a CPU, but at a
          | higher level.
          | 
          | In theory, a compiler can (should) rearrange the order of
          | execution for non-async programs as well.
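
One concrete reading of the "critical path" point: when independent I/O waits are expressed as async tasks, the runtime can overlap them, so total latency is bounded by the longest wait rather than the sum. A small sketch using asyncio.sleep as a stand-in for I/O:

```python
import asyncio
import time

async def fake_io(delay):
    await asyncio.sleep(delay)   # stands in for a network call
    return delay

async def sequential():
    # One await at a time: latencies add up.
    return [await fake_io(0.1), await fake_io(0.1), await fake_io(0.1)]

async def concurrent():
    # Independent waits overlapped: bounded by the critical path.
    return await asyncio.gather(fake_io(0.1), fake_io(0.1), fake_io(0.1))

t0 = time.monotonic()
asyncio.run(sequential())
seq = time.monotonic() - t0      # roughly 0.3 s

t0 = time.monotonic()
asyncio.run(concurrent())
conc = time.monotonic() - t0     # roughly 0.1 s
```

The code itself declares which operations depend on which (via await and gather), which is what lets the runtime find the overlap without compiler heroics.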
        
       ___________________________________________________________________
       (page generated 2021-12-05 23:01 UTC)