[HN Gopher] Java Virtual Threads: A Case Study
       ___________________________________________________________________
        
       Java Virtual Threads: A Case Study
        
       Author : mighty_plant
       Score  : 148 points
       Date   : 2024-07-14 05:52 UTC (3 days ago)
        
 (HTM) web link (www.infoq.com)
 (TXT) w3m dump (www.infoq.com)
        
       | exabrial wrote:
       | What is the virtual thread / event loop pattern seeking to
       | optimize? Is it context switching?
       | 
       | A number of years ago I remember trying to have a sane discussion
       | about "non blocking" and I remember saying "something" will block
       | eventually no matter what... anything from the buffer being full
       | on the NIC to your cpu being at anything less than 100%. Does it
       | shake out to any real advantage?
        
         | kevingadd wrote:
         | One of the main reasons to do virtual threads is that it allows
         | you to write naive "thread per request" code and still scale up
         | significantly without hitting the kind of scaling limits you
         | would with OS threads.
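          | 
          | A minimal sketch of that naive thread-per-request style on
          | JDK 21 (the handler and its 10 ms "I/O" sleep are made-up
          | stand-ins, not anyone's production code):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

public class ThreadPerRequest {
    // One fresh virtual thread per submitted task; blocking in the
    // handler parks the virtual thread, not its carrier OS thread.
    static int handleAll(int requests) {
        AtomicInteger done = new AtomicInteger();
        try (ExecutorService pool = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int i = 0; i < requests; i++) {
                pool.submit(() -> {
                    try {
                        Thread.sleep(10); // stand-in for blocking I/O
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                    done.incrementAndGet();
                });
            }
        } // close() waits for all submitted tasks to finish
        return done.get();
    }
}
```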
        
           | hashmash wrote:
           | The problem with the naive design is that even with virtual
           | threads, you risk running out of (heap) memory if the threads
           | ever block. Each task makes a bit of progress, allocates some
           | objects, and then lets another one do the same thing.
           | 
           | With virtual threads, you can limit the damage by using a
           | semaphore, but you still need to tune the size. This isn't
           | much different than sizing a traditional thread pool, and so
           | I'm not sure what benefit virtual threads will really have in
           | practice. You're swapping one config for another.
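            | 
            | A hedged sketch of the semaphore approach described above
            | (the limit, task count, and workload are invented for
            | illustration):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;

public class BoundedConcurrency {
    // Each virtual thread takes a permit before the expensive section,
    // so at most `limit` tasks hold memory-heavy state at once.
    static int peakInFlight(int tasks, int limit) {
        Semaphore permits = new Semaphore(limit);
        AtomicInteger inFlight = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        try (ExecutorService pool = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int i = 0; i < tasks; i++) {
                pool.submit(() -> {
                    try {
                        permits.acquire();
                        try {
                            peak.accumulateAndGet(inFlight.incrementAndGet(), Math::max);
                            Thread.sleep(5); // stand-in for blocking work
                            inFlight.decrementAndGet();
                        } finally {
                            permits.release();
                        }
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
            }
        }
        return peak.get(); // stays at or below `limit`
    }
}
```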
        
             | immibis wrote:
             | Async does exactly the same by the way.
        
             | initplus wrote:
             | The benefits from virtual threads come from the simple API
             | that it presents to the programmer. It's not a performance
             | optimization.
        
               | hashmash wrote:
               | But that same benefit was always available with platform
               | threads -- a simple API. What is the real gain by using
               | virtual threads? It's either going to be performance or
               | memory utilization.
        
               | groestl wrote:
                | It combines the benefits of async models (state
                | machines decoupled from OS threads, thus better
                | suited to I/O-bound workloads) with the benefits of
                | proper threading models (namely the simpler human
                | interface).
                | 
                | Memory utilization & performance are going to be
                | similar to the async callback mess.
        
               | hashmash wrote:
               | Why is an async model better than using OS threads for an
               | I/O bound workload? The OS is doing async stuff
               | internally and shielding the complexity with threads.
               | With virtual threads this work has shifted to the JVM.
               | Can the JVM do threads better than the OS?
        
               | adgjlsfhk1 wrote:
                | It can do a much better job because there isn't a
                | security boundary. OS thread scheduling requires
                | syscalls and invalidates a bunch of cache state to
                | prevent timing leaks.
        
               | zokier wrote:
               | > Can the JVM do threads better than the OS?
               | 
                | Yes. The JVM has far more opportunities for
                | optimizing threads because it doesn't need to uphold
                | 50 years of accumulated invariants and compatibility
                | that current OSes do, and the JVM has more visibility
                | into the application internals.
        
               | mrsilencedogood wrote:
               | "Why is an async model better than using OS threads for
               | an I/O bound workload?"
               | 
               | Because evented/callback-driven code is a nightmare to
               | reason about and breaks lots of very basic tools, like
               | the humble stack trace.
               | 
                | Another big thing for me is resource management -
                | try/finally doesn't work across callback boundaries,
                | but does work within a virtual thread. I recently
                | ported a netty-based evented system to virtual
                | threads and a very long-standing issue - resource
                | leakage - turned into one very nice try/finally
                | block.
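                | 
                | (Not the poster's netty code, just a toy illustration
                | of cleanup following lexical scope on a virtual
                | thread:)

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;

public class ScopedResource {
    // In blocking style, try-with-resources guarantees the reader is
    // closed when the virtual thread's block exits - no callback
    // boundary splits the acquire from the release.
    static String readAll(String data) {
        StringBuilder out = new StringBuilder();
        Thread t = Thread.ofVirtual().start(() -> {
            try (BufferedReader r = new BufferedReader(new StringReader(data))) {
                String line;
                while ((line = r.readLine()) != null) out.append(line);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        });
        try {
            t.join(); // join() makes the builder's contents visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return out.toString();
    }
}
```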
        
               | lichtenberger wrote:
               | Throughput. The code can be "suspended" on a blocking
               | call (I/O, where the platform thread usually is wasted,
               | as the CPU has nothing to do during this time). So, the
               | platform thread can do other work in the meantime.
        
               | CrimsonRain wrote:
               | Create 100k platform threads and you'll find out.
        
             | packetlost wrote:
             | Yeah, and it's generally good to be RAM limited instead of
             | CPU, no? The alternative is blowing a bunch of time on
             | syscalls and OS scheduler overhead.
             | 
             | Also the virtual threads run on a "traditional" thread pool
             | to my understanding, so you can just tweak the number of
             | worker threads to cap the total concurrency.
             | 
             | The benefit is it's overall more efficient (in the general
             | case) and lets you write linear blocking code (as opposed
             | to function coloring). You don't have to use it, but it's
             | nice that it's there. Now hopefully Valhalla actually makes
             | it in eventually
        
               | hashmash wrote:
               | The OS scheduler is still there (for the carrier
               | threads), but now you've added on top of that FJ pool
               | based scheduler overhead. Although virtual threads don't
               | have the syscall overhead when they block, there's a new
               | cost caused by allocating the internal continuation
               | object, and copying state into it. This puts more
               | pressure on the garbage collector. Context switching cost
               | due to CPU cache thrashing doesn't go away regardless of
               | which type of thread you're using.
               | 
               | I've not yet seen a study that shows that virtual threads
               | offer a huge benefit. The Open Liberty study suggests
               | that they're worse than the existing platform threads.
        
               | zokier wrote:
               | > The OS scheduler is still there (for the carrier
               | threads), but now you've added on top of that FJ pool
               | based scheduler overhead.
               | 
               | Ideally carrier threads would be pinned to isolated cpu
               | cores, which removes most aspects of OS scheduler from
               | the picture
        
               | zokier wrote:
               | > I've not yet seen a study that shows that virtual
               | threads offer a huge benefit.
               | 
               | Not exactly Java virtual threads, but a study on how
               | userland threads beat kernel threads.
               | 
                | https://cs.uwaterloo.ca/~mkarsten/papers/sigmetrics2020.html
               | 
               | For quick results, check figures 11 and 15 from the
               | (preprint) paper. Userland threads ("fred") have ~50%
               | higher throughput while having orders of magnitude better
               | latency at high load levels, in a real-world application
               | (memcached).
        
               | packetlost wrote:
                | The study says there are _surprising_ performance
                | problems with Java's virtual thread implementation.
                | Their test of throughput was also hilarious: they put
                | 2000 OS threads vs 2000 virtual threads, and most of
                | the time OS threads don't start falling apart until
                | 100k+ threads. You _can_ architect an application
                | such that you can handle 200k simultaneous
                | connections using platform-thread-per-core, but it's
                | harder to reason about than the linear, blocking code
                | that virtual threads and async allow for.
               | 
               | > Context switching cost due to CPU cache thrashing
               | doesn't go away regardless of which type of thread you're
               | using.
               | 
               | Except it's not a context switch? You're jumping to
               | another instruction in the program, one that should be
               | _very_ predictable. You _might_ lose your cache, but it
               | will depend on a ton of factors.
               | 
               | > there's a new cost caused by allocating the internal
               | continuation object, and copying state into it.
               | 
               | This is more of a problem with the implementation (not
               | every virtual thread language does it this way), but yeah
               | this is more overhead on the application. I assume
               | there's improvements that can be made to ease GC
               | pressure, like using object pools.
               | 
                | Usually virtual threads are a memory vs CPU tradeoff
                | that you typically use in massively concurrent IO-
                | bound applications. Total throughput should overtake
                | platform threads at hundreds of thousands of
                | connections, but below that they probably perform
                | worse; I'm not that surprised by that.
        
               | electroly wrote:
               | > Except it's not a context switch? You're jumping to
               | another instruction in the program, one that should be
               | very predictable. You might lose your cache, but it will
               | depend on a ton of factors.
               | 
               | Java virtual threads are stackful; they have to save and
               | restore the stack every time they mount a different
               | virtual thread to the platform thread. They do this by
               | naive[0] copying of the stack out to a heap allocation
               | and then back again, every time. That's clearly a context
               | switch that you're paying for; it's just not in the
               | kernel. I believe this is what the person you're replying
               | to is talking about.
               | 
               | [0] Not totally naive. They do take some effort to copy
               | only subsets of the stack if they can get away with it.
               | But it's still all done by copies. I don't know enough to
               | understand why they need to copy and can't just swap
               | stack pointers. I think it's related to the need to
               | dynamically grow the stack when the thread is active vs.
               | having a fixed size heap allocation to store the stack
               | copy.
        
             | dikei wrote:
             | > The problem with the naive design is that even with
             | virtual threads, you risk running out of (heap) memory if
             | the threads ever block.
             | 
              | The key with virtual threads is that they are so
              | lightweight that you can have thousands of them running
              | concurrently: even when they block for I/O, it doesn't
              | matter. It's similar to lightweight coroutines in other
              | languages like Go or Kotlin.
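              | 
              | The "thousands blocked at once" claim is easy to check
              | on JDK 21; a sketch (the count and sleep duration are
              | arbitrary):

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;

public class ManyThreads {
    // Start n virtual threads that all block simultaneously; with
    // platform threads this count would exhaust stack memory first.
    static int startAndJoin(int n) {
        List<Thread> threads = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            threads.add(Thread.ofVirtual().start(() -> {
                try {
                    Thread.sleep(Duration.ofMillis(50)); // all n park here
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }));
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return threads.size();
    }
}
```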
        
             | imtringued wrote:
             | What you are complaining about has nothing to do with
             | thread pools or virtual threads. You're pointing out the
             | fact that more parallelism will also need more hardware and
             | that a finite hardware budget will need a back pressure
             | strategy to keep resource consumption within a limit. While
             | you might be correct that "sizing a traditional thread
             | pool" is a back pressure strategy that can be applied to
             | virtual threads, the problem with it is that IO bound
             | threads will prevent CPU bound threads from making
             | progress. You don't want to apply back pressure based on
             | the number of tasks. You want back pressure to be in
             | response to resource utilization, so that enough tasks get
             | scheduled to max out the hardware.
             | 
             | This is a common problem with people using Java parallel
             | streams, because they by default share a single global
             | thread pool and the way to use your own thread pool is also
             | extremely counterintuitive, because it essentially relies
             | on some implicit thread local magic to choose to distribute
             | the stream in the thread pool that the parallel stream was
             | launched on, instead of passing it as a parameter.
             | 
             | It would be best if people came up with more dynamic back
             | pressure strategies, because this is a more general problem
             | that goes way beyond thread pools. In fact, one of the key
             | problems of automatic parallelization is deciding at what
             | point there is too much parallelization.
        
         | fzeindl wrote:
          | > Does it shake out to any real advantage?
         | 
         | To put it shortly: Writing single-threaded blocking code is far
         | easier for most people and has many other benefits, like more
         | understandable and readable programs:
         | https://www.youtube.com/watch?v=449j7oKQVkc
         | 
          | The main reason why non-blocking IO, with its style of
          | intertwining concurrency and algorithms, came along is
          | that starting a thread for every request was too
          | expensive. With virtual threads that problem is
          | eliminated, so we can go back to writing blocking code.
        
           | nlitened wrote:
           | > is far easier for most people
           | 
           | I'd say that writing single-threaded code is far easier for
           | _all_ people, even async code experts :)
           | 
           | Also, single-threaded code is supported by programming
           | language facilities: you have a proper call stack, thread-
           | local vars, exceptions bubbling up, structured concurrency,
           | simple resource management (RAII, try-with-resources, defer).
           | Easy to reason and debug on language level.
           | 
           | Async runtimes are always complicated, filled with leaky
           | abstractions, it's like another language that one has to
           | learn in addition, but with a less thought-out, ad-hoc
           | design. Difficult to reason and debug, especially in edge
           | cases
        
             | bheadmaster wrote:
             | > Async runtimes are always complicated, filled with leaky
             | abstractions, it's like another language that one has to
             | learn in addition, but with a less thought-out, ad-hoc
             | design. Difficult to reason and debug, especially in edge
             | cases
             | 
              | Async runtimes themselves are simply attempts to bolt
              | green threads onto a language that doesn't support
              | them at the language level. In JavaScript, async/await
              | uses Promises to enable callback code to interact with
              | key language features like try/catch,
              | for/while/break, return, etc. In Python, async/await
              | is just syntax sugar for coroutines, which are again
              | just syntax sugar for CPS-style classes with methods
              | split at each "yield". Not sure about Rust, but it
              | probably also uses some Rust macro magic to do
              | something similar.
        
               | logicchains wrote:
                | > Async runtimes themselves are simply attempts to
                | bolt green threads onto a language that doesn't
                | support them at the language level.
               | 
                | Haskell supports async code while also supporting
                | green threads at the language level, and the async
                | code has most of the same issues as async code in
                | any other language.
        
               | whateveracct wrote:
               | What problems exactly? Haskell has a few things that imo
               | it does better than most languages in this area:
               | 
               | - All IO is non-blocking by default.
               | 
                | - FFI support for interruptible calls.
               | 
               | - Haskell threads can be preempted externally - this
               | allows you to ensure they never leak. Vs a goroutine that
               | can just spin forever if it doesn't explicitly yield.
               | 
               | - There are various stdlib abstractions for building
               | concurrent programs in a compositional way.
        
               | kbolino wrote:
               | > Haskell threads can be preempted externally - this
               | allows you to ensure they never leak. Vs a goroutine that
               | can just spin forever if it doesn't explicitly yield.
               | 
               | Goroutines are preemptible by the runtime (since
               | https://go.dev/doc/go1.14#runtime) but they're still not
               | addressable or killable through the language itself.
        
               | derriz wrote:
                | Indeed. Async runtimes/styles are attempts to
                | provide a more readable/usable syntax for CPS[1].
                | CPS originally had nothing to do with blocking/non-
                | blocking or multi-threading but arose as a technique
                | to structure compiler code.
               | 
               | Its attraction for non-blocking coding is that it allows
               | hiding the multi-threaded event dispatching loop. But as
               | the parent comment suggests, this abstraction is
               | extremely leaky. And in addition, CPS in non-functional
               | languages or without syntactic sugar has poor
               | readability. Improving the readability requires compiler
               | changes in the host language - so many languages have
               | added compiler support to further hide the CPS
               | underpinnings of their async model.
               | 
               | I've always felt this was a big mistake in our industry -
               | all this effort not only in compilers but also in
               | debuggers/IDE - building on a leaky abstraction. Adding
               | more layers of leaky abstractions has only made the issue
               | worse. Async code, at first glance, looks simple but is a
               | minefield for inexperienced/non-professional software
               | engineers.
               | 
                | It's annoying that Rust switched to async style -
                | the abstraction leakiness immediately hits you, as
                | the "hidden event dispatching loop" remains a real
                | dependency even if it's not explicit in the code.
                | Thus libraries using async cannot generally be used
                | together, although last time I looked, tokio seems
                | to have become the de-facto standard.
               | 
                | [1] https://en.wikipedia.org/wiki/Continuation-passing_style
        
               | kaba0 wrote:
                | I absolutely agree that the virtual/green thread
                | style is much better: more ergonomic, less error-
                | prone, etc. But I can't fault Rust's choice, given
                | that it's a low-level language without a fat
                | runtime, making it possible to be called into from
                | other runtimes. What the JVM does is simply not
                | possible that way.
        
               | dwattttt wrote:
               | > Not sure about Rust, but it probably also uses some
               | Rust macro magic to do something similar.
               | 
               | Much the same as JavaScript I understand, but no macros;
               | the compiler turns them into Futures that can be polled
        
             | xxs wrote:
             | >I'd say that writing single-threaded code is far easier
             | for _all_ people, even async code experts :)
             | 
              | While 'async' is just a name, underneath it's epoll -
              | and virtual threads will not perform better than a
              | proper NIO (epoll) server. I don't consider myself an
              | 'async expert' but I've had my share of writing NIO
              | code (dare say it's not terrible at all).
        
               | kaba0 wrote:
                | Virtual threads literally replace the "blocking" IO
                | call issued by the user with a proper NIO call, re-
                | mounting the issuing virtual thread when it signals
                | readiness.
        
           | chipdart wrote:
           | > To put it shortly: Writing single-threaded blocking code is
           | far easier for most people and has many other benefits, like
           | more understandable and readable programs:
           | 
           | I think you're missing the whole point.
           | 
           | The reason why so many smart people invest their time on
           | "virtual threads" is developer experience. The goal is to
           | turn writing event-driven concurrent code into something
           | that's as easy as writing single-threaded blocking code.
           | 
           | Check why C#'s async/await implementation is such a huge
           | success and replaced all past approaches overnight. Check why
           | node.js is such a huge success. Check why Rust's async
           | support is such a hot mess. It's all about developer
           | experience.
        
             | kitd wrote:
             | I think he was making the same point as you: writing for
             | virtual threads is like writing for single-threaded
             | blocking code.
        
             | written-beyond wrote:
             | As someone who has written multiple productions services
             | with Async Rust, that are under constant load, I disagree.
             | I've had team members who have only written in C, pick up
             | and start building very comprehensive and performant
             | services in Rust in a matter of days.
             | 
              | How do developers spew such strong opinions without
              | taking a moment to think about what they're about to
              | say? Rust cannot be directly compared to C#, Java, or
              | even Go.
             | 
             | You don't get a runtime or a GC with rust. The developer
             | experience is excellent, you get a lot of control over
             | everything you're building with it. Yes it's not as magical
             | as languages and runtimes like you've mentioned, but the
             | fact that I can at anytime rip those abstractions off and
             | make my service extremely lightweight and performant is not
             | something those languages will allow you to do.
             | 
             | And this is coming from someone who's written non blocking
             | services before Async rust was a thing with just MIO.
             | 
              | The very fact Rust gets mentioned alongside these
              | languages should be a tribute to the efforts of its
              | maintainers and core team. The amount of tooling and
              | features they've added to the language gives
              | developers of every realm the liberty to try to build
              | what they want.
             | 
             | Honestly, you can hold whatever opinion you want on any
             | language but your comparison really doesn't make sense.
        
           | Nullabillity wrote:
           | > To put it shortly: Writing single-threaded blocking code is
           | far easier for most people. [snip] With virtual threads that
           | problem is eliminated so we can go back to writing blocking
           | code.
           | 
           | This is the core misunderstanding/dishonesty behind the
           | Loom/Virtual Threads hype. Single-threaded blocking code is
           | easy, yes. But that ease comes from being single-threaded,
           | not from not having to await a few Futures.
           | 
           | But Loom doesn't magically solve the threading problem. It
           | hides the Futures, but that just means that you're now
           | writing a multi-threaded program, without the guardrails that
           | modern Future-aware APIs provide. It's the worst of all
           | worlds. It's the scenario that gave multi-threading such a
           | bad reputation for inscrutable failures in the first place.
        
         | gregopet wrote:
          | It's a brave attempt to free the programmer from
          | worrying, or even thinking, about thread pools and
          | blocking code. Java has gone all in - they even cancelled
          | a non-blocking rewrite of their database driver
          | architecture, because why have that if you won't have to
          | worry about blocking code? And the JVM really is a marvel
          | of engineering - it's really, really good at what it does
          | - so what team is better placed to pull this off?
         | 
         | So far, they're not quite there yet: the issue of "thread
         | pinning" is something developers still have to be aware of. I
         | hear the newest JVM version has removed a few more cases where
         | it happens, but will we ever truly 100% not have to care about
         | all that anymore?
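          | 
          | (The commonly cited pinning case is a blocking call
          | inside a "synchronized" block, addressed in JDK 24 by JEP
          | 491; the usual workaround before that, sketched here with
          | a hypothetical method, was to swap in a ReentrantLock:)

```java
import java.util.concurrent.locks.ReentrantLock;

public class PinningWorkaround {
    private final ReentrantLock lock = new ReentrantLock();

    // Blocking inside `synchronized` pins the virtual thread to its
    // carrier on older JVMs; ReentrantLock lets it unmount instead.
    public String fetchUnderLock() {
        lock.lock();
        try {
            Thread.sleep(10); // stand-in for a blocking call under the lock
            return "ok";
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return "interrupted";
        } finally {
            lock.unlock();
        }
    }
}
```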
         | 
          | I have to say things are already pretty awesome, however.
          | If you avoid the few thread pinning causes (and can avoid
          | libraries that use them - although most if not all modern
          | libraries have already adapted), you can write really
          | clean code. We had to rewrite an old app that made a huge
          | mess tracking a process where multiple event sources can
          | act independently, and virtual threads seemed the perfect
          | thing for it. Now our business logic looks more like a
          | game loop and not the complicated mix of pollers, request
          | handlers, intermediate state persisters (with their
          | endless thirst for various mappers) and whatnot that it
          | was before (granted, all those things weren't there just
          | because of threading.. the previous version was really,
          | really shittily written).
         | 
          | It's true that virtual threads sometimes hurt performance
          | (since their main benefit is cleaner, simpler code). Not
          | by much, usually, but precisely written and carefully
          | tuned performance-critical code can often still do things
          | better than automatic threading. And as a fun aside, some
          | very popular libraries assumed the developer was using
          | thread pools (before virtual threads, which non-trivial
          | Java app wasn't? - ok, nobody answer that, I'm sure there
          | are cases :D), so these libraries had performance tricks
          | (ab)using thread pool code specifics. So that's another
          | possible performance issue with virtual threads - and as
          | always with performance: don't just assume, try it and
          | measure! :P
        
           | immibis wrote:
           | So... What is it seeking to optimize? Why did you need a
           | thread pool before but not any more? What resource was
           | exhausted to prevent you from putting every request on a
           | thread?
        
             | davidgay wrote:
             | A thread per request has a high risk of overcommitting on
             | CPU use, leading to a different set of problems. Virtual
             | threads are scheduled on a fixed-size (based on number of
             | cores) underlying (non-virtual) thread pool to avoid this
             | problem.
        
               | immibis wrote:
               | Why can't virtual threads overcommit CPU use? If I have 4
               | CPUs and 4000 virtual threads running CPU-bound code, is
               | that not overcommit? A system without overcommit would
               | refuse to create the 5th thread.
        
               | detinho wrote:
               | I think parent is saying overcommit with OS threads. 4k
               | requests = 4k OS threads. That would lead to the problems
               | parent is talking about.
        
               | immibis wrote:
               | Why wouldn't 4k virtual threads lead to the same
               | problems?
        
               | troupo wrote:
                | Because they don't create 4k real threads; they can
                | be scheduled on as many OS threads as there are CPU
                | cores.
        
             | jmaker wrote:
              | Briefly: the cost of spawning schedulable entities, in
              | memory and in time to execution. Virtual threads,
              | i.e. fibers, maintain lightweight stacks. You can
              | spawn as many as you like immediately, and your
              | runtime system won't run out of memory as easily. In
              | addition, the spawning happens much faster, in user
              | space. You're not creating kernel threads, which are a
              | limited and not cheap resource - whence the pooling
              | you're comparing it to. With virtual threads you can
              | do thread-per-request explicitly. It makes the most
              | sense for IO-bound tasks.
        
             | gifflar wrote:
             | This article nicely describes the differences between
             | threads and virtual threads:
             | https://www.infoq.com/articles/java-virtual-threads/
             | 
             | I think it's definitely worth a read.
        
             | gregopet wrote:
             | It's mainly trying to make you not worry about how many
             | threads you create (and not worry about the caveats that
             | come with optimising how many threads you create, which is
             | something you are very often forced to do).
             | 
             | You can create a thread in your code and not worry whether
             | that thing will then be some day run in a huge loop or
             | receive thousands of requests and therefore spend all your
             | memory on thread overhead. Go and other languages (in
             | Java's ecosystem there's Kotlin for example) employ similar
             | mechanisms to avoid native thread overhead, but you have to
             | think about them. Like, there's tutorial code where
             | everything is nice & simple, and then there's real world
             | code where a lot of it must run in these special constructs
             | that may have little to do with what you saw in those first
             | "Hello, world" samples.
             | 
             | Java's approach tries to erase the difference between
             | virtual and real threads. The programmer shouldn't have
             | to employ any special techniques when using virtual
             | threads and should be able to use everything the
             | language has to offer (this isn't true of many
             | languages' virtual/green thread implementations). Old
             | libraries should continue working, perhaps without even
             | being aware they're being run on virtual threads
             | (although caveats do apply for low-level/high-
             | performance stuff, see above posts). And libraries that
             | you interact with don't have to care what "model" of
             | green threading you're using or specifically expose
             | "red" and "blue" functions.
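The "same API" point can be sketched in a few lines (assuming Java 21+; class and variable names are illustrative): both kinds of thread are started, joined and inspected through the ordinary java.lang.Thread API.

```java
public class SameApi {
    // Start one platform thread and one virtual thread through the same
    // Thread API; the only difference is the builder used to create them.
    public static boolean[] kinds() throws InterruptedException {
        boolean[] isVirtual = new boolean[2];
        Thread platform = Thread.ofPlatform()
                .start(() -> isVirtual[0] = Thread.currentThread().isVirtual());
        Thread virtual = Thread.ofVirtual()
                .start(() -> isVirtual[1] = Thread.currentThread().isVirtual());
        platform.join(); // same join/interrupt/etc. API for both
        virtual.join();
        return isVirtual;
    }

    public static void main(String[] args) throws InterruptedException {
        boolean[] k = kinds();
        System.out.println(k[0] + " " + k[1]); // prints: false true
    }
}
```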
        
               | giamma wrote:
               | You will still have to worry: too many virtual threads
               | will imply too much context switching. However, virtual
               | threads are always interruptible on I/O, as they are
               | not mapped to actual OS threads but rather scheduled by
               | the JVM, which executes a number of instructions for
               | each virtual thread.
               | 
               | This gives the JVM the chance to use real threads more
               | efficiently, avoiding threads sitting unused while
               | waiting on I/O (e.g. a response from a stream). As soon
               | as the JVM detects that a physical thread is blocked on
               | I/O, a semaphore, a lock or anything else, it will
               | reallocate that physical thread to run a new virtual
               | thread. This reduces latency and context-switch time
               | (the switching is done by the JVM, which already
               | globally manages the memory of the Java process in its
               | heap) and avoids, or at least largely reduces, the
               | chance that a real thread remains allocated but idle
               | because it's blocked on I/O or something else.
        
               | frant-hartm wrote:
               | What do you mean by context switching?
               | 
               | My understanding is that virtual threads mostly eliminate
               | context switching - for N CPUs JVM creates N platform
               | threads and they run virtual threads as needed. There is
               | no real context switching apart from GC and other JVM
               | internal threads.
               | 
               | A platform thread picking another virtual thread to run
               | after its current virtual thread blocks on IO is not a
               | context switch, which is an expensive OS-level
               | operation.
        
               | anonymousDan wrote:
               | Does Java's implementation of virtual threads perform any
               | kind of work stealing when a particular physical thread
               | has no virtual threads to run (e.g. they are all blocked
               | on I/O)?
        
               | mike_hearn wrote:
               | It does. They get scheduled onto the ForkJoinPool which
               | is a work stealing pool.
        
               | immibis wrote:
               | "they run virtual threads as needed" - so when one
               | virtual thread is no longer needed and another one is
               | needed, they switch context, yes?
        
               | frant-hartm wrote:
               | This is called mounting/un-mounting and is much cheaper
               | than a context switch.
        
               | immibis wrote:
               | This is a type of context switch. You are saying dollars
               | are cheaper than money.
        
               | peeters wrote:
               | It's been a really long time since I dealt with anything
               | this low level, but in my very limited and ancient
               | experience when people talk about context switching
               | they're talking specifically about the userland process
               | yielding execution back to the kernel so that the
               | processor can be reassigned to a different
               | process/thread. Naively, if the JVM isn't actually
               | yielding control back to the kernel, it has the freedom
               | to do things in a much more lightweight manner than the
               | kernel would have to.
               | 
               | So I think it's meaningful to define what we mean by
               | context switch here.
        
               | giamma wrote:
               | The JVM still needs to do context switching when
               | reallocating the real thread that was running a blocked
               | virtual thread to the next available virtual thread. It
               | isn't CPU context switching, but context switching at
               | the JVM level still has a cost.
        
               | frant-hartm wrote:
               | Ok. This JVM-level switching is called mounting/un-
               | mounting of the virtual thread and is supposed to be
               | several orders of magnitude cheaper compared to normal
               | context switch. You should be fine with millions of
               | virtual threads.
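The "millions of virtual threads" claim is easy to try. A sketch (assuming Java 21+; the count and names are arbitrary): each sleeping thread unmounts from its carrier, so a huge number of concurrently blocked threads is unremarkable.

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;

public class ManySleepers {
    // Start many virtual threads that all block in Thread.sleep at once.
    // While sleeping, each one is unmounted from its carrier thread, so
    // only a handful of OS threads are actually in use.
    public static int sleepMany(int count) throws InterruptedException {
        List<Thread> threads = new ArrayList<>(count);
        for (int i = 0; i < count; i++) {
            threads.add(Thread.ofVirtual().start(() -> {
                try {
                    Thread.sleep(Duration.ofMillis(10));
                } catch (InterruptedException ignored) {
                }
            }));
        }
        for (Thread t : threads) {
            t.join();
        }
        return threads.size();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(sleepMany(100_000)); // prints 100000
    }
}
```

The equivalent with `Thread.ofPlatform()` would typically fail or thrash at this count.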
        
               | immibis wrote:
               | It seems that the answer to the question was "memory".
               | Stack allocations, presumably. You have answered by
               | telling us that virtual threads are better than real
               | threads because real threads suck, but you didn't say why
               | they suck or why virtual threads don't suck in the same
               | way.
        
               | mike_hearn wrote:
               | Real threads don't suck but they pay a price for
               | generality. The kernel doesn't know what software you're
               | going to run, and there's no standards for how that
               | software might use the stack. So the kernel can't
               | optimize by making any assumptions.
               | 
               | Virtual threads are less general than kernel threads. If
               | you use a virtual thread to call out of the JVM you lose
               | their benefits, because the JVM becomes like the kernel
               | and can't make any assumptions about the stack.
               | 
               | But if you are running code controlled by the JVM, then
               | it becomes possible to do optimizations (mostly stack
               | related) that otherwise can't be done, because the GC and
               | the compiler and the threads runtime are all developed
               | together and work together.
               | 
               | Specifically, what HotSpot can do is move stack frames
               | to and from the heap very fast, in a way that interacts
               | well with the GC. For instance, if a virtual thread
               | resumes, iterates in a loop and suspends again, the
               | stack frames are never copied out of the heap onto the
               | kernel stack at all. HotSpot can incrementally "page"
               | stack frames out of the heap. Additionally, the storage
               | space used for a suspended virtual thread stack is a
               | lot smaller than a suspended kernel stack because a lot
               | of administrative goop doesn't need to be saved at all.
        
               | brabel wrote:
               | OS threads do not suck, they're great. But they are
               | expensive to create, as creation requires a syscall,
               | and expensive to maintain, as they consume quite a bit
               | of memory just to exist even if you don't need it: they
               | must pre-allocate a stack, which is apparently around
               | 2MB initially and can't be made much smaller, since in
               | most cases you will need even more.
               | 
               | Virtual Threads are very fast to create and allocate only
               | the memory needed by the actual call stack, which can be
               | much less than for OS Threads.
               | 
               | Also, blocking code is very simple compared to the
               | equivalent async code. So using blocking code makes your
               | code much easier to follow. Check out examples of
               | reactive frameworks for Java and you will quickly
               | understand why.
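To illustrate the point about blocking style: the sketch below (Java 21+; slowLookup is a made-up stand-in for a blocking call such as a JDBC query) runs two slow operations concurrently while each one reads as plain sequential code, with no reactive operators or callbacks.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class BlockingStyle {
    // Stand-in for a blocking call such as a JDBC query or HTTP request.
    static String slowLookup(String key) {
        try {
            Thread.sleep(50);
        } catch (InterruptedException ignored) {
        }
        return "value-for-" + key;
    }

    // Two lookups run concurrently on virtual threads; the calling code
    // simply blocks on get(), with no callbacks or operator chains.
    public static String combine() throws Exception {
        try (ExecutorService ex = Executors.newVirtualThreadPerTaskExecutor()) {
            Future<String> a = ex.submit(() -> slowLookup("a"));
            Future<String> b = ex.submit(() -> slowLookup("b"));
            return a.get() + "," + b.get();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(combine()); // prints: value-for-a,value-for-b
    }
}
```

While each lookup blocks, its virtual thread unmounts, so the two calls overlap instead of running back to back.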
        
               | kllrnohj wrote:
               | > and they're expensive to maintain as they consume quite
               | a bit of memory just to exist, even if you don't need it
               | (due to how they must pre-allocate a stack which
               | apparently is around 2MB initially,
               | 
               | I'm not familiar with Windows, but this certainly isn't
               | the case on Linux. It only costs 2MB-8MB of virtual
               | address space, not actual physical memory. And there's
               | no particular reason to believe the JVM can maintain a
               | list of threads and their states more efficiently than
               | the kernel can.
               | 
               | All you really save is the syscall to create it and some
               | context switching costs as the JVM doesn't need to deal
               | with saving/restoring registers as there's no preemption.
               | 
               | The downside though is you don't have any preemption,
               | which depending on your usage is a really fucking massive
               | downside.
        
               | Someone wrote:
               | > The downside though is you don't have any preemption,
               | which depending on your usage is a [...] massive
               | downside.
               | 
               | Nobody is taking OS threads away, so you can choose to
               | use them when they better fit your use case.
        
             | chipdart wrote:
             | > So... What is it seeking to optimize?
             | 
             | The goal is to maximize the number of tasks you can run
             | concurrently, while imposing on the developers a low
             | cognitive load to write and maintain the code.
             | 
             | > Why did you need a thread pool before but not any more?
             | 
             | You still need a thread pool, except with virtual
             | threads you are no longer bound to running a single task
             | per thread. This is especially desirable when workloads
             | are IO-bound and expected to idle while waiting for
             | external events. If you have a never-ending queue of
             | tasks waiting to run, why should you block a thread
             | consuming that queue by running a task that stays idle
             | while waiting for something to happen? You're better off
             | starting the task and setting it aside the moment it
             | waits for something.
             | 
             | > What resource was exhausted to prevent you from putting
             | every request on a thread?
        
             | twic wrote:
             | The memory overhead of threads.
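Since virtual threads remove the pool-size bound, concurrency is limited explicitly when it needs to be. A minimal sketch (Java 21+; all names and numbers are illustrative) of the semaphore approach mentioned upthread:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;

public class BoundedConcurrency {
    // Run `tasks` virtual threads but let at most `limit` of them be
    // inside the guarded section at once; returns the peak observed.
    public static int maxObserved(int tasks, int limit)
            throws InterruptedException {
        Semaphore permits = new Semaphore(limit);
        AtomicInteger inFlight = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        try (ExecutorService ex = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int i = 0; i < tasks; i++) {
                ex.submit(() -> {
                    try {
                        permits.acquire();
                        try {
                            int now = inFlight.incrementAndGet();
                            peak.accumulateAndGet(now, Math::max);
                            Thread.sleep(5); // stand-in for blocking work
                            inFlight.decrementAndGet();
                        } finally {
                            permits.release();
                        }
                    } catch (InterruptedException ignored) {
                    }
                });
            }
        } // close() waits for all tasks
        return peak.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(maxObserved(200, 8) <= 8); // prints true
    }
}
```

This mirrors sizing a traditional pool, but the limit now bounds only the contended resource rather than all concurrency.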
        
           | pragmatick wrote:
           | > although most of not all modern libraries have already
           | adapted
           | 
           | Unfortunately kafka, for example, has not:
           | https://github.com/spring-projects/spring-
           | kafka/commit/ae775...
        
           | haspok wrote:
           | Just a side note: async JDBC was a thing way before Loom
           | came about, and it failed miserably. I'm not sure why, but
           | my guess is that most enterprise software is not web-
           | scale, so JDBC worked well as it was.
           | 
           | Also, all the database vendors provided drivers
           | implementing the JDBC API - good luck getting Oracle or
           | IBM to contribute to R2DBC. (Actually, I stand corrected:
           | there is an Oracle R2DBC driver now - it was released
           | fairly recently, though.)
           | 
           | EDIT: "failed miserably" is maybe too strong - but R2DBC
           | certainly doesn't have the support and acceptance of JDBC.
        
             | frevib wrote:
             | It could also be that there just isn't enough demand for
             | a non-blocking JDBC. For example, the PostgreSQL server
             | does not cope very well with lots of simultaneous
             | connections, due (among other things) to its process-
             | per-connection model. From the client side (JDBC), a
             | small thread pool would be enough to max out the
             | PostgreSQL server, and there is almost no benefit to
             | non-blocking over a small thread pool.
        
               | haspok wrote:
               | I would argue the main benefit would be that the
               | threadpool that the developer would create anyway would
               | instead be created by the async database driver, which
               | has more intimate knowledge about the server's
               | capabilities. Maybe it knows the limits to the number of
               | connections, or can do other smart optimizations. In any
               | case, for the developer it would be a more streamlined
               | experience, with less code needed, and better defaults.
        
               | frevib wrote:
               | I think we're confusing async and non-blocking? Non-
               | blocking is the part that makes virtual threads more
               | efficient than threads. Async is the programming
               | style, e.g. doing things concurrently. Async can be
               | implemented with threads or with non-blocking IO, if
               | the API supports it. I was merely arguing that a non-
               | blocking JDBC has little merit, as the connections to
               | a DB are limited; non-blocking APIs are only
               | beneficial when there are lots of connections, > 10k.
               | 
               | JDBC knows nothing about the number of connections a
               | server can handle, other than opening connections
               | until it won't connect any more.
               | 
               | | In any case, for the developer it would be a more
               | streamlined experience, with less code needed, and better
               | defaults.
               | 
               | I agree it would be best not to bother the dev with what
               | is going on under the hood.
        
             | vbezhenar wrote:
             | R2DBC allows you to efficiently maintain millions of
             | connections to the database. But what database supports
             | millions of connections? Not Postgres for sure, and
             | probably no other conventional database. So using a
             | reactive JDBC driver makes little sense: if you're going
             | to use 1000 connections, 1000 threads will do just fine
             | and bring little overhead. Those who use Java don't care
             | about spending 100 more MB of RAM when their service
             | already eats 60GB.
        
               | merb wrote:
               | Reactive drivers were not about 1000 connections; they
               | were about reusing a single connection better, by
               | queueing a little more efficiently over that one
               | connection. Reactive programming is not about
               | parallelism, it's about concurrency.
        
         | lmm wrote:
         | > I remember saying "something" will block eventually no matter
         | what... anything from the buffer being full on the NIC to your
         | cpu being at anything less than 100%.
         | 
         | Nope. You can go async all the way down, right to the
         | electrical signals if you want. We usually impose some amount
         | of synchronous clocking/polling for sanity, at various levels,
         | but you don't have to; the world is not synchronised, the
         | fastest way to respond to a stimulus will always be to respond
         | when it happens.
         | 
         | > Does it shake out to any real advantage?
         | 
         | Of course it does - did you miss the whole C10K discussions 20+
         | years ago? Whether it matters for your business is another
         | question, but you can absolutely get a lot more throughput by
         | being nonblocking, and if you're doing request-response across
         | the Internet you generally can't afford _not_ to.
        
         | duped wrote:
         | imo the biggest difference between "virtual" threads in a
         | managed runtime and "os" threads is that the latter use a
         | fixed-size stack whereas the former can resize theirs: the
         | stack can grow on demand and shrink under pressure.
         | 
         | When you spawn an OS thread you are paying at worst the full
         | cost of it, and at best the max depth seen so far in the
         | program, and stack overflows can happen even if the program is
         | written correctly. Whereas a virtual thread can grow the stack
         | to be exactly the size it needs at any point, and when GC runs
         | it can rewrite pointers to any data on the stack safely.
         | 
         | Virtual/green/user space threads aka stackful coroutines have
         | proven to be an excellent tool for scaling concurrency in real
         | programs, while threads and processes have always played
         | catchup.
         | 
         | > "something" will block eventually no matter what...
         | 
         | The point is to allow _everything else_ to make progress while
         | that resource is busy.
         | 
         | ---
         | 
         | At a broader scale, as a programming model it lets you
         | architect programs that are designed to scale horizontally.
         | With the commoditization of compute in the cloud, that
         | means it's very easy to write a program that can be
         | distributed as I/O demand increases. In principle, a
         | "virtual" thread could be spawned on a different machine
         | entirely.
        
         | chipdart wrote:
         | > What is the virtual thread / event loop pattern seeking to
         | optimize? Is it context switching?
         | 
         | Throughput.
         | 
         | Some workloads are not CPU-bound or memory-bound, and spend the
         | bulk of their time waiting for external processes to make data
         | available.
         | 
         | If your workloads are expected to stay idle while waiting for
         | external events, you can switch to other tasks while you wait
         | for those external events to trigger.
         | 
         | This is particularly convenient if the other tasks you're
         | hoping to run are also tasks that are bound to stay idle while
         | waiting for external events.
         | 
         | One of the textbook scenarios that suits this pattern well is
         | making HTTP requests. Another one is request handlers, such as
         | the controller pattern used so often in HTTP servers.
         | 
         | Perhaps the poster child of this pattern is Node.js. It
         | might not be the performance king, and it is single-
         | threaded, but it features in the top spots of performance
         | benchmarks such as TechEmpower's. Node.js is also highly
         | favoured in function-as-a-service applications, as its
         | event-driven architecture is well suited to applications
         | involving a hefty dose of network calls running on memory-
         | and CPU-constrained systems.
        
         | pron wrote:
         | No, it optimises hardware utilisation by simply allowing more
         | tasks to concurrently make progress. This allows throughput to
         | reach the maximum the hardware allows. See
         | https://youtu.be/07V08SB1l8c.
        
         | frevib wrote:
         | They indeed optimize thread context switching. Taking a
         | thread on and off the CPU becomes expensive when there are
         | thousands of threads.
         | 
         | You are right that everything blocks; even going to L1
         | cache you have to wait about a nanosecond. But blocking in
         | this context means waiting for "real" IO, like a network
         | request or spinning-disk access. Virtual threads take away
         | the problem of the thread sitting there doing nothing for a
         | while as it waits for data, before it is context-switched.
         | 
         | Virtual threads won't improve CPU-bound blocking. There the
         | thread is actually occupying the CPU, so there is no
         | problem of the thread doing nothing, as with IO-bound
         | blocking.
        
         | kbolino wrote:
         | The hardware now is just as concurrent/parallel as the
         | software. High-end NVMe SSDs and server-grade NICs can do
         | hundreds to thousands of things simultaneously. Even if one
         | lane does get blocked, there are other lanes which are open.
        
       | tzahifadida wrote:
       | Similarly, the power of golang concurrent programming is
       | that you write non-blocking code the way you write normal
       | code. You don't have to wrap it in special functions and
       | pollute the code. Moreover, not every coder on the planet
       | knows how to handle blocking code properly, and that is the
       | main advantage: most programming languages can do anything
       | the other languages can do, but not all coders can make use
       | of it. This is why I see languages like golang as an
       | advantage.
        
         | jillesvangurp wrote:
         | Kotlin embraced the same thing via co-routines, which are
         | conceptually similar to go routines. It adds a few useful
         | concepts around this though; mainly that of a co-routine
         | context which encapsulates that a tree of co-routine calls
         | needs some notion of failure handling and cancellation.
         | Additionally, co-routines are dispatched to a dispatcher. A
         | dispatcher can be just on the same thread or actually use a
         | thread pool. Or as of recent Java versions a virtual thread
         | pool. There's actually very little point in using virtual
         | threads from Kotlin: they are basically a slightly more
         | heavyweight way of doing co-routines. The main benefit is
         | dealing with legacy blocking Java libraries.
         | 
         | But the bottom line with virtual threads, go-routines, or
         | kotlin's co-routines is that it indeed allows for imperative
         | code style code that is easy to read and understand. Of course
         | you still need to understand all the pitfalls of concurrency
         | bugs and all the weird and wonderful way things can fail to
         | work as you expect. And while Java's virtual threads are
         | designed to work like magic pixie dust, they do have some
         | nasty failure modes where a single virtual thread can end
         | up blocking all your virtual threads; a lot of
         | synchronized blocks in legacy code could cause that.
        
           | tzahifadida wrote:
           | Kotlin is not a language I learned so I will avoid
           | commenting.
           | 
           | However, my use of Java is for admin backends or
           | heavyweight services for the enterprises or startups I
           | coded for, so for my taste I can't use it without Spring
           | or JBoss, etc., and in that way I think simplicity went
           | out the window a long, long time ago :) It took me years
           | to learn all the quirks of these frameworks... and the
           | worst thing about them is that they keep changing every
           | few months...
        
             | jillesvangurp wrote:
             | Kotlin makes a lot of that stuff easier to deal with and
             | there is also a growing number of things that work without
             | Java libraries. Or even the JVM. I use it with Spring Boot.
             | But we also have a lot of kotlin-js code running in a
             | browser. And I use quite a few multiplatform libraries for
             | Kotlin that work pretty much anywhere. I've even written a
             | few myself. It's pretty easy to write portable code in
             | Kotlin these days.
             | 
             | For example ktor works on the JVM but you can also build
             | native applications with it. And I use ktor client in the
             | browser. When running in the browser it uses the browser
             | fetch API. When running on the jvm you can configure it to
             | use any of a wide range of Java http clients. On native it
             | uses curl.
        
         | juyjf_3 wrote:
         | Can we stop pretending Erlang does not exist?
         | 
         | Go is a next-gen trumpian language that rejects sum types,
         | pattern matching, non-nil pointers, and for years, generics;
         | it's unhinged.
        
           | seabrookmx wrote:
           | While I generally agree with your take that it's a regression
           | in PL design, there's no need to be inflammatory. There's
           | lots of good software written in it.
           | 
           | > pretending Erlang does not exist
           | 
           | For better or worse it doesn't to most programmers. The
           | syntax is not nearly as approachable as GoLang. Luckily
           | Elixir exists.
        
       | taspeotis wrote:
       | My rough understanding is that this is similar to async/await in
       | .NET?
       | 
       | It's a shame this article paints a neutral (or even negative)
       | experience with virtual threads.
       | 
       | We rewrote a boring CRUD app that spent 99% of its time
       | waiting for the database to respond to be async/await from
       | top to bottom. CPU and memory usage went way down on the web
       | server because so many requests could be handled by far
       | fewer threads.
        
         | jsiepkes wrote:
         | > My rough understanding is that this is similar to async/await
         | in .NET?
         | 
         | Well, somewhat, but also not really. They are green
         | threads, like async/await, but their use is more
         | transparent.
         | 
         | So there are no special "async methods". You just create a
         | virtual thread where you would normally create a (kernel)
         | "Thread" and then use it like any other thread. This works
         | because, for example, all the blocking IO APIs are
         | automatically converted to non-blocking IO under the hood.
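A sketch of that drop-in usage (Java 21+; the thread name and message are made up): a virtual thread is created where a platform Thread would be, and a blocking Thread.sleep unmounts it instead of tying up a kernel thread.

```java
public class DropInThread {
    public static String result() throws InterruptedException {
        StringBuilder out = new StringBuilder();
        // Same shape as `new Thread(runnable).start()`, just a different
        // factory; the blocking sleep is handled by the JDK scheduler.
        Thread t = Thread.ofVirtual().name("worker").start(() -> {
            try {
                Thread.sleep(10); // unmounts rather than blocking a carrier
            } catch (InterruptedException ignored) {
            }
            out.append("done on ").append(Thread.currentThread().getName());
        });
        t.join(); // the familiar join/interrupt API still applies
        return out.toString();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(result()); // prints: done on worker
    }
}
```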
        
         | devjab wrote:
         | > My rough understanding is that this is similar to async/await
         | in .NET?
         | 
         | Not really. What C# does is sort of similar, but it has
         | the disadvantage of splitting your code ecosystem into
         | non-blocking and blocking code. This means you can
         | "accidentally" start your non-blocking code, which may
         | cause a relatively simple API to consume a ridiculous
         | amount of resources. It also makes it much more
         | complicated to update and maintain your code as it grows
         | over the years. What is perhaps worse is that C# lacks an
         | interruption model.
         | 
         | Java's approach is much more modern but then it kind of had to
         | be because the JVM already supported structured concurrency
         | from Kotlin. Which means that Java's "async/await" had to work
         | in a way which wouldn't break what was already there. Because
         | Java is like that.
         | 
         | I think you can sort of view it as another example of how Java
         | has overtaken C# (for now), but I imagine C# will get an
         | improved async/await model in the next couple of years. Neither
         | approach is something you would actually chose if concurrency
         | is important to what you build and you don't have a legacy
         | reason to continue to build on Java/C# . This is because Go or
         | Erlang would be the obvious choice, but it's nice that you at
         | least have the option if your organisation is married to a
         | specific language.
        
           | delusional wrote:
           | From what I recall, and this was a while ago so bear with
           | me, Java virtual threads still have a lot of pitfalls
           | where the promise of concurrency isn't really fulfilled.
           | 
           | I seem to remember that it was some pretty basic
           | operations (like maybe read or something) that caused the
           | thread not to unmount and therefore just block the
           | underlying OS thread. At that point you've just invented
           | the world's most complicated thread pool.
        
             | za3faran wrote:
             | You're referring to thread pinning, and this is being
             | addressed.
        
             | mike_hearn wrote:
             | Reading from sockets definitely works. It'd be pretty
             | useless if it didn't.
             | 
             | Some operations that don't cause a task switch to another
             | virtual thread are:
             | 
             | - If you've called into a native library and back into Java
             | that then blocks. In practice this never happens because
             | Java code doesn't rely on native libraries or frameworks
             | that much and when it does happen it's nearly always in-
             | and-out quickly without callbacks. This can't be fixed by
             | the JVM, however.
             | 
             | - File IO. No fundamental problem here, it can be fixed,
             | it's just that not so many programs need tens of thousands
             | of threads doing async file IO.
             | 
             | - If you're holding a lock using 'synchronized'. No
             | fundamental problem here, it's just annoying because of how
             | HotSpot is implemented. They're fixing this at the moment.
             | 
             | In practice it's mostly the last one that causes issues in
             | real apps. It's not hard to work around, and eventually
             | those workarounds won't be needed anymore.
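The usual workaround for the 'synchronized' issue is to guard critical sections with java.util.concurrent locks, which let a blocked virtual thread unmount. A minimal sketch (Java 21+; the class name and counts are illustrative):

```java
import java.util.concurrent.locks.ReentrantLock;

public class LockInsteadOfSynchronized {
    private final ReentrantLock lock = new ReentrantLock();
    private int counter;

    // Instead of `synchronized (this) { ... }`, which could pin the
    // carrier thread, a ReentrantLock lets the virtual thread unmount
    // while waiting for the lock.
    public void increment() {
        lock.lock();
        try {
            counter++; // blocking calls here would not pin the carrier
        } finally {
            lock.unlock();
        }
    }

    public static int runConcurrently(int threads) throws InterruptedException {
        LockInsteadOfSynchronized shared = new LockInsteadOfSynchronized();
        Thread[] ts = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            ts[i] = Thread.ofVirtual().start(shared::increment);
        }
        for (Thread t : ts) {
            t.join(); // join establishes visibility of `counter`
        }
        return shared.counter;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(runConcurrently(1000)); // prints 1000
    }
}
```

Once the JDK change mentioned above lands, plain `synchronized` blocks should stop pinning and this rewrite becomes unnecessary.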
        
           | szundi wrote:
           | Maybe C# is going to get a new async/await model, but the
           | fragmentation of libraries and code probably cannot be
           | undone.
           | 
           | Java has the strength that they make relatively more
           | decisions about the language and the libraries that they
           | don't have to fix later. That's of great value if you're
           | not building throw-away software but SaaS or something
           | that has to live long.
        
           | za3faran wrote:
           | I would not argue that golang is the obvious choice for
           | concurrency. Java's approach is actually superior to
           | golang's. It takes it a step further by offering structured
           | concurrency[1].
           | 
           | Kotlin's design had no bearing on Java's or the JVM's
           | implementation.
           | 
           | C# has an interruption model through CancellationToken as far
           | as I'm aware.
           | 
           | [1] https://openjdk.org/jeps/453
        
           | troupo wrote:
           | Erlang, not Go, should be the obvious choice for concurrency,
           | but it's impossible to retrofit Erlang's concurrency onto
           | existing systems.
        
             | toast0 wrote:
             | As an Erlang person, from reading about Java's Virtual
             | Threads, it feels like it should get a significant portion
             | of the Erlang concurrency story.
             | 
             | With virtual threads, it seems like if you don't hit
             | gotchas, you can spawn a thread, and run straight through
             | blocking code and not worry about too many threads, etc. So
             | you could do thread per connection/user chat servers and
             | http servers and what not.
             | 
             | Yes, it's still shared memory, so you can miss out on the
             | simplifying effect of explicit communication instead of
             | shared memory communication and how that makes it easy to
             | work with remote and local communication partners. But you
             | can build a mailbox system if you want (it's not going to
             | be as nice as the built-in one, of course). I'm not sure if
             | Java virtual threads can kill each other effectively,
             | either.
        
               | troupo wrote:
               | Erlang's concurrency story isn't green threads.
               | 
               | It's (with caveats, of course):
               | 
               | - a thread crashing will not bring the system down
               | 
               | - a thread cannot hog all processing time as the system
               | ensures all threads get to run. The entire system is re-
               | entrant and execution of each thread can be suspended to
               | let other threads continue
               | 
               | - all CPU cores can and will be utilized transparently to
               | the user
               | 
               | - you can monitor a thread and if it crashes you're
               | guaranteed to receive info on why and how it crashed
               | 
               | - immutable data structures play a huge part in it, of
               | course, but the above is probably more important
               | 
               | That's why Go's concurrency is not that good, actually.
               | Goroutines are not even half-way there: an error in a
               | goroutine can panic-kill your entire program, there are
               | no good ways to monitor them etc.
        
             | morsch wrote:
             | Isn't that Akka?
        
               | troupo wrote:
               | Akka is heavily inspired by Erlang, but the underlying
               | system/VM has to provide certain guarantees for actual
               | Erlang-style concurrency to work:
               | https://news.ycombinator.com/item?id=40989995
        
           | jayd16 wrote:
           | It's foolish to say that green threads are strictly better
           | and to dismiss async/await as outdated. Async/await can do a
           | lot that green threads can't.
           | 
           | For example, you can actually share a thread with another
           | runtime.
           | 
           | Cooperative threading allows for implicit critical sections
           | that can be cumbersome in preemptive threading.
           | 
           | Async/await and virtual threads are solving different
           | problems.
           | 
           | > What is perhaps worse is that C# lacks an interruption
           | model
           | 
           | Btw, you'd just use OS threads if you really needed
           | preemptively scheduled threads. Async tasks run on top of OS
           | threads, so you get both cooperative scheduling within
           | threads and preemptive scheduling of threads onto cores.
        
           | kaba0 wrote:
           | > This is because Go or Erlang would be the obvious choice
           | 
           | Why Go? It has quite an anemic standard library for
           | concurrent data structures compared to Java, and is a less
           | expressive, and arguably worse, language on any count,
           | verbosity included.
        
         | xxs wrote:
         | >My rough understanding is that this is similar to async/await
         | in .NET?
         | 
         | No, the I/O is still blocking with respect to the application
         | code.
        
         | kimi wrote:
         | It's more like Erlang threads - they appear to be blocking, so
         | existing code will work with zero changes. But you can create a
         | gazillion of them.
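To illustrate the scale difference (a sketch, not from the thread, assuming JDK 21+): each of the 100,000 tasks below performs a blocking sleep, which would be infeasible with one OS thread apiece but is cheap with virtual threads.

```java
import java.time.Duration;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

public class GazillionThreads {
    public static void main(String[] args) {
        var finished = new AtomicInteger();
        // 100,000 virtual threads, each blocking in Thread.sleep;
        // the JDK multiplexes them over a small pool of carrier threads.
        try (var exec = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int i = 0; i < 100_000; i++) {
                exec.submit(() -> {
                    Thread.sleep(Duration.ofMillis(50));
                    finished.incrementAndGet();
                    return null;
                });
            }
        } // implicit close() awaits all tasks
        System.out.println(finished.get()); // 100000
    }
}
```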
        
         | he0001 wrote:
         | > My rough understanding is that this is similar to async/await
         | in .NET?
         | 
         | The biggest difference is that C# async/await code is rewritten
         | by the compiler into a state machine to make it async. This
         | means that you see artifacts in the stack that weren't there
         | when you wrote the code.
         | 
         | There are no rewrites with virtual threads and the code is
         | presented on the stack just as you write it.
         | 
         | They solve the same problem but in very different ways.
        
           | pansa2 wrote:
           | > _They solve the same problem but in very different ways._
           | 
           | Yes. Async/await is stackless, which leads to the "coloured
           | functions" problem (because it can only suspend function
           | calls one-by-one). Threads are stackful (the whole stack can
           | be suspended at once), which avoids the issue.
        
           | jayd16 wrote:
           | There is overlap but they really don't solve the same
           | problem. Cooperative threading has its own advantages and
           | patterns that won't be served by virtual threads.
        
             | he0001 wrote:
             | What patterns does async/await solve which virtual threads
             | don't?
        
               | neonsunset wrote:
               | "Green Threads" as implemented in Java is a solution that
               | solves only a single problem - blocking/multiplexing.
               | 
               | It does not enable easy concurrency and task/future
               | composition the way C#/JS/Rust do, which offer a strictly
               | better and more comprehensive model.
        
               | jayd16 wrote:
               | If you need to be explicit about thread contexts because
               | you're using a thread that's bound to some other runtime
               | (say, a GL Context) or you simply want to use a single
               | thread for synchronization like is common in UI
               | programming with a Main/UI Thread, async/await does quite
               | well. The async/await sugar ends up being a better devx
               | than thread locking and implicit threading just doesn't
               | cut it.
               | 
               | In Java they're working on a structured concurrency
               | library to bridge this gap, but IMO, it'll end up looking
               | like async/await with all its ups and downs but with less
               | sugar.
        
         | peteri wrote:
         | It's a different model. Microsoft did work on green threads a
         | while ago and decided against continuing.
         | 
         | Links:
         | 
         | https://github.com/dotnet/runtimelab/issues/2398
         | 
         | https://github.com/dotnet/runtimelab/blob/feature/green-thre...
        
           | pjmlp wrote:
           | It should be pointed out, that the main reason they didn't go
           | further was because of added complexity in .NET, when
           | async/await already exists.
           | 
           | > Green threads introduce a completely new async programming
           | model. The interaction between green threads and the existing
           | async model is quite complex for .NET developers. For
           | example, invoking async methods from green thread code
           | requires a sync-over-async code pattern that is a very poor
           | choice if the code is executed on a regular thread.
           | 
           | Also to note that even the current model is complex enough to
           | warrant a FAQ,
           | 
           | https://devblogs.microsoft.com/dotnet/configureawait-faq
           | 
           | https://github.com/davidfowl/AspNetCoreDiagnosticScenarios/b.
           | ..
        
             | neonsunset wrote:
             | This FAQ is a bit outdated in places, and is not something
             | most users should worry about in practice.
             | 
             | JVM Green Threads here serve predominantly back-end
             | scenarios, where most of the items on the list are not of
             | concern. This list also exists to address bad habits that
             | carried over from before the tasks were introduced, many
             | years ago.
             | 
             | In general, the perceived want of green threads is in part
             | caused by misunderstanding of that one bad article about
             | function coloring - which, incidentally, does not even talk
             | about the way you do async in C#.
             | 
             | Async/await in C# on the back end is a very easy model to
             | work with, with an explicit understanding of whether a
             | method returns an operation that promises to complete in
             | the future, and composing tasks[0] for easy (massive)
             | concurrency is significantly more idiomatic than doing so
             | with green threads or the completable futures that existed
             | in Java before
             | these. And as evidenced by the adoption of green threads by
             | large-scale Java projects, it turns out the failure modes
             | share similarities, except green threads end up violating
             | way more expectations, and the code author may not have any
             | indication or explicit mechanism to address this, like
             | using AsyncLocal.
             | 
             | Also one change to look for is "Runtime Handled Tasks"
             | project in .NET that will replace Roslyn-generated state
             | machine code with runtime-provided suspension mechanism
             | which will only ever suspend at true suspension points
             | where task's execution actually yields asynchronously. So
             | far numbers show at least 5x decrease in overhead, which is
             | massive and will bring performance of computation heavy
             | async paths in line with sync ones:
             | 
             | https://github.com/dotnet/runtimelab/blob/feature/async2-ex
             | p...
             | 
             | Note that you were trivially able to have millions of
             | scheduled tasks even before that as they are very
             | lightweight.
             | 
             | [0]: e.g. sending requests in parallel is just this:
             | 
             |     using var http = new HttpClient() {
             |         BaseAddress = new("https://news.ycombinator.com/news")
             |     };
             |     var requests = Enumerable
             |         .Range(1, 4)
             |         .Select(n => $"?p={n}")
             |         .Select(http.GetStringAsync);
             |     var pages = await Task.WhenAll(requests);
        
               | ffsm8 wrote:
               | I don't think that this would be a good showcase for
               | Virtual Threads. The "async" API for Java is
               | CompletableFuture, right? That's been stable for
               | something like 10 years, so no real change since Java 8.
               | 
               | You'd just have to define a ThreadPool with n threads
               | before, where each request would've blocked one pending
               | thread. Now it just keeps going.
               | 
               | So your equivalent Java example should've been something
               | like this, but again: the CompletableFuture API is
               | pretty old at this point.
               | 
               |     @HttpExchange(value = "https://news.ycombinator.com")
               |     interface HnClient {
               |         @GetExchange("news?p={page}")
               |         CompletableFuture<String> getNews(
               |             @PathVariable("page") Integer page);
               |     }
               | 
               |     @RequiredArgsConstructor
               |     @Service
               |     class HnService {
               |         private final HnClient hnClient;
               | 
               |         List<String> getNews() {
               |             var requests = IntStream.rangeClosed(1, 4)
               |                 .boxed().map(hnClient::getNews).toList();
               |             return requests.stream()
               |                 .map(CompletableFuture::join).toList();
               |         }
               |     }
        
               | vips7L wrote:
               | Structured concurrency is still being developed:
               | https://openjdk.org/jeps/453
               | 
               | Also, I wouldnt consider that the equivalent Java code.
               | That is all Spring and Lombok magic. Just write the code
               | and just use java.net.HttpClient.
        
               | ffsm8 wrote:
               | > and just use java.net.HttpClient.
               | 
               | No.
        
               | no_wizard wrote:
               | it might be obvious to others, but why the 'No'?
        
               | vips7L wrote:
               | The standard http client doesn't have as great of UX as
               | other community libs. Most of us (including me) don't
               | like to use it.
               | 
               | That being said, imo you can't call something equivalent
               | when doing a bunch of Spring magic. This disregards that
               | OP's logic isn't equivalent at all. It waits for each
               | future one by one instead of doing something like
               | CompletableFuture.allOf or, in JS, Promise.all.
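The `allOf` pattern the comment refers to can be sketched like this (the `supplyAsync` bodies stand in for real HTTP calls):

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;

public class AllOfSketch {
    public static void main(String[] args) {
        List<CompletableFuture<String>> futures = List.of(
                CompletableFuture.supplyAsync(() -> "page1"),
                CompletableFuture.supplyAsync(() -> "page2"),
                CompletableFuture.supplyAsync(() -> "page3"));

        // Wait for all futures as one unit, analogous to JS Promise.all;
        // the per-future join() calls below then return immediately.
        CompletableFuture.allOf(futures.toArray(CompletableFuture[]::new)).join();

        List<String> pages = futures.stream()
                .map(CompletableFuture::join)
                .toList();
        System.out.println(pages); // [page1, page2, page3]
    }
}
```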
        
               | no_wizard wrote:
               | I take your point about the aforementioned article[0][1]
               | being a popular reference when discussing async/await
               | (and, to a lesser extent, async programming in modern
               | languages more generally), but I think its popularity
               | highlights the fact that it is a pain point for folks.
               | 
               | Take, for instance, Go. It is well liked in part because
               | it's so easy to do concurrency with goroutines: they're
               | easy to reason about, easy to call, easy to write, and,
               | for how much heavy lifting they do, relatively simple to
               | understand.
               | 
               | The reason Java is getting a lot of kudos here for its
               | implementation of green threads is exactly the same
               | reason people talk about Go being an easy language to use
               | for concurrency: it doesn't gate code behind specialized
               | idioms, syntax, or features that are specific to
               | asynchronous work. Rather, it largely uses the same
               | idioms and syntax as synchronous code, and is therefore
               | easier to reason about, adopt, and, as I think history is
               | starting to show, to use.
               | 
               | Java is taking an approach paved by Go, and ultimately I
               | think it's the right choice: having worked extensively
               | with C# and other languages that use async/await, there
               | are simply fewer footguns for the average developer to
               | hit when you reduce the surface area of having to
               | understand async/sync boundaries.
               | 
               | [0]: https://journal.stuffwithstuff.com/2015/02/01/what-
               | color-is-...
               | 
               | [1]: HN discussion:
               | https://news.ycombinator.com/item?id=8984648
        
               | neonsunset wrote:
               | Green Threads _increase_ the footgun count, as methods
               | which return tasks are rather explicit about their
               | nature. The domain of async/await is well-studied, and
               | it enables crucial patterns whose UX, as in my previous
               | example, Green Threads do nothing to improve. This also
               | applies to the Go approach, which expects you to use
               | channels; those have their own plethora of footguns,
               | even for things trivially solved by firing off a couple
               | of tasks and awaiting their result. In Go, you are also
               | expected to use explicit synchronization primitives for
               | trivial concurrent code that requires no cognitive
               | effort in C# whatsoever. C# does have channels that work
               | well, but it turns out you rarely need them when you can
               | just write simple task-based code instead.
               | 
               | I'm tired of this, that one article is bad, and
               | incorrect, and promotes straight-up harmful intuition and
               | probably sets the industry in terms of concurrent and
               | asynchronous programming back by 10 years in the same way
               | misinterpreting Donald Knuth's quote did in terms of
               | performance.
        
               | kaba0 wrote:
               | That's a very simplistic view, especially given that Java
               | does/will provide "structured concurrency" as something
               | analogous to structured control flow, vs gotos.
               | 
               | Also, nothing prevents you from building your own, more
               | limited but safer (the two always come together!)
               | abstraction on top, but you couldn't express Loom on
               | async as the primitive.
        
             | jayd16 wrote:
             | It would break a lot of the native interop and UI code devx
             | of the language. Java was never as nice in those categories
             | so it had less to lose going this path.
        
         | fulafel wrote:
         | Can you expand on how the benefit in your rewrite came about?
         | Threads don't consume CPU when they're waiting for the DB,
         | after all. And threads share memory with each other.
         | 
         | (I guess scaling to ridiculous levels you could be approaching
         | trouble if you have O(100k) outstanding DB queries per
         | application server - hope you have a DB that can handle
         | millions of outstanding DB queries then!)
        
           | segfaltnh wrote:
           | In large numbers the cost of switching between threads does
           | consume CPU while they're waiting for the database. This is
           | why green threads exist, to have large numbers of in flight
           | work executing over a smaller number of OS threads.
        
             | fulafel wrote:
             | When using OS threads, there's no switching when they are
             | waiting for a socket (db connection). The OS knows to wake
             | the thread up only when there's something new to see on the
             | connection.
        
       | pansa2 wrote:
       | Are these Virtual Threads the feature that was previously known
       | as "Project Loom"? Lightweight threads, more-or-less equivalent
       | to Go's goroutines?
        
         | Skinney wrote:
         | Yes
        
         | giamma wrote:
         | Yes, at EclipseCon 2022 an Oracle manager working on the
         | Helidon framework presented their results replacing the Helidon
         | core, which was based on Netty (and reactive programming), with
         | Virtual Threads (using imperative programming) [1].
         | 
         | Unfortunately the slides from that presentation were not
         | uploaded to the conference site, but this article summarizes
         | [2] the most significant metrics. The Oracle guy claimed that
         | by using Virtual Threads Oracle was able to implement, using
         | imperative Java, a new engine for Helidon (called Nima) that
         | had identical performance to the old engine based on Netty,
         | which is (at least in Oracle's opinion) the top performing
         | reactive HTTP engine.
         | 
         | The conclusion of the presentation was that based on Oracle's
         | experience imperative code is much easier to write, read and
         | maintain with respect to reactive code. Given the identical
         | performance achieved with Virtual Threads, Oracle was going to
         | abandon reactive programming in favor of imperative programming
         | and virtual threads in all its products.
         | 
         | [1] https://www.eclipsecon.org/2022/sessions/helidon-nima-
         | loom-b...
         | 
         | [2] https://medium.com/helidon/helidon-n%C3%ADma-helidon-on-
         | virt...
        
         | pgwhalen wrote:
         | Yes. It's not that the feature was previously known under a
         | different name - Project Loom is the OpenJDK project, and
         | Virtual Threads are the main feature that has come out of that
         | project.
        
         | tomp wrote:
         | They're not equivalent to Go's goroutines.
         | 
         | Go's goroutines are preemptive (and Go's development team went
         | through a lot of pain to make them such).
         | 
         | Java's lightweight threads aren't.
         | 
         | Java's repeating the same mistakes that Go made (and learned
         | from) 10 years ago.
        
           | jayd16 wrote:
           | Virtual threads could be scheduled pre-emptively but
           | currently the scheduler will wait for some kind of thread
           | sleep to schedule another virtual thread. That's just a
           | scheduler implementation detail and the spec is such that a
           | time slice scheduler could be implemented.
        
             | tomp wrote:
             | Yes, but the problem is that the spec is such that
             | preemptive blocking doesn't _need_ to be implemented.
             | 
             | That means that Java programmers have to be very careful
             | when writing code, lest they block the entire underlying
             | (OS) thread!
             | 
             | Again, Go already went through that experience. It was
             | painful. Java should have learned and implemented it from
             | the start.
        
               | jayd16 wrote:
               | I don't know. The language already has Thread.yield(). If
               | your use case is such that you have starvation and you
               | care about it, it seems trivial to work around.
               | 
               | Still, an annoying gotcha if it hits you unexpectedly.
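A sketch of that workaround (hypothetical loop, assuming JDK 21+): a CPU-bound virtual thread can insert explicit `Thread.yield()` scheduling points so other virtual threads on the same carrier get a turn.

```java
import java.util.concurrent.atomic.AtomicLong;

public class YieldSketch {
    public static void main(String[] args) throws InterruptedException {
        var result = new AtomicLong();
        Thread t = Thread.ofVirtual().start(() -> {
            long sum = 0;
            for (int i = 0; i < 10_000_000; i++) {
                sum += i;
                // Cooperative scheduling point: with no blocking calls in
                // the loop, this is where other virtual threads can run.
                if (i % 1_000_000 == 0) Thread.yield();
            }
            result.set(sum);
        });
        t.join();
        System.out.println(result.get()); // 49999995000000
    }
}
```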
        
             | nimish wrote:
             | This is a really unfortunate gotcha that's not at all
             | obvious. Does it kick preemption up a layer to the OS then?
        
               | Jtsummers wrote:
               | The "not at all obvious" gotcha is described in the
               | documentation near the top, under the heading "What is a
               | Virtual Thread?":
               | 
               | https://docs.oracle.com/en/java/javase/21/core/virtual-
               | threa...
               | 
               | > Like a platform thread, a virtual thread is also an
               | instance of java.lang.Thread. However, a virtual thread
               | isn't tied to a specific OS thread. A virtual thread
               | still runs code on an OS thread. However, when code
               | running in a virtual thread calls a blocking I/O
               | operation, the Java runtime suspends the virtual thread
               | until it can be resumed. The OS thread associated with
               | the suspended virtual thread is now free to perform
               | operations for other virtual threads.
               | 
               | It's not been hidden at all in their presentation on
               | virtual threads.
               | 
               | The OS thread that the virtual thread is mounted to can
               | still be preempted, but that won't free up the OS thread
               | for another virtual thread. However, if you use them for
               | what they're intended for this shouldn't be a problem. In
               | practice, it will be because no one can be bothered to
               | RTFM.
        
       | LinXitoW wrote:
       | From my very limited exposure to virtual threads and the older
       | solution (thread pools), the biggest hurdle was the extensive use
       | of ThreadLocals by most popular libraries.
       | 
       | In one project I had to basically turn a reactive framework into
       | a one thread per request framework, because passing around the
       | MDC (a kv map of extra logging information) was a horrible pain.
       | Getting it to actually jump ship from thread to thread AND
       | deleting it at the correct time was basically impossible.
       | 
       | Has that improved yet?
        
         | bberrry wrote:
         | If you are already in a reactive framework, why would you
         | change to virtual threads? Those frameworks pool threads and
         | have their own event loop so I would say they are not suitable
         | for virtual thread migration.
        
           | brabel wrote:
           | Yes, if you're happy with the reactive frameworks there's no
           | reason to migrate. Most people, however, would love to remove
           | their complexity from their code bases. Virtual Threads are
           | much, much easier to program with. There are downsides, like
           | not being able to easily limit concurrency, having to
           | implement your own timeout mechanisms, etc., but that will
           | probably be provided by a common lib sooner or later which
           | hopefully provides identical features to reactive frameworks,
           | while being much, much simpler.
        
         | vbezhenar wrote:
         | What do you mean by hurdle? ThreadLocals work just fine with
         | virtual threads.
        
           | brabel wrote:
           | It's not recommended though.
           | 
           | See https://openjdk.org/jeps/429
           | 
           | If you keep ThreadLocal variables, they get inherited by
           | child Threads. If you make many thousands of them, the memory
           | footprint becomes completely unacceptable. If the memory used
           | by ThreadLocal variables is large, it also makes it more
           | expensive to create new Threads (virtual or not), so you lose
           | most advantages of Virtual Threads by doing that.
        
             | bberrry wrote:
             | I don't think that's correct. ThreadLocals should behave
             | just like on regular OS threads; the difference is that you
             | can suddenly create millions of them.
             | 
             | You used to be able to depend on OS threads getting reused
             | because you were pooling them. You can do the same with
             | virtual threads if you wish and you will get the same
             | behavior. The difference is we ought to spawn new threads
             | per task now.
             | 
             | Side note, you have to specifically use
             | InheritableThreadLocal to get the inheritance behavior you
             | speak of.
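The distinction can be demonstrated in a few lines (a sketch assuming JDK 21+): a plain `ThreadLocal` value is invisible in a child thread, while an `InheritableThreadLocal` value is copied at thread creation.

```java
public class InheritanceSketch {
    static final ThreadLocal<String> plain = new ThreadLocal<>();
    static final ThreadLocal<String> inheritable = new InheritableThreadLocal<>();

    public static void main(String[] args) throws InterruptedException {
        plain.set("parent");
        inheritable.set("parent");

        var seen = new String[2];
        // Virtual threads inherit inheritable thread-locals by default.
        Thread child = Thread.ofVirtual().start(() -> {
            seen[0] = plain.get();       // null: not inherited
            seen[1] = inheritable.get(); // "parent": copied at creation
        });
        child.join();
        System.out.println(seen[0] + " / " + seen[1]); // null / parent
    }
}
```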
        
         | joshlemer wrote:
         | I faced this issue once. I solved it by creating a
         | wrapping/delegating Executor, which would capture the MDC from
         | the scheduling thread at schedule-time, and then at execute-
         | time, set the MDC for the executing thread, and then clear the
         | MDC after the execution completes. Something like...
         | class MyExecutor implements Executor {
         |     private final Executor delegate;
         | 
         |     public MyExecutor(Executor delegate) {
         |         this.delegate = delegate;
         |     }
         | 
         |     @Override
         |     public void execute(@NotNull Runnable command) {
         |         var mdc = MDC.getCopyOfContextMap();
         |         delegate.execute(() -> {
         |             MDC.setContextMap(mdc);
         |             try {
         |                 command.run();
         |             } finally {
         |                 MDC.clear();
         |             }
         |         });
         |     }
         | }
        
       | davidtos wrote:
       | I did some similar testing a few days ago[1]. Comparing platform
       | threads to virtual threads doing API calls. They mention the
       | right conditions like having high task delays, but it also
       | depends on what the task is. Thread.sleep(1) performs better on
       | virtual threads than platform threads, but a REST call taking a
       | few ms performs worse.
       | 
       | [1] https://davidvlijmincx.com/posts/virtual-thread-
       | performance-...
        
       | pron wrote:
       | Virtual threads do one thing: they allow creating lots of
       | threads. This helps throughput due to Little's law [1]. But
       | because this server here saturates the CPU with only a few
       | threads (it doesn't do the fanout modern servers tend to do),
       | this means that no significant improvements can be provided by
       | virtual threads (or asynchronous programming, which operates on
       | the same principle) _while keeping everything else in the system
       | the same_ , especially since everything else in that server was
       | optimised for over two decades under the constraints of expensive
       | threads (such as the deployment strategy to many small instances
       | with little CPU).
       | 
       | So it looks like their goal was: try adopting a new technology
       | without changing any of the aspects designed for an old
       | technology and optimised around it.
       | 
       | [1]: https://youtu.be/07V08SB1l8c
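For reference, Little's law relates the average number of in-flight requests $L$, the throughput $\lambda$, and the average time $W$ a request spends in the system:

```latex
L = \lambda W \qquad\Longrightarrow\qquad \lambda = \frac{L}{W}
```

If each request waits, say, 50 ms on downstream services ($W$ fixed), throughput can only grow by raising $L$, i.e. by running more concurrent threads, which is the one thing virtual threads make cheap.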
        
         | hitekker wrote:
         | This take sounds reasonable to me. But I'm not an expert, and
         | I'd be curious to hear an opposing view if there's one.
        
           | kaba0 wrote:
           | He is as much of an expert as it gets, as he is the leader of
           | the Loom project.
        
           | binary132 wrote:
           | Greenlets ultimately have to be scheduled onto system threads
           | at the end of the day unless you have a lightweight thread
           | model of some sort supported by the OS, so it's a little bit
           | misleading depending on how far down the stack you want to
           | think about optimizing for greenlets. You could potentially
           | have a poor implementation of task scheduling for some legacy
           | compatibility reason, however. I guess I'd be curious about
           | the specifics of what pron is discussing.
        
             | troupo wrote:
             | Even though yes, in the end you have to map onto system
             | threads, there are still quite a few things you can do. But
             | this is infeasible for Java, unfortunately.
             | 
             | For example, in Erlang the entire VM is built around green
             | threads with a huge amount of guarantees and mechanisms:
             | https://news.ycombinator.com/item?id=40989995
             | 
             | When your entire system is optimized for green threads, the
             | question of "it still needs to map onto OS threads" loses
             | its significance
        
           | michaelt wrote:
            | Standard/OS threads in Java reserve about a megabyte of
            | stack per thread by default, so running 256 threads reserves
            | about 256 MB of memory before you've even started allocating
            | things on the heap.
           | 
           | Virtual threads are therefore useful if you're writing
           | something like a proxy server, where you want to allow lots
           | of concurrent connections, and you want to use the familiar
           | thread-per-connection programming model.
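A minimal sketch of the point michaelt is making, assuming JDK 21+; the thread count and sleep duration are illustrative. Ten thousand platform threads would reserve roughly 10 GB of stack at the ~1 MB default, while virtual thread stacks live on the heap and grow on demand:

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;

public class ManyVirtualThreads {
    public static void main(String[] args) throws InterruptedException {
        int n = 10_000;
        List<Thread> threads = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            // Thread.ofVirtual().start(...) creates and starts a virtual thread.
            threads.add(Thread.ofVirtual().start(() -> {
                try {
                    // Sleeping parks the virtual thread and frees its
                    // carrier (OS) thread for other work.
                    Thread.sleep(Duration.ofMillis(100));
                } catch (InterruptedException ignored) {
                }
            }));
        }
        for (Thread t : threads) {
            t.join();
        }
        System.out.println("done: " + n);
    }
}
```

The same loop with `Thread.ofPlatform()` would work but would create 10,000 OS threads, which is exactly the scaling wall the thread-per-connection model hits without virtual threads.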
        
         | jayceedenton wrote:
         | I guess at least their work has confirmed what we probably
         | already knew intuitively: if you have CPU-intensive tasks,
         | without waiting on anything, and you want to execute these
         | concurrently, use traditional threads.
         | 
          | The advice "don't use virtual threads for that, it will be
          | inefficient" really did need some evidence behind it.
         | 
         | Mildly infuriating though that people may read this and think
         | that somehow the JVM has problems in its virtual thread
         | implementation. I admit their 'Unexpected findings' section is
         | very useful work, but the moral of this story is: don't use
          | virtual threads for things they were not intended for. Use
         | them when you want a very large number of processes executing
         | concurrently, those processes have idle stages, and you want a
         | simpler model to program with than other kinds of async.
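For the CPU-intensive case the parent describes, a sketch of the conventional alternative, assuming JDK 19+ for the auto-closing executor; the workload is a stand-in. With nothing to park on, a platform-thread pool sized to the core count is the usual choice:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class CpuBoundPool {
    public static void main(String[] args) throws Exception {
        int cores = Runtime.getRuntime().availableProcessors();
        // CPU-bound tasks never block, so virtual threads buy nothing here;
        // a fixed pool of platform threads matches the hardware parallelism.
        try (ExecutorService pool = Executors.newFixedThreadPool(cores)) {
            Future<Long> f = pool.submit(() -> {
                long sum = 0;
                for (long i = 1; i <= 1_000_000; i++) {
                    sum += i; // purely compute-bound work
                }
                return sum;
            });
            System.out.println(f.get()); // 500000500000
        }
    }
}
```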
        
       | bberrry wrote:
       | I don't understand these benchmarks at all. How could it possibly
       | take virtual threads 40-50 seconds to reach maximum throughput
        | when a batch of tasks is submitted all at once?
        
       | cayhorstmann wrote:
       | I looked at the replication instructions at
       | https://github.com/blueperf/demo-vt-issues/tree/main, which
        | reference this project:
        | https://github.com/blueperf/acmeair-authservice-java/tree/ma...
       | 
       | What "CPU-intensive apps" did they test with? Surely not acmeair-
       | authservice-java. A request does next to nothing. It
       | authenticates a user and generates a token. I thought it at least
       | connects to some auth provider, but if I understand it correctly,
        | it just uses a test config with a single test user
        | (https://openliberty.io/docs/latest/reference/config/quickSta...).
        | Which would
       | not be a blocking call.
       | 
       | If the request tasks don't block, this is not an interesting
       | benchmark. Using virtual threads for non-blocking tasks is not
       | useful.
       | 
       | So, let's hope that some of the tests were with tasks that block.
       | The authors describe that a modest number of concurrent requests
       | (< 10K) didn't show the increase in throughput that virtual
       | threads promise. That's not a lot of concurrent requests, but one
       | would expect an improvement in throughput once the number of
       | concurrent requests exceeds the pool size. Except that may be
       | hard to see because OpenLiberty's default is to keep spawning new
        | threads
        | (https://openliberty.io/blog/2019/04/03/liberty-threadpool-au...).
        | I would imagine that in actual deployments
       | with high concurrency, the pool size will be limited, to prevent
       | the app from running out of memory.
       | 
       | If it never gets to the point where the number of concurrent
       | requests significantly exceeds the pool size, this is not an
       | interesting benchmark either.
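The pool-size limiting the parent expects can be expressed without a thread pool at all, as a sketch assuming JDK 21+; the permit count, task count, and sleep are illustrative. With virtual threads the thread count is no longer the bottleneck, so admission control moves to an explicit semaphore:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;

public class BoundedVirtualThreads {
    public static void main(String[] args) {
        Semaphore permits = new Semaphore(100); // at most 100 tasks in flight
        AtomicInteger completed = new AtomicInteger();
        try (ExecutorService pool = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int i = 0; i < 1_000; i++) {
                pool.submit(() -> {
                    try {
                        permits.acquire();  // bounds concurrent blocking work
                        Thread.sleep(5);    // stand-in for a blocking call
                        completed.incrementAndGet();
                    } catch (InterruptedException ignored) {
                    } finally {
                        permits.release();
                    }
                });
            }
        } // close() waits for all submitted tasks to finish
        System.out.println(completed.get());
    }
}
```

This is the same sizing decision as a traditional thread pool, which is exactly the point made upthread: the config knob moves, it doesn't disappear.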
        
       ___________________________________________________________________
       (page generated 2024-07-17 23:07 UTC)