[HN Gopher] Java Virtual Threads: A Case Study
___________________________________________________________________
Java Virtual Threads: A Case Study
Author : mighty_plant
Score : 148 points
Date : 2024-07-14 05:52 UTC (3 days ago)
(HTM) web link (www.infoq.com)
(TXT) w3m dump (www.infoq.com)
| exabrial wrote:
| What is the virtual thread / event loop pattern seeking to
| optimize? Is it context switching?
|
| A number of years ago I remember trying to have a sane discussion
| about "non blocking" and I remember saying "something" will block
| eventually no matter what... anything from the buffer being full
| on the NIC to your cpu being at anything less than 100%. Does it
| shake out to any real advantage?
| kevingadd wrote:
| One of the main reasons to do virtual threads is that it allows
| you to write naive "thread per request" code and still scale up
| significantly without hitting the kind of scaling limits you
| would with OS threads.
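The naive "thread per request" shape described above can be sketched with one virtual thread per accepted connection. This is a minimal illustration assuming Java 21+; the class and method names are invented for the example:

```java
import java.io.*;
import java.net.*;

// Sketch: naive blocking "thread per request", one virtual thread per
// connection. No pool sizing; each handler just blocks as needed.
public class VirtualThreadServer {
    // Serve exactly n connections, echoing one line back to each client.
    static void serve(ServerSocket server, int n) throws IOException {
        for (int i = 0; i < n; i++) {
            Socket sock = server.accept();
            Thread.ofVirtual().start(() -> {     // cheap: no OS thread per request
                try (sock;
                     var in = new BufferedReader(new InputStreamReader(sock.getInputStream()));
                     var out = new PrintWriter(sock.getOutputStream(), true)) {
                    out.println("echo: " + in.readLine());  // plain blocking I/O
                } catch (IOException ignored) {}
            });
        }
    }

    public static void main(String[] args) throws Exception {
        try (ServerSocket server = new ServerSocket(0)) {
            Thread.ofVirtual().start(() -> {
                try { serve(server, 1); } catch (IOException ignored) {}
            });
            try (Socket c = new Socket("localhost", server.getLocalPort());
                 var out = new PrintWriter(c.getOutputStream(), true);
                 var in = new BufferedReader(new InputStreamReader(c.getInputStream()))) {
                out.println("hi");
                System.out.println(in.readLine());
            }
        }
    }
}
```

The handler code is identical to what you would write for platform threads; only the thread factory changes.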
| hashmash wrote:
| The problem with the naive design is that even with virtual
| threads, you risk running out of (heap) memory if the threads
| ever block. Each task makes a bit of progress, allocates some
| objects, and then lets another one do the same thing.
|
| With virtual threads, you can limit the damage by using a
| semaphore, but you still need to tune the size. This isn't
| much different than sizing a traditional thread pool, and so
| I'm not sure what benefit virtual threads will really have in
| practice. You're swapping one config for another.
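The semaphore-based limit the comment describes might look like the following. A minimal sketch assuming Java 21+; the names and the 5 ms "I/O" are invented:

```java
import java.util.concurrent.*;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch: bound the concurrency of unpooled virtual threads with a
// Semaphore, instead of sizing a traditional thread pool.
public class BoundedVirtualThreads {
    // Run nTasks tasks, at most maxConcurrent in flight at once;
    // return the highest concurrency actually observed.
    static int run(int nTasks, int maxConcurrent) throws InterruptedException {
        Semaphore permits = new Semaphore(maxConcurrent);
        AtomicInteger inFlight = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        try (var exec = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int i = 0; i < nTasks; i++) {
                exec.submit(() -> {
                    permits.acquire();               // the back-pressure point
                    try {
                        int now = inFlight.incrementAndGet();
                        peak.accumulateAndGet(now, Math::max);
                        Thread.sleep(5);             // pretend to do blocking I/O
                        inFlight.decrementAndGet();
                    } finally {
                        permits.release();
                    }
                    return null;
                });
            }
        } // close() waits for all submitted tasks to finish
        return peak.get();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(run(100, 8)); // never exceeds 8
    }
}
```

As the comment says, `maxConcurrent` still has to be tuned, much like a pool size; what changes is that the limit sits on the resource, not on thread creation.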
| immibis wrote:
| Async does exactly the same thing, by the way.
| initplus wrote:
| The benefits from virtual threads come from the simple API
| that it presents to the programmer. It's not a performance
| optimization.
| hashmash wrote:
| But that same benefit was always available with platform
| threads -- a simple API. What is the real gain by using
| virtual threads? It's either going to be performance or
| memory utilization.
| groestl wrote:
| It's combining the benefits of async models (state
| machines separated from OS threads, thus more optimal for
| I/O-bound workloads) with the benefits of proper
| threading models (namely the simpler human interface).
|
| Memory utilization & performance are going to be similar
| to the async callback mess.
| hashmash wrote:
| Why is an async model better than using OS threads for an
| I/O bound workload? The OS is doing async stuff
| internally and shielding the complexity with threads.
| With virtual threads this work has shifted to the JVM.
| Can the JVM do threads better than the OS?
| adgjlsfhk1 wrote:
| It can do a much better job because there isn't a
| security boundary. OS thread scheduling requires
| syscalls and invalidates a bunch of cache to prevent
| timing leaks.
| zokier wrote:
| > Can the JVM do threads better than the OS?
|
| Yes. The JVM has far more opportunities for optimizing
| threads because it doesn't need to uphold 50 years of
| accumulated invariants and compatibility that current
| OSes do, and the JVM has more visibility into the
| application internals.
| mrsilencedogood wrote:
| "Why is an async model better than using OS threads for
| an I/O bound workload?"
|
| Because evented/callback-driven code is a nightmare to
| reason about and breaks lots of very basic tools, like
| the humble stack trace.
|
| Another big thing for me is resource management -
| try/finally don't work across callback boundaries, but do
| work within a virtual thread. I recently ported a netty-
| based evented system to virtual threads and a very long-
| standing issue - resource leakage - turned into one very
| nice try/finally block.
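The try/finally point can be shown in a few lines. A sketch assuming Java 21+, with invented names; the release runs in the same frame even though the virtual thread parks in between:

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Sketch: in a virtual thread, acquire/release stays inside one
// try/finally around blocking calls, instead of being split across
// callback boundaries as in evented code.
public class ResourceInVirtualThread {
    static final AtomicBoolean leaked = new AtomicBoolean(true);

    static void handleRequest() {
        // ... acquire a resource here ...
        try {
            Thread.sleep(10);   // blocking call: the virtual thread just parks
        } catch (InterruptedException ignored) {
        } finally {
            leaked.set(false);  // release always runs, even across the park
        }
    }

    public static void main(String[] args) throws Exception {
        Thread t = Thread.startVirtualThread(ResourceInVirtualThread::handleRequest);
        t.join();
        System.out.println("leaked = " + leaked.get()); // false
    }
}
```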
| lichtenberger wrote:
| Throughput. The code can be "suspended" on a blocking
| call (I/O, where the platform thread usually is wasted,
| as the CPU has nothing to do during this time). So, the
| platform thread can do other work in the meantime.
| CrimsonRain wrote:
| Create 100k platform threads and you'll find out.
| packetlost wrote:
| Yeah, and it's generally good to be RAM-limited instead of
| CPU-limited, no? The alternative is blowing a bunch of
| time on syscalls and OS scheduler overhead.
|
| Also the virtual threads run on a "traditional" thread pool
| to my understanding, so you can just tweak the number of
| worker threads to cap the total concurrency.
|
| The benefit is it's overall more efficient (in the general
| case) and lets you write linear blocking code (as opposed
| to function coloring). You don't have to use it, but it's
| nice that it's there. Now hopefully Valhalla actually makes
| it in eventually.
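One way to see that many virtual threads share a small carrier pool (whose size can be tuned at JVM startup with `-Djdk.virtualThreadScheduler.parallelism=N`) is to record each carrier's name: for a mounted virtual thread, `Thread.toString()` includes the carrier after the `@`, e.g. `...@ForkJoinPool-1-worker-3`. A sketch assuming Java 21+, with invented names:

```java
import java.util.Set;
import java.util.concurrent.*;

// Sketch: count how many distinct carrier (worker) threads actually
// run a large batch of virtual threads.
public class CarrierCount {
    static int distinctCarriers(int nThreads) throws InterruptedException {
        Set<String> carriers = ConcurrentHashMap.newKeySet();
        try (var exec = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int i = 0; i < nThreads; i++) {
                exec.submit(() -> {
                    String s = Thread.currentThread().toString();
                    carriers.add(s.substring(s.indexOf('@') + 1)); // carrier part
                });
            }
        } // close() waits for completion
        return carriers.size();
    }

    public static void main(String[] args) throws Exception {
        // Roughly the number of CPU cores by default, not 10000.
        System.out.println(distinctCarriers(10_000));
    }
}
```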
| hashmash wrote:
| The OS scheduler is still there (for the carrier
| threads), but now you've added on top of that FJ pool
| based scheduler overhead. Although virtual threads don't
| have the syscall overhead when they block, there's a new
| cost caused by allocating the internal continuation
| object, and copying state into it. This puts more
| pressure on the garbage collector. Context switching cost
| due to CPU cache thrashing doesn't go away regardless of
| which type of thread you're using.
|
| I've not yet seen a study that shows that virtual threads
| offer a huge benefit. The Open Liberty study suggests
| that they're worse than the existing platform threads.
| zokier wrote:
| > The OS scheduler is still there (for the carrier
| threads), but now you've added on top of that FJ pool
| based scheduler overhead.
|
| Ideally carrier threads would be pinned to isolated cpu
| cores, which removes most aspects of OS scheduler from
| the picture
| zokier wrote:
| > I've not yet seen a study that shows that virtual
| threads offer a huge benefit.
|
| Not exactly Java virtual threads, but a study on how
| userland threads beat kernel threads.
|
| https://cs.uwaterloo.ca/~mkarsten/papers/sigmetrics2020.html
|
| For quick results, check figures 11 and 15 from the
| (preprint) paper. Userland threads ("fred") have ~50%
| higher throughput while having orders of magnitude better
| latency at high load levels, in a real-world application
| (memcached).
| packetlost wrote:
| The study says there are _surprising_ performance
| problems with Java's virtual thread implementation. Their
| test of throughput was also hilarious: they put 2000 OS
| threads vs 2000 virtual threads; most of the time OS
| threads don't start falling apart until 100k+ threads.
| You _can_ architect an application such that you can
| handle 200k simultaneous connections using
| platform-thread-per-core, but it's harder to reason about
| than the linear, blocking code that virtual threads and
| async allow for.
|
| > Context switching cost due to CPU cache thrashing
| doesn't go away regardless of which type of thread you're
| using.
|
| Except it's not a context switch? You're jumping to
| another instruction in the program, one that should be
| _very_ predictable. You _might_ lose your cache, but it
| will depend on a ton of factors.
|
| > there's a new cost caused by allocating the internal
| continuation object, and copying state into it.
|
| This is more of a problem with the implementation (not
| every virtual-thread language does it this way), but
| yeah, this is more overhead on the application. I assume
| there are improvements that can be made to ease GC
| pressure, like using object pools.
|
| Usually virtual threads are a memory-vs-CPU tradeoff that
| you typically use in massively concurrent IO-bound
| applications. Total throughput should overtake platform
| threads at hundreds of thousands of connections, but
| below that they probably perform worse; I'm not that
| surprised by that.
| electroly wrote:
| > Except it's not a context switch? You're jumping to
| another instruction in the program, one that should be
| very predictable. You might lose your cache, but it will
| depend on a ton of factors.
|
| Java virtual threads are stackful; they have to save and
| restore the stack every time they mount a different
| virtual thread to the platform thread. They do this by
| naive[0] copying of the stack out to a heap allocation
| and then back again, every time. That's clearly a context
| switch that you're paying for; it's just not in the
| kernel. I believe this is what the person you're replying
| to is talking about.
|
| [0] Not totally naive. They do take some effort to copy
| only subsets of the stack if they can get away with it.
| But it's still all done by copies. I don't know enough to
| understand why they need to copy and can't just swap
| stack pointers. I think it's related to the need to
| dynamically grow the stack when the thread is active vs.
| having a fixed size heap allocation to store the stack
| copy.
| dikei wrote:
| > The problem with the naive design is that even with
| virtual threads, you risk running out of (heap) memory if
| the threads ever block.
|
| The key with virtual threads is they are so lightweight
| that you can have thousands of them running concurrently;
| even when they block for I/O, it doesn't matter. It's
| similar to lightweight coroutines in other languages like
| Go or Kotlin.
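The "thousands of them" claim is easy to check in a sketch (Java 21+, invented names) that parks 10,000 virtual threads at once, a count that would be heavy with platform threads:

```java
import java.time.Duration;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch: start 10,000 virtual threads that all "block" simultaneously
// on simulated I/O, then wait for all of them to finish.
public class ManyThreads {
    static int runAllBlocked(int n) throws InterruptedException {
        AtomicInteger done = new AtomicInteger();
        CountDownLatch latch = new CountDownLatch(n);
        for (int i = 0; i < n; i++) {
            Thread.startVirtualThread(() -> {
                try {
                    Thread.sleep(Duration.ofMillis(50)); // simulated blocking I/O
                } catch (InterruptedException ignored) {}
                done.incrementAndGet();
                latch.countDown();
            });
        }
        latch.await();
        return done.get();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(runAllBlocked(10_000)); // prints 10000
    }
}
```

While the 10,000 threads sleep, only a handful of carrier threads are actually occupied.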
| imtringued wrote:
| What you are complaining about has nothing to do with
| thread pools or virtual threads. You're pointing out the
| fact that more parallelism will also need more hardware and
| that a finite hardware budget will need a back pressure
| strategy to keep resource consumption within a limit. While
| you might be correct that "sizing a traditional thread
| pool" is a back pressure strategy that can be applied to
| virtual threads, the problem with it is that IO bound
| threads will prevent CPU bound threads from making
| progress. You don't want to apply back pressure based on
| the number of tasks. You want back pressure to be in
| response to resource utilization, so that enough tasks get
| scheduled to max out the hardware.
|
| This is a common problem with people using Java parallel
| streams, because they by default share a single global
| thread pool and the way to use your own thread pool is also
| extremely counterintuitive, because it essentially relies
| on some implicit thread local magic to choose to distribute
| the stream in the thread pool that the parallel stream was
| launched on, instead of passing it as a parameter.
|
| It would be best if people came up with more dynamic back
| pressure strategies, because this is a more general problem
| that goes way beyond thread pools. In fact, one of the key
| problems of automatic parallelization is deciding at what
| point there is too much parallelization.
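The counterintuitive parallel-stream behavior mentioned above: the stream distributes its work over whatever ForkJoinPool the computation happens to be launched from, via thread-local magic, rather than taking the pool as a parameter. A sketch with invented names:

```java
import java.util.concurrent.ForkJoinPool;
import java.util.stream.IntStream;

// Sketch: run a parallel stream in a custom ForkJoinPool by submitting
// the whole computation to that pool, instead of passing the pool in.
public class CustomPoolStream {
    static long sumInPool(int poolSize) throws Exception {
        ForkJoinPool pool = new ForkJoinPool(poolSize);
        try {
            // The stream's work runs in `pool`, not the common pool,
            // solely because it is *launched* from one of its workers.
            return pool.submit(
                () -> IntStream.rangeClosed(1, 1_000).parallel().asLongStream().sum()
            ).get();
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(sumInPool(4)); // prints 500500
    }
}
```

Nothing in the stream expression itself mentions the pool, which is exactly the implicitness the comment complains about.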
| fzeindl wrote:
| Does it shake out to any real advantage?
|
| To put it shortly: Writing single-threaded blocking code is far
| easier for most people and has many other benefits, like more
| understandable and readable programs:
| https://www.youtube.com/watch?v=449j7oKQVkc
|
| The main reason why non-blocking IO with its style of
| intertwining concurrency and algorithms came along is that
| starting a thread for every request was too expensive. With
| virtual threads that problem is eliminated so we can go back to
| writing blocking code.
| nlitened wrote:
| > is far easier for most people
|
| I'd say that writing single-threaded code is far easier for
| _all_ people, even async code experts :)
|
| Also, single-threaded code is supported by programming
| language facilities: you have a proper call stack, thread-
| local vars, exceptions bubbling up, structured concurrency,
| simple resource management (RAII, try-with-resources, defer).
| Easy to reason about and debug at the language level.
|
| Async runtimes are always complicated and filled with
| leaky abstractions; it's like another language that one
| has to learn in addition, but with a less thought-out,
| ad-hoc design. Difficult to reason about and debug,
| especially in edge cases.
| bheadmaster wrote:
| > Async runtimes are always complicated, filled with leaky
| abstractions, it's like another language that one has to
| learn in addition, but with a less thought-out, ad-hoc
| design. Difficult to reason and debug, especially in edge
| cases
|
| Async runtimes themselves are simply attempts to bolt-on
| green threads on top of a language that doesn't support
| them on a language level. In JavaScript, async/await uses
| Promises to enable callback-code to interact with key
| language features like try/catch, for/while/break, return,
| etc. In Python, async/await is just syntax sugar for
| coroutines, which are again just syntax sugar for CPS-style
| classes with methods split at each "yield". Not sure about
| Rust, but it probably also uses some Rust macro magic to do
| something similar.
| logicchains wrote:
| >Async runtimes themselves are simply attempts to bolt-on
| green threads on top of a language that doesn't support
| them on a language level.
|
| Haskell supports async code while also supporting green
| threads on a language level, and the async code has most
| of the same issues as async code in any other language.
| whateveracct wrote:
| What problems exactly? Haskell has a few things that imo
| it does better than most languages in this area:
|
| - All IO is non-blocking by default.
|
| - FFI support for interruptible calls.
|
| - Haskell threads can be preempted externally - this
| allows you to ensure they never leak. Vs a goroutine that
| can just spin forever if it doesn't explicitly yield.
|
| - There are various stdlib abstractions for building
| concurrent programs in a compositional way.
| kbolino wrote:
| > Haskell threads can be preempted externally - this
| allows you to ensure they never leak. Vs a goroutine that
| can just spin forever if it doesn't explicitly yield.
|
| Goroutines are preemptible by the runtime (since
| https://go.dev/doc/go1.14#runtime) but they're still not
| addressable or killable through the language itself.
| derriz wrote:
| Indeed. Async runtimes/styles are attempts to provide a
| more readable/useable syntax for CPS[1]. CPS originally
| had nothing to do with blocking/non-blocking or multi-
| threading but arose as a technique to structure compiler
| code.
|
| Its attraction for non-blocking coding is that it allows
| hiding the multi-threaded event dispatching loop. But as
| the parent comment suggests, this abstraction is
| extremely leaky. And in addition, CPS in non-functional
| languages or without syntactic sugar has poor
| readability. Improving the readability requires compiler
| changes in the host language - so many languages have
| added compiler support to further hide the CPS
| underpinnings of their async model.
|
| I've always felt this was a big mistake in our industry -
| all this effort not only in compilers but also in
| debuggers/IDE - building on a leaky abstraction. Adding
| more layers of leaky abstractions has only made the issue
| worse. Async code, at first glance, looks simple but is a
| minefield for inexperienced/non-professional software
| engineers.
|
| It's annoying that Rust switched to async style - the
| abstraction leakiness immediately hits you, as the
| "hidden event dispatching loop" remains a real dependency
| even if it's not explicit in the code. Thus libraries
| using async cannot generally be used together, although
| last time I looked, tokio seemed to have become the de
| facto standard.
|
| [1] https://en.wikipedia.org/wiki/Continuation-passing_style
| kaba0 wrote:
| I absolutely agree that the virtual/green-thread style is
| much better, more ergonomic, less error-prone, etc., but
| I can't fault Rust's choice, given that it is a low-level
| language without a fat runtime, which makes it possible
| to be called into from other runtimes. What the JVM does
| is simply not possible that way.
| dwattttt wrote:
| > Not sure about Rust, but it probably also uses some
| Rust macro magic to do something similar.
|
| Much the same as JavaScript, I understand, but no macros;
| the compiler turns them into Futures that can be polled.
| xxs wrote:
| >I'd say that writing single-threaded code is far easier
| for _all_ people, even async code experts :)
|
| While 'async' is just a name, underneath it's epoll - and
| the virtual threads would not perform better than a proper
| NIO (epoll) server. I don't consider myself an 'async
| expert', but I have had my share of writing NIO code
| (dare say not terrible at all).
| kaba0 wrote:
| Virtual threads literally replace the "blocking" IO call
| issued by the user with a proper NIO call, re-mounting
| the issuing virtual thread when the I/O signals
| completion.
| chipdart wrote:
| > To put it shortly: Writing single-threaded blocking code is
| far easier for most people and has many other benefits, like
| more understandable and readable programs:
|
| I think you're missing the whole point.
|
| The reason why so many smart people invest their time on
| "virtual threads" is developer experience. The goal is to
| turn writing event-driven concurrent code into something
| that's as easy as writing single-threaded blocking code.
|
| Check why C#'s async/await implementation is such a huge
| success and replaced all past approaches overnight. Check why
| node.js is such a huge success. Check why Rust's async
| support is such a hot mess. It's all about developer
| experience.
| kitd wrote:
| I think he was making the same point as you: writing for
| virtual threads is like writing for single-threaded
| blocking code.
| written-beyond wrote:
| As someone who has written multiple productions services
| with Async Rust, that are under constant load, I disagree.
| I've had team members who have only written in C, pick up
| and start building very comprehensive and performant
| services in Rust in a matter of days.
|
| How do you developers spew such strong opinions without
| taking a moment to think about what you're about to say?
| Rust cannot be directly compared to C#, Java or even Go.
|
| You don't get a runtime or a GC with Rust. The developer
| experience is excellent, you get a lot of control over
| everything you're building with it. Yes it's not as magical
| as languages and runtimes like you've mentioned, but the
| fact that I can at anytime rip those abstractions off and
| make my service extremely lightweight and performant is not
| something those languages will allow you to do.
|
| And this is coming from someone who's written
| non-blocking services, before async Rust was a thing,
| with just MIO.
|
| The very fact Rust gets mentioned among these languages
| should be a tribute to the efforts of its maintainers and
| core team. The amount of tooling and features they've
| added to the language gives developers of every realm the
| liberty to try and build what they want.
|
| Honestly, you can hold whatever opinion you want on any
| language but your comparison really doesn't make sense.
| Nullabillity wrote:
| > To put it shortly: Writing single-threaded blocking code is
| far easier for most people. [snip] With virtual threads that
| problem is eliminated so we can go back to writing blocking
| code.
|
| This is the core misunderstanding/dishonesty behind the
| Loom/Virtual Threads hype. Single-threaded blocking code is
| easy, yes. But that ease comes from being single-threaded,
| not from not having to await a few Futures.
|
| But Loom doesn't magically solve the threading problem. It
| hides the Futures, but that just means that you're now
| writing a multi-threaded program, without the guardrails that
| modern Future-aware APIs provide. It's the worst of all
| worlds. It's the scenario that gave multi-threading such a
| bad reputation for inscrutable failures in the first place.
| gregopet wrote:
| It's a brave attempt to release the programmer from worrying or
| even thinking about thread pools and blocking code. Java has
| gone all in - they even cancelled a non-blocking rewrite of
| their database driver architecture, because why have that if
| you won't have to worry about blocking code? And the JVM really
| is a marvel of engineering - it's really, really good at what
| it does - so what team is better placed to pull this off?
|
| So far, they're not quite there yet: the issue of "thread
| pinning" is something developers still have to be aware of. I
| hear the newest JVM version has removed a few more cases where
| it happens, but will we ever truly 100% not have to care about
| all that anymore?
|
| I have to say things are already pretty awesome, however. If
| you avoid the few thread pinning causes (and can avoid
| libraries that use them - although most if not all modern
| libraries have already adapted), you can write really clean
| code. We had to rewrite an old app that made a huge mess of
| tracking a process where multiple event sources can act
| independently, and virtual threads seemed the perfect thing for
| it. Now our business logic looks more like a game loop and not
| the complicated mix of pollers, request handlers, intermediate
| state persisters (with their endless thirst for various
| mappers) and whatnot that it was before (granted, all those
| things weren't there just because of threading... the previous
| version was really, really shittily written).
|
| It's true that virtual threads sometimes hurt performance
| (since their main benefit is cleaner, simpler code). Not by
| much, usually, but a precisely written and carefully tuned
| piece of performance-critical code can often still do things
| better than automatic threading code. And as a fun aside, some
| very popular libraries assumed the developer is using thread
| pools (before virtual threads, which non-trivial Java app
| didn't? - OK, nobody answer that, I'm sure there are cases :D),
| so these libraries had performance tricks (ab)using thread-pool
| code specifics. So that's another possible performance issue
| with virtual threads - as always with performance, of course:
| don't just assume, try it and measure! :P
| immibis wrote:
| So... What is it seeking to optimize? Why did you need a
| thread pool before but not any more? What resource was
| exhausted to prevent you from putting every request on a
| thread?
| davidgay wrote:
| A thread per request has a high risk of overcommitting on
| CPU use, leading to a different set of problems. Virtual
| threads are scheduled on a fixed-size (based on number of
| cores) underlying (non-virtual) thread pool to avoid this
| problem.
| immibis wrote:
| Why can't virtual threads overcommit CPU use? If I have 4
| CPUs and 4000 virtual threads running CPU-bound code, is
| that not overcommit? A system without overcommit would
| refuse to create the 5th thread.
| detinho wrote:
| I think parent is saying overcommit with OS threads. 4k
| requests = 4k OS threads. That would lead to the problems
| parent is talking about.
| immibis wrote:
| Why wouldn't 4k virtual threads lead to the same
| problems?
| troupo wrote:
| Because they don't create 4k real threads, and can be
| scheduled on n=CPU Cores OS threads
| jmaker wrote:
| Briefly: the cost of spawning schedulable entities, memory,
| and the time to execution. Virtual threads, i.e., fibers,
| have lightweight stacks. You can spawn as many as you
| like, immediately. Your runtime system won't run out of
| memory as easily. In addition, the spawning happens much
| faster, in user space. You're not creating kernel threads,
| which are a limited and not cheap resource - whence the
| pooling you're comparing it to. With virtual threads you
| can do thread-per-request explicitly. It makes most sense
| for IO-bound tasks.
| gifflar wrote:
| This article nicely describes the differences between
| threads and virtual threads:
| https://www.infoq.com/articles/java-virtual-threads/
|
| I think it's definitely worth a read.
| gregopet wrote:
| It's mainly trying to make you not worry about how many
| threads you create (and not worry about the caveats that
| come with optimising how many threads you create, which is
| something you are very often forced to do).
|
| You can create a thread in your code and not worry whether
| that thing will then be some day run in a huge loop or
| receive thousands of requests and therefore spend all your
| memory on thread overhead. Go and other languages (in
| Java's ecosystem there's Kotlin for example) employ similar
| mechanisms to avoid native thread overhead, but you have to
| think about them. Like, there's tutorial code where
| everything is nice & simple, and then there's real world
| code where a lot of it must run in these special constructs
| that may have little to do with what you saw in those first
| "Hello, world" samples.
|
| Java's approach tries to erase the difference between
| virtual and real threads. The programmer should have to
| employ no special techniques when using virtual threads and
| should be able to use everything the language has to offer
| (this isn't true in many languages' virtual/green threads
| implementations). Old libraries should continue working and
| perhaps not even be aware they're being run on virtual
| threads (although, caveats do apply for low level/high
| performance stuff, see above posts). And libraries that you
| interact with don't have to care what "model" of green
| threading you're using or specifically expose "red" and
| "blue" functions.
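For example (Java 21+, with invented names), the same plain blocking "legacy" method runs unchanged on either kind of thread; only the thread factory differs:

```java
// Sketch: old-style blocking code needs no changes to run on a virtual
// thread; there are no "red"/"blue" function variants.
public class LegacyOnVirtual {
    // Pretend this is old library code: plain blocking, no async API.
    static String legacyBlockingCall() {
        try { Thread.sleep(20); } catch (InterruptedException ignored) {}
        return "result from " + (Thread.currentThread().isVirtual() ? "virtual" : "platform");
    }

    public static void main(String[] args) throws Exception {
        String[] out = new String[1];
        Thread.ofPlatform().start(() -> out[0] = legacyBlockingCall()).join();
        System.out.println(out[0]); // result from platform
        Thread.ofVirtual().start(() -> out[0] = legacyBlockingCall()).join();
        System.out.println(out[0]); // result from virtual
    }
}
```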
| giamma wrote:
| You will still have to worry; too many virtual threads
| will imply too much context switching. However, virtual
| threads will always be interruptible on I/O, as they are
| not mapped to actual OS threads, but rather simulated
| by the JVM, which will execute a number of instructions
| for each virtual thread.
|
| This gives the JVM the chance to use real threads more
| efficiently, avoiding threads remaining unused while
| waiting on I/O (e.g. a response from a stream). As soon
| as the JVM detects that a physical thread is blocked on
| I/O, a semaphore, a lock or anything, it will reallocate
| that physical thread to running a new virtual thread.
| This will reduce latency and context switch time (the
| switching is done by the JVM, which already globally
| manages the memory of the Java process in its heap), and
| will avoid, or at least largely reduce, the chance that a
| real thread remains allocated but idle because it's
| blocked on I/O or something else.
| frant-hartm wrote:
| What do you mean by context switching?
|
| My understanding is that virtual threads mostly eliminate
| context switching - for N CPUs JVM creates N platform
| threads and they run virtual threads as needed. There is
| no real context switching apart from GC and other JVM
| internal threads.
|
| A platform thread picking another virtual thread to run
| after its current virtual thread is blocked on IO is not
| a context switch, which is an expensive OS-level
| operation.
| anonymousDan wrote:
| Does Java's implementation of virtual threads perform any
| kind of work stealing when a particular physical thread
| has no virtual threads to run (e.g. they are all blocked
| on I/O)?
| mike_hearn wrote:
| It does. They get scheduled onto a ForkJoinPool, which
| is a work-stealing pool.
| immibis wrote:
| "they run virtual threads as needed" - so when one
| virtual thread is no longer needed and another one is
| needed, they switch context, yes?
| frant-hartm wrote:
| This is called mounting/un-mounting and is much cheaper
| than a context switch.
| immibis wrote:
| This is a type of context switch. You are saying dollars
| are cheaper than money.
| peeters wrote:
| It's been a really long time since I dealt with anything
| this low level, but in my very limited and ancient
| experience when people talk about context switching
| they're talking specifically about the userland process
| yielding execution back to the kernel so that the
| processor can be reassigned to a different
| process/thread. Naively, if the JVM isn't actually
| yielding control back to the kernel, it has the freedom
| to do things in a much more lightweight manner than the
| kernel would have to.
|
| So I think it's meaningful to define what we mean by
| context switch here.
| giamma wrote:
| The JVM will need to do context switching when
| reallocating the real thread that is running a blocked
| virtual thread to the next available virtual thread. It
| won't be CPU context switching, but context switching
| happens at the JVM level and has a cost.
| frant-hartm wrote:
| OK. This JVM-level switching is called mounting/un-
| mounting of the virtual thread, and is supposed to be
| several orders of magnitude cheaper than a normal
| context switch. You should be fine with millions of
| virtual threads.
| immibis wrote:
| It seems that the answer to the question was "memory".
| Stack allocations, presumably. You have answered by
| telling us that virtual threads are better than real
| threads because real threads suck, but you didn't say why
| they suck or why virtual threads don't suck in the same
| way.
| mike_hearn wrote:
| Real threads don't suck but they pay a price for
| generality. The kernel doesn't know what software you're
| going to run, and there's no standards for how that
| software might use the stack. So the kernel can't
| optimize by making any assumptions.
|
| Virtual threads are less general than kernel threads. If
| you use a virtual thread to call out of the JVM you lose
| their benefits, because the JVM becomes like the kernel
| and can't make any assumptions about the stack.
|
| But if you are running code controlled by the JVM, then
| it becomes possible to do optimizations (mostly stack
| related) that otherwise can't be done, because the GC and
| the compiler and the threads runtime are all developed
| together and work together.
|
| Specifically, what HotSpot can do is move stack frames to
| and from the heap very fast, which interacts better with
| the GC. For instance, if a virtual thread resumes,
| iterates in a loop and suspends again, then the stack
| frames are never copied out of the heap onto the kernel
| stack at all. HotSpot can incrementally "page" stack
| frames out of the heap. Additionally, the storage space
| used for a suspended virtual thread stack is a lot
| smaller than a suspended kernel stack, because a lot of
| administrative goop doesn't need to be saved at all.
| brabel wrote:
| OS threads do not suck; they're great. But they are
| expensive to create, as they require a syscall, and
| they're expensive to maintain, as they consume quite a
| bit of memory just to exist, even if you don't need it
| (they must pre-allocate a stack, which apparently is
| around 2MB initially and can't be made smaller, as in
| most cases you will need even more, so shrinking it would
| make most cases worse).
|
| Virtual Threads are very fast to create and allocate only
| the memory needed by the actual call stack, which can be
| much less than for OS Threads.
|
| Also, blocking code is very simple compared to the
| equivalent async code. So using blocking code makes your
| code much easier to follow. Check out examples of
| reactive frameworks for Java and you will quickly
| understand why.
| kllrnohj wrote:
| > and they're expensive to maintain as they consume quite
| a bit of memory just to exist, even if you don't need it
| (due to how they must pre-allocate a stack which
| apparently is around 2MB initially,
|
| I'm not familiar with windows, but this certainly isn't
| the case on Linux. It only costs 2mb-8mb of virtual
| address space, not actual physical memory. And there's no
| particular reason to believe the JVM can have a list of
| threads and their states more efficiently than the kernel
| can.
|
| All you really save is the syscall to create it and some
| context switching costs as the JVM doesn't need to deal
| with saving/restoring registers as there's no preemption.
|
| The downside though is you don't have any preemption,
| which depending on your usage is a really fucking massive
| downside.
| Someone wrote:
| > The downside though is you don't have any preemption,
| which depending on your usage is a [...] massive
| downside.
|
| Nobody is taking OS threads away, so you can choose to
| use them when they better fit your use case.
| chipdart wrote:
| > So... What is it seeking to optimize?
|
| The goal is to maximize the number of tasks you can run
| concurrently, while imposing on the developers a low
| cognitive load to write and maintain the code.
|
| > Why did you need a thread pool before but not any more?
|
| You still need a thread pool. Except with virtual threads
| you are no longer bound to running a single task per thread.
| This is especially desirable when workloads are IO-bound and
| expected to idle while waiting for external events. If
| you have a never-ending queue of tasks waiting to run, why
| should you block a thread consuming that queue by
| running a task that sits idle while waiting for something
| to happen? You're better off starting the task and setting
| it aside the moment it waits for something.
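The "one task per virtual thread, nothing to size" idea looks like this in practice (a sketch assuming Java 21; the sleeps stand in for IO waits):

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class PerTaskDemo {
    static int sum() throws Exception {
        // Each submitted task gets its own fresh virtual thread;
        // there is no pool size to tune for IO-bound work.
        try (ExecutorService exec = Executors.newVirtualThreadPerTaskExecutor()) {
            List<Future<Integer>> futures = List.of(
                exec.submit(() -> { Thread.sleep(50); return 1; }), // "waiting on IO"
                exec.submit(() -> { Thread.sleep(50); return 2; }),
                exec.submit(() -> { Thread.sleep(50); return 3; }));
            int total = 0;
            for (Future<Integer> f : futures) total += f.get();
            return total;
        } // close() waits for all tasks to finish
    }

    public static void main(String[] args) throws Exception {
        System.out.println(sum()); // prints 6
    }
}
```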
|
| > What resource was exhausted to prevent you from putting
| every request on a thread?
| twic wrote:
| The memory overhead of threads.
| pragmatick wrote:
| > although most if not all modern libraries have already
| adapted
|
| Unfortunately kafka, for example, has not:
| https://github.com/spring-projects/spring-
| kafka/commit/ae775...
| haspok wrote:
| Just a side note, async JDBC was a thing way before Loom came
| about, and it failed miserably. I'm not sure why, but my
| guess would be that most enterprise software is not web-
| scale, so JDBC worked well as it was.
|
| Also, all the database vendors provided their drivers
| implementing the JDBC API - good luck getting Oracle or IBM
| to contribute to R2DBC.. (Actually, I stand corrected: there is
| an Oracle R2DBC driver now - it was released fairly recently
| though.)
|
| EDIT: "failed miserably" is maybe too strong - but R2DBC
| certainly doesn't have the support and acceptance of JDBC.
| frevib wrote:
| It could also be that there just isn't enough demand for a
| non-blocking JDBC. For example, the PostgreSQL server does
| not cope very well with lots of simultaneous connections,
| due to (among other things) its process-per-connection
| model. From the client side (JDBC), a small thread pool is
| enough to max out the PostgreSQL server, and there is
| almost no benefit to non-blocking over a small thread pool.
| haspok wrote:
| I would argue the main benefit would be that the
| threadpool that the developer would create anyway would
| instead be created by the async database driver, which
| has more intimate knowledge about the server's
| capabilities. Maybe it knows the limits to the number of
| connections, or can do other smart optimizations. In any
| case, for the developer it would be a more streamlined
| experience, with less code needed, and better defaults.
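A driver or library that knew a connection limit could bound in-flight work with a plain Semaphore while still handing each call to a virtual thread. An illustrative sketch — the limit of 10 and the sleep standing in for a round trip are made up:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;

public class BoundedDemo {
    // Hypothetical cap standing in for a DB connection limit.
    static final Semaphore permits = new Semaphore(10);
    static final AtomicInteger inFlight = new AtomicInteger();
    static final AtomicInteger maxInFlight = new AtomicInteger();

    static void query() throws InterruptedException {
        permits.acquire();          // wait for a free "connection"
        try {
            maxInFlight.accumulateAndGet(inFlight.incrementAndGet(), Math::max);
            Thread.sleep(20);       // stand-in for the actual round trip
            inFlight.decrementAndGet();
        } finally {
            permits.release();
        }
    }

    static int run(int tasks) throws InterruptedException {
        List<Thread> threads = new ArrayList<>();
        for (int i = 0; i < tasks; i++) {
            threads.add(Thread.ofVirtual().start(() -> {
                try { query(); }
                catch (InterruptedException e) { Thread.currentThread().interrupt(); }
            }));
        }
        for (Thread t : threads) t.join();
        return maxInFlight.get();   // never exceeds the permit count
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(run(200) <= 10); // prints true
    }
}
```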
| frevib wrote:
| I think we're confusing async and non-blocking? Non-
| blocking is the part that makes virtual threads more
| efficient than threads. Async is the programming style,
| e.g. doing things concurrently. Async can be implemented
| with threads or with non-blocking IO, if the API supports
| it. I was merely arguing that a non-blocking JDBC has
| little merit, as the connections to a DB are limited. Non-
| blocking APIs are only beneficial when there are lots of
| connections, say > 10k.
|
| JDBC knows nothing about the number of connections a
| server can handle, other than by opening connections
| until the server refuses any more.
|
| | In any case, for the developer it would be a more
| streamlined experience, with less code needed, and better
| defaults.
|
| I agree it would be best not to bother the dev with what
| is going on under the hood.
| vbezhenar wrote:
| R2DBC lets you efficiently maintain millions of
| connections to the database. But what database supports
| millions of connections? Not Postgres, for sure, and
| probably no other conventional database. So using a
| reactive database driver makes little sense: if you're
| going to use 1000 connections, 1000 threads will do just
| fine with little overhead. Those who use Java don't care
| about spending 100 more MB of RAM when their service
| already eats 60GB.
| merb wrote:
| Reactive drivers were never about 1000 connections; they
| were about reusing a single connection better, by queuing
| a little more efficiently over that one connection.
| Reactive programming is not about parallelism, it's about
| concurrency.
| lmm wrote:
| > I remember saying "something" will block eventually no matter
| what... anything from the buffer being full on the NIC to your
| cpu being at anything less than 100%.
|
| Nope. You can go async all the way down, right to the
| electrical signals if you want. We usually impose some amount
| of synchronous clocking/polling for sanity, at various levels,
| but you don't have to; the world is not synchronised, the
| fastest way to respond to a stimulus will always be to respond
| when it happens.
|
| > Does it shake out to any real advantage?
|
| Of course it does - did you miss the whole C10K discussions 20+
| years ago? Whether it matters for your business is another
| question, but you can absolutely get a lot more throughput by
| being nonblocking, and if you're doing request-response across
| the Internet you generally can't afford _not_ to.
| duped wrote:
| imo the biggest difference between "virtual" threads in a
| managed runtime and "os" threads is that the latter use a
| fixed-size stack whereas the former are allowed to resize:
| the stack can grow on demand and shrink under pressure.
|
| When you spawn an OS thread you are paying at worst the full
| cost of it, and at best the max depth seen so far in the
| program, and stack overflows can happen even if the program is
| written correctly. Whereas a virtual thread can grow the stack
| to be exactly the size it needs at any point, and when GC runs
| it can rewrite pointers to any data on the stack safely.
|
| Virtual/green/user space threads aka stackful coroutines have
| proven to be an excellent tool for scaling concurrency in real
| programs, while threads and processes have always played
| catchup.
|
| > "something" will block eventually no matter what...
|
| The point is to allow _everything else_ to make progress while
| that resource is busy.
|
| ---
|
| At a broader scale, as a programming model it lets you
| architect programs that are designed to scale horizontally.
| With the commoditization of compute in the cloud that means it's
| very easy to write a program that can be distributed as i/o
| demand increases. In principle, a "virtual" thread could be
| spawned on a different machine entirely.
| chipdart wrote:
| > What is the virtual thread / event loop pattern seeking to
| optimize? Is it context switching?
|
| Throughput.
|
| Some workloads are not CPU-bound or memory-bound, and spend the
| bulk of their time waiting for external processes to make data
| available.
|
| If your workloads are expected to stay idle while waiting for
| external events, you can switch to other tasks while you wait
| for those external events to trigger.
|
| This is particularly convenient if the other tasks you're
| hoping to run are also tasks that are bound to stay idle while
| waiting for external events.
|
| One of the textbook scenarios that suits this pattern well is
| making HTTP requests. Another one is request handlers, such as
| the controller pattern used so often in HTTP servers.
|
| Perhaps the poster child of this pattern is Node.js. It might
| not be the performance king and might be single-threaded, but
| it features in the top spots in performance benchmarks such as
| TechEmpower's. Node.js is also highly favoured in function-as-
| a-service applications, as its event-driven architecture is
| well suited to applications involving a hefty dose of network
| calls running on memory- and CPU-constrained systems.
| pron wrote:
| No, it optimises hardware utilisation by simply allowing more
| tasks to concurrently make progress. This allows throughput to
| reach the maximum the hardware allows. See
| https://youtu.be/07V08SB1l8c.
| frevib wrote:
| They indeed optimize thread context switching. Taking a
| thread on and off the CPU becomes expensive when there are
| thousands of threads.
|
| You are right that everything blocks; even a trip to L1
| cache makes you wait about a nanosecond. But blocking in this
| context means waiting for "real" IO like a network request or
| spinning disk access. Virtual threads take away the problem
| that the thread sits there doing nothing for a while as it is
| waiting for data, before it is context switched.
|
| Virtual threads won't improve CPU-bound blocking. There the
| thread is actually occupying the CPU, so there is no problem of
| the thread doing nothing as with IO-bound blocking.
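The IO-bound case can be made concrete: 100 virtual threads each "waiting on IO" for 100 ms finish in roughly one wait period rather than 10 seconds, because the waits overlap. A sketch assuming Java 21; timings are approximate and the bound below is deliberately generous:

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

public class OverlapDemo {
    static long elapsedMillis(int tasks, long sleepMillis) throws InterruptedException {
        Instant start = Instant.now();
        List<Thread> threads = new ArrayList<>();
        for (int i = 0; i < tasks; i++) {
            threads.add(Thread.ofVirtual().start(() -> {
                try { Thread.sleep(sleepMillis); } // simulated IO wait
                catch (InterruptedException e) { Thread.currentThread().interrupt(); }
            }));
        }
        for (Thread t : threads) t.join();
        return Duration.between(start, Instant.now()).toMillis();
    }

    public static void main(String[] args) throws InterruptedException {
        long ms = elapsedMillis(100, 100);
        // 100 sequential 100ms sleeps would take ~10s; concurrent
        // virtual threads finish in roughly one sleep period.
        System.out.println(ms < 5_000); // prints true
    }
}
```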
| kbolino wrote:
| The hardware now is just as concurrent/parallel as the
| software. High-end NVMe SSDs and server-grade NICs can do
| hundreds to thousands of things simultaneously. Even if one
| lane does get blocked, there are other lanes which are open.
| tzahifadida wrote:
| Similarly, the power of golang's concurrent programming is
| that you write non-blocking code the way you write normal
| code. You don't have to wrap it in special functions and
| pollute the codebase. Moreover, not every coder on the
| planet knows how to handle blocking code properly, and that
| is the main advantage. Most programming languages can do
| anything the other languages can do; the problem is that
| not all coders can make use of it. This is why I see
| languages like golang as an advantage.
| jillesvangurp wrote:
| Kotlin embraced the same thing via co-routines, which are
| conceptually similar to goroutines. It adds a few useful
| concepts on top, though; mainly that of a co-routine
| context, which captures the fact that a tree of co-routine
| calls needs some notion of failure handling and cancellation.
| Additionally, co-routines are dispatched to a dispatcher. A
| dispatcher can be just on the same thread or actually use a
| thread pool. Or as of recent Java versions a virtual thread
| pool. There's actually very little point in using virtual
| threads in Kotlin; they are basically a slightly more
| heavyweight way of doing co-routines. The main benefit is
| dealing with legacy blocking Java libraries.
|
| But the bottom line with virtual threads, go-routines, or
| kotlin's co-routines is that it indeed allows for imperative
| code style code that is easy to read and understand. Of course
| you still need to understand all the pitfalls of concurrency
| bugs and all the weird and wonderful way things can fail to
| work as you expect. And while Java's virtual threads are
| designed to work like magic pixie dust, they do have some
| nasty failure modes where a single virtual thread can end up
| blocking all your virtual threads. Having a lot of
| synchronized blocks in legacy code could cause that.
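The workaround usually suggested for that failure mode is to replace `synchronized` around blocking calls with a `java.util.concurrent` lock. A hedged sketch — the pinning behavior described applies to JDKs where monitors still pin the carrier thread, and the names are illustrative:

```java
import java.util.concurrent.locks.ReentrantLock;

public class PinningDemo {
    // Blocking while holding a monitor pins the carrier thread on
    // JDK releases where this limitation still exists:
    static final Object monitor = new Object();
    static void pinned() throws InterruptedException {
        synchronized (monitor) {
            Thread.sleep(10); // carrier thread cannot be released here
        }
    }

    // j.u.c locks are virtual-thread friendly: the virtual thread
    // unmounts while blocked, freeing the carrier.
    static final ReentrantLock lock = new ReentrantLock();
    static int unpinned() throws InterruptedException {
        lock.lock();
        try {
            Thread.sleep(10); // virtual thread parks; carrier is freed
            return 42;
        } finally {
            lock.unlock();
        }
    }

    public static void main(String[] args) throws Exception {
        Thread t = Thread.ofVirtual().start(() -> {
            try { System.out.println(unpinned()); } // prints 42
            catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        });
        t.join();
    }
}
```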
| tzahifadida wrote:
| Kotlin is not a language I learned so I will avoid
| commenting.
|
| However, my use of Java is for admin backends or heavy-
| weight services for the enterprises or startups I coded
| for, so for my taste I can't use it without Spring or
| JBoss, etc., and in that world I think simplicity went out
| the window a long, long time ago :) It took me years to
| learn all the quirks of these frameworks... and the worst
| thing about them is that they keep changing every few
| months...
| jillesvangurp wrote:
| Kotlin makes a lot of that stuff easier to deal with and
| there is also a growing number of things that work without
| Java libraries. Or even the JVM. I use it with Spring Boot.
| But we also have a lot of kotlin-js code running in a
| browser. And I use quite a few multiplatform libraries for
| Kotlin that work pretty much anywhere. I've even written a
| few myself. It's pretty easy to write portable code in
| Kotlin these days.
|
| For example ktor works on the JVM but you can also build
| native applications with it. And I use ktor client in the
| browser. When running in the browser it uses the browser
| fetch API. When running on the jvm you can configure it to
| use any of a wide range of Java http clients. On native it
| uses curl.
| juyjf_3 wrote:
| Can we stop pretending Erlang does not exist?
|
| Go is a next-gen trumpian language that rejects sum types,
| pattern matching, non-nil pointers, and for years, generics;
| it's unhinged.
| seabrookmx wrote:
| While I generally agree with your take that it's a regression
| in PL design, there's no need to be inflammatory. There's
| lots of good software written in it.
|
| > pretending Erlang does not exist
|
| For better or worse it doesn't to most programmers. The
| syntax is not nearly as approachable as GoLang. Luckily
| Elixir exists.
| taspeotis wrote:
| My rough understanding is that this is similar to async/await in
| .NET?
|
| It's a shame this article paints a neutral (or even negative)
| experience with virtual threads.
|
| We rewrote a boring CRUD app that spent 99% of its time
| waiting for the database to respond to be async/await from
| top to bottom. CPU and memory usage went way down on the web
| server because so many requests could be handled by far
| fewer threads.
| jsiepkes wrote:
| > My rough understanding is that this is similar to async/await
| in .NET?
|
| Well, somewhat, but also not really. They are green threads
| like async/await, but their use is more transparent.
|
| So there are no special "async methods". You just instantiate a
| "VirtualThread" where you normally instantiate a (kernel)
| "Thread" and then use it like any other (kernel) thread. This
| works because, for example, all blocking IO APIs are
| automatically converted to non-blocking IO under the hood.
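A small sketch of that transparency (assuming Java 21): the identical blocking method runs on either kind of thread, and only the thread factory differs:

```java
public class TransparentDemo {
    // Ordinary blocking code; nothing about it is async-specific.
    static String where() {
        try { Thread.sleep(20); } // stands in for a blocking read
        catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        return Thread.currentThread().isVirtual() ? "virtual" : "platform";
    }

    // Run the exact same code on a platform thread and a virtual thread.
    static String[] runBoth() throws InterruptedException {
        String[] out = new String[2];
        Thread p = Thread.ofPlatform().start(() -> out[0] = where());
        Thread v = Thread.ofVirtual().start(() -> out[1] = where());
        p.join();
        v.join();
        return out;
    }

    public static void main(String[] args) throws InterruptedException {
        String[] out = runBoth();
        System.out.println(out[0] + " " + out[1]); // prints "platform virtual"
    }
}
```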
| devjab wrote:
| > My rough understanding is that this is similar to async/await
| in .NET?
|
| Not really. What C# does is sort of similar but it has the
| disadvantages of splitting your code ecosystem into non-
| blocking/blocking code. This means you can "accidentally" start
| your non-blocking code. Something which may cause your
| relatively simple API to consume a ridiculous amount of
| resources. It also makes it much more complicated to update and
| maintain your code as it grows over the years. What is perhaps
| worse is that C# lacks an interruption model.
|
| Java's approach is much more modern but then it kind of had to
| be because the JVM already supported structured concurrency
| from Kotlin. Which means that Java's "async/await" had to work
| in a way which wouldn't break what was already there. Because
| Java is like that.
|
| I think you can sort of view it as another example of how Java
| has overtaken C# (for now), but I imagine C# will get an
| improved async/await model in the next couple of years. Neither
| approach is something you would actually chose if concurrency
| is important to what you build and you don't have a legacy
| reason to continue to build on Java/C# . This is because Go or
| Erlang would be the obvious choice, but it's nice that you at
| least have the option if your organisation is married to a
| specific language.
| delusional wrote:
| From what I recall, and this is a while ago so bear with me,
| Java Virtual Threads still have a lot of pitfalls where the
| promise of concurrency isn't really fulfilled.
|
| I seem to remember that it was some pretty basic operations
| (like maybe read or something) that caused the thread not to
| unmount, and therefore just block the underlying OS thread.
| At that point you've just invented the world's most
| complicated thread pool.
| za3faran wrote:
| You're referring to thread pinning, and this is being
| addressed.
| mike_hearn wrote:
| Reading from sockets definitely works. It'd be pretty
| useless if it didn't.
|
| Some operations that don't cause a task switch to another
| virtual thread are:
|
| - If you've called into a native library and back into Java
| code that then blocks. In practice this never happens,
| because Java code doesn't rely on native libraries or
| frameworks that much, and when it does, it's nearly always
| in-and-out quickly without callbacks. This can't be fixed
| by the JVM, however.
|
| - File IO. No fundamental problem here, it can be fixed,
| it's just that not so many programs need tens of thousands
| of threads doing async file IO.
|
| - If you're holding a lock using 'synchronized'. No
| fundamental problem here, it's just annoying because of how
| HotSpot is implemented. They're fixing this at the moment.
|
| In practice it's mostly the last one that causes issues in
| real apps. It's not hard to work around, and eventually
| those workarounds won't be needed anymore.
| szundi wrote:
| Maybe C# is going to get a new async/await model, but the
| fragmentation of libs and code probably cannot be undone.
|
| Java's strength is that relatively more of its decisions
| about the language and the libs are ones they don't have to
| fix later. That's of great value if you're not building
| throw-away software but SaaS, or something that has to live
| long.
| za3faran wrote:
| I would not argue that golang is the obvious choice for
| concurrency. Java's approach is actually superior to
| golang's. It takes it a step further by offering structured
| concurrency[1].
|
| Kotlin's design had no bearing on Java's or the JVM's
| implementation.
|
| C# has an interruption model through CancellationToken as far
| as I'm aware.
|
| [1] https://openjdk.org/jeps/453
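JEP 453's StructuredTaskScope was still a preview API at the time, but the shape of structured concurrency can be approximated with a per-task executor in a try-with-resources block, which guarantees subtasks don't outlive the scope. An illustrative sketch; the subtask names and values are made up:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class StructuredDemo {
    // Approximation of structured concurrency: the forked subtasks cannot
    // outlive the try block, because close() waits for them to finish.
    static String handle() throws Exception {
        try (ExecutorService scope = Executors.newVirtualThreadPerTaskExecutor()) {
            Future<String> user =
                scope.submit(() -> { Thread.sleep(20); return "alice"; });
            Future<Integer> order =
                scope.submit(() -> { Thread.sleep(20); return 7; });
            return user.get() + ":" + order.get(); // both subtasks complete here
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(handle()); // prints "alice:7"
    }
}
```

The real StructuredTaskScope adds what the executor approximation lacks: fail-fast cancellation of sibling subtasks and propagation of the first failure.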
| troupo wrote:
| Erlang, not Go, should be the obvious choice for concurrency,
| but it's impossible to retrofit Erlang's concurrency onto
| existing systems.
| toast0 wrote:
| As an Erlang person, from reading about Java's Virtual
| Threads, it feels like it should get a significant portion
| of the Erlang concurrency story.
|
| With virtual threads, it seems like if you don't hit
| gotchas, you can spawn a thread, and run straight through
| blocking code and not worry about too many threads, etc. So
| you could do thread per connection/user chat servers and
| http servers and what not.
|
| Yes, it's still shared memory, so you can miss out on the
| simplifying effect of explicit communication instead of
| shared memory communication and how that makes it easy to
| work with remote and local communication partners. But you
| can build a mailbox system if you want (it's not going to
| be as nice as built in one, of course). I'm not sure if
| Java virtual threads can kill each other effectively,
| either.
| troupo wrote:
| Erlang's concurrency story isn't green threads.
|
| It's (with caveats, of course):
|
| - a thread crashing will not bring the system down
|
| - a thread cannot hog all processing time as the system
| ensures all threads get to run. The entire system is re-
| entrant and execution of each thread can be suspended to
| let other threads continue
|
| - all CPU cores can and will be utilized transparently to
| the user
|
| - you can monitor a thread and if it crashes you're
| guaranteed to receive info on why and how it crashed
|
| - immutable data structures play a huge part of it, of
| course, but the above is probably more important
|
| That's why Go's concurrency is not that good, actually.
| Goroutines are not even half-way there: an error in a
| goroutine can panic-kill your entire program, there are
| no good ways to monitor them etc.
| morsch wrote:
| Isn't that Akka?
| troupo wrote:
| Akka is heavily inspired by Erlang, but the underlying
| system/VM has to provide certain guarantees for actual
| Erlang-style concurrency to work:
| https://news.ycombinator.com/item?id=40989995
| jayd16 wrote:
| It's foolish to say that green threads are strictly better
| and ignore async/await as something outdated. It can do a lot
| that green threads can't.
|
| For example, you can actually share a thread with another
| runtime.
|
| Cooperative threading allows for implicit critical sections
| that can be cumbersome in preemptive threading.
|
| Async/await and virtual threads are solving different
| problems.
|
| > What is perhaps worse is that C# lacks an interruption
| model
|
| Btw, you'd just use OS threads if you really needed pre-
| emptively scheduled threads. Async tasks run on top of OS
| threads, so you get both cooperative scheduling within
| threads and pre-emptive scheduling of threads onto cores.
| kaba0 wrote:
| > This is because Go or Erlang would be the obvious choice
|
| Why Go? It has quite an anemic standard library for
| concurrent data structures compared to Java, and is a less
| expressive, arguably worse, language on almost any count,
| verbosity included.
| xxs wrote:
| >My rough understanding is that this is similar to async/await
| in .NET?
|
| No, the I/O is still blocking with respect to the application
| code.
| kimi wrote:
| It's more like Erlang threads - they appear to be blocking, so
| existing code will work with zero changes. But you can create a
| gazillion of them.
| he0001 wrote:
| > My rough understanding is that this is similar to async/await
| in .NET?
|
| The biggest difference is that C# async/await code is rewritten
| by the compiler (into a state machine) to make it async. This
| means you see artifacts in the stack traces that weren't there
| when you wrote the code.
|
| There are no rewrites with virtual threads and the code is
| presented on the stack just as you write it.
|
| They solve the same problem but in very different ways.
| pansa2 wrote:
| > _They solve the same problem but in very different ways._
|
| Yes. Async/await is stackless, which leads to the "coloured
| functions" problem (because it can only suspend function
| calls one-by-one). Threads are stackful (the whole stack can
| be suspended at once), which avoids the issue.
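The stackful point can be shown directly: the innermost of several ordinary nested methods blocks, and the whole stack suspends, with no async marker at any level (a sketch assuming Java 21):

```java
public class StackfulDemo {
    // Three ordinary methods; none is marked async, yet the innermost
    // one can block and the entire call stack is suspended at once.
    static int inner() throws InterruptedException {
        Thread.sleep(10); // suspension point, invisible to callers
        return 1;
    }
    static int middle() throws InterruptedException { return inner() + 1; }
    static int outer() throws InterruptedException { return middle() + 1; }

    public static void main(String[] args) throws InterruptedException {
        int[] result = new int[1];
        Thread t = Thread.ofVirtual().start(() -> {
            try { result[0] = outer(); }
            catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        });
        t.join();
        System.out.println(result[0]); // prints 3
    }
}
```

With stackless async/await, each of `outer`, `middle`, and `inner` would need to be marked and awaited individually — that is the "coloured functions" cost.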
| jayd16 wrote:
| There is overlap but they really don't solve the same
| problem. Cooperative threading has its own advantages and
| patterns that won't be served by virtual threads.
| he0001 wrote:
| What patterns does async/await solve which virtual threads
| don't?
| neonsunset wrote:
| "Green Threads" as implemented in Java is a solution that
| solves only a single problem - blocking/multiplexing.
|
| It does not enable easy concurrency and task/future
| composition the way C#/JS/Rust do, which offer a strictly
| better and more comprehensive model.
| jayd16 wrote:
| If you need to be explicit about thread contexts because
| you're using a thread that's bound to some other runtime
| (say, a GL Context) or you simply want to use a single
| thread for synchronization like is common in UI
| programming with a Main/UI Thread, async/await does quite
| well. The async/await sugar ends up being a better devx
| than thread locking and implicit threading just doesn't
| cut it.
|
| In Java they're working on a structured concurrency
| library to bridge this gap, but IMO, it'll end up looking
| like async/await with all its ups and downs but with less
| sugar.
| peteri wrote:
| It's a different model. Microsoft did work on green threads a
| while ago and decided against continuing.
|
| Links:
|
| https://github.com/dotnet/runtimelab/issues/2398
|
| https://github.com/dotnet/runtimelab/blob/feature/green-thre...
| pjmlp wrote:
| It should be pointed out, that the main reason they didn't go
| further was because of added complexity in .NET, when
| async/await already exists.
|
| > Green threads introduce a completely new async programming
| model. The interaction between green threads and the existing
| async model is quite complex for .NET developers. For
| example, invoking async methods from green thread code
| requires a sync-over-async code pattern that is a very poor
| choice if the code is executed on a regular thread.
|
| Also to note that even the current model is complex enough to
| warrant a FAQ,
|
| https://devblogs.microsoft.com/dotnet/configureawait-faq
|
| https://github.com/davidfowl/AspNetCoreDiagnosticScenarios/b.
| ..
| neonsunset wrote:
| This FAQ is a bit outdated in places, and is not something
| most users should worry about in practice.
|
| JVM Green Threads here serve predominantly back-end
| scenarios, where most of the items on the list are not of
| concern. This list also exists to address bad habits that
| carried over from before the tasks were introduced, many
| years ago.
|
| In general, the perceived need for green threads is in part
| caused by misunderstanding of that one bad article about
| function coloring - which, incidentally, does not talk
| about the way you do async in C# at all.
|
| Async/await in C# on the back-end is a very easy model to
| work with, with an explicit understanding of whether a
| method returns an operation that promises to complete in
| the future. Composing tasks[0] for easy (massive)
| concurrency is significantly more idiomatic than doing so
| with green threads or the completable futures that existed
| in Java before them. And as evidenced by the adoption of
| green threads in large-scale Java projects, it turns out
| the failure modes are similar, except that green threads
| end up violating far more expectations, and the code
| author may have no indication of it, nor an explicit
| mechanism to address it, like AsyncLocal.
|
| Also, one change to look for is the "Runtime Handled
| Tasks" project in .NET, which will replace the Roslyn-
| generated state machine code with a runtime-provided
| suspension mechanism that will only ever suspend at true
| suspension points, where a task's execution actually
| yields asynchronously. So
| far numbers show at least 5x decrease in overhead, which is
| massive and will bring performance of computation heavy
| async paths in line with sync ones:
|
| https://github.com/dotnet/runtimelab/blob/feature/async2-ex
| p...
|
| Note that you were trivially able to have millions of
| scheduled tasks even before that as they are very
| lightweight.
|
| [0]: e.g. sending requests in parallel is just this:
|
|     using var http = new HttpClient() {
|         BaseAddress = new("https://news.ycombinator.com/news")
|     };
|     var requests = Enumerable
|         .Range(1, 4)
|         .Select(n => $"?p={n}")
|         .Select(http.GetStringAsync);
|     var pages = await Task.WhenAll(requests);
| ffsm8 wrote:
| I don't think this would be a good showcase for
| Virtual Threads. The "async" API for Java is
| CompletableFuture, right? That's been stable for
| something like 10 years, so no real change since Java 8.
|
| You'd just have had to define a ThreadPool with n threads
| before, where each request would've blocked one pooled
| thread. Now it just keeps going.
|
| So your equivalent Java example should've been something
| like this - but again, the CompletableFuture API is
| pretty old at this point:
|     @HttpExchange(value = "https://news.ycombinator.com")
|     interface HnClient {
|         @GetExchange("news?p={page}")
|         CompletableFuture<String> getNews(
|             @PathVariable("page") Integer page);
|     }
|
|     @RequiredArgsConstructor
|     @Service
|     class HnService {
|         private final HnClient hnClient;
|
|         List<String> getNews() {
|             var requests = IntStream.rangeClosed(1, 4)
|                 .boxed().map(hnClient::getNews).toList();
|             return requests.stream()
|                 .map(CompletableFuture::join).toList();
|         }
|     }
| vips7L wrote:
| Structured concurrency is still being developed:
| https://openjdk.org/jeps/453
|
| Also, I wouldn't consider that the equivalent Java code.
| That is all Spring and Lombok magic. Just write the code
| and use java.net.HttpClient.
| ffsm8 wrote:
| > and just use java.net.HttpClient.
|
| No.
| no_wizard wrote:
| it might be obvious to others, but why the 'No'?
| vips7L wrote:
| The standard http client doesn't have as good a UX as
| other community libs. Most of us (including me) don't
| like to use it.
|
| That being said, imo you can't call something equivalent
| when it's doing a bunch of Spring magic. And that aside,
| OP's logic isn't equivalent at all: it waits for each
| future one by one instead of doing something like
| CompletableFuture.allOf or, in JS, Promise.all.
| no_wizard wrote:
| I take your point about the aforementioned article[0][1]
| being a popular reference when discussing async/await (and,
| to a lesser extent, async programming in modern languages
| more generally), but I think its popularity highlights the
| fact that this is a pain point for folks.
| Take, for instance, Go. It is well liked in part because
| it's so easy to do concurrency with goroutines: they're
| easy to reason about, easy to call, easy to write, and for
| how much heavy lifting they're doing, relatively simple to
| understand.
|
| The reason Java is getting a lot of kudos here for its
| implementation of green threads is exactly the same
| reason people talk about Go being an easy language to use
| for concurrency: it doesn't gate code behind specialized
| idioms / syntax / features that are specific to
| asynchronous work. Rather, it largely uses the same
| idioms / syntax as synchronous code, and is therefore
| easier to reason about, easier to adopt, and - as I think
| history is starting to show - easier to use.
|
| Java is taking a path paved by Go, and ultimately I
| think it's the right choice, because having worked
| extensively with C# and other languages that use async/
| await, there are simply fewer footguns for the average
| developer to hit when you reduce the surface area of
| having to understand async / sync boundaries.
|
| [0]: https://journal.stuffwithstuff.com/2015/02/01/what-
| color-is-...
|
| [1]: HN discussion:
| https://news.ycombinator.com/item?id=8984648
| neonsunset wrote:
| Green Threads _increase_ the footgun count, as methods
| which return tasks are rather explicit about their
| nature. The domain of async/await is well studied, and
| it enables crucial patterns whose UX, as in my previous
| example, Green Threads do nothing to improve. The same
| applies to the Go approach, which expects you to use
| channels - these have their own plethora of footguns,
| even for things trivially solved by firing off a couple
| of tasks and awaiting their results. In Go you are also
| expected to use explicit synchronization primitives for
| trivial concurrent code that requires no cognitive
| effort in C# whatsoever. C# does have channels that work
| well, but it turns out you rarely need them when you can
| just write simple task-based code instead.
|
| I'm tired of this. That one article is bad and incorrect,
| promotes straight-up harmful intuition, and has probably
| set the industry back by 10 years in terms of concurrent
| and asynchronous programming, in the same way that
| misinterpreting Donald Knuth's quote did for performance.
| kaba0 wrote:
| That's a very simplistic view, especially given that Java
| does/will provide "structured concurrency" as something
| analogous to structured control flow vs gotos.
|
| Also, nothing prevents you from building your own more
| limited but safer (the two always come together!)
| abstraction on top, but you couldn't express Loom with
| async as the primitive.
| jayd16 wrote:
| It would break a lot of the native interop and UI code devx
| of the language. Java was never as nice in those categories
| so it had less to lose going this path.
| fulafel wrote:
| Can you expand on how the benefit in your rewrite came about?
| Threads don't consume CPU when they're waiting for the DB,
| after all. And threads share memory with each other.
|
| (I guess scaling to ridiculous levels you could be approaching
| trouble if you have O(100k) outstanding DB queries per
| application server, hope you have a DB that can handle millions
| of outstanding DB queries then!)
| segfaltnh wrote:
| In large numbers, the cost of switching between threads
| does consume CPU while they're waiting for the database.
| This is why green threads exist: to have large amounts of
| in-flight work executing over a smaller number of OS
| threads.
| fulafel wrote:
| When using OS threads, there's no switching when they are
| waiting for a socket (db connection). The OS knows to wake
| the thread up only when there's something new to see on the
| connection.
| pansa2 wrote:
| Are these Virtual Threads the feature that was previously known
| as "Project Loom"? Lightweight threads, more-or-less equivalent
| to Go's goroutines?
| Skinney wrote:
| Yes
| giamma wrote:
| Yes, at EclipseCon 2022 an Oracle manager working on the
| Helidon framework presented their results replacing the Helidon
| core, which was based on Netty (and reactive programming) with
| Virtual Threads (using imperative programming). [1].
|
| Unfortunately the slides from that presentation were not
| uploaded to the conference site, but this article summarizes
| [2] the most significant metrics. The Oracle guy claimed that
| by using Virtual Threads Oracle was able to implement, using
| imperative Java, a new engine for Helidon (called Nima) that
| had identical performance to the old engine based on Netty,
| which is (at least in Oracle's opinion) the top performing
| reactive HTTP engine.
|
| The conclusion of the presentation was that based on Oracle's
| experience imperative code is much easier to write, read and
| maintain with respect to reactive code. Given the identical
| performance achieved with Virtual Threads, Oracle was going to
| abandon reactive programming in favor of imperative programming
| and virtual threads in all its products.
|
| [1] https://www.eclipsecon.org/2022/sessions/helidon-nima-
| loom-b...
|
| [2] https://medium.com/helidon/helidon-n%C3%ADma-helidon-on-
| virt...
| pgwhalen wrote:
| Yes. It's not that the feature was previously known under a
| different name - Project Loom is the OpenJDK project, and
| Virtual Threads are the main feature that has come out of that
| project.
| tomp wrote:
| They're not equivalent to Go's goroutines.
|
| Go's goroutines are preemptive (and Go's development team went
| through a lot of pain to make them such).
|
| Java's lightweight threads aren't.
|
| Java's repeating the same mistakes that Go made (and learned
| from) 10 years ago.
| jayd16 wrote:
| Virtual threads could be scheduled pre-emptively but
| currently the scheduler will wait for some kind of thread
| sleep to schedule another virtual thread. That's just a
| scheduler implementation detail and the spec is such that a
| time slice scheduler could be implemented.
| tomp wrote:
| Yes, but the problem is that the spec is such that
| preemptive scheduling doesn't _need_ to be implemented.
|
| That means that Java programmers have to be very careful
| when writing code, lest they block the entire underlying
| (OS) thread!
|
| Again, Go already went through that experience. It was
| painful. Java should have learned and implemented it from
| the start
| jayd16 wrote:
| I don't know. The language already has Thread.yield(). If
| your use case is such that you have starvation and care
| about it, it seems trivial to work around.
|
| Still, an annoying gotcha if it hits you unexpectedly.
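To make the workaround concrete, here is a minimal sketch (assumes JDK 21; the loop bound and yield interval are illustrative): a CPU-bound task in a virtual thread that periodically calls Thread.yield() so its carrier OS thread can run other virtual threads in the meantime.

```java
// Sketch: a CPU-bound virtual thread that cooperatively yields. Without
// the yield, a long-running loop would pin its carrier thread, since the
// default scheduler does not time-slice virtual threads.
public class CooperativeYield {
    public static void main(String[] args) throws Exception {
        Thread t = Thread.ofVirtual().start(() -> {
            long sum = 0;
            for (int i = 0; i < 1_000_000; i++) {
                sum += i;
                if (i % 100_000 == 0) Thread.yield(); // give up the carrier
            }
            System.out.println(sum); // 499999500000
        });
        t.join();
    }
}
```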
| nimish wrote:
| This is a really unfortunate gotcha that's not at all
| obvious. Does it kick preemption up a layer to the OS then?
| Jtsummers wrote:
| The "not at all obvious" gotcha is described in the
| documentation near the top, under the heading "What is a
| Virtual Thread?":
|
| https://docs.oracle.com/en/java/javase/21/core/virtual-
| threa...
|
| > Like a platform thread, a virtual thread is also an
| instance of java.lang.Thread. However, a virtual thread
| isn't tied to a specific OS thread. A virtual thread
| still runs code on an OS thread. However, when code
| running in a virtual thread calls a blocking I/O
| operation, the Java runtime suspends the virtual thread
| until it can be resumed. The OS thread associated with
| the suspended virtual thread is now free to perform
| operations for other virtual threads.
|
| It's not been hidden at all in their presentation on
| virtual threads.
|
| The OS thread that the virtual thread is mounted to can
| still be preempted, but that won't free up the OS thread
| for another virtual thread. However, if you use them for
| what they're intended for this shouldn't be a problem. In
| practice, it will be because no one can be bothered to
| RTFM.
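A sketch of the unmounting behavior the quoted docs describe (assumes JDK 21): 10,000 virtual threads block in sleep at once; while blocked, each unmounts from its carrier, so a handful of OS threads suffices.

```java
import java.time.Duration;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

// 10,000 simultaneously-sleeping virtual threads; the blocking sleep
// suspends each virtual thread and frees its carrier OS thread.
public class VirtualSleep {
    public static void main(String[] args) throws Exception {
        AtomicInteger done = new AtomicInteger();
        try (var executor = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int i = 0; i < 10_000; i++) {
                executor.submit(() -> {
                    try {
                        Thread.sleep(Duration.ofMillis(100)); // blocking: unmounts
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                    done.incrementAndGet();
                });
            }
        } // close() waits for all submitted tasks to finish
        System.out.println(done.get()); // 10000
    }
}
```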
| LinXitoW wrote:
| From my very limited exposure to virtual threads and the older
| solution (thread pools), the biggest hurdle was the extensive use
| of ThreadLocals by most popular libraries.
|
| In one project I had to basically turn a reactive framework into
| a one thread per request framework, because passing around the
| MDC (a kv map of extra logging information) was a horrible pain.
| Getting it to actually jump ship from thread to thread AND
| deleting it at the correct time was basically impossible.
|
| Has that improved yet?
| bberrry wrote:
| If you are already in a reactive framework, why would you
| change to virtual threads? Those frameworks pool threads and
| have their own event loop so I would say they are not suitable
| for virtual thread migration.
| brabel wrote:
| Yes, if you're happy with the reactive frameworks there's no
| reason to migrate. Most people, however, would love to remove
| their complexities from their code bases. Virtual Threads are
| much, much easier to program with. There are downsides, like
| not being able to easily limit concurrency and having to
| implement your own timeout mechanisms, but those will
| probably be provided by a common lib sooner or later that
| hopefully offers identical features to reactive frameworks
| while being much, much simpler.
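One way to limit concurrency today is a plain Semaphore around the blocking resource; a minimal sketch (assumes JDK 21; the permit count and simulated call are illustrative):

```java
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;

// Cap how many virtual threads hit a resource at once: 1,000 tasks,
// but at most 10 "DB calls" in flight at any moment.
public class LimitedConcurrency {
    public static void main(String[] args) throws Exception {
        Semaphore dbPermits = new Semaphore(10);
        AtomicInteger active = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        try (var executor = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int i = 0; i < 1_000; i++) {
                executor.submit(() -> {
                    try {
                        dbPermits.acquire();
                        try {
                            int now = active.incrementAndGet();
                            peak.accumulateAndGet(now, Math::max);
                            Thread.sleep(5); // simulated blocking call
                        } finally {
                            active.decrementAndGet();
                            dbPermits.release();
                        }
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
            }
        }
        System.out.println(peak.get() <= 10); // true
    }
}
```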
| vbezhenar wrote:
| What do you mean by hurdle? ThreadLocals work just fine with
| virtual threads.
| brabel wrote:
| It's not recommended though.
|
| See https://openjdk.org/jeps/429
|
| If you keep ThreadLocal variables, they get inherited by
| child Threads. If you make many thousands of them, the memory
| footprint becomes completely unacceptable. If the memory used
| by ThreadLocal variables is large, it also makes it more
| expensive to create new Threads (virtual or not), so you lose
| most advantages of Virtual Threads by doing that.
| bberrry wrote:
| I don't think that's correct. ThreadLocals should behave
| just like on regular OS threads, the difference is that you
| can suddenly create millions of them.
|
| You used to be able to depend on OS threads getting reused
| because you were pooling them. You can do the same with
| virtual threads if you wish and you will get the same
| behavior. The difference is we ought to spawn new threads
| per task now.
|
| Side note, you have to specifically use
| InheritableThreadLocal to get the inheritance behavior you
| speak of.
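A small sketch of that distinction (assumes JDK 21 for Thread.ofVirtual()): a plain ThreadLocal is not visible in a child thread, while an InheritableThreadLocal is copied when the child starts.

```java
// Only InheritableThreadLocal values are copied into a child thread at
// start; a plain ThreadLocal stays null there.
public class Inheritance {
    static final ThreadLocal<String> PLAIN = new ThreadLocal<>();
    static final ThreadLocal<String> INHERITED = new InheritableThreadLocal<>();

    public static void main(String[] args) throws Exception {
        PLAIN.set("parent");
        INHERITED.set("parent");
        Thread child = Thread.ofVirtual().unstarted(() -> {
            System.out.println(PLAIN.get());     // null: not inherited
            System.out.println(INHERITED.get()); // "parent": copied at start
        });
        child.start();
        child.join();
    }
}
```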
| joshlemer wrote:
| I faced this issue once. I solved it by creating a
| wrapping/delegating Executor, which would capture the MDC from
| the scheduling thread at schedule-time, and then at execute-
| time, set the MDC for the executing thread, and then clear the
| MDC after the execution completes. Something like...
|           class MyExecutor implements Executor {
|               private final Executor delegate;
|
|               public MyExecutor(Executor delegate) {
|                   this.delegate = delegate;
|               }
|
|               @Override
|               public void execute(@NotNull Runnable command) {
|                   var mdc = MDC.getCopyOfContextMap();
|                   delegate.execute(() -> {
|                       MDC.setContextMap(mdc);
|                       try {
|                           command.run();
|                       } finally {
|                           MDC.clear();
|                       }
|                   });
|               }
|           }
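The same capture-at-submit/restore-at-run pattern can be sketched without the SLF4J dependency, using a plain ThreadLocal as a stand-in for the MDC (assumes JDK 21; names are illustrative):

```java
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicReference;

// Capture the context on the scheduling thread, restore it on the
// executing thread, and clear it when the task finishes.
public class ContextPropagation {
    static final ThreadLocal<String> REQUEST_ID = new ThreadLocal<>();

    static Runnable capturing(Runnable command) {
        String captured = REQUEST_ID.get(); // capture at schedule time
        return () -> {
            REQUEST_ID.set(captured);       // restore at execute time
            try {
                command.run();
            } finally {
                REQUEST_ID.remove();        // clear when done
            }
        };
    }

    public static void main(String[] args) throws Exception {
        AtomicReference<String> seen = new AtomicReference<>();
        REQUEST_ID.set("req-42");
        try (var pool = Executors.newVirtualThreadPerTaskExecutor()) {
            // Without the wrapper, the virtual thread would see null here.
            pool.submit(capturing(() -> seen.set(REQUEST_ID.get())));
        }
        System.out.println(seen.get()); // req-42
    }
}
```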
| davidtos wrote:
| I did some similar testing a few days ago[1]. Comparing platform
| threads to virtual threads doing API calls. They mention the
| right conditions like having high task delays, but it also
| depends on what the task is. Thread.sleep(1) performs better
| on virtual threads than platform threads, but a REST call
| taking a few ms performs worse.
|
| [1] https://davidvlijmincx.com/posts/virtual-thread-
| performance-...
| pron wrote:
| Virtual threads do one thing: they allow creating lots of
| threads. This helps throughput due to Little's law [1]. But
| because this server here saturates the CPU with only a few
| threads (it doesn't do the fanout modern servers tend to do),
| this means that no significant improvements can be provided by
| virtual threads (or asynchronous programming, which operates on
| the same principle) _while keeping everything else in the system
| the same_ , especially since everything else in that server was
| optimised for over two decades under the constraints of expensive
| threads (such as the deployment strategy to many small instances
| with little CPU).
|
| So it looks like their goal was: try adopting a new technology
| without changing any of the aspects designed for an old
| technology and optimised around it.
|
| [1]: https://youtu.be/07V08SB1l8c
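Little's law (L = λW) makes the argument concrete; a back-of-the-envelope sketch (the numbers are illustrative, not from the article):

```java
// Little's law: concurrency L = throughput λ × latency W. If each
// request spends 50 ms blocked on downstream calls and you target
// 10,000 requests/s, you need ~500 requests in flight -- cheap with
// virtual threads, expensive with a pool of OS threads.
public class LittlesLaw {
    public static void main(String[] args) {
        double throughputPerSec = 10_000; // λ: arrival rate
        double latencySec = 0.050;        // W: time each request is in the system
        double concurrency = throughputPerSec * latencySec; // L
        System.out.println(Math.round(concurrency)); // 500
    }
}
```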
| hitekker wrote:
| This take sounds reasonable to me. But I'm not an expert, and
| I'd be curious to hear an opposing view if there's one.
| kaba0 wrote:
| He is as much of an expert as it gets, as he is the leader of
| the Loom project.
| binary132 wrote:
| Greenlets ultimately have to be scheduled onto system threads
| at the end of the day unless you have a lightweight thread
| model of some sort supported by the OS, so it's a little bit
| misleading depending on how far down the stack you want to
| think about optimizing for greenlets. You could potentially
| have a poor implementation of task scheduling for some legacy
| compatibility reason, however. I guess I'd be curious about
| the specifics of what pron is discussing.
| troupo wrote:
| Even though yes, in the end you have to map onto system
| threads, there are still quite a few things you can do. But
| this is infeasible for Java, unfortunately.
|
| For example, in Erlang the entire VM is built around green
| threads with a huge amount of guarantees and mechanisms:
| https://news.ycombinator.com/item?id=40989995
|
| When your entire system is optimized for green threads, the
| question of "it still needs to map onto OS threads" loses
| its significance
| michaelt wrote:
| Standard/OS threads in Java use about a megabyte of memory
| per thread, so running 256 threads uses about 256 MB of
| memory before you've even started allocating things on the
| heap.
|
| Virtual threads are therefore useful if you're writing
| something like a proxy server, where you want to allow lots
| of concurrent connections, and you want to use the familiar
| thread-per-connection programming model.
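A rough sketch of that cost difference (assumes JDK 21; ~1 MB is the typical default platform-thread stack, tunable per thread or via -Xss): a platform thread reserves its stack up front, while a virtual thread's stack lives on the heap and grows on demand.

```java
// Platform threads reserve a fixed stack at creation; virtual threads
// need no up-front per-thread stack reservation.
public class ThreadCost {
    public static void main(String[] args) throws Exception {
        // Platform thread with an explicit (smaller) stack reservation:
        Thread platform = new Thread(null, () -> {}, "small-stack", 256 * 1024);
        platform.start();
        platform.join();

        // Virtual thread: stack is heap-allocated and grows as needed.
        Thread virtual = Thread.ofVirtual().start(() -> {});
        virtual.join();
        System.out.println("ok");
    }
}
```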
| jayceedenton wrote:
| I guess at least their work has confirmed what we probably
| already knew intuitively: if you have CPU-intensive tasks,
| without waiting on anything, and you want to execute these
| concurrently, use traditional threads.
|
| The advice "don't use virtual threads for that, it will be
| inefficient" really does need some evidence.
|
| Mildly infuriating though that people may read this and think
| that somehow the JVM has problems in its virtual thread
| implementation. I admit their 'Unexpected findings' section is
| very useful work, but the moral of this story is: don't use
| virtual threads for things they were not intended for. Use
| them when you want a very large number of processes executing
| concurrently, those processes have idle stages, and you want a
| simpler model to program with than other kinds of async.
| bberrry wrote:
| I don't understand these benchmarks at all. How could it possibly
| take virtual threads 40-50 seconds to reach maximum throughput
| when getting a number of tasks submitted at once?
| cayhorstmann wrote:
| I looked at the replication instructions at
| https://github.com/blueperf/demo-vt-issues/tree/main, which
| reference this project: https://github.com/blueperf/acmeair-
| authservice-java/tree/ma...
|
| What "CPU-intensive apps" did they test with? Surely not acmeair-
| authservice-java. A request does next to nothing. It
| authenticates a user and generates a token. I thought it at least
| connects to some auth provider, but if I understand it correctly,
| it just uses a test config with a single test user (https://openl
| iberty.io/docs/latest/reference/config/quickSta...). Which would
| not be a blocking call.
|
| If the request tasks don't block, this is not an interesting
| benchmark. Using virtual threads for non-blocking tasks is not
| useful.
|
| So, let's hope that some of the tests were with tasks that block.
| The authors describe that a modest number of concurrent requests
| (< 10K) didn't show the increase in throughput that virtual
| threads promise. That's not a lot of concurrent requests, but one
| would expect an improvement in throughput once the number of
| concurrent requests exceeds the pool size. Except that may be
| hard to see because OpenLiberty's default is to keep spawning new
| threads (https://openliberty.io/blog/2019/04/03/liberty-
| threadpool-au...). I would imagine that in actual deployments
| with high concurrency, the pool size will be limited, to prevent
| the app from running out of memory.
|
| If it never gets to the point where the number of concurrent
| requests significantly exceeds the pool size, this is not an
| interesting benchmark either.
___________________________________________________________________
(page generated 2024-07-17 23:07 UTC)