[HN Gopher] Async file IO in Java is not truly async [Resource, ...
___________________________________________________________________
Async file IO in Java is not truly async [Resource, Tech blog]
Author : cmhteixeira
Score : 46 points
Date : 2024-01-08 18:54 UTC (4 hours ago)
(HTM) web link (cmhteixeira.com)
(TXT) w3m dump (cmhteixeira.com)
| aardvark179 wrote:
| I believe there is work ongoing to adopt io_uring in the JDK for
| file io on Linux, but I'm not sure what the current status of
| that is.
| 05bmckay wrote:
| Basically every async function isn't truly async; async is a
| pretty tough thing to solve perfectly.
| nritchie wrote:
| How does Project Loom in JDK 21 change this? It seems that Java
| has expended a lot of effort in implementing low cost virtual
| threads including all the effort needed to make async truly async
| throughout the Java runtime library.
| papercrane wrote:
| The article is about the AsynchronousFileChannel class, which I
| don't think Project Loom affected at all.
|
| Regardless of any future updates, this class will probably
| always use a thread pool/ExecutorService. It is part of the
| documented behaviour and API.
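|
| A minimal sketch of that documented behaviour, assuming a Linux
| or macOS OpenJDK build like the one the article describes. The
| class name, the pool and the "demo-io-pool" thread name are
| hypothetical; only AsynchronousFileChannel.open with an explicit
| ExecutorService comes from the JDK API. It reports which thread
| ran the completion handler:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.AsynchronousFileChannel;
import java.nio.channels.CompletionHandler;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Set;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class HandlerThreadDemo {

    /** Reads from {@code file} and returns the name of the thread that ran the completion handler. */
    static String handlerThreadName(Path file) {
        // Hypothetical single-thread pool with a recognisable thread name.
        ExecutorService pool = Executors.newFixedThreadPool(1, r -> new Thread(r, "demo-io-pool"));
        try (AsynchronousFileChannel channel =
                 AsynchronousFileChannel.open(file, Set.of(StandardOpenOption.READ), pool)) {
            CompletableFuture<String> handlerThread = new CompletableFuture<>();
            channel.read(ByteBuffer.allocate(64), 0L, null, new CompletionHandler<Integer, Void>() {
                @Override
                public void completed(Integer bytesRead, Void attachment) {
                    handlerThread.complete(Thread.currentThread().getName());
                }

                @Override
                public void failed(Throwable error, Void attachment) {
                    handlerThread.completeExceptionally(error);
                }
            });
            // Wait for the handler before try-with-resources closes the channel.
            return handlerThread.join();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            pool.shutdown();
        }
    }

    /** Runs the demo against a throwaway temp file. */
    static String runOnTempFile() {
        try {
            Path file = Files.createTempFile("afc-demo", ".txt");
            try {
                Files.write(file, new byte[] {1, 2, 3});
                return handlerThreadName(file);
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("completion handler ran on: " + runOnTempFile());
    }
}
```

| On other platforms (Windows in particular) the dispatch path may
| differ, so treat the reported thread name as an observation, not
| a guarantee.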
| cogman10 wrote:
| AFAIK, Loom hasn't addressed this problem. File IO isn't
| something I think the majority of devs are dealing with.
|
| That said, it's on their wiki as something they plan to
| address. Virtual threads aren't the end of the dev work here.
| twic wrote:
| Loom goes in the opposite direction: it presents a programming
| model where everything is blocking all the time, then makes
| threads lightweight, so that's scalable. It doesn't do anything
| to make file operations truly async; it makes it unnecessary
| for them to be async.
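|
| A rough sketch of that programming model, assuming JDK 21 or
| newer (virtual threads are final there). The class and method
| names are hypothetical. Each task is plain blocking code on its
| own virtual thread; the sketch does not show, and makes no claim
| about, whether a blocking file read frees the underlying OS
| thread:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class BlockingReadsOnVirtualThreads {

    /** Reads every file with ordinary blocking calls, one virtual thread per file. */
    static List<Integer> sizesOf(List<Path> files) {
        // ExecutorService is AutoCloseable since JDK 19; close() waits for submitted tasks.
        try (ExecutorService perTask = Executors.newVirtualThreadPerTaskExecutor()) {
            List<Future<Integer>> pending = new ArrayList<>();
            for (Path file : files) {
                // Blocking style: no callbacks, no completion handlers.
                pending.add(perTask.submit(() -> Files.readAllBytes(file).length));
            }
            List<Integer> sizes = new ArrayList<>();
            for (Future<Integer> result : pending) {
                sizes.add(result.get());
            }
            return sizes;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        }
    }

    /** Runs the demo against two throwaway temp files of 10 and 2000 bytes. */
    static List<Integer> runOnTempFiles() {
        try {
            Path small = Files.createTempFile("vt-small", ".bin");
            Path large = Files.createTempFile("vt-large", ".bin");
            try {
                Files.write(small, new byte[10]);
                Files.write(large, new byte[2000]);
                return sizesOf(List.of(small, large));
            } finally {
                Files.deleteIfExists(small);
                Files.deleteIfExists(large);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("file sizes: " + runOnTempFiles());
    }
}
```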
| jayd16 wrote:
| > it makes it unnecessary for them to be async.
|
| Is this true? Wouldn't they still consume an OS thread in
| that case; possibly consuming the pool of OS threads that are
| servicing the virtual threads?
|
| How does loom handle that?
| jtasdlfj234 wrote:
| Virtual Threads, under the hood, cooperatively yield at IO
| boundaries (on supported APIs)
| jakjak123 wrote:
| It's not really related at all. Loom allows a Java thread to
| block in certain cases and then yield so that a different
| thread's code can run. That principle is unrelated to file IO.
| bitcharmer wrote:
| This has almost nothing to do with Java and there is nothing
| revealing in this blog post. Block device operations on Linux
| always went through layers of indirection. That's why things like
| page cache or writeback exist. You can always go O_DIRECT if you
| need the write to be truly synchronous.
|
| If anything the author just discovered how IO works on Linux.
| yxhuvud wrote:
| Except that nowadays you can actually do async file IO using
| io_uring.
| PaulDavisThe1st wrote:
| O_DIRECT doesn't guarantee synchronous, it only skips the
| buffer cache.
| nu11ptr wrote:
| Clickbait title IMO. Essentially Java presents an abstraction and
| whether the underlying I/O is truly async or emulated depends on
| the OS. However, such a design allows them to change the
| underlying implementation to an OS async API at a later time
| (example: io_uring on Linux). At most, perhaps it would be nice
| if there was an API to query whether it is truly async or not,
| but most code is going to want it to "just work" and know their
| code will take advantage of underlying async I/O, when available.
|
| Post from mailing list from a couple years ago, for example:
| https://mail.openjdk.org/pipermail/loom-dev/2021-November/00...
| javier2 wrote:
| Yeah, this. It will be async when it is implemented as async on
| the target platform.
| xxs wrote:
| I doubt, as that would break any
| FileChannel.write(ByteBuffer) that expects the buffer to be
| consumed (remaining()==0)
|
| I suppose it's possible to create a new set of APIs, or augment
| the existing ones so they don't use a thread pool, but that's
| still rather unlikely. It would also require generally, widely
| supported async IO from the OS, including Windows.
| twic wrote:
| > I doubt, as that would break any
| FileChannel.write(ByteBuffer) that expects the buffer to be
| consumed (remaining()==0)
|
| You can't expect that with AsynchronousFileChannel::write
| at the moment, though, because the buffer is consumed on a
| background thread. As the docs say [1]:
|
| > Buffers are not safe for use by multiple concurrent
| threads so care should be taken to not access the buffer
| until the operation has completed.
|
| And as the article says, on Windows, these writes are
| "truly" async at the OS level, so portable code already
| needs to deal with that.
|
| [1] https://docs.oracle.com/en/java/javase/17/docs/api/java.base...
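|
| A small sketch of the rule quoted above, with hypothetical class
| and method names: the buffer is handed to write(), left alone
| while the operation is in flight, and inspected again only after
| Future.get() has returned.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.AsynchronousFileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;

class AsyncWriteBufferDemo {

    /** Writes all of {@code data} to {@code file}, touching the buffer only between operations. */
    static void writeFully(Path file, byte[] data) {
        ByteBuffer buffer = ByteBuffer.wrap(data);
        try (AsynchronousFileChannel channel =
                 AsynchronousFileChannel.open(file, StandardOpenOption.WRITE)) {
            long position = 0;
            while (buffer.hasRemaining()) {
                Future<Integer> pending = channel.write(buffer, position);
                // Between write() and get() the buffer is in use by the channel:
                // do not read its position or modify its contents here.
                position += pending.get();
                // The operation has completed, so the loop condition may inspect the buffer.
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    /** Writes {@code text} to a throwaway temp file and reads it back. */
    static String roundTrip(String text) {
        try {
            Path file = Files.createTempFile("afc-write", ".txt");
            try {
                writeFully(file, text.getBytes(StandardCharsets.UTF_8));
                return new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("hello async"));
    }
}
```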
| de6u99er wrote:
| You can put lipstick on a pig, ...
| amsterdamer wrote:
| was insightful.
| xxs wrote:
| _"An AsynchronousFileChannel is associated with a thread pool to
| which tasks are submitted to handle I/O events and dispatch to
| completion handlers that consume the results of I/O operations on
| the channel."_
|
| This part is close to 17 years old now (Java 1.7, 2007). The NIO
| for files that came with 1.4 (20y+) was not async for files
| either. I wonder where the 'truly async' idea has come from as
| the docs are rather explicit about it.
| exabrial wrote:
| It's pretty important to understand the timeline here, as the
| article title can lead the consumer to a lot of wrong
| conclusions.
|
| * AsynchronousFileChannel was released in JDK 7 on 2011-07-28;
| it works by submitting IO to a JVM thread pool
|
| * Linux io_uring (async IO) was released on 2019-05-05; it works
| at the kernel level
|
| The article's point is that AsynchronousFileChannel does not use
| io_uring, which makes sense given the API was implemented roughly
| 8 years earlier. Yes, AsynchronousFileChannel is asynchronous;
| however, it operates in user space and does not use io_uring.
|
| My observation (and praise) is that the JVM implementation is
| unlikely to break backward compatibility. If io_uring support is
| desired in Java, you have to jump through a few hoops with the
| current APIs. Not a big deal in my opinion... if you truly need
| those few extra percentage points of performance, none of this
| is probably news to you.
| riku_iki wrote:
| But there were other async APIs on Linux prior to 2019. Why
| didn't the JDK use them?
| wmf wrote:
| Before io_uring, Linux AIO was kind of fake; most operations
| were blocking. It makes sense that Java would use a thread
| pool instead of attempting to use Linux AIO.
| jakjak123 wrote:
| I am not sure if there were any practical async APIs for
| files. There were only ones for networking, I believe?
| riku_iki wrote:
| I am not an expert, but my reading is that epoll operates
| on file descriptors.
| jen20 wrote:
| This is highly dependent on the filesystem - as far as I
| know, basically only XFS works, and only for a small
| subset of operations, and everything else just blocks.
|
| Tokio's filesystem handling in Rust is the same way by
| default - a pool of IO threads.
|
| Edit: I just remembered that Avi Kivity (of KVM and
| ScyllaDB fame) wrote a tool to detect this:
| https://github.com/avikivity/fsqual
| evmar wrote:
| Side note: most comments here are talking about Linux, but
| despite Windows having a nice async IO API, it turns out that it
| is also sometimes just synchronous, yikes.
|
| It includes some pretty big cases like "if the filesystem is
| encrypted" or "if the write extends the file":
| https://learn.microsoft.com/en-us/previous-versions/troubles...
| ygra wrote:
| > if the filesystem is encrypted
|
| The file, not the file system. NTFS encryption is applied at
| the file level (and should be fairly rare these days).
| Bitlocker should not be affected as it is an encrypted volume
| with a normal file system.
| clhodapp wrote:
| This doesn't seem to be that big of a problem, just something to
| know about and keep track of.
|
| First, some framing:
|
| Asynchronous scheduling is about sharing scarce resources
| efficiently. It allows you to keep your hardware busy without
| spending gobs of memory on OS threads. Specifically, it allows
| the system to make progress on many suspendable tasks in
| parallel. When a task needs to wait for something time-consuming
| to happen, it can be suspended, freeing up a CPU core to work on
| other tasks that are _not_ waiting. The time-consuming work can
| be executed by some sort of centralized shared worker that has a
| broader view of the work that many tasks are waiting on. Often,
| that worker is a single thread or a small thread pool that
| performs blocking IO against a given type of resource (file,
| network socket, etc.).
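|
| To make the "centralized shared worker" idea concrete, here is a
| minimal sketch with hypothetical names (SharedIoWorker,
| sizeOfAsync) and an arbitrary pool size of 2: many tasks hand
| their blocking reads to one small pool and get a
| CompletableFuture back, so the number of threads that ever block
| on IO stays bounded no matter how many tasks are waiting.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class SharedIoWorker {
    // Assumption for illustration: two threads do all the blocking file IO.
    private final ExecutorService ioPool = Executors.newFixedThreadPool(2);
    private final Set<String> workerThreads = ConcurrentHashMap.newKeySet();

    /** Offloads a blocking read to the shared pool; the caller is not blocked. */
    CompletableFuture<Integer> sizeOfAsync(Path file) {
        return CompletableFuture.supplyAsync(() -> {
            workerThreads.add(Thread.currentThread().getName());
            try {
                return Files.readAllBytes(file).length; // the blocking call
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }, ioPool);
    }

    int distinctWorkerThreads() {
        return workerThreads.size();
    }

    void shutdown() {
        ioPool.shutdown();
    }

    /** Submits {@code taskCount} reads and returns how many threads performed them. */
    static int runDemo(int taskCount) {
        SharedIoWorker worker = new SharedIoWorker();
        try {
            Path file = Files.createTempFile("shared-io", ".bin");
            try {
                Files.write(file, new byte[128]);
                List<CompletableFuture<Integer>> pending = new ArrayList<>();
                for (int i = 0; i < taskCount; i++) {
                    pending.add(worker.sizeOfAsync(file));
                }
                for (CompletableFuture<Integer> result : pending) {
                    if (result.join() != 128) {
                        throw new IllegalStateException("unexpected file size");
                    }
                }
                return worker.distinctWorkerThreads();
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            worker.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("threads that performed IO: " + runDemo(50));
    }
}
```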
|
| About the JVM:
|
| On the JVM, this type of sharing can currently only occur within
| a single process (at least on Linux). This is because the JVM
| doesn't currently take advantage of any OS-level asynchronous
| scheduling primitives. Therefore, the Java standard library
| itself manages some central IO worker threads within each JVM
| process, which suspendable tasks can offload work to.
|
| Adopting io_uring, as is being done for the JVM now, will move
| file IO onto the kernel's own asynchronous scheduling
| primitives. This will allow sharing to occur _cross-process_ ,
| since the shared workers will now be inside of the kernel.
| That'll be a nice efficiency gain for systems where many tiny
| processes run on the same kernel, but it likely won't help that
| much for big single-purpose machines.
|
| Also of note: The virtual threads adopted in Project Loom allow
| programs written in a blocking style to behave more like
| lightweight suspendable tasks, so they have a lot of synergy with
| an io_uring implementation.
___________________________________________________________________
(page generated 2024-01-08 23:00 UTC)