[HN Gopher] macOS 14.4 causes JVM crashes
___________________________________________________________________
macOS 14.4 causes JVM crashes
Author : kingds
Score : 100 points
Date : 2024-03-16 14:32 UTC (8 hours ago)
(HTM) web link (blogs.oracle.com)
(TXT) w3m dump (blogs.oracle.com)
| npalli wrote:
| An issue introduced by macOS 14.4, which causes Java process to
| terminate unexpectedly, is affecting all Java versions from Java
| 8 to the early access builds of JDK 22
|
| If this affects so many versions of Java and nobody notices, is
| anyone even using Java on macOS?
| semiquaver wrote:
| Plenty of people develop for Java on Macs. The issue is that
| per the article this behavior was not present in the early
| access macOS builds, which means something changed between beta
| and release.
| CharlesW wrote:
| For one, it doesn't affect all versions of Java. Java 20 (an
| LTS release) and 21, for example, don't have this problem.
| pritambarhate wrote:
| JDK 21 is LTS not JDK 20.
|
| https://www.oracle.com/in/java/technologies/downloads/
| bremac wrote:
| Per the bug report, all versions since Java 8 are affected.
| CharlesW wrote:
| > _Affected Version: 8,11,17,21,22_
|
| This has changed (they added 21) since I posted the comment
| above, so it looks like they're still getting a handle on
| it.
| karmakaze wrote:
| I'm running RubyMine on 14.3.1 all the time and it's fine.
| Should I hold off updating to 14.4 until the dust has settled?
| merb wrote:
| You should. I had some Rider and IntelliJ crashes. The crash
| does not happen often, but if you're in the middle of
| writing code it can knock you out of the flow.
| bzzzt wrote:
| It's not terminating directly. I've seen a few IDE crashes this
| week, less than one per day, but since there's no log there's
| no easy way to determine it's related to a macOS change.
| LgWoodenBadger wrote:
| IntelliJ did this twice to me on Thursday and there was a
| crash log both times. I only reported one to Apple.
|
| Did you check the Console app for crash reports?
| lanna wrote:
| Maybe not a lot of macOS devs use Java, but a lot of Java devs
| use macOS
| seanalltogether wrote:
| Also, if you're a mobile developer you likely have a Mac, and
| if you're a mobile dev that doesn't target iOS exclusively,
| then you run Java.
| latchkey wrote:
| > is anyone even using Java on macOS?
|
| IntelliJ IDEA, the product itself, is JVM based.
| comonoid wrote:
| It broke with the recent macOS 14.4; even the 14.4 betas worked.
| bombcar wrote:
| Minecraft runs on various Javas.
|
| And there's a known issue with an interaction between
| minecraft, Java, and the video drivers that crashes out and it
| can be traced back all the way to here:
| https://github.com/glfw/glfw/issues/1997
|
| It's not fixed.
| nurettin wrote:
| Sonoma has been out for only one week!
| CharlesW wrote:
| > _" As a normal part of the just-in-time compile and execute
| cycle, processes running on macOS may access memory in protected
| memory regions."_
|
| I'm just a lowly JavaScript/TypeScript/PHP programmer, but what
| is the Very Good Reason that Java is trying to access other
| processes' memory?
| royjacobs wrote:
| The reasons are literally spelled out in the following
| paragraphs.
| CharlesW wrote:
| I'm asking because the reasons seem dumb to me, which is why
| I'm asking people smarter than I am about low-level memory
| management if they're legitimate.
| rzzzt wrote:
| JIT compilation can happen at any time. The runtime wants
| to create a native version of a previously interpreted
| snippet of code when it is called frequently enough to
| warrant this.
|
| The article also describes W^X functionality, which means a
| region of memory is either executable (x)or writable. On
| macOS 14.4 violating this either-or condition results in a
| signal that cannot be handled by the process.
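The write-then-execute sequence described above can be sketched in C.
This is only an illustration under stated assumptions, not how HotSpot
is implemented: run_generated_code() is a hypothetical name, the
machine code is hand-assembled for x86-64 and AArch64 only, and plain
mmap/mprotect is used. On Apple silicon macOS, JIT memory is instead
typically mapped with MAP_JIT and toggled per thread, which is not
shown here.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

typedef int (*jit_fn)(void);

/* Hand-assembled machine code for "return 42". */
#if defined(__x86_64__)
static const uint8_t code[] = { 0xB8, 0x2A, 0x00, 0x00, 0x00,  /* mov eax, 42 */
                                0xC3 };                        /* ret */
#elif defined(__aarch64__)
static const uint32_t code[] = { 0x52800540,                   /* mov w0, #42 */
                                 0xD65F03C0 };                 /* ret */
#else
#error "this sketch only carries machine code for x86-64 and AArch64"
#endif

/* Hypothetical helper: writes code into a page, makes it executable, runs it. */
int run_generated_code(void) {
    size_t len = (size_t)sysconf(_SC_PAGESIZE);

    /* Step 1: the page starts out writable but NOT executable. */
    void *page = mmap(NULL, len, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    assert(page != MAP_FAILED);
    memcpy(page, code, sizeof code);

    /* Step 2: flip it to executable but NOT writable (the W^X rule). */
    int rc = mprotect(page, len, PROT_READ | PROT_EXEC);
    assert(rc == 0);
    __builtin___clear_cache((char *)page, (char *)page + sizeof code);

    /* Step 3: call into the freshly generated code. */
    jit_fn fn;
    memcpy(&fn, &page, sizeof fn);  /* avoids a data-to-function pointer cast */
    int result = fn();

    munmap(page, len);
    return result;
}
```

At no point is the page writable and executable at the same time; any
later patching of the code would need another mprotect round trip.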
| znafelrif wrote:
| The article doesn't say anything about the JVM accessing
| other processes' memory though.
| mayoff wrote:
| I don't think the article claims that a Java process tries to
| access some other process's memory.
|
| In a typical modern operating system, a memory page can be non-
| writable and non-executable, writable and non-executable, or
| non-writable and executable, but not simultaneously writable
| AND executable.
|
| If you generate executable code at runtime, then you need write
| access to a page to write the executable code into that page.
| Then you need to tell the operating system to change the page
| from writable to executable.
|
| If you then try to write to the page, you'll get a signal
| (SIGSEGV or SIGBUS, according to the article).
|
| Oracle's JVM apparently relies on this behavior: a Java process
| sometimes tries to write to a page (in its own memory space)
| that is not marked writable. The JVM then catches the SIGSEGV
| and recovers (perhaps by asking the operating system to change
| the page back from executable to writable, or by arranging to
| write to a different page, or to abort the write operation
| altogether).
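One of the recovery strategies guessed at above (make the page writable
again, then let the write retry) can be sketched as follows. This is a
minimal illustration, not the JVM's code: write_to_protected_page() and
the other identifiers are hypothetical, and calling mprotect from a
signal handler is not on POSIX's async-signal-safe list, although it is
a plain system call on Linux and macOS and is commonly used this way.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <signal.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static char *guarded_page;      /* the page we deliberately write-protect */
static size_t guarded_len;
static volatile sig_atomic_t faults_seen = 0;

/* If the fault hit our page, make it writable and return; the kernel
   then restarts the faulting instruction. Any other fault is fatal. */
static void on_fault(int sig, siginfo_t *info, void *ctx) {
    (void)sig; (void)ctx;
    uintptr_t addr = (uintptr_t)info->si_addr;
    uintptr_t base = (uintptr_t)guarded_page;
    if (addr < base || addr >= base + guarded_len)
        _exit(1);
    if (mprotect(guarded_page, guarded_len, PROT_READ | PROT_WRITE) != 0)
        _exit(1);
    faults_seen++;
}

/* Returns how many faults were taken to complete a single write. */
int write_to_protected_page(void) {
    guarded_len = (size_t)sysconf(_SC_PAGESIZE);
    guarded_page = mmap(NULL, guarded_len, PROT_READ,
                        MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    assert(guarded_page != MAP_FAILED);

    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = on_fault;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    int rc = sigaction(SIGSEGV, &sa, NULL);   /* Linux reports SIGSEGV */
    assert(rc == 0);
    rc = sigaction(SIGBUS, &sa, NULL);        /* macOS may report SIGBUS */
    assert(rc == 0);

    volatile char *p = guarded_page;
    p[0] = 42;               /* faults, handler unprotects, write is retried */
    assert(p[0] == 42);
    return (int)faults_seen;
}
```

If the operating system delivers SIGKILL instead of SIGSEGV/SIGBUS for
the faulting write, on_fault() never runs and the process simply dies,
which is the behavior change the thread is discussing.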
| Traubenfuchs wrote:
| Thank you, that explained it way better than the original
| link.
| scialex wrote:
| It's not. It's trying to access unmapped or protected memory in
| its own process.
|
| Basically what it's used for is to implement an 'if' that's
| super fast on the most likely path but super slow on the less
| likely path.
|
| It's not super clear what it's being used for (this is often
| used for the GC but the fact that graal isn't affected means
| that likely still works). Possibly they are using this to
| detect attempts to use inline-cache entries that have been
| deleted.
| moonchild wrote:
| object.field is implemented as a direct load from the object;
| if the object turned out to be null, then the resultant
| signal is caught and turned into a NullPointerException
| olliej wrote:
| It depends on exactly what is being done.
|
| A fairly common idiom is to use memory protection to provide
| zero cost access checks, as you can generally catch the signals
| produced by most memory faults, and then work out where things
| went wrong and convert the memory access error into a catchable
| exception, or to lazily construct data structures or code.
|
| So you want the trap, but the trap itself can be handled. It
| sounds like there's been a semantic change when the trap occurs
| for execution of an address or an access to an executable page.
|
| There are also a bunch of poorly documented Mac APIs to inform
| the memory manager and linker about JIT regions and I wonder if
| it's related to those. It really depends on exactly what
| oracle's jvm is trying to do, and what the subsequent cause of
| the fault is.
|
| Certainly it's a less than optimal failure though :-/
| samus wrote:
| Accessing such areas is sometimes done deliberately since
| programmers could rely on the OS telling them what just
| happened using signals instead of nuking the process. Doing it
| without signals is usually slow and/or clunky (null-pointer
| checks, read/write permissions, existence of pages), or
| straight out impossible.
|
| Accessing other processes' memory is not the concern since
| virtual memory provides each process the illusion of having the
| entire address space for itself.
| MaxBarraclough wrote:
| Is the signals change in macOS likely to affect JIT-based systems
| other than the OpenJDK JVM?
| xyst wrote:
| Apple's macOS is slowly becoming another Windows in terms of
| stability.
|
| There was a HN post about a hashicorp founder using Linux within
| a vm on their mbp. Might adopt that same approach, if I can find
| the og post.
| open592 wrote:
| Here's the YouTube link from Mitchell. I was thinking about
| doing something similar lately too.
|
| https://youtu.be/ubDMLoWz76U?si=ipmho73-r9FzZpBp
| nullwarp wrote:
| This is what I did when my job forced me to use a Mac. I think
| the only thing I installed on the Mac outside of it was
| Firefox.
|
| It worked great for years, until I changed to a job that finally
| let me bring my own hardware.
| neeleshs wrote:
| What is your preferred hardware and flavor of Linux for this?
| I'm trying to do the same
| rzzzt wrote:
| Rancher Desktop used Lima + QEMU behind the scenes:
| https://lima-vm.io/
| Kipters wrote:
| To be fair, this is the kind of breakage I'd expect from macOS,
| but never from Windows
| secondcoming wrote:
| When's the last time you had a BSOD on Windows? I honestly
| can't recall.
| DuskHorizon wrote:
| Well, that's why Apple forbids use of private APIs in App
| Store apps. If you built all your tech stack on the foundation of
| some peculiar undocumented platform behavior, don't be
| surprised when this stack breaks.
| bhawks wrote:
| This is not an API. It's the handling of writes to memory the
| process has protected. In the past this would generate a signal
| the process could handle and recover from. Now it generates a
| SIGKILL, which is uncatchable and unrecoverable.
|
| These behaviours have been historically well documented.
| DuskHorizon wrote:
| All system idiosyncrasies are APIs in the long run ;)
| fifteen1506 wrote:
| The change of a SIGSEGV to a SIGKILL, seriously?
| DuskHorizon wrote:
| And why not? macOS is Apple's IP and they have every
| right to do with it as they want. By the way, the
| Chrome/Node.js JavaScript engine uses JIT compilation
| too. Is it affected?
| samus wrote:
| This breaks POSIX compatibility, which is basically
| saying FU to how a lot of developers have expected to
| interact with an operating system for decades now.
| olliej wrote:
| A gross and low-performance option for now might be to run Java
| under Rosetta, but I'm saying that based on them saying that this
| is Apple silicon specifically, and processes under Rosetta have a
| bunch of quirks to support Intel semantics. This would allow you
| to work around this for now.
|
| That said, I'm curious what exact scenario leads to this.
| I'm assuming it's not common, as otherwise you would expect it to
| have come up during betas and pre-release seeds.
| fwlr wrote:
| "The Java Virtual Machine [...] leverages the _protected memory
| access signal mechanism_ both for correctness (e.g., to handle
| the truncation of memory mapped files) and for performance."
|
| Where by "protected memory access signal mechanism", they mean
| SIGBUS/SIGSEGV, i.e., a segfault.
|
| This is probably because the JVM is doing "zero cost access
| checks", which is where you do the moral equivalent of:
|     try { writeToFile() } catch(err) {
|       if (err == SYSTEM_CRASH_IMMINENT) {
|         changeFilePermissions()
|         retry
|       }
|     }
|
| ...because it's faster than checking file permissions before
| every write. (It's a common pattern in systems programming, so
| it's not _quite_ as crazy as it sounds.)
|
| I guess my opinion on this is that if you write your program to
| intentionally trigger _and ignore_ kill(10) / kill(11) from the
| host OS, for the sake of a speed boost, you can't really get too
| mad when the host OS gets fed up and starts sending kill(9)
| instead.
|
| I also wonder what happens in the (extremely rare) case where the
| signal the JVM is trapping is a _real_ segfault, and not an
| operating system signal.
| dzaima wrote:
| This isn't about files, this is about plain pages of RAM[0]. It
| is a basic CPU operation to trap on unmapped pages, and OSes
| rightfully expose this useful feature (in addition to using it
| themselves), allowing processes to do many things, from lazily-
| computed memory regions to removing significant amounts of
| overhead doing a thing the CPU will inevitably do itself
| anyway.
|
| I believe the "the truncation of memory mapped files" section
| is for when the Java process memory-maps a file (as Java
| provides memory-mapping operations in its standard library, and
| probably also uses them itself), and afterwards some other
| unrelated process truncates the file, resulting in the OS
| quietly making (parts of) the mappings inaccessible. Here the
| process couldn't even check the permissions before reading
| (never mind how utterly hilariously inefficient that would be,
| defeating the purpose of memory-mapping) as the mappings could
| change between the check and subsequent read anyway.
|
| [0]: https://bugs.java.com/bugdatabase/view_bug?bug_id=8327860,
| "I've managed to narrow this down to this small reproducer:"
| section
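The truncated-mapping case described above can be reproduced in a few
lines of C. This is a sketch under assumptions: read_after_truncate()
is a hypothetical name, the same process truncates the file to stand in
for "some other unrelated process", and recovery is done with
siglongjmp rather than whatever the JVM actually does.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

static sigjmp_buf bus_recovery;

static void on_bus(int sig) {
    (void)sig;
    siglongjmp(bus_recovery, 1);
}

/* Returns 1 if reading the mapping after truncation raised a signal,
   0 if the read went through. */
int read_after_truncate(void) {
    size_t len = (size_t)sysconf(_SC_PAGESIZE);
    FILE *tmp = tmpfile();
    assert(tmp != NULL);
    int fd = fileno(tmp);
    int rc = ftruncate(fd, (off_t)len);      /* file is one page long */
    assert(rc == 0);

    void *map = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
    assert(map != MAP_FAILED);

    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_bus;
    sigemptyset(&sa.sa_mask);
    rc = sigaction(SIGBUS, &sa, NULL);
    assert(rc == 0);
    rc = sigaction(SIGSEGV, &sa, NULL);
    assert(rc == 0);

    /* The file shrinks underneath the mapping; mmap's permissions are
       unchanged, so no up-front check could have predicted the fault. */
    rc = ftruncate(fd, 0);
    assert(rc == 0);

    int faulted;
    if (sigsetjmp(bus_recovery, 1) == 0) {
        (void)*(volatile char *)map;         /* read past the new end of file */
        faulted = 0;
    } else {
        faulted = 1;                         /* recovered from the signal */
    }

    munmap(map, len);
    fclose(tmp);
    return faulted;
}
```

Because the truncation can land between any check and the read, catching
the signal is the only reliable way for the process to notice.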
| fwlr wrote:
| You are of course completely correct.
|
| However, I still stand by my pseudocode - I claim that it
| will give a fairly accurate impression of the basic concept
| of zero-cost access checks to a reader who isn't familiar
| with low-level systems programming. (That said, I have
| updated my comment to make it clear it's more of a metaphor
| than a literal description.)
| Jtsummers wrote:
| And it's worth noting that while man mmap on macOS doesn't
| indicate what happens when the protections are violated (that
| is, if you try to read, write, or execute in violation of the
| set protections) the related function mprotect has this to
| say in macOS 14.3 (what I have available):
|
| > When a program violates the protections of a page, it gets
| a SIGBUS or SIGSEGV signal.
|
| (The Linux man pages for mmap and mprotect indicate SIGSEGV
| would be signaled.)
|
| So the past use and assumption (SIGSEGV or SIGBUS) are
| consistent with the expectations of mmap and mprotect given
| the documentation provided.
| riscy wrote:
| > macOS on Apple silicon processors (M1, M2, and M3) includes a
| feature which controls how and when dynamically generated code
| can be either produced (written) or executed on a per-thread
| basis. [...] With macOS 14.4, when a thread is operating in the
| write mode, if a memory access to a protected memory region is
| attempted, macOS will send the signal SIGKILL instead.
|
| This isn't just any old thread triggering SIGKILL, it's the JIT
| thread privileged to write to executable pages that is performing
| illegal memory accesses. That's typically a sign of a bug, and
| allowing a thread with write access to executable pages to
| continue executing after that is a security risk.
|
| But I know of other language runtimes that take advantage of
| installing signal handlers for SIGBUS/SIGSEGV to detect when they
| overflow a page so they can allocate more memory, etc. This saves
| from having to do an explicit overflow check on every allocation.
| Those threads aren't given privilege to write to executable
| memory, so they're not seeing this issue...
|
| So this sounds like a narrow design problem the JVM is facing
| with their JIT thread. This blog doesn't explain why their JIT
| thread _needs_ to make illegal memory accesses instead of an
| explicit check.
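The "other language runtimes" pattern mentioned above, where a fault
replaces an explicit overflow check on every allocation, might look
roughly like this in C. All names here (ARENA_PAGES, fill_arena(),
commit_on_fault()) are hypothetical, the arena size is arbitrary, and
as with any mprotect-in-handler scheme the call is not formally
async-signal-safe under POSIX.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <signal.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

enum { ARENA_PAGES = 8 };

static char *arena;                 /* reserved but initially inaccessible */
static size_t page_size;
static volatile sig_atomic_t pages_committed = 0;

/* Commit exactly the page that was touched, then let the write retry. */
static void commit_on_fault(int sig, siginfo_t *info, void *ctx) {
    (void)sig; (void)ctx;
    uintptr_t addr = (uintptr_t)info->si_addr;
    uintptr_t base = (uintptr_t)arena;
    if (addr < base || addr >= base + ARENA_PAGES * page_size)
        _exit(1);                   /* a fault outside the arena is a real bug */
    uintptr_t page = addr & ~(uintptr_t)(page_size - 1);
    if (mprotect((void *)page, page_size, PROT_READ | PROT_WRITE) != 0)
        _exit(1);
    pages_committed++;
}

/* Writes through the whole arena and returns how many pages were committed. */
int fill_arena(void) {
    page_size = (size_t)sysconf(_SC_PAGESIZE);
    arena = mmap(NULL, ARENA_PAGES * page_size, PROT_NONE,
                 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    assert(arena != MAP_FAILED);

    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = commit_on_fault;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    int rc = sigaction(SIGSEGV, &sa, NULL);
    assert(rc == 0);
    rc = sigaction(SIGBUS, &sa, NULL);
    assert(rc == 0);

    volatile char *p = arena;
    for (size_t i = 0; i < ARENA_PAGES * page_size; i++)
        p[i] = (char)i;             /* no "is this page ready?" test here */
    return (int)pages_committed;
}
```

The inner loop carries no bounds or readiness test; the cost is paid
once per page, in the handler, rather than once per write.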
| ivan_gammel wrote:
| I wonder if it's the same reason as why Civilization 6 stopped
| working on iPadOS 17.4. Did they change something deep in the
| kernel for DMA compliance?
| zx8080 wrote:
| The main question now is why this wasn't exposed in the
| pre-release 14.4 builds. This could mean some very urgent and
| risky change made its way into the 14.4 release, or that the
| whole macOS release process is broken and unstable.
___________________________________________________________________
(page generated 2024-03-16 23:01 UTC)