[HN Gopher] macOS 14.4 causes JVM crashes
       ___________________________________________________________________
        
       macOS 14.4 causes JVM crashes
        
       Author : kingds
       Score  : 100 points
       Date   : 2024-03-16 14:32 UTC (8 hours ago)
        
 (HTM) web link (blogs.oracle.com)
 (TXT) w3m dump (blogs.oracle.com)
        
       | npalli wrote:
       | An issue introduced by macOS 14.4, which causes Java process to
       | terminate unexpectedly, is affecting all Java versions from Java
       | 8 to the early access builds of JDK 22
       | 
       | If this affects so many versions of Java and nobody notices, is
       | anyone even using Java on macOS?
        
         | semiquaver wrote:
         | Plenty of people develop for Java on Macs. The issue is that
         | per the article this behavior was not present in the early
         | access macOS builds, which means something changed between beta
         | and release.
        
         | CharlesW wrote:
         | For one, it doesn't affect all versions of Java. Java 20 (an
         | LTS release) and 21, for example, don't have this problem.
        
           | pritambarhate wrote:
           | JDK 21 is LTS not JDK 20.
           | 
           | https://www.oracle.com/in/java/technologies/downloads/
        
           | bremac wrote:
           | Per the bug report, all versions since Java 8 are affected.
        
             | CharlesW wrote:
             | > _Affected Version: 8,11,17,21,22_
             | 
             | This has changed (they added 21) since I posted the comment
             | above, so it looks like they're still getting a handle on
             | it.
        
         | karmakaze wrote:
         | I'm running RubyMine on 14.3.1 all the time and it's fine.
         | Should I hold off updating to 14.4 until the dust has settled?
        
           | merb wrote:
           | You should. I had some Rider and IntelliJ crashes. The crash
           | does not happen often, but if you're in the middle of writing
           | code it can get you out of the flow.
        
         | bzzzt wrote:
         | It's not terminating directly. I've seen a few IDE crashes this
         | week, less than one per day, but since there's no log there's
         | no easy way to determine it's related to a macOS change.
        
           | LgWoodenBadger wrote:
           | IntelliJ did this twice to me on Thursday and there was a
           | crash log both times. I only reported one to Apple.
           | 
           | Did you check the Console app for crash reports?
        
         | lanna wrote:
         | Maybe not a lot of macOS devs use Java, but a lot of Java devs
         | use macOS
        
           | seanalltogether wrote:
           | Also, if you're a mobile developer you likely have a Mac, and
           | if you're a mobile dev that doesn't target iOS exclusively,
           | then you run java.
        
         | latchkey wrote:
         | > is anyone even using Java on macOS?
         | 
         | IntelliJ IDEA, the product itself, is JVM based.
        
         | comonoid wrote:
         | It broke with the macOS 14.4 release; even the 14.4 betas worked.
        
         | bombcar wrote:
         | Minecraft runs on various Javas.
         | 
         | And there's a known issue with an interaction between
         | minecraft, Java, and the video drivers that crashes out and it
         | can be traced back all the way to here:
         | https://github.com/glfw/glfw/issues/1997
         | 
         | It's not fixed.
        
         | nurettin wrote:
         | Sonoma has been out for only one week!
        
       | CharlesW wrote:
       | > _" As a normal part of the just-in-time compile and execute
       | cycle, processes running on macOS may access memory in protected
       | memory regions."_
       | 
       | I'm just a lowly JavaScript/TypeScript/PHP programmer, but what
       | is the Very Good Reason that Java is trying to access other
       | processes' memory?
        
         | royjacobs wrote:
         | The reasons are literally spelled out in the following
         | paragraphs.
        
           | CharlesW wrote:
           | I'm asking because the reasons seem dumb to me, which is why
           | I'm asking people smarter than I am about low-level memory
           | management if they're legitimate.
        
             | rzzzt wrote:
             | JIT compilation can happen at any time. The runtime wants
             | to create a native version of a previously interpreted
             | snippet of code when it is called frequently enough to
             | warrant this.
             | 
             | The article also describes W^X functionality, which means a
             | region of memory is either executable (x)or writable. On
             | macOS 14.4 violating this either-or condition results in a
             | signal that cannot be handled by the process.
        
             | znafelrif wrote:
             | The article doesn't say anything about the JVM accessing
             | other processes' memory, though.
        
         | mayoff wrote:
         | I don't think the article claims that a Java process tries to
         | access some other process's memory.
         | 
         | In a typical modern operating system, a memory page can be non-
         | writable and non-executable, writable and non-executable, or
         | non-writable and executable, but not simultaneously writable
         | AND executable.
         | 
         | If you generate executable code at runtime, then you need write
         | access to a page to write the executable code into that page.
         | Then you need to tell the operating system to change the page
         | from writable to executable.
         | 
         | If you then try to write to the page, you'll get a signal
         | (SIGSEGV or SIGBUS, according to the article).
         | 
         | Oracle's JVM apparently relies on this behavior: a Java process
         | sometimes tries to write to a page (in its own memory space)
         | that is not marked writable. The JVM then catches the SIGSEGV
         | and recovers (perhaps by asking the operating system to change
         | the page back from executable to writable, or by arranging to
         | write to a different page, or to abort the write operation
         | altogether).
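The write, fault, recover cycle described above can be sketched in C. This is a minimal illustration under stated assumptions, not the JVM's actual code: the names `guarded_page`, `on_fault`, and `write_through_fault` are hypothetical, and the handler recovers by making the page writable, which is only one of the options the comment lists. Calling mprotect from a signal handler is not formally async-signal-safe, although it generally works on Linux and macOS.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical bookkeeping for the one page this sketch protects. */
static unsigned char *volatile guarded_page = NULL;
static volatile size_t guarded_len = 0;
static volatile sig_atomic_t fault_count = 0;

/* If the fault hit our page, make the page writable and return, so the
 * faulting store is retried. Otherwise fall back to the default action. */
static void on_fault(int sig, siginfo_t *info, void *ctx) {
    (void)ctx;
    uintptr_t addr = (uintptr_t)info->si_addr;
    uintptr_t base = (uintptr_t)guarded_page;
    if (base != 0 && addr >= base && addr < base + guarded_len) {
        fault_count++;
        /* Assumption: mprotect is usable here in practice. */
        if (mprotect(guarded_page, guarded_len, PROT_READ | PROT_WRITE) == 0)
            return;
    }
    signal(sig, SIG_DFL); /* not ours: let the retried access crash normally */
}

/* Writes to a read-only page and relies on the handler to recover.
 * Returns the number of faults handled, or -1 on failure. */
int write_through_fault(void) {
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = on_fault;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    /* The fault may arrive as SIGSEGV or SIGBUS depending on the platform. */
    if (sigaction(SIGSEGV, &sa, NULL) != 0 || sigaction(SIGBUS, &sa, NULL) != 0)
        return -1;

    long page_size = sysconf(_SC_PAGESIZE);
    assert(page_size > 0);
    unsigned char *page = mmap(NULL, (size_t)page_size, PROT_READ,
                               MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (page == MAP_FAILED)
        return -1;
    guarded_len = (size_t)page_size;
    guarded_page = page;
    fault_count = 0;

    volatile unsigned char *p = page;
    p[0] = 42;                 /* faults; the handler flips the protection */
    int ok = (p[0] == 42);     /* the retried store went through */

    guarded_page = NULL;
    munmap(page, (size_t)page_size);
    return ok ? (int)fault_count : -1;
}
```

Under the macOS 14.4 behavior the article describes, a fault of this kind taken while a thread is in JIT write mode would deliver SIGKILL instead, so a handler like `on_fault` would never run.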
        
           | Traubenfuchs wrote:
           | Thank you, that explained it way better than the original
           | link.
        
         | scialex wrote:
         | It's not. It's trying to access unmapped or protected memory in
         | its own process.
         | 
         | Basically what it's used for is to implement an 'if' that's
         | super fast on the most likely path but super slow on the less
         | likely path.
         | 
         | It's not super clear what it's being used for (this is often
         | used for the GC but the fact that Graal isn't affected means
         | that likely still works). Possibly they are using this to
         | detect attempts to use inline-cache entries that have been
         | deleted.
        
           | moonchild wrote:
           | object.field is implemented as a direct load from the object;
           | if the object turned out to be null, then the resultant
           | signal is caught and turned into a NullPointerException
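A rough C sketch of that idea follows. It assumes a hypothetical `struct point` object and uses sigsetjmp/siglongjmp as a stand-in for whatever mechanism a real JIT uses to get from the fault back to its exception path; it is not a description of any particular JVM's implementation. The -1 return value plays the role of the NullPointerException.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <stddef.h>
#include <string.h>

struct point { int x; int y; };  /* hypothetical object layout */

static sigjmp_buf recover_point;
static volatile sig_atomic_t armed = 0;

/* Jump back to the guarded load if one is in progress; otherwise restore
 * the default action so unrelated faults still crash the process. */
static void on_segv(int sig) {
    if (armed)
        siglongjmp(recover_point, 1);
    signal(sig, SIG_DFL);
}

int install_null_check_handler(void) {
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_segv;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGSEGV, &sa, NULL) != 0)
        return -1;
    return sigaction(SIGBUS, &sa, NULL);
}

/* Loads p->y with no explicit null test on the fast path.
 * Returns 0 and stores the field in *out, or -1 if the load faulted. */
int load_y(const struct point *p, int *out) {
    assert(out != NULL);
    if (sigsetjmp(recover_point, 1) != 0) {
        armed = 0;
        return -1;              /* slow path: the load trapped */
    }
    armed = 1;
    /* volatile keeps the compiler from deleting or reordering the load */
    int value = ((const volatile struct point *)p)->y;
    armed = 0;
    *out = value;
    return 0;
}
```

The fast path costs one load and no branch on the pointer; all the expense is pushed onto the rare null case, which is the trade-off the parent comments describe.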
        
         | olliej wrote:
         | It depends on exactly what is being done.
         | 
         | A fairly common idiom is to use memory protection to provide
         | zero cost access checks, as you can generally catch the signals
         | produced by most memory faults, and then work out where things
         | went wrong and convert the memory access error into a catchable
         | exception, or to lazily construct data structures or code.
         | 
         | So you want the trap, but the trap itself can be handled. It
         | sounds like there's been a semantic change when the trap occurs
         | for execution of an address or an access to an executable page.
         | 
         | There are also a bunch of poorly documented Mac APIs to inform
         | the memory manager and linker about JIT regions and I wonder if
         | it's related to those. It really depends on exactly what
         | oracle's jvm is trying to do, and what the subsequent cause of
         | the fault is.
         | 
         | Certainly it's a less than optimal failure though :-/
        
         | samus wrote:
         | Accessing such areas is sometimes done deliberately since
         | programmers could rely on the OS telling them what just
         | happened using signals instead of nuking the process. Doing it
         | without signals is usually slow and/or clunky (null-pointer
         | checks, read/write permissions, existence of pages), or
         | straight out impossible.
         | 
         | Accessing other processes' memory is not the concern since
         | virtual memory provides each process the illusion of having the
         | entire address space for itself.
        
       | MaxBarraclough wrote:
       | Is the signals change in macOS likely to affect JIT-based systems
       | other than the OpenJDK JVM?
        
       | xyst wrote:
       | Apple's macOS is slowly becoming another Windows in terms of
       | stability.
       | 
       | There was a HN post about a hashicorp founder using Linux within
       | a vm on their mbp. Might adopt that same approach, if I can find
       | the og post.
        
         | open592 wrote:
         | Here's the YouTube link from Mitchell. I was thinking about
         | doing something similar lately too.
         | 
         | https://youtu.be/ubDMLoWz76U?si=ipmho73-r9FzZpBp
        
         | nullwarp wrote:
         | This is what I did when my job forced me to use a Mac. I think
         | the only thing I installed on the Mac outside the VM was
         | Firefox.
         | 
         | It worked great for years, until I moved to a job that finally
         | let me bring my own hardware.
        
           | neeleshs wrote:
           | What is your preferred hardware and flavor of Linux for this?
           | I'm trying to do the same
        
             | rzzzt wrote:
             | Rancher Desktop used Lima + QEMU behind the scenes:
             | https://lima-vm.io/
        
         | Kipters wrote:
         | To be fair, this is the kind of breakage I'd expect from macOS,
         | but never from Windows
        
         | secondcoming wrote:
         | When's the last time you had a BSOD on Windows? I honestly
         | can't recall.
        
       | DuskHorizon wrote:
       | Well, that's why Apple forbids use of private APIs in the App
       | Store apps. If you build your whole tech stack on some peculiar,
       | undocumented platform behavior, don't be surprised when the
       | stack breaks.
        
         | bhawks wrote:
         | This is not an API. It's the handling of writes to memory the
         | process has protected. In the past this would generate a signal
         | the process could handle and recover from. Now it generates a
         | SIGKILL, which cannot be caught or recovered from.
         | 
         | These behaviours have been historically well documented.
        
           | DuskHorizon wrote:
           | All system idiosyncrasies are APIs in the long run ;)
        
             | fifteen1506 wrote:
             | The change of a SIGSEGV to a SIGKILL, seriously?
        
               | DuskHorizon wrote:
               | And why not? macOS is Apple's IP and they have every
               | right to do with it as they want. By the way, the
               | Chrome/Node.js JavaScript engine uses JIT compilation
               | too. Are they affected?
        
               | samus wrote:
               | This breaks POSIX compatibility, which is basically
               | saying FU to how a lot of developers have expected to
               | interact with an operating system for decades now.
        
       | olliej wrote:
       | A gross and low-performance option for now might be to run Java
       | under Rosetta. I'm saying that based on them saying that this
       | is Apple silicon specific, and processes under Rosetta have a
       | bunch of quirks to support Intel semantics. This would allow you
       | to work around this for now.
       | 
       | That said, I'm curious what exact scenario leads to this. I'm
       | assuming it's not common, or you would expect it to have come up
       | during betas and pre-release seeds.
        
       | fwlr wrote:
       | "The Java Virtual Machine [...] leverages the _protected memory
       | access signal mechanism_ both for correctness (e.g., to handle
       | the truncation of memory mapped files) and for performance."
       | 
       | Where by "protected memory access signal mechanism", they mean
       | SIGBUS/SIGSEGV, i.e., a segfault.
       | 
       | This is probably because the JVM is doing "zero cost access
       | checks", which is where you do the moral equivalent of:
       | 
       |     try {
       |       writeToFile()
       |     } catch(err) {
       |       if (err == SYSTEM_CRASH_IMMINENT) {
       |         changeFilePermissions()
       |         retry
       |       }
       |     }
       | 
       | ...because it's faster than checking file permissions before
       | every write. (It's a common pattern in systems programming, so
       | it's not _quite_ as crazy as it sounds.)
       | 
       | I guess my opinion on this is that if you write your program to
       | intentionally trigger _and ignore_ kill(10)  / kill(11) from the
       | host OS, for the sake of a speed boost, you can't really get too
       | mad when the host OS gets fed up and starts sending kill(9)
       | instead.
       | 
       | I also wonder what happens in the (extremely rare) case where the
       | signal the JVM is trapping is a _real_ segfault, and not an
       | operating system signal.
        
         | dzaima wrote:
         | This isn't about files, this is about plain pages of RAM[0]. It
         | is a basic CPU operation to trap on unmapped pages, and OSes
         | rightfully expose this useful feature (in addition to using it
         | themselves), allowing processes to do many things, from lazily-
         | computed memory regions to removing significant amounts of
         | overhead doing a thing the CPU will inevitably do itself
         | anyway.
         | 
         | I believe the "the truncation of memory mapped files" section
         | is for when the Java process memory-maps a file (as Java
         | provides memory-mapping operations in its standard library, and
         | probably also uses them itself), and afterwards some other
         | unrelated process truncates the file, resulting in the OS
         | quietly making (parts of) the mappings inaccessible. Here the
         | process couldn't even check the permissions before reading
         | (never mind how utterly hilariously inefficient that would be,
         | defeating the purpose of memory-mapping) as the mappings could
         | change between the check and subsequent read anyway.
         | 
         | [0]: https://bugs.java.com/bugdatabase/view_bug?bug_id=8327860,
         | "I've managed to narrow this down to this small reproducer:"
         | section
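The truncated-mapping case can be reproduced with a short C sketch. It makes several assumptions: the truncation is done by the same process rather than an unrelated one, the temporary file path under /tmp is hypothetical, and the fault is expected as SIGBUS (which is what Linux delivers; the handler is also installed for SIGSEGV to be safe). The function name `read_after_truncate` is made up for the illustration.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static sigjmp_buf recover_point;

static void on_fault(int sig) {
    (void)sig;
    siglongjmp(recover_point, 1);
}

/* Reads one byte. Returns 0 if the read completed, 1 if it faulted. */
static int probe_read(const volatile unsigned char *addr) {
    if (sigsetjmp(recover_point, 1) != 0)
        return 1;
    (void)*addr;               /* volatile read, so it is really performed */
    return 0;
}

/* Maps a one-page file, truncates it to zero, then reads the mapping.
 * Returns 1 if the second read faulted, 0 if it did not, -1 on failure. */
int read_after_truncate(void) {
    char path[] = "/tmp/truncate-demo-XXXXXX";  /* hypothetical location */
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);

    long page_size = sysconf(_SC_PAGESIZE);
    assert(page_size > 0);
    size_t len = (size_t)page_size;
    if (ftruncate(fd, (off_t)len) != 0) { close(fd); return -1; }
    void *map = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
    if (map == MAP_FAILED) { close(fd); return -1; }

    struct sigaction sa, old_bus, old_segv;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_fault;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGBUS, &sa, &old_bus);
    sigaction(SIGSEGV, &sa, &old_segv);

    int result = -1;
    if (probe_read(map) == 0          /* the file still backs the page */
        && ftruncate(fd, 0) == 0)     /* stand-in for another process */
        result = probe_read(map);     /* the page is no longer backed */

    sigaction(SIGBUS, &old_bus, NULL);
    sigaction(SIGSEGV, &old_segv, NULL);
    munmap(map, len);
    close(fd);
    return result;
}
```

No check before the second read could have made it safe, since the file can shrink between any check and the read; catching the signal is the only reliable way to notice.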
        
           | fwlr wrote:
           | You are of course completely correct.
           | 
           | However, I still stand by my pseudocode - I claim that it
           | will give a fairly accurate impression of the basic concept
           | of zero-cost access checks to a reader who isn't familiar
           | with low-level systems programming. (That said, I have
           | updated my comment to make it clear it's more of a metaphor
           | than a literal description.)
        
           | Jtsummers wrote:
           | And it's worth noting that while man mmap on macOS doesn't
           | indicate what happens when the protections are violated (that
           | is, if you try to read, write, or execute in violation of the
           | set protections) the related function mprotect has this to
           | say in macOS 14.3 (what I have available):
           | 
           | > When a program violates the protections of a page, it gets
           | a SIGBUS or SIGSEGV signal.
           | 
           | (The Linux man pages for mmap and mprotect indicate SIGSEGV
           | would be signaled.)
           | 
           | So the past use and assumption (SIGSEGV or SIGBUS) are
           | consistent with the expectations of mmap and mprotect given
           | the documentation provided.
        
       | riscy wrote:
       | > macOS on Apple silicon processors (M1, M2, and M3) includes a
       | feature which controls how and when dynamically generated code
       | can be either produced (written) or executed on a per-thread
       | basis. [...] With macOS 14.4, when a thread is operating in the
       | write mode, if a memory access to a protected memory region is
       | attempted, macOS will send the signal SIGKILL instead.
       | 
       | This isn't just any old thread triggering SIGKILL, it's the JIT
       | thread privileged to write to executable pages that is performing
       | illegal memory accesses. That's typically a sign of a bug, and
       | allowing a thread with write access to executable pages to
       | continue executing after that is a security risk.
       | 
       | But I know of other language runtimes that take advantage of
       | installing signal handlers for SIGBUS/SIGSEGV to detect when they
       | overflow a page so they can allocate more memory, etc. This saves
       | from having to do an explicit overflow check on every allocation.
       | Those threads aren't given privilege to write to executable
       | memory, so they're not seeing this issue...
       | 
       | So this sounds like a narrow design problem the JVM is facing
       | with their JIT thread. This blog doesn't explain why their JIT
       | thread _needs_ to make illegal memory accesses instead of an
       | explicit check.
        
       | ivan_gammel wrote:
       | I wonder if it's the same reason as why Civilization 6 stopped
       | working on iPadOS 17.4. Did they change something deep in the
       | kernel for DMA compliance?
        
       | zx8080 wrote:
       | The main question now is why this wasn't exposed in the
       | pre-release 14.4 builds. This could mean some very urgent and
       | risky change made its way into the 14.4 release, or that the
       | whole macOS release process is broken and unstable.
        
       ___________________________________________________________________
       (page generated 2024-03-16 23:01 UTC)