https://asahilinux.org/2021/08/progress-report-august-2021/

Progress Report: August 2021

It's been a long time since the last update! In all honesty, the first Progress Report set the bar a little bit too high, and I found it difficult to sit down and put together monthly reports that would do it justice. So, going forward, we're going to be providing shorter-form updates while striving to keep a monthly schedule.

That said, a lot has happened in the past few months, so strap in for a bigger update this time!

Core bring-up upstreamed into Linux 5.13

The core bring-up work that we detailed in our first progress report was upstreamed and released on June 27 as part of Linux 5.13! This is not very useful for end-users at this early stage, but it represents months of work laying down a foundation and figuring out how to solve certain difficult problems in a way that is acceptable to the upstream kernel community.

This has also gotten members of the kernel community interested in our project. This is important, as having a good relationship with kernel veterans is critical to ensuring we can work together to keep things upstreamed as development moves forward.

Hardware Reverse Engineering with the m1n1 Hypervisor

The M1 represents a massive reverse engineering challenge, with lots of bespoke, completely undocumented hardware. One approach to reverse engineering hardware is blind probing, as we used to reverse engineer the Apple Interrupt Controller, but this doesn't really work for more complicated hardware. In order to properly understand how to drive the hardware, we have to look at the only piece of documentation that exists: macOS itself.

It would be technically possible to disassemble and reverse engineer the macOS drivers themselves, but this poses legal challenges that could put the copyright status of our project in jeopardy, as well as being inefficient since a lot of the code is specific to the macOS driver framework and doesn't give us any useful information about the hardware.

Instead, a much safer approach that has been used by projects such as Nouveau in the past is to record a log of the hardware accesses that the official drivers perform on a real system, without actually looking at the code. Nouveau accomplished this by using a Linux driver to intercept accesses by Nvidia's official Linux driver.

Of course, Apple's M1 drivers are for macOS, not Linux. While we could implement the same approach with a custom patch to the open source core of the macOS kernel, we decided instead to go one level deeper and build a hypervisor that can run the entirety of macOS, unmodified, in a VM that transparently presents it the real M1 hardware.

This is very different from a typical virtual machine, which is designed to run a guest OS on top of a host OS, with a full set of virtualized hardware. Our hypervisor, which is built on our m1n1 bootloader and hardware experimentation tool, is a completely bespoke implementation. It is designed to mostly stay out of the way of the guest OS, running it in an environment as close to bare metal as possible, while just transparently intercepting and logging hardware accesses. Thus, macOS "sees" the real M1 hardware, and interacts with it as normal - complete with a full accelerated desktop.

Hello, world from macOS running on the m1n1 hypervisor! With one blazing fast efficiency core, and all graphics MMIO logged via USB! pic.twitter.com/28mBnzZIOC
-- Asahi Linux (@AsahiLinux) May 27, 2021

This tweet was posted from Safari on macOS, running on the hypervisor

Since the hypervisor is built on m1n1, it works together with Python code running on a separate host machine.
Effectively, the Python host can "puppeteer" the M1 and its guest OS remotely. The hypervisor itself is partially written in Python! This allows us to have a very fast test cycle, and we can even update parts of the hypervisor itself live during guest execution, without a reboot.

The hypervisor also includes standard debugging tools (like stopping execution, single-stepping, and getting a backtrace). This makes it not just useful for reverse engineering, but also as a low-level debugging tool for m1n1 itself and Linux, since they can also run on the hypervisor. Yes, you can now run m1n1 on m1n1!

If you're interested in the inner workings of the hypervisor, I did a 3-hour code recap stream covering most of the implementation, as well as the general topics of ARMv8-A virtualization, M1-specific details and oddities, and more.

On top of the hypervisor, we've built a flexible hardware I/O tracing framework that allows us to seamlessly load and upload tracers that understand how a particular piece of hardware works. For example, the tracer for the GPIO (General Purpose I/O) hardware can tell us when macOS toggles the state or configuration of each GPIO pin. This allows us to build up our understanding of the hardware, from raw register reads and writes to higher level functions.

This was invaluable for the next bit of hardware we tackled: the DCP.

Reverse Engineering DCP

One of the biggest challenges for Asahi Linux is making the M1's GPU work. But what most people think of as a "GPU" is actually two completely distinct pieces of hardware: the GPU proper, which is in charge of rendering frames in memory, and the display controller, which is in charge of sending those rendered frames from memory to the display.

While Alyssa has been hard at work reverse engineering the userspace components of the GPU, from draw calls to shaders, we still haven't looked at the lowest levels of the hardware that handle memory management and submission of commands to the GPU.

But before we can use the GPU to render anything, we need a way to put it on the screen! Up until now, we've been using the firmware-provided framebuffer, which is just an area of memory where we can write pixels to be shown on the screen, but this won't cut it for a real desktop. We need features such as displaying new frames without tearing, support for hardware sprites such as the mouse cursor, switching resolutions and configuring multiple outputs, and more. This is the job of the display controller.

On most mobile SoCs, the display controller is just a piece of hardware with simple registers. While this is true on the M1 as well, Apple decided to give it a twist. They added a coprocessor to the display engine (called DCP), which runs its own firmware (initialized by the system bootloader), and moved most of the display driver into the coprocessor. But instead of doing it at a natural driver boundary … they took half of their macOS C++ driver, moved it into the DCP, and created a remote procedure call interface so that each half can call methods on C++ objects on the other CPU! Talk about overcomplicating things…

Reverse engineering this is a huge challenge, but thanks to the hypervisor, we can build up our understanding of how this all works layer by layer.

At the lowest layer, DCP is an instance of what Apple calls an "ASC", which is their term for these coprocessors (the M1 has about a dozen!). ASC processors run their own firmware and communicate with the main CPU through a mailbox interface, which is a simple message queue where each side can send 64-bit messages to the other, tagged with an "endpoint".

Above this simple interface, Apple uses a shared set of endpoints for all ASC processors that run RTKit, Apple's bespoke RTOS. This interface provides features such as sending syslog messages and crash dumps from the ASC to the main CPU, and initializing endpoints (services).
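
The mailbox layer described above (a message queue in each direction, carrying 64-bit messages tagged with an endpoint) can be modelled in a few lines of Python. This is only a toy sketch for building intuition, not m1n1's code or Apple's wire format: the names `MailboxMessage`, `Mailbox`, and `EndpointTracer` are hypothetical, and the 8-bit endpoint width is an assumption made purely for illustration.

```python
from collections import deque
from dataclasses import dataclass

MASK64 = (1 << 64) - 1


@dataclass(frozen=True)
class MailboxMessage:
    endpoint: int  # service the message is addressed to (8-bit width is assumed)
    payload: int   # the 64-bit message body

    def __post_init__(self):
        if not 0 <= self.endpoint <= 0xFF:
            raise ValueError("endpoint out of range (8-bit width is an assumption)")
        if not 0 <= self.payload <= MASK64:
            raise ValueError("payload must fit in 64 bits")


class Mailbox:
    """Two one-way queues: main CPU -> coprocessor and coprocessor -> main CPU."""

    def __init__(self):
        self.to_asc = deque()
        self.to_cpu = deque()

    def cpu_send(self, endpoint, payload):
        self.to_asc.append(MailboxMessage(endpoint, payload))

    def asc_send(self, endpoint, payload):
        self.to_cpu.append(MailboxMessage(endpoint, payload))


class EndpointTracer:
    """Routes each message to a per-endpoint decoder; unknown endpoints are logged raw."""

    def __init__(self):
        self.handlers = {}
        self.log = []

    def register(self, endpoint, handler):
        self.handlers[endpoint] = handler

    def trace(self, direction, msg):
        handler = self.handlers.get(msg.endpoint)
        if handler is None:
            body = f"raw={msg.payload:#018x}"
        else:
            body = handler(msg.payload)
        line = f"{direction} ep={msg.endpoint:#04x} {body}"
        self.log.append(line)
        return line
```

The point of the design is that understanding can grow one endpoint at a time: a message on an endpoint nobody has decoded yet is still logged as a raw 64-bit value, and registering a decoder later upgrades the output for that endpoint without touching the rest.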
So we built a tracer that can understand these messages, and do things like print the syslog messages directly to the hypervisor console.

On top of this, the DCP implements multiple endpoints, one of which serves as the "main" interface. This interface itself supports making remote method calls in both directions. The DCP often issues synchronous callbacks to the main CPU after it receives a call, and the main CPU can in turn issue more synchronous DCP calls - effectively, the execution call stack extends across the CPU-to-DCP boundary! The interface even supports asynchronous reentrancy, having multiple "channels" so that, for example, the DCP can send asynchronous messages to the main CPU at any time, even during another operation.

The method calls themselves send their arguments and return data via buffers in shared memory. These buffers encode simple types like integers; pointers that can pass data in the input, output, or both directions; more complex fixed structures; and even two different serialization formats for JSON-like higher-level data structures (blame IOKit)! Our dcp tracer takes these buffers and dumps them out to a trace file, so that they can be analyzed offline as we improve our understanding of the protocol.

We then started building a Python implementation of this RPC protocol and marshaling system. This implementation serves a triple purpose: it allows us to parse the DCP logs from the hypervisor to understand what macOS does, it allows us to build a prototype DCP driver entirely in Python, and it will in the future be used to automatically generate marshaling code for the Linux kernel DCP driver.

Passing the hypervisor DCP traces through this decoder, we can get a trace of all the method calls that are exchanged between macOS and the DCP:

>C[0x0] A401 IOMobileFramebufferAP::start_signal()
d[0x0] D598 IOMobileFramebufferAP::find_swap_function_gated()
d[0x0] D107 UnifiedPipeline2::create_provider_service() = True
[...]
d[0x0] D000 UPPipeAP_H13P::did_boot_signal() = True
d[0x0] D001 UPPipeAP_H13P::did_power_on_signal() = True
C[0x40] A357 UnifiedPipeline2::set_create_DFB()
C[0x40] A443 IOMobileFramebufferAP::do_create_default_frame_buffer()
C[0x40] A103 UPPipe2::test_control(cmd=0, arg=2863267840)
C[0x40] A029 UPPipeAP_H13P::setup_video_limits()
d[0x40] D107 UnifiedPipeline2::create_provider_service() = True
d[0x40] D401 ServiceRelay::sr_get_uint_prop(obj='PROV', key='minimum-frequency', value=0) = False
d[0x40] D107 UnifiedPipeline2::create_provider_service() = True
d[0x40] D408 ServiceRelay::sr_getClockFrequency(obj='PROV', arg=0) = 533333328
d[0x40] D300 PropRelay::pr_publish(prop_id=38, value=470741)
d[0x40] D563 IOMobileFramebufferAP::setProperty_int(key='MaxVideoSrcDownscalingWidth', value=27582) = True
d[0x40] D563 IOMobileFramebufferAP::setProperty_int(key='VideoClock', value=74250000) = True
d[0x40] D563 IOMobileFramebufferAP::setProperty_int(key='PixelClock', value=533333328) = True
C[0x40] A463 IOMobileFramebufferAP::flush_supportsPower(arg0=True)
C[0x40] A036 UPPipeAP_H13P::apt_supported()
C[0x40] A000 UPPipeAP_H13P::late_init_signal() )
[...]
C[0x40] A460 IOMobileFramebufferAP::setDisplayRefreshProperties()
d[0x40] D561 IOMobileFramebufferAP::setProperty_dict(key='IOMFBDisplayRefresh', value=...) = True
d[0x0] D116 UnifiedPipeline2::start_hardware_boot() = True
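
For offline analysis, a decoded trace like the one above is regular enough to be split back into fields with a small script. The following is a rough sketch based only on the line shapes visible in this excerpt, not on the actual m1n1 decoder: `TRACE_RE`, `TraceCall`, and `parse_trace_line` are hypothetical names, and the leading letter, the bracketed number, and the four-character tag are kept as opaque fields (`kind`, `context`, `tag`) because their exact meaning isn't spelled out here.

```python
import re
from typing import NamedTuple, Optional

# Line shape assumed from the excerpt:
#   <letter>[<hex>] <tag> <Class>::<method>(<args>) [= <return value>]
TRACE_RE = re.compile(
    r"^>?\s*(?P<kind>[A-Za-z])\[(?P<context>0x[0-9a-fA-F]+)\]\s+"
    r"(?P<tag>[A-Z]\d{3})\s+"
    r"(?P<cls>\w+)::(?P<method>\w+)"
    r"\((?P<args>[^)]*)\)"
    r"(?:\s+=\s+(?P<ret>.+))?$"
)


class TraceCall(NamedTuple):
    kind: str           # leading letter, kept opaque
    context: int        # bracketed hex number, kept opaque
    tag: str            # e.g. "D408"
    cls: str
    method: str
    args: dict          # argument name -> raw value text (not evaluated)
    ret: Optional[str]  # raw return value text, or None if the line has none


def parse_trace_line(line):
    """Return a TraceCall, or None for lines that don't fit (such as "[...]")."""
    m = TRACE_RE.match(line.strip())
    if m is None:
        return None
    args = {}
    # Naive split: values containing "," or ")" would need a real parser.
    for part in (p.strip() for p in m["args"].split(",")):
        if part:
            name, _, value = part.partition("=")
            args[name] = value
    return TraceCall(
        m["kind"], int(m["context"], 16), m["tag"],
        m["cls"], m["method"], args, m["ret"],
    )
```

Fed the `sr_getClockFrequency` line from the trace, this yields a record with `method == "sr_getClockFrequency"`, `args == {"obj": "'PROV'", "arg": "0"}` and `ret == "533333328"`. Values are deliberately left as text, since the excerpt elides some of them (for example `value=...`).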