https://lwn.net/SubscriberLink/1026694/3413f4b43c862629/

How to write Rust in the kernel: part 3

By Daroc Alden
July 18, 2025

The interfaces between C and Rust in the kernel have grown over time; any non-trivial Rust driver will use a number of these. Tasks like allocating memory, dealing with immovable structures, and interacting with locks are necessary for handling most devices. There are also many subsystem-specific bindings, but the focus of this third item in our series on writing Rust in the kernel will be an overview of the bindings that all kernel Rust code can be expected to use.

Rust code can call C using the foreign function interface (FFI); given that, one potential way to integrate Rust into the kernel would have been to let Rust code call kernel C functions directly. There are a few problems with that approach, however: __always_inline functions, non-idiomatic APIs, and so on. In particular, C and Rust have different approaches to freeing memory and locking. During the early planning phases, the project adopted a rule that there should be a single, centralized set of Rust bindings for each subsystem, as explained in the kernel documentation.
This has the disadvantage (compared to direct use of Rust's FFI) of creating some extra work for a Rust programmer who wishes to call into a new area of the kernel, but as more bindings are written that need should go away over time. The advantage of the approach is that there is a single set of standardized Rust interfaces to learn, with all of the documentation in one place, which should make building and understanding the bindings less work overall. The interfaces can also be reviewed by the Rust maintainers in one place for safety and quality.

Allocating memory

Like C, Rust puts local variables (including compound structures) on the stack by default. But most programs will eventually need the flexibility offered by heap allocation, and the limitations on kernel-stack size mean that even purely local data may require heap allocation. In user space, Rust programs use automatic heap allocations for some types -- mainly Box (a smart pointer into the heap) and Vec (a growable, heap-allocated array). In the kernel, these interfaces would not provide nearly enough control. Instead, allocations are performed using the interfaces in the kernel::alloc module, which allow for specifying allocation flags and handling the possibility of failure.

The Rust interfaces support three ways to allocate kernel memory: Kmalloc, Vmalloc, and KVmalloc, corresponding to the memory-management API functions with similar names. The first two allocate physically contiguous memory or virtually contiguous memory, respectively. KVmalloc first tries to allocate physically contiguous memory, and then falls back to virtually contiguous memory. No matter which allocator is used, the pointers that are exposed to Rust are part of the virtual address space, as in C. These three types all implement the Allocator interface, which is similar to the unstable user-space trait of the same name.
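Kernel allocation flags have no direct user-space equivalent, but stable Rust's Vec::try_reserve() shows the same handle-the-failure pattern in miniature. The sketch below uses only the standard library; make_buffer() is a hypothetical helper, not a kernel API:

```rust
use std::collections::TryReserveError;

// Fallible allocation in user-space Rust: like the kernel's flag-taking
// constructors, `try_reserve()` reports allocation failure to the caller
// as a Result instead of aborting the process.
fn make_buffer(len: usize) -> Result<Vec<u8>, TryReserveError> {
    let mut buf = Vec::new();
    // Unlike `Vec::with_capacity()`, this returns Err on allocation failure.
    buf.try_reserve(len)?;
    buf.resize(len, 0);
    Ok(buf)
}

fn main() {
    let buf = make_buffer(4096).expect("allocation failed");
    assert_eq!(buf.len(), 4096);
    println!("allocated {} bytes", buf.len());
}
```

The kernel interfaces go further by also letting the caller say *how* the allocator may satisfy the request (blocking, reclaim, zone), which user-space Rust has no need for.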
While the allocators can be used to directly create a [u8] (a sized array of bytes; conceptually similar to how malloc() returns a void * instead of a specific type), the more ergonomic and less error-prone use is to allocate Box or Vec structures. Since memory allocation is so common, the interfaces provide short aliases for boxes and vectors made with each allocator, such as KBox, KVBox, VVec, etc. Reference-counted allocations can be made with Arc.

The choice of allocator is far from the only thing that kernel programmers care about when allocating memory, however. Depending on the context, it may or may not be acceptable to block, to swap, or to receive memory from a particular zone. When allocating, the flags in kernel::alloc::flags can be used to specify more details about how the necessary memory should be obtained:

    let boxed_integer: Result<KBox<u64>, AllocError> = KBox::new(42, GFP_KERNEL);

That example allocates an unsigned 64-bit integer, initialized to 42, with the usual set of allocation flags (GFP_KERNEL). For a small allocation like this, that likely means the memory will come from the kernel's slab allocator, possibly after triggering memory reclamation or blocking. This particular allocation cannot fail, but a larger one using the same API could, if there is no suitable memory available, even after reclamation. Therefore, the KBox::new() function doesn't return the resulting heap allocation directly. Instead, it returns a Result that contains either the successful heap allocation, or an AllocError.

Reading generic types

C doesn't really have an equivalent of Rust's generic types; the closest might be a macro that can be used to define a structure with different types substituted in for a field. In this case, the Result that KBox::new() returns has been given two additional types as parameters. The first is the data associated with a non-error result, and the second is the data associated with an error result.
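The two type parameters can be seen in action in a small user-space sketch; AllocError and fallible_box() below are illustrative stand-ins, not the kernel's types:

```rust
// A stand-in for the kernel's AllocError; deriving Debug and PartialEq
// lets it be printed and compared in assertions.
#[derive(Debug, PartialEq)]
struct AllocError;

// Result<Box<u64>, AllocError>: the first type parameter is the success
// payload (a heap allocation), the second is the error payload.
fn fallible_box(value: u64, fail: bool) -> Result<Box<u64>, AllocError> {
    if fail {
        Err(AllocError)
    } else {
        Ok(Box::new(value))
    }
}

fn main() {
    // On success, the Result carries the heap allocation...
    assert_eq!(*fallible_box(42, false).unwrap(), 42);
    // ...and on failure it carries the error value instead.
    assert_eq!(fallible_box(42, true), Err(AllocError));
}
```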
Matching angle brackets in a Rust type always play this role of specifying a (possibly optional) type to include as a field nested somewhere inside the structure.

Boxes, as smart pointers, have a few nice properties compared to raw pointers. A KBox is always initialized -- KBox::new() takes an initial value, as shown in the example above. Boxes are also automatically freed when they are no longer referenced, which is almost always what one wants from a heap allocation. When that isn't the case, the KBox::leak() or KBox::into_raw() methods can be used to override Rust's lifetime analysis and let the heap allocation live until the programmer takes care of it with KBox::from_raw().

Of course, there are also times when a programmer would like to allocate space on the heap, but not actually fill it with anything yet. For example, the Rust user-space memory bindings use it to allocate a buffer for user-space data to be copied into without initializing it. Rust indicates that a structure may be uninitialized by wrapping it in MaybeUninit; allocating a Box holding a MaybeUninit works just fine.

Self-referential structures

The kernel features a number of self-referential structures, such as doubly linked lists. Sharing these structures with Rust code poses a problem: moving a value that refers to itself (including indirectly) could cause the invariants of this kind of structure to be violated. For example, if a doubly linked list node is moved, node->prev->next will no longer refer to the right address. In C, programmers are expected to just not do that. But Rust tries to localize dangerous operations to areas of the code marked with unsafe. Moving values around is a common thing to do; it would be inconvenient if it were considered unsafe. To solve this, the Rust developers created an idea called "pinning", which is used to mark structures that cannot be safely relocated. The standard library is designed in such a way that these structures cannot be moved by accident.
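The effect of pinning can be demonstrated with the standard library alone; SelfRef and make_pinned() are hypothetical names for illustration:

```rust
use std::marker::PhantomPinned;
use std::pin::Pin;

// A struct that opts out of `Unpin`: once pinned, safe code can no longer
// move it, which is the guarantee self-referential structures need.
struct SelfRef {
    data: u8,
    _pin: PhantomPinned, // pretend something holds a pointer to `data`
}

// `Box::pin()` allocates on the heap and pins the value in one step, much
// as the kernel's `KBox::pin_init()` does for a `PinInit` initializer.
fn make_pinned(data: u8) -> Pin<Box<SelfRef>> {
    Box::pin(SelfRef { data, _pin: PhantomPinned })
}

fn main() {
    let pinned = make_pinned(7);
    // Shared access through the pin is fine; moving the `SelfRef` out of
    // its box, by contrast, would be a compile-time error.
    assert_eq!(pinned.data, 7);
}
```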
The Rust kernel developers imported the same idea into the kernel Rust APIs; when referencing a self-referential structure created in C, it must be wrapped in the Pin type on the Rust side. (Some other pointers in the kernel API, notably Arc, include an implicit Pin, so the wrapping may not always be visible.) It might not immediately cause problems if Pin were omitted in the Rust bindings for a self-referential structure, but it would still be unsound, since it could let ostensibly safe Rust driver code cause memory corruption.

To simplify the process of allocating a large structure with multiple pinned components, the Rust API includes the pin_init!() and try_pin_init!() macros. Prior to their inclusion in the kernel, creating a pinned allocation was a multi-step process using unsafe APIs. The macros work along with the #[pin_data] and #[pin] attributes in a structure's definition to build a custom initializer. These PinInit initializers represent the process of constructing a pinned structure. They can be written by hand, but the process is tedious, so the macros are normally used instead. Language-level support is the subject of ongoing debate in the Rust community. PinInit structures can be passed around or reused to build an initializer for a larger partially-pinned structure, before finally being given to an allocator to be turned into a real value of the appropriate type. See below for an example.

Locks

User-space Rust code typically organizes locks by having structures that wrap the data covered by the lock. The kernel API makes lock implementations matching that convention available. For example, a Mutex actually contains the data that it protects, so that it can ensure all accesses to the data are made with the Mutex locked. Since C code doesn't tend to work like this, the kernel's existing locking mechanisms don't translate directly into Rust.
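The same data-wrapping pattern exists in user-space Rust with std::sync::Mutex; this sketch mirrors the kernel idiom, with the caveat that std's lock() returns a Result because of lock poisoning, which the kernel type does not have:

```rust
use std::sync::Mutex;

// The Mutex owns the data it protects; the guard returned by `lock()`
// is the only way to reach it.
struct Configuration {
    data: Mutex<(Vec<u8>, usize)>,
}

fn show(container: &Configuration, page: &mut [u8]) -> usize {
    // std's lock() returns a Result because of poisoning; the kernel
    // Mutex returns the guard directly.
    let guard = container.data.lock().unwrap();
    let len = guard.1;
    page[0..len].copy_from_slice(&guard.0[0..len]);
    len
    // `guard` drops here, releasing the lock.
}

fn main() {
    let config = Configuration { data: Mutex::new((vec![9, 8, 7, 0], 3)) };
    let mut page = [0u8; 4];
    let n = show(&config, &mut page);
    assert_eq!((n, &page[..3]), (3, &[9u8, 8, 7][..]));
}
```

Trying to return a reference to the tuple past the end of show() would be rejected by the borrow checker, the same compile-time property described for the kernel bindings below.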
In addition to traditional Rust-style locks, the kernel's Rust APIs include special types for dealing with locks separated from the data they protect: LockedBy and GlobalLockedBy. These use Rust's lifetime system to enforce that a specific lock is held when the data is accessed. Currently, the Rust bindings in kernel::sync support spinlocks, mutexes, and read-side read-copy-update (RCU) locks. When asked to look over an early draft of this article, Benno Lossin warned that the current RCU support is "very barebones", but that the Rust developers plan to expand on it over time.

The spinlocks and mutexes in these bindings require a lockdep class key to create, so all of the locks used in Rust are automatically covered by the kernel's internal locking validator. Internally, this involves creating some self-referential state, so both spinlocks and mutexes must be pinned in order to be used. In all, defining a lock in Rust ends up looking like this example, lightly adapted from some of the Rust sample code:

    // The `#[pin_data]` macro builds the custom initializer for this type.
    #[pin_data]
    struct Configuration {
        #[pin]
        data: Mutex<(KBox<[u8; PAGE_SIZE]>, usize)>,
    }

    impl Configuration {
        // The value returned can be used to build a larger structure, or it can
        // be allocated on the heap with `KBox::pin_init()`.
        fn new() -> impl PinInit<Self, Error> {
            try_pin_init!(Self {
                // The `new_mutex!()` macro creates a new lockdep class and
                // initializes the mutex with it.
                data <- new_mutex!((KBox::new([0; PAGE_SIZE], flags::GFP_KERNEL)?, 0)),
            })
        }
    }

    // Once created, references to the structure containing the lock can be
    // passed around in the normal way.
    fn show(container: &Configuration, page: &mut [u8; PAGE_SIZE]) -> Result<usize> {
        // Calling the mutex's `lock()` function returns a smart pointer that
        // allows access only so long as the lock is held.
        let guard = container.data.lock();
        let data = guard.0.as_slice();
        let len = guard.1;
        page[0..len].copy_from_slice(&data[0..len]);
        Ok(len)
        // `guard` is automatically dropped at the end of its containing scope,
        // freeing the lock. Trying to return data from inside the lock past the
        // end of the function without copying it would be a compile-time error.
    }

Using a lock defined in C works much like in show() above, except that there is an additional step to handle the fact that the data may not be directly contained in the lock structure:

    // The C lock will still be released when `guard` goes out of scope.
    let guard = c_lock.lock();

    // Data that is marked as `LockedBy` in the Rust/C bindings takes a reference
    // to the guard of the matching lock as evidence that the lock has been
    // acquired.
    let data = some_other_structure.access(&guard);

See the LockedBy examples for a complete demonstration. The interface is slightly more conceptually complicated than C's mutex_lock() and mutex_unlock(), but it does have the nice property of producing a compiler error instead of a run-time error for many kinds of mistakes. The mutex in this example cannot be double-locked or double-freed, nor can the data be accessed without the lock held. It can still be locked from a non-sleepable context or get involved in a deadlock, however, so some care is still required -- at least until the custom tooling to track and enforce kernel locking rules at compile time is complete.

This kind of safer interface is, of course, the ultimate purpose behind introducing Rust bindings into the kernel -- to make it possible to write drivers where more errors can be caught at compile time. No machine-checked set of rules can catch everything, however, so the next (and likely final) article in this series will focus on things to look for when reviewing Rust patches.

-----------------------------------------

Purpose of this series?
Posted Jul 18, 2025 17:42 UTC (Fri) by willy (subscriber, #9762)

Maybe I misunderstood why you were writing this series, because I was expecting more along the lines of "if you know how to write Kernel C, this is how to write Kernel Rust". This article focuses on "This is how to write Rust bindings to C", which is a much more specialized thing to want to do.

Purpose of this series?

Posted Jul 18, 2025 18:13 UTC (Fri) by daroc (editor, #160859)

Yes, that is the goal of the series. So it's entirely possible that I've just failed to write something that lives up to that goal. My _intent_ with this article was to give people the library-level knowledge about kernel Rust that they would need (to go with the build-system-level and introductory language-level knowledge from the first two articles). But if it came across as being more about how to write the Rust bindings than about "these are the bindings you are almost certainly going to have to use, here are the things that are different than C", then that's my mistake.

Purpose of this series?

Posted Jul 18, 2025 18:41 UTC (Fri) by cpitrat (subscriber, #116459)

I'm confused; to me the article read a lot like "how to use bindings", not "how to write bindings". Which part describes writing bindings?

Purpose of this series?

Posted Jul 19, 2025 4:18 UTC (Sat) by lambda (subscriber, #40735)

I think that a lot of times, in order to understand how something works, you have to learn a little bit about how it's made. To learn how to use kernel Rust bindings, you need to learn a little bit about how and why they are built that way. I would say this gives some good background on why certain aspects of the Rust bindings are the way they are, which helps you understand how to use them.

Purpose of this series?
Posted Jul 19, 2025 6:04 UTC (Sat) by adobriyan (guest, #30858)

1) learn what references are: &T, &mut T

Everything revolves around references and destructive move. T&& from C++ is not a thing.

2) arithmetic evaluates from left to right

This is important because overflow checks are done per individual operation. Given that integer overflow panics(!), everything that comes from user space must be checked with some combination of the checked_*/overflowing_* stuff. There is even an example in the kernel: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/...

3) variable shadowing is allowed, even encouraged

I'm sure Misra Rust will eventually ban it, but normal programmers use it to their advantage.

4) hiding temporary stuff in blocks is encouraged to minimise mistakes:

    let buf = {
        let mut f = File::open(&filename)?;
        let mut buf = vec![];
        f.read_to_end(&mut buf)?;
        buf
    };

Things to unlearn from C:

1) variables are declared as you go; no more big declaration blocks at the beginning

2) the top level is unordered, so forward declarations aren't a thing (C++ has this for methods inside struct/class scope)

3) merging types like the kernel does with ERR_PTR is a cardinal sin

4) if you wrote lots of mutable variables, you're probably doing something wrong

5) functions can return multiple values; returning stuff via pointers/references is another cardinal sin

Purpose of this series?

Posted Jul 19, 2025 6:07 UTC (Sat) by adobriyan (guest, #30858)

And of course I forgot the deterministic destructor running at the end of the scope (or earlier with std::mem::drop()). Linux is getting a taste of this implicitness (which is not scary at all if done right) with __attribute__((cleanup)), which is a badly done counterfeit version.

Purpose of this series?

Posted Jul 19, 2025 14:29 UTC (Sat) by iabervon (subscriber, #722)

The thing about variable shadowing is that variables often become invalid as their ownership is given away.
This makes it better rather than worse to reuse their names, especially in the case where the new variable is a destructive transformation of the old variable; if you have a typo in the shadowing declaration, and use the name later, C will use the corrupted value (which is why Misra doesn't like it), but Rust will give you an error instead. Possibly Misra will ban shadowing a non-moved value, but I think that still allows the normal usage of shadowing.

Pinning continues to be the most difficult aspect of Rust to understand

Posted Jul 18, 2025 20:02 UTC (Fri) by NYKevin (subscriber, #129325)

Pinning is unfortunately rather difficult to follow even in userspace. I would suggest anyone who's struggling with Pin read the Rust userspace documentation (for the std::pin module) to get a better understanding of how it works, but here's a basic summary:

1. By default, anything can be moved at any time. It is also safe (but usually bad practice) to reuse an object's memory without dropping it (in C++ terms: all types are trivially destructible, so the destructor may not be used to uphold a safety invariant). You can even do the latter in safe Rust with MaybeUninit::write(). As a reminder, moving is always equivalent to calling memcpy() and then vanishing the original in a puff of logic (i.e., setting a flag so that its drop glue does not run, removing its binding from the scope so that safe Rust can no longer interact with it; in most cases the memory is ultimately deallocated by one means or another), but the compiler is permitted to optimize the resulting code as it sees fit.

2. If Ptr is a smart pointer type (like Box) or either of &T or &mut T, then whenever a Pin<Ptr<T>> exists, rule 1 is suspended for the pointee (the T instance). The pointee is not allowed to be moved, and its memory may not be reused until it is properly dropped (the T is "pinned").
This is considered a safety invariant, and T (or code that interacts with T) is allowed to invoke UB if it is violated. Importantly, only the pointee is pinned, so the Ptr instance can still be freely moved. This rule applies on a per-instance basis -- other instances of T are unaffected and continue to follow rule 1 (unless they have been pinned separately).

3. If T implements the trait Unpin, then pinning it has no effect and rule 2 is entirely ignored (rule 1 is reinstated for every instance of T, regardless of whether it is pinned). Because of the orphan rule, you're only allowed to implement Unpin on a type that you defined (in the same module as the Unpin implementation), so you can't go around disabling the safety invariants on foreign code. Most "simple" types implement Unpin automatically -- implementing Unpin is the usual state of affairs, and can be understood as "this type never cares if it gets moved around." For example, an i64 in isolation will not "break" if it gets moved or overwritten, so i64 implements Unpin. But a struct containing an i64 might have other fields that do care about their addresses, or the struct as a whole might care about its address (due to its relationship with some other piece of code), so the author can decide whether the struct implements Unpin or not. The default is to auto-implement Unpin iff all field types implement Unpin, but this may be overridden.

4. Rule 1 is a language-level rule, and rules 2 and 3 are (mostly) library-level rules (except for auto-implementation of Unpin, which requires a tiny amount of language support). This is the reason that pinning is so weird -- it has to work around the language's implicit assumption that pinning is Not A Thing.
In practice, this consists of convincing the borrow checker to disallow operations that violate the pinning invariant, but the double indirection of Pin<Ptr<T>> makes it rather more convoluted than we might otherwise expect (you can never allow &mut T to "escape" the Pin, or else std::mem::swap() etc. could be used to move it). There has been significant discussion of how and whether to promote (2) and (3) into language-level rules so that pinning can become less complicated and easier to understand, but there are still quite a few open questions about exactly how it should work.

There are a number of other complications described in std::pin's documentation, but I won't go into them here, because otherwise this comment would triple in length. If the above rules leave you with followup questions, I strongly encourage reading that documentation -- it really is quite comprehensive. But here are some simple points to answer "obvious" questions:

* Technically, Ptr can be anything that implements Deref and does not need to take a type parameter at all, so the pedantically correct way to write it is Pin<P> where P: Deref. That's harder to read, so we usually write Pin<Ptr<T>> when speaking informally.

* Almost every type that (deliberately) does not implement Unpin will need at least a little bit of unsafe boilerplate to deal with various infelicities in the pin API. In the case of Linux, some of this boilerplate is generated with macros in the pin_init crate.

* Pinning a struct may or may not have the effect of pinning its fields (pinning may be "structural" or not for each field). It's up to the struct author to decide which behavior is more correct for a given field (depending on exactly what invariants the author wishes the struct as a whole to uphold).
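A minimal user-space sketch of how rule 3 plays out in practice (unpin_roundtrip() is a hypothetical helper):

```rust
use std::marker::PhantomPinned;
use std::pin::Pin;

// `i64` implements `Unpin`, so pinning it "has no effect":
// `Pin::into_inner()` hands the value back to safe code, and it can be
// moved again freely afterward.
fn unpin_roundtrip(value: i64) -> i64 {
    let pinned: Pin<Box<i64>> = Box::pin(value);
    *Pin::into_inner(pinned)
}

fn main() {
    assert_eq!(unpin_roundtrip(42), 42);

    // By contrast, a type containing `PhantomPinned` is `!Unpin`; the same
    // `Pin::into_inner()` call would not compile for it, which is how the
    // pinning invariant of rule 2 is enforced at the library level.
    struct Addressed {
        _pin: PhantomPinned,
    }
    let _pinned: Pin<Box<Addressed>> = Box::pin(Addressed { _pin: PhantomPinned });
}
```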
Readability Difficulty

Posted Jul 19, 2025 3:54 UTC (Sat) by PengZheng (subscriber, #108006)

> data <- new_mutex!((KBox::new([0; PAGE_SIZE], flags::GFP_KERNEL)?, 0)),

I found this line extremely difficult to read, since human eyes are really not good at matching parentheses.

Readability Difficulty

Posted Jul 19, 2025 6:42 UTC (Sat) by burki99 (subscriber, #17149)

Lisp programmers might disagree :-)

Readability Difficulty

Posted Jul 19, 2025 8:14 UTC (Sat) by Wol (subscriber, #4433)

Likewise PL/1 :-)

Cheers,
Wol

Readability Difficulty

Posted Jul 19, 2025 10:04 UTC (Sat) by DOT (subscriber, #58786)

Some newlines and indentation make it bulky, but much easier to parse:

    data <- new_mutex!(
        (
            KBox::new([0; PAGE_SIZE], flags::GFP_KERNEL)?,
            0,
        )
    ),

A good guideline might be to break out into multiple lines whenever you would get nested brackets of the same type. Brackets of different types seem to be easier to parse on one line.

Copyright (c) 2025, Eklektix, Inc. Comments and public postings are copyrighted by their creators. Linux is a registered trademark of Linus Torvalds