https://lwn.net/SubscriberLink/1026694/3413f4b43c862629/

How to write Rust in the kernel: part 3

By Daroc Alden
July 18, 2025

The interfaces between C and Rust in the kernel have grown over time; any non-trivial Rust driver will use a number of these. Tasks like allocating memory, dealing with immovable structures, and interacting with locks are necessary for handling most devices. There are also many subsystem-specific bindings, but the focus of this third item in our series on writing Rust in the kernel will be an overview of the bindings that all kernel Rust code can be expected to use.

Rust code can call C using the foreign function interface (FFI); given that, one potential way to integrate Rust into the kernel would have been to let Rust code call kernel C functions directly. There are a few problems with that approach, however: __always_inline functions, non-idiomatic APIs, and so on. In particular, C and Rust have different approaches to freeing memory and locking. During the early planning phases, the project adopted a rule that there should be a single, centralized set of Rust bindings for each subsystem, as explained in the kernel documentation.
This has the disadvantage (compared to direct use of Rust's FFI) of creating some extra work for a Rust programmer who wishes to call into a new area of the kernel, but as more bindings are written that need should go away over time. The advantage of the approach is that there is a single set of standardized Rust interfaces to learn, with all of the documentation in one place, which should make building and understanding the bindings less work overall. The interfaces can also be reviewed by the Rust maintainers in one place for safety and quality.

Allocating memory

Like C, Rust puts local variables (including compound structures) on the stack by default. But most programs will eventually need the flexibility offered by heap allocation, and the limitations on kernel-stack size mean that even purely local data may require heap allocation. In user space, Rust programs use automatic heap allocations for some types -- mainly Box (a smart pointer into the heap) and Vec (a growable, heap-allocated array). In the kernel, these interfaces would not provide nearly enough control. Instead, allocations are performed using the interfaces in the kernel::alloc module, which allow for specifying allocation flags and handling the possibility of failure.

The Rust interfaces support three ways to allocate kernel memory: Kmalloc, Vmalloc, and KVmalloc, corresponding to the memory-management API functions with similar names. The first two allocate physically contiguous memory or virtually contiguous memory, respectively. KVmalloc first tries to allocate physically contiguous memory, and then falls back to virtually contiguous memory. No matter which allocator is used, the pointers that are exposed to Rust are part of the virtual address space, as in C. These three types all implement the Allocator interface, which is similar to the unstable user-space trait of the same name.
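Kernel allocation flags have no direct user-space equivalent, but stable Rust's Vec::try_reserve() shows the same handle-the-failure pattern in miniature. The sketch below uses only the standard library; make_buffer() is a hypothetical helper, not a kernel API:

```rust
use std::collections::TryReserveError;

// Fallible allocation in user-space Rust: like the kernel's flag-taking
// constructors, `try_reserve()` reports allocation failure to the caller
// as a Result instead of aborting the process.
fn make_buffer(len: usize) -> Result<Vec<u8>, TryReserveError> {
    let mut buf = Vec::new();
    // Unlike `Vec::with_capacity()`, this returns Err on allocation failure.
    buf.try_reserve(len)?;
    buf.resize(len, 0);
    Ok(buf)
}

fn main() {
    let buf = make_buffer(4096).expect("allocation failed");
    assert_eq!(buf.len(), 4096);
    println!("allocated {} bytes", buf.len());
}
```

The kernel interfaces go further by also letting the caller say *how* the allocator may satisfy the request (blocking, reclaim, zone), which user-space Rust has no need for.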
While the allocators can be used to directly create a [u8] (a sized array of bytes; conceptually similar to how malloc() returns a void * instead of a specific type), the more ergonomic and less error-prone use is to allocate Box or Vec structures. Since memory allocation is so common, the interfaces provide short aliases for boxes and vectors made with each allocator, such as KBox, KVBox, VVec, etc. Reference-counted allocations can be made with Arc.

The choice of allocator is far from the only thing that kernel programmers care about when allocating memory, however. Depending on the context, it may or may not be acceptable to block, to swap, or to receive memory from a particular zone. When allocating, the flags in kernel::alloc::flags can be used to specify more details about how the necessary memory should be obtained:

    let boxed_integer: Result<KBox<u64>, AllocError> = KBox::new(42, GFP_KERNEL);

That example allocates an unsigned 64-bit integer, initialized to 42, with the usual set of allocation flags (GFP_KERNEL). For a small allocation like this, that likely means the memory will come from the kernel's slab allocator, possibly after triggering memory reclamation or blocking. This particular allocation cannot fail, but a larger one using the same API could, if there is no suitable memory available, even after reclamation. Therefore, the KBox::new() function doesn't return the resulting heap allocation directly. Instead, it returns a Result that contains either the successful heap allocation, or an AllocError.

Reading generic types

C doesn't really have an equivalent of Rust's generic types; the closest might be a macro that can be used to define a structure with different types substituted in for a field. In this case, the Result that KBox::new() returns has been given two additional types as parameters. The first is the data associated with a non-error result, and the second is the data associated with an error result.
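The two type parameters can be seen in action in a small user-space sketch; AllocError and fallible_box() below are illustrative stand-ins, not the kernel's types:

```rust
// A stand-in for the kernel's AllocError; deriving Debug and PartialEq
// lets it be printed and compared in assertions.
#[derive(Debug, PartialEq)]
struct AllocError;

// Result<Box<u64>, AllocError>: the first type parameter is the success
// payload (a heap allocation), the second is the error payload.
fn fallible_box(value: u64, fail: bool) -> Result<Box<u64>, AllocError> {
    if fail {
        Err(AllocError)
    } else {
        Ok(Box::new(value))
    }
}

fn main() {
    // On success, the Result carries the heap allocation...
    assert_eq!(*fallible_box(42, false).unwrap(), 42);
    // ...and on failure it carries the error value instead.
    assert_eq!(fallible_box(42, true), Err(AllocError));
}
```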
Matching angle brackets in a Rust type always play this role of specifying a (possibly optional) type to include as a field nested somewhere inside the structure.

Boxes, as smart pointers, have a few nice properties compared to raw pointers. A KBox is always initialized -- KBox::new() takes an initial value, as shown in the example above. Boxes are also automatically freed when they are no longer referenced, which is almost always what one wants from a heap allocation. When that isn't the case, the KBox::leak() or KBox::into_raw() methods can be used to override Rust's lifetime analysis and let the heap allocation live until the programmer takes care of it with KBox::from_raw().

Of course, there are also times when a programmer would like to allocate space on the heap, but not actually fill it with anything yet. For example, the Rust user-space memory bindings use it to allocate a buffer for user-space data to be copied into without initializing it. Rust indicates that a structure may be uninitialized by wrapping it in MaybeUninit; allocating a Box holding a MaybeUninit works just fine.

Self-referential structures

The kernel features a number of self-referential structures, such as doubly linked lists. Sharing these structures with Rust code poses a problem: moving a value that refers to itself (including indirectly) could cause the invariants of this kind of structure to be violated. For example, if a doubly linked list node is moved, node->prev->next will no longer refer to the right address. In C, programmers are expected to just not do that. But Rust tries to localize dangerous operations to areas of the code marked with unsafe. Moving values around is a common thing to do; it would be inconvenient if it were considered unsafe. To solve this, the Rust developers created an idea called "pinning", which is used to mark structures that cannot be safely relocated. The standard library is designed in such a way that these structures cannot be moved by accident.
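The effect of pinning can be demonstrated with the standard library alone; SelfRef and make_pinned() are hypothetical names for illustration:

```rust
use std::marker::PhantomPinned;
use std::pin::Pin;

// A struct that opts out of `Unpin`: once pinned, safe code can no longer
// move it, which is the guarantee self-referential structures need.
struct SelfRef {
    data: u8,
    _pin: PhantomPinned, // pretend something holds a pointer to `data`
}

// `Box::pin()` allocates on the heap and pins the value in one step, much
// as the kernel's `KBox::pin_init()` does for a `PinInit` initializer.
fn make_pinned(data: u8) -> Pin<Box<SelfRef>> {
    Box::pin(SelfRef { data, _pin: PhantomPinned })
}

fn main() {
    let pinned = make_pinned(7);
    // Shared access through the pin is fine; moving the `SelfRef` out of
    // its box, by contrast, would be a compile-time error.
    assert_eq!(pinned.data, 7);
}
```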
The Rust kernel developers imported the same idea into the kernel Rust APIs; when referencing a self-referential structure created in C, it must be wrapped in the Pin type on the Rust side. (Some other pointers in the kernel API, notably Arc, include an implicit Pin, so the wrapping may not always be visible.) It might not immediately cause problems if Pin were omitted in the Rust bindings for a self-referential structure, but it would still be unsound, since it could let ostensibly safe Rust driver code cause memory corruption.

To simplify the process of allocating a large structure with multiple pinned components, the Rust API includes the pin_init!() and try_pin_init!() macros. Prior to their inclusion in the kernel, creating a pinned allocation was a multi-step process using unsafe APIs. The macros work along with the #[pin_data] and #[pin] attributes in a structure's definition to build a custom initializer. These PinInit initializers represent the process of constructing a pinned structure. They can be written by hand, but the process is tedious, so the macros are normally used instead. Language-level support is the subject of ongoing debate in the Rust community. PinInit structures can be passed around or reused to build an initializer for a larger partially-pinned structure, before finally being given to an allocator to be turned into a real value of the appropriate type. See below for an example.

Locks

User-space Rust code typically organizes locks by having structures that wrap the data covered by the lock. The kernel API makes lock implementations matching that convention available. For example, a Mutex actually contains the data that it protects, so that it can ensure all accesses to the data are made with the Mutex locked. Since C code doesn't tend to work like this, the kernel's existing locking mechanisms don't translate directly into Rust.
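The same data-wrapping pattern exists in user-space Rust with std::sync::Mutex; this sketch mirrors the kernel idiom, with the caveat that std's lock() returns a Result because of lock poisoning, which the kernel type does not have:

```rust
use std::sync::Mutex;

// The Mutex owns the data it protects; the guard returned by `lock()`
// is the only way to reach it.
struct Configuration {
    data: Mutex<(Vec<u8>, usize)>,
}

fn show(container: &Configuration, page: &mut [u8]) -> usize {
    // std's lock() returns a Result because of poisoning; the kernel
    // Mutex returns the guard directly.
    let guard = container.data.lock().unwrap();
    let len = guard.1;
    page[0..len].copy_from_slice(&guard.0[0..len]);
    len
    // `guard` drops here, releasing the lock.
}

fn main() {
    let config = Configuration { data: Mutex::new((vec![9, 8, 7, 0], 3)) };
    let mut page = [0u8; 4];
    let n = show(&config, &mut page);
    assert_eq!((n, &page[..3]), (3, &[9u8, 8, 7][..]));
}
```

Trying to return a reference to the tuple past the end of show() would be rejected by the borrow checker, the same compile-time property described for the kernel bindings below.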
In addition to traditional Rust-style locks, the kernel's Rust APIs include special types for dealing with locks separated from the data they protect: LockedBy and GlobalLockedBy. These use Rust's lifetime system to enforce that a specific lock is held when the data is accessed. Currently, the Rust bindings in kernel::sync support spinlocks, mutexes, and read-side read-copy-update (RCU) locks. When asked to look over an early draft of this article, Benno Lossin warned that the current RCU support is "very barebones", but that the Rust developers plan to expand on it over time.

The spinlocks and mutexes in these bindings require a lockdep class key to create, so all of the locks used in Rust are automatically covered by the kernel's internal locking validator. Internally, this involves creating some self-referential state, so both spinlocks and mutexes must be pinned in order to be used. In all, defining a lock in Rust ends up looking like this example, lightly adapted from some of the Rust sample code:

    // The `#[pin_data]` macro builds the custom initializer for this type.
    #[pin_data]
    struct Configuration {
        #[pin]
        data: Mutex<(KBox<[u8; PAGE_SIZE]>, usize)>,
    }

    impl Configuration {
        // The value returned can be used to build a larger structure, or it can
        // be allocated on the heap with `KBox::pin_init()`.
        fn new() -> impl PinInit<Self, Error> {
            try_pin_init!(Self {
                // The `new_mutex!()` macro creates a new lockdep class and
                // initializes the mutex with it.
                data <- new_mutex!((KBox::new([0; PAGE_SIZE], flags::GFP_KERNEL)?, 0)),
            })
        }
    }

    // Once created, references to the structure containing the lock can be
    // passed around in the normal way.
    fn show(container: &Configuration, page: &mut [u8; PAGE_SIZE]) -> Result<usize> {
        // Calling the mutex's `lock()` function returns a smart pointer that
        // allows access only so long as the lock is held.
        let guard = container.data.lock();
        let data = guard.0.as_slice();
        let len = guard.1;
        page[0..len].copy_from_slice(&data[0..len]);
        Ok(len)
        // `guard` is automatically dropped at the end of its containing scope,
        // freeing the lock. Trying to return data from inside the lock past the
        // end of the function without copying it would be a compile-time error.
    }

Using a lock defined in C works much like in show() above, except that there is an additional step to handle the fact that the data may not be directly contained in the lock structure:

    // The C lock will still be released when `guard` goes out of scope.
    let guard = c_lock.lock();

    // Data that is marked as `LockedBy` in the Rust/C bindings takes a reference
    // to the guard of the matching lock as evidence that the lock has been
    // acquired.
    let data = some_other_structure.access(&guard);

See the LockedBy examples for a complete demonstration. The interface is slightly more conceptually complicated than C's mutex_lock() and mutex_unlock(), but it does have the nice property of producing a compiler error instead of a run-time error for many kinds of mistakes. The mutex in this example cannot be double-locked or double-freed, nor can the data be accessed without the lock held. It can still be locked from a non-sleepable context or get involved in a deadlock, however, so some care is still required -- at least until the custom tooling to track and enforce kernel locking rules at compile time is complete.

This kind of safer interface is, of course, the ultimate purpose behind introducing Rust bindings into the kernel -- to make it possible to write drivers where more errors can be caught at compile time. No machine-checked set of rules can catch everything, however, so the next (and likely final) article in this series will focus on things to look for when reviewing Rust patches.

-----------------------------------------

Purpose of this series?
Posted Jul 18, 2025 17:42 UTC (Fri) by willy (subscriber, #9762)

Maybe I misunderstood why you were writing this series, because I was expecting more along the lines of "if you know how to write Kernel C, this is how to write Kernel Rust". This article focuses on "This is how to write Rust bindings to C", which is a much more specialized thing to want to do.

Purpose of this series?

Posted Jul 18, 2025 18:13 UTC (Fri) by daroc (editor, #160859)

Yes, that is the goal of the series. So it's entirely possible that I've just failed to write something that lives up to that goal. My _intent_ with this article was to give people the library-level knowledge about kernel Rust that they would need (to go with the build-system-level and introductory language-level knowledge from the first two articles). But if it came across as being more about how to write the Rust bindings than about "these are the bindings you are almost certainly going to have to use, here are the things that are different than C", then that's my mistake.

Purpose of this series?

Posted Jul 18, 2025 18:41 UTC (Fri) by cpitrat (subscriber, #116459)

I'm confused; to me the article read a lot like "how to use bindings", not "how to write bindings". Which part describes writing bindings?

Purpose of this series?

Posted Jul 19, 2025 4:18 UTC (Sat) by lambda (subscriber, #40735)

I think that a lot of times, in order to understand how something works, you have to learn a little bit about how it's made. To learn how to use kernel Rust bindings, you need to learn a little bit about how and why they are built that way. I would say this gives some good background on why certain aspects of the Rust bindings are the way they are, which helps you understand how to use them.

Purpose of this series?
Posted Jul 19, 2025 6:04 UTC (Sat) by adobriyan (guest, #30858)

1) learn what references are: &T, &mut T

Everything revolves around references and destructive move. T&& from C++ is not a thing.

2) arithmetic evaluates from left to right

This is important because overflow checks are done per individual operation. Given that integer overflow panics(!), everything that comes from user space must be checked with some combination of the checked_*/overflowing_* stuff. There is even an example in the kernel: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/...

3) variable shadowing is allowed, even encouraged

I'm sure Misra Rust will eventually ban it, but normal programmers use it to their advantage.

4) hiding temporary stuff in blocks is encouraged to minimise mistakes:

    let buf = {
        let mut f = File::open(&filename)?;
        let mut buf = vec![];
        f.read_to_end(&mut buf)?;
        buf
    };

Things to unlearn from C:

1) variables are declared as you go; no more big declaration blocks at the beginning

2) the top level is unordered, so forward declarations aren't a thing (C++ has this for methods inside struct/class scope)

3) merging types like the kernel does with ERR_PTR is a cardinal sin

4) if you wrote lots of mutable variables, you're probably doing something wrong

5) functions can return multiple values; returning stuff via pointers/references is another cardinal sin

Purpose of this series?

Posted Jul 19, 2025 6:07 UTC (Sat) by adobriyan (guest, #30858)

And of course I forgot the deterministic destructor running at the end of the scope (or earlier with std::mem::drop()). Linux is getting a taste of this implicitness (which is not scary at all if done right) with __attribute__((cleanup)), which is a badly done counterfeit version.

Purpose of this series?

Posted Jul 19, 2025 14:29 UTC (Sat) by iabervon (subscriber, #722)

The thing about variable shadowing is that variables often become invalid as their ownership is given away.
This makes it better rather than worse to reuse their names, especially in the case where the new variable is a destructive transformation of the old variable; if you have a typo in the shadowing declaration, and use the name later, C will use the corrupted value (which is why Misra doesn't like it), but Rust will give you an error instead. Possibly Misra will ban shadowing a non-moved value, but I think that still allows the normal usage of shadowing.

Pinning continues to be the most difficult aspect of Rust to understand

Posted Jul 18, 2025 20:02 UTC (Fri) by NYKevin (subscriber, #129325)

Pinning is unfortunately rather difficult to follow even in userspace. I would suggest anyone who's struggling with Pin read the Rust userspace documentation (for the std::pin module) to get a better understanding of how it works, but here's a basic summary:

1. By default, anything can be moved at any time. It is also safe (but usually bad practice) to reuse an object's memory without dropping it (in C++ terms: all types are trivially destructible, so the destructor may not be used to uphold a safety invariant). You can even do the latter in safe Rust with MaybeUninit::write(). As a reminder, moving is always equivalent to calling memcpy() and then vanishing the original in a puff of logic (i.e., setting a flag so that its drop glue does not run, removing its binding from the scope so that safe Rust can no longer interact with it; in most cases the memory is ultimately deallocated by one means or another), but the compiler is permitted to optimize the resulting code as it sees fit.

2. If Ptr is a smart pointer type (like Box) or either of &T or &mut T, then whenever a Pin<Ptr<T>> exists, rule 1 is suspended for the pointee (the T instance). The pointee is not allowed to be moved, and its memory may not be reused until it is properly dropped (the T is "pinned").
This is considered a safety invariant, and T (or code that interacts with T) is allowed to invoke UB if it is violated. Importantly, only the pointee is pinned, so the Ptr instance can still be freely moved. This rule applies on a per-instance basis -- other instances of T are unaffected and continue to follow rule 1 (unless they have been pinned separately).

3. If T implements the trait Unpin, then pinning it has no effect and rule 2 is entirely ignored (rule 1 is reinstated for every instance of T, regardless of whether it is pinned). Because of the orphan rule, you're only allowed to implement Unpin on a type that you defined (in the same module as the Unpin implementation), so you can't go around disabling the safety invariants on foreign code. Most "simple" types implement Unpin automatically -- implementing Unpin is the usual state of affairs, and can be understood as "this type never cares if it gets moved around." For example, an i64 in isolation will not "break" if it gets moved or overwritten, so i64 implements Unpin. But a struct containing an i64 might have other fields that do care about their addresses, or the struct as a whole might care about its address (due to its relationship with some other piece of code), so the author can decide whether the struct implements Unpin or not. The default is to auto-implement Unpin iff all field types implement Unpin, but this may be overridden.

4. Rule 1 is a language-level rule, and rules 2 and 3 are (mostly) library-level rules (except for auto-implementation of Unpin, which requires a tiny amount of language support). This is the reason that pinning is so weird -- it has to work around the language's implicit assumption that pinning is Not A Thing.
In practice, this consists of convincing the borrow checker to disallow operations that violate the pinning invariant, but the double indirection of Pin<Ptr<T>> makes it rather more convoluted than we might otherwise expect (you can never allow &mut T to "escape" the Pin, or else std::mem::swap() etc. could be used to move it). There has been significant discussion of how and whether to promote (2) and (3) into language-level rules so that pinning can become less complicated and easier to understand, but there are still quite a few open questions about exactly how it should work.

There are a number of other complications described in std::pin's documentation, but I won't go into them here, because otherwise this comment would triple in length. If the above rules leave you with followup questions, I strongly encourage reading that documentation -- it really is quite comprehensive. But here are some simple points to answer "obvious" questions:

* Technically, Ptr can be anything that implements Deref and does not need to take a type parameter at all, so the pedantically correct way to write it is Pin<P> where P: Deref. That's harder to read, so we usually write Pin<Ptr<T>> when speaking informally.

* Almost every type that (deliberately) does not implement Unpin will need at least a little bit of unsafe boilerplate to deal with various infelicities in the pin API. In the case of Linux, some of this boilerplate is generated with macros in the pin_init crate.

* Pinning a struct may or may not have the effect of pinning its fields (pinning may be "structural" or not for each field). It's up to the struct author to decide which behavior is more correct for a given field (depending on exactly what invariants the author wishes the struct as a whole to uphold).
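A minimal user-space sketch of how rule 3 plays out in practice (unpin_roundtrip() is a hypothetical helper):

```rust
use std::marker::PhantomPinned;
use std::pin::Pin;

// `i64` implements `Unpin`, so pinning it "has no effect":
// `Pin::into_inner()` hands the value back to safe code, and it can be
// moved again freely afterward.
fn unpin_roundtrip(value: i64) -> i64 {
    let pinned: Pin<Box<i64>> = Box::pin(value);
    *Pin::into_inner(pinned)
}

fn main() {
    assert_eq!(unpin_roundtrip(42), 42);

    // By contrast, a type containing `PhantomPinned` is `!Unpin`; the same
    // `Pin::into_inner()` call would not compile for it, which is how the
    // pinning invariant of rule 2 is enforced at the library level.
    struct Addressed {
        _pin: PhantomPinned,
    }
    let _pinned: Pin<Box<Addressed>> = Box::pin(Addressed { _pin: PhantomPinned });
}
```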
Readability Difficulty

Posted Jul 19, 2025 3:54 UTC (Sat) by PengZheng (subscriber, #108006)

> data <- new_mutex!((KBox::new([0; PAGE_SIZE], flags::GFP_KERNEL)?, 0)),

I found this line extremely difficult to read, since human eyes are really not good at matching parentheses.

Readability Difficulty

Posted Jul 19, 2025 6:42 UTC (Sat) by burki99 (subscriber, #17149)

Lisp programmers might disagree :-)

Readability Difficulty

Posted Jul 19, 2025 8:14 UTC (Sat) by Wol (subscriber, #4433)

Likewise PL/1 :-)

Cheers,
Wol

Readability Difficulty

Posted Jul 19, 2025 10:04 UTC (Sat) by DOT (subscriber, #58786)

Some newlines and indentation make it bulky, but much easier to parse:

    data <- new_mutex!(
        (
            KBox::new([0; PAGE_SIZE], flags::GFP_KERNEL)?,
            0,
        )
    ),

A good guideline might be to break out into multiple lines whenever you would get nested brackets of the same type. Brackets of different types seem to be easier to parse on one line.

Copyright (c) 2025, Eklektix, Inc. Comments and public postings are copyrighted by their creators. Linux is a registered trademark of Linus Torvalds