https://q3k.org/lanai.html
Lanai, the mystery CPU architecture in LLVM.
Disclaimer: I have had access to some confidential information about
some of the matter discussed in this page. However, everything
written here is derived from publicly available sources, and
references to these sources are also provided.
Some of my recent long-term projects revolve around a little known
CPU architecture called 'Lanai'. Unsurprisingly, very few people have
heard of it, and even good Googling skills won't get you far. This
page is a short summary of what I know, and should serve as a
reference for future questions.
Myricom & the origins of Lanai
Myricom is a hardware company founded in 1994. One of their early
products was a family of network interface cards and an accompanying
protocol, Myrinet. I don't know much about it, other than that it did
some funky stuff with wormhole routing.
As part of their network interface card design, they introduced data
plane programmability with the help of a small RISC core they named
LANai. It originally ran at 33MHz, the speed of the PCI bus on which
the cards were operating. These cores were quite well documented on
the Myricom website, seemingly because end-user programmability was a
selling point of their devices.
It's worth noting that multiple versions of LANai/Lanai have been
released. The last publicly documented version on the old Myricom
website is Lanai3/4. Apart from the documentation, sources for a
gcc/binutils fork still exist on Myricom's GitHub.
At some point, however, Myricom stopped publicly documenting the
programmability of their network cards, although documentation and
the SDK were still available on request. Some papers and research
websites even contain tutorials on getting started with what were
then the newest versions of the SDK, and document the differences
between the last documented Lanai3/4 version and newer releases of
the architecture/core.
This closing down of the Lanai core documentation by Myricom didn't
mean they stopped using it in their subsequent cards. The core made
its way into their Ethernet offerings (after Myrinet basically died),
like their 10GbE network cards. You can easily find these 10G cards
on eBay, and they even have the word 'Lanai' written on their main
ASIC package. Even more interestingly, Lanai binaries are shipped
with Linux firmware packages, and can be chucked straight into a
Lanai disassembler (e.g. the Myricom binutils fork's objdump).
Technical summary of Lanai3/4
* 32 registers, most of them general purpose, with special
treatment for R0 (all zeroes), R1 (all ones), R2 (the program
counter), R3 (status register), and some registers allocated for
mode/context switching.
* 4-stage RISC-style pipeline: Calculate Address, Fetch, Compute,
Memory
* Delay slot based pipeline hazard resolution
* No multiplication, no division. It's meant to route packets, not
crunch numbers.
* The world's best instruction mnemonic: PUNT, to switch between
user and system contexts.
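To make the delay-slot bullet concrete, here is a toy model (not a description of the real pipeline): a taken branch only redirects fetch after the instruction sitting right behind it has also executed. The `run` function, the dict-of-tuples program format and treating `bt` as an always-taken branch are my own simplifications; the addresses mirror the loop in the sample further down.

```python
def run(program, start, max_steps):
    """Toy fetch loop with a single branch delay slot.

    `program` maps an address to an (opcode, operand) tuple. Only "bt" is
    interpreted (as an always-taken branch to `operand`); every other opcode
    is treated as a no-op. Returns the addresses of executed instructions.
    """
    trace = []
    pc = start
    pending_target = None  # branch target waiting for the delay slot to execute
    for _ in range(max_steps):
        opcode, operand = program[pc]
        trace.append(pc)
        if pending_target is not None:
            # This instruction sat in the delay slot; now honour the branch.
            next_pc, pending_target = pending_target, None
        else:
            next_pc = pc + 4
        if opcode == "bt":
            pending_target = operand
        pc = next_pc
    return trace


loop = {
    0x124: ("and", None),
    0x128: ("st", None),
    0x12C: ("add", None),
    0x130: ("bt", 0x124),
    0x134: ("st", None),  # delay slot: runs even though the branch is taken
}
print([hex(a) for a in run(loop, 0x124, 6)])
# ['0x124', '0x128', '0x12c', '0x130', '0x134', '0x124']
```

The trace shows 0x134 executing between the branch at 0x130 and the jump back to 0x124, which is exactly the slot a compiler has to fill with useful work (or a no-op).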
Here's a sample of Lanai assembly:
000000f8 :
f8: 92 93 ff fc st %fp, [--%sp]
fc: 02 90 00 08 add %sp, 0x8, %fp
100: 22 10 00 08 sub %sp, 0x8, %sp
104: 51 80 00 00 or %r0, 0x0, %r3
108: 04 81 40 01 mov 0x40010000, %r9
10c: 54 a4 08 0c or %r9, 0x80c, %r9
110: 06 01 11 11 mov 0x11110000, %r12
114: 56 30 11 11 or %r12, 0x1111, %r12
118: 96 26 ff f4 st %r12, -12[%r9]
11c: 96 26 ff f8 st %r12, -8[%r9]
120: 86 26 13 f8 ld 5112[%r9], %r12
00000124 <.LBB3_1>:
124: 46 8d 00 00 and %r3, 0xffff, %r13
128: 96 a4 00 00 st %r13, 0[%r9]
12c: 01 8c 00 01 add %r3, 0x1, %r3
130: e0 00 01 24 bt 0x124 <.LBB3_1>
134: 96 24 00 00 st %r12, 0[%r9]
The `add`/`sub`/`or` instructions have their destination on the right
hand side. `st` and `ld` are memory store and load instructions
respectively. Note the lack of a 32-bit immediate load: a `mov` and an
`or` instruction are used in tandem instead. That `mov` instruction
isn't real, either; it's a pseudo-instruction for `add %r0,
0x40010000, %r9`. Also note the branch delay slot at address 134:
this instruction gets executed even if the branch at 130 is taken.
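The `mov`/`or` pairing can be sketched in a few lines: split the constant into 16-bit halves, place the high half with an `add` from `%r0` (which always reads as zero), then `or` in the low half. The function names are mine, not from any Lanai toolchain; the sample's base address 0x4001080c splits into exactly the 0x4001 and 0x80c immediates seen at 108 and 10c.

```python
MASK32 = 0xFFFFFFFF


def split_constant(value):
    """Split a 32-bit constant into the (high, low) 16-bit halves used by a mov/or pair."""
    value &= MASK32
    return value >> 16, value & 0xFFFF


def materialize(value):
    """Evaluate `mov hi<<16, %rd; or %rd, lo, %rd` for `value`.

    The `mov` is modelled as `add %r0, hi<<16, %rd`, relying on %r0 reading as zero.
    """
    hi, lo = split_constant(value)
    r0 = 0
    rd = (r0 + (hi << 16)) & MASK32  # mov: add %r0, hi<<16, %rd
    rd |= lo                         # or  %rd, lo, %rd
    return rd


hi, lo = split_constant(0x4001080C)
print(hex(hi), hex(lo))  # 0x4001 0x80c
```

Since the high half arrives with its low 16 bits clear and `or` only sets bits, the two halves never interact, so no carry correction is needed between the two instructions.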
The ISA is quite boring, and in my opinion that's a good thing. It
makes core implementations easy and fast, and it generally feels like
one of the RISC-iest cores I've dealt with. The only truly
interesting thing about it is its dual-context execution system, but
that unfortunately becomes irrelevant at some point, as we'll see
later.
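That simplicity shows in the encoding. Below is a sketch of a decoder for the register-immediate (RI) ALU instructions in the sample. The field layout (opcode in bits 30..28, Rd in 27..23, Rs1 in 22..18, then an F bit, an H bit and a 16-bit immediate) and the opcode names are my reading of LLVM's Lanai backend, so treat them as assumptions; they do, however, reproduce every ALU word in the disassembly above.

```python
# Assumed opcode numbering for the RI format (bits 30..28); opcode 7 is not modelled.
ALU_OPS = {0: "add", 1: "addc", 2: "sub", 3: "subb", 4: "and", 5: "or", 6: "xor"}


def decode_ri(word):
    """Split a 32-bit RI-format (register-immediate ALU) word into its fields."""
    if word >> 31:
        raise ValueError(f"{word:#010x} is not an RI-format ALU instruction")
    opcode = (word >> 28) & 0x7
    return {
        "op": ALU_OPS.get(opcode, f"alu{opcode}"),
        "rd": (word >> 23) & 0x1F,   # destination register
        "rs1": (word >> 18) & 0x1F,  # source register
        "f": (word >> 17) & 0x1,     # assumed: update the status flags
        "h": (word >> 16) & 0x1,     # immediate applies to the high half
        "imm16": word & 0xFFFF,
    }


print(decode_ri(0x02900008))  # fc: add %sp, 0x8, %fp
# {'op': 'add', 'rd': 5, 'rs1': 4, 'f': 0, 'h': 0, 'imm16': 8}
```

Decoding `fc` this way gives Rd = 5 and Rs1 = 4, which implies %fp is r5 and %sp is r4. One quirk also falls out: `and %r3, 0xffff, %r13` at 124 is encoded with H set and a zero immediate, which suggests that for `and` the unused half of the immediate is filled with ones rather than zeros.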
Google & the Lanai team
In the early 2010s, things weren't going great at Myricom. Due to
financial and leadership difficulties, some of their products got
canceled, and in 2013 Google bought out a team of core Myricom
engineers, who took the Lanai intellectual property rights with them.
The company still limps on, seemingly targeting the network security
and fintech markets, and even continuing to market their networking
gear as programmable, but Lanai is nowhere to be seen in their new
designs.
So what has Google done with the Lanai engineers and technology? The
only thing we know is that in 2016 Google implemented and upstreamed
a Lanai target in LLVM, and that it was to be used internally at
Google. What is it used for? Only Google knows, and Google isn't
saying.
The LLVM backend targets Lanai11. This is quite a few numbers higher
than the last publicly documented Lanai3/4, and there are quite a few
differences between them:
1. No more dual-context operation, no more PUNT instruction. The
compiler/programmer can now make use of nearly all registers from
r4 to r31.
2. No more dual-ALU (R-R-R) instructions. These were obviously slow,
and were probably a combinatorial bottleneck in newer
microarchitectural implementations.
3. Slightly different delay slot semantics, pointing at a new
microarchitecture (likely having stepped away from a classic RISC
pipeline into something more modern).
4. New instruction formats and accompanying instructions: SPLS
(special part-word load/store), SLI (special load immediate), and
Special Instruction (containing, amongst others, popcount, of
course).
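To see why a popcount instruction is worth having, consider what a compiler has to emit without one. Assuming Lanai11, like Lanai3/4, has no multiplier, the usual bit-parallel (SWAR) popcount cannot finish with the customary multiply by 0x01010101 and has to fold the byte counts with shifts and adds instead, as in this sketch of the 32-bit case.

```python
def popcount32(x):
    """Count set bits in a 32-bit value using only shifts, masks, adds and a subtract."""
    x &= 0xFFFFFFFF
    x = x - ((x >> 1) & 0x55555555)                 # per-2-bit counts
    x = (x & 0x33333333) + ((x >> 2) & 0x33333333)  # per-4-bit counts
    x = (x + (x >> 4)) & 0x0F0F0F0F                 # per-byte counts
    x = x + (x >> 8)                                # fold bytes (no multiply)
    x = x + (x >> 16)
    return x & 0x3F                                 # result is at most 32


print(popcount32(0x4001080C))  # 5
```

On Lanai each of those 32-bit masks would itself need a `mov`/`or` pair, which makes a single-instruction popcount even more attractive.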
Lanai Necromancy
As you can tell from this page, this architecture intrigued me. The
fact that it's an LLVM target shipped with nearly every LLVM
distribution while no-one has access to hardware which runs the
emitted code is just so spicy. Apart from writing this page, I have a
few other Lanai-related projects, and I'd like to introduce them
here:
1. I'm porting Rust to Lanai11. I have a working prototype, which
required submitting some patches to upstream LLVM to deal with IR
emitted by rustc. These have been upstreamed. My rustc patches are
pending on...
2. I'm implementing LLD support for Lanai. Google mentioned (in LLVM
mailing list posts) that they use a binutils ld, forked off
from the Myricom binutils fork. I've instead opted to implement
an LLD backend for Lanai, which currently only supports the
simplest relocations. I haven't yet submitted a public LLVM
change request for this, but this is on my shortlist of things to
do. I have to first talk to the LLVM/Google folks on the
maintenance plan for this.
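For a sense of what the simplest relocations involve: resolving a `mov`/`or` address pair only means patching the 16-bit immediate field of each RI instruction with one half of the symbol's final address (S + A). The helper names below are hypothetical, and the HI16/LO16 split is assumed to correspond to LLVM's `R_LANAI_HI16`/`R_LANAI_LO16` relocation kinds rather than taken from the LLD code described here. The input words are the sample's `mov`/`or` with their immediates zeroed, standing in for unrelocated instructions, as if 0x4001080c were a symbol address.

```python
MASK32 = 0xFFFFFFFF


def apply_hi16(word, symbol, addend=0):
    """Hypothetical HI16 handler: put bits 31..16 of S + A into the immediate field."""
    value = (symbol + addend) & MASK32
    return (word & 0xFFFF0000) | (value >> 16)


def apply_lo16(word, symbol, addend=0):
    """Hypothetical LO16 handler: put bits 15..0 of S + A into the immediate field."""
    value = (symbol + addend) & MASK32
    return (word & 0xFFFF0000) | (value & 0xFFFF)


# `mov 0x0, %r9` (H bit set) and `or %r9, 0x0, %r9`, immediates zeroed.
print(hex(apply_hi16(0x04810000, 0x4001080C)))  # 0x4814001
print(hex(apply_lo16(0x54A40000, 0x4001080C)))  # 0x54a4080c
```

Both results match the words at 108 and 10c in the disassembly, and because the low half is merged with `or`, the HI16 side needs no carry adjustment.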
3. I've implemented a simple Lanai11 core in Bluespec, as part of my
qfc monorepo. 3-stage pipeline (merged addr/fetch stages),
in-order. It's my first bit of serious Bluespec code, so it's not
very good. I plan on implementing a better core at some point.
4. I've implemented a small Lanai-based microcontroller, qf105,
which is due to be manufactured in 130nm as part of the OpenMPW5
shuttle, which is, notably, sponsored by Google :).
If you're interested in following or joining these efforts, hop on to
##q3k on libera.chat.
In addition to my efforts piecing together information about Lanai and
making use of it for my own needs, the TrueBit project also used it
as a base for their smart contract system (in which they implemented
a Lanai interpreter in Solidity).
Documentation
Useful resources, in no particular order:
* Original Lanai3/4 docs from Myricom's website, archived.
* Myrinet tutorial by James Otto, archived.
* A Myrinet Firmware development experience by Marc Herbert.
* Lanai per-generation ISA differences, as shown by GCC
architecture/machine options.
Copyright 2022 Serge Bazanski. This work is licensed under a Creative
Commons Attribution 4.0 International License.