[HN Gopher] Writing a "bare metal" operating system for Raspberr...
___________________________________________________________________
Writing a "bare metal" operating system for Raspberry Pi 4
Author : rcarmo
Score : 294 points
Date : 2021-10-06 15:05 UTC (7 hours ago)
(HTM) web link (github.com)
(TXT) w3m dump (github.com)
| poetaster wrote:
| For my older Pis I found
| https://www.cl.cam.ac.uk/projects/raspberrypi/tutorials/os/
| great. But this is ARM assembly territory. I believe subsequent
| generations of Pi have had good tutorials. OS, of course, is a
| very large, encompassing term. What is a minimal OS?
| hikerclimber1 wrote:
| Everything is subjective. Especially laws.
| nanis wrote:
| I find the writing style tedious: The author expects a reader who
| does not know about `make` or cross compilers to relate to
| writing ARM64 assembly for the bootloader.
|
| If I am following along this material, then I don't need all the
| digressions with close enough descriptions of the tools. Like, if
| I am reading a home building tutorial, don't explain what a
| hammer is.
| subhro wrote:
| Very refreshing to see this. It is so much fucking easier to grok
| bare metal C compared to the <flavour of the year>-script junk
| that floats around these days.
| kennywinker wrote:
| Ah yes, nice easy to grok code like `curval &= ~(field_mask <<
| shift);` :P
|
| But for real - I've had way more luck grokking embedded Rust
| than all of the bare metal C examples I've looked at. C breeds
| dense bit-twiddling and code that relies on inscrutable compiler
| behavior. There are easier ways to learn how these systems work
| at a bare-metal level.
| PaulDavisThe1st wrote:
| Would you like to propose or reference a way of doing bit-
| twiddling that is clearer than this?
|
| Also hint: C doesn't breed bit-twiddling, writing software
| that actually interacts directly with hardware does.
| Veserv wrote:
| They are just implementing a generic contiguous bitfield
| clear.
|
| field_mask was probably constructed as ((1 << width) - 1)
| instead of as a manifest constant. So you can just do:
|
|     ClearBitField(input, width, shift) {
|         return input & ~(((1 << width) - 1) << shift);
|     }
|
| Now you just use that everywhere you would clear a
| contiguous bitfield which is a pretty common operation when
| operating on hardware. Now all your bit-twiddling is
| isolated to a single well-defined generically useful
| function instead of repeating it a billion times.
|
| We know this is a generically valuable operation since this
| is basically a C implementation of the ARMv8 bfi
| (b)it(f)ield (i)nsert instruction with a fixed 0 argument
| or in assembly:
|
| BFI X{n}, XZR, #shift, #width
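|
| A minimal C sketch of the helper described above, assuming 32-bit
| registers and a shift below 32. The names low_bits_mask,
| clear_bit_field and set_bit_field are hypothetical and do not come
| from the tutorial's code. The width check is there because shifting
| a 32-bit value by 32 is undefined behaviour in C.

```c
#include <stdint.h>

/* Hypothetical helper names, chosen for illustration only. */

/* Mask with the low `width` bits set. width >= 32 is handled separately
 * because `1u << 32` is undefined behaviour for a 32-bit operand. */
static inline uint32_t low_bits_mask(unsigned width)
{
    return (width >= 32) ? UINT32_MAX : ((UINT32_C(1) << width) - 1);
}

/* Clear `width` bits of `value`, starting at bit `shift` (shift < 32). */
static inline uint32_t clear_bit_field(uint32_t value, unsigned width,
                                       unsigned shift)
{
    return value & ~(low_bits_mask(width) << shift);
}

/* Clear the field, then write `field` into it: the usual
 * read-modify-write step for a register field. */
static inline uint32_t set_bit_field(uint32_t value, unsigned width,
                                     unsigned shift, uint32_t field)
{
    uint32_t mask = low_bits_mask(width);
    return clear_bit_field(value, width, shift) | ((field & mask) << shift);
}
```

| With these, the original line would read something like
| curval = clear_bit_field(curval, width, shift), and a compiler is
| free to inline it back into the same mask-and-AND sequence.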
| kennywinker wrote:
| I like this answer too. When opaque code is irreducibly
| opaque, put it in a fn with a well chosen name.
| kennywinker wrote:
| That was just a throwaway example of a pretty write-only
| line of code from the op codebase, but since you asked:
|
| One operation per line. A comment for every operation.
| Shifts that explicitly say if they are wrapping or
| overflowing. Rust uses ! instead of ~ but if I had my way
| it'd be a named function like bitwise_invert().
|     // curval &= ~(field_mask << shift); // original line
|
|     // pseudo-rust version
|     // be clear about what kind of shift we're doing
|     let shifted_mask = FIELD_MASK.wrapping_shl(shift);
|     // use a fictional invert fn to avoid single-char operators
|     let inverted_mask = shifted_mask.bitwise_invert();
|     // new variable instead of mutating the existing one
|     let shifted_val = curval & inverted_mask;
|
| Ideally those comments would say WHY we're doing those ops
| rather than what's notable about them - but i didn't dig
| into the code enough to write explanations.
|
| And then we let the compiler crush that into an efficient
| lil one liner like the author of the original code did
| manually.
| adrian_b wrote:
| When booting a real CPU you might easily have to modify
| from a few tens to a few hundreds of hardware registers,
| performing one or more such bit operations on each.
|
| If you chose such a deliberately verbose style (splitting
| across multiple lines is the worst part), the code would
| become really unreadable, as too much space would be filled
| with text that provides no information, obscuring the
| important parts.
|
| Normally the register, the mask constant and the shift
| constant have informative names that should indicate all
| that needs to be known about the operation, and any other
| symbols should occupy as little space as possible on the
| line of code.
| kennywinker wrote:
| That's what functions and automatic compiler inlining are
| for. See Veserv's answer
| https://news.ycombinator.com/item?id=28776751
| adrian_b wrote:
| No, using functions for such things is worse.
|
| It does not matter if the compiler inlines them,
| encapsulating the bit field operations obfuscates the
| code instead of making it more easily understandable.
|
| It is not possible to make the name of the function
| provide more information than the triplet register name +
| bit field name (the name of the shift constant) + the
| name of the configuration option (the name of the mask
| constant).
|
| Encapsulating the bit operations into a function just
| makes you write exactly the same thing twice and when you
| are reading the code you must waste extra time to check
| each function definition to see whether it does the right
| thing.
|
| The C code would look just like a table with the names,
| where the operators just provide some delimiters in the
| table that occupy little space.
|
| Replacing the operators with words makes such code less
| readable and concatenating the named constants into
| function names or using them as function arguments brings
| no improvement.
|
| The only possible improvement over explicit bit
| operations is to define the registers as structures with
| bit-field members and use member assignment instead of
| bit string operations.
|
| Unfortunately the number of register definitions for any
| CPU is huge, so most programmers use headers provided by
| the hardware vendor, as it would be too much work to
| rewrite them.
|
| For almost all processors with which I have worked, the
| hardware vendor has preferred to provide names for mask
| constants and shift constants, instead of defining the
| registers as structures, even if the latter would have
| allowed easier-to-read code.
| kennywinker wrote:
| > you must waste extra time to check each function
| definition to see whether it does the right thing
|
| I think I see what you're arguing. That this:
|     reg1 &= ~(width_mask_1 << shift);
|     reg2 &= ~(width_mask_2 << shift);
|     reg3 &= ~(width_mask_3 << shift);
|     // etc...
|
| is clearer than something like this:
|     reg1 = ClearBitField(reg1, 1, shift);
|     reg2 = ClearBitField(reg2, 2, shift);
|     reg3 = ClearBitField(reg3, 3, shift);
|     // etc...
|
| If that's what you're arguing, I simply don't agree.
| `ClearBitField` is descriptive and readable. It avoids
| creating all those width_mask_n constants, since you
| specify the width as input to the fn. You don't have to
| go digging into `ClearBitField` because you wrote a unit
| test to confirm that it does what it says on the label
| and handles the edge cases.
|
| On top of that, the code inside `ClearBitField` can be as
| verbose or as compact as you desire, because it's
| contained and separated from the rest of the code.
| adrian_b wrote:
| Obviously this is a matter of personal preferences and
| experience.
|
| Real register names are usually very long, to indicate
| their purpose, so you would not want to repeat them on
| each line.
|
| This can be avoided by redefining ClearBitField.
|
| Even so, writing an extra "ClearBitField" on each line
| does not provide any information. It just clutters the
| space.
|
| Anyone working with such code is very aware that &=~
| means clear bits and |= means set bits.
|
| When reading the table of names, the repeated function
| name is just a distraction that is harder to overlook
| than the operators.
|
| The way to improve over that is not adding anything on
| the lines, but using simpler symbols by defining the
| registers as structures, i.e.:
|
| register_1 . bit_field_1 = constant_name_1;
|
| register_2 . bit_field_2 = constant_name_2;
|
| register_3 . bit_field_3 = constant_name_3;
|
| Unfortunately, like I have said, the hardware vendors
| seldom provide header files with structure definitions
| for the registers and rewriting the headers is a huge
| amount of work.
|
| However, if you are able to rewrite just the register
| definitions that you use, that would be time better spent
| than attempting to write functions or macros for these
| tasks.
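|
| A rough C sketch of the structure style described above. The
| register layout (enable, mode, divider) and the names
| demo_ctrl_reg and demo_configure are invented for illustration
| and do not match any real device. Bit-field layout is
| implementation-defined in C, so a real header would have to be
| checked against the specific compiler and ABI in use.

```c
#include <stdint.h>

/* Hypothetical 32-bit control register, invented for illustration.
 * The ordering of bit-fields within the word is implementation-defined;
 * GCC and Clang on little-endian targets place the first member at the
 * least significant bit. */
typedef union {
    uint32_t raw;               /* whole-register access */
    struct {
        uint32_t enable   : 1;  /* peripheral on/off */
        uint32_t mode     : 3;  /* operating mode selector */
        uint32_t divider  : 8;  /* clock divider */
        uint32_t reserved : 20; /* unused bits */
    } bits;
} demo_ctrl_reg;

/* Configure the register by member assignment instead of mask/shift.
 * On real hardware `reg` would point at a memory-mapped address, and
 * each assignment below would be its own read-modify-write access. */
static inline void demo_configure(volatile demo_ctrl_reg *reg,
                                  unsigned mode, unsigned divider)
{
    reg->bits.mode    = mode;
    reg->bits.divider = divider;
    reg->bits.enable  = 1;
}
```

| The trade-off is that every member assignment touches the
| register separately, whereas the mask-and-shift style lets
| several fields be combined into one write.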
| fouric wrote:
| I find the first code example easier to read and process.
|
| However, that's because I've written a fair bit of C
| code, and so when my brain goes into "C mode", the
| symbols &, =, ~, <<, etc. all have clear and unambiguous
| meanings - whereas ClearBitField does not. Additionally,
| the pattern ~(foo << bar) is a common C idiom, so beyond
| the individual symbols, my brain recognizes the whole
| pattern so it's "semantically compressed" (easier to
| think about) for me. This would not be the case for a
| beginner.
|
| Which style is better depends on an individual's
| preferences and experiences - there's no "right" answer.
|
| This is a stellar example of one of the many reasons why
| code-as-text is a huge mistake - because structure and
| representation are conflated and coupled together. A
| sanely written programming language represents code as
| _code objects_ , and you can configure those code objects
| to be displayed however you like, whether that's baz &=
| ~(foo << bar) or ClearBitField(baz, 1, bar).
| junon wrote:
| No thanks, I'll take the C version any day.
| kennywinker wrote:
| Sure, the single line is more aesthetically pleasing.
| Compact, clever, concise. But try fixing a bug or adding
| new functionality to that one line. Especially as a
| beginner. This is supposed to be an educational codebase.
| isometimes wrote:
| I've stated in part1 of the tutorial that "This tutorial
| is not intended to teach you how to code in assembly
| language or C".
|
| My goal was to demonstrate some basic principles to get
| code running on bare metal, encourage curiosity, further
| my own knowledge and document my findings.
|
| I appreciate that more self-documenting code might be
| desirable, but to some people (me included) a large
| number of lines can be as off-putting as more esoteric
| syntax. I acknowledge, however, that it is very hard to
| please everyone!
| NobodyNada wrote:
| That's significantly less readable than the C version. I
| still have to know what a "left-shift" and "bitwise
| invert" are, and if I knew that then I wouldn't have a
| problem with `<<` or `~` either. IMO `<<` is even more
| intuitive than `shl` because I can just look at the arrow
| instead of having to think about which way "left" is (and
| I don't even have a tendency to get "left" and "right"
| confused).
|
| All the extra verbosity simply obfuscates the actual
| intent of the code: clear all bits in field_mask (shifted
| to the left by some offset). That's pretty easy to see
| at-a-glance from the C code (some comments could make
| that clearer, but this is simple enough that any
| experienced systems programmer will know what this does
| without comments).
|
| I agree that Rust embedded code is often more readable
| than C, but that's done by creating abstractions to
| manage complexity rather than just by adding more words.
| For instance, one could write a wrapper struct that
| provides a less-tedious interface than a bitfield (like
| `curval.set_field(false)`).
| kennywinker wrote:
| >> All the extra verbosity simply obfuscates the actual
| intent of the code
|
| I have a preference for verbosity in code, and I know
| that many people don't share my preference. That's
| alright - there's no exact right way to write that code.
| But my point was C encourages you to write code that
| relies on knowing secrets about specific hidden behavior
| in your compiler. `shl` isn't more clear than `<<`, but
| `wrapping_shl` and `overflowing_shl` ARE more clear,
| because it makes us explicitly aware of behavior that
| `<<` doesn't surface.
|
| As for clarity, I agree an abstraction would be best. And
| Rust encourages those abstractions where C discourages
| them. I'd still argue that the inside of that abstraction
| should be the verbose version, but other than the
| wrapping_shl that's mostly just a style/preference thing.
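|
| To make the hidden behavior concrete: in C, shifting a 32-bit
| value by 32 or more is undefined, so the result of a plain `<<`
| depends on the compiler and CPU. A minimal sketch of two explicit
| alternatives follows. The function names are hypothetical, and
| shl32_wrapping only mirrors the documented behavior of Rust's
| u32::wrapping_shl (the shift amount is reduced modulo the bit
| width).

```c
#include <stdint.h>

/* Hypothetical names, for illustration only. */

/* Reduce the shift amount modulo 32 before shifting, so any `s` is
 * well defined. A shift by 33 behaves like a shift by 1. */
static inline uint32_t shl32_wrapping(uint32_t x, unsigned s)
{
    return x << (s & 31u);
}

/* Treat an oversized shift as "all bits shifted out" and return 0. */
static inline uint32_t shl32_zero_fill(uint32_t x, unsigned s)
{
    return (s >= 32u) ? 0u : (x << s);
}
```

| Either choice is defensible; the point is that the name states
| which one the code relies on.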
| NobodyNada wrote:
| In general, I'd agree with you -- I prefer spelling
| things out explicitly instead of using terse
| abbreviations. However, really common & fundamental math
| operations benefit from some shorthand. For instance, 'y
| = ax + b' is _way_ easier to read than:
|     let multiplied = a.wrapping_mul(x);
|     let y = multiplied.wrapping_add(b);
|
| The "terse" equation I can instantly recognize as a
| linear function, while I'd have to stare at the more
| verbose version for a while to figure out what it
| does. In my opinion, bitwise operators work the same way:
| if you're working in a domain where you have to write
| thousands of simple bitwise operations, a bit of
| shorthand can make the code much more expressive.
| sneak wrote:
| I'm surprised this doesn't start with qemu on a Real Computer for
| building/testing.
| mrlonglong wrote:
| I can recommend
| https://www.giters.com/rust-embedded/rust-raspberrypi-OS-tut...
| for those of you interested in using Rust. Be aware that it
| requires Docker, though I don't need Docker and have changed my
| code so that it doesn't either.
| dljsjr wrote:
| What is this site that's re-hosting GitHub repos?
| AQuantized wrote:
| This is perfect for me, I recently made the project OS for Nand
| to Tetris and have been learning systems programming with Rust.
| I wish there was a way to find resources like this more easily
| than scouring HN or trying to sort through google searches.
| chucksmash wrote:
| I also did Nand2Tetris, like Rust, and had an interest in
| more material in this area. I followed v2 of this tutorial[1]
| and enjoyed it enough to become a GitHub sponsor for
| in-progress UEFI work, you might enjoy:
|
| [1]: https://os.phil-opp.com/
| jfoutz wrote:
| For a few glorious years google was amazing at this. the
| difference coming from altavista was unreal.
|
| at this point, I think I'd prefer boolean queries like
| altavista so I can search the word vectors myself. maybe some
| meta info so I can include/exclude based on various tags and
| links.
| ggregoire wrote:
| Is this an alternative UI for GitHub but without the files,
| commits history and so on? Why tho? I'm confused.
|
| Actual GitHub repo for anyone looking for the files:
| https://github.com/rust-embedded/rust-raspberrypi-OS-tutoria...
| superkuh wrote:
| Unlike github or gitlab this page is actually an HTML file
| and does not need javascript executed to define the web
| components "HTML". I appreciate an accessible link, at least.
| I don't know if that's why he linked it.
| mrlonglong wrote:
| Oh, was it on GitHub? I hadn't noticed it was different.
| I'll do better next time.
| Brian_K_White wrote:
| This is a great idea, but I can't square that kind of project
| with WSL and brew instead of actual linux or macports.
|
| If it's meant for the youths and neophites where you don't want
| to scare them with strange not-windows things, perfectly fine,
| but then aarch64 assembly is already out of scope.
|
| If you don't know what's wrong with brew (as an os developer not
| a casual user) then I can't take you seriously as a system
| architect or os developer.
| vagrantJin wrote:
| Neophites?
|
| > _can't take you seriously as a system architect or os
| developer_
|
| I don't think OS devs and Sys Architects are the intended
| audience. What might be helpful is if those rather busy people
| could chip in with their knowledge to improve said project
| rather than off-handedly dismiss it.
|
| It is after all being made available freely and some devs who
| aren't low level programmers might find it a good reason to
| learn something low level as an OS, don't you think so?
| Wouldn't it be nice?
| kennywinker wrote:
| I'm very familiar with many of homebrew's faults, but none of
| them are dealbreakers for the casual "install latest version of
| tool". If you don't like it, just install the same tools using
| macports.
| ac42 wrote:
| And not to forget https://github.com/rsta2/circle
| [deleted]
| throwaway889900 wrote:
| Of all the sections in the tutorial,
| https://github.com/isometimes/rpi4-osdev/tree/master/part10-...
| is probably the best one for anyone to read. I don't think a lot
| of people grasp that all the cores on a system start running
| immediately on power up and they're all running the same code
| from memory initially.
| Unklejoe wrote:
| What happens when they all race to store to the same memory
| location? I guess if they all run in lockstep it doesn't really
| matter?
|
| I've worked with some ARM SoCs from NXP and I could have sworn
| that one core comes up first and the others get released from
| reset later, with a "bringing up secondary CPUs" message
| printed.
| pm215 wrote:
| The code that runs at startup makes sure they don't all write
| to the same location :-) A common simple approach goes:
|   * read the CPU main ID register
|   * if core 0, branch to primary-core bootup code
|   * otherwise, go into a loop (eg "read x from known location
|     for this core, if x is non zero branch to x, else keep
|     looping")
|   * core 0 releases each secondary from the loop when it is
|     ready -- this is when core 0 prints that "bringing up
|     secondary CPUs" message
|
| (There are a bunch of minor variants on this, eg waking
| secondaries by sending them an interrupt so they can sleep
| via wfi insn instead of busy looping, but the basic approach
| is always the same.)
|
| It is also possible to do this in hardware -- you can have an
| SoC with a power controller so secondaries start powered off
| or held in reset, and the primary core prods the power
| controller to start each secondary.
|
| On 64-bit Arm the common standard is that this is all handled
| by the firmware (which implements a standard ABI called
| PSCI), and the OS code just makes SMC calls into the firmware
| for "power on the secondary". (The firmware does something
| like the above under the hood.)
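|
| The release-slot approach described above can be sketched in
| plain C as a single-threaded model. Everything here is
| hypothetical naming (release_slot, secondary_poll, release_core),
| and the assumption that the core number sits in the low two bits
| of the ID value is specific to a four-core part like the Pi 4.
| Real boot code would do this in assembly, with the slots
| declared volatile and with barriers or wfe/sev around them.

```c
#include <stddef.h>
#include <stdint.h>

#define NUM_CORES 4

typedef void (*entry_fn)(void);

/* One release slot per core. NULL means "keep waiting". */
static entry_fn release_slot[NUM_CORES];

/* Counts how many times the demo secondary entry point has run. */
static int secondary_started;

/* Stand-in for whatever a secondary core should run once released. */
static void demo_secondary_entry(void)
{
    secondary_started++;
}

/* Assumption: the core number is in the low two bits of the ID value.
 * On real hardware the value would come from a system register read. */
static unsigned core_id_from_mpidr(uint64_t mpidr)
{
    return (unsigned)(mpidr & 0x3u);
}

/* One pass of a secondary core's wait loop: returns the entry point if
 * core 0 has released this core, or NULL if it should keep looping. */
static entry_fn secondary_poll(unsigned core)
{
    return release_slot[core];
}

/* What core 0 does when it is ready to start a secondary. */
static void release_core(unsigned core, entry_fn entry)
{
    release_slot[core] = entry;
}
```

| In this model a secondary keeps calling secondary_poll until it
| gets a non-NULL pointer and then jumps to it, which is the "if x
| is non zero branch to x" step from the list.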
| throwaway889900 wrote:
| This is assuming an asymmetric multiprocessing model. It
| may be that the hardware is set up as such, but symmetric
| multiprocessing is also an option which is what the Pi
| seems to do.
| my123 wrote:
| The code that runs at reset on the Arm CPU complex for the
| RPi:
| https://github.com/raspberrypi/tools/blob/master/armstubs/ar...
| monocasa wrote:
| They don't all store to the same location.
| SavantIdiot wrote:
| That depends on the architecture. E.g., Intel uses a wired-OR
| circuit and the cores race to determine who booted first, then
| that core becomes the boot core and executes the first
| instruction from boot ROM.
| throwaway889900 wrote:
| I'm assuming a simplistic CPU architecture aimed towards
| beginners, which is generally what a tutorial is aimed at.
| From there you can learn about all the nitty gritty details
| that you need to get actual chip to work.
| not-elite wrote:
| Wow, is it really this [1] easy to run a C routine?
|
| Where does the rpi4 store the firmware necessary to read from the
| sd card where this software is (presumably) stored?
|
| [1]
| https://github.com/isometimes/rpi4-osdev/blob/master/part1-b...
| teraflop wrote:
| Yeah, the bootloader is responsible for the hardware stuff up
| to this point. It doesn't take _that_ much more assembly code
| to bootstrap C in the Linux kernel on x86:
| https://github.com/torvalds/linux/blob/master/arch/x86/boot/...
|
| There are a bunch of other headers in that file, but the
| "start_of_setup:" label is what's invoked by the bootloader,
| and "calll main" transitions to C. So 32 lines of code, by my
| count.
| Teknoman117 wrote:
| There's a bootloader (u-boot) written into flash memory in the
| RPi4 SoC that handles the early initialization of the core and
| finding a kernel to boot. Think of u-boot as the UEFI
| equivalent for your RPi.
|
| Getting into C (or Rust) assuming the presence of some kind of
| system firmware (BIOS, UEFI, u-boot, coreboot, etc.) isn't too
| difficult in the grand scheme of things.
|
| Not to toot my own horn much, but here's an example I did of
| getting into Rust on a 386EX SBC I had hanging around. I
| actually yanked out the BIOS chip and this replaces it. Please
| forgive any poor Rust practices, this was written in a hurry.
|
| https://github.com/teknoman117/ts-3100-images/tree/master/ru...
|
| I discovered a mind-melting bug where replacing the RTC clock
| chip / battery-backed RAM can erase the BIOS. This SBC uses the
| same flash chip for both user storage and the BIOS. The
| partition between the user area and the bios is stored in the
| CMOS ram, so if there is any junk in it, the BIOS might
| misidentify the flash boundaries and erase itself...
|
| So, I wrote this to recover the boards. Bonus points were that
| I only had an 8 KiB EEPROM hanging around so it had to fit in
| 8K initially.
| my123 wrote:
| It isn't u-boot, it's something totally barebones.
| Teknoman117 wrote:
| I forgot the RPi didn't use u-boot, but they use an
| equivalent.
|
| https://github.com/isometimes/rpi4-osdev/tree/master/part2-b...
|
| https://raspberrypi.stackexchange.com/questions/10489/how-do...
|
| You don't handle the CPU from the reset vector like you
| would in a microcontroller or system firmware environment.
| There is an entire loader stack that finds a boot device to
| read a kernel image from that exists under you.
|
| That's not to take away from this series at all, it's just
| the parent comment was asking about how it was so easy to
| get into a kernel image written in C on an SD card without
| any apparent SD card or FS logic.
| cesarb wrote:
| > Where does the rpi4 store the firmware necessary to read from
| the sd card where this software is (presumably) stored?
|
| The main chip of the RPi4 has a small amount of code in a
| built-in ROM which runs on boot. In the normal boot flow, that
| code loads the bootloader from the EEPROM chip, but it can also
| read a recovery image from the SD card. See
| https://www.raspberrypi.com/documentation/computers/raspberr...
| for details (or
| https://www.raspberrypi.com/documentation/computers/raspberr...
| for how it was on the older RPi devices).
| dragontamer wrote:
| Yeah, C is really easy to interface with assembly language.
|
| The hard part is the linker-script to get this working right
| :-)
| https://github.com/isometimes/rpi4-osdev/blob/master/part1-b...
| [deleted]
| sigjuice wrote:
| _Yeah, C is really easy to interface with assembly language._
|
| Doesn't necessarily have to involve assembly!
|
| https://duckduckgo.com/?q=c+interpreter
| jandrese wrote:
| The main core gets bootstrapped by the opaque binary blob in
| the graphics subsystem.
| rkagerer wrote:
| _The Bluetooth modem is a Broadcom chip (BCM43455), and it needs
| to be loaded with proprietary software before it's useful to
| us._
|
| Are there any efforts out there to create open-source firmware
| for this chip?
| isometimes wrote:
| I looked for a long time, but to no avail. Broadcom are
| notoriously tight-lipped when it comes to their intellectual
| property.
|
| Some have attempted to reverse-engineer... This was a good
| read:
| https://blog.quarkslab.com/reverse-engineering-broadcom-wire...
| erdo wrote:
| Might sound like a strange comment, but I really like the tone of
| the readme. It's very welcoming and clear, you don't often find
| that level of professionalism in readme docs - it makes me think
| the author is probably quite unusual (in a good way)
| isometimes wrote:
| I'm going to take that as a compliment, thank you ;-)
| isometimes wrote:
| Quick thanks to the OP for linking my project:
| https://www.rpi4os.com &
| https://github.com/isometimes/rpi4-osdev. I'm humbled to be
| mentioned here. So great to read the feedback! Feel free to get
| in touch.
| rcarmo wrote:
| No problem. I've been looking into bare metal runtimes for ARM
| chips, and when I chanced upon this I thought it was too much
| of a gem not to be posted here. :)
| 908B64B197 wrote:
| CS107E[0] might be an interesting course for those interested in
| bare metal programming.
|
| Enrollment is over for it as of now, so better luck next
| semester!
|
| [0] http://web.stanford.edu/class/cs107e/
| hikerclimber1 wrote:
| You should only invest in the country you live in since you know
| what goes on politically.
| sylwester wrote:
| On the Pi Zero and Pi CM (maybe also others) you don't even need
| an SD card to boot it. You can boot it via rpi-boot
| https://github.com/raspberrypi/usbboot so no need for qemu for
| testing. You can just test it on real hardware in no time.
| junon wrote:
| Without qemu you have to implement serial interfaces early in
| order to debug your own OS. Qemu is a huge benefit early on,
| and provides a sane environment for deterministic development
| as opposed to potentially quirky hardware.
|
| Usually hobbyist OS projects don't start directly on hardware.
| Many of them never reach the point of running on hardware.
| toast0 wrote:
| I don't know about qemu for a PI/arm in general, but you do
| have to be careful on x86, because segmentation limits aren't
| checked by default, and you can be lulled into doing things
| that don't work on hardware (I messed up secondary processor
| starting), and if you don't frequently test on hardware, it's
| easy to forget things. (See also homebrew games that only
| work on emulators)
|
| But, you can certainly wait to start on hardware until you've
| got it started on qemu. Debuggability is a lot better unless
| you've got specialized equipment for your hardware setup
| pm215 wrote:
| On Arm a couple of only-on-hardware pitfalls are (a) cache
| maintenance -- QEMU doesn't model caches so won't notice if
| you forget to clean the dcache and flush the icache before
| executing code you just wrote or modified and (b)
| synchronization/barriers -- QEMU doesn't reorder memory
| accesses and generally makes writes to system registers
| take effect immediately, so won't notice if you forget
| necessary barrier insns.
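|
| For pitfall (a), a small C sketch of the "write code, then make
| it visible to instruction fetch" step. It relies on the GCC/Clang
| builtin __builtin___clear_cache; the function name install_code
| is hypothetical. In a freestanding kernel the builtin may not be
| usable (it can depend on a runtime library routine), in which
| case the cache clean/invalidate instructions and barriers would
| be written by hand, so treat this as an illustration of where
| the step belongs rather than as boot code.

```c
#include <stddef.h>
#include <string.h>

/* Copy freshly written instructions into place, then ask the compiler
 * runtime to synchronise the instruction cache with the data just
 * written. On x86 the builtin is typically a no-op; on Arm it is the
 * step that an emulator without a cache model will not miss if it is
 * forgotten. Returns the number of bytes installed. */
static size_t install_code(void *dst, const void *src, size_t len)
{
    memcpy(dst, src, len);
    __builtin___clear_cache((char *)dst, (char *)dst + len);
    return len;
}
```

| Pitfall (b) has no portable C equivalent for system register
| writes; those need the architecture's own barrier instructions.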
| LeoPanthera wrote:
| The Pi 3 and 4 can also boot from TFTP with an NFS root. This
| also allows you to switch the OS your Pi is booting just by
| renaming a symlink on your server. All my home Pis boot that
| way. As a bonus, you never need to worry about an SD card going
| bad.
___________________________________________________________________
(page generated 2021-10-06 23:00 UTC)