Bare Metal Pi

In the winter of 2019 I took CS140E, an experimental operating systems course whose labs focused on implementing pieces of an OS on a Raspberry Pi. This ground-up or “bare metal” approach challenged the way I thought. The class was my first exposure to the Pi, as well as to the sort of low-level programming that forms the foundation of an operating system.

The labs, held twice a week, were some of the most intense I’ve had: the time slots were two hours, but we all would stay for four hours or more, working with each other to finish. And for an open-ended final project, I decided to flesh out an implementation of virtual memory on the Raspberry Pi (I was young and green then).

Enabling virtual memory involves a tightly orchestrated collaboration between hardware and software, where speed is the top priority. At its core, virtual memory is about translating virtual addresses into physical ones; to this end, the kernel maintains index-like structures called page tables that map virtual pages to physical ones. The tables live in main memory, and even though dedicated hardware (the memory management unit, or MMU) walks them for us, repeating that walk on every access would be prohibitively slow, so the results are cached. This caching is so critical that the hardware has a specialized cache just for page table entries: the translation lookaside buffer (TLB).
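To make the translation step concrete, here is a minimal host-side sketch in C. It is a simulation, not the code from the project. It assumes a single-level table of 1 MB "section" entries in the ARM short-descriptor style (low two bits `0b10` mark a section, top twelve bits hold the physical base). The names `mmu_sim_t`, `map_section`, and `translate` are hypothetical, and the eight-slot direct-mapped TLB is a toy that only shows why caching saves walks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SECTION_SHIFT 20u          /* each section covers 1 MB */
#define NUM_SECTIONS  4096u        /* 4 GB address space / 1 MB sections */
#define SECTION_TYPE  0x2u         /* bits [1:0] == 0b10 marks a section entry */
#define SECTION_BASE  0xFFF00000u  /* bits [31:20] hold the physical base */
#define TLB_SLOTS     8u           /* hypothetical size, for illustration only */

typedef struct {
    bool     valid;
    uint32_t vpn;  /* virtual section number (top 12 bits of the address) */
    uint32_t pte;  /* cached page table entry */
} tlb_slot_t;

typedef struct {
    uint32_t   table[NUM_SECTIONS];  /* first-level table, one entry per MB */
    tlb_slot_t tlb[TLB_SLOTS];       /* toy direct-mapped TLB */
    unsigned   walks;                /* how many table walks were needed */
} mmu_sim_t;

/* Map the 1 MB virtual section containing va onto the section containing pa. */
void map_section(mmu_sim_t *m, uint32_t va, uint32_t pa) {
    assert(m != NULL);
    uint32_t vpn = va >> SECTION_SHIFT;
    m->table[vpn] = (pa & SECTION_BASE) | SECTION_TYPE;
    /* Drop whatever this slot holds so no stale translation survives. */
    m->tlb[vpn % TLB_SLOTS].valid = false;
}

/* Translate va; returns false where real hardware would raise a fault. */
bool translate(mmu_sim_t *m, uint32_t va, uint32_t *pa) {
    assert(m != NULL && pa != NULL);
    uint32_t vpn = va >> SECTION_SHIFT;
    tlb_slot_t *slot = &m->tlb[vpn % TLB_SLOTS];
    uint32_t pte;

    if (slot->valid && slot->vpn == vpn) {
        pte = slot->pte;              /* TLB hit: no table walk needed */
    } else {
        m->walks++;                   /* TLB miss: walk the page table */
        pte = m->table[vpn];
        if ((pte & 0x3u) != SECTION_TYPE)
            return false;             /* unmapped section */
        slot->valid = true;           /* cache the result for next time */
        slot->vpn = vpn;
        slot->pte = pte;
    }
    /* Physical address = section base from the entry + offset within the MB. */
    *pa = (pte & SECTION_BASE) | (va & ~SECTION_BASE);
    return true;
}
```

Two lookups within the same megabyte cost only one walk in this model; the second is served from the cached entry.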

The entries themselves also store a host of memory attributes, including access permissions. One such feature in the ARM architecture is domains, which let you group entries into sixteen collections and set each collection’s access via a two-bit field in the domain access control register. This lets you switch access to whole groups of memory at once without rewriting page table entries or flushing the TLB.
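The two-bit-per-domain layout is easy to sketch as plain bit manipulation. The helpers below (`dacr_set` and `dacr_get`, both hypothetical names) only compute register values on the host; on real hardware the result would still have to be written to the coprocessor register, which this sketch does not attempt.

```c
#include <assert.h>
#include <stdint.h>

/* Two-bit access modes for an ARM domain (0x2 is reserved). */
enum domain_mode {
    DOMAIN_NO_ACCESS = 0x0,  /* every access to the domain faults */
    DOMAIN_CLIENT    = 0x1,  /* accesses are checked against entry permissions */
    DOMAIN_MANAGER   = 0x3   /* accesses skip the permission checks */
};

/* Return a copy of dacr with the field for `domain` replaced by `mode`. */
uint32_t dacr_set(uint32_t dacr, unsigned domain, enum domain_mode mode) {
    assert(domain < 16);
    unsigned shift = domain * 2;      /* domain n lives in bits [2n+1:2n] */
    dacr &= ~(UINT32_C(0x3) << shift); /* clear the old two-bit field */
    dacr |= (uint32_t)mode << shift;  /* install the new mode */
    return dacr;
}

/* Read back the mode currently set for `domain`. */
enum domain_mode dacr_get(uint32_t dacr, unsigned domain) {
    assert(domain < 16);
    return (enum domain_mode)((dacr >> (domain * 2)) & 0x3u);
}
```

Revoking access to everything tagged with domain 3, for example, is a single field update: `dacr = dacr_set(dacr, 3, DOMAIN_NO_ACCESS);`.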

* These fault handlers are also technically stored at physical address 0x0; some mapping and permission setting on the kernel's end is required to produce the common segmentation fault for clients of a VM.

Implementing VM on a Pi is difficult because a lot of things can go wrong, and since you’re dealing with things at such a low level, bizarre things can happen. I was initially pretty uncomfortable with coding something that wasn’t inherently visual and didn’t give immediate feedback; testing often involved triggering data faults or aborts and prompting the hardware to hop to a set of fault handlers, which themselves depended on your ASM code placing them at the correct location in memory! Working on this really stretched the limits of my conceptual reasoning, and I had as many diagrams and notes as I had code.
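The "correct location" matters because the processor jumps to a fixed slot for each kind of exception. As a rough illustration, assuming the classic ARM vector layout of eight 4-byte slots starting at the vector base, the slot addresses can be computed as below. The enum and `vector_address` are hypothetical names, not code from the class.

```c
#include <assert.h>
#include <stdint.h>

/* Exception kinds in the order their slots appear in the vector table. */
enum arm_exception {
    EXC_RESET = 0,
    EXC_UNDEFINED,       /* undefined instruction */
    EXC_SWI,             /* software interrupt (system call) */
    EXC_PREFETCH_ABORT,  /* fault while fetching an instruction */
    EXC_DATA_ABORT,      /* fault while loading or storing data */
    EXC_RESERVED,
    EXC_IRQ,
    EXC_FIQ,
    EXC_COUNT
};

/* Address the CPU jumps to for `exc`, given where the table starts. */
uint32_t vector_address(uint32_t base, enum arm_exception exc) {
    assert(exc < EXC_COUNT);
    return base + 4u * (uint32_t)exc;  /* each slot holds one 4-byte instruction */
}
```

With the table at address 0x0, a data abort lands at 0x10 and a prefetch abort at 0x0C, so the instruction stored in each of those slots has to branch to the matching handler.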

The ARM manual and ASM reference were invaluable; I can’t tell you how many times I read them. I’m happy to have gone from a cursory interest in the Pi to learning how to read dense, technical manuals and design the architecture for software that is tightly coupled to hardware. For something as integral to modern-day computers as VM, this was cool!