What this is. Opinion + Experience + Fact (30% opinion · 30% experience · 40% fact). Written in collaboration with AI — I discuss, I do not outsource.
Second in the series — "What Actually Happens When…" — one descent through the stack, from the button you pressed to the silicon at the bottom.
1. Four hundred milliseconds
Last post laid out the tower: every instant moment is a signal falling through layers of engineering, stacked so cleanly you forget they are there. I promised we would start the descents with the one closest to my own work — you press power, and a screen appears.
It feels like a single event. It is not. In the few hundred milliseconds between the button and the first pixel, a chip that woke up with no running software pulls itself up by its own bootstraps, one layer at a time, in a fixed order that has barely changed in forty years. By the time you see anything, a dozen layers have each done their small job and handed off to the next.
I am going to do this embedded-first — a microcontroller driving a small display — because that is where the layers are most exposed and least hidden by an operating system. A laptop does the same dance with more layers and more abstraction over them; I will note where it differs. But the bare-metal version is the honest one. There is nowhere for the magic to hide.
▸ First principle. Boot is not an event. It is an ordered climb, where each layer builds exactly enough of the machine for the next layer to exist.
2. A chip wakes with nothing
Press the button and power reaches the silicon. For a brief moment the chip is useless on purpose — voltage is still rising, the clock is not yet stable — so a small circuit holds the processor in reset until the world is safe to compute in. This is the power-on reset, and it is the most honest moment in the whole machine: nothing is running, nothing is assumed.
When reset releases, the processor does the one thing it is hardwired to do. It reads a fixed address — burned into the silicon, identical every single time — and there it finds two numbers: where to put the top of the stack, and the address of the very first instruction to run. On an Arm Cortex-M, that is literally the first two words of the vector table. The chip does not "decide" to boot. It falls into the only behavior it has.
There is no operating system here. No memory allocator. No printf. No stack until that first word is loaded. The processor is an engine that knows how to fetch the next instruction and nothing else. Everything you think of as "the computer" has to be built, in software, in the next few milliseconds, out of almost nothing.
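To make the reset fetch concrete, here is a small host-side model of it, a sketch rather than anything that runs on a target. The names `cpu_state_t`, `simulate_reset`, and `example_image`, along with every address in it, are my own illustrative assumptions; only the "word 0 is the stack pointer, word 1 is the reset vector" layout comes from the Cortex-M behavior described above.

```c
#include <assert.h>
#include <stdint.h>

/* The two things the core knows after reset, in this simplified model. */
typedef struct {
    uint32_t sp;  /* main stack pointer */
    uint32_t pc;  /* address of the first instruction to execute */
} cpu_state_t;

/* Model of what a Cortex-M does in hardware when reset is released:
 * word 0 of the vector table becomes SP, word 1 becomes the start address. */
static cpu_state_t simulate_reset(const uint32_t *vector_table)
{
    assert(vector_table != 0);
    cpu_state_t cpu;
    cpu.sp = vector_table[0];
    /* Bit 0 of the reset vector marks Thumb state; it is not part of the address. */
    cpu.pc = vector_table[1] & ~(uint32_t)1u;
    return cpu;
}

/* Hypothetical flash image. All values are made up for illustration. */
static const uint32_t example_image[] = {
    0x20005000u,  /* word 0: initial stack pointer (top of RAM) */
    0x08000141u,  /* word 1: reset handler address, Thumb bit set */
    0x08000201u,  /* word 2: next exception vector, unused in this sketch */
};
```

Calling `simulate_reset(example_image)` yields a stack pointer of `0x20005000` and a start address of `0x08000140`. Nothing in the function makes a decision, which is the point: the chip reads two words and goes.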
▸ First principle. A processor at reset has exactly one talent: fetch the next instruction. Every other capability is constructed, in order, by the software that runs first.
3. The first code builds the world
The reset vector does not point at your program. It points at a tiny piece of startup code: the part most engineers never read, written once and inherited forever. Its job is to turn a bare processor into a place where ordinary code can run.
It does a short, strict list. It makes sure the stack pointer is valid so functions can call each other (a Cortex-M has already loaded it from the vector table; many other cores leave that to the first few instructions). It speeds the chip up: at reset the processor typically runs on a slow internal oscillator, so the startup code programs the clock tree and the PLL to bring the system to full speed. It copies the initial values of your variables from flash into RAM (the .data section) and zeroes the rest (the .bss), because the C language quietly promises that an uninitialized global is zero and an initialized one has its value, and someone has to make that promise true. Only then does it call into the C world you actually wrote.
This is the layer almost everyone forgets, because it has been right for decades. But it is doing something profound: it is constructing the runtime — the small set of guarantees every line of higher code leans on without thinking. The stack. Initialized memory. A running clock. Build those wrong and nothing above them can be debugged, because the ground itself is lying.
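The .data copy and .bss zeroing are short enough to sketch in full. This is a host-side stand-in, not real startup code: `fake_flash_data`, `fake_ram`, and `fake_reset_handler` are hypothetical names, and on a real target the section boundaries would come from linker-provided symbols whose names vary by toolchain.

```c
#include <assert.h>
#include <stdint.h>

/* Copy the initial values of .data from its load address (flash)
 * to its run address (RAM). The range is [start, end), in words. */
static void copy_data(const uint32_t *load, uint32_t *start, uint32_t *end)
{
    assert(start <= end);
    while (start < end) {
        *start++ = *load++;
    }
}

/* Zero .bss so every uninitialized global really does read as zero. */
static void zero_bss(uint32_t *start, uint32_t *end)
{
    assert(start <= end);
    while (start < end) {
        *start++ = 0u;
    }
}

/* Stand-ins for memory: 3 words of .data initializers in "flash",
 * and 5 words of "RAM" that power up holding garbage. */
static const uint32_t fake_flash_data[3] = { 0x11111111u, 0x22222222u, 0x33333333u };
static uint32_t fake_ram[5] = { 0xDEADBEEFu, 0xDEADBEEFu, 0xDEADBEEFu,
                                0xDEADBEEFu, 0xDEADBEEFu };

static void fake_reset_handler(void)
{
    copy_data(fake_flash_data, &fake_ram[0], &fake_ram[3]);  /* .data */
    zero_bss(&fake_ram[3], &fake_ram[5]);                    /* .bss  */
    /* Real startup code would now configure clocks and call main(). */
}
```

After `fake_reset_handler()` runs, the first three words of `fake_ram` hold the initializers and the last two are zero. That is the whole promise, kept by two loops.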
▸ First principle. Before your code runs, something has to make the runtime your code assumes — the stack, the clock, initialized memory. The language's promises are kept by code you never see.
4. The linker drew the map
None of step 3 works without a map, and the map was drawn before the chip ever powered on — at build time, by the linker.
A microcontroller has different kinds of memory at different fixed addresses: flash that keeps its contents with the power off, RAM that is fast but forgets. Your program is not one blob; it is regions. Code and constants belong in flash. Variables belong in RAM. The vector table has to sit at exactly the address the silicon will look for it. The linker script is the document that decides all of this — it assigns every function, every variable, every byte to a real address, and it is why the startup code knows where .data lives to copy it and where .bss lives to zero it.
Here is the map, in the form every embedded engineer has stared at:
| Region | Lives in | Holds | Survives power-off |
|---|---|---|---|
| .isr_vector | flash, fixed start | reset vector + interrupt table | yes |
| .text | flash | your code, constants | yes |
| .data | RAM (copied from flash) | initialized globals | no (restored at boot) |
| .bss | RAM (zeroed at boot) | uninitialized globals | no (re-zeroed at boot) |
| stack / heap | RAM, opposite ends | calls, local variables, allocations | no |
When a firmware engineer says a bug is "a linker problem," this table is what they mean — something landed at the wrong address, and the careful contract between the build and the boot quietly broke.
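One way to get a feel for the map is to model it as data and ask where an address lands. The sketch below does that in plain C. The layout (64 KiB of flash at `0x08000000`, 20 KiB of RAM at `0x20000000`), the region sizes, and the names `region_t`, `memory_map`, and `region_of` are all assumptions for illustration, not taken from any particular part or linker script.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct {
    const char *name;
    uint32_t    start;   /* first address of the region */
    uint32_t    length;  /* size in bytes */
} region_t;

/* Hypothetical layout; a real one is defined by the linker script. */
static const region_t memory_map[] = {
    { ".isr_vector", 0x08000000u, 0x00000140u },
    { ".text",       0x08000140u, 0x0000FEC0u },
    { ".data",       0x20000000u, 0x00000400u },
    { ".bss",        0x20000400u, 0x00000C00u },
    { "heap/stack",  0x20001000u, 0x00004000u },
};

/* Return the name of the region containing addr, or "unmapped". */
static const char *region_of(uint32_t addr)
{
    for (size_t i = 0; i < sizeof memory_map / sizeof memory_map[0]; i++) {
        const region_t *r = &memory_map[i];
        if (addr >= r->start && addr - r->start < r->length) {
            return r->name;
        }
    }
    return "unmapped";
}
```

With this layout, `region_of(0x08000000u)` is `.isr_vector` and `region_of(0x20005000u)` is `unmapped`, one byte past the end of RAM. A "linker problem" is what happens when something that belongs in one row of this table gets an address in another.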
▸ First principle. The boot sequence runs on a map drawn at build time. Where every byte lives is decided before power is ever applied.
5. Drivers wake the board, one bus at a time
Now there is a running C program on a fast clock with honest memory — but it is alone on the chip and blind to the board around it. The peripherals are all asleep. Bringing them up is the next layer, and it happens in a strict order because each piece depends on the one before.
Clocks first: most peripherals share gated clocks, and a peripheral with no clock is furniture. Then the pins — the GPIO multiplexer is told which physical pin is a UART, which is an SPI bus, which drives the display's reset line. Then the buses wake in turn: I²C for the little sensors, SPI or a parallel bus for the display, each with its own timing to get right. A driver is just the code that knows the exact ritual a specific chip expects — the register writes, the wait states, the order — so the layers above can say "read the temperature" without knowing any of it.
This is the layer where my career has lived, and where the boot most often goes wrong in ways that are maddening precisely because they are invisible from the top. A display that stays black because one reset line toggled a few milliseconds too early. A sensor that reads garbage because its power rail had not settled when the driver started talking. From the application's point of view, "the screen didn't come up." From three layers down, it is a timing detail in a bring-up sequence — and only someone who knows the sequence exists can find it.
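The dependency order can be written down as a tiny state model. This is a sketch of the idea only; real bring-up code writes to hardware registers and waits on real timing. `stage_t`, `board_t`, `bring_up`, and `board_init` are hypothetical names I made up for the illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Stages in the order they depend on each other. */
typedef enum {
    STAGE_CLOCK,
    STAGE_PINS,
    STAGE_BUS,
    STAGE_DEVICE,
    STAGE_COUNT
} stage_t;

typedef struct {
    bool up[STAGE_COUNT];  /* true once a stage has been brought up */
} board_t;

/* Bring one stage up. Refuses if the stage before it is not alive yet. */
static bool bring_up(board_t *b, stage_t s)
{
    assert(b != 0 && s < STAGE_COUNT);
    if (s > STAGE_CLOCK && !b->up[s - 1]) {
        return false;  /* prerequisite missing: this stage is still furniture */
    }
    b->up[s] = true;
    return true;
}

/* Walk the stages in dependency order; stop at the first failure. */
static bool board_init(board_t *b)
{
    for (int s = STAGE_CLOCK; s < STAGE_COUNT; s++) {
        if (!bring_up(b, (stage_t)s)) {
            return false;
        }
    }
    return true;
}
```

On a fresh `board_t`, `bring_up(&b, STAGE_BUS)` fails because the clock and pins are not up, while `board_init(&b)` succeeds by going in order. Real hardware is less polite than this model: talk to a bus too early and nothing returns `false`, you just get a black screen.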
▸ First principle. Drivers are the ritual that turns a bus into a capability. They wake in dependency order — clock, pins, bus, device — because each one is furniture until the one before it is alive.
6. Something takes the wheel
With the board awake, control passes to whatever runs the show. On the smallest systems that is a single loop — read inputs, update state, draw, repeat — and that is a perfectly honest answer. On anything with real concurrency it is an RTOS scheduler, or a structured application layer sitting above the RTOS, deciding which task runs when, handing the display task its turn to render while the sensor task waits on its next reading.
This is the moment the system stops being a script that runs top to bottom and becomes a system — several things making progress, coordinated, with the illusion of doing them at once. Everything below this point existed to make this layer possible: the runtime so tasks have stacks, the clock so the scheduler can measure time, the drivers so a task has something real to talk to.
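A minimal way to see "several things making progress" is a cooperative, tick-driven loop. To be clear about what this is: a sketch much closer to the single-loop end of the spectrum than to a preemptive RTOS, with no priorities and no context switching. The task names, the periods (16 and 100 ticks), and the helper `run_for` are assumptions chosen for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef void (*task_fn)(void);

typedef struct {
    task_fn  run;       /* what the task does when it gets its turn */
    uint32_t period;    /* how often it runs, in ticks */
    uint32_t next_due;  /* tick at which it should run next */
} task_t;

static unsigned display_runs;
static unsigned sensor_runs;

static void display_task(void) { display_runs++; }  /* stands in for "render a frame" */
static void sensor_task(void)  { sensor_runs++; }   /* stands in for "take a reading" */

static task_t tasks[] = {
    { display_task,  16u, 0u },
    { sensor_task,  100u, 0u },
};

/* One scheduler pass at time `now`: run every task that is due. */
static void scheduler_tick(uint32_t now)
{
    for (size_t i = 0; i < sizeof tasks / sizeof tasks[0]; i++) {
        assert(tasks[i].period > 0u);
        /* Signed difference keeps the test correct across tick wraparound. */
        if ((int32_t)(now - tasks[i].next_due) >= 0) {
            tasks[i].run();
            tasks[i].next_due += tasks[i].period;
        }
    }
}

/* Simulate `ticks` ticks of time, starting from tick 0. */
static void run_for(uint32_t ticks)
{
    for (uint32_t now = 0; now < ticks; now++) {
        scheduler_tick(now);
    }
}
```

After `run_for(200)`, the display task has run 13 times (ticks 0, 16, ..., 192) and the sensor task twice (ticks 0 and 100). Note what the loop quietly depends on: a trustworthy tick, which is the clock from section 3, and tasks with something real to talk to, which is the drivers from section 5.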
A laptop's version of this moment is enormously larger — a bootloader hands off to a kernel that brings up memory management, processes, filesystems, and a compositor — but the shape is identical. Lower layers build the machine; then a layer takes the wheel and turns capability into behavior. The microcontroller just lets you see the whole thing without a million lines in the way.
▸ First principle. At some point a layer stops building the machine and starts directing it. That handoff — from bring-up to scheduler — is where a pile of capabilities becomes a system.
7. The first pixel
Only now, with clocks running, the bus alive, the driver initialized, and a task scheduled to do it, does the screen actually light.
Even this last step is a small sequence of its own. The display controller gets its initialization stream — dozens of register writes that set resolution, color format, refresh, gamma. A region of RAM is declared the framebuffer, the place where "what the screen shows" lives as plain numbers. The first frame is drawn into it, and the controller is told to push those bytes out to the panel. The backlight comes on last, so you never see a half-drawn frame. And there it is — a picture, four hundred milliseconds after a button, sitting on top of a tower that built itself from a single hardwired address.
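The "plain numbers" part is easy to show. The sketch below assumes a 16-bit RGB565 color format and a toy 4x2 panel, neither of which comes from a specific display; `rgb565`, `fb_set_pixel`, `first_frame`, and the `backlight_on` flag are hypothetical names, and the step that pushes bytes to the panel is left as a comment because it is controller-specific.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define FB_WIDTH  4
#define FB_HEIGHT 2

/* "What the screen shows", as plain numbers in RAM. */
static uint16_t framebuffer[FB_WIDTH * FB_HEIGHT];
static bool backlight_on;

/* Pack 8-bit channels into RGB565: 5 bits red, 6 bits green, 5 bits blue. */
static uint16_t rgb565(uint8_t r, uint8_t g, uint8_t b)
{
    return (uint16_t)(((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3));
}

static void fb_set_pixel(int x, int y, uint16_t color)
{
    assert(x >= 0 && x < FB_WIDTH && y >= 0 && y < FB_HEIGHT);
    framebuffer[y * FB_WIDTH + x] = color;
}

/* Draw the first frame, then (and only then) turn the backlight on. */
static void first_frame(void)
{
    for (int y = 0; y < FB_HEIGHT; y++) {
        for (int x = 0; x < FB_WIDTH; x++) {
            fb_set_pixel(x, y, rgb565(0, 0, 255));  /* solid blue */
        }
    }
    /* On hardware, the frame would be pushed out to the panel here. */
    backlight_on = true;  /* last, so a half-drawn frame is never visible */
}
```

In this format pure red packs to `0xF800`, pure green to `0x07E0`, and pure blue to `0x001F`. After `first_frame()` every entry in `framebuffer` is `0x001F` and the backlight flag is set, in that order.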
▸ First principle. The pixel is the top of a climb. By the time anything is drawn, every layer below has already done its job and handed off.
8. Why knowing the climb pays off
You can build a great deal of software without ever thinking about any of this, and most of the time you should — that is the entire point of the layers. The boot sequence earned the right to be forgotten by being right almost every time.
But "almost every time" is where this knowledge pays for itself, all at once. The product that boots fine on the bench a thousand times and then fails one in a hundred on the factory line: that is very often a boot-order or power-sequencing leak, and you cannot find it from the application. The field unit that comes back dead after a brownout, because a sag on one rail at the wrong millisecond left a peripheral half-initialized. The firmware that works until a linker setting changes and the vector table moves. Every one of these is a layer from this post leaking through, and the only person who can climb down to it is the one who knew the layer was there.
That is the case for understanding boot even in an era where you rarely touch it: not nostalgia, but the ability to drop down a rung when the rung leaks. The tower is built so you can forget it. The engineer is the one who remembers it anyway.
▸ First principle. You can ignore the boot sequence right up until it leaks — and then only the person who never fully forgot it can climb back down and fix it.
So before the series climbs back out and crosses the network: how far down your own machine's boot could you actually go before it turned into magic?
Next in the series: you type a web address, and a page arrives from the other side of the planet. Everything that happens in between.
Sources:
- Arm, Cortex-M reset behavior and vector table — the processor loads the initial stack pointer and reset vector from the first two words of the table at reset.
- Earlier post — From the Transistor to the Token — the tower this descent climbs down
- Earlier post — The Layer Above Your RTOS Has a Name — the layer that takes the wheel in chapter 6
— Ritesh | ritzylab.com