Part III — Hardware·Chapter 6

The Embassy Runtime

Embassy is not just a HAL: it is a complete async runtime for bare-metal microcontrollers, including the Arm Cortex-M33 cores of the RP2350. Understanding its executor model, task storage, peripheral singleton system, and memory layout is the foundation for writing correct concurrent firmware on the RP2350.
§ 6.1
The RP2350 Boot Sequence — Before Your Code Runs

Understanding what happens before your main() runs demystifies crashes during initialisation and helps you understand why Embassy's setup code looks the way it does.

RP2350 BOOT SEQUENCE
─────────────────────────────────────────────────────
1. ROM Boot       Built-in mask ROM executes immediately on power-up.
                   It scans flash for a valid IMAGE_DEF block and brings
                   up the flash interface in a basic XIP mode.

2. Image start    Unlike the RP2040, the RP2350 needs no separate
                   256-byte boot2 stage. The firmware image must embed
                   an IMAGE_DEF block so that the ROM accepts it
                   (RP2350 runs XIP: eXecute In Place from flash).

3. cortex-m-rt    Runtime library: the reset handler zeroes .bss and
                   copies .data to RAM. The core has already loaded the
                   initial stack pointer from the vector table.
                   Then it calls the program entry point.

4. Executor start  #[embassy_executor::main] generates that entry point.
                    It creates the executor and spawns your async fn
                    main() as the first task, handing it the Spawner.
                    The executor loop never returns.

5. embassy_rp::init()  Conventionally the first line of async main().
                        Configures clocks (150MHz by default on the
                        RP2350), DMA, and the interrupt controller,
                        and returns the Peripherals singleton.
Figure 6.1 — Five stages before your application logic executes.
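Stage 3 can be modelled on a host machine. The sketch below is not cortex-m-rt's code, since the real reset handler works on linker-provided symbols rather than slices. The function `init_ram` and its parameters are hypothetical names, used only to show the two memory operations that happen before main().

```rust
/// Host-side model of the RAM initialisation a Cortex-M runtime performs
/// before calling main. All names here are illustrative assumptions.
pub fn init_ram(data_image_in_flash: &[u8], data: &mut [u8], bss: &mut [u8]) {
    // .data: initialised statics get their starting values copied from flash.
    // copy_from_slice panics if the two lengths differ.
    data.copy_from_slice(data_image_in_flash);
    // .bss: zero-initialised statics are cleared.
    bss.fill(0);
}
```

If this step were skipped, every `static` would start with whatever the RAM happened to contain at power-up, which is why a crash before main() often points at the linker script or memory layout.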
§ 6.2
The Embassy Executor — How Cooperative Scheduling Works

Embassy's executor is a cooperative scheduler: it never preempts a task. A task runs until it yields, which happens when an .await reaches a future that is not ready yet. The executor then polls the other ready tasks in the run queue and, once the queue is empty, puts the core to sleep. On Cortex-M the thread-mode executor does this with WFE (Wait For Event). When a hardware interrupt fires, its handler wakes one or more tasks through their registered wakers, the core resumes, and the executor loop continues.

EMBASSY EXECUTOR LOOP
───────────────────────────────────────────────────────────

run_queue: linked list of ready tasks   ← no heap; task storage is static

loop:
    while run_queue.not_empty():
        task = run_queue.dequeue()
        match task.poll():
            Ready   → mark task done, free arena slot
            Pending → task registered waker with hardware
                       task waits until waker fires

    // Queue empty: sleep until the next event or interrupt
    cortex_m::asm::wfe()   ← core sleeps; a wake-up signals it with SEV

    // Timer ISR fires → waker.wake() → task moves to run_queue
    // GPIO ISR fires  → waker.wake() → task moves to run_queue
    // I2C ISR fires   → waker.wake() → task moves to run_queue
    // ... repeat ...
Figure 6.2 — Embassy's cooperative scheduler. No preemption. No OS. No context switch overhead. Tasks yield voluntarily via .await.
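The loop in Figure 6.2 can be reproduced on a host with nothing but the standard library. This is a simplified model, not Embassy's implementation: it boxes tasks on the heap, uses a mutex-protected queue, and stops when the queue is empty instead of sleeping. The names `run_all`, `boxed`, `YieldOnce` and `TaskWaker` are made up for this sketch.

```rust
use std::collections::VecDeque;
use std::future::Future;
use std::pin::Pin;
use std::sync::{Arc, Mutex};
use std::task::{Context, Poll, Wake, Waker};

/// A heap-allocated task. Embassy stores tasks statically instead.
pub type BoxedTask = Pin<Box<dyn Future<Output = ()>>>;

/// Indices of tasks that are ready to be polled.
type RunQueue = Arc<Mutex<VecDeque<usize>>>;

/// Waking a task puts its index back on the run queue.
struct TaskWaker {
    id: usize,
    queue: RunQueue,
}

impl Wake for TaskWaker {
    fn wake(self: Arc<Self>) {
        self.queue.lock().unwrap().push_back(self.id);
    }
}

/// Returns Pending once (after requesting a wake-up), then Ready.
/// Stands in for "wait for a timer or peripheral interrupt".
pub struct YieldOnce {
    yielded: bool,
}

impl YieldOnce {
    pub fn new() -> Self {
        Self { yielded: false }
    }
}

impl Future for YieldOnce {
    type Output = ();
    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
        if self.yielded {
            return Poll::Ready(());
        }
        self.yielded = true;
        cx.waker().wake_by_ref();
        Poll::Pending
    }
}

/// Convenience helper: box any future as a task.
pub fn boxed(task: impl Future<Output = ()> + 'static) -> BoxedTask {
    Box::pin(task)
}

/// Polls tasks until the run queue is empty. Returns the order of polls.
pub fn run_all(tasks: Vec<BoxedTask>) -> Vec<usize> {
    let queue: RunQueue = Arc::new(Mutex::new((0..tasks.len()).collect()));
    let mut slots: Vec<Option<BoxedTask>> = tasks.into_iter().map(Some).collect();
    let mut poll_order = Vec::new();
    loop {
        // The lock is released at the end of this statement, before polling.
        let next = queue.lock().unwrap().pop_front();
        // An embedded executor would sleep here instead of stopping.
        let Some(id) = next else { break };
        let Some(task) = slots[id].as_mut() else { continue };
        let waker = Waker::from(Arc::new(TaskWaker { id, queue: queue.clone() }));
        let mut cx = Context::from_waker(&waker);
        poll_order.push(id);
        if task.as_mut().poll(&mut cx).is_ready() {
            slots[id] = None; // task finished: free its slot
        }
    }
    poll_order
}
```

Running two tasks that each await `YieldOnce::new()` produces the poll order 0, 1, 0, 1: each task runs up to its await, yields, and is resumed only after the other has had its turn.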
Cooperative vs Preemptive

Embassy is cooperative — tasks must yield voluntarily.

In a preemptive RTOS (FreeRTOS, Zephyr), the scheduler can interrupt a task at any point and switch to another. Each switch saves and restores the CPU register state and swaps to the other task's stack: the context switch. In Embassy's cooperative model, tasks switch only at .await points. The "context switch" is just the task's state machine keeping its live variables in its own storage and returning Poll::Pending. All tasks on an executor share one stack, so there is no per-task stack to swap and no register file to save. A switch costs about as much as a function return followed by a call into the next task's poll.

The trade-off: a task that never awaits starves all other tasks. This is why blocking in async is a critical error in Embassy. Every significant operation must be .awaited. Embassy's API is designed to make this natural — all I/O is async, all delays are async, all inter-task communication is async.
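To make the "state machine in an enum" idea concrete, here is a hand-written future that behaves roughly like an async function counting to a limit and yielding after each step. It is a sketch of the kind of code the compiler generates, not the actual generated code. `CountTo`, `NoopWake` and `polls_until_ready` are hypothetical names.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// The task's entire state: the variables that must survive a yield.
pub enum CountTo {
    Running { n: u32, limit: u32 },
    Done,
}

impl Future for CountTo {
    type Output = u32;
    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<u32> {
        let this = self.get_mut(); // allowed because CountTo is Unpin
        match this {
            CountTo::Running { n, limit } => {
                if *n < *limit {
                    *n += 1;
                    // Yield: ask to be polled again, then hand control back.
                    cx.waker().wake_by_ref();
                    Poll::Pending
                } else {
                    let result = *n;
                    *this = CountTo::Done;
                    Poll::Ready(result)
                }
            }
            CountTo::Done => panic!("future polled after completion"),
        }
    }
}

/// Waker that does nothing; enough to drive a single future in a loop.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Polls a CountTo future to completion. Returns (result, number of polls).
pub fn polls_until_ready(limit: u32) -> (u32, u32) {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut future = CountTo::Running { n: 0, limit };
    let mut polls = 0;
    loop {
        polls += 1;
        if let Poll::Ready(value) = Pin::new(&mut future).poll(&mut cx) {
            return (value, polls);
        }
    }
}
```

Each `Poll::Pending` return is a yield point. Between polls the only thing that exists of the task is the enum value, which is why yielding is cheap and why a poll that never returns blocks everything else on the executor.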

§ 6.3
Spawning Tasks — The Static Constraint and Why It Exists
tasks.rs — spawning multiple concurrent tasks
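#![no_std]
#![no_main]

use embassy_executor::Spawner;
use embassy_rp::gpio::{Level, Output};
use embassy_time::Timer;
// Assumed logging and panic-handler crates, as in the Embassy examples.
use {defmt_rtt as _, panic_probe as _};
// Tm1637 is assumed to come from your own driver module or a driver crate.

#[embassy_executor::task]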
async fn display_task(
    clk: Output<'static>,
    dio: Output<'static>,
    // arguments must be 'static — they live for the program's lifetime
) {
    let mut display = Tm1637::new(clk, dio);
    loop {
        display.show_number(1234);
        Timer::after_millis(100).await;
    }
}

#[embassy_executor::task]
async fn sensor_task() {
    loop {
        // read DHT11, send to shared state
        Timer::after_secs(2).await;
    }
}

#[embassy_executor::main]
async fn main(spawner: Spawner) {
    let p = embassy_rp::init(Default::default());

    // Spawn concurrent tasks — they run interleaved at await points
    spawner.spawn(display_task(
        Output::new(p.PIN_2, Level::Low),  // GPIO2 = CLK
        Output::new(p.PIN_3, Level::Low),  // GPIO3 = DIO
    )).unwrap();

    spawner.spawn(sensor_task()).unwrap();

    // main() itself is a task — can continue doing work
    loop {
        Timer::after_secs(60).await;
        defmt::info!("heartbeat — system running");
    }
}

// Why 'static? Because Embassy tasks are stored in a static array.
// The arena outlives every function call — it lives for the program.
// A task that holds a reference to a local variable would dangle.
// 'static guarantees the data lives as long as the task does.

// Why unwrap()? Task storage is fixed-size. Each #[task] function gets
// pool_size slots (default 1). If every slot for that function is
// already in use, spawn() returns Err(SpawnError::Busy).
// unwrap() panics: a good thing to see early in development.
// To run several instances of one task function, declare it with
// #[embassy_executor::task(pool_size = N)].
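The fixed-size storage behind spawn() can be modelled in a few lines of host-side Rust. This is an illustration of the idea only, not Embassy's data structure: `TaskPool` and its `finish` method are hypothetical, and the `SpawnError::Busy` variant is named after the error Embassy reports when a task's slots are all taken.

```rust
/// Error returned when every slot is occupied.
#[derive(Debug, PartialEq)]
pub enum SpawnError {
    Busy,
}

/// A pool with N statically sized slots. F stands in for the task's future.
/// The 'static bound mirrors the rule that a task may not borrow locals.
pub struct TaskPool<F: 'static, const N: usize> {
    slots: [Option<F>; N],
}

impl<F: 'static, const N: usize> TaskPool<F, N> {
    pub fn new() -> Self {
        Self { slots: std::array::from_fn(|_| None) }
    }

    /// Stores the task in the first free slot, or reports Busy.
    pub fn spawn(&mut self, task: F) -> Result<usize, SpawnError> {
        match self.slots.iter().position(|slot| slot.is_none()) {
            Some(index) => {
                self.slots[index] = Some(task);
                Ok(index)
            }
            None => Err(SpawnError::Busy),
        }
    }

    /// Frees a slot, as happens when a task runs to completion.
    pub fn finish(&mut self, index: usize) {
        self.slots[index] = None;
    }
}
```

With `N = 2`, the third spawn fails until one of the first two tasks finishes. No allocation happens at any point, so the memory cost is known when the firmware is linked.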
§ 6.4
The Peripheral Singleton — Hardware Ownership at Compile Time

Embassy wraps every RP2350 peripheral in a singleton type. embassy_rp::init() returns a Peripherals struct with one field per peripheral. Each field can only be moved out once — into the task or driver that owns it. If you try to use PIN_2 in two places, the compiler refuses: PIN_2 was moved in the first use, it cannot be used again.

This is hardware ownership at compile time. No runtime peripheral manager. No mutex protecting a shared peripheral handle. In safe code, the compiler guarantees that exactly one piece of code drives each GPIO pin, each I2C bus, each PWM slice: two drivers cannot drive the same pin at once, because the type system rejects the second use. The only way around this is an explicitly unsafe escape hatch such as Peripherals::steal().
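The singleton pattern is small enough to model on a host. The sketch below is not embassy_rp's implementation. The types `Pin2`, `Pin3` and this `Output` are simplified stand-ins, and `take()` returns an Option here purely so the "only once" rule is easy to observe.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

/// Zero-sized tokens, one per pin. The private field stops code outside
/// this module from constructing extra copies.
pub struct Pin2(());
pub struct Pin3(());

/// One field per peripheral, as in the struct returned by init().
#[allow(non_snake_case)]
pub struct Peripherals {
    pub PIN_2: Pin2,
    pub PIN_3: Pin3,
}

static TAKEN: AtomicBool = AtomicBool::new(false);

impl Peripherals {
    /// Hands out the peripherals exactly once per program run.
    pub fn take() -> Option<Self> {
        if TAKEN.swap(true, Ordering::AcqRel) {
            None
        } else {
            Some(Peripherals { PIN_2: Pin2(()), PIN_3: Pin3(()) })
        }
    }
}

/// A driver that owns its pin token for as long as it exists.
pub struct Output<P> {
    _pin: P,
    high: bool,
}

impl<P> Output<P> {
    pub fn new(pin: P, high: bool) -> Self {
        Self { _pin: pin, high }
    }

    /// Flips the modelled output level and returns the new level.
    pub fn toggle(&mut self) -> bool {
        self.high = !self.high;
        self.high
    }
}
```

After `let clk = Output::new(p.PIN_2, false);`, writing `Output::new(p.PIN_2, false)` a second time fails to compile with a "use of moved value" error, which is the compile-time guarantee described above.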

§ 6.5
Memory Layout — Where Everything Lives on the RP2350
RP2350 MEMORY MAP (520KB SRAM)
─────────────────────────────────────────────────────────

0x20000000  ┌─────────────────────────────┐  ← RAM start
            │  .data section              │  initialized statics
            │  (copied from flash by rt)  │  ~few KB
            ├─────────────────────────────┤
            │  .bss section               │  zero-initialized statics
            │  (zeroed by runtime)        │  ~few KB
            ├─────────────────────────────┤
            │  Embassy task arena         │  task state machines
            │  (EMBASSY_EXECUTOR_...)     │  size depends on tasks
            │                             │  default ~4KB
            ├─────────────────────────────┤
            │  Your heap (optional)       │  if using alloc crate
            │  (off by default)           │
            ├─────────────────────────────┤
            │  ↑ Stack grows toward       │
            │    lower addresses          │
            │  ...                        │
            │  Stack                      │  Cortex-M33 main stack
0x20081FFF  └─────────────────────────────┘  ← RAM end (520KB)

flip-link moves the stack to the BOTTOM of RAM (below .data/.bss),
the reverse of the default layout drawn above.
If the stack overflows, it runs past the start of RAM and the invalid
access raises a HardFault immediately, instead of silently
corrupting .data or .bss.
This turns mysterious data corruption bugs into immediate crashes.
Always use flip-link in development.
Figure 6.3 — Default RP2350 RAM layout. flip-link inverts it by placing the stack below the statics.
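The address arithmetic behind the figure, and the difference flip-link makes, fits in a few constants. This sketch assumes the 520KB of SRAM starts at 0x20000000, ignores the optional heap and alignment padding, and takes the text's claim that an access just below RAM faults. `StackState`, `check_default` and `check_flipped` are hypothetical names.

```rust
pub const RAM_START: u32 = 0x2000_0000;
pub const RAM_SIZE: u32 = 520 * 1024;
/// First address past the end of RAM (exclusive bound).
pub const RAM_END: u32 = RAM_START + RAM_SIZE;

#[derive(Debug, PartialEq)]
pub enum StackState {
    InBounds,
    /// The stack pointer has descended into .data/.bss: silent corruption.
    CorruptsStatics,
    /// The stack pointer has left RAM: the access faults.
    Faults,
}

/// Default layout: statics at the bottom of RAM, stack descending from the top.
pub fn check_default(sp: u32, statics_size: u32) -> StackState {
    if sp < RAM_START + statics_size {
        StackState::CorruptsStatics
    } else {
        StackState::InBounds
    }
}

/// Flipped layout: stack at the bottom of RAM, statics above it.
pub fn check_flipped(sp: u32) -> StackState {
    if sp < RAM_START {
        StackState::Faults
    } else {
        StackState::InBounds
    }
}
```

The last valid byte of RAM is `RAM_END - 1`, that is 0x20081FFF. With the flipped layout there is nothing below the stack left to corrupt, so the first overflowing access is also the first visible failure.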
§ 6.6
Exercises
Exercise 6.1 — Three Tasks, One Pico

Run concurrent tasks, verify interleaving

Write three tasks: a heartbeat task (logs "tick" every second), a counter task (increments a shared counter every 100ms), and a display task (reads the counter every 250ms and shows it on the TM1637). Use an embassy_sync::mutex::Mutex to protect the counter. Observe that the TM1637 display shows approximately 2-3 increments between updates. Verify you can see all three tasks' log output interleaved in the terminal.
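Before running the exercise on hardware, the expected display behaviour can be checked with a small host-side calculation. This sketch assumes ideal timers with no jitter, and `increments_between_reads` is a hypothetical helper, not part of any Embassy crate.

```rust
/// For a counter incremented every `increment_ms` and sampled every
/// `read_ms`, returns how much the counter advanced between samples.
pub fn increments_between_reads(increment_ms: u32, read_ms: u32, reads: u32) -> Vec<u32> {
    let mut deltas = Vec::new();
    let mut previous = 0;
    for i in 1..=reads {
        // Number of increments that have happened by time i * read_ms.
        let counter = (i * read_ms) / increment_ms;
        deltas.push(counter - previous);
        previous = counter;
    }
    deltas
}
```

For a 100ms counter read every 250ms, the deltas alternate between 2 and 3. If the real firmware shows much larger jumps, one of the tasks is probably holding the mutex or blocking for too long.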

Exercise 6.2 — Measure Task Arena Usage

Find the size of your task state machines

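Add this to your main: defmt::info!("task future: {} bytes", core::mem::size_of_val(&your_task_body())) where your_task_body is a plain async fn containing the task's logic. The future type returned by an async fn cannot be named, so size_of::<T>() does not work here, and a function marked #[embassy_executor::task] returns a spawn token rather than the future itself. You can also inspect static memory after building, for example with cargo size --release -- -A or cargo nm --release -- --print-size --size-sort (both from cargo-binutils). How much space does each task take? Does a task with a large local buffer take more space than a simple task?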
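The effect the exercise asks about can be previewed on a host, because future sizes are decided by the compiler and not by Embassy. The functions below are hypothetical examples, and exact byte counts vary by compiler version and target.

```rust
use std::mem::size_of_val;

/// A trivial await point.
async fn tick() {}

/// A task with no locals held across an await.
pub async fn small_task() {
    tick().await;
}

/// A task whose buffer is used after the await, so it must be stored
/// inside the future while the task is suspended.
pub async fn buffered_task() -> u8 {
    let mut buffer = [0u8; 1024];
    tick().await;
    buffer[0] = 1;
    buffer[0] + buffer[1023]
}

/// Returns (size of small_task's future, size of buffered_task's future).
pub fn future_sizes() -> (usize, usize) {
    // Creating a future does not run it; it only builds the state machine.
    (size_of_val(&small_task()), size_of_val(&buffered_task()))
}
```

The second future is at least 1024 bytes because the buffer lives across the await. On the RP2350 that cost comes out of static task storage rather than the stack, so large buffers in tasks are worth budgeting for.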