The Sprint NOC API — Production Deploy
Before a single line compiles, project layout communicates intent. A well-structured Rust service separates concerns at the module boundary — database logic never bleeds into route handlers, and authentication never tangles with business rules. Here is the complete file tree for the Sprint NOC API:
sprint-noc-api/
├── Cargo.toml                package manifest — all deps declared here
├── Cargo.lock                committed to git — reproducible builds
├── .env.example              template — never commit .env itself
├── Dockerfile                multi-stage: builder → runtime
├── docker-compose.yml        postgres + api + migrations
├── sprint-noc-api.service    systemd unit for bare-metal deploy
│
├── migrations/               sqlx migrate run applies these in order
│   ├── 001_create_alerts.sql
│   └── 002_create_users.sql
│
└── src/
    ├── main.rs               startup: config, db pool, router, listener
    ├── config.rs             typed env-var extraction via envy
    ├── error.rs              ApiError → HTTP response mapping
    ├── state.rs              AppState struct shared across all handlers
    │
    ├── models/
    │   ├── mod.rs
    │   ├── alert.rs          Alert, CreateAlert, UpdateAlert structs
    │   └── user.rs           User, Claims, LoginRequest structs
    │
    ├── db/
    │   ├── mod.rs
    │   ├── alerts.rs         list_alerts, create_alert, resolve_alert
    │   └── users.rs          find_by_email, create_user
    │
    ├── auth/
    │   ├── mod.rs
    │   ├── jwt.rs            encode_token, decode_token
    │   └── extractors.rs     AuthUser, AdminUser FromRequestParts
    │
    ├── routes/
    │   ├── mod.rs            build_router() — composes all routes
    │   ├── alerts.rs         GET/POST /api/v1/alerts, PATCH /:id/resolve
    │   ├── auth.rs           POST /api/v1/auth/login
    │   ├── health.rs         GET /health — liveness + readiness
    │   ├── internal.rs       POST /internal/alerts — Zabbix webhook
    │   └── ws.rs             GET /ws/alerts — WebSocket upgrade
    │
    └── zabbix/
        ├── mod.rs
        └── webhook.rs        ZabbixPayload → AlertEvent translation
Hard-coded connection strings and secrets are among the most common causes of production incidents. The service reads all configuration from environment variables at startup, failing fast with a clear error if anything is missing. The envy crate deserialises environment variables directly into a typed struct — no std::env::var calls scattered through the codebase.
# Database — PostgreSQL on stz-srv-01
DATABASE_URL=postgres://noc_user:CHANGE_ME@localhost:5432/sprint_noc

# JWT — generate with: openssl rand -hex 64
JWT_SECRET=replace_with_64_hex_chars

# Zabbix internal webhook secret (Zabbix → /internal/alerts)
INTERNAL_TOKEN=replace_with_strong_secret

# Server
BIND_ADDR=0.0.0.0:8080
RUST_LOG=sprint_noc_api=info,tower_http=info
use serde::Deserialize;

#[derive(Debug, Deserialize, Clone)]
pub struct Config {
    pub database_url: String,
    pub jwt_secret: String,
    pub internal_token: String,
    #[serde(default = "default_bind")]
    pub bind_addr: String,
}

fn default_bind() -> String {
    "0.0.0.0:8080".to_string()
}

impl Config {
    pub fn from_env() -> Result<Self, envy::Error> {
        envy::from_env::<Config>()
    }
}
The main.rs file is the service's checklist. It runs in order: parse config, initialise tracing, connect to the database, run pending migrations, build shared state, construct the router, bind the listener, and serve. If any step fails the process exits immediately with a clear error message. On an oil field, you do not guess — you verify at each checkpoint before proceeding.
use std::sync::Arc;

use sqlx::postgres::PgPoolOptions;
use tokio::net::TcpListener;
use tokio::sync::broadcast;
use tracing_subscriber::{layer::SubscriberExt, util::SubscriberInitExt};

mod auth;
mod config;
mod db;
mod error;
mod models;
mod routes;
mod state;
mod zabbix;

use state::AppState;
use crate::routes::ws::AlertEvent;

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    // ── 1. Config ──────────────────────────────────────────────
    dotenvy::dotenv().ok(); // load .env if present (dev only)
    let cfg = config::Config::from_env()
        .expect("Missing required environment variables");

    // ── 2. Tracing ─────────────────────────────────────────────
    tracing_subscriber::registry()
        .with(tracing_subscriber::fmt::layer())
        .with(tracing_subscriber::EnvFilter::from_default_env())
        .init();
    tracing::info!("Sprint NOC API starting — Kilimanjaro node");

    // ── 3. Database ────────────────────────────────────────────
    let pool = PgPoolOptions::new()
        .max_connections(20)
        .connect(&cfg.database_url)
        .await
        .expect("Failed to connect to PostgreSQL");

    // ── 4. Migrations ──────────────────────────────────────────
    sqlx::migrate!("./migrations")
        .run(&pool)
        .await
        .expect("Failed to run database migrations");
    tracing::info!("Database migrations applied");

    // ── 5. Shared state ────────────────────────────────────────
    let (alert_tx, _) = broadcast::channel::<AlertEvent>(256);
    let state = Arc::new(AppState {
        db: pool,
        alert_tx: alert_tx.clone(),
        config: cfg.clone(),
    });

    // ── 6. Router ──────────────────────────────────────────────
    let app = routes::build_router(state);

    // ── 7. Listen ──────────────────────────────────────────────
    let listener = TcpListener::bind(&cfg.bind_addr).await?;
    tracing::info!("Listening on {}", cfg.bind_addr);
    axum::serve(listener, app).await?;

    Ok(())
}
Any load balancer, Docker health check, or Kubernetes probe needs a fast endpoint that signals service health. We distinguish two states: liveness (the process is running and the event loop is responsive) and readiness (the database pool can serve a query). The health route checks both, returning structured JSON. Isaac's dashboard can poll this endpoint; if it returns anything other than 200, alert the on-call engineer.
use axum::{extract::State, http::StatusCode, response::Json};
use serde::Serialize;
use std::sync::Arc;

use crate::state::AppState;

#[derive(Serialize)]
pub struct HealthResponse {
    status: String,
    database: String,
    version: String,
}

pub async fn health_handler(
    State(state): State<Arc<AppState>>,
) -> (StatusCode, Json<HealthResponse>) {
    // Probe the DB pool with a trivial query
    let db_ok = sqlx::query("SELECT 1")
        .execute(&state.db)
        .await
        .is_ok();

    let (status, db_str) = if db_ok {
        (StatusCode::OK, "ok".to_string())
    } else {
        (StatusCode::SERVICE_UNAVAILABLE, "unavailable".to_string())
    };

    (status, Json(HealthResponse {
        status: if db_ok { "ok" } else { "degraded" }.to_string(),
        database: db_str,
        version: env!("CARGO_PKG_VERSION").to_string(),
    }))
}
Zabbix triggers a media type action: an HTTP webhook fires at POST /internal/alerts whenever a host goes down or recovers. The endpoint is protected by a static bearer token — a long secret configured in both Zabbix and the service's environment. This is not user authentication (that is JWT); it is service-to-service authentication using a shared secret, simpler and appropriate for a single trusted caller.
use serde::{Deserialize, Serialize};

/// Shape of the JSON Zabbix sends — configure in Media type → Message
#[derive(Debug, Deserialize)]
pub struct ZabbixPayload {
    pub trigger_name: String, // e.g. "Host unreachable"
    pub host_name: String,    // e.g. "stz-sw-kilimanjaro-01"
    pub severity: String,     // "High", "Disaster", etc.
    pub status: String,       // "PROBLEM" | "RESOLVED"
    pub event_id: String,
    pub site: String,         // custom macro {$SPRINT_SITE}
}

/// Map Zabbix severity → our internal severity
pub fn map_severity(zabbix: &str) -> &'static str {
    match zabbix {
        "Disaster" | "High" => "critical",
        "Average" => "warning",
        "Warning" | "Info" => "info",
        _ => "info",
    }
}
use axum::{
    extract::State,
    http::{HeaderMap, StatusCode},
    response::Json,
};
use std::sync::Arc;

use crate::{state::AppState, zabbix::webhook::{ZabbixPayload, map_severity}};
use crate::routes::ws::AlertEvent;
use crate::models::alert::CreateAlert;

pub async fn zabbix_webhook(
    headers: HeaderMap,
    State(state): State<Arc<AppState>>,
    Json(payload): Json<ZabbixPayload>,
) -> StatusCode {
    // ── Authenticate Zabbix ────────────────────────────────
    let token = headers
        .get("Authorization")
        .and_then(|v| v.to_str().ok())
        .and_then(|v| v.strip_prefix("Bearer "))
        .unwrap_or("");
    if token != state.config.internal_token {
        return StatusCode::UNAUTHORIZED;
    }

    // ── Determine event type ───────────────────────────────
    let event_type = if payload.status == "RESOLVED" {
        "alert.resolved"
    } else {
        "alert.created"
    };
    let severity = map_severity(&payload.severity).to_string();

    // ── Persist to database ────────────────────────────────
    let alert = crate::db::alerts::create_alert(
        &state.db,
        CreateAlert {
            site: payload.site.clone(),
            severity: severity.clone(),
            message: payload.trigger_name.clone(),
        },
    ).await;

    let alert_id = match alert {
        Ok(a) => a.id.to_string(),
        Err(_) => "unknown".to_string(),
    };

    // ── Broadcast to NOC screens ───────────────────────────
    let _ = state.alert_tx.send(AlertEvent {
        id: alert_id,
        site: payload.site,
        severity,
        message: payload.trigger_name,
        event: event_type.to_string(),
    });

    StatusCode::OK
}
Media type → Webhook setup
In Zabbix UI: Alerts → Media types → Create media type. Type: Webhook. URL: http://stz-srv-01:8080/internal/alerts. Method: POST. Headers: Authorization: Bearer {your_INTERNAL_TOKEN}. Message body: a JSON template with macros like {TRIGGER.NAME}, {HOST.NAME}, {EVENT.SEVERITY}, {$SPRINT_SITE} (a host-level macro you define per host group). Assign the media type to a notification user, reference that user in a trigger action, and the webhook fires on every problem and every recovery.
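As a starting point, the message body below is one possible template that lines up field for field with the ZabbixPayload struct. The macro names are standard Zabbix macros; {$SPRINT_SITE} is the host-level user macro described above, and depending on your Zabbix version these values may be passed as webhook parameters rather than a raw message body. The handler compares status against the literal string RESOLVED, so confirm what {EVENT.STATUS} expands to on your installation before going live.

{
  "trigger_name": "{TRIGGER.NAME}",
  "host_name": "{HOST.NAME}",
  "severity": "{EVENT.SEVERITY}",
  "status": "{EVENT.STATUS}",
  "event_id": "{EVENT.ID}",
  "site": "{$SPRINT_SITE}"
}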
use axum::{
    routing::{get, post, patch},
    Router,
};
use std::sync::Arc;
use tower_http::{
    cors::{CorsLayer, Any},
    compression::CompressionLayer,
    trace::TraceLayer,
};

use crate::state::AppState;

pub mod alerts;
pub mod auth;
pub mod health;
pub mod internal;
pub mod ws;

pub fn build_router(state: Arc<AppState>) -> Router {
    let api = Router::new()
        .route("/alerts", get(alerts::list_alerts).post(alerts::create_alert))
        .route("/alerts/:id/resolve", patch(alerts::resolve_alert))
        .route("/auth/login", post(auth::login));

    Router::new()
        // Public health check — no auth
        .route("/health", get(health::health_handler))
        // Internal webhook — token auth only
        .route("/internal/alerts", post(internal::zabbix_webhook))
        // WebSocket — authenticated via AuthUser extractor inside handler
        .route("/ws/alerts", get(ws::ws_handler))
        // REST API — JWT auth enforced per-handler via extractors
        .nest("/api/v1", api)
        // Middleware stack applied to everything
        .layer(TraceLayer::new_for_http())
        .layer(CompressionLayer::new())
        .layer(CorsLayer::new().allow_origin(Any))
        .with_state(state)
}
The multi-stage Dockerfile solves a problem specific to compiled languages: the build tools (rustup, 1.4GB of LLVM, the entire crates registry) are needed to compile but must never appear in the production image. Stage one — the builder — installs all tooling and compiles an optimised release binary. Stage two — the runtime — is a minimal Debian image that receives only the binary and the migrations directory. The final image is under 80 MB.
# ── Stage 1: Builder ───────────────────────────────────────────
FROM rust:1.78-slim-bookworm AS builder

# Build deps — pkg-config and OpenSSL headers for crates that link system TLS
RUN apt-get update && apt-get install -y \
    pkg-config libssl-dev \
    && rm -rf /var/lib/apt/lists/*

WORKDIR /app

# Cache dependencies before copying source
# Copy manifests first — Docker layer cache reuses this unless Cargo.toml changes
COPY Cargo.toml Cargo.lock ./
RUN mkdir src && echo 'fn main(){}' > src/main.rs
RUN cargo build --release
RUN rm -f target/release/deps/sprint_noc_api*

# Now copy real source and build
COPY src/ src/
COPY migrations/ migrations/
RUN cargo build --release

# ── Stage 2: Runtime ───────────────────────────────────────────
FROM debian:bookworm-slim AS runtime

# Runtime deps only: ca-certificates for TLS to PostgreSQL,
# curl for the container health check in docker-compose.yml
RUN apt-get update && apt-get install -y \
    ca-certificates curl \
    && rm -rf /var/lib/apt/lists/*

# Non-root user — never run services as root
RUN useradd -ms /bin/bash sprint
USER sprint
WORKDIR /app

# Copy the compiled binary and migrations from builder
COPY --from=builder /app/target/release/sprint-noc-api ./sprint-noc-api
COPY --from=builder /app/migrations ./migrations

EXPOSE 8080
CMD ["./sprint-noc-api"]
version: '3.9'

services:
  db:
    image: postgres:16-alpine
    restart: unless-stopped
    environment:
      POSTGRES_DB: sprint_noc
      POSTGRES_USER: noc_user
      POSTGRES_PASSWORD: ${DB_PASSWORD}
    volumes:
      - pg_data:/var/lib/postgresql/data
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U noc_user"]
      interval: 5s
      retries: 5

  api:
    build: .
    restart: unless-stopped
    depends_on:
      db:
        condition: service_healthy
    ports:
      - "8080:8080"
    environment:
      DATABASE_URL: postgres://noc_user:${DB_PASSWORD}@db:5432/sprint_noc
      JWT_SECRET: ${JWT_SECRET}
      INTERNAL_TOKEN: ${INTERNAL_TOKEN}
      RUST_LOG: sprint_noc_api=info,tower_http=info
    healthcheck:
      test: ["CMD-SHELL", "curl -sf http://localhost:8080/health || exit 1"]
      interval: 10s
      timeout: 5s
      retries: 3

volumes:
  pg_data:
If you are deploying directly on stz-srv-01 without Docker — which is entirely reasonable for a single-server NOC API — systemd is the production process manager. It handles restarts on crash, enforces resource limits, isolates the process from the rest of the system, and logs to journald. The unit file is a specification: start this binary, with these environment variables, restart on failure, never run as root.
[Unit]
Description=Sprint NOC API — SprintTZ Kilimanjaro
After=network.target postgresql.service
Requires=postgresql.service

[Service]
Type=simple
User=sprint
Group=sprint
WorkingDirectory=/opt/sprint-noc-api
ExecStart=/opt/sprint-noc-api/sprint-noc-api

# Load secrets from a file not tracked in git
EnvironmentFile=/etc/sprint-noc-api/env

# Restart policy
Restart=on-failure
RestartSec=5s

# Security hardening
NoNewPrivileges=true
ProtectSystem=strict
ProtectHome=true
ReadWritePaths=/opt/sprint-noc-api/logs

# Resource limits
LimitNOFILE=65536

# Logging — view with: journalctl -u sprint-noc-api -f
StandardOutput=journal
StandardError=journal
SyslogIdentifier=sprint-noc-api

[Install]
WantedBy=multi-user.target
BARE-METAL DEPLOY SEQUENCE — stz-srv-01
────────────────────────────────────────────────────────────

# Build on your dev machine (target: Ubuntu 24.04 x86_64)
cargo build --release
scp target/release/sprint-noc-api sprint@stz-srv-01:/opt/sprint-noc-api/

# First deploy: set up env file and migrate
ssh sprint@stz-srv-01
sudo mkdir -p /etc/sprint-noc-api
sudo nano /etc/sprint-noc-api/env      # paste secrets, chmod 600

# Migrations are applied automatically at startup (main.rs step 4);
# to run them ahead of time, use sqlx-cli:
sqlx migrate run                       # idempotent — applies only pending migrations

# Install and start service
sudo cp sprint-noc-api.service /etc/systemd/system/
sudo systemctl daemon-reload
sudo systemctl enable --now sprint-noc-api

● sprint-noc-api.service - Sprint NOC API — SprintTZ Kilimanjaro
     Loaded: loaded (/etc/systemd/system/sprint-noc-api.service)
     Active: active (running) since Fri 2026-05-01 09:31:00 EAT

# Rolling update
scp target/release/sprint-noc-api sprint@stz-srv-01:/opt/sprint-noc-api/sprint-noc-api.new
ssh sprint@stz-srv-01 "mv /opt/sprint-noc-api/sprint-noc-api{.new,} \
    && sudo systemctl restart sprint-noc-api"
# systemctl restart briefly stops the old process before the new one binds —
# schedule updates in a quiet window
[package]
name = "sprint-noc-api"
version = "0.1.0"
edition = "2021"

[dependencies]
# Web framework
axum = { version = "0.7", features = ["ws"] }
tower = "0.4"
tower-http = { version = "0.5", features = ["cors", "compression-gzip", "trace"] }

# Async runtime
tokio = { version = "1", features = ["full"] }

# Database
sqlx = { version = "0.7", features = ["postgres", "runtime-tokio-rustls", "uuid", "chrono", "macros"] }

# Auth
jsonwebtoken = "9"
bcrypt = "0.15"

# Serialisation
serde = { version = "1", features = ["derive"] }
serde_json = "1"

# Config
envy = "0.4"
dotenvy = "0.15"   # .env loading in dev

# Error handling
anyhow = "1"
thiserror = "1"

# IDs and timestamps
uuid = { version = "1", features = ["v4", "serde"] }
chrono = { version = "0.4", features = ["serde"] }

# Logging
tracing = "0.1"
tracing-subscriber = { version = "0.3", features = ["env-filter"] }

[profile.release]
opt-level = "z"     # size optimisation — smaller binary for scp
lto = "thin"
codegen-units = 1
strip = "symbols"
Before declaring the system live, walk through the complete alert path manually. This is the equivalent of a pilot's post-start checklist — you confirm every system is functional before you leave the ground.
END-TO-END VERIFICATION SEQUENCE
────────────────────────────────────────────────────────────

## 1. Health
curl http://stz-srv-01:8080/health
{"status":"ok","database":"ok","version":"0.1.0"}

## 2. Login — get a JWT
curl -X POST http://stz-srv-01:8080/api/v1/auth/login \
  -H 'Content-Type: application/json' \
  -d '{"email":"[email protected]","password":"..."}'
{"token":"eyJ0eXAiOiJKV1Q..."}

## 3. Open a WebSocket in another terminal
wscat -c ws://stz-srv-01:8080/ws/alerts
Connected (press CTRL+C to quit)

## 4. Simulate a Zabbix alert
curl -X POST http://stz-srv-01:8080/internal/alerts \
  -H 'Authorization: Bearer your_INTERNAL_TOKEN' \
  -H 'Content-Type: application/json' \
  -d '{
    "trigger_name": "Host unreachable: stz-sw-serengeti-01",
    "host_name": "stz-sw-serengeti-01",
    "severity": "High",
    "status": "PROBLEM",
    "event_id": "99001",
    "site": "Serengeti"
  }'

## 5. Watch the WebSocket terminal — event arrives within milliseconds
< {"id":"a3f9...","site":"Serengeti","severity":"critical",
   "message":"Host unreachable: stz-sw-serengeti-01",
   "event":"alert.created"}

## 6. Confirm persistence
curl http://stz-srv-01:8080/api/v1/alerts \
  -H 'Authorization: Bearer eyJ0eXAi...'
[{"id":"a3f9...","site":"Serengeti","severity":"critical",...}]

All six steps passing = system is live. ✓
Rate Limiting the Webhook
A misconfigured Zabbix action can fire thousands of alerts per minute, hammering the database. Add rate limiting to the /internal/alerts route using Tower's ServiceBuilder with a rate_limit layer, capping at 60 requests per minute. When the limit is exceeded, return 429 Too Many Requests with a Retry-After header set to the number of seconds until the window resets. Log every rate-limit rejection with the source IP address.
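Note that Tower's built-in rate_limit layer applies backpressure rather than emitting an HTTP status, so meeting the 429-with-Retry-After requirement usually means writing a small middleware of your own. The sketch below is one minimal fixed-window approach, not a reference solution: the WebhookLimiter type, the 60-per-minute constants, and the wiring are all illustrative, and the ConnectInfo extractor only works when the app is served with into_make_service_with_connect_info.

use std::net::SocketAddr;
use std::sync::{Arc, Mutex};
use std::time::{Duration, Instant};

use axum::{
    extract::{ConnectInfo, Request, State},
    http::{header, StatusCode},
    middleware::Next,
    response::{IntoResponse, Response},
};

/// Fixed-window counter shared with the router (illustrative type).
pub struct WebhookLimiter {
    window: Mutex<(Instant, u32)>,
}

const WINDOW: Duration = Duration::from_secs(60);
const MAX_PER_WINDOW: u32 = 60;

pub async fn limit_webhook(
    State(limiter): State<Arc<WebhookLimiter>>,
    ConnectInfo(addr): ConnectInfo<SocketAddr>,
    req: Request,
    next: Next,
) -> Response {
    // Decide inside the lock whether this request fits in the current window
    let retry_after = {
        let mut guard = limiter.window.lock().unwrap();
        let (start, count) = &mut *guard;
        if start.elapsed() >= WINDOW {
            // New window — reset the counter
            *start = Instant::now();
            *count = 0;
        }
        if *count < MAX_PER_WINDOW {
            *count += 1;
            None
        } else {
            // Seconds until the window resets, at least 1
            Some(WINDOW.saturating_sub(start.elapsed()).as_secs().max(1))
        }
    };

    match retry_after {
        None => next.run(req).await,
        Some(secs) => {
            tracing::warn!(%addr, "rate limit exceeded on /internal/alerts");
            (
                StatusCode::TOO_MANY_REQUESTS,
                [(header::RETRY_AFTER, secs.to_string())],
            )
                .into_response()
        }
    }
}

Attach it to the webhook route only — for example .route_layer(middleware::from_fn_with_state(limiter, limit_webhook)) on the /internal/alerts route — so the rest of the API is unaffected.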
Alert Acknowledgement
NOC engineers need to acknowledge alerts — marking "I am aware and investigating" without resolving them. Add a new PATCH /api/v1/alerts/:id/acknowledge endpoint. Extend the alerts table with an acknowledged_by column (nullable UUID, foreign key to users), an acknowledged_at timestamp, and an acknowledged boolean. The endpoint should require authentication, record the acknowledging user, and broadcast an alert.acknowledged event to WebSocket clients so the NOC screens can update the alert's visual state immediately.
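One possible shape for the migration, assuming the alerts and users tables use UUID primary keys as the models in src/models/ suggest (the file name and the DEFAULT are illustrative):

-- migrations/003_add_acknowledgement.sql
ALTER TABLE alerts
    ADD COLUMN acknowledged    BOOLEAN     NOT NULL DEFAULT FALSE,
    ADD COLUMN acknowledged_by UUID        REFERENCES users(id),
    ADD COLUMN acknowledged_at TIMESTAMPTZ;

The acknowledged flag is technically redundant with acknowledged_at IS NOT NULL, but it keeps the WebSocket payload and the dashboard logic simple.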
The Four-Site Dashboard
Build the NOC wall screen that Isaac sees when he walks into the Kilimanjaro operations room. It is a single HTML file served statically by the API (add a ServeDir layer). The screen connects to /ws/alerts on load and maintains a live table showing alerts grouped by site: Kilimanjaro, Serengeti, Drakensberg, Rwenzori. Each site card shows the count of active alerts and the highest severity. When a new alert.created event arrives, the relevant site card flashes and the alert appears. When alert.resolved arrives, the row disappears with a brief fade. On disconnect, show a reconnecting indicator and retry with exponential backoff. No frameworks — plain HTML, CSS, and the WebSocket API are sufficient and produce a faster, more maintainable result.
From the Oil Field to the Operations Room
You started this book without knowing what a borrow checker was. You now own the mental model of memory safety from first principles, you have wired real hardware, you have spoken I2C to a servo controller and decoded the quadrature output of an encoder shaft, and you have shipped a WebSocket API that can fan a single database write out to forty screens in the time it takes light to cross a fibre strand.
The thread running through every chapter is the same discipline you learned in the field: verify before you proceed, make failure visible, let the type system hold the invariants you cannot afford to check at runtime. Rust's ownership model is not a language quirk — it is that discipline encoded in syntax.
The SprintTZ NOC team in Dar es Salaam now has infrastructure worth understanding. The system you have built runs on real fibre crossing real borders, surfaces real alerts from real Zabbix hosts, and pushes them to real screens in real operations rooms. That is not a tutorial project. That is the craft.
Keep building. Keep reading datasheets. Keep listening to the hardware.