ADR-0012: M13 Containers + Real-Tenant Beta + Commission-Triplet Absorption

Status: Accepted
Date: 2026-05-10
Amended: 2026-06-14 (PR #35). Co-located deploy hardening: per-service resource ceilings; LiveKit gated behind a voice compose profile (voice stays deferred to M14); the M13 stack additionally validated as a test environment on the shared Hetzner pilot box (the host running zol-rag) via scripts/deploy-on-pilot.sh. See Amendment (2026-06-14) below.

Context

M13 puts Ratiba in front of real customers at Aria Aura Spa (Adrian's wife's business in Nairobi). Two related design conversations (Beyond-Ratiba spec 2026-05-09 + Back-office integration spec 2026-05-10) decided that M13 should also absorb the commission-scaffolding triplet so that the pilot generates real commission data from day one.

As of the close of M12 (2026-05-06), Ratiba's architecture phase is complete (11 ADRs, 30-page Docusaurus site, 1112 backend tests, 81 frontend Vitest tests). The remaining gap is operational: running as a production-shape container stack on a laptop with a real WhatsApp Business Account, real bookings, and real customers. Three design constraints drive M13's choices:

  1. Containerisation: The stack must run identically on Adrian's MacBook (dev), Hetzner CX21 (M14 production VPS), and CI. A single docker compose up must yield a working system.
  2. Stable webhook URL: Meta's WhatsApp Cloud API requires a durable HTTPS endpoint. ngrok disconnects every 2h on the free tier; ssh -R is fragile. A compose-native tunnelling solution is required.
  3. Commission data from day one: Aria Aura Spa tracks staff commissions manually today. The pilot is an opportunity to start capturing real commission data before any back-office integration exists, so M14/M17 has real data to validate against when Workpay and PesaPal integrations land.

Decision

Containerisation

  • 3-file Docker Compose split: docker-compose.infra.yml (Postgres, Redis, Keycloak, MinIO), docker-compose.app.yml (backend, worker, frontend), docker-compose.dev.yml (overrides for local development: volume mounts, hot-reload, exposed debug ports). Production and CI use infra + app; local dev uses all three via --file chaining.
  • Multi-stage Dockerfiles: Python 3.13 slim for backend + worker; Node 20 slim for frontend. Builder stage installs deps; runtime stage copies only the built artefacts. Non-root users (uid 1000) in all runtime stages.
  • /healthz checks on backend and frontend containers; Compose depends_on: {condition: service_healthy} for ordered startup.
  • Restart policies: restart: unless-stopped on all app containers; restart: always on infra containers.
  • Worker reuses backend image: Worker container uses the same image as the backend with an overridden command: (python -m app.worker). No separate Dockerfile; one image build per CI run.
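
The file chaining described above can be sketched as a small helper that builds the docker compose invocation per environment. This is a minimal illustration, not the project's tooling: the three file names come from this ADR, while the environment labels (prod, ci, dev), the helper name compose_up_argv, and the --detach flag are assumptions made for the example.

```python
from typing import Dict, List

# File names are from the ADR; the environment labels are illustrative assumptions.
COMPOSE_FILES: Dict[str, List[str]] = {
    "prod": ["docker-compose.infra.yml", "docker-compose.app.yml"],
    "ci": ["docker-compose.infra.yml", "docker-compose.app.yml"],
    "dev": [
        "docker-compose.infra.yml",
        "docker-compose.app.yml",
        "docker-compose.dev.yml",  # overrides: volume mounts, hot-reload, debug ports
    ],
}


def compose_up_argv(env: str) -> List[str]:
    """Build the `docker compose --file ... up` argv for one environment."""
    argv = ["docker", "compose"]
    for path in COMPOSE_FILES[env]:
        # Order matters: later files override values from earlier ones.
        argv += ["--file", path]
    return argv + ["up", "--detach"]


# Example: print the command line each environment would run.
for name in ("prod", "dev"):
    print(name, "->", " ".join(compose_up_argv(name)))
```

Keeping the dev overrides in a third file means production and CI never see the volume mounts or debug ports; they simply omit the last --file argument.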

Stable Webhook URL

  • Cloudflare Tunnel free tier as a compose service: A cloudflared container runs as a sidecar, pointing at the backend container's internal port. When run with a tunnel token (a named tunnel, with a named subdomain if Adrian registers one on the free plan), the tunnel provides a stable hostname that survives Docker restarts and laptop sleeps.
  • Replaces ngrok (disconnects every 2h on free tier) and ssh -R (fragile, requires SSH server on the receiving end).
  • The CLOUDFLARE_TUNNEL_TOKEN env var is optional; if absent, cloudflared falls back to an ephemeral trycloudflare.com URL (acceptable for local-only dev, not for the real-tenant pilot).
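
The token fallback above can be sketched as follows. This is a hedged illustration of the decision logic only: the helper name cloudflared_argv, the pilot flag, and the backend URL http://backend:8000 are assumptions (the ADR does not state the backend's internal port), and the real compose service may express this differently.

```python
from typing import List, Mapping


def cloudflared_argv(env: Mapping[str, str], backend_url: str, pilot: bool) -> List[str]:
    """Pick the cloudflared command line based on CLOUDFLARE_TUNNEL_TOKEN."""
    token = env.get("CLOUDFLARE_TUNNEL_TOKEN", "").strip()
    if token:
        # Named tunnel: the hostname is configured on the Cloudflare side and stays stable.
        return ["cloudflared", "tunnel", "run", "--token", token]
    if pilot:
        # An ephemeral URL would break the Meta webhook registration on every restart.
        raise RuntimeError("CLOUDFLARE_TUNNEL_TOKEN is required for the real-tenant pilot")
    # Quick tunnel: ephemeral trycloudflare.com URL, acceptable for local-only dev.
    return ["cloudflared", "tunnel", "--url", backend_url]


# Example (hypothetical backend URL): local dev without a token.
print(cloudflared_argv({}, "http://backend:8000", pilot=False))
```

Failing fast when the token is missing in pilot mode keeps an ephemeral URL from being registered with Meta by accident.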

WhatsApp + Daraja Credentials

  • WhatsApp via spa Twilio long-code per ADR-0008: The spa's Twilio long-code is verified once with the WhatsApp Business Platform via SMS OTP, then claimed by the WABA. Production-shape from day one.
  • Daraja sandbox-only in M13: The STK-push integration uses the Safaricom Daraja sandbox (sandbox.safaricom.co.ke). Real M-Pesa transactions require a Go-Live review; that flip = M14. All other payment flows (PesaPal card, cash) work in production mode.
  • Voice + Instagram + Messenger deferred to M14: These channels have the longest credential chains (LiveKit SIP provisioning, Meta IG review, Meta Messenger page setup). They are not load-bearing for the first beta; deferring them keeps M13 tractable.

Observability

  • Lean observability: docker logs <container> + structured JSON log tailing + daily WhatsApp digest via the M9 admin rail. No Loki, Datadog, or Sentry in M13.
  • Loki/Datadog/Sentry deferred to M14+ if M13 observability proves insufficient at real-tenant scale (5-10 customers, ~50 bookings/week estimate).
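
As an illustration of what "structured JSON log tailing" can look like without extra tooling, here is a minimal filter over log lines. The field names (level, event) and the level ordering are assumptions for the sketch; the ADR does not specify the log schema.

```python
import json
from typing import Dict, Iterable, Iterator

# Assumed severity ordering; the actual log schema is not given in this ADR.
LEVELS: Dict[str, int] = {"debug": 10, "info": 20, "warning": 30, "error": 40, "critical": 50}


def filter_logs(lines: Iterable[str], min_level: str = "warning") -> Iterator[dict]:
    """Yield parsed JSON log records at or above min_level; skip everything else."""
    threshold = LEVELS[min_level]
    for line in lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # non-JSON output such as startup banners
        if not isinstance(record, dict):
            continue  # valid JSON, but not a log record
        level = str(record.get("level", "")).lower()
        if LEVELS.get(level, 0) >= threshold:
            yield record


# Example with in-memory lines; in practice the input would be `docker logs` output.
sample = [
    '{"level": "info", "event": "booking_created"}',
    "plain text banner",
    '{"level": "error", "event": "stk_push_timeout"}',
]
for rec in filter_logs(sample):
    print(rec["event"])
```

Fed from sys.stdin instead of the sample list, the same function could sit at the end of a docker logs pipe.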

Beta Cohort

  • 5-10 trusted-customer beta cohort: Adrian's wife selects the first cohort from her existing client base. Onboarding = a bilingual opt-in WhatsApp message (English + Swahili) explaining the AI assistant, with ACHA / STOP / OUT exit keywords. In-flight bookings at opt-in time are honoured manually.
  • Cohort size capped at 10 for M13 to bound the support surface while the operator (Adrian) is the only support tier.
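
The exit keywords can be recognised with a very small check. The sketch below assumes whole-message, case-insensitive matching after trimming whitespace; the ADR names the keywords (ACHA / STOP / OUT) but not the matching rule, and the function name is_opt_out is hypothetical.

```python
# Keywords are from the ADR; the matching rule below is an assumption.
EXIT_KEYWORDS = frozenset({"ACHA", "STOP", "OUT"})


def is_opt_out(message: str) -> bool:
    """True when the whole message is one of the exit keywords."""
    return message.strip().upper() in EXIT_KEYWORDS


# Example: only a bare keyword opts out, not a sentence that contains one.
print(is_opt_out(" stop "))
print(is_opt_out("Please stop by at 3pm"))
```

Matching the whole message rather than a substring avoids opting out a customer who writes something like "stop by at 3pm".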

Commission-Triplet Absorption

M13 absorbs the commission-scaffolding triplet from the Beyond-Ratiba and back-office integration design discussions:

  • Extend staff with employment-attrs columns (Workpay-authoritative-but-Ratiba-cached): kra_pin, nssf_number, shif_number, hire_date, employment_type (enum: permanent / casual / commission_only / apprentice), base_pay_kes_minor. All nullable. kra_pin has a regex CHECK constraint (^[AP][0-9]{9}[A-Z]$). employment_type has an enum CHECK constraint. Migration: 0009_m13_staff_employment_attrs.
  • Extend appointments with rendering-actuals α-option columns: actual_start_at, actual_end_at (TIMESTAMP WITH TIME ZONE, nullable), tip_amount (NUMERIC(10,2), nullable), staff_notes (TEXT, nullable). α option = extend existing table (vs β = separate service_renderings table). v1 default per Beyond-Ratiba spec §5.4; β migration is mechanical if/when needed. Migration: 0010_m13_appointments_rendering_actuals.
  • New commission_rules table: Staff-id XOR service-category (enforced by CHECK ((staff_id IS NOT NULL) <> (service_category IS NOT NULL))); rule_type is one of {pct, fixed, tiered}; value JSONB (percentage, minor-unit amount, or tier array); valid_from / valid_to for historical rate changes. Partial index on staff_id WHERE valid_to IS NULL for fast active-rule lookups. Migration: 0011_m13_commission_rules.
  • New commission_period table: Frozen monthly commission per staff. UNIQUE on (staff_id, period_start, period_end, type) for CommissionEngine idempotency (re-running for the same period is a no-op). type is one of {close, amendment}; amendments carry an amendment_of_id FK (enforced by CHECK ((type = 'amendment') = (amendment_of_id IS NOT NULL))). Columns: commission_kes_minor, tip_kes_minor, hours_worked, services_rendered_count, frozen_at, vendor_period_id (for future Workpay sync). Migration: 0012_m13_commission_period.
  • CommissionEngine service: app/services/commission_engine.py — reads commission_rules, aggregates completed appointments in the period, writes a commission_period row. Fallback: if actual_end_at is NULL, uses end_at. Deferred to M13 Wave 3 (schema lands now; engine lands after pilot go-live tooling is in place).
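
The constraints and engine rules listed above can be mirrored in application code for illustration. This is a sketch under stated assumptions, not the contents of app/services/commission_engine.py: the function names, the integer staff_id, the string period dates, and the in-memory dict standing in for the commission_period table are all hypothetical. Only the regex, the XOR rule, the amendment rule, the end_at fallback, and the uniqueness key come from this ADR.

```python
import re
from datetime import datetime, timezone
from typing import Dict, Optional, Tuple

KRA_PIN_RE = re.compile(r"^[AP][0-9]{9}[A-Z]$")  # pattern quoted in the ADR


def kra_pin_ok(pin: Optional[str]) -> bool:
    """kra_pin is nullable, so None passes; otherwise the regex must match."""
    return pin is None or KRA_PIN_RE.fullmatch(pin) is not None


def rule_scope_ok(staff_id: Optional[int], service_category: Optional[str]) -> bool:
    """commission_rules: exactly one of staff_id / service_category is set."""
    return (staff_id is not None) != (service_category is not None)


def amendment_ok(period_type: str, amendment_of_id: Optional[int]) -> bool:
    """commission_period: amendment_of_id is set if and only if type is 'amendment'."""
    return (period_type == "amendment") == (amendment_of_id is not None)


def effective_end(end_at: datetime, actual_end_at: Optional[datetime]) -> datetime:
    """Engine fallback: use the scheduled end when no actual end was recorded."""
    return actual_end_at if actual_end_at is not None else end_at


# (staff_id, period_start, period_end, type), mirroring the UNIQUE constraint.
PeriodKey = Tuple[int, str, str, str]


def freeze_period(store: Dict[PeriodKey, dict], key: PeriodKey, row: dict) -> bool:
    """Insert once per key; a re-run for the same period is a no-op (returns False)."""
    if key in store:
        return False
    store[key] = row
    return True


# Example: a second run for the same period leaves the frozen row untouched.
periods: Dict[PeriodKey, dict] = {}
june = (1, "2026-06-01", "2026-06-30", "close")
print(freeze_period(periods, june, {"commission_kes_minor": 125_000}))
print(freeze_period(periods, june, {"commission_kes_minor": 999_999}))
```

In the database the same guarantees come from the CHECK and UNIQUE constraints themselves; application-side checks like these would only give earlier, friendlier errors.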

Calibration Extension

  • Capture real customer transcripts (with explicit opt-in consent) into tests/eval/calibration/human_labelled.yaml with PII masking via the existing redact.py floor (per ADR-0004 §5.4). No new tooling; the capture process is manual in M13 (copy from Langfuse, mask, commit).

Methodology Codification

  • Methodology v2.1.4 codification deferred to a single bundled post-M13 PR upstream in the s4u-methodology repo. The ~19 amendment candidates (4 from M9+1 + 5 from M10 + 5 from M11 + 5 from M12) are tracked in MEMORY.md; they land as a single coherent PR after the pilot stabilises.

Consequences

Positive

  • Pilot generates real commission data day one; Workpay/AccountingAdapter integration in M14/M17 has real data to validate against.
  • Container shape ports cleanly to Hetzner CX21 in M14 — no re-architecting, just a VPS deploy target swap.
  • Cloudflare Tunnel compose service is cheaper and more reliable than ngrok for long-running pilots; free-tier subdomain is stable across restarts.
  • CommissionEngine absorption means the pilot operator can see accurate commission totals from week one without manual spreadsheet work.

Negative

  • M13 task count grew from ~12 to ~13 with commission-triplet absorption. Manageable; the schema migrations are straightforward additive work.
  • WABA review (1-7 business days) is the long pole on the M13 timeline, and Safaricom Daraja Go-Live review is also slow. Both are external-dependency blockers that cannot be parallelised away.
  • Lean observability means no fancy dashboards — acceptable at 5-10 customer scale, but the team must be comfortable with docker logs and manual monitoring for M13's duration.
  • Daraja sandbox-only means the STK-push flow shows a prompt but no real debit. This is communicated to the beta cohort in the opt-in message.

Reversibility

  • Rendering-actuals α option (extend appointments) → β option (separate service_renderings table) is a mechanical migration if/when needed (per Beyond-Ratiba spec §5.4). No application-level commitments to the α shape are made in M13 that would make the β migration non-trivial.
  • The commission_period.vendor_period_id column is a nullable forward-compat hook for Workpay sync; it has no application-level semantics in M13.
  • Cloudflare Tunnel can be swapped for a VPS-hosted reverse proxy (Caddy + Let's Encrypt) in M14 with a one-line compose change.

Amendment (2026-06-14) — Co-located deploy hardening

Trigger

M13's original framing (Context constraint 1) targeted two deploy surfaces: Adrian's MacBook for dev and a dedicated Hetzner CX21 for M14 production. A third surface is now wanted before M14: a Ratiba test environment co-located on the shared Hetzner pilot box that already runs zol-rag. This exercises the production-shape container stack on real remote infra (not a laptop) without provisioning the dedicated M14 VPS yet. Co-locating a second project on a live box surfaced two operational gaps this ADR did not cover.

Measured headroom on that box (2026-06-14): 8 vCPU / 31 GB RAM, ~24 GB available, load avg ~0.1; zol-rag's resident footprint ~6.5 GB. Ample — but the box has no swap, and one Ratiba service (LiveKit) uses host networking, which collides at the host-port level with the co-tenant stack.

Decisions

  1. Per-service resource ceilings. Every infra and app service gains deploy.resources.{limits,reservations} (Compose V2 honours these on docker compose up; limits are hard cgroup caps, reservations are advisory outside swarm). Default-profile hard caps sum to ~7.5 G (postgres 2G; backend 1.5G; keycloak, worker, and frontend 1G each; redis and minio 512M each). With no swap on the shared box, these caps, not the kernel, are the safety net that stops a runaway container from taking down the co-tenant zol-rag stack.

  2. LiveKit gated behind profiles: ["voice"]. Voice is already deferred to M14 (see Decision → WhatsApp + Daraja Credentials), and nothing depends_on livekit, so gating it costs the M13 stack nothing. This mirrors the existing cloudflared tunnel-profile pattern. The default up brings up 7 services and omits livekit — the only host-networked, host-port-binding service — which removes the port-collision surface against zol-rag's livekit entirely (rather than relying on the ranges merely not overlapping). --profile voice restores it; its pins (7890/7891/52000-52050) are already disjoint from zol-rag's (7880/7881/51000-51050). Bringing real voice up on the pilot still requires docker/livekit.yaml node_ip = the host public IP and real keys (it ships loopback dev config) — an M14 task, not unblocked here.

  3. scripts/deploy-on-pilot.sh. Parameterized remote bring-up mirroring zol-rag's build-on-pilot idiom (build the image on the server, never transfer tarballs). Atomic mkdir deploy lock; SSH + remote-checkout + backend/.env pre-flight gates; optional CI-green gate (REQUIRE_CI=1); captures the prior SHA for a rollback hint; health-polls backend /healthz (8010) and the frontend (3010). PILOT_HOST / RATIBA_DIR / BRANCH override the target; DEPLOY_VOICE=1 opts livekit in with a port-collision warning.
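
The two numeric claims in the decisions above (the ~7.5 G cap total and the disjoint LiveKit port pins) can be checked with a few lines of arithmetic. The figures are taken from this amendment; treating "G" as 1024 MB and the port ranges as inclusive are assumptions of the sketch.

```python
# Hard memory caps per default-profile service, in MB (figures from the amendment).
CAPS_MB = {
    "postgres": 2048,
    "backend": 1536,
    "keycloak": 1024,
    "worker": 1024,
    "frontend": 1024,
    "redis": 512,
    "minio": 512,
}


def total_cap_gb() -> float:
    """Sum of the hard caps, assuming 1 G = 1024 MB."""
    return sum(CAPS_MB.values()) / 1024


def pinned_ports(signal: int, rtc: int, udp_lo: int, udp_hi: int) -> set:
    """Host ports pinned by one LiveKit instance (UDP range assumed inclusive)."""
    return {signal, rtc} | set(range(udp_lo, udp_hi + 1))


RATIBA_PORTS = pinned_ports(7890, 7891, 52000, 52050)
ZOL_RAG_PORTS = pinned_ports(7880, 7881, 51000, 51050)

print(len(CAPS_MB), "services, total cap:", total_cap_gb(), "G")
print("port pins disjoint:", RATIBA_PORTS.isdisjoint(ZOL_RAG_PORTS))
```

Against the measured ~24 GB available on the box, a 7.5 G ceiling leaves the co-tenant stack's ~6.5 GB footprint comfortably inside the remaining headroom.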

Reversibility

  • Purely additive. The laptop path is unchanged — infra + app (+ dev) still runs identically locally, and the resource caps are generous enough not to bind dev. The co-located deploy is env-gated (PILOT_HOST, DEPLOY_VOICE); the dedicated M14 Hetzner CX21 path (Consequences → Positive) is untouched.
  • Reverting is two deletions (the deploy.resources blocks and the livekit profiles: key) plus removing one script — no schema, API, or data commitments are introduced.

References

  • ADR-0008: WhatsApp Cloud API direct (Twilio long-code, WABA provisioning)
  • ADR-0004: Testing strategy + calibration cadence (human_labelled.yaml, redact.py PII masking floor, Cohen's kappa ≥ 0.7 target)
  • ADR-0007: Reaper + STK timeout (daily 3 AM EAT consolidated reaper, payment_callbacks_unrouted dead-letter table)
  • Beyond-Ratiba spec: docs/superpowers/specs/2026-05-09-aria-aura-spa-substrate-design.md
  • Back-office integration spec: docs/superpowers/specs/2026-05-10-ratiba-back-office-integration-design.md
  • M13 design doc: docs/plans/2026-05-10-m13-containers-and-real-pilot-design.md
  • M13 implementation plan: docs/superpowers/plans/2026-05-10-m13-containers-and-real-pilot.md