ADR-0012: M13 Containers + Real-Tenant Beta + Commission-Triplet Absorption
Status: Accepted
Date: 2026-05-10
Amended: 2026-06-14 — co-located deploy hardening: per-service resource ceilings; LiveKit gated behind a voice compose profile (voice stays deferred to M14); the M13 stack additionally validated as a test environment on the shared Hetzner pilot box (the host running zol-rag) via scripts/deploy-on-pilot.sh. See Amendment (2026-06-14) below. (PR #35)
Context
M13 puts Ratiba in front of real customers at Aria Aura Spa (Adrian's wife's business in Nairobi). Two related design conversations (Beyond-Ratiba spec 2026-05-09 + Back-office integration spec 2026-05-10) decided that M13 should also absorb the commission-scaffolding triplet so the pilot generates real commission data from day one.
By the close of M12 (2026-05-06), Ratiba's architecture phase is complete (11 ADRs, 30-page Docusaurus site, 1112 backend tests, 81 frontend Vitest tests). The remaining gap is operational: running as a production-shape container stack on a laptop with a real WhatsApp Business Account, real bookings, and real customers. Three design constraints drive M13's choices:
- Containerisation: The stack must run identically on Adrian's MacBook (dev), Hetzner CX21 (M14 production VPS), and CI. A single docker compose up must yield a working system.
- Stable webhook URL: Meta's WhatsApp Cloud API requires a durable HTTPS endpoint. ngrok disconnects every 2h on the free tier; ssh -R is fragile. A compose-native tunnelling solution is required.
- Commission data from day one: Aria Aura Spa tracks staff commissions manually today. The pilot is an opportunity to start capturing real commission data before any back-office integration exists, so M14/M17 has real data to validate against when Workpay and PesaPal integrations land.
Decision
Containerisation
- 3-file Docker Compose split: docker-compose.infra.yml (Postgres, Redis, Keycloak, MinIO), docker-compose.app.yml (backend, worker, frontend), docker-compose.dev.yml (overrides for local development: volume mounts, hot-reload, exposed debug ports). Production and CI use infra + app; local dev uses all three via --file chaining.
- Multi-stage Dockerfiles: Python 3.13 slim for backend + worker; Node 20 slim for frontend. Builder stage installs deps; runtime stage copies only the built artefacts. Non-root users (uid 1000) in all runtime stages.
- /healthz checks on backend and frontend containers; Compose depends_on: {condition: service_healthy} for ordered startup.
- Restart policies: restart: unless-stopped on all app containers; restart: always on infra containers.
- Worker reuses backend image: Worker container uses the same image as the backend with an overridden command: (python -m app.worker). No separate Dockerfile; one image build per CI run.
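The --file chaining above can be sketched as a small helper that assembles the docker compose argument list per environment. This is an illustrative sketch only: the three file names come from this ADR, while the environment labels (prod, ci, dev) and the helper compose_up_command are hypothetical and not part of the repository.

```python
from typing import Dict, List

# File names are taken from the ADR text.
INFRA = "docker-compose.infra.yml"
APP = "docker-compose.app.yml"
DEV = "docker-compose.dev.yml"

# ASSUMPTION: these environment labels are illustrative, not real config keys.
COMPOSE_FILES: Dict[str, List[str]] = {
    "prod": [INFRA, APP],
    "ci": [INFRA, APP],
    "dev": [INFRA, APP, DEV],
}


def compose_up_command(env: str) -> List[str]:
    """Return the argv that brings the stack up for the given environment."""
    try:
        files = COMPOSE_FILES[env]
    except KeyError:
        raise ValueError(f"unknown environment: {env!r}") from None
    argv = ["docker", "compose"]
    for path in files:
        # Compose merges files in order; later files override earlier ones.
        argv += ["--file", path]
    argv += ["up", "--detach"]
    return argv


if __name__ == "__main__":
    print(" ".join(compose_up_command("dev")))
```

Keeping the dev overrides in a third file means production and CI never see volume mounts or debug ports, because that file is simply left out of the chain.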
Stable Webhook URL
- Cloudflare Tunnel free tier as a compose service: A cloudflared container runs as a sidecar, pointing at the backend container's internal port. The tunnel provides a stable *.trycloudflare.com subdomain (or a named subdomain if Adrian registers one on the free plan) that survives Docker restarts and laptop sleeps.
- Replaces ngrok (disconnects every 2h on free tier) and ssh -R (fragile, requires SSH server on the receiving end).
- The CLOUDFLARE_TUNNEL_TOKEN env var is optional; if absent, cloudflared falls back to an ephemeral trycloudflare.com URL (acceptable for local-only dev, not for the real-tenant pilot).
WhatsApp + Daraja Credentials
- WhatsApp via spa Twilio long-code per ADR-0008: The spa's Twilio long-code is verified once with the WhatsApp Business Platform via SMS OTP, then claimed by the WABA. Production-shape from day one.
- Daraja sandbox-only in M13: The STK-push integration uses the Safaricom Daraja sandbox (sandbox.safaricom.co.ke). Real M-Pesa transactions require a Go-Live review; that flip = M14. All other payment flows (PesaPal card, cash) work in production mode.
- Voice + Instagram + Messenger deferred to M14: These channels have the longest credential chains (LiveKit SIP provisioning, Meta IG review, Meta Messenger page setup). They are not load-bearing for the first beta; deferring them keeps M13 tractable.
Observability
- Lean observability: docker logs <container> + structured JSON log tailing + daily WhatsApp digest via the M9 admin rail. No Loki, Datadog, or Sentry in M13.
- Loki/Datadog/Sentry deferred to M14+ if M13 observability proves insufficient at real-tenant scale (5-10 customers, ~50 bookings/week estimate).
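As one possible shape for the structured JSON log tailing, the sketch below reads log lines and keeps only records at or above a severity threshold. The field names level and message, the severity table, and the choice to surface non-JSON lines as errors are all assumptions made for illustration; the actual log schema is not specified in this ADR.

```python
import json
import sys
from typing import Iterable, Iterator

# ASSUMPTION: records carry "level" and "message" keys; the real schema may differ.
LEVELS = {"DEBUG": 10, "INFO": 20, "WARNING": 30, "ERROR": 40, "CRITICAL": 50}


def filter_logs(lines: Iterable[str], min_level: str = "WARNING") -> Iterator[dict]:
    """Yield parsed log records whose level is at or above min_level."""
    threshold = LEVELS[min_level]
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            # Design choice of this sketch: non-JSON output (for example a
            # stack trace) is surfaced as an error rather than dropped.
            record = {"level": "ERROR", "message": line}
        if not isinstance(record, dict):
            record = {"level": "INFO", "message": str(record)}
        level = str(record.get("level", "INFO")).upper()
        # Unknown level names are treated as INFO.
        if LEVELS.get(level, LEVELS["INFO"]) >= threshold:
            yield record


if __name__ == "__main__":
    for rec in filter_logs(sys.stdin):
        print(f"{rec.get('level')}: {rec.get('message')}")
```

It could be fed from a pipe such as docker logs --follow backend 2>&1, where backend is a placeholder container name.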
Beta Cohort
- 5-10 trusted-customer beta cohort: Adrian's wife selects the first cohort from her existing client base. Onboarding = a bilingual opt-in WhatsApp message (English + Swahili) explaining the AI assistant, with ACHA/STOP/OUT exit keywords. In-flight bookings at opt-in time are honoured manually.
- Cohort size capped at 10 for M13 to bound the support surface while the operator (Adrian) is the only support tier.
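A minimal sketch of how the ACHA/STOP/OUT exit keywords might be recognised is shown below. The function name is_opt_out and the matching policy (the whole message must be a keyword, ignoring case and trailing punctuation) are assumptions; the ADR names the keywords but not the matching rules.

```python
# Keywords come from the ADR; everything else here is illustrative.
EXIT_KEYWORDS = frozenset({"ACHA", "STOP", "OUT"})


def is_opt_out(message: str) -> bool:
    """Return True when the entire message is one of the exit keywords.

    ASSUMPTION: only whole-message matches count, so an ordinary sentence
    that happens to contain "out" or "stop" does not end the conversation.
    """
    normalised = message.strip().strip(".!?,").strip().upper()
    return normalised in EXIT_KEYWORDS


if __name__ == "__main__":
    for text in ["stop", "Acha!", "Can I go out tomorrow?"]:
        print(repr(text), is_opt_out(text))
```

Whole-message matching is the conservative choice for a booking assistant, since customers routinely write sentences such as "I will be out of town" that should not unsubscribe them.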
Commission-Triplet Absorption
M13 absorbs the commission-scaffolding triplet from the Beyond-Ratiba and back-office integration design discussions:
- Extend staff with employment-attrs columns (Workpay-authoritative-but-Ratiba-cached): kra_pin, nssf_number, shif_number, hire_date, employment_type (enum: permanent/casual/commission_only/apprentice), base_pay_kes_minor. All nullable. kra_pin has a regex CHECK constraint (^[AP][0-9]{9}[A-Z]$). employment_type has an enum CHECK constraint. Migration: 0009_m13_staff_employment_attrs.
- Extend appointments with rendering-actuals α-option columns: actual_start_at, actual_end_at (TIMESTAMP WITH TIME ZONE, nullable), tip_amount (NUMERIC(10,2), nullable), staff_notes (TEXT, nullable). α option = extend existing table (vs β = separate service_renderings table). v1 default per Beyond-Ratiba spec §5.4; β migration is mechanical if/when needed. Migration: 0010_m13_appointments_rendering_actuals.
- New commission_rules table: Staff-id XOR service-category (enforced by CHECK (staff_id IS NOT NULL) <> (service_category IS NOT NULL)); rule_type ∈ {pct, fixed, tiered}; value JSONB (percentage, minor-unit amount, or tier array); valid_from/valid_to for historical rate changes. Partial index on staff_id WHERE valid_to IS NULL for fast active-rule lookups. Migration: 0011_m13_commission_rules.
- New commission_period table: Frozen monthly commission per staff. UNIQUE on (staff_id, period_start, period_end, type) for CommissionEngine idempotency (re-running for the same period is a no-op). type ∈ {close, amendment}; amendments carry an amendment_of_id FK (enforced by CHECK (type = 'amendment') = (amendment_of_id IS NOT NULL)). Columns: commission_kes_minor, tip_kes_minor, hours_worked, services_rendered_count, frozen_at, vendor_period_id (for future Workpay sync). Migration: 0012_m13_commission_period.
- CommissionEngine service: app/services/commission_engine.py reads commission_rules, aggregates completed appointments in the period, and writes a commission_period row. Fallback: if actual_end_at is NULL, it uses end_at. Deferred to M13 Wave 3 (schema lands now; engine lands after pilot go-live tooling is in place).
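The engine flow described above (XOR rule scoping, the actual_end_at fallback, and idempotent period close) can be sketched in plain Python as follows. This is not the contents of app/services/commission_engine.py, which lands in Wave 3. Several details are assumptions labelled in the code: the price_kes_minor and status fields, whole-percent pct values, staff rules taking precedence over category rules, half-open periods, and an in-memory dict standing in for the commission_period table. The tiered rule type is left out because the tier array shape is not specified here.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, Optional, Sequence, Tuple


@dataclass(frozen=True)
class Rule:
    rule_type: str                          # "pct" or "fixed"; "tiered" omitted
    value: int                              # ASSUMED: whole percent, or KES minor units
    staff_id: Optional[int] = None
    service_category: Optional[str] = None

    def __post_init__(self) -> None:
        # Mirrors the CHECK: exactly one of staff_id / service_category is set.
        if (self.staff_id is not None) == (self.service_category is not None):
            raise ValueError("rule needs staff_id XOR service_category")


@dataclass(frozen=True)
class Appointment:
    staff_id: int
    service_category: str
    price_kes_minor: int                    # hypothetical column
    end_at: datetime
    actual_end_at: Optional[datetime] = None
    status: str = "completed"               # hypothetical status value


def effective_end(appt: Appointment) -> datetime:
    """Fallback from the ADR: use end_at when actual_end_at is NULL."""
    return appt.actual_end_at if appt.actual_end_at is not None else appt.end_at


def commission_for(appt: Appointment, rules: Sequence[Rule]) -> int:
    # ASSUMED precedence: a staff-specific rule beats a category rule.
    rule = next((r for r in rules if r.staff_id == appt.staff_id), None)
    if rule is None:
        rule = next((r for r in rules if r.service_category == appt.service_category), None)
    if rule is None:
        return 0
    if rule.rule_type == "pct":
        return appt.price_kes_minor * rule.value // 100
    if rule.rule_type == "fixed":
        return rule.value
    raise NotImplementedError(f"rule_type {rule.rule_type!r} not covered by this sketch")


PeriodKey = Tuple[int, datetime, datetime, str]


def close_period(store: Dict[PeriodKey, int], staff_id: int, start: datetime,
                 end: datetime, appts: Sequence[Appointment],
                 rules: Sequence[Rule]) -> int:
    """Write one 'close' total per (staff, period); re-running is a no-op."""
    key = (staff_id, start, end, "close")   # same columns as the UNIQUE constraint
    if key in store:
        return store[key]
    total = sum(
        commission_for(a, rules)
        for a in appts
        if a.staff_id == staff_id
        and a.status == "completed"
        and start <= effective_end(a) < end  # ASSUMED half-open period
    )
    store[key] = total
    return total
```

Against Postgres the early return would more likely be an INSERT guarded by the UNIQUE constraint; the dict lookup only illustrates the no-op-on-rerun behaviour.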
Calibration Extension
- Capture real customer transcripts (with explicit opt-in consent) into tests/eval/calibration/human_labelled.yaml with PII masking via the existing redact.py floor (per ADR-0004 §5.4). No new tooling; the capture process is manual in M13 (copy from Langfuse, mask, commit).
Methodology Codification
- Methodology v2.1.4 codification deferred to post-M13 single bundled PR upstream in s4u-methodology repo. The ~14 amendment candidates (4 from M9+1 + 5 from M10 + 5 from M11 + 5 from M12) are tracked in MEMORY.md; they land as a single coherent PR after the pilot stabilises.
Consequences
Positive
- Pilot generates real commission data from day one; Workpay/AccountingAdapter integration in M14/M17 has real data to validate against.
- Container shape ports cleanly to Hetzner CX21 in M14 — no re-architecting, just a VPS deploy target swap.
- Cloudflare Tunnel compose service is cheaper and more reliable than ngrok for long-running pilots; free-tier subdomain is stable across restarts.
- CommissionEngine absorption means the pilot operator can see accurate commission totals from week one without manual spreadsheet work.
Negative
- M13 task count grew from ~12 to ~13 with commission-triplet absorption. Manageable; the schema migrations are straightforward additive work.
- External reviews are the long pole on the M13 timeline: WABA review takes 1-7 business days, and Safaricom Daraja Go-Live is also slow. Both are external-dependency blockers that cannot be parallelised away.
- Lean observability means no fancy dashboards. That is acceptable at 5-10 customer scale, but the team must be comfortable with docker logs and manual monitoring for M13's duration.
- Daraja sandbox-only means the STK-push flow shows a prompt but no real debit. This is communicated to the beta cohort in the opt-in message.
Reversibility
- Rendering-actuals α option (extend appointments) → β option (separate service_renderings table) is a mechanical migration if/when needed (per Beyond-Ratiba spec §5.4). No application-level commitments to the α shape are made in M13 that would make the β migration non-trivial.
- The commission_period.vendor_period_id column is a nullable forward-compat hook for Workpay sync; it has no application-level semantics in M13.
- Cloudflare Tunnel can be swapped for a VPS-hosted reverse proxy (Caddy + Let's Encrypt) in M14 with a one-line compose change.
Amendment (2026-06-14) — Co-located deploy hardening
Trigger
M13's original framing (Context constraint 1) targeted two deploy surfaces: Adrian's MacBook for dev and a dedicated Hetzner CX21 for M14 production. A third surface is now wanted before M14: a Ratiba test environment co-located on the shared Hetzner pilot box that already runs zol-rag. This exercises the production-shape container stack on real remote infra (not a laptop) without provisioning the dedicated M14 VPS yet. Co-locating a second project on a live box surfaced two operational gaps this ADR did not cover.
Measured headroom on that box (2026-06-14): 8 vCPU / 31 GB RAM, ~24 GB available, load avg ~0.1; zol-rag's resident footprint ~6.5 GB. Ample — but the box has no swap, and one Ratiba service (LiveKit) uses host networking, which collides at the host-port level with the co-tenant stack.
Decisions
- Per-service resource ceilings. Every infra and app service gains deploy.resources.{limits,reservations} (Compose V2 honours these on docker compose up; limits are hard cgroup caps, reservations advisory outside swarm). Default-profile hard caps sum to ~7.5 G (postgres 2G; backend 1.5G; keycloak/worker/frontend 1G each; redis/minio 512M each). With no swap on the shared box, these caps, not the kernel, are the safety net that stops a runaway container taking down the co-tenant zol-rag stack.
- LiveKit gated behind profiles: ["voice"]. Voice is already deferred to M14 (see Decision → WhatsApp + Daraja Credentials), and nothing depends_on livekit, so gating it costs the M13 stack nothing. This mirrors the existing cloudflared tunnel-profile pattern. The default up brings up 7 services and omits livekit (the only host-networked, host-port-binding service), which removes the port-collision surface against zol-rag's livekit entirely (rather than relying on the ranges merely not overlapping). --profile voice restores it; its pins (7890/7891/52000-52050) are already disjoint from zol-rag's (7880/7881/51000-51050). Bringing real voice up on the pilot still requires setting node_ip in docker/livekit.yaml to the host public IP and supplying real keys (it ships loopback dev config); that is an M14 task, not unblocked here.
- scripts/deploy-on-pilot.sh. Parameterized remote bring-up mirroring zol-rag's build-on-pilot idiom (build the image on the server, never transfer tarballs). Atomic mkdir deploy lock; SSH + remote-checkout + backend/.env pre-flight gates; optional CI-green gate (REQUIRE_CI=1); captures the prior SHA for a rollback hint; health-polls backend /healthz (8010) and the frontend (3010). PILOT_HOST / RATIBA_DIR / BRANCH override the target; DEPLOY_VOICE=1 opts livekit in with a port-collision warning.
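The two numeric claims in these decisions, the ~7.5 G cap total and the disjoint LiveKit port pins, can be checked with a short sketch. It assumes that "keycloak/worker/frontend 1G" and "redis/minio 512M" mean that amount per service, and that G and M are binary units (GiB, MiB); the helper names are hypothetical.

```python
from typing import Dict, List, Tuple

# Hard caps in MiB as listed in the amendment (ASSUMED: per-service, binary units).
CAPS_MIB: Dict[str, int] = {
    "postgres": 2048,
    "backend": 1536,
    "keycloak": 1024,
    "worker": 1024,
    "frontend": 1024,
    "redis": 512,
    "minio": 512,
}

PortRange = Tuple[int, int]  # inclusive (low, high)

# Port pins quoted in the amendment.
RATIBA_LIVEKIT: List[PortRange] = [(7890, 7890), (7891, 7891), (52000, 52050)]
ZOLRAG_LIVEKIT: List[PortRange] = [(7880, 7880), (7881, 7881), (51000, 51050)]


def total_caps_gib(caps: Dict[str, int]) -> float:
    """Sum the per-service hard caps and convert MiB to GiB."""
    return sum(caps.values()) / 1024


def ranges_overlap(a: PortRange, b: PortRange) -> bool:
    """Two inclusive ranges overlap when each starts before the other ends."""
    return a[0] <= b[1] and b[0] <= a[1]


def disjoint(xs: List[PortRange], ys: List[PortRange]) -> bool:
    """True when no range in xs shares a port with any range in ys."""
    return not any(ranges_overlap(x, y) for x in xs for y in ys)


if __name__ == "__main__":
    print(len(CAPS_MIB), "services,", total_caps_gib(CAPS_MIB), "GiB of hard caps")
    print("livekit pins disjoint:", disjoint(RATIBA_LIVEKIT, ZOLRAG_LIVEKIT))
```

Under those assumptions the seven default-profile services total exactly 7.5 GiB, which sits well inside the ~24 GB measured as available on the shared box.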
Reversibility
- Purely additive. The laptop path is unchanged: infra + app (+ dev) still runs identically locally, and the resource caps are generous enough not to bind dev. The co-located deploy is env-gated (PILOT_HOST, DEPLOY_VOICE); the dedicated M14 Hetzner CX21 path (Consequences → Positive) is untouched.
- Reverting is two deletions (the deploy.resources blocks and the livekit profiles: key) plus removing one script; no schema, API, or data commitments are introduced.
References
- ADR-0008: WhatsApp Cloud API direct (Twilio long-code, WABA provisioning)
- ADR-0004: Testing strategy + calibration cadence (human_labelled.yaml, redact.py PII masking floor, Cohen's kappa ≥ 0.7 target)
- ADR-0007: Reaper + STK timeout (daily 3 AM EAT consolidated reaper, payment_callbacks_unrouted dead-letter table)
- Beyond-Ratiba spec: docs/superpowers/specs/2026-05-09-aria-aura-spa-substrate-design.md
- Back-office integration spec: docs/superpowers/specs/2026-05-10-ratiba-back-office-integration-design.md
- M13 design doc: docs/plans/2026-05-10-m13-containers-and-real-pilot-design.md
- M13 implementation plan: docs/superpowers/plans/2026-05-10-m13-containers-and-real-pilot.md