System overview
What this page covers
This page is the C4 Component-level view of Ratiba's backend. The sibling Architecture overview covers Levels 1 and 2 (Context and Container) — what Ratiba is, what it talks to, and what processes live in the docker compose stack. That is the right starting point for orientation.
This page goes one level deeper. It zooms into the FastAPI backend container and names the internal layers, the call edges between them, and the transport used at every external boundary. It also surfaces the two modules added after the initial architecture phase — app/services/knowledge.py (Phase 0 knowledge answers) and app/voice/* (voice full-duplex) — and shows how they slot into the existing layer structure without disturbing it.
Read it when you need to know which Python module owns a behaviour, or which protocol carries a particular interaction. The boundary table enumerates every external system Ratiba speaks to with auth scheme and failure mode per ADR. For the per-feature flows (booking happy path, cancel-with-refund, channel switch), keep going to How it works. For the auto-generated module dependency graph, see Component map.
C4 Container view, deeper
The container view in Architecture overview is intentionally shallow — seven processes, four data stores, arrow labels at the marketing level. The version below is the same shape with transport annotations: HTTP/REST for sync external calls, WebSocket for the admin chat surface, Postgres LISTEN/NOTIFY for cross-process events, Redis pub/sub for FSM hot state, gRPC for LiveKit.
Three notes on transport. First, the worker is the same Python process as the backend (./start-server.sh runs both under one supervisor); when traffic warrants extracting it, the LISTEN/NOTIFY boundary is already where the seam lies. Second, every Meta webhook (WhatsApp, Instagram, Messenger) is HMAC-verified against the project-level app secret for its channel (WHATSAPP_APP_SECRET, INSTAGRAM_APP_SECRET, or MESSENGER_APP_SECRET, per ADR-0008 + ADR-0009 D9) before it reaches the channel adapter; per-tenant access tokens stored on public.tenants are used only for outbound calls. Third, MinIO is present as of ADR-0012: catalog_importer.py writes uploaded price-list photos there, and vision_extractor.py reads them back for OCR.
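The perimeter check in the second note can be sketched as follows. Meta signs each webhook body with HMAC-SHA256 and sends the result in the X-Hub-Signature-256 header as `sha256=<hex digest>`. The function name `verify_meta_signature` is hypothetical and is not taken from Ratiba's codebase; the secret passed in would be whichever of the three app secrets matches the channel.

```python
import hashlib
import hmac


def verify_meta_signature(raw_body: bytes, signature_header: str, app_secret: str) -> bool:
    """Return True when the header equals 'sha256=' + HMAC-SHA256(app_secret, raw_body).

    Hypothetical helper: the name and signature are illustrative only.
    """
    prefix = "sha256="
    if not signature_header.startswith(prefix):
        return False
    received = signature_header[len(prefix):]
    expected = hmac.new(app_secret.encode(), raw_body, hashlib.sha256).hexdigest()
    # compare_digest runs in constant time, so the comparison does not leak
    # how many leading characters matched.
    return hmac.compare_digest(expected.encode(), received.encode())
```

The check must run over the raw request bytes, before any JSON parsing, because re-serialising the payload can change the byte sequence and invalidate the signature.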
C4 Component view of backend
Inside the FastAPI process, nine layers carry the request from webhook to database. The diagram below traces the canonical inbound path: API → Channels → FSM → Services → Persistence, with the LLM router consumed as a side-link by both Services and FSM, and two parallel entry points — Admin (for admin WhatsApp + dashboard commands) and Voice (for per-call SIP sessions).
The Knowledge service (app/services/knowledge.py) is a leaf of the Services layer: it is called by the FSM dispatcher's inquiry branch and has no upward dependencies. The Voice layer (app/voice/) is a lateral peer of Channels — it bypasses the channel adapter protocol and dispatches directly into the FSM, using the same dispatch_inbound_message entry point.
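A minimal sketch of what "leaf with no upward dependencies" could look like in practice, assuming the inquiry branch receives the snippet fetcher as a plain callable. Every name here (`FetchSnippets`, `fetch_snippets_phase0`, `handle_inquiry`, the in-memory rows, the fallback reply) is a hypothetical stand-in; the real `fetch_snippets` signature is not shown on this page.

```python
import asyncio
from typing import Awaitable, Callable

# Hypothetical seam: (tenant_id, question) -> list of snippet strings.
FetchSnippets = Callable[[str, str], Awaitable[list[str]]]

# Stand-in for the tenant's knowledge rows; the real service reads from Postgres.
_FAKE_ROWS = {"salon-a": ["Open 9am-6pm Mon-Sat", "Parking behind the building"]}


async def fetch_snippets_phase0(tenant_id: str, question: str) -> list[str]:
    """Phase 0 stand-in: return every stored snippet, ignoring the question."""
    return list(_FAKE_ROWS.get(tenant_id, []))


async def handle_inquiry(
    tenant_id: str,
    question: str,
    fetch: FetchSnippets = fetch_snippets_phase0,
) -> str:
    """Inquiry branch: depends only on the callable, not on how snippets are found."""
    snippets = await fetch(tenant_id, question)
    if not snippets:
        return "I'll check with the team and get back to you."
    return " | ".join(snippets)
```

Because the caller only sees the callable, a later retrieval strategy can be passed as `fetch=` without touching `handle_inquiry`.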
Four design properties that the shape encodes:
- FSM is the load-bearing centre. Every customer-facing turn, regardless of channel, transits app/orchestrator/booking_graph.py (or its cancel / reschedule siblings). Voice is not a special branch; it is one more caller of the same dispatcher.
- Knowledge is a leaf of Services. app/services/knowledge.py::fetch_snippets has no upward dependencies: it is a pure DB read injected by the dispatcher's inquiry branch. Upgrading from Phase 0 (SELECT) to Phase 1 (pgvector top-k) is a single-function swap with no caller changes. For the full behavioral description, see How it works → Knowledge answers.
- LLM router is the single funnel. Every model call (intent classifier, answer shaper, vision OCR, admin NL router, handoff summariser) goes through app/llm/router.py. Cost tracking (per ADR-0005 ContextVar ledger), prompt-version pinning, and provider failover happen in one place.
- Persistence is the only I/O layer. Every other layer goes through it: no raw asyncpg calls above app/persistence/. The per-tenant schema-routing (asyncio ContextVar → asyncpg pool selection) is encapsulated here per ADR-0002.
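The ContextVar-based routing mentioned in the last property can be illustrated with the standard library alone. This is a sketch under stated assumptions: the variable name `current_tenant`, the `tenant_<id>` schema naming, and both helper functions are hypothetical, and the real layer selects an asyncpg pool rather than returning a string.

```python
import asyncio
from contextvars import ContextVar

# Hypothetical name; the real variable would live under app/persistence/.
current_tenant: ContextVar[str] = ContextVar("current_tenant")


def schema_for_current_tenant() -> str:
    """Resolve a schema name from the ambient tenant.

    Raises LookupError if no tenant was set, which surfaces routing bugs early.
    The 'tenant_' prefix is an assumed naming scheme.
    """
    return f"tenant_{current_tenant.get()}"


async def handle_request(tenant_id: str) -> str:
    token = current_tenant.set(tenant_id)
    try:
        # Yield to the event loop so the two requests below interleave.
        await asyncio.sleep(0)
        return schema_for_current_tenant()
    finally:
        current_tenant.reset(token)


async def main() -> list[str]:
    # gather() wraps each coroutine in its own Task, and each Task runs in a
    # copy of the context, so the two tenants never see each other's value.
    return await asyncio.gather(handle_request("acme"), handle_request("zuri"))
```

Calling `asyncio.run(main())` returns one schema name per request, in call order. The value of this pattern is that layers above persistence never pass a tenant argument explicitly; they only need to run inside a task where the tenant has been set.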
External boundaries enumerated
| External system | Protocol | Auth | When | Failure mode |
|---|---|---|---|---|
| Anthropic API | HTTPS REST | x-api-key | LLM completions, vision OCR, fluency metric | Adapter retry once; cost ContextVar updates per call (ADR-0005) |
| OpenAI API | HTTPS REST | Bearer token | GPT-4.1 mini for narrow tasks (intent classifier default) | Same retry-once + cost ContextVar |
| Meta Cloud API (WhatsApp) | HTTPS REST + webhook | Per-tenant whatsapp_access_token | WhatsApp inbound webhook + outbound messages.send | Webhook HMAC-verified via WHATSAPP_APP_SECRET; 5xx retry on send (ADR-0008) |
| Meta Graph API (Instagram + Messenger) | HTTPS REST + webhook | Per-tenant token | Tier-2 channel inbound + outbound | HMAC verify via INSTAGRAM_APP_SECRET / MESSENGER_APP_SECRET; reactive-only outside 24h window (ADR-0009 D9) |
| Daraja (Safaricom M-Pesa) | HTTPS REST + webhook | OAuth2 client-credentials | STK push at booking confirm + stkpushquery poll + result callback | One-shot poll at t=60s; late callbacks dead-letter to public.payment_callbacks_unrouted; 90s hard cap on voice calls (ADR-0007) |
| PesaPal | HTTPS REST + webhook | API key | Card payments only — never M-Pesa | 8 min nudge / 30 min abandon timers driven by worker (ADR-0007) |
| Africa's Talking | HTTPS REST | api_key header | SMS reminder fallback (NotificationSink, not a Channel) | Fire-and-forget; ~$0.003/msg cheaper than out-of-window WhatsApp utility (ADR-0009 D6) |
| LiveKit | gRPC + SIP | API key + secret | Voice rooms + SIP bridge for inbound calls | Reconnect on stream drop; AgentSession plugin path handles STT/TTS provider failover |
| Deepgram Nova-3 | WebSocket (streaming) | API key | Streaming STT during voice calls — Swahili + English language ID | LiveKit Agents SDK handles reconnect; interim transcripts drive barge-in + listening-ack; final transcripts drive FSM dispatch |
| ElevenLabs Multilingual v2 | WebSocket (streaming) | API key | Streaming TTS during voice calls | safe_say in barge_in.py catches the closing-session race (call already ended); WPM-adaptive speed multiplier applied per VoiceConfig |
| Keycloak | HTTPS REST + OIDC | Admin client-credentials | Admin auth + per-tenant realm management | Admin-only — customer flows have zero Keycloak dependency |
| MinIO | S3-compatible HTTPS | Access key + secret key | Catalog asset storage (price-list photos, CSV uploads) | Connection on Backend startup; uploads gated behind catalog_importer.py staging flow |
Twelve systems, each with its own auth scheme. The pattern: webhooks are always HMAC-verified at the perimeter; outbound calls always carry per-tenant credentials (M-Pesa, Meta) or project-level keys (Anthropic, OpenAI, AT, LiveKit, Deepgram, ElevenLabs, Keycloak admin, MinIO); customer-facing flows have zero Keycloak dependency by design (ADR-0002 + ADR-0009).
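The "retry once" failure mode listed for the Anthropic and OpenAI rows can be sketched generically. The names `TransientError` and `call_with_one_retry` and the default delay are hypothetical; which errors the real adapters treat as retryable is not specified on this page.

```python
import asyncio
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical marker for failures worth retrying (timeouts, 5xx responses)."""


async def call_with_one_retry(
    call: Callable[[], Awaitable[T]],
    delay_s: float = 0.5,
) -> T:
    """Await call(); on a TransientError wait delay_s and try exactly once more.

    A second failure propagates to the caller, so the total is at most two attempts.
    """
    try:
        return await call()
    except TransientError:
        await asyncio.sleep(delay_s)
        return await call()
```

Capping at two attempts keeps worst-case latency bounded, which matters when the call sits inside a customer-facing turn.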
Transport-level summary
Six transports in total. HTTP/REST is the default for sync external calls (Anthropic, OpenAI, Daraja, PesaPal, AT, Meta send-side, MinIO). Webhooks carry inbound traffic from Meta, Daraja, and PesaPal, signature-verified before any tenant resolution. WebSocket carries the admin chat surface (Next.js dashboard to FastAPI) and the voice STT/TTS sub-providers (Deepgram + ElevenLabs, managed inside LiveKit Agents). Postgres LISTEN/NOTIFY is the cross-process bus: payment callbacks, scheduled reminders, and admin hand-back all flow through it (an architectural win lifted from the M7 voice work; see Data flow for the full sequence). Redis pub/sub carries FSM hot-state updates within the backend; SETNX mutexes serialise per-thread turns. gRPC is LiveKit-only (the Agents SDK connection from app/voice/agent.py).
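The SETNX mutex that serialises per-thread turns can be illustrated without a Redis server. The sketch below uses an in-memory stand-in (`FakeRedis`) that mimics the `set(name, value, nx=True, ex=...)` call shape of redis-py; the key format `turn:<thread_id>`, the 30-second TTL, and the helper names are assumptions, not Ratiba's actual values.

```python
import time
import uuid
from typing import Optional


class FakeRedis:
    """In-memory stand-in for the subset of redis-py used below (not thread-safe)."""

    def __init__(self) -> None:
        self._data: dict[str, tuple[str, float]] = {}

    def _live(self, name: str) -> Optional[str]:
        entry = self._data.get(name)
        if entry is not None and entry[1] <= time.monotonic():
            del self._data[name]  # lazily expire
            return None
        return entry[0] if entry else None

    def set(self, name: str, value: str, nx: bool = False, ex: Optional[float] = None):
        if nx and self._live(name) is not None:
            return None  # redis-py returns None when the NX condition fails
        expires = time.monotonic() + ex if ex is not None else float("inf")
        self._data[name] = (value, expires)
        return True

    def get(self, name: str) -> Optional[str]:
        return self._live(name)

    def delete(self, name: str) -> int:
        return 1 if self._data.pop(name, None) is not None else 0


def acquire_turn(client, thread_id: str, ttl_s: float = 30) -> Optional[str]:
    """Try to take the per-thread lock; return an ownership token or None."""
    token = uuid.uuid4().hex
    if client.set(f"turn:{thread_id}", token, nx=True, ex=ttl_s):
        return token
    return None


def release_turn(client, thread_id: str, token: str) -> bool:
    """Release only if we still own the lock.

    Against a real Redis this check-then-delete should be made atomic
    (for example with a Lua script); here it is two separate calls.
    """
    key = f"turn:{thread_id}"
    if client.get(key) == token:
        client.delete(key)
        return True
    return False
```

The TTL keeps a crashed handler from holding a thread's lock indefinitely, and the token prevents a slow handler from releasing a lock that has since expired and been taken by another turn.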
The voice layer adds one notable transport path: LiveKit → Deepgram WebSocket for streaming STT (interim and final transcripts arrive back in the agent handler) and LiveKit → ElevenLabs WebSocket for streaming TTS (sentence events from the VoiceStreamEvent seam are fed incrementally). Both WebSocket sessions are managed by the LiveKit Agents SDK — the backend itself never holds these connections directly.
Cross-links
- Architecture overview — Level 1 + 2 context (start here if you're new).
- Schema evolution — per-tenant DDL + Alembic migration order + public registry shape.
- Component map — auto-generated module dependency graph + undocumented-modules report. The exhaustive app/ folder tour lives there.
- Data flow — request lifecycle traced through the backend layers, including payment callback and admin reply flows.
- How it works → Channel substrate — inbound message lifecycle, per channel.
- How it works → Conversation FSM — booking, cancel, reschedule state machines.
- How it works → Identity and tenancy — phone-only deterministic merge.
- How it works → Payments — Daraja STK + PesaPal cards path; 90s voice hard cap.
- How it works → Cross-sell — same-staff / same-slot offers post-confirm.
- How it works → Personality dials — eight curated tenant dials; prompt-cache invariant.
- How it works → Catalog onboarding — CSV / vision OCR / manual entry; MinIO asset storage.
- How it works → Admin orchestrator — dual-channel admin rail.
- How it works → Knowledge answers — Phase 0 "no-RAG RAG"; fetch_snippets seam; knowledge_overflow graduation trigger.
- How it works → Voice conversation — full-duplex primitives; VoiceStreamEvent seam; WPM-adaptive speed.
- ADR-0001, ADR-0002, ADR-0003, ADR-0005, ADR-0006, ADR-0007, ADR-0008, ADR-0009, ADR-0010, ADR-0011, ADR-0012, ADR-0013.