System overview

What this page covers

This page is the C4 Component-level view of Ratiba's backend. The sibling Architecture overview covers Levels 1 and 2 (Context and Container) — what Ratiba is, what it talks to, and what processes live in the docker compose stack. That is the right starting point for orientation.

This page goes one level deeper. It zooms into the FastAPI backend container and names the internal layers, the call edges between them, and the transport used at every external boundary. It also surfaces the two modules added after the initial architecture phase — app/services/knowledge.py (Phase 0 knowledge answers) and app/voice/* (voice full-duplex) — and shows how they slot into the existing layer structure without disturbing it.

Read it when you need to know which Python module owns a behaviour, or which protocol carries a particular interaction. The boundary table enumerates every external system Ratiba speaks to, along with its auth scheme, its failure mode, and the ADR that governs it. For the per-feature flows (booking happy path, cancel-with-refund, channel switch), continue to How it works. For the auto-generated module dependency graph, see Component map.

C4 Container view, deeper

The container view in Architecture overview is intentionally shallow — seven processes, four data stores, arrow labels at the marketing level. The version below is the same shape with transport annotations: HTTP/REST for sync external calls, WebSocket for the admin chat surface, Postgres LISTEN/NOTIFY for cross-process events, Redis pub/sub for FSM hot state, gRPC for LiveKit.

Three notes on transport. First, the worker is the same Python process as the backend (./start-server.sh runs both under one supervisor); when traffic warrants extracting it, the LISTEN/NOTIFY boundary is already where the seam lies. Second, every Meta webhook (WhatsApp, Instagram, Messenger) is HMAC-verified against the project-level app secret for its channel (WHATSAPP_APP_SECRET, INSTAGRAM_APP_SECRET, or MESSENGER_APP_SECRET, per ADR-0008 and ADR-0009 D9) before it reaches the channel adapter; the per-tenant access tokens stored on public.tenants are used only for outbound calls. Third, MinIO has been part of the stack since ADR-0012: catalog_importer.py writes uploaded price-list photos there, and vision_extractor.py reads them back for OCR.
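As an illustration of the perimeter check described above, here is a minimal sketch of HMAC verification for a Meta webhook. Meta sends the signature in the X-Hub-Signature-256 header as "sha256=" followed by the hex HMAC-SHA256 of the raw request body, keyed with the app secret. The function name verify_meta_signature is hypothetical and is not taken from Ratiba's codebase.

```python
import hashlib
import hmac
from typing import Optional


def verify_meta_signature(
    raw_body: bytes, signature_header: Optional[str], app_secret: str
) -> bool:
    """Return True only if the header equals HMAC-SHA256(app_secret, raw_body)."""
    prefix = "sha256="
    if not signature_header or not signature_header.startswith(prefix):
        return False
    expected = hmac.new(
        app_secret.encode("utf-8"), raw_body, hashlib.sha256
    ).hexdigest()
    received = signature_header[len(prefix):]
    # compare_digest runs in constant time, so the comparison does not
    # leak how many leading characters matched.
    return hmac.compare_digest(expected.encode("utf-8"), received.encode("utf-8"))
```

The check has to run on the raw bytes of the request, before any JSON parsing, because re-serialising the payload can change the bytes and invalidate the signature.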

C4 Component view of `backend`

Inside the FastAPI process, nine layers carry the request from webhook to database. The diagram below traces the canonical inbound path: API → Channels → FSM → Services → Persistence, with the LLM router consumed as a side-link by both Services and FSM, and two parallel entry points — Admin (for admin WhatsApp + dashboard commands) and Voice (for per-call SIP sessions).

The Knowledge service (app/services/knowledge.py) is a leaf of the Services layer: it is called by the FSM dispatcher's inquiry branch and has no upward dependencies. The Voice layer (app/voice/) is a lateral peer of Channels — it bypasses the channel adapter protocol and dispatches directly into the FSM, using the same dispatch_inbound_message entry point.
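The two relationships above (Voice as one more caller of the dispatcher, Knowledge as an injected leaf) can be sketched roughly as follows. Only the names dispatch_inbound_message and fetch_snippets come from this page; the signatures, the InboundMessage type, the question-mark heuristic, and the return values are assumptions made for illustration.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable, List

# Hypothetical signature: (tenant_id, query) -> snippets.
SnippetFetcher = Callable[[str, str], Awaitable[List[str]]]


@dataclass
class InboundMessage:
    tenant_id: str
    source: str  # e.g. "whatsapp" or "voice"; routing below ignores it
    text: str


async def dispatch_inbound_message(
    msg: InboundMessage, fetch_snippets: SnippetFetcher
) -> str:
    """Single entry point: the caller's transport does not change the routing."""
    if msg.text.rstrip().endswith("?"):  # stand-in for the real intent classifier
        snippets = await fetch_snippets(msg.tenant_id, msg.text)
        return snippets[0] if snippets else "handoff"
    return "booking_flow"


async def phase0_fetch(tenant_id: str, query: str) -> List[str]:
    """Stand-in for the Phase 0 SELECT; a top-k search could keep this signature."""
    table = {"salon-a": ["We open at 9am."]}
    return table.get(tenant_id, [])
```

Because the fetcher is passed in rather than imported by the dispatcher, replacing phase0_fetch with another coroutine of the same signature leaves every caller unchanged.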

Four design properties that the shape encodes:

FSM is the load-bearing centre. Every customer-facing turn — regardless of channel — transits app/orchestrator/booking_graph.py (or its cancel / reschedule siblings). Voice is not a special branch; it is one more caller of the same dispatcher.
Knowledge is a leaf of Services. app/services/knowledge.py::fetch_snippets has no upward dependencies — it is a pure DB read injected by the dispatcher's inquiry branch. Upgrading from Phase 0 (SELECT) to Phase 1 (pgvector top-k) is a single-function swap with no caller changes. For the full behavioral description, see How it works → Knowledge answers.
LLM router is the single funnel. Every model call — intent classifier, answer shaper, vision OCR, admin NL router, handoff summariser — goes through app/llm/router.py. Cost tracking (per ADR-0005 ContextVar ledger), prompt-version pinning, and provider failover happen in one place.
Persistence is the only I/O layer. Every other layer goes through it — no raw asyncpg calls above app/persistence/. The per-tenant schema-routing (asyncio ContextVar → asyncpg pool selection) is encapsulated here per ADR-0002.
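The ContextVar-based routing mentioned in the last property can be illustrated with the standard library alone. This is a sketch under assumptions: the names current_tenant, POOLS, and pool_for_current_tenant are hypothetical, and plain strings stand in for the asyncpg pools.

```python
import asyncio
from contextvars import ContextVar
from typing import Dict, List

# Hypothetical name; bound once per request at the edge of the backend.
current_tenant: ContextVar[str] = ContextVar("current_tenant")

# Stand-in for one connection pool per tenant schema.
POOLS: Dict[str, str] = {"salon_a": "pool<salon_a>", "salon_b": "pool<salon_b>"}


def pool_for_current_tenant() -> str:
    """Resolve the pool from the ContextVar; fail loudly if no tenant is bound."""
    try:
        tenant = current_tenant.get()
    except LookupError:
        raise RuntimeError("no tenant bound to this request context") from None
    return POOLS[tenant]


async def handle_request(tenant: str) -> str:
    token = current_tenant.set(tenant)
    try:
        await asyncio.sleep(0)  # yield so the two requests interleave
        return pool_for_current_tenant()
    finally:
        current_tenant.reset(token)


async def main() -> List[str]:
    # gather wraps each coroutine in a task, and each task gets its own
    # copy of the context, so the two tenants never see each other's value.
    return await asyncio.gather(handle_request("salon_a"), handle_request("salon_b"))
```

Raising when no tenant is bound, rather than falling back to a default pool, keeps a missing binding from silently reading another tenant's schema.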

External boundaries enumerated

External system	Protocol	Auth	When	Failure mode
Anthropic API	HTTPS REST	`x-api-key`	LLM completions, vision OCR, fluency metric	Adapter retry once; cost ContextVar updates per call (ADR-0005)
OpenAI API	HTTPS REST	Bearer token	GPT-4.1 mini for narrow tasks (intent classifier default)	Same retry-once + cost ContextVar
Meta Cloud API (WhatsApp)	HTTPS REST + webhook	Per-tenant `whatsapp_access_token`	WhatsApp inbound webhook + outbound `messages.send`	Webhook HMAC-verified via `WHATSAPP_APP_SECRET`; 5xx retry on send (ADR-0008)
Meta Graph API (Instagram + Messenger)	HTTPS REST + webhook	Per-tenant token	Tier-2 channel inbound + outbound	HMAC verify via `INSTAGRAM_APP_SECRET` / `MESSENGER_APP_SECRET`; reactive-only outside 24h window (ADR-0009 D9)
Daraja (Safaricom M-Pesa)	HTTPS REST + webhook	OAuth2 client-credentials	STK push at booking confirm + `stkpushquery` poll + result callback	One-shot poll at t=60s; late callbacks dead-letter to `public.payment_callbacks_unrouted`; 90s hard cap on voice calls (ADR-0007)
PesaPal	HTTPS REST + webhook	API key	Card payments only — never M-Pesa	8 min nudge / 30 min abandon timers driven by worker (ADR-0007)
Africa's Talking	HTTPS REST	`api_key` header	SMS reminder fallback (NotificationSink, not a Channel)	Fire-and-forget; ~$0.003/msg cheaper than out-of-window WhatsApp utility (ADR-0009 D6)
LiveKit	gRPC + SIP	API key + secret	Voice rooms + SIP bridge for inbound calls	Reconnect on stream drop; AgentSession plugin path handles STT/TTS provider failover
Deepgram Nova-3	WebSocket (streaming)	API key	Streaming STT during voice calls — Swahili + English language ID	LiveKit Agents SDK handles reconnect; interim transcripts drive barge-in + listening-ack; final transcripts drive FSM dispatch
ElevenLabs Multilingual v2	WebSocket (streaming)	API key	Streaming TTS during voice calls	`safe_say` in `barge_in.py` catches the closing-session race (call already ended); WPM-adaptive speed multiplier applied per `VoiceConfig`
Keycloak	HTTPS REST + OIDC	Admin client-credentials	Admin auth + per-tenant realm management	Admin-only — customer flows have zero Keycloak dependency
MinIO	S3-compatible HTTPS	Access key + secret key	Catalog asset storage (price-list photos, CSV uploads)	Connection on `Backend` startup; uploads gated behind `catalog_importer.py` staging flow

Twelve systems, each with its own credentials. The pattern: webhooks are always HMAC-verified at the perimeter; outbound calls always carry per-tenant credentials (M-Pesa, Meta) or project-level keys (Anthropic, OpenAI, AT, LiveKit, Deepgram, ElevenLabs, Keycloak admin, MinIO); customer-facing flows have zero Keycloak dependency by design (ADR-0002 + ADR-0009).
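The failure mode listed for the Anthropic and OpenAI rows (retry once, update a cost ContextVar per call) might look something like the sketch below. The names request_cost_usd and call_with_retry_once are hypothetical, the provider call is abstracted as a coroutine returning (text, cost_usd), and whether a failed attempt is billed is not specified on this page, so only successful calls are recorded here.

```python
import asyncio
from contextvars import ContextVar
from typing import Awaitable, Callable, Optional, Tuple

# Hypothetical ledger: accumulated USD cost for the current request.
request_cost_usd: ContextVar[float] = ContextVar("request_cost_usd", default=0.0)

# Hypothetical provider call shape: returns (completion_text, cost_usd).
ProviderCall = Callable[[], Awaitable[Tuple[str, float]]]


async def call_with_retry_once(call: ProviderCall) -> str:
    """Try the provider, retry exactly once on failure, record cost on success."""
    last_error: Optional[Exception] = None
    for _attempt in range(2):
        try:
            text, cost = await call()
        except Exception as exc:  # a real adapter would catch narrower errors
            last_error = exc
            continue
        request_cost_usd.set(request_cost_usd.get() + cost)
        return text
    assert last_error is not None
    raise last_error
```

Keeping the ledger in a ContextVar means concurrent requests accumulate cost independently without passing a ledger object through every call.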

Transport-level summary

Six transports in total. HTTP/REST is the default for sync external calls (Anthropic, OpenAI, Daraja, PesaPal, AT, Meta send-side, MinIO). Webhooks carry inbound traffic from Meta, Daraja, and PesaPal, and are signature-verified before any tenant resolution. WebSocket carries the admin chat surface (Next.js dashboard to FastAPI) and the voice STT/TTS sub-providers (Deepgram + ElevenLabs, managed inside LiveKit Agents). Postgres LISTEN/NOTIFY is the cross-process bus: payment callbacks, scheduled reminders, and admin hand-back all flow through it (an architectural win lifted from the M7 voice work; see Data flow for the full sequence). Redis pub/sub carries FSM hot-state updates within the backend; SETNX mutexes serialise per-thread turns. gRPC is LiveKit-only (the Agents SDK connection from app/voice/agent.py).
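To make the SETNX mutex concrete, here is a minimal sketch of a per-thread turn lock. It is written against a small store interface whose set(name, value, nx=..., ex=...) call mirrors redis-py's Redis.set. The in-memory store, the key prefix turn-lock:, and the function names are assumptions for illustration only.

```python
import secrets
from typing import Dict, Optional, Protocol


class KeyValueStore(Protocol):
    def set(self, name: str, value: str, nx: bool = False,
            ex: Optional[int] = None) -> Optional[bool]: ...
    def get(self, name: str) -> Optional[str]: ...
    def delete(self, name: str) -> int: ...


class InMemoryStore:
    """Test stand-in that mimics SET with NX; expiry (ex) is ignored here."""

    def __init__(self) -> None:
        self._data: Dict[str, str] = {}

    def set(self, name: str, value: str, nx: bool = False,
            ex: Optional[int] = None) -> Optional[bool]:
        if nx and name in self._data:
            return None  # key already held, same as Redis returning nil
        self._data[name] = value
        return True

    def get(self, name: str) -> Optional[str]:
        return self._data.get(name)

    def delete(self, name: str) -> int:
        return 1 if self._data.pop(name, None) is not None else 0


def acquire_turn(store: KeyValueStore, thread_id: str,
                 ttl_seconds: int = 30) -> Optional[str]:
    """Return an ownership token if the lock was taken, else None."""
    token = secrets.token_hex(8)
    if store.set(f"turn-lock:{thread_id}", token, nx=True, ex=ttl_seconds):
        return token
    return None


def release_turn(store: KeyValueStore, thread_id: str, token: str) -> bool:
    """Release only if we still own the lock."""
    key = f"turn-lock:{thread_id}"
    # Against real Redis this check-then-delete should be a single atomic
    # step (for example a Lua script); it is split here for readability.
    if store.get(key) == token:
        store.delete(key)
        return True
    return False
```

The TTL matters in a real deployment: if a worker dies mid-turn, the key expires and the conversation thread is not blocked indefinitely.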

The voice layer adds one notable transport path: LiveKit → Deepgram WebSocket for streaming STT (interim and final transcripts arrive back in the agent handler) and LiveKit → ElevenLabs WebSocket for streaming TTS (sentence events from the VoiceStreamEvent seam are fed incrementally). Both WebSocket sessions are managed by the LiveKit Agents SDK — the backend itself never holds these connections directly.
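Feeding TTS incrementally implies that model output is cut into sentences before it is spoken. The sketch below shows one simple way to do that with an async generator. It is not the VoiceStreamEvent implementation; the function name and the punctuation rule are assumptions, and the rule is naive (a token ending in "3." would be treated as a sentence end).

```python
import asyncio
from typing import AsyncIterator

SENTENCE_END = (".", "?", "!")


async def sentences(tokens: AsyncIterator[str]) -> AsyncIterator[str]:
    """Buffer streamed text and yield each sentence as soon as it is complete."""
    buffer = ""
    async for token in tokens:
        buffer += token
        if buffer.rstrip().endswith(SENTENCE_END):
            yield buffer.strip()
            buffer = ""
    # Flush whatever is left when the stream ends without punctuation.
    if buffer.strip():
        yield buffer.strip()
```

Emitting at sentence boundaries lets speech start before the full reply has been generated, at the cost of occasionally splitting on an abbreviation or a decimal.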

Cross-links

Architecture overview — Level 1 + 2 context (start here if you're new).
Schema evolution — per-tenant DDL + Alembic migration order + public registry shape.
Component map — auto-generated module dependency graph + undocumented-modules report. The exhaustive app/ folder tour lives there.
Data flow — request lifecycle traced through the backend layers, including payment callback and admin reply flows.
How it works → Channel substrate — inbound message lifecycle, per channel.
How it works → Conversation FSM — booking, cancel, reschedule state machines.
How it works → Identity and tenancy — phone-only deterministic merge.
How it works → Payments — Daraja STK + PesaPal cards path; 90s voice hard cap.
How it works → Cross-sell — same-staff / same-slot offers post-confirm.
How it works → Personality dials — eight curated tenant dials; prompt-cache invariant.
How it works → Catalog onboarding — CSV / vision OCR / manual entry; MinIO asset storage.
How it works → Admin orchestrator — dual-channel admin rail.
How it works → Knowledge answers — Phase 0 "no-RAG RAG"; fetch_snippets seam; knowledge_overflow graduation trigger.
How it works → Voice conversation — full-duplex primitives; VoiceStreamEvent seam; WPM-adaptive speed.
ADR-0001, ADR-0002, ADR-0003, ADR-0005, ADR-0006, ADR-0007, ADR-0008, ADR-0009, ADR-0010, ADR-0011, ADR-0012, ADR-0013.
