
System overview

What this page covers

This page is the C4 Component-level view of Ratiba's backend. The sibling Architecture overview covers Levels 1 and 2 (Context and Container) — what Ratiba is, what it talks to, and what processes live in the docker compose stack. That is the right starting point for orientation.

This page goes one level deeper. It zooms into the FastAPI backend container and names the internal layers, the call edges between them, and the transport used at every external boundary. It also surfaces the two modules added after the initial architecture phase — app/services/knowledge.py (Phase 0 knowledge answers) and app/voice/* (voice full-duplex) — and shows how they slot into the existing layer structure without disturbing it.

Read it when you need to know which Python module owns a behaviour, or which protocol carries a particular interaction. The boundary table enumerates every external system Ratiba speaks to with auth scheme and failure mode per ADR. For the per-feature flows (booking happy path, cancel-with-refund, channel switch), keep going to How it works. For the auto-generated module dependency graph, see Component map.


C4 Container view, deeper

The container view in Architecture overview is intentionally shallow — seven processes, four data stores, arrow labels at the marketing level. The version below is the same shape with transport annotations: HTTP/REST for sync external calls, WebSocket for the admin chat surface, Postgres LISTEN/NOTIFY for cross-process events, Redis pub/sub for FSM hot state, gRPC for LiveKit.

Three notes on transport. First, the worker runs in the same Python process as the backend (./start-server.sh runs both under one supervisor); when traffic warrants extracting it, the LISTEN/NOTIFY boundary is already where the seam lies. Second, every Meta webhook (WhatsApp, Instagram, Messenger) is HMAC-verified against the project-level secret for its channel (WHATSAPP_APP_SECRET, INSTAGRAM_APP_SECRET, or MESSENGER_APP_SECRET, per ADR-0008 + ADR-0009 D9) before it reaches the channel adapter; the per-tenant access tokens stored on public.tenants are used only outbound. Third, MinIO has been part of the stack since ADR-0012: catalog_importer.py writes uploaded price-list photos there, and vision_extractor.py reads them back for OCR.
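As an illustration of the second note, here is a minimal sketch of a perimeter HMAC check. It assumes the signature arrives in Meta's X-Hub-Signature-256 header as "sha256=" followed by a hex digest of the raw request body; the function name verify_meta_signature is hypothetical and is not taken from the Ratiba codebase.

```python
import hashlib
import hmac
from typing import Optional


def verify_meta_signature(
    app_secret: str, raw_body: bytes, signature_header: Optional[str]
) -> bool:
    """Return True only if the header equals HMAC-SHA256(app_secret, raw_body)."""
    prefix = "sha256="
    if not signature_header or not signature_header.startswith(prefix):
        return False
    expected = hmac.new(app_secret.encode(), raw_body, hashlib.sha256).hexdigest()
    provided = signature_header[len(prefix):]
    # compare_digest keeps the comparison constant-time.
    return hmac.compare_digest(expected.encode(), provided.encode())
```

The check has to run on the raw bytes of the request, before any JSON parsing, because re-serialising the body can change whitespace or key order and invalidate the digest.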


C4 Component view of backend

Inside the FastAPI process, nine layers carry the request from webhook to database. The diagram below traces the canonical inbound path: API → Channels → FSM → Services → Persistence, with the LLM router consumed as a side-link by both Services and FSM, and two parallel entry points — Admin (for admin WhatsApp + dashboard commands) and Voice (for per-call SIP sessions).

The Knowledge service (app/services/knowledge.py) is a leaf of the Services layer: it is called by the FSM dispatcher's inquiry branch and has no upward dependencies. The Voice layer (app/voice/) is a lateral peer of Channels — it bypasses the channel adapter protocol and dispatches directly into the FSM, using the same dispatch_inbound_message entry point.
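The "leaf with no upward dependencies" shape can be sketched as a fetcher passed into the inquiry branch. Everything below is an assumption made for illustration: the SnippetFetcher signature, the handle_inquiry function, the NO_ANSWER sentinel, and the in-memory rows that stand in for the Phase 0 SELECT.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List

# Hypothetical shape: (tenant_slug, question) -> matching snippets.
SnippetFetcher = Callable[[str, str], Awaitable[List[str]]]

# In-memory stand-in for the tenant's knowledge rows.
_FAKE_ROWS: Dict[str, List[str]] = {
    "salon-a": ["We open at 9am on weekdays.", "Parking is free behind the building."],
}


def _words(text: str) -> set:
    return {w.strip("?.,!").lower() for w in text.split()}


async def fetch_snippets(tenant: str, question: str) -> List[str]:
    """Phase 0 stand-in: keyword overlap instead of a real database read."""
    asked = _words(question)
    return [row for row in _FAKE_ROWS.get(tenant, []) if asked & _words(row)]


async def handle_inquiry(tenant: str, question: str, fetch: SnippetFetcher) -> str:
    """Inquiry branch: knows the fetcher only through its signature."""
    snippets = await fetch(tenant, question)
    return " ".join(snippets) if snippets else "NO_ANSWER"
```

Because the caller depends only on the callable's signature, a different retrieval strategy can be substituted by passing another function with the same shape.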

Four design properties that the shape encodes:

  1. FSM is the load-bearing centre. Every customer-facing turn — regardless of channel — transits app/orchestrator/booking_graph.py (or its cancel / reschedule siblings). Voice is not a special branch; it is one more caller of the same dispatcher.
  2. Knowledge is a leaf of Services. app/services/knowledge.py::fetch_snippets has no upward dependencies — it is a pure DB read injected by the dispatcher's inquiry branch. Upgrading from Phase 0 (SELECT) to Phase 1 (pgvector top-k) is a single-function swap with no caller changes. For the full behavioral description, see How it works → Knowledge answers.
  3. LLM router is the single funnel. Every model call — intent classifier, answer shaper, vision OCR, admin NL router, handoff summariser — goes through app/llm/router.py. Cost tracking (per ADR-0005 ContextVar ledger), prompt-version pinning, and provider failover happen in one place.
  4. Persistence is the only I/O layer. Every other layer goes through it — no raw asyncpg calls above app/persistence/. The per-tenant schema-routing (asyncio ContextVar → asyncpg pool selection) is encapsulated here per ADR-0002.
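The ContextVar-based routing named in the fourth property can be sketched as follows. This is a minimal illustration, not the code in app/persistence/: the names current_tenant, pool_for_current_tenant, and handle_request are hypothetical, and plain strings stand in for asyncpg pools so the example runs without a database.

```python
import asyncio
from contextvars import ContextVar
from typing import Dict, List

# Hypothetical variable; bound once per request at the edge of the system.
current_tenant: ContextVar[str] = ContextVar("current_tenant")

# Strings stand in for per-tenant asyncpg pools.
_POOLS: Dict[str, str] = {"acme": "pool<tenant_acme>", "zuri": "pool<tenant_zuri>"}


def pool_for_current_tenant() -> str:
    """Resolve the pool from ambient context, failing loudly if none is bound."""
    try:
        tenant = current_tenant.get()
    except LookupError:
        raise RuntimeError("no tenant bound for this request") from None
    return _POOLS[tenant]


async def handle_request(tenant: str) -> str:
    token = current_tenant.set(tenant)
    try:
        await asyncio.sleep(0)  # yield so concurrent requests interleave
        return pool_for_current_tenant()
    finally:
        current_tenant.reset(token)


async def main() -> List[str]:
    # gather runs each coroutine in its own task, each with its own context copy.
    return await asyncio.gather(handle_request("acme"), handle_request("zuri"))
```

Each asyncio task gets a copy of the context, so concurrent requests for different tenants do not see each other's binding even though no tenant argument is threaded through the call stack.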

External boundaries enumerated

| External system | Protocol | Auth | When | Failure mode |
|---|---|---|---|---|
| Anthropic API | HTTPS REST | x-api-key | LLM completions, vision OCR, fluency metric | Adapter retry once; cost ContextVar updates per call (ADR-0005) |
| OpenAI API | HTTPS REST | Bearer token | GPT-4.1 mini for narrow tasks (intent classifier default) | Same retry-once + cost ContextVar |
| Meta Cloud API (WhatsApp) | HTTPS REST + webhook | Per-tenant whatsapp_access_token | WhatsApp inbound webhook + outbound messages.send | Webhook HMAC-verified via WHATSAPP_APP_SECRET; 5xx retry on send (ADR-0008) |
| Meta Graph API (Instagram + Messenger) | HTTPS REST + webhook | Per-tenant token | Tier-2 channel inbound + outbound | HMAC verify via INSTAGRAM_APP_SECRET / MESSENGER_APP_SECRET; reactive-only outside 24h window (ADR-0009 D9) |
| Daraja (Safaricom M-Pesa) | HTTPS REST + webhook | OAuth2 client-credentials | STK push at booking confirm + stkpushquery poll + result callback | One-shot poll at t=60s; late callbacks dead-letter to public.payment_callbacks_unrouted; 90s hard cap on voice calls (ADR-0007) |
| PesaPal | HTTPS REST + webhook | API key | Card payments only, never M-Pesa | 8 min nudge / 30 min abandon timers driven by worker (ADR-0007) |
| Africa's Talking | HTTPS REST | api_key header | SMS reminder fallback (NotificationSink, not a Channel) | Fire-and-forget; ~$0.003/msg cheaper than out-of-window WhatsApp utility (ADR-0009 D6) |
| LiveKit | gRPC + SIP | API key + secret | Voice rooms + SIP bridge for inbound calls | Reconnect on stream drop; AgentSession plugin path handles STT/TTS provider failover |
| Deepgram Nova-3 | WebSocket (streaming) | API key | Streaming STT during voice calls; Swahili + English language ID | LiveKit Agents SDK handles reconnect; interim transcripts drive barge-in + listening-ack; final transcripts drive FSM dispatch |
| ElevenLabs Multilingual v2 | WebSocket (streaming) | API key | Streaming TTS during voice calls | safe_say in barge_in.py catches the closing-session race (call already ended); WPM-adaptive speed multiplier applied per VoiceConfig |
| Keycloak | HTTPS REST + OIDC | Admin client-credentials | Admin auth + per-tenant realm management | Admin-only; customer flows have zero Keycloak dependency |
| MinIO | S3-compatible HTTPS | Access key + secret key | Catalog asset storage (price-list photos, CSV uploads) | Connection on Backend startup; uploads gated behind catalog_importer.py staging flow |

Twelve rows, each with its own auth arrangement. The pattern: webhooks are always HMAC-verified at the perimeter; outbound calls always carry either per-tenant credentials (M-Pesa, Meta) or project-level keys (Anthropic, OpenAI, AT, LiveKit, Deepgram, ElevenLabs, Keycloak admin, MinIO); and customer-facing flows have zero Keycloak dependency by design (ADR-0002 + ADR-0009).


Transport-level summary

Six transports in total. HTTP/REST is the default for sync external calls (Anthropic, OpenAI, Daraja, PesaPal, AT, Meta send-side, MinIO). Webhooks carry inbound traffic from Meta, Daraja, and PesaPal, signature-verified before any tenant resolution. WebSocket carries the admin chat surface (Next.js dashboard to FastAPI) and the voice STT/TTS sub-providers (Deepgram + ElevenLabs, managed inside LiveKit Agents). Postgres LISTEN/NOTIFY is the cross-process bus for payment and admin state events: payment callbacks, scheduled reminders, and admin hand-back all flow through it (an architectural win lifted from the M7 voice work; see Data flow for the full sequence). Redis pub/sub carries FSM hot-state updates within the backend; SETNX mutexes serialise per-thread turns. gRPC is LiveKit-only (the Agents SDK connection from app/voice/agent.py).
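The consuming side of a LISTEN/NOTIFY bus can be sketched as a callback that decodes a JSON payload and routes it by event type. The event names, payload fields, and the handled list below are assumptions for illustration; only the callback's argument order (connection, pid, channel, payload) follows asyncpg's add_listener contract.

```python
import json
from typing import Any, Callable, Dict, List

Handler = Callable[[Dict[str, Any]], None]
_HANDLERS: Dict[str, Handler] = {}
handled: List[str] = []  # records what ran, for demonstration only


def on(event_type: str) -> Callable[[Handler], Handler]:
    """Register a handler for one event type."""
    def register(fn: Handler) -> Handler:
        _HANDLERS[event_type] = fn
        return fn
    return register


@on("payment.callback")  # hypothetical event name
def _payment(event: Dict[str, Any]) -> None:
    handled.append(f"payment:{event['booking_id']}")


@on("admin.handback")  # hypothetical event name
def _handback(event: Dict[str, Any]) -> None:
    handled.append(f"handback:{event['thread_id']}")


def on_notify(connection: Any, pid: int, channel: str, payload: str) -> None:
    """Listener callback: decode the payload, then dispatch on its type."""
    try:
        event = json.loads(payload)
    except json.JSONDecodeError:
        return  # malformed payloads are dropped rather than crashing the listener
    if not isinstance(event, dict):
        return
    event_type = event.get("type")
    handler = _HANDLERS.get(event_type) if isinstance(event_type, str) else None
    if handler is not None:
        handler(event)
```

With asyncpg, registration would look like `await conn.add_listener("ratiba_events", on_notify)`, where the channel name is made up here. In a default Postgres configuration a NOTIFY payload must be shorter than 8000 bytes, so a common choice is to send an identifier and let the consumer read the full row.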

The voice layer adds one notable transport path: LiveKit → Deepgram WebSocket for streaming STT (interim and final transcripts arrive back in the agent handler) and LiveKit → ElevenLabs WebSocket for streaming TTS (sentence events from the VoiceStreamEvent seam are fed incrementally). Both WebSocket sessions are managed by the LiveKit Agents SDK — the backend itself never holds these connections directly.
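Feeding TTS incrementally usually means cutting a token stream into sentences as soon as each one completes. The sketch below shows that idea in isolation; it does not reproduce the VoiceStreamEvent seam, whose shape is not described on this page, and the function names are hypothetical.

```python
import asyncio
import re
from typing import AsyncIterator, List

# Split after ., ! or ? once whitespace follows.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


async def sentences(tokens: AsyncIterator[str]) -> AsyncIterator[str]:
    """Yield each sentence as soon as its terminator and a following space arrive."""
    buffer = ""
    async for token in tokens:
        buffer += token
        parts = _SENTENCE_END.split(buffer)
        for done in parts[:-1]:  # everything but the tail is complete
            if done:
                yield done
        buffer = parts[-1]
    if buffer.strip():
        yield buffer.strip()  # flush whatever remains when the stream ends


async def _fake_tokens(chunks: List[str]) -> AsyncIterator[str]:
    for chunk in chunks:
        yield chunk


async def collect(chunks: List[str]) -> List[str]:
    return [s async for s in sentences(_fake_tokens(chunks))]
```

Emitting at sentence boundaries lets synthesis of the first sentence start while later text is still being generated, which is the usual motivation for this kind of seam.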