Architecture overview

Ratiba is a single FastAPI process plus a Next.js dashboard, sitting between five customer channels (WhatsApp, voice, web widget, Instagram DM, Messenger DM) and a small set of paid external providers (Anthropic, OpenAI, Daraja, PesaPal, Meta Cloud API, Africa's Talking SMS, LiveKit, Deepgram, ElevenLabs). Conversation state is the database; channels are I/O adapters; tenancy is enforced at the channel boundary. This page is the C4-style map — Context first, then Container, then a paragraph per major component.

This page stays at C4 levels 1 and 2. Module-by-module dependency graphs live at Component map; per-feature flows at How it works.

System context — what Ratiba talks to

At Level 1, Ratiba is one box. Everyone else is either a user (customer or admin) or an external provider it integrates with. The point of the diagram below is the boundary: every payment path goes to Daraja or PesaPal, every WhatsApp message goes through Meta, every TTS audio frame comes from ElevenLabs via LiveKit. Nothing else crosses the perimeter.

Three notes on the boundary. First, Meta is one provider for three channels — the same Cloud API surface routes WhatsApp, Instagram DM, and Messenger DM (per ADR-0008 and ADR-0009 D9). Second, Africa's Talking is deliberately a NotificationSink, not a channel — outbound SMS reminders only, never an inbound conversation surface (ADR-0009 D6). Third, Deepgram and ElevenLabs are voice-pipeline internals — the backend drives them through the LiveKit Agents SDK, so the external boundary for telephony is LiveKit as the SIP bridge; Deepgram and ElevenLabs are sub-providers within that boundary.

Container view — what's inside the Ratiba box

Zooming in: Ratiba is seven processes plus four data stores. The single FastAPI backend on :8010 carries all business logic; the worker is in-process via APScheduler (no Celery yet). Frontend on :3010 is the admin dashboard plus the embedded web widget. MinIO on :9200 stores uploaded catalog assets (price-list photos, CSV imports).

The worker is the same Python process as the backend in v1 — APScheduler runs alongside FastAPI under one supervisor (./start-server.sh). When pilot traffic warrants extracting it, it will move to its own container without API changes; the schedule definitions already live in app/workers/. The three-file Docker Compose split (infra / app / dev) from ADR-0012 keeps infra-only restarts fast.

Components, one paragraph each

The ten components below mirror the ten "How it works" leaves. Each paragraph names the load-bearing source files, points at the relevant ADR(s), and links forward to the per-feature explainer.

1. Channel substrate

Channels are I/O adapters with capability flags, not first-class branches in the FSM. The Channel protocol at app/channels/_base/channel.py declares how a webhook is parsed, how a reply is rendered, and what Tier (1 = WhatsApp + voice; 2 = web + IG + Messenger) the adapter belongs to. Five concrete adapters live under app/channels/<kind>/ and share the substrate's identity-resolution semantics (phone-only deterministic matching) and session-window rules (24h Meta window for IG/Messenger; reactive-only outside it). See How it works → Channel substrate for the inbound message lifecycle and Architecture → System overview for the layered shape. ADR: 0009.
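As a rough illustration of the adapter idea above, the sketch below models a channel as a structural protocol with a tier and a capability flag. It is not the contents of app/channels/_base/channel.py: the method names parse_webhook and render_reply, the supports_buttons flag, the InboundMessage type, and the payload shape are all assumptions made for the example.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Protocol


class Tier(IntEnum):
    ONE = 1  # WhatsApp, voice
    TWO = 2  # web widget, Instagram DM, Messenger DM


@dataclass(frozen=True)
class InboundMessage:
    sender: str  # phone number or session id (hypothetical shape)
    text: str


class Channel(Protocol):
    """Hypothetical protocol: what every adapter must provide."""

    kind: str
    tier: Tier
    supports_buttons: bool  # illustrative capability flag

    def parse_webhook(self, payload: dict) -> InboundMessage: ...
    def render_reply(self, text: str) -> dict: ...


class WebWidgetChannel:
    """Example adapter; satisfies Channel structurally, no inheritance needed."""

    kind = "web"
    tier = Tier.TWO
    supports_buttons = True

    def parse_webhook(self, payload: dict) -> InboundMessage:
        return InboundMessage(sender=payload["session_id"], text=payload["text"])

    def render_reply(self, text: str) -> dict:
        return {"type": "text", "body": text}


def handle(channel: Channel, payload: dict) -> dict:
    # The caller never branches on channel kind; it only uses the protocol.
    msg = channel.parse_webhook(payload)
    return channel.render_reply(f"echo: {msg.text}")
```

Because the core only depends on the protocol, adding a sixth channel in this sketch would mean adding one adapter class rather than touching the FSM.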

2. Conversation FSM

The booking, cancel, and reschedule flows are LangGraph state machines with two-tier persistence: Redis for hot state, Postgres checkpoints for durable thread history. Entry points are app/orchestrator/booking_graph.py, cancel_graph.py, and reschedule_graph.py; shared state shapes live in state.py and thread-pointer logic in threads.py. Each booking gets a fresh thread_id (ULID), and a per-thread Redis SETNX mutex with a 30s TTL serialises in-flight turns to avoid duplicate STK pushes. Intent classification is a single bilingual prompt routed through LLMRouter (per ADR-0005). See How it works → Conversation FSM and Architecture → Data flow. ADRs: 0003, 0005.
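The per-thread mutex can be sketched as a SET with the NX and EX options. To keep the example runnable without a Redis server, it uses a small in-memory stand-in (FakeRedis) that mimics only the commands needed; the key format, the token scheme, and the function names are assumptions, not the project's code. Against a real Redis, the check-then-delete in release_turn_lock would need to be atomic (for example via a Lua script).

```python
import time
import uuid
from typing import Optional

LOCK_TTL_SECONDS = 30  # TTL from the text; everything else here is hypothetical


class FakeRedis:
    """In-memory stand-in for the few Redis commands this sketch needs."""

    def __init__(self) -> None:
        self._data = {}  # key -> (value, expires_at or None)

    def _live(self, key: str) -> Optional[str]:
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if expires_at is not None and expires_at <= time.monotonic():
            del self._data[key]  # expire lazily on access
            return None
        return value

    def set(self, key, value, nx=False, ex=None):
        if nx and self._live(key) is not None:
            return None  # NX failed: key already held
        expires_at = time.monotonic() + ex if ex is not None else None
        self._data[key] = (value, expires_at)
        return True

    def get(self, key):
        return self._live(key)

    def delete(self, key):
        return 1 if self._data.pop(key, None) is not None else 0


def _lock_key(thread_id: str) -> str:
    return f"lock:thread:{thread_id}"  # hypothetical key format


def acquire_turn_lock(client, thread_id: str) -> Optional[str]:
    """Return a token if this turn now owns the thread, else None."""
    token = uuid.uuid4().hex
    ok = client.set(_lock_key(thread_id), token, nx=True, ex=LOCK_TTL_SECONDS)
    return token if ok else None


def release_turn_lock(client, thread_id: str, token: str) -> bool:
    """Release only if we still own the lock (non-atomic in this sketch)."""
    key = _lock_key(thread_id)
    if client.get(key) == token:
        client.delete(key)
        return True
    return False
```

A second turn arriving while the first is in flight gets None from acquire_turn_lock and can be queued or dropped, which is what prevents a duplicate STK push in this model. The TTL bounds how long a crashed turn can block the thread.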

3. Identity and tenancy

Tenancy is resolved at the channel boundary, not the request boundary — webhooks arrive identified only by phone or session, never by JWT. IdentityResolver (app/agents/identity_resolver.py) maps phone to tenant + actor_type (admin or customer); app/persistence/customers.py plus the per-tenant customer_identities and customer_sessions tables provide deterministic cross-channel merge (phone-only, no probabilistic matching). Schema-per-tenant isolation is enforced by an asyncio contextvar (current_tenant.get()) read by every downstream service. See How it works → Identity and tenancy and Architecture → Schema evolution. ADRs: 0002, 0009.
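A minimal sketch of contextvar-based tenant propagation follows, assuming the variable is set once at the channel boundary and read downstream. The current_tenant name comes from the text; the handle_webhook and schema_for_current_tenant functions and the tenant_<id> schema naming are hypothetical.

```python
import asyncio
from contextvars import ContextVar

# Set once per inbound webhook, read by downstream services.
current_tenant: ContextVar[str] = ContextVar("current_tenant")


def schema_for_current_tenant() -> str:
    # Hypothetical naming scheme; raises LookupError if no tenant was set.
    return f"tenant_{current_tenant.get()}"


async def handle_webhook(tenant_id: str) -> str:
    token = current_tenant.set(tenant_id)
    try:
        await asyncio.sleep(0)  # yield so concurrent handlers interleave
        return schema_for_current_tenant()
    finally:
        current_tenant.reset(token)


async def main() -> list:
    # gather() runs each coroutine in its own task with a copied context,
    # so the two tenants never see each other's value.
    return await asyncio.gather(handle_webhook("acme"), handle_webhook("zuri"))


if __name__ == "__main__":
    print(asyncio.run(main()))
```

The useful property is that no function signature carries a tenant argument, yet a handler that forgets to set the tenant fails loudly with LookupError instead of silently reading another tenant's schema.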

4. Payments

M-Pesa is the primary rail (Daraja STK push at booking confirmation); PesaPal handles cards exclusively, never M-Pesa. The initiate path is app/payments/initiate_daraja.py → daraja.py. Reconciliation is a one-shot Daraja poll at t=60s (poll_daraja.py); the worker drives the PesaPal timers, nudging at 8 min and abandoning at 30 min. Customer-initiated cancellation is a first-class FSM state with hybrid provider-specific reversal in cancel_reversal.py. Concurrent payments per booking thread are prohibited via FSM single-in-flight plus Layer-3 idempotency keys. Reservations are short-lived and managed in app/services/reservations.py. Voice calls hard-cap the STK wait at 90 seconds (the caller hangs up; the worker then resolves the state out-of-band). See How it works → Payments and Architecture → Data flow. ADR: 0007.
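The timers named above can be read as a small deadline table. The sketch below computes the due times from the moment a payment is initiated; the durations are the ones stated in the text, while the payment_deadlines function, its argument names, and the provider and channel strings are assumptions for illustration.

```python
from datetime import datetime, timedelta, timezone

DARAJA_POLL_AFTER = timedelta(seconds=60)      # one-shot reconciliation poll
PESAPAL_NUDGE_AFTER = timedelta(minutes=8)     # remind the customer
PESAPAL_ABANDON_AFTER = timedelta(minutes=30)  # give up on the payment
VOICE_STK_CAP = timedelta(seconds=90)          # hard cap on voice calls


def payment_deadlines(provider: str, initiated_at: datetime, channel: str) -> dict:
    """Return the worker deadlines for one payment attempt (hypothetical helper)."""
    if provider == "daraja":
        deadlines = {"poll": initiated_at + DARAJA_POLL_AFTER}
        if channel == "voice":
            deadlines["voice_cap"] = initiated_at + VOICE_STK_CAP
        return deadlines
    if provider == "pesapal":
        return {
            "nudge": initiated_at + PESAPAL_NUDGE_AFTER,
            "abandon": initiated_at + PESAPAL_ABANDON_AFTER,
        }
    raise ValueError(f"unknown provider: {provider}")


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    print(payment_deadlines("daraja", now, channel="voice"))
```

Note that on a voice call the 60s poll falls inside the 90s cap, so in this model the poll result can arrive before the call is cut off.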

5. Cross-sell

Once a booking confirms, the FSM optionally offers a same-staff or same-time-slot follow-on service. The decision logic lives in app/services/cross_sell.py and is gated by the tenant's personality dial cross_sell_aggressiveness (off / soft / aggressive). The offer is a single bilingual turn: accept = bundle into the same payment, decline = ship the original booking unchanged. Yield is captured per-tenant and per-vertical for future prompt-tuning. See How it works → Cross-sell. ADR: 0010 D6.

6. Personality dials

Tenants tune the AI's voice without freeform prompts. Eight curated dials (formality, warmth, response-length cap, emoji policy, cross-sell aggressiveness, etc.) live in personality_config per tenant; app/services/personality_config.py loads and validates; app/llm/router.py injects them into the system prompt at every call. Curated over freeform was a deliberate choice — it keeps moderation tractable until a future privacy/safety ADR. The dashboard surfaces these as sliders + dropdowns, never a textarea. See How it works → Personality dials. ADR: 0010 D2-D4, D9.
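To make the curated-over-freeform choice concrete, here is a minimal sketch of validating dials against closed value sets and rendering them into a system prompt. Only the cross_sell_aggressiveness values (off / soft / aggressive) come from this page; the other dial values, the function names, and the prompt layout are assumptions.

```python
# Closed sets of allowed values per dial. Only cross_sell_aggressiveness
# is taken from the text; the other entries are illustrative.
ALLOWED_DIALS = {
    "formality": {"casual", "neutral", "formal"},
    "emoji_policy": {"never", "sparing", "free"},
    "cross_sell_aggressiveness": {"off", "soft", "aggressive"},
}


def validate_dials(config: dict) -> dict:
    """Reject unknown dials and out-of-set values; return a copy."""
    for name, value in config.items():
        if name not in ALLOWED_DIALS:
            raise ValueError(f"unknown dial: {name}")
        if value not in ALLOWED_DIALS[name]:
            raise ValueError(f"invalid value {value!r} for dial {name}")
    return dict(config)


def render_system_prompt(base_prompt: str, dials: dict) -> str:
    """Append validated dials to the base prompt in a stable order."""
    validated = validate_dials(dials)
    lines = [f"- {name}: {validated[name]}" for name in sorted(validated)]
    return base_prompt + "\n\nStyle settings:\n" + "\n".join(lines)


if __name__ == "__main__":
    print(render_system_prompt(
        "You are a booking assistant.",
        {"formality": "formal", "cross_sell_aggressiveness": "off"},
    ))
```

Since every value that reaches the prompt belongs to a known set, there is no tenant-authored text to moderate, which is the tractability argument made above.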

7. Catalog onboarding

A new tenant's services + staff + hours can land via three paths: CSV upload (app/services/csv_extractor.py), photo of a price list with vision OCR (vision_extractor.py), or full manual entry. Catalog imports are staged in app/persistence/catalog_imports.py so admins can review-and-confirm before they go live; the staging table also doubles as the audit trail. The catalog_importer.py service is the orchestrator — same interface regardless of input source. Uploaded photos live in MinIO under per-tenant prefixes. See How it works → Catalog onboarding. ADR: 0010 D5, D7, D8.

8. Admin orchestrator

Admins manage their tenant via the same WhatsApp thread their customers use, plus an OIDC-protected /admin dashboard as fallback. app/admin/orchestrator.py is a shallow 4-state FSM with a 4h actor-type TTL; commands route through nl_router.py (natural-language) or commands.py (slash-prefixed). Customer-handoff briefings are JSON-shaped per ADR-0006 and rendered as a bilingual card; on-demand translation buttons let an English-only admin read a Swahili transcript verbatim. See How it works → Admin orchestrator. ADRs: 0006, 0010 D8.
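The split between slash-prefixed commands and natural-language messages could look roughly like the sketch below. The AdminRoute type, the route_admin_message function, and the example command are hypothetical; only the two handler names echo the files mentioned above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AdminRoute:
    handler: str            # "commands" or "nl_router"
    command: Optional[str]  # command name for slash messages, else None
    text: str               # command arguments, or the full NL message


def route_admin_message(text: str) -> AdminRoute:
    """Send slash-prefixed input to commands, everything else to the NL router."""
    stripped = text.strip()
    if stripped.startswith("/") and len(stripped) > 1:
        name, _, args = stripped[1:].partition(" ")
        return AdminRoute("commands", name.lower(), args.strip())
    return AdminRoute("nl_router", None, stripped)


if __name__ == "__main__":
    print(route_admin_message("/cancel 42"))
    print(route_admin_message("show me today's bookings"))
```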

9. Knowledge answers

Customers ask questions that structured data cannot answer: "Do I need to arrive early?", "Is there parking?", "What's your cancellation policy?". Phase 0 answers these with the simplest possible mechanism: curated per-tenant knowledge_snippets rows injected directly into the existing answer_shaper prompt. The entry point is app/services/knowledge.py::fetch_snippets(intent) — a category-scoped SELECT that returns the active KB as a list of dicts. The dispatcher injects the list into raw_data["knowledge"], which rides into the answer_shaper user template. No vector index, no embedding column — the model performs retrieval by attention. The knowledge_overflow WARN is the observable graduation trigger for Phase 1 (pgvector + top-k cosine). For the full behavioral flow, see How it works → Knowledge answers. ADR: 0013.
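The snippet-injection flow can be sketched end to end with an in-memory SQLite table standing in for the per-tenant Postgres schema. The column names, the character budget, the build_raw_data helper, and passing a category directly (rather than deriving it from an intent) are assumptions; only knowledge_snippets, fetch_snippets, raw_data["knowledge"], and the knowledge_overflow warning come from the text.

```python
import logging
import sqlite3

logger = logging.getLogger("knowledge")
MAX_SNIPPET_CHARS = 4000  # hypothetical prompt budget


def fetch_snippets(conn, category: str, max_chars: int = MAX_SNIPPET_CHARS) -> list:
    """Category-scoped SELECT over active snippets, returned as dicts."""
    rows = conn.execute(
        "SELECT title, body FROM knowledge_snippets "
        "WHERE category = ? AND active = 1 ORDER BY id",
        (category,),
    ).fetchall()
    snippets = [{"title": title, "body": body} for title, body in rows]
    total = sum(len(s["body"]) for s in snippets)
    if total > max_chars:
        # Observable signal that plain injection is outgrowing the prompt.
        logger.warning("knowledge_overflow category=%s chars=%d", category, total)
    return snippets


def build_raw_data(conn, category: str, question: str) -> dict:
    """Attach the snippets under the key the answer-shaping prompt reads."""
    return {"question": question, "knowledge": fetch_snippets(conn, category)}


def demo_connection() -> sqlite3.Connection:
    """Create a throwaway table with three example rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE knowledge_snippets ("
        "id INTEGER PRIMARY KEY, category TEXT, title TEXT, body TEXT, active INTEGER)"
    )
    conn.executemany(
        "INSERT INTO knowledge_snippets (category, title, body, active) VALUES (?, ?, ?, ?)",
        [
            ("parking", "Parking", "Free parking behind the building.", 1),
            ("parking", "Old parking note", "Street parking only.", 0),
            ("policy", "Cancellation", "Cancel up to 2 hours before.", 1),
        ],
    )
    return conn


if __name__ == "__main__":
    print(build_raw_data(demo_connection(), "parking", "Is there parking?"))
```

The whole active category is handed to the model and nothing is ranked, so the only scaling signal in this sketch is the overflow warning.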

10. Voice conversation

Phone calls are a Tier-1 channel: the caller's E.164 number is known from SIP metadata the instant the leg picks up. app/voice/agent.py is the per-call handler — it sets up the LiveKit AgentSession, resolves DID to tenant, and runs the on_user_turn_completed loop. Five full-duplex primitives handle the real-time audio layer: barge_in.py (interruption detection + safe-say race guard), backchannel.py (bilingual filler-word filter), hard_interrupt.py (3× stop-token pattern cancel), listening_ack.py (mid-utterance acknowledgement), and voice_speed.py (WPM-adaptive TTS speed). The FSM stays authoritative — the voice agent is a brain-less proxy that dispatches finalised utterances to the same booking_graph that WhatsApp uses. The VoiceStreamEvent seam at brain_stream.py is the forward-compat hook for a future agentic streaming backend. For the full behavioral detail, see How it works → Voice conversation. ADRs: 0005, 0006.

Where the decisions live

The 13 ADRs at /adr/ carry the full rationale. The most-referenced from this page:

ADR-0001 — Tech stack (FastAPI + Next.js + Postgres schema-per-tenant + Keycloak + Redis; Python 3.13 pin; library-currency policy).
ADR-0002 — Schema-per-tenant operational specifics; two pools; per-tenant Alembic invocation; contextvar tenant propagation.
ADR-0003 — Two-tier FSM persistence (Redis hot + Postgres LangGraph checkpoints); per-thread Redis SETNX mutex.
ADR-0007 — One-shot Daraja stkpushquery at t=60s; PesaPal cards-only; concurrent-payment prohibition per booking thread; 90s voice STK hard cap.
ADR-0009 — Channel-agnostic substrate; phone-only deterministic identity; SMS as a NotificationSink; channel-switch primitive.
ADR-0012 — Three-file Docker Compose split; Cloudflare Tunnel; real-tenant beta cohort; commission scaffolding.
ADR-0013 — "No-RAG RAG" knowledge snippets; YAGNI pgvector deferral; knowledge_overflow graduation trigger.

Full list at the ADR landing index.

What's NOT in this overview

Three things this page deliberately omits:

Schema details — the per-tenant DDL, Alembic migration order, and the public registry shape live at Architecture → Schema evolution.
Module-by-module dependency graph — the auto-generated component-map plus undocumented-modules report lives at Architecture → Component map.
Per-feature flows — booking happy paths, cancel-with-refund flows, channel-switch sequences, etc., live as runnable explainers at How it works.
