Conversation FSM

What it does

Every conversation that books, cancels, reschedules, or cross-sells a service runs through a LangGraph state machine persisted two-tier per ADR-0003: Redis holds the hot state for the live turn, and Postgres carries the durable checkpoint history in per-tenant checkpoints_<tenant>.checkpoints tables. There are five graphs in flight: a booking graph (the spine), a cancel graph, a reschedule graph, the M11 cross-sell offshoot embedded inside the booking graph, and the M9 admin orchestrator covered separately.

A single bilingual intent classifier (ADR-0005) decides which graph each inbound turn belongs to. The classifier is one prompt — not a router-of-routers — and falls back to the tenant's locale when language confidence is low. Every booking thread gets a fresh ULID thread_id (per ADR-0003) so cross-booking state never leaks; a per-thread Redis SETNX mutex with 30s TTL serialises in-flight turns to avoid duplicate STK pushes when a customer double-taps.

State shape lives in BookingState (Pydantic v2, extra="forbid"). The current generation is FSM_GRAPH_VERSION = 5 post-M11 (was 4 post-M10, added the cross-sell branch fields). On hydration, migrate_state_shape() back-fills defaults for fields added by newer milestones — the v3→v4 shim covers M10's COLLECT_PHONE (Tier-2 phone capture) and v4→v5 covers M11's cross-sell pair-reservation.

Cost discipline is built in: ADR-0005 sets a $0.05 soft / $0.20 hard per-booking ceiling (per-tenant configurable), tracked via a total_token_cost_usd field on the state and one of three signals that can fire a handoff trigger.

Inquiry intents (services, hours, other) are a parallel FSM path: the classifier routes them to the AnswerShaper rather than to a booking subgraph. Phase 0 knowledge snippets are injected into that path — see Knowledge answers for the full retrieval seam. This page owns the booking/cancel/reschedule FSM graphs; the inquiry path is handled entirely in dispatcher.py's else branch.

How it fits in the system

The FSM sits between the channel adapters (which decode webhooks into normalised messages) and the AnswerShaper (which renders the FSM's structured reply back into the active channel's wire format). All state — hot and warm — flows through two storage tiers.

Two boundaries to flag. First, tenant scope is not stored in BookingState — it's read from the current_tenant ContextVar at every layer below the FSM hydration boundary. This keeps a cross-tenant write from slipping past the contextvar firewall the rest of the codebase enforces. For the full ContextVar and schema-per-tenant design, see Identity and tenancy. Second, the per-tenant Postgres schema for checkpoints is named checkpoints_<slug> and is created during onboarding via PostgresSaver.setup() per ADR-0003.

How it flows

The booking happy path threads through eleven nodes. Tier-1 channels (WhatsApp, voice) skip COLLECT_PHONE entirely because the phone is metadata; Tier-2 (web, IG, Messenger) routes through it on first contact. After CONFIRM, the cross-sell branch is gated on the tenant's cross_sell_aggressiveness dial and the availability of an adjacent eligible slot — when either is absent, CONFIRM jumps straight to BOOKED preserving the M5/M6/M7 topology.

The BOOKED state is the booking FSM's terminal — at that moment the dispatcher's post-CONFIRM payment hook fires the M-Pesa STK push (per Payments), which runs as a separate state machine driven by PaymentState. A customer-initiated cancellation mid-flight transitions that payment FSM into PaymentState.CANCELLED_BY_CUSTOMER per ADR-0007, but the booking FSM itself has already terminated by then.

Reply enrichment (chips + confirmation recap)

The dispatcher renders the FSM's reply text and attaches two channel-aware enrichments on the way out (app/orchestrator/dispatcher.py):

Quick-reply chips. _derive_quick_replies(state) maps the FSM's current state to a small set of tappable {label, value} choices — staff names + "Anyone" at COLLECT_STAFF, one chip per offered time (label = the time, value = the ordinal) at COLLECT_SLOT / RESCHEDULE_COLLECT_SLOT, and Yes/No at CROSS_SELL_RESPONSE. They ride the reply envelope as quick_replies; the web widget renders them as a chip row and tapping sends value exactly as a typed reply would, so the parse path is unchanged. This is recognition over recall — the customer never types or recalls an identifier. Channels that can't render chips (voice) simply ignore the field; the reply text already states the same options.
Booking-confirmation recap. On the single-appointment CONFIRM → BOOKED transition (no bundle, no handoff) the dispatcher fetches the booked row and replaces the bare "confirmed" with a recap of what was booked — service, tenant-local time, staff, and price. It is channel-aware: text channels get a bullet recap, voice gets one spoken sentence (no markdown markers read aloud). A zero (quote-only) price omits the price line. When M-Pesa is live the payment hook overrides the recap with the STK ask.

The intent classifier

The classifier is a single bilingual prompt — one LLM call per turn — that routes each inbound message to one of: booking, cancel, reschedule, admin, services, hours, or other. It is not a router-of-routers (no multi-stage classification tree) to keep latency and token cost minimal.

Language confidence is a first-class output: when the classifier is uncertain whether the customer is writing Swahili or English (mixed-code messages are common in Nairobi), it falls back to the tenant's locale field rather than guessing. This prevents the classifier from routing a Swahili booking request to other just because it contains loanwords.

Failure budget (per ADR-0005): the classifier has a two-level budget:

Level	Threshold	Response
Soft alert	`>2%` of turns misclassified (per tenant, rolling 24h)	structured warning log; operator visibility
Hard alert	`>5%` of turns misclassified (per tenant, rolling 24h)	triggers handoff escalation review

Misclassification is measured against the DeepEval calibration set (see Testing). In production, the daily WhatsApp digest surfaces per-tenant classifier accuracy. See Observability for what to look for in the digest.

The LLMRouter routes the classifier call to GPT-4.1 mini by default (narrow task, low cost). Other roles:

Role	Default model	Used for
`intent_classifier`	GPT-4.1 mini	Per-turn intent routing
`answer_shaper`	Claude Haiku	Booking dialogue + inquiry answers
`handoff_summarizer`	Claude Haiku	Briefing-card generation on handoff
`reorientation`	Claude Haiku	Restoring conversation context after an admin interruption
`vision`	Anthropic Vision	Catalog OCR (image/PDF import)
`nl_admin`	GPT-4.1 mini	Admin NL command classification

All calls are tracked against the per-booking $0.05 soft / $0.20 hard cost ceiling via total_token_cost_usd on BookingState. When the soft ceiling is hit, the FSM emits a warning log; hitting the hard ceiling fires handoff signal d1 (cost) and cedes the conversation to the admin rail — see Admin orchestrator.

Thread identity and the conversation_threads pointer table

Every booking conversation gets a fresh ULID thread_id generated when the dispatcher enters _dispatch_locked for the first time on a new booking intent. The thread_id is the key for both Redis hot state (ratiba:<tenant_id>:thread:<thread_id>:state) and the Postgres checkpoint row in checkpoints_<slug>.checkpoints.

The conversation_threads pointer table (tenant_<slug>.conversation_threads) stores the live mapping from a customer's customer_id (or session_id for unverified Tier-2) to the current thread_id. This is how the dispatcher finds the right thread across turns — a WhatsApp customer sends three messages, and each picks up from the same checkpoint via this lookup.

When a booking FSM terminates (state BOOKED or LEAD_CAPTURED), the actor_type field on conversation_threads is reset to CUSTOMER so the next inbound message starts fresh. If the admin orchestrator is in ENGAGED state, actor_type = ADMIN and inbound customer messages are held until /handback fires.

For operators: if a customer claims their conversation "started over", the most common cause is a stale or missing conversation_threads row. From the runbook:

psql -h localhost -p 5434 -U ratiba -d ratiba \
  -c "SELECT thread_id, actor_type, updated_at FROM tenant_<slug>.conversation_threads \
      WHERE customer_id = '<customer_id>' ORDER BY updated_at DESC LIMIT 5;"

Two-tier persistence in depth

Redis holds the hot state — the BookingState JSON blob — under a prefixed key with a 24-hour TTL. Every FSM node read/write hits Redis directly; Postgres is not consulted on the hot path. The SETNX mutex key is separate: ratiba:<tenant_id>:thread:<thread_id>:lock, TTL 30s. If the lock expires (e.g. a worker crash mid-turn), the next inbound message re-acquires cleanly; the checkpoint in Postgres preserves the last fully committed state.

Postgres holds the durable checkpoint — the LangGraph PostgresSaver writes a row to checkpoints_<slug>.checkpoints at the end of each successfully committed node. The checkpoint schema is LangGraph-native: thread_id, checkpoint_id (ULID), type (full or diff), checkpoint (JSONB), metadata (JSONB including state_version). On FSM hydration, migrate_state_shape() reads the state_version field and back-fills missing keys before handing control to LangGraph.

Retention: rows older than 90 days move to checkpoints_<slug>.checkpoints_archive by the daily 3 AM EAT reaper. The archive has the same schema; queries for audit or replay can union both tables. See Payments for the reaper's full scope (it also sweeps payment_routing and handoff_log_archive).

The SETNX mutex

The per-thread mutex (SET NX EX 30) is the guard against double-processing when a customer sends two messages within the same second (double-tap) or when a WhatsApp delivery retry lands before the first processing is done.

Acquisition uses exponential backoff starting at 100ms with a ceiling of 10s. If the lock is not acquired within 10s, the FSM returns a bilingual failure response ("Samahani, jaribu tena" / "Sorry, please try again") and logs a mutex_acquisition_failed event. The 30s TTL is long enough that a stuck worker releases the lock automatically rather than blocking the customer forever.

For operators: if you see mutex_acquisition_failed in the structured logs, check whether a worker process is stuck in an LLM call (long tail latency from the provider). The mutex timing is calibrated so that a normal 5–8s LLM round-trip completes well within the TTL; a stuck call at the 30s boundary indicates a provider timeout, not a Ratiba bug. See Observability for the log filter.

Where it lives in code

Concern	File	Key entry point
FSM version constant	`app/orchestrator/state.py`	`FSM_GRAPH_VERSION = 5` (line 73)
State shape	`app/orchestrator/state.py`	`class BookingState` (line 292)
State migration shim	`app/orchestrator/state.py`	`migrate_state_shape()` (line 76)
Booking graph build	`app/orchestrator/booking_graph.py`	`build_booking_graph()` (line 1985)
Cross-sell node (M11)	`app/orchestrator/booking_graph.py`	`cross_sell_offer_node()` (line 1374)
`COLLECT_PHONE` routing	`app/orchestrator/booking_graph.py`	`needs_phone_capture()` (line 735)
Cancel graph	`app/orchestrator/cancel_graph.py`	full graph
Reschedule graph	`app/orchestrator/reschedule_graph.py`	full graph

Line numbers pinned per ADR-0011 D6; the architecture-index plugin's stale-pointer gate flags drift if the listed components are touched without bumping last_verified.

Decisions

The two ADRs that own this surface:

ADR-0003 — Two-tier persistence (Redis hot + Postgres LangGraph checkpoints); shared Redis with prefixed keys; fresh thread_id per booking via the conversation_threads pointer table; 90-day retention with per-tenant checkpoints_archive; per-thread Redis SETNX mutex (30s TTL, exponential backoff to 10s ceiling).
ADR-0005 — Single bilingual intent-classifier prompt; tenant-locale fallback for low language confidence; $0.05 soft / $0.20 hard per-booking cost ceiling (per-tenant configurable, contextvar-tracked); LLMRouter with role-based config; shallow 4-state AdminOrchestrator with 4h actor-type TTL; directory-per-tool MCP-shape registry with safety_class-driven confirmation gating.

Channel substrate — the adapters that feed inbound messages to dispatch_inbound_message; ChannelKind is metadata on BookingState
Knowledge answers — the inquiry path (services / hours / other) handled by the dispatcher's else branch; Phase 0 snippet injection into the answer shaper
Payments — the payment FSM that takes over after BOOKED; STK push, PAYMENT_CANCELLED_BY_CUSTOMER, and the 3 AM reaper that sweeps checkpoints_archive
Admin orchestrator — handoff trigger signals (d1–d5) that cede the conversation to the human owner; /handback to resume
Cross-sell — the CROSS_SELL_OFFER → BUNDLE_CONFIRM branch embedded inside the booking graph
Identity and tenancy — current_tenant ContextVar, schema-per-tenant, COLLECT_PHONE phone-binding mechanics
Testing — calibration set, DeepEval 4-tuple cache key, transcript replay
Observability — fsm.state_transition log events, mutex_acquisition_failed, classifier accuracy digest

Try this on local dev

The three steps below mirror the runbook_steps block in this page's frontmatter. Bring the stack up first (docker compose up -d) and have a tenant onboarded.

Watch hot-state writes live. Open one terminal and start a booking on the WhatsApp test number. In another terminal, watch Redis traffic for the active thread:
```
redis-cli -h localhost -p 6381 MONITOR | grep '<thread_id>'
```
You'll see the SETNX mutex acquire/release plus state-checkpoint key writes. Substitute the thread_id from tenant_<slug>.conversation_threads.

Inspect the durable checkpoint. After a booking lands, dump the persisted state from the per-tenant checkpoint table:

psql -h localhost -p 5434 -U ratiba -d ratiba \
  -c "SELECT thread_id, checkpoint_id, type, jsonb_pretty(metadata::jsonb) \
      FROM checkpoints_<slug>.checkpoints \
      ORDER BY checkpoint_id DESC LIMIT 5;"

The state_version field stamped by migrate_state_shape() will read 5 on any conversation initiated after M11 lands.

Tail FSM transitions. Backend logs every node entry and the resulting fsm_state field — useful when a conversation wedges. From the project root:
```
tail -f backend.uvicorn.log | grep -E 'fsm_state|FSMState|build_booking_graph'
```
You'll see the GREET → COLLECT_SERVICE → ... progression in real time. If you trigger a Tier-2 conversation (web widget at localhost:3010/widget), the COLLECT_PHONE node will appear; on a WhatsApp test number it will not.

What it does​

How it fits in the system​

How it flows​

Reply enrichment (chips + confirmation recap)​

The intent classifier​

Thread identity and the conversation_threads pointer table​

Two-tier persistence in depth​

The SETNX mutex​

Where it lives in code​

Decisions​

Related​

Try this on local dev​