
ADR-0009: Channel-Agnostic Conversation Substrate + Channel Tier Model

Status: Accepted
Date: 2026-05-05
Supersedes: ROADMAP §"What this roadmap deliberately doesn't include", specifically the "no web booking widget" exclusion (lines 70-71 of docs/ROADMAP.md as of 2026-05-05).

Context

By the close of M9+1 (2026-05-05), the Ratiba codebase already contained a de facto channel-agnostic substrate even though the architecture had never been formally codified that way:

  • M5 landed the dispatcher (app/orchestrator/dispatcher.py) with a channel parameter propagating end-to-end.
  • M7's voice work demonstrated empirically that adding a channel was "mostly I/O plumbing on top of the channel-agnostic substrate" — the M7 close-out memo's exact phrasing.
  • AnswerShaper grew a _render_for_voice post-processor that strips markdown bullets, replaces URLs, and caps sentences — a channel-specific rendering rule riding on a single shared LLM prompt.
  • app/agents/identity_resolver.py exists as a stub anticipating cross-channel customer identity resolution.

Three independent product asks now force a formal architectural commitment:

  1. A web chat widget for tenants who want to capture leads on their own websites. Initially a hosted standalone link (widget.ratiba.chat/<tenant-slug>); eventually an embeddable script.
  2. Instagram DM bot to handle service/price inquiries that today pile up in the tenant's social inbox unanswered.
  3. Facebook Messenger bot for the same reason on Meta's other page-messaging surface.

The ROADMAP (docs/ROADMAP.md) had previously excluded a web widget on thesis grounds: "shipping a widget undermines [the AI-is-the-interface thesis] and dilutes the differentiator." On re-examination, that reasoning conflated two distinct things:

  • A booking form on a website — yes, a thesis violation. Customers click dropdowns instead of conversing.
  • A conversational widget that runs the same FSM as WhatsApp/voice: more thesis-aligned than a traditional booking form, because the customer still books by talking, just on a different surface.

The 2026 industry consensus on conversational AI (Salesforce, Zendesk, Intercom, Yellow.ai, Parloa) is that omnichannel ≠ multichannel:

  • Multichannel = separate bots per platform (loses context, drift, duplicated logic). Bad.
  • Omnichannel = one brain, many I/O surfaces, unified identity. This is what the Ratiba codebase is already doing, just informally.

The cost picture also matters. Per fully-handled conversation:

| Channel | Platform fees |
|---|---|
| Web widget | $0 (our infra only) |
| WhatsApp | $0 in-window / ~$0.01–0.02 utility template out-of-window (Kenya) |
| Instagram DM | Same in-window economics; approved tags only out-of-window |
| Messenger DM | Same as Instagram |
| Voice (LiveKit + Deepgram + ElevenLabs) | ~$0.10–0.20 / minute |

The web widget is the cheapest fully-bidirectional conversational surface by an order of magnitude vs. voice and meaningfully cheaper than out-of-window WhatsApp. This isn't just a feature; it's an operational lever for cost-conscious East African SMB pilots (per project_cost_discipline.md memory).

Decisions

D1. The substrate thesis (one-sentence commitment)

The conversation substrate is channel-agnostic; channels are I/O adapters with capability flags and identity-resolution semantics, not first-class branches in the FSM.

This formalises what M5/M7 already implemented. Concretely:

  • dispatch_inbound_message(channel: ChannelKind, ...) is the single entry point regardless of inbound surface.
  • The booking, cancel, and reschedule graphs from M5 run identically across all channels — only one new FSM node is added (D5).
  • A new Channel protocol (D2) governs all I/O adapters.
  • Per-channel rendering rules (D6) are declarative, not branched.

Future contributors shall not add channel-specific branches to the dispatcher or FSM graphs. New channels add a row to the registry, an adapter class, and a rendering rule — nothing else.

D2. Channel protocol + ChannelKind enum

A new app/channels/_base/protocol.py module defines:

    from enum import StrEnum
    from typing import Literal, Protocol
    from uuid import UUID

    # SendResult and ConversationThread are not defined in this ADR;
    # they are assumed to be project types imported from elsewhere.

    class ChannelKind(StrEnum):
        WHATSAPP = "whatsapp"
        VOICE = "voice"
        WEB = "web"
        INSTAGRAM = "instagram"
        MESSENGER = "messenger"

    class Tier(StrEnum):
        TIER1 = "tier1"  # phone known from message metadata
        TIER2 = "tier2"  # phone captured progressively (see D3)

    class Channel(Protocol):
        kind: ChannelKind
        tier: Tier
        identity_anchor: Literal["phone", "session"]
        session_window_seconds: int | None  # None = no window restriction
        supports_payment_inline: bool
        supports_rich_media: bool

        async def inbound_handler(self, payload: dict, tenant_id: UUID) -> None: ...
        async def outbound_send(self, customer_id: UUID, text: str) -> SendResult: ...
        def is_within_session_window(self, thread: ConversationThread) -> bool: ...

A registry at app/channels/_base/registry.py maps ChannelKind → adapter instance. Existing app.channels.whatsapp and app.voice are wrapped in thin adapter classes (see D7) — zero behavioural change.
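The registry lookup can be illustrated with a minimal sketch. Everything here is hypothetical: `EchoChannel`, `register`, `get_channel`, and `dispatch_reply` are illustrative names that do not come from the codebase, the adapter is synchronous for brevity (the protocol above is async), and `ChannelKind` is trimmed to two members.

```python
from dataclasses import dataclass
from enum import Enum


class ChannelKind(str, Enum):  # the ADR uses StrEnum (Python 3.11+)
    WHATSAPP = "whatsapp"
    WEB = "web"


@dataclass
class EchoChannel:
    """Hypothetical stand-in adapter; real adapters wrap existing channel code."""
    kind: ChannelKind
    sent: list

    def outbound_send(self, customer_id: str, text: str) -> bool:
        # Record the message instead of calling a real transport.
        self.sent.append((customer_id, text))
        return True


_REGISTRY: dict = {}  # ChannelKind -> adapter instance


def register(channel: EchoChannel) -> None:
    # One adapter per kind; a duplicate registration is a wiring bug.
    if channel.kind in _REGISTRY:
        raise ValueError(f"{channel.kind.value} already registered")
    _REGISTRY[channel.kind] = channel


def get_channel(kind: ChannelKind) -> EchoChannel:
    return _REGISTRY[kind]


def dispatch_reply(kind: ChannelKind, customer_id: str, text: str) -> bool:
    # No per-channel branch: the registry lookup is the only place
    # where the channel kind matters.
    return get_channel(kind).outbound_send(customer_id, text)
```

Under this shape, adding a channel means one more `register(...)` call at start-up; `dispatch_reply` itself never changes.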

D3. Two-tier channel model

The Channel protocol's tier field is prescriptive, not descriptive:

| Capability | Tier-1 | Tier-2 |
|---|---|---|
| Identity anchor | phone (from message metadata) | session (cookie / page-scoped user ID) |
| Session window | None (always open) | 24h hard window (IG/Messenger Meta policy) or cookie-bounded (web) |
| Payment STK push | Direct (phone known turn 1) | Gated on phone capture via D5's COLLECT_PHONE |
| Outbound posture | Reactive + proactive | Reactive-only outside an active session window |
| Members | WHATSAPP, VOICE | WEB, INSTAGRAM, MESSENGER |

There is no Tier-3. A channel that doesn't fit one of these tiers does not ship. Tiering is per-channel-class, not per-tenant — a tenant cannot promote Instagram to Tier-1 by configuration because Meta's API constraints don't change with willingness to pay.
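One way to make "per-channel-class, not per-tenant" concrete is to hold tier membership in a read-only mapping. This is a sketch under assumptions: `TIER_OF` and `may_send_proactively` are hypothetical names, and only the outbound-posture row of the table is modelled.

```python
from enum import Enum
from types import MappingProxyType


class Tier(str, Enum):  # the ADR uses StrEnum (Python 3.11+)
    TIER1 = "tier1"
    TIER2 = "tier2"


# Read-only view: tenant configuration cannot reassign a channel's tier.
TIER_OF = MappingProxyType({
    "whatsapp": Tier.TIER1,
    "voice": Tier.TIER1,
    "web": Tier.TIER2,
    "instagram": Tier.TIER2,
    "messenger": Tier.TIER2,
})


def may_send_proactively(channel: str, session_active: bool) -> bool:
    """Tier-1 may message proactively; Tier-2 only inside an active session."""
    tier = TIER_OF[channel]  # KeyError for a channel that fits no tier
    return tier is Tier.TIER1 or session_active
```

An attempt to write `TIER_OF["instagram"] = Tier.TIER1` raises `TypeError`, which mirrors the rule that promotion by configuration is not possible.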

D4. Cross-channel identity model

Phone number is the only deterministic cross-channel join key.

Every other identifier (web cookie, IG PSID, Messenger PSID) is a tenant-scoped alias that binds to a customer once a phone is captured. Probabilistic matching (IP, fingerprint, behavioural) is explicitly prohibited for three reasons: it is a GDPR landmine, its accuracy is marginal, and M-Pesa already requires a phone number for paying flows, so phone-only matching is sufficient.

Schema (lives in tenant schema per ADR-0002, not public):

    <tenant>.customers            -- customer_id (UUID PK), phone_e164 NULLABLE,
                                     name, dob NULLABLE, language_pref
    <tenant>.customer_identities  -- (customer_id, provider, external_id), many-to-one;
                                     provider ∈ {'whatsapp','voice','web_cookie',
                                                 'instagram','messenger'}
    <tenant>.customer_sessions    -- anonymous landing sessions; promoted into
                                     customers row at phone-capture time

The identity_resolver module at app/agents/identity_resolver.py extends with bind_phone(session_id, phone_e164) which:

  1. Looks up customers WHERE phone_e164 = :phone.
  2. Hit: merges session into existing customer (preserves cookie/PSID linkage in customer_identities); prior conversation history becomes available to the agent as context.
  3. Miss: upserts new customer, copies session metadata, marks session as promoted.

A returning Tier-2 visitor (same cookie) skips identity capture entirely because the cookie already maps to a customer record via customer_identities WHERE provider='web_cookie' AND external_id=cookie_id.
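The hit/miss flow above can be sketched against an in-memory stand-in for the three tenant tables. The `Store` shape, the extra `store` parameter, and `resolve_returning` are assumptions made for illustration; the real module presumably works against the database.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class Store:
    """In-memory stand-in for the three tenant tables (hypothetical shape)."""
    customers: dict = field(default_factory=dict)   # customer_id -> phone_e164
    identities: dict = field(default_factory=dict)  # (provider, external_id) -> customer_id
    sessions: dict = field(default_factory=dict)    # session_id -> session metadata


def bind_phone(store: Store, session_id: str, phone_e164: str) -> str:
    session = store.sessions[session_id]
    # Step 1: look up an existing customer by phone.
    customer_id = next(
        (cid for cid, phone in store.customers.items() if phone == phone_e164),
        None,
    )
    if customer_id is None:
        # Step 3 (miss): create a new customer row.
        customer_id = str(uuid.uuid4())
        store.customers[customer_id] = phone_e164
    # Hit or miss: keep the cookie/PSID alias and mark the session promoted.
    store.identities[(session["provider"], session["external_id"])] = customer_id
    session["promoted_to"] = customer_id
    return customer_id


def resolve_returning(store: Store, provider: str, external_id: str):
    """Returning-visitor shortcut: alias lookup without asking for the phone again."""
    return store.identities.get((provider, external_id))
```

Two sessions that capture the same phone (one from a web cookie, one from an Instagram PSID) end up bound to the same customer ID, which is the deterministic join D4 describes.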

D5. COLLECT_PHONE FSM entry-state for Tier-2

A new node in booking_graph.py runs before COLLECT_SLOT, gated by:

    needs_phone = channel.tier == Tier.TIER2 and customer.phone_e164 is None

Tier-1 channels skip it (phone known from metadata). Returning Tier-2 customers skip it (phone on file via cookie/PSID identity).

Failure modes:

  • Invalid format (E.164 reject) → re-prompt up to 3 times in customer's language.
  • Three-strike refusal or persistent invalid → graceful fallback to the lead-capture branch: agent says "No problem — we'll continue this on WhatsApp. Reach us at [tenant.whatsapp_display_number]" and ends the FSM thread. The session is recorded in the tenant dashboard with an unverified=true flag for manual follow-up.
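The gate and the three-strike rule can be sketched as a plain function, independent of LangGraph. The regex is a rough E.164 shape check standing in for the phonenumbers library named in D8, and `collect_phone` with its `"captured"` / `"lead_capture"` outcome labels is hypothetical.

```python
import re
from typing import Iterable, Optional, Tuple

# Rough E.164 shape: "+", a non-zero digit, then up to 14 more digits.
# Stand-in only; D8 names the phonenumbers library for the real check.
E164_RE = re.compile(r"^\+[1-9]\d{1,14}$")
MAX_ATTEMPTS = 3


def needs_phone(tier: str, phone_e164: Optional[str]) -> bool:
    # Mirrors the gate above: only Tier-2 customers without a phone on file.
    return tier == "tier2" and phone_e164 is None


def collect_phone(replies: Iterable[str]) -> Tuple[str, Optional[str]]:
    """Return ("captured", phone), or ("lead_capture", None) after three strikes."""
    for attempt, reply in enumerate(replies, start=1):
        candidate = re.sub(r"[\s\-()]", "", reply)  # tolerate spaces and dashes
        if E164_RE.match(candidate):
            return "captured", candidate
        if attempt >= MAX_ATTEMPTS:
            break
    # Refusal, persistent invalid input, or abandonment before a valid number.
    return "lead_capture", None
```

In this sketch a customer who stops replying before the third attempt also falls through to the lead-capture branch; whether abandonment should be treated that way is a design choice the ADR leaves to the implementation.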

LangGraph saver compatibility: adding COLLECT_PHONE is a graph-shape change → bumps the FSM's __version__ constant → existing in-flight threads with old state shapes get a one-time migration shim (if state.version < 4 and channel.tier == TIER2: insert COLLECT_PHONE).

D6. AnswerShaper declarative rendering rules

M7's _render_for_voice post-processor generalises to _render_for_channel(channel, raw_text). Rules are a declarative table keyed on ChannelKind:

| Channel | Rules |
|---|---|
| whatsapp | No-op (raw markdown; WhatsApp renders bold + italic natively) |
| voice | Existing M7 rules: bullet-strip + URL-replace + 2-sentence-cap |
| web | Pass-through markdown; React widget renders via react-markdown |
| instagram | 1000-char hard cap (Meta limit); strip URL preview metadata; bullet → newline |
| messenger | Same as Instagram |

New channels add a row to the table. They do not add a function or branch in code.
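A declarative table of this kind might look like the sketch below, where each row is data and a single renderer interprets it. `RenderRule` and its fields are hypothetical, and URL replacement and preview stripping are left out to keep the example short.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RenderRule:
    strip_bullets: bool = False          # drop leading "- ", "* " or "• " markers
    max_chars: Optional[int] = None      # hard cap on message length
    max_sentences: Optional[int] = None  # cap on sentence count


# One row per channel; a new channel is a new row, not a new branch.
RULES = {
    "whatsapp": RenderRule(),
    "voice": RenderRule(strip_bullets=True, max_sentences=2),
    "web": RenderRule(),
    "instagram": RenderRule(strip_bullets=True, max_chars=1000),
    "messenger": RenderRule(strip_bullets=True, max_chars=1000),
}

_BULLET = re.compile(r"^[ \t]*[-*•][ \t]+", flags=re.MULTILINE)
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def render_for_channel(channel: str, raw_text: str) -> str:
    rule = RULES[channel]
    text = raw_text
    if rule.strip_bullets:
        text = _BULLET.sub("", text)  # each bullet becomes a plain line
    if rule.max_sentences is not None:
        text = " ".join(_SENTENCE_END.split(text)[: rule.max_sentences])
    if rule.max_chars is not None:
        text = text[: rule.max_chars]
    return text
```

The renderer only reads fields from the row, so the channel name never appears in a conditional.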

D7. Existing channels are not relocated

app.channels.whatsapp and app.voice stay where they are. The Channel protocol is implemented by thin adapter classes (WhatsAppChannel, VoiceChannel) that delegate to the existing tested code. Zero behavioural change to M4/M7 surfaces.

This is a deliberate cost-discipline choice: refactoring well-tested code for cosmetic uniformity is exactly the "premature abstraction" anti-pattern the methodology warns against.

D8. Phone validation: STK push as the oracle, not SMS OTP

Phone validation stack:

  1. E.164 format check via phonenumbers library — free, catches typos.
  2. M6 reservation lock (15-min Redis SETNX on slot) — free, anti-squat.
  3. STK push as authoritative validator — wrong phone → STK fails → ADR-0007's PAYMENT_FAILED state recovers the slot. Free for paying flows.
  4. Tenant dashboard "unverified" flag for lead-capture-only sessions that never reach payment.

SMS OTP is explicitly not in the M10 default flow. Justification:

  • Africa's Talking SMS in Kenya costs ~$0.003 per OTP send — small but per-booking, compounds at pilot scale.
  • The M-Pesa STK push is a stronger validator than OTP for paying flows (OTP only proves SIM possession; STK proves M-Pesa account ownership and phone validity simultaneously).
  • The threat model is typos and spam-bookings, not account takeover.

When SMS OTP becomes worth adding (defer to M11+):

  • Tenants with no-show fees (dental clinics charging deposits).
  • Lead-capture-only tenants who want pre-validated phone leads.

For both: opt-in per-tenant flag phone_otp_required BOOLEAN. Not in M10.

D9. Channel-switch primitive (bidirectional)

Tenants may opt into channel steering (per-tenant flag channel_steering_enabled BOOLEAN DEFAULT FALSE).

Tier-2 → Tier-1 (lead-capture pivot): When a web/IG/Messenger visitor abandons or refuses phone, the FSM emits a channel_switch directive that renders as a one-tap link (https://wa.me/{tenant.whatsapp_display_number}?text=Continuing+from+web). The dispatcher pre-stages a single-use customer_sessions.handoff_token (24h TTL) so the next inbound WhatsApp message carrying the token in its first turn resumes the conversation context.

Tier-1 → Tier-2 (cost optimisation): The agent may elect to send a deep-link reply (widget.ratiba.chat/{slug}?resume=<token>) when a rich-media response would be awkward in WhatsApp text, OR when the 24h WhatsApp window is about to close. The customer taps the link, lands on the widget, conversation history loads server-side via the resume token.

Conversation thread semantics (re-affirms ADR-0003): channel-switch = same customer, fresh thread on the new channel, agent context-loaded with prior conversation as background. Not a single thread spanning channels — that breaks LangGraph's saver invariants.
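The Tier-2 → Tier-1 handoff can be sketched as a single-use token with a 24h TTL. The in-memory `_tokens` dict stands in for `customer_sessions.handoff_token`, and appending the token to the pre-filled WhatsApp text is an assumption: the ADR says the first inbound turn carries the token but does not say how it gets there. The number is assumed to be passed as digits only.

```python
import secrets
import time
from urllib.parse import quote_plus

HANDOFF_TTL_SECONDS = 24 * 60 * 60

# In-memory stand-in for customer_sessions.handoff_token (hypothetical shape):
# token -> (session_id, expires_at)
_tokens: dict = {}


def issue_handoff(session_id: str, whatsapp_number: str, now=None):
    now = time.time() if now is None else now
    token = secrets.token_urlsafe(16)
    _tokens[token] = (session_id, now + HANDOFF_TTL_SECONDS)
    # Assumption: the token rides along in the pre-filled message text.
    text = quote_plus(f"Continuing from web {token}")
    return token, f"https://wa.me/{whatsapp_number}?text={text}"


def redeem_handoff(token: str, now=None):
    now = time.time() if now is None else now
    entry = _tokens.pop(token, None)  # pop makes the token single-use
    if entry is None:
        return None
    session_id, expires_at = entry
    return session_id if now <= expires_at else None
```

Consistent with the thread semantics above, a successful redeem would only return the session whose history is loaded as background; the WhatsApp conversation still starts a fresh thread.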

D10. Outbound reminder routing rule

Reminders (day-before appointment, post-booking utility messages) follow this priority order:

    def route_outbound_reminder(customer, message):
        # `channel` is assumed to be the WhatsApp adapter from the D2 registry.
        if customer.has_whatsapp and channel.is_within_session_window(customer.thread):
            send_via_whatsapp(...)  # free in-window
        elif customer.has_phone:
            send_via_sms(...)  # ~$0.003/msg via Africa's Talking (Kenya)
        else:
            log.warning("no reachable channel")  # skip: nothing is sent

SMS is a NotificationSink, not a Channel. It is outbound-only, does not enter the FSM, and lives at app/notifications/sms.py. Customers replying to an SMS reminder don't hit the dispatcher: replies go nowhere, or to the tenant's manual SMS inbox if the tenant chooses to monitor one. This keeps the Channel abstraction clean (channels = bidirectional conversational surfaces).

The Africa's Talking integration already exists for voice (M7's SIP bridge) — adding outbound SMS is a one-API-key extension, not a new vendor.

D11. Per-tenant credentials + project-level webhook secrets

Mirrors ADR-0008's WhatsApp pattern. New columns on public.tenants:

  • instagram_business_account_id TEXT (nullable until provisioned)
  • instagram_access_token TEXT (encrypted at rest)
  • instagram_enabled BOOLEAN DEFAULT FALSE
  • facebook_page_id TEXT
  • facebook_page_access_token TEXT
  • messenger_enabled BOOLEAN DEFAULT FALSE
  • widget_primary_color TEXT DEFAULT '#0F766E'
  • widget_logo_url TEXT NULLABLE
  • widget_display_name TEXT NULLABLE (falls back to tenants.name)
  • channel_steering_enabled BOOLEAN DEFAULT FALSE
  • phone_otp_required BOOLEAN DEFAULT FALSE (for D8 future-opt-in)
  • reminder_lead_seconds INT DEFAULT 86400 (24h before appointment)

Webhook authentication: two new project-level secrets alongside the existing WHATSAPP_APP_SECRET:

  • INSTAGRAM_APP_SECRET — HMAC key for /api/v1/webhooks/instagram.
  • MESSENGER_APP_SECRET — HMAC key for /api/v1/webhooks/messenger.

Separate apps yield cleaner Meta permission scopes during initial provisioning. They can be merged later if Meta App configuration permits; ADR-0009 does not preclude consolidation.

Per-tenant routing happens after signature validation, by reading the Meta entity ID from the trusted payload (IG Business Account ID for Instagram; Page ID for Messenger) and looking up the corresponding row in public.tenants.
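The validate-then-route order can be sketched as two small functions. The `sha256=<hex>` header format and the `entry[].id` payload field reflect how Meta webhooks are commonly described, but treat both as assumptions to check against Meta's documentation; `verify_signature`, `route_tenant`, and the `tenants_by_entity_id` dict (standing in for the `public.tenants` lookup) are hypothetical names.

```python
import hashlib
import hmac
import json
from typing import Optional


def verify_signature(app_secret: str, raw_body: bytes, header: str) -> bool:
    """Check an 'sha256=<hex>' signature header against the raw request body."""
    scheme, _, received = header.partition("=")
    if scheme != "sha256":
        return False
    expected = hmac.new(app_secret.encode(), raw_body, hashlib.sha256).hexdigest()
    # Constant-time comparison; bytes avoid errors on non-ASCII header input.
    return hmac.compare_digest(expected.encode(), received.encode())


def route_tenant(raw_body: bytes, tenants_by_entity_id: dict) -> Optional[str]:
    """Call only after verify_signature succeeded, so the payload is trusted."""
    payload = json.loads(raw_body)
    for entry in payload.get("entry", []):
        # Assumed field: the IG Business Account ID or Page ID of the recipient.
        tenant = tenants_by_entity_id.get(entry.get("id"))
        if tenant is not None:
            return tenant
    return None
```

The signature must be computed over the raw bytes as received; re-serialising parsed JSON before hashing would change the byte sequence and make valid requests fail.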

D12. Anti-abuse posture for the web widget

The widget is exposed without authentication (anonymous visitors must be able to read it before they're known). Risk surface + mitigations:

| Risk | Mitigation |
|---|---|
| DoS via WS connection spam | Per-IP connection rate limit: 10/min |
| Cost abuse via message spam (LLM tokens) | Per-session message rate limit: 30 turns/hour while anonymous, unlimited post-phone-capture; ADR-0005 per-tenant cost ceiling already enforced via contextvar |
| Fake-phone bookings | Structurally bounded: STK push goes to entered phone; wrong number = no payment = no booking + slot auto-released by M6 reservation lock |
| Cross-tenant injection | Routes are tenant-scoped from URL onwards (widget.ratiba.chat/{slug} and /api/v1/channels/web/ws/{slug}); cookie name is tenant-prefixed (rtb_widget_session_<slug>) |
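The two rate limits in the table could be expressed with one sliding-window limiter. This is an in-process sketch with hypothetical names; a multi-worker deployment would presumably need a shared store, which the ADR does not specify.

```python
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most `limit` events per key within the last `window_seconds`."""

    def __init__(self, limit: int, window_seconds: float):
        self.limit = limit
        self.window = window_seconds
        self._hits = defaultdict(deque)  # key -> timestamps of accepted events

    def allow(self, key: str, now=None) -> bool:
        now = time.monotonic() if now is None else now
        hits = self._hits[key]
        # Drop timestamps that have aged out of the window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            return False
        hits.append(now)
        return True


# Limits taken from the table above.
connections_per_ip = SlidingWindowLimiter(limit=10, window_seconds=60)
turns_per_session = SlidingWindowLimiter(limit=30, window_seconds=3600)


def allow_turn(session_id: str, phone_captured: bool, now=None) -> bool:
    # The per-session cap applies only while the visitor is anonymous.
    return phone_captured or turns_per_session.allow(session_id, now)
```

Rejected events are not recorded, so a client that keeps hammering the endpoint does not extend its own lockout beyond the window.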

Consequences

Positive

  • Adding new channels in the future (TikTok DM if Meta-style API matures, Telegram, RCS) becomes a one-row-plus-adapter exercise rather than a dispatcher refactor.
  • Tenants with web traffic get lead capture and full booking on the cheapest channel — no platform fees.
  • Channel-switch primitive lets tenants steer customers between WhatsApp and web based on conversation needs (rich media, cost windows).
  • The voice channel (most expensive per minute) gains a "continue on web" escape hatch for customers willing to switch surfaces.
  • Cross-channel customer history (same person on IG today, WhatsApp tomorrow) gives the agent richer context without breaking thread isolation.

Negative

  • New schema migrations are required — three new tenant tables + ~12 new columns on public.tenants. Wave 0 of M10 carries this work.
  • Two new Meta App secrets to manage in production deployment configuration.
  • The widget's frontend route lives in frontend/app/widget/[tenant_slug]/ alongside the M9 admin dashboard — same Next.js app, same dev server. Frontend dev-server collision risk during M10 dispatch is low (different routes) but worth noting.
  • Cookie consent / privacy posture for the web widget needs a clear GDPR baseline (cookies are session-functional, not tracking — should qualify as "essential" under GDPR but the determination warrants a privacy ADR before pilot scale).

Reversibility

The substrate is reversible by design:

  • Disabling a channel = setting <channel>_enabled = FALSE per tenant or removing the channel from the registry globally. No FSM impact.
  • Removing the entire channel-agnostic refactor would require unwinding the Channel protocol and inlining channel-specific branches back into the dispatcher — but this would be reverting a codification, not an irreversible architectural commitment. The substrate already existed informally before this ADR.

Open questions deferred to future ADRs

  • ADR-0010 (potential): Privacy and cookie consent posture. Cookies on the web widget are session-functional, not tracking, and should qualify as "essential" under GDPR — but this needs a clear ADR before pilot scale. Includes: cookie banner UX, retention policy, customer data export/deletion flows, cross-tenant data isolation in identity table.
  • ADR-0011 (potential): Phone-OTP gating — when SMS OTP becomes worth the per-message cost (no-show-fee tenants, lead-capture-only tenants). Will codify the threat model + cost trade-off.
  • ADR-0012 (potential): Widget embeddable script (Phase 2). Once pilot validates the standalone-link Phase 1, the embeddable <script> tag pattern (iframe injection, postMessage lifecycle, cross-domain cookie strategy) needs its own architectural commitment.

Implementation

Land in M10: Channel-Agnostic Tier-2 Expansion. Plan at docs/superpowers/plans/2026-05-05-m10-channel-agnostic-tier2-expansion.md. 12 tasks across 5 waves; Wave 2 = 4-way parallel (web + IG + Messenger + SMS sink); estimated +80–100 backend tests and +15–25 frontend Vitest tests post-M10.

Pre-flight: M9+1 must be COMPLETE per STATE.md before Wave 0 dispatch. (Already satisfied as of 2026-05-05.)