ADR-0010: Tenant Self-Service Customisation

Status: Accepted
Date: 2026-05-06

Context

By the close of M10 (2026-05-06), Ratiba's customer-facing pipeline runs on a single channel-agnostic substrate across five surfaces: WhatsApp, voice, web widget, Instagram DM, Messenger DM. Every booking flows through one FSM, one identity resolver, one set of prompts. The remaining gap before pilot deployment is everything behind the agent:

Tenants cannot self-onboard their service catalog. New tenants hand their menu / price list to Adrian, who runs SQL inserts by hand. The bottleneck on tenant-acquisition velocity is now the founder's keyboard.
Every Ratiba tenant runs the same agent personality. The dental clinic and the spa deliver indistinguishable greetings, cancellation tone, and upsell behaviour. Tenants who try the product cannot make it feel theirs.
There is no cross-sell motion. Customer books a manicure; the pedicure slot ten minutes later sits empty. The agent never asks.

Each gap has a different shape, but they share a thesis: tenants should be able to customise the agent — its catalog, its personality, its sales motion — without ever opening a developer console. Where information density permits, this happens through the same conversational interfaces customers use; where bulk data entry is unavoidable (multi-axis dial config, image-based catalog onboarding), form-based UIs ship alongside.

Industry practice for conversational AI aimed at SMBs has, as of 2026, converged on curated tunables rather than free-form prompts: an SMB owner who can articulate "warm, not formal" cannot debug a system message. Free-text overrides also demand a moderation pipeline; for a pilot scoped to East African SMBs, that pipeline is unnecessary plumbing.

The cost picture also matters. Catalog onboarding via Anthropic Vision costs ~$0.01–0.03 per image, a one-time cost per tenant. Cross-sell adds ~$0 marginal cost per conversation when implemented as a SQL relation lookup against an LLM-inferred graph built once at onboarding. This ADR commits to designs that keep these costs structurally bounded.

Decisions

D1. Three-prong scope

M11 ships three distinct customisation surfaces, each with its own shape but a shared thesis.

Bulk catalog onboarding — tenants upload an existing menu / price list (image, multi-image, PDF, plain text, CSV). A vision LLM extracts services + a second LLM pass infers complementary / alternative / sequential relations. Admin reviews + commits. Bulk INSERT into <tenant>.services and <tenant>.service_relations.
Conversational steady-state — slash commands + natural language for catalog edits, stats queries, dial tweaks. Reuses M9's AdminCommandHandler infrastructure with new commands. Both slash and NL paths converge on the same CommandOutcome shape; confirmation gating per ADR-0005 safety_class is preserved on every mutation regardless of entry mode.
Per-tenant agent personality + sales-strategy tuning — six curated dials (Tone / Greeting / Upsell / Cancellation tone / Honorific / Cross-sell) with per-vertical defaults and tenant overrides. Single dashboard page at /admin/personality with save-button per section.

D2. Curated-dial-only personality v1

Personality customisation in v1 ships as curated dials only — no free-text overrides.

The expressiveness trade-off:

Approach	Risk	Expressiveness	M11 verdict
Curated dials only (v1)	Low — every combo is enumerable + testable	Limited — bounded combinations	SHIP
Curated dials + sandboxed free-text	Medium — moderation pipeline needed	Substantially higher	Defer to v2 if pilot demands
Full prompt CMS	High — tenants can break or weaponise their agent	Maximal but unused by SMBs	Wrong scale for SMB pilot

ADR-0005's "single bilingual intent-classifier prompt" commitment is preserved — M11 doesn't fork prompts per tenant. Instead, prompts gain user-template variables that read from tenant config. The system message stays single-source and single-cached (D9).

D3. The six dials — value sets

Dial	Values	Default-source
Tone	`warm` / `professional` / `playful`	per-vertical YAML
Greeting	`default-bilingual` / `custom-string` (≤140 chars)	per-vertical YAML
Upsell	`never` / `suggest-once-after-confirm` / `suggest-during-slot-collection`	per-vertical YAML
Cancellation tone	`forgiving` / `neutral` / `firm`	per-vertical YAML
Honorific	`first-name` / `formal-en` (Mr/Ms/Mx) / `formal-sw` (Bwana/Bibi)	per-vertical YAML
Cross-sell	`never` / `related-only` / `full-suggest`	per-vertical YAML

Dial 3 (Upsell) gates Dial 6 (Cross-sell) compositionally: when Upsell = never, cross-sell never fires regardless of Cross-sell value. Validation of dial values lives at the application layer (PersonalityConfig, M11 T1) so that YAML-vs-DB drift fails loudly at load time rather than silently producing malformed prompts.
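To make the load-time validation and the Upsell-gates-Cross-sell rule concrete, here is a minimal sketch of what PersonalityConfig could look like. The enum classes, the `from_raw` constructor, the `cross_sell_active` property, and the choice to store the greeting as either the literal `default-bilingual` or the custom text itself are all illustrative assumptions, not the M11 T1 implementation.

```python
from dataclasses import dataclass
from enum import Enum


class Tone(str, Enum):
    WARM = "warm"
    PROFESSIONAL = "professional"
    PLAYFUL = "playful"


class Upsell(str, Enum):
    NEVER = "never"
    SUGGEST_ONCE_AFTER_CONFIRM = "suggest-once-after-confirm"
    SUGGEST_DURING_SLOT_COLLECTION = "suggest-during-slot-collection"


class CancellationTone(str, Enum):
    FORGIVING = "forgiving"
    NEUTRAL = "neutral"
    FIRM = "firm"


class Honorific(str, Enum):
    FIRST_NAME = "first-name"
    FORMAL_EN = "formal-en"
    FORMAL_SW = "formal-sw"


class CrossSell(str, Enum):
    NEVER = "never"
    RELATED_ONLY = "related-only"
    FULL_SUGGEST = "full-suggest"


MAX_GREETING_CHARS = 140  # custom-string greetings are capped at 140 chars


@dataclass(frozen=True)
class PersonalityConfig:
    tone: Tone
    greeting: str  # assumption: "default-bilingual" or the custom text itself
    upsell: Upsell
    cancellation_tone: CancellationTone
    honorific: Honorific
    cross_sell: CrossSell

    @classmethod
    def from_raw(cls, raw: dict) -> "PersonalityConfig":
        """Build from merged YAML/DB values; unknown values raise ValueError."""
        greeting = raw["greeting"]
        if not isinstance(greeting, str) or not 0 < len(greeting) <= MAX_GREETING_CHARS:
            raise ValueError(f"greeting must be 1-{MAX_GREETING_CHARS} characters")
        # Enum(value) raises ValueError for anything outside the value set,
        # so YAML-vs-DB drift fails at load time instead of reaching a prompt.
        return cls(
            tone=Tone(raw["tone"]),
            greeting=greeting,
            upsell=Upsell(raw["upsell"]),
            cancellation_tone=CancellationTone(raw["cancellation_tone"]),
            honorific=Honorific(raw["honorific"]),
            cross_sell=CrossSell(raw["cross_sell"]),
        )

    @property
    def cross_sell_active(self) -> bool:
        # Dial 3 gates Dial 6: Upsell=never disables cross-sell outright.
        return self.upsell is not Upsell.NEVER and self.cross_sell is not CrossSell.NEVER
```

Using `str`-valued enums keeps each member equal to its YAML spelling, so the same value set serves both parsing and prompt rendering.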

D4. Per-vertical defaults via YAML — no DB defaults table

Vertical defaults ship as a hardcoded YAML file at app/prompts/personality_defaults.yaml. There is no public.personality_vertical_defaults table.

Walkthrough Q4 weighed three options:

YAML only — single source of truth, ships with code, version-controlled, no drift risk.
DB defaults table seeded from YAML — SQL-queryable, but introduces a drift risk between personality_defaults.yaml and the seeded snapshot.
Hybrid: YAML source + seeded snapshot for SQL queryability — solves drift via "rebuild snapshot at deploy" but adds operational complexity for a query path no caller in v1 actually exercises.

Adrian chose option 1. Pure YAML, loaded into a PersonalityConfig contextvar cache at request time. The 8 canonical verticals (barbershop, dental, legal, medical, physio, salon, spa, tutoring) each get a block in the YAML; a tenant's vertical selects its block at config-merge time. Tenant overrides land in the new <tenant>.tenant_personality_config table (D10); NULL columns inherit the YAML default.
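The config-merge step can be sketched as a per-dial overlay in which a NULL (None) tenant column falls back to the vertical's YAML block. The dict below stands in for the parsed personality_defaults.yaml; its values and the `merge_personality` name are hypothetical, and only two of the eight verticals are shown.

```python
from typing import Any, Mapping, Optional

DIALS = ("tone", "greeting", "upsell", "cancellation_tone", "honorific", "cross_sell")

# Stand-in for the parsed personality_defaults.yaml (illustrative values only).
VERTICAL_DEFAULTS: dict[str, dict[str, str]] = {
    "dental": {"tone": "professional", "greeting": "default-bilingual", "upsell": "never",
               "cancellation_tone": "neutral", "honorific": "formal-en", "cross_sell": "never"},
    "salon": {"tone": "warm", "greeting": "default-bilingual", "upsell": "suggest-once-after-confirm",
              "cancellation_tone": "forgiving", "honorific": "first-name", "cross_sell": "related-only"},
}


def merge_personality(vertical: str, overrides: Optional[Mapping[str, Any]]) -> dict[str, str]:
    """Overlay a tenant_personality_config row onto its vertical's defaults."""
    if vertical not in VERTICAL_DEFAULTS:
        # Vertical names must match the public.tenants.vertical CHECK values.
        raise KeyError(f"no personality defaults for vertical {vertical!r}")
    merged = dict(VERTICAL_DEFAULTS[vertical])
    for dial in DIALS:
        value = (overrides or {}).get(dial)
        if value is not None:  # NULL column -> inherit the YAML default
            merged[dial] = value
    return merged
```

An empty or missing override row yields the vertical defaults unchanged, which matches the Reversibility note that dials fall back to vertical defaults when the table is empty.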

Vertical names match the public.tenants.vertical CHECK constraint verbatim — physio not physiotherapy, medical not medical-clinic.

D5. LLM-inferred relation graph at onboarding

A second LLM pass at catalog import time infers complementary, alternative, and sequential relations between services, stored deterministically in <tenant>.service_relations.

Three approaches considered:

Manual — admin pins every pair. Friction-laden for pilot tenants with 30+ services.
Runtime LLM — infer relations on every cross-sell decision. Token-cost prohibitive at scale; defeats cost discipline.
Inferred at onboarding (chosen) — one-shot LLM pass at import time produces a deterministic graph; runtime cross-sell is a SQL lookup against that graph.

The graph stores three relation types:

complementary — used by v1 cross-sell (manicure ↔ pedicure).
alternative — populated for future slot-collection disambiguation; unused in v1.
sequential — populated for future reminder/upsell campaigns ("your hair appointment was 4 weeks ago"); unused in v1.

Each relation carries a confidence (0.0–1.0) and a free-text reason (LLM rationale). Source is one of llm_inferred, tenant_pinned, or association_rule. Post-pilot, a nightly χ²-significance miner over real booking data augments the graph with source='association_rule'; tenant-pinned relations always win at lookup.

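Idempotency: re-running inference on the same catalog is expected to yield the same pairs. temperature=0 makes the LLM's output close to deterministic, although Anthropic does not guarantee identical output across calls even at temperature 0.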
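The runtime side of D5 is a lookup, not an LLM call. The sketch below filters the graph to complementary relations for one service, treats complementary as symmetric, and lets tenant_pinned rows win over other sources for the same pair. The in-memory `Relation` rows and the `association_rule`-over-`llm_inferred` ordering are assumptions; in production this would be a SQL query against <tenant>.service_relations.

```python
from dataclasses import dataclass

# tenant_pinned always wins; ranking association_rule above llm_inferred is an assumption.
SOURCE_PRIORITY = {"tenant_pinned": 0, "association_rule": 1, "llm_inferred": 2}


@dataclass(frozen=True)
class Relation:
    from_service: str
    to_service: str
    relation_type: str  # complementary | alternative | sequential
    confidence: float  # 0.0-1.0
    source: str  # llm_inferred | tenant_pinned | association_rule
    reason: str = ""  # LLM rationale


def _rank(r: Relation) -> tuple[int, float]:
    # Lower sorts first: better source, then higher confidence.
    return (SOURCE_PRIORITY[r.source], -r.confidence)


def complementary_for(service: str, relations: list[Relation]) -> list[tuple[str, Relation]]:
    """Return (other_service, winning_relation) pairs, best first."""
    best: dict[str, Relation] = {}
    for r in relations:
        if r.relation_type != "complementary":
            continue  # alternative / sequential are unused by v1 cross-sell
        if r.from_service == service:
            other = r.to_service
        elif r.to_service == service:
            other = r.from_service  # manicure <-> pedicure reads both ways
        else:
            continue
        current = best.get(other)
        if current is None or _rank(r) < _rank(current):
            best[other] = r
    return sorted(best.items(), key=lambda item: _rank(item[1]))
```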

D6. Slot-aware cross-sell with provisional atomic pair-reservation

Cross-sell offers a complementary service only when an adjacent slot is available, locks both slots atomically while the customer decides, and releases atomically on decline / timeout.

Mechanics locked in v1:

Decision	v1 commitment
Adjacency window	Same-visit only: candidate slot within ±15 min of primary
Staff swap	Allowed (different specialist for nails vs face is the typical salon pattern)
Pricing	Simple sum (no bundle discounts in v1)
Cascade depth	Cap at 1 cross-sell per booking conversation
"Yes" arrives but slot taken	Provisional atomic pair-reservation extends M6 SETNX from one slot to ordered pair; on decline / timeout, release
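The adjacency and pricing rows of the table can be sketched as two small functions. The ADR does not define exactly what "within ±15 min" measures, so this sketch assumes it is the gap between the two slots, before or after the primary, with overlapping slots excluded. `Slot`, `adjacent_slots`, and `bundle_price` are hypothetical names.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from decimal import Decimal

ADJACENCY_WINDOW = timedelta(minutes=15)


@dataclass(frozen=True)
class Slot:
    start: datetime
    end: datetime
    staff_id: str


def adjacent_slots(primary: Slot, candidates: list[Slot]) -> list[Slot]:
    """Candidate slots that form the same visit as `primary`.

    Assumption: the gap between the slots (either side) is at most 15 minutes.
    Staff swaps are allowed, so staff_id is deliberately not compared.
    """
    result = []
    for c in candidates:
        if c.start >= primary.end:
            gap = c.start - primary.end  # candidate follows the primary
        elif c.end <= primary.start:
            gap = primary.start - c.end  # candidate precedes the primary
        else:
            continue  # overlaps the primary slot; customer cannot be in both
        if gap <= ADJACENCY_WINDOW:
            result.append(c)
    return result


def bundle_price(primary_price: Decimal, secondary_price: Decimal) -> Decimal:
    return primary_price + secondary_price  # simple sum; no bundle discounts in v1
```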

The pair-reservation extends M6's existing Redis SETNX primitive: a new reserve_pair(primary_slot, secondary_slot, ttl_seconds=30) acquires both slots atomically (or rolls back fully) and a paired release_pair(token) releases both.
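A minimal sketch of the all-or-nothing pair acquisition, written against the `set(name, value, ex=..., nx=True)`, `get`, and `delete` calls that redis-py clients expose. The key scheme, the `PairReservation` return type (the ADR's `release_pair` takes only a token), and the in-memory stand-in client are hypothetical. Note that redis-py returns bytes from `get` unless the client uses `decode_responses=True`.

```python
import uuid
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PairReservation:
    token: str
    keys: tuple[str, str]


def _slot_key(slot_id: str) -> str:
    return f"slot-lock:{slot_id}"  # hypothetical key scheme


def _release_if_owner(client, key: str, token: str) -> None:
    # GET-then-DELETE is not atomic; production code would use a Lua script so a
    # lock that expired and was re-acquired by someone else is never deleted.
    if client.get(key) == token:
        client.delete(key)


def reserve_pair(client, primary_slot: str, secondary_slot: str,
                 ttl_seconds: int = 30) -> Optional[PairReservation]:
    """Lock primary then secondary with SET NX EX; roll back fully on failure."""
    token = uuid.uuid4().hex
    first, second = _slot_key(primary_slot), _slot_key(secondary_slot)
    if not client.set(first, token, ex=ttl_seconds, nx=True):
        return None
    if not client.set(second, token, ex=ttl_seconds, nx=True):
        _release_if_owner(client, first, token)  # undo the half-acquired pair
        return None
    return PairReservation(token, (first, second))


def release_pair(client, reservation: PairReservation) -> None:
    for key in reservation.keys:
        _release_if_owner(client, key, reservation.token)


class InMemorySetNx:
    """Tiny stand-in for a Redis client, for exercising the sketch (ignores TTL)."""

    def __init__(self) -> None:
        self.store: dict[str, str] = {}

    def set(self, name, value, ex=None, nx=False):
        if nx and name in self.store:
            return None  # redis-py returns None when NX blocks the write
        self.store[name] = value
        return True

    def get(self, name):
        return self.store.get(name)

    def delete(self, *names):
        return sum(self.store.pop(n, None) is not None for n in names)
```

The 30-second TTL means an abandoned offer frees both slots even if no explicit release ever runs.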

The FSM extension introduces three new nodes — CROSS_SELL_OFFER, CROSS_SELL_RESPONSE, BUNDLE_CONFIRM — between CONFIRM and PAYMENT_PENDING. When find_cross_sell_options() returns no candidates, the FSM transitions straight to PAYMENT_PENDING, preserving M5/M10 behaviour for catalogs without inferred relations.
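The branch points of the extended FSM can be sketched as two pure transition functions. The `State` enum and function names are hypothetical; the assumptions that the pair is already locked when the offer goes out, and that a decline or timeout continues to PAYMENT_PENDING for the primary booking alone, are this sketch's reading of D6, not confirmed M11 behaviour.

```python
from enum import Enum, auto
from typing import Sequence


class State(Enum):
    CONFIRM = auto()
    CROSS_SELL_OFFER = auto()
    CROSS_SELL_RESPONSE = auto()
    BUNDLE_CONFIRM = auto()
    PAYMENT_PENDING = auto()


def after_confirm(cross_sell_options: Sequence[str], already_offered: bool) -> State:
    """Leave CONFIRM: offer a cross-sell only if there is something to offer."""
    if already_offered:
        return State.PAYMENT_PENDING  # cascade depth capped at 1 per conversation
    if not cross_sell_options:
        return State.PAYMENT_PENDING  # no inferred relations: M5/M10 path unchanged
    return State.CROSS_SELL_OFFER


def after_cross_sell_response(reply: str) -> State:
    """Leave CROSS_SELL_RESPONSE given the customer's reply."""
    # Assumption: the pair was reserved when the offer was sent, so "yes" can go
    # straight to bundle confirmation; "no" or a timeout releases the pair.
    return State.BUNDLE_CONFIRM if reply == "yes" else State.PAYMENT_PENDING
```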

D7. Vision LLM via Anthropic; never auto-fill price (safety floor)

M11 adds a vision role to LLMRouter, routed exclusively through Anthropic's vision-capable Sonnet model. The single non-negotiable safety floor: the LLM never auto-fills a service's price.

Why Anthropic for v1:

Single-call structured extraction (image → JSON) — no separate OCR step.
Native bilingual handling (en/sw mixed) outperforms Tesseract+LLM stacks on Swahili.
Cost: ~$0.01–0.03 per image. Negligible for once-per-tenant onboarding.
LLMRouter (M5) already routes Anthropic; adding a vision role is a one-config change with zero new vendor surface.

Reversibility: the VisionExtractor.extract(image, prompt) → list[ExtractedRow] interface is provider-agnostic. If Anthropic Vision becomes unavailable or pricing shifts, swap to GPT-4 Vision via OpenAI direct without disturbing callers.

A new project Setting ANTHROPIC_VISION_MODEL (default claude-3-5-sonnet-20241022) pins the model. Revisit if Anthropic ships vision-specific Opus / Haiku variants.

The price safety floor is hard:

If vision extraction returns confidence(price) < 0.5, the review screen marks the row red and disables the bulk-accept button for that row until the admin types the price manually.
The LLM gap-filling pass that suggests defaults for missing duration_min / name_sw / description must not suggest a price under any circumstances.

Rationale: M-Pesa STK push operates on the entered price. A hallucinated price is a direct customer-trust bomb and a refund scenario the platform absorbs.
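Both halves of the price floor can be enforced with very little code. The sketch below assumes a review-row model where a missing price is flagged the same way as a low-confidence one, and it implements the gap-filling rule as a whitelist of `duration_min`, `name_sw`, and `description`, so `price` can never be accepted from the LLM. `ReviewRow` and `apply_gap_fill` are hypothetical names.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Optional

PRICE_CONFIDENCE_FLOOR = 0.5
GAP_FILLABLE = frozenset({"duration_min", "name_sw", "description"})  # never "price"


@dataclass
class ReviewRow:
    name: str
    price: Optional[Decimal]  # as extracted by the vision pass
    price_confidence: float
    manual_price: Optional[Decimal] = None  # typed by the admin

    @property
    def flagged(self) -> bool:
        # Assumption: a missing price is treated like a low-confidence one.
        return self.price is None or self.price_confidence < PRICE_CONFIDENCE_FLOOR

    @property
    def bulk_accept_enabled(self) -> bool:
        # A red row stays blocked until the admin enters the price by hand.
        return not self.flagged or self.manual_price is not None


def apply_gap_fill(row: dict, suggestions: dict) -> dict:
    """Accept LLM suggestions only for whitelisted, currently empty fields."""
    filled = dict(row)
    for key, value in suggestions.items():
        if key in GAP_FILLABLE and filled.get(key) is None:
            filled[key] = value
    return filled
```

A whitelist fails safe: a new field added to the suggestion payload is ignored until someone explicitly allows it.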

D8. Idempotent catalog imports with audit trail

Re-uploading the same catalog matches existing rows by (name, tenant_id) and updates rather than duplicates. Every import produces an audit row tagged type='re-import' for matched rows.

Tenants will re-upload menus to refresh prices, add seasonal items, or correct OCR errors. Without idempotency, every upload doubles the catalog. The match key is (name, tenant_id) rather than (name, vertical, tenant_id) because vertical is immutable on tenants — the tenant_id alone scopes uniqueness.

Audit rows land in <tenant>.catalog_imports (the import-level record) and <tenant>.catalog_audits (per-row edit trail). Wave 0 T0 ships both tables alongside the schema changes.
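Match-by-name import semantics can be sketched in memory: a matched name updates the existing row and emits a `re-import` audit row, while an unmatched name inserts. The function name, the `insert` audit type for new rows, and exact (un-normalised) name matching are assumptions; the ADR specifies only the match key and the `re-import` tag. Because each import runs inside one tenant schema, the name alone stands in for (name, tenant_id).

```python
from dataclasses import dataclass


@dataclass
class ImportResult:
    catalog: dict[str, dict]  # service name -> service row, after the import
    audits: list[dict]  # per-row entries destined for catalog_audits


def import_catalog(existing: dict[str, dict], rows: list[dict]) -> ImportResult:
    """Upsert rows by name; never duplicate a service already in the catalog."""
    catalog = {name: dict(svc) for name, svc in existing.items()}  # leave input untouched
    audits = []
    for row in rows:
        name = row["name"]  # assumption: exact match, no case/whitespace folding
        if name in catalog:
            catalog[name].update(row)
            audits.append({"service": name, "type": "re-import"})
        else:
            catalog[name] = dict(row)
            audits.append({"service": name, "type": "insert"})  # hypothetical tag
    return ImportResult(catalog, audits)
```

Since `existing` is left untouched, it is also exactly the pre-import state that D12's snapshot needs.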

D9. Prompt-template parameterisation via user-template variables

Dial values are spliced into prompts as user-template variables, NOT system messages. This preserves Anthropic prompt-cache eligibility on every cache_eligible prompt.

ADR-0005's single-prompt commitment requires a stable system message across every call to a given role. Anthropic's prompt cache only hits when the cached prompt prefix, which includes the system message, matches exactly; per-tenant variation in the system message would therefore produce a cache miss per tenant per call.

The chosen mechanism: extend LLMRouter.complete() with a tenant_personality: dict | None = None parameter. When provided, dial values render into the user-template variables that the existing template-rendering layer already consumes:

[answer_shaper user_template]
Tenant: {tenant_name}
Personality: {personality_directive}   # e.g., "warm and friendly, occasional emoji"
Cancellation tone: {cancellation_tone_directive}
Intent: {intent}
...

System messages on cache_eligible: true prompts stay byte-identical across tenants. A T2 unit-suite test asserts system-message stability across tenants and lints the prompt YAML files to confirm that no dial variable is referenced in any system_message: block.
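A sketch of both halves of D9: dial values render only into the user template, and a lint pass uses `string.Formatter().parse` to find template fields in a system message. The system-message text and all directive strings except "warm and friendly, occasional emoji" (taken from the template example above) are hypothetical, as are the function names.

```python
from string import Formatter

DIAL_VARIABLES = frozenset({"personality_directive", "cancellation_tone_directive"})

# Hypothetical system message; must stay byte-identical for every tenant.
SYSTEM_MESSAGE = "You are a bilingual (en/sw) booking assistant. Follow the user block."

USER_TEMPLATE = (
    "Tenant: {tenant_name}\n"
    "Personality: {personality_directive}\n"
    "Cancellation tone: {cancellation_tone_directive}\n"
    "Intent: {intent}\n"
)

# Illustrative dial -> directive mappings.
TONE_DIRECTIVES = {
    "warm": "warm and friendly, occasional emoji",
    "professional": "courteous and concise, no emoji",
    "playful": "light-hearted and upbeat",
}
CANCELLATION_DIRECTIVES = {
    "forgiving": "reassure the customer, no penalty language",
    "neutral": "state the policy plainly",
    "firm": "state the policy and any fee clearly",
}


def template_fields(template: str) -> set[str]:
    """Names of {placeholders} in a str.format-style template."""
    return {field for _, field, _, _ in Formatter().parse(template) if field}


def lint_system_message(system_message: str) -> set[str]:
    """Dial variables wrongly referenced in a system message (should be empty)."""
    return template_fields(system_message) & DIAL_VARIABLES


def build_messages(tenant_name: str, intent: str, personality: dict) -> tuple[str, str]:
    user = USER_TEMPLATE.format(
        tenant_name=tenant_name,
        personality_directive=TONE_DIRECTIVES[personality["tone"]],
        cancellation_tone_directive=CANCELLATION_DIRECTIVES[personality["cancellation_tone"]],
        intent=intent,
    )
    return SYSTEM_MESSAGE, user  # system half is never tenant-specific
```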

D10. Tenant-level dial scope only (no per-customer-segment in v1)

Dial values apply at the tenant level only. There is no per-customer-segment differentiation in v1: VIP recognition, returning-customer perks, and demographic-driven tone shifts are all deferred.

The dial config table is structurally a singleton per tenant: <tenant>.tenant_personality_config enforces this via a partial unique index on a constant boolean (is_singleton). The shape permits future expansion to multi-row "segment override" rows without breaking the v1 contract — a future ADR can lift the singleton constraint and add a segment-key column.
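The singleton trick can be demonstrated with SQLite, which supports partial indexes, as a stand-in for Postgres (where the column would be `boolean DEFAULT true` and the same `WHERE is_singleton` predicate applies). The column list and index name are assumptions. Only rows with `is_singleton` true are indexed, so v1 gets exactly one such row, while hypothetical future segment rows with `is_singleton` false would not collide.

```python
import sqlite3

SCHEMA = """
CREATE TABLE tenant_personality_config (
    id INTEGER PRIMARY KEY,
    is_singleton BOOLEAN NOT NULL DEFAULT 1,
    tone TEXT, greeting TEXT, upsell TEXT,
    cancellation_tone TEXT, honorific TEXT, cross_sell TEXT  -- NULL = inherit YAML
);
-- Partial unique index: at most one row where is_singleton is true.
CREATE UNIQUE INDEX tenant_personality_config_one_row
    ON tenant_personality_config (is_singleton) WHERE is_singleton;
"""


def create_schema(conn: sqlite3.Connection) -> None:
    conn.executescript(SCHEMA)


def insert_config(conn: sqlite3.Connection, tone: str, is_singleton: bool = True) -> None:
    conn.execute(
        "INSERT INTO tenant_personality_config (is_singleton, tone) VALUES (?, ?)",
        (is_singleton, tone),  # sqlite3 stores Python bools as 0/1
    )
```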

Pilot data collection during M12+ informs whether segmentation is worth the parsing complexity.

D11. Audit retention 90 days (matching ADR-0006)

Both <tenant>.catalog_audits and <tenant>.dial_audits ship with 90-day retention parity to ADR-0006's handoff_log retention.

The daily 3 AM EAT consolidated reaper from ADR-0007 D5 extends to sweep both new tables. Vertical-specific override (medical / dental clinics with longer regulatory retention) is deferred to a future compliance-focused ADR; M11 ships the wide-net 90-day default.
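The reaper's two time calculations, the next 03:00 EAT run and the 90-day cutoff, can be sketched with the standard library alone. EAT is UTC+3 with no daylight-saving time, so a fixed-offset timezone suffices. The function names and the `created_at` row shape are hypothetical; the real sweep would be a `DELETE ... WHERE created_at < cutoff` over both audit tables.

```python
from datetime import datetime, time, timedelta, timezone

EAT = timezone(timedelta(hours=3), "EAT")  # East Africa Time, no DST
AUDIT_RETENTION = timedelta(days=90)  # parity with ADR-0006 handoff_log
REAPER_TIME = time(3, 0)  # daily 3 AM EAT


def next_reaper_run(now: datetime) -> datetime:
    """Next 03:00 EAT strictly after `now` (an aware datetime)."""
    local = now.astimezone(EAT)
    candidate = datetime.combine(local.date(), REAPER_TIME, tzinfo=EAT)
    if candidate <= local:
        candidate += timedelta(days=1)  # today's run already happened
    return candidate


def expired(rows: list[dict], now: datetime) -> list[dict]:
    """Audit rows older than the retention window."""
    cutoff = now - AUDIT_RETENTION
    return [r for r in rows if r["created_at"] < cutoff]
```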

D12. Single-snapshot rollback for bulk imports (7-day window)

<tenant>.catalog_imports.snapshot_jsonb carries the catalog state as it existed immediately before the import. A one-click "undo last import" rollback is supported within a 7-day window post-import.

Pilot tenants will accidentally upload the wrong PDF, miscalibrate the review-screen bulk-accept, or want a "back-out" path the day after a price update. The 7-day window matches typical "review my business changes weekly" cadence; older rollbacks become support tickets. The snapshot is bounded by the per-tenant catalog size (typically <100 services × <1KB JSON each = <100KB) — no separate storage tier needed.
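"Undo last import" reduces to finding the most recent catalog_imports row and checking its age. The sketch below returns the snapshot to restore, or None when there is nothing to undo or the 7-day window has passed. `CatalogImport` and `rollback_target` are hypothetical names; restoring the snapshot into <tenant>.services is left out.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

ROLLBACK_WINDOW = timedelta(days=7)


@dataclass(frozen=True)
class CatalogImport:
    import_id: int
    imported_at: datetime
    snapshot_jsonb: dict  # catalog state immediately before this import


def rollback_target(imports: list[CatalogImport], now: datetime) -> Optional[dict]:
    """Snapshot to restore for 'undo last import', or None if not allowed."""
    if not imports:
        return None
    last = max(imports, key=lambda i: i.imported_at)  # only the latest import is undoable
    if now - last.imported_at > ROLLBACK_WINDOW:
        return None  # outside the 7-day window: becomes a support ticket
    return last.snapshot_jsonb
```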

D13. LLM tool-call confirmation timeout: 5 minutes → silent rollback

Mutation tool calls (slash or NL) ask "yes/no". If the admin walks away without responding, the pending mutation is silently rolled back after 5 minutes. The audit row records reason='timed_out'.

The timeout matches typical "I'll be right back" admin-context duration; longer leaves stale mutations dangling and risks confused state on resume. Silent rollback (rather than nag-prompt) keeps admin attention focused on whatever they actually returned to do.
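The timeout can be sketched as a periodic sweep over pending confirmations: anything older than 5 minutes is dropped and produces only an audit row with reason='timed_out', and no message goes back to the admin. `PendingMutation`, `sweep_pending`, and the slash-command strings in the usage check are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

CONFIRM_TIMEOUT = timedelta(minutes=5)


@dataclass
class PendingMutation:
    mutation_id: str
    command: str  # the slash or NL command awaiting "yes/no"
    asked_at: datetime  # when the confirmation prompt was sent


def sweep_pending(pending: list[PendingMutation],
                  now: datetime) -> tuple[list[PendingMutation], list[dict]]:
    """Split pending mutations into still-live ones and timed-out audit rows."""
    still_pending, audits = [], []
    for m in pending:
        if now - m.asked_at >= CONFIRM_TIMEOUT:
            # Silent rollback: the mutation is discarded and only audited.
            audits.append({"mutation_id": m.mutation_id, "command": m.command,
                           "reason": "timed_out"})
        else:
            still_pending.append(m)
    return still_pending, audits
```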

Reversibility

Every D1–D13 commitment is reversible by design:

D1 three-prong scope — disabling a prong = removing the feature flag (catalog onboarding can be hidden behind the dashboard's /admin/catalog route; conversational steady-state reuses M9's existing handler with new commands removable individually; personality dials default to vertical defaults if the table is empty).
D2 curated dials only — adding free-text overrides in v2 is additive; the curated-dial table doesn't preclude a sibling tenant_personality_overrides table.
D3 dial value sets — the application layer owns enum validation (per Q4 resolution); adding a new dial value is a one-line YAML + one-line PersonalityConfig change.
D4 YAML defaults — replacing with a DB table is additive (seed-from-YAML, drop YAML).
D5 inferred relations — the relation graph is a flat table; rebuilding from scratch on demand is supported by the RelationInferrer interface.
D6 cross-sell — disabling cross-sell tenant-wide = setting Dial 3 (Upsell) to never or Dial 6 (Cross-sell) to never. No FSM impact; the cross-sell-options query returns [] and the FSM transitions straight through.
D7 vision provider swap — VisionExtractor interface isolates Anthropic; swapping to GPT-4 Vision is one-adapter work.
D8 idempotency — re-import-as-update is a SQL semantic; a future "always-insert" mode is configurable per import call.
D9 user-template variables — preserving prompt-cache eligibility was the design driver; the mechanism is the reversible default.
D10 tenant-level scope — adding segment-level dials lifts the singleton constraint; existing rows remain valid.
D11–D13 — all numeric thresholds (90-day retention, 7-day rollback, 5-minute confirm timeout) live as table constants or per-tenant config columns; tuning is a one-config change.

Cross-references

ADR-0005 (Orchestration model) — single-prompt commitment preserved via D9's user-template-variable mechanism. The cache_eligible: true system messages stay byte-identical across tenants.
ADR-0006 (Handoff model) — 90-day audit retention parity for catalog_audits + dial_audits (D11).
ADR-0007 (Payments orchestration) — combined-bundle STK push for cross-sell BUNDLE_CONFIRM extends the existing M-Pesa STK primitive; daily reaper sweeps the new audit tables (D11).
ADR-0009 (Channel-agnostic conversation substrate) — M11 builds on the substrate without modifying it. Dial-driven prompt variation works identically across all five channels because rendering is post-LLM (per-channel rules apply downstream of the dial-aware prompt).

Implementation

Land in M11 — Tenant Self-Service Customisation. Plan at docs/superpowers/plans/2026-05-05-m11-tenant-self-service-customisation.md. Companion design doc at docs/plans/2026-05-05-m11-tenant-self-service-design.md.

15 task commits across 5 waves, with Wave 2 running 5-way parallel; the estimated post-M11 test delta is +119 backend tests and +22 frontend Vitest tests.

Pre-flight: M10 must be COMPLETE per STATE.md before Wave 0 dispatch. (Already satisfied as of 2026-05-06.)

Open questions deferred to future ADRs

Free-text personality overrides + moderation pipeline — pilot data dictates whether the bounded curated-dial surface suffices. Future ADR territory.
Per-customer-segment dials — VIP / returning-customer / demographic-driven tuning. Defer to post-pilot.
Vertical-specific audit retention overrides — medical / dental clinics with regulatory retention requirements above the 90-day default. Future compliance ADR.
Bundle pricing / discount rules — cross-sell uses simple-sum pricing in v1. Tenant-defined bundle discounts deferred to feature work.
Sequential cross-sell campaigns — the sequential relation type is populated at onboarding but unused in v1 cross-sell. Reminder/upsell campaigns over sequential relations are future feature work.
OCR fallback — Anthropic Vision is sole provider in v1. If vision pricing or availability shifts, the provider-agnostic interface enables a swap; a fallback chain (Anthropic → OpenAI → Tesseract) would be future ADR territory.
