ADR-0010: Tenant Self-Service Customisation
Status: Accepted
Date: 2026-05-06
Context
By the close of M10 (2026-05-06), Ratiba's customer-facing pipeline runs on a single channel-agnostic substrate across five surfaces: WhatsApp, voice, web widget, Instagram DM, Messenger DM. Every booking flows through one FSM, one identity resolver, one set of prompts. The remaining gap before pilot deployment is everything behind the agent:
- Tenants cannot self-onboard their service catalog. New tenants hand their menu / price list to Adrian, who runs SQL inserts by hand. The bottleneck on tenant-acquisition velocity is now the founder's keyboard.
- Every Ratiba tenant runs the same agent personality. The dental clinic and the spa deliver indistinguishable greetings, cancellation tone, and upsell behaviour. Tenants who try the product cannot make it feel theirs.
- There is no cross-sell motion. A customer books a manicure; the pedicure slot ten minutes later sits empty. The agent never asks.
Each gap has a different shape, but they share a thesis: tenants should be able to customise the agent — its catalog, its personality, its sales motion — without ever opening a developer console. Where information density permits, this happens through the same conversational interfaces customers use; where bulk data entry is unavoidable (multi-axis dial config, image-based catalog onboarding), form-based UIs ship alongside.
The 2026 industry consensus on conversational AI for SMBs converges on curated tunables, not free-form prompts: the SMB owner who can articulate "warm, not formal" cannot debug a system message. Free-text overrides demand a moderation pipeline; for a pilot scoped to East African SMBs, that pipeline is unnecessary plumbing.
The cost picture also matters. Catalog onboarding via Anthropic Vision costs ~$0.01–0.03 per image, a one-time cost per tenant. Cross-sell adds ~$0 marginal LLM cost per conversation when implemented as a SQL relation lookup against an LLM-inferred graph built once at onboarding. This ADR commits to designs that keep these costs structurally bounded.
Decisions
D1. Three-prong scope
M11 ships three distinct customisation surfaces, each with its own shape but a shared thesis.
- Bulk catalog onboarding: tenants upload an existing menu / price list (image, multi-image, PDF, plain text, CSV). A vision LLM extracts services + a second LLM pass infers complementary / alternative / sequential relations. Admin reviews + commits. Bulk INSERT into <tenant>.services and <tenant>.service_relations.
- Conversational steady-state: slash commands + natural language for catalog edits, stats queries, dial tweaks. Reuses M9's AdminCommandHandler infrastructure with new commands. Both slash and NL paths converge on the same CommandOutcome shape; confirmation gating per ADR-0005 safety_class is preserved on every mutation regardless of entry mode.
- Per-tenant agent personality + sales-strategy tuning: six curated dials (Tone / Greeting / Upsell / Cancellation tone / Honorific / Cross-sell) with per-vertical defaults and tenant overrides. Single dashboard page at /admin/personality with save-button per section.
D2. Curated-dial-only personality v1
Personality customisation in v1 ships as curated dials only — no free-text overrides.
The expressiveness trade-off:
| Approach | Risk | Expressiveness | M11 verdict |
|---|---|---|---|
| Curated dials only (v1) | Low — every combo is enumerable + testable | Limited — bounded combinations | SHIP |
| Curated dials + sandboxed free-text | Medium — moderation pipeline needed | Substantially higher | Defer to v2 if pilot demands |
| Full prompt CMS | High — tenants can break or weaponise their agent | Maximal but unused by SMBs | Wrong scale for SMB pilot |
ADR-0005's "single bilingual intent-classifier prompt" commitment is preserved — M11 doesn't fork prompts per tenant. Instead, prompts gain user-template variables that read from tenant config. The system message stays single-source and single-cached (D9).
D3. The six dials — value sets
| Dial | Values | Default-source |
|---|---|---|
| Tone | warm / professional / playful | per-vertical YAML |
| Greeting | default-bilingual / custom-string (≤140 chars) | per-vertical YAML |
| Upsell | never / suggest-once-after-confirm / suggest-during-slot-collection | per-vertical YAML |
| Cancellation tone | forgiving / neutral / firm | per-vertical YAML |
| Honorific | first-name / formal-en (Mr/Ms/Mx) / formal-sw (Bwana/Bibi) | per-vertical YAML |
| Cross-sell | never / related-only / full-suggest | per-vertical YAML |
Dial 3 (Upsell) gates Dial 6 (Cross-sell) compositionally: when
Upsell = never, cross-sell never fires regardless of Cross-sell value.
Validation of dial values lives at the application layer
(PersonalityConfig, M11 T1) so that YAML-vs-DB drift fails loudly at
load time rather than silently producing malformed prompts.
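As an illustration of the application-layer validation described above, here is a minimal sketch of what a PersonalityConfig could look like. The field names, the `DIAL_VALUES` mapping, and the `effective_cross_sell` property are assumptions made for this example; only the value sets, the 140-character greeting cap, and the Upsell-gates-Cross-sell rule come from this ADR.

```python
from dataclasses import dataclass

# Allowed values per dial, copied from the D3 table. Greeting is checked
# separately because a custom greeting is free text capped at 140 chars.
DIAL_VALUES = {
    "tone": {"warm", "professional", "playful"},
    "upsell": {"never", "suggest-once-after-confirm", "suggest-during-slot-collection"},
    "cancellation_tone": {"forgiving", "neutral", "firm"},
    "honorific": {"first-name", "formal-en", "formal-sw"},
    "cross_sell": {"never", "related-only", "full-suggest"},
}
MAX_GREETING_CHARS = 140


@dataclass(frozen=True)
class PersonalityConfig:
    """Hypothetical shape; the real M11 T1 class may differ."""

    tone: str
    greeting: str  # "default-bilingual" or a custom string
    upsell: str
    cancellation_tone: str
    honorific: str
    cross_sell: str

    def __post_init__(self) -> None:
        # Fail loudly at load time on any value outside the curated set.
        for dial, allowed in DIAL_VALUES.items():
            value = getattr(self, dial)
            if value not in allowed:
                raise ValueError(f"invalid {dial} value: {value!r}")
        if len(self.greeting) > MAX_GREETING_CHARS:
            raise ValueError("greeting exceeds 140 characters")

    @property
    def effective_cross_sell(self) -> str:
        # Upsell gates Cross-sell: Upsell = never switches cross-sell off.
        return "never" if self.upsell == "never" else self.cross_sell
```

Raising in `__post_init__` means a drifted YAML or DB value surfaces when the config object is built, before any prompt is rendered.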
D4. Per-vertical defaults via YAML — no DB defaults table
Vertical defaults ship as a hardcoded YAML file at
app/prompts/personality_defaults.yaml. There is no
public.personality_vertical_defaults table.
Walkthrough Q4 weighed three options:
- YAML only: single source of truth, ships with code, version-controlled, no drift risk.
- DB defaults table seeded from YAML: SQL-queryable, but introduces a drift risk between personality_defaults.yaml and the seeded snapshot.
- Hybrid (YAML source + seeded snapshot for SQL queryability): solves drift via "rebuild snapshot at deploy" but adds operational complexity for a query path no caller in v1 actually exercises.
Adrian chose option 1. Pure YAML, loaded into a PersonalityConfig
contextvar cache at request time. The 8 canonical verticals
(barbershop, dental, legal, medical, physio, salon,
spa, tutoring) each get a block in the YAML; a tenant's vertical
selects its block at config-merge time. Tenant overrides land in the
new <tenant>.tenant_personality_config table (D10); NULL columns
inherit the YAML default.
Vertical names match the public.tenants.vertical CHECK constraint
verbatim — physio not physiotherapy, medical not medical-clinic.
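The config-merge step can be sketched roughly as follows. The `VERTICAL_DEFAULTS` dict stands in for the parsed YAML file, and both its values and the `merge_personality` helper are illustrative assumptions, not the shipped defaults or the actual loader.

```python
from typing import Optional

# Stand-in for two parsed blocks of personality_defaults.yaml.
# The values are illustrative only.
VERTICAL_DEFAULTS = {
    "dental": {"tone": "professional", "upsell": "never", "cross_sell": "never"},
    "spa": {
        "tone": "warm",
        "upsell": "suggest-once-after-confirm",
        "cross_sell": "related-only",
    },
}


def merge_personality(vertical: str, overrides: dict[str, Optional[str]]) -> dict[str, str]:
    """Overlay a tenant's override row on its vertical's default block."""
    if vertical not in VERTICAL_DEFAULTS:
        raise KeyError(f"unknown vertical: {vertical!r}")
    merged = dict(VERTICAL_DEFAULTS[vertical])
    for dial, value in overrides.items():
        if value is not None:  # a NULL column inherits the YAML default
            merged[dial] = value
    return merged
```

For example, a spa tenant that overrides only Tone keeps the spa defaults for every other dial.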
D5. LLM-inferred relation graph at onboarding
A second LLM pass at catalog import time infers complementary,
alternative, and sequential relations between services, stored
deterministically in <tenant>.service_relations.
Three approaches considered:
- Manual — admin pins every pair. Friction-laden for pilot tenants with 30+ services.
- Runtime LLM — infer relations on every cross-sell decision. Token-cost prohibitive at scale; defeats cost discipline.
- Inferred at onboarding (chosen) — one-shot LLM pass at import time produces a deterministic graph; runtime cross-sell is a SQL lookup against that graph.
The graph stores three relation types:
- complementary: used by v1 cross-sell (manicure ↔ pedicure).
- alternative: populated for future slot-collection disambiguation; unused in v1.
- sequential: populated for future reminder/upsell campaigns ("your hair appointment was 4 weeks ago"); unused in v1.
Each relation carries a confidence (0.0–1.0) and a free-text
reason (LLM rationale). Source is one of llm_inferred,
tenant_pinned, or association_rule. Post-pilot, a nightly
χ²-significance miner over real booking data augments the graph with
source='association_rule'; tenant-pinned relations always win at
lookup.
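Idempotency: re-running inference on the same catalog is intended to yield the same
pairs. Inference runs at temperature=0 to minimise sampling variance, although hosted
LLM APIs do not strictly guarantee byte-identical output across calls.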
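A minimal sketch of the runtime lookup, using an in-memory SQLite table as a stand-in for <tenant>.service_relations. The column names, the sample rows, and the reading of "tenant-pinned always wins" as "sorted ahead of every other source" are assumptions for illustration.

```python
import sqlite3

# Tenant-pinned rows sort first; everything else falls back to confidence.
SOURCE_RANK = "CASE source WHEN 'tenant_pinned' THEN 0 ELSE 1 END"


def demo_connection() -> sqlite3.Connection:
    """Build a throwaway table with hypothetical columns and rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE service_relations (
               service_id TEXT, related_id TEXT, relation_type TEXT,
               confidence REAL, reason TEXT, source TEXT)"""
    )
    conn.executemany(
        "INSERT INTO service_relations VALUES (?, ?, ?, ?, ?, ?)",
        [
            ("manicure", "pedicure", "complementary", 0.92, "commonly paired", "llm_inferred"),
            ("manicure", "hand-massage", "complementary", 0.60, "owner choice", "tenant_pinned"),
            ("manicure", "gel-manicure", "alternative", 0.88, "substitute", "llm_inferred"),
        ],
    )
    return conn


def complementary_services(conn: sqlite3.Connection, service_id: str) -> list[str]:
    """Return complementary service ids, tenant-pinned first, no LLM call."""
    rows = conn.execute(
        f"""SELECT related_id FROM service_relations
            WHERE service_id = ? AND relation_type = 'complementary'
            ORDER BY {SOURCE_RANK}, confidence DESC""",
        (service_id,),
    ).fetchall()
    return [row[0] for row in rows]
```

The point of the sketch is that cross-sell candidates come from a plain indexed query, so the per-conversation LLM cost stays at zero.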
D6. Slot-aware cross-sell with provisional atomic pair-reservation
Cross-sell offers a complementary service only when an adjacent slot is available, locks both slots atomically while the customer decides, and releases atomically on decline / timeout.
Mechanics locked in v1:
| Decision | v1 commitment |
|---|---|
| Adjacency window | Same-visit only: candidate slot within ±15 min of primary |
| Staff swap | Allowed (different specialist for nails vs face is the typical salon pattern) |
| Pricing | Simple sum (no bundle discounts in v1) |
| Cascade depth | Cap at 1 cross-sell per booking conversation |
| "Yes" arrives but slot taken | Provisional atomic pair-reservation extends M6 SETNX from one slot to ordered pair; on decline / timeout, release |
The pair-reservation extends M6's existing Redis SETNX primitive: a
new reserve_pair(primary_slot, secondary_slot, ttl_seconds=30)
acquires both slots atomically (or rolls back fully) and a paired
release_pair(token) releases both.
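The acquire-both-or-roll-back behaviour can be sketched with a dict standing in for Redis. `InMemorySlotLocks`, the `now` parameter, and the token format are assumptions for this example; a production version would presumably need a Lua script or transaction so the two-key acquire is atomic on the Redis side.

```python
import time
import uuid
from typing import Optional


class InMemorySlotLocks:
    """Dict-backed stand-in for the Redis SETNX primitive (single process only)."""

    def __init__(self) -> None:
        self._locks: dict[str, tuple[str, float]] = {}  # slot -> (token, expiry)

    def _setnx(self, slot: str, token: str, ttl_seconds: float, now: float) -> bool:
        held = self._locks.get(slot)
        if held is not None and held[1] > now:
            return False  # someone else holds an unexpired lock
        self._locks[slot] = (token, now + ttl_seconds)
        return True

    def reserve_pair(
        self,
        primary_slot: str,
        secondary_slot: str,
        ttl_seconds: float = 30,
        now: Optional[float] = None,
    ) -> Optional[str]:
        """Lock both slots or neither; return a token on success."""
        now = time.monotonic() if now is None else now
        token = uuid.uuid4().hex
        acquired: list[str] = []
        for slot in (primary_slot, secondary_slot):
            if self._setnx(slot, token, ttl_seconds, now):
                acquired.append(slot)
            else:
                for held_slot in acquired:  # roll back fully
                    del self._locks[held_slot]
                return None
        return token

    def release_pair(self, token: str) -> int:
        """Release every slot held under this token; return how many."""
        owned = [slot for slot, (held, _) in self._locks.items() if held == token]
        for slot in owned:
            del self._locks[slot]
        return len(owned)
```

Releasing by token rather than by slot name means a stale conversation cannot free a lock that has since expired and been taken by another customer.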
The FSM extension introduces three new nodes — CROSS_SELL_OFFER,
CROSS_SELL_RESPONSE, BUNDLE_CONFIRM — between CONFIRM and
PAYMENT_PENDING. When find_cross_sell_options() returns no
candidates, the FSM transitions straight to PAYMENT_PENDING,
preserving M5/M10 behaviour for catalogs without inferred relations.
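One way to picture the routing is a small transition function. The node names come from this ADR, but the edges between the three new nodes, the `next_state` signature, and the `offers_made` counter are assumptions chosen to illustrate the no-candidates fall-through and the cap of one cross-sell per conversation.

```python
from enum import Enum


class State(Enum):
    CONFIRM = "CONFIRM"
    CROSS_SELL_OFFER = "CROSS_SELL_OFFER"
    CROSS_SELL_RESPONSE = "CROSS_SELL_RESPONSE"
    BUNDLE_CONFIRM = "BUNDLE_CONFIRM"
    PAYMENT_PENDING = "PAYMENT_PENDING"


MAX_CROSS_SELLS = 1  # cascade depth cap from the D6 table


def next_state(
    state: State,
    *,
    has_candidates: bool = False,
    accepted: bool = False,
    offers_made: int = 0,
) -> State:
    """Hypothetical transition table for the cross-sell extension."""
    if state is State.CONFIRM:
        # No candidates (or cap reached): behave exactly as before M11.
        if has_candidates and offers_made < MAX_CROSS_SELLS:
            return State.CROSS_SELL_OFFER
        return State.PAYMENT_PENDING
    if state is State.CROSS_SELL_OFFER:
        return State.CROSS_SELL_RESPONSE
    if state is State.CROSS_SELL_RESPONSE:
        return State.BUNDLE_CONFIRM if accepted else State.PAYMENT_PENDING
    if state is State.BUNDLE_CONFIRM:
        return State.PAYMENT_PENDING
    raise ValueError(f"no outgoing transition from {state}")
```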
D7. Vision LLM via Anthropic; never auto-fill price (safety floor)
M11 adds a vision role to LLMRouter, routed exclusively through
Anthropic's vision-capable Sonnet model. The single non-negotiable
safety floor: the LLM never auto-fills a service's price.
Why Anthropic for v1:
- Single-call structured extraction (image → JSON) — no separate OCR step.
- Native bilingual handling (en/sw mixed) outperforms Tesseract+LLM stacks on Swahili.
- Cost: ~$0.01–0.03 per image. Negligible for once-per-tenant onboarding.
- LLMRouter (M5) already routes Anthropic; adding a vision role is a one-config change with zero new vendor surface.
Reversibility: the VisionExtractor.extract(image, prompt) → list[ExtractedRow]
interface is provider-agnostic. If Anthropic Vision becomes
unavailable or pricing shifts, swap to GPT-4 Vision via OpenAI direct
without disturbing callers.
A new project Setting ANTHROPIC_VISION_MODEL (default
claude-3-5-sonnet-20241022) pins the model. Revisit if Anthropic
ships vision-specific Opus / Haiku variants.
The price safety floor is hard:
- If vision extraction returns confidence(price) < 0.5, the review screen marks the row red and disables the bulk-accept button for that row until the admin types the price manually.
- The LLM gap-filling pass that suggests defaults for missing duration_min / name_sw / description must not suggest a price under any circumstances.
Rationale: M-Pesa STK push operates on the entered price. A hallucinated price is a direct customer-trust bomb and a refund scenario the platform absorbs.
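The two rules of the price floor might be enforced along these lines. The `ExtractedRow` fields, the helper names, and the treatment of a missing price as "needs manual entry" are assumptions for this sketch; only the 0.5 threshold and the never-suggest-a-price rule are taken from this ADR.

```python
from dataclasses import dataclass
from typing import Optional

PRICE_CONFIDENCE_FLOOR = 0.5


@dataclass
class ExtractedRow:
    """Hypothetical fields; the real ExtractedRow may carry more."""

    name: str
    price: Optional[float]
    price_confidence: float
    admin_typed_price: Optional[float] = None


def needs_manual_price(row: ExtractedRow) -> bool:
    """True while the row must stay red and excluded from bulk-accept."""
    if row.admin_typed_price is not None:
        return False  # the admin has typed the price by hand
    return row.price is None or row.price_confidence < PRICE_CONFIDENCE_FLOOR


def strip_price_suggestions(suggested: dict) -> dict:
    """Drop any price the gap-filling pass returned despite its instructions."""
    return {key: value for key, value in suggested.items() if key != "price"}
```

Filtering the gap-fill output in code, in addition to instructing the model, keeps the floor intact even if the model ignores the prompt.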
D8. Idempotent catalog imports with audit trail
Re-uploading the same catalog matches existing rows by
(name, tenant_id) and updates rather than duplicates. Every import
produces an audit row tagged type='re-import' for matched rows.
Tenants will re-upload menus to refresh prices, add seasonal items,
or correct OCR errors. Without idempotency, every upload doubles the
catalog. The match key is (name, tenant_id) rather than
(name, vertical, tenant_id) because vertical is immutable on
tenants — the tenant_id alone scopes uniqueness.
Audit rows land in <tenant>.catalog_imports (the import-level
record) and <tenant>.catalog_audits (per-row edit trail). Wave 0
T0 ships both tables alongside the schema changes.
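The match-then-update semantic can be sketched with SQLite's upsert. The single shared table with a tenant_id column, the helper names, and the 'import' tag for new rows are assumptions made to keep the example self-contained; only the (name, tenant_id) match key and the 're-import' tag come from this ADR.

```python
import sqlite3


def make_services_table() -> sqlite3.Connection:
    """Throwaway table keyed on (tenant_id, name), mirroring the match key."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE services (
               tenant_id TEXT NOT NULL,
               name TEXT NOT NULL,
               price INTEGER NOT NULL,
               PRIMARY KEY (tenant_id, name))"""
    )
    return conn


def import_rows(conn: sqlite3.Connection, tenant_id: str, rows: list) -> list:
    """Upsert reviewed (name, price) rows; return one audit tag per row."""
    audit = []
    for name, price in rows:
        existing = conn.execute(
            "SELECT 1 FROM services WHERE tenant_id = ? AND name = ?",
            (tenant_id, name),
        ).fetchone()
        conn.execute(
            """INSERT INTO services (tenant_id, name, price) VALUES (?, ?, ?)
               ON CONFLICT (tenant_id, name) DO UPDATE SET price = excluded.price""",
            (tenant_id, name, price),
        )
        audit.append((name, "re-import" if existing else "import"))
    return audit
```

Uploading the same menu twice therefore changes prices in place and leaves the row count unchanged.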
D9. Prompt-template parameterisation via user-template variables
Dial values are spliced into prompts as user-template variables, NOT system messages. This preserves Anthropic prompt-cache eligibility on every cache_eligible prompt.
ADR-0005's single-prompt commitment requires a stable system message across every call to a given role. Anthropic's prompt cache keys on exact-string match of the system message; per-tenant variation in the system message would produce a cache-miss per tenant per call.
The chosen mechanism: extend LLMRouter.complete() with a
tenant_personality: dict | None = None parameter. When provided,
dial values render into the user-template variables that the
existing template-rendering layer already consumes:
[answer_shaper user_template]
Tenant: {tenant_name}
Personality: {personality_directive} # e.g., "warm and friendly, occasional emoji"
Cancellation tone: {cancellation_tone_directive}
Intent: {intent}
...
cache_eligible: true system messages stay byte-identical across
tenants. Test (T2 unit suite) asserts system-message stability across
tenants and lints prompt YAML files for absence of dial-variable
references in the system_message: block.
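The split between a fixed system message and a dial-aware user template, plus the lint check, can be sketched as below. `SYSTEM_MESSAGE`, `build_messages`, and `dial_variables_in` are hypothetical names, and the system-message text is invented for the example; only the template variables mirror the answer_shaper template shown above.

```python
import re

# Hypothetical system message: the same bytes for every tenant.
SYSTEM_MESSAGE = "You shape answers for a booking assistant."

USER_TEMPLATE = (
    "Tenant: {tenant_name}\n"
    "Personality: {personality_directive}\n"
    "Cancellation tone: {cancellation_tone_directive}\n"
    "Intent: {intent}\n"
)

DIAL_VARIABLES = {"personality_directive", "cancellation_tone_directive"}


def build_messages(tenant_name: str, intent: str, tenant_personality: dict) -> dict:
    """Render dial values into the user message only."""
    user = USER_TEMPLATE.format(
        tenant_name=tenant_name, intent=intent, **tenant_personality
    )
    return {"system": SYSTEM_MESSAGE, "user": user}


def dial_variables_in(system_message: str) -> set:
    """Lint helper: dial variables that leaked into a system message."""
    return set(re.findall(r"\{(\w+)\}", system_message)) & DIAL_VARIABLES
```

A unit test in the spirit of the T2 suite would assert that `build_messages` returns an identical `system` string for two different tenants and that `dial_variables_in` is empty for every cache-eligible prompt.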
D10. Tenant-level dial scope only (no per-customer-segment in v1)
Dial values apply at the tenant level only. There is no per-customer-segment differentiation in v1: VIP recognition, returning-customer perks, and demographic-driven tone shifts are all deferred.
The dial config table is structurally a singleton per tenant:
<tenant>.tenant_personality_config enforces this via a partial
unique index on a constant boolean (is_singleton). The shape
permits future expansion to multi-row "segment override" rows
without breaking the v1 contract — a future ADR can lift the
singleton constraint and add a segment-key column.
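The singleton constraint can be demonstrated with SQLite standing in for the production database. The index name, the CHECK constraint, and the two dial columns are assumptions for this sketch; the partial unique index on is_singleton is the mechanism this ADR describes.

```python
import sqlite3


def make_config_table() -> sqlite3.Connection:
    """Create a config table that can hold at most one row."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE tenant_personality_config (
               is_singleton INTEGER NOT NULL DEFAULT 1 CHECK (is_singleton = 1),
               tone TEXT,
               upsell TEXT)"""
    )
    # Every row carries the same constant, so uniqueness allows one row only.
    conn.execute(
        """CREATE UNIQUE INDEX tenant_personality_config_singleton
               ON tenant_personality_config (is_singleton)
               WHERE is_singleton = 1"""
    )
    return conn


def set_tone(conn: sqlite3.Connection, tone: str) -> None:
    """Update the single row in place, creating it on first use."""
    updated = conn.execute("UPDATE tenant_personality_config SET tone = ?", (tone,))
    if updated.rowcount == 0:
        conn.execute("INSERT INTO tenant_personality_config (tone) VALUES (?)", (tone,))
```

A second INSERT fails with an integrity error, so application code has to go through the update path, which is the v1 contract.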
Pilot data collection during M12+ informs whether segmentation is worth the parsing complexity.
D11. Audit retention 90 days (matching ADR-0006)
Both <tenant>.catalog_audits and <tenant>.dial_audits ship
with 90-day retention parity to ADR-0006's handoff_log retention.
The daily 3 AM EAT consolidated reaper from ADR-0007 D5 extends to sweep both new tables. Vertical-specific override (medical / dental clinics with longer regulatory retention) is deferred to a future compliance-focused ADR; M11 ships the wide-net 90-day default.
D12. Single-snapshot rollback for bulk imports (7-day window)
<tenant>.catalog_imports.snapshot_jsonb carries the catalog
state as it existed immediately before the import. A one-click
"undo last import" rollback is supported within a 7-day window
post-import.
Pilot tenants will accidentally upload the wrong PDF, miscalibrate the review-screen bulk-accept, or want a "back-out" path the day after a price update. The 7-day window matches typical "review my business changes weekly" cadence; older rollbacks become support tickets. The snapshot is bounded by the per-tenant catalog size (typically <100 services × <1KB JSON each = <100KB) — no separate storage tier needed.
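A rough sketch of the snapshot-and-undo flow follows. The helper names and the choice to raise once the window has closed are assumptions; the 7-day window and the JSON snapshot of the pre-import catalog come from this ADR.

```python
import json
from datetime import datetime, timedelta, timezone

ROLLBACK_WINDOW = timedelta(days=7)


def snapshot_catalog(catalog: list) -> str:
    """Serialise the catalog as it stands immediately before an import."""
    return json.dumps(catalog, sort_keys=True)


def undo_last_import(snapshot_jsonb: str, imported_at: datetime, now: datetime) -> list:
    """Return the pre-import catalog, or refuse once the window has closed."""
    if now - imported_at > ROLLBACK_WINDOW:
        raise ValueError("rollback window has closed; raise a support ticket")
    return json.loads(snapshot_jsonb)
```

Because only the most recent snapshot is kept, the storage cost stays at one catalog-sized JSON document per import row.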
D13. LLM tool-call confirmation timeout: 5 minutes → silent rollback
Mutation tool calls (slash or NL) ask "yes/no". If the admin
walks away without responding, the pending mutation is silently
rolled back after 5 minutes. The audit row records reason='timed_out'.
The timeout matches typical "I'll be right back" admin-context duration; longer leaves stale mutations dangling and risks confused state on resume. Silent rollback (rather than nag-prompt) keeps admin attention focused on whatever they actually returned to do.
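The timeout sweep could look something like this. `PendingMutation`, `sweep_pending`, and the audit-row shape are hypothetical; the 5-minute threshold and the reason='timed_out' value are from this ADR.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

CONFIRM_TIMEOUT = timedelta(minutes=5)


@dataclass
class PendingMutation:
    """A mutation awaiting the admin's yes/no (hypothetical shape)."""

    command: str
    requested_at: datetime


def sweep_pending(pending: list, now: datetime) -> tuple:
    """Split pending mutations into (still waiting, audit rows for timeouts)."""
    still_waiting = []
    audit_rows = []
    for mutation in pending:
        if now - mutation.requested_at >= CONFIRM_TIMEOUT:
            # Silent rollback: nothing is sent to the admin, only audited.
            audit_rows.append({"command": mutation.command, "reason": "timed_out"})
        else:
            still_waiting.append(mutation)
    return still_waiting, audit_rows
```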
Reversibility
Every D1–D13 commitment is reversible by design:
- D1 three-prong scope: disabling a prong = removing the feature flag (catalog onboarding can be hidden behind the dashboard's /admin/catalog route; conversational steady-state reuses M9's existing handler with new commands removable individually; personality dials default to vertical defaults if the table is empty).
- D2 curated dials only: adding free-text overrides in v2 is additive; the curated-dial table doesn't preclude a sibling tenant_personality_overrides table.
- D3 dial value sets: the application layer owns enum validation (per Q4 resolution); adding a new dial value is a one-line YAML + one-line PersonalityConfig change.
- D4 YAML defaults: replacing with a DB table is additive (seed-from-YAML, drop YAML).
- D5 inferred relations: the relation graph is a flat table; rebuilding from scratch on demand is supported by the RelationInferrer interface.
- D6 cross-sell: disabling cross-sell tenant-wide = setting Dial 3 (Upsell) to never or Dial 6 (Cross-sell) to never. No FSM impact; the cross-sell-options query returns [] and the FSM transitions straight through.
- D7 vision provider swap: the VisionExtractor interface isolates Anthropic; swapping to GPT-4 Vision is one-adapter work.
- D8 idempotency: re-import-as-update is a SQL semantic; a future "always-insert" mode is configurable per import call.
- D9 user-template variables: preserving prompt-cache eligibility was the design driver; the mechanism is the reversible default.
- D10 tenant-level scope: adding segment-level dials lifts the singleton constraint; existing rows remain valid.
- D11–D13: all numeric thresholds (90-day retention, 7-day rollback, 5-minute confirm timeout) live as table constants or per-tenant config columns; tuning is a one-config change.
Cross-references
- ADR-0005 (Orchestration model): single-prompt commitment preserved via D9's user-template-variable mechanism. The cache_eligible: true system messages stay byte-identical across tenants.
- ADR-0006 (Handoff model): 90-day audit retention parity for catalog_audits + dial_audits (D11).
- ADR-0007 (Payments orchestration): combined-bundle STK push for cross-sell BUNDLE_CONFIRM extends the existing M-Pesa STK primitive; daily reaper sweeps the new audit tables (D11).
- ADR-0009 (Channel-agnostic conversation substrate): M11 builds on the substrate without modifying it. Dial-driven prompt variation works identically across all five channels because rendering is post-LLM (per-channel rules apply downstream of the dial-aware prompt).
Implementation
Land in M11 — Tenant Self-Service Customisation. Plan at
docs/superpowers/plans/2026-05-05-m11-tenant-self-service-customisation.md.
Companion design doc at
docs/plans/2026-05-05-m11-tenant-self-service-design.md.
15 task commits across 5 waves; Wave 2 runs 5-way parallel. Estimated test delta post-M11: +119 backend, +22 frontend (Vitest).
Pre-flight: M10 must be COMPLETE per STATE.md before Wave 0 dispatch. (Already satisfied as of 2026-05-06.)
Open questions deferred to future ADRs
- Free-text personality overrides + moderation pipeline: pilot data dictates whether the bounded curated-dial surface suffices. Future ADR territory.
- Per-customer-segment dials: VIP / returning-customer / demographic-driven tuning. Defer to post-pilot.
- Vertical-specific audit retention overrides: medical / dental clinics with regulatory retention requirements above the 90-day default. Future compliance ADR.
- Bundle pricing / discount rules: cross-sell uses simple-sum pricing in v1. Tenant-defined bundle discounts deferred to feature work.
- Sequential cross-sell campaigns: the sequential relation type is populated at onboarding but unused in v1 cross-sell. Reminder/upsell campaigns over sequential relations are future feature work.
- OCR fallback: Anthropic Vision is sole provider in v1. If vision pricing or availability shifts, the provider-agnostic interface enables a swap; a fallback chain (Anthropic → OpenAI → Tesseract) would be future ADR territory.