Skip to main content

Schema evolution

Ratiba runs a two-pool, two-chain Postgres topology. A single shared public.* registry holds tenant-less rows (registry, webhook routing, dead-letter queues). Each tenant gets a private tenant_<slug>.* schema holding their domain rows (services, staff, appointments, conversation threads, payments). Both schemas are versioned by Alembic but driven by two separately-invoked migration chains — the public chain runs once at deploy time, the tenant chain runs once per tenant at onboarding (and once per existing tenant during deploys with a fan-out loop).

The column-by-column field dictionary is in Reference → Schema. This page covers the structural story: two-pool model, ER diagrams, migration ladder, and key constraints.

Two-pool model

Per ADR-0002:

PoolSchemaLifetimeMigration trigger
Shared registrypublicForever; one row per tenantDeploy-time alembic upgrade head against the public chain
Per-tenant domaintenant_<slug>Created at onboarding; soft-deletablePer-tenant alembic upgrade head against the tenant chain (in onboarding orchestrator + deploy fan-out)

Cross-pool boundaries:

  • Webhook routing: Daraja / PesaPal callbacks arrive tenant-less. public.payment_routing carries a merchant_reference PK that resolves to (tenant_id, schema_name, thread_id) so the webhook handler can flip search_path into the right tenant schema.
  • Dead-letter queues: unmatched callbacks land in public.payment_callbacks_unrouted and unmatched WhatsApp inbound webhooks land in public.inbound_quarantine. Both are tenant-less by design — the tenant is unknown, that's why they're quarantined.
  • Archive symmetry: the per-tenant handoff_log and LangGraph checkpoints family have public-schema archive twins (public.handoff_log_archive, public.checkpoints_archive) for cross-tenant ops queries on long-tail audit data. Daily 3 AM EAT cron mover (per ADR-0007 D4) lifts >90-day rows from the per-tenant tables to public archive.

The two chains never cross-reference at the migration level. ORM-level FKs from per-tenant rows to public.tenants(id) are forbidden by ADR-0002 D2 — search_path semantics break cross-schema FK validation under tenant-isolated transactions. Application code enforces the implicit tenant_id constraint via the active-schema contextvar.

For the runtime side of this boundary (onboarding orchestrator, lifecycle FSM, contextvar), see How it works → Identity and tenancy.

Public ER diagram

tenants is shown column-truncated above. The full column set after all public migrations is enumerated in Reference → Schema — public.tenants.

Per-tenant ER diagrams

Twenty-three domain tables across M3 – Phase 0 contributions. All tables live in the same per-tenant schema and share a transactional boundary. The diagrams are split by cluster for readability.

Booking domain (M3 baseline + M5 + M6 + M10 + M13)

M13 extends staff with employment attrs and appointments with rendering actuals. Commission tables are shown in the next cluster.

Commission engine (M13)

New in M13 per Beyond-Ratiba spec §3 B. commission_rules defines the rate schedule; commission_period holds the frozen monthly close.

Identity + sessions (M10)

(customer_id, provider, external_id) is the composite PK; reverse-direction ix_customer_identities_alias (provider, external_id) supports the hot-path "who is this external id?" query that every Tier-2 adapter issues on inbound turns.

Conversation + handoff (M5 + M9)

LangGraph checkpoint family (checkpoints, checkpoint_blobs, checkpoint_writes, checkpoint_migrations) is created by PostgresSaver(conn).setup() inside the onboarding orchestrator transaction — not by Alembic. Schema is owned upstream by langgraph-checkpoint-postgres and elided here.

Catalog + personality (M11)

Knowledge snippets (Phase 0 — ADR-0013)

Standalone table; no FK to any other per-tenant table. Scoped to the tenant by search_path alone. See ADR-0013 for the full rationale and How it works → Knowledge answers for the runtime injection flow.

No embedding column — that is the single additive Phase-1 upgrade. The knowledge_overflow WARN log (emitted when the tenant exceeds ~20 snippets / ~1,500 tokens) is the observable Phase-0 → Phase-1 graduation trigger.

Migration ladder

Two chains, separately invoked. Public chain runs once at deploy; tenant chain runs per-tenant at onboarding plus a deploy-time fan-out loop over existing tenants.

Public chain (9 migrations)

P0009 adds 8 columns to public.tenants: six BOOLEAN NOT NULL DEFAULT FALSE feature gates (voice_full_duplex_enabled, voice_backchannel_filter, voice_hard_interrupt, voice_streaming, voice_listening_ack, voice_adaptive_speed) and two nullable INTEGER endpointing knobs (voice_endpointing_min_delay_ms, voice_endpointing_max_delay_ms). All default to half-duplex behaviour until an admin opts in.

Tenant chain (13 migrations)

LangGraph checkpoint tables are created by PostgresSaver.setup() inside the onboarding orchestrator transaction between T0001 and T0002 application-time — they're not part of the Alembic chain because their shape is owned upstream.

T0009 – T0010 (M13 staff + appointments): staff gains six Workpay-authoritative employment columns; appointments gains four rendering-actual columns that CommissionEngine uses to compute hours and tips. M17 will sync the employment attrs from Workpay on a monthly cadence; during M13 beta, the spa owner enters them directly.

T0011 – T0012 (M13 commission): commission_rules defines the rate schedule (pct / fixed / tiered, per-staff XOR per-service-category, with a validity window). commission_period holds the frozen monthly close; amendments create a new row with type='amendment' linked back via amendment_of_id. Idempotency key uq_commission_period_idempotency makes re-running CommissionEngine a no-op.

T0013 (Phase 0 knowledge snippets): Standalone knowledge_snippets table. No embedding column — real retrieval is deferred as YAGNI until a paying tenant's knowledge base outgrows the ~1,500-token prompt cap. See ADR-0013.

Key constraints + invariants

ConstraintWherePurpose
customers.phone_e164 partial UNIQUE WHERE NOT NULLtenant.0003_m10_customer_identityM10 D5 — multiple pre-phone-capture rows coexist with NULL phones; uniqueness applies only to populated phones
customer_sessions.handoff_token partial UNIQUE WHERE NOT NULLtenant.0003_m10_customer_identityM10 D11 — channel-switch token is single-use; multiple NULL tokens coexist for sessions that haven't initiated a switch
service_relations PK (service_a_id, service_b_id, relation_type)tenant.0004_m11_service_relationsM11 D5 — same pair may carry multiple relation types (complementary AND sequential); ordering of a→b is meaningful for sequential
idx_service_relations_complementary partial WHERE relation_type='complementary' AND confidence >= 0.7tenant.0004_m11_service_relationsHot-path v1 cross-sell lookup — predicate matches runtime WHERE clause; planner skips alt+sequential rows
tenant_personality_config.is_singleton partial UNIQUE WHERE TRUEtenant.0007_m11_personality_configM11 D10 — at most one row per tenant; forward-compatible with future per-segment dials by adding a segment_key column
tenants.vertical CHECK 8 enumpublic.0001_public_schema_baselineIN ('spa', 'salon', 'barbershop', 'dental', 'physio', 'medical', 'tutoring', 'legal') — drives defaults_for(vertical) seeding at onboarding
tenants.status CHECK 4 enumpublic.0001_public_schema_baselineIN ('active', 'suspended', 'trial', 'deleted') — lifecycle states from ADR-0002 D6
tenants.schema_name partial UNIQUE WHERE NOT NULLpublic.0001_public_schema_baselineNULL marks finalized (post-7-day-cooling) tenants vs soft-deleted-but-recoverable; daily reaper uses NULL signal to skip already-finalized rows
tenants.whatsapp_number / voice_number / whatsapp_phone_number_id partial UNIQUEpublic.0001 + public.0002Three partial uniques on tenants (M4 channel lookup, M4 voice lookup, M3 finalize)
payment_routing.provider CHECK 4 enumpublic.0001_public_schema_baselineIN ('mpesa', 'airtel', 'equitel', 'pesapal') — webhook-routable providers only
customer_identities.provider CHECK 5 enumtenant.0003_m10_customer_identityIN ('whatsapp', 'voice', 'web_cookie', 'instagram', 'messenger') — channel-agnostic identity per ADR-0009 D5
service_relations.relation_type CHECK 3 enumtenant.0004_m11_service_relationsIN ('complementary', 'alternative', 'sequential') — only complementary used by v1 cross-sell
service_relations.source CHECK 3 enumtenant.0004_m11_service_relationsIN ('llm_inferred', 'tenant_pinned', 'association_rule')tenant_pinned always wins at lookup
catalog_imports.modality CHECK 5 enumtenant.0005_m11_catalog_importsIN ('image', 'multi_image', 'pdf', 'text', 'csv') — ADR-0010 D1 three-prong scope
catalog_audits.source CHECK 4 enumtenant.0006_m11_catalog_auditsIN ('slash', 'nl', 'import', 'dashboard') — ops-triage signal: which entry point produced bad data
dial_audits.dial CHECK 7 axestenant.0008_m11_dial_auditsIN ('tone', 'greeting_mode', 'greeting_custom', 'upsell', 'cancellation_tone', 'honorific', 'cross_sell') — six dials + custom-string companion
handoff_log.claimed_via / handoff_log_archive.claimed_via CHECK 2 enumtenant.0002_handoff_log_admin_columnsIN ('dashboard', 'whatsapp') — which rail the admin used; column-parity test enforces archive sync
payments.status widened to VARCHAR(30)tenant.0001_tenant_schema_baseline (M3 T4 amendment)Accommodates cancelled_by_customer (21 chars) + future enum growth
idx_inbound_quarantine_unreviewed partial WHERE reviewed_at IS NULLpublic.0003_inbound_quarantineDaily ops-triage query stays bounded as the queue grows
idx_payment_callbacks_unrouted_unreviewed partial WHERE reviewed_at IS NULLpublic.0001_public_schema_baselineSame shape as inbound_quarantine — symmetric DLQ ergonomics
idx_threads_active_customer / idx_threads_active_admin partial WHERE closed_at IS NULLtenant.0001_tenant_schema_baselineActive-thread lookup hot path; per-actor partial split avoids serializing customer + admin queries
staff.employment_type CHECK 4 enumtenant.0009_m13_staff_employment_attrsIN ('permanent', 'casual', 'commission_only', 'apprentice') — drives Workpay payroll classification
staff.kra_pin CHECK regextenant.0009_m13_staff_employment_attrs^[AP][0-9]{9}[A-Z]$ — KRA PIN format (A-prefix personal, P-prefix business)
commission_rules.rule_type CHECK 3 enumtenant.0011_m13_commission_rulesIN ('pct', 'fixed', 'tiered')
commission_rules staff XOR categorytenant.0011_m13_commission_rules(staff_id IS NOT NULL) <> (service_category IS NOT NULL) — a rule targets exactly one axis
commission_rules.valid_to > valid_fromtenant.0011_m13_commission_rulesPrevents zero-length validity windows
commission_period idempotency UNIQUEtenant.0012_m13_commission_period(staff_id, period_start, period_end, type) — re-running CommissionEngine is a no-op
commission_period.type CHECK 2 enumtenant.0012_m13_commission_periodIN ('close', 'amendment')
commission_period amendment-link constrainttenant.0012_m13_commission_period(type = 'amendment') = (amendment_of_id IS NOT NULL) — amendment rows always link back
idx_knowledge_snippets_category partial WHERE is_activetenant.0013_knowledge_snippetsHot-path fetch_snippets(intent) query; false rows excluded at index level

The orm-vs-migration drift fence is enforced by three backend tests: test_models_match_migration_metadata (compare_metadata), test_tenants_columns_match_evolution_doc (column-name reference set), and test_no_pgcrypto_extension_create_in_tenant_migration (public-only extension policing).

  • How-it-works: Identity and tenancy walks the runtime side — onboarding orchestrator + lifecycle FSM + active-schema contextvar.
  • How-it-works: Knowledge answers — Phase 0 snippet injection into the answer_shaper prompt.
  • Reference: Schema reference is the column-by-column field dictionary.
  • ADR-0002: Multi-tenant isolation — public registry + per-tenant schema; two pools, two chains.
  • ADR-0010: Catalog + personality schema — M11 contributions (service_relations, catalog_imports, catalog_audits, tenant_personality_config, dial_audits).
  • ADR-0012: M13 containers and real pilot — M13 commission scaffolding (staff employment attrs, rendering actuals, commission_rules, commission_period).
  • ADR-0013: Phase 0 knowledge answersknowledge_snippets rationale, cap, and Phase-1 upgrade trigger.