Seed test data
Realistic test data is the difference between a booking flow that exercises real logic and one that returns empty-catalog deflections. Before you can walk through the First booking guide end-to-end you need a tenant with:
- Services (what can be booked, prices, durations)
- Staff (who delivers the service)
- Staff schedules (when those staff are available — the FSM reads these to propose slots)
- Knowledge snippets (canned answers to FAQ-style questions about hours, policies, facilities)
This page covers the fast developer path: exact commands, the right order, and honest notes about which steps require the UI and which can be scripted. The UI-led path, intended for real tenant onboarding, is documented in Onboard a tenant. This page complements it: the infrastructure is identical, but the catalog and knowledge steps are short-circuited with direct SQL and a seed script.
Programmatic seeding
Step 1 — Provision the tenant
A tenant must exist in public.tenants before anything else can happen. The seeding script and SQL commands below target a tenant by its slug. If the slug is not found, the seed script exits with a descriptive error, and the SQL inserts fail because the tenant_<slug> schema does not exist yet.
Use the admin endpoint (same as Onboard a tenant Step 1):
curl -X POST http://localhost:8010/api/v1/admin/tenants \
  -H "Content-Type: application/json" \
  -H "X-Admin-API-Key: dev-admin-key" \
  -d '{
    "name": "Aria Aura Spa (dev)",
    "slug": "aria_aura",
    "vertical": "spa",
    "owner_phone": "+254712345678",
    "whatsapp_number": "+15555555555",
    "locale": "en",
    "timezone": "Africa/Nairobi",
    "mpesa_enabled": false
  }'
Slug rules: ^[a-z0-9_]+$ (lowercase letters, digits, and underscores only). No hyphens; the slug is embedded in a Postgres schema name (tenant_aria_aura). A successful request returns HTTP 201 with the tenant object.
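As a quick pre-flight check, the slug rule can be tested locally before calling the endpoint. The sketch below is illustrative only: the helper name schema_name_for is hypothetical and not part of the repo, and the tenant_ prefix is inferred from the tenant_aria_aura example above.

```python
import re

# Pattern quoted from the slug rules above.
SLUG_PATTERN = re.compile(r"^[a-z0-9_]+$")


def schema_name_for(slug: str) -> str:
    """Return the schema name a slug would map to, or raise ValueError.

    Hypothetical helper: the backend's own validation may differ in detail.
    """
    # fullmatch rejects a trailing newline, which a bare `$` would tolerate.
    if not SLUG_PATTERN.fullmatch(slug):
        raise ValueError(f"invalid slug: {slug!r}")
    # ASSUMPTION: schema name is "tenant_" + slug, as in tenant_aria_aura.
    return f"tenant_{slug}"


print(schema_name_for("aria_aura"))  # tenant_aria_aura
```

A slug such as aria-aura raises ValueError here, which mirrors the no-hyphens rule.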
Confirm the schema landed:
docker compose exec postgres psql -U ratiba ratiba \
  -c "\dt tenant_aria_aura.*" | grep services
The output should contain a row for the services table.
Step 2 — Add services, staff, and schedules
There is no standalone seed script for the catalog — the production path uses Vision-LLM-assisted upload through the dashboard (see Catalog onboarding). For dev purposes the fastest route is a direct psql insert. The shape below mirrors exactly what the seeded_tenant pytest fixture inserts (see backend/tests/conftest.py — the canonical reference for the complete SeededTenant data model).
docker compose exec postgres psql -U ratiba ratiba
Inside psql, set the search path to your tenant schema and insert:
SET search_path TO tenant_aria_aura;

-- Services (name, Swahili name, duration in minutes, price in KES)
INSERT INTO services (name, name_sw, duration_min, price, is_active, sort_order)
VALUES
  ('Deep Tissue Massage', 'Masaji', 60, 3500.00, TRUE, 0),
  ('Manicure', 'Manikure', 30, 1200.00, TRUE, 1),
  ('Pedicure', 'Pedikure', 45, 1500.00, TRUE, 2);

-- One staff member
INSERT INTO staff (name, phone_number, is_active)
VALUES ('Maria', '+254712345678', TRUE)
RETURNING id;
-- Copy the returned UUID into <staff_uuid> below

-- Mon-Fri 09:00-17:00 schedule (day_of_week: 0=Mon, 4=Fri)
INSERT INTO staff_schedules (staff_id, day_of_week, start_time, end_time, is_available)
VALUES
  ('<staff_uuid>', 0, '09:00', '17:00', TRUE),
  ('<staff_uuid>', 1, '09:00', '17:00', TRUE),
  ('<staff_uuid>', 2, '09:00', '17:00', TRUE),
  ('<staff_uuid>', 3, '09:00', '17:00', TRUE),
  ('<staff_uuid>', 4, '09:00', '17:00', TRUE);
The seeded_tenant fixture (used in integration tests) inserts identical rows programmatically via get_tenant_session(). If you are writing a new integration test and need a fully-populated tenant, use that fixture directly rather than repeating the SQL — it handles the Alembic migration, pool init, and schema context automatically.
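If you script the schedule rows instead of typing them, the Mon-Fri block above reduces to a small loop. The following is a minimal sketch, not code from the repo: weekday_schedule_rows is a hypothetical helper, and it only builds parameter tuples in the column order of the staff_schedules insert shown earlier. Executing them is left to whichever database driver you use.

```python
from datetime import time
from uuid import UUID


def weekday_schedule_rows(staff_id, start=time(9, 0), end=time(17, 0), days=range(5)):
    """Build (staff_id, day_of_week, start_time, end_time, is_available) tuples.

    Hypothetical helper. day_of_week follows the convention above: 0=Mon, 4=Fri.
    """
    staff_id = str(UUID(str(staff_id)))  # fail early on a malformed UUID
    if start >= end:
        raise ValueError("start must be earlier than end")
    return [
        (staff_id, day, start.strftime("%H:%M"), end.strftime("%H:%M"), True)
        for day in days
    ]


# Placeholder UUID for illustration; use the id returned by the staff insert.
rows = weekday_schedule_rows("00000000-0000-0000-0000-000000000001")
for row in rows:
    print(row)
```

Passing the tuples as query parameters, rather than formatting them into the SQL string, avoids the manual copy-and-paste of the staff UUID.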
Step 3 — Seed knowledge snippets
scripts/seed_knowledge.py reads a curated YAML from backend/config/knowledge/<slug>.yaml and upserts every snippet into <tenant>.knowledge_snippets (per ADR-0013). The upsert is idempotent: it deletes on the natural key (category, title) then re-inserts, so re-running is safe.
cd backend
python -m scripts.seed_knowledge --tenant aria_aura
Dry-run first to confirm the YAML loads and count the rows without writing:
python -m scripts.seed_knowledge --tenant aria_aura --dry-run
The script resolves the tenant by querying public.tenants WHERE slug = $1 AND status IN ('active', 'trial'). If the tenant is not yet provisioned, or its status is something else, the script exits with a descriptive error — provision first (Step 1).
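To see why delete-then-insert on the natural key makes re-runs safe, here is a self-contained illustration. It is a sketch under stated assumptions, not the code in scripts/seed_knowledge.py: it uses an in-memory SQLite table as a stand-in for Postgres, the function name upsert_snippets is hypothetical, and the column list is inferred from the YAML fields shown on this page.

```python
import sqlite3


def upsert_snippets(conn, snippets):
    """Delete each snippet by its natural key (category, title), then re-insert it."""
    with conn:  # one transaction: commits on success, rolls back on error
        for s in snippets:
            conn.execute(
                "DELETE FROM knowledge_snippets WHERE category = ? AND title = ?",
                (s["category"], s["title"]),
            )
            conn.execute(
                "INSERT INTO knowledge_snippets (category, title, body, language) "
                "VALUES (?, ?, ?, ?)",
                (s["category"], s["title"], s["body"], s["language"]),
            )


# SQLite stand-in for the tenant schema; ASSUMED columns, not the real DDL.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE knowledge_snippets (category TEXT, title TEXT, body TEXT, language TEXT)"
)
snippets = [
    {"category": "hours", "title": "Opening hours", "body": "Mon-Sat, 9-6.", "language": "en"},
    {"category": "policy", "title": "Cancellation policy", "body": "24h notice.", "language": "en"},
]

upsert_snippets(conn, snippets)
upsert_snippets(conn, snippets)  # second run replaces rows instead of duplicating them
count = conn.execute("SELECT count(*) FROM knowledge_snippets").fetchone()[0]
print(count)  # 2
```

Editing a body in the YAML and re-running replaces the old row, because the key (category, title) is unchanged.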
The YAML file lives at backend/config/knowledge/<slug>.yaml. For aria_aura it ships with the repo at backend/config/knowledge/aria_aura.yaml. Categories used by the Phase 0 intent router are: policy, facility, prep, service, hours, general. For how category maps to intent and how snippets reach the prompt, see Knowledge answers.
For a new dev tenant with a different slug, either copy and rename aria_aura.yaml or author a new YAML following the same structure:
snippets:
  - category: hours
    title: Opening hours
    body: We are open Monday to Saturday, 9 AM to 6 PM EAT.
    language: en
  - category: policy
    title: Cancellation policy
    body: Please give 24 hours notice to cancel or reschedule.
    language: en
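Before seeding a hand-authored file, it can help to lint the parsed structure. The sketch below is hypothetical and not part of the repo: validate_snippets works on the already-parsed document (for example, the result of a YAML loader), the category list is the one quoted above, and the set of required fields is an assumption based on the example. The --dry-run flag remains the authoritative check.

```python
# Categories quoted above for the Phase 0 intent router.
ALLOWED_CATEGORIES = {"policy", "facility", "prep", "service", "hours", "general"}
# ASSUMPTION: every snippet needs these four fields, as in the example YAML.
REQUIRED_FIELDS = ("category", "title", "body", "language")


def validate_snippets(doc):
    """Return a list of human-readable problems; an empty list means none found."""
    snippets = doc.get("snippets") if isinstance(doc, dict) else None
    if not isinstance(snippets, list):
        return ["top-level 'snippets' list is missing"]

    problems, seen = [], set()
    for i, snippet in enumerate(snippets):
        if not isinstance(snippet, dict):
            problems.append(f"snippet {i}: not a mapping")
            continue
        missing = [f for f in REQUIRED_FIELDS if not snippet.get(f)]
        if missing:
            problems.append(f"snippet {i}: missing {', '.join(missing)}")
        category = snippet.get("category")
        if category and category not in ALLOWED_CATEGORIES:
            problems.append(f"snippet {i}: unknown category {category!r}")
        # Duplicate natural keys would overwrite each other during the upsert.
        key = (category, snippet.get("title"))
        if key in seen:
            problems.append(f"snippet {i}: duplicate key {key!r}")
        seen.add(key)
    return problems


doc = {
    "snippets": [
        {"category": "hours", "title": "Opening hours",
         "body": "We are open Monday to Saturday, 9 AM to 6 PM EAT.", "language": "en"},
        {"category": "policy", "title": "Cancellation policy",
         "body": "Please give 24 hours notice to cancel or reschedule.", "language": "en"},
    ]
}
print(validate_snippets(doc))  # []
```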
Step 4 — Verify
Quick row-count check:
docker compose exec postgres psql -U ratiba ratiba -c "
SET search_path TO tenant_aria_aura;
SELECT
(SELECT count(*) FROM services WHERE is_active) AS services,
(SELECT count(*) FROM staff WHERE is_active) AS staff,
(SELECT count(*) FROM staff_schedules) AS schedules,
(SELECT count(*) FROM knowledge_snippets WHERE is_active) AS snippets;
"
Expected output for the data above:
 services | staff | schedules | snippets
----------+-------+-----------+----------
        3 |     1 |         5 |       20
(1 row)
(Snippet count varies with the YAML; the aria_aura.yaml seed ships ~20 rows.)
Smoke message. If you have a Meta test phone number configured (see First-run checklist), send a WhatsApp message to your tenant's number asking something like "what time do you open?" — the agent should return a grounded answer drawn from the hours snippets rather than the deflection fallback. Tail the backend log and filter for the knowledge service:
docker compose logs -f backend | grep -E "knowledge_gap_candidate|knowledge_overflow|fetch_snippets"
A knowledge_overflow WARN means the snippet set is approaching the ~20-snippet / ~1,500-token Phase 0 cap — that is the ADR-0013 graduation trigger to add the embedding column and switch to top-k retrieval.
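To get a rough feel for how close a snippet set is to that cap before you seed it, you can estimate it offline. This is a back-of-the-envelope sketch only: how the backend actually counts tokens is not described on this page, so the four-characters-per-token ratio below is an assumption, and snippet_budget is a hypothetical helper rather than repo code.

```python
MAX_SNIPPETS = 20    # the "~20-snippet" figure quoted above
MAX_TOKENS = 1500    # the "~1,500-token" figure quoted above
CHARS_PER_TOKEN = 4  # ASSUMPTION: crude heuristic for English text


def estimate_tokens(text: str) -> int:
    """Rough token estimate: character count divided by 4, rounded up."""
    return -(-len(text) // CHARS_PER_TOKEN)


def snippet_budget(bodies):
    """Summarise a list of snippet bodies against the approximate Phase 0 cap."""
    tokens = sum(estimate_tokens(body) for body in bodies)
    return {
        "snippets": len(bodies),
        "tokens": tokens,
        "over_cap": len(bodies) > MAX_SNIPPETS or tokens > MAX_TOKENS,
    }


bodies = [
    "We are open Monday to Saturday, 9 AM to 6 PM EAT.",
    "Please give 24 hours notice to cancel or reschedule.",
]
print(snippet_budget(bodies))
```

Treat the result as a hint; the knowledge_overflow log line is the real signal.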
First-run checklist
Work through this list top-to-bottom whenever you stand up a fresh dev environment. Each item links to where it is done.
- Docker Compose up and all services healthy; see Local dev runbook
- Backend migrations applied (alembic upgrade head inside backend/); see Dev setup
- Tenant provisioned via POST /api/v1/admin/tenants; see Step 1 above or Onboard a tenant
- Services, staff, and staff schedules inserted; see Step 2 above or the catalog dashboard at http://localhost:3010/admin/catalog
- Knowledge snippets seeded with python -m scripts.seed_knowledge --tenant <slug>; see Step 3 above
- Test phone number verified with Meta (required for live WhatsApp smoke test); see Onboard a tenant Step 2 / First booking
- Row counts verified via psql; see Step 4 above
- Payment sandbox configured if testing M-Pesa flows (mpesa_enabled: true on the tenant + Daraja sandbox creds in .env); see ADR-0007