
Seed test data

Realistic test data is the difference between a booking flow that exercises real logic and one that returns empty-catalog deflections. Before you can walk through the First booking guide end-to-end, you need a tenant with:

  • Services (what can be booked, prices, durations)
  • Staff (who delivers the service)
  • Staff schedules (when those staff are available — the FSM reads these to propose slots)
  • Knowledge snippets (canned answers to FAQ-style questions about hours, policies, facilities)
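Schedules and service durations work together: a start time is only worth proposing if the whole service fits inside the staff member's window. The sketch below illustrates that idea. It assumes a fixed 30-minute step between candidate starts. `candidate_slots` and the step size are hypothetical and are not the FSM's actual logic, which would also have to account for existing bookings.

```python
from datetime import date, datetime, time, timedelta


def candidate_slots(start: time, end: time, duration_min: int, step_min: int = 30) -> list[str]:
    """Hypothetical: start times where a service of `duration_min` minutes
    fits entirely inside one schedule window [start, end)."""
    day = date(2000, 1, 1)  # arbitrary date; only the time of day matters here
    cursor = datetime.combine(day, start)
    window_end = datetime.combine(day, end)
    length = timedelta(minutes=duration_min)
    step = timedelta(minutes=step_min)
    slots = []
    while cursor + length <= window_end:  # the service must finish by the window end
        slots.append(cursor.strftime("%H:%M"))
        cursor += step
    return slots
```

Suppose a staff member has a 09:00-17:00 window and the service is a 60-minute Deep Tissue Massage. The sketch then yields 15 starts, from 09:00 through 16:00. A staff member with no schedule rows yields no slots at all, which is why Step 2 below inserts schedules alongside services.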

This page covers the fast developer path: exact commands, the right order, and honest notes about which steps need the UI and which can be scripted. The UI-led path, intended for real tenant onboarding, is documented in Onboard a tenant. This page complements it: the infrastructure is identical, but the catalog and knowledge steps are short-circuited.


Programmatic seeding

Step 1 — Provision the tenant

A tenant must exist in public.tenants before anything else can happen. The seeding script and SQL commands below target a tenant by its slug, and they will fail with a clear error if the slug is not found.

Use the admin endpoint (same as Onboard a tenant Step 1):

curl -X POST http://localhost:8010/api/v1/admin/tenants \
  -H "Content-Type: application/json" \
  -H "X-Admin-API-Key: dev-admin-key" \
  -d '{
    "name": "Aria Aura Spa (dev)",
    "slug": "aria_aura",
    "vertical": "spa",
    "owner_phone": "+254712345678",
    "whatsapp_number": "+15555555555",
    "locale": "en",
    "timezone": "Africa/Nairobi",
    "mpesa_enabled": false
  }'
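If you prefer to provision from a Python script rather than the shell, the same request can be built with the standard library. This is a minimal sketch that mirrors the curl call above. It assumes the backend is listening on localhost:8010 with the dev admin key. `build_provision_request` is a hypothetical helper, not part of the repo.

```python
import json
import urllib.request

ADMIN_URL = "http://localhost:8010/api/v1/admin/tenants"


def build_provision_request(payload: dict, admin_key: str = "dev-admin-key") -> urllib.request.Request:
    """Build (but do not send) the same POST the curl command makes."""
    return urllib.request.Request(
        ADMIN_URL,
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={"Content-Type": "application/json", "X-Admin-API-Key": admin_key},
    )


payload = {
    "name": "Aria Aura Spa (dev)",
    "slug": "aria_aura",
    "vertical": "spa",
    "owner_phone": "+254712345678",
    "whatsapp_number": "+15555555555",
    "locale": "en",
    "timezone": "Africa/Nairobi",
    "mpesa_enabled": False,
}

# Sending it requires the backend to be running; expect HTTP 201:
# with urllib.request.urlopen(build_provision_request(payload)) as resp:
#     tenant = json.load(resp)
```

Building the request separately from sending it makes it easy to inspect the headers and body before hitting a live backend.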

Slug rules: ^[a-z0-9_]+$ — lowercase, digits, underscores only. No hyphens; the slug becomes a Postgres schema name (tenant_aria_aura). The response is HTTP 201 with the tenant object.
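The slug rule is easy to check before you call the endpoint. Here is a minimal sketch that assumes the `tenant_` schema prefix shown above. `tenant_schema` is a hypothetical helper and is not part of the backend.

```python
import re

SLUG_RE = re.compile(r"^[a-z0-9_]+$")


def tenant_schema(slug: str) -> str:
    """Return the Postgres schema name for a slug, rejecting invalid slugs."""
    # fullmatch (not match) so a trailing newline cannot slip past the `$` anchor
    if not SLUG_RE.fullmatch(slug):
        raise ValueError(f"invalid slug {slug!r}: use lowercase letters, digits and underscores only")
    return f"tenant_{slug}"


print(tenant_schema("aria_aura"))  # tenant_aria_aura
```

A hyphenated slug such as `aria-aura` is rejected. That is the point of the rule, because a hyphen would not be valid in an unquoted Postgres identifier.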

Confirm the schema landed:

docker compose exec postgres psql -U ratiba ratiba \
-c "\dt tenant_aria_aura.*" | grep services

You should see services in the table list.


Step 2 — Add services, staff, and schedules

There is no standalone seed script for the catalog — the production path uses Vision-LLM-assisted upload through the dashboard (see Catalog onboarding). For dev purposes the fastest route is a direct psql insert. The shape below mirrors exactly what the seeded_tenant pytest fixture inserts (see backend/tests/conftest.py — the canonical reference for the complete SeededTenant data model).

docker compose exec postgres psql -U ratiba ratiba

Inside psql, set the search path to your tenant schema and insert:

SET search_path TO tenant_aria_aura;

-- Services (name, Swahili name, duration in minutes, price in KES)
INSERT INTO services (name, name_sw, duration_min, price, is_active, sort_order)
VALUES
('Deep Tissue Massage', 'Masaji', 60, 3500.00, TRUE, 0),
('Manicure', 'Manikure', 30, 1200.00, TRUE, 1),
('Pedicure', 'Pedikure', 45, 1500.00, TRUE, 2);

-- One staff member
INSERT INTO staff (name, phone_number, is_active)
VALUES ('Maria', '+254712345678', TRUE)
RETURNING id;
-- Copy the returned UUID into <staff_uuid> below

-- Mon-Fri 09:00-17:00 schedule (day_of_week: 0=Mon, 4=Fri)
INSERT INTO staff_schedules (staff_id, day_of_week, start_time, end_time, is_available)
VALUES
('<staff_uuid>', 0, '09:00', '17:00', TRUE),
('<staff_uuid>', 1, '09:00', '17:00', TRUE),
('<staff_uuid>', 2, '09:00', '17:00', TRUE),
('<staff_uuid>', 3, '09:00', '17:00', TRUE),
('<staff_uuid>', 4, '09:00', '17:00', TRUE);
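If you reseed often, copying `<staff_uuid>` into five rows by hand is easy to get wrong. The sketch below emits the schedule INSERT for one staff UUID and validates its inputs first. `schedule_insert_sql` is a hypothetical helper. The 0=Mon mapping is taken from the SQL comment above.

```python
import uuid
from datetime import datetime

# day_of_week convention from the SQL comment above: 0=Mon ... 4=Fri
WEEKDAYS = range(0, 5)


def schedule_insert_sql(staff_id: str, start: str = "09:00", end: str = "17:00") -> str:
    """Hypothetical helper: build the Mon-Fri staff_schedules INSERT for one staff member."""
    uuid.UUID(staff_id)                # ValueError if the pasted id is not a valid UUID
    datetime.strptime(start, "%H:%M")  # ValueError on a malformed start time
    datetime.strptime(end, "%H:%M")    # ValueError on a malformed end time
    rows = ",\n".join(f"('{staff_id}', {dow}, '{start}', '{end}', TRUE)" for dow in WEEKDAYS)
    return (
        "INSERT INTO staff_schedules (staff_id, day_of_week, start_time, end_time, is_available)\n"
        f"VALUES\n{rows};"
    )


print(schedule_insert_sql("123e4567-e89b-12d3-a456-426614174000"))
```

Paste the printed statement into the same psql session, after the `SET search_path` line. Validating the UUID and times up front also keeps untrusted text out of the generated SQL.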

The seeded_tenant fixture (used in integration tests) inserts identical rows programmatically via get_tenant_session(). If you are writing a new integration test and need a fully-populated tenant, use that fixture directly rather than repeating the SQL — it handles the Alembic migration, pool init, and schema context automatically.


Step 3 — Seed knowledge snippets

scripts/seed_knowledge.py reads a curated YAML file from backend/config/knowledge/<slug>.yaml and upserts every snippet into <tenant>.knowledge_snippets (per ADR-0013). The upsert is idempotent: it deletes rows matching the natural key (category, title), then re-inserts them, so re-running is safe.
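The delete-then-insert pattern is what makes re-runs safe. Each run replaces rows that share a (category, title) instead of duplicating them. The sketch below shows the pattern with an in-memory SQLite database so it runs anywhere. The real script targets Postgres, and its table has more columns (for example is_active), so treat the schema and `upsert_snippets` as hypothetical.

```python
import sqlite3


def upsert_snippets(conn: sqlite3.Connection, snippets: list[dict]) -> None:
    """Delete on the natural key (category, title), then re-insert."""
    with conn:  # one transaction: commits on success, rolls back on any error
        for s in snippets:
            conn.execute(
                "DELETE FROM knowledge_snippets WHERE category = ? AND title = ?",
                (s["category"], s["title"]),
            )
            conn.execute(
                "INSERT INTO knowledge_snippets (category, title, body, language) VALUES (?, ?, ?, ?)",
                (s["category"], s["title"], s["body"], s["language"]),
            )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE knowledge_snippets (category TEXT, title TEXT, body TEXT, language TEXT)")
hours = {"category": "hours", "title": "Opening hours",
         "body": "We are open Monday to Saturday, 9 AM to 6 PM EAT.", "language": "en"}
upsert_snippets(conn, [hours])
upsert_snippets(conn, [hours])  # re-run: the row is replaced, not duplicated
print(conn.execute("SELECT count(*) FROM knowledge_snippets").fetchone()[0])  # 1
```

Running the whole batch in one transaction means a failure partway through leaves the previous snippet set intact rather than half-deleted.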

cd backend
python -m scripts.seed_knowledge --tenant aria_aura

Dry-run first to confirm the YAML loads and count the rows without writing:

python -m scripts.seed_knowledge --tenant aria_aura --dry-run

The script resolves the tenant by querying public.tenants WHERE slug = $1 AND status IN ('active', 'trial'). If the tenant is not yet provisioned, or its status is something else, the script exits with a descriptive error — provision first (Step 1).
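The sketch below shows one way to perform that lookup. It assumes a `tenants` table with `id`, `slug` and `status` columns. SQLite stands in for Postgres, with `?` in place of `$1` and integer ids in place of the real ones. Unlike the single query described above, it looks up the slug first and checks the status second, so the two failure modes produce different messages. `resolve_tenant` and the `suspended` status are hypothetical.

```python
import sqlite3


def resolve_tenant(conn: sqlite3.Connection, slug: str) -> int:
    """Hypothetical: return the tenant id, or exit with a descriptive error."""
    row = conn.execute("SELECT id, status FROM tenants WHERE slug = ?", (slug,)).fetchone()
    if row is None:
        raise SystemExit(f"tenant {slug!r} not found; provision it first (Step 1)")
    tenant_id, status = row
    if status not in ("active", "trial"):
        raise SystemExit(f"tenant {slug!r} has status {status!r}; expected 'active' or 'trial'")
    return tenant_id


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tenants (id INTEGER PRIMARY KEY, slug TEXT UNIQUE, status TEXT)")
conn.executemany("INSERT INTO tenants (slug, status) VALUES (?, ?)",
                 [("aria_aura", "trial"), ("old_spa", "suspended")])
print(resolve_tenant(conn, "aria_aura"))  # 1
```

Raising SystemExit with a string prints that message to stderr and exits with status 1, which suits a CLI script like this one.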

The YAML file lives at backend/config/knowledge/<slug>.yaml. For aria_aura it ships with the repo at backend/config/knowledge/aria_aura.yaml. Categories used by the Phase 0 intent router are: policy, facility, prep, service, hours, general. For how category maps to intent and how snippets reach the prompt, see Knowledge answers.

For a new dev tenant with a different slug, either copy and rename aria_aura.yaml or author a new YAML following the same structure:

snippets:
  - category: hours
    title: Opening hours
    body: We are open Monday to Saturday, 9 AM to 6 PM EAT.
    language: en
  - category: policy
    title: Cancellation policy
    body: Please give 24 hours' notice to cancel or reschedule.
    language: en
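Before seeding a hand-authored YAML, you can catch structural mistakes with a quick check over the parsed document. This is a minimal sketch. The allowed categories are the Phase 0 set listed above. The required keys are inferred from the example, so they are an assumption. `validate_snippets` is hypothetical. Duplicates on (category, title) are worth flagging because that pair is the upsert's natural key.

```python
ALLOWED_CATEGORIES = {"policy", "facility", "prep", "service", "hours", "general"}
REQUIRED_KEYS = ("category", "title", "body", "language")  # assumed from the example YAML


def validate_snippets(doc: dict) -> list[str]:
    """Return human-readable problems in a parsed knowledge YAML (empty list means OK)."""
    problems: list[str] = []
    seen: set = set()
    for i, snippet in enumerate(doc.get("snippets") or []):
        missing = [k for k in REQUIRED_KEYS if k not in snippet]
        if missing:
            problems.append(f"snippet {i}: missing {', '.join(missing)}")
            continue
        if snippet["category"] not in ALLOWED_CATEGORIES:
            problems.append(f"snippet {i}: unknown category {snippet['category']!r}")
        key = (snippet["category"], snippet["title"])
        if key in seen:
            problems.append(f"snippet {i}: duplicate (category, title) {key}")
        seen.add(key)
    return problems


# Usage from backend/, assuming PyYAML is installed:
#   import yaml
#   with open("config/knowledge/aria_aura.yaml") as f:
#       print(validate_snippets(yaml.safe_load(f)))
```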

Step 4 — Verify

Quick row-count check:

docker compose exec postgres psql -U ratiba ratiba -c "
SET search_path TO tenant_aria_aura;
SELECT
(SELECT count(*) FROM services WHERE is_active) AS services,
(SELECT count(*) FROM staff WHERE is_active) AS staff,
(SELECT count(*) FROM staff_schedules) AS schedules,
(SELECT count(*) FROM knowledge_snippets WHERE is_active) AS snippets;
"

Expected output for the data above:

 services | staff | schedules | snippets
----------+-------+-----------+----------
        3 |     1 |         5 |       20
(1 row)

(Snippet count varies with the YAML; the aria_aura.yaml seed ships ~20 rows.)
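To script this check (in CI, for example), add psql's `-A` (unaligned) and `-t` (tuples only) flags to the same command. The output then becomes a single pipe-separated line that is easy to parse. The helpers below are hypothetical and assume the four-column query above. They read only the last non-empty line, so a status line such as `SET` printed before the result does no harm.

```python
def parse_counts(psql_output: str) -> dict:
    """Parse the last non-empty line of `psql -At` output: services|staff|schedules|snippets."""
    last = [line for line in psql_output.splitlines() if line.strip()][-1]
    services, staff, schedules, snippets = (int(x) for x in last.split("|"))
    return {"services": services, "staff": staff, "schedules": schedules, "snippets": snippets}


def check_counts(counts: dict) -> list[str]:
    """Flag any table that is still empty after seeding."""
    return [f"{name}: 0 rows" for name, n in counts.items() if n == 0]


# e.g. output captured from: docker compose exec postgres psql -U ratiba ratiba -At -c "<Step 4 query>"
counts = parse_counts("SET\n3|1|5|20\n")
print(check_counts(counts))  # []
```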

Smoke message. If you have a Meta test phone number configured (see First-run checklist), send a WhatsApp message to your tenant's number asking something like "what time do you open?" — the agent should return a grounded answer drawn from the hours snippets rather than the deflection fallback. Tail the backend log and filter for the knowledge service:

docker compose logs -f backend | grep -E "knowledge_gap_candidate|knowledge_overflow|fetch_snippets"

A knowledge_overflow WARN means the snippet set is approaching the ~20-snippet / ~1,500-token Phase 0 cap — that is the ADR-0013 graduation trigger to add the embedding column and switch to top-k retrieval.
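To anticipate the warning before you reseed, you can estimate how close a YAML's snippet set sits to the two caps. This is a rough sketch. The 4-characters-per-token heuristic and the 90% warning threshold are assumptions, not the logic the backend uses to emit knowledge_overflow. `overflow_status` is hypothetical.

```python
SNIPPET_CAP = 20   # ~20 snippets, the Phase 0 cap from ADR-0013
TOKEN_CAP = 1500   # ~1,500 tokens
WARN_RATIO = 0.9   # hypothetical: warn when within 10% of either cap


def estimate_tokens(text: str) -> int:
    # Rough heuristic, NOT the backend's tokenizer: about 4 characters per token.
    return max(1, len(text) // 4)


def overflow_status(bodies: list[str]) -> str:
    """Return 'ok', 'warn' or 'over' for a list of snippet bodies."""
    tokens = sum(estimate_tokens(b) for b in bodies)
    ratio = max(len(bodies) / SNIPPET_CAP, tokens / TOKEN_CAP)  # whichever cap is closer
    if ratio >= 1:
        return "over"
    if ratio >= WARN_RATIO:
        return "warn"
    return "ok"
```

Taking the larger of the two ratios reflects that either limit can be hit first. Many short snippets reach the count cap, while a few long ones reach the token cap.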


First-run checklist

Work through this list top-to-bottom whenever you stand up a fresh dev environment. Each item links to where it is done.

  • Docker Compose up and all services healthy — see Local dev runbook
  • Backend migrations applied (alembic upgrade head inside backend/) — see Dev setup
  • Tenant provisioned via POST /api/v1/admin/tenants (Step 1 above or Onboard a tenant)
  • Services, staff, and staff schedules inserted — Step 2 above or the catalog dashboard at http://localhost:3010/admin/catalog
  • Knowledge snippets seeded with python -m scripts.seed_knowledge --tenant <slug> (Step 3 above)
  • Test phone number verified with Meta (required for live WhatsApp smoke test) — Onboard a tenant Step 2 / First booking
  • Row counts verified via psql (Step 4 above)
  • Payment sandbox configured if testing M-Pesa flows (mpesa_enabled: true on the tenant + Daraja sandbox creds in .env) — see ADR-0007