Seed test data

Realistic test data is the difference between a booking flow that exercises real logic and one that returns empty-catalog deflections. Before you can walk through the First booking guide end-to-end, you need a tenant with:

Services (what can be booked, prices, durations)
Staff (who delivers the service)
Staff schedules (when those staff are available — the FSM reads these to propose slots)
Knowledge snippets (canned answers to FAQ-style questions about hours, policies, facilities)

This page covers the fast developer path: exact commands, the right order, and honest notes about which steps require the UI and which can be scripted. The UI-led path, intended for real tenant onboarding, is documented in Onboard a tenant. This page is its complement: it uses the same infrastructure but short-circuits the catalog and knowledge steps.

Programmatic seeding

Step 1 — Provision the tenant

A tenant must exist in public.tenants before anything else can happen. The seeding script and SQL commands below target a tenant by its slug, and they will fail with a clear error if the slug is not found.

Use the admin endpoint (same as Onboard a tenant Step 1):

curl -X POST http://localhost:8010/api/v1/admin/tenants \
  -H "Content-Type: application/json" \
  -H "X-Admin-API-Key: dev-admin-key" \
  -d '{
    "name": "Aria Aura Spa (dev)",
    "slug": "aria_aura",
    "vertical": "spa",
    "owner_phone": "+254712345678",
    "whatsapp_number": "+15555555555",
    "locale": "en",
    "timezone": "Africa/Nairobi",
    "mpesa_enabled": false
  }'

Slug rules: ^[a-z0-9_]+$ (lowercase letters, digits, and underscores only). Hyphens are not allowed because the slug becomes part of a Postgres schema name (tenant_aria_aura). The response is HTTP 201 with the tenant object.
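To catch a bad slug before calling the admin endpoint, the rule can be checked locally. The following is a minimal sketch: the pattern is quoted from the slug rules, while the helper name `schema_name_for` is hypothetical and is not part of the backend.

```python
import re

# Pattern quoted from the slug rules in this step.
SLUG_PATTERN = re.compile(r"^[a-z0-9_]+$")


def schema_name_for(slug: str) -> str:
    """Hypothetical helper: validate a slug and return the schema name it maps to."""
    # fullmatch is used because "$" alone would also accept a trailing newline.
    if not SLUG_PATTERN.fullmatch(slug):
        raise ValueError(f"invalid slug: {slug!r}")
    return f"tenant_{slug}"


if __name__ == "__main__":
    print(schema_name_for("aria_aura"))
```

A hyphenated slug such as `aria-aura` raises `ValueError` here, which mirrors why the rule exists: the slug ends up inside a schema identifier.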

Confirm the schema landed:

docker compose exec postgres psql -U ratiba ratiba \
  -c "\dt tenant_aria_aura.*" | grep services

You should see services in the table list.

Step 2 — Add services, staff, and schedules

There is no standalone seed script for the catalog — the production path uses Vision-LLM-assisted upload through the dashboard (see Catalog onboarding). For dev purposes the fastest route is a direct psql insert. The shape below mirrors exactly what the seeded_tenant pytest fixture inserts (see backend/tests/conftest.py — the canonical reference for the complete SeededTenant data model).

docker compose exec postgres psql -U ratiba ratiba

Inside psql, set the search path to your tenant schema and insert:

SET search_path TO tenant_aria_aura;

-- Services (name, Swahili name, duration in minutes, price in KES)
INSERT INTO services (name, name_sw, duration_min, price, is_active, sort_order)
VALUES
  ('Deep Tissue Massage', 'Masaji',     60, 3500.00, TRUE, 0),
  ('Manicure',            'Manikure',   30, 1200.00, TRUE, 1),
  ('Pedicure',            'Pedikure',   45, 1500.00, TRUE, 2);

-- One staff member
INSERT INTO staff (name, phone_number, is_active)
VALUES ('Maria', '+254712345678', TRUE)
RETURNING id;
-- Copy the returned UUID into <staff_uuid> below

-- Mon-Fri 09:00-17:00 schedule (day_of_week: 0=Mon, 4=Fri)
INSERT INTO staff_schedules (staff_id, day_of_week, start_time, end_time, is_available)
VALUES
  ('<staff_uuid>', 0, '09:00', '17:00', TRUE),
  ('<staff_uuid>', 1, '09:00', '17:00', TRUE),
  ('<staff_uuid>', 2, '09:00', '17:00', TRUE),
  ('<staff_uuid>', 3, '09:00', '17:00', TRUE),
  ('<staff_uuid>', 4, '09:00', '17:00', TRUE);

The seeded_tenant fixture (used in integration tests) inserts identical rows programmatically via get_tenant_session(). If you are writing a new integration test and need a fully-populated tenant, use that fixture directly rather than repeating the SQL — it handles the Alembic migration, pool init, and schema context automatically.
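If you script this step instead of pasting SQL, the five schedule rows can be generated rather than copied by hand. This is a sketch only: `weekday_schedule_rows` is a hypothetical helper, and the column order follows the staff_schedules insert shown in this step.

```python
from datetime import time
from uuid import UUID


def weekday_schedule_rows(staff_id: UUID,
                          start: time = time(9, 0),
                          end: time = time(17, 0)):
    """Build Mon-Fri rows as (staff_id, day_of_week, start_time, end_time, is_available)."""
    # day_of_week follows the convention used in this step: 0 = Monday, 4 = Friday.
    return [
        (str(staff_id), day, start.strftime("%H:%M"), end.strftime("%H:%M"), True)
        for day in range(5)
    ]


if __name__ == "__main__":
    # Placeholder UUID for illustration; use the id returned by the staff insert.
    for row in weekday_schedule_rows(UUID("00000000-0000-0000-0000-000000000001")):
        print(row)
```

The resulting tuples are shaped for a parameterized INSERT through whichever Postgres driver you already use; no particular driver is assumed here.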

Step 3 — Seed knowledge snippets

scripts/seed_knowledge.py reads a curated YAML from backend/config/knowledge/<slug>.yaml and upserts every snippet into <tenant>.knowledge_snippets (per ADR-0013). The upsert is idempotent: it deletes on the natural key (category, title) then re-inserts, so re-running is safe.

cd backend
python -m scripts.seed_knowledge --tenant aria_aura

Dry-run first to confirm the YAML loads and count the rows without writing:

python -m scripts.seed_knowledge --tenant aria_aura --dry-run

The script resolves the tenant by querying public.tenants WHERE slug = $1 AND status IN ('active', 'trial'). If the tenant is not yet provisioned, or its status is something else, the script exits with a descriptive error — provision first (Step 1).
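The delete-then-insert upsert on the natural key (category, title) can be illustrated in isolation. The sketch below is not the code in scripts/seed_knowledge.py: it uses an in-memory SQLite table as a stand-in for Postgres, and the table is assumed to have only the four columns that appear in the YAML example.

```python
import sqlite3


def upsert_snippets(conn: sqlite3.Connection, snippets: list) -> None:
    """Delete each snippet by its natural key, then re-insert it."""
    with conn:  # one transaction: commits on success, rolls back on error
        for s in snippets:
            conn.execute(
                "DELETE FROM knowledge_snippets WHERE category = ? AND title = ?",
                (s["category"], s["title"]),
            )
            conn.execute(
                "INSERT INTO knowledge_snippets (category, title, body, language) "
                "VALUES (?, ?, ?, ?)",
                (s["category"], s["title"], s["body"], s["language"]),
            )


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    # Assumed, simplified table shape for the demo.
    conn.execute(
        "CREATE TABLE knowledge_snippets "
        "(category TEXT, title TEXT, body TEXT, language TEXT)"
    )
    data = [{"category": "hours", "title": "Opening hours",
             "body": "We are open Monday to Saturday.", "language": "en"}]
    upsert_snippets(conn, data)
    upsert_snippets(conn, data)  # second run leaves the row count unchanged
    print(conn.execute("SELECT count(*) FROM knowledge_snippets").fetchone()[0])
```

Because every run removes the old row before inserting the new one, editing a snippet body in the YAML and re-running replaces the stored text instead of creating a duplicate.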

For aria_aura, the YAML ships with the repo at backend/config/knowledge/aria_aura.yaml. The Phase 0 intent router uses these categories: policy, facility, prep, service, hours, general. For how a category maps to an intent and how snippets reach the prompt, see Knowledge answers.

For a new dev tenant with a different slug, either copy and rename aria_aura.yaml or author a new YAML following the same structure:

snippets:
  - category: hours
    title: Opening hours
    body: We are open Monday to Saturday, 9 AM to 6 PM EAT.
    language: en
  - category: policy
    title: Cancellation policy
    body: Please give 24 hours notice to cancel or reschedule.
    language: en
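When authoring a new file, a quick structural check before seeding can catch typos in category names. The sketch below works on the already-parsed document (a dict with a `snippets` list). It assumes all four keys from the example are required and that a repeated (category, title) pair is a mistake; neither assumption is confirmed by the seeding script, and `validate_snippets` is a hypothetical helper.

```python
from __future__ import annotations

# Categories listed for the Phase 0 intent router.
ALLOWED_CATEGORIES = {"policy", "facility", "prep", "service", "hours", "general"}
# Assumption: every key shown in the example YAML is required.
REQUIRED_KEYS = ("category", "title", "body", "language")


def validate_snippets(doc: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means the file looks sane."""
    errors = []
    seen = set()
    for index, snippet in enumerate(doc.get("snippets") or []):
        for key in REQUIRED_KEYS:
            if not snippet.get(key):
                errors.append(f"snippet {index}: missing or empty {key!r}")
        category = snippet.get("category")
        if category and category not in ALLOWED_CATEGORIES:
            errors.append(f"snippet {index}: unknown category {category!r}")
        natural_key = (category, snippet.get("title"))
        if natural_key in seen:
            errors.append(f"snippet {index}: duplicate (category, title) {natural_key!r}")
        seen.add(natural_key)
    return errors


if __name__ == "__main__":
    doc = {"snippets": [
        {"category": "hours", "title": "Opening hours",
         "body": "We are open Monday to Saturday, 9 AM to 6 PM EAT.", "language": "en"},
    ]}
    print(validate_snippets(doc))
```

With PyYAML installed, `yaml.safe_load` on the file contents should yield a dict of this shape.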

Step 4 — Verify

Quick row-count check:

docker compose exec postgres psql -U ratiba ratiba -c "
  SET search_path TO tenant_aria_aura;
  SELECT
    (SELECT count(*) FROM services WHERE is_active) AS services,
    (SELECT count(*) FROM staff   WHERE is_active)  AS staff,
    (SELECT count(*) FROM staff_schedules)           AS schedules,
    (SELECT count(*) FROM knowledge_snippets WHERE is_active) AS snippets;
"

Expected output for the data above:

 services | staff | schedules | snippets
----------+-------+-----------+----------
        3 |     1 |         5 |       20
(1 row)

(Snippet count varies with the YAML; the aria_aura.yaml seed ships ~20 rows.)

Smoke message. If you have a Meta test phone number configured (see First-run checklist), send a WhatsApp message to your tenant's number asking something like "what time do you open?" — the agent should return a grounded answer drawn from the hours snippets rather than the deflection fallback. Tail the backend log and filter for the knowledge service:

docker compose logs -f backend | grep -E "knowledge_gap_candidate|knowledge_overflow|fetch_snippets"

A knowledge_overflow WARN means the snippet set is approaching the Phase 0 cap (~20 snippets / ~1,500 tokens). That is the ADR-0013 graduation trigger: add the embedding column and switch to top-k retrieval.
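To get a rough sense of how close a YAML file is to that cap before seeding it, the two limits can be approximated offline. This is an illustrative sketch only: the 4-characters-per-token estimate is a crude heuristic rather than the backend's tokenizer, the thresholds are the approximate figures quoted above, and both function names are hypothetical.

```python
from __future__ import annotations

# Approximate Phase 0 limits quoted in this section.
MAX_SNIPPETS = 20
MAX_TOKENS = 1500


def estimate_tokens(text: str) -> int:
    """Crude estimate (about 4 characters per token); not the backend's tokenizer."""
    return max(1, len(text) // 4)


def exceeds_phase0_cap(bodies: list[str]) -> bool:
    """True if the snippet count or the estimated token total is over the cap."""
    total_tokens = sum(estimate_tokens(body) for body in bodies)
    return len(bodies) > MAX_SNIPPETS or total_tokens > MAX_TOKENS


if __name__ == "__main__":
    bodies = ["We are open Monday to Saturday, 9 AM to 6 PM EAT."] * 5
    print(exceeds_phase0_cap(bodies))
```

Treat the result as an early warning; the log line remains the authoritative signal.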

First-run checklist

Work through this list top-to-bottom whenever you stand up a fresh dev environment. Each item links to where it is done.

Docker Compose up and all services healthy — see Local dev runbook
Backend migrations applied (alembic upgrade head inside backend/) — see Dev setup
Tenant provisioned via POST /api/v1/admin/tenants — Step 1 above or Onboard a tenant
Services, staff, and staff schedules inserted — Step 2 above or the catalog dashboard at http://localhost:3010/admin/catalog
Knowledge snippets seeded with python -m scripts.seed_knowledge --tenant <slug> — Step 3 above
Test phone number verified with Meta (required for live WhatsApp smoke test) — Onboard a tenant Step 2 / First booking
Row counts verified via psql — Step 4 above
Payment sandbox configured if testing M-Pesa flows (mpesa_enabled: true on the tenant + Daraja sandbox creds in .env) — see ADR-0007
