Knowledge answers
What it does
Customers ask questions that structured data cannot answer: "Do I need to arrive early?", "Is there parking?", "What's your cancellation policy?", "Are you open on Christmas Eve?". Before Phase 0, all three inquiry intents (services, hours, other) answered from structured DB rows only. The other catch-all dead-ended with a "I'll check with the team" deflection because its raw_data was an empty dict.
Phase 0 fixes this with the simplest possible mechanism: curated per-tenant knowledge_snippets rows injected directly into the existing answer_shaper prompt. The model performs the "retrieval" by attention — no vector index, no embedding column, no retrieval subsystem. Because a single service business's entire knowledge base is on the order of tens of short snippets (not thousands of documents), the whole active KB fits comfortably in the prompt window. Ratiba ships grounded free-form answers from day one with a backend-only change: one new table, one new service function, one dispatcher line, one prompt revision.
Real embeddings + pgvector retrieval are deferred as YAGNI — ADR-0013 records both the decision and the observable graduation trigger that makes "not yet" expire with evidence rather than guesswork.
How it fits
The knowledge_snippets table
The table lives inside each tenant's schema (auto-scoped by the ADR-0002 search_path rule — no tenant_id column needed). For the full per-tenant schema routing design, see Identity and tenancy.
Columns:
| Column | Type | Notes |
|---|---|---|
id | UUID | PK |
category | text | One of: policy, facility, prep, service, hours, general |
title | text | Short label (used in structured log output) |
body | text | The snippet text injected into the prompt |
language | text | Default en; informational only — not filtered at query time |
is_active | bool | Only active rows are fetched |
created_at | timestamptz | |
updated_at | timestamptz | Used as the ORDER BY key — newest snippets surface first |
No embedding vector column. That is the single additive Phase-1 upgrade. Adding it later requires no schema changes to consumers — it is a pure ALTER TABLE ADD COLUMN + backfill on the existing table.
The language column is intentionally not used as a filter: an English-only KB still serves Swahili customers because answer_shaper translates at render time. A snippet authored in English appears in a Swahili reply with no extra configuration.
The fetch_snippets seam
app/services/knowledge.py::fetch_snippets(intent) is the single retrieval entry point. Phase 0 runs a category-scoped SELECT; Phase 1 swaps the internals for a top-k cosine query against the added embedding column. The function signature and every caller stay identical — the only forward-compat addition is a query: str kwarg (currently unused, reserved for the Phase-1 swap).
The dispatcher inject
Inside _dispatch_locked in app/orchestrator/dispatcher.py, the inquiry else branch (which handles services, hours, and other) does this:
raw_data = await _fetch_inquiry_raw_data(intent)
raw_data["knowledge"] = await fetch_snippets(intent)
One line. Snippets serialise into raw_data_json and ride into the user template — exactly the path personality dials already use. The system_message in answer_shaper.yaml is never touched by snippets, which preserves the prompt-cache invariant. For the full prompt-cache design and how answer_shaper.yaml is structured, see Personality dials.
How it flows
Category-to-intent routing
fetch_snippets maps intent to category filter via _CATEGORIES_BY_INTENT:
| Intent | Categories fetched |
|---|---|
services | service, general |
hours | hours, general |
other | all categories (no filter) |
The other catch-all deliberately fetches everything — it covers the widest possible question space (cancellation policy, parking, deposit rules, prep instructions). The trade-off is that other turns carry the full active KB in the prompt on every call; the cap keeps that bounded.
The cap and the graduation trigger
fetch_snippets enforces two limits simultaneously:
- Count cap:
limit=20(default) - Character cap:
max_chars=6000(default; roughly ~1500 tokens at 4 chars/token)
Whichever limit is hit first terminates the loop. When a row would push past either limit, the function emits a knowledge_overflow WARN structlog event and returns what it has:
knowledge_overflow | intent=other | returned=18 | total=31
This WARN is the Phase-0 → Phase-1 graduation trigger. When a tenant routinely overflows, stuffing the whole active KB into the prompt no longer fits and real retrieval (Phase 1: embeddings + pgvector + top-k cosine) is finally justified — for that tenant, with evidence. The YAGNI expires observably rather than by calendar guess. Track overflow frequency in the daily digest (see Observability).
What the structured log lines look like
On every other-intent turn the dispatcher emits a knowledge_gap_candidate WARNING (see Gap logging below). On a successful snippet fetch you will see an implicit absence of that event only when snippets fully covered the question — but there is no separate "hit" event in Phase 0 (the LLM performs the match silently by attention). The two observable events are:
# Overflow: fetch_snippets hit the cap
{"event": "knowledge_overflow", "intent": "other", "returned": 18, "total": 31, "level": "warning"}
# Gap candidate: every other-intent turn
{"event": "knowledge_gap_candidate", "question": "Do you do home visits?", "tenant_id": "...", "snippets_available": 18, "level": "warning"}
Seeding snippets
Snippets are hand-seeded for the M13 pilot. There is no conversational or dashboard authoring UI in Phase 0.
For the full seeding workflow — the scripts/seed_knowledge.py script, the per-tenant YAML format, and the idempotency contract — see Seed data.
Gap logging
On every other-intent turn, the dispatcher emits a knowledge_gap_candidate structured log event immediately after shape_answer returns:
if intent == "other":
logger.warning(
"knowledge_gap_candidate",
question=customer_text,
tenant_id=str(tenant_ctx.tenant_id),
snippets_available=len(raw_data.get("knowledge", [])),
)
Honest caveat about precision: other is the intent classifier's catch-all. Every message that is not a recognisable booking/cancel/reschedule/services/hours intent lands here — including greetings, thanks, and off-topic questions the KB genuinely cannot cover. The knowledge_gap_candidate event therefore logs catch-all questions, not a precise "the bot failed to answer this" signal. Precise miss-detection would require an LLM self-report ("did I answer this from the snippets?"), which breaks the plain-text output contract and is deferred.
The practical value is aggregate: when the daily WhatsApp digest surfaces recurring questions tagged knowledge_gap_candidate, you can review them and decide which warrant new snippets. See Observability for the digest pipeline.
No ADR-0006 handoff wiring in Phase 0. On a miss, the existing deflection path continues unchanged ("I'll check with the team"). The handoff escalation from knowledge gaps is deferred until pilot data shows which gaps are frequent enough to justify automatic escalation. For the handoff model, see ADR-0006.
Where it lives in code
| Concern | File | Key entry point |
|---|---|---|
| Retrieval seam | app/services/knowledge.py | fetch_snippets(intent, *, limit, max_chars) |
| Category-to-intent routing | app/services/knowledge.py | _CATEGORIES_BY_INTENT dict (line 25) |
| Dispatcher inject | app/orchestrator/dispatcher.py | _dispatch_locked inquiry else branch (line 1090–1091) |
| Gap candidate log | app/orchestrator/dispatcher.py | knowledge_gap_candidate WARN (line 1127–1133) |
| Prompt version | app/prompts/answer_shaper.yaml | version: 0.5.0 (0.3.0 added knowledge awareness; 0.4.0 the Savannah persona name; 0.5.0 asks for Markdown bullet lists for multi-item answers so they render as real lists, and KES 4,500 money formatting) |
Decisions
- ADR-0013 is the authoritative decision record: the no-RAG-RAG framing, the YAGNI deferral of pgvector, the
knowledge_overflowgraduation trigger, the Phase-1 upgrade path, and the alternatives considered (full zol-rag port, JSONB blob, seed-file-only approach, immediate ADR-0006 handoff wiring — all rejected).
Related
- Conversation FSM — the inquiry path inside the FSM that routes
services/hours/otherintents to AnswerShaper rather than a booking subgraph. - Personality dials — owns
answer_shaper.yaml, the prompt-cache invariant, and the user-template splice pattern that knowledge snippets follow. - Seed data — the
scripts/seed_knowledge.pyworkflow and per-tenant YAML format. - Observability — the daily digest that surfaces
knowledge_gap_candidateandknowledge_overflowevents. - Glossary — definitions for
knowledge_snippets,fetch_snippets,knowledge_gap_candidate,knowledge_overflow, and "no-RAG RAG".