Knowledge answers

What it does

Customers ask questions that structured data cannot answer: "Do I need to arrive early?", "Is there parking?", "What's your cancellation policy?", "Are you open on Christmas Eve?". Before Phase 0, all three inquiry intents (services, hours, other) answered from structured DB rows only. The other catch-all dead-ended with a "I'll check with the team" deflection because its raw_data was an empty dict.

Phase 0 fixes this with the simplest possible mechanism: curated per-tenant knowledge_snippets rows injected directly into the existing answer_shaper prompt. The model performs the "retrieval" by attention — no vector index, no embedding column, no retrieval subsystem. Because a single service business's entire knowledge base is on the order of tens of short snippets (not thousands of documents), the whole active KB fits comfortably in the prompt window. Ratiba ships grounded free-form answers from day one with a backend-only change: one new table, one new service function, one dispatcher line, one prompt revision.

Real embeddings + pgvector retrieval are deferred as YAGNI — ADR-0013 records both the decision and the observable graduation trigger that makes "not yet" expire with evidence rather than guesswork.

How it fits

The `knowledge_snippets` table

The table lives inside each tenant's schema (auto-scoped by the ADR-0002 search_path rule — no tenant_id column needed). For the full per-tenant schema routing design, see Identity and tenancy.

Columns:

Column	Type	Notes
`id`	UUID	PK
`category`	text	One of: `policy`, `facility`, `prep`, `service`, `hours`, `general`
`title`	text	Short label (used in structured log output)
`body`	text	The snippet text injected into the prompt
`language`	text	Default `en`; informational only — not filtered at query time
`is_active`	bool	Only active rows are fetched
`created_at`	timestamptz
`updated_at`	timestamptz	Used as the `ORDER BY` key — newest snippets surface first

No embedding vector column. That is the single additive Phase-1 upgrade. Adding it later requires no schema changes to consumers — it is a pure ALTER TABLE ADD COLUMN + backfill on the existing table.

The language column is intentionally not used as a filter: an English-only KB still serves Swahili customers because answer_shaper translates at render time. A snippet authored in English appears in a Swahili reply with no extra configuration.

The `fetch_snippets` seam

app/services/knowledge.py::fetch_snippets(intent) is the single retrieval entry point. Phase 0 runs a category-scoped SELECT; Phase 1 swaps the internals for a top-k cosine query against the added embedding column. The function signature and every caller stay identical — the only forward-compat addition is a query: str kwarg (currently unused, reserved for the Phase-1 swap).

The dispatcher inject

Inside _dispatch_locked in app/orchestrator/dispatcher.py, the inquiry else branch (which handles services, hours, and other) does this:

raw_data = await _fetch_inquiry_raw_data(intent)
raw_data["knowledge"] = await fetch_snippets(intent)

One line. Snippets serialise into raw_data_json and ride into the user template — exactly the path personality dials already use. The system_message in answer_shaper.yaml is never touched by snippets, which preserves the prompt-cache invariant. For the full prompt-cache design and how answer_shaper.yaml is structured, see Personality dials.

How it flows

Category-to-intent routing

fetch_snippets maps intent to category filter via _CATEGORIES_BY_INTENT:

Intent	Categories fetched
`services`	`service`, `general`
`hours`	`hours`, `general`
`other`	all categories (no filter)

The other catch-all deliberately fetches everything — it covers the widest possible question space (cancellation policy, parking, deposit rules, prep instructions). The trade-off is that other turns carry the full active KB in the prompt on every call; the cap keeps that bounded.

The cap and the graduation trigger

fetch_snippets enforces two limits simultaneously:

Count cap: limit=20 (default)
Character cap: max_chars=6000 (default; roughly ~1500 tokens at 4 chars/token)

Whichever limit is hit first terminates the loop. When a row would push past either limit, the function emits a knowledge_overflow WARN structlog event and returns what it has:

knowledge_overflow | intent=other | returned=18 | total=31

This WARN is the Phase-0 → Phase-1 graduation trigger. When a tenant routinely overflows, stuffing the whole active KB into the prompt no longer fits and real retrieval (Phase 1: embeddings + pgvector + top-k cosine) is finally justified — for that tenant, with evidence. The YAGNI expires observably rather than by calendar guess. Track overflow frequency in the daily digest (see Observability).

What the structured log lines look like

On every other-intent turn the dispatcher emits a knowledge_gap_candidate WARNING (see Gap logging below). On a successful snippet fetch you will see an implicit absence of that event only when snippets fully covered the question — but there is no separate "hit" event in Phase 0 (the LLM performs the match silently by attention). The two observable events are:

# Overflow: fetch_snippets hit the cap
{"event": "knowledge_overflow", "intent": "other", "returned": 18, "total": 31, "level": "warning"}

# Gap candidate: every other-intent turn
{"event": "knowledge_gap_candidate", "question": "Do you do home visits?", "tenant_id": "...", "snippets_available": 18, "level": "warning"}

Seeding snippets

Snippets are hand-seeded for the M13 pilot. There is no conversational or dashboard authoring UI in Phase 0.

For the full seeding workflow — the scripts/seed_knowledge.py script, the per-tenant YAML format, and the idempotency contract — see Seed data.

Gap logging

On every other-intent turn, the dispatcher emits a knowledge_gap_candidate structured log event immediately after shape_answer returns:

if intent == "other":
    logger.warning(
        "knowledge_gap_candidate",
        question=customer_text,
        tenant_id=str(tenant_ctx.tenant_id),
        snippets_available=len(raw_data.get("knowledge", [])),
    )

Honest caveat about precision: other is the intent classifier's catch-all. Every message that is not a recognisable booking/cancel/reschedule/services/hours intent lands here — including greetings, thanks, and off-topic questions the KB genuinely cannot cover. The knowledge_gap_candidate event therefore logs catch-all questions, not a precise "the bot failed to answer this" signal. Precise miss-detection would require an LLM self-report ("did I answer this from the snippets?"), which breaks the plain-text output contract and is deferred.

The practical value is aggregate: when the daily WhatsApp digest surfaces recurring questions tagged knowledge_gap_candidate, you can review them and decide which warrant new snippets. See Observability for the digest pipeline.

No ADR-0006 handoff wiring in Phase 0. On a miss, the existing deflection path continues unchanged ("I'll check with the team"). The handoff escalation from knowledge gaps is deferred until pilot data shows which gaps are frequent enough to justify automatic escalation. For the handoff model, see ADR-0006.

Where it lives in code

Concern	File	Key entry point
Retrieval seam	`app/services/knowledge.py`	`fetch_snippets(intent, *, limit, max_chars)`
Category-to-intent routing	`app/services/knowledge.py`	`_CATEGORIES_BY_INTENT` dict (line 25)
Dispatcher inject	`app/orchestrator/dispatcher.py`	`_dispatch_locked` inquiry `else` branch (line 1090–1091)
Gap candidate log	`app/orchestrator/dispatcher.py`	`knowledge_gap_candidate` WARN (line 1127–1133)
Prompt version	`app/prompts/answer_shaper.yaml`	`version: 0.5.0` (0.3.0 added knowledge awareness; 0.4.0 the Savannah persona name; 0.5.0 asks for Markdown bullet lists for multi-item answers so they render as real lists, and `KES 4,500` money formatting)

Decisions

ADR-0013 is the authoritative decision record: the no-RAG-RAG framing, the YAGNI deferral of pgvector, the knowledge_overflow graduation trigger, the Phase-1 upgrade path, and the alternatives considered (full zol-rag port, JSONB blob, seed-file-only approach, immediate ADR-0006 handoff wiring — all rejected).

Conversation FSM — the inquiry path inside the FSM that routes services / hours / other intents to AnswerShaper rather than a booking subgraph.
Personality dials — owns answer_shaper.yaml, the prompt-cache invariant, and the user-template splice pattern that knowledge snippets follow.
Seed data — the scripts/seed_knowledge.py workflow and per-tenant YAML format.
Observability — the daily digest that surfaces knowledge_gap_candidate and knowledge_overflow events.
Glossary — definitions for knowledge_snippets, fetch_snippets, knowledge_gap_candidate, knowledge_overflow, and "no-RAG RAG".

What it does​

How it fits​

The knowledge_snippets table​

The fetch_snippets seam​

The dispatcher inject​

How it flows​

Category-to-intent routing​

The cap and the graduation trigger​

What the structured log lines look like​

Seeding snippets​

Gap logging​

Where it lives in code​

Decisions​

Related​