ADR-0015: Pooled deferred staff allocation for bookings
Date: 2026-06-14
Status: Accepted
Deciders: Adrian (founder), Claude Opus 4.8 (design + implementation agent)
Context
Ratiba's booking flow has, since M-series, used an instant-binding model: the
customer (or, when they choose "anyone", the least-busy tie-break) selects a
therapist at the moment they pick a time, and appointments.staff_id is committed
at CONFIRM. collect_slot_node sets staff_id from the chosen SlotOption, and
find_available_slots (app/services/availability.py:153) enumerates per-staff
slots from now+1h to now+14d, either pinning one staff member or mixing the
qualified pool ordered by (start_at, day_busy, staff_id). The therapist is
therefore locked the instant a slot is picked.
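For reference, the ordering the "anyone" path relies on today can be sketched as follows. This is a minimal illustration, not the code in app/services/availability.py: the SlotOption fields shown, the day_busy count, and the helpers sort_key and mixed_pool are hypothetical stand-ins, and the assumption that the mixed pool offers each start time once (carrying its best-ranked therapist) is ours.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class SlotOption:
    # Hypothetical stand-in: only the fields the ordering needs.
    start_at: datetime
    staff_id: int
    day_busy: int  # appointments this therapist already has that day


def sort_key(slot: SlotOption) -> tuple:
    # Earliest time first; at equal times the least-busy therapist wins;
    # staff_id breaks any remaining tie deterministically.
    return (slot.start_at, slot.day_busy, slot.staff_id)


def mixed_pool(slots: list[SlotOption]) -> list[SlotOption]:
    # "Anyone" mode (assumed): one offer per start time, carrying the
    # best-ranked therapist for that time. Under instant-binding, that
    # therapist is what gets committed at CONFIRM.
    best: dict[datetime, SlotOption] = {}
    for slot in sorted(slots, key=sort_key):
        best.setdefault(slot.start_at, slot)
    return list(best.values())
```

The key point for this ADR is that the therapist is already chosen by the time the customer sees the time, which is exactly what the decision below decouples.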
This fights how the target business (a Nairobi spa) actually operates. The owner wants to:
- Let customers book by time, then allocate who serves them as a batch against the next day's full booking sheet (front-desk reality).
- Make "I want a specific therapist" an opt-in feature, not the default — most customers don't care who, and locking a name at booking removes the manager's freedom to balance load.
- Bring in a temporary masseuse when demand spikes, and have that capacity appear immediately.
- Absorb a same-day absence (sick / no-show) or a multi-day holiday by reshuffling the team — ideally by just telling the assistant in chat ("Grace is out sick today") and having the slots and capacity reallocate.
Instant-binding supports none of these gracefully: a locked staff_id can't be
silently re-pooled, capacity is per-person rather than per-pool, and every
booking is pinned to a named person. The substrate, however, is already
half-ready: appointments.staff_id is nullable, the /admin/calendar grid
(PR #19/#20) already renders an "Unassigned" column, staff_blocks exists for
absences, and public.tenants already carries the per-tenant boolean-flag pattern
(mpesa_enabled, voice_enabled, channel_steering_enabled, …).
A live two-browser QA transcript triggered this redesign: a customer asked for "a
female masseuse" and the instant-binding flow mistook it for a fresh service
booking. The fix for that (the "Excellent Booking Conversation" wave, spec
2026-06-14-excellent-booking-conversation-design.md) was partially shipped
(PR-A, item ② unknown-therapist, merged as 27f9a41) before this larger model
question surfaced; its PR-B (slot picker) and PR-C (gender) were paused and are
folded into this model's phases instead of being built on the instant-binding
assumption.
Decision
Adopt pooled deferred allocation: commit the time at booking; assign the therapist later.
- Assignment status. Add appointments.assignment_status ('provisional' | 'bound' | 'finalized', NOT NULL, default 'provisional'). staff_id stays nullable and holds the provisionally or finally assigned therapist. The customer is never shown a name while a row is provisional.
- Time-first booking (default, flag OFF). Reuse find_available_slots unchanged for offering times; the FSM no longer surfaces or commits a therapist name at slot-pick. Confirmation reads "we'll confirm your therapist before your visit."
- Provisional auto-pick at booking. The system provisionally assigns the least-busy free therapist (today's _sort_key) so capacity is honest and a fallback always exists, but the row is provisional and the name is hidden.
- Capacity = the provisional guard. No separate capacity subsystem: the engine already offers a time only when a pooled therapist is free, and the provisional assignment consumes that slot. Temps raise capacity (a staff row); absences (staff_blocks) lower it; both take effect automatically.
- Manager reallocation + cutoff finalize. The /admin/calendar day-grid lets the manager reassign provisional appointments across the pool of qualified, free therapists. A per-tenant cutoff (default 18:00 the day before) flips provisional → finalized and notifies each customer of their therapist's name; same-day bookings finalize and notify immediately.
- Specific-person = opt-in flag. New public.tenants.staff_preference_enabled (default FALSE). When TRUE, the staff step offers "Any available" (pooled) vs "Choose a therapist" (→ assignment_status='bound', fixed, not reallocatable). When FALSE, the staff step is skipped.
- Gender as a pool constraint. A gender preference filters the offered capacity and the provisional pick to matching-gender staff (falling back gracefully to all staff when no gender is on file), rather than picking a person. The staff.gender data/UI/Q&A layer (from the paused PR-C) ships unchanged.
- Temporary staff. A temp is a normal staff row with is_temporary=true plus a bounded schedule; it expands find_available_slots capacity immediately, is assignable and reallocatable, and is reaped after its dates by the existing 3 AM EAT job (ADR-0007).
- Conversational absence. An admin NL verb or the slash command /absence <name> <range> (under the ADR-0014 two-turn confirm) writes a staff_block; the absent therapist's provisional appointments are silently re-pooled where they fit; overflow and bound-to-the-absent-person conflicts land in a manager "needs attention" queue. No customer is messaged until the manager acts.
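A minimal in-memory sketch of how the provisional pick doubles as the capacity guard, with gender applied as a pool filter rather than a person-pick. Staff, is_free, candidates and provisional_pick are hypothetical names; busy intervals stand in for both existing appointments and staff_blocks; and reading "none on file" as "ignore the preference when no one in the pool has a gender recorded" is an assumption.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional


@dataclass
class Staff:
    # Hypothetical in-memory model; the real engine reads staff,
    # appointments and staff_blocks from the database.
    id: int
    gender: Optional[str] = None  # None means "not on file"
    day_busy: int = 0  # appointments already held that day
    # (start, end) intervals covering both appointments and staff_blocks
    busy: list = field(default_factory=list)


def is_free(staff: Staff, start: datetime, end: datetime) -> bool:
    # Free when [start, end) overlaps none of the busy intervals.
    return all(end <= b_start or start >= b_end for b_start, b_end in staff.busy)


def candidates(pool: list[Staff], start: datetime, end: datetime,
               gender: Optional[str] = None) -> list[Staff]:
    free = [s for s in pool if is_free(s, start, end)]
    # Gender narrows the pool only if some staff member has a gender on
    # file; otherwise the preference is ignored (graceful fallback).
    if gender is not None and any(s.gender is not None for s in pool):
        free = [s for s in free if s.gender == gender]
    return free


def provisional_pick(pool: list[Staff], start: datetime, end: datetime,
                     gender: Optional[str] = None) -> Optional[int]:
    free = candidates(pool, start, end, gender)
    if not free:
        return None  # no capacity: this time would not have been offered
    chosen = min(free, key=lambda s: (s.day_busy, s.id))  # least-busy
    # Occupying the interval is the capacity guard: the next booking at
    # this time sees one fewer free therapist.
    chosen.busy.append((start, end))
    chosen.day_busy += 1
    return chosen.id
```

There is no counter anywhere: adding a temp is just another Staff entry, and an absence is just another busy interval.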
Rollout is phased (see the companion spec §"Phasing"): Core → Manager
reallocation + finalize → Temp staff → Conversational absence → Specific-person +
gender. Each phase is its own PR with the clean -n 0 sweep as the done-gate.
Decision context
- Latency: none added on the hot path. Offering still runs find_available_slots (unchanged), and provisional assignment reuses the least-busy sort already computed. Finalize-and-notify is a batched daily job (per-tenant cutoff), off the request path. Re-pool on absence is an admin-triggered batch, not customer-facing.
- Dependency surface: zero new packages. One enum column, two booleans and one cutoff setting; reuses staff_blocks, the nullable staff_id, the calendar grid, the notification sink, the ADR-0014 admin NL router, and the ADR-0007 reaper.
- Debuggability: assignment_status makes the lifecycle explicit and queryable (SELECT … WHERE assignment_status='provisional'); re-pool and finalize emit structured logs; the "needs attention" queue surfaces every un-absorbable case rather than failing silently. The failure mode (a provisional appointment with no finalizable therapist) is visible on the grid, not buried.
- Reversibility: moderate. The flag (staff_preference_enabled) lets a tenant fall back toward person-picking, and the schema changes are purely additive (new columns with defaults). But decoupling the name from slot-pick in the FSM, plus the finalize job, is a behavioural change that would take edits across a few files to revert fully, not a single flag flip. Estimate: ~1 day to revert to instant-binding if the pilot rejects it.
- Blast radius: the booking FSM's COLLECT_STAFF/COLLECT_SLOT/CONFIRM path, the appointment write, the calendar grid, and the admin NL router. Additive to the schema; substitutive in the FSM (the name is no longer bound at slot-pick). Phased so each PR's radius is bounded.
- Alternative considered: keep instant-binding and only hide the name (rejected — does not deliver manager batch-allocation, temp surge, or graceful absence, which are the whole point). See Alternatives.
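The schema side of the Debuggability point can be sketched with sqlite3 as a stand-in for Postgres. Assumptions: SQLite has no enum type, so a CHECK constraint plays that role; column types are simplified; the per-tenant cutoff setting is omitted because its column name is not specified here; provisional_rows is a hypothetical helper; and the sample rows are illustrative only.

```python
import sqlite3

# SQLite stand-in for the Postgres schema; CHECK emulates the enum.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tenants (
    id INTEGER PRIMARY KEY,
    staff_preference_enabled BOOLEAN NOT NULL DEFAULT 0  -- opt-in flag
);
CREATE TABLE staff (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    is_temporary BOOLEAN NOT NULL DEFAULT 0
);
CREATE TABLE appointments (
    id INTEGER PRIMARY KEY,
    start_at TEXT NOT NULL,
    staff_id INTEGER REFERENCES staff(id),  -- still nullable
    assignment_status TEXT NOT NULL DEFAULT 'provisional'
        CHECK (assignment_status IN ('provisional', 'bound', 'finalized'))
);
""")

# Illustrative rows: one pooled (defaults to provisional), one bound.
conn.execute("INSERT INTO staff (id, name) VALUES (1, 'Grace')")
conn.execute("INSERT INTO appointments (start_at, staff_id) VALUES ('2026-06-15 10:00', 1)")
conn.execute("INSERT INTO appointments (start_at, staff_id, assignment_status) "
             "VALUES ('2026-06-15 11:00', 1, 'bound')")


def provisional_rows(conn: sqlite3.Connection) -> list[tuple]:
    # The lifecycle is directly queryable, e.g. for the grid or a debug check.
    return conn.execute(
        "SELECT id, start_at, staff_id FROM appointments "
        "WHERE assignment_status = 'provisional' ORDER BY start_at"
    ).fetchall()
```

Any value outside the three states is rejected by the CHECK constraint, which in Postgres the enum type would enforce instead.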
Consequences
Positive
- Matches how the pilot business actually schedules (book time now, allocate staff against tomorrow's sheet).
- Absences and surges become graceful: a one-sentence chat message re-pools provisional appointments and recomputes capacity; a temp instantly adds capacity.
- The manager regains load-balancing control they lost under instant-binding.
- "Specific therapist" is honoured as a deliberate feature, not forced on every customer; the fast "any available" path is the default.
- Gender preference becomes correct (a pool constraint) instead of a brittle person-pick.
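How staff_preference_enabled might gate the staff step so that "Any available" stays the default path, sketched under assumptions: staff_step_options, resolve_choice and StaffChoice are hypothetical helpers, and a pooled choice leaves staff_id empty only until the provisional pick fills it in.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class StaffChoice:
    # Hypothetical result of the staff step.
    staff_id: Optional[int]  # None = pooled; filled in later by the provisional pick
    assignment_status: str   # 'provisional' (pooled) or 'bound' (named)


def staff_step_options(staff_preference_enabled: bool) -> list[str]:
    # Flag OFF (the default): the staff step is skipped entirely.
    if not staff_preference_enabled:
        return []
    return ["Any available", "Choose a therapist"]


def resolve_choice(staff_preference_enabled: bool,
                   chosen_staff_id: Optional[int] = None) -> StaffChoice:
    # A named therapist is honoured only when the tenant opted in; the
    # row is then 'bound', i.e. fixed and not reallocatable per the decision.
    if staff_preference_enabled and chosen_staff_id is not None:
        return StaffChoice(chosen_staff_id, "bound")
    return StaffChoice(None, "provisional")
```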
Negative
- The customer leaves the booking without a named therapist and must trust a later confirmation — a UX regression for customers who want to know "who" immediately (mitigated by the day-before name notification and the opt-in flag).
- New lifecycle state (assignment_status) plus a finalize job plus a re-pool routine means more moving parts than instant-binding: more to test and operate.
- The "needs attention" queue depends on the manager acting; if it is ignored, some appointments stay provisional past the cutoff (mitigated: the cutoff job can finalize with the least-busy fallback and alert, so no customer is left with no one).
- Bound (specific-person) appointments to an absent therapist can't auto-resolve — they require manager action and a customer message.
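The cutoff lifecycle and the fallback mitigation can be sketched as follows. Assumptions, all hypothetical: a fixed UTC+3 offset for EAT (Nairobi observes no daylight saving time), an Appointment record with a needs_attention flag standing in for the manager queue, and the callbacks pick_fallback, notify and alert. Treating a next-day booking made after the cutoff like a same-day booking (finalize immediately) is our reading, not stated in the decision.

```python
from dataclasses import dataclass
from datetime import datetime, time, timedelta, timezone
from typing import Callable, Optional

EAT = timezone(timedelta(hours=3), "EAT")  # Nairobi: UTC+3, no DST


@dataclass
class Appointment:
    id: int
    start_at: datetime  # timezone-aware
    staff_id: Optional[int]
    assignment_status: str = "provisional"
    needs_attention: bool = False  # hypothetical marker for the manager queue


def finalize_deadline(start_at: datetime, cutoff: time = time(18, 0)) -> datetime:
    # Default cutoff: 18:00 tenant-local time on the day before the visit.
    day_before = start_at.astimezone(EAT).date() - timedelta(days=1)
    return datetime.combine(day_before, cutoff, tzinfo=EAT)


def finalizes_immediately(start_at: datetime, booked_at: datetime) -> bool:
    # Same-day bookings are always past the deadline, so they finalize at once.
    return booked_at >= finalize_deadline(start_at)


def run_cutoff_job(appts: list[Appointment], now: datetime,
                   pick_fallback: Callable[[Appointment], Optional[int]],
                   notify: Callable[[Appointment], None],
                   alert: Callable[[Appointment], None]) -> None:
    for a in appts:
        if a.assignment_status != "provisional" or now < finalize_deadline(a.start_at):
            continue  # bound/finalized rows, or not yet due
        if a.needs_attention or a.staff_id is None:
            alert(a)  # the manager never resolved this row
            fallback = pick_fallback(a)  # least-busy free therapist, if any
            if fallback is None:
                continue  # stays provisional and visible on the grid
            a.staff_id, a.needs_attention = fallback, False
        a.assignment_status = "finalized"
        notify(a)  # tells the customer their therapist's name
```

Bound rows are skipped because the customer already chose the name; only provisional rows are finalized and announced.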
Neutral
- find_available_slots is unchanged for offering; only the consumption (no name bound) and the new provisional write differ.
- The calendar's existing "Unassigned" column gains real meaning.
- PR-B (slot picker) and PR-C (gender) are re-homed into this model's phases rather than shipped standalone.
Alternatives Considered
Alternative 1: Keep instant-binding, only hide the therapist name
- Pick the least-busy staff at booking exactly as today, but don't show the name until later.
- Why rejected: delivers none of the operating requirements. There is no manager
batch allocation against the next day's sheet, no graceful absence re-pool (a
locked staff_id still can't be silently moved), and no clean temp surge. It is cosmetic.
Alternative 2: Day-only or day+time-window booking
- The customer picks a day (or morning/afternoon) and the manager builds the exact times + staff.
- Why rejected: removes the customer's exact-time certainty and forces a follow-up message just to learn their time; higher manager burden per booking. Adrian chose exact-time-with-deferred-staff (fork A1) precisely to keep time certainty.
Alternative 3: True pool with a separate per-time capacity counter
- Leave staff_id genuinely NULL at booking and enforce capacity with a new per-minute concurrency counter over the pool; assign every name at allocation time.
- Why rejected: adds a whole capacity-counter subsystem, plus an allocation step that must run or appointments stay nameless, for no benefit over reusing the provisional least-busy assignment as the guard. The provisional-assignment approach reuses the existing engine and is strictly simpler.
Alternative 4: Auto-resolve absences by messaging/rescheduling customers
- On an absence, immediately message affected customers or auto-move their appointments.
- Why rejected: fires before the manager can add a temp that would have covered them, over-notifying customers and imposing changes they didn't choose. Manager-in-the-loop for the hard cases (silent re-pool what fits; flag the rest) was chosen instead.
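The manager-in-the-loop policy chosen over auto-resolution can be sketched as a single pass over the absent therapist's appointments. repool_absence and the pick callback are hypothetical; pick is assumed to return the least-busy therapist free at that time (or None), and since the staff_block is written first, the absent therapist is no longer free. Routing already-finalized rows to the queue alongside bound ones is an assumption; the decision only names bound conflicts and overflow.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Callable, Optional


@dataclass
class Appointment:
    id: int
    start_at: datetime
    end_at: datetime
    staff_id: Optional[int]
    assignment_status: str = "provisional"


def repool_absence(appts: list[Appointment], absent_id: int,
                   block_start: datetime, block_end: datetime,
                   pick: Callable[[Appointment], Optional[int]]) -> list[Appointment]:
    """Silently move what fits; return what the manager must decide on."""
    needs_attention: list[Appointment] = []
    for a in appts:
        if a.staff_id != absent_id:
            continue
        if a.end_at <= block_start or a.start_at >= block_end:
            continue  # outside the absence window
        if a.assignment_status != "provisional":
            # Bound (or already finalized) rows: the customer expects this person.
            needs_attention.append(a)
            continue
        replacement = pick(a)  # least-busy free therapist at that time, or None
        if replacement is None:
            needs_attention.append(a)  # overflow: nobody free
        else:
            a.staff_id = replacement  # silent re-pool: no customer message
    return needs_attention
```

Nothing here messages a customer; only the returned queue reaches the manager, who can still add a temp before deciding.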