
ADR-0015: Pooled deferred staff allocation for bookings

Date: 2026-06-14
Status: Accepted
Deciders: Adrian (founder), Claude Opus 4.8 (design + implementation agent)

Context

Since the M-series, Ratiba's booking flow has used an instant-binding model: the customer (or the "anyone" least-busy tie-break) selects a therapist at the moment they pick a time, and appointments.staff_id is committed at CONFIRM. Concretely, collect_slot_node sets staff_id from the chosen SlotOption, and find_available_slots (app/services/availability.py:153) enumerates per-staff slots across now+1h..+14d, then either pins one staff member or mixes the qualified pool using a (start_at, day_busy, staff_id) sort. The therapist is locked the instant a slot is picked.
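
The mixed-pool ordering described above can be sketched as a plain tuple sort key. This is a minimal illustration, not the code in app/services/availability.py: the SlotOption fields shown here, the integer staff_id, and the reading of day_busy as "bookings that staff member already has that day" are all assumptions.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class SlotOption:
    # Hypothetical shape; the real SlotOption may carry different fields.
    start_at: datetime
    staff_id: int
    day_busy: int  # assumed: bookings this staff member already has that day


def sort_key(option: SlotOption) -> tuple[datetime, int, int]:
    """Earliest start first, then least-busy staff, then staff_id as a stable tie-break."""
    return (option.start_at, option.day_busy, option.staff_id)


def order_mixed_pool(options: list[SlotOption]) -> list[SlotOption]:
    """Order slots from the whole qualified pool the way the 'anyone' path would."""
    return sorted(options, key=sort_key)
```

Because the staff_id is the last element of the key, two equally busy therapists at the same start time always come out in the same order, which keeps the offered slots deterministic.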

This fights how the target business (a Nairobi spa) actually operates. The owner wants to:

  1. Let customers book by time, then allocate who serves them as a batch against the next day's full booking sheet (front-desk reality).
  2. Make "I want a specific therapist" an opt-in feature, not the default — most customers don't care who, and locking a name at booking removes the manager's freedom to balance load.
  3. Bring in a temporary masseuse when demand spikes, and have that capacity appear immediately.
  4. Absorb a same-day absence (sick / no-show) or a multi-day holiday by reshuffling the team — ideally by just telling the assistant in chat ("Grace is out sick today") and having the slots and capacity reallocate.

Instant-binding supports none of these gracefully: a locked staff_id can't be silently re-pooled, capacity is tracked per person rather than per pool, and the customer is always asked to choose a therapist. The substrate, however, is already half-ready: appointments.staff_id is nullable, the /admin/calendar grid (PR #19/#20) already renders an "Unassigned" column, staff_blocks exists for absences, and public.tenants already carries the per-tenant boolean-flag pattern (mpesa_enabled, voice_enabled, channel_steering_enabled, …).

A live two-browser QA transcript triggered this redesign: a customer asked for "a female masseuse" and the instant-binding flow mishandled the request as a fresh service booking. The fix (the "Excellent Booking Conversation" wave, spec 2026-06-14-excellent-booking-conversation-design.md) was partially shipped before this larger model question surfaced: PR-A (item ② unknown-therapist) merged as 27f9a41. Its PR-B (slot picker) and PR-C (gender) were paused and are folded into this model's phases instead of being built on the instant-binding assumption.

Decision

Adopt pooled deferred allocation: commit the time at booking; assign the therapist later.

  1. Assignment status. Add appointments.assignment_status ('provisional' | 'bound' | 'finalized', NOT NULL, default 'provisional'). staff_id stays nullable and holds the provisionally/finally assigned therapist. The customer is never shown a name while a row is provisional.

  2. Time-first booking (default, flag OFF). Reuse find_available_slots unchanged for offering times; the FSM no longer surfaces or commits a therapist name at slot-pick. Confirmation reads "we'll confirm your therapist before your visit."

  3. Provisional auto-pick at booking. The system provisionally assigns the least-busy free therapist (today's _sort_key) so capacity is honest and a fallback always exists — but the row is provisional and the name is hidden.

  4. Capacity = the provisional guard. No separate capacity subsystem: the engine already only offers a time when a pooled therapist is free; provisional assignment consumes that slot. Temps raise capacity (a staff row); absences (staff_blocks) lower it — both automatically.

  5. Manager reallocation + cutoff finalize. The /admin/calendar day-grid lets the manager reassign provisional appointments across the qualified+free pool. A per-tenant cutoff (default 18:00 the day before) flips provisional → finalized and notifies each customer their therapist's name; same-day bookings finalize + notify immediately.

  6. Specific-person = opt-in flag. New public.tenants.staff_preference_enabled (default FALSE). When TRUE the staff step offers "Any available" (pooled) vs "Choose a therapist" (→ assignment_status='bound', fixed, not reallocatable). When FALSE the staff step is skipped.

  7. Gender as a pool constraint. A gender preference filters the offered capacity and the provisional pick to matching-gender staff (graceful fallback to all when none on file), rather than picking a person. The staff.gender data/UI/Q&A layer (from the paused PR-C) ships unchanged.

  8. Temporary staff. A temp is a normal staff row with is_temporary=true + a bounded schedule; it expands find_available_slots capacity immediately, is assignable/reallocatable, and is reaped after its dates by the existing 3 AM EAT job (ADR-0007).

  9. Conversational absence. An admin NL verb + slash /absence <name> <range> (under the ADR-0014 two-turn confirm) writes a staff_block; the absent therapist's provisional appointments are silently re-pooled where they fit; overflow + bound-to-the-absent-person conflicts land in a manager "needs attention" queue. No customer is messaged until the manager acts.
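
The re-pool behaviour in item 9 can be sketched as follows. This is an in-memory illustration under stated assumptions, not the shipped routine: the Appointment dataclass, the repool_for_absence name, and integer ids are hypothetical; the caller is assumed to pass only appointments that fall inside the absence range; service qualification is ignored; and treating already-finalized appointments like bound ones (queued for the manager) is a guess, since the ADR only names provisional and bound rows.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Appointment:
    # Hypothetical in-memory stand-in for an appointments row.
    id: int
    start_at: datetime
    end_at: datetime
    staff_id: int
    assignment_status: str  # 'provisional' | 'bound' | 'finalized'


def overlaps(a: Appointment, b: Appointment) -> bool:
    """True when the two half-open intervals [start_at, end_at) intersect."""
    return a.start_at < b.end_at and b.start_at < a.end_at


def repool_for_absence(appointments, absent_id, pool_ids):
    """Silently move the absent therapist's provisional appointments.

    Returns the "needs attention" list: provisional appointments that fit
    nowhere, plus bound/finalized ones that must not be moved silently.
    """
    needs_attention = []
    affected = sorted(
        (a for a in appointments if a.staff_id == absent_id),
        key=lambda a: a.start_at,
    )
    for appt in affected:
        if appt.assignment_status != "provisional":
            needs_attention.append(appt)
            continue
        # Least-busy remaining therapist first, staff id as tie-break.
        candidates = sorted(
            (s for s in pool_ids if s != absent_id),
            key=lambda s: (sum(1 for a in appointments if a.staff_id == s), s),
        )
        for staff_id in candidates:
            clash = any(
                a.staff_id == staff_id and overlaps(a, appt) for a in appointments
            )
            if not clash:
                appt.staff_id = staff_id  # status stays 'provisional'
                break
        else:
            # No pooled therapist is free at this time: overflow.
            needs_attention.append(appt)
    return needs_attention
```

Nothing in this sketch messages a customer; it only reassigns or queues, matching the rule that no customer is contacted until the manager acts.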

Rollout is phased (see the companion spec §"Phasing"): Core → Manager reallocation + finalize → Temp staff → Conversational absence → Specific-person + gender. Each phase is its own PR with the clean -n 0 sweep as the done-gate.
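
For the "Manager reallocation + finalize" phase, the cutoff rule from item 5 (default 18:00 the day before, same-day bookings finalize immediately) can be sketched as a small date calculation. The function names, the fixed UTC+3 offset standing in for the tenant's time zone, and the requirement that inputs be timezone-aware are assumptions for illustration.

```python
from datetime import datetime, time, timedelta, timezone

# Assumed tenant time zone: East Africa Time is UTC+3 with no daylight saving.
EAT = timezone(timedelta(hours=3), "EAT")
DEFAULT_CUTOFF = time(18, 0)  # "18:00 the day before"


def finalize_cutoff(appointment_start: datetime, cutoff: time = DEFAULT_CUTOFF) -> datetime:
    """Return the moment a provisional appointment should flip to finalized.

    Expects a timezone-aware appointment_start.
    """
    local_start = appointment_start.astimezone(EAT)
    day_before = local_start.date() - timedelta(days=1)
    return datetime.combine(day_before, cutoff, tzinfo=EAT)


def should_finalize_now(
    appointment_start: datetime, now: datetime, cutoff: time = DEFAULT_CUTOFF
) -> bool:
    """True once the cutoff has passed.

    A booking made on the day of the appointment is always past its cutoff,
    so it finalizes (and notifies) immediately.
    """
    return now >= finalize_cutoff(appointment_start, cutoff)
```

A booking created after the cutoff but before midnight also evaluates to true here; whether that case should notify at once is a product decision the sketch does not settle.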

Decision context

  • Latency: none added on the hot path — offering still runs find_available_slots (unchanged); provisional assignment is the same least-busy sort already computed. The finalize-and-notify is a batched daily job (per-tenant cutoff), off the request path. Re-pool on absence is an admin-triggered batch, not customer-facing.
  • Dependency surface: zero new packages. One enum column + two booleans + one cutoff setting; reuses staff_blocks, the nullable staff_id, the calendar grid, the notification sink, the ADR-0014 admin NL router, and the ADR-0007 reaper.
  • Debuggability: assignment_status makes the lifecycle explicit and queryable (SELECT … WHERE assignment_status='provisional'); re-pool + finalize emit structured logs; the "needs attention" queue surfaces every un-absorbable case rather than failing silently. The failure mode (a provisional appt with no finalizable therapist) is visible on the grid, not buried.
  • Reversibility: moderate. The flag (staff_preference_enabled) lets a tenant fall back toward person-picking; the schema additions are additive/nullable. But the FSM's decouple-name-from-slot-pick and the finalize job are a behavioural change that would take a few files to fully revert — not a single flag flip. Estimate: ~1 day to revert to instant-binding if the pilot rejects it.
  • Blast radius: the booking FSM's COLLECT_STAFF/COLLECT_SLOT/CONFIRM path, the appointment write, the calendar grid, and the admin NL router. Additive to the schema; substitutive in the FSM (the name is no longer bound at slot-pick). Phased so each PR's radius is bounded.
  • Alternative considered: keep instant-binding and only hide the name (rejected — does not deliver manager batch-allocation, temp surge, or graceful absence, which are the whole point). See Alternatives.
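
The debuggability point can be illustrated with a self-contained sketch of the new column and the query it enables. SQLite is used here only so the example runs on its own; the table is reduced to a few hypothetical columns, and a CHECK constraint stands in for whatever enum mechanism the production database uses.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE appointments (
        id INTEGER PRIMARY KEY,
        start_at TEXT NOT NULL,
        staff_id INTEGER,  -- stays nullable, as in the ADR
        assignment_status TEXT NOT NULL DEFAULT 'provisional'
            CHECK (assignment_status IN ('provisional', 'bound', 'finalized'))
    );
    """
)

# A pooled booking: status falls back to the 'provisional' default.
conn.execute(
    "INSERT INTO appointments (start_at, staff_id) VALUES (?, ?)",
    ("2026-06-16T10:00", 7),
)
# A specific-person booking is written as 'bound'.
conn.execute(
    "INSERT INTO appointments (start_at, staff_id, assignment_status) VALUES (?, ?, ?)",
    ("2026-06-16T11:00", 4, "bound"),
)

# The lifecycle is queryable: everything still awaiting finalization.
provisional = conn.execute(
    "SELECT id, staff_id FROM appointments "
    "WHERE assignment_status = 'provisional' ORDER BY id"
).fetchall()
```

Constraining the column to the three named states means a typo in application code fails at write time instead of producing a row the finalize job never picks up.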

Consequences

Positive

  • Matches how the pilot business actually schedules (book time now, allocate staff against tomorrow's sheet).
  • Absences and surges become graceful: a one-sentence chat message re-pools provisional appointments and recomputes capacity; a temp instantly adds capacity.
  • The manager regains load-balancing control they lost under instant-binding.
  • "Specific therapist" is honoured as a deliberate feature, not forced on every customer; the fast "any available" path is the default.
  • Gender preference becomes correct (a pool constraint) instead of a brittle person-pick.
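
Treating gender as a pool constraint, rather than a person-pick, amounts to a filter with a fallback. The sketch below assumes a hypothetical Staff shape and reads "none on file" as "no pooled therapist has a matching recorded gender"; the real rule may differ.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Staff:
    # Hypothetical shape; only the fields this sketch needs.
    id: int
    gender: Optional[str]  # None when no gender is on file


def constrain_pool(pool: list[Staff], preferred_gender: Optional[str]) -> list[Staff]:
    """Narrow the pool to the preferred gender, falling back to everyone.

    The result feeds both the offered capacity and the provisional pick,
    so the preference never names a specific person.
    """
    if preferred_gender is None:
        return list(pool)
    matching = [s for s in pool if s.gender == preferred_gender]
    # Graceful fallback: an empty match must not zero out capacity.
    return matching if matching else list(pool)
```

For example, with one female and one male therapist in the pool, a "female" preference yields only the first; with no female therapist recorded, the whole pool is returned unchanged.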

Negative

  • The customer leaves the booking without a named therapist and must trust a later confirmation — a UX regression for customers who want to know "who" immediately (mitigated by the day-before name notification and the opt-in flag).
  • New lifecycle state (assignment_status) + a finalize job + a re-pool routine = more moving parts than instant-binding; more to test and operate.
  • The "needs attention" queue depends on the manager acting; if ignored, some appointments stay provisional past the cutoff (mitigated: the cutoff job can finalize the least-busy fallback and alert, so no customer is left without a therapist).
  • Bound (specific-person) appointments to an absent therapist can't auto-resolve — they require manager action and a customer message.

Neutral

  • find_available_slots is unchanged for offering; only the consumption (no name bound) and a new provisional write differ.
  • The calendar's existing "Unassigned" column gains real meaning.
  • PR-B (slot picker) and PR-C (gender) are re-homed into this model's phases rather than shipped standalone.

Alternatives Considered

Alternative 1: Keep instant-binding, only hide the therapist name

  • Pick the least-busy staff at booking exactly as today, but don't show the name until later.
  • Why rejected: delivers none of the operating requirements — no manager batch allocation against the next day's sheet, no graceful absence re-pool (a locked staff_id still can't be silently moved), no clean temp-surge. It is cosmetic.

Alternative 2: Day-only or day+time-window booking

  • The customer picks a day (or morning/afternoon) and the manager builds the exact times + staff.
  • Why rejected: removes the customer's exact-time certainty and forces a follow-up message just to learn their time; higher manager burden per booking. Adrian chose exact-time-with-deferred-staff (fork A1) precisely to keep time certainty.

Alternative 3: True pool with a separate per-time capacity counter

  • Leave staff_id genuinely NULL at booking and enforce capacity with a new per-minute concurrency counter over the pool; assign every name at allocation time.
  • Why rejected: adds a whole capacity-counter subsystem and an allocation step that must run or appointments stay nameless, for no benefit over reusing the provisional least-busy assignment as the guard. The provisional-assignment approach reuses the existing engine and is strictly simpler.

Alternative 4: Auto-resolve absences by messaging/rescheduling customers

  • On an absence, immediately message affected customers or auto-move their appointments.
  • Why rejected: fires before the manager can add a temp that would have covered them, over-notifying customers and imposing changes they didn't choose. Manager-in-the-loop for the hard cases (silent re-pool what fits; flag the rest) was chosen instead.