2026-05-13

Sprint 15 — the graph fills in

S14 ended with the knowledge-graph viz shipped: /[locale]/graph, d3-force, side panel, mobile list-tree, the works. The empty-state copy read “Add a vehicle, residence, appliance, or family member to see your household graph come alive. Tell Domi in chat or upload a document — your graph fills in as Domi learns what you own.”

Then JF asked, late in the closeout conversation: “how can you add a person and other entity? is that working as of now, in the current version?”

The honest answer: no. The empty-state copy promised something the codebase couldn’t do. The graph viz READ from members and assets tables, but nothing in the user-facing surface WROTE to them. Document upload + extraction produced extracted_facts rows, and the auto-write path only wrote to utility_bills. The predict engine emitted tasks. Settings had profile/language/audit. Chat had tools to LIST things and run predictions. But to add a member or a vehicle? Manually INSERT via psql against the Neon staging branch. That was Domi’s entire create surface for half its data model.

S15 closed that gap. Three PRs. After this sprint the empty-state copy is truthful.

What shipped

chat_proposals infrastructure + propose_member (PR #182, #177). New table chat_proposals with the shape (id, tenant_id, thread_id, user_id, tool_call_id, tool_name, proposal jsonb, status, target_kind, target_id, created_at, resolved_at). RLS-isolated. Separate from confirm_prompts (which has a NOT-NULL FK to extracted_facts — hard-coupled to the vision-extraction queue). New module @domi/shared/chat-proposals with recordProposal/getProposalById/listProposalsByThread/markProposalConfirmed/markProposalCancelled — all idempotent on already-resolved rows. New chat tool propose_member that does not mutate — it INSERTs the chat_proposals row in status='pending', returns { proposalId, proposed }. The chat panel sees the tool-result and renders an inline confirm card with [Confirm] / [Cancel] buttons. Clicking [Confirm] hits POST /api/chat/proposals/[id]/confirm which re-validates the proposal payload against the Zod schema (defense in depth — the row has been sitting in the DB since propose-time), inserts into members inside a withAudit({ action: "member.create" }) wrap, flips the proposal status, returns the new member id. Clicking [Cancel] hits the cancel route which flips status to cancelled with no audit row. Audit metadata includes proposal_id, so audit-log search can trace “member appeared” back to its originating chat proposal. 6/6 real-DB integration tests cover insert + read + confirm flip + idempotent re-confirm + cancel + RLS isolation. New i18n keys (en + fr) for the confirm card.
propose_asset + applicator refactor (PR #183, #178). Second tool, reuses every piece of plumbing #182 built. New assetProposalSchema (kind / displayName / custodianMemberName? / attributes?), new propose_asset chat tool, new dispatch arm in the confirm route. Custodian-name resolution via case-insensitive exact match on LOWER(display_name) against non-archived members within the tenant. When the model passes custodianMemberName: "BOB" and a member named “Bob” exists, the resolved custodianMemberId lights up the custodian edge in the graph viz on next reload. When no member matches, the asset is still created but custodianMemberId stays null and audit metadata records custodian_requested + custodian_resolved=false. Refactor: moved both createMemberFromProposal and createAssetFromProposal from apps/web/src/lib/ into @domi/shared/chat-proposals/applicators.ts — they had no apps/web-specific dependencies and living next to the proposal infrastructure means real-DB integration tests target them directly. 6 new applicator tests cover member insert + audit row, asset insert with attributes JSONB, custodian resolution (mixed-case match, no-match, archived-not-resolved). 6 new chat eval fixtures — 3 propose_member (en + fr, adult/child/caregiver) + 3 propose_asset (vehicle/residence/appliance) — all pass on Sonnet 4.6.
list_members chat tool + catalog discipline test + graph canvas perf hooks (PR #184, #179/180/181). Three secondaries bundled. list_members is symmetric with the existing list_tasks / list_documents / list_recent_predictions — closes the read side of household membership so the LLM can ground answers to “who’s in the household?” / “qui est dans le foyer?”. 3 real-DB tests + 2 eval fixtures. The catalog discipline test closes regression-suite backlog item #4 from S13: config/catalog.test.ts walks CATALOG, asserts no entry id contains “latest”, every id matches a version-pinned pattern (4-digit year OR major-minor), every id is unique. CLAUDE.md §6’s “no latest aliases” convention is now structurally enforced. Graph canvas perf instrumentation wires performance.mark() / performance.measure() into _graph-canvas.tsx so Chrome DevTools traces self-label. Four entries: domi-graph-mount-start (mark), domi-graph-stable (mark, first tick where sim.alpha() < 0.01), domi-graph-initial-render (measure between the two), domi-graph-tick (measure, sampled every 10th tick). JF can run performance.getEntries().filter(e => e.name.startsWith('domi-graph')) in the console and pull all four at once; getEntriesByType('measure') alone would return only the two measures, not the marks. Spec doc §2.9 gains an “Actual (dogfood)” column with “pending, needs browser session” entries pointing at #181. I can’t run a browser; the instrumentation is the prep that makes JF’s measurement session cheap.
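The catalog discipline rule is small enough to show whole. A minimal sketch of the three checks, assuming each CATALOG entry exposes an id string; the catalogViolations helper, the regex, and the sample ids are illustrative, not the contents of config/catalog.test.ts:

```typescript
// Hypothetical shape: the real CATALOG entries likely carry more fields.
type CatalogEntry = { id: string };

// "Version-pinned" per the convention: a 4-digit year (e.g. 2024-08-06)
// OR a major-minor pair (e.g. 4-6 or 4.6). Illustrative regex only.
const VERSION_PINNED = /\d{4}|\d+[.-]\d+/;

/** Returns one message per violated rule; an empty array means the catalog is clean. */
function catalogViolations(entries: readonly CatalogEntry[]): string[] {
  const problems: string[] = [];
  const seen = new Set<string>();
  for (const { id } of entries) {
    // Rule 1: no floating aliases such as "...-latest".
    if (id.includes("latest")) problems.push(`${id}: uses a "latest" alias`);
    // Rule 2: every id must pin a version.
    if (!VERSION_PINNED.test(id)) problems.push(`${id}: not version-pinned`);
    // Rule 3: ids are unique across the catalog.
    if (seen.has(id)) problems.push(`${id}: duplicate id`);
    seen.add(id);
  }
  return problems;
}

// Usage: a clean pair, then a floating alias plus a duplicate.
const clean = catalogViolations([{ id: "claude-sonnet-4-6" }, { id: "gpt-4o-2024-08-06" }]);
const dirty = catalogViolations([
  { id: "claude-sonnet-latest" },
  { id: "claude-sonnet-4-6" },
  { id: "claude-sonnet-4-6" },
]);
console.log(clean.length); // 0
console.log(dirty);
```

Collecting messages instead of failing on the first one means a single test run reports every bad entry. The pattern is deliberately loose: it accepts both dated snapshots and major-minor ids while rejecting bare family names like gpt-4o.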

What surprised me

Confirm-then-write is a real architectural decision, not just a UX choice. JF asked for confirm-then-write up front (vs. my auto-create-on-tool-call default). I’d treated it as a UX preference and underestimated the architectural lift: it requires a new table, two new routes, idempotent state management across reload + double-click, defense-in-depth re-validation, and audit-metadata wiring that lets the audit log trace mutation → originating proposal. The new infrastructure is what makes V1.5 patterns possible: a propose_task, a propose_document-correction, or even a propose_strategy_change for the LLM router. The cost of building the gate once is amortized across every future propose_* surface. Confirming-via-card is also the right answer when the LLM is the proposing agent: the user sees the structured interpretation of natural language and can correct it before anything is written. “Add Alice” becomes {displayName: "Alice", kind: "adult"}, visible to the user before the audit row lands.
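The gate’s core invariant is that confirming writes at most once and that resolved rows are terminal. A minimal in-memory sketch of that state machine, assuming a Map in place of the chat_proposals table and caller-supplied isValid / apply callbacks standing in for the Zod re-validation and the applicators; confirmProposal, cancelProposal, and the result shape are hypothetical names, not the @domi/shared/chat-proposals API:

```typescript
type ProposalStatus = "pending" | "confirmed" | "cancelled";

// Hypothetical in-memory stand-in for a chat_proposals row.
interface ProposalRow {
  id: string;
  toolName: string;
  proposal: unknown; // payload written at propose-time
  status: ProposalStatus;
  targetId: string | null; // id of the member/asset created on confirm
  resolvedAt: Date | null;
}

type ConfirmResult =
  | { kind: "applied"; targetId: string }
  | { kind: "already-confirmed"; targetId: string }
  | { kind: "rejected"; reason: string };

// Synchronous on purpose: each call runs to completion, so a double-click
// cannot interleave. A DB-backed version needs the same guarantee, e.g. a
// transaction or an UPDATE ... WHERE status = 'pending'.
function confirmProposal(
  rows: Map<string, ProposalRow>,
  id: string,
  isValid: (payload: unknown) => boolean,
  apply: (payload: unknown) => string,
): ConfirmResult {
  const row = rows.get(id);
  if (!row) return { kind: "rejected", reason: "not found" };
  if (row.status === "confirmed") {
    // Re-confirm is a no-op that reports the original target.
    return row.targetId !== null
      ? { kind: "already-confirmed", targetId: row.targetId }
      : { kind: "rejected", reason: "confirmed without target" };
  }
  if (row.status === "cancelled") return { kind: "rejected", reason: "cancelled" };
  // Defense in depth: the payload has sat in storage since propose-time.
  if (!isValid(row.proposal)) return { kind: "rejected", reason: "invalid payload" };
  const targetId = apply(row.proposal); // the single write
  row.status = "confirmed";
  row.targetId = targetId;
  row.resolvedAt = new Date();
  return { kind: "applied", targetId };
}

function cancelProposal(rows: Map<string, ProposalRow>, id: string): boolean {
  const row = rows.get(id);
  if (!row || row.status === "confirmed") return false; // cannot cancel a write
  if (row.status === "pending") {
    row.status = "cancelled";
    row.resolvedAt = new Date();
  }
  return true; // cancelling an already-cancelled row is a no-op
}

// Usage: a double-clicked [Confirm] produces exactly one write.
const rows = new Map<string, ProposalRow>([
  ["p1", { id: "p1", toolName: "propose_member", proposal: { displayName: "Alice", kind: "adult" }, status: "pending", targetId: null, resolvedAt: null }],
]);
let writes = 0;
const hasName = (p: unknown): boolean => typeof p === "object" && p !== null && "displayName" in p;
const insertMember = (): string => {
  writes += 1;
  return "member-1";
};
const first = confirmProposal(rows, "p1", hasName, insertMember);
const second = confirmProposal(rows, "p1", hasName, insertMember);
console.log(first.kind, second.kind, writes); // applied already-confirmed 1
```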

The applicator move was a quiet improvement. First pass put createMemberFromProposal in apps/web/src/lib/members.ts. Second pass (PR #183) moved it to @domi/shared/chat-proposals/applicators.ts alongside the new createAssetFromProposal. The functions had zero apps/web-specific dependencies — only @domi/shared/audit, @domi/shared/db, @domi/shared/chat-proposals. Living next to the proposal infrastructure means real-DB tests target them directly from packages/shared, which is where the rest of the test suites live. Same shape as audit/recorder + audit/search living together. The right home for code is “where its tests want to live.” I’d been treating apps/web/src/lib as the default destination for server-side helpers; that’s wrong when nothing about the helper is app-specific.

The eval harness needed a thread and a user. propose_* tools are gated on threadId presence in the chat-tools factory (so the eval harness without a persistent thread doesn’t surface them). To eval propose tool selection, the harness has to register the tools, which means seeding a real thread, which means seeding a real user — UUID-format because audit’s actor_user_id is UUID-typed even though users.id is text (Auth.js v5 generates UUID-format ids in prod). Three new lines of seed code + three lines of cleanup unblocked 6 new eval fixtures. The fact that the chat harness didn’t seed a thread before now means no prior fixture exercised tool gating on chat state. S15’s propose_* tools are the first to. The same path could ground future tools that need thread-scoped state.
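A minimal sketch of the gating, assuming a factory that receives an optional threadId; buildChatTools, ChatToolContext, and the tool descriptions are hypothetical stand-ins for the real chat-tools factory. It shows why a harness with no seeded thread never offers propose_* tools to the model:

```typescript
// Hypothetical shapes; the real chat-tools factory's signature is not shown here.
interface ToolDef {
  name: string;
  description: string;
}

interface ChatToolContext {
  tenantId: string;
  userId: string;
  threadId?: string; // absent when there is no persistent thread
}

function buildChatTools(ctx: ChatToolContext): ToolDef[] {
  // Read-side tools work without a thread.
  const tools: ToolDef[] = [
    { name: "list_tasks", description: "List open household tasks." },
    { name: "list_members", description: "List household members." },
  ];
  // propose_* tools write a chat_proposals row, which carries a thread_id,
  // so they are only surfaced when there is a thread to attach it to.
  if (ctx.threadId !== undefined) {
    tools.push(
      { name: "propose_member", description: "Propose adding a member; the user confirms before anything is written." },
      { name: "propose_asset", description: "Propose adding an asset; the user confirms before anything is written." },
    );
  }
  return tools;
}

// A harness that never seeds a thread sees only the read side.
const withoutThread = buildChatTools({ tenantId: "t1", userId: "u1" }).map((t) => t.name);
const withThread = buildChatTools({ tenantId: "t1", userId: "u1", threadId: "th1" }).map((t) => t.name);
console.log(withoutThread.includes("propose_member")); // false
console.log(withThread.includes("propose_member")); // true
```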

Adding two new tools shifted Sonnet’s tool selection on one edge fixture. escalation-positive-multi-step-plan (“Build me a 12-month maintenance plan for my new car”) started failing after the propose_* tools landed. Model now picks list_tasks + run_predictions_now instead of escalate_to_plan. Tried a system-prompt clarification distinguishing ADD-vs-PLAN; didn’t recover. Persistent across two re-runs. Pass-rate 23/24 = 95.8%, well above the 0.80 floor, but the surface area of tool selection is sensitive to catalog size and description framing in ways I hadn’t internalized. Larger catalog = more competition = lighter-touch tools get picked over heavier ones at the margin. The fix is probably a tighter run_predictions_now description (clarify it’s near-term, not long-horizon planning) plus stronger explicit-plan triggers in escalate_to_plan. Tuning lands S16.

prettier --check . and prettier --write <file> disagree. Hit it on the first CI run of PR #182. Lint-staged’s per-file invocation wrote a multi-line function signature; the full-glob prettier --check . complained about it. Resolution: ran pnpm format:check locally to surface the warning, then pnpm prettier --write to fix it, and propagated the fix through three branches via merge commits. Lesson: lint-staged alone is not sufficient. Run pnpm format:check before pushing if the diff is non-trivial. After three sprints in a row of TypeScript narrowing-across-async-closures bites, this is the newest form of the “run the gate locally to catch what the gate catches” tax. Worth noting.

The custodian-resolution case-insensitivity is doing more work than the test suite shows. A user saying “Alice’s Mazda” or “Alice does the dishwasher” in chat lets the LLM pass custodianMemberName: "Alice", and the asset row’s custodianMemberId then points at a real member. That’s the edge in the knowledge graph; without it, the graph is a node soup. The custodian edge is what makes the graph viz visually meaningful at dogfood scale. The graph viz from S14 renders custodian edges; S15 makes them populable via chat. Two sprints’ worth of work converge in a single rendered edge, and that edge is what makes the graph feel alive rather than merely present.
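A minimal in-memory sketch of the resolution rule from PR #183, assuming a Member shape with a boolean archived flag (the real schema may use a timestamp) and that audit metadata records the requested name on both hits and misses; the source only states the miss case. resolveCustodian is a hypothetical name:

```typescript
// Hypothetical member shape; field names are illustrative.
interface Member {
  id: string;
  tenantId: string;
  displayName: string;
  archived: boolean;
}

interface CustodianResolution {
  custodianMemberId: string | null;
  auditMeta: Record<string, string | boolean>;
}

// Roughly the SQL: WHERE tenant_id = $1 AND <not archived>
//   AND LOWER(display_name) = LOWER($2)
// toLowerCase() stands in for LOWER(); the two can differ on some non-ASCII input.
function resolveCustodian(
  members: readonly Member[],
  tenantId: string,
  requestedName?: string,
): CustodianResolution {
  if (requestedName === undefined) return { custodianMemberId: null, auditMeta: {} };
  const wanted = requestedName.toLowerCase();
  const match = members.find(
    (m) => m.tenantId === tenantId && !m.archived && m.displayName.toLowerCase() === wanted,
  );
  const custodianMemberId = match?.id ?? null;
  // The asset is created either way; a miss only leaves the custodian edge unset.
  return {
    custodianMemberId,
    auditMeta: { custodian_requested: requestedName, custodian_resolved: custodianMemberId !== null },
  };
}

const members: Member[] = [
  { id: "m-bob", tenantId: "t1", displayName: "Bob", archived: false },
  { id: "m-carol", tenantId: "t1", displayName: "Carol", archived: true },
];
console.log(resolveCustodian(members, "t1", "BOB").custodianMemberId); // m-bob
console.log(resolveCustodian(members, "t1", "carol").auditMeta.custodian_resolved); // false
console.log(resolveCustodian(members, "t2", "Bob").custodianMemberId); // null
```

Note that find() returns the first match, so two live members with the same name would resolve to whichever comes first; the sketch does not try to disambiguate.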

Where Sprint 16 picks up

The natural primaries:

Browser perf measurement (S15 #181 stretch carry-forward). Instrumentation is in place. JF runs Chrome DevTools Performance trace at dogfood scale once populated; fills in the spec §2.9 “Actual” column. ~30 min. Now that propose_member + propose_asset exist, populating a tenant via chat is the cheap path; the trace happens once that’s done.
escalation-positive-multi-step-plan regression tuning. Known regression from S15 PR #183. ~20 min on run_predictions_now description tightening; needs a re-eval to confirm. Could ride along with the browser session as a “while you’re at it” Sunday tweak.
First mobile UX delta — camera capture. The dogfood-defining mobile UX. Snap the bill on the fridge → Domi knows about it. Table-to-card ✓ S12, graph mobile list-tree ✓ S14, camera capture is what’s left.
Reload-mid-pending recovery UX. S15 carry-forward. If user proposes in chat then reloads before confirming, the proposal sits in DB but no card renders. Either persist tool-result parts to chat_messages (heavier) or surface pending proposals in a top-of-chat strip (lighter). V1 dogfood: JF resolves in-session, but it’s a real edge.
Regression-suite backlog #3, #5-#8. Five more items: partition-routing grant probe, allowlist snapshot, extractText filter test, OpenAI known-regressions snapshot, i18n key-presence snapshot. Each is 10-30 min. Could bundle 2-3 as a sprint secondary.
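For the lighter option in the reload-mid-pending item, the strip mostly needs a query. A sketch, assuming rows shaped roughly like what listProposalsByThread returns (the field names here are guesses) and a hypothetical pendingForStrip helper:

```typescript
type ProposalStatus = "pending" | "confirmed" | "cancelled";

// Hypothetical row shape, roughly what listProposalsByThread might return.
interface ProposalRow {
  id: string;
  threadId: string;
  toolName: string;
  status: ProposalStatus;
  createdAt: Date;
}

// What the strip should offer after a reload: this thread, still pending, oldest first.
function pendingForStrip(rows: readonly ProposalRow[], threadId: string): ProposalRow[] {
  return rows
    .filter((r) => r.threadId === threadId && r.status === "pending")
    // filter() returned a copy, so sorting does not reorder the caller's array.
    .sort((a, b) => a.createdAt.getTime() - b.createdAt.getTime());
}

const rows: ProposalRow[] = [
  { id: "p2", threadId: "th1", toolName: "propose_asset", status: "pending", createdAt: new Date("2026-05-13T10:05:00Z") },
  { id: "p1", threadId: "th1", toolName: "propose_member", status: "pending", createdAt: new Date("2026-05-13T10:00:00Z") },
  { id: "p0", threadId: "th1", toolName: "propose_member", status: "confirmed", createdAt: new Date("2026-05-13T09:55:00Z") },
];
console.log(pendingForStrip(rows, "th1").map((r) => r.id).join(", ")); // p1, p2
```

Because confirm and cancel are already idempotent, each strip entry could presumably reuse the existing confirm card and routes unchanged.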

Carry-forwards still alive: WCAG browser-verification run (#150, JF runs); non-functional gates (threat-model sign-off, PIA, DR runbook); Sprint 8 carry-forwards (Gemini + OpenRouter; parent-child cost grouping).

The pattern I’m taking from S15, different from the S11-S14 procedural-promotion arc, is that building the right surface costs more than building the wrong one but pays compound interest. The confirm-then-write gate is heavier than auto-create-on-tool-call: new table, two routes, idempotent state, defense-in-depth re-validation, audit-metadata wiring. But the cost is fixed; every future propose_* tool reuses it. And the UX is the right answer for LLM-mediated mutation: the user signs off on the structured interpretation before anything’s written. The next sprint’s most valuable artifact is probably another generalizable gate I haven’t seen the need for yet.