Sprint 15 — the graph fills in
S14 ended with the knowledge-graph viz shipped: /[locale]/graph, d3-force, side panel, mobile list-tree, the works. The empty-state copy read “Add a vehicle, residence, appliance, or family member to see your household graph come alive. Tell Domi in chat or upload a document — your graph fills in as Domi learns what you own.”
Then JF asked, late in the closeout conversation: “how can you add a person and other entity? is that working as of now, in the current version?”
The honest answer: no. The empty-state copy promised something the codebase couldn’t do. The graph viz READ from members and assets tables, but nothing in the user-facing surface WROTE to them. Document upload + extraction produced extracted_facts rows, and the auto-write path only realized utility_bills. Predict engine emitted tasks. Settings had profile/language/audit. Chat had tools to LIST things and run predictions. But to add a member or a vehicle? Manually INSERT via psql against the Neon staging branch. That was Domi’s entire create surface for half its data model.
S15 closed that gap. Three PRs. After this sprint the empty-state copy is truthful.
What shipped
- chat_proposals infrastructure + propose_member (PR #182, #177). New table chat_proposals with the shape (id, tenant_id, thread_id, user_id, tool_call_id, tool_name, proposal jsonb, status, target_kind, target_id, created_at, resolved_at). RLS-isolated. Separate from confirm_prompts, which has a NOT-NULL FK to extracted_facts and is hard-coupled to the vision-extraction queue. New module @domi/shared/chat-proposals with recordProposal / getProposalById / listProposalsByThread / markProposalConfirmed / markProposalCancelled, all idempotent on already-resolved rows. New chat tool propose_member that does not mutate: it INSERTs the chat_proposals row in status='pending' and returns { proposalId, proposed }. The chat panel sees the tool-result and renders an inline confirm card with [Confirm] / [Cancel] buttons. Clicking [Confirm] hits POST /api/chat/proposals/[id]/confirm, which re-validates the proposal payload against the Zod schema (defense in depth: the row has been sitting in the DB since propose-time), inserts into members inside a withAudit({ action: "member.create" }) wrap, flips the proposal status, and returns the new member id. Clicking [Cancel] hits the cancel route, which flips status to cancelled with no audit row. Audit metadata includes proposal_id, so audit-log search can trace “member appeared” back to its originating chat proposal. 6/6 real-DB integration tests cover insert + read + confirm flip + idempotent re-confirm + cancel + RLS isolation. New i18n keys (en + fr) for the confirm card.
- propose_asset + applicator refactor (PR #183, #178). Second tool, reuses every piece of plumbing #182 built. New assetProposalSchema (kind / displayName / custodianMemberName? / attributes?), new propose_asset chat tool, new dispatch arm in the confirm route. Custodian-name resolution via case-insensitive exact match on LOWER(display_name) against non-archived members within the tenant. When the model passes custodianMemberName: "BOB" and a member named “Bob” exists, the resolved custodianMemberId lights up the custodian edge in the graph viz on next reload. When no member matches, the asset is still created but custodianMemberId stays null and audit metadata records custodian_requested + custodian_resolved=false. Refactor: moved both createMemberFromProposal and createAssetFromProposal from apps/web/src/lib/ into @domi/shared/chat-proposals/applicators.ts. They had no apps/web-specific dependencies, and living next to the proposal infrastructure means real-DB integration tests target them directly. 6 new applicator tests cover member insert + audit row, asset insert with attributes JSONB, custodian resolution (mixed-case match, no-match, archived-not-resolved). 6 new chat eval fixtures, 3 propose_member (en + fr, adult/child/caregiver) + 3 propose_asset (vehicle/residence/appliance), all pass on Sonnet 4.6.
- list_members chat tool + catalog discipline test + graph canvas perf hooks (PR #184, #179/180/181). Three secondaries bundled. list_members is symmetric with the existing list_tasks / list_documents / list_recent_predictions and closes the read side of household membership, so the LLM can ground answers to “who’s in the household?” / “qui est dans le foyer?”. 3 real-DB tests + 2 eval fixtures. The catalog discipline test closes regression-suite backlog item #4 from S13: config/catalog.test.ts walks CATALOG and asserts that no entry id contains “latest”, every id matches a version-pinned pattern (4-digit year OR major-minor), and every id is unique. CLAUDE.md §6’s “no latest aliases” convention is now structurally enforced. Graph canvas perf instrumentation wires performance.mark() / performance.measure() into _graph-canvas.tsx so Chrome DevTools traces self-label. Four entries: domi-graph-mount-start (mark), domi-graph-stable (mark, first tick where sim.alpha() < 0.01), domi-graph-initial-render (measure between the two), domi-graph-tick (measure, sampled every 10th tick). JF can run performance.getEntriesByType('measure').filter(m => m.name.startsWith('domi-graph')) in the console to pull both measures at once; the same call with 'mark' returns the two marks. Spec doc §2.9 gains an “Actual (dogfood)” column with _pending — needs browser session_ entries pointing at #181. I can’t run a browser; the instrumentation is the prep that makes JF’s measurement session cheap.
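The three catalog rules in the last item can be sketched as a pure check. This is a minimal sketch, not the contents of config/catalog.test.ts: the CatalogEntry shape, the function name catalogViolations, and both regexes are assumptions made for illustration. Only the rules themselves (no “latest”, a 4-digit year or major-minor version, unique ids) come from the notes above.

```typescript
// Hypothetical entry shape; the real CATALOG type is not shown in these notes.
interface CatalogEntry {
  id: string;
}

// Assumed readings of "version-pinned": a 4-digit year (possibly the start of
// a date stamp) or a major-minor pair such as 4-6 or 4.6.
const YEAR = /(?:19|20)\d{2}/;
const MAJOR_MINOR = /\d+[.-]\d+/;

// Returns one human-readable message per broken rule; an empty array means
// the catalog passes all three checks.
function catalogViolations(catalog: readonly CatalogEntry[]): string[] {
  const problems: string[] = [];
  const seen = new Set<string>();
  for (const { id } of catalog) {
    if (id.toLowerCase().indexOf("latest") !== -1) {
      problems.push(`${id}: contains "latest"`);
    }
    if (!YEAR.test(id) && !MAJOR_MINOR.test(id)) {
      problems.push(`${id}: not version-pinned`);
    }
    if (seen.has(id)) {
      problems.push(`${id}: duplicate id`);
    }
    seen.add(id);
  }
  return problems;
}
```

Returning messages instead of throwing on the first failure means a single test run would report every offending id at once.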
What surprised me
Confirm-then-write is a real architectural decision, not just a UX choice. JF asked for confirm-then-write up front (vs. my auto-create-on-tool-call default). I’d treated it as a UX preference and underestimated the architectural lift: it requires a new table, two new routes, idempotent state management across reload + double-click, defense-in-depth re-validation, and audit-metadata wiring that lets the audit log trace mutation → originating proposal. The new infrastructure is what makes V1.5 patterns possible: a propose_task, a propose_document-correction, or even a propose_strategy_change for the LLM router. The cost of building the gate once is amortized across every future propose_* surface. Confirming-via-card is also the right answer when the LLM is the proposing agent: the user sees the structured interpretation of natural language and can correct it before anything is written. “Add Alice” becomes {displayName: "Alice", kind: "adult"}, visible to the user before the audit row lands.
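The gate described above can be sketched in memory. This is a hedged illustration, not the code in @domi/shared/chat-proposals: recordProposal borrows its name from the module, but its signature here is assumed, and confirmProposal, cancelProposal, validateMemberProposal (a hand-written stand-in for the Zod schema), and the Map-backed stores are all hypothetical.

```typescript
type Status = "pending" | "confirmed" | "cancelled";

interface MemberProposal {
  displayName: string;
  kind: string;
}

interface ProposalRow {
  id: string;
  proposal: unknown; // jsonb in the real table, so treated as untrusted on read
  status: Status;
  targetId: string | null;
}

// In-memory stand-ins for chat_proposals, members, and the audit log.
const proposals = new Map<string, ProposalRow>();
const members = new Map<string, MemberProposal>();
const auditLog: { action: string; proposalId: string; memberId: string }[] = [];
let nextId = 1;

// Stand-in for the Zod schema: throws on anything that is not a member payload.
function validateMemberProposal(raw: unknown): MemberProposal {
  const p = raw as Partial<MemberProposal> | null;
  if (!p || typeof p.displayName !== "string" || typeof p.kind !== "string") {
    throw new Error("invalid member proposal");
  }
  return { displayName: p.displayName, kind: p.kind };
}

// Propose: records intent only. Nothing is written to members here.
function recordProposal(proposal: unknown): ProposalRow {
  const row: ProposalRow = { id: `p${nextId++}`, proposal, status: "pending", targetId: null };
  proposals.set(row.id, row);
  return row;
}

// Confirm: re-validate, write the member, audit with proposal_id, flip status.
function confirmProposal(id: string): ProposalRow {
  const row = proposals.get(id);
  if (!row) throw new Error("proposal not found");
  if (row.status !== "pending") return row; // idempotent on resolved rows
  const valid = validateMemberProposal(row.proposal);
  const memberId = `m${nextId++}`;
  members.set(memberId, valid);
  auditLog.push({ action: "member.create", proposalId: id, memberId });
  row.status = "confirmed";
  row.targetId = memberId;
  return row;
}

// Cancel: flips status only, with no audit row.
function cancelProposal(id: string): ProposalRow {
  const row = proposals.get(id);
  if (!row) throw new Error("proposal not found");
  if (row.status === "pending") row.status = "cancelled";
  return row;
}
```

The early return on non-pending rows is what absorbs a double-click: a second confirm hands back the already-resolved row instead of inserting a second member.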
The applicator move was a quiet improvement. First pass put createMemberFromProposal in apps/web/src/lib/members.ts. Second pass (PR #183) moved it to @domi/shared/chat-proposals/applicators.ts alongside the new createAssetFromProposal. The functions had zero apps/web-specific dependencies — only @domi/shared/audit, @domi/shared/db, @domi/shared/chat-proposals. Living next to the proposal infrastructure means real-DB tests target them directly from packages/shared, which is where the rest of the test suites live. Same shape as audit/recorder + audit/search living together. The right home for code is “where its tests want to live.” I’d been treating apps/web/src/lib as the default destination for server-side helpers; that’s wrong when nothing about the helper is app-specific.
The eval harness needed a thread and a user. propose_* tools are gated on threadId presence in the chat-tools factory (so the eval harness without a persistent thread doesn’t surface them). To eval propose tool selection, the harness has to register the tools, which means seeding a real thread, which means seeding a real user — UUID-format because audit’s actor_user_id is UUID-typed even though users.id is text (Auth.js v5 generates UUID-format ids in prod). Three new lines of seed code + three lines of cleanup unblocked 6 new eval fixtures. The fact that the chat harness didn’t seed a thread before now means no prior fixture exercised tool gating on chat state. S15’s propose_* tools are the first to. The same path could ground future tools that need thread-scoped state.
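A rough sketch of the gating that made the seed necessary. The tool names are the ones mentioned in these notes; the ChatTool and ToolContext shapes, the descriptions, and the function name buildChatTools are assumptions for illustration, not the real chat-tools factory.

```typescript
interface ChatTool {
  name: string;
  description: string; // placeholder text, not the real tool descriptions
}

// Hypothetical factory input: threadId is absent when there is no persisted thread.
interface ToolContext {
  tenantId: string;
  threadId?: string;
}

const READ_TOOLS: ChatTool[] = [
  { name: "list_tasks", description: "List household tasks." },
  { name: "list_documents", description: "List uploaded documents." },
  { name: "list_recent_predictions", description: "List recent predictions." },
  { name: "list_members", description: "List household members." },
];

// These need a thread to attach the chat_proposals row to.
const PROPOSE_TOOLS: ChatTool[] = [
  { name: "propose_member", description: "Propose adding a household member." },
  { name: "propose_asset", description: "Propose adding an asset." },
];

// propose_* tools are only registered when a thread exists.
function buildChatTools(ctx: ToolContext): ChatTool[] {
  return ctx.threadId ? [...READ_TOOLS, ...PROPOSE_TOOLS] : [...READ_TOOLS];
}
```

Under this shape, a harness that passes no threadId never offers propose_member to the model, so tool-selection fixtures for it cannot pass until a thread is seeded.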
Adding two new tools shifted Sonnet’s tool selection on one edge fixture. escalation-positive-multi-step-plan (“Build me a 12-month maintenance plan for my new car”) started failing after the propose_* tools landed. Model now picks list_tasks + run_predictions_now instead of escalate_to_plan. Tried a system-prompt clarification distinguishing ADD-vs-PLAN — didn’t recover. Persistent across two re-runs. Pass-rate 23/24 = 95.8% so well above the 0.80 floor, but the surface area of tool selection is sensitive to catalog size and description framing in ways I hadn’t internalized. Larger catalog = more competition = lighter-touch tools get picked over heavier ones at the margin. The fix is probably a tighter run_predictions_now description (clarify it’s near-term, not long-horizon planning) plus stronger explicit-plan triggers in escalate_to_plan. Tuning lands S16.
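For the headroom arithmetic, a small sketch under stated assumptions: the 23/24 result and the 0.80 floor come from the paragraph above, while the function names and the idea of counting tolerable failures are illustrative additions.

```typescript
const PASS_FLOOR = 0.8;

// Fraction of fixtures that passed.
function passRate(passed: number, total: number): number {
  if (total <= 0) throw new Error("total must be positive");
  return passed / total;
}

// How many fixtures can fail before the suite drops below the floor.
function maxFailures(total: number, floor: number): number {
  return total - Math.ceil(total * floor);
}

const rate = passRate(23, 24);
const percent = (rate * 100).toFixed(1); // "95.8"
const headroom = maxFailures(24, PASS_FLOOR); // 4: at least 20 of 24 must pass
```

With 24 fixtures the gate tolerates four failures, so one persistent regression leaves three in reserve before CI would block.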
prettier --check . and prettier --write <file> disagree. Hit on first push of PR #182’s CI. Lint-staged’s per-file invocation wrote a multi-line function signature; the full-glob prettier --check . complained about it. Resolution: ran pnpm format:check locally to surface the warning, then pnpm prettier --write to fix, propagated through three branches via merge commits. Lesson: lint-staged alone is not sufficient. Run pnpm format:check before pushing if the diff is non-trivial. Three sprints in a row of TypeScript narrowing-across-async-closures bites; this is the new “run the gate locally to catch what the gate catches” tax. Worth noting.
The custodian-resolution case-insensitivity is doing more work than the test suite shows. A user saying “Alice’s Mazda” or “Alice does the dishwasher” in chat lets the LLM pass custodianMemberName: "Alice", and the asset row picks up a custodianMemberId pointing to a real member. That’s the edge in the knowledge graph; without it, the graph is a node soup. The custodian edge is what makes the graph viz visually meaningful at dogfood scale. The graph viz from S14 renders custodian edges; S15 makes them populable via chat. Two sprints’ worth of work converge in a single rendered edge, and that edge is what makes the graph feel alive, not just present.
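The resolution rule can be sketched over an in-memory member list. This is an illustration only: the real path matches on LOWER(display_name) in SQL, and the Member shape, the function name resolveCustodian, and the metadata recorded on a successful match are assumptions (the notes only specify what is recorded when nothing matches).

```typescript
interface Member {
  id: string;
  tenantId: string;
  displayName: string;
  archivedAt: Date | null; // assumed representation of "archived"
}

interface CustodianResolution {
  custodianMemberId: string | null;
  auditMetadata: { custodian_requested?: string; custodian_resolved?: boolean };
}

// Case-insensitive exact match against non-archived members of one tenant.
function resolveCustodian(
  members: readonly Member[],
  tenantId: string,
  requestedName?: string,
): CustodianResolution {
  if (requestedName === undefined) {
    return { custodianMemberId: null, auditMetadata: {} };
  }
  const wanted = requestedName.toLowerCase();
  let match: Member | undefined;
  for (const m of members) {
    if (m.tenantId === tenantId && m.archivedAt === null && m.displayName.toLowerCase() === wanted) {
      match = m;
      break;
    }
  }
  return {
    custodianMemberId: match ? match.id : null,
    auditMetadata: { custodian_requested: requestedName, custodian_resolved: match !== undefined },
  };
}
```

The match is exact after lowercasing, so “BOB” resolves to “Bob” but “Bobby” does not; the asset is still created either way, just without the edge.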
Where Sprint 16 picks up
The natural primaries:
- Browser perf measurement (S15 #181 stretch carry-forward). Instrumentation is in place. JF runs Chrome DevTools Performance trace at dogfood scale once populated; fills in the spec §2.9 “Actual” column. ~30 min. Now that propose_member + propose_asset exist, populating a tenant via chat is the cheap path; the trace happens once that’s done.
- escalation-positive-multi-step-plan regression tuning. Known regression from S15 PR #183. ~20 min on run_predictions_now description tightening; needs a re-eval to confirm. Could ride along with the browser session as a “while you’re at it” Sunday tweak.
- First mobile UX delta: camera capture. The dogfood-defining mobile UX. Snap the bill on the fridge → Domi knows about it. Table-to-card ✓ S12, graph mobile list-tree ✓ S14, camera capture is what’s left.
- Reload-mid-pending recovery UX. S15 carry-forward. If user proposes in chat then reloads before confirming, the proposal sits in DB but no card renders. Either persist tool-result parts to chat_messages (heavier) or surface pending proposals in a top-of-chat strip (lighter). V1 dogfood: JF resolves in-session, but it’s a real edge.
- Regression-suite backlog #3, #5-#8. Five more items: partition-routing grant probe, allowlist snapshot, extractText filter test, OpenAI known-regressions snapshot, i18n key-presence snapshot. Each is 10-30 min. Could bundle 2-3 as a sprint secondary.
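For the browser perf measurement item, the four entries named under “What shipped” could be produced along these lines. This is a sketch, not the code in _graph-canvas.tsx: SimLike and makeFakeSim stand in for the d3-force simulation, the decay rate is arbitrary, and the domi-graph-tick-start helper mark is a hypothetical addition needed to give the sampled tick measure a start point.

```typescript
// Stand-in for the d3-force simulation: only alpha() and tick() matter here.
interface SimLike {
  alpha(): number;
  tick(): void;
}

// Fake simulation whose alpha decays geometrically from 1 (rate is arbitrary).
function makeFakeSim(decay = 0.1): SimLike {
  let a = 1;
  return {
    alpha: () => a,
    tick: () => {
      a *= 1 - decay;
    },
  };
}

// Ticks until alpha drops below 0.01, emitting the User Timing entries.
// Returns the number of ticks it took to stabilize.
function runInstrumented(sim: SimLike, maxTicks = 1000): number {
  performance.mark("domi-graph-mount-start");
  let ticks = 0;
  while (ticks < maxTicks) {
    const sampled = ticks % 10 === 0; // measure every 10th tick only
    if (sampled) performance.mark("domi-graph-tick-start"); // hypothetical helper mark
    sim.tick();
    ticks += 1;
    if (sampled) performance.measure("domi-graph-tick", "domi-graph-tick-start");
    if (sim.alpha() < 0.01) {
      performance.mark("domi-graph-stable");
      performance.measure(
        "domi-graph-initial-render",
        "domi-graph-mount-start",
        "domi-graph-stable",
      );
      break;
    }
  }
  return ticks;
}

// Console-style query for the recorded measures and their durations.
function domiMeasures(): { name: string; duration: number }[] {
  return performance
    .getEntriesByType("measure")
    .filter((m) => m.name.startsWith("domi-graph"))
    .map((m) => ({ name: m.name, duration: m.duration }));
}
```

Sampling the tick measure keeps the timeline readable at dogfood scale; measuring every tick would flood the trace with entries for no extra signal.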
Carry-forwards still alive: WCAG browser-verification run (#150, JF runs); non-functional gates (threat-model sign-off, PIA, DR runbook); Sprint 8 carry-forwards (Gemini + OpenRouter; parent-child cost grouping).
The pattern I’m taking from S15, different from the S11-S14 procedural-promotion arc, is that building the right surface costs more than building the wrong one but pays compound interest. The confirm-then-write gate is heavier than auto-create-on-tool-call: new table, two routes, idempotent state, defense-in-depth re-validation, audit-metadata wiring. But the cost is fixed; every future propose_* tool reuses it. And the UX is the right answer for LLM-mediated mutation: the user signs off on the structured interpretation before anything’s written. The next sprint’s most valuable artifact is probably another generalizable gate I haven’t seen the need for yet.