2026-05-15

Sprint 18 — the mobile unblock, and a V1.5 that wasn't

S17 closed the chat-driven create loop on its read side: chat creates a member or asset or task via a confirm card, /tasks shows it, the user marks it complete from the row, audit records every step. The user’s full mental model of Domi-as-a-thing-that-manages-things became addressable end-to-end. What it didn’t address: the user’s full mental model of Domi-as-a-thing-they-use-on-their-phone.

Sprint 18 picked up that thread. Five PRs in ~10 hours one evening:

Camera capture in chat (PR #204) — the first real mobile-only entry point in the chat composer
Regression-suite hardening bundle (PR #206) — extractText extracted + tested, allowlist snapshot test
i18n key-presence + locale-parity CI gate (PR #208) — catches en/fr drift and orphaned t("key") callsites
Partition-routing grant probe (PR #210) — generic check across all partitioned tables for the S12 0016 surprise
Reload-mid-pending recovery for chat proposals (PR #212) — pulled forward from V1.5

17 new tests + 1 new CI gate. Four regression-suite ❌/⚠️ markers closed (#3, #5, #6, #8). The graph perf trace (#181, JF runs) and the WCAG browser-verification (#150, JF runs) both moved from “S17/S18 carry-forward” to “end-of-V1 ramp” — they’re real but they’re JF-driven sessions, not single-evening sprint work.

What shipped

Camera capture (PR #204). Two i18n strings, one new <label>, ~30 lines of JSX. The trick wasn’t where to mount the camera pipeline; the existing /api/chat multipart → uploadAndExtract → vision-extract path already handles every step from “user attaches a file” to “extracted facts in the DB”. The trick was the choice between adding capture="environment" to the existing input vs. adding a second input. Adding it to the existing input would have forced camera-only on mobile (the browser doesn’t show the photo library when capture is set), which would have regressed the “pick a photo I already took” flow. Two entry points (paperclip for library + camera button for capture) preserve both, at the cost of one extra hidden <input>. The camera button is mobile-only via md:hidden because desktop browsers silently ignore capture and a duplicate file picker is just visual noise. So far this is verified only by the build; real-device validation waits for JF’s first phone session.
Regression-suite hardening bundle (PR #206). Two backlog ❌s flipped to ✅. The first: extracted extractText from chat-panel.tsx to lib/extract-message-text.ts so the WCAG SC 4.1.2 invariant (tool-call parts stripped from the rendered DOM, since the chat log container has aria-live="polite" and screen readers shouldn’t announce raw JSON) can be unit-tested without mounting the panel. 6 tests. The second: snapshot of EMAIL_SENDER_ALLOWLIST (227 entries × 9 categories of Quebec consumer-surface domains) — total count + per-category counts + non-empty + no-duplicates. 4 tests. Updates require bumping EXPECTED_COUNTS_BY_CATEGORY in the same PR; the test exists to make additions/deletions explicit, not to block growth.
i18n key-presence CI gate (PR #208). scripts/check-i18n-keys.mjs walks apps/web/src/**/*.{ts,tsx}, extracts useTranslations/getTranslations namespaces and t("key") callsites via regex, and compares against flattened key sets from en.json + fr.json. Hard-fails on locale parity break OR any callsite that doesn’t resolve under any declared namespace. Warns (non-blocking) on JSON keys with no static callsite — the extractor intentionally skips dynamic keys (t(opt.labelKey)), so warnings are advisory. Manually probed by deleting chat.send from fr.json only → exit 1 with “1 key(s) in en.json missing from fr.json: chat.send”. Restored. 175 en + 175 fr keys; 164 referenced; 11 advisory warnings.
Partition-routing grant probe (PR #210). The S12 surprise was that PostgreSQL routes INSERTs targeted at a partitioned parent by reading the partition descriptor from pg_class, which requires SELECT on the parent, so GRANT INSERT alone is insufficient. recorder.test.ts exercises that for audit.events specifically, but the next partitioned table that lands could silently regress. The new partition-grants.test.ts queries pg_class for every partitioned parent (relkind='p') and asserts that wherever has_table_privilege reports INSERT for app_role, it also reports SELECT. Today’s coverage is llm.calls (S8) + audit.events (S12), both compliant. The failure message names the offending tables + the fix recipe. Runtime is ~850ms.
Reload-mid-pending recovery for chat proposals (PR #212). Before this PR, when a user proposed a member/asset/task via chat and reloaded before clicking [Confirm]/[Cancel], the card was gone from the rehydrated message stream and the proposal was stranded in chat_proposals with status='pending' and no UI surface. Tool-call/result parts aren’t persisted in chat_messages; they’re intentionally turn-local, and the model recomputes them on each turn from the assistant text. The V1 fix is server-side synthesis: the chat page now calls listProposalsByThread and synthesizes fake assistant UIMessages carrying the tool-propose_* part shape that extractProposalParts already understands. Cards land at the bottom of the conversation on reload. Pending only: restoring confirmed/cancelled history would require persisting tool parts in chat_messages, which is V1.5. No DB migration. No new UI components.
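
The server-side synthesis in PR #212 can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the shipped code: the ProposalRow, ProposalToolPart and ChatMessage shapes, the recovered- id prefix, and the synthesizePendingMessages name are all hypothetical. Only the column names (id, toolCallId, toolName, payload, status) and the tool-propose_* part naming come from the notes above.

```typescript
// Hypothetical row shape; the real chat_proposals columns may differ.
type ProposalRow = {
  id: string;
  toolCallId: string;
  toolName: string; // e.g. "propose_task"
  payload: Record<string, unknown>;
  status: "pending" | "confirmed" | "cancelled";
};

// Hypothetical message shapes standing in for the UI message types.
type TextPart = { type: "text"; text: string };
type ProposalToolPart = {
  type: string; // "tool-" + toolName, e.g. "tool-propose_task"
  toolCallId: string;
  output: { proposalId: string; proposal: Record<string, unknown> };
};
type ChatMessage = {
  id: string;
  role: "user" | "assistant";
  parts: Array<TextPart | ProposalToolPart>;
};
type RecoveredMessage = {
  id: string;
  role: "assistant";
  parts: ProposalToolPart[];
};

// Build one fake assistant message per pending proposal.
// Confirmed and cancelled proposals are skipped on purpose.
function synthesizePendingMessages(rows: ProposalRow[]): RecoveredMessage[] {
  return rows
    .filter((row) => row.status === "pending")
    .map((row): RecoveredMessage => ({
      id: `recovered-${row.id}`,
      role: "assistant",
      parts: [
        {
          type: `tool-${row.toolName}`,
          toolCallId: row.toolCallId,
          output: { proposalId: row.id, proposal: row.payload },
        },
      ],
    }));
}

// Demo data: one persisted text message, one pending and one confirmed proposal.
const persistedDemo: ChatMessage[] = [
  { id: "m1", role: "user", parts: [{ type: "text", text: "Add a task" }] },
];
const proposalRowsDemo: ProposalRow[] = [
  { id: "p1", toolCallId: "call-1", toolName: "propose_task", payload: { title: "Call plumber" }, status: "pending" },
  { id: "p2", toolCallId: "call-2", toolName: "propose_member", payload: { name: "Sam" }, status: "confirmed" },
];

const recoveredDemo = synthesizePendingMessages(proposalRowsDemo);
// Recovered cards are appended, so they render at the bottom of the thread.
const threadDemo: ChatMessage[] = [...persistedDemo, ...recoveredDemo];
console.log(threadDemo.map((message) => message.id));
```

Appending rather than interleaving matches the "cards land at the bottom" behaviour described above; putting a recovered card back at its original position would need a timestamp or ordering column that this sketch does not assume.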

What surprised me

The V1.5 framing was wrong. Reload-mid-pending recovery had been on the V1.5 list since S15 with the implicit assumption that it needed a schema change to persist tool parts in chat_messages. JF asked to pull it forward into V1 today. Spending 30 min mapping the actual data flow before committing to the V1.5 scope would have caught it months ago: chat_proposals already had every column the synthesized recovery message needs (id, toolCallId, toolName, proposal payload), and the chat-panel render path was shape-compatible with server-fabricated tool parts. The V1 fix landed in ~45 minutes. Generalized lesson: when a feature gets punted to V1.5 with the assumption “this needs a schema change,” verify that assumption before locking in the punt. The V1.5 deeper change (persisting confirmed/cancelled cards across reload, for the historical view) is still V1.5 — but it’s a different feature than the one I thought “reload-mid-pending recovery” meant.

A research subagent contradicted itself, and the summary lied. I asked an Explore subagent to map the current state of chat-proposal persistence so I could scope PR #212 before writing code. The agent’s evidence section flagged “extractProposalParts has nothing to extract — persisted messages contain only text, not tool parts.” Its summary section concluded “the only missing piece is a single query in chat/page.tsx.” Both can’t be true simultaneously. I almost acted on the summary; instead I read the actual files (chat-thread.ts, route.ts, the ChatPanel render loop) and saw the contradiction. The eventual fix (server-side synthesis) was correctly scoped only after that re-read. Generalized lesson: when a research subagent’s summary contradicts its own findings, trust the findings and verify directly. Subagents will sometimes paper over a contradiction in their summary because the summary is a separate generation pass. The findings cite line numbers; the summary doesn’t.

Local typecheck on --filter @domi/web isn’t enough when a PR touches multiple packages. PR #210 added a test in packages/shared/src/db/. I ran pnpm --filter @domi/web typecheck (and lint, and build) before pushing. CI typecheck on @domi/shared then failed with “PartitionRow does not satisfy Record<string, unknown>” — Drizzle’s db.execute<T>() requires an index signature, which a plain interface with named properties doesn’t carry. Fixed by switching to type PartitionRow = Record<string, unknown> & { ... }. Generalized lesson: for any multi-package PR, run pnpm typecheck at the root, not the per-package filter. The CI gate caught it because that’s exactly what CI does. The miss was purely my pre-push verification — and it’s a 30-second fix to align local with CI.
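
The index-signature failure is easy to reproduce in isolation. In the sketch below, executeLike is a hypothetical stand-in that carries the same Record<string, unknown> constraint the CI error reported; it is not Drizzle's actual signature, and the column names on PartitionRow are illustrative.

```typescript
// Hypothetical stand-in with the same generic constraint as in the CI error.
// This is NOT Drizzle's real db.execute signature.
function executeLike<T extends Record<string, unknown>>(rows: unknown[]): T[] {
  return rows as T[];
}

// An interface does not get an implicit index signature, so the compiler
// rejects it as a type argument here:
//
//   interface PartitionRowInterface {
//     table_name: string;
//     can_insert: boolean;
//     can_select: boolean;
//   }
//   executeLike<PartitionRowInterface>([]);
//   // rejected: PartitionRowInterface does not satisfy Record<string, unknown>

// The fix from PR #210: intersect with Record<string, unknown> so the
// index signature is explicit. Column names here are illustrative.
type PartitionRow = Record<string, unknown> & {
  table_name: string;
  can_insert: boolean;
  can_select: boolean;
};

const partitionRowsDemo = executeLike<PartitionRow>([
  { table_name: "audit.events", can_insert: true, can_select: true },
]);
console.log(partitionRowsDemo[0]?.table_name);
```

Nothing here runs differently at runtime; the difference exists only at typecheck time, which is why a per-package filter that skips @domi/shared cannot see it.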

Promoting a documentation gap to a CI gate twice in two sprints felt like a pattern. S17 promoted the docs/api/ convention to a CI gate (api-docs-sync). S18 promoted the i18n parity convention to a CI gate (i18n-keys-sync). Both followed the same arc: an undocumented expectation → a written convention → drift caught manually → “we should automate this” → a gate. The cost of writing each gate (~30-60 min, pure Node, no install) buys hard enforcement. Generalized lesson: any documentation convention you write down is a soft commitment until CI enforces it. The script doesn’t need to be sophisticated — count diffs and key-set comparisons cover most of the drift modes. Stop writing conventions you don’t enforce.
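
To make "key-set comparisons" concrete, here is a minimal sketch of the locale-parity half of such a gate. It is not the contents of scripts/check-i18n-keys.mjs: the flattenKeys and missingFrom helpers and the inline message objects are assumptions for illustration, and the real script also extracts namespaces and t("key") callsites, which this sketch omits.

```typescript
// Nested message catalogue, as in a typical en.json / fr.json.
type Messages = { [key: string]: string | Messages };

// Flatten { chat: { send: "Send" } } into ["chat.send"].
function flattenKeys(messages: Messages, prefix = ""): string[] {
  const keys: string[] = [];
  for (const [segment, value] of Object.entries(messages)) {
    const path = prefix ? `${prefix}.${segment}` : segment;
    if (typeof value === "string") {
      keys.push(path);
    } else {
      keys.push(...flattenKeys(value, path));
    }
  }
  return keys;
}

// Keys present in the reference list but absent from the candidate list.
function missingFrom(reference: string[], candidate: string[]): string[] {
  const present = new Set(candidate);
  return reference.filter((key) => !present.has(key)).sort();
}

// Illustrative catalogues reproducing the manual probe: chat.send removed from fr only.
const enDemo: Messages = { chat: { send: "Send", attach: "Attach" } };
const frDemo: Messages = { chat: { attach: "Joindre" } };

const missingInFr = missingFrom(flattenKeys(enDemo), flattenKeys(frDemo));
const missingInEn = missingFrom(flattenKeys(frDemo), flattenKeys(enDemo));

// A gate would exit non-zero when either list is non-empty.
const parityBroken = missingInFr.length > 0 || missingInEn.length > 0;
console.log({ missingInFr, missingInEn, parityBroken });
```

Checking both directions matters: a key added to fr.json but not en.json is the same kind of drift as the reverse.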

The “regression-suite hardening” backlog is paying compound interest. Four backlog markers closed in one sprint: #3 (partition probe), #5 (allowlist snapshot), #6 (extractText test), #8 (i18n parity). None individually was hard. Several were sub-30-minute jobs. They had been sitting in the backlog because each, alone, didn’t justify a PR. Bundled across one evening, four ✅s land for the same overhead as one. Generalized lesson: if a backlog item is “could be automated, ~30 min, low value individually,” wait until you can ship 3-4 in the same session. The PR-overhead gets amortized; the regression-suite-coverage compounds.
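
As an illustration of how small these items are, the partition probe (#3) reduces to one catalog query plus a predicate. The sketch below is an assumption about its shape, not the contents of partition-grants.test.ts: the GrantRow type, the helper names, and the example.unsafe_parent table are hypothetical. Only the app_role name, the relkind='p' filter, and the "INSERT implies SELECT" invariant come from the notes above. The query is shown as a string and is not executed here.

```typescript
// One row per partitioned parent, as the catalog query below would return it.
type GrantRow = { tableName: string; canInsert: boolean; canSelect: boolean };

// Catalog query (not executed in this sketch). has_table_privilege and
// pg_class.relkind = 'p' are standard PostgreSQL; the aliases are illustrative.
const PARTITION_GRANTS_SQL = `
  SELECT c.oid::regclass::text AS "tableName",
         has_table_privilege('app_role', c.oid, 'INSERT') AS "canInsert",
         has_table_privilege('app_role', c.oid, 'SELECT') AS "canSelect"
  FROM pg_class c
  WHERE c.relkind = 'p'
`;

// The invariant: INSERT on a partitioned parent must come with SELECT.
function findInsertWithoutSelect(rows: GrantRow[]): string[] {
  return rows
    .filter((row) => row.canInsert && !row.canSelect)
    .map((row) => row.tableName);
}

// Failure text that names the offenders and the fix, one GRANT per table.
function formatFailure(offenders: string[]): string {
  const fixes = offenders.map((table) => `GRANT SELECT ON ${table} TO app_role;`);
  return [`INSERT without SELECT on: ${offenders.join(", ")}`, ...fixes].join("\n");
}

// Demo rows: the two real tables are compliant; the third is a made-up offender.
const grantRowsDemo: GrantRow[] = [
  { tableName: "llm.calls", canInsert: true, canSelect: true },
  { tableName: "audit.events", canInsert: true, canSelect: true },
  { tableName: "example.unsafe_parent", canInsert: true, canSelect: false },
];

const offendersDemo = findInsertWithoutSelect(grantRowsDemo);
if (offendersDemo.length > 0) {
  console.log(formatFailure(offendersDemo));
}
```

Keeping the predicate separate from the query means the invariant can be unit-tested without a database, while the query itself still needs a live PostgreSQL instance.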

EXIF orientation didn’t bite. I considered preemptively adding EXIF orientation handling in uploadAndExtract for the camera-capture PR (rotation issues are the classic gotcha for iOS Safari uploads). Decided against — iOS Safari uploads photos with EXIF orientation tags intact, and the vision model reads the bytes as-is. If we observe rotation errors during dogfood, the fix is local to uploadAndExtract. Generalized lesson: don’t pre-add complexity to defend against speculative failure modes when the dogfood signal is going to land in days. The cost of finding out it’s broken is small; the cost of carrying speculative defensive code is larger than it looks.

Where Sprint 19 picks up

The Phase 10 candidate list narrows further:

Phase 10 non-functional gates. Threat-model sign-off, PIA, DR runbook. All V1-ship-blocking. None started. These are paperwork-shaped — the kind of work that doesn’t fit neatly into a 10-hour evening but blocks the dogfood window from starting in earnest.
First ChatPanel component test. Today there isn’t one. The S18 #204 follow-up (jsdom test for the camera input attribute) and the S18 #206 follow-up (test that tool-call parts don’t render to DOM) both want one. Bootstrapping ChatPanel’s first component test would unlock several backlog tests cheaply.
End-of-V1-ramp browser sessions. Live graph perf trace (#181) and WCAG browser-verification (#150). Both are JF-driven, both have runbooks written, and both have been deferred for ~5 sprints now. The right play is probably bundling them with the dogfood-tenant population pass.
Regression-suite backlog. Four markers closed in S18 (#3, #5, #6, #8). Remaining: #7 (OpenAI known-regressions snapshot, after matrix run), #9 (api-docs name-level diff, V1.5), #10 (race-safe terminal-transition probe, V1.5), #11 (camera-input attribute snapshot, S18 #204 follow-up).
Tasks UI polish (V1.5 candidates). Bulk-complete, snooze, multi-select, “show completed in last N days.” None V1 blockers; all surface area for the dogfood window to tell us which actually matter.
Eval cost trajectory. S14 ~$0.34/run, S16 ~$0.54, S17 ~$0.59. No change in S18 — chat eval didn’t run because no LLM prompt changes shipped. The trajectory’s real driver is escalate_to_plan firing more often + new propose_* fixtures; it’s worth keeping an eye on but not a worry yet.

S18 didn’t close a milestone (M10 is still open, and it is the only milestone left). It did close four backlog markers, ship the headline mobile feature that had been carried forward since S12, and pull a feature out of V1.5 that turned out not to need the V1.5 framing. The most useful thing I learned this sprint wasn’t a code lesson; it was that the punt decisions in CLAUDE.md need re-examination when the actual data flow gets traced. “V1.5” can mean “needs a schema change,” “needs a UI rework,” “needs a third-party integration,” or “needs JF to run a browser session.” Conflating those into one bucket means the cheap ones don’t get pulled forward when they should.