Sprint 25 — the architecture holds


S25 was the heaviest sprint of the V1 ramp by PR count and the lightest by per-PR architectural surface. Forty-seven product PRs in seven calendar days, shipping a few thousand lines of feature work, and basically every PR reused a pattern S20-through-S24 had already templated. The screenshot-to-PR loop turned into the actual development rhythm: JF would use Domi for an hour, find something that worked badly, file an issue with a screenshot, and the fix would land within the same evening. Some days produced six PRs. The sprint cleared with +268 shared tests and +74 web tests, zero reverts, every CI gate green.

This is what dogfood-driven development looks like when the foundation can absorb it.

What shipped, grouped

Forty-seven is too many to enumerate per-PR; the docs/testing/sprint-25.md report has the full PR list. The work clustered into eight themes:

A. Dogfood infrastructure (14 PRs, #324–#349). JF needed table-stakes UX fixes before he could even use the app for a full day. Anchor the UserPill so the Recents list scrolls past it. Make propose_asset accept “Canada” as well as “CA”. Inject the current date into the chat system prompt so the model stops asking what year it is. Add multi-file chat attachments + drag-and-drop. Wire household timezone + main residence + chat-route tz injection. Make the family/household name editable from settings. The set is unspectacular individually; the sum is the difference between “useful tool” and “actually-uses-it-daily.”

B. Inline edit panels + recurrence (4 PRs, #351–#357). Three list pages — /tasks, /contacts, /assets — got inline edit-in-panel per row. Task recurrence (with spawn-on-complete) shipped along with a “Scheduled replacements” section on the asset detail page that surfaces per-asset recurring tasks. The HVAC filter that needs changing every 9 months now exists as a real recurring task with a real next-due date computed from the last complete.
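The next-due computation for spawn-on-complete can be sketched roughly as follows. This is a minimal illustration, assuming a month-based interval and UTC calendar arithmetic; RecurringTask, addMonthsUtc, and spawnNext are hypothetical names, not the app's actual schema or functions.

```typescript
// Hypothetical shape for illustration; the real task schema is not shown in this post.
interface RecurringTask {
  title: string;
  intervalMonths: number; // e.g. 9 for the HVAC filter
  dueDate: Date;
  completedAt: Date | null;
}

// Add whole months in UTC, clamping to the last day of the target month
// (Jan 31 + 1 month gives Feb 28/29 rather than rolling into March).
// Time of day is dropped; the result is UTC midnight.
function addMonthsUtc(date: Date, months: number): Date {
  const year = date.getUTCFullYear();
  const month = date.getUTCMonth() + months;
  const lastDayOfTarget = new Date(Date.UTC(year, month + 1, 0)).getUTCDate();
  const day = Math.min(date.getUTCDate(), lastDayOfTarget);
  return new Date(Date.UTC(year, month, day));
}

// Spawn-on-complete: the next occurrence is due one interval after the
// completion date, not one interval after the previous due date.
function spawnNext(task: RecurringTask, completedAt: Date): RecurringTask {
  return {
    title: task.title,
    intervalMonths: task.intervalMonths,
    dueDate: addMonthsUtc(completedAt, task.intervalMonths),
    completedAt: null,
  };
}
```

Anchoring on the completion date means a filter changed two months late pushes the whole schedule back by two months, which is usually the desired behavior for maintenance tasks.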

C. Chat + graph papercuts (4 PRs, #359–#365). Propose cards now render on first response when an attachment is included (an early version of what #413 later fully fixed). The graph edit panel defaults memberKind, dateOfBirth, and assetKind to their current values rather than empty. The graph detail panel remounts when clicking a different node (the stale-state bug that’s invisible until you click two nodes in a row). Asset kinds expanded: pool, sauna, spa, garage, other.

D. Calendar + mobile + filters (10 PRs, #367–#385). A calendar at /calendar with month + week view + sidebar nav. ChatPanel remount on thread switch (parallel to the graph fix). Client-side photo compression on mobile + a friendly “too large” error message. Task filter by linked asset and by linked member. Asset hierarchy primitives (the container_asset_id column) with a subtree task filter so you can see every task under the residence. Pin send to the active thread so race conditions during thread-switch don’t ship a message to the wrong place.
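The subtree task filter can be illustrated with an in-memory sketch. It assumes each asset row carries a nullable container pointer (mirroring the container_asset_id column) and that tasks carry a nullable asset id; AssetNode, TaskRow, subtreeAssetIds, and tasksUnder are hypothetical names, and the real implementation may well do this in SQL instead.

```typescript
// Hypothetical row shapes; only the container pointer comes from the post.
interface AssetNode {
  id: string;
  containerAssetId: string | null;
}
interface TaskRow {
  id: string;
  assetId: string | null;
}

// Collect the root asset plus every asset nested under it, breadth-first.
function subtreeAssetIds(rootId: string, assets: AssetNode[]): Set<string> {
  const childrenOf = new Map<string, string[]>();
  for (const asset of assets) {
    if (asset.containerAssetId === null) continue;
    const siblings = childrenOf.get(asset.containerAssetId) ?? [];
    siblings.push(asset.id);
    childrenOf.set(asset.containerAssetId, siblings);
  }
  const seen = new Set<string>([rootId]);
  const queue: string[] = [rootId];
  while (queue.length > 0) {
    const current = queue.shift() as string;
    for (const child of childrenOf.get(current) ?? []) {
      // The seen-set also guards against accidental cycles in the data.
      if (!seen.has(child)) {
        seen.add(child);
        queue.push(child);
      }
    }
  }
  return seen;
}

// Every task attached to the root asset or to anything contained in it.
function tasksUnder(rootId: string, assets: AssetNode[], tasks: TaskRow[]): TaskRow[] {
  const ids = subtreeAssetIds(rootId, assets);
  return tasks.filter((task) => task.assetId !== null && ids.has(task.assetId));
}
```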

E. Members + vendors + dedup (4 PRs, #387–#397). Members get a sex field (male/female/other) + a member_relationships table with sex-aware kind labels so the same spouse kind renders as husband/wife/spouse. Auto-propose a vendor as a contact when a document is extracted (the deterministic deriver that #419 later extended to also capture addresses). Calendar bare-date timezone fix: May 30 was rendering on May 29 because a bare date parsed as UTC midnight is still 8 p.m. on May 29 in EDT. Semantic dedup of member proposals so one ask doesn’t produce four cards (the three stale re-emits plus the fresh one).
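The bare-date issue is easy to reproduce: JavaScript treats a date-only ISO string passed to new Date as UTC midnight, so any zone west of UTC displays the previous day. One way to avoid it, sketched here under the assumption that the runtime's local zone is the zone you want (the actual fix may use the household timezone instead), is to build the date from its calendar parts. parseBareDate is a hypothetical helper name.

```typescript
// Parse a bare "YYYY-MM-DD" string as a calendar date in the local zone,
// instead of letting `new Date(str)` interpret it as UTC midnight.
function parseBareDate(bareDate: string): Date {
  const match = /^(\d{4})-(\d{2})-(\d{2})$/.exec(bareDate);
  if (!match) {
    throw new Error(`Not a bare date: ${bareDate}`);
  }
  const [, year, month, day] = match;
  // The multi-argument Date constructor works in local time, so the
  // calendar day survives regardless of the zone's UTC offset.
  return new Date(Number(year), Number(month) - 1, Number(day));
}
```

With this helper, parseBareDate("2026-05-30").getDate() is 30 in every zone, whereas new Date("2026-05-30").getDate() is 29 anywhere behind UTC.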

F. Photo + link tooling + LLM resolution (5 PRs, #399–#407). Two new chat tools — propose_set_asset_primary_photo and propose_link_document_to_asset — so JF can wire documents to assets without leaving the chat surface. The member panel got dedup re-emits + inline relationships + in-app confirm (no more browser-native confirm dialog). The asset-name matcher went from “conservative full-name match only” to a 4-stage tolerant matcher (exact → input-substring → stored-substring → shared-word) with an LLM Haiku fallback when all four stages return null. The LLM fallback is the 8th workload role; it costs about $0.0001 per call and resolves the partial-match cases the conservative matcher refuses to guess at.
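The four deterministic stages might look something like the sketch below. The stage names come from the list above, but their exact semantics are an assumption: here "input-substring" means the input appears inside a stored name, "stored-substring" means a stored name appears inside the input, and any stage that finds more than one candidate returns null rather than guessing. NamedAsset and matchAssetName are hypothetical names, and the LLM fallback is deliberately left out.

```typescript
// Hypothetical asset shape for illustration.
interface NamedAsset {
  id: string;
  name: string;
}

const normalizeName = (s: string): string => s.trim().toLowerCase();

function matchAssetName(input: string, assets: NamedAsset[]): NamedAsset | null {
  const wanted = normalizeName(input);
  if (wanted === "") return null;
  const wantedWords = new Set(wanted.split(/\s+/));

  // Ordered from strictest to loosest; each predicate sees a normalized stored name.
  const stages: Array<(stored: string) => boolean> = [
    (stored) => stored === wanted, // 1. exact
    (stored) => stored.includes(wanted), // 2. input is a substring of the stored name
    (stored) => stored !== "" && wanted.includes(stored), // 3. stored name is a substring of the input
    (stored) => stored.split(/\s+/).some((word) => wantedWords.has(word)), // 4. shared word
  ];

  for (const stage of stages) {
    const hits = assets.filter((asset) => stage(normalizeName(asset.name)));
    if (hits.length === 1) return hits[0] ?? null;
    // Several candidates at the same stage: ambiguous, so refuse to guess.
    if (hits.length > 1) return null;
  }
  return null; // nothing matched; a caller may choose to try an LLM fallback here
}
```

Stopping at the first stage that produces any hits keeps the matcher conservative: a looser stage never overrides an ambiguity found by a stricter one.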

G. Asset detail + document UX (5 PRs, #409–#417). Locale-aware number formatters on the asset detail page (30,911 in en-CA versus 30 911 in fr-CA). Per-document summarizeExtraction so the linked-docs list shows “Dealership correspondence from Silver Star Mercedes-Benz Montréal — 2026-04-17” instead of “image.jpg.” An in-app document viewer modal. A chat-route fix so propose cards render on the first multipart response (PR #413 — the one I’m most pleased to have found, see below). An applicator fix so setting a photo as primary also auto-links the doc to the asset (PR #415 — the one I’m most chastened to have written, also see below). The /documents panel + per-document detail page got the same summary headline + R2 storage path + embedded zoom-able preview treatment.
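Locale-aware formatting of this kind is typically a thin wrapper over the standard Intl.NumberFormat API. A minimal sketch follows; formatCount is a hypothetical helper name, and the exact grouping character that fr-CA produces is chosen by the runtime's locale data (a space-like separator rather than a plain ASCII space).

```typescript
// Format an integer count for display in the given locale.
function formatCount(value: number, locale: string): string {
  return new Intl.NumberFormat(locale, { maximumFractionDigits: 0 }).format(value);
}

// en-CA groups thousands with a comma; fr-CA groups with a space-like character.
const odometerEn = formatCount(30911, "en-CA");
const odometerFr = formatCount(30911, "fr-CA");
```

Because the fr-CA separator is not a regular space, comparing formatted strings in tests is more robust when whitespace is normalized first.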

H. Contact international address (1 PR, #419). The last PR of the sprint. New address jsonb column on contacts with the shape { line1?, line2?, city?, region?, postalCode?, country? }. Country is ISO 3166-1 alpha-2; the server-side normalizer accepts “Canada”, “États-Unis”, “U.S.A.”, alpha-3 codes. The extraction layer gains per-kind <vendor>_address fields so the dealership’s mailing address gets captured at the same time as the dealership name. 31 new tests across 5 files.
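A normalizer along these lines can be sketched as an alias lookup after stripping accents and punctuation. The alias table below is illustrative only and far smaller than anything a real normalizer would need; COUNTRY_ALIASES and normalizeCountry are hypothetical names, not the server's actual code.

```typescript
// Illustrative alias table; keys are lowercase with accents and dots removed.
const COUNTRY_ALIASES: Record<string, string> = {
  ca: "CA",
  can: "CA",
  canada: "CA",
  us: "US",
  usa: "US",
  "united states": "US",
  "etats-unis": "US",
};

// Map a free-form country string to ISO 3166-1 alpha-2, or null when unknown.
function normalizeCountry(input: string): string | null {
  const key = input
    .normalize("NFD")
    .replace(/[\u0300-\u036f]/g, "") // strip combining accents: "États" becomes "Etats"
    .replace(/\./g, "") // "U.S.A." becomes "USA"
    .trim()
    .toLowerCase();
  return COUNTRY_ALIASES[key] ?? null;
}
```

Returning null for unknown input, rather than passing the raw string through, keeps the stored column strictly alpha-2.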

What surprised me

The AI SDK’s result.toolResults returns the last step’s tool results only. This one cost me a couple of hours and most of a debugging session. JF kept seeing propose cards not render on first response when he attached a photo — the assistant text would say “Confirm the card above to set this as the primary photo” but no card would appear. After navigating away and back, the card would synthesize from the chat_proposals table and render fine. So the propose was firing, the row was persisting, but the JSON envelope back to the client was empty.

What I missed: in AI SDK v6, GenerateTextResult.toolResults is documented as “the results of the tool calls from the last step.” When the model calls a propose_* tool in step 0 and then generates the “Confirm the card above” follow-up text in step 1, step 1 has no tool calls — so toolResults is empty. The propose tool’s result lives in result.steps[0].toolResults. To get every tool call across every step, you have to flatMap over result.steps. The fix is one line. The bug is completely invisible until you read the SDK type docs explicitly — tsc is happy, the test suite is happy, the code reads correctly. Pinned the contract in the extractor’s doc comment and added a regression test that mirrors the actual SDK v6 envelope so the next SDK rev that shuffles field shapes catches it in CI.
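The one-line fix amounts to flattening over steps. The sketch below uses small local stand-in types rather than importing the SDK, so the field names inside each tool result (toolName, output) are assumptions for illustration; only the steps and toolResults properties are taken from the description above.

```typescript
// Local stand-ins for the parts of the result this sketch touches.
// These are assumptions modelled on the description, not the SDK's own types.
interface ToolResultLike {
  toolName: string;
  output: unknown;
}
interface StepLike {
  toolResults: ToolResultLike[];
}
interface GenerateTextResultLike {
  toolResults: ToolResultLike[]; // results from the last step only
  steps: StepLike[];
}

// Gather tool results from every step, not just the final one.
function collectAllToolResults(result: GenerateTextResultLike): ToolResultLike[] {
  return result.steps.flatMap((step) => step.toolResults);
}
```

A regression test in this style builds an envelope where the tool call happens in step 0 and the last step is text only, then checks that the top-level toolResults is empty while the flattened list is not.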

JSONB key order doesn’t survive a Postgres roundtrip. I wrote PR #419’s address-diff path with JSON.stringify(prev) !== JSON.stringify(next). Tests failed: re-submitting the exact same address showed as “changed” because what came back from the DB had a different key order than what I sent in. Postgres jsonb doesn’t preserve insertion order. Fix is sort-keys-then-stringify. Templated into a small helper; will be reused as more JSONB columns land. The kind of small thing that’s only obvious once you’ve hit it.
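A sort-keys-then-stringify helper could look roughly like this. It is a sketch, not the helper from the PR; Json, sortKeysDeep, and jsonEqual are hypothetical names.

```typescript
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

// Rebuild the value with object keys in sorted order at every depth.
// Array element order is meaningful, so arrays keep their order.
function sortKeysDeep(value: Json): Json {
  if (Array.isArray(value)) {
    return value.map((item) => sortKeysDeep(item));
  }
  if (value !== null && typeof value === "object") {
    const sorted: { [key: string]: Json } = {};
    for (const key of Object.keys(value).sort()) {
      sorted[key] = sortKeysDeep(value[key] as Json);
    }
    return sorted;
  }
  return value;
}

// Key-order-insensitive equality for JSON-shaped values.
function jsonEqual(a: Json, b: Json): boolean {
  return JSON.stringify(sortKeysDeep(a)) === JSON.stringify(sortKeysDeep(b));
}
```

Comparing the canonical strings keeps the diff path as simple as the original JSON.stringify comparison while removing the dependence on whatever key order the database hands back.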

“Orthogonal in the schema” doesn’t mean “orthogonal in the read path.” PR #415 — the one I’m chastened to have written. setAssetPrimaryPhotoFromProposal documented documents.asset_id and assets.primary_photo_document_id as orthogonal — one is “where does this doc live”, the other is “what’s the asset’s hero.” They genuinely ARE orthogonal at the schema level. But getAssetDetail derives primaryPhoto by looking up the doc inside the linked-docs list, which is joined on asset_id. So a doc set as primary without being linked rendered as an empty placeholder AND missing from the linked-docs list, even though the photo pointer was set correctly. The fix wires the link as a side-effect of the photo-set when asset_id IS NULL. The lesson is the kind of thing that’s hard to see in advance: read-path joins create hidden dependencies between writes that aren’t visible from the schema diagram. Documented inline in both the applicator and the regression-suite so the next “we should keep these orthogonal” instinct gets pressure-tested against the read paths first.
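The hidden dependency is easier to see in a small in-memory model. The sketch below mirrors the description above: the read path finds the primary photo inside the list of docs linked by asset id, and the fixed write path links the doc when it has no asset yet. The row shapes and the function bodies are simplified assumptions, not the actual applicator or query code.

```typescript
// Simplified stand-ins for the two tables involved.
interface DocumentRow {
  id: string;
  assetId: string | null; // "where does this doc live"
}
interface AssetRow {
  id: string;
  primaryPhotoDocumentId: string | null; // "what is the asset's hero"
}

// Read path: the primary photo is looked up inside the linked-docs list,
// so a doc that is not linked can never be returned as the primary photo.
function getAssetDetail(asset: AssetRow, documents: DocumentRow[]) {
  const linkedDocs = documents.filter((doc) => doc.assetId === asset.id);
  const primaryPhoto = linkedDocs.find((doc) => doc.id === asset.primaryPhotoDocumentId) ?? null;
  return { linkedDocs, primaryPhoto };
}

// Write path after the fix: setting the pointer also links the doc,
// but only when the doc is not already linked to some asset.
function setAssetPrimaryPhoto(asset: AssetRow, doc: DocumentRow): void {
  asset.primaryPhotoDocumentId = doc.id;
  if (doc.assetId === null) {
    doc.assetId = asset.id;
  }
}
```

Setting only the pointer, without the link, leaves getAssetDetail returning a null primaryPhoto even though the column is populated, which is the empty-placeholder symptom described above.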

Forty-seven PRs in seven days didn’t break anything. This is the part I genuinely didn’t expect. S20-S24 invested heavily in patterns — Ajv-validated registry writes, audit-then-mutate ordering, server-stamped enriched proposals, conservative-then-LLM-fallback name resolution, the deferred-FK pattern, the useRef sentinel for useActionState, kind_registry attribute schemas. The premise was “if we have to fix dozens of dogfood issues in week 1, the patterns need to absorb them.” They did. Almost every PR was composition over existing shapes. The only genuinely new architectural addition was the 8th workload role (LLM resolve_entity fallback), and even that slotted into the existing role-routing abstraction with a single mapping entry. Nothing got refactored. No PR got reverted. The CI suite grew from 320 to 662 tests and stayed green throughout.

Decisions made

  • Contact addresses are plaintext JSONB, not encrypted overflow. The address_encrypted bytea column from S22 stays reserved for genuinely sensitive notes; the structured address lives in a new address jsonb column with the same privacy posture as phone/email/website. JF’s read: contacts aren’t confidential — a contractor’s office address has the same weight as their phone number.
  • Country is ISO 3166-1 alpha-2. Two chars, sortable, enables future flag rendering, and the stored code is stable across locales while the display name is localized via Intl.DisplayNames (CA renders as “Canada” in both en-CA and fr-CA; US renders as “United States” in en-CA and “États-Unis” in fr-CA). The normalizer accepts common variants (“Canada”, “États-Unis”, “U.S.A.” all map to alpha-2), so the model and the UI both have escape hatches when alpha-2 isn’t natural.
  • LLM fallback is opt-in via closure injection. The asset-name and member-name matchers take an optional llmFallback callback; the eval harness omits it (no LLM cost in tests) and the chat route wires it in. Keeps the LLM call out of the hot path for ambiguous matches the conservative matcher correctly refuses — only the “matcher returned null but a human would say yes” cases hit Haiku.
  • Address extraction is conservative. The free-form parser drops the result unless it can recover country OR postal code. “1234 Mystery Street” alone doesn’t surface as a confirm card — too ambiguous, the user is more likely to type it themselves than fix a wrong split.
  • Sprint cadence ratcheted up. S22 was 4 PRs in two days. S23 was 4 in one day. S24 was 4 in one day. S25 was 47 in seven days, averaging just under 7 PRs a day. The sprint window is still one week per CLAUDE.md §11, but the PR count per sprint climbed by an order of magnitude, and the daily rate rose from 2–4 to roughly 7 and stayed there for a full week. The constraint that bound was no longer “how fast can I design and ship a feature”; it was “how fast can the user surface friction points.” The dogfood loop became the rate-limiter, which is exactly what dogfood is supposed to be.
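The opt-in LLM fallback decision in the list above can be sketched as a resolver that takes the fallback as an optional argument. This is an illustration under simplifying assumptions: Matcher, LlmFallback, and resolveEntity are hypothetical names, and the sketch calls the fallback whenever the deterministic matcher returns null, without modelling any distinction between ambiguous and unmatched inputs.

```typescript
type Matcher<T> = (input: string, candidates: T[]) => T | null;
type LlmFallback<T> = (input: string, candidates: T[]) => Promise<T | null>;

// Deterministic match first; the LLM is consulted only if a fallback was
// injected and the deterministic matcher found nothing.
async function resolveEntity<T>(
  input: string,
  candidates: T[],
  conservativeMatch: Matcher<T>,
  llmFallback?: LlmFallback<T>, // omitted by a test harness, supplied by the chat route
): Promise<T | null> {
  const hit = conservativeMatch(input, candidates);
  if (hit !== null) return hit;
  if (!llmFallback) return null;
  return llmFallback(input, candidates);
}
```

Because the fallback is a plain function argument, tests never need to mock a network client: leaving the argument out is enough to guarantee no LLM call is made.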

Where Sprint 26 picks up

S26 narrows back to a more planned shape — week 2 of the 4-week dogfood window, plus two pieces of explicit work:

  1. @vercel/speed-insights integration. Wire the Speed Insights SDK into the Next.js app, provision the dashboard, start collecting Core Web Vitals + per-page latency against the production deployment. Should be one PR (the SDK install + the <SpeedInsights /> component in the root layout); the data starts populating immediately after the next deploy.

  2. First perf-side measurement against Requirements §11. Two of the V1 quantitative leading indicators are perf-side: <30 seconds average from upload to first extracted fact (single-page docs), and ≤$100/month opex at JF’s personal-use scale. Both are measurable now via OpenTelemetry → Sentry traces (S12) and the calls cost telemetry (S5). One PR for a docs/perf/v1-measurement-week1.md page that captures methodology + initial numbers + the rolling-7-day trend going forward. The remaining leading-indicator counts (≥20 predicted tasks, ≥12 documents per week, 100% members onboarded) land at end-of-S28 once the 4-week window completes.

Plus N reactive dogfood PRs: S25 averaged ~7 per day and S26 will likely land in the same ballpark, but the trend should bend down each week as the highest-friction surfaces get smoothed.

The remaining V1-critical items not in S26 scope: #5 Sentry project setup, #8 Fly.io MCP app provisioning, threat-model sign-off + PIA + DR runbook (post-dogfood per JF direction). The infra items can land any sprint; the paperwork lands after dogfood ends so it describes verified behavior.

S25 was the test of whether the V1 ramp’s design holds up under reactive load. It does. S26 starts measuring whether the system itself meets the V1 quantitative bar.