2026-05-31

Sprint 28 — the feature was already built

Last sprint’s post ended with a plan. S28, I wrote, would build “insurance as a first-class entity … option B — typed FKs on obligations (insured_asset_id, insured_member_id).”

Both halves of that sentence were wrong. Insurance wasn’t a typed-FK problem, and it wasn’t unbuilt. I just didn’t know that yet, because the place I keep my plans had quietly drifted away from the place I keep my code.

This is a sprint about the gap between those two places.

The map and the territory

Domi has a working-memory file — CLAUDE.md — that I (and the AI I pair with) read at the start of every session. Section 3 is the current-state ledger: what sprint we’re in, what’s done, what’s next. It’s a summary. And summaries rot.

When I went to start the insurance work, I did the thing I should always do first and almost never want to: I checked what was already there. I expected to be writing a migration. Instead I found four junction tables — obligation_assets, obligation_members, contact_assets, contact_members — already in the schema, already migrated (0041), already wired into a relationships/ module with link and unlink and list helpers, already surfaced on the obligation and contact detail pages as “Covers” and “Serves” sections, already rendering four new edge types in the knowledge graph, already filterable by chip on the list pages, already documented in a help page in both languages.

The whole feature. Shipped. Earlier in the same sprint, during dogfood, after a reveal JF logged about his internet provider serving one specific residence and not another.

And CLAUDE.md §3 said: insurance entity — option B (typed FKs on obligations.insured_asset_id + obligations.insured_member_id), unstarted.

It was wrong twice. Wrong that the work hadn’t started: it was 100% done. And wrong about the design itself: the abandoned “option B” was typed FK columns on the obligation row, so one insurance policy could point at one asset and one member at most. The thing that actually shipped was the opposite decision: many-to-many junctions, so one auto policy covers the Mazda and the Bayliner, and one family health plan links every member. JF made that call on 2026-05-28 and it’s recorded, correctly, in the spec. The §3 headline just never caught up.

The lesson is uncomfortable and worth keeping: the working memory is a map, and the map drifts from the territory. The spec folder (docs/specs/) is the territory’s survey — versioned, decision-stamped, authoritative. The §3 summary is a convenience. When they disagree, the summary is the liar. I’d internalized that for code (git is truth) but not for my own notes.

The real gap was one layer down

So if the feature was built, why did it feel unbuilt during dogfood? Because the success bar isn’t “the data exists.” It’s a chat question: “what insurance covers the Bayliner?”

I asked it. Domi said nothing useful.

The data was right there in obligation_assets. The write path put it there — say “my auto insurance covers the Mazda and the Bayliner” and the proposal resolves both names and writes both junction rows. The detail page showed it. The graph drew it. But the chat read tool — list_obligations — was still filtering by an attribute that no longer existed: a stale attributes.asset_id / insured_asset_id JSONB convention from the design that got abandoned. It even applied the filter after the row limit, which is its own quiet bug. So the model asked the database “which obligations have this asset id in their JSONB blob?” and the database, correctly, said none — because coverage doesn’t live in the blob anymore, it lives in the junction.

list_contacts was worse: no asset or member filter at all. “Who’s the dentist for Theo?” wasn’t answerable, full stop.

The feature had a write path and a read path and the read path had been left behind when the design changed underneath it. Nobody noticed because the UI read from the junctions directly — only the chat went through the stale tool. The most-built feature of the sprint had a one-function hole exactly where the user actually stands.

The fix is small and the smallness is the point: both tools now query the junction tables in SQL, before the limit, and return the coverage edges on every row so the model can describe what something covers without a second call. assetId and memberId intersect when you pass both. Eight real-DB tests, because the thing I most wanted to guarantee is that “what covers the Bayliner?” never silently returns nothing again. Tool names didn’t change, so the API-docs gate didn’t even blink.
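Here is a minimal in-memory sketch of why the ordering matters. The real fix lives in SQL against the junction tables; every type and function name below is hypothetical and only models the behavior described above: filter on junction rows first, apply the limit second, and return the coverage edges on each row.

```typescript
// Hypothetical shapes for illustration; the real tables and tool signature may differ.
interface Obligation { id: string; title: string }
interface AssetLink { obligationId: string; assetId: string }   // models an obligation_assets row
interface MemberLink { obligationId: string; memberId: string } // models an obligation_members row
interface ListFilter { assetId?: string; memberId?: string; limit: number }
interface ObligationRow extends Obligation { assetIds: string[]; memberIds: string[] }

// Attach the coverage edges to a row so one call can describe what it covers.
function withEdges(o: Obligation, assetLinks: AssetLink[], memberLinks: MemberLink[]): ObligationRow {
  return {
    ...o,
    assetIds: assetLinks.filter((l) => l.obligationId === o.id).map((l) => l.assetId),
    memberIds: memberLinks.filter((l) => l.obligationId === o.id).map((l) => l.memberId),
  };
}

// Both filters must hold when both are passed (intersection, not union).
function matches(row: ObligationRow, filter: ListFilter): boolean {
  const assetOk = filter.assetId === undefined || row.assetIds.includes(filter.assetId);
  const memberOk = filter.memberId === undefined || row.memberIds.includes(filter.memberId);
  return assetOk && memberOk;
}

// Correct order: filter on the junction rows, then apply the limit.
function listObligations(
  all: Obligation[], assetLinks: AssetLink[], memberLinks: MemberLink[], filter: ListFilter,
): ObligationRow[] {
  return all
    .map((o) => withEdges(o, assetLinks, memberLinks))
    .filter((r) => matches(r, filter))
    .slice(0, filter.limit);
}

// The old ordering, kept only for contrast: limit first, filter second.
function listObligationsLimitFirst(
  all: Obligation[], assetLinks: AssetLink[], memberLinks: MemberLink[], filter: ListFilter,
): ObligationRow[] {
  return all
    .slice(0, filter.limit)
    .map((o) => withEdges(o, assetLinks, memberLinks))
    .filter((r) => matches(r, filter));
}
```

With three obligations, a limit of two, and the auto policy stored third, the limit-first version returns nothing for the Bayliner while the filter-first version returns the policy with both covered assets attached.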

I spent more words in CLAUDE.md correcting the record of this feature than I spent lines of code building the fix. That ratio is the sprint.

The Briefing learned to write

The other through-line this sprint was the Briefing — a top-level surface where Domi reports back what it did while you were away. It came in as a spec, then grew up over a handful of PRs: a nightly Gmail sync cron with a per-connection cadence dial, then the surface itself with a cleanup sweep, then an admin catalog so I can see every cron’s last and next run, then the Vercel Pro flip that finally let the Gmail sync run hourly and honor each connection’s preferred local hour instead of a fixed 03:00 UTC.

That cron work produced a convention I wrote down so I stop re-deriving it: every Vercel cron ships four artifacts together. A schedule entry. A catalog row plus an enum value so the admin page and the audit table both know it exists. An audit envelope so every run lands a row. And — the one that’s easy to skip — a per-tenant control surface, so a household can turn the thing off or change its cadence. Skipping the fourth is how you ship an opaque background process the user can’t see or stop, which is the exact opposite of what a household-management tool should feel like.
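One way to keep that convention from drifting is to write it down as a shape and check it. This is only a sketch: the field names and the checker are hypothetical, not the project's actual registry.

```typescript
// Hypothetical description of one cron job; each field maps to one of the four artifacts.
interface CronArtifacts {
  schedule?: string;                    // 1. the schedule entry (a cron expression)
  catalogKey?: string;                  // 2. the catalog row plus its enum value
  writesAuditRow?: boolean;             // 3. the audit envelope: every run lands a row
  tenantControl?: "toggle" | "cadence"; // 4. the per-tenant control surface
}

// Returns the artifacts a cron definition is still missing, in convention order.
function missingArtifacts(cron: CronArtifacts): string[] {
  const missing: string[] = [];
  if (!cron.schedule) missing.push("schedule entry");
  if (!cron.catalogKey) missing.push("catalog row and enum value");
  if (!cron.writesAuditRow) missing.push("audit envelope");
  if (!cron.tenantControl) missing.push("per-tenant control surface");
  return missing;
}
```

A definition with a schedule, a catalog key, and an audit row but no tenant control comes back with exactly one gap, the fourth artifact, which is the one that is easy to skip.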

The capstone was the digest. The event feed reports individual runs (“synced Gmail, found two invoices”). The digest is different: it’s a short narrative brief (here’s where your household stands, here’s what happened over the last few weeks, here’s what’s due this week and next), written by the model and posted to the Briefing thread. There’s a “Generate brief” button for on-demand generation and a weekly cron for Monday mornings, both behind a per-tenant cadence dial.

Two decisions I’m happy with. First, no new model role: the digest reads sensitive household data and needs no tools, so it reuses the sensitive_context role I already had — one new adapter method, not a ninth workload role to maintain. Second, the prompt is built only from structured fields — names, dates, counts, titles — never from the raw text inside a document. That keeps the threat I flagged in the threat model (document content leaking into briefing prose) closed by construction. The brief can tell you three documents came in; it can’t quote them.
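The "closed by construction" part can be sketched with types: if the prompt builder's input has no field for document body text, there is nothing to leak. The interfaces, field names, and prompt wording below are assumptions for illustration, not the real adapter method.

```typescript
// Hypothetical input shapes. Note there is deliberately no body/raw-text field anywhere.
interface DocumentSummary { title: string; receivedOn: string }
interface DueItem { title: string; dueOn: string }
interface DigestInput {
  householdName: string;
  language: string; // passed through as a plain name, e.g. "English"
  recentDocuments: DocumentSummary[];
  dueThisWeek: DueItem[];
}

// Builds the prompt from names, dates, counts, and titles only.
function buildDigestPrompt(input: DigestInput): string {
  const lines: string[] = [
    `Write a short household brief in ${input.language}.`,
    "Use only the facts listed below. Do not invent facts.",
    `Household: ${input.householdName}`,
    `Documents received: ${input.recentDocuments.length}`,
    ...input.recentDocuments.map((d) => `- ${d.title} (${d.receivedOn})`),
    `Due this week: ${input.dueThisWeek.length}`,
    ...input.dueThisWeek.map((d) => `- ${d.title} (due ${d.dueOn})`),
  ];
  return lines.join("\n");
}
```

Because the builder only ever reads the listed fields, a record that happens to carry extra text still contributes just its title and date. The same property makes a structural eval cheap: assert that every bucket and the "do not invent facts" line appear in the output.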

What it can’t do yet is the qualitative thing — is the prose actually good? I shipped a structural eval (does the prompt get fed every bucket, in the right language, with the “don’t invent facts” instruction) and left the prose-quality judgment to dogfood. A live-model quality eval is a real follow-up; a deterministic structural one is the floor that catches the regressions that actually happen.

Soft-delete, finally everywhere

The quieter win: the archive pattern finally closed across the graph. Obligations got an archive button on the list row (the backend was already there from an earlier sprint — another small map/territory gap). Documents got the whole thing from scratch — a new archived_at column, a race-safe audit-first mutation, the Active/Archived toggle.

And then a dogfood papercut that taught the real lesson: I shipped documents-archive, JF archived a document, and it vanished from the /documents list — but stayed visible in the chat grounding tool, in the asset’s linked-documents panel, in the /list documents slash command, and in the Briefing digest’s “recent activity” count. Archiving from one surface isn’t archiving. The archived_at IS NULL filter has to live on every read path, not just the one with the button. Fixed all four, then went back and wrote dedicated tests for each, because “it relies on the shared filter staying green” is exactly the kind of assurance that rots the same way a working-memory summary does.
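A minimal sketch of the shape this takes, assuming an in-memory model rather than the real queries: one predicate equivalent to archived_at IS NULL, applied on every read path, with each path checked on its own. The function names are hypothetical stand-ins for the five surfaces named above.

```typescript
// Hypothetical record shape; archivedAt models the archived_at column.
interface DocumentRecord { id: string; assetId: string; archivedAt: Date | null }

// The single shared predicate, equivalent in spirit to `archived_at IS NULL`.
const isActive = (d: DocumentRecord): boolean => d.archivedAt === null;

// Each read path applies the predicate itself; none relies on another path's filter.
function documentsPage(docs: DocumentRecord[]): DocumentRecord[] {
  return docs.filter(isActive);
}
function chatGrounding(docs: DocumentRecord[]): DocumentRecord[] {
  return docs.filter(isActive);
}
function assetLinkedDocuments(docs: DocumentRecord[], assetId: string): DocumentRecord[] {
  return docs.filter((d) => isActive(d) && d.assetId === assetId);
}
function slashListDocuments(docs: DocumentRecord[]): string[] {
  return docs.filter(isActive).map((d) => d.id);
}
function digestRecentCount(docs: DocumentRecord[]): number {
  return docs.filter(isActive).length;
}
```

The point of testing each path separately is that a sixth read path added later gets no protection from the first five; it needs its own assertion that an archived row stays out.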

The safety net went in early

One thing I’d parked for next sprint went in this one. Sentry has been on the critical-path list since Sprint 0 — the single piece of observability I’d never actually wired — and the migration-drift 500 a couple of sprints back was the standing argument for it: when dogfood breaks, I want a stack trace, not a screenshot of a sad error page.

It’s wired now, across all three Next.js runtimes — server, client, edge. And two follow-ups came straight out of using it, which is the whole point. The chat route had been swallowing unhandled 500s behind a generic message; the briefing’s generate-brief action was quietly eating its own errors. Both now wrap a human-readable envelope around a Sentry capture — the user sees something kind, I see the trace. The hardening lesson I wrote down: a misconfigured Sentry org, project, or token must never fail the build. Observability exists to watch the app, not to become a fresh way for the app to fall over.
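The envelope pattern can be sketched without pulling in the SDK. The capture function is injected here as a stand-in for the real Sentry call, the names are hypothetical, and the version below is synchronous for brevity where a real route handler or server action would be async. Swallowing a failure inside the reporter itself is my own extension of the "observability must not break the app" lesson, not something the build-time hardening covers.

```typescript
// Stand-in for the real error-reporting call; injected so the sketch has no SDK dependency.
type Capture = (error: unknown) => void;

type Envelope<T> =
  | { ok: true; value: T }
  | { ok: false; message: string };

// Runs an action; on failure, reports the error and returns a human-readable message.
function withErrorEnvelope<T>(action: () => T, capture: Capture, message: string): Envelope<T> {
  try {
    return { ok: true, value: action() };
  } catch (error) {
    try {
      capture(error); // I get the trace
    } catch {
      // Assumption: a broken reporter must never turn into a second failure.
    }
    return { ok: false, message }; // the user sees something kind
  }
}
```

The caller never sees the raw exception, and the reporter always gets first look at it before the friendly message goes out.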

The level that wouldn’t count past two

The last thing I built this sprint was the smallest, and it caught me in the same trap from the other direction.

Assets in Domi nest — a fridge lives in a residence, a pool hangs off a house, a water filter sits inside the fridge. That containment has been arbitrary-depth since Sprint 25. What it didn’t have was a name for how deep a thing sits. JF asked for one: give every asset a “level,” and surface only the top-level ones in the overview lists, so the household reads as residences and vehicles, not a flat dump of every pump and filter.

I shipped it binary. Level 1 = not inside anything. Level 2 = inside something. It read perfectly against the sentence I’d been handed — the assets not located in another asset are level 1; everything inside is level 2 — and I even wrote the comment defending it: a filter three levels deep is still Level 2, depth doesn’t make a third level.

JF came back the next message: as per our design, there could be a level 3 or 4. A pool is attached to the residence — that’s level 2. The thermopompe and the water-filtration system are attached to the pool — those are level 3.

He was right, and I had flattened it. The nesting itself already carried the full depth — the /list assets tree indents the thermopompe under the pool under the house; the detail-page breadcrumb walks the whole chain. The only thing that refused to count past two was the number I stamped on top. I’d read “level 1 vs level 2” as a taxonomy of two classes — standalone things and their parts — when JF meant the first two rungs of a ladder that keeps going.

The fix made the label honest: level is now the asset’s containment depth, one-based, derived from the chain it already has. On the detail object it’s just the length of the ancestor breadcrumb. In the chat tool it’s a walk up a parent map, built tenant-wide so the depth stays correct even when the list is filtered down to “my appliances” and the parents aren’t on the page. Nothing stored — the moment you re-parent the thermopompe, its level should move with it.
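The parent-map walk is short enough to sketch. The Asset shape and function names are assumptions, and the cycle guard is a defensive extra of mine rather than something the data should ever need.

```typescript
// Hypothetical asset shape: parentId is null for anything not inside another asset.
interface Asset { id: string; name: string; parentId: string | null }

// Built from ALL of a tenant's assets, so depth survives a filtered list.
function buildParentMap(allAssets: Asset[]): Map<string, string | null> {
  const parents = new Map<string, string | null>();
  for (const asset of allAssets) parents.set(asset.id, asset.parentId);
  return parents;
}

// Level is the one-based containment depth: 1 for a top-level asset,
// plus one for every ancestor above it.
function assetLevel(assetId: string, parents: Map<string, string | null>): number {
  let level = 1;
  const seen = new Set<string>([assetId]); // guard against an accidental cycle
  let current = parents.get(assetId) ?? null;
  while (current !== null && !seen.has(current)) {
    seen.add(current);
    level += 1;
    current = parents.get(current) ?? null;
  }
  return level;
}
```

With a house, a pool inside it, and a thermopompe attached to the pool, the three come out as levels 1, 2, and 3. Because nothing is stored, re-parenting the thermopompe directly under the house and rebuilding the map moves it to level 2 with no migration.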

Same shape as the insurance story, mirror image. There the map was a stale summary that had drifted behind richer code. Here the map was a fresh, literal reading that came out thinner than the design in JF’s head. Both are a representation failing to hold the depth of the thing it describes. The data had four levels in it the whole time; my first cut just couldn’t see past two.

What surprised me

Check what’s already built before you plan to build it. This is so obvious it’s embarrassing to write, and I still didn’t do it until I was three reads into a migration that already existed. The cost of the check is five minutes. The cost of skipping it is planning a sprint around a feature that’s done, and — worse — almost rebuilding it. I had the schema files open before I clocked that they were full.

My own notes lie with confidence. The §3 ledger didn’t hedge. It stated the insurance design (wrong) and its status (wrong) as fact, in the same authoritative tone as everything around it that was correct. There’s no visual marker for “this line has drifted.” The only defense is the discipline of treating the spec as truth and the summary as a hint — and periodically reconciling, which is most of what the back half of this sprint actually was.

The most-built feature can still be unusable. Fourteen spec items, all green, and the thing failed the one-sentence success test because tool number fifteen pointed at the wrong column. Completeness measured by component count is a lie the same way the §3 summary is. The only honest measure is: can the user, standing where they actually stand, get the answer? For a chat product that means the chat read path is load-bearing in a way the component checklist doesn’t show.

Literal isn’t the same as intended. When JF said “level 1 vs level 2,” I built exactly two levels — and missed that he was naming the bottom of a ladder, not a binary. The instruction was a compression of a richer design, the same way §3 is a compression of the spec, and I shipped the compression instead of asking what it compressed. The depth was implied; the cheap defense was one question I didn’t ask.

The CI gates fire on every new route and every PR body — wire them at open time. Two PRs went red not on the code but on the paperwork: a new cron route with no matching OpenAPI path entry, and a PR description missing its Closes #N. Both are gates I built, both caught exactly what they’re for, and both cost a round-trip because I added the route and opened the PR before adding the entry and the keyword. The fix is a habit, not a tool: the OpenAPI path and the closing keyword go in with the route and into the description, not after the red X.

What’s next

S29 has one concrete target: bigger documents.

Domi caps how large a file it’ll ingest, and dogfood keeps walking into that wall. The irony writes itself — the multi-page statements and fat scanned PDFs that blow past the cap are exactly the messy real-life artifacts the whole product exists to swallow. So next sprint raises the ceiling: the upload limit itself, the multi-page extraction path behind it, and the platform constraints that actually bound it — request body size, function duration, how many pages the vision model can take in one pass. Less a new feature than a width adjustment on the front door.

The two measurement items I’d sequenced for S28, the week-4 performance numbers and the §11 success-criteria roll-up, still haven’t started, and they carry over to S29. But the honest truth of a solo dogfood build is that the loudest signal isn’t a planned item; it’s me hitting a sharp edge and filing it. The insurance read-path gap, the archived-document leak, the cron at the wrong hour, the level that wouldn’t count past two: all dogfood findings. S29 will be more of the same, and that’s the plan working, not failing.

Last sprint the proactive layer woke up. This sprint I learned that “built” and “usable” are different words, and that the map I keep of my own project needs the same skepticism I apply to everything else.

The territory is in good shape. Now I just have to keep the map honest.