Sprint 6 — making the cash-out durable


The Sprint 5 post ended with “M6 is plumbing — making the cash-out durable.” That mostly happened. The chat that read from Sprint 3-4’s data and answered in real sentences now has a real history on the server side; refresh the browser, sign in from a second device, your conversation is there. The “Settings” link in the user-pill menu, which was conspicuously absent in Sprint 5.5 because there was no route to send it to, now goes somewhere. The Sprint-2 carry-forward where the app cheated and opened a privileged DB connection to look up which tenant a user belonged to — finally retired by a SECURITY DEFINER Postgres function. None of which is user-visible in the spectacular way Sprint 5’s chat surface was. That’s the point.

Two evenings + one half-day of dev time, six issues queued, six PRs merged. First sprint where multiple PRs ran in parallel against the same shared files — Drizzle schemas, the LLM abstraction’s lint rules, the RLS-touching lib/onboarding.ts. The merge friction shaped both the work and the lessons.

What shipped

  • chat_threads + chat_messages per Data Model §9. Two tables, tenant-isolated via RLS, with content_encrypted backed by libsodium secretbox. Generic encryptBytes / decryptBytes / encryptString / decryptString helpers at crypto/secretbox.ts — same nonce || ciphertext wire format as the existing filename encryption, chosen specifically so the future per-tenant DEK migration is a re-key in place rather than a format change. First real RLS-isolation integration test landed: create a tenant via the privileged connection, write through withTenant, then read the same row under a DIFFERENT tenant context using the unprivileged app_role — and assert zero rows. Earlier sprints relied on social trust that the policy actually fired; this sprint nailed it down.
  • SECURITY DEFINER app_get_user_tenant(text). The Sprint 2 carry-forward retires for the read path. The function runs as its owner (which has BYPASSRLS via owning the schema) but is callable from app_role via an explicit GRANT EXECUTE. Returns the user’s first membership’s tenant_id or NULL. Before: every user→tenant lookup opened a bootstrap_role connection that bypassed RLS entirely. After: only createTenantForUser (the very-first membership insert, where RLS WITH CHECK can’t yet be satisfied) and getTenantById still touch that connection. The blast radius shrank from “every tenant lookup” to “two narrowly-scoped paths.”
  • tenants.region → tenants.region_code. Naming drift between tenants.region (lowercase default ca-qc) and region_packs.region_code (uppercase CA-QC) was making my joins ugly. Renamed the column and uppercased existing rows in the same migration. Greeting time-band moved server-side as part of the same PR: the chat page resolves tenantId → tenant.region_code → regionPack.locale.timezone → greetingBand via Intl.DateTimeFormat and passes the band as a prop. The empty-state greeting drops its useEffect + suppressHydrationWarning workaround, because both renders now produce the same string.
  • Chat persistence wiring. Single thread per user for V1 (the multi-thread rail is User Flow & Nav §5.2 work, deferred). The chat page server-component fetches the active thread, decrypts, passes as initialMessages. The /api/chat route persists each user message before the model call and the assistant’s final text via the AI SDK’s onFinish callback. Tool-call/tool-result messages aren’t persisted — they’re turn-local; the model recomputes them from the user/assistant text on each turn. localStorage domi:chat:messages:v1 is gone; pending-input (unsent input across session-expiry round-trip) stays.
  • /[locale]/settings with two sections per Settings IA §3 — Profile (display_name editable, email read-only) and Language & locale (en / fr-CA / use-route-locale radio). The user-pill “Settings” link reinstated. Other Settings sections (Chat preferences, Notifications, API tokens, all of §4 Household) deferred — V1 dogfood doesn’t need them yet, and the layout primitives shipped here let later sections drop in without surface-level rework. Server actions write users.name and users.language_pref.
  • ESLint banning @anthropic-ai/sdk outside the adapter. CLAUDE.md §6 says provider SDKs are only allowed inside the LLM abstraction. Until now the rule was enforced socially; this PR makes ESLint enforce it. no-restricted-imports at the root flat-config; an override block exempts packages/shared/src/llm/adapters/**. Verified by adding a throwaway probe file in apps/web/src/lib/, which ESLint flagged with the right error message.
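
For the lint rule in the last bullet, a flat-config sketch might look roughly like the following. The rule name and the exempted path come from the bullet; the error message text, the subpath pattern, and the variable names are my own assumptions, not the repo's actual config.

```typescript
// eslint.config.ts (sketch; names and message text are hypothetical)

// Assumed wording; the real message in the repo may differ.
const message =
  "Provider SDKs are only allowed inside the LLM abstraction (CLAUDE.md §6).";

// Root block: ban the provider SDK everywhere by default.
const restrictProviderSdk = {
  rules: {
    "no-restricted-imports": [
      "error",
      {
        // Exact package import.
        paths: [{ name: "@anthropic-ai/sdk", message }],
        // Subpath imports (assumption: these should be banned too).
        patterns: [{ group: ["@anthropic-ai/sdk/*"], message }],
      },
    ],
  },
};

// Override block: later entries win in flat config, so this one
// switches the rule off for the adapter directory only.
const allowInAdapters = {
  files: ["packages/shared/src/llm/adapters/**"],
  rules: { "no-restricted-imports": "off" },
};

const config = [restrictProviderSdk, allowInAdapters];

export default config;
```

Order matters here: the exemption has to come after the ban, because flat config merges matching blocks top to bottom.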

That’s M6 — Settings + persistence — closed.

What I had to fix mid-sprint

The merge friction was the new thing this sprint, and it bit three different ways.

Drizzle parallel-branch snapshot collision. Two branches (feat/m6-security-definer-tenant-lookup and feat/m6-tenant-region-code) both branched off main with #75 as their parent. Each ran drizzle-kit generate independently, producing 0007_snapshot.json with prevId pointing at 0006. After both merged, the second one (renamed to 0008 to dodge the file-name collision) still pointed at 0006 in its prevId, and drizzle-kit’s lineage check failed with “snapshots pointing to a parent which is a collision.” Fixed by editing 0008_snapshot.json’s prevId to chain through 0007_snapshot.json’s id. Also had to fix 0008’s snapshot column entry that still showed region (pre-rename) instead of region_code: --custom migrations don’t re-introspect the schema, so the snapshot stays stale. Worth knowing for any future workspace doing parallel feature branches against the same Drizzle schema: your snapshots form a single linear chain, and parallel branches mean the second-to-merge has to rewrite its prevId.
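
The collision itself is easy to detect before drizzle-kit complains: two snapshots that name the same prevId are siblings, not a chain. A minimal sketch of that check, assuming each snapshot has been reduced to its file name plus the id and prevId fields mentioned above. SnapshotMeta and findParentCollisions are hypothetical names, and reading the JSON files off disk is left out.

```typescript
// Hypothetical reduced view of a drizzle-kit snapshot file.
interface SnapshotMeta {
  file: string;   // e.g. "0007_snapshot.json"
  id: string;     // this snapshot's own id
  prevId: string; // id of the snapshot it claims as parent
}

// Returns every group of snapshot files that share the same parent.
// An empty result means the snapshots form a single linear chain.
function findParentCollisions(snapshots: SnapshotMeta[]): string[][] {
  const filesByParent: Record<string, string[]> = {};
  for (const snap of snapshots) {
    const siblings = filesByParent[snap.prevId] || [];
    siblings.push(snap.file);
    filesByParent[snap.prevId] = siblings;
  }
  return Object.keys(filesByParent)
    .map((parentId) => filesByParent[parentId])
    .filter((files) => files.length > 1);
}
```

In the situation described here, 0007 and 0008 would both come back in one group until 0008's prevId is rewritten to 0007's id.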

--custom migration with empty body got marked applied. When you generate a custom migration, drizzle creates an empty file with a placeholder comment. Running db:migrate on that empty file marks the migration as applied in __drizzle_migrations. Filling in the SQL afterwards and re-running db:migrate doesn’t re-apply, because drizzle’s tracking thinks the migration is already done. Fix on staging was to apply the SQL directly via a one-off script (CREATE OR REPLACE FUNCTION is idempotent, so safe). Production deploys will run all migrations from scratch including the now-real 0007 SQL, so production isn’t affected. Going forward: write the SQL into the file before running db:migrate for the first time. Obvious in retrospect; not where I expected the foot-gun to be.
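
One cheap guard is to refuse to migrate while any migration file contains nothing but comments. A rough sketch of the check, assuming the file contents are already loaded as a string; hasExecutableSql is a hypothetical helper, not something drizzle provides, and the comment stripping is deliberately naive.

```typescript
// Returns true if the migration text contains anything besides
// SQL comments and whitespace.
// Naive on purpose: it does not parse string literals, so a "--"
// inside a quoted string is treated as a comment start. That can only
// hide the tail of a line that already has SQL on it, so it never
// makes a real migration look empty.
function hasExecutableSql(sql: string): boolean {
  // Drop /* ... */ block comments, including multi-line ones.
  const withoutBlockComments = sql.replace(/\/\*[\s\S]*?\*\//g, "");
  // Drop "--" line comments through end of line.
  const withoutLineComments = withoutBlockComments.replace(/--.*$/gm, "");
  return withoutLineComments.trim().length > 0;
}
```

Wired into a pre-migrate script, this would have stopped the empty 0007 from being recorded as applied.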

Main-merge regression on PR #83. When main was merged into the feature branch (so the branch could pick up #82’s already-landed work), an import that #82 had removed (eq from drizzle-orm) was needed by #83’s new getTenantById. The merge resolved cleanly at the textual level — git saw no conflict — but produced broken TypeScript at the merged state. CI caught it. This is the same root cause as Sprint 5’s PR #57 → #58 stacking gotcha, just in a different shape: parallel branches that touch shared files need a final post-merge typecheck before pushing, every time. Adding it to the post-merge mental checklist along with “verify the PR is still open before pushing follow-ups” — that one was supposed to stick last sprint, didn’t.

What surprised me

The first real RLS isolation test felt larger than it looks on paper. Earlier sprints had RLS policies on every tenant-scoped table, and they had been present — meaning, you could read them with \d+ in psql and see the USING clause. They had not been proven, in the sense of “I have a test that shows a different tenant context returns zero rows on a row that exists.” The chat round-trip integration test does that for the first time. The pattern is reusable: create a tenant via the privileged connection (so the WITH CHECK doesn’t bite), seed under withTenant, read under a DIFFERENT withTenant, assert empty. I’m going to write this test for every new tenant-scoped table from now on, because the cost is one Vitest file and the confidence is large.
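
The reusable part of that pattern can be pulled into one helper. This is a sketch under assumptions: the withTenant signature is my guess at its shape, and checkTenantIsolation and IsolationCase are hypothetical names. It also adds one step the description above only implies: confirming the owning tenant can see the row, so that "zero rows" for the other tenant is not vacuously true.

```typescript
// Assumed shape of withTenant: run `fn` with the tenant context set.
type WithTenant<Tx> = <T>(
  tenantId: string,
  fn: (tx: Tx) => Promise<T>
) => Promise<T>;

interface IsolationCase<Tx, Row> {
  withTenant: WithTenant<Tx>;
  ownerTenantId: string; // tenant that writes the row
  otherTenantId: string; // tenant that must not see it
  seed: (tx: Tx) => Promise<void>;
  read: (tx: Tx) => Promise<Row[]>;
}

// Seeds under one tenant, then reads under both.
// Throws if the row is missing for its owner or visible to the other.
async function checkTenantIsolation<Tx, Row>(
  c: IsolationCase<Tx, Row>
): Promise<void> {
  await c.withTenant(c.ownerTenantId, c.seed);

  const visibleToOwner = await c.withTenant(c.ownerTenantId, c.read);
  if (visibleToOwner.length === 0) {
    throw new Error("seeded row is not visible to its own tenant");
  }

  const visibleToOther = await c.withTenant(c.otherTenantId, c.read);
  if (visibleToOther.length !== 0) {
    throw new Error(
      `isolation leak: other tenant saw ${visibleToOther.length} row(s)`
    );
  }
}
```

In a real test file the two tenants would be created through the privileged connection first, and seed and read would be the table-specific queries run through the unprivileged app_role connection.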

The bootstrap-DB cheat shrinking is a relief. Sprint 2’s getBootstrapDb() was openly described in its own comment as “we cheat” — open a separate connection as the database superuser and let RLS not apply. It worked, it had been clearly scoped, and it had also been visible to me every time I read tenant-bootstrap.ts for any reason. Watching the read-path callers refactor to the SECURITY DEFINER function, and the file’s leading comment go from “This is a placeholder.” to “Used only by createTenantForUser for the very-first membership insert” — small thing, but the kind of thing where you stop seeing the codebase as a pile of “things I’ll fix when I get to them” and start seeing it as a thing that’s been getting better.

Server-side greeting was the right shape, not just better data. Sprint 5.5 had moved the greeting time band from “browser-local time” to “browser-local time but with the right interface text,” via a useEffect that ran post-mount and then setBand(...) to the right value. SSR rendered an indeterminate state and the first client render replaced it. The suppressHydrationWarning was holding back React’s mismatch warning. It worked, and it was the right call for V1.5 where the per-tenant region didn’t exist. This sprint, with the per-tenant region_code field landing alongside the Intl.DateTimeFormat(timeZone) resolution, the band could be computed once on the server and passed in as a prop. The useEffect disappeared. The suppressHydrationWarning disappeared. The greeting now renders with the right text on first paint, no flicker. The lesson I’m taking is: when you find yourself writing a workaround whose name starts with “suppress,” there’s usually a way to delete it once one upstream input changes shape, and it’s worth keeping a mental note of which input.
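
The server-side computation is small enough to sketch. Everything specific here is an assumption for illustration: the function names, the four band names, and the hour cut-offs are mine, and the real app resolves the timezone from the region pack rather than taking it as an argument.

```typescript
// Hypothetical band names; the post does not list the real ones.
type GreetingBand = "morning" | "afternoon" | "evening" | "night";

// Hour of day (0-23) for `now` as seen in the given IANA timezone.
function hourInTimeZone(now: Date, timeZone: string): number {
  const formatted = new Intl.DateTimeFormat("en-US", {
    timeZone,
    hour: "numeric",
    hour12: false,
  }).format(now);
  // Some engines render midnight as "24" with hour12: false,
  // so normalise with a modulo.
  return parseInt(formatted, 10) % 24;
}

// Cut-offs are assumed: 05-11 morning, 12-17 afternoon,
// 18-21 evening, everything else night.
function greetingBandFor(now: Date, timeZone: string): GreetingBand {
  const hour = hourInTimeZone(now, timeZone);
  if (hour >= 5 && hour < 12) return "morning";
  if (hour >= 12 && hour < 18) return "afternoon";
  if (hour >= 18 && hour < 22) return "evening";
  return "night";
}
```

Because the result depends only on the request time and the tenant's timezone, the server render and the first client render receive the same prop, which is what makes the hydration workaround unnecessary.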

The fix-by-direct-SQL-on-staging is a tiny dirt-stain on the migrations-are-clean reputation. When the empty-body 0007 got marked applied, I could have rolled back the migration on staging and re-run it. I instead ran the SQL directly through a one-off script. That’s defensible — CREATE OR REPLACE FUNCTION is idempotent — but it means staging’s __drizzle_migrations history doesn’t perfectly match the SQL that’s actually live. Production will be fine because it’ll run all migrations from a clean slate. But “the migrations chain is the truth” was a thing I’d been able to say without caveats up to this sprint. Now there’s a footnote.

Where Sprint 7 picks up

M7 — chat-to-plan escalation (Dev Plan §8 Phase 7). The cheap chat role escalates to the premium reasoning role when the user asks for something that needs deeper planning (“build me a maintenance plan for the new car”). Tool-use is the preferred path; classify+route is the fallback. The provenance JSON has to record which role(s) handled the turn so cost telemetry stays accurate. This is the first sprint that exercises the LLM abstraction’s role layer in a non-trivial way — Sprints 1-6 only used the chat role with one model. M7 forces the abstraction to actually be an abstraction.

The Sprint 7 backlog gets cut at sprint planning Monday. Two things I want to keep in mind from this sprint as I scope it:

  1. If multiple PRs need to touch the same Drizzle schema, plan for them to land sequentially or do the second one as a --custom migration that knows the first one happened.
  2. Write the SQL into --custom migration files before the first db:migrate. Always.