2026-05-12

Sprint 12 — the foundation was imaginary

CLAUDE.md §6, in the “Critical conventions (do not deviate without good reason)” section, has carried this line since the working memory was first drafted:

Audit log first, mutation second. Every mutation writes to audit.events before the body executes.

I read it every session. I treated it as load-bearing. I treated it as something I’d been doing. I had not been doing it. For 11 sprints, I’d been adding mutation sites — assets, tasks, members, documents, gmail connections, chat threads, chat messages, llm.calls, crm opt-ins — and none of them had been writing audit rows. The convention was a promise the codebase had never made good on. The audit.events table didn’t exist. No write helper. No schema migration. No mutation in the entire codebase referenced anything in the audit.* schema namespace.

I found this on Sprint 12’s planning day. The original S12 primary was supposed to be the audit-log search UI — a paginated, filtered, RLS-scoped read against audit.tenant_events for owners to inspect their household’s mutation history. I sat down to scope it, ran a quick grep -rln 'audit\.events' packages/shared/src apps/web/src, got zero results, and stopped. The UI was lipstick on a system that didn’t exist.

The honest options were ugly. Build the infrastructure first and defer the UI (Option A), build the infra + UI in one sprint and accept that one would slip (Option B), or push back entirely and replan (Option C). I went with A. Built the schema, built the wrapper, wired six mutation sites. Three stacked PRs. The search UI became the obvious S13 primary on top of the now-real foundation.

What shipped

Audit schema migration (0015_audit_schema.sql). CREATE SCHEMA audit. audit.events as a partitioned table with monthly RANGE partitions on occurred_at and a composite primary key (id, occurred_at) — same shape as llm.calls from Sprint 8. Ten monthly partitions seeded 2026-02 through 2026-11. Three indexes: (tenant_id, occurred_at DESC) for the time-ordered tenant view, (target_kind, target_id) WHERE target_id IS NOT NULL for target lookups, (action) for action filters. The audit.tenant_events VIEW with WHERE tenant_id = current_setting('app.tenant_id', true)::uuid AND redacted = FALSE for RLS-style read scoping (the view does the filtering; the underlying table has no per-row policy because audit must survive tenant deletion). GRANT USAGE on the schema + INSERT on the table + SELECT on the view to the existing app_role. The spec’s “separate app_audit_writer role with INSERT-only” target stays a V1.5 hardening item. Drizzle TS schema in packages/shared/src/db/schema/audit.ts for column-level type safety, intentionally NOT in drizzle.config.ts so the auto-diff target stays clean — same pattern as llm.calls.
recordAuditEvent(db, args) + withAudit(args, body) at packages/shared/src/audit/. The recorder is a pure write, the same shape as recordLlmCall from Sprint 8. The wrapper is the operationally enforceable form of the convention: it takes the audit descriptor plus a body function, commits the audit row first, then runs the body. Because the body lives inside the higher-order call, dropping the wrapper means restructuring the function, not deleting a single statement, so a missing audit is visible in review instead of being a silently missed line. That’s the difference between “remember to audit” (the convention I’d silently violated for 11 sprints) and “the audit happens or the mutation doesn’t run” (the convention I want now). A typed AuditAction union covers 26 V1 actions in <target_kind>.<verb> shape, with no string-typed fallback at the call site. Four real-DB integration tests cover the round-trip, the ordering invariant, the failure mode (audit survives a body throw), and RLS-isolation through the view.
Six mutation sites wired (PR #159). onboarding.createTenantForUser (tenant.create), four Settings server actions (profile, language, chat-prefs, disconnectGmail), gmail-oauth.saveGmailConnection (gmail.connect), email-ingestion.ingestEmail (gmail.message.ingest). The Gmail-ingest one uses actorKind: "connector" + source: "connector_gmail" since it’s triggered by Pub/Sub, not direct user action; the actorUserId is the user whose Gmail the row came from. The remaining sites (about seven, spread across gmail-watch.ts, upload-and-extract.ts, extraction.ts, auto-write.ts, chat-thread.appendMessage, and predict/engine.ts) are tracked in follow-up #158. Some are mechanical wraps; chat-thread.appendMessage needs an actorUserId arg threaded through the interface because it’s a low-level helper called from many places with different actors.
WCAG P1 + P2 punch list (PR #160). P1-1 contrast: every text-neutral-600 (#525252, ~3.8:1 on bg-neutral-900) → text-neutral-500 (#737373, ~5.6:1, passes 4.5:1 AA). Bumped placeholders too since they’re informational text per WCAG. Eight files. P1-2 status messages: both errorMessage <p> occurrences in the signin form gain role="alert". P2: aria-describedby for form hints; user-pill auto-focuses first menuitem on open and returns focus to the trigger on Escape; language-submenu role="presentation" → role="group" aria-labelledby. Two P2 items turned out to be code-review false positives (more on that below).
WCAG verification plan + AI-usage table-to-card collapse (PR #161). The verification plan lives in docs/wcag-verification-plan-s12.md, separate from the audit doc so the open PR’s Errata addition doesn’t conflict. It contains the plan plus a run-log template; the actual axe DevTools + keyboard + VoiceOver run is pending JF’s browser session. The table-to-card collapse is on the AI-usage panel’s per-role grid (role · cost · model): below the sm: breakpoint it stacks; above it, it keeps the original 3-column grid. The Tailwind sm:contents trick flattens the wrapper’s spans into direct grid children above the breakpoint, so the mobile-first DOM structure stays unchanged.
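
The audit-first ordering described for withAudit above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code in packages/shared/src/audit/: the AuditSink interface, createMemorySink, the descriptor field list, and the three-member AuditAction union are stand-ins invented for the example (the real union covers 26 actions and the real recorder writes to audit.events).

```typescript
// Hypothetical subset of the action union; the real one covers 26 V1 actions.
type AuditAction = "tenant.create" | "gmail.connect" | "gmail.message.ingest";

// Hypothetical descriptor fields, named after the ones mentioned in this post.
interface AuditDescriptor {
  action: AuditAction;
  actorKind: "user" | "connector";
  actorUserId: string;
  tenantId: string;
}

// Stand-in for the database handle (assumption; the real recorder inserts into audit.events).
interface AuditSink {
  insertEvent(row: AuditDescriptor & { occurredAt: Date }): Promise<void>;
}

interface WithAuditArgs extends AuditDescriptor {
  db: AuditSink;
}

// Pure write: one descriptor in, one audit row out.
async function recordAuditEvent(db: AuditSink, args: AuditDescriptor): Promise<void> {
  await db.insertEvent({ ...args, occurredAt: new Date() });
}

// Audit first, mutation second. If the audit write rejects, the body never runs.
// If the body throws afterwards, the audit row has already been written and stays.
async function withAudit<T>(args: WithAuditArgs, body: () => Promise<T>): Promise<T> {
  const { db, ...descriptor } = args;
  await recordAuditEvent(db, descriptor);
  return body();
}

// In-memory sink for trying the ordering without a database.
function createMemorySink(): AuditSink & { rows: AuditDescriptor[] } {
  const rows: AuditDescriptor[] = [];
  return {
    rows,
    async insertEvent(row) {
      rows.push(row);
    },
  };
}
```

A call site then nests the mutation inside the wrapper, for example await withAudit({ db, action: "tenant.create", actorKind: "user", actorUserId, tenantId }, async () => { /* mutation */ }). Because the action field is a closed union, a misspelled action string is a type error at the call site.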

What surprised me

The convention was load-bearing in my head, not in the code. Every sprint I read CLAUDE.md §6. Every sprint I added mutation sites without auditing them. The text in §6 was supposed to be a rule the implementations followed; instead it was a description of what the implementations were supposed to do, which is not the same thing. A written rule with no automated enforcement and no human reviewing each mutation against it will be silently violated. The fix is structural: the withAudit wrapper makes the audit happen at the call site or not at all, and the rule becomes “you must wrap in withAudit to mutate,” which is enforceable by code review because the structure is visible. Same shape as Sprint 11’s Closes #N convention: promote the rule from mental to written, then from written to operationally enforced.

The INSERT-only grant on audit.events did not work. I built the audit schema with GRANT INSERT ON audit.events TO app_role and nothing else, per spec: “INSERT-only role for the app, owner-only read views.” That’s the spec’s invariant. It is also a configuration that failed as soon as the recorder ran against it. The test failure was loud (permission denied for table events), and the diagnosis was a quick pg_class.relacl comparison: llm.calls had app_role=ar (INSERT + SELECT), audit.events had app_role=a (INSERT only). My explanation at the time was that routing an INSERT into a partitioned table needs SELECT on the parent to read partition descriptors. That is probably not the real mechanism: PostgreSQL documents INSERT privilege alone as sufficient for a plain insert, and requires SELECT only when the statement also reads columns, as INSERT ... RETURNING and ON CONFLICT DO UPDATE do. So the more likely trigger is the shape of the statement the recorder emits, which is worth confirming before the V1.5 work. Either way, the llm.calls table worked because someone had been generous with grants for a future search UI; the audit.events table broke because the grant was strictly correct per spec. Bonus migration 0016 added the SELECT. The “INSERT-only at the role level” target moves to V1.5 via a separate app_audit_writer role. The spec’s design assumption was that a dedicated writer role would be configured; doing INSERT-only on the shared app_role doesn’t compose.
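
The relacl comparison comes down to reading PostgreSQL's one-letter privilege abbreviations. As a rough sketch, the helper below decodes an ACL entry and reports which privileges one table's grant has that another lacks. The function names parseAclItem and missingPrivileges are hypothetical; the letter mapping is the table-level subset of PostgreSQL's documented abbreviations, and the input strings are the two values quoted above.

```typescript
// Table-level subset of PostgreSQL's documented privilege abbreviations.
const TABLE_PRIVILEGE_LETTERS: Record<string, string> = {
  a: "INSERT", // "append"
  r: "SELECT", // "read"
  w: "UPDATE", // "write"
  d: "DELETE",
  D: "TRUNCATE",
  x: "REFERENCES",
  t: "TRIGGER",
};

interface AclEntry {
  grantee: string;
  privileges: string[];
}

// Parses "grantee=letters" or "grantee=letters/grantor" (hypothetical helper).
// An empty grantee means PUBLIC; "*" marks a grant option and is skipped here.
function parseAclItem(item: string): AclEntry {
  const match = /^([^=]*)=([^/]*)(?:\/.*)?$/.exec(item);
  if (!match) throw new Error(`not an aclitem: ${item}`);
  const grantee = match[1] === "" ? "PUBLIC" : match[1];
  const privileges = match[2]
    .split("")
    .filter((ch) => ch !== "*")
    .map((ch) => TABLE_PRIVILEGE_LETTERS[ch] ?? `UNKNOWN(${ch})`);
  return { grantee, privileges };
}

// Privileges present in the reference entry but absent from the candidate.
function missingPrivileges(reference: string, candidate: string): string[] {
  const have = new Set(parseAclItem(candidate).privileges);
  return parseAclItem(reference).privileges.filter((p) => !have.has(p));
}

// llm.calls grant vs. audit.events grant, as quoted in the post.
console.log(missingPrivileges("app_role=ar", "app_role=a"));
```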

Two of the five P2 WCAG items didn’t exist. The S11 audit was code-pattern review — “places where text-neutral-600 appears, places where errors are rendered without role=alert, places where styled-as-heading elements aren’t <hN>.” It found 2 P1 + 5 P2 items. The actual P2 fix pass: P2-4 (“chat tool-call parts leak to aria-live”) didn’t exist because extractText already filtered m.parts to type === "text" only — I’d missed that line in the audit. P2-5 (“Settings sub-section headings as styled <p>”) didn’t exist because every section heading in _settings-form.tsx was already an <h2>. The audit found 7 things; 5 were real, 2 were noise. Errata logged. Code-pattern audits over-find. Browser-side verification under-finds. The right cost model is “audit broad, verify narrow,” not “audit once.” The actual run of the verification plan is pending JF’s browser session.
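
For context on why P2-4 was a false positive, here is a minimal sketch of the kind of filter the paragraph describes. The MessagePart and ChatMessage shapes are assumptions made for illustration (the post only says that m.parts entries carry a type field), and this extractText is a stand-in, not the function in the codebase.

```typescript
// Hypothetical message shape; only the `type === "text"` filter comes from the post.
type MessagePart =
  | { type: "text"; text: string }
  | { type: "tool-call"; toolName: string; args: unknown };

interface ChatMessage {
  parts: MessagePart[];
}

// Keeps only text parts, so tool-call parts never reach whatever renders this
// string (for example an aria-live region).
function extractText(m: ChatMessage): string {
  return m.parts
    .filter((p): p is Extract<MessagePart, { type: "text" }> => p.type === "text")
    .map((p) => p.text)
    .join("");
}
```

With a filter like this already in place, a pattern-based audit that only looks at where m.parts is rendered will flag a leak that cannot happen.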

The sm:contents Tailwind trick is genuinely useful. Mobile-first responsive grids usually mean “this is a grid at all breakpoints, just with different column counts.” Sometimes you want “this is a grid only above a breakpoint, and stacks completely below.” The pattern: flex flex-col gap-1 sm:grid sm:grid-cols-[1fr_auto_auto] sm:gap-x-4 on the parent. But the children include a wrapper span that groups two columns into one for mobile readability. Without sm:contents, that wrapper stays in the DOM at desktop sizes and breaks the grid math. With sm:contents, the wrapper “becomes its children” above the breakpoint and the grid reads the spans as direct children. Documented inline in the AI-usage panel for future responsive table-to-card layouts. Tailwind utility classes are sometimes the answer to layout problems that look like they need CSS Grid template areas.
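
The pattern is easier to see as markup. The sketch below builds one row of the per-role grid as an HTML string. The parent classes are the ones quoted above; everything else is an assumption for illustration, including the renderRoleRow name, the choice of cost and model as the two grouped columns, and the flex gap-2 classes on the wrapper.

```typescript
interface RoleUsage {
  role: string;
  cost: string;
  model: string;
}

// Parent: stacked flex column on mobile, 3-column grid from the sm: breakpoint up.
const ROW_CLASSES = "flex flex-col gap-1 sm:grid sm:grid-cols-[1fr_auto_auto] sm:gap-x-4";

// Wrapper: groups two values on one line on mobile. From sm: up, display: contents
// removes its box, so its two spans become direct grid items.
// (The mobile classes here are an assumption; only sm:contents is from the post.)
const GROUP_CLASSES = "flex gap-2 sm:contents";

function escapeHtml(s: string): string {
  return s.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

// Hypothetical renderer: same DOM at every breakpoint, only the CSS changes.
function renderRoleRow(u: RoleUsage): string {
  return [
    `<div class="${ROW_CLASSES}">`,
    `  <span>${escapeHtml(u.role)}</span>`,
    `  <span class="${GROUP_CLASSES}">`,
    `    <span>${escapeHtml(u.cost)}</span>`,
    `    <span>${escapeHtml(u.model)}</span>`,
    `  </span>`,
    `</div>`,
  ].join("\n");
}
```

Above the breakpoint the grid sees three items (role, cost, model) even though the DOM has two children, which is what keeps the 1fr auto auto template lined up.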

Where Sprint 13 picks up

The deferred S12 primary becomes the S13 primary: audit-log search UI (#147). The infrastructure that wasn’t there is now there. Owner-only surface; RLS-scoped query against audit.tenant_events; search-by-actor + filter-by-action + paginated results. Also lays the paginated-list pattern groundwork for future “list X” surfaces (V1.5 strategy picker, document browser, etc.) — Domi hasn’t had a real paginated list yet, just full-table dumps.

Plus three smaller items that want to ride along:

WCAG browser-verification run (#150). Plan written; the actual axe DevTools + keyboard-only walkthrough + VoiceOver spot check happens at JF’s browser. ~30-60 min if no surprises. Findings graduate via the established P1/P2/P3 process.
Remaining withAudit wiring (#158). ~7 sites. chat-thread.appendMessage needs an actorUserId interface change first; the rest are mechanical wraps. 2-3 hours of careful work spread across a few PRs.
Knowledge-graph viz spec review (#152). Open questions about lib choice, render target, interaction patterns, V1 dogfood scale. Per the S12 planning decision, any graph-viz implementation sprint stays blocked on this issue.

Sprint 8 carry-forwards still alive: Gemini + OpenRouter adapters cuttable per Dev Plan §10 (S18 decision), parent-child cost grouping pending chat-through-adapter routing, app_audit_writer dedicated role as V1.5 hardening, audit.events 7-year retention job as V1.5 maintenance.

The throughline I’m taking from Sprint 12 is the one Sprint 11 set up but didn’t quite close: written rules need operational scaffolding. CLAUDE.md is full of conventions. Some are operationally enforced (the eslint rule banning @anthropic-ai/sdk outside the abstraction; RLS at the database layer). Some are written but unenforced (the audit rule, until this sprint). The unenforced ones drift silently: eleven sprints of audit promises, and not one PR was rejected for failing to write audit rows. The rule is real when the code stops working without it, not when the doc says it should. That’s now what withAudit does. The list of remaining unenforced conventions is shorter than it was, but it’s not zero, and the next sprint that touches infrastructure should ask: is this a rule the code enforces, or a rule I expect myself to remember?