Sprint 9 — the inbox surfaces the gaps


Sprint 8 ended on the line “sometimes the sign that an abstraction is right is that the work to use it for the second time is faster than the work to build it for the first.” Sprint 9 ended on a different line: using a thing tells you what’s missing in a way no spec can.

The plan was M9 — Email & Calendar connector per Dev Plan §8 Phase 9. Gmail watch via Cloud Pub/Sub, locked in spec v0.9 as push notifications not polling, so a new bill or insurance renewal lands on a documents row without me clicking anything. Five issues queued: OAuth foundation, the email-ingestion helper with the §5.12 filters, one-shot import via a “Sync now” button in Settings, the passive watch via Pub/Sub, and the token refresh + 404 channel re-auth machinery. Two evenings, all five merged. The connector demo per the spec works.

What I didn’t plan was what happened next. Within minutes of connecting my own Gmail, I forwarded myself a Hydro-Québec bill and the connector silently skipped it. The spec said “from an allowlisted sender OR contains an asset/member-name keyword match.” Forwarded emails fail both: the From: header becomes my own address, and the keyword match looks for tenant-asset display names, none of which appeared in the body. The bug wasn’t a spec violation. The behavior was correct per the letter of §5.12. It was just, you know, useless.

That cascaded. Once I was looking at the live catalog with one user (me), 28 hand-picked Quebec consumer entries felt arbitrary. Once I was using slash commands /think and /quick in chat, I wanted to type /list providers and see what Domi was actually watching for. Once I’d reloaded the chat page, I wanted the latest answer in view, not the first. Three scope expansions landed mid-sprint at my own ask, and the dev plan got a v0.3 bump to capture the V1-launch surface — Terms, Privacy, Beta badge, CRM opt-in — that I’d been carrying as a vague mental note.

What shipped

  • Gmail OAuth foundation. OAuth2 client wrapped in gmail-oauth.ts. The consent URL builder takes a signed state, the auth-code → tokens exchange returns a typed ExchangeResult, and saveGmailConnection encrypts both the access token and the refresh token using the libsodium secretbox key already in service for filename + chat-message envelope encryption (V1 single key per Issue #76). Settings → Connectors gets a Gmail card with Connect / Disconnect actions. The gmail_connections table has the same RLS shape as everything else — tenant_id = current_setting('app.tenant_id', true)::uuid.
  • Email-ingestion helper + filter rules + idempotency. ingestEmail() is the single entry point both the one-shot import and the passive watch call into; the §5.12 filters (Promotions/Social/Spam/Trash exclusion, allowlist, tenant-keyword match) live in lib/email-ingestion.ts as pure helpers; idempotency rides a partial unique index on (tenant_id, external_message_id) WHERE external_message_id IS NOT NULL. On accept the email body uploads to R2 and a documents row inserts at status='uploaded' with mime_type='text/plain'. There is no extraction in V1; the row sits in the existing M3 documents list and the auto-extraction-from-email path is a deliberate later sprint.
  • One-shot Gmail import via “Sync now”. A button in Settings → Connectors that walks users.messages.list + users.messages.get through the inbox, dedupes on Gmail message id, and ingests the matches. Returns a typed summary { totalScanned, ingested, skippedCategory, skippedNoKeywordMatch, skippedDuplicate, errors } so the UI can surface what the connector decided about each message. Useful both as a diagnostic and as a backfill once you connect for the first time.
  • Pub/Sub passive watch. Per-tenant Pub/Sub topic + subscription, a webhook at /api/connectors/gmail/pubsub that verifies Google’s OIDC JWT (using google-auth-library against a 5-minute replay window) before enqueueing an extraction job, and a historyId watermark on gmail_connections so we don’t re-process old messages when a watch is re-established. Re-establishment is idempotent when the channel id rotates: Google rotates them every ~7 days, and the 404 on a stale users.history.list triggers a fresh activateGmailWatch.
  • Token refresh + 404 channel re-auth. withGmailAuth(connection, fn) wraps every Gmail API call. 401 → call refreshGmailAccessToken, retry once. 404 on users.history.list → assume the watch channel rotated, re-activate, persist the new channelId + historyId. gmail_connections.lastError surfaces in Settings, and when the refresh token itself is revoked the connection transitions to status='failed' so the user gets a Reconnect button.
  • OAuth scope fix. Live OAuth callback was failing with Request is missing required authentication credential. Expected OAuth 2 access token. The scope I was requesting was gmail.readonly, but exchangeGmailCode was calling oauth2.userinfo.get() to fetch the connected email address — and that endpoint requires userinfo.email, which I wasn’t requesting. Fix: gmail.users.getProfile({ userId: "me" }) returns emailAddress and is covered by gmail.readonly. PR #120, one file, six lines.
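
The retry rules from the token-refresh bullet can be sketched roughly like this. It is a minimal sketch, not the code in the repo: the deps and opts parameters, the GmailConnection shape, the httpStatusOf helper, and the decision to retry after re-activating the watch are all assumptions made for illustration. Only withGmailAuth, refreshGmailAccessToken, and activateGmailWatch are names from the sprint, and persisting the new channelId + historyId is left out.

```typescript
// Hypothetical connection shape; the real gmail_connections row has more columns.
interface GmailConnection {
  accessToken: string;
  channelId: string | null;
  historyId: string | null;
}

// Injected for the sketch so it runs without Google or a database (assumption).
interface GmailAuthDeps {
  refreshGmailAccessToken(conn: GmailConnection): Promise<string>;
  activateGmailWatch(conn: GmailConnection): Promise<{ channelId: string; historyId: string }>;
}

// Assumes API failures carry a numeric `status` property (hypothetical helper).
function httpStatusOf(err: unknown): number | undefined {
  if (typeof err === "object" && err !== null && "status" in err) {
    const status = (err as { status: unknown }).status;
    return typeof status === "number" ? status : undefined;
  }
  return undefined;
}

async function withGmailAuth<T>(
  conn: GmailConnection,
  fn: (conn: GmailConnection) => Promise<T>,
  deps: GmailAuthDeps,
  opts: { isHistoryList?: boolean } = {},
): Promise<T> {
  try {
    return await fn(conn);
  } catch (err) {
    const status = httpStatusOf(err);
    if (status === 401) {
      // Expired access token: refresh, then retry exactly once.
      const accessToken = await deps.refreshGmailAccessToken(conn);
      return await fn({ ...conn, accessToken });
    }
    if (status === 404 && opts.isHistoryList === true) {
      // Stale watch channel: re-activate and retry with the new watermark.
      const watch = await deps.activateGmailWatch(conn);
      return await fn({ ...conn, channelId: watch.channelId, historyId: watch.historyId });
    }
    throw err; // Anything else, including a second failure, reaches the caller.
  }
}
```

Because the retry happens inside the catch block, a second 401 is not caught again. That is what keeps "retry once" honest.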

That’s the planned + caught arc — five PRs and one runtime fix.

The unplanned arc:

  • Forwarded emails are first-class now. Two new accept paths. bodyContainsAllowlistedDomain(body) scans the first 8000 chars of the body for any allowlisted pattern — catches the typical From: noreply@hydroquebec.com line that lives inside the quoted forward block. hasAllowlistedNameMatch(subject, body) scans subject + first 1000 chars of body for any localized brand name — catches Fwd: Avis Hydro-Québec that drops the original headers entirely. AllowlistEntry gains an optional names: ReadonlyArray<string> so each entry carries EN + FR forms.
  • The catalog is now research-backed and Quebec-specific. 28 entries grew to 227 across 9 categories with three new ones: internet, mobile, streaming. The web-research session locked in: every internet provider in Quebec (15), every meaningful mobile carrier (14), the streaming services available in Canada (9), the big-six + digital banks + credit-card issuers (27), every major insurer including Quebec mutuals (23), every vehicle brand sold in Canada plus the four largest Quebec dealer groups (46), Greater Montreal + Quebec City school service centres + all 9 anglophone school boards + 18 Quebec universities + the largest CEGEPs (65), and pharmacies + hospital centres + telehealth + optical (26). Bell/Rogers/Telus moved from utility → mobile; Vidéotron from utility → internet. The list is no longer “things JF probably gets email from”; it’s the Quebec consumer surface.
  • Chat slash commands. /help (bilingual), /list assets|members|documents|tasks|providers [category]. The slash endpoint short-circuits the LLM (free, instant) but persists both turn halves to the active chat thread — so subsequent LLM turns can reference the slash output in their context. The pattern matters: it’s a deterministic surface for the things the LLM doesn’t need to be involved in, while staying within the same chat history that the user reads. /list providers vehicle prints the 46-entry vehicle list inline.
  • Auto-scroll polish. Two bug-class UX fixes: on reload the chat log was scrolled to the top of the restored thread (so a returning user landed on their first message, not their latest), and on send the freshly arriving line landed below the fold. One useEffect keyed on [messages, hydrated] writes scrollTop = scrollHeight. Re-fires per streaming token — that’s intentional, the latest token always stays visible.
  • Dev plan v0.3. Phase 10 (V1 launch) got four items I’d been carrying as mental notes: Terms of Service at /[locale]/terms, Privacy Policy at /[locale]/policy, a “Beta” badge on the public sign-in page, and a CRM opt-in (crm_optins table — email PK; first/last name nullable, back-filled later; source; opted_in_at; unsubscribed_at; last_emailed_at). Resend handles delivery. One-click unsubscribe deferred to V1.5; V1 captures only.
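
The auto-scroll fix is small enough to show. A minimal sketch, assuming a hypothetical logRef pointing at the chat log container; the helper name scrollToLatest and the Scrollable type are mine, not from the codebase.

```typescript
// Structural type so the helper accepts a real HTMLElement or a plain test double.
interface Scrollable {
  scrollTop: number;
  readonly scrollHeight: number;
}

// Pin the log to its bottom edge so the newest line is in view.
function scrollToLatest(log: Scrollable | null): void {
  if (log === null) return; // The ref is not attached before the first render.
  log.scrollTop = log.scrollHeight;
}

// Inside the component, the effect described above would look roughly like:
//   useEffect(() => { scrollToLatest(logRef.current); }, [messages, hydrated]);
// Keying on `messages` means it re-fires for every streamed token.
```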

What surprised me

Live testing finds bugs the spec validates. The From: header check wasn’t wrong; it was correct. §5.12 said “from an allowlisted sender OR contains an asset/member-name keyword match,” and that’s what shipped. The user-visible behavior — Hydro-Québec bill, forwarded to myself, silently skipped — was what I’d actually use Domi for, and the spec was loose enough that the bug fit comfortably inside it. Spec coverage isn’t behavior coverage. I won’t get this wrong by writing better specs; I’ll get it right by using the product earlier in the loop. The dev plan calls for a 4-week dogfood window before V1 ships, and Sprint 9 was a tiny preview of what that’s going to be like — every hour of real use is going to surface something that was technically-correct-and-still-broken.
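
For reference, the two forwarded-email accept paths look roughly like this. It is a sketch under assumptions: the real helpers take (body) and (subject, body), so the allowlist parameter, the domains and category fields, and the accent-folding step are all illustrative. The names field and the 8000/1000 character windows are the ones that shipped.

```typescript
interface AllowlistEntry {
  category: string;               // hypothetical field
  domains: ReadonlyArray<string>; // hypothetical: sender domains such as "hydroquebec.com"
  names?: ReadonlyArray<string>;  // EN + FR brand names
}

const BODY_DOMAIN_SCAN_CHARS = 8000;
const BODY_NAME_SCAN_CHARS = 1000;

// Lowercase and strip diacritics so "Hydro-Québec" and "Hydro-Quebec" compare equal
// (assumption: the shipped code may normalize differently, or not at all).
function fold(text: string): string {
  return text.normalize("NFD").replace(/[\u0300-\u036f]/g, "").toLowerCase();
}

// Catches the quoted "From: noreply@hydroquebec.com" line inside a forward block.
function bodyContainsAllowlistedDomain(
  body: string,
  allowlist: ReadonlyArray<AllowlistEntry>,
): boolean {
  const head = body.slice(0, BODY_DOMAIN_SCAN_CHARS).toLowerCase();
  return allowlist.some((entry) =>
    entry.domains.some((domain) => head.includes("@" + domain.toLowerCase())),
  );
}

// Catches "Fwd: Avis Hydro-Québec" when the original headers are gone entirely.
function hasAllowlistedNameMatch(
  subject: string,
  body: string,
  allowlist: ReadonlyArray<AllowlistEntry>,
): boolean {
  const haystack = fold(subject + "\n" + body.slice(0, BODY_NAME_SCAN_CHARS));
  return allowlist.some((entry) =>
    (entry.names ?? []).some((name) => haystack.includes(fold(name))),
  );
}
```

Both checks are plain substring scans over a bounded prefix, which keeps them cheap but means a short brand name can match inside an unrelated word.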

The catalog wanted to be bigger than I thought. Twenty-eight entries felt right when I was writing the file. Twenty-eight entries felt embarrassing the first time I typed /list providers and saw the summary. The body-domain scan I just added catches @bell.ca in forwarded headers — but if a user types “Bell” or “Vidéotron” or “RBC” in chat and asks “do you watch for them,” the answer is yes-or-no based on whether I happened to think of them. The right response was a research session. About 80 minutes of structured web search across nine consumer categories produced the full list. The expansion isn’t gold-plating — Quebec has a long tail of small ISPs (oxio, TekSavvy, VMedia, Distributel, Beanfield, Bravo Telecom, EBOX, Acanac, CIK, Carry, Sogetel, Maskatel) that real Quebec households use, and which a “household management product for Quebec” needs to know about by default.

Slash commands as a “deterministic surface alongside the LLM” is a pattern I want more of. /list providers doesn’t need an LLM. It’s a static catalog read. Routing it through the chat model would have been free in dev hours but expensive at runtime — every list query consumes context and tokens for a deterministic answer. Carving out a non-LLM surface that lives in the same chat thread (so its output is in context for the next LLM turn) is a nice middle ground. The user types /list providers, sees the list, then asks “do I have any Quebec utilities?” — the next turn has the slash output in its message history and can answer directly. Two execution surfaces, one user-visible thread. I think there’s a generalization here for “things the user wants quickly that the model would only get right slowly.”
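
The shape of that split, as a rough sketch. The type names, the source tag on messages, and the render callback are assumptions for illustration; the command grammar is the one that shipped (/help and /list assets|members|documents|tasks|providers [category]). Anything the parser does not recognize returns null and falls through to the existing chat path.

```typescript
type ListTarget = "assets" | "members" | "documents" | "tasks" | "providers";

type SlashCommand =
  | { kind: "help" }
  | { kind: "list"; target: ListTarget; category?: string };

const LIST_TARGETS: ReadonlyArray<ListTarget> = [
  "assets", "members", "documents", "tasks", "providers",
];

// Returns null for anything this deterministic surface does not own.
function parseSlashCommand(input: string): SlashCommand | null {
  const parts = input.trim().split(/\s+/);
  if (parts[0] === "/help" && parts.length === 1) return { kind: "help" };
  if (parts[0] === "/list" && parts.length >= 2 && parts.length <= 3) {
    const target = LIST_TARGETS.find((t) => t === parts[1]);
    if (target === undefined) return null;
    const category = parts[2];
    return category === undefined
      ? { kind: "list", target }
      : { kind: "list", target, category };
  }
  return null;
}

interface ChatMessage {
  role: "user" | "assistant";
  content: string;
  source: "slash" | "llm"; // hypothetical tag marking which surface produced the turn
}

// Appends both halves of the turn to the thread without calling the model.
// Returns false when the caller should route the input to the LLM instead.
function runSlashTurn(
  thread: ChatMessage[],
  input: string,
  render: (cmd: SlashCommand) => string,
): boolean {
  const cmd = parseSlashCommand(input);
  if (cmd === null) return false;
  thread.push({ role: "user", content: input, source: "slash" });
  thread.push({ role: "assistant", content: render(cmd), source: "slash" });
  return true;
}
```

Since both messages land in the same thread array, the next LLM turn sees the slash output in its history with no extra plumbing.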

The --custom migration empty-body footgun bit a fourth time. This is now a sprint-over-sprint pattern. Sprint 6 introduced custom migrations for the app_get_user_tenant SECURITY DEFINER function and I learned that drizzle-kit’s --custom generate creates a placeholder file you have to fill in before db:migrate. Sprint 8 hit the same on llm.calls. Sprint 9 hit it again on gmail_connections. The procedural “remember to fill the file before migrate” mental note hasn’t taken root because the gap between “I have an empty file” and “the migration has been marked applied with no DDL run” is invisible from the CLI output. Adding a procedural guard to the sprint-N.md template under “Issues caught”: did the SQL get into the file before the first db:migrate? Fourth time means it stops being a personal-discipline issue and becomes a process gap.
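
A checklist line is one guard; a mechanical one is also possible. The sketch below is an idea, not something in the repo: hasExecutableSql and findEmptyMigrations are hypothetical names, and the comment stripping is deliberately naive (it would misread a "--" inside a string literal). A pre-migrate script could read the migration folder into a filename → contents map and refuse to continue if the result is non-empty.

```typescript
// True when the file contains anything besides comments and whitespace.
function hasExecutableSql(sql: string): boolean {
  const withoutComments = sql
    .replace(/\/\*[\s\S]*?\*\//g, "") // block comments
    .replace(/--.*$/gm, "");          // line comments, to end of line
  return withoutComments.trim().length > 0;
}

// Returns the names of migration files that would apply no DDL at all.
function findEmptyMigrations(files: Readonly<Record<string, string>>): string[] {
  return Object.keys(files)
    .filter((name) => !hasExecutableSql(files[name] ?? ""))
    .sort();
}
```

This catches the case where the placeholder file was never filled in. It does not catch a file that was filled in wrongly.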

Vercel doesn’t auto-deploy every merge to main, and I keep forgetting that. PR #120 fixed the OAuth scope bug. I merged it, retested, got the same slice(0, 80) truncated error, and spent five minutes on the wrong hypothesis. The truncated error string was identical to the old one — that’s the diagnostic moment. New code would produce a new truncation; identical truncation means old code. Force-redeployed via the dashboard, the fix worked. The lesson: when a fix-it commit lands and the live error doesn’t change, suspect the deploy first. The truncated string is the deploy-status signal.

Where Sprint 10 picks up

Phase 10 ramp. Two carry-forwards from earlier sprints want to land:

  • Cost-line UI in Settings → AI usage. Third sprint of carry. The data has been in llm.calls since M8 — role, model, cost, tokens, occurred_at, parent_call_id — and the Settings IA §4.3 surface still doesn’t render it. A simple two-table query (calls grouped by role + day, calls grouped by parent_call_id for the parent-child cost grouping) and a small chart. Day’s worth of work; should not be the fourth sprint of carry.
  • OpenAI cross-asset miss as an eval re-baseline. Sprint 8’s known regression. Re-run the chat matrix after the slash-command surface lands and confirm it remains the only regression — the eval is the right surface, not a code fix.
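
The two groupings behind the cost-line UI, sketched in memory rather than as SQL. The LlmCall shape mirrors the llm.calls columns named above, but the id field, the camelCase names, the ISO-8601 UTC timestamp format, and the one-level parent roll-up are assumptions.

```typescript
interface LlmCall {
  id: string;                  // hypothetical primary key
  role: string;
  model: string;
  cost: number;
  tokens: number;
  occurredAt: string;          // assumed ISO-8601 UTC, e.g. "2025-01-15T20:04:00Z"
  parentCallId: string | null;
}

// Total cost per (role, day), keyed as "role|YYYY-MM-DD".
function costByRoleAndDay(calls: ReadonlyArray<LlmCall>): Map<string, number> {
  const totals = new Map<string, number>();
  for (const call of calls) {
    const day = call.occurredAt.slice(0, 10); // date part of the ISO string
    const key = `${call.role}|${day}`;
    totals.set(key, (totals.get(key) ?? 0) + call.cost);
  }
  return totals;
}

// Total cost per root call: a child rolls up into its parent, and a call
// with no parent is its own root. Handles one level of nesting only.
function costByParent(calls: ReadonlyArray<LlmCall>): Map<string, number> {
  const totals = new Map<string, number>();
  for (const call of calls) {
    const root = call.parentCallId ?? call.id;
    totals.set(root, (totals.get(root) ?? 0) + call.cost);
  }
  return totals;
}
```

In the database these would presumably be two GROUP BY queries; the in-memory version just makes the grouping keys explicit.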

A few Phase 10 launch items want to start moving while they’re small enough to fold in incrementally:

  • crm_optins table migration + sign-in opt-in checkbox + Beta badge. Per Dev Plan v0.3. Schema is one custom migration (the empty-body footgun applies — fill the file first), the checkbox is a couple of lines on the sign-in form, the badge is a span on the public layout. Token-backed unsubscribe stays deferred to V1.5.
  • A chat-input placeholder hint pointing at /help. Discoverability for the slash-command surface that just landed. The placeholder reads “ask domi something…” today; an iteration like “ask domi something… or /help” gives the slash commands a foothold without dedicated docs.

Carrying forward as still-V1-cuttable: Gemini and OpenRouter adapters (decision still pending end of Sprint 18 per Dev Plan §10), knowledge-graph viz (decision at Sprint 12, lowest-stakes V1 cut), mobile UX deltas + audit-log search UI, and the non-functional gates (WCAG 2.2 AA spot check, threat-model sign-off, PIA, DR runbook). Plus the 4-week dogfood window the dev plan calls for — at the pace of one bug per evening of real use, that window is going to be denser than the building was.

The sprint that I thought would be five issues was five issues, plus three I didn’t know I needed to ship until I tried using what I’d built. Live testing within the sprint that ships the feature isn’t quite the dogfood window per the dev plan, but it’s the same shape. What I’m starting to think is true: the sprint that ships a feature should also ship at least one improvement to that feature, made the moment after using it.