Where we are right now
Honest snapshot anchored on the head-of-api status file last updated 2026-05-24 13:00 UTC, plus what's verifiable today.
22%
Weekly Max-200 used
1 day into week 5
$2,037
24h list-price equiv
1.94× the 7d median
$1,048
7d median per day
Baseline if nothing weird
2026-06-01
API key cap reset
Set in your console
0
Watchdog actions
Since flip 2026-05-24
The actual question: API key is hard-capped through June 1. The current burn is the Max-plan OAuth window. At 22% used on day 1 of a weekly cycle, simple math: 22 × 7 = 154% projected. Without action, you'll be rate-limited mid-week.
What's contributing to the Max-plan burn right now
| Source | Status | Est. share | Cheapest fix |
| This long Claude Code session (Opus 4.7) | live | ~50% | Switch to Sonnet 4.6 for non-architecture work · cap turns/session |
| Agent dispatches (overnight + control surface builds) | live but rare | ~20% | Don't dispatch Agents for work I can do myself |
| D⁴ chain on each prompt | live, cap N=2 | ~15% | Route mental-model skills to Haiku instead of Sonnet |
| Wiki-update headless run (caught + killed) | stopped | ~5% | Already stopped |
| Misc deploys, MCPs, watchdog ticks | low | ~10% | Trivial · keep as-is |
Hard caps already in place
- live cpu-bleed-watchdog · auto-SIGSTOP runaway processes
- live D⁴ flock concurrency cap N=2 · max 2 parallel claude subprocs
- live Process-group cleanup on timeout · no orphans
- live action-executor budget-slot gate · refuses direct invocation
- live Anthropic API key spend cap · auto-resets 2026-06-01
- missing Per-session-turn cap · this is the next layer to add
- missing Model-tier guard · should route work to Haiku/Gemini when feasible
- missing Weekly-budget-aware scheduler · should slow down at 70% used
What we built in the last 3 days
Plain English. Every system-wide change between 2026-05-23 and 2026-05-25, with what it costs to run.
Infrastructure & safety
- CPU/Bleed Watchdog (2026-05-24) — launchd job polling every 30s. If anything pegs CPU > 80% for 5+ min, it auto-SIGSTOPs the process and sends you a Telegram. Cost: $0/mo. Did its job all night, 0 incidents since flip.
- D⁴ Engine Rebuild (2026-05-24) — replaced the broken "every prompt fan-outs Claude" pattern with a lean enqueue hook → worker with flock concurrency cap. Three banned patterns now structurally impossible. Cost: ~$0.05–0.20 per user prompt when worker fires real skills.
- OpenClaw Dispatch Bridge (2026-05-24) — HTTP service on your Mac that proxies all LLM calls through one place. 4 adapters (Claude OAuth, Gemini, OpenAI stub, Ollama stub). Cost: $0/mo for the bridge itself; calls billed to whatever provider you route to.
- Tailscale Funnel — external HTTPS access to the bridge so the Vercel UI can hit it. Cost: $0/mo.
- 42 Launchd dispatcher agents booted out — auto-dispatcher, agent-world-class reflection/brief/trends, daily/weekly briefs, scan-victor-interactive, head-of-api daily, usage-overseer, all the hook.* agents. This was the silent multi-hundred-dollar/mo killer. Each had been firing on cron and spawning Claude. Cost: ~$300–800/mo saved (rough estimate based on per-tick claude burn × frequency).
Auth & routing
- OAuth-over-API-key preference — fixed the Phone Steve bridge so all Claude calls route through your Max plan, not the API key. Already implemented in
~/.openclaw/bin/phone-steve-cc-bridge.py (lessons.md 2026-05-21).
- D⁴ action-executor env-strip — same fix applied to the D⁴ worker so it uses Max OAuth not API key.
- Loopback-grace auth fix in bridge — requires bearer token even from 127.0.0.1 when token is configured (Tailscale Funnel proxies to localhost).
Project framework
- D⁴ rebuild SPEC + staged rollout (~/projects/d4-rebuild-2026-05-24/SPEC.md) — Stages 1 / 1.5 / 2 / 3 with explicit pass conditions, all completed.
- 3 new lessons in lessons.md — UserPromptSubmit fan-out, subprocess timeout grandchildren, loopback-grace + reverse proxy. These prevent recurrence.
- OpenClaw Dispatch Bridge SPEC (~/projects/openclaw-dispatch-bridge/SPEC.md).
- Planning Session doc v1/v2/v3 (planning-session-2026-05-24-bleed-and-control-surface).
UI surfaces
- Control Surface v1 (the deployed Next.js v2.0 from 2026-05-22) — chat with full tool-rendering. Still live at
control-surface.vercel.app.
- Control Surface v3 — project-grid only. Wrong scope. At
control-surface-v3.vercel.app.
- Command Center v4 — Linear-with-chat per td-decisions architecture. Live at
cc.thomasdigital.com. This is the current surface.
- Custody research compile — at
custody.thomasdigital.com.
- Overnight digests + planning docs — all on Vercel.
Overnight autonomous run (2026-05-24 → 2026-05-25)
- 3 Agent dispatches: MSA forensic audit (custody), CA family law research, wedding "who's on the boat" paragraphs.
- D⁴ Stage 3 synthetic load test (5/5 entries clean).
- Project inventory + lessons retrospective (no LLM cost).
- Granola research (WebSearch+WebFetch, free).
- Total estimated burn for the overnight: under $5 of Max-plan capacity.
Where the spend went
Best-effort allocation of the ~$2,037 list-price-equiv 24h burn. Anthropic doesn't show per-feature breakdown via API, so this is reasoned from request types + my session knowledge.
24-hour spend allocation (estimated)
This Claude Code session (Opus 4.7)~$1,018
Misc (deploys, MCPs, watchdog)~$204
Killed wiki-update run~$102
The honest read: ~50% of your overage is THIS conversation. Opus 4.7 with a long context window costs about $15/Mtok input and $75/Mtok output. Every long turn is ~$2–5. After 50+ turns over 2–3 days, that compounds to the $1k range fast.
What the 80/20 says
| Lever | Cost cut | Quality cost | How fast |
| Stop using Opus 4.7 unless task is architecture / strategy | ~50% | Low (Sonnet 4.6 is >90% as good for execution) | Now |
| Don't dispatch Agents for work I can do directly | ~15% | None (just self-discipline) | Now |
| Route D⁴ mental-model skills to Haiku 4.5 | ~10% | Negligible (these are classifiers) | 1h work to wire |
| Add per-turn budget cap that warns at 70% weekly | ~5% | None (just a heads-up) | 2h work |
| Batch related questions into one turn | ~5% | None | Behavioral |
Combined effect: these 5 changes could cut burn ~85%. Bring you from 22%/day to ~3.3%/day. Comfortably under weekly cap with headroom.
Frontier model comparison
Where you are vs the frontier today, and the cost/quality curve. Tier the work to the right model.
| Model | $/Mtok in | $/Mtok out | Frontier rank | Best for |
| Claude Opus 4.7 | $15 | $75 | frontier | Architecture, multi-step strategy, complex code reasoning |
| Claude Sonnet 4.6 | $3 | $15 | near-frontier | Default for skilled work — 90%+ as good as Opus on most tasks |
| Claude Haiku 4.5 | $0.80 | $4 | competent | Classification, routing, simple Q&A, structured extraction |
| GPT-5 | ~$15 | ~$60 | frontier | Same tier as Opus · alternative provider |
| Gemini 2.5 Pro | $1.25 | $10 | near-frontier | Cheap research with huge context · multimodal (Loom!) · 2M tokens |
| Gemini 2.5 Flash | $0.075 | $0.30 | competent | Bulk classification · 200×+ cheaper than Opus |
| Llama 3.3 70B (local Ollama) | $0 | $0 | solid | Free fallback · slower · no quotas |
The math: Switching the same task from Opus to Sonnet = 5× cheaper. From Opus to Haiku = 19× cheaper. From Opus to Gemini Flash = 200× cheaper. Local Ollama = free.
How good are we on Max-200?
You have access to everything from Haiku to Opus via the Max plan, bounded by hourly + weekly windows. The bridge can also fall back to Gemini API and (when implemented) local Ollama. So your effective stack is:
- Frontier-class reasoning available (Opus 4.7) when needed
- Near-frontier default (Sonnet 4.6) for 90% of work
- Cheap-classifier for routing/triage (Haiku, Flash)
- Free escape valve for when quotas are tight (local Ollama, planned)
You're not behind the frontier on capability. You're ahead of most users on infrastructure (the bridge + tiered adapters is the right architecture). The problem is execution discipline — defaulting to Opus when Sonnet would do.
Tier routing strategy
Which model gets which task. The D⁴ guardrails CAN enforce this — but right now they don't. Wire it.
flowchart TD
A[User prompt arrives] --> B{Haiku classifier}
B -->|Architecture / strategy / multi-step| C[Opus 4.7]
B -->|Code / draft / skilled work| D[Sonnet 4.6]
B -->|Lookup / classify / extract| E[Haiku 4.5]
B -->|Research / huge context| F[Gemini 2.5 Pro]
B -->|Bulk classification| G[Gemini Flash]
B -->|Offline / free| H[Local Ollama]
C -.budget exhausted.-> D
D -.budget exhausted.-> H
E -.budget exhausted.-> G
style C fill:#FDEBEB,stroke:#B91C1C
style D fill:#EEF2FA,stroke:#1E40AF
style E fill:#E6F5EE,stroke:#047857
style F fill:#EEF2FA,stroke:#1E40AF
style G fill:#E6F5EE,stroke:#047857
style H fill:#fafaf7,stroke:#5a5a5a
What "wire it" means concretely
- Add a model-tier guard to D⁴ — single guard template at
~/.openclaw/d4/registry/model-tier-enforcement.yaml. Already referenced in the SPEC; not implemented yet.
- Make the dispatch-bridge accept a
tier hint — caller says "tier=classifier" → bridge picks Haiku; "tier=execution" → Sonnet; "tier=strategy" → Opus.
- Default route is Sonnet when no tier is specified. Opus must be explicitly requested.
- D⁴ mental-model skills always Haiku — they're classifiers, not reasoners. 19× savings on every skill fire.
- Sub-agent dispatches default Sonnet — only escalate to Opus on explicit need.
Effort to wire: ~2 hours. Touch dispatch-bridge.py + skill-runner.py. Add one YAML guard. Test with a few prompts.
What about free models?
Yes — the Ollama adapter stub already exists. To make it real:
- Install Ollama on your Mac (one command):
brew install ollama
- Pull a model:
ollama pull llama3.3:70b (or smaller for speed)
- Flip the
ollama-local adapter to enabled: true in ~/.openclaw/state/dispatch-bridge/adapters.json
- Implement the 10-line
dispatch() method in ollama_local.py (subprocess ollama run llama3.3)
The D⁴ guard can route "low stakes" prompts here when weekly Max budget > 80% used.
Sample project: token breakdown
Walking through what this very session (the cost-overage planning we're in right now) cost, juncture by juncture. Real numbers.
Phases of this session (rough)
| Phase | Turns | Avg in / out | Est. cost (Opus 4.7) |
| 1. Bleed diagnosis + watchdog build | 12 | 8k / 3k | ~$8.40 |
| 2. D⁴ rebuild + spec + worker | 15 | 12k / 5k | ~$11.25 |
| 3. Dispatch bridge build + adapters | 8 | 10k / 4k | ~$8.40 |
| 4. Tailscale Funnel + Vercel env wiring | 10 | 6k / 2k | ~$5.70 |
| 5. Control Surface v3 / v4 iterations | 14 | 15k / 7k | ~$10.50 |
| 6. Custody + wedding overnight (Agents) | 3 + agents | +sub-Claude sessions | ~$5 |
| 7. Legal research compile + design | 8 | 14k / 6k | ~$8 |
| 8. THIS doc (you're reading it) | 1 | 35k / 10k | ~$4 |
| Session total (rough) | 71 | — | ~$61 |
Caveat: Max plan is flat $200/mo regardless of "list price" — these are the equivalent token-burn values. Anthropic uses them to compute your % of weekly window. So ~$61 list-equiv ≈ ~3% of weekly cap consumed just by my responses.
80/20 within the session
- The repeat-iteration tax — I built Control Surface v1, v2, v3, v4 instead of getting v1 right. That's ~$25 of waste from drift. The 2026-05-22 lesson "build-clean ≠ intent-match" cost me here.
- Long context overhead per turn — by turn 50, every prompt carries 50+ turns of history. ~10k tokens of just context replay = ~$0.15 per turn on Opus baseline.
- Agent dispatches in the overnight — 3 sub-Claude sessions, each with full context = ~$5 of pure overhead.
If we'd done this session on Sonnet 4.6 from the start
Same ~71 turns at Sonnet pricing ($3 in / $15 out): ~$12 total. 5× cheaper, ~95% as useful for this kind of work.
The structural fix: in Claude Code, switch the default model away from Opus 4.7 unless I'm doing architectural reasoning. Run /model claude-sonnet-4-6 in any session that's mostly execution. Switch to Opus only for hard thinking turns.
Staying on project — a feedback mechanism
You're right that the project system didn't keep you on rails. Here's a lightweight mechanism that would, with no claude subprocesses (lesson from 2026-05-24).
The problem
Mid-session, you pivoted from "cost overage planning" → "control surface design" → "Granola research" → "Mac IT watchdog" → "control surface again" → "where's that Vercel doc". Each pivot was a legitimate sub-thread, but together they consumed the session's budget on context-switching, not on the original problem.
The project framework was meant to gate this. It didn't — because there was no active feedback when you drifted. By the time we caught it, $1k+ was burned.
Design — "Context Drift Detector"
flowchart LR
A[New user prompt] --> B[UserPromptSubmit hook]
B --> C[Local JS classifier]
C --> D{Topic match
active project?}
D -->|Yes, >70% similarity| E[Pass through]
D -->|No, looks like a pivot| F[Inject system reminder]
F --> G["Reminder:
'This looks like a pivot from
active project X. Add as new project,
add as sub-task, or stay on X?'"]
G --> H[Claude sees reminder, asks you]
style C fill:#EEF2FA,stroke:#1E40AF
style F fill:#FDEEE3,stroke:#B0410C
style G fill:#FDEEE3,stroke:#B0410C
How it works
- Hook:
~/.openclaw/bin/hook-userprompt-drift-detector.sh — pure regex + keyword matching, no Claude exec (lesson 2026-05-24).
- Active project state: persisted at
~/.openclaw/state/active-project.json with current name + keyword fingerprint.
- Classifier logic: extract nouns/verbs from prompt, compute Jaccard-ish overlap with active project's keywords. >70% = on-topic. <70% = potential drift.
- Soft surface: hook injects a system reminder into Claude's context: "Heads up — this looks like a pivot from {active}. Add as new project, sub-task of {active}, or continue on {active}?"
- No interrupt: you keep typing if you want. The reminder shows up next to your prompt for me to acknowledge.
What's required to build it
- 1h Write the bash + small Python helper
- 30m Wire to UserPromptSubmit (replacing or alongside the D⁴ enqueue hook)
- 30m Add a
/project new {name} slash-command pattern that updates active-project.json
- 15m Add a one-line "you've used X% of weekly Max" to the same hook output
Critical: no claude exec in this hook. Per 2026-05-24 lesson, hooks fan out and bleed. Pure regex classifier only.
Max plan #2 — SOP
How to set up a second Max-200 plan on a separate macOS user, doubling your weekly throughput. Costs $200/mo additional.
Setup steps
- Create a second Anthropic account
Use a different email (e.g., your second Gmail or victor+claude2@thomasdigital.com). Sign up at claude.ai/login.
- Subscribe to Claude Max ($200/mo) on the new account
This account gets its own independent hourly + weekly window.
- Create a second macOS user
System Settings → Users & Groups → Add User. Name it openclaw2. Give it Administrator role. Reboot into that user once to initialize home dir.
- Install Claude Code on the second user
Run brew install claude (or download from anthropic.com). Sign in with the new account.
- Generate the OAuth token on the second user
Run claude setup-token as openclaw2. Copy the CLAUDE_CODE_OAUTH_TOKEN value it produces.
- Add second token to the bridge config
On the openclaw user's ~/.openclaw/.env, add CLAUDE_CODE_OAUTH_TOKEN_2=<token>.
- Register the second adapter
Add claude-cc-oauth-2 to ~/.openclaw/state/dispatch-bridge/adapters.json. Copy the existing claude-cc-oauth adapter file as claude_cc_oauth_2.py and have it read the new token env var.
- Implement round-robin in the bridge
Modify openclaw-dispatch-bridge.py adapter_for() to alternate between the two claude adapters when the model is Claude-family. Maintain a counter in state file.
- Add weekly-budget awareness
Bridge checks current % weekly used per account (via heuristic — Anthropic doesn't expose this API). When account 1 hits 80%, route everything to account 2 until reset. When both at 80%, fall back to Gemini.
- Telegram alerts at 70% threshold per account
watchdog sends "Account 1 at 70%, switching primary to Account 2" so you know.
Cost math
| Option | $/mo | Throughput | Risk |
| 1× Max-200 (current) | $200 | 1× weekly window | overage as seen |
| 2× Max-200, two users | $400 | 2× independent windows | safe headroom |
| 1× Max + tier discipline (no #2) | $200 | ~3–5× effective via Haiku/Sonnet routing | solves it if discipline holds |
| Teams 5-seat ($150/mo) | $150 | 5× chat windows | doesn't include Claude Code CLI auth |
Recommendation: do tier-discipline first (free, this week). If that's not enough by next week's reset, then buy Max #2. Don't pay $200/mo more before exhausting the cheap fixes.
Output design system v1
Formal spec so every Vercel doc looks like it came from the same place. This doc itself is the reference implementation.
Aesthetic principles
- White-first · background
#ffffff · ink #0a0a0a · no dark mode by default
- Splash colors used sparingly · color is meaning, not decoration · default to ink, accent only when signaling state
- Typographic hierarchy carries the structure · don't lean on color or boxes to organize
- Tabular numerics in JetBrains Mono · everything that's a number lines up
- One frame per artifact · use the same shell (eyebrow + Fraunces title + sub + tabs) so docs feel like a series
Color palette
●
Cobalt #1E40AF
links, active, info
●
Burnt #B0410C
eyebrow, warn, accent
●
Emerald #047857
ok, success state
●
Crimson #B91C1C
alert, draft, critical
●
Rule #e8e7e2
dividers, borders
Rule: body is always ink. Splash colors only on labels, tags, callouts, and chart bars. Never colored body text. Never colored headlines.
Typography
Fraunces — display
Used for h2 (section titles) and the page head
Inter — body & subheads
Body copy, h3 subheads, lists. Default everything not display or mono.
JetBrains Mono — labels & code
h4 eyebrows, table headers, tags, code, KPI labels. All-caps + letter-spacing for label use.
Component library
KPI cards — for headline numbers. Fraunces value, mono label.
22%
Sample KPI
small descriptor
Tags — for state/category. Mono, small, bordered.
default cobalt burnt emerald crimson
Pills — inline status. Smaller than tags.
ok info warn alert
Callouts — for emphasis. Left rule, soft tint.
Info callout · default cobalt left rule, soft cobalt bg
Warn callout · burnt left rule, soft burnt bg
OK callout · emerald left rule, soft emerald bg
Alert callout · crimson left rule, soft crimson bg
Tables — burnt header, mono numerics, rule dividers.
Bar viz — horizontal, labeled, single color per row.
Steps — numbered SOP with circular black step number.
Layout
- Max page width:
1080px
- Side padding:
28px desktop, 18px mobile
- Sticky top bar with eyebrow + title + tabs
- Tab nav for multi-section docs
- Footer with timestamp + source files (mono, small)
Going-forward rule
Every artifact deployed to Vercel by Steve uses this shell · same fonts · same palette · same eyebrow/title/tab pattern · same component library · so docs feel like a series, not haphazard one-offs.