Cost Overage Plan · 2026-05-25

AI Usage Optimization & Recovery Plan · DRAFT v1

22% of weekly Max-plan window burned in 1 day. Synthesis of the last 3 days of system changes, where the spend went, and how to get back on track.

Where we are right now

Honest snapshot anchored on the head-of-api status file last updated 2026-05-24 13:00 UTC, plus what's verifiable today.

Weekly Max-200 used: 22% · 1 day into week 5
24h list-price equiv: $2,037 · 1.94× the 7d median
7d median per day: $1,048 · baseline if nothing weird
API key cap reset: 2026-06-01 · set in your console
Watchdog actions: 0 · since flip 2026-05-24
The actual question: API key is hard-capped through June 1. The current burn is the Max-plan OAuth window. At 22% used on day 1 of a weekly cycle, simple math: 22 × 7 = 154% projected. Without action, you'll be rate-limited mid-week.
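The projection is a straight-line extrapolation. Here is a minimal Python sketch of that arithmetic. It assumes the day-1 burn rate holds for the rest of the week, and `project_week` is an illustrative name, not an existing tool:

```python
def project_week(used_pct: float, days_elapsed: float,
                 week_days: int = 7) -> tuple[float, float]:
    """Straight-line projection of weekly-window usage.

    Returns (projected % at end of week, days until 100% is reached),
    assuming the burn rate so far stays constant.
    """
    daily_rate = used_pct / days_elapsed
    projected_pct = daily_rate * week_days
    days_to_cap = 100.0 / daily_rate
    return projected_pct, days_to_cap


projected, days_to_cap = project_week(used_pct=22.0, days_elapsed=1.0)
print(f"projected end-of-week usage: {projected:.0f}%")  # 154%
print(f"window exhausted after ~{days_to_cap:.1f} days")  # ~4.5 days
```

At 22%/day the window runs out about 4.5 days in, during day 5. That is where the mid-week rate limit comes from. Real weeks rarely burn evenly, so treat this as a trend line rather than a forecast.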

What's contributing to the Max-plan burn right now

| Source | Status | Est. share | Cheapest fix |
| --- | --- | --- | --- |
| This long Claude Code session (Opus 4.7) | live | ~50% | Switch to Sonnet 4.6 for non-architecture work · cap turns/session |
| Agent dispatches (overnight + control surface builds) | live but rare | ~20% | Don't dispatch Agents for work I can do myself |
| D⁴ chain on each prompt | live, cap N=2 | ~15% | Route mental-model skills to Haiku instead of Sonnet |
| Wiki-update headless run (caught + killed) | stopped | ~5% | Already stopped |
| Misc deploys, MCPs, watchdog ticks | low | ~10% | Trivial · keep as-is |

Hard caps already in place

What we built in the last 3 days

Plain English. Every system-wide change between 2026-05-23 and 2026-05-25, with what it costs to run.

Infrastructure & safety

Auth & routing

Project framework

UI surfaces

Overnight autonomous run (2026-05-24 → 2026-05-25)

Where the spend went

Best-effort allocation of the ~$2,037 list-price-equiv 24h burn. Anthropic doesn't show per-feature breakdown via API, so this is reasoned from request types + my session knowledge.

24-hour spend allocation (estimated)

This Claude Code session (Opus 4.7) · 50% · ~$1,018
Agent dispatches · 20% · ~$407
D⁴ skill fires · 15% · ~$306
Misc (deploys, MCPs, watchdog) · 10% · ~$204
Killed wiki-update run · 5% · ~$102
The honest read: ~50% of your overage is THIS conversation. Opus 4.7 costs about $15/Mtok input and $75/Mtok output. A long context window means every turn re-sends a large input, so each long turn costs ~$2–5. After 50+ turns over 2–3 days, that compounds to the $1k range fast.
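The sketch below prices a single turn at the Opus 4.7 rates quoted above. The 100k-input / 10k-output and 200k / 20k turn sizes are assumed examples of "long" turns, not measurements. The function also ignores any prompt-caching discounts:

```python
# List prices in USD per million tokens, as quoted in this doc.
OPUS_IN_PER_MTOK = 15.0
OPUS_OUT_PER_MTOK = 75.0


def turn_cost(input_tokens: int, output_tokens: int,
              in_rate: float = OPUS_IN_PER_MTOK,
              out_rate: float = OPUS_OUT_PER_MTOK) -> float:
    """List-price-equivalent cost of one turn, in USD."""
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000


# A long turn re-sends the whole context as input.
print(turn_cost(100_000, 10_000))  # 2.25
print(turn_cost(200_000, 20_000))  # 4.5
```

Both results land inside the ~$2–5 range. Once the context is large, input dominates the cost, which is why long sessions get expensive even when replies are short.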

What the 80/20 says

| Lever | Cost cut | Quality cost | How fast |
| --- | --- | --- | --- |
| Stop using Opus 4.7 unless task is architecture / strategy | ~50% | Low (Sonnet 4.6 is >90% as good for execution) | Now |
| Don't dispatch Agents for work I can do directly | ~15% | None (just self-discipline) | Now |
| Route D⁴ mental-model skills to Haiku 4.5 | ~10% | Negligible (these are classifiers) | 1h work to wire |
| Add per-turn budget cap that warns at 70% weekly | ~5% | None (just a heads-up) | 2h work |
| Batch related questions into one turn | ~5% | None | Behavioral |
Combined effect: if the five cuts stack additively (50 + 15 + 10 + 5 + 5), burn drops ~85%. That takes you from 22%/day to ~3.3%/day, or about 23% per week, comfortably under the weekly cap with headroom.
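Here is a minimal sketch of the combined-effect arithmetic. It uses the lever table's estimated cuts and assumes they add up as shares of today's burn. The dictionary keys are shorthand labels only:

```python
# Estimated cuts from the lever table, as fractions of current burn.
LEVER_CUTS = {
    "opus_only_for_architecture": 0.50,
    "no_agents_for_direct_work": 0.15,
    "d4_skills_on_haiku": 0.10,
    "weekly_budget_warning": 0.05,
    "batch_questions": 0.05,
}


def burn_after_cuts(daily_pct: float, cuts: dict[str, float]) -> float:
    """Daily % of the weekly window after applying the cuts.

    Assumes the cuts are additive shares of today's burn, capped at 100%.
    """
    total_cut = min(sum(cuts.values()), 1.0)
    return daily_pct * (1.0 - total_cut)


daily = burn_after_cuts(22.0, LEVER_CUTS)
print(f"{daily:.1f}%/day, {daily * 7:.1f}%/week")  # 3.3%/day, 23.1%/week
```

The levers may overlap. For example, a turn moved from Opus to Sonnet may also be one you would have batched. In that case the real saving is smaller than the additive 85%.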

Frontier model comparison

Where you are vs the frontier today, and the cost/quality curve. Tier the work to the right model.

| Model | $/Mtok in | $/Mtok out | Frontier rank | Best for |
| --- | --- | --- | --- | --- |
| Claude Opus 4.7 | $15 | $75 | frontier | Architecture, multi-step strategy, complex code reasoning |
| Claude Sonnet 4.6 | $3 | $15 | near-frontier | Default for skilled work; 90%+ as good as Opus on most tasks |
| Claude Haiku 4.5 | $0.80 | $4 | competent | Classification, routing, simple Q&A, structured extraction |
| GPT-5 | ~$15 | ~$60 | frontier | Same tier as Opus · alternative provider |
| Gemini 2.5 Pro | $1.25 | $10 | near-frontier | Cheap research with huge context · multimodal (Loom!) · 2M tokens |
| Gemini 2.5 Flash | $0.075 | $0.30 | competent | Bulk classification · 200×+ cheaper than Opus |
| Llama 3.3 70B (local Ollama) | $0 | $0 | solid | Free fallback · slower · no quotas |
The math: Switching the same task from Opus to Sonnet = 5× cheaper. From Opus to Haiku = 19× cheaper. From Opus to Gemini Flash = 200× cheaper. Local Ollama = free.
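The sketch below recomputes those ratios from the table's list prices. The model keys are informal labels, not API model IDs:

```python
# (input, output) list prices in USD per million tokens, from the table above.
PRICES = {
    "opus-4.7": (15.0, 75.0),
    "sonnet-4.6": (3.0, 15.0),
    "haiku-4.5": (0.80, 4.0),
    "gemini-2.5-flash": (0.075, 0.30),
}


def cheaper_by(base: str, alt: str) -> tuple[float, float]:
    """How many times cheaper `alt` is than `base`, per input and per output token."""
    b_in, b_out = PRICES[base]
    a_in, a_out = PRICES[alt]
    return b_in / a_in, b_out / a_out


for model in ("sonnet-4.6", "haiku-4.5", "gemini-2.5-flash"):
    r_in, r_out = cheaper_by("opus-4.7", model)
    print(f"{model}: {r_in:.2f}x cheaper on input, {r_out:.2f}x on output")
```

The results:
- Sonnet comes out 5× cheaper on both input and output.
- Haiku comes out 18.75× cheaper (the ~19× above).
- Gemini Flash comes out 200× cheaper on input and 250× on output, hence "200×+" in the table.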

How good are we on Max-200?

You have access to everything from Haiku to Opus via the Max plan, bounded by 5-hour and weekly usage windows. The bridge can also fall back to the Gemini API and (when implemented) local Ollama. Your effective stack therefore spans every tier in the table above.

You're not behind the frontier on capability. You're ahead of most users on infrastructure (the bridge + tiered adapters is the right architecture). The problem is execution discipline — defaulting to Opus when Sonnet would do.

Tier routing strategy

Which model gets which task. The D⁴ guardrails CAN enforce this — but right now they don't. Wire it.

flowchart TD
    A[User prompt arrives] --> B{Haiku classifier}
    B -->|Architecture / strategy / multi-step| C[Opus 4.7]
    B -->|Code / draft / skilled work| D[Sonnet 4.6]
    B -->|Lookup / classify / extract| E[Haiku 4.5]
    B -->|Research / huge context| F[Gemini 2.5 Pro]
    B -->|Bulk classification| G[Gemini Flash]
    B -->|Offline / free| H[Local Ollama]
    C -.budget exhausted.-> D
    D -.budget exhausted.-> H
    E -.budget exhausted.-> G
    style C fill:#FDEBEB,stroke:#B91C1C
    style D fill:#EEF2FA,stroke:#1E40AF
    style E fill:#E6F5EE,stroke:#047857
    style F fill:#EEF2FA,stroke:#1E40AF
    style G fill:#E6F5EE,stroke:#047857
    style H fill:#fafaf7,stroke:#5a5a5a

What "wire it" means concretely

  1. Add a model-tier guard to D⁴ — single guard template at ~/.openclaw/d4/registry/model-tier-enforcement.yaml. Already referenced in the SPEC; not implemented yet.
  2. Make the dispatch-bridge accept a tier hint — caller says "tier=classifier" → bridge picks Haiku; "tier=execution" → Sonnet; "tier=strategy" → Opus.
  3. Default route is Sonnet when no tier is specified. Opus must be explicitly requested.
  4. D⁴ mental-model skills always run on Haiku; they're classifiers, not reasoners. At the listed prices, each skill fire is ~3.75× cheaper than on Sonnet (their current route) and ~19× cheaper than on Opus.
  5. Sub-agent dispatches default Sonnet — only escalate to Opus on explicit need.
Effort to wire: ~2 hours. Touch dispatch-bridge.py + skill-runner.py. Add one YAML guard. Test with a few prompts.
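Here is a minimal sketch of steps 2, 3 and 5, plus the dotted "budget exhausted" edges from the flowchart. The tier names come from this doc. The model identifier strings, `pick_model`, and the `exhausted` set (standing in for however the bridge tracks spent budgets) are hypothetical, not the real dispatch-bridge.py interface. The sketch also assumes the tier is already known, rather than modelling the Haiku classifier in front:

```python
from typing import Optional

# Tier hint -> primary model (tier names from this doc; model IDs are placeholders).
TIER_MODELS = {
    "strategy": "claude-opus-4-7",
    "execution": "claude-sonnet-4-6",
    "classifier": "claude-haiku-4-5",
}
DEFAULT_TIER = "execution"  # Sonnet unless a tier is given; Opus must be requested

# Dotted "budget exhausted" edges from the routing flowchart.
FALLBACK = {
    "claude-opus-4-7": "claude-sonnet-4-6",
    "claude-sonnet-4-6": "ollama:llama3.3",
    "claude-haiku-4-5": "gemini-2.5-flash",
}


def pick_model(tier: Optional[str], exhausted: set[str]) -> str:
    """Resolve a tier hint to a model, following fallbacks past exhausted budgets."""
    model = TIER_MODELS.get(tier or DEFAULT_TIER, TIER_MODELS[DEFAULT_TIER])
    seen: set[str] = set()
    while model in exhausted and model in FALLBACK and model not in seen:
        seen.add(model)
        model = FALLBACK[model]
    return model


print(pick_model(None, set()))                      # claude-sonnet-4-6
print(pick_model("strategy", {"claude-opus-4-7"}))  # claude-sonnet-4-6
```

Unknown or missing tiers resolve to Sonnet, so Opus is only reached through an explicit `tier="strategy"`, which is the point of step 3. The `seen` set stops the loop if the fallback map ever gains a cycle.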

What about free models?

Yes — the Ollama adapter stub already exists. To make it real:

  1. Install Ollama on your Mac and start its background server: brew install ollama, then brew services start ollama
  2. Pull a model: ollama pull llama3.3:70b (or a smaller model, such as llama3.2, for speed)
  3. Flip the ollama-local adapter to enabled: true in ~/.openclaw/state/dispatch-bridge/adapters.json
  4. Implement the 10-line dispatch() method in ollama_local.py (subprocess ollama run llama3.3)

The D⁴ guard can route "low stakes" prompts here when weekly Max budget > 80% used.
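The sketch below combines step 4 with the 80% rule. This doc doesn't show the adapter interface, so `build_ollama_cmd`, `route_local`, the 600-second timeout and the `low_stakes` flag are assumptions. Only the `ollama run <model> <prompt>` CLI call is standard Ollama usage:

```python
import subprocess

LOCAL_MODEL = "llama3.3"       # tag pulled in step 2
LOCAL_THRESHOLD_PCT = 80.0     # weekly Max usage above which low-stakes work goes local


def build_ollama_cmd(prompt: str, model: str = LOCAL_MODEL) -> list[str]:
    """Argument list for a one-shot, non-interactive `ollama run`."""
    return ["ollama", "run", model, prompt]


def dispatch(prompt: str, model: str = LOCAL_MODEL, timeout_s: int = 600) -> str:
    """Run the prompt through the local model and return its reply.

    Assumes the Ollama server is already running (e.g. via brew services).
    Raises CalledProcessError or TimeoutExpired so the bridge can fall back.
    """
    result = subprocess.run(
        build_ollama_cmd(prompt, model),
        capture_output=True, text=True, timeout=timeout_s, check=True,
    )
    return result.stdout.strip()


def route_local(weekly_pct_used: float, low_stakes: bool) -> bool:
    """True when the guard should send this prompt to Ollama instead of Claude."""
    return low_stakes and weekly_pct_used > LOCAL_THRESHOLD_PCT
```

Passing the prompt as a command-line argument keeps the adapter to a single subprocess call. For very long prompts, Ollama's local HTTP API (port 11434 by default) avoids argument-length limits.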

Sample project: token breakdown

What this very session (the cost-overage planning we're in right now) cost, phase by phase. These are estimates from turn counts and average token sizes, not metered usage.

Phases of this session (rough)

| Phase | Turns | Avg in / out per turn | Est. cost (Opus 4.7) |
| --- | --- | --- | --- |
| 1. Bleed diagnosis + watchdog build | 12 | 8k / 3k | ~$8.40 |
| 2. D⁴ rebuild + spec + worker | 15 | 12k / 5k | ~$11.25 |
| 3. Dispatch bridge build + adapters | 8 | 10k / 4k | ~$8.40 |
| 4. Tailscale Funnel + Vercel env wiring | 10 | 6k / 2k | ~$5.70 |
| 5. Control Surface v3 / v4 iterations | 14 | 15k / 7k | ~$10.50 |
| 6. Custody + wedding overnight (Agents) | 3 + agents | + sub-Claude sessions | ~$5 |
| 7. Legal research compile + design | 8 | 14k / 6k | ~$8 |
| 8. THIS doc (you're reading it) | 1 | 35k / 10k | ~$4 |
| Session total (rough) | 71 | | ~$61 |
Caveat: the Max plan is a flat $200/mo regardless of list price. These are equivalent token-burn values, used here as a proxy for how fast the weekly window fills. ~$61 list-equiv is about 3% of the ~$2,037 24h burn. At the snapshot's ratio (22% ≈ $2,037, so ~$93 per 1%), that is well under 1% of the weekly cap.

80/20 within the session

If we'd done this session on Sonnet 4.6 from the start

Same ~71 turns at Sonnet pricing ($3 in / $15 out): ~$12 total. 5× cheaper, ~95% as useful for this kind of work.

The structural fix: in Claude Code, switch the default model away from Opus 4.7 unless I'm doing architectural reasoning. Run /model claude-sonnet-4-6 in any session that's mostly execution. Switch to Opus only for hard thinking turns.

Staying on project — a feedback mechanism

You're right that the project system didn't keep you on rails. Here's a lightweight mechanism that would, with no claude subprocesses (lesson from 2026-05-24).

The problem

Mid-session, you pivoted from "cost overage planning" → "control surface design" → "Granola research" → "Mac IT watchdog" → "control surface again" → "where's that Vercel doc". Each pivot was a legitimate sub-thread, but together they consumed the session's budget on context-switching, not on the original problem.

The project framework was meant to gate this. It didn't — because there was no active feedback when you drifted. By the time we caught it, $1k+ was burned.

Design — "Context Drift Detector"

flowchart LR
    A[New user prompt] --> B[UserPromptSubmit hook]
    B --> C[Local JS classifier]
    C --> D{Topic match<br/>active project?}
    D -->|Yes, >70% similarity| E[Pass through]
    D -->|No, looks like a pivot| F[Inject system reminder]
    F --> G["Reminder:<br/>'This looks like a pivot from<br/>active project X. Add as new project,<br/>add as sub-task, or stay on X?'"]
    G --> H[Claude sees reminder, asks you]
    style C fill:#EEF2FA,stroke:#1E40AF
    style F fill:#FDEEE3,stroke:#B0410C
    style G fill:#FDEEE3,stroke:#B0410C

How it works

What's required to build it

Critical: no claude exec in this hook. Per 2026-05-24 lesson, hooks fan out and bleed. Pure regex classifier only.
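Below is a sketch of the detector under stated assumptions. The diagram names a local JS classifier; this version is in Python to match the other sketches here. It keeps to the rule above: regex tokenizing and keyword overlap only, with no model calls and no claude subprocesses.

It assumes, as Claude Code's hooks documentation describes for UserPromptSubmit, that the hook receives a JSON object with a `prompt` field on stdin and that plain stdout is added to the conversation context. The state-file path, its `name`/`keywords` layout, the stopword list and the `--hook` flag are all hypothetical:

```python
import json
import re
import sys
from pathlib import Path
from typing import Optional

# Hypothetical state file: {"name": "cost-overage", "keywords": ["cost", "opus", ...]}
ACTIVE_PROJECT_FILE = Path.home() / ".openclaw" / "state" / "active-project.json"
MATCH_THRESHOLD = 0.70  # ">70% similarity" from the design diagram
STOPWORDS = {"the", "a", "an", "and", "or", "to", "of", "for", "in", "on",
             "is", "it", "this", "that", "we", "i", "you", "can", "how", "what"}


def keywords(text: str) -> set[str]:
    """Lower-cased word tokens found by regex, minus stopwords."""
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS}


def topic_match(prompt: str, project_keywords: set[str]) -> float:
    """Share of the prompt's keywords that belong to the active project."""
    words = keywords(prompt)
    if not words:
        return 1.0  # empty or stopword-only prompts are never flagged
    return len(words & project_keywords) / len(words)


def drift_reminder(prompt: str, project_name: str,
                   project_keywords: set[str]) -> Optional[str]:
    """Reminder text when the prompt looks like a pivot, else None."""
    if topic_match(prompt, project_keywords) > MATCH_THRESHOLD:
        return None
    return (f"This looks like a pivot from active project {project_name}. "
            f"Add as new project, add as sub-task, or stay on {project_name}?")


def main() -> None:
    """Hook entry point: read the hook JSON from stdin, print a reminder if needed."""
    if not ACTIVE_PROJECT_FILE.exists():
        return  # no active project, nothing to compare against
    event = json.load(sys.stdin)  # assumed to carry the prompt under "prompt"
    project = json.loads(ACTIVE_PROJECT_FILE.read_text())
    text = drift_reminder(event.get("prompt", ""), project["name"],
                          {k.lower() for k in project["keywords"]})
    if text:
        print(text)  # assumed: this hook's stdout is added to Claude's context


if __name__ == "__main__" and "--hook" in sys.argv:
    main()
```

The score is measured against the prompt's own keywords rather than as Jaccard against the whole project vocabulary. That keeps short on-topic prompts above the 70% bar. The trade-off is that generic words missing from the stopword list drag the score down.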

Max plan #2 — SOP

How to set up a second Max-200 plan on a separate macOS user, doubling your weekly throughput. Costs $200/mo additional.

Setup steps

  1. Create a second Anthropic account
    Use a different email (e.g., your second Gmail or victor+claude2@thomasdigital.com). Sign up at claude.ai/login.
  2. Subscribe to Claude Max ($200/mo) on the new account
    This account gets its own independent 5-hour and weekly windows.
  3. Create a second macOS user
    System Settings → Users & Groups → Add User. Name it openclaw2 and give it the Administrator role. Log in as that user once to initialize its home directory.
  4. Install Claude Code on the second user
    Install with npm install -g @anthropic-ai/claude-code (or Anthropic's native installer), then sign in with the new account.
  5. Generate the OAuth token on the second user
    Run claude setup-token as openclaw2. Copy the CLAUDE_CODE_OAUTH_TOKEN value it produces.
  6. Add second token to the bridge config
    On the openclaw user's ~/.openclaw/.env, add CLAUDE_CODE_OAUTH_TOKEN_2=<token>.
  7. Register the second adapter
    Add claude-cc-oauth-2 to ~/.openclaw/state/dispatch-bridge/adapters.json. Copy the existing claude-cc-oauth adapter file as claude_cc_oauth_2.py and have it read the new token env var.
  8. Implement round-robin in the bridge
    Modify openclaw-dispatch-bridge.py adapter_for() to alternate between the two claude adapters when the model is Claude-family. Maintain a counter in state file.
  9. Add weekly-budget awareness
    Bridge checks current % weekly used per account (via heuristic — Anthropic doesn't expose this API). When account 1 hits 80%, route everything to account 2 until reset. When both at 80%, fall back to Gemini.
  10. Telegram alerts at 70% threshold per account
    The watchdog sends "Account 1 at 70%; will switch primary to Account 2 at 80%" so you know before the switch.
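Here is a minimal sketch of steps 8–10. The two adapter names come from this SOP. The "gemini" adapter name, the JSON state-file layout and the function names are hypothetical. The per-account usage figures are whatever the bridge's heuristic estimates, since no API exposes them:

```python
import json
from pathlib import Path

# Adapter names from this SOP; "gemini" and the state-file layout are hypothetical.
CLAUDE_ADAPTERS = ["claude-cc-oauth", "claude-cc-oauth-2"]
FALLBACK_ADAPTER = "gemini"
SWITCH_PCT = 80.0  # step 9: stop routing to an account at this estimated usage
ALERT_PCT = 70.0   # step 10: Telegram alert threshold


def pick_claude_adapter(state_path: Path, weekly_pct: dict[str, float]) -> str:
    """Round-robin over accounts below SWITCH_PCT; fall back to Gemini if none are.

    weekly_pct maps adapter name -> estimated weekly usage in percent.
    The round-robin counter persists in a small JSON state file.
    """
    state = json.loads(state_path.read_text()) if state_path.exists() else {"counter": 0}
    available = [a for a in CLAUDE_ADAPTERS if weekly_pct.get(a, 0.0) < SWITCH_PCT]
    if not available:
        return FALLBACK_ADAPTER
    choice = available[state["counter"] % len(available)]
    state["counter"] += 1
    state_path.write_text(json.dumps(state))
    return choice


def alert_messages(weekly_pct: dict[str, float]) -> list[str]:
    """Alert text for every account at or above ALERT_PCT (no de-duplication)."""
    return [
        f"{name} at {pct:.0f}% of weekly window; primary switches at {SWITCH_PCT:.0f}%"
        for name, pct in weekly_pct.items()
        if pct >= ALERT_PCT
    ]
```

When account 1 crosses 80%, `available` shrinks to account 2, so everything routes there, as in step 9. The alert helper fires on every check above 70%, so a real watchdog would also need to remember which alerts it has already sent.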

Cost math

| Option | $/mo | Throughput | Risk |
| --- | --- | --- | --- |
| 1× Max-200 (current) | $200 | 1× weekly window | overage as seen |
| 2× Max-200, two users | $400 | 2× independent windows | safe headroom |
| 1× Max + tier discipline (no #2) | $200 | ~3–5× effective via Haiku/Sonnet routing | solves it if discipline holds |
| Teams, 5 seats | $150 | 5× chat windows | doesn't include Claude Code CLI auth |
Recommendation: do tier-discipline first (free, this week). If that's not enough by next week's reset, then buy Max #2. Don't pay $200/mo more before exhausting the cheap fixes.

Output design system v1

Formal spec so every Vercel doc looks like it came from the same place. This doc itself is the reference implementation.

Aesthetic principles

Color palette

Ink #0a0a0a · body text
Cobalt #1E40AF · links, active, info
Burnt #B0410C · eyebrow, warn, accent
Emerald #047857 · ok, success state
Crimson #B91C1C · alert, draft, critical
Rule #e8e7e2 · dividers, borders

Rule: body is always ink. Splash colors only on labels, tags, callouts, and chart bars. Never colored body text. Never colored headlines.

Typography

Fraunces (display): h2 section titles and the page head.
Inter (body & subheads): body copy, h3 subheads, lists; the default for everything not display or mono.
JetBrains Mono (labels & code): h4 eyebrows, table headers, tags, code, KPI labels. All-caps + letter-spacing for label use.

Component library

KPI cards — for headline numbers. Fraunces value, mono label.

Tags — for state/category. Mono, small, bordered.

Variants: default, cobalt, burnt, emerald, crimson.

Pills — inline status. Smaller than tags.

Variants: ok, info, warn, alert.

Callouts — for emphasis. Left rule, soft tint.

Info callout · default cobalt left rule, soft cobalt bg
Warn callout · burnt left rule, soft burnt bg
OK callout · emerald left rule, soft emerald bg
Alert callout · crimson left rule, soft crimson bg

Tables — burnt header, mono numerics, rule dividers.

Bar viz — horizontal, labeled, single color per row.

Steps — numbered SOP with circular black step number.

Layout

Going-forward rule

Every artifact deployed to Vercel by Steve uses this shell · same fonts · same palette · same eyebrow/title/tab pattern · same component library · so docs feel like a series, not haphazard one-offs.