Token Saving Techniques

CLI-focused techniques for Claude Code, Cursor Agent (agent), Codex CLI, and Gemini CLI. Output tokens cost 5× as much as input tokens across all Claude models (Sonnet 4.6: $3 in / $15 out per MTok), and thinking tokens are billed as output. Codex and Gemini show a 4×–8× asymmetry. Every technique here reduces input tokens, output tokens, or both.
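To make the asymmetry concrete, here is a minimal sketch of the per-request arithmetic using the Sonnet 4.6 prices quoted above. The function name and the token counts in the example are hypothetical.

```python
# Prices are the Sonnet 4.6 figures quoted above, in USD per million tokens.
INPUT_PRICE_PER_MTOK = 3.00
OUTPUT_PRICE_PER_MTOK = 15.00


def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one request; thinking tokens count as output."""
    input_cost = input_tokens / 1_000_000 * INPUT_PRICE_PER_MTOK
    output_cost = output_tokens / 1_000_000 * OUTPUT_PRICE_PER_MTOK
    return round(input_cost + output_cost, 6)


# Hypothetical turn: 40k tokens of context in, 2k tokens of answer plus thinking out.
print(request_cost(40_000, 2_000))  # 0.15
```

In this example the 2k output tokens account for $0.03 of the $0.15, even though they are under 5% of the tokens processed.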

🧹
Context Hygiene
01
Clear Context Between Tasks
All
Reset when switching tasks. Stale context is re-sent on every turn.
Claude Code
/clear
Cursor
/clear
Codex
/new
Gemini
/clear
Impact
Keep Your Rules File Lean
All
Your rules file is re-read on every turn: a 200-line file across a 20-turn session costs 4,000 lines of input. Keep it short by moving workflows to skills, using scoped rules, and splitting by directory.
Claude Code
CLAUDE.md
Keep under 200 lines
Move workflows to skills
Cursor
.cursor/rules/*.mdc
Each .mdc targets specific files (e.g. *.tsx) and is skipped unless you touch those files
Codex
AGENTS.md
Put one per directory: child dirs see parent rules plus their own additions
Gemini
GEMINI.md
@import to split files
Hierarchical inheritance
Impact
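The rules-file arithmetic above can be sketched as follows. Both function names are hypothetical, and the 200-line budget is the Claude Code guideline from this card rather than a limit any tool enforces.

```python
def rules_overhead(rule_lines: int, turns: int) -> int:
    """Lines of rules text sent as input over a session (re-read once per turn)."""
    return rule_lines * turns


def is_lean(rules_text: str, max_lines: int = 200) -> bool:
    """True when the rules text stays under the suggested line budget."""
    return len(rules_text.splitlines()) < max_lines


print(rules_overhead(200, 20))  # 4000, the figure quoted above
print(rules_overhead(50, 20))   # 1000 after trimming the file to 50 lines
```

A check like is_lean could run in a pre-commit step against the text of CLAUDE.md, AGENTS.md, or GEMINI.md.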
Surgical File References
All
Point to specific files — don't dump whole directories.
Claude Code
@file · /add-dir
Cursor
@file
Codex
/mention
--add-dir
Gemini
@file
Impact
Exclude Irrelevant Files
All
Stop the agent from reading noisy directories (node_modules, build output, logs). Each tool uses a config file to define what's excluded.
Claude Code
.claude/settings.json
permissions.deny list
Cursor
.cursorignore
Same syntax as .gitignore
Codex
~/.codex/config.toml
sandbox_permissions list
Gemini
.geminiignore
.gitignore also respected
Impact
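As a sketch of the ignore-file approach, the script below writes the same .gitignore-style patterns to .cursorignore and .geminiignore. The pattern list is hypothetical, and it assumes .geminiignore accepts the same pattern syntax as .cursorignore. It runs in a temporary directory so no real project is touched.

```python
import tempfile
from pathlib import Path

# Hypothetical noise patterns in .gitignore syntax; adjust to your repository.
NOISY_PATTERNS = ["node_modules/", "dist/", "build/", "*.log"]


def write_ignore_file(project_root: Path, filename: str) -> Path:
    """Write the patterns to project_root/filename, one per line."""
    target = project_root / filename
    target.write_text("\n".join(NOISY_PATTERNS) + "\n", encoding="utf-8")
    return target


# Demo in a throwaway directory; pass your repository root in real use.
demo_contents = {}
with tempfile.TemporaryDirectory() as tmp:
    for name in (".cursorignore", ".geminiignore"):
        created = write_ignore_file(Path(tmp), name)
        demo_contents[name] = created.read_text(encoding="utf-8").splitlines()

print(demo_contents)
```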
Compact Before Hitting the Limit
All
Summarise the conversation to free tokens. All four support manual and automatic compaction.
Claude Code
/compact
Cursor
/compress
Codex
/compact
Gemini
/compress
Impact
Tame Command Output Bloat
All
Pipe verbose commands through tail or head. Delegate verbose tasks to subagents. Paste only errors, not full logs.
Claude Code
| tail -30
Cursor
| tail -30
Codex
| tail -30
Gemini
| tail -30
Impact
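The same idea as `| tail -30` can be sketched in a wrapper script, for cases where a pipe is awkward. The helper names are hypothetical. The demo command prints 1,000 lines and keeps only the last three.

```python
import subprocess
import sys
from collections import deque


def tail_lines(text: str, n: int = 30) -> list[str]:
    """Keep only the last n lines, like `tail -n`."""
    return list(deque(text.splitlines(), maxlen=n))


def run_and_tail(cmd: list[str], n: int = 30) -> str:
    """Run a command and return the last n lines of stdout followed by stderr."""
    proc = subprocess.run(cmd, capture_output=True, text=True)
    return "\n".join(tail_lines(proc.stdout + proc.stderr, n))


# Hypothetical noisy command: 1,000 lines of output, of which we keep three.
noisy = [sys.executable, "-c", "for i in range(1000): print('line', i)"]
print(run_and_tail(noisy, n=3))
```

Only the trimmed text reaches the conversation, so the 997 dropped lines are never re-sent on later turns.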
Mind the Prompt Cache TTL
CC
In Claude Code, cached reads cost 10% of the input price. The default cache TTL is 5 minutes; after a cache miss the full context is re-read at full price.
Claude Code
⏱ 5-min TTL
Cache = 10% of input
Cursor
Managed internally
Codex
Responses API caching
Gemini
Managed internally
Impact
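A rough sketch of what a cache miss costs, using the 10% cached-read rate and 5-minute TTL from this card and the Sonnet 4.6 input price from the introduction. It is a simplification: it assumes the whole context is cached and ignores any cache-write surcharge. The function name and token counts are hypothetical.

```python
CACHE_TTL_SECONDS = 5 * 60     # default TTL from this card
CACHED_READ_FRACTION = 0.10    # cached reads cost 10% of the input price
INPUT_PRICE_PER_MTOK = 3.00    # Sonnet 4.6 input price, USD per million tokens


def context_read_cost(context_tokens: int, seconds_since_last_turn: float) -> float:
    """USD cost of re-reading the context, depending on whether the cache is still warm."""
    full_price = context_tokens / 1_000_000 * INPUT_PRICE_PER_MTOK
    if seconds_since_last_turn < CACHE_TTL_SECONDS:
        return round(full_price * CACHED_READ_FRACTION, 6)
    return round(full_price, 6)


# Hypothetical 100k-token context: reply after 1 minute vs. after 10 minutes.
print(context_read_cost(100_000, 60))   # 0.03
print(context_read_cost(100_000, 600))  # 0.3
```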
🔔 Enable Completion Sounds
All
Audio notifications free you from watching the terminal. Different sounds for "done" vs "needs approval". Prevents cache timeouts — you hear the bell, respond promptly, cache stays warm.
Claude Code
~/.claude.json
preferredNotifChannel
Set to "terminal_bell"
Cursor
No CLI config available
Codex
/statusline
Terminal bell on completion
Gemini
~/.gemini/settings.json
enableNotifications
Terminal bell on completion
Impact
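For the Claude Code row, a minimal sketch of setting the key named above without disturbing other settings. The file name, key, and value come from this card. The helper name and the existing "theme" setting are hypothetical, and the demo edits a temporary copy rather than the real ~/.claude.json.

```python
import json
import tempfile
from pathlib import Path


def set_notification_channel(config_path: Path, channel: str = "terminal_bell") -> dict:
    """Set preferredNotifChannel in a JSON config file, keeping other keys intact."""
    if config_path.exists():
        config = json.loads(config_path.read_text(encoding="utf-8"))
    else:
        config = {}
    config["preferredNotifChannel"] = channel
    config_path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config


# Demo against a throwaway file standing in for ~/.claude.json.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / ".claude.json"
    demo_path.write_text('{"theme": "dark"}', encoding="utf-8")  # hypothetical existing setting
    updated = set_notification_channel(demo_path)

print(updated)
```

Back up the real file before pointing a script like this at it.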
🎯
Prompting Strategy
02
Use Plan Mode First
All
Read-only analysis before coding. Uses fewer tokens. Prevents trial-and-error loops.
Claude Code
/plan
Cursor
/plan
Codex
/plan
Gemini
/plan
Impact
Feed Precise Context
All
Front-load constraints and file paths. Vague prompts trigger exploration, burning tokens to find what you already know.
Claude Code
"In src/auth.ts add JWT
using types/auth.d.ts"
Cursor
@src/auth.ts "Add JWT"
Codex
/mention files + prompt
Gemini
@file + specific prompt
Impact
Don't Paste Large Blobs
All
Pasting full docs, logs, images, or long stack traces floods your context and gets re-sent every turn. Instead: point to files by path, pipe logs through tail/head, paste only the relevant error, and use @file references for docs. Let the agent read what it needs.
Claude Code
As per title
Cursor
As per title
Codex
As per title
Gemini
As per title
Impact
Batch Related Changes
All
Combine related changes in one prompt. Each turn re-sends full context — fewer turns = fewer re-reads.
Claude Code
"Add validation, handler,
and tests for /users"
Cursor
Same — batch prompts
Codex
Same — batch prompts
Gemini
Same — batch prompts
Impact
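The effect of batching can be sketched with a simple model in which every turn re-sends the whole conversation so far. The function name and the token figures are hypothetical, and the model ignores caching and the slightly longer batched prompt.

```python
def input_tokens_sent(base_context: int, tokens_added_per_turn: int, turns: int) -> int:
    """Total input tokens when every turn re-sends the whole conversation so far."""
    total = 0
    context = base_context
    for _ in range(turns):
        total += context                  # this turn re-reads everything so far
        context += tokens_added_per_turn  # the reply and tool output grow the context
    return total


# Hypothetical: 20k tokens of context, each turn adds 2k more.
separate = input_tokens_sent(20_000, 2_000, turns=3)  # validation, handler, tests asked one by one
batched = input_tokens_sent(20_000, 2_000, turns=1)   # all three in one prompt
print(separate, batched)  # 66000 20000
```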
Watch, Don't Interrupt
All
Each interruption re-ingests context. Queue feedback, deliver in one shot. Corrected twice and still wrong? Rewrite instructions, start fresh.
Claude Code
As per title
Cursor
As per title
Codex
As per title
Gemini
As per title
Impact
Use Ask / Read-Only Mode
All
Explore code without changes — cheaper than full agent mode.
Claude Code
/plan
Cursor
/ask
Codex
/plan
Gemini
/plan
Impact
Use /btw for Side Questions
CC
Ask a quick question without adding it to the conversation history. It reuses the parent conversation's cache, so the cost is minimal.
Claude Code
/btw
Doesn't add to history
Cursor
No equivalent
Codex
No equivalent
Gemini
No equivalent
Impact
Inspect Before You Commit
Claude Code: /context · /cost   Cursor: /usage · Usage panel   Codex: /status · /statusline   Gemini: /stats · /stats model
These cost almost nothing — the insight saves entire conversations.
🔀
Model Routing
03

Use the right model for the job. In every tool, /model switches models.

Flagship
Opus 4.6 · GPT-5.4 · Gemini 3.1 Pro
Planning or complex reasoning
Balanced
Sonnet 4.6 · GPT-5.3-Codex · Gemini 3 Pro
Implementation
Fast / Light
Haiku 4.5 · GPT-5.1-Codex-mini · Gemini 3 Flash · Auto Mode
Subagents · cheap
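The tiers above can be written down as a small routing table. This is a sketch: the task-kind strings and the choice of Balanced as the fallback are assumptions, and the model names are the display names from the table, not API identifiers.

```python
# Tier table transcribed from the section above.
TIERS = {
    "Flagship": {
        "models": ["Opus 4.6", "GPT-5.4", "Gemini 3.1 Pro"],
        "use_for": {"planning", "complex reasoning"},
    },
    "Balanced": {
        "models": ["Sonnet 4.6", "GPT-5.3-Codex", "Gemini 3 Pro"],
        "use_for": {"implementation"},
    },
    "Fast / Light": {
        "models": ["Haiku 4.5", "GPT-5.1-Codex-mini", "Gemini 3 Flash", "Auto Mode"],
        "use_for": {"subagent", "cheap"},
    },
}


def pick_tier(task_kind: str) -> str:
    """Return the tier whose use cases include task_kind; fall back to Balanced (an assumption)."""
    for tier, info in TIERS.items():
        if task_kind.lower() in info["use_for"]:
            return tier
    return "Balanced"


print(pick_tier("planning"), "|", pick_tier("subagent"))
```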
🏗
Agent Architecture
04
Subagents = Fresh Context
All
Each subagent starts with a fresh, empty context.
Claude Code
Ask in the prompt to run the task in a subagent
Cursor
Ask in the prompt to run the task in a subagent
Codex
Ask in the prompt to run the task in a subagent
Gemini
Ask in the prompt to run the task in a subagent
Impact
Disconnect Unused MCPs
All
Every MCP injects tool schemas into context. Disable what you're not using. All four support per-server tool filtering.
Claude Code
/mcp
Cursor
/mcp
Codex
/mcp
Gemini
/mcp
Impact
Prefer Native CLIs Over MCPs
All
git, npm, docker — run directly. No tool schema overhead.
Claude Code
As per title
Cursor
As per title
Codex
As per title
Gemini
As per title
Impact
Manage Plugins
All
Keep plugins to a minimum. Each one injects tool schemas into your context window every turn.
Claude Code
/plugins
Manage from here
Cursor
/plugins
Manage from here
Codex
/plugins
Manage from here
Gemini
No plugin system
Impact
Use Skills
All
Skills load on-demand instead of every turn. Move workflows into skills to keep base context small.
Claude Code
.claude/skills/*/SKILL.md
Cursor
.cursor/skills/
Codex
.codex/skills/*/SKILL.md
Gemini
.gemini/skills/
Impact
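A sketch of moving a workflow out of the rules file into a skill, using the Claude Code path from this card. The frontmatter fields (name, description) are an assumption about the SKILL.md format, and the skill name and steps are hypothetical. The demo writes into a temporary directory.

```python
import tempfile
from pathlib import Path


def scaffold_skill(project_root: Path, name: str, description: str, steps: list[str]) -> Path:
    """Create .claude/skills/<name>/SKILL.md with frontmatter and numbered steps."""
    skill_dir = project_root / ".claude" / "skills" / name
    skill_dir.mkdir(parents=True, exist_ok=True)
    # Assumed frontmatter: a name and a short description used to decide when to load the skill.
    lines = ["---", f"name: {name}", f"description: {description}", "---", "", "# Steps", ""]
    lines += [f"{i}. {step}" for i, step in enumerate(steps, start=1)]
    skill_file = skill_dir / "SKILL.md"
    skill_file.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return skill_file


# Demo in a throwaway directory; pass your repository root in real use.
with tempfile.TemporaryDirectory() as tmp:
    created = scaffold_skill(
        Path(tmp),
        "release-notes",  # hypothetical skill
        "Draft release notes from merged changes",
        ["Run git log since the last tag", "Group changes by area", "Write the notes"],
    )
    skill_path = created.relative_to(tmp).as_posix()
    skill_text = created.read_text(encoding="utf-8")

print(skill_path)
print(skill_text)
```

The workflow text now lives in a file that is loaded on demand instead of in the rules file that is re-read every turn.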
Reduce Thinking Effort
All
Thinking/reasoning tokens bill as output (5× input). Lower effort for simple tasks.
Claude Code
/effort
Menu to set low/med/high/max
Cursor
/max-mode [on|off]
Codex
/model
Menu to pick model + effort
Gemini
/model manage
Menu to pick model + effort
Impact
Automate With Hooks
All
Hooks run shell commands on agent events — block expensive tool calls, filter verbose output, trigger early compaction. All four CLIs support them.
Claude Code
.claude/settings.json
PreCompact · PreToolUse
30+ events · 4 hook types
Cursor
.cursor/hooks.json
beforeShellExecution
beforeReadFile · afterFileEdit
Codex
.codex/hooks.json
PreToolUse · PostToolUse
5 events · command type
Gemini
.gemini/settings.json
PreCompress · BeforeTool
11 events · migrate from CC
Impact
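As an illustration of a hook that blocks expensive tool calls, here is a sketch of a PreToolUse script for Claude Code. It assumes the hook receives a JSON event on stdin with tool_name and tool_input.command fields, and that exit code 2 blocks the call and returns stderr to the agent; check these against the hooks documentation for your version. The list of noisy commands is hypothetical.

```python
import io
import json
import sys
from typing import Optional

# Hypothetical commands that tend to flood the context when run unfiltered.
NOISY_COMMANDS = ("npm test", "docker logs", "cat ")


def block_reason(command: str) -> Optional[str]:
    """Return a message if a noisy command is not piped through tail/head, else None."""
    trimmed = "| tail" in command or "| head" in command
    if command.startswith(NOISY_COMMANDS) and not trimmed:
        return f"Pipe `{command}` through tail or head, e.g. `{command} | tail -30`."
    return None


def main(stream=sys.stdin) -> int:
    """Read one hook event and return the exit code (assumed: 2 blocks, 0 allows)."""
    raw = stream.read()
    if not raw.strip():
        return 0
    event = json.loads(raw)
    if event.get("tool_name") != "Bash":
        return 0
    reason = block_reason(event.get("tool_input", {}).get("command", ""))
    if reason:
        print(reason, file=sys.stderr)
        return 2
    return 0


# Demo with simulated events; an installed hook script would end with sys.exit(main()).
blocked = main(io.StringIO(json.dumps({"tool_name": "Bash", "tool_input": {"command": "npm test"}})))
allowed = main(io.StringIO(json.dumps({"tool_name": "Bash", "tool_input": {"command": "npm test | tail -30"}})))
print(blocked, allowed)
```

The script would be registered under the PreToolUse event in .claude/settings.json. The other CLIs use their own event names and payloads, so the parsing in main would need adapting.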
💰
Cost & Limit Management
05
Persist Decisions & Learnings
All
Write decisions and learnings to a rules or memory file. Without this, the next session re-derives everything from scratch.
Claude Code
CLAUDE.md
/memory
Cursor
.cursor/rules/
Codex
AGENTS.md
Gemini
GEMINI.md
/memory add
Impact
Track Spend Per Session
All
Know your burn rate. Each tool has a built-in cost inspector.
Claude Code
/cost
Cursor
/usage
Codex
/status · /statusline
Gemini
/stats · /footer
/footer shows token info in the prompt
Impact
Cap Context Window & Thinking Budget
CC
Adaptive thinking burns output tokens at 5× the input price. Tune the context window and thinking budget via env vars.
Claude Code
CLAUDE_CODE_DISABLE_1M_CONTEXT=1
CLAUDE_CODE_AUTO_COMPACT_WINDOW=50000
CLAUDE_CODE_DISABLE_ADAPTIVE_THINKING=1
MAX_THINKING_TOKENS=<n>
Add to .zshrc / .bashrc
Cursor
No equivalent
Codex
No equivalent
Gemini
No equivalent
Impact
Resume Instead of Rebuilding
All
Resume a previous session with full transcript instead of re-explaining from scratch.
Claude Code
/resume
Cursor
/resume
Codex
/resume
Gemini
/resume
Impact
Schedule Around Peak Hours
CC · CU
Rate limits can be tighter during peak hours. Use schedules and automations to run batch work off-peak.
Claude Code
/schedule
Use schedules
Cursor
Use automations
Codex
No equivalent
Gemini
No equivalent
Impact
🧠
The Meta-Rule
Every token is re-read every turn. A 50-line rules file across a 20-turn session is read 20 times — 1,000 lines from one file. Context isn't cost-per-message, it's cost × conversation length. Keep it tight, clear often, persist decisions, use the cheapest mode that works.
Quick Command Reference
Claude Code
/clear · /compact · /cost
/context · /plan · /effort
/model · /memory · /resume
permissions.deny
Shift+Tab
hooks
CLAUDE_CODE_*
Cursor Agent CLI
/clear · /plan · /ask
/model · /compress · /resume
@file · & msg → cloud
.cursor/rules/ · .cursorignore
/mcp · /usage
hooks.json
Codex CLI
/new · /compact · /status
/model · /statusline · /diff
/mention · /agent · /review
AGENTS.md
!cmd · Tab
hooks.json
Gemini CLI
/clear · /compress · /stats
/plan · /model · /memory
/mcp · /agents · @file
GEMINI.md · .geminiignore
!cmd · /resume
hooks