Skip to content

Plan: Kimi K2.6 Subagent Setup

Goal

Reduce Claude Code token consumption by delegating I/O-heavy and boilerplate tasks to Kimi K2.6 — a cheap (~1/100th the cost), OpenAI-compatible model with a 262k context window. Claude handles reasoning; Kimi handles bulk reads and simple generation.

Two CLI Scripts

scripts/ask-kimi

Routes bulk file reading and summarization to Kimi. Instead of Claude consuming thousands of tokens reading large files directly, it calls this script and gets back a focused summary.

Usage (from Claude via Bash):

python scripts/ask-kimi.py "Summarize the scheduler jobs in scheduler/jobs.py"
python scripts/ask-kimi.py "What columns does the PortfolioHolding model have?" --files db/models.py
python scripts/ask-kimi.py "Find all places that call build_features()" --files models/predictor.py models/trend_analyzer.py

scripts/kimi-write

Generates boilerplate — test scaffolding, translation strings, config entries, migration stubs. Claude reviews and refines the output rather than generating from scratch.

Usage:

python scripts/kimi-write.py "Add Korean translation strings for the new Monte Carlo page" --context dashboard/translations.py
python scripts/kimi-write.py "Write a pytest scaffold for portfolio/parser.py"

API Details

  • Base URL: https://api.moonshot.ai/v1 (OpenAI-compatible)
  • Auth: MOONSHOT_API_KEY env var
  • Model: kimi-k2.6
  • Context: 262k tokens
  • Pricing: $0.95/1M input, $4.00/1M output (cache hit: $0.16/1M input)

CLAUDE.md Routing Rules

Add a delegation section to CLAUDE.md that tells Claude when to use each script:

## Subagent Delegation (Token Efficiency)

Delegate to `ask-kimi` when:
- Reading files > 300 lines to answer a factual question
- Searching across multiple large files for a symbol or pattern
- Summarizing a module's current state before making changes

Delegate to `kimi-write` when:
- Generating test file scaffolding
- Adding translation string entries (en + ko pairs)
- Writing config boilerplate (new ticker batch, new sector keywords)
- Drafting docstrings or inline comments

Use Claude directly for:
- All architectural decisions
- Debugging logic errors
- Any task requiring cross-file reasoning or judgment
- Code that touches FEATURE_ORDER, scheduler timing, or DB schema

Implementation Checklist

  • [ ] Add MOONSHOT_API_KEY to .env
  • [ ] Write scripts/ask-kimi.py (~60 lines, OpenAI SDK, stdin or --files args)
  • [ ] Write scripts/kimi-write.py (~60 lines, OpenAI SDK, task + optional context file)
  • [ ] Add routing rules to CLAUDE.md
  • [ ] Test ask-kimi on a large file (frontend/src/pages/Overview.tsx or similar)
  • [ ] Test kimi-write on a translation string task
  • [ ] Verify Kimi handles Korean text correctly in both scripts

Notes

  • Both scripts use the standard openai Python SDK pointed at api.moonshot.ai/v1 — no new SDK needed beyond what may already be installed
  • Keep scripts simple and stateless — no session management, no streaming required
  • If Kimi quality is insufficient for a task, fallback is just not using the script — Claude reads the file directly as normal