
Week 9 — Using local LLMs to replace API spend

$211 to $19 in one routing rule — same workflows, no quality loss.

📹 Video shipping soon

Drafted + scripted. Recording in queue. Subscribe to the Founder Plan to be notified the moment it lands.

What you'll walk away with

A scene-by-scene walkthrough of every beat in the video. Plain English. Real numbers. Real workflow.

The walkthrough

Week 9 — The bill that breaks every AI hobby project.
This is what happens when every task hits the cloud.
Two questions. Is it novel? Does it need fresh facts? If no, it stays local.
Six task types that NEVER need the cloud.
Four task types that DO need the cloud — for now.
Install Ollama. Three steps. Free.
Pull two models. ~5GB each. One time.
It works. Locally. No internet.
Same API as OpenAI. Drop-in replacement.
Local Model Dojo benchmarks 109 models per task per GPU.
Your hardware. Your task. The one model that wins.
The Router. One file. Forty lines.
Step one: try local.
Step two: gate on confidence.
Step three: budget check before any cloud call.
Step four: Claude. Haiku for fast, Sonnet for hard.
Step five: log every call. Path. Model. Tokens. Time. Cost.
Save this matrix. It is the entire decision tree.
Workflow one: contact form intent. 1000 calls.
Cloud-only: $1.10. Hybrid: $0.006. 99% reduction.
Workflow two: ticket summaries. 5000 tickets.
Workflow three: cross-doc reasoning. This one stays cloud.
Three places local genuinely loses today: long context, cutting-edge reasoning, frontier coding.
90/10 is the target ratio. Track it weekly.
Your log table is your dashboard. Pivot it weekly.
Embeddings local. Forever. Always.
RAG flips the math. The retrieval is local. The answer can be either.
Cache identical prompts for an hour. Free re-runs.
Failure mode one: malformed local output. Always validate.
Failure mode two: Ollama down. Health-check before routing.
Failure mode three: GPU out of memory. Cap concurrency.
Real bill. Same workflows. 91% reduction.
LMD picks the model. You write the routing. Done in an afternoon.
Founder pricing. $24/mo locked for life. Code LAUNCH30.
Four bugs that bite hybrid setups. Save this clip.
Week 10: take this stack to production. Domain. HTTPS. Real traffic.
localmodeldojo.com — code LAUNCH30.
Education series. Module 9 of 12. Subscribe.
(silent)
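
The five router steps in the walkthrough can be sketched roughly as follows. This is a minimal illustration, not the file from the video: `Reply`, `Router`, the 0.8 confidence threshold, the $5.00 budget, the per-call cost estimate, and the `haiku` / `sonnet` tier labels are all assumptions made for the example. The local and cloud callers are stand-in functions so the sketch runs without Ollama or an API key.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Reply:
    text: str
    confidence: float      # 0..1; how you score it is up to you (assumption)
    tokens: int
    cost_usd: float = 0.0  # local calls are treated as free


@dataclass
class Router:
    local: Callable[[str], Reply]       # e.g. a wrapper around a local model
    cloud: Callable[[str, str], Reply]  # (tier, prompt) -> Reply
    budget_usd: float = 5.00            # hypothetical spending cap
    min_confidence: float = 0.8         # hypothetical gate
    spent_usd: float = 0.0
    log: list = field(default_factory=list)

    def _record(self, path: str, model: str, reply: Reply, started: float) -> None:
        # Step five: log every call. Path, model, tokens, time, cost.
        self.log.append({
            "path": path,
            "model": model,
            "tokens": reply.tokens,
            "seconds": round(time.monotonic() - started, 3),
            "cost_usd": reply.cost_usd,
        })

    def route(self, prompt: str, hard: bool = False,
              est_cost_usd: float = 0.01) -> Reply:
        started = time.monotonic()

        # Step one: try local. A crash counts as "no local answer".
        reply: Optional[Reply]
        try:
            reply = self.local(prompt)
        except Exception:
            reply = None

        # Step two: gate on confidence.
        if reply is not None and reply.confidence >= self.min_confidence:
            self._record("local", "local", reply, started)
            return reply

        # Step three: budget check before any cloud call.
        if self.spent_usd + est_cost_usd > self.budget_usd:
            if reply is None:
                raise RuntimeError("local failed and the cloud budget is spent")
            self._record("local-over-budget", "local", reply, started)
            return reply

        # Step four: pick the cloud tier (labels only, not real model IDs).
        tier = "sonnet" if hard else "haiku"
        cloud_reply = self.cloud(tier, prompt)
        self.spent_usd += cloud_reply.cost_usd
        self._record("cloud", tier, cloud_reply, started)
        return cloud_reply


# Stand-in callers for demonstration only.
def fake_local(prompt: str) -> Reply:
    sure = "buy now" in prompt
    return Reply("spam", confidence=0.95 if sure else 0.4, tokens=12)


def fake_cloud(tier: str, prompt: str) -> Reply:
    return Reply("support request", confidence=1.0, tokens=40, cost_usd=0.002)


router = Router(local=fake_local, cloud=fake_cloud)
router.route("buy now!!!")
router.route("my invoice looks wrong")
print([row["path"] for row in router.log])  # ['local', 'cloud']
```

The `log` list is the raw material for the weekly pivot: group it by `path` to see how close you are to the 90/10 target.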

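The "cache identical prompts for an hour" beat could look something like the sketch below. It is one possible shape under stated assumptions: the `PromptCache` class, its in-memory dictionary, and the choice to key on a SHA-256 hash of model name plus prompt are all illustrative and not taken from the video.

```python
import hashlib
import time
from typing import Callable, Dict, Tuple


class PromptCache:
    """In-memory cache: identical (model, prompt) pairs are reused for ttl_seconds."""

    def __init__(self, ttl_seconds: float = 3600.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl = ttl_seconds
        self.clock = clock  # injectable so expiry can be tested without waiting
        self._store: Dict[str, Tuple[float, str]] = {}

    @staticmethod
    def _key(model: str, prompt: str) -> str:
        # The NUL separator keeps ("ab", "c") distinct from ("a", "bc").
        raw = f"{model}\x00{prompt}".encode("utf-8")
        return hashlib.sha256(raw).hexdigest()

    def get_or_call(self, model: str, prompt: str,
                    call: Callable[[str], str]) -> str:
        key = self._key(model, prompt)
        now = self.clock()
        hit = self._store.get(key)
        if hit is not None and now - hit[0] < self.ttl:
            return hit[1]            # fresh entry: free re-run
        answer = call(prompt)        # miss or expired: pay for one real call
        self._store[key] = (now, answer)
        return answer


# Demonstration with a stand-in model that counts how often it is invoked.
calls = []


def fake_model(prompt: str) -> str:
    calls.append(prompt)
    return prompt.upper()


cache = PromptCache()
cache.get_or_call("local-model", "classify: hello", fake_model)
cache.get_or_call("local-model", "classify: hello", fake_model)
print(len(calls))  # 1: the second request was served from the cache
```

A cache like this only pays off for deterministic or low-temperature tasks; for anything that needs fresh facts, skip it or shorten the TTL.
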
Cut your AI bill 90%+

I built Local Model Dojo so the same kind of bot work that costs $100/mo on the Claude or OpenAI API runs on hardware you already own. Founder pricing locks in for the life of your account.

See Founder pricing → Code LAUNCH30 · 30-day window


— Jake Morris · Oklahoma veteran · localmodeldojo.com