AI Cost Control for SaaS — How to Avoid the OpenAI Bill Shock
AI features can blow up gross margin if you don't engineer cost from day one. The five patterns that decide whether AI helps or hurts your P&L.
Every SaaS founder I work with eventually asks the same question: "How do I keep AI from eating my gross margin?"
Five patterns separate teams that have AI-cost discipline from teams that don't.
1. Tier routing by task difficulty
Not every task needs GPT-4o or Claude Sonnet. Most don't.
- Cheap tier (Haiku, GPT-4o-mini, Gemini Flash): classification, formatting, simple extraction. $0.001–0.005 per call.
- Mid tier (GPT-4o, Sonnet): general user-facing chat, RAG synthesis. $0.01–0.05 per call.
- Reasoning tier (o1, o3, Opus): hard problems that need actual thought. $0.10–1.00 per call.
Result: 50–80% cost reduction on most features.
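A minimal sketch of the routing idea: tag each request with a task type and map it to the cheapest tier that handles it acceptably. Model names, the task-type labels, and the per-call costs below are illustrative placeholders, not quotes from any provider.

```python
# Illustrative tiers; prices are rough per-call figures, not provider pricing.
TIERS = {
    "cheap": {"model": "gpt-4o-mini", "approx_cost_per_call": 0.003},
    "mid": {"model": "gpt-4o", "approx_cost_per_call": 0.03},
    "reasoning": {"model": "o1", "approx_cost_per_call": 0.50},
}

# Map each task type to the cheapest tier that clears the quality bar for it.
TASK_TIER = {
    "classify": "cheap",
    "format": "cheap",
    "extract": "cheap",
    "chat": "mid",
    "rag_synthesis": "mid",
    "plan": "reasoning",
}

def route(task_type: str) -> str:
    """Return the model for a task; unknown tasks default to the mid tier."""
    tier = TASK_TIER.get(task_type, "mid")
    return TIERS[tier]["model"]
```

The default-to-mid choice is deliberate: an unrecognized task should not silently land on the most expensive tier.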
2. Aggressive prompt caching
Anthropic and OpenAI both offer prompt caching. If your prompt has any repeated prefix — system prompt, RAG context, few-shot examples — cache it.
Anthropic gives you 90% off cached tokens. For RAG-heavy use cases, this is a 3–5x cost reduction immediately.
Pattern: put the variable user input at the END of the prompt; everything before it can be cached.
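A sketch of that layout as a request builder: the stable prefix (system prompt, RAG context) comes first with a cache marker, and the variable user input goes last so it never invalidates the cached prefix. The `cache_control` shape follows Anthropic's documented prompt-caching format; the model name is a placeholder, and you should verify the exact fields against the current API docs.

```python
def build_request(system_prompt: str, rag_context: str, user_input: str) -> dict:
    """Assemble a request body with a cacheable prefix and variable suffix."""
    return {
        "model": "claude-sonnet-4",  # illustrative model name
        "max_tokens": 1024,
        "system": [
            # Stable prefix: identical across calls, so it can be cached.
            {"type": "text", "text": system_prompt},
            {"type": "text", "text": rag_context,
             "cache_control": {"type": "ephemeral"}},
        ],
        # Variable part last: changing it leaves the cached prefix intact.
        "messages": [{"role": "user", "content": user_input}],
    }
```

The key discipline is byte-stable prefixes: any per-request value (timestamps, user names) injected before the cache marker breaks the cache on every call.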
3. Per-user rate limits with budgets
Every feature has a per-user rate limit AND a per-user monthly budget. Both are enforced server-side.
When a user hits their limit, the feature degrades gracefully and tells them why. They don't break the bank.
Default budgets I use:
- Free tier: $0.50/user/month
- Paid tier: $5–25/user/month depending on plan
- Enterprise: negotiated, with overage billing
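A minimal server-side sketch of the budget check. In production this would sit behind a database or Redis counter and reset monthly; the plan names and caps here mirror the defaults above, and everything else is a placeholder.

```python
from dataclasses import dataclass

# USD per user per month; mirrors the default budgets listed above.
PLAN_BUDGETS = {"free": 0.50, "pro": 5.00, "business": 25.00}

@dataclass
class UserBudget:
    plan: str
    spent: float = 0.0

    def charge(self, cost: float) -> bool:
        """Record spend if within budget; return False once the cap is hit."""
        if self.spent + cost > PLAN_BUDGETS[self.plan]:
            return False  # caller degrades the feature instead of erroring
        self.spent += cost
        return True
```

Returning a boolean (rather than raising) pushes the graceful-degradation decision to the feature code, where the user-facing message lives.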
4. Eval-driven model swapping
Without an eval harness, you can't safely move to cheaper models. With one, you can.
Build the eval. Run quarterly: "Can we move this feature to a cheaper model and stay within quality bar?"
Often yes. The market keeps shipping cheaper-equivalent models.
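The quarterly check can be as simple as this harness sketch: run a fixed test set through the candidate model and swap only if it clears the incumbent's quality bar. `call_model` and the grading function are placeholders you supply; the test-case shape is an assumption for illustration.

```python
def eval_model(call_model, cases, grade, quality_bar: float) -> bool:
    """Return True if the candidate's pass rate meets the quality bar.

    call_model: fn(input_text) -> output_text (the candidate model)
    cases: list of dicts with at least an "input" key
    grade: fn(case, output) -> bool, the per-case pass/fail judgment
    """
    passed = sum(grade(case, call_model(case["input"])) for case in cases)
    return passed / len(cases) >= quality_bar
```

With a harness like this, "can we drop to the cheaper tier?" becomes a one-line question instead of a gut call.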
5. RAG over fine-tuning, fine-tuning over big-context
For knowledge-heavy use cases, fight the urge to dump everything into context. RAG is cheaper at scale.
For format/style use cases, fine-tune a small model. Cheaper than calling Sonnet for every request.
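The "RAG is cheaper at scale" claim is just token arithmetic. A rough sketch, using an illustrative per-million-token input price (not any provider's actual rate): every call that carries the whole knowledge base pays for all of it, while retrieval pays only for the top-k chunks.

```python
PRICE_PER_M_INPUT = 3.00  # illustrative mid-tier input price, USD per 1M tokens

def context_cost(tokens_in_context: int) -> float:
    """Input-token cost of one call carrying this much context."""
    return tokens_in_context / 1_000_000 * PRICE_PER_M_INPUT

full_dump = context_cost(150_000)  # whole knowledge base on every call
rag = context_cost(4_000)          # top-k retrieved chunks per call
```

At these assumed numbers the full-context call costs dozens of times more per request, and the gap compounds with every call.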
The math founders miss
Say your AI feature costs $0.05 per use and a user averages 10 uses a day: $0.05 × 10 × 30 = $15/user/month.
If your SaaS is $20/month, that leaves $5 for everything else (servers, support, marketing, profit). That's tight.
At $0.10 per use, you're losing money on every active user.
Calculate this before launching the feature. Most founders don't.
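The back-of-envelope check above, as a function you can run before launch. The example figures are the ones from the text.

```python
def ai_margin(price_per_month: float, cost_per_use: float,
              uses_per_day: float, days: int = 30) -> float:
    """Monthly revenue per user minus monthly AI spend per user."""
    return price_per_month - cost_per_use * uses_per_day * days

# $20 plan, $0.05/use, 10 uses/day -> $5/user left for everything else
# $20 plan, $0.10/use, 10 uses/day -> -$10/user: losing money
```

If the result is negative, or thinner than your non-AI cost base, the feature needs tier routing, caching, or a budget cap before it ships.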
Live patterns from my products
- Maeum Campfire: prompt caching for character context. 4x cost reduction.
- MAEUM Learn: tier routing across 10 LLM providers. Free tier sustainable at $0/user.
- GC OS: cheap tier for routine extraction, Sonnet for complex AI assistant queries.
What I build into every AI integration
Every AI feature I ship includes:
- Per-user rate limit + monthly budget
- Tier routing logic
- Prompt caching where applicable
- Cost dashboard (per feature, per user)
- Eval harness for quality regression detection
Email [email protected] to scope yours.