How AI token pricing works
AI API pricing is usually split into input tokens, output tokens, cached input tokens and, in some stacks, reasoning or tool-related usage. The real commercial question is not just cost per million tokens. It is cost per request, cost per user, cost per month and whether the feature still carries an acceptable gross margin.
This calculator translates raw token prices into business numbers. That matters because most teams do not budget in token units. They budget in support tickets handled, user sessions, monthly active users and subscription revenue.
Core formulas
Input cost = (input tokens / 1,000,000) × input price
Output cost = (output tokens / 1,000,000) × output price
Cached cost = (cached tokens / 1,000,000) × cached price
Reasoning cost = (reasoning tokens / 1,000,000) × reasoning price
Cost per request = input cost + output cost + cached cost + reasoning cost
Monthly cost = cost per request × monthly requests
Cost per user = monthly cost ÷ monthly active users
Minimum viable plan price = cost per user ÷ (1 − target gross margin)
Provider billing models differ. This calculator is designed for planning and commercial sizing, not invoice reconciliation.
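The formulas above can be sketched in a few lines of Python. The prices, token counts, request volume, and margin below are illustrative placeholders, not real provider rates:

```python
# Sketch of the calculator's core formulas. All numbers are assumptions
# for illustration only.

PER_MILLION = 1_000_000

def cost_per_request(input_tokens, output_tokens, cached_tokens, reasoning_tokens,
                     input_price, output_price, cached_price, reasoning_price):
    """Dollar cost of one request, given per-million-token prices."""
    return (input_tokens / PER_MILLION * input_price
            + output_tokens / PER_MILLION * output_price
            + cached_tokens / PER_MILLION * cached_price
            + reasoning_tokens / PER_MILLION * reasoning_price)

def plan_price_floor(monthly_cost, monthly_active_users, target_gross_margin):
    """Minimum viable plan price per user at the target gross margin."""
    cost_per_user = monthly_cost / monthly_active_users
    return cost_per_user / (1 - target_gross_margin)

# Example: 2k input + 500 output tokens at assumed $3 / $15 per million
request = cost_per_request(2_000, 500, 0, 0, 3.0, 15.0, 0.0, 0.0)
monthly = request * 300_000                      # 300k requests per month
floor = plan_price_floor(monthly, 10_000, 0.8)   # 10k MAU, 80% target margin
```

With these example inputs the chain runs request → monthly → plan floor, which is the same translation from token prices to business numbers the calculator performs.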
Why output tokens often matter more
| Cost driver | Why it rises | Typical effect | Best response |
| --- | --- | --- | --- |
| Input tokens | Long prompts, system instructions, tool context | Steady request inflation | Compress prompts, trim context |
| Output tokens | Verbose answers or long generations | Often strongest unit cost driver | Set output caps, tighten style |
| Cached input | Reusable prompt blocks | Can materially cut repeated costs | Use prompt caching where available |
| RAG context | Too many or too large chunks | Prompt cost spikes | Better retrieval and chunk sizing |
| Retries | Timeouts, bad orchestration, second-pass calls | Silent cost drag | Track retry rate and tool quality |
Frequently Asked Questions
What is a token in AI pricing?
A token is a chunk of text used for billing and model processing. The exact token count depends on the model tokenizer, but for cost planning you can treat tokens as the billable unit for prompts and responses.
Why is output often more expensive than input?
Many providers price output tokens above input tokens because generation is the more expensive stage. That means long answers can push unit economics off target even when prompts are controlled.
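A quick illustration of that asymmetry, assuming input at $3 and output at $15 per million tokens (a 5x spread picked for the example; real spreads vary by provider and model):

```python
# Illustrative prices only: $3 per million input tokens, $15 per million output.
INPUT_PRICE, OUTPUT_PRICE = 3.0, 15.0

def request_cost(input_tokens, output_tokens):
    """Dollar cost of one request at the assumed prices."""
    return (input_tokens / 1e6 * INPUT_PRICE
            + output_tokens / 1e6 * OUTPUT_PRICE)

terse = request_cost(1_500, 300)      # same prompt, capped answer
verbose = request_cost(1_500, 1_800)  # same prompt, uncapped long answer
```

The prompt is identical in both calls, yet the verbose request costs several times more, which is why output caps are the first lever in the table above.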
How does caching lower cost?
When a provider supports cached input pricing, repeated prompt sections can be billed more cheaply than normal input. This matters for shared system prompts, repeated tools and repeated reference context.
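As a sketch of the effect, assume cached input bills at 10% of the normal input rate (a hypothetical discount; actual cached-input pricing differs by provider):

```python
# Hypothetical rates: $3 per million fresh input tokens, $0.30 per million cached.
INPUT_PRICE, CACHED_PRICE = 3.0, 0.30

def prompt_cost(fresh_tokens, cached_tokens):
    """Dollar cost of the prompt portion of one request."""
    return (fresh_tokens / 1e6 * INPUT_PRICE
            + cached_tokens / 1e6 * CACHED_PRICE)

# A 4k-token shared system prompt plus 1k tokens of per-request text:
no_cache = prompt_cost(5_000, 0)        # everything billed as fresh input
with_cache = prompt_cost(1_000, 4_000)  # shared block billed at cached rate
```

At these assumed rates the cached version cuts prompt cost by roughly 70%, which is why large shared system prompts are the first candidates for caching.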
Why does RAG increase spend so quickly?
Because each retrieved chunk adds tokens to the prompt. If chunk count or chunk size is loose, input cost rises sharply even before the model generates a response.
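The arithmetic is simple but compounds fast. A sketch, assuming $3 per million input tokens and hypothetical chunk sizes:

```python
# Illustrative RAG prompt sizing: retrieved chunks dominate input tokens.
INPUT_PRICE = 3.0  # assumed $ per million input tokens

def rag_input_cost(base_prompt_tokens, chunk_count, tokens_per_chunk):
    """Dollar input cost of one RAG request: base prompt plus retrieved chunks."""
    total_tokens = base_prompt_tokens + chunk_count * tokens_per_chunk
    return total_tokens / 1e6 * INPUT_PRICE

tight = rag_input_cost(800, 4, 300)   # 2,000 prompt tokens
loose = rag_input_cost(800, 12, 800)  # 10,400 prompt tokens
```

Loosening chunk count and chunk size together multiplies prompt cost more than 5x here, before a single output token is generated.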
What should I track for SaaS pricing?
Track cost per user, cost per session, retry rate, cache hit rate and minimum viable plan price at your target margin. These metrics are more useful than raw token totals in isolation.
Can I rely on built-in model presets forever?
No. Provider pricing changes, batch discounts vary and enterprise contracts differ. Presets are a starting point, so manual override should stay available.