AI Token Cost Calculator
Estimate cost per request, per user, and per month; compare models; measure cached-token savings; and price your AI feature with margin.

How AI token pricing works

AI API pricing is usually split into input tokens, output tokens, cached input tokens and, in some stacks, reasoning or tool-related usage. The real commercial question is not just cost per million tokens. It is cost per request, cost per user, cost per month and whether the feature still carries an acceptable gross margin.

This calculator translates raw token prices into business numbers. That matters because most teams do not budget in token units. They budget in support tickets handled, user sessions, monthly active users and subscription revenue.

Core formulas

Input cost = (input tokens / 1,000,000) × input price

Output cost = (output tokens / 1,000,000) × output price

Cached cost = (cached tokens / 1,000,000) × cached price

Reasoning cost = (reasoning tokens / 1,000,000) × reasoning price

Cost per request = input cost + output cost + cached cost + reasoning cost

Monthly cost = cost per request × monthly requests

Cost per user = monthly cost ÷ monthly active users

Minimum viable plan price = cost per user ÷ (1 − target gross margin)
Provider billing models differ. This calculator is designed for planning and commercial sizing, not invoice reconciliation.
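The core formulas above can be sketched in code. This is a minimal Python version for planning math; the model prices and volumes in the example are illustrative assumptions, not any provider's real rates.

```python
# Sketch of the core formulas above. Prices are USD per 1M tokens.
# All concrete numbers below are illustrative assumptions.

PER_MILLION = 1_000_000

def cost_per_request(tokens, prices):
    """Sum each token class (input, output, cached, reasoning) at its rate."""
    return sum(tokens.get(k, 0) / PER_MILLION * prices.get(k, 0.0)
               for k in ("input", "output", "cached", "reasoning"))

def monthly_cost(request_cost, monthly_requests):
    return request_cost * monthly_requests

def cost_per_user(month_cost, monthly_active_users):
    return month_cost / monthly_active_users

def min_plan_price(user_cost, target_gross_margin):
    """Minimum viable plan price at a target gross margin (0 <= margin < 1)."""
    return user_cost / (1 - target_gross_margin)

# Hypothetical model priced at $3 / $15 / $0.75 per 1M tokens.
prices = {"input": 3.0, "output": 15.0, "cached": 0.75}
tokens = {"input": 1_200, "output": 600, "cached": 2_000}

req = cost_per_request(tokens, prices)   # per-request unit cost
month = monthly_cost(req, 300_000)       # 300k requests / month
user = cost_per_user(month, 10_000)      # 10k monthly active users
plan = min_plan_price(user, 0.80)        # 80% target gross margin
```

Chaining the formulas this way makes the margin sensitivity obvious: every change to a token price or request volume flows straight through to the minimum viable plan price.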

Why output tokens often matter more

Cost driver | Why it rises | Typical effect | Best response
Input tokens | Long prompts, system instructions, tool context | Steady request inflation | Compress prompts, trim context
Output tokens | Verbose answers or long generations | Often the strongest unit cost driver | Set output caps, tighten style
Cached input | Reusable prompt blocks | Can materially cut repeated costs | Use prompt caching where available
RAG context | Too many or too large chunks | Prompt cost spikes | Better retrieval and chunk sizing
Retries | Timeouts, bad orchestration, second-pass calls | Silent cost drag | Track retry rate and tool quality
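The output-token row can be made concrete with quick arithmetic. The 5x input/output price gap below is an assumption for illustration, though gaps of that order are common.

```python
# Assumed prices: input $3, output $15 per 1M tokens (hypothetical 5x gap).
input_price, output_price = 3.0, 15.0
input_tokens, output_tokens = 2_000, 800

input_cost = input_tokens / 1_000_000 * input_price     # 0.006 per request
output_cost = output_tokens / 1_000_000 * output_price  # 0.012 per request

# The answer is 2.5x shorter than the prompt yet costs twice as much,
# which is why output caps are usually the first lever to pull.
```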

Frequently Asked Questions

What is a token in AI pricing?
A token is a chunk of text used for billing and model processing. The exact token count depends on the model tokenizer, but for cost planning you can treat tokens as the billable unit for prompts and responses.
Why is output often more expensive than input?
Many providers price output tokens above input tokens because generation is the more expensive stage. That means long answers can push unit economics off target even when prompts are controlled.
How does caching lower cost?
When a provider supports cached input pricing, repeated prompt sections can be billed more cheaply than normal input. This matters for shared system prompts, repeated tools and repeated reference context.
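As a worked example, assume a hypothetical $3 / 1M normal input rate and a $0.75 / 1M cached rate for a reusable 4,000-token system prompt:

```python
# Hypothetical rates: normal input $3 / 1M tokens, cached input $0.75 / 1M.
# A reusable 4,000-token system prompt served from cache:
normal = 4_000 / 1_000_000 * 3.0    # 0.012 per request at the normal rate
cached = 4_000 / 1_000_000 * 0.75   # 0.003 per request at the cached rate
saving = 1 - cached / normal        # 75% off the repeated portion
```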
Why does RAG increase spend so quickly?
Because each retrieved chunk adds tokens to the prompt. If chunk count or chunk size is loose, input cost rises sharply even before the model generates a response.
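The growth is linear in both chunk count and chunk length, which a quick estimate shows. All token counts here are assumed for illustration:

```python
# RAG prompt size grows linearly with chunk count and chunk length.
system, question = 800, 120      # system prompt and query tokens (assumed)
chunks, chunk_len = 8, 900       # 8 retrieved chunks of ~900 tokens each

prompt_tokens = system + chunks * chunk_len + question  # 8,120 tokens

# Retrieval accounts for the large majority of the prompt here, so
# halving chunk count or length roughly halves the input cost
# before the model generates a single answer token.
```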
What should I track for SaaS pricing?
Track cost per user, cost per session, retry rate, cache hit rate and minimum viable plan price at your target margin. These metrics are more useful than raw token totals in isolation.
Can I rely on built-in model presets forever?
No. Provider pricing changes, batch discounts vary and enterprise contracts differ. Presets are a starting point, so manual override should stay available.