How AI token pricing works
AI API pricing is usually split into input tokens, output tokens, cached input tokens and, in some stacks, reasoning or tool-related usage. The real commercial question is not just cost per million tokens. It is cost per request, cost per user, cost per month and whether the feature still carries an acceptable gross margin.
This calculator translates raw token prices into business numbers. That matters because most teams do not budget in token units. They budget in support tickets handled, user sessions, monthly active users and subscription revenue.
Core formulas
Input cost = (input tokens / 1,000,000) × input price
Output cost = (output tokens / 1,000,000) × output price
Cached cost = (cached tokens / 1,000,000) × cached price
Reasoning cost = (reasoning tokens / 1,000,000) × reasoning price
Cost per request = input cost + output cost + cached cost + reasoning cost
Monthly cost = cost per request × monthly requests
Cost per user = monthly cost ÷ monthly active users
Minimum viable plan price = cost per user ÷ (1 − target gross margin)
Provider billing models differ. This calculator is designed for planning and commercial sizing, not invoice reconciliation.
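The formulas above can be sketched in a few lines of Python. The prices, token counts, request volume, and margin below are illustrative placeholders, not real provider rates:

```python
# Sketch of the calculator's core formulas. All numbers are assumptions
# for illustration only.

PER_MILLION = 1_000_000

def cost_per_request(input_tokens, output_tokens, cached_tokens, reasoning_tokens,
                     input_price, output_price, cached_price, reasoning_price):
    """Dollar cost of one request, given per-million-token prices."""
    return (input_tokens / PER_MILLION * input_price
            + output_tokens / PER_MILLION * output_price
            + cached_tokens / PER_MILLION * cached_price
            + reasoning_tokens / PER_MILLION * reasoning_price)

def plan_price_floor(monthly_cost, monthly_active_users, target_gross_margin):
    """Minimum viable plan price per user at the target gross margin."""
    cost_per_user = monthly_cost / monthly_active_users
    return cost_per_user / (1 - target_gross_margin)

# Example: 2k input + 500 output tokens at assumed $3 / $15 per million
request = cost_per_request(2_000, 500, 0, 0, 3.0, 15.0, 0.0, 0.0)
monthly = request * 300_000                      # 300k requests per month
floor = plan_price_floor(monthly, 10_000, 0.8)   # 10k MAU, 80% target margin
```

With these example inputs the chain runs request → monthly → plan floor, which is the same translation from token prices to business numbers the calculator performs.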
Why output tokens often matter more
| Cost driver | Why it rises | Typical effect | Best response |
| --- | --- | --- | --- |
| Input tokens | Long prompts, system instructions, tool context | Steady request inflation | Compress prompts, trim context |
| Output tokens | Verbose answers or long generations | Often strongest unit cost driver | Set output caps, tighten style |
| Cached input | Reusable prompt blocks | Can materially cut repeated costs | Use prompt caching where available |
| RAG context | Too many or too large chunks | Prompt cost spikes | Better retrieval and chunk sizing |
| Retries | Timeouts, bad orchestration, second-pass calls | Silent cost drag | Track retry rate and tool quality |
Frequently Asked Questions
What is a token in AI pricing?
A token is a chunk of text used for billing and model processing. The exact token count depends on the model tokenizer, but for cost planning you can treat tokens as the billable unit for prompts and responses.
Why is output often more expensive than input?
Many providers price output tokens above input tokens because generation is the more expensive stage. That means long answers can push unit economics off target even when prompts are controlled.
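A quick illustration of that asymmetry, assuming input at $3 and output at $15 per million tokens (a 5x spread picked for the example; real spreads vary by provider and model):

```python
# Illustrative prices only: $3 per million input tokens, $15 per million output.
INPUT_PRICE, OUTPUT_PRICE = 3.0, 15.0

def request_cost(input_tokens, output_tokens):
    """Dollar cost of one request at the assumed prices."""
    return (input_tokens / 1e6 * INPUT_PRICE
            + output_tokens / 1e6 * OUTPUT_PRICE)

terse = request_cost(1_500, 300)      # same prompt, capped answer
verbose = request_cost(1_500, 1_800)  # same prompt, uncapped long answer
```

The prompt is identical in both calls, yet the verbose request costs several times more, which is why output caps are the first lever in the table above.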
How does caching lower cost?
When a provider supports cached input pricing, repeated prompt sections can be billed more cheaply than normal input. This matters for shared system prompts, repeated tools and repeated reference context.
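As a sketch of the effect, assume cached input bills at 10% of the normal input rate (a hypothetical discount; actual cached-input pricing differs by provider):

```python
# Hypothetical rates: $3 per million fresh input tokens, $0.30 per million cached.
INPUT_PRICE, CACHED_PRICE = 3.0, 0.30

def prompt_cost(fresh_tokens, cached_tokens):
    """Dollar cost of the prompt portion of one request."""
    return (fresh_tokens / 1e6 * INPUT_PRICE
            + cached_tokens / 1e6 * CACHED_PRICE)

# A 4k-token shared system prompt plus 1k tokens of per-request text:
no_cache = prompt_cost(5_000, 0)        # everything billed as fresh input
with_cache = prompt_cost(1_000, 4_000)  # shared block billed at cached rate
```

At these assumed rates the cached version cuts prompt cost by roughly 70%, which is why large shared system prompts are the first candidates for caching.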
Why does RAG increase spend so quickly?
Because each retrieved chunk adds tokens to the prompt. If chunk count or chunk size is loose, input cost rises sharply even before the model generates a response.
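The arithmetic is simple but compounds fast. A sketch, assuming $3 per million input tokens and hypothetical chunk sizes:

```python
# Illustrative RAG prompt sizing: retrieved chunks dominate input tokens.
INPUT_PRICE = 3.0  # assumed $ per million input tokens

def rag_input_cost(base_prompt_tokens, chunk_count, tokens_per_chunk):
    """Dollar input cost of one RAG request: base prompt plus retrieved chunks."""
    total_tokens = base_prompt_tokens + chunk_count * tokens_per_chunk
    return total_tokens / 1e6 * INPUT_PRICE

tight = rag_input_cost(800, 4, 300)   # 2,000 prompt tokens
loose = rag_input_cost(800, 12, 800)  # 10,400 prompt tokens
```

Loosening chunk count and chunk size together multiplies prompt cost more than 5x here, before a single output token is generated.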
What should I track for SaaS pricing?
Track cost per user, cost per session, retry rate, cache hit rate and minimum viable plan price at your target margin. These metrics are more useful than raw token totals in isolation.
Can I rely on built-in model presets forever?
No. Provider pricing changes, batch discounts vary and enterprise contracts differ. Presets are a starting point, so manual override should stay available.