⚡ TL;DR

  • Tokens are the unit of cost. A rough estimate is 1 token ≈ 4 English characters or about 0.75 words.
  • Current per-1M-token rates for Claude Opus 4.7, Claude Sonnet 4.6, GPT-5.5, and GPT-5.5 Pro are rendered live from the shared pricing object; see the TokenOps platform for today's numbers.
  • Pricing last verified: May 2026. Rates change frequently. This tool is for estimation, not billing reconciliation.
  • Best practice: Use the Ledger Lab below to calculate your actual costs

Why Token Economics Matter

LLMs charge by the token, not the word or the request. If you're building at scale, token efficiency becomes a core part of your product architecture. Model choice alone can cut your LLM costs by 50–75%, but only if you understand how tokens work, how different providers tokenize differently, and which models make economic sense for which tasks.

This guide starts with the fundamentals — tokens, embeddings, tokenizers — then gives you a hands-on lab to compare Claude and OpenAI costs for your own prompts and workloads.

What Are Tokens?

LLMs don't read characters or words. They read tokens — subword units created by a tokenizer algorithm. You pay per token for both input and output; the request itself is not billed separately.

English prose: 1 token ≈ 4 characters. "Hello, world!" is 4 tokens. "Unbelievable" might be 1–2 tokens; "tokenization" might be 2–4, depending on the tokenizer. Numbers split unpredictably: "2026" is often 1 token, but "2026-05-03" can be 4 or more (hyphens and digit groups split). Code is much more token-dense because symbols, brackets, and special characters each become their own token or token sequence.

Token density by content type (approx):
  • English prose: 25 tokens per 100 chars
  • Source code: 30–40 tokens per 100 chars
  • JSON/YAML: 35–45 tokens per 100 chars (quotes, colons, commas are expensive)
  • Chinese/Korean: 50–80 tokens per 100 chars

The practical implication: if you're processing logs, code snippets, or API responses (which are often JSON), your token cost is significantly higher than raw English text. Plan accordingly.
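
For planning purposes, the density table above can be turned into a quick estimator. A minimal sketch in Python; the densities are heuristic midpoints of the ranges above, not tokenizer output:

# Heuristic densities (tokens per 100 chars), midpoints of the table above
DENSITY = {
  "prose": 25,
  "code": 35,
  "json": 40,
  "cjk": 65,
}

def estimate_tokens_by_type(text, content_type="prose"):
  # Planning heuristic only; use a real tokenizer for production counts
  return round(len(text) * DENSITY[content_type] / 100)

print(estimate_tokens_by_type('{"user": "ada", "active": true}', "json"))  # ~12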

What Are Embeddings?

Once tokenized, each token is converted into an embedding — a numerical vector (a list of numbers) that captures the token's meaning in context. In large modern models, each token becomes a high-dimensional vector, often 4,096 dimensions or more. Embeddings capture semantic relationships: "cat" and "kitten" are close in embedding space; "cat" and "democracy" are distant.

The context window is how many tokens the model can simultaneously process — it's the model's working memory. Each token in the context consumes one "slot." When you send a prompt of 50K input tokens, you consume 5% of a 1M-token window.

The context window sets the cost ceiling for any single request: Max Input Cost = Context Window Size × Input Price per Token. Because that ceiling scales with the model's current input rate, pricing must stay configurable.
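
As a quick sanity check, here is that ceiling computed in Python. The window size and rate are illustrative placeholders, not current prices:

# Illustrative values only: a 1M-token window at $5.00 per 1M input tokens
context_window = 1_000_000
input_price_per_token = 5.00 / 1_000_000

max_input_cost = context_window * input_price_per_token
print(f"${max_input_cost:.2f}")  # $5.00 to fill the window once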

Side note: embedding models (used for RAG — retrieval-augmented generation) have separate, much cheaper pricing. Don't confuse inference token costs with embedding costs.

For production systems, use provider token-count APIs rather than relying only on rough character estimates. Heuristics are useful for planning, but content type, tools, files, schemas, and model behavior can all shift the real count.
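
For example, the Anthropic Python SDK exposes a token-counting endpoint that returns an exact count without running inference. A minimal sketch; the model ID is illustrative:

import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# Counts tokens for a request without generating a response
result = client.messages.count_tokens(
  model="claude-sonnet-4-6",  # illustrative model ID
  messages=[{"role": "user", "content": "Explain machine learning in 100 words"}],
)
print(result.input_tokens)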

How Tokenizers Work & Why They Matter

Each provider uses a different tokenizer. GPT uses tiktoken — a Byte Pair Encoding (BPE) algorithm that starts with individual bytes and iteratively merges the most frequent pairs into larger tokens. The final vocabulary is on the order of 100K tokens. Claude uses its own proprietary tokenizer, also BPE-based but with different merge rules and vocabulary priorities.

Same English text = 5–10% different token count across providers. Example: "The quick brown fox jumps over the lazy dog" is 9 tokens in tiktoken and might be 8–10 in Claude's tokenizer, depending on whitespace handling. This variance is real but small; the bigger factor is the price difference, not the token count.
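
You can check the tiktoken side of that comparison yourself with the open-source tiktoken library. A minimal sketch; encoding choice varies by model family, and cl100k_base is used here for illustration:

import tiktoken

# Encode with one of the published GPT encodings and count the tokens
enc = tiktoken.get_encoding("cl100k_base")
tokens = enc.encode("The quick brown fox jumps over the lazy dog")
print(len(tokens))  # 9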

Key insight: Don't compare raw token counts across providers. Always compare costs (tokens × price per token). The lab below uses a rough English heuristic, but actual token counts vary by tokenizer, content type, tools, files, and model behavior.

Both providers have a price tier strategy: cheap models for high-volume simple tasks, expensive models for complex reasoning. In May 2026, Claude Haiku 4.5 is the low-cost entry point, GPT-5.5 is the mainstream OpenAI frontier model, and Claude Opus 4.7 / GPT-5.5 Pro sit in the premium reasoning tier.

Claude's approach: Haiku 4.5 is the low-cost entry point, Sonnet 4.6 is the balanced tier, and Opus 4.7 is the premium tier with the highest output price.
OpenAI's approach (May 2026): GPT-5.5 is the general-purpose frontier model, and GPT-5.5 Pro is the premium option for the hardest problems.

The key economic insight: a model's label doesn't tell the full cost story. Compare your real workload's input/output mix across tiers before committing to one.

The Token Cost Formula

Cost is the sum of input and output token costs. Output tokens are typically more expensive than input tokens because the model must generate them token-by-token (autoregressive generation is serial and computation-heavy).

Total Cost = (Input Tokens × Input Price) + (Output Tokens × Output Price)
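
For example, at illustrative rates of $5.00 per 1M input tokens and $25.00 per 1M output tokens (the same rates used in the Python example later in this guide), a request with 2,000 input tokens and 500 output tokens costs (2,000 × 5.00 + 500 × 25.00) / 1,000,000 = $0.0225.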

Use the Ledger Lab below for live calculations with the current shared pricing object.

Live pricing example is rendered from the shared pricing object.

The ratio of output price to input price varies by provider and model tier. The practical implication: output budget is precious. If you don't need 500 tokens of output, ask for 200.

At Scale, Model Choice Multiplies

Single-request costs are small. At scale, they compound. Process 100,000 documents monthly, and the model you pick determines your monthly bill.

Example workload: 100,000 requests per month, each with 500 input tokens and 50 output tokens, for a total of 50M input tokens and 5M output tokens.

Batch totals are rendered from the shared pricing object.
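
To make the compounding concrete, here is that workload priced in Python. The rates are the illustrative ones from the cost-calculator example below; swap in the shared pricing object's current values before trusting the totals:

REQUESTS = 100_000
INPUT_TOKENS = 500   # per request
OUTPUT_TOKENS = 50   # per request

# Illustrative USD rates per 1M tokens; sync with the shared pricing object
MODELS = {
  "claude-opus-4-7": {"input": 5.00, "output": 25.00},
  "gpt-5.5": {"input": 5.00, "output": 30.00},
}

for name, rates in MODELS.items():
  monthly = REQUESTS * (INPUT_TOKENS * rates["input"] +
                        OUTPUT_TOKENS * rates["output"]) / 1_000_000
  print(f"{name}: ${monthly:,.2f}/month")
# claude-opus-4-7: $375.00/month
# gpt-5.5: $400.00/month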

The spread is wide enough that model selection remains a product engineering decision, not just an ops decision. On this workload, the cheaper tiers stay dramatically lower cost than the premium tiers, so route carefully.

Cost Optimization Levers

  • Model Routing: Not all requests need your most expensive model. Try Haiku 4.5 first; escalate to Sonnet 4.6 / GPT-5.5 only if confidence is low. Your cheaper model solves many routine tasks just fine (see the routing sketch after this list).
  • Prompt Caching: If the same system prompt or context is reused across requests, enable prompt caching (both Claude and OpenAI support it). Cached input tokens can cost up to 90% less, though exact discounts vary by provider. For batch operations, caching can reduce costs by 30–50%.
  • Output Budget: You pay for every output token. Don't ask for 500 tokens if 100 will do. Constrain `max_tokens` to what you actually need.
  • Batch APIs: Both providers offer batch APIs (roughly 24-hour turnaround) with discounts of about 50% on token costs. For non-urgent workloads, batching is nearly always more economical.
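
A minimal routing sketch; call_model and is_confident are hypothetical stand-ins for your own inference call and quality check, and the model IDs are illustrative:

CHEAP_MODEL = "claude-haiku-4-5"     # illustrative model ID
PREMIUM_MODEL = "claude-sonnet-4-6"  # illustrative model ID

def route_request(prompt, call_model, is_confident, max_tokens=200):
  # call_model and is_confident are placeholders for your own inference
  # call and quality check; max_tokens constrains the output budget
  draft = call_model(CHEAP_MODEL, prompt, max_tokens=max_tokens)
  if is_confident(draft):
    return draft
  # Escalate only the low-confidence minority to the premium tier
  return call_model(PREMIUM_MODEL, prompt, max_tokens=max_tokens)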

Code Examples: Token Counting & Cost Calculation

Basic token counter (JavaScript):

function estimateTokens(text) {
  // Rough heuristic: 1 token ≈ 0.75 words in English text
  const words = text.trim().split(/\s+/).length;
  return Math.round(words / 0.75);
}

const prompt = "Explain machine learning in 100 words";
const tokens = estimateTokens(prompt);
console.log(`~${tokens} tokens`); // Output: ~8 tokens

Cost calculator (Python):

# Illustrative USD rates per 1M tokens; keep in sync with the shared pricing object
MODELS = {
  "claude-opus-4-7": {"input": 5.00, "output": 25.00},
  "gpt-5.5": {"input": 5.00, "output": 30.00},
}

def calculate_cost(model, input_tokens, output_tokens):
  rates = MODELS[model]
  cost = (input_tokens * rates["input"] +
            output_tokens * rates["output"]) / 1_000_000
  return round(cost, 6)

cost = calculate_cost("claude-opus-4-7", 2000, 500)
print(f"Cost: ${cost:.4f}") # Output: $0.0225

Common Pitfalls (and How to Avoid Them)

❌ Pitfall 1: Ignoring output token costs

Many teams focus only on input token pricing. Output tokens are typically several times more expensive per token and add up fast if responses aren't constrained. Solution: Set explicit output limits and track the output/input ratio per model.

❌ Pitfall 2: Paying full price for cached content

If the same system prompt or document is used 1000 times, cache it. Not caching is leaving 30–50% savings on the table. Both Claude and GPT support it; enable it by default.
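
With the Anthropic API, for example, caching is enabled by marking the reusable block with cache_control. A minimal sketch; the model ID and prompts are illustrative:

import anthropic

client = anthropic.Anthropic()
long_system_prompt = "..."  # the large instruction block reused across requests

response = client.messages.create(
  model="claude-sonnet-4-6",  # illustrative model ID
  max_tokens=300,
  system=[
    # cache_control marks this block so later requests reuse the cached prefix
    {"type": "text", "text": long_system_prompt,
     "cache_control": {"type": "ephemeral"}},
  ],
  messages=[{"role": "user", "content": "Summarize today's ticket queue."}],
)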

❌ Pitfall 3: Using Opus/GPT-5.5 Pro for everything

It's tempting to use the best model for all tasks. In reality, Haiku or Sonnet handle the large majority of routine tasks well. A/B test your cheaper models first; you'll often find they're sufficient.

✓ Best Practice: Token-aware architecture

Build token budgets into your system design. Know the token cost per request type, set per-request maximums, and monitor spend. The Ledger Lab below helps you calculate; use it to baseline your workload.
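
A minimal budget-guard sketch; the request types, limits, and the check itself are assumptions to adapt to your own system:

# Hypothetical per-request token budgets by request type
BUDGETS = {"classify": 300, "summarize": 1_500, "chat": 4_000}

def check_budget(request_type, input_tokens, max_output_tokens):
  # Reject requests whose worst-case token use exceeds the type's budget
  worst_case = input_tokens + max_output_tokens
  budget = BUDGETS[request_type]
  if worst_case > budget:
    raise ValueError(f"{request_type}: {worst_case} tokens exceeds {budget}")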

🔧 Interactive Tool

Try the TokenOps Platform

For hands-on cost calculations and model comparisons, open the TokenOps platform. You'll find:

  • Ledger Lab: Paste your text or prompt → see real-time token counts and costs across all models
  • Model Comparison Table: Side-by-side pricing, context windows, and capability matrix
  • Budget Simulator: Forecast monthly costs based on your request volume and token counts
Open TokenOps Platform →

Take-Away: Model Choice Is a Lever

Tokens are the atomic unit of LLM economics, and even modest per-token price differences become meaningful at scale. Use the lab to estimate your own workload, then apply the four optimization levers (routing, caching, output budget, batch APIs) to squeeze more value out of every token.
