[HN Gopher] Show HN: Price Per Token - LLM API Pricing Data
___________________________________________________________________
Show HN: Price Per Token - LLM API Pricing Data
The LLM providers are constantly adding new models and updating
their API prices. Anyone building AI applications knows that these
prices are very important to their bottom line. The only way I am
aware of is going to these providers' individual website pages to
check the price per token. To solve this inconvenience I spent a
few hours making pricepertoken.com, which has the latest models'
up-to-date prices all in one place. Thinking about adding image
models too, especially since you have multiple options (fal,
replicate) to use the same model and the prices are not always the
same.
Author : alexellman
Score : 269 points
Date : 2025-07-25 12:39 UTC (10 hours ago)
(HTM) web link (pricepertoken.com)
(TXT) w3m dump (pricepertoken.com)
| bananapub wrote:
| surprising that you didn't find any of the existing ones,
| including our own simonw's: https://www.llm-prices.com
| alexellman wrote:
| I searched around on Google and couldn't find anything
| xnx wrote:
| https://www.google.com/search?q=llm+price+comparison
| jjani wrote:
| Another one, with many more models:
| https://www.helicone.ai/llm-cost
| callbacked wrote:
| Awesome list, any chance of adding OpenRouter? Looking at their
| website, it seems like it would be a pain to scrape all of that due
| to the site's layout.
| alexellman wrote:
| Yeah I am going to be adding more sources like that and Groq
| but just wanted to start with the basics and see if it
| resonated
| murshudoff wrote:
| https://openrouter.ai/docs/api-reference/list-available-mode...
| OpenRouter has an endpoint to get models and their pricing
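| A minimal sketch of hitting it (untested; assumes the documented
| response shape, where pricing.prompt / pricing.completion are
| strings in USD per single token):
|
|     type ORModel = {
|       id: string;
|       pricing: { prompt: string; completion: string };
|     };
|     const res = await fetch("https://openrouter.ai/api/v1/models");
|     const { data } = (await res.json()) as { data: ORModel[] };
|     for (const m of data) {
|       // convert $/token to $/1M tokens for readability
|       console.log(m.id,
|         Number(m.pricing.prompt) * 1e6,
|         Number(m.pricing.completion) * 1e6);
|     }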
| uponasmile wrote:
| Well done. The UX is solid. Clean, intuitive, and the use of
| color makes everything instantly clear
| alexellman wrote:
| thanks I appreciate that
| dust42 wrote:
| tl;dr: low-effort website that only contains 26 Google, OpenAI and
| Anthropic models, with only input and output prices and no info
| about prompt caching or its prices. For a list of 473 models from
| 60+ providers with input, output, context, prompt caching and
| usage: https://openrouter.ai/models (no affiliation)
| jvanderbot wrote:
| You know, I immediately found this useful and bookmarked it.
| I'm not sure why "Simple" and "focused" mean "bad"
| ramon156 wrote:
| So would adding cache prices make it less "low effort blog
| spam"?
| dust42 wrote:
| Well, openrouter currently lists 473 models from dozens of
| providers. Three providers and 26 models is 2 minutes of
| Google searching. I stand by calling that low effort.
| alexellman wrote:
| I wanted to gauge interest before adding 500 models
|
| I plan on adding cache prices and making the list more
| comprehensive
| dust42 wrote:
| As user murshudoff mentioned elsewhere in the discussion,
| openrouter has an endpoint to get the prices. Takes a minute
| to get them.
| alexellman wrote:
| then use OpenRouter, totally fine by me. Thought a
| dedicated website just for this would be useful.
| sam-cop-vimes wrote:
| Don't use it then. No need to shit on the effort someone has
| put in. Clearly it is a starting point which they seem keen
| to iterate on. If you can't say something kind, best not to say
| anything.
| dust42 wrote:
| Look, it takes two prompts and two minutes to create this
| page:
|
| https://claude.ai/share/20b36bd3-d817-4228-bc33-aa7c4910bc2b
| (the preview seems to only work in Chrome, for Firefox you
| have to download the html).
|
| Plus maybe half an hour to verify and correct the prices and
| another few minutes for domain and hosting.
|
| The author posted it himself, so why not spend an hour or two
| more and have a decent list with at least half a dozen
| providers and 100 models? In its current state it is just a
| mockup.
|
| It has been at the top of the front page for 3 hours; if the
| author had added one model per minute to the .json, there would
| now be 200 models.
| alexc05 wrote:
| open router doesn't have that really slick graph though :)
| antoineMoPa wrote:
| It would be fun to compare with inference providers (groq/vertex
| ai, etc.).
| alexellman wrote:
| yes going to add that
| nisegami wrote:
| How consistent is the tokenization across different model
| families? It always served as a mental hangup for me when
| comparing LLM inference pricing.
| alexellman wrote:
| They all tokenize a little differently so they are not exactly
| 1-1. However I plan on addressing this by having each model
| complete a test task and getting the actual price from each api
| + token count to make a real 1-1 comparison.
| nisegami wrote:
| Ah, that's a great idea and would be a welcome addition to
| the site.
| esafak wrote:
| And please timestamp the benchmarks, and rerun them
| periodically, so vendors can't quietly cost-optimize the
| model when no one's looking.
| cahaya wrote:
| Nice! Missing a cost calculator with input and output fields.
| alexellman wrote:
| Can add for the future
| can16358p wrote:
| Does anyone know why o1-pro is more expensive than o3-pro?
| infecto wrote:
| That's been the case for a lot of the models. As they release
| new models those models often include optimizations that bring
| runtime costs down. I don't have the data to back it up but
| it's felt like chip cycles. There is a new model that is better
| but more expensive. Then further iterations on that model bring
| costs down.
| pierre wrote:
| Main issue is that tokens are not equivalent across providers /
| models, with huge disparities within a provider beyond the
| tokenizer model:
|
| - An image will take 10x the tokens on gpt-4o-mini vs gpt-4.
|
| - On Gemini 2.5 Pro, output tokens are tokens, except if you are
| using structured output; then every character is counted as a
| token for billing.
|
| - ...
|
| Having the price per token is nice, but what is really needed is
| to know how much a given query / answer will cost you, as not
| all tokens are equal.
| BonoboIO wrote:
| > On Gemini 2.5 Pro, output tokens are tokens, except if you are
| > using structured output; then every character is counted as a
| > token for billing.
|
| Can you elaborate on this? I don't quite understand the
| difference.
| rsanek wrote:
| I hadn't heard of this before either and can't find anything
| to support it on the pricing page.
|
| https://ai.google.dev/gemini-api/docs/tokens
| alexellman wrote:
| yeah I am going to add an experiment that runs every day, and the
| cost of that will be a column on the table. It will be something
| like "summarize this article in 200 words", and every model gets
| the same prompt + article
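| A rough sketch of the scoring step (assuming the usage fields
| follow OpenAI's chat completion response shape; the price table
| entry is illustrative):
|
|     type Usage = { prompt_tokens: number; completion_tokens: number };
|     type Price = { input: number; output: number }; // $ per 1M tokens
|
|     const prices: Record<string, Price> = {
|       "gpt-4o": { input: 2.5, output: 10.0 },
|     };
|
|     // real observed cost of one run of the daily task
|     function runCostUsd(model: string, usage: Usage): number {
|       const p = prices[model];
|       return (usage.prompt_tokens * p.input +
|               usage.completion_tokens * p.output) / 1e6;
|     }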
| aaronharnly wrote:
| Can you gather historical information as well? I did a bit of
| spelunking of the Wayback Machine to gather a partial dataset for
| OpenAI, but mine is incomplete. Future planning is well-informed
| by understanding the trends -- my rough calculation was that
| within a model family, prices drop by about 40-80% per 12 months.
| alexellman wrote:
| Yeah I am planning on setting up automatic scraping and just
| having my own database. Maybe could add historical data beyond
| as well but just gonna save all my own data for now
| DrJid wrote:
| This is actually really awesome to see. Opened my eyes a bit.
| Ignore the haters.
| peterspath wrote:
| I am missing Grok
| alexellman wrote:
| added
| kb_geek wrote:
| Nice! It will be good to also pull in leaderboard rankings and/or
| benchmarks for each of these models, so we understand capability
| perhaps from lmsys (not sure if there is a better source)
| mythz wrote:
| There was a time when it was unbelievably frustrating to navigate
| the bunch of marketing pages required to find the cost of a newly
| announced model, now I just look at OpenRouter to find pricing.
| StratusBen wrote:
| The http://ec2instances.info/ of the LLM era ;)
| lucasoshiro wrote:
| "OpenAI, Anthropic, Google and more", where "and more" = 0.
| Where's Gemma, DeepSeek, etc?
|
| The UI, however, is really clean and straight to the point. I
| like the interface, but miss the content
| hopelite wrote:
| That was my first thought too.
|
| Mistral, Llama, Kimi, Qwen...?
| v5v3 wrote:
| Same.
|
| Not a site of value unless it covers the whole market.
| jacob019 wrote:
| Love it! It's going on my toolbar. I face the same problem,
| constantly trying to hunt down the latest pricing which is often
| changing. I think it's great that you want to add more models and
| features, but maybe keep the landing page simple with a default
| filter that just shows the current content.
| alexellman wrote:
| Yeah want to keep it really simple. Appreciate it!
| l5870uoo9y wrote:
| It appears that GPT-4.1 is missing, but nano and mini are there.
| jalopy wrote:
| Super valuable resource - thanks!
|
| What tools / experiments out there exist to exercise these
| cheaper models to output more tokens / use more CoT tokens to
| achieve the quality of more expensive models?
|
| e.g., Gemini 2.5 flash / pro ratio is 1 1/3 for input, 1/8 for
| output... Surely there's a way to ask Flash to critique its work
| more thoroughly to get to Pro level performance and still save
| money?
| CharlesW wrote:
| Site is down as I type this, but a shout-out to Simon Willison's
| LLM pricing calculator: https://www.llm-prices.com/
| criddell wrote:
| If you had a $2500ish budget for hardware, what types of models
| could you run locally? If $2500 isn't really enough, what would
| it take?
|
| Are there any tutorials you can recommend for somebody interested
| in getting something running locally?
| cogman10 wrote:
| This is where you'd start for local: https://ollama.com/
|
| You can, almost, convert the number of parameters to GB of
| memory needed. For example, deepseek-r1:7b needs about 7 GB of
| memory to run locally.
|
| Context window matters, the more context you need, the more
| memory you'll need.
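|
| A back-of-envelope version of that rule (my own numbers: roughly
| 1 byte per parameter for a typical 4-8 bit quant, plus some
| headroom for the KV cache):
|
|     // 7B params x ~1 byte/param + headroom, in line with
|     // deepseek-r1:7b needing ~7 GB above
|     function estimateVramGb(paramsBillions: number,
|                             bytesPerParam = 1.0,
|                             kvCacheGb = 1.0): number {
|       return paramsBillions * bytesPerParam + kvCacheGb;
|     }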
|
| If you are looking for AI devices at $2500, you'll probably
| want something like this [1]. A unified memory architecture
| (which will mean LPDDR5) will give you the most memory for the
| least amount of money to play with AI models.
|
| [1] https://frame.work/products/desktop-diy-amd-aimax300/configu...
| mark_l_watson wrote:
| I bought a Mac Mini M2 Pro 32G 18 months ago for $1900. It is
| sufficient to run good local models, quantized, up to and
| including 40B.
|
| When local models don't cut it, I like Gemini 2.5 flash/pro and
| gemini-cli.
|
| There are a lot of good options for commercial APIs and for
| running local models. I suggest choosing a good local and a
| good commercial API, and spend more time building things than
| frequently trying to evaluate all the options.
| criddell wrote:
| Are there any particular sources you found helpful to get
| started?
|
| It's been a while since I checked out Mini prices. Today,
| $2400 buys an M4 Pro with all the cores, 64GB RAM, and 1TB
| storage. That's pleasantly surprising...
| mark_l_watson wrote:
| You can read my book on local models with Ollama free
| online: https://leanpub.com/ollama/read
| criddell wrote:
| Awesome, thanks!
| dstryr wrote:
| I would purchase 2 used 3090's as close to $600 as you can.
| The 3090 still remains the price-performance king.
| yieldcrv wrote:
| the local side of things with a $7,000 - $10,000 machine
| (512gb fast memory, cpu and disk) can almost reach parity with
| regard to text input and output and 'reasoning', but lags far
| behind for multimodal anything: audio input, voice output,
| image input, image output, document input.
|
| there are no out-of-the-box solutions to run a fleet of models
| simultaneously or containerized either
|
| so the closed source solutions in the cloud are light years
| ahead and it's been this way for 15 months now, no signs of
| stopping
| omneity wrote:
| Would running vLLM in docker work for you, or do you have
| other requirements?
| yieldcrv wrote:
| it's not an image and audio model, so I believe it wouldn't
| work for me by itself
|
| would probably need multiple models running in distinct
| containers, with another process coordinating them
| skeezyboy wrote:
| you can run ollama stuff with just a decent cpu for some of
| them
| redox99 wrote:
| Kimi and deepseek are the only models that don't feel like a
| large downgrade from the typical providers.
| NitpickLawyer wrote:
| > The only way I am aware of is going to these providers'
| > individual website pages to check the price per token.
|
| Openrouter is a good alternative. Added bonus that you can also
| see where the open models come in, and can make an educated guess
| on the true cost / size of a model, and how likely it is that
| it's currently subsidised.
| danenania wrote:
| OpenRouter also has an endpoint for listing models (with
| pricing info) in its api:
| https://openrouter.ai/docs/overview/models
|
| A limitation though, at least the last time I checked, is that
| you only get a single provider returned per model. That's fine
| for the major commercial models that all have the same pricing
| on each provider, but makes it hard to rely on for open source
| models, which tend to have many providers offering them at
| different price points (sometimes _very_ different price points
| --like 5x or 10x difference).
| awongh wrote:
| This is great, but as others have mentioned the UX problem is
| more complicated than this:
|
| - for other models there are providers that serve the same model
| with different prices
|
| - each provider optimizes for different parameters: speed, cost,
| etc.
|
| - the same model can still be different quantizations
|
| - some providers offer batch pricing (the Grok API, for example,
| does not)
|
| And there are plenty of other parameters to filter over: thinking
| vs. non-thinking, multi-modal or not, etc., not to mention
| benchmark rankings.
|
| https://artificialanalysis.ai gives a blended cost number which
| helps with sorting a bit, but a blended input/output cost model
| is going to change depending on what you're doing.
|
| I'm still holding my breath for a site that has a really nice
| comparison UI.
|
| Someone please build it!
| alexellman wrote:
| would a column for "provider" meaning the place you are
| actually making the call to solve this
| svachalek wrote:
| Please not benchmark ranking. We've encouraged this nonsense
| far too long already.
| zeroCalories wrote:
| I think it would be very hard to make a fair comparison. Best
| you could do is probably make the trade-offs clear and let
| people make their own choices. I think it could be cool to make
| something like a token exchange where people put up their
| requirements, and then companies offer competing services that
| fit those requirements. Would be cool to let random people
| offer their compute, but you would need to find a way to
| handle people lying about their capabilities or stealing data.
| numlocked wrote:
| (I work at OpenRouter)
|
| We have a simple model comparison tool that is not-at-all-
| obvious to find on the website, but hopefully can help
| somewhat. E.g.
|
| https://openrouter.ai/compare/qwen/qwen3-coder/moonshotai/ki...
| sshah_24 wrote:
| can we not just self-host, expose things through a VPN, and for
| anything that needs sharing with the world, tunnel through some
| cloud server to keep the internal servers secure?
|
| I am new to this hobby, but would like to know more about what
| experienced people think and do.
| BartjeD wrote:
| Mistral is missing
| ashwindharne wrote:
| KV caching is priced and managed quite differently between
| providers as well. Seeing as it becomes a huge chunk of the
| actual tokens used, wondering if there's an easy way to compare
| across providers.
| eugene3306 wrote:
| what's the point of comparing token prices? especially for
| thinking models.
|
| Just now I was testing the new Qwen3-thinking model. I've run the
| same prompt five times. The costs I got, sorted: 0.0143, 0.0288,
| 0.0321, 0.0389, 0.048. And this is for a single model.
|
| Also, in my experience, sonnet-4 is cheaper than gemini-2.5-pro,
| despite token costs being higher.
| eugene3306 wrote:
| I think the proper way of estimating the cost is the cost of an
| entire run of a test, like in aider's leaderboard.
| alienbaby wrote:
| I'd like to be able to compare prices to determine things like:
|
| Should I use copilot pro in agent mode with sonnet 4, or is it
| cheaper to use claude with sonnet 4 directly?
| dgrin91 wrote:
| Cool site. Would be interesting to add a time dimension to track
| prices over time
| sophia01 wrote:
| But the data is... wrong? Google Gemini 2.5 Flash-Lite costs
| $0.10/mtok input [1] but is shown here as $0.40/mtok?
|
| [1] https://ai.google.dev/gemini-api/docs/pricing#gemini-2.5-fla...
| alexellman wrote:
| the data is not wrong, you are reading my table wrong
|
| edit: my bad, I was wrong, shouldn't have responded like this
| GaggiX wrote:
| The input is wrong, though.
|
| Your website reports $0.30 for input, and that wouldn't make
| any sense, as it would be priced the same as the bigger Flash
| model.
| alexellman wrote:
| ok yeah fixed that one, sorry...
| Imustaskforhelp wrote:
| Such a level of condescension, when you yourself are wrong, is
| not acceptable.
|
| Puts a really, really bad taste in my mouth.
| gompertz wrote:
| First poster could have approached it better too. Like "Cool
| site! I think I may see an error on one item?" instead of going
| right to a 'wrong' angle, as if all the data should be
| discredited. I get highly triggered by this too.
| arccy wrote:
| this overly positive attitude triggers a bunch of people
| too. wrong data should just be called out, especially if
| that's your main selling point.
| copperx wrote:
| But why is condescension tolerable when the person is
| right?
| Imustaskforhelp wrote:
| It is not, but it's orders of magnitude worse if the person
| is wrong.
| unglaublich wrote:
| Ouch, bad response for someone with a business!
| binarymax wrote:
| Does anyone have an API that maintains a list of all model
| versions for a provider? I hand-update OpenAI into a JSON file
| that I use for cost reporting in my apps (and in an npm package
| called llm-primitives).
|
| Here's the current version:
|
|     const pricesPerMillion = {
|       "o1-2024-12-17": { input: 15.00, output: 60.00 },
|       "o1-mini-2024-09-12": { input: 1.10, output: 4.40 },
|       "o3-mini-2025-01-31": { input: 1.10, output: 4.40 },
|       "gpt-4.5-preview-2025-02-27": { input: 75.00, output: 150.00 },
|       "gpt-4o": { input: 5.00, output: 15.00 },
|       "gpt-4o-2024-08-06": { input: 2.50, output: 10.00 },
|       "gpt-4o-2024-05-13": { input: 5.00, output: 15.00 },
|       "gpt-4o-mini": { input: 0.15, output: 0.60 },
|       "gpt-4o-mini-2024-07-18": { input: 0.15, output: 0.60 },
|       "gpt-4-0613": { input: 30.00, output: 60.00 },
|       "gpt-4-turbo-2024-04-09": { input: 10.00, output: 30.00 },
|       "gpt-3.5-turbo": { input: 0.003, output: 0.006 },
|       "gpt-4.1": { input: 2.00, output: 8.00 },
|       "gpt-4.1-2025-04-14": { input: 2.00, output: 8.00 },
|       "gpt-4.1-mini": { input: 0.40, output: 1.60 },
|       "gpt-4.1-mini-2025-04-14": { input: 0.40, output: 1.60 },
|       "gpt-4.1-nano": { input: 0.10, output: 0.40 },
|       "gpt-4.1-nano-2025-04-14": { input: 0.10, output: 0.40 },
|       "gpt-4o-audio-preview-2024-12-17": { input: 2.50, output: 10.00 },
|       "gpt-4o-realtime-preview-2024-12-17": { input: 5.00, output: 20.00 },
|       "gpt-4o-mini-audio-preview-2024-12-17": { input: 0.15, output: 0.60 },
|       "gpt-4o-mini-realtime-preview-2024-12-17": { input: 0.60, output: 2.40 },
|       "o1-pro-2025-03-19": { input: 150.00, output: 600.00 },
|       "o3-pro-2025-06-10": { input: 20.00, output: 80.00 },
|       "o3-2025-04-16": { input: 2.00, output: 8.00 },
|       "o4-mini-2025-04-16": { input: 1.10, output: 4.40 },
|       "codex-mini-latest": { input: 1.50, output: 6.00 },
|       "gpt-4o-mini-search-preview-2025-03-11": { input: 0.15, output: 0.60 },
|       "gpt-4o-search-preview-2025-03-11": { input: 2.50, output: 10.00 },
|       "computer-use-preview-2025-03-11": { input: 3.00, output: 12.00 }
|     };
|
| I would love to replace this with an API call.
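|
| One way to do that against OpenRouter's models endpoint (untested
| sketch; assumes pricing.prompt / pricing.completion are strings in
| USD per single token, so x1e6 converts them to the per-million
| format above):
|
|     const res = await fetch("https://openrouter.ai/api/v1/models");
|     const { data } = await res.json();
|     const pricesPerMillion = Object.fromEntries(
|       data
|         .filter((m: any) => m.id.startsWith("openai/"))
|         .map((m: any) => [
|           m.id.replace("openai/", ""),
|           { input: Number(m.pricing.prompt) * 1e6,
|             output: Number(m.pricing.completion) * 1e6 },
|         ]));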
| urbandw311er wrote:
| Check out the source code to the vercel AI SDK. I've noticed
| that they broker calls out to various LLMs and then seem to
| return the cost as part of the response. So I'm thinking that
| this data could well be in there somewhere. Away from my desk
| right now so can't check.
| stogot wrote:
| I do this with other tools:
|
| 1. Pull some large tech company's open source tools' JS file
|
| 2. Extract an internal JSON blob that contains otherwise
| difficult-to-get information
|
| 3. Parse it and use what I need from within it for my tool
| zerocool0101 wrote:
| this is a snippet of the structure of the JSON file for this
| website if it helps:
|
|     { "provider_id": 6, "provider": 7,
|       "input_price_per_1m_tokens": 8, "output_price_per_1m_tokens": 9,
|       "response_time_ms": 10, "actual_cost_usd": 11,
|       "input_cost_per_word_usd": 12, "output_cost_per_word_usd": 13,
|       "has_tiered_pricing": 14 },
|     "anthropic:claude-opus-4", "Anthropic Claude Opus 4",
|     15, 75, 1443.85168793832, 0.045, 0.00006, 0.0003, false,
|
|     { "provider_id": 16, "provider": 17,
|       "input_price_per_1m_tokens": 18, "output_price_per_1m_tokens": 8,
|       "response_time_ms": 19, "actual_cost_usd": 20,
|       "input_cost_per_word_usd": 21, "output_cost_per_word_usd": 12,
|       "has_tiered_pricing": 14 },
|     "anthropic:claude-sonnet-4", "Anthropic Claude Sonnet 4",
|     3, 1568.72692800385, 0.009, 0.000012,
|
|     { "provider_id": 23, "provider": 24,
|       "input_price_per_1m_tokens": 25, "output_price_per_1m_tokens": 26,
|       "response_time_ms": 27, "actual_cost_usd": 28,
|       "input_cost_per_word_usd": 29, "output_cost_per_word_usd": 30,
|       "has_tiered_pricing": 14 },
|     "anthropic:claude-haiku-3.5", "Anthropic Claude Haiku 3.5",
|     0.8, 4, 2141.1094386851, 0.0024, 0.0000032, 0.000016,
|
|     { "provider_id": 32, "provider": 33,
|       "input_price_per_1m_tokens": 8, "output_price_per_1m_tokens": 9,
|       "response_time_ms": 34, "actual_cost_usd": 11,
|       "input_cost_per_word_usd": 12, "output_cost_per_word_usd": 13,
|       "has_tiered_pricing": 14 },
|     "anthropic:claude-opus-3", "Anthropic Claude Opus 3",
|     2538.34107347902,
|
|     { "provider_id": 36, "provider": 37,
|       "input_price_per_1m_tokens": 18, "output_price_per_1m_tokens": 8,
|       "response_time_ms": 38, "actual_cost_usd": 20,
|       "input_cost_per_word_usd": 21, "output_cost_per_word_usd": 12,
|       "has_tiered_pricing": 14 },
|     "anthropic:claude-sonnet-3.7", "Anthropic Claude Sonnet 3.7",
|     2513.9738537193,
|
|     { "provider_id": 40, "provider": 41,
|       "input_price_per_1m_tokens": 42, "output_price_per_1m_tokens": 43,
|       "response_time_ms": 44, "actual_cost_usd": 45,
|       "input_cost_per_word_usd": 46, "output_cost_per_word_usd": 47,
|       "has_tiered_pricing": 14 },
|     "anthropic:claude-haiku-3", "Anthropic Claude Haiku 3",
|     0.25, 1.25, 2874.71054013884, 0.00075, 0.000001, 0.000005,
|
|     { "provider_id": 49, "provider": 50,
|       "input_price_per_1m_tokens": 51, "output_price_per_1m_tokens": 52,
|       "response_time_ms": 53, "actual_cost_usd": 54,
|       "input_cost_per_word_usd": 55, "output_cost_per_word_usd": 56,
|       "has_tiered_pricing": 14 },
|     "open-ai:open-ai-gpt-4.1-mini", "Open AI Open AI GPT-4.1-mini",
|     0.4, 1.6, 2903.77470624506, 0.001, 0.0000016, 0.0000064,
|
|     { "provider_id": 58, "provider": 59,
|       "input_price_per_1m_tokens": 60, "output_price_per_1m_tokens": 51,
|       "response_time_ms": 61, "actual_cost_usd": 62,
|       "input_cost_per_word_usd": 63, "output_cost_per_word_usd": 55,
|       "has_tiered_pricing": 14 },
|     "open-ai:open-ai-gpt-4.1-nano", "Open AI Open AI GPT-4.1-nano",
|     0.1, 2650.13976342621, 0.00025, 4e-7,
|
|     { "provider_id": 65, "provider": 66,
|       "input_price_per_1m_tokens": 9, "output_price_per_1m_tokens": 67,
|       "response_time_ms": 68, "actual_cost_usd": 69,
|       "input_cost_per_word_usd": 13, "output_cost_per_word_usd": 70,
|       "has_tiered_pricing": 14 },
| Fanofilm wrote:
| They should add grok. I use grok.
| alexellman wrote:
| I just added grok
| iambateman wrote:
| This is cool! Two requests:
|
| - Filter by model "power" or price class. I want to compare the
| mini models, the medium models, etc.
|
| - I'd like to see a "blended" cost which does 80% input + 20%
| output, so I can quickly compare the overall cost (quick sketch
| below).
|
| Great work on this!
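|
| For the blend, something like (illustrative numbers):
|
|     // 80/20 blended price, $ per 1M tokens
|     const blended = (inPerM: number, outPerM: number) =>
|       0.8 * inPerM + 0.2 * outPerM;
|     // e.g. $2 in / $8 out -> 0.8*2 + 0.2*8 = $3.20 per 1M blended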
| alexellman wrote:
| thanks for the feedback!
| amelius wrote:
| Ok, that's price per token, but tells me nothing about the IQ of
| the models.
| Fripplebubby wrote:
| Maybe I am blinded by my own use case, but I find the caching
| pricing and strategy (since different providers use a different
| implementation of caching as well as different pricing) to be a
| major factor rather than just the "raw" per token cost, and that
| is missing here, as well as on the Simon Willison site [1]. Do
| most people just not care / not use caching that much that it
| matters?
|
| [1] https://llm-prices.com/
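|
| For Anthropic-style caching, the effective input price ends up a
| function of your hit rate (the 1.25x write / 0.1x read multipliers
| below are from Anthropic's published pricing, but treat them as an
| assumption; it also assumes every uncached token gets written to
| the cache):
|
|     function effectiveInputPerM(basePerM: number, hitRate: number) {
|       const write = 1.25 * basePerM; // cache-write premium
|       const read = 0.1 * basePerM;   // cache-read discount
|       return (1 - hitRate) * write + hitRate * read;
|     }
|
| At high hit rates that approaches a 10x discount, which is why the
| "raw" per-token number alone can mislead.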
| MattSayar wrote:
| I know at least a couple LLM providers will do some caching for
| you automatically now, which muddies the waters a bit. [0]
|
| [0] https://developers.googleblog.com/en/gemini-2-5-models-now-s...
| paradite wrote:
| It's actually more complex than just input and output tokens,
| there are more pricing rules by various providers:
|
| - Off-peak pricing by DeepSeek
|
| - Batch pricing by OpenAI and Anthropic
|
| - Context window differentiated pricing by Google and Grok
|
| - Thinking vs non-thinking token pricing by Qwen
|
| - Input token tiered pricing by Qwen coder
|
| I originally posted here:
| https://x.com/paradite_/status/1947932450212221427
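| A sketch of a schema that could capture those rules (type and
| field names are my own, purely illustrative):
|
|     type Rates = { inPerM: number; outPerM: number }; // $ per 1M tokens
|     type PricingRule =
|       | { kind: "flat"; rates: Rates }
|       | { kind: "offPeak"; peak: Rates; offPeak: Rates;
|           utcWindow: [number, number] }
|       | { kind: "batch"; discount: number; base: PricingRule }
|       | { kind: "contextTiered";
|           tiers: { maxContext: number; rates: Rates }[] }
|       | { kind: "thinkingSplit"; inPerM: number;
|           thinkingOutPerM: number; plainOutPerM: number };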
| james2doyle wrote:
| Nice. I think I prefer https://models.dev/ as it seems more
| complete
| manishsharan wrote:
| Is there a reason why you have not added DeepSeek and Qwen and
| Meta ?
|
| You should also aggregate prices from Vertex and AWS Bedrock.
| jimbo808 wrote:
| Are we really at a point already where we're treating tokens as a
| commodity? I certainly would not consider a token generated by
| Claude or Gemini to be of similar value to a token by Copilot,
| for example.
| nikvdp wrote:
| there's also http://llmprices.dev. similar, but with a searchbox
| for quick filtering
| antimatter15 wrote:
| The `ccusage` npm package pulls prices and other information from
| LiteLLM, which has a lot of different models:
| https://raw.githubusercontent.com/BerriAI/litellm/main/model...
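| A quick way to consume it (sketch; assumes the truncated filename
| above is model_prices_and_context_window.json and that entries
| carry input_cost_per_token / output_cost_per_token fields):
|
|     const url = "https://raw.githubusercontent.com/BerriAI/litellm/" +
|       "main/model_prices_and_context_window.json";
|     const prices = await (await fetch(url)).json();
|     // $/token -> $/1M tokens
|     console.log(Number(prices["gpt-4o"].input_cost_per_token) * 1e6);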
| fronty wrote:
| We are working on a similar problem at https://apiraces.com:
| personalizing the cost calculation for your LLM API use case.
|
| We have uploaded mostly the openrouter api models, but are trying
| to do it in a way that makes personalized calculation and
| comparison useful. If someone would like to test it or have a
| demo, we would be glad for any feedback.
| techbuilder4242 wrote:
| Cool!
| forrestthewoods wrote:
| Neat. Would love to see this plotted on a Pareto curve to show
| quality of said tokens.
| intellectronica wrote:
| Should read "Up to date prices for Closed American LLM APIs"
| julianozen wrote:
| Keeping this up to date would be a good use for an agent.
| Companies might even pay for something like this
| croes wrote:
| Or a page scraper
| hagope wrote:
| this is great, I've always wanted something like this. Do you
| think you can add other model metadata, like API name
| (`gemini-2.5-pro`), context length, modalities, etc.?
| numlocked wrote:
| (I work at OpenRouter)
|
| We have solved this problem by working with the providers to
| implement a prices and models API that we scrape, which is how we
| keep our marketplace up to date. It's been a journey; a year ago
| it was all happening through conversations in shared Slack
| channels!
|
| The pricing landscape has become more complex as providers have
| introduced e.g. different prices for tokens depending on prompt
| length, caching, etc.
|
| I do believe the right lens on this is actually the price per
| token by _endpoint_ , not by model; there are fast/slow versions,
| thinking/non-thinking, etc. that can sometimes also vary by
| price.
|
| The point of this comment is not to self promote, but we have put
| a huge amount of work into figuring all of this out, and have it
| all publicly available on OpenRouter (admittedly not in such a
| compact, pricing-focused format though!)
| tekacs wrote:
| I tried making it compact and easy just now! Thanks so much for
| the effort!
|
| https://github.com/tekacs/llm-pricing
| overwatch34 wrote:
| Thanks! I used Cursor + OpenRouter API to create
| https://tokentickr.com to give us a quick way to compare model
| capabilities and costs visually. Let me know what y'all think!
| OutOfHere wrote:
| It doesn't even list the price for GPT-4.1 (full model). This
| means it's not thorough and it doesn't try. What an immediate
| disappointment.
| generalizations wrote:
| This is awesome! I wonder how possible it is to incorporate
| benchmarks - maybe as a filter? Since not all tokens are as
| useful as others. Heh.
| tekacs wrote:
| I've run into this a ton of times and these websites all kinda
| suck. Someone mentioned the OpenRouter /models endpoint in a
| sibling comment here, so I quickly threw this together just now.
| Please feel free to PR!
|
| https://github.com/tekacs/llm-pricing
|
|     llm-pricing
|
|     Model                      | Input | Output | Cache Read | Cache Write
|     ---------------------------+-------+--------+------------+------------
|     anthropic/claude-opus-4    | 15.00 | 75.00  | 1.50       | 18.75
|     anthropic/claude-sonnet-4  | 3.00  | 15.00  | 0.30       | 3.75
|     google/gemini-2.5-pro      | 1.25  | 10.00  | N/A        | N/A
|     x-ai/grok-4                | 3.00  | 15.00  | 0.75       | N/A
|     openai/gpt-4o              | 2.50  | 10.00  | N/A        | N/A
|     ...
|
| ---
|
|     llm-pricing calc 10000 200 -c 9500 opus-4 4.1
|
|     Cost calculation: 10000 input + 200 output (9500 cached, 5m TTL)
|
|     Model                      | Input     | Output    | Cache Read | Cache Write | Total
|     ---------------------------+-----------+-----------+------------+-------------+----------
|     anthropic/claude-opus-4    | $0.007500 | $0.015000 | $0.014250  | $0.178125   | $0.214875
|     openai/gpt-4.1             | $0.001000 | $0.001600 | $0.004750  | $0.000000   | $0.007350
|     openai/gpt-4.1-mini        | $0.000200 | $0.000320 | $0.000950  | $0.000000   | $0.001470
|     openai/gpt-4.1-nano        | $0.000050 | $0.000080 | $0.000237  | $0.000000   | $0.000367
|     thudm/glm-4.1v-9b-thinking | $0.000018 | $0.000028 | $0.000333  | $0.000000   | $0.000378
|
| ---
|
|     llm-pricing opus-4 -v
|
|     === ANTHROPIC ===
|
|     Model: anthropic/claude-opus-4
|     Name: Anthropic: Claude Opus 4
|     Description: Claude Opus 4 is benchmarked as the world's best coding
|       model, at time of release, bringing sustained performance on complex,
|       long-running tasks and agent workflows. It sets new benchmarks in
|       software engineering, achieving leading results on SWE-bench (72.5%)
|       and Terminal-bench (43.2%).
|     Pricing:
|       Input: $15.00 per 1M tokens
|       Output: $75.00 per 1M tokens
|       Cache Read: $1.50 per 1M tokens
|       Cache Write: $18.75 per 1M tokens
|       Per Request: $0
|       Image: $0.024
|     Context Length: 200000 tokens
|     Modality: text+image->text
|     Tokenizer: Claude
|     Max Completion Tokens: 32000
|     Moderated: true
| krashidov wrote:
| Where is Claude 3.5 Sonnet? Arguably the best model still lol
| ssalka wrote:
| I'd love to see this data joined with common benchmarks, in order
| to see which models get you the most "bang for your buck", i.e.
| benchmark score / token cost
___________________________________________________________________
(page generated 2025-07-25 23:00 UTC)