[HN Gopher] Show HN: A tool to benchmark LLM APIs (OpenAI, Claud...
___________________________________________________________________
Show HN: A tool to benchmark LLM APIs (OpenAI, Claude, local/self-
hosted)
I recently built a small open-source tool to benchmark different
LLM API endpoints -- including OpenAI, Claude, and self-hosted
models (like llama.cpp). It runs a configurable number of test
requests and reports two key metrics:

* First-token latency (ms): How long it takes for the first token
  to appear
* Output speed (tokens/sec): Overall output fluency

Demo: https://llmapitest.com/
Code: https://github.com/qjr87/llm-api-test

The goal is to provide a simple, visual, and reproducible way to
evaluate performance across different LLM providers, including the
growing number of third-party "proxy" or "cheap LLM API" services.

It supports:

* OpenAI-compatible APIs (official + proxies)
* Claude (via Anthropic)
* Local endpoints (custom/self-hosted)

You can also self-host it with docker-compose. Config is clean;
adding a new provider only requires a simple plugin-style addition.

Would love feedback, PRs, or even test reports from APIs you're
using. Especially interested in how some lesser-known services
compare.
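
A minimal sketch of how the two metrics described in this post could
be computed from a streamed response. It is not taken from the
repository: the names `measure_stream` and `StreamStats` are
hypothetical, each streamed chunk is assumed to count as one token,
and output speed is assumed to be measured over the decoding phase
only (from the first chunk to the last). The tool itself may define
these differently.

```python
import time
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class StreamStats:
    first_token_latency_ms: float  # request start -> first chunk
    tokens_per_sec: float          # chunks after the first / decode time
    token_count: int               # number of chunks seen


def measure_stream(
    chunks: Iterable[str],
    clock: Callable[[], float] = time.perf_counter,
) -> StreamStats:
    """Consume a stream of chunks and time it.

    Assumption: one chunk is treated as one token. Real APIs may pack
    several tokens into a chunk, so a tokenizer would be more accurate.
    """
    start = clock()
    first_at = None
    last_at = start
    count = 0

    for _ in chunks:
        now = clock()
        if first_at is None:
            first_at = now  # timestamp of the first chunk
        last_at = now
        count += 1

    if first_at is None:
        raise ValueError("stream produced no chunks")

    # Assumption: speed covers only the decoding phase, so the wait
    # for the first chunk does not drag the rate down.
    decode_time = last_at - first_at
    tokens_per_sec = (count - 1) / decode_time if decode_time > 0 else 0.0

    return StreamStats(
        first_token_latency_ms=(first_at - start) * 1000.0,
        tokens_per_sec=tokens_per_sec,
        token_count=count,
    )
```

In practice `chunks` would be the iterator returned by a provider's
streaming client, and `start` should be taken just before the request
is sent, so the call that opens the stream belongs inside the timed
region. The `clock` parameter exists only so the timing can be replaced
with a deterministic one in tests.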
Author : mrqjr
Score : 24 points
Date : 2025-06-29 15:33 UTC (7 hours ago)
(HTM) web link (llmapitest.com)
(TXT) w3m dump (llmapitest.com)
| mdhb wrote:
| In what universe does a post created by a new account with zero
| comments and a grand total of 2 votes over the course of 2 hours
| end up on the front page?
| iRomain wrote:
| LLM
| vntok wrote:
| It's an informative post about new tech, that fits pretty well
| here of all places.
|
| Why would you want the author to write about something else to
| validate the post? That would be an appeal to authority, which
| is the complete opposite of what the Hacker Manifesto has
| always been about in terms of ethos, goals, etc.
| bdangubic wrote:
| I am polishing up my blog about some FORTRAN code I wrote last
| week in hopes of the same :)
| swyx wrote:
| idk what it is but buying that domain made it seem more
| commercial and therefore less trustworthy. also most people prob
| want to just use artificialanalysis' numbers rather than self-run
| benchmarks (but this is ok if you want to run your own)
___________________________________________________________________
(page generated 2025-06-29 23:00 UTC)