[HN Gopher] Show HN: A tool to benchmark LLM APIs (OpenAI, Claud...
       ___________________________________________________________________
        
       Show HN: A tool to benchmark LLM APIs (OpenAI, Claude, local/self-
       hosted)
        
       I recently built a small open-source tool to benchmark different
       LLM API endpoints, including OpenAI, Claude, and self-hosted
       models (like llama.cpp).

       It runs a configurable number of test requests and reports two
       key metrics:

       * First-token latency (ms): how long it takes for the first
         token to appear
       * Output speed (tokens/sec): overall output throughput

       Demo: https://llmapitest.com/
       Code: https://github.com/qjr87/llm-api-test

       The goal is to provide a simple, visual, and reproducible way to
       evaluate performance across different LLM providers, including
       the growing number of third-party "proxy" or "cheap LLM API"
       services.

       It supports:

       * OpenAI-compatible APIs (official + proxies)
       * Claude (via Anthropic)
       * Local endpoints (custom/self-hosted)

       You can also self-host it with docker-compose. The config is
       clean; adding a new provider only requires a simple plugin-style
       addition.

       Would love feedback, PRs, or even test reports from APIs you're
       using. I'm especially interested in how some lesser-known
       services compare.
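
       For readers wondering how the two metrics could be derived from
       a streamed response, here is a minimal sketch. It is not taken
       from the tool's code. The name measure_stream, the
       whitespace-based token count, and the choice to divide by total
       elapsed time (including the wait for the first token) are all
       assumptions made for illustration.

```python
import time
from typing import Callable, Iterable, NamedTuple


class StreamMetrics(NamedTuple):
    first_token_latency_ms: float
    tokens_per_sec: float
    total_tokens: int


def measure_stream(
    chunks: Iterable[str],
    count_tokens: Callable[[str], int] = lambda s: len(s.split()),
    clock: Callable[[], float] = time.perf_counter,
) -> StreamMetrics:
    """Consume a stream of text chunks and time it.

    Hypothetical helper: `chunks` stands in for whatever a streaming
    API client yields. Token counting defaults to whitespace splitting,
    which only approximates a real tokenizer.
    """
    start = clock()
    first_token_at = None
    total_tokens = 0

    for chunk in chunks:
        n = count_tokens(chunk)
        if n == 0:
            continue  # skip empty keep-alive or role-only chunks
        if first_token_at is None:
            first_token_at = clock()  # first chunk that carries output
        total_tokens += n

    end = clock()
    if first_token_at is None:
        raise ValueError("stream produced no tokens")

    elapsed = end - start
    # Assumption: output speed is measured over the whole request.
    tokens_per_sec = total_tokens / elapsed if elapsed > 0 else float("inf")
    return StreamMetrics(
        first_token_latency_ms=(first_token_at - start) * 1000.0,
        tokens_per_sec=tokens_per_sec,
        total_tokens=total_tokens,
    )
```

       In practice chunks would be the text deltas from a provider's
       streaming response, and the clock argument exists only so the
       timing can be replaced with fixed values when testing.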
        
       Author : mrqjr
       Score  : 24 points
       Date   : 2025-06-29 15:33 UTC (7 hours ago)
        
 (HTM) web link (llmapitest.com)
 (TXT) w3m dump (llmapitest.com)
        
       | mdhb wrote:
       | In what universe does a post created by a new account with zero
       | comments and a grand total of 2 votes over the course of 2 hours
       | end up on the front page?
        
         | iRomain wrote:
         | LLM
        
         | vntok wrote:
         | It's an informative post about new tech, that fits pretty well
         | here of all places.
         | 
         | Why would you want the author to write about something else to
         | validate the post? That would be an appeal to authority, which
         | is the complete opposite of what the Hacker Manifesto has
         | always been about in terms of ethos, goals, etc.
        
         | bdangubic wrote:
         | I am polishing up my blog about some FORTRAN code I wrote last
         | week in hopes of the same :)
        
       | swyx wrote:
       | idk what it is but buying that domain made it seem more
       | commercial and therefore less trustworthy. also most people prob
       | want to just use artificialanalysis' numbers rather than self-run
       | benchmarks (but this is ok if you want to run your own)
        
       ___________________________________________________________________
       (page generated 2025-06-29 23:00 UTC)