[HN Gopher] Show HN: Countless.dev - A website to compare every ...
       ___________________________________________________________________
        
       Show HN: Countless.dev - A website to compare every AI model: LLMs,
       TTSs, STTs
        
       Author : ahmetd
       Score  : 200 points
       Date   : 2024-12-07 09:42 UTC (13 hours ago)
        
 (HTM) web link (countless.dev)
 (TXT) w3m dump (countless.dev)
        
       | xnx wrote:
       | Nice resource. Almost too comprehensive for someone who doesn't
        | know all the sub-version names. Would be great to have a column
        | with the score from the lmarena leaderboard. Some prices are 0.00?
        | Is there a page that each row could link to for more detail?
        
         | ahmetd wrote:
          | thank you! some models have either N/A or 0.00; I found it's
          | like that for the free models and ones that aren't available.
          | 
          | As for lmarena, I'll definitely add it, a lot of other people
          | recommended it as well.
          | 
          | Over time I'll make the website more descriptive and detailed!
        
         | kmoser wrote:
         | And a link to the company page where one can use/subscribe to
         | the model
        
       | mtkd wrote:
        | Would possibly be further useful to have a release date column,
        | license type, whether EU-restricted, and also to right-align /
        | comma-delimit those numeric cells
        
         | ahmetd wrote:
         | good idea, will look into adding this!
        
       | mcklaw wrote:
        | It would be great if lmarena leaderboard information would also
        | appear, to compare performance vs cost.
        
         | ahmetd wrote:
         | yep, will add this :)
        
         | ursaguild wrote:
         | https://lmarena.ai
        
       | Its_Padar wrote:
        | Would be great if it were possible to get to the page where the
        | pricing was found, to make it easier to use the model
        
       | ursaguild wrote:
       | I like the idea of more comparisons of models. Are there plans to
       | add independent analyses of these models or is it only an
       | aggregation of input limits?
       | 
       | How do you see this differing from or adding to other analyses
       | such as:
       | 
       | https://artificialanalysis.ai
       | 
       | https://huggingface.co/spaces/TTS-AGI/TTS-Arena
       | 
       | https://huggingface.co/spaces/hf-audio/open_asr_leaderboard
       | 
       | https://huggingface.co/spaces/TIGER-Lab/GenAI-Arena
       | 
       | Great work on all the aggregation. The website is nice to
       | navigate.
        
         | ahmetd wrote:
         | the gradio ui looks ugly imo, that's why I used shadcn and
         | next.js to make the website look good.
         | 
         | I'll try to make it as user-friendly as possible. Most of the
         | websites are ugly + too technical.
        
         | botro wrote:
         | I made https://aimodelreview.com/ to compare the outputs of
         | LLMs over a variety of prompts and categories, allowing a side
         | by side comparison between them. I ran each prompt 4 times for
         | different temperature values and that's available as a toggle.
         | 
         | I was going to add reviews on each model but ran out of steam.
         | Some users have messaged me saying the comparisons are still
         | helpful to them in getting a sense of how different models
         | respond to the same prompt and how temperature affects the same
          | model's output on the same prompt.
        
       | ursaguild wrote:
       | Just saw that this was built for a hackathon. Huge kudos and
       | congratulations!
        
         | ahmetd wrote:
         | thank you! although I wasn't able to win the hackathon it was
         | still a fun experience :)
        
       | politelemon wrote:
        | There are only two audio transcription models. Is this generally
        | true? Are there no open-source ones, like Llama but for
        | transcription? Or is it just a small dataset on that site?
        
         | rhdunn wrote:
         | It looks like the site is only listing hosted models from major
         | providers, not all models available on huggingface, civit.ai,
          | etc. -- Looking at the image generation and chat lists, there
          | are many more models on huggingface that are not listed.
         | 
         | See https://huggingface.co/models?pipeline_tag=automatic-
         | speech-...
         | 
         | Note: Text to Speech and Audio Transcription/Automatic Speech
         | Recognition models can be trained on the same data. They
         | currently require training separately as the models are
         | structured differently. One of the challenges is training time
         | as the data can run into the hundreds of hours of audio.
        
           | politelemon wrote:
           | Thank you both
        
         | woodson wrote:
         | There are lots and lots of models, covering various use cases
         | (e.g., on device, streaming/low-latency, specific languages).
         | People somehow think OpenAI invented audio transcription with
          | Whisper in 2022, when other models exist and have been used in
          | production for decades (Whisper is the only one listed on that
         | website).
        
       | wslh wrote:
       | I'd like to share a personal perspective/rant on AI that might
       | resonate with others: like many, I'm incredibly excited about
       | this AI moment. The urge to dive headfirst into the field and
        | contribute is natural; after all, it's the frontier of innovation
       | right now.
       | 
       | But I think this moment mirrors financial markets during times of
       | frenzy. When markets are volatile, one common piece of advice is
       | to "wait and see". Similarly, in AI, so many brilliant minds and
       | organizations are racing to create groundbreaking innovations.
       | Often, what you're envisioning as your next big project might
       | already be happening, or will soon be, somewhere else in the
       | world.
       | 
       | Adopting a "wait and see" strategy could be surprisingly
       | effective. Instead of rushing in, let the dust settle, observe
       | trends, and focus on leveraging what emerges. In a way, the
       | entire AI ecosystem is working for you: building the foundations
       | for your next big idea.
       | 
       | That said, this doesn't mean you can't integrate the state of the
       | art into your own (working) products and services.
        
         | whiplash451 wrote:
         | Your proposal makes a lot of sense. I assume a number of
         | companies are integrating sota models into their products.
         | 
         | That being said, there is no free lunch: when you're doing
         | this, you're more reactive than proactive. You minimize risk,
          | but you also lose any chance to have a stake [1] in the few
         | survivors that will remain and be extremely valuable.
         | 
         | Do this long enough and you'll have no idea what people are
         | talking about in the field. Watch the latest Dwarkesh Patel
         | episode to get a sense of what I am talking about.
         | 
         | [1] stake to be understood broadly as: shares in a company,
         | knowledge as an AI researcher, etc.
        
           | wslh wrote:
           | Thank you for your thoughtful response! I completely agree
           | that there's a tradeoff between being proactive and reactive
           | in this kind of strategy: minimizing risk by waiting can mean
           | missing out on opportunities to gain a broader "stake".
           | 
           | That said, my perspective focuses more on strategic timing
           | rather than complete passivity. It's about being engaged with
           | understanding trends, staying informed, and preparing to act
           | decisively when the right opportunity emerges. It's less
           | about "waiting on the sidelines" and more about deliberate
           | pacing, recognizing that it's not always necessary to be at
           | the bleeding edge to create value.
           | 
           | I'll definitely check out Dwarkesh Patel's latest episode. I
           | assume it is the Gwern one, right? Thanks!
        
       | dangoodmanUT wrote:
       | This is missing... so many models... like most TTS and STT ones.
       | 
       | 11labs, deepgram, etc.
        
       | shahzaibmushtaq wrote:
        | It's weird that OpenAI has lower prices for the same models and
        | Azure has higher prices. Can anyone explain?
       | 
       | BTW impressive idea and upvoted on PH as well.
        
         | ahmetd wrote:
         | tysm for the support!
         | 
          | OpenAI and Azure should be the same; it's weird that they show
          | up as different. I'll look into fixing this.
         | 
         | currently #2 on PH, any help would be appreciated!
        
         | xrendan wrote:
         | Azure charges differently based on deployment zone/latency
         | guarantees, OpenAI doesn't let you pick your zone so it's
         | equivalent to the Global Standard deployment (which is the same
         | cost).
         | 
         | [0] https://azure.microsoft.com/en-
         | us/pricing/details/cognitive-...
        
       | mentalgear wrote:
        | This is interesting price-wise, but quality-wise, if you do not
        | provide benchmark results, it's not that helpful a comparison.
        
       | methou wrote:
       | Thank you on behalf of my waifu!
        
       | alif_ibrahim wrote:
        | thanks for the comparison table! would be great if the header were
        | sticky so I don't get lost in identifying which column is which.
        
       | karpatic wrote:
        | Great! I wish there were a "bang for the buck" value: some way to
        | know the cheapest model I could reliably use for creating
        | structured data from unstructured text. Using gpt-4o-mini, which
        | is cheap, but I wouldn't know if anything cheaper could do the
        | job too.
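
        A minimal sketch of the kind of extraction job described above,
        assuming the official OpenAI Python SDK with JSON mode; the schema,
        prompt, and sample text are invented for illustration, and swapping
        the model string is all it takes to trial a cheaper alternative:

            import json
            from openai import OpenAI  # pip install openai

            client = OpenAI()  # reads OPENAI_API_KEY from the environment

            def extract_contact(text: str, model: str = "gpt-4o-mini") -> dict:
                """Turn unstructured text into a small JSON record (toy schema)."""
                resp = client.chat.completions.create(
                    model=model,
                    response_format={"type": "json_object"},  # ask for valid JSON
                    temperature=0,
                    messages=[
                        {"role": "system",
                         "content": "Extract name, company and email as JSON. "
                                    "Use null for any missing field."},
                        {"role": "user", "content": text},
                    ],
                )
                return json.loads(resp.choices[0].message.content)

            print(extract_contact("Met Jane Doe of Acme Corp, email jane@acme.com"))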
        
         | jampa wrote:
         | Take a look at Gemini Flash 1.5. I had videos I needed to turn
         | into structured notes, and the result was satisfactory (even
         | better than the Gemini 1.5 Pro, for some reason).
         | https://jampauchoa.substack.com/i/151329856/ai-studio.
         | 
          | According to this website, the cost is about half that of
          | GPT-4o mini: $0.15 vs $0.07 per 1M tokens.
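
          For scale, a quick back-of-the-envelope using the two input
          prices quoted above (USD per 1M tokens); the workload size is
          made up for illustration:

              # Input-token cost at the per-1M prices quoted above.
              prices = {"gpt-4o-mini": 0.15, "gemini-1.5-flash": 0.07}

              docs, tokens_per_doc = 10_000, 2_000    # hypothetical workload
              total_tokens = docs * tokens_per_doc    # 20M input tokens

              for model, per_million in prices.items():
                  print(f"{model}: ${total_tokens / 1_000_000 * per_million:.2f}")
              # gpt-4o-mini: $3.00
              # gemini-1.5-flash: $1.40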
        
           | nostrebored wrote:
           | Seconding Gemini flash for structured outputs. Have had some
           | quite large jobs I've been happy with.
        
         | mcbuilder wrote:
         | I always plug openrouter.ai for making cross-model comparisons.
         | It's my general goto for random stuff. (I am not affiliated,
         | just a user)
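
          A minimal sketch of that cross-model workflow, assuming
          OpenRouter's OpenAI-compatible endpoint and the OpenAI Python
          SDK; the model slugs and the OPENROUTER_API_KEY variable name
          are illustrative:

              import os
              from openai import OpenAI  # pip install openai

              # OpenRouter exposes an OpenAI-compatible API, so comparing
              # models/providers is just a matter of changing one string.
              client = OpenAI(
                  base_url="https://openrouter.ai/api/v1",
                  api_key=os.environ["OPENROUTER_API_KEY"],
              )

              prompt = "In one sentence, explain what an embedding is."
              for model in ("openai/gpt-4o-mini",
                            "meta-llama/llama-3.3-70b-instruct"):
                  resp = client.chat.completions.create(
                      model=model,
                      messages=[{"role": "user", "content": prompt}],
                  )
                  print(model, "->", resp.choices[0].message.content)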
        
           | pickettd wrote:
           | I love the idea of openrouter. I hadn't realized until
           | recently though that you don't necessarily know what
           | quantization a certain provider is running. And of course
           | context size can vary widely from provider to provider for
           | the same model. This blog post had great food for thought
           | https://aider.chat/2024/11/21/quantization.html
        
         | sdesol wrote:
         | I haven't found a model at the price point of GPT-4o mini that
         | is as capable. Based on the hype surrounding Llama 3.3 70B, it
         | might be that one though. On Deepinfra, input tokens are more
          | expensive, but output tokens are cheaper, so I would say they
         | are probably equivalent in price.
         | 
          | Also, best bang for the buck is very subjective, since one
          | person might only need it to work for one use case while
          | somebody else needs it for more.
        
       | gtirloni wrote:
       | Tangent question: is there anything better on the desktop than
       | ChatGPT's native client? I find it too simple to organize chats
       | but I'm having a hard time evaluating the dozen or so apps (most
        | are disguises for some company's API service). Any
       | recommendations? macOS/Linux compatibility preferred.
        
         | ralfhn wrote:
         | There's https://www.typingmind.com/ local-only (no server) and
         | built by an indie dev
        
           | thelittleone wrote:
            | Personally I'm a Typing Mind user, but it got too slow and
            | buggy with long chats. Ended up with BoltAI, which is a
            | native Mac app, and found it very good after months of heavy
            | use. I think it could also improve navigation coloring or
            | iconography to help distinguish chats better, but it's my
            | favorite so far.
        
             | rubymamis wrote:
             | I'm working on a native LLM client that is beautiful and
             | fast[1], developed in Qt C++ and QML - so it can run on
             | Windows, macOS, Linux (and mobile). Would love to get your
             | feedback once it launches.
             | 
             | [1] https://rubymamistvalove.com/client.mp4
        
         | shepherdjerred wrote:
         | I've liked Machato:
         | https://untimelyunicorn.gumroad.com/l/machato
        
       | moralestapia wrote:
       | Hey this is great!
       | 
        | A small suggestion: a toggle to filter between "free" and hosted
        | models.
        | 
        | Reason is, I'm obviously interested in seeing the cheaper models
        | first, but am not interested in self-hosting, which dominates the
        | first chunk of results because those models are "free".
        
       | vunderba wrote:
       | OP, were you inspired by this LLM comparison tool?
       | 
       | https://whatllm.vercel.app
       | 
       | The tables are very similar - though you've added a custom
       | calculator which is a nice touch.
       | 
       | Also for the Versus Comparison, it might be nice to have a
       | checkbox that when clicked highlights the superlative fields of
       | each LLM at a glance.
        
         | xnx wrote:
         | Thanks for sharing. That's a better tool.
        
           | andrewmcwatters wrote:
           | Both seem to have great value. Some information is missing
           | from Vercel's tables.
        
         | Gcam wrote:
          | Data in this tool is from https://artificialanalysis.ai/ on
          | October 13 2024 and so is a little out of date.
          | 
          | This page has up-to-date information on all models and
          | providers: https://artificialanalysis.ai/leaderboards/providers
          | On other pages we also cover Speech to Text, Text to Speech,
          | Text to Image, and Text to Video.
         | 
         | Note I'm one of the creators of Artificial Analysis.
        
       | amelius wrote:
       | I'm missing the "IQ" column.
        
       | wiradikusuma wrote:
       | Suggestions:
       | 
        | 1. Maybe explain what Chat, Embedding, Image generation,
        | Completion, Audio transcription, and TTS (Text To Speech) mean?
        | 
        | 2. Put a running number on the left, or at least show the total?
        
       | NoZZz wrote:
       | Stop feeding their machine.
        
       | e-clinton wrote:
        | DeepInfra prices are significantly better than what's listed for
        | open-source models.
        
       | robbiemitchell wrote:
       | One helpful addition would be Requests Per Minute (RPM), which
       | varies wildly and is critical for streaming use cases --
       | especially with Bedrock where the quota is account wide.
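
        A naive client-side throttle sketch of the guard an account-wide
        RPM quota forces on you; the 50 RPM figure is hypothetical, and a
        real setup would need the limit shared across every process
        hitting the same account:

            import time
            from collections import deque

            class RpmThrottle:
                """Naive sliding-window limiter: block until a slot frees up."""
                def __init__(self, rpm: int):
                    self.rpm = rpm
                    self.sent = deque()  # timestamps of recent requests

                def wait(self):
                    now = time.monotonic()
                    while self.sent and now - self.sent[0] > 60:
                        self.sent.popleft()   # forget requests older than 60 s
                    if len(self.sent) >= self.rpm:
                        time.sleep(max(0.0, 60 - (now - self.sent[0])))
                        self.sent.popleft()
                    self.sent.append(time.monotonic())

            throttle = RpmThrottle(rpm=50)    # hypothetical account-wide quota
            for _ in range(200):
                throttle.wait()
                print("slot acquired; issue the streaming model call here")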
        
       | tomp wrote:
       | "every"
       | 
       | you're missing a lot
       | 
       | TTS: 11labs, PlayHT, Cartesia, iFLYTEK, AWS Polly, Deepgram Aura
       | 
       | STT: Deepgram (multiple models, including Whisper), Gladia
       | Whisper, Soniox
       | 
       | just off the top of my head (it's my dayjob!)
        
       | ProofHouse wrote:
        | These are hard to keep updated. I find they usually fall off. It
        | would be cool to have one, but honestly, this one already doesn't
        | even have 4o and pro on it, which it obviously would if it were
        | being maintained. Updating a table shouldn't take days; it's like
        | a one-minute event.
        
       | SubiculumCode wrote:
        | I was surprised: which model costs the most per token?
        | Luminous-Supreme-Control
        
       ___________________________________________________________________
       (page generated 2024-12-07 23:00 UTC)