[HN Gopher] Show HN: Countless.dev - A website to compare every ...
___________________________________________________________________
Show HN: Countless.dev - A website to compare every AI model: LLMs,
TTSs, STTs
Author : ahmetd
Score : 200 points
Date : 2024-12-07 09:42 UTC (13 hours ago)
(HTM) web link (countless.dev)
(TXT) w3m dump (countless.dev)
| xnx wrote:
| Nice resource. Almost too comprehensive for someone who doesn't
| know all the sub-version names. Would be great to have a column
| of the score from lmarena leaderboard. Some prices are 0.00? Is
| there a page that each row could link to for more detail?
| ahmetd wrote:
| thank you! some models show either N/A or 0.00; I found that's
| the case for the free models and ones that aren't available.
|
| As for lmarena, I'll definitely add it; a lot of other people
| recommended it as well.
|
| over time will make the website more descriptive and detailed!
| kmoser wrote:
| And a link to the company page where one can use/subscribe to
| the model
| mtkd wrote:
| It would possibly be further useful to have a release date
| column, license type, and whether EU-restricted, and also to
| right-align / comma-delimit those numeric cells.
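The right-align / comma-delimit suggestion maps directly onto Python's format spec; a minimal sketch (the 12-character column width is arbitrary):

```python
# Comma-group and right-align numeric cells for a fixed-width text column.
def fmt_cell(n: int, width: int = 12) -> str:
    return f"{n:>{width},}"

for tokens in (2_000_000, 128_000, 4_096):
    print(fmt_cell(tokens))
```

The same `,` and `>` options work in an f-string anywhere a table cell is rendered as text.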
| ahmetd wrote:
| good idea, will look into adding this!
| mcklaw wrote:
| It would be great if lmarena leaderboard information would
| also appear, to compare performance vs cost.
| ahmetd wrote:
| yep, will add this :)
| ursaguild wrote:
| https://lmarena.ai
| Its_Padar wrote:
| Would be great if it were possible to get to the page where
| the pricing was found, to make it easier to use the model.
| ursaguild wrote:
| I like the idea of more comparisons of models. Are there plans to
| add independent analyses of these models or is it only an
| aggregation of input limits?
|
| How do you see this differing from or adding to other analyses
| such as:
|
| https://artificialanalysis.ai
|
| https://huggingface.co/spaces/TTS-AGI/TTS-Arena
|
| https://huggingface.co/spaces/hf-audio/open_asr_leaderboard
|
| https://huggingface.co/spaces/TIGER-Lab/GenAI-Arena
|
| Great work on all the aggregation. The website is nice to
| navigate.
| ahmetd wrote:
| the gradio ui looks ugly imo, that's why I used shadcn and
| next.js to make the website look good.
|
| I'll try to make it as user-friendly as possible. Most of the
| websites are ugly + too technical.
| botro wrote:
| I made https://aimodelreview.com/ to compare the outputs of
| LLMs over a variety of prompts and categories, allowing a side
| by side comparison between them. I ran each prompt 4 times for
| different temperature values and that's available as a toggle.
|
| I was going to add reviews on each model but ran out of steam.
| Some users have messaged me saying the comparisons are still
| helpful to them in getting a sense of how different models
| respond to the same prompt and how temperature affects the same
| models output on the same prompt.
| ursaguild wrote:
| Just saw that this was built for a hackathon. Huge kudos and
| congratulations!
| ahmetd wrote:
| thank you! although I wasn't able to win the hackathon it was
| still a fun experience :)
| politelemon wrote:
| There are only two audio transcription models. Is this
| generally true? Are there no open-source ones, like llama but
| for transcription? Or is it just a small dataset on that site?
| rhdunn wrote:
| It looks like the site is only listing hosted models from major
| providers, not all models available on huggingface, civit.ai,
| etc. -- Looking at the image generation and chat lists there
| are many more models that are on huggingface that are not
| listed.
|
| See https://huggingface.co/models?pipeline_tag=automatic-
| speech-...
|
| Note: Text to Speech and Audio Transcription/Automatic Speech
| Recognition models can be trained on the same data. They
| currently require training separately as the models are
| structured differently. One of the challenges is training time
| as the data can run into the hundreds of hours of audio.
| politelemon wrote:
| Thank you both
| woodson wrote:
| There are lots and lots of models, covering various use cases
| (e.g., on device, streaming/low-latency, specific languages).
| People somehow think OpenAI invented audio transcription with
| whisper in 2022 when other models exist and have been used in
| production for decades (whisper is the only one listed on that
| website).
| wslh wrote:
| I'd like to share a personal perspective/rant on AI that might
| resonate with others: like many, I'm incredibly excited about
| this AI moment. The urge to dive headfirst into the field and
| contribute is natural; after all, it's the frontier of
| innovation right now.
|
| But I think this moment mirrors financial markets during times of
| frenzy. When markets are volatile, one common piece of advice is
| to "wait and see". Similarly, in AI, so many brilliant minds and
| organizations are racing to create groundbreaking innovations.
| Often, what you're envisioning as your next big project might
| already be happening, or will soon be, somewhere else in the
| world.
|
| Adopting a "wait and see" strategy could be surprisingly
| effective. Instead of rushing in, let the dust settle, observe
| trends, and focus on leveraging what emerges. In a way, the
| entire AI ecosystem is working for you: building the foundations
| for your next big idea.
|
| That said, this doesn't mean you can't integrate the state of the
| art into your own (working) products and services.
| whiplash451 wrote:
| Your proposal makes a lot of sense. I assume a number of
| companies are integrating sota models into their products.
|
| That being said, there is no free lunch: when you're doing
| this, you're more reactive than proactive. You minimize risk,
| but you also lose any chance to have a stake [1] in the few
| survivors that will remain and be extremely valuable.
|
| Do this long enough and you'll have no idea what people are
| talking about in the field. Watch the latest Dwarkesh Patel
| episode to get a sense of what I am talking about.
|
| [1] stake to be understood broadly as: shares in a company,
| knowledge as an AI researcher, etc.
| wslh wrote:
| Thank you for your thoughtful response! I completely agree
| that there's a tradeoff between being proactive and reactive
| in this kind of strategy: minimizing risk by waiting can mean
| missing out on opportunities to gain a broader "stake".
|
| That said, my perspective focuses more on strategic timing
| rather than complete passivity. It's about being engaged with
| understanding trends, staying informed, and preparing to act
| decisively when the right opportunity emerges. It's less
| about "waiting on the sidelines" and more about deliberate
| pacing, recognizing that it's not always necessary to be at
| the bleeding edge to create value.
|
| I'll definitely check out Dwarkesh Patel's latest episode. I
| assume it is the Gwern one, right? Thanks!
| dangoodmanUT wrote:
| This is missing... so many models... like most TTS and STT ones.
|
| 11labs, deepgram, etc.
| shahzaibmushtaq wrote:
| It's weird that OpenAI has lower prices and Azure higher
| prices for the same models. Can anyone explain?
|
| BTW impressive idea and upvoted on PH as well.
| ahmetd wrote:
| tysm for the support!
|
| OpenAI and Azure should be the same, it's weird that it shows
| it as different. I'll look into fixing this.
|
| currently #2 on PH, any help would be appreciated!
| xrendan wrote:
| Azure charges differently based on deployment zone/latency
| guarantees, OpenAI doesn't let you pick your zone so it's
| equivalent to the Global Standard deployment (which is the same
| cost).
|
| [0] https://azure.microsoft.com/en-
| us/pricing/details/cognitive-...
| mentalgear wrote:
| This is interesting price-wise, but quality-wise, without
| benchmark results it's not that helpful a comparison.
| methou wrote:
| Thank you on behalf of my waifu!
| alif_ibrahim wrote:
| thanks for the comparison table! would be great if the header is
| sticky so i don't get lost in identifying which column is which.
| karpatic wrote:
| Great! I wish there was a "bang to buck" value. Some way to know
| the cheapest model I could use for creating structured data from
| unstructured text, reliably. I'm using gpt-4o-mini, which is
| cheap, but I wouldn't know if anything cheaper could do the
| job too.
| jampa wrote:
| Take a look at Gemini Flash 1.5. I had videos I needed to turn
| into structured notes, and the result was satisfactory (even
| better than the Gemini 1.5 Pro, for some reason).
| https://jampauchoa.substack.com/i/151329856/ai-studio.
|
| According to this website, the cost is about half that of
| gpt-4o mini: $0.15 vs $0.07 per 1M tokens.
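A back-of-envelope job-cost comparison at the per-1M-token input prices quoted above (taken from the comment, not verified against current price lists):

```python
# Input-token cost at a flat per-1M-token rate; the prices are the
# ones quoted in the thread, not authoritative.
PRICE_PER_M_INPUT = {
    "gpt-4o-mini": 0.15,
    "gemini-1.5-flash": 0.07,
}

def input_cost_usd(model: str, tokens: int) -> float:
    """USD cost of `tokens` input tokens for the given model."""
    return PRICE_PER_M_INPUT[model] * tokens / 1_000_000

# e.g. a 10M-token batch of video transcripts
for model in PRICE_PER_M_INPUT:
    print(model, round(input_cost_usd(model, 10_000_000), 2))
```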
| nostrebored wrote:
| Seconding Gemini flash for structured outputs. Have had some
| quite large jobs I've been happy with.
| mcbuilder wrote:
| I always plug openrouter.ai for making cross-model comparisons.
| It's my general goto for random stuff. (I am not affiliated,
| just a user)
| pickettd wrote:
| I love the idea of openrouter. I hadn't realized until
| recently though that you don't necessarily know what
| quantization a certain provider is running. And of course
| context size can vary widely from provider to provider for
| the same model. This blog post had great food for thought
| https://aider.chat/2024/11/21/quantization.html
| sdesol wrote:
| I haven't found a model at the price point of GPT-4o mini that
| is as capable. Based on the hype surrounding Llama 3.3 70B, it
| might be that one though. On Deepinfra, input tokens are more
| expensive, but output tokens are cheaper, so I would say they
| are probably equivalent in price.
|
| Also, best bang for the buck is very subjective, since one
| person might need it to work for one use case vs somebody else,
| who needs it for more.
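Whether two price schedules are "probably equivalent" depends on the input:output token mix of the workload; a quick sketch (all prices hypothetical, not Deepinfra's actual rates):

```python
def blended_cost_usd(in_price: float, out_price: float,
                     in_tokens: int, out_tokens: int) -> float:
    """Cost of one request; prices are USD per 1M tokens."""
    return (in_price * in_tokens + out_price * out_tokens) / 1_000_000

# Hypothetical: provider A has pricier input, cheaper output than B.
req = dict(in_tokens=3_000, out_tokens=500)
a = blended_cost_usd(0.20, 0.40, **req)
b = blended_cost_usd(0.15, 0.60, **req)
print(a, b)  # which one is cheaper flips as out_tokens grows
```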
| gtirloni wrote:
| Tangent question: is there anything better on the desktop than
| ChatGPT's native client? I find it too simple to organize chats
| but I'm having a hard time evaluating the dozen or so apps (most
| are disguise for some company's API service). Any
| recommendations? macOS/Linux compatibility preferred.
| ralfhn wrote:
| There's https://www.typingmind.com/, local-only (no server)
| and built by an indie dev.
| thelittleone wrote:
| Personally I'm a Typing Mind user, but it got too slow and
| buggy with long chats. Ended up with BoltAI, which is a
| native Mac app, and found it very good after months of heavy
| use. I think it could also improve navigation coloring or
| iconography to help distinguish chats better, but it's my
| favorite so far.
| rubymamis wrote:
| I'm working on a native LLM client that is beautiful and
| fast[1], developed in Qt C++ and QML - so it can run on
| Windows, macOS, Linux (and mobile). Would love to get your
| feedback once it launches.
|
| [1] https://rubymamistvalove.com/client.mp4
| shepherdjerred wrote:
| I've liked Machato:
| https://untimelyunicorn.gumroad.com/l/machato
| moralestapia wrote:
| Hey this is great!
|
| A small suggestion: a toggle to filter between "free" and
| hosted models.
|
| Reason is, I'm obviously interested in seeing the cheaper
| models first, but am not interested in self-hosting, which
| dominates the first chunk of results because those models are
| "free".
| vunderba wrote:
| OP, were you inspired by this LLM comparison tool?
|
| https://whatllm.vercel.app
|
| The tables are very similar - though you've added a custom
| calculator which is a nice touch.
|
| Also for the Versus Comparison, it might be nice to have a
| checkbox that when clicked highlights the superlative fields of
| each LLM at a glance.
| xnx wrote:
| Thanks for sharing. That's a better tool.
| andrewmcwatters wrote:
| Both seem to have great value. Some information is missing
| from Vercel's tables.
| Gcam wrote:
| Data in this tool is from https://artificialanalysis.ai/ as of
| October 13 2024, so it is a little out of date.
|
| This page has up-to-date information on all models and
| providers: https://artificialanalysis.ai/leaderboards/providers
| On other pages we also cover Speech to Text, Text to Speech,
| Text to Image, and Text to Video.
|
| Note I'm one of the creators of Artificial Analysis.
| amelius wrote:
| I'm missing the "IQ" column.
| wiradikusuma wrote:
| Suggestions:
|
| 1. Maybe explain what Chat, Embedding, Image generation,
| Completion, Audio transcription, and TTS (Text To Speech)
| mean?
|
| 2. Put a running number on the left, or at least show the
| total?
| NoZZz wrote:
| Stop feeding their machine.
| e-clinton wrote:
| DeepInfra prices are significantly better than what's listed for
| OS models.
| robbiemitchell wrote:
| One helpful addition would be Requests Per Minute (RPM), which
| varies wildly and is critical for streaming use cases --
| especially with Bedrock where the quota is account wide.
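On the client side, an RPM cap can be respected with a sliding-window throttle; a minimal sketch (the class and its API are illustrative, not part of any provider SDK):

```python
from collections import deque

class RpmThrottle:
    """Sliding-window throttle for a requests-per-minute cap."""
    def __init__(self, rpm: int):
        self.rpm = rpm
        self.stamps = deque()  # monotonic send times of recent requests

    def wait_time(self, now: float) -> float:
        """Seconds to wait so a request sent at `now` stays under the cap."""
        while self.stamps and now - self.stamps[0] >= 60.0:
            self.stamps.popleft()  # drop requests older than the window
        if len(self.stamps) < self.rpm:
            return 0.0
        return 60.0 - (now - self.stamps[0])

    def record(self, now: float) -> None:
        self.stamps.append(now)
```

In a real client you would call `wait_time` with `time.monotonic()`, sleep for the returned duration, then `record` the send.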
| tomp wrote:
| "every"
|
| you're missing a lot
|
| TTS: 11labs, PlayHT, Cartesia, iFLYTEK, AWS Polly, Deepgram Aura
|
| STT: Deepgram (multiple models, including Whisper), Gladia
| Whisper, Soniox
|
| just off the top of my head (it's my dayjob!)
| ProofHouse wrote:
| These are hard to keep updated. I find they usually fall off. It
| would be cool to have one, but honestly, this one already doesn't
| even have 4o and pro on it, which, if it were being
| maintained, it obviously would. Updating a table shouldn't
| take days; it's like a one-minute event.
| SubiculumCode wrote:
| I was surprised: which model costs the most per token?
| Luminous-Supreme-Control.
___________________________________________________________________
(page generated 2024-12-07 23:00 UTC)