[HN Gopher] Hypermode Model Router Preview - OpenRouter Alternative
___________________________________________________________________
Hypermode Model Router Preview - OpenRouter Alternative
Author : iamtherhino
Score : 29 points
Date : 2025-05-08 16:29 UTC (6 hours ago)
(HTM) web link (hypermode.com)
(TXT) w3m dump (hypermode.com)
| jbellis wrote:
| What I'm seeing with Brokk (https://brokk.ai) is that models are
| not really interchangeable for code authoring. Even with frontier
| models like GP2.5 and Sonnet 3.7, Sonnet is significantly better
| about following instructions ("don't add redundant comments")
| while GP2.5 has more raw intelligence. So we're using litellm to
| create a unified API to consume, but the premise of "route your
| requests to whatever model is responding fastest" doesn't seem
| that attractive.
|
| But OpenRouter is ridiculously popular so it must be very useful
| for other use cases!
| johnymontana wrote:
| I think the value here is being able to have a unified API to
| access hosted open source models and proprietary models. And
| then being able to switch between models without changing any
| code. Model optionality was one of the factors Hypermode called
| out in the 12 Factor Agentic App:
| https://hypermode.com/blog/the-twelve-factor-agentic-app
|
| Also, being able to use models from multiple services and open
| source models without signing up for another service / bring
| your own API key is a big accelerator for folks getting started
| with Hypermode agents.
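|
| A rough sketch of what "switch between models without changing any
| code" looks like in practice, assuming an OpenAI-compatible chat
| completions endpoint (the env var names here are made up for
| illustration):
|
|     import os
|     from openai import OpenAI
|
|     # The model is configuration, not code: swapping models is an
|     # env var change rather than a change to the calling code.
|     client = OpenAI(
|         base_url="https://models.hypermode.host/v1",
|         api_key=os.environ["HYPERMODE_API_KEY"],
|     )
|
|     response = client.chat.completions.create(
|         model=os.environ.get(
|             "MODEL", "meta-llama/llama-4-scout-17b-16e-instruct"),
|         messages=[{"role": "user", "content": "What is Dgraph?"}],
|     )
|     print(response.choices[0].message.content)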
| iamtherhino wrote:
| Hey! Co-founder of Hypermode here.
|
| Agreed that swapping models for code-gen doesn't make sense.
| We're mostly indexed on GPT-4.1 for our AgentBuilder product. I
| haven't found moving between models for code-gen to be super
| effective.
|
| The most popular use case we've seen from folks is on the
| iteration/experimentation phase of building an agent/tool. We
| made ModelRouter originally as an internal service for our
| "prompt to agent" product, where folks are trying a few dozen
| models/MCPs/tools/data/etc really quickly as they try to find a
| local maximum for some automation or job.
| 0xDEAFBEAD wrote:
| Are there any of these tools which will use your evals to
| automatically recommend a model to use? Imagine if you didn't
| need to follow model releases anymore, and you just had a
| heuristic that would automatically select the right
| price/performance tradeoff. Maybe there's even a way to route
| queries differently to more expensive models depending on how
| tricky they are.
|
| (This would be more for using models at scale in production as
| opposed to individual use for code authoring etc.)
| jbellis wrote:
| Yeah, that seems possible, but a dumb preprocessing step
| won't help and a smart one will add significant latency.
|
| Feels a bit halting-problem-ish: can you tell if a problem is
| too hard for model A without being smarter than model A
| yourself?
| 0xDEAFBEAD wrote:
| I imagine if your volume is high enough it could be
| worthwhile to at least check to see if simple preprocessing
| gets you anywhere.
|
| Basically compare model performance on a bunch of problems,
| and see if the queries which actually require an expensive
| model have anything in common (e.g. low Flesch-Kincaid
| readability, or a bag-of-words approach which tries to
| detect the frequency of subordinate clauses/potentially
| ambiguous pronouns, or word rarity, or whatever).
|
| Maybe my knowledge of old-school NLP methods is useful
| after all :-) Generally those methods tend to be far less
| compute-intensive. If you wanted to go really crazy on
| performance, you might even use a Bloom filter to do fast,
| imprecise counting of words of various types.
|
| Then you could add some old-school, compute-lite ML, like
| an ordinary linear regression on the old-school-NLP-derived
| features.
|
| Really the win would be for a company like Hypermode to
| implement this automatically for customers who want it
| (high volume customers who don't mind saving money).
|
| Actually, a company like Hypermode might be uniquely well-
| positioned to offer this service to _smaller_ customers as
| well, if query difficulty heuristics generalize well across
| different workloads. Assuming they have access to data for
| a large variety of customers, they could look for
| heuristics that generalize well.
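|
| Very roughly, something like this -- the features, the toy training
| data and the 0.5 threshold are all placeholders for whatever the
| logged traffic actually supports:
|
|     from sklearn.linear_model import LogisticRegression
|
|     def cheap_features(text: str) -> list[float]:
|         # Compute-lite, old-school-NLP-ish features of the query.
|         words = text.split()
|         sentences = max(text.count(".") + text.count("?") + text.count("!"), 1)
|         return [
|             len(words),                                       # length
|             len(words) / sentences,                           # words per sentence
|             sum(len(w) for w in words) / max(len(words), 1),  # avg word length
|             sum(w.lower() in {"it", "this", "that", "they"} for w in words),
|         ]
|
|     # Toy stand-in for logged queries labeled "needed the expensive
|     # model" (1) vs. "cheap model was good enough" (0).
|     logged = [
|         ("What is 2 + 2?", 0),
|         ("Translate 'good morning' into French.", 0),
|         ("Refactor the scheduler so it no longer has a circular import "
|          "with the worker pool, without breaking the plugin API.", 1),
|         ("Prove that this caching strategy is safe under concurrent "
|          "writers and sketch the invariant it relies on.", 1),
|     ]
|     router = LogisticRegression().fit(
|         [cheap_features(q) for q, _ in logged],
|         [label for _, label in logged],
|     )
|
|     def pick_model(query: str) -> str:
|         p_hard = router.predict_proba([cheap_features(query)])[0][1]
|         return "expensive-frontier-model" if p_hard > 0.5 else "cheap-small-model"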
| iamtherhino wrote:
| I really like this approach.
|
| I think there's a big advantage to be had for folks
| bringing "old school" ML approaches to LLMs. We've been
| spending a lot of time looking at the expert systems from
| the 90s.
|
| Another one we've been looking at is applying some query
| planning approaches to these systems to see if we can
| pull responses from cache instead of invoking the model
| again.
|
| Obviously there's a lot of complexity in identifying
| where we could apply some smaller ML models or a cache--
| but it's been a really fun exploration.
| 0xDEAFBEAD wrote:
| >We've been spending a lot of time looking at the expert
| systems from the 90s.
|
| No way. I would definitely be curious to hear more if you
| want to share.
| iamtherhino wrote:
| We've been playing with that in the background. I can try to
| shoot you a preview in a few weeks. It works pretty well for
| reasoning tasks/NLP workloads but for workloads that need a
| "correct" answer, it's really tough to maintain accuracy when
| swapping models.
|
| What we've seen most successful is making recommendations in
| the agent creation process for a given tool/workload and then
| leaving them somewhat static after creation.
| 0xDEAFBEAD wrote:
| That's fair. Maybe you could even send the user an email if
| you detect a new model release or pricing change which
| handles their workload for cheaper at comparable quality,
| to notify them to investigate.
| iamtherhino wrote:
| That's a good idea-- then give them a link to "replay
| last X inferences with model ABC" so they can do a quick
| eyeball eval.
| 0xDEAFBEAD wrote:
| Sweet, maybe you'll like my other idea in this thread
| too: https://news.ycombinator.com/item?id=43929194
| threeducks wrote:
| The Python API example looks like it has been written by an LLM.
| You don't need to import json, you don't need to set the content
| type and it is good practice to use context managers ("with"
| statement) to release the connection in case of exceptions. Also,
| you don't gain anything by commenting variables with the name of
| the variable.
|
| The following sample (probably) does the same thing and is almost
| half as long. I have not tested it because there is no signup
| (EDIT: I was mistaken, there actually is a "signup" behind the
| login link, which is Google or GitHub login, so the naming makes
| sense. I confused it with a previously more prominent waitlist
| link.)
|
|     import requests
|
|     # Your Hypermode Workspace API key
|     api_key = "<YOUR_HYP_WKS_KEY>"
|
|     # Use the Hypermode Model Router API endpoint
|     url = "https://models.hypermode.host/v1/chat/completions"
|     headers = {"Authorization": f"Bearer {api_key}"}
|
|     payload = {
|         "model": "meta-llama/llama-4-scout-17b-16e-instruct",
|         "messages": [
|             {"role": "system", "content": "You are a helpful assistant."},
|             {"role": "user", "content": "What is Dgraph?"},
|         ],
|         "max_tokens": 150,
|         "temperature": 0.7,
|     }
|
|     # Make the API request
|     with requests.post(url, headers=headers, json=payload) as response:
|         response.raise_for_status()
|         print(response.json()["choices"][0]["message"]["content"])
| iamtherhino wrote:
| Signups are open: hypermode.com/sign-up
|
| There's a waitlist for our prompt to agent product in the
| banner. Good call, though; I'll update it to be clearer.
| threeducks wrote:
| Oh, I did not catch that. Sorry!
| iamtherhino wrote:
| Not at all! I'm updating the banner now
| iamtherhino wrote:
| Updated our Python example too!
| KTibow wrote:
| `post` automatically releases the connection. `with` only makes
| sense when you use a `requests.Session()`.
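|
| For example (url, headers and payload as in the snippet above):
|
|     import requests
|
|     # A Session reuses the underlying connection across requests,
|     # so the context manager actually has something to manage.
|     with requests.Session() as session:
|         response = session.post(url, headers=headers, json=payload)
|         response.raise_for_status()
|         print(response.json()["choices"][0]["message"]["content"])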
| hobo_mark wrote:
| Is there something like OpenRouter, but for text-to-speech
| models?
| iamtherhino wrote:
| I haven't seen one yet-- no reason we couldn't do that with
| Hypermode. I'll do some exploration!
___________________________________________________________________
(page generated 2025-05-08 23:01 UTC)