[HN Gopher] Hypermode Model Router Preview - OpenRouter Alternative
___________________________________________________________________
Hypermode Model Router Preview - OpenRouter Alternative
Author : iamtherhino
Score : 29 points
Date : 2025-05-08 16:29 UTC (6 hours ago)
(HTM) web link (hypermode.com)
(TXT) w3m dump (hypermode.com)
| jbellis wrote:
| What I'm seeing with Brokk (https://brokk.ai) is that models are
| not really interchangeable for code authoring. Even with frontier
| models like GP2.5 and Sonnet 3.7, Sonnet is significantly better
| about following instructions ("don't add redundant comments")
| while GP2.5 has more raw intelligence. So we're using litellm to
| create a unified API to consume, but the premise of "route your
| requests to whatever model is responding fastest" doesn't seem
| that attractive.
|
| But OpenRouter is ridiculously popular so it must be very useful
| for other use cases!
| johnymontana wrote:
| I think the value here is being able to have a unified API to
| access hosted open source models and proprietary models. And
| then being able to switch between models without changing any
| code. Model optionality was one of the factors Hypermode called
| out in the 12 Factor Agentic App:
| https://hypermode.com/blog/the-twelve-factor-agentic-app
|
| Also, being able to use models from multiple services and open
| source models without signing up for another service / bring
| your own API key is a big accelerator for folks getting started
| with Hypermode agents.
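|
| A rough sketch of what "switch between models without changing any
| code" looks like in practice, assuming an OpenAI-compatible chat
| completions endpoint (the env var names here are made up for
| illustration):
|
|     import os
|     from openai import OpenAI
|
|     # The model is configuration, not code: swapping models is an
|     # env var change rather than a change to the calling code.
|     client = OpenAI(
|         base_url="https://models.hypermode.host/v1",
|         api_key=os.environ["HYPERMODE_API_KEY"],
|     )
|
|     response = client.chat.completions.create(
|         model=os.environ.get(
|             "MODEL", "meta-llama/llama-4-scout-17b-16e-instruct"),
|         messages=[{"role": "user", "content": "What is Dgraph?"}],
|     )
|     print(response.choices[0].message.content)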
| iamtherhino wrote:
| Hey! Co-founder of Hypermode here.
|
| Agreed that swapping models for code-gen doesn't make sense.
| We're mostly indexed on GPT-4.1 for our AgentBuilder product. I
| haven't found moving between models for code-gen to be super
| effective.
|
| The most popular use case we've seen from folks is on the
| iteration/experimentation phase of building an agent/tool. We
| made ModelRouter originally as an internal service for our
| "prompt to agent" product, where folks are trying a few dozen
| models/MCPs/tools/data/etc really quickly as they try to find a
| local maximum for some automation or job.
| 0xDEAFBEAD wrote:
| Are there any of these tools which will use your evals to
| automatically recommend a model to use? Imagine if you didn't
| need to follow model releases anymore, and you just had a
| heuristic that would automatically select the right
| price/performance tradeoff. Maybe there's even a way to route
| queries differently to more expensive models depending on how
| tricky they are.
|
| (This would be more for using models at scale in production as
| opposed to individual use for code authoring etc.)
| jbellis wrote:
| Yeah, that seems possible, but a dumb preprocessing step
| won't help and a smart one will add significant latency.
|
| Feels a bit halting-problem-ish: can you tell if a problem is
| too hard for model A without being smarter than model A
| yourself?
| 0xDEAFBEAD wrote:
| I imagine if your volume is high enough it could be
| worthwhile to at least check to see if simple preprocessing
| gets you anywhere.
|
| Basically compare model performance on a bunch of problems,
| and see if the queries which actually require an expensive
| model have anything in common (e.g. low Flesch-Kincaid
| readability, or a bag-of-words approach which tries to
| detect the frequency of subordinate clauses/potentially
| ambiguous pronouns, or word rarity, or whatever).
|
| Maybe my knowledge of old-school NLP methods is useful
| after all :-) Generally those methods tend to be far less
| compute-intensive. If you wanted to go really crazy on
| performance, you might even use a Bloom filter to do fast,
| imprecise counting of words of various types.
|
| Then you could add some old-school, compute-lite ML, like
| an ordinary linear regression on the old-school-NLP-derived
| features.
|
| Really the win would be for a company like Hypermode to
| implement this automatically for customers who want it
| (high volume customers who don't mind saving money).
|
| Actually, a company like Hypermode might be uniquely well-
| positioned to offer this service to _smaller_ customers as
| well, if query difficulty heuristics generalize well across
| different workloads. Assuming they have access to data for
| a large variety of customers, they could look for
| heuristics that generalize well.
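|
| Very roughly, something like this -- the features, the toy training
| data and the 0.5 threshold are all placeholders for whatever the
| logged traffic actually supports:
|
|     from sklearn.linear_model import LogisticRegression
|
|     def cheap_features(text: str) -> list[float]:
|         # Compute-lite, old-school-NLP-ish features of the query.
|         words = text.split()
|         sentences = max(text.count(".") + text.count("?") + text.count("!"), 1)
|         return [
|             len(words),                                       # length
|             len(words) / sentences,                           # words per sentence
|             sum(len(w) for w in words) / max(len(words), 1),  # avg word length
|             sum(w.lower() in {"it", "this", "that", "they"} for w in words),
|         ]
|
|     # Toy stand-in for logged queries labeled "needed the expensive
|     # model" (1) vs. "cheap model was good enough" (0).
|     logged = [
|         ("What is 2 + 2?", 0),
|         ("Translate 'good morning' into French.", 0),
|         ("Refactor the scheduler so it no longer has a circular import "
|          "with the worker pool, without breaking the plugin API.", 1),
|         ("Prove that this caching strategy is safe under concurrent "
|          "writers and sketch the invariant it relies on.", 1),
|     ]
|     router = LogisticRegression().fit(
|         [cheap_features(q) for q, _ in logged],
|         [label for _, label in logged],
|     )
|
|     def pick_model(query: str) -> str:
|         p_hard = router.predict_proba([cheap_features(query)])[0][1]
|         return "expensive-frontier-model" if p_hard > 0.5 else "cheap-small-model"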
| iamtherhino wrote:
| I really like this approach.
|
| I think there's a big advantage to be had for folks
| bringing "old school" ML approaches to LLMs. We've been
| spending a lot of time looking at the expert systems from
| the 90s.
|
| Another one we've been looking at is applying some query
| planning approaches to these systems to see if we can
| pull responses from cache instead of invoking the model
| again.
|
| Obviously there's a lot of complexity in identifying
| where we could apply some smaller ML models or a cache--
| but it's been a really fun exploration.
| 0xDEAFBEAD wrote:
| >We've been spending a lot of time looking at the expert
| systems from the 90s.
|
| No way. I would definitely be curious to hear more if you
| want to share.
| iamtherhino wrote:
| We've been playing with that in the background. I can try to
| shoot you a preview in a few weeks. It works pretty well for
| reasoning tasks/NLP workloads but for workloads that need a
| "correct" answer, it's really tough to maintain accuracy when
| swapping models.
|
| What we've seen most successful is making recommendations in
| the agent creation process for a given tool/workload and then
| leaving them somewhat static after creation.
| 0xDEAFBEAD wrote:
| That's fair. Maybe you could even send the user an email if
| you detect a new model release or pricing change which
| handles their workload for cheaper at comparable quality,
| to notify them to investigate.
| iamtherhino wrote:
| That's a good idea-- then give them a link to "replay
| last X inferences with model ABC" so they can do a quick
| eyeball eval.
| 0xDEAFBEAD wrote:
| Sweet, maybe you'll like my other idea in this thread
| too: https://news.ycombinator.com/item?id=43929194
| threeducks wrote:
| The Python API example looks like it has been written by an LLM.
| You don't need to import json, you don't need to set the content
| type and it is good practice to use context managers ("with"
| statement) to release the connection in case of exceptions. Also,
| you don't gain anything by commenting variables with the name of
| the variable.
|
| The following sample (probably) does the same thing and is almost
| half as long. I have not tested it because there is no signup
| (EDIT: I was mistaken, there actually is a "signup" behind the
| login link, which is Google or GitHub login, so the naming makes
| sense. I confused it with a previously more prominent waitlist
| link.)
|
|     import requests
|
|     # Your Hypermode Workspace API key
|     api_key = "<YOUR_HYP_WKS_KEY>"
|
|     # Use the Hypermode Model Router API endpoint
|     url = "https://models.hypermode.host/v1/chat/completions"
|     headers = {"Authorization": f"Bearer {api_key}"}
|
|     payload = {
|         "model": "meta-llama/llama-4-scout-17b-16e-instruct",
|         "messages": [
|             {"role": "system", "content": "You are a helpful assistant."},
|             {"role": "user", "content": "What is Dgraph?"},
|         ],
|         "max_tokens": 150,
|         "temperature": 0.7,
|     }
|
|     # Make the API request
|     with requests.post(url, headers=headers, json=payload) as response:
|         response.raise_for_status()
|         print(response.json()["choices"][0]["message"]["content"])
| iamtherhino wrote:
| Signups are open: hypermode.com/sign-up
|
| There's a waitlist for our prompt to agent product in the
| banner. Good call, though; I'll update it to be clearer.
| threeducks wrote:
| Oh, I did not catch that. Sorry!
| iamtherhino wrote:
| Not at all! I'm updating the banner now
| iamtherhino wrote:
| Updated our Python example too!
| KTibow wrote:
| `post` automatically releases the connection. `with` only makes
| sense when you use a `requests.Session()`.
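|
| For example (url, headers and payload as in the snippet above):
|
|     import requests
|
|     # A Session reuses the underlying connection across requests,
|     # so the context manager actually has something to manage.
|     with requests.Session() as session:
|         response = session.post(url, headers=headers, json=payload)
|         response.raise_for_status()
|         print(response.json()["choices"][0]["message"]["content"])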
| hobo_mark wrote:
| Is there something like OpenRouter, but for text-to-speech
| models?
| iamtherhino wrote:
| I haven't seen one yet-- no reason we couldn't do that with
| Hypermode. I'll do some exploration!
___________________________________________________________________
(page generated 2025-05-08 23:01 UTC)