[HN Gopher] Show HN: Prompt Engine - Auto pick LLMs based on you...
       ___________________________________________________________________
        
       Show HN: Prompt Engine - Auto pick LLMs based on your prompts
        
        Nowadays, a common AI tech stack has hundreds of different prompts
        running across different LLMs. Three key problems:
        
        - Choice: picking the best LLM for a single prompt out of
          hundreds is hard, and you're probably not picking the most
          optimized LLM for a prompt you wrote.
        
        - Scaling/upgrading: similar to choice, but you want your output
          to stay consistent even when models are deprecated or
          configurations change.
        
        - Prompt management is scary: if something works, you never want
          to touch it, but you should be able to without fear of
          everything breaking.
        
        So we launched Prompt Engine, which automatically runs each of
        your prompts on the best LLM every single time, with tools like
        internet access included. You can also store prompts for reuse
        and caching, which improves performance on every run.
        
        How does it work? tl;dr: we built a really small model, trained
        on datasets comparing hundreds of LLMs, that automatically picks
        a model based on your prompt. Here's an article explaining the
        details:
        https://jigsawstack.com/blog/jigsawstack-mixture-of-agents-m...
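        
        As a rough illustration (this is not the actual engine; the
        model names and the scoring heuristic below are made up), a
        prompt router boils down to a small classifier that maps a
        prompt to a model ID:
        
            // Hypothetical sketch of prompt-based model routing.
            // Model names and heuristics are illustrative only.
            type ModelId = "small-fast" | "code-specialist" | "large-general";
        
            function routePrompt(prompt: string): ModelId {
              // A real router would use a trained model; this toy
              // version keys off surface features of the prompt.
              const looksLikeCode = /function |class |def |SELECT /.test(prompt);
              const isLong = prompt.length > 2000;
              if (looksLikeCode) return "code-specialist";
              if (isLong) return "large-general";
              return "small-fast"; // cheap default for simple prompts
            }
        
            // Usage: pick a model, then send the prompt to it.
            const model = routePrompt("Summarize this paragraph in one line.");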
        
       Author : yoeven
       Score  : 82 points
       Date   : 2024-12-06 12:45 UTC (10 hours ago)
        
 (HTM) web link (jigsawstack.com)
 (TXT) w3m dump (jigsawstack.com)
        
       | soco wrote:
        | I can't find that list of hundreds of models to save my life...
        | is it posted somewhere?
        
       | ankit219 wrote:
        | The small model you trained, how did you annotate the
        | dataset? Because most of the time, the output from big models
        | will be subjective, with no drastic difference in quality. If
        | there is a drastic difference, you don't need a
        | model/classifier. For smaller differences in quality + cost,
        | what would the annotated dataset look like?
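        | 
        | For instance, would a single row look something like this
        | (purely hypothetical schema)?
        | 
        |     // Hypothetical annotation row comparing two models on one prompt
        |     const row = {
        |       prompt: "Summarize this contract clause.",
        |       candidates: [
        |         { model: "model-a", quality: 4.1, costPer1kTokens: 0.01 },
        |         { model: "model-b", quality: 4.3, costPer1kTokens: 0.06 },
        |       ],
        |       label: "model-a", // best quality-per-cost -- but labeled how?
        |     };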
        
       | swyx wrote:
       | congrats yoeven! i have been a skeptic of model routing like
       | Martian, because while it sounds good in theory,
       | 
        | 1) in practice, people tune prompts/behavior/output to models
        | and don't dynamically switch all the time. to quote a notable
        | founder i interviewed - "people who switch models all the time
        | aren't building serious AI apps"
       | 
        | 2) the prompt router, to properly do its job at the extreme,
        | will have to be as smart as its smartest model, because dumb
        | models may not recognize a tricky/tough prompt that requires
        | an upgrade, at which point you're basically just reduced to
        | running the smartest model anyway. smart people disagree with
        | me here (you guys, and https://latent.space/p/lmarena). the
        | other side of this argument is that there are only like 3-4
        | usage modes for models to really spike on (coding, roleplay,
        | function calling, what else) where you'll just pick that model
        | and hardcode it or let the user pick - the scenario where you
        | want a black box to pick for you is rare, and diminishes over
        | time as all the labs are hellbent on bitter-lessoning your
        | switching advantage away. bad idea to bet against the bitter
        | lesson
       | 
        | 3) both OAI and Anthropic will offer "good enough" routing for
        | house models soon https://x.com/swyx/status/1861229884405883210 .
        | people don't need theoretically globally perfect routing, they
        | just need good enough.
       | 
        | it seems Prompt Engine is a little fancier than model routing,
        | but effectively still reads like routing to me. curious to
        | hear your responses to the criticism.
        
         | htrp wrote:
          | I'd also ask how you're dealing with the idiosyncrasies
          | across model families (it's very different to prompt Gemini
          | vs Claude vs GPT-4o) when you're routing these LLM inputs
        
           | swyx wrote:
           | > The engine automatically enhances the initial prompt to
           | improve accuracy, reduce token usage, and prevent output
           | structure breakage.
           | 
            | <handwaving> the prompt will just adapt to the model :)
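            | 
            | presumably something like this under the hood
            | (hypothetical, not their code):
            | 
            |     // Toy per-model prompt adaptation (hypothetical)
            |     const adapters: Record<string, (p: string) => string> = {
            |       // claude-style: wrap instructions in XML-ish tags
            |       claude: (p) => `<instructions>${p}</instructions>`,
            |       // gpt-style: pass through unchanged
            |       "gpt-4o": (p) => p,
            |     };
            |     const prompt = "Extract the dates as JSON.";
            |     const adapted = (adapters["claude"] ?? ((p: string) => p))(prompt);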
        
       | mg wrote:
        | Is there an LLM that can solve the following simple coding
        | task?
        | 
        |     Make a simple HTML page which uses the VideoEncoder API
        |     to create a video that the user can download.
        | 
        | Since the VideoEncoder API is made for this exact use case and
        | is publicly available, an LLM should be able to figure it out.
        | But I have yet to see an LLM answer with a working solution.
        
         | Kiro wrote:
         | No, you can't do that with just the VideoEncoder API, which
         | only produces raw encoded frames. You need container muxing to
         | create something playable, which is far from a "simple coding
         | task".
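          | 
          | To see why, here is roughly what the API hands you (a
          | sketch; the codec config is illustrative): the output
          | callback receives raw EncodedVideoChunk objects, not a
          | playable file.
          | 
          |     // Sketch: VideoEncoder emits raw chunks, not a video file.
          |     const chunks: EncodedVideoChunk[] = [];
          |     const encoder = new VideoEncoder({
          |       output: (chunk) => chunks.push(chunk), // raw encoded frames
          |       error: (e) => console.error(e),
          |     });
          |     encoder.configure({ codec: "vp8", width: 320, height: 240 });
          |     // ...encode VideoFrames from a canvas, then flush:
          |     encoder.flush().then(() => {
          |       // `chunks` is not downloadable as a video; a container
          |       // muxer (e.g. WebM) still has to wrap the chunks first.
          |     });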
         | 
         | Also, how is this relevant to the submission?
        
           | throw2342412314 wrote:
           | > Also, how is this relevant to the submission?
           | 
           | The title of the submission states "Auto pick LLMs based on
           | your prompt".
           | 
           | The GP provided a prompt where auto picking an LLM would
           | possibly help. Seems relevant to me. Even if the answer from
           | the best LLM is, "This isn't directly possible, here are
           | alternatives".
        
           | denuoweb wrote:
           | I got it to work with this prompt using GPT-o1:
           | 
           | Make a HTML page which uses the VideoEncoder API to create a
           | video that the user can download. Make sure to incorporate a
           | roll your own container muxing. Do not limit yourself on the
           | header or data.
           | 
           | https://chatgpt.com/share/67531f7c-56cc-800b-ac7c-d3860d1cf9.
           | ..
        
             | mg wrote:
             | Yay, I just tried it on my iPad and it works!
             | 
             | When you say "GPT-o1", do you mean the model "o1-preview"?
             | Because I think that is the only o1 I can access via the
             | API.
        
               | denuoweb wrote:
               | I believe they may have just changed GPT-o1-preview to
               | GPT-o1 today.
        
         | fragmede wrote:
          | Took 4 prompts, and ChatGPT-4o decided to use a different
          | API, but I got it to make a page that generates a 3-second
          | webm to download.
         | https://chatgpt.com/share/67531a38-4bfc-8009-bc58-9c823230bf...
         | 
          | Detractors will claim that it didn't complete the assignment
          | because it didn't use the prescribed VideoEncoder API, but
          | the end result, a simple HTML page that generates a 3-second
          | webm file, speaks for itself.
        
       ___________________________________________________________________
       (page generated 2024-12-06 23:01 UTC)