[HN Gopher] Betting on DSPy for Systems of LLMs
       ___________________________________________________________________
        
       Betting on DSPy for Systems of LLMs
        
       Author : wavelander
       Score  : 73 points
       Date   : 2024-08-11 02:11 UTC (20 hours ago)
        
 (HTM) web link (blog.isaacmiller.dev)
 (TXT) w3m dump (blog.isaacmiller.dev)
        
       | okigan wrote:
       | Could we have a concise and specific explanation of how DSPy
       | works?
       | 
       | All I've seen are vague definitions of new terms (e.g.,
       | signatures) and "trust me, this is very powerful and will
       | optimize it all for you".
       | 
       | Also, what would be a good way to choose between DSPy and
       | TextGrad?
        
         | curious_cat_163 wrote:
         | My understanding is that it tries many variations of the set
         | of few-shot examples and prompts and picks the ones that work
         | best as the optimized program.
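A toy sketch of the selection loop described above, illustrative only and not DSPy's actual internals: enumerate candidate sets of few-shot demos, score each against a dev set with a task metric, and keep the best-scoring set. All names below, and the stand-in metric, are hypothetical.

```python
from itertools import combinations

def optimize_fewshot(candidates, devset, metric, shots=2):
    """Pick the `shots`-sized subset of demo candidates that scores best
    on the dev set according to `metric` (a toy stand-in for an LM eval)."""
    best_score, best_demos = -1.0, ()
    for demos in combinations(candidates, shots):
        # Average the metric over the dev set for this candidate demo set.
        score = sum(metric(demos, ex) for ex in devset) / len(devset)
        if score > best_score:
            best_score, best_demos = score, demos
    return list(best_demos), best_score

# Hypothetical data: (question, answer) demo pool and a small dev set.
demos_pool = [("2+2?", "4"), ("Capital of France?", "Paris"), ("3+3?", "6")]
dev = [("1+3?", "4"), ("2+4?", "6")]

# Stand-in metric: a demo set "helps" a dev item if it contains an example
# with the same gold answer (replacing a real call to an LLM).
metric = lambda demos, ex: float(any(a == ex[1] for _q, a in demos))

best, score = optimize_fewshot(demos_pool, dev, metric)
# best == [("2+2?", "4"), ("3+3?", "6")], score == 1.0
```

A real optimizer searches a much larger space (demo subsets plus instruction wording) and scores candidates with actual model calls, but the shape of the loop is the same.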
        
         | ktrnka wrote:
         | TextGrad mainly optimizes the prompt text but does not inject
         | few-shot examples. DSPy mainly optimizes the few-shot
         | examples.
         | 
         | At least that's my understanding from reading the textgrad
         | paper recently.
        
       | bart_spoon wrote:
       | The more I've looked at DSPy, the less impressed I am. The
       | design of the project is very confusing, with nonsensical,
       | convoluted abstractions. And for all the discussion surrounding
       | it, I've yet to see someone actually _using_ it for something
       | other than a toy example. I'm not sure I've even seen someone
       | prove it can do what it claims to in terms of prompt
       | optimization.
       | 
       | It reminds me very much of Langchain in that it feels like a
       | rushed, unnecessary set of abstractions that add more friction
       | than actual benefit, and ultimately boils down to an attempt to
       | stake a claim as a major framework in the still very young stages
       | of LLMs, as opposed to solving an actual problem.
        
         | Der_Einzige wrote:
         | Agreed 100%. DSPy and the libraries inspired by it (e.g.,
         | https://github.com/zou-group/textgrad) are nothing more than
         | fancy prompt chains under the hood.
         | 
         | These libraries mostly exist as "cope" for the fact that we
         | don't have good fine-tuning (e.g., LoRA) capabilities for
         | ChatGPT et al., so we try to optimize the prompt instead.
        
           | qeternity wrote:
           | Glad to see others saying this. I haven't looked at it in
           | some months, but I previously realized it's mostly a very
           | complicated way to optimize few-shot prompts. It's hardly
           | the magical black-box optimizer they market it as.
        
           | dmarchand90 wrote:
           | My guess is it will be like Pascal or Smalltalk: an
           | important development for illustrating a concept, but
           | ultimately replaced by something more rigorous.
        
           | isaacbmiller wrote:
           | > _These libraries mostly exist as "cope"_
           | 
           | > _nothing more than fancy prompt chains under the hood_
           | 
           | Some approaches using steering vectors, clever ways of fine-
           | tuning, transfer decoding, some tree search sampling-esque
           | approaches, and others all seem very promising.
           | 
           | DSPy is, yes, ultimately a fancy prompt chain. But even
           | once we integrate some of the other approaches, I don't
           | think it becomes a single-lever problem, where we can change
           | only one thing (e.g., fine-tune a model) and that solves all
           | of our problems.
           | 
           | It will likely always be a combination of the few most
           | powerful levers to pull.
        
             | Der_Einzige wrote:
             | Correct. When I say "ChatGPT et al", I mean closed-source,
             | paywalled LLMs; open-access LLM personalization is an
             | extreme game-changer. All of what you mentioned is
             | important, and I'm particularly excited about PyReft.
             | 
             | https://github.com/stanfordnlp/pyreft
             | 
             | Anything Christopher Manning touches turns to gold.
        
         | curious_cat_163 wrote:
         | The abstractions could be cleaner. I think some of the
         | convolution is due to the evolution the project has
         | undergone; the core contributors have not fully come around
         | to being "out with the old".
         | 
         | I think there might be practical benefits to it. The XMC
         | example illustrates it for me:
         | 
         | https://github.com/KarelDO/xmc.dspy
        
         | isaacbmiller wrote:
         | Disclaimer: original blog author
         | 
         | > _as opposed to solving an actual problem_
         | 
         | This was literally the point of the post. No one really knows
         | what the future of LLMs will look like, so DSPy just
         | iteratively improves your program as best it can against your
         | metric (your problem).
         | 
         | > _someone actually using it for something other than a toy
         | example_
         | 
         | DSPy does have some scalability problems, among the issues I
         | listed in the post, and I won't dispute that. But there are at
         | least early signs of enterprise adoption, e.g. this Databricks
         | post:
         | https://www.databricks.com/blog/optimizing-databricks-llm-pi...
        
         | isoprophlex wrote:
         | The magic sauce seems to be, at every turn, "... if you have
         | some well-defined metric to optimize on."
         | 
         | And that's not really a given in reality. Having such a metric
         | enables all sorts of tricks toward what DSPy is aiming for,
         | tricks you won't be able to pull off in real life.
         | 
         | Unless I'm sorely mistaken, but that's my take on the whole
         | thing.
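For concreteness, the kind of "well-defined metric" at issue is just a function that scores a prediction against a gold label, averaged over a dataset. A minimal, hypothetical sketch (the names are illustrative, not DSPy's actual metric API):

```python
def exact_match(gold: str, pred: str) -> float:
    """Return 1.0 if the normalized prediction equals the gold answer."""
    norm = lambda s: " ".join(s.lower().strip().split())
    return float(norm(gold) == norm(pred))

def dataset_score(pairs) -> float:
    """Average the metric over (gold, prediction) pairs; this single
    number is what a prompt optimizer climbs."""
    return sum(exact_match(g, p) for g, p in pairs) / len(pairs)

# exact_match("Paris", "  paris ") == 1.0
# dataset_score([("4", "4"), ("6", "7")]) == 0.5
```

The point of the comment above stands: many real tasks (open-ended writing, multi-step agents) have no clean gold labels, so no such function exists to optimize against.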
        
       | revskill wrote:
       | Whenever I see "ChainOfThought" for AI, it strikes me as an
       | annoying and misleading term. Machines never think at all.
        
       | fsndz wrote:
       | I tried it recently and it is kinda fun:
       | https://www.lycee.ai/courses/a5b7d115-c794-410d-92f2-15d8f29...
        
       | gunalx wrote:
       | Not to say anything about DSPy, but I really liked the take on
       | what we should use LLMs for.
       | 
       | We need to stop doing useless reasoning stuff and find actual,
       | fitting problems for LLMs to solve.
       | 
       | Current LLMs are not your DB manager (if they could be, you
       | don't have a real-world-sized DB). They are not developers. We
       | have people for that.
       | 
       | LLMs prove to be decent creative tools, classifiers, and Q&A
       | answer generators.
        
       | thatsadude wrote:
       | I had a few problems with DSPy:
       | 
       | * Multi-hop reasoning rarely works with real data in my case.
       | 
       | * Impossible to define advanced metrics over the whole dataset.
       | 
       | * No async support.
        
       ___________________________________________________________________
       (page generated 2024-08-11 23:01 UTC)