[HN Gopher] The Deep Research problem
       ___________________________________________________________________
        
       The Deep Research problem
        
       Author : cratermoon
       Score  : 92 points
       Date   : 2025-02-21 21:26 UTC (4 days ago)
        
 (HTM) web link (www.ben-evans.com)
 (TXT) w3m dump (www.ben-evans.com)
        
       | Lws803 wrote:
        | I always wondered: if deep research has an X% chance of
        | producing errors in its report, and you have to double-check
        | everything, visit every source, and potentially correct it
        | yourself, does it really save time in getting research done
        | (outside of coding and marketing)?
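        | 
        | One way to make that question concrete is a break-even check
        | (a minimal sketch; the helper name and every number are
        | hypothetical, not from any benchmark):
        | 
        |     # Hypothetical break-even model: the tool saves time only
        |     # if verifying its report plus fixing its errors costs
        |     # less than doing the research by hand.
        |     def saves_time(t_manual: float, t_verify: float,
        |                    t_fix_per_error: float,
        |                    expected_errors: float) -> bool:
        |         return (t_verify + t_fix_per_error * expected_errors
        |                 < t_manual)
        | 
        |     # e.g. 8h by hand vs. 2h to verify + 6 errors at 30min each
        |     print(saves_time(8.0, 2.0, 0.5, 6))  # True: 5h < 8h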
        
         | ImaCake wrote:
          | It might depend on how much you struggle with writer's
          | block. An LLM essay with sources is probably a better
          | starting point than a blank page. But it will vary between
          | people.
        
       | baxtr wrote:
       | I urge anyone to do the following: take a subject you know really
       | really well and then feed it into one of the deep research tools
       | and check the results.
       | 
        | You might be amazed, but most probably you'll be shocked.
        
         | ilrwbwrkhv wrote:
          | Yup, none of these tools are anywhere close to AGI or
          | "research". They are still a much better search engine and,
          | of course, spam generator.
        
       | tptacek wrote:
       | I did a trial run with Deep Research this weekend to do a
       | comparative analysis of the comp packages for Village Managers in
       | suburbs around Chicagoland (it's election season, our VM's comp
       | had become an issue).
       | 
       | I have a decent idea of where to look to find comp information
       | for a given municipality. But there are a lot of Chicagoland
       | suburbs and tracking documents down for all of them would have
       | been a chore.
       | 
       | Deep Research was valuable. But it only did about 60% of the work
       | (which, of course, it presented as if it was 100%). It found
        | interesting sources I was unaware of, and assembled lots of
        | easy-to-get public data that would have been annoying for me
        | to collect (for instance, basic stuff like the name of every
        | suburban Village Manager), which made spot-checking easier.
        | But I still had to spot-check everything myself.
       | 
       | The premise of this post seems to be that material errors in Deep
       | Research results negate the value of the product. I can't speak
       | to how OpenAI is selling this; if the claim is "subscribe to Deep
       | Research and it will generate reliable research reports for you",
        | well, obviously, no. But as with most AI things, if you get
        | past the hype, it's plain to see the value it's actually
        | generating.
        
         | WhitneyLand wrote:
         | >>The premise of this post seems to be that material errors in
         | Deep Research results negate the value of the product
         | 
          | No, it's not. It's that it's oversold from a marketing
          | perspective and comes with some big caveats.
         | 
         | But it does talk about big time savings for the right contexts.
         | 
         | Emphasis from the article:
         | 
         | "these things _are_ useful"
        
       | iandanforth wrote:
       | I'll share my recipe for using these products on the off chance
       | it helps someone.
       | 
        | 1. Only run searches whose results are easily verifiable
        | against non-AI sources.
       | 
       | 2. Always perform the search in multiple products (Gemini 1.5
       | Deep Research, Gemini 2.0 Pro, ChatGPT o3-mini-high, Claude 3.7
       | w/ extended thinking, Perplexity)
       | 
        | With these two rules I have found the current round of LLMs
        | useful for "researchy" queries. Collecting the results across
        | tools and then throwing out the 65-75% that is slop yields
        | genuinely useful information that would have taken me much
        | longer to find.
       | 
       | Now the above could be seen as a harsh critique of these tools,
       | as in the kiddie pool is great as long as you're wearing full
       | hazmat gear, but I still derive regular and increasing value from
       | them.
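        | 
        | A minimal sketch of that fan-out-and-cross-check recipe (the
        | provider functions are hypothetical stubs, not real SDK
        | calls; only the aggregation pattern is the point):
        | 
        |     from collections import Counter
        | 
        |     # Stubs: swap in whichever product clients you actually use.
        |     def query_openai(prompt: str) -> str:
        |         raise NotImplementedError("call your OpenAI client here")
        | 
        |     def query_gemini(prompt: str) -> str:
        |         raise NotImplementedError("call your Gemini client here")
        | 
        |     def query_claude(prompt: str) -> str:
        |         raise NotImplementedError("call your Anthropic client here")
        | 
        |     def fan_out(prompt: str) -> dict[str, str]:
        |         """Run the same researchy query against every product."""
        |         providers = {"openai": query_openai,
        |                      "gemini": query_gemini,
        |                      "claude": query_claude}
        |         return {name: fn(prompt) for name, fn in providers.items()}
        | 
        |     def corroborated_lines(answers: dict[str, str]) -> set[str]:
        |         """Keep only lines at least two products agree on -
        |         a crude proxy for throwing out the slop."""
        |         counts = Counter(line.strip()
        |                          for text in answers.values()
        |                          for line in text.splitlines()
        |                          if line.strip())
        |         return {line for line, n in counts.items() if n >= 2}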
        
         | munchler wrote:
         | This makes sense. How many of those products do you have to pay
         | for?
        
           | kridsdale3 wrote:
           | I'm not OP but I do similar stuff. I pay for Claude's basic
           | tier, OpenAI's $200 tier, and Gemini ultra-super-advanced I
           | get for free because I work there.
           | 
            | I combine all the 'slop' from the three of them into
            | Gemini (with its 1M or 2M token context window) and have
            | it distill the valuable stuff into a good-enough final
            | product.
           | 
           | Doing so has got me a lot of kudos and applause from those I
           | work with.
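            | 
            | A minimal sketch of that distillation step, assuming the
            | google-generativeai SDK; the model name and prompt
            | wording are my guesses, not the poster's actual setup:
            | 
            |     import google.generativeai as genai
            | 
            |     # genai.configure(api_key=...) must be called first.
            |     def distill(answers: dict[str, str]) -> str:
            |         """Concatenate every product's draft and ask one
            |         long-context model to reconcile them."""
            |         combined = "\n\n".join(
            |             f"## {name}\n{text}"
            |             for name, text in answers.items())
            |         model = genai.GenerativeModel("gemini-1.5-pro")
            |         prompt = ("These are research drafts from several "
            |                   "assistants. Keep only claims consistent "
            |                   "across drafts, flag contradictions, and "
            |                   "produce one distilled report:\n\n" + combined)
            |         return model.generate_content(prompt).text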
        
             | munchler wrote:
             | Wow, that's eye-opening. So, just to be clear, you're
             | paying for Claude and OpenAI out of your own pocket, and
             | using the results at your Google job? We live in
             | interesting times, for sure. :)
        
       | submeta wrote:
        | Deep Research is in its "ChatGPT 2.0" phase. It will improve,
        | dramatically. And to the naysayers: when OpenAI released its
        | first models, many doubted that they would be good at coding.
        | Now, two years later, look at Cursor, aider, and all the LLMs
        | powering them, and at what you can do with a few prompts and
        | iterations.
       | 
       | Deep research will dramatically improve as it's a process that
       | can be replicated and automated.
        
         | amelius wrote:
         | This is like saying: y=e^-x+1 will soon be 0, because look at
         | how fast it went through y=2!
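          | 
          | A quick numeric check makes the joke's point concrete (a
          | throwaway illustration, nothing more):
          | 
          |     import math
          |     # y = e**-x + 1 drops fast early but flattens out at 1;
          |     # it never reaches 0, however long you wait.
          |     for x in (0, 1, 5, 10, 100):
          |         print(x, math.exp(-x) + 1)
          |     # 0 2.0, 1 1.3678..., 10 1.0000454..., 100 1.0 (float limit)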
        
           | kridsdale3 wrote:
           | I appreciate your style of humor.
        
           | PeterFBell wrote:
           | Thanks for making my day :)
        
         | nicksrose7224 wrote:
          | Disagree - I actually think all the problems the author
          | lays out about Deep Research apply just as well to GPT-4o /
          | o3-mini-whatever. These things are just absolutely terrible
          | at precision & recall of information.
        
           | simonw wrote:
           | I think Deep Research shows that these things can be very
           | good at precision and recall of information if you give them
           | access to the right tools... but that's not enough, because
           | of source quality. A model that has great precision and
           | recall but uses flawed reports from Statista and Statcounter
           | is still going to give you bad information.
        
       | lsy wrote:
       | Research skills involve not just combining multiple pieces of
       | data, but also being able to apply very subtle skills to
       | determine whether a source is trustworthy, to cross-check numbers
       | where their accuracy is important (and to determine _when_ it 's
       | "important"), and to engage in some back and forth to determine
       | which data actually applies to the research question being asked.
       | In this sense, "deep research" is a misleading term, since the
       | output is really more akin to a probabilistic "search" over the
       | training data where the result may or may not be accurate and
       | requires you to spot-check every fact. It is probably useful for
       | surfacing new sources or making syntactic conjectures about how
       | two pieces of data may fit together, but checking all of those
       | sources for _existence_ , let alone validity, still _needs_ to be
       | done by a person, and the output, as it stands in its polished
       | form today, doesn 't compel users to take sufficient
       | responsibility for its factuality.
        
       | rollinDyno wrote:
        | Everyone who has been working on RAG knows how important
        | controlling your sources is. Simply directing your agent to
        | fetch keyword-matching documents will lead to inaccurate
        | claims.
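        | 
        | A toy illustration of that failure mode (the corpus and the
        | scoring are invented; this is no particular RAG stack):
        | 
        |     def keyword_score(query: str, doc: str) -> int:
        |         """Count shared terms - no source quality, no semantics."""
        |         q = set(query.lower().split())
        |         return len(q & set(doc.lower().split()))
        | 
        |     corpus = {
        |         "press-release": "Our browser market share doubled this year",
        |         "audited-report": "Browser market share grew 4% per the audit",
        |     }
        |     query = "browser market share this year"
        |     best = max(corpus, key=lambda k: keyword_score(query, corpus[k]))
        |     print(best)  # 'press-release' wins on overlap, not trust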
       | 
        | The reality is that, for now, it is not possible to leave the
        | human out of research. The best an LLM can do is help curate
        | sources and synthesize them; it cannot reliably write sound
        | conclusions.
       | 
        | Edit: this is something elicit.com recognized quite early.
        | But even when I was using it, I wished I had more control
        | over the space the tool was searching over.
        
       | theGnuMe wrote:
        | One other existential question is Simpson's paradox, which I
        | believe is exploited by politicians to support different
        | policies from the same underlying data. I see this as a
        | problem for government, especially if we have liberal- or
        | conservative-trained LLMs. We expect the computer to give us
        | the correct answer, but when the underlying model is trained
        | one way by RLHF or by systemic/weighted bias in its source
        | documents -- imagine training a libertarian AI on Cato papers
        | -- you could have highly confident pseudo-intellectual junk.
        | Economists already deal with this problem daily, since their
        | field is heavily politicized. Law is another one.
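        | 
        | For the unfamiliar, a toy example of Simpson's paradox (the
        | numbers are invented to show the pattern): policy A wins
        | inside every subgroup, yet the pooled totals make policy B
        | look better - same data, opposite headline.
        | 
        |     data = {  # subgroup -> policy -> (successes, trials)
        |         "easy cases": {"A": (9, 10), "B": (80, 100)},
        |         "hard cases": {"A": (30, 100), "B": (2, 10)},
        |     }
        |     for group, by_policy in data.items():
        |         for policy, (s, n) in by_policy.items():
        |             print(f"{group:10s} {policy}: {s/n:.0%}")  # A wins both
        |     totals = {p: (sum(d[p][0] for d in data.values()),
        |                   sum(d[p][1] for d in data.values()))
        |               for p in ("A", "B")}
        |     for policy, (s, n) in totals.items():
        |         print(f"pooled     {policy}: {s/n:.0%}")  # ...but B wins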
        
         | ImaCake wrote:
          | I've never thought of Simpson's paradox as a political
          | problem before; thanks for sharing this!
         | 
         | Arguably this applies just as well to Bayesian vs Frequentist
         | statisticians or Molecular vs Biochemical Biologists.
        
       | jppope wrote:
        | These days I feel like GenAI has an accuracy rate of
        | basically 95%, maybe 96%. Great at boilerplate, great at
        | stuff you want an intern to do or maybe to outsource... but
        | it really struggles with the valuable stuff. The errors are
        | almost always in the most inconvenient places, and they are
        | hard to see... So I agree with Ben Evans on this one: what is
        | one to do? The further you lean on it, the worse your skills
        | and specializations get. It is invaluable for some kinds of
        | work, greatly speeding you up, but then some of the things
        | you would have caught take you down a rabbit hole that wastes
        | so much time. The tradeoffs here aren't great.
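        | 
        | Worth noting how fast a 95% per-fact accuracy rate compounds
        | (a back-of-envelope sketch that assumes each fact is
        | independently correct, itself a big simplification):
        | 
        |     # Probability a report is entirely clean if each of its n
        |     # facts is independently correct 95% of the time.
        |     for n in (1, 10, 20, 50):
        |         print(n, f"{0.95 ** n:.1%}")  # 95.0%, 59.9%, 35.8%, 7.7%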
        
         | bakari500 wrote:
          | Yeah, but a 4 to 6% error rate is not good, even for a dumb
          | computer.
        
       | smusamashah wrote:
        | I watched the recent Viva La Dirt League videos on how
        | trailers lie and make false promises. Now I see LLMs as that
        | marketing guy. Even if he knows everything, he can't help
        | lying. You can't trust anything he says no matter how
        | authoritative he sounds; even if he is telling the truth, you
        | have no way of knowing.
        | 
        | These deep research things are a waste of time if you can't
        | trust the output. Code you can run and verify. How do you
        | verify this?
        
       ___________________________________________________________________
       (page generated 2025-02-25 23:00 UTC)