[HN Gopher] The Differences Between Deep Research, Deep Research...
       ___________________________________________________________________
        
       The Differences Between Deep Research, Deep Research, and Deep
       Research
        
       Author : thenameless7741
       Score  : 173 points
       Date   : 2025-03-02 22:59 UTC (3 days ago)
        
 (HTM) web link (leehanchung.github.io)
 (TXT) w3m dump (leehanchung.github.io)
        
       | blacksqr wrote:
       | Yes, but what about Deep Research?
        
       | readyplayernull wrote:
        | Are you telling me that AIs are starting to diverge and that we
       | might get a combinatorial explosion of reasoning paths that will
       | produce so many different agents that we won't know which one can
       | actually become AGI?
       | 
       | https://leehanchung.github.io/assets/img/2025-02-26/05-quadr...
        
         | nwhnwh wrote:
         | Life is so complicated.
        
         | pseudocomposer wrote:
         | They are certainly diverging, and becoming more useful tools,
         | but the answer to "which one can actually become AGI?" is, as
         | always, "none of them."
        
         | OutOfHere wrote:
          | AGI is about performing actions with high multi-task
          | intelligence. Only the top right corner (Deep, Trained) has any
         | hope of getting closer to AGI. The rest can still be useful for
         | specific tasks, e.g. "deep-research".
        
         | dartos wrote:
         | AGI is, and always was, marketing from LLM providers.
         | 
          | Real innovation is happening in task-specific models like
          | AlphaFold.
          | 
          | LLMs are starting to become more task-specific too, as we've
          | seen with the performance of reasoning models on their
          | specific tasks.
         | 
         | I imagine we'll see LLMs trained specifically for medical
         | purposes, legal purposes, code purposes, and maybe even
         | editorial purposes.
         | 
         | All useful in their own way, but none of them even close to
         | sci-fi.
        
           | HeatrayEnjoyer wrote:
           | AGI is a term that's been around for decades.
        
             | dartos wrote:
              | AI was a term that was around for decades as well, but its
              | meaning has changed dramatically over the past 3 years.
              | 
              | Prior to GPT-3, AI was rarely used in marketing or to talk
              | about any number of ML methods.
              | 
              | Nowadays "AI" is just the new "smart" for marketing
              | products.
              | 
              | Terms change. The current usage of AGI, especially in the
              | context I was talking about, is specifically marketing from
              | LLM providers.
              | 
              | I'd argue that the term AGI, when used in a non-fiction
              | context, has always been a meaningless marketing term of
              | some kind.
        
               | mdp2021 wrote:
               | > _has always been_
               | 
                | Well, now it is not: it is now "the difference between
                | something with outputs that sound plausible vs something
                | with outputs which are properly checked".
        
           | mirekrusin wrote:
            | History begs to differ - one of the biggest lessons is that
            | larger, generic models always win (where generalization pays
            | off, i.e. all those agents and what-not; this doesn't apply
            | to specialized models like AlphaGo or AlphaFold, which are
            | not general models).
        
             | dartos wrote:
              | > one of the biggest lessons is that larger, generic
              | models always win
             | 
             | You're confusing several different ideas here.
             | 
              | The idea you're talking about is called "the bitter
              | lesson." It (very basically) says that a model with more
              | compute put behind it will perform better than a cleverer
              | method which may use less compute. It has nothing to do
              | with being "generic." It's also worth noting that, afaik,
              | it's an accurate observation, but not a law or a fact. It
              | may not hold forever.
             | 
              | Either way, I'm not arguing against that. I'm saying that
              | LLMs are too general to be useful in specific, specialized
              | domains.
              | 
              | Sure, bigger generic models perform (increasingly
              | marginally) better at the benchmarks we've cooked up, but
              | they're still too general to be that useful in any specific
              | context. That's the entire reason RAG exists in the first
              | place.
             | 
             | I'm saying that a language model trained on a specific
             | domain will perform better at tasks in that domain than a
             | similar sized model (in terms of compute) trained on a lot
             | of different, unrelated text.
             | 
             | For instance, a model trained specifically on code will
             | produce better code than a similarly sized model trained on
             | all available text.
             | 
             | I really hope that example makes what I'm saying self-
             | evident.
        
               | mirekrusin wrote:
                | You're explaining it nicely and then seem to make a
                | mistake that contradicts what you've just said - because
                | code and text share a domain (text-based), large,
                | generic models will always out-compete smaller,
                | specialized ones - that's the lesson.
                | 
                | If you compare it with, e.g., a model for self-driving
                | cars, generic text models will not win because they
                | operate in a different domain.
                | 
                | In all cases, trying to optimize for specialized subsets
                | of tasks within a domain is not worth the investment,
                | because the state of the art will be held by larger
                | models working on the whole available set.
        
           | ctoth wrote:
           | > AGI is, and always was, marketing from LLM providers.
           | 
            | TIL: The term AGI, which we've been using since at least
            | 1997[0], was invented by time-traveling LLM companies in the
            | 2020s.
           | 
           | [0]: https://ai.stackexchange.com/questions/20231/who-first-
           | coine...
        
             | freilanzer wrote:
             | He didn't say that it was invented by LLM providers...
        
       | sgt101 wrote:
        | Isn't this the worst possible case for an LLM? The integrity of
        | the product is central to its value, and the user is by
        | definition unable to verify that integrity.
        
         | falcor84 wrote:
         | I'm not following, what's "by definition" here? You can verify
         | the integrity of an AI report in the same way you would do with
         | any other report someone prepared for you - when you encounter
         | something that feels wrong, check the referenced source
         | yourself.
        
           | sgt101 wrote:
           | I think it's normal to invest authority in a report that
           | someone else has prepared - people take what is written on
           | trust because the person who prepared it is accountable for
           | any errors... not just now, but forever.
           | 
            | LLMs are not accountable for anything.
        
       | simonw wrote:
       | I really like the distinction between DeepSearch and DeepResearch
       | proposed in this piece by Han Xiao:
       | https://jina.ai/news/a-practical-guide-to-implementing-deeps...
       | 
       | > DeepSearch runs through an iterative loop of searching,
       | reading, and reasoning until it finds the optimal answer. [...]
       | 
       | > DeepResearch builds upon DeepSearch by adding a structured
       | framework for generating long research reports
       | 
       | Given these definitions, I think DeepSearch is the more valuable
       | and interesting pattern. It's effectively RAG built using tools
       | in a loop, which is much more likely to answer questions
       | effectively than more traditional RAG where there is only one
       | attempt to find relevant documents to include in a single prompt
       | to an LLM.
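        | 
        | A minimal sketch of that shape, with hypothetical llm() /
        | search() / fetch() helpers standing in for your model and tools
        | (this isn't any specific product's API):
        | 
        |     # DeepSearch: RAG as tools in a loop, not one retrieval pass.
        |     # llm(), search(), fetch() are hypothetical stand-ins.
        |     def deep_search(question: str, max_steps: int = 10) -> str:
        |         notes: list[str] = []
        |         for _ in range(max_steps):
        |             # Ask the model for the next action, given notes.
        |             act = llm(f"Q: {question}\nNotes: {notes}\n"
        |                       "Reply SEARCH:<q>, READ:<url>, or ANSWER:<a>")
        |             if act.startswith("SEARCH:"):
        |                 notes.append(search(act[7:]))   # result snippets
        |             elif act.startswith("READ:"):
        |                 notes.append(fetch(act[5:]))    # full page text
        |             else:
        |                 return act[7:]                  # ANSWER: done
        |         return "I don't know"                   # budget exhausted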
       | 
       | DeepResearch is a cosmetic enhancement that wraps the results in
       | a "report" - it looks impressive but IMO is much more likely to
       | lead to inaccurate or misleading results.
       | 
       | More notes here: https://simonwillison.net/2025/Mar/4/deepsearch-
       | deepresearch...
        
         | vlovich123 wrote:
          | I thought DeepResearch has the AI driving the process because
          | it's been trained to do so, whereas DeepSearch is something
          | like langchain + prompt engineering?
        
           | simonw wrote:
            | There are three commercial products called "Deep Research"
            | launched right now - from Google Gemini, OpenAI, and
            | Perplexity. There are also several open source projects that
           | use the name "Deep Research".
           | 
           | DeepResearch (note the absence of the space character) is the
           | name that Han Xiao proposes for the general pattern of
           | generating a research-style report after running multiple
           | searches.
           | 
           | You might implement that pattern using prompt engineering or
           | using some custom trained model or through other means. If
           | the eventual output looks like a report and it ran multiple
           | searches along the way it fits Han's "DeepResearch"
           | definition.
        
             | dukeyukey wrote:
             | Why is the AI industry so follow-the-leader in naming
             | stuff? I've used at least three AI services called
             | "Copilot".
        
         | NitpickLawyer wrote:
         | > DeepResearch is a cosmetic enhancement that wraps the results
         | in a "report" - it looks impressive but IMO is much more likely
         | to lead to inaccurate or misleading results.
         | 
          | I think that _if done well_ deep research can be more than
          | that. At a minimum, I would say that before "deep search"
          | you'd need some calls to an LLM to figure out what to look
          | for, where best to look (i.e. sources, trust, etc.), how to
          | tabulate the data gathered, and so on. Just as deep search is
          | "rag w/ tools in a loop", so can (and should) deep research
          | be.
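          | 
          | Something like this planning step bolted in front of a
          | deep_search loop like the sketch upthread (hypothetical
          | helpers again, not any particular product's API):
          | 
          |     # Plan first, then deep-search each sub-question, then
          |     # synthesize. llm() and deep_search() are stand-ins.
          |     def deep_research(topic: str) -> str:
          |         plan = llm(f"Break '{topic}' into sub-questions; for "
          |                    "each, suggest trusted source types. "
          |                    "One per line.")
          |         findings = {q: deep_search(q)
          |                     for q in plan.splitlines() if q}
          |         return llm(f"Report on '{topic}' from: {findings}")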
         | 
          | Think of the analogy of using aider to go straight to code,
          | versus using it to first /architect and then code. But for
          | _any_ task that lends itself to (re)searching. At least it
          | would catch useless tangents faster.
         | 
          | At the end of the day, what's fascinating about LLM-based
          | agents is that you can almost always add another layer of
          | abstraction on top. No matter what you build, you can always
          | come from another angle. That's really cool imo, and something
          | Hassabis has hinted at lately in some podcasts.
        
           | samstave wrote:
            | So I've started a thing with Jina, and the first effort I'm
            | making is setting the "tone", meaning I'm building a project
            | template that will keep the bots focused.
            | 
            | I think that one part of the deep loop needs to be a
            | check-in on expectations and goals...
            | 
            | So instead of throwing a deep task: I find that bots work
            | better in small iterative chunks of objectives..
            | 
            | I haven't formulated it completely yet, but as an example
            | I've been working extensively with Cursor's whole Anthropic
            | abstraction, AI as a service:
            | 
            | So many folks suffer from the "generating" quagmire;
            | 
            | And I found that telling the bot to "break any response into
            | smaller chunks to avoid context limitations" works
            | incredibly well...
            | 
            | So when my scaffold is complete, the goal is to use Fabric
            | Patterns for nursery assignments to the deep bots.. whereby
            | they constantly check in.
            | 
            | Prior to "deep" things, I found this worked really well by
            | telling the bots to obsessively keep a development_diary.md
            | and .json tracking of actions (even still, their memory is
            | super small), and I envisioned a multi-layer of agents where
            | the initial agent's actions feed the context of agents who
            | follow along, so you have a waterfall of context between
            | agents to avoid context loss on super deep iterative
            | research...
            | 
            | (I'll type out something more salient when I have a KVM...
            | 
            | (But I hope that doesn't sound stupid)
        
             | mola wrote:
             | What are fabric patterns?
        
               | samstave wrote:
                | Basically agentic personas or modi operandi.
                | 
                | You tell the agent "grok this persona to accomplish the
                | task".
        
           | simonw wrote:
           | Right - I'm finding the flawed Deep Research tools useful
           | already, but what I really want is much more control over the
           | sources of information they use.
        
             | paulsutter wrote:
             | Exactly - like my whole codebase, or repositories of
             | proprietary documents
        
             | throwup238 wrote:
             | Sadly I think that's why non-open source commercial deep
             | (re)search implementations are going to be largely useless.
             | Even if you're using a customized end point for search like
             | Kagi, the sources are mostly garbage and no one except
             | maybe Google Books has the resources and legal cover to
             | expand that deep search into books, which are much better
             | sources.
        
         | samstave wrote:
         | First, SimonW, I devour everything you write and appreciate you
         | most in the AI community and recommend you from 0 all the way
         | to 1!!!
         | 
         | Thank you.
         | 
         | -
         | 
          | Second, thank you for bringing up Jina. I recently discovered
          | it and immediately began building a Thing based on it:
          | 
          | I want to use its functions to ferret out all the
          | entanglements from the WEF leadership roster, similar to the
          | NGO fractal connections - I'm doing that with every WEF
          | member, through to Congress.
         | 
          | I would truly wish to work with you on such, if so inclined..
          | 
          | I prefer to build "dossiers" rather than reports, represented
          | as JSON schemas.
         | 
         | I'm on mobile so will provide more details when at machine...
         | 
         | Looping through a dossier of connections is much more
         | thoughtful than a "report" imo.
         | 
         | I need to see you on someone's podcast, else you and I should
         | make one!
        
           | simonw wrote:
           | Thanks! I've done quite a few podcast things recently,
           | they're tagged on my blog:
           | https://simonwillison.net/tags/podcast-appearances/
        
             | samstave wrote:
             | Dub, yeah I saw that but hadn't listened yet.
             | 
             | What I want is a "podcast" with audience participation..
             | 
              | The Lex Fridman DeepSeek episode was so awesome, but I
              | have so many questions, and I get exceedingly frustrated
              | when Lex doesn't ask what may seem obvious to us HNers...
             | 
             | -
             | 
              | Back on topic:
             | 
              | Reports are flat; dossiers are malleable.
             | 
              | As I mentioned, my goal is fractal visuals (in minamGL) of
              | the true entanglements from the WEF out.
             | 
              | Much like Mike Benz on USAID - using Jina deep research,
              | extraction, etc. will pull back the veil on the truth of
              | the globalist agenda seeking control and will reveal true
              | relationships, loyalties, and connections.
              | 
              | It's been running through my head for decades, and I
              | finally feel that Jina is a tool that can start to reveal
              | what myself and so many others can plainly see but can't
              | verify.
        
         | derefr wrote:
         | > DeepResearch is a cosmetic enhancement that wraps the results
         | in a "report"
         | 
          | No, that's not what Xiao said here. Here's the relevant quote:
         | 
         | > It often begins by creating a table of contents, then
         | systematically applies DeepSearch to each required section -
         | from introduction through related work and methodology, all the
         | way to the conclusion. Each section is generated by feeding
         | specific research questions into the DeepSearch. The final
         | phase involves consolidating all sections into a single prompt
         | to improve the overall narrative coherence.
         | 
         | (I also recommend that you stare very hard at the diagrams.)
         | 
         | Let me paraphrase what Xiao is saying here:
         | 
         | A DeepSearch is a primitive -- it does mostly the same thing a
         | regular LLM query does, but with a lot of trained-in thinking
         | and searching work, to ensure that it is producing a _rigorous_
         | answer to your question. Which is great: it means that
          | DeepSearch is more likely to say "I don't know" than to
         | hallucinate an answer. (This is extremely important as a
         | building block; an agent needs to know when a query has failed
         | so it can try again / try something else.)
         | 
         | However, DeepSearch alone still "hallucinates" in one
         | particular way: it "hallucinates understanding" of the topic,
         | thinking that it already has a complete mental toolkit of
         | concepts needed to solve your problem. It will never say
         | "solving this sub-problem seems to require inventing a new
         | tool" and so "branch off" to another recursed DeepSearch to
         | determine how to do that. Instead, it'll try to solve your
         | problem with the toolkit it has -- and if that toolkit is
         | insufficient, it will simply fail.
         | 
         | Which, again, is great in some ways. It means that a single
         | DeepSearch will do a (semi-)bounded amount of work. Which means
         | that the costs of each marginal additional DeepSearch call are
         | predictable.
         | 
         | But it also means that you can't ask DeepSearch itself to:
         | 
         | * come up with a mathematical proof of something, where any
         | useful proof strategy will _implicitly require_ inventing new
         | math concepts to use as tools in solving the problem.
         | 
         | * do investigative journalism that involves "chasing leads"
         | down a digraph of paths; evaluating what those leads have to
         | say; and using that info to determine _new_ leads.
         | 
         | * "code me a Facebook clone" -- and have it understand that
         | doing so involves iteratively/recursively building out a
         | _software architecture_ composed of many modules -- where it
          | won't be able to see the need for many of those modules at
         | "design time", but will only "discover" the need to write them
         | once it gets to implementation time of dependent modules and
         | realizes that to achieve some goal, it must call into some code
         | / entire library that doesn't exist yet. (And then make a buy-
         | vs-build decision on writing that code vs pulling in a
         | dependency... which requires researching the _space_ of
         | available packages in the ecosystem, and how well they solve
         | the problem... and so on.)
         | 
         | A DeepResearch model, meanwhile, is a model that looks at a
         | question, and says "is this a _leaf_ question that can be
         | answered directly -- or is this a question that needs to be
         | broken down and tackled by parts, perhaps with some of the
          | parts themselves being unknowns until earlier parts are
          | solved?"
         | 
         | A DeepResearch model does a lot of top-level work -- probably
         | using DeepSearch! -- to test the "leaf-ness" of your question;
          | and to _break down non-leaf questions_ into a "battle plan"
         | for solving the problem. It then attempts solutions to these
         | component problems -- not by calling DeepSearch, but by
         | recursively calling itself (where that forked child will call
         | DeepSearch _if_ the sub-problem is leaf-y, or break down the
         | sub-problem further if not.)
         | 
          | A DeepResearch model will then take the derived solutions for
         | _dependent_ problems into account in the solution space for
         | _depending_ problems. (A DeepResearch model may also be trained
          | to notice when it's "worked into a corner" by coming up with
         | early-phase solutions that make later phases impossible; and
         | backtracking to solve the earlier phases differently, now with
         | in-context knowledge of the constraints of the later phases.)
         | 
         | Once a DeepResearch model finds a successful solution to all
         | subproblems, it takes the hierarchy of thinking/searching logs
         | it generated in the process, and strips out all the dead-ends
         | and backtracking, to present a comprehensible linear "success
         | path." (Probably it does this as the end-step of each recursive
         | self-call, before returning to self, to minimize the amount of
         | data returned.)
         | 
         | Note how this last reporting step isn't "generating a report"
          | _for human consumption_; it's a DeepResearch call "generating
          | a report" _for its parent DeepResearch call to consume_.
          | _That's_ special sauce. (And if you think about it, the
          | top-level call to this whole thing is probably going to use a
          | _non_-DeepResearch model at the end to rephrase the top-level
         | DeepResearch result from a machine-readable recurse-result
         | report into a human-readable report. It might even use a
         | DeepSearch model to do that!)
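          | 
          | Roughly, that recursion looks like this (a sketch only; all
          | the helper names - is_leaf, next_subproblem, summarize - are
          | made up to show the shape, not anyone's actual method):
          | 
          |     MAX_DEPTH = 4  # keep the recursion bounded
          | 
          |     def deep_research(question: str, depth: int = 0) -> str:
          |         # Test "leaf-ness": can this be answered directly?
          |         if depth >= MAX_DEPTH or is_leaf(question):
          |             return deep_search(question)   # bounded primitive
          |         answered: list[str] = []
          |         # Re-plan after each part, so earlier answers shape
          |         # which sub-problems become legible next.
          |         while (sub := next_subproblem(question, answered)):
          |             answered.append(deep_research(sub, depth + 1))
          |         # Strip dead ends; return a linear "success path"
          |         # report for the parent call to consume.
          |         return summarize(question, answered)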
         | 
         | ---
         | 
         | Bonus tangent:
         | 
         | Despite DeepSearch + DeepResearch using a scientific-research
         | metaphor, I think an enlightening comparison is with
         | intelligence agencies.
         | 
         | DeepSearch alone does what an individual intelligence analyst
         | does. You hand them an individually-actionable question; they
         | run through a "branching, but vaguely bounded in time" process
         | of thinking and searching, generating a thinking log in the
         | process, eventually arriving at a conclusion; they hand you
         | back an answer to your question, with a lot of citations --
          | _or_ they "throw an exception" and tell you that the facts
         | available to the agency cannot support a conclusion at this
         | time.
         | 
         | Meanwhile, DeepResearch does what an intelligence agency as a
         | whole does:
         | 
         | 1. You send the agency a high-level strategic Request For
         | Information;
         | 
         | 2. the agency puts together a workgroup composed of people with
         | trained-in expertise with breaking down problems (Intelligence
         | Managers), and domain-matter experts with a wide-ranging
         | gestalt picture of the problem space (Senior Intelligence
         | Analysts), and tasks them with _breaking down_ the problem into
         | sub-problems;
         | 
         | 3. some of these sub-problems are _actionable_ -- they can be
         | assigned directly for research by a ground-level analyst; some
         | of these sub-problems have _prerequisite work_ that must be
          | done to _gather intelligence in the field_; and some of these
         | sub-problems are _unknown unknowns_ -- missing parts of the map
          | that cannot be "planned into" until other sub-problems are
         | resolved.
         | 
         | 4. from there, the problem gets "scheduled" -- in parallel,
         | (the first batch of) individually-actionable questions get sent
         | to analysts, and any field missions to gather pre-requisite
         | intelligence are kicked off for planning (involving spawning
         | new sub-workgroups!)
         | 
         | 5. the top-level workgroup persists after their first meeting,
         | asynchronously observing the reports from actionable questions;
         | scheduling newly-actionable questions to analysts once field
         | data comes in to be chewed on; and exploring newly-legible
         | parts of the map to outline further sub-problems.
         | 
         | 6. If this scheduling process runs out of work to schedule,
         | it's either because the top-level question is now answerable,
         | or because the process has worked itself into a corner. In the
         | former case, a final summary reporting step is kicked off,
         | usually assigned to a senior analyst. In the latter case, the
          | workgroup reconvenes to figure out how to backtrack out of the
         | corner and pursue alternate avenues. (Note that, if they have
         | the time, they'll probably make "if this strategy produces
         | results that are unworkable in a later step" plans for every
         | possible step in their original plan, in advance, so that the
         | "scheduling engine" of analyst assignments and fieldwork need
         | never run dry waiting for the workgroup to come up with a new
         | plan.)
        
           | simonw wrote:
            | You're right, Han didn't define DeepResearch as "a cosmetic
            | enhancement". I quoted his one-sentence definition:
           | 
           | > DeepResearch builds upon DeepSearch by adding a structured
           | framework for generating long research reports.
           | 
           | But then called it "a cosmetic enhancement" really to be
           | slightly dismissive of it - I'm a skeptic of the report
           | format because I think the way it's presented makes the
           | information look more solid than it actually is. My complaint
           | is at the aesthetic level, not relating to the (impressive)
           | way the report synthesis is engineered.
           | 
           | So yeah, I'm being inaccurate and a bit catty about it.
           | 
           | Your explanation is much closer to what Han described, and
           | much more useful than mine.
        
         | TeMPOraL wrote:
         | > _DeepResearch is a cosmetic enhancement that wraps the
         | results in a "report" - it looks impressive but IMO is much
         | more likely to lead to inaccurate or misleading results._
         | 
         | Yup, I got the same impression reading this article - and the
         | Jina one, too. Like with langchain and agents, people are
         | making chained function calls or a loop sound like it is the
         | second coming, or a Nobel prize-worthy discovery. It's not -
          | it's _obvious_. It's just expensive to get to work reliably
         | and productize.
        
       | EncomLab wrote:
        | One of my co-workers joked at the time that "sure, AlphaGo beat
        | Lee Sedol at Go, but Lee has a much better self-driving
       | algorithm."
       | 
       | I thought this was funny at the time, but I think as more time
       | passes it does highlight the stark gulf that exists between the
       | capability of the most advanced AI systems and what we expect as
       | "normal competency" from the most average person.
        
         | tsunego wrote:
         | Love me some good old whataboutism (sure, LLMs are now super-
         | intelligent at writing software, but can they clean my kitchen?
         | No? Ha!)
        
           | jvanderbot wrote:
           | The computer beat me at chess, but it was no match for me at
            | kickboxing. - Emo Philips
           | 
            | Tale as old as time. We can make nice software systems, but
            | general-purpose AI / agents aren't here yet.
        
         | stavros wrote:
         | > it does highlight the stark gulf that exists between the
         | capability of the most advanced AI systems and what we expect
         | as "normal competency" from the most average person
         | 
         | Yes, but now we're at the point where we can compare AI to a
         | person, whereas five years ago the gap was so big that that was
         | just unthinkable.
        
           | EncomLab wrote:
            | I mean, people thought ELIZA was AI back in the 1960s.
           | Everyone always thinks "this is it!!".
        
             | mdp2021 wrote:
             | > _people thought ELIZA_
             | 
              | But _which_ people? Those people who show that a
              | supplement of extra intelligence, even a synthetic one, is
              | sought.
        
       | jimmySixDOF wrote:
        | This gives STORM a high mark but didn't seem to get great
        | results from GPT Researcher, which is the other open source
        | project that was doing this before Deep Research became the
        | recent flavor of the day.
        | 
        | But there are so many ways to configure GPT Researcher for all
        | kinds of budgets, so I wonder if this comparison really pushed
        | the output or just went with defaults and got default midranges
        | for comparison.
        
       | tsunego wrote:
       | Neat summary but you forgot Grok!
        
         | jvanderbot wrote:
         | Maybe the article has been edited in the last four minutes
         | since you posted, but Grok is definitely in there now.
        
       | giancarlostoro wrote:
        | It's interesting it says Grok excels at report generation,
        | because I've found myself asking it to give me answers in a
        | table format, to make it easier to 'grok' the output, since I'm
        | usually asking it to give me comparisons I just can't do
        | natively on Amazon or any other ecommerce site.
        | 
        | Funnily enough, Amazon will pick products for you to compare,
        | but the compared items are usually terrible, and you can't just
        | add whatever you want, or choose columns.
       | 
       | With Grok, I'll have it remove columns, add columns, shorten
       | responses, so on and so forth.
        
       | ankit219 wrote:
        | I think this captures one of the bigger differences between
        | what OpenAI offers and what others offer using the same name.
        | Funnily enough, Google's Gemini 2.0 Flash also has a native
        | integration with Google Search[1]. They have not done it with
        | their Thinking model. When they do, we will have a good
        | comparison.
        | 
        | One of the implications of OpenAI's DR is that frontier labs
        | are more likely to train specific models for a bunch of tasks,
        | resulting in the kind of quality wrappers will find hard to
        | replicate. This is leading towards model + post-training RL as
        | a product, instead of keeping them separate from the final
        | wrapper as product. Might be interesting times if the
        | trajectory continues.
        | 
        | PS: There is also Genspark MOA[2], which creates an in-depth
        | report on a given prompt using mixtures of agents. From what I
        | have seen in 5-6 generations, this is very effective.
       | 
        | [1]: https://x.com/_philschmid/status/1896569401979081073 (I
        | might be misunderstanding this, but this seems to be a native
        | call instead of an explicit one)
       | 
       | [2]: https://www.genspark.ai/agents?type=moa_deep_research
        
         | cxie wrote:
         | deep search is the new RAG
        
       | eamag wrote:
        | Why is there a citation @article block at the end? Do people
        | actually use it?
        
       | SubiculumCode wrote:
        | The primary issues with deep research tools are veracity and
        | accurate source attribution. My issue with tools relying on
        | DeepSeek R1, for example, is the high hallucination rate.
        
       | EigenLord wrote:
       | DR is a nice way to gather information, when it works, and then
       | do the real research yourself from a concentrated launching
       | point. It helps me avoid ADD braining myself into oblivion every
       | time I search the internet. The fatal mistake is thinking that
       | the LLM is now wiser for having done it. When someone does their
       | research, they are now marginally more of an authority on that
       | topic than everyone else in the room, all else being equal. But
       | for LLMs, it's not like they have suddenly acquired more
       | expertise on the subject now that they did this survey. So it's
       | actually pretty shallow, not deep, research. It's a cool
       | capability, and a nifty way to concentrate information, but much
       | deeper capabilities will be required to have models that not only
       | truly synthesize all that information, but actively apply it to
       | develop a thesis or further a research program. Truthfully, I
       | don't see how this is possible within the transformer
       | architecture, with its lack of memory or genuine statefulness and
        | therefore absence of persistent real-time learning. But I am
       | generally a big transformer skeptic.
        
       | instagraham wrote:
       | I noticed that these models very quickly start to underperform
       | regular search like Perplexity Pro 3x. It might be somewhat
       | thorough in how it goes line by line, but it's not very cognizant
       | of good sources - you might ask for academic sources but if your
       | social media slider is turned on, it will overwhelmingly favour
       | Reddit.
       | 
        | You may repeat instructions multiple times, but it ignores them
        | or fails to understand them.
        
         | ipsum2 wrote:
         | You have to be specific about which model you're referring to.
         | OpenAI DeepResearch does not have a slider, and does follow
         | when you say to only use academic sources.
        
       | dudeinhawaii wrote:
       | As a user, I've found that researching the same topics in OpenAI
       | Deep Research vs Perplexity's Deep Research results in "narrow
       | and deep" vs "shallow and broad".
       | 
       | OpenAI tends to have something like 20 high quality sources
       | selected and goes very deep in the specific topic, producing
       | something like 20-50 pages of research in all areas and adjacent
       | areas. It takes a lot to read but is quite good.
       | 
       | Perplexity tends to hit something like 60 or more sources, goes
       | fairly shallow, answers some questions in general ways but is
       | excellent at giving you the surface area of the problem space and
       | thoughts about where to go deeper if needed.
       | 
       | OpenAI takes a lot longer to complete, perhaps 20x longer. This
       | factors heavily into whether you want a surface-y answer now or a
       | deep answer later.
        
       | bilater wrote:
        | I went through this journey myself with Deep Search / Research:
       | https://github.com/btahir/open-deep-research
       | 
       | I think it really comes down to your own workflow. You sometimes
       | want to be more imperative (select the sources yourself to
       | generate a report) and sometimes more declarative (let a DFS/BFS
       | algo go and split a query into subqueries and go down rabbit
       | holes until some depth and then aggregate).
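        | 
        | The declarative version is basically a small recursive tree
        | walk - a sketch only (expand/aggregate/search_and_summarize are
        | made-up helper names, not the repo's actual API):
        | 
        |     # Split a query into subqueries, recurse to some depth,
        |     # then aggregate findings bottom-up.
        |     def research(query: str, depth: int, breadth: int) -> str:
        |         if depth == 0:
        |             return search_and_summarize(query)  # leaf: search
        |         subs = expand(query, n=breadth)   # LLM splits the query
        |         results = [research(q, depth - 1, breadth) for q in subs]
        |         return aggregate(query, results)  # LLM merges findings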
       | 
        | Been trying different ways of optimizing the former, but I am
        | fascinated by the more end-to-end flows that systems like STORM
        | do.
        
       | toisanji wrote:
        | What is the best open source system to use?
        
       | z3c0 wrote:
       | > In natural language processing (NLP) terms, this is known as
       | report generation.
       | 
       | I'm happy to see some acknowledgement of the world before LLMs.
       | This is an old problem, and one I (or my team, really) was
       | working on at the time of DALL-E & ChatGPT's explosion. As the
       | article indicated, we deemed 3.5 unacceptable for Q&A almost
       | immediately, as the failure rate was too high for operational
        | reporting in such a demanding industry (legal). We instead
        | employed a SQuAD-trained extractive QA model and polished up
        | the output with an LLM.
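        | 
        | For the curious, the extractive step looks something like this
        | with Hugging Face transformers (the model and question here are
        | examples, not what we actually shipped):
        | 
        |     from transformers import pipeline
        | 
        |     # Extractive QA: the answer is a span copied verbatim from
        |     # the context, so it can't invent text that isn't there.
        |     qa = pipeline("question-answering",
        |                   model="deepset/roberta-base-squad2")
        |     result = qa(question="When was the contract signed?",
        |                 context=document_text)  # document_text: source doc
        |     answer_span = result["answer"]
        |     # An LLM then rewrites the span into fluent report prose,
        |     # without being trusted to supply the facts themselves.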
       | 
       | These new reasoning models that effectively retrofit Q&A
       | capabilities (an extractive task) onto a generative model are
       | impressive, but I can't help but think that it's putting the cart
       | before the horse and will inevitably give diminishing returns in
       | performance. Time will tell, I suppose.
        
       ___________________________________________________________________
       (page generated 2025-03-05 23:01 UTC)