[HN Gopher] The Differences Between Deep Research, Deep Research...
___________________________________________________________________
The Differences Between Deep Research, Deep Research, and Deep
Research
Author : thenameless7741
Score : 173 points
Date : 2025-03-02 22:59 UTC (3 days ago)
(HTM) web link (leehanchung.github.io)
(TXT) w3m dump (leehanchung.github.io)
| blacksqr wrote:
| Yes, but what about Deep Research?
| readyplayernull wrote:
| Are you telling me that AIs are starting to diverge and that we
| might get a combinatorial explosion of reasoning paths that will
| produce so many different agents that we won't know which one can
| actually become AGI?
|
| https://leehanchung.github.io/assets/img/2025-02-26/05-quadr...
| nwhnwh wrote:
| Life is so complicated.
| pseudocomposer wrote:
| They are certainly diverging, and becoming more useful tools,
| but the answer to "which one can actually become AGI?" is, as
| always, "none of them."
| OutOfHere wrote:
| AGI is about performing actions with a high multi-task
| intelligence. Only the top right corner (Deep, Trained) has any
| hope of getting closer to AGI. The rest can still be useful for
| specific tasks, e.g. "deep-research".
| dartos wrote:
| AGI is, and always was, marketing from LLM providers.
|
| Real innovation is happening with task-specific models like
| AlphaFold.
|
| LLMs are starting to become more task specific too, as we've
| seen with the performance of reasoning models on their specific
| tasks.
|
| I imagine we'll see LLMs trained specifically for medical
| purposes, legal purposes, code purposes, and maybe even
| editorial purposes.
|
| All useful in their own way, but none of them even close to
| sci-fi.
| HeatrayEnjoyer wrote:
| AGI is a term that's been around for decades.
| dartos wrote:
| AI was a term that was around for decades as well, but its
| meaning dramatically changed over the past 3 years.
|
| Prior to GPT-3, AI was rarely used in marketing or to talk
| about any number of ML methods.
|
| Nowadays "AI" is just the new "smart" for marketing
| products.
|
| Terms change. The current usage of AGI, especially in the
| context I was talking about, is specifically marketing from
| LLM providers.
|
| I'd argue that the term AGI, when used in a non-fiction
| context, has always been a meaningless marketing term of
| some kind.
| mdp2021 wrote:
| > _has always been_
|
| Well now it is not: it is now "the difference between
| something with outputs that sound plausible vs something
| with outputs which are properly checked".
| mirekrusin wrote:
| History begs to differ - one of the biggest learnings is that
| larger, generic models always win where generalization pays
| off, i.e. all those agents and what-not (this doesn't apply to
| specialized models like AlphaGo or AlphaFold; those are not
| general models).
| dartos wrote:
| > one of the biggest learnings is that larger, generic
| models always win
|
| You're confusing several different ideas here.
|
| The idea you're talking about is called "the bitter
| lesson." It (very basically) says that a model with more
| compute put behind it will perform better than a cleverer
| method which may use less compute. It has nothing to do with
| being "generic." It's also worth noting that, afaik, it's
| an accurate observation, but not a law or a fact. It may
| not hold forever.
|
| Either way, I'm not arguing against that. I'm saying that
| LLMs are too general to be useful in specific, specialized,
| domains.
|
| Sure bigger generic models perform (increasingly
| marginally) better at the benchmarks we've cooked up, but
| they're still too general to be that useful in any specific
| context. That's the entire reason RAG exists in the first
| place.
|
| I'm saying that a language model trained on a specific
| domain will perform better at tasks in that domain than a
| similar sized model (in terms of compute) trained on a lot
| of different, unrelated text.
|
| For instance, a model trained specifically on code will
| produce better code than a similarly sized model trained on
| all available text.
|
| I really hope that example makes what I'm saying self-
| evident.
| mirekrusin wrote:
| You're explaining it nicely and then seem to make a mistake
| that contradicts what you've just said - because code and
| text share a domain (text-based), large, generic models
| will always out-compete smaller, specialized ones -
| that's the lesson.
|
| If you compare it with, e.g., a model for self-driving
| cars, generic text models will not win because they
| operate in a different domain.
|
| In all cases, trying to optimize for subset/specialized
| tasks within a domain is not worth the investment, because
| the state of the art will be held by larger models working
| on the whole available set.
| ctoth wrote:
| > AGI is, and always was, marketing from LLM providers.
|
| TIL: The term AGI, which we've been using since at least
| 1997[0], was invented by time-traveling LLM companies in the
| 2020s.
|
| [0]: https://ai.stackexchange.com/questions/20231/who-first-
| coine...
| freilanzer wrote:
| He didn't say that it was invented by LLM providers...
| sgt101 wrote:
| Isn't this the worst possible case for an LLM? The integrity of
| the product is central to its value, and the user is by
| definition unable to verify that integrity.
| falcor84 wrote:
| I'm not following, what's "by definition" here? You can verify
| the integrity of an AI report in the same way you would do with
| any other report someone prepared for you - when you encounter
| something that feels wrong, check the referenced source
| yourself.
| sgt101 wrote:
| I think it's normal to invest authority in a report that
| someone else has prepared - people take what is written on
| trust because the person who prepared it is accountable for
| any errors... not just now, but forever.
|
| LLMs are not accountable for anything.
| simonw wrote:
| I really like the distinction between DeepSearch and DeepResearch
| proposed in this piece by Han Xiao:
| https://jina.ai/news/a-practical-guide-to-implementing-deeps...
|
| > DeepSearch runs through an iterative loop of searching,
| reading, and reasoning until it finds the optimal answer. [...]
|
| > DeepResearch builds upon DeepSearch by adding a structured
| framework for generating long research reports
|
| Given these definitions, I think DeepSearch is the more valuable
| and interesting pattern. It's effectively RAG built using tools
| in a loop, which is much more likely to answer questions
| effectively than more traditional RAG where there is only one
| attempt to find relevant documents to include in a single prompt
| to an LLM.
|
| DeepResearch is a cosmetic enhancement that wraps the results in
| a "report" - it looks impressive but IMO is much more likely to
| lead to inaccurate or misleading results.
|
| More notes here: https://simonwillison.net/2025/Mar/4/deepsearch-
| deepresearch...
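|
| A minimal sketch of the "tools in a loop" pattern I mean, in
| Python - the llm/web_search/fetch_page helpers are placeholder
| stubs, not any specific product's API:
|
|     # DeepSearch-style loop: search, read, reason, repeat.
|     def llm(prompt: str) -> str:
|         raise NotImplementedError("plug in a model call")
|
|     def web_search(query: str) -> str:
|         raise NotImplementedError("plug in a search API")
|
|     def fetch_page(url: str) -> str:
|         raise NotImplementedError("plug in a page fetcher")
|
|     def deep_search(question: str, max_steps: int = 5) -> str:
|         notes: list[str] = []
|         for _ in range(max_steps):
|             # The model picks the next action from its notes.
|             action = llm(
|                 f"Question: {question}\nNotes: {notes}\n"
|                 "Reply SEARCH: <query>, READ: <url>, "
|                 "or ANSWER: <text>."
|             )
|             if action.startswith("SEARCH:"):
|                 notes.append(web_search(action[7:].strip()))
|             elif action.startswith("READ:"):
|                 notes.append(fetch_page(action[5:].strip()))
|             else:
|                 return action.removeprefix("ANSWER:").strip()
|         return "no confident answer found"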
| vlovich123 wrote:
| I thought DeepResearch had the AI driving the process because
| it's been trained to do so, whereas DeepSearch is something
| like LangChain + prompt engineering?
| simonw wrote:
| There are three different launched commercial products called
| "Deep Research" right now - from Google Gemini, OpenAI and
| Perplexity. There are also several open source projects that
| use the name "Deep Research".
|
| DeepResearch (note the absence of the space character) is the
| name that Han Xiao proposes for the general pattern of
| generating a research-style report after running multiple
| searches.
|
| You might implement that pattern using prompt engineering or
| using some custom trained model or through other means. If
| the eventual output looks like a report and it ran multiple
| searches along the way it fits Han's "DeepResearch"
| definition.
| dukeyukey wrote:
| Why is the AI industry so follow-the-leader in naming
| stuff? I've used at least three AI services called
| "Copilot".
| NitpickLawyer wrote:
| > DeepResearch is a cosmetic enhancement that wraps the results
| in a "report" - it looks impressive but IMO is much more likely
| to lead to inaccurate or misleading results.
|
| I think that _if done well_ deep research can be more than
| that. At a minimum, I would say that before "deep search"
| you'd need some calls to an LLM to figure out what to look for,
| where it would be best to look (i.e. sources, trust,
| etc.), how to tabulate the data gathered, and so on. Just as deep
| search is "rag w/ tools in a loop", so can (should) be deep
| research.
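|
| A rough sketch of that planning pass in Python, with llm() and
| deep_search() as placeholder hooks (not any product's API):
|
|     import json
|
|     def llm(prompt: str) -> str: ...
|     def deep_search(question: str) -> str: ...
|
|     # One planning call before any searching happens.
|     def plan_research(topic: str) -> dict:
|         plan = llm(
|             f"Topic: {topic}\n"
|             "Return JSON with 'questions' (sub-questions to "
|             "answer) and 'sources' (source types to prefer)."
|         )
|         return json.loads(plan)
|
|     def deep_research(topic: str) -> list[dict]:
|         plan = plan_research(topic)
|         # Deep-search each sub-question from the plan.
|         return [
|             {"question": q, "answer": deep_search(q)}
|             for q in plan["questions"]
|         ]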
|
| Think of the analogy of using aider to go straight to code
| versus using it to first /architect and then code - but for
| _any_ task that lends itself to (re)searching. At least it
| would catch useless tangents faster.
|
| At the end of the day, what's fascinating about LLM based
| agents is that you can almost always add another layer of
| abstraction on top. No matter what you build, you can always
| come from another angle. That's really cool imo, and something
| Hassabis has hinted at lately in some podcasts.
| samstave wrote:
| So I've started a thing with Jina, and the first effort I am
| making is setting the "tone", meaning I'm building a project
| template that will keep the bots focused.
|
| I think that one part of the deep loop needs to be a check-in
| on expectations and goals...
|
| So instead of throwing a deep task at them, I find that bots
| work better with small iterative chunks of objectives..
|
| I haven't formulated it completely yet, but as an example I've
| been working extensively with Cursor's whole Anthropic
| abstraction, AI as a service:
|
| So many folks suffer from the "generating" quagmire;
|
| And I found that telling the bot to "break any response into
| smaller chunks to avoid context limitations" works incredibly
| well...
|
| So when my scaffold is complete, the goal is to use Fabric
| Patterns for nursery assignments to the deep bots... whereby
| they constantly check in.
|
| Prior to "deep" things, I found this to work really well by
| telling the bots to obsessively maintain a development_diary.md
| and .json tracking of actions (even still, their memory is
| super small), and I envisioned a multi-layer setup of agents
| where the initial agent's actions feed the context of agents
| who follow along, so you have a waterfall of context between
| agents and avoid context loss on super deep iterative
| research...
|
| (I'll type out something more salient when I have a KVM...)
|
| (But I hope that doesn't sound stupid)
| mola wrote:
| What are fabric patterns?
| samstave wrote:
| Basically agentic personas or modus operandi.
|
| You tell the agent "grok this persona to accomplish the
| task."
| simonw wrote:
| Right - I'm finding the flawed Deep Research tools useful
| already, but what I really want is much more control over the
| sources of information they use.
| paulsutter wrote:
| Exactly - like my whole codebase, or repositories of
| proprietary documents
| throwup238 wrote:
| Sadly I think that's why non-open source commercial deep
| (re)search implementations are going to be largely useless.
| Even if you're using a customized endpoint for search like
| Kagi, the sources are mostly garbage and no one except
| maybe Google Books has the resources and legal cover to
| expand that deep search into books, which are much better
| sources.
| samstave wrote:
| First, SimonW, I devour everything you write and appreciate you
| most in the AI community and recommend you from 0 all the way
| to 1!!!
|
| Thank you.
|
| -
|
| Second, thank you for bringing up Jina; I recently discovered
| it and immediately began building a Thing based on it:
|
| I want to use its functions to ferret out all the entanglements
| in the WEF Leadership roster, similar to the NGO fractal
| connections - I'm doing that with every WEF member, through to
| Congress.
|
| I would truly wish to work with you on such, if so inclined...
|
| I prefer to build "dossiers" rather than reports, represented
| in JSON schemas.
|
| I'm on mobile so will provide more details when at a machine...
|
| Looping through a dossier of connections is much more
| thoughtful than a "report" imo.
|
| I need to see you on someone's podcast, else you and I should
| make one!
| simonw wrote:
| Thanks! I've done quite a few podcast things recently,
| they're tagged on my blog:
| https://simonwillison.net/tags/podcast-appearances/
| samstave wrote:
| Dub, yeah I saw that but hadn't listened yet.
|
| What I want is a "podcast" with audience participation..
|
| The Lex Fridman DeepSeek episode was so awesome, but I have
| so many questions and I get exceedingly frustrated when Lex
| doesn't ask what may seem obv to us HNers...
|
| -
|
| Back topic:
|
| Reports are flat; dossiers are malleable.
|
| As I mentioned my goal is fractal visuals (in minamGL) of
| the true entanglements from the WEF out.
|
| Much like Mike Benz on USAID - using Jina deep research,
| extraction, etc. will pull back the veil on the truth of the
| globalist agenda seeking control and will reveal true
| relationships, loyalties, and connections.
|
| It's been running through my head for decades and I finally
| feel that jina is a tool that can start to reveal what
| myself and so many others can plainly see but can't verify.
| derefr wrote:
| > DeepResearch is a cosmetic enhancement that wraps the results
| in a "report"
|
| No, that's not what Xiao said here. Here's the relevant quote:
|
| > It often begins by creating a table of contents, then
| systematically applies DeepSearch to each required section -
| from introduction through related work and methodology, all the
| way to the conclusion. Each section is generated by feeding
| specific research questions into the DeepSearch. The final
| phase involves consolidating all sections into a single prompt
| to improve the overall narrative coherence.
|
| (I also recommend that you stare very hard at the diagrams.)
|
| Let me paraphrase what Xiao is saying here:
|
| A DeepSearch is a primitive -- it does mostly the same thing a
| regular LLM query does, but with a lot of trained-in thinking
| and searching work, to ensure that it is producing a _rigorous_
| answer to your question. Which is great: it means that
| DeepSearch is more likely to say "I don't know" than to
| hallucinate an answer. (This is extremely important as a
| building block; an agent needs to know when a query has failed
| so it can try again / try something else.)
|
| However, DeepSearch alone still "hallucinates" in one
| particular way: it "hallucinates understanding" of the topic,
| thinking that it already has a complete mental toolkit of
| concepts needed to solve your problem. It will never say
| "solving this sub-problem seems to require inventing a new
| tool" and so "branch off" to another recursed DeepSearch to
| determine how to do that. Instead, it'll try to solve your
| problem with the toolkit it has -- and if that toolkit is
| insufficient, it will simply fail.
|
| Which, again, is great in some ways. It means that a single
| DeepSearch will do a (semi-)bounded amount of work. Which means
| that the costs of each marginal additional DeepSearch call are
| predictable.
|
| But it also means that you can't ask DeepSearch itself to:
|
| * come up with a mathematical proof of something, where any
| useful proof strategy will _implicitly require_ inventing new
| math concepts to use as tools in solving the problem.
|
| * do investigative journalism that involves "chasing leads"
| down a digraph of paths; evaluating what those leads have to
| say; and using that info to determine _new_ leads.
|
| * "code me a Facebook clone" -- and have it understand that
| doing so involves iteratively/recursively building out a
| _software architecture_ composed of many modules -- where it
| won't be able to see the need for many of those modules at
| "design time", but will only "discover" the need to write them
| once it gets to implementation time of dependent modules and
| realizes that to achieve some goal, it must call into some code
| / entire library that doesn't exist yet. (And then make a buy-
| vs-build decision on writing that code vs pulling in a
| dependency... which requires researching the _space_ of
| available packages in the ecosystem, and how well they solve
| the problem... and so on.)
|
| A DeepResearch model, meanwhile, is a model that looks at a
| question, and says "is this a _leaf_ question that can be
| answered directly -- or is this a question that needs to be
| broken down and tackled by parts, perhaps with some of the
| parts themselves being unknowns until earlier parts are
| solved?"
|
| A DeepResearch model does a lot of top-level work -- probably
| using DeepSearch! -- to test the "leaf-ness" of your question;
| and to _break down non-leaf questions_ into a "battle plan"
| for solving the problem. It then attempts solutions to these
| component problems -- not by calling DeepSearch, but by
| recursively calling itself (where that forked child will call
| DeepSearch _if_ the sub-problem is leaf-y, or break down the
| sub-problem further if not.)
|
| A DeepResearch model will then take the derived solutions for
| _dependent_ problems into account in the solution space for
| _depending_ problems. (A DeepResearch model may also be trained
| to notice when it's "worked into a corner" by coming up with
| early-phase solutions that make later phases impossible, and to
| backtrack to solve the earlier phases differently, now with
| in-context knowledge of the constraints of the later phases.)
|
| Once a DeepResearch model finds a successful solution to all
| subproblems, it takes the hierarchy of thinking/searching logs
| it generated in the process, and strips out all the dead-ends
| and backtracking, to present a comprehensible linear "success
| path." (Probably it does this as the end-step of each recursive
| self-call, before returning to self, to minimize the amount of
| data returned.)
|
| Note how this last reporting step isn't "generating a report"
| _for human consumption_; it's a DeepResearch call "generating
| a report" _for its parent DeepResearch call to consume_.
| _That's_ special sauce. (And if you think about it, the
| top-level call to this whole thing is probably going to use a
| _non_-DeepResearch model at the end to rephrase the top-level
| DeepResearch result from a machine-readable recurse-result
| report into a human-readable report. It might even use a
| DeepSearch model to do that!)
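|
| To make the recursion concrete, here's a toy Python sketch of
| my reading of Xiao's pattern - the three hooks at the top are
| placeholders, not any vendor's actual API:
|
|     def llm(prompt: str) -> str: ...
|     def is_leaf(question: str) -> bool: ...
|     def deep_search(question: str) -> str: ...
|
|     # Leaf-test, decompose, recurse, consolidate.
|     def deep_research(question: str, depth: int = 0,
|                       max_depth: int = 3) -> str:
|         if depth >= max_depth or is_leaf(question):
|             # Leaf-y question: hand it straight to the
|             # DeepSearch primitive.
|             return deep_search(question)
|         # Non-leaf: break it down into a "battle plan".
|         subs = llm(
|             f"Break into ordered sub-questions: {question}"
|         ).splitlines()
|         findings: list[str] = []
|         for sub in subs:
|             # Earlier findings constrain later sub-problems.
|             framed = f"{sub}\nGiven findings so far: {findings}"
|             findings.append(
|                 deep_research(framed, depth + 1, max_depth)
|             )
|         # Strip dead ends into a linear "success path" report
|         # for the parent call to consume.
|         return llm(f"Consolidate into one answer: {findings}")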
|
| ---
|
| Bonus tangent:
|
| Despite DeepSearch + DeepResearch using a scientific-research
| metaphor, I think an enlightening comparison is with
| intelligence agencies.
|
| DeepSearch alone does what an individual intelligence analyst
| does. You hand them an individually-actionable question; they
| run through a "branching, but vaguely bounded in time" process
| of thinking and searching, generating a thinking log in the
| process, eventually arriving at a conclusion; they hand you
| back an answer to your question, with a lot of citations --
| _or_ they "throw an exception" and tell you that the facts
| available to the agency cannot support a conclusion at this
| time.
|
| Meanwhile, DeepResearch does what an intelligence agency as a
| whole does:
|
| 1. You send the agency a high-level strategic Request For
| Information;
|
| 2. the agency puts together a workgroup composed of people with
| trained-in expertise with breaking down problems (Intelligence
| Managers), and domain-matter experts with a wide-ranging
| gestalt picture of the problem space (Senior Intelligence
| Analysts), and tasks them with _breaking down_ the problem into
| sub-problems;
|
| 3. some of these sub-problems are _actionable_ -- they can be
| assigned directly for research by a ground-level analyst; some
| of these sub-problems have _prerequisite work_ that must be
| done to _gather intelligence in the field_; and some of these
| sub-problems are _unknown unknowns_ -- missing parts of the map
| that cannot be "planned into" until other sub-problems are
| resolved.
|
| 4. from there, the problem gets "scheduled" -- in parallel,
| (the first batch of) individually-actionable questions get sent
| to analysts, and any field missions to gather pre-requisite
| intelligence are kicked off for planning (involving spawning
| new sub-workgroups!)
|
| 5. the top-level workgroup persists after their first meeting,
| asynchronously observing the reports from actionable questions;
| scheduling newly-actionable questions to analysts once field
| data comes in to be chewed on; and exploring newly-legible
| parts of the map to outline further sub-problems.
|
| 6. If this scheduling process runs out of work to schedule,
| it's either because the top-level question is now answerable,
| or because the process has worked itself into a corner. In the
| former case, a final summary reporting step is kicked off,
| usually assigned to a senior analyst. In the latter case, the
| workgroup reconvenes to figure out how to backtrack out of the
| corner and pursue alternate avenues. (Note that, if they have
| the time, they'll probably make "if this strategy produces
| results that are unworkable in a later step" plans for every
| possible step in their original plan, in advance, so that the
| "scheduling engine" of analyst assignments and fieldwork need
| never run dry waiting for the workgroup to come up with a new
| plan.)
| simonw wrote:
| You're right, Han didn't define DeepResearch as "a cosmetic
| enhancement". I quoted his sentence long definition:
|
| > DeepResearch builds upon DeepSearch by adding a structured
| framework for generating long research reports.
|
| But I then called it "a cosmetic enhancement", really to be
| slightly dismissive of it - I'm a skeptic of the report
| format because I think the way it's presented makes the
| information look more solid than it actually is. My complaint
| is at the aesthetic level, not relating to the (impressive)
| way the report synthesis is engineered.
|
| So yeah, I'm being inaccurate and a bit catty about it.
|
| Your explanation is much closer to what Han described, and
| much more useful than mine.
| TeMPOraL wrote:
| > _DeepResearch is a cosmetic enhancement that wraps the
| results in a "report" - it looks impressive but IMO is much
| more likely to lead to inaccurate or misleading results._
|
| Yup, I got the same impression reading this article - and the
| Jina one, too. Like with langchain and agents, people are
| making chained function calls or a loop sound like it is the
| second coming, or a Nobel prize-worthy discovery. It's not -
| it's _obvious_. It's just expensive to get to work reliably
| and productize.
| EncomLab wrote:
| One of my co-workers joked at the time that "sure AlphaGo beat
| Lee Sedol at Go, but Lee has a much better self-driving
| algorithm."
|
| I thought this was funny at the time, but I think as more time
| passes it does highlight the stark gulf that exists between the
| capability of the most advanced AI systems and what we expect as
| "normal competency" from the most average person.
| tsunego wrote:
| Love me some good old whataboutism (sure, LLMs are now super-
| intelligent at writing software, but can they clean my kitchen?
| No? Ha!)
| jvanderbot wrote:
| The computer beat me at chess, but it was no match for me at
| kickboxing. - Emo Phillips
|
| Tale as old as time. We can make nice software systems but
| general-purpose AI / agents aren't here yet.
| stavros wrote:
| > it does highlight the stark gulf that exists between the
| capability of the most advanced AI systems and what we expect
| as "normal competency" from the most average person
|
| Yes, but now we're at the point where we can compare AI to a
| person, whereas five years ago the gap was so big that that was
| just unthinkable.
| EncomLab wrote:
| I mean, people thought ELIZA was AI back in the 1960s.
| Everyone always thinks "this is it!!".
| mdp2021 wrote:
| > _people thought ELIZA_
|
| But _which_ people? Those people who show that a
| supplement of extra intelligence, even synthetic, is
| sought.
| jimmySixDOF wrote:
| This gives STORM a high mark but didn't seem to get great
| results from GPT Researcher, the other open-source project that
| was doing this before Deep Research became the recent flavor of
| the day.
|
| But there are so many ways to configure GPT Researcher for all
| kinds of budgets that I wonder whether this comparison really
| pushed the output or just went with defaults and got default
| midrange results for comparison.
| tsunego wrote:
| Neat summary but you forgot Grok!
| jvanderbot wrote:
| Maybe the article has been edited in the last four minutes
| since you posted, but Grok is definitely in there now.
| giancarlostoro wrote:
| It's interesting it says Grok excels at report generation,
| because I've found myself asking it to give me answers in a
| table format, to make it easier to 'grok' the output, since I'm
| usually asking it to give me comparisons I just can't do
| natively on Amazon or any other ecommerce site.
|
| Funnily enough, Amazon will pick products for you to compare,
| but the compared items are usually terrible, and you can't just
| add whatever you want, or choose columns.
|
| With Grok, I'll have it remove columns, add columns, shorten
| responses, so on and so forth.
| ankit219 wrote:
| Think this captures one of the bigger differences between what
| OpenAI offers and what others offer using the same name. Funnily
| enough, Google's Gemini 2.0 Flash also has a native integration
| with Google Search[1]. They have not done it with their Thinking
| model. When they do, we will have a good comparison.
|
| One of the implications of OpenAI's DR is that frontier labs are
| more likely to train specific models for a bunch of tasks,
| resulting in the kind of quality that wrappers will find hard to
| replicate. This is leading towards model + post-training RL as a
| product, instead of keeping them separate from the final wrapper
| as the product. Might be interesting times if the trajectory
| continues.
|
| PS: There is also Genspark MOA[2], which creates an in-depth
| report on a given prompt using mixtures of agents. From what I
| have seen in 5-6 generations, this is very effective.
|
| [1]: https://x.com/_philschmid/status/1896569401979081073 (I
| might be misunderstanding this, but this seems to be a native
| call instead of an explicit one)
|
| [2]: https://www.genspark.ai/agents?type=moa_deep_research
| cxie wrote:
| deep search is the new RAG
| eamag wrote:
| Why is there a citation @article part at the end? Do people
| actually use it?
| SubiculumCode wrote:
| The primary issues with deep research tools are veracity and
| accurate source attribution. My issue with tools relying on
| DeepSeek R1, for example, is the high hallucination rate.
| EigenLord wrote:
| DR is a nice way to gather information, when it works, and then
| do the real research yourself from a concentrated launching
| point. It helps me avoid ADD braining myself into oblivion every
| time I search the internet. The fatal mistake is thinking that
| the LLM is now wiser for having done it. When someone does their
| research, they are now marginally more of an authority on that
| topic than everyone else in the room, all else being equal. But
| for LLMs, it's not like they have suddenly acquired more
| expertise on the subject now that they did this survey. So it's
| actually pretty shallow, not deep, research. It's a cool
| capability, and a nifty way to concentrate information, but much
| deeper capabilities will be required to have models that not only
| truly synthesize all that information, but actively apply it to
| develop a thesis or further a research program. Truthfully, I
| don't see how this is possible within the transformer
| architecture, with its lack of memory or genuine statefulness and
| therefore absence of persistent real time learning. But I am
| generally a big transformer skeptic.
| instagraham wrote:
| I noticed that these models very quickly start to underperform
| regular search like Perplexity Pro 3x. It might be somewhat
| thorough in how it goes line by line, but it's not very cognizant
| of good sources - you might ask for academic sources but if your
| social media slider is turned on, it will overwhelmingly favour
| Reddit.
|
| You may repeat instructions multiple times, but it ignores them
| or fails to understand a solution
| ipsum2 wrote:
| You have to be specific about which model you're referring to.
| OpenAI DeepResearch does not have a slider, and does follow
| when you say to only use academic sources.
| dudeinhawaii wrote:
| As a user, I've found that researching the same topics in OpenAI
| Deep Research vs Perplexity's Deep Research results in "narrow
| and deep" vs "shallow and broad".
|
| OpenAI tends to have something like 20 high quality sources
| selected and goes very deep in the specific topic, producing
| something like 20-50 pages of research in all areas and adjacent
| areas. It takes a lot to read but is quite good.
|
| Perplexity tends to hit something like 60 or more sources, goes
| fairly shallow, answers some questions in general ways but is
| excellent at giving you the surface area of the problem space and
| thoughts about where to go deeper if needed.
|
| OpenAI takes a lot longer to complete, perhaps 20x longer. This
| factors heavily into whether you want a surface-y answer now or a
| deep answer later.
| bilater wrote:
| I went through this journey myself with Deep Search / Research
| https://github.com/btahir/open-deep-research
|
| I think it really comes down to your own workflow. You sometimes
| want to be more imperative (select the sources yourself to
| generate a report) and sometimes more declarative (let a DFS/BFS
| algo go and split a query into subqueries and go down rabbit
| holes until some depth and then aggregate).
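|
| For the declarative flavor, a toy BFS version in Python - llm()
| and search() are placeholder hooks, not a real API:
|
|     from collections import deque
|
|     def llm(prompt: str) -> str: ...
|     def search(query: str) -> str: ...
|
|     # Split a query into subqueries, go down rabbit holes to a
|     # bounded depth, then aggregate everything at the end.
|     def bfs_research(query: str, max_depth: int = 2) -> str:
|         queue = deque([(query, 0)])
|         findings: list[str] = []
|         while queue:
|             q, depth = queue.popleft()
|             findings.append(search(q))  # gather sources
|             if depth < max_depth:
|                 subs = llm(f"Split into subqueries: {q}")
|                 for sub in subs.splitlines():
|                     queue.append((sub, depth + 1))
|         return llm(f"Aggregate into one report: {findings}")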
|
| Been trying different ways of optimizing the former, but I am
| fascinated by the more end-to-end flows that systems like STORM
| do.
| toisanji wrote:
| what is the best open source system to use?
| z3c0 wrote:
| > In natural language processing (NLP) terms, this is known as
| report generation.
|
| I'm happy to see some acknowledgement of the world before LLMs.
| This is an old problem, and one I (or my team, really) was
| working on at the time of DALL-E & ChatGPT's explosion. As the
| article indicated, we deemed 3.5 unacceptable for Q&A almost
| immediately, as the failure rate was too high for operational
| reporting in such a demanding industry (legal). We instead
| employed a SQuAD-style extractive QA model and polished up the
| output with an LLM.
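|
| Roughly the shape of that pipeline, as a sketch - the model
| name below is just a representative SQuAD-tuned checkpoint,
| and polish_with_llm() is a placeholder:
|
|     from transformers import pipeline
|
|     # Extractive QA pulls a grounded span from the document;
|     # the LLM only rephrases it, so it can't invent an answer.
|     qa = pipeline("question-answering",
|                   model="deepset/roberta-base-squad2")
|
|     def polish_with_llm(text: str) -> str:
|         raise NotImplementedError("plug in an LLM rewrite call")
|
|     def answer(question: str, document: str) -> str:
|         result = qa(question=question, context=document)
|         if result["score"] < 0.5:
|             return "No reliable answer found in the document."
|         return polish_with_llm(result["answer"])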
|
| These new reasoning models that effectively retrofit Q&A
| capabilities (an extractive task) onto a generative model are
| impressive, but I can't help but think that it's putting the cart
| before the horse and will inevitably give diminishing returns in
| performance. Time will tell, I suppose.
___________________________________________________________________
(page generated 2025-03-05 23:01 UTC)