[HN Gopher] Open Deep Research
___________________________________________________________________
Open Deep Research
Author : transpute
Score : 135 points
Date : 2025-02-04 19:55 UTC (3 hours ago)
(HTM) web link (github.com)
(TXT) w3m dump (github.com)
| tkellogg wrote:
| it's just an example, but it's great to see smolagents in
| practice. I wonder how well the import whitelist approach works
| for code interpreter security.
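|
| For intuition, a minimal sketch of the whitelist idea
| (hypothetical, not smolagents' actual interpreter; smolagents has
| a similar knob via additional_authorized_imports): statically
| reject any import outside an allowlist before executing
| model-written code.
|
|     import ast
|
|     # Hypothetical allowlist for illustration only.
|     ALLOWED = {"math", "statistics", "json", "re"}
|
|     def check_imports(code: str) -> None:
|         """Raise ImportError if `code` imports outside ALLOWED."""
|         for node in ast.walk(ast.parse(code)):
|             if isinstance(node, ast.Import):
|                 roots = [a.name.split(".")[0] for a in node.names]
|             elif isinstance(node, ast.ImportFrom):
|                 roots = [(node.module or "").split(".")[0]]
|             else:
|                 continue
|             for root in roots:
|                 if root not in ALLOWED:
|                     raise ImportError(f"'{root}' is not whitelisted")
|
|     check_imports("import math")              # passes
|     # check_imports("import os, subprocess")  # would raise ImportError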
| tptacek wrote:
| I know some of the point of this is running things locally, but
| for agent workflows like this some of this seems like a solved
| problem: just run it on a throwaway VM. There's lots of ways to
| do that quickly.
| ATechGuy wrote:
| VM is not the right abstraction because of performance and
| resource requirements. VMs are used because nothing exists
| that provides the same or better isolation. Using a throwaway VM
| for each AI agent would be highly inefficient (think wasted
| compute and other resources, which is the opposite of what
| DeepSeek exemplified).
| tptacek wrote:
| To which performance and resource requirements are you
| referring? A cloud VM runs as long as the agent runs, then
| stops running.
| ATechGuy wrote:
| I mean the performance overhead of an OS process running in
| a VM (vs. no VM) and the additional resource requirements
| for running a VM, including memory and an additional kernel.
| You can pull relevant numbers from academic papers.
| vineyardmike wrote:
| Is "DeepSeek" going to be the new trendy way to say to not
| be wasteful? I don't think DS is a good example here.
| Mostly because it's a trendy thing, and _the company still
| has $1B in capex spend to get there_.
|
| Firecracker has changed the nature of "VMs" into something
| cheap and easy to spin up and throw away while maintaining
| isolation. There's no reason not to use it (besides
| complexity, I guess).
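|
| To make "cheap and throwaway" concrete, here's a rough sketch
| (field names follow Firecracker's documented JSON config; the
| kernel/rootfs paths are placeholders, so treat it as illustrative
| rather than exact):
|
|     import json
|     import subprocess
|
|     # Illustrative microVM config; paths are placeholders.
|     vm_config = {
|         "boot-source": {
|             "kernel_image_path": "vmlinux.bin",
|             "boot_args": "console=ttyS0 reboot=k panic=1",
|         },
|         "drives": [{
|             "drive_id": "rootfs",
|             "path_on_host": "agent-rootfs.ext4",
|             "is_root_device": True,
|             "is_read_only": False,
|         }],
|         "machine-config": {"vcpu_count": 1, "mem_size_mib": 512},
|     }
|
|     with open("vm_config.json", "w") as f:
|         json.dump(vm_config, f)
|
|     # Boot the microVM, let the agent run inside, throw it away.
|     subprocess.run(["firecracker", "--no-api", "--config-file",
|                     "vm_config.json"])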
|
| Besides, the entire rest of this is a python notebook. With
| headless browsers. Using LLMs. This is entirely setting
| silicon on fire. The overhead from a VM is the least of the
| compute efficiency problems. Just hit a quick cloud API and
| run your python or browser automation in isolation and move
| on.
| tptacek wrote:
| I'm not even talking about Firecracker; for the duration
| of time things like these run, you could get a
| satisfactory UX with basic EC2.
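|
| e.g., roughly (a sketch with hypothetical AMI and instance
| values; boto3's run_instances supports terminate-on-shutdown, so
| the box cleans itself up when the agent's user-data script
| finishes):
|
|     import boto3
|
|     ec2 = boto3.client("ec2", region_name="us-east-1")
|
|     # Throwaway box: with terminate-on-shutdown, the instance
|     # disappears once the user-data script powers it off. The AMI
|     # ID and "run-the-agent" command are placeholders.
|     ec2.run_instances(
|         ImageId="ami-0123456789abcdef0",
|         InstanceType="t3.medium",
|         MinCount=1,
|         MaxCount=1,
|         InstanceInitiatedShutdownBehavior="terminate",
|         UserData="#!/bin/bash\nrun-the-agent && shutdown -h now\n",
|     )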
| cma wrote:
| The rise of captchas on regular content, no longer just for
| posting content, could ruin this. Cloudflare and other
| companies have set things up to go through a few hand-selected
| scrapers, and only those will be able to offer AI browsing and
| research services.
| AznHisoka wrote:
| What about Cloudflare itself? It might constitute an abuse
| of sorts of their leadership position, but couldn't they
| dominate the AI research/agent market if they wanted? (Or
| maybe that's what you were implying too)
| tptacek wrote:
| I think the opposite problem is going to occur with
| captchas for whatever it's worth: LLMs are going to
| obsolete them. It's an arms race where the defender has a
| huge constraint the attacker doesn't (pissing off real
| users); in that way, it's kind of the opposite of the
| dynamics that password hashes exploit.
| lasermike026 wrote:
| I think I'm in love.
| ai-christianson wrote:
| This is pretty awesome--great to see this use of smolagents. I
| analyzed the code with RA.Aid and here's what it says about how
| it works:
|
| The Open Deep Research system implements a sophisticated
| multi-agent architecture for handling both textual and visual
| content through several key components:
|
| **1. Agent Hierarchy:**
| - Manager agent (CodeAgent) coordinates overall task processing
| - Specialized sub-agents handle specific types of tasks
| - Web browser agent with tools for searching and navigation
| - All agents maintain memory of intermediate steps
|
| **2. Core Components:**
| - SimpleTextBrowser: Text-based web browser with viewport
|   management
| - TextInspectorTool: Handles document content analysis
| - VisualQATool: Processes image analysis and captions
| - Various web tools for search, navigation, and content
|   inspection
|
| **3. Key Features:**
| - Multi-modal processing supporting text, web, and visual content
| - Hierarchical delegation of tasks to specialized components
| - Integrated memory management for tracking steps
| - Support for multiple file types with specialized handlers
| - Web search capabilities through SERP API
| - Visual analysis using IDEFICS and GPT-4 models
| - Markdown conversion for consistent text formatting
|
| **4. Tool Integration:**
| - Clear separation of responsibilities between tools
| - Coordinated processing of different content types
| - Structured response formatting
| - Error handling for unsupported operations
| - Memory maintenance across operations
|
| **5. Content Processing:**
| - Web content handled by browser tools
| - Documents processed by text inspector
| - Images analyzed by visual QA tools
| - File type-specific conversion and handling
| - Support for large document processing
|
| This architecture enables systematic processing of complex
| queries involving multiple types of content while maintaining
| clear separation of concerns and coordinated information flow
| between components.
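|
| For flavor, the manager/sub-agent wiring looks roughly like this
| in smolagents (a sketch from memory, not the repo's exact code;
| constructor arguments such as name, description, and
| managed_agents differ a bit between smolagents versions):
|
|     from smolagents import (CodeAgent, ToolCallingAgent,
|                             HfApiModel, DuckDuckGoSearchTool)
|
|     model = HfApiModel()  # or any other smolagents model wrapper
|
|     # Sub-agent dedicated to web search/browsing.
|     web_agent = ToolCallingAgent(
|         tools=[DuckDuckGoSearchTool()],
|         model=model,
|         name="web_search",
|         description="Searches the web and reports back findings.",
|     )
|
|     # Manager CodeAgent delegates to the sub-agent and can also
|     # run whitelisted Python for intermediate computation.
|     manager = CodeAgent(
|         tools=[],
|         model=model,
|         managed_agents=[web_agent],
|         additional_authorized_imports=["json", "re"],
|     )
|
|     print(manager.run("What is Open Deep Research?"))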
|
| Pretty cool approach! Here's the gist of the full research agent
| trace if anyone is interested: https://gist.github.com/ai-
| christianson/43447275d5cc0966b1b6...
| transpute wrote:
| https://techcrunch.com/2025/02/04/hugging-face-researchers-a...
|
| _> On GAIA, a benchmark for general AI assistants, Open Deep
| Research achieves a score of 54%. That's compared with OpenAI
| deep research's score of 67.36%.. Worth noting is that there are a
| number of OpenAI deep research "reproductions" on the web, some
| of which rely on open models and tooling. The crucial component
| they -- and Open Deep Research -- lack is o3, the model
| underpinning deep research._
|
| Blog post, https://huggingface.co/blog/open-deep-research
| swyx wrote:
| there's always a lot of openTHING clones of THING after THING is
| announced. they all usually (not always[1]!) disappoint/don't
| get traction. i think the causes are:
|
| 1. running things in production/self hosting is more annoying
| than just paying like 20-200/month
|
| 2. openTHING makers often overhype their superficial repros ("I
| cloned Perplexity in a weekend! haha! these VCs are clowns!")
| and trivialize the last mile, most particularly in this
| case...
|
| 3. long horizon planning trained with RL in a tight loop that
| is not available in the open (yes, even with deepseek). the
| thing that makes OAI work as a product+research company is that
| products are never launched without first establishing a
| "prompted baseline" and then finetuning the model from there
| (we covered this process in https://latent.space/p/karina
| recently) - which becomes an evals/dataset suite that
| eventually gets merged in once performance impacts stabilize
|
| 4. that said, smolagents and HF are awesome and I like that
| they are always this on the ball. how does this make money for
| HF?
|
| ---
|
| [1]: i think opendevin/allhands is a pretty decent competitor
| to devin now
| tshepom wrote:
| Maybe these open projects start to get more attention when we
| have a distribution system/App Store for AI projects. I know
| YC is looking to fund this https://www.ycombinator.com/rfs
| transpute wrote:
| Is "AI appstore" envisioned for Linux edge inference
| hardware, e.g. PC+NPU, PC+GPU, Nvidia Project Digits? Or
| only in the cloud?
|
| Apple probably wouldn't accept a 3rd-party AI app store on
| MacOS and iOS, except possibly in the EU. Perhaps antitrust
| regulation could lead to Android becoming a standalone
| company, including support for competing AI workflows.
| upghost wrote:
| Hey I liked your comments but I got a little whiplash on the
| 4th item. Are you saying that HF/smolagents are _not_ guilty
| of points 1/2 and 3(?), or that they _are_ guilty but you like
| them anyway? Just trying to calibrate the comment a bit,
| sorry for being obtuse.
| littlestymaar wrote:
| Except that in this particular case (like in many others as
| far as AI goes, actually), the open version came before, by
| three full months: https://www.reddit.com/r/LocalLLaMA/commen
| ts/1gvlzug/i_creat...
|
| OpenAI pretty much never acknowledges prior art in their
| marketing material because they want you to believe they are
| the true innovators, but you should not take their marketing
| claims for granted.
| transpute wrote:
| Thanks for the pointer,
| https://github.com/TheBlewish/Automated-AI-Web-Researcher-
| Ol...
| rvz wrote:
| Of course. The first of many open source 'Deep Research'
| projects are now appearing, as predicted [0], but in less
| than a month. Faster than expected.
|
| Open source is already at the finish line.
|
| [0] https://news.ycombinator.com/item?id=42913379
| transpute wrote:
| _> Nothing that Perplexity + DeepSeek-R1 can already do_
|
| Any public comparisons of OAI Deep Research report quality with
| Perplexity + DeepSeek-R1, on the same query?
|
| How do cost and query limits compare?
| bossyTeacher wrote:
| So basically, Altman announced Deep Research less than a month
| ago and open-source alternatives are already out? Investors are
| not going to be happy unless OpenAI outperforms them all by an
| order of magnitude
___________________________________________________________________
(page generated 2025-02-04 23:00 UTC)