[HN Gopher] Open Deep Research
       ___________________________________________________________________
        
       Open Deep Research
        
       Author : transpute
       Score  : 135 points
       Date   : 2025-02-04 19:55 UTC (3 hours ago)
        
 (HTM) web link (github.com)
 (TXT) w3m dump (github.com)
        
       | tkellogg wrote:
       | it's just an example, but it's great to see smolagents in
       | practice. I wonder how well the import whitelist approach works
       | for code interpreter security.
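         
        The whitelist idea mentioned above can be sketched in a few lines:
        before executing model-generated code, parse it and reject any import
        outside an allow-list. This is only an illustration of the general
        technique; the module set and the `check_imports` helper below are
        made up for the example, not smolagents' actual API:

```python
import ast

# Illustrative allow-list; a real sandbox would tune this per use case.
ALLOWED_IMPORTS = {"math", "statistics", "itertools"}

def check_imports(source: str) -> list[str]:
    """Return the top-level modules imported by `source` that are
    not on the whitelist (empty list means the code passes)."""
    tree = ast.parse(source)
    violations = []
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            violations += [a.name for a in node.names
                           if a.name.split(".")[0] not in ALLOWED_IMPORTS]
        elif isinstance(node, ast.ImportFrom):
            if node.module and node.module.split(".")[0] not in ALLOWED_IMPORTS:
                violations.append(node.module)
    return violations

print(check_imports("import math\nimport os\nfrom subprocess import run"))
# → ['os', 'subprocess']
```

        As I understand it, smolagents' interpreter goes further than this
        (it evaluates the AST itself under restrictions rather than just
        screening imports), but static screening like the above is the core
        of the whitelist approach being questioned here.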
        
         | tptacek wrote:
         | I know some of the point of this is running things locally, but
         | for agent workflows like this some of this seems like a solved
         | problem: just run it on a throwaway VM. There's lots of ways to
         | do that quickly.
        
           | ATechGuy wrote:
            | VM is not the right abstraction because of its performance
            | and resource requirements. VMs are used only because nothing
            | else exists that provides the same or better isolation.
            | Spinning up a throwaway VM for each AI agent would be highly
            | inefficient (think wasted compute and other resources, which
            | is the opposite of what DeepSeek exemplified).
        
             | tptacek wrote:
             | To which performance and resource requirements are you
             | referring? A cloud VM runs as long as the agent runs, then
             | stops running.
        
               | ATechGuy wrote:
                | I mean the performance overhead of an OS process running
                | in a VM (vs. no VM), plus the additional resources
                | required to run the VM itself, including memory and an
                | extra kernel. You can pull relevant numbers from academic
                | papers.
        
             | vineyardmike wrote:
              | Is "DeepSeek" going to be the new trendy shorthand for
              | "don't be wasteful"? I don't think DS is a good example
              | here, mostly because it's a trendy thing, and _the company
              | still spent ~$1B in capex to get there_.
             | 
             | Firecracker has changed the nature of "VMs" into something
             | cheap and easy to spin up and throw away while maintaining
             | isolation. There's no reason not to use it (besides
             | complexity, I guess).
             | 
              | Besides, the entire rest of this is a Python notebook, with
              | headless browsers, using LLMs. This is entirely setting
              | silicon on fire. The overhead from a VM is the least of the
              | compute-efficiency problems. Just hit a quick cloud API,
              | run your Python or browser automation in isolation, and
              | move on.
        
               | tptacek wrote:
               | I'm not even talking about Firecracker; for the duration
               | of time things like these run, you could get a
               | satisfactory UX with basic EC2.
        
           | cma wrote:
           | The rise of captchas on regular content, no longer just for
           | posting content, could ruin this. Cloudflare and other
           | companies have set things up to go through a few hand
           | selected scrapers and only they will be able to offer AI
           | browsing and research services.
        
             | AznHisoka wrote:
             | What about Cloudflare itself? It might constitute an abuse
             | of sorts of their leadership position, but couldn't they
             | dominate the AI research/agent market if they wanted? (Or
             | maybe that's what you were implying too)
        
             | tptacek wrote:
             | I think the opposite problem is going to occur with
             | captchas for whatever it's worth: LLMs are going to
             | obsolete them. It's an arms race where the defender has a
             | huge constraint the attacker doesn't (pissing off real
             | users); in that way, it's kind of like the opposite
             | dynamics that password hashes exploit.
        
       | lasermike026 wrote:
       | I think I'm in love.
        
       | ai-christianson wrote:
        | This is pretty awesome--great to see this use of smolagents. I
        | analyzed the code with RA.Aid and here's what it says about how
        | it works:
        | 
        | The Open Deep Research system implements a sophisticated
        | multi-agent architecture for handling both textual and visual
        | content through several key components:
        | 
        | **1. Agent Hierarchy:**
        | - Manager agent (CodeAgent) coordinates overall task processing
        | - Specialized sub-agents handle specific types of tasks
        | - Web browser agent with tools for searching and navigation
        | - All agents maintain memory of intermediate steps
        | 
        | **2. Core Components:**
        | - SimpleTextBrowser: Text-based web browser with viewport
        |   management
        | - TextInspectorTool: Handles document content analysis
        | - VisualQATool: Processes image analysis and captions
        | - Various web tools for search, navigation, and content
        |   inspection
        | 
        | **3. Key Features:**
        | - Multi-modal processing supporting text, web, and visual content
        | - Hierarchical delegation of tasks to specialized components
        | - Integrated memory management for tracking steps
        | - Support for multiple file types with specialized handlers
        | - Web search capabilities through SERP API
        | - Visual analysis using IDEFICS and GPT-4 models
        | - Markdown conversion for consistent text formatting
        | 
        | **4. Tool Integration:**
        | - Clear separation of responsibilities between tools
        | - Coordinated processing of different content types
        | - Structured response formatting
        | - Error handling for unsupported operations
        | - Memory maintenance across operations
        | 
        | **5. Content Processing:**
        | - Web content handled by browser tools
        | - Documents processed by text inspector
        | - Images analyzed by visual QA tools
        | - File type-specific conversion and handling
        | - Support for large document processing
        | 
        | This architecture enables systematic processing of complex
        | queries involving multiple types of content while maintaining
        | clear separation of concerns and coordinated information flow
        | between components.
       | 
       | Pretty cool approach! Here's the gist of the full research agent
       | trace if anyone is interested: https://gist.github.com/ai-
       | christianson/43447275d5cc0966b1b6...
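         
        The manager/sub-agent delegation described in the summary above can
        be sketched as a toy example. The class and agent names below are
        illustrative only, not the project's actual code:

```python
# Toy sketch of a manager agent routing tasks to specialized
# sub-agents, with each agent keeping a memory of intermediate steps.

class Agent:
    def __init__(self, name):
        self.name = name
        self.memory = []          # intermediate steps this agent took

    def run(self, task):
        result = f"{self.name} handled: {task}"
        self.memory.append(result)
        return result

class ManagerAgent(Agent):
    """Routes each task to a specialized sub-agent by content type."""
    def __init__(self, sub_agents):
        super().__init__("manager")
        self.sub_agents = sub_agents  # e.g. {"web": ..., "image": ...}

    def run(self, task, kind):
        result = self.sub_agents[kind].run(task)
        self.memory.append((kind, result))  # manager tracks delegations
        return result

manager = ManagerAgent({
    "web": Agent("browser_agent"),
    "image": Agent("visual_qa_agent"),
})
print(manager.run("find the GAIA leaderboard", "web"))
# → browser_agent handled: find the GAIA leaderboard
```

        The real system adds the tool layer (SimpleTextBrowser,
        TextInspectorTool, VisualQATool) beneath each sub-agent, but the
        routing-plus-memory structure is the same shape.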
        
       | transpute wrote:
       | https://techcrunch.com/2025/02/04/hugging-face-researchers-a...
       | 
       |  _> On GAIA, a benchmark for general AI assistants, Open Deep
       | Research achieves a score of 54%. That's compared with OpenAI
       | deep research's score of 67.36%..Worth noting is that there are a
       | number of OpenAI deep research "reproductions" on the web, some
       | of which rely on open models and tooling. The crucial component
       | they -- and Open Deep Research -- lack is o3, the model
       | underpinning deep research._
       | 
       | Blog post, https://huggingface.co/blog/open-deep-research
        
         | swyx wrote:
          | there's always a lot of openTHING clones of THING after THING
          | is announced. they all usually (not always[1]!) disappoint/
          | don't get traction. i think the causes are
         | 
         | 1. running things in production/self hosting is more annoying
         | than just paying like 20-200/month
         | 
          | 2. openTHING makers often overhype their superficial repros ("I
          | cloned Perplexity in a weekend! haha! these VCs are clowns!")
          | and trivialize the last mile, most particularly in this
          | case...
         | 
         | 3. long horizon planning trained with RL in a tight loop that
         | is not available in the open (yes, even with deepseek). the
         | thing that makes OAI work as a product+research company is that
         | products are never launched without first establishing a
         | "prompted baseline" and then finetuning the model from there
         | (we covered this process in https://latent.space/p/karina
         | recently) - which becomes an evals/dataset suite that
         | eventually gets merged in once performance impacts stabilize
         | 
         | 4. that said, smolagents and HF are awesome and I like that
         | they are always this on the ball. how does this make money for
         | HF?
         | 
         | ---
         | 
         | [1]: i think opendevin/allhands is a pretty decent competitor
         | to devin now
        
           | tshepom wrote:
           | Maybe these open projects start to get more attention when we
           | have a distribution system/App Store for AI projects. I know
           | YC is looking to fund this https://www.ycombinator.com/rfs
        
             | transpute wrote:
             | Is "AI appstore" envisioned for Linux edge inference
             | hardware, e.g. PC+NPU, PC+GPU, Nvidia Project Digits? Or
             | only in the cloud?
             | 
              | Apple probably wouldn't accept a 3rd-party AI app store on
              | macOS and iOS, except possibly in the EU. Perhaps antitrust
              | regulation could lead to Android becoming a standalone
              | company, including support for competing AI workflows.
        
           | upghost wrote:
           | Hey I liked your comments but I got a little whiplash on the
           | 4th item. Are you saying that HF/smolagents are _not_ guilty
           | of points 1 /2 and 3(?), or they _are_ guilty but you like
           | them anyway? Just trying to calibrate the comment a bit,
           | sorry for being obtuse.
        
           | littlestymaar wrote:
            | Except that in this particular case (as in many others in
            | AI, actually), the open version came first, by
           | three full months: https://www.reddit.com/r/LocalLLaMA/commen
           | ts/1gvlzug/i_creat...
           | 
            | OpenAI pretty much never acknowledges prior art in their
            | marketing material because they want you to believe they are
            | the true innovators, so don't take their marketing claims at
            | face value.
        
             | transpute wrote:
             | Thanks for the pointer,
             | https://github.com/TheBlewish/Automated-AI-Web-Researcher-
             | Ol...
        
       | rvz wrote:
       | Of course. The first of many open source versions of 'Deep
       | Research' projects are now appearing as predicted [0] but in less
       | than a month. Faster than expected.
       | 
       | Open source is already at the finish line.
       | 
       | [0] https://news.ycombinator.com/item?id=42913379
        
         | transpute wrote:
         | _> Nothing that Perplexity + DeepSeek-R1 can already do_
         | 
         | Any public comparisons of OAI Deep Research report quality with
         | Perplexity + DeepSeek-R1, on the same query?
         | 
         | How do cost and query limits compare?
        
       | bossyTeacher wrote:
        | So basically, Altman announced Deep Research less than a month
        | ago and open-source alternatives are already out? Investors are
        | not going to be happy unless OpenAI outperforms them all by an
        | order of magnitude.
        
       ___________________________________________________________________
       (page generated 2025-02-04 23:00 UTC)