Deep researcher with test-time diffusion

September 19, 2025
Rujun Han and Chen-Yu Lee, Research Scientists, Google Cloud

We introduce Test-Time Diffusion Deep Researcher (TTD-DR), a framework in which a Deep Research agent drafts a report and then revises its own drafts using high-quality retrieved information. This approach achieves new state-of-the-art results in writing long-form research reports and completing complex reasoning tasks.

The recent advances in large language models (LLMs) have fueled the emergence of deep research (DR) agents. These agents demonstrate remarkable capabilities, including the generation of novel ideas, efficient information retrieval, experimental execution, and the subsequent drafting of comprehensive reports and academic papers.

Currently, most public DR agents use a variety of clever techniques to improve their results, like performing reasoning via chain-of-thought or generating multiple answers and selecting the best one. While they've made impressive progress, they often bolt different tools together without considering the iterative nature of human research. They're missing the key process on which people rely when writing a paper about a complex topic: planning, drafting, researching, and iterating based on feedback. A key part of that revision process is doing more research to find missing information or strengthen existing arguments.

This human pattern is surprisingly similar to the mechanism of retrieval-augmented diffusion models, which start with a "noisy" or messy output and gradually refine it into a high-quality result. What if an AI agent's rough draft is the noisy version, and a search tool acts as the denoising step that cleans it up with new facts?

Today we introduce Test-Time Diffusion Deep Researcher (TTD-DR), a DR agent that imitates the way humans do research. To our knowledge, TTD-DR is the first research agent that models research report writing as a diffusion process, where a messy first draft is gradually polished into a high-quality final version. We introduce two new algorithms that work together to enable TTD-DR. First, component-wise optimization via self-evolution enhances the quality of each step in the research workflow. Then, report-level refinement via denoising with retrieval applies newly retrieved information to revise and improve the report draft. We demonstrate that TTD-DR achieves state-of-the-art results on long-form report writing and multi-hop reasoning tasks.

Test-Time Diffusion Deep Researcher

TTD-DR takes a user query as input and creates a preliminary draft that serves as an evolving foundation to guide the research plan. This evolving draft is iteratively refined using a denoising-with-retrieval process (report-level refinement) that takes the information it finds and uses it to improve the draft at each step. This happens in a continuous loop that improves the report with each cycle.
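At a high level, the whole procedure can be summarized as a drafting loop. The sketch below is a minimal, illustrative skeleton of that loop, not the actual implementation; all helper functions (generate_research_plan, generate_preliminary_draft, denoise_with_retrieval_step, write_final_report) are hypothetical placeholders for LLM-backed agents. One possible body for the denoising step is sketched in the "Report-level denoising with retrieval" section below.

```python
# Minimal, illustrative skeleton of the TTD-DR loop (not the production implementation).
# All helpers below are hypothetical stand-ins for LLM-backed agents.

def ttd_dr(user_query: str, max_steps: int = 10) -> str:
    plan = generate_research_plan(user_query)             # structured research plan
    draft = generate_preliminary_draft(user_query, plan)  # noisy preliminary draft
    qa_history = []                                       # accumulated (question, answer) pairs

    for _ in range(max_steps):
        # One denoising-with-retrieval step: search for new information
        # and use it to revise the current draft.
        draft, qa_pair, done = denoise_with_retrieval_step(
            user_query, plan, draft, qa_history)
        if qa_pair is not None:
            qa_history.append(qa_pair)
        if done:                                          # the agent decides research is complete
            break

    # A final agent writes the report from the plan, the refined draft,
    # and all historical search answers.
    return write_final_report(user_query, plan, draft, qa_history)
```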
To top it all off, a self-evolution algorithm constantly enhances the entire process, from the initial plan to the final report. This powerful combination of refinement and self-improvement leads to a more coherent report-writing process.

Figure: Illustration of TTD-DR. We designed it to imitate typical research practices by performing iterative cycles of drafting and revision.

Backbone DR design

The backbone DR design consists of three stages, outlined below.

1. Research plan generation: Produces a structured research plan upon receiving a user query. This plan outlines a list of key areas needed for the final report, serving as an initial guideline for the subsequent information-gathering process.
2. Iterative search: Contains two sub-agents. Search Question Generation (Stage 2a in the figure below) formulates a search query based on the research plan, the user query, and the context from previous search iterations (i.e., past questions and answers). Answer Searching (Stage 2b) searches the available sources to find relevant documents and returns a summarized answer, similar to retrieval-augmented generation (RAG) systems.
3. Final report generation: Produces a comprehensive and coherent final report by combining all the structured information gathered, that is, the plan and the series of question-answer pairs.

Figure: Our backbone DR agent operates in three stages. Stage 1 generates a detailed research plan; Stage 2a iteratively generates search questions and then uses a RAG-like system to synthesize precise answers from retrieved documents (Stage 2b); Stage 3 synthesizes all gathered information to produce the final report.

Component-wise self-evolution

We leverage a self-evolutionary algorithm to enhance the performance of each stage's agents in order to find and preserve high-quality context; a rough sketch of the procedure appears after the figure below.

* Initial states: The leftmost blocks in the diagram below represent multiple diverse answer variants based on the outputs of previous stages, which are used to explore a larger search space. This ideally leads to the discovery of more valuable information.
* Environmental feedback: Each answer variant is assessed by an LLM-as-a-judge, using auto-raters for metrics such as helpfulness and comprehensiveness. These raters not only provide fitness scores but also generate textual feedback that helps improve the answer.
* Revision: With the scores and feedback from the previous step, each variant undergoes a revision step to adapt toward better fitness scores. The environmental feedback and revision steps repeat until reaching a maximum number of iterations or until the agent determines no more revisions are needed.
* Cross-over: Finally, the multiple revised variants are merged into a single, high-quality output. This merging process consolidates the best information from all evolutionary paths, producing superior context for the main report generation process.

Figure: Illustration of the component-wise self-evolution algorithm applied to Answer Searching (Stage 2b). The process starts with multiple variants of initial answers, each undergoing a self-evolving episode where it first interacts with the environment to obtain a fitness score and feedback. It is then revised based on the feedback. This process repeats until the maximum number of iterations is reached. Finally, the revised variants from all episodes are merged to produce the final answer.
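As referenced above, the sketch below illustrates how such a self-evolution routine could look when applied to a single component (e.g., Answer Searching). It is a rough sketch under stated assumptions, not the actual implementation: the `llm` helper, the judge prompt, and the 0-10 scoring scale are hypothetical choices made for illustration.

```python
# Illustrative sketch of component-wise self-evolution (not the actual implementation).
# `llm(prompt)` is a hypothetical helper that returns a text completion from any LLM.

def self_evolve(initial_variants: list[str], task: str,
                max_iters: int = 3, good_enough: float = 9.0) -> str:
    revised = []
    for answer in initial_variants:              # one self-evolving episode per variant
        for _ in range(max_iters):
            # Environmental feedback: an LLM-as-a-judge returns a fitness score
            # plus textual feedback (e.g., on helpfulness and comprehensiveness).
            critique = llm(
                f"Task: {task}\nAnswer: {answer}\n"
                "Rate helpfulness and comprehensiveness from 0 to 10 and explain what "
                "is missing. Reply as:\nSCORE: <number>\nFEEDBACK: <text>")
            score_part, _, feedback = critique.partition("FEEDBACK:")
            try:
                score = float(score_part.replace("SCORE:", "").strip())
            except ValueError:
                score = 0.0
            if score >= good_enough:             # no further revision needed
                break
            # Revision: adapt the variant toward a better fitness score.
            answer = llm(
                f"Task: {task}\nCurrent answer: {answer}\nFeedback: {feedback}\n"
                "Revise the answer to address the feedback.")
        revised.append(answer)

    # Cross-over: merge the best information from all evolutionary paths.
    return llm(
        f"Task: {task}\nCandidate answers:\n" + "\n---\n".join(revised) +
        "\nMerge these candidates into a single, comprehensive, non-redundant answer.")
```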
Report-level denoising with retrieval

Since a preliminary noisy draft is useless for complex topics without real research, TTD-DR uses a search tool to denoise and evolve the draft. Specifically, we feed the current draft report into the Search Question Generation stage (Stage 2a) of the backbone DR workflow to inform the generation of the next search query. After obtaining a synthesized answer in the Answer Searching stage (Stage 2b), the new information is used to revise the report draft, either by adding new details or by verifying existing information. This process of feeding the denoised report back to generate the next search query is repeated. The draft is progressively denoised until the search process concludes, at which point a final agent writes the final report based on all historical search answers and revisions (Stage 3).
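Continuing the loop skeleton sketched earlier, the code below shows one possible body for the hypothetical `denoise_with_retrieval_step` placeholder. It is illustrative only; `llm(prompt)` and `search(query)` are assumed helpers for an LLM call and a search tool that returns retrieved documents as text.

```python
# Illustrative sketch of a single report-level denoising step (not the actual
# implementation). `llm` and `search` are hypothetical helpers.

def denoise_with_retrieval_step(user_query, plan, draft, qa_history):
    # Stage 2a: the current draft (plus the plan and past Q&A) informs the next search query.
    question = llm(
        f"User query: {user_query}\nResearch plan: {plan}\n"
        f"Current draft: {draft}\nPast Q&A: {qa_history}\n"
        "Write the single search query that would most improve this draft, "
        "or reply DONE if no further research is needed.")
    if question.strip() == "DONE":
        return draft, None, True

    # Stage 2b: RAG-like answer synthesis over retrieved documents.
    documents = search(question)
    answer = llm(
        f"Question: {question}\nDocuments: {documents}\n"
        "Synthesize a concise, well-supported answer.")

    # Denoising: revise the draft with the new information, adding details
    # or verifying existing claims.
    new_draft = llm(
        f"Draft report: {draft}\nNew finding ({question}): {answer}\n"
        "Revise the draft: integrate the new information and verify or correct "
        "existing statements. Return the full revised draft.")
    return new_draft, (question, answer), False
```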
Results

We evaluate TTD-DR's performance using benchmark datasets that focus on two broad tasks: 1) complex queries that require research agents to produce a long-form, comprehensive report (DeepConsult), and 2) multi-hop queries that require extensive search and reasoning to answer (Humanity's Last Exam [HLE] and GAIA). We sub-sample 200 queries from HLE that need more search and reasoning (HLE-Search). Both categories fit our objective of building a general-purpose, real-world research companion.

We compare our DR systems with OpenAI Deep Research. TTD-DR consistently achieves better results across all benchmarks. Notably, when compared to OpenAI DR, TTD-DR achieves a 74.5% win rate on the long-form research report generation task. Additionally, it outperforms OpenAI DR by 7.7% and 1.7% on the two extensive research datasets with short-form ground-truth answers.

Figure: TTD-DR's performance against different baseline systems on the benchmark datasets. Left: Win rates (%) computed against OpenAI DR. Right: Correctness computed as the match between system-predicted and reference answers. TTD-DR outperforms OpenAI DR by significant margins.

Ablation study

For the ablation study, we incrementally add the three methods described above. Our DR agents use Gemini-2.5-pro as the base model; all other baseline agents use their default LLMs. The charts below show the ablation study for our DR agents. The backbone DR agent underperforms OpenAI DR. With the addition of the proposed self-evolution algorithm, our system outperforms OpenAI Deep Research on DeepConsult with a 59.8% win rate, and the Correctness scores on the HLE-Search and GAIA datasets improve by 4.4% and 1.2%, respectively. Finally, incorporating diffusion with retrieval leads to substantial gains across all benchmarks.

Figure: TTD-DR's performance when incrementally adding 1) the backbone DR agent, 2) self-evolution, and 3) diffusion with retrieval. We observe step-by-step improvements across the board that help us achieve new state-of-the-art results.

The Pareto-frontier diagram below further shows the test-time scaling efficiency of TTD-DR compared with other DR agents. We found that TTD-DR is more efficient than OpenAI DR: at the same latency, it achieves better report quality as measured by win rate. See the paper for more details.

Figure: Pareto frontier of research report quality vs. latency in seconds. The blue line indicates TTD-DR, whereas the grey dots indicate the compared DR agents.

Conclusion

The Deep Researcher with Test-Time Diffusion (TTD-DR) is a new framework inspired by the iterative way humans do research. This agent addresses the limitations of existing DR agents by conceptualizing report generation as a diffusion process. The TTD-DR framework significantly outperforms existing DR agents across various benchmarks requiring intensive search and multi-hop reasoning. It demonstrates state-of-the-art performance in generating comprehensive long-form research reports and identifying concise answers for multi-hop search and reasoning tasks. We believe the reason it works so well is its "draft-first" design, which keeps the whole research process focused and coherent, preventing important information from getting lost along the way.

Availability on Google Cloud Platform

A product version of this work is available on Google Agentspace, implemented with the Google Cloud Agent Development Kit.

Acknowledgements

This research was conducted by Rujun Han, Yanfei Chen, Guan Sun, Lesly Miculicich, Zoey CuiZhu, Yuanjun (Sophia) Bi, Weiming Wen, Hui Wan, Chunfeng Wen, Solene Maitre, George Lee, Vishy Tirumalashetty, Xiaowei Li, Emily Xue, Zizhao Zhang, Salem Haykal, Burak Gokturk, Tomas Pfister, and Chen-Yu Lee.