https://vercel.com/blog/how-google-handles-javascript-throughout-the-indexing-process
Engineering
Wednesday, July 31st 2024
How Google handles JavaScript throughout the indexing process
MERJ and Vercel's research to demystify Google's rendering through
empirical evidence.
Posted by
Giacomo Zecchini, R&D Director, MERJ
Alice Alexandra Moore, Sr. Content Engineer, Vercel
Ryan Siddle, Managing Director, MERJ
Malte Ubl, CTO, Vercel
Understanding how search engines crawl, render, and index web pages
is crucial for optimizing sites for search engines. Over the years,
as search engines like Google have changed their processes, it has
become tough to keep track of what works and what doesn't, especially
with client-side JavaScript.
We've noticed that a number of old beliefs have stuck around and kept
the community unsure about best practices for application SEO:
1. "Google can't render client-side JavaScript."
2. "Google treats JavaScript pages differently."
3. "Rendering queue and timing significantly impact SEO."
4. "JavaScript-heavy sites have slower page discovery."
To address these beliefs, we've partnered with MERJ, a leading SEO &
data engineering consultancy, to conduct new experiments on Google's
crawling behavior. We analyzed over 100,000 Googlebot fetches across
various sites to test and validate Google's SEO capabilities.
Let's look at how Google's rendering has evolved. Then, we'll explore
our findings and their real-world impact on modern web apps.
---------------------------------------------------------------------
Article contents:
* The evolution of Google's rendering capabilities
* Methodology
* Myth 1: "Google can't render JavaScript content"
* Myth 2: "Google treats JavaScript pages differently"
* Myth 3: "Rendering queue and timing significantly impact SEO"
* Myth 4: "JavaScript-heavy sites have slower page discovery"
* Overall implications and recommendations
* Moving forward with new information
* About MERJ
The evolution of Google's rendering capabilities
Over the years, Google's ability to crawl and index web content has
changed significantly. Tracing this evolution is important for
understanding the current state of SEO for modern web applications.
Pre-2009: Limited JavaScript support
In the early days of search, Google primarily indexed static HTML
content. JavaScript-generated content was largely invisible to search
engines, leading to the widespread use of static HTML for SEO
purposes.
2009-2015: AJAX crawling scheme
Google introduced the AJAX crawling scheme, allowing websites to
provide HTML snapshots of dynamically generated content. This was a
stopgap solution that required developers to create separate,
crawlable versions of their pages.
2015-2018: Early JavaScript rendering
Google began rendering pages using a headless Chrome browser, marking
a significant step forward. However, this older browser version still
had limitations in processing modern JS features.
2018-present: Modern rendering capabilities
Today, Google uses an up-to-date version of Chrome for rendering,
keeping pace with the latest web technologies. Key aspects of the
current system include:
1. Universal rendering: Google now attempts to render all HTML
pages, not just a subset.
2. Up-to-date browser: Googlebot uses the latest stable version of
Chrome/Chromium, supporting modern JS features.
3. Stateless rendering: Each page render occurs in a fresh browser
session, without retaining cookies or state from previous
renders. Google will generally not click on items on the page,
such as tabbed content or cookie banners.
4. Cloaking: Google prohibits showing different content to users and
search engines to manipulate rankings. Avoid code that alters
content based on User-Agent. Instead, optimize your app's
stateless rendering for Google, and implement personalization
through stateful methods.
5. Asset caching: Google speeds up webpage rendering by caching
assets, which is useful for pages sharing resources and for
repeated renderings of the same page. Instead of using HTTP
Cache-Control headers, Google's Web Rendering Service employs its
own internal heuristics to determine when cached assets are still
fresh and when they need to be downloaded again.
Today, Google's indexing process looks something like this.
---------------------------------------------------------------------
With a better understanding of what Google is capable of, let's look
at some common myths and how they impact SEO.
Methodology
To investigate the following myths, we conducted a study using
Vercel's infrastructure and MERJ's Web Rendering Monitor (WRM)
technology. Our research focused on nextjs.org, with supplemental
data from monogram.io and basement.io, spanning from April 1 to April
30, 2024.
Data collection
We placed a custom Edge Middleware on these sites to intercept and
analyze requests from search engine bots. This middleware allowed us
to:
1. Identify and track requests from various search engines and AI
crawlers. (No user data was included in this query.)
2. Inject a lightweight JavaScript library in HTML responses for
bots.
The JavaScript library, triggered when a page finished rendering,
sent data back to a long-running server, including:
* The page URL.
* The unique request identifier (to match the page rendering
against regular server access logs).
* The timestamp of rendering completion (calculated from the time the
server received the request from the JavaScript library).
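As a rough illustration of the beacon described above (not MERJ's actual WRM code), the client-side piece might look something like this; the endpoint and the data attribute carrying the request identifier are hypothetical:

```ts
// beacon.ts: illustrative sketch only; endpoint and attribute names are hypothetical.
// Fires once the page has finished loading and reports the render back to the
// collection server, whose receive time becomes the render-completion timestamp.
function sendRenderBeacon(endpoint: string): void {
  window.addEventListener('load', () => {
    const payload = JSON.stringify({
      url: window.location.href,
      // Assumed to be written into the HTML by the middleware that served the page,
      // so the beacon can be matched against the server access-log entry.
      requestId: document.documentElement.dataset.requestId ?? null,
    });
    // sendBeacon keeps the request alive even if the renderer tears the page down right after load.
    navigator.sendBeacon(endpoint, payload);
  });
}

sendRenderBeacon('https://beacon.example.com/render');
```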
Data analysis
By comparing the initial request present in server access logs with
the data sent from our middleware to an external beacon server, we
could:
1. Confirm which pages were successfully rendered by search engines.
2. Calculate the time difference between the initial crawl and the
completed render.
3. Analyze patterns in how different types of content and URLs were
processed.
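Conceptually, that matching step is a join on the shared request identifier. A minimal sketch, with field names of our own choosing rather than the study's actual schema:

```ts
// Simplified sketch of the log-to-beacon matching step; all names are illustrative.
interface CrawlLogEntry {
  requestId: string;
  url: string;
  crawledAt: number; // ms since epoch, from server access logs
}

interface BeaconEntry {
  requestId: string;
  receivedAt: number; // ms since epoch, when the beacon server received the report
}

// Returns the rendering delay (in ms) for every crawl that produced a beacon.
function renderingDelays(logs: CrawlLogEntry[], beacons: BeaconEntry[]): Map<string, number> {
  const receivedByRequest = new Map<string, number>();
  for (const beacon of beacons) {
    receivedByRequest.set(beacon.requestId, beacon.receivedAt);
  }
  const delays = new Map<string, number>();
  for (const log of logs) {
    const receivedAt = receivedByRequest.get(log.requestId);
    if (receivedAt !== undefined) {
      delays.set(log.requestId, receivedAt - log.crawledAt);
    }
  }
  return delays;
}
```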
Data scope
For this article, we primarily focused on data from Googlebot, which
provided the largest and most reliable dataset. Our analysis included
over 37,000 rendered HTML pages matched with server-beacon pairs,
giving us a robust sample from which to draw conclusions.
We are still gathering data about other search engines, including AI
providers like OpenAI and Anthropic, and hope to talk more about our
findings in the future.
In the following sections, we'll dive into each myth, providing more
relevant methodology as necessary.
Myth 1: "Google can't render JavaScript content"
This myth has led many developers to avoid JS frameworks or resort to
complex workarounds for SEO.
The test
To test Google's ability to render JavaScript content, we focused on
three key aspects:
1. JS framework compatibility: We analyzed Googlebot's interactions
with Next.js using data from nextjs.org, which uses a mix of
static prerendering, server-side rendering, and client-side
rendering.
2. Dynamic content indexing: We examined pages on nextjs.org that
load content asynchronously via API calls. This allowed us to
determine if Googlebot could process and index content not
present in the initial HTML response.
3. Streamed content via React Server Components (RSCs): Similar to
the above, much of nextjs.org is built with the Next.js App
Router and RSCs. We could see how Googlebot processed and indexed
content incrementally streamed to the page.
4. Rendering success rate: We compared the number of Googlebot
requests in our server logs to the number of successful rendering
beacons received. This gave us insight into what percentage of
crawled pages were fully rendered.
Our findings
1. Out of over 100,000 Googlebot fetches analyzed on nextjs.org,
excluding status code errors and non-indexable pages, 100% of
HTML pages resulted in full-page renders, including pages with
complex JS interactions.
2. All content loaded asynchronously via API calls was successfully
indexed, demonstrating Googlebot's ability to process dynamically
loaded content.
3. Next.js, a React-based framework, was fully rendered by
Googlebot, confirming compatibility with modern JavaScript
frameworks.
4. Streamed content via RSCs was also fully rendered, confirming
that streaming does not adversely impact SEO.
5. Google attempts to render virtually all HTML pages it crawls, not
just a subset of JavaScript-heavy pages.
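To make the second finding concrete, the kind of client-side, asynchronously loaded content that Googlebot indexed looks roughly like the component below. This is an illustrative sketch, not code from nextjs.org, and the API endpoint is hypothetical:

```tsx
'use client';

// ClientLoadedDocs.tsx: hypothetical client component whose content is absent from
// the initial HTML and only appears after a client-side fetch resolves.
import { useEffect, useState } from 'react';

export default function ClientLoadedDocs() {
  const [titles, setTitles] = useState<string[] | null>(null);

  useEffect(() => {
    // Hypothetical endpoint; nothing in the server-rendered HTML contains this content.
    fetch('/api/docs-list')
      .then((res) => res.json())
      .then((data: { titles: string[] }) => setTitles(data.titles))
      .catch(() => setTitles([]));
  }, []);

  if (titles === null) return <p>Loading...</p>;

  return (
    <ul>
      {titles.map((title) => (
        <li key={title}>{title}</li>
      ))}
    </ul>
  );
}
```

Content rendered this way only becomes visible to Google after the page is rendered, which is why the full-page render rate in the first finding matters.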
Myth 2: "Google treats JavaScript pages differently"
A common misconception is that Google has a separate process or
criteria for JavaScript-heavy pages. Our research, combined with
official statements from Google, debunks this myth.
The test
To test whether Google treats JS-heavy pages differently, we took
several targeted approaches:
1. CSS @import test: We created a test page without JavaScript, but
with a CSS file that @imports a second CSS file (which would only
be downloaded and present in server logs upon rendering the first
CSS file). By comparing this behavior to JS-enabled pages, we
could verify whether Google's renderer processes CSS any differently
with and without JS enabled.
2. Status code and meta tag handling: We developed a Next.js
application with middleware to test various HTTP status codes
with Google. Our analysis focused on how Google processes pages
with different status codes (200, 304, 3xx, 4xx, 5xx) and those
with noindex meta tags. This helped us understand if
JavaScript-heavy pages are treated differently in these
scenarios.
3. JavaScript complexity analysis: We compared Google's rendering
behavior across pages with varying levels of JS complexity on
nextjs.org. This included pages with minimal JS, those with
moderate interactivity, and highly dynamic pages with extensive
client-side rendering. We also calculated and compared the time
between the initial crawl and the completed render to see if more
complex JS led to longer rendering queues or processing times.
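As an illustration of the status code and meta tag test (the second approach above), middleware along these lines could serve specific status codes on dedicated test paths. This is a minimal sketch with hypothetical paths, not the study's actual code:

```ts
// middleware.ts: illustrative sketch of serving specific status codes to crawlers
// on dedicated test paths. Paths and codes here are hypothetical examples.
import { NextRequest, NextResponse } from 'next/server';

const STATUS_BY_PATH: Record<string, number> = {
  '/seo-test/gone': 410,
  '/seo-test/server-error': 500,
  '/seo-test/redirect': 307,
};

export function middleware(request: NextRequest) {
  const status = STATUS_BY_PATH[request.nextUrl.pathname];
  if (status === undefined) {
    return NextResponse.next(); // untouched pages are served normally
  }
  if (status >= 300 && status < 400) {
    return NextResponse.redirect(new URL('/seo-test/target', request.url), status);
  }
  // Return the bare status code so we can observe whether Google renders the page.
  return new NextResponse(null, { status });
}
```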
Our findings
1. Our CSS @import test confirmed that Google successfully renders
pages with or without JS.
2. Google renders all 200 status HTML pages, regardless of JS
content. Pages with 304 status are rendered using the content of
the original 200 status page. Pages with other 3xx, 4xx, and 5xx
errors were not rendered.
3. Pages with noindex meta tags in the initial HTML response were
not rendered, regardless of JS content. Client-side removal of
noindex tags is not effective for SEO purposes; if a page
contains the noindex tag in the initial HTML response, it won't
be rendered, and the JavaScript that removes the tag won't be
executed.
4. We found no significant difference in Google's success rate in
rendering pages with varying levels of JS complexity. At
nextjs.org's scale, we also found no correlation between
JavaScript complexity and rendering delay. However, more complex
JS on a much larger site can impact crawl efficiency.
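The practical upshot of the third finding is that robots directives need to be in the server-rendered HTML. In a Next.js App Router project, for example, that can be done with the Metadata API; a minimal sketch (the route is hypothetical):

```tsx
// app/drafts/page.tsx: hypothetical route whose noindex directive is emitted
// server-side, so it is present in the initial HTML response where Google reads it.
import type { Metadata } from 'next';

export const metadata: Metadata = {
  robots: {
    index: false, // rendered as <meta name="robots" content="noindex, follow">
    follow: true,
  },
};

export default function DraftPage() {
  return <main>Draft content that should stay out of the index.</main>;
}
```

Removing the tag later with client-side JavaScript has no effect, since Google skips rendering pages that carry noindex in the initial response.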
Myth 3: "Rendering queue and timing significantly impact SEO
Many SEO practitioners believe that JavaScript-heavy pages face
significant delays in indexing due to a rendering queue. Our research
provides a clearer view of this process.
The test
To address the impact of rendering queue and timing on SEO, we
investigated:
1. Rendering delays: We examined the time difference between
Google's initial crawl of a page and its completion of rendering,
using data from over 37,000 matched server-beacon pairs on
nextjs.org.
2. URL types: We analyzed rendering times for URLs with and without
query strings, as well as for different sections of nextjs.org
(e.g., /docs, /learn, /showcase).
3. Frequency patterns: We looked at how often Google re-renders
pages and if there were patterns in rendering frequency for
different types of content.
Our findings
The rendering delay distribution was as follows:
* 50th percentile (median): 10 seconds
* 75th percentile: 26 seconds
* 90th percentile: ~3 hours
* 95th percentile: ~6 hours
* 99th percentile: ~18 hours
The exact rendering delay distribution we found across over 37,000
matched server-beacon pairs.
Surprisingly, 25% of pages were rendered within 4 seconds of the
initial crawl, challenging the notion of a long "queue."
While some pages faced significant delays (up to ~18 hours at the
99th percentile), these were the exception and not the rule.
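For reference, percentile figures like these come from a simple nearest-rank calculation over the observed delays; a minimal sketch with made-up example values, not our measured data:

```ts
// percentile.ts: nearest-rank percentile over a list of rendering delays (seconds).
// The sample delays below are invented for illustration.
function percentile(values: number[], p: number): number {
  if (values.length === 0) throw new Error('no values');
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length) - 1;
  return sorted[Math.max(0, rank)];
}

const sampleDelays = [2, 4, 7, 10, 26, 90, 10_800, 21_600, 64_800];
console.log(percentile(sampleDelays, 50)); // median delay
console.log(percentile(sampleDelays, 90)); // long-tail delay
```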
---------------------------------------------------------------------
We also observed interesting patterns related to how quickly Google
renders URLs with query strings (?param=xyz):
URL Type                     50th Percentile   75th Percentile   90th Percentile
All URLs                     10 seconds        26 seconds        ~3 hours
URLs without Query String    10 seconds        22 seconds        ~2.5 hours
URLs with Query String       13 seconds        31 minutes        ~8.5 hours
This data suggests that Google treats URLs differently if they have
query strings that don't affect the content. For example, on
nextjs.org, pages with ?ref= parameters experienced longer rendering
delays, especially at higher percentiles.
Additionally, we noticed that frequently updated sections like /docs
had shorter median rendering times compared to more static sections.
For example, the /showcase page, despite being frequently linked,
showed longer rendering times, suggesting that Google may slow down
re-rendering for pages that don't change significantly.
Myth 4: "JavaScript-heavy sites have slower page discovery"
A persistent belief in the SEO community is that JavaScript-heavy
sites, especially those relying on client-side rendering (CSR) like
Single Page Applications (SPAs), suffer from slower page discovery by
Google. Our research provides new insights here.
The test
To investigate the impact of JavaScript on page discovery, we:
1. Analyzed link discovery in different rendering scenarios: We
compared how quickly Google discovered and crawled links in
server-rendered, statically generated, and client-side rendered
pages on nextjs.org.
2. Tested non-rendered JavaScript payloads: We added a JSON object
similar to a React Server Component (RSC) payload to the /showcase
page of nextjs.org, containing links to new, previously
undiscovered pages. This allowed us to test if Google could
discover links in JavaScript data that wasn't rendered.
3. Compared discovery times: We monitored how quickly Google
discovered and crawled new pages linked in different ways:
standard HTML links, links in client-side rendered content, and
links in non-rendered JavaScript payloads.
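To illustrate the second test, one way to ship such a non-rendered payload is as inert JSON embedded in the page. This sketch only conveys the idea; the URLs are hypothetical stand-ins, and this is not the exact payload placed on /showcase:

```tsx
// DiscoveryTestPayload.tsx: hypothetical component embedding an RSC-like JSON blob.
// The links never appear in the rendered UI, so any crawl of them has to come from
// Google parsing the raw payload string.
export default function DiscoveryTestPayload() {
  const payload = {
    kind: 'rsc-like-test-payload',
    links: [
      'https://nextjs.org/test-discovery/page-a', // hypothetical, previously unlinked URL
      'https://nextjs.org/test-discovery/page-b', // hypothetical, previously unlinked URL
    ],
  };

  return (
    <script
      type="application/json"
      id="discovery-test-payload"
      dangerouslySetInnerHTML={{ __html: JSON.stringify(payload) }}
    />
  );
}
```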
Our findings
1. Google successfully discovered and crawled links in fully
rendered pages, regardless of rendering method.
2. Google can discover links in non-rendered JavaScript payloads on
the page, such as those in React Server Components or similar
structures.
3. In both initial and rendered HTML, Google processes content by
identifying strings that look like URLs, using the current host
and port as a base for relative URLs. (Google did not discover an
encoded URL, i.e., https%3A%2F%2Fwebsite.com, in our RSC-like
payload, suggesting its link parsing is very strict.)
4. The source and format of a link (e.g., in an <a> tag or embedded
in a JSON payload) did not impact how Google prioritized its
crawl. Crawl priority remained consistent regardless of whether a
URL was found in the initial crawl or post-rendering.
5. While Google successfully discovers links in CSR pages, these
pages do need to be rendered first. Server-rendered pages or
partially pre-rendered pages have a slight advantage in immediate
link discovery.
6. Google differentiates between link discovery and link value
assessment. The evaluation of a link's value for site
architecture and crawl prioritization occurs after full-page
rendering.
7. Having an updated sitemap.xml significantly reduces, if not
eliminates, the time-to-discovery differences between different
rendering patterns.
Overall implications and recommendations
Our research has debunked several common myths about Google's
handling of JavaScript-heavy websites. Here are the key takeaways and
actionable recommendations:
Implications
1. JavaScript compatibility: Google can effectively render and index
JavaScript content, including complex SPAs, dynamically loaded
content, and streamed content.
2. Rendering parity: There's no fundamental difference in how Google
processes JavaScript-heavy pages compared to static HTML pages.
All pages are rendered.
3. Rendering queue reality: While a rendering queue exists, its
impact is less significant than previously thought. Most pages
are rendered within minutes, not days or weeks.
4. Page discovery: JavaScript-heavy sites, including SPAs, are not
inherently disadvantaged in page discovery by Google.
5. Content timing: When certain elements (like noindex tags) are
added to the page matters, as Google may not process
client-side changes.
6. Link value assessment: Google differentiates between link
discovery and link value assessment. The latter occurs after
full-page rendering.
7. Rendering prioritization: Google's rendering process isn't
strictly first-in-first-out. Factors like content freshness and
update frequency influence prioritization more than JavaScript
complexity.
8. Rendering performance and crawl budget: While Google can
effectively render JS-heavy pages, the process is more
resource-intensive compared to static HTML, both for you and
Google. For large sites (10,000+ unique and frequently changing
pages), this can impact the site's crawl budget. Optimizing
application performance and minimizing unnecessary JS can help
speed up the rendering process, improve crawl efficiency, and
potentially allow more of your pages to be crawled, rendered, and
indexed.
Recommendations
1. Embrace JavaScript: Leverage JavaScript frameworks freely for
enhanced user and developer experiences, but prioritize
performance and adhere to Google's best practices for
lazy-loading.
2. Error handling: Implement error boundaries in React applications
to prevent total render failures due to individual component
errors (see the sketch after this list).
3. Critical SEO elements: Use server-side rendering or static
generation for critical SEO tags and important content to ensure
they're present in the initial HTML response.
4. Resource management: Ensure critical resources for rendering
(APIs, JavaScript files, CSS files) are not blocked by
robots.txt.
5. Content updates: For content that needs to be quickly re-indexed,
ensure changes are reflected in the server-rendered HTML, not
just in client-side JavaScript. Consider strategies like
Incremental Static Regeneration to balance content freshness with
SEO and performance.
6. Internal linking and URL structure: Create a clear, logical
internal linking structure. Implement important navigational
links as real HTML anchor tags (<a href="...">) rather than
JavaScript-based navigation. This approach aids both user
navigation and search engine crawling efficiency while
potentially reducing rendering delays.
7. Sitemaps: Use and regularly update sitemaps. For large sites or
those with frequent updates, use the <lastmod> tag in XML
sitemaps to guide Google's crawling and indexing processes.
Remember to update the <lastmod> value only when a significant
content update occurs.
8. Monitoring: Use Google Search Console's URL Inspection Tool or
Rich Results Tool to verify how Googlebot sees your pages.
Monitor crawl stats to ensure your chosen rendering strategy
isn't causing unexpected issues.
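Recommendation 2 deserves a concrete shape. A minimal error boundary sketch (component and file names are our own, not a prescribed implementation):

```tsx
'use client'; // required if this component is used inside the Next.js App Router

// section-error-boundary.tsx: a small error boundary so one broken component
// cannot blank the whole page, keeping the rest of the rendered content
// available to users and to Googlebot.
import { Component, type ReactNode } from 'react';

interface Props {
  fallback: ReactNode;
  children: ReactNode;
}

interface State {
  hasError: boolean;
}

export class SectionErrorBoundary extends Component<Props, State> {
  state: State = { hasError: false };

  static getDerivedStateFromError(): State {
    return { hasError: true };
  }

  componentDidCatch(error: Error): void {
    // Report the failure; the surrounding page keeps rendering.
    console.error('Section failed to render:', error);
  }

  render(): ReactNode {
    return this.state.hasError ? this.props.fallback : this.props.children;
  }
}

// Usage: wrap non-critical, error-prone sections so SEO-critical content elsewhere
// on the page still renders if they throw.
// <SectionErrorBoundary fallback={<p>Something went wrong.</p>}>
//   <ThirdPartyWidget />
// </SectionErrorBoundary>
```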
Moving forward with new information
As we've explored, there are some differences between rendering
strategies when it comes to Google's abilities:
The comparison covers Static Site Generation (SSG), Incremental Static
Regeneration (ISR), Server-Side Rendering (SSR), and Client-Side
Rendering (CSR):
* Crawl efficiency (how quickly and effectively Google can access,
render, and retrieve webpages): SSG: Excellent. ISR: Excellent.
SSR: Very Good. CSR: Poor.
* Discovery (the process of finding new URLs to crawl*): SSG:
Excellent. ISR: Excellent. SSR: Excellent. CSR: Average.
* Rendering completeness (errors, failures, etc.; how accurately and
completely Google can load and process your web pages without
errors): SSG: Robust. ISR: Robust. SSR: Robust. CSR: Might fail.**
* Rendering time (how long Google takes to fully render and process
web pages): SSG: Excellent. ISR: Excellent. SSR: Excellent. CSR: Poor.
* Link structure evaluation (how Google assesses links to understand
the website architecture and pages' importance): SSG: After
rendering. ISR: After rendering. SSR: After rendering. CSR: After
rendering; links might be missing if rendering fails.
* Indexing (the process by which Google stores and organizes your
site's content): SSG: Robust. ISR: Robust. SSR: Robust. CSR: Might
not be indexed if rendering fails.
* Having an updated sitemap.xml significantly reduces, if not
eliminates, the time-to-discovery differences between different
rendering patterns.
** Rendering in Google usually doesn't fail, as proven in our
research; when it does, it's often due to blocked resources in
robots.txt or specific edge cases.
---------------------------------------------------------------------
These fine-grained differences exist, but Google will quickly
discover and index your site regardless of rendering strategy. Focus
on creating performant web applications that benefit users more than
worrying about special accommodations for Google's rendering process.
After all, page speed is still a ranking factor, since Google's page
experience ranking system evaluates the performance of your site
based on Google's Core Web Vitals metrics.
Plus, page speed is linked to good user experience, with every 100ms
of load time saved correlating with an 8% uptick in website conversion.
Fewer users bouncing off your page means Google treats it as more
relevant. Performance compounds; milliseconds matter.
Further resources
To learn more about these topics, we recommend:
* How Core Web Vitals affect your SEO: Provides a comprehensive
overview of how Core Web Vitals (CWVs) affect SEO, explaining
Google's page experience ranking system and the difference
between field data (used for ranking) and lab data (Lighthouse
scores).
* How to choose the right rendering strategy: Guides developers in
choosing optimal rendering strategies for web applications,
explaining various methods like SSG, ISR, SSR, and CSR, their use
cases, and implementation considerations using Next.js.
* The user experience of the Frontend Cloud: Explains how Vercel's
Frontend Cloud enables fast, personalized web experiences by
combining advanced caching strategies, Edge Network capabilities,
and flexible rendering options to optimize both user experience
and developer productivity.
About MERJ
MERJ is a leading SEO and data engineering consultancy specializing
in technical SEO and performance optimization for complex web
applications.
With a track record of success across various industries, MERJ brings
cutting-edge expertise to help businesses navigate the ever-evolving
landscape of search engine optimization.
If you need assistance with any of the SEO topics raised in this
research, or if you're looking to optimize your web application for
better search visibility and performance, don't hesitate to contact
MERJ.