Can Google Properly Crawl and Index JavaScript Frameworks? A JS SEO Experiment
Bartosz Góralewicz • Published: 14 May 2017 • Edited: 07 Oct 2022
We wanted to know how much JavaScript Googlebot could read, crawl, and index. To achieve that, we built a website – https://jsseo.expert/. Each subpage had content generated by a different JavaScript framework. We tracked server logs, crawling, and indexing to find out which frameworks are fully crawlable and indexable by Google.
JavaScript SEO Experiment Findings:
Inline vs. External vs. Bundled JavaScript makes a huge difference for Googlebot.
Seeing content in Google Cache doesn’t mean it is indexed by Google.
If you want to know which frameworks work well with SEO, but don’t want to go through the experiment’s documentation, click here to scroll straight to the results section and see the charts presenting the data.
Why I Created This Experiment
In recent years, developers have been using JavaScript-rich technology, believing Google can crawl and index JavaScript properly. In most cases, developers point to this Google announcement as proof that Google’s technical guidelines allow JavaScript-rich websites.
Yet, there are multiple examples online of such decisions going badly. One of the most popular examples of JavaScript SEO gone bad is Hulu.com’s case study.
Even though there are tons of data and case studies clearly showing Google’s problems with JavaScript crawling and indexation, more and more websites are being launched with client-side JavaScript rendering (meaning that Googlebot or your browser needs to process JavaScript to see the content).
I believe Google’s announcement was widely misunderstood. Let me explain why.
Most developers reference this section of Google’s blog post :
Times have changed. Today, as long as you’re not blocking Googlebot from crawling your JavaScript or CSS files, we are generally able to render and understand your web pages like modern browsers. To reflect this improvement, we recently updated our technical Webmaster Guidelines to recommend against disallowing Googlebot from crawling your site’s CSS or JS files.
In the same article, you will find a few more statements that are quite interesting, yet overlooked:
Sometimes things don’t go perfectly during rendering, which may negatively impact search results for your site.
It’s always a good idea to have your site degrade gracefully. This will help users enjoy your content even if their browser doesn’t have compatible JavaScript implementations. It will also help visitors with JavaScript disabled or off, as well as search engines that can’t execute JavaScript yet.
Sometimes the JavaScript may be too complex or arcane for us to execute, in which case we can’t render the page fully and accurately.
Unfortunately, even some well-respected websites in the JavaScript development community seem to be overly optimistic about Google’s ability to crawl and index JavaScript frameworks.
Source: https://scotch.io/tutorials/angularjs-seo-with-prerender-io
The best web developers are well aware of JavaScript indexing issues, and if you want to see it first-hand, watch just a few minutes from the video below:
Jeff Whelpley
Angular U conference, June 22-25, 2015, Hyatt Regency, San Francisco Airport
“Angular 2 Server Rendering”
If you search for any competitive keyword terms, it’s always gonna be server rendered sites. And the reason is because, although Google does index client rendered HTML, it’s not perfect yet and other search engines don’t do it as well. So if you care about SEO, you still need to have server-rendered content.
Jeff Whelpley was working with Tobias Bosch on server rendering for Angular 2. Tobias Bosch is a software engineer at Google who is part of the Angular core team and works on Angular 2.
Unfortunately, I didn’t find any case studies, documentation, or clear data about how Google crawls and indexes different JavaScript frameworks. JavaScript SEO is definitely a topic that will soon become very popular, but there is no single article explaining to JavaScript SEO beginners how to start diagnosing and fixing even basic JavaScript SEO problems.
[UPDATE: Google acknowledged that they use Chrome 41 for rendering. This has since made the debugging process a lot easier and faster.]
This experiment is the first step in providing clear, actionable data on how to work with websites based on the JavaScript framework they use.
Now that we have discussed the why of this test, let’s look at how we set it up.
Setting Up the Website
The first step was to set up a simple website where each subpage is generated by a different framework. As I am not a JavaScript developer, I reached out to a good friend of mine and the smartest JavaScript guy I know – Kamil Grymuza. With around 15 years of experience in JavaScript development, Kamil quickly set up a website for our experiment:
The core of the website was coded 100% in HTML to make sure it was fully crawlable and indexable. It gets interesting when you open one of the subpages:
The structure of the subpages was dead simple. The whole page was plain HTML with a single red frame for JavaScript-generated content. With JavaScript disabled, the red frame was empty.
JavaScript Enabled:
JavaScript Disabled:
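To make the setup concrete, here is a minimal sketch of what such a subpage could look like. This is an illustration only, not the exact code used in the experiment; the file structure, element id, and text are assumed for the example. The page itself is static HTML, and only the content inside the red frame is injected by client-side JavaScript.

    <!-- Hypothetical subpage skeleton: everything outside the framed div is plain HTML -->
    <!DOCTYPE html>
    <html>
      <head>
        <title>Framework test page</title>
      </head>
      <body>
        <h1>Framework name</h1>
        <!-- The red frame: stays empty when JavaScript is disabled -->
        <div id="js-content" style="border: 2px solid red;"></div>
        <script>
          // Client-side rendering: this text only appears after the script executes
          document.getElementById('js-content').textContent =
            'Content generated by JavaScript.';
        </script>
      </body>
    </html>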
At this point, our experiment was more or less ready to go. All we needed now was content.
Content
Our “Hello World” pages got indexed a few hours after we launched the website. To make sure there was some unique content we could “feed” Googlebot, I decided to hire artificial intelligence to write the articles for us. To do that, we used Articoloo, which generates amazing content written by AI.
I decided the theme of our articles would be based on popular tourist destinations.
This is how the page looks after adding the content. Everything you see in the red frame is generated by a JavaScript framework (in the case of the screenshot below, by Vue.js).
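As an illustration, a client-rendered Vue setup along these lines would produce that effect. This is only a sketch under the assumption that the full build of Vue 2 is loaded via a script tag; the actual experiment pages used each framework’s own quickstart code.

    // Hypothetical Vue 2 snippet that fills the red frame with the article text
    new Vue({
      el: '#js-content', // the empty div inside the red frame
      data: {
        article: 'AI-written article about a popular tourist destination...'
      },
      template: '<p>{{ article }}</p>'
    });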
Having indexed content is only half the battle, though. A website’s architecture can only work properly if Googlebot can follow the internal and external links.
JavaScript Links
Links were always a problem with client-rendered JavaScript. You never knew if Google was going to follow the JS links or not. In fact, some SEOs still use JavaScript to “hide links”. I was never a fan of this method; however, does it even make sense from a technical point of view? Let’s find out!
We found a very simple method to check whether Google was following the JavaScript-generated links of a specific JS framework. We added a link into each framework’s JavaScript-generated content, creating a kind of honeypot for Googlebot. Each link was pointing to http://jsseo.expert/*framework*/test/.
Let me show you an example:
To make it even easier to track, the links pointed to the *framework*/test/ URLs.
The link generated by the Angular 2 page ( https://jsseo.expert/angular2/ ) would point to https://jsseo.expert/angular2/t e s t/ (spaces added to avoid messing up the experiment with a live link!). This made it really easy to track how Googlebot crawls /test/ URLs. The links weren’t accessible to Googlebot in any other form (external links, sitemaps, GSC fetch etc.).
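A JavaScript-injected honeypot link of this kind can be as simple as the sketch below. The element id and anchor text are assumptions for the example; the point is that the link only exists if the script actually runs.

    // Hypothetical honeypot link, added to the page only by client-side JavaScript
    var link = document.createElement('a');
    link.href = '/angular2/test/'; // discoverable only if Googlebot executes this script
    link.textContent = 'test';
    document.getElementById('js-content').appendChild(link);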
Tracking
To track whether Googlebot visited those URLs, we monitored the server logs in Loggly.com. This way, I would have a live preview of what was being crawled by Googlebot, while my log data history would be safely stored on the server.
Next, I created an alert to be notified about visits to any */test/ URL from any known Google IP addresses.
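Outside of Loggly, the same check can be run against raw access logs with a few lines of Node.js. The sketch below assumes a standard combined log file called access.log; it simply prints every Googlebot request that hit a /test/ URL so each hit can then be verified (for example with a reverse DNS lookup of the IP).

    // Sketch: scan an access log for Googlebot hits on any /test/ URL
    const fs = require('fs');

    const lines = fs.readFileSync('access.log', 'utf8').split('\n');
    for (const line of lines) {
      if (line.includes('Googlebot') && /\/[a-z0-9-]+\/test\//.test(line)) {
        console.log(line); // candidate hit – confirm the IP really belongs to Google
      }
    }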
Methodology
The methodology for the experiment was dead simple. To make sure we measured everything precisely and to avoid false positives or negatives:
We had a plain HTML page as a reference to make sure Googlebot could fully access our website, content, etc.
We tracked server logs. Tools – Loggly for a live preview + full server logs stored on the server (Loggly has limited log retention time).
We carefully tracked the website’s uptime to make sure it was accessible for Googlebot. Tools – NewRelic, Onpage.org, Statuscake.
We made sure all resources (CSS, JS) were fully accessible for Googlebot.
All http://jsseo.expert/*FRAMEWORK-NAME*/test/ URLs were set to noindex, follow (see the meta tag sketch after this list), and we carefully tracked whether Googlebot visited any of the /test/ pages via custom alerts set up in Loggly.com.
We kept this experiment secret while gathering the data (to prevent someone from sharing the test URL on social media or fetching it as Googlebot to mess with our results). Of course, we couldn’t control crawlers, scrapers, and organic traffic hitting the website after it got indexed in Google.
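For reference, the noindex, follow directive mentioned in the list above is a single meta tag in the head of each /test/ page (it could equally be sent as an X-Robots-Tag HTTP header). Shown here as a minimal sketch:

    <!-- Keeps the /test/ pages out of the index while still allowing their links to be followed -->
    <meta name="robots" content="noindex, follow">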
EDIT 5/25/2017
After getting feedback on this experiment from John Mueller and seeing different results across different browsers/devices, we won’t be continuing to look at cache data while proceeding with this experiment. It doesn’t reflect Googlebot’s crawling or indexing abilities.
JavaScript Crawling and Indexing Experiment – Results
After collecting all the data, we created a simple methodology to analyze all the findings pouring in.
There were six key checks we used for each JavaScript framework.
Experiment Checklist
Fetch and render via Google Search Console – does it render properly?
Is the URL indexed by Google?
Is the URL’s content visible in Google’s cache?
Are the links displayed properly in Google’s cache?
Search for unique content from the framework’s page.
Check if the “*framework*/test/” URL was crawled.
Let’s go through this checklist by looking at the Angular 2 framework. If you want to follow the same steps, check out the framework’s URL here.
1. Fetch and render via Google Search Console – does it render properly?
As we can see, Google Search Console couldn’t render the content within the red frame (the JavaScript-generated content), so the result of this test is obviously: FAIL.
[UPDATE 09/28/2017: It turned out that because of errors in the Angular.io Quickstart that we used in our experiment, Google was not able to render this page.
In the Angular.io Quickstart, there were examples of code written in ES6 syntax: “let resolvedURL = url”. Google’s Web Rendering Service doesn’t support ES6, so it was not able to render the code. It was not only Google, as Internet Explorer couldn’t execute this syntax either.]
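To illustrate the difference, here is the construct in question next to an ES5 equivalent that an older rendering engine such as Chrome 41 could execute. This is a simplified sketch with an illustrative value for url; in practice the quickstart code would normally be transpiled (for example with Babel or the TypeScript compiler) rather than rewritten by hand.

    var url = window.location.href; // illustrative value for the example
    // ES6 version flagged above: let resolvedURL = url;
    // ES5 equivalent that older engines such as Chrome 41 can execute:
    var resolvedURL = url;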
