one year in the life of a large website with botify
Twitter.com/botify
One year in the SEO life of a large website
1Q Initial audit: assessing the situation
2Q Preparing for a migration and monitoring results
3Q Raising the flag immediately after a regression
4Q Making sure Google has recovered properly after a crawl issue
Measuring progress after one year
1Q Initial audit: assessing the situation
April 2014: Audit
Explore the full website (Botify crawler)
Map to URLs receiving organic visits
(web server log files - 30 days)
Map to URLs crawled by Googlebot
(web server log files - 30 days)
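The mapping above relies on reading web server logs. As a minimal sketch, assuming logs in the common "combined" format, the two sets of URLs could be extracted as follows. The function name `classify_hits` and the sample lines are hypothetical, and matching on the user-agent string alone is a simplification, since a user agent can be spoofed.

```python
import re
from urllib.parse import urlsplit

# Assumed "combined" log format: request line, status, size, referrer, user agent.
LOG_RE = re.compile(
    r'"(?:GET|HEAD) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+ '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)

def classify_hits(log_lines):
    """Return (paths crawled by Googlebot, paths visited from a Google referrer)."""
    crawled, visited = set(), set()
    for line in log_lines:
        m = LOG_RE.search(line)
        if not m:
            continue  # skip lines that do not match the assumed format
        if "Googlebot" in m["agent"]:
            crawled.add(m["path"])
        elif "google." in urlsplit(m["referrer"]).netloc:
            visited.add(m["path"])
    return crawled, visited

# Hypothetical sample lines, for illustration only.
sample = [
    '1.2.3.4 - - [01/Apr/2014:10:00:00 +0000] "GET /videos/1 HTTP/1.1" 200 512 '
    '"-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '5.6.7.8 - - [01/Apr/2014:10:05:00 +0000] "GET /videos/2 HTTP/1.1" 200 512 '
    '"https://www.google.com/" "Mozilla/5.0"',
]
crawled, visited = classify_hits(sample)
```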
Audit: A very large amount of pages, few of which are explored by Google
Distribution of pages found by the Botify crawler, by subdomain
Audit: Most pages are user-generated content; actual content represents no more than 17%
Forum
Videos
www
News
Audit: Most Top Domains have low or very low SEO efficiency
forum videos www news
Google crawl rate
Active page rate (active: with visits from Google)
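As an illustration of the two indicators, here is a small sketch that assumes both rates are measured against the pages found in the site structure by the crawler. That denominator is an assumption, and the helper name `rate` and the sample URLs are hypothetical.

```python
def rate(subset, known_pages):
    """Share of known_pages that also appear in subset, as a percentage."""
    if not known_pages:
        return 0.0
    return 100.0 * len(known_pages & subset) / len(known_pages)

# Hypothetical data for one subdomain.
site_pages = {"/a", "/b", "/c", "/d"}    # found by the site crawler
crawled_by_google = {"/a", "/b", "/x"}   # from logs; /x is not in the structure
active = {"/a"}                          # received at least one visit from Google

crawl_rate = rate(crawled_by_google, site_pages)  # 2 of 4 pages -> 50.0
active_rate = rate(active, site_pages)            # 1 of 4 pages -> 25.0
```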
Audit: Zooming in on a part of the website
The example of video content
Audit: Zooming in on video content
Audit: Video content
Too much navigation generates volume and depth
● Too many navigation pages - many more than videos!
● Video duplicates due to navigation
● The good news: videos are not deep
videos
videos-dup-nav
categories
sub-categories
tag-navigation
nav-dup-display
Audit: Video content
Google has a partial view; only videos generate visits
videos videos-dup tag-navigation nav-duplicates
Distribution by depth of content pages discovered by the Botify crawler:
Audit: Videos
Not so deep, low volume
So why are videos not more thoroughly crawled?
Audit: Videos
Some videos neglected by internal linking
Average number of incoming links per video
2 links
between 20 and 300 links
more than 7000 links
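Incoming links per page can be counted from the list of links found during a crawl. The sketch below assumes a hypothetical export of (source, target) pairs and counts each linking page once per target. It illustrates the idea and does not describe how the Botify crawler computes this figure.

```python
from collections import Counter

def inlink_counts(links):
    """Count the distinct source pages linking to each target page."""
    # Deduplicate repeated links and ignore self-links.
    distinct = {(src, dst) for src, dst in links if src != dst}
    return Counter(dst for _, dst in distinct)

# Hypothetical crawl export: (source page, target page).
links = [("/cat", "/v1"), ("/cat", "/v2"), ("/tag", "/v1"), ("/cat", "/v1")]
counts = inlink_counts(links)
```

Pages with very low counts are candidates for better internal linking.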
In addition to audit reports: daily monitoring of server logs
Keeping an eye on Google's crawl
● By type of page
● By HTTP status code
● New pages discovery
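A daily breakdown like this can be sketched as a count of distinct URLs per day and per group, where the group is a page type, an HTTP status code, or a subdomain. The function name `daily_distinct` and the sample hits are hypothetical.

```python
from collections import defaultdict

def daily_distinct(hits):
    """hits: iterable of (day, group, url).
    Returns {day: {group: number of distinct urls crawled}}."""
    seen = defaultdict(lambda: defaultdict(set))
    for day, group, url in hits:
        seen[day][group].add(url)
    return {
        day: {group: len(urls) for group, urls in groups.items()}
        for day, groups in seen.items()
    }

# Hypothetical Googlebot hits, grouped by subdomain.
hits = [
    ("2014-05-01", "forum", "/t/1"),
    ("2014-05-01", "forum", "/t/1"),  # repeated crawl of the same page
    ("2014-05-01", "forum", "/t/2"),
    ("2014-05-01", "news", "/n/1"),
    ("2014-05-02", "forum", "/t/3"),
]
report = daily_distinct(hits)
```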
May 2014: Google crawl soars on the forum subdomain
Number of distinct pages crawled every day by Google, with distribution by subdomain:
→ decision to disallow robots' crawl on the forum subdomain.
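Since robots.txt applies per host, a disallow on one subdomain is set in that subdomain's own robots.txt file. As a hedged sketch, the rule can be checked with Python's standard `urllib.robotparser` before deployment. The hostname `forum.example.com` and the file contents below are assumptions, not the site's actual configuration.

```python
from urllib.robotparser import RobotFileParser

# Assumed robots.txt for the forum subdomain: block all robots everywhere.
robots_txt = """\
User-agent: *
Disallow: /
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# Hypothetical URL on the forum subdomain.
blocked = not parser.can_fetch("Googlebot", "http://forum.example.com/thread/123")
```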
2Q Preparing for a migration and monitoring results
September 2014: Getting ready for a migration
Anticipate
● Crawl development version of website → validate accessibility to robots
● Compare to current website structure → impact on volume of pages, on depth for core content, etc.
Prepare
● List priority pages for redirects → active pages, most crawled…
● Export existing redirects → consolidate with new redirects
● Test new redirects → use an extract from Google's real crawl
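The redirect test can be sketched offline, before any live request is made: take old URLs from an extract of Google's crawl, look each one up in the consolidated redirect map, and flag missing rules, chains, and targets that do not exist on the new site. All names and sample data here are hypothetical.

```python
def check_redirects(old_urls, redirect_map, new_site_urls):
    """Classify each old URL as 'ok', 'missing', 'chain', or 'broken'."""
    report = {}
    for url in old_urls:
        target = redirect_map.get(url)
        if target is None:
            report[url] = "missing"   # no redirect rule for this URL
        elif target in redirect_map:
            report[url] = "chain"     # target is itself redirected
        elif target not in new_site_urls:
            report[url] = "broken"    # target does not exist on the new site
        else:
            report[url] = "ok"
    return report

# Hypothetical inputs: consolidated redirects and pages found on the new site.
redirect_map = {"/old/a": "/new/a", "/old/b": "/old/a", "/old/c": "/gone"}
new_site_urls = {"/new/a"}
old_urls = ["/old/a", "/old/b", "/old/c", "/old/d"]  # extract from Google's crawl
report = check_redirects(old_urls, redirect_map, new_site_urls)
```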
Migration: monitoring Google's crawl when the new site is live
Monitoring redirects and errors
Old website
New website
Migration: monitoring Google's crawl when the new site is live
Monitoring redirects and errors for HTTP only
The migration included HTTP to HTTPS
New website
3Q Raising the flag immediately after a regression
November 2014: Google's crawl soars on the news subdomain
Google crawl on news subdomain: total daily crawl volume and number of distinct crawled pages:
This surge is generated by new pages.
Google crawl on news subdomain: new pages crawled in blue (never crawled before)
November 2014: Google's crawl on news
New or existing pages?
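Separating new pages from existing ones comes down to keeping a running set of every URL Google has already crawled. A minimal sketch, assuming the crawl history is available as one set of URLs per day in chronological order (the function name and data are hypothetical):

```python
def new_pages_by_day(daily_crawl):
    """daily_crawl: list of (day, set of urls crawled), in chronological order.
    Returns {day: urls never crawled on any earlier day}."""
    seen = set()
    result = {}
    for day, urls in daily_crawl:
        result[day] = urls - seen  # pages crawled for the first time
        seen |= urls
    return result

# Hypothetical crawl history.
daily_crawl = [
    ("d1", {"/a", "/b"}),
    ("d2", {"/b", "/c"}),
    ("d3", {"/a", "/c"}),
]
new_pages = new_pages_by_day(daily_crawl)
```

A sudden rise in the size of the daily "new" set is the kind of signal that points to a surge like this one.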
These new pages are all redirects and 404 errors.
Google crawl on news subdomain by HTTP status code:
November 2014: Google's crawl on news
Delivering content or not?
HTTP 301
HTTP 404
These new pages with redirects and 404 errors are navigation pages.
Google crawl on news subdomain by HTTP status code and type of page:
November 2014: Google's crawl on news
What types of pages?
→ regression is identified and corrected
4Q Making sure Google has recovered properly after a crawl issue
Google suddenly "loses" pages in February. We can see that they are recovered immediately afterwards: the problem was temporary.
Lost pages: returned HTTP 200 (OK) before, and now return another HTTP status code (error or redirect)
Recovered pages: returned an error or redirect before, and now return HTTP 200 (OK)
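These two definitions translate directly into a comparison of status codes between two periods. The sketch below assumes one status code per URL and per period, as seen by Googlebot in the logs. The function name and the sample data are hypothetical.

```python
def lost_and_recovered(previous, current):
    """previous, current: {url: HTTP status code returned to Googlebot}.
    Returns (lost, recovered) sets for URLs crawled in both periods."""
    lost, recovered = set(), set()
    for url in previous.keys() & current.keys():
        was_ok = previous[url] == 200
        is_ok = current[url] == 200
        if was_ok and not is_ok:
            lost.add(url)        # 200 before, error or redirect now
        elif is_ok and not was_ok:
            recovered.add(url)   # error or redirect before, 200 now
    return lost, recovered

# Hypothetical status codes for two consecutive periods.
previous = {"/p1": 200, "/p2": 200, "/p3": 404, "/p4": 301}
current = {"/p1": 200, "/p2": 500, "/p3": 200, "/p4": 404}
lost, recovered = lost_and_recovered(previous, current)
```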
February 2015: Google is "losing" pages
Lost
Recovered
Content pages
Category pages
February 2015: Were these lost pages important ones?
Measuring results after one year and a number of SEO optimizations
A leaner website with a higher crawl rate
March 2014: 10%
March 2015: 74%
More active pages, fewer orphan pages
March 2014: 2%
March 2015: 27%
Less depth, faster pages, more links to each page
March 2014
March 2015
Twice as many organic visits!
March 2014: 200K
March 2015: 400K