Diagnosing Technical Issues with Search Engine Optimization
Post on 14-Sep-2014
DESCRIPTION
If your site is having trouble ranking well in search engines such as Google, you've lost ranking, or you're having trouble with a site move or migration, the trouble could be with the site's technical architecture. View checklists to help diagnose issues with crawling, indexing, and ranking your site's content.
TRANSCRIPT
Tools and Tactics for Diagnosing Technical Search Issues
Vanessa Fox
Diagnostic Checklists and Resources
• Search Accessibility Checklist
• Search Discoverability Checklist
• Diagnostic Tools
janeandrobot.com
Search Engine Tools
Created by NineByBlue.com
• Google Webmaster Central: http://www.google.com/webmasters
• Microsoft Live Search Webmaster Center: http://webmaster.live.com
• Yahoo! Site Explorer: http://siteexplorer.search.yahoo.com
• Google Analytics: http://www.google.com/analytics
• Google Search: http://www.google.com
Ranking and Diagnostic Tools
• SEOBook Rank Checker: http://tools.seobook.com/firefox/rank-checker/
• Firefox Web Developer Toolbar: https://addons.mozilla.org/en-US/firefox/addon/60
• Firefox Firebug: http://getfirebug.com/
• Firefox Live HTTP Headers: https://addons.mozilla.org/en-US/firefox/addon/3829
• Microsoft adCenter Labs Keyword Forecast: http://adlab.msn.com/Keyword-Forecast/default.aspx
http://janeandrobot.com/resources
How Search Engines Work
Crawling
• Discover links
• Check robots rules
• Bandwidth considerations
• URLs
Indexing
• Canonicalization
• Context extraction
• Topic association
• Web-wide value
Ranking
• Relevance
• Value
• Uniqueness
• Display
Search Engine Crawlers Haven’t Quite Grown Up Yet
Crawling
• Lack of discovery
• Crawl inefficiency
• URL issues (infinite, redirects, dynamic)
• Inaccessible links
Indexing
• Duplication
• Extraction issues
• Lack of exposed content
• Non-optimized media
Ranking
• Display issues
• Lack of quality links
• Guidelines violations
• Non-focused content
Step 1: Get the Data
• Pages crawled
• Pages indexed
• Web traffic
• Key ranking metrics
Crawling
Indexing
Ranking
Which pages have the search engines crawled?
What kind of pages are they?
Has the search engine indexed all of the crawled pages?
How’s the search engine traffic?
Benchmarking
• Top ten queries that bring search traffic
• Search results position
• URL that ranks
Crawl Issues
Crawl Log Example: Apache Log Analyzer 2 Feed
/**
 * @see ApacheLogAnalyzer2Feed
 */
require_once 'ApacheLogAnalyzer2Feed.php';

// create a new instance, parse access.log and
// write test.xml
$tool = new ApacheLogAnalyzer2Feed('access.log', 'test.xml');
// select entries matching Googlebot useragent
$tool->addFilter('User-Agent', 'Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)');
// run
$tool->run();
http://code.simonecarletti.com/wiki/apachelog2feed
/**
 * @see ApacheLogAnalyzer2Feed
 */
require_once 'ApacheLogAnalyzer2Feed.php';

// create a new instance, parse access.log and write test.xml
$tool = new ApacheLogAnalyzer2Feed('access.log', 'test.xml');
// select entries matching Googlebot useragent with a regular
// expression pattern
$tool->addFilter('User-Agent', 'regexp:Googlebot');
// select entries with Request matching a regular expression
// pattern
$tool->addFilter('Request', 'regexp:/site/profile\.php');
// run
$tool->run();
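The same two filters (crawler user agent, then request path) can be sketched without the PHP tool. This is a minimal illustration, assuming the server writes the Apache combined log format; the function name `crawled_paths` and the sample log lines are hypothetical, not taken from the slides.

```python
import re

# Assumes the Apache "combined" log format, where the last two quoted
# fields are the referrer and the user agent.
LOG_LINE = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+ '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)


def crawled_paths(lines, agent_pattern="Googlebot", path_pattern=None):
    """Return (path, status) pairs for requests whose user agent matches."""
    agent_re = re.compile(agent_pattern)
    path_re = re.compile(path_pattern) if path_pattern else None
    hits = []
    for line in lines:
        match = LOG_LINE.match(line)
        if not match or not agent_re.search(match["agent"]):
            continue  # unparseable line or a different client
        if path_re and not path_re.search(match["path"]):
            continue  # crawler request, but not for the pages of interest
        hits.append((match["path"], int(match["status"])))
    return hits


# Hypothetical log lines for illustration only.
SAMPLE_LOG = [
    '66.249.66.1 - - [12/Jun/2009:10:00:00 -0700] "GET /site/profile.php?id=7 HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '66.249.66.1 - - [12/Jun/2009:10:00:05 -0700] "GET /about.html HTTP/1.1" 404 312 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '10.0.0.5 - - [12/Jun/2009:10:00:09 -0700] "GET /site/profile.php?id=9 HTTP/1.1" 200 5100 "-" "Mozilla/5.0 (Windows; U) Firefox/3.0"',
]

all_pages = crawled_paths(SAMPLE_LOG)
profile_pages = crawled_paths(SAMPLE_LOG, path_pattern=r"/site/profile\.php")
```

Keeping the status code next to each path makes it easy to spot crawled URLs that return errors. A user-agent match alone does not prove a request came from Google, since the header can be spoofed.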
All Pages Google’s Crawled
All Profile Pages Google’s Crawled
Communicating with Search Robots
Extractable Link Issues: Flash
Extractable Link Issues: Images
Extractable Link Issues: AJAX
Extractable Link Issues: URL Errors
Extractable Link Issues: URLs That Expire
URL Discovery Checklist
• Comprehensive external links
• At least one internal link to every page
• XML Sitemap referenced in robots.txt with the comprehensive list of canonical URLs
• Comprehensive HTML sitemap
• Ensure links load without JavaScript, images, or other rich media
• Ensure robots.txt and the meta robots tag are used correctly
http://janeandrobot.com/library/managing-robots-access-to-your-website
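Two of the discovery items (robots.txt used correctly, XML Sitemap referenced in robots.txt) can be spot-checked with Python's standard `urllib.robotparser`. This is a rough sketch: the robots.txt content, the example.com URLs, and the helper name `check_access` are assumptions for illustration, and `site_maps()` requires Python 3.8 or later.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content for illustration.
ROBOTS_TXT = """\
User-agent: *
Disallow: /print/
Sitemap: http://www.example.com/sitemap.xml
"""


def check_access(robots_txt, urls, agent="Googlebot"):
    """Return ({url: allowed}, sitemap URLs listed in robots.txt)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    allowed = {url: parser.can_fetch(agent, url) for url in urls}
    # site_maps() returns None when no Sitemap line is present.
    return allowed, parser.site_maps()


allowed, sitemaps = check_access(
    ROBOTS_TXT,
    [
        "http://www.example.com/page1.php",
        "http://www.example.com/print/page1.php",
    ],
)
```

Running the important URLs through a check like this before launch can catch a Disallow rule that blocks more than intended. The standard-library parser may not interpret every rule exactly as each search engine does, so treat the result as a first pass.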
URL Structure Checklist
• Keep the number of parameters in dynamic URLs small
• Don’t use temporary URLs that expire
• Ensure redirects are 301s and redirect chains are short
• Use dashes rather than underscores when separating words
• Use keywords in URLs for higher click-through and better anchor text
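Some of these URL structure checks are mechanical enough to script. The sketch below flags URLs with many query parameters or with underscores in the path; the limit of two parameters and the function name `url_structure_warnings` are arbitrary assumptions, not thresholds from the slides.

```python
from urllib.parse import parse_qsl, urlsplit


def url_structure_warnings(url, max_params=2):
    """Return a list of human-readable warnings for one URL."""
    parts = urlsplit(url)
    warnings = []
    params = parse_qsl(parts.query, keep_blank_values=True)
    if len(params) > max_params:  # max_params is an assumed threshold
        warnings.append(f"{len(params)} query parameters (limit {max_params})")
    if "_" in parts.path:
        warnings.append("underscores in path; prefer dashes")
    return warnings


clean = url_structure_warnings("http://www.example.com/snow-boards/arbor")
messy = url_structure_warnings("http://www.example.com/snow_boards.php?a=1&b=2&c=3")
```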
Canonicalization Checklist
• Have only one URL for each page
• Put all unneeded details in cookies, rather than URLs (session IDs, tracking parameters)
• Don’t allow infinite parameters
• Use 301 redirects for any URL changes
• 301 redirect www/non-www
• Use absolute URLs for internal links
• Ensure canonical version is in XML Sitemap
• Use rel=canonical attribute for optional parameters
• Block print and other versions with robots.txt
http://janeandrobot.com/library/url-referrer-tracking
http://searchengineland.com/canonical-tag-16537
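One way to find pages that live at more than one URL is to normalize each crawled URL and group the ones that collapse to the same form. This is a sketch under assumptions: the list of parameters to strip, the preferred host `www.example.com`, and the function names are all hypothetical and would need to match the real site.

```python
from collections import defaultdict
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumed names of session and tracking parameters; adjust per site.
STRIP_PARAMS = {"sessionid", "sid", "ref", "utm_source", "utm_medium", "utm_campaign"}


def canonical_url(url, preferred_host="www.example.com"):
    """Normalize host and query string so duplicate URLs compare equal."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    bare = preferred_host[4:] if preferred_host.startswith("www.") else preferred_host
    if host in (bare, "www." + bare):
        host = preferred_host  # fold www and non-www into one host
    kept = sorted(
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in STRIP_PARAMS
    )
    return urlunsplit((parts.scheme.lower(), host, parts.path or "/", urlencode(kept), ""))


def duplicate_groups(urls):
    """Return {canonical: [variants]} for canonicals with several variants."""
    groups = defaultdict(list)
    for url in urls:
        groups[canonical_url(url)].append(url)
    return {canon: seen for canon, seen in groups.items() if len(seen) > 1}


groups = duplicate_groups([
    "http://example.com/page1.php?sid=abc&color=red",
    "http://www.example.com/page1.php?color=red",
    "http://www.example.com/page2.php",
])
```

Each group is a candidate for a 301 redirect or a rel=canonical attribute pointing at the canonical form.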
Crawl Efficiency Checklist
• Ensure page load times aren’t so slow that they reduce the number of pages crawled
• Ensure server is responsive
• Return a 304 for unchanged content
• Use compression
• Return a 404 for not found content
• Ensure each page has at least one link
• Avoid infinite redirects and redirect loops
• Ensure most important pages are linked from home page
• No JavaScript redirects or meta refresh redirects (if possible)
• Reasonable crawl-delay setting (if used at all)
• Reasonable use of Google Webmaster Tools crawl setting
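The redirect item in this checklist can be tested offline once the site's redirect rules are exported as a source-to-target map. The sketch below walks such a map and reports loops and long chains; the map contents, the five-hop limit, and the name `follow_redirects` are illustrative assumptions.

```python
def follow_redirects(start, redirects, max_hops=5):
    """Walk a {source: target} redirect map; return (chain, problem)."""
    chain = [start]
    seen = {start}
    current = start
    while current in redirects:
        current = redirects[current]
        if current in seen:
            # We came back to a URL already visited: a redirect loop.
            return chain + [current], "loop"
        chain.append(current)
        seen.add(current)
        if len(chain) - 1 > max_hops:  # max_hops is an assumed limit
            return chain, "too many hops"
    return chain, None


# Hypothetical redirect rules for illustration.
REDIRECTS = {"/a": "/b", "/b": "/c", "/x": "/y", "/y": "/x"}

ok_chain = follow_redirects("/a", REDIRECTS)
loop_chain = follow_redirects("/x", REDIRECTS)
```

A chain that ends cleanly but takes several hops is still worth collapsing into a single 301 to the final URL.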
Indexing Issues
Indexing Example: XML Sitemaps
http://sitemaps.org
XML Sitemap
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://www.example.com/</loc>
  </url>
  <url>
    <loc>http://www.example.com/page1.php</loc>
  </url>
  <url>
    <loc>http://www.example.com/page2.php</loc>
  </url>
</urlset>
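A Sitemap in this shape can be generated from a list of canonical URLs, and read back to compare against what the engines report as indexed. The following is a small sketch using Python's standard `xml.etree.ElementTree`; the function names `build_sitemap` and `sitemap_locs` are hypothetical, and only the `loc` element from the example above is handled.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return a minimal XML Sitemap string with one <url><loc> per URL."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        ET.SubElement(ET.SubElement(urlset, "url"), "loc").text = url
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


def sitemap_locs(xml_text):
    """Return every <loc> value found in a Sitemap string."""
    root = ET.fromstring(xml_text.encode("utf-8"))
    return [loc.text for loc in root.iter(f"{{{SITEMAP_NS}}}loc")]


urls = [
    "http://www.example.com/",
    "http://www.example.com/page1.php",
    "http://www.example.com/page2.php",
]
sitemap_xml = build_sitemap(urls)
```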
http://www.google.com/webmasters
Pages Indexed From Sitemap
Duplicate Content Issues
Partner Content
http://www.google.co.uk/search?q=%22The+Radisson+Edwardian+Vanderbilt+Hotel+stands+among+a+row+of+Victorian+townhouses+located+in+the+fashionable+Kensington+district+of+London,+England%22&hs=cN0&filter=0
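The URL above is an exact-phrase search: a sentence from the page wrapped in quotes, to see which other sites carry the same text. A query like it can be built as sketched below. The `q` and `filter=0` parameters are copied from that URL; the function name and default host are assumptions, and the sketch does not depend on what `filter=0` does on the engine's side.

```python
from urllib.parse import urlencode


def exact_phrase_search_url(phrase, host="www.google.co.uk"):
    """Build a search URL that looks for the phrase as an exact quote."""
    query = urlencode({"q": f'"{phrase}"', "filter": "0"})
    return f"http://{host}/search?{query}"


search_url = exact_phrase_search_url("Victorian townhouses")
```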
Indexing Diagnostic Checklist
Have the pages ever been indexed?
If deindexed, are you sure they are no longer in the index?
Is the indexing loss across all engines?
What was the percentage of loss?
Is there a pattern?
Check Google Webmaster Tools for errors/blocking
Did you change infrastructure/CMS/implement redirects?
What’s the linking pattern?
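The percentage-of-loss and pattern questions can be answered from two lists of indexed URLs, one taken before the drop and one after. This is a minimal sketch; the URL lists are made up, and grouping by the first path segment is only one assumed way to look for a pattern.

```python
from collections import Counter
from urllib.parse import urlsplit


def first_segment(url):
    """Return the top-level directory of a URL, or "/" for root pages."""
    segments = [part for part in urlsplit(url).path.split("/") if part]
    return "/" + segments[0] if len(segments) > 1 else "/"


def indexing_loss(before, after):
    """Return (percent of URLs lost, Counter of lost URLs per section)."""
    before, after = set(before), set(after)
    lost = before - after
    percent = 100.0 * len(lost) / len(before) if before else 0.0
    return percent, Counter(first_segment(url) for url in lost)


# Hypothetical snapshots of indexed URLs.
BEFORE = [
    "http://www.example.com/",
    "http://www.example.com/page1.php",
    "http://www.example.com/profiles/7.html",
    "http://www.example.com/profiles/9.html",
]
AFTER = BEFORE[:2]

percent_lost, lost_by_section = indexing_loss(BEFORE, AFTER)
```

If nearly all of the loss falls in one section, the cause is more likely a template, redirect, or robots rule specific to that section than a site-wide problem.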
Indexing Checklist: Content Extraction
• Ensure content is in text wherever possible
• Ensure text isn’t hidden in:
– JavaScript/AJAX
– Flash
– Video
– Images
• Avoid multiple URLs for the same page and very similar pages
Indexing Checklist: Semantic Markup
• Use keywords in title tag
• Ensure each page has a unique meta description tag
• Use keywords in (single) H1 tag
• Appropriate use of H2 – H6 tags
• Relevant anchor text in a href tags
• Put JavaScript in .js file (except onclick event functions) and style details in .css
• Validate HTML to ensure it renders
• Provide focus for each page
• Ensure pages provide unique and valuable content beyond boilerplate template and reused content
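A few of the markup items (title present, meta description present, a single H1) can be checked per page with Python's standard `html.parser`. The sketch below is an assumption-laden starting point: the class and function names are hypothetical, and it does not judge keyword use or whether descriptions are unique across pages.

```python
from html.parser import HTMLParser


class MarkupAudit(HTMLParser):
    """Collect the title text, meta description, and H1 count of a page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.meta_description = None
        self.h1_count = 0
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "h1":
            self.h1_count += 1
        elif tag == "meta" and (attrs.get("name") or "").lower() == "description":
            self.meta_description = attrs.get("content") or ""

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def audit_markup(html):
    """Return a list of problems found in one page's markup."""
    audit = MarkupAudit()
    audit.feed(html)
    audit.close()
    problems = []
    if not audit.title.strip():
        problems.append("missing title")
    if not audit.meta_description:
        problems.append("missing meta description")
    if audit.h1_count != 1:
        problems.append(f"{audit.h1_count} h1 tags (expected 1)")
    return problems


GOOD_PAGE = (
    "<html><head><title>Arbor Snowboards</title>"
    '<meta name="description" content="Handmade snowboards."></head>'
    "<body><h1>Arbor Snowboards</h1></body></html>"
)
BAD_PAGE = "<html><body><h1>One</h1><h1>Two</h1></body></html>"
```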
Optimizing Images
• Don’t put text in images
• Use descriptive ALT text
• Use descriptive filenames
• Provide caption and surrounding text
• Be cautious about logo images
• Consider blocking non-useful images with robots.txt
• Don’t provide alternate text using CSS that styles the text off the page (such as -9999)
http://janeandrobot.com/post/Effectively-Using-Images.aspx
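The ALT text item can be audited the same way. This sketch lists the `src` of every image whose `alt` attribute is missing or empty; the names and sample markup are hypothetical, and an empty `alt` can be legitimate for purely decorative images, so the output is a review list rather than a list of errors.

```python
from html.parser import HTMLParser


class ImageAudit(HTMLParser):
    """Record the src of each <img> that has no usable alt text."""

    def __init__(self):
        super().__init__()
        self.missing_alt = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attrs = dict(attrs)
        if not (attrs.get("alt") or "").strip():
            self.missing_alt.append(attrs.get("src") or "")


def images_missing_alt(html):
    audit = ImageAudit()
    audit.feed(html)
    audit.close()
    return audit.missing_alt


SAMPLE_HTML = (
    '<img src="/img/arbor-snowboard.jpg" alt="Arbor snowboard">'
    '<img src="/img/IMG_0042.jpg">'
    '<img src="/img/logo.png" alt="">'
)
to_review = images_missing_alt(SAMPLE_HTML)
```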
Ranking Issues
How’s the Search Engine Traffic?
• Overall Percentage
• Percentage Non-Branded
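Both percentages can be computed from an analytics export of visits per search query. The sketch below assumes hypothetical visit counts and a hand-maintained list of brand terms; a query counts as branded if it contains any brand term.

```python
def traffic_shares(total_visits, query_visits, brand_terms):
    """Return (search % of all visits, non-branded % of search visits)."""
    search_visits = sum(query_visits.values())
    non_branded = sum(
        visits
        for query, visits in query_visits.items()
        if not any(term in query.lower() for term in brand_terms)
    )
    overall = 100.0 * search_visits / total_visits if total_visits else 0.0
    non_branded_share = 100.0 * non_branded / search_visits if search_visits else 0.0
    return overall, non_branded_share


# Hypothetical numbers for illustration only.
overall_pct, non_branded_pct = traffic_shares(
    total_visits=1000,
    query_visits={"arbor snowboards": 300, "snowboards": 60, "snowboard": 40},
    brand_terms=["arbor"],
)
```

A low non-branded share suggests the site is found mainly by people who already know its name.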
Do You Rank For the Right Things?
arbor snowboards snowboard
Google 1 49 500+
Yahoo 1 80 500+
Live Search 3 128 500+
If ranking loss…
Drop For All Keywords
Does the site rank for different queries than before?
Did you substantially change the site content?
Did you change the underlying site infrastructure?
Was there a large change in linking behavior?
Could there be a penalty?
Drop For Only Some Keywords
Do different pages rank highest than before?
Are the pages that used to rank still indexed?
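Several of these questions come down to comparing the benchmark (query, position, ranking URL) with a fresh snapshot. A minimal sketch, assuming each snapshot is a dictionary from query to a (position, URL) pair; the data and the name `ranking_changes` are hypothetical.

```python
def ranking_changes(before, after):
    """Return per-query changes between two {query: (position, url)} maps."""
    changes = {}
    for query, (old_pos, old_url) in before.items():
        # A query missing from the new snapshot is treated as no longer ranking.
        new_pos, new_url = after.get(query, (None, None))
        if new_pos != old_pos or new_url != old_url:
            changes[query] = {
                "position": (old_pos, new_pos),
                "url_changed": new_url != old_url,
            }
    return changes


# Hypothetical benchmark and current snapshot.
BEFORE = {
    "arbor snowboards": (1, "http://www.example.com/"),
    "snowboards": (12, "http://www.example.com/snowboards/"),
}
AFTER = {
    "arbor snowboards": (1, "http://www.example.com/"),
    "snowboards": (49, "http://www.example.com/page1.php"),
}

changes = ranking_changes(BEFORE, AFTER)
```

A changed ranking URL for the same query is a hint to check whether the page that used to rank is still indexed.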
Ranking Checklist
Relevance
• What is the page about?
• Are the pages ranking for the desired query more relevant?
• Do the pages use the language of the searcher?
Value
• How many relevant links point to the page (and how authoritative are they)?
• What’s the value of the page? (Do more useful pages rank above yours?)
SERP display
• Are the title and snippet compelling?
• Do Sitelinks appear for navigational queries?
• What universal elements appear on the page?
Does the site rank for non-branded queries?
The Webmaster Guidelines
Common Definition of Spam
On-page schemes
• Keyword stuffing
• Fake/stolen content
• Hidden text
• Hidden links
• Cloaking
Linking schemes
• Paid links
• Link exchanges
• Doorway pages
• Deceptive redirects
http://google.com/support/webmasters/bin/answer.py?answer=35769
Getting Out of the Penalty Box
1. Check if you’ve been penalized
– Live Search: http://webmaster.live.com
– Google: http://google.com/webmasters ** maybe **
2. Review the webmaster guidelines
– Google, Live Search, Yahoo
3. Identify the issue
4. Fix it!
5. Request re-evaluation
– Google: http://google.com/webmasters
– Live Search: http://webmaster.live.com
Traffic Issues
Traffic Drop
Display Issues
Would you click this link?
Does the Result Inspire Clicks?
First step in diagnosis: find the root
Ninebyblue.com
Twitter.com/vanessafox
Jane and Robot Developer Summit
June 12th, 2009 – San Francisco
FREE for SMX attendees!
janeandrobot.com
Twitter.com/janeandrobot