Duplicate Content & Multiple Site Issues – Sasi Parthasarathy, Program Manager, Microsoft


DESCRIPTION

Sasi Parthasarathy, Program Manager for Live Search at Microsoft, talks about duplicate content and multiple-site issues at SES NY.

TRANSCRIPT

Page 1: Duplicate Content SES NY 2009

Duplicate Content & Multiple Site Issues

Sasi Parthasarathy

Program Manager, Microsoft

Page 2: Duplicate Content SES NY 2009

Topics covered

• Duplicate content

– Internal content -> URL Canonicalization

– External content -> Spam, Geo-targeting

• Content Syndication

• Good practices

• Examples, examples, examples

Page 3: Duplicate Content SES NY 2009

URL canonicalization

• Less is more – expose only one URL per piece of content (pretty please)

• The practice of consolidating all versions of a page under one URL is referred to as "canonicalization"

• Helps the search engine and, at the same time, avoids splitting your rank juice

• Having too many duplicate URLs will waste crawl time – the crawler might spend its time indexing duplicate URLs and miss good content

• Four ways to get to microsoft.com, but only one is needed (see the sketch after this list):

1. microsoft.com

2. www.microsoft.com

3. www.microsoft.com/en/us/default.aspx

4. www.microsoft.com/en/us/
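
To make the consolidation concrete, here is a minimal Python sketch (not from the talk) that collapses the variants above onto one canonical form. The choice of the WWW host as canonical and the default.aspx default filename are assumptions for illustration; whether /en/us/ should also map onto the root homepage is a site-specific routing decision the sketch leaves out.

    from urllib.parse import urlsplit, urlunsplit

    def canonicalize(url):
        """Collapse WWW/non-WWW and default-filename variants of a URL."""
        parts = urlsplit(url if "//" in url else "http://" + url)
        host = parts.netloc.lower()
        if not host.startswith("www."):
            host = "www." + host                 # assume WWW is the preferred form
        path = parts.path or "/"
        if path.endswith("default.aspx"):        # trim the server's default filename
            path = path[: -len("default.aspx")]
        return urlunsplit((parts.scheme, host, path, "", ""))

    for u in ("microsoft.com",
              "www.microsoft.com",
              "www.microsoft.com/en/us/default.aspx",
              "www.microsoft.com/en/us/"):
        print(canonicalize(u))
    # The first two variants collapse to http://www.microsoft.com/ and the
    # last two to http://www.microsoft.com/en/us/ - one URL per page.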

Page 4: Duplicate Content SES NY 2009

A few recommendations for canonicalization

• Select WWW or non-WWW, then redirect the other option to your preferred version (sketched in code after this list)

• Remove the default filename from the end of your URLs

– All web servers let you select one or more default filenames to serve when the browser requests a directory. Check whether the default filename appears at the end of your URLs, and trim it off

• Link internally to the canonical form of your URL

– Make sure you always link to the proper canonical form of your URLs from within your site

• Remove query string variables or rewrite to readable URLs

– http://www.mysite.com/downloads/details.aspx?FamilyID=ab99&displaylang=en
  becomes http://www.mysite.com/downloads/en/family/ab99
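
The first two recommendations can be wired together as a single permanent redirect. Below is a minimal sketch using only the Python standard library; the www.mysite.com host, the default.aspx filename, and the port are illustrative assumptions, not values from the talk.

    from http.server import BaseHTTPRequestHandler, HTTPServer

    CANONICAL_HOST = "www.mysite.com"    # assumed preferred (WWW) host
    DEFAULT_FILENAME = "default.aspx"    # assumed server default filename

    class CanonicalRedirectHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            host = self.headers.get("Host", "")
            path = self.path
            # Trim the default filename: /en/us/default.aspx -> /en/us/
            if path.endswith("/" + DEFAULT_FILENAME):
                path = path[: -len(DEFAULT_FILENAME)]
            if host != CANONICAL_HOST or path != self.path:
                # 301 (permanent) tells crawlers to consolidate on one URL
                self.send_response(301)
                self.send_header("Location", "http://%s%s" % (CANONICAL_HOST, path))
                self.end_headers()
                return
            self.send_response(200)
            self.send_header("Content-Type", "text/plain")
            self.end_headers()
            self.wfile.write(b"canonical page")

    if __name__ == "__main__":
        HTTPServer(("", 8080), CanonicalRedirectHandler).serve_forever()

In practice this logic usually lives in web-server configuration (IIS URL Rewrite, Apache mod_rewrite) rather than application code; the sketch only shows the decision and the 301 response.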

Page 5: Duplicate Content SES NY 2009

Why duplicate content?

• Your intention is the key

• If your intent is to manipulate the search engine, you will be penalized

Example 1: Multiple domains with very little or no difference in content and no clear reason why these domains exist

Example 2: Falsely promoting someone else’s original content as your own (please report any issues with copied content to Live Search support)

Page 6: Duplicate Content SES NY 2009

Going International – Help Search Engines

You may have similar pages but for various regions.

Problems for search engines with geo-targeting:

• No standardized way to tell a search engine which region or language your content is targeted at

• Top-level domains may not indicate the intended audience. For example, http://ma.tt/ is an English-language site on a Moroccan ccTLD, and Orange.com is a French telecom site hosted in France

• Using search-unfriendly redirection techniques

Page 7: Duplicate Content SES NY 2009

A few indicators – help Live Search with geo-targeting

• Country code top-level domain (ccTLD). For example, .ca specifically targets users in Canada

• Set up all your domains in Live Search webmaster tools and make the target region explicit

These indicators will help us show the correct page for the correct market
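
One on-page hint worth mentioning alongside these (an addition to the slide, not something the talk confirms Live Search used) is declaring the page’s language and market in the markup; en-us is an example value:

    <!-- declares the intended language/market of this page; en-us is an example -->
    <meta http-equiv="content-language" content="en-us" />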

Page 8: Duplicate Content SES NY 2009

Content Syndication

• Syndicate with caution – this applies to sites that syndicate their content on other sites

• From our perspective, we always want to show the version we think is appropriate to the user. This may not be the version you want or prefer.

• Tip: ask your partner to use robots.txt to stop us from indexing the syndicated material (a sketch follows)
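
A minimal sketch of that partner robots.txt; the /syndicated/ path is an assumed example, and msnbot was the Live Search crawler’s user-agent at the time:

    # Partner site's robots.txt - keeps the Live Search crawler out of the copies
    User-agent: msnbot
    Disallow: /syndicated/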

Page 9: Duplicate Content SES NY 2009

General tips to help the Search Engine

• Dynamic URLs – if the content is not changing, don’t use too many parameters

• 301 is your best friend – use it when you can (see the status check after this list)

• No 302 hijack!!

• When you do a site update, don’t leave links to expired pages

• Use robots.txt for anything you don’t want crawlers to crawl

• Consistent naming convention – easy for search engines to understand

• Follow standard URL formation practices
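
To verify which status code a redirect actually returns (a permanent 301 versus the 302 the slide warns about), here is a small standard-library sketch; the test URL is an example:

    import http.client
    from urllib.parse import urlsplit

    def redirect_status(url):
        """Return the raw status code and Location header, without following it."""
        parts = urlsplit(url)
        conn = http.client.HTTPConnection(parts.netloc, timeout=10)
        conn.request("HEAD", parts.path or "/")
        resp = conn.getresponse()
        return resp.status, resp.getheader("Location")

    print(redirect_status("http://microsoft.com/"))
    # A well-configured site answers 301 with the canonical (WWW) Location;
    # a 302 here would signal a temporary move and should become a 301.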