Analysis of the Modern Methods for Web Positioning


Faculty of Computer Science and Management
field of study: Computer Science
specialization: Software Engineering

Master thesis

Analysis of the Modern Methods for Web Positioning

Paweł Kowalski

keywords: search engine, SEO, personalization, optimization, web positioning

The thesis contains an analysis of the mechanism for personalizing search results in popular search engines. It presents experiments and considerations on the impact of personalization on search rankings and on how it affects Search Engine Optimization (SEO). It also proposes some new SEO methods that take advantage of personalization in search engines.

Supervisor: dr inż. Dariusz Król ............................. .............................
(name and surname, grade, signature)

For archival purposes the thesis has been classified as:*
a) category A (permanent records)
b) category BE 50 (subject to appraisal after 50 years)

* delete as appropriate

Stamp of the institute

Wrocław 2010


Contents

Chapter 1. Introduction
  1.1. Beginning of the SEO Concept
  1.2. Search Engines Evolution
  1.3. The Goal

Chapter 2. Personalized Search
  2.1. Operation Principles
  2.2. Methods for the Analysis of Behavioural Data
    2.2.1. Methods of behavioural data collecting
    2.2.2. Process of tracking user
  2.3. Research
    2.3.1. Location
    2.3.2. Phrase language
    2.3.3. Search history
  2.4. Spam Issue

Chapter 3. Impact of Personalized Search on SEO
  3.1. Metrics
    3.1.1. Areas for Consideration

Chapter 4. SEO Guide
  4.1. Website Presentation in Google Search
    4.1.1. Title
    4.1.2. Description
    4.1.3. Sitelinks
  4.2. Website Content
    4.2.1. Unique content
    4.2.2. Keywords
  4.3. Source Code
    4.3.1. Headers
    4.3.2. Highlights
    4.3.3. Alternative texts
    4.3.4. Layout
  4.4. Internal Linking
    4.4.1. Distribution
    4.4.2. Links anchors
    4.4.3. Broken links
    4.4.4. Nofollow attribute
    4.4.5. Sitemap
  4.5. Addresses and Redirects
    4.5.1. Friendly addresses
    4.5.2. Redirect 301
  4.6. Other Issues
    4.6.1. Information for robots
    4.6.2. Performance
  4.7. Summary

Chapter 5. The System for Global Personalization
  5.1. Problems to Solve
  5.2. Objectives
    5.2.1. Web Positioning
    5.2.2. Cooperation
    5.2.3. Control
  5.3. Architecture
  5.4. Visitor Session Algorithm
  5.5. Task Assignation Algorithm
  5.6. Proof Study
    5.6.1. Tools
    5.6.2. Results
  5.7. Summary

Chapter 6. Conclusion

List of Figures

Bibliography


Abstract

Modern search engines are constantly being improved. The most recent big step introduced into their algorithms concerns the personalization mechanism. Its goal is to extract information about a user's preferences implicitly from his search behaviour and from factors such as location, phrase language and search history. This information is the basis for building a user's search profile. The motivation for this process is to provide search results that are more relevant to a specific user and his interests. The thesis examines the details of this personalization mechanism and tries to determine how various factors affect search results. The author also analyses the methods by which search engines collect behavioural data. He attempts to define the possible impact of search result customization on Search Engine Optimization (SEO) issues such as metrics, spam filtering and changes in the significance of website optimization factors. Then the author tries to evaluate the possibility of manipulating personalized search rankings through a proposed system for generating human-like web traffic.

Summary (translated from the Polish abstract)

Modern Internet search engines are constantly being improved. The latest major step forward introduced in their algorithms concerns the personalization mechanism. Its task is to gain information about the user's preferences from his behaviour while searching, with respect to such factors as his location, the language of the searched phrase and his search history. This information is the basis for creating a user profile. The aim of these actions is to return search results that better match the interests of a specific user. The thesis contains a detailed analysis of the personalization mechanism and an attempt to examine how individual factors affect search results. The author also analyses the methods by which search engines obtain data on user behaviour. He attempts to define the possible impact of adapting search results to the user on topics related to web positioning, such as metrics, spam filtering or changes in the significance of particular factors in website optimization. Then the author tries to evaluate the possibility of manipulating personalized search results through a proposed system for generating natural-looking web traffic.


Chapter 1

Introduction

Before the Web and present-day search engines, searching meant simply matching the terms in a query to exact occurrences of those terms in a database filled with textual documents. Some database searches only let you locate documents where certain words appeared within a defined distance of other specified words in the same document. Sorting documents by relevance or importance would have been a monumental task, if possible at all.

1.1. Beginning of the SEO Concept

When the Internet was introduced, it revolutionized the worldwide sharing of information. Free and unrestricted access for everyone is the reason why the Internet is considered one of the greatest inventions of the 20th century. But this freedom has a serious implication: many problems with organizing this enormous set of information.

Hyperlinks turned out to be insufficient for this task, which is why the first search engines were introduced. They quickly became the main source of visits to commercial websites, and good positions in search results became very important to content publishers. That moment was the beginning of the SEO¹ concept, which is still a major element of Internet marketing.

Early search engines such as AltaVista and Lycos were launched around 1994–1995 [7]. Their algorithms analysed only the content of websites and the keywords in meta tags. It was easy to circumvent these algorithms by placing false information in the keywords tags. Another popular fraud was filling website content with irrelevant text that was visible only to search engine robots, not to the user. As a result, search engine result pages (hereafter SERPs) contained websites filled with spam and inappropriate content [21].

1. Search Engine Optimization (SEO) – the process of improving the volume or quality of traffic to a website from search engines. The term Web Positioning is also used as a synonym.


1.2. Search Engines Evolution

At that stage the relevance of search results to a query was still based on keyword matching, but search engines began to distinguish the importance of words located in different parts of a page. For example, if you searched for a certain phrase, pages containing those words in their titles and headlines might be considered more relevant than other pages where those words also appeared, but not in those "important" parts.

Google, founded in September 1998, revolutionized search engines. Its co-founders, Larry Page and Sergey Brin, developed the PageRank algorithm [3], which redefined the search problem. The content of websites became slightly less significant. Instead of text content, PageRank rates websites mainly on the basis of the quantity and quality of the links leading to them. With this improvement the Internet works as a kind of voting system: every link is a vote for the website it leads to.

Relevance was also derived by indexing the words used in links to other pages. If a link leading to a page used the phrase "american basketball" as its anchor text², the page being pointed to would be considered relevant to American basketball. The existence of links to a page has also been used to help define its perceived importance. Information about the quality and quantity of links to a page can be used by search engines to get a sense of the implied importance of the page being linked.
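The "links as votes" idea can be illustrated with a small sketch. It uses the simplified textbook formulation of PageRank (power iteration with a damping factor), not Google's production algorithm; the damping value of 0.85, the iteration count and the four-page toy web are assumptions chosen for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Rank pages given a dict mapping each page to the pages it links to."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page keeps a small baseline share regardless of its links.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its vote evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page without outgoing links spreads its vote over all pages.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical toy web: three pages link to "c", and "c" links back to "a".
toy_web = {"a": ["c"], "b": ["c"], "c": ["a"], "d": ["c"]}
toy_ranks = pagerank(toy_web)
```

In this toy web, page "c" collects the most votes, and "a" outranks "b" and "d" because its single inbound link comes from a highly ranked page. This is the sense in which both the quantity and the quality of links matter.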

Nevertheless, techniques for distorting the results of PageRank [21] were soon discovered. Most of them work by increasing the number of links leading to a particular website in order to raise its PageRank value. Such activity on a large scale is usually called linkbaiting. There are many scripts and web catalogs that facilitate and automate it. However, Google is constantly working to improve its search engine algorithm and make it resistant to linkbaiting. According to [11], many new factors are being introduced into the website evaluation process in order to reduce the impact of linkbaiting, which is a sort of spam.

Besides, there is a limit to the effectiveness of this type of keyword matching. When two people perform a search at one of the major search engines, there is a chance that even if they use the same search terms, they are looking for something completely different. For example, when an anthropologist searches for the phrase "jaguar", he expects websites with information about big cats. But he may receive a collection of websites about Jaguar cars instead.

As search engines progressed and users were given more and more websites with valuable information, the engines needed to respond with a refined approach to search. The main idea for improving the relevance of search results was to better understand the user's intent and expectations when he types a certain phrase into a search box. So it seems that the next step in search engine improvement is tracking regular users on the Internet. Collecting data on their activity might give useful information about which websites are valuable for them. The major engines such as Google, Yahoo and Bing guard their search secrets closely, so one can never be absolutely certain how they operate. But they are evolving, and personalization seems to be the wave of the future.

2. Link label or link title: the text in a hyperlink that is visible to and clickable by the user.

1.3. The Goal

It seems quite clear that search engines fitted with a personalization mechanism would have two main benefits:

1. Improved relevance of search results for a specific user.

2. A decreased number of spam entries in SERPs.

Google, the leader in the search engine market, has already taken the first steps in this area, as reported on the official company blog [13]. Moreover, the company already holds several patents connected with the personalization mechanism. For this reason this thesis is mainly concerned with Google Search. But strong competition in the Internet market suggests that other popular search engines such as Yahoo and Bing are also being improved in this direction.

The goal of this thesis is to analyse the possible aspects of the personalization mechanism in Google Search on the basis of available information. Several factors that can influence the variability of SERPs will be taken into account:

• geolocation

• language of the query

• web search history

• query complexity

• search behaviour (e.g. bounce rates³, time of visits)

These factors will be the basis for several experiments which should determine how advanced the current level of personalization in the considered search engine is. The research also includes an analysis of the data used to describe users' search behaviour, particularly the types of these data and the methods of collecting them.

The obtained results will be used to specify the potential impact on SEO and its metrics, in particular:

• the possibility of using search engine personalization to create new SEO techniques

• the usefulness of a website's ranking in search results as a measure of success in SEO activity

3. Bounce rate is a term used in website traffic analysis. It essentially represents the percentage of initial visitors to a site who "bounce" away to a different site, rather than continue on to other pages within the same website.

Personalization is an opportunity for search engines to make spam less significant in search results and to make SEO workers' lives harder. But this applies only to spam in the sense of websites whose content has no value for humans and of irrelevant hyperlinks. Personalization also opens the door to another kind of information noise: behavioural data spam. The thesis presents the architecture of a distributed system that generates artificial web traffic and thereby imitates the search activity of a real user. However, using such a system can be seen as unethical, so the thesis contains only a concept and a design. The author has no intention of implementing such a system, but he tries to examine with available tools whether building it would be reasonable. In this way, possible harmful actions against which search engines should be protected can be indicated.

After that, there is a short analysis of the current knowledge about significant factors in web positioning. Together with the results of the personalization research, it helped to prepare a collection of advice on how to build a website that is attractive to search engines. It is a sort of guide for webmasters.

The thesis ends with a short conclusion containing a few of the author's thoughts about future trends in search engines and SEO.


Chapter 2

Personalized Search

Pretschner [27] wrote in 1999: "With the exponentially growing amount of information available on the Internet, the task of retrieving documents of interest has become increasingly difficult. Search engines usually return more than 1,500 results per query, yet out of the top twenty results, only one half turn out to be relevant to the user. One reason for this is that Web queries are in general very short and give an incomplete specification of individual users' information needs."

To be more specific, Speretta [31] wrote in 2005: "[...] most common query length submitted to a search engine (32.6%) was only two words long and 77.2% of all queries were three words long or less. These short queries are often ambiguous, providing little information to a search engine on which to base its selection of the most relevant Web pages among millions."

According to Wikipedia, by 2006 Google had indexed over 25 billion web pages, 1.3 billion images and over one billion Usenet messages, and was serving 400 million queries per day. The Internet grows very quickly. For this reason search accuracy is a crucial area of constant improvement in modern search engines. One of the major solutions for meeting this challenge is personalization.

Personalized search is simply an attempt to deliver more relevant and useful results to the end user (the searcher) and to minimize less useful results. The personalization mechanism uses information about the user's past actions and behaviour to build his profile and match relevant search results to it. It should provide a more useful set of results, or a set of results with fewer irrelevant or spam entries. For this reason personalized search seems desirable to the end user.

Google puts it this way: "Search algorithms that are designed to take your personal preferences into account, including the things you search for and the sites you visit, have better odds of delivering useful results" [13]. The goal is simple: to reduce spam and to deliver better results. This looks like a dangerous weapon against SEO workers, who are major offenders in generating spam.


Official information [13], [25], [38] indicates that Google is the only major search engine that has already introduced a personalization mechanism. The first personalized search results appeared almost 5 years ago [13], and the mechanism has been evolving constantly since then.

2.1. Operation Principles

Of course, the details of Google's search algorithms are not public. But it can be expected that the main principles are based on ideas which can be found in the scientific literature.

According to [31], personalization can be applied to search in two different ways:

1. by providing tools that help users organize their own past searches, preferences, and visited URLs

2. by creating and maintaining sets of the user's interests, stored in profiles, that can be used by the retrieval process of a search engine to provide better results.

That research showed that user profiles can be created implicitly from the limited amount of information available to the search engine itself. The profiles are built on the basis of the user's interactions with a particular search engine. Google has applied this second approach in its search engine, because it does not provide any additional tools such as toolbars or browser add-ons for personalizing search.

After [31]: "In order to learn about a user, systems must collect personal information, analyze it, and store the results of the analysis in a user profile. Information can be collected from users in two ways: explicitly, for example asking for feedback such as preferences or ratings; or implicitly, for example observing user behaviors such as the time spent reading an on-line document."

Google Search does not provide any forms that let users specify their interests and preferences, so the information needed to build a user profile must be collected in another way. According to [31], user browsing histories are the most frequently used source of information for creating interest profiles. But browsing history (such as that presented in figure 2.1) is not the only significant source.

For example, a user sends a search query and gets search results. He selects a specific entry that seems interesting and clicks on it, and the website is saved in his browsing history. However, the user quickly realizes that the selected website does not match his interests and goes back to the search results after a few seconds. Such a visit should be rated rather negatively. So not only the browsing history but also the user's behaviour should be taken into consideration in the personalization mechanism.
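The quick return to the results page described above can be turned into an implicit feedback signal. The following is only a sketch of that reasoning: the `Click` record, the `implicit_feedback` function and the 10-second threshold are hypothetical and are not taken from any search engine's documentation.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical threshold: visits shorter than this count as a "bounce".
SHORT_VISIT_SECONDS = 10.0


@dataclass
class Click:
    url: str
    clicked_at: float             # seconds since the start of the search session
    returned_at: Optional[float]  # when the user came back to the results, if ever


def implicit_feedback(click: Click) -> str:
    """Classify a result click as positive or negative evidence of relevance."""
    if click.returned_at is None:
        # The user never came back to the results page.
        return "positive"
    dwell_time = click.returned_at - click.clicked_at
    return "negative" if dwell_time < SHORT_VISIT_SECONDS else "positive"


quick_return = Click("http://example.com/cars", clicked_at=5.0, returned_at=8.0)
long_read = Click("http://example.com/big-cats", clicked_at=12.0, returned_at=190.0)
```

Here `implicit_feedback(quick_return)` yields "negative" and `implicit_feedback(long_read)` yields "positive". A real system would presumably combine such signals with many others rather than rely on a single threshold.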

Studying a series of searches from the same user may also offer a glimpse into modified search behaviour. How does an individual change his queries after receiving unsatisfactory results? Are search terms shortened, lengthened or combined with new terms? There is much other information that a search engine might collect about a user when a search is performed: location, language preferences indicated in the browser, or the type of device used (mobile phone, handheld or desktop). But how can such behavioural data be collected by a search engine? The answer is in the next section.

Figure 2.1. Google Web History panel

2.2. Methods for the Analysis of Behavioural Data

Search engine robots, hereafter crawlers [26], continuously gather information from almost every website on the Internet. It is well known that Google collects an enormous amount of data through this process. These data have the greatest significance for the search engine algorithm, which is why classic web positioning methods are based on link maintenance, mainly link acquisition.

Google processes these data and sorts websites according to their value to the user. The user sends queries to the search engine and gets appropriate SERPs. Because Google knows what people search for, it is able to determine the popularity of specific information on the Internet. But eventually it is the user who decides which website is valuable for him and which is not. The value of a particular website is reflected in users' activity: which links have been clicked and how much time passed between these actions. This information is called behavioural data.

It is reasonable to make all these data useful for the search engine, and Google certainly knows that, too. This is probably why it collects an enormous amount of behavioural data in addition to the data collected by crawlers. This kind of information is what this study is most interested in.

2.2.1. Methods of behavioural data collecting

The entire web is based on the HTTP protocol. Every HTTP request carries, or reveals to the server, the following information:

• IP address of the user making the request, which can be used for geolocation of this user,

• date and time of request,

• language spoken by the user,

• operating system of the user,

• browser of the user,

• address of the website which redirected user by the link to the requested website.
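As a rough illustration, the items in this list map onto the client address of the connection and a handful of standard HTTP headers (`Accept-Language`, `User-Agent`, `Referer`). The sketch below assumes the headers are already available as a dictionary; the function name `describe_request` and the sample values are hypothetical.

```python
from datetime import datetime, timezone


def describe_request(client_ip, headers, received_at=None):
    """Collect the listed pieces of information from a single HTTP request."""
    received_at = received_at or datetime.now(timezone.utc)
    # HTTP header names are case-insensitive, so normalise them first.
    normalized = {name.lower(): value for name, value in headers.items()}
    # Accept-Language looks like "pl-PL,pl;q=0.9,en;q=0.8"; take the first entry.
    languages = normalized.get("accept-language", "")
    preferred_language = languages.split(",")[0].split(";")[0].strip() or None
    return {
        "ip": client_ip,                             # usable for geolocation
        "time": received_at.isoformat(),             # date and time of the request
        "language": preferred_language,              # preferred language of the user
        "user_agent": normalized.get("user-agent"),  # browser and operating system
        "referrer": normalized.get("referer"),       # page that linked to this one
    }


sample_request = describe_request(
    "203.0.113.7",
    {
        "Accept-Language": "pl-PL,pl;q=0.9,en;q=0.8",
        "User-Agent": "Mozilla/5.0 (Windows NT 6.1) Firefox/3.6",
        "Referer": "http://example.com/links.html",
    },
)
```

Note that the IP address is not an HTTP header. The server learns it from the underlying network connection, which is why it is passed in separately here.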

These HTTP requests are used by Google in:

Click tracking – Google logs all of its users' clicks on all of its services,

Forms – Google logs every piece of information typed into every submitted form,

JavaScript execution – requests, and sometimes even more data, are sent when a user's browser executes a script embedded in a website,

Web Beacons – small (1 pixel by 1 pixel) transparent images embedded in websites, which cause a request to be sent every time a user's browser tries to download such an image,

Cookies – small pieces of text stored on the user's computer, which let Google track users' movement around the web every time they open any page that carries a Google advertisement.
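How a web beacon and a cookie can work together is sketched below. This is a generic illustration of the technique, not a description of Google's implementation: the `handle_beacon` function, the `visitor_id` cookie name and the in-memory log are assumptions made for the example.

```python
import base64
import uuid

# A 1x1 transparent GIF, base64-encoded, served as the beacon image.
PIXEL = base64.b64decode("R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7")


def handle_beacon(page_url, cookies, log):
    """Serve the beacon image and record which visitor viewed which page."""
    # Reuse the identifier from the cookie, or issue a new one on the first visit.
    visitor_id = cookies.get("visitor_id") or uuid.uuid4().hex
    log.append((visitor_id, page_url))
    response_headers = {
        "Content-Type": "image/gif",
        "Set-Cookie": f"visitor_id={visitor_id}; Path=/",
        "Cache-Control": "no-store",  # force a new request on every page view
    }
    return response_headers, PIXEL


view_log = []
# First page view: the browser has no cookie yet.
first_headers, _ = handle_beacon("http://site-a.example/", {}, view_log)
issued_id = view_log[0][0]
# Later view on another site embedding the same beacon: the cookie is sent back.
handle_beacon("http://site-b.example/article", {"visitor_id": issued_id}, view_log)
```

After these two calls, both log entries carry the same identifier, which is what allows visits to unrelated websites to be attributed to one visitor.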

But all these elements have to be placed on the websites that are indexed by Google's crawlers. Fortunately for Google, it offers many services that are very useful for Internet publishers. Because they are mostly free to use, webmasters gladly use them on their websites.

Google Analytics

One of these attractive services is Google Analytics. It generates detailed statistics about:

• the visitors to the website,

• the previous website of the visitor,

• activity (navigation) of the user in the website.

Figure 2.2. Google Analytics main panel

It is the most popular and one of the most powerful tools for examining web traffic on a website. It gives the owner a lot of useful information about visitors to his website. Figure 2.2 shows a few features of this piece of software. The information it provides is certainly very interesting for Google itself. For this reason there is a lot of discussion among SEO workers about the possible disadvantages of using Google Analytics in SEO campaigns: poor results of a particular website, as reported by Google Analytics, could prompt Google's search engine to decrease the value of this website in the search ranking. But this is only unconfirmed speculation.

Google Toolbar

Another Google tool, which provides even more valuable data, is Google Toolbar. It is a plug-in that adds a few new features to popular browsers, mainly quick access to Google's services. One of these features is checking the PageRank value of the currently viewed webpage. This gives Google information about every website viewed by users who have the Toolbar installed.

Google AdSense

There is also Google AdSense, a contextual advertising program for website publishers. Millions of websites use this service to generate some financial profit for their authors. As a result, all these websites display ads served by Google's servers. This can provide Google with information similar to that from Google Analytics and Google Toolbar.

Google Public DNS

The latest service launched by Google is Public DNS (Domain Name System) [23]. It is said to be faster and more secure than others, and this is how Google encourages us to start using its DNS. It can generate a massive amount of information about web traffic, since every single query to the DNS can be analyzed by its provider. So the more popular its DNS becomes, the better for Google. It can provide a lot of information helpful in determining the popularity of websites.

But because of DNS caching mechanisms [23], Google does not get all the desired information. A DNS client sends a query only when it needs to resolve a domain for the first time. After it gets the IP address of this domain from the DNS, it caches it for an interval determined by a value called time to live (TTL). Subsequent visits during this interval do not send any query. Consequently, Google still needs other services to gather the desired information about the activity of a particular website's visitors.
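The effect of TTL-based caching on what the DNS provider can observe can be shown with a small simulation. The `CachingResolver` class, the fixed 300-second TTL and the example address are assumptions for illustration; real resolvers are considerably more involved.

```python
class CachingResolver:
    """A client-side DNS cache that asks the upstream server only when needed."""

    def __init__(self, upstream):
        self.upstream = upstream  # callable: name -> (ip_address, ttl_seconds)
        self.cache = {}           # name -> (ip_address, expiry_time)

    def resolve(self, name, now):
        entry = self.cache.get(name)
        if entry is not None and now < entry[1]:
            # Still within the TTL: answered locally, the provider sees nothing.
            return entry[0]
        ip_address, ttl = self.upstream(name)
        self.cache[name] = (ip_address, now + ttl)
        return ip_address


provider_log = []  # queries that actually reach the DNS provider


def logging_upstream(name):
    provider_log.append(name)
    return "192.0.2.10", 300  # hypothetical address and a TTL of 300 seconds


resolver = CachingResolver(logging_upstream)
for visit_time in (0, 60, 120, 299):  # four visits within the TTL
    resolver.resolve("example.com", now=visit_time)
resolver.resolve("example.com", now=300)  # TTL expired, a new query is sent
```

Five visits produce only two entries in `provider_log`, so the provider learns that the domain was visited but not how often or for how long.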

Other Google Services

Google has other very popular services, for example YouTube and Google Maps (Fig. 2.3). They allow users to embed objects such as videos or maps on their own websites. There is also Google Reader, which can indicate the popularity of particular websites by counting their RSS¹ subscribers.

There are many other ways for Google to gain useful data [9]. In fact, Google itself admits to the use of all the described techniques in its privacy policy [12]. Most of these data are probably used to improve the accuracy of its search engine and the quality of its services.

2.2.2. Process of tracking user

The described services can be a great source of behavioural data; there is no doubt about that. But tracking a user's search activity would be incomplete without the data provided by the search engine itself. The next few sections present what the tracking process looks like.

1. RSS (most commonly expanded as Really Simple Syndication) is a family of web feed formats used to publish frequently updated works, such as blog entries, news headlines, audio, and video, in a standardized format.


Figure 2.3. Google Maps example screen

Starting the session

When the user opens the search engine site (by typing the www.google.com address in a browser), the browser sends an HTTP Request [37] to the server. The server learns the IP address of the user's computer from this request. Thanks to this information the search engine is able to relate subsequent search queries to a particular user. Each user is assigned a unique session identifier, stored on the server. This is the beginning of the user's search session.

The identifier expires after a certain period of user inactivity. In this way the search session is terminated.
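A server-side session store with inactivity expiry might look like the sketch below. It follows the description above only in outline: the `SessionStore` class, the 30-minute default timeout and the in-memory dictionary are assumptions, since the actual mechanism used by the search engine is not public.

```python
import uuid


class SessionStore:
    """Keeps search sessions on the server and expires them after inactivity."""

    def __init__(self, timeout_seconds=1800):
        self.timeout = timeout_seconds
        self.sessions = {}  # session id -> {"ip", "last_seen", "queries"}

    def start(self, ip, now):
        """Begin a session for a first request and return its identifier."""
        session_id = uuid.uuid4().hex
        self.sessions[session_id] = {"ip": ip, "last_seen": now, "queries": []}
        return session_id

    def record_query(self, session_id, query, now):
        """Attach a query to a live session; return False if it has expired."""
        session = self.sessions.get(session_id)
        if session is None or now - session["last_seen"] > self.timeout:
            # Too long since the last activity: the session is terminated.
            self.sessions.pop(session_id, None)
            return False
        session["last_seen"] = now
        session["queries"].append(query)
        return True


store = SessionStore(timeout_seconds=60)
sid = store.start("203.0.113.7", now=0)
accepted_first = store.record_query(sid, "jaguar", now=30)
accepted_second = store.record_query(sid, "jaguar big cat", now=80)
accepted_late = store.record_query(sid, "jaguar habitat", now=200)
```

The first two queries fall within the timeout and are linked to the same session, while the third arrives after 120 seconds of inactivity and is rejected. A new session would have to be started for it.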

Sending the search query

The view presented in figure 2.4 should be familiar to every Internet surfer. This is the place where the user can type a search query.

After the search query is sent, two things happen:

1. The query is stored in a database and connected with the user’s session identifier. The personalization mechanism later takes advantage of it.

2. The query is analysed and used by the search algorithm to provide relevant search results to the user.

Figure 2.4. Google Search main screen

After that, the user receives an HTTP Response [37] with the search results as HTML code.

Result selection

Figure 2.5 presents one of the results for the query "query example", with the hyperlink highlighted.

Figure 2.5. Example of the search result

Normally, a click on a link sends an HTTP Request to the server the link leads to. So in this case, it should be sent to:

http://www.wisegeek.com/what-is-query-by-example.htm

But when you look into the source code of the SERP, you will find a URL like this:

http://www.google.com/url?sa=t&source=web&cd=6&ved=0CDMQFjAF&url=http://www.wisegeek.com/what-is-query-by-example.htm&ei=CekOTLT4EpHu0gTYitWXDg&usg=AFQjCNE3t34-kSehUAK8TFNwh5CV9K-OWg&sig2=PdwrnqnhLhowpC8t5-06bw

The most important thing to notice is that the links in the SERPs lead to Google’s server. The chosen website still appears on the user’s screen, because Google’s server performs URL redirection (forwarding). The downside of this technique is the short delay caused by the additional request to the search engine server.

However, in this way the search engine can log every user’s click in the SERPs. What is more, there are some additional data in the result’s URL which probably provide some extra information. For example, the value of the cd parameter is the position in the current search ranking. Furthermore, this URL can be seen by the target server, because browsers place it in the HTTP Request data as the Referer field [37]. This fact is used by software like Google Analytics to aggregate the traffic sources of a website that embeds the Analytics scripts. Thanks to this, the website owner can get information about:

• the most popular search phrases that result in visits to his website

• the place in ranking of his website for a particular search phrase and a particular user (it can vary due to the personalization mechanism)

Of course, the same information is taken into consideration by the search engine.
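The information carried by such a redirect link can be read with standard URL parsing. The sketch below uses the example link quoted above; the function name `parse_serp_link` is hypothetical, and the interpretation of `cd` as the ranking position follows the observation made in this section rather than any official documentation.

```python
from urllib.parse import urlsplit, parse_qs

# The redirect link found in the SERP source code (quoted in the text above).
SERP_LINK = (
    "http://www.google.com/url?sa=t&source=web&cd=6&ved=0CDMQFjAF"
    "&url=http://www.wisegeek.com/what-is-query-by-example.htm"
    "&ei=CekOTLT4EpHu0gTYitWXDg&usg=AFQjCNE3t34-kSehUAK8TFNwh5CV9K-OWg"
    "&sig2=PdwrnqnhLhowpC8t5-06bw"
)


def parse_serp_link(link: str) -> dict:
    """Extract the target URL and the ranking position from a redirect link."""
    params = parse_qs(urlsplit(link).query)
    return {
        "target": params["url"][0],        # where the user is finally redirected
        "position": int(params["cd"][0]),  # assumed: place in the current ranking
    }
```

A target server that receives this link in the Referer field could apply the same parsing to learn at which position its page was displayed to the visitor.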

Behavioural data extraction

According to many studies [1], [6], [7], [10], [20] and [36], more than half of a website’s visits come from the SERPs. In the case of e-commerce websites this value is even higher, because such services usually do not have regular visitors. Their visitors mostly come from search engines (up to 90% of visits) or from ads appearing on other websites.

According to an analysis of real users’ web traffic [29], a typical user spends about 2 hours per session and 5 minutes per page.

These statistics concern a website of good quality, relevant to the user’s interests. A visit to a website with poor content would be terminated after as little as a few seconds, a so-called bounce. Such a visit should indicate the irrelevance of the website selected by the user, so it would be desirable for that website not to appear in the search results for the particular phrase.

Not only what you select or interact with from a given set of search results (or the ads served with them), but also what you do not select or have minimal interactions with (bounce rates) can have an effect. These metrics can be used to create a greater probability model for future search result sets.

What is more, a mechanism based on cookies has recently been introduced in Google Search. It makes it possible to learn the history of search queries from the last 180 days for every user, including users not logged in to any Google service. Officially it is used to personalize SERPs according to the past interests of the user. Google [13] says about that: "Because many people might search from a single computer, the browser cookie may be associated with more than one person’s search activity. For this reason, we don’t provide a method for viewing this signed-out search activity."

The diagram in figure 2.6 shows the process of tracking a user who is not signed in to a Google Account.
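The visit/bounce branch of this process can be sketched as follows. The 3-minute threshold is taken from the activity diagram in figure 2.6; the names `Selection`, `classify` and `bounce_rate` are hypothetical, and the threshold a real search engine uses is not public.

```python
from dataclasses import dataclass

# Threshold taken from the activity diagram in figure 2.6 (an assumption
# about the real system, whose actual value is not public).
BOUNCE_THRESHOLD_SECONDS = 3 * 60


@dataclass
class Selection:
    phrase: str          # search phrase typed by the user
    url: str             # result the user clicked
    seconds_away: float  # time until the user came back to the search engine


def classify(selection: Selection) -> str:
    """Return "visit" for a visit longer than the threshold, else "bounce"."""
    if selection.seconds_away > BOUNCE_THRESHOLD_SECONDS:
        return "visit"
    return "bounce"


def bounce_rate(selections: list) -> float:
    """Fraction of logged selections that ended as a bounce."""
    if not selections:
        return 0.0
    bounces = sum(1 for s in selections if classify(s) == "bounce")
    return bounces / len(selections)
```

A high bounce rate for a given phrase and URL pair is the kind of signal that could lower the position of that URL in later personalized rankings.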

This is what Google [14] says about personalized search for signed-out users:

When you search using Google, you get more relevant, useful search results, recommendations, and other personalized features. By personalizing your results, we hope to deliver you the most useful, relevant information on the Internet.

Figure 2.6. Activity diagram of a visit session (swimlanes: User and Search Engine; when history cookies are found the search engine extracts profile information and re-ranks the found documents, logs each phrase selection before redirecting to the selected website, and logs a visit when it lasts longer than 3 minutes, otherwise a bounce)

In the past, the only way to receive better results was to sign up for personalized search. Now, you can get customized results whenever you use Google. Depending upon whether or not you’re signed in to a Google Account when you search, the information we use for customizing your experience will be different:

Signed-in personalization: When you’re signed in, Google personalizes your search experience based on your Web History. If you don’t want to receive personalized results while you’re signed in, you can turn off Web History and remove it from your Google Account. You can also view and remove individual items from your Web History.

Signed-out customization: When you’re not signed in, Google customizes your search experience based on past search information linked to your browser, using a cookie. Google stores up to 180 days of signed-out search activity linked to your browser’s cookie, including queries and results you click.

Table 2.1. Information used by Google Search in personalization

| | Signed-in Personalized Search | Signed-out Personalized Search |
| Place of data storage | Web History, linked to Google Account | On Google’s servers, linked to an anonymous browser cookie |
| Time interval of data storage | Indefinitely, or until the user removes it | Up to 180 days |
| Searches used to customize | Only signed-in search activity, and only if the user is signed up for Web History | Only signed-out search activity |

2.3. Research

The goal of this section is to evaluate the current level of personalization based on several factors. What Google [14] says about the types of result customization should be helpful for this task:

When you use Google to search, we try to provide the best possible results. To do that, we sometimes customize your search results based on one or more factors:

Search history: Sometimes, we customize your search results based on your past search activity on Google, such as searches you’ve done or results you’ve clicked. If you’re signed in to your Google Account and have Web History enabled, these customizations are based on your Web History. If you’re signed in and don’t have Web History enabled, no search history customizations will be made. (Using Web History, you can control exactly what searches are stored and used to personalize your results. Learn about using Web History)

If you aren’t signed in to a Google Account, your search results may be customized based on past search information linked to your browser using a cookie. Because many people might be searching on one computer, Google doesn’t show a list of previous search activity on this computer. Learn how to turn off these customizations

Location: We try to use information about your location to customize your search results if there’s a reason to believe it’ll be helpful (for example, if you search for a restaurant chain, you may want to find the one near you). If you’re signed in to your Google Account, that customization may rely on a default location that you’ve previously specified (for example, in Google Maps). If you’re not signed in, the results may be customized for an approximate location based on your IP address.

If you’d like Google to use a different location, you can sign in to or create a Google Account and provide a city or street address. Your specific location will be used not only for customizing search results, but also to improve your experience in Google Maps and other Google products.

2.3.1. Location

While you can search at google.com from just about anywhere in the world, you can also access Google at a number of country-specific addresses, such as google.co.uk, www.google.fr or www.google.co.in. In fact, Google automatically redirects you to the proper domain, using your IP address to determine your geolocation. The browser’s preferred-language setting was cleared for this experiment.

This experiment was performed in one location in Poland. However, to simulate requests from other locations, a software environment similar to the one described in section 5.6.1 was used. The phrase used, "jaguar", is multilingual, so the language of the phrase does not affect the search results.

The first query was sent through three Tor hosts, where the exit host was located in Los Angeles, California, United States. The result of the query is presented in figure 2.7 (only the first several entries).

All websites in this SERP are in English, which is the prevailing language at the described location. Moreover, near the bottom there are some places marked on Google Maps which are physically close to the location of the exit host.

The second query was sent via an exit host located in Erfurt, Thuringia, Germany. Figure 2.8 presents the results of this query.

The Official Google Blog [13] states that the same query typed in multiple countries may deserve completely different results. The presented results clearly show that this is true.

Unfortunately, the author was unable to check whether a search for the query "football" provides different results in the US, the UK, and Australia, where the term refers to completely different sports. It is, however, quite likely. A preferred country might include the country of the searcher as well as other countries that the searcher might find acceptable, such as showing search results from the United States to people located in Canada.

2.3.2. Phrase language

It is rather clear that the language of the searched phrase is significant for the results. Search engines, despite personalization, still use the matching of phrases to the content of indexed pages as the major factor in evaluating search relevance. For this reason, phrases with an identical meaning but in different languages generally produce completely different results.

So serving search results with pages in English about birds would be senseless if the user typed the phrase "Vogel", which means "bird" in German, in the search box.

Figure 2.7. Search results of the query sent via host located in USA

2.3.3. Search history

The most interesting factor which is said to influence the personalization mechanism in the Google search engine is the user’s search history. Figure 2.9 shows search results which were slightly modified by re-ranking based on search history. In the fifth position, right after two video thumbnails, there is a link to a website which was visited 4 times (the exact number of visits is visible on the right side of the hyperlink) and which was used by the author of this study to gather information.

Figure 2.8. Search results of the query sent via host located in Germany

These fluctuations appear only when the user is signed in to a Google Account, otherwise there is no access to the web history (figure 2.1). This modified search result was not intended by the author. The phrase "personalized search" was not the object of the experiment. However, this result shows that search history affects future search results in similar areas of information.

To compare the modified results with the original ones (without the impact of personalization), there are two ways to disable results customization:

1. signing out from the Google Account

2. using "View customization", which is available at the bottom of the results screen

After using one of these options, we can check the original position of the visited website in the ranking. In this particular case, the website holds the 17th position in the results with no customization. So after the personalization re-rank, its position increased by 12 places.

But most important is the fact that this change moves the visited website onto the first SERP of the search ranking. In most cases (more than 90% of searches) users do not go beyond the first page of the results. So such a change in ranking causes a huge increase in the number of visitors arriving via this phrase.

Figure 2.9. Search results personalized by user’s search history

Unfortunately, the author did not succeed in intentionally forcing the search engine to re-rank search results. Consequently, approximating the re-rank algorithm on the basis of this experiment is impossible.

2.4. Spam Issue

There is a huge amount of value in getting to the top of the search results, especially for competitive phrases related to business. This is a marketing area that quite often has millions of dollars in it. So spammers are highly motivated, because there is a lot of money at stake. Unfortunately, regular users searching for valuable content are the main victims of these practices.

One of the more interesting aspects of implicit and explicit user feedback during the search personalization process is that it can be very effective in dealing with spam. The more personalized the results, the smaller the chance that spam will appear in the search ranking. This is because, in most cases, users do click on spammy websites (tricked by a link with false information about the target website), but after realizing the real value of such a website they quickly leave and do not come back.

Not only would this help Google limit spam through personalization, it would also be a great source of query/click analysis. Consider the case where click data across multiple users shows that a given entry in a query space is rarely clicked, or shows a high bounce rate. Google might use that signal as a dampening factor for a spam result.

Chapter 3

Impact of Personalized Search on SEO

3.1. Metrics

For quite a long time, SEO workers used the position in the search ranking for particular phrases as the indicator of web positioning. An increase in the website’s position has always been a desirable consequence of SEO actions.

After the implementation of the personalization mechanism, the issue is not so simple. Although customizations in rankings are still not very influential (only one entry in the whole SERP), the highly visible benefits of personalization suggest that the impact will increase. For this reason, ranking position can no longer be the major metric of success, because the position of a particular website in the ranking can be different for every user, especially for those who are regular visitors of this website.

Of course, this indicator is still measurable, because position monitors¹ are not subject to personalization (they use neither cookies nor a Google Account). It can also give useful information about the position seen by users searching for the concerned keywords for the first time. But it is the increase of inbound traffic which has always been the main motivation of SEO actions. So the major metrics of these actions should be closely related to this motivation. For example, these are metrics for SEO in the time of personalized search:

Number of unique visitors. A higher value indicates a good result of the advertising campaign and growing popularity among new customers.

Previous search queries. As an example: if the searcher has recently been searching for the term ‘diabetes’ and submits a query for ‘organic food’, the system attempts to learn and presents additional results relating to organic foods that are helpful in fighting diabetes.

1. Software which automates monitoring of a website’s position in search rankings for given phrases

Previously presented results. Results that have been presented to the end user can be omitted in future results for a given period of time in exchange for other potentially viable results.

User query selection. Past selected or preferred documents can be analysed, and similar documents or linking documents can be used to refine subsequent results. Furthermore, certain document types can be seen as preferred, in what would be a combination of Universal Search concepts. Commonly accessed websites can also be tagged as preferred locations for further weighting.

Selection and bounce rates (and user activity on website). An editorial scoring can be devised from the amount of time a user spends on a page, the amount of scrolling activity, what has been printed, or even what has been saved or bookmarked. All can be used to further refine the ‘intent’ and ‘satisfaction’ with a given result that has been accessed.

Advertising activity. The advertisements clicked on can also begin to add to a clearer understanding of the end user’s preferences and interests.

User preferences. The end user can also provide specific information as to personal interests or location-specific ranking prominence. It could also include favourite types of music or sports, inclusive of geographic preferences such as a favourite sport in a given city.

Historical user patterns. A person’s surfing habits over a given period of time (e.g. 6 months) can also play a role in defining what is more likely to be of interest to them in a given query result. More recent information (on the above factors) is likely to be weighted more than older historical performance metrics within a set of results.

Past visited sites. Many of the above metrics, such as time spent and scrolling on a given web page or historical patterns and preferred locations, can also be collected in a variety of ways (invasive or non-invasive). Cookies actually save resources for the search engine, an added benefit.

Advice on how to improve the values of such metrics is presented in the next chapter.

A higher position in the rankings does not always imply more visitors. Moreover, there is no significant difference between positions 6 and 10. Very often the proper optimization of the page’s title and description visible in a SERP is more important and brings more visitors than a higher position. Better website titles and meta-descriptions would have an advantage, as getting the user to engage with the SERP listing upon initial presentation would be at a premium. Quality content as well would begin to take on a more meaningful role than it has in the past, as bounce rates and user satisfaction now start to play into actual search result rankings.

3.1.1. Areas for Consideration

The author’s experience in commercial SEO, which is closely related to the topic of marketing, is rather small. Thus this section is based on [16].

Demographics

Be sure to leverage any obvious demographics that may apply to your site. Whether it is geographic, topical (sports, politics) or even a given age group, ensuring that this is targeted effectively is important, in that the ‘topical’ nature of personalized search can group results prior to even ranking them. If the particular website is not clear in each of these areas, it risks being given less weight than documents from tighter demographic starting sets. Even your off-site activities (link building, Social Media Marketing etc.) should be as tightly targeted as possible.

Relevance profile

Of particular interest is potential categorization in terms of topical relevance. Ensuring that your site provides a strong relevance trail would be particularly valuable. Much like phrase-based indexing and retrieval concepts, probabilities play a large role. When refining results, the search engine looks at related probable matches. Through a concerted effort at on-site and off-site relevance strengthening, you increase the odds of making it into a given set of results in a world of ‘flux’. It never hurts to review the concepts surrounding Phrase Based Indexing and Retrieval, as many of the related patents addressed deriving concepts and topics from phrases.

One would also have to imagine that tightening up the relevance profile in your Social Media Marketing efforts would be beneficial to a tighter topical link profile. Furthermore, many topically targeted visitors that enter a site may bookmark it (or it may be collected passively), which adds to the organic search profile without the site ever being included in a search result. As such, there are many exterior opportunities to be had beyond traditional off-site SEO.

Keyword Targeting

Building out from your core terms will be important as far as understanding search behaviour. The long tail as we know it would be targeted towards potential query refinements on a given subset of searcher types. Building out logical phrase extensions and potential query refinements would be something to look at. Furthermore, with changeable personalized ranks we would measure SEO success in actual traffic and conversions, which puts term targeting into a new light as far as nailing money terms and having a cohesive plan that targets query refinement long-tail opportunities.

Quality Content

In considering the value of a website, user interaction becomes a consideration as far as bounce rates, time spent on page and scrolling activities are concerned. Producing compelling and resourceful content would be at a premium to best leverage these tendencies of the system. If a searcher has selected and interacted with your site on multiple occasions, your site would be given weight in their personal rankings as well as for related topical and searcher types. The more effective a resource, the greater the ranking weight increase.

Search result conversion

Working with the page title, meta-description and snippets takes on a more important role in your SEO efforts when adjusting for personalized search. I dare say using analytics and a form of split testing would be a great advantage as far as satisfying what not only ranks, but converts.

Freshness

Another area which may be important is document freshness, in that people could be able to set default date ranges or the system could passively begin to see a pattern of a user accessing more current content. A valuable website that has been ranking well for a year may no longer be getting all the traffic it has been used to. Consider updating such pages with fresh information, or creating new related pages and passing the flow via internal links. Depending on the nature of the content (searcher group profile), more current content may be more popular over the larger data set, and thus newer content would be weighted more overall.

Site Usability

From a crawler’s or the end user’s perspective, having a logical architecture and a quality end user experience is also at a premium. If similar searcher types embark on similar pathways and related actions (bookmark, print, navigate, and subscribe to RSS), then this will give greater value to those target pages within that community of searcher types. This also furthers the relevance profile.

Analytics

There is clearly a strong need for the use of analytics in understanding traffic flows, common pathways, bottlenecks, the paths to conversion, and much more. This data will be of immeasurable use in dealing with many of the factors that can affect Personalized Web Positioning. This issue is closely connected with psychology (particularly behavioural targeting).

Chapter 4

SEO Guide

This chapter presents a set of areas for consideration during the process of website optimization. The advice concerns only areas which may have a positive effect on the popularity of the website. It should be helpful in achieving a higher position in search rankings, which should increase the number of visitors. It should also make the website more attractive for users. This will probably decrease the bounce rate, which has a negative impact on the website in the personalization re-ranking process.

The areas covered in this guide do not include SEO techniques connected with external actions, that is, those that require contact with other websites, such as:

• linking (the acquisition of links), free or paid

• advertising

• presell pages¹

The listed methods are closely connected with generating spam. Because of this, they reduce the share of quality content on the Internet, so Internet surfers gain no benefit from them.

This chapter is based mainly on the information from [13], [15], [21] and [36].

1. Presell page – a page created only for SEO purposes. The text on such a page is only a surrounding for a link leading to the positioned website. The content has no value for a human reader because it is prepared only to look natural to crawlers, so that it is not filtered as spam.

4.1. Website Presentation in Google Search

4.1.1. Title

The title is the first piece of information about a particular website in a SERP. It is also one of the main factors affecting the website ranking. An example of such a title in HTML code looks like this:

<title>Jaguars, Jaguar Pictures, Jaguar Facts - National Geographic</title>

Such a title presented in a SERP looks as shown in figure 4.1.

Figure 4.1. Presentation of a website title in Google Search

These are the issues connected with website title, which are significant in SEO:

Length up to about 65 characters

Longer titles can also be indexed by the crawler, but a title of up to 65 characters is close to optimal and fits entirely in the SERP. Longer titles are shortened with an ellipsis.

Diversity of titles

Each page of the website presents slightly different information (e.g. product page, contact form etc.). The title should be prepared individually for each of them.

Keywords

There are 3 principles related to creating a title:

1. Keywords should be distributed across all the pages. Each page should be optimized for only 3–4 keywords. The front page title should contain the most general expression, titles of product pages should contain words characterizing the type of these products etc. Sticking to this rule is very important, because otherwise the pages of the website could be treated by the crawler as duplicated content.

2. The most important keywords should be placed at the beginning of the title.

3. Google can combine keywords from the title into different phrases. But those which appear one after another have the greatest impact on the position in the ranking. For this reason, a key phrase should not be separated.
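The title guidelines above lend themselves to a simple automated check. The following sketch is only an illustration: the function names are hypothetical, the 65-character limit is the approximate value quoted in this section rather than an official one, and the way a real SERP truncates titles is only approximated.

```python
MAX_TITLE_LENGTH = 65  # approximate limit quoted in the text, not an official value


def serp_title(title: str, limit: int = MAX_TITLE_LENGTH) -> str:
    """Approximate the SERP presentation: cut at the limit and add an ellipsis."""
    if len(title) <= limit:
        return title
    return title[:limit].rstrip() + "..."


def title_issues(title: str, key_phrase: str, limit: int = MAX_TITLE_LENGTH) -> list:
    """List the guideline violations found in a page title."""
    issues = []
    if len(title) > limit:
        issues.append("too long")
    # The key phrase must appear unseparated, ideally at the very beginning.
    position = title.lower().find(key_phrase.lower())
    if position == -1:
        issues.append("key phrase missing or separated")
    elif position > 0:
        issues.append("key phrase not at the beginning")
    return issues
```

For the National Geographic title shown earlier, `title_issues(title, "jaguar")` reports no issues, while a phrase such as "jaguar facts" is found but not at the beginning of the title.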

4.1.2. Description

The description is the second piece of information about a particular website, presented right after the title in a SERP. Such a description presented in a SERP looks as shown in figure 4.2.

Figure 4.2. Presentation of a website description in Google Search

Description presentation in SERP can be generated from following sources:

• description metatag, for example:

<meta name="description" content="Learn all you wanted to know about jaguars with pictures, videos, photos, facts, and news from National Geographic." />

• a fragment of the website content (in case the description metatag is too long or there is none in the source code)

Here are some tips on the page description in the metatag:

Length up to about 150 characters

Longer descriptions will not be presented in SERP as they have been written.

Diversity of descriptions

Just like titles, the description of a particular page should be slightly different from the others. It should be specific to the information presented on the page.

Keywords

The description should contain the keywords covered by the SEO strategy. When it does, the keywords will be shown in bold in the search results for query phrases based on these keywords. This should draw users’ attention to our website. At the same time, the description should be written in a way that encourages users to visit the website.
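The description tips can be checked automatically as well. The sketch below uses Python’s standard `html.parser` module; the class and function names are hypothetical, and the 150-character limit is the approximate value quoted above.

```python
from html.parser import HTMLParser

MAX_DESCRIPTION_LENGTH = 150  # approximate limit quoted in the text


class DescriptionParser(HTMLParser):
    """Collect the content of <meta name="description"> from an HTML page."""

    def __init__(self) -> None:
        super().__init__()
        self.description = None

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "meta" and (attributes.get("name") or "").lower() == "description":
            self.description = attributes.get("content") or ""


def description_report(html: str, keywords: list) -> dict:
    """Report presence, length and keyword coverage of the description metatag."""
    parser = DescriptionParser()
    parser.feed(html)
    text = parser.description
    if text is None:
        return {"present": False, "too_long": False,
                "missing_keywords": list(keywords)}
    lowered = text.lower()
    return {
        "present": True,
        "too_long": len(text) > MAX_DESCRIPTION_LENGTH,
        "missing_keywords": [k for k in keywords if k.lower() not in lowered],
    }
```

Running the report for every page of a website and comparing the collected descriptions would also reveal pages that share an identical description.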

4.1.3. Sitelinks

Sitelinks are links leading to other pages of the same website. They can be presented in SERPs in 2 ways:

1. Horizontally – 4 links in 1 row (presented in figure 4.3)

2. Vertically – 8 links in 2 columns

Figure 4.3. Presentation of a website sitelinks in Google Search

There is no manual way for publishers to force sitelinks to be presented in a SERP. It depends on how the website was indexed. But the crawler’s job can be made easier. There are two things which can be done:

1. First of all, the source code related to navigation on our website must be well designed. Its syntax must be very clear.

2. Prepare a sitemap of the website (e.g. in XML format). This issue will be described later.
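As a brief preview of the second point, a minimal XML sitemap can be generated with Python’s standard library. This is only a sketch: the function name `build_sitemap` is hypothetical, and only the mandatory `urlset`, `url` and `loc` elements of the sitemaps.org protocol are produced.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(page_urls: list) -> str:
    """Serialize a list of page URLs as a minimal XML sitemap."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for page_url in page_urls:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = page_url  # one <loc> per page
    return ET.tostring(urlset, encoding="unicode")
```

The resulting text would typically be saved as a file in the root directory of the website so that crawlers can discover all listed pages.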

4.2. Website Content

4.2.1. Unique content

The basis of proper content optimization is uniqueness. This means that the same text, or larger fragments of it, should not be reproduced on other websites or on different pages of our website.

The following tool can be used to verify the degree of uniqueness of our content:

http://www.copyscape.com

4.2.2. Keywords

It is very important to enable search engines to relate our website to a specific theme and specific keywords. To make this possible, keywords must be considered not only when designing the website title and description. They must also be contained in the website content.

When preparing the text for the website, it is suggested to stick to the following principles:

Repetition

Keywords should be repeated several times on every page. But it must not be forgotten that the text should be written primarily for users. The task is to find a compromise between text that is attractive for users and text that is good for SEO. Too high a density of keywords on a particular page can be treated by the search engine as abuse. In such a situation our website will be penalized by exclusion from the ranking.

Variations and synonyms

The website content will be more natural if the keywords are used in many (grammatical) variations. Modern search engines are also proficient enough to detect the use of synonyms. For this reason we can use, for example, the word "drug" in content being optimized for the keyword "medicine".

Location

Keywords should be located across the whole page with similar density. This will give a better positioning result than an accumulation of keywords, for example, only at the beginning of the page.
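The repetition and location principles can be quantified with a simple density measure. The sketch below is an assumption about how such a measure might be computed, not a formula used by any search engine; the function names are hypothetical and no density threshold is implied.

```python
import re


def keyword_density(text: str, keyword: str) -> float:
    """Share of the words in `text` that equal `keyword` (case-insensitive)."""
    words = re.findall(r"\w+", text.lower())
    if not words:
        return 0.0
    return words.count(keyword.lower()) / len(words)


def density_by_part(text: str, keyword: str, parts: int = 3) -> list:
    """Keyword density in up to `parts` consecutive word chunks of equal size."""
    words = re.findall(r"\w+", text.lower())
    size = max(1, -(-len(words) // parts))  # ceiling division
    chunks = [words[i:i + size] for i in range(0, len(words), size)]
    return [chunk.count(keyword.lower()) / len(chunk) for chunk in chunks]
```

Similar values in every part of the page indicate an even distribution, whereas a high value in the first part followed by zeros indicates an accumulation of keywords at the beginning of the page. Grammatical variations and synonyms are not counted by this simple word match.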

4.3. Source Code

A website's source code has no direct influence on its position in the search ranking. However, some errors can cause problems with proper indexing by search engine robots. For this reason it is worth ensuring that the code contains no errors and is compatible with current WWW standards.

A very useful code validation tool is provided by the World Wide Web Consortium (W3C). It can be found here:

http://validator.w3.org

4.3.1. Headers

HTML header tags (h1–h6) are very significant for proper indexing of the website content. Using them correctly is very important when designing a website. There are a couple of issues which must be considered from the SEO point of view.

Hierarchy

Header tags are designed to separate particular sections of a document. They must be used in the correct order and only when they are needed.

Repetition

The first-level header (h1 tag) should, by convention, occur only once in the whole document. Other headers can be used repeatedly.

Keywords

It is suggested to put keywords into header tags, because they have more "positioning power" than regular text. This power probably corresponds to the header hierarchy, so the most important keywords should be placed in the h1 tag.
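The hierarchy and repetition rules for headers can be verified automatically. The sketch below uses the standard html.parser module; the class and function names are hypothetical, and the two checks (exactly one h1, no skipped level) are one possible reading of the rules above, not an official validator.

```python
from html.parser import HTMLParser


class HeadingCollector(HTMLParser):
    """Collects the levels of h1-h6 tags in document order."""

    def __init__(self):
        super().__init__()
        self.levels = []

    def handle_starttag(self, tag, attrs):
        # Tag names arrive lowercased; "hr" is excluded by the digit check.
        if len(tag) == 2 and tag[0] == "h" and tag[1] in "123456":
            self.levels.append(int(tag[1]))


def heading_problems(html):
    """Return a list of human-readable problems with the header structure."""
    collector = HeadingCollector()
    collector.feed(html)
    levels = collector.levels
    problems = []
    if levels.count(1) != 1:
        problems.append(f"expected exactly one h1, found {levels.count(1)}")
    for previous, current in zip(levels, levels[1:]):
        if current > previous + 1:
            problems.append(f"h{previous} is followed by h{current} (level skipped)")
    return problems
```

An empty list means that the document passes both checks; each returned string describes one violation.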

4.3.2. Highlights

Keywords can be distinguished from the rest of the text by using the <strong> (bold) and <em> (italics) tags. In this way, keywords are highlighted for both users and crawlers. However, this should be done with restraint: not every occurrence of a keyword should be highlighted, only the most important ones.

4.3.3. Alternative texts

Documents often contain images. It is recommended to provide alternative texts for those images. It can be done in this way:

<img src="path/to/image" alt="alternative text" />

The alternative text is displayed on the screen when the browser cannot display the image (e.g. when it is unavailable on the server).

Alternative texts are also interpreted by search engine robots. This data is then used in image search (when the search engine has such an option).
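Images without an alternative text can be listed with a short script. This is a sketch under the assumption that an empty alt attribute counts as missing; the names MissingAltFinder and images_without_alt are hypothetical.

```python
from html.parser import HTMLParser


class MissingAltFinder(HTMLParser):
    """Records the src of every img tag whose alt text is absent or empty."""

    def __init__(self):
        super().__init__()
        self.missing = []

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <img ... /> also reach this handler.
        if tag == "img":
            attributes = dict(attrs)
            if not (attributes.get("alt") or "").strip():
                self.missing.append(attributes.get("src") or "")


def images_without_alt(html):
    """Return the src values of images that lack an alternative text."""
    finder = MissingAltFinder()
    finder.feed(html)
    return finder.missing
```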

4.3.4. Layout

A website that indexes well should have a clear and minimalistic layout. The content is the most important factor, so even the ratio of the amount of text to the amount of HTML code is significant. The higher this value is, the better and more valuable the website is from the search engine's point of view.
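The ratio of text to HTML code can be estimated as follows. The exact formula used by search engines is not known, so this sketch simply assumes the ratio of visible characters (outside script and style elements) to all characters of the document; the names are hypothetical.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text, skipping the contents of script and style."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.parts.append(data)


def text_to_html_ratio(html):
    """Visible text length divided by total document length (0.0 if empty)."""
    if not html:
        return 0.0
    extractor = TextExtractor()
    extractor.feed(html)
    extractor.close()
    # Collapse runs of whitespace so that indentation does not count as text.
    text = " ".join("".join(extractor.parts).split())
    return len(text) / len(html)
```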

4.4. Internal Linking

Internal linking is a quite important issue in website optimization. An internal link is a hyperlink which leads to another page of the same website. There are some recommendations connected with this link type.

4.4.1. Distribution

Each page should be reachable in 3 or 4 clicks at most. If not, the website navigation must be redesigned. Attention should be given especially to the links on the main page. The structure of the website must be clear.

This is also very important for the usability of the website. Complicated navigation can discourage users from continuing their visit.
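The click limit can be checked with a breadth-first search over the internal link graph. In this sketch the graph is assumed to be already available as a dictionary mapping each page to the pages it links to (for example collected by a crawler); the function names and the sample site are hypothetical.

```python
from collections import deque


def click_depths(links, start):
    """Minimum number of clicks from `start` to every reachable page."""
    depths = {start: 0}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depths:
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths


def pages_too_deep(links, start, limit=4):
    """Pages that need more than `limit` clicks or cannot be reached at all."""
    depths = click_depths(links, start)
    all_pages = set(links) | {t for targets in links.values() for t in targets}
    too_deep = {page for page, depth in depths.items() if depth > limit}
    unreachable = all_pages - set(depths)
    return sorted(too_deep | unreachable)


# Hypothetical site: a chain of pages plus one page nobody links to.
site = {
    "/": ["/a"],
    "/a": ["/b"],
    "/b": ["/c"],
    "/c": ["/d"],
    "/d": ["/e"],
    "/orphan": [],
}
problem_pages = pages_too_deep(site, "/")
```

Breadth-first search is used because it visits pages in order of increasing distance, so the first time a page is seen its depth is already minimal.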

4.4.2. Links anchors

A link anchor is the clickable text of a link. It is displayed to the user on a website instead of the plain URL, which is rather unreadable for humans. It looks like this:

<a href="some_url.html">Anchor text</a>

Anchors should describe the content of the pages their links lead to. If a link is located within other text, its anchor should match the context of that text. For example, it is not advised to write "click here", as was popular a couple of years ago.

4.4.3. Broken links

It is very important in website positioning to beware of links which lead to unavailable URLs. Such links are very annoying and discouraging for visitors.

A website with broken links will also be less valuable for search engines, because robots crawl the web using links. After indexing a page, a robot uses one of the links placed on that page to go to another page. When such a link is broken, the crawler can interrupt the indexing process. As a result, not every page of the website will be indexed.

4.4.4. Nofollow attribute

Nofollow is an HTML attribute value used to instruct some search engines that a hyperlink should not influence the link target's ranking in the search engine's index. This is an example of such a hyperlink:

<a href="some_url" rel="nofollow">Some website</a>

It is intended to reduce the effectiveness of certain types of search engine spam, thereby improving the quality of search engine results and preventing a particular website from being indexed as spam. The nofollow attribute is commonly used in outbound links (links which point to other websites), for example in paid advertising.

4.4.5. Sitemap

A sitemap is a list of pages of a website accessible to crawlers or users. This helps visitors and search engine bots find pages on the website.

Sitemap for users

A page can be prepared which contains links leading to all of the website's pages or only to the most important ones. Thanks to this, users having problems with navigation will be able to quickly find what they are looking for. An example of such a sitemap, located in a footer, is presented in figure 4.4.

Figure 4.4. Example of sitemap for visitor

Sitemap for robots

A sitemap for crawlers must be easy to process automatically. Such a sitemap is usually prepared as an XML document. This is what an example looks like:

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://www.example.com/</loc>
    <lastmod>2005-01-01</lastmod>
    <changefreq>monthly</changefreq>
    <priority>0.8</priority>
  </url>
  <url>
    <loc>http://www.example.com/catalog?item=12</loc>
    <changefreq>weekly</changefreq>
  </url>
  <url>
    <loc>http://www.example.com/catalog?item=73</loc>
    <lastmod>2004-12-23</lastmod>
    <changefreq>weekly</changefreq>
  </url>
  <url>
    <loc>http://www.example.com/catalog?item=74</loc>
    <lastmod>2004-12-23T18:00:15+00:00</lastmod>
    <priority>0.3</priority>
  </url>
  <url>
    <loc>http://www.example.com/catalog?item=83</loc>
    <lastmod>2004-11-23</lastmod>
  </url>
</urlset>

As can be seen, such a document contains some information about each link:

loc: URL of the particular page

lastmod: time of the last modification of the page

changefreq: average period between changes to the page

priority: priority with which the crawler should index the particular page

Crawlers welcome such information, and the publisher can benefit from faster indexing.

In most cases such documents are prepared using software tools such as:

http://www.xml-sitemaps.com/

After the sitemap document is prepared, the search engine must be notified of its existence through a special form.
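A sitemap in this format can also be generated by a short script instead of an external tool. The sketch below uses the standard xml.etree.ElementTree module; the function name build_sitemap and the input format (a list of dictionaries) are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
FIELDS = ("loc", "lastmod", "changefreq", "priority")


def build_sitemap(pages):
    """Build sitemap XML from dicts holding 'loc' and optional other fields."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for page in pages:
        url = ET.SubElement(urlset, "url")
        for field in FIELDS:
            if field in page:
                # ElementTree escapes characters such as "&" in URLs.
                ET.SubElement(url, field).text = str(page[field])
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


# Hypothetical pages of www.example.com.
sitemap_xml = build_sitemap([
    {"loc": "http://www.example.com/", "changefreq": "monthly", "priority": 0.8},
    {"loc": "http://www.example.com/catalog?item=12&page=2"},
])
```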

4.5. Addresses and Redirects

Besides the factors described previously, search engine ranking algorithms also consider the form of the indexed website's URLs and the information included in HTTP responses.

4.5.1. Friendly addresses

Search engines give a higher rank value to websites whose pages have URLs that are more readable for humans. For example, an address like this:

http://www.example.com/index.php?page=product&num=5

can be written in this way:

http://www.example.com/product/5

Such an effect can be achieved using mod_rewrite, a module of the Apache server which allows regular expression patterns to be created for mapping URLs to particular pages. Modern web frameworks, such as Django or Ruby on Rails, offer the same possibility.

Moreover, friendly addresses give another opportunity for placing keywords. For this reason, a page being optimized for a particular keyword should have this keyword in its URL. If the keyword is a phrase of a couple of words, it is suggested to separate them with dashes.
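The mapping from friendly addresses to internal parameters can be illustrated without any web server. The sketch below is not the mod_rewrite or Django API; the route table, the function names resolve and slugify, and the "category" route are hypothetical and only show the regular expression technique.

```python
import re

# Hypothetical route table: friendly URL pattern -> internal page name.
ROUTES = [
    (re.compile(r"^/product/(?P<num>\d+)$"), "product"),
    (re.compile(r"^/category/(?P<slug>[a-z0-9-]+)$"), "category"),
]


def resolve(path):
    """Translate a friendly path into the parameters of index.php, or None."""
    for pattern, page in ROUTES:
        match = pattern.match(path)
        if match:
            return {"page": page, **match.groupdict()}
    return None


def slugify(phrase):
    """Lowercase a keyword phrase and join its words with dashes."""
    words = re.findall(r"[a-z0-9]+", phrase.lower())
    return "-".join(words)
```

For instance, resolve("/product/5") yields the same parameters as the query string page=product&num=5, and slugify turns a multi-word keyword into a form suitable for a URL.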

4.5.2. Redirect 301

Redirect 301 is a permanent redirect from one address to another. After it is set up:

• visitors who type the old address into the browser's address bar will be redirected to the new one

• some search engines will replace the old address in their database with the new one

So it is very useful after a domain change.

It was said earlier that the content of a website should not be duplicated. It is often forgotten that allowing a website to be entered through several addresses has the same effect. Sometimes the same website can be reached via:

• example.com

• www.example.com

• example.com/index.html

• www.example.com/index.html

• example.com/default

• www.example.com/default

In such a case it must be decided whether the main address of the website will have the "www" prefix. If so, an .htaccess file with the following content should be placed in the main folder of the server:

RewriteEngine On
RewriteCond %{HTTP_HOST} ^example\.com$ [NC]
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]

Similarly, we can manage the redirection from the index.html file. The condition tests the original request line, so that the rule is not triggered again when the server internally maps the main address to index.html:

RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /index\.html\ HTTP/
RewriteRule ^index\.html$ http://www.example.com/ [R=301,L]

There are also many other possibilities which can be managed likewise.
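The same canonicalization logic can be expressed in application code, which is useful when the server configuration cannot be changed. This is a minimal sketch: the constants and function names are hypothetical, and the list of default documents follows the duplicate addresses listed above.

```python
from urllib.parse import urlsplit, urlunsplit

CANONICAL_HOST = "www.example.com"
DEFAULT_DOCUMENTS = ("/index.html", "/default")


def canonical_url(url):
    """Return the single preferred address for `url`."""
    parts = urlsplit(url)
    path = parts.path or "/"
    if path in DEFAULT_DOCUMENTS:
        path = "/"
    host = parts.netloc.lower()
    if host == CANONICAL_HOST[len("www."):]:
        host = CANONICAL_HOST  # add the missing "www" prefix
    return urlunsplit((parts.scheme, host, path, parts.query, ""))


def redirect_for(url):
    """Return (301, target) when `url` is not canonical, otherwise None."""
    target = canonical_url(url)
    return None if target == url else (301, target)
```

A request handler would call redirect_for on the requested address and answer with status 301 and a Location header whenever a target is returned.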

4.6. Other Issues

There are a couple of other things which have some influence on the website ranking.

4.6.1. Information for robots

Sometimes publishers do not want robots to index some of a website's pages, but still want to keep them available for regular visitors. For example:

• results from internal search engines

• data sort results

• print version of pages

• pages which should not be indexed, like login page to administration panel

To manage this issue, a robots.txt file can be prepared with content like this:

User-agent: *
Disallow: /admin-panel/

It should prevent robots from crawling pages whose URLs start with www.example.com/admin-panel/.
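The effect of such a file can be tested before it is published. The sketch below feeds the rules shown above to the standard urllib.robotparser module; the helper name is_allowed and the tested URLs are hypothetical.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: *
Disallow: /admin-panel/
"""

# Parse the rules from memory instead of downloading them from a server.
parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def is_allowed(url, user_agent="*"):
    """True when a robot identifying as `user_agent` may fetch `url`."""
    return parser.can_fetch(user_agent, url)
```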

4.6.2. Performance

One of the most recent factors introduced into the Google search engine algorithm is website performance. Google promotes websites which take little time to download. This factor is not as significant as internal linking or quality of content: big information portals are very complex, so they cannot be downloaded as fast as, e.g., a small blog, yet valuable information remains the most important factor.

However, good performance can increase the rank value of a website compared to another one with similar content which is less efficient.

Several tools can be used to improve website performance, such as PageSpeed by Google:

http://code.google.com/speed/page-speed

It analyses the efficiency of downloading the website and gives some tips on how to improve performance. The suggestions most commonly given by this software are presented next.

Gzip compression

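Modern web browsers support the gzip compression mechanism, which reduces the size of website files sent over the network. It is most effective for text files (HTML, CSS and Javascript files), because most image formats are already compressed. If there is such a possibility, it is recommended to use it.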
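The gain from gzip can be estimated locally with the standard gzip module. This is only an illustration: the sample style sheet is artificial and highly repetitive, so real files will usually compress less well.

```python
import gzip


def compression_ratio(data):
    """Size of the gzip-compressed data divided by the original size."""
    return len(gzip.compress(data)) / len(data)


# Artificial style sheet: the same rule repeated 200 times.
css = ("body { margin: 0; padding: 0; font-family: sans-serif; }\n" * 200).encode("utf-8")
ratio = compression_ratio(css)
```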

Number of DNS lookups

Thanks to the DNS caching mechanism [23], there is no need to look up the IP address matching a particular domain several times. For this reason, when every file used by the website is located on the same server (or on another server in the same domain), there is only one DNS lookup during the download process.

So placing media files (images, CSS, Javascript) on a different domain without a clear need should be avoided.

External files

Information commonly located in external files, like CSS style sheets or Javascript, can also be placed inside the HTML document. However, this should be avoided, because it makes parsing of the source code more complex. Thus the browser needs more time to display the website on the screen.

4.7. Summary

The tips presented in this chapter should significantly increase the rank value of every website. A higher ranking in the search engine brings more inbound traffic. In other words, the website will gain more popularity.

Making the website more attractive to visitors should lead to better results of personalization re-ranking. The assumptions about the impact of personalized search results on the global ranking are very likely to hold. So improving the quality of a website on the basis of the presented guide should increase the website's ranking in general, both through the impact of personalization and through collecting inbound links as an effect of the increase in popularity.

Chapter 5

The System for Global Personalization

The goal of this chapter is to propose a method for improving a website's search ranking by affecting the personalization mechanism in a search engine. The idea of this method is to generate artificial behavioural data. The author of this thesis is a co-author of the article [19] on which this chapter is based.

In section 2.2 of this thesis it has been shown that a lot of data goes into Google and a lot of useful manipulated data comes out. But we can only guess what happens in between, or try to learn from observation of the data coming out of Google. Evans wrote [10] that identifying the factors involved in a search engine ranking algorithm is extremely difficult without a large dataset of millions of SERPs and extremely sophisticated data-mining techniques.

That is why observation, experience and common sense are the main sources of knowledge on Search Engine Optimization (SEO) methods. It was according to this knowledge that Search Engine Ranking Factors [11] was created. Its latest edition assumes that traffic generated by the visitors of a website has 7% of importance in Google's evaluation of the website's value. It is, after links to the specific website and its content value, the most significant factor in the website evaluation process. On the basis of the previous editions of the ranking, one can notice that the importance of this factor is increasing.

Because all of these are only reasonable assumptions, the intention is to evaluate the validity of the described factor in web positioning efforts. For this purpose we need a simulation tool which will generate the necessary human-like traffic on a tested website. The tool is going to be a multi-agent system (MAS) which will imitate real visitors of websites.

5.1. Problems to Solve

Fig. 5.1 presents the main reason why the system must be distributed. A few queries sent to Google in quick succession from the same IP address are detected by Google and treated as abuse. Google suspects automated activity and requires completion of a captcha form in order to continue searching. In a distributed system, the queries would be sent from many different IP addresses, which should prevent Google from treating them as abuse. This issue cannot be solved by using a set of public proxy servers. Google has probably put them on its blacklist: every single query to Google via such a proxy server leads to the same end, a captcha request.

Figure 5.1. Information displayed by Google on the abuse detection

What is more, after Tuzhilin [35] we can say that Google puts a lot of reasonable effort into filtering invalid clicks on advertisements. There is a big chance that Google uses some of those mechanisms in the analysis of web traffic. This is the reason why the way behavioural data is generated should be our concern. Artificial web traffic, once recognized, could be treated by Google as abuse and punished (with a decline of the website's position).

5.2. Objectives

5.2.1. Web Positioning

The main goal of the system is to improve the position of a website by generating traffic related to the website. The system should care only about activity which can be visible to Google. This means that there is no need to download all the content of a particular website; it would only waste bandwidth. The system should only send requests to the Google services used by a particular website, for example:

• links to the website on SERPs,

• Google Analytics scripts,

• Google Public DNS queries,

• Google media embedded on the website, like AdSense advertisements, maps, YouTube videos, calendars etc.

5.2.2. Cooperation

The whole idea of the system is to spread positioning traffic over IP addresses from all around the world. As a result of this distributed character, the system requires a large group of cooperating users. Nobody will use the system if it brings them no benefit. A mechanism must therefore be introduced which lets the system's users share their Internet connections in order to help each other in web positioning. What is more, the mechanism must treat all users fairly and equally: it should not allow anyone to benefit without making any contribution.

5.2.3. Control

According to [36], web positioning is not a single action but a process. It should be possible to control this process; otherwise it could be destructive instead of improving the website's position. For this reason, the system should allow users to:

• control the impact of the system activity on their websites,

• check the current results of the system activity (changes in the website positionon SERPs),

• check the current state of the website in the web positioning process.

5.3. Architecture

Fig. 5.2 presents the architecture of the system, which takes into consideration all the specified problems and objectives. The server is necessary to control the whole process of generating web traffic according to the specified algorithm. It orders clients to start generating traffic on the specified websites. It also receives from clients the number of requests sent to particular Google services on behalf of each website. The database serves as storage for process statistics. They can be presented to clients via a web interface. They are also useful to the server for creating orders in accordance with the algorithm.

Clients are the agents of the presented MAS. They take orders from the server to process a particular website registered in the database. Processing a website means mimicking a real visitor. The client performs this autonomously, using the visitor session algorithm described in the next section.

[Diagram components: Server, Database, Clients Cloud (Client 1, Client 2, Client 3), Website 1, Website 2, Website 3, Google Search, All Google Services]

Figure 5.2. System architecture

5.4. Visitor Session Algorithm

According to many studies [1], [6], [7], [10], [20] and [36], more than half of a website's visits come from SERPs. That is why starting a single visitor session (the sequence of requests concerning a single website registered in the system) by querying Google Search sounds reasonable. However, this holds only if the website under consideration appears on one of the first few SERPs. Otherwise, the visitor session should be started directly on the processed website, or from an incoming link, if such a link actually exists. Because Google is likely to take action to detect abuse, the visitor session should be as human-like as possible. The analysis of real users' web traffic [29] is very useful here. According to it, a typical user:

• visits about 22 pages in 5 websites in one sitting,

• follows 5 links before jumping to a new website,

• spends about 2 hours per session and 5 minutes per page.

These statistics clearly indicate that a typical visit session concerns a website of good quality. A visit to a poor website would be aborted after as little as a few seconds. Such a visit could have a negative impact on Google's evaluation of the website's quality.

[Diagram components: Server, Database, Client, Website, Google Search, All Google Services, Visit session; arrows numbered 1 to 7]

Figure 5.3. Visitor session algorithm

1 – Server retrieves from the database information about the next website to be processed.

2 – Task assignation to the client.

3 – Client starts the visitor session.

4 – Searching on SERPs for a link to the processed website.

5 – If a link has been found: click on the link; otherwise send a direct request.

6 – Processing the visit session.

7 – Request to the server for another website to process.

5.5. Task Assignation Algorithm

The task assignation algorithm helps the server to build a queue of registered websites ordered by visitor session priority. The website with the highest priority value is the next one for which a visitor session is started. In other words, a client always receives the website with the highest priority value to process.

The priority value PV is calculated using the function:

$$PV(\alpha) = r(\alpha) \cdot \sqrt{\frac{t(\alpha) \cdot v(\alpha)}{T(\alpha)}} \tag{5.1}$$

where

α: record in the system (a website with a phrase for web positioning)

r(α): current position of α in the search engine ranking (0 if α is not in the ranking)

t(α): time since the end of the last visitor session on α (in seconds)

v(α): number of visitor sessions made by the client of the owner of α

T(α): time since the registration of α in the system (in days)

The presented function gives the highest "power" to the ranking factor. The reason is that websites with a high ranking should have more real visitors, so the system's efforts are not so crucial for their popularity.

The time of participation in the system is not very significant. Novice participants have the same chance to gain attention for their websites as senior ones. However, the function promotes continuous activity of the clients.

It is also worth considering the possibility of dynamically modifying the weights of individual factors depending on the results. Because the queue of websites is built by the server, it is possible to change the whole function while the system is running.
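The priority function and the resulting queue can be sketched in a few lines. The sketch assumes that formula (5.1) is the position r multiplied by the square root of t·v/T; the Record class, its field names, the sample values and the handling of a record registered zero days ago are hypothetical additions.

```python
import math
from dataclasses import dataclass


@dataclass
class Record:
    name: str
    position: int                      # r: position in the ranking, 0 if absent
    seconds_since_last_session: float  # t
    owner_sessions: int                # v: sessions made by the owner's client
    days_registered: float             # T


def priority(record):
    """Priority value PV of a record according to formula (5.1)."""
    if record.days_registered <= 0:
        return 0.0  # assumption: the thesis does not define PV for T = 0
    return record.position * math.sqrt(
        record.seconds_since_last_session
        * record.owner_sessions
        / record.days_registered
    )


def build_queue(records):
    """Records ordered from the highest to the lowest priority value."""
    return sorted(records, key=priority, reverse=True)


# Hypothetical records with identical activity but different positions.
records = [
    Record("site-b", 2, 3600, 4, 16),
    Record("site-a", 10, 3600, 4, 16),
    Record("site-c", 0, 3600, 4, 16),
]
queue = build_queue(records)
```

With identical activity, the record at position 10 receives a priority five times higher than the record at position 2, which reflects the dominant role of the ranking factor described above.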

5.6. Proof Study

The presented system requires a large number of users to work properly. Otherwise, the generated traffic would not be distributed enough and thus would look unnatural. As was shown, series of queries from a single source are seen as abuse. Unfortunately, the resources of the author of this thesis were insufficient for this purpose. However, a simulation was performed in order to test the proposed concept.

5.6.1. Tools

The idea was to use the Tor application (http://www.torproject.org) to make a single host (the author's computer) generate distributed traffic. In this way, the behavioural data of one real user could be seen by the search engine as multi-user traffic.

Tor is free software which enables Internet anonymity by thwarting network traffic analysis. Tor aims to conceal its users' identity and their network activity. Operators of the system run an overlay network of onion routers which provides anonymity of network location as well as anonymous hidden services.

Users of the Tor network run an onion proxy on their machines. The Tor software periodically negotiates a virtual circuit through the Tor network. An application such as a browser may be pointed at Tor, which then multiplexes the traffic through a Tor virtual circuit. Once inside the Tor network, the encrypted traffic is sent from one host to another, ultimately reaching an exit node, at which point the decrypted packet is available and is forwarded on to its original destination. Viewed from the destination, the source of the traffic appears to be the Tor exit node.

As figure 5.4 shows, Tor has become quite popular, so its network involves a large number of users. This makes Tor fit the objective of this study. The Mozilla Firefox browser was used, connected to Tor. Additionally, the iMacros plug-in was installed in order to automate the execution of visitor sessions.

Google Analytics (shown in figure 2.2) was used to analyse the behavioural data received by Google during the study. It was installed on every examined website.

Figure 5.4. Tor interface screen

5.6.2. Results

Distributing the traffic succeeded. After the Google Search main page (www.google.com) is opened, the server redirects to the domain belonging to the country in which the Tor exit node of the particular session is located. For example, when the exit node was in Germany, the Google server redirected the browser from google.com to google.de. The Google Search page was displayed in the corresponding language, in spite of the fact that the browser's default language setting had been removed. After visits to the examined pages, Google Analytics also indicated that the source of the visits was not Poland (where the study was actually conducted) but the countries of the exit nodes.

However, routing the traffic through the distributed Tor network turned out to be an insufficient solution. Firstly, traffic routed through Tor is significantly slowed down. From time to time there were even difficulties with downloading the complete search engine page. What is more, despite the large number of visit sources, there is still only one browser and one real user. Because of this, the simulation could only imitate one signed-in user or a group of signed-out users.

Signed-in user

In the first case there is essentially no difference between visiting through the Tor proxy network and visiting directly. As was described, Tor is a tool for concealing the identity of a user. But after signing in to a Google Account, the identity is evident. From Google's point of view, such a visit is seen as a regular user travelling very quickly all around the world (metaphorically speaking). But it is still only one user, and the personalized search results applied to this user should not be globally significant.

Group of signed-out users

As was described earlier, Google introduced the personalization mechanism not only for users with a Google Account. There is also personalization of search results for users with no such account. It is based on cookies stored in the user's browser for up to 180 days, which contain information about past search activities. Cookies, however, are related not to a specific user but to a browser. In this case, search results are re-ranked not for the person but rather for the particular computer which this person uses.

The simulation system uses only one browser, so there was no possibility to evaluate the impact of personalization re-ranking on the search ranking from the global perspective.

Disabling the storing of cookies in the browser makes personalization impossible, because there is no way to relate past search queries to a particular user. Moreover, a browser with blocked cookies is rather rare nowadays. Therefore the Google search engine is rather suspicious about traffic with blocked cookies, and for such requests it serves the "Sorry page" (figure 5.1).

5.7. Summary

Generating artificial traffic on the Internet does not seem very praiseworthy, as it is dangerously close to spam and introduces information noise into visitor statistics. On the other hand, it is no worse than other SEO activities like linkbaiting.

After [6], today's search engines mainly use link-popularity metrics to measure the "quality" of a page. This is the main idea of the PageRank algorithm [3]. This fact causes the "rich-get-richer" phenomenon: more popular websites appear higher on SERPs, which brings them more popularity.

Unfortunately, this is not very beneficial for new, unknown pages which have not gained popularity yet. It is possible that these websites contain more valuable information than the popular ones. Despite this, they are ignored by search engines because of the small number of links. These sites, in particular, need SEO efforts. Classic techniques will probably be more effective for them than the one presented in this paper.

Nevertheless, the methods presented in this article are likely to speed up web positioning, because web traffic can be noticed by search engines immediately. This is quite the opposite of crawlers, for which it can take much longer to find links to the considered websites and improve their ratings in the search engine.

Of course, the best results should be obtained by using the presented technique together with classical methods. This would mimic the gain of popularity in the most natural way. The number of links leading to a website usually rises linearly with the traffic and the number of visitors. However, the present-day SEO market focuses merely on classical methods.

What is more, the presented system could be used not only in SEO. It could also be very useful in other areas. Nowadays, performance testing of web applications is quite a challenging issue. Of course, there are some solutions for that, like the JMeter application (http://jakarta.apache.org/jmeter). However, the presented MAS would generate more natural traffic, so it could better simulate users' behaviour "in real life".

To sum up, the aim of this study is to show that methods based on the manipulation of dynamic data are worth involving in SEO efforts. Moreover, these methods do not increase the amount of spam seen by Internet users, in contrast to classical methods, some of which involve generating static data: links and texts with no value for end users.

Chapter 6

Conclusion

Google [13] says that in the future users will have a much greater choice of services with better, more targeted results. For example, a search engine should be able to recommend books or news articles that are particularly relevant, or jobs that an individual user would be especially well suited to.

Personalization should, and likely will have a big impact on the way people search, whatpublishers learn about their intended audiences and measuring the effectiveness of SEOcampaigns – especially to SEO firms using ranking reports as one way of measuringthe efficacy of their efforts.

No longer can the SEO practitioner think simply in terms of ranking for a main index query result. This is because there are so many other potential rankings for documents that are not as readily available in the traditional sense. Theoretically, rankings could seemingly drop a few places while traffic actually increases, due to a niche crowd following that is assisting your rankings via personalized search activities and popularity among a given subset of users.

While we are left to speculate about search engine behaviour and observe the changinglandscape, there are some steps that an SEO professional or any website owner cantake while anticipating the effects of personalization:

• learn about Social Networking Theory and Online Social Networks

• recognize and share with clients the diminishing value of ranking reports

• aim towards measuring results and conversions in a meaningful manner from logfile analysis and Web analytics tools

• find ways to learn more about your intended audiences and existing customers

Comparing selected websites on the SERPs of one user with those of another user issuing the same query could be very telling. Although the search engines will not share their strategies, it is clear that this type of analysis is already used elsewhere on the Web. Consider, for example, the recommendations that Internet shops offer when people perform searches at the store ("people who purchased this book were also interested in..."). In the same way, a search engine can recommend pages selected by other users who searched using the same terms.
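A comparison of two users' result lists for the same query can be sketched as follows. This is only an assumed way of quantifying the difference, not a method used in the experiments: it reports the Jaccard overlap of the two lists and the position shift of every URL that appears in both. The URLs and the function names `serp_overlap` and `rank_shifts` are hypothetical.

```python
def serp_overlap(serp_a, serp_b):
    """Jaccard overlap of two result lists: |A intersect B| / |A union B|."""
    set_a, set_b = set(serp_a), set(serp_b)
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 1.0


def rank_shifts(serp_a, serp_b):
    """For URLs present in both lists, return position in A minus position in B."""
    pos_b = {url: i for i, url in enumerate(serp_b, start=1)}
    return {
        url: i - pos_b[url]
        for i, url in enumerate(serp_a, start=1)
        if url in pos_b
    }


# Hypothetical top results seen by two different users for the same query.
user_a = ["a.example", "b.example", "c.example", "d.example"]
user_b = ["b.example", "a.example", "e.example", "c.example"]

overlap = serp_overlap(user_a, user_b)
shifts = rank_shifts(user_a, user_b)
print(f"overlap: {overlap:.2f}")
print(f"rank shifts (A minus B): {shifts}")
```

A low overlap or large shifts between users who differ only in search history would suggest that personalization, and not the query alone, is shaping the ranking.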

In attempting to provide personalized search results, the focus of search engines' efforts has shifted from matching keywords to knowing more about the true interests of searchers. Keyword matching still plays a role in what search engines do when returning results, but information gathered from those searchers is playing an increasing role in the results they see.

Particularly interesting is how this personalization can be used in the global perspective. Information collected from a large number of interactions between users and search engines can be aggregated, for example which pages people click when faced with a list of search results. If the vast majority of those searching for "jaguar" choose pages about the animal, it would make sense to show more pages with animal facts in search results and fewer pages about cars.
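The "jaguar" example can be sketched as a simple aggregation. The code below is an illustrative assumption about how aggregated clicks might be turned into a mix of results, not a description of how any search engine works. It counts the topic of each clicked page for a query and divides a fixed number of result slots in proportion to those counts, using the largest-remainder method. The click log, the topic labels and the function names are hypothetical.

```python
from collections import Counter

# Hypothetical aggregated click log: (query, topic of the page the user clicked).
clicks = [
    ("jaguar", "animal"),
    ("jaguar", "animal"),
    ("jaguar", "car"),
    ("jaguar", "animal"),
    ("jaguar", "animal"),
]


def topic_counts(click_log, query):
    """Count clicked topics for one query across all users."""
    return Counter(topic for q, topic in click_log if q == query)


def allocate_slots(counts, slots=10):
    """Split `slots` result positions across topics in proportion to clicks."""
    total = sum(counts.values())
    if total == 0:
        return {}
    # Integer arithmetic avoids floating-point rounding surprises.
    alloc = {topic: n * slots // total for topic, n in counts.items()}
    remainders = {topic: n * slots % total for topic, n in counts.items()}
    leftover = slots - sum(alloc.values())
    # Hand any remaining slots to the topics with the largest remainders.
    for topic in sorted(remainders, key=remainders.get, reverse=True)[:leftover]:
        alloc[topic] += 1
    return alloc


counts = topic_counts(clicks, "jaguar")
slots = allocate_slots(counts, slots=10)
print(slots)
```

With four of five clicks going to animal pages, the sketch assigns eight of ten slots to animal pages and two to car pages.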

However, the fact that influencing the personalization mechanism in the way presented in Chapter 5 is not easy seems to be quite a positive conclusion. Personalization is a big step towards better filtering of spam in search results, and such a method, if used by SEO marketers, could reduce the positive effect of global personalization. Unfortunately, there are already some indications that a kind of "SEO NetBot" is under construction, so in the near future the personalization mechanism may become vulnerable to manipulation.

List of Figures

2.1 Google Web History panel
2.2 Google Analytics main panel
2.3 Google Maps example screen
2.4 Google Search main screen
2.5 Example of the search result
2.6 Activity diagram of a visit session
2.7 Search results of the query sent via host located in USA
2.8 Search results of the query sent via host located in Germany
2.9 Search results personalized by user's search history

4.1 Presentation of a website title in Google Search
4.2 Presentation of a website description in Google Search
4.3 Presentation of a website sitelinks in Google Search
4.4 Example of sitemap for visitor

5.1 Information displayed by Google on the abuse detection
5.2 System architecture
5.3 Visitor session algorithm
5.4 Tor interface screen

Bibliography

[1] Bifet A., Castillo C., Chirita P., Weber I., An Analysis of Factors Used in Search Engine Ranking. In First International Workshop on Adversarial Information Retrieval on the Web, pp. 48–57. Lehigh University, Bethlehem (2005)

[2] Blankson S., Meta Tags, Optimising Your Website for Internet Search Engines. Blankson Enterprises Limited (2007)

[3] Brin S., Page L., The Anatomy of a Large-Scale Hypertextual Web Search Engine. World Wide Web 7, 107–117 (1998)

[4] Chirita P.A., Firan C., Nejdl W., Summarizing local context to personalize global web search. Information and Knowledge Management (2006)

[5] Carterette B., Jones R., Evaluating Search Engines by Modeling the Relationship Between Relevance and Clicks. Advances in Neural Information Processing (2007)

[6] Cho J., Roy S., Impact of Web Search Engines on Page Popularity. World Wide Web 13, 20–29 (2004)

[7] Chu H., Rosenthal M., Search Engines for the World Wide Web, A Comparative Study and Evaluation Methodology. American Society for Information Science 33, 127–135 (1996)

[8] Dou Z., Song R., Wen J., A large-scale evaluation and analysis of personalized search strategies. Research and Development in Information Retrieval 33 (2007)

[9] Dover D., The Comprehensive List of All the Data Google Admits to Collecting from Users. http://www.seomoz.org/user_files/google-user-data/SEOmoz-Google-User-Data.pdf

[10] Evans M.P., Analysing Google rankings through search engine optimization data. Internet Research 17(1), 21–37 (2007)

[11] Fishkin R., Search Engine Ranking Factors 2009. http://www.seomoz.org/article/search-ranking-factors

[12] Google Inc., The Google Privacy Policy. http://www.google.com/privacypolicy.html

[13] Google Inc., The Official Google Blog. http://googleblog.blogspot.com

[14] Google Inc., Web Search Help. http://www.google.com/support/websearch

[15] Gryszko M., Darmowy poradnik o podstawach optymalizacji stron WWW. http://www.lexy.com.pl (in Polish)

[16] Harry D., The Fire Horse Guide to Google Personalized Search For Search Marketers. http://www.huomah.com/search-engines/learn-seo/fire-horse-guide-to-personalized-search.html

[17] Jeh G., Widom J., Scaling personalized web search. World Wide Web, 271–279 (2003)

[18] Joachims T., Optimizing search engines using clickthrough data. Knowledge discovery and data mining 8, 133–142 (2002)

[19] Kowalski P., Król D., An Approach to Evaluate the Impact of Web Traffic in Web Positioning. KES-AMSTA 2010, LNAI 6071, Springer: 380–389 (2010)

[20] Lawrence S., Giles C.L., Searching the World Wide Web. Science 280, 98–100 (1998)

[21] Ledford J.L., SEO, Search Engine Optimization Bible. Wiley Publishing, Inc., Indianapolis (2008)

[22] Liu F., Yu C., Meng W., Personalized web search by mapping user queries to categories. Information and Knowledge Management, 558–565 (2002)

[23] Liu C., Albitz P., DNS and BIND, Fifth Edition. O’Reilly Media (2006)

[24] Micarelli A., Gasparetti F., Sciarrone F., Gauch S., Personalized search on the world wide web. The Adaptive Web, Lecture Notes in Computer Science 4321, 195–230 (2007)

[25] Microsoft Corporation, Bing Search Blog. http://www.bing.com/toolbox/blogs/search/default.aspx

[26] Pant G., Srinivasan P., Menczer F., Crawling the Web. In Levene, M., Poulovassilis, A. (eds.) Web Dynamics, pp. 153–178. Springer-Verlag (2004)

[27] Pretschner A., Gauch S., Ontology based personalized search. Tools with Artificial Intelligence 11, 391–398 (1999)

[28] Qiu F., Cho J., Automatic identification of user interest for personalized search. World Wide Web, 727–736 (2006)

[29] Qiu F., Liu Z., Cho J., Analysis of User Web Traffic with a Focus on Search Activities. Web and Databases 8, 103–108 (2005)

[30] Shen X., Tan B., Zhai C., Implicit user modeling for personalized search. Information and Knowledge Management, 824–831 (2005)

[31] Spereta M., Gauch S., Personalizing Search Based on User Search Histories. In Proceedings of WI '05, pages 622–628 (2005)

[32] Sugiyama K., Hatano K., Yoshikawa M., Adaptive web search based on user profile constructed without any effort from users. World Wide Web 12, 675–684 (2004)

[33] Teevan J., Dumais S.T., Horvitz E., Beyond the commons, Investigating the value of personalizing Web search. The Workshop on New Technologies for Personalized Information Access (2005)

[34] Teevan J., Dumais S.T., Horvitz E., Personalizing search via automated analysis of interests and activities. Research and Development in Information Retrieval 31, 449–456 (2005)

[35] Tuzhilin A., The Lane’s Gifts v. Google Report. http://googleblog.blogspot.com/pdf/Tuzhilin_Report.pdf

[36] Walter A., Building Findable Web Sites, Web Standards SEO and Beyond. New Riders, Berkeley (2008)

[37] World Wide Web Consortium, Hypertext Transfer Protocol - HTTP/1.1.

[38] Yahoo! Inc., Yahoo! Search Blog. http://www.ysearchblog.com/

[39] Yi X., Raghavan H., Leggetter C., Discovering Users' Specific Geo Intention in Web Search. World Wide Web (2009)