robots.txt file generator


Page 1: Robots.txt file generator

What is a robots.txt file? The robots exclusion protocol (REP), or robots.txt, is a text file webmasters create to instruct robots (typically search engine robots) how to crawl and index pages on their website. In short, website owners use the /robots.txt file to give instructions about their site to web robots; this is called the Robots Exclusion Protocol.
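As a sketch of how a well-behaved crawler consults this file, Python's standard urllib.robotparser module can check whether a URL may be fetched (the example.com URL and the /about-us.html path are placeholders, not a real site):

from urllib.robotparser import RobotFileParser

# Point the parser at the site's robots.txt (example.com is a placeholder).
rp = RobotFileParser()
rp.set_url("https://www.example.com/robots.txt")
rp.read()  # download and parse the file

# Ask whether a given user agent may fetch a given URL.
print(rp.can_fetch("*", "https://www.example.com/about-us.html"))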

Page 2: Robots.txt file generator

How to create a robots.txt file for your website?

Step 1: Go to the following website: http://tools.seobook.com/robots-txt/generator/

You will see the following screen:

Page 3: Robots.txt file generator

Step 2: Suppose you don't want robots to have access to the about-us page of your website. Then just select /about-us.html from your website, as shown in the image below:

Page 4: Robots.txt file generator

Paste that path into the files or directories field and click Add; your robots.txt will be ready, as shown below:
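For the /about-us.html example, the generated file should look something like this (assuming the generator applies the rule to all robots, i.e. User-agent: *):

User-agent: *
Disallow: /about-us.html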

Copy this code into Notepad and save it as robots.txt in your site's root folder.

Very important point: the file must be named exactly robots.txt (all lowercase) and must live in the root folder of your site; robots only look for it at /robots.txt.

Page 5: Robots.txt file generator

Step 3: Then upload the robots.txt file to your website using the FileZilla FTP client:
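If you prefer scripting the upload instead of using the FileZilla interface, here is a minimal sketch using Python's standard ftplib module (the host name and credentials are placeholders; replace them with your own):

from ftplib import FTP

# Placeholder host and credentials; replace with your own FTP details.
ftp = FTP("ftp.example.com")
ftp.login("username", "password")

# Upload robots.txt to the site's root directory in binary mode.
with open("robots.txt", "rb") as f:
    ftp.storbinary("STOR robots.txt", f)
ftp.quit()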

Page 6: Robots.txt file generator

Step 4: Check whether the file has been uploaded by typing /robots.txt at the end of your website's URL:

Your robots.txt file has been uploaded to your website successfully.
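You can also verify the upload programmatically; here is a minimal sketch using Python's standard urllib (the example.com URL is a placeholder for your own site):

from urllib.request import urlopen

# Fetch the live file and print its contents (example.com is a placeholder).
with urlopen("https://www.example.com/robots.txt") as resp:
    print(resp.read().decode("utf-8"))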

Page 7: Robots.txt file generator

You can also write the code directly in Notepad and save the file as robots.txt if you don't want to use the online tool.

Important note: The "/robots.txt" file is a text file, with one or more records. It usually contains a single record looking like this:

User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/
Disallow: /~joe/

In this example, three directories are excluded. Note that you need a separate "Disallow" line for every URL prefix you want to exclude; you cannot say "Disallow: /cgi-bin/ /tmp/" on a single line. Also, you may not have blank lines in a record, as they are used to delimit multiple records.

Note also that globbing and regular expressions are not supported in either the User-agent or Disallow lines. The '*' in the User-agent field is a special value meaning "any robot".

Specifically, you cannot have lines like "User-agent: *bot*", "Disallow: /tmp/*" or "Disallow: *.gif".
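To illustrate the prefix-matching behaviour described above, here is a minimal sketch using Python's standard urllib.robotparser (the example.com URLs are placeholders):

from urllib.robotparser import RobotFileParser

# Feed an example record directly to the parser, line by line.
rp = RobotFileParser()
rp.parse([
    "User-agent: *",
    "Disallow: /tmp/",
])

# Disallow lines match by URL prefix, not by glob or regex.
print(rp.can_fetch("*", "http://example.com/tmp/file.gif"))  # False
print(rp.can_fetch("*", "http://example.com/other.gif"))     # True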

What you want to exclude depends on your server. Everything not explicitly disallowed is considered fair game to retrieve. Here follow some examples:

Page 8: Robots.txt file generator

Some examples:

1) To exclude all robots from the entire server:

User-agent: *
Disallow: /

2) To allow all robots complete access:

User-agent: *
Disallow:

(or just create an empty "/robots.txt" file, or don't use one at all)

3) To exclude all robots from part of the server:

User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/
Disallow: /junk/

Page 9: Robots.txt file generator

4) To exclude a single robot:

User-agent: BadBot
Disallow: /

5) To allow a single robot:

User-agent: Google
Disallow:

User-agent: *
Disallow: /

6) To exclude all files except one:

This is currently a bit awkward, as there is no "Allow" field. The easy way is to put all files to be disallowed into a separate directory, say "stuff", and leave the one file in the level above this directory:

User-agent: *
Disallow: /~joe/stuff/

7) Alternatively, you can explicitly disallow all disallowed pages:

User-agent: *
Disallow: /~joe/junk.html
Disallow: /~joe/foo.html
Disallow: /~joe/bar.html

Page 10: Robots.txt file generator

THANK YOU