
How to Make a Robots.txt File

Robots.txt is a text file that tells search engines which parts of your website should not be indexed.

A simple robots.txt file is built from three items:

  1. Comments: Lines starting with the # character are treated as comments. You can place comments at the top of the file; see the example below for details.
  2. User-agent: The User-agent line specifies the name of the search engine robot. You can either name a specific robot or use the wildcard character "*" to match all robots.
  3. Disallow: The Disallow directive names the folder or file that you do not want the specified robot to index. All paths should start from the root of the website.

Here is a sample robots.txt file:

# Robots.txt file for http://www.googlehelpline.com/
# Contact webmaster@googlehelpline.com

User-agent: *
Disallow: /myalbum
Disallow: /foo.html

User-agent: Googlebot
Disallow: /temp

Steps to build a robots.txt file:

  1. Identify the content on your website that you want to keep out of search engine indexes.
  2. Create a blank text file named "robots.txt" (the name must be lowercase).
  3. Copy the example format above into the file.
  4. Modify the contents to match your own site.
  5. Upload the file to the root directory of your website and check that you can access it at www.yourwebsite.com/robots.txt (a quick check is sketched below).
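
After uploading, you can verify that the file is publicly reachable. The short Python sketch below simply fetches the file and prints the response status and contents; www.yourwebsite.com is a placeholder for your own domain.

import urllib.request

# Placeholder URL; substitute your own domain.
url = "http://www.yourwebsite.com/robots.txt"

with urllib.request.urlopen(url) as response:
    print(response.status)  # 200 means the file is in place
    print(response.read().decode("utf-8", errors="replace"))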

Examples:

  1. Allow all robots to visit all files.

    User-agent: *
    Disallow:

  2. Keep all robots out.

    User-agent: *
    Disallow: /

  3. Keep all robots away from the ‘temp’ and ‘images’ directories:

    User-agent: *
    Disallow: /temp/
    Disallow: /images/

  4. Keep Googlebot away from all files on the server:

    User-agent: googlebot
    Disallow: /

  5. Keep Googlebot from indexing the test.htm file:

    User-agent: googlebot
    Disallow: /test.htm
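
To sanity-check rules like those above, you can use Python's standard urllib.robotparser module, which parses a rule set and reports whether a given path is allowed. The snippet below is a minimal sketch using the rules from example 3; note that this parser implements the original robots.txt convention and does not understand Google-specific wildcard extensions.

from urllib.robotparser import RobotFileParser

# The rules from example 3 above
rules = [
    "User-agent: *",
    "Disallow: /temp/",
    "Disallow: /images/",
]

rp = RobotFileParser()
rp.parse(rules)

print(rp.can_fetch("*", "/temp/report.html"))  # False: blocked by the rules
print(rp.can_fetch("*", "/index.html"))        # True: allowed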

Robots.txt FAQs:

Question: Does a robots.txt file allow me to hide just part of a web page from search engine listings?
Answer: No. Robots.txt rules apply to whole URLs, so you can exclude an entire page or directory, but not a section within a page.

Question: How long does it take for robots.txt to start working?
Answer: Once you upload robots.txt to the root directory of your website, it takes effect immediately. The next time a search engine crawler visits your site, it will read the robots.txt file first and then crawl accordingly.

Question: How do I remove non-HTML pages (e.g. Word documents) from the Google cache?
Answer: Sometimes your website contains non-HTML documents (e.g. Excel, Word, or PDF files) that you do not want indexed for search.

The problem with these documents is that you cannot use an HTML meta tag to stop search engines from indexing them.
Robots.txt is a good solution for this kind of problem. The rule below blocks all .doc files from being indexed:

User-agent: googlebot
Disallow: /*.doc


You can also name a specific document that you don't want indexed:

User-agent: googlebot
Disallow: /test.xls
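
Note that wildcard patterns such as /*.doc are not part of the original robots.txt convention; they are an extension supported by Googlebot and some other major crawlers. Googlebot also supports the $ character to anchor a pattern to the end of a URL, which makes such a rule more precise:

User-agent: Googlebot
Disallow: /*.doc$

Without the trailing $, the pattern would also match URLs that merely contain ".doc", such as /report.doc.html.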

 