When search engine robots visit a website, the first file they look at is the robots.txt file. The robots.txt file is small, but it is crucial to how search engines crawl a website. A badly written robots.txt file can seriously hurt how your web pages appear in search results. Therefore, it is important to understand the purpose of the robots.txt file and to learn how to check that you are using it correctly.
A robots.txt file gives instructions to web robots about the pages the website owner does not want crawled. For example, if you do not want your images to be listed by Google or any other search engine, you can block them with your robots.txt file. Before crawling a site, search engines consult the robots.txt file to check which pages they are allowed to crawl and index in their search results.
It is very simple to check whether your website has a robots.txt file: just add "/robots.txt" after your domain name in the address bar, e.g. http://www.mywebsite.com/robots.txt
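If you prefer to check this from a script, the short Python sketch below fetches the file and prints either its contents or the error status. It uses only the Python standard library, and www.mywebsite.com is just the placeholder domain from the example above, so swap in your own site.

import urllib.error
import urllib.request

# Placeholder domain from the example above - replace with your own site.
url = "http://www.mywebsite.com/robots.txt"

try:
    with urllib.request.urlopen(url, timeout=10) as response:
        print("robots.txt found (HTTP", response.status, "):")
        print(response.read().decode("utf-8", errors="replace"))
except urllib.error.HTTPError as e:
    # A 404 here simply means the site has no robots.txt file.
    print("No robots.txt found (HTTP", e.code, ")")
except urllib.error.URLError as e:
    print("Could not reach the site:", e.reason)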
Robots.txt is useful if you do not want search engines to crawl your website's private pages, or to index certain files or areas of your website.
How to create Robots.txt
- Create a plain text file in Notepad (or any text editor) and save it as "robots.txt"
- Upload it to the root directory of your website
Contents of Robots.txt file
After creating the robots.txt file, write the following two-line instruction in it:
User-agent: *
Disallow:
"User-agent" – It is a part whichinstructs directions to a specific robot if needed.
There are two ways to use this in your file.
- If you want to give all robots the same instructions, use a "*" as shown below:
User-agent: *
- If you want to give instructions to a specific robot, name it, as below:
User-agent: Googlebot
"Disallow" – This line tells robots which folders or files they should not crawl.
If you do not want robots to crawl your images folder, the instructions will be:
User-agent: *
Disallow: /images
If you do not want robots to crawl your PDF folder:
User-agent: *
Disallow: /PDF
If you do not want robots to crawl any pages on your site:
User-agent: *
Disallow: /
If you want to allow access to a single robot only and block all others:
User-agent: Googlebot
Disallow:
User-agent: *
Disallow: /
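To see how a crawler actually interprets rules like these, you can test them with Python's built-in urllib.robotparser module. The sketch below is only an illustration: it feeds the rules from the last example into the parser and asks whether two user agents (SomeOtherBot is just a made-up name) may fetch a page.

import urllib.robotparser

# The rules from the example above: allow Googlebot, block everyone else.
rules = [
    "User-agent: Googlebot",
    "Disallow:",
    "",
    "User-agent: *",
    "Disallow: /",
]

parser = urllib.robotparser.RobotFileParser()
parser.parse(rules)

print(parser.can_fetch("Googlebot", "/index.html"))     # True  - Googlebot may crawl everything
print(parser.can_fetch("SomeOtherBot", "/index.html"))  # False - all other robots are blocked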
What to put in it
The "/robots.txt" file is a text file, with one or more records. Usually contains a single record looking like this:
User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/
Disallow: /~joe/
To exclude all robots from the entire server
User-agent: *
Disallow: /
To allow all robots complete access
User-agent: *
Disallow:
(or just create an empty "/robots.txt" file, or don't use one at all)
To exclude all robots from part of the server
User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/
Disallow: /junk/
To exclude a single robot
User-agent: BadBot
Disallow: /
To allow a single robot
User-agent: Googlebot
Disallow:
User-agent: *
Disallow: /
To exclude all files except one
This is a bit awkward, as the original robots.txt standard has no "Allow" field. The easiest way is to put all files that should be disallowed into a separate directory, say "stuff", and leave the one file at the level above this directory:
User-agent: *
Disallow: /~joe/stuff/
Alternatively, you can explicitly disallow each unwanted page:
User-agent: *
Disallow: /~joe/junk.html
Disallow: /~joe/foo.html
Disallow: /~joe/bar.html
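Note that many modern crawlers, including Googlebot, also understand a non-standard "Allow" line, which handles this case more directly; Python's urllib.robotparser supports it as well. The sketch below is only an illustration with made-up file names: one file inside a blocked folder is explicitly allowed.

import urllib.robotparser

# Non-standard but widely supported: allow one file, block the rest of the folder.
rules = [
    "User-agent: *",
    "Allow: /~joe/index.html",
    "Disallow: /~joe/",
]

parser = urllib.robotparser.RobotFileParser()
parser.parse(rules)

print(parser.can_fetch("*", "/~joe/index.html"))  # True  - the one allowed file
print(parser.can_fetch("*", "/~joe/junk.html"))   # False - the rest of /~joe/ is blocked

Keep in mind that Googlebot resolves conflicting rules by the longest matching path rather than by their order in the file, but in this example both approaches give the same result.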
Company Profile
Maximize the potential of your business on the Internet! Internet marketing is one of the best ways today to drive lots of traffic to your business. Webplanners is a leading SEO company in Melbourne.