When search engine robots visit a website, the first file they look at is the robots.txt file. The robots.txt file is small, but it is crucial to how search engines crawl a website. A badly written robots.txt file can seriously hurt how your web pages appear in search results. Therefore, it is important to understand the purpose of the robots.txt file and to learn how to check that you are using it correctly.
A robots.txt file gives instructions to web robots about the pages the website owner does not want crawled. For example, if you do not want your images to be listed by Google or any other search engine, you can block them with your robots.txt file. Before crawling a site, search engines consult the robots.txt file to check which pages they are allowed to crawl and index in their search results.
It is very simple to check whether your website has a robots.txt file: just add "/robots.txt" after your domain name in the address bar, e.g. http://www.mywebsite.com/robots.txt
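If you prefer to check this from a script, the short Python sketch below fetches the file and prints either its contents or the error status. It uses only the Python standard library, and www.mywebsite.com is just the placeholder domain from the example above, so swap in your own site.

import urllib.error
import urllib.request

# Placeholder domain from the example above - replace with your own site.
url = "http://www.mywebsite.com/robots.txt"

try:
    with urllib.request.urlopen(url, timeout=10) as response:
        print("robots.txt found (HTTP", response.status, "):")
        print(response.read().decode("utf-8", errors="replace"))
except urllib.error.HTTPError as e:
    # A 404 here simply means the site has no robots.txt file.
    print("No robots.txt found (HTTP", e.code, ")")
except urllib.error.URLError as e:
    print("Could not reach the site:", e.reason)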
Robots.txt is useful if you do not want search engines to crawl your website's private pages, or to index certain files or areas of your website.
How to create Robots.txt
- Create a plain text file in Notepad (or any text editor) and save it as "robots.txt"
- Upload it to the root directory of your website
Contents of Robots.txt file
After creating the robots.txt file, write the following two-line instruction in it:
User-agent: *
Disallow:
"User-agent" – It is a part whichinstructs directions to a specific robot if needed.
There are two ways to use this in your file.
- If you want to give all robots the same instructions, use a "*" as shown below:
User-agent: *
- If you want to give instructions to a specific robot, name it, as below:
User-agent: Googlebot
"Disallow" – This line tells robots which folders or files they should not crawl.
If you do not want robots to crawl your images folder, the instructions will be:
User-agent: *
Disallow: /images
If you do not want robots to crawl your PDF folder:
User-agent: *
Disallow: /PDF
If you do not want robots to crawl any pages on your site:
User-agent: *
Disallow: /
If you want to allow access to a single robot only and block all others:
User-agent: Googlebot
Disallow:
User-agent: *
Disallow: /
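To see how a crawler actually interprets rules like these, you can test them with Python's built-in urllib.robotparser module. The sketch below is only an illustration: it feeds the rules from the last example into the parser and asks whether two user agents (SomeOtherBot is just a made-up name) may fetch a page.

import urllib.robotparser

# The rules from the example above: allow Googlebot, block everyone else.
rules = [
    "User-agent: Googlebot",
    "Disallow:",
    "",
    "User-agent: *",
    "Disallow: /",
]

parser = urllib.robotparser.RobotFileParser()
parser.parse(rules)

print(parser.can_fetch("Googlebot", "/index.html"))     # True  - Googlebot may crawl everything
print(parser.can_fetch("SomeOtherBot", "/index.html"))  # False - all other robots are blocked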
What to put in it
The "/robots.txt" file is a text file, with one or more records. Usually contains a single record looking like this:
User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/
Disallow: /~joe/
To exclude all robots from the entire server
User-agent: *
Disallow: /
To allow all robots complete access
User-agent: *
Disallow:
(or just create an empty "/robots.txt" file, or don't use one at all)
To exclude all robots from part of the server
User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/
Disallow: /junk/
To exclude a single robot
User-agent: BadBot
Disallow: /
To allow a single robot
User-agent: Googlebot
Disallow:
User-agent: *
Disallow: /
To exclude all files except one
This is a bit awkward, as the original robots.txt standard has no "Allow" field. The easiest way is to put all files that should be disallowed into a separate directory, say "stuff", and leave the one file at the level above this directory:
User-agent: *
Disallow: /~joe/stuff/
Alternatively, you can explicitly disallow each unwanted page:
User-agent: *
Disallow: /~joe/junk.html
Disallow: /~joe/foo.html
Disallow: /~joe/bar.html
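Note that many modern crawlers, including Googlebot, also understand a non-standard "Allow" line, which handles this case more directly; Python's urllib.robotparser supports it as well. The sketch below is only an illustration with made-up file names: one file inside a blocked folder is explicitly allowed.

import urllib.robotparser

# Non-standard but widely supported: allow one file, block the rest of the folder.
rules = [
    "User-agent: *",
    "Allow: /~joe/index.html",
    "Disallow: /~joe/",
]

parser = urllib.robotparser.RobotFileParser()
parser.parse(rules)

print(parser.can_fetch("*", "/~joe/index.html"))  # True  - the one allowed file
print(parser.can_fetch("*", "/~joe/junk.html"))   # False - the rest of /~joe/ is blocked

Keep in mind that Googlebot resolves conflicting rules by the longest matching path rather than by their order in the file, but in this example both approaches give the same result.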
Company Profile
Maximize the potential of your business on the Internet! Internet marketing is one of the best ways today to drive lots of traffic to your business. Webplanners is a leading SEO company in Melbourne.