Robots.txt Tips and Tutorial for SEO: Robots.txt File Tester and Advanced Guide

Robots.txt Tips for SEO

In this post we will discuss robots.txt tips and walk through a short tutorial. Use the power of the robots.txt file to guide and control search engine crawlers (spiders or robots). Your website should have a robots.txt file located at its root so that it can be accessed at http://example.com/robots.txt or http://www.example.com/robots.txt.

With the help of the robots.txt file, you can tell search engine crawlers which pages, web directories, or URL paths they may crawl and which they should skip.
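
If you want to check programmatically how a site's robots.txt rules are interpreted, Python's standard urllib.robotparser module can read the file and answer whether a given URL may be fetched. This is only a minimal sketch; example.com and the sample page are placeholders for your own site.

from urllib.robotparser import RobotFileParser

# Point the parser at the site's robots.txt (example.com is a placeholder).
rp = RobotFileParser()
rp.set_url("http://www.example.com/robots.txt")
rp.read()  # download and parse the file

# Ask whether a generic crawler ("*") may fetch a sample URL.
print(rp.can_fetch("*", "http://www.example.com/some-page.html"))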

How to create or generate a robots.txt file

You can create a basic robots.txt file by hand, or you can generate one through Google Webmaster Tools.

1. Open a new text file in Notepad or any plain-text editor.
2. Add the following rules. Either block below allows all pages of the website to be crawled.

User-agent: *
Allow: /

or

User-agent: *
Disallow:

3. Save the file with the name robots.txt.
4. Upload the file to the root folder of your website.
5. Open the file in a browser at http://www.example.com/robots.txt or http://example.com/robots.txt, whichever matches your preferred web address.
6. Test your robots.txt file with the robots.txt tester in Google Webmaster Tools.

If you get a 500 or 404 error when accessing this file, contact your webmaster or website developer.
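
If you want to verify this yourself, a short script can request the file and report the HTTP status code. This is a minimal sketch using only the Python standard library, with example.com as a placeholder domain.

import urllib.request
import urllib.error

url = "http://www.example.com/robots.txt"  # placeholder domain

try:
    with urllib.request.urlopen(url) as response:
        # 200 means the file is reachable by crawlers.
        print(url, "returned status", response.status)
except urllib.error.HTTPError as err:
    # 404 (missing) or 500 (server error) ends up here.
    print(url, "returned status", err.code)
except urllib.error.URLError as err:
    # DNS failures, refused connections, etc.
    print("Could not reach", url, "-", err.reason)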

Robots.txt Tips for SEO

Allow all webpages for crawling

User-agent: *
Allow: /

or

User-agent: *
Disallow:

Disallow a specific path or folder from being crawled

User-agent: *
Disallow: /folder
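
Note that Disallow: /folder is a prefix rule, so it blocks any URL whose path begins with /folder. One way to confirm this behaviour is to feed the rules to Python's urllib.robotparser; the sketch below uses placeholder URLs.

from urllib.robotparser import RobotFileParser

rules = """\
User-agent: *
Disallow: /folder
"""

rp = RobotFileParser()
rp.parse(rules.splitlines())

# Any path that starts with /folder is blocked for all crawlers.
print(rp.can_fetch("*", "http://www.example.com/folder/page.html"))  # False
print(rp.can_fetch("*", "http://www.example.com/other/page.html"))   # True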

Robots.txt Wildcard Matching

Wildcards let you block URLs by query string or by file extension.

Disallow all URLs that contain a query string:

User-agent: *
Disallow: /*?

Disallow all URLs that end with .asp:

User-agent: *
Disallow: /*.asp$
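
As a rough illustration of how this wildcard pattern matching works, the sketch below translates * into "match any characters" and a trailing $ into an end-of-URL anchor using a regular expression. The helper function and sample paths are purely illustrative and are not part of any official parser.

import re

def rule_matches(pattern: str, path: str) -> bool:
    """Approximate wildcard matching: * matches any characters,
    a trailing $ anchors the rule to the end of the URL path."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    regex = "^" + regex + ("$" if anchored else "")
    return re.search(regex, path) is not None

# "Disallow: /*?" blocks any URL containing a query string.
print(rule_matches("/*?", "/products/list?page=2"))   # True
print(rule_matches("/*?", "/products/list"))          # False

# "Disallow: /*.asp$" blocks URLs that end with .asp.
print(rule_matches("/*.asp$", "/catalog/item.asp"))   # True
print(rule_matches("/*.asp$", "/catalog/item.aspx"))  # False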

Robots.txt Advanced Tips

If you have a very large website, you can use the crawl-delay directive so that crawlers do not degrade your website's performance. Crawlers such as Bingbot honor Crawl-delay in robots.txt; Googlebot ignores it, so for Google you can set the crawl rate in Google Webmaster Tools instead.

Example –

User-agent: bingbot
Crawl-delay: 10

The value 10 is the delay in seconds between successive requests.
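
As an illustration of how a well-behaved crawler might honor this directive, the sketch below reads the delay with Python's urllib.robotparser and pauses between requests. The rules, user agent, and URLs are placeholders; real search engines implement their own scheduling.

import time
from urllib.robotparser import RobotFileParser

rules = """\
User-agent: bingbot
Crawl-delay: 10
"""

rp = RobotFileParser()
rp.parse(rules.splitlines())

# Read the delay declared for this user agent (None if not set).
delay = rp.crawl_delay("bingbot") or 0

for path in ["/page1.html", "/page2.html"]:  # placeholder URLs
    # ... fetch the page here ...
    print("fetched", path, "- now waiting", delay, "seconds")
    time.sleep(delay)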

Write different rules for different crawlers

Example –

User-agent: *
Disallow: /folder1
Disallow: /folder2

User-agent: Googlebot
Disallow: /folder3

User-agent: bingbot
Disallow: /folder4
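
To double-check that each crawler sees the group intended for it, you can parse the block above with Python's urllib.robotparser and compare the answers per user agent. This is only a sketch, with example.com as a placeholder.

from urllib.robotparser import RobotFileParser

rules = """\
User-agent: *
Disallow: /folder1
Disallow: /folder2

User-agent: Googlebot
Disallow: /folder3

User-agent: bingbot
Disallow: /folder4
"""

rp = RobotFileParser()
rp.parse(rules.splitlines())

# Each crawler follows only the most specific group that names it.
print(rp.can_fetch("Googlebot", "http://www.example.com/folder3/"))     # False
print(rp.can_fetch("Googlebot", "http://www.example.com/folder1/"))     # True (its own group applies)
print(rp.can_fetch("bingbot", "http://www.example.com/folder4/"))       # False
print(rp.can_fetch("SomeOtherBot", "http://www.example.com/folder1/"))  # False (falls back to *)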