robots.txt

robots.txt is a text file that webmasters create to instruct web robots (typically search engine crawlers) how to crawl and index pages on their website.
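
The file must be placed at the root of the host it applies to, e.g. http://website.com/robots.txt. A minimal sketch (the /private/ path is only an illustrative placeholder):

User-agent: *
Disallow: /private/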

Web crawlers (also known as web-indexing robots or spider bots) are programs that traverse the Web automatically. Search engines such as Google use them to index web content, spammers use them to scan for email addresses, and they have many other uses.

The User-agent directive is case-insensitive.
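
For example, the following lines all address the same crawler (the /example/ path is a placeholder):

User-agent: Googlebot
User-agent: googlebot
User-agent: GOOGLEBOT
Disallow: /example/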

Crawler bots consume resources on the server where your website is hosted while they index it: one page fetched by a crawler costs the server roughly the same as one page opened by a user in a browser.
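
To limit that load, the Crawl-delay directive asks a crawler to wait a number of seconds between requests. Note that support varies: Bing and Yandex honor it, while Google ignores it. For example:

User-agent: bingbot
Crawl-delay: 10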

robots.txt example syntax:

Sitemap: http://website.com/sitemap.xml

User-agent: *
Disallow: /wp-admin/
Disallow: /wp-includes/
Disallow: /wp-content/plugins/
Disallow: /wp-content/themes/
Crawl-delay: 10

User-agent: Mediapartners-Google
Disallow: /wp-admin/
Disallow: /wp-includes/
Crawl-delay: 10

User-agent: Googlebot
Disallow: /wp-admin/
Disallow: /wp-includes/
Crawl-delay: 10

User-agent: Adsbot-Google
Disallow: /wp-admin/
Disallow: /wp-includes/
Crawl-delay: 10

User-agent: msnbot
Disallow: /wp-admin/
Disallow: /wp-includes/
Crawl-delay: 10

User-agent: bingbot
Disallow: /wp-admin/
Disallow: /wp-includes/
Crawl-delay: 10

User-agent: Slurp
User-agent: Yahoo
Disallow: /wp-admin/
Disallow: /wp-includes/
Crawl-delay: 10
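
Note that a crawler follows only the single group that best matches its user agent; rules from the * group are not inherited. In the example above, Googlebot obeys its own group, so /wp-content/plugins/ and /wp-content/themes/ stay crawlable for it. To restrict those for Googlebot as well, repeat the rules in its group:

User-agent: Googlebot
Disallow: /wp-admin/
Disallow: /wp-includes/
Disallow: /wp-content/plugins/
Disallow: /wp-content/themes/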

# Block Google
User-agent: googlebot
Disallow: /

# Block Bing
User-agent: bingbot
Disallow: /

# Block MSN
User-agent: msnbot
Disallow: /

# Block Yahoo
User-agent: slurp
User-agent: yahoo
Disallow: /

# Block Ask
User-agent: askjeeves
User-agent: jeeves
User-agent: teoma
Disallow: /

# Block Baidu
User-agent: baiduspider
Disallow: /

# Block Yandex
User-agent: yandex
Disallow: /
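
Finally, to block every crawler from the whole site, or to explicitly allow everything, use the catch-all agent:

# Block all crawlers from the entire site
User-agent: *
Disallow: /

# Allow all crawlers everywhere (an empty Disallow permits everything)
User-agent: *
Disallow: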
