robots.txt is a text file that webmasters create to instruct robots (typically search engine crawlers) how to crawl and index pages on their website. It lives at the root of the site (for example, http://website.com/robots.txt).
Web crawlers (also known as web-indexing robots or spider bots) are programs that traverse the Web automatically. Search engines such as Google use them to index web content, spammers use them to scan for email addresses, and they have many other uses.
The User-agent value is case-insensitive: "User-agent: Googlebot" and "user-agent: googlebot" match the same crawler.
Crawler bots index your website and consume resources on the server where it is hosted. Crawling one page costs the server roughly the same as a user opening that page in a browser.
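To see how a well-behaved crawler applies these rules before spending your server's resources, here is a minimal sketch using Python's standard urllib.robotparser; the site URL reuses the website.com placeholder from the example below, and "Googlebot" is just one crawler name you might check:

from urllib.robotparser import RobotFileParser

# Point the parser at the site's robots.txt (placeholder URL).
parser = RobotFileParser()
parser.set_url("http://website.com/robots.txt")
parser.read()  # downloads and parses the file

user_agent = "Googlebot"

# Ask whether this crawler may fetch a given page.
# With the example rules below, the first check should print False
# and the second True.
print(parser.can_fetch(user_agent, "http://website.com/wp-admin/"))
print(parser.can_fetch(user_agent, "http://website.com/some-post/"))

# Honor Crawl-delay if the file specifies one (None otherwise).
print(parser.crawl_delay(user_agent))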
robots.txt example syntax:
Sitemap: http://website.com/sitemap.xml

User-agent: *
Disallow: /wp-admin/
Disallow: /wp-includes/
Disallow: /wp-content/plugins/
Disallow: /wp-content/themes/
Crawl-delay: 10

User-agent: Mediapartners-Google
Disallow: /wp-admin/
Disallow: /wp-includes/
Crawl-delay: 10

User-agent: Googlebot
Disallow: /wp-admin/
Disallow: /wp-includes/
Crawl-delay: 10

User-agent: Adsbot-Google
Disallow: /wp-admin/
Disallow: /wp-includes/
Crawl-delay: 10

User-agent: msnbot
Disallow: /wp-admin/
Disallow: /wp-includes/
Crawl-delay: 10

User-agent: bingbot
Disallow: /wp-admin/
Disallow: /wp-includes/
Crawl-delay: 10

User-agent: Slurp
User-agent: Yahoo
Disallow: /wp-admin/
Disallow: /wp-includes/
Crawl-delay: 10

# Block Google
User-agent: googlebot
Disallow: /

# Block Bing
User-agent: bingbot
Disallow: /

User-agent: msnbot
Disallow: /

# Block Yahoo
User-agent: slurp
User-agent: yahoo
Disallow: /

# Block Ask
User-agent: askjeeves
User-agent: jeeves
User-agent: teoma
Disallow: /

# Block Baidu
User-agent: baiduspider
Disallow: /

# Block Yandex
User-agent: yandex
Disallow: /
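A few notes on reading the example: the "User-agent: *" group applies to any crawler that has no more specific group of its own, and a crawler obeys the group that best matches its user-agent. The groups after the "# Block ..." comments show how to shut an individual search engine out of the whole site with "Disallow: /"; treat them as standalone snippets rather than part of one coherent file, since they contradict the per-bot groups above. Also note that Crawl-delay is a non-standard directive: some crawlers such as bingbot honor it, but Googlebot ignores it.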