Editing the robots.txt file

To remove an entire site, or specific sections and pages, from search results in Google, Yandex, and other search engines, block that content from indexing. Once it is blocked, the content will no longer appear in search results. Let’s review the directives you can use in the robots.txt file to prohibit indexing.

How to configure robots.txt

robots.txt is a special file that allows you to configure how your site is indexed by search engine crawlers.

Here are some settings you can apply with robots.txt:

  • Block indexing of specific site pages;
  • Prohibit indexing for specific crawlers or block the entire site from indexing;
  • Set the time interval between page visits by crawlers.


How to set the crawl rate for search engine bots

You can set the crawl rate in Yandex.Webmaster under Indexing → Crawl rate. More details are available in Yandex help.

For Google, the search engine bot automatically adjusts the crawl speed depending on the server’s response. If the server slows down or returns an error, crawling may pause.

Please note:

  • Reduce the crawl rate only if the crawler creates excessive load on the server. In other cases, you do not need to change this setting.
  • Lowering the crawl rate does not affect search rankings in Yandex.

Examples:

Limit Yandex bot visits to no more than one every 2 seconds:
User-agent: Yandex
Crawl-delay: 2.0

Limit visits by all bots to no more than one per second:
User-agent: *
Disallow: /search
Crawl-delay: 1.0

Not all crawlers follow every robots.txt rule. For example, Googlebot follows "Disallow" rules but ignores the "Crawl-delay" directive. To limit Googlebot, use Google Search Console tools. Google help: About robots.txt files
For YandexBot, the maximum crawl delay you can set via robots.txt is 2 seconds. To specify the exact crawl rate, use Yandex.Webmaster. Yandex help: Using robots.txt
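
If you run your own crawler or want to check what a robots.txt file asks of a given bot, the Crawl-delay value can be read with Python's standard urllib.robotparser module. This is a minimal sketch, not a reference implementation: the helper name crawl_delay_for and the sample file are illustrative assumptions. The delays are written as whole numbers because CPython's parser appears to ignore non-integer values such as 2.0.

```python
from urllib.robotparser import RobotFileParser

# Sample file for illustration only (assumed content, integer delays).
ROBOTS_TXT = """\
User-agent: Yandex
Crawl-delay: 2

User-agent: *
Disallow: /search
Crawl-delay: 1
"""


def crawl_delay_for(robots_txt, user_agent):
    """Return the Crawl-delay (in seconds) that applies to user_agent, or None."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.crawl_delay(user_agent)


if __name__ == "__main__":
    for bot in ("Yandex", "SomeOtherBot"):
        print(bot, crawl_delay_for(ROBOTS_TXT, bot))
```

A bot without its own group falls back to the "User-agent: *" group, which is why the second lookup returns the delay set for all bots.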


How to block indexing of a directory or URL
 

# blocking indexing of vip.html for Googlebot only:
User-agent: Googlebot
Disallow: /vip.html

# blocking indexing of the /private folder for all crawlers:
User-agent: *
Disallow: /private/

# allowing YandexBot to access only pages starting with /shared:
User-agent: Yandex
Disallow: /
Allow: /shared

The User-agent directive specifies which crawler the rules apply to. You can name specific bots or set rules for all crawlers.
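
When Allow and Disallow rules in one group both match a URL, Google and Yandex apply the most specific rule (the one with the longest path) rather than the first one listed. That is why the Yandex example above still lets the bot reach /shared. The sketch below illustrates that selection logic under simplifying assumptions: it handles plain path prefixes only (no * or $ wildcards), and the names is_allowed, YANDEX_RULES, and GOOGLEBOT_RULES are hypothetical, not part of any search engine's tooling.

```python
from typing import List, Tuple

# One rule is (directive, path prefix), e.g. ("disallow", "/private/").
Rule = Tuple[str, str]


def is_allowed(rules: List[Rule], path: str) -> bool:
    """Longest-prefix match: the most specific rule wins, Allow wins ties."""
    best_length = -1
    allowed = True  # no matching rule means the path may be crawled
    for directive, prefix in rules:
        # An empty pattern matches nothing; skip rules that do not apply.
        if not prefix or not path.startswith(prefix):
            continue
        more_specific = len(prefix) > best_length
        tie_won_by_allow = len(prefix) == best_length and directive == "allow"
        if more_specific or tie_won_by_allow:
            best_length = len(prefix)
            allowed = directive == "allow"
    return allowed


# The groups from the examples above, written as rule lists.
YANDEX_RULES = [("disallow", "/"), ("allow", "/shared")]
GOOGLEBOT_RULES = [("disallow", "/vip.html")]

if __name__ == "__main__":
    print(is_allowed(YANDEX_RULES, "/shared/report.html"))  # Allow is longer
    print(is_allowed(YANDEX_RULES, "/private/data.html"))   # only "/" matches
    print(is_allowed(GOOGLEBOT_RULES, "/vip.html"))
```

Not every parser works this way. Some simpler ones apply the first matching rule in file order, so it is safer not to rely on rule order when writing robots.txt.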


How to completely block a website from indexing

To block your entire website from all search engines, add this to robots.txt:

User-agent: *
Disallow: /

To block only one search engine (e.g., Yandex):

User-agent: Yandex
Disallow: /

To block all search engines except one (e.g., Google):

User-agent: *
Disallow: /

User-agent: Googlebot
Allow: /
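
Before publishing a file like this, you can check which bots it actually blocks. The following is a rough sketch using Python's standard urllib.robotparser. The helper name can_crawl and the test path /index.html are assumptions for illustration, and real search engines may interpret edge cases differently from this parser.

```python
from urllib.robotparser import RobotFileParser

# The "block everyone except Google" configuration from above.
BLOCK_ALL_EXCEPT_GOOGLE = """\
User-agent: *
Disallow: /

User-agent: Googlebot
Allow: /
"""


def can_crawl(robots_txt, user_agent, path="/"):
    """Return True if user_agent may fetch path according to robots_txt."""
    parser = RobotFileParser()
    # Parse the text directly instead of downloading it from a site.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, path)


if __name__ == "__main__":
    for bot in ("Googlebot", "Yandex", "Bingbot"):
        print(bot, can_crawl(BLOCK_ALL_EXCEPT_GOOGLE, bot, "/index.html"))
```

A misspelled directive name such as "User agent" (without the hyphen) is not recognized by parsers, so a check like this helps catch rules that silently do nothing.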
