Robots.txt Allows All but Several Subdirectories

I want my site to be indexed by search engines, with the exception of a few subdirectories. These are my robots.txt settings:

robots.txt in the root directory

 User-agent: *
 Allow: /

A separate robots.txt in the subdirectory to be excluded

 User-agent: *
 Disallow: /

Is this correct, or will the root rule override the subdirectory rule?

3 answers

No, this is wrong.

You cannot have a robots.txt file in a subdirectory; crawlers never read it there. Your robots.txt must be placed in the document root of your host.

If you want to prevent crawling of URLs that start with /foo, use this entry in the robots.txt file (http://example.com/robots.txt):

 User-agent: *
 Disallow: /foo

This allows everything to be crawled (so there is no need for Allow) except URLs such as

  • http://example.com/foo
  • http://example.com/foo/
  • http://example.com/foo.html
  • http://example.com/foobar
  • http://example.com/foo/bar
  • ...
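So, for the scenario in the question, a single root robots.txt can block several subdirectories at once. A minimal sketch (the directory names /private/ and /drafts/ are placeholders, not taken from the question):

 User-agent: *
 Disallow: /private/
 Disallow: /drafts/

Note the trailing slash: Disallow: /private/ blocks only URLs inside that directory, while Disallow: /private would also block URLs like http://example.com/private.html, as the /foo examples above show.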

There is also this blanket rule:

 User-agent: *
 Disallow: /

This directive is useful if you are developing a new website and do not want search engines to index your incomplete site. You can also find more detailed information here.


You can manage the excluded subdirectories with the robots.txt file located in the root directory. Make sure the appropriate Disallow patterns are in place.
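To check that your Disallow patterns behave as intended before uploading the file, here is a minimal sketch using Python's standard-library urllib.robotparser (the host example.com and the directory names are placeholders):

 from urllib.robotparser import RobotFileParser

 # Hypothetical rules mirroring the robots.txt sketch above.
 rules = [
     "User-agent: *",
     "Disallow: /private/",
     "Disallow: /drafts/",
 ]

 rp = RobotFileParser()
 rp.parse(rules)

 # Blocked: inside an excluded subdirectory.
 print(rp.can_fetch("*", "http://example.com/private/page.html"))  # False
 # Allowed: everything else on the site.
 print(rp.can_fetch("*", "http://example.com/index.html"))         # True

This is only a local sanity check; real crawlers fetch the file from http://example.com/robots.txt themselves.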


Source: https://habr.com/ru/post/1213305/

