Robots.txt URL format

According to this page:

globbing and regex are not supported in either User-agent or Disallow lines

However, I noticed that Stack Overflow's robots.txt contains characters like * and ? in its URLs. Are they supported or not?

Also, does it matter whether the URL has a trailing slash, or are these two equivalent?

Disallow: /privacy
Disallow: /privacy/
1 answer

To answer the second question first: these two are not equivalent. /privacy will block everything that starts with /privacy, including something like /privacy_xyzzy. /privacy/, on the other hand, will not block that URL.
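The prefix behaviour can be checked with a small sketch. This is only an illustration of plain prefix matching as described above, not the code of any real crawler; the function name is_blocked is made up for the example.

```python
def is_blocked(path: str, disallow: str) -> bool:
    """Original-style rule: block any path that starts with the Disallow value.

    An empty Disallow value blocks nothing.
    """
    return bool(disallow) and path.startswith(disallow)


if __name__ == "__main__":
    paths = ["/privacy", "/privacy/", "/privacy/policy.html", "/privacy_xyzzy"]
    for rule in ("/privacy", "/privacy/"):
        for path in paths:
            print(f"Disallow: {rule:<10} {path:<22} blocked={is_blocked(path, rule)}")
```

Note that Disallow: /privacy/ also leaves the bare /privacy URL (no trailing slash) unblocked, since that path does not start with /privacy/.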

The original robots.txt specification did not support globbing or wildcards. However, many robots do support them. Google, Microsoft and Yahoo agreed on a common standard several years ago. See http://googlewebmastercentral.blogspot.com/2008/06/improving-on-robots-exclusion-protocol.html for more details.

Most major robots that I know of support this "standard."
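As a rough sketch of how the extended matching works: in that extension, * matches any sequence of characters and a trailing $ anchors the rule to the end of the URL. As far as I know, ? is not a wildcard there; it is just the literal query-string separator, so a rule like /*? targets URLs that contain a query string. The helper names below (rule_to_regex, is_blocked_ext) are invented for this example, and real crawlers may differ in details such as percent-encoding and Allow/Disallow precedence.

```python
import re


def rule_to_regex(rule: str) -> "re.Pattern[str]":
    """Translate a robots.txt path rule with * and $ into a compiled regex."""
    anchored = rule.endswith("$")
    if anchored:
        rule = rule[:-1]
    # Escape everything literally, then let each * match any run of characters.
    body = ".*".join(re.escape(part) for part in rule.split("*"))
    # \Z pins the match to the very end of the path when the rule ended in $.
    return re.compile(body + (r"\Z" if anchored else ""))


def is_blocked_ext(path: str, rule: str) -> bool:
    """Match from the start of the path; an empty rule blocks nothing."""
    return bool(rule) and rule_to_regex(rule).match(path) is not None


if __name__ == "__main__":
    checks = [
        ("/*?", "/search?q=robots"),
        ("/*?", "/search"),
        ("/*.pdf$", "/files/report.pdf"),
        ("/*.pdf$", "/files/report.pdf?download=1"),
        ("/privacy", "/privacy_xyzzy"),
    ]
    for rule, path in checks:
        print(f"Disallow: {rule:<10} {path:<30} blocked={is_blocked_ext(path, rule)}")
```

Rules without any wildcard fall back to the same prefix behaviour as in the original format.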
