You can rewrite robots.txt to another file (name it "robots_no.txt") containing:
User-agent: *
Disallow: /
(source: http://www.robotstxt.org/robotstxt.html)
The .htaccess file will look like this:
RewriteEngine On
RewriteCond %{HTTP_HOST} !^www.example.com$
RewriteRule ^robots.txt$ robots_no.txt
Use a custom robots.txt file for each (sub)domain:
RewriteEngine On
RewriteCond %{HTTP_HOST} ^www.example.com$ [OR]
RewriteCond %{HTTP_HOST} ^sub.example.com$ [OR]
RewriteCond %{HTTP_HOST} ^example.com$ [OR]
RewriteCond %{HTTP_HOST} ^www.example.org$ [OR]
RewriteCond %{HTTP_HOST} ^example.org$
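The snippet above breaks off before the RewriteRule the conditions lead up to. A minimal way to finish it, assuming you keep one file per hostname named robots-<host>.txt next to .htaccess (that naming scheme is an assumption, not part of the original answer), is to append:

# Serve a per-host robots file for the hosts matched above (assumed file naming)
RewriteRule ^robots\.txt$ robots-%{HTTP_HOST}.txt [L]

If a host has no matching file this returns a 404, so create one robots-<host>.txt per domain you list in the conditions.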
Instead of asking search engines to block all pages on hosts other than www.example.com, you can also use <link rel="canonical">.
If http://example.com/page.html and http://example.org/~example/page.html both point to http://www.example.com/page.html, put the following tag in <head>:
<link rel="canonical" href="http://www.example.com/page.html">
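For content where you cannot edit the HTML (PDFs, for example), the canonical URL can also be sent as an HTTP header. A rough .htaccess sketch, assuming mod_headers is enabled; the file name is hypothetical:

# Send a canonical Link header for a file that has no <head> to edit
<Files "report.pdf">
  Header set Link "<http://www.example.com/report.pdf>; rel=\"canonical\""
</Files>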
See also Google's article on rel="canonical".