How to block search engines from indexing all URLs starting with origin.domainname.com

I have www.domainname.com and origin.domainname.com pointing to the same code base. Is there any way to prevent all URLs on the origin.domainname.com hostname from being indexed?

Is there a rule in robots.txt for this? Both hostnames point to the same folder. I also tried redirecting origin.domainname.com to www.domainname.com in the .htaccess file, but it does not seem to work.
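
For reference, the redirect rule I tried looks roughly like this (a sketch only; the domain names stand in for my actual ones):

 # redirect every origin.domainname.com request to www.domainname.com
 RewriteEngine On
 RewriteCond %{HTTP_HOST} ^origin\.domainname\.com$ [NC]
 RewriteRule ^(.*)$ http://www.domainname.com/$1 [R=301,L]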

If anyone has had a similar problem and can help, I would be grateful.

Thanks.

1 answer

You can rewrite robots.txt to another file (name it "robots_no.txt") containing:

 User-Agent: *
 Disallow: /

(Source: http://www.robotstxt.org/robotstxt.html)

The .htaccess file will look like this:

 RewriteEngine On
 RewriteCond %{HTTP_HOST} !^www.example.com$
 RewriteRule ^robots.txt$ robots_no.txt
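
The default robots.txt that www.example.com keeps serving is up to you; a minimal permissive version (just an example, adjust it to your site) would be:

 User-Agent: *
 Disallow: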

To use a custom robots.txt file for each (sub)domain:

 RewriteEngine On
 RewriteCond %{HTTP_HOST} ^www.example.com$ [OR]
 RewriteCond %{HTTP_HOST} ^sub.example.com$ [OR]
 RewriteCond %{HTTP_HOST} ^example.com$ [OR]
 RewriteCond %{HTTP_HOST} ^www.example.org$ [OR]
 RewriteCond %{HTTP_HOST} ^example.org$
 # Rewrites the above (sub)domains <domain> to robots_<domain>.txt
 # example.org -> robots_example.org.txt
 RewriteRule ^robots.txt$ robots_%{HTTP_HOST}.txt [L]
 # in all other cases, use the default 'robots.txt'
 RewriteRule ^robots.txt$ - [L]
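
To illustrate, the per-host files could then look like this (which hosts get blocked is an assumption based on the question):

 # robots_www.example.com.txt - host that should stay indexable
 User-Agent: *
 Disallow:

 # robots_sub.example.com.txt - host that should not be indexed
 User-Agent: *
 Disallow: /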

Instead of asking search engines to block all pages on hosts other than www.example.com, you can also use <link rel="canonical">.

If http://example.com/page.html and http://example.org/~example/page.html both point to http://www.example.com/page.html, put the following tag in the <head>:

 <link rel="canonical" href="http://www.example.com/page.html"> 

See also Google's article on rel="canonical".
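
A related option, not part of the answer above, is the X-Robots-Tag response header. The following is a minimal sketch, assuming mod_setenvif and mod_headers are enabled and that origin.domainname.com is the host to keep out of the index:

 # flag requests whose Host header is origin.domainname.com
 <IfModule mod_setenvif.c>
  SetEnvIfNoCase Host ^origin\.domainname\.com$ NOINDEX_HOST
 </IfModule>
 # mark the matching responses as noindex for crawlers
 <IfModule mod_headers.c>
  Header set X-Robots-Tag "noindex, nofollow" env=NOINDEX_HOST
 </IfModule>

Note that Disallow in robots.txt stops crawling, while X-Robots-Tag noindex stops indexing of pages that are actually fetched, so the two cover slightly different cases.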
