Stop Google from indexing

Is there any way to stop Google from indexing a site?

+50
google-index
Dec 23 '08 at 23:29
11 answers

robots.txt

 User-agent: *
 Disallow: /

This tells every search engine crawler that obeys robots.txt not to crawl any page on the site.

For more information, see: http://www.google.com/support/webmasters/bin/answer.py?hl=en&answer=40360
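To check locally what those two robots.txt lines do, here is a minimal sketch using Python's standard-library `urllib.robotparser`. The `can_crawl` helper, the example URL, and the user-agent names are illustrative assumptions. Note that this only models crawling rules; it says nothing about whether a page ends up in the index.

```python
from urllib.robotparser import RobotFileParser

# The two-line robots.txt from the answer above.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""

def can_crawl(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt allows user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse() also marks the rules as loaded
    return parser.can_fetch(user_agent, url)

if __name__ == "__main__":
    for agent in ("Googlebot", "Bingbot"):
        allowed = can_crawl(ROBOTS_TXT, agent, "https://example.com/any/page.html")
        print(f"{agent}: allowed={allowed}")
```

Both agents come back as not allowed, because `User-agent: *` applies to every crawler and `Disallow: /` matches every path.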

+81
Dec 23 '08 at 23:32

I have to add my answer here, because the accepted answer does not really address the issue properly. Also remember that preventing Google from crawling your site does not mean your content stays private.

My answer is based on several sources: https://developers.google.com/webmasters/control-crawl-index/docs/getting_started and https://sites.google.com/site/webmasterhelpforum/en/faq--crawling--indexing---ranking

The robots.txt file controls crawling, not indexing! These are two completely different actions, performed separately. Some pages may be crawled but not indexed, and some may even be indexed but never crawled. A page that was never crawled can still be linked from other websites, which leads the Google indexer to follow that link and try to index the page.

Indexing is the process that collects page data so it can be returned in search results. It can be blocked by adding a meta tag:

 <meta name="robots" content="noindex" /> 

or by adding an HTTP header to the response:

 X-Robots-Tag: noindex 
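If you want to check whether a given response already carries either signal, the following is a rough Python sketch using only the standard library's `html.parser`. The names `is_noindex`, `RobotsMetaParser`, and `split_directives` are hypothetical, and the sketch deliberately ignores user-agent-specific header values such as `googlebot: noindex`.

```python
from html.parser import HTMLParser
from typing import Mapping, Set

def split_directives(value: str) -> Set[str]:
    """Turn 'NOINDEX, nofollow' into {'noindex', 'nofollow'}."""
    return {part.strip().lower() for part in value.split(",") if part.strip()}

class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots"> and <meta name="googlebot"> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.directives: Set[str] = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, but not attribute values.
        if tag != "meta":
            return
        values = {name: (value or "") for name, value in attrs}
        if values.get("name", "").lower() in ("robots", "googlebot"):
            self.directives |= split_directives(values.get("content", ""))

def is_noindex(html: str, headers: Mapping[str, str]) -> bool:
    """True if a robots meta tag or an X-Robots-Tag header asks not to be indexed."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    directives = set(parser.directives)
    for name, value in headers.items():
        # HTTP header names are case-insensitive.
        if name.lower() == "x-robots-tag":
            directives |= split_directives(value)
    # "none" is shorthand for "noindex, nofollow".
    return "noindex" in directives or "none" in directives
```

Either mechanism on its own is enough for the function to return True, which mirrors the "meta tag or HTTP header" choice described above.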

If the question is about crawling, then of course you can create a robots.txt file containing the following lines:

 User-agent: *
 Disallow: /

Crawling is the action of collecting information about the structure of a particular website. For example, say you added a site in Google Webmaster Tools. The crawler will pick it up from your account and visit your site, looking for robots.txt. If it does not find one, it will assume it may crawl everything (this is why it is very important to have a sitemap.xml file: it helps with this operation and lets you indicate priorities and change frequency). If it finds the file, it will follow its rules. After a successful crawl, the crawled pages are indexed at some point, but you cannot tell when...

This is important: it means your page can still appear in Google search results regardless of robots.txt.

I hope at least some users read this answer and take note, because it is important to know what is actually happening.
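The sitemap.xml mentioned above is a plain XML file in the sitemaps.org format, where each `<url>` entry can carry a `<changefreq>` and a `<priority>`. As a rough illustration, assuming you just have a list of URLs, this Python sketch builds one with the standard library; `build_sitemap` is a hypothetical helper and the URLs and values are made up.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def build_sitemap(pages):
    """Build sitemap XML from (loc, changefreq, priority) tuples.

    changefreq is one of the protocol's values ("daily", "weekly", ...);
    priority is a number between 0.0 and 1.0.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, changefreq, priority in pages:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        ET.SubElement(url, "changefreq").text = changefreq
        ET.SubElement(url, "priority").text = f"{priority:.1f}"
    return ET.tostring(urlset, encoding="unicode")

if __name__ == "__main__":
    print(build_sitemap([
        ("https://example.com/", "daily", 1.0),
        ("https://example.com/about.html", "monthly", 0.5),
    ]))
```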

+61
Feb 11
+3
Dec 23 '08 at 23:32

FYI, Google has a website for webmasters that is at least worth checking out: http://www.google.com/webmasters/start/

+3
Dec 23 '08 at 23:39

Google obeys the robots.txt file.

+2
Dec 23 '08 at 23:30

I use a simple ASPX page to relay results from Google to my browser, using a fake PREF cookie that gets 100 results at a time. I didn't want Google to show this relay page in its results, so I check the visitor's IP address, and if it starts with 66.249 (an address range Googlebot crawls from), I just redirect.

Click my name if you value privacy and want a copy.
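The IP check described above can be sketched in Python (the answer itself uses ASPX; this is a swapped-in language). It is a minimal WSGI sketch under stated assumptions: "starts with 66.249" is expressed as the IPv4 network 66.249.0.0/16, and `REDIRECT_TO` is a hypothetical target because the answer does not say where it redirects. Google's own guidance is to verify Googlebot with a reverse DNS lookup rather than a hard-coded range, so treat this as a rough filter.

```python
import ipaddress

# Assumption from the answer: any client IP starting with 66.249 is Googlebot.
GOOGLEBOT_PREFIX = ipaddress.ip_network("66.249.0.0/16")

# Hypothetical redirect target; the answer does not say where it redirects to.
REDIRECT_TO = "/"

def should_redirect(client_ip: str) -> bool:
    """Return True if the client address falls inside 66.249.x.x."""
    try:
        address = ipaddress.ip_address(client_ip)
    except ValueError:
        return False  # malformed or missing address: serve the page normally
    # An IPv6 address is never "in" an IPv4 network, so IPv6 clients get False.
    return address in GOOGLEBOT_PREFIX

def relay_app(environ, start_response):
    """Minimal WSGI app: redirect suspected Googlebot requests, serve everyone else."""
    if should_redirect(environ.get("REMOTE_ADDR", "")):
        start_response("302 Found", [("Location", REDIRECT_TO)])
        return [b""]
    start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8")])
    return [b"relay page"]  # placeholder for the relayed search results
```

To try it locally, `wsgiref.simple_server.make_server("", 8000, relay_app).serve_forever()` serves the app on port 8000.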

The other trick I use is some JavaScript that makes the page set a flag in the session. Because most (NOT ALL) web bots do not execute JavaScript, a request without the flag is either a browser with JavaScript turned off or, more likely, a bot.

0
Sep 22 '11 at 15:39

You can also add meta robots this way:

 <head>
   <title>...</title>
   <META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">
 </head>

Another additional layer is to modify .htaccess, but you need to test that change carefully.

0
Nov 02

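Use the nofollow meta tag, which tells crawlers not to follow the links on the page (on its own, it does not keep the page itself out of the index):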

 <meta name="robots" content="nofollow" /> 

To specify nofollow at the link level, add the rel attribute with the value nofollow to the link:

 <a href="example.html" rel="nofollow">link text</a>
0
Mar 27 '13 at 11:02

You can disable indexing server-wide by adding the following directive globally in the Apache configuration (it requires mod_headers), or put the same directive in a vhost to disable it only for that specific vhost.

 Header set X-Robots-Tag "noindex, nofollow"

Once this is done, you can verify it by checking the headers Apache returns:

 $ curl -I staging.mywebsite.com
 HTTP/1.1 302 Found
 Date: Sat, 26 Nov 2016 22:36:33 GMT
 Server: Apache/2.4.18 (Ubuntu)
 Location: /pages/
 X-Robots-Tag: noindex, nofollow
 Content-Type: text/html; charset=UTF-8
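The same check can be scripted. The sketch below is Python with only the standard library: a toy `http.server` handler stands in for the Apache vhost (it simply adds the same `X-Robots-Tag` header), and `fetch_robots_tag` sends a HEAD request, which is what `curl -I` does, and reads the header back. The handler, function names, and local address are illustrative, not part of the original setup.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Optional

class NoIndexHandler(BaseHTTPRequestHandler):
    """Toy stand-in for the Apache vhost: every response carries X-Robots-Tag."""

    def do_HEAD(self) -> None:
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=UTF-8")
        self.send_header("X-Robots-Tag", "noindex, nofollow")
        self.end_headers()

    def log_message(self, format, *args) -> None:
        pass  # silence the default per-request logging

def fetch_robots_tag(url: str, opener=None) -> Optional[str]:
    """Send a HEAD request (what `curl -I` does) and return X-Robots-Tag, if present."""
    request = urllib.request.Request(url, method="HEAD")
    open_url = opener.open if opener is not None else urllib.request.urlopen
    with open_url(request, timeout=5) as response:
        return response.headers.get("X-Robots-Tag")

def demo() -> Optional[str]:
    """Start the toy server on a free local port, query it once, then shut it down."""
    server = HTTPServer(("127.0.0.1", 0), NoIndexHandler)  # port 0 = any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    # An empty ProxyHandler ignores proxies set in the environment,
    # so the request really goes to the local server.
    direct = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        return fetch_robots_tag(f"http://127.0.0.1:{server.server_port}/", direct)
    finally:
        server.shutdown()
        server.server_close()

if __name__ == "__main__":
    print(demo())
```

Against a real host you would call `fetch_robots_tag("http://staging.mywebsite.com/")`. Unlike `curl -I`, urllib follows redirects, so for a 302 like the one above it reports the header on the final page rather than on the redirect response itself.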

0
Nov 26 '16 at 22:42

Is there a way to stop Google indexing a site?

To stop Google from indexing the site, just add this meta tag to the head of each page:

 <meta name="googlebot" content="noindex, nofollow"> 
0
Nov 20 '17 at 8:51

Keep in mind that Microsoft's Bing crawler, despite Microsoft's claim that it obeys robots.txt, does not always do so.

Our server statistics show that several of their IP addresses run crawlers that do not obey robots.txt, while others do.

-1
Sep 21 '11 at 16:33