How to hide a page URL from bots/spiders?

I have 1000 products on my website, and each of them has its own web page, accessible with something like product.php?id=PRODUCT_ID.

On each of these pages I have a link with a URL of the form action.php?id=PRODUCT_ID&referer=CURRNT_PAGE_URL. So if I am on product.php?id=100, this URL becomes action.php?id=100&referer=/product.php?id=100. After clicking this link, the user is returned to the referer.

Now the problem I am facing is that I keep receiving false hits from spiders. Is there a way I can avoid these false hits? I know that I can disallow this URL in the robots.txt file, but there are still bots that ignore it. What would you suggest? Any ideas are welcome. Thanks.

+5
5 answers

Currently, the easiest way to make a link inaccessible to 99% of robots (even those that ignore the robots.txt file) is JavaScript. Add some unobtrusive jQuery:

<script type="text/javascript">
$(document).ready(function() {
  // Copy each link's data-href value into its href once the DOM is ready
  $('a[data-href]').each(function() {
    $(this).attr('href', $(this).attr('data-href'));
  });
});
</script>

Design your links as follows.

<a href="" rel="nofollow" data-href="action.php?id=PRODUCT_ID&referer=REFERER">Click me!</a>

Since the href attribute is written only after the DOM is ready, robots that do not execute JavaScript will not find anything to follow.
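
For pages that do not load jQuery, the same idea can be sketched in plain JavaScript. This is only an illustration, not part of the original answer: the function name activateLinks is hypothetical, and it assumes the links are marked up with data-href as shown above.

```javascript
// Copy data-href into href for every matching link under `root`.
// `root` is anything that exposes querySelectorAll (normally `document`).
// Returns the number of links that were updated.
function activateLinks(root) {
  const links = root.querySelectorAll('a[data-href]');
  links.forEach(function (link) {
    link.setAttribute('href', link.getAttribute('data-href'));
  });
  return links.length;
}

// In a browser, run once the DOM has been parsed.
if (typeof document !== 'undefined') {
  document.addEventListener('DOMContentLoaded', function () {
    activateLinks(document);
  });
}
```

As with the jQuery version, this only stops crawlers that do not run scripts; a crawler that renders the page will still see the final href.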

+2

Your problem consists of two separate issues:

  • multiple URLs lead to the same resource
  • crawlers do not respect robots.txt

The second problem is difficult to solve; read "Stealth Web Scanner Detection".

The first is simpler. It seems you need an option that allows the user to return to the previous page.

, ( javascript history.back();), .

referrer?
. , .

cookie?
CURRNT_PAGE_URL cookie, - URL- - cookie HTTP-referrer.

+2

robots.txt .

, , robots.txt , - . .

, , evil_webspider_crawling_everywhere , (, php ) -.

+1

, , .

, , / - - -, , .

, , , , , javascript , . .

Personally, I did not fuss about spiders or bots.

0

Another option is to use PHP to detect bots visiting your page.

You can use this PHP function to detect a bot (it catches most of them):

function bot_detected() {
  return (
    isset($_SERVER['HTTP_USER_AGENT'])
    && preg_match('/bot|crawl|slurp|spider|mediapartners/i', $_SERVER['HTTP_USER_AGENT'])
  );
}

Then echo the link's href on the page only when the visitor is not detected as a bot:

if (bot_detected() === false) {
  // Output the link URL only for visitors that do not look like bots
  echo "http://example.com/yourpage";
}
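
To see what this pattern does and does not catch, here is a small JavaScript sketch that applies the same regular expression to a few sample User-Agent strings. The names botDetected and BOT_PATTERN and the sample strings are illustrative assumptions, not part of the answer above.

```javascript
// Same pattern as the PHP function, applied to a User-Agent string.
const BOT_PATTERN = /bot|crawl|slurp|spider|mediapartners/i;

// Returns true only when a User-Agent is present and matches the pattern.
function botDetected(userAgent) {
  return typeof userAgent === 'string' && BOT_PATTERN.test(userAgent);
}

// Illustrative inputs: a self-identifying crawler, a regular browser,
// and a request that sent no User-Agent header at all.
const samples = [
  'Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)',
  'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0 Safari/537.36',
  undefined
];

samples.forEach(function (ua) {
  console.log(botDetected(ua), ua);
});
```

The check relies on bots identifying themselves: a crawler that sends a browser-like User-Agent, or none at all, is treated as a normal visitor, so this filters well-behaved bots rather than the ones that ignore robots.txt.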
0