I have a very interesting problem that I cannot explain.
Every 2-6 seconds, Googlebot requests a page on our website (stack: PHP, Apache, MongoDB) that does not exist, so it gets a 404. I checked the IP with the host command and it really is Google. No other bot or human has ever requested such a page, only Googlebot.
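For reference, the check I mean is roughly the reverse-then-forward DNS lookup that Google documents for verifying Googlebot. Below is a minimal Python sketch of that idea using the standard socket module. The function name is_googlebot and the two injectable lookup parameters are my own, for illustration only:

```python
import socket

# Hostname suffixes Google documents for genuine Googlebot crawlers.
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")


def is_googlebot(ip, reverse_lookup=None, forward_lookup=None):
    """Return True if `ip` reverse-resolves to a Google hostname that
    resolves forward to the same IP again.

    The two lookup parameters are hypothetical hooks so the logic can be
    exercised without network access; by default the socket module is used.
    """
    if reverse_lookup is None:
        reverse_lookup = lambda addr: socket.gethostbyaddr(addr)[0]
    if forward_lookup is None:
        forward_lookup = lambda host: socket.gethostbyname_ex(host)[2]
    try:
        hostname = reverse_lookup(ip)
        if not hostname.endswith(GOOGLE_SUFFIXES):
            return False
        # The forward lookup must lead back to the address we started from.
        return ip in forward_lookup(hostname)
    except OSError:
        # No PTR record, or the name does not resolve: treat as unverified.
        return False
```

With the default lookups this needs working DNS, so the result depends on the network it runs on.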
Each request looks something like this:
/2de4f853c2853807b2e72387aa8928a4
/ea5700c343d1a9798bc554af7c1a330e
/e5aafa102d54ba7517703336846cc019
Our code does not use 32-character strings anywhere, and there are no links like these on our own site or on external sites. We use CodeIgniter, so at first I thought these were the default session_id values, but I checked and they are not.
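To pull just these requests out of the logs, something like the following could work. It is only a sketch: it assumes the paths are always a slash followed by exactly 32 lowercase hex characters, as in the three examples, and the name is_hex32_path is made up for illustration.

```python
import re

# A slash followed by exactly 32 lowercase hex characters, nothing else.
HEX32_PATH = re.compile(r"/[0-9a-f]{32}")


def is_hex32_path(path):
    """Return True if the request path looks like the mystery URLs."""
    return HEX32_PATH.fullmatch(path) is not None


# Two of the paths from the log plus an ordinary page for comparison.
paths = [
    "/2de4f853c2853807b2e72387aa8928a4",
    "/ea5700c343d1a9798bc554af7c1a330e",
    "/about",
]
suspicious = [p for p in paths if is_hex32_path(p)]
```

Here suspicious ends up holding only the two hex paths, which makes it easy to count them or group them by client address.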
Has anyone seen anything like this? Our site uses history.push on some pages; could that be the cause? Just an idea.
Raw data from a sample request:
array (
  'date' => '2012-12-01',
  'time' => '10:01:33 PM',
  'additional_data' => array (
    'server_vars' => array (
      'REDIRECT_STATUS' => '200',
      'HTTP_HOST' => 'www.xxxxxxx.com',
      'HTTP_ACCEPT' => '*/*',
      'HTTP_ACCEPT_ENCODING' => 'gzip,deflate',
      'HTTP_FROM' => 'googlebot(at)googlebot.com',
      'HTTP_USER_AGENT' => 'Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)',
      'HTTP_X_FORWARDED_FOR' => 'xxxxxxx',
      'HTTP_X_FORWARDED_PORT' => '80',
      'HTTP_X_FORWARDED_PROTO' => 'http',
      'HTTP_CONNECTION' => 'keep-alive',
      'PATH' => '/sbin:/usr/sbin:/bin:/usr/bin:/home/ec2-user/ec2/bin',
      'SERVER_SIGNATURE' => '<address>Apache/2.2.22 (Amazon) Server at www.xxxxxxx.com Port 80</address> ',
      'SERVER_SOFTWARE' => 'Apache/2.2.22 (Amazon)',
      'SERVER_NAME' => 'www.xxxxxxx.com',
      'SERVER_ADDR' => 'xxxxxxxxxx',
      'SERVER_PORT' => '80',
      'REMOTE_ADDR' => '10.171.147.114',
      'REMOTE_PORT' => '40759',
      'REDIRECT_URL' => '/e5aafa102d54ba7517703336846cc019',
      'GATEWAY_INTERFACE' => 'CGI/1.1',
      'SERVER_PROTOCOL' => 'HTTP/1.1',
      'REQUEST_METHOD' => 'GET',
      'QUERY_STRING' => '',
      'REQUEST_URI' => '/e5aafa102d54ba7517703336846cc019',
      'SCRIPT_NAME' => '/index.php',
      'PATH_INFO' => '/e5aafa102d54ba7517703336846cc019',
      'PATH_TRANSLATED' => 'redirect:/index.php/e5aafa102d54ba7517703336846cc019',
      'PHP_SELF' => '/index.php/e5aafa102d54ba7517703336846cc019',
      'REQUEST_TIME' => 1354428093,
    ),
    'codeigiter_session' => array (
      'session_id' => 'c795e40a279f58d9fbbf7f5501a26787',
      'ip_address' => '10.171.147.114',
      'user_agent' => 'Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)',
      'last_activity' => 1354428093,
      'user_data' => '',
    ),
  ),
)
What else can I collect to understand this? It is very strange.
Update: The traffic comes from two primary IP addresses: 10.171.147.114 and 10.161.46.102.
I looked them up and they are not Googlebot.
I got this note from an IP lookup site:
Remember that the IP address ranges 10.0.0.0 - 10.255.255.255, 172.16.0.0 - 172.31.255.255, 192.168.0.0 - 192.168.255.255 and 224.0.0.0 - 239.255.255.255 are reserved IP addresses for private use on the Internet, and IP lookups for them will not return any results.
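Checking the two addresses against those ranges can be done with Python's standard ipaddress module. This is only a sketch. The helper client_ip is hypothetical and assumes that a proxy or load balancer in front of Apache puts the original client address first in X-Forwarded-For (the dump does contain an HTTP_X_FORWARDED_FOR entry, but its value is masked here, so I cannot say this is what is happening). Note also that 224.0.0.0 - 239.255.255.255 is the multicast block rather than private address space.

```python
import ipaddress


def is_private(addr):
    """True if addr is in a private range such as 10.0.0.0/8."""
    return ipaddress.ip_address(addr).is_private


def client_ip(remote_addr, forwarded_for=""):
    """Pick the address to look up (hypothetical helper).

    If REMOTE_ADDR is private and an X-Forwarded-For value is present,
    assume the first entry in that header is the original client.
    """
    first = forwarded_for.split(",")[0].strip()
    if is_private(remote_addr) and first:
        return first
    return remote_addr


# The two addresses from the update above.
for addr in ("10.171.147.114", "10.161.46.102"):
    print(addr, is_private(addr))
```

Both addresses fall inside 10.0.0.0/8, so if the forwarded header is populated, the address worth looking up would be the one returned by client_ip rather than REMOTE_ADDR.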
What should I do about these requests? What do they mean? If this is some kind of DoS attack, they are doing a very poor job of it.