Tomcat7 & Struts1 - Processing Multiple Google Bot Blogs

More than half of the hits on one of my servers come from Googlebot, which constantly crawls our millions of pages.

The reason we have so many pages is that the company is an auto parts store, with a unique URL for each combination of manufacturer part number and the vehicles it fits. This is not something we can get rid of: people search on these terms all the time, and we need a unique landing page for each one (because all of our competitors have them, of course!).

So we have millions of pages that Google needs to know about. That means several hits per second from their crawler, around the clock, and this traffic is vital: it is what drives our end-user traffic in the first place.

Since we constantly add new products to the catalog, at a rate of hundreds of thousands per week, the list of unique URLs keeps getting longer and the crawl traffic keeps growing.

Googlebot pays no attention to cookies, which means every request it makes creates a brand-new session, and that pushes memory usage through the roof.
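
One idea I've sketched out (not something we run yet) is a servlet filter that shortens the timeout on sessions created for crawler requests, so the container reclaims them quickly instead of holding each one for the default 30 minutes. The class name, the `googlebot` substring check, and the 60-second value are all placeholders of mine:

```java
import java.io.IOException;
import javax.servlet.Filter;
import javax.servlet.FilterChain;
import javax.servlet.FilterConfig;
import javax.servlet.ServletException;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpSession;

/**
 * Sketch: shorten the lifetime of sessions created for crawler requests
 * so they can be reclaimed quickly. Map it to /* in web.xml.
 */
public class BotSessionFilter implements Filter {

    public void init(FilterConfig config) throws ServletException {
        // nothing to configure
    }

    public void doFilter(ServletRequest req, ServletResponse res, FilterChain chain)
            throws IOException, ServletException {
        // Let the request render normally first.
        chain.doFilter(req, res);

        HttpServletRequest httpReq = (HttpServletRequest) req;
        String ua = httpReq.getHeader("User-Agent");
        if (ua != null && ua.toLowerCase().contains("googlebot")) {
            // getSession(false): don't create a session just to tune it.
            HttpSession session = httpReq.getSession(false);
            if (session != null) {
                // 60 seconds is a guess; the container default is typically 30 minutes.
                session.setMaxInactiveInterval(60);
            }
        }
    }

    public void destroy() {
    }
}
```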

How do others running Tomcat7 and Struts handle this kind of massive automated traffic?

The method I plan to try is to invalidate the session at the end of each request, in the page footer tile, if and only if the User-Agent string identifies the Google crawler. Is this an effective way to save memory?
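
Concretely, I mean something like this at the bottom of the footer tile (a rough sketch; the `googlebot` substring check is my assumption about how to recognize the crawler, and it has to be the last thing on the page, since touching the session after `invalidate()` throws an `IllegalStateException`):

```jsp
<%-- End of footer.jsp: throw away the crawler's session once the page has rendered. --%>
<%
    String ua = request.getHeader("User-Agent");
    if (ua != null && ua.toLowerCase().contains("googlebot")) {
        // getSession(false): don't create a session just to destroy it.
        HttpSession botSession = request.getSession(false);
        if (botSession != null) {
            botSession.invalidate();
        }
    }
%>
```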

What other strategies can help us efficiently handle bot traffic?

1 answer

I'm not really an expert in this area, but have you tried taking a look at http://www.robotstxt.org/?

As far as I know, that is the standard Google adheres to.
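
For what it's worth, a minimal robots.txt along these lines can at least keep crawlers out of pages that don't need indexing. The paths here are placeholders, and the Crawl-delay caveat is my understanding, not something from your site:

```
User-agent: *
# Crawl-delay is honored by some crawlers, but (as far as I know) not by
# Googlebot; Google's crawl rate is set in Google Webmaster Tools instead.
Crawl-delay: 10
# Placeholder paths: keep bots out of pages that don't need to be indexed.
Disallow: /search
Disallow: /checkout
```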

