Since, as mentioned earlier, you can spoof user agents and IPs, they cannot be used to reliably detect bots.
I work for a security company, and our bot detection algorithm looks something like this:
Step 1 - Data Collection:
a. Cross-validate the user agent against the IP address (both must match).
b. Check the HTTP headers (which ones are missing, their order, etc.)
c. Observe behavior (early access to robots.txt and compliance with it, overall behavior patterns, number of pages visited, traffic rates, etc.)
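As a sketch of the cross-validation in (a): legitimate search-engine crawlers publish which domains their IPs reverse-resolve to, so a claimed "Googlebot" user agent can be checked with a reverse DNS lookup followed by a forward-confirming lookup. The crawler/domain table below is a small illustrative subset, not a complete or authoritative list.

```python
import socket

# Illustrative subset of official crawler domains (not exhaustive)
CRAWLER_DOMAINS = {
    "Googlebot": ("googlebot.com", "google.com"),
    "bingbot": ("search.msn.com",),
}

def hostname_in_domains(hostname, domains):
    """True if hostname is one of the domains or a subdomain of one."""
    return any(hostname == d or hostname.endswith("." + d) for d in domains)

def verify_crawler_ip(ip, claimed_bot):
    """Reverse-resolve the IP, check the domain, then forward-confirm.

    Returns True only if the IP reverse-resolves into one of the bot's
    official domains AND that hostname resolves back to the same IP.
    """
    domains = CRAWLER_DOMAINS.get(claimed_bot)
    if not domains:
        return False
    try:
        hostname = socket.gethostbyaddr(ip)[0]  # reverse DNS lookup
    except OSError:
        return False
    if not hostname_in_domains(hostname, domains):
        return False
    try:
        # Forward-confirm: the hostname must resolve back to the same IP,
        # otherwise an attacker controlling their own reverse DNS could lie.
        return ip in socket.gethostbyname_ex(hostname)[2]
    except OSError:
        return False
```

The two-step (reverse, then forward) lookup matters: anyone who controls the reverse DNS for their own IP range can make it claim any hostname, but they cannot make the real `googlebot.com` zone resolve that hostname back to their IP.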
Step 2 - Classification:
By cross-validating the collected data, the visitor is classified as Good, Bad, or Suspicious.
Step 3 - Active Challenges:
Suspicious bots are presented with the following challenges:
a. JS challenge (can it execute JavaScript?)
b. Cookie challenge (can it accept cookies?)
c. If still inconclusive → CAPTCHA
This filtering mechanism is VERY effective, but I really don't think it can be replicated by a single person or even a non-specialized provider (for one thing, the challenges and the bot database must be constantly updated by a security team).
We offer some do-it-yourself tools in the form of Botopedia.org, our directory that can be used to cross-check IP / User-Agent pairs, but for a truly effective solution you will have to rely on specialized services.
There are several free bot-monitoring solutions, including our own, and most of them use the same strategy I described above (or something similar).
GL
Igal Zeifman