It depends on how much time the attacker has to collect the data. If most of the data is static, it might be worthwhile for an attacker to run his scraper for, say, 50 days. If he is on a DSL line where he can request a "new" IP address twice a day, a 1% per-IP limit will not hurt him.
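The arithmetic behind this can be sketched quickly. This is a back-of-the-envelope calculation, not a measurement; the 1% limit and two IP changes per day are the hypothetical numbers from above:

```python
# Hypothetical numbers from the scenario above, not real measurements.
per_ip_daily_limit = 0.01   # site allows each IP to fetch 1% of the data per day
ip_changes_per_day = 2      # DSL reconnects yield a fresh IP twice a day

# Each fresh IP gets its own quota, so the attacker's effective daily rate
# is the per-IP limit multiplied by the number of IPs he cycles through.
daily_fraction = per_ip_daily_limit * ip_changes_per_day
days_to_scrape_all = 1 / daily_fraction

print(f"{daily_fraction:.0%} per day -> {days_to_scrape_all:.0f} days for the full set")
# → 2% per day -> 50 days for the full set
```

So with static data, even a strict-looking per-IP limit only stretches the timeline; it does not stop the scrape.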
Of course, if you need the data faster (because it becomes outdated quickly), there are better ways: use EC2 instances, set up a BOINC project if there is public interest in the collected data, etc.
Alternatively, there is the pyramid-scheme approach, a la "get 10 people to run your scraper and you get PORN, get 100 people to run it and you get LOTS of porn", as was quite common a few years ago with ad-laden websites. Because participants compete over who collects the most referrals, you can quickly get many nodes running your crawler for very little money.
mihi