How would you protect a database of links from scraping?

I have a large database of links, all sorted in various ways and tied to other information, which makes it valuable (to some people).

Currently, my setup (which seems to work) simply calls a PHP file like link.php?id=123, which logs the request with a timestamp in the database. Before it spits out the link, it checks how many requests have been made from the same IP in the last 5 minutes. If that count is greater than x, it redirects you to a captcha page.
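For concreteness, here is a minimal sketch of that check, assuming a hypothetical MySQL schema with a "requests" log table (ip, requested_at) and a "links" table; the connection details, the limit of 50, and the captcha URL are all placeholder values.

    <?php
    // link.php?id=123 -- log the request, throttle by IP, then emit the link.
    $pdo = new PDO('mysql:host=localhost;dbname=mydb', 'user', 'pass');
    $ip  = $_SERVER['REMOTE_ADDR'];
    $id  = (int) $_GET['id'];

    // Register this request with a timestamp.
    $pdo->prepare('INSERT INTO requests (ip, requested_at) VALUES (?, NOW())')
        ->execute([$ip]);

    // How many requests from this IP in the last 5 minutes?
    $stmt = $pdo->prepare(
        'SELECT COUNT(*) FROM requests
         WHERE ip = ? AND requested_at > NOW() - INTERVAL 5 MINUTE');
    $stmt->execute([$ip]);

    if ($stmt->fetchColumn() > 50) {      // the "x" from the description
        header('Location: /captcha.php'); // placeholder captcha page
        exit;
    }

    // Look up the real URL and send the user there.
    $stmt = $pdo->prepare('SELECT url FROM links WHERE id = ?');
    $stmt->execute([$id]);
    header('Location: ' . $stmt->fetchColumn());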

Everything works fine and dandy, but the site is becoming very popular (and has also been getting DDoSed for about 6 weeks), so PHP is taking a beating, and I'm trying to minimize the work PHP has to do. I wanted to show the links in plain text instead of via link.php?id= and use an onclick function to simply add 1 to the view count. PHP still gets hit, but at least if it lags, it lags in the background, and the user immediately sees the link they requested.
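A rough sketch of that idea, with made-up names (count.php, a hit_count column): the page prints the real URL as plain HTML, and onclick fires a tiny background request that only bumps the counter, so slow PHP never delays the user.

    <?php
    // --- In the page that lists links: plain link plus a fire-and-forget
    //     counter ping via the old image-beacon trick. ---
    $id  = 123;
    $url = 'http://example.com/target'; // example value
    echo '<a href="' . htmlspecialchars($url) . '"'
       . ' onclick="(new Image()).src = \'/count.php?id=' . $id . '\';">'
       . htmlspecialchars($url) . '</a>';

    // --- count.php (separate file): does nothing but increment, so any
    //     lag here is invisible to the user. ---
    $pdo = new PDO('mysql:host=localhost;dbname=mydb', 'user', 'pass');
    $pdo->prepare('UPDATE links SET hit_count = hit_count + 1 WHERE id = ?')
        ->execute([(int) $_GET['id']]);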

The problem is that this makes the site REALLY easy to scrape. Is there anything I can do to prevent that while still not relying on PHP to run a check before spitting out the link?

+6
php, screen-scraping
5 answers

The bottleneck seems to be the database. Each request performs an INSERT (logging the request), then a SELECT (counting the requests from that IP in the last 5 minutes), and only then whatever database operations are needed for the application's main function.

Consider storing the request-throttling data (IP, request time) in server memory rather than burdening the database. Two solutions are the memcache ( http://www.php.net/manual/en/book.memcache.php ) and memcached ( http://php.net/manual/en/book.memcached.php ) extensions.
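A sketch of what that looks like with the memcached extension: one counter per IP with a 5-minute expiry, and no table to clean up afterwards (the server address and the limit of 50 are placeholders).

    <?php
    // Same 5-minute throttle as before, kept entirely in memory.
    $mc = new Memcached();
    $mc->addServer('127.0.0.1', 11211);

    $key = 'hits:' . $_SERVER['REMOTE_ADDR'];

    // add() only succeeds if the key does not exist yet; the 300-second
    // expiry makes the window reset itself with no cleanup job.
    $mc->add($key, 0, 300);
    $hits = $mc->increment($key);

    if ($hits !== false && $hits > 50) {
        header('Location: /captcha.php');
        exit;
    }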

As others noted, make sure indexes exist for any keys that are queried (fields such as the link ID). If the indexes are in place and the database is still overloaded, try an HTTP accelerator such as Varnish ( http://varnish-cache.org/ ).
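For illustration, the throttle query filters on IP and timestamp, so with the hypothetical "requests" table from the question, the index to check for would be a composite one on those two columns:

    <?php
    // One-time schema tweak; run once, not per request.
    $pdo = new PDO('mysql:host=localhost;dbname=mydb', 'user', 'pass');
    $pdo->exec(
        'CREATE INDEX idx_requests_ip_time ON requests (ip, requested_at)');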

+2

You can do the IP throttling at the web server level. A module may already exist for your web server; or, as an example with Apache, you can write your own RewriteMap and have it consult a daemon program, which lets you do more complex things. Have the daemon query an in-memory database. It will be fast.
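A rough sketch of the Apache variant, reusing the in-memory counter from the memcached answer; the map name, script path, and threshold are all invented for the example. Apache writes one lookup key per line to the daemon's stdin and expects one answer per line on its stdout.

    #!/usr/bin/env php
    <?php
    // ipcheck.php -- a RewriteMap "prg:" daemon. The httpd.conf side,
    // roughly:
    //   RewriteEngine On
    //   RewriteMap  ipcheck "prg:/usr/local/bin/ipcheck.php"
    //   RewriteCond ${ipcheck:%{REMOTE_ADDR}} =deny
    //   RewriteRule ^/link\.php$ /captcha.php [R,L]
    $mc = new Memcached();
    $mc->addServer('127.0.0.1', 11211);

    while (($ip = fgets(STDIN)) !== false) {
        $key = 'hits:' . trim($ip);
        $mc->add($key, 0, 300);              // 5-minute window
        $hits = $mc->increment($key);
        echo ($hits !== false && $hits > 50) ? "deny\n" : "allow\n";
        flush();                             // Apache reads line by line
    }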

+1

Check your database. Is your indexing correct? A table with many entries can become very slow very quickly. You could also run a nightly process that deletes entries older than an hour, etc.
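For example, a small cleanup script run from cron every hour (table and column names follow the hypothetical schema used above):

    <?php
    // cleanup.php -- crontab entry, e.g.: 0 * * * * php /path/to/cleanup.php
    $pdo = new PDO('mysql:host=localhost;dbname=mydb', 'user', 'pass');
    $deleted = $pdo->exec(
        'DELETE FROM requests WHERE requested_at < NOW() - INTERVAL 1 HOUR');
    echo "purged $deleted old request rows\n";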

If that doesn't work, you are looking at upgrading / load-balancing your server. Linking directly to pages will only buy you so much time before you have to upgrade anyway.

0

Most scrapers simply parse static HTML, so encode your links and then decode them dynamically in the client's web browser using JavaScript.

Determined scrapers can still get around this, but then, they can get around any technique if the data is valuable enough.
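One possible shape for this, using base64 as the (deliberately trivial) encoding; the class and attribute names are made up, and atob()/querySelectorAll() assume a reasonably modern browser:

    <?php
    // Server side: the static HTML never contains the real URL.
    $url = 'http://example.com/target'; // example value
    echo '<a href="#" class="enc" data-l="' . base64_encode($url) . '">link</a>';
    ?>
    <script>
    // Client side: rebuild every encoded href once the page has loaded;
    // naive scrapers that only parse the HTML never see the real URL.
    document.addEventListener('DOMContentLoaded', function () {
        var links = document.querySelectorAll('a.enc');
        for (var i = 0; i < links.length; i++) {
            links[i].href = atob(links[i].getAttribute('data-l'));
        }
    });
    </script>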

0

Anything you do purely on the client side cannot be protected. Why not just use AJAX?

Have an onClick event call an AJAX function that returns only the link and fills it into a DIV on your page. Since the request is small, it will be fast enough for what you need. Just make sure the function you call checks the timestamp: it is easy to write a script that calls that function repeatedly to harvest all your links.

You can check out jQuery or other AJAX libraries (I use jQuery and sAjax). I have many pages whose content changes dynamically and very quickly; the client doesn't even realize it isn't pure JS.
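A sketch of the endpoint side, reusing the per-IP timestamp check from the question (getlink.php and the schema names are placeholders; the jQuery call at the bottom shows one way to wire it up):

    <?php
    // getlink.php -- returns nothing but the URL, after the same throttle.
    $pdo = new PDO('mysql:host=localhost;dbname=mydb', 'user', 'pass');
    $ip  = $_SERVER['REMOTE_ADDR'];

    $pdo->prepare('INSERT INTO requests (ip, requested_at) VALUES (?, NOW())')
        ->execute([$ip]);
    $stmt = $pdo->prepare(
        'SELECT COUNT(*) FROM requests
         WHERE ip = ? AND requested_at > NOW() - INTERVAL 5 MINUTE');
    $stmt->execute([$ip]);

    if ($stmt->fetchColumn() > 50) {
        http_response_code(429);  // too many requests
        exit;
    }

    $stmt = $pdo->prepare('SELECT url FROM links WHERE id = ?');
    $stmt->execute([(int) $_GET['id']]);
    echo $stmt->fetchColumn();

    // Client side (jQuery), roughly:
    //   $('#linkbox').load('/getlink.php?id=123');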

0
