Web Crawlers and Google App Engine Hosted Applications

Is it impossible to run a web crawler on GAE alongside my application, given that I'm running the free version?

+4
4 answers

As long as Google has not exposed scheduling, queue, and background task APIs, you can do processing only in response to an external HTTP request. You will need a heartbeat service that makes the application process one item from the crawler queue at a time (so as not to hit the GAE limits).

To crawl from GAE, you need to split the application into three parts: a queue (with the queue data stored in the Datastore), a queue processor that responds to the external HTTP heartbeat, and your actual crawl logic.
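The three-part split described above could be sketched roughly as follows. This is only an illustration under stated assumptions: `CrawlQueue` is an in-memory stand-in for a Datastore-backed queue, `fetch` is a hypothetical callable that returns a page body for a URL, and `handle_heartbeat` is the function your HTTP handler would call once per heartbeat request. None of these names come from the GAE SDK.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkParser(HTMLParser):
    """Collects href values from <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


class CrawlQueue:
    """Queue part. In-memory here; on GAE it would live in the Datastore (assumption)."""

    def __init__(self, seeds):
        self.pending = deque(seeds)
        self.seen = set(seeds)

    def pop(self):
        # Return the next URL to crawl, or None when nothing is pending.
        return self.pending.popleft() if self.pending else None

    def push(self, url):
        # Enqueue a URL only the first time it is seen.
        if url not in self.seen:
            self.seen.add(url)
            self.pending.append(url)


def crawl_one(url, fetch):
    """Crawl logic: fetch one page and return the absolute links found on it."""
    parser = LinkParser()
    parser.feed(fetch(url))
    return [urljoin(url, href) for href in parser.links]


def handle_heartbeat(queue, fetch):
    """Queue processor: handles exactly one queued URL per heartbeat request.

    Returns the URL that was processed, or None if the queue was empty.
    """
    url = queue.pop()
    if url is None:
        return None
    for link in crawl_one(url, fetch):
        queue.push(link)
    return url
```

Because `fetch` is passed in, the same processor can be exercised locally with a dictionary of canned pages before it is wired to a real HTTP handler and a real fetch call.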

You need to monitor quota usage manually: start the heartbeat when you have spare quota, and stop it when the quota is about to be used up.
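If you would rather not watch the quota by hand, the start/stop rule could be automated on the side that sends the heartbeat. The sketch below is an assumption-heavy illustration: `read_usage` and `send_heartbeat` are hypothetical callables you would supply yourself (for example, your own counter of requests made and an HTTP GET to the processor URL), and the quota units and the `reserve` margin are arbitrary.

```python
def should_beat(used, daily_limit, reserve):
    """Return True while usage plus a safety reserve stays under the limit."""
    return used + reserve < daily_limit


def run_heartbeats(send_heartbeat, read_usage, daily_limit, reserve, max_beats):
    """Send heartbeats until quota headroom runs out or max_beats is reached.

    send_heartbeat: hypothetical callable that triggers one queue item.
    read_usage: hypothetical callable returning quota used so far.
    Returns the number of heartbeats sent.
    """
    sent = 0
    while sent < max_beats and should_beat(read_usage(), daily_limit, reserve):
        send_heartbeat()
        sent += 1
    return sent
```

Keeping a reserve means the crawler backs off before the application's regular traffic is starved of quota.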

Once Google introduces the APIs I mentioned at the beginning, you will want to rewrite the parts that can be implemented more efficiently with them.

UPDATE: Google introduced the Task Queue API some time ago. See the task queue documentation for Python and Java.

+3

App Engine application code only runs in response to HTTP requests, so you cannot run a persistent crawler in the background. With the upcoming release of scheduled tasks, you could write a crawler that uses that functionality, but it would be less than ideal.

+1

I suppose you can run it (i.e. it is not impossible), but it will be slow and you will hit the limits pretty quickly. Since CPU quotas will be reduced even further in late May, I would recommend against it.

0

It is possible. But this is not really an application for App Engine, as Arachnid wrote. Even if you manage to make it work, I doubt that you will stay within the quotas for free accounts.

0
