Until Google exposes scheduling, a queue API, and background tasks, you can perform processing only in response to an external HTTP request. You will need a heartbeat service that triggers the processing of one item from the crawler queue at a time (so as not to run into the GAE limits).
To crawl on GAE, you need to split the application into three parts: a queue (with its data stored in the Datastore), a queue processor that responds to an external HTTP heartbeat, and your actual crawl logic.
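To make the split concrete, here is a minimal sketch in plain Python. It is not GAE code: `CrawlQueue` is an in-memory stand-in for a Datastore-backed queue, and `handle_heartbeat`, `crawl`, and the `fetch` callable are hypothetical names chosen for illustration. On GAE, `handle_heartbeat` would be the body of the request handler that the external heartbeat hits.

```python
from collections import deque


class CrawlQueue:
    """In-memory stand-in for a Datastore-backed queue (an assumption)."""

    def __init__(self, seed_urls):
        self._pending = deque(seed_urls)
        self._seen = set(seed_urls)

    def pop(self):
        """Return the next URL to crawl, or None if the queue is empty."""
        return self._pending.popleft() if self._pending else None

    def push(self, url):
        """Enqueue a URL unless it has already been queued once."""
        if url not in self._seen:
            self._seen.add(url)
            self._pending.append(url)

    def __len__(self):
        return len(self._pending)


def crawl(url, fetch):
    """Crawl logic: fetch one page and return the links found on it.

    `fetch` is a hypothetical callable mapping a URL to a list of links.
    """
    return fetch(url)


def handle_heartbeat(queue, fetch):
    """Queue processor: run once per external heartbeat request.

    Processes at most one URL so that each request stays short.
    Returns the URL processed, or None if the queue was empty.
    """
    url = queue.pop()
    if url is None:
        return None
    for link in crawl(url, fetch):
        queue.push(link)
    return url


if __name__ == "__main__":
    # Hypothetical link graph standing in for real HTTP fetches.
    fake_web = {"a": ["b", "c"], "b": ["a"], "c": []}
    queue = CrawlQueue(["a"])
    # Each loop iteration plays the role of one heartbeat request.
    while handle_heartbeat(queue, lambda u: fake_web.get(u, [])) is not None:
        pass
```

Keeping the crawl logic separate from the queue processor means only `handle_heartbeat` has to change if the trigger mechanism changes later.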
You need to monitor quota usage manually: start the heartbeat when you have spare quota, and stop it before the quota is used up.
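If you would rather not watch the quota by hand, the start/stop decision can be approximated on the heartbeat side. The sketch below is only an illustration under assumed numbers: `QuotaGate`, its `daily_budget` and `reserve` parameters, and `run_heartbeats` are all hypothetical, and the counter is local bookkeeping rather than anything read from GAE.

```python
class QuotaGate:
    """Tracks a hypothetical daily request budget for the heartbeat."""

    def __init__(self, daily_budget, reserve):
        self.daily_budget = daily_budget
        self.reserve = reserve  # requests kept back for regular traffic
        self.used = 0

    def may_beat(self):
        """True while spare quota remains beyond the reserve."""
        return self.used < self.daily_budget - self.reserve

    def record(self, count=1):
        """Note that `count` heartbeat requests were sent."""
        self.used += count

    def reset(self):
        """Call when the quota period rolls over."""
        self.used = 0


def run_heartbeats(gate, send_heartbeat, max_beats):
    """Send heartbeats until the gate closes or `max_beats` is reached.

    `send_heartbeat` is a hypothetical callable that issues one HTTP
    request to the queue processor. Returns the number of beats sent.
    """
    sent = 0
    while sent < max_beats and gate.may_beat():
        send_heartbeat()
        gate.record()
        sent += 1
    return sent
```

The reserve exists so that the crawler never consumes the requests that the rest of the application needs.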
Once Google introduces the APIs I mentioned at the beginning, you will have to rewrite the parts that can be implemented more efficiently with them.
UPDATE: Google introduced the Task Queue API some time ago. See the Task Queue documentation for Python and Java.