I have a daily cron job that calls an API and fetches some data. For each row of data, I enqueue a task for further processing (which involves looking up more data via additional APIs). Once all those tasks have finished, the data doesn't change for the next 24 hours, so I cache it.
Is there a way to find out when all the tasks I've queued have finished, so that I can cache the data at that point?
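One idea I've considered is a shared completion counter: increment it for every task enqueued, decrement it when each task finishes, and trigger the caching step when it hits zero. Here's a minimal framework-agnostic sketch of that pattern (the class and the `on_complete` callback are my own names, not part of any App Engine API; on App Engine the counter would live in memcache or the datastore rather than in memory):

```python
import threading

class CompletionCounter:
    """Tracks outstanding tasks; fires a callback when the last one finishes."""
    def __init__(self, on_complete):
        self._lock = threading.Lock()
        self._pending = 0
        self._on_complete = on_complete

    def task_enqueued(self):
        # Called once per task added to the queue
        with self._lock:
            self._pending += 1

    def task_finished(self):
        # Called at the end of each task; runs the callback on the last one
        with self._lock:
            self._pending -= 1
            done = (self._pending == 0)
        if done:
            self._on_complete()

results = []
counter = CompletionCounter(on_complete=lambda: results.append('cache now'))
for _ in range(3):
    counter.task_enqueued()
for _ in range(3):
    counter.task_finished()
# the callback fires exactly once, after the third task finishes
```

The catch on App Engine is that increments/decrements against memcache aren't transactional with task execution, so a decrement can be lost if a task retries; a datastore counter would be more robust.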
At the moment I'm doing this in a pretty hacky way, by scheduling two cron jobs a few minutes apart:
    import urllib
    from datetime import date

    import simplejson
    from google.appengine.api import memcache, taskqueue
    from google.appengine.ext import webapp

    class fetchdata(webapp.RequestHandler):
        def get(self):
            # Clear today's cache entry before refetching
            todaykey = str(date.today())
            memcache.delete(todaykey)
            # Fetch today's album list from the Topsy API
            topsyurl = 'http://otter.topsy.com/search.json?q=site:open.spotify.com/album&window=d&perpage=20'
            f = urllib.urlopen(topsyurl)
            response = f.read()
            f.close()
            d = simplejson.loads(response)
            albums = d['response']['list']
            # Queue one processing task per album
            for album in albums:
                taskqueue.add(url='/spotifyapi/', params={'url': album['url'], 'score': album['score']})
    class flushcache(webapp.RequestHandler):
        def get(self):
            todaykey = str(date.today())
            memcache.delete(todaykey)
Then my cron.yaml looks like this:
    cron:
    - description: gettopsy
      url: /fetchdata/
      schedule: every day 01:00
      timezone: Europe/London
    - description: flushcache
      url: /flushcache/
      schedule: every day 01:05
      timezone: Europe/London
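Since the data is stable for 24 hours, another option I've wondered about is skipping the flushcache cron entirely and letting the cache entry expire on its own: App Engine's memcache `set()` accepts an expiry time in seconds, so storing with `time=86400` would make the entry drop out by itself. A small framework-agnostic sketch of that TTL behaviour (the `TTLCache` class is purely illustrative, standing in for `memcache.set(key, value, time=ttl)`):

```python
import time

class TTLCache:
    """Toy cache where entries expire ttl seconds after being set."""
    def __init__(self):
        self._store = {}

    def set(self, key, value, ttl):
        # Record the value together with its absolute expiry time
        self._store[key] = (value, time.time() + ttl)

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires = entry
        if time.time() >= expires:
            # Entry has expired; evict and report a miss
            del self._store[key]
            return None
        return value

cache = TTLCache()
cache.set('2010-01-01', ['album data'], ttl=0.05)
fresh = cache.get('2010-01-01')   # hit while still within the TTL
time.sleep(0.1)
stale = cache.get('2010-01-01')   # miss once the TTL has elapsed
```

That still wouldn't tell me when the tasks have finished, though; it only removes the need for the second cron job.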
Basically I'm assuming all my tasks will take less than 5 minutes to run, so I flush the cache 5 minutes after fetching and the next request re-caches the fresh data. Is there a better way of coding this? Thanks.