Speeding up templates in GAE-Py by combining RPC calls

Here is my problem:

    class City(Model):
        name = StringProperty()

    class Author(Model):
        name = StringProperty()
        city = ReferenceProperty(City)

    class Post(Model):
        author = ReferenceProperty(Author)
        content = StringProperty()

The code itself is not important ... it's this Django template:

    {% for post in posts %}
      <div>{{ post.content }}</div>
      <div>by {{ post.author.name }} from {{ post.author.city.name }}</div>
    {% endfor %}

Now let's say I fetch the first 100 posts with Post.all().fetch(limit=100) and pass that list to the template - what happens?

It triggers 200 more datastore gets: 100 to fetch each post's author, and another 100 to fetch each author's city.

This is understandable, since a post only holds a key to its author, and the author only holds a key to the city. The __get__ descriptor on post.author and author.city transparently fetches the referenced entity by key (see this question).

Some workarounds:

  • Use Post.author.get_value_for_datastore(post) to collect the author keys (see the link above) and then do a batch get on all of them - the problem is that we then have to rebuild the data objects passed to the template, which requires extra code and maintenance for every model and handler.
  • Write an accessor, say cached_author , that checks memcache for the author first and returns it. The problem is that post.cached_author will be called 100 times, which probably means 100 memcache calls.
  • Keep a static key-to-object map (refreshed, say, every five minutes) if the data does not have to be perfectly fresh. The cached_author accessor can then simply consult that map.
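To make the second workaround concrete, here is a minimal sketch of the cached_author idea. Since the GAE SDK is not assumed here, a plain dict stands in for memcache and a counter simulates datastore round trips; in real code you would use google.appengine.api.memcache and db.get. All names below are invented for illustration.

```python
# Sketch of the cached_author accessor (workaround 2).
# A plain dict stands in for memcache; fetch_author() simulates a
# single-entity datastore get and counts the round trips.

_author_cache = {}   # stand-in for memcache
datastore_gets = 0   # simulated datastore round-trip counter

_AUTHORS = {"a1": {"name": "Alice"}, "a2": {"name": "Bob"}}

def fetch_author(key):
    """Simulated single-entity datastore get."""
    global datastore_gets
    datastore_gets += 1
    return _AUTHORS[key]

def cached_author(post):
    """Return the post's author, consulting the cache first."""
    key = post["author_key"]
    if key not in _author_cache:
        _author_cache[key] = fetch_author(key)  # cache miss: one get
    return _author_cache[key]

posts = [{"author_key": "a1"}, {"author_key": "a2"}, {"author_key": "a1"}]
names = [cached_author(p)["name"] for p in posts]
# Only 2 simulated datastore gets for 3 posts, since "a1" is cached
# after the first hit -- but cached_author itself still runs per post,
# which is the per-post memcache-call problem described above.
```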

All of these ideas require extra code and maintenance, and none of them is transparent. What if we could just write:

    @prefetch
    def render_template(path, data):
        return template.render(path, data)

It turns out we can ... hooks and Guido's instrumentation module prove it. If the @prefetch method wraps template rendering by capturing which keys are requested, we can (for at least one depth level) record those keys, batch-get the corresponding objects, and build a key-to-object map. This can be repeated at each depth level until no new keys are requested. The final render can then intercept the getters and serve objects from the map.
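The repeat-until-no-new-keys idea can be sketched in plain Python. Here a dict stands in for the datastore and a resolve() callback stands in for ReferenceProperty.__get__, since hooking the real SDK is exactly the open question; every name below is invented for illustration.

```python
# Two-pass prefetch sketch: dry-run the "template" with a recording
# resolver, batch-get the keys it asked for, and repeat one level
# deeper until no new keys appear.  A dict simulates the datastore.

batch_gets = 0  # counts simulated batch round trips

_DATASTORE = {
    "author:1": {"name": "Alice", "city": "city:1"},
    "city:1": {"name": "Springfield"},
}

def db_get_multi(keys):
    """Simulated batch get: one round trip for any number of keys."""
    global batch_gets
    batch_gets += 1
    return {k: _DATASTORE[k] for k in keys}

def render(posts, resolve):
    """Stand-in template: dereferences post.author, then author.city."""
    out = []
    for post in posts:
        author = resolve(post["author"])
        city = resolve(author["city"])
        out.append("%s by %s from %s"
                   % (post["content"], author["name"], city["name"]))
    return out

def prefetch_render(posts):
    cache = {}
    while True:
        wanted = set()
        def recording_resolve(key):
            if key in cache:
                return cache[key]
            if key is not None:
                wanted.add(key)           # record the requested key
            return {"name": "", "content": "", "city": None}  # dummy
        render(posts, recording_resolve)  # dry run, output discarded
        if not wanted:
            break                         # no new keys: map is complete
        cache.update(db_get_multi(wanted))  # one batch per depth level
    return render(posts, cache.__getitem__)  # final render, all cached

posts = [
    {"content": "Hello", "author": "author:1"},
    {"content": "World", "author": "author:1"},
]
lines = prefetch_render(posts)
# 2 batch gets total (authors, then cities), regardless of post count.
```

With the initial post fetch that makes 3 round trips in all, which is the 200-to-3 reduction argued for below.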

This would take the total from 200 gets down to 3, transparently and with no extra code. Not to mention greatly reducing the need for memcache, and helping in situations where memcache cannot be used.

The problem is that I do not know how to do this (yet). Before I start, has anyone else done this already? Or does anyone want to help? Or do you see a huge flaw in the plan?

python google-app-engine django-templates
2 answers

I was in a similar situation. Instead of a ReferenceProperty I had a parent/child relationship, but the basics are the same. My current solution is not polished, but it is at least efficient enough for reports and pages with 200-1,000 entities, each with several child entities that need fetching.

You can manually fetch the data in batches and stitch it back together yourself:

    # Given the posts, fetch all the data the template will need
    # with just 2 batch gets from the datastore.
    posts = get_the_posts()

    author_keys = [Post.author.get_value_for_datastore(x) for x in posts]
    authors = db.get(author_keys)

    city_keys = [Author.city.get_value_for_datastore(x) for x in authors]
    cities = db.get(city_keys)

    for post, author, city in zip(posts, authors, cities):
        post.author = author
        author.city = city

Now rendering the template will trigger no further queries or gets. It's rough around the edges, but I couldn't live without the trick I just described.

You may also want to check that none of your entities are None, since db.get() returns None for a bad key. That is just basic data validation, though. Similarly, you need to retry db.get() on a timeout, etc.
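That None check can be a one-liner. A small sketch, assuming entities come back in the same order as the keys passed to the batch get (the helper name is invented):

```python
# db.get() returns None for any key whose entity does not exist, in the
# same position as the key that requested it.  Collect the bad keys
# before stitching the results onto your posts.
def missing_keys(keys, entities):
    """Return the keys whose entities came back as None from a batch get."""
    return [k for k, e in zip(keys, entities) if e is None]

# Example with plain stand-in values instead of real datastore entities:
missing = missing_keys(["k1", "k2", "k3"],
                       [{"name": "a"}, None, {"name": "c"}])
# missing == ["k2"]
```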

(Finally, I don't think memcache will work as the primary solution. Perhaps as a secondary layer to speed up datastore calls, but your app needs to work well when memcache is empty. Also, memcache has quotas of its own, such as API calls and total data transferred. Overusing memcache is a great way to kill your app.)

