When to use a certain type of persistence in Google App Engine?

First, let me clarify the question. By persistence, I mean storing data beyond the lifetime of a single request. This may not be the best title for the question, so feel free to edit it.

As I see it, there are three types of persistence in GAE, each of which is “closer” to the request itself:


Datastore

This is where all the data is likely to live. It may temporarily be copied to the higher levels of storage, but in the end, this is where the data really is. Unfortunately, querying the datastore is reportedly slow and consumes a lot of resources.

Use when ...

  • Storing data that must be kept for an indefinite period of time.

Avoid using when ...

  • Retrieving data that is requested often but rarely updated.

Memcache

This is a fairly sophisticated caching mechanism that stores data in memory and ensures that all users read from and write to the same cache. Getting and setting data by key is much faster here than in the datastore. Unfortunately, data only stays in memory for so long, and there is no guarantee that it will stay for as long as you asked; it may disappear at any time if the memory is needed elsewhere.

Use when ...

  • You need to read data more often than you need to update it. Even when data is updated frequently, you may still be able to take advantage of memcache (if a few lost updates are acceptable) by setting up a task queue that saves the data from memcache to the datastore.

Avoid using when ...

  • The data is updated frequently and must be up to date whenever it is read.
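
The task-queue idea mentioned above can be sketched roughly as follows. This is only an illustration of the write-behind pattern, not App Engine code: the two dicts are stand-ins for memcache and the datastore, and the function names (`increment_views`, `flush_views`, `get_views`) are made up for the example. On App Engine, `flush_views` would presumably be called from a task queue handler.

```python
# Stand-ins so the sketch runs anywhere (assumption: on App Engine these
# would be memcache and a datastore entity).
cache = {}      # plays the role of memcache; entries may vanish at any time
datastore = {}  # plays the role of the datastore; durable


def increment_views(page_id):
    """Record a page view in the cache only, which is cheap and fast."""
    key = "views:" + page_id
    cache[key] = cache.get(key, 0) + 1


def flush_views(page_id):
    """Move the pending count from the cache into the datastore."""
    key = "views:" + page_id
    pending = cache.pop(key, 0)  # 0 if the entry was evicted or never set
    datastore[key] = datastore.get(key, 0) + pending
    return datastore[key]


def get_views(page_id):
    """Durable count plus whatever is still pending in the cache."""
    key = "views:" + page_id
    return datastore.get(key, 0) + cache.get(key, 0)
```

If the cache entry is evicted before a flush, the pending views are simply lost, which is the "few lost updates" trade-off described above.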

Global variables

This is not an official method of persisting data, but it works. However, it is the least reliable method, and since there is no synchronization of the data between servers, the stored data can differ between users (although from what I have found, the server rarely changes for the same user). In theory, though, this should be the method with the least overhead when getting/setting values, and there may be uses for it.

Use when ...

  • When hell freezes over? I don't know ... I don't know enough about what goes on behind the scenes to actually rely on this method. Discuss!

Avoid using when ...

  • You rely on the data being the same on all servers.
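
One way a global variable might be used with limited risk is as a small per-instance cache whose entries expire. The sketch below is plain Python and assumes nothing about App Engine itself; `get_cached`, its parameters, and the `now` argument (there only to make the behaviour easy to check) are hypothetical names for this example.

```python
import time

# Module-level dict: it lives as long as this server instance does and is
# NOT shared with other instances.
_local_cache = {}


def get_cached(key, compute, ttl_seconds=60, now=None):
    """Return the cached value for key, recomputing it once it has expired."""
    now = time.time() if now is None else now
    entry = _local_cache.get(key)
    if entry is not None and entry[1] > now:
        return entry[0]  # still fresh on this instance
    value = compute()    # the expensive call, e.g. a datastore query
    _local_cache[key] = (value, now + ttl_seconds)
    return value
```

Because each instance keeps its own copy, two users may see values that differ by up to `ttl_seconds`, so this only suits data that does not need to be consistent.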

Cookies

If the data is user-specific, it may be effective to save it as a cookie in the user's browser. There are some pitfalls to watch out for:

  • Security - the user can tamper with cookies, and so can attackers. To keep the content unreadable and unmodifiable by anyone, the cookie can be encrypted using the PyCrypto library available in GAE.
  • Performance - since cookies are sent with every request (even for images), they add to the bandwidth used and can slow down requests. One solution is to serve static content from a different domain, so that the browser does not send cookies for that content.
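
As a rough illustration of the security point, the sketch below signs a cookie value with an HMAC from the standard library. Note that this is a different technique from the PyCrypto encryption mentioned above: signing makes tampering detectable, but the value stays readable. `SECRET_KEY`, `sign_cookie` and `verify_cookie` are hypothetical names for this example.

```python
import hashlib
import hmac

# Hypothetical secret; a real application would keep it out of source code.
SECRET_KEY = b"replace-with-a-real-secret"


def _digest(value):
    return hmac.new(SECRET_KEY, value.encode("utf-8"), hashlib.sha256).hexdigest()


def sign_cookie(value):
    """Return the value with its signature appended after a '|'."""
    return value + "|" + _digest(value)


def verify_cookie(cookie):
    """Return the original value, or None if the cookie was tampered with."""
    value, sep, digest = cookie.rpartition("|")
    if not sep:
        return None  # no signature present at all
    expected = _digest(value)
    # Constant-time comparison, to avoid leaking how much of the digest matched.
    if hmac.compare_digest(digest.encode("utf-8"), expected.encode("utf-8")):
        return value
    return None
```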

When should you use the different types of persistence? How can they be combined to reduce/balance the amount of resources spent?

4 answers

Datastore

Use the datastore to hold any long-lived information. The datastore should be used in the same way as a regular database, for storing the data that your website/application will use.

Memcache

Use this to access data much faster than you could from the datastore. Memcache returns data quickly and can be used for any data that needs to span multiple calls from users. It is usually data that was originally in the datastore and was then copied to memcache.

    from google.appengine.api import memcache

    def get_data():
        data = memcache.get("key")
        if data is not None:
            return data
        # Cache miss: get the data from the datastore.
        # query_for_data() stands for whatever query your application runs.
        data = query_for_data()
        # Cache the result for 60 seconds.
        memcache.add("key", data, 60)
        return data

Memcache will flush the item when it expires. You set the expiry time in the last parameter of the add call shown above.

Global variables

I would not use them at all, since they cannot span instances. In GAE, a request creates a new instance, at least in Python. If you are tempted to use global variables, I would store the data in memcache instead.


Your post is a good summary of the 3 main options. You have basically answered the question already. However, if you are currently building an application and stressing over whether you need memcache for something, try the following:

  • Write your application using the datastore for everything that needs to survive more than one request.
  • Once your application (or some usable subset of it) is running, run some functional tests or simulations to see where the slow (or quota-heavy) spots are.
  • Find the slowest or most inefficient request path and work out how to make it faster (either by using memcache, by changing your data structures so that you can do gets instead of queries, or possibly by storing something in a global instance variable *).
  • Repeat the previous two steps until you are satisfied.

* Good candidates for a "global" variable are things that are relatively expensive to create/retrieve, that a significant portion of your requests will use, and that do not need to be consistent between requests/users.


I use a global variable to speed up JSON conversion. Before converting a data structure to JSON, I hash it and check whether the JSON for it is already available. For my application, this gives a pretty good speedup, since the pure Python implementation is rather slow.


Global variables

In addition to AutomatedTester's answer, and to answer his follow-up question on how to pass information between GETs without memcache or the datastore, here is a quick illustration of the use of global variables:

    # Initialize the counter only once per instance.
    if 'i' not in globals():
        i = 0

    def main():
        global i
        i += 1
        print 'Status: 200'
        print 'Content-type: text/plain\n'
        print i

    if __name__ == '__main__':
        main()

Calling this script several times will give you 1, 2, 3 ... Of course, as Blixt mentioned earlier, you should not rely too much on this trick ("i" can sometimes reset to zero), but it can be useful for keeping user information in a dictionary, for example session data.
