Persistent Python object in memory for nginx / uwsgi server

I doubt this is even possible, but here is the problem and the proposed solution (the appropriateness of the proposed solution is the subject of this question):


I have some "global data" that should be available to all requests. I persist this data in Riak and use Redis as a caching layer for access speed (for now...). The data is split into about 30 logical blocks, each about 8 KB.

Each request needs to read 4 of these 8 KB blocks, resulting in 32 KB of data read from Redis or Riak. This is in ADDITION to any request-specific data that also needs to be read (which is quite a bit).

Assuming even 3000 requests per second (this is not a live server, so I don't have real numbers, but 3000 rps is a reasonable assumption, possibly more), that works out to 96 MB/s (3000 × 32 KB) transferred from Redis or Riak, IN ADDITION to the already non-trivial other calls being made from the application logic. On top of that, Python is parsing the JSON of these 8 KB objects 3000 times every second.


All of this - especially Python having to deserialize the same data over and over - seems completely unnecessary. A perfectly elegant solution would be to simply cache the deserialized data in an in-memory Python object, which I could refresh periodically as this "static" data becomes stale: once every few minutes (or hours), instead of 3000 times per second.

But I don't know if this is even possible. You'd really need an "always running" application to cache any data in its memory, and I know that is not the case with the nginx + uwsgi + python combination (as opposed to something like node): Python in-memory data will NOT persist across requests, as far as I know, unless I'm terribly mistaken.

Unfortunately, this is a system I "inherited", so I can't make too many changes to the underlying technology, and I don't know enough about how the nginx + uwsgi + python combination works in terms of spawning Python processes and persisting data in Python memory, which means I MAY be terribly mistaken in my assumption above!


So, direct advice on whether this solution would work, plus pointers to material that would help me understand how nginx + uwsgi + python works in terms of starting new processes and allocating memory, would help a lot.

PS:

  • I've gone through some of the docs for nginx, uwsgi, etc., but I still don't fully understand the ramifications for my particular use case. Hopefully I can make some progress on that.

  • If the in-memory thing CAN work, I would throw out Redis, since I'm caching only the static data mentioned above in it. That makes an in-process persistent in-memory Python cache even more attractive to me, removing one moving part from the system and at least four network round-trips per request.

+7
4 answers

What you're suggesting isn't directly feasible. Since new processes can be spun up and down outside of your control, there is no way to keep native Python data in memory.

However, there are several ways around this.

Often, a single level of key-value storage is all you need. And sometimes, having fixed-size buffers for values (which you can use directly as str / bytes / bytearray objects; anything else you need to pack in there with struct or otherwise serialize) is all you need. In that case, uWSGI's built-in caching framework will take care of everything you need.
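
For illustration, a minimal sketch of that approach. Assumptions: a cache named mycache configured in the ini (e.g. cache2 = name=mycache,items=64,blocksize=16384), and a hypothetical load_block_from_riak helper; the uwsgi module is only importable inside a uWSGI process:

    import json

    import uwsgi  # only available when running under uWSGI

    def get_block(block_id):
        # Fetch one ~8 KB block, falling back to Riak on a cache miss.
        raw = uwsgi.cache_get(block_id, 'mycache')  # bytes, or None on miss
        if raw is None:
            raw = load_block_from_riak(block_id)  # hypothetical loader
            uwsgi.cache_set(block_id, raw, 0, 'mycache')  # expire=0: never
        return json.loads(raw)
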

If you need more precise control, you can look at how that cache is implemented on top of SharedArea and build something custom. However, I wouldn't recommend that. It basically gives you the same kind of API you get with a file, and the only real advantages over just using a file are that the server manages the file's lifetime; it works in all uWSGI-supported languages, even those that don't allow files; and it makes it easier to migrate your custom cache to a distributed (multi-computer) cache later if you need to. I don't think any of those are relevant to you.

Another way to get flat key-value storage, but without the fixed-size buffers, is Python's stdlib anydbm . The key-value lookup is as pythonic as it gets: it looks just like a dict , except that it's backed by an on-disk BDB (or similar) database, cached as appropriate in memory, instead of being stored in an in-memory hash table.
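
A minimal sketch, assuming Python 3, where anydbm has become the dbm module (keys and values are bytes; the file path is illustrative):

    import dbm

    db = dbm.open('/tmp/static_blocks', 'c')  # 'c': create the file if missing
    db[b'block:1'] = b'{"some": "json"}'      # store a raw JSON blob on disk
    raw = db[b'block:1']                      # read it back as bytes
    db.close()
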

If you need to handle a few other simple types - anything that's blazingly fast to un/pickle, like int s - you may want to consider shelve .
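
A sketch of the same idea with shelve , which pickles arbitrary Python values for you (the file path is illustrative):

    import shelve

    with shelve.open('/tmp/static_shelf') as db:
        db['block:1'] = {'users': 42, 'flags': [1, 2, 3]}  # pickled on write
        block = db['block:1']                              # unpickled on read
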

If your structure is rigid enough, you can use a key-value database for the top level but access the values through a ctypes.Structure , or de/serialize them with struct . But usually, if you can do that, you can also eliminate the top level, at which point your whole thing is just one big Structure or Array .

At that point, you can just use a plain file for storage - either mmap it (for the Structure ), or just open and read it (for the struct ).
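
A hedged sketch of the mmap variant; the Block layout and file path are purely illustrative, and the file must already be at least sizeof(Block) bytes long:

    import ctypes
    import mmap

    class Block(ctypes.Structure):
        _fields_ = [('version', ctypes.c_uint32),
                    ('payload', ctypes.c_char * 8188)]  # 4 + 8188 = 8 KB total

    with open('/tmp/blocks.bin', 'r+b') as f:
        mm = mmap.mmap(f.fileno(), 0)    # map the whole file
        first = Block.from_buffer(mm)    # overlay the struct: no copy, no parsing
        print(first.version)
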

Or use multiprocessing 's shared ctypes objects to access your Structure directly out of a shared memory area.
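
A hedged sketch with multiprocessing.sharedctypes . Note the assumption here: under uWSGI's prefork model, the shared Array would have to be created before the workers fork (e.g. at module import with lazy-apps off) for them to see the same segment:

    import ctypes
    from multiprocessing import Process, sharedctypes

    # One lock-protected 8 KB character buffer in shared memory.
    shared_block = sharedctypes.Array(ctypes.c_char, 8192, lock=True)

    def reader():
        with shared_block.get_lock():
            print(shared_block.raw[:12])  # every process sees the same bytes

    if __name__ == '__main__':
        with shared_block.get_lock():
            shared_block.raw = b'{"hello": 1}'.ljust(8192, b'\x00')
        p = Process(target=reader)
        p.start()
        p.join()
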

Meanwhile, if you don't actually need all of the cache data all the time, just bits and pieces every once in a while, that's exactly what databases are for. Again, anydbm etc. may be all you need, but if you have complex structure, draw up an ER diagram, turn it into a set of tables, and use something like MySQL.

+3

You said nothing about writing this data back - is it static? In that case, the solution is extremely simple, and I have no clue what's going on with all the "it's not feasible" answers.

uWSGI workers are persistently running applications, so data absolutely does persist between requests. All you need to do is store the stuff in a global variable; that's it. Just remember that it's per-worker, and that workers restart from time to time, so you need proper loading/invalidation strategies.
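
For example, a minimal per-worker cache with a time-based invalidation strategy ( load_blocks_from_riak and the 5-minute TTL are illustrative, not from the question):

    import time

    _cache = {'data': None, 'loaded_at': 0.0}
    TTL = 300  # refresh every 5 minutes instead of on every request

    def get_global_data():
        now = time.time()
        if _cache['data'] is None or now - _cache['loaded_at'] > TTL:
            _cache['data'] = load_blocks_from_riak()  # hypothetical loader
            _cache['loaded_at'] = now
        return _cache['data']
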

If the data is updated very rarely (rarely enough that you can restart the server when it changes), you can save even more. Just create the objects during application construction. That way they are created exactly once, and then all the workers forked off the master reuse the same data. Of course it's copy-on-write, so if you update it you will lose the memory benefits (the same thing will happen if Python decides to compact its memory during a gc run, so it's not super predictable).
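
A sketch of that variant, assuming uWSGI's default prefork model with lazy-apps off, so the module is imported in the master before the workers fork (the file path is illustrative):

    import json

    # Runs exactly once, in the master; forked workers share these pages
    # copy-on-write.
    with open('/etc/myapp/static_blocks.json') as f:
        GLOBAL_DATA = json.load(f)

    def application(env, start_response):
        # Every worker reads GLOBAL_DATA without ever re-parsing the JSON.
        start_response('200 OK', [('Content-Type', 'application/json')])
        return [json.dumps(GLOBAL_DATA).encode()]
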

+1

I have never tried this myself, but could you use uWSGI SharedArea to accomplish what you need?
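
For what it's worth, an untested sketch of that idea. The exact uwsgi.sharedarea_* signatures (in particular whether the leading area-id argument is required) have varied between uWSGI versions, so treat this as an assumption; it also presumes the ini allocates an area, e.g. sharedarea = 8:

    import uwsgi  # only available inside a uWSGI process

    def write_block(data):
        uwsgi.sharedarea_write(0, 0, data)        # area 0, offset 0

    def read_block(length):
        return uwsgi.sharedarea_read(0, 0, length)
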

0

"The data in python memory will NOT be stored in all requests until my knowledge, unless I am mistaken."

you are wrong.

the whole point of using uwsgi over, say, the CGI mechanism is to persist data across threads and save the overhead of initialization on each call. you must set processes = 1 in your .ini file, or, depending on how uwsgi is configured, it may launch more than one worker process on your behalf. log the env and look for 'wsgi.multiprocess': False and 'wsgi.multithread': True ; then all uwsgi.core threads for the single worker should show the same data.
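
a minimal sketch of that check (the logging setup is illustrative):

    import logging

    def application(env, start_response):
        # confirm a single multi-threaded worker:
        # want multiprocess=False, multithread=True
        logging.warning('multiprocess=%s multithread=%s',
                        env.get('wsgi.multiprocess'),
                        env.get('wsgi.multithread'))
        start_response('200 OK', [('Content-Type', 'text/plain')])
        return [b'ok']
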

you can also see how many worker processes, and "core" threads under each, you are using via the built-in stats-server .

that's why uwsgi provides lock and unlock functions for manipulating data stores from multiple threads.
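
a sketch of guarding a shared global with them ( GLOBAL_DATA is illustrative):

    import uwsgi  # only available inside a uWSGI process

    GLOBAL_DATA = {}

    def update_global(key, value):
        uwsgi.lock()    # an optional lock number can be passed; 0 is the default
        try:
            GLOBAL_DATA[key] = value
        finally:
            uwsgi.unlock()  # always release, even if the update raises
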

you can easily test this by adding a /status route to your app that just dumps a JSON representation of your global data object, and checking it every so often after actions that update the store.
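
a sketch of such a route, reusing the GLOBAL_DATA dict from the sketch above:

    import json

    def application(env, start_response):
        if env.get('PATH_INFO') == '/status':
            # dump this worker's current view of the global data
            start_response('200 OK', [('Content-Type', 'application/json')])
            return [json.dumps(GLOBAL_DATA).encode()]
        start_response('404 Not Found', [('Content-Type', 'text/plain')])
        return [b'not found']
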

0
