NumPy Arrays with SQLite

The most common SQLite interface I've seen in Python is sqlite3, but is there anything that works well with NumPy arrays or record arrays / data frames? By that I mean something that recognizes the data types, does not require inserting row by row, and can retrieve results directly into a NumPy (record) array? Something like the R SQL functions in RDB or sqldf, if anyone is familiar with those (they import / export / join whole tables, or subsets of tables, to and from R data frames).

+4
4 answers

why not give redis a try?

Drivers are available for both of your platforms: Python (redis, via the package index) and R (rredis, on CRAN).

The genius of redis is not that it magically recognizes NumPy data types and lets you insert and retrieve multidimensional NumPy arrays as if they were native redis data types; its genius is the remarkable ease with which you can build such an interface yourself in just a few lines of code.

There are (at least) a few tutorials on redis in Python; the one on the DeGizmo blog is particularly good.

    import numpy as NP

    # create some data
    A = NP.random.randint(0, 10, 40).reshape(8, 5)

    # a couple of utility functions: (i) flatten a NumPy row to a compact string
    # prior to insertion into the redis db, for more compact storage, and
    # (ii) restore the original values upon retrieval from the redis db
    fnx = lambda v: ''.join(map(str, v))
    fnx2 = lambda v: list(map(int, v))

    # start the redis server (e.g. from a bash prompt):
    #   $> cd /usr/local/bin    # default install directory for *nix
    #   $> redis-server         # starts the redis server

    # start the redis client:
    from redis import Redis
    r0 = Redis(db=0, port=6379, host='localhost')    # same as: r0 = Redis()

    # to insert an item using the redis 'string' data type, call 'set' on the
    # database, r0, and pass in a key and the item to insert
    r0.set('k1', A[0,:])

    # for row-wise insertion of the 2D array, iterate over its rows:
    for c in range(A.shape[0]):
        r0.set("k{0}".format(c), fnx(A[c,:]))

    # or, to insert all rows at once, use 'mset' ('multi set') and pass in a
    # key-value mapping:
    x = dict(("k{0}".format(i), fnx(row)) for i, row in enumerate(A.tolist()))
    r0.mset(x)

    # to retrieve a row, pass its key to 'get'
    >>> r0.get('k0')
    '63295'

    # retrieve the entire array from redis and restore its original form:
    kx = r0.keys('*')    # returns all keys in redis database r0
    A = []
    for key in kx:
        A.append(fnx2(r0.get(key)))

    >>> A = NP.array(A)
    >>> A
    array([[ 6.,  2.,  3.,  3.,  9.],
           [ 4.,  9.,  6.,  2.,  3.],
           [ 3.,  7.,  9.,  5.,  0.],
           [ 5.,  2.,  6.,  3.,  4.],
           [ 7.,  1.,  5.,  0.,  2.],
           [ 8.,  6.,  1.,  5.,  8.],
           [ 1.,  7.,  6.,  4.,  9.],
           [ 6.,  4.,  1.,  3.,  6.]])
+7

Doug's suggestion with redis is not bad, but I think his code is a bit complicated and, as a result, quite slow. For my purposes, I had to serialize + write, and then read + deserialize, a square matrix of about a million floats in less than a tenth of a second, so I did this:

For writing:

    import numpy as np
    from redis import Redis

    rs = Redis()    # redis client, default host/port

    snapshot = np.random.randn(1024, 1024)
    serialized = snapshot.tobytes()
    rs.set('snapshot_key', serialized)

Then for reading:

    s = rs.get('snapshot_key')
    deserialized = np.frombuffer(s).astype(np.float32)
    rank = np.sqrt(deserialized.size).astype(int)
    snap = deserialized.reshape(rank, rank)

You can do basic performance testing in IPython with %time; neither tobytes nor frombuffer should take more than a few milliseconds.
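The reading code above recovers the shape with a square-root trick, which only works for a square matrix of known dtype. One way to generalize this (a sketch of my own, not from the answer; the `serialize`/`deserialize` helpers are hypothetical names) is to prepend a small header carrying the dtype and shape alongside the raw bytes:

```python
import numpy as np

def serialize(arr):
    # store dtype and shape in a small text header so retrieval needs no
    # out-of-band knowledge of the array's geometry
    header = "{0};{1}".format(arr.dtype.str, ",".join(map(str, arr.shape)))
    return header.encode() + b"\x00" + arr.tobytes()

def deserialize(blob):
    # split on the first NUL byte: header before it, raw payload after it
    header, _, payload = blob.partition(b"\x00")
    dtype_str, shape_str = header.decode().split(";")
    shape = tuple(int(s) for s in shape_str.split(","))
    return np.frombuffer(payload, dtype=np.dtype(dtype_str)).reshape(shape)
```

The resulting blob can then be stored and fetched with the same `rs.set` / `rs.get` calls as above, and non-square arrays of any dtype round-trip cleanly.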

+2

This question looks a little old, but is there some reason why you can't just do fetchall() instead of iterating row by row, and then initialize the NumPy array directly from the result?
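That suggestion can be sketched as follows (the table and column names here are hypothetical, just to make the example self-contained):

```python
import sqlite3
import numpy as np

# a small in-memory table for illustration
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (a REAL, b REAL)")
con.executemany("INSERT INTO t VALUES (?, ?)", [(1.0, 2.0), (3.0, 4.0)])

# fetch all rows in one call and build the array directly,
# instead of filling a NumPy array row by row
rows = con.execute("SELECT a, b FROM t").fetchall()
arr = np.array(rows)    # shape (2, 2), dtype float64
```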

+1

I found at least three Python packages that interface SQLite and NumPy:

Each of these packages must deal with the problem that SQLite (by default) understands only the standard Python types, not NumPy data types such as numpy.int64.

RecSQL 0.7.8+ works for me (most of the time), but I consider it a pretty bad hack; judging from a look at the code, esutil.sqlite_util looks more mature.

+1
