How do you create a multidimensional numpy array from iterable tuples?

I would like to create a numpy array from an iterable that yields tuples of values, such as the result of a database query.

For example:

    data = db.execute('SELECT col1, col2, col3, col4 FROM data')
    A = np.array(list(data))

Is there a way to speed this up without first converting the iterable to a list?

python numpy
2 answers

Although this does not technically answer my question, I found a way to do what I was trying to do:

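    import numpy as np

    def get_cols(db, cols):
        def get_col(col):
            # Query a single column; each row comes back as a 1-tuple.
            data = db.execute('SELECT ' + col + ' FROM data')
            # np.fromiter requires a dtype, so it is passed here rather than to db.execute.
            return np.fromiter((v[0] for v in data), dtype=np.float64)
        # Stack the per-column arrays as rows, then transpose to shape (n_rows, n_cols).
        return np.vstack([get_col(col) for col in cols]).T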

I'm not an experienced numpy user, but here is a possible solution to the general question:

    >>> i = iter([(1, 11), (2, 22)])
    >>> i  # a sample iterable of tuples
    <listiterator at 0x5b2de30>
    >>> rec_array = np.fromiter(i, dtype='i4,i4')  # mind the dtype
    >>> rec_array  # rec_array is a record array
    array([(1, 11), (2, 22)],
          dtype=[('f0', '<i4'), ('f1', '<i4')])
    >>> rec_array['f0'], rec_array[0]  # each field has a default name
    (array([1, 2]), (1, 11))
    >>> a = rec_array.view(np.int32).reshape(-1, 2)  # let's create a view
    >>> a
    array([[ 1, 11],
           [ 2, 22]])
    >>> rec_array[0][1] = 23
    >>> a  # a is a view, not a copy!
    array([[ 1, 23],
           [ 2, 22]])

I assume that all columns have the same type; otherwise, rec_array is already what you want.
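The steps above could be wrapped in a small helper. This is only a sketch: the name `tuples_to_2d` and its parameters are my own invention, not part of numpy, and it assumes that every tuple has exactly `ncols` entries and that all columns share one scalar dtype.

```python
import numpy as np

def tuples_to_2d(rows, ncols, dtype=np.float64):
    """Hypothetical helper: build an (n, ncols) array from an iterable of tuples."""
    scalar = np.dtype(dtype)
    # One structured record per tuple, with ncols fields of the same scalar type.
    record = np.dtype([('f%d' % k, scalar) for k in range(ncols)])
    rec = np.fromiter(rows, dtype=record)
    # The fields are packed and share one type, so the buffer can be viewed as 2-D.
    return rec.view(scalar).reshape(-1, ncols)

# A generator of tuples stands in for the database result here.
rows = ((k, 10 * k) for k in range(3))
A = tuples_to_2d(rows, ncols=2, dtype=np.int32)
```

The generator is consumed row by row, so no intermediate list of tuples is built. If the columns have different types, the view step no longer applies and the structured array itself is the result to keep.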

As for your specific case, I do not quite understand what db is in your example. If it is a cursor object, you can simply call its fetchall method and get a list of tuples. In most cases the database library does not hold on to a partially read query result while your code processes each row; that is, by the time the execute method returns, all the data has already been stored in a list, so there is hardly any drawback to calling fetchall instead of iterating over the cursor.
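To make the fetchall route concrete, here is a minimal sketch. The question does not say which database library is in use, so an in-memory `sqlite3` database stands in for `db`, and the table contents are made up for illustration.

```python
import sqlite3
import numpy as np

# Assumption: sqlite3 stands in for the asker's unspecified `db` object.
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE data (col1 REAL, col2 REAL)')
conn.executemany('INSERT INTO data VALUES (?, ?)', [(1.0, 2.0), (3.0, 4.0)])

cursor = conn.execute('SELECT col1, col2 FROM data ORDER BY col1')
rows = cursor.fetchall()               # a list of tuples, one per row
A = np.array(rows, dtype=np.float64)   # 2-D array, one row per record
conn.close()
```

Whether this is faster than iterating over the cursor depends on the driver, so it is worth timing both on real data.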
