PyTables batch get and update

Question


I have daily stock data stored as an HDF5 file created using PyTables. I would like to get a group of rows, process it as an array, and then write it back to disk (update the rows) using PyTables. I could not figure out how to do this. Could you tell me what would be the best way to do this?

My data:

    Symbol, date, price, var1, var2
    abcd,   1,    2.5,   12,   12.5
    abcd,   2,    2.6,   11,   10.2
    abcd,   3,    2.45,  11,   10.3
    defg,   1,    12.34, 19.1, 18.1
    defg,   2,    11.90, 19.5, 18.2
    defg,   3,    11.75, 21,   20.9
    defg,   4,    11.74, 22.2, 21.4

I would like to read the rows corresponding to each symbol as an array, do some processing, and update the var1 and var2 fields. I know all the symbols in advance, so I can loop over them. I tried something like this:

 rows_array = [row.fetch_all_fields() for row in table.where('Symbol == "abcd"')]

I would like to pass rows_array to another function that will calculate the values for var1 and var2 and update them for each record. Please note that var1 and var2 are similar to moving averages, so I cannot calculate them inside an iterator; I need the whole set of rows as an array.

After I calculated everything that I need using rows_array, I am not sure how to write it back to the data, i.e. update the rows with the new calculated values. When updating the whole table, I use this:

  table.cols.var1[:] = calc_something(rows_array)

However, when I want to update only part of the table, I am not sure of the best way to do it. I guess I could re-run the "where" condition and then update each row based on my calculations, but scanning the table a second time seems like a waste of time.

Your suggestions are welcome ...

Thanks, -e


python hdf5 pytables

Ecognium Feb 18 '11 at 2:50


1 answer

FrancescAlted · Accepted Answer · 2011-02-18T17:30:23+0000

If I understand correctly, the following should do what you want:

    condition = 'Symbol == "abcd"'
    # get the row indices that match the condition
    # (this method is spelled get_where_list in PyTables 3.x)
    indices = table.getWhereList(condition)
    rows_array = table[indices]       # get the values at those indices
    new_rows = compute(rows_array)    # compute the new values
    table[indices] = new_rows         # update those rows with the new values
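For completeness, here is a self-contained sketch of the same round trip, assuming PyTables 3.x (where the method is spelled get_where_list) and Python 3. The Quote description, the compute function (a trailing mean of price written into var1), and the update_symbol and run_demo helpers are invented for illustration and are not part of the question's actual setup:

```python
import os
import tempfile

import numpy as np
import tables


class Quote(tables.IsDescription):
    # Column layout assumed from the sample data in the question.
    Symbol = tables.StringCol(8, pos=0)
    date = tables.Int32Col(pos=1)
    price = tables.Float64Col(pos=2)
    var1 = tables.Float64Col(pos=3)
    var2 = tables.Float64Col(pos=4)


def compute(rows, window=2):
    """Hypothetical stand-in: var1 becomes a trailing mean of price."""
    out = rows.copy()
    for i in range(len(rows)):
        start = max(0, i - window + 1)
        out["var1"][i] = np.mean(rows["price"][start:i + 1])
    return out


def update_symbol(table, symbol):
    """Read, recompute, and write back only the rows for one symbol."""
    # String columns hold bytes under Python 3, so the value is encoded
    # and passed in through condvars.
    indices = table.get_where_list(
        "Symbol == sym", condvars={"sym": symbol.encode("ascii")}
    )
    if len(indices) == 0:
        return 0
    rows_array = table[indices]          # structured NumPy array
    table[indices] = compute(rows_array)  # write back at the same coordinates
    table.flush()
    return len(indices)


def run_demo():
    rows = [
        (b"abcd", 1, 2.5, 12.0, 12.5),
        (b"abcd", 2, 2.6, 11.0, 10.2),
        (b"abcd", 3, 2.45, 11.0, 10.3),
        (b"defg", 1, 12.34, 19.1, 18.1),
        (b"defg", 2, 11.90, 19.5, 18.2),
        (b"defg", 3, 11.75, 21.0, 20.9),
        (b"defg", 4, 11.74, 22.2, 21.4),
    ]
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "quotes.h5")
        with tables.open_file(path, mode="w") as h5:
            table = h5.create_table("/", "quotes", Quote)
            table.append(rows)
            table.flush()
            update_symbol(table, "abcd")
            # Return the whole var1 column so the effect is visible.
            return table.col("var1")


if __name__ == "__main__":
    print(run_demo())
```

The condition is evaluated only once. After that, both the read and the write address the rows directly by their coordinates, so rows for other symbols are left untouched.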

Hope this helps
