The HDF5 file must be written in the table format (as opposed to the fixed format) in order to be queried with the where argument of pd.read_hdf.
In addition, A must be declared as a data column:
df.to_hdf('/tmp/out.h5', 'results_table', mode='w', data_columns=['A'], format='table')
or, to declare all columns as (queryable) data columns:
df.to_hdf('/tmp/out.h5', 'results_table', mode='w', data_columns=True, format='table')
Then you can use
pd.read_hdf('/tmp/out.h5', 'results_table', where='A in [1,3,4]')
to select the rows where the value in column A is 1, 3, or 4. For example,
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 1, 2],
    'B': [0, 0, 1, 1, 1, 0, 1, 0, 1, 0, 0, 1],
    'C': [34, 32, 35, 34, 31, 34, 29, 34, 12, 34, 32, 34],
    'D': [11, 15, 22, 15, 9, 15, 11, 15, 14, 15, 13, 15]})

df.to_hdf('/tmp/out.h5', 'results_table', mode='w', data_columns=['A'], format='table')

print(pd.read_hdf('/tmp/out.h5', 'results_table', where='A in [1,3,4]'))
gives
    A  B   C   D
0   1  0  34  11
2   3  1  35  22
3   4  1  34  15
5   1  0  34  15
7   3  0  34  15
8   4  1  12  14
10  1  0  32  13
If you have a very long list of values, vals, you can use string formatting to build the correct where argument:
where='A in {}'.format(vals)
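As a minimal sketch of that string formatting, the helper below builds the where string from any iterable of values. The name build_in_clause is hypothetical (it is not part of pandas), and the sketch assumes the values are integers. It converts each value to a plain Python int first, because values that are not plain ints (NumPy scalars, for example) may not format as a clean list literal.

```python
def build_in_clause(column, vals):
    """Return a where string such as 'A in [1, 3, 4]'.

    Hypothetical helper, not a pandas function. Assumes integer values.
    """
    # Convert to a list of plain Python ints so that tuples, sets, or
    # NumPy arrays all render as a bracketed list literal.
    return '{} in {}'.format(column, [int(v) for v in vals])


vals = (1, 3, 4)
where = build_in_clause('A', vals)
print(where)  # A in [1, 3, 4]
```

The resulting string can then be passed as the where argument, as in pd.read_hdf('/tmp/out.h5', 'results_table', where=where).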