Problems with Python HDF5 h5py opening multiple files

I am using 64-bit Enthought Python to process data spread across multiple HDF5 files, with h5py 1.3.1 (HDF5 1.8.4) on 64-bit Windows.

I have an object that provides a convenient interface to my particular data hierarchy, but testing with h5py.File(fname, 'r') independently gives the same results. I iterate over a long list of files (~100 at a time) and try to pull specific pieces of information out of each one. The problem I am facing is that I get the same information back from multiple files! My loop looks something like this:

import csv
import h5py
from glob import glob

files = glob(r'path\*.h5')
out_csv = csv.writer(open('output_file.csv', 'wb'))  # 'wb', not 'rb': the CSV is being written

for filename in files:
    handle = h5py.File(filename, 'r')
    data = extract_data_from_handle(handle)
    for row in data:
        out_csv.writerow((filename,) + row)

When I browse the files with something like HDFView, I can see that their internals are different. However, the CSV output seems to indicate that all the files contain the same data. Has anyone seen this behavior before? Any suggestions on where I could start debugging this issue?
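A minimal independent check, bypassing my wrapper object entirely, would look like the sketch below ('/data' is a placeholder for a dataset path that actually exists in these files):

import h5py
from glob import glob

for filename in glob(r'path\*.h5'):
    f = h5py.File(filename, 'r')
    # Print a short fingerprint of each file; identical output across
    # files would reproduce the problem without the wrapper object.
    print filename, f['/data'][:5]
    f.close()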

1 answer

I came to the conclusion that this is a strange manifestation of Perplexing assignment behavior with h5py object as instance variable. I rewrote my code so that each file is processed within a function call and the variable is not reused. With this approach I no longer see the strange behavior, and it seems to work much better. For clarity, the solution looks more like the code below; a sketch of the problematic pattern follows it:

import csv
import h5py
from glob import glob

files = glob(r'path\*.h5')
out_csv = csv.writer(open('output_file.csv', 'wb'))

def extract_data_from_filename(filename):
    # Open the file inside the function so the handle is a fresh
    # local variable on every call and is never reused.
    return extract_data_from_handle(h5py.File(filename, 'r'))

for filename in files:
    data = extract_data_from_filename(filename)
    for row in data:
        out_csv.writerow((filename,) + row)
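For contrast, the problematic version kept the file handle as an instance variable that was rebound on every iteration, as in the linked question. A rough sketch of that shape (the class and method names here are hypothetical, not my actual code):

import h5py
from glob import glob

class DataExtractor(object):
    # Hypothetical wrapper: the handle lives on the instance and is
    # rebound for every file instead of being a fresh local variable.
    def open_file(self, filename):
        self.handle = h5py.File(filename, 'r')  # previous file never explicitly closed

    def rows(self):
        return extract_data_from_handle(self.handle)

extractor = DataExtractor()
for filename in glob(r'path\*.h5'):
    extractor.open_file(filename)
    data = extractor.rows()  # this reuse pattern produced the duplicated output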