How to differentiate between HDF5 datasets and groups with h5py?

I am using the h5py Python package (version 2.5.0) to access my HDF5 files.

I want to go through the contents of a file and do something with each data set.

Using the visit method:

    import h5py

    def print_it(name):
        dset = f[name]
        print(dset)
        print(type(dset))

    with h5py.File('test.hdf5', 'r') as f:
        f.visit(print_it)

for the test file, I get:

    <HDF5 group "/x" (1 members)>
    <class 'h5py._hl.group.Group'>
    <HDF5 dataset "y": shape (100, 100, 100), type "<f8">
    <class 'h5py._hl.dataset.Dataset'>

which tells me that the file has a dataset and a group. However, there is no obvious way other than using type() to distinguish between datasets and groups. The h5py documentation, unfortunately, says nothing about this topic. Its examples always assume that you already know which items are groups and which are datasets, for example because you created them yourself.

I would like to have something like:

    f = h5py.File(..)
    for key in f.keys():
        x = f[key]
        print(x.is_group(), x.is_dataset())  # does not exist

How can I distinguish between groups and datasets while reading an unknown hdf5 file in Python with h5py? How can I get a list of all datasets, all groups, all links?

+7
python hdf5 h5py
Dec 17 '15 at 8:53
4 answers

Unfortunately, the h5py API has no built-in way to check this, but you can simply check the item's type with is_dataset = isinstance(item, h5py.Dataset).

To walk the entire contents of a file (except for file attributes), you can use Group.visititems with a callable that takes the name and the instance of each item.
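A minimal sketch of that idea, assuming the same test.hdf5 file from the question (the function name classify is just an illustrative choice):

    import h5py

    def classify(name, obj):
        # obj is the group or dataset instance that visititems passes in
        if isinstance(obj, h5py.Dataset):
            print(name, 'is a dataset')
        elif isinstance(obj, h5py.Group):
            print(name, 'is a group')

    with h5py.File('test.hdf5', 'r') as f:
        f.visititems(classify)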

+6
Dec 17 '15 at 9:09

While Gall's and James Smith's answers point toward the solution, you still have to traverse the HDF5 hierarchy and filter out all the datasets. I did this with yield from, which is available in Python 3.3+; it works well, and I present it here.

    import h5py

    def h5py_dataset_iterator(g, prefix=''):
        for key in g.keys():
            item = g[key]
            path = '{}/{}'.format(prefix, key)
            if isinstance(item, h5py.Dataset):  # test for dataset
                yield (path, item)
            elif isinstance(item, h5py.Group):  # test for group (go down)
                yield from h5py_dataset_iterator(item, path)

    with h5py.File('test.hdf5', 'r') as f:
        for (path, dset) in h5py_dataset_iterator(f):
            print(path, dset)
+3
Dec 21 '15 at 17:14

Since h5py groups expose a dictionary-like interface, you need to use the values() method to actually access the items. Then you can filter them with a list comprehension:

 datasets = [item for item in f["Data"].values() if isinstance(item, h5py.Dataset)] 

Doing this recursively should be fairly simple.
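For instance, a minimal recursive sketch along those lines (the function name collect_datasets and the test.hdf5 file name are illustrative assumptions, not part of the answer above):

    import h5py

    def collect_datasets(group):
        # recursively gather every h5py.Dataset below the given group
        datasets = []
        for item in group.values():
            if isinstance(item, h5py.Dataset):
                datasets.append(item)
            elif isinstance(item, h5py.Group):
                datasets.extend(collect_datasets(item))
        return datasets

    with h5py.File('test.hdf5', 'r') as f:
        print(collect_datasets(f))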

+1
Dec 17 '15 at 9:22

I prefer this solution. It collects a list of all the objects in the HDF5 file "h5file" and then separates them by class, similar to what was mentioned earlier, though not as concisely:

    import h5py

    fh5 = h5py.File(h5file, 'r')
    all_h5_objs = []
    fh5.visit(all_h5_objs.append)
    all_groups   = [obj for obj in all_h5_objs if isinstance(fh5[obj], h5py.Group)]
    all_datasets = [obj for obj in all_h5_objs if isinstance(fh5[obj], h5py.Dataset)]
0
11 Oct '16 at 16:40


