Storing 4D image data from TIFF files as HDF5 in Python

I have 27GB of 2D TIFF files that represent movie frames of three-dimensional images. I want to be able to slice this data as if it were a simple 4D numpy array. It seems that dask.array is a good tool for managing the array easily once it is stored on disk as an HDF5 file.

How can I store these files as a single HDF5 file when they do not all fit into memory? I am new to h5py and to databases in general.

Thanks.

1 answer

Edit: use dask.array's imread function

As of dask 0.7.0 you do not need to save your images to HDF5 first. Use the dask.array.image.imread function instead:

    In [1]: from skimage.io import imread

    In [2]: im = imread('foo.1.tiff')

    In [3]: im.shape
    Out[3]: (5, 5, 3)

    In [4]: ls foo.*.tiff
    foo.1.tiff  foo.2.tiff  foo.3.tiff  foo.4.tiff

    In [5]: from dask.array.image import imread

    In [6]: im = imread('foo.*.tiff')

    In [7]: im.shape
    Out[7]: (4, 5, 5, 3)
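
If what you ultimately want is still an HDF5 copy on disk, dask can also write that for you, chunk by chunk, without loading everything into memory. Here is a minimal sketch using da.to_hdf5; the in-memory zeros array and the filename stack.hdf5 are stand-ins for the lazy (frame, y, x, channel) array that imread('foo.*.tiff') would return:

```python
import numpy as np
import dask.array as da

# Stand-in for the lazy 4D stack that dask.array.image.imread would
# produce from the TIFF files (hypothetical toy dimensions: 4 frames
# of 5 x 5 x 3).
frames = np.zeros((4, 5, 5, 3), dtype='int8')
stack = da.from_array(frames, chunks=(1, 5, 5, 3))  # one chunk per frame

# Write the stack to an HDF5 dataset '/x', streaming one chunk at a time
da.to_hdf5('stack.hdf5', '/x', stack)
```

With real data you would replace the stand-in with `stack = imread('foo.*.tiff')` and the write stays out-of-core.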

Older answer: storing the images in HDF5

Acquiring data is often the hardest part of the problem. Dask.array does not have any automatic integration with image files (although this is doable if there is enough interest). Fortunately, moving data into h5py is easy because h5py supports numpy-style slicing syntax. In the following example we will create an empty h5py dataset and then store four tiny tiff files into that dataset in a for loop.

First we get the filenames for our images (please forgive the toy dataset; I have nothing realistic on hand).

    In [1]: from glob import glob

    In [2]: filenames = sorted(glob('foo.*.tiff'))

    In [3]: filenames
    Out[3]: ['foo.1.tiff', 'foo.2.tiff', 'foo.3.tiff', 'foo.4.tiff']

Load and inspect a sample image:

    In [4]: from skimage.io import imread

    In [5]: im = imread(filenames[0])  # a sample image

    In [6]: im.shape  # tiny image
    Out[6]: (5, 5, 3)

    In [7]: im.dtype
    Out[7]: dtype('int8')

Now we will create an HDF5 file and an HDF5 dataset called '/x' in this file.

    In [8]: import h5py

    In [9]: f = h5py.File('myfile.hdf5', 'a')  # create or open an HDF5 file

    In [10]: out = f.require_dataset('/x', shape=(len(filenames), 5, 5, 3), dtype=im.dtype)
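
One detail worth knowing: require_dataset also accepts a chunks argument, and aligning the HDF5 chunk shape with the chunk shape you plan to hand to dask means each image lives contiguously on disk and can be read in a single I/O operation. A small sketch with hypothetical toy dimensions (4 files of 5 x 5 x 3) and a made-up filename:

```python
import h5py

# 'a' creates the file if it does not exist, opens read/write otherwise
f = h5py.File('chunked.hdf5', 'a')

# chunks=(1, 5, 5, 3) stores one whole image per HDF5 chunk, matching
# the dask chunking used later
out = f.require_dataset('/x', shape=(4, 5, 5, 3), dtype='int8',
                        chunks=(1, 5, 5, 3))
print(out.chunks)  # (1, 5, 5, 3)
f.close()
```

This is optional; without it h5py picks a chunk shape (or stores contiguously), which is fine for small data but can hurt for 27GB of frames read one at a time.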

Great, now we can insert our images one at a time into the HDF5 dataset.

    In [11]: for i, fn in enumerate(filenames):
       ....:     im = imread(fn)
       ....:     out[i, :, :, :] = im

At this point, dask.array can happily wrap the out dataset:

    In [12]: import dask.array as da

    In [13]: x = da.from_array(out, chunks=(1, 5, 5, 3))  # treat each image as a single chunk

    In [14]: x[::2, :, :, 0].mean()
    Out[14]: dask.array<x_3, shape=(), chunks=(), dtype=float64>
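
Note that Out[14] is an unevaluated dask array: nothing has been read from disk yet. Calling .compute() executes the graph, pulling one chunk (one image) at a time. A self-contained sketch of the whole round trip, using a hypothetical file demo.hdf5 filled with toy numeric data instead of real images:

```python
import numpy as np
import dask.array as da
import h5py

with h5py.File('demo.hdf5', 'a') as f:
    # A 4-frame dataset standing in for the image stack above
    d = f.require_dataset('/x', shape=(4, 5, 5, 3), dtype='float64')
    d[...] = np.arange(4 * 5 * 5 * 3, dtype='float64').reshape(4, 5, 5, 3)

    # Wrap the on-disk dataset; one chunk per frame
    x = da.from_array(d, chunks=(1, 5, 5, 3))

    # Lazy until .compute(), which reads only the needed chunks
    result = x[::2, :, :, 0].mean().compute()
    print(result)  # 111.0 for this toy data
```

The same pattern scales to the 27GB case: only the chunks touched by the slice are ever loaded into memory.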

If you want to see more native support for stacks of images, then I recommend that you raise an issue. It would be pretty easy to have dask.array read your stack of tiff files directly without going through HDF5.
