I tried using various data compression methods while saving numpy arrays to disk.
These 1D arrays contain sample data with a specific sampling rate (can be recorded with a microphone or any other measurement with any sensor): the data is essentially continuous (in the mathematical sense, of course, now there is discrete data after the sample).
I tried with HDF5 (h5py):
f.create_dataset("myarray1", myarray, compression="gzip", compression_opts=9)
but it’s rather slow, and the compression ratio is not the best we can expect.
I also tried using
numpy.savez_compressed()
but again, it may not be the best compression algorithm for such data (described earlier).
What would you choose for the best compression ratio on a numpy array , with such data?
(I was thinking about things like lossless FLAC (originally designed for audio), but is there an easy way to apply such an algorithm to numpy data?)
python arrays numpy compression lossless-compression
Basj
source share