h5py

h5py is the Python library for reading and writing HDF5 files. It exposes the HDF5 hierarchy as Python objects: a file behaves like a Python dictionary whose keys are group and dataset names, and groups behave the same way recursively.

The standard idiom uses Python’s with statement so the file closes automatically:

import numpy as np
import h5py
 
# Write
with h5py.File('./hdf5_data.h5', 'w') as hdf:
    hdf.create_dataset('dataset1', data=np.random.random((1000, 1000)))
 
# Read
with h5py.File('./hdf5_data.h5', 'r') as hdf:
    keys = list(hdf.keys())                   # like dict.keys()
    dataset = hdf.get('dataset1')             # or hdf['dataset1']
    arr = np.array(dataset)                   # materialize into memory

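The file modes resemble those of Python’s open: 'r' for read-only, 'w' for write (truncates any existing file), and 'a' for append (creates the file if it is missing, otherwise opens it read/write and preserves existing contents so we can add more). h5py also accepts 'r+' (read/write, the file must already exist) and 'w-' or 'x' (create, fail if the file exists).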
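A minimal sketch of how 'w', 'a', and 'r' interact, assuming a scratch file in a temporary directory. The path and the dataset names 'first' and 'second' are made up for the illustration.

```python
import os
import tempfile

import h5py
import numpy as np

# Hypothetical scratch path, used only for this illustration.
path = os.path.join(tempfile.mkdtemp(), "modes_demo.h5")

# 'w' creates the file (and would truncate it if it already existed).
with h5py.File(path, "w") as hdf:
    hdf.create_dataset("first", data=np.arange(10))

# 'a' opens the existing file read/write and keeps what is already there.
with h5py.File(path, "a") as hdf:
    hdf.create_dataset("second", data=np.arange(5))

# 'r' is read-only: both datasets are visible.
with h5py.File(path, "r") as hdf:
    names_after_append = sorted(hdf.keys())

# Re-opening with 'w' starts from an empty file again.
with h5py.File(path, "w") as hdf:
    names_after_truncate = sorted(hdf.keys())

print(names_after_append)    # ['first', 'second']
print(names_after_truncate)  # []
```

In practice 'a' is the safer default for files we keep adding to, since a stray 'w' silently discards everything already stored.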

One pattern to watch. The object returned by hdf.get('dataset1') is not a NumPy array. It’s an h5py Dataset handle (h5py._hl.dataset.Dataset, also exposed as h5py.Dataset) that knows where the bytes live on disk. Wrapping it in np.array(...) or slicing it is what triggers the actual disk read, so we can read a partial slice (dataset[100:200]) without loading everything.
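The difference between the handle and the data can be sketched as follows. This is an illustration under assumptions: the scratch path is hypothetical, and the dataset mirrors the 1000 x 1000 example above. Note that the reads happen inside the with block, because the handle stops being usable once the file is closed, while the NumPy arrays it produced remain valid.

```python
import os
import tempfile

import h5py
import numpy as np

# Hypothetical scratch path, used only for this illustration.
path = os.path.join(tempfile.mkdtemp(), "slice_demo.h5")

with h5py.File(path, "w") as hdf:
    hdf.create_dataset("dataset1", data=np.random.random((1000, 1000)))

with h5py.File(path, "r") as hdf:
    dataset = hdf["dataset1"]

    # The handle is an h5py Dataset, not an ndarray.
    is_handle = isinstance(dataset, h5py.Dataset)
    is_array = isinstance(dataset, np.ndarray)

    # Shape and dtype come from metadata; no array data is read here.
    shape_on_disk = dataset.shape

    # Slicing reads only the requested rows and returns a NumPy array.
    rows = dataset[100:200]

    # An Ellipsis index reads the whole dataset into memory.
    full = dataset[...]

# rows and full are plain NumPy arrays and outlive the closed file.
print(is_handle, is_array, shape_on_disk, rows.shape)
```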

Groups are created with create_group, and accept path-like names that auto-create intermediate groups:

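with h5py.File('./hdf5_data.h5', 'a') as hdf:     # 'a' keeps existing contents
    G = hdf.create_group('Group2/Friends')        # creates Group2 and Group2/Friends
    G.create_dataset('dataset3', data=matrix_3)   # matrix_3: any NumPy array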
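To see the nested layout this produces, here is a small self-contained sketch. The scratch path and the contents of matrix_3 are stand-ins chosen for the example; only the group and dataset names come from the snippet above.

```python
import os
import tempfile

import h5py
import numpy as np

# Hypothetical scratch path and stand-in data for matrix_3.
path = os.path.join(tempfile.mkdtemp(), "groups_demo.h5")
matrix_3 = np.arange(12).reshape(3, 4)

with h5py.File(path, "w") as hdf:
    G = hdf.create_group("Group2/Friends")   # creates Group2, then Friends inside it
    G.create_dataset("dataset3", data=matrix_3)

with h5py.File(path, "r") as hdf:
    top_level = list(hdf.keys())             # names directly under the root
    nested = list(hdf["Group2"].keys())      # names directly under Group2

    # Path-like keys work for membership tests and lookups too.
    has_dataset = "Group2/Friends/dataset3" in hdf
    restored = hdf["Group2/Friends/dataset3"][...]

    # visit() calls the function with the name of every object below the root.
    all_paths = []
    hdf.visit(all_paths.append)

print(top_level, nested, sorted(all_paths))
```

The same path syntax works on any group, so G['dataset3'] and hdf['Group2/Friends/dataset3'] refer to the same dataset.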

Compression is set per-dataset at creation time:

hdf.create_dataset('big', data=matrix, compression='gzip', compression_opts=7)
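A rough way to check what compression buys is to write the same array twice and compare file sizes. The sketch below assumes an all-zeros matrix, which is far more compressible than typical data, so the ratio it shows is not representative. The file names are hypothetical.

```python
import os
import tempfile

import h5py
import numpy as np

# Hypothetical scratch paths; the all-zeros matrix is a deliberately easy case.
tmp = tempfile.mkdtemp()
path_raw = os.path.join(tmp, "raw.h5")
path_gz = os.path.join(tmp, "gzip.h5")
matrix = np.zeros((1000, 1000))

with h5py.File(path_raw, "w") as hdf:
    hdf.create_dataset("big", data=matrix)

with h5py.File(path_gz, "w") as hdf:
    ds = hdf.create_dataset("big", data=matrix,
                            compression="gzip", compression_opts=7)
    compression = ds.compression   # filter name stored with the dataset
    level = ds.compression_opts    # gzip level, 0 to 9
    chunks = ds.chunks             # compressed datasets use a chunked layout

# Reading is transparent: decompression happens inside the library.
with h5py.File(path_gz, "r") as hdf:
    restored = hdf["big"][...]

raw_size = os.path.getsize(path_raw)
gz_size = os.path.getsize(path_gz)
print(compression, level, chunks, raw_size, gz_size)
```

Higher gzip levels trade write time for smaller files. Readers need no extra arguments, since the filter settings travel with the dataset.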

h5py is the standard Python entry point to HDF5. Most other HDF5 tooling in scientific computing (MATLAB, R, C++ HDF5 code) writes files that h5py can read, and vice versa.

Idriss Rami — Notes
