h5py is the Python library for reading and writing HDF5 files. It exposes the HDF5 hierarchy as Python objects: a file behaves like a Python dictionary whose keys are group and dataset names, and groups behave the same way recursively.

The standard idiom uses Python’s with statement so the file closes automatically:

import numpy as np
import h5py
 
# Write
with h5py.File('./hdf5_data.h5', 'w') as hdf:
    hdf.create_dataset('dataset1', data=np.random.random((1000, 1000)))
 
# Read
with h5py.File('./hdf5_data.h5', 'r') as hdf:
    keys = list(hdf.keys())                   # like dict.keys()
    dataset = hdf.get('dataset1')             # or hdf['dataset1']
    arr = np.array(dataset)                   # materialize into memory

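The file modes resemble those of Python’s open, with some differences: 'r' opens an existing file read-only, 'r+' opens an existing file for reading and writing, 'w' creates a file (truncating any existing one), 'w-' or 'x' creates a file but fails if it already exists, and 'a' opens an existing file for reading and writing or creates it if it is missing (existing contents are preserved and we can add more).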
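As a minimal sketch of the difference between 'w' and 'a', the snippet below writes to a scratch file three times. The temporary path and the dataset names 'first' and 'second' are assumptions made for this illustration only.

```python
import os
import tempfile

import h5py
import numpy as np

# Hypothetical scratch path, used only for this illustration.
path = os.path.join(tempfile.mkdtemp(), "modes_demo.h5")

# 'w' creates the file, truncating anything already there.
with h5py.File(path, "w") as hdf:
    hdf.create_dataset("first", data=np.arange(10))

# 'a' opens the file read/write and keeps the existing contents.
with h5py.File(path, "a") as hdf:
    hdf.create_dataset("second", data=np.arange(5))
    names_after_append = sorted(hdf.keys())

# Opening with 'w' again discards both datasets.
with h5py.File(path, "w") as hdf:
    names_after_truncate = list(hdf.keys())

print(names_after_append)    # ['first', 'second']
print(names_after_truncate)  # []
```

Reaching for 'a' by default when updating a file avoids wiping existing data by accident.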

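Two points are worth noticing. First, the object returned by hdf.get('dataset1') is not a NumPy array; it is an h5py._hl.dataset.Dataset handle that knows where the bytes live on disk. No data is read until we index it or wrap it in np.array(...). Second, because reads happen on indexing, we can read partial slices (dataset[100:200] for rows 100 to 199 of the 1000-row dataset above) without loading everything. The handle is only valid while the file is open, so anything needed later should be materialized inside the with block.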
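The following sketch illustrates the handle-versus-array distinction. The scratch path is an assumption for this illustration; the dataset name and shape mirror the earlier example.

```python
import os
import tempfile

import h5py
import numpy as np

# Hypothetical scratch path, used only for this illustration.
path = os.path.join(tempfile.mkdtemp(), "lazy_demo.h5")

with h5py.File(path, "w") as hdf:
    hdf.create_dataset("dataset1", data=np.random.random((1000, 1000)))

with h5py.File(path, "r") as hdf:
    dataset = hdf["dataset1"]
    handle_type = type(dataset).__name__   # the h5py handle class, not ndarray
    shape = dataset.shape                  # metadata, no bulk read needed
    rows = dataset[100:200]                # reads only rows 100..199
    block = dataset[:10, :5]               # reads only a 10 x 5 corner
    everything = dataset[...]              # reads the whole dataset

# The NumPy arrays remain usable after the file closes;
# the dataset handle itself does not.
print(handle_type, shape, rows.shape, block.shape, everything.shape)
```

Indexing returns ordinary NumPy arrays, so downstream code does not need to know the data came from HDF5.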

Groups are created with create_group, and accept path-like names that auto-create intermediate groups:

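# hdf is a file opened in a writable mode; matrix_3 is any NumPy array
group = hdf.create_group('Group2/Friends')    # creates Group2 and Group2/Friends
group.create_dataset('dataset3', data=matrix_3)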
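A self-contained sketch of the nested-group behaviour follows. The scratch path and the stand-in contents of matrix_3 are assumptions for this illustration; the group and dataset names follow the snippet above.

```python
import os
import tempfile

import h5py
import numpy as np

# Hypothetical scratch path and stand-in data, used only for this illustration.
path = os.path.join(tempfile.mkdtemp(), "groups_demo.h5")
matrix_3 = np.ones((4, 4))

with h5py.File(path, "w") as hdf:
    # One call creates both Group2 and Group2/Friends.
    group = hdf.create_group("Group2/Friends")
    group.create_dataset("dataset3", data=matrix_3)

    top_level = list(hdf.keys())                 # keys of the root group
    nested = list(hdf["Group2"].keys())          # keys one level down
    full_name = hdf["Group2/Friends/dataset3"].name

    # visit() calls the function with the path of every object in the file.
    all_paths = []
    hdf.visit(all_paths.append)

print(top_level)   # ['Group2']
print(nested)      # ['Friends']
print(full_name)   # /Group2/Friends/dataset3
print(all_paths)   # ['Group2', 'Group2/Friends', 'Group2/Friends/dataset3']
```

Path-style keys such as hdf['Group2/Friends/dataset3'] reach nested objects in one lookup, which is usually more convenient than chaining dictionary accesses.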

Compression is set per-dataset at creation time:

hdf.create_dataset('big', data=matrix, compression='gzip', compression_opts=7)
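To see the effect of the compression settings, the sketch below writes the same array with and without gzip and compares file sizes. The scratch paths and the all-zeros matrix are assumptions chosen so the data compresses well; real data will compress less dramatically.

```python
import os
import tempfile

import h5py
import numpy as np

# Hypothetical scratch paths and highly compressible stand-in data.
tmp_dir = tempfile.mkdtemp()
path_raw = os.path.join(tmp_dir, "raw.h5")
path_gz = os.path.join(tmp_dir, "gzip.h5")
matrix = np.zeros((1000, 1000))

with h5py.File(path_raw, "w") as hdf:
    hdf.create_dataset("big", data=matrix)

with h5py.File(path_gz, "w") as hdf:
    dset = hdf.create_dataset(
        "big", data=matrix, compression="gzip", compression_opts=7
    )
    settings = (dset.compression, dset.compression_opts)
    chunks = dset.chunks  # compressed datasets are stored in chunks

# Decompression is transparent on read.
with h5py.File(path_gz, "r") as hdf:
    round_trip_ok = np.array_equal(hdf["big"][...], matrix)

raw_size = os.path.getsize(path_raw)
gz_size = os.path.getsize(path_gz)
print(settings, chunks, round_trip_ok)
print(raw_size, gz_size)
```

For gzip, compression_opts is the compression level from 0 to 9; higher levels trade write speed for smaller files.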

The library is the standard Python entry point to HDF5. Because HDF5 is a language-independent format, files written by HDF5 code in other environments (MATLAB, R, C++) can generally be read by h5py, and vice versa.