szip is an HDF5 compression filter that implements the extended-Rice lossless compression algorithm. It came out of NASA’s Earth Observing System work and is well-suited to scientific data — particularly the kinds of correlated numerical arrays produced by satellite instruments and other scientific recordings.

The catch is patent encumbrance. The underlying extended-Rice algorithm is covered by NASA patents licensed for use with HDF, with restrictions on commercial redistribution. As a result, szip isn’t bundled with every HDF5 installation, and a file written with szip on one machine may fail to open on a machine whose HDF5 build lacks szip support. Some builds also include only the szip decoder, so they can read szip-compressed data but not write it. This makes szip risky for files that need to travel widely.
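
Whether a given build has szip at all can be checked at runtime. The following is a minimal sketch, assuming the Python h5py bindings (the text above does not name a binding); `szip_support` is a hypothetical helper name, while the `h5py.h5z` calls are h5py's low-level filter interface.

```python
import h5py
from h5py import h5z


def szip_support():
    """Return (can_decode, can_encode) for szip in the linked HDF5 build."""
    # filter_avail reports whether the filter is registered at all.
    if not h5z.filter_avail(h5z.FILTER_SZIP):
        return (False, False)
    # get_filter_info returns a bitmask describing encode/decode capability.
    info = h5z.get_filter_info(h5z.FILTER_SZIP)
    can_encode = bool(info & h5z.FILTER_CONFIG_ENCODE_ENABLED)
    can_decode = bool(info & h5z.FILTER_CONFIG_DECODE_ENABLED)
    return (can_decode, can_encode)


can_decode, can_encode = szip_support()
print(f"szip decode: {can_decode}, encode: {can_encode}")
```

Running this on both the writing and the reading machine before committing to szip is one way to catch a mismatch early.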

For general use, gzip is the safer choice because it is available in essentially every HDF5 installation. szip is best reserved for cases where its compression ratio on scientific data justifies the deployment friction and you control both the writing and the reading environment.
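
One way to act on this advice is to request szip only when the local build can encode it and to fall back to gzip otherwise. This is a sketch under assumptions: it uses h5py and NumPy, `pick_compression` is a hypothetical helper, and the option values (`("nn", 16)` for szip, level 4 for gzip) and the dataset name are illustrative choices rather than recommendations.

```python
import h5py
import numpy as np


def pick_compression(prefer_szip=False):
    """Return create_dataset keyword arguments; szip only if it can be written."""
    if prefer_szip and "szip" in h5py.filters.encode:
        # szip options: coding method ("nn" or "ec") and pixels per block (even, 2-32).
        return {"compression": "szip", "compression_opts": ("nn", 16)}
    # gzip level 4 is a middle-of-the-road setting (valid range 0-9).
    return {"compression": "gzip", "compression_opts": 4}


data = np.arange(10_000, dtype=np.int32).reshape(100, 100)

# In-memory file (core driver, no backing store), so nothing is written to disk.
with h5py.File("example.h5", "w", driver="core", backing_store=False) as f:
    kwargs = pick_compression(prefer_szip=True)
    dset = f.create_dataset("readings", data=data, **kwargs)
    used = dset.compression  # name of the filter actually applied
    print(used, dset.compression_opts)
```

Note that the fallback only protects the writer. A file written with szip still needs szip decoding on every machine that reads it.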

The three HDF5 compression options at a glance:

  • gzip: lossless, ubiquitous, slower than lzf but the safe default.
  • lzf: lossless, much faster than gzip, lower compression ratio; it ships with h5py rather than with HDF5 itself.
  • szip: lossless, tuned for correlated scientific arrays, sometimes unavailable due to its license.
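
The trade-offs in this list can be measured on your own data. The sketch below assumes h5py and NumPy, writes a synthetic slowly varying signal (a stand-in for correlated instrument data, not real measurements) with each filter the local build can encode, and reports the stored size. The file name and dataset names are hypothetical, and the numbers will vary with the data and the build.

```python
import os
import tempfile

import h5py
import numpy as np

# Slowly varying int16 signal: neighbouring samples are nearly identical.
signal = np.round(1000 * np.sin(np.linspace(0, 20, 200_000))).astype(np.int16)

sizes = {}
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "compare.h5")
    with h5py.File(path, "w") as f:
        for name in ("gzip", "lzf", "szip"):
            if name in h5py.filters.encode:  # skip filters this build cannot write
                f.create_dataset(name, data=signal, compression=name)
    # Reopen so every chunk has been flushed through its filter to disk.
    with h5py.File(path, "r") as f:
        for name in f:
            sizes[name] = f[name].id.get_storage_size()

raw_bytes = signal.nbytes
for name, size in sizes.items():
    print(f"{name}: {size} bytes stored, {raw_bytes / size:.1f}x smaller than raw")
```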

A question that comes up about all of them: are they lossy or lossless? All three are lossless. The bytes you read back are exactly the bytes you wrote, whether you use gzip, lzf, or szip. For lossy floating-point compression in HDF5 you need a separate filter such as ZFP or SZ; despite the similar name, SZ is unrelated to szip.
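
The lossless claim is easy to verify with a round trip. This sketch, again assuming h5py and NumPy, writes random float64 values with each available filter, closes the file so the data really passes through the filter, and compares the raw bytes after reading back. The file name is hypothetical.

```python
import os
import tempfile

import h5py
import numpy as np

rng = np.random.default_rng(0)
original = rng.normal(size=(256, 256))  # float64 noise: no value is "round"

results = {}
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "roundtrip.h5")
    with h5py.File(path, "w") as f:
        for name in ("gzip", "lzf", "szip"):
            if name in h5py.filters.encode:  # only filters this build can write
                f.create_dataset(name, data=original, compression=name)
    # Reopen the file so the read goes through decompression, not a write cache.
    with h5py.File(path, "r") as f:
        for name in f:
            restored = f[name][...]
            # Compare raw bytes, which is stricter than numeric equality.
            results[name] = restored.tobytes() == original.tobytes()

print(results)
```

Every filter that appears in `results` should map to `True`; szip is simply absent on builds that cannot encode it.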