szip is an HDF5 compression filter that implements the extended-Rice lossless compression algorithm. It came out of NASA’s Earth Observing System work and suits scientific data well, particularly the correlated numerical arrays produced by satellite instruments and other scientific recordings.
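To see why correlated arrays suit this family of coders, here is a minimal sketch of plain Golomb-Rice coding applied to sample-to-sample differences. It is not szip's extended-Rice implementation, which adds its own preprocessing stage and picks coding options adaptively per block. The function names, the fixed parameter k, and the sample values are illustrative assumptions.

```python
def zigzag(d):
    """Map a signed delta to a non-negative int: 0, -1, 1, -2 -> 0, 1, 2, 3."""
    return 2 * d if d >= 0 else -2 * d - 1


def rice_encode(values, k):
    """Encode non-negative ints as a bit string: unary quotient, then k remainder bits."""
    out = []
    for n in values:
        q, r = n >> k, n & ((1 << k) - 1)
        remainder = format(r, "b").zfill(k) if k else ""
        out.append("1" * q + "0" + remainder)
    return "".join(out)


def rice_decode(bits, k, count):
    """Invert rice_encode for `count` values."""
    out, i = [], 0
    for _ in range(count):
        q = 0
        while bits[i] == "1":  # unary part: count the leading ones
            q += 1
            i += 1
        i += 1  # skip the terminating zero
        r = int(bits[i:i + k], 2) if k else 0
        i += k
        out.append((q << k) | r)
    return out


# Hypothetical slowly varying instrument readings.
samples = [1000, 1003, 1001, 1004, 1008, 1007, 1010, 1012]
deltas = [b - a for a, b in zip(samples, samples[1:])]
mapped = [zigzag(d) for d in deltas]

bits = rice_encode(mapped, k=2)
assert rice_decode(bits, k=2, count=len(mapped)) == mapped  # lossless round trip
print(len(bits), "bits for", len(mapped), "deltas")
```

Because neighbouring samples are close, the deltas are small and each one costs only a few bits. On uncorrelated data the deltas are large, the unary part grows, and the advantage disappears.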
The catch is patent encumbrance. The underlying extended-Rice algorithm is covered by NASA patents licensed for use with HDF, with restrictions on commercial redistribution, so szip isn’t bundled with every HDF5 installation. A file written with szip on one machine will still open on a machine whose HDF5 build lacks szip support, but reading its szip-compressed datasets will fail, which makes szip risky for files that need to travel widely.
For general use, gzip is safer because every HDF5 installation has it. Reserve szip for cases where its compression ratio on scientific data is worth the deployment friction and where you control the entire reading and writing environment.
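One way to act on this advice is to test for szip at run time and fall back to gzip. The sketch below assumes Python with h5py as the HDF5 binding, which the text above does not specify. The helper names szip_writable and choose_compression are hypothetical. The h5z calls are part of h5py's low-level API.

```python
def szip_writable():
    """Return True only if this HDF5 build can encode with szip."""
    try:
        from h5py import h5z  # assumption: h5py is the binding in use
    except ImportError:
        return False
    if not h5z.filter_avail(h5z.FILTER_SZIP):
        return False
    # Some builds ship a decode-only szip, so check the encode flag explicitly.
    info = h5z.get_filter_info(h5z.FILTER_SZIP)
    return bool(info & h5z.FILTER_CONFIG_ENCODE_ENABLED)


def choose_compression(prefer_szip=False):
    """Pick a filter name to pass to h5py; gzip is the portable fallback."""
    if prefer_szip and szip_writable():
        return "szip"
    return "gzip"


print(choose_compression(prefer_szip=True))
```

The result can be passed as the compression argument of h5py's create_dataset. This check only covers the writing machine. Every reader still needs szip decoding support, so the fallback does not remove the portability risk once szip has been chosen.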
The three HDF5 compression options at a glance:
- gzip — lossless, ubiquitous, slow but safe default.
- lzf — lossless, much faster than gzip, less compressed.
- szip — lossless, tuned for correlated scientific arrays, sometimes unavailable due to its license.
A question that comes up about all of them, most often about gzip: is it lossy or lossless? Lossless. The bytes you read back are exactly the bytes you wrote. The same holds for lzf and szip; all three preserve the data exactly. For lossy floating-point compression in HDF5 you need a separate filter such as ZFP or SZ, not szip.
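The lossless claim is easy to check outside HDF5. HDF5's gzip filter uses the deflate algorithm, which Python's standard zlib module also implements, so the sketch below round-trips a buffer of doubles through zlib and compares bytes. It illustrates the property and does not go through the HDF5 filter pipeline itself. The function name and sample values are assumptions.

```python
import zlib
from array import array


def roundtrip_is_exact(values, level=4):
    """Compress doubles with deflate and confirm the bytes come back unchanged."""
    raw = array("d", values).tobytes()
    packed = zlib.compress(raw, level)
    restored = zlib.decompress(packed)
    return restored == raw


# Hypothetical floating-point samples, including values with no exact binary form.
values = [0.1 * i for i in range(1000)]
print(roundtrip_is_exact(values))
```

The comparison is on raw bytes, not on values within a tolerance, which is what separates a lossless filter from lossy ones such as ZFP or SZ.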