
Enabling Encryption of Data Stored in HDF5
Data is one of today’s most valuable assets. Protecting it from unauthorized access is a top priority for organizations that collect, store, and process sensitive information. Many data management systems, from enterprise databases like Oracle and MongoDB to general-purpose cloud platforms such as Google Cloud and Amazon Web Services, provide native data protection features.
Despite HDF5's role as the de facto standard for managing, sharing, and archiving scientific data, it wasn't designed with strong, integrated encryption in mind. HDF5 files can often be opened, altered, or even corrupted using widely available tools and scripting languages.
A key example is the use of HDF5 in AI and machine learning (AI/ML) workflows. HDF5 is frequently used to store training datasets and serialized models; for example, Keras in TensorFlow 2 (TF2) can save trained models in HDF5 format. The integrity and confidentiality of these files are mission-critical: tampering or unauthorized disclosure could undermine model performance, expose intellectual property, or introduce vulnerabilities.
Encrypting the whole HDF5 file at rest is a partial solution, but it has a drawback: accessing the data typically requires decrypting the entire file first, which leaves the data exposed while it is in use.
Our Solution: Native Encryption Support for HDF5
We have designed and prototyped integrated, native encryption for HDF5 to address these issues. Our approach provides:
✅ Protection against unauthorized access:
All data and metadata within the HDF5 file are encrypted and accessible only to authorized applications, without decrypting the entire file and regardless of storage location.
✅ Security during transit:
Encryption safeguards files when they are transferred across storage systems, downloaded from clouds, or accessed remotely.
✅ Protection during I/O:
The HDF5 library performs I/O directly on encrypted files. File metadata and raw data are decrypted only in application memory, reducing the risk of exposure.
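To illustrate how data can be read from an encrypted file without decrypting all of it, here is a minimal Python sketch that encrypts fixed-size pages independently and decrypts only the pages that overlap a requested byte range. It is not the prototype's implementation: the page size, the function names, and the on-disk layout are assumptions made for illustration, and the SHA-256 keystream is a toy stand-in for a real cipher such as AES-256 or Twofish.

```python
import hashlib

PAGE_SIZE = 4096  # hypothetical page size; the prototype's actual value is not stated


def _keystream(key: bytes, page_no: int, length: int) -> bytes:
    """Toy keystream built from SHA-256. NOT secure; stands in for a real cipher."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out.extend(hashlib.sha256(
            key + page_no.to_bytes(8, "big") + counter.to_bytes(8, "big")
        ).digest())
        counter += 1
    return bytes(out[:length])


def _xor(data: bytes, stream: bytes) -> bytes:
    return bytes(a ^ b for a, b in zip(data, stream))


def encrypt_pages(plaintext: bytes, key: bytes) -> bytes:
    """Encrypt each page on its own so that pages can later be decrypted one at a time."""
    out = bytearray()
    for start in range(0, len(plaintext), PAGE_SIZE):
        page = plaintext[start:start + PAGE_SIZE]
        out += _xor(page, _keystream(key, start // PAGE_SIZE, len(page)))
    return bytes(out)


def read_range(ciphertext: bytes, key: bytes, offset: int, size: int) -> bytes:
    """Return plaintext bytes [offset, offset + size), decrypting only overlapping pages."""
    first = offset // PAGE_SIZE
    last = (offset + size - 1) // PAGE_SIZE
    buf = bytearray()
    for page_no in range(first, last + 1):
        start = page_no * PAGE_SIZE
        page = ciphertext[start:start + PAGE_SIZE]
        buf += _xor(page, _keystream(key, page_no, len(page)))  # plaintext lives only in memory
    skip = offset - first * PAGE_SIZE
    return bytes(buf[skip:skip + size])
```

In this sketch, a read of 100 bytes touches at most two pages of ciphertext; the rest of the file stays encrypted on disk and is never decrypted.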
Technical Highlights
- Transparent for applications: no code modifications are required to use the HDF5 API.
- Support for AES-256 and Twofish, implemented with the GNU Libgcrypt library.
- Extensible architecture: custom encryption methods or libraries can be added easily.
Productization Roadmap
We’re currently preparing this technology for production use. Next steps include:
- Enhancing the virtual file drivers (VFDs) to support arbitrary and custom encryption methods.
- Providing modular, configurable VFD plugins that integrate seamlessly with the standard HDF5 library.
- Extending support to parallel HDF5 applications.
If you would like more information or help trying our prototype, or want to collaborate on future development, please contact us.