
Avoiding Data Corruption in HDF5

HDF5 is a powerful and flexible format for large, complex datasets, but it has a longstanding vulnerability: file corruption. A single corruption event can render a file unreadable, resulting in total data loss.


Corrupt HDF5 files often arise from:

  • Application crashes or abrupt termination

  • Unsafe concurrent modifications by multiple processes or threads

  • File operations that leave internal metadata in an inconsistent state

 

For short-lived workloads, simply rerunning the application might be an option. But for long-running simulations or large data-acquisition campaigns, recreating the data may be impossible or prohibitively expensive. Furthermore, once metadata is irreparably damaged, the file cannot be recovered and valuable data is gone for good.

 

Our Solution: Coarse-Grained Metadata Journaling
We’re addressing this longstanding challenge by adding a coarse-grained metadata journaling layer to HDF5. Our approach integrates directly with HDF5’s architecture and performs periodic snapshots of internal metadata to a separate file. Because each snapshot reflects a consistent state of the HDF5 metadata, you can quickly restore a corrupted file to a readable state and avoid permanent data loss.
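
The general idea of periodic, consistent snapshots written to a separate file can be sketched in a few lines of Python. This illustrates the technique only and is not the HDF5 journaling implementation: the function names `write_snapshot` and `read_snapshot`, the JSON layout, and the CRC32 check are all assumptions made for this example. The sketch writes each snapshot to a temporary file, forces it to disk, and then renames it over the previous snapshot, so a crash at any point should leave either the old snapshot or the new one, never a half-written mix.

```python
import json
import os
import tempfile
import zlib


def write_snapshot(metadata: dict, journal_path: str) -> None:
    """Replace the journal file with a new, complete snapshot of `metadata`.

    Hypothetical helper for illustration; not part of any HDF5 API.
    """
    payload = json.dumps(metadata, sort_keys=True)
    record = {"crc32": zlib.crc32(payload.encode("utf-8")), "payload": payload}
    directory = os.path.dirname(os.path.abspath(journal_path))

    # Write to a temporary file in the same directory first, so the
    # existing snapshot stays untouched until the new one is complete.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".journal-")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            json.dump(record, tmp)
            tmp.flush()
            os.fsync(tmp.fileno())  # push the bytes to stable storage
        # Swap the new snapshot into place in a single rename step.
        os.replace(tmp_path, journal_path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


def read_snapshot(journal_path: str) -> dict:
    """Load the latest snapshot, rejecting it if the checksum does not match."""
    with open(journal_path, encoding="utf-8") as f:
        record = json.load(f)
    payload = record["payload"]
    if zlib.crc32(payload.encode("utf-8")) != record["crc32"]:
        raise ValueError("journal snapshot failed its checksum")
    return json.loads(payload)
```

In this sketch, an application would call `write_snapshot` at whatever interval it chooses and call `read_snapshot` during recovery. Keeping the snapshot in a separate file means damage to the main data file does not touch the last known-good copy of the metadata.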

 

Enhanced SWMR Support (Single Writer / Multiple Readers)

As a key by-product of this work, we’re delivering a production-grade implementation of Single Writer / Multiple Readers (SWMR) based on a new architecture we call VFD SWMR. It brings several improvements over the legacy SWMR implementation:

✅ Independence from POSIX semantics:

     Supports NFS and other networked storage.

✅ Concurrent modifications:

     Allows the writer to make structural modifications to the HDF5 file while other processes retain read access.

✅ Controlled latency:

     Guarantees an upper bound on the time before newly written data becomes visible to readers.

✅ Scalable:

     The design is ready for parallel HDF5 applications.

 

If you’re interested in the VFD SWMR feature or the HDF5 data recovery tool, we’d love to hear from you. We’re looking for collaborators, feedback, and early adopters to help validate and improve these capabilities.

 

Contact us to learn more.
