Breaking Down Data Silos in Earth Science: A New Approach with HDF5 VOL Connectors
- Apr 4
Earth science data is rich, complex—and fragmented.
From satellite imagery stored in GeoTIFF, to atmospheric models in GRIB2, to observational data in BUFR and CDF, scientists must navigate a landscape of specialized formats, each with its own tools, APIs, and workflows. While these formats are optimized for their domains, working across them remains a major challenge.
As highlighted in our NASA SBIR Phase I work, this fragmentation creates real barriers: increased complexity, loss of metadata during conversion, reduced computational efficiency, and ultimately, lost scientific productivity. So, what if you didn’t need to convert data at all?
A Different Approach: Bring the Interface to the Data
Instead of transforming data into a common format, our approach flips the model. Using the HDF5 Virtual Object Layer (VOL), we developed a set of connectors that allow applications to access multiple data formats—GeoTIFF, CDF, BUFR, and GRIB2—through standard HDF5 APIs.
In this model:
Applications continue using familiar HDF5 or netCDF-style interfaces
The VOL layer intercepts those calls
Format-specific connectors translate them in real time
Native libraries (e.g., ecCodes, libtiff, the CDF library) handle the actual data access
The result is transparent, format-agnostic data access—without conversion.
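To make the layering concrete, here is a toy sketch of the dispatch idea in plain Python. It is not the HDF5 VOL API: the reader functions, the `CONNECTORS` registry, and the returned values are all hypothetical stand-ins, and routing by file extension is a simplification chosen only to keep the example short.

```python
from typing import Callable, Dict, List

# Toy stand-ins for native format libraries (hypothetical; real connectors
# would delegate to ecCodes, libtiff, or the CDF library instead).
def read_geotiff(path: str, name: str) -> List[float]:
    return [1.0, 2.0, 3.0]

def read_grib2(path: str, name: str) -> List[float]:
    return [273.15, 274.0]

# Registry playing the role of the connector layer: extension -> reader.
CONNECTORS: Dict[str, Callable[[str, str], List[float]]] = {
    ".tif": read_geotiff,
    ".grib2": read_grib2,
}

def read_dataset(path: str, name: str) -> List[float]:
    """Single entry point; the caller never names the underlying format."""
    for ext, reader in CONNECTORS.items():
        if path.endswith(ext):
            return reader(path, name)
    raise ValueError(f"no connector registered for {path!r}")

# The same call works for both formats.
band = read_dataset("scene.tif", "band1")
temps = read_dataset("forecast.grib2", "t2m")
```

The point of the sketch is only the shape of the design: application code calls one function, and the choice of native library happens behind it.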
What We Built in Phase I
During Phase I, we developed working prototype connectors for all four formats and demonstrated that:
Data and metadata can be discovered and accessed through a unified HDF5 model
Existing tools can operate on these formats without modification
The approach preserves original files and metadata while eliminating conversion overhead
Each format is mapped naturally into HDF5 concepts:
GeoTIFF images → datasets
GRIB2 and BUFR messages → groups with attributes and arrays
CDF variables → datasets with associated metadata
This mapping allows both data values and metadata to be accessed consistently across formats.
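As a rough illustration of the message-to-group mapping, the sketch below models a group and a dataset with plain Python dataclasses. The `Group` and `Dataset` classes, the `message_to_group` helper, and the sample message are assumptions made for illustration; the actual connectors may organize keys differently.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List

@dataclass
class Dataset:
    data: List[float]
    attrs: Dict[str, Any] = field(default_factory=dict)

@dataclass
class Group:
    attrs: Dict[str, Any] = field(default_factory=dict)
    members: Dict[str, Dataset] = field(default_factory=dict)

def message_to_group(message: Dict[str, Any]) -> Group:
    """Map one decoded message (hypothetical dict layout) to a group:
    scalar keys become attributes, list-valued keys become datasets."""
    group = Group()
    for key, value in message.items():
        if isinstance(value, list):
            group.members[key] = Dataset(data=value)
        else:
            group.attrs[key] = value
    return group

# Illustrative message; the keys and values are made up for this example.
msg = {"shortName": "2t", "units": "K", "values": [271.3, 272.8, 274.1]}
g = message_to_group(msg)
```

With this layout, metadata such as units travels with the group as attributes, while the numeric payload is reachable as an ordinary dataset.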
Why This Matters
Traditional workflows rely on conversion:
GRIB → netCDF
BUFR → intermediate formats
GeoTIFF → analysis-ready structures
But conversion comes at a cost:
Lost metadata
Increased storage
Additional processing time
Complex pipelines
Because our connectors read the original files in place, no converted copy is produced and these costs do not arise.
By enabling direct access to native formats, the VOL connectors:
Reduce workflow complexity
Eliminate redundant data copies
Preserve domain-specific optimizations
Lower the barrier to using diverse datasets
In short, scientists spend less time managing data—and more time doing science.
Looking Ahead: Toward Scalable, Cloud-Ready Data Access
Phase I also explored how this architecture can evolve.
We investigated:
External data layout descriptors to enable efficient remote access (e.g., HTTP range reads)
Multi-threaded access strategies for modern multi-core systems
Foundations for cloud-based and scalable data workflows
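A minimal sketch of how a layout descriptor and threaded access could fit together is shown below. The descriptor format (`LAYOUT`), the chunk names, and the in-memory `REMOTE` object are hypothetical and exist only so the example runs offline; a real client would send the `Range` header to a server instead of slicing a byte string.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Tuple

# Hypothetical layout descriptor: chunk name -> (byte offset, byte length).
LAYOUT: Dict[str, Tuple[int, int]] = {
    "chunk0": (0, 4),
    "chunk1": (4, 4),
    "chunk2": (8, 4),
}

def range_header(offset: int, length: int) -> str:
    """Value for an HTTP Range header; the end position is inclusive."""
    return f"bytes={offset}-{offset + length - 1}"

# In-memory stand-in for a remote object (12 bytes: 0, 1, ..., 11).
REMOTE = bytes(range(12))

def fetch(chunk: str) -> bytes:
    offset, length = LAYOUT[chunk]
    # A real client would request range_header(offset, length) over HTTP.
    return REMOTE[offset:offset + length]

def fetch_all(chunks: List[str], workers: int = 4) -> List[bytes]:
    """Fetch chunks concurrently; map() returns results in input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fetch, chunks))

parts = fetch_all(["chunk0", "chunk2"])
```

Because the descriptor already records where each chunk lives, a reader can request only the bytes it needs and issue those requests in parallel.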
These investigations point to a path toward production-ready systems that can operate efficiently at scale.
From Research to Product
The connectors developed in this project are being released as open-source HDF5 plugins and form the foundation of our broader H5+ platform.
Because the solution operates at the library level:
It integrates into existing applications
It requires no application rewrites
It can be adopted immediately
This makes it practical not only for NASA systems, but also for:
HPC centers
Research institutions
Commercial analytics platforms
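As a sketch of what library-level integration can look like in practice, HDF5 lets a VOL connector be selected through the `HDF5_VOL_CONNECTOR` and `HDF5_PLUGIN_PATH` environment variables, so an unmodified program can be launched with a different connector. The connector name `grib2_vol`, the plugin directory, and the file name below are placeholders, not the names used by the released plugins.

```python
import os
from typing import Dict, Optional

def vol_environment(connector: str, plugin_dir: str,
                    base: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Return a copy of the environment with an HDF5 VOL connector selected.

    HDF5_VOL_CONNECTOR names the connector to load and HDF5_PLUGIN_PATH
    tells the library where to search for plugin shared libraries.
    """
    env = dict(os.environ if base is None else base)
    env["HDF5_VOL_CONNECTOR"] = connector
    env["HDF5_PLUGIN_PATH"] = plugin_dir
    return env

# Placeholder connector name and plugin directory (assumptions).
env = vol_environment("grib2_vol", "/opt/hdf5/plugins",
                      base={"PATH": "/usr/bin"})

# Hypothetical usage: run an existing HDF5-based tool unchanged.
# import subprocess
# subprocess.run(["h5dump", "-H", "forecast.grib2"], env=env)
```

The application itself is untouched; only the environment it starts in changes.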
Conclusion
Earth science data will continue to grow—in size, complexity, and diversity of formats.
The question is not whether we can standardize all data formats, but how we can work across them efficiently.
HDF5 VOL connectors offer a new path forward:
one interface, multiple formats, no conversion.


