
Breaking Down Data Silos in Earth Science: A New Approach with HDF5 VOL Connectors

  • Apr 4
  • 2 min read

Earth science data is rich, complex—and fragmented.


From satellite imagery stored in GeoTIFF to atmospheric model output in GRIB2 to observational data in BUFR and CDF, scientists must navigate a landscape of specialized formats, each with its own tools, APIs, and workflows. While these formats are optimized for their domains, working across them remains a major challenge.


As highlighted in our NASA SBIR Phase I work, this fragmentation creates real barriers: increased complexity, loss of metadata during conversion, reduced computational efficiency, and ultimately, lost scientific productivity. So, what if you didn’t need to convert data at all?


A Different Approach: Bring the Interface to the Data

Instead of transforming data into a common format, our approach flips the model. Using the HDF5 Virtual Object Layer (VOL), we developed a set of connectors that allow applications to access multiple data formats—GeoTIFF, CDF, BUFR, and GRIB2—through standard HDF5 APIs.


In this model:

  • Applications continue using familiar HDF5 or netCDF-style interfaces

  • The VOL intercepts those calls

  • Format-specific connectors translate them on the fly

  • Native libraries (e.g., ecCodes, libtiff, the NASA CDF library) handle the actual data access


The result is transparent, format-agnostic data access—without conversion.
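To make the dispatch idea concrete, here is a toy Python model of the flow described above. It is an illustration, not our connector code: the real HDF5 VOL is a C-level table of callbacks (`H5VL_class_t`) that each connector fills in, whereas the class names, the extension-based routing, and the stub return values below are hypothetical simplifications.

```python
from abc import ABC, abstractmethod
from pathlib import Path


class Connector(ABC):
    """Stand-in for one format-specific VOL connector (hypothetical API)."""

    @abstractmethod
    def read_dataset(self, path: str, name: str) -> list[float]:
        """Return the values of dataset `name` stored in the file at `path`."""


class GeoTIFFConnector(Connector):
    def read_dataset(self, path: str, name: str) -> list[float]:
        # A real connector would call libtiff here; this stub returns fixed data.
        return [0.0, 1.0]


class GRIB2Connector(Connector):
    def read_dataset(self, path: str, name: str) -> list[float]:
        # A real connector would call ecCodes here; this stub returns fixed data.
        return [273.15]


# Hypothetical routing by file extension. The real HDF5 VOL selects a
# connector through the file access property list or the environment.
CONNECTORS: dict[str, Connector] = {
    ".tif": GeoTIFFConnector(),
    ".tiff": GeoTIFFConnector(),
    ".grib2": GRIB2Connector(),
}


def read(path: str, name: str) -> list[float]:
    """Single application-facing call: dispatch to the connector for this file."""
    suffix = Path(path).suffix.lower()
    connector = CONNECTORS.get(suffix)
    if connector is None:
        raise ValueError(f"no connector registered for {suffix!r}")
    return connector.read_dataset(path, name)


# The application code is identical regardless of the underlying format.
print(read("scene.tif", "band_1"))    # [0.0, 1.0]
print(read("forecast.grib2", "t2m"))  # [273.15]
```

The point of the design is that `read` never changes when a new format is added; only a new entry in the connector table is needed.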


What We Built in Phase I

During Phase I, we developed working prototype connectors for all four formats and demonstrated that:

  • Data and metadata can be discovered and accessed through a unified HDF5 model

  • Existing tools can operate on these formats without modification

  • The approach preserves original files and metadata while eliminating conversion overhead


Each format is mapped naturally into HDF5 concepts:

  • GeoTIFF images → datasets

  • GRIB2 and BUFR messages → groups with attributes and arrays

  • CDF variables → datasets with associated metadata


This mapping allows both data values and metadata to be accessed consistently across formats.
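As a rough illustration of the GRIB2 mapping, the sketch below turns already-decoded messages into an HDF5-like tree of groups, attributes, and datasets. The input layout (a `keys` dict and a `values` list per message, as one might obtain after decoding with ecCodes), the `message_<n>` group names, and the `Group`/`Dataset` classes are assumptions for illustration, not the connector's actual naming scheme.

```python
from dataclasses import dataclass, field
from typing import Union


@dataclass
class Dataset:
    """HDF5-style dataset: an array plus optional attributes."""
    values: list[float]
    attrs: dict[str, object] = field(default_factory=dict)


@dataclass
class Group:
    """HDF5-style group: attributes plus named child groups or datasets."""
    attrs: dict[str, object] = field(default_factory=dict)
    members: dict[str, Union["Group", Dataset]] = field(default_factory=dict)


def map_grib2_messages(messages: list[dict]) -> Group:
    """Map decoded GRIB2 messages to one group per message (hypothetical layout)."""
    root = Group()
    for index, message in enumerate(messages):
        # Message metadata (e.g. ecCodes keys) becomes group attributes.
        group = Group(attrs=dict(message["keys"]))
        # The decoded field becomes a dataset inside that group.
        group.members["values"] = Dataset(values=list(message["values"]))
        root.members[f"message_{index}"] = group
    return root


# Illustrative input; the numbers are made up.
decoded = [
    {"keys": {"shortName": "t", "level": 500}, "values": [251.2, 250.8]},
    {"keys": {"shortName": "u", "level": 500}, "values": [12.5, 13.1]},
]
tree = map_grib2_messages(decoded)
print(sorted(tree.members))                          # ['message_0', 'message_1']
print(tree.members["message_1"].attrs["shortName"])  # u
```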


Why This Matters

Traditional workflows rely on conversion:

  • GRIB → netCDF

  • BUFR → intermediate formats

  • GeoTIFF → analysis-ready structures


But conversion comes at a cost:

  • Lost metadata

  • Increased storage

  • Additional processing time

  • Complex pipelines


Our approach avoids these tradeoffs by reading the original files in place.


By enabling direct access to native formats, the VOL connectors:

  • Reduce workflow complexity

  • Eliminate redundant data copies

  • Preserve domain-specific optimizations

  • Lower the barrier to using diverse datasets


In short, scientists spend less time managing data—and more time doing science.


Looking Ahead: Toward Scalable, Cloud-Ready Data Access

Phase I also explored how this architecture can evolve.


We investigated:

  • External data layout descriptors to enable efficient remote access (e.g., HTTP range reads)

  • Multi-threaded access strategies for modern multi-core systems

  • Foundations for cloud-based and scalable data workflows
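One way to picture an external layout descriptor: a list of byte ranges recording where each chunk of data lives inside the original file, so a reader can fetch only those bytes with HTTP range requests and issue several requests in parallel. The sketch below assumes that descriptor shape (`ChunkRef` with `key`, `offset`, `length`) and takes the transport as a plain `fetch` function; both are hypothetical, not the descriptor format investigated in Phase I.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class ChunkRef:
    """One entry of a hypothetical external layout descriptor."""
    key: str     # e.g. a GRIB2 message index or a GeoTIFF tile id
    offset: int  # byte offset of the chunk inside the original file
    length: int  # chunk size in bytes


def range_header(ref: ChunkRef) -> str:
    """HTTP Range header value; the last byte position is inclusive."""
    return f"bytes={ref.offset}-{ref.offset + ref.length - 1}"


def read_chunks(refs: list[ChunkRef],
                fetch: Callable[[str], bytes],
                max_workers: int = 4) -> dict[str, bytes]:
    """Fetch all chunks concurrently; `fetch` receives a Range header value."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in input order, so zip pairs them correctly.
        payloads = pool.map(lambda ref: fetch(range_header(ref)), refs)
        return {ref.key: data for ref, data in zip(refs, payloads)}


# Local stand-in for an HTTP client: serve byte ranges from an in-memory blob.
blob = bytes(range(100))


def fake_fetch(header: str) -> bytes:
    start, end = (int(n) for n in header.removeprefix("bytes=").split("-"))
    return blob[start:end + 1]


refs = [ChunkRef("tile_0", 0, 10), ChunkRef("tile_1", 50, 5)]
print(range_header(refs[1]))                    # bytes=50-54
print(read_chunks(refs, fake_fetch)["tile_1"])  # b'23456'
```

In practice, `fetch` could wrap `urllib.request.urlopen` on a `Request` carrying a `Range` header. Threads suit this case because each request spends most of its time waiting on the network rather than computing.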


These results demonstrate a clear path toward production-ready systems that can operate efficiently at scale.


From Research to Product

The connectors developed in this project are being released as open-source HDF5 plugins and form the foundation of our broader H5+ platform.


Because the solution operates at the library level:

  • It integrates into existing applications

  • Requires no rewrites

  • Enables immediate adoption
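Library-level integration of this kind typically relies on HDF5's dynamic plugin loading, available in recent HDF5 releases (1.12 and later): the `HDF5_VOL_CONNECTOR` environment variable names the connector to use by default, and `HDF5_PLUGIN_PATH` lists the directories searched for the plugin's shared library. The helper below is a minimal sketch that builds such an environment for launching an unmodified program; the connector name and plugin directory used with it are hypothetical placeholders and must match what the installed plugin actually registers.

```python
import os
from typing import Mapping, Optional


def vol_environment(connector: str, plugin_dir: str,
                    base: Optional[Mapping[str, str]] = None) -> dict[str, str]:
    """Return a copy of the environment that asks HDF5 to load a VOL plugin."""
    env = dict(os.environ if base is None else base)
    # Default connector for every file the HDF5 library opens in this process.
    env["HDF5_VOL_CONNECTOR"] = connector
    # Prepend the plugin directory while keeping any existing entries.
    existing = env.get("HDF5_PLUGIN_PATH")
    env["HDF5_PLUGIN_PATH"] = (
        plugin_dir if not existing else plugin_dir + os.pathsep + existing
    )
    return env


env = vol_environment("grib2", "/opt/h5plugins", base={})
print(env["HDF5_VOL_CONNECTOR"])  # grib2
```

For example, `subprocess.run(["h5ls", "-r", "forecast.grib2"], env=vol_environment("grib2", "/opt/h5plugins"), check=True)` would run the stock `h5ls` tool through the connector, provided the connector implements the operations that tool calls.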


This makes it practical not only for NASA systems, but also for:

  • HPC centers

  • Research institutions

  • Commercial analytics platforms


Conclusion

Earth science data will continue to grow—in size, complexity, and diversity of formats.


The question is not whether we can standardize all data formats, but how we can work across them efficiently.


HDF5 VOL connectors offer a new path forward:

one interface, multiple formats, no conversion.

 
 
bottom of page