How does HDF5 compare with Hadoop?

HDF5 is a software system for creating customized data containers and providing efficient access to the data stored in them, over a wide range of application domains, across platforms, and limited in size only by the hosting storage layer. Hadoop is a distributed system designed to process, generate, and store large datasets. Superficially, both can store large datasets, but the similarity ends there.

Although HDF5 and Hadoop serve very different goals, they can complement one another, for example, by giving Hadoop efficient access to data stored in HDF5 containers.

For more information, see our paper "Big HDF FAQs, Everything that HDF Users have Always Wanted to Know about Hadoop . . . But Were Ashamed to Ask".