From the course: Faster pandas

Optimizing with HDF5

- HDF5 is an industry-wide, high-performance data storage format. It's geared towards scientific applications and stores data in a single file. HDF5 has great performance and allows queries on subsets of the data. NASA has tens of petabytes of telescope data in HDF5. In pandas, you can access HDF5 data via the HDFStore class. You need to make sure that PyTables is installed in order to work with HDF5: python -m pip install tables. It's already installed on my machine. Let's have a look at the stocks database. So, ipython. Now we're going to import pandas as pd, and we're going to say that store is pd.HDFStore of stocks. Let's see what's in the store. So, store.keys(). And you see we have a single table, which is stocks. We can load all the data by using square brackets. So, df = store['stocks']. And we can look at the columns. But it's usually a good idea to look at some of the data before loading it all into memory. So we can do store.select('stocks'). And we can say stop after three rows…
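
The steps described in the transcript can be sketched roughly as follows. This is a minimal illustration, not the course's own session: the file name stocks.h5, the column names, and the sample values are all made up, and the helper preview_rows is hypothetical. It assumes pandas is installed and skips the HDF5 part if PyTables is missing.

```python
import importlib.util
import os
import tempfile

import pandas as pd

# Hypothetical sample data standing in for the course's stocks table.
sample = pd.DataFrame({
    "open": [10.0, 20.5, 30.25, 40.0, 50.75],
    "close": [10.5, 20.0, 31.0, 39.5, 51.0],
    "volume": [100, 200, 300, 400, 500],
})


def preview_rows(path, key, n):
    """Read only the first n rows stored under key, not the whole table."""
    with pd.HDFStore(path, mode="r") as store:
        return store.select(key, stop=n)


# pandas relies on PyTables (the "tables" package) for HDF5 support.
HAVE_PYTABLES = importlib.util.find_spec("tables") is not None

if HAVE_PYTABLES:
    # Assumed file name; the course's actual file is not shown here.
    path = os.path.join(tempfile.mkdtemp(), "stocks.h5")

    # format="table" stores the data so that select() can query subsets.
    sample.to_hdf(path, key="stocks", format="table")

    with pd.HDFStore(path, mode="r") as store:
        keys = store.keys()      # key names come back with a leading slash
        full = store["stocks"]   # square brackets load the whole table

    head = preview_rows(path, "stocks", 3)
    print(keys)
    print(list(full.columns))
    print(head)
else:
    print("PyTables is not installed; run: python -m pip install tables")
```

Passing stop to select is one way to peek at a few rows before committing to a full load, which matters more as the file grows.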
