From the course: Faster pandas

Loading parts of data

- [Tutor] You'd like to calculate the revenue per vendor in the taxi rides dataset. There are two things we can do here to lower memory usage: we can load only the data we need, and we can read the frame in small parts. Let's first load the whole data and then just part of it and see the difference. So we start ipython and import pandas as pd, and we say the data frame is pd.read_csv of taxi.csv.xz. Now we define a megabyte as two to the power of 20. And we do df.memory_usage(deep=True).sum() divided by megabyte, and it's 153 megabytes of memory. Now let's read only the columns that we need. So we pass usecols, and we need only the vendor ID and the total amount. When we run the memory usage calculation now, it's only 7.6 megabytes. The second thing we can do is specify chunksize to the read_csv function. Let's have a look. We're going to do the read again and say that the chunksize is 100,000, and you see it finished almost immediately, but the data frame now…
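The two techniques from the transcript can be sketched roughly as follows. This is a minimal illustration, not the course's own code: it builds a small synthetic CSV in memory instead of reading taxi.csv.xz, and the column names (`VendorID`, `total_amount`, and the two extra columns) are assumptions made for the example. With `chunksize` set, `pd.read_csv` returns an iterator of data frames rather than a single frame, so the per-vendor sums are accumulated chunk by chunk.

```python
import io

import pandas as pd

# Synthetic stand-in for the taxi rides file; the column names are assumed.
csv_text = "VendorID,total_amount,passenger_count,store_and_fwd_flag\n" + "".join(
    f"{1 + i % 2},{10.0 + i % 5},{1 + i % 4},N\n" for i in range(1000)
)

MB = 2 ** 20  # one megabyte, as defined in the transcript


def memory_mb(df):
    """Return the deep memory usage of a data frame in megabytes."""
    return df.memory_usage(deep=True).sum() / MB


# Technique 1: load only the columns needed for the calculation.
needed = ["VendorID", "total_amount"]
full = pd.read_csv(io.StringIO(csv_text))
part = pd.read_csv(io.StringIO(csv_text), usecols=needed)
print(f"all columns: {memory_mb(full):.4f} MB, needed columns: {memory_mb(part):.4f} MB")

# Technique 2: read in chunks and combine the per-chunk partial sums.
revenue = None
for chunk in pd.read_csv(io.StringIO(csv_text), usecols=needed, chunksize=100):
    partial = chunk.groupby("VendorID")["total_amount"].sum()
    # fill_value=0 covers vendors that are missing from one of the two sides
    revenue = partial if revenue is None else revenue.add(partial, fill_value=0)

print(revenue)
```

Only one chunk is held in memory at a time, so peak usage is bounded by the chunk size rather than the file size. The chunk size of 100 here is scaled down to suit the tiny synthetic data.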
