From the course: Faster pandas

Categorical data

- [Instructor] Let's load the taxi dataset. So we start IPython, import pandas as pd, and df equals pd dot read csv of taxi dot csv dot xz. And when we look at the dtypes, the VendorID is an integer. There are two advantages to using integers instead of strings. One, they're more memory efficient, and two, comparisons are faster. However, for us humans, integers are harder to understand. If we look at df VendorID, and let's take a sample of them, we see twos and ones. Which vendor is two and which is one? It's not really easy for us to tell. We can convert the VendorID to a string. We can say that vendor names is a dictionary, where one is Creative, two is VeriFone, and four is BigApple. And now we can say that vendors is df VendorID dot map of vendor names. And when we look now at vendors dot sample five, we see that we have names. Using a string for the vendor will take more memory, and it's wasteful since there are a lot of repetitions. This data is known as categorical…
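
The steps described above can be sketched roughly as follows. This is a minimal illustration, not the instructor's exact session: the small DataFrame is a made-up stand-in for taxi.csv.xz (which is not reproduced here), the variable names int_bytes, str_bytes, and cat_bytes are hypothetical, and the final conversion with astype("category") is an assumption about where the lesson is heading, based on its title.

```python
import pandas as pd

# Made-up stand-in for the taxi data: only a VendorID column with repeated integer codes.
# In the lesson the frame comes from pd.read_csv("taxi.csv.xz") instead.
df = pd.DataFrame({"VendorID": [1, 2, 2, 1, 4, 2, 1, 2] * 1000})

# Mapping from vendor code to vendor name, as given in the transcript.
vendor_names = {1: "Creative", 2: "VeriFone", 4: "BigApple"}

# Replace each integer code with its human-readable name.
vendors = df["VendorID"].map(vendor_names)
print(vendors.sample(5))

# Compare memory use of the integer column, the string column,
# and (as an assumption about the next step) a categorical column.
int_bytes = df["VendorID"].memory_usage(deep=True)
str_bytes = vendors.memory_usage(deep=True)
cat_bytes = vendors.astype("category").memory_usage(deep=True)
print(f"integers:    {int_bytes:,} bytes")
print(f"strings:     {str_bytes:,} bytes")
print(f"categorical: {cat_bytes:,} bytes")
```

Passing deep=True asks pandas to count the memory held by the string values themselves, so the comparison between the integer and string columns is a fair one.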
