From the course: Faster pandas

Unlock the full course today

Join today to access over 22,400 courses taught by industry experts or purchase this course individually.

The limitations of object dtype

The limitations of object dtype

From the course: Faster pandas

Start my 1-month free trial

The limitations of object dtype

- Pandas and Numpy, has a lot of specialized types for fast processing. For example, Numpy int64 takes less memory and can be used directly by the CPU. There are cases when loading data that pandas won't be able to guess the type, and then it will default to a Python type, which has the object dtype. Let's have a look. iPython, we can import Pandas as pd and then df = pd.read CSV off the logs that we have. And now we can look at the dtypes with df.dtypes. And you see that origin, date time, method, and path are object. Especially date and time should probably be some kind of a timestamp and not an object. So we can have a look, df at date and time.head. We can convert them to a timestamp using either pandas to daytime or while reading the CSV with the option pause dates in the C-suite. Let's see what's the performance implication of the object dtype. Say you would like to know how many unique time values there are. So we're going to do df time.nunique. and have 6533 unique times. Now…

Contents