From the course: Faster pandas
Parsing time once
- [Instructor] Say you'd like to select only web logs that came in the morning. These logs are in logs.csv.xz. And we define morning as a time from 6:00 AM up to noon. So Python, and we're going to import pandas as pd. And now df is pd.read_csv('logs.csv.xz'). Let's have a quick look at the data. So len(df), we have 50,000 rows. And let's sample five of the rows just to have a look. Okay, and let's take one value of time. So df.time, and let's take number 48. And we see that the type is str. So we define def is_morning of the timestamp. And we do t equals pd.to_datetime of the timestamp. And then we return t.hour is bigger than or equal to six, and t.hour is less than 12. So we can do len of df, where df.time.apply(is_morning). And we see that we have 13,000 lines in the morning. Now let's time it. So %timeit. So this took seven seconds. The problem here is that we're parsing the time string into a timestamp object on every function call. We can convert the time…
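The transcript cuts off before the fix is shown, so what follows is only a minimal sketch of the idea the video title suggests: parse the column once, then filter on the parsed values. It does not read logs.csv.xz; the small DataFrame and its timestamp strings are made up for illustration, and the only thing taken from the transcript is a string column named time and the is_morning function.

```python
import pandas as pd

# Hypothetical stand-in for logs.csv.xz: a tiny frame with a string `time` column.
df = pd.DataFrame({
    "time": [
        "2021-03-01 05:59:59",
        "2021-03-01 06:00:00",
        "2021-03-01 11:59:59",
        "2021-03-01 12:00:00",
        "2021-03-01 18:30:00",
    ]
})


def is_morning(ts):
    """Parse a single time string and check 6:00 <= time < 12:00."""
    t = pd.to_datetime(ts)
    return 6 <= t.hour < 12


# Slow path from the transcript: one pd.to_datetime call per row.
slow_mask = df["time"].apply(is_morning)

# Assumed fix: parse the whole column once, then compare with the .dt accessor.
times = pd.to_datetime(df["time"])
fast_mask = (times.dt.hour >= 6) & (times.dt.hour < 12)

# Both masks select the same rows; count the morning ones.
morning_count = int(fast_mask.sum())
print(morning_count)
```

On the real data, one way to compare the two approaches would be to run %timeit on each mask expression in IPython; no timing for the parse-once version is given in the transcript.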