From the course: Introduction to Spark SQL and DataFrames
Aggregate data with DataFrame API
- [Instructor] Now let's take a look at aggregating using the DataFrame API. I've opened a new Jupyter notebook and, as I mentioned in an earlier video, I'm going to start with the data loaded. If you have access to the exercise files, you'll have these commands in each individual chapter's exercise. In the first step, I defined a string that holds a data path. In the second step, I built on that data path to create a file path pointing to a data file with some location and temperature information, and then I read that file into a DataFrame. Here in step three, I'm simply showing the first 10 rows. So we have a DataFrame called df1, and it has location and temperature information, where the temperature is measured in Celsius. What I'd like to do now is count how many different measurements we have for each location. To do that, I'm going to reference the DataFrame and use the groupBy operation, and I want to group by the…
Contents
- Set up a Jupyter notebook (2m 1s)
- Load data into DataFrames: CSV Files (7m 26s)
- Load data into DataFrames: JSON Files (3m 16s)
- Basic DataFrame operations (3m 26s)
- Filter data with DataFrame API (2m 13s)
- Aggregate data with DataFrame API (3m 47s)
- Sample data from DataFrames (5m 25s)
- Save data from DataFrames (3m 27s)