From the course: Introduction to Spark SQL and DataFrames

Aggregate data with DataFrame API

- [Instructor] Now let's take a look at aggregating using the DataFrame API. I've opened a new Jupyter notebook and, as I mentioned in an earlier video, I'm going to start with the data loaded. If you have access to the exercise files, you'll have these commands in each individual chapter's exercise.

In the first step, I defined a string that holds a data path. In the second step, I built on that data path to create a file path pointing to a data file that has some location and temperature information, and then I read that into a DataFrame. Here in step three, I'm simply showing the first 10 rows. So we have a DataFrame called df1, and it has location and temperature information, where the temperature is measured in Celsius.

What I'd like to do is count how many measurements we have for each location. To do that, I'm going to reference the DataFrame and use the groupBy operation, and I want to group by the…
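
As a rough illustration of what a group-by-and-count produces, here is a minimal plain-Python sketch. The sample rows, the field order, and the column name "location" are assumptions made for illustration; they are not taken from the course's data file. The PySpark form in the trailing comment assumes df1 really has a column with that name.

```python
from collections import Counter

# Hypothetical sample rows standing in for df1: (location, temperature in Celsius).
# These values are illustrative only, not from the course data file.
rows = [
    ("loc0", 29.0),
    ("loc0", 31.0),
    ("loc1", 27.0),
    ("loc2", 30.0),
    ("loc1", 28.0),
]

# Group by the location field and count the rows that fall in each group.
counts_by_location = Counter(location for location, _temp in rows)

for location, n in sorted(counts_by_location.items()):
    print(location, n)

# Assumed PySpark equivalent, if df1 has a column named "location":
#   df1.groupBy("location").count().show()
```

In PySpark, groupBy followed by count returns a new DataFrame with one row per group and a column named count, so the result can be shown or aggregated further like any other DataFrame.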
