From the course: Introduction to Spark SQL and DataFrames
Exploratory data analysis with Spark SQL
- [Instructor] Now, we saw how we could get things like the count, the mean, and the standard deviation using the DataFrame API. Let's do that with Spark SQL. To do that, we'll call spark.sql and give it a command. In this case it'll be a SELECT: let's select the min of CPU utilization, the max of CPU utilization, and the standard deviation of CPU utilization. We'll specify FROM utilization, because that's the table we registered with createOrReplaceTempView, and let's be sure to call show on this, because the result is a DataFrame. So here our minimum CPU utilization is about 22%, the max is 100%, and the standard deviation is about 15, which is what we saw up above, so no surprises there. I'm going to copy this command and make it a little easier to read by making it multi-line. I'll use a backslash, we'll say FROM utilization, and now we want to specify our group by…