From the course: Introduction to Spark SQL and DataFrames


Exploratory data analysis with Spark SQL

- [Instructor] Now, we saw how we could get things like the count, the mean, and the standard deviation using the DataFrame API. Let's do that with Spark SQL. To do that, we'll call spark.sql and give it a command. In this case it'll be a SELECT: let's select the min of CPU utilization, the max of CPU utilization, and the standard deviation of CPU utilization. And we'll specify FROM utilization, because that's the table we registered with createOrReplaceTempView, and let's be sure to call show, because the result is a DataFrame. And so we have here: our minimum CPU utilization is about 22%, the max is 100%, and the standard deviation is about 15, which is what we saw up above, so no surprises there. I'm going to copy this command and make it a little easier to read by splitting it across multiple lines. I'll use a backslash at the end of each line, we'll say FROM utilization, and now we want to specify our GROUP BY…
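A minimal sketch of the steps described in the narration, assuming the data is loaded from a JSON file and that the columns are named cpu_utilization and server_id (the file path and column names are illustrative, not taken from the course files):

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("eda-spark-sql").getOrCreate()

    # Load the utilization data (path and format assumed for illustration)
    df = spark.read.json("utilization.json")

    # Register the DataFrame as a temporary view so it can be queried with SQL
    df.createOrReplaceTempView("utilization")

    # Summary statistics with Spark SQL; spark.sql() returns a DataFrame, so call show()
    spark.sql("SELECT min(cpu_utilization), max(cpu_utilization), "
              "stddev(cpu_utilization) FROM utilization").show()

    # The same query split across lines with backslashes for readability,
    # this time grouped by server
    spark.sql("SELECT server_id, \
                      min(cpu_utilization), \
                      max(cpu_utilization), \
                      stddev(cpu_utilization) \
               FROM utilization \
               GROUP BY server_id").show()

Both queries return DataFrames, so any DataFrame operation (show, collect, further transformations) can be applied to the SQL results just as with the DataFrame API.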
