From the course: Introduction to Spark SQL and DataFrames


Timeseries analysis with DataFrames


- [Instructor] We're going to work with our utilization data again, but instead of doing a general exploratory data analysis like we could do with any data set, we're going to look at things we can do specifically with timeseries data. Timeseries data is data that has a set of measures and a timestamp associated with them. In the case of the utilization data, the measurements come at regular time intervals, which makes it easier to work with in some ways. So what I'm doing now is loading the utilization data, and I've also created the utilization table so we can work with Spark SQL right away. Okay, so we're going to start with Spark SQL. Let's create a select statement and select the server ID, and then we'll get the min of CPU utilization, the max of CPU utilization, and the standard deviation of CPU utilization. Let's continue this on the next line, and of course this will be from the utilization table, and…
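
The per-server summary being described can be sketched roughly as follows. This is not the instructor's code. The column names `server_id`, `cpu_utilization`, and `event_datetime`, the `GROUP BY` clause, and the sample rows are all assumptions, since the transcript cuts off before the query is finished. So that the query can run without a Spark installation, the sketch swaps in Python's built-in `sqlite3` module and registers a hypothetical `STDDEV` aggregate that mimics Spark SQL's sample standard deviation.

```python
import sqlite3
import statistics


class SampleStddev:
    """Stand-in aggregate: sample standard deviation, like Spark SQL's STDDEV."""

    def __init__(self):
        self.values = []

    def step(self, value):
        # Ignore NULLs, as SQL aggregates do.
        if value is not None:
            self.values.append(value)

    def finalize(self):
        # Sample standard deviation needs at least two values.
        if len(self.values) < 2:
            return None
        return statistics.stdev(self.values)


# Assumed shape of the query; column names are hypothetical, and GROUP BY
# is added because server_id is selected alongside aggregates.
QUERY = """
SELECT server_id,
       MIN(cpu_utilization)    AS min_cpu,
       MAX(cpu_utilization)    AS max_cpu,
       STDDEV(cpu_utilization) AS stddev_cpu
FROM utilization
GROUP BY server_id
ORDER BY server_id
"""

# Made-up readings at regular five-minute intervals, for illustration only.
SAMPLE_ROWS = [
    ("2019-03-05 08:06:14", 100, 0.5),
    ("2019-03-05 08:11:14", 100, 0.7),
    ("2019-03-05 08:16:14", 100, 0.9),
    ("2019-03-05 08:06:14", 101, 0.2),
    ("2019-03-05 08:11:14", 101, 0.4),
]


def summarize(rows):
    """Load rows into an in-memory utilization table and run the summary query."""
    conn = sqlite3.connect(":memory:")
    conn.create_aggregate("STDDEV", 1, SampleStddev)
    conn.execute(
        "CREATE TABLE utilization ("
        "event_datetime TEXT, server_id INTEGER, cpu_utilization REAL)"
    )
    conn.executemany("INSERT INTO utilization VALUES (?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in summarize(SAMPLE_ROWS):
        print(row)
```

In a Spark session, a query string of the same shape would typically be passed to `spark.sql(...)` against the registered utilization table, with no custom aggregate needed because `STDDEV` is built in.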
