From the course: Introduction to Spark SQL and DataFrames

Basic machine learning with DataFrames, part 1

- [Instructor] A commonly used technique in exploratory data analysis is called clustering. The idea is that we want to see if there are natural groupings in the data. So for example, let's take a look at the utilization data and see if we can divide that data set into three groups that logically belong together. To do that, we're going to use our utilization data, and we'll be using DataFrames. We're also going to use some code from the machine learning package. So the first thing I did before loading the data was import pyspark.sql so we can have our Spark session. I also imported three things from the ml package: Vectors, VectorAssembler, and KMeans. I'll explain each of those as we go through. Then I went through our usual steps to load our utilization data from a JSON file into a DataFrame called df_util. So let's take a quick look at…
