From the course: Introduction to Spark SQL and DataFrames
Basic machine learning with DataFrames, part 1
- [Instructor] A commonly used technique in exploratory data analysis is clustering. The idea is to see whether there are natural groupings in the data. So, for example, let's take a look at the utilization data and see if we can divide that data set into three groups that logically belong together. To do that, we're going to use our utilization data, and we'll be using DataFrames. We're also going to use some code from the machine learning package. So the first thing I did before loading the data was import pyspark.sql, of course, so we can have our Spark session. I also imported three things from the ml package: Vectors, VectorAssembler, and KMeans. I'll explain each of those as we go through. And then I went through our usual steps to load our utilization data from a JSON file into a DataFrame called df_util. So let's take a quick look at…
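To make the idea of dividing data into three groups concrete, here is a minimal sketch of k-means clustering in plain Python. It is an illustration only, not Spark's KMeans and not the instructor's code: the `kmeans` helper, the choice of the first k points as starting centers, and the sample rows are all assumptions made up for this example. In PySpark, the three names mentioned in the transcript live at `pyspark.ml.linalg.Vectors`, `pyspark.ml.feature.VectorAssembler`, and `pyspark.ml.clustering.KMeans`.

```python
import math


def kmeans(points, k, iterations=20):
    """Plain-Python k-means (Lloyd's algorithm); illustrative only."""
    # Assumption for this sketch: start from the first k points so the
    # result is deterministic. Real libraries use smarter initialization.
    centers = [tuple(p) for p in points[:k]]
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest center.
        assignments = [
            min(range(k), key=lambda c: math.dist(p, centers[c]))
            for p in points
        ]
        # Update step: move each center to the mean of its members.
        new_centers = []
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                mean = tuple(sum(col) / len(members) for col in zip(*members))
                new_centers.append(mean)
            else:
                # Keep the old center if a cluster ends up empty.
                new_centers.append(centers[c])
        if new_centers == centers:
            break  # centers stopped moving, so the grouping is stable
        centers = new_centers
    return centers, assignments


# Made-up two-column numeric rows (NOT the course's utilization data).
points = [
    (10.0, 5.0), (50.0, 40.0), (90.0, 80.0),
    (12.0, 6.0), (52.0, 42.0), (88.0, 79.0),
    (11.0, 4.0), (49.0, 41.0), (91.0, 82.0),
]

centers, assignments = kmeans(points, k=3)
for point, cluster in zip(points, assignments):
    print(point, "-> cluster", cluster)
```

Each row ends up with a cluster index from 0 to 2, which is the same kind of result a clustering step over a DataFrame would attach to each record.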