From the course: Tableau and R for Analytics Projects

Explore clustering algorithms

- [Instructor] When you examine a data set, you will often notice that customers, products, or competitors tend to fall into one of several groups. These could be local customers who make frequent small purchases, corporate clients who make large purchases every quarter, and so on. In data analysis, these groups are called clusters, and in this movie, I will describe cluster analysis and how to implement one popular technique in R and Tableau. I've started out in a sample workbook called ExcelVBA_Cluster. That's a macro-enabled Excel workbook that you can find in the chapter five folder of the exercise files collection. This workbook contains a set of data, which you can see represented by the blue points in the graph. They are just arbitrary values used to identify customers within the graph. The orange points are the centroids for the clusters. They've started out at random positions, which I entered here, in this part of my worksheet.
When I click the Update Centroids button, Excel will calculate the distance from each point to each of the three centroids. It will then identify the centroid that each point is closest to. The next phase of the routine will update the centroids, moving each one to the average position of the points that are closest to it. That may seem a little confusing, but let me click Update Centroids once and we can comment on how things have gone. So I'll go ahead and click Update Centroids, and you can see that the centroids have taken significant steps toward the centers of what we can visually identify as the three different clusters. I'll click Update Centroids again, and they are very close. In fact, they might even be in the middle already. Oh, no, I see the one in the bottom left is not, so I'll click Update Centroids again. Everything looks about right, but I'll click Update Centroids one more time, and there was no change. When another update no longer moves the centroids, that tells me that we have a good solution.
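The Update Centroids routine described here can be sketched in a few lines of code. What follows is a minimal Python sketch, not the workbook's VBA or the R code used later in the course. The sample points, the starting centroids, and the function names (`nearest`, `update_centroids`, `kmeans`) are all made up for illustration, and straight-line (Euclidean) distance is an assumption.

```python
import math

def nearest(point, centroids):
    """Return the index of the centroid closest to point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))

def update_centroids(points, centroids):
    """One 'Update Centroids' click: assign every point to its nearest
    centroid, then move each centroid to the average of its points."""
    groups = [[] for _ in centroids]
    for p in points:
        groups[nearest(p, centroids)].append(p)
    moved = []
    for old, group in zip(centroids, groups):
        if group:
            # Average each coordinate across the points in this group.
            moved.append(tuple(sum(axis) / len(group) for axis in zip(*group)))
        else:
            # Assumption: a centroid with no points stays where it is.
            moved.append(old)
    return moved

def kmeans(points, centroids, max_iter=100):
    """Repeat the update until the centroids stop moving."""
    for _ in range(max_iter):
        moved = update_centroids(points, centroids)
        if moved == centroids:
            break  # no change, so the solution has settled
        centroids = moved
    return centroids

# Hypothetical data: three loose groups of customers on a scatter plot.
points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0),
          (8.0, 8.0), (9.0, 8.0), (8.0, 9.0),
          (1.0, 8.0), (2.0, 9.0), (1.0, 9.0)]
start = [(0.0, 0.0), (10.0, 10.0), (0.0, 10.0)]  # hand-entered starting centroids
final = kmeans(points, start)
print(final)
```

Calling `update_centroids(points, final)` once more returns the same centroids, which mirrors the last click in the demonstration where nothing changed.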
So the next question is how we are going to apply this kind of process in Tableau and R. There are a couple of things we need to keep in mind when performing cluster analysis in Tableau and R. First, you will need to create a scatter plot in Tableau, very much like the one in the Excel workbook that I just showed you, and then you will add a calculated field to actually identify the clusters. A better-implemented version of the calculations from my Excel workbook will be used to calculate the clusters for the data that you provide. One important thing to remember is that you specify the number of clusters. There are some algorithms that will try to find the best number of groups, but those aren't always reliable, and in fact it comes down to using your knowledge of your business to guess, as best you can, the number of clusters. Will you get it right the first time? Almost certainly not. So that means you should review the results and change your settings, especially the number of clusters, if needed. When you start looking at your data and you see it grouped together, you'll have a very good idea of when the results start to make sense. Now, that doesn't mean you should stop experimenting the first time you see something that looks right, because you might be surprised, but let your intuition and your subject matter knowledge be your guide.
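One way to support that kind of review is to run the clustering for several cluster counts and compare how tightly the points sit around their centroids. The Python sketch below does this with a made-up data set. The tightness measure (total squared distance from each point to its nearest centroid), the choice to start from the first k points, and the function names are assumptions for illustration and are not taken from the course. A smaller total means tighter clusters, but the number alone does not pick the right count, so it is a supplement to looking at the scatter plot, not a replacement for it.

```python
import math

def nearest(point, centroids):
    """Return the index of the centroid closest to point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))

def kmeans(points, centroids, max_iter=100):
    """Assign points to the nearest centroid and re-average until nothing moves."""
    for _ in range(max_iter):
        groups = [[] for _ in centroids]
        for p in points:
            groups[nearest(p, centroids)].append(p)
        moved = [
            tuple(sum(axis) / len(g) for axis in zip(*g)) if g else old
            for old, g in zip(centroids, groups)
        ]
        if moved == centroids:
            break
        centroids = moved
    return centroids

def total_squared_distance(points, centroids):
    """Sum of squared distances from each point to its nearest centroid."""
    return sum(math.dist(p, centroids[nearest(p, centroids)]) ** 2 for p in points)

# Hypothetical data; the first three points come from three different groups.
points = [(1.0, 1.0), (8.0, 8.0), (1.0, 9.0),
          (1.0, 2.0), (2.0, 1.0), (9.0, 8.0),
          (8.0, 9.0), (2.0, 9.0), (1.0, 8.0)]

spread = {}
for k in range(1, 5):
    # Assumption: use the first k points as the starting centroids.
    centroids = kmeans(points, points[:k])
    spread[k] = total_squared_distance(points, centroids)
    print(k, round(spread[k], 2))
```

If the total drops sharply up to some cluster count and only slightly after it, that count is a reasonable candidate to inspect visually.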
