From the course: Data Science Foundations: Data Mining in Python


PCA

PCA - Python Tutorial


- [Instructor] By far the most common way to reduce dimensionality in a dataset is principal component analysis, usually just called PCA, and it takes only a few lines in Python. We'll begin by loading a few standard packages, including scikit-learn (imported as sklearn), which provides the PCA functionality. Then I'm going to load our datasets. We'll load the training data and split it so that the x attribute variables are separated from the y class variable, and then do the same for the testing data. Once we've done that, we can look at the first few rows of the training data. You can see that we have 64 attribute variables, zero through 63, where P stands for pixel, and the class variable that indicates which digit each row actually is appears at the end as y. We have one, one, six, and so on. We'll begin by training the model with the training data. We'll set up a PCA object. That's what we're…
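The steps described so far can be sketched roughly as follows. This is a minimal illustration, not the course's own notebook: the course's data files are not shown in the transcript, so scikit-learn's bundled 8x8 digits dataset stands in for them, and the column names `P0` through `P63`, the 70/30 split, and the choice of two components are all assumptions made for the example.

```python
import pandas as pd
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split

# Stand-in data (assumption): scikit-learn's bundled digits, 64 pixel attributes per row.
digits = load_digits()
X = pd.DataFrame(digits.data, columns=[f"P{i}" for i in range(64)])  # hypothetical column names
y = pd.Series(digits.target, name="y")  # class variable: the digit each row represents

# Separate training and testing data (the 70/30 split is an assumption).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=1, stratify=y
)

# Look at the first few rows: 64 attribute columns with the class variable y at the end.
print(X_train.join(y_train).head())

# Set up a PCA object and train it on the training attributes only.
pca = PCA(n_components=2)  # two components chosen purely for illustration
pca.fit(X_train)

# Project both sets onto the components learned from the training data.
train_scores = pca.transform(X_train)
test_scores = pca.transform(X_test)

# Proportion of the total variance captured by each retained component.
print(pca.explained_variance_ratio_)
```

Fitting on the training data alone and then reusing the same fitted object to transform the testing data keeps the test set from influencing the components.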
