From the course: Data Science on Google Cloud Platform: Exploratory Data Analytics

Cleansing and transforming data

- [Instructor] In a typical exploratory data analysis project, data is cleansed and transformed before it is analyzed. Common transformations include creating indicator variables, binning, and aggregation. Let's see what that looks like. We are going to create indicator variables for the gender, age, and discount columns in the DataFrame. We do so with the pandas function get_dummies. It creates a separate column for each unique value in these columns and populates it with a one or a zero for each row, depending on whether the row has that value. This is an important preprocessing step for correlation analysis as well as machine learning, since those techniques generally require numerical input. Once we create the indicator variables, we concatenate them with the original DataFrame. When you call head on the result, you will see the additional columns alongside the original ones. This was a really simple example, but you can do much more, including filtering, binning, creating categorical variables, joining DataFrames, et cetera.
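
The steps described in the transcript can be sketched roughly as follows. This is a minimal illustration, not the course's actual notebook: the sample rows, the customer_id and order_total columns, the spend_band bins, and the variable names are all assumptions made for the example. Only get_dummies, concatenation, and head come from the transcript; the pd.cut and groupby lines are one possible way to do the binning and aggregation it mentions.

```python
import pandas as pd

# Hypothetical sample data; the real course dataset is not shown in the transcript.
df = pd.DataFrame({
    "customer_id": [1, 2, 3, 4],
    "gender": ["F", "M", "F", "M"],
    "age": ["18-25", "26-35", "26-35", "36-45"],   # assumed to be categorical bands
    "discount": ["yes", "no", "no", "yes"],
    "order_total": [20.0, 75.0, 150.0, 40.0],      # assumed numeric column
})

# Create one indicator (0/1) column per unique value in each listed column.
indicators = pd.get_dummies(df[["gender", "age", "discount"]], dtype=int)

# Concatenate the indicator columns with the original DataFrame, column-wise.
df_with_indicators = pd.concat([df, indicators], axis=1)

# Binning: assign each order_total to a labeled band (bin edges are assumptions).
df_with_indicators["spend_band"] = pd.cut(
    df_with_indicators["order_total"],
    bins=[0, 50, 100, 200],
    labels=["low", "mid", "high"],
)

# Aggregation: mean order_total per band.
avg_by_band = df_with_indicators.groupby("spend_band", observed=True)["order_total"].mean()

# Inspect the first rows, including the new indicator columns.
print(df_with_indicators.head())
print(avg_by_band)
```

Passing dtype=int makes the indicator columns hold ones and zeros, as in the transcript; recent pandas versions otherwise default to boolean values.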
