From the course: Data Science on Google Cloud Platform: Exploratory Data Analytics
Cleansing and transforming data - Google Cloud Tutorial
Cleansing and transforming data
- [Instructor] In a typical exploratory data analysis project, data is cleansed and transformed before analysis. Transformations include creating indicator variables, binning, and aggregation. Let's see what that looks like. We are going to create indicator variables for the gender, age, and discount columns in the dataframe. We do so by using the pandas function get_dummies. This creates a separate column for each unique value in those columns and populates it with a one or a zero, depending on whether the row has that value. This is an important preliminary step for correlation analysis, as well as for machine learning, since those algorithms require data to be in numerical format. Once we create the indicator variables, we concatenate them with the original dataframe. When you run the head command, you will see the additional columns populated in the same dataset. This was a really simple example, but you can do much more, including filtering, binning, creating categorical variables, joining dataframes, et cetera.
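The indicator-variable step described in the transcript can be sketched roughly as follows. This is a minimal illustration, not the instructor's exercise file: the dataframe name `sales`, the `customer_id` column, and all sample values are hypothetical, and it assumes the age column holds categorical age groups rather than raw numbers.

```python
import pandas as pd

# Hypothetical sample data; the course's actual dataset is not shown in the transcript.
sales = pd.DataFrame({
    "customer_id": [1, 2, 3, 4],
    "gender": ["F", "M", "F", "M"],
    "age": ["18-25", "26-35", "26-35", "36-45"],   # assumed to be age groups
    "discount": ["Yes", "No", "No", "Yes"],
})

# Create one indicator column per unique value in each listed column.
# dtype=int yields 1/0 instead of True/False.
indicators = pd.get_dummies(sales[["gender", "age", "discount"]], dtype=int)

# Concatenate the indicator columns with the original dataframe, column-wise.
sales_with_indicators = pd.concat([sales, indicators], axis=1)

# Inspect the first rows; the new columns appear next to the originals.
print(sales_with_indicators.head())
```

By default, get_dummies names each new column after the source column and the value it represents, joined by an underscore (for example, `gender_F`), which makes the added columns easy to trace back to their origin.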