From the course: More Python Tips, Tricks, and Techniques for Data Science

Handling missing data

- [Instructor] The difference between data found in many tutorials and data in the real world is that real-world data is rarely clean and homogeneous. In particular, many interesting data sets will have some amount of data missing. To make matters worse, different data sources may indicate missing data in different ways. So in this video, we're going to talk about ways to represent missing values and methods to detect and impute them. So, first off, I am importing the libraries numpy, pandas, random, and string, and run the cell. Now, if we talk about ways to represent missing values, basically there are two ways. The first one is a Boolean mask. A separate array is created, which represents all the missing values as Booleans. The drawback of this method is that it adds storage and computational overhead. Now, the second method is sentinel values. Here, we use a data-specific convention…
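To make the two representations concrete, here is a minimal sketch rather than the instructor's actual notebook. The sample values and variable names are made up for illustration. It assumes NaN as the sentinel (a common convention for floating-point data in pandas) and mean imputation as one simple way to fill the gap.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: the third reading is missing.
values = np.array([1.0, 2.0, 0.0, 4.0])

# Approach 1: Boolean mask. A separate array flags which entries are missing.
missing = np.array([False, False, True, False])
masked = np.ma.masked_array(values, mask=missing)
masked_mean = masked.mean()   # the masked entry is ignored
mask_bytes = missing.nbytes   # the extra storage the mask costs

# Approach 2: sentinel value. NaN marks the gap inside the data itself.
series = pd.Series([1.0, 2.0, np.nan, 4.0])
detected = series.isnull()               # detect missing entries
imputed = series.fillna(series.mean())   # impute with the mean of observed values

print(masked_mean, mask_bytes)
print(detected.tolist())
print(imputed.tolist())
```

The mask keeps the original array untouched but needs a second array of the same length. The sentinel needs no extra array, but it reserves one value (here NaN) that can no longer stand for real data.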
