From the course: Data Science Foundations: Data Assessment for Predictive Modeling
Exploring your missing data options
- [Instructor] The fourth task of the data understanding phase is to verify data quality. Here's a small piece of the Titanic data, so what are some of the first things I would be looking for?

Well, we have complete ID information. That's critical because missing IDs make data integration extremely difficult. Not impossible, but it's so problematic that if you have missing IDs, it becomes a topic for the whole team to discuss. We have no missing data on the target variable. This is, perhaps, even more important. Supervised learning requires this variable to be present and accurate, but no problem here.

Now, we get to the real heart of the matter. It is very rare for a real-world data set that is complex enough to be useful to have no missing data among the inputs. In 25 years of doing this, I'm not sure that I've ever encountered it. Now, I've had clients that thought they were okay on missing data, but often because they…
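
The three checks described above (complete IDs, a complete target, and missing values among the inputs) can be sketched in a few lines of plain Python. This is a minimal illustration, not the instructor's method: the rows are invented values shaped like Titanic passenger data, and the column names `passenger_id`, `survived`, `age`, and `cabin`, as well as the helpers `missing_counts` and `quality_report`, are assumptions made for this example.

```python
# Hypothetical rows shaped like Titanic passenger data. The values are
# invented for illustration; None marks a missing value.
rows = [
    {"passenger_id": 1, "survived": 0, "age": 22.0, "cabin": None},
    {"passenger_id": 2, "survived": 1, "age": 38.0, "cabin": "C85"},
    {"passenger_id": 3, "survived": 1, "age": None, "cabin": None},
    {"passenger_id": 4, "survived": 0, "age": 35.0, "cabin": None},
]


def missing_counts(records):
    """Count missing (None) values per column.

    Columns are taken from the first record; a key absent from a later
    record is also counted as missing.
    """
    columns = records[0].keys()
    return {col: sum(1 for r in records if r.get(col) is None) for col in columns}


def quality_report(records, id_col, target_col):
    """Separate missing-value counts for the ID, the target, and the inputs."""
    counts = missing_counts(records)
    inputs = {c: n for c, n in counts.items() if c not in (id_col, target_col)}
    return {
        "missing_ids": counts[id_col],
        "missing_target": counts[target_col],
        "missing_inputs": inputs,
    }


report = quality_report(rows, id_col="passenger_id", target_col="survived")
print(report)
```

Reporting the ID and target counts separately from the inputs mirrors the order of the checks in the transcript: a nonzero count in either of the first two is a problem to raise before looking at the inputs at all.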