From the course: Data Science Foundations: Data Assessment for Predictive Modeling

Learning when to discard rows

- There is a lot of confusion about discarding data in machine learning. Many folks will imply that you keep all the data in the model, and it's just not true. It's helpful to imagine what will really be taking place at deployment. It's not as simple as all the data being run through the model and scored. There are always exclusion criteria and often multiple models. So some data, but not all data, is being routed into the model. That's why only the data that will be scored once the model is deployed should be used when the model is developed. For instance, on a cell phone churn project I worked on, one churn reason code was military deployment. This is a different kind of churn. The reasons it happens are different, and the likely intervention strategies for a disappointed customer will be irrelevant. For all those reasons, the model is better with these cases removed. What if you're trying to predict 30 day…
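As a rough illustration of the military deployment example above, here is a minimal Python sketch of discarding rows by churn reason code before a model is developed. The field names (`customer_id`, `churned`, `churn_reason`), the reason code strings, and the helper `split_by_exclusion` are all hypothetical and are not taken from the course; the records are made up for illustration only.

```python
# Hypothetical set of reason codes to exclude. The transcript names only
# military deployment; the string used here is an assumed encoding.
EXCLUDED_CHURN_REASONS = {"military_deployment"}


def split_by_exclusion(rows, excluded_reasons=EXCLUDED_CHURN_REASONS):
    """Split rows into (kept, discarded) based on the churn reason code."""
    kept, discarded = [], []
    for row in rows:
        if row.get("churn_reason") in excluded_reasons:
            discarded.append(row)  # a different kind of churn: leave it out
        else:
            kept.append(row)
    return kept, discarded


# Made-up illustrative records, not real project data.
customers = [
    {"customer_id": 1, "churned": True, "churn_reason": "price"},
    {"customer_id": 2, "churned": True, "churn_reason": "military_deployment"},
    {"customer_id": 3, "churned": False, "churn_reason": None},
]

training_rows, removed_rows = split_by_exclusion(customers)
print(len(training_rows), "rows kept,", len(removed_rows), "rows discarded")
```

Returning the discarded rows alongside the kept ones, rather than silently dropping them, makes it easier to review how many cases each exclusion criterion removes.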
