From the course: Mistakes to Avoid in Machine Learning

Data leakage

- [Instructor] A common rule of thumb in machine learning: if your result looks too good to be true, it probably is. In these cases, the primary culprit is usually data leakage. Data leakage occurs any time information from outside your training set enters your model. It is especially prevalent when working with time series data and in environments where there are data cleanliness issues. The end result is that you may be fooled into thinking your model generalizes much better than it really does. So how can we detect and prevent data leakage? Here's an example. Let's say you are working to predict customer cancellations and you have a theory that a recently introduced product makes for stickier customers. If you formulate the problem with historical data, you will likely find that none of the canceling customers bought this product. As you validate this model on new unseen data,…
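The cancellation example can be illustrated with a small simulation. This is a minimal sketch, not the instructor's method: the data is synthetic, the names `LAUNCH_DAY`, `END_DAY`, `simulate_customers`, `cancel_rate`, and `rate_gap` are hypothetical, and the simulation assumes the product has no real effect on cancellations. Under that assumption, any apparent "stickiness" comes purely from the fact that customers who canceled before the launch never had a chance to buy.

```python
import random

LAUNCH_DAY = 300  # hypothetical day the new product was introduced
END_DAY = 400     # hypothetical end of the observation window


def simulate_customers(n=20000, seed=0):
    """Build synthetic customers. By construction, buying has NO effect on churn."""
    rng = random.Random(seed)
    customers = []
    for _ in range(n):
        # Each customer cancels on a uniformly random day; days past END_DAY
        # mean the customer has not canceled within the observed window.
        cancel_day = rng.randint(1, 2 * END_DAY)
        active_at_launch = cancel_day > LAUNCH_DAY
        # Only customers still active at launch are able to buy the product.
        bought = active_at_launch and rng.random() < 0.5
        customers.append({
            "canceled": cancel_day <= END_DAY,
            "active_at_launch": active_at_launch,
            "bought": bought,
        })
    return customers


def cancel_rate(customers, bought):
    """Fraction of customers in the buyer (or non-buyer) group who canceled."""
    group = [c for c in customers if c["bought"] == bought]
    return sum(c["canceled"] for c in group) / len(group)


def rate_gap(customers):
    """Non-buyer cancel rate minus buyer cancel rate."""
    return cancel_rate(customers, False) - cancel_rate(customers, True)


if __name__ == "__main__":
    everyone = simulate_customers()
    # Naive formulation: all historical customers, including those who
    # canceled before the product existed.
    print("naive gap:", round(rate_gap(everyone), 3))
    # Fairer formulation: only customers who could have bought the product.
    eligible = [c for c in everyone if c["active_at_launch"]]
    print("eligible-only gap:", round(rate_gap(eligible), 3))
```

In this sketch the naive comparison shows a large gap in cancellation rates, while restricting to customers who were active at launch makes the gap nearly vanish. The point is only that the way the historical problem is formulated can manufacture a signal; real data may of course contain a genuine effect as well.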
