From the course: Python Data Science Mistakes to Avoid


Using features that will be unavailable later


- [Tutor] A common mistake in machine learning is choosing features that will not be available in the future. It is important to choose features that will be available during testing and when your model is run on unseen data. For example, let's say that my goal is to predict whether a student has instructor one or instructor two. Say that I have access to a dataset that contains, for each student in a set of students, the SID (which stands for student ID), math grade, science grade, history grade, and instructor, which is either one or two. This will serve as the training data. Now say I pick math grade, science grade, and history grade to be the features and I build a model accordingly. The model will use a student's math, science, and history grades to predict whether they have instructor one or instructor two. Next, say I encounter this testing data. This data does not include students' history grades. It could…
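The scenario in the transcript can be sketched as a quick availability check run before any model is built. This is a minimal illustration in plain Python, not the course's own code: the column names, the contents of the testing data beyond the missing history grade, and the helper `split_by_availability` are all assumptions made for the example.

```python
# Hypothetical column names, modeled on the dataset described in the transcript.
train_columns = ["SID", "math_grade", "science_grade", "history_grade", "instructor"]

# Assumed layout of the testing data: same as training, minus the history grade
# (and minus the instructor label, which is what the model should predict).
test_columns = ["SID", "math_grade", "science_grade"]

# Features the tutor proposes to train on.
candidate_features = ["math_grade", "science_grade", "history_grade"]


def split_by_availability(features, available_columns):
    """Split features into those present and those absent at prediction time."""
    available = set(available_columns)
    usable = [f for f in features if f in available]
    missing = [f for f in features if f not in available]
    return usable, missing


usable_features, missing_features = split_by_availability(candidate_features, test_columns)

print("Usable at prediction time:", usable_features)
print("Missing at prediction time:", missing_features)
```

Running a check like this before training flags `history_grade` as a feature the model could not rely on, so the feature set can be reconsidered before the model is built rather than after it fails on unseen data.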
