From the course: Python Data Science Mistakes to Avoid
Using features that will be unavailable later - Python Tutorial
- [Tutor] A common mistake in machine learning is choosing features that will not be available in the future. It is important to choose features that will be available both during testing and when your model runs on unseen data. For example, let's say that my goal is to predict whether a student has instructor one or instructor two. Say that I have access to a dataset from a set of students that contains the SID (student ID), math grade, science grade, history grade, and instructor, which is either one or two for each student. This will serve as the training data. Now say I pick math grade, science grade, and history grade as the features and build a model accordingly. The model will use a student's math, science, and history grades to predict whether they have instructor one or instructor two. Next, say I encounter this testing data. This data does not include students' history grades. It could…
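One possible safeguard is to compare the candidate features against the columns that will actually exist at prediction time before training anything. The sketch below uses pandas with small hypothetical DataFrames that mirror the columns described above; the snake_case column names, the values, and the helper functions `available_features` and `missing_features` are illustrative assumptions, not the course's own code or dataset.

```python
import pandas as pd

# Hypothetical training data mirroring the columns described in the example.
train = pd.DataFrame({
    "SID": [101, 102, 103, 104],
    "math_grade": [88, 72, 95, 60],
    "science_grade": [91, 70, 89, 65],
    "history_grade": [85, 78, 92, 58],
    "instructor": [1, 2, 1, 2],
})

# Hypothetical testing data: history grades were never collected.
test = pd.DataFrame({
    "SID": [201, 202],
    "math_grade": [80, 67],
    "science_grade": [84, 71],
})

# SID is an identifier, not a predictive feature, so it is left out.
candidate_features = ["math_grade", "science_grade", "history_grade"]


def available_features(candidates, unseen_df):
    """Return the candidate features that exist in the unseen data (hypothetical helper)."""
    return [col for col in candidates if col in unseen_df.columns]


def missing_features(candidates, unseen_df):
    """Return the candidate features absent from the unseen data (hypothetical helper)."""
    return [col for col in candidates if col not in unseen_df.columns]


missing = missing_features(candidate_features, test)
features = available_features(candidate_features, test)
print("Missing at test time:", missing)
print("Usable features:", features)

# Build the training and testing matrices from the same, available-everywhere features.
X_train = train[features]
y_train = train["instructor"]
X_test = test[features]
```

Here the check flags `history_grade` before a model ever depends on it, so training and testing use the same two features. Dropping the missing feature is only one option; depending on the situation you might instead collect that data going forward or choose a different target.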