From the course: Data Science Foundations: Data Assessment for Predictive Modeling
Introducing the KDD Cup 1998 data
- Okay, to do a proper job talking about missing data, we need a data set with a lot of missing data. This data set is from the 1998 KDD Cup. KDD is the Knowledge Discovery in Databases conference, and they have an annual competition. The 1998 competition data set is famously a bit of a mess. It's got an odd coding scheme, it's got lots of blanks, it has lots of missing data. Frankly, it's a real challenge to work with, but that's why it's perfect for our purposes. So the data set is available here in the UCI Machine Learning Repository. There's another website that has more supporting information: the competition website from years ago, which is still up. And importantly, if you scroll down, you can see all kinds of supporting files, including the data dictionary, which you're going to find very valuable. Here it is. You can see this is from years ago; it looks like a coding font. And…
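As a rough illustration of what "lots of blanks" means in practice, here is a minimal sketch that counts blank fields per column in a delimited file. It uses only Python's standard library. The sample rows and the column names (`id`, `col_a`, `col_b`, `col_c`) are made up for the example and are not taken from the KDD Cup 1998 files, and treating empty or whitespace-only fields as missing is an assumption, not the data set's documented coding scheme.

```python
import csv
import io

# Hypothetical sample rows for illustration only; these are NOT real
# KDD Cup 1998 records or field names.
SAMPLE = """id,col_a,col_b,col_c
1,34, ,X
2,,B,
3,51,,X
"""


def count_blank_fields(text_stream):
    """Count fields that are empty or whitespace-only, per column.

    Assumption: a blank field means "missing". The real data dictionary
    may define other missing-value codes as well.
    """
    reader = csv.DictReader(text_stream)
    counts = {name: 0 for name in reader.fieldnames}
    total_rows = 0
    for row in reader:
        total_rows += 1
        for name in reader.fieldnames:
            value = row[name]
            # DictReader yields None when a row has fewer fields than the header.
            if value is None or value.strip() == "":
                counts[name] += 1
    return counts, total_rows


blank_counts, n_rows = count_blank_fields(io.StringIO(SAMPLE))
for name, n_blank in blank_counts.items():
    print(f"{name}: {n_blank}/{n_rows} blank")
```

To try the same idea on a real file, you could pass an open file handle instead of the in-memory sample. Check the data dictionary first, since a blank may not be the only way a missing value is encoded.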