From the course: Data Science Foundations: Data Assessment for Predictive Modeling

Introducing the KDD Cup 1998 data

- Okay, to do a proper job talking about missing data, we need a data set with a lot of missing data. This data set comes from the 1998 KDD Cup. KDD is the Knowledge Discovery in Databases conference, and they have an annual competition. The 1998 cup competition data set is famously a bit of a mess. It's got an odd coding scheme, it's got lots of blanks, and it has lots of missing data. Frankly, it's a real challenge to work with, but that's why it's perfect for our purposes. So the data set is available here in the UCI Machine Learning Repository. There's another website that has more supporting information. That's this website. This is the competition website from years ago, and it's still up. And if you scroll down, importantly, you can see all kinds of supporting files, including the data dictionary, which you're going to find very valuable. Here it is. You can see this is from years ago; it looks like a coding font. And…
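To make the "lots of blanks" point concrete, here is a minimal sketch of counting blank fields per column in a CSV file, using only the Python standard library. The sample rows and column names (`id`, `age`, `income`, `gender`) are invented for illustration and are not taken from the KDD Cup 1998 files; treating empty and whitespace-only fields as missing is also an assumption, since the real data set's coding scheme is described in its data dictionary.

```python
import csv
import io

# Hypothetical sample: column names and values are invented for illustration,
# not taken from the KDD Cup 1998 files.
SAMPLE = """id,age,income,gender
1,62,,F
2, ,3,M
3,,,
4,45,5,F
"""


def missing_counts(text):
    """Count blank (empty or whitespace-only) fields per column."""
    reader = csv.DictReader(io.StringIO(text))
    counts = {name: 0 for name in reader.fieldnames}
    rows = 0
    for row in reader:
        rows += 1
        for name in reader.fieldnames:
            value = row[name]
            # Assumption: a blank or whitespace-only field means "missing".
            if value is None or value.strip() == "":
                counts[name] += 1
    return rows, counts


rows, counts = missing_counts(SAMPLE)
for name, n in counts.items():
    print(f"{name}: {n}/{rows} missing ({n / rows:.0%})")
```

For a real file, the same function could be fed the file's text, but the definition of "missing" would likely need extending once the data dictionary shows which codes stand in for absent values.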
