From the course: Data Science Foundations: Data Assessment for Predictive Modeling

Solution: What should the row be?

(upbeat music) - [Tutor] Okay, here we go. This is going to be more thought process than definitive solution, because to make a definitive decision about this, you'd have to look outside the data set and consult with a subject matter expert. Nonetheless, we have a lot to consider already.

So this is what I've done. I've got the Titanic data set in its original form, which you can find in the originals folder, and I've added a new tab with a pivot table. I've also turned on the filters so we can take a look.

So if we click down and filter here, it looks like we have as many passenger IDs as we have passengers. It looks like almost all the numbers are used, so that's good. That means that if we wanted to do a risk score at the passenger level, it looks like we don't have duplicates and so on. That's certainly a plausible way to go.

What about the other options? Let's take a look at Ticket, for instance, as a possible…
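The same duplicate check that the filter and pivot table give you in the spreadsheet can be sketched in a few lines of Python. This is only an illustration, not the method used in the video: the rows below are made-up toy values rather than records from the Titanic file, the column names `PassengerId` and `Ticket` are assumed, and `duplicate_report` is a hypothetical helper written for this example.

```python
from collections import Counter

# Toy rows invented for illustration; these are NOT records from the Titanic file.
# The column names "PassengerId" and "Ticket" are assumptions.
rows = [
    {"PassengerId": 1, "Ticket": "A-100"},
    {"PassengerId": 2, "Ticket": "B-200"},
    {"PassengerId": 3, "Ticket": "B-200"},
    {"PassengerId": 4, "Ticket": "C-300"},
]


def duplicate_report(rows, column):
    """Summarize how well `column` would work as a one-row-per-value key."""
    # Count how many rows carry each value of the candidate column.
    counts = Counter(row[column] for row in rows)
    # Keep only the values that appear on more than one row.
    repeated = {value: n for value, n in counts.items() if n > 1}
    return {
        "rows": len(rows),
        "distinct": len(counts),
        "repeated": repeated,
    }


passenger_report = duplicate_report(rows, "PassengerId")
ticket_report = duplicate_report(rows, "Ticket")

print(passenger_report)
print(ticket_report)
```

If the number of distinct values equals the number of rows and nothing is repeated, that column is a plausible row identifier. Repeated values mean several rows share it, so using it as the row would mean rolling those rows up first.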