From the course: Data Science Foundations: Data Assessment for Predictive Modeling

How to envision a proper flat file

- [Instructor] We're going to briefly review something quite basic, but also terribly important. By the time you reach the end of the data preparation phase, you need everything in a flat file. All of your information, and it has to be in this format. During data acquisition and assessment, you needed an ID field to merge your data. You still need it, but at this point its job is to prepare you for eventual deployment. Of course, nothing is more important to supervised learning than your target field, and naturally it has to be 100% complete, with no missing data. Generally, there's no missing data in the inputs either. We diagnose missing data during data understanding and address it during data prep. By the time we get to this point, every row of data has been vetted and belongs there, and every column has been vetted and belongs there. Specifically, each column should be a unique and useful source of information with little or no redundancy. The whole point of data understanding is to explore the data so thoroughly that by the time we have worked through our data prep to-do list, we have the data in exactly the form we need.
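
The requirements described above (a unique ID, a fully populated target, complete inputs, and no redundant columns) can be turned into a quick sanity check. The following is a minimal sketch, not part of the course material: the function name `check_flat_file`, the field names `customer_id` and `churned`, and the sample rows are all hypothetical, and the redundancy test is deliberately crude (it only catches constant columns and exact duplicate columns).

```python
def check_flat_file(rows, id_field, target_field):
    """Return a list of problems found in a candidate flat file.

    rows: list of dicts, one dict per case, all sharing the same keys.
    The field names passed in are whatever your own file uses.
    """
    if not rows:
        return ["no rows"]

    problems = []
    columns = list(rows[0].keys())

    def is_missing(value):
        # Treat None and empty strings as missing; 0 is a valid value.
        return value is None or value == ""

    # One row per case: the ID must be present and unique.
    ids = [row[id_field] for row in rows]
    if any(is_missing(i) for i in ids) or len(set(ids)) != len(ids):
        problems.append(f"{id_field}: missing or duplicate IDs")

    # The target must be 100% complete.
    if any(is_missing(row[target_field]) for row in rows):
        problems.append(f"{target_field}: target has missing values")

    # Inputs should generally be complete as well.
    inputs = [c for c in columns if c not in (id_field, target_field)]
    for col in inputs:
        if any(is_missing(row[col]) for row in rows):
            problems.append(f"{col}: input has missing values")

    # Crude redundancy check: constant columns and exact duplicates.
    seen = {}
    for col in inputs:
        values = tuple(row[col] for row in rows)
        if len(set(values)) == 1:
            problems.append(f"{col}: constant column")
        if values in seen:
            problems.append(f"{col}: duplicates {seen[values]}")
        else:
            seen[values] = col

    return problems


# Hypothetical sample data with three deliberate flaws.
rows = [
    {"customer_id": 1, "age": 34, "age_copy": 34, "region": "N", "churned": 0},
    {"customer_id": 2, "age": 51, "age_copy": 51, "region": "", "churned": 1},
    {"customer_id": 3, "age": 27, "age_copy": 27, "region": "S", "churned": None},
]
problems = check_flat_file(rows, "customer_id", "churned")
for problem in problems:
    print(problem)
```

An empty result does not prove the file is ready for modeling. It only confirms that these basic structural conditions hold, and the judgment about whether each row and column belongs still comes from data understanding.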
