From the course: Learning Data Analytics: 1 Foundations

Cleaning data

- [Instructor] I have had the great experience of never receiving data in usable form, or what I would call clean. I've always been expected to clean it up to make it usable and meaningful. I'm grateful, because it helps me appreciate how to deal with any data that I see.

So what is data cleaning? Since we're at the beginning of the journey, I'm going to keep it really simple: it's the process of standardizing the data and making it meaningful. If you use Excel routinely as a part of your reporting process, or if Excel is all of your reporting process, you are likely cleaning a lot. It might include removing columns that are unneeded for the data model and the report. It might mean removing extra spaces or nonprintable characters from a field using the TRIM or CLEAN functions. It might mean converting all states to two-letter abbreviations and even making them uppercase. You might've received data where the address information is all together, but you need to break it out into individual fields.

You might also remove records that are invalid for the purpose of the report. For example, you don't need the sales that haven't completed the full process, so you remove those in-process sales records. They're valid for the sales process, but they're not valid for the type of report you're writing. You might also remove duplicated data from a large set to create a set that's appropriate for your business case.

These are all simple types of data cleaning, or cleansing, and in no way is this a comprehensive list of things you might encounter on the way to a nice clean dataset. We will work with several of these commands a little bit later in the course. It is always important that you work with one key goal in mind: a high-quality dataset that is usable. You need to feel confident about the result, and you'll need to be able to speak about the data cleaning that you applied to the data, so be sure to keep good notes.
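
For readers who work outside Excel, the cleaning steps described above can be sketched in a few lines of Python. This is only an illustration, not the method used in the course: the sample rows, the field names (`customer`, `state`, `address`, `status`, `notes`), and the `clean_rows` helper are all made up for the example, and it assumes every address follows a simple "street, city, state zip" pattern.

```python
# Hypothetical sample data; field names and values are assumptions for illustration.
raw_rows = [
    {"customer": "  Ana  Ruiz ", "state": "tx",
     "address": "12 Oak St, Austin, TX 78701", "status": "Complete", "notes": "call back"},
    {"customer": "Ana Ruiz", "state": "TX ",
     "address": "12 Oak St, Austin, TX 78701", "status": "complete", "notes": ""},
    {"customer": "Ben Cole", "state": "ca",
     "address": "9 Pine Ave, Fresno, CA 93650", "status": "In Process", "notes": ""},
    {"customer": "Cy Dunn", "state": "Ny",
     "address": "4 Elm Rd, Albany, NY 12207", "status": "Complete", "notes": ""},
]


def clean_rows(rows):
    """Return cleaned, de-duplicated rows for a completed-sales report."""
    cleaned = []
    seen = set()
    for row in rows:
        # Remove records that are not valid for this report (in-process sales).
        if row["status"].strip().lower() != "complete":
            continue

        # Remove extra spaces, similar in spirit to Excel's TRIM.
        customer = " ".join(row["customer"].split())

        # Standardize the state to an uppercase abbreviation.
        state = row["state"].strip().upper()

        # Break the combined address out into individual fields.
        # Assumes the pattern "street, city, state zip".
        street, city, state_zip = [part.strip() for part in row["address"].split(",")]
        zip_code = state_zip.split()[-1]

        # Remove duplicates: skip any record already seen after cleaning.
        key = (customer, street, city, state, zip_code)
        if key in seen:
            continue
        seen.add(key)

        # Keep only the columns the report needs (the "notes" column is dropped).
        cleaned.append({
            "customer": customer,
            "street": street,
            "city": city,
            "state": state,
            "zip": zip_code,
        })
    return cleaned


for record in clean_rows(raw_rows):
    print(record)
```

Notice that the duplicate is only detected after trimming and standardizing, which is one reason the order of the cleaning steps is worth recording in your notes.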
