From the course: Learning Data Analytics: 1 Foundations

Cleaning data with Power Query

From the course: Learning Data Analytics: 1 Foundations

Cleaning data with Power Query

- [Instructor] When you start to learn Power Query, it is impossible to not use it for even the basic things you do all the time to clean data. It's so much easier than recording macros, and just as powerful. It connects to data naturally because that's what it's been built to do, and it's much easier to work with once you learn a little bit about it. Let's open up CleanSpending in our chapter folder. We'll connect to the Spending Trends data in the same folder. This way when we have new data, we can either replace that file or change the connection to the new data. Okay. So I'll navigate to Data. I'll choose Get Data From File. And because the source of my file is Excel, I'll choose From Workbook. And I'll choose the Spending Trends Survey. Okay, I'll choose my Original Data. And then I have two options. I can go ahead and load it to the spreadsheet so I can see it, or I can go directly to Transform Data. Since we're going to focus on cleaning, we'll go straight to Transform Data. We can load it later. You'll see your Query Settings on the right-hand side. And what you'll notice is in the Applied Steps, every change you make will be captured. It's done some of them for us already. Okay, so the very first thing I'm going to do is delete the unnecessary columns. There's multiple ways to do this, but right now I'll just right click the Collector ID and choose Remove. Because I'm not doing any start date and end date comparison, I'll go ahead and just remove the End Date. And then I also want to get rid of IP address, email address, and first name. Let me show you another way to do that. I can go to Choose Columns, and then I can just uncheck the options that I do not want. All right, I'll go ahead and click Okay here. And then you see my steps, right? I see Removed Columns, so if I choose that, it goes back to that step, and it lets me see what I actually removed. If I choose Removed Other Columns, it shows me what I removed, but also I can choose the little gear shape here if I need to add one back or remove another one. I'm ready to go ahead and click Okay. Okay, so the Start Date field has the date and the time, and I really just need the date 'cause I'm not doing any time sensitive calculations. So I'll go ahead and hit that little data type, and change it to Date. And then again, it gave me the Changed Type. Because I already have one Changed Type and now I've changed it again, I actually want to rename this step. I'll go ahead and right click and rename this to ChangedDateRemovedTime. I like to use meaningful names because people will be looking at my file after the fact. But again, it's anybody's guess to what you'll see or how you'll name your own files. Okay, so one of the things that I noticed is that the headings are named in question format. So, I'm going to go ahead and double-click, What is your age? And change that to Age. I'm going to go ahead and change, What is your gender? Oh, let me scroll over just a little bit because we have an option for them to SelfDescribe. Okay so, what I'm going to do there is change it to Gender, change the next one to SelfDescribe. I'll go ahead and change, What is the highest level of education, to education level. And again, every time I do this, it's keeping up with the columns that I've renamed. And if I want to take a look at that, I can expand it. What this is called is M Code, and I can adjust this M Code manually if I choose to. One of the most powerful features in our toolbox as data analysts, is the ability to tie data together. In Excel, we do this with concatenate or concat. I want to do this in Power Query, and I'll use Merge. So I'm going to highlight Gender, I'm going to hold my control key and highlight SelfDescribe. I'll right click these fields and choose Merge Columns. This allows me to choose a particular separator, and it allows me to give them a new name. Let me go ahead and do GenderInfo, and click Okay. Now those two columns have been merged together to create one column. And again, I have my gear shape in the merged column that will allow me to change my options if I choose to. I will choose Cancel. And I'm going to go ahead and create the name for this as GenderCombined. Okay, so because this is a survey response, it gave me two levels of headings. The first heading is all of the questions, and then some additional information came over with the file. I need to always filter out row one, so I'll highlight this, and where there is a (null), I'll uncheck that, and that will remove that extra row there. Okay, I'm going to right click and rename this as RemoveSecondHeader. Once I've performed all of the steps that I need for my data, I can then choose Close & Load. This will actually load the data to the spreadsheet. I see my 262 rows are loaded, and now I see my nice and clean information. This is one of the most powerful tools at the fingertips of every person using a modern Excel version. It's been around for several years. It would likely hurt your feelings to know just how long you could've been using it. But there's no time like the present. Also important to know that if you learn Power Query in Excel, then you already know it in Power BI. You can learn more about Power Query, check out Oz du Soleil's course on Get and Transform in the library.

Contents