From the course: Learning Data Analytics: 1 Foundations

Modeling data with queries

From the course: Learning Data Analytics: 1 Foundations

Modeling data with queries

- [Instructor] There are lots of different ways to model data. If you have backend access to a database, you're likely to use queries to model your data. You'll leverage joins and join types to create the datasets you need. Let's take a look at the model data with queries database. Our survey data is captured in this database and people have responded on different dates. We can see that they have a start date of different days. So I have on 7/24, 7/25. I can scroll through and look through the 262 responses. We want to analyze their responses by date. We also want to look at what day of the week was most popular or even which one of the months was more popular. For example, is Monday the most popular day or was it Sunday? Let's bring in our date data. I'll go to external, new data source, from file and Excel. I'm going to use my reusable dates dataset. I'll browse and choose Dates. I don't need to link or append a copy. I'll just import the source data. I'll click okay. I see my date information. I'll go ahead and choose next. My first row does contain column headings. I'll choose next. I'll go ahead and choose next. I don't need a primary key. I'll choose next and I'll choose finish. I don't need to save my import steps. I'll choose close. I have my survey spending data and I have my date codes table. This will allow me to bring in those data points without having to constantly create those calculations. Okay, let's create a query. I'll choose create. I'll choose query design. Okay, let's get started by looking at the most popular day. All right, I'll choose create. I'll go to query design. I'll bring in my date codes and my spending data. I'll choose the official date and join it to the start date. An inner join is fine here because I just want to see the information on the respondent's actual date. Okay, I'll close this add tables. All right, so I want to bring in the weekday name and then I want to bring in all of my survey data. I'll go check those record counts. I have 262, so I should say 262 results. I'll go ahead and run that. Great, now I can see all of the weekday names and don't have to do any other additional calculations to get that information. I'm working with the pure data and not a formatted date, which is okay depending on what I'm trying to do. I like to have that data. All right, let's go ahead and name this one as most popular day for survey. All right, fantastic. Our next objective is to determine if people fill the survey out every day of the survey period. Again, that survey period ran from 7/24 to 8/30, 2020. Because my dates table contains five years of dates, the very first thing I want to do is filter that data down with a query. I'll go to create, I'll choose query design. I'll go ahead and add date codes, close that. Okay, I'll bring in all of my information. I'm going to hold my Shift key, click the last field, drag it down. Okay, in the criteria field for the query, drag that up a little bit. I'm going to say between 7/24, 2020 and 8/30, 2020 because that's the survey period. Because I'm using Access, it went ahead and wrapped my dates with the necessary pound symbols. All right, I'll go ahead and choose run. And if I take a look I can see the 38 days that the survey was active. All right, let's go ahead and save this query as survey date period, and we can click okay. All right, so I'll go ahead and close that. I'll close my most popular day. I'll close my 262 results. Now again, my objective is to determine which days did not have a survey response and which days do have a survey response. So in order to do this I have to work with my query and my join types. I'll choose create, query design. I'll bring in my spending data and I'll also bring in my survey date period. Okay, so I'll take my official date and I'll join it to my start date. I'll bring in the official date, my weekday name, my month name, my year, and even my day of week. And then I'll bring in all of my survey spending data. I'll double click that asterisk. Okay, so now what I want to do is adjust my join type because I want to see every record for survey date period. Meaning every date that's between that 7/24 and the 8/30 will show up in my results regardless of it having a survey spending record. All right, I'll go ahead and right click my join properties there. I'll tell it to all of my records from my survey date period. I'll click okay. And then I'll run that query. Let's see if we can spot a day where we did not have a survey response. Oh, here we go. So close to the end of the survey period. I see on 8/8 and 8/9 I don't have responses on that day. I also noticed that on the 11th and 12th. Okay, but there's also one thing that's a little bit perplexing to me, which is 281 results. I know that I had 282, so I need to go investigate that. I'll go open up my survey spending and I'll sort my dates. Let's go from oldest to newest and then newest to oldest. Ah, I have a date that's outside of my survey period. This was automatically excluded based on my query. Investigation over, everything is okay. All right, let's go ahead and save this as every day of survey and click okay. This is just one easy way to accomplish this task. This is a valuable technique when you need to represent data that isn't captured in the process. You're leveraging data, queries, and joins for the benefit of your requirements.

Contents