From the course: Learning Data Analytics: 1 Foundations

Dealing with the data you don't have

From the course: Learning Data Analytics: 1 Foundations

Dealing with the data you don't have

- [Instructor] I discovered early in my career with data that there was a fair amount of data that was collected in what we call source systems. These systems we use at organizations were built to capture data that was important to them anywhere from HR information on employees, accounting information or even sales data. I started to see a trend in most organizations that I supported that all the data we needed to achieve a certain report or monitor a process did not always exist naturally in their systems. We may see a calculation on a screen but it wasn't stored in the data. Also businesses grow, pivot and change and the information they need today, may not have been needed when the systems were created. Or sometimes that information is only valuable for decision making and not important for the system itself. The impact to you as an analyst is that not all data we need for measurement or decision-making is available. This is where understanding functions and how to use them for dealing with data we don't have can set you apart from others and help you build more meaningful datasets. Every person who works with data does need to understand basic math functions like add, subtract, multiply, and divide. Let's say, for example, you have the price and the quantity ordered but not the total for that customer. You could then multiply the price by the quantity to gain that information for your report. You'll also need to know aggregate functions like sum, counts and averages. The system will generate records for you and you may need to count what is ordered, provide an average of what was ordered, or even a total of everything ordered. In addition to those basic formulas and functions, also understanding how to tie fields together can make data more meaningful for display. These functions are called CONCAT or concatenation and to me they're fundamental for all analysts. One of the most important data creation functions of all is a logical function called if. The most basic if function is a logical test, what happens if true and what happens if false. An example of this would be a customer discount. The company decides that anyone who orders over $500 will receive a $50 gift card. We would write a logical test that says, is the order amount greater than or equal to 500. And if it is, they'll get $50, otherwise they'll get zero. There is no standard of what or how much of any data you will need to create using formulas or functions. Only your data and project can determine what you need to create to reach the goal. Also the amount and types of transformations you will do will be dependent upon what you're facing in your data. I don't know that there's too many data sets that I'm not working with dates. You can spend enough time learning about data functions and date functions. Also know that data types do impact you on things like dates. So knowing if the value is stored in wrong format for the math you want to do, well then you may need to know how to do some conversion functions to convert it to the proper data type. I want to show you an easy way to start learning about different functions, and I'm going to use Excel for this example. Okay, I'm in Excel. And I say this for every piece of software that I ever opened. And when I'm teaching people a new software, take a moment to click through the different tabs, just read the tabs left to right and look at all the different options. Hover over each option on the ribbon and just look at what the program tells you about it. Now let's navigate to the formulas tab. Look at the different categories of functions. We're looking at the function library. Okay, you see options like insert function, which gets you started. You see your autosome options, autosum, average, count, max and mean. Again, aggregate functions, super important. You also see recently used. Again, if you've used it once, you might use it again. And as you hover over, you can see a little bit of detail about each one. It also has the function library broken into different types of categories like logical, text, and then also date and time. You see things like look up and reference, math and trig, and then more functions. Again, Excel has one of the most comprehensive calculation libraries on earth. Okay, let's take a look at that logical function. I'll go to logical and I'll choose if. Immediately, this brings up the function palette. This will allow me to work through each piece of the information and give me details on each part. So if I click in logical test, it will tell me the value is true or false. It gives me the value if true with a little bit of information and the value if false. I also have the opportunity to click help on this function. I know for me, if I can look at working examples, then it's easier for me to apply it to my own work. Again, take advantage of the function pallet or builder in whatever tool you use, because it can help guide you through and also help you master syntax. The data will dictate the types of formulas and function you will use. We're not born knowing how to build ifs and concatenates but you will learn the syntax and then you'll also learn when to apply them.

Contents