From the course: Learning Data Analytics: 1 Foundations

Learning about data fields and types

From the course: Learning Data Analytics: 1 Foundations

Learning about data fields and types

- [Instructor] There are three key things to know about a data point. It has a field name, a datatype and a value. What most people see as data are actually values. Values are what we're entering, exporting and reporting on. If you've ever entered data on a form and it asks for first name, then you enter the value of your first name. For me, that would be Robin and that's text. The values for every field are a key consideration on how we define the data type. In a perfect world, field names would be designed to be meaningful, data types would be defined based on what our brain interprets and the values would always be perfectly entered. As you venture into the everyday life of a data analyst, you will discover the world is not perfect and data is not always, as it seems. Field names can be a broad label of information like, First Name, Last Name, or Job Title. We again hope that the field names are named meaningfully. Just be aware of, that's not always the case. People call field names, column headers and they're usually represented by the top row and typically in Excel at a minimum they're bold. Data types drive what we can and should do with the data when we work with it. All data has a type. To simplify this breakdown, I want you to understand at a high level data only has a few types, it's text, numbers, or dates. Over time, you will learn that it does go deeper than that, but we're just getting started. When you export data out of a system, it's likely stored in a database. Database tables are defined for us by the person who developed the database. You'll see the field names are created, you'll say there are data types associated to them. Let's take a look at the employee national ID alternate key. You can see the name of the field, you can also see the datatype is Short Text but when we look at the data itself, it clearly looks like a number. Also a little trick of the trade. Numbers by default are always aligned right and texts is always aligned left. So when I see a group of numbers that are aligned left, I automatically know that it believes that it's text. When this data is exported, it will follow the data type of how it's stored. Let's take a look. So here's the data from that table. And if you take a look at the employee national ID alternate key, you see it's aligned lift, and there are also green flags. If you click one of those sales and you click the little information, you will see that it is a number stored as text and you also have the ability to convert it to a number. Again, you notice this is left aligned and all the other numbers are right aligned. You don't always have access to the backend design but you can look at information and make a determination of what you believe the data type would be. For example, if you saw color as the field name and the value is the word red, and I ask you which one of the high level data types it is, you would probably say words or text. You may even hear people refer to it as strings. If I show you a one, five or 45,981, you would say that's a number. If you see data like 11/20/2025, as a human you might immediately interpret that as a date because you think about calendars, appointments, someone's birthday. When you look at data and you think about what it actually is, you're actually thinking about the data type. Data analysts have grown to understand that this is where it starts to get tricky what our eyes see and what the program says is not always the same thing. When we work with data in Excel or in databases, and that data is defined by the systems, what you see isn't exactly how it is defined or stored. What if I told you that 11/20/2025 and 45,981 to Excel are exactly the same thing. It's the format that makes a difference. Let me show you. I'll click on B2 and if I go up top, notice it shows me the number, the number, the number and then when I get to the short date, notice that it says 11/20/2025. You've probably seen this before. Sometimes when you type something in Excel and you mean one thing and it interprets it as a date, for example, you mean one fourth as in one quarter. You're typing a fraction, watch what happens in Excel. So I'll type my 1/4. Again, I mean one quarter. Excel immediately interprets this as January 4th. The minute Excel believes that you mean a date, it formats that information as a date. So you actually meant a fraction and Excel returns a date. Don't worry, it's not just happening to you, the scientists that study human genes have renamed approximately 27 human genes due to the fact that Excel misread genes like March 1 and Sep one as March 1st and September 1st. What's important to understand is that, what Excel is actually doing is it's formatting what it believes is a serial number to display a date. Because of the format, we see the human version of the date. Why? Again, it's because Excel converts dates to serial numbers so it can use them mathematically. To correct this formatting, you have to go and change the format back to a number or general. And for some people this is the maximum exposure to data type. Let me show you how to do that. So I'll click this sale, I'll go to the number and I'll go ahead and convert it to either general or number. And I'll go ahead and decrease those decimals. Common data types that data analysts should know or texts or string, date, time, numbers, and Boolean which might look like a true or false, yes or no. If you're working in programs like Microsoft Access, it's actually a negative one or zero but then tools like SQL it's one or zero. Different technologies have different types of data types. For example, access databases have a very short list of data types or a SQL database will have a multitude of different types to choose from. But at the end of the day, they still roll up to those high level dates, texts, numbers or Boolean. The reality is that all too often the field names are not meaningful, the data types might not be what you expect and the values when keyed by humans, are not always correctly entered. However, for the up-and-coming analyst or just the existing data worker, this is a fact of life and in some cases, job security.

Contents