Relational databases are the basis for understanding data storage choices and are critical to understanding multiple table joins later in the course. In this video, gain an understanding of the relational database model, looking at concepts of normalization and denormalization and its pros and cons.
- [Instructor] At first glance, it might look like the tables in a relational database would be structured in pretty much the same way as tables that you might be used to in other contexts, from Excel files or a table in a textbook. There are some key differences however, which we'll cover in this video. We're going to start With a simple sample dataset to demonstrate how relational databases organize information. This order's table looks like something you might see in an Excel file with information about customers, where they live, what they purchased, how much each item costs and the order total. Looking at this dataset, you might notice some inefficiencies in storing the data this way, because we are recording some of the same data points over and over, such as the customer's location or the price of a bottle of oil versus a case of oil. We also have multiple rows for the same order when more than one type of item is included in the order. To avoid this repetition, I'm going to transform this dataset as you would to store it in a relational database, where data points are stored as few times as possible and tables are linked to one another using pointers called keys. First, I'm going to split out the customer information into its own table, the customer table. Each customer's information is stored only once. And I've assigned each of them a customer ID that I will use in my original table as a stand-in for the entire data row. So our new data table would look like this, where we've substituted the customer data that was there before with just the IDs that represent that data from the customer table. This is referred to as a one to many relationship. Each data point is stored once in the customer table but it can be referenced many times in the order table and in any other tables that store the customer ID in this database. To deal with the rest of the duplicate data, let's break out the product information into a new table as well. We'll follow the same pattern as with the customer ID by assigning an ID to each product. You may notice that we could still remove duplicates further in this dataset, but for the sake of this example, we're not going to do that step right now. We could do the same thing we did to add the customer ID to the order table with this new product ID as shown here, but that would still leave us with multiple rows for the same order, which isn't deal. Instead, we will remove the product information from the order table entirely, eliminate multiple rows per order in the main table and add an association table also called a join table or a junction table to connect the order information with the product information in a many to many relationship. A visual representation of the way the data is organized in a database is called an entity relationship diagram or ERD which you'll see here for our new data design. The process I just went through to remove duplicate data and move into a relational database model of storage where tables are linked using key relationships is called normalization. In a perfect relational database, your data would be 100% normalized, which means that you would avoid the costs associated with storing any piece of information more than once. In reality though, you might see some intentional denormalization or storing the same piece of information more than once. Sometimes you want to allow some denormalization if it just makes your data easier to work with, or if your primary concern is speed because it's easier for the server to get everything it needs from one table, rather than jumping across multiple tables. As you can see, our dataset has transformed a lot going from this to this. And while it's not important for you to be able to draw your own ERD at this point, it is critical that you understand the conventions that it represents as you start to navigate a normalized SQL database.