From the course: Learning Data Analytics: 1 Foundations

Understanding joins

- [Instructor] One of the hardest concepts for new analysts is how to join data in multiple tables. There is no single best answer that is always applicable to every situation. If you're working with data sets that have already been joined for you, that is amazing. But when you're given the back-end access to multiple tables, you will likely need to join them, and you'll usually do this through queries. Another thing to keep in mind is that programs that work with multiple data sets typically provide you a way to join them. In Excel, it's VLOOKUPs. And in databases, it's queries. Each data set needs a way to relate the data, something shared on both sets of the data. In the best-case scenario, that is a key field that uniquely identifies the information, so you can use it to join. It's worth noting that in queries, there are different types of joins, like inner, outer, and cross joins. The type of join will determine the end result, so it's important to understand them. The more practice you have or real-life experience you gain, the more natural it will become for you to assign joins based on the results you want to achieve. There are a lot of people who are joining data in Excel. They don't realize they're creating joins. They're using VLOOKUPS to establish a join based on a key field. They're pulling in one field at a time. Let's talk about the different types of joins. A cross join occurs when you include tables in a query window and do not join them with a key field. This ties every record in the table to every record in the other table. This could be valuable if you meant to do it, but for most people, it's just simply because they forgot to include the join. When you have two tables that do have a join on a field, then the default join is an inner join. That means that if a record in each table matches, it will show in your results. An example would be if you want to see a list of sales and the products that were ordered, then an inner join would be appropriate. Then you have left and right outer. Left or right is determined by systems differently. It's more important to understand about what a left or right will do in the beginning, not so much which one is left and which one is right. Left or right outers will return every record from one table and the matching records from the other table, meaning you can see every record from one side, regardless of its matches on the other side. So if your goal was to show a list of products and if it was ordered, that would be the appropriate join. When you've been provided data to work with and if you've been given access to the back end, then you'll get lots of real-world experience working with joins. Every data program will provide you a way to join that data and will often guide you with information on the screen that will help you understand what your data set result will be.

Contents