From the course: Advanced SQL: High Performance Relational Divisions

Relational division using aggregations - SQL Tutorial

From the course: Advanced SQL: High Performance Relational Divisions

Start my 1-month free trial

Relational division using aggregations

- [Instructor] Our first task is to find candidates who have all skills. Not all skills for a particular role, just all skills in general. Let's see what data we need first. We need each candidate skills on one hand. So, select candidate and skill from candidate skills and all skills on the other. Select skill from skills. Let's execute to see the data. The essence of this technique is numeric comparison between the number of divisor elements, all skills, and the number of dividend elements, the number of skills each candidate possesses. If these two counts are equal, it means the candidate has all skills. Select count star from skills group by empty set counts the number of rows in the skills table. And we have nine in total. Does this syntax look strange? Watch logical query processing part two to learn more. Before evaluating all candidates, let's evaluate if a specific candidate has all skills. And we can generalize from there. Take Praveena for example. Select count star from candidate skills, where candidate equals Praveena grouped by candidate. Let's execute. And it seems like Praveena has five skills. A candidate has all skills only when the two counts are equal. So we know Praveena doesn't have all skills. Now we can evaluate all candidates. Obviously, the literal filter must go. And we want to find the candidate's name, not his count. So, let's replace the count with candidate. Evaluation of each candidate must be done in the having clause after the group by takes place. So we can place the query that counts the candidates skills in parentheses, and use that as a filter. Having count star equal the total number of the skills. Execute, and Darren raises its head. So far, so good. Our second task is to find all candidates who are a good fit for the DB architect role. And I remind you that Chen and Darren are the only candidates who possess all required skills. Let's use the query we already have as our starting point. First, we no longer need all skills. This time, we need to count how many skills are required for the DB architect role. And this data comes from the role skills table. So let's change from role skills, where role equals DB architect. This query is not correlated so we can execute it independently. And we can see that the DB architect role requires five skills. If we execute the full query as is, we get a wrong result. Praveena and Chen show up. And we know that Praveena is not a match for this role so how did she sneak in? And what about Darren, why doesn't he show up? Pause the video and find the bug. This query counts the number of skills each candidate has. Just not the ones that are required for the role. Chen, we know to be a perfect match. His five skills are the ones that are required. But Praveena happens to have five skills. Unfortunately, not all of them are the ones that are required for this role. And Darren has more than five skills. So the equality predicate eliminates him. To fix it, we need to count only the skills that are required for this role. And the most natural way is aware filter, for example, skill and select skill from role skills where a role equals DB architect. Let's execute now much better. Darren and Chen are the ones we wanted to see. This is a valid solution, but nonetheless, I want to make two changes to it. Remember it's my course, I can do whatever I want. First, I want to shortcut. The query that retrieves the DB architect skills appears twice, once grouped and once not. And we can encapsulate it and a with clause, name it DB architect skills and replace both references. This will also make it easier to find candidates for other roles by changing the string in only one place. Next I want to rewrite the inn predicate as an inner join. Candidate skills, inner join, DB architect skills on skill equal skill. The reason for this rewrite will become clear soon enough. Both the inn and the join syntax represent the same relational operator, it's called a semi-join. Semi-joins are binary relational operators where one set is used just as a filter meaning nothing is returned from it. Semi-joins are represented with these symbols for left and right respectively. Here is our final query. Let's execute it. And we get the correct result. Chen and Darren are our best fits for the DB architect role. Or are they?

Contents