From the course: Advanced SQL – Window Functions

ROW_NUMBER and NTILE - SQL Tutorial

From the course: Advanced SQL – Window Functions

Start my 1-month free trial

ROW_NUMBER and NTILE

- [Instructor] Row number assigns each row in a partition and number out of a sequence of monotonically increasing integers, beginning with one and up to the number of partition rows. Consider each row in this stadium as a partition and each seat as an element or a row ordered from left to right in ascending order. One of the most common use cases for row numbers is the classic top N per group challenge. Let's see how to solve it. We need to write a query to show the top three and only three animals of each species who had the most checkups, including species for which we have less than three animals. Let's start with a grouped query by species and name and a count star group aggregate function to get the number of checkups per animal. Now I hope you didn't forget that we should include species that have less than three animals, and less than three also includes zero. All species are in the reference species table, so let's incorporate that into our grouped query with the left outer join. Count star no longer makes sense as for species with no checkups, it returns a one. Either a name or checkup time will do. Now we see that our shelter accepts both ferrets and raccoons, although currently we have none. Next, we need to find a way to get the top three of each species. First, let's try a sub query approach and see how that works. Let's wrap the previous query in a CTE, calling it animal checkups. Now let's add a correlated sub query from a second instance of the CTE that counts how many animals of the same species had more checkups than itself. Those who had the fewest are the winners. Let's execute it and see what we get. The top three most checkup cats are Tigger in first place as there are no cats with more checkups than itself, and Ivy and Sadie are tied for second place. The sharp-eyed among you may notice that there's another potential issue here, but let's ignore it for now and continue as if all is well. Selecting from the CTE and filtering for animals, which there are fewer than three others with more checkups than themselves should give us the top three, right? Let's execute, and wrong. There are three dogs that are tied for third place with 11 checkups each and three rabbits tied for second place with five checkups each and all are showing up. The requirement was to return no more than three, so we're not done yet. However, the requirement didn't mention which of the animals to return in case that there are ties, so we are free to pick any three. The unique attributes per species is name, so let's use that as a tie breaker and count only animals whose name is smaller than the one from the outer query. And to do that, we must write this long logical predicate with two evaluation paths. If the number of checkups for the animal is larger than that of the one being evaluated, no problem. If it's equal, only then compare their names and count only those with a lower value name. Now we have an ugly, long, inefficient, unreadable but working solution onto the Window function. Let's copy the group query with it's CTE, add a column with a row number, Window function, partitioned by species ordered by the number of checkups in descending order and name as a tiebreaker for consistency with a sub query. By now, you know well that due to processing order, we can't use the row number result in the where clause so we must use another CTE. Wrap the row number query in a CTE, select from it with a filter for rows where the row number is three or less. The row number is one based, while the sub query count was zero for the first animal, so we need to use a smaller than or equal to operator. And with that, we're done. Not bad, right? Performance-wise, (upbeat music) I think you can do the math yourself. For the curious among you, at the bottom of the file, you'll find a third solution, which I think is even more elegant than both ones we just saw. Check it out and let me know what you think in the Q and A section. Ntile segments a partition into as equal as possible n or less segments. Each segment is called a tile. Ntile assigns each row an integer number out of a monotonically increasing sequence starting with one and ending either at N or the number of rows within the partition in case there are fewer of those. Think section numbers in a stadium, where all seats within a section are assigned the same section number. The stadium is the partition, the sections are the tiles, but unlike section numbers, ntile sections are as equal as possible in size and start with a one. I can only recall a handful of cases where I actually used ntile for real world challenges. But I will try to find an interesting use case for this chapter's challenge. Back to the animals table, select species name and admission date for all animals. Unpartitioned ntile 10 by admission date segments a 100 animals into 10 segments with exactly 10 rows each. Ntile 30 results in uneven segments and the remainder is distributed among the first partitions. That's why we see four rows in segment one, four rows in segment two, but all the following segments have three rows each. Ntile 30 partition by species will result in one tile per cat as we happen to have exactly 30 cats. Scrolling down to rabbits of which we have only 10, we see that there are only 10 segments with one row each.

Contents