From the course: Advanced SQL for Data Scientists
Partitioning data
- [Instructor] One of the most effective ways to deal with large data sets is to use partitioning. Now, what can happen when we're dealing with really large data sets is that large tables become difficult to query efficiently because they hold so much data, especially if you're scanning them or you have to maintain very large indexes. So what partitioning does is split a table, by either rows or columns, into subsections, which we call partitions. Now, horizontal partitioning, which splits a table by rows, is a way of limiting the amount of data we have to scan to a subset of the rows. We can have local indexes for each partition. So in this example, where we have a large table broken down into three partitions, we could have three distinct sets of local indexes. Our indexes would be smaller, index scans would be smaller, or if we needed to scan the entire partition, the amount of data would be…
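To make the idea of partitions with local indexes concrete, here is a minimal sketch in Python using the standard-library `sqlite3` module. SQLite has no native partitioning, so the sketch emulates horizontal partitioning by hand: one table per month, each with its own index, and a small routing function that picks the partition. The table names, the `sale_date` and `amount` columns, and the monthly ranges are all assumptions made for illustration and are not taken from the course. Databases such as PostgreSQL offer declarative partitioning that handles this routing for you.

```python
import sqlite3

# Hypothetical partition bounds: one partition per month, as [low, high) ranges.
# ISO-8601 date strings compare correctly as plain text.
PARTITIONS = {
    "sales_2024_01": ("2024-01-01", "2024-02-01"),
    "sales_2024_02": ("2024-02-01", "2024-03-01"),
    "sales_2024_03": ("2024-03-01", "2024-04-01"),
}


def create_partitions(conn):
    """Create one table per partition, each with its own (local) index."""
    for name in PARTITIONS:
        conn.execute(f"CREATE TABLE {name} (sale_date TEXT, amount REAL)")
        # Local index: it covers only the rows stored in this partition.
        conn.execute(f"CREATE INDEX idx_{name}_date ON {name} (sale_date)")


def partition_for(sale_date):
    """Return the name of the partition whose range contains sale_date."""
    for name, (low, high) in PARTITIONS.items():
        if low <= sale_date < high:
            return name
    raise ValueError(f"no partition covers {sale_date}")


def insert_sale(conn, sale_date, amount):
    """Route a row to the partition that owns its date."""
    table = partition_for(sale_date)
    conn.execute(f"INSERT INTO {table} VALUES (?, ?)", (sale_date, amount))


def total_for_day(conn, sale_date):
    """Sum amounts for one day, touching only the partition that can hold it."""
    table = partition_for(sale_date)
    row = conn.execute(
        f"SELECT COALESCE(SUM(amount), 0) FROM {table} WHERE sale_date = ?",
        (sale_date,),
    ).fetchone()
    return row[0]


conn = sqlite3.connect(":memory:")
create_partitions(conn)
for day, amount in [
    ("2024-01-15", 10.0),
    ("2024-02-03", 20.0),
    ("2024-02-03", 5.0),
    ("2024-03-30", 7.5),
]:
    insert_sale(conn, day, amount)

print(total_for_day(conn, "2024-02-03"))
```

The query for a single day reads one of the three tables and uses that table's smaller index; the other two partitions are never touched, which is the effect the transcript describes.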