From the course: Advanced SQL for Data Scientists
Bloom filters
- [Instructor] Now, when we work with very large data sets, we will often trade accuracy for speed. So for example, we might sample a data set, do calculations over that sample, and assume that the overall population has roughly the same measures. Well, that's a probabilistic method, and such methods are approximate. Another probabilistic method is known as a Bloom filter, and we can use those to create indexes which can be highly efficient in some cases. So a Bloom filter is a space-efficient method for determining set membership. So for example, it can help if we need to find out which data block contains a particular piece of data, or if we have a complex query with multiple conditions in a WHERE clause, such as finding customers who live in a particular state or city and have been customers for less than a certain period of time and are delinquent on payments. Now, a Bloom filter is a lossy representation of data. So some compression…
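To make the set-membership idea concrete, here is a minimal sketch in Python. It is not taken from the course: the class name BloomFilter, the method might_contain, the sizes (1024 bits, 3 hashes), and the use of salted SHA-256 to derive bit positions are all assumptions chosen for illustration. The point it shows is the lossy trade-off: the filter stores only bits, so a "no" answer is certain while a "yes" answer means "possibly".

```python
import hashlib


class BloomFilter:
    """Illustrative Bloom filter: a fixed-size bit array plus k hash functions."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3):
        # Sizes are arbitrary assumptions; real systems tune them to the data.
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: str):
        # Derive k bit positions by salting SHA-256 with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item: str) -> None:
        # Set every bit the item hashes to; the item itself is not stored.
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: str) -> bool:
        # False means "definitely absent"; True means "possibly present".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


bf = BloomFilter()
for city in ["Portland", "Seattle", "Boise"]:
    bf.add(city)

print(bf.might_contain("Seattle"))  # True: added items are always reported
print(bf.might_contain("Denver"))   # usually False; a false positive is possible
```

Because only 128 bytes of bits are kept regardless of how long the values are, the structure is compact, and the cost is that unrelated values can occasionally collide on the same bits and be reported as possibly present.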