From the course: Advanced SQL for Data Scientists


Bloom filters


- [Instructor] Now, when we work with very large data sets, we will often trade accuracy for speed. So for example, we might sample a data set and then do calculations over that sample and assume that the overall population has roughly the same measures. Well, that's a probabilistic method, and its results are approximate. Another probabilistic method is known as a Bloom filter, and we can use those to create indexes which can be highly efficient in some cases. So a Bloom filter is a space-efficient method for determining set membership. That is useful, for example, if we need to find out which data block contains a particular piece of data, or if we have a complex query with multiple conditions in a WHERE clause, such as finding customers who live in a particular state or city and have been customers for less than a certain period of time and are delinquent on payments. Now, a Bloom filter is a lossy representation of data. So some compression…
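To make the set-membership idea concrete, here is a minimal Python sketch of a Bloom filter. It is an illustration only, not the implementation used in the course or by any database: the class name `BloomFilter`, the parameters `num_bits` and `num_hashes`, the method `might_contain`, and the sample customer IDs are all hypothetical, and deriving several hash positions by salting SHA-256 is just one simple assumption among many possible hashing schemes.

```python
import hashlib


class BloomFilter:
    """Illustrative Bloom filter: a fixed-size bit array plus k hash positions."""

    def __init__(self, num_bits=1024, num_hashes=3):
        # Hypothetical defaults; real systems size these from the expected
        # number of items and the false-positive rate they can tolerate.
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for readability

    def _positions(self, item):
        # Derive num_hashes positions by salting SHA-256 with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        # Set every position the item hashes to; the item itself is not stored.
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item):
        # False means "definitely not in the set".
        # True means "possibly in the set" (false positives can occur).
        return all(self.bits[pos] for pos in self._positions(item))


bf = BloomFilter()
for customer_id in ["c-1001", "c-1002", "c-1003"]:
    bf.add(customer_id)

print(bf.might_contain("c-1002"))  # an item that was added is always reported
print(bf.might_contain("c-9999"))  # usually False, but may be a false positive
```

The filter keeps only bits, never the original values, which is why it is both compact and lossy: an item that was added is always reported as possibly present, while an item that was never added is usually, but not always, reported as absent.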
