From the course: Advanced Pandas

Beyond pandas with Dask and Koalas (Spark)

- [Instructor] There may come a time when the volume of your data has become so large that you find using Pandas to be constraining. Thankfully, there have been rapid advancements in big data processing, and in this lesson we'll discuss two frameworks, Dask and Spark, that have proven to be especially useful. We'll also see how Pandas code can be easily translated into these cutting-edge approaches.

So what is Dask? At a high level, Dask is a framework to speed up your Python workload using parallel computing. Instead of running all tasks sequentially, the Dask scheduler allows tasks to run simultaneously, which takes full advantage of all the compute you have available. One of the key benefits of Dask is that it is very compatible with the existing work you do with Python, and with Pandas in particular. Dask has a dataframe concept, just as Pandas does. In fact, you can think of a Dask dataframe as consisting of a series…
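To make the idea of running tasks simultaneously rather than sequentially a little more concrete, here is a minimal sketch that uses only the Python standard library. It is not Dask and does not reflect how the Dask scheduler is implemented; it only illustrates the general pattern of splitting data into pieces, working on the pieces independently, and combining the partial results. The helper names (`split_into_chunks`, `partial_sum_and_count`, `parallel_mean`) are hypothetical and exist only for this illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(values, n_chunks):
    """Split a list into n_chunks contiguous pieces of near-equal size."""
    size, extra = divmod(len(values), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        # The first `extra` chunks each take one additional element.
        end = start + size + (1 if i < extra else 0)
        chunks.append(values[start:end])
        start = end
    return chunks


def partial_sum_and_count(chunk):
    """Work that can be done on one chunk without seeing the others."""
    return sum(chunk), len(chunk)


def sequential_mean(values):
    """Baseline: process the whole dataset in one pass."""
    return sum(values) / len(values)


def parallel_mean(values, n_chunks=4):
    """Compute the mean by handing each chunk to a separate worker.

    Assumption for illustration: threads stand in for a real parallel
    scheduler. For CPU-bound work in CPython, threads alone do not give
    a speedup; the point here is the split / apply / combine pattern.
    """
    chunks = split_into_chunks(values, n_chunks)
    with ThreadPoolExecutor(max_workers=n_chunks) as pool:
        partials = list(pool.map(partial_sum_and_count, chunks))
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count


data = list(range(1, 1001))
print(sequential_mean(data), parallel_mean(data))
```

Both functions return the same mean; the second one arrives at it by combining results from independently processed chunks, which is the kind of work a parallel framework can spread across the available compute.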
