From the course: Faster pandas

Using Dask

- [Instructor] Dask provides easy-to-use parallelism. It has several schedulers it can use to parallelize your work. On a single machine it can run with threads, with processes, or with a synchronous (single-threaded) scheduler, and it can use dask.distributed to spread the work over several machines. You will need to install Dask first. So python -m pip install, and we're going to install Dask with all its dependencies, so we say dask[complete]. Once Dask is installed, we can use it. So we start ipython, and if you look at the directory, you'll see several files with logs in them. One of the nice things about Dask is that it can combine several files into a single dataframe. So first we do import dask.dataframe as dd. Then we say the dataframe is dd.read_csv, and we pass *.csv.xz. We say that the compression is lzma and that the blocksize is None. And you saw that this finished almost instantly. When we look at the dataframe, we see that Dask read only enough information to know how the data…
