From the course: Python for Data Science Tips, Tricks, & Techniques


Work with Parquet files


- [Instructor] Now, I want to show you how to work with some files that you're probably going to come across if you're working in the big data space, especially with platforms like Hadoop or any of the Cloudera stack. This is the Parquet file format. Now, these are used because you can compress them, and they often work better when you're handling very large volumes of data. And working with them in Python can be a bit of a challenge. So, if I open up my 01_03 exercise file here, you can see that we have quite a bit of stuff to run through, but first and foremost, we need to install something called PyArrow. Now, this is the Python implementation of Apache Arrow. Apache Arrow is a whole separate platform that lets you work with big data files in a columnar, vectorized, table-like container format. So, something you're probably familiar with, like a dataframe, but we're working with Parquet files. In order to do that, I need to run this conda install -c conda-forge pyarrow…
