From the course: Introduction to Spark SQL and DataFrames

Querying DataFrames with SQL

- [Instructor] Up to now, we've been using the Spark DataFrame API to work with DataFrames. Now we're going to switch gears and work with SQL. In particular, we're going to use Spark SQL to work with DataFrames. As in previous videos, I've started with the data already loaded, so let's quickly go through the steps involved. First, I import a PySpark library that allows us to work with SQL. I create a Spark session global variable, which allows us to work with a distributed Spark session. Then I define a string that points to the directory that holds my data, and another string that points to the file I want to load. Next, I execute a Spark read command specifying the JSON format. Finally, I list the first 10 rows of this DataFrame, which I called df. Let me briefly explain some of the columns. In this DataFrame, we have utilization data about…
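
The loading steps described above can be sketched roughly as follows. This is a minimal illustration, not the instructor's actual notebook: the transcript does not give the directory, file name, or application name, so whatever is passed as `data_dir` and `file_name`, along with the `sql_demo` app name, the `utilization` view name, and the three function names, are hypothetical placeholders. The final SQL step is an assumption about where the lesson is heading, based on its title.

```python
import os


def build_file_path(data_dir, file_name):
    """Join the data directory and the file name into one path string."""
    return os.path.join(data_dir, file_name)


def preview_query(view_name, limit=10):
    """Return a SQL statement that selects the first `limit` rows of a view."""
    return f"SELECT * FROM {view_name} LIMIT {int(limit)}"


def load_and_preview(data_dir, file_name, view_name="utilization"):
    """Load a JSON file into a DataFrame and preview it two ways.

    The view name is a hypothetical placeholder; the transcript only says
    the DataFrame is called df and holds utilization data.
    """
    # PySpark is imported here so the two helpers above can be used
    # without a Spark installation.
    from pyspark.sql import SparkSession

    # Reuse the active Spark session if one exists, otherwise create one.
    spark = SparkSession.builder.appName("sql_demo").getOrCreate()

    # Execute a read command specifying the JSON format.
    df = spark.read.format("json").load(build_file_path(data_dir, file_name))

    # List the first 10 rows with the DataFrame API.
    df.show(10)

    # Assumed next step: register df under a name that SQL can reference,
    # then run the equivalent query through Spark SQL.
    df.createOrReplaceTempView(view_name)
    spark.sql(preview_query(view_name)).show()

    return spark, df
```

Calling `load_and_preview` with a real directory and JSON file should print the same 10 rows twice, once through the DataFrame API and once through Spark SQL, which makes it easy to see that both interfaces operate on the same underlying data.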
