From the course: Introduction to Spark SQL and DataFrames
Load data into DataFrames: JSON Files
- [Instructor] Now, I'm back in my Jupyter Notebook homepage, and I saved out that last workbook that we were working with. I called it simply 03.01 Loading csv files into dataframes. And now I'm going to create a new Notebook, also with Python 3. In this example, I'd like to show you how to read a JSON file. Now, the format's going to be pretty similar. For example, the first thing we want to do is import SparkSession from pyspark.sql, and then we want to create a Spark session, which is the variable again that gives us a reference point for communicating with and manipulating the cluster. To do that, we call SparkSession, then the builder, and within the builder we call the getOrCreate method. Now, we also want to define our data path, and that's the same thing I used before. And again, you'll change this to wherever you happen to store the data files. And, also I just want to point out…
Contents
- Set up a Jupyter notebook (2m 1s)
- Load data into DataFrames: CSV Files (7m 26s)
- Load data into DataFrames: JSON Files (3m 16s)
- Basic DataFrame operations (3m 26s)
- Filter data with DataFrame API (2m 13s)
- Aggregate data with DataFrame API (3m 47s)
- Sample data from DataFrames (5m 25s)
- Save data from DataFrames (3m 27s)