From the course: Introduction to Spark SQL and DataFrames
Sample data from DataFrames
- [Instructor] In this lesson, we're going to look at sampling. Sampling is useful when we have very large data sets and are doing an exploratory analysis, where we just want a high-level understanding of what the data is like. Working on a sample lets us run quick operations. So let me go to the Kernel menu and restart and clear the output, just so we can start fresh here. Next, I'm going to load the data. This is the location temperature data set that we've been working with, and the first thing I want to do is check how many rows are in the DataFrame. I'll do a simple count, and we see there are 500,000 rows. So let's see how we can take a sample of that, meaning a randomly selected subset. I'm going to create a new DataFrame, and I'm going to call it data frame one underscore…
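For reference, PySpark exposes random sampling as `DataFrame.sample(withReplacement, fraction, seed)`. For example, `df.sample(False, 0.1, seed=42)` on a hypothetical DataFrame `df` keeps roughly 10% of its rows. When sampling without replacement, each row is kept independently with probability `fraction`, so the sample is only approximately `fraction` times the row count. The sketch below illustrates that per-row (Bernoulli) idea in plain Python. It is not Spark's implementation, and the function name `bernoulli_sample`, the 10% fraction, the seed, and the stand-in row ids are all assumptions made for illustration.

```python
import random


def bernoulli_sample(rows, fraction, seed=None):
    """Keep each row independently with probability `fraction`.

    Illustrates the idea behind sampling without replacement
    (one coin flip per row); this is NOT Spark's own code.
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be in [0, 1]")
    rng = random.Random(seed)  # seeded generator -> reproducible sample
    # rng.random() is in [0, 1), so each row survives with probability `fraction`
    return [row for row in rows if rng.random() < fraction]


# Hypothetical stand-in for the 500,000-row location temperature data:
# just the row ids 0..499,999.
rows = range(500_000)

sample = bernoulli_sample(rows, fraction=0.1, seed=42)
# The sample size is close to, but generally not exactly, 50,000.
print(len(rows), len(sample))
```

Because every row gets its own independent trial, the sample never needs the total count up front, which is one reason this style of sampling suits large, partitioned data. Fixing the seed makes the sample repeatable between runs.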
Contents
- Set up a Jupyter notebook (2m 1s)
- Load data into DataFrames: CSV Files (7m 26s)
- Load data into DataFrames: JSON Files (3m 16s)
- Basic DataFrame operations (3m 26s)
- Filter data with DataFrame API (2m 13s)
- Aggregate data with DataFrame API (3m 47s)
- Sample data from DataFrames (5m 25s)
- Save data from DataFrames (3m 27s)