From the course: Amazon Web Services: Data Analytics

Query EMR with Apache Spark

- [Narrator] In a previous movie, we set up an instance of Amazon EMR, or Elastic MapReduce. Now we're going to work with it and see how the Spark service is used to process data. To do that, we need to connect to the EMR master node using SSH, and then we're going to use the Spark shell. Now, there are lots of other ways to run Spark processes on an EMR cluster, but this is kind of like the "hello Spark" or the "hello world." It's the simplest possible way to verify connectivity and to get some sense of working with Spark.

So you can see here's our cluster. We're going to click on the name, and we'll see that we can SSH in. Now, we have a couple of preparation steps before we can SSH into the master node. The first step is to set up our security rule so our client machine can SSH in. We're going to scroll down, and you can see here's our security group. Now, I've actually already set up this rule, but I'm going to show you where and how you would do that. So I'm…
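The two preparation steps the narrator outlines (an inbound SSH rule on the master node's security group, then an SSH connection) can also be sketched in code. The narrator does this in the AWS console; the Python below is only an illustrative alternative that uses boto3's `authorize_security_group_ingress` call. The helper names, the IP address, the host name, and the key file are all hypothetical placeholders, and the sketch assumes an IPv4 client and the default `hadoop` login user on EMR nodes.

```python
import ipaddress
import shlex

SSH_PORT = 22
EMR_LOGIN_USER = "hadoop"  # default login user on EMR cluster nodes


def ssh_ingress_rule(client_ip: str) -> dict:
    """Build an inbound rule that allows SSH (TCP 22) from a single IPv4 address."""
    # IPv4Address raises a ValueError subclass if client_ip is not valid IPv4.
    address = ipaddress.IPv4Address(client_ip)
    return {
        "IpProtocol": "tcp",
        "FromPort": SSH_PORT,
        "ToPort": SSH_PORT,
        # /32 limits the rule to exactly one client machine.
        "IpRanges": [{"CidrIp": f"{address}/32", "Description": "SSH from my workstation"}],
    }


def authorize_ssh(security_group_id: str, client_ip: str) -> None:
    """Apply the rule to a security group (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so the helpers above work without boto3 installed

    ec2 = boto3.client("ec2")
    ec2.authorize_security_group_ingress(
        GroupId=security_group_id,
        IpPermissions=[ssh_ingress_rule(client_ip)],
    )


def ssh_command(master_dns: str, key_path: str) -> str:
    """Return the shell command that opens an SSH session on the master node."""
    return shlex.join(["ssh", "-i", key_path, f"{EMR_LOGIN_USER}@{master_dns}"])


if __name__ == "__main__":
    # Placeholder values only; substitute your own address, host name, and key file.
    print(ssh_ingress_rule("203.0.113.10"))
    print(ssh_command("ec2-203-0-113-25.compute-1.amazonaws.com", "my-emr-key.pem"))
```

Once the SSH session is open, typing `spark-shell` on the master node starts the interactive Spark shell, assuming Spark was selected when the cluster was created.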
