From the course: Amazon Web Services: Data Services


Explore Hadoop and Spark on AWS


- [Instructor] In this section, we're going to look at workloads that are large or huge and have varying levels of complexity. They interact with small and medium workloads, and that's how they become large or huge. On AWS data services, your usual choice for workloads that are large or huge is the Hadoop ecosystem. Some people would just say Hadoop, but Hadoop on its own is really not usable in the wild, so in practice it's Hadoop plus a number of other libraries, partner tools, and other services. So let's first think about core Hadoop, in case it's unfamiliar to you, or just to define terminology. Core Hadoop I define as two parts: files, which are shown to the right here, and files can be stored either in the Hadoop Distributed File System, HDFS, or, in the Amazon implementation, in S3; and processing on top of those files. The processing that is core to Hadoop is called MapReduce, and it's a distributed processing…
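As an illustration beyond the transcript, the MapReduce pattern the instructor names can be sketched locally in plain Python: a map phase that emits key-value pairs, a shuffle that groups by key, and a reduce phase that aggregates each group. This is the classic word-count example, not Hadoop's actual API; function names here are just for the sketch.

```python
# A minimal, local sketch of the MapReduce pattern that core Hadoop
# distributes across a cluster: a word-count job.
from collections import defaultdict

def map_phase(lines):
    # Mapper: emit a (word, 1) pair for each word in each input line.
    for line in lines:
        for word in line.lower().split():
            yield (word, 1)

def shuffle(pairs):
    # Shuffle/sort: group all emitted values by their key.
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    # Reducer: sum the counts for each word.
    return {word: sum(counts) for word, counts in grouped.items()}

lines = ["hadoop stores files", "spark and hadoop process files"]
counts = reduce_phase(shuffle(map_phase(lines)))
print(counts)  # total occurrences of each word across all input lines
```

In a real Hadoop cluster the input lines would come from blocks in HDFS (or objects in S3 on EMR), mappers and reducers would run on many nodes in parallel, and the framework would handle the shuffle over the network; the local sketch only shows the shape of the computation.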
