From the course: Amazon Web Services: Data Services
Explore Hadoop and Spark on AWS
- [Instructor] In this section, we're going to look at workloads that are large or huge and have varying levels of complexity. These workloads often interact with small or medium ones; that's part of how they grow large or huge. On AWS data services, your usual choice for workloads that are large or huge is the Hadoop ecosystem. Some people would just say Hadoop, but Hadoop on its own is really not usable in the wild, so in practice it's Hadoop plus a number of other libraries, partner tools, and services. So let's first look at core Hadoop, in case it's unfamiliar to you, or just to define terminology. I define core Hadoop as two parts. First, files, which are shown to the right here: files can be stored either in the Hadoop Distributed File System (HDFS) or, in the Amazon implementation, in S3. Second, processing on top of those files: the processing that is core to Hadoop is called MapReduce, and it's a distributed processing…
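To make the two halves of core Hadoop concrete, here is a minimal single-process sketch of the MapReduce pattern in Python. This is illustrative only: real Hadoop distributes the map, shuffle, and reduce phases across a cluster and reads records from HDFS or S3, while this sketch just counts words in a list of strings.

```python
# Minimal single-process sketch of the MapReduce idea:
# a "map" phase emits key/value pairs from each input record,
# a "shuffle" groups the pairs by key, and a "reduce" phase
# aggregates each group. (Hadoop runs these phases distributed.)
from collections import defaultdict

def map_phase(records):
    # Emit (word, 1) for every word in every input line.
    for line in records:
        for word in line.split():
            yield word.lower(), 1

def shuffle(pairs):
    # Group values by key, as Hadoop's shuffle/sort step does.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Sum the counts emitted for each word.
    return {word: sum(counts) for word, counts in groups.items()}

lines = ["HDFS stores files", "MapReduce processes files"]
counts = reduce_phase(shuffle(map_phase(lines)))
print(counts["files"])  # "files" appears once in each line -> 2
```

The same three-phase shape carries over to the cluster setting: only the map and reduce functions are job-specific, which is why Hadoop can parallelize arbitrary jobs written against this interface.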