From the course: Data Engineering Foundations

MapReduce and Hadoop

- [Instructor] It's time to talk about specific parallel computing frameworks. We'll focus on frameworks that are currently hot in the data engineering world. When it comes to big data systems, Hadoop is the most popular and widely used framework, and MapReduce was one of the most popular processing techniques. So, what is Hadoop? It is an ecosystem of open-source tools that has changed the way enterprises store, process, and analyze data. It's a collection of open-source projects maintained by the Apache Software Foundation. Some of them are a bit outdated, but it is still relevant to talk about them. Hadoop uses the MapReduce algorithm, and it plays a central role in developing ETL pipelines, where ETL stands for Extract, Transform, and Load. There are two Hadoop projects we want to focus on in this particular video: MapReduce and HDFS. So let's first talk about HDFS. It is a distributed file system. It is…
