From the course: Data Engineering Foundations
Hive
- [Instructor] First up among the software programs under the Hadoop umbrella is Hive. It offers features that help with the extraction part of the ETL data pipeline. Hive is a layer on top of the Hadoop ecosystem that makes data from several sources queryable in a structured way through an SQL-like interface, using Hive's SQL variant, which is called Hive SQL (also known as HiveQL). We can also extract data from databases and file systems that integrate with Hadoop. Back when there was no choice of tools, developers had to implement queries with the MapReduce Java API, which was pretty hard. Facebook initially developed Hive, but the Apache Software Foundation now maintains the project. Although MapReduce was initially responsible for running Hive jobs, Hive now integrates well with several other data processing tools. Let's look at this example where we are using the same Olympic events dataset as we saw earlier.…
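To make the idea of an SQL-like interface concrete, here is a minimal sketch of what a Hive SQL query over an Olympic events table might look like, wrapped in Python. This is not necessarily the query used in the video: the table name `athlete_events`, the columns `year` and `age`, and the helper names `build_avg_age_query` and `run_on_hive` are all assumptions made for illustration. The optional `run_on_hive` helper assumes the third-party PyHive package is installed and that a HiveServer2 instance is reachable on the given host and port.

```python
def build_avg_age_query(table: str = "athlete_events") -> str:
    """Build a Hive SQL query for the average athlete age per year.

    The table and column names are hypothetical placeholders.
    """
    return (
        "SELECT year, AVG(age) AS avg_age\n"
        f"FROM {table}\n"
        "GROUP BY year\n"
        "ORDER BY year"
    )


def run_on_hive(query: str, host: str = "localhost", port: int = 10000):
    """Run a query on HiveServer2 via PyHive and return all rows.

    Assumes PyHive is installed and a server is listening on host:port.
    """
    # Imported lazily so the query builder works without PyHive installed.
    from pyhive import hive

    connection = hive.connect(host=host, port=port)
    try:
        cursor = connection.cursor()
        cursor.execute(query)
        return cursor.fetchall()
    finally:
        connection.close()


if __name__ == "__main__":
    # Print the query text; no Hive connection is needed for this step.
    print(build_avg_age_query())
```

The query reads like ordinary SQL, which is the point: the grouping and aggregation are declared in a few lines, and Hive is responsible for turning them into jobs on the cluster instead of the developer hand-writing them against the MapReduce Java API.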