From the course: Data Engineering Foundations

Hive

- [Instructor] First up among the software programs under the Hadoop umbrella is Hive. It offers features that help in the extraction part of the ETL data pipeline. Hive is a layer on top of the Hadoop ecosystem that makes data from several sources queryable in a structured way using Hive's SQL variant, which is called Hive SQL. It provides an SQL-like interface to query data. We can also extract data from databases and file systems that integrate with Hadoop. Back when there was no such choice of tools, developers had to implement queries with the MapReduce Java API, which was pretty hard. Facebook initially developed Hive, but the Apache Software Foundation now maintains the project. Although MapReduce was initially responsible for running Hive jobs, Hive now integrates well with several other data processing tools. Let's look at this example where we are using the same Olympic events dataset as we saw earlier.…
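
To make the contrast between a SQL-like query and hand-written MapReduce logic concrete, here is a minimal sketch. It is not the example from the course and it does not run Hive: Python's built-in sqlite3 module stands in for Hive's SQL interface, and a plain Python function imitates the map, shuffle, and reduce phases. The olympic_events table, its columns, and the sample rows are all hypothetical.

```python
import sqlite3
from collections import defaultdict

# Hypothetical sample rows (year, sport, athlete); not the course's dataset.
ROWS = [
    (2012, "Swimming", "A"),
    (2012, "Judo", "B"),
    (2016, "Swimming", "C"),
    (2016, "Swimming", "D"),
    (2016, "Judo", "E"),
]


def count_by_year_sql(rows):
    """Declarative version: one SQL statement, similar in shape to a Hive query.

    sqlite3 is only a stand-in here; a real Hive query would run on a cluster.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE olympic_events (year INTEGER, sport TEXT, athlete TEXT)"
    )
    conn.executemany("INSERT INTO olympic_events VALUES (?, ?, ?)", rows)
    result = conn.execute(
        "SELECT year, COUNT(*) AS n FROM olympic_events "
        "GROUP BY year ORDER BY year"
    ).fetchall()
    conn.close()
    return result


def count_by_year_mapreduce(rows):
    """Hand-written version: the same count spelled out as map, shuffle, reduce."""
    # Map phase: emit a (key, 1) pair for every input row.
    mapped = [(year, 1) for year, _sport, _athlete in rows]

    # Shuffle phase: group the emitted values by key.
    grouped = defaultdict(list)
    for key, value in mapped:
        grouped[key].append(value)

    # Reduce phase: sum the values for each key.
    return sorted((key, sum(values)) for key, values in grouped.items())


sql_counts = count_by_year_sql(ROWS)
mr_counts = count_by_year_mapreduce(ROWS)
print(sql_counts)
print(mr_counts)
```

Both functions return the same counts per year. The SQL version states what result is wanted, while the second spells out how to compute it, which is roughly the burden that an SQL-like layer takes off developers.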