From the course: Data Engineering Foundations

Scheduling ETL pipeline using Airflow

- [Instructor] It's time to put everything together and schedule the jobs that we have defined so far. As discussed, we'll need a scheduling tool for this, and the most commonly used scheduling tool is Apache Airflow. We'll use the code written so far and then define a directed acyclic graph (DAG) using Apache Airflow. First of all, let's set up Airflow on our machine. I'm first creating a directory called airflow, where all the configuration files and the database will reside. The command is mkdir airflow. The next step is to set the AIRFLOW_HOME variable: export AIRFLOW_HOME with the path to this directory, which the Airflow configuration file requires. The path ends in Chapter_4 and the airflow directory. This is the home that we have set. The next step is to install Apache Airflow: sudo pip install apache-airflow. Hit Enter. Now, this is going to take a few seconds, close to…
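The setup steps described above can be sketched as a short shell session. This is a minimal sketch, not the exact commands from the video: the full path exported to AIRFLOW_HOME is not given in the transcript, so `$PWD/airflow` is an assumption that the directory is created in the current working directory. `mkdir -p` is used instead of plain `mkdir` only so the snippet can be rerun without an error.

```shell
# Step 1: create the directory that will hold Airflow's
# configuration files and database.
mkdir -p airflow

# Step 2: point AIRFLOW_HOME at that directory.
# Assumption: the directory lives in the current working directory;
# the full path used in the video is not shown in the transcript.
export AIRFLOW_HOME="$PWD/airflow"
echo "AIRFLOW_HOME is set to: $AIRFLOW_HOME"

# Step 3: install Apache Airflow. Left commented out because it
# downloads packages and needs admin rights; run it manually.
# sudo pip install apache-airflow
```

The `export` only lasts for the current shell session, so it would need to be repeated (or added to a shell profile) in any new terminal before running Airflow commands.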
