From the course: Data Engineering Foundations
Scheduling ETL pipeline using Airflow
- [Instructor] It's time to put everything together and schedule the jobs we have defined so far. As discussed, we need a scheduling tool for this, and one of the most widely used is Apache Airflow. We'll take the code written so far and define a directed acyclic graph (DAG) with Apache Airflow. First of all, let's set up Airflow on our machine. I'm creating a directory called airflow, where all the configuration files and the database will reside. The command is mkdir airflow. The next step is to set the AIRFLOW_HOME environment variable, which Airflow uses to locate its configuration file. So we export AIRFLOW_HOME with the path to the airflow directory inside Chapter_4. That is the home we have set. Now, the next step is to actually install Apache Airflow: sudo pip install apache-airflow. Hit Enter. This is going to take a few seconds, close to…
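The directory and environment-variable steps can also be expressed in Python, which makes the role of AIRFLOW_HOME explicit. This is a minimal sketch, not the course's code: the base directory is a hypothetical stand-in for the course's chapter folder, and the helper names `prepare_airflow_home` and `resolve_airflow_home` are illustrative. It relies on one documented Airflow behavior: when AIRFLOW_HOME is unset, Airflow falls back to `~/airflow`.

```python
import os
import tempfile
from pathlib import Path


def prepare_airflow_home(base_dir: Path) -> Path:
    """Create <base_dir>/airflow and point AIRFLOW_HOME at it.

    Mirrors `mkdir airflow` followed by `export AIRFLOW_HOME=...`.
    base_dir is a hypothetical stand-in for the course's chapter folder.
    """
    airflow_home = (base_dir / "airflow").resolve()
    # Like `mkdir airflow`; exist_ok=True makes re-running the setup harmless.
    airflow_home.mkdir(parents=True, exist_ok=True)
    # Like `export`: visible to this process and any child process it starts,
    # such as an `airflow` command launched via subprocess.
    os.environ["AIRFLOW_HOME"] = str(airflow_home)
    return airflow_home


def resolve_airflow_home() -> Path:
    """Directory Airflow would use: AIRFLOW_HOME if set, otherwise ~/airflow."""
    return Path(os.environ.get("AIRFLOW_HOME", Path.home() / "airflow"))


# Demo in a throwaway directory so no real files are touched.
with tempfile.TemporaryDirectory() as tmp:
    home = prepare_airflow_home(Path(tmp))
    print(home.is_dir(), resolve_airflow_home() == home)  # prints: True True
```

Setting `os.environ` only affects the current process and the processes it launches, just as `export` only affects the current shell session, so a new terminal needs the variable exported again (or added to a shell profile) before running Airflow commands.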
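Why the pipeline has to be a directed acyclic graph can be illustrated without Airflow at all. The sketch below uses Python's standard `graphlib` module (Python 3.9+) and is not Airflow's API: the task names are hypothetical stand-ins for the extract, transform and load steps, and `run_order` and `is_acyclic` are illustrative helpers. A scheduler can only start a task once all of its upstream tasks have finished, and such an order exists exactly when the dependency graph has no cycle.

```python
from graphlib import CycleError, TopologicalSorter


def run_order(dependencies: dict[str, set[str]]) -> list[str]:
    """Return an order in which every task runs after all of its upstream tasks.

    `dependencies` maps each task to the set of tasks it depends on.
    Raises graphlib.CycleError if the graph contains a cycle.
    """
    return list(TopologicalSorter(dependencies).static_order())


def is_acyclic(dependencies: dict[str, set[str]]) -> bool:
    """True if some valid run order exists, i.e. the graph is a DAG."""
    try:
        run_order(dependencies)
    except CycleError:
        return False
    return True


# Hypothetical task names standing in for the course's ETL steps.
etl = {
    "extract": set(),
    "transform": {"extract"},
    "load": {"transform"},
}
print(run_order(etl))  # prints: ['extract', 'transform', 'load']

# Making extract depend on load closes a loop: no task can ever start first.
cyclic = {**etl, "extract": {"load"}}
print(is_acyclic(cyclic))  # prints: False
```

In Airflow itself, the same dependencies are declared between tasks inside a `DAG` object, typically with the `>>` operator (for example `extract >> transform >> load`), and Airflow rejects dependency cycles.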