From the course: Data Engineering Foundations

Airflow

- [Instructor] Before we learn how workflow scheduling frameworks work, let's first understand what a workflow is. Let's take an example. You can write a Spark job that pulls data from a CSV file, filters out some corrupt records, and loads the data into a SQL database, ready for analysis. However, let's say you need to run this job every day. One option is to run it manually each day, but of course that's not scalable. What about the weekends? There are simple tools that can solve this problem, like cron, the time-based job scheduler found on Linux and other Unix-like systems. However, let's say you have one job for the CSV file, another job that cleans the data from an API, and a third job that joins the data from the CSV and the API together. The third job depends on the first two jobs finishing first. It quickly becomes apparent that we need a more holistic approach, and a simple tool like cron won't suffice. It's…
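To make the dependency problem concrete, here is a minimal sketch of what a workflow scheduler has to do that cron does not: run jobs in an order that respects their dependencies. Schedulers like Airflow model a workflow as a directed acyclic graph (DAG). This sketch is not Airflow's API. It uses Python's standard-library `graphlib` module (Python 3.9+), and the job names and the `run_job`/`run_workflow` helpers are hypothetical stand-ins for the three jobs described above.

```python
from graphlib import TopologicalSorter

# Hypothetical job names for the three jobs in the example.
# Each key maps a job to the set of jobs that must finish before it runs.
workflow = {
    "ingest_csv": set(),
    "clean_api_data": set(),
    "join_csv_and_api": {"ingest_csv", "clean_api_data"},
}


def run_job(name: str) -> None:
    # Hypothetical placeholder for the real work (e.g. submitting a Spark job).
    print(f"running {name}")


def run_workflow(dag: dict[str, set[str]]) -> list[str]:
    """Run jobs in dependency order and return the order that was used."""
    sorter = TopologicalSorter(dag)
    sorter.prepare()  # raises graphlib.CycleError if the dependencies form a cycle
    order = []
    while sorter.is_active():
        # Jobs whose dependencies have all been marked done; these could run in parallel.
        for job in sorted(sorter.get_ready()):
            run_job(job)
            order.append(job)
            sorter.done(job)  # unblocks any jobs that depend on this one
    return order


if __name__ == "__main__":
    run_workflow(workflow)
```

The join job only becomes ready after both upstream jobs are marked done, which is exactly the ordering guarantee a plain cron schedule cannot express. A real scheduler adds retries, time-based triggers, and monitoring on top of this core idea.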