From the course: Data Engineering Foundations
Airflow
- [Instructor] Before we learn how workflow scheduling frameworks work, let's first understand what a workflow is. Let's take an example. You can write a Spark job that pulls data from a CSV file, filters out some corrupt records, and loads the data into a SQL database ready for analysis. However, let's say you need to run this job every day. One option is to run it manually each day, but of course that's not scalable. What about the weekends? Now, there are simple tools that could solve this problem, like cron, which is a Linux tool. However, let's say you have one job for the CSV file, another job that cleans the data from an API, and a third job that joins the data from the CSV and the API together. That third job depends on the first two jobs finishing first. It quickly becomes apparent that we need a more holistic approach, and a simple tool like cron won't suffice. It's…
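The dependency structure the instructor describes (two independent cleaning jobs feeding a join job) is a small directed acyclic graph. As a minimal sketch of that idea, assuming hypothetical job names, Python's standard-library `graphlib` can compute a valid run order from the same dependencies:

```python
# Sketch of the three-job workflow described above (job names are hypothetical,
# not from the course). Uses only the standard library.
from graphlib import TopologicalSorter

# The join job depends on both cleaning jobs finishing first.
dag = {
    "join_csv_and_api": {"clean_csv", "clean_api"},
}

# static_order() yields a run order that respects the dependencies:
# both cleaning jobs come before the join, in some order.
run_order = list(TopologicalSorter(dag).static_order())
print(run_order)
```

A real scheduler like Airflow builds on this same DAG idea, adding scheduling, retries, and parallel execution of independent tasks such as the two cleaning jobs.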