In this course, Harshit Tyagi explains the fundamentals of data engineering. He covers key topics like data wrangling, database schemas, and ETL pipeline development. He also details several data engineering tools like Hive, Hadoop, Spark, and Airflow. By the end of this course, it should be abundantly clear why the data engineer is one of the most valuable people in a data-driven organization.
Skill Level: Intermediate
- [Harshit] There are so many appealing buzzwords in tech these days: machine learning, artificial intelligence, neural networks, et cetera. And while writing algorithms to make predictions is a hot new skill, it is not the only one you need to get started in this business. In a modern big data system, the person who writes algorithms is not the one who cleans the data. Now, laying out data for scientists and researchers has become a complex problem in itself. This is where data engineers step in and find their role in this ecosystem. Hi, I am Harshit Tyagi, and in this course we are going to learn the foundations of data engineering. We'll start by understanding the meaning of data engineering, how it is different from data science, and what tools you should master in order to develop data pipelines. Then we'll study the foundations of a big data system, like databases and distributed computing. Further on, we'll learn about the tools that come in handy to address each type of problem while developing an ETL data pipeline. In the last chapter, we'll apply the concepts and tools we've learned to develop and schedule our own ETL pipeline. So what are you waiting for? Let's get started.