From the course: Data Engineering Foundations
Transforming data
- [Instructor] So after the successful extraction of the data from the two tables, we are now going to add a simple transformation to this extracted data. Now, we know analytical databases are optimized for creating aggregated data. So here we are going to merge the two data frames so that we have the average rating corresponding to each movie. To do that, we first need to group all the ratings by movie ID. So we are using the group by function here and pass in the movie ID column. Then we have to calculate the average rating, so we're using the mean function here and pass the rating column to this mean function. The next step is to join the movies data frame with the average rating data frame that we have just created: movies_df.join. Join is the function that we are using; we pass it the data frame, which is the average rating, and then the common column on which…
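The group, average, and join steps described in the transcript can be sketched as follows. This is a minimal illustration that assumes pandas data frames; the transcript does not name the library, and the sample rows, the column names (`movie_id`, `title`, `rating`, `avg_rating`), and the helper functions `average_ratings` and `transform` are hypothetical, not taken from the course files.

```python
import pandas as pd

# Hypothetical stand-ins for the two extracted tables. The real schema
# used in the course is not shown in the transcript.
movies_df = pd.DataFrame(
    {"movie_id": [1, 2, 3], "title": ["Alpha", "Beta", "Gamma"]}
)
ratings_df = pd.DataFrame(
    {"movie_id": [1, 1, 2, 2, 2], "rating": [4.0, 5.0, 3.0, 2.0, 4.0]}
)


def average_ratings(ratings: pd.DataFrame) -> pd.DataFrame:
    """Group ratings by movie ID and compute the mean rating per movie."""
    return (
        ratings.groupby("movie_id")["rating"]
        .mean()
        .rename("avg_rating")
        .to_frame()  # one-column frame indexed by movie_id
    )


def transform(movies: pd.DataFrame, ratings: pd.DataFrame) -> pd.DataFrame:
    """Attach the average rating to each movie row."""
    avg_rating = average_ratings(ratings)
    # DataFrame.join matches the caller's "movie_id" column against the
    # index of avg_rating. A left join keeps movies that have no ratings.
    return movies.join(avg_rating, on="movie_id", how="left")


result = transform(movies_df, ratings_df)
print(result)
```

A left join is used here so that a movie with no ratings still appears in the result, with a missing value as its average. Whether the course keeps or drops such rows is not stated in the excerpt.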
Contents
- Sources of data extraction (4m 46s)
- Data extraction from a PostgreSQL database (4m 51s)
- Challenge: Data extraction (40s)
- Solution: Data extraction (51s)
- Transforming data (2m 3s)
- Challenge: Transforming data (42s)
- Solution: Transforming data (58s)
- Loading data into a DB (4m 11s)
- Challenge: Loading data (59s)
- Solution: Loading data (1m)
- Scheduling ETL pipeline using Airflow (9m 3s)