From the course: Data Science on Google Cloud Platform: Building Data Pipelines
Combine - Google Cloud Tutorial
Combine
- [Narrator] In this video, we will explore the Combine transform in Apache Beam. The Combine transform can combine the data for a specific key and summarize it in a single step. It can also combine data globally, across the whole collection. In effect, it is a GroupBy followed by a Map where one is required, and it applies whenever the summarization is a general function, such as a count. A combine function can also be implemented for custom aggregation logic. In the example code, we will count the total number of transactions by type of customer. We will also execute multiple transforms in a single pipeline. This example uses the transactions PCollection we used before and creates a new PCollection called cust type count. The data is both grouped and summarized to provide counts by customer type. Let us look at the code now. The code for this is at line 99, in operations data flow dot pipeline five. In a single pipeline here, we do two things. First, we use the extract customer type class to…
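The course's own pipeline code is not reproduced in this transcript, so the following is only a minimal plain-Python sketch of what a per-key and a global combine compute. It does not use Apache Beam itself. The four method names on `CountFn` mirror the ones Beam's Python `CombineFn` expects (`create_accumulator`, `add_input`, `merge_accumulators`, `extract_output`); everything else, including the helper functions `combine_per_key` and `combine_globally`, the sample transactions, and the customer type labels, is a hypothetical stand-in chosen for illustration.

```python
class CountFn:
    """Counting combiner written in the four-step accumulator style."""

    def create_accumulator(self):
        # Start every count at zero.
        return 0

    def add_input(self, accumulator, element):
        # Each element seen adds one to the running count.
        return accumulator + 1

    def merge_accumulators(self, accumulators):
        # Partial counts (e.g. from different workers) are summed.
        return sum(accumulators)

    def extract_output(self, accumulator):
        # The final result is the count itself.
        return accumulator


def combine_per_key(pairs, combine_fn):
    """Hypothetical helper: group (key, value) pairs and summarize in one pass."""
    accumulators = {}
    for key, value in pairs:
        if key not in accumulators:
            accumulators[key] = combine_fn.create_accumulator()
        accumulators[key] = combine_fn.add_input(accumulators[key], value)
    return {key: combine_fn.extract_output(acc) for key, acc in accumulators.items()}


def combine_globally(values, combine_fn):
    """Hypothetical helper: summarize every element into a single result."""
    accumulator = combine_fn.create_accumulator()
    for value in values:
        accumulator = combine_fn.add_input(accumulator, value)
    return combine_fn.extract_output(accumulator)


# Made-up transactions as (customer_type, amount) pairs.
transactions = [
    ("retail", 20.0),
    ("wholesale", 310.5),
    ("retail", 12.25),
    ("retail", 7.0),
    ("wholesale", 99.9),
]

cust_type_count = combine_per_key(transactions, CountFn())
total_count = combine_globally(transactions, CountFn())

print(cust_type_count)
print(total_count)
```

In a real Beam pipeline the same counting logic would typically be handed to a built-in transform such as `beam.CombinePerKey` or `beam.CombineGlobally` rather than to hand-written helpers. The accumulator split matters there because partial results can be computed on separate workers and merged afterwards.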