From the course: Data Science on Google Cloud Platform: Building Data Pipelines
GroupBy - Google Cloud Tutorial
GroupBy
- [Narrator] The next transform I will explore is the GroupBy transform. The GroupBy transform is used to group data elements based on a specific key. It takes the first value in the input tuple and uses it as the key. It groups values across all records with the same key and returns them as a single record. This step only collects the values and returns them as an array; it does not by itself do any aggregation. This is a pre-step for future processing of data by these values. The function is called beam.GroupByKey. In our example, we are going to use the ProdTypePrice PCollection and group data by the product type. This will then be stored in the prodTypeGroups PCollection. The sample data shows how data is transformed by this GroupBy. Note that each unique key has one row, and the values for that key are provided as an array. Let us now look at the code. The code for this is available at line number 73. It has a single pipeline that uses GroupByKey. We then print the contents of the…
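To make the grouping behavior concrete, here is a minimal plain-Python sketch of what the narration describes: the first element of each pair is treated as the key, and the values for each key are collected into a list with no aggregation. This is not the course's exercise file and it does not use Beam itself; the sample records and the helper name group_by_key are hypothetical, chosen only for illustration. In an actual Beam pipeline, the equivalent step would apply the beam.GroupByKey() transform to a PCollection of key-value pairs.

```python
from collections import defaultdict

# Hypothetical sample records: (product type, price) pairs.
# These stand in for the elements of the ProdTypePrice PCollection.
prod_type_price = [
    ("Laptop", 1200.0),
    ("Phone", 650.0),
    ("Laptop", 950.0),
    ("Tablet", 400.0),
    ("Phone", 720.0),
]


def group_by_key(pairs):
    """Collect the values of all (key, value) pairs under their key.

    No aggregation happens here: each key simply maps to the list
    of values that were seen with it.
    """
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


# Stand-in for the prodTypeGroups PCollection: one entry per unique key.
prod_type_groups = group_by_key(prod_type_price)

for prod_type, prices in prod_type_groups.items():
    print(prod_type, prices)
```

Each unique product type ends up with a single entry whose value is the list of its prices. A later step, such as a sum or an average per key, would then work on those lists. Note that this sketch keeps values in input order, while a distributed runner generally makes no such promise.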