From the course: Data Science on Google Cloud Platform: Building Data Pipelines

GroupBy

- [Narrator] The next transform I will explore is the GroupBy transform. The GroupBy transform is used to group data elements based on a specific key. It takes the first value in the input tuple and uses it as the key. It groups values across all records with the same key and returns them as a single record. This step only collects the values and returns them as an array; it does not by itself do any aggregation. This is a pre-step for future processing of data by these values. The function is called beam.GroupByKey. In our example, we are going to use the ProdTypePrice PCollection and group data by the product type. The result will then be stored in the prodTypeGroups PCollection. The sample data shows how data is transformed by this GroupBy. Note that each unique key has one row, and the values for that key are provided as an array. Let us now look at the code. The code for this is available at line number 73. It has a single pipeline that uses GroupByKey. We then print the contents of the…
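
To make the grouping behavior concrete, here is a minimal plain-Python sketch of what a group-by-key step produces. It is not the course's code and does not use Beam itself: the sample records, the `group_by_key` helper, and the variable names `prod_type_price` and `prod_type_groups` are assumptions for illustration only.

```python
from collections import defaultdict

# Hypothetical (product_type, price) records standing in for the
# ProdTypePrice PCollection; the course's actual data is not shown here.
prod_type_price = [
    ("Mouse", 25.0),
    ("Keyboard", 40.0),
    ("Mouse", 30.0),
    ("Webcam", 55.0),
    ("Keyboard", 45.0),
]


def group_by_key(records):
    """Collect every value that shares a key into one list per key."""
    groups = defaultdict(list)
    for key, value in records:  # the first tuple element is the key
        groups[key].append(value)
    # One (key, values) record per unique key; no aggregation is performed.
    return list(groups.items())


prod_type_groups = group_by_key(prod_type_price)
for record in prod_type_groups:
    print(record)
```

In a Beam pipeline, the same grouping would be expressed by applying `beam.GroupByKey()` to a PCollection of (key, value) tuples. Unlike this sketch, Beam does not guarantee the order of the values within each group.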
