From the course: Data Science on Google Cloud Platform: Building Data Pipelines

Other capabilities

- [Instructor] Apache Beam contains a number of other advanced capabilities to process data too. They all have similar syntax and are easy to use. First, we have FlatMap. Using a DoFn function, FlatMap can be used to eliminate individual records or break up a record into multiple ones. A Flatten transform can be used to merge multiple PCollections into one. The merge is row by row, so if PCollection one has five records and PCollection two has three records, they will be merged into a PCollection of eight records. A Partition transform can be used to break up a PCollection into multiple PCollections based on some partitioning logic. The logic is implemented in a custom function that returns a partition key for each record. The breakup is, again, row by row. If the parent PCollection has eight rows, the child partitions can have three and five rows. Then there are side inputs. For any DoFn function that is called inside a transform, a side PCollection can be supplied for…
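The four capabilities described here can be sketched with the Apache Beam Python SDK roughly as follows. This is a minimal illustration, not the course's own code: the sample data and the helper names `split_words`, `by_length`, `keep_if_allowed`, and `run` are hypothetical, and because the transcript is cut off before it says what the side PCollection is used for, the filtering use shown here is an assumption.

```python
def split_words(line):
    """FlatMap callable: returns zero, one, or many outputs per input record."""
    # An empty line yields an empty list, so that record is eliminated.
    return line.split()


def by_length(word, num_partitions):
    """Partition function: returns the index of the target partition."""
    return 0 if len(word) <= 3 else 1


def keep_if_allowed(word, allowed):
    """Callable that receives a side input (`allowed`) next to each record."""
    if word in allowed:
        yield word


def run():
    # Imported here so the helpers above can be tried without Beam installed.
    import apache_beam as beam

    with beam.Pipeline() as p:
        first = p | "First" >> beam.Create(["a bb ccc", "", "dddd eeeee"])
        second = p | "Second" >> beam.Create(["ffffff g"])

        # FlatMap: three lines in, five words out (the empty line is dropped).
        words_1 = first | "Split1" >> beam.FlatMap(split_words)
        words_2 = second | "Split2" >> beam.FlatMap(split_words)

        # Flatten: 5 + 2 records are merged into one PCollection of 7.
        merged = (words_1, words_2) | "Merge" >> beam.Flatten()

        # Partition: one PCollection of 7 becomes two, with 4 and 3 records.
        short, long_words = merged | "ByLength" >> beam.Partition(by_length, 2)

        # Side input: a second PCollection passed as an extra argument.
        allowed = p | "Allowed" >> beam.Create(["a", "ccc"])
        kept = short | "Filter" >> beam.FlatMap(
            keep_if_allowed, allowed=beam.pvalue.AsList(allowed)
        )

        kept | "PrintKept" >> beam.Map(print)
        long_words | "PrintLong" >> beam.Map(print)
```

Calling `run()` executes the pipeline on the local runner and requires the `apache-beam` package. The helper functions are ordinary Python, so their logic can be checked on their own before they are wired into a pipeline.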
