From the course: Data Science on Google Cloud Platform: Building Data Pipelines
Other capabilities
- [Instructor] Apache Beam contains a number of other advanced capabilities to process data, too. They all have similar syntax and are easy to use. First, we have FlatMap. Using a function, FlatMap can be used to eliminate individual records or break up a record into multiple ones. A Flatten transform can be used to merge multiple PCollections into one. The merge is row by row, so if PCollection one has five records and PCollection two has three records, they will be merged into a PCollection of eight records. A Partition transform can be used to break up a PCollection into multiple PCollections based on some partitioning logic. The logic is implemented in a custom function that returns a partition number for each record. The breakup is, again, row by row: if the parent PCollection has eight rows, the child partitions can have three and five rows. Then there are side inputs. For any function (DoFn) that is called inside a transform, a side PCollection can be supplied for…
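To make the row-by-row behavior above concrete, here is a small plain-Python model with no Beam dependency. Ordinary lists stand in for PCollections, and the helper names (`flat_map`, `flatten`, `partition`, `map_with_side_input`) are hypothetical. In the Beam Python SDK itself, the corresponding transforms are `beam.FlatMap`, `beam.Flatten`, and `beam.Partition`, and side inputs are passed as extra arguments wrapped in views such as `beam.pvalue.AsList`. The partition function below mirrors Beam's `fn(element, num_partitions)` convention. Keep in mind that a real PCollection is unordered; the list order here is only a convenience of the model.

```python
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")
U = TypeVar("U")
S = TypeVar("S")

# Plain lists stand in for PCollections in this model (not the Beam API).

def flat_map(records: List[T], fn: Callable[[T], Iterable[U]]) -> List[U]:
    """Model of FlatMap: fn may emit zero records (drop) or several (split)."""
    out: List[U] = []
    for record in records:
        out.extend(fn(record))
    return out

def flatten(*collections: List[T]) -> List[T]:
    """Model of Flatten: merge several collections row by row into one."""
    merged: List[T] = []
    for coll in collections:
        merged.extend(coll)
    return merged

def partition(records: List[T], fn: Callable[[T, int], int], n: int) -> List[List[T]]:
    """Model of Partition: fn returns a partition index in [0, n) for each record."""
    parts: List[List[T]] = [[] for _ in range(n)]
    for record in records:
        idx = fn(record, n)
        if not 0 <= idx < n:
            raise ValueError(f"partition index {idx} out of range for {n} partitions")
        parts[idx].append(record)
    return parts

def map_with_side_input(records: List[T], fn: Callable[[T, S], U], side: S) -> List[U]:
    """Model of a side input: every call to fn also sees the whole side collection."""
    return [fn(record, side) for record in records]

# FlatMap: "" emits nothing, "a b" emits two records.
lines = ["a b", "", "c d e"]
words = flat_map(lines, lambda line: line.split())  # ["a", "b", "c", "d", "e"]

# Flatten: 5 records + 3 records -> 8 records.
pc1 = [1, 2, 3, 4, 5]
pc2 = [6, 7, 8]
merged = flatten(pc1, pc2)

# Partition: 8 rows split into children of 3 and 5 rows.
small, large = partition(merged, lambda x, n: 0 if x <= 3 else 1, 2)  # [1, 2, 3] and [4, 5, 6, 7, 8]

# Side input: each word is checked against a separate stop-word collection.
stop_words = ["b", "d"]
tagged = map_with_side_input(words, lambda w, stops: (w, w in stops), stop_words)
```

In a real pipeline the side input would itself be a PCollection that Beam materializes (for example as a list) and hands to each function call, which is what makes it useful for lookups such as the stop-word check.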