From the course: Data Science on Google Cloud Platform: Building Data Pipelines

Transforms

- [Instructor] An Apache Beam transform represents a processing step in a Beam pipeline. Transforms typically take input from one or more PCollections, process it, and send the output to one or more PCollections further down the pipeline. In the example pipeline, you'll see that Transform 1 reads from PCollection 1 and writes its output to PCollection 2. Transforms are used for cleansing data record by record. Because these are record-level operations, they can be executed in parallel across multiple processing nodes, depending on the execution engine being used. Transforms are also used to reshape the data itself: this includes computations, grouping data based on keys, and joining data. They can also perform aggregations of data. Standard aggregation functions like sum, mean, and max are supported, and you can also write custom application logic for a transform. Transforms are defined using the language SDK, and the logic can be as simple or as complex as the requirements of the application demand. Using a…
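As a minimal sketch of the ideas above (not taken from the course), the following Python snippet uses the Apache Beam SDK to build a small pipeline: one transform does record-level cleansing on a PCollection, and a second transform performs a standard aggregation (sum) per key. The sample (city, fare) data and the step names are hypothetical, chosen only for illustration.

```python
# A sketch of Beam transforms: one PCollection feeds a transform,
# whose output PCollection feeds the next transform down the pipeline.
import apache_beam as beam

with beam.Pipeline() as pipeline:
    # PCollection 1: hypothetical raw records as (city, fare) pairs.
    rides = pipeline | "CreateRides" >> beam.Create([
        ("nyc", 12.5),
        ("nyc", 7.0),
        ("sf", 20.0),
    ])

    # Transform 1: record-level cleansing; each element can be
    # processed in parallel across workers.
    cleaned = rides | "RoundFares" >> beam.Map(
        lambda kv: (kv[0], round(kv[1], 2)))

    # Transform 2: a standard aggregation, summing fares per key.
    totals = cleaned | "SumPerCity" >> beam.CombinePerKey(sum)

    # Emit the resulting PCollection for inspection.
    totals | "Print" >> beam.Map(print)
```

Each `|` step applies a transform to the PCollection on its left and produces a new PCollection, which is how processing steps are chained through the pipeline.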
