From the course: Data Science on Google Cloud Platform: Building Data Pipelines
Transforms - Google Cloud Tutorial
Transforms
- [Instructor] An Apache Beam transform represents a processing step in the Beam pipeline. Transforms typically take input from one or more PCollections, transform it, and then send the output to one or more PCollections down the pipeline. In the example pipeline, you'll see that transform one uses PCollection one and writes its output to PCollection two. Transforms are used for cleansing data record by record. Given that these are record-level operations, they can be executed in parallel across multiple processing nodes, depending on the execution engine being used. Transforms are also used to transform data. This includes computations, grouping data based on keys, and joining data. They are also used to perform aggregations of data. Standard aggregation functions like sum, mean, and max are supported, and you can also write custom application logic for the transform. Transforms are defined using the language SDK. The logic can be as simple or as complex as the requirements of the application demand. Using a…
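The three uses of transforms described above (record-by-record cleansing, grouping by key, and aggregation) can be sketched in plain Python. This is an illustration of the concepts only, not Beam SDK code: the sample records and the helper names `cleanse`, `group_by_key`, and `combine_per_key` are hypothetical, and ordinary lists and dicts stand in for PCollections. In the Beam Python SDK, the roughly corresponding built-in transforms are `beam.Map`, `beam.GroupByKey`, and `beam.CombinePerKey`.

```python
from collections import defaultdict

# Hypothetical input standing in for a PCollection: raw "region,amount" records.
raw_records = [" east,10 ", "west,5", "east,7", "west,3"]


def cleanse(record):
    """Record-level transform: trim whitespace and parse a single record.

    Each call depends only on its own record, which is why a runner
    could apply it in parallel across many processing nodes.
    """
    region, amount = record.strip().split(",")
    return region, int(amount)


def group_by_key(pairs):
    """Grouping transform: collect all values that share the same key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def combine_per_key(groups, fn):
    """Aggregation transform: reduce each key's values with fn (sum, max, ...)."""
    return {key: fn(values) for key, values in groups.items()}


# Each step reads one collection and writes a new one, like
# transform one reading PCollection one and writing PCollection two.
cleaned = [cleanse(r) for r in raw_records]
grouped = group_by_key(cleaned)
totals = combine_per_key(grouped, sum)
maxima = combine_per_key(grouped, max)

print(totals)
print(maxima)
```

Passing the aggregation function in as an argument mirrors the idea that standard aggregations and custom application logic plug into a transform in the same way.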