From the course: Data Science on Google Cloud Platform: Building Data Pipelines


Windowing with Dataflow


- [Kumaran] In this video, I will explore windowing with Apache Beam and Dataflow. Windowing allows real-time data to be processed in micro-batches, typically a few seconds in size. Data is broken up into windows, and each window then becomes a PCollection. Note that this windowed PCollection is created at the end of the window, hence it is bounded with respect to the window. Once a bounded windowed PCollection is available, any kind of transform or pipeline can be run on it, similar to how batch processing is done. As you keep retrieving data, this method of windowing and computing helps generate metrics for each of these windowed intervals. Apache Beam supports multiple types of windows. Fixed time windows are set for fixed intervals, and these windows do not overlap each other. Sliding time windows do overlap each other. The window interval is usually a multiple of the slide interval. Using sliding time windows, you can compute summaries for the last minute, or…
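
The difference between fixed and sliding windows described above can be sketched in plain Python. This is only an illustration of how timestamps map to windows, not the Apache Beam API (in Beam, windowing is applied with a `WindowInto` transform). The function names, the event data, and the 10-second window with a 5-second slide are all assumptions made for the example.

```python
from collections import defaultdict


def fixed_window_start(timestamp, size):
    """Return the start of the single fixed window containing timestamp."""
    return timestamp - (timestamp % size)


def sliding_window_starts(timestamp, size, period):
    """Return the starts of every sliding window containing timestamp.

    A window covers [start, start + size) and a new one begins every
    `period` seconds, so an element lands in several windows when
    size > period.
    """
    start = timestamp - (timestamp % period)  # latest window that has begun
    starts = []
    while start > timestamp - size:
        starts.append(start)
        start -= period
    return sorted(starts)


def sum_per_window(events, assign):
    """Sum event values per window; `assign` maps a timestamp to window starts."""
    totals = defaultdict(int)
    for timestamp, value in events:
        for start in assign(timestamp):
            totals[start] += value
    return dict(totals)


# Hypothetical (timestamp_seconds, value) events, for illustration only.
events = [(1, 10), (4, 20), (7, 30), (12, 40)]

# Fixed 10-second windows: each event belongs to exactly one window.
fixed_totals = sum_per_window(events, lambda t: [fixed_window_start(t, 10)])

# Sliding 10-second windows starting every 5 seconds: windows overlap.
sliding_totals = sum_per_window(events, lambda t: sliding_window_starts(t, 10, 5))

print(fixed_totals)    # {0: 60, 10: 40}
print(sliding_totals)  # {-5: 30, 0: 60, 5: 70, 10: 40}
```

With the fixed windows, each value is counted once. With the sliding windows, each value is counted in two overlapping windows, which is what makes "summary of the last N seconds, refreshed every few seconds" style metrics possible.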
