From the course: Data Science on Google Cloud Platform: Building Data Pipelines


Streaming and windowing example


- [Kumaran] In this video, I will demonstrate, with a simple example, how windowing works. We first need a publisher to publish data continuously. The exercise file "0603_publish_to_dataflow.py" contains the required code. Let us explore that code now. This code runs continuously, sleeps for a varying time of one to three seconds, and publishes a transaction that contains the type of product, whether it is a MacBook, Windows PC, or Linux PC, and a random value for the product. Next, we will explore the streaming "dataflow.py" script, which subscribes to that data and has a pipeline built in Apache Beam to process it. First, there is the pipeline "io" in line number 22 for reading from Pub/Sub. It is a simple read, providing the name of the subscription. Then, this is piped into a windowing function in line 24. It is a fixed-size windowing function of five seconds. So, as data keeps flowing in from Pub/Sub, a PCollection is created every five seconds. This…
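
To make the fixed-window idea concrete, here is a minimal sketch in plain Python. It is not the code from the exercise files and does not use Pub/Sub or Apache Beam. It only simulates transactions that arrive one to three seconds apart and assigns each one to a five-second window by its timestamp. The function names, the whole-second timestamps, and the value range of 100 to 2000 are all assumptions made for illustration.

```python
import random
from collections import defaultdict

# Product types mentioned in the video.
PRODUCTS = ["MacBook", "Windows PC", "Linux PC"]
# Fixed window size in seconds, matching the five-second window in the video.
WINDOW_SIZE = 5


def simulate_transactions(count, seed=0):
    """Yield (event_time, product, value) tuples spaced 1 to 3 seconds apart.

    Hypothetical stand-in for the publisher: time is simulated, so nothing
    actually sleeps and nothing is sent to Pub/Sub.
    """
    rng = random.Random(seed)
    now = 0
    for _ in range(count):
        now += rng.randint(1, 3)  # simulated pause of one to three seconds
        # The value range below is an assumption, not taken from the course.
        yield now, rng.choice(PRODUCTS), rng.randint(100, 2000)


def window_start(event_time, size=WINDOW_SIZE):
    """Return the start of the fixed window [start, start + size) for event_time."""
    return event_time - (event_time % size)


def group_by_window(transactions, size=WINDOW_SIZE):
    """Group (event_time, product, value) tuples by the window they fall in."""
    windows = defaultdict(list)
    for event_time, product, value in transactions:
        windows[window_start(event_time, size)].append((product, value))
    return dict(windows)


if __name__ == "__main__":
    grouped = group_by_window(simulate_transactions(10))
    for start in sorted(grouped):
        print(f"window [{start}, {start + WINDOW_SIZE}): {grouped[start]}")
```

Each window covers a half-open interval, so a transaction stamped at second 5 belongs to the window starting at 5, not the one starting at 0. In a real Beam pipeline the grouping is done by the windowing transform itself; this sketch only shows the arithmetic behind assigning an element to a fixed window.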
