From the course: Data Science on Google Cloud Platform: Building Data Pipelines
Streaming and windowing example
- [Kumaran] In this video, I will demonstrate, with a simple example, how windowing works. We first need a publisher to publish data continuously. The exercise file "0603_publish_to_dataflow.py" contains the required code. Let's explore that code now. This code runs continuously, sleeps for a varied time of one to three seconds, and publishes a transaction that contains the type of product, whether it is a MacBook, Windows PC, or Linux PC, and a random value for the product. Next, we will explore the streaming "dataflow.py" script, which subscribes to that topic and has a pipeline built in Apache Beam to process the data. First, there is the pipeline I/O on line 22 for reading from Pub/Sub. It is a simple read, providing the name of the subscription. Then, this is piped into a windowing function on line 24. It is a fixed-size windowing function of five seconds. So, as data keeps flowing in from Pub/Sub, a PCollection is created for every five seconds. This…
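To make the fixed-window behavior concrete, here is a minimal plain-Python sketch (no Apache Beam or Pub/Sub dependency) that mimics what the two scripts do: it simulates a publisher emitting a transaction every one to three seconds, then buckets those events into five-second fixed windows the way `beam.WindowInto(FixedWindows(5))` would. All names here (`make_events`, `fixed_windows`, the price range) are illustrative assumptions, not the actual exercise-file code.

```python
import random

# Product types published by the simulated publisher, as in the video.
PRODUCTS = ["MacBook", "Windows PC", "Linux PC"]

def make_events(n, seed=42):
    """Simulate the publisher: each transaction arrives 1-3 seconds
    after the previous one, with a product type and a random value."""
    rng = random.Random(seed)
    t = 0.0
    events = []
    for _ in range(n):
        t += rng.uniform(1, 3)  # varied inter-arrival time, 1-3 seconds
        events.append({
            "time": t,
            "product": rng.choice(PRODUCTS),
            "value": round(rng.uniform(100, 2000), 2),  # assumed price range
        })
    return events

def fixed_windows(events, size=5):
    """Assign each event to a fixed window of `size` seconds, the way
    a FixedWindows(5) transform groups a streaming PCollection."""
    windows = {}
    for e in events:
        start = int(e["time"] // size) * size  # window start boundary
        windows.setdefault(start, []).append(e)
    return windows

if __name__ == "__main__":
    for start, batch in sorted(fixed_windows(make_events(10)).items()):
        print(f"window [{start}, {start + 5}): {len(batch)} event(s)")
```

Each five-second window closes independently, so downstream transforms (counts, sums per product) operate on one window's batch at a time, which is exactly what makes an aggregation over an unbounded stream finite.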