From the course: Data Science on Google Cloud Platform: Building Data Pipelines

Streaming with Dataflow

- [Narrator] Streaming is a data processing technique that processes incoming data in real time and produces results. (mumbles) That produces streams, and these streams can be processed in real time using Apache Beam and Dataflow. In streaming, you want to process data and produce insights as the data arrives; we do not wait to receive the entire data set before we start processing it. Streaming is used for generating real-time metrics and insights, and for real-time predictions. It usually works with data published to real-time queues such as Kafka and Pulsar. In Apache Beam, when data is processed in batch, as we have seen in earlier videos, it produces bounded PCollections. A bounded PCollection is one whose entire size is known ahead of time: the PCollection is fully loaded before we start applying any transforms to it. A stream, by contrast, produces data continuously, which results in unbounded PCollections whose size is not known ahead of time. We do not wait for the PCollection to…
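To make the bounded versus unbounded distinction concrete, here is a minimal plain-Python sketch. It is an analogy, not Apache Beam or Dataflow code: a Python list stands in for a bounded PCollection, and an endless generator stands in for an unbounded stream such as a Kafka or Pulsar topic. The names `batch_counts`, `streaming_counts`, and `fake_event_source` are hypothetical and chosen only for illustration.

```python
# Plain-Python analogy (NOT the Apache Beam API): a list plays the role of a
# bounded PCollection, an endless generator plays the role of an unbounded one.
from collections import Counter
from itertools import cycle, islice
from typing import Dict, Iterable, Iterator, List


def batch_counts(events: List[str]) -> Dict[str, int]:
    """Bounded input: the whole data set is available before we start."""
    return dict(Counter(events))


def streaming_counts(events: Iterable[str]) -> Iterator[Dict[str, int]]:
    """Unbounded input: update and emit a result as each element arrives."""
    counts: Counter = Counter()
    for event in events:
        counts[event] += 1
        yield dict(counts)  # snapshot of the running totals so far


def fake_event_source() -> Iterator[str]:
    """Hypothetical stand-in for a Kafka/Pulsar topic: it never ends."""
    yield from cycle(["click", "view", "click"])


if __name__ == "__main__":
    print(batch_counts(["click", "view", "click"]))  # {'click': 2, 'view': 1}
    # Consume only the first 4 results; the stream itself has no end.
    for snapshot in islice(streaming_counts(fake_event_source()), 4):
        print(snapshot)
```

The batch function cannot return until its input ends, so it would never finish on the endless source. The streaming version produces an updated result after every element, and the caller decides how much of the stream to consume (here with `itertools.islice`).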

Contents