From the course: Data Science on Google Cloud Platform: Building Data Pipelines
Streaming with Dataflow - Google Cloud Tutorial
Streaming with Dataflow
- [Narrator] Streaming is a data processing technique that processes incoming data in real time and produces results. (mumbles) That produces streams, and these streams can be processed in real time using Apache Beam and Dataflow. In streaming, you want to process data and produce insights as the data arrives. We do not wait to receive the entire data set before starting to process. Streaming is used for generating real-time metrics and insights. It is also used for real-time predictions. Streaming usually works with data being published from real-time queues like Kafka and Pulsar. In Apache Beam, when data is processed in batch, as we have seen in earlier videos, it produces bounded PCollections. A bounded PCollection is one whose entire size is known ahead of time. The PCollection is fully loaded before we start doing any transforms on it. A stream produces data continuously, and it results in unbounded PCollections whose size is not known ahead of time. We do not wait for the PCollection to…
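To make the bounded versus unbounded distinction concrete, here is a minimal sketch in plain Python. It does not use the Apache Beam API and is not the instructor's code. The names `batch_average`, `streaming_average`, and `sensor_stream` are hypothetical, and the endless generator only stands in for a real-time queue such as Kafka. The bounded function needs the whole data set before it can answer, while the streaming function emits an updated result after every element arrives.

```python
import itertools
from typing import Iterable, Iterator, List


def batch_average(readings: List[float]) -> float:
    """Bounded case: the whole data set is loaded, so its size is known."""
    return sum(readings) / len(readings)


def streaming_average(readings: Iterable[float]) -> Iterator[float]:
    """Unbounded case: emit an updated average after each element arrives."""
    total = 0.0
    count = 0
    for value in readings:
        total += value
        count += 1
        yield total / count


def sensor_stream() -> Iterator[float]:
    """Hypothetical endless source standing in for a real-time queue."""
    return itertools.cycle([10.0, 20.0, 30.0])


# Bounded: a single result, produced only after all data is available.
bounded_result = batch_average([10.0, 20.0, 30.0])

# Unbounded: results keep arriving; we stop after four only to end the demo.
first_four = list(itertools.islice(streaming_average(sensor_stream()), 4))
```

Calling `len()` on the stream is not possible, which mirrors the point above: the size of an unbounded collection is not known ahead of time, so results have to be produced incrementally.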