From the course: Data Science on Google Cloud Platform: Exploratory Data Analytics

Cloud Dataflow

- [Instructor] Cloud Dataflow is primarily a data processing and transformation tool, but it can also be used for exploratory data analytics at cloud scale. You can use it to cleanse, transform, and aggregate data, and to extract meaningful information from it. It supports both batch and real-time data processing. It is built on the Apache Beam programming model, so code written for Dataflow is easily portable to other platforms like Apache Spark and Flink. It can be used to build extract, transform, load (ETL) pipelines, and you can build data exploration capabilities into the pipeline. It has excellent integrations with other GCP products. Finally, because it is built on Beam, the same pipeline code can target multiple runners, including Apache Spark and Flink. This allows for experimentation and choice for the developer. What are the strengths of Cloud Dataflow? It is built on Apache Beam, which provides abstraction and portability. You have other runner alternatives to Cloud Dataflow, like Apache Spark. You can perform…
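
To make the cleanse, transform, and aggregate stages mentioned above concrete, here is a minimal sketch in plain Python. It is not Dataflow or Apache Beam code: it only mirrors the shape of such a pipeline using the standard library. The sample records and the helper names (`RAW_ROWS`, `is_valid`, `normalize`, `mean_per_key`, `run`) are hypothetical and chosen purely for illustration.

```python
from collections import defaultdict

# Hypothetical raw records: (city, temperature reading as text).
RAW_ROWS = [
    ("london", "12.5"),
    ("Paris", "15.0"),
    ("london", ""),      # missing reading, dropped during cleansing
    ("PARIS", "17.0"),
    ("Berlin", "n/a"),   # malformed reading, dropped during cleansing
]


def is_valid(row):
    """Cleanse step: keep rows with a non-empty city and a numeric reading."""
    city, reading = row
    try:
        float(reading)
    except ValueError:
        return False
    return bool(city.strip())


def normalize(row):
    """Transform step: standardize city casing and parse the reading."""
    city, reading = row
    return city.strip().title(), float(reading)


def mean_per_key(pairs):
    """Aggregate step: average the values for each key."""
    totals = defaultdict(lambda: [0.0, 0])  # key -> [running sum, count]
    for key, value in pairs:
        totals[key][0] += value
        totals[key][1] += 1
    return {key: total / count for key, (total, count) in totals.items()}


def run(rows):
    """Chain the three stages in the order a pipeline would apply them."""
    cleansed = filter(is_valid, rows)
    transformed = map(normalize, cleansed)
    return mean_per_key(transformed)


if __name__ == "__main__":
    print(run(RAW_ROWS))
```

In Beam's Python SDK, stages like these would typically be expressed with transforms such as `beam.Filter`, `beam.Map`, and `beam.CombinePerKey`, and the same pipeline definition could then be handed to Dataflow or to another runner.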