From the course: Data Science on Google Cloud Platform: Exploratory Data Analytics
Cloud Dataflow
- [Instructor] Cloud Dataflow is essentially a data processing and transformation tool, but it can also be used for exploratory data analytics at cloud scale. You can use it to cleanse, transform, and aggregate data, and to extract meaningful information from it. It supports both batch and real-time (streaming) data processing. It is built on the Apache Beam programming model, so pipeline code written for Dataflow is portable to other execution engines such as Apache Spark and Flink. It can be used to build extract, transform, load (ETL) pipelines, and you can build data exploration capabilities into those pipelines. It has excellent integrations with other GCP products. Finally, because Beam supports multiple runners, including Apache Spark and Flink, developers are free to experiment and choose the engine that suits them.

What are the strengths of Cloud Dataflow? It is built on Apache Beam, which provides abstraction and portability. You have other runner alternatives to Cloud Dataflow, like Apache Spark. You can perform…
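The cleanse, transform, and aggregate flow described above can be sketched in plain Python. This is not Dataflow or Apache Beam code: it is a stdlib-only illustration that assumes hypothetical comma-separated `airport,delay` records, and each step is commented with the Beam Python transform it loosely corresponds to (`beam.Map`, `beam.Filter`, `beam.combiners.Mean.PerKey`). In a real pipeline, these steps would be applied to a PCollection and executed by a runner such as Dataflow, Spark, or Flink.

```python
from collections import defaultdict
from typing import Dict, Iterable, Optional, Tuple

# Hypothetical sample input: raw "airport,delay_minutes" lines, some of them dirty.
RAW_LINES = [
    "SFO,12",
    "jfk,-3",      # lowercase code, normalized in the transform step
    "SFO,",        # missing value, dropped during cleansing
    "ORD,45",
    "JFK,20",
    "bad line",    # malformed row, dropped during cleansing
]


def parse(line: str) -> Optional[Tuple[str, float]]:
    """Cleanse step: parse one line, returning None for rows that cannot be used."""
    parts = line.split(",")
    if len(parts) != 2:
        return None
    key, raw_value = parts[0].strip(), parts[1].strip()
    if not key or not raw_value:
        return None
    try:
        return key, float(raw_value)
    except ValueError:
        return None


def mean_per_key(pairs: Iterable[Tuple[str, float]]) -> Dict[str, float]:
    """Aggregate step: average the values for each key."""
    sums: Dict[str, float] = defaultdict(float)
    counts: Dict[str, int] = defaultdict(int)
    for key, value in pairs:
        sums[key] += value
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}


def run_pipeline(lines: Iterable[str]) -> Dict[str, float]:
    parsed = (parse(line) for line in lines)                  # ~ beam.Map(parse)
    clean = (rec for rec in parsed if rec is not None)        # ~ beam.Filter(...)
    normalized = ((k.upper(), v) for k, v in clean)           # ~ beam.Map: normalize keys
    return mean_per_key(normalized)                           # ~ beam.combiners.Mean.PerKey()


if __name__ == "__main__":
    print(run_pipeline(RAW_LINES))
```

Each stage consumes only the output of the previous one, which is the shape Beam expresses as a chain of transforms. Keeping the logic in that form is what lets the same pipeline definition run on different runners.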