From the course: Data Science on Google Cloud Platform: Building Data Pipelines

GCP data pipeline options - Google Cloud Tutorial

- [Instructor] Let us explore the various options available on GCP for building data pipelines. GCP provides an array of technologies for this. We start off with Cloud Dataproc, which is a managed version of Apache Hadoop and Apache Spark. Because it runs these open-source frameworks, this option provides portability to and from other platforms, like AWS and on-premises enterprise deployments. Then there is Cloud Dataflow, which is an Apache Beam-based pipeline design and execution product that uses the manageability and scalability of GCP to provide big data processing. There is Cloud Pub/Sub, which is a managed messaging service, similar to Apache Kafka and Amazon Kinesis. Finally, there is Cloud Dataprep, which offers a visual way to explore, clean, and prepare data for analytics. All these technologies provide native integrations with various sources and sinks available on GCP, including Cloud Storage, Cloud Bigtable, and BigQuery. These pipelines can be wired into downstream analytics and machine learning tasks to create an end-to-end data flow. They all come with standard GCP management and monitoring support, which makes deployments simple to configure, scale, manage, and troubleshoot.
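
To make the source, transform, and sink pattern concrete, here is a minimal, framework-free Python sketch. It deliberately uses no GCP or Apache Beam APIs: the in-memory CSV text stands in for a source such as a file in Cloud Storage, the sorted list returned at the end stands in for a sink such as a BigQuery table, and the data and stage names (`read_source`, `parse_event`, `sum_per_key`, `write_sink`) are hypothetical. On GCP, the same stages would be expressed with the connectors and transforms of Dataflow (Beam) or Dataproc (Spark).

```python
import csv
import io
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical sample input standing in for a file in Cloud Storage.
# The "dave" row is intentionally malformed to show how bad records are handled.
RAW_CSV = """user,event,amount
alice,purchase,30
bob,purchase,12
alice,refund,-5
dave,purchase,oops
carol,purchase,7
bob,purchase,8
"""


def read_source(text: str) -> Iterator[Dict[str, str]]:
    """Source stage: yield one dict per CSV row (stand-in for a storage connector)."""
    yield from csv.DictReader(io.StringIO(text))


def parse_event(rows: Iterable[Dict[str, str]]) -> Iterator[Tuple[str, int]]:
    """Transform stage: turn each row into a (user, amount) pair, skipping bad rows."""
    for row in rows:
        try:
            yield row["user"], int(row["amount"])
        except (KeyError, ValueError):
            # Malformed rows are dropped here; a production pipeline might
            # route them to a separate "dead-letter" sink instead.
            continue


def sum_per_key(pairs: Iterable[Tuple[str, int]]) -> Dict[str, int]:
    """Aggregation stage: sum the amounts for each user."""
    totals: Dict[str, int] = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


def write_sink(totals: Dict[str, int]) -> List[Tuple[str, int]]:
    """Sink stage: return rows sorted by key (stand-in for writing to a table)."""
    return sorted(totals.items())


def run_pipeline(text: str) -> List[Tuple[str, int]]:
    """Wire the stages end to end: source -> transform -> aggregate -> sink."""
    return write_sink(sum_per_key(parse_event(read_source(text))))


if __name__ == "__main__":
    for user, total in run_pipeline(RAW_CSV):
        print(f"{user}: {total}")
```

Because each stage consumes an iterable and hands its result to the next, stages can be replaced or extended independently, which is what lets a pipeline feed downstream analytics or machine learning steps. In Apache Beam, the aggregation step corresponds roughly to `beam.CombinePerKey(sum)`.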

Contents