From the course: Data Science on Google Cloud Platform: Building Data Pipelines
GCP data pipeline options - Google Cloud Tutorial
- [Instructor] Let us explore the options available on GCP for building data pipelines. GCP provides an array of technologies for this purpose. We start off with Cloud Dataproc, which is a managed version of Hadoop and Apache Spark. Because it is built on these open-source frameworks, this option provides portability to and from other platforms, like AWS and enterprise-specific implementations. Then there is Cloud Dataflow, which is an Apache Beam-based pipeline design and execution product that uses the manageability and scalability of GCP to provide big data processing. There is Cloud Pub/Sub, which is a managed message queue, similar to Kafka and Amazon Kinesis. Finally, there is Cloud Dataprep, which offers the ability to clean and prepare data for analytics.

All these technologies provide native integrations with the various sources and sinks available on GCP, including Cloud Storage, Bigtable, and BigQuery. These pipelines can be wired into downstream analytics and machine learning tasks to create an end-to-end data flow. They all come with standard GCP management and monitoring support, which makes it easier to configure, scale, manage, and troubleshoot deployments.
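To make the source, transform, and sink idea concrete, here is a minimal sketch in plain Python. It does not use any GCP library: the `deque` is only a stand-in for a message queue such as Cloud Pub/Sub, the list is a stand-in for a sink such as a BigQuery table, and the names `parse_event` and `run_pipeline` and the `user,amount` message format are hypothetical, chosen purely for illustration.

```python
from collections import deque


def parse_event(raw):
    """Transform step: turn 'user,amount' text into a dict, or None if malformed."""
    parts = [p.strip() for p in raw.split(",")]
    if len(parts) != 2:
        return None
    user, amount = parts
    try:
        return {"user": user, "amount": float(amount)}
    except ValueError:
        # The amount was not numeric, so the record is dropped.
        return None


def run_pipeline(queue, sink):
    """Drain the source queue, transform each message, and append valid rows to the sink."""
    while queue:
        row = parse_event(queue.popleft())
        if row is not None:
            sink.append(row)
    return sink


# Stand-in for a message queue (assumed input format: "user, amount").
events = deque(["alice, 10.5", "bob, 3", "malformed", "carol, x"])

# Stand-in for a sink table that downstream analytics would read.
table = run_pipeline(events, [])

# A downstream analytics step: total amount across the valid rows.
total = sum(row["amount"] for row in table)
print(table, total)
```

In a real deployment, each stand-in would be replaced by the corresponding managed service and its client library, and the transform would run inside a pipeline engine such as Dataflow or Dataproc rather than in a local loop.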