From the course: Data Science on Google Cloud Platform: Building Data Pipelines

Cloud Dataproc - Google Cloud Tutorial

Cloud Dataproc

- [Instructor] Cloud Dataproc is a managed Apache Hadoop and Apache Spark service running on GCP. This means it comes with HDFS, MapReduce, and Spark programming capabilities. Because Cloud Dataproc is managed, it provides automatic cluster setup, scale-up and scale-down, and monitoring, so very little administrative work is needed to run it. It has built-in integrations with other GCP data processing products and data stores, which makes building pipelines easier. It uses a pay-as-you-go model, so you are billed only for the cluster resources you use while your clusters are running. The key advantage of Cloud Dataproc is that you can bring Spark code you have already written and put into production for an enterprise or AWS deployment over to GCP with little or no modification. That makes it easy to move the same code between different production environments.
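To make the MapReduce model mentioned above concrete, here is a minimal, single-process sketch in plain Python of the map, shuffle, and reduce phases for a word count. It is only an illustration of the programming model, not Hadoop, Spark, or Dataproc API code; the function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Mapper: emit a (word, 1) pair for every word in every line."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by key, as the framework does between map and reduce."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(grouped)


def reduce_phase(grouped: Dict[str, List[int]]) -> Dict[str, int]:
    """Reducer: combine the values for each key, here by summing the counts."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run the three phases in sequence on an in-memory list of lines."""
    return reduce_phase(shuffle_phase(map_phase(lines)))


if __name__ == "__main__":
    docs = ["Spark runs on Dataproc", "Hadoop runs on Dataproc too"]
    print(word_count(docs))
```

On a real cluster, the framework spreads map tasks across worker nodes and performs the shuffle over the network; in Spark the same job is typically written with RDD operations such as `flatMap`, `map`, and `reduceByKey`.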