From the course: Data Science on Google Cloud Platform: Building Data Pipelines

Executing in Dataflow

- [Instructor] Let us now see how to execute the same simple pipeline on Google Cloud Dataflow. To do this, I have a shell script that contains the command to run the simple pipeline on Dataflow. The script invokes Python with the name of the pipeline script, but there are some additional parameters you need to pass. The first parameter is the project, which is the Google Cloud project you're going to be executing on. And remember to use the project ID here, not the project name. The next line shows the runner to use, and the runner is going to be DataflowRunner. Earlier, we were using the local direct runner. In order to use DataflowRunner, you also have to provide two additional parameters. One is the staging location, used by Dataflow, and that will be a Google Cloud Storage location of your choice. Similarly, you need a temporary location, also used by the DataflowRunner, also on Google Cloud Storage. So this is the command that you have to execute to run…
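As a rough sketch, the shell script described above would look something like the following; the pipeline script name, project ID, and bucket paths here are placeholders, not the actual values used in the course:

# Hypothetical example: run a Beam pipeline on Dataflow
# (replace the script name, project ID, and bucket with your own)
python simple_pipeline.py \
    --project=my-project-id \
    --runner=DataflowRunner \
    --staging_location=gs://my-bucket/staging \
    --temp_location=gs://my-bucket/temp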
