From the course: Data Science on Google Cloud Platform: Building Data Pipelines
Executing in Dataflow - Google Cloud Tutorial
- [Instructor] Let us now see how to execute the same simple pipeline on Google Cloud Dataflow. To do this, I have a shell script that contains the command to execute the simple pipeline on Dataflow. The script invokes Python with the name of the pipeline script, but there are some additional parameters you need to pass. The first parameter is the project, which is the Google Cloud project you're gonna be executing it on. And remember to use the project ID here, not the project name. The next line shows the runner to use, and the runner is going to be DataflowRunner. Earlier, we were using the native runner. In order to use DataflowRunner, you also have to provide two additional parameters. One is the staging location, used by Dataflow, and that will be a Google Cloud Storage location of your choice. Similarly, you need a temporary location, also to be used by DataflowRunner, also on Google Cloud Storage. So this is the command that you have to execute to run…
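The run script described above might look something like the following sketch. The pipeline file name (`simple_pipeline.py`), project ID, and bucket paths are placeholders you would replace with your own values; the flags themselves (`--project`, `--runner`, `--staging_location`, `--temp_location`) are standard Apache Beam pipeline options.

```shell
#!/bin/bash
# Hypothetical run script: submits the simple pipeline to Google Cloud Dataflow.
# Replace simple_pipeline.py, my-project-id, and the gs:// bucket paths
# with your own script name, project ID (not the project name), and
# Cloud Storage locations.
python simple_pipeline.py \
  --project my-project-id \
  --runner DataflowRunner \
  --staging_location gs://my-bucket/staging \
  --temp_location gs://my-bucket/temp
```

Running this requires that you are authenticated to Google Cloud and that the named Cloud Storage bucket already exists; Dataflow uses the staging location for packaged code and the temporary location for intermediate files.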