From the course: Data Science on Google Cloud Platform: Building Data Pipelines
Pipeline I/O
- [Narrator] A data pipeline needs to interface with external data sources and sinks to read and write data. Pipeline I/O in Apache Beam provides libraries, interfaces, and capabilities for interacting with a variety of standard external data sources and sinks. External data sources can be read directly into PCollections using Pipeline I/O, and a PCollection can likewise be written to an external sink. Pipeline I/O saves the user from writing source- or sink-specific code; it takes care of all the interfacing work for Apache Beam. Apache Beam provides Pipeline I/O support for an extensive set of popular databases and message queues, and this list continues to grow rapidly. It supports FileIO, TextIO, and XmlIO for working with flat files. It supports interfaces to message queues like Amazon Kinesis, Apache Kafka, and Google Pub/Sub, and it supports popular databases like Cassandra, HBase, Elasticsearch, BigQuery, MongoDB, and standard JDBC.