From the course: Data Science on Google Cloud Platform: Building Data Pipelines


Pipeline I/O

- [Narrator] A data pipeline needs to interface with external data sources and sinks to read and write data. Pipeline I/O in Apache Beam provides libraries, interfaces, and capabilities for interacting with a wide range of standard external data sources and sinks. External data sources can be read directly into PCollections using Pipeline I/O, and, similarly, a PCollection can be written out to an external sink using Pipeline I/O. Pipeline I/O saves the user from writing source- or sink-specific code; it takes care of all the interfacing work for Apache Beam. Apache Beam provides Pipeline I/O support for an extensive set of popular databases and message queues, and the list continues to grow rapidly. It supports FileIO, TextIO, and XmlIO for working with flat files. It supports interfaces to message queues like Amazon Kinesis, Apache Kafka, and Google Pub/Sub, and it supports popular databases like Cassandra, HBase, Elasticsearch, BigQuery, MongoDB, and standard JDBC.
