From the course: Data Science on Google Cloud Platform: Building Data Pipelines

Data science modules covered

- [Instructor] Data science is essentially a pipeline: a series of modules that work on data progressively to deliver insights and actions. The process starts with the acquisition of data from various sources. Connectors to these sources understand, acquire, and transform data as it is pushed into the pipeline. Next comes transport. Depending on the data source and the destination, this could happen within a LAN or around the globe. Data transport ensures reliability while delivering data at the speed required by the business. Then there is storage. The data acquired from the sources is kept in persistent stores like databases. Processing jobs clean, process, and transform the data and place it back into persistent stores. Data in these stores is used for exploratory analytics, to extract insights about the business or entities of interest, and for predictive analytics, to predict future actions or behavior. So how does the Google Cloud Platform, or GCP, support these modules? GCP provides end-to-end support for all modules and activities in data science. It can serve as the data infrastructure, a platform, or a service in these pipelines, and there are multiple options available for each module. For example, for data storage, GCP supports more than five types of data stores. GCP is fully managed and minimizes the administration and monitoring effort for these modules. It also provides horizontal scaling as data volumes grow and processing jobs multiply. This course focuses on building data pipelines, which covers the transport and processing modules of data science. We look at ways of moving data between sources and destinations, and at processing and transforming that data.
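
To make the transport and processing modules concrete, here is a minimal sketch of such a pipeline written with Apache Beam, the open-source Python SDK behind GCP's Dataflow service. The bucket paths and the cleaning logic are hypothetical placeholders for illustration, not examples taken from the course.

# Minimal Apache Beam pipeline sketch: read raw records from one
# Cloud Storage location, clean them, and write the results back.
# The gs:// paths below are hypothetical placeholders.
import apache_beam as beam

with beam.Pipeline() as pipeline:
    (
        pipeline
        # Acquisition: read raw text records from Cloud Storage.
        | "Read raw data" >> beam.io.ReadFromText("gs://example-bucket/raw/*.csv")
        # Processing: normalize each record and drop empty lines.
        | "Normalize" >> beam.Map(lambda line: line.strip().lower())
        | "Drop empties" >> beam.Filter(bool)
        # Storage: write the cleaned records back to a persistent store.
        | "Write results" >> beam.io.WriteToText("gs://example-bucket/processed/part")
    )

The same pipeline code can run locally for testing or on Dataflow by switching the runner, which is one way GCP provides the managed, horizontally scaling behavior described above.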
