From the course: Azure Spark Databricks Essential Training

Business scenarios for Spark

- [Instructor] So as a working cloud architect, what types of business scenarios have I found to be the best fit for Apache Spark technologies? In a nutshell, they center on distributed compute, and what's really driving that is the volume of data. For example, I've been doing quite a lot of work recently in genomic sequencing and the analysis of genomic information. The kinds of tasks I've used Spark for in these workflows include data cleansing, or Extract, Transform, and Load (ETL); fast data serving pipelines; scalable complex processing; and distributed machine learning.

You can think of Azure Databricks as a set of three components. First, you have the Databricks tools, services, and optimizations that surround the core open source Apache Spark distribution. Second, Apache Spark itself provides the distributed computation needed for these intensive workloads. Third, all of this sits on top of some sort of file system. Now natively in Databricks, you have the DBFS, or the Databricks file…
