From the course: Apache Spark Deep Learning Essential Training

The origins of Spark and Databricks

- [Instructor] Spark started in 2009 as a research project in the UC Berkeley RAD Lab. The researchers in the lab had previously been working on Hadoop MapReduce and observed that MapReduce was inefficient for iterative and interactive computing jobs. So, right from the beginning, Spark was designed to be fast for interactive queries and iterative algorithms. It brought in ideas like support for in-memory storage and efficient fault recovery. Research papers about Spark were published at academic conferences, and soon after its creation it was already 10 to 20 times faster than MapReduce for certain jobs. In Matei's 2009 paper, the authors say that while Spark was still a working prototype, the performance results they were getting were very encouraging. Even at that time, Spark could outperform Hadoop MapReduce on machine learning workloads by a factor of 10, and you can see this on page five of their paper. As part of their experiments into Spark's performance, they performed a logistic regression job…
