From the course: Apache Spark Deep Learning Essential Training
The origins of Spark and Databricks
- [Instructor] Spark started in 2009 as a research project in the UC Berkeley RAD Lab. The researchers in the lab had previously been working on Hadoop MapReduce and observed that MapReduce was inefficient for iterative and interactive computing jobs. So, right from the beginning, Spark was designed to be fast for interactive queries and iterative algorithms. It brought in ideas like support for in-memory storage and efficient fault recovery. Research papers about Spark were published at academic conferences, and soon after its creation it was already 10 to 20 times faster than MapReduce for certain jobs. In Matei Zaharia's 2009 paper, they say that while Spark is still a working prototype, the performance results they were getting were very encouraging. Even at that time, Spark could outperform Hadoop on machine learning workloads by a factor of 10, and you can see this on page five of their paper. As part of their experiments into Spark's performance, they performed a logistic regression job…