From the course: Apache Spark Deep Learning Essential Training

Deep learning libraries - Spark DataFrames Tutorial

Deep learning libraries

- [Instructor] Now, there are a couple of deep learning libraries available for Spark. In this course, we'll only look at Deep Learning Pipelines. This is a package developed by Databricks that integrates deep learning functionality into Spark's ML Pipelines API. It currently supports TensorFlow and Keras. It's used for inference, transfer learning, and model training, and it integrates with Spark SQL, so it ticks most of the boxes.

Let's take a look at some of the other deep learning frameworks available. TensorFrames, or TensorFlow on Spark DataFrames, lets you manipulate Spark DataFrames with TensorFlow programs. TensorFrames' main contribution is making it easy to pass data between Spark DataFrames and TensorFlow. It's mainly an inference and transfer learning oriented library, and it supports both Python and Scala.

BigDL is a distributed deep learning framework for Apache Spark, primarily developed by Intel and modeled after Torch. One of the key advantages of BigDL over the other libraries is that it's optimized to use CPUs instead of GPUs. This means it runs efficiently on existing CPU-based clusters, like an Apache Hadoop environment.

TensorFlowOnSpark was developed by Yahoo for large-scale distributed deep learning on Hadoop clusters in Yahoo's private cloud. It enables distributed deep learning on a cluster of GPU and CPU servers. It supports both distributed TensorFlow training and inference on Spark clusters, and what's great is that you only need to make minimal changes to existing TensorFlow programs to run them.

Deeplearning4j is an open source distributed deep learning project in Java and Scala that provides both single-node and distributed training options. One of its advantages over Python-based deep learning frameworks is that it was designed for the Java Virtual Machine, or JVM. So it's great for those who don't want to have Python as part of their development process. It supports CPUs as well as GPUs.
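To make the pipeline idea behind Deep Learning Pipelines a little more concrete, here is a minimal conceptual sketch in plain Python. It is not Spark code and it is not the API of any of the libraries mentioned: the class names `FunctionStage` and `SimplePipeline` and the two toy functions are hypothetical, chosen only for illustration. It assumes a transfer learning setup where one stage turns raw input into features (standing in for a pretrained network) and a second stage predicts a label from those features.

```python
# Conceptual sketch only: plain Python, no Spark. All class and function
# names here are hypothetical and do not belong to any library.
from typing import Callable, Dict, List

Row = Dict[str, object]


class FunctionStage:
    """A pipeline stage that adds one output column computed from one input column."""

    def __init__(self, input_col: str, output_col: str, fn: Callable):
        self.input_col = input_col
        self.output_col = output_col
        self.fn = fn

    def transform(self, rows: List[Row]) -> List[Row]:
        # Build new rows so the input rows are left untouched.
        return [{**row, self.output_col: self.fn(row[self.input_col])} for row in rows]


class SimplePipeline:
    """Runs stages in order, feeding each stage's output to the next."""

    def __init__(self, stages: List[FunctionStage]):
        self.stages = stages

    def transform(self, rows: List[Row]) -> List[Row]:
        for stage in self.stages:
            rows = stage.transform(rows)
        return rows


def toy_featurizer(pixels: List[float]) -> List[float]:
    # Stand-in for a pretrained network with its final layer removed.
    return [sum(pixels) / len(pixels), max(pixels)]


def toy_classifier(features: List[float]) -> str:
    # Stand-in for a small model trained on top of the features.
    return "bright" if features[0] > 0.5 else "dark"


pipeline = SimplePipeline([
    FunctionStage("image", "features", toy_featurizer),
    FunctionStage("features", "label", toy_classifier),
])

rows = [{"image": [0.9, 0.8, 0.7]}, {"image": [0.1, 0.2, 0.0]}]
predictions = pipeline.transform(rows)
print([row["label"] for row in predictions])
```

In a real Spark setup, the rows would be a distributed DataFrame and each stage would be a transformer or estimator from the library in use; the point here is only the shape of the idea, where each stage reads one column and adds another.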
