From the course: Azure Spark Databricks Essential Training

Optimize a cluster and job

- [Instructor] Our upsized cluster is now available. It has four notebooks attached for testing and one library, the VariantSpark library. You can see it will auto-scale from four to 12 workers of the configured size, and it now terminates after 360 minutes of inactivity. The notebook we're working with next is available from this location; I've already uploaded it, attached it, and started running it. We're going to go into that notebook and take a look at some performance overhead. Roughly, we have three sections: this is the load of the files, this is the run of the machine learning algorithm, and this is the visualization. You can see this is still in process. The difference here is that we did two optimizations: we made the cluster potentially bigger with the number of nodes, and we changed the input data by using a bz2, or zipped, input file. If we click on Clusters, we can see that it's running. If this were to go into an auto-scaling mode, because of the computational overhead, then this…
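The cluster settings described above, auto-scaling between four and 12 workers and auto-termination after 360 minutes of inactivity, map onto fields in a Databricks cluster definition. A minimal sketch using the Clusters API JSON shape; the cluster name, Spark version, and node type are placeholder assumptions, not values from the course:

```json
{
  "cluster_name": "upsized-cluster",
  "spark_version": "<runtime-version>",
  "node_type_id": "<node-type>",
  "autoscale": {
    "min_workers": 4,
    "max_workers": 12
  },
  "autotermination_minutes": 360
}
```

With `autoscale` set, Databricks adds workers up to `max_workers` when there is computational backlog and releases them when load drops, which is the auto-scaling behavior the instructor is watching for here.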
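The second optimization, switching to a bz2-compressed input file, trades a little CPU during reads for much less data to store and transfer. A small stand-alone Python sketch (hypothetical sample data, run outside Databricks) showing the round trip and the size reduction on repetitive text:

```python
import bz2

# Hypothetical repetitive CSV-like data standing in for the course's input file.
text = ("chrom,pos,ref,alt\n" + "1,12345,A,G\n" * 10_000).encode("utf-8")

compressed = bz2.compress(text)

ratio = len(compressed) / len(text)
print(f"original: {len(text)} bytes, compressed: {len(compressed)} bytes "
      f"(ratio {ratio:.3f})")

# Round trip: decompression restores the exact original bytes.
assert bz2.decompress(compressed) == text
```

In Spark, a file like `input.csv.bz2` can be read with the usual reader calls, since the codec is detected from the extension; bz2 also has the advantage of being a splittable format, so a large compressed file can still be read in parallel across workers.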
