From the course: Azure Spark Databricks Essential Training
Optimize a cluster and job
- [Instructor] Our upsized cluster is now available. It has four notebooks attached for testing and one library, the VariantSpark library. You can see that it will auto-scale from four to 12 workers of this size, and it now terminates after 360 minutes of inactivity. The notebook that we're working with next is available from this location; I've already uploaded it, attached it to the cluster, and started running it. We're going to go into that notebook and take a look at some performance overhead. Roughly, it has three sections: this one loads the files, this one runs the machine learning algorithm, and this one does the visualization. You can see this is still in process. The difference here is that we made two optimizations: we made the cluster potentially bigger by allowing more worker nodes, and we changed the data by using a bz2-compressed input file. If we click on Clusters, we can see that it's running. If this were to go into an auto-scaling mode, because of the computational overhead, then this…
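The cluster settings described above (auto-scaling between 4 and 12 workers, auto-termination after 360 idle minutes) can also be expressed as a request body for the Databricks Clusters API create call. The sketch below only builds and prints that body; it does not call the API. The field names follow the Clusters API, while the cluster name, runtime version string, and VM size are hypothetical placeholders, not the values used in the course.

```python
import json


def build_cluster_spec(name, spark_version, node_type_id,
                       min_workers=4, max_workers=12,
                       autotermination_minutes=360):
    """Return a dict shaped like a Databricks Clusters API create request.

    Only the field names are taken from the API; every value passed in
    here is a placeholder chosen for illustration.
    """
    if min_workers > max_workers:
        raise ValueError("min_workers must not exceed max_workers")
    return {
        "cluster_name": name,
        "spark_version": spark_version,  # placeholder runtime version
        "node_type_id": node_type_id,    # placeholder Azure VM size
        # The cluster scales between these worker counts under load.
        "autoscale": {"min_workers": min_workers, "max_workers": max_workers},
        # The cluster shuts down after this many minutes with no activity.
        "autotermination_minutes": autotermination_minutes,
    }


spec = build_cluster_spec("upsized-cluster", "<runtime-version>", "<vm-size>")
print(json.dumps(spec, indent=2))
```

Keeping the auto-scaling bounds and the idle timeout in one spec makes the cost trade-off explicit: a higher `max_workers` allows more parallelism for heavy jobs, while `autotermination_minutes` limits how long an idle cluster keeps billing.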
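The second optimization, switching to a bz2-compressed input file, trades extra CPU time for decompression against less data to store and transfer. Spark picks the codec from the `.bz2` file extension, and bzip2 is a splittable codec, so a large compressed file can still be read in parallel across workers. The sketch below uses Python's standard `bz2` module on hypothetical sample rows (not the course's data set) to show the size reduction and a lossless round trip.

```python
import bz2


def compress_csv(text: str) -> bytes:
    """Compress CSV text with bzip2 at the highest compression level."""
    return bz2.compress(text.encode("utf-8"), compresslevel=9)


def decompress_csv(blob: bytes) -> str:
    """Restore the original CSV text from bzip2-compressed bytes."""
    return bz2.decompress(blob).decode("utf-8")


# Hypothetical, highly repetitive rows; real data will compress differently.
rows = ["sample_id,variant,value"] + [
    f"s{i},chr1:{1000 + i % 50},0/1" for i in range(2000)
]
csv_text = "\n".join(rows)

blob = compress_csv(csv_text)
raw_size = len(csv_text.encode("utf-8"))
print(raw_size, "bytes raw ->", len(blob), "bytes bz2")
```

In a notebook, a compressed file like this could then be loaded with `spark.read.csv("<path>.csv.bz2", header=True)`, where the path is a placeholder for wherever the file was uploaded.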