From the course: Introduction to Spark SQL and DataFrames


SQL for DataFrames


- [Instructor] There are a couple of different ways of working with DataFrames. One way is to use the DataFrame API, which is built around calling methods on DataFrame objects. For example, let's assume we have a DataFrame called df, for short. The DataFrame object has a method called select, and I can select a column by putting the column name in double quotes and passing it to the select method. So in this example, I have a DataFrame called df, I'm calling the select method, and I'm asking for the CPU utilization column. I also want to display the results, so I call the show method. Now, as in SQL, you can do things like group by a particular column. So in this example, I'm applying the groupBy method to df and telling PySpark that I want to group by the server ID. After the grouping, I'd like to do a count, that is, count the number of rows in each group, and then show the results. So these are…
