Introduction to PySpark OrderBy
PySpark orderBy is a Spark sorting function used to sort a Data Frame in PySpark. It sorts the rows by one or more columns. By default the sort is in ascending order; the desc function is used to sort in descending order instead. In this article we will see how to use the orderBy function.
The orderBy clause returns the rows in sorted order. It guarantees a total order of the output. It can be used with one column or with more than one column. Each column can be sorted in either direction: asc for ascending and desc for descending. In descending order, the row with the highest value in the column comes first, followed by the second highest, and so on down to the lowest.
PySpark OrderBy Syntax
The syntax for PySpark orderBy function is:
from pyspark.sql.functions import desc
b.orderBy(desc("col_Name")).show()
- desc: The descending function to be imported.
- orderBy: The order by function in PySpark.
- b: The Data Frame on which the operation is performed.
Working of OrderBy in PySpark
orderBy is a sorting clause used to sort the rows in a Data Frame. Sorting means arranging the elements in a defined order. The order can be ascending or descending, as chosen by the user. The default sort order used by orderBy is ascending. We can import the desc function from pyspark.sql.functions and use it to sort the Data Frame in descending order.
We can sort the rows by passing columns of the Data Frame; the sort can be done on one column or on multiple columns. orderBy takes the column names as parameters, and these columns are used for sorting the rows. The orderBy function creates a Sort logical operator with the global flag set, which means the sort applies across the whole Data Frame rather than within each partition.
This is how orderBy works in PySpark.
PySpark OrderBy Examples
Let us see some examples of how the PySpark orderBy function works.
Let’s start by creating a PySpark Data Frame. We take a set of students, each with a department and overall semester marks, and build a Data Frame from it.
data1 = (
    ("Bob", "IT", 4500),
    ("Maria", "IT", 4600),
    ("James", "IT", 3850),
    ("Maria", "HR", 4500),
    ("James", "IT", 4500),
    ("Sam", "HR", 3300),
    ("Jen", "HR", 3900),
    ("Jeff", "Marketing", 4500),
    ("Anand", "Marketing", 2000),
    ("Shaid", "IT", 3850),
)
col = ["Name", "MBA_Stream", "SEM_MARKS"]
b = spark.createDataFrame(data1, col)
The createDataFrame call builds the Data Frame from the data and the column names.
b.printSchema()
b.show()
Let’s try the orderBy operation using the descending function.
We will import the SQL function desc to use orderBy in descending order.
from pyspark.sql.functions import desc
b.orderBy(desc("Name")).show()
This will order the rows by Name in descending order.
The same can be done with the other columns in the Data Frame. We can order by the MBA_Stream column or the SEM_MARKS column in the same way.
b.orderBy(desc("MBA_Stream")).show()
This sorts according to MBA_Stream.
Passing desc("SEM_MARKS") instead will sort according to the SEM_MARKS column in the Data Frame.
orderBy can also be used with more than one column.
The rows are sorted by the columns in the order provided, so later columns break ties in earlier ones.
The same ordering can also be done through Spark SQL by registering the Data Frame as a temporary view. The temporary view can then be queried with the spark.sql function, where we can use the ORDER BY clause. The DESC keyword is used to sort in descending order in the SQL query.
From the above examples, we saw the use of the orderBy function in PySpark.