PySpark – collect()

PySpark RDD/DataFrame collect() is an action that retrieves all elements of the dataset from all nodes to the driver node. We should …

PySpark – show()

PySpark DataFrame show() displays the contents of a DataFrame in a tabular row-and-column format. By default, it shows only 20 …

PySpark – Create DataFrame

You can manually create a PySpark DataFrame using the toDF() and createDataFrame() methods. Both functions take different signatures to create a DataFrame from an existing RDD, list, or DataFrame. You …