PySpark

Rename Column Name Case in a DataFrame

Requirement: To change column names to upper case or lower case in PySpark

Create a dummy dataframe

#create a dataframe with sample values
from pyspark.sql import SparkSession

#get or create the active Spark session (already available as "spark" in the PySpark shell)
spark = SparkSession.builder.getOrCreate()

columns = ["Emp_id", "Emp_name", "Emp_dept"]
data = [("1", "Falcon", "Admin"), ("2", "Winter Soldier", "HR"), ("3", "Wanda", "Technology"), ("4", "Vision", "Data Analytics")]
rdd = spark.sparkContext.parallelize(data)
df_employee = rdd.toDF(columns)
df_employee.printSchema()

root
 |-- Emp_id: string (nullable = true)
 |-- Emp_name: string (nullable = true)
 |-- Emp_dept: string (nullable = true)

Convert column names to uppercase in PySpark

You can use the “withColumnRenamed” function in a for loop to rename all the columns of a PySpark dataframe to uppercase, applying Python’s “upper” string method to each column name.

#convert all column name to uppercase
for col in df_employee.columns:
    df_employee = df_employee.withColumnRenamed(col, col.upper())

#print column names
df_employee.printSchema()

root
 |-- EMP_ID: string (nullable = true)
 |-- EMP_NAME: string (nullable = true)
 |-- EMP_DEPT: string (nullable = true)

Convert column names to lowercase in PySpark

Similarly, you can use the “withColumnRenamed” function in a for loop to rename all the columns of a PySpark dataframe to lowercase, applying Python’s “lower” string method to each column name.

#convert all column name to lowercase
for col in df_employee.columns:
    df_employee = df_employee.withColumnRenamed(col, col.lower())

#print column names
df_employee.printSchema()

root
 |-- emp_id: string (nullable = true)
 |-- emp_name: string (nullable = true)
 |-- emp_dept: string (nullable = true)

You can also use the “swapcase” or “capitalize” string methods in place of “upper” or “lower”, as per the requirement.
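For example, here is a minimal sketch using “capitalize”, assuming the df_employee dataframe from above (whose column names are currently lowercase). Python’s str.capitalize() uppercases only the first letter of each name and lowercases the rest, while str.swapcase() inverts the case of every letter.

#convert only the first letter of each column name to uppercase
for col in df_employee.columns:
    df_employee = df_employee.withColumnRenamed(col, col.capitalize())

#print column names
df_employee.printSchema()

root
 |-- Emp_id: string (nullable = true)
 |-- Emp_name: string (nullable = true)
 |-- Emp_dept: string (nullable = true)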
