Requirement: Change column names to uppercase or lowercase in PySpark
Create a dummy DataFrame
#create a dataframe with sample values
columns = ["Emp_id","Emp_name","Emp_dept"]
data = [("1", "Falcon","Admin"), ("2", "Winter Soldier","HR"), ("3","Wanda", "Technology"),("4","Vision","Data Analytics")]
rdd = spark.sparkContext.parallelize(data)
df_employee = rdd.toDF(columns)
df_employee.printSchema()
root
|-- Emp_id: string (nullable = true)
|-- Emp_name: string (nullable = true)
 |-- Emp_dept: string (nullable = true)

Convert column names to uppercase in PySpark
You can use the "withColumnRenamed" function in a for loop, together with Python's "upper" string method, to change all the column names of a PySpark DataFrame to uppercase.
#convert all column names to uppercase
for col in df_employee.columns:
    df_employee = df_employee.withColumnRenamed(col, col.upper())
#print column names
df_employee.printSchema()
root
|-- EMP_ID: string (nullable = true)
|-- EMP_NAME: string (nullable = true)
 |-- EMP_DEPT: string (nullable = true)

Convert column names to lowercase in PySpark
You can use the "withColumnRenamed" function in a for loop, together with Python's "lower" string method, to change all the column names of a PySpark DataFrame to lowercase.
#convert all column names to lowercase
for col in df_employee.columns:
    df_employee = df_employee.withColumnRenamed(col, col.lower())
#print column names
df_employee.printSchema()
root
|-- emp_id: string (nullable = true)
|-- emp_name: string (nullable = true)
 |-- emp_dept: string (nullable = true)

You can also use the "swapcase" or "capitalize" string method in place of "upper" or "lower", as the requirement demands.

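Since "upper", "lower", "swapcase", and "capitalize" are plain Python string methods, their effect on a column name can be checked without Spark:

```python
name = "Emp_dept"
print(name.upper())       # EMP_DEPT
print(name.lower())       # emp_dept
print(name.swapcase())    # eMP_DEPT  (flips the case of every letter)
print(name.capitalize())  # Emp_dept  (first letter upper, rest lower)
```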