Requirement: To change column names to upper case or lower case in PySpark
Create a dummy dataframe
#create a dataframe with sample values
columns = ["Emp_id", "Emp_name", "Emp_dept"]
data = [("1", "Falcon", "Admin"),
        ("2", "Winter Soldier", "HR"),
        ("3", "Wanda", "Technology"),
        ("4", "Vision", "Data Analytics")]
rdd = spark.sparkContext.parallelize(data)
df_employee = rdd.toDF(columns)
df_employee.printSchema()

root
 |-- Emp_id: string (nullable = true)
 |-- Emp_name: string (nullable = true)
 |-- Emp_dept: string (nullable = true)
Convert column names to uppercase in PySpark
You can rename every column in a PySpark dataframe to uppercase by calling the “withColumnRenamed” function in a FOR loop, passing each name through Python’s “upper” string method.
#convert all column names to uppercase
for col in df_employee.columns:
    df_employee = df_employee.withColumnRenamed(col, col.upper())

#print column names
df_employee.printSchema()

root
 |-- EMP_ID: string (nullable = true)
 |-- EMP_NAME: string (nullable = true)
 |-- EMP_DEPT: string (nullable = true)
Convert column names to lowercase in PySpark
Similarly, you can rename every column to lowercase by calling the “withColumnRenamed” function in a FOR loop, this time passing each name through Python’s “lower” string method.
#convert all column names to lowercase
for col in df_employee.columns:
    df_employee = df_employee.withColumnRenamed(col, col.lower())

#print column names
df_employee.printSchema()

root
 |-- emp_id: string (nullable = true)
 |-- emp_name: string (nullable = true)
 |-- emp_dept: string (nullable = true)
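The loop above calls “withColumnRenamed” once per column. An alternative (a sketch not shown in the original) is to compute all the new names first and then rename every column in a single call to “toDF”. The name computation itself is plain Python, shown here on a sample name list:

```python
# Sample column names as they would appear in df_employee.columns
columns = ["EMP_ID", "EMP_NAME", "EMP_DEPT"]

# Build the full list of lowercase names up front
lower_names = [c.lower() for c in columns]
print(lower_names)  # ['emp_id', 'emp_name', 'emp_dept']

# With a live dataframe you would then apply them in one call
# (df_employee is assumed to exist, as in the example above):
# df_employee = df_employee.toDF(*lower_names)
```

This avoids rebuilding the dataframe plan once per column, which can matter when a dataframe has many columns.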
You can also use the “swapcase” or “capitalize” string method in place of “upper” or “lower”, as the requirement demands.
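Since these are ordinary Python string methods, you can see what each one would do to a column name before wiring it into the rename loop. A quick plain-Python sketch on a sample name:

```python
name = "Emp_name"

print(name.upper())        # EMP_NAME  - everything uppercase
print(name.lower())        # emp_name  - everything lowercase
print(name.swapcase())     # eMP_NAME  - flips the case of each character

# capitalize uppercases the first character and lowercases the rest
print("EMP_NAME".capitalize())  # Emp_name
```

Swapping any of these methods into the FOR loop (e.g. `col.swapcase()` instead of `col.upper()`) changes the renaming rule accordingly.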