
Spark DataFrame Column List

Recently I was working on a task where I needed the list of a Spark DataFrame's column names in a variable, so that further processing could depend on whether certain technical columns were present. You can print the schema of a DataFrame with the printSchema method, which shows the columns as a tree along with each column's data type and nullability. Example:

scala> df_pres.printSchema()
 |-- pres_id: byte (nullable = true)
 |-- pres_name: string (nullable = true)
 |-- pres_dob: date (nullable = true)
 |-- pres_bp: string (nullable = true)
 |-- pres_bs: string (nullable = true)
 |-- pres_in: date (nullable = true)
 |-- pres_out: date (nullable = true)

To fetch just the column names, use “columns”, which returns all the column names in the DataFrame as an Array of Strings.

Dataframe Columns

scala> df_pres.columns
res8: Array[String] = Array(pres_id, pres_name, pres_dob, pres_bp, pres_bs, pres_in, pres_out)
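
Since “columns” is already an Array[String], checks for specific columns can be run on it directly. The sketch below is plain Scala with a hand-written array standing in for df_pres.columns, so it runs without a Spark session. The technicalCols list (including load_ts) is a made-up example, not something taken from the original task.

```scala
// Stand-in for df_pres.columns, which returns Array[String].
val columns: Array[String] =
  Array("pres_id", "pres_name", "pres_dob", "pres_bp", "pres_bs", "pres_in", "pres_out")

// Hypothetical list of technical columns to look for (assumption for illustration).
val technicalCols = Seq("pres_in", "pres_out", "load_ts")

// Split the wanted columns into those present in the array and those missing.
val (present, missing) = technicalCols.partition(c => columns.contains(c))

println(s"present: ${present.mkString(",")}") // present: pres_in,pres_out
println(s"missing: ${missing.mkString(",")}") // missing: load_ts
```

Working on the array before joining it keeps each name as a separate element, which avoids substring matches that a search in the joined String could produce.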

The requirement was to get this information into a single variable. We can convert the Array of Strings to one String with the “mkString” method, which joins the elements using the separator passed to it and returns a “String”.

scala> df_pres.columns.mkString(",")
res11: String = pres_id,pres_name,pres_dob,pres_bp,pres_bs,pres_in,pres_out

I wanted the column list to be comma-separated. Let’s store this output in a variable to be used later for processing.

scala> var ColList = df_pres.columns.mkString(",")
ColList: String = pres_id,pres_name,pres_dob,pres_bp,pres_bs,pres_in,pres_out

To check the value of this variable, print it:

scala> print (ColList)
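
One possible use of the comma-separated String later on is to interpolate it into a query string. This is only a sketch in plain Scala: the colList value is copied from the output above, and the table name presidents is a hypothetical placeholder, not a table from this post.

```scala
// Comma-separated column list, as produced by df_pres.columns.mkString(",").
val colList = "pres_id,pres_name,pres_dob,pres_bp,pres_bs,pres_in,pres_out"

// Hypothetical table name (assumption for illustration).
val tableName = "presidents"

// Build a SELECT statement from the stored column list.
val query = s"SELECT $colList FROM $tableName"

println(query)
```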

The argument to mkString is the separator, so you can change the delimiter. Below we use “|” in place of “,”:

scala> var ColList = df_pres.columns.mkString("|")
ColList: String = pres_id|pres_name|pres_dob|pres_bp|pres_bs|pres_in|pres_out

scala> print (ColList)
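
mkString also has a three-argument form that takes a prefix, a separator and a suffix, and a delimited String can be turned back into an array with split. The sketch below is plain Scala, again with a hand-written array standing in for df_pres.columns. It passes the Char '|' to split rather than the String "|", because split with a String argument treats it as a regular expression, where | has a special meaning.

```scala
// Stand-in for df_pres.columns, which returns Array[String].
val columns: Array[String] =
  Array("pres_id", "pres_name", "pres_dob", "pres_bp", "pres_bs", "pres_in", "pres_out")

// Join with a pipe delimiter, as in the example above.
val piped = columns.mkString("|")

// Three-argument form: prefix, separator, suffix.
val bracketed = columns.mkString("[", ", ", "]")

// Split on the Char '|' to recover the individual column names.
val roundTrip: Array[String] = piped.split('|')

println(piped)
println(bracketed)
println(roundTrip.sameElements(columns)) // true
```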

This way we can fetch all the columns present in a DataFrame and store them in a variable with the desired delimiter.
