Spark DataFrame Column List

Recently I was working on a task where I needed the column list of a Spark DataFrame in a variable, so that further processing could depend on whether certain technical columns were present. You can print the schema of a DataFrame with the printSchema method, which shows the columns as a tree along with each column's data type and nullability. Example:

scala> df_pres.printSchema()
root
 |-- pres_id: byte (nullable = true)
 |-- pres_name: string (nullable = true)
 |-- pres_dob: date (nullable = true)
 |-- pres_bp: string (nullable = true)
 |-- pres_bs: string (nullable = true)
 |-- pres_in: date (nullable = true)
 |-- pres_out: date (nullable = true)

To fetch the column names, use “columns”, which returns all the column names in the DataFrame as an array of strings. Example:

scala> df_pres.columns
res8: Array[String] = Array(pres_id, pres_name, pres_dob, pres_bp, pres_bs, pres_in, pres_out)
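
Because “columns” returns a plain Array[String], ordinary Scala collection methods work on it. The sketch below shows one possible way to check which technical columns are present. The array is a stand-in copied from the output above so that the snippet runs without a Spark session, and the names in technicalCols (including load_ts) are hypothetical, chosen only for illustration.

```scala
// Stand-in for df_pres.columns, copied from the output above
val cols: Array[String] =
  Array("pres_id", "pres_name", "pres_dob", "pres_bp", "pres_bs", "pres_in", "pres_out")

// Hypothetical technical columns (load_ts is deliberately not in the DataFrame)
val technicalCols = Seq("pres_in", "pres_out", "load_ts")

// Technical columns that are actually present in the column list
val present = technicalCols.filter(c => cols.contains(c))
// Technical columns that are missing from the column list
val missing = technicalCols.filterNot(c => cols.contains(c))

println(s"present: ${present.mkString(",")}")   // present: pres_in,pres_out
println(s"missing: ${missing.mkString(",")}")   // missing: load_ts
```

With a real DataFrame, replacing the stand-in array with df_pres.columns should be the only change needed.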

The requirement was to get this information into a single variable. The “mkString” method converts the array of strings into one String, and it takes the separator to place between elements as an argument. I wanted the column list to be comma-separated. Example:

scala> df_pres.columns.mkString(",")
res11: String = pres_id,pres_name,pres_dob,pres_bp,pres_bs,pres_in,pres_out
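
mkString also has a three-argument form that takes a prefix, a separator, and a suffix. As a small sketch, this could be used to wrap the list in brackets or to build a query string. The array again stands in for df_pres.columns, and the table name pres is an assumption made for illustration only.

```scala
// Stand-in for df_pres.columns, copied from the output above
val cols: Array[String] =
  Array("pres_id", "pres_name", "pres_dob", "pres_bp", "pres_bs", "pres_in", "pres_out")

// mkString(start, sep, end): prefix, separator, suffix
val bracketed = cols.mkString("[", ",", "]")

// Hypothetical table name "pres", used only to show the prefix/suffix form
val selectSql = cols.mkString("SELECT ", ", ", " FROM pres")

println(bracketed)
println(selectSql)
```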

Let’s store this output in a variable to be used later for processing.

scala> var ColList = df_pres.columns.mkString(",")
ColList: String = pres_id,pres_name,pres_dob,pres_bp,pres_bs,pres_in,pres_out

To check the value of this variable, print it.

scala> print (ColList)

You can change the delimiter too. Below we use “|” as the delimiter in place of “,”.

scala> var ColList = df_pres.columns.mkString("|")
ColList: String = pres_id|pres_name|pres_dob|pres_bp|pres_bs|pres_in|pres_out

scala> print (ColList)
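
If later processing needs the individual names again, the string can be split back into an array. One caveat worth keeping in mind: split with a String argument treats it as a regular expression, so “|” has to be escaped, while the Char overload splits on the literal character. A minimal sketch, with the string copied from the output above into a new value named colList:

```scala
// Pipe-delimited column list, copied from the output above
val colList = "pres_id|pres_name|pres_dob|pres_bp|pres_bs|pres_in|pres_out"

// Char overload: splits on the literal '|'
val byChar: Array[String] = colList.split('|')

// String overload is interpreted as a regex, so the pipe must be escaped
val byRegex: Array[String] = colList.split("\\|")

println(byChar.length)                  // 7
println(byChar.toSeq == byRegex.toSeq)  // true
```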

This way we can fetch all the columns present in a DataFrame and store them in a variable with the desired delimiter.
