SPARK Dataframe Alias AS

An ALIAS is defined to make a column or table name more readable, or simply shorter. You may want to rename columns while displaying results to the user, and if you use tables in joins, table aliases keep the join conditions short and easier to write. In SQL you may also have to give an alias to a DERIVED table. Now let's see how to give alias names to columns or tables in Spark SQL. We will use the alias() function with column names and table names. If you recall the "SELECT" query from our previous post, we will add aliases to the same query and see the output.
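As a quick sketch of the join scenario mentioned above: short dataframe aliases let you disambiguate columns in the join condition. The second dataframe df_states and its state_name column are assumptions for illustration; run this in spark-shell, where spark.implicits._ (for the $ syntax) is already in scope.

```scala
// df_states with a state_name column is hypothetical, used only for illustration.
val p = df_pres.alias("p")
val s = df_states.alias("s")

// The short aliases "p" and "s" keep the join condition readable.
p.join(s, $"p.pres_bs" === $"s.state_name")
  .select($"p.pres_id", $"p.pres_bs")
  .show()
```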
Original Query:

scala> df_pres.select($"pres_id",$"pres_dob",$"pres_bs").show()

Query with Alias

scala> df_pres.alias("President table").select($"pres_id",$"pres_dob".alias("Date Of Birth"),$"pres_bs").show()
+-------+-------------+--------------------+
|pres_id|Date Of Birth|             pres_bs|
+-------+-------------+--------------------+
|      1|   1732-02-22|            Virginia|
|      2|   1735-10-30|       Massachusetts|
|      3|   1743-04-13|            Virginia|
|      4|   1751-03-16|            Virginia|
|      5|   1758-04-28|            Virginia|
|      6|   1767-07-11|       Massachusetts|
|      7|   1767-03-15|South/North Carolina|
|      8|   1782-12-05|            New York|
|      9|   1773-02-09|            Virginia|
|     10|   1790-03-29|            Virginia|
|     11|   1795-11-02|      North Carolina|
|     12|   1784-11-24|            Virginia|
|     13|   1800-01-07|            New York|
|     14|   1804-11-23|       New Hampshire|
|     15|   1791-04-23|        Pennsylvania|
|     16|   1809-02-12|            Kentucky|
|     17|   1808-12-29|      North Carolina|
|     18|   1822-04-27|                Ohio|
|     19|   1822-10-04|                Ohio|
|     20|   1831-11-19|                Ohio|
+-------+-------------+--------------------+
only showing top 20 rows

We have used "President table" as the table alias and "Date Of Birth" as the column alias in the query above. You could also use as() in place of alias(). In the next post we will see how to use WHERE, i.e. apply a filter to a Spark SQL DataFrame.
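The same query rewritten with as() in place of alias() produces identical output to the table above:

```scala
// as() is an alias (no pun intended) for alias() on Column.
df_pres.alias("President table")
  .select($"pres_id", $"pres_dob".as("Date Of Birth"), $"pres_bs")
  .show()
```

Note that a column alias containing spaces, such as Date Of Birth, must be wrapped in backticks if you refer to it later in a Spark SQL expression.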


3 thoughts on "SPARK Dataframe Alias AS"

  1. I have to handle a scenario in which I need to deal with column names dynamically. I have created a mapping JSON file and use it to keep track of the column name changes.
    I have a DF with two columns, Last_Name and First_Name.
    val columnvalue = "Last_Name"
    I fetch the Last_Name from the dataframe as below:
    df.select(s"$columnvalue") and it works properly, but if I need to give an alias to this column, how can that be done?
    Any thoughts here? I am not able to do this and want it for one of the projects I am working on.

    1. Hi Nikunj
      Please try this:
      df_pres.select(column(s"$columnvalue").as("test_alias"))

      Hope it helps.
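Building on the reply above, here is a hedged sketch of driving several renames at once from a mapping. The Map literal stands in for the mapping JSON file Nikunj mentioned (an assumption), and df is the two-column dataframe from the question.

```scala
import org.apache.spark.sql.functions.col

// This Map stands in for the mapping JSON file (assumption for illustration).
val renames = Map("Last_Name" -> "surname", "First_Name" -> "given_name")

// Build one aliased Column per mapping entry, then select them all at once.
val renamed = df.select(renames.map { case (from, to) => col(from).as(to) }.toSeq: _*)
```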

