SPARK Dataframe Alias AS

An alias makes column or table names shorter and more readable. You may want to rename columns when displaying results to the user, and when using tables in joins an alias for each table keeps the join conditions concise. In SQL you may also need to give an alias to a DERIVED table. Now let's see how to give alias names to columns or tables in Spark SQL. We will use the alias() function on column names and table names. If you recall the "SELECT" query from our previous post, we will add aliases to the same query and see the output.
Original Query:

scala> df_pres.select($"pres_id",$"pres_dob",$"pres_bs").show()

Query with Alias:

scala> df_pres.alias("President table").select($"pres_id",$"pres_dob".alias("Date Of Birth"),$"pres_bs").show()
+-------+-------------+--------------------+
|pres_id|Date Of Birth|             pres_bs|
+-------+-------------+--------------------+
|      1|   1732-02-22|            Virginia|
|      2|   1735-10-30|       Massachusetts|
|      3|   1743-04-13|            Virginia|
|      4|   1751-03-16|            Virginia|
|      5|   1758-04-28|            Virginia|
|      6|   1767-07-11|       Massachusetts|
|      7|   1767-03-15|South/North Carolina|
|      8|   1782-12-05|            New York|
|      9|   1773-02-09|            Virginia|
|     10|   1790-03-29|            Virginia|
|     11|   1795-11-02|      North Carolina|
|     12|   1784-11-24|            Virginia|
|     13|   1800-01-07|            New York|
|     14|   1804-11-23|       New Hampshire|
|     15|   1791-04-23|        Pennsylvania|
|     16|   1809-02-12|            Kentucky|
|     17|   1808-12-29|      North Carolina|
|     18|   1822-04-27|                Ohio|
|     19|   1822-10-04|                Ohio|
|     20|   1831-11-19|                Ohio|
+-------+-------------+--------------------+
only showing top 20 rows

We have used "President table" as the table alias and "Date Of Birth" as the column alias in the query above. You could also use "as()" in place of "alias()". In the next post we will see how to use WHERE, i.e. how to apply a filter to a Spark SQL DataFrame.
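A table alias earns its keep in joins, where it disambiguates columns that exist on both sides. The following sketch assumes the same df_pres DataFrame and an active SparkSession named spark; the self-join matching presidents born in the same state is a hypothetical example, not from the original post.

```scala
import org.apache.spark.sql.functions.col
import spark.implicits._ // enables the $"colName" syntax

// as() is interchangeable with alias() for columns:
df_pres.select($"pres_id", $"pres_dob".as("Date Of Birth"), $"pres_bs").show()

// Alias each side of a self-join, then qualify columns with the alias.
// This hypothetical query pairs presidents born in the same state:
val a = df_pres.alias("a")
val b = df_pres.alias("b")
a.join(b, col("a.pres_bs") === col("b.pres_bs") && col("a.pres_id") =!= col("b.pres_id"))
 .select(col("a.pres_id"), col("b.pres_id"), col("a.pres_bs"))
 .show()
```

Without the "a"/"b" aliases, referring to pres_bs in the join condition would be ambiguous, since both DataFrames carry a column of that name.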
