Apache Spark

Spark Dataframe WHEN case

In SQL, if we have to check multiple conditions for any column value then we use case statement. In Spark SQL dataframes also we can replicate same functionality by using WHEN clause multiple times, once for each conditional check. No requirement to add CASE keyword though. So let's see an example to see how to check for multiple conditions and replicate SQL CASE statement in Spark SQL.

    when($"pres_bs" === "Virginia", "VA")
      .when($"pres_bs" === "Massachusetts", "MA")
      .when($"pres_bs" === "Ohio", "OH")

|           pres_name|  pres_dob|             pres_bs|state_abbr|
|   George Washington|1732-02-22|            Virginia|        VA|
|          John Adams|1735-10-30|       Massachusetts|        MA|
|    Thomas Jefferson|1743-04-13|            Virginia|        VA|
|       James Madison|1751-03-16|            Virginia|        VA|
|        James Monroe|1758-04-28|            Virginia|        VA|
|   John Quincy Adams|1767-07-11|       Massachusetts|        MA|
|      Andrew Jackson|1767-03-15|South/North Carolina|    Others|
|    Martin Van Buren|1782-12-05|            New York|    Others|
|William Henry Har...|1773-02-09|            Virginia|        VA|
|          John Tyler|1790-03-29|            Virginia|        VA|
|       James K. Polk|1795-11-02|      North Carolina|    Others|
|      Zachary Taylor|1784-11-24|            Virginia|        VA|
|    Millard Fillmore|1800-01-07|            New York|    Others|
|     Franklin Pierce|1804-11-23|       New Hampshire|    Others|
|      James Buchanan|1791-04-23|        Pennsylvania|    Others|
|     Abraham Lincoln|1809-02-12|            Kentucky|    Others|
|      Andrew Johnson|1808-12-29|      North Carolina|    Others|
|    Ulysses S. Grant|1822-04-27|                Ohio|        OH|
| Rutherford B. Hayes|1822-10-04|                Ohio|        OH|
|   James A. Garfield|1831-11-19|                Ohio|        OH|
only showing top 20 rows

Spark CASE is similar to IF…ELSE IF…ELSE statement in common programming language. If multiple when statement is true, the first one will be executed and remaining will be ignored for any given row. In Spark, ELSE is denoted by "otherwise".

1 thought on “Spark Dataframe WHEN case”

  1. Hi Nitin,

    How can we add more conditions within single WHEN clause?
    For example –
    when(condition1 && condition2 , "this value").otherwise("that value")

    Thank you in advance.

Leave a Comment

Your email address will not be published.