
Spark Dataframe WHEN case

In SQL, when we need to check multiple conditions against a column value, we use a CASE statement. In Spark SQL DataFrames we can replicate the same functionality by chaining the WHEN clause, once per conditional check; no CASE keyword is required. Let's walk through an example of checking multiple conditions to replicate a SQL CASE statement in Spark SQL.

    df.withColumn("state_abbr",
      when($"pres_bs" === "Virginia", "VA")
        .when($"pres_bs" === "Massachusetts", "MA")
        .when($"pres_bs" === "Ohio", "OH")
        .otherwise("Others"))

|           pres_name|  pres_dob|             pres_bs|state_abbr|
|   George Washington|1732-02-22|            Virginia|        VA|
|          John Adams|1735-10-30|       Massachusetts|        MA|
|    Thomas Jefferson|1743-04-13|            Virginia|        VA|
|       James Madison|1751-03-16|            Virginia|        VA|
|        James Monroe|1758-04-28|            Virginia|        VA|
|   John Quincy Adams|1767-07-11|       Massachusetts|        MA|
|      Andrew Jackson|1767-03-15|South/North Carolina|    Others|
|    Martin Van Buren|1782-12-05|            New York|    Others|
|William Henry Har...|1773-02-09|            Virginia|        VA|
|          John Tyler|1790-03-29|            Virginia|        VA|
|       James K. Polk|1795-11-02|      North Carolina|    Others|
|      Zachary Taylor|1784-11-24|            Virginia|        VA|
|    Millard Fillmore|1800-01-07|            New York|    Others|
|     Franklin Pierce|1804-11-23|       New Hampshire|    Others|
|      James Buchanan|1791-04-23|        Pennsylvania|    Others|
|     Abraham Lincoln|1809-02-12|            Kentucky|    Others|
|      Andrew Johnson|1808-12-29|      North Carolina|    Others|
|    Ulysses S. Grant|1822-04-27|                Ohio|        OH|
| Rutherford B. Hayes|1822-10-04|                Ohio|        OH|
|   James A. Garfield|1831-11-19|                Ohio|        OH|
only showing top 20 rows
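For comparison, the same mapping can also be written as a literal SQL CASE expression passed through `expr()`. This is a sketch assuming the DataFrame is named `df` and has the `pres_bs` column shown above:

```scala
import org.apache.spark.sql.functions.expr

// Embed a plain SQL CASE expression inside the DataFrame API.
// Assumes a DataFrame `df` with a `pres_bs` column, as in the output above.
val withAbbr = df.withColumn("state_abbr", expr(
  """CASE WHEN pres_bs = 'Virginia'      THEN 'VA'
          WHEN pres_bs = 'Massachusetts' THEN 'MA'
          WHEN pres_bs = 'Ohio'          THEN 'OH'
          ELSE 'Others' END"""))
```

Both forms produce the same plan; the chained `when` style keeps everything in Scala, while `expr()` is handy when the CASE logic already exists as SQL.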

Spark's CASE is similar to an IF…ELSE IF…ELSE statement in common programming languages. If multiple WHEN conditions are true for a given row, the first matching one is applied and the rest are ignored. In Spark, ELSE is expressed with "otherwise".
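The first-match behaviour is easy to see with deliberately overlapping conditions. A sketch, again assuming a DataFrame `df` with a `pres_bs` column (and `import spark.implicits._` for the `$` syntax):

```scala
import org.apache.spark.sql.functions.when

// Both conditions below are true for Virginia rows, but only the first
// branch fires: every Virginia row gets "first", never "second".
val ordered = df.withColumn("which",
  when($"pres_bs" === "Virginia", "first")
    .when($"pres_bs".startsWith("V"), "second") // also true for Virginia, but ignored
    .otherwise("neither"))
```

Swapping the order of the two `when` calls would change the result, so put the most specific condition first.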

2 thoughts on “Spark Dataframe WHEN case”

  1. try something like
    df.withColumn("type", when(col("flag1"), lit("type_1")).when(!col("flag1") && (col("flag2") || col("flag3") || col("flag4") || col("flag5")), lit("type2")).otherwise(lit("other")))

  2. Hi Nitin,

    How can we add more conditions within a single WHEN clause?
    For example –
    when(condition1 && condition2, "this value").otherwise("that value")

    Thank you in advance.
