Apache Spark

Spark Dataframe Replace String

Replace String – TRANSLATE & REGEXP_REPLACE

It is very common sql operation to replace a character in a string with other character or you may want to replace string with other string . This is possible in Spark SQL Dataframe easily using regexp_replace or translate function.

Spark TRANSLATE function

If we want to replace any given character in String with some other character then use Translate to change that character value.

scala> df_pres.select($"pres_name",translate($"pres_name","J","Z").as("new_name")).show()

+--------------------+--------------------+
|           pres_name|            new_name|
+--------------------+--------------------+
|   George Washington|   George Washington|
|          John Adams|          Zohn Adams|
|    Thomas Jefferson|    Thomas Zefferson|
|       James Madison|       Zames Madison|
|        James Monroe|        Zames Monroe|
|   John Quincy Adams|   Zohn Quincy Adams|
|      Andrew Jackson|      Andrew Zackson|
|    Martin Van Buren|    Martin Van Buren|
|William Henry Har...|William Henry Har...|
|          John Tyler|          Zohn Tyler|
|       James K. Polk|       Zames K. Polk|
|      Zachary Taylor|      Zachary Taylor|
|    Millard Fillmore|    Millard Fillmore|
|     Franklin Pierce|     Franklin Pierce|
|      James Buchanan|      Zames Buchanan|
|     Abraham Lincoln|     Abraham Lincoln|
|      Andrew Johnson|      Andrew Zohnson|
|    Ulysses S. Grant|    Ulysses S. Grant|
| Rutherford B. Hayes| Rutherford B. Hayes|
|   James A. Garfield|   Zames A. Garfield|
+--------------------+--------------------+
only showing top 20 rows

So in the above example I have replace “J” with “Z” and you can see it in the example too.

Spark REGEXP_REPLACE function

If the requirement is to replace not just one character but complete string. Then we can use regexp_replace function to achieve it.

scala> df_pres.select($"pres_name",regexp_replace($"pres_name","James","Martin").as("new_name")).show()
                                                                                    
+--------------------+--------------------+
|           pres_name|            new_name|
+--------------------+--------------------+
|   George Washington|   George Washington|
|          John Adams|          John Adams|
|    Thomas Jefferson|    Thomas Jefferson|
|       James Madison|      Martin Madison|
|        James Monroe|       Martin Monroe|
|   John Quincy Adams|   John Quincy Adams|
|      Andrew Jackson|      Andrew Jackson|
|    Martin Van Buren|    Martin Van Buren|
|William Henry Har...|William Henry Har...|
|          John Tyler|          John Tyler|
|       James K. Polk|      Martin K. Polk|
|      Zachary Taylor|      Zachary Taylor|
|    Millard Fillmore|    Millard Fillmore|
|     Franklin Pierce|     Franklin Pierce|
|      James Buchanan|     Martin Buchanan|
|     Abraham Lincoln|     Abraham Lincoln|
|      Andrew Johnson|      Andrew Johnson|
|    Ulysses S. Grant|    Ulysses S. Grant|
| Rutherford B. Hayes| Rutherford B. Hayes|
|   James A. Garfield|  Martin A. Garfield|
+--------------------+--------------------+
only showing top 20 rows

In the above example we have used regexp_replace and changed “James” with “Martin”.

Leave a Reply