Spark Dataframe Replace String

It is very common sql operation to replace a character in a string with other character or you may want to replace string with other string . This is possible in Spark SQL Dataframe easily using regexp_replace or translate function. Let’s see if we want to replace any given character in String with some other character then how Translate can help us.

scala> df_pres.select($"pres_name",translate($"pres_name","J","Z").as("new_name")).show()
+--------------------+--------------------+
|           pres_name|            new_name|
+--------------------+--------------------+
|   George Washington|   George Washington|
|          John Adams|          Zohn Adams|
|    Thomas Jefferson|    Thomas Zefferson|
|       James Madison|       Zames Madison|
|        James Monroe|        Zames Monroe|
|   John Quincy Adams|   Zohn Quincy Adams|
|      Andrew Jackson|      Andrew Zackson|
|    Martin Van Buren|    Martin Van Buren|
|William Henry Har...|William Henry Har...|
|          John Tyler|          Zohn Tyler|
|       James K. Polk|       Zames K. Polk|
|      Zachary Taylor|      Zachary Taylor|
|    Millard Fillmore|    Millard Fillmore|
|     Franklin Pierce|     Franklin Pierce|
|      James Buchanan|      Zames Buchanan|
|     Abraham Lincoln|     Abraham Lincoln|
|      Andrew Johnson|      Andrew Zohnson|
|    Ulysses S. Grant|    Ulysses S. Grant|
| Rutherford B. Hayes| Rutherford B. Hayes|
|   James A. Garfield|   Zames A. Garfield|
+--------------------+--------------------+
only showing top 20 rows

So in the above example I have replace “J” with “Z” and you can see it in the example too.
However many times the requirement will be to replace not just one character but complete string. Then we can use regexp_replace function to achieve it.

scala> df_pres.select($"pres_name",regexp_replace($"pres_name","James","Martin").as("new_name")).show()
+--------------------+--------------------+
|           pres_name|            new_name|
+--------------------+--------------------+
|   George Washington|   George Washington|
|          John Adams|          John Adams|
|    Thomas Jefferson|    Thomas Jefferson|
|       James Madison|      Martin Madison|
|        James Monroe|       Martin Monroe|
|   John Quincy Adams|   John Quincy Adams|
|      Andrew Jackson|      Andrew Jackson|
|    Martin Van Buren|    Martin Van Buren|
|William Henry Har...|William Henry Har...|
|          John Tyler|          John Tyler|
|       James K. Polk|      Martin K. Polk|
|      Zachary Taylor|      Zachary Taylor|
|    Millard Fillmore|    Millard Fillmore|
|     Franklin Pierce|     Franklin Pierce|
|      James Buchanan|     Martin Buchanan|
|     Abraham Lincoln|     Abraham Lincoln|
|      Andrew Johnson|      Andrew Johnson|
|    Ulysses S. Grant|    Ulysses S. Grant|
| Rutherford B. Hayes| Rutherford B. Hayes|
|   James A. Garfield|  Martin A. Garfield|
+--------------------+--------------------+
only showing top 20 rows

In the above example we have used regexp_replace and changed “James” with “Martin”.

Leave a Reply

Your email address will not be published. Required fields are marked *