Replace String – TRANSLATE & REGEXP_REPLACE
It is very common sql operation to replace a character in a string with other character or you may want to replace string with other string . This is possible in Spark SQL Dataframe easily using regexp_replace or translate function.
Spark TRANSLATE function
If we want to replace any given character in String with some other character then use Translate to change that character value.
scala> df_pres.select($"pres_name",translate($"pres_name","J","Z").as("new_name")).show()
+--------------------+--------------------+
| pres_name| new_name|
+--------------------+--------------------+
| George Washington| George Washington|
| John Adams| Zohn Adams|
| Thomas Jefferson| Thomas Zefferson|
| James Madison| Zames Madison|
| James Monroe| Zames Monroe|
| John Quincy Adams| Zohn Quincy Adams|
| Andrew Jackson| Andrew Zackson|
| Martin Van Buren| Martin Van Buren|
|William Henry Har...|William Henry Har...|
| John Tyler| Zohn Tyler|
| James K. Polk| Zames K. Polk|
| Zachary Taylor| Zachary Taylor|
| Millard Fillmore| Millard Fillmore|
| Franklin Pierce| Franklin Pierce|
| James Buchanan| Zames Buchanan|
| Abraham Lincoln| Abraham Lincoln|
| Andrew Johnson| Andrew Zohnson|
| Ulysses S. Grant| Ulysses S. Grant|
| Rutherford B. Hayes| Rutherford B. Hayes|
| James A. Garfield| Zames A. Garfield|
+--------------------+--------------------+
only showing top 20 rows
So in the above example I have replace “J” with “Z” and you can see it in the example too.
Spark REGEXP_REPLACE function
If the requirement is to replace not just one character but complete string. Then we can use regexp_replace function to achieve it.
scala> df_pres.select($"pres_name",regexp_replace($"pres_name","James","Martin").as("new_name")).show()
+--------------------+--------------------+
| pres_name| new_name|
+--------------------+--------------------+
| George Washington| George Washington|
| John Adams| John Adams|
| Thomas Jefferson| Thomas Jefferson|
| James Madison| Martin Madison|
| James Monroe| Martin Monroe|
| John Quincy Adams| John Quincy Adams|
| Andrew Jackson| Andrew Jackson|
| Martin Van Buren| Martin Van Buren|
|William Henry Har...|William Henry Har...|
| John Tyler| John Tyler|
| James K. Polk| Martin K. Polk|
| Zachary Taylor| Zachary Taylor|
| Millard Fillmore| Millard Fillmore|
| Franklin Pierce| Franklin Pierce|
| James Buchanan| Martin Buchanan|
| Abraham Lincoln| Abraham Lincoln|
| Andrew Johnson| Andrew Johnson|
| Ulysses S. Grant| Ulysses S. Grant|
| Rutherford B. Hayes| Rutherford B. Hayes|
| James A. Garfield| Martin A. Garfield|
+--------------------+--------------------+
only showing top 20 rowsIn the above example we have used regexp_replace and changed “James” with “Martin”.
