Replace String – TRANSLATE & REGEXP_REPLACE
It is very common sql operation to replace a character in a string with other character or you may want to replace string with other string . This is possible in Spark SQL Dataframe easily using regexp_replace or translate function.
Spark TRANSLATE function
If we want to replace any given character in String with some other character then use Translate to change that character value.
scala> df_pres.select($"pres_name",translate($"pres_name","J","Z").as("new_name")).show() +--------------------+--------------------+ | pres_name| new_name| +--------------------+--------------------+ | George Washington| George Washington| | John Adams| Zohn Adams| | Thomas Jefferson| Thomas Zefferson| | James Madison| Zames Madison| | James Monroe| Zames Monroe| | John Quincy Adams| Zohn Quincy Adams| | Andrew Jackson| Andrew Zackson| | Martin Van Buren| Martin Van Buren| |William Henry Har...|William Henry Har...| | John Tyler| Zohn Tyler| | James K. Polk| Zames K. Polk| | Zachary Taylor| Zachary Taylor| | Millard Fillmore| Millard Fillmore| | Franklin Pierce| Franklin Pierce| | James Buchanan| Zames Buchanan| | Abraham Lincoln| Abraham Lincoln| | Andrew Johnson| Andrew Zohnson| | Ulysses S. Grant| Ulysses S. Grant| | Rutherford B. Hayes| Rutherford B. Hayes| | James A. Garfield| Zames A. Garfield| +--------------------+--------------------+ only showing top 20 rows
So in the above example I have replace “J” with “Z” and you can see it in the example too.
Spark REGEXP_REPLACE function
If the requirement is to replace not just one character but complete string. Then we can use regexp_replace function to achieve it.
scala> df_pres.select($"pres_name",regexp_replace($"pres_name","James","Martin").as("new_name")).show() +--------------------+--------------------+ | pres_name| new_name| +--------------------+--------------------+ | George Washington| George Washington| | John Adams| John Adams| | Thomas Jefferson| Thomas Jefferson| | James Madison| Martin Madison| | James Monroe| Martin Monroe| | John Quincy Adams| John Quincy Adams| | Andrew Jackson| Andrew Jackson| | Martin Van Buren| Martin Van Buren| |William Henry Har...|William Henry Har...| | John Tyler| John Tyler| | James K. Polk| Martin K. Polk| | Zachary Taylor| Zachary Taylor| | Millard Fillmore| Millard Fillmore| | Franklin Pierce| Franklin Pierce| | James Buchanan| Martin Buchanan| | Abraham Lincoln| Abraham Lincoln| | Andrew Johnson| Andrew Johnson| | Ulysses S. Grant| Ulysses S. Grant| | Rutherford B. Hayes| Rutherford B. Hayes| | James A. Garfield| Martin A. Garfield| +--------------------+--------------------+ only showing top 20 rows
In the above example we have used regexp_replace and changed “James” with “Martin”.