If you migrate from one RDBMS platform to another, one technical challenge you may face is a difference in decimal rounding between the two platforms. I was recently working for a client where we migrated a Teradata application to Spark on EMR, and many measures, such as Amount, were not matching. On analysis, we realised the issue was decimal rounding. Example:
Teradata: select cast(20.12345 as decimal(18,4)) ; -- Output: 20.1234
Spark: select cast(20.12345 as decimal(18,4)) ; -- Output: 20.1235
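The two outputs above correspond to two standard rounding modes. A minimal sketch using Python's decimal module (not Spark or Teradata themselves) reproduces both behaviours; mapping Spark's cast to HALF_UP and Teradata's to HALF_EVEN reflects the outputs observed above, not vendor documentation:

```python
from decimal import Decimal, ROUND_HALF_UP, ROUND_HALF_EVEN

value = Decimal("20.12345")
scale = Decimal("0.0001")  # target scale of 4, as in decimal(18,4)

# Spark's cast behaves like HALF_UP: the tie digit 5 rounds away from zero
spark_style = value.quantize(scale, rounding=ROUND_HALF_UP)
# Teradata's output matches HALF_EVEN: the tie rounds toward the even digit
teradata_style = value.quantize(scale, rounding=ROUND_HALF_EVEN)

print(spark_style)     # 20.1235
print(teradata_style)  # 20.1234
```

The difference only appears on exact ties (a trailing 5 at the cut-off position); all other values round identically under both modes.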
Now you may ask: how can a difference at a scale of 0.0001 be significant? Initially we thought the same. However, it was an amount field, and with several million rows in the table a cumulative difference of even $20 or so does show up in reports. So the conclusion was to fix this issue.
If you are facing a decimal rounding mismatch in Spark compared to another platform such as Teradata, use the BROUND function to replicate the same rounding. Spark's BROUND function rounds decimals using the HALF_EVEN method (banker's rounding), which is also the convention commonly followed in banking systems. Example:
Spark: select bround(20.12345,4) -- Output: 20.1234
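To see why HALF_EVEN is preferred for financial aggregates, here is a quick illustration in plain Python's decimal module (standing in for Spark's bround): ties go to the even digit, so upward and downward roundings tend to cancel out over many rows instead of systematically inflating totals.

```python
from decimal import Decimal, ROUND_HALF_EVEN

# Exact ties rounded to integer scale under HALF_EVEN:
ties = ["0.5", "1.5", "2.5", "3.5", "4.5"]
rounded = [str(Decimal(t).quantize(Decimal("1"), rounding=ROUND_HALF_EVEN))
           for t in ties]
print(rounded)  # ['0', '2', '2', '4', '4'] -- half go up, half go down
```

In Spark SQL the equivalent per-column fix is simply wrapping the expression, e.g. `bround(amount, 4)` instead of `cast(amount as decimal(18,4))`, so the migrated figures match Teradata's.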
So next time you face a challenge due to decimal rounding, bround is your function.