Problem with Decimal Rounding & Solution

If you migrate from one RDBMS platform to another, one technical challenge you may face is that the two platforms round decimals differently. I was recently working for a client where we migrated a Teradata application to Spark on EMR, and many measures, such as Amount, were not matching. On analysis, we realised the issue was with decimal rounding. Example:

Teradata:
select cast(20.12345 as decimal(18,4)); -- Output: 20.1234

Spark:
select cast(20.12345 as decimal(18,4)); -- Output: 20.1235
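The root cause is the rounding mode: Spark's cast (and round) uses HALF_UP, rounding a trailing 5 away from zero, while Teradata by default uses HALF_EVEN, rounding a trailing 5 to the nearest even digit. The two therefore disagree only when the dropped digit is exactly 5 and the digit before it is even. A short sketch (the extra values here are illustrative, not from the original analysis):

Spark:
select cast(20.12325 as decimal(18,4)); -- Output: 20.1233 (Teradata's HALF_EVEN gives 20.1232)
select cast(20.12335 as decimal(18,4)); -- Output: 20.1234 (both agree: HALF_EVEN also rounds 3 up to 4)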

Now you may ask how a difference at the fourth decimal place (0.0001) can be considerable. Initially we thought the same, but this was an amount field, and with several million rows in a table the per-row differences add up: just 200,000 mismatching rows, each off by 0.0001, already shift a total by $20, which does make a difference in financial reports. So the conclusion was that the issue had to be fixed.

BROUND function

If you are facing a decimal rounding mismatch in Spark compared to another platform such as Teradata, use the BROUND function to replicate the same rounding. Spark's BROUND rounds decimals using the HALF_EVEN method (also known as banker's rounding), which is the convention most banking systems follow. Example:

Spark:
select bround(20.12345, 4); -- Output: 20.1234
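To apply this at scale during a migration, one option is to round with bround before the final cast so that Spark reproduces Teradata's result. A minimal sketch, assuming a hypothetical transactions table with an amount column stored at a higher scale:

Spark:
select cast(bround(amount, 4) as decimal(18,4)) as amount_matched
from transactions; -- bround applies HALF_EVEN first, so the cast itself drops no digits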

So next time you face a challenge due to decimal rounding, bround is your function.
