The Max Iterations error is not a common error in Spark, but if you work with Spark SQL you may run into it. It mostly shows up when a query generates a very long query plan. I was recently working on such a query, which involved many joins, derived tables, and CTEs. In short, it was a pretty complex query that runs on Netezza every day. We were checking the feasibility of running it on Apache Spark 2 and comparing its performance there against Netezza.
Spark SQL and DataFrames rely on the Catalyst optimizer. Catalyst processes a query plan in batches of rules, and it applies the rules of a batch to the plan again and again until the plan stops changing or an iteration limit is reached. This happens on the driver while the plan is analyzed and optimized, not on the executors. By default the limit is 100, which is a comfortable value for most jobs. In my case, I was getting the error: “Max iterations (100) reached for batch Resolution”.
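The idea behind the iteration limit can be sketched in plain Python. This is only a toy model, not Spark's code: the "plan" is just a number standing for how many nested layers are still unresolved, and the names run_batch and resolve_one_level are made up for illustration. The sketch raises an exception when the limit is hit; how real Spark reports the limit is not modelled here.

```python
def run_batch(plan, rules, max_iterations=100):
    """Apply all rules in order, repeating until the plan stops changing.

    Returns the final plan and the number of passes that were needed.
    Raises RuntimeError if the plan is still changing after max_iterations.
    """
    for iteration in range(1, max_iterations + 1):
        new_plan = plan
        for rule in rules:
            new_plan = rule(new_plan)
        if new_plan == plan:
            # Fixed point reached: another pass would change nothing.
            return plan, iteration
        plan = new_plan
    raise RuntimeError(f"Max iterations ({max_iterations}) reached for batch")


def resolve_one_level(depth):
    """Hypothetical rule: resolves one nested layer per pass."""
    return depth - 1 if depth > 0 else depth


# A "plan" with 150 unresolved layers needs 150 passes plus one
# final pass that confirms nothing changes any more.
print(run_batch(150, [resolve_one_level], max_iterations=200))  # (0, 151)

try:
    run_batch(150, [resolve_one_level], max_iterations=100)
except RuntimeError as err:
    print(err)  # Max iterations (100) reached for batch
```

A plan that needs more passes than the limit allows either has to become simpler or the limit has to go up, which are the two options discussed next.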
To overcome it, you can either rewrite the SQL query so that it generates a smaller, simpler query plan, or raise the limit from the default of 100 to, say, 200. You can use the property below to increase it:
spark.sql.optimizer.maxIterations 200
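One way to apply the property is at session creation. The sketch below assumes PySpark is installed; the helper names catalyst_conf and build_session are made up for illustration. Setting the value before the session exists is a cautious choice on my part, since I have not verified that changing it on an already running session always takes effect.

```python
def catalyst_conf(max_iterations=200):
    """Build the config entry that raises the iteration limit.

    Spark config values are passed as strings, so the number is converted.
    """
    if not isinstance(max_iterations, int) or max_iterations <= 0:
        raise ValueError("max_iterations must be a positive integer")
    return {"spark.sql.optimizer.maxIterations": str(max_iterations)}


def build_session(app_name, max_iterations=200):
    """Create a SparkSession with the raised limit (requires pyspark)."""
    # Imported here so the helper above can be used without pyspark installed.
    from pyspark.sql import SparkSession

    builder = SparkSession.builder.appName(app_name)
    for key, value in catalyst_conf(max_iterations).items():
        builder = builder.config(key, value)
    return builder.getOrCreate()
```

Calling build_session("my-app", 200) would then return a session with the higher limit. The same property can also be passed on the command line with spark-submit --conf spark.sql.optimizer.maxIterations=200.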
Once I increased the value to 200, the query worked perfectly fine. For those of you wondering how the comparison between Netezza and Spark turned out: Netezza gave better results in terms of execution time. We executed the query as it was written for Netezza (with minor modifications to make it compatible with Spark SQL), and we had not yet done any performance tuning on Spark.
