Spark JDBC connection to RDBMS

Recently I have received a few queries regarding the query we pass to the “load” function when using a JDBC connection to connect to an RDBMS. The question is whether that query should be Spark SQL compliant or RDBMS specific. This is actually a very valid question, because Spark SQL does not support all the SQL constructs that a typical RDBMS like Teradata or Netezza does. The answer to this question is: the query must be RDBMS specific. When…
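For illustration, here is a minimal sketch of such a read against Teradata from the Spark shell (the host, database, table, and credentials are hypothetical placeholders). The subquery passed via the “dbtable” option is shipped to the database verbatim, so a Teradata-only construct like SAMPLE works here even though Spark SQL itself would reject it:

    val df = spark.read
      .format("jdbc")
      .option("url", "jdbc:teradata://td-host/DATABASE=sales")    // hypothetical host and database
      .option("driver", "com.teradata.jdbc.TeraDriver")
      .option("dbtable", "(SELECT * FROM orders SAMPLE 10) AS t") // SAMPLE is Teradata SQL, not Spark SQL
      .option("user", "dbuser")
      .option("password", "dbpass")
      .load()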

Continue Reading

Connect to different RDBMS from Spark

In this post, we will see how to connect to 3 very popular RDBMS from Spark. We will create a connection and fetch some records via Spark; the resulting DataFrame will hold the data, and we can use it as per requirement. We will talk about the JAR files required for the connection and the JDBC connection string used to fetch data and load the DataFrame. Connect to Netezza from Spark. RDBMS: Netezza. Jar Required: nzjdbc.jar. Step 1: Open the Spark shell and add the jar: spark-shell --jars /tmp/nz/nzjdbc.jar…
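Once the shell is up with the driver jar on the classpath, the read itself looks like the sketch below (host, database, table, and credentials are placeholders to adapt). org.netezza.Driver is the driver class shipped in nzjdbc.jar, and 5480 is Netezza's default port:

    val nzDF = spark.read
      .format("jdbc")
      .option("url", "jdbc:netezza://nz-host:5480/TESTDB")  // placeholder host/db; 5480 is the default port
      .option("driver", "org.netezza.Driver")
      .option("dbtable", "ADMIN.CUSTOMERS")                 // placeholder schema.table
      .option("user", "admin")
      .option("password", "password")
      .load()

    nzDF.show(5)   // fetch a few records to verify the connection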

Continue Reading

How to implement recursive queries in Spark?

Recently I was working on a project in which the client's data warehouse was in Teradata, and the requirement was to have something similar on Hadoop for a specific business application. At a high level, the requirement was to hold the same data and run similar SQL on it to produce exactly the same report on Hadoop too. I don't see any challenge in migrating data from Teradata to Hadoop, and transforming SQL into equivalent Hive/Spark SQL is not that difficult now. The…
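Spark SQL (as of Spark 2.x) has no recursive CTE like Teradata's WITH RECURSIVE, so one common workaround is to iterate in the driver, joining a frontier DataFrame back to the data until no new rows appear. A minimal sketch from the Spark shell, using a hypothetical child/parent hierarchy:

    import org.apache.spark.sql.DataFrame

    // hypothetical parent/child edges; in spark-shell, spark.implicits are already imported
    val edges = Seq(("B","A"), ("C","B"), ("D","C")).toDF("child", "parent")

    // anchor part of the recursive CTE: start from node "D"
    var result: DataFrame = edges.filter($"child" === "D")
    var frontier: DataFrame = result

    while (frontier.count() > 0) {
      // recursive step: treat each frontier row's parent as the next child
      val next = frontier.select($"parent".as("child")).join(edges, Seq("child"))
      frontier = next.select($"child", $"parent").except(result) // keep only unseen rows
      result = result.union(frontier)
    }

    result.show() // the full ancestor chain of "D"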

Continue Reading

Max iterations (100) reached for batch Resolution – Spark Error

The Max Iterations error is not a very common error in Spark; however, if you are working with Spark SQL you may encounter it. The error mostly comes while running queries that generate very long query plans. I was recently working on such a query, which involved many joins, derived tables, CTEs etc. In short, it was a pretty complex query which actually runs on Netezza every day. We were checking the feasibility and also comparing the query performance in Netezza…
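One workaround worth knowing is raising the fixed-point iteration limit, or breaking the huge plan into smaller pieces. A sketch below; the exact property name depends on your Spark version, so treat both configs as assumptions to verify:

    // In Spark 2.x the analyzer reuses the optimizer's limit; Spark 3.x also
    // exposes spark.sql.analyzer.maxIterations. Verify for your version.
    spark.conf.set("spark.sql.optimizer.maxIterations", "500")

    // Often the cleaner fix: truncate the lineage of an intermediate result so the
    // analyzer never has to resolve one enormous plan (complexDF is a hypothetical stand-in).
    val trimmed = complexDF.localCheckpoint()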

Continue Reading

How to select SPARK2 as default Spark version

Hi guys. I have been using HDP 2.5 for some time now, and a few of my friends asked me how they can select SPARK2 by default. In HDP 2.5 both Spark 1.x and Spark 2 are available; however, when you start SPARK-SHELL it will show you a prompt and select Spark 1.x as the default. The answer to the question is present in the prompt itself: you can see it displays on screen that SPARK_MAJOR_VERSION is not set, hence it takes SPARK1 as default…
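So the fix is to set that variable before launching the shell, for example (add the export to your shell profile, e.g. ~/.bashrc, to make it permanent):

    export SPARK_MAJOR_VERSION=2
    spark-shell    # now starts Spark 2.x without the version prompt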

Continue Reading