The Spark SQL DataFrame is the closest thing a SQL developer can find in Apache Spark. I come from a SQL background, with 10+ years of experience working in traditional RDBMSs like Teradata, Oracle, Netezza, and Sybase. So when I moved from traditional RDBMSs to Hadoop for my new projects, I was eager to explore the SQL options available there. I must admit Hive is the most relevant one, and it made my life much simpler in my new project. Next comes Apache Spark.
One of Apache Spark's components is Spark SQL, which is a blessing for people like me. We would rather not write Java applications; SQL is our forte. And since Apache Spark is perhaps the loudest buzzword in the market today, working with Spark SQL is exciting too. One of the core objects in Spark SQL is the DataFrame, which is as good as any table in an RDBMS. You can apply all sorts of SQL operations to a DataFrame, either directly, by running SQL queries against it, or indirectly, through the DataFrame API methods.
Below are the posts on Scala DataFrames that I would like to share with you. I hope they help you in transitioning from SQL to Spark SQL.