DataFrame is the closest thing a SQL developer will find in Apache Spark to a regular table in an RDBMS. I come from a SQL background, having initially worked with traditional RDBMS products like Teradata, Oracle, Netezza, and Sybase. So when I moved from traditional RDBMS to Hadoop for my new projects, I was excited to explore the SQL options available there. I must admit Hive was the most relevant one, and it made my life much simpler in my first Hadoop project. Next came Apache Spark.
Apache Spark includes Spark SQL as one of its components, which is a blessing for people like me: we don't particularly enjoy writing Java applications, but SQL is our forte. Since Apache Spark is very popular in the market today, working with Spark SQL is exciting too. One of the core objects in Spark SQL is the DataFrame, and it is as good as any table in an RDBMS. You can apply all sorts of SQL operations to a DataFrame, either directly or indirectly.
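To make that concrete, here is a minimal sketch of both styles: the DataFrame API directly, and plain SQL against a registered temp view. The employee data and the `employees` view name are made up purely for illustration.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.avg

// Local session for experimentation (in spark-shell, `spark` already exists)
val spark = SparkSession.builder()
  .appName("DataFrameAsTable")
  .master("local[*]")
  .getOrCreate()
import spark.implicits._

// A hypothetical employees dataset, used only for illustration
val df = Seq(
  (1, "Alice", "HR", 50000),
  (2, "Bob",   "IT", 65000),
  (3, "Carol", "IT", 70000)
).toDF("id", "name", "dept", "salary")

// DataFrame API: equivalent of SELECT dept, AVG(salary) ... GROUP BY dept
val byDept = df.groupBy("dept").agg(avg("salary").as("avg_salary"))
byDept.show()

// Or register the DataFrame as a temp view and run familiar SQL against it
df.createOrReplaceTempView("employees")
spark.sql("SELECT dept, AVG(salary) AS avg_salary FROM employees GROUP BY dept").show()
```

Both calls produce the same result, so you can start with the SQL you already know and move to the DataFrame API at your own pace.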
Below are the posts with Scala and PySpark DataFrame examples that I would like to share with you, in the hope that they help you transition from SQL to Spark.
Spark Scala Dataframe Examples
- SPARK DATAFRAME ADD MULTIPLE COLUMNS WITH VALUE
- SPARK DATAFRAME ALIAS AS
- SPARK DATAFRAME COLUMN LIST
- SPARK DATAFRAME CONCATENATE STRINGS
- SPARK DATAFRAME DISTINCT OR DROP DUPLICATES
- SPARK DATAFRAME EXPLODE
- SPARK DATAFRAME GROUPBY AGGREGATE FUNCTIONS
- SPARK DATAFRAME IN-NOT IN
- SPARK DATAFRAME JOINS – Complete Guide
- SPARK DATAFRAME LIKE NOT LIKE RLIKE
- SPARK DATAFRAME MONOTONICALLY_INCREASING_ID
- SPARK DATAFRAME NULL VALUES
- SPARK DATAFRAME ORDERBY SORT
- SPARK DATAFRAME REPARTITION
- SPARK DATAFRAME REPLACE STRING
- SPARK DATAFRAME SELECT
- SPARK DATAFRAME SHOW
- SPARK DATAFRAME UNION/UNION ALL
- SPARK DATAFRAME UPDATE COLUMN VALUE
- SPARK DATAFRAME WHEN CASE
- SPARK DATAFRAME WHERE FILTER