Skip to content
SQL & Hadoop

SQL on Hadoop with Hive, Spark & PySpark on EMR & AWS Glue

  • Home
  • About
  • Privacy Policy
SQL & Hadoop

SQL on Hadoop with Hive, Spark & PySpark on EMR & AWS Glue

  • About
  • AWS Glue
  • Blog
  • Free Online SQL to PySpark Converter
  • Generate Spark JDBC Connection String online
  • Home
  • Optimise Spark Configurations – Online Generator
  • Privacy Policy
  • PySpark Cheat Sheet
  • Apache Spark Tutorial
Apache Spark

Spark Dataframe NULL values

Leave a Comment / Apache Spark / Raj

In this post, we will see how to Handle NULL values in any given dataframe. Many people confuse it with BLANK or empty string however there is a difference. NULL means unknown where BLANK is empty. Alright now let’s see what all operations are available in Spark Dataframe which can help us in handling NULL […]

Spark Dataframe NULL values Read More »

Apache Spark

Spark Dataframe – Explode

Leave a Comment / Apache Spark / Raj

In Spark, we can use “explode” method to convert single column values into multiple rows. Explode can be used to convert one row into multiple rows in Spark. Recently I was working on a task to convert Cobol VSAM file which often has nested columns defined in it. In Spark my requirement was to convert

Spark Dataframe – Explode Read More »

Apache Spark

Spark Dataframe SHOW

Leave a Comment / Apache Spark / Raj

In Spark Dataframe, SHOW method is used to display Dataframe records in readable tabular format. This method is used very often to check how the content inside Dataframe looks like. Let’s see it with an example. Few things to observe here: 1) By default, SHOW function will return only 20 records. This is equivalent to

Spark Dataframe SHOW Read More »

Apache Spark

Spark Dataframe Column list

Leave a Comment / Apache Spark / Raj

Recently I was working on a task where I wanted Spark Dataframe Column List in a variable. This was required to do further processing depending on some technical columns present in the list. So we know that you can print Schema of Dataframe using printSchema method. It will show tree hierarchy of columns along with

Spark Dataframe Column list Read More »

Apache Spark

Spark Dataframe – UNION/UNION ALL

Leave a Comment / Apache Spark / Raj

UNION method is used to MERGE data from 2 dataframes into one. The dataframe must have identical schema. If you are from SQL background then please be very cautious while using UNION operator in SPARK dataframes. Unlike typical RDBMS, UNION in Spark does not remove duplicates from resultant dataframe. It simply MERGEs the data without

Spark Dataframe – UNION/UNION ALL Read More »

← Previous 1 … 6 7 8 … 15 Next →

Topics

  • Amazon EMR
  • Apache HIVE
  • Apache Spark
  • AWS Glue
  • PySpark
  • SQL on Hadoop

Recent Posts

  • AWS Glue create dynamic frame
  • AWS Glue read files from S3
  • How to check Spark run logs in EMR
  • PySpark apply function to column
  • Run Spark Job in existing EMR using AIRFLOW

Recent Posts

  • AWS Glue create dynamic frame
  • AWS Glue read files from S3
  • How to check Spark run logs in EMR
  • PySpark apply function to column
  • Run Spark Job in existing EMR using AIRFLOW

Join the discussion

  1. Ramkumar on Spark Performance Tuning with help of Spark UIFebruary 3, 2025

    Great. Keep writing more articles.

  2. Raj on Free Online SQL to PySpark ConverterAugust 9, 2022

    Thank you for sharing this. I will give it a try as well.

  3. John K-W on Free Online SQL to PySpark ConverterAugust 8, 2022

    Might be interesting to add a PySpark dialect to SQLglot https://github.com/tobymao/sqlglot https://github.com/tobymao/sqlglot/tree/main/sqlglot/dialects

  4. Meena M on Spark Dataframe WHEN caseJuly 28, 2022

    try something like df.withColumn("type", when(col("flag1"), lit("type_1")).when(!col("flag1") && (col("flag2") || col("flag3") || col("flag4") || col("flag5")), lit("type2")).otherwise(lit("other")))

  5. tagu on Free Online SQL to PySpark ConverterJuly 20, 2022

    It will be great if you can have a link to the convertor. It helps the community for anyone starting…

Copyright © 2025 SQL & Hadoop