SQL & Hadoop

SQL on Hadoop with Hive, Spark & PySpark on EMR & AWS Glue

Apache Spark

Spark Performance Tuning with help of Spark UI

1 Comment / Apache Spark / Raj

Spark is a distributed data processing engine that relies heavily on the memory available for computation. If you have worked with Spark, you have probably faced job, task, or stage failures caused by memory issues, which makes memory management one of the key techniques for an efficient Spark environment. In this post, we will see how Spark […]

PySpark

PySpark -Convert SQL queries to Dataframe

Leave a Comment / PySpark / Raj

In PySpark, you can run DataFrame commands or, if you are more comfortable with SQL, run SQL queries instead. In this post, we will run different variations of SELECT queries on a table built on Hive, along with the corresponding DataFrame commands that replicate the same output as each SQL query. Let’s create a dataframe […]

Apache Spark

Problem with Decimal Rounding & solution

Leave a Comment / Apache Spark / Raj

When you migrate from one RDBMS platform to another, one technical challenge you may face is that the two platforms round decimals differently. I was recently working for a client where we migrated a Teradata application to Spark on EMR, and many measures, such as Amount, did not match. On analysis, we realised the […]
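
The excerpt is cut off before it names the cause, so the following is only an illustration and not the post's actual finding. It assumes, hypothetically, that the two platforms break ties differently when rounding (round-half-up on one side, round-half-even on the other), and uses Python's standard `decimal` module to show how the same amounts can then disagree at two decimal places. The helper `round_amount` and the sample values are made up for this sketch.

```python
from decimal import Decimal, ROUND_HALF_UP, ROUND_HALF_EVEN


def round_amount(value: str, mode: str, places: int = 2) -> Decimal:
    """Round a decimal string to `places` digits using the given rounding mode."""
    quantum = Decimal(10) ** -places  # Decimal("0.01") when places=2
    return Decimal(value).quantize(quantum, rounding=mode)


# Hypothetical amounts that sit exactly halfway between two cents.
amounts = ["2.345", "2.355", "10.125"]

for amount in amounts:
    half_up = round_amount(amount, ROUND_HALF_UP)      # ties go away from zero
    half_even = round_amount(amount, ROUND_HALF_EVEN)  # ties go to the even digit
    status = "MISMATCH" if half_up != half_even else "same"
    print(f"{amount}: half-up={half_up} half-even={half_even} {status}")
```

Only values that land exactly on a tie can differ between the two modes, which is one reason such mismatches tend to show up on a fraction of rows and not on all of them.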

Apache Spark

Never run INSERT OVERWRITE again – try Hadoop Distcp

Leave a Comment / Apache Spark / Raj

Recently, I was working on a project where the ETL requirement was to keep a daily snapshot of a table. The data warehouse was designed on a data model more than 15 years old, and the client wanted to replicate it on Hadoop. You can convert the ETL to Spark SQL; however, not everything works as-is on […]

Apache Hadoop

Columnar Storage & why you must use it

2 Comments / SQL on Hadoop / Raj

If you are working on Hadoop or any other platform and storing structured data, you have probably heard of columnar storage formats. The popularity that “columnar” has gained over the past 7-8 years confirms that the buzz is not a bubble and that this is the future of data analytics from a storage perspective. What […]

Topics

  • Amazon EMR
  • Apache HIVE
  • Apache Spark
  • AWS Glue
  • PySpark
  • SQL on Hadoop

Recent Posts

  • AWS Glue create dynamic frame
  • AWS Glue read files from S3
  • How to check Spark run logs in EMR
  • PySpark apply function to column
  • Run Spark Job in existing EMR using AIRFLOW

Join the discussion

  1. Ramkumar on Spark Performance Tuning with help of Spark UI, February 3, 2025

    Great. Keep writing more articles.

  2. Raj on Free Online SQL to PySpark Converter, August 9, 2022

    Thank you for sharing this. I will give it a try as well.

  3. John K-W on Free Online SQL to PySpark Converter, August 8, 2022

    Might be interesting to add a PySpark dialect to SQLglot https://github.com/tobymao/sqlglot https://github.com/tobymao/sqlglot/tree/main/sqlglot/dialects

  4. Meena M on Spark Dataframe WHEN case, July 28, 2022

    try something like df.withColumn("type", when(col("flag1"), lit("type_1")).when(!col("flag1") && (col("flag2") || col("flag3") || col("flag4") || col("flag5")), lit("type2")).otherwise(lit("other")))

  5. tagu on Free Online SQL to PySpark Converter, July 20, 2022

    It will be great if you can have a link to the convertor. It helps the community for anyone starting…

Copyright © 2025 SQL & Hadoop