SQL & Hadoop

SQL on Hadoop with Hive, Spark & PySpark on EMR & AWS Glue
PySpark

How to convert SQL Queries into PySpark

PySpark / Raj

In the previous post, we saw many common conversions from SQL to DataFrame in PySpark. In this post, we will look at the strategy you can follow to convert a typical SQL query into DataFrame code in PySpark. If you have not read the previous post, I strongly recommend doing so, as we will refer to […]
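
As a taste of the approach, here is a minimal sketch of one such conversion; the sales table, columns, and S3 path are made up purely for illustration:

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("sql-to-pyspark").getOrCreate()
    sales = spark.read.parquet("s3://my-bucket/sales/")  # placeholder path

    # SQL: SELECT region, SUM(amount) AS total FROM sales
    #      WHERE year = 2024 GROUP BY region ORDER BY total DESC
    result = (
        sales.filter(F.col("year") == 2024)
             .groupBy("region")
             .agg(F.sum("amount").alias("total"))
             .orderBy(F.col("total").desc())
    )
    result.show()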


PySpark

PySpark Read Write Parquet Files

PySpark / Raj

In this post, we will see how you can read Parquet files using PySpark, and we will also cover common options and challenges you must consider while reading or writing Parquet files. What is the Parquet file format? Parquet is a columnar file format that is becoming very popular because of the optimisations it brings…
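
As a quick preview, a minimal read/write round trip might look like the sketch below; the S3 paths and the "year" partition column are placeholders, not taken from the post:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("parquet-demo").getOrCreate()

    # Read a directory of Parquet files (Spark picks up all part files inside)
    df = spark.read.parquet("s3://my-bucket/input/")

    # Write the result back out, partitioned by a column, replacing old output
    df.write.mode("overwrite").partitionBy("year").parquet("s3://my-bucket/output/")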


PySpark

Rename Column Name case in Dataframe

PySpark / Raj

Requirement: change column names to upper case or lower case in PySpark. The post first creates a dummy DataFrame, then converts its column names to uppercase: you can call the "withColumnRenamed" function in a for loop to switch every column of a PySpark DataFrame to uppercase using the "upper" function. Converting column names to lowercase works the same way; you can…
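
The loop described in the excerpt can be sketched in a few lines; the dummy DataFrame below simply stands in for your own data:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("rename-case").getOrCreate()

    # Dummy DataFrame for illustration
    df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "value"])

    # Rename every column to uppercase with withColumnRenamed in a for loop
    for name in df.columns:
        df = df.withColumnRenamed(name, name.upper())

    df.printSchema()  # columns are now ID, VALUE
    # For lowercase, use name.lower() in the same loop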


PySpark

Spark Case Study – optimise executor memory and cores per executor

Apache Spark / Raj

I was recently working on a task where I had to read more than a terabyte of data spread across multiple Parquet files, with some filters applied to that data to get the required result set. I did a small test where I ran the same Spark read command, with a filter condition, multiple times.
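
The knobs such a test varies are the standard executor settings. A minimal sketch follows, with hypothetical sizing values that you would tune for your own nodes and workload rather than copy:

    from pyspark.sql import SparkSession

    # Hypothetical sizing purely for illustration; the case study is about
    # measuring which combination works best for your cluster and data.
    spark = (
        SparkSession.builder
        .appName("executor-tuning")
        .config("spark.executor.memory", "8g")
        .config("spark.executor.cores", "4")
        .config("spark.executor.instances", "10")
        .getOrCreate()
    )

    df = spark.read.parquet("s3://my-bucket/big-dataset/")        # placeholder path
    print(df.filter(df["event_date"] >= "2024-01-01").count())    # hypothetical filter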


Apache Hadoop

Namenode is in safe mode – Hadoop

Amazon EMR / Raj

The most common reason for the NameNode to go into safe mode is under-replicated blocks. This is generally caused by storage issues on HDFS, or by jobs such as Spark applications being aborted suddenly, leaving behind temp files that are under-replicated. If your NameNode is in safe mode, then your Hadoop cluster is in read-only mode…
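
For reference, the stock HDFS admin commands let you check safe mode status and, once you are sure the file system is healthy, leave it; a sketch, to be run against your own cluster with care:

    hdfs dfsadmin -safemode get     # check whether the NameNode is in safe mode
    hdfs fsck /                     # report missing and under-replicated blocks
    hdfs dfsadmin -safemode leave   # force the NameNode out of safe mode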



Topics

  • Amazon EMR
  • Apache HIVE
  • Apache Spark
  • AWS Glue
  • PySpark
  • SQL on Hadoop

Recent Posts

  • AWS Glue create dynamic frame
  • AWS Glue read files from S3
  • How to check Spark run logs in EMR
  • PySpark apply function to column
  • Run Spark Job in existing EMR using AIRFLOW


Join the discussion

  1. Ramkumar on Spark Performance Tuning with help of Spark UI (February 3, 2025)

    Great. Keep writing more articles.

  2. Raj on Free Online SQL to PySpark Converter (August 9, 2022)

    Thank you for sharing this. I will give it a try as well.

  3. John K-W on Free Online SQL to PySpark Converter (August 8, 2022)

    Might be interesting to add a PySpark dialect to SQLglot https://github.com/tobymao/sqlglot https://github.com/tobymao/sqlglot/tree/main/sqlglot/dialects

  4. Meena M on Spark Dataframe WHEN case (July 28, 2022)

    try something like df.withColumn("type", when(col("flag1"), lit("type_1")).when(~col("flag1") & (col("flag2") | col("flag3") | col("flag4") | col("flag5")), lit("type2")).otherwise(lit("other")))

  5. tagu on Free Online SQL to PySpark Converter (July 20, 2022)

    It would be great if you could add a link to the converter. It helps the community for anyone starting…

Copyright © 2025 SQL & Hadoop