SQL & Hadoop

SQL on Hadoop with Hive, Spark & PySpark on EMR & AWS Glue


  • About
  • AWS Glue
  • Blog
  • Free Online SQL to PySpark Converter
  • Generate Spark JDBC Connection String online
  • Home
  • Optimise Spark Configurations – Online Generator
  • Privacy Policy
  • PySpark Cheat Sheet
  • Apache Spark Tutorial
PySpark

PySpark handle scientific number

Leave a Comment / PySpark / Raj

What is scientific notation, or an exponent number? Recently I was working on a PySpark process in which the requirement was to apply some aggregation on big numbers. The result in the output was accurate; however, it was in exponential format, or scientific notation, which definitely does not look right on display. I am talking about numbers which […]

PySpark handle scientific number Read More »

PySpark

PySpark script example and how to run pyspark script

1 Comment / PySpark / Raj

In the previous post we saw how to create and run a very basic PySpark script in a Hadoop environment. In this post, we will walk through a PySpark script template in detail. We will see different options while creating a PySpark script and also how to run a PySpark script with multiple configurations. In this post,

PySpark script example and how to run pyspark script Read More »
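The detailed walkthrough is behind the Read More link; purely as a hedged sketch of the mechanism, running a script with multiple configurations typically means passing them to `spark-submit`. The file name and every value below are placeholders, not the post's actual settings:

```shell
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --num-executors 4 \
  --executor-memory 4g \
  --conf spark.sql.shuffle.partitions=200 \
  --conf spark.dynamicAllocation.enabled=true \
  my_pyspark_script.py
```

Settings passed with `--conf` override the cluster defaults in `spark-defaults.conf` for that one submission, which is what makes per-job tuning possible without touching cluster configuration.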

Apache Spark

[EMR] 5 settings for better Spark environment

Leave a Comment / Amazon EMR / Raj

I have been working on Spark for many years now. Initially I started out on on-premises Hadoop clusters using CDH or HDP. In the past few years, I have been working a lot on EMR, primarily for Spark or PySpark tasks. In this post, I would like to share some of the general settings

[EMR] 5 settings for better Spark environment Read More »
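The five settings the post covers sit behind the Read More link; as an illustration of the mechanism only, EMR applies such settings through configuration classifications supplied as JSON at cluster creation. The properties below are common examples, not necessarily the post's five:

```json
[
  {
    "Classification": "spark",
    "Properties": { "maximizeResourceAllocation": "true" }
  },
  {
    "Classification": "spark-defaults",
    "Properties": {
      "spark.dynamicAllocation.enabled": "true",
      "spark.sql.adaptive.enabled": "true"
    }
  }
]
```

Each `Classification` maps to a configuration file on the cluster (here, Spark's own settings and `spark-defaults.conf`), and EMR merges the given properties into it on every node.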

PySpark

Your first PySpark Script – Create and Run

Leave a Comment / PySpark / Raj

In this post, we will see how you can create your first PySpark script and then run it in batch mode. Many people use notebooks like Jupyter or Zeppelin; however, you may want to create a PySpark script and run it on a schedule. This is especially helpful if you want to run ETL

Your first PySpark Script – Create and Run Read More »

PySpark

PySpark Filter – 25 examples to teach you everything

Leave a Comment / PySpark / Raj

PySpark Filter is used to specify conditions so that only the rows that satisfy those conditions are returned in the output. You can use the WHERE or FILTER function in PySpark to apply conditional checks to the input rows; only the rows that pass all the checks will move to the output result set. PySpark WHERE

PySpark Filter – 25 examples to teach you everything Read More »


Topics

  • Amazon EMR
  • Apache HIVE
  • Apache Spark
  • AWS Glue
  • PySpark
  • SQL on Hadoop

Recent Posts

  • AWS Glue create dynamic frame
  • AWS Glue read files from S3
  • How to check Spark run logs in EMR
  • PySpark apply function to column
  • Run Spark Job in existing EMR using AIRFLOW


Join the discussion

  1. Ramkumar on Spark Performance Tuning with help of Spark UI (February 3, 2025)

    Great. Keep writing more articles.

  2. Raj on Free Online SQL to PySpark Converter (August 9, 2022)

    Thank you for sharing this. I will give it a try as well.

  3. John K-W on Free Online SQL to PySpark Converter (August 8, 2022)

    Might be interesting to add a PySpark dialect to SQLglot https://github.com/tobymao/sqlglot https://github.com/tobymao/sqlglot/tree/main/sqlglot/dialects

  4. Meena M on Spark Dataframe WHEN case (July 28, 2022)

    try something like df.withColumn("type", when(col("flag1"), lit("type_1")).when(!col("flag1") && (col("flag2") || col("flag3") || col("flag4") || col("flag5")), lit("type2")).otherwise(lit("other")))

  5. tagu on Free Online SQL to PySpark Converter (July 20, 2022)

    It will be great if you can have a link to the converter. It helps the community for anyone starting…

Copyright © 2025 SQL & Hadoop