SQL & Hadoop

SQL on Hadoop with Hive, Spark & PySpark on EMR & AWS Glue

AWS Glue

AWS Glue create dynamic frame

AWS Glue / Raj

We can create an AWS Glue dynamic frame from data stored in S3 or from tables that exist in the Glue Data Catalog. In addition, we can create dynamic frames using custom connections. In this post, we will create a new Glue job that reads from S3 and a Glue catalog table to create a new AWS Glue […]
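A minimal sketch of the two creation paths described above, runnable only inside a Glue job environment; the database, table, and bucket names are placeholders, not real resources:

```python
# Options for reading CSV files straight from S3 (placeholder bucket/path).
connection_options = {"paths": ["s3://my-bucket/input/"], "recurse": True}
format_options = {"withHeader": True, "separator": ","}

try:
    # awsglue ships with the Glue runtime; it is not available on PyPI.
    from awsglue.context import GlueContext
    from pyspark.context import SparkContext

    glue_context = GlueContext(SparkContext.getOrCreate())

    # 1) DynamicFrame from a table already registered in the Glue Data Catalog.
    dyf_catalog = glue_context.create_dynamic_frame.from_catalog(
        database="my_database", table_name="my_table")

    # 2) DynamicFrame directly from files in S3, without a catalog table.
    dyf_s3 = glue_context.create_dynamic_frame.from_options(
        connection_type="s3",
        connection_options=connection_options,
        format="csv",
        format_options=format_options)
except ImportError:
    pass  # outside a Glue job, only the option dicts above are defined
```

The same `GlueContext` also exposes `create_dynamic_frame.from_options` with JDBC-style `connection_type` values for custom connections.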


AWS Glue

AWS Glue read files from S3

AWS Glue / Raj

You can use an AWS Glue crawler to read files from S3 and create corresponding tables in the Glue Data Catalog. In this tutorial we will read a few files present in S3 and create corresponding tables in the AWS Glue catalog. We will use a Glue crawler to infer the S3 file schema and create the tables. Check the […]
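As a hedged sketch of the crawler step, here is the Glue API call via boto3; the crawler name, IAM role ARN, database, and S3 path are all placeholders, and actually running the function requires AWS credentials:

```python
# Crawler definition: point a crawler at an S3 path so it infers the file
# schema and registers a table in the Glue Data Catalog (placeholder values).
crawler_config = {
    "Name": "s3-files-crawler",
    "Role": "arn:aws:iam::123456789012:role/GlueCrawlerRole",
    "DatabaseName": "my_database",
    "Targets": {"S3Targets": [{"Path": "s3://my-bucket/input/"}]},
}

try:
    import boto3  # calls below need AWS credentials to actually run

    def create_and_start_crawler(config):
        glue = boto3.client("glue")
        glue.create_crawler(**config)            # register the crawler
        glue.start_crawler(Name=config["Name"])  # run it; tables appear in the catalog
except ImportError:
    pass  # boto3 not installed; only the config dict above is defined
```

Once the crawler finishes, the inferred table is queryable from Athena or readable as a Glue dynamic frame via `from_catalog`.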


Apache Spark

How to check Spark run logs in EMR

Amazon EMR / Raj

Situation: someone on my team ran a Spark application in EMR and the job failed. The user is new to EMR and does not know how to check the Spark logs. He has asked me to debug it and find the error. The only information I have is the YARN application_id. In this […]
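Starting from only the application_id, two typical places to look, sketched below; the cluster id and log bucket are placeholders, and the S3 layout shown is the usual EMR log-aggregation layout configured at cluster creation:

```python
# Given only a YARN application id, two ways to locate the Spark logs on EMR.
application_id = "application_1673000000000_0001"

# 1) While the cluster is still running, on the master node:
yarn_cmd = f"yarn logs -applicationId {application_id}"

# 2) After the cluster terminates, aggregated container logs are pushed to
#    the cluster's S3 log URI (placeholder bucket and cluster id):
cluster_id = "j-ABCDEFGHIJKLM"
log_bucket = "s3://my-emr-logs"
s3_log_path = f"{log_bucket}/{cluster_id}/containers/{application_id}/"

print(yarn_cmd)
print(s3_log_path)
```

The driver's stderr under that containers path is usually where the failing stack trace lives.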


PySpark

PySpark apply function to column

PySpark / Raj

In PySpark you apply a function to a column of a DataFrame to get the desired transformation as output. In this post, we will see two of the most common ways of applying a function to a column in PySpark: first, applying Spark's built-in functions to a column, and second, applying a user-defined custom function to columns in a DataFrame. PySpark apply Spark […]
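The two ways contrasted above can be sketched as follows; the DataFrame, column names, and `shout_fn` helper are illustrative only, and running the Spark part requires pyspark plus a local JVM:

```python
# Plain-Python transformation that we will also wrap as a Spark UDF below.
def shout_fn(s):
    return s.upper() + "!"

try:
    from pyspark.sql import SparkSession, functions as F
    from pyspark.sql.types import StringType

    spark = SparkSession.builder.master("local[1]").appName("apply-fn").getOrCreate()
    df = spark.createDataFrame([("alice",), ("bob",)], ["name"])

    # 1) Built-in column function: runs inside the JVM, usually the fastest option.
    df = df.withColumn("name_upper", F.upper(F.col("name")))

    # 2) User-defined function: arbitrary Python, but pays a per-row
    #    serialization cost between the JVM and the Python workers.
    shout = F.udf(shout_fn, StringType())
    df = df.withColumn("name_shout", shout(F.col("name")))
    df.show()
except ImportError:
    pass  # pyspark not installed; shout_fn above still works standalone
```

Preferring built-in functions over UDFs is the usual performance advice, since UDFs are opaque to the Catalyst optimizer.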


Run Spark applications using Airflow

Run Spark Job in existing EMR using AIRFLOW

Amazon EMR / Raj

In this post, we will see how you can run a Spark application on an existing EMR cluster using Apache Airflow. The most basic way of scheduling jobs in EMR is crontab, but if you have worked with crontab you know how much of a pain it is to manage and secure. I will not talk in depth […]
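A hedged sketch of the Airflow pieces involved: a spark-submit step added to an already-running EMR cluster, then a sensor that waits for it. The cluster id, DAG id, and script path are placeholders, and the operators come from the Amazon provider package:

```python
# EMR step definition: run spark-submit through command-runner.jar
# (placeholder S3 script path).
SPARK_STEP = [{
    "Name": "run-spark-app",
    "ActionOnFailure": "CONTINUE",
    "HadoopJarStep": {
        "Jar": "command-runner.jar",
        "Args": ["spark-submit", "--deploy-mode", "cluster",
                 "s3://my-bucket/scripts/my_app.py"],
    },
}]

try:
    from airflow import DAG
    from airflow.providers.amazon.aws.operators.emr import EmrAddStepsOperator
    from airflow.providers.amazon.aws.sensors.emr import EmrStepSensor
    import pendulum

    with DAG("spark_on_existing_emr",
             start_date=pendulum.datetime(2024, 1, 1),
             schedule=None, catchup=False) as dag:
        add_step = EmrAddStepsOperator(
            task_id="add_step",
            job_flow_id="j-ABCDEFGHIJKLM",  # existing cluster id (placeholder)
            steps=SPARK_STEP)
        wait = EmrStepSensor(
            task_id="wait_for_step",
            job_flow_id="j-ABCDEFGHIJKLM",
            step_id="{{ task_instance.xcom_pull('add_step')[0] }}")
        add_step >> wait
except ImportError:
    pass  # requires apache-airflow and the amazon provider package
```

Unlike crontab, the DAG gives you retries, alerting, and a visible run history for free.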



Topics

  • Amazon EMR
  • Apache HIVE
  • Apache Spark
  • AWS Glue
  • PySpark
  • SQL on Hadoop

Recent Posts

  • AWS Glue create dynamic frame
  • AWS Glue read files from S3
  • How to check Spark run logs in EMR
  • PySpark apply function to column
  • Run Spark Job in existing EMR using AIRFLOW

Join the discussion

  1. Ramkumar on Spark Performance Tuning with help of Spark UI, February 3, 2025

    Great. Keep writing more articles.

  2. Raj on Free Online SQL to PySpark Converter, August 9, 2022

    Thank you for sharing this. I will give it a try as well.

  3. John K-W on Free Online SQL to PySpark Converter, August 8, 2022

    Might be interesting to add a PySpark dialect to SQLglot https://github.com/tobymao/sqlglot https://github.com/tobymao/sqlglot/tree/main/sqlglot/dialects

  4. Meena M on Spark Dataframe WHEN case, July 28, 2022

    try something like df.withColumn("type", when(col("flag1"), lit("type_1")).when(!col("flag1") && (col("flag2") || col("flag3") || col("flag4") || col("flag5")), lit("type2")).otherwise(lit("other")))

  5. tagu on Free Online SQL to PySpark Converter, July 20, 2022

    It will be great if you can have a link to the convertor. It helps the community for anyone starting…

Copyright © 2025 SQL & Hadoop