SQL & Hadoop

SQL on Hadoop with Hive, Spark & PySpark on EMR & AWS Glue

Category: Amazon EMR

How to check Spark run logs in EMR

Situation: Someone on my team ran a Spark application on EMR and the job failed. The user is new to EMR and does not know how to check the Spark logs. Now he has asked me to debug […]

Amazon EMR | By Raj | 0 comments

Run Spark Job in existing EMR using AIRFLOW

In this post, we will see how to run a Spark application on an existing EMR cluster using Apache Airflow. The most basic way of scheduling jobs on EMR is crontab. But if you have worked with crontab, you know how […]

Amazon EMR | By Raj | 0 comments

[EMR] 5 settings for better Spark environment

I have been working with Spark for many years now. I started out on on-premises Hadoop clusters using CDH or HDP. In the past few years, I have been working a lot on EMR, primarily for Spark or […]

Amazon EMR | By Raj | 0 comments

Namenode is in safe mode – Hadoop

The most common reason for the namenode to go into safe mode is under-replicated blocks. This is generally caused by storage issues on HDFS, or by jobs such as Spark applications being aborted suddenly and leaving behind temp files which are […]

Amazon EMR | By Raj | 0 comments

EMR – No space left on device [Solved]

I was recently working on EMR running some PySpark jobs and I encountered a “No space left on device” error. The error seems obvious: the system has run out of storage space and requires some cleanup. […]

Amazon EMR | By Raj | 0 comments

© 2023 SQL & Hadoop.