SQL & Hadoop

SQL on Hadoop with Hive, Spark & PySpark on EMR & AWS Glue

Apache Spark

Spark Dataframe LIKE NOT LIKE RLIKE

7 Comments / Apache Spark / Raj

The LIKE condition is used in situations where you don't know the exact value or you are looking for a specific word pattern in the output. LIKE works just as it does in SQL and can be used to specify a pattern in WHERE/FILTER or even in JOIN conditions. Spark LIKE: let's see an example to find out […]
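
A minimal PySpark sketch of these patterns (the DataFrame and the "name" column below are hypothetical, not from the post):

from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.appName("like-demo").getOrCreate()
df = spark.createDataFrame([("Alice",), ("Alan",), ("Bob",)], ["name"])

df.filter(col("name").like("Al%")).show()       # LIKE: names starting with "Al"
df.filter(~col("name").like("Al%")).show()      # NOT LIKE: negate with ~
df.filter(col("name").rlike("^A.*n$")).show()   # RLIKE: regular-expression match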


Apache Spark

Spark Dataframe IN-ISIN-NOT IN

Leave a Comment / Apache Spark / Raj

IN and NOT IN conditions are used in FILTER/WHERE, or even in JOINs, when we have to specify multiple possible values for a column. If the value is one of the values mentioned inside the "IN" clause, it qualifies. It is the opposite for "NOT IN", where the value must not be among any of the listed values. […]
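
In the DataFrame API this maps to the isin() column method; a minimal sketch (the DataFrame and values are hypothetical):

from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([("Alice",), ("Bob",), ("Carol",)], ["name"])

df.filter(col("name").isin("Alice", "Bob")).show()    # IN: value is in the list
df.filter(~col("name").isin("Alice", "Bob")).show()   # NOT IN: negate with ~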


Apache Spark

Spark Dataframe Filter

Leave a Comment / Apache Spark / Raj

As the name suggests, the Spark DataFrame FILTER is used in Spark SQL to filter out records as per the requirement. If you do not want the complete data set and just wish to fetch a few records that satisfy some condition, you can use the FILTER function. It is equivalent to the SQL "WHERE" clause and is more […]
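
A minimal PySpark sketch of filter() (the DataFrame and condition below are hypothetical):

from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([("Alice", 34), ("Bob", 29)], ["name", "age"])

df.filter(col("age") > 30).show()    # DataFrame API, like SQL WHERE age > 30
df.where("age > 30").show()          # where() is an alias and also accepts a SQL string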


Apache Spark

SPARK Dataframe Alias AS

3 Comments / Apache Spark / Raj

ALIAS is defined in order to make column or table names more readable or even shorter. If you wish to rename your columns while displaying them to the user, or if you are using tables in joins, then you may need an alias for the table names. Other than making column names or table names […]
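
A minimal PySpark sketch of column and table aliases (the DataFrames and column names below are hypothetical):

from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.getOrCreate()
people = spark.createDataFrame([(1, "Alice")], ["id", "name"])
orders = spark.createDataFrame([(1, 99.0)], ["person_id", "amount"])

people.select(col("name").alias("customer_name")).show()          # column alias
people.alias("p").join(orders.alias("o"),
                       col("p.id") == col("o.person_id")).show()  # table aliases in a join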


Apache Spark

Spark Dataframe Select

1 Comment / Apache Spark / Raj

In this post, we will see how to fetch data from a HIVE table into a SPARK DataFrame and perform a few SQL-like "SELECT" operations on it. I have a table in a HIVE database which has details of all the US Presidents (src: https://en.wikipedia.org/). If you don't know how to create a table in Hive or load data […]
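
A minimal PySpark sketch of reading a Hive table and selecting from it (the table and column names below are hypothetical, not the post's actual presidents schema):

from pyspark.sql import SparkSession

# Hive support must be enabled so spark.table() can see the Hive metastore.
spark = SparkSession.builder.enableHiveSupport().getOrCreate()

df = spark.table("default.us_presidents")    # hypothetical Hive table
df.select("name", "term_start").show(5)      # SQL-like SELECT of two columns
df.createOrReplaceTempView("presidents")
spark.sql("SELECT name FROM presidents LIMIT 5").show()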



Topics

  • Amazon EMR
  • Apache HIVE
  • Apache Spark
  • AWS Glue
  • PySpark
  • SQL on Hadoop

Recent Posts

  • AWS Glue create dynamic frame
  • AWS Glue read files from S3
  • How to check Spark run logs in EMR
  • PySpark apply function to column
  • Run Spark Job in existing EMR using AIRFLOW


Join the discussion

  1. Ramkumar on Spark Performance Tuning with help of Spark UI (February 3, 2025)

    Great. Keep writing more articles.

  2. Raj on Free Online SQL to PySpark Converter (August 9, 2022)

    Thank you for sharing this. I will give it a try as well.

  3. John K-W on Free Online SQL to PySpark Converter (August 8, 2022)

    Might be interesting to add a PySpark dialect to SQLglot https://github.com/tobymao/sqlglot https://github.com/tobymao/sqlglot/tree/main/sqlglot/dialects

  4. Meena M on Spark Dataframe WHEN case (July 28, 2022)

    try something like df.withColumn("type", when(col("flag1"), lit("type_1")).when(!col("flag1") && (col("flag2") || col("flag3") || col("flag4") || col("flag5")), lit("type2")).otherwise(lit("other")))

  5. tagu on Free Online SQL to PySpark Converter (July 20, 2022)

    It will be great if you can have a link to the convertor. It helps the community for anyone starting…

Copyright © 2025 SQL & Hadoop