SQL & Hadoop

Learn SQL on Hadoop with examples
  • Home
  • SPARK-SQL Dataframe
  • Privacy Policy
  • About

Apache Spark

Spark Performance Tuning with help of Spark UI

  • by Raj
  • July 3, 2020
  • Apache Spark

Spark is a distributed data processing engine that relies heavily on the memory available for computation. Also, if you have worked on Spark, then you must…

PySpark

PySpark – Convert SQL queries to Dataframe

  • by Raj
  • February 7, 2020 (updated November 4, 2020)
  • PySpark

In PySpark, you can run dataframe commands or, if you are comfortable with SQL, you can run SQL queries too. In this post, we…

Apache Spark

Problem with Decimal Rounding & solution

  • by Raj
  • February 7, 2020 (updated August 23, 2020)
  • Apache Spark

If you migrate from one RDBMS platform to another, one technical challenge you may face is that decimal rounding differs between the two platforms. I was…

Apache Spark

Never run INSERT OVERWRITE again – try Hadoop Distcp

  • by Raj
  • February 5, 2020 (updated August 23, 2020)
  • Apache Spark

Recently, I was working on a project where the ETL requirement was to keep a daily snapshot of a table. It was 15+ years old data…

Apache Hadoop

Columnar Storage & why you must use it

  • by Raj
  • February 4, 2020 (updated April 17, 2020)
  • 1 Comment
  • SQL on Hadoop

If you are working on Hadoop or any other platform and storing structured data, I am sure you have heard about columnar storage types.…

PySpark

PySpark RDD operations – Map, Filter, SortBy, reduceByKey, Joins

  • by Raj
  • July 29, 2019 (updated August 23, 2020)
  • PySpark

In the last post, we discussed basic operations on RDDs in PySpark. In this post, we will see other common operations one can perform…

© 2021 sqlandhadoop.com