Blog – Page 5 – SQL & Hadoop

Spark Performance Tuning with help of Spark UI

Spark is a distributed data processing engine that relies heavily on the memory available for computation. If you have worked with Spark, you have probably faced job, task, or stage failures caused by memory issues. That makes memory management one of the key techniques for running an efficient Spark environment. In this post, we will see how Spark […]


PySpark - Convert SQL queries to DataFrame

Problem with Decimal Rounding & solution

Never run INSERT OVERWRITE again – try Hadoop Distcp

Columnar Storage & why you must use it

2 Comments / SQL on Hadoop / Raj

If you work on Hadoop or any other platform that stores structured data, you have almost certainly heard of columnar storage formats. The popularity that “columnar” has gained over the past 7-8 years suggests that the buzz is not a bubble and that, from a storage perspective, this is the future of data analytics. What […]


Ramkumar on Spark Performance Tuning with help of Spark UI, February 3, 2025
Great. Keep writing more articles.
Raj on Free Online SQL to PySpark Converter, August 9, 2022
Thank you for sharing this. I will give it a try as well.
John K-W on Free Online SQL to PySpark Converter, August 8, 2022
Might be interesting to add a PySpark dialect to SQLglot https://github.com/tobymao/sqlglot https://github.com/tobymao/sqlglot/tree/main/sqlglot/dialects
Meena M on Spark Dataframe WHEN case, July 28, 2022
try something like df.withColumn("type", when(col("flag1"), lit("type_1")).when(!col("flag1") && (col("flag2") || col("flag3") || col("flag4") || col("flag5")), lit("type2")).otherwise(lit("other")))
tagu on Free Online SQL to PySpark Converter, July 20, 2022
It would be great if you could add a link to the converter. It helps the community for anyone starting…