Home

Welcome to my website. I am Nitin Srivastava, a Data Engineer by profession with 15+ years of experience. I have worked with multiple enterprises, using various technologies to support their Data Analytics requirements.

As a Data Engineer, my primary skill has always been SQL. So when I started working on Hadoop projects, I was excited to explore the different SQL options available in it. I worked a lot on Apache Hive & Apache Spark.

In the early days of Hadoop, enterprises invested heavily in on-premises Hadoop infrastructure. This gave me the opportunity to work on the Hortonworks, Cloudera & MapR distributions.

From all that experience, enterprises realised that Apache Spark was the best bet, and Spark turned out to be the best thing to come out of that era. Today Spark is widely used by enterprises for a wide range of data analytics requirements.

A few years later, I got the opportunity to work on Apache Spark/Hive on the AWS platform, primarily leveraging AWS Glue & Amazon EMR.


Nitin Srivastava


Can you relate to my work experience? Wish to know more about me? Then check the ABOUT page.

Get started on Apache Spark with these free resources


SQL to PySpark Convertor


Do you want to convert SQL into PySpark DataFrame code?

I created this utility as a weekend project. It converts basic SQL queries into PySpark code.

I have shared the code used for the project, and you are free to use it and customise it to your requirements.
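To give a flavour of what such a conversion looks like, here is a minimal sketch in Python. It is not the code behind the utility: the function name `sql_to_pyspark` is hypothetical, and it only handles the simplest `SELECT ... FROM ... [WHERE ...]` shape (no joins, aggregates, or double quotes inside the condition). It emits PySpark code as a string, so it runs without Spark installed.

```python
import re

# Hypothetical, minimal translator for: SELECT <cols> FROM <table> [WHERE <condition>]
_PATTERN = re.compile(
    r"^\s*select\s+(?P<cols>.+?)\s+from\s+(?P<table>[\w.]+)"
    r"(?:\s+where\s+(?P<where>.+?))?\s*;?\s*$",
    re.IGNORECASE | re.DOTALL,
)


def sql_to_pyspark(sql: str, df_name: str = "df") -> str:
    """Translate a very basic SELECT statement into PySpark DataFrame code."""
    match = _PATTERN.match(sql)
    if match is None:
        raise ValueError("only SELECT ... FROM ... [WHERE ...] is supported")

    code = f'{df_name} = spark.table("{match.group("table")}")'

    # Filter before selecting, so the condition may use columns
    # that are not part of the final projection.
    where = match.group("where")
    if where:
        # DataFrame.filter accepts a SQL expression string as-is.
        code += f'.filter("{where.strip()}")'

    cols = [c.strip() for c in match.group("cols").split(",")]
    if cols != ["*"]:
        col_args = ", ".join(f'"{c}"' for c in cols)
        code += f".select({col_args})"
    return code


print(sql_to_pyspark("SELECT id, name FROM customers WHERE country = 'IN'"))
# → df = spark.table("customers").filter("country = 'IN'").select("id", "name")
```

A real converter would parse SQL into a syntax tree rather than use a regular expression; the regex keeps the sketch short.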




Spark Memory Configuration

Spark Memory Configuration Generator

I created this utility while learning about Spark memory management and how to optimise Spark memory usage.

Try this utility to generate an optimised memory configuration for your Spark application.
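As an illustration of how such a configuration can be derived, here is a sketch of one widely quoted sizing heuristic for Spark on YARN. The assumptions are mine and not necessarily what the utility uses: 5 cores per executor, 1 core and 1 GB per node reserved for the OS and Hadoop daemons, and one executor slot left for the YARN ApplicationMaster. The 10% overhead with a 384 MiB minimum mirrors Spark's default for `spark.executor.memoryOverhead`. The names `suggest_config` and `SparkMemoryConfig` are hypothetical.

```python
import math
from dataclasses import dataclass

EXECUTOR_CORES = 5        # assumption: common rule of thumb, not a Spark default
OVERHEAD_FACTOR = 0.10    # Spark's default overhead: 10% of executor memory
MIN_OVERHEAD_MB = 384     # Spark's minimum executor memory overhead


@dataclass
class SparkMemoryConfig:
    num_executors: int
    executor_cores: int
    executor_memory_gb: int
    memory_overhead_mb: int

    def as_conf(self) -> dict:
        """Render as Spark configuration properties."""
        return {
            "spark.executor.instances": str(self.num_executors),
            "spark.executor.cores": str(self.executor_cores),
            "spark.executor.memory": f"{self.executor_memory_gb}g",
            "spark.executor.memoryOverhead": f"{self.memory_overhead_mb}m",
        }


def suggest_config(nodes: int, cores_per_node: int, mem_per_node_gb: int) -> SparkMemoryConfig:
    # Assumption: reserve 1 core and 1 GB per node for the OS and daemons.
    usable_cores = cores_per_node - 1
    usable_mem_gb = mem_per_node_gb - 1

    executors_per_node = usable_cores // EXECUTOR_CORES
    if executors_per_node < 1:
        raise ValueError("not enough cores per node for the assumed executor size")

    # Assumption: leave one executor slot for the YARN ApplicationMaster.
    num_executors = executors_per_node * nodes - 1

    # Each container must hold executor memory plus overhead, so size the
    # executor memory such that memory * (1 + factor) fits in the container.
    container_gb = usable_mem_gb / executors_per_node
    executor_memory_gb = math.floor(container_gb / (1 + OVERHEAD_FACTOR))
    # Round the overhead up and never go below Spark's 384 MiB minimum.
    overhead_mb = max(MIN_OVERHEAD_MB, math.ceil(executor_memory_gb * 1024 * OVERHEAD_FACTOR))

    return SparkMemoryConfig(num_executors, EXECUTOR_CORES, executor_memory_gb, overhead_mb)


print(suggest_config(nodes=10, cores_per_node=16, mem_per_node_gb=64).as_conf())
```

For 10 nodes with 16 cores and 64 GB each, this gives 29 executors with 5 cores, 19g of executor memory and 1946m of overhead. Other heuristics (fewer, larger executors, or different reservations) are equally valid starting points and should be validated against the actual workload.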




Spark JDBC Connection

Spark JDBC Connection String Generator

Do you connect Spark to different RDBMSs via JDBC?

Then this utility will help you quickly generate Spark JDBC connection strings for importing & exporting data.
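For reference, here is a minimal sketch of what a JDBC connection string and the matching Spark JDBC options look like for a few common databases. The helper `jdbc_options` and the host, database and credential values are hypothetical; the URL formats, driver class names and the `url`/`dbtable`/`user`/`password`/`driver` option keys are the standard ones, but check them against your driver version. For Oracle, the `database` argument is treated as a service name.

```python
# Hypothetical helper: JDBC URL template, driver class, default port per database.
JDBC_TEMPLATES = {
    "postgresql": ("jdbc:postgresql://{host}:{port}/{database}", "org.postgresql.Driver", 5432),
    "mysql": ("jdbc:mysql://{host}:{port}/{database}", "com.mysql.cj.jdbc.Driver", 3306),
    "sqlserver": ("jdbc:sqlserver://{host}:{port};databaseName={database}",
                  "com.microsoft.sqlserver.jdbc.SQLServerDriver", 1433),
    # For Oracle the "database" is the service name (thin driver syntax).
    "oracle": ("jdbc:oracle:thin:@//{host}:{port}/{database}", "oracle.jdbc.OracleDriver", 1521),
}


def jdbc_options(db_type, host, database, table, user, password, port=None):
    """Build the option dict for Spark's JDBC data source."""
    try:
        template, driver, default_port = JDBC_TEMPLATES[db_type]
    except KeyError:
        raise ValueError(f"unsupported database type: {db_type}") from None
    url = template.format(host=host, port=port or default_port, database=database)
    return {"url": url, "dbtable": table, "user": user, "password": password, "driver": driver}


opts = jdbc_options("postgresql", "db.example.com", "sales", "public.orders", "spark_user", "secret")
print(opts["url"])  # → jdbc:postgresql://db.example.com:5432/sales

# With a live SparkSession and the JDBC driver jar on the classpath:
#   df = spark.read.format("jdbc").options(**opts).load()               # import
#   df.write.format("jdbc").options(**opts).mode("append").save()       # export
```

In practice the password would come from a secrets store rather than being hard-coded.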




PySpark

PySpark Cheat Sheet

Starting with PySpark? Check out this PySpark Cheat Sheet to get started quickly.


Check my blog post list

On this website I share my experience with SQL on the Hadoop platform, with posts about Apache Hive, Apache Spark, PySpark, Amazon EMR & AWS Glue.

Apache Hive

Apache Hive Basics:

Apache Hive Date/Timestamp

Apache Hive Table Design

Apache Spark
Apache Spark Basics

  • spark_major_version
  • spark.sql.optimizer.maxiterations
  • spark recursive query
  • spark sql round
  • spark performance tuning
  • spark dynamicallocation enabled
  • spark executor cores
  • spark configuration
  • spark insert overwrite

Apache Spark Dataframe

  • spark select
  • spark alias
  • spark dataframe filter
  • spark isin
  • spark rlike
  • spark case when
  • spark dataframe orderby
  • spark replace
  • spark concat
  • spark drop duplicates
  • spark join
  • spark update column value
  • spark aggregate functions
  • spark union
  • spark column to list
  • spark show
  • spark explode
  • spark dataframe null value
  • monotonically_increasing_id
  • spark repartition
  • spark add multiple columns

Apache Spark JDBC

  • spark rdbms
  • spark jdbc
  • spark connection string generator

PySpark

PySpark Basics

PySpark Dataframe

PySpark Date

Amazon EMR
Amazon EMR – Basics

  • emr no space left on device
  • name node is in safe mode
  • aws emr spark tutorial
  • airflow spark submit example
  • emr logs

AWS Glue
AWS Glue Basics

  • aws glue tutorial
  • glue read from s3
  • glue dynamic frame