Welcome to my website. I am Nitin Srivastava, a Data Engineer by profession with 15+ years of professional experience. I have worked with multiple enterprises, using various technologies to support their data analytics requirements.

As a Data Engineer, my primary skill has always been SQL. So when I started working on Hadoop projects, I was excited to explore the different SQL options available in the Hadoop ecosystem. I worked a lot on Apache Hive & Apache Spark.

In the early days of Hadoop, enterprises invested heavily in on-premises Hadoop infrastructure, so I got the opportunity to work on the Hortonworks, Cloudera & MapR distributions.

From all that experience, enterprises realised that Apache Spark was the best bet. Apache Spark turned out to be the best thing to come out of that era, and it is now widely used by enterprises for a range of data analytics requirements.

After a few years, I got the opportunity to work on Apache Spark/Hive on the AWS platform, primarily leveraging AWS Glue & Amazon EMR.

Nitin Srivastava

Get started with Apache Spark using these free resources

SQL to PySpark Converter

Do you want to convert SQL into PySpark DataFrame code?

I created this utility as a weekend project. With it, I was able to convert basic SQL queries into PySpark code.

I have shared the code used for the project, and you are free to use it and customise it as per your requirements.
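To give a feel for what converting basic SQL into PySpark involves, here is a toy Python sketch. It is not the code behind the utility: the hypothetical helper `sql_to_pyspark` only handles queries of the form `SELECT <columns> FROM <table> [WHERE <condition>]`, and it returns PySpark code as a string built from `spark.table`, `DataFrame.filter` (which accepts a SQL expression string) and `DataFrame.select`.

```python
import re

# Hypothetical toy translator (not the utility's code). Supports only:
#   SELECT <cols> FROM <table> [WHERE <condition>]
_QUERY = re.compile(
    r"^\s*select\s+(?P<cols>.+?)\s+from\s+(?P<table>\w+)"
    r"(?:\s+where\s+(?P<where>.+?))?\s*;?\s*$",
    re.IGNORECASE | re.DOTALL,
)


def sql_to_pyspark(sql: str) -> str:
    """Return PySpark DataFrame code (as a string) for a basic SELECT query."""
    match = _QUERY.match(sql)
    if match is None:
        raise ValueError("only SELECT ... FROM ... [WHERE ...] is supported")

    code = f'spark.table("{match["table"]}")'
    # WHERE maps to DataFrame.filter(), which accepts a SQL expression string
    if match["where"]:
        code += f'.filter("{match["where"].strip()}")'
    # SELECT * needs no projection; otherwise map the column list to select()
    cols = [c.strip() for c in match["cols"].split(",")]
    if cols != ["*"]:
        code += ".select(" + ", ".join(f'"{c}"' for c in cols) + ")"
    return code


print(sql_to_pyspark("SELECT id, name FROM employees WHERE salary > 5000"))
# spark.table("employees").filter("salary > 5000").select("id", "name")
```

Applying `filter` before `select` keeps the WHERE clause valid even when it references columns that are not in the SELECT list. Anything beyond this pattern (joins, GROUP BY, expressions containing commas or double quotes) needs a real SQL parser.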

Spark Memory Configuration Generator

I created this utility when I was learning about Spark memory management and how to optimise it.

Try this utility to generate an optimised Spark memory configuration for your Spark application.
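One widely used rule of thumb for sizing executors can be sketched as follows. This is an illustration built on assumptions, not the logic of the generator: the hypothetical function `size_executors` reserves 1 core and 1 GB per node for the OS and cluster daemons, uses 5 cores per executor, sets the overhead to Spark's default of 10% of executor memory (minimum 384 MiB), and leaves one executor slot for the YARN ApplicationMaster.

```python
from dataclasses import dataclass

OVERHEAD_PERCENT = 10   # Spark's default overhead: 10% of executor memory...
MIN_OVERHEAD_MB = 384   # ...but never less than 384 MiB


@dataclass
class ExecutorPlan:
    num_executors: int
    executor_cores: int
    executor_memory_mb: int
    memory_overhead_mb: int

    def as_conf(self) -> dict:
        """Render the plan as standard Spark configuration properties."""
        return {
            "spark.executor.instances": str(self.num_executors),
            "spark.executor.cores": str(self.executor_cores),
            "spark.executor.memory": f"{self.executor_memory_mb}m",
            "spark.executor.memoryOverhead": f"{self.memory_overhead_mb}m",
        }


def size_executors(nodes: int, cores_per_node: int, mem_per_node_gb: int,
                   cores_per_executor: int = 5) -> ExecutorPlan:
    # Assumption: reserve 1 core and 1 GB per node for the OS and cluster daemons
    usable_cores = cores_per_node - 1
    usable_mem_mb = (mem_per_node_gb - 1) * 1024

    executors_per_node = usable_cores // cores_per_executor
    if executors_per_node < 1:
        raise ValueError("not enough cores per node for a single executor")

    # Each executor's container must hold its heap plus the overhead
    container_mb = usable_mem_mb // executors_per_node
    heap_mb = container_mb * 100 // (100 + OVERHEAD_PERCENT)
    overhead_mb = max(MIN_OVERHEAD_MB, -(-(heap_mb * OVERHEAD_PERCENT) // 100))
    if heap_mb + overhead_mb > container_mb:
        # The 384 MiB floor applies, so shrink the heap to fit
        heap_mb = container_mb - overhead_mb
    if heap_mb <= 0:
        raise ValueError("not enough memory per node for an executor")

    # Assumption: leave one executor slot for the YARN ApplicationMaster
    num_executors = max(1, executors_per_node * nodes - 1)
    return ExecutorPlan(num_executors, cores_per_executor, heap_mb, overhead_mb)


plan = size_executors(nodes=10, cores_per_node=16, mem_per_node_gb=64)
for key, value in plan.as_conf().items():
    print(f"--conf {key}={value}")
```

For 10 nodes with 16 cores and 64 GB each, this gives 29 executors with 5 cores, 19549m of heap and 1955m of overhead each. The property names are standard Spark settings, so the printed lines can be passed to `spark-submit` as they are.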

SQL JDBC Connection String Generator

Do you connect Spark to different RDBMSs via JDBC?

Then this utility will help you quickly generate Spark JDBC connection strings for importing & exporting data.

PySpark Cheat Sheet

Starting with PySpark? Check out this PySpark Cheat Sheet to help you get started quickly.

Check my blog post list

On this website, I share my experience with SQL on the “Hadoop” platform. I post about Apache Hive, Apache Spark, PySpark, Amazon EMR & AWS Glue.