Welcome to my website. I am Nitin Srivastava, a Data Engineer with 15+ years of professional experience. I have worked with multiple enterprises, using various technologies to support their data analytics requirements.
As a Data Engineer, my primary skill has always been SQL. So when I started working on Hadoop projects, I was excited to explore the different SQL options available on it. I worked extensively with Apache Hive & Apache Spark.
In the early days of Hadoop, enterprises invested heavily in on-premises Hadoop infrastructure, so I got the opportunity to work on the Hortonworks, Cloudera & MapR distributions.
Out of all that experience, enterprises realised that Apache Spark was the best bet, and it turned out to be the best thing to come out of that era. Spark is now widely used by enterprises for a range of data analytics requirements.
After a few years, I got the opportunity to work with Apache Spark and Hive on the AWS platform, primarily using AWS Glue & Amazon EMR.
Can you relate to my work experience? Wish to know more about me? Then check the ABOUT page.
Get started on Apache Spark with these free resources
SQL to PySpark Convertor
Do you want to convert SQL into PySpark DataFrame code?
I created this utility as a weekend project. It can convert basic SQL queries into PySpark code.
I have shared the code used for the project, and you are free to use it and customise it as per your requirements.
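To give a feel for the idea, here is a minimal sketch of such a conversion. It is not the code behind the utility: the function name `sql_to_pyspark` and the regex are my own assumptions for illustration, and it only handles `SELECT <columns> FROM <table> [WHERE <condition>]`. It returns the PySpark code as a string, so it runs without a Spark installation.

```python
import re

# Hypothetical helper for illustration only (not the utility's actual code).
# Supports just: SELECT <cols> FROM <table> [WHERE <condition>]
_SIMPLE_SELECT = re.compile(
    r"^\s*select\s+(?P<cols>.+?)\s+from\s+(?P<table>\w+)"
    r"(?:\s+where\s+(?P<cond>.+?))?\s*;?\s*$",
    re.IGNORECASE | re.DOTALL,
)


def sql_to_pyspark(sql: str) -> str:
    """Return PySpark DataFrame code (as a string) for a very simple SELECT."""
    match = _SIMPLE_SELECT.match(sql)
    if match is None:
        raise ValueError("only SELECT ... FROM ... [WHERE ...] is supported")

    # Naive split on commas: function calls with commas are not handled.
    cols = [c.strip() for c in match.group("cols").split(",")]

    code = f'spark.table("{match.group("table")}")'
    if match.group("cond"):
        # DataFrame.filter accepts a SQL expression string.
        # Conditions containing double quotes are not escaped in this sketch.
        code += f'.filter("{match.group("cond")}")'
    if cols != ["*"]:
        # selectExpr accepts plain column names as well as SQL expressions.
        args = ", ".join(f'"{c}"' for c in cols)
        code += f".selectExpr({args})"
    return code


print(sql_to_pyspark("SELECT id, name FROM customers WHERE age > 30"))
```

Joins, GROUP BY, subqueries and aliases would need a real SQL parser rather than a regex, which is where most of the effort in a converter goes.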
Spark Memory Configuration Generator
I created this utility while I was learning about Spark memory management and how to optimise it.
Try this utility to generate an optimised Spark memory configuration for your Spark application.
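As an illustration of the kind of calculation involved, here is a minimal sketch based on a commonly quoted rule of thumb. It is an assumption on my part and not necessarily what the utility does: reserve 1 core and 1 GB per node for the OS, use about 5 cores per executor, keep one executor slot for the driver / application master, and set aside max(384 MB, 10%) of each executor's share as memory overhead. The function name `suggest_executor_config` is hypothetical; the returned keys are standard Spark properties.

```python
def suggest_executor_config(nodes, cores_per_node, memory_gb_per_node,
                            cores_per_executor=5):
    """Rule-of-thumb executor sizing (a heuristic, not an official formula)."""
    usable_cores = cores_per_node - 1                  # 1 core left for OS/daemons
    usable_memory_mb = (memory_gb_per_node - 1) * 1024  # 1 GB left for OS/daemons

    executors_per_node = usable_cores // cores_per_executor
    if executors_per_node < 1:
        raise ValueError("not enough cores per node for one executor")

    # Keep one executor slot free for the driver / application master.
    total_executors = max(1, executors_per_node * nodes - 1)

    # Each executor's share of node memory, split into heap and overhead.
    share_mb = usable_memory_mb // executors_per_node
    overhead_mb = max(384, share_mb // 10)
    heap_mb = share_mb - overhead_mb

    return {
        "spark.executor.instances": str(total_executors),
        "spark.executor.cores": str(cores_per_executor),
        "spark.executor.memory": f"{heap_mb}m",
        "spark.executor.memoryOverhead": f"{overhead_mb}m",
    }


# Example cluster (made-up numbers): 10 nodes, 16 cores and 64 GB each.
for key, value in suggest_executor_config(10, 16, 64).items():
    print(f"--conf {key}={value}")
```

Treat the output as a starting point only; the right values depend on the workload, the cluster manager and how much memory the job actually needs.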
SQL JDBC Connection String Generator
Do you connect Spark to different RDBMS via JDBC?
Then this utility will help you quickly generate a Spark JDBC connection string for importing & exporting data.
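For reference, a minimal sketch of what such a generator might produce is shown below. It is not the utility's code: the helper `spark_jdbc_options` is hypothetical, only PostgreSQL and MySQL are covered, and the host, database, table and credentials in the example are made up. The option keys (`url`, `dbtable`, `user`, `password`, `driver`) are the ones Spark's JDBC data source expects.

```python
# URL template, driver class and default port for each supported database.
_JDBC_TEMPLATES = {
    "postgresql": ("jdbc:postgresql://{host}:{port}/{database}",
                   "org.postgresql.Driver", 5432),
    "mysql": ("jdbc:mysql://{host}:{port}/{database}",
              "com.mysql.cj.jdbc.Driver", 3306),
}


def spark_jdbc_options(rdbms, host, database, table, user, password, port=None):
    """Build the options dict for Spark's JDBC data source (hypothetical helper)."""
    try:
        template, driver, default_port = _JDBC_TEMPLATES[rdbms.lower()]
    except KeyError:
        raise ValueError(f"unsupported RDBMS: {rdbms}") from None

    url = template.format(host=host, port=port or default_port, database=database)
    return {
        "url": url,
        "dbtable": table,
        "user": user,
        "password": password,
        "driver": driver,
    }


# Made-up connection details for illustration.
opts = spark_jdbc_options("postgresql", "db.example.com", "sales",
                          "public.orders", "etl_user", "secret")
print(opts["url"])

# With a SparkSession named `spark` and the JDBC driver jar on the classpath:
#   df = spark.read.format("jdbc").options(**opts).load()                # import
#   df.write.format("jdbc").options(**opts).mode("append").save()       # export
```

In a real job the password would come from a secrets store rather than being hard-coded.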
PySpark Cheat Sheet
Starting with PySpark? Check this PySpark Cheat Sheet to help you get started quickly.
Check my blog post list
On this website I share my experience with SQL on the “Hadoop” platform. I write posts about Apache Hive, Apache Spark, PySpark, Amazon EMR & AWS Glue.
Apache Hive
Apache Hive Basics:
- hive sql tutorial
- hive variables
- hive partition
- hive select query
- hive distinct
- hive where
- hive subquery example
- hive between
- bucketized tables do not support
Apache Hive Date/Timestamp
Apache Hive Table Design
Apache Spark
Apache Spark Basics
Apache Spark Dataframe
Apache Spark JDBC
PySpark
PySpark Basics
- first pyspark script
- pyspark script
- zipwithindex
- convert from teradata to pyspark
- pyspark rdd operations
- pyspark map_filter
- pyspark md5 hash
- pyspark read csv
- pyspark read parquet
- sql to pyspark converter – Concept
- sql to dataframe conversion – Manual
- sql to pyspark converter – Automation
PySpark Dataframe
- pyspark distinct
- pyspark lowercase
- pyspark filter
- pyspark cheat sheet
- pyspark format number
- pyspark apply function to column
PySpark Date
Amazon EMR
Amazon EMR – Basics
AWS Glue
AWS Glue Basics