Recently, many people have reached out asking me to help them learn PySpark, so I thought of building a utility that converts SQL to PySpark code. I am sharing my weekend project with you: it attempts to convert an input SQL query into PySpark DataFrame code.
Feel free to use it and share your feedback. I am reading your feedback and comments and will release new versions of the utility based on them, so please do leave a comment.
Generate PySpark Code Automatically
It is almost impossible to cover every type of SQL, and this utility is no exception. Since this is a weekend project that I am still working on, the SQL coverage is not as broad as you or I would like. That said, here are some points to keep in mind while using the utility.
- The utility does not support JOINs or subqueries right now.
- When using aggregate functions, make sure to include a GROUP BY clause too.
- Use an alias for derived columns.
- Look at the sample query; similar SQL should convert to PySpark in the same way.
- I have tried to make sure the generated output is accurate; however, I recommend that you verify the results on your end too.
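To give a feel for the kind of SQL these points describe, here is a small toy converter written in plain Python. It is only a sketch and not the code behind the utility: the function names `sql_to_pyspark` and `split_columns` are hypothetical, it handles only `SELECT ... FROM ... [WHERE ...] [GROUP BY ...]`, and the string it produces assumes a `SparkSession` named `spark` and `from pyspark.sql import functions as F` at the point where you run it.

```python
import re

# Hypothetical toy converter for illustration; NOT the utility from this post.
# Supported shape: SELECT cols FROM table [WHERE cond] [GROUP BY cols]
_PATTERN = re.compile(
    r"^\s*SELECT\s+(?P<cols>.+?)\s+FROM\s+(?P<table>\w+)"
    r"(?:\s+WHERE\s+(?P<where>.+?))?"
    r"(?:\s+GROUP\s+BY\s+(?P<group>.+?))?\s*;?\s*$",
    re.IGNORECASE | re.DOTALL,
)


def split_columns(text):
    """Split a column list on commas that are not inside parentheses."""
    parts, depth, current = [], 0, []
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        if ch == "," and depth == 0:
            parts.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    parts.append("".join(current).strip())
    return parts


def sql_to_pyspark(sql):
    """Return a string of PySpark DataFrame code for a simple SQL query."""
    # Mirror the limitations listed above: no JOINs, no subqueries.
    if re.search(r"\bJOIN\b", sql, re.IGNORECASE):
        raise ValueError("JOINs are not supported")
    if len(re.findall(r"\bSELECT\b", sql, re.IGNORECASE)) > 1:
        raise ValueError("subqueries are not supported")

    match = _PATTERN.match(sql)
    if match is None:
        raise ValueError("unsupported SQL shape")

    table = match.group("table")
    where = match.group("where")
    group = match.group("group")
    cols = split_columns(match.group("cols"))

    code = f"spark.table({table!r})"
    if where:
        # DataFrame.filter accepts a SQL condition string.
        code += f".filter({where!r})"

    if group:
        keys = split_columns(group)
        lowered = {k.lower() for k in keys}
        # Every selected column that is not a grouping key is treated as an aggregate.
        aggs = [c for c in cols if c.lower() not in lowered]
        if not aggs:
            raise ValueError("GROUP BY query needs at least one aggregate")
        key_args = ", ".join(repr(k) for k in keys)
        agg_args = ", ".join(f"F.expr({a!r})" for a in aggs)
        code += f".groupBy({key_args}).agg({agg_args})"
    else:
        # selectExpr takes SQL expressions, so aliases carry over unchanged.
        code += ".selectExpr(" + ", ".join(repr(c) for c in cols) + ")"
    return code


print(sql_to_pyspark(
    "SELECT dept, sum(salary) AS total FROM emp WHERE salary > 1000 GROUP BY dept"
))
```

The sketch leans on `selectExpr` and `F.expr` so that SQL expressions and their aliases pass through as strings instead of being translated piece by piece. That keeps the example short, and it also shows why an alias on a derived column helps: without one, the resulting column name is whatever Spark derives from the expression.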