JOINs are used to retrieve data from more than one table or DataFrame. You can replicate almost all types of joins possible in any typical… Read More » Spark Dataframe JOINS – Only post you need to read
DISTINCT or dropDuplicates is used to remove duplicate rows from a DataFrame. A row consists of columns; if you are selecting only one column then output… Read More » Spark Dataframe – Distinct or Drop Duplicates
Recently I was working on a project in which the client's data warehouse was in Teradata. The requirement was to have something similar on Hadoop also… Read More » How to implement recursive queries in Spark?