In PySpark, you can do almost all the date operations you can think of using built-in functions. Let's jump straight into examples and look at them one by one.

Create a dataframe with sample date values:

>>> df_1 = spark.createDataFrame([('2019-02-20', '2019-10-18',)], ['start_dt', 'end_dt'])

Check the dataframe info:

>>> df_1
DataFrame[start_dt: string, end_dt: string]

The problem here is that the columns start_dt and end_dt are of type string, not date. So let's quickly convert them into date:

>>> df_2 = df_1.select(df_1.start_dt.cast('date'), df_1.end_dt.cast('date'))
>>> df_2
DataFrame[start_dt: date, end_dt: date]

Now we are good: we have a dataframe with the two columns start_dt and end_dt, both of datatype 'date'.
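If you are following along outside an interactive shell, here is a minimal, self-contained sketch of the same steps. The SparkSession setup and the to_date() alternative are assumptions added for illustration, not part of the original snippet:

from pyspark.sql import SparkSession
from pyspark.sql.functions import to_date, col

# Assumed setup: in the REPL examples above, `spark` already exists
spark = SparkSession.builder.appName("date-ops-demo").getOrCreate()

# Sample date values stored as strings
df_1 = spark.createDataFrame([('2019-02-20', '2019-10-18',)],
                             ['start_dt', 'end_dt'])

# Option 1: cast the string columns to date, as shown above
df_2 = df_1.select(df_1.start_dt.cast('date'),
                   df_1.end_dt.cast('date'))

# Option 2 (alternative, not from the original): to_date() does the
# same for ISO-formatted strings and also accepts a format argument,
# e.g. to_date(col('start_dt'), 'dd-MM-yyyy'), for other layouts
df_3 = df_1.select(to_date(col('start_dt')).alias('start_dt'),
                   to_date(col('end_dt')).alias('end_dt'))

df_2.printSchema()  # start_dt: date, end_dt: date

Casting with .cast('date') is the shortest route when the strings are already in yyyy-MM-dd form; reach for to_date() with an explicit format when they are not.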