In PySpark, you can handle almost any date operation you can think of using the built-in functions. Let's jump straight into examples and go through them one by one.
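All of the examples below assume a pyspark shell, where the spark session already exists; the date functions used throughout live in pyspark.sql.functions, so import them first:
>>> from pyspark.sql.functions import (date_format, current_date, date_add,
...     add_months, datediff, year, month, dayofmonth, date_sub, dayofyear,
...     last_day, months_between, round, next_day, quarter, trunc, weekofyear)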
Create a DataFrame with sample date values:
>>> df_1 = spark.createDataFrame([('2019-02-20','2019-10-18',)],['start_dt','end_dt'])
Check the DataFrame schema:
>>> df_1
DataFrame[start_dt: string, end_dt: string]
The problem here is that the columns start_dt and end_dt are of type string, not date. So let's quickly convert them to date.
>>> df_2 = df_1.select(df_1.start_dt.cast('date'),df_1.end_dt.cast('date'))
>>> df_2
DataFrame[start_dt: date, end_dt: date]
Now we are good: we have a DataFrame with two columns, start_dt and end_dt, both of type date. Let's run some date operations on it.
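A side note: the plain cast works here only because the strings are already in the default yyyy-MM-dd format. If your strings use some other format, to_date with an explicit pattern is the usual route; a minimal sketch, assuming a hypothetical dd-MM-yyyy input:
>>> from pyspark.sql.functions import to_date
>>> df_alt = spark.createDataFrame([('20-02-2019',)],['dt_str'])
>>> df_alt.select(to_date('dt_str','dd-MM-yyyy').alias('dt')).show()
+----------+
|        dt|
+----------+
|2019-02-20|
+----------+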
PySpark Change Date Format
>>> df_2.select("start_dt","end_dt",date_format("start_dt",'dd/MM/yyyy').alias("dt_format")).show()
+----------+----------+----------+
|  start_dt|    end_dt| dt_format|
+----------+----------+----------+
|2019-02-20|2019-10-18|20/02/2019|
+----------+----------+----------+
PySpark Fetch Current Date
>>> df_2.select("start_dt","end_dt",current_date().alias("cur_dt")).show()
+----------+----------+----------+
|  start_dt|    end_dt|    cur_dt|
+----------+----------+----------+
|2019-02-20|2019-10-18|2019-03-17|
+----------+----------+----------+
PySpark Add Days to date
>>> df_2.select("start_dt","end_dt",date_add("start_dt",2).alias("add_2_dt")).show()
+----------+----------+----------+
|  start_dt|    end_dt|  add_2_dt|
+----------+----------+----------+
|2019-02-20|2019-10-18|2019-02-22|
+----------+----------+----------+
Here we have added 2 days to start_dt. Just replace the number with however many days you want to add.
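date_add also accepts a negative count, which steps backwards instead; a quick sketch:
>>> df_2.select(date_add("start_dt",-2).alias("minus_2_dt")).show()
+----------+
|minus_2_dt|
+----------+
|2019-02-18|
+----------+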
PySpark Add Months to date
>>> df_2.select("start_dt","end_dt",add_months("start_dt",2).alias("add_2_mth")).show()
+----------+----------+----------+
|  start_dt|    end_dt| add_2_mth|
+----------+----------+----------+
|2019-02-20|2019-10-18|2019-04-20|
+----------+----------+----------+
Here we have added 2 months to start_dt. Just replace the number with however many months you want to add.
PySpark Add Years to date
>>> df_2.select("start_dt","end_dt",add_months("start_dt",2*12).alias("add_2_yrs")).show()
+----------+----------+----------+
|  start_dt|    end_dt| add_2_yrs|
+----------+----------+----------+
|2019-02-20|2019-10-18|2021-02-20|
+----------+----------+----------+
Here we have added 24 (2 x 12) months to start_dt. There is no dedicated add-years function, so multiply 12 by however many years you want to add.
PySpark Subtract 2 dates
>>> df_2.select("start_dt","end_dt",datediff("end_dt","start_dt").alias("sub_2_dt")).show()
+----------+----------+--------+
|  start_dt|    end_dt|sub_2_dt|
+----------+----------+--------+
|2019-02-20|2019-10-18|     240|
+----------+----------+--------+
datediff returns the number of days between two dates. Note the argument order: the end date comes first, then the start date.
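Swapping the arguments flips the sign, which is an easy way to check you have the order right:
>>> df_2.select("start_dt","end_dt",datediff("start_dt","end_dt").alias("sub_2_dt")).show()
+----------+----------+--------+
|  start_dt|    end_dt|sub_2_dt|
+----------+----------+--------+
|2019-02-20|2019-10-18|    -240|
+----------+----------+--------+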
PySpark Extract Year from Date
>>> df_2.select("start_dt","end_dt",year("start_dt").alias("ext_year")).show()
+----------+----------+--------+
|  start_dt|    end_dt|ext_year|
+----------+----------+--------+
|2019-02-20|2019-10-18|    2019|
+----------+----------+--------+
PySpark Extract Month from Date
>>> df_2.select("start_dt","end_dt",month("start_dt").alias("ext_month")).show()
+----------+----------+---------+
|  start_dt|    end_dt|ext_month|
+----------+----------+---------+
|2019-02-20|2019-10-18|        2|
+----------+----------+---------+
PySpark Extract Day from Date
>>> df_2.select("start_dt","end_dt",dayofmonth("start_dt").alias("ext_day")).show()
+----------+----------+-------+
|  start_dt|    end_dt|ext_day|
+----------+----------+-------+
|2019-02-20|2019-10-18|     20|
+----------+----------+-------+
PySpark Subtract days from date
>>> df_2.select("start_dt","end_dt",date_sub("start_dt",2).alias("sub_2_dt")).show()
+----------+----------+----------+
|  start_dt|    end_dt|  sub_2_dt|
+----------+----------+----------+
|2019-02-20|2019-10-18|2019-02-18|
+----------+----------+----------+
Here we have subtracted 2 days from start_dt.
PySpark Fetch Day of the year
>>> df_2.select("start_dt","end_dt",dayofyear("start_dt").alias("day_of_year")).show()
+----------+----------+-----------+
|  start_dt|    end_dt|day_of_year|
+----------+----------+-----------+
|2019-02-20|2019-10-18|         51|
+----------+----------+-----------+
This is useful if you want the ordinal day of the year (often loosely called the "Julian date").
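If you need the same value as a zero-padded string, the date_format day-of-year pattern D covers it; a sketch:
>>> df_2.select("start_dt",date_format("start_dt","DDD").alias("day_str")).show()
+----------+-------+
|  start_dt|day_str|
+----------+-------+
|2019-02-20|    051|
+----------+-------+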
PySpark Last Day of Month
>>> df_2.select("start_dt","end_dt",last_day("start_dt").alias("last_day")).show()
+----------+----------+----------+
|  start_dt|    end_dt|  last_day|
+----------+----------+----------+
|2019-02-20|2019-10-18|2019-02-28|
+----------+----------+----------+
PySpark Determine how many months between 2 Dates
>>> df_2.select("start_dt","end_dt",months_between("end_dt","start_dt").alias("mth_btw")).show()
+----------+----------+----------+
|  start_dt|    end_dt|   mth_btw|
+----------+----------+----------+
|2019-02-20|2019-10-18|7.93548387|
+----------+----------+----------+
If you want the number of months to be a whole number, round the result and cast it to an integer:
>>> df_2.select("start_dt","end_dt",round(months_between("end_dt","start_dt")).cast('int').alias("mth_btw")).show()
+----------+----------+-------+
|  start_dt|    end_dt|mth_btw|
+----------+----------+-------+
|2019-02-20|2019-10-18|      8|
+----------+----------+-------+
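If you would rather truncate than round, floor works the same way; a sketch (floor here is the pyspark.sql.functions version):
>>> from pyspark.sql.functions import floor
>>> df_2.select("start_dt","end_dt",floor(months_between("end_dt","start_dt")).alias("mth_btw")).show()
+----------+----------+-------+
|  start_dt|    end_dt|mth_btw|
+----------+----------+-------+
|2019-02-20|2019-10-18|      7|
+----------+----------+-------+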
PySpark Identify date of next Monday
>>> df_2.select("start_dt","end_dt",next_day("start_dt","Mon").alias("nxt_Mon")).show()
+----------+----------+----------+
|  start_dt|    end_dt|   nxt_Mon|
+----------+----------+----------+
|2019-02-20|2019-10-18|2019-02-25|
+----------+----------+----------+
Identify date of next Tuesday
>>> df_2.select("start_dt","end_dt",next_day("start_dt","Tue").alias("nxt_Tue")).show()
+----------+----------+----------+
|  start_dt|    end_dt|   nxt_Tue|
+----------+----------+----------+
|2019-02-20|2019-10-18|2019-02-26|
+----------+----------+----------+
Identify date of next Wednesday
>>> df_2.select("start_dt","end_dt",next_day("start_dt","Wed").alias("nxt_Wed")).show()
+----------+----------+----------+
|  start_dt|    end_dt|   nxt_Wed|
+----------+----------+----------+
|2019-02-20|2019-10-18|2019-02-27|
+----------+----------+----------+
Identify date of next Thursday
>>> df_2.select("start_dt","end_dt",next_day("start_dt","Thu").alias("nxt_Thu")).show()
+----------+----------+----------+
|  start_dt|    end_dt|   nxt_Thu|
+----------+----------+----------+
|2019-02-20|2019-10-18|2019-02-21|
+----------+----------+----------+
Identify date of next Friday
>>> df_2.select("start_dt","end_dt",next_day("start_dt","Fri").alias("nxt_Fri")).show()
+----------+----------+----------+
|  start_dt|    end_dt|   nxt_Fri|
+----------+----------+----------+
|2019-02-20|2019-10-18|2019-02-22|
+----------+----------+----------+
Identify date of next Saturday
>>> df_2.select("start_dt","end_dt",next_day("start_dt","Sat").alias("nxt_Sat")).show()
+----------+----------+----------+
|  start_dt|    end_dt|   nxt_Sat|
+----------+----------+----------+
|2019-02-20|2019-10-18|2019-02-23|
+----------+----------+----------+
Identify date of next Sunday
>>> df_2.select("start_dt","end_dt",next_day("start_dt","Sun").alias("nxt_Sun")).show()
+----------+----------+----------+
|  start_dt|    end_dt|   nxt_Sun|
+----------+----------+----------+
|2019-02-20|2019-10-18|2019-02-24|
+----------+----------+----------+
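Rather than running one query per weekday, you can also build all seven columns in a single select; a sketch using a list comprehension:
>>> days = ["Mon","Tue","Wed","Thu","Fri","Sat","Sun"]
>>> df_2.select("start_dt",*[next_day("start_dt",d).alias("nxt_"+d) for d in days]).show()
+----------+----------+----------+----------+----------+----------+----------+----------+
|  start_dt|   nxt_Mon|   nxt_Tue|   nxt_Wed|   nxt_Thu|   nxt_Fri|   nxt_Sat|   nxt_Sun|
+----------+----------+----------+----------+----------+----------+----------+----------+
|2019-02-20|2019-02-25|2019-02-26|2019-02-27|2019-02-21|2019-02-22|2019-02-23|2019-02-24|
+----------+----------+----------+----------+----------+----------+----------+----------+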
PySpark Fetch quarter of the year
>>> df_2.select("start_dt","end_dt",quarter("start_dt").alias("Quarter_of_Year")).show()
+----------+----------+---------------+
|  start_dt|    end_dt|Quarter_of_Year|
+----------+----------+---------------+
|2019-02-20|2019-10-18|              1|
+----------+----------+---------------+
PySpark Truncate Date to Year
>>> df_2.select("start_dt","end_dt",trunc("start_dt","year").alias("trunc_Year")).show()
+----------+----------+----------+
|  start_dt|    end_dt|trunc_Year|
+----------+----------+----------+
|2019-02-20|2019-10-18|2019-01-01|
+----------+----------+----------+
PySpark Truncate Date to Month
>>> df_2.select("start_dt","end_dt",trunc("start_dt","month").alias("trunc_Month")).show()
+----------+----------+-----------+
|  start_dt|    end_dt|trunc_Month|
+----------+----------+-----------+
|2019-02-20|2019-10-18| 2019-02-01|
+----------+----------+-----------+
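On newer Spark releases (3.0 and up), trunc also accepts formats such as 'quarter' and 'week'; truncating to 'week' gives the Monday of that week, which is handy for week-start calculations. A sketch, assuming one of those versions (on older releases an unsupported format returns null, so check the docs for your version):
>>> df_2.select("start_dt",trunc("start_dt","week").alias("week_start")).show()
+----------+----------+
|  start_dt|week_start|
+----------+----------+
|2019-02-20|2019-02-18|
+----------+----------+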
PySpark Fetch week of the Year
>>> df_2.select("start_dt","end_dt",weekofyear("start_dt").alias("week_of_year")).show()
+----------+----------+------------+
|  start_dt|    end_dt|week_of_year|
+----------+----------+------------+
|2019-02-20|2019-10-18|           8|
+----------+----------+------------+
If there is any other date operation you use that is missing here, let us know in the comments.
