In Spark, you can perform aggregate operations on a DataFrame. This is similar to SQL aggregate functions such as MAX, MIN, and SUM. We can also aggregate over groups of rows defined by specific columns, which is equivalent to the GROUP BY clause in typical SQL. Let's see it with some examples.

The first method we can use is "agg". To calculate count, max, min, and sum, we can use the syntax below:

scala> df_pres.agg(count($"pres_id"), min($"pres_id"), max($"pres_id"), sum($"pres_id")).show()
+--------------+------------+------------+------------+
|count(pres_id)|min(pres_id)|max(pres_id)|sum(pres_id)|
+--------------+------------+------------+------------+
|            45|           1|          45|        1035|
+--------------+------------+------------+------------+

Let's add alias names to the columns:

scala> df_pres.agg(count($"pres_id").as("count"), min($"pres_id").as("min"), max($"pres_id").as("max"), sum($"pres_id").as("sum")).show()
+-----+---+---+----+
|count|min|max| sum|
+-----+---+---+----+
|   45|  1| 45|1035|
+-----+---+---+----+

The second method is to use "agg" together with "groupBy", as sketched below.
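The article's own groupBy example falls beyond this excerpt, so here is a minimal sketch of the pattern. It assumes df_pres also has a grouping column, called pres_bs here; that column name is an illustration, not something shown above.

scala> df_pres.groupBy($"pres_bs").agg(count($"pres_id").as("cnt"), min($"pres_id").as("first_id"), max($"pres_id").as("last_id")).show()

Each row of the result would hold one distinct pres_bs value together with the count, minimum, and maximum of pres_id computed within that group, just as a SQL GROUP BY would return.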
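For completeness, a self-contained sketch (not from the article) of the same aggregates in a compiled Spark application follows. Unlike a spark-shell session, a standalone program must spell out the SparkSession setup and the function imports; the input file name and CSV options here are assumptions, since the excerpt does not show how df_pres was loaded.

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{count, min, max, sum}

object PresAgg {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("pres-agg").getOrCreate()
    import spark.implicits._  // enables the $"col" column syntax used below

    // Hypothetical source file; adjust the path and options to your data.
    val df_pres = spark.read
      .option("header", "true")
      .option("inferSchema", "true")
      .csv("presidents.csv")

    // Same aliased aggregates as in the spark-shell example above.
    df_pres.agg(
      count($"pres_id").as("count"),
      min($"pres_id").as("min"),
      max($"pres_id").as("max"),
      sum($"pres_id").as("sum")
    ).show()

    spark.stop()
  }
}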