As the name suggests, FILTER is used in Spark SQL to filter out records as per the requirement. If you do not want the complete data set and just wish to fetch the few records that satisfy some condition, then you can use the FILTER function. It is equivalent to the SQL "WHERE" clause and is commonly used in Spark SQL.
Let’s fetch all the presidents who were born in New York.
scala> df_pres.filter($"pres_bs" === "New York")
         .select($"pres_name",
                 $"pres_dob".alias("Date Of Birth"))
         .show()
From performance perspective, it is highly recommended to use FILTER at the beginning so that subsequent operations handle less volume of data. In the next post, we will see how to specify IN or NOT IN conditions in FILTER.
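The filter-early recommendation can be sketched as below. This is a minimal illustration, assuming a SparkSession named `spark` and a hypothetical `df_pres` DataFrame with the columns used above; it is not the exact code from this post.

```scala
// Assumes an active SparkSession called `spark`.
import spark.implicits._  // enables the $"colName" column syntax

// Filter FIRST so that select/orderBy operate on fewer rows.
val nyPresidents = df_pres
  .filter($"pres_bs" === "New York")   // narrow the data set up front
  .select($"pres_name", $"pres_dob")   // then project only the needed columns
  .orderBy($"pres_dob")                // subsequent steps handle less data

nyPresidents.show()
```

Because Spark's Catalyst optimizer can often push filters down automatically, the ordering may not always change the physical plan, but writing the filter first keeps the intent explicit and helps when pushdown is not possible.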