What is Repartition in Spark?
Repartition in Spark is the process of shuffling data into a given number of logical partitions. The data can be repartitioned on the basis of a column or expression, or spread evenly across the partitions in a round-robin manner. The default number of shuffle partitions in Spark is 200 (spark.sql.shuffle.partitions).
Where do I use repartition in Spark?
You may want to repartition when you understand your data well. Repartitioning a DataFrame on key columns can improve the performance of transformations such as joins and merges on those keys.
Another common use case for repartition is the DataFrame write operation, when you want to restrict the number of output file parts generated by the write.
Should I repartition a Spark DataFrame?
Keep in mind that repartition is a costly operation because it shuffles all the data across nodes. You can increase or decrease the number of partitions using the repartition method. Only repartition when you understand your data and are confident that it will help optimise subsequent DataFrame transformations and actions.
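When the goal is only to reduce the number of partitions, coalesce is often mentioned as a cheaper alternative because it merges existing partitions instead of shuffling every row. The sketch below is a minimal illustration, assuming Spark is on the classpath and running in local mode; the DataFrame df and the partition counts are hypothetical and not taken from the example in this article.

```scala
import org.apache.spark.sql.SparkSession

// Assumption: local mode; inside spark-shell, getOrCreate() reuses the existing session.
val spark = SparkSession.builder().master("local[2]").appName("coalesce-sketch").getOrCreate()

// Hypothetical data: 1000 ids explicitly spread over 8 partitions.
val df = spark.range(0, 1000).repartition(8)

val fewer = df.coalesce(2)           // merges partitions without a full shuffle
val notMore = df.coalesce(16)        // coalesce cannot increase the count, so it stays at 8
val reshuffled = df.repartition(16)  // full shuffle, so the count can go up

println(fewer.rdd.getNumPartitions)
println(notMore.rdd.getNumPartitions)
println(reshuffled.rdd.getNumPartitions)
```

coalesce can leave partitions unevenly sized, so repartition is still the tool to reach for when you need an even spread or partitioning by key.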
How do you repartition in Spark?
Apply the repartition method to an existing DataFrame to create the desired number of logical partitions. The method takes a number, one or more columns or expressions, or a number together with columns, and generates the output partitions accordingly.
Let’s see this with an example:
scala> df_states.show()
+-----------+----------+-------------+
| state_name|state_abbr|state_capital|
+-----------+----------+-------------+
|    Alabama|        AL|   Montgomery|
|     Alaska|        AK|       Juneau|
|    Arizona|        AZ|      Phoenix|
|   Arkansas|        AR|  Little Rock|
| California|        CA|   Sacramento|
|   Colorado|        CO|       Denver|
|Connecticut|        CT|     Hartford|
|   Delaware|        DE|        Dover|
|    Florida|        FL|  Tallahassee|
|    Georgia|        GA|      Atlanta|
|     Hawaii|        HI|     Honolulu|
|      Idaho|        ID|        Boise|
|   Illinois|        IL|  Springfield|
|    Indiana|        IN| Indianapolis|
|       Iowa|        IA|   Des Moines|
|     Kansas|        KS|       Topeka|
|   Kentucky|        KY|    Frankfort|
|  Louisiana|        LA|  Baton Rouge|
|      Maine|        ME|      Augusta|
|   Maryland|        MD|    Annapolis|
+-----------+----------+-------------+
only showing top 20 rows
Spark check number of partitions for dataframe
You can check the number of partitions of a DataFrame by converting it to an RDD and calling partitions.size on it.
scala> df_states.rdd.partitions.size
res6: Int = 1
So this means all the data is present in 1 partition only.
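Beyond the partition count, it can be useful to see how many rows sit in each partition. The sketch below is one way to do that, assuming Spark is on the classpath and running in local mode; sample is a hypothetical four-row stand-in for df_states, forced into a single partition to mirror the result above.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.spark_partition_id

// Assumption: local mode; inside spark-shell, getOrCreate() reuses the existing session.
val spark = SparkSession.builder().master("local[2]").appName("partition-count-sketch").getOrCreate()
import spark.implicits._

// Hypothetical sample standing in for df_states, squeezed into one partition.
val sample = Seq(
  ("Alabama", "AL", "Montgomery"),
  ("Alaska", "AK", "Juneau"),
  ("Arizona", "AZ", "Phoenix"),
  ("Arkansas", "AR", "Little Rock")
).toDF("state_name", "state_abbr", "state_capital").coalesce(1)

// Equivalent to rdd.partitions.size.
val numPartitions = sample.rdd.getNumPartitions

// Tag every row with the id of the partition it lives in, then count rows per id.
val rowsPerPartition = sample.groupBy(spark_partition_id().as("partition_id")).count()
rowsPerPartition.show()
```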
Spark change number of partitions in a dataframe
Repartition by passing the number of partitions you want (say 5) and verify the partition count.
scala> val df_states_part5 = df_states.repartition(5)
df_states_part5: org.apache.spark.sql.Dataset[org.apache.spark.sql.Row] = [state_name: string, state_abbr: string ... 1 more field]

scala> df_states_part5.rdd.partitions.size
res7: Int = 5
So we have repartitioned the existing DataFrame from 1 partition to 5. Because no column was given, the rows were not placed by value but dealt out across the requested partitions in round-robin fashion. This can be confirmed from the explain plan.
scala> df_states_part5.explain()
== Physical Plan ==
Exchange RoundRobinPartitioning(5)
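To make the round-robin idea concrete, here is a plain Scala model that needs no Spark at all. It is only a simplified sketch of the distribution pattern, not Spark's implementation; roundRobin and its start parameter are hypothetical names introduced for illustration.

```scala
// Deal rows to partitions in turn, beginning at a given (non-negative) offset.
def roundRobin[A](rows: Seq[A], numPartitions: Int, start: Int = 0): Map[Int, Seq[A]] =
  rows.zipWithIndex
    .groupBy { case (_, i) => (start + i) % numPartitions }
    .map { case (p, pairs) => p -> pairs.map(_._1) }

// 50 hypothetical rows, one per state, dealt into 5 partitions.
val rows = (1 to 50).map(i => s"row-$i")
val dealt = roundRobin(rows, 5)

// Partition sizes in partition-id order.
val sizes = dealt.toSeq.sortBy(_._1).map { case (_, rs) => rs.size }
println(sizes.mkString(", ")) // prints 10, 10, 10, 10, 10
```

The point of the model is that round-robin ignores the row content entirely, so partitions come out almost equal in size, but rows with the same key are not guaranteed to land together.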
Spark repartition dataframe based on column
You can also specify the column on which to repartition. The data is then hash-partitioned on that column, and because no partition count is passed, the number of partitions is taken from spark.sql.shuffle.partitions. Change this value if you want a different number of partitions.
scala> val df_states_partCol = df_states.repartition($"state_abbr")
df_states_partCol: org.apache.spark.sql.Dataset[org.apache.spark.sql.Row] = [state_name: string, state_abbr: string ... 1 more field]

scala> df_states_partCol.explain()
== Physical Plan ==
Exchange hashpartitioning(state_abbr#35, 200)

scala> spark.sql("set spark.sql.shuffle.partitions").show(false)
+----------------------------+-----+
|key                         |value|
+----------------------------+-----+
|spark.sql.shuffle.partitions|200  |
+----------------------------+-----+
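There are two ways to get a partition count other than 200 when repartitioning by column, sketched below under the assumption that Spark is on the classpath and running in local mode. sample is a hypothetical four-row stand-in for df_states, and the counts 5 and 10 are arbitrary.

```scala
import org.apache.spark.sql.SparkSession

// Assumption: local mode; inside spark-shell, getOrCreate() reuses the existing session.
val spark = SparkSession.builder().master("local[2]").appName("repartition-by-column-sketch").getOrCreate()
import spark.implicits._

// Hypothetical sample standing in for df_states.
val sample = Seq(
  ("Alabama", "AL", "Montgomery"),
  ("Alaska", "AK", "Juneau"),
  ("Arizona", "AZ", "Phoenix"),
  ("Arkansas", "AR", "Little Rock")
).toDF("state_name", "state_abbr", "state_capital")

// Option 1: pass the partition count explicitly together with the column.
val byColumn5 = sample.repartition(5, $"state_abbr")
byColumn5.explain()
println(byColumn5.rdd.getNumPartitions)

// Option 2: change the session default that is used when no count is given.
spark.conf.set("spark.sql.shuffle.partitions", "10")
val byColumnDefault = sample.repartition($"state_abbr")
byColumnDefault.explain()
```

On recent Spark versions with adaptive query execution enabled, the runtime partition count for the second form may end up lower than the configured value, so the explicit count in the first form is the more predictable choice.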
The number of partitions determines the number of file parts created when the DataFrame is saved as a file. Since there are only 50 distinct values, and some of them hash to the same partition, we will get 200 parts but most of them will be empty files. The target partition is chosen by a formula of the form hash(VALUE) % numPartitions; for DataFrames, Spark applies a Murmur3-based hash with a non-negative modulo rather than Java's hashCode(). Let’s verify this too.
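Before looking at the files, the bucket arithmetic can be modelled in plain Scala. This is a rough sketch only: String.hashCode is used as a stand-in and is not the hash Spark applies, and the keys are hypothetical, so the exact number of occupied buckets will differ from the real run.

```scala
// Non-negative modulo, so the result always lands in [0, numPartitions).
def bucketOf(key: String, numPartitions: Int): Int =
  Math.floorMod(key.hashCode, numPartitions) // hashCode is a stand-in, NOT Spark's Murmur3 hash

// 50 hypothetical distinct keys standing in for the 50 state abbreviations.
val keys = (1 to 50).map(i => f"K$i%02d")
val numPartitions = 200

// Group keys by target bucket; buckets that receive no key stay empty.
val buckets = keys.groupBy(bucketOf(_, numPartitions))
val nonEmpty = buckets.size
val empty = numPartitions - nonEmpty
println(s"non-empty: $nonEmpty, empty: $empty")
```

Whatever hash is used, 50 distinct keys can occupy at most 50 of the 200 buckets, and fewer when two keys collide, which is why at least 150 partitions must be empty.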
scala> df_states_partCol.write.format("csv").save("/tmp/raj/dfdata")
[root@sandbox-hdp ~]# hdfs dfs -ls -S /tmp/raj/dfdata/ Found 201 items -rw-r--r-- 1 root hdfs 81 2019-08-21 04:10 /tmp/raj/dfdata/part-00004-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv -rw-r--r-- 1 root hdfs 81 2019-08-21 04:10 /tmp/raj/dfdata/part-00110-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv -rw-r--r-- 1 root hdfs 79 2019-08-21 04:10 /tmp/raj/dfdata/part-00175-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv -rw-r--r-- 1 root hdfs 77 2019-08-21 04:10 /tmp/raj/dfdata/part-00049-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv -rw-r--r-- 1 root hdfs 77 2019-08-21 04:10 /tmp/raj/dfdata/part-00066-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv -rw-r--r-- 1 root hdfs 76 2019-08-21 04:10 /tmp/raj/dfdata/part-00091-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv -rw-r--r-- 1 root hdfs 49 2019-08-21 04:10 /tmp/raj/dfdata/part-00115-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv -rw-r--r-- 1 root hdfs 48 2019-08-21 04:10 /tmp/raj/dfdata/part-00185-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv -rw-r--r-- 1 root hdfs 47 2019-08-21 04:10 /tmp/raj/dfdata/part-00078-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv -rw-r--r-- 1 root hdfs 47 2019-08-21 04:10 /tmp/raj/dfdata/part-00154-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv -rw-r--r-- 1 root hdfs 46 2019-08-21 04:10 /tmp/raj/dfdata/part-00047-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv -rw-r--r-- 1 root hdfs 46 2019-08-21 04:10 /tmp/raj/dfdata/part-00065-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv -rw-r--r-- 1 root hdfs 46 2019-08-21 04:10 /tmp/raj/dfdata/part-00195-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv -rw-r--r-- 1 root hdfs 45 2019-08-21 04:10 /tmp/raj/dfdata/part-00009-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv -rw-r--r-- 1 root hdfs 45 2019-08-21 04:10 /tmp/raj/dfdata/part-00070-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv -rw-r--r-- 1 root hdfs 45 2019-08-21 04:10 /tmp/raj/dfdata/part-00097-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv -rw-r--r-- 1 root hdfs 44 2019-08-21 04:10 
/tmp/raj/dfdata/part-00042-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv -rw-r--r-- 1 root hdfs 44 2019-08-21 04:10 /tmp/raj/dfdata/part-00051-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv -rw-r--r-- 1 root hdfs 44 2019-08-21 04:10 /tmp/raj/dfdata/part-00183-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv -rw-r--r-- 1 root hdfs 42 2019-08-21 04:10 /tmp/raj/dfdata/part-00010-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv -rw-r--r-- 1 root hdfs 42 2019-08-21 04:10 /tmp/raj/dfdata/part-00116-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv -rw-r--r-- 1 root hdfs 41 2019-08-21 04:10 /tmp/raj/dfdata/part-00080-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv -rw-r--r-- 1 root hdfs 41 2019-08-21 04:10 /tmp/raj/dfdata/part-00095-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv -rw-r--r-- 1 root hdfs 41 2019-08-21 04:10 /tmp/raj/dfdata/part-00108-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv -rw-r--r-- 1 root hdfs 40 2019-08-21 04:10 /tmp/raj/dfdata/part-00055-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv -rw-r--r-- 1 root hdfs 40 2019-08-21 04:10 /tmp/raj/dfdata/part-00071-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv -rw-r--r-- 1 root hdfs 40 2019-08-21 04:10 /tmp/raj/dfdata/part-00074-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv -rw-r--r-- 1 root hdfs 39 2019-08-21 04:10 /tmp/raj/dfdata/part-00059-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv -rw-r--r-- 1 root hdfs 39 2019-08-21 04:10 /tmp/raj/dfdata/part-00094-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv -rw-r--r-- 1 root hdfs 39 2019-08-21 04:10 /tmp/raj/dfdata/part-00167-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv -rw-r--r-- 1 root hdfs 37 2019-08-21 04:10 /tmp/raj/dfdata/part-00016-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv -rw-r--r-- 1 root hdfs 37 2019-08-21 04:10 /tmp/raj/dfdata/part-00130-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv -rw-r--r-- 1 root hdfs 36 2019-08-21 04:10 /tmp/raj/dfdata/part-00054-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv -rw-r--r-- 1 root hdfs 36 2019-08-21 04:10 
/tmp/raj/dfdata/part-00075-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv -rw-r--r-- 1 root hdfs 36 2019-08-21 04:10 /tmp/raj/dfdata/part-00165-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv -rw-r--r-- 1 root hdfs 36 2019-08-21 04:10 /tmp/raj/dfdata/part-00199-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv -rw-r--r-- 1 root hdfs 35 2019-08-21 04:10 /tmp/raj/dfdata/part-00067-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv -rw-r--r-- 1 root hdfs 35 2019-08-21 04:10 /tmp/raj/dfdata/part-00107-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv -rw-r--r-- 1 root hdfs 34 2019-08-21 04:10 /tmp/raj/dfdata/part-00083-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv -rw-r--r-- 1 root hdfs 34 2019-08-21 04:10 /tmp/raj/dfdata/part-00179-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv -rw-r--r-- 1 root hdfs 33 2019-08-21 04:10 /tmp/raj/dfdata/part-00030-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv -rw-r--r-- 1 root hdfs 33 2019-08-21 04:10 /tmp/raj/dfdata/part-00151-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv -rw-r--r-- 1 root hdfs 33 2019-08-21 04:10 /tmp/raj/dfdata/part-00169-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv -rw-r--r-- 1 root hdfs 29 2019-08-21 04:10 /tmp/raj/dfdata/part-00060-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv -rw-r--r-- 1 root hdfs 0 2019-08-21 04:10 /tmp/raj/dfdata/_SUCCESS -rw-r--r-- 1 root hdfs 0 2019-08-21 04:10 /tmp/raj/dfdata/part-00000-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv -rw-r--r-- 1 root hdfs 0 2019-08-21 04:10 /tmp/raj/dfdata/part-00001-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv -rw-r--r-- 1 root hdfs 0 2019-08-21 04:10 /tmp/raj/dfdata/part-00002-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv -rw-r--r-- 1 root hdfs 0 2019-08-21 04:10 /tmp/raj/dfdata/part-00003-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv -rw-r--r-- 1 root hdfs 0 2019-08-21 04:10 /tmp/raj/dfdata/part-00005-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv -rw-r--r-- 1 root hdfs 0 2019-08-21 04:10 /tmp/raj/dfdata/part-00006-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv 
-rw-r--r-- 1 root hdfs 0 2019-08-21 04:10 /tmp/raj/dfdata/part-00007-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv -rw-r--r-- 1 root hdfs 0 2019-08-21 04:10 /tmp/raj/dfdata/part-00008-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv -rw-r--r-- 1 root hdfs 0 2019-08-21 04:10 /tmp/raj/dfdata/part-00011-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv -rw-r--r-- 1 root hdfs 0 2019-08-21 04:10 /tmp/raj/dfdata/part-00012-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv -rw-r--r-- 1 root hdfs 0 2019-08-21 04:10 /tmp/raj/dfdata/part-00013-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv -rw-r--r-- 1 root hdfs 0 2019-08-21 04:10 /tmp/raj/dfdata/part-00014-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv -rw-r--r-- 1 root hdfs 0 2019-08-21 04:10 /tmp/raj/dfdata/part-00015-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv -rw-r--r-- 1 root hdfs 0 2019-08-21 04:10 /tmp/raj/dfdata/part-00017-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv -rw-r--r-- 1 root hdfs 0 2019-08-21 04:10 /tmp/raj/dfdata/part-00018-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv -rw-r--r-- 1 root hdfs 0 2019-08-21 04:10 /tmp/raj/dfdata/part-00019-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv -rw-r--r-- 1 root hdfs 0 2019-08-21 04:10 /tmp/raj/dfdata/part-00020-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv -rw-r--r-- 1 root hdfs 0 2019-08-21 04:10 /tmp/raj/dfdata/part-00021-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv -rw-r--r-- 1 root hdfs 0 2019-08-21 04:10 /tmp/raj/dfdata/part-00022-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv -rw-r--r-- 1 root hdfs 0 2019-08-21 04:10 /tmp/raj/dfdata/part-00023-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv -rw-r--r-- 1 root hdfs 0 2019-08-21 04:10 /tmp/raj/dfdata/part-00024-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv -rw-r--r-- 1 root hdfs 0 2019-08-21 04:10 /tmp/raj/dfdata/part-00025-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv -rw-r--r-- 1 root hdfs 0 2019-08-21 04:10 /tmp/raj/dfdata/part-00026-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv -rw-r--r-- 1 root hdfs 0 2019-08-21 04:10 
/tmp/raj/dfdata/part-00027-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv -rw-r--r-- 1 root hdfs 0 2019-08-21 04:10 /tmp/raj/dfdata/part-00028-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv -rw-r--r-- 1 root hdfs 0 2019-08-21 04:10 /tmp/raj/dfdata/part-00029-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv -rw-r--r-- 1 root hdfs 0 2019-08-21 04:10 /tmp/raj/dfdata/part-00031-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv -rw-r--r-- 1 root hdfs 0 2019-08-21 04:10 /tmp/raj/dfdata/part-00032-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv -rw-r--r-- 1 root hdfs 0 2019-08-21 04:10 /tmp/raj/dfdata/part-00033-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv -rw-r--r-- 1 root hdfs 0 2019-08-21 04:10 /tmp/raj/dfdata/part-00034-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv -rw-r--r-- 1 root hdfs 0 2019-08-21 04:10 /tmp/raj/dfdata/part-00035-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv -rw-r--r-- 1 root hdfs 0 2019-08-21 04:10 /tmp/raj/dfdata/part-00036-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv -rw-r--r-- 1 root hdfs 0 2019-08-21 04:10 /tmp/raj/dfdata/part-00037-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv -rw-r--r-- 1 root hdfs 0 2019-08-21 04:10 /tmp/raj/dfdata/part-00038-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv -rw-r--r-- 1 root hdfs 0 2019-08-21 04:10 /tmp/raj/dfdata/part-00039-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv -rw-r--r-- 1 root hdfs 0 2019-08-21 04:10 /tmp/raj/dfdata/part-00040-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv -rw-r--r-- 1 root hdfs 0 2019-08-21 04:10 /tmp/raj/dfdata/part-00041-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv -rw-r--r-- 1 root hdfs 0 2019-08-21 04:10 /tmp/raj/dfdata/part-00043-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv -rw-r--r-- 1 root hdfs 0 2019-08-21 04:10 /tmp/raj/dfdata/part-00044-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv -rw-r--r-- 1 root hdfs 0 2019-08-21 04:10 /tmp/raj/dfdata/part-00045-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv -rw-r--r-- 1 root hdfs 0 2019-08-21 04:10 
/tmp/raj/dfdata/part-00046-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv -rw-r--r-- 1 root hdfs 0 2019-08-21 04:10 /tmp/raj/dfdata/part-00048-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv -rw-r--r-- 1 root hdfs 0 2019-08-21 04:10 /tmp/raj/dfdata/part-00050-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv -rw-r--r-- 1 root hdfs 0 2019-08-21 04:10 /tmp/raj/dfdata/part-00052-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv -rw-r--r-- 1 root hdfs 0 2019-08-21 04:10 /tmp/raj/dfdata/part-00053-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv -rw-r--r-- 1 root hdfs 0 2019-08-21 04:10 /tmp/raj/dfdata/part-00056-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv -rw-r--r-- 1 root hdfs 0 2019-08-21 04:10 /tmp/raj/dfdata/part-00057-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv -rw-r--r-- 1 root hdfs 0 2019-08-21 04:10 /tmp/raj/dfdata/part-00058-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv -rw-r--r-- 1 root hdfs 0 2019-08-21 04:10 /tmp/raj/dfdata/part-00061-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv -rw-r--r-- 1 root hdfs 0 2019-08-21 04:10 /tmp/raj/dfdata/part-00062-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv -rw-r--r-- 1 root hdfs 0 2019-08-21 04:10 /tmp/raj/dfdata/part-00063-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv -rw-r--r-- 1 root hdfs 0 2019-08-21 04:10 /tmp/raj/dfdata/part-00064-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv -rw-r--r-- 1 root hdfs 0 2019-08-21 04:10 /tmp/raj/dfdata/part-00068-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv -rw-r--r-- 1 root hdfs 0 2019-08-21 04:10 /tmp/raj/dfdata/part-00069-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv -rw-r--r-- 1 root hdfs 0 2019-08-21 04:10 /tmp/raj/dfdata/part-00072-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv -rw-r--r-- 1 root hdfs 0 2019-08-21 04:10 /tmp/raj/dfdata/part-00073-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv -rw-r--r-- 1 root hdfs 0 2019-08-21 04:10 /tmp/raj/dfdata/part-00076-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv -rw-r--r-- 1 root hdfs 0 2019-08-21 04:10 
/tmp/raj/dfdata/part-00077-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv -rw-r--r-- 1 root hdfs 0 2019-08-21 04:10 /tmp/raj/dfdata/part-00079-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv -rw-r--r-- 1 root hdfs 0 2019-08-21 04:10 /tmp/raj/dfdata/part-00081-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv -rw-r--r-- 1 root hdfs 0 2019-08-21 04:10 /tmp/raj/dfdata/part-00082-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv -rw-r--r-- 1 root hdfs 0 2019-08-21 04:10 /tmp/raj/dfdata/part-00084-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv -rw-r--r-- 1 root hdfs 0 2019-08-21 04:10 /tmp/raj/dfdata/part-00085-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv -rw-r--r-- 1 root hdfs 0 2019-08-21 04:10 /tmp/raj/dfdata/part-00086-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv -rw-r--r-- 1 root hdfs 0 2019-08-21 04:10 /tmp/raj/dfdata/part-00087-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv -rw-r--r-- 1 root hdfs 0 2019-08-21 04:10 /tmp/raj/dfdata/part-00088-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv -rw-r--r-- 1 root hdfs 0 2019-08-21 04:10 /tmp/raj/dfdata/part-00089-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv -rw-r--r-- 1 root hdfs 0 2019-08-21 04:10 /tmp/raj/dfdata/part-00090-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv -rw-r--r-- 1 root hdfs 0 2019-08-21 04:10 /tmp/raj/dfdata/part-00092-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv -rw-r--r-- 1 root hdfs 0 2019-08-21 04:10 /tmp/raj/dfdata/part-00093-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv -rw-r--r-- 1 root hdfs 0 2019-08-21 04:10 /tmp/raj/dfdata/part-00096-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv -rw-r--r-- 1 root hdfs 0 2019-08-21 04:10 /tmp/raj/dfdata/part-00098-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv -rw-r--r-- 1 root hdfs 0 2019-08-21 04:10 /tmp/raj/dfdata/part-00099-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv -rw-r--r-- 1 root hdfs 0 2019-08-21 04:10 /tmp/raj/dfdata/part-00100-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv -rw-r--r-- 1 root hdfs 0 2019-08-21 04:10 
/tmp/raj/dfdata/part-00101-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv -rw-r--r-- 1 root hdfs 0 2019-08-21 04:10 /tmp/raj/dfdata/part-00102-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv -rw-r--r-- 1 root hdfs 0 2019-08-21 04:10 /tmp/raj/dfdata/part-00103-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv -rw-r--r-- 1 root hdfs 0 2019-08-21 04:10 /tmp/raj/dfdata/part-00104-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv -rw-r--r-- 1 root hdfs 0 2019-08-21 04:10 /tmp/raj/dfdata/part-00105-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv -rw-r--r-- 1 root hdfs 0 2019-08-21 04:10 /tmp/raj/dfdata/part-00106-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv -rw-r--r-- 1 root hdfs 0 2019-08-21 04:10 /tmp/raj/dfdata/part-00109-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv -rw-r--r-- 1 root hdfs 0 2019-08-21 04:10 /tmp/raj/dfdata/part-00111-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv -rw-r--r-- 1 root hdfs 0 2019-08-21 04:10 /tmp/raj/dfdata/part-00112-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv -rw-r--r-- 1 root hdfs 0 2019-08-21 04:10 /tmp/raj/dfdata/part-00113-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv -rw-r--r-- 1 root hdfs 0 2019-08-21 04:10 /tmp/raj/dfdata/part-00114-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv -rw-r--r-- 1 root hdfs 0 2019-08-21 04:10 /tmp/raj/dfdata/part-00117-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv -rw-r--r-- 1 root hdfs 0 2019-08-21 04:10 /tmp/raj/dfdata/part-00118-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv -rw-r--r-- 1 root hdfs 0 2019-08-21 04:10 /tmp/raj/dfdata/part-00119-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv -rw-r--r-- 1 root hdfs 0 2019-08-21 04:10 /tmp/raj/dfdata/part-00120-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv -rw-r--r-- 1 root hdfs 0 2019-08-21 04:10 /tmp/raj/dfdata/part-00121-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv -rw-r--r-- 1 root hdfs 0 2019-08-21 04:10 /tmp/raj/dfdata/part-00122-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv -rw-r--r-- 1 root hdfs 0 2019-08-21 04:10 
/tmp/raj/dfdata/part-00123-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv -rw-r--r-- 1 root hdfs 0 2019-08-21 04:10 /tmp/raj/dfdata/part-00124-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv -rw-r--r-- 1 root hdfs 0 2019-08-21 04:10 /tmp/raj/dfdata/part-00125-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv -rw-r--r-- 1 root hdfs 0 2019-08-21 04:10 /tmp/raj/dfdata/part-00126-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv -rw-r--r-- 1 root hdfs 0 2019-08-21 04:10 /tmp/raj/dfdata/part-00127-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv -rw-r--r-- 1 root hdfs 0 2019-08-21 04:10 /tmp/raj/dfdata/part-00128-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv -rw-r--r-- 1 root hdfs 0 2019-08-21 04:10 /tmp/raj/dfdata/part-00129-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv -rw-r--r-- 1 root hdfs 0 2019-08-21 04:10 /tmp/raj/dfdata/part-00131-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv -rw-r--r-- 1 root hdfs 0 2019-08-21 04:10 /tmp/raj/dfdata/part-00132-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv -rw-r--r-- 1 root hdfs 0 2019-08-21 04:10 /tmp/raj/dfdata/part-00133-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv -rw-r--r-- 1 root hdfs 0 2019-08-21 04:10 /tmp/raj/dfdata/part-00134-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv -rw-r--r-- 1 root hdfs 0 2019-08-21 04:10 /tmp/raj/dfdata/part-00135-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv -rw-r--r-- 1 root hdfs 0 2019-08-21 04:10 /tmp/raj/dfdata/part-00136-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv -rw-r--r-- 1 root hdfs 0 2019-08-21 04:10 /tmp/raj/dfdata/part-00137-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv -rw-r--r-- 1 root hdfs 0 2019-08-21 04:10 /tmp/raj/dfdata/part-00138-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv -rw-r--r-- 1 root hdfs 0 2019-08-21 04:10 /tmp/raj/dfdata/part-00139-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv -rw-r--r-- 1 root hdfs 0 2019-08-21 04:10 /tmp/raj/dfdata/part-00140-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv -rw-r--r-- 1 root hdfs 0 2019-08-21 04:10 
/tmp/raj/dfdata/part-00141-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv -rw-r--r-- 1 root hdfs 0 2019-08-21 04:10 /tmp/raj/dfdata/part-00142-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv -rw-r--r-- 1 root hdfs 0 2019-08-21 04:10 /tmp/raj/dfdata/part-00143-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv -rw-r--r-- 1 root hdfs 0 2019-08-21 04:10 /tmp/raj/dfdata/part-00144-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv -rw-r--r-- 1 root hdfs 0 2019-08-21 04:10 /tmp/raj/dfdata/part-00145-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv -rw-r--r-- 1 root hdfs 0 2019-08-21 04:10 /tmp/raj/dfdata/part-00146-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv -rw-r--r-- 1 root hdfs 0 2019-08-21 04:10 /tmp/raj/dfdata/part-00147-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv -rw-r--r-- 1 root hdfs 0 2019-08-21 04:10 /tmp/raj/dfdata/part-00148-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv -rw-r--r-- 1 root hdfs 0 2019-08-21 04:10 /tmp/raj/dfdata/part-00149-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv -rw-r--r-- 1 root hdfs 0 2019-08-21 04:10 /tmp/raj/dfdata/part-00150-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv -rw-r--r-- 1 root hdfs 0 2019-08-21 04:10 /tmp/raj/dfdata/part-00152-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv -rw-r--r-- 1 root hdfs 0 2019-08-21 04:10 /tmp/raj/dfdata/part-00153-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv -rw-r--r-- 1 root hdfs 0 2019-08-21 04:10 /tmp/raj/dfdata/part-00155-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv -rw-r--r-- 1 root hdfs 0 2019-08-21 04:10 /tmp/raj/dfdata/part-00156-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv -rw-r--r-- 1 root hdfs 0 2019-08-21 04:10 /tmp/raj/dfdata/part-00157-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv -rw-r--r-- 1 root hdfs 0 2019-08-21 04:10 /tmp/raj/dfdata/part-00158-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv -rw-r--r-- 1 root hdfs 0 2019-08-21 04:10 /tmp/raj/dfdata/part-00159-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv -rw-r--r-- 1 root hdfs 0 2019-08-21 04:10 
/tmp/raj/dfdata/part-00160-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv -rw-r--r-- 1 root hdfs 0 2019-08-21 04:10 /tmp/raj/dfdata/part-00161-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv -rw-r--r-- 1 root hdfs 0 2019-08-21 04:10 /tmp/raj/dfdata/part-00162-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv -rw-r--r-- 1 root hdfs 0 2019-08-21 04:10 /tmp/raj/dfdata/part-00163-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv -rw-r--r-- 1 root hdfs 0 2019-08-21 04:10 /tmp/raj/dfdata/part-00164-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv -rw-r--r-- 1 root hdfs 0 2019-08-21 04:10 /tmp/raj/dfdata/part-00166-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv -rw-r--r-- 1 root hdfs 0 2019-08-21 04:10 /tmp/raj/dfdata/part-00168-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv -rw-r--r-- 1 root hdfs 0 2019-08-21 04:10 /tmp/raj/dfdata/part-00170-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv -rw-r--r-- 1 root hdfs 0 2019-08-21 04:10 /tmp/raj/dfdata/part-00171-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv -rw-r--r-- 1 root hdfs 0 2019-08-21 04:10 /tmp/raj/dfdata/part-00172-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv -rw-r--r-- 1 root hdfs 0 2019-08-21 04:10 /tmp/raj/dfdata/part-00173-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv -rw-r--r-- 1 root hdfs 0 2019-08-21 04:10 /tmp/raj/dfdata/part-00174-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv -rw-r--r-- 1 root hdfs 0 2019-08-21 04:10 /tmp/raj/dfdata/part-00176-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv -rw-r--r-- 1 root hdfs 0 2019-08-21 04:10 /tmp/raj/dfdata/part-00177-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv -rw-r--r-- 1 root hdfs 0 2019-08-21 04:10 /tmp/raj/dfdata/part-00178-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv -rw-r--r-- 1 root hdfs 0 2019-08-21 04:10 /tmp/raj/dfdata/part-00180-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv -rw-r--r-- 1 root hdfs 0 2019-08-21 04:10 /tmp/raj/dfdata/part-00181-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv -rw-r--r-- 1 root hdfs 0 2019-08-21 04:10 
/tmp/raj/dfdata/part-00182-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv -rw-r--r-- 1 root hdfs 0 2019-08-21 04:10 /tmp/raj/dfdata/part-00184-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv -rw-r--r-- 1 root hdfs 0 2019-08-21 04:10 /tmp/raj/dfdata/part-00186-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv -rw-r--r-- 1 root hdfs 0 2019-08-21 04:10 /tmp/raj/dfdata/part-00187-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv -rw-r--r-- 1 root hdfs 0 2019-08-21 04:10 /tmp/raj/dfdata/part-00188-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv -rw-r--r-- 1 root hdfs 0 2019-08-21 04:10 /tmp/raj/dfdata/part-00189-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv -rw-r--r-- 1 root hdfs 0 2019-08-21 04:10 /tmp/raj/dfdata/part-00190-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv -rw-r--r-- 1 root hdfs 0 2019-08-21 04:10 /tmp/raj/dfdata/part-00191-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv -rw-r--r-- 1 root hdfs 0 2019-08-21 04:10 /tmp/raj/dfdata/part-00192-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv -rw-r--r-- 1 root hdfs 0 2019-08-21 04:10 /tmp/raj/dfdata/part-00193-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv -rw-r--r-- 1 root hdfs 0 2019-08-21 04:10 /tmp/raj/dfdata/part-00194-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv -rw-r--r-- 1 root hdfs 0 2019-08-21 04:10 /tmp/raj/dfdata/part-00196-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv -rw-r--r-- 1 root hdfs 0 2019-08-21 04:10 /tmp/raj/dfdata/part-00197-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv -rw-r--r-- 1 root hdfs 0 2019-08-21 04:10 /tmp/raj/dfdata/part-00198-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv
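It generated 200 file parts; the 201st item in the listing is the _SUCCESS marker. Because the listing is sorted by size, we can see that the first 44 files contain data while the remaining 156 part files are empty.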
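One way to avoid a directory full of empty parts is to pass a small, explicit partition count before writing. The sketch below assumes Spark is on the classpath and running in local mode against the local filesystem rather than HDFS; sample, outDir and the count of 5 are hypothetical. Whether empty partitions still produce zero-byte files appears to depend on the Spark version, so the sketch only relies on an upper bound for the number of part files.

```scala
import java.nio.file.Files
import org.apache.spark.sql.SparkSession

// Assumption: local mode; inside spark-shell, getOrCreate() reuses the existing session.
val spark = SparkSession.builder().master("local[2]").appName("write-fewer-files-sketch").getOrCreate()
import spark.implicits._

// Hypothetical sample standing in for df_states.
val sample = Seq(
  ("Alabama", "AL", "Montgomery"),
  ("Alaska", "AK", "Juneau"),
  ("Arizona", "AZ", "Phoenix"),
  ("Arkansas", "AR", "Little Rock")
).toDF("state_name", "state_abbr", "state_capital")

// Hypothetical local output location; the save target must not exist yet.
val outPath = Files.createTempDirectory("dfdata-sketch").resolve("out")
val outDir = outPath.toUri.toString

// At most 5 partitions means at most 5 part files.
sample.repartition(5, $"state_abbr").write.format("csv").save(outDir)

// Count the CSV part files, ignoring _SUCCESS and the hidden .crc checksum files.
val partFiles = outPath.toFile.listFiles().map(_.getName)
  .filter(n => n.startsWith("part-") && n.endsWith(".csv"))
println(partFiles.length)
```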