Spark Dataframe Repartition

What is Repartition in Spark?

Repartitioning in Spark moves (shuffles) data into a given number of logical partitions. Data can be repartitioned on the basis of one or more columns or expressions, or spread evenly in round-robin fashion when only a partition count is given. The default number of shuffle partitions in Spark SQL (spark.sql.shuffle.partitions) is 200.

Where do I use repartition in Spark?

Repartitioning is useful when you understand your data well. For example, you can improve the performance of DataFrame transformations such as joins and merges by repartitioning on the key columns.
Another common use case is the DataFrame write operation, where repartitioning lets you restrict the number of output file parts that Spark generates.
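
As a minimal sketch of the write use case, the snippet below repartitions a tiny DataFrame to a single partition before writing it, so that only one part file is produced. The object name, the sample rows, the temporary output directory and the local master setting are all assumptions made for illustration, and the file listing assumes a local filesystem rather than HDFS.

```scala
import java.nio.file.Files
import org.apache.spark.sql.SparkSession

object SingleFileWriteSketch {
  // Local session for illustration; inside spark-shell, getOrCreate() reuses the existing session.
  val spark: SparkSession = SparkSession.builder()
    .appName("single-file-write-sketch")
    .master("local[2]")
    .getOrCreate()
  import spark.implicits._

  // Small hypothetical DataFrame standing in for real data.
  val df = Seq(("Alabama", "AL"), ("Alaska", "AK"), ("Arizona", "AZ"))
    .toDF("state_name", "state_abbr")

  // Hypothetical output location inside a fresh temporary directory on the local disk.
  val outPath = Files.createTempDirectory("repartition-demo").resolve("states_csv")

  // One partition at write time means one part file in the output directory.
  df.repartition(1).write.format("csv").save(outPath.toUri.toString)

  // Count the data files Spark produced (skips _SUCCESS and checksum files).
  val partFiles: Int = outPath.toFile.listFiles()
    .count(f => f.getName.startsWith("part-") && f.getName.endsWith(".csv"))
}
```

Funnelling everything through one partition also means a single task does all the writing, so this is only sensible for small outputs.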

Should I repartition Spark Dataframe?

Keep in mind that repartition is a costly operation because it shuffles all the data across nodes. You can increase or decrease the number of partitions using the repartition method. Only use it when you understand your data and are confident that it will help optimise subsequent DataFrame transformations and actions.

How do you repartition in Spark?

Apply the repartition method to an existing DataFrame to create the desired number of logical partitions. The method accepts a number of partitions, one or more columns or expressions, or both, and generates the output partitions on that basis.
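
The argument forms described above can be sketched roughly as follows. The object name, the four sample rows and the local master setting are assumptions for illustration only; the repartition overloads themselves are the standard Dataset ones.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.substring

object RepartitionFormsSketch {
  // Local session for illustration; inside spark-shell, getOrCreate() reuses the existing session.
  val spark: SparkSession = SparkSession.builder()
    .appName("repartition-forms-sketch")
    .master("local[2]")
    .getOrCreate()
  import spark.implicits._

  // Hypothetical sample rows with the same columns as df_states.
  val df = Seq(
    ("Alabama", "AL", "Montgomery"),
    ("Alaska", "AK", "Juneau"),
    ("Arizona", "AZ", "Phoenix"),
    ("Arkansas", "AR", "Little Rock")
  ).toDF("state_name", "state_abbr", "state_capital")

  // 1. Number only: rows are spread round-robin over 5 partitions.
  val byNumber = df.repartition(5)

  // 2. Column only: rows are hash-partitioned on state_abbr; the partition
  //    count is taken from spark.sql.shuffle.partitions.
  val byColumn = df.repartition($"state_abbr")

  // 3. Number and column: hash-partitioned on state_abbr into 10 partitions.
  val byBoth = df.repartition(10, $"state_abbr")

  // 4. Number and expression: partition on the first letter of the abbreviation.
  val byExpr = df.repartition(4, substring($"state_abbr", 1, 1))

  // Partition counts for the two forms that pass an explicit number.
  val numberCount: Int = byNumber.rdd.getNumPartitions
  val bothCount: Int = byBoth.rdd.getNumPartitions
}
```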

Let’s see this with an example:

scala> df_states.show()
+-----------+----------+-------------+
| state_name|state_abbr|state_capital|
+-----------+----------+-------------+
|    Alabama|        AL|   Montgomery|
|     Alaska|        AK|       Juneau|
|    Arizona|        AZ|      Phoenix|
|   Arkansas|        AR|  Little Rock|
| California|        CA|   Sacramento|
|   Colorado|        CO|       Denver|
|Connecticut|        CT|     Hartford|
|   Delaware|        DE|        Dover|
|    Florida|        FL|  Tallahassee|
|    Georgia|        GA|      Atlanta|
|     Hawaii|        HI|     Honolulu|
|      Idaho|        ID|        Boise|
|   Illinois|        IL|  Springfield|
|    Indiana|        IN| Indianapolis|
|       Iowa|        IA|   Des Moines|
|     Kansas|        KS|       Topeka|
|   Kentucky|        KY|    Frankfort|
|  Louisiana|        LA|  Baton Rouge|
|      Maine|        ME|      Augusta|
|   Maryland|        MD|    Annapolis|
+-----------+----------+-------------+
only showing top 20 rows

Spark check number of partitions for dataframe

You can check the number of partitions of a DataFrame by converting it to an RDD and calling partitions.size on it (rdd.getNumPartitions returns the same value).

scala> df_states.rdd.partitions.size
res6: Int = 1

This means all the data sits in a single partition.

Spark change number of partitions in a dataframe

Repartition by passing the number of partitions you want (say 5), then verify the partition count.

scala> val df_states_part5 = df_states.repartition(5)
df_states_part5: org.apache.spark.sql.Dataset[org.apache.spark.sql.Row] = [state_name: string, state_abbr: string ... 1 more field]

scala> df_states_part5.rdd.partitions.size
res7: Int = 5

So we have repartitioned the existing DataFrame from 1 partition to 5. The rows were not hashed on any column; they were distributed round-robin across the requested number of partitions. This can be confirmed from the explain plan.

scala> df_states_part5.explain()
== Physical Plan ==
Exchange RoundRobinPartitioning(5)
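
To make the round-robin idea concrete, here is a plain Scala sketch (no Spark needed) that deals the 20 abbreviations shown earlier into 5 partitions in turn. It is a simplified model rather than Spark's actual implementation, which may start dealing from a different partition; the object and method names are made up for this illustration.

```scala
object RoundRobinSketch {
  // Deal rows out to partitions in turn: row i goes to partition i % numPartitions.
  def assign[A](rows: Seq[A], numPartitions: Int): Map[Int, Seq[A]] =
    rows.zipWithIndex
      .groupBy { case (_, index) => index % numPartitions }
      .map { case (partition, indexed) => partition -> indexed.map(_._1) }

  // The 20 state abbreviations from the sample output above.
  val states: Seq[String] = Seq(
    "AL", "AK", "AZ", "AR", "CA", "CO", "CT", "DE", "FL", "GA",
    "HI", "ID", "IL", "IN", "IA", "KS", "KY", "LA", "ME", "MD")

  // Partition id -> rows assigned to it.
  val partitions: Map[Int, Seq[String]] = assign(states, 5)

  // Number of rows per partition, ordered by partition id.
  val sizes: Seq[Int] = partitions.toSeq.sortBy(_._1).map(_._2.size)
}
```

Because rows are handed out in turn, every partition ends up with the same number of rows (4 each here), which is why this form is a good fit when you only want evenly sized partitions.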

Spark repartition dataframe based on column

You can also specify the column on which to repartition. The data is then hash-partitioned on that column and, since no count is passed, the number of partitions is taken from spark.sql.shuffle.partitions. Change this value, or pass an explicit count such as repartition(10, $"state_abbr"), if you want a different number of partitions.

scala> val df_states_partCol = df_states.repartition($"state_abbr")
df_states_partCol: org.apache.spark.sql.Dataset[org.apache.spark.sql.Row] = [state_name: string, state_abbr: string ... 1 more field]

scala> df_states_partCol.explain()
== Physical Plan ==
Exchange hashpartitioning(state_abbr#35, 200)

scala> spark.sql("set spark.sql.shuffle.partitions").show(false)
+----------------------------+-----+
|key                         |value|
+----------------------------+-----+
|spark.sql.shuffle.partitions|200  |
+----------------------------+-----+
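
If 200 is too many for your data, the setting can be changed for the current session before repartitioning by column. The sketch below assumes a local session and a small made-up DataFrame; the object name is hypothetical. On newer Spark versions with adaptive query execution enabled, the number of partitions actually produced for a column-only repartition may be lower than the configured value.

```scala
import org.apache.spark.sql.SparkSession

object ShufflePartitionsSketch {
  // Local session for illustration; inside spark-shell, getOrCreate() reuses the existing session.
  val spark: SparkSession = SparkSession.builder()
    .appName("shuffle-partitions-sketch")
    .master("local[2]")
    .getOrCreate()
  import spark.implicits._

  // Hypothetical sample rows.
  val df = Seq(("Alabama", "AL"), ("Alaska", "AK"), ("Arizona", "AZ"))
    .toDF("state_name", "state_abbr")

  // Change the session-wide default used when repartition is given no count.
  spark.conf.set("spark.sql.shuffle.partitions", "10")
  val configured: String = spark.conf.get("spark.sql.shuffle.partitions")

  // Column-only repartition now plans its shuffle with the new setting.
  val byColumn = df.repartition($"state_abbr")

  // Print the physical plan to inspect the hashpartitioning exchange.
  byColumn.explain()
}
```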

The number of partitions determines the number of file parts created when the DataFrame is saved. Since there are only 50 distinct values here, and some of them hash to the same partition, we will get 200 parts but most of them will be empty files. The target partition is a non-negative modulo of a hash of the column value, pmod(hash(VALUE), numPartitions), where Spark SQL uses a Murmur3 hash rather than Java's hashCode(). Let’s verify this too.

scala> df_states_partCol.write.format("csv").save("/tmp/raj/dfdata")

[root@sandbox-hdp ~]# hdfs dfs -ls -S /tmp/raj/dfdata/
Found 201 items
-rw-r--r--   1 root hdfs         81 2019-08-21 04:10 /tmp/raj/dfdata/part-00004-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv
-rw-r--r--   1 root hdfs         81 2019-08-21 04:10 /tmp/raj/dfdata/part-00110-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv
-rw-r--r--   1 root hdfs         79 2019-08-21 04:10 /tmp/raj/dfdata/part-00175-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv
-rw-r--r--   1 root hdfs         77 2019-08-21 04:10 /tmp/raj/dfdata/part-00049-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv
-rw-r--r--   1 root hdfs         77 2019-08-21 04:10 /tmp/raj/dfdata/part-00066-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv
-rw-r--r--   1 root hdfs         76 2019-08-21 04:10 /tmp/raj/dfdata/part-00091-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv
-rw-r--r--   1 root hdfs         49 2019-08-21 04:10 /tmp/raj/dfdata/part-00115-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv
-rw-r--r--   1 root hdfs         48 2019-08-21 04:10 /tmp/raj/dfdata/part-00185-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv
-rw-r--r--   1 root hdfs         47 2019-08-21 04:10 /tmp/raj/dfdata/part-00078-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv
-rw-r--r--   1 root hdfs         47 2019-08-21 04:10 /tmp/raj/dfdata/part-00154-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv
-rw-r--r--   1 root hdfs         46 2019-08-21 04:10 /tmp/raj/dfdata/part-00047-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv
-rw-r--r--   1 root hdfs         46 2019-08-21 04:10 /tmp/raj/dfdata/part-00065-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv
-rw-r--r--   1 root hdfs         46 2019-08-21 04:10 /tmp/raj/dfdata/part-00195-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv
-rw-r--r--   1 root hdfs         45 2019-08-21 04:10 /tmp/raj/dfdata/part-00009-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv
-rw-r--r--   1 root hdfs         45 2019-08-21 04:10 /tmp/raj/dfdata/part-00070-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv
-rw-r--r--   1 root hdfs         45 2019-08-21 04:10 /tmp/raj/dfdata/part-00097-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv
-rw-r--r--   1 root hdfs         44 2019-08-21 04:10 /tmp/raj/dfdata/part-00042-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv
-rw-r--r--   1 root hdfs         44 2019-08-21 04:10 /tmp/raj/dfdata/part-00051-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv
-rw-r--r--   1 root hdfs         44 2019-08-21 04:10 /tmp/raj/dfdata/part-00183-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv
-rw-r--r--   1 root hdfs         42 2019-08-21 04:10 /tmp/raj/dfdata/part-00010-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv
-rw-r--r--   1 root hdfs         42 2019-08-21 04:10 /tmp/raj/dfdata/part-00116-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv
-rw-r--r--   1 root hdfs         41 2019-08-21 04:10 /tmp/raj/dfdata/part-00080-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv
-rw-r--r--   1 root hdfs         41 2019-08-21 04:10 /tmp/raj/dfdata/part-00095-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv
-rw-r--r--   1 root hdfs         41 2019-08-21 04:10 /tmp/raj/dfdata/part-00108-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv
-rw-r--r--   1 root hdfs         40 2019-08-21 04:10 /tmp/raj/dfdata/part-00055-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv
-rw-r--r--   1 root hdfs         40 2019-08-21 04:10 /tmp/raj/dfdata/part-00071-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv
-rw-r--r--   1 root hdfs         40 2019-08-21 04:10 /tmp/raj/dfdata/part-00074-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv
-rw-r--r--   1 root hdfs         39 2019-08-21 04:10 /tmp/raj/dfdata/part-00059-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv
-rw-r--r--   1 root hdfs         39 2019-08-21 04:10 /tmp/raj/dfdata/part-00094-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv
-rw-r--r--   1 root hdfs         39 2019-08-21 04:10 /tmp/raj/dfdata/part-00167-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv
-rw-r--r--   1 root hdfs         37 2019-08-21 04:10 /tmp/raj/dfdata/part-00016-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv
-rw-r--r--   1 root hdfs         37 2019-08-21 04:10 /tmp/raj/dfdata/part-00130-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv
-rw-r--r--   1 root hdfs         36 2019-08-21 04:10 /tmp/raj/dfdata/part-00054-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv
-rw-r--r--   1 root hdfs         36 2019-08-21 04:10 /tmp/raj/dfdata/part-00075-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv
-rw-r--r--   1 root hdfs         36 2019-08-21 04:10 /tmp/raj/dfdata/part-00165-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv
-rw-r--r--   1 root hdfs         36 2019-08-21 04:10 /tmp/raj/dfdata/part-00199-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv
-rw-r--r--   1 root hdfs         35 2019-08-21 04:10 /tmp/raj/dfdata/part-00067-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv
-rw-r--r--   1 root hdfs         35 2019-08-21 04:10 /tmp/raj/dfdata/part-00107-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv
-rw-r--r--   1 root hdfs         34 2019-08-21 04:10 /tmp/raj/dfdata/part-00083-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv
-rw-r--r--   1 root hdfs         34 2019-08-21 04:10 /tmp/raj/dfdata/part-00179-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv
-rw-r--r--   1 root hdfs         33 2019-08-21 04:10 /tmp/raj/dfdata/part-00030-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv
-rw-r--r--   1 root hdfs         33 2019-08-21 04:10 /tmp/raj/dfdata/part-00151-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv
-rw-r--r--   1 root hdfs         33 2019-08-21 04:10 /tmp/raj/dfdata/part-00169-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv
-rw-r--r--   1 root hdfs         29 2019-08-21 04:10 /tmp/raj/dfdata/part-00060-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv
-rw-r--r--   1 root hdfs          0 2019-08-21 04:10 /tmp/raj/dfdata/_SUCCESS
-rw-r--r--   1 root hdfs          0 2019-08-21 04:10 /tmp/raj/dfdata/part-00000-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv
-rw-r--r--   1 root hdfs          0 2019-08-21 04:10 /tmp/raj/dfdata/part-00001-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv
-rw-r--r--   1 root hdfs          0 2019-08-21 04:10 /tmp/raj/dfdata/part-00002-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv
-rw-r--r--   1 root hdfs          0 2019-08-21 04:10 /tmp/raj/dfdata/part-00003-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv
-rw-r--r--   1 root hdfs          0 2019-08-21 04:10 /tmp/raj/dfdata/part-00005-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv
-rw-r--r--   1 root hdfs          0 2019-08-21 04:10 /tmp/raj/dfdata/part-00006-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv
-rw-r--r--   1 root hdfs          0 2019-08-21 04:10 /tmp/raj/dfdata/part-00007-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv
-rw-r--r--   1 root hdfs          0 2019-08-21 04:10 /tmp/raj/dfdata/part-00008-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv
-rw-r--r--   1 root hdfs          0 2019-08-21 04:10 /tmp/raj/dfdata/part-00011-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv
-rw-r--r--   1 root hdfs          0 2019-08-21 04:10 /tmp/raj/dfdata/part-00012-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv
-rw-r--r--   1 root hdfs          0 2019-08-21 04:10 /tmp/raj/dfdata/part-00013-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv
-rw-r--r--   1 root hdfs          0 2019-08-21 04:10 /tmp/raj/dfdata/part-00014-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv
-rw-r--r--   1 root hdfs          0 2019-08-21 04:10 /tmp/raj/dfdata/part-00015-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv
-rw-r--r--   1 root hdfs          0 2019-08-21 04:10 /tmp/raj/dfdata/part-00017-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv
-rw-r--r--   1 root hdfs          0 2019-08-21 04:10 /tmp/raj/dfdata/part-00018-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv
-rw-r--r--   1 root hdfs          0 2019-08-21 04:10 /tmp/raj/dfdata/part-00019-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv
-rw-r--r--   1 root hdfs          0 2019-08-21 04:10 /tmp/raj/dfdata/part-00020-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv
-rw-r--r--   1 root hdfs          0 2019-08-21 04:10 /tmp/raj/dfdata/part-00021-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv
-rw-r--r--   1 root hdfs          0 2019-08-21 04:10 /tmp/raj/dfdata/part-00022-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv
-rw-r--r--   1 root hdfs          0 2019-08-21 04:10 /tmp/raj/dfdata/part-00023-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv
-rw-r--r--   1 root hdfs          0 2019-08-21 04:10 /tmp/raj/dfdata/part-00024-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv
-rw-r--r--   1 root hdfs          0 2019-08-21 04:10 /tmp/raj/dfdata/part-00025-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv
-rw-r--r--   1 root hdfs          0 2019-08-21 04:10 /tmp/raj/dfdata/part-00026-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv
-rw-r--r--   1 root hdfs          0 2019-08-21 04:10 /tmp/raj/dfdata/part-00027-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv
-rw-r--r--   1 root hdfs          0 2019-08-21 04:10 /tmp/raj/dfdata/part-00028-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv
-rw-r--r--   1 root hdfs          0 2019-08-21 04:10 /tmp/raj/dfdata/part-00029-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv
-rw-r--r--   1 root hdfs          0 2019-08-21 04:10 /tmp/raj/dfdata/part-00031-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv
-rw-r--r--   1 root hdfs          0 2019-08-21 04:10 /tmp/raj/dfdata/part-00032-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv
-rw-r--r--   1 root hdfs          0 2019-08-21 04:10 /tmp/raj/dfdata/part-00033-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv
-rw-r--r--   1 root hdfs          0 2019-08-21 04:10 /tmp/raj/dfdata/part-00034-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv
-rw-r--r--   1 root hdfs          0 2019-08-21 04:10 /tmp/raj/dfdata/part-00035-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv
-rw-r--r--   1 root hdfs          0 2019-08-21 04:10 /tmp/raj/dfdata/part-00036-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv
-rw-r--r--   1 root hdfs          0 2019-08-21 04:10 /tmp/raj/dfdata/part-00037-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv
-rw-r--r--   1 root hdfs          0 2019-08-21 04:10 /tmp/raj/dfdata/part-00038-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv
-rw-r--r--   1 root hdfs          0 2019-08-21 04:10 /tmp/raj/dfdata/part-00039-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv
-rw-r--r--   1 root hdfs          0 2019-08-21 04:10 /tmp/raj/dfdata/part-00040-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv
-rw-r--r--   1 root hdfs          0 2019-08-21 04:10 /tmp/raj/dfdata/part-00041-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv
-rw-r--r--   1 root hdfs          0 2019-08-21 04:10 /tmp/raj/dfdata/part-00043-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv
-rw-r--r--   1 root hdfs          0 2019-08-21 04:10 /tmp/raj/dfdata/part-00044-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv
-rw-r--r--   1 root hdfs          0 2019-08-21 04:10 /tmp/raj/dfdata/part-00045-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv
-rw-r--r--   1 root hdfs          0 2019-08-21 04:10 /tmp/raj/dfdata/part-00046-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv
-rw-r--r--   1 root hdfs          0 2019-08-21 04:10 /tmp/raj/dfdata/part-00048-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv
-rw-r--r--   1 root hdfs          0 2019-08-21 04:10 /tmp/raj/dfdata/part-00050-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv
-rw-r--r--   1 root hdfs          0 2019-08-21 04:10 /tmp/raj/dfdata/part-00052-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv
-rw-r--r--   1 root hdfs          0 2019-08-21 04:10 /tmp/raj/dfdata/part-00053-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv
-rw-r--r--   1 root hdfs          0 2019-08-21 04:10 /tmp/raj/dfdata/part-00056-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv
-rw-r--r--   1 root hdfs          0 2019-08-21 04:10 /tmp/raj/dfdata/part-00057-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv
-rw-r--r--   1 root hdfs          0 2019-08-21 04:10 /tmp/raj/dfdata/part-00058-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv
-rw-r--r--   1 root hdfs          0 2019-08-21 04:10 /tmp/raj/dfdata/part-00061-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv
-rw-r--r--   1 root hdfs          0 2019-08-21 04:10 /tmp/raj/dfdata/part-00062-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv
-rw-r--r--   1 root hdfs          0 2019-08-21 04:10 /tmp/raj/dfdata/part-00063-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv
-rw-r--r--   1 root hdfs          0 2019-08-21 04:10 /tmp/raj/dfdata/part-00064-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv
-rw-r--r--   1 root hdfs          0 2019-08-21 04:10 /tmp/raj/dfdata/part-00068-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv
-rw-r--r--   1 root hdfs          0 2019-08-21 04:10 /tmp/raj/dfdata/part-00069-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv
-rw-r--r--   1 root hdfs          0 2019-08-21 04:10 /tmp/raj/dfdata/part-00072-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv
-rw-r--r--   1 root hdfs          0 2019-08-21 04:10 /tmp/raj/dfdata/part-00073-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv
-rw-r--r--   1 root hdfs          0 2019-08-21 04:10 /tmp/raj/dfdata/part-00076-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv
-rw-r--r--   1 root hdfs          0 2019-08-21 04:10 /tmp/raj/dfdata/part-00077-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv
-rw-r--r--   1 root hdfs          0 2019-08-21 04:10 /tmp/raj/dfdata/part-00079-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv
-rw-r--r--   1 root hdfs          0 2019-08-21 04:10 /tmp/raj/dfdata/part-00081-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv
-rw-r--r--   1 root hdfs          0 2019-08-21 04:10 /tmp/raj/dfdata/part-00082-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv
-rw-r--r--   1 root hdfs          0 2019-08-21 04:10 /tmp/raj/dfdata/part-00084-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv
-rw-r--r--   1 root hdfs          0 2019-08-21 04:10 /tmp/raj/dfdata/part-00085-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv
-rw-r--r--   1 root hdfs          0 2019-08-21 04:10 /tmp/raj/dfdata/part-00086-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv
-rw-r--r--   1 root hdfs          0 2019-08-21 04:10 /tmp/raj/dfdata/part-00087-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv
-rw-r--r--   1 root hdfs          0 2019-08-21 04:10 /tmp/raj/dfdata/part-00088-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv
-rw-r--r--   1 root hdfs          0 2019-08-21 04:10 /tmp/raj/dfdata/part-00089-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv
-rw-r--r--   1 root hdfs          0 2019-08-21 04:10 /tmp/raj/dfdata/part-00090-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv
-rw-r--r--   1 root hdfs          0 2019-08-21 04:10 /tmp/raj/dfdata/part-00092-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv
-rw-r--r--   1 root hdfs          0 2019-08-21 04:10 /tmp/raj/dfdata/part-00093-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv
-rw-r--r--   1 root hdfs          0 2019-08-21 04:10 /tmp/raj/dfdata/part-00096-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv
-rw-r--r--   1 root hdfs          0 2019-08-21 04:10 /tmp/raj/dfdata/part-00098-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv
-rw-r--r--   1 root hdfs          0 2019-08-21 04:10 /tmp/raj/dfdata/part-00099-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv
-rw-r--r--   1 root hdfs          0 2019-08-21 04:10 /tmp/raj/dfdata/part-00100-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv
-rw-r--r--   1 root hdfs          0 2019-08-21 04:10 /tmp/raj/dfdata/part-00101-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv
-rw-r--r--   1 root hdfs          0 2019-08-21 04:10 /tmp/raj/dfdata/part-00102-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv
-rw-r--r--   1 root hdfs          0 2019-08-21 04:10 /tmp/raj/dfdata/part-00103-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv
-rw-r--r--   1 root hdfs          0 2019-08-21 04:10 /tmp/raj/dfdata/part-00104-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv
-rw-r--r--   1 root hdfs          0 2019-08-21 04:10 /tmp/raj/dfdata/part-00105-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv
-rw-r--r--   1 root hdfs          0 2019-08-21 04:10 /tmp/raj/dfdata/part-00106-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv
-rw-r--r--   1 root hdfs          0 2019-08-21 04:10 /tmp/raj/dfdata/part-00109-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv
-rw-r--r--   1 root hdfs          0 2019-08-21 04:10 /tmp/raj/dfdata/part-00111-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv
-rw-r--r--   1 root hdfs          0 2019-08-21 04:10 /tmp/raj/dfdata/part-00112-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv
-rw-r--r--   1 root hdfs          0 2019-08-21 04:10 /tmp/raj/dfdata/part-00113-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv
-rw-r--r--   1 root hdfs          0 2019-08-21 04:10 /tmp/raj/dfdata/part-00114-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv
-rw-r--r--   1 root hdfs          0 2019-08-21 04:10 /tmp/raj/dfdata/part-00117-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv
-rw-r--r--   1 root hdfs          0 2019-08-21 04:10 /tmp/raj/dfdata/part-00118-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv
-rw-r--r--   1 root hdfs          0 2019-08-21 04:10 /tmp/raj/dfdata/part-00119-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv
-rw-r--r--   1 root hdfs          0 2019-08-21 04:10 /tmp/raj/dfdata/part-00120-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv
-rw-r--r--   1 root hdfs          0 2019-08-21 04:10 /tmp/raj/dfdata/part-00121-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv
-rw-r--r--   1 root hdfs          0 2019-08-21 04:10 /tmp/raj/dfdata/part-00122-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv
-rw-r--r--   1 root hdfs          0 2019-08-21 04:10 /tmp/raj/dfdata/part-00123-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv
-rw-r--r--   1 root hdfs          0 2019-08-21 04:10 /tmp/raj/dfdata/part-00124-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv
-rw-r--r--   1 root hdfs          0 2019-08-21 04:10 /tmp/raj/dfdata/part-00125-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv
-rw-r--r--   1 root hdfs          0 2019-08-21 04:10 /tmp/raj/dfdata/part-00126-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv
-rw-r--r--   1 root hdfs          0 2019-08-21 04:10 /tmp/raj/dfdata/part-00127-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv
-rw-r--r--   1 root hdfs          0 2019-08-21 04:10 /tmp/raj/dfdata/part-00128-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv
-rw-r--r--   1 root hdfs          0 2019-08-21 04:10 /tmp/raj/dfdata/part-00129-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv
-rw-r--r--   1 root hdfs          0 2019-08-21 04:10 /tmp/raj/dfdata/part-00131-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv
-rw-r--r--   1 root hdfs          0 2019-08-21 04:10 /tmp/raj/dfdata/part-00132-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv
-rw-r--r--   1 root hdfs          0 2019-08-21 04:10 /tmp/raj/dfdata/part-00133-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv
-rw-r--r--   1 root hdfs          0 2019-08-21 04:10 /tmp/raj/dfdata/part-00134-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv
-rw-r--r--   1 root hdfs          0 2019-08-21 04:10 /tmp/raj/dfdata/part-00135-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv
-rw-r--r--   1 root hdfs          0 2019-08-21 04:10 /tmp/raj/dfdata/part-00136-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv
-rw-r--r--   1 root hdfs          0 2019-08-21 04:10 /tmp/raj/dfdata/part-00137-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv
-rw-r--r--   1 root hdfs          0 2019-08-21 04:10 /tmp/raj/dfdata/part-00138-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv
-rw-r--r--   1 root hdfs          0 2019-08-21 04:10 /tmp/raj/dfdata/part-00139-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv
-rw-r--r--   1 root hdfs          0 2019-08-21 04:10 /tmp/raj/dfdata/part-00140-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv
-rw-r--r--   1 root hdfs          0 2019-08-21 04:10 /tmp/raj/dfdata/part-00141-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv
-rw-r--r--   1 root hdfs          0 2019-08-21 04:10 /tmp/raj/dfdata/part-00142-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv
-rw-r--r--   1 root hdfs          0 2019-08-21 04:10 /tmp/raj/dfdata/part-00143-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv
-rw-r--r--   1 root hdfs          0 2019-08-21 04:10 /tmp/raj/dfdata/part-00144-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv
-rw-r--r--   1 root hdfs          0 2019-08-21 04:10 /tmp/raj/dfdata/part-00145-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv
-rw-r--r--   1 root hdfs          0 2019-08-21 04:10 /tmp/raj/dfdata/part-00146-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv
-rw-r--r--   1 root hdfs          0 2019-08-21 04:10 /tmp/raj/dfdata/part-00147-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv
-rw-r--r--   1 root hdfs          0 2019-08-21 04:10 /tmp/raj/dfdata/part-00148-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv
-rw-r--r--   1 root hdfs          0 2019-08-21 04:10 /tmp/raj/dfdata/part-00149-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv
-rw-r--r--   1 root hdfs          0 2019-08-21 04:10 /tmp/raj/dfdata/part-00150-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv
-rw-r--r--   1 root hdfs          0 2019-08-21 04:10 /tmp/raj/dfdata/part-00152-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv
-rw-r--r--   1 root hdfs          0 2019-08-21 04:10 /tmp/raj/dfdata/part-00153-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv
-rw-r--r--   1 root hdfs          0 2019-08-21 04:10 /tmp/raj/dfdata/part-00155-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv
-rw-r--r--   1 root hdfs          0 2019-08-21 04:10 /tmp/raj/dfdata/part-00156-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv
-rw-r--r--   1 root hdfs          0 2019-08-21 04:10 /tmp/raj/dfdata/part-00157-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv
-rw-r--r--   1 root hdfs          0 2019-08-21 04:10 /tmp/raj/dfdata/part-00158-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv
-rw-r--r--   1 root hdfs          0 2019-08-21 04:10 /tmp/raj/dfdata/part-00159-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv
-rw-r--r--   1 root hdfs          0 2019-08-21 04:10 /tmp/raj/dfdata/part-00160-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv
-rw-r--r--   1 root hdfs          0 2019-08-21 04:10 /tmp/raj/dfdata/part-00161-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv
-rw-r--r--   1 root hdfs          0 2019-08-21 04:10 /tmp/raj/dfdata/part-00162-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv
-rw-r--r--   1 root hdfs          0 2019-08-21 04:10 /tmp/raj/dfdata/part-00163-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv
-rw-r--r--   1 root hdfs          0 2019-08-21 04:10 /tmp/raj/dfdata/part-00164-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv
-rw-r--r--   1 root hdfs          0 2019-08-21 04:10 /tmp/raj/dfdata/part-00166-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv
-rw-r--r--   1 root hdfs          0 2019-08-21 04:10 /tmp/raj/dfdata/part-00168-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv
-rw-r--r--   1 root hdfs          0 2019-08-21 04:10 /tmp/raj/dfdata/part-00170-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv
-rw-r--r--   1 root hdfs          0 2019-08-21 04:10 /tmp/raj/dfdata/part-00171-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv
-rw-r--r--   1 root hdfs          0 2019-08-21 04:10 /tmp/raj/dfdata/part-00172-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv
-rw-r--r--   1 root hdfs          0 2019-08-21 04:10 /tmp/raj/dfdata/part-00173-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv
-rw-r--r--   1 root hdfs          0 2019-08-21 04:10 /tmp/raj/dfdata/part-00174-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv
-rw-r--r--   1 root hdfs          0 2019-08-21 04:10 /tmp/raj/dfdata/part-00176-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv
-rw-r--r--   1 root hdfs          0 2019-08-21 04:10 /tmp/raj/dfdata/part-00177-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv
-rw-r--r--   1 root hdfs          0 2019-08-21 04:10 /tmp/raj/dfdata/part-00178-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv
-rw-r--r--   1 root hdfs          0 2019-08-21 04:10 /tmp/raj/dfdata/part-00180-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv
-rw-r--r--   1 root hdfs          0 2019-08-21 04:10 /tmp/raj/dfdata/part-00181-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv
-rw-r--r--   1 root hdfs          0 2019-08-21 04:10 /tmp/raj/dfdata/part-00182-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv
-rw-r--r--   1 root hdfs          0 2019-08-21 04:10 /tmp/raj/dfdata/part-00184-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv
-rw-r--r--   1 root hdfs          0 2019-08-21 04:10 /tmp/raj/dfdata/part-00186-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv
-rw-r--r--   1 root hdfs          0 2019-08-21 04:10 /tmp/raj/dfdata/part-00187-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv
-rw-r--r--   1 root hdfs          0 2019-08-21 04:10 /tmp/raj/dfdata/part-00188-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv
-rw-r--r--   1 root hdfs          0 2019-08-21 04:10 /tmp/raj/dfdata/part-00189-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv
-rw-r--r--   1 root hdfs          0 2019-08-21 04:10 /tmp/raj/dfdata/part-00190-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv
-rw-r--r--   1 root hdfs          0 2019-08-21 04:10 /tmp/raj/dfdata/part-00191-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv
-rw-r--r--   1 root hdfs          0 2019-08-21 04:10 /tmp/raj/dfdata/part-00192-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv
-rw-r--r--   1 root hdfs          0 2019-08-21 04:10 /tmp/raj/dfdata/part-00193-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv
-rw-r--r--   1 root hdfs          0 2019-08-21 04:10 /tmp/raj/dfdata/part-00194-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv
-rw-r--r--   1 root hdfs          0 2019-08-21 04:10 /tmp/raj/dfdata/part-00196-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv
-rw-r--r--   1 root hdfs          0 2019-08-21 04:10 /tmp/raj/dfdata/part-00197-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv
-rw-r--r--   1 root hdfs          0 2019-08-21 04:10 /tmp/raj/dfdata/part-00198-f153bf3d-0759-42b5-87b8-c4cc28fb568d-c000.csv

It generated 200 file parts (the 201st item is the _SUCCESS marker). The top 44 files contain data, while the remaining 156 are empty.
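
The effect seen in the listing, where a few keys are spread over many partitions and most partitions stay empty, can be sketched in plain Scala. This is only an illustration: it uses String.hashCode as a stand-in hash, whereas Spark SQL uses its own Murmur3-based hash, so the partition ids will not match Spark's. The object and helper names are made up for this sketch.

```scala
object HashPartitionSketch {
  // Non-negative modulo, so negative hash values still map to a valid partition id.
  def pmod(value: Int, n: Int): Int = ((value % n) + n) % n

  // Stand-in hash (assumption): String.hashCode instead of Spark's internal Murmur3 hash.
  def partitionFor(key: String, numPartitions: Int): Int =
    pmod(key.hashCode, numPartitions)

  // The first ten state abbreviations from the sample data.
  val abbrs: Seq[String] = Seq("AL", "AK", "AZ", "AR", "CA", "CO", "CT", "DE", "FL", "GA")

  // How many partitions receive at least one key.
  def nonEmptyPartitions(numPartitions: Int): Int =
    abbrs.map(partitionFor(_, numPartitions)).distinct.size

  // With 200 partitions, at most 10 can be non-empty; the rest stay empty.
  val with200: Int = nonEmptyPartitions(200)

  // With only 5 partitions, several keys must share a partition.
  val with5: Int = nonEmptyPartitions(5)
}
```

The number of non-empty partitions can never exceed the number of distinct keys, so choosing a partition count close to the key cardinality, for example with repartition(n, column), avoids most of the empty files.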
