Spark Dataframe SHOW

In a Spark DataFrame, the show method displays records in a readable tabular format. It is used very often to check what the content of a DataFrame looks like. Let’s see it with an example.

scala> df_pres.select($"pres_name").show()
+--------------------+
|           pres_name|
+--------------------+
|   George Washington|
|          John Adams|
|    Thomas Jefferson|
|       James Madison|
|        James Monroe|
|   John Quincy Adams|
|      Andrew Jackson|
|    Martin Van Buren|
|William Henry Har...|
|          John Tyler|
|       James K. Polk|
|      Zachary Taylor|
|    Millard Fillmore|
|     Franklin Pierce|
|      James Buchanan|
|     Abraham Lincoln|
|      Andrew Johnson|
|    Ulysses S. Grant|
| Rutherford B. Hayes|
|   James A. Garfield|
+--------------------+
only showing top 20 rows

A few things to observe here:
1) By default, show prints only the first 20 records. This is comparable to Sample/Top/Limit 20 in other SQL environments.
2) Any string longer than 20 characters is truncated, for example “William Henry Har...” in place of “William Henry Harrison”. This is comparable to width/colwidth settings in a typical SQL environment.

Calling show() with no arguments is equivalent to the syntax below:

scala> df_pres.select($"pres_name").show(20,true)
+--------------------+
|           pres_name|
+--------------------+
|   George Washington|
|          John Adams|
|    Thomas Jefferson|
|       James Madison|
|        James Monroe|
|   John Quincy Adams|
|      Andrew Jackson|
|    Martin Van Buren|
|William Henry Har...|
|          John Tyler|
|       James K. Polk|
|      Zachary Taylor|
|    Millard Fillmore|
|     Franklin Pierce|
|      James Buchanan|
|     Abraham Lincoln|
|      Andrew Johnson|
|    Ulysses S. Grant|
| Rutherford B. Hayes|
|   James A. Garfield|
+--------------------+
only showing top 20 rows
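
To try these calls without the original dataset, a small stand-in DataFrame can be built in a standalone program. This is only a minimal sketch: the object name ShowDefaults, the local master setting, and the three sample names are assumptions for illustration, and the real df_pres is not reproduced here. In spark-shell the spark session already exists, so only the lines from toDF onward would be needed.

```scala
import org.apache.spark.sql.SparkSession

object ShowDefaults {
  def main(args: Array[String]): Unit = {
    // Local session for experimentation; in spark-shell, `spark` is predefined.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("show-defaults")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical stand-in for df_pres with a single pres_name column.
    val dfPres = Seq(
      "George Washington",
      "John Adams",
      "William Henry Harrison"
    ).toDF("pres_name")

    dfPres.show()          // defaults: at most 20 rows, long strings truncated
    dfPres.show(20, true)  // the same call with the defaults written out

    spark.stop()
  }
}
```

Both calls should print the same table, with the longest name cut short in the same way as in the output above.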

We can change the number of rows in the output by passing the number as the first parameter.

scala> df_pres.select($"pres_name").show(45,true)
+--------------------+
|           pres_name|
+--------------------+
|   George Washington|
|          John Adams|
|    Thomas Jefferson|
|       James Madison|
|        James Monroe|
|   John Quincy Adams|
|      Andrew Jackson|
|    Martin Van Buren|
|William Henry Har...|
|          John Tyler|
|       James K. Polk|
|      Zachary Taylor|
|    Millard Fillmore|
|     Franklin Pierce|
|      James Buchanan|
|     Abraham Lincoln|
|      Andrew Johnson|
|    Ulysses S. Grant|
| Rutherford B. Hayes|
|   James A. Garfield|
|   Chester A. Arthur|
|    Grover Cleveland|
|   Benjamin Harrison|
|    Grover Cleveland|
|    William McKinley|
|  Theodore Roosevelt|
| William Howard Taft|
|      Woodrow Wilson|
|   Warren G. Harding|
|     Calvin Coolidge|
|      Herbert Hoover|
|Franklin D. Roose...|
|     Harry S. Truman|
|Dwight D. Eisenhower|
|     John F. Kennedy|
|   Lyndon B. Johnson|
|    Richard M. Nixon|
|      Gerald R. Ford|
|        Jimmy Carter|
|       Ronald Reagan|
|   George H. W. Bush|
|        Bill Clinton|
|      George W. Bush|
|        Barack Obama|
|        Donald Trump|
+--------------------+

Now let’s discuss the truncated string output. We saw that any string longer than 20 characters is truncated by default.

scala> df_pres.select($"pres_name").filter(length($"pres_name")>20).show(2,true)
+--------------------+
|           pres_name|
+--------------------+
|William Henry Har...|
|Franklin D. Roose...|
+--------------------+
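
The truncation seen in this output can be sketched in plain Scala, without Spark. The helper below is a hypothetical illustration inferred from the output (a cell longer than the width keeps its first width minus 3 characters and gets "..." appended). It is not Spark's actual source, and the names ShowSketch and truncateCell are made up for this example.

```scala
// Plain-Scala sketch of the cell truncation visible in the output above.
object ShowSketch {
  // Cut `cell` to at most `width` characters, ending in "..." when it was too long.
  def truncateCell(cell: String, width: Int = 20): String =
    if (width > 0 && cell.length > width) {
      if (width < 4) cell.substring(0, width)    // too narrow to fit an ellipsis
      else cell.substring(0, width - 3) + "..."  // keep width - 3 chars, then "..."
    } else cell                                  // short enough: unchanged

  def main(args: Array[String]): Unit = {
    val names = Seq("John Adams", "Dwight D. Eisenhower", "William Henry Harrison")
    names.foreach(n => println(truncateCell(n)))
  }
}
```

Note that a name of exactly 20 characters, such as “Dwight D. Eisenhower”, passes through unchanged, which matches the 45-row output earlier.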

To avoid truncation and see the complete string, pass false as the second parameter. If you don’t want to specify the number of rows explicitly, you can pass false as the only parameter to show.

scala> df_pres.select($"pres_name").filter(length($"pres_name")>20).show(2,false)
+----------------------+
|pres_name             |
+----------------------+
|William Henry Harrison|
|Franklin D. Roosevelt |
+----------------------+

With truncation turned off, the text is left-aligned and shown in full.
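
The single-argument form mentioned above can be tried the same way. The sketch below again uses hypothetical stand-in data and an assumed local session. It also calls the overload of show that takes an Int as the second argument to set a custom truncation width, which should be available in Spark 2.x and later. Check the Dataset API documentation for your version before relying on it.

```scala
import org.apache.spark.sql.SparkSession

object ShowNoTruncate {
  def main(args: Array[String]): Unit = {
    // Local session for experimentation; in spark-shell, `spark` is predefined.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("show-no-truncate")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical stand-in data; the real df_pres is not reproduced here.
    val dfPres = Seq("William Henry Harrison", "Franklin D. Roosevelt").toDF("pres_name")

    dfPres.show(false)  // default row limit, full strings
    dfPres.show(2, 10)  // Int overload: cut cells longer than 10 characters

    spark.stop()
  }
}
```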
Hope this helps.
