Hive – SELECT

SELECT statement in HIVE

SELECT statement is used to fetch records from table. A table may consists of many columns and you can specify the column name which you want to fetch in SELECT statement. You can mention column name/s or can simply use “*” to represent all columns in the table in SELECT statement.
Example:

SELECT pres_name,pres_dob,pres_bs from usa_president;

+————————-+————-+———————–+–+
| pres_name | pres_dob | pres_bs |
+————————-+————-+———————–+–+
| George Washington | 1732-02-22 | Virginia |
| John Adams | 1735-10-30 | Massachusetts |
| Thomas Jefferson | 1743-04-13 | Virginia |
| James Madison | 1751-03-16 | Virginia |
| James Monroe | 1758-04-28 | Virginia |
| John Quincy Adams | 1767-07-11 | Massachusetts |
| Andrew Jackson | 1767-03-15 | South/North Carolina |
| Martin Van Buren | 1782-12-05 | New York |
| William Henry Harrison | 1773-02-09 | Virginia |
| John Tyler | 1790-03-29 | Virginia |
| James K. Polk | 1795-11-02 | North Carolina |
| Zachary Taylor | 1784-11-24 | Virginia |
| Millard Fillmore | 1800-01-07 | New York |
| Franklin Pierce | 1804-11-23 | New Hampshire |
| James Buchanan | 1791-04-23 | Pennsylvania |
| Abraham Lincoln | 1809-02-12 | Kentucky |
| Andrew Johnson | 1808-12-29 | North Carolina |
| Ulysses S. Grant | 1822-04-27 | Ohio |
| Rutherford B. Hayes | 1822-10-04 | Ohio |
| James A. Garfield | 1831-11-19 | Ohio |
| Chester A. Arthur | 1829-10-05 | Vermont |
| Grover Cleveland | 1837-03-18 | New Jersey |
| Benjamin Harrison | 1833-08-20 | Ohio |
| Grover Cleveland | 1837-03-18 | New Jersey |
| William McKinley | 1843-01-29 | Ohio |
| Theodore Roosevelt | 1858-10-27 | New York |
| William Howard Taft | 1857-09-15 | Ohio |
| Woodrow Wilson | 1856-12-28 | Virginia |
| Warren G. Harding | 1865-11-02 | Ohio |
| Calvin Coolidge | 1872-07-04 | Vermont |
| Herbert Hoover | 1874-08-10 | Iowa |
| Franklin D. Roosevelt | 1882-01-30 | New York |
| Harry S. Truman | 1884-05-08 | Missouri |
| Dwight D. Eisenhower | 1890-10-14 | Texas |
| John F. Kennedy | 1917-05-29 | Massachusetts |
| Lyndon B. Johnson | 1908-08-27 | Texas |
| Richard M. Nixon | 1913-01-09 | California |
| Gerald R. Ford | 1913-07-14 | Nebraska |
| Jimmy Carter | 1924-10-01 | Georgia |
| Ronald Reagan | 1911-02-06 | Illinois |
| George H. W. Bush | 1924-06-12 | Massachusetts |
| Bill Clinton | 1946-08-19 | Arkansas |
| George W. Bush | 1946-07-06 | Connecticut |
| Barack Obama | 1961-08-04 | Hawaii |
| Donald Trump | 1946-06-14 | New York |
+————————-+————-+———————–+–+

In order to get all the columns in the table , run below query:

select * from usa_president;

+————————+————————–+————————-+————————+————————+————————+————————-+–+
| usa_president.pres_id | usa_president.pres_name | usa_president.pres_dob | usa_president.pres_bp | usa_president.pres_bs | usa_president.pres_in | usa_president.pres_out |
+————————+————————–+————————-+————————+————————+————————+————————-+–+
| 1 | George Washington | 1732-02-22 | Westmoreland County | Virginia | 1789-04-30 | 1797-03-04 |
| 2 | John Adams | 1735-10-30 | Braintree | Massachusetts | 1797-03-04 | 1801-03-04 |
| 3 | Thomas Jefferson | 1743-04-13 | Shadwell | Virginia | 1801-03-04 | 1809-03-04 |
| 4 | James Madison | 1751-03-16 | Port Conway | Virginia | 1809-03-04 | 1817-03-04 |
| 5 | James Monroe | 1758-04-28 | Monroe Hall | Virginia | 1817-03-04 | 1825-03-04 |
| 6 | John Quincy Adams | 1767-07-11 | Braintree | Massachusetts | 1825-03-04 | 1829-03-04 |
| 7 | Andrew Jackson | 1767-03-15 | Waxhaws Region | South/North Carolina | 1829-03-04 | 1837-03-04 |
| 8 | Martin Van Buren | 1782-12-05 | Kinderhook | New York | 1837-03-04 | 1841-03-04 |
| 9 | William Henry Harrison | 1773-02-09 | Charles City County | Virginia | 1841-03-04 | 1841-04-04 |
| 10 | John Tyler | 1790-03-29 | Charles City County | Virginia | 1841-04-04 | 1845-03-04 |
| 11 | James K. Polk | 1795-11-02 | Pineville | North Carolina | 1845-03-04 | 1849-03-04 |
| 12 | Zachary Taylor | 1784-11-24 | Barboursville | Virginia | 1849-03-04 | 1850-07-09 |
| 13 | Millard Fillmore | 1800-01-07 | Summerhill | New York | 1850-07-09 | 1853-03-04 |
| 14 | Franklin Pierce | 1804-11-23 | Hillsborough | New Hampshire | 1853-03-04 | 1857-03-04 |
| 15 | James Buchanan | 1791-04-23 | Cove Gap | Pennsylvania | 1857-03-04 | 1861-03-04 |
| 16 | Abraham Lincoln | 1809-02-12 | Sinking spring | Kentucky | 1861-03-04 | 1865-04-15 |
| 17 | Andrew Johnson | 1808-12-29 | Raleigh | North Carolina | 1865-04-15 | 1869-03-04 |
| 18 | Ulysses S. Grant | 1822-04-27 | Point Pleasant | Ohio | 1869-03-04 | 1877-03-04 |
| 19 | Rutherford B. Hayes | 1822-10-04 | Delaware | Ohio | 1877-03-04 | 1881-03-04 |
| 20 | James A. Garfield | 1831-11-19 | Moreland Hills | Ohio | 1881-03-04 | 1881-09-19 |
| 21 | Chester A. Arthur | 1829-10-05 | Fairfield | Vermont | 1881-09-19 | 1885-03-04 |
| 22 | Grover Cleveland | 1837-03-18 | Caldwell | New Jersey | 1885-03-04 | 1889-03-04 |
| 23 | Benjamin Harrison | 1833-08-20 | North Bend | Ohio | 1889-03-04 | 1893-03-04 |
| 24 | Grover Cleveland | 1837-03-18 | Caldwell | New Jersey | 1893-03-04 | 1897-03-04 |
| 25 | William McKinley | 1843-01-29 | Niles | Ohio | 1897-03-04 | 1901-09-14 |
| 26 | Theodore Roosevelt | 1858-10-27 | Manhattan | New York | 1901-09-14 | 1909-03-04 |
| 27 | William Howard Taft | 1857-09-15 | Cincinnati | Ohio | 1909-03-04 | 1913-03-04 |
| 28 | Woodrow Wilson | 1856-12-28 | Staunton | Virginia | 1913-03-04 | 1921-03-04 |
| 29 | Warren G. Harding | 1865-11-02 | Blooming Grove | Ohio | 1921-03-04 | 1923-08-02 |
| 30 | Calvin Coolidge | 1872-07-04 | Plymouth | Vermont | 1923-08-02 | 1929-03-04 |
| 31 | Herbert Hoover | 1874-08-10 | West Branch | Iowa | 1929-03-04 | 1933-03-04 |
| 32 | Franklin D. Roosevelt | 1882-01-30 | Hyde Park | New York | 1933-03-04 | 1945-04-12 |
| 33 | Harry S. Truman | 1884-05-08 | Lamar | Missouri | 1945-04-12 | 1953-01-20 |
| 34 | Dwight D. Eisenhower | 1890-10-14 | Denison | Texas | 1953-01-20 | 1961-01-20 |
| 35 | John F. Kennedy | 1917-05-29 | Brookline | Massachusetts | 1961-01-20 | 1963-11-22 |
| 36 | Lyndon B. Johnson | 1908-08-27 | Stonewall | Texas | 1963-11-22 | 1969-01-20 |
| 37 | Richard M. Nixon | 1913-01-09 | Yorba Linda | California | 1969-01-20 | 1974-08-09 |
| 38 | Gerald R. Ford | 1913-07-14 | Omaha | Nebraska | 1974-08-09 | 1977-01-20 |
| 39 | Jimmy Carter | 1924-10-01 | Plains | Georgia | 1977-01-20 | 1981-01-20 |
| 40 | Ronald Reagan | 1911-02-06 | Tampico | Illinois | 1981-01-20 | 1989-01-20 |
| 41 | George H. W. Bush | 1924-06-12 | Milton | Massachusetts | 1989-01-20 | 1993-01-20 |
| 42 | Bill Clinton | 1946-08-19 | Hope | Arkansas | 1993-01-20 | 2001-01-20 |
| 43 | George W. Bush | 1946-07-06 | New Haven | Connecticut | 2001-01-20 | 2009-01-20 |
| 44 | Barack Obama | 1961-08-04 | Honolulu | Hawaii | 2009-01-20 | 2017-01-20 |
| 45 | Donald Trump | 1946-06-14 | Queens | New York | 2017-01-20 | NULL |
+————————+————————–+————————-+————————+————————+————————+————————-+–+

However one must avoid using “*” as it can result in some problems. Like if a table structure changed and one more column is added to it. Now this query will start showing one more column and if you have not handled it then it may throw error in case of INSERT-SELECT etc.

Also depending on the Storage Type you are using “SELECT *” may result in fetching entire row where as your query may require only 2 columns from that table. This may impact query performance. We will discuss more on this later. However for time being, good to remember is that SELECT Column-List is preferred over SELECT *. And both ways are correct in HIVE.

Leave a Reply

Your email address will not be published. Required fields are marked *