In the previous post, we discussed about HIVE transactional tables; how to create it, properties and configurations required and example of HIVE transactional table DDL with INSERT/UPDATE/DELETE/MERGE statements on HIVE ACID table. In this post, we will talk about limitations with HIVE transactional tables and possible solutions if any. 1) [Error 10302]: Updating values of bucketing columns is not supported. Since ACID table must be bucketed to enable transactional property, you cannot run UPDATE command to set BUCKETED column. In the table DDL, you can see “CLUSTERED BY (pres_bs) INTO 4 BUCKETS”. Now we cannot run UPDATE command with set pres_bs=’any_value’. NOT ALLOWED: update usa_prez_tx
We all know HDFS does not support random deletes, updates. With HIVE ACID properties enabled, we can directly run UPDATE/DELETE on HIVE tables. Hive ACID tables support UPDATE, DELETE, INSERT, MERGE query constructs with some limitations and we will talk about that too. I am using HDP 2.6 & Hive 1.2 for examples mentioned below. Points to consider: 1) Only ORC storage format is supported presently. 2) Table must have CLUSTERED BY column 3) Table properties must have : “transactional”=”true” 4) External tables cannot be transactional. 5) Transactional tables cannot be read by non ACID session. 6) Table cannot be loaded using “LOAD DATA…” command.
BETWEEN operator in HIVE When we have to check if column value lies between a range of value then we use BETWEEN operator in the query. You have to specify MIN and MAX value for the range when using BETWEEN operator. Example: select pres_name,pres_bs,pres_dob from usa_president where pres_dob between ‘1850-01-01’ and ‘1900-01-01’; +————————+———–+————-+–+ | pres_name | pres_bs | pres_dob | +————————+———–+————-+–+ | Theodore Roosevelt | New York | 1858-10-27 | | William Howard Taft | Ohio | 1857-09-15 | | Woodrow Wilson | Virginia | 1856-12-28 | | Warren G. Harding | Ohio | 1865-11-02 | | Calvin Coolidge | Vermont | 1872-07-04 | |
IN operator is used to compare any column value in the table with multiple possible values. It returns TRUE if value is matching with any value listed inside IN operator. Example: select pres_name,pres_dob,pres_bs from usa_president where pres_bs IN (‘Virginia’,’New York’,’New Jersey’); +————————-+————-+————-+–+ | pres_name | pres_dob | pres_bs | +————————-+————-+————-+–+ | George Washington | 1732-02-22 | Virginia | | Thomas Jefferson | 1743-04-13 | Virginia | | James Madison | 1751-03-16 | Virginia | | James Monroe | 1758-04-28 | Virginia | | Martin Van Buren | 1782-12-05 | New York | | William Henry Harrison | 1773-02-09 | Virginia | | John Tyler |
WHERE clause in HIVE WHERE clause is used to FILTER out records from final result set. When you run any query in HIVE and you want to FILTER out records such that all the rows in output shall abide by some condition then use WHERE clause. You can mention conditions in WHERE which must be true for row in order to be in output result set. Example: select pres_name,pres_dob,pres_bs from usa_president where pres_bs=’New York’; +————————+————-+———–+–+ | pres_name | pres_dob | pres_bs | +————————+————-+———–+–+ | Martin Van Buren | 1782-12-05 | New York | | Millard Fillmore | 1800-01-07 | New York | | Theodore Roosevelt
select DISTINCT in HIVE DISTINCT keyword is used in SELECT statement in HIVE to fetch only unique rows. The row does not mean entire row in the table but it means “row” as per column listed in the SELECT statement. If the SELECT has 3 columns listed then SELECT DISTINCT will fetch unique row for those 3 column values only. Also it will not fetch DISTINCT value for 1 of the column. The row comprising of 3 columns will be UNIQUE, not 1, not 2 but all 3 columns. DISTINCT cannot be applied to individual column if multiple columns are listed in SELECT statement. It