Hive Transactional Tables: Limitations and Considerations (Part 2)

In the previous post, we discussed about HIVE transactional tables; how to create it, properties and configurations required and example of HIVE transactional table DDL with INSERT/UPDATE/DELETE/MERGE statements on HIVE ACID table. In this post, we will talk about limitations with HIVE transactional tables and possible solutions if any. 1) [Error 10302]: Updating values of bucketing columns is not supported. Since ACID table must be bucketed to enable transactional property, you cannot run UPDATE command to set BUCKETED column. In…

Continue Reading

Hive Transactional Tables: Everything you must know (Part 1)

We all know HDFS does not support random deletes, updates. With HIVE ACID properties enabled, we can directly run UPDATE/DELETE on HIVE tables. Hive ACID tables support UPDATE, DELETE, INSERT, MERGE query constructs with some limitations and we will talk about that too. I am using HDP 2.6 & Hive 1.2 for examples mentioned below. Points to consider: 1) Only ORC storage format is supported presently. 2) Table must have CLUSTERED BY column 3) Table properties must have : “transactional”=”true”…

Continue Reading

Hive – BETWEEN

BETWEEN operator in HIVE When we have to check if column value lies between a range of value then we use BETWEEN operator in the query. You have to specify MIN and MAX value for the range when using BETWEEN operator. Example: select pres_name,pres_bs,pres_dob from usa_president where pres_dob between ‘1850-01-01’ and ‘1900-01-01’; +————————+———–+————-+–+ | pres_name | pres_bs | pres_dob | +————————+———–+————-+–+ | Theodore Roosevelt | New York | 1858-10-27 | | William Howard Taft | Ohio | 1857-09-15 | |…

Continue Reading

Hive – IN

IN operator is used to compare any column value in the table with multiple possible values. It returns TRUE if value is matching with any value listed inside IN operator. Example: select pres_name,pres_dob,pres_bs from usa_president where pres_bs IN (‘Virginia’,’New York’,’New Jersey’); +————————-+————-+————-+–+ | pres_name | pres_dob | pres_bs | +————————-+————-+————-+–+ | George Washington | 1732-02-22 | Virginia | | Thomas Jefferson | 1743-04-13 | Virginia | | James Madison | 1751-03-16 | Virginia | | James Monroe | 1758-04-28 |…

Continue Reading

Hive – WHERE

WHERE clause in HIVE WHERE clause is used to FILTER out records from final result set. When you run any query in HIVE and you want to FILTER out records such that all the rows in output shall abide by some condition then use WHERE clause. You can mention conditions in WHERE which must be true for row in order to be in output result set. Example: select pres_name,pres_dob,pres_bs from usa_president where pres_bs=’New York’; +————————+————-+———–+–+ | pres_name | pres_dob |…

Continue Reading

Hive – DISTINCT

select DISTINCT in HIVE DISTINCT keyword is used in SELECT statement in HIVE to fetch only unique rows. The row does not mean entire row in the table but it means “row” as per column listed in the SELECT statement. If the SELECT has 3 columns listed then SELECT DISTINCT will fetch unique row for those 3 column values only. Also it will not fetch DISTINCT value for 1 of the column. The row comprising of 3 columns will be…

Continue Reading

Hive – SELECT

SELECT statement in HIVE SELECT statement is used to fetch records from table. A table may consists of many columns and you can specify the column name which you want to fetch in SELECT statement. You can mention column name/s or can simply use “*” to represent all columns in the table in SELECT statement. Example: SELECT pres_name,pres_dob,pres_bs from usa_president; +————————-+————-+———————–+–+ | pres_name | pres_dob | pres_bs | +————————-+————-+———————–+–+ | George Washington | 1732-02-22 | Virginia | | John Adams…

Continue Reading

Hive Date Functions – all possible Date operations

Handling DATE in HIVE is not as easy as compared to many traditional RDBMS in the market. One of the reason why I feel so is the inability of HIVE to throw proper error when dealing with wrong format. If any function or date format is not compatible, it will give NULL as output in place of error. This makes identification of wrong date manipulation in HIVE more tricky. We have used DATE_FORMAT which is available from Hive 1.2 onwards.…

Continue Reading
Show Partitions in HIVE table

Hive Partitions – Everything you must know

What are partitions in HIVE? How partitions are implemented in HIVE? What is the benefit of partition in HIVE? If you have these or similar questions about partitions in HIVE then we have answered such questions and many more in this post. What is HIVE ? HIVE is a software built on top of Hadoop to provide environment where user can run SQL queries rather than map reduce program on data. It is developed primarily for people like me who…

Continue Reading

How to use variables in HIVE Query

While working on HIVE, you may want to use variables in the query to get results. A good coding practice is not to hardcode values in the query itself so we should know how to use variables in the HIVE query. Hive variables can be referred using “hivevar” keyword. We can set value of HIVE variable using below command: SET hivevar:VARIABLE_NAME=’VARIABLE_VALUE’; We can use this value by using ${hivevar:VARIABLE_NAME} in the query. Example: set hivevar:desc=’Legislators’; select * from sample_07 where…

Continue Reading