Have you encountered this error while working on HIVE tables with clusters and buckets ? This is one of the most common error we face in HIVE while doing transactions on tables. Since transactions in HIVE is still not as mature as compared to RDBMS if we try to run some DML on a Target table and using Target table itself as one of the Sources then we get this error very often
Error Description:
“Bucketized tables do not support INSERT INTO…”
Solution to the problem:
- Most easy but bad way of overcoming this error is to remove clusters/buckets from table definition. This looks straight forward answer however removing buckets from the table definition may impact Query Performance. So it is bad idea to drop clusters in order to overcome this error.
- Enable transaction on table. If the table is ORC then you can set transaction to TRUE in table properties to overcome this error. However we still have some time before HIVE becomes more mature more to handle transactions in more efficient manner. Hence we may ignore this solution for some time.
- The third and most effective solution is to Create a Derived table from Target table and use that Derived table as Source rather than using Target table itself. We will look into this solution in this post
From our last post we know how to connect to HIVE in HDP. So let’s connect to HIVE using beeline CLI.
Step-1 : Connect to BEELINE
- Open your sandbox terminal and type beeline.
- Use command !connect jdbc:hive2://localhost:10000/xademo to connect to beeline via JDBC
Step-2 : Create a table with clusters and buckets
CREATE TABLE recharge_details_agg_clus ( phone_number string, last_rec_date string, amount string) clustered BY (phone_number) INTO 3 buckets ROW FORMAT DELIMITED FIELDS TERMINATED BY '~' STORED AS ORC;
Step-3 : INSERT new records into this table
INSERT INTO recharge_details_agg_clus SELECT tb1.phone_number, MAX(tb1.rec_date), SUM(tb1.amount) FROM recharge_details tb1 LEFT OUTER JOIN recharge_details_agg_clus tb2 ON tb1.phone_number = tb2.phone_number WHERE tb2.phone_number IS NULL GROUP BY tb1.phone_number;
Output: Error while compiling statement: FAILED: SemanticException [Error 10122]: Bucketized tables do not support INSERT INTO: Table: recharge_details_agg_clus (state=42000,code=10122)
Solution:
Create a derived table for Target Table as Source. We will create a Common Table Expression in HIVE and will use that in place of Target table in Left Outer Join.
Updated Query:
WITH CTE1 AS ( SELECT phone_number FROM recharge_details_agg_clus) INSERT INTO recharge_details_agg_clus SELECT tb1.phone_number, MAX(tb1.rec_date), SUM(tb1.amount) FROM recharge_details tb1 LEFT OUTER JOIN CTE1 tb2 ON tb1.phone_number = tb2.phone_number WHERE tb2.phone_number IS NULL GROUP BY tb1.phone_number;
Output: Success. New records inserted into the Target table.
There can be other ways to overcome this error however I have shared the easiest three methods in the post. Hope it helps.