Apache Hive

Bucketized tables do not support INSERT INTO

Have you encountered this error while working on HIVE tables with clusters and buckets ? This is one of the most common error we face in HIVE while doing transactions on tables. Since transactions in HIVE is still not as mature as compared to RDBMS if we try to run some DML on a Target table and using Target table itself as one of the Sources then we get this error very often

Error Description:

“Bucketized tables do not support INSERT INTO…”

Solution to the problem:

  1. Most easy but bad way of overcoming this error is to remove clusters/buckets from table definition. This looks straight forward answer however removing buckets from the table definition may impact Query Performance. So it is bad idea to drop clusters in order to overcome this error.
  2. Enable transaction on table. If the table is ORC then you can set transaction to TRUE in table properties to overcome this error. However we still have some time before HIVE becomes more mature more to handle transactions in more efficient manner. Hence we may ignore this solution for some time.
  3. The third and most effective solution is to Create a Derived table from Target table and use that Derived table as Source rather than using Target table itself. We will look into this solution in this post

From our last post we know how to connect to HIVE in HDP. So let’s connect to HIVE using beeline CLI.

Step-1 : Connect to BEELINE

  • Open your sandbox terminal and type beeline.
  • Use command !connect jdbc:hive2://localhost:10000/xademo to connect to beeline via JDBC

Step-2 : Create a table with clusters and buckets

CREATE TABLE recharge_details_agg_clus ( phone_number string,
last_rec_date string,
amount string) 
clustered BY (phone_number)
INTO 3 buckets 
ROW FORMAT DELIMITED 
FIELDS TERMINATED BY '~' 
STORED AS ORC;

Step-3 : INSERT new records into this table

INSERT
	INTO
	recharge_details_agg_clus
SELECT
	tb1.phone_number,
	MAX(tb1.rec_date),
	SUM(tb1.amount)
FROM
	recharge_details tb1
LEFT OUTER JOIN recharge_details_agg_clus tb2 ON
	tb1.phone_number = tb2.phone_number
WHERE
	tb2.phone_number IS NULL
GROUP BY
	tb1.phone_number;

Output: Error while compiling statement: FAILED: SemanticException [Error 10122]: Bucketized tables do not support INSERT INTO: Table: recharge_details_agg_clus (state=42000,code=10122)

Solution:

Create a derived table for Target Table as Source. We will create a Common Table Expression in HIVE and will use that in place of Target table in Left Outer Join.

Updated Query:

WITH CTE1 AS (
SELECT
	phone_number
FROM
	recharge_details_agg_clus)
INSERT
	INTO
	recharge_details_agg_clus
SELECT
	tb1.phone_number,
	MAX(tb1.rec_date),
	SUM(tb1.amount)
FROM
	recharge_details tb1
LEFT OUTER JOIN CTE1 tb2 ON
	tb1.phone_number = tb2.phone_number
WHERE
	tb2.phone_number IS NULL
GROUP BY
	tb1.phone_number;

Output: Success. New records inserted into the Target table.

There can be other ways to overcome this error however I have shared the easiest three methods in the post. Hope it helps.

Leave a Reply