We may want to subtract two timestamps in order to find out the difference between occurence of two events. This is a very common operation which we perform on any TIMESTAMP or DATE or TIME data type. Now the question is how to find out the difference or subtract two timestamp ? In HIVE we can directly subtract two timestamp columns and get the result. Let’s see this with an example: TIMESTAMP 1: 2017-09-22 17:22:38 TIMESTAMP 2: 2017-09-12 14:53:19 Now I want to find the difference between these 2 timestamp values. I can directly subtract it. select CAST(‘2017-09-22 17:22:38’ as timestamp) – CAST(‘2017-09-12 14:53:19’ as
Hive supports traditional UNIX timestamp data type with nanosecond upto 9 decimal precision (in Teradata it is till 6 decimal precision for timestamp data type). Typical TIMESTAMP data type has DATE part which is YEAR/MONTH/DAY and TIME part which is HOUR/MINUTE/SECOND/NanoSecond. We can represent these components as : TIMESTAMP : 2017-11-29 14:53:50.724 S.No Part Format Output Comment 1 Year Y 2017 Prints the Year 2 Year y 2017 Prints the Year 3 Year YY 17 Prints the year in 2 digit 4 Year yy 17 Prints the year in 2 digit 6 Year YYYY 2017 Prints the Year 7 Month M 11 Prints the month
Once you have access to HIVE , the first thing you would like to do is Create a Database and Create few tables in it. Before we start with the SQL commands, it is good to know how HIVE stores the data. Since in HDFS everything is FILE so HIVE stores all the information in FILEs only. So we should know the path of FILEs also for better understanding the concepts. TIP: Open two putty sessions: one two execute hive queries via beeline , second to run OS & HDFS commands. Once you login into HIVE via beeline, run below command: set hive.metastore.warehouse.dir; 0: jdbc:hive2://localhost:10000>
Have you encountered this error while working on HIVE tables with clusters and buckets ? This is one of the most common error we face in HIVE while doing transactions on tables. Since transactions in HIVE is still not as mature as compared to RDBMS if we try to run some DML on a Target table and using Target table itself as one of the Sources then we get this error very often “Bucketized tables do not support INSERT INTO…” There are many ways to overcome this error: Most easy but bad way of overcoming this error is to remove clusters/buckets from table definition. This
Hadoop is a very popular framework for data storage and data processing. So it suffice two main purposes: Distributed Data Storage using HDFS ( Hadoop Distributed File System) Data processing using Map-Reduce. In Hadoop everything is in File format. It is capable of processing huge volume of File Data in a very efficient manner. Now the obvious question is how can I run SQL queries if everything is in File and not Tables ? That is actually a very good question as SQL cannot run queries on data present in files. The dependency on table is real and SQL works on data present in rows
We all have been using SQL on RDBMS for so long now. The time has come when we shall switch to SQL on Hadoop. SQL (Structured Query Language) help us in communicating with any RDBMS like Teradata, Oracle, Netezza etc which are mostly used for OLTP or OLAP purposes. Traditional Datawarehouse systems used to store structured data where data is stored in Tables in rows and columns. However in the past couple of years there has been significant changes in the DW/BI world. The need for real time data is ever increasing and traditional RDBMS may not be able to handle streaming data that well.