Hadoop is a very popular framework for data storage and data processing. So it suffice two main purposes: Distributed Data Storage using HDFS ( Hadoop Distributed File System) Data processing using Map-Reduce. In Hadoop everything is in File format. It is capable of processing huge volume of File Data in a very efficient manner. Now the obvious question is how can I run SQL queries if everything is in File and not Tables ? That is actually a very good question as SQL cannot run queries on data present in files. The dependency on table is real and SQL works on data present in rows
We all have been using SQL on RDBMS for so long now. The time has come when we shall switch to SQL on Hadoop. SQL (Structured Query Language) help us in communicating with any RDBMS like Teradata, Oracle, Netezza etc which are mostly used for OLTP or OLAP purposes. Traditional Datawarehouse systems used to store structured data where data is stored in Tables in rows and columns. However in the past couple of years there has been significant changes in the DW/BI world. The need for real time data is ever increasing and traditional RDBMS may not be able to handle streaming data that well.