Hive and SQL
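To get started with any SQL database, first learn the five categories of SQL commands:
- DDL (Data Definition Language) – create, alter, drop, truncate
- DML (Data Manipulation Language) – insert, update, delete
- DQL (Data Query Language) – select
- DCL (Data Control Language) – grant, revoke
- TCL (Transaction Control Language) – commit, rollback, savepoint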
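As a rough illustration of these categories, here is one statement of each kind in generic SQL. The `employees` table and the `analyst` user are hypothetical names used only for this sketch, and the exact syntax (especially for grants and transactions) varies between databases.

```sql
-- DDL: define the structure of a (hypothetical) table
CREATE TABLE employees (
    id     INT PRIMARY KEY,
    name   VARCHAR(100),
    salary DECIMAL(10, 2)
);

-- DML: change the rows stored in the table
INSERT INTO employees (id, name, salary) VALUES (1, 'Asha', 50000.00);
UPDATE employees SET salary = 55000.00 WHERE id = 1;

-- DQL: read rows back
SELECT id, name, salary FROM employees WHERE salary > 40000;

-- DCL: control who may access the table (assumes a user/role named analyst exists)
GRANT SELECT ON employees TO analyst;

-- TCL: group changes into a transaction and undo them
START TRANSACTION;
DELETE FROM employees WHERE id = 1;
ROLLBACK;  -- the row with id = 1 is still there
```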
Hive offers a SQL-like interface, but it is not a traditional database/RDBMS. It is a data warehouse layer within the Hadoop ecosystem that stores all of its data in HDFS (Hadoop Distributed File System).
Hive's query language, HQL (Hive Query Language), is quite similar to SQL (Structured Query Language).
The major difference lies in how the backend processes a query: Hive internally translates it into jobs for the Java-based MapReduce framework, which run in parallel across the cluster to process Big Data.
Hive 2.0 introduced LLAP (Live Long and Process), which, alongside features such as vectorized query execution, further improves data-processing performance.
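These features are normally switched on through session or cluster settings. The lines below are a minimal sketch of what that might look like in a Hive session. It assumes the cluster runs Hive on Tez with LLAP daemons already deployed, and that the tables use a format that supports vectorized reads, such as ORC. Check the property names and defaults against your Hive version before relying on them.

```sql
-- Process rows in batches instead of one at a time
set hive.vectorized.execution.enabled = true;
set hive.vectorized.execution.reduce.enabled = true;

-- LLAP runs on top of the Tez execution engine
set hive.execution.engine = tez;

-- Ask Hive to run query fragments inside LLAP daemons
-- (assumes LLAP daemons are already running on the cluster)
set hive.execution.mode = llap;
set hive.llap.execution.mode = all;
```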
Hive supports parallel/distributed processing via MapReduce, so when dealing with terabytes of data it can be considerably faster than a traditional single-server RDBMS such as MySQL or Oracle.
Important Concepts in Hive:
1. Internal Table
2. External Table
3. Partitioning and Bucketing
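To make these concepts concrete, here is a minimal HiveQL sketch of each one. The table names, the HDFS path `/data/passengers`, and the bucket count are illustrative assumptions, not values from a real cluster.

```sql
-- 1. Internal (managed) table: Hive owns the data files,
--    so DROP TABLE deletes both the metadata and the data.
create table if not exists Passengers_managed (
    PassengerId int,
    Name string
)
row format delimited
fields terminated by ',';

-- 2. External table: Hive only tracks metadata for files that live at
--    the given location; DROP TABLE leaves the files in place.
create external table if not exists Passengers_ext (
    PassengerId int,
    Name string,
    Age int,
    SeatNum string,
    Pclass int
)
row format delimited
fields terminated by ','
location '/data/passengers';  -- hypothetical HDFS directory

-- 3. Partitioning and bucketing: one directory per Pclass value,
--    with rows hashed on PassengerId into a fixed number of files.
create table if not exists Passengers_part (
    PassengerId int,
    Name string,
    Age int,
    SeatNum string
)
partitioned by (Pclass int)
clustered by (PassengerId) into 4 buckets
stored as orc;
```

Note that the partition column (`Pclass`) is declared only in the `partitioned by` clause, not in the regular column list. A filter such as `where Pclass = 1` then lets Hive read only the matching partition directory.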
DDL for a sample Hive Table
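-- Drop any earlier copy so the script can be re-run
drop table if exists Passengers;
-- Managed table backed by comma-delimited text files
create table if not exists Passengers (
    PassengerId int,
    Name string,
    Age int,
    SeatNum string,
    Pclass int
)
row format delimited
fields terminated by ',';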
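Once the table exists, data can be loaded into it and queried. The sketch below assumes a comma-separated file at the hypothetical local path `/tmp/passengers.csv`, with columns in the same order as the table definition.

```sql
-- Copy a local CSV file into the table's storage location
-- (the file path is an assumption for illustration)
load data local inpath '/tmp/passengers.csv' into table Passengers;

-- Count passengers and compute the average age per class
select Pclass,
       count(*) as passenger_count,
       avg(Age) as avg_age
from Passengers
group by Pclass;
```

The `group by` query is compiled into a distributed job rather than run on a single server, which is where Hive differs from a traditional RDBMS.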
Hive is facing competition from:
- Kite
- Presto
- Kudu
- Apache Phoenix, etc.
Refer to my GitHub account for Big Data and Hive examples and code.
Refer to the GitHub link below for use cases on housing data analytics, along with their solutions.
