Database vs DataWarehouse
For anyone working in the field of data management and analytics, its quite imperative to clearly know when to use a database and when to use data warehouse. Here I want to explain the basic difference:
Databases are On-Line Transaction Processing/OLTP systems where transactions happening every day has to be recorded. Say for example if any transaction delivers cash to a customer via the ATM, has to be recorded in the system, otherwise the Bank will soon go bankrupt. So the banking system is designed to make sure that every transaction gets recorded within the time you stand before the ATM machine.
- Common DBs are:
- Oracle 10g/11g/12c
- MySQL
- Postgres
- SAP HANA
On the other hand, Data Warehouse (DW/EDW) is also a database (yes true, it’s a database) that is designed for facilitating the querying and analysis for the prediction and forecasting of business. EDW is an On-Line Analytical Processing/OLAP system that contain read-only data that can be queried and analyzed far more efficiently as compared to the OLTP based application databases.
- Common DWs are:
- Teradata
- Oracle Exadata
- Netezza
- HP Vertica
- Greenplum
While a normal Database is optimized for transactional activity (while keeping a small amount of history), a Data Warehouse will be optimized for large scale reporting.
Within a Data Warehouse data from several systems will typically be merged together to present a global enterprise view.
Data Warehouses will also typically keep a very long history from several years to the entire life of the company so that very long term trends can be viewed.
Using the proper testing process, we can validate the accuracy of data flow from Source/Operational systems to the Target/DataWarehouse systems.
Database/OLTP systems are write-optimized whereas the EDW/OLAP system is designed to be read-optimized.
At the end we can say that all data warehouses are databases, but not all databases are data warehouses.
1 comment so far