Search This Blog

Sunday, February 20, 2011

What is Data Warehouse(DWH) & DataMart(DM)?

Data warehouse (DWH)

Data warehouse is a repositary or central aggregation of data where data is stored from different sources (like Excel, TextFiles, SQL Sever, MySQL, Oracle and lagacy applications etc).

Basically  DWH are specially designed for reporting and analytics purpose.

Fundamental stages of  DWH

There are four fundamental stages of DWH:

1 - Offline Operational DB
2 - Offline DWH
3 - Real Time DWH
4 - Integrated DWH

1 - Offline Operational DB

In this initially stage, DWH's are developed by simply copying of the DB of an operation system to an off line server where the processing load of reporting does not impact on the operational system performance.

2 - Offline DWH

In this stage, the DWH's are updated on a regular time cycle (Dailly, Weekly, Monthly) from the operational system & the data is stored in an integrated reporing oriented data structure.

3 - Real Time DWH

At this stage, the DWH's are upadted on a transection or event basis, every thime an operational system perform a transection.

4 - Integrated DWH

At this stage, the DWH's are used to generate the activity or transection that are passed back into the operational system for the use in the dailly activity of the organization.

There are two approachs to stored the data into DWH:

1 - Dimension Approach
2 - Normalized Approach

1 - Dimension Approach

In this approach, the data is stored in de-normlized formate, the tables are divided into two cetegories one is Fact and other is Dimension.

Fact : These types of table generally strored the numeric data.

Dimension : These types of table has the reference of Fact table data. And These tables are used for slicing & dicing of the fact data for analysis.

The advantage of this approach is that data is retrieved faster as normlized approach.
And the disadvantage of this approach is that it is typical to maintain the DWH.

2 - Normalized Approach

In this approach, the data in DWH in 3-normal form like OLTP. So You can easily maintain the data stored in DWH but the disadvantage is that data retrieval is slow than Dimension approach because of mainy relation between tables in case of 3-NF.

Datamart (DM)

Datamart is also a small DWH but it is basically designed for specific Department (like HR, Finance, Payroll etc). A DM is a collection of Subject Areas which is organized for decision support based on the need of a given department.

[Subject Areas : We can break a huge schema into smaller parts for easier management purpose & each part is known as Subject area.]

There are two types of DM:

1 - Dependent DM
2 - Independent DM

1 - Dependent DM

A dependent DM is DM whose source is a DWH. And dependent datamart is more stabled than independent datamart.

2 - Independent DM

An independent datamart is one whose source is the lagacy application environment. And independent datamarts are unstabled

Regards
Manish

No comments:

Post a Comment