Search This Blog

Wednesday, February 23, 2011

DWH Concept

ETL (Extract Transformation Load)

ETL is a process in datawarehousing that involve :

- Extraction data from outsiding sources
- Tranforming it to fit business need
- Loading it into end target (i.e Datawarehouse)

E (Extract) - The first part of ETL is to extract the data from the source systems.

T (Transform) - The transform stage applies a series of rules to the extracted data from the source to derived the data to be loaded to the end target.

L (Load) - The load phase load the data into end target.

OLAP (On Line Analytical Processing)

OLAP is a database technology which are specially designed to deal with the high volumn of data. OLAP provide you a very good view that what is happening but it does not provide that why is happening and what will be happened in the future.

DataMining

Datamining is a technology which perform a process of analyzing the data from different prospective & summurized is into a useful information that information can be used to increase the revenue, cut the cost etc.
Datamining provide you that why is happening and what will be happened in the future.

Data Analysis

Data Analysis is an approach in which row data is ordered and organized in such a way that useful information can be extracted from it.

Data Analysis Phases

1 - Data Cleansing
2 - Transforming
3 - Data Modeling 

1 - Data Cleansing

Data cleansing is a process of detecting & removing the error from a record set, table and database. This process is also known as Data Scrubbing. During data cleansing records are checked & either corrected or deleted as per as requirment.

2 - Transforming

In the Transforming phase, the source data is transformed as per as DWH tables.

3 - Data Modeling 

Data Modeling is a process in a Data Model is build of data structure (Tables) as per as requirement of the Database.A Data Model represent the nature of data, business rules govering the data & how it will be organized in the DB.

A Data Model is comprised of two parts :

- Logical Design
- Physical Design

Data model helps functional & technical team in designing the DB.

The Functional Team (that create the logical design) refers to one or more Business Analyst, Business Managers, End Users. And Technical Team (that create the physical design) refers to the one or more programmers & DBAs.

Data Modelers are responsible for designing the Data Model & they communicate with the functional team to get the business requirments & technical team to implement the DB.

EX :
------
Suppose a company (XYZ) is planning to build a Guest House (DB) & it call the Building Architect (Data Modeler) & Building Architect (Data Modeler) take the Guest House (DB) requirment (Business requirmnet) from the function team (Business Analyst, Business Managers) and develop a plan (Data Model) & give it to the Company (XYZ). And finally XYZ company call the civil engineer (Programmer, DBA) to construct the Guest House (DB). This concept is called Data Modelling.

Data Integration

Data integration is a process of combining the data from different sources & providing users with a unified view of these data.

Data Agreegation

Data Agreegation is a process in which the information is gathered & expressed in a summary form. The purpose of agreegation  is to get more information about the particular group.

Business Intelligence (BI)

Business Intelligence (BI) is a technology which is used to gethering, storing & analysing the data to make a better decision.

BI solution help a company in making the business decision faster, accurate & market oriented.

Technical DW-MSBI work flow

[Sources] -> [ETL(Extract tha data using SSIS)] -> [Data Cleansing & Tranfoming of data] -> [DWH(stored the data in DWH (Fact & Dimesion))] -> [Create the OLAP cube(using SSAS)] -> [Analyse the data & create the Reports] -> [Implement Data Mining structure(Data Mining)] -> BI (make a better decision)

Regards
Manish

Sunday, February 20, 2011

What is Data Warehouse(DWH) & DataMart(DM)?

Data warehouse (DWH)

Data warehouse is a repositary or central aggregation of data where data is stored from different sources (like Excel, TextFiles, SQL Sever, MySQL, Oracle and lagacy applications etc).

Basically  DWH are specially designed for reporting and analytics purpose.

Fundamental stages of  DWH

There are four fundamental stages of DWH:

1 - Offline Operational DB
2 - Offline DWH
3 - Real Time DWH
4 - Integrated DWH

1 - Offline Operational DB

In this initially stage, DWH's are developed by simply copying of the DB of an operation system to an off line server where the processing load of reporting does not impact on the operational system performance.

2 - Offline DWH

In this stage, the DWH's are updated on a regular time cycle (Dailly, Weekly, Monthly) from the operational system & the data is stored in an integrated reporing oriented data structure.

3 - Real Time DWH

At this stage, the DWH's are upadted on a transection or event basis, every thime an operational system perform a transection.

4 - Integrated DWH

At this stage, the DWH's are used to generate the activity or transection that are passed back into the operational system for the use in the dailly activity of the organization.

There are two approachs to stored the data into DWH:

1 - Dimension Approach
2 - Normalized Approach

1 - Dimension Approach

In this approach, the data is stored in de-normlized formate, the tables are divided into two cetegories one is Fact and other is Dimension.

Fact : These types of table generally strored the numeric data.

Dimension : These types of table has the reference of Fact table data. And These tables are used for slicing & dicing of the fact data for analysis.

The advantage of this approach is that data is retrieved faster as normlized approach.
And the disadvantage of this approach is that it is typical to maintain the DWH.

2 - Normalized Approach

In this approach, the data in DWH in 3-normal form like OLTP. So You can easily maintain the data stored in DWH but the disadvantage is that data retrieval is slow than Dimension approach because of mainy relation between tables in case of 3-NF.

Datamart (DM)

Datamart is also a small DWH but it is basically designed for specific Department (like HR, Finance, Payroll etc). A DM is a collection of Subject Areas which is organized for decision support based on the need of a given department.

[Subject Areas : We can break a huge schema into smaller parts for easier management purpose & each part is known as Subject area.]

There are two types of DM:

1 - Dependent DM
2 - Independent DM

1 - Dependent DM

A dependent DM is DM whose source is a DWH. And dependent datamart is more stabled than independent datamart.

2 - Independent DM

An independent datamart is one whose source is the lagacy application environment. And independent datamarts are unstabled

Regards
Manish