Coming soon.........
DWH & MS BI
Search This Blog
Monday, February 28, 2011
Wednesday, February 23, 2011
DWH Concept
ETL (Extract Transformation Load)
ETL is a process in datawarehousing that involve :
- Extraction data from outsiding sources
- Tranforming it to fit business need
- Loading it into end target (i.e Datawarehouse)
E (Extract) - The first part of ETL is to extract the data from the source systems.
T (Transform) - The transform stage applies a series of rules to the extracted data from the source to derived the data to be loaded to the end target.
L (Load) - The load phase load the data into end target.
OLAP (On Line Analytical Processing)
OLAP is a database technology which are specially designed to deal with the high volumn of data. OLAP provide you a very good view that what is happening but it does not provide that why is happening and what will be happened in the future.
DataMining
Datamining is a technology which perform a process of analyzing the data from different prospective & summurized is into a useful information that information can be used to increase the revenue, cut the cost etc.
Datamining provide you that why is happening and what will be happened in the future.
Data Analysis
Data Analysis is an approach in which row data is ordered and organized in such a way that useful information can be extracted from it.
Data Analysis Phases
1 - Data Cleansing
2 - Transforming
3 - Data Modeling
1 - Data Cleansing
Data cleansing is a process of detecting & removing the error from a record set, table and database. This process is also known as Data Scrubbing. During data cleansing records are checked & either corrected or deleted as per as requirment.
2 - Transforming
In the Transforming phase, the source data is transformed as per as DWH tables.
3 - Data Modeling
Data Modeling is a process in a Data Model is build of data structure (Tables) as per as requirement of the Database.A Data Model represent the nature of data, business rules govering the data & how it will be organized in the DB.
A Data Model is comprised of two parts :
- Logical Design
- Physical Design
Data model helps functional & technical team in designing the DB.
The Functional Team (that create the logical design) refers to one or more Business Analyst, Business Managers, End Users. And Technical Team (that create the physical design) refers to the one or more programmers & DBAs.
Data Modelers are responsible for designing the Data Model & they communicate with the functional team to get the business requirments & technical team to implement the DB.
EX :
------
Suppose a company (XYZ) is planning to build a Guest House (DB) & it call the Building Architect (Data Modeler) & Building Architect (Data Modeler) take the Guest House (DB) requirment (Business requirmnet) from the function team (Business Analyst, Business Managers) and develop a plan (Data Model) & give it to the Company (XYZ). And finally XYZ company call the civil engineer (Programmer, DBA) to construct the Guest House (DB). This concept is called Data Modelling.
Data Integration
Data integration is a process of combining the data from different sources & providing users with a unified view of these data.
Data Agreegation
Data Agreegation is a process in which the information is gathered & expressed in a summary form. The purpose of agreegation is to get more information about the particular group.
Business Intelligence (BI)
Business Intelligence (BI) is a technology which is used to gethering, storing & analysing the data to make a better decision.
BI solution help a company in making the business decision faster, accurate & market oriented.
Technical DW-MSBI work flow
[Sources] -> [ETL(Extract tha data using SSIS)] -> [Data Cleansing & Tranfoming of data] -> [DWH(stored the data in DWH (Fact & Dimesion))] -> [Create the OLAP cube(using SSAS)] -> [Analyse the data & create the Reports] -> [Implement Data Mining structure(Data Mining)] -> BI (make a better decision)
Regards
Manish
ETL is a process in datawarehousing that involve :
- Extraction data from outsiding sources
- Tranforming it to fit business need
- Loading it into end target (i.e Datawarehouse)
E (Extract) - The first part of ETL is to extract the data from the source systems.
T (Transform) - The transform stage applies a series of rules to the extracted data from the source to derived the data to be loaded to the end target.
L (Load) - The load phase load the data into end target.
OLAP (On Line Analytical Processing)
OLAP is a database technology which are specially designed to deal with the high volumn of data. OLAP provide you a very good view that what is happening but it does not provide that why is happening and what will be happened in the future.
DataMining
Datamining is a technology which perform a process of analyzing the data from different prospective & summurized is into a useful information that information can be used to increase the revenue, cut the cost etc.
Datamining provide you that why is happening and what will be happened in the future.
Data Analysis
Data Analysis is an approach in which row data is ordered and organized in such a way that useful information can be extracted from it.
Data Analysis Phases
1 - Data Cleansing
2 - Transforming
3 - Data Modeling
1 - Data Cleansing
Data cleansing is a process of detecting & removing the error from a record set, table and database. This process is also known as Data Scrubbing. During data cleansing records are checked & either corrected or deleted as per as requirment.
2 - Transforming
In the Transforming phase, the source data is transformed as per as DWH tables.
3 - Data Modeling
Data Modeling is a process in a Data Model is build of data structure (Tables) as per as requirement of the Database.A Data Model represent the nature of data, business rules govering the data & how it will be organized in the DB.
A Data Model is comprised of two parts :
- Logical Design
- Physical Design
Data model helps functional & technical team in designing the DB.
The Functional Team (that create the logical design) refers to one or more Business Analyst, Business Managers, End Users. And Technical Team (that create the physical design) refers to the one or more programmers & DBAs.
Data Modelers are responsible for designing the Data Model & they communicate with the functional team to get the business requirments & technical team to implement the DB.
EX :
------
Suppose a company (XYZ) is planning to build a Guest House (DB) & it call the Building Architect (Data Modeler) & Building Architect (Data Modeler) take the Guest House (DB) requirment (Business requirmnet) from the function team (Business Analyst, Business Managers) and develop a plan (Data Model) & give it to the Company (XYZ). And finally XYZ company call the civil engineer (Programmer, DBA) to construct the Guest House (DB). This concept is called Data Modelling.
Data Integration
Data integration is a process of combining the data from different sources & providing users with a unified view of these data.
Data Agreegation
Data Agreegation is a process in which the information is gathered & expressed in a summary form. The purpose of agreegation is to get more information about the particular group.
Business Intelligence (BI)
Business Intelligence (BI) is a technology which is used to gethering, storing & analysing the data to make a better decision.
BI solution help a company in making the business decision faster, accurate & market oriented.
Technical DW-MSBI work flow
[Sources] -> [ETL(Extract tha data using SSIS)] -> [Data Cleansing & Tranfoming of data] -> [DWH(stored the data in DWH (Fact & Dimesion))] -> [Create the OLAP cube(using SSAS)] -> [Analyse the data & create the Reports] -> [Implement Data Mining structure(Data Mining)] -> BI (make a better decision)
Regards
Manish
Sunday, February 20, 2011
What is Data Warehouse(DWH) & DataMart(DM)?
Data warehouse (DWH)
Data warehouse is a repositary or central aggregation of data where data is stored from different sources (like Excel, TextFiles, SQL Sever, MySQL, Oracle and lagacy applications etc).
Basically DWH are specially designed for reporting and analytics purpose.
Fundamental stages of DWH
There are four fundamental stages of DWH:
1 - Offline Operational DB
2 - Offline DWH
3 - Real Time DWH
4 - Integrated DWH
1 - Offline Operational DB
In this initially stage, DWH's are developed by simply copying of the DB of an operation system to an off line server where the processing load of reporting does not impact on the operational system performance.
2 - Offline DWH
In this stage, the DWH's are updated on a regular time cycle (Dailly, Weekly, Monthly) from the operational system & the data is stored in an integrated reporing oriented data structure.
3 - Real Time DWH
At this stage, the DWH's are upadted on a transection or event basis, every thime an operational system perform a transection.
4 - Integrated DWH
At this stage, the DWH's are used to generate the activity or transection that are passed back into the operational system for the use in the dailly activity of the organization.
There are two approachs to stored the data into DWH:
1 - Dimension Approach
2 - Normalized Approach
1 - Dimension Approach
In this approach, the data is stored in de-normlized formate, the tables are divided into two cetegories one is Fact and other is Dimension.
Fact : These types of table generally strored the numeric data.
Dimension : These types of table has the reference of Fact table data. And These tables are used for slicing & dicing of the fact data for analysis.
The advantage of this approach is that data is retrieved faster as normlized approach.
And the disadvantage of this approach is that it is typical to maintain the DWH.
2 - Normalized Approach
In this approach, the data in DWH in 3-normal form like OLTP. So You can easily maintain the data stored in DWH but the disadvantage is that data retrieval is slow than Dimension approach because of mainy relation between tables in case of 3-NF.
Datamart (DM)
Datamart is also a small DWH but it is basically designed for specific Department (like HR, Finance, Payroll etc). A DM is a collection of Subject Areas which is organized for decision support based on the need of a given department.
[Subject Areas : We can break a huge schema into smaller parts for easier management purpose & each part is known as Subject area.]
There are two types of DM:
1 - Dependent DM
2 - Independent DM
1 - Dependent DM
A dependent DM is DM whose source is a DWH. And dependent datamart is more stabled than independent datamart.
2 - Independent DM
An independent datamart is one whose source is the lagacy application environment. And independent datamarts are unstabled
Regards
Manish
Data warehouse is a repositary or central aggregation of data where data is stored from different sources (like Excel, TextFiles, SQL Sever, MySQL, Oracle and lagacy applications etc).
Basically DWH are specially designed for reporting and analytics purpose.
Fundamental stages of DWH
There are four fundamental stages of DWH:
1 - Offline Operational DB
2 - Offline DWH
3 - Real Time DWH
4 - Integrated DWH
1 - Offline Operational DB
In this initially stage, DWH's are developed by simply copying of the DB of an operation system to an off line server where the processing load of reporting does not impact on the operational system performance.
2 - Offline DWH
In this stage, the DWH's are updated on a regular time cycle (Dailly, Weekly, Monthly) from the operational system & the data is stored in an integrated reporing oriented data structure.
3 - Real Time DWH
At this stage, the DWH's are upadted on a transection or event basis, every thime an operational system perform a transection.
4 - Integrated DWH
At this stage, the DWH's are used to generate the activity or transection that are passed back into the operational system for the use in the dailly activity of the organization.
There are two approachs to stored the data into DWH:
1 - Dimension Approach
2 - Normalized Approach
1 - Dimension Approach
In this approach, the data is stored in de-normlized formate, the tables are divided into two cetegories one is Fact and other is Dimension.
Fact : These types of table generally strored the numeric data.
Dimension : These types of table has the reference of Fact table data. And These tables are used for slicing & dicing of the fact data for analysis.
The advantage of this approach is that data is retrieved faster as normlized approach.
And the disadvantage of this approach is that it is typical to maintain the DWH.
2 - Normalized Approach
In this approach, the data in DWH in 3-normal form like OLTP. So You can easily maintain the data stored in DWH but the disadvantage is that data retrieval is slow than Dimension approach because of mainy relation between tables in case of 3-NF.
Datamart (DM)
Datamart is also a small DWH but it is basically designed for specific Department (like HR, Finance, Payroll etc). A DM is a collection of Subject Areas which is organized for decision support based on the need of a given department.
[Subject Areas : We can break a huge schema into smaller parts for easier management purpose & each part is known as Subject area.]
There are two types of DM:
1 - Dependent DM
2 - Independent DM
1 - Dependent DM
A dependent DM is DM whose source is a DWH. And dependent datamart is more stabled than independent datamart.
2 - Independent DM
An independent datamart is one whose source is the lagacy application environment. And independent datamarts are unstabled
Regards
Manish
Subscribe to:
Posts (Atom)