Current Online Members: 0
Page Hit Count:
+000388518


News Update:
Datawarehouse/Database

Data Warehouse Overview

Data warehouse is a process for building decision support systems and knowledge management environment that supports both day-to-day tactical decision making and long-term business strategies.
Accroding to Bill Inmon (popularly known as father of data warehouse); the definition for DWH is "Subject-oriented, integrated, time variant, non-volatile collection of data in support of management's decision making process."
Subject-Oriented
A data warehouse can be used to analyze a particular subject area. For example, "sales" can be a particular subject.
Integrated
A data warehouse integrates data from multiple data sources. For example, source A and source B may have different ways of identifying a product, but in a data warehouse, there will be only a single way of identifying a product.
Time-Variant
Historical data is kept in a data warehouse. For example, one can retrieve data from 3 months, 6 months, 12 months, or even older data from a data warehouse. This contrasts with a transactions system, where often only the most recent data is kept. For example, a transaction system may hold the most recent address of a customer, where a data warehouse can hold all addresses associated with a customer.
Non-volatile
Once data is in the data warehouse, it will not change. So, historical data in a data warehouse should never be altered.

DW Methodologies

Top-Down

Bottom-Up

Practitioner

Bill Inmon

Ralph Kimball

Emphasize

Data Warehouse

Data Marts

Design

Enterprise based normalized model; marts use a subject orient dimensional model

Dimensional model of data mart, consists star schema

Architect

Multi-tier comprised of staging area and dependent data marts

Staging area and data marts

Data set

DW atomic level data; marts summary data

Contains both atomic and summary data

Star Schema
A star schema can have any number of dimension tables. The crow's feet at the end of the links connecting the tables indicate a many-to-one relationship between the fact table and each dimension table.
The following figure shows a star schema with a single fact table and four dimension tables.
Snowflake Schema
A snowflake schema can have any number of dimensions and each dimension can have any number of levels.
The following figure shows a snowflake schema with two dimensions, each having three levels.
Star vs. Snowflake Snowflake SchemaStar Schema

Which Data warehouse?

Good to use for small data warehouses/data marts

Good for large data warehouses

Normalization(dim table)

3 Normal Form

2 Normal Demoralized Form

Ease of Use

More complex queries and hence less easy to understand

Less complex queries and easy to understand

Ease of maintenance/change

No redundancy and hence more easy to maintain and change

Has redundant data and hence less easy to maintain/change

Query Performance

More foreign keys-and hence more query execution time

Less foreign keys and hence lesser query execution time

Types of Facts / Measures

There are three types of facts:

Additive

Additive facts are facts that can be summed up through all of the dimensions in the fact table.

The purpose of this table is to record the sales amount for each product in each store on a daily basis. Sales_Amount is the fact. In this case, Sales_Amount is an additive fact, because you can sum up this fact along any of the three dimensions present in the fact table -- date, store, and product. For example, the sum of Sales_Amount for all 7 days in a week represent the total sales amount for that week.

Semi-Additive

Semi-additive facts are facts that can be summed up for some of the dimensions in the fact table, but not the others.

Non-Additive

Non-additive facts are facts that cannot be summed up for any of the dimensions present in the fact table.

The purpose of this table is to record the current balance for each account at the end of each day, as well as the profit margin for each account for each day.
Current_Balance is a semi-additive fact, as it makes sense to add them up for all accounts (what's the total current balance for all accounts in the bank?), but it does not make sense to add them up through time (adding up all current balances for a given account for each day of the month does not give us any useful information). Profit_Margin is a non-additive fact, for it does not make sense to add them up for the account level or the day level.

Types of Fact Tables

Cumulative

This type of fact table describes what has happened over a period of time. For example, this fact table may describe the total sales by product by store by day. The facts for this type of fact tables are mostly additive facts. The first example presented here is a cumulative fact table.

Snapshot

This type of fact table describes the state of things in a particular instance of time, and usually includes more semi-additive and non-additive facts. The second example presented here is a snapshot fact table.

Dimension Types

Junk Dimension

Junk dimensions are dimensions that contain miscellaneous data (like flags and indicators) that do not fit in the base dimension table.

Degenerate Dimension

A degenerate dimension is data that is dimensional in nature but stored in a fact table. For example, if you have a dimension that only has Order Number and Order Line Number, you would have a 1:1 relationship with the Fact table. Do you want to have two tables with a billion rows or one table with a billion rows. Therefore, this would be a degenerate dimension and Order Number and Order Line Number would be stored in the Fact table.

Was the information posted in this article helpful?
209 138