ORACLE BUSINESS INTELLIGENCE WORLD

Oracle Business Intelligence and Data Warehousing Practice Notes and Knowledge Repository

  • About Me

    Hello Friends,

    This is Santosh Kumar Gidadmani, a Business Intelligence and Data Warehouse enthusiast who is passionate about blogging in the BI and data warehousing space. This is my attempt to share my experience and knowledge of Oracle BI and data warehousing subjects.

  • OBIEE 11g Certified Implementation Specialist

  • Oracle Partner Network Certified Specialist

Archive for the ‘Identify Dimensions’ Category

Fast Changing Dimensions

Posted by Santosh Kumar Gidadmani on April 12, 2011

A fast changing dimension is a dimension in which one or more attributes change frequently and across many rows. Such a dimension can grow very large if we use the Type-2 approach to track its numerous changes. These dimensions are sometimes called rapidly changing dimensions.
Examples of fast changing dimension attributes are
Age
Income
Test score
Rating
Credit history score
Customer account status
Weight

How to solve fast changing dimensions
After identifying the fast changing attributes of a dimension, create a mini-dimension table from these attributes and join it directly to the fact table rather than snowflaking it off the main dimension.

A mini-dimension is a dimension that usually contains fast changing attributes of a larger dimension table. This is to improve accessibility to data in the fact table.

Before creating a mini-dimension table, you have to convert each of the identified attributes into band ranges. The idea behind this method is to reduce each attribute to a limited set of discrete values, as shown below.
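
As a rough illustration of banding, here is a minimal Python sketch. The income boundaries and labels are hypothetical, chosen only for the example; real band ranges depend on how the business wants to analyze the attribute.

```python
from bisect import bisect_right

# Hypothetical income bands: each boundary is the lower edge of the next band.
INCOME_BOUNDS = [20_000, 50_000, 100_000]
INCOME_LABELS = ["< 20K", "20K - 49,999", "50K - 99,999", "100K+"]

def to_band(value, bounds, labels):
    """Map a continuous value to one of a few discrete band labels."""
    return labels[bisect_right(bounds, value)]

income_band = to_band(62_500, INCOME_BOUNDS, INCOME_LABELS)
```

A customer's income can now change many times without leaving its band, so the banded attribute changes far less often than the raw value.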

A mini-dimension has fewer rows than the large dimension table because the band range method restricts its rows to the possible combinations of band values.

After identifying the fast changing attributes of the primary customer dimension, and determining the band ranges for these attributes, a new mini-dimension is formed called Cust_Mini_Dim as shown below.
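
A rough sketch of how such a mini-dimension might be populated, in Python. The three attributes and their band values are hypothetical, and so are the column names in the fact row; only the name Cust_Mini_Dim comes from the post.

```python
from itertools import product

# Hypothetical band values for three fast changing customer attributes.
AGE_BANDS = ["18-25", "26-40", "41-60", "61+"]
INCOME_BANDS = ["< 20K", "20K - 49,999", "50K - 99,999", "100K+"]
CREDIT_BANDS = ["Low", "Medium", "High"]

def build_mini_dim():
    """One row per combination of bands, each with its own surrogate key."""
    rows = {}
    combos = product(AGE_BANDS, INCOME_BANDS, CREDIT_BANDS)
    for key, combo in enumerate(combos, start=1):
        rows[combo] = key
    return rows

cust_mini_dim = build_mini_dim()

# A fact row carries both the customer key and the mini-dimension key,
# so Cust_Mini_Dim joins to the fact table directly (no snowflaking).
fact_row = {
    "customer_key": 1001,
    "cust_mini_dim_key": cust_mini_dim[("26-40", "50K - 99,999", "High")],
    "sales_amount": 250.0,
}
```

With these bands the mini-dimension holds only 4 x 4 x 3 = 48 rows, no matter how many customers there are or how often their attributes change.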

– Santosh

Posted in Dimensional Modeling, Identify Dimensions

Slowly Changing Dimensions

Posted by Santosh Kumar Gidadmani on April 10, 2011

Designing a dimensional model that accurately and efficiently handles changes is a critical consideration when building a data warehouse.
Handling changes to dimensional data across time can be difficult, and dimensional attributes rarely remain static. A customer address or name can change, sales representatives come and go, and companies introduce new products to replace older ones.

A slowly changing dimension is a dimension whose attribute or attributes for a record (row) change slowly over time.

Types of SCDs
1. Type-1 (overwriting the history)
2. Type-2 (preserving one or more versions of history)
3. Type-3 (preserving one or more columns of history)
4. Type-4 (preserving one or more tables of history)

Type-1 (overwriting the history)
The Type 1 methodology overwrites old data with new data, and therefore does not track historical data at all.
This may be the best approach to use if the attribute change is simple, such as a correction in spelling. And, if the old value was wrong, it may not be critical that history is not maintained.
Advantages:
• It is the easiest and simplest to implement.
• It is extremely effective in those situations requiring the correction of bad data.
• No change is needed to the structure of the dimension table.
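
A minimal Python sketch of a Type-1 change, using a hypothetical customer dimension held in a dictionary keyed by natural key:

```python
# Hypothetical customer dimension keyed by natural key.
customer_dim = {"C100": {"name": "Jhon Smith", "city": "Austin"}}

def apply_type1(dim, natural_key, changes):
    """Overwrite attributes in place; the old values are lost."""
    dim[natural_key].update(changes)

# Correct a spelling mistake: no new row, no new column.
apply_type1(customer_dim, "C100", {"name": "John Smith"})
```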

Type-2 (preserving the history)
A Type-2 approach adds a new dimension row for the changed attribute, and therefore preserves history. This approach accurately partitions history across time more efficiently than other change handling approaches. However, because the Type-2 approach adds new records for every attribute change, it can significantly increase the database’s size.
Advantages:
• Enables tracking of all historical information accurately and for an infinite number of changes.
Disadvantages:
• Causes the size of the dimension table to grow fast.
• Complicates the ETL process needed to load the dimensional model.

There are three methods for handling a Type-2 SCD.

Method A – No surrogate key – Use timestamp

When a change happens, a new record is added to the table. All the attributes are copied from the previous record except the changed values. The natural key is copied as well, so the timestamp is used to differentiate the records.

When a fact table is joined with the dimension and you are interested in the historical data, the timestamp is used as part of the join condition. To ease the join, the records typically use two date columns: the effective start date and the effective end date.
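
The date-range join can be sketched in Python as follows. The column names, the sample rows, and the use of 9999-12-31 as the end date of the current row are assumptions made for this example.

```python
from datetime import date

HIGH_DATE = date(9999, 12, 31)  # assumed marker for "still current"

# Two versions of the same natural key, told apart by their date ranges.
customer_dim = [
    {"customer_id": "C100", "city": "Austin",
     "start": date(2010, 1, 1), "end": date(2011, 3, 31)},
    {"customer_id": "C100", "city": "Dallas",
     "start": date(2011, 4, 1), "end": HIGH_DATE},
]

def lookup_as_of(dim, customer_id, as_of):
    """Return the dimension row that was effective on the given date."""
    for row in dim:
        if row["customer_id"] == customer_id and row["start"] <= as_of <= row["end"]:
            return row
    return None

# The fact's own date takes part in the join condition.
fact = {"customer_id": "C100", "order_date": date(2010, 6, 15), "amount": 80.0}
matched = lookup_as_of(customer_dim, fact["customer_id"], fact["order_date"])
```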

Method B – No surrogate key – Use version number

Instead of using the date column, a version number is used to differentiate the different versions of the records.

This technique requires the fact table to store both the natural key and the version number in order to retrieve a given version of the dimension data.
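
A small Python sketch of the version-number method, with hypothetical column names. The dimension is keyed by the pair (natural key, version), and the fact row stores both parts.

```python
# Dimension rows keyed by (natural key, version number).
customer_dim = {
    ("C100", 1): {"city": "Austin"},
    ("C100", 2): {"city": "Dallas"},
}

def add_version(dim, customer_id, changes):
    """Copy the latest version, apply the changes, store it as the next version."""
    latest = max(v for (cid, v) in dim if cid == customer_id)
    dim[(customer_id, latest + 1)] = {**dim[(customer_id, latest)], **changes}
    return latest + 1

# The fact row stores both parts of the key.
fact = {"customer_id": "C100", "customer_version": 1, "amount": 80.0}
city_at_sale = customer_dim[(fact["customer_id"], fact["customer_version"])]["city"]

new_version = add_version(customer_dim, "C100", {"city": "Houston"})
```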

Method C – Use a surrogate key

When an attribute changes, a new row is added with a sequence-generated key, and the fact table uses this key column as its foreign key.
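
A minimal Python sketch of the surrogate key method. Here `itertools.count` stands in for a database sequence, and the table and column names are hypothetical.

```python
from itertools import count

next_key = count(start=1)  # stands in for a database sequence

customer_dim = {}  # surrogate key -> dimension row

def insert_version(dim, natural_key, attrs):
    """Insert a new dimension row under a freshly generated surrogate key."""
    key = next(next_key)
    dim[key] = {"customer_id": natural_key, **attrs}
    return key

k1 = insert_version(customer_dim, "C100", {"city": "Austin"})
fact_before = {"customer_key": k1, "amount": 80.0}

# The city changes: a new row gets a new key; earlier facts still point at k1.
k2 = insert_version(customer_dim, "C100", {"city": "Dallas"})
fact_after = {"customer_key": k2, "amount": 45.0}
```

Because each fact row points at exactly one dimension row, the join needs no date or version condition.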

Type-3 (preserving one or more columns of history)
The type-3 approach is typically used only if there is a limited need to preserve and accurately describe history. The Type 3 method tracks changes using separate columns.
Advantages:
• Does not grow the table the way the Type-2 approach does, since the new information is updated in the existing row.
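
A minimal Python sketch of a Type-3 change, assuming a single hypothetical "previous" column next to the current one:

```python
# One row per customer, with a separate column for the prior value.
customer_dim = {"C100": {"current_city": "Austin", "previous_city": None}}

def apply_type3(dim, natural_key, new_city):
    """Shift the current value into the 'previous' column, then overwrite it."""
    row = dim[natural_key]
    row["previous_city"] = row["current_city"]
    row["current_city"] = new_city

apply_type3(customer_dim, "C100", "Dallas")
apply_type3(customer_dim, "C100", "Houston")  # "Austin" is no longer recoverable
```

With one history column only the most recent prior value survives, which is why this approach suits a limited need for history.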

Type-4 (preserving one or more tables of history)
The Type-4 method is usually referred to as using “history tables”, where one table keeps the current data, and an additional table is used to keep a record of some or all changes.
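
A minimal Python sketch of the history table idea. The table names and the `replaced_on` column are hypothetical.

```python
from datetime import date

customer_current = {"C100": {"city": "Austin"}}  # current data only
customer_history = []                            # one row per superseded version

def apply_type4(current, history, natural_key, changes, changed_on):
    """Archive the outgoing row in the history table, then update the current table."""
    history.append(
        {"customer_id": natural_key, **current[natural_key], "replaced_on": changed_on}
    )
    current[natural_key].update(changes)

apply_type4(customer_current, customer_history, "C100",
            {"city": "Dallas"}, date(2011, 4, 1))
```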

– Santosh

Posted in Dimensional Modeling, Identify Dimensions

Degenerate Dimensions

Posted by Santosh Kumar Gidadmani on March 18, 2011

What is a degenerate dimension?

A degenerate dimension is a dimension without attributes. It is a transaction-based number which resides in the fact table. There may be more than one degenerate dimension inside a fact table.

According to Ralph Kimball, in a data warehouse, a degenerate dimension is a dimension which is derived from the fact table and doesn't have its own dimension table. Degenerate dimensions are often used when a fact table's grain represents transaction-level data and one wishes to maintain system-specific identifiers such as order numbers, invoice numbers (Bill No) and the like without forcing their inclusion in their own dimension.

A degenerate dimension is data that is dimensional in nature but stored in a fact table.

OLTP transaction numbers, such as bill numbers, courier tracking #, order number, invoice number, application received acknowledgement, and ticket number, usually produce dimensions without any attributes and are represented as degenerate dimensions in the fact table.

It would be correct to say that all the information the Bill Number# represents is stored in the other dimensions. The Bill Number# dimension therefore has no attributes of its own and cannot be made a separate dimension.

The Bill Number# should be placed inside the fact table right after the dimensional foreign keys and right before the numeric facts.

The best way to identify a missing or a badly designed degenerate dimension is to review your dimensional design and look for any dimension table that has equal or nearly the same number of rows as the fact table.

In other words, if for every row that you insert into the fact table you also have to pre-insert a row into some dimension table, then you have missed a degenerate dimension.
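
The row count check can be sketched in Python. The 0.9 threshold for "nearly the same number of rows" and the sample table names and counts are assumptions for this example, not a rule from the post.

```python
def degenerate_candidates(fact_row_count, dim_row_counts, threshold=0.9):
    """Flag dimension tables whose row count is close to the fact table's."""
    return [
        name
        for name, rows in dim_row_counts.items()
        if rows >= threshold * fact_row_count
    ]

# Hypothetical row counts for a retail star schema.
counts = {"customer_dim": 40_000, "product_dim": 5_000, "bill_dim": 980_000}
suspects = degenerate_candidates(1_000_000, counts)
```

Here only `bill_dim` is flagged, which suggests the bill number belongs in the fact table as a degenerate dimension.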

– Santosh

Posted in Dimensional Modeling, Identify Dimensions

 