ORACLE BUSINESS INTELLIGENCE WORLD

Oracle Business Intelligence and Data Warehousing Practice Notes and Knowledge Repository

  • About Me

    Hello Friends,

    This is Santosh Kumar Gidadmani, a Business Intelligence and Data Warehouse Enthusiast passionate about blogging articles in the BI, Data warehousing, space. This is my attempt to share my experience and knowledge on Oracle BI & Data Warehousing Subjects.

  • OBIEE 11g Certified Implementation Specialist

  • Oracle Partner Network Certified Specialist

  • Visit My Profiles





  • Enter your email address to subscribe to this blog and receive notifications of new posts by email.

    Join 73 other followers

  • Blog Hits Since Nov 2010

    • 357,310 Hits
  • Live Traffic

  • Visitors Country

    Since-Mar'12Free counters!

Archive for the ‘Data Warehousing’ Category

Data Warehousing – Definitions

Posted by Santosh Kumar Gidadmani on January 31, 2011

Thought to put some basics of Data Warehousing Concepts before I start with more focused subjects. I did lot of research in gathering information on these topics. Let us see some important topics. I would cover all these topics in series, so check out all series to gain some basics of data warehousing, data warehouse, architecture, life cycle etc. These topics would be helpful for any data warehouse beginners.

What is Data Warehousing?

“A process of transforming data into information and making it available to users in a timely enough manner to make a difference” – Forrester Research, April 1996.

What is a Data Warehouse?

1. “A single, complete and consistent store of data obtained from a variety of different sources made available to end users in a what they can understand and use in a business context” – Barry Devlin.

2. “A data warehouse is a subject-oriented, integrated, time-varying, non-volatile. It is collection of data that is used primarily in organizational decision making” – Bill Inmon, Building the Data Warehouse 1996.

3. A data warehouse is a relational database that is designed for query and analysis rather than for transaction processing. It usually contains historical data derived from transaction data, but it can include data from other sources. It separates analysis workload from transaction workload and enables an organization to consolidate data from several sources – Oracle.

A fundamental concept of a data warehouse is the distinction between data and information. Data is composed of observable and recordable facts that are often found in operational or transactional systems. The Information is an integrated collection of facts and is used as the basis for decisionmaking.

What are the characteristics of Data Warehouse?

Subject-Oriented: Information is presented according to specific subjects or areas of interest, not simply as computer files.

Integrated: Integration is closely related to subject orientation. Data warehouses must put data from disparate sources into a consistent format. That is, if two different source systems store conflicting data about entities, or attributes of an entity, the differences need to be resolved during the process of transforming the source data and loading it into the data warehouse.

Non-Volatile: Stable information that doesn’t change each time an operational process is executed. Information is consistent regardless of when the warehouse is accessed.

Time-Variant: Containing a history of the subject, as well as current information. Historical information is an important component of a data warehouse.

Accessible: The primary purpose of a data warehouse is to provide readily accessible information to end-users.

Process-Oriented: It is important to view data warehousing as a process for delivery of information. The maintenance of a data warehouse is ongoing and iterative in nature.

What is the Advantages of Data warehouse?

Enhances end-user access to a wide variety of data.

Business decision makers can obtain various kinds of trend reports.

Increased data consistency.

Potentially lower computing costs and increased productivity.

Providing a place to combine related data from separate sources.

Creation of a computing infrastructure that can support changes in computer systems and business structures.

Empowering end-users to perform any level of ad-hoc queries or reports without impacting the performance of the operational systems.

Source: http://www.dedupe.com

What are the historical developments in the areas data warehousing?

1960s — General Mills and Dartmouth College, in a joint research project, develop the terms dimensions and facts.

1970s — ACNielsen and IRI provide dimensional data marts for retail sales.

1970s — Bill Inmon begins to define and discuss the term: Data Warehouse

1983 — Teradata introduces a database management system specifically designed for decision support.

1988 — Barry Devlin and Paul Murphy publish the article An architecture for a business and information systems in IBM Systems Journal where they introduce the term "business data warehouse".

1990 — Daniel Linstedt begins work on Developing the Data Vault model and methodology for data warehouses

1990 — Red Brick Systems introduces Red Brick Warehouse, a database management system specifically for data warehousing.

1991 — Prism Solutions introduces Prism Warehouse Manager, software for developing a data warehouse.

1991 — Bill Inmon publishes the book Building the Data Warehouse.

1995 — The Data Warehousing Institute, a for-profit organization that promotes data warehousing, is founded.

1995 — Daniel Linstedt adds SEI/CMMI and Six Sigma to the Data Vault Methodology to manage projects in data warehousing.

1996 — Ralph Kimball publishes the book The Data Warehouse Toolkit.

2000 — Daniel Linstedt releases the Data Vault, enabling real time auditable Data Warehouses

Source: Wiki

– Santosh

Advertisements

Posted in Data Warehousing, DW Definitions | Leave a Comment »

Data Warehouse Approaches

Posted by Santosh Kumar Gidadmani on January 9, 2011

There are two major approaches to data warehouse design.

1. Bottom-up approach

• This approach is recommended by Kimball.

• In the bottom-up approach data marts are first created to provide reporting and analytical capabilities for specific business processes.

• Data marts contain, primarily, dimensions and facts. Facts can contain either atomic data and, if necessary, summarized data. The single data mart often models a specific business area such as Sales or Production.

• These data marts can eventually be integrated to create a comprehensive data warehouse.

• The integration of the data marts in the data warehouse is centered on the conformed dimensions.

• The actual integration of two or more data marts is then done by a process known as "Drill across". A drill-across works by grouping (summarizing) the data along the keys of the (shared) conformed dimensions of each fact participating in the "drill across" followed by a join on the keys of these grouped (summarized) facts.

• Some consider it an advantage of the Kimball method, that the data warehouse ends up being "segmented" into a number of logically self contained and consistent data marts, rather than a big and often complex centralized model.

• Business value can be returned as quickly as the first data mart is built.

2. Top-down approach

• This approach is recommended by Bill Inmon.

• Inmon is one of the leading proponents of the top-down approach to data warehouse design, in which the data warehouse is designed using a normalized enterprise data model.

• In the Inmon vision the data warehouse is at the center of the "Corporate Information Factory" (CIF), which provides a logical framework for delivering business intelligence (BI) and business management capabilities.

• The top-down design methodology generates highly consistent dimensional views of data across data marts since all data marts are loaded from the centralized repository.

• Generating new dimensional data marts against the data stored in the data warehouse is a relatively simple task.

• The main disadvantage to the top-down methodology is that it represents a very large project with a very broad scope, cost and time.

• In addition, the top-down methodology can be inflexible and unresponsive to changing departmental needs during the implementation phases.

– Santosh

Posted in Data Warehousing, DW Aproaches | Leave a Comment »

Data Warehouse – OLTP Vs OLAP

Posted by Santosh Kumar Gidadmani on January 3, 2011

Find out the complete differences between OLTP and OLAP.

Area OLTP SystemOnline Transaction Processing (Operational System) OLAP SystemOnline Analytical Processing (Data Warehouse)
Time Scale Stores current data Stores historical data for analysis
Indexing Few Indexes More Indexes
Normalization Fully Normalized Partially normalized
Stored Values Stores typically coded data Stores descriptive data
Source of data Scattered among different databases or DBMS and using different value coding schemes Centralized in data warehouse. Or in a collection of subject oriented data marts
Purpose of data To control and run fundamental business tasks To help with planning, problem solving, and decision support
What the data Reveals A snapshot of ongoing business processes Multi-dimensional views of various kinds of business activities
Inserts and Updates Short and fast inserts and updates initiated by end users Periodic long-running batch jobs refresh the data
Processing Speed Typically very fast Depends on the amount of data involved; batch data refreshes and complex queries may take many hours; query speed can be improved by creating indexes
Space Requirements Can be relatively small if historical data isarchived Larger due to the existence of aggregation structures and history data; requires more indexes than OLTP
Modeling Uses Entity relational model Use Dimensional Model (Start and snowflake schema)

Sources: http://www.rainmakerworks.com, http://www.it.toolbox.com

Difference between OLTP and OLAP in Kimball’s words.

OLTP
The users of an operational system turn the wheels of the organization. They take orders, sign up new customers, and log complaints. Users of an operational system almost always deal with one record at a time. They repeatedly perform the same operational tasks over and over.

OLAP
The users of a data warehouse, on the other hand, watch the wheels of the organization turn. They count the new orders and compare them with last week’s orders and ask why the new customers signed up and what the customers complained about. Users of a data warehouse almost never deal with one row at a time. Rather, their questions often require that hundreds or thousands of rows be searched and compressed into an answer set. To further complicate matters, users of a data warehouse continuously change the kinds of questions they ask.

Add your thoughts if any other points are missing.

– Santosh

Posted in Data Warehousing, DW OLTP Vs OLAP | Leave a Comment »

 
%d bloggers like this: