Newest Viewed Downloaded

Data Warehouse Models and OLAP Operations Enrico Franconi CS 636

Data Warehouse Models and OLAP Operations Enrico Franconi CS 636

Data Warehouse Architecture

Decision Support

Information technology to help the knowledge worker (executive, manager, analyst) make faster & better decisions “What were the sales volumes by region and product category for the last year?” “How did the share price of comp. manufacturers correlate with quarterly profits over the past 10 years?” “Which orders should we fill to maximize revenues?” On-line analytical processing (OLAP) is an element of decision support systems (DSS)

Three-Tier Decision Support Systems

Warehouse database server Almost always a relational DBMS, rarely flat files OLAP servers Relational OLAP (ROLAP): extended relational DBMS that maps operations on multidimensional data to standard relational operators Multidimensional OLAP (MOLAP): special-purpose server that directly implements multidimensional data and operations Clients Query and reporting tools Analysis tools Data mining tools

The Complete Decision Support System Information Sources Data Warehouse Server (Tier 1) OLAP Servers (Tier 2) Clients (Tier 3) Operational DB’s Semistructured Sources extract transform load refresh etc. Data Marts Data Warehouse e.g., MOLAP e.g., ROLAP serve Analysis Query/Reporting Data Mining serve serve

Data Warehouse vs. Data Marts

Enterprise warehouse: collects all information about subjects (customers,products,sales,assets, personnel) that span the entire organization Requires extensive business modeling (may take years to design and build) Data Marts: Departmental subsets that focus on selected subjects Marketing data mart: customer, product, sales Faster roll out, but complex integration in the long run Virtual warehouse: views over operational dbs Materialize sel. summary views for efficient query processing Easy to build but require excess capability on operat. db servers

Approaches to OLAP Servers

Relational DBMS as Warehouse Servers Two possibilities for OLAP servers (1) Relational OLAP (ROLAP) Relational and specialized relational DBMS to store and manage warehouse data OLAP middleware to support missing pieces (2) Multidimensional OLAP (MOLAP) Array-based storage structures Direct access to array data structures

OLAP Server: Query Engine Requirements

Aggregates (maintenance and querying) Decide what to precompute and when Query language to support multidimensional operations Standard SQL falls short Scalable query processing Data intensive and data selective queries

OLAP for Decision Support

OLAP = Online Analytical Processing Support (almost) ad-hoc querying for business analyst Think in terms of spreadsheets View sales data by geography, time, or product Extend spreadsheet analysis model to work with warehouse data Large data sets Semantically enriched to understand business terms Combine interactive queries with reporting functions Multidimensional view of data is the foundation of OLAP Data model, operations, etc.

Warehouse Models & Operators

Data Models relations stars & snowflakes cubes Operators slice & dice roll-up, drill down pivoting other

Multi-Dimensional Data

Measures - numerical data being tracked Dimensions - business parameters that define a transaction Example: Analyst may want to view sales data (measure) by geography, by time, and by product (dimensions) Dimensional modeling is a technique for structuring data around the business concepts ER models describe “entities” and “relationships” Dimensional models describe “measures” and “dimensions”

The Multi-Dimensional Model

“Sales by product line over the past six months” “Sales by store between 1990 and 1995” Prod Code Time Code Store Code Sales Qty Store Info Product Info Time Info . . . Numerical Measures Key columns joining fact table to dimension tables Fact table for measures Dimension tables

Dimensional Modeling

Dimensions are organized into hierarchies E.g., Time dimension: days  weeks  quarters E.g., Product dimension: product  product line  brand Dimensions have attributes

Dimension Hierarchies

Store Dimension Product Dimension District Region Total Brand Manufacturer Total Stores Products The simple model we will use to demonstrate the various design alternatives is composed of three dimensions. Only two are shown, Store and Product. The third, Time, is composed of the following attribute hierarchy: date -> month -> quarter -> year. The Store dimension has an attribute hierarchy of store -> district -> region. Products is composed of products -> brand -> manufacturer. Based on this simple model, we can see that the granularity of data is products sold in stores by day.

ROLAP: Dimensional Modeling Using Relational DBMS

Special schema design: star, snowflake Special indexes: bitmap, multi-table join Special tuning: maximize query throughput Proven technology (relational model, DBMS), tend to outperform specialized MDDB especially on large data sets Products IBM DB2, Oracle, Sybase IQ, RedBrick, Informix

MOLAP: Dimensional Modeling Using the Multi Dimensional Model

MDDB: a special-purpose data model Facts stored in multi-dimensional arrays Dimensions used to index array Sometimes on top of relational DB Products Pilot, Arbor Essbase, Gentia

Star Schema (in RDBMS)

Star Schema Example

Star Schema with Sample Data

The “Classic” Star Schema

Benefits: Easy to understand, easy to define hierarchies, reduces # of physical joins, low maintenance, very simple metadata Drawbacks: Summary data in the fact table yields poorer performance for summary levels, huge dimension tables a problem The Star is built for simplicity and speed. Forget everything you learned about designing relational databases. The Star Schema makes no excuses for the rules it breaks. The assumption behind it is that the database is static or ìquiet,î meaning that no updates are performed on-line. Remember that most of the rules of relational database design are derived from the need to maintain atomicity, consistency and integrity (the ìACIDî test) in an On-Line Transaction Processing (OLTP) environment. Since the data warehouse is quiet, these constraints can be relaxed.

Showing 1 - 20 of 45 items Details

Name: 
CS636-olap
Author: 
Joachim Hammer
Company: 
University of Florida
Description: 
Data Warehouse Models and OLAP Operations Enrico Franconi CS 636
Tags: 
tabl | data | dimens | store | product | fact | region | queri
Created: 
8/31/1998 8:56:49 PM
Slides: 
45
Views: 
41
Downloads: 
6
Rating: 
0


> Comment



Share this presentation
|

Comments

Share this presentation:

|
Sitemap