Data Warehouse Modelsand OLAP Operations Enrico Franconi
CS 636
Data Warehouse Modelsand OLAP Operations Enrico Franconi
CS 636
Data Warehouse Architecture
Decision Support
Information technology to help the knowledge worker (executive, manager, analyst) make faster & better decisions
“What were the sales volumes by region and product category for the last year?”
“How did the share price of comp. manufacturers correlate with quarterly profits over the past 10 years?”
“Which orders should we fill to maximize revenues?”
On-line analytical processing (OLAP) is an element of decision support systems (DSS)
Three-Tier Decision Support Systems
Warehouse database server
Almost always a relational DBMS, rarely flat files
OLAP servers
Relational OLAP (ROLAP): extended relational DBMS that maps operations on multidimensional data to standard relational operators
Multidimensional OLAP (MOLAP): special-purpose server that directly implements multidimensional data and operations
Clients
Query and reporting tools
Analysis tools
Data mining tools
The Complete Decision Support System Information Sources Data Warehouse
Server
(Tier 1) OLAP Servers
(Tier 2) Clients
(Tier 3) Operational
DB’s Semistructured
Sources extract
transform
load
refresh
etc. Data Marts Data
Warehouse e.g., MOLAP e.g., ROLAP serve Analysis Query/Reporting Data Mining serve serve
Data Warehouse vs. Data Marts
Enterprise warehouse: collects all information about subjects (customers,products,sales,assets, personnel) that span the entire organization
Requires extensive business modeling (may take years to design and build)
Data Marts: Departmental subsets that focus on selected subjects
Marketing data mart: customer, product, sales
Faster roll out, but complex integration in the long run
Virtual warehouse: views over operational dbs
Materialize sel. summary views for efficient query processing
Easy to build but require excess capability on operat. db servers
Approaches to OLAP Servers
Relational DBMS as Warehouse Servers
Two possibilities for OLAP servers
(1) Relational OLAP (ROLAP)
Relational and specialized relational DBMS to store and manage warehouse data
OLAP middleware to support missing pieces
(2) Multidimensional OLAP (MOLAP)
Array-based storage structures
Direct access to array data structures
OLAP Server: Query Engine Requirements
Aggregates (maintenance and querying)
Decide what to precompute and when
Query language to support multidimensional operations
Standard SQL falls short
Scalable query processing
Data intensive and data selective queries
OLAP for Decision Support
OLAP = Online Analytical Processing
Support (almost) ad-hoc querying for business analyst
Think in terms of spreadsheets
View sales data by geography, time, or product
Extend spreadsheet analysis model to work with warehouse data
Large data sets
Semantically enriched to understand business terms
Combine interactive queries with reporting functions
Multidimensional view of data is the foundation of OLAP
Data model, operations, etc.
Warehouse Models & Operators
Data Models
relations
stars & snowflakes
cubes
Operators
slice & dice
roll-up, drill down
pivoting
other
Multi-Dimensional Data
Measures - numerical data being tracked
Dimensions - business parameters that define a transaction
Example: Analyst may want to view sales data (measure) by geography, by time, and by product (dimensions)
Dimensional modeling is a technique for structuring data around the business concepts
ER models describe “entities” and “relationships”
Dimensional models describe “measures” and “dimensions”
The Multi-Dimensional Model
“Sales by product line over the past six months”
“Sales by store between 1990 and 1995” Prod Code Time Code Store Code Sales Qty Store Info Product Info Time Info . . . Numerical Measures Key columns joining fact table
to dimension tables Fact table for
measures Dimension tables
Dimensional Modeling
Dimensions are organized into hierarchies
E.g., Time dimension: days weeks quarters
E.g., Product dimension: product product line brand
Dimensions have attributes
Dimension Hierarchies
Store Dimension Product Dimension District Region Total Brand Manufacturer Total Stores Products
The simple model we will use to demonstrate the various design alternatives is composed of three dimensions. Only two are shown, Store and Product. The third, Time, is composed of the following attribute hierarchy: date -> month -> quarter -> year.
The Store dimension has an attribute hierarchy of store -> district -> region. Products is composed of products -> brand -> manufacturer.
Based on this simple model, we can see that the granularity of data is products sold in stores by day.
ROLAP: Dimensional Modeling Using Relational DBMS
Special schema design: star, snowflake
Special indexes: bitmap, multi-table join
Special tuning: maximize query throughput
Proven technology (relational model, DBMS), tend to outperform specialized MDDB especially on large data sets
Products
IBM DB2, Oracle, Sybase IQ, RedBrick, Informix
MOLAP: Dimensional Modeling Using the Multi Dimensional Model
MDDB: a special-purpose data model
Facts stored in multi-dimensional arrays
Dimensions used to index array
Sometimes on top of relational DB
Products
Pilot, Arbor Essbase, Gentia
Star Schema (in RDBMS)
Star Schema Example
Star Schema with Sample Data
The “Classic” Star Schema
Benefits: Easy to understand, easy to define hierarchies, reduces # of physical joins, low maintenance, very simple metadata
Drawbacks: Summary data in the fact table yields poorer performance for summary levels, huge dimension tables a problem
The Star is built for simplicity and speed. Forget everything you learned about designing relational databases. The Star Schema makes no excuses for the rules it breaks. The assumption behind it is that the database is static or ìquiet,î meaning that no updates are performed on-line.
Remember that most of the rules of relational database design are derived from the need to maintain atomicity, consistency and integrity (the ìACIDî test) in an On-Line Transaction Processing (OLTP) environment. Since the data warehouse is quiet, these constraints can be relaxed.
Comments