Data Warehouse Technical (Functional) ArchitectureData Warehouse Technical (Functional) Architecture
Data Warehouse Technical (Functional) Architecture
Data Warehouse Technical (Functional) Architecture
1
Technical Data Warehouse Framework
Access Engineering Acquisition Delivery Extracted Data Data Marts Re-engd. Data Optl. Datastore Metadata Web Server Client/ Server Metadata Management Storage Workflow and System Management Data Staging Services "Back Room" "Front Room"
2
Metadata
describe the business processes and data that are required to support the operation of a Data Warehouse. are captured and managed in a database called the Repository. The metadata in the repository provide business and technical users with information about the contents of the Warehouse Common business definitions of warehouse and mart data data elements Data source data models Transformation metadata – the rules used to transform source data to Accepted usage conventions Formulae that are used to calculate derived data fields Descriptions of reference tables that are used in translations and transformations Descriptions of reports Business requirements documentation Design documentation
3
Metadata - capabilities
Enable a delivery facility that provides and / or publishes metadata as required to users. The metadata model is designed to allow new objects and object relationships to be created without adversely affecting the repository architecture. Single repository for the entire Warehouse.
4
#5
Data Acquisition Source Data (Internal and External) Extracted Data Extraction process Acquisition Engineering Storage Data Delivery Metadata Access Purpose This component acquires all data, internal and external, based on the target data models within the Storage component
5 Purpose: To select and extract data from source systems Framework Definitions: Extracted Data contains data that represents the scope and content of the target data model Activities: Identify authoritative data source Involve source data stewards and legacy application support Inspect data for completeness and correctness How often and how much data will be extracted? Select where processing will be done Address impact analysis
Data Acquisition
Definitions Internal Data is an organization’s data from operational and other systems that fuels the data warehouse External Data is integrated with an organization’s data in the warehouse to enhance the value of warehouse data. Examples include: customer demographic data, financial data, etc. Extracted Data represents all data from internal and external sources required to meet the content requirements of the warehouse data model Acquisition Engineering Storage Data Delivery Access
6
Data Acquisition
Functions/Considerations for Architecture Source Data: Identify authoritative source for all required data Identify and obtain commitment of data steward/owner Location of extraction Impact of extract on transaction systems - frequency/impact tradeoffs Data volume and projected (annual) growth Completeness Correctness Consistency Timeliness Platform/DBMS Availability Acquisition Engineering Storage Data Delivery Access
7
Data Acquisition
Functions (cont) Extraction process: Source data format Net change data only? Frequency Volumes Data movement Timing QA/Audit rules and process Change Control Acquisition Engineering Storage Data Delivery Access
8
#9
Data Engineering Warehouse Data Model Staged transformed data Extracted Data Transformation Transformed Data Metadata Acquisition Engineering Storage Data Delivery Access Source Data Model(s) Purpose Applies the business meaning and and rules to the incoming (Extracted) data and performs all necessary transformations for compliance to those. Re-engineered data is the transformed data per the business and integrity requirements of the warehouse data models.
9 Purpose: To design consistent data structures for the data warehouse, map, and transform source data to conform to the design. Framework Definitions: Re-engineered Data is the cleansed, standardized data to meet the integrity requirements of the data warehouse Activities: Model the database based on real business needs Design for query performance, understanding Use business terms Determine security "Scrub" data Define business rules and transformation logic Test and verify
#10
Data Engineering Acquisition Engineering Storage Data Delivery Access Definitions Re-engineered Data is the cleansed, standardized data to satisfy the business rules and integrity requirements of the data warehouse Metadata stores definitions and rules about data and other objects for on-going administration and operation of the data warehouse environment
10
Data Engineering
Functions Integration – assigning surrogate keys Managing Slowly Changing Dimensions Referental Integrity Checking Denormalization/renormalization Data type conversion Cleaning Standardization Derivation Aggregation(s) Audit Data Lineage audit NULL value management Acquisition Engineering Storage Data Delivery Access
11
#12
Data Storage Optional Staged transformed data Loading Transformed Data Staged transformed data Data Warehouse ODS Metadata Acquisition Engineering Storage Data Delivery Access Warehouse Data Model Purpose Maintains all implemented components of the data architecture for the data warehouse
12 Purpose: To manage the data warehouse data base environment Framework Definitions: Data Warehouse is the authorized source for a distributed data access environment Metadata Metadata stores business definitions of data Operational Data Store is the holding place for "current" time value data used for day-to-day reporting and decision making Activities: Administer the Data Base Optimize query performance Manage metadata Monitor data loads Build summary tables Provide data security Provide backup and restore
#13
Definitions Data Warehouse is the authorized source of subject-oriented, historical, time-variant information that supports the analysis requirements of users. The Data Warehouse acts as the central repository for a distributed data access environment Operational Data Store is the holding place for "current" time value data for day-to-day reporting and decision making Data Storage Acquisition Engineering Storage Data Delivery Access
13
Data Storage
Functions/Considerations for Architecture Growth/scalability Change management/version control Partitioning/chuncking RAID (Incremental)load strategy Load/refresh frequency Deletion/archiving Availability (contracts with marts) Backup/restore Acquisition Engineering Storage Data Delivery Access
14
#15
Data Delivery Data Warehouse Data Mart Data Mart Data Mart Data Mart Metadata Acquisition Engineering Storage Data Delivery Access Purpose Process and environment that distributes warehouse data to end user ‘marts’ as required to support their business needs. Typically based on a ‘publish and subscribe’ model.
15 Purpose: Distribute data to meet functional, processing, geographic and other organizational requirements Framework Definitions: Datamarts are application or function specific stores populated with subsets of data from the Data Warehouse Applications are the sharable processes that service the client OLAP Servers are "multidimensional database" engines that optimize business’ analysis of the data ROLAP Servers employ relational technology (as opposed to "multi-dimensional") to provide OLAP functionality Web Server services requests passed by the Internet Server Data Mining is a process that detects and investigates patterns in large data sets Activities: Distribute data to data marts and other decision support applications Distribute data for Internet/Intranet use
#16
Definitions Datamarts are application, function or geography specific stores that are populated with subsets of the data from the Data Warehouse Functions Volumes Frequency of data distribution Geographic distribution "Publish" form Common data subject synchronization Publish/subscribe parameters QA/Audit Availability requirements Administrative controls Security Data Delivery Acquisition Engineering Storage Data Delivery Access
16
#17
Data Access Metadata Acquisition Engineering Storage Data Delivery Access Purpose The end user environment that 'consumes' data for reporting, analysis and ‘knowledge discovery’
17 Purpose: To enable business users to query, analyze and report on data, thus turning data into information of value. Framework Definitions: Client interface is an application-related component residing on the client Activities: Understand user interaction with the data Define and manage business data views "One size does not fit all" for query and analysis tools Provide training and orientation Ensure infrastructure is in place
Data Access
Definitions On-line Analytic Processing (OLAP) applications provide "multidimensional" engines that optimize an organization’s analysis of data through pivoting, drilling, "slicing and dicing" functions. Relational OLAP (ROLAP) employs relational technology to provide OLAP-like functionality Statistical Analysis applications allow an organization to perform knowledge discovery (uncovering important patterns and relationships ‘hidden’ in warehouse data) through "traditional" statistical techniques and functions Data Mining automates the process of knowledge discovery by investigating patterns in large data sets using advanced statistical and artificial intelligence techniques Managed Query applications allow business users to create their own ad hoc queries and reports without the intervention of IT by insulating users from the complexity of the underlying data structures Report Writers provide a generalized facility for the creation of standard reports and templates Acquisition Eng
18
Data Access
Functions What business units are to be supported? Who will be warehouse users? How many? Where located? What information do users need to make decisions? Granularity? Aggregation? Time coverage? How do users want analyze data - time series, trend analysis, profiling….. What types of queries and reports? Ad hoc? Standard? How many/period? How do users need information presented - spreadsheet, graph …. How do users need to manipulate data - pivot, drill down, model, ‘what if’ …. Will users create information that needs to be retained? Integration with operational systems? Acquisition Engineering Storage Data Delivery Access
19
Warehouse and System Management
The architecture required to support: Security of Warehouse data Management of workflow Monitoring and Tuning of Warehouse performance "Governing" users Backup and Recovery Software distribution (access tools, typically)
20
Comments