
Metadata Approach and Architecture

<Project Name>

Month DD, YYYY

FOR INTERNAL USE ONLY

AUTHORED BY

Name | Title | Email

VERSION CONTROL

Change By | Version | Date | Description of Change

CLIENT APPROVAL

Approved By | Email Address | Approval Method | Approval Date

Table of Contents

1 Objective
2 Metadata Classification
3 Guiding Principles
4 Business Objectives and Drivers
5 Current Environment
6 Tool Selection
7 Functional Requirements
7.1 Metadata Functional Requirements (What)
7.2 Metadata Stakeholders (Constituencies)
7.3 Requirements to Business Objective Traceability Cross Reference
8 Metadata Approach
8.1 Conceptual Architecture
9 Metadata Content Types
9.1 DW and ETL Design Metadata
9.2 Data Modeling Metadata
9.3 ETL Construction Metadata
9.4 RDBMS creation, loading and maintenance Metadata
9.5 Legacy File/RDBMS Table System access and structure Metadata
9.6 EDW Job Execution including the Audit, Balance and Control environment Metadata
9.7 Migration Management Metadata
9.8 Data Quality Metadata
9.9 Business Intelligence Tool Metadata
10 Repository Interface High Level Design
11 Roadmap Current Phase
11.1 EDW Phase I - Completed to Date
11.2 EDW Phase II
11.3 EDW Phase III
12 Roadmap Future Phases

Table of Figures

Figure 1 - Metadata Goals
Figure 2 - Phase Planning Roadmap (EDW Phases I and II)
Figure 3 - Metadata Classification
Figure 4 - ISC
Figure 5 - EDW Components
Figure 6 - Current Environment and Metadata Collection Points
Figure 7 - Metadata Stakeholders
Figure 8 - Metadata RTM
Figure 9 - Point to Point Approach
Figure 10 - Central Repository Approach
Figure 11 - Hybrid Approach
Figure 12 - Conceptual Architecture
Figure 13 - High Level Metadata Repository Interface Architecture

1 Objective

This document establishes principles and policy with respect to metadata in the Enterprise Data Warehouse (EDW) environment. The EDW environment consists of three persistent, primary data repositories: the Staging Database, the Integrated Data Store (IDS) and the EDW itself. Throughout this document, the term EDW is meant to include all three.

Metadata is commonly acknowledged to be data about data. However, to be more precise, the definition of metadata is addressed in the Information Access Strategy and Roadmap:

Metadata is data related to the complete understanding, formal management, and consistent use of an organization's data assets to support its business activities.

Principles and policies in this document may have applicability outside of the EDW. It is within the scope of this document to prescribe the policies and principles that will ensure that data has a precise and consistent use throughout the enterprise, regardless of source or legacy system use.

It is beyond the scope of this document to address the metadata needs for legacy systems as it relates to internal legacy system structure, data dictionaries, programming standards and data naming.

This metadata approach has three major goals:

Define:

Manage the components necessary to provide an understanding of the EDW and the Enterprise through its content.

Build and Administer:

Manage the components necessary to efficiently define, build and operate the EDW. EDW metadata includes the structure and contents of the operational data that is used to execute EDW jobs and perform the Audit Balance and Control (ABC) aspects of the EDW environment. This goal requires well defined metadata about both data and processes.

Navigate:

Manage the components necessary to enable self service use of the EDW that is efficient, accurate and repeatable.

Figure 1 Metadata Goals

This document defines the approach (methods) for meeting those goals and the architecture needed to implement the method. The approach includes:

Which tools will be used and to what extent

The policies that must be followed to ensure proper metadata governance.

The processes to acquire, standardize, distribute, maintain and retire metadata.

The approach excludes the detailed procedures that are used to execute processes. The approach is meant to be tool-vendor neutral. However, given the lack of standardized metadata tools, the approach is based on the metadata tool set currently in place. The Metadata Approach and Architecture complements the Data Architecture and Design and the ETL Architecture and Standards documents.

In the current project plan and the Needs Assessment, the EDW will be developed in phases. Each phase will be 3-4 months (+/-) long. Frequent, incremental releases are preferred to a big-bang development and implementation approach.

Figure 2 - Phase Planning Roadmap (EDW Phases I and II)

Figure 2 shows how the EDW evolves based on a foundation that includes metadata. The metadata solution evolves over time as an independent high level task, but coordinated with the overall program plan.

The future detailed metadata Architecture is dependent, to some degree, on the composition of future EDW phases. To provide a practical boundary to the current architecture, this document is meant to apply specifically to EDW Phases II and III. The objectives of the build-out of the metadata Architecture through the EDW Phase II and III time horizon are detailed in Section 11 Roadmap Current Phase.

Future metadata Architecture build-out objectives are not currently assigned to a specific Release, but are explained in Section 12 Roadmap Future Phases.

2 Metadata Classification

Metadata is classified according to the goals of Define, Build & Administer, and Navigate. Each of these goals becomes a building block that adds to the total metadata content. For each of these classifications, there are distinct sources, typical users and benefits to the enterprise.

Figure 3 Metadata Classification

Metadata is used to facilitate the Information Supply Chain (ISC). The ISC is a manufacturing metaphor applied to the intake of source data as raw material and its manufacture, through a series of transformations, into finished information products. The Information Supply Chain is centered on the Enterprise Data Warehouse.

Figure 4 - ISC

In each of these seven contexts, the same data value (for example a single Claim Line Procedure Code) is identified by a variety of names and in different formats. Some raw data may be missing or of questionable quality. Metadata in support of the ISC is used to link the source through every stage of the process to its final end use.

The Navigation goal includes the ability to define the lineage of any data at any point in the ISC continuum. This is key to the success of the Data Warehouse effort. The Data Warehouse can be successful only when the information is deemed reliable and the most accurate available. The business users will acquire trust in the EDW contents when they are able to see how the data was moved from the traditional sources, transformed and organized in the EDW. Metadata defines data lineage and allows straightforward Navigation. Lineage is essential to EDW acceptance by the business community.
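
To make the lineage requirement concrete, the sketch below walks a simple lineage graph from an EDW column back to its originating legacy source. This is a minimal illustration, not the project's implementation; the repository, table and column names are hypothetical. The point is the structure: each target element is linked to the source elements and transformation that feed it.

    # Minimal lineage-walk sketch (illustrative only; element names are hypothetical).
    lineage = {
        "EDW.CLAIM_FACT.PROC_CD": {
            "sources": ["IDS.CLAIM_LINE.PROC_CD"],
            "transform": "standardize to the enterprise procedure code set",
        },
        "IDS.CLAIM_LINE.PROC_CD": {
            "sources": ["STAGE.CLM_LN.PROC_CODE"],
            "transform": "trim, uppercase, validate against the code table",
        },
        "STAGE.CLM_LN.PROC_CODE": {
            "sources": ["LEGACY.CLMFILE.PROC-CODE"],
            "transform": "straight move from the legacy extract",
        },
    }

    def trace(element, depth=0):
        """Print the full lineage of an element back to its original source."""
        print("  " * depth + element)
        for source in lineage.get(element, {}).get("sources", []):
            trace(source, depth + 1)

    trace("EDW.CLAIM_FACT.PROC_CD")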

3 Guiding Principles

To facilitate the development of this document and future changes, the following Guiding Principles have been established:

1. The EDW is not a project but an ongoing process; so is the effort to build and maintain metadata.

2. Business needs will drive the process. Priorities will be established to maximize business benefits.

3. There is no single point solution to addressing metadata. An iterative approach must be used to increase functionality gradually through a series of planned releases. Metadata iterations will be consistent with the overall Phased approach to the EDW.

4. Metadata must be created and managed as an integrated component of the Data Warehouse architecture. It cannot be created as an afterthought or add-on.

5. There is a need for an integrated view of metadata to support the EDW that can associate metadata across tools and display an integrated picture using the Web / Intranet.

6. The warehouse will continue to be supported by numerous tools and processes each of which will individually generate and store metadata. The MD solution will be used as a collection, consolidation and display point for metadata across the entire Data Warehousing Framework. The MD solution provides the capabilities to create and manage associations and relationships across tools.

7. Metadata will support the automation of Impact Analysis to decrease time to market and reduce rework for enhancements and revisions to the BI environment.

8. The generation, collection and sharing of metadata will be automated whenever possible and practical.

9. Data objects (tables, columns, etc.) will be versioned to provide the ability to see how the object has changed over a period of time.

10. Components available on the open marketplace will be used whenever practical. If all things were equal in a buy vs. build decision, the decision would favor buying. A build from scratch will only be used if no reasonable purchase option exists.

11. The MD solution will be developed in a manner best suited to take advantage of existing and emerging industry standards (a brief parsing sketch follows this list):

XML/XMI is the emerging data interchange standard for metadata.

XML data streams produced by CWM/XML/XMI compliant tools will be used when available and practical.

Custom developed extracts will use XML data streams when practical.

CWM (Common Warehouse Metamodel) is the emerging EDW metadata model standard.
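
The sketch below illustrates, under stated assumptions, how an XML/XMI-style export could be consumed with a standard library rather than a tool-specific API. The element and attribute names are hypothetical placeholders, not the actual CWM/XMI schema.

    import xml.etree.ElementTree as ET

    # Hypothetical XML export fragment; real CWM/XMI payloads are richer and differ in structure.
    sample = """
    <Metadata>
      <Table name="CLAIM_FACT">
        <Column name="PROC_CD" type="CHAR(5)" definition="Claim line procedure code"/>
        <Column name="PAID_AMT" type="DECIMAL(11,2)" definition="Amount paid on the claim line"/>
      </Table>
    </Metadata>
    """

    root = ET.fromstring(sample)
    for table in root.iter("Table"):
        for column in table.iter("Column"):
            # Each (table, column, definition) triple becomes a candidate repository record.
            print(table.get("name"), column.get("name"), column.get("definition"))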

4 Business Objectives and Drivers

In the interviews leading to the Information Access Strategy and Roadmap, the following business drivers were established. Each driver defines a specific value added opportunity. The metadata approach is based on optimizing the total value of these opportunities, both initially and over time.

The business objectives from the Strategy and Roadmap that have the closest ties to metadata are:

Increase the Speed to Market

Become an information broker

Gain competitive advantage through better analytics than the competition

Reduce operational costs

The measures of success, that is, the value of collecting, storing and maintaining metadata can be summarized from the assessment as follows:

Metadata provides added value if it measurably improves data quality by:

1. Providing consistent business rules and definitions, enabling users to deliver one version of the truth across the enterprise.

2. Enabling business user self-service and self-sufficiency, increasing the timeliness of information.

3. Providing reliable, consistent input to support trend analysis and complex analytics with repeatable results.

4. Providing data lineage across all systems and the ability to tie data back to the operational source systems, facilitating full integration and reconciliation of medical and pharmacy data.

5. Providing consistent classifications and characterizations of products, lines of business and accounting hierarchies, allowing users to deliver repeatable, auditable results.

Metadata provides added value if it measurably reduces the cost of data by:

6. Reducing the time (labor) it takes to find the right source for the data

7. Reducing duplicate work

8. Reducing rework associated with implementing changes

9. Reducing the learning curve for new employees and providing consistent education on business rules and definitions for all employees

10. Reducing the time required to reconcile conflicting information that should be consistent.

11. Reducing the cost of producing undetected, incorrect information.

These 11 Business objectives provide traceability from the strategic needs to the functional requirements of the Metadata Architecture. All requirements should meet at least one business objective. All business objectives should give rise to at least one requirement.

5 Current Environment

Building and maintaining the EDW involves nine distinct components:

Figure 5 - EDW Components

Each of the components is supported by tools and processes. Tools provide storage, access and manipulation of metadata. However, tools generally provide only very limited interface and metadata exchange capability to other tools. Most provide none at all. While the selection of tools would ideally be done as part of the metadata strategy, there is a substantial investment in the current set of tools. As a result the current Approach and Architecture will use the current tool environment as beneficially as possible. As the EDW evolves and as more value added tools become available, a periodic study and optimization of the available tools is appropriate.

The Current Environment is shown in Figure 6 - Current Environment and Metadata Collection Points.

Each tool in the current environment is a potential collection point for metadata. Many of the collection points overlap. Specific users generally have access to only a small selection of the tools above. It is essential that the same metadata (for example: business definitions of data attributes) be accessible through several tools. Business definitions will be accessible through CDMA, the BI tool and Power Designer. While Power Designer may be the System of Record for this metadata, the metadata must be accessible to a wide audience and synchronized with all the tools in the final architecture.
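
Because the same business definition is visible through several tools, keeping the copies synchronized with the System of Record matters. Below is a minimal consistency-check sketch, assuming Power Designer as the SOR and hypothetical extracts from the other tools.

    # Minimal SOR consistency check (tool extracts shown as hypothetical dictionaries).
    sor_definitions = {  # extracted from Power Designer, the assumed System of Record
        "CLAIM_FACT.PROC_CD": "Procedure code billed on the claim line.",
    }
    tool_copies = {  # the same definitions as surfaced in other tools
        "CDMA": {"CLAIM_FACT.PROC_CD": "Procedure code billed on the claim line."},
        "COGNOS": {"CLAIM_FACT.PROC_CD": "Procedure code."},
    }

    for tool, definitions in tool_copies.items():
        for element, sor_text in sor_definitions.items():
            if definitions.get(element) != sor_text:
                # Flag the copy for resynchronization from the System of Record.
                print(f"{tool}: definition for {element} is out of sync with the SOR")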

6 Tool Selection

Due to the current EDW phased schedule and cost, it is expected that only incremental new tools will be acquired, at least through the planning horizon for this Architecture. The existing tool set will remain stable. The most likely addition during Phase II will be a bridge tool to automate the Power Designer to COGNOS metadata interface.

Currently this document does not contain tool selection processes and criteria. However, tool selection is within the scope of a Metadata Architecture document. When new tools are being considered, this document will be expanded to include the process and selection criteria for tools.

Part A - EDW ETL Design, Modeling, ETL, RDBMS and File Systems

Part B - Job Execution, Migration Management, Data Quality and BI

Figure 6 - Current Environment and Metadata Collection Points

7 Functional Requirements

Functional Requirements identify:

What the Functional Requirements are

Who has a vested interest in the Functional requirements (Metadata Stakeholders)

Traceability to the business Objectives

The functional requirements listed below represent the long-term vision for metadata. Capabilities to meet these requirements will be built out gradually over time through an iterative development approach.

7.1 Metadata Functional Requirements (What)

1. Provide a single, cost-effective view of business-related metadata across tools for the business community that is:

Easily used, including self-help through online services

User accessible

Easy to understand

Provides one version of the truth

While this functional requirement is just one of many, it is weighted more heavily than the others. The heavier weight is a direct result of the value to business users of meeting this single requirement.

2. Enable a faster response to business change by shifting business analysis effort from finding data to analyzing data and making decisions.

3. Provide access to metrics and measures relating to data quality, load statistics, audit, balance and control processes, and data and code usage to help build trust in the underlying data.

4. Integrate and exchange metadata between the multiple tools that comprise the nine data warehousing components:

EDW and ETL Design

Data Modeling

ETL Construction

RDBMS creation, loading and maintenance

Legacy File System access and structure

EDW Job Execution including the Audit, Balance and Control environment

Migration Management for all design and data objects through the ISC

Data Quality

Business Intelligence Tools

5. Adapt to changing technologies, tools and products with minimal impact.

6. Automate the impact analysis across tools used in the construction of the EDW:

Power Designer

UDB tables, columns, views, stored procedures and triggers

Informatica

Cognos

Impact Analysis refers to the ability to identify how changes in any construction component will require subsequent changes in the other components. Impact Analysis results in a more reliable estimate of the cost, effort and risk associated with EDW changes. A brief sketch of automated impact analysis follows this requirements list.

7. Provide a complete picture of data lineage from data sources to target in the ISC including:

Easy access to business and technical definitions for all objects

Ability to examine a data element and see where it originated, its destination and how it is transformed, including: Mappings, Transformations, Associated Business Rules, Summaries/Calculations/Aggregations, Update Frequencies, Data Currency

Ability to examine standard codes including: definitions, source code set, target code set, mappings and conversions between source and target codes

Ability to see the original data source and final target with the ability to drill through detailed mappings and transformations

Ability to retrieve and review data models

8. Search and retrieval facilities, with filtering capabilities, to find the following items along with their associated location, business and technical definitions, and access instructions, to encourage reuse of existing objects:

Available Data Sources

Tables/Views and columns with associated data

Keys and indexes

Constraints

Joins

Triggers, functions and stored procedures

Files and fields

Canned reports and queries

Available documentation

Limitations and known inaccuracies

9. Capture and display information and knowledge stewardship responsibilities for the IDS and EDW.
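
As noted in requirement 6, impact analysis can be automated once cross-tool metadata is linked. The sketch below is one simple way to do it: dependencies are held as a directed graph and a change is propagated forward to find every affected object. The object names and edges are hypothetical.

    from collections import deque

    # Hypothetical dependency edges: object -> objects that depend on it.
    depends_on_me = {
        "PD.ClaimFactModel": ["UDB.CLAIM_FACT.PROC_CD"],
        "UDB.CLAIM_FACT.PROC_CD": ["INFA.map_claim_fact", "COGNOS.ClaimSummaryReport"],
        "INFA.map_claim_fact": ["UDB.CLAIM_FACT"],
    }

    def impact(changed_object):
        """Breadth-first walk returning every object affected by a change."""
        affected, queue = set(), deque([changed_object])
        while queue:
            current = queue.popleft()
            for dependent in depends_on_me.get(current, []):
                if dependent not in affected:
                    affected.add(dependent)
                    queue.append(dependent)
        return affected

    print(impact("PD.ClaimFactModel"))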

7.2 Metadata Stakeholders (Constituencies)

The following summary defines the metadata stakeholders that have been identified and the aspect of metadata that most directly affects them.

Figure 7 Metadata Stakeholders

7.3 Requirements to Business Objective Traceability Cross Reference

Figure 8 Metadata RTM

8 Metadata Approach

8.1 Conceptual Architecture

There are three widely accepted metadata architectural approaches:

Point to Point,

Central Repository and

Hybrid of Point to Point and Central.

The discussion is based on the current state of the art as reported by actual experience through The Data Warehouse Institute (TDWI), the Gartner Group, and customer experience white papers.

The Point to Point approach is represented in Figure 9. The Central Repository approach is represented in Figure 10.

Figure 9 - Point to Point Approach

Point-to-Point Discussion

Point-to-point retains each tool as an independent repository. Metadata is synchronized by interfaces (generally batch) between each tool's underlying database.

The point-to-point approach has proven cumbersome to implement and to maintain as tools change over time. Point to point lacks the robustness needed to provide an integrated view of metadata for multiple user types and to meet advanced search requirements.

There is an inverse economy of scale. As tools change and as the extent of the metadata interfaces increases, there are too many interfaces to keep synchronized. The number of interfaces grows quadratically with the number of tools: n tools require n(n-1)/2 pairwise interfaces, so the point-to-point approach requires six interfaces for four tools, ten for five tools and fifteen for six tools.
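
A quick sketch of the pairwise-interface arithmetic, reproducing the counts cited above:

    # Pairwise interfaces required by a point-to-point architecture with n tools.
    def point_to_point_interfaces(n):
        return n * (n - 1) // 2

    for n in (4, 5, 6):
        # Matches the counts cited above: 6, 10 and 15 interfaces.
        print(n, "tools ->", point_to_point_interfaces(n), "interfaces")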

Figure 10 - Central Repository Approach

For comparison, the current environment contains five primary tools (Cognos, UDB, Power Designer, Informatica and CDMA).

Each tool may have a different underlying database technology (e.g., UDB, Sybase, Access, or a custom store). The programming knowledge needed to build and sustain the interfaces is complex: programmers may need to know six or more programming dialects and understand the quirks of each tool, even when standard interface methods such as ODBC are used.

Each new release of a tool requires analysis and possibly reprogramming of the interfaces. The larger the number of tool repositories, the more frequently release analysis will be required. Experience has demonstrated that keeping tools frozen at a point-in-time release will eventually put the tools outside the vendor's retroactive maintenance. At that point, the cost and disruption of inserting the latest tool version is approximately equal to the original cost of implementing the architecture.

Due to generally poor prior industry experience, it was determined that the Point to Point approach will not meet Health Now's requirements in a practical way.

Central Repository Discussion

The Central Repository approach requires the minimum number of interfaces. Depending on the tools in place, many of the interfaces may be available from the repository vendor, the tool vendor or an independent specialty company. This approach is also the most advantageous for substituting or adding new tools.

Central repository products have been proven to be complex to implement and expensive.

The current Repository products are not completely mature and generally require substantial effort to implement. Often metadata repositories do not provide functional applications that directly solve a business problem; they provide the building blocks that can be used to build applications to meet these needs. Basic installation and integration with the client infrastructure has taken 12-24 man-months and 3 months elapsed time. It is not uncommon for robust implementations to require 2+ years to implement with resource levels in excess of 10. Repository products such as Rochade offer bridges to tools that are XML/XMI compliant. However, not all tools offer full XML/XMI compliance; non-compliance requires custom interface bridges. Central repository products, while very broad in functionality, have very complex custom interface APIs. Central repository approaches that are currently deployed have required 15+ man-years of effort to customize and implement.

The total initial cost of ownership for Central Repository solutions has been estimated in the $2 Million - $10 Million order of magnitude. The ongoing product maintenance is generally 20% (+/-) of the product cost.

Most organizations that are currently involved with Central Repository efforts have compelling security (Department of Defense), real-time operational (banking) or strategic (a corporate mission to be on the bleeding edge) drivers that supersede cost, time and labor considerations. Although HIPAA is a security concern for actual client data, HIPAA is not a high concern for metadata. Real-time metadata update has not been identified as a requirement. The current strategic goals point more toward using leading, proven tools to gain competitive advantage, not toward advancing the state of the art in data warehousing.

As a result of expected cost and the effort to compensate for incomplete functionality, the Central Repository approach is rejected.

Hybrid Approach Discussion

While the point-to-point and central repository approaches have significant problems, a metadata solution is still required. The Hybrid approach has the following structure drawn from both point-to-point and central approaches.

A small number of tools are selected as metadata stores. This limits the number of repository-to-repository interfaces.

Point-to-point interfaces are created from the Collection points to the selected repositories. The selected repository tools already have collection functionality for most of the metadata. The number of new collection point interfaces is relatively small.

Point to point interfaces between selected tools are leveraged where they work well. Standard functionality provided by point to point tools is used where it adds value. Custom expansion of point to point tools is limited.

The repository interfaces are built or purchased using XML/XMI standards. Even if a tool is not currently XML/XMI compliant, an additional pre-processing component is created that transforms the tool's native content to XML (a sketch of such a transform follows).
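
A minimal sketch of such a pre-processing component is shown below, assuming the non-compliant tool can at least dump its metadata as a delimited file; the column layout and element names are hypothetical.

    import csv
    import io
    import xml.etree.ElementTree as ET

    # Hypothetical native export from a non-compliant tool: table,column,definition rows.
    native_export = "CLAIM_FACT,PROC_CD,Procedure code billed on the claim line\n"

    root = ET.Element("Metadata")
    for table, column, definition in csv.reader(io.StringIO(native_export)):
        element = ET.SubElement(root, "Column")
        element.set("table", table)
        element.set("name", column)
        element.set("definition", definition)

    # The resulting XML stream is handed to the repository interface.
    print(ET.tostring(root, encoding="unicode"))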

Figure 11 - Hybrid Approach.

Hybrid Conceptual Architecture

The most cost effective, current, proven solution is the Hybrid of Point-to-Point and Central Repository. The available reports from actual company experiences validate the use of the Hybrid approach.

The basic approach in Figure 11 is expanded to include specific details. Figure 12 defines the current Complete Conceptual Architecture.

Figure 12 - Conceptual Architecture

9 Metadata Content Types

The following listings classify metadata according to the nine EDW components. For each component, the detailed types of metadata are defined. For each detailed type, the Metadata Repository (from Figure 12) that is the System of Record is labeled (SOR). A (CON) label is shown for metadata that is contained in a Metadata Repository, but not as the System of Record.
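
One simple way to hold and sanity-check the SOR/CON assignments is sketched below. The rows shown are a small hypothetical sample; the check enforces the rule that each metadata type has exactly one System of Record.

    # Hypothetical sample of the SOR/CON matrix; one entry per metadata type.
    matrix = {
        "Target Table Name": {"Power Designer": "SOR", "CDMA": "CON", "Informatica": "CON", "COGNOS": "CON"},
        "Transformation Code": {"Informatica": "SOR"},
    }

    for metadata_type, tools in matrix.items():
        systems_of_record = [tool for tool, role in tools.items() if role == "SOR"]
        if len(systems_of_record) != 1:
            # Every metadata type should have exactly one System of Record.
            print(f"{metadata_type}: expected one SOR, found {systems_of_record}")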

9.1 DW and ETL Design Metadata

Metadata Type: Power Designer / CDMA / Informatica / COGNOS / ABC Data

Target Table Name: SOR, CON, CON, CON
Target Column Name: SOR, CON, CON, CON
Source Table Name: SOR, CON, CON
Source Column Name: SOR, CON, CON
Target Integrity Constraints: SOR, CON, CON, CON
Column Type, Format & Size: SOR, CON, CON, CON
Source Target Mapping: SOR, CON
Mapping Logic: SOR, CON
Transformation Module Name: SOR
Source Code Values: SOR, CON, CON
Target Code Values: SOR, CON, CON
Code Source Target Mapping: SOR, CON

9.2 Data Modeling Metadata

Metadata Type: Power Designer / CDMA / Informatica / COGNOS / ABC Data

Target Logical Entity Name: SOR, CON, CON
Target Logical Attribute Name: SOR, CON, CON
Target Definitions: SOR, CON, CON
Target Table Name: SOR, CON, CON, CON
Target Column Name: SOR, CON, CON, CON
Target Integrity Constraints: SOR, CON, CON, CON
Target Logical Model Diagrams: SOR
Target Physical Model Diagrams: SOR
Target Subject Area Name: SOR, CON

9.3 ETL Construction Metadata

ETL Construction requires all of the ETL Design metadata plus the following:

Metadata Type: Power Designer / CDMA / Informatica / COGNOS / ABC Data

Table & File Access Path: SOR, CON
Transformation Code: SOR
Transformation Module Name: SOR
Module Execution Shell Name: SOR
Module Execution Shell Params: SOR
Module Execution Errors: SOR
Module Execution Statistics: CON, SOR

9.4 RDBMS creation, loading and maintenance Metadata

Metadata Type: Power Designer / CDMA / Informatica / COGNOS / ABC Data

Target Table Name: SOR, CON, CON, CON
Target Column Name: SOR, CON, CON, CON
Target Integrity Constraints: SOR, CON, CON, CON
Column Type, Format & Size: SOR, CON, CON, CON
Partition Definitions: SOR
Database Name: CON, SOR
Database Access Path: SOR
Database Trigger Definition: SOR
Database Stored SQL: SOR

9.5 Legacy File/RDBMS Table System access and structure Metadata

Metadata Type: Power Designer / CDMA / Informatica / COGNOS / ABC Data

Source Table Name: SOR, CON, CON
Source Column Name: SOR, CON, CON
Source Integrity Constraints: SOR, CON, CON
Source Code Values: SOR, CON, CON
Target Code Values: SOR, CON, CON
Code Source Target Mapping: SOR, CON

9.6 EDW Job Execution including the Audit, Balance and Control environment Metadata

Metadata Type: Power Designer / CDMA / Informatica / COGNOS / ABC Data

Source Table Name: SOR, CON, CON
Source Column Name: SOR, CON, CON
Module Execution Shell Name: SOR
Module Execution Shell Params: SOR
Module Execution Errors: SOR
Module Execution Statistics: CON, SOR

9.7 Migration Management Metadata

Metadata Type: Power Designer / CDMA / Informatica / COGNOS / ABC Data

Source Table Version: SOR, CON, CON
Source Column Version: SOR, CON, CON
Target Table Version: SOR, CON, CON
Target Column Version: SOR, SOR, CON
Module Execution Shell Name: SOR
Module Version: SOR, CON
Module Compilation Statistics: SOR
ETL Tool Version: SOR
Power Designer Version: SOR, CON
ETL Tool Version: SOR, CON
CDMA Version: SOR, SOR

9.8 Data Quality Metadata

Data quality is currently only provided in a limited context. Selection of a comprehensive Data Quality Tool is TBD.

Metadata Type: Power Designer / CDMA / Informatica / COGNOS / ABC Data

Source Data Content Integrity: SOR
Source Code Value Integrity: SOR
Source Code Profile: SOR

9.9 Business Intelligence Tool Metadata

Metadata Type: Power Designer / CDMA / Informatica / COGNOS / ABC Data

Target Logical Entity Name: SOR, CON, CON
Target Logical Attribute Name: SOR, CON, CON
Target Definitions: SOR, CON, CON
Target Table Name: SOR, CON, CON, CON
Target Column Name: SOR, CON, CON, CON
Target Integrity Constraints: SOR, CON, CON, CON
Report Definitions: CON, SOR
Filter Definitions: SOR
Aggregation Definitions: SOR
Reusable SQL Components: SOR

10 Repository Interface High Level Design

The following high level design for interfaces between the repositories, major system components and data collection points is based on:

The Systems of Record,

The shared metadata contained in each repository,

The primary metadata required by the EDW operational systems, and

The natural flow of the life cycle processes.
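
A lightweight way to record these interface decisions is a small declarative structure, sketched below with hypothetical entries. Each interface names its source repository, its target repository, the metadata types carried and the life cycle event that triggers it.

    # Hypothetical declarative list of repository interfaces (illustrative only).
    interfaces = [
        {
            "source": "Power Designer",
            "target": "CDMA",
            "metadata_types": ["Target Table Name", "Target Column Name", "Target Definitions"],
            "trigger": "model check-in",
        },
        {
            "source": "CDMA",
            "target": "COGNOS",
            "metadata_types": ["Business Definitions", "Source Target Mapping"],
            "trigger": "release migration",
        },
    ]

    for interface in interfaces:
        carried = ", ".join(interface["metadata_types"])
        print(f"{interface['source']} -> {interface['target']}: {carried} ({interface['trigger']})")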

Figure 13 - High Level Metadata Repository Interface Architecture

11 Roadmap Current Phase

The current project plan divides the EDW build-out and implementation into 4-month (+/-) phases. It is assumed that in Phases II and III no new major tools will be acquired that must be integrated into the metadata architecture.

11.1 EDW Phase I - Completed to Date

1. CDMA was implemented to support ETL design, data dictionary functionality, Code integration, Source to target data lineage and business transformation rules.

2. An automated interface between the Power Designer data modeling tool and CDMA was developed and implemented.

3. The Power Designer to COGNOS interface was implemented.

4. The CDMA Dashboard was enhanced and implemented to provide easy access to metadata through the intranet.

5. CDMA was used to generate DDL to support the creation of dimension views in the EDW.

6. The ETL team leveraged the CDMA data to generate the SQL scripts needed to support keying and other processes, saving a significant amount of development time.

To coordinate the metadata build-out with the next two EDW phases, the following build-out objectives are planned:

11.2 EDW Phase II

1. Develop the initial COGNOS data handshake. Phase I acquired and installed the COGNOS tool set and implemented a workaround for the handshake. Phase II will work toward a more permanent solution for delivering data modeling metadata to COGNOS.

2. Provide a user-friendly addition to the CDMA Dashboard as a web-based portal to display ABC Data. Phase II will be limited to the three highest priority display panels.

3. Add data versioning capability to CDMA by automating the maintenance of the initial release number and the last release updated number in the Data Item form. Data versioning allows the user to see how an object in CDMA has changed over time (a brief sketch follows the Phase II objectives).

4. Develop model versioning processes to manage models in Power Designer using repository management.

5. Provide the initial capability to manage Reference Data in the Architecture. Reference Data is defined as code data provided from external sources (for example, ICD-9). Reference codes frequently contain embedded hierarchies. Reference Data is just data, so the basic Architecture should apply. The objective is to determine how the repositories can be used to manage reference data effectively and to provide for ETL input rather than manual code updates.

6. Scan the current tool vendors for Data Profiling Tools and develop functional requirements for data profiling and for how the information it creates fits into the metadata architecture.

7. Create the functional requirements for adding Data Mart metadata to the overall architecture and integrate it with the current EDW metadata. This will primarily be a Power Designer, CDMA and CDMA Dashboard centric integration point.

Link COGNOS to the Metadata Dashboard to display:

Definitions

Lineage

Code transformation and

Business Transformation Rules
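
A minimal sketch of the release-number maintenance referenced in objective 3, assuming each data item record carries an initial release field and a last-updated release field (the field names are hypothetical):

    # Hypothetical data item record with automated release-number maintenance.
    def record_change(item, current_release):
        """Stamp a data item with version metadata when it is created or changed."""
        if "initial_release" not in item:
            item["initial_release"] = current_release   # first release in which the item appears
        item["last_updated_release"] = current_release  # most recent release that touched it
        return item

    proc_cd = {"name": "CLAIM_FACT.PROC_CD"}
    record_change(proc_cd, "EDW 2.1")
    record_change(proc_cd, "EDW 2.3")
    print(proc_cd)  # initial_release stays at EDW 2.1; last_updated_release moves to EDW 2.3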

11.3 EDW Phase III

8. Enhance the Metadata Dashboard to display reference data information.

9. Enhance the Phase II addition to the Metadata Dashboard for ABC Data by developing three to five additional pages.

10. Develop initial COGNOS usage statistics in the Metadata Dashboard/ABC environment.

11. Analyze and recommend a Data Profiling Tool, either off the shelf or custom developed.

12. Develop a metadata integration plan for:

Information Catalog.

On Demand Content Management.

Informatica Information Integrator

Cognos Content Manager.

12 Roadmap Future Phases

Beyond Phases II and III, the following objectives are currently identified from the Conceptual Architecture. However, these build-out objectives are not prioritized or committed to any particular future phase.

1. Fully integrate metadata for:

Information Catalog.

On Demand Content Management.

Informatica Information Integrator

Cognos Content Manager.

2. Automate capturing history in CDMA to support the Release Change Summary for the Metadata Dashboard.

3. Expand and automate where-used configuration data to cover the complete architecture.

4. Implement sample values display in the Metadata Dashboard.

5. Provide a robust, user driven, definition maintenance capability and process for entity and attribute definitions.

6. Provide a real-time cross reference between CDMA and COGNOS report definitions. The current vision is to tap into CDMA definitions when a user mouses over a COGNOS report field.

7. Capitalize on Data Quality statistics, as created by a data profiling tool, to automatically repair data integrity problems.

8. Provide real-time integration interfaces to allow users to see Informatica code in addition to the current capability to see the business rules that are used to create the code.


This document version is coordinated with Version 2.0 of the Data Architecture and Design and OHID ETL Architecture and Standards v.00.

XML and XMI are the current acknowledged standards for interchanging metadata. XMI, in particular, is relatively new. While many tools report XMI compliance, most provide only limited XMI functionality; for example, a very limited scope of metadata that can be imported, one-time import-only capability with no update, and object identity and matching based on object names rather than an object unique identifier.
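
To illustrate why name-based matching is a limitation, the sketch below matches incoming objects by a unique identifier when one is present and only falls back to the name; with name-only matching, a renamed column is treated as a brand new object. The record layout is hypothetical.

    # Hypothetical repository records keyed by a stable unique identifier.
    repository = {"uid-123": {"name": "PROC_CD", "definition": "Procedure code"}}

    def match(incoming):
        """Prefer identity matching by unique id; fall back to name matching."""
        if incoming.get("uid") in repository:
            return incoming["uid"]
        for uid, existing in repository.items():
            if existing["name"] == incoming["name"]:
                return uid  # name-based fallback: breaks when the object has been renamed
        return None  # treated as a new object

    print(match({"uid": "uid-123", "name": "PROCEDURE_CODE"}))  # matched despite the rename
    print(match({"name": "PROCEDURE_CODE"}))                    # name-only matching finds nothing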

Copyright © 2007 Hewlett-Packard. All rights reserved. Proprietary and Confidential.
