Architecting a Corporate Metadata Repository at the U.S. Bureau of Census
Gail Wright, CMR Program Manager
Technical Director, Oracle
Agenda
• Why a CMR?
• What to include in a CMR?
• Architecting a CMR
• Leveraging a CMR
Why a Corporate Metadata Repository (CMR)?
Metadata Technology Continuum
Figure: a continuum running from low integration, low share/reuse, few open standards, and low interoperability to high integration, high share/reuse, many open standards, and high interoperability. Stages shown: Buried, Inaccessible Metadata; Defined Application Models; Autonomous Repositories; Integrated Vertical/Inter-Dept Metadata; Tool-based Data Dictionaries; Integrated Global Enterprise Metadata; Integrated Corporate Enterprise Metadata. Labeled examples: EMR, CMR, FedStats.
BOC Current Business Process: Does not include an Integrated Metadata Business Process
Figure: Census 2000, American Community Survey, Demographic Surveys, Econ Census, and Econ Surveys each move through Design, Collect, Process, and Share stages, with separate tools at every stage:
• Design: internally developed systems, customized commercial systems, CASES, variety of programming languages, GIDS, individual tool of choice
• Collect: CATI, CAPI, Mail, PAPI, OCS, ICM, CADE, CSAQ, OCR, TDE, PFIRS
• Process: internally developed systems, SAS, DEVSURV, COBOL, FORTRAN, DECForms, StEPS, ECON DW, individual tool of choice
• Share: DADS/AFF, CENSAS, FERRET, Econ DW, CD-ROM, Internet, ISS (future)
What are the problems with the current Business Process?
• Difficult to:
  - meet customer demands for quick turnaround of surveys and customized products
  - re-use and share metadata within the BOC
  - maintain consistent standards
  - compile and format metadata needed by dissemination systems
  - share metadata with external agencies and participate in Virtual Statistical Agencies, etc.
  - meet new metadata requirements like FGDC’s CSDGM content standard
  - perform time series or cross dataset comparisons
• Metadata integrity and quality can be compromised
BOC Goal: An Integrated Metadata Process
Figure: Census and Survey Design, Data Collection, Data Processing, and Data Dissemination all share one Corporate METADATA Repository; at each stage, the metadata for the 1998 Annual Survey is copied to create the 1999 Annual Survey.
What to include in a Corporate Metadata Repository?
What is Metadata?
• “Data about data”
  - Information about “raw” data that gives it meaning or context, or enhances understanding
  - Data about the Content, Quality, Condition, and other characteristics of data
• Every informational asset that’s not data
  - Requirements, Data Models, Business Models, Screen Layouts
  - Data Mappings and transformations
  - Hierarchies, Aggregation rules, Formulas
  - Rules for comparison of data sets and historical meaning
  - Security access controls, operational schedules, code, ...
What is a Repository?
Figure: each kind of repository adds attributes to the one before it:
• Data Dictionary: Name, Definition, Format
• Data Directory: Dictionary attributes plus Source, Destination, Legacy
• Data Registry: Directory attributes plus Owner, Authority, Standard
• Data Encyclopedia: Registry attributes plus Application, System, Model
• Data Repository: Encyclopedia attributes plus everything else
Factors for determining CMR content
• Strategic to BOC Enterprise
• Opportunity for sharing and reuse of:
  - Metadata
  - Meta-Model
• Generic vs. Application specific
CMR Meta-Models
• Data Element Registry (ISO/IEC 11179 Standard): Data Elements, Value Domains, Valid Values, Data Element Concepts,…
• Data Set Registry (Supports FGDC CSDGM Geospatial Metadata Standard): A Data Set is a collection of Data Elements.
• Product Registry (Supports FGDC CSDGM Geospatial Metadata Standard & Dublin Core): A Data Product may be a file/document, website/URL, or physical object.
• Data Store (OMG CWM Standard): Metadata for the physical data store (supports Relational, Multi-Dimensional, and Flat File stores).
• Business Rule Registry
• Workflow Framework
• Security Framework
• Configuration Mgmt Framework
• Classification Schemes (ISO/IEC 11179 Standard): Taxonomies, Keywords
• Survey Registry: Surveys, Survey Instances, Universes, Frames, Sample, Questionnaires, Questions,…
Basic CMR Meta-Model Relationships
Figure: Survey, Survey Instance, Questionnaire, Question, Product, Data Set, Data Element, Data Store
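A minimal Java sketch of these entities follows, assuming Java 16+ records. The slide names the entities but not the links between them, so apart from "a Data Set is a collection of Data Elements" (from the CMR Meta-Models list), every association and field name here is an illustrative assumption, not the CMR's actual schema.

```java
import java.util.List;

// Hypothetical sketch of the basic CMR meta-model entities.
// From the slides: a Data Set is a collection of Data Elements.
// Assumed for illustration: every other association and all field names.
record DataElement(String name, String definition) {}

// kind: "Relational", "Multi-Dimensional", or "Flat File"
record DataStore(String name, String kind) {}

record DataSet(String name, List<DataElement> elements, DataStore store) {}

// form: file/document, website/URL, or physical object
record Product(String title, String form, List<DataSet> sources) {}

record Question(String text, DataElement collects) {}

record Questionnaire(String name, List<Question> questions) {}

record SurveyInstance(int year, List<Questionnaire> questionnaires, List<DataSet> dataSets) {}

record Survey(String name, List<SurveyInstance> instances) {}

class MetaModelDemo {
    // Navigate Survey -> Survey Instance -> Data Set -> Data Element
    // and list each distinct element name once, in encounter order.
    static List<String> elementNames(Survey survey) {
        return survey.instances().stream()
                .flatMap(instance -> instance.dataSets().stream())
                .flatMap(dataSet -> dataSet.elements().stream())
                .map(DataElement::name)
                .distinct()
                .toList();
    }

    public static void main(String[] args) {
        DataElement sex = new DataElement("SEX", "Person Gender");
        DataElement age = new DataElement("AGE", "Person Age in Years");
        DataSet pums = new DataSet("1990 PUMS", List.of(sex, age),
                new DataStore("PUMS extract", "Flat File"));
        Questionnaire longForm = new Questionnaire("Long Form",
                List.of(new Question("Sex", sex), new Question("Age", age)));
        Survey census = new Survey("Decennial Census",
                List.of(new SurveyInstance(1990, List.of(longForm), List.of(pums))));
        System.out.println(elementNames(census)); // prints [SEX, AGE]
    }
}
```

Records keep the sketch short; a real repository would also carry identifiers and versions on each entity.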
Definitions
• Administered Component
  - An object requiring naming, identification, configuration, security, and optionally, registration
  - Has one or more designations (names)
  - Has one or more definitions
• Classified Component
  - An object that may be classified as a part of a classification scheme
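The two definitions can be sketched as Java types. This is an illustration only: the class names mirror the slide, while the field choices (a version number standing in for configuration, a security label, an optional registration status, and "/"-separated classification paths) are assumptions.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Objects;
import java.util.Optional;
import java.util.Set;

// Administered Component: naming, identification, configuration, security,
// and optionally, registration.
abstract class AdministeredComponent {
    private final String id;                                     // identification
    private final List<String> designations = new ArrayList<>(); // one or more names
    private final List<String> definitions = new ArrayList<>();  // one or more definitions
    private final String securityLevel; // assumed stand-in for "security"
    private int version = 1;            // assumed stand-in for "configuration"
    private String registrationStatus;  // optional, so it may stay null

    protected AdministeredComponent(String id, String designation, String definition,
                                    String securityLevel) {
        this.id = Objects.requireNonNull(id);
        this.securityLevel = Objects.requireNonNull(securityLevel);
        // Requiring a first name and definition up front enforces "one or more".
        addDesignation(designation);
        addDefinition(definition);
    }

    final void addDesignation(String name) { designations.add(Objects.requireNonNull(name)); }
    final void addDefinition(String text) { definitions.add(Objects.requireNonNull(text)); }
    final void register(String status) { registrationStatus = Objects.requireNonNull(status); }
    final void revise() { version++; }

    final String id() { return id; }
    final String securityLevel() { return securityLevel; }
    final int version() { return version; }
    final List<String> designations() { return List.copyOf(designations); }
    final List<String> definitions() { return List.copyOf(definitions); }
    final Optional<String> registrationStatus() { return Optional.ofNullable(registrationStatus); }
}

// Classified Component: may be classified as part of a classification scheme.
// Scheme nodes are written here as "/"-separated paths (an assumption).
interface ClassifiedComponent {
    Set<String> classifications();

    // True if any classification is the node itself or lies beneath it.
    default boolean isClassifiedUnder(String node) {
        return classifications().stream()
                .anyMatch(path -> path.equals(node) || path.startsWith(node + "/"));
    }
}

// In this sketch a data element is both administered and classified.
class RegisteredDataElement extends AdministeredComponent implements ClassifiedComponent {
    private final Set<String> classes = new LinkedHashSet<>();

    RegisteredDataElement(String id, String name, String definition) {
        super(id, name, definition, "public");
    }

    void classify(String path) { classes.add(path); }

    @Override
    public Set<String> classifications() { return Set.copyOf(classes); }
}
```

Splitting the two roles lets other CMR objects (surveys, data sets, products) reuse them independently.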
CMR Meta-Model High Level
Figure: the basic CMR meta-model relationships (Survey, Survey Instance, Questionnaire, Question, Product, Data Set, Data Element, Data Store) shown together with Administered Component and Classified Component.
Generating a Census Bureau Taxonomy
+ Census Bureau Information
  + Demographic
    + Census
      - 1990 Census
      + 2000 Census
        - Questionnaires
        - Products
        + Datasets
          - Public Use Microdata Sample
          - 100% Edited Detail File
          + Sample Edited Detail File
            - Data Elements
            - Related Information
    - Survey
  - Economic
  - Geographic
  + Data Elements
    + Basic Demographic
      - Relationship
      + Sex
        - Alternative Designations
        - Alternative Definitions
        - Data Element Concept
        - Conceptual Domain
        - Value Domain
        - Related Data Elements
        - Related Information
      - Age
      - Race
      - Marital Status
      - Occupation/Employment
    - Housing
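A tree like the one above could be generated from parent/child pairs held in a classification scheme. The sketch below assumes node names are unique and the pairs contain no cycles, and reads the slide's markers as "+" for a node with children and "-" for a leaf; the TaxonomyTree class is hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical taxonomy builder: nodes are keyed by name, so names must be unique.
class TaxonomyTree {
    // Children of each parent, kept in insertion order.
    private final Map<String, List<String>> children = new LinkedHashMap<>();

    void add(String parent, String child) {
        children.computeIfAbsent(parent, k -> new ArrayList<>()).add(child);
    }

    // Depth-first rendering: two spaces per level, "+" for nodes with children, "-" for leaves.
    List<String> render(String root) {
        List<String> lines = new ArrayList<>();
        walk(root, 0, lines);
        return lines;
    }

    private void walk(String node, int depth, List<String> lines) {
        List<String> kids = children.getOrDefault(node, List.of());
        lines.add("  ".repeat(depth) + (kids.isEmpty() ? "- " : "+ ") + node);
        for (String kid : kids) {
            walk(kid, depth + 1, lines);
        }
    }

    public static void main(String[] args) {
        TaxonomyTree tree = new TaxonomyTree();
        tree.add("Census Bureau Information", "Demographic");
        tree.add("Census Bureau Information", "Economic");
        tree.add("Demographic", "Census");
        tree.add("Census", "1990 Census");
        tree.add("Census", "2000 Census");
        tree.render("Census Bureau Information").forEach(System.out::println);
    }
}
```

A production version would key nodes by registry identifier rather than display name, since the same label ("Data Elements", "Related Information") appears at several places in the tree.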
Architecting a CMR
CMR Component Based Architecture
Figure components: Metadata Repository Physical Storage Layer; COTS Integrated Products; Object Layer; Admin Tools; Browsing Tools; Metadata Interchange Load/Unload; Browser User Interface; External Systems; Security Framework.
• Flexible, functional, open, standards-based, component-based architecture
• Reuse components
• Swap components
• Minimize change impacts
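The "swap components" goal comes down to tools depending on an interface rather than a concrete store. In this hypothetical Java sketch, MetadataStore, its two implementations, and BrowsingTool are illustrative names, not CMR classes.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.TreeMap;

// Storage-layer contract (hypothetical). Tools see only this interface.
interface MetadataStore {
    void put(String name, String definition);
    Optional<String> definitionOf(String name);
}

// One storage component: exact-match names backed by a HashMap.
class HashMetadataStore implements MetadataStore {
    private final Map<String, String> entries = new HashMap<>();
    @Override public void put(String name, String definition) { entries.put(name, definition); }
    @Override public Optional<String> definitionOf(String name) { return Optional.ofNullable(entries.get(name)); }
}

// A drop-in replacement: case-insensitive names backed by a TreeMap.
class CaseInsensitiveMetadataStore implements MetadataStore {
    private final Map<String, String> entries = new TreeMap<>(String.CASE_INSENSITIVE_ORDER);
    @Override public void put(String name, String definition) { entries.put(name, definition); }
    @Override public Optional<String> definitionOf(String name) { return Optional.ofNullable(entries.get(name)); }
}

// A browsing tool written only against the interface, so either store can be plugged in.
class BrowsingTool {
    private final MetadataStore store;

    BrowsingTool(MetadataStore store) { this.store = store; }

    String describe(String name) {
        return name + ": " + store.definitionOf(name).orElse("(not registered)");
    }
}
```

Replacing one store with the other changes no line of BrowsingTool, which is the "minimize change impacts" point in miniature.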
Proposed Technical/Software Architecture
Four Ways an Application Can Use CMR Metadata (ranging from tightly coupled with CMR to loosely coupled with CMR)
1. Application written against CMR - uses it directly for metadata access and maintenance.
2. Application uses same CMR core physical model - can replicate metadata from CMR.
3. Application communicates with CMR through an API to exchange metadata.
4. Application communicates with CMR using a standard XML-based metadata interchange.
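Option 4 can be illustrated with the JDK's built-in DOM and transformer APIs. The `<DataElement>`, `<Definition>`, and `<Format>` element names are hypothetical; the slides do not give the CMR interchange schema.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical XML interchange for a single data element.
class XmlInterchange {
    record DataElement(String name, String definition, String format) {}

    // Sending side: serialize as <DataElement name="..."> with child elements.
    static String toXml(DataElement de) throws Exception {
        Document doc = newFactory().newDocumentBuilder().newDocument();
        Element root = doc.createElement("DataElement");
        root.setAttribute("name", de.name());
        root.appendChild(textElement(doc, "Definition", de.definition()));
        root.appendChild(textElement(doc, "Format", de.format()));
        doc.appendChild(root);

        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    // Receiving side: parse the same format back into a data element.
    static DataElement fromXml(String xml) throws Exception {
        Document doc = newFactory().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        Element root = doc.getDocumentElement();
        return new DataElement(
                root.getAttribute("name"),
                root.getElementsByTagName("Definition").item(0).getTextContent(),
                root.getElementsByTagName("Format").item(0).getTextContent());
    }

    private static Element textElement(Document doc, String tag, String text) {
        Element e = doc.createElement(tag);
        e.setTextContent(text);
        return e;
    }

    private static DocumentBuilderFactory newFactory() throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        // Secure processing limits entity expansion when parsing input from other systems.
        factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
        return factory;
    }
}
```

Because both sides agree only on the document format, the receiving application needs no access to the CMR's tables or API, which is what makes this the most loosely coupled of the four options.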
CMR Tools
Figure: the Corporate Metadata Repository (CMR Core Meta-Models) with Web-enabled Administration Tools, Open Java API, Web-enabled Browsing Tools, Open XML Interchange, and Integrated Portal Web Site Builder.
CMR Extensibility
Figure: the Corporate Metadata Repository core (CMR Core Meta-Models, Web-enabled Administration Tools, Open Java API, Web-enabled Browsing Tools, Open XML Interchange, Integrated Portal Web Site Builder) extended with a CMR Extended Meta-Model and CMR Extended Tools, API, and Interchange.
S/W Requirements
• Scalable
• Provides for open API and Interchange
• Implements Standards
  - ISO/IEC 11179
  - FGDC CSDGM
  - Dublin Core
• COTS preferred, if meets requirements
• High productivity development tools
• Self-documenting, easy to maintain app
CMR S/W for Deployment & Development
Software: Used for
• Oracle8i EE V8.1.6: CMR Physical Repository
• interMedia: Structured and Full-text Metadata
• OAS V4.0.8.1 (upgrading to iAS): CMR Web Server
• WebDB V2.2 (upgrading to Oracle 9i Portal): CMR Web Portal
• Designer6i: CMR Server Modeling. CMR Web Application Generation plus some PL/SQL coding.
• JDeveloper V3.1: CMR Java API and XML application development (BC4Js & JSPs)
• Oracle XDK & MS Notepad: CMR XML generation, parsing, processing, & upload/download from database tables
• Rational Rose 2000: CMR UML Modeling
Designer Generated CMR Tools
Figure. Logical models: Functional Requirements, Use Cases, UML Object Model. Physical models: Server Model, Web Modules. Server tier deployment: CMR Repository, View Layer, TAPI (PL/SQL). Middle tier deployment: OAS Environment w/ PL/SQL Cartridge & HTTP Listeners, running PL/SQL generating HTML & JS Application Code, connected to the server tier via Net8. Client tier deployment: Web Browser HTML Application, connected via HTTP. Legend: Created/Generated using Oracle Designer; Hand coded; Created/Generated using Rational Rose.
JDeveloper Generated Java API
Figure. Logical models: Functional Requirements, Use Cases, UML Object Model (Rational Rose). Physical models: Server Model (Rational Rose Generated, Oracle Designer Maintained). Server tier deployment: CMR Repository and CMR View Layer (Designer Generated), accessed via JDBC. Middle tier deployment: CMR Open API, a Java Object Layer (BC4J), JDeveloper Generated, running in OAS. Client tier deployment: BOC Java Applet or Application, BOC Java Server Pages, and DER XML Application, connected via HTTP.
Leveraging a CMR
Metadata for Dissemination
Data
ID  WGT  SEX  AGE  MARITAL
1   5    0    45   2
2   7    1    5    0
3   2    1    90   5
4   2    0    0    0
...
5   7    1    23   1
6   3    0    37   4
7   4    0    14   0
8   2    0    75   2
Metadata
Survey/Census: 1990 Decennial Census
Source: Bureau of the Census
Dataset: 1990 Public Use Microdata Sample (PUMS)
Description: The PUMS dataset has basic demographic information about persons and housing in the U.S. This information comes from the 1990 Decennial Census long form, which is randomly sent to 1 in every 7 households. This dataset is for public use and does not compromise the confidentiality of individuals.
Data Elements:
• ID - Record Identifier - A unique id for a record. Each record identifies 1 or more persons having the same demographic characteristics. (See WGT)
• WGT - Person Weight - A weight given to a record to represent the 1 or more persons with the same demographic characteristics. Valid values: 1..9
• SEX - Person Gender - Valid values (0: male, 1: female)
• AGE - Person Age in Years - Valid values (0-90). Persons over 90 years of age are top-coded with an age of 90 for confidentiality reasons.
• MARITAL - Person Marital Status - Valid values (0: not applicable, 1: single, 2: married, 3: separated, 4: divorced, 5: widowed). Universe: Persons over 15 years of age. Those 15 and under are given a value of 0.
For more information: Related Datasets and Publications, Sampling Errors and Techniques, etc.
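Because the metadata carries valid values, a consumer can check records against it, in the spirit of the "Data Quality Inspection" use of the CMR. This sketch hard-codes the ranges and the MARITAL universe rule from the PUMS example; in practice they would be read from the repository, which is an assumption here, as are the class and method names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical metadata-driven check of PUMS-style records.
class PumsValidator {
    record Range(int min, int max) {
        boolean contains(int value) { return value >= min && value <= max; }
    }

    // Column order of the example data: ID WGT SEX AGE MARITAL
    static final List<String> COLUMNS = List.of("ID", "WGT", "SEX", "AGE", "MARITAL");

    // Valid values copied from the example metadata (ID has no stated range).
    static final Map<String, Range> VALID = Map.of(
            "WGT", new Range(1, 9),
            "SEX", new Range(0, 1),
            "AGE", new Range(0, 90),
            "MARITAL", new Range(0, 5));

    // Returns the problems found; an empty list means the row passed.
    static List<String> check(int[] row) {
        List<String> problems = new ArrayList<>();
        if (row.length != COLUMNS.size()) {
            problems.add("expected " + COLUMNS.size() + " fields, got " + row.length);
            return problems;
        }
        for (int i = 0; i < row.length; i++) {
            Range range = VALID.get(COLUMNS.get(i));
            if (range != null && !range.contains(row[i])) {
                problems.add(COLUMNS.get(i) + "=" + row[i] + " outside "
                        + range.min() + ".." + range.max());
            }
        }
        // MARITAL universe rule: persons 15 and under are given a value of 0.
        int age = row[3];
        int marital = row[4];
        if (age <= 15 && marital != 0) {
            problems.add("MARITAL must be 0 for AGE " + age);
        }
        return problems;
    }
}
```

Every row in the sample data above passes; a row such as {9, 5, 2, 12, 2} would be flagged twice, once for SEX and once for the universe rule.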
CMR Support for American FactFinder
Figure: metadata for Data Elements, Data Sets, and Data Products flows between AFF Metadata Providers, the CMR, and AFF, exchanged as ASCII AFF files and XML CMR files.
AFF Metadata-Driven Architecture
Figure: the AFF Application Code, making run-time calls to the CMR/AFF Business & Technical Metadata, produces the AFF Metadata-Driven, Dynamic Application.
• Add metadata and data for a new dataset, and AFF can automatically search and query the new dataset
• Geography Trees, Datasets, Subjects, Report topics, etc. are all generated at runtime by accessing the metadata
• Business metadata is linked to technical metadata such that user selections are used to generate SQL statements to query the data
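The last point can be sketched as a lookup from business names to table and column names, followed by assembly of a parameterized query. QueryGenerator, the table sf1_2000, and its columns are hypothetical; AFF's actual tables and code are not shown in the slides.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical link from business metadata (what users pick) to technical metadata.
class QueryGenerator {
    record ColumnRef(String table, String column) {}

    private final Map<String, ColumnRef> links = new HashMap<>();

    void link(String businessName, String table, String column) {
        links.put(businessName, new ColumnRef(table, column));
    }

    // Turn the user's selections into a parameterized SELECT.
    String select(List<String> items, String geographyItem) {
        if (items.isEmpty()) {
            throw new IllegalArgumentException("select at least one item");
        }
        ColumnRef geo = resolve(geographyItem);
        List<String> columns = new ArrayList<>();
        for (String item : items) {
            ColumnRef ref = resolve(item);
            // Single-table queries only; joins would need more metadata than this sketch holds.
            if (!ref.table().equals(geo.table())) {
                throw new IllegalArgumentException(item + " is not in " + geo.table());
            }
            columns.add(ref.column());
        }
        // "?" is a bind placeholder for the chosen geography value (e.g. for a JDBC PreparedStatement).
        return "SELECT " + String.join(", ", columns) + " FROM " + geo.table()
                + " WHERE " + geo.column() + " = ?";
    }

    private ColumnRef resolve(String businessName) {
        ColumnRef ref = links.get(businessName);
        if (ref == null) {
            throw new IllegalArgumentException("no metadata for " + businessName);
        }
        return ref;
    }
}
```

Only identifiers registered in the metadata ever reach the SQL text, and the user's geography value is bound rather than concatenated, so user input is never spliced into the statement.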
CMR Support for Econ 2002 Census
Figure: the CMR exchanges metadata with Econ Metadata Providers, GIDS, AFF, and the EMR through ASCII AFF files, XML CMR files, Econ ACSD files, FGDC files, and XML Survey files. Label: 450 Econ Questionnaires.
Activating the CMR
Figure: the CMR meta-models (Data Element Registry, Data Set Registry, Product Registry, Data Store, Business Rule Registry, Workflow Framework, Security Framework, Configuration Mgmt Framework, Classification Schemes, Survey Registry) put to work in:
• Data Quality Inspection
• Survey Instrument Generation
• Product Generation
• Data Set Query Generation
• Taxonomy Tree Generation
Metadata: A core enabling component of any information technology
Figure: Data Warehousing & Decision Support; Legacy Migration; Data Query and Search; Data Integration; Application/Tool Integration; Enterprise Information Portal; Digital Libraries; e-Business; ERP; Knowledge Mgmt & Business Intelligence
Leveraging the CMR Data Element Registry
Data Element Registry
• Global Standardized Data Elements
• Agency Standardized Data Elements
• Non-Standardized Data Elements
• Integration Layer
Government Vision
Figure: agency data sources beneath the registry: BOC Demographic, Economic, Geographic Data; BLS Economic Data; USGS Geographic Data; HUD Housing Data; EPA Environmental Data; CDC Health Data; FAA Air Safety Data; NASA Aircraft Data; NCI Health Data; HCFA Health Data.
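One way to read the integration layer is as a lookup that resolves an agency's local element name to the most widely standardized element available. The two-step mapping, the "AGENCY:localName" key format, and all names in this Java sketch are assumptions.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical integration layer over the three registry tiers on the slide.
class DerIntegration {
    enum Tier { GLOBAL_STANDARD, AGENCY_STANDARD, NON_STANDARD }

    record Resolution(String element, Tier tier) {}

    // "AGENCY:localName" -> agency standardized element (key format assumed)
    private final Map<String, String> localToAgency = new HashMap<>();
    // agency standardized element -> global standardized element
    private final Map<String, String> agencyToGlobal = new HashMap<>();

    void mapLocal(String agency, String localName, String agencyElement) {
        localToAgency.put(agency + ":" + localName, agencyElement);
    }

    void mapAgency(String agencyElement, String globalElement) {
        agencyToGlobal.put(agencyElement, globalElement);
    }

    // Resolve to the most widely standardized element available.
    Resolution resolve(String agency, String localName) {
        String agencyElement = localToAgency.get(agency + ":" + localName);
        if (agencyElement == null) {
            return new Resolution(localName, Tier.NON_STANDARD);
        }
        String global = agencyToGlobal.get(agencyElement);
        return global == null
                ? new Resolution(agencyElement, Tier.AGENCY_STANDARD)
                : new Resolution(global, Tier.GLOBAL_STANDARD);
    }
}
```

Elements that no agency has mapped still resolve, but are reported as non-standardized, so callers can see which tier a comparison across agencies rests on.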
DER Integration Technology
Figure: the DER and Metadata Repository at the center of Legacy Migration and of DW and Analytics. Sources: Legacy Data, Source Flat Files, External FF Data Sources. Steps: Extract, Transform, Quality Check, Load; Staging DB. Targets: OLTP DB, Data Warehouse, Data Marts, Multi-Dimensional Cubes, Exports, and Web Deployment (Information Portals, E-Commerce Apps).
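The Transform and Quality Check steps are where DER metadata plugs in. This sketch assumes a hypothetical mapping from legacy sex codes ("M"/"F") to the standardized codes from the PUMS example (0: male, 1: female), and rejects values outside those valid values; the class and field names are illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical legacy-migration pass for one element (person sex).
class LegacyMigration {
    // Transform rule: legacy code -> standardized code (mapping assumed for illustration).
    static final Map<String, String> SEX_CODE_MAP = Map.of("M", "0", "F", "1");
    // Quality-check rule: standardized valid values (0: male, 1: female).
    static final Set<String> SEX_VALID = Set.of("0", "1");

    record Result(List<String> loaded, List<String> rejected) {}

    // The input list stands in for values already extracted from a legacy source.
    static Result run(List<String> extracted) {
        List<String> loaded = new ArrayList<>();
        List<String> rejected = new ArrayList<>();
        for (String raw : extracted) {
            String trimmed = raw.trim();
            // Transform: map legacy codes; values already in standard form pass through.
            String value = SEX_CODE_MAP.getOrDefault(trimmed, trimmed);
            // Quality check, then load (here, just collect) or reject the original value.
            if (SEX_VALID.contains(value)) {
                loaded.add(value);
            } else {
                rejected.add(raw);
            }
        }
        return new Result(loaded, rejected);
    }
}
```

Keeping the mapping and valid values as data rather than code is the point: when the registry changes, the migration picks up the new rules without being rewritten.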
Questions