Page 1: MetaDater

Metadata Management and Production System for Surveys in Empirical Socio-economic Research

A project funded by the EU under the 5th Framework Programme, Key Action: Improving the Socio-economic Knowledge Base (Data infrastructure), HPSE-CT-2002-00122

Project period 01.01.2003 – 31.12.2005

MetaDater: Towards Standards and Tools to document comparative surveys

Internet: www.metadater.org

Ekkehard Mochmann (Project Co-ordinator)

Uwe Jensen (Project Manager)

GESIS / ZA University of Cologne

Edinburgh, May 27, 2005

Page 2: Objectives & Products

• Standards for metadata
– Data model for metadata covering the life-cycle of complex comparative surveys on the space & time axes (no panel surveys)

• Efficient user system
– For data collectors and providers to produce, manage & exchange metadata according to the new standards

Page 3: Project Status 2003 – 05.2005

1. User analysis
– Data archive / Researcher & field work institute

2. Data modelling
– Design of a unique data model based on analysis of events & processes in the life-cycle of comparative studies
– Conceptual data models > Relational data model (2nd edition)
– Provision & evaluation of the data model by DDI / CSDI

3. System architecture & functional core
– Design of a 3-tier system (GUI, application server, RDBMS)
– Programming in Java

4. Application development & programming
– Modular system for data provider & collector

5. Usability
– Preparation of a test database & tests

6. Dissemination activities

Page 4: 1. User Analysis - Results

Archive: analysis of work sequences & systems

Common shared aspects
• Acquisition > Entry control & Study documentation > Archiving > Dissemination

Differences in detail (archives; studies)
• Complexity: cross-section studies > comparative studies
• Technical & functional differences in the systems used

Researchers & research institutes: capture of metadata at first occurrence
• General interest vs. missing concepts or tools

Specific interests given
• Capture of metadata on study design & field work
• Support for questionnaire development / question DB
• Easy exchange of metadata between researchers & archives

Page 5: 2. Data modelling of metadata

1. Conceptual Data Model: major life-cycle phases of studies

   1. Study design and documentation
   2. Questionnaire development & testing
   3. Sample selection & field work
   4. Data & documentation control – data processing of cross-section
   5. Study extension, harmonization & standardization
   6. Indexing & classification
   7. Data and metadata publication & distribution
   8. Analysis, correction and update of metadata documentation

Recent discussions & evaluation in the DDI & CSDI network
• The study life-cycle is adopted in DDI data modelling; DDI version 3 shall be compatible
• After evaluation, provision to interested users

2. Relational Data Model
• Relational schema of metadata & metadata management by RDBMS
• Compatible with DDI v2.0 > extension for comparative studies
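As an illustration of what a relational schema for study metadata might look like, here is a minimal sketch using Python's built-in sqlite3 module. The table and column names are hypothetical and chosen for this example only; they are not taken from the MetaDater data dictionary, and treating country as the space axis and year as the time axis is an assumption.

```python
import sqlite3

# Hypothetical tables for illustration; not the MetaDater schema.
SCHEMA = """
CREATE TABLE study (
    study_id INTEGER PRIMARY KEY,
    title    TEXT NOT NULL
);
CREATE TABLE dataset (
    dataset_id INTEGER PRIMARY KEY,
    study_id   INTEGER NOT NULL REFERENCES study(study_id),
    country    TEXT,    -- space axis (assumption)
    year       INTEGER  -- time axis (assumption)
);
CREATE TABLE variable (
    variable_id INTEGER PRIMARY KEY,
    dataset_id  INTEGER NOT NULL REFERENCES dataset(dataset_id),
    name        TEXT NOT NULL,
    label       TEXT
);
"""


def build_demo_db():
    """Create an in-memory database with one study and three single datasets."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO study VALUES (1, 'Demo comparative study')")
    conn.executemany(
        "INSERT INTO dataset VALUES (?, 1, ?, ?)",
        [(1, "DE", 2002), (2, "GB", 2002), (3, "DE", 2003)],
    )
    conn.executemany(
        "INSERT INTO variable VALUES (?, ?, ?, ?)",
        [(1, 1, "v1", "Sex"), (2, 2, "v1", "Sex"), (3, 3, "v1", "Sex")],
    )
    return conn


def datasets_per_country(conn):
    """Count the single datasets held for each country."""
    cursor = conn.execute(
        "SELECT country, COUNT(*) FROM dataset GROUP BY country ORDER BY country"
    )
    return dict(cursor.fetchall())
```

Keeping the dataset as its own table between study and variable is what lets one study hold several samples across space and time.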

Page 6: 3. System Architecture

3-tier system
• User interface for the modules
• Application server
• Relational database management system
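The separation of the three tiers can be sketched as follows. This is a toy illustration in Python (the project itself was programmed in Java); all class and function names are hypothetical, and a dictionary stands in for the RDBMS.

```python
class MetadataStore:
    """Data tier: stands in for the RDBMS (a dict, for illustration only)."""

    def __init__(self):
        self._studies = {}

    def save(self, study_id, record):
        self._studies[study_id] = dict(record)

    def load(self, study_id):
        return dict(self._studies[study_id])


class ApplicationServer:
    """Middle tier: validation and rules; the only caller of the store."""

    def __init__(self, store):
        self._store = store

    def register_study(self, study_id, title):
        if not title.strip():
            raise ValueError("a study needs a title")
        self._store.save(study_id, {"title": title.strip()})

    def study_title(self, study_id):
        return self._store.load(study_id)["title"]


def render_study(server, study_id):
    """Presentation tier: formats what the application server returns."""
    return f"Study {study_id}: {server.study_title(study_id)}"
```

The user interface never touches the store directly, so the database can be exchanged without changing the modules.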

Page 7: 4. User Modules & functional components

Overview:
M1 Study description
M2 Q/V editor – define standard as a reference
M3 Process single data set (cross-section; comparative study)
M4 Process extension: harmonize & standardise comparative data
M5 Import & Export of XML files or SPSS metadata

Page 8: Module M1 Study Description (cross-section dataset)

Study description with metadata sections on Project > Study > Data set > Design – Questionnaire – Data – Context – Classification

[Screenshot of the "Study description" window. Labels recovered from the image:]
• Menu bar: File, Edit, Window, Help
• Folders: Study description - study section; Study description - dataset section; Study description - related files (preparation for p 3.5)
• Citation - title statement: Title, Subtitle, Alternative title, Parallel title, ID number
• Citation - version statement / Production statement: Version, Responsibility, Date of production, Producer, Copyright, Place of production
• Study info - summary: Universe, Kind of data
• Project statement: Project title, Project id
• Further labels: Authoring entry / Primary investigator, Date of deposit, Status of the study, Time method, Language of study
• Study selection: New Study; Another open study (id)

The right panel illustrates the view of the "citation - title statement" folder.
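The title statement fields shown on this screen could be exported to XML roughly as follows. This is a sketch only: the element names are taken to be those of the DDI 2 codebook and should be checked against the DTD, and the function and dictionary keys are hypothetical.

```python
import xml.etree.ElementTree as ET

# Screen field -> element name (assumed DDI 2 names; verify against the DTD).
TITLE_FIELDS = [
    ("title", "titl"),
    ("subtitle", "subTitl"),
    ("alternative_title", "altTitl"),
    ("parallel_title", "parTitl"),
    ("id_number", "IDNo"),
]


def title_statement_xml(fields):
    """Serialise the filled-in title statement fields; empty fields are skipped."""
    code_book = ET.Element("codeBook")
    stdy_dscr = ET.SubElement(code_book, "stdyDscr")
    citation = ET.SubElement(stdy_dscr, "citation")
    titl_stmt = ET.SubElement(citation, "titlStmt")
    for key, tag in TITLE_FIELDS:
        value = fields.get(key)
        if value:
            ET.SubElement(titl_stmt, tag).text = value
    return ET.tostring(code_book, encoding="unicode")
```

Calling `title_statement_xml({"title": "Demo study", "id_number": "X1"})` yields a string with only the `titl` and `IDNo` elements inside `titlStmt`.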

Page 9: Module M2 Q/V editor

Basic task: define the standard for a study or a study collection
– Capture or edit questions from the (basic) questionnaire
– Define the dependent variables
  • add variables to the data file structure (standard setup)
– Organise, route & filter, index & classify questions and variables

Principal options for working on questions with the Q/V editor:
– Import a (basic) questionnaire or questions into the DB
  • XML-compatible & provided by a project; from an XML codebook
– Copy from the database
  • complete basic questionnaire (e.g. replicate ISSP)
  • single questions from a previous wave (e.g. EB trend questions)
– Capture questions from scratch with the Q/V editor

Q/V screens to capture different question types:
– Question type 1: Single response on simple question
– Question type 2: Single response on question with items
– Question type 3: Multiple response by answer instances (1st item; 2nd item)
– Question type 4: Multiple response by answer categories (mentioned / not mentioned)
– Question type 5: Multiple response on single items of a question (grid)

Page 10: Module M2 Q/V editor / example

Question type 4: Multiple response on Question with Items

Step 1: > select Question type > enter first / last Item No. …
> CREATE activates the Question / Variable pane
> offers the respective number of table lines for
- Items 1 to 17
- dependent variables

Step 2: enter Question text … & Variable definitions

Step 3: next screen
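What CREATE does in step 1 can be pictured with a small sketch: from the first and last item number it prepares one empty table line per item, each with its dependent variable. The function name, the row layout and the variable naming pattern are hypothetical.

```python
def create_item_rows(question_id, first_item, last_item):
    """Return one empty table line per item, each tied to one dependent variable."""
    if first_item > last_item:
        raise ValueError("first item number must not exceed last item number")
    return [
        {
            "item": number,
            "question_text": "",  # filled in by the user in step 2
            "variable": f"{question_id}_{number}",  # hypothetical naming pattern
        }
        for number in range(first_item, last_item + 1)
    ]
```

For the example on this slide, `create_item_rows("q5", 1, 17)` gives 17 lines, from item 1 to item 17.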

Page 11: Module M2 Q/V editor / example

Question type 4: Multiple response on Question with Items

Step 3: at tab Value / Missing > select valid & missing values
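One way to represent the outcome of this step is an answer scale whose categories are flagged as valid or missing. The sketch below is an assumption for illustration; the class name, codes and labels are invented and not taken from the MetaDater screens.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Category:
    code: int
    label: str
    missing: bool = False  # True marks a missing value


# Hypothetical scale for a "mentioned / not mentioned" variable.
MENTIONED_SCALE = [
    Category(0, "Not mentioned"),
    Category(1, "Mentioned"),
    Category(9, "No answer", missing=True),
]


def valid_codes(scale):
    """Codes that count as valid answers."""
    return [category.code for category in scale if not category.missing]


def missing_codes(scale):
    """Codes declared as missing values."""
    return [category.code for category in scale if category.missing]
```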

Page 12: Application design & programming

M3 – Process single data set (cross-section; comparative study) with reference to the Q/V standard
– Document deviations in question wording or demographic items
– Process metadata of the single datasets
– Load SPSS data and adjust them to the standard
– Technical standardisation of space- or time-bound variables
– Adjust the variables to the standard (recode, rename, filtering etc.)

M4 – Process extension: harmonize & standardise comparative data
– Integrated dataset (space compound datasets) > e.g. ISSP
– Cumulated dataset (time compound datasets) > e.g. ALLBUS
– Cumulated-integrated dataset > e.g. EB Trend file
– Generalized dataset harmonization

Index & classify on study and variable level

M5 – Import & Export of XML- or SPSS-based metadata
– Metadata exchange & publications (XML) > SD, CB, Quest.
– SPSS metadata: portable / syntax / setup
– Access via WWW front-ends (MADIERA; local systems)

Preparation of usability testing & user workshops

Other modules to come in 2005
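The M3 step "adjust the variables to the standard (recode, rename, filtering)" can be sketched as below. This is a simplified illustration on plain dictionaries; the function name and the shape of the two mapping arguments are hypothetical.

```python
def adjust_to_standard(records, rename, recode):
    """Bring the records of a single dataset in line with the Q/V standard.

    rename: {source variable name: standard variable name}
    recode: {standard variable name: {source value: standard value}}
    """
    adjusted = []
    for record in records:
        row = {}
        for name, value in record.items():
            if name not in rename:
                continue  # filtering: variables outside the standard are dropped
            standard_name = rename[name]
            # Values without a recode rule are carried over unchanged.
            row[standard_name] = recode.get(standard_name, {}).get(value, value)
        adjusted.append(row)
    return adjusted
```

Keeping the rename and recode rules as data rather than code means the same rules can be stored as metadata and documented alongside the dataset.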

Page 13: The end of the presentation

Page 14: 2. Data modelling of metadata

1. Conceptual Data Model
• Life-cycle of studies from the concept > dissemination
• Relational schema of metadata
• Compatible with DDI v2.0 > extension for comparative studies
• After evaluation, provision to interested users

2. Relational Data Model
• Model implemented at the RDBMS; DD & ERD
• Recent evaluation in the DDI network
• Study life-cycle is adopted in DDI data modelling
• DDI version 3 shall be compatible
• Detailed analysis is in progress in DDI expert groups

Page 15: 2.1 Conceptual Model II. Behavioural model

Main process groups along the study life-cycle

1. Study design and documentation: Concept, Questionnaire, Field work, Data / Study description

2. Contact and Data acquisition

3. Data & documentation control – Process Cross-section

4. Study extension, harmonization & standardization

5. Indexing & Classification

6. Data and Metadata publication & distribution

7. Correction and Update of Metadata Documentation

Page 16: Conceptual model - Concepts I

Model for simple & replicated surveys

Simple cross-section: Project > Study with > one design > one set of data > a set of Variables > Values

Repeated over space and time: the basic object, the dataset, requires a proper definition to be managed by the system

• single dataset = one sample at one time
• compound dataset = collection of datasets > many sets of data
– collected in different time instances (time compound)
– collected in different spaces (space compound)
• processed dataset = integration of single datasets into a new one
– cumulative (covering different time instances)
– integrated (covering different space instances)
– cumulated-integrated (covering time and space instances)
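The distinction between the three kinds of processed dataset can be made concrete with a short sketch. It assumes, for illustration, that a space instance is a country and a time instance is a year; the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SingleDataset:
    """One sample at one time."""

    country: str  # space instance (assumption: space = country)
    year: int  # time instance (assumption: time = year)


def classify_processed(datasets):
    """Name the processed-dataset type implied by the single datasets combined."""
    countries = {d.country for d in datasets}
    years = {d.year for d in datasets}
    if len(countries) > 1 and len(years) > 1:
        return "cumulated-integrated"
    if len(years) > 1:
        return "cumulative"
    if len(countries) > 1:
        return "integrated"
    raise ValueError("need datasets that differ in space or time")
```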

Page 17: Conceptual model - Concepts II

Types of Metadata (app 7)

• Definitional
– Data definition, methodology, measurement

• Operational
– Researchers, use of data

• Contextual
– Social background; party information … Documents

• Classification
– Study, variable, question level by a variety of vocabularies

• Administrative
– Embargo, access categories, responsibility to change data …

Page 18: Conceptual model - Concepts III

Integration & Cumulation processes (app 8)

• Cumulation (e.g. ISSP)

• Time series (e.g. Politbarometer; ALLBUS)

• Trends (Eurobarometer)

• Harmonisation of different datasets
– involved datasets already in the system

• Harmonisation over space or time

• Procedures: ex post overall, or adding a new dataset

• Integration of collections of studies into the system
– Import already processed metadata into the system

Page 19: Conceptual model - Concepts IV

Questionnaire documentation & Question DB (app 9)

The Questionnaire

Questionnaire ‘editor’

Import & export questionnaires

Indexing of questions & questionnaire

The questions – to be classified

Question types

Question forms

Relation between dataset and questionnaire & questions

Page 20: Conceptual model - Concepts V

• Documentation object (DO): the DOID (document object id, ~ ISBN) uniquely identifies every object in the system

Assign administrative attributes to the DO (not to each basic entity)
• Attributes like: who, when, release, permission level

Subtypes of document objects
• Study
• Data set
• Publication
• Classification
• Object of observation
• Attribute
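The idea of attaching administrative attributes to the documentation object rather than to each basic entity can be sketched as follows. The field names, default values and the simple counter used to issue DOIDs are assumptions made for this example.

```python
from dataclasses import dataclass, field
from datetime import date
from itertools import count

_next_doid = count(1)  # hypothetical DOID generator: a running number


@dataclass
class DocumentObject:
    """Carries the administrative attributes once, for any subtype."""

    kind: str  # e.g. "Study", "Data set", "Publication"
    created_by: str  # who
    created_on: date = field(default_factory=date.today)  # when
    release: str = "draft"
    permission_level: str = "restricted"
    doid: int = field(default_factory=lambda: next(_next_doid))


@dataclass
class Study:
    """A basic entity: holds its own content plus a reference to its DO."""

    title: str
    document: DocumentObject
```

A `Study` then needs no administrative fields of its own; a check such as `study.document.permission_level` works the same way for every subtype.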

Page 21: 2.2 Relational Model: E-R-D schema & DD

Page 22: 3. System Architecture

3-tier system
• User interface for the modules
• Application server
• Relational database management system

Page 23: 4.1 Modules – M3 Q/V editor / example 1

Question type 1: Single response on simple Question

Step 1: select Question type and define … Open empty fields > step 2

Step 2: enter Question and Variable information

Step 3: search and select Answer scale / missing values

Page 24: 4.1 Modules – M3 Q/V editor / example 2

Question type 2: Single response on Question with items

Step 1: select Question type and define … Open empty fields > step 2

Step 2: enter Question and Variable information

Step 3:
• Select tab for the Answer text / missing value
• Search and select Answer scale / missing values

Result > next screen

Page 25: 4.1 Modules – M3 Q/V editor / example 2

Question type 2: Single response on Question with items

Step 3: tab with selected Answer scale / missing values