Organising social science data – computer science perspectives
Simon Jones
Computing Science and Mathematics
University of Stirling, Stirling, Scotland, UK
Seminar: Data management in the social sciences and the contribution of the DAMES Node
Stirling 31 January 2012
DAMES: Data Management through e-Social Science
http://www.dames.org.uk

DAMES: Background
• DAMES: Case studies, provision and support for data management in the social sciences
• This talk: focusing on "support for data management"
  • Infrastructure/tools
• Driven by social science needs for support for advanced data management operations
  • “In practice, social researchers often spend more time on data management than any other part of the research process” (Lambert)
• A ‘methodology’ of data management is relevant to ‘harmonisation’, ‘comparability’, ‘reproducibility’ in quantitative social science

DAMES: Themes
• Enabling the (social science) researcher:
  • To deposit, search and process heterogeneous data resources
  • To access online services/‘tools’ that enable researchers to carry out repeatable and challenging data management techniques such as:
    • fusion
    • matching
    • imputation …
• Facilitating access is an important goal
• Underlying computer science research themes
  • Metadata
  • Data curation
  • Data management/processing
  • Portals

Data management/processing scenarios
• Curation scenarios include:
  • Uploading occupational data to distribute across academic community
  • Recording data properties prior to undertaking data fusion involving a survey and an aggregate dataset
• Fusion scenarios include:
  • Linking a micro-social survey with aggregate occupational information (deterministic link)
  • Enhancing a survey dataset with ‘nearest match’ explanatory variables (probabilistic link)
• Other processes: recoding, operationalising, linking, cleaning…
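
The two fusion scenarios can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the records, the variable names (`occ_code`, `mean_income`, `job_satisfaction`) and the function names are invented, and the "nearest match" step is a plain nearest-neighbour lookup on unscaled variables, standing in for whatever probabilistic method a real fusion tool would use.

```python
from math import inf

# Hypothetical micro-social survey records (all values invented).
survey = [
    {"id": 1, "occ_code": "2311", "age": 34, "hours": 38},
    {"id": 2, "occ_code": "5223", "age": 51, "hours": 45},
    {"id": 3, "occ_code": "9999", "age": 27, "hours": 20},
]

# Hypothetical aggregate occupational information, keyed by occupation code.
occupation_info = {
    "2311": {"mean_income": 41000},
    "5223": {"mean_income": 29000},
}

# Hypothetical donor dataset carrying an extra explanatory variable.
donor = [
    {"age": 30, "hours": 40, "job_satisfaction": 5},
    {"age": 55, "hours": 44, "job_satisfaction": 3},
    {"age": 25, "hours": 18, "job_satisfaction": 4},
]


def deterministic_link(records, lookup, key):
    """Attach aggregate values by exact key match; unmatched records get None."""
    linked = []
    for rec in records:
        extra = lookup.get(rec[key])
        merged = dict(rec)  # copy, so the source records stay untouched
        merged["mean_income"] = extra["mean_income"] if extra else None
        linked.append(merged)
    return linked


def nearest_match(records, donors, common, target):
    """Copy `target` from the donor closest on the `common` variables
    (squared Euclidean distance, no scaling)."""
    enhanced = []
    for rec in records:
        best, best_dist = None, inf
        for d in donors:
            dist = sum((rec[v] - d[v]) ** 2 for v in common)
            if dist < best_dist:
                best, best_dist = d, dist
        merged = dict(rec)
        merged[target] = best[target]
        enhanced.append(merged)
    return enhanced


linked = deterministic_link(survey, occupation_info, "occ_code")
enhanced = nearest_match(survey, donor, ["age", "hours"], "job_satisfaction")
```

The deterministic link either finds an exact key or leaves a gap, whereas the nearest-match step always returns a value, so a real tool would also need to record how close each match was.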

Generic data flows
[Diagram: a data set store and a processing component, with the following flows]
• Data sets are deposited
• Data sets are selected
• Processing is configured
• Result is saved
Data set selection and the configuration of processing jobs must be informed by knowledge about the data sets (metadata)

Key role for metadata
• Metadata records are absolutely core to the functioning of the portal infrastructure
  • For adequate, searchable records for the heterogeneous resources (data tables, command files, notes and documentation)
  • To connect the resources and the data management tools
  • To document the data sets resulting from application of the data management tools: inputs, process, rationale,…
• DAMES requirements:
  • (Micro-)data based, very general
  • DDI (= Data Documentation Initiative)

DDI 2 – An XML language
<ddi2:codeBook xmlns:ddi2="http://www.icpsr.umich.edu/DDI">
  <ddi2:docDscr>
    <ddi2:citation>
      <ddi2:titlStmt>
        <ddi2:titl>An interesting study</ddi2:titl>
        <ddi2:IDNo agency="DAMES-M">12</ddi2:IDNo>
      </ddi2:titlStmt>
      <ddi2:prodStmt>
        <ddi2:producer>DAMES Portal</ddi2:producer>
        <ddi2:copyright>Univ of Stirling</ddi2:copyright>
        <ddi2:prodDate>July 29, 2010</ddi2:prodDate>
        <ddi2:grantNo source="Financial_1" agency="Economic and Social Research Council">
          RES-149-25-1066
        </ddi2:grantNo>
      </ddi2:prodStmt>
    </ddi2:citation>
  </ddi2:docDscr>
  ...
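
As a rough illustration of how a tool might read such a record, the sketch below parses a shortened copy of the fragment with Python's standard `xml.etree.ElementTree` module. The `summarise` function and the keys of the dictionary it returns are assumptions for this example, not part of DDI or of the DAMES tools; the closing `codeBook` tag is added only so that the shortened fragment parses.

```python
import xml.etree.ElementTree as ET

# Namespace URI taken from the fragment on the slide.
DDI2_NS = {"ddi2": "http://www.icpsr.umich.edu/DDI"}

# Shortened copy of the slide's record, closed so that it is well-formed.
record = """<ddi2:codeBook xmlns:ddi2="http://www.icpsr.umich.edu/DDI">
  <ddi2:docDscr>
    <ddi2:citation>
      <ddi2:titlStmt>
        <ddi2:titl>An interesting study</ddi2:titl>
        <ddi2:IDNo agency="DAMES-M">12</ddi2:IDNo>
      </ddi2:titlStmt>
      <ddi2:prodStmt>
        <ddi2:producer>DAMES Portal</ddi2:producer>
        <ddi2:prodDate>July 29, 2010</ddi2:prodDate>
      </ddi2:prodStmt>
    </ddi2:citation>
  </ddi2:docDscr>
</ddi2:codeBook>"""


def summarise(xml_text):
    """Pull a few citation fields out of a DDI 2 codeBook document."""
    root = ET.fromstring(xml_text)
    citation = root.find("ddi2:docDscr/ddi2:citation", DDI2_NS)
    id_no = citation.find("ddi2:titlStmt/ddi2:IDNo", DDI2_NS)
    return {
        "title": citation.findtext("ddi2:titlStmt/ddi2:titl", namespaces=DDI2_NS),
        "id": id_no.text.strip(),
        "agency": id_no.get("agency"),
        "producer": citation.findtext("ddi2:prodStmt/ddi2:producer", namespaces=DDI2_NS),
    }


summary = summarise(record)
```

Because every element is namespace-qualified, the prefix-to-URI mapping has to be passed to each lookup; omitting it is a common reason for such queries to return nothing.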

The metadata "cycle"
[Diagram labels: Deposit/curate, Search, Select, Configure/process, Processing, Metadata]
Data is mirrored by metadata
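
One way to picture "data is mirrored by metadata" is a toy in-memory store in which every deposit creates a matching metadata record, searching touches only the metadata, and a saved result records which inputs it came from. This is a sketch under stated assumptions: the class `DataSetStore`, its method names and the metadata fields are all hypothetical and do not describe the DAMES portal's actual implementation.

```python
class DataSetStore:
    """Toy store: each deposited data set gets a mirrored metadata record."""

    def __init__(self):
        self._data = {}      # data set id -> rows
        self._metadata = {}  # data set id -> metadata record
        self._next_id = 1

    def deposit(self, rows, title, keywords):
        """Deposit/curate: store the rows and create the metadata mirror."""
        ds_id = self._next_id
        self._next_id += 1
        self._data[ds_id] = rows
        self._metadata[ds_id] = {
            "title": title,
            "keywords": sorted(k.lower() for k in keywords),
            "variables": sorted(rows[0].keys()) if rows else [],
            "derived_from": [],
        }
        return ds_id

    def search(self, term):
        """Search: consult metadata only, never the data itself."""
        term = term.lower()
        return [ds_id for ds_id, md in self._metadata.items()
                if term in md["title"].lower() or term in md["keywords"]]

    def select(self, ds_id):
        """Select: return the rows together with their metadata."""
        return self._data[ds_id], self._metadata[ds_id]

    def save_result(self, rows, title, keywords, inputs):
        """Save a processing result and document its inputs."""
        ds_id = self.deposit(rows, title, keywords)
        self._metadata[ds_id]["derived_from"] = list(inputs)
        return ds_id


store = DataSetStore()
survey_id = store.deposit(
    [{"age": 34, "occ_code": "2311"}], "Household survey extract", ["survey", "SHS"]
)
rows, md = store.select(store.search("shs")[0])
derived = [dict(r, age_group="30-39") for r in rows]  # stand-in processing step
derived_id = store.save_result(
    derived, "Survey extract with age groups", ["survey", "SHS", "derived"], [survey_id]
)
```

After the result is saved, a later search for the same keyword finds both the original and the derived data set, which is the cycle closing on itself.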

DAMES portal architecture overview
[Diagram labels: User; Portal; Services: Search, Enact Fusion, File Access; DAMES Resources; Compute Resources; Metadata; Local Datasets; External Dataset Repositories]
(Note: Security omitted)

Tools
• Since metadata must have a key role in data management…
• So tools for managing and exploiting the metadata have a key role in the use and operation of the DAMES portal
  • At deposit/curation
  • For searching
  • For informing the configuration of processing steps
• The following slides illustrate use of our tools

Curation Tool
The source data:
Also automatically uploaded to searchable eXist database
Metadata searching
Browsing the search results

Fusion Tool prototype
• Scenario: A social science researcher wishes to fuse Scottish Household Survey (SHS) data with privately collected study data:
  • Uses the data curation tool to upload the data
  • Uses the data fusion/imputation tool to select the data, identify corresponding variables, and generate a derived dataset (held in the portal)
  • The metadata about this derived dataset is stored and (may be) made public through the portal
• Another researcher can now search the portal (metadata) for SHS data and find the derived dataset
• DAMES metadata handling must facilitate this process

The Fusion Tool prototype
• Select datasets (recipient and donor)
• Select "common variables"
• Select variables to be imputed
• Select data fusion method
• Submit to fusion "enactor"
Metadata accessed
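
The choices made in these steps can be checked against metadata alone before any data is touched. The sketch below is a hypothetical illustration of that idea: the `FusionJob` fields mirror the steps listed above, but the class, the `validate` function, the method names in `KNOWN_METHODS` and the dataset and variable names are all invented for this example.

```python
from dataclasses import dataclass


@dataclass
class FusionJob:
    """Hypothetical record of the choices made in the steps above."""
    recipient: str
    donor: str
    common_variables: list
    imputed_variables: list
    method: str = "nearest_match"


KNOWN_METHODS = {"nearest_match", "hot_deck"}  # assumed method names


def validate(job, metadata):
    """Return a list of problems found by checking the job against metadata only."""
    problems = []
    recipient_vars = set(metadata[job.recipient]["variables"])
    donor_vars = set(metadata[job.donor]["variables"])
    for v in job.common_variables:
        if v not in recipient_vars or v not in donor_vars:
            problems.append(f"common variable {v!r} is not in both datasets")
    for v in job.imputed_variables:
        if v not in donor_vars:
            problems.append(f"imputed variable {v!r} is not in the donor")
        elif v in recipient_vars:
            problems.append(f"imputed variable {v!r} already exists in the recipient")
    if job.method not in KNOWN_METHODS:
        problems.append(f"unknown fusion method {job.method!r}")
    return problems


# Invented metadata: only the variable lists are needed for these checks.
metadata = {
    "shs_extract": {"variables": ["age", "sex", "tenure"]},
    "private_study": {"variables": ["age", "sex", "wellbeing"]},
}

good = FusionJob("shs_extract", "private_study", ["age", "sex"], ["wellbeing"])
bad = FusionJob("shs_extract", "private_study", ["age", "income"], ["tenure"], method="magic")

good_problems = validate(good, metadata)
bad_problems = validate(bad, metadata)
```

Here `good_problems` is empty, while `bad_problems` lists one issue for each of the three faulty choices, so a job can be rejected before it is handed to an enactor.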

• Select datasets (recipient and donor)
• Select "common variables"
• Select variables to be imputed
• Select data fusion method
• Submit to fusion "enactor"
Skipped
Metadata for result dataset

Job submission: Information flow
[Diagram labels. Components: Wizard; Enactor; Compute resources (Condor): subjob1, subjob2; User's local file store; Further infrastructure. Artefacts: Resultant data; DDI record; JFDL/JSDL; description.xml. Flows: notify(job id); fetch job; submit]
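
The labelled flows suggest a hand-off along these lines: the wizard writes a job description and notifies the enactor with a job id, the enactor fetches the description, runs subjobs, and leaves the resultant data and a DDI record in the user's file store. The sketch below simulates that sequence in a single Python process. It is only a guess at the shape of the flow: the XML layout is invented (it is not JFDL or JSDL), the subjobs are placeholders rather than Condor jobs, and every function and field name is hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import deque

job_store = {}           # job id -> job description XML (stand-in for description.xml)
notifications = deque()  # job ids announced to the enactor


def wizard_submit(job_id, recipient, donor, method):
    """Wizard side: write the job description, then notify(job id)."""
    root = ET.Element("job", id=job_id)
    ET.SubElement(root, "recipient").text = recipient
    ET.SubElement(root, "donor").text = donor
    ET.SubElement(root, "method").text = method
    job_store[job_id] = ET.tostring(root, encoding="unicode")
    notifications.append(job_id)


def run_subjob(name, description):
    """Placeholder for work that would run on compute resources."""
    return f"{name} done for {description.get('id')}"


def enactor_step(local_file_store):
    """Enactor side: fetch the next job, submit its subjobs, save the results."""
    job_id = notifications.popleft()
    description = ET.fromstring(job_store[job_id])  # fetch job
    results = [run_subjob(n, description) for n in ("subjob1", "subjob2")]  # submit
    local_file_store[job_id] = {
        "resultant_data": results,
        "ddi_record": {
            "derived_from": [description.findtext("recipient"),
                             description.findtext("donor")],
            "method": description.findtext("method"),
        },
    }
    return job_id


files = {}  # stand-in for the user's local file store
wizard_submit("job-1", "shs_extract", "private_study", "nearest_match")
done = enactor_step(files)
```

Passing only the job id in the notification keeps the wizard and the enactor loosely coupled: the enactor reads everything else from the stored description.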
Thank you!