A Common Standard for Data and Metadata:

The ESDS Qualidata XML Schema

Libby Bishop, ESDS Qualidata – UK Data Archive

E-Research Workshop Melbourne

27 April 2006

Why another schema?

• need a standard that:
– includes both file-level metadata and content-level metadata
– enables more precise searching/browsing
– extends to linking between sources (e.g. text, annotations, analysis, audio etc.)

• need one customised to social science research that:
– meets generic needs of varied data types
– is more ‘analytical’ than ones adapted from the TEI speech schema (e.g. oral history projects)
– is less granular than ones for conversational analysis (highly detailed)

What does a schema enable?

• marking up data to an XML standard for data providers to publish to online systems, such as ESDS Qualidata Online

• meet needs of researchers requesting a standard they can follow

• encourage more qualitative data analysis software companies to pursue XML outputs (and import/export tools) based on this standard

Hybrid of two standards

for the metadata – the DDI Standard for study, file and variable level

• Level 1: DDI Document description

• Level 2: DDI Study description

• Level 3: DDI Data file description
– (file contents; format; data checks; processing; software)

• Level 4: DDI Variable description
– for study survey data (mixed methods) or numeric outputs from qualitative data: demographic profile of sample; other quantified responses to qualitative data (attributes or thematic classifications often assigned (coded) in CAQDAS software)

• Level 5: DDI Other study related materials

• Level 6: TEI-based qualitative content
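The six levels can be pictured as one nested document. The sketch below is only an illustration, not the ESDS Qualidata schema itself: it uses the top-level section names of the DDI Codebook (`codeBook`, `docDscr`, `stdyDscr`, `fileDscr`, `dataDscr`, `otherMat`) for Levels 1 to 5, and it assumes, purely for illustration, that the Level 6 TEI-based content sits in a `<text>` element alongside them.

```python
import xml.etree.ElementTree as ET

# DDI Codebook sections matched to Levels 1-5 of the slide.
DDI_SECTIONS = [
    ("docDscr", "Level 1: document description"),
    ("stdyDscr", "Level 2: study description"),
    ("fileDscr", "Level 3: data file description"),
    ("dataDscr", "Level 4: variable description"),
    ("otherMat", "Level 5: other study related materials"),
]


def build_skeleton():
    """Return an empty six-level skeleton as an ElementTree element."""
    root = ET.Element("codeBook")
    for tag, label in DDI_SECTIONS:
        section = ET.SubElement(root, tag)
        section.append(ET.Comment(label))
    # Assumption: the TEI-based content is a sibling <text> element.
    # Where the real schema places it is not stated on the slide.
    content = ET.SubElement(root, "text")
    content.append(ET.Comment("Level 6: TEI-based qualitative content"))
    return root


skeleton = build_skeleton()
level_tags = [child.tag for child in skeleton]
print(level_tags)
print(ET.tostring(skeleton, encoding="unicode"))
```
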

TEI for content mark-up

• standard for text mark-up in humanities and social sciences

• Elements for the header for a TEI-conformant DTD: <teiHeader type="text"> or <teiHeader type="corpus">, containing:
– <fileDesc> (standard bibliographic ref to text)
– <encodingDesc>
– <profileDesc>
– <revisionDesc>

• Mandatory = <teiHeader type="text"> </teiHeader>

What features do we need to mark-up and why?

• Spoken interview texts provide the clearest, and most common, example of the kinds of encoding features needed

• Three basic groups of features

– structural features representing basic format: utterance, specific turn taker, other speech tags e.g. defining idiosyncrasies

– structural features representing links to other data types created in the course of the research process (e.g. audio or video referencing points, researcher annotations)

– structural features representing identifying information such as real names, company names, place names, temporal information

“Reduced” set of TEI elements

• Start with core tag set for transcription, then add:
– Editorial changes <unclear>
– Names, numbers, dates <name>
– Links and cross references <ref>
– Notes and annotations <note>
– Text structure <div>
– Unique to spoken texts <kinesic>
– Linking, segmentation and alignment <anchor>
– Advanced pointing, will use XPointer framework
– Synchronisation
– Contextual information (participants, setting, text)
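To show how a few of these elements combine in practice, here is a minimal sketch that builds a two-turn interview fragment. It assumes the TEI `<u who="...">` element for utterances; the helper `utterance`, the speaker labels and the sentences themselves are all made up for illustration and are not taken from any ESDS Qualidata transcript.

```python
import xml.etree.ElementTree as ET


def utterance(who, *parts):
    """Build a <u> element from plain strings and (tag, text) pairs."""
    u = ET.Element("u", {"who": who})
    last = None  # most recently added child element, if any
    for part in parts:
        if isinstance(part, str):
            # Text before the first child goes in .text, later text in .tail.
            if last is None:
                u.text = (u.text or "") + part
            else:
                last.tail = (last.tail or "") + part
        else:
            tag, text = part
            last = ET.SubElement(u, tag)
            last.text = text
    return u


# Hypothetical content: <div> for structure, <name> and <unclear> inline.
div = ET.Element("div", {"type": "interview"})
div.append(utterance("interviewer", "Where did you work then?"))
div.append(
    utterance(
        "respondent",
        "In a shop in ",
        ("name", "Aberdeen"),
        ", for about ",
        ("unclear", "six"),
        " years.",
    )
)

xml_text = ET.tostring(div, encoding="unicode")
print(xml_text)
```

Keeping inline tags such as `<name>` inside the utterance, rather than in a separate layer, is the simplest arrangement; texts that need overlapping (non-hierarchical) coding would need a different approach.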

Metadata for model transcript output

Study Name: <titlStmt><titl>Mothers and Daughters</titl></titlStmt>
Depositor: <distStmt><depositr>Mildred Blaxter</depositr></distStmt>
Interview number: <intNum>4943int01</intNum>
Date of interview: <intDate>3 May 1979</intDate>
Interview ID: <persName>g24</persName>
Date of birth: <birth>1930</birth>
Gender: <gender>Female</gender>
Occupation: <occupation>pharmacy assistant</occupation>
Geo region: <geoRegion>Scotland</geoRegion>
Marital status: <marStat>Married</marStat>
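A rough sketch of how such metadata might be read back out for display, for example in a search results list. The element names and values are those shown on the slide; the wrapping `<interviewMeta>` root element and the `summarise` function are hypothetical, added only so the fragment parses as one document.

```python
import xml.etree.ElementTree as ET

# Element names and values are from the slide; <interviewMeta> is a
# made-up wrapper so that the fragment is a single well-formed document.
FRAGMENT = """<interviewMeta>
  <titlStmt><titl>Mothers and Daughters</titl></titlStmt>
  <distStmt><depositr>Mildred Blaxter</depositr></distStmt>
  <intNum>4943int01</intNum>
  <intDate>3 May 1979</intDate>
  <persName>g24</persName>
  <birth>1930</birth>
  <gender>Female</gender>
  <occupation>pharmacy assistant</occupation>
  <geoRegion>Scotland</geoRegion>
  <marStat>Married</marStat>
</interviewMeta>"""

# Display label for each leaf element, as listed on the slide.
LABELS = {
    "titl": "Study Name",
    "depositr": "Depositor",
    "intNum": "Interview number",
    "intDate": "Date of interview",
    "persName": "Interview ID",
    "birth": "Date of birth",
    "gender": "Gender",
    "occupation": "Occupation",
    "geoRegion": "Geo region",
    "marStat": "Marital status",
}


def summarise(xml_text):
    """Map each display label to the text of its element."""
    root = ET.fromstring(xml_text)
    summary = {}
    for element in root.iter():
        if element.tag in LABELS:
            summary[LABELS[element.tag]] = (element.text or "").strip()
    return summary


summary = summarise(FRAGMENT)
for label, value in summary.items():
    print(f"{label}: {value}")
```
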

Transcript with XML mark-up

XML is source for .rtf download

Metadata used to display search results

XML+XSL enables online publishing

Some questions to resolve:

• What hierarchical elements should we use for collections of interview transcripts? Corpus, group/text, text/div?

• What is the best XPointer scheme (or schemes) to handle linking and pointing to external resources?

• Are there preferred standards for linking to, and synchronising with, audio and video?

• We have some text requiring non-hierarchical coding and need to determine which of the schemes for multiple hierarchies best suits our texts.

• How can we best use TEI metadata to incorporate several DDI elements used by the UKDA for cataloguing?

• We are adapting natural language processing tools (NXT [NITE XML Toolkit]: http://www.ltg.ed.ac.uk/NITE/) to automate the mark-up of qualitative data. We are seeking advice on some issues arising from the integration of TEI and NXT.

Conclusion

• More information soon on the SQUAD website:

http://quads.esds.ac.uk/projects/squad.asp

Qualitative Data Mark-up Tools (QDMT)

• systematic preparation of digital data: to create formatted text documents ready for XML output

• mark-up of data to capture basic structural features of textual data: e.g. speakers and selected demographic details

• advanced annotation or mark-up of data

– automated information extraction of basic semantic information: inserting tags for names and temporal information

– automated anonymisation: replacing names with dummy forms, including co-references

– geographic mark-up to enable data linking: identifying and applying geographic mark-up, and scoping researchers' needs for geo-linking

• basic classification or thematic coding of textual data: will investigate linking into a domain ontology (e.g. social science thesaurus)

• contextual documentation to capture richness of the research methods, data collection and analytic interpretation and representation: will look at the interrelationships between complex intra-project data, annotations and context

• exposure of annotated and contextualised qualitative data to the web: investigating publishing of above QDM XML outputs to ESDS Qualidata Online, opportunities for exchange within CAQDAS tools, etc.
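The automated anonymisation step listed above, replacing names with dummy forms while keeping co-references consistent, can be sketched as follows. This is not the QDMT implementation: it assumes the names and their alternative forms have already been identified (here supplied by hand in a hypothetical `people` mapping), whereas the project aims to find them automatically with natural language processing tools.

```python
import re


def anonymise(text, people):
    """Replace every form of each person's name with one dummy form.

    `people` maps a canonical name to a list of other forms that refer
    to the same person, so all co-references get the same replacement.
    """
    alias_to_dummy = {}
    for index, (canonical, aliases) in enumerate(people.items(), start=1):
        for form in [canonical, *aliases]:
            alias_to_dummy[form.lower()] = f"PERSON_{index}"

    # Try longer forms first so "Mary Smith" is matched before "Mary".
    forms = sorted(alias_to_dummy, key=len, reverse=True)
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(form) for form in forms) + r")\b",
        re.IGNORECASE,
    )
    return pattern.sub(lambda m: alias_to_dummy[m.group(0).lower()], text)


# Hypothetical example sentence and name list.
people = {"Mary Smith": ["Mary"], "John": []}
original = "Mary Smith came in. Later Mary spoke to John."
anonymised = anonymise(original, people)
print(anonymised)
```

A plain-string replacement like this loses the original names; when the output is XML, the dummy form could instead be wrapped in a `<name>` element so that the substitution stays visible in the mark-up.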
