On Metadata for Open Data
On an enlarged metadata set for open data classification, allowing for automated processing and linking
Yannis Charalabidis
25.04.2012
Introduction
In the next slides we show what level of metadata handling can be expected from a second-generation open data system.
Imagine you are in front of the ENGAGE system with the URI of a dataset, stored somewhere in the cloud, copied as a string to the clipboard.
And begin …
Prescreening: the user only gives the URI of the dataset
Enter (paste) the URI of your dataset
_
(then, for 30 seconds, you see this screen, changing)
Progress of ENGAGE Resource Prescreening: ( 45% ) of jobs completed
Managed to:
• Identify xls file
• Autofill, provisionally: Title
• Autofill, provisionally: Creator
• Create unique ENGAGE URI
• Calculate keywords
• Autofill, provisionally: Keywords
……
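The first prescreening jobs listed on this screen (identify the file type, provisionally fill in the title) could be sketched roughly as follows. This is only an illustration, not the ENGAGE implementation: the function name `prescreen`, the dictionary keys, and the rule of deriving format and title from the last path segment of the URI are all assumptions.

```python
import posixpath
from urllib.parse import urlparse, unquote

def prescreen(uri):
    """Provisionally fill a few attributes from the dataset URI alone.

    Hypothetical rules: the format is the file extension and the title
    is the file name with separators turned into spaces.
    """
    path = unquote(urlparse(uri).path)
    filename = posixpath.basename(path)
    stem, ext = posixpath.splitext(filename)
    provisional = {"source_link": uri}
    if ext:
        provisional["format"] = ext.lstrip(".").lower()
    if stem:
        provisional["title"] = stem.replace("_", " ").replace("-", " ")
    return provisional
```

Everything filled in this way stays provisional: the user still confirms or corrects it on the metadata insertion page.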
(When the import finishes, the report)
Report
ENGAGE managed to automatically, provisionally fill in ( 21 ) of 43 metadata attributes for your dataset.
Your current validity is at ( 45% ).
For your dataset to be inserted in the database, you need to continue filling in ( 5 ) mandatory attributes. Your dataset will then be inserted with validity ( 55% ).
If all ( 17 ) non-mandatory attributes are filled in, validity will be at its maximum, 70%, the limit of the insertion phase.
Please select next action: Continue / Park / Cancel
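The counts shown in this report can be derived from three sets of attribute names, as in the minimal sketch below. The function name `import_report` and the attribute names are hypothetical. The slides do not state how the validity percentage is calculated, so the sketch deliberately leaves it out.

```python
def import_report(attributes, mandatory, filled):
    """Count what the import report shows. All arguments are sets of names."""
    missing = attributes - filled
    return {
        "filled": len(filled & attributes),
        "total": len(attributes),
        "missing_mandatory": len(missing & mandatory),
        "missing_optional": len(missing - mandatory),
    }

# Placeholder attribute names reproducing the numbers in the report above.
attributes = {f"attr{i}" for i in range(43)}
filled = {f"attr{i}" for i in range(21)}
mandatory = {f"attr{i}" for i in range(18, 26)}  # 3 already filled, 5 still empty
report = import_report(attributes, mandatory, filled)
```

With these placeholder sets the report contains 21 filled attributes out of 43, 5 missing mandatory attributes and 17 missing non-mandatory ones.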
After import …
… we then enter the metadata insertion page with the pre-filled data, etc.
When we finish, we get a similar final report.
And now the ENGAGE metadata set, which makes all of this possible.
But first, some semantics:
Attribute characteristics – notation:
(M) : attribute is Mandatory (cannot be empty)
(*) : attribute takes values from a controlled list of terms (codelist), or tree (DAG of terms), or table
(+) : takes values from an extendible list or tree. User may extend the list during insertion
(a) : an auto-filling list (as suggestion) or otherwise automatically calculated attribute
(m) : attribute accepts multiple values
(v) : attribute entry can be verified through a type-checking algorithm
(( x )) : x is possible, but as an option
no tag : attribute is a simple string entry
---------- for the future -------------
(c0), (c1), (c2), (c3) : the importance of the attribute in the completeness calculation (c3 is the highest, most important)
(q0), (q1), (q2), (q3) : the importance of the attribute in the data quality calculation (q3 is the highest, most important)
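To make the notation concrete, the sketch below turns a tag string such as `(M)(*) ((a)) (m)` into a small dictionary. It is an illustration only: the function name `parse_flags` and the flag names in `FLAG_MEANINGS` are assumptions, and tags that are not in the list above are simply ignored.

```python
import re

# Hypothetical names for the tags defined in the notation list.
FLAG_MEANINGS = {
    "M": "mandatory",
    "*": "controlled",
    "+": "extendible",
    "a": "auto",
    "m": "multiple",
    "v": "verifiable",
}

# Double parentheses are tried first, so "((a))" is not read as a plain "(a)".
TAG = re.compile(r"\(\((.)\)\)|\((.)\)")

def parse_flags(notation):
    """Map each known tag to "yes", or to "optional" when written as ((x))."""
    flags = {}
    for optional, plain in TAG.findall(notation):
        symbol = optional or plain
        name = FLAG_MEANINGS.get(symbol)
        if name is not None:
            flags[name] = "optional" if optional else "yes"
    return flags
```

The match is case-sensitive on purpose, because `(M)` (mandatory) and `(m)` (multiple values) are different tags.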
A. The core attributes
Metadata Attribute | Type of Attribute | Type of codelist | Size of codelist (nodes) | Existing codelists
Title. Automatic: extracted from the dataset headline of the URI/dataset provided | (M) ((a)) String | - | - | -
Publisher. PUB admin tree (100 per country, extendible) | (M)(*)(+) Pointer to Tree | Tree of Strings | 100 X country | Greece (ENG)
Creator. PUB admin tree (100 per country, extendible). Prompt: same as the publisher | (M)(*)(+) Pointer to Tree | Tree of PS entities | 100 X country | Greece (ENG)
Code. Automatic: ENGAGE automatic classification system (date, country, PSector, type, etc) or ENGAGE URI | (M)(*)(a) String | - | - | -
User. The user who uploads the resource. Automatic filling from table of users / login | (*)(a) Pointer to Table | Table of Users | - | -
B. The outer core attributes
Metadata Attribute | Type of Attribute | Type of codelist | Size of codelist (nodes) | Existing codelists
Subject. Text describing the resource in one sentence. It can be stored in a list and reused | (M)(*)(+) Pointer to List | List of strings | All resource subjects | NO
Type. List of types: dataset, linkable dataset, visualization, textual information, executable binary, unknown | (M)(*)(m) Pointer to List | List of strings | 10 | ENG
Format. xls xml odata … jpd pdf … (appr. 50 format types) | (M)(*)(+) Pointer to List | List of strings | 50 | ENG
Language. ISO simplified (5 < 20 (EU) < ISO (3000)). Automatic: extract from language settings (when XLS / ISO) | (M)(*) ((a)) (m) Pointer to List | List of strings | 200 | ISO List (ENG)
Country. 5 ENGAGE countries < rest of 27 EU < other countries. ISO country list | (M)(*)(m) Pointer to List | List of strings | 200 | ISO List (ENG)
C. The Public Sector Context
Metadata Attribute | Type of Attribute | Type of codelist | Size of codelist (nodes) | Existing codelists
Public Sector Domain. Tree of sectors (20: finance, health, social security, etc). Automatic: can be calculated from Creator, if all public sector entities have a domain | (*)(m)(+) Pointer to Tree | Tree of strings | 20 | ENG, GR
Relative Public Service. List of public services (i2010 20 basic services, plus “Other reward service”, “Other permission service”, “Other registry entry service”, “Other personal documents service”) | (*)(m)(+) Pointer to List | List of strings | 24 | ENG, GR
Relative Information System. List of EU and national main information systems (50+50*country) | (*)(m)(+) Pointer to List | List of strings | 200 | GR
Legal Framework. Main EU directives on open data (10), main national laws and decrees on open data (10 X country) | (*)(m)(+) | Table of Legal Elements | 100 | GR
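The automatic rule for Public Sector Domain ("can be calculated from Creator, if all public sector entities have a domain") amounts to a lookup in the tree of public sector entities. A minimal sketch follows, under the assumption that an entity without its own domain inherits the domain of its parent. The entity names, the `ENTITIES` structure and the function `domain_of` are all hypothetical.

```python
# Hypothetical extract of the public sector entity tree:
# entity name -> (parent entity or None, domain or None)
ENTITIES = {
    "Ministry A": (None, "finance"),
    "Agency A1": ("Ministry A", None),
    "Ministry B": (None, "health"),
}

def domain_of(creator, entities=ENTITIES):
    """Walk up the tree until an entity with a domain is found, else None."""
    seen = set()  # guards against accidental cycles in the tree
    while creator in entities and creator not in seen:
        seen.add(creator)
        parent, domain = entities[creator]
        if domain is not None:
            return domain
        creator = parent
    return None
```

When the lookup returns `None`, the attribute would simply stay empty for the user to fill in.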
D. The Scientific Context
Metadata Attribute | Type of Attribute | Type of codelist | Size of codelist (nodes) | Existing codelists
Scientific Sector. ENGAGE Tree of Scientific Domains | (*)(m) Pointer to Tree | Tree of strings | 100 | Science
Scientific Usage of Resource. ENGAGE tree of scientific types/usages: events data (nature or man-made), financial data, health data, etc (20) | (*)(m)(+) Pointer to Tree | Tree of strings | 20 | Science
Intended Audience. List of possible audiences: citizens, enterprises, researchers, public sector managers, public sector officers, policy makers, members of National Parliament, MEP’s, NGO’s etc | (*)(m)(+) Pointer to List | Tree of strings | 20 | ENGAGE
Keywords. Initial list made / proposed by ENGAGE System with countries, PSector Domain, Science Domain, Usage. Also get from linked areas / domains / types etc | (*)(m)(+)(a) Pointer to List | List of strings | 200 | -
E. URL’s – URI’s – Links
Metadata Attribute | Type of Attribute | Type of codelist | Size of codelist (nodes) | Existing codelists
Type of Source Link. URL / URI / DOI / WS / RSS / ENGAGE / other | (*)(+) Pointer to List | List of Strings | 10 | ENG
Source Link (URL). String or ENGAGE URL (*)(a). Automatic: put the URL of ENGAGE site | (*) (+) ((a)) Pointer to List | List of Strings | Codelist is the full list of URI’s in ENGAGE | Yes
Type of Resource Link. URL / URI / DOI / WS / RSS / ENGAGE / other | (*)(+) Pointer to List | List of Strings | 10 | ENG
Resource Link. String or ENGAGE (a). Automatic: lists the link it already has | (*) (+) ((a)) Pointer to List | List of Strings | Codelist is the full list of URI’s in ENGAGE | Yes
Relevant Resources. List of existing URI’s in the system. Automatic: calculates from matching domain+type+ | (*)(m)(+)(a) | List of Strings | Codelist is the full list of URI’s in ENGAGE | Yes
F. Linked Data
Metadata Attribute | Type of Attribute | Type of codelist | Size of codelist (nodes) | Existing codelists
Linking status. Linkable, linked, non-linked, non-linkable, unknown | (*) Pointer to List | List of Strings | 5 | YES
Linked Data Set. URI of a linked dataset. Details of link: | (*)(m)(+)(a)(d) Pointer to List | List of URI’s | No limit | -
  Linking Type (PK match) | Pointer to List | List of Strings | 1 | -
  Matching column of this resource | String | - | - | -
  Matching column of linked resource | String | - | - | -
  Columns of this resource, to be included | (m) String | - | - | -
  Columns of linked resource, to be included | (m) String | - | - | -
Visualisations. Links to visualisations of current resource | (*)(m)(+)(a)(d) Pointer to List | List of URI’s | No limit | -
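The link details above (linking type "PK match", one matching column per resource, and the columns to include from each side) describe what is essentially a key join. The sketch below shows one way to read them, assuming both resources are available as lists of dictionaries. The function name `link_on_key`, its parameters and the sample rows are hypothetical.

```python
def link_on_key(this_rows, linked_rows, this_key, linked_key, this_cols, linked_cols):
    """Join two lists of dict rows where this_key equals linked_key (PK match)."""
    # Index the linked resource by its matching column.
    index = {}
    for row in linked_rows:
        index.setdefault(row[linked_key], []).append(row)
    result = []
    for row in this_rows:
        for match in index.get(row[this_key], []):
            merged = {col: row[col] for col in this_cols}
            merged.update({col: match[col] for col in linked_cols})
            result.append(merged)
    return result

# Made-up sample data for illustration.
budgets = [{"region_id": 1, "budget": 100}, {"region_id": 2, "budget": 250}]
regions = [{"id": 1, "name": "North"}, {"id": 3, "name": "South"}]
linked = link_on_key(budgets, regions, "region_id", "id", ["region_id", "budget"], ["name"])
```

Rows without a match on the other side are dropped here, which is a design choice of the sketch rather than something the slides specify.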
G. Dates and Status
Metadata Attribute | Type of Attribute | Type of codelist | Size of codelist (nodes) | Existing codelists
Consideration Started on | (v) DATE | - | - | -
Initial Approval / Planning Started on | (v) DATE | - | - | -
Planned to be valid on | (v) DATE | - | - | -
Validity Started on | (v) DATE | - | - | -
Validity to finish on | (v) DATE | - | - | -
Rejected on | (v) DATE | - | - | -
Substituted on | (v) DATE | - | - | -
Status. Considered, planned, valid, valid and linked, rejected, outdated, substituted. Automatic: calculation through DATES | (*) (a) Pointer to List | List of Strings | 8 | ENG
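The slides say only that Status is calculated automatically from the dates, without giving the rule. One possible reading is sketched below. The order of precedence (substituted, then rejected, then outdated, and so on), the function name `status_from_dates`, and the `linked` flag used to tell "valid" from "valid and linked" are all assumptions. The "Planned to be valid on" date is not used in this sketch.

```python
from datetime import date

def status_from_dates(today, consideration_started=None, planning_started=None,
                      validity_started=None, validity_to_finish=None,
                      rejected_on=None, substituted_on=None, linked=False):
    """Derive a status from whichever dates have already been reached."""
    def reached(d):
        return d is not None and d <= today

    if reached(substituted_on):
        return "substituted"
    if reached(rejected_on):
        return "rejected"
    if reached(validity_to_finish):
        return "outdated"
    if reached(validity_started):
        return "valid and linked" if linked else "valid"
    if reached(planning_started):
        return "planned"
    if reached(consideration_started):
        return "considered"
    return None  # no date reached yet: status cannot be derived
```

For example, a resource whose validity started in January 2012 and has no end, rejection or substitution date would be reported as "valid" on 25.04.2012.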
H. Rating
Metadata Attribute | Type of Attribute | Type of codelist | Size of codelist (nodes) | Existing codelists
Metadata Completeness. Automatic: calculated by filled / empty non mandatory items | Number (1-100) | - | - | -
Metadata Quality. Automatic: calculated by specific filled / empty non mandatory items | Number (1-100) | - | - | -
Citizen Rating. As reported / calculated by relative users | Number (1-100) | - | - | -
Researcher Rating. As reported / calculated by relative users | Number (1-100) | - | - | -
Business Rating. As reported / calculated by relative users | Number (1-100) | - | - | -
Number of Downloads. As reported by the ENGAGE System | Number | - | - | -
Density of Downloads. As number per total period of validity to date | Number % | - | - | -
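Two of the automatic ratings above can be written down directly from their descriptions. The sketch below assumes that completeness is the percentage of non-mandatory attributes that are filled in, and that density is downloads per day of validity. The slides do not state the exact formula or the time unit, so both are assumptions, as are the function names.

```python
from datetime import date

def completeness(optional_attributes, filled):
    """Percentage of non-mandatory attributes that are filled in."""
    if not optional_attributes:
        return 100
    done = len(optional_attributes & filled)
    return round(100 * done / len(optional_attributes))

def download_density(downloads, validity_started, today):
    """Downloads per day over the period of validity to date."""
    days = (today - validity_started).days
    return downloads / days if days > 0 else 0.0
```

The importance weights (c0) to (c3) announced for the future in the notation would turn the first function into a weighted average; the sketch treats all attributes equally.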
An Infrastructure for Open, Linked Governmental Data Provision towards Research Communities and Citizens
Proposal Evaluation Hearing
Brussels 23/2/2011
Not to forget: metadata codelists were there, since the Hearing … !
Q6: Which types of metadata will you select?
• Exploit work already done by the consortium (DELFT, NTUA, AEGEAN, STFC) in public sector metadata schemas
• Multi-facet design: take into consideration the fact that the data may be used in different contexts, such as research, policy making or by citizens
• Take into consideration the fact that data sources may provide wildly differing metadata: go towards metadata standardisation for Open Data, a major contribution of ENGAGE
• Two-phase metadata design within the ENGAGE workplan (Task C1.2: Data and knowledge representation annotation and linking methods). The initial proposal, based on Dublin Core, the UK eGovernment Metadata Schema and eGMS+, is as follows:
ENGAGE Metadata Set
Identifier | Title | Creator
Publisher | Country | Source
Type (*) | Format (*) | Language (*)
Sector (*) | Subject (*) | Keywords (*)
Relative Public Service (*) | Relative Information System | URL / URI / DOI
Validity Date (from – to) | Audience (*) | Legal Framework
Status (*) | Relevant Resources | Linked Data Sets (*)
(*) Indicates Controlled Lists / Taxonomies