Impact of Standards in European Open Data Catalogues. A Multilingual Perspective of DCAT
European Public Sector Information Platform
ePSIplatform Topic Report No. 2013 / 09, September 2013
Authors: Elena Montiel-Ponsoda, Boris Villazón-Terrazas
Published: September 2013
Table of Contents
Keywords
Abstract / Executive Summary
1 Introduction
2 DCAT Overview
3 DCAT compliant data catalogs
4 Some issues related with the current use of the DCAT vocabulary: a language perspective
5 Enriching RDF vocabularies with multilingual information
6 Approaches for the representation of culturally influenced elements in ontologies
7 Conclusions and recommendations
About the Authors
References
Copyright information
Keywords:
Data catalogs, DCAT, Public Sector Information, Multilinguality
Abstract / Executive Summary:
Several countries around the world are establishing data platforms. Within the European Union, member states are setting up official data catalogues as entry points to simplify public access to the PSI (Public Sector Information) of each country. In this context it is important to describe the metadata of these data portals, i.e., data catalogs, and to allow interoperability among them. To tackle these issues, the Government Linked Data Working Group is developing DCAT (Data Catalog Vocabulary), an RDF vocabulary for describing the metadata of data catalogs. This topic report describes the advantages of having a standard for data catalogs and analyses the multilingual perspective of DCAT.
1 Introduction
In recent years data has become the new oil. Indeed, just like oil, it needs to be discovered, extracted from its sources, and refined from raw material into products with high added value. Following this trend, many national, regional and local governments, as well as other organizations inside and outside the public sector, are operating data catalogs, that is, web portals that provide access to machine-readable public data published by these organizations. The need for a standard format to represent the metadata contained in these catalogs has been recognized (Maali et al., 2010) as a way to improve interoperability and exchange of data, and to prevent catalogs from ending up as data silos.
Along these lines, the W3C Government Linked Data Working Group [1] is developing DCAT (Data Catalog Vocabulary), an RDF vocabulary designed to facilitate interoperability between data catalogs published on the Web (Maali et al., 2013). DCAT was first developed and published by DERI [2] and has seen widespread adoption at the time of this publication. The original vocabulary was further developed by the eGov Interest Group [3], before being brought onto the Recommendation Track by the Government Linked Data (GLD) Working Group.
By using DCAT to describe datasets in data catalogs, publishers increase discoverability and enable applications to easily consume metadata from multiple catalogs. It further enables decentralized publishing of catalogs and facilitates federated dataset search across sites. Aggregated DCAT metadata can also serve as a manifest file to facilitate digital preservation.
Within the European context, there are more than 70 official data catalogues [4] and 23 officially recognized languages. We therefore claim that multilingualism is an important feature that vocabularies have to take into account. This report analyzes DCAT from a multilingual perspective and proposes two options for including multilingualism in a given vocabulary.
[1] http://www.w3.org/2011/gld/charter
[2] http://deri.ie/
[3] http://www.w3.org/2009/06/eGov/ig-charter.html
[4] A list of official data catalogues in EU27 member states can be found at http://datacatalogs.org/group/eu-official
2 DCAT Overview
DCAT is an RDF vocabulary well-suited for representing data catalogs. It defines three main classes (see Figure 1):
- dcat:Catalog, a class that defines a curated collection of metadata about datasets. This class is described by the following properties: title, description, date of formal publication of the catalog, most recent date of modification of the catalog, language of the catalog, a link to the license document under which the catalog is made available, the rights under which the catalog can be used/reused, the spatial coverage of the catalog, and a link to the homepage of the catalog (see Figure 1).
- dcat:Dataset, defined as a collection of data, published or curated by a single agent, which is available for access or download in one or more formats. As can be seen in Figure 1, numerous properties describe this class: title, description, date of formal issuance and most recent date of modification, a unique identifier of the dataset, a keyword or tag describing the dataset, the language of the dataset, the temporal period that the dataset covers, the spatial coverage of the dataset, the frequency at which the dataset is published, and, finally, the landing page, i.e., a Web page that can be navigated to in a Web browser to gain access to the dataset, its distributions and/or additional information.
- dcat:Distribution, a class which connects a dataset to its available distributions. The latter are defined by properties such as title, description, date of formal publication and most recent date of modification, links to the license document under which the distribution is made available, a URL that gives access to a distribution of the dataset, a direct link to a downloadable file in a given format, the media type of the distribution, the file format of the distribution, and the size of a distribution in bytes.
Moreover, through the dcat:theme property, DCAT relies on the class skos:Concept for classifying its datasets according to a set of domains or categories, which in turn are organized in a taxonomy used to represent the themes/categories of datasets in the catalog (dcat:themeTaxonomy).
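To make the relation between the three main classes more concrete, the following is a minimal sketch in Python (standard library only) that writes a catalog, one dataset and one distribution as N-Triples. The base URI http://example.org/, the dataset and file names, and the helper functions iri, literal, describe_catalog and to_ntriples are assumptions made for illustration; they are not taken from any of the catalogs discussed in this report. Only a handful of the properties listed above are used.

```python
# Namespace URIs for the prefixes used in this report.
PREFIXES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dct": "http://purl.org/dc/terms/",
    "dcat": "http://www.w3.org/ns/dcat#",
}

BASE = "http://example.org/"  # hypothetical base URI, for illustration only


def iri(value):
    """Format a full URI or a prefixed name (e.g. 'dcat:Dataset') as an N-Triples IRI."""
    prefix, _, local = value.partition(":")
    if prefix in PREFIXES:
        value = PREFIXES[prefix] + local
    return "<" + value + ">"


def literal(text, lang):
    """Format a language-tagged string literal."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return '"' + escaped + '"@' + lang


def describe_catalog():
    """Return (subject, predicate, object) triples for one catalog, dataset and distribution."""
    catalog = BASE + "catalog"                    # hypothetical resources
    dataset = BASE + "dataset/hostels"
    distribution = BASE + "dataset/hostels/csv"
    return [
        (iri(catalog), iri("rdf:type"), iri("dcat:Catalog")),
        (iri(catalog), iri("dct:title"), literal("Example city catalog", "en")),
        (iri(catalog), iri("dcat:dataset"), iri(dataset)),
        (iri(dataset), iri("rdf:type"), iri("dcat:Dataset")),
        (iri(dataset), iri("dct:title"), literal("Albergues", "es")),
        (iri(dataset), iri("dcat:keyword"), literal("turismo", "es")),
        (iri(dataset), iri("dcat:distribution"), iri(distribution)),
        (iri(distribution), iri("rdf:type"), iri("dcat:Distribution")),
        (iri(distribution), iri("dcat:accessURL"), iri(BASE + "files/hostels.csv")),
    ]


def to_ntriples(triples):
    """Serialize triples as N-Triples, one statement per line."""
    return "\n".join(" ".join(t) + " ." for t in triples)


print(to_ntriples(describe_catalog()))
```

Note that the vocabulary terms stay in English while the literal values carry a language tag (here "es"), which is the situation analyzed in section 4.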
Figure 1. DCAT main classes (Maali et al., 2013)
The rest of the document is organized as follows. In section 3 we provide an analysis of some data catalogs which are annotated with the DCAT vocabulary, and which contain data in several languages. The analysis provides an interesting insight into the actual use of the vocabulary and highlights some issues related to the use of the vocabulary in multilingual settings or by publishers whose main language is not English. These issues are summarized, exemplified and discussed in section 4. Then, in section 5 we present several modeling options for the enrichment of RDF vocabularies, such as the DCAT vocabulary, with multilingual information, which would help solve some of the problems of its current use in multilingual settings. Finally, in section 6, we present approaches for the representation of culturally influenced elements in vocabularies.
3 DCAT compliant data catalogs
In order to assess the current use of the DCAT vocabulary in European public data catalogs, we first analyzed in detail several catalogs that make use of this vocabulary. Specifically, the catalogs used in our analysis are:
- PublicData.eu, Europe's public data [5]
- The data catalog of the Local Government of Gijón [6], in Spain.
- Gencat, the data catalog of the Regional Government of Catalonia [7], in Spain.
- The data catalog of the Local Government of Zaragoza [8], in Spain.
The selection of these catalogs was motivated by a previous study carried out by DERI, the first developers and publishers of the DCAT vocabulary, which resulted in a list of data catalogs currently making use of this vocabulary. The list, which is included below for convenience (see Table 1), was made available to us and contained nine catalogs in total.
[5] http://publicdata.eu/
[6] http://datos.gijon.es/
[7] http://www20.gencat.cat/portal/site/dadesobertes
[8] http://www.zaragoza.es/ciudad/risp/
Table 1: DCAT compliant data catalogs
Catalog | Website | Type (national, local, regional) | DCAT available at/as | #datasets | theme? | keywords? | publisher | extent (spatial) | distribution format | distribution size
--- | --- | --- | --- | --- | --- | --- | --- | --- | --- | ---
Semantic CKAN | http://semantic.ckan.net | aggregate | SPARQL endpoint http://semantic.ckan.net/sparql | 27592 | no | yes... represented using moat:taggedWithTag | no | no | yes | no
Publicdata.eu | http://publicdata.eu | aggregate | A list of RDF files http://rdf.opendatasearch.org/ | 11655 | yes, but each aggregated catalog has its own hierarchy, i.e. themes are not reconciled | yes | yes (though uses dc:creator) | yes | yes | no
Barcelona | http://w20.bcn.cat/opendata/ | City | RDF dump http://w20.bcn.cat/opendata/CatalegRDF.aspx | 501 | no | yes | yes | no | yes | no
Gijón | http://datos.gijon.es | City | an RDF file per dataset | 29 | no | yes (though wrongly uses dc:keyword) | yes | no | yes | yes
Catalonia | http://dadesobertes.gencat.cat | Regional | RDF dump http://dadesobertes.gencat.cat/recursos/datasets/cataleg.rdf | 127 | yes (though uses dc:subject) | yes | yes | yes | yes | yes
Balearic Islands | http://www.caib.es/caibdatafront | Regional | RDF dump http://dadesobertes.caib.es/recursos/datasets/catalog/rdf | 28 | yes (though uses dc:subject) | no | yes (though wrongly uses dc:editor) | no | yes (though in a wrong way) | yes (in a wrong way though)
Saragossa | http://datos.zaragoza.es | City | SPARQL endpoint at http://www.zaragoza.es/datosabiertos/sparql | 271 | yes (though uses dc:subject) | yes | yes (wrongly uses dc:publiser with no label or any other property; also uses the non-existing dc:editor) | yes | yes | no
Fingal | http://data.fingal.ie | Regional | by third party (DERI) | 73 | yes | no | yes | yes | yes | no
lab.linkeddata.deri.ie | http://lab.linkeddata.deri.ie/govcat | Aggregate | SPARQL endpoint at http://lab.linkeddata.deri.ie/govcat/sparql | 1710 | yes | yes | yes | yes | yes | yes
In the following, we provide a brief description of the selected catalogs, focusing on the purpose of the catalog, the language of the portal, the language of the datasets, and the search options.
The PublicData.eu portal (Figure 2) aims to provide access to open datasets from local, regional, and national public bodies across Europe. Apart from being a single point of access to scattered datasets in Europe, it enables users to browse datasets by region, subject matter and format. Although the portal language is English, the PublicData.eu catalog can be considered multilingual in the sense that it contains datasets that originate from several European countries and contain information in various natural languages. The datasets come from the UK, France, Spain, Austria, Denmark, Italy, the Czech Republic, the Netherlands, Germany, Russia, Sweden, and Ireland. In fact, an additional browse option of the portal is by country of origin (see Figure 3).
Figure 2. Snapshot of the PublicData.eu portal
Figure 3. PublicData.eu browse option by dataset language
The next catalog, that of the Local Government of Gijón, a city in the north of Spain, contains around 400 datasets in Spanish managed by the local government. The main purpose of publishing datasets in RDF is to improve citizen participation, promote innovation and give companies the opportunity to create added value, so that both people and market players can benefit from public information and offer new products and services. The portal is available only in Spanish, and one can filter and customize the search results by using the options keyword, category and format, as shown in Figure 4.
Figure 4. Browse options in the portal of the Local Government of Gijón
Gencat is the third catalog we have studied (see Figure 5). Most of the datasets in this portal of the Government of Catalonia, in Spain, are in Catalan, the official language in this Spanish region together with Spanish. The portal is available in English, Spanish and Catalan. According to the information in the portal, this catalog comprises a database of the 26,000 official facilities of Catalonia, the 1,400 procedures handled in the Government's offices and some of its multimedia archives. The browse options are by categories (called tags in the English version of the portal), format, and data sources, i.e., the bodies that have created the corresponding datasets.
Figure 5. Snapshot of the search options in the English version of Gencat
The fourth and last catalog that we have considered for this study is the data catalog of the Local Government of Zaragoza, also in Spain (Figure 6). The portal is only available in Spanish, and the information in the datasets it contains is also in Spanish. As in the preceding catalogs, the purpose of this portal is to give citizens access to public data as well as to foster the reuse of that information. The browse or search options are categories, update frequency of the dataset (yearly, quarterly, monthly, daily, instantly), format and keywords.
Figure 6. Portal of the data catalog of the Local Government of Zaragoza
4 Some issues related with the current use of the DCAT vocabulary: a language perspective
The next step in our analysis was to access some of the datasets contained in the different catalogs, available in the RDF format and annotated with the DCAT vocabulary, and look into the use they made of the DCAT classes and properties. The main conclusions of this study are discussed below.
1. Some datasets are not using the latest version of the DCAT vocabulary. For example, the dataset "List des IFSI en Ile de France" contained in the PublicData.eu catalog makes use of the properties dct:creator and foaf:name to refer to the publisher of the dataset, instead of the dct:publisher property and foaf:Agent class defined by the current version of the DCAT vocabulary.
   A similar example is found in the catalog of the Local Government of Gijón. In the case of a dataset of hostels, we find the foaf:Organization class instead of foaf:Agent when defining the publisher of the datasets, or dc:mediaTypeorExtent instead of the dcat:mediaType defined in the current version of the vocabulary.
2. Some datasets make free use of the DCAT vocabulary, i.e., they are not fully compliant with DCAT. By this we mean that they use properties of a certain class in the description of another class. For instance, in the same dataset mentioned above from the PublicData.eu catalog, "List des IFSI en Ile de France", the property foaf:homepage is used as a property of the class dcat:Dataset, i.e., it describes the dataset, whereas it should be a property of the class dcat:Catalog, as established by the DCAT vocabulary.
3. Another remarkable aspect of the analyzed datasets is that they do not make use of the same amount or type of metadata. This may be, to some extent, reasonable, since each publisher might decide which elements of the vocabulary cover the needs of his or her catalog. Most catalogs make use of the descriptive information relative to the dataset, such as title, description, date of issue or date of modification, and also information related to the distribution of the dataset (see the descriptive properties of the dcat:Distribution class in section 2). However, very few contain information about the Catalog itself, the Record, or the theme and theme taxonomy used by the catalog or dataset in question.
4. When accessing the RDF code of the datasets, we realized that all catalogs reused the DCAT vocabulary as it is, i.e., with the labels for classes and properties in English, as defined by the authors of the vocabulary. None of the publishers translated the DCAT vocabulary itself into their own language, even when the actual data or information in the datasets was in a language other than English.
   This is the common choice when the ontology or vocabulary is shareable and valid for different cultures. By this we mean that a certain conceptual organization (i.e., the classes and properties that make up an ontology or vocabulary and the way in which they have been organized) is universal, in the sense that it does not solely reflect the needs of a certain culture or how a certain culture approaches a particular area of knowledge, but is valid for or translatable to other cultures. In fact, the set of classes and properties proposed in the DCAT vocabulary is general enough to be accepted by any publisher. Obviously, it may happen that some properties that are relevant for one publisher are not relevant for another. For example, the dct:language property describing the dcat:Dataset class is highly relevant for the PublicData.eu catalog, since it contains datasets in several natural languages. For the catalog of the Local Government of Zaragoza this is not the case: all its datasets contain information in the same language, Spanish, so this property does not deserve a special mention.
   However, what we observe in the portals analyzed is that publishers in countries where English is not the native language have provided different translations for the classes or properties of the DCAT vocabulary that are shown to the user in the web page. For example, in the catalog of the Local Government of Zaragoza, one of the search options is by Materias (Subjects) or Temas (Topics), corresponding to the dcat:theme property, whereas the Local Government of Gijón has translated this as Sectores (Sectors) or Sectores temáticos (Domain areas). In the case of Gencat, the catalog of the Government of Catalonia, these are Categories (categories), and in PublicData.eu they are dubbed Groups.
   It could be said that materia, tema, sector, sector temático, categoria or group are synonyms or term variants that can be associated with the same concept. One could also argue that each publisher can translate the names of these metadata as he or she wishes, as long as the type of information provided by the different metadata is the appropriate one. And we agree with this. Nevertheless, when searching different catalogs in the same language, let us say Spanish, one would expect to find the same information labeled in the same way. We believe this would avoid confusion and simplify the search.
   For this reason, and also for tasks such as the automatic generation of web pages from ontologies or vocabularies, we would be in favor of proposing "official" translations of the DCAT vocabulary, which could be enriched with as many variants as desired in order to cover the needs of all publishers (see section 5 for more details on this).
5. The last issue which we came across regarding the use of DCAT by different publishers in Europe is that their categorization of datasets also differs. The authors of DCAT do not prescribe the topics or category scheme that should be followed when using this vocabulary. They only determine that the property dcat:theme be linked to a skos:Concept, which in turn is included in a skos:ConceptScheme. Because of this, each publisher has adopted a different categorization or taxonomy of categories to classify datasets. We had the chance to ask some of the publishers and they said that they had simply adopted the categorization that the different public bodies already used in their respective web pages.
   By way of example we include below the classification followed in the PublicData.eu portal (Figure 7), and the taxonomy used in the Gencat portal from the Government of Catalonia (see Table 2).
Figure 7. Classification of datasets on the PublicData.eu portal
Table 2. Taxonomy used in the Gencat portal from the Government of Catalonia
Theme | Includes | Sector ID
--- | --- | ---
Administració pública | Oposicions, Publicacions, Recerca, estudis i anàlisis, Altres | administracio
Agricultura, ramaderia i pesca | Agricultura, Foment de la producció, Infraestructures rurals, Pesca. Aqüicultura, Recerca, estudis i anàlisis | agr-ram-pesca
Associacionisme i participació | Civisme, Cooperació al desenvolupament, Entitats, Equipaments, Participació ciutadana, Pau i drets humans, Voluntariat | participacio
Comerç i consum | Consum | comerc
Economia | Assegurances, Defensa competència, Deutes i sancions, Pagaments, Tributs | economia
Educació i formació | AMPA, Educació en el lleure, Educació infantil, Educació secundària obligatòria, Estudiar a l'estranger, Formació adults, Formació per a docents, Formació professional, Idiomes, Material didàctic, Mobilitat educativa, Preinscripció i matriculació, Proves d'accés, Suport a l'alumnat, Altres, Oposicions | educacio
Cultura | Arts escèniques, Arts visuals, Arxius i biblioteques, Cinema i audiovisuals, Cultura popular, Lletres, Memòria històrica, Museus, Música, Patrimoni, Recerca, estudis i anàlisis, Altres | cultura
Esports, lleure i oci | Caça, Esports, Jocs i espectacles, Nàutica i busseig, Pesca, Proves esportives, Vacances i estades | esports-oci
Indústria i energia | Estalvi energètic | industria-energia
Habitatge | Compra, Lloguer, Protecció oficial, Rehabilitació, Altres | habitatge
Justícia | Acadèmies, Associacions, Col·legis professionals, Federacions, Fundacions, Gestors administratius, Oposicions, Recerca, estudis i anàlisis | justicia
Llengua i comunicació | Occità, Llengua catalana, Mitjans de comunicació | comunicacio
Medi Ambient | Aigua, Ecologia, Estalvi energètic, Flora i fauna, Sostenibilitat, Altres | medi-ambient
Mobilitat i transport | Trànsit, Transports | transport
Salut | Assistència sanitària, Foment salut, Higiene, Recerca, estudis i anàlisis | salut
Serveis Socials | Cohesió social, Dependència, Discapacitat, Altres | serveis-socials
Societat, ciutadania i famílies | Adopció i acolliment, Afers exteriors, Afers religiosos, Dones, Gais, lesbianes i transsexuals, Gent gran, Igualtat, Immigració, Infants i famílies, Joves, Recerca, estudis i anàlisis, Altres | societat
Tecnologia. Recerca i Innovació | Foment, Noves tecnologies, Recerca, Societat de la Informació, Telecomunicacions, Innovació | tecnologia
Territori i paisatge. Urbanisme | Boscos, Costes, Paisatge, Ports | territori
Treball | Borsa de treball, Cooperatives, Emprenedoria, Formació, Igualtat d'oportunitats, Ocupació, Oposicions, Relacions laborals, Seguretat i salut laboral | treball
Turisme | Establiments turístics, Foment turisme, Altres | turisme
Universitats | Formació en empreses, Mobilitat educativa, Postgraus, doctorats i màsters, Preinscripció i matriculació, Proves d'Accés a la Universitat, Recerca, Altres | universitats
   In the Gencat taxonomy, the first column identifies the main topics; the second, the sub-topics related to the main ones; and the third, the term used to identify the topics in the web page. At first sight, one realizes that the categorizations of the PublicData.eu portal and those of Gencat do not match in coverage or granularity.
   In the case of categorizations of datasets, we observe some general concepts or topics which are present in any categorization, whereas others, perhaps more culturally bound, are only present in certain categorizations. PublicData.eu proposes 14 groups, such as "Agriculture, Fisheries and Forestry" or "Economy and Industry", whereas the Gencat catalog proposes 22 broad categories. When comparing the PublicData.eu catalog to the Gencat one, we see that there is some overlap. In both catalogs we find datasets grouped under the categories of "Agriculture, Fisheries and Forestry" / "Agricultura, ramaderia i pesca"; "Culture and Arts" / "Cultura"; "Environment" / "Medi Ambient"; or "Health" / "Salut". In other cases, the PublicData.eu catalog merges related categories, and the Gencat catalog keeps them apart. This is the case of "Education and Communication" in the PublicData.eu catalog vs. "Universitats" (Universities), "Tecnologia, recerca i innovació" (Technology, research and innovation), "Llengua i comunicació" (Language and communication) and "Esports, lleure i oci" (Sports, free time and hobbies). In this sense, we can say that the taxonomy of categories used by the Gencat catalog is far more specific than that of the PublicData.eu catalog.
   Because the Gencat catalog has been created by the Government of a region, in this case Catalonia, it contains categories that comply with the purpose of the catalog, such as "Administració pública" (Public administration). Also highly related to the distinctive features of this region, we find the category "Llengua i comunicació" (Language and communication), which contains the sub-categories "Occità, Llengua catalana, Mitjans de comunicació" (Occitan, Catalan language, Media). Such distinctions may represent an issue for portals such as PublicData.eu, which aim at providing unified access to datasets across Europe, because they will first have to analyze the categorizations made by the different portals in order to aggregate datasets under meaningful categories. Since no categorization or taxonomy is prescribed by the DCAT vocabulary, a tentative categorization could be proposed after analyzing several categorizations of this sort, which could then be extended or adapted to cover specific cultural needs. For this purpose, several representational approaches could be followed. We provide a summary of these in section 6.
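Issues 1 and 2 above are the kind of deviation that a simple automated check could flag. The following minimal Python sketch illustrates the idea. The two rule tables are assumptions compiled only from the deviations reported in this section and in Table 1; they are neither complete nor an official DCAT validator, and the function name review is hypothetical.

```python
# Hypothetical rule table: (described class, property found) -> property to consider instead.
# Entries are taken from the deviations reported in this report, not from a DCAT conformance suite.
SUGGESTIONS = {
    ("dcat:Dataset", "dct:creator"): "dct:publisher",
    ("dcat:Dataset", "dc:editor"): "dct:publisher",
    ("dcat:Dataset", "dc:keyword"): "dcat:keyword",
    ("dcat:Dataset", "dc:subject"): "dcat:theme",
}

# Hypothetical rule table: (described class, property found) -> class the property belongs to.
MISPLACED = {
    ("dcat:Dataset", "foaf:homepage"): "dcat:Catalog",
}


def review(described_class, properties):
    """Return human-readable findings for the properties used to describe one class."""
    findings = []
    for prop in properties:
        key = (described_class, prop)
        if key in SUGGESTIONS:
            findings.append(f"{prop}: consider {SUGGESTIONS[key]} instead")
        elif key in MISPLACED:
            findings.append(f"{prop}: defined for {MISPLACED[key]}, not {described_class}")
    return findings


# Properties as reported for the "List des IFSI en Ile de France" dataset (issues 1 and 2).
for finding in review("dcat:Dataset", ["dct:title", "dct:creator", "foaf:homepage"]):
    print(finding)
```

A check of this kind only covers known deviations; properties it does not recognize pass silently, so it complements rather than replaces a manual review.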
5 Enriching RDF vocabularies with multilingual information
As mentioned in point 4 of the previous section, with the aim of enhancing the use of the DCAT vocabulary at an international level, it would be advisable to provide translations of the labels that describe the DCAT classes and properties into languages other than English. One advantage of having multilingual versions of this vocabulary would be that publishers in countries where English is not the official language could make use of these descriptions in their own language, and could also directly reuse these terms or labels in their portals or final applications. This would also result in all portals making use of the same terms or labels, thereby contributing to interoperability.
The idea of enriching ontologies and RDF vocabularies with multilingual linguistic information is not new and has been the object of research and study for a decade now. To the best of our knowledge, some of the first approaches to enrich ontologies with linguistic descriptions are LingInfo (Buitelaar et al., 2006), LexOnto (Cimiano et al., 2007), LIR, the Linguistic Information Repository (Peters et al., 2007; Montiel-Ponsoda et al., 2010), and LexInfo (Buitelaar et al., 2009; Cimiano et al., 2010). These models mainly differ in the type of linguistic descriptions they aim to account for. For instance, whereas the LingInfo model focused on the representation of the morphological and syntactic structures of the labels or terms describing ontology classes and properties, the LIR model focused on the representation of term variants and translations. Currently, researchers in this domain have joined forces and are working towards the standardization of a model intended to capture a wide range of linguistic descriptions relative to ontologies or RDF vocabularies. We are referring to the W3C Ontology-Lexica Community Group [9]. This standardization initiative has taken the lemon (LExicon Model for ONtologies) model (McCrae et al., 2011; http://lemon-model.net/) as the basis for its work, and is evolving it into a model which, in combination with the semantic information captured in the ontology, is aimed at improving the performance of NLP (Natural Language Processing) tools, amongst other objectives. See Figure 8 for an impression of the kind of linguistic descriptions that can be associated with ontology elements in the lemon model.
[9] http://www.w3.org/community/ontolex/
Figure 8. Overview of the lemon model
As for the specific case of the DCAT vocabulary, a model such as lemon would allow for the inclusion of term variants in different languages for the classes and properties of the vocabulary. Coming back to the previously mentioned example of the several translations of the dcat:theme property (materia, tema, sector, sector temático, categoria or group), they could all be accounted for as variants or linguistic realizations of the property dcat:theme. For a property such as dct:modified, we could have more readable terms or labels (last update, change date, fecha de modificación, fecha de actualización, Änderungsdatum, Datum der letzten Aktualisierung, etc.), which could then be used for the automatic generation of web pages.
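The idea of attaching several language-specific variants to one property can be sketched with a plain Python dictionary. This is only a stand-in for a linguistic model such as lemon, not an implementation of it. The variants are the ones quoted in this report, except the English label "theme", which is added here for illustration. The order of the variants, and therefore which one is treated as the preferred label, is an assumption, since no official translations exist yet. The function names are hypothetical.

```python
# Term variants per property and language. The first variant in each list is treated
# as the preferred label; that ordering is an assumption made for this sketch.
LABELS = {
    "dcat:theme": {
        "en": ["theme", "group"],
        "es": ["tema", "materia", "sector", "sector temático"],
        "ca": ["categoria"],
    },
    "dct:modified": {
        "en": ["last update", "change date"],
        "es": ["fecha de modificación", "fecha de actualización"],
        "de": ["Änderungsdatum", "Datum der letzten Aktualisierung"],
    },
}


def variants(prop, lang):
    """Return all known variants of a property label in one language (possibly empty)."""
    return LABELS.get(prop, {}).get(lang, [])


def preferred_label(prop, lang, fallback="en"):
    """Return the preferred label in `lang`, else in the fallback language, else the property name."""
    for candidate in (lang, fallback):
        found = variants(prop, candidate)
        if found:
            return found[0]
    return prop


# A portal generating its pages automatically could look labels up like this:
print(preferred_label("dct:modified", "de"))
print(preferred_label("dcat:theme", "es"))
```

With a shared table of this kind, two Spanish-language portals would display the same label for dcat:theme, while a search component could still recognize the other variants.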
6 Approaches for the representation of culturally influenced elements in ontologies
Closely related to the approaches proposed to enrich ontologies and RDF vocabularies with
multilingual linguistic information is the issue of capturing culturally bound classes and
properties in ontologies. As mentioned in issue number 5, section 4, the DCAT vocabulary does
not prescribe any categorization or taxonomy of categories or themes into which datasets can
be classified. In the catalogs analyzed we found that the categorizations of datasets showed
some differences, mainly motivated by the idiosyncrasies of the catalogs themselves and by the
culture and language in which they had been developed. It would therefore be advisable to
propose a taxonomy and to analyze which approach is the most suitable to meet the needs of most
(if not all) publishers.
Taking into account previous work on ontology localization (Montiel-Ponsoda et al. 2010,
Cimiano et al. 2010), we envision two possibilities:
1. To map the different categorizations by means of a mapping model
2. To maintain one categorization and to represent cultural issues in an external linguistic
model or as specific language modules or extensions in the ontology
The first approach allows each publisher to maintain its own categorization, with all of them
mapped or linked to a central categorization (see Figure 9, from Montiel-Ponsoda, 2011).
However, establishing the mappings may be a tough task, and scalability issues may also
appear as more and more datasets use the DCAT vocabulary.
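
The mapping approach can be sketched roughly as follows. This is an illustration only: the catalog identifiers, the local category names and the central codes (CENTRAL, MAPPINGS) are all invented for the example and do not come from any of the catalogs analyzed in this report.

```python
# Hypothetical central categorization and per-publisher mapping tables.
# Every identifier and category name below is invented for illustration.
CENTRAL = {"ENV": "Environment", "ECO": "Economy", "HEA": "Health"}

MAPPINGS = {
    "catalog-es": {"medio ambiente": "ENV", "economía": "ECO", "salud": "HEA"},
    "catalog-de": {"Umwelt": "ENV", "Wirtschaft": "ECO"},
}


def to_central(catalog, local_category):
    """Translate a publisher's local category into the central code."""
    return MAPPINGS.get(catalog, {}).get(local_category)


def equivalents(catalog, local_category):
    """Find the categories other publishers use for the same central code."""
    code = to_central(catalog, local_category)
    if code is None:
        return {}
    return {
        other: [name for name, c in table.items() if c == code]
        for other, table in MAPPINGS.items()
        if other != catalog and code in table.values()
    }
```

Here equivalents("catalog-de", "Umwelt") would point to "medio ambiente" in the Spanish catalog via the shared code. The sketch also hints at the scalability concern: each additional publisher requires its own mapping table, and every change to the central categorization has to be reflected in all of them.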
Figure 9. Mapping model
As for the second option (Figure 10), one categorization would be shared by all publishers, and
any cultural issues could be kept in the linguistic model or, if needed, specific
cultural modules could be proposed to extend the original categorization. The main advantage
of this latter approach is that it contributes to interoperability without neglecting culturally
bound issues. In the case of the DCAT vocabulary, we would be in favor of this latter option.
Figure 10. Vocabulary linked to an external model
Again, the lemon model described in section 5 (or the model that will result from the W3C
Ontology-Lexica Community Group) would solve the modelling issues involved in this latter model.
7 Conclusions and recommendations
The number of data catalogs in Europe is increasing. Lately, there has been a trend in public
administrations (regional, local, national and European) to publish government data in data
catalogs. DCAT, a vocabulary for representing metadata of data catalogs, is being developed
within the Government Linked Data W3C Working Group. Thanks to DCAT, publishers increase
discoverability and enable applications to easily consume metadata from multiple catalogs.
In this report we have presented:
- an overview of DCAT;
- a set of data portals that rely on DCAT for describing their metadata;
- an analysis of how the several portals actually use DCAT, which classes and properties
  they use, and, in particular, how they represent the themes of their catalog;
- the possible options for enriching DCAT with multilingual information, to be able to
  represent data catalogues in different languages.
Our main recommendation is to consider the multilingualism aspect in any vocabulary, since, on
the one hand, it may contribute to its global adoption, and, on the other, it may also add to
interoperability. In this respect we have proposed lemon, a model for the representation of
linguistic information relative to an ontology or RDF vocabulary that is currently being reviewed
for standardization purposes.
Ideally, multilingualism should be considered as early as possible, so that the specificities of certain
languages can be addressed from the start. This would also allow for a prescriptive
approach, in which publishers are told which labels to use in each case. However, the process
rarely follows this order. As vocabularies gain popularity, their adoption increases and
multilingual needs appear in order to support interoperability. In fact, widespread adoption comes first,
and only then does one realize the benefits of the multilingual aspect. For these reasons, models such as
lemon make it possible to maintain the model or vocabulary as it is, and to enrich it with multilingual
information at any stage of the process. In the specific case of the DCAT vocabulary, and taking
into account its general adoption, the next step would involve an analysis of the catalogs and
portals that implement it to identify the labels used by the various publishers in different
languages. All those labels or, preferably, the ones that best express the meaning of the
vocabulary terms should be captured in the linguistic model and recognized as preferred labels
in each language. The benefit of this approach is that the model would take advantage of labels
(variants or translations) that are popular and accepted by publishers, and would not impose
the use of labels that may end up not being meaningful for users. The model would also
leave the door open for new linguistic needs without interfering with the original vocabulary.
Moreover, we believe that the enrichment should follow a conciliatory approach in which
different options are welcomed and integrated, and in which different communities can
participate in proposing terms and translations in their own languages, thus building it in a
cooperative way. All in all, the enrichment of the vocabulary with multilingual linguistic
information would contribute to a wider adoption and increased understanding and
interoperability.
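
The label analysis proposed above could start from something as simple as the following sketch. It assumes that the labels used by each catalog have already been collected as (catalog, term, language, label) tuples; the sample data in OBSERVED is invented. Using frequency as the criterion for the preferred label is also an assumption: it is only a proxy for "the labels that best express the meaning", which would ultimately require human judgment.

```python
from collections import Counter

# Hypothetical observations: (catalog, vocabulary term, language, label used).
# The catalog identifiers and the distribution of labels are invented.
OBSERVED = [
    ("catalog-a", "dcat:theme", "es", "tema"),
    ("catalog-b", "dcat:theme", "es", "materia"),
    ("catalog-c", "dcat:theme", "es", "tema"),
    ("catalog-a", "dct:modified", "es", "fecha de actualización"),
    ("catalog-d", "dct:modified", "de", "Änderungsdatum"),
]


def preferred_labels(observations):
    """Return {(term, lang): (preferred label, all variants ranked)}."""
    counts = {}
    for _catalog, term, lang, label in observations:
        counts.setdefault((term, lang), Counter())[label] += 1
    result = {}
    for key, counter in counts.items():
        # Most frequent label first; ties are broken alphabetically so the
        # outcome does not depend on the order of the observations.
        ranked = sorted(counter, key=lambda lab: (-counter[lab], lab))
        result[key] = (ranked[0], ranked)
    return result
```

The less frequent labels are kept as ranked variants rather than discarded, in line with the conciliatory approach described above: communities could later promote a different variant to preferred status without touching the vocabulary itself.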
About the Authors
Elena Montiel-Ponsoda is Lecturer at the Universidad Politécnica de Madrid, in Madrid, Spain,
and a member of the Ontology Engineering Group at the same university. She received her M.A. in
Conference Interpreting and Translation (September 2000) from Universidad de Alicante, her B.A.
in Technical Interpreting (February 2003) from Hochschule Magdeburg-Stendal, Germany, and her
PhD in Applied Linguistics (January 2011) from Universidad Politécnica de Madrid. Her current
research activities include, among others, Terminology and Translation in the field of
Information Technology and Natural Language Processing (NLP), in which she has participated in
different international projects concerning terminology, ontologies and multilingualism and its
application to the Semantic Web. She has published the book "Multilingualism in Ontologies.
Building Patterns and Representation Models", and numerous papers in journals, conferences
and workshops in the areas of Applied Linguistics, Semantic Web, and NLP.
Boris Villazón-Terrazas is Linked Data Researcher Manager at iSOCO. He holds a PhD in Artificial
Intelligence from Universidad Politécnica de Madrid. He has previously worked as a Post-Doc at
the Ontology Engineering Group. Before that, he was a researcher and software developer at the
Research Institute of Informatics at the Universidad Católica Boliviana San Pablo. His research
interests are focused on Linked Data, Semantic Web and Ontology Engineering, among others.
He has participated in several European research projects such as KnowledgeWeb, OntoGrid,
SEEMP, NeOn, SemsorGrid4Env, PlanetData, and Parlance, as well as in national projects such as
Reimdoc, Servicios Semánticos, Plata, Gis4Gov, WebN+1, Buscamedia and Ciudad2020.
Moreover, he has led Spanish Linked Data initiatives such as GeoLinkedData,
datos.bne.es, AEMET Linked Data, and El Viajero. Finally, he has published more than 40 papers
in journals, conferences and workshops, and he is currently participating actively in the
RDB2RDF and Government Linked Data W3C Working Groups.
References
1 Maali, F., Cyganiak, R., & Peristeras, V. (2010). Enabling Interoperability of Government
Data Catalogues. Electronic Government, 10th International Conference.
2 Maali, F., Erickson, J., & Archer, P. (2013). Data Catalog Vocabulary (DCAT), W3C Last Call Working Draft.
3 Buitelaar, P., Declerck, T., Frank, A., Kiesel, M., Sintek, M., Romanelli, M., Sonntag, D., Loos, B.,
Micelli, V., & Porzel, R. (2006). LingInfo: Design and Applications of a Model for the
Integration of Linguistic Information in Ontologies. In Proceedings of OntoLex 2006.
4 Cimiano, P., Haase, P., Herold, M., Mantel, M., & Buitelaar, P. (2007). LexOnto: A Model for
Ontology Lexicons for Ontology-based NLP. In Proceedings of the OntoLex07 Workshop
at the ISWC07.
5 Peters, W., Montiel-Ponsoda, E., Aguado de Cea, G., & Gómez-Pérez, A. (2007).
Localizing ontologies in OWL. In From text to knowledge, the lexicon/ontology interface,
proceedings of the OntoLex07 workshop. Busan, South Korea.
6 Montiel-Ponsoda, E., Aguado de Cea, G., Gómez-Pérez, A., & Peters, W. (2010). Enriching
Ontologies with Multilingual Information. Journal of Natural Language Engineering, 17(3),
283-309.
7 Buitelaar, P., Cimiano, P., Haase, P., & Sintek, M. (2009). Towards linguistically grounded
ontologies. In Proceedings of the 6th European Semantic Web Conference (ESWC09), 111-125.
8 Cimiano, P., Montiel-Ponsoda, E., Buitelaar, P., Espinoza, M., & Gómez-Pérez, A. (2010).
A note on ontology localization. Journal of Applied Ontology, 5(2), 127-137.
9 McCrae, J., Aguado-de-Cea, G., Buitelaar, P., Cimiano, P., Declerck, T., Gómez-Pérez, A.,
Gracia, J., Hollink, L., Montiel-Ponsoda, E., Spohr, D., & Wunner, T. (2011). Interchanging
lexical resources on the Semantic Web. Language Resources and Evaluation, 46, 701-719.
10 Montiel-Ponsoda, E. (2011). Multilingualism in Ontologies. Building Patterns and
Representation Models. LAP-Lambert Academic Publishing.
Copyright information
© 2013 European PSI Platform. This document and all material
therein have been compiled with great care. However, the author, editor and/or publisher and/or
any party within the European PSI Platform or its predecessor projects, the ePSIplus Network
project or ePSINet consortium, cannot be held liable in any way for the consequences of using
the content of this document and/or any material referenced therein. This report has been
published under the auspices of the European Public Sector Information Platform.
The report may be reproduced provided that acknowledgement is made to the European Public
Sector Information (PSI) Platform.