
ePSIplatform Topic Report No. 2013/09, September 2013

European Public Sector Information Platform

Topic Report No. 2013/09

Impact of Standards in European Open Data Catalogues. A Multilingual perspective of DCAT

Authors: Elena Montiel-Ponsoda, Boris Villazón-Terrazas

Published: September 2013


Table of Contents

Keywords
Abstract/Executive Summary
1 Introduction
2 DCAT Overview
3 DCAT compliant data catalogs
4 Some issues related with the current use of the DCAT vocabulary: a language perspective
5 Enriching RDF vocabularies with multilingual information
6 Approaches for the representation of culturally influenced elements in ontologies
7 Conclusions and recommendations
About the Authors
References
Copyright information


Keywords:

Data catalogs, DCAT, Public Sector Information, Multilinguality

Abstract/Executive Summary:

Several countries around the world are establishing data platforms. Within the European Union, member states are setting up official data catalogues as entry points that simplify public access to each country's PSI (Public Sector Information). In this context, it is important to describe the metadata of these data portals, i.e., data catalogs, and to allow interoperability among them. To tackle these issues, the Government Linked Data Working Group is developing DCAT (Data Catalog Vocabulary), an RDF vocabulary for describing the metadata of data catalogs. This topic report describes the advantages of having a standard for data catalogs and analyses DCAT from a multilingual perspective.


1 Introduction

In recent years, data has often been called the new oil. Indeed, just like oil, it needs to be discovered, extracted from its sources, and refined from raw material into products with high added value. Following this trend, many national, regional and local governments, as well as other organizations inside and outside the public sector, are operating data catalog web portals that provide access to machine-readable public data published by these organizations. The need for a standard format to represent the metadata contained in these catalogs has been recognized (Maali et al., 2010) as a way to improve interoperability and data exchange, and to prevent catalogs from ending up as data silos.

In this line, the W3C Government Linked Data Working Group[1] is developing DCAT (Data Catalog Vocabulary), an RDF vocabulary designed to facilitate interoperability between data catalogs published on the Web (Maali et al., 2013). DCAT was first developed and published by DERI[2] and has seen widespread adoption at the time of this publication. The original vocabulary was further developed by the eGov Interest Group[3], before being brought onto the Recommendation Track by the Government Linked Data (GLD) Working Group.

By using DCAT to describe datasets in data catalogs, publishers increase discoverability and enable applications to easily consume metadata from multiple catalogs. DCAT further enables decentralized publishing of catalogs and facilitates federated dataset search across sites. Aggregated DCAT metadata can also serve as a manifest file to facilitate digital preservation.

Within the European context, there are more than 70 official data catalogues[4] and 23 officially recognized languages. Our claim is therefore that multilingualism is an important feature that vocabularies have to take into account. This report analyzes DCAT from a multilingual perspective and proposes two options for including multilingualism in a given vocabulary.

[1] http://www.w3.org/2011/gld/charter

[2] http://deri.ie/

[3] http://www.w3.org/2009/06/eGov/ig-charter.html

[4] A list of official data catalogues in EU27 member states can be found at http://datacatalogs.org/group/eu-official


2 DCAT Overview

DCAT is an RDF vocabulary well suited for representing data catalogs. It defines three main classes (see Figure 1):

- dcat:Catalog, a class that defines a curated collection of metadata about datasets. This class is described by the following properties: title, description, date of formal publication of the catalog, most recent date of modification of the catalog, language of the catalog, a link to the license document under which the catalog is made available, the rights under which the catalog can be used/reused, the spatial coverage of the catalog, and a link to the homepage of the catalog (see Figure 1).

- dcat:Dataset, defined as a collection of data, published or curated by a single agent, which is available for access or download in one or more formats. As can be seen in Figure 1, numerous properties describe this class: title, description, date of formal issuance and most recent date of modification, a unique identifier of the dataset, a keyword or tag describing the dataset, the language of the dataset, the temporal period that the dataset covers, the spatial coverage of the dataset, the frequency at which the dataset is published and, finally, the landing page, i.e., a Web page that can be navigated to in a Web browser to gain access to the dataset, its distributions and/or additional information.

- dcat:Distribution, a class that represents a specific available form of a dataset (for example, a downloadable file); datasets are linked to their distributions through the dcat:distribution property. Distributions are described by properties such as title, description, date of formal publication and most recent date of modification, a link to the license document under which the distribution is made available, a URL that gives access to a distribution of the dataset, a direct link to a downloadable file in a given format, the media type of the distribution, the file format of the distribution, and the size of the distribution in bytes.

Moreover, through the dcat:theme property, DCAT relies on the class skos:Concept to classify datasets according to a set of domains or categories, which in turn are organized in a taxonomy (a skos:ConceptScheme) that represents the themes/categories of datasets in the catalog (dcat:themeTaxonomy).
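To make the three classes and the links between them concrete, the following is a minimal sketch, using only the Python standard library, that writes one catalog, one dataset and one distribution as Turtle. Everything under the example.org namespace, the titles, the keyword and the theme are hypothetical placeholders; the class and property names are DCAT, Dublin Core terms and SKOS terms. It illustrates the structure and is not a complete or validated DCAT record.

```python
# Sketch: a minimal DCAT description serialized as Turtle with plain Python.
# Every ex: URI and every literal value is a hypothetical placeholder.

PREFIXES = {
    "dcat": "http://www.w3.org/ns/dcat#",
    "dct": "http://purl.org/dc/terms/",
    "skos": "http://www.w3.org/2004/02/skos/core#",
    "ex": "http://example.org/",
}


def lit(text, lang=None):
    """Return a Turtle string literal, language-tagged when `lang` is given."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"@{lang}' if lang else f'"{escaped}"'


# (subject, predicate, object) triples; "a" is Turtle shorthand for rdf:type.
TRIPLES = [
    # The catalog and the taxonomy (a SKOS concept scheme) of its themes
    ("ex:catalog", "a", "dcat:Catalog"),
    ("ex:catalog", "dct:title", lit("Example city open data", "en")),
    ("ex:catalog", "dcat:themeTaxonomy", "ex:themes"),
    ("ex:catalog", "dcat:dataset", "ex:hostels"),
    ("ex:themes", "a", "skos:ConceptScheme"),
    ("ex:tourism", "a", "skos:Concept"),
    ("ex:tourism", "skos:inScheme", "ex:themes"),
    # One dataset, classified by a theme and described by a keyword
    ("ex:hostels", "a", "dcat:Dataset"),
    ("ex:hostels", "dct:title", lit("Hostels", "en")),
    ("ex:hostels", "dcat:keyword", lit("accommodation", "en")),
    ("ex:hostels", "dcat:theme", "ex:tourism"),
    ("ex:hostels", "dcat:distribution", "ex:hostels-csv"),
    # One downloadable form (distribution) of that dataset
    ("ex:hostels-csv", "a", "dcat:Distribution"),
    ("ex:hostels-csv", "dcat:downloadURL", "<http://example.org/files/hostels.csv>"),
    ("ex:hostels-csv", "dcat:mediaType",
     "<http://www.iana.org/assignments/media-types/text/csv>"),
]


def to_turtle(triples):
    """Serialize prefixed-name triples as one Turtle statement per line."""
    lines = [f"@prefix {p}: <{uri}> ." for p, uri in PREFIXES.items()]
    lines.append("")
    lines += [f"{s} {p} {o} ." for s, p, o in triples]
    return "\n".join(lines)


print(to_turtle(TRIPLES))
```

One statement per line is verbose, but it keeps each class/property pair from Figure 1 visible on its own line; a real publisher would normally generate this with an RDF library rather than by string formatting.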


Figure 1. DCAT main classes (Maali et al., 2013)

The rest of the document is organized as follows. In section 3 we provide an analysis of some data catalogs that are annotated with the DCAT vocabulary and contain data in several languages. The analysis provides interesting insight into the actual use of the vocabulary and highlights some issues related to its use in multilingual settings or by publishers whose main language is not English. These issues are summarized, exemplified and discussed in section 4. Then, in section 5 we present several modeling options for enriching RDF vocabularies, such as DCAT, with multilingual information, which would help solve some of the problems of its current use in multilingual settings. Finally, in section 6 we present approaches for the representation of culturally influenced elements in vocabularies.


3 DCAT compliant data catalogs

In order to assess the current use of the DCAT vocabulary in European public data catalogs, we first analyzed in detail several catalogs that make use of this vocabulary. Specifically, the catalogs used in our analysis are:

- PublicData.eu: Europe's public data[5]
- The data catalog of the Local Government of Gijón[6], in Spain.
- Gencat, the data catalog of the Regional Government of Catalonia[7], in Spain.
- The data catalog of the Local Government of Zaragoza[8], in Spain.

The selection of these catalogs was motivated by a previous study carried out by DERI (the first developers and publishers of the DCAT vocabulary), which resulted in a list of data catalogs currently making use of this vocabulary. The list, which was made available to us and contains nine catalogs in total, is included below for convenience (see Table 1).

[5] http://publicdata.eu/

[6] http://datos.gijon.es/

[7] http://www20.gencat.cat/portal/site/dadesobertes

[8] http://www.zaragoza.es/ciudad/risp/


Table 1: DCAT-compliant data catalogs

| Catalog | Website | Type (national, local, regional) | DCAT available at/as | # datasets | Theme? | Keywords? | Publisher | Extent (spatial) | Distribution format | Distribution size |
|---|---|---|---|---|---|---|---|---|---|---|
| Semantic CKAN | http://semantic.ckan.net | Aggregate | SPARQL endpoint http://semantic.ckan.net/sparql | 27592 | No | Yes (represented using moat:taggedWithTag) | No | No | Yes | No |
| Publicdata.eu | http://publicdata.eu | Aggregate | A list of RDF files http://rdf.opendatasearch.org/ | 11655 | Yes, but each aggregated catalog has its own hierarchy, i.e. themes are not reconciled | Yes | Yes (though uses dc:creator) | Yes | Yes | No |
| Barcelona | http://w20.bcn.cat/opendata/ | City | RDF dump http://w20.bcn.cat/opendata/CatalegRDF.aspx | 501 | No | Yes | Yes | No | Yes | No |
| Gijón | http://datos.gijon.es | City | An RDF file per dataset | 29 | No | Yes (though wrongly uses dc:keyword) | Yes | No | Yes | Yes |
| Catalonia | http://dadesobertes.gencat.cat | Regional | RDF dump http://dadesobertes.gencat.cat/recursos/datasets/cataleg.rdf | 127 | Yes (though uses dc:subject) | Yes | Yes | Yes | Yes | Yes |
| Balearic Islands | http://www.caib.es/caibdatafront | Regional | RDF dump http://dadesobertes.caib.es/recursos/datasets/catalog/rdf | 28 | Yes (though uses dc:subject) | No | Yes (though wrongly uses dc:editor) | No | Yes (though in a wrong way) | Yes (though in a wrong way) |
| Saragossa | http://datos.zaragoza.es | City | SPARQL endpoint at http://www.zaragoza.es/datosabiertos/sparql | 271 | Yes (though uses dc:subject) | Yes | Yes (wrongly uses dc:publiser with no label or any other property; also uses the non-existing dc:editor) | Yes | Yes | No |
| Fingal | http://data.fingal.ie | Regional | By third party (DERI) | 73 | Yes | No | Yes | Yes | Yes | No |
| lab.linkeddata.deri.ie | http://lab.linkeddata.deri.ie/govcat | Aggregate | SPARQL endpoint at http://lab.linkeddata.deri.ie/govcat/sparql | 1710 | Yes | Yes | Yes | Yes | Yes | Yes |

In the following, we provide a brief description of the selected catalogs, focusing on the purpose of each catalog, the language of the portal, the language of the datasets, and the search options.

The PublicData.eu portal (Figure 2) aims to provide access to open datasets from local, regional and national public bodies across Europe. Apart from being a single point of access to datasets scattered across Europe, it enables users to browse datasets by region, subject matter and format. Although the portal language is English, the PublicData.eu catalog can be considered multilingual in the sense that it contains datasets originating from several European countries, which contain information in various natural languages. The datasets come from the UK, France, Spain, Austria, Denmark, Italy, the Czech Republic, the Netherlands, Germany, Russia, Sweden and Ireland. In fact, an additional browse option of the portal is by country of origin (see Figure 3).

Figure 2. Snapshot of the PublicData.eu portal

Figure 3. PublicData.eu browse option by dataset language

The next catalog, that of the Local Government of Gijón, a city in the north of Spain, contains around 400 datasets in Spanish managed by the local government. The main purpose of publishing these datasets in RDF is to help improve citizen participation, promote innovation and give companies the opportunity to create added value, so that both citizens and market players can benefit from public information and offer new products and services. The portal is available only in Spanish, and search results can be filtered and customized by keyword, category and format, as can be seen in Figure 4.

Figure 4. Browse options in the portal of the Local Government of Gijón

Gencat is the third catalog we studied (see Figure 5). Most of the datasets in this portal of the Government of Catalonia, in Spain, are in Catalan, which is the official language of this Spanish region together with Spanish. The portal is available in English, Spanish and Catalan. According to the information in the portal, this catalog comprises a database of the 26,000 official facilities of Catalonia, the 1,400 procedures handled in the Government's offices, and some of its multimedia archives. The browse options are by category (called tags in the English version of the portal), format, and data source, i.e., the bodies that created the corresponding datasets.


Figure 5. Snapshot of the search options in the English version of Gencat

The fourth and last catalog considered in this study is the data catalog of the Local Government of Zaragoza, also in Spain (Figure 6). The portal is only available in Spanish, and the information in the datasets it contains is also in Spanish. As in the preceding catalogs, the purpose of this portal is to give citizens access to public data and to foster the reuse of that information. The browse and search options are category, update frequency of the dataset (yearly, quarterly, monthly, daily, instantly), format and keywords.

Figure 6. Portal of the data catalog of the Local Government of Zaragoza


4 Some issues related with the current use of the DCAT vocabulary: a language perspective

The next step in our analysis was to access some of the datasets contained in the different catalogs that are available in RDF and annotated with the DCAT vocabulary, and to look into the use they make of the DCAT classes and properties. The main conclusions of this study are discussed below.
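Part of this inspection can be automated. The following hypothetical lint, in plain Python, flags some of the deviations reported in Table 1 (for example dc:subject, dc:keyword or dc:editor used in place of DCAT terms) and the use of a catalog-level property on a dataset. The substitution table covers only cases named in this report, and the sample triples are invented; a real check would need an RDF parser and the full DCAT specification.

```python
# Hypothetical lint sketch for the kinds of deviations discussed in this
# section. SUGGESTIONS only lists substitutions reported in Table 1 and in the
# findings of this section; it is not a complete DCAT validator.

SUGGESTIONS = {
    "dc:keyword": "dcat:keyword",
    "moat:taggedWithTag": "dcat:keyword",
    "dc:subject": "dcat:theme",
    "dct:creator": "dct:publisher",  # when it is meant to name the publisher
    "dc:creator": "dct:publisher",
    "dc:editor": "dct:publisher",
    "dc:mediaTypeorExtent": "dcat:mediaType",
}

# Properties treated here as belonging to dcat:Catalog, mapped to the
# dataset-level DCAT property to use instead.
CATALOG_LEVEL = {"foaf:homepage": "dcat:landingPage"}


def lint(triples):
    """Return human-readable warnings for (subject, predicate, object) triples."""
    typed = {(s, o) for s, p, o in triples if p == "a"}
    warnings = []
    for s, p, _ in triples:
        if p in SUGGESTIONS:
            warnings.append(f"{s}: {p} found, consider {SUGGESTIONS[p]}")
        elif p in CATALOG_LEVEL and (s, "dcat:Dataset") in typed:
            warnings.append(
                f"{s}: {p} describes a dcat:Catalog; use {CATALOG_LEVEL[p]} on datasets"
            )
    return warnings


# Hypothetical triples modelled on the PublicData.eu dataset analyzed in this section
sample = [
    ("ex:ifsi", "a", "dcat:Dataset"),
    ("ex:ifsi", "dct:creator", "ex:agency"),
    ("ex:ifsi", "foaf:homepage", "<http://example.org/ifsi>"),
    ("ex:ifsi", "dcat:keyword", '"soins infirmiers"@fr'),
]
for warning in lint(sample):
    print(warning)
```

The catalog-level check looks at the subject's rdf:type, so foaf:homepage on a dcat:Catalog passes while the same property on a dcat:Dataset is reported.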

1. Some datasets are not using the latest version of the DCAT vocabulary. For example, the dataset "List des IFSI en Ile de France", contained in the PublicData.eu catalog, uses the properties dct:creator and foaf:name to refer to the publisher of the dataset, instead of the dct:publisher property and the foaf:Agent class defined by the current version of the DCAT vocabulary.

A similar example is found in the catalog of the Local Government of Gijón. In a dataset of hostels, we find the foaf:Organization class instead of foaf:Agent when defining the publisher of the dataset, and dc:mediaTypeorExtent instead of the dcat:mediaType property defined in the current version of the vocabulary.

2. Some datasets make free use of the DCAT vocabulary, i.e., they are not fully compliant with DCAT. By this we mean that they use properties of a certain class in the description of another class. For instance, in the same PublicData.eu dataset mentioned above, "List des IFSI en Ile de France", foaf:homepage is used as a property of the class dcat:Dataset, i.e., it describes the dataset, whereas according to the DCAT vocabulary it should be a property of the class dcat:Catalog (for datasets, DCAT provides the landing page property, dcat:landingPage).

3. Another remarkable aspect of the analyzed datasets is that they do not use the same amount or type of metadata. This may be reasonable to some extent, since each publisher can decide which elements of the vocabulary cover the needs of his or her catalog. Most catalogs use the descriptive information relative to the dataset, such as title, description, date of issue or date of modification, as well as information related to the distribution of the dataset (see the descriptive properties of the dcat:Distribution class in section 2). However, very few contain information about the catalog itself, the catalog record (dcat:CatalogRecord), or the theme and theme taxonomy used by the catalog or dataset in question.

4. When inspecting the RDF code of the datasets, we realized that ALL catalogs reused the DCAT vocabulary as it is, i.e., with the labels for classes and properties in English, as defined by the authors of the vocabulary. None of the publishers translated the DCAT vocabulary itself into their own language, even when the actual data or information in the datasets was in a language other than English.

This is the common choice when the ontology or vocabulary is shareable and valid for different cultures. By this we mean that a certain conceptual organization (i.e., the classes and properties that make up an ontology or vocabulary and the way in which they are organized) is universal, in the sense that it does not solely reflect the needs of a certain culture or how a certain culture approaches a particular area of knowledge, but is valid for, or translatable to, other cultures. In fact, the set of classes and properties proposed in the DCAT vocabulary is general enough to be accepted by any publisher. Obviously, some properties that are relevant for one publisher may not be relevant for another. For example, the language property (dct:language) describing the dcat:Dataset class is highly relevant for the PublicData.eu catalog, since it contains datasets in several natural languages. This is not the case for the catalog of the Local Government of Zaragoza: all its datasets contain information in the same language, Spanish, so this property does not deserve special mention there.

However, what we observe in the portals analyzed is that publishers in countries where English is not the native language have provided different translations for the classes or properties of the DCAT vocabulary that are shown to the user on the web page. For example, in the catalog of the Local Government of Zaragoza, one of the search options is by Materias (Subjects) or Temas (Topics), corresponding to the dcat:theme property, whereas the Local Government of Gijón has translated this as Sectores (Sectors) or Sectores temáticos (Domain areas). In Gencat, the catalog of the Government of Catalonia, these are Categories (categories), and in PublicData.eu they are dubbed Groups.


It could be said that materia, tema, sector, sector temático, categoria and group are synonyms or term variants that can be associated with the same concept. One could also argue that each publisher can translate the names of these metadata elements as he or she wishes, as long as the type of information they provide is the appropriate one, and we agree with this. Nevertheless, when searching different catalogs in the same language, say Spanish, one would expect to find the same information labeled in the same way. We believe this would avoid confusion and simplify the search.

For this reason, and also for tasks such as the automatic generation of web pages from ontologies or vocabularies, we would be in favor of proposing official translations of the DCAT vocabulary, which could be enriched with as many variants as needed to cover the needs of all publishers (see section 5 for more details on this).

5. The last issue we came across regarding the use of DCAT by different publishers in Europe is that they also categorize datasets differently. The authors of DCAT do not prescribe the topic or category schema that should be followed when using this vocabulary. They only determine that the property dcat:theme be linked to a skos:Concept, which in turn is included in a skos:ConceptScheme. Because of this, each publisher has adopted a different categorization or taxonomy of categories to classify datasets. We had the chance to ask some of the publishers, and they said that they had simply adopted the categorization that the different public bodies already used on their respective web pages.

By way of example, we include below the classification followed in the PublicData.eu portal (Figure 7), and the taxonomy used in the Gencat portal from the Government of Catalonia (see Table 2).


Figure 7. Classification of datasets on the PublicData.eu portal


Table 2. Taxonomy used in the Gencat portal from the Government of Catalonia

| Theme | Includes | Sector ID |
|---|---|---|
| Administració pública | Oposicions, Publicacions, Recerca, estudis i anàlisis, Altres | administracio |
| Agricultura, ramaderia i pesca | Agricultura, Foment de la producció, Infraestructures rurals, Pesca. Aqüicultura, Recerca, estudis i anàlisis | agr-ram-pesca |
| Associacionisme i participació | Civisme, Cooperació al desenvolupament, Entitats, Equipaments, Participació ciutadana, Pau i drets humans, Voluntariat | participacio |
| Comerç i consum | Consum | comerc |
| Economia | Assegurances, Defensa competència, Deutes i sancions, Pagaments, Tributs | economia |
| Educació i formació | AMPA, Educació en el lleure, Educació infantil, Educació secundària obligatòria, Estudiar a l'estranger, Formació adults, Formació per a docents, Formació professional, Idiomes, Material didàctic, Mobilitat educativa, Preinscripció i matriculació, Proves d'accés, Suport a l'alumnat, Altres, Oposicions | educacio |
| Cultura | Arts escèniques, Arts visuals, Arxius i biblioteques, Cinema i audiovisuals, Cultura popular, Lletres, Memòria històrica, Museus, Música, Patrimoni, Recerca, estudis i anàlisis, Altres | cultura |
| Esports, lleure i oci | Caça, Esports, Jocs i espectacles, Nàutica i busseig, Pesca, Proves esportives, Vacances i estades | esports-oci |
| Indústria i energia | Estalvi energètic | industria-energia |
| Habitatge | Compra, Lloguer, Protecció oficial, Rehabilitació, Altres | habitatge |
| Justícia | Acadèmies, Associacions, Col·legis professionals, Federacions, Fundacions, Gestors administratius, Oposicions, Recerca, estudis i anàlisis | justicia |
| Llengua i comunicació | Occità, Llengua catalana, Mitjans de comunicació | comunicacio |
| Medi Ambient | Aigua, Ecologia, Estalvi energètic, Flora i fauna, Sostenibilitat, Altres | medi-ambient |
| Mobilitat i transport | Trànsit, Transports | transport |
| Salut | Assistència sanitària, Foment salut, Higiene, Recerca, estudis i anàlisis | salut |
| Serveis Socials | Cohesió social, Dependència, Discapacitat, Altres | serveis-socials |
| Societat, ciutadania i famílies | Adopció i acolliment, Afers exteriors, Afers religiosos, Dones, Gais, lesbianes i transsexuals, Gent gran, Igualtat, Immigració, Infants i famílies, Joves, Recerca, estudis i anàlisis, Altres | societat |
| Tecnologia. Recerca i Innovació | Foment, Noves tecnologies, Recerca, Societat de la Informació, Telecomunicacions, Innovació | tecnologia |
| Territori i paisatge. Urbanisme | Boscos, Costes, Paisatge, Ports | territori |
| Treball | Borsa de treball, Cooperatives, Emprenedoria, Formació, Igualtat d'oportunitats, Ocupació, Oposicions, Relacions laborals, Seguretat i salut laboral | treball |
| Turisme | Establiments turístics, Foment turisme, Altres | turisme |
| Universitats | Formació en empreses, Mobilitat educativa, Postgraus, doctorats i màsters, Preinscripció i matriculació, Proves d'Accés a la Universitat, Recerca, Altres | universitats |

In the Gencat taxonomy, the first column identifies the main topics; the second, the sub-topics related to the main ones; and the third, the term used to identify the topics on the web page. At first sight, one realizes that the categorizations of the PublicData.eu portal and those of Gencat do not match in coverage or granularity.

In the case of categorizations of datasets, we observe some general concepts or topics that are present in any categorization, whereas others, perhaps more culturally bound, are only present in certain categorizations. PublicData.eu proposes 14 groups, such as Agriculture, Fisheries and Forestry or Economy and Industry, whereas the Gencat catalog proposes 22 broad categories. When comparing the PublicData.eu catalog to the Gencat one, we see that there is some overlap. In both catalogs we find datasets grouped under equivalent categories: Agriculture, Fisheries and Forestry (Agricultura, ramaderia i pesca in Gencat), Culture and Arts (Cultura), Environment (Medi Ambient) and Health (Salut). In other cases, the PublicData.eu catalog merges related categories that the Gencat catalog keeps apart. This is the case of Education and Communication in the PublicData.eu catalog vs. Universitats (Universities), Tecnologia, recerca i innovació (Technology, research and innovation), Llengua i comunicació (Language and communication) and Esports, lleure i oci (Sports, free time and hobbies) in Gencat. In this sense, we can say that the taxonomy of categories used by the Gencat catalog is far more specific than that of the PublicData.eu catalog.

Since the Gencat catalog was created by the government of a region, in this case Catalonia, it contains categories that serve the purpose of the catalog, such as Administració pública (Public administration). Also closely related to the distinctive features of this region, we find the category Llengua i comunicació (Language and communication), which contains the sub-categories Occità, Llengua catalana and Mitjans de comunicació (Occitan, Catalan language, Media). Such distinctions may represent an issue for portals such as PublicData.eu, which aim to provide unified access to datasets across Europe, because they first have to analyze the categorizations made by the different portals in order to aggregate datasets under meaningful categories. Since the DCAT vocabulary prescribes no categorization or taxonomy, a tentative categorization could be proposed after analyzing several categorizations of this sort, and then extended or adapted to cover specific cultural needs. Several representational approaches could be followed for this purpose; we provide a summary of them in section 6.
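One way an aggregator might record the correspondences described in point 5, so that datasets from several portals can be filed under shared categories, is with SKOS mapping properties between the two concept schemes. The sketch below rests on several assumptions: it aligns only the Gencat sectors (Table 2 IDs) that the text explicitly relates to PublicData.eu groups, uses skos:closeMatch for the overlapping categories and skos:broadMatch where PublicData.eu merges several Gencat categories, and invents the gencat:/pdeu: prefixes (declarations omitted) and the dataset names.

```python
import re
from collections import defaultdict

# Hypothetical alignment from Gencat sector IDs (Table 2) to PublicData.eu
# groups. Only correspondences named in the text are included; the choice of
# skos:closeMatch / skos:broadMatch is an assumption of this sketch.
GENCAT_TO_PDEU = {
    "agr-ram-pesca": ("Agriculture, Fisheries and Forestry", "skos:closeMatch"),
    "cultura": ("Culture and Arts", "skos:closeMatch"),
    "medi-ambient": ("Environment", "skos:closeMatch"),
    "salut": ("Health", "skos:closeMatch"),
    # Gencat keeps these apart; PublicData.eu merges them into one broader group
    "universitats": ("Education and Communication", "skos:broadMatch"),
    "tecnologia": ("Education and Communication", "skos:broadMatch"),
    "comunicacio": ("Education and Communication", "skos:broadMatch"),
    "esports-oci": ("Education and Communication", "skos:broadMatch"),
}
UNMAPPED = "(unmapped)"


def slug(label):
    """Turn a group label into a URI-friendly local name."""
    return re.sub(r"[^a-z0-9]+", "-", label.lower()).strip("-")


def mapping_turtle():
    """One SKOS mapping statement per aligned Gencat sector (prefixes omitted)."""
    return "\n".join(
        f"gencat:{sector} {prop} pdeu:{slug(group)} ."
        for sector, (group, prop) in GENCAT_TO_PDEU.items()
    )


def aggregate(datasets):
    """Group (dataset, Gencat sector ID) pairs under PublicData.eu groups."""
    groups = defaultdict(list)
    for name, sector in datasets:
        group = GENCAT_TO_PDEU.get(sector, (UNMAPPED, None))[0]
        groups[group].append(name)
    return dict(groups)


print(mapping_turtle())
print(aggregate([
    ("museum-visitors", "cultura"),      # hypothetical dataset names
    ("catalan-courses", "comunicacio"),
    ("public-exams", "administracio"),   # culturally specific, no counterpart here
]))
```

Datasets from sectors without a mapping, such as administracio, end up in an explicit "(unmapped)" bucket, which is where a tentative shared categorization would have to be extended or adapted.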


5 Enriching RDF vocabularies with multilingual information

    As mentioned in point 4 of the previous section, to foster the use of the DCAT vocabulary at an international level, it would be advisable to provide translations of the labels that describe the DCAT classes and properties into languages other than English. One advantage of multilingual versions of this vocabulary is that publishers in countries where English is not the official language could consult these descriptions in their own language, and could also reuse these terms or labels directly in their portals or end-user applications. All portals would then use the same terms or labels, which would contribute to interoperability.
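
    A minimal sketch of what such translations could look like, assuming they are published as language-tagged `rdfs:label` values. The table `LABELS` and the functions `turtle_literal` and `labels_to_turtle` are hypothetical names of ours; the labels are variants cited in this report, not an official DCAT translation. The script only prints Turtle text and needs no RDF library.

```python
# Hypothetical multilingual labels for two terms used in DCAT catalogs.
# The labels are variants cited in this report, not an official translation.
LABELS: dict[str, dict[str, str]] = {
    "dcat:theme": {"en": "theme", "es": "tema"},
    "dct:modified": {
        "en": "last update",
        "es": "fecha de modificación",
        "de": "Änderungsdatum",
    },
}

PREFIXES = {
    "dcat": "http://www.w3.org/ns/dcat#",
    "dct": "http://purl.org/dc/terms/",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
}


def turtle_literal(text: str, lang: str) -> str:
    """Return a language-tagged Turtle literal, escaping backslashes and quotes."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"@{lang}'


def labels_to_turtle(labels: dict[str, dict[str, str]]) -> str:
    """Serialize one rdfs:label statement per term, with one literal per language."""
    lines = [f"@prefix {prefix}: <{iri}> ." for prefix, iri in PREFIXES.items()]
    lines.append("")
    for term, by_lang in labels.items():
        literals = [turtle_literal(label, lang) for lang, label in sorted(by_lang.items())]
        lines.append(f"{term} rdfs:label " + ",\n    ".join(literals) + " .")
    return "\n".join(lines)


print(labels_to_turtle(LABELS))
```

    Language tags let a portal select the label matching its interface language while every portal still refers to the same DCAT term.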

    The idea of enriching ontologies and RDF vocabularies with multilingual linguistic information is not new; it has been the object of research for a decade now. To the best of our knowledge, some of the first approaches to enriching ontologies with linguistic descriptions are LingInfo (Buitelaar et al., 2006), LexOnto (Cimiano et al., 2007), LIR, the Linguistic Information Repository (Peters et al., 2007; Montiel-Ponsoda et al., 2010), and LexInfo (Buitelaar et al., 2009; Cimiano et al., 2010). These models differ mainly in the type of linguistic descriptions they aim to account for. For instance, whereas the LingInfo model focused on representing the morphological and syntactic structure of the labels or terms describing ontology classes and properties, the LIR model focused on representing term variants and translations.

    Currently, researchers in this domain have joined forces and are working towards the standardization of a model intended to capture a wide range of linguistic descriptions relative to ontologies or RDF vocabularies: the W3C Ontology-Lexica Community Group (footnote 9). This standardization initiative has taken the lemon (LExicon Model for ONtologies) model (McCrae et al., 2011; http://lemon-model.net/) as the basis for its work, and is evolving it into a model which, combined with the semantic information captured in the ontology, aims, among other objectives, to improve the performance of NLP (Natural Language Processing) tools. Figure 8 gives an impression of the kind of linguistic descriptions that can be associated with ontology elements in the lemon model.

    9 http://www.w3.org/community/ontolex/
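
    To make the idea concrete, the sketch below mirrors the core of lemon in plain Python: a lexicon per language holds lexical entries, each entry has a canonical written form and one or more senses, and each sense refers to an ontology element by its IRI. The classes and the `add`/`lookup` methods are our own simplification for illustration (lemon itself is an RDF vocabulary, not a Python library), and the Spanish variants are those of dcat:theme and dct:modified mentioned in this report.

```python
from dataclasses import dataclass, field

DCAT = "http://www.w3.org/ns/dcat#"
DCT = "http://purl.org/dc/terms/"


@dataclass(frozen=True)
class LexicalSense:
    reference: str  # IRI of the ontology element this sense denotes


@dataclass
class LexicalEntry:
    canonical_form: str  # written representation of the term
    senses: list[LexicalSense] = field(default_factory=list)


@dataclass
class Lexicon:
    language: str  # language of all entries, e.g. "es"
    entries: list[LexicalEntry] = field(default_factory=list)

    def add(self, written: str, reference: str) -> None:
        """Attach a sense to the entry for `written`, creating the entry if needed."""
        for entry in self.entries:
            if entry.canonical_form == written:
                entry.senses.append(LexicalSense(reference))
                return
        self.entries.append(LexicalEntry(written, [LexicalSense(reference)]))

    def lookup(self, text: str) -> set[str]:
        """Return the IRIs of all ontology elements a written form may denote."""
        needle = text.casefold()
        return {
            sense.reference
            for entry in self.entries
            if entry.canonical_form.casefold() == needle
            for sense in entry.senses
        }


spanish = Lexicon("es")
for variant in ["materia", "tema", "sector", "sector temático"]:
    spanish.add(variant, DCAT + "theme")
spanish.add("fecha de modificación", DCT + "modified")

print(spanish.lookup("Tema"))  # {'http://www.w3.org/ns/dcat#theme'}
```

    The key design point is that the ontology is untouched: all language-specific material lives in the lexicon and points to the vocabulary only through sense references.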


    Figure 8. Overview of the lemon model

    As for the specific case of the DCAT vocabulary, a model such as lemon would allow term variants in different languages to be included for the classes and properties of the vocabulary. Returning to the example mentioned earlier of the several Spanish translations of the dcat:theme property (materia, tema, sector, sector temático, categoria or group), all of them could be accounted for as variants or linguistic realizations of dcat:theme. For a property such as dct:modified, we could have more readable terms or labels (last update, change date, fecha de modificación, fecha de actualización, Änderungsdatum, Datum der letzten Aktualisierung, etc.), which could then be used for the automatic generation of web pages.
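
    Generating a page from such labels could look like the following minimal sketch. It assumes a hypothetical table holding one preferred label per property and language, falls back to English when a language has no label, and finally falls back to the property name itself. `PREFERRED_LABELS`, `label_for` and `render_record` are illustrative names, and the sample record is invented.

```python
from html import escape

# Hypothetical preferred labels per property and language (variants from this report).
PREFERRED_LABELS: dict[str, dict[str, str]] = {
    "dct:modified": {
        "en": "last update",
        "es": "fecha de actualización",
        "de": "Datum der letzten Aktualisierung",
    },
    "dcat:theme": {"en": "theme", "es": "tema"},
}


def label_for(prop: str, lang: str, default_lang: str = "en") -> str:
    """Pick the label for `prop` in `lang`, else in `default_lang`, else the property name."""
    labels = PREFERRED_LABELS.get(prop, {})
    return labels.get(lang) or labels.get(default_lang) or prop


def render_record(record: dict[str, str], lang: str) -> str:
    """Render dataset metadata as an HTML definition list with localized field names."""
    rows = [
        f"  <dt>{escape(label_for(prop, lang))}</dt><dd>{escape(value)}</dd>"
        for prop, value in record.items()
    ]
    return "<dl>\n" + "\n".join(rows) + "\n</dl>"


# Invented sample record; the theme value is a display placeholder.
record = {"dct:modified": "2013-09-01", "dcat:theme": "Environment"}
print(render_record(record, "de"))
```

    With an explicit fallback chain, a page in a language with partial label coverage still renders, only with some English field names.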


    6 Approaches for the representation of culturally influenced elements in ontologies

    Closely related to the approaches proposed for enriching ontologies and RDF vocabularies with multilingual linguistic information is the issue of capturing culturally bound classes and properties in ontologies. As mentioned in issue number 5 in section 4, the DCAT vocabulary does not prescribe any categorization or taxonomy of categories or themes into which datasets can be classified. In the catalogs analyzed, we found that the categorizations of datasets showed some differences, motivated mainly by the idiosyncrasy of the catalogs themselves and by the culture and language in which they had been developed. It would therefore be advisable to propose a taxonomy and to analyze which approach best meets the needs of most (if not all) publishers.

    Taking into account previous work on ontology localization (Montiel-Ponsoda et al., 2010; Cimiano et al., 2010), we envision two possibilities:

    1. To map the different categorizations by means of a mapping model.

    2. To maintain one categorization and to represent cultural issues in an external linguistic model, or as specific language modules or extensions in the ontology.

    The first approach allows each publisher to maintain its own categorization, with all of them mapped or linked to a central categorization (see Figure 9, from Montiel-Ponsoda, 2011). However, establishing the mappings may be a tough task, and scalability issues may also appear as more and more datasets use the DCAT vocabulary.
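
    A minimal sketch of the mapping model, assuming each publisher's local categories are linked to a hypothetical central categorization through a simple lookup table (in RDF, this role could be played by SKOS mapping properties such as skos:exactMatch or skos:broadMatch). The Gencat category names come from section 4; "portal-b" and its categories, the central category identifiers and the dataset ids are invented for illustration.

```python
from collections import defaultdict

UNMAPPED = "unmapped"

# Hypothetical central categorization shared by the aggregator.
CENTRAL = {"public-administration", "language-and-communication"}

# (publisher, local category) -> central category.
# Gencat names come from the report; "portal-b" and its category are invented.
MAPPINGS: dict[tuple[str, str], str] = {
    ("gencat", "Administració pública"): "public-administration",
    ("gencat", "Llengua i comunicació"): "language-and-communication",
    ("gencat", "Occità"): "language-and-communication",  # narrower local category
    ("portal-b", "Government"): "public-administration",
}

# Every mapping must point into the central categorization.
assert set(MAPPINGS.values()) <= CENTRAL


def aggregate(datasets: list[tuple[str, str, str]]) -> dict[str, list[str]]:
    """Group (dataset id, publisher, local category) triples under central categories."""
    groups: dict[str, list[str]] = defaultdict(list)
    for dataset_id, publisher, local_category in datasets:
        central = MAPPINGS.get((publisher, local_category), UNMAPPED)
        groups[central].append(dataset_id)
    return dict(groups)


datasets = [
    ("ds1", "gencat", "Occità"),
    ("ds2", "portal-b", "Government"),
    ("ds3", "portal-b", "Local events"),  # no mapping established yet
]
print(aggregate(datasets))
```

    The table needs one entry per local category per publisher, which is where the scalability concern comes from: every new portal adds a new batch of mappings to establish and maintain, and anything not yet mapped falls out of the aggregated view.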


    Figure 9. Mapping model

    As for the second option (Figure 10), one categorization would be shared by all publishers; cultural issues could be kept in the linguistic model or, if needed, specific cultural modules could be proposed to extend the original categorization. The main advantage of this latter approach is that it contributes to interoperability without neglecting culturally bound issues. In the case of the DCAT vocabulary, we would favor this latter option.

    Figure 10. Vocabulary linked to an external model

    Again, the lemon model described in section 5 (or the model that will result from the W3C Ontology-Lexica Community Group) would address the modelling issues involved in this latter approach.
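
    The second option could be sketched as follows, under the assumption that the shared categorization is a flat set of concepts and that a cultural module only adds narrower concepts pointing to a broader one (as skos:broader would in RDF). Labels are held in a separate table standing in for the linguistic model. All identifiers are hypothetical; the Catalan sub-categories echo the Gencat example from section 4.

```python
# Shared categorization used by all publishers (hypothetical identifiers).
CORE = {"public-administration", "language-and-communication"}

# Hypothetical Catalan cultural module: each concept points to a broader one,
# as skos:broader would in RDF. Mirrors the Gencat sub-categories.
CATALAN_MODULE = {
    "occitan": "language-and-communication",
    "catalan-language": "language-and-communication",
    "media": "language-and-communication",
}

# Stand-in for the linguistic model: labels are kept outside the taxonomy.
CATEGORY_LABELS = {
    ("language-and-communication", "ca"): "Llengua i comunicació",
    ("language-and-communication", "en"): "Language and communication",
    ("occitan", "ca"): "Occità",
    ("occitan", "en"): "Occitan",
}


def to_core(concept: str, module: dict[str, str]) -> str:
    """Follow broader links until a concept of the shared categorization is reached."""
    seen: set[str] = set()
    while concept not in CORE:
        if concept in seen or concept not in module:
            raise ValueError(f"{concept!r} does not lead to the shared categorization")
        seen.add(concept)
        concept = module[concept]
    return concept


def label(concept: str, lang: str) -> str:
    """Look up a label in the linguistic model, falling back to the identifier."""
    return CATEGORY_LABELS.get((concept, lang), concept)


core = to_core("occitan", CATALAN_MODULE)
print(label("occitan", "ca"), "->", label(core, "en"))
```

    Because every module concept resolves to a shared concept, an aggregator can still group datasets under the common categorization, while the cultural distinction remains available to the publishers that need it.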


    7 Conclusions and recommendations

    The number of data catalogs in Europe is increasing. Lately, there has been a trend among public administrations (regional, local, national and European) to publish government data in data catalogs. DCAT, a vocabulary for representing metadata of data catalogs, is being developed within the W3C Government Linked Data Working Group. With DCAT, publishers increase discoverability and enable applications to easily consume metadata from multiple catalogs.

    In this report we have presented:

    - an overview of DCAT;

    - a set of data portals that rely on DCAT to describe their metadata;

    - an analysis of how the various portals actually use DCAT, which classes and properties they use and, in particular, how they represent the themes of their catalogs;

    - the possible options for enriching DCAT with multilingual information, so that data catalogs can be represented in different languages.

    Our main recommendation is to take multilingualism into account in any vocabulary since, on the one hand, it may contribute to the vocabulary's global adoption and, on the other, it may also add to interoperability. In this respect, we have proposed lemon, a model for the representation of linguistic information relative to an ontology or RDF vocabulary that is currently being reviewed for standardization purposes.

    Ideally, multilingualism should be considered as early as possible, so that the specificities of certain languages can be addressed from the start. This would also allow for a prescriptive approach, in which publishers are told which labels to use in each case. However, the process rarely follows this order. As vocabularies gain popularity, their adoption increases and multilingual needs appear in support of interoperability. In fact, widespread adoption comes first, and only then does one realize the benefits of the multilingual aspect. For these reasons, models such as lemon allow the model or vocabulary to be kept as it is and enriched with multilingual information at any stage of the process. In the specific case of the DCAT vocabulary, and taking into account its general adoption, the next step would involve an analysis of the catalogs and portals that implement it, to identify the labels used by the various publishers in different languages. All those labels, or preferably the ones that best express the meaning of the vocabulary terms, should be captured in the linguistic model and recognized as preferred labels


    in each language. The benefit of this approach is that the model would take advantage of labels (variants or translations) that are popular and accepted by publishers, and would not impose labels that may end up not being meaningful for users. The model would also leave the door open for new linguistic needs without interfering with the original vocabulary.
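
    The label analysis proposed above might start from something as simple as the following sketch. It assumes labels have already been harvested from portals as (term, language, label) observations and uses frequency of use as a stand-in for "popular and accepted" labels, which is our simplification; choosing the label that best expresses a term's meaning would still need human review. The function name `choose_labels` and the observations are invented.

```python
from collections import Counter, defaultdict


def choose_labels(
    observations: list[tuple[str, str, str]],
) -> dict[tuple[str, str], tuple[str, list[str]]]:
    """Pick a preferred label per (term, language); keep the others as alternatives.

    Frequency of use across portals is a proxy for acceptance (an assumption).
    Ties are broken alphabetically so that the result is deterministic.
    """
    counts: dict[tuple[str, str], Counter] = defaultdict(Counter)
    for term, lang, label in observations:
        counts[(term, lang)][label.strip()] += 1
    result = {}
    for key, counter in counts.items():
        ranked = sorted(counter.items(), key=lambda item: (-item[1], item[0]))
        result[key] = (ranked[0][0], [label for label, _ in ranked[1:]])
    return result


# Invented harvest: one observation per portal that uses the label.
observations = [
    ("dcat:theme", "es", "tema"),
    ("dcat:theme", "es", "tema"),
    ("dcat:theme", "es", "materia"),
    ("dcat:theme", "es", "sector"),
    ("dcat:theme", "es", "tema"),
    ("dct:modified", "de", "Änderungsdatum"),
]

for (term, lang), (preferred, alternatives) in choose_labels(observations).items():
    print(term, lang, "preferred:", preferred, "alternatives:", alternatives)
```

    Keeping the less frequent labels as alternatives rather than dropping them matches the non-prescriptive spirit of the recommendation: they could still be recorded as term variants in the linguistic model.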

    Moreover, we believe that this enrichment should follow a conciliatory approach in which different options are welcomed and integrated, and in which different communities can participate by proposing terms and translations in their own languages, thus building the multilingual layer cooperatively. All in all, enriching the vocabulary with multilingual linguistic information would contribute to wider adoption, better understanding and increased interoperability.


    About the Authors

    Elena Montiel-Ponsoda is a Lecturer at the Universidad Politécnica de Madrid, in Madrid, Spain, and a member of the Ontology Engineering Group at the same university. She received her M.A. in Conference Interpreting and Translation (September 2000) from the Universidad de Alicante, her B.A. in Technical Interpreting (February 2003) from Hochschule Magdeburg-Stendal, Germany, and her PhD in Applied Linguistics (January 2011) from the Universidad Politécnica de Madrid. Her current research activities include, among others, terminology and translation in the fields of Information Technology and Natural Language Processing (NLP), in which she has participated in different international projects concerning terminology, ontologies and multilingualism and their application to the Semantic Web. She has published the book "Multilingualism in Ontologies. Building Patterns and Representation Models" and numerous papers in journals, conferences and workshops in the areas of Applied Linguistics, the Semantic Web and NLP.

    Boris Villazón-Terrazas is Linked Data Researcher Manager at iSOCO. He holds a PhD in Artificial Intelligence from the Universidad Politécnica de Madrid. He previously worked as a postdoctoral researcher at the Ontology Engineering Group. Before that, he was a researcher and software developer at the Research Institute of Informatics at the Universidad Católica Boliviana San Pablo. His research interests focus on Linked Data, the Semantic Web and Ontology Engineering, among others. He has participated in several European research projects, such as KnowledgeWeb, OntoGrid, SEEMP, NeOn, SemsorGrid4Env, PlanetData and Parlance, as well as in national projects such as Reimdoc, Servicios Semánticos, Plata, Gis4Gov, WebN+1, Buscamedia and Ciudad2020. Moreover, he led Spanish Linked Data initiatives such as GeoLinkedData, datos.bne.es, AEMET Linked Data and El Viajero. Finally, he has published more than 40 papers in journals, conferences and workshops, and he is currently actively participating in the W3C RDB2RDF and Government Linked Data Working Groups.


    References

    1 Maali, F., Cyganiak, R., & Peristeras, V. (2010). Enabling Interoperability of Government Data Catalogues. In Electronic Government, 10th International Conference.

    2 Maali, F., Erickson, J., & Archer, P. (2013). Data Catalog Vocabulary (DCAT). W3C Last Call Working Draft.

    3 Buitelaar, P., Declerck, T., Frank, A., Kiesel, M., Sintek, M., Romanelli, M., Sonntag, D., Loos, B., Micelli, V., & Porzel, R. (2006). LingInfo: Design and Applications of a Model for the Integration of Linguistic Information in Ontologies. In Proceedings of OntoLex 2006.

    4 Cimiano, P., Haase, P., Herold, M., Mantel, M., & Buitelaar, P. (2007). LexOnto: A Model for Ontology Lexicons for Ontology-based NLP. In Proceedings of the OntoLex07 Workshop at ISWC07.

    5 Peters, W., Montiel-Ponsoda, E., Aguado de Cea, G., & Gómez-Pérez, A. (2007). Localizing Ontologies in OWL. In From Text to Knowledge: The Lexicon/Ontology Interface, Proceedings of the OntoLex07 Workshop. Busan, South Korea.

    6 Montiel-Ponsoda, E., Aguado de Cea, G., Gómez-Pérez, A., & Peters, W. (2010). Enriching Ontologies with Multilingual Information. Journal of Natural Language Engineering, 17(3), 283-309.

    7 Buitelaar, P., Cimiano, P., Haase, P., & Sintek, M. (2009). Towards Linguistically Grounded Ontologies. In Proceedings of the 6th European Semantic Web Conference (ESWC09), 111-125.

    8 Cimiano, P., Montiel-Ponsoda, E., Buitelaar, P., Espinoza, M., & Gómez-Pérez, A. (2010). A Note on Ontology Localization. Journal of Applied Ontology, 5(2), 127-137.

    9 McCrae, J., Aguado-de-Cea, G., Buitelaar, P., Cimiano, P., Declerck, T., Gómez-Pérez, A., Gracia, J., Hollink, L., Montiel-Ponsoda, E., Spohr, D., & Wunner, T. (2011). Interchanging Lexical Resources on the Semantic Web. Language Resources and Evaluation, 46, 701-719.

    10 Montiel-Ponsoda, E. (2011). Multilingualism in Ontologies. Building Patterns and Representation Models. LAP Lambert Academic Publishing.


    Copyright information

    © 2013 European PSI Platform. This document and all material therein have been compiled with great care. However, the author, editor and/or publisher and/or any party within the European PSI Platform or its predecessor projects, the ePSIplus Network project or the ePSINet consortium, cannot be held liable in any way for the consequences of using the content of this document and/or any material referenced therein. This report has been published under the auspices of the European Public Sector Information Platform.

    The report may be reproduced provided that acknowledgement is made to the European Public Sector Information (PSI) Platform.