open data trentino - seminar at universidad simon bolivar - 15th october 2013
DESCRIPTION
Seminar on Open Data at Universidad Simon Bolivar presented by Lorenzino Vaccari. Authors: Juan Pane, Lorenzino Vaccari. Contributions (CC-BY) from Maurizio Napolitano: slides 7, 8, 55, 56, 57 and from 61 to 69. Five parts: 1. Open Data: Introduction 2. Open Data: Issues 3. Open Data in Trentino Project 4. Open Data: Applications 5. Open Data: Semantic Issues
TRANSCRIPT
11/04/23 Lorenzino Vaccari, Juan Pane 1
http://dati.trentino.it
Open Government Data Seminar @USB*
*This presentation is taken from the “Open Government Data Tutorial” presented at CLEI2013
Lorenzino Vaccari1, Juan Pane2
1Autonomous Province of Trento, Trento, Italy
2University of Trento, Trento, Italy – Universidad Nacional de Asuncion, Asuncion, Paraguay
Goal of the Seminar
• Introduce Open Government Data
• Intro, Issues (Part 1)
• If you need it, how can you organize it?
• Real experience (Part 2)
• Methods for opening data
• Applications (Part 3)
• Semantic Issues (Part 4)
http://www.point-fort.com/index.php?2012/01/25/805-why-how-what
What?
Open data “is data that can be freely used, reused and redistributed by anyone – subject only, at most, to the requirement to attribute and sharealike.” *
*(Source: http://www.opendefinition.org)
use, reuse
“open” = redistribution, commercial reuse, derivative works
BUT, may require:
- attribution
- share alike
http://myfbcovers.com/uploads/covers/2012/06/09/16628a1094aa012f7c6e0025902480d2/watermarked_cover.jpg
J. Gray (OKF): http://www.slideshare.net/jwyg/open-government-data-what-why-how
The value is in its use
Maurizio Napolitano: http://www.youtube.com/watch?v=YlkjrVAW43Q
Is open data useful?
Maurizio Napolitano: http://www.youtube.com/watch?v=YlkjrVAW43Q
Open Data Benefits
Open data is the knowledge base to:
Improve economic growth and entrepreneurship based on the development of digital services that reuse Public Sector Information
Answer social needs through the publication of innovative services and applications
Reduce the cost of public administration activities within Public-Private Partnerships (PPP)
Improve the transparency of the activities of public institutions and the participation of citizens in these activities
Principles
Tim Berners-Lee (5 Stars of Linked Open Data)
vs.
Tim Davies (5 Stars of Open Data Engagement)
http://5stardata.info/
http://www.timdavies.org.uk/2012/01/21/5-stars-of-open-data-engagement/
5 Stars of Linked Open Data
Tim Berners-Lee
http://5stardata.info
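The five-star scheme can be read as a cumulative checklist, where each star requires all the previous ones. The sketch below is illustrative only: the `Dataset` fields and the `star_rating` function are hypothetical names, and the criteria paraphrase those published at 5stardata.info.

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    """Hypothetical description of a published dataset."""
    on_web_open_license: bool   # 1 star: on the web under an open license
    structured: bool            # 2 stars: machine-readable structured data
    open_format: bool           # 3 stars: non-proprietary format (e.g. CSV)
    uses_uris: bool             # 4 stars: things are identified with URIs
    linked: bool                # 5 stars: links to other data for context


def star_rating(ds: Dataset) -> int:
    """Return the number of stars; each level requires all previous ones."""
    criteria = [ds.on_web_open_license, ds.structured, ds.open_format,
                ds.uses_uris, ds.linked]
    stars = 0
    for met in criteria:
        if not met:
            break
        stars += 1
    return stars


# A CSV file published on an open data portal under an open license.
csv_on_portal = Dataset(True, True, True, False, False)
print(star_rating(csv_on_portal))
```

Because the levels are cumulative, a dataset that uses URIs but is not openly licensed still rates zero stars in this sketch.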
5-Stars of Open Data Engagement
* Be demand driven
* * Provide context
* * * Support conversation
* * * * Build capacity & skills
* * * * * Collaborate with the community
Tim Davies
http://www.timdavies.org.uk/2012/01/21/5-stars-of-open-data-engagement/
Create Community
http://msnbcmedia.msn.com/j/MSNBC/Components/Photo/_new/pb-121007-spain-tarragona-pyramid-nj-02.photoblog900.jpg
Open Government Data
State of the Art
What is happening around us?
- Globally
- Europe
- Latin America
Open Data Charter - G8
The principles are:
- Open Data by Default
- Quality and Quantity
- Useable by All
- Releasing Data for Improved Governance
- Releasing Data for Innovation
http://opensource.com/government/13/7/open-data-charter-g8
https://www.gov.uk/government/publications/open-data-charter/g8-open-data-charter-and-technical-annex
http://census.okfn.org/
OGD around the world
http://census.okfn.org/country/
OGD in Europe
http://open-data.europa.eu/
OGD in Europe (screenshots)
http://epsiplatform.eu/content/european-psi-scoreboard
OGD in Europe (table)
http://epsiplatform.eu/content/european-psi-scoreboard
http://epsiplatform.eu/content/psi-scoreboard-indicator-list
OGD in Italy
http://www.dati.gov.it/content/infografica
OGD in Latin America*
*In Venezuela some OD projects have been started by the USB
Questions?
OGD: Part 2 - Issues
http://evian-thesource.com/kids-having-fun/
Open Data. Oh ohh
Barriers: Legal, Organizational, Technical, Adoption, Contextual
http://www.wallpapermania.eu/wallpaper/trick-or-treat-cute-pumpkins-lanterns-halloween-wallpaper
http://de.straba.us/wp-content/uploads/2012/08/barrieres_for_implementation_of_ogd.png
Organizational Barriers
Not ready
Lack of resources
- IT
- Human
Don’t want to be ready
http://montcomediation.org/images/MCMC_MyWayYourWay.jpg
Legal barriers
Open the Data
All data produced using public money has to be made publicly available (with exceptions)
vs Privacy
You cannot open data that could allow correlation of private personal data
Or the complete lack of legislation!
Adoption barriers
Data is not contextualized
People are not informed
Opening data is a complex task; opening cleaned data is even more complex.
Unclear licenses
Technical Barriers
Access to data:
- Organizational
- Technical, downtimes, logins, payment fees
Fragmentation, incomplete data, scattered
Format
Cataloging, indexing, search
Lack of explicit semantics, metadata
Data is not reliable
Conflicting standards, models, ontologies
Barriers
Zuiderwijk et al 2010
Listed 118 socio-technical impediments for opening data in the literature.
Findability
Usability
Understandability
Quality
Linking
Comparability and compatibility
Metadata
…
http://www.ejeg.com/issue/download.html?idArticle=255
Context Barriers
Privileged access to data
Other companies want to avoid privacy legislation.
Transparency is bad for fraudulent business
http://img.gawkerassets.com/img/182n8vzdlg1iojpg/original.jpg
http://netdna.webdesignerdepot.com/uploads/photo_manipulation/manipulation-9.jpg
Questions?
Part 3 - Real Experience
http://goo.gl/T2Xp80
The “Open Data in Trentino” project
• The “Open Data in Trentino” project is a three-year initiative aimed at developing an open data infrastructure to enhance service innovation in Trentino, following the PAT strategy for ICT-enabled service innovation. The project is being developed within a partnership between Trento RISE and the Autonomous Province of Trento (PAT), according to the PAT innovation model.
• Goals
• Improved quality of life for citizens
• Open Data and local businesses
• Transparency
• Improved efficiency and productivity
Workplan - Steps
Name (Acronym) | Description | Data type | File extension
Comma Separated Values (CSV) | Text format for exchanging tables, in which rows correspond to lines and the values of the individual columns are separated by a comma (or semicolon) | Tabular data | .csv
Geography Markup Language (GML) | XML format for exchanging vector spatial data | Vector geographic data | .gml
Keyhole Markup Language (KML) | XML-based format created to handle three-dimensional spatial data in Google Earth and Google Maps | Vector geographic data | .kml
Open Document Format (ODF) | Format for storing and exchanging text documents, spreadsheets, charts and presentations | Tabular data | .odc
Resource Description Framework (RDF) | XML-based; the basic tool proposed by the World Wide Web Consortium (W3C) for encoding, exchanging and reusing structured metadata; it enables interoperability between applications that exchange information on the Web | Structured data | .rdf
ESRI Shapefile (SHP) | A popular vector format for geographic information systems. The geographic data is normally distributed as three or four files (four if the coordinate reference system is specified). The format was released by ESRI as an (almost) open format | Vector geographic data | .shp, .shx, .dbf, .prj
Extensible Markup Language (XML) | A markup format, that is, one based on a mechanism that defines and controls the meaning of the elements contained in a document or text through tags (markup) | Structured data | .xml
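A table like this lends itself to a simple lookup. As a minimal sketch (the `FORMATS` dictionary, the `describe` function and the sample file names are hypothetical; the extensions and data types are transcribed from the table of formats), a catalog could classify a file by its extension:

```python
from pathlib import Path

# Extension -> (acronym, data type), transcribed from the table of formats.
FORMATS = {
    ".csv": ("CSV", "tabular"),
    ".gml": ("GML", "vector geographic"),
    ".kml": ("KML", "vector geographic"),
    ".odc": ("ODF", "tabular"),
    ".rdf": ("RDF", "structured"),
    ".shp": ("SHP", "vector geographic"),
    ".shx": ("SHP", "vector geographic"),
    ".dbf": ("SHP", "vector geographic"),
    ".prj": ("SHP", "vector geographic"),
    ".xml": ("XML", "structured"),
}


def describe(filename: str) -> str:
    """Return a short description of the file's format, if it is listed."""
    ext = Path(filename).suffix.lower()
    if ext not in FORMATS:
        return f"{filename}: format not listed"
    acronym, data_type = FORMATS[ext]
    return f"{filename}: {acronym} ({data_type} data)"


print(describe("impianti.CSV"))
print(describe("strade.shp"))
print(describe("report.pdf"))
```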
Weather, Geodata, Statistics, Municipality of Trento, Transport, etc.
Technological platform
Catalog
The Open Knowledge Foundation (OKF) is a non-profit organisation founded in 2004 and dedicated to promoting open data and open content in all their forms – including government data, publicly funded research and public domain cultural content.
(2004)
http://okfn.org
http://dati.trentino.it*
Analysis: http://dati.trentino.it/stats
Admin: http://dati.trentino.it/admin
Harvesting: http://dati.trentino.it/harvest
* Available for all the data providers of Trentino
Services
Trentino is also going to launch a challenge to build software applications and creative products (multimedia, audiovisual products, posters, illustrations) based on the datasets published in the http://dati.trentino.it open data catalog.
#ODTChallenge will be the official hashtag for our first open data challenge in Trentino!
7 months so far
68,555 visits
7,988 unique visits
2,516 downloads
37.36% returning visitors
62.64% new visitors
NOW
- ALL the departments demand to be involved
- Plus other local actors
Agriculture, Culture, Geographical Data, Welfare, Weather Forecast, Social Policies, Statistics, Transport…
MUNICIPALITY OF TRENTO, and INFORMATICA TRENTINA
567 datasets provided by 10 departments of PAT…
20 reporting errors
15 asking for new data
10 new suggestions
6 OD Applications
100% ENTHUSIASTIC REACTIONS
Want to Know & Learn more?
http://www.theodi.org/
http://schoolofdata.org/
http://opendatahandbook.org/pt_BR/
http://www.od4d.org/category/open-data/how-to/
http://schoolofdata.org/online-resources/
Thanks to the project team!
• General Manager: Isabella Bressan
• Project coordinator: Lorenzino Vaccari
• Organizational/Communication issues: Francesca Gleria, Roberto Cibin
• Data gatherer: Luca Paolazzi
• Catalog: Maurizio Napolitano, Samuele Santi
• Semantics: Juan Pane, David Leoni, Alberto Zanella
• Legal issues: Eleonora Bassi, Stefano Leucci
• Communities: Maurizio Napolitano, Francesca De Chiara
• System integration: Marco Combetto, Lorenzo Dallapè
• Statistical Linked Data: Pavel Shvaiko
Questions?
OGD: Part 4 - Applications
Apps4Italy
Best Application: http://parlamento17.openpolis.it/
Open Bilancio
Best Idea: http://opendata.comune.fi.it/open_bilancio/
What?
DAL America Latina (2012): http://desarrollandoamerica.org/aplicaciones-2012/
DAL America Latina (2013): http://2013.desarrollandoamerica.org/appschallenge/
http://limaio.innovacion.pe/
http://www.limaio.com/demo
http://www.mysociety.org/2007/more-travel-maps/morehousing
Johann MITTHEISZ (CIO of the City of Vienna)
http://www.slideshare.net/BrigitteLutz/keynote-mittheisz-cio-stadt-wien/16
Total hours to develop 38 applications: around 2,600
The City of Vienna saved around 208,000 Euro
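As a back-of-the-envelope check, the two figures imply a rate of about 80 Euro per hour. This assumes the saving is simply the value of the development hours contributed by the community; the slide does not state how the figure was computed.

```python
# Figures from the slide. Reading the saving as "hours x hourly rate"
# is an assumption, not something the source states.
total_hours = 2600       # hours to develop the 38 applications
saved_euro = 208_000     # estimated saving for the City of Vienna
applications = 38

implied_rate = saved_euro / total_hours        # Euro per hour
hours_per_app = total_hours / applications     # average effort per application

print(f"Implied hourly rate: {implied_rate:.0f} Euro")
print(f"Average effort per application: {hours_per_app:.1f} hours")
```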
The Open Data Ecosystem (and the OpenStreetMap case)
OpenStreetMap
The OpenStreetMap project creates and provides geographical data, such as road maps, freely available to anyone. The project was established and has grown in response to restrictions on the use or availability of map information across much of the world, and with the advent of inexpensive portable satellite navigation devices.
OpenStreetMap is a free map of the world, created by someone like you
http://tools.geofabrik.de/mc/?mt0=mapnik&mt1=googlemap&lon=11.12042&lat=46.07224&zoom=18
http://haiti.ushahidi.com
Watercolor maps
http://content.stamen.com/files/cartography/index_watercolor.html#18.00/46.07204/11.12097
From maps to blankets…
http://softcities.net
Sharing Data Globally (the eHabitat example)
21st Century Challenges
Source: http://www.slideshare.net/angeled/geoss © GEO secretariat
The Group on Earth Observations
Source: http://www.slideshare.net/angeled/geoss © GEO secretariat
84 GEO members and 61 Participating Organizations
GEOSS Data Sharing Principles
• Full and Open Exchange of Data, recognizing Relevant International Instruments and National Policies
• Data and Products at Minimum Time delay and Minimum Cost
• Free of Charge or minimal Cost for Research and Education
http://www.geoportal.org/web/guest/geo_home
“Venezuela is considered a state with extremely high biodiversity, with habitats ranging from the Andes mountains in the west to the Amazon Basin rainforest in the south, via extensive llanos plains and Caribbean coast in the center and the Orinoco River Delta in the east.”
Source: Wikipedia
GEOSS for biodiversity
http://www.eurogeoss-broker.eu/
The eHabitat Model
http://ehabitat-wps.jrc.ec.europa.eu/ehabitat/
Questions?
OGD: Part 5 - Semantics
Available
Structured
Open formats
Referenceable
Linked
Linked Open Data
The best data is open data
Vs.
All data must be perfect
Lack of explicit semantics
The real meaning of the data stayed in the developer's mind when the data was created
http://goo.gl/npEHKr
Lack of explicit semantics
Can lead to things like:
Semantic heterogeneity
Difference in the meaning of local data
Issues when Opening Trentino Data
Each department has authority over only part of the data.
Datasets originally created for internal use only.
Datasets created for a specific need.
Datasets created with a custom format:
- For structure (some exceptions)
- For data
Lack of reuse -> duplication.
Lack of programmers.
We cannot (always) TELL them what to do or how to do it.
Data changes
Available
Structured
Open formats
Referenceable
Linked
Entity Centric Semantic Layer
Data Catalog
Entity centric: Added value
Aggregated data
Accurate data, manually curated
Unique identifiers, distributed perspectives
Re-think identifiers
Semantified values
E1
name Juan Pane
nationality Italian
lives in Trento
affiliation Univ. Trento
E2
name Ignacio P. F.
born in Paraguay
date of birth 1980
affiliation PF-UNA
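The two example entities can be sketched as identified attribute sets. This is only an illustration: the `Entity` class, its `merge` method and the extra values supplied by the second source are hypothetical, not the project's data model.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    """A digital entity: a unique identifier plus a set of attribute values."""
    entity_id: str
    attributes: dict = field(default_factory=dict)

    def merge(self, other_attributes: dict) -> None:
        """Aggregate attributes from another source, keeping known values."""
        for key, value in other_attributes.items():
            self.attributes.setdefault(key, value)


e1 = Entity("E1", {"name": "Juan Pane", "nationality": "Italian",
                   "lives in": "Trento", "affiliation": "Univ. Trento"})
e2 = Entity("E2", {"name": "Ignacio P. F.", "born in": "Paraguay",
                   "date of birth": 1980, "affiliation": "PF-UNA"})

# A second source contributes a partial view of E2 (hypothetical values).
e2.merge({"name": "I. P. F.", "lives in": "Asuncion"})
print(e2.attributes)
```

Keeping the identifier separate from the name is what allows different sources to describe the same entity from their own perspective.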
Entities
Real world: something that has a distinct, separate existence, although it need not be a material (physical) existence. It has a set of properties, which evolve over time. Example:
Mental: a personal (local) model created and maintained by a person, which references and describes a real world entity.
Digital: captures the semantics of real world entities, as provided by people.
Entity Centric Semantic Layer:
• Address the integration problems due to semantic heterogeneity:
• Different formats
• Different identifiers
• Implicit semantics
• Homonyms, synonyms, aliases
• Partial knowledge
• Knowledge evolution
http://www.webfoundation.org/2011/11/5-star-open-data-initiatives/
Entity-based Integration
• Focus on entities as first class citizens
• Entities are objects so important in our everyday life that we refer to them by name
• Each entity has its own metadata (e.g. name, latitude, longitude, …)
• Each entity is in relation with many other entities (e.g. Einstein was born in Ulm, his affiliation was Charles University, Ulm is a city in Germany)
• There are relatively “few” commonsense entity types (person, …, event)
• There are many domain specific entities (bus stops, cycling paths, ...)
• All components have explicit semantics: schema, entities, attributes, values
Importing pipeline, Macro Steps
1. Domain analysis
Study the needed entity types, adapt the knowledge base accordingly. First-time bootstrapping.
2. Import entities
Semi-automatic tool.
Domain experts are expensive.
Human attention is a scarce resource.
Incremental enrichment and aggregation of entities.
Open Data Peculiarities
All data comes from a CKAN repository (DCAT).
Process one data file at a time.
Each data file can be represented as a table.
Each row in the table represents a (partial) entity.
The format of the values might not be enforced in the data files.
Not all data is relevant.
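These assumptions can be sketched with the standard `csv` module. The snippet is illustrative: the `DATA_FILE` excerpt and the `rows_as_partial_entities` helper are hypothetical, and values are kept as unvalidated strings because the data files do not enforce their format.

```python
import csv
import io

# Hypothetical excerpt of one data file downloaded from the catalog.
DATA_FILE = """nome,provincia,funivie
Andalo,Provincia di Trento,3
Canazei,Trento Prov.,2
Moena,,
"""


def rows_as_partial_entities(text: str) -> list[dict]:
    """Read one data file as a table; each row becomes a partial entity.

    Empty cells are dropped, since a row may describe only part of an entity.
    """
    reader = csv.DictReader(io.StringIO(text))
    return [{k: v for k, v in row.items() if v} for row in reader]


for entity in rows_as_partial_entities(DATA_FILE):
    print(entity)
```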
Importing tool process
1. Source Selection
Import one data file at a time
2. Schema Matching
Select a target type of entity -> correspondences between the input columns and the output attributes
nome | provincia | descrizione | funivie | lat | long
Andalo (1047) | Provincia di Trento | Sorge su un'ampia sella prativa al centro... | 3 | 654463 | 712857
Canazei (1450) | Trento Prov. | Situato all'estremità settentrionale della... | 2 | 511504 | 147444
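A schema-matching result can be represented as a mapping from input columns to target attributes. In this sketch the target attribute names, the extra `note` column and the `apply_schema_matching` helper are hypothetical; only the input column names come from the sample table.

```python
# Correspondences between input columns and the attributes of a target
# entity type. The target attribute names are hypothetical.
CORRESPONDENCES = {
    "nome": "name",
    "provincia": "part_of",
    "descrizione": "description",
    "funivie": "cable_cars",
    "lat": "latitude",
    "long": "longitude",
}


def apply_schema_matching(row: dict, correspondences: dict) -> dict:
    """Rename matched columns to target attributes; drop unmatched columns."""
    return {correspondences[col]: value
            for col, value in row.items() if col in correspondences}


# "note" is a hypothetical column with no correspondence; it is dropped.
row = {"nome": "Andalo (1047)", "provincia": "Provincia di Trento",
       "funivie": "3", "lat": "654463", "long": "712857", "note": "n/a"}
print(apply_schema_matching(row, CORRESPONDENCES))
```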
3. Data Validation
Applies format and structure validation and possible automatic transformations needed to have the input data in the expected format.
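A validation step of this kind might look as follows. The rules shown (integer cable-car counts, numeric coordinates, tolerance for a decimal comma) are assumptions chosen for illustration, and `validate_row` is a hypothetical helper, not the tool's actual rule set.

```python
def validate_row(row: dict) -> tuple[dict, list[str]]:
    """Check formats and apply simple automatic transformations.

    Returns the cleaned row and a list of validation errors.
    """
    cleaned, errors = dict(row), []

    # Integer attribute: strip whitespace, then convert.
    try:
        cleaned["funivie"] = int(str(row.get("funivie", "")).strip())
    except ValueError:
        errors.append(f"funivie: not an integer: {row.get('funivie')!r}")

    # Coordinates: accept a decimal comma and convert to float.
    for col in ("lat", "long"):
        raw = str(row.get(col, "")).strip().replace(",", ".")
        try:
            cleaned[col] = float(raw)
        except ValueError:
            errors.append(f"{col}: not a number: {row.get(col)!r}")

    return cleaned, errors


print(validate_row({"funivie": " 3 ", "lat": "654463", "long": "712857"}))
print(validate_row({"funivie": "tre", "lat": "654463,5", "long": ""}))
```

Values that fail validation are left unchanged in the cleaned row and reported, so that a person can decide how to fix them.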
4. Semantic Enrichment (1/2)
Entity disambiguation: Transform text references into links to existing entities.
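As a minimal sketch of entity disambiguation, the two spellings found in the sample table ("Provincia di Trento" and "Trento Prov.") can be resolved to one entity through a name index. The entity identifier, the list of known names and the normalization rule are all hypothetical; a real system would use far richer matching.

```python
import re

# Hypothetical entity store: identifier -> known names for the entity.
KNOWN_ENTITIES = {
    "entity:provincia-di-trento": ["Provincia di Trento", "Trento Prov.",
                                   "Provincia Autonoma di Trento"],
}


def normalize(text: str) -> str:
    """Lowercase and strip punctuation so spelling variants compare equal."""
    return re.sub(r"[^\w\s]", "", text).lower().strip()


# Index from normalized name to entity identifier.
NAME_INDEX = {normalize(name): entity_id
              for entity_id, names in KNOWN_ENTITIES.items()
              for name in names}


def disambiguate(reference: str):
    """Return the identifier of the entity the text refers to, or None."""
    return NAME_INDEX.get(normalize(reference))


print(disambiguate("Provincia di Trento"))
print(disambiguate("Trento Prov."))
print(disambiguate("Bolzano"))
```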
4. Semantic Enrichment (2/2)
Natural Language Processing: Extract concepts and entity references from free-text.
5. Reconciliation
Run Identity Management Algorithms to identify each row as a new or existing entity.
Result:
• No Match
• Match
• Multiple Matches
Action:
• Use ID
• New ID
• Ignore Row
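The decision logic can be sketched as below. The slide lists results and actions without pairing them, so the pairing used here (no match: new ID, one match: use the existing ID, multiple matches: ignore the row) is an assumption, as are the `reconcile` function and the example identifiers.

```python
import uuid


def reconcile(row: dict, matches: list[str]) -> dict:
    """Decide what to do with a row given the candidate entity IDs found.

    Assumed pairing of results and actions:
      no match         -> mint a new ID
      one match        -> reuse the existing ID
      multiple matches -> ignore the row (left for manual review)
    """
    if not matches:
        return {"action": "new_id", "id": str(uuid.uuid4()), "row": row}
    if len(matches) == 1:
        return {"action": "use_id", "id": matches[0], "row": row}
    return {"action": "ignore_row", "id": None, "row": row}


print(reconcile({"nome": "Andalo"}, [])["action"])
print(reconcile({"nome": "Canazei"}, ["entity:42"])["action"])
print(reconcile({"nome": "Trento"}, ["entity:7", "entity:8"])["action"])
```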
6. Exporting
At this point:
We know what to export.
All values for target attributes conform to the expected format.
All text has been semantified (NLP).
All textual references to entities are converted to links.
Each row has an identifier.
7. Publishing
Put the semantified entities back into CKAN so that the entities can be Open Data and can be found in the same catalog as the original data.
Developers can find the data files of the cleaned, aggregated entities,
but can also interact with the entities via the Entitypedia APIs.
8. Visualization
Search and Navigation
Semantic Layer: Services
Tool for aiding the “semantification” of the datasets in the catalog based on:
• Schema matching services
• Identity Management services
• Entity Matching services
• Global Unique Identifier services
• Semantic search and indexing services
• Natural Language Processing
• Entity store
Our Goal
TN
UK
BEES
http://www.youtube.com/watch?v=Bq_ZWl1ZXA0
BEYOND
Gracias!
Grazie!
Merci!
Gràcies!
Gratias!
Thanks!
Danke!
Dank u!
Kiitos!
ευχαριστώ
We thank in particular CLEI 2013, Autonomous Province of Trento, TrentoRise association, Universidad Nacional de Asuncion, Universidad Simon Bolivar and University of Trento
Lorenzino Vaccari1, Juan Pane2
1Autonomous Province of Trento, Trento, Italy
2University of Trento, Trento, Italy – Universidad Nacional de Asuncion, Asuncion, Paraguay