Rio Info 2013 - Linked Data at Globo.com

Linked Data at globo.com. Tatiana Al-Chueyr Martins, [email protected], @tati_alchueyr. September 18, 2013, Simpósio Rio Info.


TRANSCRIPT

Page 1: Rio info 2013 - Linked Data at Globo.com

Linked Data at globo.com

Tatiana Al-Chueyr, [email protected], @tati_alchueyr

September 18, 2013, Simpósio Rio Info

Page 2: Rio info 2013 - Linked Data at Globo.com

BROADCAST MOVIES PAY TV INTERNET

EVENTS MUSIC

PUBLISHING

NEW VENTURES NEWSPAPER RADIO NETWORK

Page 3: Rio info 2013 - Linked Data at Globo.com

Semantic Team

Andréia Bustamante

Ícaro Medeiros

Tatiana Al-Chueyr

Rodrigo Senra

Page 4: Rio info 2013 - Linked Data at Globo.com

Contributors

Franklin Amorim

Diogo Kiss

Page 5: Rio info 2013 - Linked Data at Globo.com

Motivation: Not only words

São Paulo

Page 6: Rio info 2013 - Linked Data at Globo.com

Motivation: Not only words

São Paulo?

Page 7: Rio info 2013 - Linked Data at Globo.com

Motivation: Not only words

São Paulo state

Page 8: Rio info 2013 - Linked Data at Globo.com

Motivation: Not only words

São Paulo city

Page 9: Rio info 2013 - Linked Data at Globo.com

Motivation: Not only words

São Paulo saint

Page 10: Rio info 2013 - Linked Data at Globo.com

Motivation: Not only words

São Paulo soccer team

Page 11: Rio info 2013 - Linked Data at Globo.com

Motivation: Multiple words for the same thing

Female, f, F, female, woman, ...

Page 12: Rio info 2013 - Linked Data at Globo.com

Motivation: Multiple words for the same thing

http://data.globo.com/female
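The idea on this slide, many surface forms resolving to one identifier, can be sketched in a few lines of Python. Only the URI comes from the slide; the label table and the `resolve` helper are illustrative assumptions, not a description of how Globo.com stores its labels.

```python
# The single identifier shown on the slide.
FEMALE_URI = "http://data.globo.com/female"

# Hypothetical label table: every known surface form points to the same URI.
LABEL_TO_URI = {
    "female": FEMALE_URI,
    "f": FEMALE_URI,
    "woman": FEMALE_URI,
}


def resolve(label):
    """Return the entity URI for a label, ignoring case and outer whitespace.

    Returns None when the label is not in the table.
    """
    return LABEL_TO_URI.get(label.strip().lower())


if __name__ == "__main__":
    for word in ["Female", "f", "F", "woman"]:
        print(word, "->", resolve(word))
```

Once every variant resolves to the same URI, content annotated with any of them can be found and linked through that one identifier.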

Page 13: Rio info 2013 - Linked Data at Globo.com

Motivation: Cross-link content from different web products

Soccer player

Page 14: Rio info 2013 - Linked Data at Globo.com

Motivation: Cross-link content from different web products

Politician

Page 15: Rio info 2013 - Linked Data at Globo.com

Motivation: Cross-link content from different web products

Celebrity

Page 16: Rio info 2013 - Linked Data at Globo.com

Motivation: Semantic markup in web pages

Isabella Nardoni case

Juliana Cardilli, G1 SP

Isabella Nardoni was killed on March 29, 2008, in the North Zone of São Paulo (Photo: Reproduction)

Isabella de Oliveira Nardoni, age 5, was killed on the night of March 29, 2008. Forensic experts concluded that the girl was thrown from the sixth floor of the building where her father, Alexandre Nardoni, her stepmother, Anna Carolina Jatobá, and the couple's two young children lived, in Vila Isolina Mazzei, in the North Zone of São Paulo.

Isabella's grave becomes a place of visitation in SP; the Nardoni couple is in prison.

Vocabularies: RDF, FOAF, GEO, Dublin Core, SKOS
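To make the role of the vocabularies on this slide concrete, here is a minimal sketch that describes an article and its author with Dublin Core and FOAF terms and prints the result as N-Triples. The slide does not show the markup embedded in the page, so the `example.org` identifiers and the choice of properties are assumptions made for illustration.

```python
# Well-known vocabulary namespaces.
DCTERMS = "http://purl.org/dc/terms/"
FOAF = "http://xmlns.com/foaf/0.1/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Hypothetical identifiers for the article and its author.
ARTICLE = "http://example.org/article/1"
AUTHOR = "http://example.org/person/juliana_cardilli"


def uri(value):
    """Wrap a URI in angle brackets, as N-Triples requires."""
    return "<" + value + ">"


def literal(text):
    """Quote a plain string literal, escaping backslashes, quotes and newlines."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return '"' + escaped + '"'


def describe_article(article, author, title, author_name):
    """Return N-Triples lines describing an article and its author."""
    triples = [
        (uri(article), uri(DCTERMS + "title"), literal(title)),
        (uri(article), uri(DCTERMS + "creator"), uri(author)),
        (uri(author), uri(RDF_TYPE), uri(FOAF + "Person")),
        (uri(author), uri(FOAF + "name"), literal(author_name)),
    ]
    return [" ".join(triple) + " ." for triple in triples]


if __name__ == "__main__":
    for line in describe_article(ARTICLE, AUTHOR, "Caso Isabella Nardoni", "Juliana Cardilli"):
        print(line)
```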

Page 17: Rio info 2013 - Linked Data at Globo.com

Motivation: Recommend annotations to the information producer

Page 18: Rio info 2013 - Linked Data at Globo.com

Motivation: Suggest related content to the information consumer

Page 19: Rio info 2013 - Linked Data at Globo.com

Motivation: Suggest related content to the information consumer

Page 20: Rio info 2013 - Linked Data at Globo.com

Motivation: Suggest related content to the information consumer

Page 21: Rio info 2013 - Linked Data at Globo.com

Changes

● Replacement of words by entities

http://data.globo.com/person/Person/santos_dumont

Page 22: Rio info 2013 - Linked Data at Globo.com

Changes

● Replacement of labels by qualified relationships

Page 23: Rio info 2013 - Linked Data at Globo.com

Changes

● Reorganize data from tables into graphs
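The move from tables to graphs can be illustrated with a small Python sketch that turns a table-like record into subject-predicate-object triples and then groups them by subject. Only the entity URI appears in the slides; the class URI, the label value, and the helper names are assumptions.

```python
from collections import defaultdict

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"

# A flat, table-like record. The class URI and the label are assumed values.
ROWS = [
    {
        "uri": "http://data.globo.com/person/Person/santos_dumont",
        "class_uri": "http://data.globo.com/person/Person",
        "label": "Santos Dumont",
    },
]


def rows_to_triples(rows):
    """Turn each row into (subject, predicate, object) triples."""
    triples = []
    for row in rows:
        triples.append((row["uri"], RDF_TYPE, row["class_uri"]))
        triples.append((row["uri"], RDFS_LABEL, row["label"]))
    return triples


def index_by_subject(triples):
    """Group triples by subject so each entity lists its outgoing edges."""
    graph = defaultdict(list)
    for subject, predicate, obj in triples:
        graph[subject].append((predicate, obj))
    return dict(graph)


if __name__ == "__main__":
    graph = index_by_subject(rows_to_triples(ROWS))
    for subject, edges in graph.items():
        print(subject)
        for predicate, obj in edges:
            print("   ", predicate, "->", obj)
```

Unlike a fixed table schema, new kinds of relationships can be added as extra triples without changing the shape of existing records.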

Page 24: Rio info 2013 - Linked Data at Globo.com

Outcomes

● Replacing words with entities improved how multiple layers of information are handled:

○ Finding

○ Linking

○ Reconciling

○ Organizing

Page 25: Rio info 2013 - Linked Data at Globo.com

Outcomes

● Flexible ways to organize content

● Easier discovery of related issues

● Explicit relations derived from annotated content

● Up-to-date topic pages with little editorial effort

● Linking content across different web products

● Seamless navigation leading to flow state

Page 26: Rio info 2013 - Linked Data at Globo.com

Status Quo

Used by the main web products of Globo.com:

○ 18,485 organizations

○ 83,000 people

○ 9,129 places

○ 1,000,000+ annotated news

Which add up to 2,500,000+ entities!

from August 2010 to May 2013

Page 27: Rio info 2013 - Linked Data at Globo.com

Linked data problems

Page 28: Rio info 2013 - Linked Data at Globo.com

Legacy Architecture

CDA

CMA

triple store

search engine

ontology

Page 29: Rio info 2013 - Linked Data at Globo.com

Legacy Architecture

[Diagram: several CDA/CMA pairs, a triple store, a search engine, and an ontology]

Page 30: Rio info 2013 - Linked Data at Globo.com

Problems: Poor data management

○ direct, unmanaged access to the triple store

○ difficulty sharing data (distributed DBs)

○ need to re-sync the triple store and the search engine index

○ scalability of the triple store

○ high entropy in distributed ontology engineering

Page 31: Rio info 2013 - Linked Data at Globo.com

Problems

Page 32: Rio info 2013 - Linked Data at Globo.com

Ontology Engineering

Product-driven (past)

Base

G1 (news), GE (sports), EGO (gossip), TVG (tv)

Domain-driven (current)

Upper

Person, Organization, Place, Music, Politics, Programme, Education, Sports

Page 33: Rio info 2013 - Linked Data at Globo.com

Possible Solution

Upper Ontology

Page 34: Rio info 2013 - Linked Data at Globo.com

Problems: Semantic as a library

○ many different versions in production

○ programming language dependent

○ steep learning curve for RDF/OWL/SPARQL

Page 35: Rio info 2013 - Linked Data at Globo.com

Solution: Create an open semantic data management platform

● Scalable

● Mobile and Web friendly

● Interconnect Globo's data with external data sources

● Automate content extraction (including NER)

Page 36: Rio info 2013 - Linked Data at Globo.com

Brainiak: linked data RESTful API

Page 37: Rio info 2013 - Linked Data at Globo.com

Legacy Architecture

[Diagram: several CDA/CMA pairs, a triple store, a search engine, and an ontology]

Page 38: Rio info 2013 - Linked Data at Globo.com

Under Development

[Diagram: one CMA and several CDAs, the Brainiak API, a triple store, and a search engine]

Page 39: Rio info 2013 - Linked Data at Globo.com

Requirements

● Indirect usage of SPARQL

● Programming language independent

● Data management with quality

● Finer-grained authorization and authentication

● Isolate applications from triplestore

● Improve triplestore performance

Page 40: Rio info 2013 - Linked Data at Globo.com

SPARQL query

    DEFINE input:inference <http://data.globo.com/ruleset>
    SELECT ?uri ?label
    FROM <http://data.globo.com/sports/>
    WHERE {
      ?uri a <http://data.globo.com/sports/Team> ;
           rdfs:label ?label .
    }
    LIMIT 10 OFFSET 0

task: list all sports teams
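For a sense of what an application has to do when it talks to the triple store directly, the sketch below builds a SPARQL Protocol GET request for the query above using only the Python standard library. The endpoint URL is a placeholder, since the slides do not say where the triple store is served, and the request is built but never sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical endpoint; replace with the address of an actual SPARQL service.
ENDPOINT = "http://localhost:8890/sparql"

# The query from the slide, unchanged apart from line breaks.
LIST_TEAMS_QUERY = """\
DEFINE input:inference <http://data.globo.com/ruleset>
SELECT ?uri ?label
FROM <http://data.globo.com/sports/>
WHERE {
  ?uri a <http://data.globo.com/sports/Team> ;
       rdfs:label ?label .
}
LIMIT 10 OFFSET 0"""


def build_sparql_request(endpoint, query):
    """Build (but do not send) a GET request asking for JSON results."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


if __name__ == "__main__":
    request = build_sparql_request(ENDPOINT, LIST_TEAMS_QUERY)
    print(request.get_method())
    print(request.full_url)
```

Every client written this way needs to know the graph URIs, the ruleset, and the query syntax, which is the coupling the requirements slide aims to remove.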

Page 41: Rio info 2013 - Linked Data at Globo.com

Brainiak query

GET /sports/Team
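One way to picture the "indirect usage of SPARQL" requirement is a function that turns a resource path such as `/sports/Team` into the listing query shown earlier. This is a sketch under assumptions, not Brainiak's implementation: the `path_to_sparql` helper, the path layout `/<context>/<Class>`, and the name validation rule are all hypothetical.

```python
import re

BASE = "http://data.globo.com/"

# Accept only simple names so that a path cannot inject SPARQL syntax.
NAME = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def path_to_sparql(path, limit=10, offset=0):
    """Translate '/<context>/<Class>' into a SPARQL query listing instances."""
    parts = [part for part in path.split("/") if part]
    if len(parts) != 2 or not all(NAME.match(part) for part in parts):
        raise ValueError("expected a path like /sports/Team, got %r" % path)
    context, class_name = parts
    graph = BASE + context + "/"
    class_uri = graph + class_name
    return "\n".join([
        "DEFINE input:inference <http://data.globo.com/ruleset>",
        "SELECT ?uri ?label",
        "FROM <%s>" % graph,
        "WHERE {",
        "  ?uri a <%s> ;" % class_uri,
        "       rdfs:label ?label .",
        "}",
        "LIMIT %d OFFSET %d" % (int(limit), int(offset)),
    ])


if __name__ == "__main__":
    print(path_to_sparql("/sports/Team"))
```

With a layer like this in front of the triple store, client applications only need HTTP and a path; the query language, graph names, and inference ruleset stay on the server side.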

Page 42: Rio info 2013 - Linked Data at Globo.com

SPARQL response

Page 43: Rio info 2013 - Linked Data at Globo.com

Brainiak response

Page 44: Rio info 2013 - Linked Data at Globo.com

SPARQL query

    SELECT DISTINCT ?class
    WHERE {
      <http://data.globo.com/place/City> rdfs:subClassOf ?class
        OPTION (TRANSITIVE, t_distinct, t_step('step_no') as ?n, t_min (0)) .
      ?class a owl:Class .
    }

task: retrieve all superclasses of a class
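The transitive option in this query asks the triple store to follow `rdfs:subClassOf` links repeatedly, starting from the class itself (`t_min (0)`). The same traversal can be sketched in Python over an in-memory parent table. The class URIs below appear in the slides, but the parent links between them are assumed for illustration.

```python
from collections import deque

CITY = "http://data.globo.com/place/City"
GEO_DIVISION = "http://data.globo.com/place/GeopoliticalDivision"
PLACE = "http://data.globo.com/place/Place"
ENTITY = "http://data.globo.com/upper/Entity"

# Assumed rdfs:subClassOf links (child -> direct parents).
SUBCLASS_OF = {
    CITY: [GEO_DIVISION],
    GEO_DIVISION: [PLACE],
    PLACE: [ENTITY],
}


def superclasses(start, parents):
    """Return the start class followed by all of its ancestors, breadth first.

    Each class is visited once, so cycles in the hierarchy do not loop forever.
    """
    seen = [start]
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for parent in parents.get(current, []):
            if parent not in seen:
                seen.append(parent)
                queue.append(parent)
    return seen


if __name__ == "__main__":
    for class_uri in superclasses(CITY, SUBCLASS_OF):
        print(class_uri)
```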

Page 45: Rio info 2013 - Linked Data at Globo.com

SPARQL query

    SELECT DISTINCT ?predicate ?predicate_graph ?predicate_comment ?type ?range
                    ?title ?range_graph ?range_label ?super_property
    WHERE {
      {
        GRAPH ?predicate_graph { ?predicate rdfs:domain ?domain_class } .
      }
      UNION
      {
        GRAPH ?predicate_graph { ?predicate rdfs:domain ?blank } .
        ?blank a owl:Class .
        ?blank owl:unionOf ?enumeration .
        OPTIONAL { ?enumeration rdf:rest ?list_node OPTION(TRANSITIVE, t_min (0)) } .
        OPTIONAL { ?list_node rdf:first ?domain_class } .
      }
      FILTER (?domain_class IN (
        <http://data.globo.com/place/City>,
        <http://data.globo.com/place/GeopoliticalDivision>,
        <http://data.globo.com/place/Place>,
        <http://data.globo.com/upper/Object>,
        <http://data.globo.com/upper/Substance>,
        <http://data.globo.com/upper/ConcreteEntity>,
        <http://data.globo.com/upper/Entity>))
      { ?predicate rdfs:range ?range . }
      UNION
      {
        ?predicate rdfs:range ?blank .
        ?blank a owl:Class .
        ?blank owl:unionOf ?enumeration .
        OPTIONAL { ?enumeration rdf:rest ?list_node OPTION(TRANSITIVE, t_min (0)) } .
        OPTIONAL { ?list_node rdf:first ?range } .
      }
      FILTER (!isBlank(?range))
      ?predicate rdfs:label ?title .
      ?predicate rdf:type ?type .
      OPTIONAL { ?predicate rdfs:subPropertyOf ?super_property } .
      FILTER (?type in (owl:ObjectProperty, owl:DatatypeProperty)) .
      FILTER(langMatches(lang(?title), "en") OR langMatches(lang(?title), "")) .
      OPTIONAL { ?predicate rdfs:comment ?predicate_comment }
      FILTER(langMatches(lang(?predicate_comment), "en") OR langMatches(lang(?predicate_comment), "")) .
      OPTIONAL {
        GRAPH ?range_graph {
          ?range rdfs:label ?range_label .
          FILTER(langMatches(lang(?range_label), "en") OR langMatches(lang(?range_label), "")) .
        }
      }
    }

task: retrieve all properties of a group of classes

Page 46: Rio info 2013 - Linked Data at Globo.com

SPARQL query

    SELECT DISTINCT ?predicate ?min ?max ?range ?enumerated_value ?enumerated_value_label
    WHERE {
      <http://data.globo.com/place/City> rdfs:subClassOf ?s
        OPTION (TRANSITIVE, t_distinct, t_step('step_no') as ?n, t_min (0)) .
      ?s owl:onProperty ?predicate .
      OPTIONAL { ?s owl:minQualifiedCardinality ?min } .
      OPTIONAL { ?s owl:maxQualifiedCardinality ?max } .
      OPTIONAL {
        { ?s owl:onClass ?range }
        UNION { ?s owl:onDataRange ?range }
        UNION { ?s owl:allValuesFrom ?range }
        OPTIONAL { ?range owl:oneOf ?enumeration } .
        OPTIONAL { ?enumeration rdf:rest ?list_node OPTION(TRANSITIVE, t_min (0)) } .
        OPTIONAL { ?list_node rdf:first ?enumerated_value } .
        OPTIONAL { ?enumerated_value rdfs:label ?enumerated_value_label . } .
      }
    }

task: retrieve the cardinalities of all properties of a certain class

Page 47: Rio info 2013 - Linked Data at Globo.com

Brainiak query

GET /place/City/_schema

Page 48: Rio info 2013 - Linked Data at Globo.com

Next steps

● Enrich Globo.com search

● SEO (automatic schema.org)

● Improve annotator (DBpedia Spotlight)

● Richer content relationships (inference)

● Link to open data (e.g. DBpedia, dados.gov.br)

Page 49: Rio info 2013 - Linked Data at Globo.com

Stay tuned

@brainiak_api

... will soon be released as an open source project!