linked data, big data, and user science at globo.com

Post on 14-Jul-2015

Ícaro Medeiros icaro.medeiros@gmail.com semantica@corp.globo.com

I Encontro de Computação Semântica

@UFRJ 11/03/2015

LINKED DATA, BIG DATA, USER SCIENCE @ globo.com

( icaro, home_globoesporte, pageview@23:00 )

( icaro, materia_1, scroll+2min@14:00 )

Signals

( materia_1: [messi, neymar, barcelona] )

Content description
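The signal tuples and the content description above can be joined so that each interaction is attributed to the entities its article mentions. A minimal sketch, assuming a hypothetical `Signal` record and a hypothetical `entities_for_signals` helper; the field layout is inferred from the tuples on the slide and is not globo.com's actual schema.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Signal:
    user: str     # e.g. "icaro"
    content: str  # page or article id, e.g. "materia_1"
    action: str   # e.g. "pageview", "scroll+2min"
    time: str     # time of day as shown on the slide, e.g. "14:00"


# Content description: article id -> annotated entities (taken from the slide).
CONTENT_ENTITIES = {"materia_1": ["messi", "neymar", "barcelona"]}


def entities_for_signals(signals, content_entities):
    """Count, per user, how many signals touched each annotated entity."""
    counts = {}
    for s in signals:
        # Pages without annotations (e.g. a home page) contribute nothing.
        for entity in content_entities.get(s.content, []):
            counts.setdefault(s.user, Counter())[entity] += 1
    return counts


signals = [
    Signal("icaro", "home_globoesporte", "pageview", "23:00"),
    Signal("icaro", "materia_1", "scroll+2min", "14:00"),
]
profile_counts = entities_for_signals(signals, CONTENT_ENTITIES)
```

Here only the second signal contributes, because the home page carries no annotations in this toy content description.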

LINKED DATA (content)

Ontologies

‣ 288 classes

‣ Person: 65K

‣ Place: 50K

‣ Athlete: 22K

‣ Politicians: 32K

Annotation tool

Interface follows the ontology

Fields

Search ranges

Suggest as you type

Triples stored in Virtuoso

Automatic entity extraction

Fast search in Elasticsearch

Contextual navigation

globoesporte.com

Automatic page generation

Intelligent Search

BIG DATA

Cluster Stats

‣ 10 machines

‣ 1 TB RAM

‣ 500 TB disk

‣ 338 VCores

Signal Capturing

Beyond clicks (engagement science)

‣ Attention-based metrics

‣ Scroll

‣ Time spent on page

‣ Dwell time

‣ Social Media Analytics

http://labs.yahoo.com/publication/beyond-clicks-dwell-time-for-personalization/
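The attention-based metrics listed above can be derived from raw client events. A minimal sketch, assuming a hypothetical event format of `(timestamp_seconds, kind, value)` per page view, where scroll events carry the fraction of the page reached; time on page is approximated here as the gap between the first and last event, which is one possible definition and not necessarily the one used at globo.com.

```python
def attention_metrics(events):
    """Summarise one page view from (timestamp_seconds, kind, value) events."""
    if not events:
        return {"time_on_page": 0.0, "max_scroll": 0.0}
    times = [t for t, _, _ in events]
    scrolls = [value for _, kind, value in events if kind == "scroll"]
    return {
        # Approximation: elapsed time between first and last observed event.
        "time_on_page": max(times) - min(times),
        # Deepest point of the page the reader reached (0.0 to 1.0).
        "max_scroll": max(scrolls, default=0.0),
    }


events = [
    (0.0, "pageview", 0.0),
    (30.0, "scroll", 0.4),
    (120.0, "scroll", 0.9),
]
metrics = attention_metrics(events)
```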

Shares are noisy

http://time.com/12933/what-you-think-you-know-about-the-web-is-wrong/

Scroll

http://time.com/12933/what-you-think-you-know-about-the-web-is-wrong/

Recommendation

‣ TF-IDF

‣ Collaborative Filtering

‣ Users

‣ Content

‣ Latent Factor Analysis
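Of the techniques listed above, TF-IDF is the simplest to illustrate: articles become weighted term vectors, and similar articles are found by cosine similarity. A minimal standard-library sketch, assuming toy token lists and the hypothetical helpers `tfidf_vectors` and `cosine`; it does not reflect how the recommender is actually built.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """docs: id -> list of tokens. Returns id -> {term: tf-idf weight}."""
    n = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in docs.values() for term in set(tokens))
    vectors = {}
    for doc_id, tokens in docs.items():
        tf = Counter(tokens)
        vectors[doc_id] = {
            term: (count / len(tokens)) * math.log(n / df[term])
            for term, count in tf.items()
        }
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


docs = {
    "a": ["messi", "neymar", "barcelona"],
    "b": ["messi", "barcelona", "goal"],
    "c": ["election", "senate", "vote"],
}
vectors = tfidf_vectors(docs)
sim_ab = cosine(vectors["a"], vectors["b"])
sim_ac = cosine(vectors["a"], vectors["c"])
```

Articles "a" and "b" share terms and get a positive similarity, while "a" and "c" share none and score zero.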

USER SCIENCE for news reading

User Modeling (for news reading)

‣ Dynamic profiling

‣ Explicit personal data

‣ Interests (implicit)

‣ Temporal constraints: periodicity

Signal Capturing

Excelsior

Signals

Semantic User Modeling

‣ Annotations from engaged content

‣ Profile can answer:

‣ My favourite team

‣ City I live in

‣ My hometown

Spreading Activation
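Spreading activation propagates interest from entities the user engaged with to related entities in the graph, with the signal weakening at each hop. A minimal sketch, assuming a hypothetical adjacency-list graph, a fixed decay factor, and a fixed number of steps; the entity names and weights are illustrative only and not taken from the globo.com ontology.

```python
def spread_activation(graph, initial, decay=0.5, steps=2):
    """graph: node -> list of (neighbor, weight); initial: node -> activation."""
    activation = dict(initial)
    frontier = dict(initial)
    for _ in range(steps):
        next_frontier = {}
        for node, value in frontier.items():
            for neighbor, weight in graph.get(node, []):
                # Each hop passes on a decayed, edge-weighted share.
                next_frontier[neighbor] = (
                    next_frontier.get(neighbor, 0.0) + value * weight * decay
                )
        for node, value in next_frontier.items():
            activation[node] = activation.get(node, 0.0) + value
        frontier = next_frontier
    return activation


# Illustrative graph: a team linked to its city, the city linked to its state.
graph = {
    "flamengo": [("rio_de_janeiro_city", 1.0)],
    "rio_de_janeiro_city": [("rio_de_janeiro_state", 1.0)],
}
result = spread_activation(graph, {"flamengo": 1.0})
```

With these assumed parameters, the city receives half of the team's activation and the state a quarter, so nearer entities weigh more in the profile.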

My profile on

City/State I live in

Hometown and State

Football team test (3.5 million users)

82% precision

95% precision@top3

* When the user has read at least one article that cites their team
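The two figures above can be computed with one function if precision@top3 is read as the share of users whose true team appears among the top three predicted teams. That reading is an assumption, as are the helper name `precision_at_top_k` and the toy data below, which are not the results of the test reported on the slide.

```python
def precision_at_top_k(ranked_predictions, truth, k=1):
    """Share of users whose true team is within their top-k predictions.

    ranked_predictions: user -> teams ordered from most to least likely.
    truth: user -> the team the user actually supports.
    """
    evaluated = [user for user in truth if user in ranked_predictions]
    if not evaluated:
        return 0.0
    hits = sum(
        1 for user in evaluated if truth[user] in ranked_predictions[user][:k]
    )
    return hits / len(evaluated)


# Toy data for illustration only.
predictions = {
    "u1": ["flamengo", "vasco", "botafogo"],
    "u2": ["vasco", "flamengo", "fluminense"],
    "u3": ["botafogo", "vasco", "fluminense"],
    "u4": ["santos", "palmeiras", "gremio"],
}
truth = {
    "u1": "flamengo",
    "u2": "flamengo",
    "u3": "fluminense",
    "u4": "corinthians",
}
p_at_1 = precision_at_top_k(predictions, truth, k=1)
p_at_3 = precision_at_top_k(predictions, truth, k=3)
```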

How fast?

‣ Mean request time: 48 ms

‣ Time between interaction and profile update: 5 min

Potential uses

‣ Personalized homepages

‣ Targeted advertising

‣ Granular user/content description

‣ Semantic Recommendation

‣ Clustering

‣ Demographic data

‣ Informed product creation/evolution

github.com/globocom/IWantToWorkAtGloboCom

Ícaro Medeiros icaro.medeiros@gmail.com

Semantic team semantica@corp.globo.com

globo.com

slides icaromedeiros.com.br

slideshare.net/icaromedeiros
