Linked Data, Big Data, and User Science at Globo.com


Upload: icaro-medeiros

Post on 14-Jul-2015


TRANSCRIPT

Page 1: Linked Data, Big Data, and User Science at Globo.com

Ícaro Medeiros [email protected] [email protected]

I Encontro de Computação Semântica @UFRJ, 11/03/2015

LINKED DATA, BIG DATA, USER SCIENCE @ globo.com

Page 2: Linked Data, Big Data, and User Science at Globo.com
Page 3: Linked Data, Big Data, and User Science at Globo.com

Signals

( icaro, home_globoesporte, pageview@23:00 )

( icaro, materia_1, scroll+2min@14:00 )

Content description

( materia_1: [messi, neymar, barcelona] )
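A minimal sketch of how the two kinds of records on this slide could be represented and joined. The `Signal` type, its field names, and the `entities_for` helper are illustrative assumptions, not the actual globo.com schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Signal:
    user: str      # who interacted
    content: str   # page or article identifier
    action: str    # e.g. "pageview" or "scroll+2min"
    time: str      # time of day, "HH:MM"


# The two signals shown on the slide.
signals = [
    Signal("icaro", "home_globoesporte", "pageview", "23:00"),
    Signal("icaro", "materia_1", "scroll+2min", "14:00"),
]

# Content description: article id -> annotated entities.
content_description = {
    "materia_1": ["messi", "neymar", "barcelona"],
}


def entities_for(user, signals, description):
    """Collect the entities of every described item the user interacted with."""
    found = []
    for s in signals:
        if s.user == user:
            # Pages without a description (e.g. the home page) add nothing.
            found.extend(description.get(s.content, []))
    return found


print(entities_for("icaro", signals, content_description))
```

Joining signals with content descriptions in this way is what lets a user's behaviour be expressed in terms of entities rather than page identifiers.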

Page 4: Linked Data, Big Data, and User Science at Globo.com

LINKED DATA (content)

Page 5: Linked Data, Big Data, and User Science at Globo.com

Ontologies

‣ 288 classes

‣ Person: 65K

‣ Place: 50K

‣ Athlete: 22K

‣ Politician: 32K

Page 6: Linked Data, Big Data, and User Science at Globo.com

Annotation tool

Page 7: Linked Data, Big Data, and User Science at Globo.com
Page 8: Linked Data, Big Data, and User Science at Globo.com

Interface follows the ontology

Fields

Search ranges

Suggest as you type

Triples stored in Virtuoso

Automatic entity extraction

Fast search in Elasticsearch

Page 9: Linked Data, Big Data, and User Science at Globo.com

Contextual navigation

Page 10: Linked Data, Big Data, and User Science at Globo.com

globoesporte.com

Page 11: Linked Data, Big Data, and User Science at Globo.com

globoesporte.com

Page 12: Linked Data, Big Data, and User Science at Globo.com

globoesporte.com

Page 13: Linked Data, Big Data, and User Science at Globo.com

Automatic page generation

Page 14: Linked Data, Big Data, and User Science at Globo.com
Page 15: Linked Data, Big Data, and User Science at Globo.com

Intelligent Search

Page 16: Linked Data, Big Data, and User Science at Globo.com

BIG DATA

Page 17: Linked Data, Big Data, and User Science at Globo.com

Cluster Stats

‣ 10 machines

‣ 1 TB RAM

‣ 500 TB disk

‣ 338 VCores

Page 18: Linked Data, Big Data, and User Science at Globo.com

Signal Capturing

Page 19: Linked Data, Big Data, and User Science at Globo.com

Beyond clicks (engagement science)

‣ Attention-based metrics

‣ Scroll

‣ Time spent on page

‣ Dwell time

‣ Social Media Analytics

http://labs.yahoo.com/publication/beyond-clicks-dwell-time-for-personalization/
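One common approximation of dwell time can be sketched as follows: the time spent on a page is taken to be the gap until the same user's next event. The event format and the `dwell_times` helper are assumptions made for illustration; the slides do not say how these metrics are computed.

```python
def dwell_times(events):
    """Approximate dwell time per page for a single user.

    events: list of (timestamp_seconds, page) tuples, in any order.
    Returns a list of (page, seconds) for every event except the last,
    whose dwell time cannot be inferred without a following event.
    """
    ordered = sorted(events)
    result = []
    for (start, page), (next_start, _next_page) in zip(ordered, ordered[1:]):
        result.append((page, next_start - start))
    return result


events = [(0, "home"), (30, "materia_1"), (210, "materia_2")]
print(dwell_times(events))
```

Scroll depth and explicit time-on-page beacons would need client-side instrumentation, which this sketch does not cover.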

Page 20: Linked Data, Big Data, and User Science at Globo.com

Shares are noisy

http://time.com/12933/what-you-think-you-know-about-the-web-is-wrong/

Page 21: Linked Data, Big Data, and User Science at Globo.com

Scroll

http://time.com/12933/what-you-think-you-know-about-the-web-is-wrong/

Page 22: Linked Data, Big Data, and User Science at Globo.com

Recommendation

‣ TF-IDF

‣ Collaborative Filtering

‣ Users

‣ Content

‣ Latent Factor Analysis
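As an illustration of the first technique in the list, here is a minimal TF-IDF sketch in plain Python. It uses one common variant (term frequency normalised by document length, multiplied by log(N / document frequency)); the article ids and terms are made up, and nothing here is taken from the globo.com recommender itself.

```python
import math
from collections import Counter


def tf_idf(docs):
    """docs: dict of doc id -> list of terms.

    Returns dict of doc id -> {term: weight}.
    """
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter()
    for terms in docs.values():
        doc_freq.update(set(terms))

    weights = {}
    for doc_id, terms in docs.items():
        counts = Counter(terms)
        weights[doc_id] = {
            term: (count / len(terms)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        }
    return weights


docs = {
    "materia_1": ["messi", "neymar", "barcelona"],
    "materia_2": ["neymar", "santos"],
    "materia_3": ["messi", "barcelona", "argentina"],
}
weights = tf_idf(docs)
print(weights["materia_2"])
```

Terms that are rare across the collection (here `santos`) receive a higher weight than terms that appear in several articles (here `neymar`).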

Page 23: Linked Data, Big Data, and User Science at Globo.com
Page 24: Linked Data, Big Data, and User Science at Globo.com
Page 25: Linked Data, Big Data, and User Science at Globo.com

USER SCIENCE for news reading

Page 26: Linked Data, Big Data, and User Science at Globo.com

User Modeling (for news reading)

‣ Dynamic profiling

‣ Explicit personal data

‣ Interests (implicit)

‣ Temporal constraints: periodicity

Page 27: Linked Data, Big Data, and User Science at Globo.com

Signal Capturing

Excelsior

Signals

Page 28: Linked Data, Big Data, and User Science at Globo.com

Semantic User Modeling

‣ Annotations from engaged content

‣ Profile can answer:

‣ My favourite team

‣ City I live in

‣ My hometown
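The idea of building a profile from the annotations of engaged content can be sketched as a weighted count of entities. The action weights, the `build_profile` helper, and the sample data are hypothetical, chosen only to show the mechanism.

```python
from collections import Counter

# Hypothetical weights: stronger engagement contributes more to the profile.
ACTION_WEIGHT = {"pageview": 1.0, "scroll+2min": 3.0}


def build_profile(interactions, annotations):
    """interactions: list of (content id, action) for one user.
    annotations: content id -> list of annotated entities.
    Returns a Counter mapping entity -> accumulated weight.
    """
    profile = Counter()
    for content, action in interactions:
        weight = ACTION_WEIGHT.get(action, 0.0)  # unknown actions are ignored
        for entity in annotations.get(content, []):
            profile[entity] += weight
    return profile


annotations = {
    "materia_1": ["messi", "neymar", "barcelona"],
    "materia_2": ["neymar", "santos"],
}
interactions = [("materia_1", "scroll+2min"), ("materia_2", "pageview")]

profile = build_profile(interactions, annotations)
print(profile.most_common(1))
```

A question such as "my favourite team" could then be answered by restricting the profile to entities of the relevant ontology class and taking the highest-weighted one.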

Page 29: Linked Data, Big Data, and User Science at Globo.com

Spreading Activation
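The slide gives only the name of the technique, so the following is a generic sketch of spreading activation rather than the production algorithm: activation starts at seed entities and is passed to neighbouring nodes, attenuated by a decay factor at each step. The graph, the `decay` and `steps` parameters, and the absence of a firing threshold are all simplifying assumptions.

```python
def spread_activation(graph, seeds, decay=0.5, steps=2):
    """graph: node -> list of neighbouring nodes (directed edges).
    seeds: node -> initial activation.
    Returns node -> total activation after the given number of steps.
    """
    activation = dict(seeds)
    frontier = dict(seeds)
    for _ in range(steps):
        next_frontier = {}
        for node, value in frontier.items():
            for neighbour in graph.get(node, []):
                # Each neighbour receives a decayed share of the activation.
                passed = value * decay
                next_frontier[neighbour] = next_frontier.get(neighbour, 0.0) + passed
        for node, value in next_frontier.items():
            activation[node] = activation.get(node, 0.0) + value
        frontier = next_frontier
    return activation


graph = {
    "neymar": ["barcelona", "brazil"],
    "messi": ["barcelona", "argentina"],
    "barcelona": ["spain"],
}
result = spread_activation(graph, {"neymar": 1.0, "messi": 1.0})
print(result)
```

In this toy graph `barcelona` ends up as strongly activated as the seeds, even though the user never interacted with it directly, because both seed entities link to it.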

Page 30: Linked Data, Big Data, and User Science at Globo.com

My profile on

Page 31: Linked Data, Big Data, and User Science at Globo.com

City/State I live in

Page 32: Linked Data, Big Data, and User Science at Globo.com

Hometown and State

Page 33: Linked Data, Big Data, and User Science at Globo.com

Football team test (3.5 million users)

82% precision

95% precision@top3

* When the user has read at least one article that cites their team
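To make the two figures concrete, the sketch below assumes that "precision" means the top-ranked inferred team matches the user's real team, and that "precision@top3" means the real team appears among the three highest-ranked teams. That reading, the `hit_rate_at_k` helper, and the sample data are assumptions; the slides do not define the metrics.

```python
def hit_rate_at_k(ranked_predictions, truth, k):
    """ranked_predictions: user -> list of teams, best guess first.
    truth: user -> the user's actual team.
    Returns the fraction of users whose actual team is in the top k.
    """
    hits = sum(
        1
        for user, ranked in ranked_predictions.items()
        if truth[user] in ranked[:k]
    )
    return hits / len(ranked_predictions)


ranked_predictions = {
    "u1": ["flamengo", "vasco", "botafogo"],
    "u2": ["vasco", "flamengo", "fluminense"],
    "u3": ["corinthians", "palmeiras", "santos"],
    "u4": ["santos", "palmeiras", "corinthians"],
}
truth = {"u1": "flamengo", "u2": "flamengo", "u3": "corinthians", "u4": "gremio"}

print(hit_rate_at_k(ranked_predictions, truth, 1))
print(hit_rate_at_k(ranked_predictions, truth, 3))
```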

Page 34: Linked Data, Big Data, and User Science at Globo.com

How fast?

‣ 48 ms mean request time

‣ 5 min between interaction and profile update

Page 35: Linked Data, Big Data, and User Science at Globo.com

Potential uses

‣ Personalized homepages

‣ Targeted advertising

‣ Granular user/content description

‣ Semantic Recommendation

‣ Clustering

‣ Demographic data

‣ Informed product creation/evolution

Page 36: Linked Data, Big Data, and User Science at Globo.com

github.com/globocom/IWantToWorkAtGloboCom

Page 37: Linked Data, Big Data, and User Science at Globo.com

Ícaro Medeiros [email protected]

Semantic team [email protected]

globo.com

slides icaromedeiros.com.br

slideshare.net/icaromedeiros