Thomas van der Elsen, Richard Lawrence, Jumi Oladimeji, Alastair Smith

Upload: gabriella-gutierrez

Post on 28-Mar-2015


Page 1

Thomas van der Elsen, Richard Lawrence, Jumi Oladimeji, Alastair Smith

Page 2

Introduction

People increasingly publish their reactions to public events using a blog
  A tool that enables this info to be published quickly
  A journal that is available on the web

Need for effective data-mining techniques specific to blogs and similar tools (e.g. the Semantic Web)

“Our goal is to develop a method of capturing hot conversations by automating readers’ processes for characterizing and monitoring blogs.”

Page 3

Overview

Data-mining techniques
  Creation of blog link structure
  Analysing link structure

Types of important bloggers
  Agitators
  Summarisers

Applications, analysis and conclusions
  Real-world applications and extensions
  Pros and cons of the paper

Page 4

Crawling blogs

Extracting hyperlinks

Extracting blog threads

Page 5

Crawling blogs

System crawls through the RSS list, registering for each entry:
  Title
  Permalink
  List entry date

Aggregator: gathers RSS feeds from multiple sources and organises them

OPML: file format used to share RSS feed lists

RSS: a format for distributing content on the web

[Diagram labels: Aggregators, RSS list, RSS feeds, OPML]
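The crawling step above can be sketched as follows. This is a minimal illustration, not the system described in the paper: it assumes the RSS list is an OPML file whose `<outline>` elements carry an `xmlUrl` attribute, that the feeds are RSS 2.0, and that the entry date is taken from `<pubDate>`. The function names and the sample documents are hypothetical, and fetching over the network is left out so the sketch stays self-contained.

```python
import xml.etree.ElementTree as ET


def feed_urls_from_opml(opml_text):
    """Return the feed URLs listed in an OPML document (assumed layout)."""
    root = ET.fromstring(opml_text)
    # OPML conventionally stores each feed as an <outline> with an xmlUrl attribute.
    return [o.attrib["xmlUrl"] for o in root.iter("outline") if "xmlUrl" in o.attrib]


def entries_from_rss(rss_text):
    """Register title, permalink and date for every <item> in an RSS 2.0 feed."""
    root = ET.fromstring(rss_text)
    entries = []
    for item in root.iter("item"):
        entries.append({
            "title": item.findtext("title", default=""),
            "permalink": item.findtext("link", default=""),
            "date": item.findtext("pubDate", default=""),
        })
    return entries


# Hypothetical sample data standing in for a real RSS list and feed.
SAMPLE_OPML = """<opml version="2.0"><body>
<outline text="Example blog" type="rss" xmlUrl="http://example.com/feed.xml"/>
</body></opml>"""

SAMPLE_RSS = """<rss version="2.0"><channel><title>Example blog</title>
<item><title>First post</title><link>http://example.com/1</link>
<pubDate>Mon, 02 Jan 2006 10:00:00 GMT</pubDate></item>
</channel></rss>"""

if __name__ == "__main__":
    print(feed_urls_from_opml(SAMPLE_OPML))
    print(entries_from_rss(SAMPLE_RSS))
```

A real crawler would download each URL returned by `feed_urls_from_opml` and pass the response body to `entries_from_rss`.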

Page 6

Extracting hyperlinks

Problem: Different tag structures per server

[Diagram labels: RSS feed from list, Description, Blog entries, Hyperlink list]
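One way to cope with differing tag structures is to ignore the surrounding markup and collect every anchor in an entry's description. The sketch below does that with Python's standard `html.parser`; it is an assumption about how the hyperlink list could be built, not the extraction method used in the paper, and `LinkCollector` and `extract_hyperlinks` are hypothetical names.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag, whatever markup surrounds it."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names before calling this.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_hyperlinks(description_html):
    """Return the hyperlinks found in one blog entry's description, in order."""
    collector = LinkCollector()
    collector.feed(description_html)
    collector.close()
    return collector.links


if __name__ == "__main__":
    html = '<div><p>See <a href="http://a.example/1">this</a>.</p></div>'
    print(extract_hyperlinks(html))
```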

Page 7

Extracting blog threads

For each hyperlink:

If sourceLink:
  Check links exist in thread data
  Add

If replyLink:
  Check departure URL exists in thread data && check destination URL points to entry on list
    11: Add destination entry to thread
    10: Add destination entry to entry list and add to thread
    01: Add departure entry to thread
    00: Create new thread
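The replyLink branch of this flowchart can be read as a two-bit decision table: the first bit records whether the departure URL already exists in the thread data, the second whether the destination URL points to an entry on the list. The sketch below encodes that reading only. The pairing of codes with actions is inferred from the order of the labels on the slide, and the function name, argument names and set-based data model are hypothetical.

```python
# Action for each (departure in thread data, destination on entry list) outcome,
# as read from the slide's flowchart labels.
REPLY_LINK_ACTIONS = {
    "11": "add destination entry to thread",
    "10": "add destination entry to entry list and add to thread",
    "01": "add departure entry to thread",
    "00": "create new thread",
}


def reply_link_action(departure_url, destination_url, thread_urls, entry_list):
    """Return the two-bit code and the action for one replyLink.

    thread_urls: set of entry URLs already recorded in thread data (assumed model).
    entry_list:  set of entry URLs registered by the crawler (assumed model).
    """
    departure_in_thread = departure_url in thread_urls
    destination_on_list = destination_url in entry_list
    code = f"{int(departure_in_thread)}{int(destination_on_list)}"
    return code, REPLY_LINK_ACTIONS[code]


if __name__ == "__main__":
    threads = {"http://a.example/1"}
    entries = {"http://a.example/1", "http://b.example/2"}
    print(reply_link_action("http://a.example/1", "http://b.example/2", threads, entries))
```

The function only selects the action; how each action updates the thread data is left open because the slide does not spell it out.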

Page 8

Example Results

Page 9

Agitators

Summarisers

Joe Bloggs

Page 10

Agitators

Discussion stimulator

Threads often grow after an agitator’s entry

Three discriminants for an agitator:
  Link (Agi1)
  Popularity (Agi2)
  Topic (Agi3)

The three discriminants can be weighted using the following formula:

Page 11

Link-based discriminant

$e_x$ is an agitator if $k_x > \theta_1$

$e_x$ = a blog entry

$k_x$ = number of entries in $\mathrm{thread}_i$ with a replyLink to $e_x$
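As an illustration of the link-based discriminant, the sketch below counts the entries in a thread that have a replyLink to a given entry and compares the count with the threshold. The data model is an assumption made for the example: a thread is a set of entry identifiers and replyLinks are (source, target) pairs. The function names and sample values are hypothetical.

```python
def count_reply_links_to(entry, thread, reply_links):
    """k_x: number of distinct entries in the thread with a replyLink to `entry`."""
    return len({src for src, dst in reply_links if dst == entry and src in thread})


def is_link_agitator(entry, thread, reply_links, theta1):
    """Link-based discriminant: the entry is an agitator if k_x > theta1."""
    return count_reply_links_to(entry, thread, reply_links) > theta1


if __name__ == "__main__":
    thread = {"e1", "e2", "e3", "e4"}
    # (source, target): the source entry has a replyLink to the target entry.
    reply_links = [("e2", "e1"), ("e3", "e1"), ("e4", "e2")]
    print(is_link_agitator("e1", thread, reply_links, theta1=1))
```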

Page 12

Popularity-based discriminant

$e_x$ is an agitator if $l_x / m_x > \theta_2$

$e_x$ = a blog entry

$l_x$ = number of entries in $\mathrm{thread}_i$ published t days after $e_x$

$m_x$ = number of entries in $\mathrm{thread}_i$ published t days before $e_x$
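The popularity-based discriminant compares thread activity after an entry with activity before it. The sketch below rests on assumptions the slide does not state: "t days after" and "t days before" are read as windows of t days on either side of the entry's date, entries published on the same day as the entry fall in neither window, and an entry with nothing published before it (m_x = 0) is treated as not satisfying the discriminant. All names are hypothetical.

```python
from datetime import date, timedelta


def popularity_ratio(entry_date, thread_dates, t_days):
    """Return l_x / m_x, or None when m_x is zero (ratio undefined)."""
    window = timedelta(days=t_days)
    # l_x: thread entries published within t days after the entry.
    l_x = sum(1 for d in thread_dates if entry_date < d <= entry_date + window)
    # m_x: thread entries published within t days before the entry.
    m_x = sum(1 for d in thread_dates if entry_date - window <= d < entry_date)
    if m_x == 0:
        return None
    return l_x / m_x


def is_popularity_agitator(entry_date, thread_dates, t_days, theta2):
    """Popularity-based discriminant: agitator if l_x / m_x > theta2."""
    ratio = popularity_ratio(entry_date, thread_dates, t_days)
    return ratio is not None and ratio > theta2


if __name__ == "__main__":
    others = [date(2006, 1, 8), date(2006, 1, 11), date(2006, 1, 12)]
    print(popularity_ratio(date(2006, 1, 10), others, t_days=3))
```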

Page 13

Topic-based discriminant

$e_x$ is an agitator if

$e_x$ = a blog entry

$n$ = number of entries

Page 14

Summarisers

Publish entries that collate and compact previous posts

Provide a convenient way of digesting an entire thread

The discriminant for summarisers is link-based:

$e_x$ is a summariser if $p_x > \theta_4$

$e_x$ = a blog entry

$p_x$ = number of entries in $\mathrm{thread}_i$ that have a replyLink from $e_x$
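The summariser discriminant mirrors the link-based agitator test but counts links in the opposite direction: replyLinks that go from the entry to other entries in the thread. The sketch below assumes a thread is a set of entry identifiers and replyLinks are (source, target) pairs; the function names and sample values are hypothetical.

```python
def count_reply_links_from(entry, thread, reply_links):
    """p_x: number of distinct thread entries that `entry` has a replyLink to."""
    return len({dst for src, dst in reply_links if src == entry and dst in thread})


def is_summariser(entry, thread, reply_links, theta4):
    """Summariser discriminant: the entry is a summariser if p_x > theta4."""
    return count_reply_links_from(entry, thread, reply_links) > theta4


if __name__ == "__main__":
    thread = {"e1", "e2", "e3", "e4"}
    # (source, target): e4 links back to three earlier entries in the thread.
    reply_links = [("e2", "e1"), ("e4", "e1"), ("e4", "e2"), ("e4", "e3")]
    print(is_summariser("e4", thread, reply_links, theta4=2))
```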

Page 15

Applications

Pros and Cons

Conclusions

Page 16

Applications

Supplementary info, e.g. TV, news site, etc.
  Home and Away – who shot Josh West (Agitator)
  Sports, etc. – used by studios and media to highlight points of interest in a match (Summariser)

Page 17

Analysis – Pros

Basis for future research – a brief intro to the subject
  Multiple thread analysis
  Identification of areas of bloggers’ expertise

Highly effective in certain specific areas
  News and reviews

Implementation of theory (feature vector)

Page 18

Analysis – Cons

Only 25 sites used in sample (but 1000s of blogs)

Does not take context into consideration
  E.g., an agitator may be posting offensive entries

No measurement of summary success

Comments are not analysed

Inappropriate for certain areas
  MySpace, Bebo, et al. (due to target audience)

Page 19

Conclusions

Created a data-mining framework for future research

May instigate research into further work

Nice idea and potentially useful but needs to be extended

Page 20

Thank you for your time