
Page 1: Implications of Web 2.0 on Information Research


Implications of Web 2.0 on Information Research

Wen-Lian Hsu, Academia Sinica, Taiwan

Institute of Information Science, Academia Sinica (中央研究院資訊所); 許聞廉 (Wen-Lian Hsu); [email protected]


2/62Workshop on Web 2.0 Technology and Applications, Taipei, 2007 Dec. 19th

Academia Sinica

Outline

- What is Web 2.0?
- Web 2.0 and Research
  - Human-based Computation
  - Folksonomy (Social Tagging)
  - Academic Data Analysis
  - GEO-Info
- Conclusion


What is Web 2.0?

Web 2.0 Conference (October 2004), Tim O'Reilly:

- The Web as a Platform
- Harnessing Collective Intelligence
- Data is the Next Intel Inside
- End of the Software Release Cycle
- Lightweight Programming Models
- Software Above the Level of a Single Device
- Rich User Experiences


Web 2.0 is the combination of "tools and technologies", "business strategies" and social/cultural trends, which drive the individual creation and sharing of content on the Internet. (Ed Yourdon)

"Web 2.0 opens up the Long Tail, making it increasingly cost-effective to service the interests of large numbers of relatively small groups of individuals, and to enable them to benefit from key pieces of the platform while fulfilling their own needs." (Paul Miller)


"Web 2.0" seems to be like Pink Floyd lyrics: it can mean different things to different people, depending on your state of mind. (Kevin Maney)

"Web 2.0 definitely is a buzzword, and it's overused. But the movement is only starting. That movement is about leveraging the power of people." (Chad Hurley)

"What we're seeing is a return to the roots of the web." (Caterina Fake)

Web 2.0 "is enabling a fundamental shift in power that really is giving power to the consumer." (Mark Parker)

"It's a way to collaborate with your customers, to allow them to co-create with you."


Web 2.0 Sites

Source: http://www.go2web20.net/


Key Web 2.0 services/applications

- Blogs
- Wikis
- Tagging and social bookmarking
- Multimedia sharing
- RSS and syndication
- Podcasting
- P2P


Sharing and Collecting Resources

- Content: Blogger, Wikipedia, Flickr, YouTube
- Opinion: Digg, Hemidemi, 推推王 (funP)
- Bandwidth: eMule, BT (BitTorrent), Skype, Joost, PPStream
- Computing: SETI, Grid
- Innovation: Second Life
- Money: Din.Ben.Don (訂便當團購網, a group-ordering site for boxed lunches), 共乘網 (carpooling sites)


Social Bookmarking

Source: http://funp.com/push/


Source: http://www.hemidemi.com/

Source: http://digg.com/


Blog

[Figure: a blog page with labeled regions: content, comments, AdSense ads, social bookmark]

Source: http://carol.bluecircus.net/


Skype

Source: S. A. Baset and H. Schulzrinne, "An Analysis of the Skype Peer-to-Peer Internet Telephony Protocol," Technical Report, Columbia University, September 14, 2004.


Wikipedia


Second Life


Symbiosis (共生機制, a mechanism of mutual benefit) is the Key

[Figure: blogs and social bookmarks]


The Web Changes in Several Dimensions

- Dynamics
- Heterogeneity
- Collaboration
- Composition
- Socialization


Current Research Activities

- Information Retrieval on Blogs
  - NTCIR-7 CLIRB (Cross-Lingual Information Retrieval for Blog)
- Question Answering on Blogs
  - TREC 2007 QA Track
- Question Answering on Wikipedia
  - QA@CLEF 2007
  - CLEF 2006 WiQA: given a Wikipedia page, locate information snippets in Wikipedia
- PASCAL Ontology Learning Challenge
  - Ontology construction, ontology extension, ontology population, concept naming
- LinkKDD2006, Textlink2007, MRDM2007


Web 2.0 and Research

- Human-based Computation
- Folksonomy (Social Tagging)
- Academic Data Analysis
- GEO-Info


Human-based Computation


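- Social Search: wayfinding tools informed by human judgment
- CAPTCHA: a reversed Turing test (in a Turing test, a human questions the system; here, the system questions the user)
- Interactive Genetic Algorithm (IGA): a genetic algorithm informed by human judgment, in which humans supply the fitness-function results
  - Example: drawing a criminal's portrait. The system generates suspect portraits with a GA, the witness rates which ones look more like the criminal, and the process repeats until the portrait approaches the criminal's appearance.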
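The IGA loop in the portrait example can be sketched in Python. This is a minimal sketch under stated assumptions, not a description of any particular IGA system: a portrait is encoded as a hypothetical list of facial-feature values in [0, 1], the selection/crossover/mutation choices are generic GA defaults, and the `witness` function merely simulates the human rater (in a real IGA a person would type the scores).

```python
import random
from typing import Callable, List

Portrait = List[float]  # hypothetical encoding: one value in [0, 1] per facial feature


def mutate(p: Portrait, rng: random.Random, p_mut: float = 0.2) -> Portrait:
    """Nudge each feature with probability p_mut, clamping the result to [0, 1]."""
    return [min(1.0, max(0.0, x + rng.gauss(0.0, 0.1))) if rng.random() < p_mut else x
            for x in p]


def crossover(a: Portrait, b: Portrait, rng: random.Random) -> Portrait:
    """Uniform crossover: each feature is copied from one parent at random."""
    return [x if rng.random() < 0.5 else y for x, y in zip(a, b)]


def interactive_ga(rate: Callable[[Portrait], float], n_features: int = 8,
                   pop_size: int = 8, generations: int = 20, seed: int = 0) -> Portrait:
    """Evolve portraits; `rate` plays the human judge (higher score = more similar)."""
    rng = random.Random(seed)
    population = [[rng.random() for _ in range(n_features)] for _ in range(pop_size)]
    for _ in range(generations):
        # In a real IGA, the witness would score each candidate on screen here.
        ranked = sorted(population, key=rate, reverse=True)
        parents = ranked[: pop_size // 2]  # keep the best half (elitism)
        children = [mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rng)
                    for _ in range(pop_size - len(parents))]
        population = parents + children
    return max(population, key=rate)


# Usage: a simulated witness who prefers portraits close to a hidden target face.
target = [0.9, 0.1, 0.5, 0.7, 0.3, 0.2, 0.8, 0.6]


def witness(p: Portrait) -> float:
    return -sum((x - t) ** 2 for x, t in zip(p, target))


best = interactive_ga(witness)
print([round(x, 2) for x in best])
```

Keeping the best half each generation means the best portrait never gets worse, which matters when every evaluation costs a human's time.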


CAPTCHA: Completely Automated Public Turing test to tell Computers and Humans Apart

A CAPTCHA is a type of challenge-response test used in computing to determine whether the user is human. (Wikipedia)

Source: http://recaptcha.net/


CAPTCHA

[Figure: CAPTCHAs on blogs; each challenge pairs unrecognized text with recognized text]
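The figure's idea, pairing a word the system already knows with a word OCR could not recognize, can be sketched as follows. This is a minimal sketch, not reCAPTCHA's actual implementation: the class and method names are hypothetical, and the threshold of three agreeing answers is an assumed parameter. The known word decides whether the user passes; the answer for the unknown word is only counted as a vote.

```python
from collections import Counter, defaultdict
from typing import DefaultDict, Dict, Optional


class WordDigitizer:
    """Collects human transcriptions of words that OCR could not recognize."""

    def __init__(self, votes_needed: int = 3) -> None:
        self.votes_needed = votes_needed  # hypothetical agreement threshold
        self.votes: DefaultDict[str, Counter] = defaultdict(Counter)
        self.digitized: Dict[str, str] = {}

    def submit(self, control_answer: str, control_truth: str,
               unknown_id: str, unknown_answer: str) -> bool:
        """Check the control word; if it is right, count the answer for the unknown word."""
        if control_answer.strip().lower() != control_truth.strip().lower():
            return False  # failed the test: treat as a bot and discard the vote
        self.votes[unknown_id][unknown_answer.strip().lower()] += 1
        word, count = self.votes[unknown_id].most_common(1)[0]
        if count >= self.votes_needed:
            self.digitized[unknown_id] = word
        return True

    def transcription(self, unknown_id: str) -> Optional[str]:
        return self.digitized.get(unknown_id)


d = WordDigitizer()
d.submit("morning", "morning", "scan-17", "overlooks")
d.submit("upon", "upon", "scan-17", "overlooks")
d.submit("inqiury", "inquiry", "scan-17", "garbage")  # wrong control word: rejected
d.submit("inquiry", "inquiry", "scan-17", "overlooks")
print(d.transcription("scan-17"))  # overlooks
```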


The ESP Game

- A two-player game
- The goal is to guess what your partner is typing for each image.
- Once you both type the same word(s), you score points.

Source: http://www.espgame.org/
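The matching rule of the ESP Game can be sketched as a small function. This is a minimal sketch under assumptions: the two players' typing is modeled as one time-ordered list of `(player, word)` events, and the optional `taboo` set (words that may not be used as guesses) is included for illustration. The first word typed by both players becomes a label for the image.

```python
from typing import AbstractSet, Iterable, Optional, Tuple


def esp_round(events: Iterable[Tuple[str, str]],
              taboo: AbstractSet[str] = frozenset()) -> Optional[str]:
    """Return the first word both players typed for one image, or None.

    `events` holds (player, word) pairs in the order they were typed, with
    player "A" or "B". Taboo words are ignored as guesses.
    """
    typed = {"A": set(), "B": set()}
    for player, word in events:
        word = word.strip().lower()
        if word in taboo:
            continue
        partner = "B" if player == "A" else "A"
        if word in typed[partner]:
            return word  # agreement: both players score, and the word labels the image
        typed[player].add(word)
    return None


events = [("A", "dog"), ("B", "puppy"), ("A", "grass"), ("B", "grass")]
print(esp_round(events))                   # grass
print(esp_round(events, taboo={"grass"}))  # None
```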


The Phetch Game: Play as a describer


The Phetch Game: Play as a seeker


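How about a game for describing idioms?

Example idioms: 罄竹難書 (misdeeds too numerous to record), 如沐春風 (like basking in a spring breeze), 高抬貴手 (please be lenient), 不動如山 (immovable as a mountain)

Example descriptions:
- 罄竹難書: 壞事做太多 ("has done too many bad things")
- 虎頭蛇尾 (a tiger's head and a snake's tail, i.e., a strong start with a weak finish): 做事沒有毅力 ("lacks perseverance in doing things")
- ………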


Folksonomy (Social Tagging)


- Also known as social tagging, collaborative tagging, social classification, and social indexing
- "Folksonomy is the practice and method of collaboratively creating and managing tags to annotate and categorize content." (Wikipedia)


del.icio.us

- Tags: descriptive words applied by users to links; tags are searchable
- My Tags: words I've used to describe links in a way that makes sense to me


Tag Cloud
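A tag cloud typically maps each tag's frequency to a font size. The sketch below uses logarithmic scaling, one common choice, so a few very popular tags do not dwarf everything else; the 10 to 36 pixel range and the sample tags are assumptions for illustration.

```python
import math
from collections import Counter
from typing import Dict, Iterable


def tag_cloud(tags: Iterable[str], min_px: int = 10, max_px: int = 36) -> Dict[str, int]:
    """Map each tag to a font size that grows with the log of its frequency."""
    counts = Counter(tags)
    if not counts:
        return {}
    lo = math.log(min(counts.values()))
    hi = math.log(max(counts.values()))
    span = hi - lo or 1.0  # all counts equal: avoid dividing by zero
    return {tag: round(min_px + (math.log(c) - lo) / span * (max_px - min_px))
            for tag, c in counts.items()}


tags = ["web2.0"] * 8 + ["folksonomy"] * 4 + ["ajax"] * 2 + ["rss"]
print(tag_cloud(tags))  # {'web2.0': 36, 'folksonomy': 27, 'ajax': 19, 'rss': 10}
```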


Semantic Web

Source: Tim Berners-Lee


Using Folksonomy to Help the Semantic Web: Top-down Semantic Annotation

- Approach
  - Define an ontology first.
  - Use the ontology to add semantic markup to web resources. The semantics are provided by the ontology, which is shared among different web agents and applications.
- Problems
  - Negotiation
  - Evolution (hard to maintain)
  - High barrier (requires background knowledge)

Source: Xian Wu, Lei Zhang, Yong Yu. “Exploring Social Annotations for the Semantic Web”


Using Folksonomy to Help the Semantic Web: Bottom-up Approach with Social Tagging

- Advantages
  - No common ontology or dictionary is needed
  - Easy to access
  - Sensitive to information drift
- Disadvantages
  - Ambiguity problem: for example, "XP" can refer to either "Extreme Programming" or "Windows XP".
  - Group synonymy problem: two seemingly different annotations may bear the same meaning.

Source: Xian Wu, Lei Zhang, Yong Yu. “Exploring Social Annotations for the Semantic Web”
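Both disadvantages can be made visible with a simple co-occurrence measure: represent each tag by the resources it was applied to and compare tags by cosine similarity. This is a minimal sketch of a generic heuristic, not the method of the cited paper; the URLs and tags are hypothetical.

```python
import math
from collections import defaultdict
from itertools import combinations
from typing import Dict, List, Tuple


def tag_similarity(annotations: List[Tuple[str, str]]) -> Dict[Tuple[str, str], float]:
    """Cosine similarity between every pair of tags.

    Each tag is represented by how often it was applied to each resource;
    `annotations` holds one (resource, tag) pair per user annotation.
    """
    vectors: Dict[str, Dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for resource, tag in annotations:
        vectors[tag][resource] += 1
    norms = {t: math.sqrt(sum(c * c for c in v.values())) for t, v in vectors.items()}
    sims: Dict[Tuple[str, str], float] = {}
    for a, b in combinations(sorted(vectors), 2):
        dot = sum(c * vectors[b].get(r, 0) for r, c in vectors[a].items())
        sims[(a, b)] = dot / (norms[a] * norms[b])
    return sims


annotations = [
    ("url1", "xp"), ("url1", "extremeprogramming"), ("url1", "extreme-programming"),
    ("url2", "xp"), ("url2", "windows"),
    ("url3", "extremeprogramming"), ("url3", "extreme-programming"),
    ("url4", "windows"), ("url4", "microsoft"),
]
for pair, score in sorted(tag_similarity(annotations).items(), key=lambda kv: -kv[1])[:3]:
    print(pair, round(score, 2))
```

Here "extreme-programming" and "extremeprogramming" score close to 1 (a synonym candidate), while "xp" is equally similar to "extremeprogramming" and "windows", which is the ambiguity problem in miniature.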


Or Is Folksonomy the Solution? "Ontology is Overrated"

- Classification of the web has failed
- Classification itself is filled with bias and error
- Tagging is the solution

Source: http://www.shirky.com/writings/ontology_overrated.html


Academic Data Analysis


- Search platforms and citation indexes (papers, journals/conferences, authors): CiteSeer, Google Scholar, Windows Live Academic Search, PubMed, arXiv
- e-Lib: adding the Lib 2.0 concept to applications, so that search platforms provide open APIs for collecting more data
- Users participate and interact with data and people:
  - Add My Library and tags, e.g., CiteULike, BibSonomy
  - Add comments, ratings, and recommendations, e.g., TechLens
  - Domain focus groups, e.g., Botanicus


An Example

Let's use TechLens as an example to imagine what IR/NLP research can do.

[Figure: authors, readers, and papers]


The Terminology

- Entities: Alfred V Aho
- References: Alfred Aho; AV Aho; Aho, A. V.
- Links: Alfred Aho, John Hopcroft, Jeffrey Ullman; AV Aho, BW Kernighan, PJ Weinberger
- Entity groups: G1 (Programming Languages), G2 (Databases), G3 (Algorithms)


Imagine how we can make use of them

[Figure: papers, authors, readers, comments, and ratings, connected through reference extraction and entity resolution]


New Research Topics

- Given these changes, a key emerging challenge for data mining is dealing with richly structured data and finding patterns across heterogeneous datasets, …
- Several research efforts focus on these problems, such as:
  - (Social) network analysis
  - Link mining
  - PASCAL Ontology Learning Challenge
  - …


Society

- Nodes: individuals (authors, readers)
- Links: social relationships (family, work, friendship, membership, etc.)
- Social networks: many individuals with diverse social interactions between them

[Figure labels: S. Milgram (1967); John Guare, Six Degrees of Separation; Science]

source: www.cs.uiuc.edu/~hanj
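"Degrees of separation" between two individuals is the length of the shortest chain of links connecting them, which breadth-first search computes in an unweighted network. A minimal sketch, assuming an undirected co-authorship graph built from the co-author lists shown on the terminology slide (Aho with Hopcroft and Ullman; Aho with Kernighan and Weinberger):

```python
from collections import defaultdict, deque
from typing import Dict, List, Optional


def degrees_of_separation(graph: Dict[str, List[str]], src: str, dst: str) -> Optional[int]:
    """Breadth-first search: number of links on a shortest path, or None if unreachable."""
    if src == dst:
        return 0
    seen = {src}
    queue = deque([(src, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbor in graph.get(node, []):
            if neighbor == dst:
                return dist + 1
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, dist + 1))
    return None


coauthors = [("Aho", "Hopcroft"), ("Aho", "Ullman"), ("Hopcroft", "Ullman"),
             ("Aho", "Kernighan"), ("Aho", "Weinberger"), ("Kernighan", "Weinberger")]
graph: Dict[str, List[str]] = defaultdict(list)
for a, b in coauthors:  # co-authorship is symmetric, so add both directions
    graph[a].append(b)
    graph[b].append(a)

print(degrees_of_separation(graph, "Hopcroft", "Kernighan"))  # 2
```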


Communication networks

The Earth is developing an electronic nervous system, a network with diverse nodes and links:
- Nodes: computers, routers, satellites
- Links: phone lines, TV cables, EM waves

Artifacts in TechLens:
- Nodes: papers, user IPs, comments, responses, …
- Links: relations between artifacts

Communication networks: many non-identical components with diverse connections between them.

Source: www.cs.uiuc.edu/~hanj


Link-based Object Ranking

- Perhaps the best-known link mining task is link-based object ranking (LBR), a primary focus of the link analysis community. The objective of LBR is to exploit the link structure of a graph to order or prioritize the set of objects within the graph.
- Example: PageRank
  - What paper is most important in this area?
  - What journal/conference is most important in this area?
  - What topic is important in this area?
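To make the PageRank example concrete, here is a power-iteration sketch on a tiny citation graph. It assumes the commonly used damping factor of 0.85 and spreads the rank of papers with no outgoing citations (dangling nodes) evenly over all papers; the paper IDs are hypothetical.

```python
from typing import Dict, List


def pagerank(links: Dict[str, List[str]], damping: float = 0.85,
             tol: float = 1e-10, max_iter: int = 200) -> Dict[str, float]:
    """Power-iteration PageRank; links[u] lists the nodes u points to (e.g. papers u cites)."""
    nodes = sorted(set(links) | {v for outs in links.values() for v in outs})
    n = len(nodes)
    rank = {u: 1.0 / n for u in nodes}
    for _ in range(max_iter):
        # Rank held by dangling nodes (no out-links) is spread evenly over all nodes.
        dangling = sum(rank[u] for u in nodes if not links.get(u))
        new = {u: (1.0 - damping) / n + damping * dangling / n for u in nodes}
        for u, outs in links.items():
            for v in outs:
                new[v] += damping * rank[u] / len(outs)
        converged = sum(abs(new[u] - rank[u]) for u in nodes) < tol
        rank = new
        if converged:
            break
    return rank


# Hypothetical citation graph: P1 and P2 cite P3, and P3 cites P4.
citations = {"P1": ["P3"], "P2": ["P3"], "P3": ["P4"], "P4": []}
for paper, score in sorted(pagerank(citations).items(), key=lambda kv: -kv[1]):
    print(paper, round(score, 3))
```

P4 ends up first even though only one paper cites it, because that citing paper (P3) is itself well cited. This is the "importance flows along links" behavior that distinguishes LBR from simple citation counting.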


Link-based Object Classification (Link-based Classification, LBC)

- Predicting the category of an object based on its attributes, its links, and the attributes of linked objects
- Web: predict the category of a web page based on the words that occur on the page, links between pages, anchor text, HTML tags, etc.
- Citation: predict the topic of a paper based on word occurrence, citations, and co-citations
- Epidemics: predict the disease type based on characteristics of the people; predict a person's age based on the ages of the people they have been in contact with and the disease type
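The citation case can be sketched as a simplified form of iterative classification: each unlabeled paper is scored from its own words (attribute evidence) plus the current labels of the papers it cites (link evidence), and the labels are recomputed over a few rounds. This is a minimal sketch under assumptions: the keyword-count scorer stands in for a trained local classifier, and all paper IDs, words, and keyword lists are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Set


def iterative_classification(
    words: Dict[str, Set[str]],     # object -> its attribute words
    links: Dict[str, List[str]],    # object -> linked objects (e.g. cited papers)
    known: Dict[str, str],          # objects whose category is already known
    keywords: Dict[str, Set[str]],  # hypothetical keyword list per category
    link_weight: float = 1.0,
    rounds: int = 5,
) -> Dict[str, str]:
    """Label unknown objects from their own words plus the labels of linked objects."""
    labels = dict(known)
    for _ in range(rounds):
        updated = dict(labels)
        for obj in words:
            if obj in known:
                continue
            scores: Counter = Counter()
            for category, kws in keywords.items():
                scores[category] += len(words[obj] & kws)  # attribute evidence
            for neighbor in links.get(obj, []):
                if neighbor in labels:
                    scores[labels[neighbor]] += link_weight  # link evidence
            if scores and max(scores.values()) > 0:
                # Ties go to the alphabetically first category.
                updated[obj] = max(sorted(scores), key=scores.__getitem__)
        labels = updated
    return labels


words = {"p1": {"retrieval", "ranking"}, "p2": {"query", "index"},
         "p3": {"transaction"}, "p4": {"query"}, "p5": set()}
keywords = {"IR": {"retrieval", "ranking", "query"}, "DB": {"transaction", "index", "query"}}
known = {"p1": "IR", "p3": "DB"}
links = {"p2": ["p3"], "p4": ["p1"], "p5": ["p4", "p1"]}
print(iterative_classification(words, links, known, keywords))
```

With `link_weight=0.0`, p4's keyword tie ("query" is in both lists) falls back to the alphabetical tie-break and p5, which has no words, stays unlabeled; the citation links are what pull both into IR.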


Group Detection

- Cluster the nodes in the graph into groups that share common characteristics, that is, predict when a set of entities belong to the same group based on clustering both object attribute values and link structure.
- Web: identifying communities
- Citation: identifying research communities
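One simple way to combine attribute values and link structure is to keep a link only when its two endpoints also have similar attributes, then take connected components as groups. This is a minimal sketch of a heuristic chosen for illustration, not a specific published algorithm. It uses Jaccard similarity of attribute sets and a union-find structure; the author IDs, topic attributes, and threshold are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def detect_groups(attrs: Dict[str, Set[str]], edges: List[Tuple[str, str]],
                  min_jaccard: float = 0.25) -> List[Set[str]]:
    """Group nodes using links whose endpoints also have similar attributes.

    An edge is kept only if the Jaccard similarity of its endpoints' attribute
    sets reaches `min_jaccard`; groups are the connected components (union-find).
    """
    parent = {u: u for u in attrs}

    def find(u: str) -> str:
        while parent[u] != u:
            parent[u] = parent[parent[u]]  # path halving
            u = parent[u]
        return u

    for u, v in edges:
        a, b = attrs[u], attrs[v]
        union = a | b
        if union and len(a & b) / len(union) >= min_jaccard:
            parent[find(u)] = find(v)

    groups: Dict[str, Set[str]] = {}
    for u in attrs:
        groups.setdefault(find(u), set()).add(u)
    return sorted(groups.values(), key=lambda g: sorted(g))


attrs = {"u1": {"ir", "nlp"}, "u2": {"ir", "web", "search"}, "u3": {"nlp", "ir"},
         "u4": {"db", "sql"}, "u5": {"db", "sql", "web"}}
edges = [("u1", "u2"), ("u1", "u3"), ("u4", "u5"), ("u2", "u5")]
print(detect_groups(attrs, edges))
```

The u2-u5 link shares only the attribute "web" (Jaccard 0.2), so at a threshold of 0.25 it is ignored and two research communities emerge; lowering the threshold to 0.2 merges them into one, which shows how the threshold trades link evidence against attribute evidence.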


Entity Resolution

- Predicting when two objects are the same, based on their attributes and their links
- Web: predicting when two sites are mirrors of each other
- Citation: predicting when two citations refer to the same paper
- Epidemics: predicting when two disease strains are the same
- Biology: learning when two names refer to the same protein
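For the citation case, a common first step is to normalize author references into a blocking key so that variants such as "Alfred V Aho", "AV Aho", and "Aho, A. V." (from the terminology slide) land in the same candidate cluster. This is a minimal sketch of that attribute-only step, assuming a (surname, first initial) key; different people can share a key, which is exactly why link evidence such as shared co-authors is needed afterwards.

```python
import re
from collections import defaultdict
from typing import Dict, List, Tuple


def name_key(name: str) -> Tuple[str, str]:
    """Blocking key (surname, first initial) for an author reference."""
    if "," in name:  # "Surname, Given names" form
        last, given = (part.strip() for part in name.split(",", 1))
    else:            # "Given names Surname" form
        *given_parts, last = name.split()
        given = " ".join(given_parts)
    letters = re.sub(r"[^A-Za-z]", "", given)
    return last.lower(), letters[:1].lower()


def resolve(references: List[str]) -> Dict[Tuple[str, str], List[str]]:
    """Group references that share a blocking key."""
    clusters: Dict[Tuple[str, str], List[str]] = defaultdict(list)
    for ref in references:
        clusters[name_key(ref)].append(ref)
    return dict(clusters)


refs = ["Alfred V Aho", "Alfred Aho", "AV Aho", "Aho, A. V.", "John Hopcroft", "Jeffrey Ullman"]
for key, names in resolve(refs).items():
    print(key, names)
```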


Link Prediction

- Predict whether a link exists between two entities, based on attributes and other observed links
- Web: predict whether there will be a link between two pages
- Citation: predict whether a paper will cite another paper, or predict the venue type of a publication (conference, journal, workshop) based on properties of the paper
- Epidemics: predict who a patient's contacts are (epidemiology needs to trace the origin (focus) of a disease and its source of infection)
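A classic structural baseline for link prediction scores each unlinked pair by the neighbors they share. The sketch uses the Adamic-Adar score, a well-known heuristic swapped in here for illustration (the slide does not name a method), which gives more weight to shared neighbors that have few links of their own. It uses links only, not attributes, and the co-author graph is hypothetical.

```python
import math
from collections import defaultdict
from itertools import combinations
from typing import Dict, List, Set, Tuple


def adamic_adar(graph: Dict[str, Set[str]]) -> List[Tuple[float, str, str]]:
    """Rank unlinked pairs (u, v) by the Adamic-Adar score.

    score(u, v) = sum over common neighbors z of 1 / log(degree(z)); shared
    neighbors with few links count more than shared hubs.
    """
    scored = []
    for u, v in combinations(sorted(graph), 2):
        if v in graph[u]:
            continue  # already linked: nothing to predict
        # Every common neighbor has degree >= 2, so log(degree) > 0.
        score = sum(1.0 / math.log(len(graph[z])) for z in graph[u] & graph[v])
        if score > 0:
            scored.append((score, u, v))
    return sorted(scored, key=lambda t: (-t[0], t[1], t[2]))


# Hypothetical undirected co-author graph.
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d"), ("d", "e"), ("b", "d")]
graph: Dict[str, Set[str]] = defaultdict(set)
for x, y in edges:
    graph[x].add(y)
    graph[y].add(x)
for score, u, v in adamic_adar(graph):
    print(u, v, round(score, 3))
```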


Other Possible Research Directions

- Expert finding, such as suggesting paper reviewers or conference committee members
- Ecological evolution of a research area, such as one topic addressed by different solutions over a period of time
- A domain's topic distribution


GEO-Info (地理資訊, geographic information)


- GIS: limited users, limited usage
- Google Earth/Map: open for everyone
- User participation: Google Earth Community, Google Earth Blog, Ogle Earth, …
- GML
- Photo sharing, user annotation


Some Research Topics

Much information can now be combined into Google Earth/Maps via KML. Because such information can be integrated through geocoding, several models become very interesting, such as:

- Photo annotation, sharing, and search
- Live information
- Planning
- 3D, flight animation
- Travel experiences, comments
- Transportation information, survival information
- Climate change
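As a small illustration of the KML route, the sketch below turns geocoded items (for example, shared photos with tags) into a KML document with one Placemark each, using only Python's standard `xml.etree.ElementTree`. It assumes the OGC KML 2.2 namespace; the photo name, tags, and coordinates are hypothetical. Note that KML writes coordinates as longitude first, then latitude.

```python
import xml.etree.ElementTree as ET
from typing import Iterable, Tuple

KML_NS = "http://www.opengis.net/kml/2.2"
ET.register_namespace("", KML_NS)  # serialize KML elements without a prefix


def q(tag: str) -> str:
    """Qualify a tag name with the KML namespace."""
    return f"{{{KML_NS}}}{tag}"


def to_kml(items: Iterable[Tuple[str, str, float, float]]) -> str:
    """Build a KML document with one Placemark per (name, description, lat, lon)."""
    kml = ET.Element(q("kml"))
    doc = ET.SubElement(kml, q("Document"))
    for name, description, lat, lon in items:
        placemark = ET.SubElement(doc, q("Placemark"))
        ET.SubElement(placemark, q("name")).text = name
        ET.SubElement(placemark, q("description")).text = description
        point = ET.SubElement(placemark, q("Point"))
        # KML lists longitude first, then latitude.
        ET.SubElement(point, q("coordinates")).text = f"{lon},{lat}"
    return ET.tostring(kml, encoding="unicode")


# Hypothetical geocoded photo (name, tags, and coordinates are made up for illustration).
print(to_kml([("Photo 1", "tags: park, spring", 25.0330, 121.5654)]))
```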


Some Information Bundled with Google Earth/Map (中山公園, Zhongshan Park)

- Integrated with YouTube (videos and tags)
- Photo sharing (photos and tags)


Some Applications: Integrating More Information on Maps

- Personal life information integration
- GeoDDupe: A Novel Interface for Interactive Entity Resolution in Geospatial Data


Photos Linked with Maps

Source: http://www.panoramio.com


Image-based Rendering (IBR)

IBR relies on a set of two-dimensional images of a scene to generate a three-dimensional model and then render some novel views of this scene.

Web 2.0 enables sharing of photographs on a truly massive scale


Microsoft Photosynth: From SIFT to Photosynth


Conclusion

- Research results can be easily integrated on the Web 2.0 platform, making restricted-domain research (such as image-based rendering) more useful for the public
- Software agents
- Benefits of human-based computation
- Certain research topics will become easier to tackle, such as personalization in virtual worlds (more data available)
- Data becomes more task-oriented (e.g., Wikipedia)
- More versatile data networks are available


Acknowledgement

Thanks to Professor 盧文祥 and Professor 鄭卜壬 for the invitation.

I would also like to thank two of my Ph.D. students who helped organize the slides: 李政緯 and 呂俊宏.


Thank You