Implications of Web 2.0 on Information Research
Wen-Lian Hsu (許聞廉)
Institute of Information Science, Academia Sinica, Taiwan
[email protected]
Workshop on Web 2.0 Technology and Applications, Taipei, 2007 Dec. 19th
Academia Sinica
Outline
What is Web 2.0?
Web 2.0 and Research
  Human-based Computation
  Folksonomy (Social Tagging)
  Academic Data Analysis
  GEO-Info
Conclusion
What is Web 2.0?
Web 2.0 Conference (October 2004), Tim O'Reilly
  The Web As a Platform
  Harnessing Collective Intelligence
  Data is the Next Intel Inside
  End of the Software Release Cycle
  Lightweight Programming Models
  Software Above the Level of a Single Device
  Rich User Experiences
What is Web 2.0?
Web 2.0 is the combination of “tools and technologies”, “business strategies” and social/cultural trends, which drive the individual creation and sharing of content on the Internet.
  ED YOURDON
Web 2.0 opens up the Long Tail, making it increasingly cost-effective to service the interests of large numbers of relatively small groups of individuals, and to enable them to benefit from key pieces of the platform while fulfilling their own needs.
  PAUL MILLER
What is Web 2.0?
"Web 2.0" seems to be like Pink Floyd lyrics: It can mean different things to different people, depending on your state of mind.
  KEVIN MANEY
“Web 2.0 definitely is a buzzword, and it’s overused. But the movement is only starting. That movement is about leveraging the power of people.”
  CHAD HURLEY
“What we’re seeing is a return to the roots of the web.”
  CATARINA FAKE
Web 2.0 “is enabling a fundamental shift in power that really is giving power to the consumer.”
  MARK PARKER
“It’s a way to collaborate with your customers, to allow them to co-create with you.”
Web 2.0 Sites
Source: http://www.go2web20.net/
Key Web 2.0 services/applications
  Blogs
  Wikis
  Tagging and social bookmarking
  Multimedia sharing
  RSS and syndication
  Podcasting
  P2P
Sharing and Collecting Resources
  Content: Blogger, Wikipedia, Flickr, YouTube
  Opinion: Digg, Hemidemi, 推推王
  Bandwidth: eMule, BT, Skype, Joost, PPStream
  Computing: SETI, Grid
  Innovation: Second Life
  Money: Din.Ben.Don (訂便當團購網, a group-ordering site for boxed lunches), carpooling sites (共乘網)
Social Bookmarking
Source: http://funp.com/push/
Source: http://www.hemidemi.com/
Source: http://digg.com/
Blog
(Annotated screenshot; labeled regions: content, comments, AdSense, social bookmark)
Source: http://carol.bluecircus.net/
Skype
Source: S.A Baset, H. Schulzrinne (September 14, 2004). An Analysis of the Skype Peer-to-Peer Internet Telephony Protocol. Technical Report. Columbia University.
Wikipedia
Second Life
Symbiosis (a mutually beneficial mechanism) is the Key
Blog Social bookmark
The Web Changes in Several Dimensions
  Dynamics
  Heterogeneity
  Collaboration
  Composition
  Socialization
Current Research Activities
Information Retrieval on Blogs
  NTCIR-7 CLIRB (Cross-Lingual Information Retrieval for Blog)
Question Answering on Blogs
  TREC 2007 QA Track
Question Answering on Wikipedia
  QA@CLEF 2007
  CLEF 2006 WiQA: given a Wikipedia page, locate information snippets in Wikipedia
PASCAL Ontology Learning Challenge
  Ontology construction
  Ontology extension
  Ontology population
  Concept naming
LinkKDD2006, Textlink2007, MRDM2007
Web 2.0 and Research
  Human-based Computation
  Folksonomy (Social Tagging)
  Academic Data Analysis
  GEO-Info
Human-based Computation
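Social Search
  wayfinding tools informed by human judgment
CAPTCHA
  a reversed Turing test (in a Turing test a human questions the system; here the system questions the user)
Interactive Genetic Algorithm (IGA)
  a genetic algorithm informed by human judgment; humans supply the fitness function results
  Example: sketching a criminal's portrait. The system generates suspect portraits with a GA, the eyewitness scores which ones look most like the suspect, and the process repeats until the portrait is close to the criminal's appearance.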
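The interactive genetic algorithm loop can be sketched as follows. This is a minimal illustration, not the system behind the portrait example: the bit-string "portrait", the `simulated_witness` scorer, and all parameter values are assumptions introduced here. In a real IGA, `score_fn` would show each candidate to a person and return that person's rating.

```python
import random

def interactive_ga(score_fn, genome_len=8, pop_size=6, generations=20, seed=0):
    """Evolve bit-string candidates; score_fn stands in for the human rater."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(genome_len)] for _ in range(pop_size)]
    for _ in range(generations):
        # Human-in-the-loop step: every candidate is scored by the rater.
        ranked = sorted(pop, key=score_fn, reverse=True)
        parents = ranked[: pop_size // 2]      # keep the best-rated half
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, genome_len)  # one-point crossover
            child = a[:cut] + b[cut:]
            child[rng.randrange(genome_len)] ^= 1  # single-bit mutation
            children.append(child)
        pop = parents + children
    return max(pop, key=score_fn)

# Hypothetical stand-in for the eyewitness: counts features matching a hidden target.
TARGET = [1, 0, 1, 1, 0, 0, 1, 0]

def simulated_witness(candidate):
    return sum(c == t for c, t in zip(candidate, TARGET))

best = interactive_ga(simulated_witness)
print(best, simulated_witness(best))
```

Because the best-rated half survives each round, the top score never decreases from one generation to the next; the cost is that the human must rate every candidate in every round.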
CAPTCHA
Completely Automated Public Turing test to tell Computers and Humans Apart
A CAPTCHA is a type of challenge-response test used in computing to determine whether the user is human. (Wikipedia)
Source: http://recaptcha.net/
CAPTCHA
(Diagram; labels: CAPTCHAs shown on several blogs, unrecognized text, recognized text)
The ESP Game
  a two-player game
  The goal is to guess what your partner is typing on each image.
  Once you both type the same word(s), you score points.
Source: http://www.espgame.org/
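The matching rule of the game can be sketched in a few lines. This is a simplified illustration under stated assumptions: the function name `first_agreement` and the lockstep turn-taking are invented here, and the real game's timing and scoring are not modeled. The agreed word is what becomes a label for the image.

```python
def first_agreement(guesses_a, guesses_b):
    """Return the first word that both players have typed, or None."""
    seen_a, seen_b = set(), set()
    # Simplification: the two players are assumed to type in lockstep.
    for a, b in zip(guesses_a, guesses_b):
        seen_a.add(a.strip().lower())
        seen_b.add(b.strip().lower())
        common = seen_a & seen_b
        if common:
            return sorted(common)[0]  # deterministic pick if several match at once
    return None

label = first_agreement(["dog", "puppy", "grass"], ["animal", "Dog", "park"])
print(label)
```

A match does not need to happen in the same round: here the first player typed "dog" one round before the second player did.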
The Phetch Game
Play as a describer
The Phetch Game
Play as a seeker
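How about a game for describing idioms?
Idioms: 罄竹難書, 如沐春風, 高抬貴手, 不動如山
A player types a description, e.g. "has done too many bad things" (壞事做太多)
Collected descriptions:
  罄竹難書: has done too many bad things
  虎頭蛇尾: lacks perseverance in getting things done
  …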
Folksonomy (Social Tagging)
Also known as social tagging, collaborative tagging, social classification, social indexing.
Folksonomy is the practice and method of collaboratively creating and managing tags to annotate and categorize content. (Wikipedia)
del.icio.us
Tags: Descriptive words applied by users to links. Tags are searchable.
My Tags: Words I’ve used to describe links in a way that makes sense to me.
Tag Cloud
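A tag cloud is commonly drawn by scaling each tag's font size with how often it was applied. The sketch below shows one simple linear scaling; the function name `tag_cloud`, the size range, and the example URLs and tags are all hypothetical, and real sites may use logarithmic or bucketed scaling instead.

```python
from collections import Counter

def tag_cloud(tagged_links, min_size=10, max_size=32):
    """Map each tag to a font size that grows linearly with its frequency."""
    counts = Counter(tag.lower() for tags in tagged_links.values() for tag in tags)
    if not counts:
        return {}
    lo, hi = min(counts.values()), max(counts.values())
    span = (hi - lo) or 1  # avoid division by zero when all counts are equal
    return {tag: min_size + (n - lo) * (max_size - min_size) // span
            for tag, n in counts.items()}

# Hypothetical bookmarks: URL -> tags applied by users.
bookmarks = {
    "http://example.org/a": ["web2.0", "ajax"],
    "http://example.org/b": ["web2.0", "tagging"],
    "http://example.org/c": ["Web2.0", "ajax"],
}
sizes = tag_cloud(bookmarks)
print(sizes)
```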
Semantic Web
Source: Tim Berners-Lee
Using Folksonomy to Help Semantic Web
Top-down Semantic Annotation
  Approach
    Define an ontology first
    Use the ontology to add semantic markups to web resources.
    The semantics is provided by the ontology which is shared among different web agents and applications.
  Problem
    Negotiation
    Evolution (hard to maintain)
    High Barrier (background)
Source: Xian Wu, Lei Zhang, Yong Yu. “Exploring Social Annotations for the Semantic Web”
Using Folksonomy to Help Semantic Web
Bottom-up approach with social tagging
  Advantage
    No common ontology or dictionary is needed
    Easy to access
    Sensitive to information drift
  Disadvantage
    Ambiguity Problem: For example, “XP” can refer to either “Extreme Programming” or “Windows XP”.
    Group Synonymy Problem: two seemingly different annotations may bear the same meaning.
Source: Xian Wu, Lei Zhang, Yong Yu. “Exploring Social Annotations for the Semantic Web”
Or Folksonomy is the Solution?
Ontology is Overrated
  Classification of the web has failed
  Classification itself is filled with bias and error
  Tagging is the solution
Source: http://www.shirky.com/writings/ontology_overrated.html
Academic Data Analysis
Citation index (papers, journals/conferences, authors)
  CiteSeer
  Google Scholar
  Windows Live Academic Search
  PubMed
  arXiv
e-Lib, Lib 2.0: the concept is being added into applications, so search platforms provide open APIs for collecting more data
Users participate and interact with data and people
  Add My Library, Tag. Ex. CiteULike, BibSonomy
  Add Comments, Rating, Recommendation. Ex. TechLens
  Domain Focus Groups. Ex. Botanicus
An Example
Let’s use TechLens as an example to imagine what research on IR/NLP can do.
Authors Readers
Papers
The Terminology
Entities: Alfred V Aho
References: Alfred Aho; AV Aho; Aho, A. V.
Links:
  Alfred Aho, John Hopcroft, Jeffrey Ullman
  AV Aho, BW Kernighan, PJ Weinberger
Entity Groups:
  G1 (Programming Languages)
  G2 (Databases)
  G3 (Algorithms)
Imagine how we can make use of them
Papers
Authors
Readers
Comments
Rating
Reference Extraction
Entity Resolution
New Research Topics
Given those changes, a key emerging challenge for “Data Mining” is dealing with richly structured, heterogeneous datasets and finding the patterns behind them.
Several research efforts focus on these problems, such as
  (Social) Network Analysis
  Link Mining
  PASCAL Ontology Learning Challenge
  …
Society
Nodes: individuals (Authors, Readers)
Links: social relationships (family, work, friendship, belonging to, etc.)
S. Milgram (1967)
Social networks: Many individuals with diverse social interactions between them.
John Guare
Six Degrees of Separation,
Science
source: www.cs.uiuc.edu/~hanj
Communication networks
The Earth is developing an electronic nervous system, a network with diverse nodes and links.
Nodes:
-computers
-routers
-satellites
Links:
-phone lines
-TV cables
-EM waves
Artifacts in TechLens
Nodes:
-Papers
-User IP
-Comments
-Response
-…
Links:
-Relations between artifacts
Communication networks: Many non-identical components with diverse connections between them.
source: www.cs.uiuc.edu/~hanj
Link-based Object Ranking
Perhaps the most well known link mining task is that of link-based object ranking (LBR), which is a primary focus of the link analysis community. The objective of LBR is to exploit the link structure of a graph to order or prioritize the set of objects within the graph.
Example: PageRank
  What paper is most important in this area?
  What journal/conference is most important in this area?
  What topic is important in this area?
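As an illustration of link-based ranking, here is a minimal PageRank by power iteration over a small citation graph. The paper IDs and the `citations` graph are made up for the example, and the damping factor 0.85 and fixed iteration count are conventional choices rather than anything stated in the slides.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Rank nodes of a directed graph given as {source: [targets]}."""
    nodes = set(links) | {t for targets in links.values() for t in targets}
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        new = {v: (1.0 - damping) / n for v in nodes}
        for v in nodes:
            targets = links.get(v, [])
            if targets:
                share = damping * rank[v] / len(targets)
                for t in targets:
                    new[t] += share
            else:
                # Dangling node (cites nothing): spread its rank over all nodes.
                for t in nodes:
                    new[t] += damping * rank[v] / n
        rank = new
    return rank

# Hypothetical citation graph: paper -> papers it cites.
citations = {"A": ["C"], "B": ["C"], "C": [], "D": ["A", "C"]}
ranks = pagerank(citations)
print(max(ranks, key=ranks.get))
```

The same function answers the journal or topic questions if the nodes are journals or topics and the edges are aggregated citations between them.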
Link-based Object Classification/ Link-based Classification (LBC)
Predicting the category of an object based on its attributes and its links and attributes of linked objects
Web: Predict the category of a web page, based on words that occur on the page, links between pages, anchor text, html tags, etc.
Citation: Predict the topic of a paper, based on word occurrence, citations, co-citations
Epidemic: Predict disease type based on characteristics of the people; predict person’s age based on ages of people they have been in contact with and disease type
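A very small link-only version of this idea, for the citation case, is a majority vote over the topics of linked papers. This sketch deliberately ignores the object's own attributes (word occurrence), which a full link-based classifier would combine with the links; the function name, paper IDs, and topic labels are hypothetical.

```python
from collections import Counter

def classify_by_links(node, edges, labels, default=None):
    """Predict a node's category as the most common category among its neighbors."""
    neighbors = [b for a, b in edges if a == node] + [a for a, b in edges if b == node]
    votes = Counter(labels[n] for n in neighbors if n in labels)
    if not votes:
        return default  # no labeled neighbors to learn from
    return votes.most_common(1)[0][0]

# Hypothetical citation edges (citing, cited) and known topics.
edges = [("p4", "p1"), ("p4", "p2"), ("p3", "p4")]
topics = {"p1": "IR", "p2": "IR", "p3": "NLP"}
predicted = classify_by_links("p4", edges, topics)
print(predicted)
```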
Group Detection
Cluster the nodes in the graph into groups that share common characteristics. That is, predict when a set of entities belongs to the same group, based on clustering both object attribute values and link structure.
  Web: identifying communities
  Citation: identifying research communities
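The crudest form of group detection uses link structure alone: nodes that are connected, directly or indirectly, fall into the same group. The sketch below finds connected components with union-find. It is only a baseline under that assumption, since it ignores attribute values and cannot split a densely linked graph into communities; the edge list is invented for illustration.

```python
def detect_groups(edges):
    """Group nodes into connected components using union-find."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in edges:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    groups = {}
    for node in parent:
        groups.setdefault(find(node), set()).add(node)
    return sorted(sorted(group) for group in groups.values())

# Hypothetical co-citation links between papers.
groups = detect_groups([("p1", "p2"), ("p2", "p3"), ("p4", "p5")])
print(groups)
```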
Entity Resolution
Predicting when two objects are the same, based on their attributes and their links
  Web: predict when two sites are mirrors of each other
  Citation: predicting when two citations are referring to the same paper
  Epidemics: predicting when two disease strains are the same
  Biology: learning when two names refer to the same protein
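For author references such as the name variants on the Terminology slide, an attribute-only baseline is to normalize each reference to a key and group references sharing that key. The key chosen here (surname plus first initial) and the function names are assumptions for illustration; the approach over-merges distinct people with the same surname and initial, which is exactly where link evidence such as shared co-authors would be needed.

```python
from collections import defaultdict

def name_key(reference):
    """Reduce an author reference to (surname, first initial)."""
    ref = reference.strip()
    if "," in ref:                      # "Aho, A. V." style
        surname, given = (part.strip() for part in ref.split(",", 1))
    else:                               # "Alfred V Aho" or "AV Aho" style
        parts = ref.split()
        surname, given = parts[-1], " ".join(parts[:-1])
    return surname.lower(), given[:1].upper()

def resolve(references):
    """Group references that share the same normalized key."""
    groups = defaultdict(list)
    for ref in references:
        groups[name_key(ref)].append(ref)
    return list(groups.values())

resolved = resolve(["Alfred V Aho", "Alfred Aho", "AV Aho", "Aho, A. V.", "Jeffrey Ullman"])
print(resolved)
```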
Link Prediction
Predict whether a link exists between two entities, based on attributes and other observed links
  Web: predict if there will be a link between two pages
  Citation: predicting if a paper will cite another paper, or predict the venue type of a publication (conference, journal, workshop) based on properties of the paper
  Epidemics: predicting who a patient’s contacts are (in epidemiology one needs to trace the origin of the disease and the source of infection)
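One simple link-only heuristic scores each unlinked pair by the Jaccard coefficient of their neighbor sets: pairs that share many neighbors are the likeliest future links. The sketch applies it to the co-author links listed on the Terminology slide. The function name and `top_k` parameter are hypothetical, and this heuristic is one of several common baselines, not a method attributed to the slides.

```python
from itertools import combinations

def predict_links(edges, top_k=3):
    """Rank unlinked node pairs by Jaccard similarity of their neighbor sets."""
    neighbors = {}
    for a, b in edges:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    scored = []
    for u, v in combinations(sorted(neighbors), 2):
        if v in neighbors[u]:
            continue  # already linked
        shared = neighbors[u] & neighbors[v]
        if shared:
            score = len(shared) / len(neighbors[u] | neighbors[v])
            scored.append((score, u, v))
    scored.sort(key=lambda s: (-s[0], s[1], s[2]))
    return [(u, v, round(score, 3)) for score, u, v in scored[:top_k]]

coauthors = [
    ("Aho", "Hopcroft"), ("Aho", "Ullman"), ("Hopcroft", "Ullman"),
    ("Aho", "Kernighan"), ("Aho", "Weinberger"), ("Kernighan", "Weinberger"),
]
candidates = predict_links(coauthors)
print(candidates)
```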
Other Possible Research Directions
Expert Finding
  e.g. suggesting paper reviewers or conference committee members
Ecological Evolution of Some Research
  e.g. one topic with different solutions over a time period
A domain’s topic distribution
GEO-Info (Geographic Information)
GIS: limited users, limited usage
Google Earth/Map: open for everyone
User participation
  Google Earth Community
  Google Earth Blog
  Ogle Earth
  …
GML
Photo-sharing, User Annotation
Some Research Topics
By now, a lot of information can be combined into Google Earth/Map by KML.
Since such information can be integrated by geocoding, some models become very interesting, such as
  Photo Annotation, Sharing, and Search
  Live information
  Planning
  3D, Flights Animation
  Travel experience, comments
  Transportation information, survival information
  Climate Change
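To make the KML route concrete, the sketch below builds a one-placemark KML document for a geotagged photo using Python's standard `xml.etree.ElementTree`. The function name, the placemark text, and the coordinates are illustrative values, not data from the slides. The namespace is the KML 2.2 one, and KML writes coordinates in longitude,latitude,altitude order.

```python
import xml.etree.ElementTree as ET

KML_NS = "http://www.opengis.net/kml/2.2"

def photo_placemark(name, description, lon, lat):
    """Return a KML document string containing a single Point placemark."""
    ET.register_namespace("", KML_NS)  # serialize with KML as the default namespace
    kml = ET.Element(f"{{{KML_NS}}}kml")
    doc = ET.SubElement(kml, f"{{{KML_NS}}}Document")
    placemark = ET.SubElement(doc, f"{{{KML_NS}}}Placemark")
    ET.SubElement(placemark, f"{{{KML_NS}}}name").text = name
    ET.SubElement(placemark, f"{{{KML_NS}}}description").text = description
    point = ET.SubElement(placemark, f"{{{KML_NS}}}Point")
    # KML coordinate order is longitude, latitude, altitude.
    ET.SubElement(point, f"{{{KML_NS}}}coordinates").text = f"{lon},{lat},0"
    return ET.tostring(kml, encoding="unicode")

# Illustrative values only.
kml_text = photo_placemark("Park photo", "tags: park, spring", 121.5, 25.0)
print(kml_text)
```

Tags and comments attached to the photo can travel in the description element, which is one way user annotations end up searchable alongside the map.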
Some Information bundled with Google Earth/Map (example: Zhongshan Park, 中山公園)
Integrated with YouTube (videos & tags)
Photo sharing (photos & tags)
Some Applications
Integrate more Information on Map
Personal Life Information Integration
GeoDDupe: A Novel Interface for Interactive Entity Resolution in Geospatial Data
Photo link with Map
Source: http://www.panoramio.com
Image-based Rendering (IBR)
IBR relies on a set of two-dimensional images of a scene to generate a three-dimensional model and then render some novel views of this scene.
Web 2.0 enables sharing of photographs on a truly massive scale
Microsoft PhotoSynth
From SIFT to PhotoSynth
Conclusion
Research results can be easily integrated on the Web 2.0 platform
  make restricted-domain research more useful for the public (such as image-based rendering)
Software agent
  Benefit human-based computation
Certain research topics will be easier to tackle, such as personalization in virtual worlds (more data available)
Data becomes more task oriented (e.g. Wikipedia)
More versatile data networks available
Acknowledgement
Thanks to Professor 盧文祥 and Professor 鄭卜壬 for the invitation.
I would also like to thank two Ph.D. students of mine who helped organize the slides: 李政緯, 呂俊宏
Thank You