Web Intelligence
Editors:
Ning Zhong, Knowledge Information Systems Lab., Dept. of Systems and Information Eng., Maebashi Institute of Technology, 460-1 Kamisadori-Cho, Maebashi-City 371-0816, Japan
Jiming Liu, Dept. of Computer Science, Hong Kong Baptist University, Kowloon Tong, Hong Kong
Yiyu Yao, Dept. of Computer Science, University of Regina, Regina, Saskatchewan S4S 0A2, Canada
Library of Congress Cataloging-in-Publication Data applied for
Die Deutsche Bibliothek - CIP-Einheitsaufnahme Bibliographic information published by Die Deutsche Bibliothek Die Deutsche Bibliothek lists this publication in the Deutsche Nationalbibliografie; detailed bibliographic data is available in the Internet at <http://dnb.ddb.de>.
ACM Subject Classification (1998): H.3.5, H.5.3, I.2.11, H.3.3, H.2.8
ISBN 978-3-642-07936-8
ISBN 978-3-662-05320-1 (eBook)
DOI 10.1007/978-3-662-05320-1
This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilm or in any other way, and storage in data banks. Duplication of this publication or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965, in its current version, and permission for use must always be obtained from Springer-Verlag. Violations are liable for prosecution under the German Copyright Law.
© Springer-Verlag Berlin Heidelberg 2003
Originally published by Springer-Verlag Berlin Heidelberg New York in 2003
Softcover reprint of the hardcover 1st edition 2003
The use of designations, trademarks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.
Cover Design: KünkelLopka, Heidelberg
Typesetting: Computer to film by author's data
Printed on acid-free paper 45/3111 5 4 3 2 1 SPIN 11308744
Preface
This book is the first coherently written multi-author monograph on Web Intelligence (WI). It offers a thorough introduction and a systematic overview of the new field. It reflects the current state of the research and development in various areas of WI, as well as theoretical and application aspects of WI and Web-based intelligent information systems and services. It highlights several promising WI topics, which will impact on the development of the ultimate Wisdom Web.
The book contains one introductory paper and 19 survey/research papers. The papers are structured into six parts: Web agents, Web mining and farming, Web information retrieval, Web knowledge management, the infrastructure for Web intelligent systems, and social network intelligence.
We conceived and coined the notion of Web Intelligence in late 1999. Back then, although there was a variety of Web- or Internet-related conferences, journals, and books, none was devoted to the intelligence aspects of Web information systems and services. It was felt that there was a need for a conference, a journal, and/or books for researchers, scientists, and industry practitioners who wanted to publish and exchange ideas on Web Intelligence.
At the 24th Annual International Computer Software and Applications Conference (IEEE COMPSAC) in 2000, we first introduced Web Intelligence. In 2001, the first Web Intelligence conference (WI 2001, http://kis.maebashi-it.ac.jp/wi01/) was successfully held in Maebashi, Japan.
We received a rapid and broad response, as well as kind support, from the research community, industry, and reputable scientific publishers. To meet the strong demand for participation and the growing interest in WI, the Web Intelligence Consortium (WIC) was formed in Spring 2002. The WIC (http://wi-consortium.org/) is an international organization dedicated to promoting world-wide scientific research and industrial development in the era of Web and agent intelligence. The WIC specializes in the development and promotion of new WI-related research and technologies through collaborations with WI research centers throughout the world and organization/individual members, technology showcases at WI conferences and workshops, WIC official book and journal publications, the WIC newsletter, and WIC official releases of new industrial solutions and standards.
In addition to various special issues on WI published or being published by several international journals, including IEEE Computer, a WI-focused scientific journal, Web Intelligence and Agent Systems: An International Journal (WIAS), has been successfully launched as the official journal of the WIC.
This book is recommended by the WIC as the first book on WI research. It is a collaborative effort involving many leading researchers and practitioners who have contributed chapters on their areas of expertise. We wish to express our gratitude to all authors and reviewers for their contributions.
We are very grateful to people who joined or supported the WI-related research activities, and in particular, the WIC Advisory Board members: Edward A. Feigenbaum, Setsuo Ohsuga, Benjamin Wah, Philip Yu, and Lotfi A. Zadeh. We thank them for their strong support.
Last, but not least, we thank Alfred Hofmann and Ralf Gerstner of Springer-Verlag for their help in coordinating the publication of this monograph and editorial assistance.
January 2003
Ning Zhong, Maebashi, Japan
Jiming Liu, Hong Kong
Yiyu Yao, Regina, Canada
Table of Contents
1. Web Intelligence (WI): A New Paradigm for Developing the Wisdom Web and Social Network Intelligence
Ning Zhong, Jiming Liu, and Yiyu Yao
1.1 Introduction
1.2 The Wisdom Web
1.2.1 A Minimalist Wisdom Web Scenario
1.2.2 Fundamental Capabilities of the Wisdom Web
1.3 Levels of WI vs. Social Intelligence
1.3.1 Levels of WI vs. Social Intelligence
1.3.2 Social Network Intelligence for Enterprise Portals
1.4 Extensional Description of WI
1.5 An Overview of This Book
1.6 Conclusions
References
Part I. Web Agents
2. Agent-Based Characterization of Web Regularities
Jiming Liu, Shiwu Zhang, and Yiming Ye
2.1 Introduction
2.1.1 Empirical Regularities on the World-Wide Web
2.1.2 Regularity Characterization
2.2 Problem Statements
2.3 An Overview of Foraging Agent-Based Web Characterization
2.4 Foraging Agents
2.4.1 Artificial Web Space
2.4.2 Interests of Foraging Agents
2.4.3 Motivational Support Aggregation
2.4.4 Characterization of Foraging Decisions
2.4.5 Motivational Support
2.5 An Outline of the Algorithm
2.6 Experiments
2.6.1 A Comparison with Real-World Log Data
2.6.2 Further Remarks
2.7 Conclusions
References
3. Agent-Based Composite Services in DAML-S: the Behavior-Oriented Design of an Intelligent Semantic Web
Joanna J. Bryson, David Martin, Sheila A. McIlraith, and Lynn Andrea Stein
3.1 Introduction: Intelligence and the Semantic Web
3.2 Definitions: Agents and Services
3.3 Bringing Services onto the Semantic Web
3.3.1 DAML-S Processes
3.4 Semantic Web Development and Software Agent Architecture
3.4.1 Modularity
3.4.2 Action Selection
3.5 Web Services as Agent Behavior
3.5.1 Services as Behavior Modules
3.5.2 Composite Services as Action Selection
3.5.3 Program, Agent, or Multi-Agent System?
3.6 Implications for DAML-S
3.6.1 Data
3.6.2 Primitives
3.6.3 Sequences
3.6.4 Basic Reactive Plans
3.6.5 Agent-Level Control
3.7 Conclusions
3.8 Appendix A - Basic Reactive Plans
References
4. Designing Scenarios for Social Agents
Toru Ishida and Hideyuki Nakanishi
4.1 Introduction
4.2 Describing Scenarios
4.2.1 Overview
4.2.2 Cue and Action
4.2.3 Guarded Command
4.2.4 Scenario
4.2.5 Agent
4.3 Q for Legacy Agents
4.3.1 Microsoft Agents
4.3.2 FreeWalk Agents
4.4 Designing Scenarios
4.4.1 Q Architecture
4.4.2 Design Process
4.5 Applying Scenarios
4.5.1 Crisis-Management Simulation
4.5.2 Social Psychological Study of Agents
4.6 Conclusions
References
5. Using Agent Technology to Improve the Quality of Web-Based Education
W. Lewis Johnson
5.1 Introduction
5.2 Why Guidebots?
5.3 The Generic ADE Architecture
5.4 Hypothesis-Based Reasoning
5.4.1 Selecting the Next Evidence-Gathering Step
5.4.2 Modeling the Student's Knowledge
5.4.3 The Student Guidebot Dialogue
5.4.4 Student Evaluations
5.5 Generalizing the Work
5.5.1 Virtual Presenters
5.5.2 Infrastructure for Simulation Management and Automated Assessment
5.6 Future Challenges
5.7 Conclusions
References
Part II. Web Mining and Farming
6. Discovering Business Intelligence Information by Comparing Company Web Sites
Bing Liu, Yiming Ma, and Philip S. Yu
6.1 Introduction
6.1.1 Interestingness Measures
6.1.2 Summary of the Proposed Approach
6.2 Vector Space Representation and Association Rule Mining
6.2.1 Vector Space Representation of Text Documents
6.2.2 Finding Concepts Using Association Rule Mining
6.3 Proposed Techniques
6.3.1 Comparing Two Web Sites
6.3.2 Incorporating the User's Existing Knowledge
6.4 System Architecture
6.5 A Running Example
6.6 Evaluation
6.6.1 Application Experiences
6.6.2 Efficiency
6.7 Related Work
6.8 Conclusions
References
7. Discovery of Indirect Associations from Web Usage Data
Pang-Ning Tan and Vipin Kumar
7.1 Introduction
7.1.1 Related Work
7.2 Preliminaries
7.2.1 Definition
7.2.2 Non-Sequential Indirect Association
7.2.3 Sequential Indirect Association
7.3 Implementation
7.3.1 The INDIRECT Algorithm
7.3.2 Combining Indirect Associations
7.4 Experimental Evaluation
7.4.1 Non-Sequential Indirect Association
7.4.2 Sequential Indirect Association
7.4.3 Performance
7.4.4 Threshold Selection
7.5 Conclusions
References
8. Knowledge-Based Wrapper Induction for Intelligent Web Information Extraction
Jaeyoung Yang and Joongmin Choi
8.1 Introduction
8.1.1 Classification of Wrapper Generation
8.1.2 Our Approach
8.2 XTROS System Overview
8.3 Domain Knowledge Specification by XML
8.4 Knowledge-Based Wrapper Generation
8.4.1 Converting HTML Sources into Logical Lines
8.4.2 Determining the Meaning of Logical Lines
8.4.3 Finding the Most Frequent Pattern
8.4.4 Constructing an XML-Based Wrapper
8.4.5 Interpreting the Wrapper
8.5 Implementation and Evaluation
8.6 Conclusions
References
9. Web Log Mining
Zhiyong Lu, Yiyu Yao, and Ning Zhong
9.1 Introduction
9.2 Overview of Web Mining
9.2.1 Classification of Web Mining
9.2.2 Web Content Mining
9.2.3 Web Structure Mining
9.2.4 Web Usage/Log Mining
9.2.5 Combinations of Web Content, Structure, and Usage Mining
9.3 Data Preparation
9.3.1 Data Collection
9.3.2 Data Preprocessing
9.3.3 Data Abstraction
9.4 Data Mining and Pattern Analysis
9.4.1 Statistical Information
9.4.2 Association Rules
9.4.3 Classification and Clustering
9.4.4 Sequential Patterns
9.4.5 Dependency Modeling
9.4.6 Data Warehousing and OLAP
9.4.7 Pattern and Rule Evaluation
9.5 Applications
9.5.1 Web Pre-fetching and Caching
9.5.2 Improved Website Design and Organization
9.5.3 Web Personalization and Recommendation
9.5.4 Adaptive Websites and Pages
9.5.5 Intelligent Web Agents
9.6 Conclusions
References
Part III. Web Information Retrieval
10. Personalized and Focused Web Spiders
Michael Chau and Hsinchun Chen
10.1 Introduction
10.1.1 Web Spider Research
10.1.2 Applications of Web Spiders
10.1.3 Analysis of Web Content and Structure
10.1.4 Graph Traversal Algorithms
10.2 Web Spiders for Personal Search
10.2.1 Personal Web Spiders
10.2.2 Case Study
10.3 Using Web Spiders to Create Specialized Search Engines
10.3.1 Specialized Search Engines
10.3.2 Focused Spidering Algorithms for Specialized Search Engines
10.3.3 Case Study
10.4 Conclusions
10.5 Appendix A: URLs of Spiders and Search Engines
References
11. Exploiting the Web as Parallel Corpora for Cross-Language Information Retrieval
Jian-Yun Nie and Jiang Chen
11.1 Introduction
11.1.1 Query Translation
11.1.2 The Need for Parallel Corpora
11.2 Mining for Parallel Texts - PTMiner
11.2.1 General Principle of Automatic Mining
11.2.2 Identification of Candidate Websites
11.2.3 File Name Fetching
11.2.4 Host Crawling
11.2.5 Pair Scan by Names
11.2.6 Filtering by Content
11.2.7 PTMiner Implementation
11.2.8 Generated Corpora
11.3 Training Statistical Translation Models on Parallel Corpora
11.3.1 Sentence Alignment
11.3.2 Processing of Words
11.3.3 Model Training
11.4 Evaluation of the Translation Models
11.5 CLIR Experiments
11.5.1 English-French CLIR
11.5.2 English-Chinese CLIR
11.5.3 Discussions
11.6 Conclusions
References
Part IV. Web Knowledge Management
12. Knowledge Representation, Sharing, and Retrieval on the Web
Philippe Martin
12.1 Introduction
12.2 Elements and Landmarks of Knowledge Representation and Sharing on the Web
12.2.1 Exchange Formats and Programming Interfaces
12.2.2 Ontologies and Knowledge Bases
12.2.3 Ontology Servers
12.2.4 Knowledge Within Web Documents
12.3 Requirements for a Viable Semantic Web
12.3.1 Need for a Standard Library of Ontological Primitives
12.3.2 Need for Expressive Notations
12.3.3 Need for High-Level (and Expressive) Notations
12.3.4 Need for Lexical/Structural/Ontological Conventions
12.3.5 Need for Flexible Ways to Refer to a Category
12.3.6 Need for a Shared Natural Language Ontology
12.3.7 Need for More Centralization
12.4 Mechanisms for Cooperatively Editing a Shared KB
12.4.1 Control on Graph Additions
12.5 Search Interfaces and Mechanisms
12.5.1 Searching Categories and Links
12.5.2 Accessing or Adding Graphs Via Generated Interfaces
12.5.3 Mechanisms for Searching Graphs
12.6 Conclusions
References
13. On-To-Knowledge: Semantic Web-Enabled Knowledge Management
York Sure, Hans Akkermans, Jeen Broekstra, John Davies, Ying Ding, Alistair Duke, Robert Engels, Dieter Fensel, Ian Horrocks, Victor Iosif, Arjohn Kampman, Atanas Kiryakov, Michel Klein, Thorsten Lau, Damyan Ognyanov, Ulrich Reimer, Kiril Simov, Rudi Studer, Jos van der Meer, and Frank van Harmelen
13.1 Introduction
13.2 Tool Environment for Ontology-Based Knowledge Management
13.2.1 RDFferret: Full Text Searching plus RDF Querying
13.2.2 OntoShare: Community Support
13.2.3 Spectacle: Information Presentation
13.2.4 OntoEdit: Ontology Development
13.2.5 Ontology Middleware Module: Integration Platform
13.2.6 OntoView: Change Management for Ontologies
13.2.7 Sesame: Repository for Ontologies and Data
13.2.8 CORPORUM: Information Extraction
13.3 OIL: Inference Layer for the Semantic World-Wide Web
13.3.1 Combining Description Logics with Frame Languages
13.3.2 Web Interface
13.3.3 Layering
13.3.4 Current Status
13.3.5 Future Developments
13.4 Business Applications in Semantic Information Access
13.4.1 On-To-Knowledge Methodology
13.4.2 Information Search
13.4.3 Skills Management
13.4.4 Exchanging Knowledge in a Virtual Organization
13.5 Conclusions
References
14. Ontology Learning Part One - on Discovering Taxonomic Relations from the Web
Alexander Maedche, Viktor Pekar, and Steffen Staab
14.1 Introduction
14.2 Survey of Symbolic Approaches
14.2.1 Extraction of Taxonomic Relations
14.2.2 Refinement of Taxonomic Relations
14.3 Survey of Statistics-Based Approaches
14.3.1 Statistics-Based Extraction of Taxonomic Relations
14.3.2 Statistics-Based Refinement of Taxonomic Relations
14.4 Making Use of the Structure of the Ontology
14.4.1 Tree Descending Algorithm
14.4.2 Tree Ascending Algorithm
14.5 Data and Settings of the Experiments
14.6 Evaluation Method
14.7 Results
14.8 Conclusions
References
Part V. Infrastructure for Web Intelligent Systems
15. Algorithmic Aspects of Web Intelligent Systems
Dimitrios Kalles, Athanasios Papagelis, and Christos Zaroliagis
15.1 Introduction
15.2 An Overview of the System
15.2.1 User Interface
15.2.2 Performance
15.2.3 Users and Authentication Techniques
15.2.4 Agent's Inference Engine
15.3 Algorithms
15.3.1 Data Characteristics and Generic Handling Techniques
15.3.2 Choosing the Next Document
15.3.3 Finding Interesting Object Collections and Predicting Votes by Matching Users
15.3.4 Finding an Interesting Documents Collection and Predicting Votes Using Naïve Bayes Analysis
15.3.5 Matching Related Documents
15.4 Conclusions
References
16. Web Document Prefetching on the Internet
Xin Chen and Xiaodong Zhang
16.1 Introduction: Prefetching at Different Stages
16.1.1 DNS Prefetching
16.1.2 TCP Connection Prefetching
16.1.3 Content Prefetching
16.2 Conditions of Content Prefetching
16.2.1 History Information for Prefetching
16.2.2 Expiration Time
16.2.3 Time for Prefetching
16.3 Classifying Prefetching Methods
16.3.1 Client-Based Prefetching
16.3.2 Proxy-Based Prefetching
16.3.3 Server-Based Prefetching
16.3.4 Cooperative Prefetching
16.4 Prefetching Structure and Optimization
16.4.1 PPM
16.4.2 Longest Repeating Subsequence
16.4.3 Popularity-Based PPM
16.5 Performance Evaluations on Prefetching
16.5.1 Latency Reduction Bounds
16.5.2 Prefetching Effects on Networks
16.5.3 Tradeoff Analysis
16.6 Other Variants of Prefetching
16.6.1 Real-Time Prediction
16.6.2 Prefetching for Multimedia on the Internet
16.6.3 Predict HTTP Requests for Dynamic Content
16.7 Related Applications
16.7.1 Search Engines
16.7.2 Recommender Systems
16.8 Conclusions
References
Part VI. Social Network Intelligence
17. Social Networks: From the Web to Knowledge Management
Ravi Kumar, Prabhakar Raghavan, Sridhar Rajagopalan, and Andrew Tomkins .... 367
17.1 Introduction .... 367
17.2 Link Analysis of the Web .... 368
17.3 Communities on the Web .... 370
17.4 Connectivity and the Diameter of the Web .... 371
17.5 Fractal Nature of the Web .... 374
17.6 Social Networks for Knowledge Management .... 375
    17.6.1 Enterprise Knowledge Management .... 377
17.7 Conclusions .... 378
References .... 378

18. A Ranking Algorithm Based on Graph Topology to Generate Reputation or Relevance
Josep M. Pujol, Ramon Sangüesa, and Jordi Delgado .... 380
18.1 Introduction .... 380
18.2 Social Networks .... 381
18.3 Ranking Algorithms .... 383
    18.3.1 Overview of Pagerank .... 383
    18.3.2 Overview of HITS .... 384
    18.3.3 Our Proposal: the NodeRanking Algorithm .... 385
    18.3.4 Comparisons .... 387
18.4 Experiments About Ranking, Reputation, and Relevance .... 388
    18.4.1 Extracting Reputation from Social Networks .... 388
    18.4.2 Extracting Relevance from the Web .... 391
18.5 Conclusions .... 392
References .... 393

19. Communityware That Facilitates Knowledge Interactions
Yasuyuki Sumi and Kenji Mase .... 395
19.1 Introduction .... 395
19.2 PalmGuide: Personal Tour Assistant .... 398
19.3 Semantic Map: Visual Explorer of Community Information .... 401
19.4 AgentSalon: Facilitating Face-to-Face Conversations .... 403
19.5 Experiments and Evaluation .... 410
19.6 Related Work .... 415
19.7 Conclusions .... 416
References .... 417

20. Social Intelligence Design for Web Intelligence
Toyoaki Nishida .... 419
20.1 Introduction: Social Intelligence Design for Web Intelligence .... 419
20.2 Overview of Social Intelligence Design .... 421
    20.2.1 Groups and Communities .... 421
    20.2.2 Issues of Social Intelligence Design .... 422
    20.2.3 Applications of Social Intelligence Design .... 424
20.3 The Traveling Conversation Model .... 425
20.4 A Broadcast-Based Approach .... 425
20.5 A Conversational Agent-Based Approach .... 429
20.6 A Smart Environment-Based Approach .... 431
20.7 Psychological Evaluation .... 433
20.8 Technical Issues .... 435
20.9 Conclusions .... 435
References .... 436
Author Index ................................................... 438
Subject Index ................................................... 439
List of Contributors
Hans Akkermans Free University Amsterdam Division of Math. & Computer Science De Boelelaan 1081a NL-1081 HV Amsterdam The Netherlands HansAkkermans@cs.vu.nl
Jeen Broekstra Free University Amsterdam Faculty of Sciences Division of Math. & Computer Science De Boelelaan 1081a NL-1081 HV Amsterdam The Netherlands jbroeks@cs.vu.nl
Joanna J. Bryson Dept. of Computer Science University of Bath BATH BA2 7AY, UK J.J.Bryson@cs.bath.ac.uk
Michael Chau Dept. of Management Information Sys. University of Arizona Tucson, AZ 85721, USA mchau@bpa.arizona.edu
Hsinchun Chen Dept. of Management Information Sys. University of Arizona Tucson, AZ 85721, USA hchen@bpa.arizona.edu
Jiang Chen DIRO, Université de Montréal CP. 6128, succursale Centre-Ville Montreal, Quebec, H3C 3J7 Canada chen@iro.umontreal.ca
Xin Chen College of William and Mary Williamsburg, VA 23185, USA xinchen@cs.wm.edu
Joongmin Choi Dept. of Computer Science & Eng. Hanyang University 1271 Sa-1 Dong Ansan Kyunggi-Do 425-791, Korea jmchoi@cse.hanyang.ac.kr
John Davies British Telecommunications plc. BT Adastral Park Martlesham Heath Ipswich IP5 3RE, UK [email protected]
Jordi Delgado Software Department Technical University of Catalonia C/Jordi Girona 1-3 C6-105 08034 Barcelona, Spain jdelgado@lsi.upc.es
Ying Ding Free University Amsterdam Faculty of Sciences Division of Math. & Computer Science De Boelelaan 1081a NL-1081 HV Amsterdam The Netherlands [email protected]
Alistair Duke British Telecommunications plc. BT Adastral Park Martlesham Heath Ipswich IP5 3RE, UK alistair.duke@bt.com
Robert Engels CognIT a.s. Busterudgt 1 N-1754 Halden, Norway [email protected]
Dieter Fensel Institute for Computer Science University of Innsbruck Technikerstrasse 25 6020 Innsbruck, Austria dieter.fensel@uibk.ac.at
Frank van Harmelen Free University Amsterdam Faculty of Sciences Division of Math. & Computer Science De Boelelaan 1081a NL-1081 HV Amsterdam The Netherlands Frank.van.Harmelen@cs.vu.nl
Ian Horrocks Department of Computer Science University of Manchester Oxford Road Manchester M13 9PL, UK [email protected]
Victor Iosif EnerSearch AB SE 205 09 Malmo Sweden victor.iosif@fek.lu.se
Toru Ishida Dept. of Social Informatics Kyoto University Yoshida-Honmachi, Sakyo Kyoto 606-8501, Japan ishida@i.kyoto-u.ac.jp
W. Lewis Johnson CARTE USC/Information Sciences Institute Marina del Rey, CA 90292, USA johnson@isi.edu
Dimitrios Kalles AHEAD Relationship Mediators 65 Oth.-Amalias St. 26221 Patras, Greece and Dept. of Computer Eng. & Informatics University of Patras 26500 Patras, Greece kalles@aheadrm.com
Arjohn Kampman AIdministrator Nederland BV Juliaplein 14B 3817CS Amersfoort The Netherlands akam@aidministrator.nl
Atanas Kiryakov OntoText Lab. Sirma AI Ltd. 38A Hristo Botev blvd. Sofia 1000, Bulgaria naso@sirma.bg
Michel Klein Free University Amsterdam Faculty of Sciences Division of Math. & Computer Science De Boelelaan 1081a NL-1081 HV Amsterdam The Netherlands Michel.Klein@cs.vu.nl
Ravi Kumar IBM Almaden Research Center 650 Harry Road, San Jose CA 95120 USA ravi@almaden.ibm.com
Vipin Kumar Dept. of Computer Science & Eng. University of Minnesota 4-192 EE/CSci Building 200 Union Street SE Minneapolis MN 55455, USA [email protected]
Thorsten Lau Rentenanstalt/Swiss Life IT Coordination (CC/ITC) P.O. Box CH-8022, Zurich Switzerland Thorsten.Lau@gmx.net
Bing Liu Department of Computer Science University of Illinois at Chicago 851 South Morgan Street Chicago, IL 60607-7053 USA [email protected]
Jiming Liu Department of Computer Science Hong Kong Baptist University Kowloon Tong, Hong Kong jiming@comp.hkbu.edu.hk
Zhiyong Lu Department of Computer Science University of Regina Regina, Saskatchewan S4S 0A2 Canada luzhiyzh@cs.uregina.ca
Yiming Ma Information and Computer Science University of California at Irvine Irvine, CA 92697-3425, USA [email protected]
Alexander Maedche FZI, University of Karlsruhe Haid-und-Neu-Str. 10-14 76131 Karlsruhe, Germany [email protected]
David Martin Artificial Intelligence Center SRI International Menlo Park, CA 94025, USA martin@ai.sri.com
Philippe Martin Distributed System Technology Centre Brisbane, 4072 Australia philippe.martin@gu.edu.au
Kenji Mase ATR Media Information Science Labs. Seika-cho, Soraku-gun, Kyoto 619-0288 Japan [email protected]
Sheila A. McIlraith Stanford University Knowledge Systems Lab Stanford, CA 94305 USA sam@ksl.stanford.edu
Jos van der Meer AIdministrator Nederland BV Juliaplein 14B 3817CS Amersfoort The Netherlands Jos.van.der.Meer@AIdministrator.nl
Hideyuki Nakanishi Dept. of Social Informatics Kyoto University Yoshida-Honmachi, Sakyo Kyoto 606-8501, Japan [email protected]
Jian-Yun Nie DIRO, Université de Montréal CP. 6128, succursale Centre-Ville Montreal, Quebec, H3C 3J7 Canada nie@iro.umontreal.ca
Toyoaki Nishida Dept. of Infor. & Communication Eng. University of Tokyo 7-3-1 Hongo, Bunkyo-ku Tokyo 113-8656, Japan [email protected]
Damyan Ognyanov OntoText Lab. Sirma AI Ltd. 38A Hristo Botev blvd. Sofia 1000, Bulgaria [email protected]
Athanasios Papagelis AHEAD Relationship Mediators 65 Oth.-Amalias St. 26221 Patras, Greece Dept of Computer Eng. & Informatics University of Patras 26500 Patras, Greece papagel@ceid.upatras.gr
Viktor Pekar Bashkir State University Okt.Revolutsii 3a, Ufa 450000 Russia [email protected]
Josep M. Pujol Software Department Technical University of Catalonia C/Jordi Girona 1-3 C5-221 08034 Barcelona Spain jmpujol@lsi.upc.es
Prabhakar Raghavan Verity, Inc. 892 Ross Drive, Sunnyvale CA 94089, USA pragh@verity.com
Sridhar Rajagopalan IBM Almaden Research Center 650 Harry Road, San Jose CA 95120, USA sridhar@almaden.ibm.com
Ulrich Reimer Rentenanstalt/Swiss Life IT Coordination (CC/ITC) P.O. Box CH-8022, Zurich Switzerland ulrich.reimer@acm.org
Ramon Sangüesa Software Department Technical University of Catalonia C/Jordi Girona 1-3 C6-204 08034 Barcelona Spain sanguesa@lsi.upc.es
Kiril Simov OntoText Lab. Sirma AI Ltd. 38A Hristo Botev blvd., Sofia 1000 Bulgaria kivs@bultreebank.org
Steffen Staab Institute AIFB University of Karlsruhe Postfach, 76128 Karlsruhe Germany staab@aifb.uni-karlsruhe.de
Lynn Andrea Stein Computers and Cognition Lab Franklin W. Olin College of Eng. Needham, MA 02492, USA [email protected]
Rudi Studer Institute AIFB University of Karlsruhe Postfach, 76128 Karlsruhe Germany studer@aifb.uni-karlsruhe.de
Yasuyuki Sumi ATR Media Information Science Labs. Seika-cho, Soraku-gun, Kyoto 619-0288 Japan sumi@atr.co.jp
York Sure Institute AIFB University of Karlsruhe Postfach, 76128 Karlsruhe Germany [email protected]
Pang-Ning Tan AHPCRC, University of Minnesota 1100 Washington Avenue S, #101 Minneapolis, MN 55415, USA ptan@cs.umn.edu
Andrew Tomkins IBM Almaden Research Center 650 Harry Road, San Jose CA 95120, USA tomkins@almaden.ibm.com
Jaeyoung Yang Dept. of Computer Science and Eng. Hanyang University 1271 Sa-1 Dong Ansan, Kyunggi-Do 425-791 Korea jyyang@cse.hanyang.ac.kr
Yiyu Yao Department of Computer Science University of Regina Regina, Saskatchewan, S4S 0A2 Canada [email protected]
Yiming Ye IBM T. J. Watson Research Center P.O. Box 704 Yorktown Heights, NY 10598 USA [email protected]
Philip S. Yu IBM T.J. Watson Research Center P.O. Box 704 Yorktown Heights, NY 10598 USA psyu@us.ibm.com
Christos Zaroliagis Computer Technology Institute, P.O. Box 1122, 26110 Patras, Greece and Dept of Computer Eng. & Informatics University of Patras 26500 Patras, Greece zaro@ceid.upatras.gr
Shiwu Zhang Department of Computer Science Hong Kong Baptist University Kowloon Tong, Hong Kong [email protected]
Xiaodong Zhang College of William and Mary Williamsburg, VA 23185, USA zhang@cs.wm.edu
Ning Zhong Dept. of Systems & Information Eng. Maebashi Institute of Technology 460-1 Kamisadori-Cho Maebashi-City 371-0816, Japan [email protected]