Web Intelligence

Springer-Verlag Berlin Heidelberg GmbH

Ning Zhong Jiming Liu Yiyu Yao (Eds.)

Web Intelligence With 126 Figures and 42 Tables

Springer

Editors:

Ning Zhong, Knowledge Information Systems Lab., Dept. of Systems and Information Eng., Maebashi Institute of Technology, 460-1 Kamisadori-Cho, Maebashi-City 371-0816, Japan

Jiming Liu, Dept. of Computer Science, Hong Kong Baptist University, Kowloon Tong, Hong Kong

Yiyu Yao, Dept. of Computer Science, University of Regina, Regina, Saskatchewan S4S 0A2, Canada

Library of Congress Cataloging-in-Publication Data applied for

Die Deutsche Bibliothek - CIP-Einheitsaufnahme
Bibliographic information published by Die Deutsche Bibliothek
Die Deutsche Bibliothek lists this publication in the Deutsche Nationalbibliografie; detailed bibliographic data is available in the Internet at <http://dnb.ddb.de>.

ACM Subject Classification (1998): H.3.5, H.5.3, I.2.11, H.3.3, H.2.8

ISBN 978-3-642-07936-8 ISBN 978-3-662-05320-1 (eBook) DOI 10.1007/978-3-662-05320-1

This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilm or in any other way, and storage in data banks. Duplication of this publication or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965, in its current version, and permission for use must always be obtained from Springer-Verlag. Violations are liable for prosecution under the German Copyright Law.

© Springer-Verlag Berlin Heidelberg 2003
Originally published by Springer-Verlag Berlin Heidelberg New York in 2003
Softcover reprint of the hardcover 1st edition 2003

The use of designations, trademarks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

Cover Design: KünkelLopka, Heidelberg
Typesetting: Computer to film by author's data
Printed on acid-free paper 45/3111 5 4 3 2 1 SPIN 11308744

Preface

This book is the first coherently written multi-author monograph on Web Intelligence (WI). It offers a thorough introduction and a systematic overview of the new field. It reflects the current state of the research and development in various areas of WI, as well as theoretical and application aspects of WI and Web-based intelligent information systems and services. It highlights several promising WI topics, which will impact on the development of the ultimate Wisdom Web.

The book contains one introductory paper and 19 survey/research papers. The papers are structured into six parts: Web agents, Web mining and farming, Web information retrieval, Web knowledge management, the infrastructure for Web intelligent systems, and social network intelligence.

We conceived and coined the notion of Web Intelligence in late 1999. Back then, although there was a variety of Web- or Internet-related conferences, journals, and books, none was devoted to the intelligence aspects of Web information systems and services. It was felt that there was a need for a conference, a journal, and/or books for researchers, scientists, and industry practitioners who wanted to publish and exchange ideas on Web Intelligence.

At the 24th Annual International Computer Software and Applications Conference (IEEE COMPSAC) in 2000, we first introduced Web Intelligence. In 2001, the first Web Intelligence conference (WI 2001, http://kis.maebashi-it.ac.jp/wi01/) was successfully held in Maebashi, Japan.

We received quick and vast responses, as well as kind support, from the research community, industry, and reputable scientific publishers. To meet the strong demand for participation and the growing interest in WI, the Web Intelligence Consortium (WIC) was formed in Spring 2002. The WIC (http://wi-consortium.org/) is an international organization dedicated to promoting world-wide scientific research and industrial development in the era of Web and agent intelligence. The WIC specializes in the development and promotion of new WI-related research and technologies through collaborations with WI research centers throughout the world and organization/individual members, technology showcases at WI conferences and workshops, WIC official book and journal publications, the WIC newsletter, and WIC official releases of new industrial solutions and standards.

In addition to various special issues on WI published or being published by several international journals, including IEEE Computer, a WI-focused scientific journal, Web Intelligence and Agent Systems: An International Journal (WIAS), has been successfully launched as the official journal of the WIC.

This book is recommended by the WIC as the first book on WI research. It is a collaborative effort involving many leading researchers and practitioners who have contributed chapters on their areas of expertise. We wish to express our gratitude to all authors and reviewers for their contributions.

We are very grateful to people who joined or supported the WI-related research activities, and in particular, the WIC Advisory Board members: Edward A. Feigenbaum, Setsuo Ohsuga, Benjamin Wah, Philip Yu, and Lotfi A. Zadeh. We thank them for their strong support.

Last, but not least, we thank Alfred Hofmann and Ralf Gerstner of Springer-Verlag for their help in coordinating the publication of this monograph and editorial assistance.

Maebashi, Japan
Hong Kong
Regina, Canada
January 2003

Ning Zhong
Jiming Liu
Yiyu Yao

Table of Contents

1. Web Intelligence (WI): A New Paradigm for Developing the Wisdom Web and Social Network Intelligence
Ning Zhong, Jiming Liu, and Yiyu Yao

1.1 Introduction
1.2 The Wisdom Web
1.2.1 A Minimalist Wisdom Web Scenario
1.2.2 Fundamental Capabilities of the Wisdom Web
1.3 Levels of WI vs. Social Intelligence
1.3.1 Levels of WI vs. Social Intelligence
1.3.2 Social Network Intelligence for Enterprise Portals
1.4 Extensional Description of WI
1.5 An Overview of This Book
1.6 Conclusions
References

Part I. Web Agents

2. Agent-Based Characterization of Web Regularities
Jiming Liu, Shiwu Zhang, and Yiming Ye

2.1 Introduction
2.1.1 Empirical Regularities on the World-Wide Web
2.1.2 Regularity Characterization
2.2 Problem Statements
2.3 An Overview of Foraging Agent-Based Web Characterization
2.4 Foraging Agents
2.4.1 Artificial Web Space
2.4.2 Interests of Foraging Agents
2.4.3 Motivational Support Aggregation
2.4.4 Characterization of Foraging Decisions
2.4.5 Motivational Support
2.5 An Outline of the Algorithm
2.6 Experiments
2.6.1 A Comparison with Real-World Log Data
2.6.2 Further Remarks
2.7 Conclusions
References

3. Agent-Based Composite Services in DAML-S: the Behavior-Oriented Design of an Intelligent Semantic Web
Joanna J. Bryson, David Martin, Sheila A. McIlraith, and Lynn Andrea Stein

3.1 Introduction: Intelligence and the Semantic Web
3.2 Definitions: Agents and Services
3.3 Bringing Services onto the Semantic Web
3.3.1 DAML-S Processes
3.4 Semantic Web Development and Software Agent Architecture
3.4.1 Modularity
3.4.2 Action Selection
3.5 Web Services as Agent Behavior
3.5.1 Services as Behavior Modules
3.5.2 Composite Services as Action Selection
3.5.3 Program, Agent, or Multi-Agent System?
3.6 Implications for DAML-S
3.6.1 Data
3.6.2 Primitives
3.6.3 Sequences
3.6.4 Basic Reactive Plans
3.6.5 Agent-Level Control
3.7 Conclusions
3.8 Appendix A - Basic Reactive Plans
References

4. Designing Scenarios for Social Agents
Toru Ishida and Hideyuki Nakanishi

4.1 Introduction
4.2 Describing Scenarios
4.2.1 Overview
4.2.2 Cue and Action
4.2.3 Guarded Command
4.2.4 Scenario
4.2.5 Agent
4.3 Q for Legacy Agents
4.3.1 Microsoft Agents
4.3.2 FreeWalk Agents
4.4 Designing Scenarios
4.4.1 Q Architecture
4.4.2 Design Process
4.5 Applying Scenarios
4.5.1 Crisis-Management Simulation
4.5.2 Social Psychological Study of Agents
4.6 Conclusions
References

5. Using Agent Technology to Improve the Quality of Web-Based Education
W. Lewis Johnson

5.1 Introduction
5.2 Why Guidebots?
5.3 The Generic ADE Architecture
5.4 Hypothesis-Based Reasoning
5.4.1 Selecting the Next Evidence-Gathering Step
5.4.2 Modeling the Student's Knowledge
5.4.3 The Student Guidebot Dialogue
5.4.4 Student Evaluations
5.5 Generalizing the Work
5.5.1 Virtual Presenters
5.5.2 Infrastructure for Simulation Management and Automated Assessment
5.6 Future Challenges
5.7 Conclusions
References

Part II. Web Mining and Farming

6. Discovering Business Intelligence Information by Comparing Company Web Sites
Bing Liu, Yiming Ma, and Philip S. Yu

6.1 Introduction
6.1.1 Interestingness Measures
6.1.2 Summary of the Proposed Approach
6.2 Vector Space Representation and Association Rule Mining
6.2.1 Vector Space Representation of Text Documents
6.2.2 Finding Concepts Using Association Rule Mining
6.3 Proposed Techniques
6.3.1 Comparing Two Web Sites
6.3.2 Incorporating the User's Existing Knowledge
6.4 System Architecture
6.5 A Running Example
6.6 Evaluation
6.6.1 Application experiences
6.6.2 Efficiency
6.7 Related Work
6.8 Conclusions
References

7. Discovery of Indirect Associations from Web Usage Data
Pang-Ning Tan and Vipin Kumar

7.1 Introduction
7.1.1 Related Work
7.2 Preliminaries
7.2.1 Definition
7.2.2 NonSequential Indirect Association
7.2.3 Sequential Indirect Association
7.3 Implementation
7.3.1 The INDIRECT Algorithm
7.3.2 Combining Indirect Associations
7.4 Experimental Evaluation
7.4.1 Non-sequential Indirect Association
7.4.2 Sequential Indirect Association
7.4.3 Performance
7.4.4 Threshold Selection
7.5 Conclusions
References

8. Knowledge-Based Wrapper Induction for Intelligent Web Information Extraction
Jaeyoung Yang and Joongmin Choi

8.1 Introduction
8.1.1 Classification of Wrapper Generation
8.1.2 Our Approach
8.2 XTROS System Overview
8.3 Domain Knowledge Specification by XML
8.4 Knowledge-Based Wrapper Generation
8.4.1 Converting HTML Sources into Logical Lines
8.4.2 Determining the Meaning of Logical Lines
8.4.3 Finding the Most Frequent Pattern
8.4.4 Constructing an XML-Based Wrapper
8.4.5 Interpreting the Wrapper
8.5 Implementation and Evaluation
8.6 Conclusions
References

9. Web Log Mining
Zhiyong Lu, Yiyu Yao, and Ning Zhong

9.1 Introduction
9.2 Overview of Web Mining
9.2.1 Classification of Web Mining
9.2.2 Web Content Mining
9.2.3 Web Structure Mining
9.2.4 Web Usage/Log Mining
9.2.5 Combinations of Web Content, Structure, and Usage Mining
9.3 Data Preparation
9.3.1 Data Collection
9.3.2 Data Preprocessing
9.3.3 Data Abstraction
9.4 Data Mining and Pattern Analysis
9.4.1 Statistical Information
9.4.2 Association Rules
9.4.3 Classification and Clustering
9.4.4 Sequential Patterns
9.4.5 Dependency Modeling
9.4.6 Data Warehousing and OLAP
9.4.7 Pattern and Rule Evaluation
9.5 Applications
9.5.1 Web Pre-fetching and Caching
9.5.2 Improved Website Design and Organization
9.5.3 Web Personalization and Recommendation
9.5.4 Adaptive Websites and Pages
9.5.5 Intelligent Web Agents
9.6 Conclusions
References

Part III. Web Information Retrieval

10. Personalized and Focused Web Spiders
Michael Chau and Hsinchun Chen

10.1 Introduction
10.1.1 Web Spider Research
10.1.2 Applications of Web Spiders
10.1.3 Analysis of Web Content and Structure
10.1.4 Graph Traversal Algorithms
10.2 Web Spiders for Personal Search
10.2.1 Personal Web Spiders
10.2.2 Case Study
10.3 Using Web Spiders to Create Specialized Search Engines
10.3.1 Specialized Search Engines
10.3.2 Focused Spidering Algorithms for Specialized Search Engines
10.3.3 Case Study
10.4 Conclusions
10.5 Appendix A: URLs of Spiders and Search Engines
References

11. Exploiting the Web as Parallel Corpora for Cross-Language Information Retrieval
Jian-Yun Nie and Jiang Chen

11.1 Introduction
11.1.1 Query Translation
11.1.2 The Need for Parallel Corpora
11.2 Mining for Parallel Texts - PTMiner
11.2.1 General Principle of Automatic Mining
11.2.2 Identification of Candidate Websites
11.2.3 File Name Fetching
11.2.4 Host Crawling
11.2.5 Pair Scan by Names
11.2.6 Filtering by Content
11.2.7 PTMiner Implementation
11.2.8 Generated Corpora
11.3 Training Statistical Translation Models on Parallel Corpora
11.3.1 Sentence Alignment
11.3.2 Processing of Words
11.3.3 Model Training
11.4 Evaluation of the Translation Models
11.5 CLIR Experiments
11.5.1 English-French CLIR
11.5.2 English-Chinese CLIR
11.5.3 Discussions
11.6 Conclusions
References

Part IV. Web Knowledge Management

12. Knowledge Representation, Sharing, and Retrieval on the Web
Philippe Martin

12.1 Introduction
12.2 Elements and Landmarks of Knowledge Representation and Sharing on the Web
12.2.1 Exchange Formats and Programming Interfaces
12.2.2 Ontologies and Knowledge Bases
12.2.3 Ontology Servers
12.2.4 Knowledge Within Web Documents
12.3 Requirements for a Viable Semantic Web
12.3.1 Need for a Standard Library of Ontological Primitives
12.3.2 Need for Expressive Notations
12.3.3 Need for High-Level (and Expressive) Notations
12.3.4 Need for Lexical/Structural/Ontological Conventions
12.3.5 Need for Flexible Ways to Refer to a Category
12.3.6 Need for a Shared Natural Language Ontology
12.3.7 Need for More Centralization
12.4 Mechanisms for Cooperatively Editing a Shared KB
12.4.1 Control on Graph Additions
12.5 Search Interfaces and Mechanisms
12.5.1 Searching Categories and Links
12.5.2 Accessing or Adding Graphs Via Generated Interfaces
12.5.3 Mechanisms for Searching Graphs
12.6 Conclusions
References

13. On-To-Knowledge: Semantic Web-Enabled Knowledge Management
York Sure, Hans Akkermans, Jeen Broekstra, John Davies, Ying Ding, Alistair Duke, Robert Engels, Dieter Fensel, Ian Horrocks, Victor Iosif, Arjohn Kampman, Atanas Kiryakov, Michel Klein, Thorsten Lau, Damyan Ognyanov, Ulrich Reimer, Kiril Simov, Rudi Studer, Jos van der Meer, and Frank van Harmelen

13.1 Introduction
13.2 Tool Environment for Ontology-Based Knowledge Management
13.2.1 RDFferret: Full Text Searching plus RDF Querying
13.2.2 OntoShare: Community Support
13.2.3 Spectacle: Information Presentation
13.2.4 OntoEdit: Ontology Development
13.2.5 Ontology Middleware Module: Integration Platform
13.2.6 OntoView: Change Management for Ontologies
13.2.7 Sesame: Repository for Ontologies and Data
13.2.8 CORPORUM: Information Extraction
13.3 OIL: Inference Layer for the Semantic World-Wide Web
13.3.1 Combining Description Logics with Frame Languages
13.3.2 Web Interface
13.3.3 Layering
13.3.4 Current Status
13.3.5 Future Developments
13.4 Business Applications in Semantic Information Access
13.4.1 On-To-Knowledge Methodology
13.4.2 Information Search
13.4.3 Skills Management
13.4.4 Exchanging Knowledge in a Virtual Organization
13.5 Conclusions
References

14. Ontology Learning Part One - on Discovering Taxonomic Relations from the Web
Alexander Maedche, Viktor Pekar, and Steffen Staab

14.1 Introduction
14.2 Survey of Symbolic Approaches
14.2.1 Extraction of Taxonomic Relations
14.2.2 Refinement of Taxonomic Relations
14.3 Survey of Statistics-Based Approaches
14.3.1 Statistics-Based Extraction of Taxonomic Relations
14.3.2 Statistics-Based Refinement of Taxonomic Relations
14.4 Making Use of the Structure of the Ontology
14.4.1 Tree Descending Algorithm
14.4.2 Tree Ascending Algorithm
14.5 Data and Settings of the Experiments
14.6 Evaluation Method
14.7 Results
14.8 Conclusions
References

Part V. Infrastructure for Web Intelligent Systems

15. Algorithmic Aspects of Web Intelligent Systems
Dimitrios Kalles, Athanasios Papagelis, and Christos Zaroliagis

15.1 Introduction
15.2 An Overview of the System
15.2.1 User Interface
15.2.2 Performance
15.2.3 Users and Authentication Techniques
15.2.4 Agent's Inference Engine
15.3 Algorithms
15.3.1 Data Characteristics and Generic Handling Techniques
15.3.2 Choosing the Next Document
15.3.3 Finding Interesting Object Collections and Predicting Votes by Matching Users
15.3.4 Finding an Interesting Documents Collection and Predicting Votes Using Naïve Bayes Analysis
15.3.5 Matching Related Documents
15.4 Conclusions
References

16. Web Document Prefetching on the Internet
Xin Chen and Xiaodong Zhang

16.1 Introduction: Prefetching at Different Stages
16.1.1 DNS Prefetching
16.1.2 TCP Connection Prefetching
16.1.3 Content Prefetching
16.2 Conditions of Content Prefetching
16.2.1 History Information for Prefetching
16.2.2 Expiration Time
16.2.3 Time for Prefetching
16.3 Classifying Prefetching Methods
16.3.1 Client-Based Prefetching
16.3.2 Proxy-Based Prefetching
16.3.3 Server-Based Prefetching
16.3.4 Cooperative Prefetching
16.4 Prefetching Structure and Optimization
16.4.1 PPM
16.4.2 Longest Repeating Subsequence
16.4.3 Popularity-Based PPM
16.5 Performance Evaluations on Prefetching
16.5.1 Latency Reduction Bounds
16.5.2 Prefetching Effects on Networks
16.5.3 Tradeoff Analysis
16.6 Other Variants of Prefetching
16.6.1 Real-Time Prediction
16.6.2 Prefetching for Multimedia on the Internet
16.6.3 Predict HTTP Requests for Dynamic Content
16.7 Related Applications
16.7.1 Search Engines
16.7.2 Recommender Systems
16.8 Conclusions
References

Part VI. Social Network Intelligence

17. Social Networks: From the Web to Knowledge Management
Ravi Kumar, Prabhakar Raghavan, Sridhar Rajagopalan, and Andrew Tomkins

17.1 Introduction

17.2 Link Analysis of the Web
17.3 Communities on the Web
17.4 Connectivity and the Diameter of the Web
17.5 Fractal Nature of the Web
17.6 Social Networks for Knowledge Management
17.6.1 Enterprise Knowledge Management
17.7 Conclusions
References

18. A Ranking Algorithm Based on Graph Topology to Generate Reputation or Relevance
Josep M. Pujol, Ramon Sangüesa, and Jordi Delgado

18.1 Introduction
18.2 Social Networks
18.3 Ranking Algorithms
18.3.1 Overview of Pagerank
18.3.2 Overview of HITS
18.3.3 Our Proposal: the NodeRanking Algorithm
18.3.4 Comparisons
18.4 Experiments About Ranking, Reputation, and Relevance
18.4.1 Extracting Reputation from Social Networks
18.4.2 Extracting Relevance from the Web
18.5 Conclusions
References

19. Communityware That Facilitates Knowledge Interactions
Yasuyuki Sumi and Kenji Mase

19.1 Introduction
19.2 PalmGuide: Personal Tour Assistant
19.3 Semantic Map: Visual Explorer of Community Information
19.4 AgentSalon: Facilitating Face-to-Face Conversations
19.5 Experiments and Evaluation
19.6 Related Work
19.7 Conclusions
References

20. Social Intelligence Design for Web Intelligence
Toyoaki Nishida

20.1 Introduction: Social Intelligence Design for Web Intelligence
20.2 Overview of Social Intelligence Design
20.2.1 Groups and Communities
20.2.2 Issues of Social Intelligence Design
20.2.3 Applications of Social Intelligence Design

20.3 The Traveling Conversation Model
20.4 A Broadcast-Based Approach
20.5 A Conversational Agent-Based Approach
20.6 A Smart Environment-Based Approach
20.7 Psychological Evaluation
20.8 Technical Issues
20.9 Conclusions
References

Author Index

Subject Index

List of Contributors

Hans Akkermans Free University Amsterdam Division of Math. & Computer Science De Boelelaan 1081a NL-1081 HV Amsterdam The Netherlands HansAkkermans@cs.vu.nl

Jeen Broekstra Free University Amsterdam Faculty of Sciences Division of Math. & Computer Science De Boelelaan 1081a NL-1081 HV Amsterdam The Netherlands jbroeks@cs.vu.nl

Joanna J. Bryson Dept. of Computer Science University of Bath BATH BA2 7AY, UK J.J.Bryson@cs.bath.ac.uk

Michael Chau Dept. of Management Information Sys. University of Arizona Tucson, AZ 85721, USA mchau@bpa.arizona.edu

Hsinchun Chen Dept. of Management Information Sys. University of Arizona Tucson, AZ 85721, USA hchen@bpa.arizona.edu

Jiang Chen DIRO, Universite de Montreal CP. 6128, succursale Centre-Ville Montreal, Quebec, H3C 3J7 Canada chen@iro.umontreal.ca

Xin Chen College of William and Mary Williamsburg, VA 23185, USA xinchen@cs.wm.edu

Joongmin Choi Dept. of Computer Science & Eng. Hanyang University 1271 Sa-1 Dong Ansan Kyunggi-Do 425-791, Korea jmchoi@cse.hanyang.ac.kr

John Davies British Telecommunications plc. BT Adastral Park Martlesham Heath Ipswich IP5 3RE, UK [email protected]

Jordi Delgado Software Department Technical University of Catalonia C/Jordi Girona 1-3 C6-105 08034 Barcelona, Spain jdelgado@lsi.upc.es

Ying Ding Free University Amsterdam Faculty of Sciences Division of Math. & Computer Science De Boelelaan 1081a NL-1081 HV Amsterdam The Netherlands [email protected]

Alistair Duke British Telecommunications plc. BT Adastral Park Martlesham Heath Ipswich IP5 3RE, UK alistair.duke@bt.com

Robert Engels CognIT a.s. Busterudgt 1 N-1754 Halden, Norway [email protected]

Dieter Fensel Institute for Computer Science University of Innsbruck Technikerstrasse 25 6020 Innsbruck, Austria dieter.fensel@uibk.ac.at

Frank van Harmelen Free University Amsterdam Faculty of Sciences Division of Math. & Computer Science De Boelelaan 1081a NL-1081 HV Amsterdam The Netherlands Frank.van.Harmelen@cs.vu.nl

Ian Horrocks Department of Computer Science University of Manchester Oxford Road Manchester M13 9PL, UK [email protected]

Victor Iosif EnerSearch AB SE 205 09 Malmo Sweden victor.iosif@fek.lu.se

Toru Ishida Dept. of Social Informatics Kyoto University Yoshida-Honmachi, Sakyo Kyoto 606-8501, Japan ishida@i.kyoto-u.ac.jp

W. Lewis Johnson CARTE USC/Information Sciences Institute Marina del Rey, CA 90292, USA johnson@isi.edu

Dimitrios Kalles AHEAD Relationship Mediators 65 Oth.-Amalias St. 26221 Patras, Greece and Dept. of Computer Eng. & Informatics University of Patras 26500 Patras, Greece kalles@aheadrm.com

Arjohn Kampman Aidministrator Nederland BV Juliaplein 14B 3817CS Amersfoort The Netherlands akam@aidministrator.nl

Atanas Kiryakov OntoText Lab. Sirma AI Ltd. 38A Hristo Botev blvd. Sofia 1000, Bulgaria naso@sirma.bg

Michel Klein Free University Amsterdam Faculty of Sciences Division of Math. & Computer Science De Boelelaan 1081a NL-1081 HV Amsterdam The Netherlands Michel.Klein@cs.vu.nl

Ravi Kumar IBM Almaden Research Center 650 Harry Road, San Jose CA 95120 USA ravi@almaden.ibm.com

Vipin Kumar Dept. of Computer Science & Eng. University of Minnesota 4-192 EE/CSci Building 200 Union Street SE Minneapolis MN 55455, USA [email protected]

Thorsten Lau Rentenanstalt/Swiss Life IT Coordination (CC/ITC) P.O. Box CH-8022, Zurich Switzerland Thorsten.Lau@gmx.net

Bing Liu Department of Computer Science University of Illinois at Chicago 851 South Morgan Street Chicago, IL 60607-7053 USA [email protected]

Jiming Liu Department of Computer Science Hong Kong Baptist University Kowloon Tong, Hong Kong jiming@comp.hkbu.edu.hk

Zhiyong Lu Department of Computer Science University of Regina Regina, Saskatchewan S4S 0A2 Canada luzhiyzh@cs.uregina.ca

Yiming Ma Information and Computer Science University of California at Irvine Irvine, CA 92697-3425, USA [email protected]

Alexander Maedche FZI, University of Karlsruhe Haid-und-Neu-Str. 10-14 76131 Karlsruhe, Germany [email protected]

David Martin Artificial Intelligence Center SRI International Menlo Park, CA 94025, USA martin@ai.sri.com

Philippe Martin Distributed System Technology Centre Brisbane, 4072 Australia philippe.martin@gu.edu.au

Kenji Mase ATR Media Information Science Labs. Seika-cho, Soraku-gun, Kyoto 619-0288 Japan [email protected]

Sheila A. McIlraith Stanford University Knowledge Systems Lab Stanford, CA 94305 USA sam@ksl.stanford.edu

Jos van der Meer Aidministrator Nederland BV Juliaplein 14B 3817CS Amersfoort The Netherlands Jos.van.der.Meer@Aidministrator.nl

Hideyuki Nakanishi Dept. of Social Informatics Kyoto University Yoshida-Honmachi, Sakyo Kyoto 606-8501, Japan [email protected]

Jian-Yun Nie DIRO, Universite de Montreal CP. 6128, succursale Centre-Ville Montreal, Quebec, H3C 3J7 Canada nie@iro.umontreal.ca

Toyoaki Nishida Dept. of Infor. & Communication Eng. University of Tokyo 7-3-1 Hongo, Bunkyo-ku Tokyo 113-8656, Japan [email protected]

Damyan Ognyanov OntoText Lab. Sirma AI Ltd. 38A Hristo Botev blvd. Sofia 1000, Bulgaria [email protected]

Athanasios Papagelis AHEAD Relationship Mediators 65 Oth.-Amalias St. 26221 Patras, Greece Dept of Computer Eng. & Informatics University of Patras 26500 Patras, Greece papagel@ceid.upatras.gr

Viktor Pekar Bashkir State University Okt.Revolutsii 3a, Ufa 450000 Russia [email protected]

Josep M. Pujol Software Department Technical University of Catalonia C/Jordi Girona 1-3 C5-221 08034 Barcelona Spain jmpujol@lsi.upc.es

Prabhakar Raghavan Verity, Inc. 892 Ross Drive, Sunnyvale CA 94089, USA pragh@verity.com

Sridhar Rajagopalan IBM Almaden Research Center 650 Harry Road, San Jose CA 95120, USA sridhar@almaden.ibm.com

Ulrich Reimer Rentenanstalt/Swiss Life IT Coordination (CC/ITC) P.O. Box CH-8022, Zurich Switzerland ulrich.reimer@acm.org

Ramon Sangüesa Software Department Technical University of Catalonia C/Jordi Girona 1-3 C6-204 08034 Barcelona Spain sanguesa@lsi.upc.es

Kiril Simov OntoText Lab. Sirma AI Ltd. 38A Hristo Botev blvd., Sofia 1000 Bulgaria kivs@bultreebank.org

Steffen Staab Institute AIFB University of Karlsruhe Postfach, 76128 Karlsruhe Germany staab@aifb.uni-karlsruhe.de

Lynn Andrea Stein Computers and Cognition Lab Franklin W. Olin College of Eng. Needham, MA 02492, USA [email protected]

Rudi Studer Institute AIFB University of Karlsruhe Postfach, 76128 Karlsruhe Germany studer@aifb.uni-karlsruhe.de

Yasuyuki Sumi ATR Media Information Science Labs. Seika-cho, Soraku-gun, Kyoto 619-0288 Japan sumi@atr.co.jp

York Sure Institute AIFB University of Karlsruhe Postfach, 76128 Karlsruhe Germany [email protected]

Pang-Ning Tan AHPCRC / University of Minnesota 1100 Washington Avenue S, #101 Minneapolis, MN 55415, USA ptan@cs.umn.edu

Andrew Tomkins IBM Almaden Research Center 650 Harry Road, San Jose CA 95120, USA tomkins@almaden.ibm.com

Jaeyoung Yang Dept. of Computer Science and Eng. Hanyang University 1271 Sa-1 Dong Ansan, Kyunggi-Do 425-791 Korea jyyang@cse.hanyang.ac.kr

Yiyu Yao Department of Computer Science University of Regina Regina, Saskatchewan, S4S 0A2 Canada [email protected]

Yiming Ye IBM T. J. Watson Research Center P.O. Box 704 Yorktown Heights, NY 10598 USA [email protected]

Philip S. Yu IBM T.J. Watson Research Center P.O. Box 704 Yorktown Heights, NY 10598 USA psyu@us.ibm.com

Christos Zaroliagis Computer Technology Institute, P.O. Box 1122, 26110 Patras, Greece and Dept of Computer Eng. & Informatics University of Patras 26500 Patras, Greece zaro@ceid.upatras.gr

Shiwu Zhang Department of Computer Science Hong Kong Baptist University Kowloon Tong, Hong Kong [email protected]

Xiaodong Zhang College of William and Mary Williamsburg, VA 23185, USA zhang@cs.wm.edu

Ning Zhong Dept. of Systems & Information Eng. Maebashi Institute of Technology 460-1 Kamisadori-Cho Maebashi-City 371-0816, Japan [email protected]