Studies in Fuzziness and Soft Computing, Volume 137: Fuzzy Logic and the Internet

V. Loia, M. Nikravesh, L. A. Zadeh (Eds.) Fuzzy Logic and the Internet Springer-Verlag Berlin Heidelberg GmbH

V. Loia, M. Nikravesh, L. A. Zadeh (Eds.)

Fuzzy Logic and the Internet

Springer-Verlag Berlin Heidelberg GmbH

Studies in Fuzziness and Soft Computing, Volume 137

Editor-in-chief Prof. Janusz Kacprzyk Systems Research Institute Polish Academy of Sciences ul. Newelska 6 01-447 Warsaw Poland E-mail: [email protected]

Further volumes of this series can be found on our homepage: springeronline.com

Vol. 118. M. Wygralak Cardinalities of Fuzzy Sets, 2003 ISBN 3-540-00337-1

Vol. 119. Karmeshu (Ed.) Entropy Measures, Maximum Entropy Principle and Emerging Applications, 2003 ISBN 3-540-00242-1

Vol. 120. H.M. Cartwright, L.M. Sztandera (Eds.) Soft Computing Approaches in Chemistry, 2003 ISBN 3-540-00245-6

Vol. 121. J. Lee (Ed.) Software Engineering with Computational Intelligence, 2003 ISBN 3-540-00472-6

Vol. 122. M. Nachtegael, D. Van der Weken, D. Van de Ville and E.E. Kerre (Eds.) Fuzzy Filters for Image Processing, 2003 ISBN 3-540-00465-3

Vol. 123. V. Torra (Ed.) Information Fusion in Data Mining, 2003 ISBN 3-540-00676-1

Vol. 124. X. Yu, J. Kacprzyk (Eds.) Applied Decision Support with Soft Computing, 2003 ISBN 3-540-02491-3

Vol. 125. M. Inuiguchi, S. Hirano and S. Tsumoto (Eds.) Rough Set Theory and Granular Computing, 2003 ISBN 3-540-00574-9

Vol. 126. J.-L. Verdegay (Ed.) Fuzzy Sets Based Heuristics for Optimization, 2003 ISBN 3-540-00551-X

Vol. 127. L. Reznik, V. Kreinovich (Eds.) Soft Computing in Measurement and Information Acquisition, 2003 ISBN 3-540-00246-4

Vol. 128. J. Casillas, O. Cordón, F. Herrera, L. Magdalena (Eds.) Interpretability Issues in Fuzzy Modeling, 2003 ISBN 3-540-02932-X

Vol. 129. J. Casillas, O. Cordón, F. Herrera, L. Magdalena (Eds.) Accuracy Improvements in Linguistic Fuzzy Modeling, 2003 ISBN 3-540-02933-8

Vol. 130. P.S. Nair Uncertainty in Multi-Source Databases, 2003 ISBN 3-540-03242-8

Vol. 131. J.N. Mordeson, D.S. Malik, N. Kuroki Fuzzy Semigroups, 2003 ISBN 3-540-03243-6

Vol. 132. Y. Xu, D. Ruan, K. Qin, J. Liu Lattice-Valued Logic, 2003 ISBN 3-540-40175-X

Vol. 133. Z.-Q. Liu, J. Cai, R. Buse Handwriting Recognition, 2003 ISBN 3-540-40177-6

Vol. 134. V.A. Niskanen Soft Computing Methods in Human Sciences, 2004 ISBN 3-540-00466-1

Vol. 135. J.J. Buckley Fuzzy Probabilities and Fuzzy Sets for Web Planning, 2004 ISBN 3-540-00473-4

Vol. 136. L. Wang (Ed.) Soft Computing in Communications, 2004 ISBN 3-540-40575-5

Vincenzo Loia Masoud Nikravesh Lotfi A. Zadeh (Eds.)

Fuzzy Logic and the Internet

Springer

Prof. Vincenzo Loia Università di Salerno

Dipto. Matematica e Informatica

Via S. Allende

84081 Baronissi

Italy

E-mail: [email protected]

Prof. Masoud Nikravesh E-mail: [email protected]

Prof. Dr. Lotfi A. Zadeh E-mail: [email protected]

University of California

Dept. Electrical Engineering and Computer Science - EECS

94720 Berkeley, CA

USA

ISBN 978-3-642-05770-0 ISBN 978-3-540-39988-9 (eBook) DOI 10.1007/978-3-540-39988-9

Library of Congress Cataloging-in-Publication-Data

Fuzzy logic and the Internet / Vincenzo Loia, Masoud Nikravesh, Lotfi A. Zadeh (eds.). p.cm.

1. Fuzzy logic. 2. Internet research. 3. Internet searching. I. Loia, Vincenzo, 1961- II. Nikravesh, Masoud, 1959- III. Zadeh, Lotfi Asker. QA76.87.F895 2004 004.67'8--dc22

This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitations, broadcasting, reproduction on microfilm or in any other way, and storage in data banks. Duplication of this publication or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965, in its current version, and permission for use must always be obtained from Springer-Verlag. Violations are liable for prosecution under the German Copyright Law.

springeronline.com

© Springer-Verlag Berlin Heidelberg 2004 Softcover reprint of the hardcover 1st edition 2004 Originally published by Springer-Verlag Berlin Heidelberg New York in 2004.

The use of general descriptive names, registered names, trademarks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

Typesetting: Camera-ready by author

Cover design: E. Kirchner, Springer-Verlag, Heidelberg

Printed on acid free paper 6213020/M - 543 2 1 0

Preface

With millions of documents and new users added daily, there is no doubt that the World Wide Web (WWW, or Web for short) is still expanding its global information infrastructure. Thanks to low-cost wireless technology, the Web is no longer limited to homes or offices; it is simply everywhere. The Web is so large and growing so rapidly that the 40-million-page "WebBase" repository of Inktomi corresponds to only about 4% of the estimated size of the publicly indexable Web as of January 2000, and there is every reason to believe these numbers will all swell significantly in the next few years.

This unrestrained explosion is not free of troubles and drawbacks, especially for inexperienced users. Probably the most critical problem is the effectiveness of Web search engines: although the Web provides numerous services, the primary uses of the Internet are email and information retrieval. Focusing on the latter, every user has had the frustrating experience of seeing a search query return an overwhelming number of pages that satisfy the query but are irrelevant to the user.

Due to the nature of the Web itself, there is a strong need for new research approaches, in terms of both theories and systems. Among these new research trends, an important role is played by methodologies that make it possible to process imprecise information and to perform approximate reasoning. The ability of Fuzzy Technology to exploit the tolerance for imprecision to achieve tractability, robustness, and low solution cost has played a fundamental and successful role in every area of Information Technology, attracting growing interest especially in the area of computational intelligence.

Nowadays, Web-based systems handle user interaction through query matching that is too weak to cope with the user's expressiveness. The first attempts to extend searching with deduction capability are essentially based on two-valued logic and standard probability theory. The complexity of the problem, coupled with some features of the domain (unstructured data, immature standards), demands a strong deviation from this trend. Fuzzy Logic, and more generally Soft Computing, can be the right choice for facing complex Web problems, as reported by the contributions in this volume.

This book contains 14 chapters.

The first chapter, written by Nikravesh, Takagi, Tajima, Loia, and Azvine, is an introduction to the book. The main objective of this chapter is to provide a better understanding of the issues related to the Internet (Fuzzy Logic and the Internet) and to provide new tools and ideas toward enhancing the power of the Internet. The chapter also aims to draw the attention of the fuzzy logic community, as well as the Internet community, to the fundamental importance of specific Internet-related problems. This is especially significant for problems that center on search and deduction in large, unstructured knowledge bases. The authors summarize the challenges, the road ahead, and directions for the future by identifying the challenging problems and the new directions toward the next generation of search engines and the Internet.

Chapter 2, written by Beg and Ahmad, addresses the problem of rank aggregation on the Web. The authors propose new ranking solutions, namely MFO, MBV, Improved Shimura and Entropy-based ranking. MFO works by comparing the values of the membership functions of the document positions. MBV proceeds by carrying out an ascending sort on the ratio of mean and variance of the document positions. The Shimura technique is improved by replacing the min function with the OWA operation. The entropy-based technique adopts the entropy minimization principle for the purpose of rank aggregation.
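As a minimal illustration of the operators mentioned for Chapter 2, the sketch below implements the standard ordered weighted averaging (OWA) operator and shows that the min function is the special case in which all the weight falls on the smallest value. It is not taken from the chapter; the function name `owa`, the sample membership values, and the weight vectors are assumptions chosen for illustration.

```python
def owa(values, weights):
    """Ordered weighted averaging: weights apply to the values sorted in descending order."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    ordered = sorted(values, reverse=True)  # largest value first
    return sum(w * v for w, v in zip(weights, ordered))


# Hypothetical membership values for one document under three rankings.
memberships = [0.9, 0.4, 0.7]

as_min = owa(memberships, [0.0, 0.0, 1.0])  # all weight on the smallest value
as_max = owa(memberships, [1.0, 0.0, 0.0])  # all weight on the largest value
softer = owa(memberships, [0.2, 0.3, 0.5])  # a compromise between the two
```

Moving weight away from the last position lets larger values contribute to the result, which is what makes OWA a softer aggregation than min.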

Chapter 3, written by Cordón, Moya and Zarco, explains how extended Boolean queries for fuzzy information retrieval systems can be derived automatically from a set of relevant documents provided by a user. The chapter features an advanced evolutionary algorithm, GA-P, specially designed to tackle multiobjective problems by means of a Pareto-based multiobjective technique. The approach is tested on the standard Cranfield collection and compared to other well-known methods.

Chapter 4, written by Damiani, Lavarini, Oliboni, and Tanca, proposes a flexible, XML-compliant querying technique able to locate and extract information. In structure and tag vocabulary. This approach relies on representing XML documents as graphs whose edges are weighted at different levels of granularity. A smart weighting technique processes the features of the edges, generating a separate weight for each characteristic and then aggregating these values into a single arc weight. An important optimization is carried out by threshold-based pruning, which deletes unimportant edges in order to retain only the most useful information for an efficient Web searching strategy.
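The weighting and pruning steps described for Chapter 4 can be sketched roughly as follows. This is an illustration under stated assumptions, not the chapter's method: the feature names (`depth`, `fanout`), the use of the arithmetic mean as the aggregation, the sample edges, and the threshold value are all hypothetical.

```python
from statistics import mean


def aggregate(feature_weights):
    """Combine per-feature weights into one arc weight (arithmetic mean is an assumption)."""
    return mean(feature_weights.values())


def prune(edges, threshold):
    """Keep only edges whose aggregated arc weight reaches the threshold."""
    kept = {}
    for edge, feature_weights in edges.items():
        arc_weight = aggregate(feature_weights)
        if arc_weight >= threshold:
            kept[edge] = arc_weight
    return kept


# Hypothetical XML graph: each edge carries one weight per feature.
edges = {
    ("book", "title"): {"depth": 0.9, "fanout": 0.7},
    ("book", "note"): {"depth": 0.3, "fanout": 0.1},
}

pruned = prune(edges, threshold=0.5)  # only ("book", "title") survives
```

A different aggregation, such as a weighted mean or a minimum, could be substituted in `aggregate` without changing the pruning step.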

Chapter 5, written by Herrera, Herrera-Viedma, Martinez and Porcel, describes a distributed intelligent model for gathering information on the Internet, in which the agents and users communicate with one another using a multi-granular linguistic technique based on a linguistic 2-tuple computational model. Several advantages derive from this technique: the retrieval process gains in flexibility, the agent-oriented interaction benefits from deeper expressivity, and the availability of word-based computation improves precision without loss of information.

Chapter 6, written by Hong, Lin and Wang, presents a fuzzy web-mining algorithm that processes web-server logs in order to discover fuzzy browsing patterns in them. The chapter describes how this approach can derive a more complete set of browsing patterns than previous solutions and details experimental results that show the time-completeness trade-off.

Chapter 7, written by Liu, Wan and Wang, describes a fuzzy inference system for audio classification and retrieval, a crucial problem for any multimedia Web search engine. The chapter illustrates the benefits of the fuzzy classifier, which is characterized by very quick classification as well as flexibility and efficiency in adding new classes to the audio samples in the database.

Chapter 8, written by Loia, deals with Web search catalogues. In many cases, these catalogues are maintained manually, with enormous cost and difficulty due to the incessant growth of the Web. The chapter presents an evolutionary approach for constructing the catalogue automatically and for classifying a Web document. This functionality is achieved by genetic-based fuzzy clustering applied to the context of the document, as opposed to content-based clustering, which works on the complete document information.

Chapter 9, written by Martin-Bautista, Sanchez, Serrano and Vila, addresses the problem of query specification by describing an application of data mining techniques in a text framework. The chapter proposes a text transaction technology based on fuzzy transactions, in which each transaction corresponds to a document representation. The set of transactions represents a document collection from which the fuzzy association rules are extracted. The extracted rules can be automatically added to the original query in order to optimize the search.

Chapter 10, written by Nikravesh and Azvine, introduces fuzzy query and fuzzy aggregation as an alternative for ranking and predicting risk in credit scoring and university admissions. The chapter presents the BISC Decision Support System, characterized by smart Internet-based services designed to make intelligent use of the vast amounts of important data in complex organizations and to share internal data with external entities while respecting the constraints of security and efficiency.

Chapter 11, written by Pal, Talwar and Mitra, provides an overview of the different characteristics of web data, the basic components of web mining and its different types, and their current state of the art. The chapter underlines the limitations of existing web mining methods and shows how the soft computing approach can be a valid ally in achieving Web intelligence.

Chapter 12, written by Pasi and Yager, presents a technique for improving the quality of the information available to customers making Web purchase decisions. The Product Category Summarization (PCS) method is presented, and the chapter illustrates how PCS helps consumers understand a product line in a way that supports their purchasing decisions. After clustering a product line into a finite number of categories, PCS automatically constructs user-friendly descriptions of the relevant features shared by the majority of the products associated with each category.

Chapter 13, written by Pham, deals with logo technology, widely used nowadays to meet an increasing demand for the automatic processing of documents and images. The chapter outlines the concept of geostatistics, which serves as a tool for extracting spatial features of logo images. Different logo classifiers are discussed, ranging from models based on neural networks, pseudo hidden Markov models, and fuzzy sets to an algorithm built on the concept of mountain clustering.

Chapter 14, written by Wang and Zhang, presents a fuzzy web information classification agent based on Fuzzy Web Intelligence. The agent can act upon a user's instructions and refresh the stock data in real time by accessing the database on the Internet. Using fuzzy reasoning, the agent can create a list of top stocks based on the output values calculated from input stock information. The chapter shows how the results of the data processing are precise and reliable.

We thank the authors for their outstanding contributions to this book. Thanks are due to Professor J. Kacprzyk for his kind support, which encouraged us in preparing this volume. We are also very grateful to the editorial team of Springer-Verlag for their continuous and fruitful assistance.

Vincenzo Loia Masoud Nikravesh

Lotfi Zadeh

Table of Contents

Fuzzy Logic and the Internet: Web Intelligence
M. Nikravesh, T. Takagi, M. Tajima, V. Loia and B. Azvine

Fuzzy Logic and Rank Aggregation for the World Wide Web
S. Beg and N. Ahmad

Automatic Learning of Multiple Extended Boolean Queries by Multiobjective GA-P Algorithms
O. Cordon, F. Moya and C. Zarco

An Approximate Querying Environment for XML Data
E. Damiani, N. Lavarini, B. Oliboni and L. Tanca

Information Gathering on the Internet Using a Distributed Intelligent Agent Model with Multi-Granular Linguistic Information
F. Herrera, E. Herrera-Viedma, L. Martinez and C. Porcel

A Time-Completeness Tradeoff on Fuzzy Web-Browsing Mining
T-P. Hong, K-Y. Lin and S-L. Wang

A Fuzzy Logic Approach for Content-Based Audio Classification and Boolean Retrieval
M. Liu, C. Wan and L. Wang

Soft Computing Technology for Dynamic Web Pages Categorization
V. Loia

Text Mining using Fuzzy Association Rules
M. J. Martin-Bautista, D. Sanchez, J. M. Serrano and M. A. Vila

BISC Decision Support System: University Admission System
M. Nikravesh and B. Azvine

Web Mining in Soft Computing Framework: A Survey
S. K. Pal, V. Talwar and P. Mitra

A decision support tool for web-shopping using Product Category Summarization
G. Pasi and R.R. Yager

Logo Recognition and Detection with Geostatistical, Stochastic, and Soft-Computing Models
T. D. Pham

Fuzzy Web Information Classification Agents
Y. Wang and Y-Q. Zhang
