Handbook of Parallel Computing: Models, Algorithms and Applications (2008)


Handbook ofParallel ComputingModels, Algorithms and Applications 2008 by Taylor & Francis Group, LLCPUBLISHED TITLESADVERSARIAL REASONING: COMPUTATIONAL APPROACHES TO READING THE OPPONENTS MINDAlexander Kott and William M. McEneaneyDISTRIBUTED SENSOR NETWORKSS. Sitharama Iyengar and Richard R. BrooksDISTRIBUTED SYSTEMS: AN ALGORITHMIC APPROACHSukumar GhoshFUNDEMENTALS OF NATURAL COMPUTING: BASIC CONCEPTS, ALGORITHMS, AND APPLICATIONSLeandro Nunes de CastroHANDBOOK OF ALGORITHMS FOR WIRELESS NETWORKING AND MOBILE COMPUTINGAzzedine BoukercheHANDBOOK OF APPROXIMATION ALGORITHMS AND METAHEURISTICSTeofilo F. GonzalezHANDBOOK OF BIOINSPIRED ALGORITHMS AND APPLICATIONSStephan Olariu and Albert Y. ZomayaHANDBOOK OF COMPUTATIONAL MOLECULAR BIOLOGYSrinivas AluruHANDBOOK OF DATA STRUCTURES AND APPLICATIONSDinesh P. Mehta and Sartaj SahniHANDBOOK OF DYNAMIC SYSTEM MODELINGPaul A. FishwickHANDBOOK OF PARALLEL COMPUTING: MODELS, ALGORITHMS AND APPLICATIONSSanguthevar Rajasekaran and John ReifHANDBOOK OF REAL-TIME AND EMBEDDED SYSTEMSInsup Lee, Joseph Y-T. Leung, and Sang H. SonHANDBOOK OF SCHEDULING: ALGORITHMS, MODELS, AND PERFORMANCE ANALYSISJoseph Y.-T. LeungHIGH PERFORMANCE COMPUTING IN REMOTE SENSINGAntonio J. Plaza and Chein-I ChangTHE PRACTICAL HANDBOOK OF INTERNET COMPUTINGMunindar P. SinghSCALABLE AND SECURE INTERNET SERVICES AND ARCHITECTURECheng-Zhong XuSPECULATIVE EXECUTION IN HIGH PERFORMANCE COMPUTER ARCHITECTURESDavid Kaeli and Pen-Chung YewCHAPMAN & HALL/CRCCOMPUTER and INFORMATION SCIENCE SERIESSeries Editor: Sartaj Sahni 2008 by Taylor & Francis Group, LLCHandbook ofParallel ComputingModels, Algorithms and ApplicationsEdited bySanguthevar RajasekaranUniversity of ConnecticutStorrs, U.S.A.John ReifDuke UniversityDurham, North Carolina, U.S.A.Chapman & Hall/CRCTaylor &Francis GroupBoca Raton London New YorkChapman & Hall/CRC is an imprint of theTaylor & Francis Group, an informa business 2008 by Taylor & Francis Group, LLCChapman & Hall/CRCTaylor & Francis Group6000 Broken Sound Parkway NW, Suite 300Boca Raton, FL 33487-2742 2008 by Taylor & Francis Group, LLCChapman & Hall/CRC is an imprint of Taylor & Francis Group, an Informa businessNo claim to original U.S. Government worksPrinted in the United States of America on acid-free paper10987654321International Standard Book Number-13: 978-1-58488-623-5 (Hardcover)This book contains information obtained from authentic and highly regarded sources. Reprinted material is quotedwith permission, and sources are indicated. A wide variety of references are listed. Reasonable efforts have been made topublish reliable data and information, but the author and the publisher cannot assume responsibility for the validity ofall materials or for the consequences of their use.Except as permitted under U.S. Copyright Law, no part of this book may be reprinted, reproduced, transmitted, or uti-lized in any form by any electronic, mechanical, or other means, now known or hereafter invented, including photocopy-ing, microfilming, and recording, or in any information storage or retrieval system, without written permission from thepublishers.For permission to photocopy or use material electronically from this work, please access www.copyright.com (http://www.copyright.com/) or contact the Copyright Clearance Center, Inc. (CCC) 222 Rosewood Drive, Danvers, MA 01923,978-750-8400. CCC is a not-for-profit organization that provides licenses and registration for a variety of users. 
For orga-nizations that have been granted a photocopy license by the CCC, a separate system of payment has been arranged.Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and are used only foridentification and explanation without intent to infringe.Library of Congress Cataloging-in-Publication DataRajasekaran, Sanguthevar.Handbook of parallel computing: models, algorithms and applications / Sanguthevar Rajasekaranand John Reif.p. cm. (Chapman & Hall/CRC computer & information science)Includes bibliographical references and index.ISBN-13: 978-1-58488-623-5ISBN-10:1-58488-623-41. Parallel processing (Electronic computers)Handbooks, manuals, etc. 2. Computeralgorithms-Handbooks, manuals, etc. I. Reif, J. H. (John H.) II. Title. III. Series.QA76.58.R34 2007005.1-dc22 2007025276Visit the Taylor & Francis Web site athttp://www.taylorandfrancis.comand the CRC Press Web site athttp://www.crcpress.com 2008 by Taylor & Francis Group, LLCDedicationToPandi, Pandiammal, and PeriyasamySanguthevar RajasekaranToJane, Katie, and EmilyJohn H. Reif 2008 by Taylor & Francis Group, LLCThe deeper you dig in sand, the larger will be the water flow;The more you learn, the more will be the knowledge flowThiruvalluvar (circa 100 B.C.)(Thirukkural; Chapter 40: Education) 2008 by Taylor & Francis Group, LLCContentsPreface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xiiiAcknowledgments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xxiEditors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xxiiiContributors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xxvModels1 Evolving Computational SystemsSelim G. Akl . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-12 Decomposable BSP: A Bandwidth-Latency Model for Parallel and HierarchicalComputationGianfranco Bilardi, Andrea Pietracaprina, and Geppino Pucci . . . . . . . . . . 2-13 Membrane Systems: A Natural Way of Computing with CellsOscar H. Ibarra and Andrei P aun . . . . . . . . . . . . . . . . . . . . . . . . 3-14 Optical Transpose Systems: Models and AlgorithmsChih-Fang Wang and Sartaj Sahni . . . . . . . . . . . . . . . . . . . . . . . 4-15 Models for Advancing PRAM and Other Algorithms into Parallel Programs fora PRAM-On-Chip PlatformUzi Vishkin, George C. Caragea, and Bryant C. Lee . . . . . . . . . . . . . . . 5-1ix 2008 by Taylor & Francis Group, LLCx Contents6 Deterministic and Randomized Sorting Algorithms for Parallel Disk ModelsSanguthevar Rajasekaran . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-17 A Programming Model and Architectural Extensions for Fine-GrainParallelismAlex Gontmakher, Assaf Schuster, Gregory Shklover, and Avi Mendelson . . . . . 7-18 Computing with Mobile Agents in Distributed NetworksEvangelos Kranakis, Danny Krizanc, and Sergio Rajsbaum . . . . . . . . . . . . 8-19 Transitional Issues: Fine-Grain to Coarse-Grain MulticomputersStephan Olariu . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-110 Distributed Computing in the Presence of Mobile FaultsNicola Santoro and Peter Widmayer . . . . . . . . . . . . . . . . . . . . . . . 10-111 A Hierarchical Performance Model for Recongurable ComputersRonald Scrofano and Viktor K. Prasanna . . . . . . . . . . . . . . . . . . . . 11-112 Hierarchical Performance Modeling and Analysis of Distributed SoftwareSystemsReda A. Ammar . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 
12-113 Randomized Packet Routing, Selection, and Sorting on the POPS NetworkJaime Davila and Sanguthevar Rajasekaran . . . . . . . . . . . . . . . . . . . 13-114 Dynamic Reconguration on the R-MeshRamachandran Vaidyanathan and Jerry L. Trahan . . . . . . . . . . . . . . . 14-115 Fundamental Algorithms on the Recongurable MeshKoji Nakano . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15-116 Recongurable Computing with Optical BusesAnu G. Bourgeois . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16-1Algorithms17 Distributed Peer-to-Peer Data StructuresMichael T. Goodrich and Michael J. Nelson . . . . . . . . . . . . . . . . . . . 17-118 Parallel Algorithms via the Probabilistic MethodLasse Kliemann and Anand Srivastav . . . . . . . . . . . . . . . . . . . . . . 18-119 Broadcasting on Networks of WorkstationsSamir Khuller, Yoo-Ah Kim, and Yung-Chun (Justin) Wan . . . . . . . . . . . . 19-120 Atomic Selsh Routing in Networks: A SurveySpyros Kontogiannis and Paul G. Spirakis . . . . . . . . . . . . . . . . . . . . 20-1 2008 by Taylor & Francis Group, LLCContents xi21 Scheduling in Grid EnvironmentsYoung Choon Lee and Albert Y. Zomaya . . . . . . . . . . . . . . . . . . . . . 21-122 QoS Scheduling in Network and Storage SystemsPeter J. Varman and Ajay Gulati . . . . . . . . . . . . . . . . . . . . . . . . 22-123 Optimal Parallel Scheduling Algorithms in WDM Packet InterconnectsZhenghao Zhang and Yuanyuan Yang . . . . . . . . . . . . . . . . . . . . . . 23-124 Real-Time Scheduling Algorithms for Multiprocessor SystemsMichael A. Palis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24-125 Parallel Algorithms for Maximal Independent Set and Maximal MatchingYijie Han . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25-126 Efcient Parallel Graph Algorithms for Multicore and MultiprocessorsDavid A. Bader and Guojing Cong . . . . . . . . . . . . . . . . . . . . . . . 26-127 Parallel Algorithms for Volumetric Surface ConstructionJoseph JaJa, Amitabh Varshney, and Qingmin Shi . . . . . . . . . . . . . . . . 27-128 Mesh-Based Parallel Algorithms for Ultra Fast Computer VisionStephan Olariu . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28-129 Prospectus for a Dense Linear Algebra Software LibraryJames Demmel, Beresford Parlett, William Kahan, Ming Gu, David Bindel, YozoHida, E. Jason Riedy, Christof Voemel, Jakub Kurzak, Alfredo Buttari, JulieLangou, Stanimire Tomov, Jack Dongarra, Xiaoye Li, Osni Marques, JulienLangou, and Piotr Luszczek . . . . . . . . . . . . . . . . . . . . . . . . . . . 29-130 Parallel Algorithms on StringsWojciech Rytter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30-131 Design of Multithreaded Algorithms for Combinatorial ProblemsDavid A. Bader, Kamesh Madduri, Guojing Cong, and John Feo . . . . . . . . . 31-132 Parallel Data Mining Algorithms for Association Rules and ClusteringJianwei Li, Wei-keng Liao, Alok Choudhary, and Ying Liu . . . . . . . . . . . . 32-133 An Overview of Mobile Computing AlgorithmicsStephan Olariu and Albert Y. Zomaya . . . . . . . . . . . . . . . . . . . . . . 33-1Applications34 Using FG to Reduce the Effect of Latency in Parallel Programs Running onClustersThomas H. Cormen and Elena Riccio Davidson . . . . . . . . . . . . . . . . . 34-1 2008 by Taylor & Francis Group, LLCxii Contents35 High-Performance Techniques for Parallel I/OAvery Ching, Kenin Coloma, Jianwei Li, Wei-keng Liao, and Alok Choudhary . . . 
36. Message Dissemination Using Modern Communication Primitives (Teofilo F. Gonzalez)
37. Online Computation in Large Networks (Susanne Albers)
38. Online Call Admission Control in Wireless Cellular Networks (Ioannis Caragiannis, Christos Kaklamanis, and Evi Papaioannou)
39. Minimum Energy Communication in Ad Hoc Wireless Networks (Ioannis Caragiannis, Christos Kaklamanis, and Panagiotis Kanellopoulos)
40. Power Aware Mapping of Real-Time Tasks to Multiprocessors (Dakai Zhu, Bruce R. Childers, Daniel Mossé, and Rami Melhem)
41. Perspectives on Robust Resource Allocation for Heterogeneous Parallel and Distributed Systems (Shoukat Ali, Howard Jay Siegel, and Anthony A. Maciejewski)
42. A Transparent Distributed Runtime for Java (Michael Factor, Assaf Schuster, and Konstantin Shagin)
43. Scalability of Parallel Programs (Ananth Grama and Vipin Kumar)
44. Spatial Domain Decomposition Methods in Parallel Scientific Computing (Sudip Seal and Srinivas Aluru)
45. Game Theoretical Solutions for Data Replication in Distributed Computing Systems (Samee Ullah Khan and Ishfaq Ahmad)
46. Effectively Managing Data on a Grid (Catherine L. Ruby and Russ Miller)
47. Fast and Scalable Parallel Matrix Multiplication and Its Applications on Distributed Memory Systems (Keqin Li)

Preface

We live in an era in which parallel computing has become mainstream and very affordable. This is mainly because hardware costs have come down rapidly. With the advent of the Internet, we also experience the phenomenon of data explosion in every application of interest. Processing voluminous datasets is highly computation intensive. Parallel computing has been fruitfully employed in numerous application domains to process large datasets and handle other time-consuming operations of interest. As a result, unprecedented advances have been made in such areas as biology, scientific computing, modeling and simulations, and so forth. In this handbook we present recent developments in the areas of parallel models, algorithms, and applications.

An Introduction to Parallel Computing

There are many ways of achieving parallelism. Some examples include the use of supercomputers, clusters, networks of workstations, and grid computing. In sequential computing, the Random Access Machine (RAM), which executes arithmetic and Boolean operations as well as read and write memory operations, has been universally accepted as an appropriate model of computing. On the other hand, numerous parallel models of computing have been proposed in the literature. These models differ in the way the processors communicate among themselves.

If we use P processors to solve a problem, then there is a potential of reducing the (sequential) runtime by a factor of up to P. If S is the best known sequential runtime and T is the parallel runtime using P processors, then PT ≥ S. Otherwise, we could simulate the parallel algorithm on a single processor, executing the at most P operations of each parallel step one after another; the simulation would finish in at most PT time, and PT < S would contradict the assumption that S is the best known sequential runtime. We refer to PT as the work done by the parallel algorithm.
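As a concrete illustration of these quantities, consider the standard example of summing n numbers by a balanced binary-tree reduction: the best sequential runtime is S = n - 1 additions, while P = n/2 processors can finish in T = log2 n parallel steps (halving the number of partial sums at every step). The toy Python sketch below is illustrative only; the function name and the sample value of n are chosen for this example. It computes S, T, P, the speedup S/T, and the work PT, and checks the bound PT ≥ S.

```python
from math import log2

def work_bound_check(n):
    """Illustrate S, T, P, speedup, and work = P*T for summing n numbers
    (n a power of two) with a balanced binary-tree reduction."""
    S = n - 1                 # best sequential runtime: n - 1 additions
    P = n // 2                # one processor per pair of inputs
    T = int(log2(n))          # parallel steps: halve the number of values each step
    work = P * T
    # PT >= S: otherwise a one-processor simulation of the parallel
    # algorithm would beat the best sequential runtime S.
    assert work >= S
    return {"S": S, "P": P, "T": T, "speedup": S / T, "work": work}

print(work_bound_check(1024))
# {'S': 1023, 'P': 512, 'T': 10, 'speedup': 102.3, 'work': 5120}
```

Note that the work, (n/2) log2 n, is considerably larger than S here; as defined next, a work-optimal algorithm keeps PT within a constant factor of S, which for summation can be achieved by using fewer processors (roughly n/log2 n of them).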
Any parallel algorithm for which PT = O(S) will be referred to as an optimal work algorithm.

In most parallel machines we can still think of each processor as a RAM. Variations among different architectures arise from the ways in which they implement interprocessor communications. Parallel models can be categorized broadly into parallel comparison trees, shared memory models, and fixed connection networks.

A parallel comparison tree is analogous to the sequential comparison (or decision) tree. It is typically employed for the study of comparison problems such as sorting, selection, merging, and so forth. An algorithm under this model is represented as a tree. The computation starts at the root. P pairs of input keys are compared in parallel at the root (P being the number of processors). On the basis of the outcomes of these comparisons, the computation branches to an appropriate child of the root. Each node of the tree corresponds to comparison of P pairs of input keys. The computation terminates at a leaf node that has enough information to output the correct answer. Thus, there is a tree corresponding to every input size. For a given instance of the problem, a branch in the tree is traversed. The worst case runtime is proportional to the depth of the tree. This model takes into account only the comparison operations performed.

The Parallel Random Access Machine (PRAM) is a shared memory model for parallel computation consisting of a collection of RAMs working in synchrony where communication takes place with the help of a common block of shared memory. If, for example, processor i wants to communicate with processor j, it can do so by writing a message in memory cell j, which can then be read by processor j.

More than one processor may want to access the same cell at the same time for either reading from or writing into. Depending on how these conflicts are resolved, a PRAM can further be classified into three. In an Exclusive Read and Exclusive Write (EREW) PRAM, neither concurrent reads nor concurrent writes are allowed. In a Concurrent Read and Exclusive Write (CREW) PRAM, concurrent reads are permitted but not concurrent writes. Finally, a Concurrent Read and Concurrent Write (CRCW) PRAM allows both concurrent reads and concurrent writes. A mechanism for handling write conflicts is needed for a CRCW PRAM, since the processors trying to write at the same time in the same cell can possibly have different data to write and we should determine which data gets written. This is not a problem in the case of concurrent reads since the data read by different processors will be the same. In a Common-CRCW PRAM, concurrent writes are permissible only if the processors trying to access the same cell at the same time have the same data to write. In an Arbitrary-CRCW PRAM, if more than one processor tries to write in the same cell at the same time, an arbitrary one of them succeeds. In a Priority-CRCW PRAM, processors have assigned priorities. Write conflicts are resolved using these priorities.

Also, we consider fixed connection network machine models. A directed graph is used to represent the fixed connection network. The nodes of this graph correspond to processing elements, and the edges correspond to communication links. If two processors are connected by an edge, they can communicate in a unit step. Two processors not connected by an edge can communicate by sending a message along a path that connects the two processors. Each processor in a fixed connection machine is a RAM.
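Before turning to specific networks, the write-conflict rules of the three CRCW variants described above can be made concrete with a small simulation. The Python sketch below is illustrative only; the function name, the data layout, and the convention that a smaller processor number means higher priority are our own choices. It resolves a single concurrent-write step under the Common, Arbitrary, and Priority rules.

```python
import random

def crcw_write(requests, policy):
    """Resolve one concurrent-write step on a CRCW PRAM.

    requests: list of (processor_id, cell, value) write requests issued
              in the same time unit; lower processor_id = higher priority.
    policy:   'common', 'arbitrary', or 'priority'.
    Returns a dict mapping each written cell to the value stored in it.
    """
    memory = {}
    by_cell = {}
    for pid, cell, value in requests:
        by_cell.setdefault(cell, []).append((pid, value))

    for cell, writers in by_cell.items():
        values = [v for _, v in writers]
        if policy == 'common':
            # All processors writing to the same cell must agree on the value.
            if len(set(values)) != 1:
                raise ValueError(f"Common-CRCW violation at cell {cell}")
            memory[cell] = values[0]
        elif policy == 'arbitrary':
            # Any one of the contending writers may succeed.
            memory[cell] = random.choice(values)
        elif policy == 'priority':
            # The writer with the smallest processor id wins.
            memory[cell] = min(writers)[1]
        else:
            raise ValueError(f"unknown policy: {policy}")
    return memory

# Three processors attempt to write to cell 7 in the same time unit.
step = [(1, 7, 'a'), (2, 7, 'b'), (3, 7, 'c')]
print(crcw_write(step, 'priority'))   # {7: 'a'}  (processor 1 wins)
print(crcw_write(step, 'arbitrary'))  # {7: 'a'}, {7: 'b'}, or {7: 'c'}
# crcw_write(step, 'common') raises, because the contending values differ.
```

Under the Common rule the step is legal only when all contending writers agree; the Arbitrary and Priority rules never fail, they merely differ in which writer succeeds.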
Examples of fixed connection machines include the mesh, the hypercube, the star graph, and so forth.

A mesh is an n × n square grid whose nodes are processors and whose edges are communication links. The diameter of a mesh is 2n - 2. (The diameter of a graph is defined to be the maximum of the shortest distance between any two nodes in the graph.) The diameter of a fixed connection machine is often a lower bound on the solution time of any nontrivial problem on the machine. The degree of a fixed connection network should be as small as possible for it to be physically realizable. (The degree of a fixed connection machine is defined to be the maximum number of neighbors for any node.) The degree of a mesh is four. Many variants of the mesh have also been proposed in the literature. Two examples are meshes with fixed buses and meshes with reconfigurable buses. In a mesh with fixed buses, in addition to the edge connections of a standard mesh, each row and each column has an associated bus. In any time unit, one of the processors connected to a bus can send a message along the bus (which can be read by the others in the same time unit). In a mesh with reconfigurable buses also, each row and each column has an associated bus, but the buses can be dynamically reconfigured.

A hypercube of dimension n has 2^n nodes. Any node in a hypercube can be denoted as an n-bit binary number. Let x and y be the binary representations of any two nodes in a hypercube. Then, these two nodes will be connected by an edge if and only if the Hamming distance between x and y is one, that is, x and y differ in exactly one bit position. Thus, the degree of a hypercube with 2^n nodes is n. The diameter of a 2^n-node hypercube can also be seen to be n. Butterfly, CCC, de Bruijn, and so forth are networks that are very closely related to the hypercube.

The chapters in this book span all of the above models of computing and more. They have been loosely categorized into three sections, namely, Parallel Models, Parallel Algorithms, and Parallel Applications. However, many chapters will cover more than one of these three aspects.

Parallel Models

Chapter 1 explores the concept of evolving computational systems. As an example, it considers the computation of n functions F0, F1, ..., Fn-1 on the n variables x0, x1, ..., xn-1, where the variables themselves change with time. Both sequential and parallel solutions are considered for this problem.

An important consideration in adopting a model of computation is that it should provide a framework for the design and analysis of algorithms that can be executed efficiently on physical machines. In the area of parallel computing, the Bulk Synchronous Parallel (BSP) model has been widely studied. Chapter 2 deals with a variant of this model, called the Decomposable BSP (D-BSP) model.

In Chapter 3, basic elements of membrane computing, a recent branch of natural computing, are presented. Membrane computing is a model abstracted from the way living cells function. The specific version of membrane computing considered in this chapter is called a P system. A P system can be thought of as a distributed computing device.

In any standard fixed connection machine (like a mesh), it is assumed that all the interconnects are electric. It is known that when communication distances exceed a few millimeters, optical interconnects are preferable since they provide bandwidth and power advantages. Thus, it is wise to use both optical (for long distances) and electrical (for short distances) interconnects.
Chapter 4 deals with one such modelcalled the Optical Transpose Interconnection System (OTIS).Chapter 5 proposes models and ways for converting PRAM and other algorithms onto a so-calledPRAM-on-chip architecture. PRAM-on-chip is a commodity high-end multicore computer architecture.This chapter provides the missing link for upgrading the standard theoretical PRAM algorithms class toa parallel algorithms and programming course. It also guides future compiler work by showing how togenerate performance-tuned programs from simple PRAM-like programs. Such program examples canused by compiler experts in order to teach the compiler about performance-tuned programs.When the amount of data to be processed is very large, it may not be possible to t all the data in themain memory of the computer used. This necessitates the use of secondary storage devices (such as disks)and the employment of out-of-core algorithms. In the context of parallel computing, the Parallel DiskModel (PDM) has been studied as a model for out-of-core computing. In Chapter 6, deterministic andrandomized algorithms for sorting on the PDM model are surveyed.Chapter 7 introduces a new threading model called Inthreads. This model is sufciently lightweight touse parallelization at very ne granularity. The Inthreads model is based on microthreads that operatewithin the context of a conventional thread.Mobile agents are software entities with the capacity for motion that can act on behalf of their user witha certain degree of autonomy in order to accomplish a variety of computing tasks. They nd applicationsin numerous computer environments such as operating system daemons, data mining, web crawlers, andso forth. In Chapter 8, an introduction to mobile agents is provided.The main focus of Chapter 9 is to investigate transitional issues arising in the migration from ne- tocoarse-grain multicomputers, as viewed from the perspective of the design of scalable parallel algorithms.A number of fundamental computational problems whose solutions are fairly well understood in thene-grain model are treated as case studies.A distributed computing environment is a collection of networked computational entities communicat-ing with each other by means of messages, in order to achieve a common goal, for example, to performa given task, to compute the solution to a problem, or to satisfy a request. Examples include data commu-nication networks, distributed databases, transaction processing systems, and so forth. In Chapter 10, theproblemof fault-tolerant computing (when communication faults are present) in a distributed computingenvironment is studied.Recongurable hardware has been shown to be very effective in providing application acceleration.For any scientic computing application, certain components can be accelerated using recongurablehardware and the other components may benet from general purpose processors. In Chapter 11, a hier-archical performance model is developed to aid in the design of algorithms for a hybrid architecture thathas both recongurable and general purpose processors. 2008 by Taylor & Francis Group, LLCxvi PrefaceIn Chapter 12, a general purpose-methodology called Hierarchical Performance Modeling (HPM) toevaluate the performance of distributed software systems is provided. It consists of a series of separatemodeling levels to match abstraction layers present in distributed systems. 
Each level represents a differentview of the system at a particular amount of details.A partitioned optical passive star (POPS) network is closely related to the OTIS network of Chapter 5.In a POPS(d, g) there are n = dg processors. These processors are divided into g groups with d nodes inevery group. There is an optical coupler between every pair of groups, so that at any given time step anycoupler can receive a message froma processor of the source group and broadcast it to the processors of thedestination group. In Chapter 13, algorithms are presented on the POPS network for several fundamentalproblems such as packet routing and selection.Chapter 14 studies the power of the recongurable mesh (R-Mesh) model. Many variants of the R-Meshare described. Algorithms for fundamental problems such as permutation routing, neighbor localization,prex sums, sorting, graph problems, and so forth are presented. The relationship between R-Mesh andIn Chapter 15 the subject matter is also the recongurable mesh. Algorithms for fundamental problemssuch as nding the leftmost 1, compression, prime numbers selection, integer summing, and so forth arepresented.Chapter 16 considers recongurable models that employ optical buses. The one-dimensional versionof this model is called a Linear Array with a Recongurable Pipelined Bus System (LARPBS). This modelas compression, prex computation, sorting, selection, PRAM simulations, and so forth are considered.A comparison of different optical models is also presented in this chapter.Parallel AlgorithmsPeer-to-peer networks consist of a collection of machines connected by some networking infrastruc-ture. Fundamental to such networks is the need for processing queries to locate resources. Thefocus of Chapter 17 is on the development of distributed data structures to process these querieseffectively.Chapter 18 gives an introduction to the design of parallel algorithms using the probabilistic method.Algorithms of this kind usually possess a randomized sequential counterpart. Parallelization of suchalgorithms is inherently linked with derandomization, either with the Erd osSpencer method ofconditional probabilities or exhaustive search in a polynomial-sized sample space.Networks of Workstations (NOWs) are a popular alternative to massively parallel machines and arewidely used. By simply using off-the-shelf PCs, a very powerful workstation cluster can be created, and thiscan provide a high amount of parallelismat relatively lowcost. Chapter 19 addresses issues and challengesin implementing broadcast and multicast on such platforms.Chapter 20 presents a survey of some recent advances in the atomic congestion games literature. Themainfocus is ona special case of congestiongames, called network congestion games, whichare of particularinterest to the networking community. The algorithmic questions of interest include the existence of pureNash equilibria.Grid scheduling requires a series of challenging tasks. These include searching for resources in collec-tions of geographically distributed heterogeneous computing systems and making scheduling decisions,taking into consideration quality of service. In Chapter 21, key issues related to grid scheduling aredescribed and solutions for them are surveyed.Chapter 22 surveys a set of scheduling algorithms that work for different models of sharing, resource,and request requirements. An idealized uid ow model of fair service is described, followed by practicalschedulers using discrete service models such as WFQ, SC, and WF2Q. 
Algorithms for parallel storageservers are also given.In Chapter 23, the problem of packet scheduling in wavelength-division-multiplexing (WDM) opticalinterconnects with limited range wavelength conversion and shared buffer is treated. Parallel algorithms 2008 by Taylor & Francis Group, LLCis closely related to the models described in Chapters 4, 11, 13, 14, and 15. Fundamental problems suchother models such as the PRAM, Boolean circuits, and Turing machines is brought out.Preface xviifor nding an optimal packet schedule are described. Finding an optimal schedule is reduced to a matchingproblem in a bipartite graph.Chapter 24 presents a summary of the past and current research on multiprocessor scheduling algorithmsfor hard real-time systems. It highlights some of the most important theoretical results on this topic overa 30-year period, beginning with the seminal paper of Liu and Layland in 1973, which provided impetusto much of the subsequent research in real-time scheduling theory.Maximal independent set and maximal matching are fundamental problems studied in computerscience. In the sequential case, a greedy linear time algorithm can be used to solve these problems.However, in parallel computation these problems are not trivial. Chapter 25 provides details of thederandomization technique and its applications to the maximal independent set and maximal matchingproblems.Chapter 26 focuses on algorithm design and engineering techniques that better t current (or nearfuture) architectures and help achieve good practical performance for solving arbitrary, sparse instancesof fundamental graphproblems. The target architecture is shared-memory machines, including symmetricmultiprocessors (SMPs) and the multithreaded architecture.Large-scale scientic data sets are appearing at an increasing rate, whose sizes can range from hundredsof gigabytes to tens of terabytes. Isosurface extraction and rendering is an important visualization tech-nique that enables the visual exploration of such data sets using surfaces. In Chapter 27, basic sequentialand parallel techniques used to extract and render isosurfaces with a particular focus on out-of-coretechniques are presented.Chapter 28 surveys computational paradigms offered by the recongurability of bus systems to obtainfast and ultra fast algorithms for a number of fundamental low and mid-level vision tasks. Applicationsconsidered range from the detection of man-made objects in a surrounding environment to automatictarget recognition.There are several current trends and associated challenges in Dense Linear Algebra (DLA) that inu-ence the development of DLA software libraries. Chapter 29 identies these trends, addresses the newchallenges, and consequently outlines a prospectus for new releases of the LAPACK and ScaLAPACKlibraries.Chapter 30 presents several polylogarithmic-time parallel string algorithms for the Parallel RandomAccess Machine (PRAM) models. Problems considered include string matching, construction of thedictionary of basic subwords, sufx arrays, and sufx trees. Sufx tree is a very useful data structure instring algorithmics.Graph theoretic and combinatorial problems arise in several traditional and emerging scientic discip-lines such as VLSI design, optimization, databases, and computational biology. Some examples includephylogeny reconstruction, proteinprotein interaction network, placement and layout in VLSI chips, andso forth. 
Chapter 31 presents fast parallel algorithms for several fundamental graph theoretic problems,optimized for multithreaded architectures such as the Cray MTA-2.Volumes of data are exploding in both scientic and commercial domains. Data mining techniques thatextract information fromhuge amount of data have become popular in many applications. In Chapter 32,parallel algorithms for association rule mining and clustering are presented.Algorithmics research in mobile computing is still in its infancy, and dates back to only a few years. Theelegance and terseness that exist today in algorithmics, especially parallel computing algorithmics, can bebrought to bear on research on mobile computing algorithmics. Chapter 33 presents an overviewin thisdirection.Parallel ApplicationsOne of the most signicant obstacles to high-performance computing is latency. Working with massivedata on a cluster induces latency in two forms: accessing data on disk and interprocessor communication.Chapter 34 explains how pipelines can mitigate latency and describes FG, a programming environmentthat improves pipeline-structured programs. 2008 by Taylor & Francis Group, LLCxviii PrefaceAn important aspect of any large-scale scientic application is data storage and retrieval. I/Otechnologylags behind other computing components by several orders of magnitude with a performance gap that isstill growing. Chapter 35 presents many powerful I/O techniques applied to each stratum of the parallelI/O software stack.Chapter 36surveys algorithms, complexity issues, andapplications for message disseminationproblemsdened over networks based on modern communication primitives. More specically, the problem ofdisseminating messages in parallel and distributed systems under the multicasting communication modeis discussed.In Chapter 37, fundamental network problems that have attracted considerable attention in thealgorithms community over the past 510 years are studied. Problems of interest are related tocommunication and data transfer, which are premier issues in high-performance networks today.An important optimization problem that has to be solved by the base stations in wireless cellularnetworks that utilize frequency division multiplexing (FDM) technology is the call admission controlproblem. Given a spectrum of available frequencies and users that wish to communicate with their basestation, the problem is to maximize the benet, that is, the number of users that communicate withoutsignal interference. Chapter 38 studies the online version of the call admission control problem undera competitive analysis perspective.In ad hoc wireless networks, the establishment of typical communication patterns like broadcasting,multicasting, andgroupcommunicationis strongly relatedto energy consumption. Since energy is a scarceresource, corresponding minimum energy communication problems arise. Chapter 39 considers a seriesof such problems on a suitable combinatorial model for ad hoc wireless networks and surveys recentapproximation algorithms and inapproximability results.Chapter 40investigates the problemof power aware scheduling. Techniques that explore different degreesof parallelism in a schedule and generate an energy-efcient canonical schedule are surveyed. A canonicalschedule corresponds to the case where all the tasks take their worst case execution times.Parallel and distributed systems may operate in an environment that undergoes unpredictable changescausing certain system performance features to degrade. 
Such systems need robustness to guaranteelimited degradation despite some uctuations in the behavior of their component parts or environment.Chapter 41 investigates the robustness of an allocation of resources to tasks in parallel and distributedsystems.Networks of workstations are widely considered a cost-efcient alternative to supercomputers. Despitethis consensus and high demand, there is no general parallel processing framework that is accepted andused by all. Chapter 42 discusses such a framework and addresses difculties that arise in the use of NOWs.Chapter 43 focuses on analytical approaches to scalability analysis of multicore processors. In relativelyearly stages of hardware development, such studies can effectively guide hardware design. For manyapplications, parallel algorithms scaling to very large congurations may not be the same as those that yieldhigh efciencies on smaller congurations. For these desired predictive properties, analytical modeling iscritical.Chapter 44 studies spatial locality-based parallel domain decomposition methods. An overview ofsome of the most widely used techniques such as orthogonal recursive bisection (ORB), space llingcurves (SFCs), and parallel octrees is given. Parallel algorithms for computing the specied domaindecompositions are surveyed as well.Data replication is an essential technique employed to reduce the user access time in distributed com-puting systems. Algorithms available for the data replication problem (DRP) range from the traditionalmathematical optimizationtechniques, suchas linear programming, dynamic programming, andsoforth,to the biologically inspired metaheuristics. Chapter 45 aims to introduce game theory as a new oracle totackle the data replication problem.In Chapter 46, three initiatives that are designed to present a solution to the data service requirements ofthe Advanced Computational Data Center Grid (ACDCGrid) are presented. Discussions in this chapter alsoinclude two data grid le utilization simulation tools, the Scenario Builder and the Intelligent Migrator,which examine the complexities of delivering a distributed storage network. 2008 by Taylor & Francis Group, LLCPreface xixChapter 47 presents fast and scalable parallel matrix multiplication algorithms and their applicationsto distributed memory systems. This chapter also describes a processor-efcient parallelization of thebest-known sequential matrix multiplication algorithm on a distributed memory system.Intended Use of the BookThis book is meant for use by researchers, developers, educators, and students in the area of parallelcomputing. Since parallel computing has found applications in a wide variety of domains, the ideas andtechniques illustrated in this book should prove useful to a very wide audience.This book can also be used as a text in a graduate course dealing with parallel computing. Graduatestudents who plan to conduct research in this area will nd this book especially invaluable. The book canalso be used as a supplement to any course on algorithms, complexity, or applied computing.Other ReadingA partial list of books that deal with randomization is given below. This is by no means an exhaustive listof all the books in the area.1. T.H. Cormen, C.E. Leiserson, R.L. Rivest, and C. Stein, Introduction to Algorithms, Second edition,MIT Press, Cambridge, MA, 2001.2. E. Horowitz, S. Sahni, and S. Rajasekaran, Computer Algorithms, W. H. Freeman, New York, 1998.3. J. J J, An Introduction to Parallel Algorithms, Addison-Wesley Publishers, Reading, 1992.4. 
T. Leighton, Introduction to Parallel Algorithms and Architectures: ArraysTreesHypercube,Morgan-Kaufmann Publishers, San Mateo, CA, 1992.5. P.M. Pardalos and S. Rajasekaran, editors, Advances in Randomized Parallel Computing, KluwerAcademic Publishers, Boston, 1999.6. J.H. Reif, editor, Synthesis of Parallel Algorithms, Morgan-Kaufmann Publishers, San Mateo, CA,1992.7. M.T. Goodrich and R. Tamassia, Algorithm Design: Foundations, Analysis, and Internet Examples,John Wiley & Sons, Inc., New York, 2002. 2008 by Taylor & Francis Group, LLCAcknowledgmentsWe are very thankful to the authors who have written these chapters under a very tight schedule. Wethank the staff of Chapman & Hall/CRC, in particular Bob Stern and Jill Jurgensen. We also gratefullyacknowledge the partial support fromthe National Science Foundation through grants CCR-9912395 andITR-0326155.Sanguthevar RajasekaranJohn H. Reifxxi 2008 by Taylor & Francis Group, LLCEditorsSanguthevar Rajasekaran received his ME degree in automation from the Indian Institute of Science,Bangalore, in 1983, and his PhD degree in computer science from Harvard University in 1988. Currentlyhe is the UTC chair professor of computer science and engineering at the University of Connecticut andthe director of Booth Engineering Center for Advanced Technologies (BECAT). Before joining UConn,he has served as a faculty member in the CISE Department of the University of Florida and in the CISDepartment of the University of Pennsylvania. During 20002002 he was the chief scientist for ArcotSystems. His research interests include parallel algorithms, bioinformatics, data mining, randomizedcomputing, computer simulations, and combinatorial optimization. He has published more than 150articles in journals and conferences. He has coauthored two texts on algorithms and coedited four bookson algorithms and related topics. He is an elected member of the Connecticut Academy of Science andEngineering.JohnReif is Hollis Edens distinguished professor at Duke University. He developed numerous parallel andrandomized algorithms for various fundamental problems including the solution of large sparse systems,sorting, graph problems, data compression, and so forth. He also made contributions to practical areasof computer science including parallel architectures, data compression, robotics, and optical computing.He is the author of over 200 papers and has edited four books on synthesis of parallel and randomizedalgorithms. He is a fellow of AAAS, IEEE, and ACM.xxiii 2008 by Taylor & Francis Group, LLCContributorsIshfaq AhmadDepartment of Computer Scienceand EngineeringUniversity of TexasArlington, TexasSelimG. AklSchool of ComputingQueens UniversityKingston, Ontario, CanadaSusanne AlbersDepartment of ComputerScienceUniversity of FreiburgFreiburg, GermanyShoukat AliPlatform Validation EngineeringIntel CorporationSacramento, CaliforniaSrinivas AluruDepartment of Electrical andComputer EngineeringIowa State UniversityAmes, IowaReda A. AmmarComputer Science andEngineering DepartmentUniversity of ConnecticutStorrs, ConnecticutDavid A. BaderGeorgia Institute ofTechnologyAtlanta, GeorgiaGianfranco BilardiDepartment of InformationEngineeringUniversity of PadovaPadova, ItalyDavid BindelUniversity of CaliforniaBerkeley, CaliforniaAnu G. BourgeoisDepartment of Computer ScienceGeorgia State UniversityAtlanta, GeorgiaAlfredo ButtariUniversity of TennesseeKnoxville, TennesseeGeorge C. 
CarageaDepartment of Computer ScienceUniversity of MarylandCollege Park, MarylandIoannis CaragiannisUniversity of Patras and ResearchAcademic ComputerTechnology InstitutePatras, GreeceBruce R. ChildersDepartment of Computer ScienceUniversity of PittsburghPittsburgh, PennsylvaniaAvery ChingNorthwestern UniversityEvanston, IllinoisAlok ChoudharyNorthwestern UniversityEvanston, IllinoisKenin ColomaNorthwestern UniversityEvanston, IllinoisGuojing CongIBM T.J. Watson Research CenterYorktown Heights, New YorkThomas H. CormenDartmouth CollegeHanover, New HampshireElena Riccio DavidsonDartmouth CollegeHanover, New HampshireJaime DavilaDepartment of Computer Scienceand EngineeringUniversity of ConnecticutStorrs, Connecticutxxv 2008 by Taylor & Francis Group, LLCxxvi ContributorsJames DemmelUniversity of CaliforniaBerkeley, CaliforniaJack DongarraUniversity of TennesseeKnoxville, TennesseeandOak Ridge National LaboratoryOak Ridge, TennesseeMichael FactorIBM Research LaboratoryHaifa, IsraelJohn FeoMicrosoft CorporationRedmond, WashingtonAlex GontmakherTechnionIsrael Institute ofTechnologyHaifa, IsraelTeolo F. GonzalezUniversity of CaliforniaSanta Barbara, CaliforniaMichael T. GoodrichUniversity of CaliforniaIrvine, CaliforniaAnanth GramaPurdue UniversityWest Lafayette, IndianaMing GuUniversity of CaliforniaBerkeley, CaliforniaAjay GulatiDepartment of Electrical andComputer EngineeringRice UniversityHouston, TexasYijie HanUniversity of MissouriKansasCityKansas City, MissouriYozo HidaUniversity of CaliforniaBerkeley, CaliforniaOscar H. IbarraDepartment of ComputerScienceUniversity of CaliforniaSanta Barbara, CaliforniaJoseph JaJaInstitute for Advanced ComputerStudies and Department ofElectrical and ComputerEngineeringUniversity of MarylandCollege Park, MarylandWilliamKahanUniversity of CaliforniaBerkeley, CaliforniaChristos KaklamanisUniversity of Patras and ResearchAcademic ComputerTechnology InstitutePatras, GreecePanagiotis KanellopoulosUniversity of Patras and ResearchAcademic ComputerTechnology InstitutePatras, GreeceSamee Ullah KhanDepartment of Electrical andComputer EngineeringColorado State UniversityFort Collins, ColoradoSamir KhullerUniversity of MarylandCollege Park, MarylandYoo-Ah KimComputer Science andEngineering DepartmentUniversity of ConnecticutStorrs, ConnecticutLasse KliemannDepartment of Computer ScienceChristian Albrechts UniversityKiel, GermanySpyros KontogiannisComputer Science DepartmentUniversity of IoanninaIoannina, GreeceEvangelos KranakisSchool of Computer ScienceCarleton UniversityOttawa, CanadaDanny KrizancDepartment of Mathematics andComputer ScienceWesleyan UniversityMiddletown, ConnecticutVipin KumarDepartment of ComputerScienceUniversity of MinnesotaMinneapolis, MinnesotaJakub KurzakUniversity of TennesseeKnoxville, TennesseeJulie LangouUniversity of TennesseeKnoxville, TennesseeJulien LangouUniversity of ColoradoDenver, ColoradoBryant C. LeeComputer Science DepartmentCarnegie Mellon UniversityPittsburgh, PennsylvaniaYoung Choon LeeSchool of InformationTechnologiesThe University of SydneySydney, AustraliaJianwei LiNorthwestern UniversityEvanston, Illinois 2008 by Taylor & Francis Group, LLCContributors xxviiKeqin LiDepartment of ComputerScienceState University of New YorkNew Paltz, New YorkXiaoye LiLawrence Berkeley NationalLaboratoryBerkeley, CaliforniaWei-keng LiaoNorthwestern UniversityEvanston, IllinoisYing LiuDTKE CenterGraduate University of theChinese Academy of SciencesBeijing, ChinaPiotr LuszczekMathWorksNatick, MassachusettsAnthony A. 
MaciejewskiDepartment of Electrical andComputer EngineeringColorado State UniversityFort Collins, ColoradoKamesh MadduriComputational Science andEngineering DivisionCollege of ComputingGeorgia Institute of TechnologyAtlanta, GeorgiaOsni MarquesLawrence Berkeley NationalLaboratoryBerkeley, CaliforniaRami MelhemDepartment of ComputerScienceUniversity of PittsburghPittsburgh, PennsylvaniaAvi MendelsonIntel IsraelMTM Scientic IndustriesCenterHaifa, IsraelRuss MillerCenter for ComputationalResearchDepartment of Computer Scienceand EngineeringState University of New YorkBuffalo, New YorkDaniel MossDepartment of Computer ScienceUniversity of PittsburghPittsburgh, PennsylvaniaKoji NakanoDepartment of InformationEngineeringHiroshima UniversityHigashi Hiroshima, JapanMichael J. NelsonUniversity of CaliforniaIrvine, CaliforniaStephan OlariuDepartment of Computer ScienceOld Dominion UniversityNorfolk, VirginiaMichael A. PalisRutgers UniversityCamden, New JerseyEvi PapaioannouUniversity of Patras and ResearchAcademic ComputerTechnology InstitutePatras, GreeceBeresford ParlettUniversity of CaliforniaBerkeley, CaliforniaAndrei P aunDepartment of ComputerScience/IfMLouisiana Tech UniversityRuston, LouisianaAndrea PietracaprinaDepartment of InformationEngineeringUniversity of PadovaPadova, ItalyViktor K. PrasannaDepartment of ElectricalEngineeringUniversity of Southern CaliforniaLos Angeles, CaliforniaGeppino PucciDepartment of InformationEngineeringUniversity of PadovaPadova, ItalySanguthevar RajasekaranDepartment of Computer Scienceand EngineeringUniversity of ConnecticutStorrs, ConnecticutSergio RajsbaumInstituto de MatemticasUniversidad Nacional Autnomade MxicoCiudad Universitaria, MexicoE. Jason RiedyUniversity of CaliforniaBerkeley, CaliforniaCatherine L. RubyCenter for ComputationalResearchDepartment of Computer Scienceand EngineeringState University of New YorkBuffalo, New YorkWojciech RytterWarsaw UniversityWarsaw, PolandSartaj SahniUniversity of FloridaGainesville, Florida 2008 by Taylor & Francis Group, LLCxxviii ContributorsNicola SantoroSchool of Computer ScienceCarleton UniversityOttawa, CanadaAssaf SchusterTechnionIsrael Institute ofTechnologyHaifa, IsraelRonald ScrofanoComputer Science DepartmentUniversity of Southern CaliforniaLos Angeles, CaliforniaSudip SealDepartment of Electrical andComputer EngineeringIowa State UniversityAmes, IowaKonstantin ShaginTechnionIsrael Institute ofTechnologyHaifa, IsraelQingmin ShiInstitute for Advanced ComputerStudiesUniversity of MarylandCollege Park, MarylandGregory ShkloverTechnionIsrael Institute ofTechnologyHaifa, IsraelHoward Jay SiegelDepartment of Electrical andComputer EngineeringDepartment of Computer ScienceColorado State UniversityFort Collins, ColoradoPaul G. SpirakisResearch Academic ComputerTechnology InstituteUniversity of PatrasPatras, GreeceAnand SrivastavDepartment of ComputerScienceChristian Albrechts UniversityKiel, GermanyStanimire TomovComputer Science DepartmentUniversity of TennesseeKnoxville, TennesseeJerry L. TrahanDepartment of Electrical andComputer EngineeringLouisiana State UniversityBaton Rouge, LouisianaRamachandran VaidyanathanDepartment of Electrical andComputer EngineeringLouisiana State UniversityBaton Rouge, LouisianaPeter J. 
VarmanDepartment of Electrical andComputer EngineeringRice UniversityHouston, TexasAmitabh VarshneyDepartment of ComputerScience and Institute forAdvanced Computer StudiesUniversity of MarylandCollege Park, MarylandUzi VishkinUniversity of Maryland Institutefor Advanced Computer StudiesDepartment of Electrical andComputer EngineeringUniversity of MarylandCollege Park, MarylandChristof VoemelUniversity of CaliforniaBerkeley, CaliforniaYung-Chun (Justin) WanGoogle Inc.Mountain View, CaliforniaChih-Fang WangDepartment of ComputerScienceSouthern Illinois University atCarbondaleCarbondale, IllinoisPeter WidmayerInstitute for TheoreticalInformaticsETH ZurichZurich, SwitzerlandYuanyuan YangDepartmental of Electricaland Computer EngineeringState University of New York atStony BrookStony Brook, New YorkZhenghao ZhangDepartmental of Electrical andComputer EngineeringState University of New York atStony BrookStony Brook, New YorkDakai ZhuUniversity of Texasat San AntonioSan Antonio, TexasAlbert Y. ZomayaDepartment of ComputerScienceSchool of InformationTechnologiesThe University of SydneySydney, Australia 2008 by Taylor & Francis Group, LLCIModels 2008 by Taylor & Francis Group, LLC1EvolvingComputationalSystemsSelim G. AklQueens University1.1 Introduction. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-11.2 Computational Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-2Time and Speed What Does it Mean to Compute? Sequential Model Parallel Model A FundamentalAssumption1.3 Time-Varying Variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-4Quantum Decoherence Sequential Solution ParallelSolution1.4 Time-Varying Computational Complexity . . . . . . . . . . . . 1-6Examples of Increasing Functions C(t ) Computing withDeadlines Accelerating Machines1.5 Rank-Varying Computational Complexity . . . . . . . . . . . . 1-9An Algorithmic Example: Binary Search The InverseQuantum Fourier Transform1.6 Interacting Variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-12Disturbing the Equilibrium Solutions Distinguishabilityin Quantum Computing1.7 Computations Obeying Mathematical Constraints . . . 1-17Mathematical Transformations Sequential Solution Parallel Solution1.8 The Universal Computer is a Myth . . . . . . . . . . . . . . . . . . . . . 1-191.9 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-20Acknowledgment. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-20References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-211.1 IntroductionThe universe in which we live is in a constant state of evolution. People age, trees grow, the weather varies.From one moment to the next, our world undergoes a myriad of transformations. Many of these changesare obvious to the naked eye, others more subtle. Deceptively, some appear to occur independently of anydirect external inuences. Others are immediately perceived as the result of actions by other entities.1-1 2008 by Taylor & Francis Group, LLC1-2 Handbook of Parallel Computing: Models, Algorithms and ApplicationsIn the realm of computing, it is generally assumed that the world is static. The vast majority ofcomputations take place in applications where change is thought of, rightly or wrongly, as inexistent orirrelevant. 
Input data are read, algorithms are applied to them, and results are produced. The possibility that the data, the algorithms, or even the results sought may vary during the process of computation is rarely, if ever, contemplated.

In this chapter we explore the concept of evolving computational systems. These are systems in which everything in the computational process is subject to change. This includes inputs, algorithms, outputs, and even the computing agents themselves. A simple example of a computational paradigm that meets this definition of an evolving system to a limited extent is that of a computer interacting in real time with a user while processing information. Our focus here is primarily on certain changes that may affect the data required to solve a problem. We also examine changes that affect the complexity of the algorithm used in the solution. Finally, we look at one example of a computer capable of evolving with the computation.

A number of evolving computational paradigms are described for which a parallel computing approach is most appropriate. In Sections 1.3, 1.4, and 1.5, time plays an important role either directly or indirectly in the evolution of the computation. Thus, it is the passage of time that may cause the change in the data. In another context, it may be the order in which a stage of an algorithm is performed that determines the number of operations required by that stage. The effect of time on evolving computations is also used to contrast the capabilities of a parallel computer with those of an unconventional model of computation known as the accelerating machine. In Sections 1.6 and 1.7, it is not time but rather external agents acting on the data that are responsible for a variable computation. Thus, the data may be affected by a measurement that perturbs an existing equilibrium, or by a modification in a mathematical structure that violates a required condition. Finally, in Section 1.8 evolving computations allow us to demonstrate the impossibility of achieving universality in computing. Our conclusions are offered in Section 1.9.

1.2 Computational Models

It is appropriate at the outset that we define our models of computation. Two such models are introduced in this section, one sequential and one parallel. (A third model, the accelerating machine, is defined in Section 1.4.3.) We begin by stating clearly our understanding regarding the meaning of time, and our assumptions in connection with the speed of processors.

1.2.1 Time and Speed

In the classical study of algorithms, whether sequential or parallel, the notion of a time unit is fundamental to the analysis of an algorithm's running time. A time unit is the smallest discrete measure of time. In other words, time is divided into consecutive time units that are indivisible. All events occur at the beginning of a time unit. Such events include, for example, a variable changing its value, a processor undertaking the execution of a step in its computation, and so on.

It is worth emphasizing that the length of a time unit is not an absolute quantity. Instead, the duration of a time unit is specified in terms of a number of factors. These include the parameters of the computation at hand, such as the rate at which the data are received, or the rate at which the results are to be returned. Alternatively, a time unit may be defined in terms of the speed of the processors available (namely, the single processor on a sequential computer and each processor on a parallel computer). In the latter case, a faster processor implies a smaller time unit.

In what follows, the standard definition of time unit is adopted, namely: A time unit is the length of time required by a processor to perform a step of its computation. Specifically, during a time unit, a processor executes a step consisting of
1. A read operation in which it receives a constant number of fixed-size data as input
2. A calculate operation in which it performs a fixed number of constant-time arithmetic and logical calculations (such as adding two numbers, comparing two numbers, and so on)
3. A write operation in which it returns a constant number of fixed-size data as output

All other occurrences external to the processor (such as the data arrival rate, for example) will be set and measured in these terms. Henceforth, the term elementary operation is used to refer to a read, a calculate, or a write operation.

1.2.2 What Does it Mean to Compute?

An important characteristic of the treatment in this chapter is the broad perspective taken to define what it means to compute. Specifically, computation is a process whereby information is manipulated by, for example, acquiring it (input), transforming it (calculation), and transferring it (output). Any form of information processing (whether occurring spontaneously in nature, or performed on a computer built by humans) is a computation. Instances of computational processes include
1. Measuring a physical quantity
2. Performing an arithmetic or logical operation on a pair of numbers
3. Setting the value of a physical quantity
to cite but a few. These computational processes themselves may be carried out by a variety of means, including, of course, conventional (electronic) computers, and also through physical phenomena [35], chemical reactions [1], and transformations in living biological tissue [42]. By extension, parallel computation is defined as the execution of several such processes of the same type simultaneously.

1.2.3 Sequential Model

This is the conventional model of computation used in the design and analysis of sequential (also known as serial) algorithms. It consists of a single processor made up of circuitry for executing arithmetic and logical operations and a number of registers that serve as internal memory for storing programs and data. For our purposes, the processor is also equipped with an input unit and an output unit that allow it to receive data from, and send data to, the outside world, respectively.

During each time unit of a computation the processor can perform
1. A read operation, that is, receive a constant number of fixed-size data as input
2. A calculate operation, that is, execute a fixed number of constant-time calculations on its input
3. A write operation, that is, return a constant number of fixed-size results as output

It is important to note here that the read and write operations can be, respectively, from and to the model's internal memory. In addition, both the reading and writing may be, on occasion, from and to an external medium in the environment in which the computation takes place. Several incarnations of this model exist, in theory and in practice [40]. In what follows, we refer to this model of computation as a sequential computer.

1.2.4 Parallel Model

Our chosen model of parallel computation consists of n processors, numbered 1 to n, where n ≥ 2. Each processor is of the type described in Section 1.2.3.
In the latter case,a faster processor implies a smaller time unit.In what follows, the standard denition of time unit is adopted, namely: A time unit is the lengthof time required by a processor to perform a step of its computation. Specically, during a time unit, aprocessor executes a step consisting of1. A read operation in which it receives a constant number of xed-size data as input2. A calculate operation in which it performs a xed number of constant-time arithmetic and logicalcalculations (such as adding two numbers, comparing two numbers, and so on)3. A write operation in which it returns a constant number of xed-size data as output 2008 by Taylor & Francis Group, LLCEvolving Computational Systems 1-3All other occurrences external to the processor (such as the data arrival rate, for example) will be set andmeasured in these terms. Henceforth, the term elementary operation is used to refer to a read, a calculate,or a write operation.1.2.2 What Does it Mean to Compute?An important characteristic of the treatment in this chapter is the broad perspective taken to dene whatit means to compute. Specically, computation is a process whereby information is manipulated by, forexample, acquiring it (input), transforming it (calculation), and transferring it (output). Any form ofinformation processing (whether occurring spontaneously in nature, or performed on a computer builtby humans) is a computation. Instances of computational processes include1. Measuring a physical quantity2. Performing an arithmetic or logical operation on a pair of numbers3. Setting the value of a physical quantityto cite but a few. These computational processes themselves may be carried out by a variety of means,including, of course, conventional (electronic) computers, and also through physical phenomena [35],chemical reactions [1], and transformations in living biological tissue [42]. By extension, parallelcomputation is dened as the execution of several such processes of the same type simultaneously.1.2.3 Sequential ModelThis is the conventional model of computation used in the design and analysis of sequential (also knownas serial ) algorithms. It consists of a single processor made up of circuitry for executing arithmetic andlogical operations and a number of registers that serve as internal memory for storing programs and data.For our purposes, the processor is also equipped with an input unit and an output unit that allow it toreceive data from, and send data to, the outside world, respectively.During each time unit of a computation the processor can perform1. A read operation, that is, receive a constant number of xed-size data as input2. A calculate operation, that is, execute a xed number of constant-time calculations on its input3. A write operation, that is, return a constant number of xed-size results as outputIt is important to note here that the read and write operations can be, respectively, from and to themodels internal memory. In addition, both the reading and writing may be, on occasion, from and to anexternal medium in the environment in which the computation takes place. Several incarnations of thismodel exist, in theory and in practice [40]. In what follows, we refer to this model of computation as asequential computer.1.2.4 Parallel ModelOur chosen model of parallel computation consists of n processors, numbered 1 to n, where n 2. Eachprocessor is of the type described in Section 1.2.3. 
1.2.4 Parallel Model

Our chosen model of parallel computation consists of n processors, numbered 1 to n, where n ≥ 2. Each processor is of the type described in Section 1.2.3. The processors are connected in some fashion and are able to communicate with one another in order to exchange data and results. These exchanges may take place through an interconnection network, in which case selected pairs of processors are directly connected by two-way communication links. Pairs of processors not directly connected communicate indirectly by creating a route for their messages that goes through other processors. Alternatively, all exchanges may take place via a global shared memory that is used as a bulletin board. A processor wishing to send a datum to another processor does so by writing it to the shared memory, from where it is read by the other processor. Several varieties of interconnection networks and modes of shared memory access exist, and any of them would be adequate as far as we are concerned here. The exact nature of the communication medium among the processors is of no consequence to the results described in this chapter. A study of various ways of connecting processors is provided in [2].

During each time unit of a computation a processor can perform

1. A read operation, that is, receive as input a constant number of fixed-size data
2. A calculate operation, that is, execute a fixed number of constant-time calculations on its input
3. A write operation, that is, return as output a constant number of fixed-size results

As with the sequential processor, the input can be received from, and the output returned to, either the internal memory of the processor or the outside world. In addition, a processor in a parallel computer may receive its input from and return its output to the shared memory (if one is available), or another processor (through a direct link or via a shared memory, whichever is available). Henceforth, this model of computation will be referred to as a parallel computer [3].

1.2.5 A Fundamental Assumption

The analyses in this chapter assume that all models of computation use the fastest processors possible (within the bounds established by theoretical physics). Specifically, no sequential computer exists that is faster than the one of Section 1.2.3, and similarly no parallel computer exists whose processors are faster than those of Section 1.2.4. Furthermore, no processor on the parallel computer of Section 1.2.4 is faster than the processor of the sequential computer of Section 1.2.3. This is the fundamental assumption in parallel computation. It is also customary to suppose that the sequential and parallel computers use identical processors. We adopt this convention throughout this chapter, with a single exception: In Section 1.4.3, we assume that the processor of the sequential computer is in fact capable of increasing its speed at every step (at a pre-established rate, so that the number of operations executable at every consecutive step is known a priori and fixed once and for all).

1.3 Time-Varying Variables

For a positive integer n larger than 1, we are given n functions, each of one variable, namely, F0, F1, . . . , Fn−1, operating on the n variables x0, x1, . . . , xn−1, respectively. Specifically, it is required to compute Fi(xi), for i = 0, 1, . . . , n − 1. For example, Fi(xi) may be equal to xi^2.

What is unconventional about this computation is the fact that the xi are themselves functions that vary with time. It is therefore appropriate to write the n variables as

x0(t), x1(t), . . . , xn−1(t),

that is, as functions of the time variable t.
It is important to note here that, while it is known that the xi change with time, the actual functions that effect these changes are not known (e.g., xi may be a true random variable).

All the physical variables exist in their natural environment within which the computation is to take place. They are all available to be operated on at the beginning of the computation. Thus, for each variable xi(t), it is possible to compute Fi(xi(t)), provided that a computer is available to perform the calculation (and subsequently return the result).

Recall that time is divided into intervals, each of duration one time unit. It takes one time unit to evaluate Fi(xi(t)). The problem calls for computing Fi(xi(t)), 0 ≤ i ≤ n − 1, at time t = t0. In other words, once all the variables have assumed their respective values at time t = t0, the functions Fi are to be evaluated for all values of i. Specifically,

F0(x0(t0)), F1(x1(t0)), . . . , Fn−1(xn−1(t0))

are to be computed. The fact that xi(t) changes with the passage of time should be emphasized here. Thus, if xi(t) is not operated on at time t = t0, then after one time unit xi(t0) becomes xi(t0 + 1), and after two time units it is xi(t0 + 2), and so on. Indeed, time exists as a fundamental fact of life. It is real, relentless, and unforgiving. Time cannot be stopped, much less reversed. (For good discussions of these issues, see [28, 45].) Furthermore, for k > 0, not only is each value xi(t0 + k) different from xi(t0) but also the latter cannot be obtained from the former. We illustrate this behavior through an example from physics.

1.3.1 Quantum Decoherence

A binary variable is a mathematical quantity that takes exactly one of a total of two possible values at any given time. In the base 2 number system, these values are 0 and 1, and are known as binary digits or bits. Today's conventional computers use electronic devices for storing and manipulating bits. These devices are in either one or the other of two physical states at any given time (e.g., two voltage levels), one representing 0, the other 1. We refer to such a device, as well as the digit it stores, as a classical bit.

In quantum computing, a bit (aptly called a quantum bit, or qubit) is both 0 and 1 at the same time. The qubit is said to be in a superposition of the two values. One way to implement a qubit is by encoding the 0 and 1 values using the spin of an electron (e.g., clockwise, or up for 1, and counterclockwise, or down for 0). Formally, a qubit is a unit vector in a two-dimensional state space, for which a particular orthonormal basis, denoted by {|0⟩, |1⟩}, has been fixed. The two basis vectors |0⟩ and |1⟩ correspond to the possible values a classical bit can take. However, unlike classical bits, a qubit can also take many other values. In general, an arbitrary qubit can be written as a linear combination of the computational basis states, namely, α|0⟩ + β|1⟩, where α and β are complex numbers such that |α|^2 + |β|^2 = 1.

Measuring the value of the qubit (i.e., reading it) returns a 0 with probability |α|^2 and a 1 with probability |β|^2. Furthermore, the measurement causes the qubit to undergo decoherence (literally, to lose its coherence). When decoherence occurs, the superposition is said to collapse: any subsequent measurement returns the same value as the one obtained by the first measurement. The information previously held in the superposition is lost forever. Henceforth, the qubit no longer possesses its quantum properties and behaves as a classical bit [33].
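As a purely numerical illustration of this behavior, the following Python sketch mimics a qubit α|0⟩ + β|1⟩ that returns 0 or 1 with the stated probabilities on its first measurement and thereafter keeps returning the same value; the class and its names serve only the illustration.

    import random

    # Illustrative sketch of a qubit alpha|0> + beta|1> with
    # |alpha|^2 + |beta|^2 = 1. The first measurement returns 0 with
    # probability |alpha|^2 and 1 with probability |beta|^2; the
    # superposition then collapses, so every later measurement repeats
    # the first outcome.

    class Qubit:
        def __init__(self, alpha, beta):
            assert abs(abs(alpha) ** 2 + abs(beta) ** 2 - 1.0) < 1e-9
            self.alpha, self.beta = alpha, beta
            self.collapsed = None          # None while still in superposition

        def measure(self):
            if self.collapsed is None:     # measurement causes decoherence
                self.collapsed = 0 if random.random() < abs(self.alpha) ** 2 else 1
            return self.collapsed

    q = Qubit(alpha=3 / 5, beta=4 / 5)      # probabilities 0.36 and 0.64
    print([q.measure() for _ in range(5)])  # the same value five times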
There is a second way, beside measurement, for decoherence to take place. A qubit loses its coherence simply through prolonged exposure to its natural environment. The interaction between the qubit and its physical surroundings may be thought of as an external action by the latter causing the former to behave as a classical bit, that is, to lose all information it previously stored in a superposition. (One can also view decoherence as the act of the qubit making a mark on its environment by adopting a classical value.) Depending on the particular implementation of the qubit, the time needed for this form of decoherence to take place varies. At the time of this writing, it is well below 1 s (more precisely, in the vicinity of a nanosecond). The information lost through decoherence cannot be retrieved. For the purposes of this example, the time required for decoherence to occur is taken as one time unit.

Now suppose that a quantum system consists of n independent qubits, each in a state of superposition. Their respective values at some time t0, namely, x0(t0), x1(t0), . . . , xn−1(t0), are to be used as inputs to the n functions F0, F1, . . . , Fn−1, in order to perform the computation described at the beginning of Section 1.3, that is, to evaluate Fi(xi(t0)), for 0 ≤ i ≤ n − 1.

1.3.2 Sequential Solution

A sequential computer fails to compute all the Fi as desired. Indeed, suppose that x0(t0) is initially operated upon. It follows that F0(x0(t0)) can be computed correctly. However, when the next variable, x1, for example, is to be used (as input to F1), the time variable would have changed from t = t0 to t = t0 + 1, and we obtain x1(t0 + 1), instead of the x1(t0) that we need. Continuing in this fashion, x2(t0 + 2), x3(t0 + 3), . . . , xn−1(t0 + n − 1), represent the sequence of inputs. In the example of Section 1.3.1, by the time F0(x0(t0)) is computed, one time unit would have passed. At this point, the n − 1 remaining qubits would have undergone decoherence. The same problem occurs if the sequential computer attempts to first read all the xi, one by one, and store them before calculating the Fi.

Since the function according to which each xi changes with time is not known, it is impossible to recover xi(t0) from xi(t0 + i), for i = 1, 2, . . . , n − 1. Consequently, this approach cannot produce F1(x1(t0)), F2(x2(t0)), . . . , Fn−1(xn−1(t0)), as required.

1.3.3 Parallel Solution

For a given n, any computer capable of performing n calculate operations per step can easily evaluate the Fi(xi(t0)), all simultaneously, leading to a successful computation.

Thus, a parallel computer consisting of n independent processors may perform all the computations at once: For 0 ≤ i ≤ n − 1, and all processors working at the same time, processor i computes Fi(xi(t0)). In the example of Section 1.3.1, the n functions are computed in parallel at time t = t0, before decoherence occurs.
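The contrast between the two solutions can be made concrete with a toy simulation. The Python sketch below is only schematic (the drift functions and all names are arbitrary): each xi(t) changes with t, and only the evaluation that samples every variable at t = t0 produces the required values.

    import random

    # Toy simulation: n variables x_i(t) drift with time; the functions F_i
    # square their argument. Evaluating everything at t = t0 (as n processors
    # can, simultaneously) yields F_i(x_i(t0)); touching the variables one per
    # time unit (as a sequential computer must) samples x_i at t0 + i instead.

    n, t0 = 5, 0
    x = [lambda t, s=random.random(): s + t for _ in range(n)]  # x_i(t) = s_i + t
    F = [lambda v: v * v for _ in range(n)]                     # F_i(v) = v^2

    parallel = [F[i](x[i](t0)) for i in range(n)]          # all sampled at t0
    sequential = [F[i](x[i](t0 + i)) for i in range(n)]    # variable i sampled at t0 + i

    print(parallel)     # the required values F_i(x_i(t0))
    print(sequential)   # only the entry for i = 0 agrees with the above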
1.4 Time-Varying Computational Complexity

In traditional computational complexity theory, the size of a problem P plays an important role. If P has size n, for example, then the number of operations required in the worst case to solve P (by any algorithm) is expressed as a function of n. Similarly, the number of operations executed (in the best, average, and worst cases) by a specific algorithm that solves P is also expressed as a function of n. Thus, for example, the problem of sorting a sequence of n numbers requires Ω(n log n) comparisons, and the sorting algorithm Quicksort performs O(n^2) comparisons in the worst case.

In this section, we depart from this model. Here, the size of the problem plays a secondary role. In fact, in most (though not necessarily all) cases, the problem size may be taken as constant. The computational complexity now depends on time. Not only science and technology but also everyday life provide many instances demonstrating time-varying complexity. Thus, for example,

1. An illness may get better or worse with time, making it more or less amenable to treatment.
2. Biological and software viruses spread with time making them more difficult to cope with.
3. Spam accumulates with time making it more challenging to identify the legitimate email needles in the haystack of junk messages.
4. Tracking moving objects becomes harder as they travel away from the observer (e.g., a spaceship racing toward Mars).
5. Security measures grow with time in order to combat crime (e.g., when protecting the privacy, integrity, and authenticity of data, ever stronger cryptographic algorithms are used, i.e., ones that are more computationally demanding to break, thanks to their longer encryption and decryption keys).
6. Algorithms in many applications have complexities that vary with time from one time unit during the computation to the next. Of particular importance here are
   a. Molecular dynamics (the study of the dynamic interactions among the atoms of a system, including the calculation of parameters such as forces, energies, and movements) [18, 39].
   b. Computational fluid dynamics (the study of the structural and dynamic properties of moving objects, including the calculation of the velocity and pressure at various points) [11].

Suppose that we are given an algorithm for solving a certain computational problem. The algorithm consists of a number of stages, where each stage may represent, for example, the evaluation of a particular arithmetic expression such as

c ← a + b.

Further, let us assume that a computational stage executed at time t requires a number C(t) of constant-time operations. As the aforementioned situations show, the behavior of C varies from case to case. Typically, C may be an increasing, decreasing, unimodal, periodic, random, or chaotic function of t. In what follows, we study the effect on computational complexity of a number of functions C(t) that grow with time.

It is worth noting that we use the term stage to refer to a component of an algorithm, hence a variable entity, in order to avoid confusion with a step, an intrinsic property of the computer, as defined in Sections 1.2.1 and 1.4.3. In conventional computing, where computational complexity is invariant (i.e., oblivious to external circumstances), a stage (as required by an algorithm) is exactly the same thing as a step (as executed by a computer). In unconventional computing (the subject of this chapter), computational complexity is affected by its environment and is therefore variable. Under such conditions, one or more steps may be needed in order to execute a stage.

1.4.1 Examples of Increasing Functions C(t)

Consider the following three cases in which the number of operations required to execute a computational stage increases with time. For notational convenience, we use S(i) to express the number of operations performed in executing stage i, at the time when that stage is in fact executed.
Denoting the latter by ti, it is clear that S(i) = C(ti).

1. For t ≥ 0, C(t) = t + 1. Table 1.1 illustrates ti, C(ti), and S(i), for 1 ≤ i ≤ 6.

TABLE 1.1 Number of Operations Required to Complete Stage i When C(t) = t + 1

    Stage i    ti         C(ti)    S(i)
    1          0          C(0)     1
    2          0 + 1      C(1)     2
    3          1 + 2      C(3)     4
    4          3 + 4      C(7)     8
    5          7 + 8      C(15)    16
    6          15 + 16    C(31)    32
    7          31 + 32    C(63)    64

It is clear in this case that S(i) = 2^{i−1}, for i ≥ 1. It follows that the total number of operations performed when executing all stages, from stage 1 up to and including stage i, is

    ∑_{j=1}^{i} 2^{j−1} = 2^i − 1.

It is interesting to note that while C(t) is a linear function of the time variable t, the quantity S(i) grows exponentially with i − 1, where i is the number of stages executed so far. The effect of this behavior on the total number of operations performed is appreciated by considering the following example. When executing a computation requiring log n stages for a problem of size n, 2^{log n} − 1 = n − 1 operations are performed.

2. For t ≥ 0, C(t) = 2^t. Table 1.2 illustrates ti, C(ti), and S(i), for 1 ≤ i ≤ 5.

TABLE 1.2 Number of Operations Required to Complete Stage i When C(t) = 2^t

    Stage i    ti            C(ti)      S(i)
    1          0             C(0)       2^0
    2          0 + 1         C(1)       2^1
    3          1 + 2         C(3)       2^3
    4          3 + 8         C(11)      2^11
    5          11 + 2048     C(2059)    2^2059

In this case, S(1) = 1, and for i > 1, we have

    S(i) = 2^{∑_{j=1}^{i−1} S(j)}.
Since S(i) > ∑_{j=1}^{i−1} S(j), the total number of operations required by i stages is less than 2S(i), that is, O(S(i)).

Here we observe again that while C(t) = 2C(t − 1), the number of operations S(i) required by stage i, for i > 2, grows significantly faster than double the number required by all previous stages combined.

3. For t ≥ 0, C(t) = 2^(2^t). Table 1.3 illustrates ti, C(ti), and S(i), for 1 ≤ i ≤ 3.

TABLE 1.3 Number of Operations Required to Complete Stage i When C(t) = 2^(2^t)

    Stage i    ti         C(ti)    S(i)
    1          0          C(0)     2^(2^0)
    2          0 + 2      C(2)     2^(2^2)
    3          2 + 16     C(18)    2^(2^18)

Here, S(1) = 2, and for i > 1, we have

    S(i) = 2^{2^{∑_{j=1}^{i−1} S(j)}}.
Again, since S(i) > ∑_{j=1}^{i−1} S(j), the total number of operations required by i stages is less than 2S(i), that is, O(S(i)).

In this example, the difference between the behavior of C(t) and that of S(i) is even more dramatic. Obviously, C(t) = C(t − 1)^2, where t ≥ 1 and C(0) = 2, and as such C(t) is a fast growing function (C(4) = 65,536, while C(7) is represented with 39 decimal digits). Yet, S(i) grows at a far more dizzying pace: Already S(3) is equal to 2 raised to the power 4 × 65,536.
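The entries of Tables 1.1 through 1.3 can be reproduced mechanically by iterating the recurrence t1 = 0, S(i) = C(ti), ti+1 = ti + S(i). The short Python sketch below is only illustrative; the last two calls stop after a few stages because the remaining entries are astronomically large.

    # Reproduce Tables 1.1 to 1.3: t_1 = 0, S(i) = C(t_i), t_(i+1) = t_i + S(i).

    def stages(C, how_many):
        t, rows = 0, []
        for i in range(1, how_many + 1):
            s = C(t)                 # operations needed by stage i, executed at time t
            rows.append((i, t, s))
            t += s                   # the next stage can only start s time units later
        return rows

    print(stages(lambda t: t + 1, 7))        # Table 1.1: S(i) = 2^(i-1)
    print(stages(lambda t: 2 ** t, 4))       # Table 1.2 (stage 5 would cost 2^2059)
    print(stages(lambda t: 2 ** 2 ** t, 2))  # Table 1.3 (stage 3 would cost 2^(2^18))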

The significance of these examples and their particular relevance in parallel computation are illustrated by the paradigm in the following section.

1.4.2 Computing with Deadlines

Suppose that a certain computation requires that n functions, each of one variable, be computed. Specifically, let f0(x0), f1(x1), . . . , fn−1(xn−1) be the functions to be computed. This computation has the following characteristics:

1. The n functions are entirely independent. There is no precedence whatsoever among them; they can be computed in any order.
2. Computing fi(xi) at time t requires C(t) = 2^t operations, for 0 ≤ i ≤ n − 1 and t ≥ 0.
3. There is a deadline for reporting the results of the computations: All n values f0(x0), f1(x1), . . . , fn−1(xn−1) must be returned by the end of the third time unit, that is, when t = 3.

It should be easy to verify that no sequential computer, capable of exactly one constant-time operation per step (i.e., per time unit), can perform this computation for n ≥ 3. Indeed, f0(x0) takes C(0) = 2^0 = 1 time unit, f1(x1) takes another C(1) = 2^1 = 2 time units, by which time three time units would have elapsed. At this point none of f2(x2), f3(x3), . . . , fn−1(xn−1) would have been computed.

By contrast, an n-processor parallel computer solves the problem handily. With all processors operating simultaneously, processor i computes fi(xi) at time t = 0, for 0 ≤ i ≤ n − 1. This consumes one time unit, and the deadline is met.

The example in this section is based on one of the three functions for C(t) presented in Section 1.4.1. Similar analyses can be performed in the same manner for C(t) = t + 1 and C(t) = 2^(2^t), as well as other functions describing time-varying computational complexity.
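As a quick check of this argument, the following Python sketch (illustrative only; the function names are arbitrary) counts how many of the n functions a single conventional processor can complete by the deadline when C(t) = 2^t, and compares this with the n-processor solution.

    # Deadline scenario of Section 1.4.2: C(t) = 2^t operations per function,
    # one constant-time operation per processor per time unit, deadline t = 3.

    def sequential_finished(n, deadline=3):
        t, done = 0, 0
        while done < n and t < deadline:
            t += 2 ** t              # the function started at time t costs 2^t time units
            if t <= deadline:
                done += 1
        return done

    def parallel_finished(n, deadline=3):
        # Each of the n processors evaluates one function at t = 0 in one time unit.
        return n if deadline >= 1 else 0

    for n in (2, 3, 5, 8):
        print(n, sequential_finished(n), parallel_finished(n))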
1.4.3 Accelerating Machines

In order to put the result in Section 1.4.2 in perspective, we consider a variant on the sequential model of computation described in Section 1.2.3. An accelerating machine is a sequential computer capable of increasing the number of operations it can do at each successive step of a computation. This is primarily a theoretical model with no existing implementation (to date!). It is widely studied in the literature on unconventional computing [10, 12, 14, 43, 44, 46]. The importance of the accelerating machine lies primarily in its role in questioning some long-held beliefs regarding uncomputability [13] and universality [7].

It is important to note that the rate of acceleration is specified at the time the machine is put in service and remains the same for the lifetime of the machine. Thus, the number of operations that the machine can execute during the ith step is known in advance and fixed permanently, for i = 1, 2, . . . .

Suppose that an accelerating machine that can double the number of operations that it can perform at each step is available. Such a machine would be able to perform one operation in the first step, two operations in the second, four operations in the third, and so on. How would such an extraordinary machine fare with the computational problem of Section 1.4.2?

As it turns out, an accelerating machine capable of doubling its speed at each step is unable to solve the problem for n ≥ 4. It would compute f0(x0) at time t = 0 in one time unit. Then it would compute f1(x1), which now requires two operations at t = 1, also in one time unit. Finally, f2(x2), requiring four operations at t = 2, is computed in one time unit, by which time t = 3. The deadline has been reached and none of f3(x3), f4(x4), . . . , fn−1(xn−1) has been computed.

In closing this discussion of accelerating machines we note that once an accelerating machine has been defined, a problem can always be devised to expose its limitations. Thus, let the acceleration function be φ(t). In other words, φ(t) describes the number of operations that the accelerating machine can perform at time t. For example, φ(t) = 2φ(t − 1), with t ≥ 1 and φ(0) = 1, as in the case of the accelerating machine in this section. By simply taking C(t) > φ(t), the accelerating machine is rendered powerless, even in the absence of deadlines.
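The behavior of the doubling machine on the same problem can also be checked numerically. The sketch below is again only illustrative: it lets the machine execute 1, 2, 4, . . . operations in successive time units, charges each evaluation C(t) = 2^t operations, and counts how many functions are complete when the deadline arrives.

    # An accelerating machine that doubles its speed every time unit, applied
    # to the deadline problem of Section 1.4.2 with C(t) = 2^t and deadline t = 3.

    def accelerating_finished(n, deadline=3):
        done, t, speed = 0, 0, 1          # speed = operations per time unit
        while done < n and t < deadline:
            remaining = 2 ** t            # cost of the function started at time t
            while remaining > 0 and t < deadline:
                remaining -= speed        # one time unit of work
                speed *= 2                # twice as fast in the next time unit
                t += 1
            if remaining <= 0:
                done += 1
        return done

    for n in (3, 4, 6):
        print(n, accelerating_finished(n))  # at most 3 functions: the machine fails for n >= 4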
1.5 Rank-Varying Computational Complexity

Unlike the computations in Section 1.4, the computations with which we are concerned here have a complexity that does not vary with time. Instead, suppose that a computation consists of n stages. There may be a certain precedence among these stages, that is, the order in which the stages are performed matters since some stages may depend on the results produced by other stages. Alternatively, the n stages may be totally independent, in which case the order of execution is of no consequence to the correctness of the computation.

Let the rank of a stage be the order of execution of that stage. Thus, stage i is the ith stage to be executed. In this section, we focus on computations with the property that the number of operations required to execute a stage whose rank is i is a function of i only. For example, as in Section 1.4, this function may be increasing, decreasing, unimodal, random, or chaotic. Instances of algorithms whose computational complexity varies from one stage to another are described in Reference 15. As we did before, we concentrate here on the case where the computational complexity C is an increasing function of i.

When does rank-varying computational complexity arise? Clearly, if the computational requirements grow with the rank, this type of complexity manifests itself in those circumstances where it is a disadvantage, whether avoidable or unavoidable, to be the ith, for i ≥ 2. For example

1. A penalty may be charged for missing a deadline, such as when a stage s must be completed by a certain time ds.
2. The precision and/or ease of measurement of variables involved in the computation in a stage s may decrease with each stage executed before s.
3. Biological tissues may have been altered (by previous stages) when stage s is reached.
4. The effect of s − 1 quantum operations may have to be reversed to perform stage s.

1.5.1 An Algorithmic Example: Binary Search

Binary search is a well-known (sequential) algorithm in computer science. It searches for an element x in a sorted list L of n elements. In the worst case, binary search executes O(log n) stages. In what follows, we denote by B(n) the total number of elementary operations performed by binary search (on a sequential computer), and hence its running time, in the worst case.

Conventionally, it is assumed that C(i) = O(1), that is, each stage i requires the same constant number of operations when executed. Thus, B(n) = O(log n). Let us now consider what happens to the computational complexity of binary search when we assume, unconventionally, that the computational complexity of every stage i increases with i. Table 1.4 shows how B(n) grows for three different values of C(i).

TABLE 1.4 Number of Operations Required by Binary Search for Different Functions C(i)

    C(i)        B(n)
    i           O(log^2 n)
    2^i         O(n)
    2^(2^i)     O(2^n)

In a parallel environment, where n processors are available, the fact that the sequence L is sorted is of no consequence to the search problem. Here, each processor reads x, compares one of the elements of L to x, and returns the result of the comparison. This requires one time unit. Thus, regardless of C(i), the running time of the parallel approach is always the same.
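The rows of Table 1.4 are easy to verify numerically. The Python sketch below is illustrative only: it charges stage i of binary search C(i) operations and sums the worst-case total over roughly log2 n stages; the third call prints the bit length of B(n) (about n) rather than the astronomically long number itself.

    import math

    # Charge stage i of binary search C(i) operations and sum the worst-case
    # total B(n) over the roughly log2(n) stages.

    def B(n, C):
        stages = max(1, math.ceil(math.log2(n)))
        return sum(C(i) for i in range(1, stages + 1))

    n = 2 ** 20
    print(B(n, lambda i: i))                         # C(i) = i:       about (log n)^2 / 2
    print(B(n, lambda i: 2 ** i))                    # C(i) = 2^i:     about 2n
    print(B(n, lambda i: 2 ** 2 ** i).bit_length())  # C(i) = 2^(2^i): B(n) is about 2^n, i.e. about n bits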

1.5.2 The Inverse Quantum Fourier Transform

Consider a quantum register consisting of n qubits. There are 2^n computational basis vectors associated with such a register, namely,

    |0⟩ = |000 ··· 00⟩,
    |1⟩ = |000 ··· 01⟩,
    ...
    |2^n − 1⟩ = |111 ··· 11⟩.

Let |j⟩ = |j1 j2 j3 ··· jn−1 jn⟩ be one of these vectors. For j = 0, 1, . . . , 2^n − 1, the quantum Fourier transform of |j⟩ is given by

    (1/2^{n/2}) (|0⟩ + e^{2πi 0.jn}|1⟩) ⊗ (|0⟩ + e^{2πi 0.jn−1 jn}|1⟩) ⊗ ··· ⊗ (|0⟩ + e^{2πi 0.j1 j2 ··· jn}|1⟩),

where

1. Each transformed qubit is a balanced superposition of |0⟩ and |1⟩.
2. For the remainder of this section, i = √−1.
3. The quantities 0.jn, 0.jn−1 jn, . . . , 0.j1 j2 ··· jn are binary fractions, whose effect on the |1⟩ component is called a rotation.
4. The operator ⊗ represents a tensor product; for example, (a1|0⟩ + b1|1⟩) ⊗ (a2|0⟩ + b2|1⟩) = a1a2|00⟩ + a1b2|01⟩ + b1a2|10⟩ + b1b2|11⟩.

We now examine the inverse operation, namely, obtaining the original vector |j⟩ from its given quantum Fourier transform.

1.5.2.1 Sequential Solution

Since the computation of each of j1, j2, . . . , jn−1 depends on jn, we must begin by computing the latter from |0⟩ + e^{2πi 0.jn}|1⟩. This takes one operation. Now jn is used to compute jn−1 from |0⟩ + e^{2πi 0.jn−1 jn}|1⟩ in two operations. In general, once jn is available, jk requires knowledge of jk+1, jk+2, . . . , jn, must be computed in (n − k + 1)st place, and costs n − k + 1 operations to retrieve from |0⟩ + e^{2πi 0.jk jk+1 ··· jn}|1⟩, for k = n − 1, n − 2, . . . , 1. Formally, the sequential algorithm is as follows:

    for k = n downto 1 do
        |jk⟩ ← (1/√2) (|0⟩ + e^{2πi 0.jk jk+1 ··· jn} |1⟩)
        for m = k + 1 to n do
            if j_{n+k+1−m} = 1 then
                |jk⟩ ← [ 1  0 ; 0  e^{−2πi/2^{n−m+2}} ] |jk⟩
            end if
        end for
        |jk⟩ ← (1/√2) [ 1  1 ; 1  −1 ] |jk⟩
    end for

Note that the inner for loop is not executed when m > n. It is clear from the above analysis that a sequential computer obtains j1, j2, . . . , jn in n(n + 1)/2 time units.
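The sequential procedure above can be prototyped directly. The NumPy sketch below is only illustrative and indexes the rotations by the bit position l rather than by the loop variable m (the two are equivalent): it builds the n transformed qubits for a chosen j, undoes the rotations contributed by the bits already recovered, and applies a Hadamard to each qubit to recover jn, jn−1, . . . , j1.

    import numpy as np

    # Prototype of the sequential inverse transform: for k = n downto 1, take
    # the transformed qubit (|0> + e^(2 pi i 0.j_k...j_n)|1>)/sqrt(2), undo the
    # rotations contributed by the already recovered bits j_(k+1), ..., j_n,
    # and apply a Hadamard to reveal j_k.

    def inverse_qft_bits(n, j_bits):
        """j_bits[k-1] holds j_k; returns the recovered bits in the same order."""
        H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
        recovered = [0] * n
        for k in range(n, 0, -1):
            frac = sum(j_bits[l - 1] / 2 ** (l - k + 1) for l in range(k, n + 1))
            qubit = np.array([1.0, np.exp(2j * np.pi * frac)]) / np.sqrt(2)
            for l in range(k + 1, n + 1):              # undo the known bits' rotations
                if recovered[l - 1] == 1:
                    rot = np.array([[1, 0], [0, np.exp(-2j * np.pi / 2 ** (l - k + 1))]])
                    qubit = rot @ qubit
            qubit = H @ qubit                          # now the qubit is |j_k>
            recovered[k - 1] = int(abs(qubit[1]) > abs(qubit[0]))
        return recovered

    print(inverse_qft_bits(4, [1, 0, 1, 1]))           # prints [1, 0, 1, 1]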

1.5.2.2 Parallel Solution

By contrast, a parallel computer can do much better in two respects. First, for k = n, n − 1, . . . , 2, once jk is known, all operations involving jk in the computa