a fault diagnosis system for the connected home ...jakab/edu/litr/tmn/old/fault_diagn... ·...

7
IEEE Communications Magazine • November 2004 128 0163-6804/04/$20.00 © 2004 IEEE CONSUMER COMMUNICATIONS AND NETWORKING INTRODUCTION Although the idea of the “connected home” is not new — after all, the concept formed a cor- nerstone of the Jetsons cartoons dating from 1962 — it is beginning to look as if its time may have finally come. With widespread Internet access from homes, the growth in the use of broadband and wireless LAN technologies, and the emergence of new devices such as networked audio players, the vision of “devices working together with synergy to realize the potential of the digital decade” (Bill Gates [1]) may soon become a reality. Given this context, this article explores the role of fault diagnosis in the connected home and reports on how agent technology may be used to address this issue. It draws on experi- ences gained through the development of an experimental prototype fault diagnosis system (FDS). It closes by looking to the future and argues the need for standards to assist with suc- cessful resolution of faults in multivendor home systems of the future. THE CONNECTED HOME CONCEPT In the authors’ opinion, most current home net- works can be categorized as first-generation sys- tems: they typically encompass a few PCs sharing an Internet connection and possibly a printer. However, we believe the future lies with a sec- ond generation of systems built around a service gateway at the heart of the system. A service-gateway-based system extends the home network vision to include a range of devices, not just PCs, attached to a variety of types of local networks with differing properties, and in so doing should help to realize the truly connected home (Fig. 1). For example, both wired networks, such as IP over Ethernet and Home Audio Video Inter- operability (HAVi) over Firewire (IEEE1394), and wireless networks, such as Wi-Fi (IEEE802.11b), might be included. Indeed, con- nectivity for some subnetworks may include pow- erline technologies such as X10, where low-bit-rate signals are transmitted over mains cabling. The goal of such a home network is to enable a wide range of different applications to execute within the home environment. Applications would involve processes running locally on one or more devices and also possibly incorporate external services executing on remote service provider servers. Hence, such a system would provide the framework for deployment of so- called Internet appliances. The essential characteristic of a service gate- way is that it should support the dynamic down- load and installation of program code to enable on-the-fly selection and execution of new ser- vices and the straightforward introduction of new local devices and networks. Arguably, the leading standard in this area stems from the Open Services Gateway Initiative (OSGi) [2]. The OSGi was formed to develop and promote open specifications for the network delivery of managed services to devices primarily in the home. To this end, the OSGi have defined a standard known as the OSGi service platform [3]. It is important to note the change in empha- sis from service gateway to service platform as the standard has evolved and matured, and its value as a general-purpose distributed services platform has been recognized. Figure 2 provides a simplified representation of a home networking system based on the OSGi standard. This is based on Java, which enables the required dynamic code download. It is hard- ware- and operating-system-independent, merely requiring a small footprint Java virtual machine, and is designed to work with multiple network technologies, device access technologies, and services. OSGi functionality is structured into a core set of framework services and an extensible set of bundles. Bundles form the basis for dynam- ic code download. Peter Utton and Eric Scharf, Queen Mary, University of London ABSTRACT The connected home of the future, in which all consumer appliances in a home are net- worked together, is close to becoming the con- nected home of today. This article explores the role of fault diagnosis in such an environment and explains how agent technology may be applied. The article outlines the need for future standards that can maximize the benefit of a shared diagnostic system. A Fault Diagnosis System for the Connected Home

Upload: others

Post on 06-Oct-2020

4 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: A fault diagnosis system for the connected home ...jakab/edu/litr/TMN/old/Fault_Diagn... · diagnosing problems and localizing them to an individual component will be a key requirement

IEEE Communications Magazine • November 2004128 0163-6804/04/$20.00 © 2004 IEEE

CONSUMER COMMUNICATIONS ANDNETWORKING

INTRODUCTION

Although the idea of the “connected home” isnot new — after all, the concept formed a cor-nerstone of the Jetsons cartoons dating from1962 — it is beginning to look as if its time mayhave finally come. With widespread Internetaccess from homes, the growth in the use ofbroadband and wireless LAN technologies, andthe emergence of new devices such as networkedaudio players, the vision of “devices workingtogether with synergy to realize the potential ofthe digital decade” (Bill Gates [1]) may soonbecome a reality.

Given this context, this article explores therole of fault diagnosis in the connected homeand reports on how agent technology may beused to address this issue. It draws on experi-ences gained through the development of anexperimental prototype fault diagnosis system(FDS). It closes by looking to the future andargues the need for standards to assist with suc-cessful resolution of faults in multivendor homesystems of the future.

THE CONNECTED HOME CONCEPTIn the authors’ opinion, most current home net-works can be categorized as first-generation sys-tems: they typically encompass a few PCs sharingan Internet connection and possibly a printer.However, we believe the future lies with a sec-ond generation of systems built around a servicegateway at the heart of the system.

A service-gateway-based system extends thehome network vision to include a range of devices,not just PCs, attached to a variety of types of local

networks with differing properties, and in so doingshould help to realize the truly connected home(Fig. 1). For example, both wired networks, suchas IP over Ethernet and Home Audio Video Inter-operability (HAVi) over Firewire (IEEE1394),and wireless networks, such as Wi-Fi(IEEE802.11b), might be included. Indeed, con-nectivity for some subnetworks may include pow-erline technologies such as X10, where low-bit-ratesignals are transmitted over mains cabling.

The goal of such a home network is to enablea wide range of different applications to executewithin the home environment. Applicationswould involve processes running locally on oneor more devices and also possibly incorporateexternal services executing on remote serviceprovider servers. Hence, such a system wouldprovide the framework for deployment of so-called Internet appliances.

The essential characteristic of a service gate-way is that it should support the dynamic down-load and installation of program code to enableon-the-fly selection and execution of new ser-vices and the straightforward introduction ofnew local devices and networks. Arguably, theleading standard in this area stems from theOpen Services Gateway Initiative (OSGi) [2].The OSGi was formed to develop and promoteopen specifications for the network delivery ofmanaged services to devices primarily in thehome. To this end, the OSGi have defined astandard known as the OSGi service platform[3]. It is important to note the change in empha-sis from service gateway to service platform asthe standard has evolved and matured, and itsvalue as a general-purpose distributed servicesplatform has been recognized.

Figure 2 provides a simplified representationof a home networking system based on the OSGistandard. This is based on Java, which enablesthe required dynamic code download. It is hard-ware- and operating-system-independent, merelyrequiring a small footprint Java virtual machine,and is designed to work with multiple networktechnologies, device access technologies, andservices. OSGi functionality is structured into acore set of framework services and an extensibleset of bundles. Bundles form the basis for dynam-ic code download.

Peter Utton and Eric Scharf, Queen Mary, University of London

ABSTRACT

The connected home of the future, in whichall consumer appliances in a home are net-worked together, is close to becoming the con-nected home of today. This article explores therole of fault diagnosis in such an environmentand explains how agent technology may beapplied. The article outlines the need for futurestandards that can maximize the benefit of ashared diagnostic system.

A Fault Diagnosis System for theConnected Home

Page 2: A fault diagnosis system for the connected home ...jakab/edu/litr/TMN/old/Fault_Diagn... · diagnosing problems and localizing them to an individual component will be a key requirement

IEEE Communications Magazine • November 2004 129

Plug and play is a fundamental goal of OSGi,and support for automatic discovery is built intothe OSGi device access manager. For this towork smoothly end to end, with a minimum ofuser interaction, the system requires devices tobe able to announce themselves on connectionto their LAN. This feature can be provided by anumber of complementary technologies, such asJini [4], UPnP [5], Bluetooth [6], and HAVi [7].As a result, OSGi in combination with theseother technologies offers the basis for a soundservice-based home network environment.

The OSGi specification dictates that OSGiplatforms must support administration by aremote service provider, hence the presence ofthe remote manager role in Fig. 2. In fact, theOSGi promotes a philosophy it terms zero useradmin to reflect the fact that negligible effort onthe user’s part should be devoted to systemsadministration.

Another relevant emerging standard is thehome electronic system, part 1 of which waspublished in 2003 [8]. Informally known asHomeGate, it defines the architecture for amodular platform to act as a gateway for deliv-ery of broadband information to the home andfacilitate the interoperability of different LANtechnologies. In contrast to OSGi, HomeGate ismore hardware-oriented, mandating the use ofphysical plug-in modules rather than remotelydownloadable software bundles. It does, howev-er, have the backing of an international stan-dard.

THE NEED FOR FAULT DIAGNOSISWithin the home environment, the managementof faults will be critically important. Second-generation home networks will be inherentlycomplex, but must appear simple to users. Faultswill occur and as systems are likely to be assem-bled from components from different suppliers,diagnosing problems and localizing them to anindividual component will be a key requirement.It is likely that component suppliers will bereluctant to accept responsibility for resolvingfaults unless there is clear evidence that theblame lies at their door. For home networkingto achieve widespread acceptance, it willdemand automated support for identifying andresolving faults.

The management of fault conditions in net-works is a long established discipline (and ofcourse gives rise to the F in the FCAPSmnemonic for standard network managementfunctions, along with configuration, accounting,performance, and security), but second-genera-tion home networks are likely to exhibit certaincharacteristics that differentiate them from tradi-tional networks. They are likely to:• Be smaller in scale than typical business or

telecommunications networks• Incorporate a wide diversity of device types• Perhaps crucially have limited user expertise

available to resolve problemsCertainly there will not be a team of skilled

human experts on hand to act as technical sup-port. Consequently, maintaining the ethos ofzero user admin will demand a fault diagnosissystem that:

• Is highly autonomous• Provides expert advice to users• Exploits multiple knowledge sources to

enable the most informed decisions to bemadeThis article will consider some fault scenarios

nnnn Figure 1. The connected home: system outline.

Internet

Externalclient

Serviceprovider

PC

PDA

TV

UserLights

Gateway

LAN 1

Ethernet

Local domain — customer premisesExternal domain

LAN 2

LAN 4

Wi-fi

LAN 3

HaVi

X10

User

nnnn Figure 2. An OSGi -based system.

DeviceDevice

Appserver Bundles

OSGi service gateway

External domain

Hardware

OSGi framework

Java runtime environment

Operating system

Device

Internal domain: within the home

Internet

Remote serviceprovider

Managementserver

Remotemanager

Appserver

nnnn Figure 3. An example home network.

s2

h4h3

n5

n4

n7

h5

s1

h1g

wan

n1 n3

h2

Internet

n2 n6

Page 3: A fault diagnosis system for the connected home ...jakab/edu/litr/TMN/old/Fault_Diagn... · diagnosing problems and localizing them to an individual component will be a key requirement

IEEE Communications Magazine • November 2004130

for the example network shown in Fig. 3. In thishypothetical network configuration, s1 and s2are 4-port switches, node g acts as the gatewayfor external connectivity, and hosts h1–h5 areavailable to run applications. Network links arelabeled n1–n7, apart from the wide area networklink, labeled wan.

Our example uses a simple photo galleryapplication built as a service using the OSGiapplication programming interface (API). OSGienables services running on one service platformto call other services running on the same plat-form. Unfortunately, at present OSGi does notsupport federations of platforms and does notdirectly support calls to services on remote plat-forms. However, the expectation is that a fami-ly’s electronic photos would be distributed acrossmultiple nodes in the network, each node underthe control of a different family member. Thefacilities offered by an OSGi framework providebenefits for deployment of software on internalhosts as well as the gateway, so nodes g andh1–h5 in our scenario are all OSGi enabled plat-forms. The photo application allows users tobrowse JPEG images on both local and remotenodes, and achieves this by using the Java remotemethod invocation (RMI) API to make callsbetween PhotoService instances on differentnodes.

The photo application is structured into twotypes of OSGi bundle: a PhotoUI bundle, whichoffers a browser based user interface and callsthe underlying PhotoService to access photoresources; and a Photo bundle, which imple-ments the PhotoService itself. Within the exam-ple home network, these bundles are deployedon nodes h2–h5 and can be accessed frombrowsers on any of nodes h1–h5. Figure 4 illus-trates a deployment model.

FAULT DIAGNOSIS APPROACHESIn general, the purpose of fault diagnosis withinsystems is to identify least replaceable units (thesmallest faulty components that can actually bereplaced). Once such a component has beenidentified there is no need to diagnose furtherinside the component. The level of granularity ofcomponents will depend on the system underconsideration. In the context of a connectedhome, we can regard network links, nodes,devices, and software as such components. Togenerate a diagnosis we will need to collectobservations of operational behavior that we willtreat as a set of symptoms (i.e., the input to ourdiagnostic process). The output is a diagnosisthat can be regarded as a logical propositionasserting that a particular combination of com-ponents is faulty. A diagnosis may be a complexproposition covering sets of alternate suspects.

Candidate fault diagnosis techniques includeexperience-based techniques using heuristicknowledge [9] and model-based techniquesinvolving models of expected system structureand behavior [10]. We have taken the model-based approach.

Different models can be used to address dif-ferent concerns. For example, models could rep-resent user behavior, application structure,application behavior, application deployment, ornetwork connectivity. This article shows howmodels of application structure, deployment, andnetwork connectivity can help us resolve prob-lems.

Each of the bundles for our Photo applicationexposes a number of external interfaces (ways inwhich the functionality they contain may beinvoked) and embodies a number of dependen-cies (other system entities on which the coderelies for successful operation). Figures 5 and 6illustrate dependency models for the two bundles.White nodes represent external (callable) entrypoints to the bundle, tan nodes represent codewithin bundles, and orange nodes represent mis-cellaneous other entities that may range fromcomplex subsystems (e.g., RMI) to simple appli-cation configuration settings (e.g., PhotoRoot,which represents the fact that a property identify-ing the root directory for photos needs to be setfor PhotoService to operate successfully).

Dependency models are a common techniquewithin fault diagnosis and involve the applicationof a simple principle: the fault free-operation ofan entity is reliant on the fault-free operation ofall dependent entities. Therefore, the successfuloperation of an entity can be used to exonerateall its dependents, whereas the detection of afailure in an entity will implicate that entity andall its dependents until they can be cleared.

nnnn Figure 4. The deployment model for a photo application with an examplehome network.

PhotoApplication

PhotoServiceImpl PhotoUI

PhotoClient

Browser

h1 h2 h3 h4 h5

deployed on

deployed on deployed on

consists of

is a

nnnn Figure 5. The dependency model for the PhotoService implementation bundle.

PhotoRMIService

RMIPhotoRoot

PhotoService

PhotoServiceImpl

Framework HttpService

Page 4: A fault diagnosis system for the connected home ...jakab/edu/litr/TMN/old/Fault_Diagn... · diagnosing problems and localizing them to an individual component will be a key requirement

IEEE Communications Magazine • November 2004 131

These principles hold for analysis of bothapplications and networks, and provide us with abasis for diagnosing faults across a wide range ofsystems. However, we do need a means of test-ing our diagnoses. At a network level we canincorporate and automate the long establishedpractice of pinging another network node tocheck connectivity. This normally involves theuse of Internet Control Message Protocol(ICMP) messages at a network level. At anapplication level we can incorporate test proce-dures for entities defined in a dependencymodel. This article will explain how these fea-tures are incorporated within an agent-basedfault diagnosis system (FDS).

AGENT-BASED SOLUTIONSArguably the leading innovation in the field ofartificial intelligence during the last decade hasbeen the emergence of so-called intelligent agentsoperating in the context of distributed or multi-agent systems. Developments in agent technologyhave benefited immensely from the increasingavailability of low-cost distributed processingpower and the ubiquity of TCP/IP networks. Thetechnology has also received an impetus fromthe emergence of Java as a platform-indepen-dent programming environment and, morerecently, from the emergence of the Foundationof Intelligent Physical Agents (FIPA) [11] as astandards body encouraging interoperationbetween competing agent platforms.

The key characteristic of a multi-agent-basedsolution lies in the idea that the whole is morethan the sum of its parts. Individual agents neednot be very complex, but the emergent behaviorarising from their interactions should be able tosolve problems that are impractical or impossiblefor an isolated agent to tackle.

There are a several reasons why agents aresuited to fault diagnosis within home networks:• This task is naturally distributed. Agents can

operate within the distributed nodes of thehome network, execute local tests on com-ponents, and cooperate with other agents toidentify problems. Decentralizing process-ing also removes potential single points offailure, and the processing load on a remotemanager can certainly be reduced if we pro-cess diagnostics locally.

• Different agents can implement differentanalysis techniques (e.g., heuristic or model-based), and an open agent architectureallows the straightforward introduction ofalternate analyzers within the FDS.

• The goal of the system being highlyautonomous aligns particularly well withagent philosophy. We can visualize such asystem as embedding various rule-basedand model-based analyzers within the homenetwork, exploiting a range of knowledgesources and taking the initiative to run avariety of system tests in order to provideproactive assistance to human users. Fur-thermore, in the event of network disrup-tion, agents operating in different partitionscan still reason about their view of theproblem and maintain some level of serviceto users.

Of course, an agent approach is not withoutits drawbacks. For example, the computingresources at some nodes may be limited. Howev-er, by separating essential “per node” functional-ity from more computationally demandingprocesses, it may be possible to accommodate abalance between “lightweight” and “heavy-weight” agents.

THE FDS PROTOTYPEThe FDS prototype is designed to take advan-tage of three pre-existing software packages: anOSGi framework developed by Gatespace [12],the FIPA-OS Agent Framework [13], and JESS,the Java Expert System Shell [14].

FIPA-OS provides an open source Java-basedimplementation of a FIPA compliant agent plat-form. As such it provides facilities for messagepassing between distributed agents and directorylookup of the identities of registered agents.

JESS provides a rule-based programming sys-tem with good performance characteristics thatmay be readily integrated with other Java-basedsystems. JESS is used to implement various ana-lyzer agents within FDS, including both applica-tion- and network-level analyzers.

In addition to JESS- based analyzers, a pureJava alternate analyzer agent has also beenimplemented. This embodies a graph-based algo-rithm to analyze network connectivity. FDS alsoincludes Modeller, Reporter, Tester and Collab-orator agents. The Modeller maintains dynamicmodels of the system (e.g., of network topology).Static models (e.g., of application structure) aregenerally packaged within application bundles.

The FIPA-OS, JESS, and FDS software hasbeen packaged into a set of OSGi bundles thatallow OSGi mechanisms to be used to deployFDS code by remotely and dynamically updatingthe FDS software code for each distributednode. To achieve this, compiled FDS code ismade available on a predefined code serverwithin the network.

GENERATING DIAGNOSESDiagnosis results consist of alternate sets of sus-pects (i.e., possibly faulty components) and a setof cleared (i.e., fault-free) components. Thesecan be represented as AND/OR tree structures

nnnn Figure 6. The dependency model for the PhotoUI bundle.

DisplayRootServlet DisplayPhotoServlet

Framework HttpService

PhotoService PhotoUI IndexPage

Page 5: A fault diagnosis system for the connected home ...jakab/edu/litr/TMN/old/Fault_Diagn... · diagnosing problems and localizing them to an individual component will be a key requirement

IEEE Communications Magazine • November 2004132

or logical propositions. Within FDS, diagnosesare generated from symptoms by analyzer agents.Symptoms may be collected proactively by peri-odically “routining” components within the net-work (i.e., interrogating devices and invokingtest procedures) or by reacting to notifications ofabnormal behavior from instrumented applica-tions or external agencies. The FDS prototypeexposes an interface, FDSService, as an OSGiservice that instrumented applications such asthe Photo application can use to pass informa-tion into FDS when they encounter a problem.In a Java application, this would typically followan exception being thrown. Although instrumen-tation may be regarded as intrusive, without theactive participation of applications, diagnosticability will be limited. In such a case, periodicautomated checks of network connectivity arestill possible, and FDS also provides a Useragent to allow manual entry of observations.However, this still requires application modelsor heuristic knowledge to drive the diagnosis,and manual symptom entry is intrusive for theuser. Essentially more informed diagnosis is pos-

sible with the cooperation of applications and, ofcourse, the availability of more information.

Let us consider some fault scenarios for ourexample network. Imagine that an instance ofPhotoService running on h2 encounters a prob-lem getting a response from the instance on h5.This may be because the h5 node is unreachable,perhaps the node itself is down, or network linkn3, n4, or n7 has become disconnected, or switchs1 or s2 is faulty. Alternately, perhaps the Photo-Service on h5 has experienced a runtime failure.

The application itself may attempt to presentadvice to the user of the problem experienced,although this may not prove to be helpful, evento an experienced user. Figure 7 shows examplesproduced by the PhotoUI bundle when encoun-tering different exceptions. The second exampleat least indicates there may be a network-levelproblem, but the first is less than helpful (it wasactually triggered by deliberately seeding anArrayOutOfBounds fault in PhotoServiceImpl).

As mentioned, FDS employs different analyz-ers for different models, but the results of thesedifferent analysis viewpoints can be integrated

nnnn Figure 7. Example application error displays.

Page 6: A fault diagnosis system for the connected home ...jakab/edu/litr/TMN/old/Fault_Diagn... · diagnosing problems and localizing them to an individual component will be a key requirement

IEEE Communications Magazine • November 2004 133

and — a key point — faults at a lower level canbe used to subsume faults at higher levels. Thismeans that the application analyzer will look nofurther if a network level analyzer has alreadyidentified a problem that has generated suspectsthat match any of its own suspect set and soexplain the symptoms being experienced.

The model used by the application analyzer isgenerated by combining the bundle dependencymodels and deployment models. In our examplescenario, PhotoUI running on h2 notifies itslocal FDS agent of a problem with PhotoServiceon h2 and in turn the PhotoService on h2 reportsa problem with PhotoRMIService on h5. Thistriggers invocation of the application analyzer,which in the absence of further symptoms con-cludes there is a problem with one of the com-ponents associated with the Photo application,with either one of its dependent entities or oneof the platforms on which they are deployed.This results in an initial 54 suspects: 49 softwarecomponents (the 11 distinct entities shown inFigs. 5 and 6 deployed on each of platformsh2–h5 plus the five instances of browser fromFig. 4) and platforms h1–h5.

The application analyzer knows nothing ofnetwork connectivity or network-level problemsbut is able to consult with a network-level ana-lyzer for this viewpoint. The network connectivi-ty model in conjunction with reachabilitysymptoms enables a network analyzer to identifypossible faulty intermediate nodes and subnets,not just faulty destination nodes. The moresymptoms input to an analysis, the more tightlyfocused the results.

Let us assume that node h5 is powered down.With the benefit of unreachable symptoms forh5 and reachable symptoms for all other nodes,the network-level analyzer would be able to con-clude that either component h5 or n7 is faulty.Either possible fault would explain why h5 isunreachable but all other nodes are reachable.

The network analyzer contributes this conclu-sion into the working hypothesis, and the applica-tion analyzer is satisfied that its suspects aresubsumed by the network-level suspects. Thepresence of the network fault is sufficient toexplain the symptoms. However, in this scenariowe need additional information to be able toarbitrate between a fault in the IP network at n7or a fault in h5 itself. For these situations, FDSimplements a low-bit-rate secondary communica-tions channel using the powerline technologyX10. As a final test it sends a message to h5 overthe mains cabling. X10 is a broadcast medium,and all powered up hosts will receive the messagevia their serial port. However, only h5, assumingit is up, would reply. The message times out, andFDS concludes that h5 is indeed faulty.

Alternately, if our network is fully opera-tional, FDS will start to test suspect softwarecomponents using Tester agents that invoke testprocedures devised specifically for the compo-nents concerned. Test procedures are suppliedin Java classes named after specific entities in adependency model (e.g., Test_PhotoRoot is theclass containing the test procedure for entityPhotoRoot) and deployed via OSGi bundles.

The test procedure embodies knowledge ofhow to trigger execution of the particular entity

and what its expected results should be. Theprocedure therefore returns a value indicatingwhether the test has been passed or failed. Thisenshrines the principle of information hidingand decouples applications from the diagnosticsystem: dependency models are built into bun-dles as resource files and registered with FDSon bundle activation; test procedures are pro-vided in separate bundles that can be dynami-cally updated. Therefore, FDS does not need toknow anything about the internals of the entitiesunder test.

It is computationally expensive to test everysuspect, and the order in which tests areapplied can have a significant impact on thespeed of problem resolution. Remember that asuccessful test that demonstrates full function-ality of a target entity can be used to exonerateits dependents, so top-down testing of thosecomponents more likely to be okay (as assessedusing heuristic knowledge) can be effective inrapidly pruning the search space. Bottom-uptesting of suspects with known fault history canhelp identify the culprit. Early targeting of sus-pects on the primary problem node (h5) is alsoa valid strategy.

However, the diagnostic procedure is only asgood as the models and test procedures availableto it. Identifying the ArrayOutOfBounds faultwould be dependent on the effectiveness of theTest_PhotoServiceImpl procedure, and it is notpractical to guarantee complete test coverage forevery eventuality. Operational fault diagnosisshould not be considered a substitute for thor-ough development testing. Configuration prob-lems such as those detected by Test_PhotoRootare more typical of the class of problem this typeof approach can economically solve.

However, during this test phase, FDS wouldproceed to test selected suspects, accumulateadditional symptoms, and rediagnose until a sin-gle suspect set remains (i.e., the diagnosis is thatthis combination of components is faulty) or testoptions are exhausted.

Diagnosis, as we have considered it thus far,has been driven from a single viewpoint withinthe network, typically the location where anabnormal notification has originated. However,multiple problems may well be notified at differ-ent locations in a distributed system, both in thesame timeframe and at different times. Thesemay relate to the same causal fault or differentones. Rather than pursuing each problem inde-pendently, agents within a multi-agent systemcan interact with each other in various ways toformulate a collective solution. We have experi-mented with two distinct methods we call:• Holistic diagnosis, to signify analysis using

all relevant observations (i.e., those ema-nating from multiple locations and view-points) in one operation

• Collaborative diagnosis, which involvesagents comparing the results of local diag-noses and agreeing on a shared resultHopefully, with the editors’ permission, a

future article will be able to explore these differ-ent interaction methods and highlight their prosand cons. Certainly we have found different situ-ations where each method performs better thanthe other.

The network

connectivity model

in conjunction with

reachability

symptoms enables

a network analyzer

to identify possible

faulty intermediate

nodes and sub-nets,

not just faulty

destination nodes.

Page 7: A fault diagnosis system for the connected home ...jakab/edu/litr/TMN/old/Fault_Diagn... · diagnosing problems and localizing them to an individual component will be a key requirement

IEEE Communications Magazine • November 2004134

LOOKING TO THE FUTURE

The ideas presented here could be extended tomore detailed models for addressing applica-tion functionality and also to models coveringthe internal behavior of devices. For example,application models derived from UML-stylesequence diagrams (message sequence charts)could be used to expose expected applicationbehavior to enable more informed model-based reasoning about fault conditions andhelp target suspect software components. Fur-thermore, different problems, such as stream-ing related issues for multimedia applications,will demand different types of models and/orheuristic knowledge and appropriate support-ing analyzers.

Of course, individual applications couldembed more intelligence to resolve operationalproblems for themselves, but there is a clearbenefit in using a shared diagnostic infra-structure to avoid the need for repeated imple-mentation of common functionality. In addition,components cannot be expected to reason aboutthe effect of interactions between entities aboutwhich they know nothing; hence, the need for anindependent diagnostic capability that is awareof the applications deployed on the network andthe network topology.

To maximize the utility of a shared diagnosticinfrastructure, applications and devices wouldneed to be able to register models of theirexpected behavior, thus enabling the diagnosticmechanisms to be applied as the home networkis extended. This implies the need for standardsfor the exchange of data between diagnosticclients and the infrastructure.

Universal Plug-and-Play (UPnP), mentionedearlier as a complementary technology to OSGi,may provide a more elegant solution for theinterfacing of client applications and devices tothe diagnostic infrastructure than the currentmechanisms employed within the FDS proto-type. UPnP employs wireline protocols, whichmeans it is platform-neutral, and uses XML fordata exchange. In the context of FDS with stan-dardized analyzers, the data to be exchangedwould be problem notifications and models. Foran open FDS where individual analyzers andother agents could be replaced, this would alsomean symptoms and diagnoses. For effectivecommunication, ontology and taxonomy agree-ments are needed between participants. This isan area where HomeGate could make valuablecontributions. Its purpose is to define the meansfor disparate systems in the home to interoper-ate. One aspect of this would be a common lan-

guage for communication; indeed, part 2 of thestandard is intended to address taxonomy issues.

CONCLUSIONSThis article has argued that fault diagnosis willbe important for home networks of the future. Ithas outlined how agent-based diagnostic solu-tions might be applied in this domain by meansof a worked example. It closes by looking to thefuture and the need for standards to facilitatefault diagnosis within a connected home embody-ing components from multiple vendors.

If shared diagnostic infrastructure becomes areality and standardization bears fruit, the bene-fits will be considerable: consumers will enjoy animproved user experience, vendors will find theirproducts more highly regarded by users, the vol-ume of calls to help desks will be reduced, andresidual calls should be more quickly resolved asrelevant diagnostics would be readily at hand.

REFERENCES[1] “Come Over To My House,” Guardian Online, 23 Jan.,

2003; http://www.guardian.co.uk/online/[2] OSGi, http://www.osgi.org[3] OSGi Service Platform, rel. 3, Mar. 2003, IOS Press.[4] Jini, http://www.sun.com/jini[5] UPnP, http://www.upnp.org[6] Bluetooth, http://www.bluetooth.com[7] HAVi, http://www.havi.org[8] Home Electronic System Standards, ISO/IEC 15045-1, com-

mittee ISO/IEC JTC1 SC25 WG1; http://hes-standards.org[9] C. Price, Computer-Based Diagnostic Systems, Springer

Verlag (Practitioner Series), 1999.[10] W. Hamscher, L. Console, and J. de Kleer, Readings in

Model Based Diagnosis, Morgan Kaufmann, 1992.[11] FIPA, http://www.fipa.org[12] Gatespace’s Service Gateway Application Development

Kit, http://www.gatespace.com/[13] FIPA-OS, http://fipa-os.sourceforge.net; http://www.

emorphia.com[14] Sandia Nat’l. Labs, JESS, http://herzberg.ca.sandia.gov/jess/

BIOGRAPHIESPETER UTTON ([email protected]) holds degrees inmaterials science, computer science, and telecommunica-tions. He has spent a substantial part of his career workingat the main BT research laboratories at Adastral Park, Unit-ed Kingdom. He currently lectures on a range of softwarerelated subjects for the Electronic Engineering Departmentat Queen Mary, University of London. The connected homeis his personal research interest.

ERIC SCHARF ([email protected]) is a lecturer inthe Department of Electronic Engineering at Queen Mary,University of London. Since 1989 he has participated inmany European Union funded projects on intelligent net-works and risk assessment. In particular, he coordinatedthe recent successful EU-funded TORRENT project on intelli-gent residential access networks. His interests are in com-munication networks and computing, and he has writtenand co-authored several papers in these areas.

To maximize the

utility of a shared

diagnostic

infrastructure,

applications and

devices would need

to be able to register

models of their

expected behavior,

thus enabling

the diagnostic

mechanisms to be

applied as the

home network

is extended.