
Triggers: A Dynamic Approach to Testing Multiuser Software

Robert F. Dugan Jr., Ephraim P. Glinert
Department of Computer Science
Rensselaer Polytechnic Institute
Troy, New York 12180, U.S.A.
{dugan, glinert}@cs.rpi.edu

ABSTRACT
In this paper we describe triggers, a novel technique for testing multiuser applications. Triggers provide a framework in which selected application events induce the execution of virtual user test scripts. Unlike conventional execution based verification systems, which require complete prescription of multiuser tests, our framework allows the tester to interact with virtual user scripts in situ. Additionally, the extensible framework supports any application event or sequence of events and provides an editor to configure the specifics of a triggering event at runtime. Triggers can be used to investigate a variety of problems common to multiuser applications including: synchronization, performance, human-computer interaction, and human-human interaction. We evaluated our framework on a conventionally tested, mature multiuser application. Our evaluation found triggers particularly effective at uncovering race conditions and problems with response time under load.

Keywords
Multiuser Software, Computer Supported Cooperative Work, CSCW, Testing, Verification, Distributed Computing

1 INTRODUCTION
A critical component is missing from multiuser Computer Supported Cooperative Work (CSCW) application development that is taken for granted in single user applications: support for live user testing. Anytime someone wants to test a single user application, they can pose as the user and run the application. It is difficult for a single person to perform a live user test when multiple users are required [4]. State of the art commercial and research testing systems do not provide adequate guidance or support for a single person to perform live multiuser verification. Existing methodologies take a broad based approach to the evaluation of a CSCW application. While acknowledging that technology plays a role in a CSCW system, these methods give few details on how its evaluation should proceed.

Our research has focused on improvements to execution based testing of multiuser or CSCW software. We have developed CAMELOT [4], a technology-focused methodology for testing CSCW software. Developers, user interface specialists, performance engineers, and quality assurance personnel can use CAMELOT to evaluate the software technology that comprises a CSCW application. CAMELOT provides an organized set of specific techniques that can be used for technological evaluation. The methodology breaks the testing process into two stages: single user and multi-user. In the single user stage, General Computing and Human Computer Interaction features are examined. During the multi-user stage, Distributed Computing and Human-Human Interaction aspects are investigated. A unique code is associated with each technique. The code provides a classification scheme for the tests used and problems uncovered during application evaluation. We believe CAMELOT's techniques are inclusive of most of the technology tests an evaluator would want to perform on a CSCW application.

We devised Rebecca [4], an architecture for an execution based test system, motivated by the desire to support live user participation in a CSCW test. The trigger subsystem described in this paper plays a key role in this area. Test triggers are similar in concept to database triggers. A database trigger executes a set of instructions whenever a predefined condition is met. For example, a database trigger might recalculate the weekly paycheck amount whenever an employee's yearly salary is changed. Rebecca's trigger subsystem monitors each CSCW user. When a user performs a predefined action or sequence of actions, the trigger fires, causing the execution of test scripts by one or more virtual users. For example, if a user types "Hello" in a chat application, a test trigger might execute a script causing a different user to respond with "Hi! How are you?"
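
The trigger concept can be sketched as a condition/action pair: a predicate over monitored events, and a virtual-user script that runs when the predicate is satisfied. The class and method names below are illustrative, not Rebecca's actual API.

```java
import java.util.function.Consumer;
import java.util.function.Predicate;

// Illustrative sketch: a test trigger pairs a condition over application
// events with a virtual-user script that executes when the trigger fires.
class TestTrigger {
    private final Predicate<String> condition; // the predefined condition
    private final Consumer<String> script;     // the virtual user's reaction

    TestTrigger(Predicate<String> condition, Consumer<String> script) {
        this.condition = condition;
        this.script = script;
    }

    // Called for every monitored application event.
    void onEvent(String event) {
        if (condition.test(event)) {
            script.accept(event); // the trigger fires
        }
    }
}
```

For the chat example, the condition would match the text "Hello" and the script would make a different user respond with "Hi! How are you?".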

Triggers radically change the concept of a test session by giving virtual users the ability to react to live users. Instead of being a prescribed process, multiuser testing becomes a dynamic one. Triggers can be used to investigate a variety of problems common to multiuser applications including: synchronization, performance, human-computer interaction, and human-human interaction. These problems are difficult to manually test with limited resources, and challenging or impossible to test using existing execution based testing systems.

To determine the efficacy of our work, CAMELOT and a Java based implementation of Rebecca were used to evaluate a mature CSCW application. The evaluation uncovered over twenty bugs in the application and provided valuable feedback for future research.

2 RELATED WORK
Execution-based testing focuses on software evaluation in later stages of the software life cycle. Prior work on execution-based testing systems for applications with a strong user interface component can be grouped into three main categories: commercial, academic, and Computer Supported Cooperative Work.

Commercial systems like Platinum Technology's Final Exam C/S-Test [12], Mercury Interactive's Test Suite [9], Rational Software's SQA Suite [14], and JavaStar [19] provide a complete software test environment that includes an architecture for: test execution, test development, test failure analysis, test measurement, test management, and test planning [15]. Recording user activity in a script creates a test. The recording can be played back as a "virtual user" to execute the test. Multiuser testing is supported by the ability to execute several single user scripts in parallel. Virtual users are coordinated using a combination of script synchronization primitives and script execution control provided by the test system's session manager.

Academic research into execution based testing of GUI software has focused on specific problems like script reusability [7, 11], test case automation [1, ?], script analysis [10], visual program testing [18], and multi-modal record/playback [6, 3]. MITRE's Multi-Modal Logger (MML) improves multiuser testing by allowing the actions of multiple users to be recorded and played back in a single script [3]. The recording automatically embeds multiuser coordination in the test script.

With the exception of MML, CSCW testing has focused primarily on evaluation methodologies rather than the development of testing architectures. The methodologies take a broad based view of CSCW software. Early work in CSCW evaluation had a strong psychological component. Researchers focused on the group dynamics generated by the introduction of collaborative software into an organization [13]. CSCW specific methodologies include PETRA [17], SESL [13], and MITRE's Evaluation Working Group Methodology [3, 2]. These techniques integrate both the psychological and technological aspects of a CSCW system into the evaluation.

3 ARCHITECTURE
Rebecca is an execution based test system architecture motivated by our desire to support interaction between live and virtual users during application tests. Our discussion of triggers begins with an overview of Rebecca's architecture depicted in Figure 1.

Figure 1: Rebecca's General Architecture

General Architecture
Rebecca's centralized server coordinates one or more distributed testing agents. The server provides a user interface that allows the tester to manipulate agents, test scripts, and triggers. The server is also responsible for synchronization management that allows coordinated execution of test scripts across agents.

A Rebecca agent is activated for each CSCW user at startup. This is accomplished by adding a small amount of instrumentation to the CSCW application being tested (e.g., two lines of code in the Java-based implementation, Rebecca-J). The agent resides on the CSCW user's local machine. The agent is responsible for responding to server commands and monitoring local application state.

Rebecca's server commands to the agent include requests to create, edit, save, load, execute, and synchronize test scripts. For remote manipulation of an agent's test script, the server displays a VCR-like control panel and editing window. All editing and control commands are forwarded to the agent that owns the test script. Agent feedback about the state of the test script is sent to the server and displayed in the control panel.

The agent is also responsible for monitoring application state in the form of components and component activity. Rebecca creates virtual user behavior by feeding test script actions into application components. Test script actions are created by recording component activity.

Rebecca provides a flexible, object-oriented mechanism for defining any portion of the CSCW application as a component and any activity in a component as an event. By default, where the implementation language permits, all UI components are automatically defined. Rebecca-J, for example, automatically monitors all AWT and Swing components and activity that occurs within those components.

Rebecca allows any application activity to be recorded and replayed. A filtering system allows the tester to record specific application events. Integration with existing software development environments is encouraged by using the application's programming language as the test script language. Test scripts are reusable as the software evolves through runtime resolution of components and component-centric events. A VCR-like user interface simplifies common record/playback tasks. In addition to triggers, improvements to multiuser test script synchronization include an orchestration metaphor, simplified synchronization mechanisms, deadlock detection, and deadlock recovery.

Trigger Architecture
Rebecca's trigger subsystem executes test scripts for one or more CSCW users in response to component activity. To support trigger management, Rebecca's server provides a user interface to create, edit, load, save, and activate triggers.

Trigger creation involves interaction between the server and one or more agents. Using the server's user interface, the tester selects a triggering agent, application component, and component event that will fire the trigger. The tester then selects a test script and executing agent in response to the trigger. The triggering agent and executing agent can be attached to different CSCW users.

Figure 2 shows the trigger management architecture diagram for Rebecca. A detailed discussion of the architecture is oriented around trigger creation and activation. For examples of how to use triggers, see Section 4.

In step one, the triggering agent is selected from a list of registered agents. One agent is added to this list for each CSCW application user. As users enter and exit the application, this list is updated automatically. The selection creates a new trigger listener within the agent. The server is given a remote handle to communicate with the trigger listener.

In step two, the tester selects an application component to listen to. The server uses an MVC proxy design pattern to present a remote component browser [5]. Using the browser, the user selects the agent component that will generate the triggering event. If the agent component model changes while browsing, the server will be notified via the proxy model. Because the agent and server execute in different process spaces, the server retrieves the persistent store id of the selected component, rather than its runtime id. The server then passes the component id to the trigger listener using the listener's remote handle.

In step three, a filter for the triggering event is selected. The filter is called a threshold model. The threshold model allows the tester to specify precisely the characteristics of a triggering event. For example, the triggering event might be mouse activity inside a GUI push button. A threshold model for mouse activity might be used to select a specific type of mouse activity, such as double clicking the mouse button.

Threshold models register themselves with the server at initialization time. If necessary, the tester can add custom threshold models to the server. Custom threshold models must also register with the server at initialization time. Both the server and the agent are informed of the model selection. On the agent side, default parameters for the model are set. These default values are used to filter component events if the tester chooses not to edit the threshold model. Additionally, the threshold model registers interest with the component that will fire the trigger. On the server side, an editor for the threshold model is configured.
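
The registration step can be pictured as a name-to-factory registry held by the server: each model registers a name at initialization, and the tester later instantiates one by selecting that name. This is a minimal sketch under assumed names (`Threshold`, `ThresholdRegistry`), not Rebecca's actual classes.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Supplier;

// A threshold filter decides whether a component event meets the trigger condition.
interface Threshold {
    boolean isThresholdMet(String event);
}

// Illustrative sketch of threshold-model registration: models (built-in or
// custom) register a name and a factory with the server at initialization
// time, so the tester can select one from the trigger panel.
class ThresholdRegistry {
    private final Map<String, Supplier<Threshold>> models = new LinkedHashMap<>();

    void register(String name, Supplier<Threshold> factory) {
        models.put(name, factory);
    }

    // Create a fresh model instance (with default parameters) for a new trigger.
    Threshold create(String name) {
        return models.get(name).get();
    }
}
```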

In step four, the tester edits the threshold model using the model's editor. The editor allows the tester to customize the threshold for a specific event. Editors are model specific. The editor can be as simple as a set of editable fields describing an event or as sophisticated as a graphical subsystem that specifies the region of the application user interface where the event must occur. The result of the editing process is a set of parameters that determine how the threshold will filter component events. These parameters are sent from the server to the threshold model residing in the agent.

In step five, the script executing agent is selected from a list of registered agents. The tester's selection creates a new recording player within the executing agent. The trigger agent is given a remote handle to communicate with the executing agent.

In step six, the trigger agent's CSCW user begins using the application. The user can be either a live user or a virtual user executing a test script. The trigger agent threshold model filters events generated by the tester selected component. If an event meets the threshold condition, the trigger fires.

Returning to the mouse event example, when the cursor is inside the push button region, all mouse events are sent to the threshold model. The model ignores the events until the user presses the mouse button, generating a double click event. Once received, the mouse press event causes the trigger to fire.

In step seven, the trigger is finally fired, resulting in two actions. First, the trigger agent activates the execution agent's recording player. This will cause the player to replay its recording. Second, the server side trigger counter is incremented. The change is displayed in a counter field associated with the trigger on the server. A check is performed to see if the firing count is exceeded. If exceeded, the trigger agent is disabled so that no more firings can occur.
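
The firing-count check might look like the following sketch, where the counter increments on each firing and the trigger disables itself once the tester-set maximum number of firings is reached. The class and method names are assumptions for illustration.

```java
// Illustrative sketch of the per-trigger firing counter: once the maximum
// firing count is reached, the trigger is disabled and no longer fires.
class FiringLimit {
    private final int maxFirings; // tester-configured maximum
    private int count = 0;        // displayed in the server's counter field
    private boolean enabled = true;

    FiringLimit(int maxFirings) {
        this.maxFirings = maxFirings;
    }

    // Returns true if the trigger is allowed to fire this time.
    boolean recordFiring() {
        if (!enabled) return false;
        count++;
        if (count >= maxFirings) enabled = false; // no more firings can occur
        return true;
    }
}
```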

Figure 2: Rebecca's Trigger Architecture

Threshold Models
Selecting a threshold model is an important part of the trigger process. Events generated by the selected component are passed to the threshold model for filtering. If the event or sequence of events passes the threshold filter, then the trigger fires. The exact nature of the filter depends on the threshold model selected. It can be as simple as ensuring that the event is of a specific type, like a mouse event in a push button, or as complicated as a state machine that reaches its end state with the correct sequence of events, like keyboard strokes in a text area producing the word "Hello". Rebecca-J implements a number of threshold models (see Table 1).

Threshold Model   Description
keyPressed        KEY_PRESSED event
keyReleased       KEY_RELEASED event
keyTyped          KEY_TYPED event
mouseClicked      MOUSE_CLICKED event
mouseEntered      MOUSE_ENTERED event
mouseExited       MOUSE_EXITED event
mousePressed      MOUSE_PRESSED event
mouseReleased     MOUSE_RELEASED event
mouseMoved        MOUSE_MOVED event
mouseDragged      MOUSE_DRAGGED event
propertyChange    state change component generates a PROPERTY_CHANGE event
keySequence       sequence of KEY_PRESSED events
mouseRegion       mouse event in designated GUI location

Table 1: Threshold Models implemented in Rebecca-J

Depending on the threshold model, an editor may be available for runtime customization. The editor may be as simple as a set of form fields specifying characteristics of the event or as sophisticated as a graphical editor indicating the area of a GUI where the event must occur.

mousePressed Threshold Model
Rebecca-J's mousePressed model typifies a simple event type threshold model. The threshold model examines all mouse events generated by the selected component. If the mouse event is of type MOUSE_PRESSED, then the trigger fires. Because of the simplicity of this threshold model, no editor is necessary.
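
In AWT terms, this filter reduces to a single event-id comparison. The class below is a sketch of such a model, not Rebecca-J's actual source.

```java
import java.awt.event.MouseEvent;

// Illustrative sketch of the mousePressed threshold model: it passes only
// mouse events whose AWT id is MOUSE_PRESSED; all other mouse activity
// from the selected component is filtered out.
class MousePressedThreshold {
    boolean isThresholdMet(MouseEvent e) {
        return e.getID() == MouseEvent.MOUSE_PRESSED;
    }
}
```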

mouseRegion Threshold Model
Rebecca-J's mouseRegion model is an example of a sophisticated threshold model. An editor, shown in Figure 3, provides runtime properties of the mouse event that must be satisfied for the trigger to fire.

After the tester has used the remote component browser to select a GUI component, the mouseRegion threshold editor is activated. The editor queries the remote component for its dimensions and displays a facsimile in a graphical editing window. The tester selects one or more geometric regions and draws them on top of the facsimile. Finally, a specific type of mouse event is selected from a pull-down list. Only events that occur in the tester drawn regions with the specified type will fire the trigger.
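
The resulting filter combines an event-type check with a point-in-region test over the tester-drawn shapes. A minimal sketch, assuming rectangular regions and illustrative class names:

```java
import java.awt.Rectangle;
import java.awt.event.MouseEvent;
import java.util.List;

// Illustrative sketch of the mouseRegion threshold: the trigger fires only
// for mouse events of the selected type whose coordinates fall inside one
// of the regions the tester drew on the component facsimile.
class MouseRegionThreshold {
    private final List<Rectangle> regions; // tester-drawn regions
    private final int eventType;           // e.g. MouseEvent.MOUSE_PRESSED

    MouseRegionThreshold(List<Rectangle> regions, int eventType) {
        this.regions = regions;
        this.eventType = eventType;
    }

    boolean isThresholdMet(MouseEvent e) {
        if (e.getID() != eventType) return false;
        for (Rectangle r : regions) {
            if (r.contains(e.getX(), e.getY())) return true;
        }
        return false;
    }
}
```

For the example in the text, the region list would hold the rectangle drawn over the upper right quarter of the push button facsimile.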

Figure 3 gives an example of the mouseRegion editor in use with an application. The JButton1 button component is selected using the remote component browser. The mouseRegion editor gets the dimensions of the button remotely and displays them in the graphical editing window. The tester presses the Zoom In button several times to enlarge the push button facsimile. The rectangle geometric region is selected. The tester draws a rectangle in the upper right quarter of the facsimile. The tester then selects the mousePressed event from the pull-down list and presses the OK button to end the editing session. The threshold model is now configured to fire whenever the CSCW user presses the mouse button in the upper right corner of the JButton1 push button.

Figure 3: mouseRegion threshold model editor

Threshold Model Customization
Like most of Rebecca's subsystems, the threshold model subsystem is extensible. A tester can create new threshold models that weren't anticipated at the time Rebecca was implemented. Threshold model customization requires three steps: model registration, comparator definition, and editor definition. The model must be registered with the system so the user can select it from the server's trigger panel user interface. The comparator determines if an event received by the trigger listener from the selected component is equivalent to an event stored in the threshold model. The editor allows runtime configuration of the threshold model.

Rebecca uses the MVC design pattern for threshold model editors. The model is the threshold model itself. The threshold view displays a representation of the model and the model's configuration parameters. The threshold controller converts user actions directed at the view into commands that manipulate the model.

Event Sequence Triggers
In addition to firing a trigger in response to a single event, Rebecca provides the ability to fire after a sequence of events. For example, instead of firing whenever a keyboard event is generated, Rebecca can fire when a specific sequence of keyboard characters is typed (e.g., "Hello"). A runtime configurable event state machine supports this capability. Events generated from the selected component are ignored until one equivalent to the first input condition event in the state machine is generated. The machine progresses through states until it reaches a final state or until an event occurs that does not satisfy a move to the next state. When the machine reaches its final state, the trigger is fired and it returns to the start state. Rebecca-J implements this state machine in the abstract class ThresholdModel, method isThresholdMet().
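
A minimal sketch of how such a state machine might work for a character sequence like "Hello": each matching event advances one state, a non-matching event resets progress, and reaching the final state fires the trigger and returns the machine to its start state. The class name is illustrative; Rebecca-J's actual logic lives in ThresholdModel.isThresholdMet().

```java
// Illustrative sketch of the event-sequence state machine: one state per
// expected character; the threshold is met when the final state is reached.
class KeySequenceThreshold {
    private final char[] sequence; // e.g. the characters of "Hello"
    private int state = 0;         // index of the next expected character

    KeySequenceThreshold(String sequence) {
        this.sequence = sequence.toCharArray();
    }

    boolean isThresholdMet(char typed) {
        // Advance on a match; on a mismatch, restart (allowing the typed
        // character itself to begin a new attempt at the sequence).
        state = (typed == sequence[state]) ? state + 1 : (typed == sequence[0] ? 1 : 0);
        if (state == sequence.length) {
            state = 0; // return to the start state after firing
            return true;
        }
        return false;
    }
}
```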

One Agent, Multiple Triggers
In some situations, a single threshold model is not sufficient to express the conditions under which a trigger should fire. For example, consider a trigger for a PropertyChangeInt event that fires if the new state is less than ten or greater than twenty. The PropertyChangeInt threshold editor allows only one boolean condition to be specified on the event. Rebecca allows the user to create multiple triggers on the same component and event in the same agent. This allows one trigger to be created for each boolean condition. In the state trigger example, one trigger would be created for values less than ten and a second for values greater than twenty. Both triggers would exist in the same agent, listen for the same events from the same state component, and replay the same recording.

Multiple triggers allow the user to OR a set of threshold model tests on an event, but what about other boolean operations? Rebecca does not provide for any other boolean operation. In order to provide a complete set of boolean operations, a boolean algebra would need to be adopted. The operators in this algebra would consist of AND, OR, and NOT. The variables in the algebra would be a list of trigger names. Triggers specified in a boolean expression would never fire on their own. A boolean expression would be assigned its own trigger. If the boolean expression was satisfied, then its trigger would fire. Consider an example where two state triggers, TriggerGreaterThan10 and TriggerLessThan20, are set on the same PropertyChangeInt component. TriggerGreaterThan10 fires when the new integer value is greater than ten. TriggerLessThan20 fires when the new value is less than twenty. The syntax for a boolean expression trigger that fired for integer value changes between ten and twenty would look like: (TriggerGreaterThan10 AND TriggerLessThan20).
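
This hypothetical extension can be sketched as follows: the component triggers contribute truth values rather than firing directly, and the expression's own trigger fires only when the combination holds. The evaluation below is hard-coded to the AND example from the text; a real implementation would parse arbitrary expressions.

```java
import java.util.Map;

// Illustrative sketch of a boolean-expression trigger over named
// component triggers. The names follow the example in the text; this is
// a proposed extension, not an existing Rebecca feature.
class BooleanExpressionTrigger {
    // Evaluate (TriggerGreaterThan10 AND TriggerLessThan20) for a new value.
    static boolean fires(int newValue) {
        Map<String, Boolean> vars = Map.of(
                "TriggerGreaterThan10", newValue > 10,
                "TriggerLessThan20", newValue < 20);
        return vars.get("TriggerGreaterThan10") && vars.get("TriggerLessThan20");
    }
}
```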

Timers
There are some situations where a virtual user script may need to be activated by a timer, rather than by application activity. For example, the tester may want a virtual user to manipulate a widget or type some characters every few seconds. Periodic load like this is useful for observing performance characteristics of the application under test. Rebecca's trigger architecture includes provisions for such time-based triggers.

Unlike other triggers, timers are managed entirely by the server, so it is not necessary to select a remote agent component and threshold model. The tester configures the timer by double clicking on the timer widget that appears with every trigger panel in the server. A timer selection window appears and is used to manage timers once they have been configured. The tester can add a new timer, delete existing timers, edit an existing timer, or attach the selected timer to the trigger panel.

Timer triggers are managed by the server, rather than by remote agents. This allows multiple recording players to be attached to the same timer. When the timer fires, all of the recording players are activated near-simultaneously. Simultaneous replay of recordings creates realistic approximations of application use. For example, periodic dialog between a group of users in a chat application could be simulated.

True simultaneous replay of recordings, implemented with networking techniques such as broadcasting, makes for unpredictable testing. The tester can never be sure if the replay started at exactly the same moment on each machine. Software and hardware layers throughout the command's path from server to agent introduce delays that can keep it from being processed immediately. The cause of these delays can change during the test session, making the replay order different each time the trigger fires. When a timer trigger fires, Rebecca iterates through an ordered list of players, activating each in turn. The server provides the tester with a user interface to configure the ordered list.
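
The ordered-activation design can be sketched as a timer trigger holding a tester-configured list of players and walking it in order on each firing, giving a deterministic activation order. The `TimerTrigger` name and `Runnable` players are illustrative stand-ins for Rebecca's recording players.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch of ordered activation on a timer firing: instead of
// broadcasting (whose delivery order is unpredictable), the server walks
// an ordered list of recording players and activates each in turn.
class TimerTrigger {
    private final List<Runnable> players = new ArrayList<>();

    void attach(Runnable player) {
        players.add(player); // order is configured by the tester
    }

    void fire() {
        for (Runnable player : players) {
            player.run(); // deterministic, repeatable activation order
        }
    }
}
```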

4 USAGE CASES
Triggers are a powerful mechanism for testing distributed applications. The principal advantage of triggers over traditional test systems is that a live user can be incorporated into a test session. Virtual users react to events generated by other users, live or virtual. This reactive approach creates test sessions that are dynamic, rather than completely prescribed. The next several sections present examples of trigger use.

Usage Case 1: Synchronization Problems
Race conditions are a common synchronization problem in distributed systems. A race condition occurs when system behavior is dependent on instruction ordering between threads of execution. One way to test for the presence of a race condition is to create a situation using parallel threads while varying the execution order.

Consider a simple distributed application with two buttons, + and -, and a counter display. Whenever a button is pressed, the counter is incremented/decremented by one and the new value is displayed in the text field of all distributed copies of the application. A potential race condition exists when users press the count buttons near-simultaneously.

Triggers can help test for this race condition. A trigger could be set up between the live user and a virtual user. Whenever the live user presses the + button, the virtual user immediately reacts by pressing the - button. If the count field reads 0, then no matter how many times the live user presses the + button, it should always read 0.
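
The trigger wiring for this probe can be sketched as follows: each observed + press from the live user fires a trigger that replays a - press by the virtual user, so the shared counter should end where it began. The names are illustrative, and the actual race would arise from the two presses racing across distributed copies of the application.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative sketch of the race-condition probe: a "+" press by the
// live user fires a trigger whose script replays a "-" press by a
// virtual user, so the counter's net change should be zero.
class CounterRaceProbe {
    final AtomicInteger counter = new AtomicInteger(0);

    void livePlusPressed() {
        counter.incrementAndGet();
        virtualMinusTrigger(); // the trigger fires: virtual user reacts
    }

    private void virtualMinusTrigger() {
        counter.decrementAndGet();
    }
}
```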

The probe for the race condition can be coupled with a stress test by creating a second trigger. Stress tests help uncover application flaws that aren't exposed under normal use. The new trigger presses the + button whenever the original virtual user presses the - button. Now we have a situation where two virtual users are reacting to each other. The maximum trigger count field in the panel for the new trigger is set to 1000. The test begins with the live user pressing the + button. Assuming the count field begins with 0, at the end of the test it should read 1. In addition to using duration to stress test the system, stress can be increased by setting the playback delay on the trigger recordings to NO_DELAY.

Testing for race conditions is possible with a traditional testing system. One advantage of prescribed testing over triggers is the ability to test with scripts executing simultaneously on separate machines. In contrast to the simplicity of triggers, however, some effort is required to produce the test case. For the test described in this section, the tester would be forced to create two scripts, one for each virtual user. Some form of synchronization primitive would have to be added so that both scripts began execution at the same time. A looping construct would be necessary so that the button presses could be repeated. Finally, the modified scripts would have to be saved, compiled, and loaded into the test system.

Usage Case 2: Performance Problems
Another vexing difficulty when developing synchronous multiuser applications is investigating performance issues like the responsiveness of the system under load. Response time is particularly important for the user interface portions of the application. User anxiety rises when interactive components, such as a slider bar or pull down list, do not respond within milliseconds [19].

Consider another simple distributed application with a push button and slider bar. The tester wants to investigate the performance effect that pushing the button in one copy of the application has on moving the slider bar. Slider bars are a useful way to get a "feel" for the responsiveness of the application. If the cursor doesn't track well with a slider bar, the user will notice it immediately.

The tester creates a trigger that activates a virtual user that presses the push button. The trigger fires whenever the live user moves the slider bar. During testing the slider bar does move sluggishly, indicating a performance problem. Curious about whether part of the problem is due to a backlog of queued push button events, the tester resets the trigger's threshold model to use the mouseRegion model. The left half of the live user's slider bar is marked as the triggering region for the virtual user. Now the tester can compare how the slider bar behaves when inside and outside the triggering region. If the slider bar is sluggish inside the region, but immediately responsive outside it, then the tester knows that an event backlog is not the problem.
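The heart of the mouseRegion model is a simple geometric gate on incoming mouse events. A minimal sketch, assuming a rectangular region given as origin plus width and height (the class name and coordinate convention are illustrative, not Rebecca's actual interface):

```python
class MouseRegionThreshold:
    """Threshold model sketch: the trigger fires only when a mouse
    event's coordinates fall inside a configured rectangular region."""
    def __init__(self, x, y, width, height):
        self.x, self.y = x, y
        self.width, self.height = width, height

    def passes(self, event_x, event_y):
        # True exactly when the event lies inside the rectangle.
        return (self.x <= event_x < self.x + self.width and
                self.y <= event_y < self.y + self.height)

# Left half of a hypothetical 400x50 slider bar as the triggering region.
left_half = MouseRegionThreshold(0, 0, 200, 50)
print(left_half.passes(120, 25))  # True: inside, virtual user fires
print(left_half.passes(310, 25))  # False: outside, no added load
```

Dragging the slider through the left half would then generate virtual-user load, while the right half serves as the unloaded control condition.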

Compared to triggers, traditional testing systems can only place a gross load on the system. This is accomplished by running one or more iterating scripts against the application. A live user can participate in such a test session, but only in an uncoordinated fashion with the scripts.



Figure 4: A shared drawing/chat application.

Using the example from this section, the tester executes a script that repeatedly presses the push button in one copy of the application. A live user interacts with the slider bar in another copy of the application while this script executes.

Since live user-based script control is unavailable with traditional testing, all interactions occur while the script executes. Script execution can be controlled from the test system, but it would be difficult or impossible to implement the test cases in this section. When the cursor neared the slider bar, interaction would stop while the tester configured and started the push button script. Interaction would then continue as the slider bar's behavior was observed under load. When the cursor was about to leave the slider bar, interaction would stop while the tester terminated the push button script. The test cases would be impossible if the application and test system were on the same machine, because the cursor would have to leave and re-enter the application to control the test script. The mouseRegion trigger test case would be impossible to imitate because, by the time the tester had shut off the test script and returned to the application, the backlog of queued events would have been processed.

Usage Case 3: Simulating User Behavior
In some situations, the tester may desire one or more virtual users to simulate "normal" user behavior. This allows the tester to observe how the application behaves under normal use. The definition of normal is application specific. Generally, the tester assigns a profile to each virtual user consisting of initiated and reactive behaviors. Initiated behavior consists of actions performed by the virtual user without regard to other activity in the application. For example, the virtual user might perform a series of mouse movements culminating in a button press every couple of minutes. Rebecca can model this behavior by replaying a test script continuously, or by using a timer trigger.

Reactive behavior consists of virtual user actions triggered by other user activity. For example, when testing a chat program, the virtual user might send the message "Hello, yourself!" in response to the string "Hello" being typed by another user. Rebecca models this behavior using triggers that fire based on the activity of other users.

Consider the shared drawing/chat application in Figure 4. The tester wants to create a profile for a virtual user that will interact with the live user during testing. For initiated behavior, the virtual user will draw several shapes every sixty seconds. This behavior provides a periodic load on the system typical of normal use. Using Rebecca, the tester creates a recording that draws several shapes. A timer trigger that fires once every sixty seconds is attached to the recording.
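A timer trigger of this kind amounts to a background clock that replays a recording at a fixed interval until the test session ends. One possible sketch, assuming the recording is exposed as a plain callable (`make_timer_trigger` and its return convention are invented for illustration):

```python
import threading

def make_timer_trigger(interval_s, replay):
    """Sketch of a timer trigger: invokes replay() every interval_s
    seconds until the returned event is set by the tester."""
    stop = threading.Event()

    def tick():
        # Event.wait returns False on timeout (time to fire) and
        # True once stop is set (trigger disabled).
        while not stop.wait(interval_s):
            replay()

    threading.Thread(target=tick, daemon=True).start()
    return stop

# Usage sketch: replay a shape-drawing recording once a minute, e.g.
#   stop = make_timer_trigger(60.0, draw_several_shapes)
#   ... run the test session ...
#   stop.set()   # detach the timer trigger
```

The stop event doubles as the trigger's enable/disable switch, mirroring the runtime configurability described above.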

The virtual user also has three reactive behaviors in its profile. The first behavior is a friendly attitude when the live user types "Hi", "Hello", or "Hey" in the chat window. The virtual user responds with "Welcome. I'm drawing in the upper right corner and using blue!" The tester sets up this behavior by recording the welcome message and copying the recording to keySequence triggers for each possible live user greeting.

The second behavior is a protective attitude about the upper right hand quarter of the drawing area. Whenever the live user's mouse enters this area and begins drawing, the virtual user sends the angry message "Hey! I'm drawing in this area, draw somewhere else!" This behavior is set up by recording the angry message and attaching a mouseRegion trigger to the recording. The mouseRegion's editor is configured to fire on mousePressed events in the upper right quarter of the drawing area. The live user's agent is selected as the trigger listener.

The third behavior is another protective attitude towards the drawing color used by the live user. The virtual user only draws with the color blue. If the live user chooses blue as well, the virtual user sends the angry message "Hey! Blue is my color. You've ruined the picture and we are starting over again." and clears the drawing area. If the live user chooses a different color, the virtual user sends the message "Great choice! That will complement my blue very nicely." In order to describe this behavior to Rebecca, a new component, event, and threshold model for propertyChangeColor must be written.

The specifications for propertyChangeColor are similar to those for propertyChangeInt. The component is a state change component. It contains the current value of the local user's drawing color. The event is a property change event. It is generated by the component when the user's drawing color changes. The value transmitted in the event is the new drawing color. The threshold model tests for changes to the drawing color component. An editor configures the model by allowing the user to select a drawing color from a list in combination with a "this" or "all-but-this" option. If "this" is chosen, then the trigger will fire when that drawing color is selected. If "all-but-this" is chosen, then the trigger will fire when any other color is selected.
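The "this"/"all-but-this" distinction reduces to a single boolean comparison that is optionally negated. A minimal sketch of the threshold model (class and parameter names are illustrative, not the actual Rebecca-J code):

```python
class PropertyChangeColorThreshold:
    """Threshold model sketch for drawing-color changes. Configured
    with a color and a mode: 'this' fires on that exact color,
    'all-but-this' fires on any other color."""
    def __init__(self, color, mode="this"):
        if mode not in ("this", "all-but-this"):
            raise ValueError("mode must be 'this' or 'all-but-this'")
        self.color, self.mode = color, mode

    def passes(self, new_color):
        matches = (new_color == self.color)
        return matches if self.mode == "this" else not matches

# The two configurations used for the virtual user's color behavior:
angry = PropertyChangeColorThreshold("blue", "this")
friendly = PropertyChangeColorThreshold("blue", "all-but-this")
print(angry.passes("blue"), friendly.passes("red"))   # True True
print(angry.passes("red"), friendly.passes("blue"))   # False False
```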



The virtual user's color selection behavior is created with two triggers. The first trigger is attached to a recording that sends the angry message and erases the drawing area. A propertyChangeColor threshold model is configured to fire when the live user selects the color blue. The second recording sends the friendly message. It is attached to a propertyChangeColor trigger that fires when the live user selects any color except blue.

Rebecca's support for reactive behavior in virtual users greatly expands the possibilities for simulating user activity. Traditional testing systems do not have this capability. For initiated behavior, Rebecca's continuous replay and timer trigger mechanisms offer a simple alternative to the labor intensive task of adding wait commands and loop constructs, then compiling, loading, and running a traditional test script.

Usage Case 4: A Flurry of Activity: Trigger Chaining
The usage cases discussed in this section have dealt with virtual users reacting individually to live user actions. Rebecca's trigger subsystem also supports trigger chaining. Trigger chaining allows virtual users to react to each other in a coordinated fashion. A trigger chain is created when a trigger is fired in reaction to an event generated by a virtual rather than a live user. Triggers can be strung together across many virtual users to create a chain reaction to a single event.

Consider the following scenario. Three triggers are registered with the live user's agent. Any one of these triggers will cause the same recording in VirtualUser0's agent to execute. Two triggers, TriggerD and TriggerE, registered with VirtualUser0 fire because of an event in this recording. TriggerE causes the replay of a recording in VirtualUser2, where a branch of the chain terminates. TriggerD activates a recording in VirtualUser1 that fires TriggerF. TriggerF replays a recording in VirtualUser3, where the last branch of the chain terminates.
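This scenario is essentially a directed graph of trigger edges, and the chain reaction is a traversal from the initiating event. A small sketch (node names follow the scenario; the graph representation itself is an assumption for illustration):

```python
# Edges: an event at the key node fires triggers that replay the
# listed recordings. TriggerE leads to VirtualUser2, TriggerD to
# VirtualUser1, and TriggerF on to VirtualUser3.
CHAIN = {
    "live-event":    ["VU0-recording"],
    "VU0-recording": ["VU2-recording", "VU1-recording"],  # TriggerE, TriggerD
    "VU1-recording": ["VU3-recording"],                   # TriggerF
}

def fire(event, reached=None):
    """Depth-first walk of the trigger chain from one initiating event,
    collecting every recording replayed along the way."""
    reached = [] if reached is None else reached
    for recording in CHAIN.get(event, []):
        reached.append(recording)
        fire(recording, reached)
    return reached

print(fire("live-event"))
# ['VU0-recording', 'VU2-recording', 'VU1-recording', 'VU3-recording']
```

A single live-user event thus fans out across four virtual users before both branches of the chain terminate.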

Figure 5 shows how trigger chaining can be used to extend the shared drawing area test from the previous use case. A second virtual user is added to the test. As in the original test, when the live user types a greeting in the chat window, the original virtual user, VirtualUser0, responds with "Hello". This response, however, is chained to another trigger that causes VirtualUser1 to type "Don't be fooled by the friendly greeting. VirtualUser0 is actually very testy. If you want I can prove it to you. Just say 'show me'" in the chat window.

If the live user types any phrase containing "show me" in the chat window, VirtualUser1 replays a recording of mouse movements in the upper right corner of the drawing area. Chained to these mouse events is VirtualUser0's angry response "Hey! I'm drawing in this area, draw somewhere else!" This triggers a retort from VirtualUser1: "See, I told you! I'll tell you something else. Don't even think about drawing in blue. VirtualUser0 really hates that." VirtualUser0 gets in the last word with "I heard that!", triggered by the key sequence "hates that" from VirtualUser1.

Although unimplemented in Rebecca-J, Rebecca has a second form of chaining: trigger state chaining. Trigger state chaining allows the tester to construct a non-deterministic finite automaton from a set of triggers. The input grammar for the automaton is a quadruple of component, event, threshold model, and trigger count for each trigger in the chain. State transitions occur when the input condition for the current state is satisfied (the trigger fires).

State change triggers are useful in situations where the tester would like a trigger to replay different recordings as the test session progresses. Consider VirtualUser0's response when the live user selects the color blue. Instead of sending the message and erasing the graphics in a single recording, the virtual user's response could be broken up into a collection of smaller responses. The first time the live user selects the color blue, VirtualUser0 issues a warning: "Hey! Blue is my color, you'll ruin the picture if you use blue too." If the live user selects another color, then the state machine returns to the start state. If, however, the live user begins to draw with the color blue, VirtualUser0 issues another warning: "Stop drawing in blue or I'll erase the whole picture!" If the live user selects another color, then the state machine returns to the start state. However, if the user continues to draw, VirtualUser0 sends the message "I'm going to keep erasing the drawing area until you change your drawing color from blue." and clears the drawing area. This message/erase recording is triggered by live user drawing activity until another drawing color is selected.
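The escalating blue-color responses can be written down as an explicit state-transition table, which is one way to picture what a trigger state chain computes. The state names, action tokens, and recording labels below are hypothetical abstractions of the scenario, not Rebecca's representation:

```python
# (current state, live-user action) -> (next state, recording to replay)
TRANSITIONS = {
    ("start",  "select-blue"):  ("warned", "warning-recording"),
    ("warned", "select-other"): ("start",  None),
    ("warned", "draw-blue"):    ("final",  "final-warning-recording"),
    ("final",  "select-other"): ("start",  None),
    ("final",  "draw-blue"):    ("final",  "erase-recording"),  # repeats
}

def run(actions):
    """Drive the state chain with live-user actions, collecting the
    recordings the virtual user replays along the way."""
    state, replayed = "start", []
    for action in actions:
        state, recording = TRANSITIONS.get((state, action), (state, None))
        if recording:
            replayed.append(recording)
    return replayed

print(run(["select-blue", "draw-blue", "draw-blue", "select-other"]))
# ['warning-recording', 'final-warning-recording', 'erase-recording']
```

The self-loop on ("final", "draw-blue") captures the repeated message/erase response, and the "select-other" transitions model the return to the start state.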

Although Rebecca-J does not implement trigger state chaining, what would the implementation look like? Each trigger panel would include a state chaining button. When the button is pressed, the user would see a dialog box listing all registered triggers. The user selects one or more of the registered triggers and adds them to the next-state list. All triggers in the next-state list are disabled until the trigger being edited has fired its maximum number of times. The tester can check the "Reset Count" box for triggers in the next-state list. This resets the fired count for that trigger to zero when the trigger is enabled. The "Reset Count" box allows movement to a previous state in the trigger state chaining diagram.

5 EVALUATION
Our triggering architecture and implementation were included in an overall evaluation of CAMELOT and Rebecca [4]. The application under test was the Reconfigurable Collaboration Network (RCN), a mature CSCW shared windowing application [16]. RCN allows multiple users to share control of a single public machine from remote machines, with only one user in control of the public machine at a time. Race condition and response-under-load triggers were particularly effective.

From early testing, it appeared that the public machine was maintaining state information about the controlling remote machine's keyboard. We were concerned that a race condition might arise when one user took unexpected control of the public machine from another user. To see if this created a race condition, a recording of a virtual user taking control of the public machine and typing "aaa" was made and connected to a trigger. The trigger fired whenever the live user was in control of the public machine and pressed the SHIFT key. During the tests, the public machine interpreted the virtual user's keyboard presses as "AAA". Additional trigger tests using the ALT and CTRL keys indicated the same problem.

Figure 5: Trigger chaining extends the shared drawing area test.

A similar trigger test determined that if the live user did not release the mouse button before the virtual user took over the public machine, then all virtual user mouse events were interpreted as mouse button press events. The test also discovered that the live user could still control the public machine's screen cursor as long as the mouse button stayed pressed. Only the ability to integrate live and virtual users in a single test session made this discovery possible.

Several other race conditions were exposed using triggers, including problems joining sessions, users, teams, and public machines.

We were also able to use triggers and threshold models to explore the response time of the application under load. We were particularly interested in the effect that RCN's ghost cursors would have on the public machine's system cursor and vice versa. Because all mouse events had to be transmitted across the network to a central server on the public machine, this centralized design had the potential to cause a performance bottleneck. The mouse region threshold editor was used to create a trigger that would fire when the public machine's system cursor entered the top half of the screen. The trigger was attached to the recordings of ghost cursor movement for several virtual users. Several tests were conducted using this combination of the mouse region threshold, the public system cursor, and RCN's ghost cursors. Results from this testing showed the system did not scale well and uncovered memory leaks on both the public and private machines.

6 CONCLUSION
Commercial and academic testing systems that provide multiuser support completely prescribe the test session from start to finish, precluding live user participation. Virtual user orchestration is specified using a combination of script synchronization primitives and session manager scheduling.

Triggers radically change the concept of a test session by giving virtual users the ability to react to live users. Instead of being a prescribed process, multiuser testing becomes a dynamic one. Triggers operate on any application activity that can be recognized by the test system. A set of triggers can be defined on the same event or chained together to provide complex virtual user behavior.

Threshold models provide sophisticated control over the firing of a trigger. If an event or sequence of events passes the threshold condition, then the trigger fires. Several models were introduced for user interface events (e.g. mousePressed, mouseRegion), application state change (e.g. propertyChangeInt), and timers. Threshold models range in sophistication from the simple (e.g. a mouse button press) to the complex (a specific type of mouse button press in a specific region of a user interface component) and are completely customizable.

Usage cases were presented to demonstrate the capabilities of triggers. These included detecting race conditions, determining response time under load, simulating realistic user behavior, and enabling virtual users to react to other virtual users.

An evaluation of triggers was conducted on RCN, a mature shared windowing CSCW application. Serious flaws in RCN were uncovered, prompting the following comment when Rebecca exposed one troublesome race condition:

"i don't think any tester would have ever discovered that. simply for discovering that, i consider rebecca a success." (J.J. Johns, RCN Developer)

7 ACKNOWLEDGMENTS
This research was supported in part by the National Science Foundation under awards CDA-9634485 and CCR-9527151. The authors would also like to thank Professor Edwin H. Rogers and software developer J.J. Johns for their support and advice during the evaluation of RCN using Rebecca-J.

REFERENCES

[1] D. Cohen, S. Dalal, A. Kajla, and G. Patton. The automatic efficient test generator (AETG) system. In International Conference on Testing Computer Software, Washington, DC, 1994.

[2] J. Drury, L. Damianos, T. Fanderclai, L. Hirschman, J. Kurtz, and B. Oshika. Scenario-based evaluation of loosely-integrated collaborative systems. In Proceedings of the Conference on Human Factors in Computing Systems (CHI '00). ACM Press, 2000.

[3] J. Drury, L. Damianos, T. Fanderclai, L. Hirschman, J. Kurtz, and B. Oshika. Methodology for evaluation of collaborative systems. http://www.mitre.org/support/papers/tech papers99 00/damianos evaluating/index.shtml, 1999.

[4] R. F. Dugan Jr. A Testing Methodology and Architecture for Computer Supported Cooperative Work Software. Doctoral thesis, Rensselaer Polytechnic Institute, Department of Computer Science, 2000.

[5] E. Gamma, R. Helm, R. Johnson, and J. Vlissides. Design Patterns. Addison-Wesley, Reading, Massachusetts, 1995.

[6] M. L. Hammontree, J. J. Hendrickson, and B. W. Hensley. Integrated data capture and analysis tools for research and testing on graphical user interfaces. In Proceedings of the Conference on Human Factors in Computing Systems (CHI '92), pages 431–432. ACM Press, 1992.

[7] L. R. Kepple. The black art of GUI testing. Dr. Dobb's Journal, 19(2):40–42, 1994.

[8] A. M. Memon, M. E. Pollack, and M. L. Soffa. Using a goal-driven approach to generate test cases for GUIs. In International Conference on Software Engineering, pages 257–266, Los Angeles, California, 1999. ACM Press.

[9] Mercury Interactive. TestDirector: Scalable test management. User's guide, Mercury Interactive Corporation, 1997.

[10] H. Okada and T. Asahi. GUITESTER: A log-based usability testing tool for graphical user interfaces. IEICE Transactions on Information and Systems, E82-D(6):1030–1041, 1999.

[11] T. Ostrand, A. Anodide, H. Foster, and T. Goradia. A visual test development environment for GUI systems. In ACM SIGSOFT International Symposium on Software Testing and Analysis, volume 23, pages 82–92, Clearwater Beach, Florida, 1998. ACM Press.

[12] Platinum Technology. Final Exam C/S-Test. Tutorial, Platinum Technology, Incorporated, 1997.

[13] M. Ramage. How to Evaluate Cooperative Systems. Doctoral thesis, Lancaster University, Department of Computing, 1999.

[14] Rational Software. SQA Manager user's guide. User's guide, Rational Software Corporation, 1997.

[15] D. Richardson and N. Eickelman. A framework for software test environments. In Eighteenth International Conference on Software Engineering (ICSE-18), 1996.

[16] E. H. Rogers, C. Geisler, J. Farley, J. Johns, and C. Parker. The Reconfigurable Collaboration Network: a demonstration of collaborative system sharing. In Proceedings of the European Computer Supported Cooperative Work Conference 1999, Amsterdam, Netherlands, 1999.

[17] S. Ross, M. Ramage, and Y. Rogers. PETRA: Participatory evaluation through redesign and analysis. Interacting with Computers, 7(4):335–360, 1995.

[18] G. Rothermel, L. Li, C. DuPuis, and M. Burnett. What you see is what you test: A methodology for testing form-based visual programs. In 20th International Conference on Software Engineering, pages 198–207, 1998.

[19] Sun Microsystems. JavaStar overview, 2000.

[20] L. J. White. Regression testing of GUI event interactions. In International Conference on Software Maintenance, pages 350–358, Monterey, California, 1996. IEEE Computer Society Press.
