
Interacting with the Disappearing Computer: Evaluation of the Voyager Development Framework

Anthony Savidis, Constantine Stephanidis

ICS-FORTH Technical Report

Abstract

We have developed a programming framework named Voyager for making interactive applications with dynamically composed User Interfaces, consisting of remote input / output elements hosted by environment devices. This framework reflects our perspective of the disappearing computer concept as an infrastructure enabling mobile users to exploit on the fly any available proximate devices for the purposes of interaction. The evaluation of such a demanding development instrument requires appropriate methods and techniques to validate its appropriateness and usability, both as a programming tool and with respect to the eventual interactive quality of the applications that can be implemented. In this context, the detailed evaluation process, carried out in the context of the 2WEAR Project, is described, employing techniques for software evaluation, process improvement assessment and usability evaluation.

Keywords

Disappearing computing, ubiquitous computing, ambient interactions, software evaluation, usability evaluation.


Table of Contents

1 INTRODUCTION
2 DEVELOPMENT INSTRUMENTS
2.1 APPROACH
2.1.1 EASE-OF-USE ASSESSMENT QUESTIONNAIRE
2.1.2 APPROPRIATENESS ASSESSMENT QUESTIONNAIRE
2.1.3 ROBUSTNESS ASSESSMENT QUESTIONNAIRE
2.1.4 PROCESS IMPROVEMENT ASSESSMENT QUESTIONNAIRE
2.2 PROCESS
2.3 PARTICIPANTS
2.4 RESULTS
2.4.1 EASE-OF-USE ASSESSMENT RESULTS
2.4.2 APPROPRIATENESS ASSESSMENT RESULTS
2.4.3 ROBUSTNESS ASSESSMENT RESULTS
2.4.4 PROCESS IMPROVEMENT ASSESSMENT RESULTS
2.5 CONSOLIDATION
2.5.1 Ease-of-use
2.5.2 Appropriateness
2.5.3 Robustness
2.5.4 Process improvement
3 INTEROPERABILITY FRAMEWORK
3.1 APPROACH
3.2 PROCESS
3.3 SUBJECTS
3.4 RESULTS
3.5 CONSOLIDATION
4 DEMONSTRATOR APPLICATIONS
4.1 APPROACH
4.1.1 TECHNIQUES AND TOOLS FOR SUBJECTIVE EVALUATION
4.1.2 IBM COMPUTER USABILITY SATISFACTION QUESTIONNAIRES
4.1.3 WHAT IS BEING MEASURED
4.2 PROCESS
4.2.1 SCENARIO SCRIPTING
4.2.2 SCENARIO STRUCTURE
4.3 SUBJECTS
4.4 RESULTS
4.4.1 ASQ questionnaires
4.4.2 CSUQ questionnaires
4.5 CONSOLIDATION
5 CONCLUSIONS
6 BIBLIOGRAPHY


1 INTRODUCTION

The 2WEAR project is targeted towards demonstrating the following innovative features in the context of ubiquitous disappearing computing:

• Dynamic composition of application functionality on-the-move from remote dynamically discovered services;

• Dynamic composition of the User Interface on-the-move from dynamically discovered remote devices appropriate for interaction;

• Employment of common remote wireless communication technology supporting short-range radio, specifically Bluetooth™;

• Dynamic re-configuration, either system-initiated or user-initiated;

• Demonstration of indicative applications which display the above features.

In this context, apart from the internal continuous evaluation during the development of the various independent 2WEAR components, an overall assessment study has been coordinated. The subjects of this study were the key project milestones, namely:

• Development instruments. These concern our own approach to reaching the 2WEAR goals. Although those instruments constitute internal project results, a type of internal assessment, mainly carried out by the interested parties, was essential for identifying plans for future continuation and exploitation. The tools were functionally validated in the context of the demonstration applications, by verifying the presence of the above target features.

• Interoperability. This is a crossing-point that actively concerned all partners at the lower levels of implementation. The identification of an effective and efficient interoperability umbrella has been evaluated by following a very strict schedule for the definition (all together), implementation (each partner) and testing (all together) of the interoperability framework within less than six months. This was supported by a series of organised workshops, during which the approach was validated functionally through intensive interoperability testing of the independent components.

• Applications. In the 2WEAR project the main emphasis has been put on providing feasibility demonstrators, to show that technology can be developed that exhibits some distinctive ubiquitous computing properties. This practically constitutes the starting point for further efforts to pursue best practices, so that the artifacts produced by the employment of such new enabling technology may become far more usable and attractive, mainly by targeting usability demonstrators. However, the project's initial involvement in the experimental use of such new types of applications by end-users required some form of validation of the user interface design decisions, to ensure overall acceptance and usability.


2 DEVELOPMENT INSTRUMENTS

The internal development instruments mainly concern the particular incarnation of our software development strategy for the 2WEAR demonstrator applications, as realised by means of:

• Software system architecture;

• Application Programming Interfaces (APIs);

• Programming patterns and regulations.

Naturally, the ability of the development instruments to meet the particular goals set up by the 2WEAR project is verified by the feasibility of the demonstration applications. The latter satisfy the futuristic application scenario specification, and in particular the requirements mostly related to dynamic User Interface re-configuration.

This reflects well-known practices in the software development domain, based on which the appropriateness of the tool is validated once it enables the construction of the target software artefacts within the specified timeframe and by consuming only the initially allocated human resources (NASA SEL, 1995).

However, as part of our exploitation plans regarding the Voyager User Interface Framework (UIF), we are planning its larger-scale employment for new categories of distributed dynamically configurable applications (e.g. file manager, agenda, health monitor, smart-home control panel, etc.), as well as its distribution to a larger community of programmers than those engaged in the course of the 2WEAR project. In this context, it has been of strategic interest to set up a background evaluation process for software production, employing internationally recognised techniques in this context, such as software metrics and software process improvement (NASA SEL, 1995), in conjunction with usability evaluation.

2.1 APPROACH

The approach followed has been based on subjective usability evaluation, through semi-formal questionnaires (i.e., participating subjects had to respond to certain questions with scores, but could also provide textual explanations if necessary), emphasizing qualitative feedback (i.e., a small number of subjects, with in-depth information collection) rather than statistical significance (i.e., a large number of subjects, with more shallow information collection). This approach fits well with the purpose of the study, which is to provide an early assessment of four specific metrics, as indicated in Figure 2.1.1.


Metric to be assessed Purpose of assessment

Ease of use

Measures whether the tool can be easily deployed by programmers, reflecting a typically small learning curve, while supporting style consistency together with an economy of features, thus offering an optimally selected set of programming elements.

Appropriateness

Measures the programming model distance between the particular problem domain (i.e. distributed dynamically re-configured User Interfaces), and the repertoire of programming elements and components offered to solve it (i.e. the UIF software library). The smaller this distance, the more appropriate the tool for its particular domain.

Robustness

Measures the presence of internal defects and operational malfunctions, together with the tolerance of either external failures (e.g. network breakdown) or tool misuse (i.e. the programming elements are not used the way they should be). The smaller the number of internal defects, and the higher the tolerance, the more robust a development tool is.

Process improvement

Measures the practical benefits which lead to software-process improvement, due to the employment of a particular tool, in comparison to more traditional production methods. Typically, reduction in various key parameters is monitored: newcomer engagement time, development time, defect fixing time, average design-implement-test cycle time, etc.

Figure 2.1.1: The key assessment metrics in the evaluation of the development tools (i.e. the User Interface Framework).

To effectively assess the above metrics, four different questionnaires have been formulated. Those were given and explained to all the programmers engaged in the development process prior to the initiation of the implementation phase. The questionnaires for each category are explained below.

Generally, questionnaires for development tool evaluation are rarely published, since they mostly constitute in-house properties of companies that actually commercialise software development tools. The available literature mostly accounts for specific research-oriented systems, with informal expert evaluations (i.e. no users are engaged, but experts evaluate certain features).

One such method, which has driven the evaluation of the APIs of multi-platform User Interface programming toolkits, is provided in (Savidis et al., 1997), developed in the context of the ACCESS TP-1001 EU Project. The questionnaires developed in (Savidis et al., 1997) evaluate the following elements:

• Adequacy of the programming model;

• Ease of understanding the programming model;

• Consistency of the programming model;

• Appropriateness with respect to the underlying semantics.


Additionally, one of the most acknowledged methods to evaluate software architectures during the development cycle, in order to allow early assessment and modification, has been employed in the context of the UIF development. This method, known as the Quality Attribute Workshop (QAW), has been developed at the Software Engineering Institute of Carnegie Mellon University (SEI/CMU, 2003). Historically, SEI/CMU has conducted activities in the domain of User Interface development tools. One such tool is the SERPENT User Interface development tool, which was released as open-source in the early 90s. In the mid-90s, due to the large number of UI tool projects, the Human Computer Interaction Institute (HCII) was founded as a spin-off of SEI/CMU, mainly focusing on high-end UI development tools and methods. In this context, many of the methods developed at SEI/CMU had been either initiated or applied in the context of UI tool development efforts, which makes those methods even more appropriate for the 2WEAR UIF.

Figure 2.1.2: Overview of the QAW (original picture taken from http://www.sei.cmu.edu/publications/documents/02.reports/02tn013/02tn013.html).

The QAW method is based on scenario generation, test case development, test case architecture analysis and presentation of results. In the context of the 2WEAR project, the early test scenario was a “distributed e-mail application” (described in D10), the early test case development corresponded to the simulation of remote resources with local GUI components (the simulation scenario, also discussed in D10), while the test case architecture design and presentation, reflecting the results of the simulation experiment, led to the first version of the UIF. Following the initial pilot UIF development stage, the test scenarios were substituted by the futuristic application environment specification (real scenarios), while the more systematic on-going evaluation process, which is described below, was initiated.


2.1.1 EASE-OF-USE ASSESSMENT QUESTIONNAIRE

A. Was it difficult to understand the role of the various programming elements?
(each item is rated: Very difficult | Difficult | Normal | Easy | Very easy)

Classes
Member functions
Function arguments
Return types
Error codes, exceptions
Constants, macros
Enumerated types
Overloaded members
Template classes
Template functions

Feel free to provide any comments that may explain your score or provide some additional information to the assessment. In case you score on the negative side, please elaborate by providing information on the specific element (e.g. BeepOutputResource class, Release function).

The following part of the questionnaire concerns the various packages of the UIF as those are described in D10. Additionally, the questionnaire elaborates on specific features of each UIF package.

B. Was it difficult to understand the functional result of API method calls?
(each item is rated: Very difficult | Difficult | Normal | Easy | Very easy)

Abstract UI objects:
Selector
Text entry
Valuator
Confirm box
Radio group
Message box

IO resources basics
IO resources extensions

UI kernel:
Resource allocator
Resource manager
Focus manager
Main loop
Client control

Feel free to provide any comments that may explain your score or provide some additional information to the assessment. In case you score on the negative side, please elaborate by providing information on the specific API element and function-call considered as problematic; it is desirable that you supply accompanying source code which identifies the problem.

C. Was it difficult to program with the abstract UI objects?
(each item is rated: Very difficult | Difficult | Normal | Easy | Very easy)

Creation, destruction
Focus control
Content management
Call-back management
Deriving behaviours
Interface construction
Dialogue flow control

Feel free to provide any comments that may explain your score or provide some additional information to the assessment. In case you score on the negative side, please elaborate by providing information on the specific object and facility which was problematic (e.g. Selector class, SetCurrOption function).
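For orientation, the following is a minimal sketch of the kind of client code the above items refer to; apart from the Selector class and the SetCurrOption function, which are named above, every identifier is an illustrative assumption and not the actual Voyager UIF API.

    #include <cstddef>
    #include <functional>
    #include <iostream>
    #include <string>
    #include <vector>

    // Hypothetical sketch only: apart from Selector and SetCurrOption,
    // which the questionnaire names, the identifiers below are
    // illustrative assumptions, not the actual Voyager UIF API.
    class Selector {
    public:
        void AddOption(const std::string& opt) { options.push_back(opt); }      // content management
        void SetCurrOption(std::size_t i) { current = i; }                      // named above
        void SetOnSelect(std::function<void(std::size_t)> cb) { onSelect = cb; } // call-back management
        void Select() const { if (onSelect) onSelect(current); }                // simulates a user choice
    private:
        std::vector<std::string> options;
        std::size_t current = 0;
        std::function<void(std::size_t)> onSelect;
    };

    int main() {
        Selector city;                                  // creation
        city.AddOption("Heraklion");
        city.AddOption("Athens");
        city.SetCurrOption(0);
        city.SetOnSelect([](std::size_t i) {            // dialogue flow control
            std::cout << "selected option " << i << "\n";
        });
        city.Select();
    }                                                   // destruction at scope exit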

2.1.2 APPROPRIATENESS ASSESSMENT QUESTIONNAIRE

In this context, the three key features which had to be evaluated concerned: (a) the external software architecture, i.e. the one visualised to the programmer as a general skeleton on which the implemented system is structured; (b) the APIs; and (c) the programming patterns, being some key micro-structures which define how API calls have to be made (sequencing, conditionality, presence).
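As an illustration of what such a micro-structure looks like, consider the following hedged sketch of an allocate-test-use-release pattern; the ResourceAllocator and Resource names echo the UI kernel terminology used in the questionnaires, but the calls themselves are assumptions, not the actual UIF API.

    // Hypothetical sketch of a programming pattern: the sequencing
    // (Allocate before use, Release last), conditionality (viability
    // test before use) and presence (Release must not be omitted) of
    // the calls are fixed. ResourceAllocator/Resource are illustrative
    // names, not the real UIF types.
    struct Resource { bool viable; };

    struct ResourceAllocator {
        Resource* Allocate() { return new Resource{true}; }
        void Release(Resource* r) { delete r; }
    };

    void UseResourcePattern(ResourceAllocator& alloc) {
        Resource* r = alloc.Allocate();   // presence: allocation comes first
        if (r == nullptr) return;         // conditionality: check before use
        if (r->viable) {
            /* ... drive the remote I/O resource ... */
        }
        alloc.Release(r);                 // sequencing: release always last
    }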

A. Do you think the internal architectural components fit the domain problem?
(each item is rated: Appropriate | Likely ok | Neutral | Likely wrong | Inappropriate)

Channels
Device resources
Abstract objects
Dialogue styles
Preference manager
Focus control
Dialogue flow control

Feel free to provide any comments that may explain your score or provide some additional information to the assessment. In case you score on the negative side, please elaborate by providing information on the specific component you consider as inappropriately engaged. You may suggest adding more components, or merging some, justifying your choice.

Savidis & Stephanidis, 2003, Interacting with the Disappearing Computer

Page 10: Interacting with the Disappearing Computer: Evaluation of ... · Interacting with the Disappearing Computer: Evaluation of the Voyager Development Framework Anthony Savidis, Constantine

Page 10

B. Do you think the external architectural components fit the domain problem?
(each item is rated: Appropriate | Likely ok | Neutral | Likely wrong | Inappropriate)

Application manager
IORM Server
IORM Proxy
UI Device Server
UI Resource Components
Client Application

Feel free to provide any comments that may explain your score or provide some additional information to the assessment. In case you score on the negative side, please elaborate by providing information on the specific component you consider as inappropriately engaged. You may suggest adding more components, or merging some, justifying your choice.

The following questionnaire assesses whether the designed architecture's control-flow specification effectively implements the real demands of concrete run-time use scenarios.

C. Do you think the scenarios are addressed by the designed control flow?
(each scenario is rated: Appropriate | Likely ok | Neutral | Likely wrong | Inappropriate)

UI device discovery
UI device loss
Client application start
Client application close
Switch to client application
Switch to AM
Automatic configuration
Manual configuration
AM failure
Client application failure
Client dialogue stall
AM dialogue stall
Global suspend
Global resume

Feel free to provide any comments that may explain your score or provide some additional information to the assessment. In case you score on the negative side, please elaborate by providing information on the specific scenario(s) that you think cannot be handled by the designed control flow logic.


The following questionnaire requires in-depth knowledge of the structure of the architecture. Since the QAW method is prescriptive not so much about how the architecture is to be judged, but mostly about how the assessment process is to be fused with the development process, informal systematic methods to deliver architectural information were employed.

D. Please score the following properties of the architectural packages.
Each of the following packages is scored for Cohesion, Coupling and Modularity, each rated: Very much | Pretty much | Neutral | Probably not | Sure not.

overall
IO resource basics
IO resource extensions
UI kernel
IO task base
IO task instantiation
Protocol base
Protocol extension

*Cohesion is to be judged for elements of the same package (do they fit together?).

*Coupling is to be judged for dependencies among elements of different packages (are they kept loose?).

*Modularity is to be judged for the splitting into packages (does the split ensure the highest cohesion and the loosest coupling, internally and externally?).

Feel free to provide any comments that may explain your score or provide some additional information to the assessment.


In particular, explanations and documentation, largely based on the detailed information provided within D10, have been used. Additionally, software walkthroughs, based on acquiring hands-on experience with the software library and its source code, in an on-line tutorial fashion, have been carried out. Finally, informal architecture presentations, using the Class - Responsibilities - Collaborations (CRC) method (Beck and Cunningham, 1989), in conjunction with typical class inheritance and association diagrams, have been provided. In Figure 2.1.2.1 the initial CRC design of the UI kernel is depicted, while in Figure 2.1.2.2 the inheritance and association diagram for the IO resource basics / extensions and the Protocol basics / extensions packages is shown. Both were presented to the participants of the QAW study.

Figure 2.1.2.1: CRC initial design of the UI kernel as discussed in a QAW early session.
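For readers unfamiliar with the notation, a CRC card is a simple three-part record listing a class, its responsibilities and its collaborators. The card below is a hypothetical illustration of the format, using UI kernel component names from the questionnaires; the responsibilities listed are assumptions, not a reproduction of the actual card of Figure 2.1.2.1.

    Class: Resource manager (illustrative)
    Responsibilities:
    - keep track of discovered remote I/O resources
    - map I/O resources to abstract UI objects
    - report resource loss to interested clients
    Collaborators: Resource allocator, Focus manager, Client control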


[Class diagram image. Recoverable class names include: AckResource, DevResource, StdProtoSuper, StdProto, 2WEARStdProtoHCI, ToDevResourceProtoSuper, ToDevResourceProto2WEAR, ToDevResourceProtoHCI, NetLink, SocketLinkHCI, 2WEAR NetAPI, InputResource, InputEvent, FromInputResourceProto(Super/2WEAR/HCI), ButtonInputResource, ButtonInputEvent, FromButtonInputResourceProto(Super/2WEAR/HCI), InputResourceController, OutputResource, OutputEvent, ReturnedParms, SingleTextLineOutputResource, SingleTextLineOutputEvent, SingleTextLineReturnedParms, ToSingleTextLineOutputResourceProto(Super/2WEAR/HCI), FromOutputResourceProto(Super/2WEAR/HCI), OutputResourceController, FromDevResourceProto(Super/2WEAR/HCI), PendingReturnedParmsController.]

Figure 2.1.2.2: Working class inheritance and association diagram for the IO resource basics / extensions and Protocol basics / extensions packages, as discussed in an early QAW session.


2.1.3 ROBUSTNESS ASSESSMENT QUESTIONNAIRE

The assessment of operational robustness is a test-first process, meaning it can only be applied once reasonably stable versions of the software system become available. Additionally, robustness is usually evaluated for the three main generations of a software system, namely beta, alpha and final. Normally, each newer generation should be more robust than the previous one with respect to similar operations and functional features. In the performed study, software testers were engaged, who used the system in the following quite extreme way:

• No particular use-scenario is followed;

• No conditionality on operations for fault-free operation is assumed;

• Every available feature is used exhaustively and iteratively;

• Destructive operations (such as those falling in the “remove”, “delete”, “stop”, “kill” family) are repeatedly applied;

• The basic operations with numerous alternative parameters (in the specific case, manual or automatic re-configuration by continuously moving close to or far from devices) are applied countless times;

• Input to the system is provided even when the system does not seem ready to accept it;

• Warning messages are ignored: the user continues behaving with the system as if everything were normal (a sketch of such a stress loop is given below).
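The following is a minimal sketch of such an unconditioned stress loop; UifSystem and its three operations are hypothetical stand-ins for the system under test, not the actual 2WEAR test harness.

    #include <cstdlib>
    #include <ctime>

    // Hypothetical stress-testing loop in the spirit of the list above;
    // all identifiers are illustrative assumptions, not the real harness.
    struct UifSystem {
        void Reconfigure() { /* manual/automatic re-configuration */ }
        void Remove()      { /* destructive "remove/delete/stop/kill" family */ }
        void Inject()      { /* input even when the system seems not ready */ }
    };

    int main() {
        std::srand(static_cast<unsigned>(std::time(nullptr)));
        UifSystem sys;
        for (long i = 0; i < 100000; ++i) {   // exhaustive, iterative use
            switch (std::rand() % 3) {        // no particular use-scenario
            case 0: sys.Reconfigure(); break;
            case 1: sys.Remove();      break; // destructive ops repeatedly
            case 2: sys.Inject();      break; // unconditioned input
            }
        }
    }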

The questionnaire for robustness, targeted at such software testers, is quite simple and emphasises the collection of statistical information. The results of such a study provide valuable information for the bug fixing and software update process, so as to reach the next, hopefully more stable, system generation. The questionnaire has to be filled in for each different use session (hence, a tester had to fill in multiple instances of the following questionnaire), and for each different defect that is detected.

Please document all detected software defects. Use one form per defect.

Defect explanation: _______________________________________________

Defect type: Fatal | Non-fatal, side effects | Non-fatal, isolated

Defect appearance: Always | Canonical | Reproducible | Seems random

Defect effect on operation: Cancelled always | Cancelled sometimes | Mostly error completion | Mostly fine completion | Not affected at all

Defect system level: User Interface | Functionality | Data | OS level

Defect frequency: In _______ cases out of ________, the defect appears.


2.1.4 PROCESS IMPROVEMENT ASSESSMENT QUESTIONNAIRE

This has been the only part of the overall development tools study that actually required comparative assessment. The objective of this study was to clearly identify the benefits to the software development process inherent in the introduction of a large-scale software framework with higher-level programming entities, like the UIF, on top of a basic software library like the interoperability library (i.e. 2WEAR channels), given that, from the feasibility point of view, the latter still suffices to “do the job”. Hence, the key question of the study was:

What if we do not introduce any UIF at all, but instead try to develop all 2WEAR demonstrator applications by employing only the basic protocols and the library of channels? Can such a development effort be completed successfully, and in a shorter time, in comparison to the case where the UIF is employed?

The questionnaire for this study is split into two parts. The first is a checklist of all the potential software elements that are engaged in demonstrator application development, trying not to be biased towards the one or the other development approach. In this respect, the checklist extrapolates the likely appearing software patterns, due to the desirable features of the applications, which should be translated into fully working source code by the two approaches. The second part is the actual questionnaire that has to be filled in by each participant (i.e. application programmer), aiming to collect some key quantitative (objective) and qualitative (subjective) metrics.

Software patterns to be implemented for demonstrator applications

1. Remote UI I/O device management

• For each different UI I/O resource category

2. Registry of UI I/O devices

3. Registry of UI I/O resources per device

4. UI element notion for user interaction

• It is up to developers to decide what element model to implement

5. UI element configuration per element category

• Engaged UI I/O resources

6. Ranking of configurations

7. Dynamic selection of configurations

8. Testing if a configuration is viable

9. Automatic (preferences) configuration

10. Manual (user) configuration

11. Focus element management

12. Client application main loop

13. Client focus application

14. Client application focus switching

Savidis & Stephanidis, 2003, Interacting with the Disappearing Computer

Page 16: Interacting with the Disappearing Computer: Evaluation of ... · Interacting with the Disappearing Computer: Evaluation of the Voyager Development Framework Anthony Savidis, Constantine

Page 16

Software implementation metrics to be documented for each pattern

Time to complete: [___] months, [___] weeks, [___] days [_] ready

Source lines of code (SLOC) required: [_____________]

Defects caused in the process: [_____________]

Defects still remaining: [_____________]

Reusability of implementation: High | Good | Some | Limited | No way

Percentage of defect fixing time: [____]% of time to fix bugs

Tolerance for new features (re: entropy): High | Good | Some | Limited | No way

Controllability characterisation: High | Good | Some | Limited | No way

2.2 PROCESS

The way the overall evaluation process has been planned is depicted in Figure 2.2.1. Prior to each phase, there have been informal sessions presenting all key aspects of the UIF, at various levels of technical detail. The evaluation process has been organised so that it would be initiated at the starting date of the demonstrator application development, which was chronologically placed at the beginning of July 2003. The assessments for ease-of-use, process improvement and robustness were actually fused with the development of the beta release of the demonstrators. Additionally, the process improvement assessment required the parallel development of one of the demonstrators without the use of the UIF.

[Figure: three-month time-plan. Month 1 (June 2003): application scenarios, software challenges, functional features. Month 2 (July 2003): UIF introduction, external architecture, internal architecture. Month 3 (August 2003): packages, APIs, coding guidance. The ease-of-use, process improvement, robustness and appropriateness assessments run within the demonstrator applications' beta release production phase.]

Figure 2.2.1: The time-plan of the evaluation process, combined with the development of the beta demonstrator versions.


2.3 PARTICIPANTS

The participants in the assessment process varied for each evaluated subject. Since this process had to be fused with the target demonstrator application development phase, all people engaged in the study were also assigned a real development role (i.e. not just for the purposes of the evaluation). Hence, the results of the study reflect real demands during a three-month development period.

Participant profile | No | Assessment engagement
Senior software engineer and system architect | 2 | Appropriateness, ease-of-use
Senior programmer | 1 | Ease-of-use, process improvement, robustness
Junior programmer (student) | 1 | Process improvement, robustness
Junior programmer (student) | 1 | Process improvement

Figure 2.3.1: The participants in the assessment process and their assessment engagements.

2.4 RESULTS

This section presents the quantitative analysis of the results, showing how the participants replied to the questionnaires. For each of the questionnaires, each column, from left to right, corresponds to a question, preserving the order of appearance. Each ‘star’ inside a cell indicates the value assigned by one participant.

2.4.1 EASE-OF-USE ASSESSMENT RESULTS

A. Was it difficult to understand the role of the various programming elements?
Worst: -
Bad: -
Neutral: -
Good: * ** ** * * ** *
Best: ** *** *** * * ** *** ** * **


B. Was it difficult to understand the functional result of API method calls?
Worst: -
Bad: -
Neutral: -
Good: ** * * * * * *** **
Best: *** *** * ** ** *** *** ** *** ** ** *** * ***

C. Was it difficult to program with the abstract UI objects?
Worst: -
Bad: -
Neutral: -
Good: **
Best: *** *** *** *** * *** ***

2.4.2 APPROPRIATENESS ASSESSMENT RESULTS

A. Do you think the internal architectural components fit the domain problem?
Worst: -
Bad: -
Neutral: -
Good: ** ** * *
Best: ** * ** ** *

B. Do you think the external architectural components fit the domain problem?
Worst: -
Bad: -
Neutral: -
Good: * ** *
Best: ** * ** ** *

C. Do you think the scenarios are addressed by the designed control flow?
Worst: -
Bad: -
Neutral: *
Good: * * * * * ** * *
Best: ** ** * * * ** ** ** * ** * *


D. Please score the following properties of the architectural packages
Worst: -
Bad: -
Neutral: -
Good: * * * ** * * * ** * * ** * ** * ** **
Best: * ** * * ** * ** * ** * ** * * ** * ** ** ** * **

2.4.3 ROBUSTNESS ASSESSMENT RESULTS

The robustness questionnaires were supplied to the evaluation team every week over the two-month period of application development and evaluation. In this context, eight collection rounds were completed, the results of which are summarised in Figure 2.4.3. As shown, the frequency of defect appearance decreases with time, demonstrating that the bug fixing process was effective. Additionally, the stabilisation of the overall UIF and demonstrator applications over time is clearly indicated.

[Chart: total defects per week, for weeks 1 to 8, with one data series per programmer; counts fall from the low hundreds in the early weeks to a few tens by week 8.]

Figure 2.4.3: Defect frequency distribution per development week. The two data series correspond to the two programmers engaged in the assessment process. The variations between the series are due to the different components employed by each programmer.


2.4.4 PROCESS IMPROVEMENT ASSESSMENT RESULTS

In total, each of the three participants in the process improvement evaluation was provided with fourteen (14) questionnaires, one for each of the software patterns that had to be implemented. Two of the participants employed the complete UIF for the demonstrator application development, while the third was required to implement the CityGuide application from scratch, using only the interoperability software library (channels). The results for all the various software patterns have been consolidated into one questionnaire for each of the participants, through the following simple rules (a sketch of the consolidation follows the list):

• If a software pattern was not fully implemented within the designated timeframe, the participants were required to make a subjective estimate of all the affected metrics: time to complete, SLOCs, and defect metrics;

• All numeric metrics have been averaged;

• All non-numerically scored metrics have been averaged, assuming 1 to be the highest logical score, and 5 to be the lowest.
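A minimal sketch of these consolidation rules, under the assumption that the ordinal scale High / Good / Some / Limited / No way maps to 1 (highest) through 5 (lowest); the function names are illustrative only.

    #include <map>
    #include <numeric>
    #include <string>
    #include <vector>

    // Numeric metrics are averaged directly.
    double Average(const std::vector<double>& v) {
        return std::accumulate(v.begin(), v.end(), 0.0) / static_cast<double>(v.size());
    }

    // Ordinal scores are mapped to 1 (highest) .. 5 (lowest), then averaged.
    double AverageOrdinal(const std::vector<std::string>& scores) {
        static const std::map<std::string, double> scale = {
            {"High", 1}, {"Good", 2}, {"Some", 3}, {"Limited", 4}, {"No way", 5}};
        std::vector<double> mapped;
        for (const std::string& s : scores) mapped.push_back(scale.at(s));
        return Average(mapped);
    }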

[Chart: normalised values of eight process metrics (1: Time, 2: KLOCs, 3: Bugs out, 4: Bugs in, 5: Reuse, 6: %Bug fixing, 7: Entropy, 8: Control); y-axis: metric normalised value, range 0 to 120.]

Figure 2.4.4: Process improvement assessment results. The higher the value of the metric, the lower the effectiveness and efficiency of the process. In each pair of bars, the left bar corresponds to the use of the UIF, while the right bar concerns the development from scratch.


2.5 CONSOLIDATION

The final evaluation results of the comprehensive study performed have been displayed in the previous section. Similar questionnaires have been completed and analysed for each of the three assessment phases, namely the initial, interim and final phases. The results of the final assessment phase indicate the following:

2.5.1 Ease-of-use

The UIF has been considered particularly easy to deploy, while most participants scored the usability-related choices of the APIs at the best level. When it comes to specifically scoring the ease of understanding the operational effect of function calls, most of the participants chose to characterise it as good, meaning there is some room for improvement in making API calls more self-documenting as to what they actually cause; but overall the result is largely acceptable.

2.5.2 Appropriateness

The results on appropriateness were also very good, but less enthusiastic in comparison to ease-of-use. Typically, when judging the way a development approach solves the problem, evaluators were far more sceptical, being more reluctant to characterise a software design choice as optimal. Overall, the scores that were given fall somewhere between best and good, which we could characterise as very good. In one case, a participant expressed the opinion that, without really experiencing the use of the UIF as such, it was hard to judge whether the UIF could handle some application scenarios in an optimal way; in fact, it was suggested that this should be retrospectively considered in accordance with the Process Improvement assessment. This participant was given the opportunity to reconsider the scores after conducting discussions with the programmers engaged in the Process Improvement assessment. Looking at the overall outcome of this evaluation process, it is clear that the overall architectural decisions can probably be enhanced; however, the current choices can safely be characterised as very good, after a quite intensive investigation of the various software architecture features and their appropriateness in matching the particular problem domain.

2.5.3 Robustness

The robustness results provide objective measures showing clearly that software defects are reduced in a better-than-linear way, while at the end of the development phase the observed defects are fewer than ten. For a software system of this size, i.e., 65 KLOCs (thousands of lines of code), the average distribution of defects is really very small (i.e., ~1 defect per 6.5 KLOCs), objectively falling within the category of very robust software systems.

2.5.4 Process improvement

In this assessment process, the results were somewhat more predictable. It was already known that the complexity of the domain, as well as the demanding features of the futuristic application scenarios, which were decided very early in the 2WEAR Project, would introduce a lot of difficulties for any programmer without the help of a comprehensive large library like the UIF. The assessment process simply provided evidence for these expectations. It was purposefully decided to assign the difficult role of implementing the CityGuide application from scratch to a more experienced programmer than the one actually developing the CityGuide via the UIF. The result, with the objective measurements appearing in Figure 2.4.4, has clearly demonstrated that, within a typical application development timeframe of two months, it is utterly unrealistic to pursue the development of a single application from scratch by a single programmer. The CityGuide application, without the use of the UIF, was actually never completed. After the first month of the test implementation, the responsible programmer started discussions with the lead software designer of the UIF, stating that unless some re-usable primitives are structured, the implementation at the protocol level tends to become really chaotic and uncontrollable after a certain point. The Process Improvement assessment proved something which is common wisdom in software development: unless appropriate tools are provided, and common behavioural patterns are delivered as re-usable software entities, similar solutions for similar problems will have to be redeveloped many times. In this context, the UIF provably improved the process of developing the demonstrator applications, and of distributed, dynamically composed interactive applications in general.


3 INTEROPERABILITY FRAMEWORK

The interoperability framework concerns the development approach taken by the 2WEAR consortium to resolve dynamic run-time service utilisation, while enabling clients and services to run on different machines hosting different operating systems. For evaluation purposes, it is logically split into three parts:

• Specification approach. In 2WEAR this is done in an EBNF grammar.

• Operational semantics. The run-time execution semantics, explaining how each grammar expression maps to protocol-level actions and events.

• Implementation library. The transformation to a software library compliant to the operational semantics of the interoperability framework.

The specification approach and the operational semantics were agreed early, during a consortium meeting in London in fall 2002. The results of this meeting were consolidated and reported in Deliverable 9, Report on System Interoperability and Device Interactions. Then, each partner independently undertook the effort of building an in-house version of the interoperability library, followed by a carefully designed test-plan to verify that all independent implementations, carried out in different programming languages and on different platforms, were 100% compatible with each other.
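To make the three parts concrete, here is a minimal hypothetical sketch of client-side use of such a library; the Channel and Token names echo the terminology of this section, but the API itself is an assumption, not the actual 2WEAR interoperability library.

    #include <cstdint>
    #include <string>
    #include <vector>

    // Hypothetical sketch: tokens are the atomic communication units and
    // a channel is a dialogue with a remote service; every identifier is
    // an illustrative assumption, not the 2WEAR interoperability library.
    struct Token {
        std::vector<std::uint8_t> bytes;   // token encoding is protocol-defined
    };

    class Channel {
    public:
        bool Establish(const std::string& service) { return true; }  // stub
        void Send(const Token&) {}                                   // stub
        Token Receive() { return Token{}; }                          // stub
    };

    int main() {
        Channel ch;
        if (!ch.Establish("clock-service"))    // channel establishment
            return 1;
        // A logical message is decomposed into a sequence of tokens whose
        // order and types are dictated by the EBNF protocol specification.
        ch.Send(Token{{'G', 'E', 'T'}});
        Token reply = ch.Receive();
        (void)reply;                           // interpret per the protocol
        return 0;
    }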

[Figure: a service protocol specification, with which both the server-side implementation (built on the server version of the interoperability library) and the client-side implementation (built on the client version) must comply.]

Figure 3.1: Independent implementation of servers and clients, relying upon the protocol definition agreement and the interoperability framework.

Based on this approach, the independent implementation of services was straightforward, and relied on the agreement of the various protocol definitions (Figure 3.1).


3.1 APPROACH

The evaluation approach relies on subjective usability evaluation with a small group of participants, but with a long evaluation period involving real practice and experience with the interoperability framework. All the 2WEAR consortium partners participated in this study. Informal questionnaires were set up to identify the usability, as well as the effectiveness, of the interoperability framework. Every consortium partner was requested to reply to the questionnaire only after all the developments had been completed, so as to ensure that the consolidated wisdom could be captured in their responses.

A. Usability questionnaire. Specification approach. Ease of use.
(each item is rated: Very easy | Easy | Normal | Difficult | Very difficult)

Reading protocols
Writing protocols
Revising protocols

You may fill in this questionnaire only if you have been engaged in making a client for a service (i.e. you had to read and understand the protocol), or had to write a protocol (either as a service provider or as a documenter), or had to revise your own or another partner's protocols.

B. Usability questionnaire. Operational semantics. Ease of understanding.
(each item is rated: Very easy | Easy | Normal | Difficult | Very difficult)

Atomic tokens
Dialogues (channels)
Token encoding
Channel establishment
How to make messages
How to control token flow

You may fill in this questionnaire only if you have been engaged in developing either clients or servers, or both, meaning you necessarily had to understand in detail all the operational semantics. “How to make messages” concerns the way you had to decompose communication into tokens in order to send logical messages.

A. Effectiveness questionnaire. Implementation approach.
(each item is rated: Very good | Good | Normal | Bad | Very bad)

Compactness
Robustness
Flexibility
Interoperation
Modifiability
Extensibility
Performance

Since the library was implemented by each partner independently, this questionnaire does not ask partners to assess how well they implemented the library, but how effective the model itself was in allowing an implementation with the above qualities. In other words, you score “how well does the interoperability model allow you to provide a high-quality implementation”.


Compactness. Does it allow for tiny-scale libraries (i.e. less than 5 KLOCs)?

Robustness. Does it allow for low-defect implementations (i.e. the more formal, the more robust)?

Flexibility. Does it allow for various communication policies to be implemented?

Interoperation. Does it enable run-time interoperation among diverse components (language-wise, OS-wise)?

Modifiability. Is taking up a protocol modification far from a painful job?

Extensibility. Is adding new protocol features a work orthogonal to the library?

Performance. Does it introduce a small number of layers for communication-related processing?

3.2 PROCESS

The assessment of the interoperability framework has been carried out in two phases: (a) an initial continuous informal process, following the formulation of the approach (in fall 2002), for testing and verifying the adopted approach, supported by a concrete consortium-agreed test-plan and a few bilateral workshops; and (b) a final formal assessment phase, through questionnaires, after all the developments had been completed. The results of the final assessment process are reported in this Section.

3.3 SUBJECTS

All partners of the 2WEAR consortium engaged in the definition and/or implementation of service protocols participated in the evaluation process of the interoperability framework.

3.4 RESULTS

In total, six (6) questionnaires were filled in by the consortium partners. Each questionnaire was given the same weight, since the overall interoperability framework had been discussed and agreed unanimously by all consortium partners.

Usability A       Reading protocols   Writing protocols   Revising protocols

Very easy               **                  ***                  **
Easy                    **                  *                    **
Normal                  *                   *                    *
Difficult               -                   -                    -
Very difficult          *                   *                    *


Usability B

Very easy         ** ** ** * *
Easy              ** ** * *** **
Normal            ** ** ** ** ** ***
Difficult         * ***
Very difficult    *

Effectiveness A   Compactness  Robustness  Flexibility  Interoperation  Modifiability  Extensibility  Performance

Very good              *           *           ***           ***             *              *              *
Good                   *           ***         ***           ***             **             **             **
Normal                 ****        *           -             -               **             **             **
Bad                    -           *           -             -               *              *              *
Very bad               -           -           -             -               -              -              -

3.5 CONSOLIDATION

Overall, the interoperability framework proved to achieve its objectives, since all independent services and applications managed to interoperate as required by the futuristic application scenario. However, the main question of the assessment process was: have we done the job in a usable and effective way throughout the course of the project? Following the results of the assessment process, the answer to this question is clearly positive. At the same time, the results are definitely not enthusiastic, indicating on the one hand that none of the respondents was biased by having contributed to the interoperability framework, and showing on the other hand that there is plenty of room for improvement.

The least positive results concerned the ease of use of the framework, meaning that its formal approach to communication semantics seemed somewhat difficult to some programmers; this is normal in formal protocol specification, where there is a natural trade-off between accuracy of representation and ease of use. In contrast, the most positive results were observed in the effectiveness evaluation, meaning that the adopted framework, though somewhat difficult to deploy, was quite powerful for the project's specific purposes.


4 DEMONSTRATOR APPLICATIONS

4.1 APPROACH

The approach employed for the evaluation of the demonstrator applications has mainly emphasized the extraction of information concerning the overall interface quality. Subjective evaluation aims to provide tools and techniques for collecting, analysing and measuring the subjective opinion of users when using an interactive software system. Subjective measures are quantified and standardised indicators of psychological processes and states as perceived by the individual user.

Typically, subjective evaluation, if it is to deliver maximum benefits, requires a fairly stable prototype of the interactive system. It is a type of usability evaluation which tries to capture the satisfaction of end users when using a particular system. It therefore does not rely upon an expert's opinion or the opinion of any other intermediary evaluation actor. Subjective evaluation has recently received substantial attention amongst researchers, resulting in the development of several new techniques which are progressively being taken up by industrial organisations. Some of the benefits of subjective evaluation versus other engineering techniques for usability evaluation include the following:

• Subjective evaluation measures the end-user's opinion, which is frequently dismissed by other engineering approaches to usability evaluation.

• Subjective evaluation techniques are reliable and valid, as well as efficient and effective, in comparison with the available alternatives.

• When conducting subjective usability evaluation, the analyst should carefully select the sample users to be representative of the target user group.

4.1.1 TECHNIQUES AND TOOLS FOR SUBJECTIVE EVALUATION

There are several techniques available for subjective evaluation. These include interviews, which may be structured or unstructured, the use of diary studies, as well as "think-aloud" methods. Various questionnaire techniques have also been successfully introduced and widely used in subjective evaluations.

Questionnaires may be factual, attitude-based or survey-type. Some of the most popular questionnaires which have been developed include the QUIS questionnaire, developed by Kent Norman at the University of Maryland and subsequently refined by Ben Shneiderman (Shneiderman, 1992, pages 485-493), the IBM Computer Usability Satisfaction Questionnaires (Lewis, 1995), and the SUMI questionnaire (Kirakowski, 1994). A detailed examination of these questionnaires and the accompanying literature is of course beyond the scope of this deliverable. However, we will briefly elaborate on some of the details of the IBM Computer Usability Satisfaction Questionnaires, which constitute the technique selected for the evaluation of the demonstrator applications.


4.1.2 IBM COMPUTER USABILITY SATISFACTION QUESTIONNAIRES

These questionnaires constitute an instrument for measuring the user's subjective opinion in a scenario-based situation. Two types of questionnaires are typically used: the first, namely the ASQ (After-Scenario Questionnaire), is filled in by each participant at the end of each scenario (so it may be used several times during an evaluation session), while the other, namely the CSUQ (Computer System Usability Questionnaire), is filled in at the end of the evaluation (one questionnaire per participant). Both questionnaires are attached in the Appendix.

The primary criteria used to select the IBM Computer Usability Satisfaction Questionnaires, as opposed to another technique, include the following. First of all, these questionnaires are publicly available for anyone to use, whereas the alternatives require the acquisition of a licence from their vendors. Secondly, and perhaps most importantly, the IBM Computer Usability Satisfaction Questionnaires have been shown to be highly reliable (reliability coefficient of 0.94).

With respect to SUMI, it should be mentioned that it is equally reliable, but its reliability derives from a larger number of questionnaire items; in this initial study, we could engage only three participants. Thirdly, the IBM Computer Usability Satisfaction Questionnaires do not require any special software and are not computationally demanding, whereas both QUIS and SUMI are supplied with accompanying software. Finally, another important determinant was the time required to analyse the results, which is substantially less in the case of the IBM Computer Usability Satisfaction Questionnaires. In addition, the IBM Computer Usability Satisfaction Questionnaires have been in use for several years at various industrial sites and research centres.

4.1.3 WHAT IS BEING MEASURED

The result of the subjective evaluation with the IBM Computer Usability Satisfaction Questionnaires is a set of psychometric metrics which can be summarised as follows:

• ASQ score, indicating a participant's satisfaction with the system for a given scenario.

• OVERALL metric providing an indication of the overall satisfaction score.

• SYSUSE metric providing an indication of the system’s usefulness.

• INFOQUAL metric providing the score for information quality.

• INTERQUAL metric providing the score for interface quality.

For these metrics it should be noted that low scores are better than high scores, due to the anchors used in the 7-point scales. A brief illustration of the scoring follows; the specific questionnaires employed in the usability evaluation process are then presented.
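As a concrete illustration of the scoring, the following minimal sketch assumes the standard scheme of Lewis (1995), in which each metric is the arithmetic mean of its questionnaire items on the 7-point scale; the item ranges used below are those of the consolidation tables in Section 4.5.

    # Scoring sketch, assuming the standard scheme of Lewis (1995): each metric
    # is the arithmetic mean of its items on the 7-point scale (low is better);
    # unanswered items are excluded from the mean.

    def mean(scores):
        answered = [s for s in scores if s is not None]  # skip unanswered items
        return sum(answered) / len(answered)

    def asq_score(items):
        """ASQ: mean of the three after-scenario items."""
        return mean(items)

    def csuq_scores(items):
        """CSUQ: overall and per-factor means over the 19 items."""
        return {
            'OVERALL':   mean(items[0:19]),   # items 1-19
            'SYSUSE':    mean(items[0:8]),    # items 1-8
            'INFOQUAL':  mean(items[8:15]),   # items 9-15
            'INTERQUAL': mean(items[15:18]),  # items 16-18
        }

    # Example: Participant 1's CSUQ responses as reported in Section 4.4.2.
    p1 = [4, 4, 3, 4, 3, 4, 4, 4, 5, 4, 3, 4, 3, 3, 3, 3, 3, 3, 3]
    print(csuq_scores(p1))
    # OVERALL 3.52..., SYSUSE 3.75, INFOQUAL 3.57..., INTERQUAL 3.0, matching
    # the consolidation tables in Section 4.5.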

[The ASQ and CSUQ questionnaire forms are reproduced here in the original report.]

4.2 PROCESS

4.2.1 SCENARIO SCRIPTING

It is important to mention that subjective usability evaluation using the IBM questionnaires requires a scenario-based procedure. To this effect, a comprehensive scenario was developed to facilitate the evaluation process. After performing the scenario, the end-users were requested to fill in the ASQ questionnaire and the CSUQ questionnaire (as previously presented).

4.2.2 SCENARIO STRUCTURE

The purpose of the specifically designed scenario was to engage the end-users in exhaustive use of the various features of the 2WEAR-enabled applications. The scenario itself was logically split into three key parts:

• Application management. Users would have to perform all types of application management, ranging from initiation to focus switching and termination.

• Dynamic interface re-configuration. Users would have to be engaged in numerous scenarios of dynamic re-configuration, either manual or automatic.

• Application-specific features. Users would try to exploit all application-specific features.

A. Application management scenario

• Initiate applications

o Initiate City Guide application (once).

o Initiate Alarm application (once).

o Initiate Break Out application (twice).

• Focus switching among applications

o Focus to City Guide

o Focus to Alarm

o Focus to Break Out (once for each)

• Terminating and restarting applications from the Application Manager

o Terminate Break Out (one of the two instances)

o Terminate City Guide

o Initiate a new instance of City Guide

• Quitting applications from their own interface

o Quitting Break Out

o Initiate a new instance of Break Out

o Quitting City Guide

o Initiate a new instance of City Guide


B. Dynamic interface re-configuration scenario

• Automatic re-configuration

o Review the configuration file for each application and edit

Selector style preferences

Text entry style preferences

Dynamic re-configuration behaviour

• Whether the interface tries to optimise the interface when new UI I/O devices are detected

• Whether the interface tries to persist with the previous style of selector and text-entry objects

o Approach UI I/O devices (get in range)

o Move away from UI I/O devices (get out of range)

• Manual re-configuration

o For all applications, use the “Configuration” option of the top application menu to configure on-the-fly

Selector style preferences

Text entry style preferences

Dynamic re-configuration behaviour

Force on-demand interface re-configuration

Save configuration

Load configuration

o For the Break Out application, use the “game configuration” option to configure

Input control

Output control

Return to the game and test the updated configuration

o For the City Guide application, use the “display configuration” option to configure the use of the graphic display on-the-fly

Public or notepad display

Force on-demand graphic display re-configuration

C. Application-specific features scenario

• City Guide

o Create a route with a recording period of 10 seconds, entitled “A walk in the building”

o Start recording this route and activate its display on the graphic display

o After one minute

Stop recording this route

Save it

Delete it

o Create a route with a recording period of 20 seconds, entitled “Visit all offices”

o Start recording this route and activate its display on the graphic display

o Load the “A walk in the building” route

Set it to visible


o Show the current position, in green colour, with a rectangular representation

o Create two landmarks, “John’s office” and “George’s office”, and activate their display

o Create landmarks for pictures with recorded location information

o Save landmarks

o Delete each of the landmarks

o Load the landmarks

• Alarm application

o Create an alarm with an exact time two (2) minutes after the time of your own watch

Give it the id “Meeting”

o Create an alarm with an elapsed time of five (5) minutes

Give it the id “Coffee ready”

o Update the “Meeting” alarm to take place two (2) minutes later

o Save alarms

o Load alarms

o Create an alarm “Mary will visit me”

o Cancel the alarm “Mary will visit me”

• Break Out application

o Initiate the game

o Try to reach score five (5) with no platforms

o Pause the game and move back and forth from the top menu

o Resume the game

o Play a new game

o Try to reach score three (3) with platforms

o Play with two players via turn taking until game is over

Play three rounds

4.3 SUBJECTS

Three participants were involved in this study: (a) a female graduate of a computational linguistics department, native Italian speaker; (b) a male graduate of a social studies department, native German speaker; and (c) a female graduate of a mathematics department, native Greek speaker. Before the study, the participants were given an on-line demonstration of the demonstrator applications, during which they were allowed to comment in a “thinking aloud” manner and were given all the necessary explanations and clarifications. Then they were introduced to the usage scenario, with some time spent on giving appropriate guidance as to what exactly should be done.

Subsequently, the evaluation sessions, in which each participant had to use all the applications separately, were carried out serially, since only one 2WEAR system infrastructure was available. It should be noted that the study required setting up an integrated system demonstration comprising one laptop PC, two iPAQs, one desktop PC, the MASC module, the MASC wristwatch, and the MASC GPS, all connected through Bluetooth. In this context, it was decided to perform one interactive session per day, and to study and analyse the collected results of each day, to save time.

4.4 RESULTS

Some participants commented on their scores; their original comments are reported below. The ASQ questionnaire had to be filled in for each of the three scenarios by each participant, while the CSUQ questionnaire was filled in only once per participant.

4.4.1 ASQ QUESTIONNAIRES

Participant 1, ASQ

Scenario A

3 Quite simple to start-up applications. No “Start menu” nightmare.

4

3

Scenario B

4 The net seemed to be somehow slow. Is this normal?

3

3

Scenario C

3

4 The game was nice, but indeed too slow.

3

Participant 2, ASQ

Scenario A

3 I was expecting things to be rather difficult!

5 When having multiple app instances, I need to distinguish those.

4

Scenario B

3

4 I thought it was stalled at some point, but then it responded.

3 I was satisfied by the feel of controlling all “fancy device change” stuff!!

Scenario C

3 That’s a nice surprise indeed!

4 I know it’s a “hell of a net” here, but can you make it more fast?

3

Participant 3, ASQ


Scenario A

3 It was quite fast to start applications. Maybe because they are small…

4

4 Not very clear when the application is ready.

Scenario B

4 I always feel things could be done better.

3 Funny, but auto changing devices was more fast than I expected!

3 Interesting in such a small interface space, things are still understandable!

Scenario C

3

3 I would love to have this game with more speed.

3 Not very clear how the score is measured…:-)

4.4.2 CSUQ QUESTIONNAIRES

Participant 1, CSUQ

1. 4

2. 4 I could give a better score, but it’s a new paradigm that needs more attention.

3. 3 For the type of applications, this seemed very reasonable.

4. 4

5. 3

6. 4 It gets more comfortable after a while.

7. 4 Surprisingly it was.

8. 4 Especially with the game….

9. 5

10. 4

11. 3

12. 4

13. 3

14. 3

15. 3

16. 3 It is a different kind of experience. Not having a computer at all!

17. 3 By the way, where is everything running?

18. 3

19. 3 Pretty much it has so many things, in such a small “package”

Participant 2, CSUQ

1. 3 I love this small funny system!

2. 4 I love it, but can’t be that easy!


3. 4 I was perplexed a little with manual configuration. I prefer auto jobs.

4. 3

5. 3

6. 3 This is really a subjective opinion.

7. 5 May be I am too old, or maybe this is a very new paradigm.

8. 4 Was not much about creating something

9. 3 I think the messages are really good

10. 5 I think it takes some time, and I do not know why. Is it “net” stuff?

11. 3

12. 3

13. 3 I said that again.

14. 3

15. 3

16. 3

17. 2 I am really very positive with this!

18. 3

19. 3 Just a notice: I would like more graphics devices.

Participant 3, CSUQ

1. 4

2. 5

3. 4

4. 2 What was asked was done quickly.

5. 4

6. 4

7. 5 I could use more time to learn it.

8. 3

9. 4 I think the system solves “its own” problem without needing me.

10. 4 Good messages and confirmations.

11. 2 Very careful design of messages, sure not done by programmers:-)

12. 3

13. 3

14. 3

15. 3

16. 3

17. 3 But I will go back to my PC with happiness.

18. 3 Scenario-wise, it does

19. 3 More than I expected from a UI spread here and there.


4.5 CONSOLIDATION

ASQ results        Participant 1   Participant 2   Participant 3

Scenario A              3.3             4               3.7
Scenario B              3.3             3.3             3.3
Scenario C              3.3             3.7             3

CSUQ results       Participant 1   Participant 2   Participant 3

OVERALL (1-19)          3.52            3.31            3.4
SYSUSE (1-8)            3.75            3.62            3.8
INFOQUAL (9-15)         3.57            3.28            3.14
INTERQUAL (16-18)       3               2.66            3
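As a cross-check of the scoring assumption sketched in Section 4.1.3, the per-scenario ASQ means of Participant 1 can be recomputed from the item scores reported in Section 4.4.1; the snippet below is hypothetical verification code, not part of the evaluation instrumentation.

    # Hypothetical verification code: recompute Participant 1's per-scenario
    # ASQ means (each the average of the three ASQ items) from the scores
    # reported in Section 4.4.1.
    p1_asq = {'Scenario A': [3, 4, 3], 'Scenario B': [4, 3, 3], 'Scenario C': [3, 4, 3]}
    for scenario, items in p1_asq.items():
        print(scenario, round(sum(items) / len(items), 1))
    # Prints 3.3 for each scenario, matching the first column of the ASQ table above.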

From the above results, the overall conclusion is that the subjective opinion of users regarding the demonstrator applications is quite good, while the interface successfully passes the acceptability test by receiving scores below four (4), which is considered the basic pass threshold. The best scores are observed for the interface quality metric (INTERQUAL), which is considered very good for such a new, experimental interaction paradigm. A good score is also obtained for the quality of information and representation (INFOQUAL), which, given the very limited presentation “breadth” of the 2WEAR distributed devices, is likewise considered very good.

One important final note, however, is that in order to validate this positive evidence beyond the initial evaluation conducted in the 2WEAR project, and to perform an in-depth, more accurate evaluation, a significantly larger number of subjects should be engaged in the process.


5 CONCLUSIONS

The evaluation process has been very systematic and resource-demanding, addressing multiple perspectives of the 2WEAR project. Apart from the typical usability evaluation conducted for the demonstrator applications, software engineering evaluation of the development instruments has been applied extensively, through widely acknowledged techniques such as Quality Attribute Workshops and software walkthroughs. Additionally, the interoperability framework has been given an overall retrospective evaluation by the consortium partners after the developments had been completed.

The results of the final evaluation were somewhat expected, since in 2WEAR the assessment process was not applied as an afterthought, but ran concurrently with all development actions and decisions. This was a key ingredient of success, ensuring that the resulting delivered artefacts, mainly the integrated system prototype which incarnates the futuristic application scenario, could be delivered both on time and with all the initially planned features.

Most importantly, the evaluation process gives concrete indications of potential improvements, which will technically guide our future research efforts in this direction. The 2WEAR project opens the path towards further developments and experimentation, where more applications need to be developed, and where the role of assessment and evaluation is clearly put in the foreground.


6 BIBLIOGRAPHY

Bevan, N., Macleod, M. (1994): Usability Measurement in Context. Behaviour and Information Technology, 13(1&2), 132-145.

NASA Software Engineering Laboratory (SEL) (1995): Software Measurement Guidebook, Revision 1. Goddard Space Flight Center.

ISO (1992): International Standards Organisation, ISO 9241 Part 11.

Beck, K., Cunningham, W. (1989): A Laboratory for Teaching Object-Oriented Thinking. http://c2.com/doc/oopsla89/paper.html.

Kirakowski, J. (1994): SUMI: the Software Usability Measurement Inventory. MUSIC Project.

Lewis, J. R. (1995): IBM Computer Usability Satisfaction Questionnaires: Psychometric Evaluation and Instructions for Use. International Journal of Human-Computer Interaction, 7(1), 57-78.

Savidis, A., Petrie, H., McNally, P., Ahonen, M., Koskinnen, M., Stamatis, C. (1997): Internal Report on the Usability Evaluation of the Platform Integration Module Toolkits. The ACCESS Consortium.

SEI/CMU (2003): Quality Attribute Workshops. http://www.sei.cmu.edu/ata/qaw.html.

Shneiderman, B. (1992): Designing the User Interface: Strategies for Effective Human-Computer Interaction, 2nd Edition. Addison-Wesley.
