
DEGREE PROJECT IN TECHNOLOGY, FIRST CYCLE, 15 CREDITS

STOCKHOLM, SWEDEN 2020

Virtual Assistants and Their Performance In Professional Environments

ERIK PERSSON

JOHAN TORSSELL

KTH ROYAL INSTITUTE OF TECHNOLOGY

SCHOOL OF ELECTRICAL ENGINEERING AND COMPUTER SCIENCE

Swedish Summary

Since the mid-20th century, virtual assistants have been developed and refined, with the technology moving from a set of rules to assistants driven by artificial intelligence. Today, virtual assistants can add value to organisations and contribute to a sustainable society, among other things by performing simple and recurring tasks and by reducing inequalities caused by biased advisors on sensitive matters. Despite this success, current research has not focused on the evaluation of virtual assistants in industrial contexts.

The purpose of this report is to evaluate virtual assistants from a technical, economic and organisational perspective in order to understand their performance in industrial environments. The work was carried out in collaboration with IBM and one of their clients, which prefers to remain anonymous. At this company, two IBM Watson Assistants are under development: one for its IT Service Desk and one for its Ethics & Compliance department. Both quantitative and qualitative methods were used in the study, including user testing and questionnaires, in order to include all aspects of the virtual assistants’ performance. In this process, discussions were held with experts within IBM and with employees of the company for which the practical implementation was studied, to gain an understanding of both general and specific knowledge from different perspectives.

The following conclusions can be drawn in this report. One, technical performance can be determined with quantitative metrics such as coverage, confidence, precision and helpfulness, complemented with qualitative measures such as user satisfaction and perceived understanding of the user. Two, specific technical performance is relative, and the technical limitations and maturity should be used as a complement to the evaluation of the assistants. Three, identified organisational benefits include:

• reduced time-to-resolution,
• reduced handling time,
• support open around the clock,
• scalability, and
• user understanding

The conclusions in the specific cases show that a virtual assistant implemented within a narrower area, such as an assistant for Ethics & Compliance, can be implemented more easily and performs relatively well even in a less developed environment. Broader areas, such as an assistant for IT support, require more work to perform at a high level but can be even more valuable than the assistant in the narrow area once sufficiently developed.


Abstract—Contributors from the mid 20th century up to now have developed and refined virtual assistants, taking the technology from a set of rules to assistants driven by Artificial Intelligence. Today, virtual assistants can provide value in organisations and support a sustainable society by conducting basic and repetitive tasks, and help reduce inequalities caused by biased advisors on sensitive topics. Despite this progress, current research has paid little attention to the evaluation of virtual assistants in industrial applications.

The purpose of this paper is to evaluate virtual assistants from a technical, economical and organisational perspective, in order to understand their performance and value in an industrial environment. This has been done in collaboration with IBM and a client company which prefers to remain anonymous in this report. In this company, two IBM Watson Assistants are under development: one for the IT Service Desk, and one for the Ethics & Compliance department. To cover all aspects of the virtual assistants’ performance, quantitative and qualitative methods were used by conducting user testing and surveys. In this process, discussions have been conducted with IBM experts and employees of the firm for which the practical implementation has been studied, to gain a general and specific understanding from different perspectives.

From this paper, the following can be concluded. First, technological performance can be described using quantitative metrics such as coverage, confidence, precision and helpfulness, and should be complemented using qualitative measures such as user satisfaction and perceived user understanding. Second, specific technological performance is relative, and the technical limitations as well as the maturity of the technology should be used as a complement to the evaluation of the assistants. Third, identified organisational benefits include:

• reduced time-to-resolution,
• reduced handling time,
• all-hour support,
• scalability and
• user understanding

Conclusions specific to the use cases show that an assistant implemented in a narrower use case, that is, the Ethics & Compliance assistant, can be implemented more easily and performs relatively well even in less developed environments. A broader use case, such as the IT assistant, requires more effort to perform at a high level but may be even more beneficial than the narrow use case once sufficiently refined.

Index Terms—Virtual Assistants, Watson Assistant, Virtual Assistant Evaluation, Potential Cost Savings, Organisational Value

I. INTRODUCTION

In 1950, Alan Turing asked the question, “Can machines think?”, and proposed an experiment for this. He described a game consisting of three entities - A, B and C - where C is an interrogator with the mission to identify the gender of A and B. A’s objective is to deceive the interrogator while B’s is to help the interrogator. The interrogator would write questions, and A or B would answer. Turing then proposed the question, “What will happen when a machine takes the part of A in this game?”. This idea formed the Turing Test - a test of a machine’s ability to display intelligence indistinguishable from that of a human. [1]

Around the mid 1960s, MIT professor Joseph Weizenbaum created ELIZA, an early Natural Language Processing (NLP) computer program able to attempt the Turing Test. ELIZA used keyword recognition and context identification to simulate an understanding of the users. [2] However, given its primitive form of NLP and Natural Language Understanding (NLU), the machine could not sustain a conversation and was limited to its narrow skills, and it therefore did not pass the Turing Test. [3] Still today, NLP is key for all modern virtual assistants. Virtual assistants have a broad range of definitions; Cambridge Dictionary defines a virtual assistant as “a computer program or device that is connected to the internet and can understand questions and instructions, designed to help you to make plans, find answers to questions, etc.”. [4] Because the definition is so wide, assistants are commonly categorised as those purely driven by rules and those driven by machine learning and artificial intelligence. The latter are further divided into those using rule-driven dialog flows and those using machine learnt stories. [5] [6]

From a socioeconomic perspective, virtual assistants can help with unbiased information and advice on sensitive subjects. For instance, an assistant can advise on harassment in the workplace and therefore indirectly reduce harassment and inequalities. Moreover, it can respond objectively to questions regarding career change that might otherwise be difficult to ask a manager. It is notable, however, that assistants implemented in industry are not designed to pass the Turing Test, but rather to answer specific questions or help users through processes. This differs from general virtual assistants, such as Google Assistant, Siri or Alexa, where part of the goal is to be as human-like as possible. Furthermore, considering the assistants’ ability to perform simple tasks, employee well-being can improve as the tasks left for the employee are more complex and thus more stimulating. These benefits are relevant worldwide, and the value the assistants can provide is aligned with the UN’s Sustainable Development Goals, particularly (3), (8) and (10). These concern Good health and Well-being, Decent work and Economic growth, and Reduced inequalities. [7]

Despite the virtual assistants’ advantages and benefits for society and firms, questions on technical aspects such as implementation complexity and technological maturity arise. Also, questions on the firms’ potential cost savings and intangible benefits remain. Consequently, this paper will address three key problem formulations:

1) How can virtual assistants’ performance be evaluated?
2) What value can virtual assistants provide to an organisation?
3) What are the potential cost savings of using virtual assistants?

In order to study the research questions, this paper is divided into six parts. The first part deals with the technical aspect of how virtual assistants work, as well as the current research on organisational value and cost savings when using virtual assistants. The second part covers this paper’s relation to current research and focuses on how it distinguishes itself from previous research. The third part describes the methodology used for this study, which includes a summarised view of how this paper’s research questions have been answered. The fourth part presents the findings of the research, focusing on the two key themes that have been taken into consideration, i.e., the technical aspects of virtual assistants and what socioeconomic value they can provide to an organisation. The fifth part provides a deeper discussion of the results, and the paper closes by summarising the conclusions from the study to answer the research questions.

Thanks to the collaboration with IBM and the company for which the implementation was studied, the problem formulations are investigated using the IBM Watson Assistant. In this case, two assistants are currently being developed for the company in question: one for Ethics & Compliance and one for the IT Service Desk. The work has been ongoing for eighteen weeks with a small team of IBM consultants. Both assistants are results of a pilot project, and are not yet considered production ready. These two assistants are used for this paper, both to gain insights into implementation processes and to evaluate them in an industrial environment.

II. BACKGROUND - TECHNICAL PERSPECTIVE

This section focuses on the two core challenges discovered when creating virtual assistants: how to understand user intents and how to perform the requested tasks. Due to the complex nature of these key functions, the former is typically solved using advanced and combined methods for Natural Language Understanding. [6] [8]

To further break down the concepts of virtual assistants, a range of terms are frequently used. These terms are described below, using the example sentence “What are the opening hours for the store in Silicon Valley?” to clarify them:

• Utterance. A user’s request or statement. In the example, the utterance is “What are the opening hours for the store in Silicon Valley?”.

• Intent. A specific goal or idea conveyed by the utterance. In the example sentence, the intent is to find out the opening hours of the store.

• Entity. A term or object which provides context for an intent. [9] Here, Silicon Valley is recognised as a location.

• Domain of knowledge or Skill. A domain covers a range of intents, which sets the limit of the assistant’s ability. [10] [6] The Skill in the example could be “customer support”, including topics such as opening hours, locations, return of goods et cetera.
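The terms above can be made concrete with a small sketch. The following Python example is only an illustration and does not reflect how Watson Assistant works internally: the intent names, keyword sets and location list are hypothetical, and intent matching is reduced to simple keyword overlap.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, Optional

# Hypothetical toy "skill": intent names and keywords are illustrative only.
INTENT_KEYWORDS = {
    "opening_hours": {"opening", "hours", "open", "close"},
    "return_goods": {"return", "refund"},
}
# Hypothetical dictionary-based entity: a list of known locations.
LOCATION_ENTITIES = ["silicon valley", "stockholm"]


@dataclass
class ParsedUtterance:
    utterance: str                    # the raw user request
    intent: Optional[str]             # the goal conveyed by the utterance
    entities: Dict[str, str] = field(default_factory=dict)  # context for the intent


def parse(utterance: str) -> ParsedUtterance:
    """Assign an intent by keyword overlap and look up location entities."""
    text = utterance.lower()
    words = set(re.findall(r"[a-z]+", text))

    # Choose the intent whose keyword set overlaps most with the utterance.
    best_intent, best_overlap = None, 0
    for intent, keywords in INTENT_KEYWORDS.items():
        overlap = len(words & keywords)
        if overlap > best_overlap:
            best_intent, best_overlap = intent, overlap

    # Dictionary-based entity recognition: plain substring lookup.
    entities = {}
    for location in LOCATION_ENTITIES:
        if location in text:
            entities["location"] = location

    return ParsedUtterance(utterance, best_intent, entities)


parsed = parse("What are the opening hours for the store in Silicon Valley?")
print(parsed.intent, parsed.entities)
```

For the example sentence, the sketch assigns the `opening_hours` intent and recognises Silicon Valley as a `location` entity. An utterance matching no keywords is left without an intent.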

Saloni Potdar, Senior Software Engineer in Cognitive Analytics and Deep Learning, identified some key factors to consider regarding virtual assistants for this paper. Apart from intent classification and entity recognition, she highlighted off-topic identification, to realise when not to answer, as well as profanity filtering to avoid obscene language in conversations. Furthermore, Potdar stated that spellchecking algorithms should be considered essential to make the intents and entities easier to identify.

A. Dialog Flow Configuration

One way to create dialog between the user and the virtual assistant is by using predetermined rules. The rule-based dialog flows are configured by first manually adding intents and entities, and then building a conversation path for the users. Example sentences for the intents are constructed, whilst synonyms for the entities are added. After this process, the assistant is trained to recognise and classify intents from the utterance entered by the user. The entities are either based on a list of words, rule-based or machine learnt. In the first case, the model is not trained but is rather a list maintained purely by the developer, and in the last case the entities are recognised based on the context in which they are used. [9] [6]
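A rule-based conversation path of the kind described above can be sketched as a lookup from a recognised intent to a predetermined response, with a follow-up question when a required entity is missing. This is a minimal illustration, not the Watson Assistant dialog format: the rule table, its field names and the opening hours in the response text are all hypothetical.

```python
from typing import Dict, Optional

# Hypothetical rule table: one predetermined path per intent.
DIALOG_RULES = {
    "opening_hours": {
        "required_entity": "location",
        "prompt": "Which store do you mean?",
        # Placeholder answer text, invented for this sketch.
        "response": "The store in {location} is open 9-18.",
    },
}
FALLBACK = "Sorry, I did not understand. Could you rephrase?"


def respond(intent: Optional[str], entities: Dict[str, str]) -> str:
    """Follow the predetermined path for a classified intent."""
    rule = DIALOG_RULES.get(intent)
    if rule is None:
        # No rule covers this intent: the assistant cannot answer.
        return FALLBACK
    if rule["required_entity"] not in entities:
        # Ask for the missing entity before answering.
        return rule["prompt"]
    return rule["response"].format(**entities)


print(respond("opening_hours", {"location": "Silicon Valley"}))
print(respond("opening_hours", {}))
print(respond(None, {}))
```

Every path is written out by the developer, which makes the behaviour predictable and easy to test, but each new intent or required entity adds another rule to maintain.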

Another, less common way to configure dialogues is to use machine learning. Here, positive and negative sample conversations are used to determine the conversation with the user. At each conversational turn, the model uses probabilistic estimations to determine the next step of the conversation. Consequently, if a large amount of data and labeled user conversations are present, the machine learning approach might be more practical. In addition, the model becomes more useful over time due to its ability to self-improve. It can do so since it does not require an explicit dialog flow, unlike the rule-based configuration. [6]

B. Comparison

The rule-based approach works well with incremental development because of the straightforward process to create predictable functionality and test the feature. Predetermined conversation flows can be practical when there is little data on user interactions and there is a desire to create the system quickly. However, rule sets tend to become large and impractical with more complex systems since the machine needs to behave naturally towards the user. In contrast, an established system with a large amount of available data works well with the machine-learnt conversation flow. Because it does not depend on predefined flows, the model can be trained to create elaborate dialog. In spite of its capabilities, this approach reduces the possibility of controlling specific conversation flows. Additionally, a large set of example stories would be necessary if one wants to ensure that all entities for an intent are present, something that could be configured faster with a rule policy. [6]

Admittedly, both approaches have advantages and disadvantages. Therefore, a mixture of predetermined rules combined with machine learnt stories can be used. However, the most common configuration today, also used by IBM Watson Assistant, is rule-based. [6]

C. Limitations

Despite the progress and developments in the area, virtual assistants still have limitations. For instance, a Domain of knowledge or question category - such as email issues - generally supports 10-50 intents, and if the number exceeds 600 the domain would typically become impracticable. [6] Additionally, the different intents are not required equally often. However trivial it might sound, the assistants are also limited by the number of topics they can cover. A key factor in this is to consider not only how accurate and precise the response is, but also how well the reply can help and satisfy the user.

In addition to limitations specific to virtual assistants, fundamental challenges are undoubtedly present in NLP and NLU. In particular, multiple or conditional utterances have proved difficult for the classifiers to identify. For example, given the phrase “I want to buy an umbrella, unless you sell rain jackets”, a human would identify that the intent is to buy a rain jacket. The virtual assistant would, however, typically be confused by the sentence. Furthermore, intents can only be trained using already existing data and phrasing. As a result, if one were to write a completely different utterance than what has been used to train the model, the virtual assistant might have difficulties understanding the user’s intents.

III. BACKGROUND - ORGANISATIONAL VALUE

This section focuses on the organisational value a virtual assistant can provide to an organisation in general as well as to the specific company studied. It includes the current process for the studied use cases as well as their key challenges, amongst other common challenges discussed with the company. The benefits of virtual assistants are divided into two parts, where the tangible section describes the benefits that can be quantified and the intangible section those that cannot.

A. Ethics & Compliance’s Internal Investigation

Before deciding to implement a virtual assistant, the Ethics & Compliance department wanted insight on whether virtual assistants would be beneficial, and made an internal investigation by questioning nine employees within the firm. The questions were created to receive information on what the users consider important in a virtual assistant, how frequently they currently contact the Ethics & Compliance team, their sentiment towards virtual assistants in general, and how experienced they were with virtual assistants.

As a result, some key conclusions could be drawn. First, it was found that users value response speed as much as receiving comprehensive advice. Second, users tend to have psychological barriers to using a virtual assistant, hence wanting to use it only for advice and not allowing the assistant to take direct action. Third, it was concluded that users find it difficult to search through the business practice policy, something that a virtual assistant could simplify. In terms of organisational benefits for the Ethics & Compliance team, it was estimated that between half and one full-time equivalent of workload would be freed by using the virtual assistant to answer simpler questions. Additionally, there are intangible benefits such as getting insights from conversation analytics which could be used to increase user understanding, direct incremental improvements and clarify the content. Also, the investigation found that virtual assistants would be valuable for users as they would be available at all times and able to provide answers instantly, as well as enabling users to ask otherwise sensitive questions, such as questions that could make the employee appear incompetent or disloyal.

B. Current Process

1) IT Service Desk: The current process for the studied company’s service desk is initialised by someone reporting an issue using a self-service tool in their service portal. Next, an IT Service Desk agent manually sets the category, priority and other parameters based on the issue description. Based on these parameters, the ticket is routed to the correct team to handle the inquiry. If the inquiry is routed correctly, the agent assigned starts working on a solution to the issue; otherwise, it is transferred to the correct group. The service desk is open 24/7 and issues are usually responded to within one hour.

2) Ethics & Compliance: The current process for inquiries regarding ethics and compliance begins with a generalist legal counsel being contacted over email based on the division the inquiry comes from. The counsel then searches the company policy for relevant rules and assesses the question or issue to make sure every aspect is covered. An answer is then formulated with insights into best practice which may not be available in the policy. If the generalist is in need of further guidance, an internal specialist is contacted as a first escalation and, if necessary, an external expert is consulted.

C. Key Challenges

A key challenge for the specified use cases is long time-to-resolution. According to Forrester, one of the reasons for long time-to-resolution is limited service hours, especially for international companies operating in different time zones with centralised service desks. Other reasons for long time-to-resolution are queues and multistep routing between agents, which degrade the user experience and incur costs for the service organisation. [11] Furthermore, repetitive basic tasks are common, time-consuming and costly. For instance, password resets commonly account for 20 % - 50 % of the IT-support requests. [12]

The key challenges for the company for which the practical implementation was studied differed between the use cases. For the IT Service Desk, a key challenge was inconsistency between different cases and countries, which may be a result of poor documentation and lack of predefined answers for frequently asked questions. Due to the lack of documentation and the structure of the organisation with multiple different IT Service Desks, new agents have a long learning process. Furthermore, recurring simple tasks are a key challenge. The global service desk manager mentioned that around 40 % of the inquiries concern Outlook issues, distribution lists, shared mailboxes or are otherwise email related. Those inquiries could potentially be automated using a virtual assistant.

For the team working with ethics and compliance, a key challenge was lack of standardisation and consistency, resulting in answers to inquiries being highly dependent on the advisors’ individual interpretation of the policies. Moreover, as there is no standardisation and as the team is spread out around the globe with close to no native speakers of English, it requires a lot of energy and time to keep the language at a professional level and make sure it cannot be misinterpreted. Another key challenge is recurring questions regarding basic rules, where even the simplest inquiries require at least 10 minutes according to a legal counsel at the company in question. Furthermore, availability is a key challenge as limited service hours and different time zones result in inconvenient waiting times.

D. Use Cases

There are three general types of interactions for virtual assistants:

• Agent assist
• Customer self-service
• Employee self-service [11]

Agent assist refers to a solution where service agents are augmented by blending automation with human labour. One approach to this is to let the virtual assistant handle routine tasks like gathering relevant information, authenticating the users and then routing them to a relevant human agent to resolve the issue. A different approach is to have a virtual assistant that monitors the conversation and provides the human agent with suggested responses. The suggested responses can either be sent to the user by the click of a button, modified or rejected. The agent’s decision can in this case be used to further train the assistant to improve its accuracy. [13]

The second use case refers to customer-facing virtual assistants that fully answer basic questions and run predefined dialog. More complex inquiries can be handled by searching a knowledge base or handing the conversation over to a suitable human agent. [11]

The third use case is the one in focus in this report, and it refers to internal virtual assistants. These assistants are designed similarly to the virtual assistants in the customer self-service use case, with the exception of the audience. IT Service Desk and human resources are common support functions where virtual assistants are implemented. [11]

IV. THE PAPER’S RELATION TO CURRENT RESEARCH

Current research shows how virtual assistants function, and which key factors to consider in an assistant. From a technical perspective, these key factors are good indicators to use when evaluating the assistants. [3] [6] When considering organisational value and cost savings, the literature provides guidelines as to how virtual assistants can be evaluated using both tangible and intangible metrics. [11]

The purpose of this paper is to combine technical and economical evaluations in order to cover virtual assistants from both perspectives. Through this combination, the aim is to understand virtual assistants using qualitative and quantitative measurements to investigate their performance and value.

V. METHOD

The method was divided into 8 steps (see figure 1):

1) Pre-study
2) Practical implementation
3) Understanding the organisation
4) Data collection by user testing
5) Performance evaluation
6) Calculations of key performance indicators
7) Calculation of potential cost savings
8) Analysis of deployment timing

Phase 1. To investigate and answer the research questions, quantitative and qualitative methods were combined. The research was initiated by exploring current research on the topic from an organisational and technological perspective. The literature study was complemented by discussions with IBM subject matter experts. In the next step, a team of IBM consultants was followed in their practical implementation of the virtual assistants to gain an understanding of the effort and skills required for implementation. During this step, a deeper understanding of the virtual assistants and Watson Assistant in practice was acquired. Following this, discussions and interviews were conducted with employees of the firm to get a thorough understanding of the organisation, their processes as well as their key challenges. This was structured as multiple meetings going from understanding their purpose with the project to understanding their processes and key challenges in detail. The meetings were conducted with the IT Service Desk managers, an Ethics & Compliance legal counsel as well as the head of Ethics & Compliance.

Phase 2. The second phase of the research method focused on collecting, analysing and drawing conclusions from data. Before collecting the data, research objectives and questions were developed in a formulation stage. To reach the objectives, a quantitative sampling method was used by conducting user testing and sending out questionnaires (see appendix D and F). Each user was first asked to answer general questions regarding their usage of the services provided by the IT Service Desk and the Ethics & Compliance team. This part was followed by a testing process where the user was asked to get familiar with the virtual assistant by chitchatting, that is, writing things such as “How are you?” or “Tell me a joke”. Once the users were familiar with the virtual assistant, they were asked to phrase multiple questions within pre-specified topics. To help the users, example questions on the topics were given. The reason for dividing the test into different topics was to get a broader range of questions with a more realistic distribution. Otherwise, the risk would be that some testers only asked questions within a specific topic, for example email related issues or corruption. The testing phase was followed by a survey about the users’ general experience of the virtual assistant and how they thought they would use it. This survey and testing process was sent to employees within the company for which the practical implementation was studied, specifically a group of Ethics & Compliance ambassadors working in various countries and positions with a shared interest in ethics and compliance. Furthermore, a questionnaire was sent out to the Ethics & Compliance team as well as the IT Service Desk to collect data regarding time spent on tasks of different complexity levels (see appendix E and B).

Once the survey and testing process was completed, the conversation logs were extracted from Watson Assistant and cleaned. The full data cleaning process is described under Data Cleaning.

The conversations were then analysed using existing tools from the IBM Watson Assistant team, primarily the Measure Notebook and the Effectiveness Notebook, to calculate performance metrics such as precision and helpfulness. An estimate of the amount of traffic that would go through the assistant rather than a live agent was made based on historical statistics and the questionnaire answers. [14] The results were then combined with other contextual dimensions, such as the average cost rate of the different divisions’ personnel, to calculate potential cost savings.

The report was then concluded with an analysis of whether the virtual assistants were ready for deployment and whether the timing is right to invest in the technology. This was based on performance, IBM expert statements, current research and potential cost savings. The method is visualised in figure 1.

Fig. 1: Visual representation of the applied method.

A. Technical Performance

To evaluate the virtual assistants’ technical performance in the specific use cases, two Jupyter Notebooks created by IBM were used together with conversation logs from the conducted user testing. The first notebook, the Measure Notebook, contains a set of metrics to describe an assistant’s overall performance, and was developed to identify well performing areas as well as lagging areas. The second notebook, the Effectiveness Notebook, is focused on the relative performance of each intent and entity. For the technical evaluation, the Measure Notebook was used first. From this, an annotation file was created and used for the Effectiveness Notebook. The Effectiveness Notebook could then analyse the intents after a partly manual annotation. From this, several metrics could be calculated, of which the most relevant for this report were precision and helpfulness.

1) Measure Notebook: This notebook gave two key performance indicators, coverage and average confidence. Coverage measures the system on an utterance level, whereas confidence measures it on a conversation level. In other words, the two metrics represent the portion of inquiries the assistant can respond to and how certain it is of the identified intents. Coverage is measured against a predetermined confidence threshold on the classified intents. [15] Another useful measure based on helpfulness is task fulfillment rate, which describes how well the assistant can help the users reach their end goals without the help of a human agent.
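As a rough illustration of these two indicators, the sketch below computes coverage and average confidence from a list of top-intent confidence scores. It is not the Measure Notebook's implementation: the function names, the sample scores and the threshold value are assumptions made for this example, and the scores are treated as one flat list for simplicity.

```python
from statistics import mean
from typing import Sequence


def coverage(confidences: Sequence[float], threshold: float) -> float:
    """Share of utterances whose top-intent confidence reaches the threshold."""
    if not confidences:
        raise ValueError("no utterances to evaluate")
    covered = sum(1 for c in confidences if c >= threshold)
    return covered / len(confidences)


def average_confidence(confidences: Sequence[float]) -> float:
    """Mean confidence of the top-ranked intent over all utterances."""
    return mean(confidences)


# Hypothetical confidence scores for four logged utterances.
scores = [0.9, 0.1, 0.6, 0.4]
# The threshold is an assumed parameter for this sketch.
print(coverage(scores, threshold=0.5))
print(average_confidence(scores))
```

The choice of threshold directly affects coverage: lowering it lets the assistant answer more utterances, at the risk of answering with a wrongly classified intent.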

2) Effectiveness Notebook: In this notebook a confusion matrix can be built and used to calculate True Positives, False Positives, False Negatives and True Negatives in the intent classification. Other summary metrics such as number of utterances, average helpfulness, precision and variance over intents were displayed using the notebook. An important note on helpfulness is that this measure has a subjective definition, and can therefore vary depending on the goals of the assistant. This notebook was modified for this paper to be able to examine precision and helpfulness with and without chitchat as a contributing factor. Chitchat refers to conversation containing information that is not specific to the use case, and is often associated with utterances such as “How are you?” or “Tell me a joke”. [16]
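The metrics described above can be illustrated with a short sketch over manually annotated log rows. The row format, field names and sample data are hypothetical and do not reproduce the Effectiveness Notebook's annotation file. Precision is taken here as the share of utterances for which the predicted intent matches the annotated intent, which is an assumption about the definition.

```python
from collections import Counter
from typing import Dict, List

# Hypothetical annotated rows: predicted intent, annotated (true) intent,
# whether the response was judged helpful, and whether it was chitchat.
ANNOTATED: List[Dict] = [
    {"predicted": "reset_password", "actual": "reset_password", "helpful": True, "chitchat": False},
    {"predicted": "reset_password", "actual": "email_issue", "helpful": False, "chitchat": False},
    {"predicted": "email_issue", "actual": "email_issue", "helpful": True, "chitchat": False},
    {"predicted": "greeting", "actual": "greeting", "helpful": True, "chitchat": True},
]


def confusion_counts(rows: List[Dict], intent: str) -> Counter:
    """Count TP, FP, FN and TN for one intent (one-vs-rest)."""
    counts = Counter()
    for row in rows:
        predicted = row["predicted"] == intent
        actual = row["actual"] == intent
        if predicted and actual:
            counts["TP"] += 1
        elif predicted:
            counts["FP"] += 1
        elif actual:
            counts["FN"] += 1
        else:
            counts["TN"] += 1
    return counts


def precision(rows: List[Dict]) -> float:
    """Share of utterances whose predicted intent matches the annotation."""
    return sum(r["predicted"] == r["actual"] for r in rows) / len(rows)


def helpfulness(rows: List[Dict]) -> float:
    """Share of responses annotated as helpful."""
    return sum(r["helpful"] for r in rows) / len(rows)


def without_chitchat(rows: List[Dict]) -> List[Dict]:
    """Drop chitchat rows so only use case specific utterances remain."""
    return [r for r in rows if not r["chitchat"]]


print(confusion_counts(ANNOTATED, "reset_password"))
print(precision(ANNOTATED), helpfulness(ANNOTATED))
core = without_chitchat(ANNOTATED)
print(precision(core), helpfulness(core))
```

In this toy data, removing the correctly handled chitchat row lowers both metrics, which shows why reporting the figures with and without chitchat gives a fuller picture of use case specific performance.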

3) Data Cleaning: In order to use the Effectiveness Notebook, an annotation file created from the Measure Notebook was used. This annotation file was partly generated by the notebook but needed human annotation to determine precision and helpfulness. The manual work consisted of noting whether the correct intent had been chosen, whether the response was correct and whether the response was helpful for the user. For this paper, incorrect usage was removed from the logs, for instance when users asked questions in unsupported languages or wrote utterances that would be incomprehensible to a human assistant. Furthermore, the assistant’s user interface presents pre-written response options in some dialog stories, instead of letting the user write freely and then recognising the intents within the dialog story. When these alternatives occurred in the logs, they were removed on the grounds that they would skew the results and suggest a stronger intent classification, when in reality the user simply chose between alternatives.

4) Assumptions: In the evaluation two primary assumptions were made. First, the proportions of low, medium and high complexity tasks determined from the surveys are assumed to be representative of the logs. This means that the same complexity distribution is used when calculating coverage on the different complexity levels. Second, the coverage and task fulfillment rate are assumed to differ between adjacent complexity levels by a factor of 1.7. That is, coverage and task fulfillment rate are 70 % higher for low complexity tasks than for medium complexity tasks, and in turn 70 % higher for medium complexity tasks than for high complexity tasks. The numbers have been determined with the help of the two contacts in the company. These contacts have experience within their respective fields and have been involved in developing the virtual assistants, and are hence familiar with the assistants’ abilities and limitations.
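One possible reading of these two assumptions is that an overall rate can be split into per-level rates whose share-weighted average reproduces the overall figure. The sketch below follows that reading; it is not necessarily the calculation used by the authors, and the function name is hypothetical. The inputs are the IT Service Desk figures reported in the Results section (72 % coverage; 46 %, 35 % and 19 % low, medium and high complexity shares).

```python
from typing import Tuple


def split_by_complexity(
    overall: float,
    shares: Tuple[float, float, float],
    factor: float = 1.7,
) -> Tuple[float, float, float]:
    """Return (low, medium, high) rates such that low = factor * medium,
    medium = factor * high, and the share-weighted mean equals `overall`."""
    low_share, medium_share, high_share = shares
    if abs(low_share + medium_share + high_share - 1.0) > 1e-9:
        raise ValueError("shares must sum to 1")
    # overall = high * (low_share * factor^2 + medium_share * factor + high_share)
    weight = low_share * factor**2 + medium_share * factor + high_share
    high = overall / weight
    return factor**2 * high, factor * high, high


# Overall coverage and task shares as reported for the IT Service Desk.
low, medium, high = split_by_complexity(0.72, (0.46, 0.35, 0.19))
print(f"low={low:.2f}, medium={medium:.2f}, high={high:.2f}")
```

Under this reading, coverage would be roughly 98 %, 58 % and 34 % for low, medium and high complexity tasks respectively. With other inputs the low complexity rate could exceed 100 %, in which case the fixed factor would need to be revisited.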

VI. RESULTS

The following section displays the results from the practical investigation and has been divided into two parts for each assistant. The first part shows results from the user tests, based on metrics from the two notebooks presented in the Method section. The second part concerns the economic evaluation and displays results from the survey for the IT Service Desk and the Ethics & Compliance department.

A. IT Service Desk

1) Technical Evaluation: For this paper, there were initially 108 conversations logged from the user tests and roughly 1000 messages. According to the results from the Measure Notebook, 72 % of the questions were covered by the virtual assistant. Furthermore, the assistant had an average confidence of 61 % when determining the intents.

From the Effectiveness Notebook, as figure 2 below displays, precision was calculated to 55 % and helpfulness to 72 %.

Fig. 2: Visual representation of precision, helpfulness and their variances (red line) for the IT Service Desk virtual assistant. In this representation, chitchat is included.

Next, all chitchat was excluded, which resulted in a precision of 48 % and a helpfulness of 69 % (see figure 3).

Fig. 3: Visual representation of precision, helpfulness and their variances (red line) for the IT Service Desk virtual assistant. In this representation, chitchat is excluded.

2) Economic Evaluation: This section focuses on the potential cost savings of using virtual assistants, based on the survey taken by 8 IT Service Desk agents as well as the conversation logs extracted from the user tests.

TABLE I: IT Service Desk workload survey

                      Low complexity   Medium complexity   High complexity
Average percent       46 %             35 %                19 %
Standard deviation    20 pp            9 pp                13 pp
Average time on task  8 min            27 min              68 min
Range time on task    5 - 11 min       19 - 35 min         48 - 88 min

From the survey regarding the IT Service Desk virtual assistant, the following results were acquired.

TABLE II: IT Service Desk virtual assistant survey

                                         Average   Range
IT related issues per quarter            7.1       5 - 9.1
Issues brought up with service desk      4.6       2.7 - 6.4
Have jobs related to IT                  53 %
The assistant's overall understanding    31 %
Assistant usage instead of own research  30 %
Will use assistant next time             43 %
Overall satisfaction                     44 %

B. Ethics & Compliance

1) Technical Evaluation: The user tests for the Ethics & Compliance assistant resulted in 90 logged conversations and just under 700 messages. Of these, the virtual assistant was able to cover 74 %. The average confidence for this assistant was 64 %, showing how certain the assistant was when identifying the intents from the utterances.

From the Effectiveness Notebook, as figure 4 displays, precision was calculated to 74 % and helpfulness to 73 %.

Fig. 4: Visual representation of precision, helpfulness and their variances for the Ethics & Compliance virtual assistant. In this representation, chitchat is included.

Next, all chitchat was excluded, which resulted, as figure 5 displays, in a precision of 73 % and a helpfulness of 68 %.

Fig. 5: Visual representation of precision, helpfulness and their variances for the Ethics & Compliance virtual assistant. In this representation, chitchat is excluded.

2) Economic Evaluation: This section focuses on the potential cost savings of using virtual assistants, based on the survey taken by 7 legal counsels working in Ethics & Compliance as well as the conversation logs extracted from the user tests.

TABLE III: Tangible metrics

Average salary          90 000 €
Vacation pay            9 000 €
Overhead costs          40 %
Average tax labor rate  40 %
CREC                    178 200 €/year
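The relation between the rows of Table III is not spelled out, but the CREC figure is reproduced if overhead and labor tax are both applied additively to salary plus vacation pay. The sketch below shows that arithmetic; the additive form is an assumption inferred from the numbers, and the variable names are illustrative.

```python
# Values from Table III; percentages are kept as integers to avoid rounding noise.
average_salary_eur = 90_000
vacation_pay_eur = 9_000
overhead_pct = 40
labor_tax_pct = 40

# Assumed additive form: (salary + vacation pay) * (100 % + overhead + tax).
base_pay_eur = average_salary_eur + vacation_pay_eur
crec_eur_per_year = base_pay_eur * (100 + overhead_pct + labor_tax_pct) // 100
```

A multiplicative reading, (salary + vacation pay) · 1.4 · 1.4, would give 194 040 €/year instead, so only the additive form matches the table.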

From the survey put to the Ethics & Compliance team about their workload, the following results were acquired.

TABLE IV: Ethics & Compliance workload survey

                             Low complexity   Medium complexity   High complexity
Average inquiries per month  19               12.5                6.2
Standard deviation           22.2             14.2                6.9
Average time on task         24 min           2.2 h               5.9 h
Range time on task           15 - 32 min      1.4 - 2.9 h         4.4 - 7.3 h

From the survey regarding the Ethics & Compliance virtual assistant, the following results were acquired.

TABLE V: Ethics & Compliance virtual assistant survey

                                         Average   Range
E&C related questions per quarter        2         0.9 - 3.1
Questions brought up with E&C            1.4       0.4 - 2.5
The assistant's overall understanding    51 %
Assistant usage instead of own research  50 %
Will ask assistant next time             73 %
Overall satisfaction                     63 %

VII. DISCUSSION

This section discusses the observed results, divided into the three key aspects mentioned throughout the paper. The first part discusses the technical performance of IBM Watson Assistant in the two specific use cases, the IT Service Desk and the Ethics & Compliance department. The second part presents the organisational value of the two virtual assistants examined, with regard to tangible as well as intangible benefits. Finally, deployment timing and technological maturity are discussed for the specific use cases and from a general perspective.

A. Technical Performance

1) IT Service Desk in Comparison to Ethics & Compliance: The results show better coverage and precision for the assistant implemented for the Ethics & Compliance department compared to the one for the IT Service Desk. A similar pattern could be seen in user satisfaction and perceived understanding. This result may have several causes, which are discussed under the subtopics Subject differences in IT and Ethics & Compliance, Survey differences and Difference in user groups.

Subject differences in IT and Ethics & Compliance. One reason why the performance was better for the Ethics & Compliance assistant could be differences between the use cases. IT as a subject is broad and it therefore requires more effort to cover all questions employees might have. In comparison, Ethics & Compliance follows the company policy with quite specific information. The topics and questions are less diverse, making it easier to foresee what might be asked. In terms of independence, the IT Service Desk assistant would be more independent once sufficiently refined, due to the nature of the questions. The Ethics & Compliance assistant would in contrast be difficult or impossible to make independent due to the risks of giving wrong or ambiguous answers.

Survey differences. Despite efforts to make the surveys as similar as possible, differences could be found in terms of example utterances. The survey for the Ethics & Compliance virtual assistant had more specific example questions on each topic, partly to ease the process of inventing a new question for the user and partly to make the topic itself easier to understand. As an example, creating a question on competition law can be considered more difficult than creating a question on email issues. Unlike email issues, competition law needs more specific example utterances to avoid over-representation of utterances such as "What is competition law?" or "Tell me about competition law". This might however skew the results, as the Ethics & Compliance assistant test group displayed a greater ability to ask questions similar to those the Ethics & Compliance counsel had foreseen and implemented. In contrast, the IT assistant test group asked more general questions not foreseen by the IT Service Desk, such as "How do I respond to an email?".

Difference in user groups. Groups of testers will have different experiences using the virtual assistants due to varying expectations and phrasing that may be more or less suited for the assistant. These differences may be due to the users' background and experience in using similar technology. The company where the practical implementation was studied chose applicants for the user test groups based on what they thought was representative of future usage. Despite this, the risk of over- and under-represented groups is still present. Therefore, this evaluation could have been improved by taking different user test groups and calculating an average over these groups, to diminish over- and under-representation in the tests.

2) User Understanding: For the purpose of displaying accurate measurements on the different metrics - that is, coverage, confidence, precision and helpfulness - data cleaning of various incorrect uses and repetition errors has been performed. However, in reality one must also consider these when evaluating the assistant. In the user tests, cases of people using the assistant as a search engine occurred. Queries such as "covid-19" and other utterances not specific to the use cases can needlessly harm user satisfaction. Users have also been found to repeat the utterance instead of rephrasing it when the virtual assistant displays a lack of understanding. Due to this, one should consider the effects of increased knowledge on how to effectively use virtual assistants. This would result in better performance, which leads to improved user satisfaction.

3) Confidence Threshold: The virtual assistant gives a response only when it considers itself confident enough in its intent classification, which is decided by a confidence threshold. An increase of this threshold would in theory lower the calculated coverage but potentially increase satisfaction, since a higher confidence in intent classification is often connected to accurate responses. However, the analysed logs showed that some intents with a confidence close to the threshold gave a more accurate response than those with significantly higher confidence. Consequently, an analysis of threshold determination is left for future research, where the focus should be on choosing a threshold that gives the least dissatisfaction.
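The trade-off described above can be made concrete with a small sketch. It assumes that a message counts as covered when the confidence of its top intent reaches the threshold; this definition is an assumption consistent with coverage being threshold-based, and the confidence values are made up for illustration.

```python
def coverage(confidences, threshold):
    """Share of messages whose top-intent confidence reaches the threshold."""
    if not confidences:
        raise ValueError("need at least one confidence value")
    return sum(c >= threshold for c in confidences) / len(confidences)


# Made-up top-intent confidences for five logged messages.
sample_confidences = [0.95, 0.81, 0.64, 0.42, 0.18]

# Coverage at three candidate thresholds: it can only fall as the threshold rises.
sweep = {t: coverage(sample_confidences, t) for t in (0.2, 0.5, 0.7)}
```

A threshold study of the kind left for future research would pair such a sweep with the annotated helpfulness of the responses that remain covered at each threshold.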

4) Chitchat and How it Affects Performance: Another relevant factor from a technical perspective is the precision and helpfulness when evaluated with and without the presence of chitchat. Chitchat is not necessarily considered a "part" of the assistant's domain and would therefore arguably be unnecessary to display. Furthermore, chitchat has intents which are - in comparison - easy to classify and therefore increase the overall performance. This could result in skewed results with over-represented chitchat and thus higher scores. For the assistant in the IT Service Desk, the precision was significantly lower when chitchat was excluded from the analysis. The assistant for Ethics & Compliance was however only slightly affected. It is nevertheless worth noting the importance of chitchat in a virtual assistant, and that user satisfaction could decrease without it. The primary reason for this is that chitchat makes the assistant more human and fun to use. Furthermore, the user might not understand the assistant without being able to ask questions such as "what can you do?" or "who are you?". Chitchat is also especially important in the beginning, when the user wants to know the assistant's abilities and discover its traits. Once users are familiar with the assistant's limitations and possibilities, their chitchat would decrease and the focus would shift to the actual questions.

5) Coverage, Precision and Helpfulness: In this paper, these metrics have been used to evaluate the technical performance and potential cost savings. Helpfulness has in this case been used to understand how well the conversations could have been contained, but can also be presented as a quantitative metric of how satisfied a user would be with the responses from the assistant. Precision has been chosen over coverage in this paper to understand how well the virtual assistant can identify the intent based on the utterance. This is primarily due to the fact that coverage was based on a threshold rather than statistical facts and gold labels, unlike precision. However, one could also combine the two by averaging them, or calculate coverage manually and use it instead.

6) The Assistants' Abilities over Different Complexity Levels: One might argue that coverage and task fulfillment rate based on the assumptions made under Assumptions could be considered high, especially for low complexity tasks. However, this assumption has been made based on the quantitative analysis and on discussions with the contacts in the company (see section Assumptions) and is therefore deemed reasonable in this paper. To understand how much time the company can save by using the assistant, one can see that the Ethics & Compliance use case would result in 21 minutes saved on average per low complexity task. Furthermore, 40 minutes would be saved on medium complexity tasks, and 37 minutes on high complexity tasks. Despite this, a deeper analysis of the different complexity levels and how the assistant can perform on each level should be made in future research.

B. Organisational Value

1) Tangible Benefits: Tangible benefits are defined as benefits that can be measured within the organisation. This could be potential cost savings in terms of reduced workforce or the reduction of the average handle time. The main tangible benefits are divided into 2 parts:

• Task fulfillment rate
• Benefits specific for the IT Service Desk use case


Task fulfillment rate. Task fulfillment rate refers to the portion of the task the virtual assistant can fulfill on average, i.e. how well the assistant can help the user reach their end goal without the help of a human. The task fulfillment rate will increase as the virtual assistant is improved and users become more comfortable using the technology.
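To make the role of the task fulfillment rate concrete, the sketch below multiplies handling time by coverage and task fulfillment rate for the Ethics & Compliance figures in Appendix C. The product form is an assumption rather than a stated formula, but it reproduces the 21, 40 and 37 minutes saved per task quoted in the discussion of complexity levels. The function name is illustrative.

```python
# Ethics & Compliance figures from Appendix C.
# level: (handling time in minutes, coverage, task fulfillment rate)
levels = {
    "low": (24, 0.97, 0.90),
    "medium": (2.2 * 60, 0.57, 0.53),
    "high": (5.9 * 60, 0.34, 0.31),
}


def minutes_saved_per_task(handling_minutes, coverage, fulfillment_rate):
    """Expected agent minutes saved per incoming task (assumed product form)."""
    return handling_minutes * coverage * fulfillment_rate


saved = {name: round(minutes_saved_per_task(*values)) for name, values in levels.items()}
```

Although high complexity tasks take far longer to handle, their low coverage and fulfillment rate keep the expected saving per task close to that of medium complexity tasks.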

Benefits specific for the IT Service Desk use case.

Automatically setting ticket parameters. As the virtual assistant can automatically set parameters in the tickets for the IT Service Desk, the manual labour required from the human agents will decrease, leading to a potential cost saving.

2) Intangible Benefits: Intangible benefits are defined as benefits that are hard or impossible to measure. This could be improved user experience or decreased time to resolution. In this section, the intangible benefits are divided into 2 parts:

• General intangible benefits realised by the entire firm
• Benefits specific for the Ethics & Compliance use case

General Benefits

Reduced time-to-resolution and 24 x 7 x 365 support. As the virtual assistant is available 24 x 7 x 365 and can answer questions instantly, it will reduce the time to resolution, especially for international companies operating in different time zones with centralised service desks. Reduced time-to-resolution improves the user experience and allows users to get back to their day-to-day activities more quickly.

Competitive advantage as an early adopter. Being an early adopter of virtual assistants and digitalisation in general yields a competitive advantage, as the company will be more adapted to the technology and therefore use it more efficiently. This is due to the technology being built into the organisational structure, culture and processes. Furthermore, this will enable user testing that generates statistics which can be evaluated to improve the performance of the implemented virtual assistants.

Increased brand value. With the use of virtual assistants, companies are seen as innovative with their AI solutions, something that could increase the brand value. This benefit is more valuable for companies with client-facing virtual assistants but can be realised for internal roles as well.

Improved employee satisfaction. As the virtual assistant will handle repetitive basic inquiries, human agents can focus on the more advanced and challenging tasks. This will lead to improved employee satisfaction and performance.

Statistics to further improve the way to work. The virtual assistant generates valuable data that, when analysed, could be used to optimise the way to work. This would create more user-centered departments with an increased user understanding and a better user experience. The data could also be used to find employee pain points and system flaws. For example, if many users ask questions on how to connect to VPN, the VPN solution might need to be made more user friendly. An example from the Ethics & Compliance use case is that if many users ask about offering gifts, the company might want to include this in corporate training.

Scalability. An important factor for IT solutions is scalability. As the virtual assistants can be scaled to handle an unlimited number of users, the increase in workforce needed for an increased service demand would be reduced. Despite an increased total service demand, the human workload would be less affected than without automation, thanks to the virtual assistant's ability to handle a portion of the cases. This would result in fewer new hires and therefore decrease the total cost of an increased service demand. This will be especially important for the studied IT Service Desk, as one of its key challenges, a long learning process for new hires due to lack of documentation and a complex organisation, results in an expensive on-boarding process. Furthermore, the process of finding and hiring new employees is expensive in itself.

Benefits Specific for Ethics & Compliance

Increased awareness in ethics and compliance. By using a virtual assistant, the user might feel more comfortable discussing difficult topics such as suspected fraud, corruption or harassment than if they were to speak with a human legal counsel. According to a legal counsel at the studied company, this also applies to non-difficult conversations, since employees do not wish to give the appearance of not understanding and do not wish to consume too much of the counsel's time. The use of a virtual assistant will therefore lead to increased awareness around ethics and compliance, an improved user experience and an increase in reported cases of harassment, fraud and corruption. Furthermore, this will increase the compliance of the firm and help to reduce the amount of harassment, fraud and corruption.

C. Deployment Timing and Technological Maturity

According to IBM Watson Assistant's algorithm development team lead Saloni Potdar, the technological advancement has come far enough to be efficient in an everyday application. Potdar reckons it is time to implement a virtual assistant and that the sooner the deployment is made, the better. This is due to two main reasons: internal technological maturity and external technological expectations. On the first point, she considers the technological maturity of IBM Watson Assistant sufficient for deployment even if the assistant is not fully developed. This way, the system can be provided with data to build up the service, while giving the advantages of automation in the already developed areas. She explains that a way to maintain a good outward impression while building the assistant is to initially trust the system less and let a human supervise its responses, and to let the system become more independent as it evolves. In terms of external technological expectations, Potdar notes the technological hype which has surrounded virtual assistants and concludes that the current hype, and therefore expectations, are lower compared to a few years ago. Due to a general acceptance of the current limitations of natural language processing and understanding, users accept failures today that might not be accepted in the future.

Potdar's statement is directly connected to Gartner's Hype Cycle (GHC), a concept that graphically depicts the common pattern that follows each newly arisen technology or other invention. According to the GHC, all new developments pass through five phases, where emphasis is laid on the inflated expectations of the invention. [17]

Fig. 6: The GHC for Artificial Intelligence in July 2019, with Virtual Assistants circled. [18]

According to Gartner (see figure 6), virtual assistants are in the Trough of Disillusionment phase. In this phase, interest wanes due to unmet expectations. As a result, investments are low, and the technology is only able to survive if the providers improve the product in a manner that satisfies early adopters. However, in the next phase - that is, the Slope of Enlightenment - the organisational benefits of the technology become concrete and tangible, which leads to more enterprises starting pilots, given that the technology survives. [19]

This introduces a discussion on whether or not to deploy a virtual assistant before the Slope of Enlightenment. Gartner estimates that the Plateau of Productivity will be reached within two to five years, so the time between phases is short. The risk of waiting would be that others have already developed and deployed their assistants. This would result both in competitors having a head start and in increased customer expectations due to the technology's maturity. In contrast, if one were to deploy today, the risks of unmet expectations and a longer payback period are present. However, if the implementation is successful, then there is the clear advantage of having a mature and experienced virtual assistant as an early adopter of the technology.

VIII. CONCLUSION

The conclusions of this paper are the following. First, technological performance can be represented using quantitative metrics such as coverage, confidence, precision and helpfulness, and can be complemented using qualitative measures such as user satisfaction and perceived user understanding. From a technical perspective, one should not only think about the specific use case the assistant is implemented for, but also think about how chitchat can improve the user experience. Furthermore, one should consider the effects of increased knowledge on how to effectively use virtual assistants. This will result in an increased user understanding and better performance, which will lead to improved user satisfaction. A deeper analysis of the effect of the confidence threshold on coverage is left for future research.

In terms of technological performance, one should also consider the technology's limitations and maturity. The specific technological performance is subjective, and the technology's limitations as well as its technological maturity can be used to complement the evaluation of the virtual assistants. When considering the maturity, one should include the firm's internal technological and organisational maturity as well as the external technological expectations. The timing of market entry was also analysed, where one should especially consider the Gartner Hype Cycle as well as the risks and advantages of being an early adopter of the technology.

Next, the value a virtual assistant can provide to an organisation includes reduced time-to-resolution as well as the advantage of all-hour support. It can also improve employee satisfaction and simplify scalability. Moreover, the statistics from the conversations are valuable for further improvements, both for the virtual assistant and the way to work. Finally, it can be concluded that due to its ability to primarily help with low and medium complexity tasks, a virtual assistant can make the agents more efficient and satisfied with their workplace while lessening their workload.

Conclusions specific for the use cases show that an assistant implemented in a narrower use case, that is the Ethics & Compliance assistant, can be implemented more easily and perform better in a less developed environment. As a result, the potential cost savings for the Ethics & Compliance department were calculated to 139 508 €/year. It is however difficult in this use case to make the assistant independent of human legal counsels, considering the importance of clear and unambiguous replies. A broader use case, such as the IT assistant with distinct questions within specific domains, requires more effort to perform at a high level but may be even more beneficial and independent once sufficiently refined. Consequently, the potential cost savings were calculated to 65 958 €/year for the IT Service Desk assistant in its current state.


APPENDIX A
IT SERVICE DESK COST SAVINGS CALCULATIONS

Metric                                          Source                     Ref.   Value
Total volume                                    Internal ticketing system  V      95 600 per year
Volume requiring manual parameter set           Interview                  Vm     70 910 per year
Percentage of traffic to the virtual assistant  Survey                     VR     43 %
Percentage low complexity                       Survey                     VRL    46 %
Percentage medium complexity                    Survey                     VRM    35 %
Percentage high complexity                      Survey                     VRH    19 %
Coverage low complexity                         Performance                CL     66 %
Task fulfillment rate low complexity            Helpfulness                CRL    94 %
Coverage medium complexity                      Performance                CM     39 %
Task fulfillment rate medium complexity         Helpfulness                CRM    55 %
Coverage high complexity                        Performance                CH     23 %
Task fulfillment rate high complexity           Helpfulness                CRH    33 %
Percentage of stories creating tickets          Estimation                 RT     50 %
Overall coverage                                Performance                CO     48 %
Volume contained low complexity                 VR · V · VRL · CL · CRL    VCL    11 732
Volume contained medium complexity              VR · V · VRM · CM · CRM    VCM    3 086
Volume contained high complexity                VR · V · VRH · CH · CRH    VCH    593
Volume tickets created                          VR · V · RT · CO           Vm     7 318
Handling time low complexity                    Survey                     HTL    9 min
Handling time medium complexity                 Survey                     HTM    25 min
Handling time high complexity                   Survey                     HTH    62 min
Handling time manual parameter set              Interview                  HTm    75 s
Average cost rate                               Interview                  CRIT   36 000 €/year
Potential savings low complexity                CRIT · VCL · HTL           PSL    30 458 €/year
Potential savings medium complexity             CRIT · VCM · HTM           PSM    22 255 €/year
Potential savings high complexity               CRIT · VCH · HTH           PSH    10 606 €/year
Potential savings automatic parameter set       Vm · RT · CO               PSm    2 639 €/year
Total potential savings                         PSL + PSM + PSH + PSm             65 958 €/year
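The calculation in this appendix can be retraced with the sketch below. Two points are assumptions inferred from the numbers rather than stated in the table: handling times are converted to a share of a working year of 2 080 hours (52 weeks of 40 hours), and the ticket volume is computed from the volume requiring manual parameter setting (70 910) instead of the total volume, since that is what reproduces the reported 7 318. Volumes are rounded to whole tasks before the savings are computed, as in the table. All variable names are illustrative.

```python
WORK_MINUTES_PER_YEAR = 2080 * 60  # assumption: 52 weeks x 40 h, not stated in the table

cost_rate_eur = 36_000    # average cost rate per agent-year (CRIT)
total_volume = 95_600     # V
manual_volume = 70_910    # volume requiring manual parameter set
traffic_share = 0.43      # VR
ticket_share = 0.50       # RT
overall_coverage = 0.48   # CO
manual_set_minutes = 75 / 60  # HTm, 75 seconds

# level: (share of volume, coverage, task fulfillment rate, handling minutes)
levels = {
    "low": (0.46, 0.66, 0.94, 9),
    "medium": (0.35, 0.39, 0.55, 25),
    "high": (0.19, 0.23, 0.33, 62),
}


def savings_eur(volume, minutes_per_task):
    """Agent time freed per year, valued at the average cost rate."""
    return cost_rate_eur * volume * minutes_per_task / WORK_MINUTES_PER_YEAR


# Contained volumes per level, rounded to whole tasks.
contained = {
    name: round(traffic_share * total_volume * share * cov * rate)
    for name, (share, cov, rate, _) in levels.items()
}
level_savings = {
    name: round(savings_eur(contained[name], levels[name][3])) for name in levels
}

# Tickets whose parameters the assistant sets automatically.
tickets = round(traffic_share * manual_volume * ticket_share * overall_coverage)
ticket_savings = round(savings_eur(tickets, manual_set_minutes))

total_savings = sum(level_savings.values()) + ticket_savings
```

Under these assumptions the sketch yields the contained volumes, the four savings rows and the total of 65 958 €/year shown in the table.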

APPENDIX B
IT SERVICE DESK AGENT SURVEY QUESTIONS

1) What amount of tasks (in %) do you deal with on the following complexity levels?

• Low complexity
• Medium complexity
• High complexity

2) How much time does it take for you on average to handle an inquiry of the following complexity? Please note that this is your total time spent on the question including reading and understanding the question, searching for information on the topic, formulating an answer and following up. Please exclude waiting/idle time

• Low complexity
– Less than 5 min
– 5 - 10 min
– 10 - 20 min
– 20 - 40 min
– 40 - 60 min
– 1 h or more
• Medium complexity
• High complexity

APPENDIX C
ETHICS & COMPLIANCE COST SAVINGS CALCULATIONS

Metric                                   Source              Ref.   Value
Total volume                             Survey              V      4 520 per year
Amount low complexity                    Survey              VRL    50 %
Amount medium complexity                 Survey              VRM    33 %
Amount high complexity                   Survey              VRH    16 %
Percentage of traffic                    Survey              VR     73 %
Coverage low complexity                                      CL     97 %
Task fulfillment rate low complexity                         CRL    90 %
Coverage medium complexity                                   CM     57 %
Task fulfillment rate medium complexity                      CRM    53 %
Coverage high complexity                                     CH     34 %
Task fulfillment rate high complexity                        CRH    31 %
Volume contained low complexity          V · VRL · CL · CRL  VCL    1 440
Volume contained medium complexity       V · VRM · CM · CRM  VCM    329
Volume contained high complexity         V · VRH · CH · CRH  VCH    56
Handling time low complexity             Survey              HTL    24 min
Handling time medium complexity          Survey              HTM    2.2 h
Handling time high complexity            Survey              HTH    5.9 h
Average cost rate                        Interview           CRIT   178 200 €/year
Potential savings low complexity         CRIT · VCL · HTL    PSL    49 292 €/year
Potential savings medium complexity      CRIT · VCM · HTM    PSM    61 941 €/year
Potential savings high complexity        CRIT · VCH · HTH    PSH    28 275 €/year
Total potential savings                  PSL + PSM + PSH            139 508 €/year
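As a consistency check on this appendix, the sketch below recomputes the contained volumes and the total. It rests on one inferred assumption: the reported volumes (1 440, 329 and 56) are only reproduced when the traffic share VR of 73 % is included as a factor, as it is in Appendix A, even though the formulas listed here omit it. The per-level savings are taken as reported rather than recomputed, and the variable names are illustrative.

```python
total_volume = 4_520   # V, inquiries per year
traffic_share = 0.73   # VR, assumed to enter the product as in Appendix A

# level: (share of volume, coverage, task fulfillment rate)
levels = {
    "low": (0.50, 0.97, 0.90),
    "medium": (0.33, 0.57, 0.53),
    "high": (0.16, 0.34, 0.31),
}

# Contained volumes per level, rounded to whole inquiries.
contained = {
    name: round(total_volume * traffic_share * share * cov * rate)
    for name, (share, cov, rate) in levels.items()
}

# Reported per-level savings in EUR per year; the total is their sum.
reported_savings = {"low": 49_292, "medium": 61_941, "high": 28_275}
total_savings = sum(reported_savings.values())
```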

APPENDIX D
IT SERVICE DESK ASSISTANT SURVEY & TESTING

A. Introduction

This is a survey and a testing process to gain valuable insights how the IT Service Desk virtual assistant performs.

In the following steps, you are going to be asked 3 general questions about how you work with IT related issues. You are then going to be guided through a testing process to test the assistant by asking a series of questions on different topics. After the testing process, we would appreciate if you could answer 3 questions regarding the virtual assistant and how you might utilise it.

NOTE: The following will NOT be presented to your supervisor. When presenting the conclusions of this survey, you will be anonymous.

B. Survey Part 1, General Questions

1) How many IT related issues do you have each quarter?

• Less than 2
• 2 - 6
• 7 - 11
• 12 - 15
• 16 or more

2) For how many of those issues do you ask the IT Service Desk for assistance?

• Less than 2
• 2 - 6
• 7 - 11
• 12 - 15
• 16 or more

3) Is your job related to IT?

• Yes
• No

C. Testing Process

1) Begin with getting familiar with the virtual assistant by trying out general conversations, for example starting with:

• Hello!
• Who are you?
• What can you do?
• Tell me a joke

2) Please ask 2-3 questions regarding email issues/inquiries, for example: "I would like to create a shared mailbox"

3) How often did the assistant understand your email related questions?

• Always
• Almost always
• Usually
• Sometimes
• Almost never

4) Please ask 2-3 questions regarding MS Teams, for example: "I have audio issues in Teams"

5) How often did the assistant understand your Teams related questions?

• Always
• Almost always
• Usually
• Sometimes
• Almost never

6) Please ask 2 or more questions related to IT issues, for example: "I have received a suspicious email" or "I need WIFI access for a guest"

7) How often did the assistant understand your general IT questions?

• Always
• Almost always
• Usually
• Sometimes
• Almost never

D. Survey Part 2, Questions Regarding The Virtual Assistant

1) How often will you use the virtual assistant instead of searching for a solution yourself?

• Always
• Almost always
• Usually
• Sometimes
• Almost never

2) Will you ask the assistant next time you need IT-support?

• Very likely
• Likely
• Neutral
• Unlikely
• Very unlikely

3) How satisfied are you with the virtual assistant?

• Scale from 1 - 10

4) Do you have any other comments? For example: What functionality would make you use the assistant more?

• Free text

APPENDIX E
ETHICS & COMPLIANCE COUNSEL SURVEY QUESTIONS

1) How many tasks do you deal with of the following complexity levels each month? (Examples were included in the survey but those will not be presented here due to confidentiality)

• Low complexity
• Medium complexity
• High complexity

2) How much time does it take for you on average to handle an inquiry of the following complexity? Please note that this is your total time spent on the question including reading and understanding the question, searching for information on the topic, formulating an answer and following up

• Low complexity
– Less than 15 min
– 15 - 30 min
– 30 - 60 min
– 1 - 3 h
– 3 - 5 h
– 5 h or more
• Medium complexity
– Less than 15 min
– 15 - 30 min
– 30 - 60 min
– 1 - 3 h
– 3 - 5 h
– 5 h or more
• High complexity
– Less than 15 min
– 15 - 30 min
– 30 - 60 min
– 1 - 3 h
– 3 - 5 h
– 5 h or more

APPENDIX FETHICS & COMPLIANCE SURVEY AND TESTING PROCESS

A. IntroductionThis is a survey and a testing process to gain valuable in-

sights how the Ethics & Compliance virtual assistant performs.In the following steps, you are going to be asked 2 generalquestions about how you work with Ethics & Compliance.You are then going to be guided through a testing process totest the assistant by asking a series of questions on differenttopics. After the testing process, we would appreciate if youcould answer 3 questions regarding the virtual assistant andhow you might utilise it.

NOTE: The following will NOT be presented to yoursupervisor. When presenting the conclusions of this survey,you will be anonymous.

B. Survey Part 1, General Questions1) How many questions do you have regarding the Business

Practice Policy each quarter?• Less than 2• 2 - 4• 5 or more

2) How many of those questions do you ask to the Ethics& Compliance team each quarter?

• Less than 2• 2 - 4• 5 or more

C. Testing Process1) Begin with getting familiar with the virtual assistant by

trying out general conversations, for example startingwith:

• Hello!• Who are you?• What can you do?• Tell me a joke

2) Please ask 1-3 questions about gifts and hospitalities,for example: ”Can I invite a customer for dinner?”

3) Please ask 1-3 questions regarding corruption or conflictof interest, for example: ”I suspect corruption” or ”CanI hire my brother?”


4) How often did the assistant understand your Teamsrelated questions?

5) Please ask 2 or more other questions regarding ethics and compliance, for example: “What do I do if I have been harassed?” or “What can I speak about with a competitor?”

6) How often did the assistant understand you?
• Always
• Almost always
• Usually
• Sometimes
• Almost never

D. Survey Part 2, Questions Regarding The Virtual Assistant

1) How often will you use the virtual assistant instead of searching for clarity yourself?
• Always
• Almost always
• Usually
• Sometimes
• Almost never

2) Will you ask the assistant next time you have a question regarding ethics and compliance?
• Very likely
• Likely
• Neutral
• Unlikely
• Very unlikely

3) How satisfied are you with the virtual assistant?
• Scale from 1 - 10

4) Do you have any other comments? For example: What functionality would make you use the assistant more?
• Free text

ACKNOWLEDGMENT

The authors wish to thank both IBM and the company where the assistants are developed, for the opportunity to write this thesis, and their help in collecting the data for this paper. Furthermore, thanks should be given to Dr. Mattias Wiggberg for the aid and advice as project supervisor from KTH. The authors would finally like to give special thanks to Mr. Andreas Herman, for his professional guidance and valuable support as research project supervisor from IBM.

REFERENCES

[1] A. M. Turing, Computing Machinery and Intelligence. Mind, 1950.

[2] J. Weizenbaum, Computer Power and Human Reason: From Judgment to Calculation. New York: W. H. Freeman and Company, 1976.

[3] N. Radziwill and M. Benton, “Evaluating quality of chatbots and intelligent conversational agents,” 2017.

[4] Virtual assistant: meaning in the Cambridge English Dictionary. [Online]. Available: https://dictionary.cambridge.org/dictionary/english/virtual-assistant?topic=computer-programming-and-software

[5] P. Greenberg, “Chatbots: Conversation for all of us.” Pitney Bowes.

[6] “Technology landscape review,” Torchbox: Council Chatbots, 2019.

[7] UNDP. Sustainable development goals. [Online]. Available: https://www.undp.org/content/undp/en/home/sustainable-development-goals.html

[8] D. Jurafsky and J. H. Martin, Speech and Language Processing. Harlow: Pearson, 2014.

[9] Watson Assistant foundations. [Online]. Available: https://learn.ibm.com/course/view.php?id=4329

[10] Creating a skill. [Online]. Available: https://cloud.ibm.com/docs/assistant?topic=assistant-skill-add

[11] “The total economic impact of IBM Watson Assistant,” Forrester, 2020.

[12] M. Deane. (2019) 5 key IT service desk challenges and how to overcome them. [Online]. Available: https://itsm.tools/5-key-it-service-desk-challenges-and-how-to-overcome-them

[13] “Stop trying to replace your agents with chatbots,” Forrester, 2019.

[14] M. Farshid, “Market research analysis,” 2019.

[15] Measure Watson Assistant performance. [Online]. Available: https://github.com/watson-developer-cloud/assistant-improve-recommendations-notebook/blob/master/notebook/Measure%20Notebook.ipynb

[16] Measure Watson Assistant performance. [Online]. Available: https://github.com/watson-developer-cloud/assistant-improve-recommendations-notebook/blob/master/notebook/Effectiveness%20Notebook.ipynb

[17] Hype cycle. [Online]. Available: https://www.gartner.com/en/information-technology/glossary/hype-cycle

[18] L. Goasduff. (2019, September) Top trends on the Gartner hype cycle for artificial intelligence, 2019. [Online]. Available: https://www.gartner.com/smarterwithgartner/top-trends-on-the-gartner-hype-cycle-for-artificial-intelligence-2019

[19] Hype cycle research methodology. [Online]. Available: https://www.gartner.com/en/research/methodologies/gartner-hype-cycle

Johan Torssell J. Torssell is currently pursuing his B.S. degree in industrial engineering and management at the Royal Institute of Technology (KTH), Stockholm, Sweden.

Since 2018, he has been a Project Manager within IoT and Industry 4.0 at IBM, Sweden.

In this thesis, Mr. Torssell has been working on all sections with an extra focus on the organisational benefits.

Erik Persson E. Persson is currently pursuing his B.S. degree in industrial engineering and management at the Royal Institute of Technology (KTH), Stockholm, Sweden.

In this thesis, Mr. Persson has been working on all sections with an extra focus on the technical perspective.

www.kth.se

TRITA-EECS-EX-2020:540
