
Mobile Dictation With Automatic Speech Recognition for Healthcare Purposes

Tuuli Keskinen, Aleksi Melto, Jaakko Hakulinen, Markku Turunen, Santeri Saarinen, Tamás Pallos

TAUCHI research center, School of Information Sciences, University of Tampere, Finland

Riitta Danielsson-Ojala, Sanna Salanterä

Department of Nursing Sciences, Faculty of Medicine, University of Turku, Finland

Kites symposium 2013

Content

• Background & Motivation

• Dictation application

• User evaluation

• Results

• Discussion and conclusion

• Ending words

Background

• First speech recognition systems for medical reporting were developed over 20 years ago [1]

• Doctors’ dictations are still commonly typed manually, but utilization of speech recognition is increasing especially in radiology and pathology

• Nurses’ use of speech recognition is rare and often limited to filling in templates

[X] Numbers refer to the actual references in the paper.

Background

• The use of speech recognition in Finnish healthcare has been studied, e.g., in [2], where radiologists were followed as they changed from cassette-based recording to speech-recognition-based dictation

• Several studies on speech recognition in healthcare have been done, e.g., [1, 3, 4, 5]

• Previous studies focus mainly on objective qualities, such as dictation durations and recognition error rates

Motivation

”If only we had the possibility to dictate!”

- An anonymous nurse at YTHS (Finnish Student Health Service)

Motivation for our study

• Scarce use of speech recognition in Finnish healthcare, especially in nursing

• Obvious and unnecessary delays in getting patient information to the next treatment steps

• Lack of research focusing on the user expectations and experiences of dictation applications utilizing speech recognition in healthcare

Dictation application

• Based on ”MobiDic” by Turunen et al. [6]

• The mobile client (Android application on a tablet) includes functionality for recording and editing dictations, and modifying the dictation texts

• The server side manages the dictations (audio and text) and communicates with speech recognition engines and M-Files document management system

Dictation application

• Speech recognition is complemented by a variety of other tools to improve results:

– State-of-the-art natural language processing tools (e.g., spelling and grammar checking)

– Statistics based on user actions

– Optimized multimodal touch-screen UI

• A distributed application model makes a variety of use cases possible:

– Real-time distributed assisted dictation

– Workflow management

– Plug-and-play component management (e.g., speech recognizer, NLP tools, document management)

– UI can be adapted for different usage cases and devices
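The plug-and-play component idea above can be illustrated with a minimal Python sketch. It is not the actual MobiDic server code: the names `Recognizer`, `DictationServer`, `FakeRecognizer` and `collapse_spaces` are hypothetical and only show how a recognizer and text tools could be swapped behind a common interface.

```python
from dataclasses import dataclass
from typing import Callable, List, Protocol


class Recognizer(Protocol):
    """Hypothetical interface that any speech recognizer plug-in would satisfy."""

    def transcribe(self, audio: bytes) -> str: ...


@dataclass
class DictationServer:
    """Hypothetical server core: the recognizer and the text tools are swappable."""

    recognizer: Recognizer
    text_tools: List[Callable[[str], str]]

    def process(self, audio: bytes) -> str:
        text = self.recognizer.transcribe(audio)
        for tool in self.text_tools:  # e.g., spelling or grammar checking
            text = tool(text)
        return text


class FakeRecognizer:
    """Stand-in recognizer returning a fixed string, for illustration only."""

    def transcribe(self, audio: bytes) -> str:
        return "wound  cleaned and dressed"


def collapse_spaces(text: str) -> str:
    """Toy text tool: collapse repeated whitespace into single spaces."""
    return " ".join(text.split())


server = DictationServer(FakeRecognizer(), [collapse_spaces, str.capitalize])
result = server.process(b"\x00\x01")  # dummy audio bytes
print(result)
```

Replacing `FakeRecognizer` with another object that has a `transcribe` method would change the engine without touching the rest of the pipeline.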

Dictation application – v2.0

User evaluation

• Real-world context, real users and real dictations

• Two wound care nurses in one of the university hospitals in Finland

• Lasted three months in total, covering 30 and 67 dictations for the two participants

• Wizard-of-Oz approach

– The available medical language model was based on medical and nursing documentation and was thus not sufficient for recognizing the language used by the wound care nurses

Methodology

• Background interview

– Main focus on participants’ normal practices in making and/or dictating patient entries

• Subjective data gathered with questionnaires

– User expectations and experiences (SUXES [8])

– Usability-related experiences (SUS [9])

– Open questions

• Log data

– All application and server events logged
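For readers unfamiliar with SUS, the following sketch shows the standard scoring of the ten-item questionnaire: odd-numbered items contribute the response minus 1, even-numbered items contribute 5 minus the response, and the sum is multiplied by 2.5. The function name `sus_score` and the example responses are hypothetical and are not data from this study.

```python
def sus_score(responses):
    """Compute the standard SUS score (0-100) from ten responses on a 1-5 scale."""
    if len(responses) != 10 or any(r not in range(1, 6) for r in responses):
        raise ValueError("SUS needs ten responses, each between 1 and 5")
    total = 0
    for item_number, response in enumerate(responses, start=1):
        if item_number % 2 == 1:  # odd items are positively worded
            total += response - 1
        else:                     # even items are negatively worded
            total += 5 - response
    return total * 2.5


# Hypothetical responses, not taken from the evaluation.
example_responses = [4, 2, 5, 1, 4, 2, 5, 1, 4, 2]
score = sus_score(example_responses)
print(score)
```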

SUXES method

• Enables comparison between user expectations before use and user experiences after use on a set of statements

• Expectations reported by giving two values

– acceptable level: the lowest acceptable quality level for even using the system (or property)

– desired level: the uppermost level that can even be expected of the system (or property)

• Experiences reported by giving a single value on the same statements

• Expectations form a range within which the experienced level is usually expected to fall

– If below, something is wrong; if above, success
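The comparison described above can be sketched for a single statement as follows. This is an illustration only: the function name `compare_to_expectations` and the example values are hypothetical, and a seven-point scale is assumed.

```python
def compare_to_expectations(acceptable, desired, experienced):
    """Place the experienced level relative to the acceptable-desired range."""
    if acceptable > desired:
        raise ValueError("acceptable level must not exceed desired level")
    if experienced < acceptable:
        return "below expectations"   # something is wrong
    if experienced > desired:
        return "above expectations"   # success
    return "within expectations"      # the usual case


# Hypothetical values on an assumed 1-7 scale.
verdict = compare_to_expectations(acceptable=4, desired=6, experienced=5)
print(verdict)
```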

SUXES method

[Figure: the example statement ”Using the phone is fast.” shown three times on a Low–High scale: expectations (two marks, the acceptable and desired levels), experiences (a single mark), and the comparison of the two.]

Expectations and experiences

• We used the nine original statements of SUXES

– speed, pleasantness, clearness, error-free use, error-free function, learning curve, naturalness, usefulness, and future use

• …and five additional statements comparing the dictation application to the normally used entry practice

– faster, more pleasant, more clear, easier, and prefer in the future

User expectations on the application

Median responses of acceptable – desired levels (grey areas), n=2.

User experiences on the application

Median responses of acceptable – desired levels (grey areas) and experiences (black circles), n=2. P1 and P2 refer to participant 1 and 2.

User expectations compared to normal entry practice

Median responses of acceptable – desired levels (grey areas), n=2.

User experiences compared to normal entry practice

Median responses of acceptable – desired levels (grey areas) and experiences (black circles), n=2.

Discussion

• The desired level was 6 or 7 on all statements

• The experienced level was at least 6 on all but one statement

• The usefulness of the dictation application can clearly be seen in the results

• More importantly, the participants would prefer using the application in the future, i.e., they would be ready to drop their familiar and safe routines

Conclusion

• Because we did not have a sufficiently accurate language model for nurses’ purposes, we used a Wizard-of-Oz scenario to finalize the speech recognition results

• The user experience results show true potential for our dictation application – not only to smooth the dictation process, but also as a relevant option for writing nursing entries

Future work

• Finalizing a language model for nurses and utilizing it in Finnish healthcare is crucial for enabling a fully automatic dictation-to-text process

• We are not developing the language models ourselves, but will collaborate closely with our partners in their development and evaluation

• We are also developing our application further to provide an even more pleasurable user experience and a seamless process

Future work

• To make this a reality, we need a proper process for iterative deployment, not a stand-alone product that is simply sold to hospitals, for example

• We have developed all necessary components: client and backend software, connections to 3rd party components, tools to support deployment, and a complete deployment process

• Ready for commercialization – looking for partners!

Global market

Acknowledgements

• Project ”Mobile and Ubiquitous Dictation and Communication Application for Medical Purposes” (”MOBSTER”)

• Funded by the Finnish Funding Agency for Technology and Innovation (TEKES)

• Lingsoft and M-Files, and other project partners
