Chair of Real-Time Computer Systems
Department of Electrical and Computer Engineering
Technical University of Munich
Requirements and Partitioning of
Otoacoustic Emission Measurement
Algorithms
Anforderungen und Aufteilung von
Messalgorithmen für Otoakustische
Emissionen
Requisitos y Particionado de Algoritmos
para la Medida de Emisiones Otoacústicas
Master’s Thesis
Supervised by Prof. Dr. sc. Samarjit Chakraborty
Chair of Real-Time Computer Systems
Department of Electrical and Computer Engineering
Technical University of Munich
Advisor Nils Heitmann
Author Rodrigo Hernangomez Herrero
Submitted on September 22, 2017
This thesis was typeset using the XeTeX
typesetting system developed by Jonathan
Kew.
Declaration of Authorship
I, Rodrigo Hernangomez Herrero, declare that this thesis titled “Requirements and Partition-
ing of Otoacoustic Emission Measurement Algorithms” and the work presented in it are my
own unaided work, and that I have acknowledged all direct or indirect sources as references.
This thesis was not previously presented to another examination board and has not been
published.
Signed:
Date:
Abstract
Otoacoustic Emissions (OAEs) are a technique for the objective diagnosis of hearing impairment.
Their field of application extends primarily to cases where the patient cannot actively cooperate
in the clinical intervention, such as hearing screening of neonates.
Because of the high cost of professional equipment, smartphones arise as a newer, cheaper
tool to perform such tests. The objective of this Master's Thesis is to analyze the computational
requirements of an embedded OAE screening system given a set of medical specifications.
This device communicates with a smartphone either via USB or through a wireless
protocol, which adds the possibility of partitioning the algorithm between both systems.
The analysis involves measuring power consumption and real-time performance of the
embedded system for different settings and implementation variants. Models can be built
from the experimental results, so that the profiled parameters can be linked to performance.
These models may in turn be used to find the set of hardware and software parameters
that fulfills the application requirements in an optimal way.
ETSIT
ESCUELA TÉCNICA SUPERIOR DE INGENIEROS DE TELECOMUNICACIÓN
Requisitos y Particionado de
Algoritmos para la Medida de
Emisiones Otoacústicas
Chair of Real-Time Computer Systems
Department of Electrical and Computer Engineering
Technical University of Munich
Tutor Nils Heitmann
Autor Rodrigo Hernangomez Herrero
Munich, 27 de septiembre de 2017
Resumen
Se conoce como emisiones otoacústicas (OAE por sus siglas en inglés) a una serie de técnicas
para el diagnóstico objetivo de la discapacidad auditiva de una persona. El ámbito de
aplicación de éstas abarca primordialmente aquellas situaciones en las que el paciente no
puede participar activamente en la evaluación clínica, como sucede en el caso de la revisión
médica de neonatos.
El elevado coste de los equipos clínicos profesionales ha propiciado la aparición de los
smartphones como una nueva herramienta que puede desempeñar esta labor de forma más
asequible. En este contexto, el objetivo que este Trabajo Fin de Máster persigue es el análisis
de los requisitos computacionales de un sistema empotrado para la evaluación de OAE bajo
un conjunto de especificaciones médicas. Tal dispositivo deberá comunicarse vía USB o
inalámbricamente con un smartphone, lo que añade la perspectiva de particionar el algoritmo
entre los dos sistemas.
El enfoque escogido para llevar a cabo el análisis comprende la medida del consumo energético
y el rendimiento en tiempo real del sistema empotrado para diferentes ajustes y variantes de
implementación. A raíz de los resultados experimentales se puede construir un modelo que
relacione los parámetros examinados con el rendimiento del sistema. A su vez, esto puede
ser usado para hallar el mejor conjunto de parámetros hardware y software que cumplan los
requisitos de la aplicación de forma óptima.
Contents
List of Figures X
List of Acronyms XI
1. Introduction 1
1.1. Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2. Objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.3. Document Organization . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
2. Background 7
2.1. Engineering Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
2.2. OAE Basics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
3. Platform Architecture 18
3.1. Hardware . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
3.2. Software . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
4. Experiments 28
4.1. Physical setup . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
4.2. DPOAE profiling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
4.3. Impact of clock frequency . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
4.4. Averaging schemes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
4.5. FFT and Goertzel algorithm . . . . . . . . . . . . . . . . . . . . . . . . . 44
4.6. Audio codec . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
5. Case scenarios 47
5.1. Global model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
5.2. Systematic setting of parameters . . . . . . . . . . . . . . . . . . . . . . . 48
6. Conclusions 56
6.1. Future development . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
A. Appendices 58
A.1. DPOAE with USB. Current and CPU Load Profile . . . . . . . . . . . . . 58
A.2. DPOAE with USB. Energy and Partitions Profile . . . . . . . . . . . . . . 61
A.3. Averaging Profile . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
A.4. FFT and Goertzel Profile . . . . . . . . . . . . . . . . . . . . . . . . . . . 70
List of Figures
1.1. Global estimates on prevalence of hearing loss . . . . . . . . . . . . . . . . 2
1.2. Prevalence of Disabling Hearing Loss vs. GNI per capita . . . . . . . . . . 2
2.1. TEOAE recording . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
2.2. DPOAE recording . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
2.3. OAE system block diagram . . . . . . . . . . . . . . . . . . . . . . . . . . 15
3.1. Picture of the whole hardware platform . . . . . . . . . . . . . . . . . . . 20
3.2. Host vs. Device DPOAE detection . . . . . . . . . . . . . . . . . . . . . . 25
3.3. DPOAE Partition Scheme . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
4.1. Schematic diagram of experiment configuration . . . . . . . . . . . . . . . 30
4.2. DPOAE profiling capture . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
4.3. USB test. Sampling frequency performance . . . . . . . . . . . . . . . . . 32
4.4. USB test. Buffer length performance . . . . . . . . . . . . . . . . . . . . . 33
4.5. USB test. Sample size performance . . . . . . . . . . . . . . . . . . . . . . 34
4.6. USB test. Partition performance . . . . . . . . . . . . . . . . . . . . . . . 35
4.7. Test without USB. fclk = 48 MHz . . . . . . . . . . . . . . . . . . . . . . . 35
4.8. Impact of clock frequency on current consumption for averaging partition 38
4.9. Goertzel and FFT time performance . . . . . . . . . . . . . . . . . . . . . 45
5.1. BLE consumption current . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
List of Acronyms
ABR Auditory Brainstem Response.
ADC Analog-to-digital Converter.
AR Artifact Rejection.
BLE Bluetooth Low Energy.
CDC Communication Device Class.
CMA Cumulative Moving Average.
CPU Central Processing Unit.
DAC Digital-to-analog Converter.
dB Decibel.
dBFS Decibels relative to Full Scale.
DFT Discrete Fourier Transform.
DMA Direct Memory Access.
DPOAE Distortion Product Otoacoustic Emission.
DSP Digital Signal Processor.
FFT Fast Fourier Transform.
FPU Floating Point Unit.
GNI Gross National Income.
I2C Inter-Integrated Circuit.
I2S Inter-IC Sound.
MCU Microcontroller Unit.
MIMD Multiple Instruction Multiple Data.
OAE Otoacoustic Emission.
OHC Outer Hair Cells.
PLL Phase-Locked Loop.
RAM Random Access Memory.
RISC Reduced Instruction Set Computer.
SIMD Single Instruction Multiple Data.
SISD Single Instruction Single Data.
SNR Signal-to-noise Ratio.
SPL Sound Pressure Level.
TEOAE Transient Evoked Otoacoustic Emission.
USART Universal Synchronous/Asynchronous Receiver/Transmitter.
USB Universal Serial Bus.
WHO World Health Organization.
“La patria es la familia y los amigos. Heimat heißt Familie und Freunde.
Homeland is your family and your friends.”
A mi familia, lejos y cerca. A los ausentes, que siempre llevo conmigo. A
los presentes, que siempre me faltan. A todo aquello que permanece firme
cuando lo demás se tambalea. Al inmerecido orgullo que me profesan, pues
a sus enseñanzas me debo enteramente.
A mis raíces, mi pueblo y los amigos que siempre esperarán a los madrileños
en verano.
Al colegio San Viator, pilar desde donde me apoyo para lograr mis metas. A
CAL y a la Parroquia Virgen de la Fuensanta, a mis hermanos de comunidad
con los que fe y amistad se funden en un abrazo hasta volverse indistinguibles.
A Dios, y a cada trocito de Dios que hay en las personas con las que me
encuentro, en la vida que me rodea y en los objetivos que persigo.
A la Universidad Politécnica de Madrid, y en especial al grupo de investigación
B105. Sois alegría, pasión y talento sin complejos.
To all the amazing international people that I have met in Munich. It is so
unfair that our paths diverged so soon, when we were just starting to realize
what a wonderful person we had in front of us.
Meinen lieben Willis im Wohnheim und sonstigen deutschen Freunden. Man
redet von Muttersprache, weil man sie in der Familie lernt. Deutsch ist ja
überhaupt nicht meine Muttersprache, aber diese Idee berechtigt mich irgendwie
zu sagen, dass Ihr fast wie eine Familie für mich hier seid. Ihr habt mir
ganz viele Worte beigebracht, darunter steht aber „Integration“ als meine
Lieblingsvokabel.
Der Technischen Universität München. Da konnte ich spüren, wie glücklich
ich bin, an so einer ausgezeichneten Universität studieren zu können,
wo Unterricht, Forschung und Studenten die Priorität sind. Dem Lehrstuhl
für Realzeit-Computersysteme, und allen seinen Mitarbeitern und Studenten.
Danke für die Gelegenheit, mit Euch arbeiten und von Euch bei der
Masterarbeit unterstützt werden zu können.
1. Introduction
“Alcanza la excelencia y compártela.”
“Achieve excellence and share it.”
Saint Ignatius of Loyola
In this first chapter, a presentation of this thesis’ topic is provided. The motivation
for the research on the topic will be discussed, as well as the intended outcomes and
the contribution to the scientific and global community. At the end of the chapter
a brief explanation of this document’s organization can be found.
Motivation
According to the World Health Organization (WHO), there are around 360 million
people in the world with disabling hearing loss, amounting to 5.3% of the
world's population. The prevalence of this condition is unequally distributed across
the globe: while it affects 4.9% of male adults in Western Europe, North America,
Oceania and Pacific Asia, this number doubles in South Asia, as Figure 1.1 shows.
In fact, 9% of male adults and 8.8% of female adults suffer from hearing loss in
countries such as India, Afghanistan, Pakistan or Bangladesh, making this region
the most affected in the world [1].
Deafness also has a strong impact among infants and children. 32 million people
between 0 and 14 years old are estimated to be partially or totally deaf worldwide,
which represents 1.7% of the overall child population and 9% of the whole
affected population. Again, South Asia is the region where this problem strikes
hardest, with a prevalence for this age group of around 2.4%. As a matter
of fact, there is a correlation between the average Gross National Income (GNI) per
capita of a region and its prevalence of disabling hearing loss, both for children and
for adults (see Figure 1.2).
[Figure 1.1 (pie chart): share of the affected population by region. South Asia: 27%, East Asia: 22%, High-income: 11%, Asia Pacific: 10%, Central/East Europe and Central Asia: 9%, Sub-Saharan Africa: 9%, Latin America & Caribbean: 9%, Middle East and North Africa: 3%.]
Figure 1.1.: Global estimates on prevalence of hearing loss. Data source: [1]
[Figure 1.2, panel (a): prevalence of disabling hearing loss for children up to 14 years old vs. average GNI per capita (thousands of US dollars). High income: 0.5%, Central/East Europe and Central Asia: 1.6%, Sub-Saharan Africa: 1.9%, Middle East and North Africa: 0.92%, South Asia: 2.4%, Asia Pacific: 2%, Latin America and Caribbean: 1.6%, East Asia: 1.3%; fitted trend y = 0.0266x^-0.334.]
(a) Prevalence in children
[Panel (b): prevalence of disabling hearing loss for adults over 65 years old vs. average GNI per capita (thousands of US dollars). High income: 18%, Central/East Europe and Central Asia: 36%, Sub-Saharan Africa: 44%, Middle East and North Africa: 26%, South Asia: 48%, Asia Pacific: 43.5%, Latin America and Caribbean: 39%, East Asia: 34%; fitted trend y = 0.5212x^-0.208.]
(b) Prevalence in adults older than 65 years
Figure 1.2.: Prevalence of Disabling Hearing Loss vs. GNI per capita. Image source: [1]
While hearing loss is a severe handicap for people of all ages and conditions, newborns
and children suffer its consequences most critically. Hearing is, together with
vision, the most essential human sense, and it takes an active part in child
development regardless of region and culture. The most notable aspect of this
is language acquisition, which is typically accomplished through speech. In cases
where hearing impairment hinders or even prevents the acquisition of oral language
skills, early intervention is vital to mitigate the adverse effects that may follow,
including academic dropout or even social exclusion.
In fact, several studies point out a significant correlation between the age of
enrollment in such intervention programs and the degree of language development
gained through them. In particular, [2] concludes that children who are enrolled
by 11 months of age or earlier exhibit a degree of vocabulary and verbal skills at
the age of 5 that approximates that of children without hearing disabilities, no
matter the extent of their impairing condition, whereas children enrolled later score
lower on such metrics. This serves as evidence of the importance of early
identification and its impact on the child's developmental success.
In order to achieve correct early identification, a potentially deaf child must be
provided with the right diagnostic tools. As discussed above, hearing loss should
be detected at an early stage, where the condition can only be determined with a
medical approach through screening and diagnostic tests. In such a context, a
concise explanation is required of the different types of hearing loss tests and
which of them best suit the target patient.
Subjective and Objective Hearing Testing
Over the years, medical doctors have come up with different techniques to assess
whether a patient has hearing difficulties. These techniques are most usefully
classified according to the degree of involvement of the patient in the test itself,
leading to the categories of subjective and objective hearing testing.
Subjective tests require the patient to react to some kind of stimulus in an active
way. Pure-tone audiometry is the most common example of this category: the
subject is presented with a set of audio tones played to their ears and must
acknowledge having perceived each tone. Nevertheless, there are other subjective
tests with diagnostic significance, such as speech testing or reflex audiometry.
In the latter, some kind of reflexive behavior is sought in response to short, loud,
narrow-band sounds, avoiding the need for an agreed code to confirm a successful
stimulus perception.
On the other hand, objective tests are performed without cooperation from the
patient. Instead, they rely on physiological characteristics of the hearing system
to detect hearing loss. Both Otoacoustic Emissions (OAEs) and Auditory Brainstem
Responses (ABRs) are examples of this group. In OAEs, a probe fits into the ear
and emits certain sounds to stimulate the inner ear. This generates an acoustic
response that is recorded and analyzed by the probe to screen for hearing
impairment. As for ABRs, brain electrical activity is measured in the presence of
sound stimulation through electrodes attached to the subject's head.
While some subjective approaches, namely reflex audiometry, are suitable for newborn
and child screening, objective tests are usually preferred. Of those, ABRs
present the best performance, as they are capable of determining the hearing
threshold. OAEs only assess the state of the inner ear, and a failure to detect such
emissions may have causes other than hearing impairment, such as a noisy
environment. The drawback of ABRs is the setup of the test, which involves the
use of electrodes. In this regard, OAEs are much less invasive and faster, which
is the reason why they are typically used in newborn screening at hospitals.
OAEs' less costly setup also makes them an interesting choice for less developed
areas, as they require less equipment, although the overall cost of such clinical
devices may still be out of reach in several territories. Taking into account the
discussion at the beginning of the chapter and the numbers in Figure 1.2, it seems
sensible that if deafness is regarded as a global problem, then the stress should
be put on the most affected regions. These happen to be rather underdeveloped,
underprivileged areas as well, which adds more value to the different OAE techniques
as influential actors in the worldwide fight for the integration of hearing-impaired
people.
Hardware requirements for OAE screening devices are minimal, yet the characteristics
of their different components must be outstanding, which can drive the price
of commercial devices to around US$3,000 [3]. It would therefore be desirable
to find a way to bring this cost down in order to spread the use of this
screening technique beyond hospitals into health care centers, both in developed and
underdeveloped countries.
Objectives
For all the above reasons, the major objective of this work is to bring down the
cost of OAE screening as a meaningful step toward the worldwide spread of early
identification of disabling hearing loss and early intervention.
There are different approaches by which this can be achieved. The most classical one
would be to start from a commercial OAE device and cut down functionality and
performance in the hope that a cheaper, yet still clinically suitable, device
could come out of it.
The generalized use of smartphones, nevertheless, might provide a powerful tool that
takes this idea even further. Assuming the availability of a smartphone or tablet
(from now on, both will be referred to with the generic term host) in most health care
centers and hospitals, it is possible to take advantage of its existing hardware to
shorten the bill of materials of the desired device. This idea has been widely
exploited over the past years, with plenty of functionalities increasingly migrated
to such devices in the form of apps. It is important to remark here that,
according to the Ericsson Mobility Report of June 2017, the number of
smartphone subscriptions in 2016 was 3.9 billion, and it is expected to reach 6.8
billion in 2020, which would roughly represent 88% of the world's population at
that point [4].
However, the host on its own is not prepared to perform such a test. As will be
explained in the following sections, an ear probe with a certain number of loudspeakers
and microphones is required to perform OAE screening and diagnosis. This
demands the design of an external device (from now on, simply device) that helps
the host with its task of ear stimulation, response recording and emission detection.
Such a design leads to a set of interesting questions:
• What is the best technology to connect device and host? More specifically,
should both elements be wired together or connected through a wireless
link?
• OAE tests comprise different processing stages. Which of them should be
performed by the device, and which of them should be executed in the host?
• What are the hardware requirements of the device? If hardware requirements
are fixed, what are the capabilities of the device?
These three questions are interdependent. Hardware constraints such as Random
Access Memory (RAM), clock frequency or the sampling rate of analog-to-digital
conversion have an impact on software development, which in turn affects power
consumption. The same goes for host communication and the distribution of the
algorithm between host and device. By profiling the code through all possible
options, insight into the issue can be gained and the best options can be selected.
This will help to achieve maximum efficiency and optimal use of resources while
complying with all application requirements.
Assumptions
In order to elaborate a systematic and scientific discourse that helps to answer these
questions, some aspects of the study will be delimited and some assumptions will be
made:
• OAEs will be the basis for the tests. In particular, the focus will be set on a
subcategory of them called “Distortion Product Otoacoustic Emissions”. More
information about OAEs and their subtypes can be found in the next chapter.
• Among all the hardware components that can be found inside an OAE device,
the microcontroller will be the major subject of study. This includes all the
typical parameters associated with the choice of a microcontroller (architecture,
clock rate, memories), as well as the software structure that resides in it. Other
important elements, including the audio codec, the microphone or the loudspeakers,
will not undergo such research and will only be briefly addressed.
• It would be important to look into the clinical performance of the device
to make an overall assessment. Nevertheless, due to the lack of patients for
a meaningful clinical study and the scope limitation to the microcontroller,
this aspect cannot be successfully evaluated. Therefore, this report will deal
with those technical aspects strictly related to electrical and electronic
engineering, which is in any case the expected field of study.
Document Organization
The discussion in this document is structured into a series of blocks. The first of
them is an introduction that gives an overview of the problem of hearing impairment
and the benefits of OAE procedures to fight against it. It leads to the work of this
thesis to find the best settings for a smartphone-driven device to accomplish OAE
screening.
Before starting with the actual body of work, the background chapter provides basic
knowledge in the areas of embedded systems, signal processing and audiology to
understand the overall discussion within the document. The platform architecture
for tests’ hardware and software framework are also described prior to moving on to
the experiment section. Here all the profiling tests that have been carried out are
detailed, as well as the lessons that can be learned from them.
Finally, the information gained during the experiments is applied through some case
scenarios, which leads to conclusions that answer the initial questions. Possible
future work for this thesis is addressed as well.
2. Background
“El que lee mucho y anda mucho, ve
mucho y sabe mucho.”
“He who reads much and walks much,
sees much and knows much.”
Miguel de Cervantes, Don Quixote
The aim of this chapter is to provide familiarity with some ideas that are vital for
understanding the thesis. Although an electrical engineering background is a
prerequisite to understand the whole work, some notions from this field are explained
here for convenience. Medical concepts around OAEs are discussed in more depth,
as they fall outside the usual background of similar works.
Engineering Background
This whole thesis revolves around the development of a medical embedded device
from a technical standpoint. Thus, a certain degree of familiarity with the underlying
technology of such a device is desirable to fully understand all the explained concepts.
In the case under study, a Microcontroller Unit (MCU), or simply microcontroller, is
used to perform some of the processing. Accordingly, some explanation about signal
processing and about microcontrollers is provided.
Signal Processing
The MCU uses mathematical operations to extract the desired features out of the
recorded signals. As will be discussed later, the fundamental information to be
extracted is the frequency spectrum of the signal, which can be obtained through
the Fourier Transform.
The Fourier Transform is a linear transformation that decomposes a signal into its
complex frequency components. As this transformation is defined for continuous
signals, it is not directly applicable on a digital platform like an MCU. The Fourier
Transform's discrete version, the Discrete Fourier Transform (DFT), is used instead.
Actually, what is used in this and in most contexts is the DFT's most efficient
implementation, the Fast Fourier Transform (FFT). The particularity of this algorithm
is its speedup over the straightforward one: while the complexity of a naive
implementation of the DFT is quadratic (O(n²)), the FFT's complexity is quasilinear
(O(n log n)) [5].
The DFT takes N sampled values (these samples are generally complex-valued, although
for many applications the input data is real-valued) and returns N complex values.
These values display the frequency spectrum in the interval [0, fs], fs being the
sampling rate. Thus, the DFT represents a signal's spectrum at a certain frequency
resolution, that is, adjacent samples describe frequency components that differ by
some ∆f. This ∆f is related to N and fs as follows:

∆f = fs / N    (2.1)
There is yet another complementary method to extract frequency components, named
the Goertzel algorithm. In this variant, a single DFT term is calculated through a
digital filter. Consequently, if M terms from a signal of length N must be calculated,
the process is repeated for M different filters, leading to a complexity of O(MN).
Although this is asymptotically less efficient than the FFT, for a small number of
terms M it is indeed faster [5].
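As a concrete illustration, the sketch below (written in Python for readability; an MCU implementation would use C with fixed- or floating-point types) compares a naively computed DFT bin with the same bin obtained through the Goertzel recurrence. The tone frequency, sampling rate and buffer length are arbitrary example values, not settings from the actual device.

```python
import cmath
import math

def dft_bin(x, k):
    """Naive DFT: the k-th frequency bin of x, computed directly (O(N) per bin)."""
    N = len(x)
    return sum(x[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N))

def goertzel_bin(x, k):
    """The same k-th bin via the Goertzel second-order recurrence."""
    N = len(x)
    w = 2.0 * math.pi * k / N
    coeff = 2.0 * math.cos(w)
    s1 = s2 = 0.0
    for sample in x:                  # N real multiply-accumulate iterations
        s = sample + coeff * s1 - s2
        s2, s1 = s1, s
    s = coeff * s1 - s2               # one final zero-input iteration
    return s - cmath.exp(-1j * w) * s1

# Example: a 1 kHz tone sampled at fs = 8 kHz with N = 64 samples.
# The frequency resolution is Δf = fs/N = 125 Hz, so the tone falls
# exactly on bin k = f0/Δf = 8.
fs, N, f0 = 8000.0, 64, 1000.0
x = [math.sin(2.0 * math.pi * f0 * n / fs) for n in range(N)]
k = round(f0 * N / fs)
print(abs(dft_bin(x, k)), abs(goertzel_bin(x, k)))    # both ≈ N/2 = 32.0
```

For a handful of bins, as in DPOAE detection where only a few fixed frequencies matter, the O(MN) cost of Goertzel undercuts a full O(N log N) FFT.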
In many cases, the interest in these complex values lies only in their magnitude. The
absolute value is calculated from them and is often transformed into a logarithmic
scale, whose unit is called the Decibel (dB).
For audio signals, this logarithmic scale is usually referenced to the value
p0 = 20 µPa, which is considered the threshold of human hearing. The resulting
quantity receives the name Sound Pressure Level (SPL). Humans can perceive
sounds between 0 and 120 dB [6].
In the digital domain, numbers have a finite range within which overflow is avoided. In
such a context, a maximum-amplitude signal exists, which is normally used as a
reference value. dB units become in this case Decibels relative to Full Scale (dBFS),
whose maximum value is 0, as a signal's amplitude cannot be greater than this reference.
A last interesting signal theory concept is the Signal-to-noise Ratio (SNR). It is again
a logarithmic value, but expressed as the difference between a dB value representing
the signal level and another dB value representing the noise level. If this value is
converted back into linear units, it becomes the quotient between both quantities.
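These level conversions can be sketched in a few lines; the 16-bit full-scale value and the example amplitudes below are illustrative, not taken from the actual device.

```python
import math

FULL_SCALE = 32768.0          # full-scale amplitude of a signed 16-bit sample

def dbfs(amplitude):
    """Level in Decibels relative to Full Scale; 0 dBFS at full scale."""
    return 20.0 * math.log10(amplitude / FULL_SCALE)

def snr_db(signal_level_db, noise_level_db):
    """SNR as the difference of two levels already expressed in dB."""
    return signal_level_db - noise_level_db

signal_db = dbfs(16384.0)     # half of full scale: about -6.02 dBFS
noise_db = dbfs(164.0)        # a much weaker noise floor
snr = snr_db(signal_db, noise_db)
# Converted back to linear units, the SNR becomes a plain amplitude quotient:
ratio = 10.0 ** (snr / 20.0)  # ≈ 16384/164 ≈ 100
print(round(signal_db, 2), round(snr, 1))
```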
Important Aspects on Microprocessors
Architectures
Each MCU has an underlying architecture that defines its capabilities and limitations.
Here a remark must be made about the difference between the words
microcontroller and microprocessor. In a general sense, a microcontroller is a
microprocessor with several input and output interfaces to communicate with peripherals,
which can even be integrated into it. Thus, in the context of embedded systems the
former term is often preferred over microprocessor, as this is the common scenario. In the
scope of this work both terms will be used interchangeably.
In terms of parallelism, different architectures can be identified. Simpler processors
with only one core fall into the category Single Instruction Single Data (SISD),
while multiprocessors are normally built on a Multiple Instruction Multiple Data
(MIMD) architecture. In the first variety, a single core fulfills a task by operating
with a unique set of instructions on the same data. When more than one core is
available, the task is decomposed into multiple sets of instructions that are used
by the different cores to process data in a distributed way [7].
There are, however, specialized microprocessors with specific additional features.
Some of them are known as Digital Signal Processors (DSPs), and they are commonly
found in multimedia applications where heavy signal processing is required. DSPs
have an instruction set with some special operations such as filtering or multiply-
accumulate, and they often exhibit a Single Instruction Multiple Data (SIMD)
architecture. This enables them to use special instructions that perform the same
operation (e.g. an addition) on large vectors of data [7][8].
Another crucial facet of microcontrollers is fixed-point vs. floating-point arithmetic.
While fixed-point treats real data types roughly as integers, floating-point defines a
significand and an exponent, which sacrifices some precision in exchange for a much
wider dynamic range.
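The difference can be illustrated with a toy Q15 fixed-point format (15 fractional bits, a common choice on 16-bit DSP data paths); this is a generic sketch, not the representation used by any particular MCU in this work.

```python
Q = 15                            # Q15: reals in [-1, 1) scaled by 2**15

def to_q15(x):
    """Quantize a real number in [-1, 1) to a Q15 integer."""
    return int(round(x * (1 << Q)))

def q15_mul(a, b):
    """Product of two Q15 values is Q30; shift back to Q15. Precision is lost
    in the discarded low bits, and results outside [-1, 1) cannot be held."""
    return (a * b) >> Q

a, b = to_q15(0.5), to_q15(0.25)
product = q15_mul(a, b) / (1 << Q)
print(product)                    # 0.125, exactly representable in Q15
```

A float, by contrast, keeps a roughly constant number of significant digits over an enormous range of magnitudes, at the cost of non-uniform absolute precision.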
Power and Energy Consumption
One of the most important aspects of the MCU's performance is energy consumption.
Especially in a wireless scenario, where the OAE device must be operated on
batteries, a power analysis is vital to determine its autonomy and its final cost.
As can be found in the classical literature on the topic, dynamic power scales with
voltage and frequency [7]:

Power_dynamic ∝ (1/2) · V² · f    (2.2)

Frequency, however, is taken out of the equation for energy:
Energy_dynamic ∝ V²    (2.3)
According to these formulas, reducing the clock frequency will reduce power for a
certain task, but it will not have an impact on the overall energy consumption.
Nevertheless, the minimum required voltage does scale with frequency. This
means that an optimal pair of frequency and voltage values can be set, such that a
digital system fulfills its functional criteria while consuming as little as possible.
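Equations 2.2 and 2.3 can be made concrete with a small numeric sketch; the voltage and frequency operating points below are hypothetical, chosen only to show that lowering f alone leaves the task energy unchanged, while lowering V along with it does not.

```python
def dynamic_power(v, f):
    """Relative dynamic power, Power ∝ (1/2)·V²·f (Eq. 2.2); arbitrary units."""
    return 0.5 * v * v * f

def task_energy(v, f, cycles):
    """Energy for a fixed workload: time = cycles/f, so f cancels out and
    Energy ∝ V² (Eq. 2.3)."""
    return dynamic_power(v, f) * (cycles / f)

CYCLES = 1_000_000                            # hypothetical task length in cycles
e_48mhz_3v = task_energy(3.0, 48e6, CYCLES)
e_24mhz_3v = task_energy(3.0, 24e6, CYCLES)   # half the power, twice the time
e_24mhz_2v = task_energy(2.0, 24e6, CYCLES)   # lower voltage is what saves energy
print(e_48mhz_3v == e_24mhz_3v, e_24mhz_2v < e_48mhz_3v)   # True True
```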
Another interesting issue within energy consumption is what some sources call the
communication-computation trade-off [9]. Mostly used for wireless sensor networks,
this term refers to a trade-off between processing data inside a wireless node and
transmitting that data. As processing reduces the data size, which in turn implies
lower bandwidth requirements for wireless communication, the goal in the cited
sources is to find an optimal spot where total energy consumption is minimized.
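A minimal model of this trade-off can be written down directly; the per-unit energy costs below are purely illustrative placeholders, not measured values from the platform.

```python
# Illustrative per-unit energy costs; real values must come from profiling.
E_CPU_PER_SAMPLE = 0.2e-6     # J to process one sample on the device
E_TX_PER_BYTE = 4.0e-6        # J to transmit one byte over the radio

def total_energy(samples_processed, bytes_transmitted):
    """Device-side energy: local computation plus radio transmission."""
    return (samples_processed * E_CPU_PER_SAMPLE
            + bytes_transmitted * E_TX_PER_BYTE)

# Option A: stream 4096 raw 16-bit samples (8192 bytes), no local processing.
raw = total_energy(0, 8192)
# Option B: reduce the block on the device to a few spectral magnitudes.
reduced = total_energy(4096, 16)
print(reduced < raw)          # True under these illustrative costs
```

Which side of the trade-off wins depends entirely on the ratio of the two costs, which is precisely what the profiling experiments in this work set out to measure.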
Software Paradigms
In multimedia and signal processing applications, embedded software usually has to
cope with some challenging requirements like latency or throughput. Such issues are
often handled with a real-time programming approach.
For instance, when a continuous stream of sampled data must be processed, the processing time per sample must be shorter than the sampling period so that the whole process can run in real time. This can be accomplished by processing sample by sample, but it is sometimes more efficient, or even necessary, to gather a set of samples into a buffer and process them together. In that case the buffer processing time must be smaller than the time needed to acquire one complete buffer.
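A minimal feasibility check of this constraint, with hypothetical rates and cycle counts:

```python
# Real-time feasibility: buffer processing time must be shorter than
# buffer acquisition time. All numbers are hypothetical.
fs = 48_000              # sampling rate [Hz]
buffer_len = 1024        # samples per buffer
cycles_per_sample = 300  # processing cost per sample
cpu_freq = 48e6          # CPU clock [Hz]

t_acquire = buffer_len / fs                            # ~21.3 ms
t_process = buffer_len * cycles_per_sample / cpu_freq  # 6.4 ms

assert t_process < t_acquire  # the real-time deadline is met
```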
If samples are continuously being acquired or generated, double buffering is normally implemented. Two buffers are used as an interface between processing and input/output: at any instant, one of them is being processed while the other interfaces with the outside world. When all samples of the I/O buffer have been transferred, the two buffers swap roles. In this way, the two tasks (processing and transmission) never interfere with each other [8].
In order to accelerate sample transfers, Direct Memory Access (DMA) may be used. DMA moves data between memory and peripheral registers without intervention of the Central Processing Unit (CPU), which greatly simplifies software [8]. As a result, if the processing rate is higher than the sampling rate (which is mandatory for real-time applications), the CPU may remain idle for some time before the next buffer is ready to be processed, while the DMA keeps moving samples. MCUs are
normally equipped with low energy modes that deactivate modules and clocks that
are not needed in order to save energy.
In this specific case, the main clock and the CPU can be deactivated once processing
is finished while the DMA module keeps performing the background task along with
other peripherals. DMA must inform the CPU when a new buffer is ready to be
processed, which is typically accomplished through the use of interrupts.
If this scheme is followed, CPU load can be defined as the proportion of time when
the CPU is active.
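Under this scheme, CPU load can be sketched as follows; the buffer and processing times are hypothetical and the DMA is only modeled implicitly:

```python
# Double buffering with CPU load as the fraction of time the CPU is
# active. Times are hypothetical; the DMA implicitly fills the other
# buffer during each period while the CPU processes, then sleeps.
t_buffer = 1024 / 48_000  # time the DMA needs to fill one buffer [s]
t_process = 0.0064        # time the CPU needs to process one buffer [s]

def cpu_load(n_buffers):
    """Simulate n_buffers ping-pong periods and return the CPU load."""
    active = 0.0
    for _ in range(n_buffers):
        active += t_process  # CPU processes, then sleeps until the swap
    return active / (n_buffers * t_buffer)

assert abs(cpu_load(100) - 0.3) < 1e-9  # CPU idle 70% of the time
```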
OAE Basics
OAEs are a trusted technique in audiology and hearing screening. From the historical
perspective, their discovery can be traced back to Dr. David Kemp’s contribution.
The British physicist was the first person to measure these emissions in 1978, and a few years later the first applications as a diagnostic tool emerged. Over the decades, OAEs have gained popularity in the clinical world as an infant hearing screening method, and a whole market has arisen with a vast range of devices with different
capabilities and functionalities [6].
From the anatomical and physiological point of view, the principle of OAEs lies in the cochlea. This is a coiled tubular cavity inside the human inner ear where sound transduction occurs. That is, this is the organ responsible for translating audible
acoustic waves into electrochemical signals that the brain can process. The actual
process in which this is accomplished is complex and is totally out of scope for this
work, but for the sake of comprehension hair cells and the basilar membrane must
be discussed.
The basilar membrane is a resonant structure within the cochlea, whose physical
characteristics vary along its length. As a result, the resonance frequency at any point on its surface depends on the longitudinal location. Hair cells are distributed on this
membrane and they are stimulated by vibration, which only occurs when a wave
with a certain frequency activates the particular membrane region where a group
of hair cells stand. In a way, the basilar membrane, together with the hair cells,
maps acoustic frequency into spatial location, which is the basis for the perception
of sound in mammals [6].
Having said that, hair cells can be grouped into Inner Hair Cells and Outer Hair Cells
(OHC). Inner Hair Cells are the actual acoustic-electrochemical transducers, while
OHC participate in a so-called “active mechanism” within the hearing process. When
an audible acoustic wave travels through the ear into the cochlea and reaches them,
OHC vibrate to generate a kind of “mechanical amplification”. Such amplification
is the source of OAEs, which in such context are described as a by-product of the
mechanical amplification. In other words, OHC inside the cochlea create acoustic
waves that serve as a feedback in the hearing process, and these can also be regarded
as acoustic responses that can be recorded and measured with a microphone.
One interesting aspect of the so-called cochlear amplification is its non-linearity.
Apart from the positive effect this has on human hearing’s dynamic range, it is a
feature that plays an important role in most types of Evoked OAEs.
Classification of OAEs
As already introduced, different types of OAEs exist. Firstly, a distinction can be drawn between Spontaneous and Evoked OAEs. The former refers to those recorded without the presence of any artificial stimulus. They are rarely used in clinical applications, as ear stimulation leads to greater amplitude levels, which ultimately makes detection easier [6].
Evoked OAEs are, consequently, the group most used in the medical world. These can also be divided into different subcategories according to the stimuli applied in each case, but in practice two of them stand out: Transient Evoked
Otoacoustic Emissions (TEOAEs) and Distortion Product Otoacoustic Emissions
(DPOAEs).
In TEOAEs, the ear is fed with transient clicks of very short duration. Because of their short temporal span, these clicks have a broadband frequency spectrum, which leads to the stimulation of the whole basilar membrane. This results in the
generation of OAEs for all the spectrum, which then can be recorded by the OAE
probe, as Figure 2.1 shows.
Figure 2.1.: TEOAE recording. The lighter gray plot in the lower left corner shows the evoked spectrum. Image source: [6]
The amplitude level of the emissions in TEOAE tests lies tens of decibels below that of the stimuli. Furthermore, stimuli and emissions overlap both in the frequency spectrum and in time, which may cause the stimuli to mask the emissions. To avoid this, certain protocols are employed in which the stimulus amplitude and polarity alternate so that averaging cancels out the stimulus contribution (at least theoretically). The OAEs are not affected by this cancellation, as their originating process is non-linear and so is the relationship between stimulus and emission.
To clarify this cancellation protocol, the following scheme can be considered: A
sequence of clicks is fed to the inner ear. Each period of this sequence comprises
four clicks. The three first clicks have a normalized amplitude of 1 and a positive
polarization, while the last click in the sequence presents negative polarization and
3 times the amplitude of the remaining clicks. When averaged, this sequence of four clicks predictably yields a null value. The same does not hold for the OAEs it produces, though. As the cochlea operates in its non-linear region during TEOAE tests, the emissions caused by each click exhibit similar amplitudes and thus survive the averaging.
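This cancellation can be verified numerically. In the sketch below, a tanh saturation stands in for the cochlear non-linearity; it is an arbitrary toy model, not the actual OHC response:

```python
# The four-click sequence: three clicks of amplitude +1 and one of
# amplitude -3 average to zero, while a toy non-linear response does not.
# tanh is an arbitrary stand-in for the OHC compression.
import math

clicks = [1.0, 1.0, 1.0, -3.0]
assert sum(clicks) == 0.0  # the linear stimulus cancels in the average

responses = [math.tanh(a) for a in clicks]  # toy cochlear non-linearity
assert abs(sum(responses)) > 0.5  # the emissions survive the averaging
```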
DPOAEs represent a contrast to TEOAEs. In this technique, two pure tones at
frequencies f1 and f2 are used as stimuli. Again thanks to cochlear amplification’s
non-linearity, OHC produce OAEs at frequencies that are integer linear combinations of the two fundamental frequencies (i.e. 2f1 − f2, 3f1 − 2f2, 2f2 − f1, 3f2 − 2f1, f2 − f1, etc.). Figure 2.2 presents a real DPOAE recording, in which both fundamental tones and DPOAEs are visible.
Figure 2.2.: DPOAE recording (DPOAE spectrum recorded from a healthy human ear; level in dB SPL vs. frequency in Hz). Lighter spikes correspond to stimulus frequencies, while dark ones represent the different distortion products. Image source: [6]
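As a quick sanity check, the distortion-product frequencies can be computed for the stimulus pair used later in Figure 3.2 (f1 = 2000 Hz, f2 = 2440 Hz):

```python
# Distortion products for f1 = 2000 Hz, f2 = 2440 Hz (the stimulus
# pair used in the measurement shown in Figure 3.2).
f1, f2 = 2000, 2440

dp = {
    "2f1-f2": 2 * f1 - f2,
    "3f1-2f2": 3 * f1 - 2 * f2,
    "2f2-f1": 2 * f2 - f1,
    "3f2-2f1": 3 * f2 - 2 * f1,
    "f2-f1": f2 - f1,
}
assert dp["2f1-f2"] == 1560  # the component evaluated by the device
assert dp["f2-f1"] == 440
```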
This approach has two implementation advantages compared to TEOAE. On the
one hand, now both stimuli and emissions are narrow-band signals which occupy
different regions of the frequency spectrum. This means that stimuli must no longer
be averaged out, as they can be told apart in frequency domain. On the other
hand, narrow-band stimuli are also easier to calibrate. As it will be explained in this
chapter, calibration is an important aspect of OAE testing.
As its main drawback, DPOAE does not test the whole frequency range of interest at once, whereas TEOAE does. This means that, in order to conduct
a thorough test of the basilar membrane, multiple DPOAE tests with different f1
and f2 must be performed.
Another particularity of this modality is that it requires two loudspeakers, each playing one of the two stimulus tones. The reason is that loudspeakers always exhibit a certain degree of non-linearity as well.
Consequently, if both pure tones were digitally mixed and output through the same
loudspeaker, the loudspeaker itself would produce intermodulation tones at DPOAE
frequencies, which would mask real emissions.
In any case, the implementation simplicity of DPOAE is the reason why it was
chosen as the OAE screening modality to begin with and why this thesis deals
almost entirely with this specific method.
Implementation of OAE Procedures
Now that some insight into OAE screening has been provided, the actual implementation of OAE procedures can be discussed. This normally consists of a preliminary calibration phase followed by the actual test phase.
Calibration
In order to make tests clinically meaningful, the system must undergo a calibration process. This takes care of two fundamental aspects:
1. That recorded audio can be correctly linked to a physical magnitude.
2. That stimulus parameters (more specifically the Sound Pressure Level, SPL) can be precisely determined.
The first one is achieved through microphone calibration, while for the second one
in-ear calibration is needed.
Microphone calibration is the procedure to extract the relationship between the
numerical values obtained through the microphone after digitalization and the actual
magnitude they represent. This relationship is normally frequency-dependent and independent of the particularities of a test. The main factors that influence it are the analog circuitry, including the microphone itself and the signal conditioning stages, and some codec parameters such as gain or sample length. In Figure 2.3, this process characterizes the signal path between points “B” and “C”.
Figure 2.3.: OAE system block diagram (ear, microphone and speaker(s) connected through the audio codec to the microcontroller, with reference points “A”, “B” and “C” along the signal path)
Thanks to its independence from individual tests, microphone calibration is only required once per OAE device. The actual methodology may vary and is not of particular interest for this study, although a common element is the use of a sound level meter. The important remark here is that the calibration outcome
is a table of fixed values that represent the response of microphone and codec at
a certain frequency. Even in cases where the codec parameters of the device may
change, this normally has a deterministic impact on such values, which can be simply
recalculated and/or interpolated accordingly.
The situation for in-ear calibration looks quite different. This one is performed after
the probe has been inserted into the ear and its goal is to guarantee a certain SPL
at the eardrum. Probe insertion has a vital influence on the relation between output
SPL of the loudspeaker and SPL at the eardrum, and it can be generally asserted
that this relation is different each time the probe is introduced into the ear. It is also a frequency-dependent relation, so the process yields tables representing frequency spectra. Stimulus signals are later modified according to these tables, which implies more processing for broadband signals than for pure tones. Thus, the path from “A” to “B” in Figure 2.3 becomes calibrated.
During in-ear calibration, it is assumed that the SPL recorded by the microphone
equals the SPL at the eardrum. Although this is not strictly true, it is a fair approximation for clinical purposes; more refined procedures are described in the literature [6].
OAE detection
Once the environment is calibrated, OAE testing is ready to start. Although there are differences depending on the chosen OAE modality, the following general scheme is valid for all of them:
1. Play the stimulus through the loudspeaker(s) (in the case of Evoked OAEs).
2. Record response into a buffer while stimulus is playing.
3. If a transformation were applied to this single buffer, the noise level of this
single recording (also called noise floor) would be typically too high and it
would mask actual emissions. In order to solve this, several buffers are recorded
and averaged into a single buffer, so that the noise floor goes down while
the emissions persist. This averaging can be performed either in time or in
frequency domain.
4. Apply transformation (typically FFT) to the averaged buffer. If the buffer is
already frequency-averaged this step is unnecessary.
5. If performing a diagnosis test, present the frequency spectrum of the response.
In the case of a screening test with Pass/Fail result, analyze frequency coef-
ficients to obtain a useful metric. Such metrics may also be calculated for
diagnosis tests as a clinical help. A typical example of such metrics is SNR
calculation of OAEs. To obtain the SNR, the noise floor SPL is subtracted
from the SPL of the emission in dB. This implies calculating the noise level, which involves defining a frequency region to be considered noise and averaging over it.
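The SNR metric of step 5 can be sketched as follows; the SPL values are hypothetical, and averaging in the power domain is one reasonable choice for defining the noise floor:

```python
# SNR metric sketch: noise floor as the power-domain average of noise
# bins, SNR as emission SPL minus noise floor (all in dB; the values
# below are hypothetical).
import math

def spl_average(spl_bins_db):
    """Average SPL bins in the power domain, then return dB."""
    powers = [10 ** (s / 10) for s in spl_bins_db]
    return 10 * math.log10(sum(powers) / len(powers))

emission_spl = 2.0                         # dB SPL at 2f1 - f2
noise_bins = [-15.0, -13.0, -16.0, -14.0]  # dB SPL in the noise region
snr = emission_spl - spl_average(noise_bins)
assert snr > 6  # e.g. a Pass criterion could require SNR above 6 dB
```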
Artifact Rejection
In OAE testing it is assumed that the noise floor will decrease through buffer averaging, but it may actually increase. This happens when the incoming recorded buffer is contaminated with an unusually high noise level, which may be caused by ambient noise (e.g. opening a door) or by physiological noise, either voluntary (e.g. swallowing) or involuntary (e.g. pulse).
In order to detect these bad samples (also called artifacts) and minimize their neg-
ative impact, some rules may be applied. In [6], the following Artifact Rejection
(AR) techniques are described:
I Noise filtering: The recorded frequency range may be larger than needed.
Consequently, by filtering the signal, high or low frequency noise may be elim-
inated. This is somewhat taken for granted in digital signal processing, as
it is a compulsory step in digitalization. Furthermore, this is also implicitly
implemented within FFT.
I Large magnitude AR: The signals to be detected are small. With this premise, an incoming buffer may be flagged as an artifact if its SPL at a frequency of interest is unusually large.
I Repeatability AR: In this approach, the incoming buffer is split into two halves and the mean of each half is computed and compared. If the difference between the means is too high, it can be concluded that the data contains some kind of local low-frequency noise, which matches the nature of the noise OAE tests deal with.
Once artifacts have been detected, what to do with them is algorithm-dependent. The simplest approach is to discard any buffer that exceeds a predefined threshold. In some cases it may also be beneficial to include such buffers in the average after weighting them according to their score in the AR method.
Stop rules
The duration of the test is yet to be defined. In order not to prolong it more than
necessary, some stop rules may be applied. The following ones are typically used:
1. Stop the test when a given number of recorded buffers has been reached. This number may count either all recorded buffers or only valid buffers (that is, those surviving AR). In the first case, this is equivalent to a fixed test time.
2. Stop the test when the noise floor has dropped to a certain level.
3. Stop the test when the OAE SNR is above a certain threshold. This naturally means that in cases where this SNR cannot be achieved, one of the other criteria above must be applied.
Platform Architecture
“Caminante, no hay camino, se hace
camino al andar.”
“Wanderer, there is no path, the path
is made by walking.”
Antonio Machado, Campos de Castilla
This chapter consists of a description of the framework that has been implemented
to profile OAE algorithms. The underlying hardware has an undeniable impact on
them, so an attempt to gain familiarity with it follows in the next section. Software
itself is also described, not only in its algorithmic form related to the ultimate
application of medical diagnosis, but also in low-level detail, both for device and
host. As the experiments will show, these low-level details may have a considerable influence on the results.
Hardware
In order to have a valid framework to implement OAE algorithms, different hardware components were selected and put together: specifically a microcontroller, an audio codec and an OAE probe.
Microcontroller
It was decided to work with an MCU belonging to the ARM Cortex-M family. ARM is
one of the most popular processor architectures worldwide, and its Cortex-M family,
entirely composed of 32-bit Reduced Instruction Set Computer (RISC) machines, is
present in a multitude of embedded systems [10].
The chosen microcontroller was Silicon Labs’ EFM32TM Wonder Gecko, which has
a Cortex-M4F single core. This core is one of the most powerful ones in the family,
and its major differences with Cortex-M3 are the inclusion of DSP instructions and
the presence of a hardware single-precision Floating Point Unit (FPU).
This processor is featured on the EFM32TM Wonder Gecko STK-3800 starter kit, also manufactured by Silicon Labs. The specific MCU model on this board is the EFM32WG990F256, which has 256 KB of Flash and 32 KB of RAM.
The Wonder Gecko is also equipped with a wide variety of interfaces and peripheral
units that can be accessed through the board and its pin headers. The most relevant ones within this thesis' scope are the DMA, the Universal Synchronous/Asynchronous
Receiver/Transmitter (USART) with support of different communication protocols
(specifically, it will be used for the Inter-IC Sound (I2S) communication), the Inter-
Integrated Circuit (I2C) bus and the Universal Serial Bus (USB). The last one can
make use of the assembled Micro-USB connector.
The board also provides different options to power the MCU, selectable by an electrical switch. There are namely three options: battery, Micro-USB or Mini-USB (this last connector is used for debugging purposes). This fact will gain importance when discussing power measurements.
Audio Codec
In order to transform digital data into acoustic data and vice versa, an audio codec
was used. The choice was the low power stereo audio codec SGTL5000 from NXP
Semiconductors. The relevant features of this component are the following:
I Stereo audio input and output.
I Integrated headphone amplifier.
I Integrated microphone amplifier.
I I2S data interface.
I I2C control interface.
I Integrated programmable Phase-Locked Loop (PLL) to manage sampling fre-
quency.
The codec also offers a wide range of audio processing capabilities, which are not of interest for this application.
A board shield with an assembled SGTL5000 was used to access the codec. This
shield was designed by PJRC for its Teensy microcontroller development system,
and it was pinned to a perfboard to wire it properly and to make it easily pluggable
to the Wonder Gecko starter board and to the OAE probe.
Figure 3.1.: Picture of the whole hardware platform (OAE probe, perfboard, starter kit with pin header, microcontroller, audio codec, Micro-USB and Mini-USB connectors)
OAE Probe
The last hardware element of the system is an OAE probe provided by the company
Path Medical. This piece is composed by two headphone speakers and an electret
microphone, having each one of these components an isolated duct inside a tube
that is inserted into the ear. While tests are running this tube is coated with a foam
or silicone pluggable seal to isolate the inner ear from external noise.
The probe is connected to the perfboard through a 14-pin connector, although only
7 pins are used: the positive and negative terminals of the first loudspeaker (2), the positive and negative terminals of the second loudspeaker (2), and the bias voltage, ground and output terminal of the microphone (3).
Thanks to the presence of two loudspeakers, this probe is suitable for DPOAE.
Software
Software is in charge of detecting OAEs using the hardware platform just described. In a distributed paradigm like this, where the algorithm is divided between device and host, both sides have to be considered.
Device
As a bare-metal microcontroller-based system, the implemented OAE device is programmed in C. The tasks to be fulfilled are the following:
I Set the audio codec with the right parameters and transfer audio data to and
from it.
I Set the host-communication interface. In this study this is accomplished
through USB, although this should be replaced with a wireless interface to
be determined by the results of this work.
I Wait for a command from host and serve it accordingly.
I If the command requests an OAE test, execute it according to the partitioning scheme introduced in section 3.2.3.
In this section the first two low-level features will be explained.
Codec operation
The chosen audio codec requires an external clock signal to feed its PLL. The MCU uses its Timer 0 to generate a 12 MHz clock for this purpose, sourcing it from the high-frequency peripheral clock.
Once a clock is provided, the different codec parameters can be set through I2C
commands. In this communication, the MCU always acts as a master and the codec
as a slave. The most important parameters are sampling rate, sample length and all
the different volumes and gains for the microphone, the headphones, the Analog-to-
digital Converter (ADC) and the Digital-to-analog Converter (DAC).
Among all these gains, the only one that varies at run time is the DAC's. The reason is that stimuli have a variable dBFS level depending on in-ear calibration. In order to generate stimulus signals with the proper level,
they must be attenuated, which can be done either by software (multiplying by
an attenuation factor) or by the codec (setting the proper DAC attenuation). The
scheme that brings the best results is a hybrid between both: calculated attenuation
is first approximated through DAC settings, which in this case has a resolution of
0.5 dB. The remaining level difference is then achieved by a multiplying factor in
the code.
The remaining interaction with the codec to be explained is data transmission. As mentioned in 3.1.2, the I2S protocol is used for this. Here the roles are the opposite of those in I2C: the MCU is the slave and the codec acts as master. In stereo mode, two
words are sent bidirectionally at the sampling rate, and each one takes 16 bit cycles
for 16-bit length or 32 for 24-bit length. The same slots are preserved in mono mode,
so the bit rate is always:
bit rate = 2× b× fs [bps] (3.4)
where b represents bit cycles per word and fs the sampling rate.
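For example, Equation 3.4 gives the following rates at a typical audio sampling frequency (48 kHz is an assumed example value here):

```python
# I2S bit rate per Equation 3.4: two word slots per sampling period,
# each of b bit cycles, regardless of mono or stereo operation.
def i2s_bit_rate(b, fs):
    return 2 * b * fs  # [bps]

assert i2s_bit_rate(b=16, fs=48_000) == 1_536_000  # 16-bit samples
assert i2s_bit_rate(b=32, fs=48_000) == 3_072_000  # 24-bit samples in 32-bit slots
```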
I2S data transmission is handled by the DMA, which also has to be programmed. Four channels have to be set up, two for each data direction. However, the right reception channel is always muted, because there is only one incoming source into the MCU (that is, there is only one microphone). Each channel is managed in a ping-pong
way with a corresponding callback function when one of the two buffers has been
fully transferred. Because the maximum DMA transfer size is smaller than the ap-
plication’s requirements, ping-pong operation has to be implemented at two different
levels:
I Callback level: DMA is programmed with a lower transfer size than the
buffer size. When the specified data amount has been transferred, only a por-
tion of a complete buffer has been transmitted. The callback takes care of
updating ping and pong pointers, so that the transmission continues seam-
lessly.
I Buffer level: Callback operation will eventually reach the end of a ping-
pong signal buffer. At this moment, DMA ping-pong pointers will just start
addressing the other signal buffer.
The DMA sets a flag variable when it has completed a full buffer. At this point the CPU will typically be idle and will wake up from a low energy mode. Because of
the ongoing I2S operation, the only possible low energy mode in the Wonder Gecko
is Sleep Mode (Energy Mode 1), which is the lowest power consuming level that
allows synchronous peripheral communication [11].
Two-level ping-pong operation causes extra wake-ups into Run Mode (Energy Mode
0) while waiting for buffer completion, which is a sub-optimal yet unavoidable
method. Not all DMA transfers exhibit such behavior, though. Transfers can be
classified into these three different categories:
1. Muted buffers: This is the case of the right recording channel already described, or of the inactive playback channel during mono operation. Here, both source and destination memory addresses are static. One of them corresponds to the USART Tx/Rx register, while the other points to a null value. Buffer size is set
to the maximum capable value to minimize wake-ups, and when this chan-
nel generates an interrupt the callback function only activates the mechanism
again. Despite being a dummy DMA operation, it is compulsory in order to
preserve frame synchronization in I2S.
2. Static buffers: These correspond to stimulus buffers, which always contain the same periodically played data. Because of this, ping-pong only operates at callback level here. As for the other DMA settings, the destination address is always USART's Tx register, while the source address is provided by callback updates and incremented accordingly during DMA operation.
3. Dynamic buffers: Recorded signals account for this group, where ping-pong operation at both levels becomes essential. Here DMA's source address is USART's Rx register and the destination address is managed by callbacks and internally incremented.
Apart from target addresses, callbacks and memory increments, an important DMA setting is the data size, which determines the number of bytes a single DMA transfer consists of. This seemingly fine detail raises a significant concern on the Wonder Gecko.
The root of this problem is endianness: I2S protocol is big-endian and Wonder
Gecko’s memory organization is little-endian. Byte swap can be activated within
the USART to circumvent the problem, but USART’s input and output registers
can only hold up to a halfword (2 bytes) at a time. This solves the problem for 16-bit samples but not for 24-bit ones, where a sample must be split into two consecutive accesses to the 2-byte USART buffer. This compels the program to reverse bytes manually in 24-bit mode, adding a processing overhead.
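The required reversal can be sketched as follows; the halfword layout is a simplified model of the situation just described, not a register-accurate reproduction of the USART's behavior:

```python
# Simplified model of the 24-bit problem: each 2-byte USART access is
# byte-swapped in hardware, but a 24-bit (32-bit slot) sample spans two
# accesses, so software must still swap the two halfwords. The layout
# below is an illustration only.
def fix_24bit(halfwords):
    """Reassemble samples by swapping each pair of received halfwords."""
    return [(lo << 16) | hi
            for hi, lo in zip(halfwords[0::2], halfwords[1::2])]

# A 32-bit slot carrying the left-aligned sample 0x123456 arrives as
# two halfwords in the wrong order:
assert fix_24bit([0x5600, 0x1234]) == [0x12345600]
```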
USB interface
In order to implement the USB communication with the device, a project example
provided by Silicon Labs was modified to adapt it to the needs of the application.
Thus, USB descriptors are configured to implement a USB Communication Device
Class (CDC), which is one of the simplest USB classes to transmit data.
USB data transfers are accomplished through small, non-blocking, interrupt-driven transfers, but two blocking functions were implemented to hide the complexity of the interrupt accesses and to allow bigger transfer sizes in transmission and reception, respectively. Inside these functions, the non-blocking counterparts are called iteratively and the device waits in Sleep Mode until each transfer completes.
Host
For this study, a desktop computer was used as a host for the sake of simplicity and
to ease profiling procedures. Thus, different Python applications and scripts may be
used to access the device.
Firstly, a Python application developed by TUM’s Chair of Real-Time Computer
Systems was used to test functionality. In order to set the application up to interact
with the device, a specific-interface had to be added to it and a simple protocol
was built among the two systems. The particularity (and also the limitation) of this
application is that it takes charge of all the computation. The only tasks left to the device are storing stimulus buffers, playback, recording and data transmission to the host. In this way, this application can evaluate low-level peripheral management
but not algorithm partition.
That is why a second set of Python scripts was written from scratch to emulate real host behavior. These are the following:
I config-oae.py : It sends a command to set OAE parameters up, namely sam-
pling rate, sample size, buffer length and thresholds for artifact rejection.
I calib-oae.py : It launches an in-ear calibration process. A chirp signal (that
is, a sine wave of time-dependent frequency) is used as broadband signal, and
a certain number of buffers are averaged before transforming into frequency
domain to extract the frequency response. This process is repeated for each
one of the two loudspeakers, as spatial diversity may lead to different acoustic
transmission characteristics.
I partitioned-dpoae.py : Launches a DPOAE test, where the processing stages
implemented in the device are selectable. Other selectable parameters are f1,
stimulus SPLs or the number of buffers to be recorded.
Both calib-oae.py and partitioned-dpoae.py are programmed as if they were part
of an actual host environment, which means that they do not demand more data
from the device than needed in a real application. Nevertheless, a debug mode was
implemented where every recorded sample is also transmitted. In this way, and
by testing each different partition, it was possible to check that the C algorithm
inside the device yields the same results as the Python algorithm on the computer. Figure 3.2 illustrates this.
Algorithm Partitioning
As already outlined in 2.2.2, OAE detection requires several processing steps, also referred to as processing stages. For the tested implementation, DPOAE was chosen, and a RAM-saving approach was taken in order to work with large buffer lengths. The designed scheme is outlined in Figure 3.3 and described as follows:
1. Artifact Rejection: The incoming recorded buffer of fixed-point samples undergoes AR tests to check whether it is valid. The data is only processed further if it scores under a threshold both for large magnitude AR and for repeatability AR; otherwise it is discarded. Both tests are calculated in floating point with sample-by-sample conversion to keep RAM occupation low.
Figure 3.2.: Host vs. device DPOAE detection (averaged window; SPL in dB vs. frequency in Hz). Red data corresponds to the raw recorded data processed by the Python application. Dashed black data represents the frequency components computed by the device, which overlay the values computed on the host. The two spikes at f1 = 2000 Hz and f2 = 2440 Hz are the stimuli's spectra, both displaying an SPL of 60 dB. The blue dot is the component for the OAE at 2f1 − f2, and the blue line indicates the noise level. The obtained SNR lies around 16 dB.
Figure 3.3.: DPOAE partition scheme (sample buffer → artifact rejection → averaging → frequency components extraction → SNR computation; raw data, valid data, averaged data, frequency components and the SNR are the possible data to transmit)
I Large magnitude AR: The Goertzel algorithm is used to extract the SPL at 2f1 − f2, and this value is calibrated according to the microphone calibration table before being compared with the threshold.
I Repeatability AR: The buffer containing N samples is split into two
halves, which are summed separately (samples 1 to N/2 and N/2 + 1 to N).
At the same time, the maximum absolute value over the whole buffer is
found. The final value to be compared against the threshold
is the following:
score_Rep.AR = |Σ_{n=0}^{N/2−1} x_n − Σ_{n=N/2}^{N−1} x_n| / max_n |x_n|   (3.5)
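As an illustration, the repeatability score of Equation 3.5 can be sketched in a few lines of Python; this is a host-side reference sketch, not the device's fixed-point C implementation:

```python
def repeatability_ar_score(x):
    """Repeatability AR score (Equation 3.5): absolute difference between
    the sums of the two buffer halves, normalized by the buffer's peak value."""
    half = len(x) // 2
    diff = abs(sum(x[:half]) - sum(x[half:]))
    peak = max(abs(s) for s in x)
    return diff / peak
```

Two identical half-buffers score 0, whereas a buffer whose energy is concentrated in one half scores high and would thus be rejected.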
2. Averaging: If the buffer is valid, it is averaged in the time domain into a floating-point
average buffer. Again, this calculation is performed through sample-by-sample
floating-point conversion. A Cumulative Moving Average (CMA) is used,
so that only one floating-point buffer is needed and this buffer always
represents a valid average. The CMA update is calculated as follows:
CMA_{n+1} = CMA_n + (x_{n+1} − CMA_n)/(n + 1)   (3.6)
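A minimal sketch of this update rule (Equation 3.6), applied here to scalars for clarity; on the device it runs element-wise over the average buffer:

```python
def cma_update(cma, x_new, n):
    """Update the average of n samples (cma) with sample x_new,
    returning the average of n + 1 samples (Equation 3.6)."""
    return cma + (x_new - cma) / (n + 1)

avg = 0.0
for n, x in enumerate([2.0, 4.0, 6.0]):
    avg = cma_update(avg, x, n)
# avg now equals the mean of the three samples
```

Note that the intermediate value of avg is a valid average at every step, which is exactly the property the implementation relies on.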
3. Extraction of frequency coefficients: A set of frequency components is
calculated through the Goertzel algorithm to obtain their uncalibrated power values
on a linear scale. The number of frequency bins to be evaluated depends on
multiple factors, among which the following can be remarked:
I Sample-domain vs. frequency-domain: One or more frequencies
of interest must be chosen as “signal values”, and a further set of
bins must account for “noise samples”. The first approach might be to
take a fixed number of bins left and right of the frequencies of interest,
which is easy, predictable and convenient. However, it has more physical
significance to define this region in terms of frequency, which makes the
number of bins depend on the sampling rate and buffer length, as
evidenced in Equation 2.1. If this frequency bound is also related to the
stimulus frequencies (e.g. (f2 − f1)/2), then f1 and f2 also play a role.
The region where the extracted frequency components lie will be further
referred to as the “observation window”.
I Clearance region: A pure tone does not generally yield a perfectly sharp
DFT. This means that some frequency bins surrounding the tone may
be influenced by it and follow a slope, giving them a greater
value than they would have without the tone's presence. This phenomenon
is called spectral leakage and is the reason why a certain number
of frequency bins left and right of the OAE may be discarded, establishing
a clearance region.
The total number of frequency components to extract is 2 × (halfwidth − clearance) + 1. In the tested implementation, clearance is 0 and halfwidth is
2, so 5 components have to be extracted.
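The per-bin power extraction can be sketched as follows — a plain Python reference of the textbook Goertzel recurrence, not the device's C code:

```python
import math

def goertzel_power(x, k):
    """Return |X[k]|^2, the uncalibrated power of DFT bin k of buffer x,
    computed with the Goertzel recurrence (one bin per call)."""
    coeff = 2.0 * math.cos(2.0 * math.pi * k / len(x))
    s1 = s2 = 0.0                       # recurrence state s[n-1], s[n-2]
    for sample in x:
        s1, s2 = sample + coeff * s1 - s2, s1
    # Squared magnitude from the last two state variables
    return s1 * s1 + s2 * s2 - coeff * s1 * s2
```

A pure tone at an integer bin k yields (len(x)/2)² while off-bin power stays near zero; the observation window is obtained by calling this once for each of the 2 × (halfwidth − clearance) + 1 bins.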
4. SNR computation: In the final step, the SNR of the OAE at 2f1 − f2 is
calculated. Consequently, the frequency components calculated in the previous
step consist of a single signal value and a set of noise values. In this step:
a) frequency components are computed in dB and calibrated.
b) noise values are arithmetically averaged in dB.
c) noise SPL is subtracted from signal SPL to obtain the SNR.
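These three sub-steps amount to only a few lines; a sketch with the calibration offsets omitted and the component values assumed to be already in dB SPL:

```python
def snr_db(components_db, signal_index):
    """Step 4: SNR of the OAE. components_db holds calibrated levels in dB;
    one entry is the signal, the rest form the noise floor (averaged in dB)."""
    noise = [c for i, c in enumerate(components_db) if i != signal_index]
    return components_db[signal_index] - sum(noise) / len(noise)
```

With the 5 components of the tested implementation — e.g. levels of [−5, −4, 12, −6, −5] dB and the OAE in the middle — the noise floor is −5 dB and the resulting SNR 17 dB.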
Partitioning this algorithm means drawing a vertical line between two stages in
Figure 3.3, which will be referred to as the partitioning border. Stages left of the
partitioning border are processed on the device, and those right of it on the computer. The
5 black dots in the figure represent the five possible spots (or partitioning spots) for
the partitioning border, and an arrow coming from each of them points out the data
to be sent, which decreases in volume as the border shifts to the right.
For instance, if no stages are performed on the device, throughput equals sampling
rate times sample size. Nevertheless, some of the incoming buffers may be discarded,
and Artifact Rejection can thus spare their transmission if it is executed on the
device. Extracting frequency components will bring throughput even lower, as only
a portion of the spectrum (i.e. only a subset of the values from the whole FFT) is
needed to calculate the SNR. And if the SNR is also computed on-device, then only
a floating point value out of a whole buffer is sent.
If the partitioning border is placed at one of the last three partitioning spots, it has to be
decided how often data is sent, which also determines how often the last two stages are
executed. This adds another degree of freedom, which will be further referred to as the
“OAE extraction rate” or simply extraction rate. Throughput is then divided by the
extraction rate for these three spots. The extraction rate equals 4 in the tests, which
means that data is sent every four buffers.
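As an illustration, the transmitted data rate for each partitioning spot can be estimated from the test parameters. This is a sketch under assumptions: the accept_ratio argument (fraction of buffers passing AR) and the byte sizes are illustrative, not measured values:

```python
def throughput_Bps(spot, fs=48000, sample_bytes=2, buf_len=1024,
                   accept_ratio=1.0, n_components=5, extraction_rate=4):
    """Rough transmitted throughput in bytes/s for partitioning spots
    0 (raw data), 1 (after AR), 2 (after averaging),
    3 (after frequency extraction) and 4 (standalone, SNR only)."""
    float_bytes = 4                      # processed data is sent as floats
    buffers_per_s = fs / buf_len
    if spot == 0:
        return fs * sample_bytes
    if spot == 1:                        # invalid buffers are never sent
        return fs * sample_bytes * accept_ratio
    if spot == 2:                        # one averaged buffer per extraction
        return buf_len * float_bytes * buffers_per_s / extraction_rate
    if spot == 3:                        # only a few frequency components
        return n_components * float_bytes * buffers_per_s / extraction_rate
    return float_bytes * buffers_per_s / extraction_rate  # a single SNR value
```

With these defaults, throughput falls from 96 kB/s for raw data to under 50 B/s for the standalone partition.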
4. Experiments
“La inspiración existe, pero tiene que
encontrarte trabajando.”
“Inspiration exists, but it has to find
you working.”
Pablo Picasso
This chapter explains the method used to profile the performance of DPOAE tests
on the device. By analyzing the results, secondary profiling tests are designed to
optimize time efficiency and energy consumption.
As a result of the experiments, the behavior of the platform is characterized, and
the extracted information can be used to construct a model that predicts the impact
of additional implementations.
Physical setup
The goal of this work was to analyze the software's impact on the following aspects of
the microprocessor:
I Memory (specifically RAM) occupation.
I CPU load.
I Energy consumption.
The first item does not need exhaustive profiling, as the bulk of RAM occupation
can be calculated a priori. As for time and energy performance, however, physical
measurements are required.
The procedure to achieve them involved two lines of action:
1. Current measurement: Energy can be extracted from current measurements,
as Equation 4.7 indicates:

Energy = ∫ Power · dt = ∫ V · I · dt   (4.7)
If the MCU voltage is constant, which is a reasonable approximation, then computing
energy basically amounts to integrating the measured current over time. In discretized
measurements, a summation over the current sequence i_n can approximate the
integration, so that:

Energy ≈ V · Σ_n i_n · Δt = V · Σ_n i_n / fs   (4.8)
where fs is the measuring sampling rate.
2. Timestamping: In order to link the measured current values to a code section
and to calculate the elapsed time in it, the use of timestamps becomes essential.
A way of implementing this is to use one or several additional digital lines that
signal whenever the code enters a different section of interest.
For the Wonder Gecko, current was measured by placing a 1.5 Ω shunt resistor between
the battery and micro-USB throws of the power switch mentioned in Section 3.1.1.
In this way, if the MCU is powered via micro-USB and the power switch is in the battery
position while no battery is connected, all the MCU current flows through the shunt
resistor. As the voltage that is wired to the audio codec perfboard comes from the
debugging mini-USB, the voltage drop at the resistor is proportional only to the MCU's
current. Figure 4.1 represents this situation graphically.
As for timestamping, a single digital output was used. This signal, referred to as the digital
toggle, has two functions: triggering and timestamping. Triggering is managed by
an initial falling edge, which occurs at the beginning of each profiled program that
is loaded onto the MCU. Any following change of digital level from 1 to 0 or vice
versa (i.e., any following toggle) is interpreted as a timestamp, meaning
that the time when it happens can be stored and tied to a specific point in the code,
according to a pattern that is known beforehand.
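The discretized energy approximation of Equation 4.8 is essentially a one-liner; the supply voltage and measurement rate below are illustrative values, not the exact setup figures:

```python
def energy_joules(current_samples, v_mcu, fs_meas):
    """Equation 4.8: energy as supply voltage times the sum of the measured
    current samples, divided by the measuring sampling rate fs."""
    return v_mcu * sum(current_samples) / fs_meas

# 1 s of a constant 10 mA at an assumed 3.3 V, sampled at 100 kHz -> 33 mJ
e = energy_joules([0.010] * 100_000, v_mcu=3.3, fs_meas=100_000)
```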
National Instruments' I/O data acquisition card PXIe-6363 was used to retrieve all
the measurements from this setup. Specifically, a differential analog input channel
was configured to measure the shunt resistor's voltage drop, and the digital toggle was
connected to a digital input channel and to a trigger channel. As already
discussed, the latter connection is meant to ensure synchronism between the digital
and the analog signal.
Data acquisition is handled by different Python scripts that operate the National
Instruments PXI measurement system. The basic behavior is the following: digital
and analog input channels are configured, and then the measurements are started
on the digital channel. The analog start is triggered by a falling edge of the digital line
Figure 4.1.: Schematic diagram of experiment configuration
to create a temporal reference between the analog and digital channel. Once the
tests are finished, data acquisition is stopped in all channels and analog values are
aligned to the first falling edge in the digital measurement. Then the timestamps
are used to match the measured current values with the profiled code.
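This matching step can be sketched as follows; the section names in the pattern are placeholders for the actual profiled code sections:

```python
def label_intervals(edge_times, pattern):
    """Attach section labels to the intervals between consecutive toggle
    timestamps, cycling through a pattern known beforehand.
    Returns (label, start_time, end_time) tuples."""
    return [(pattern[i % len(pattern)], edge_times[i], edge_times[i + 1])
            for i in range(len(edge_times) - 1)]
```

For example, toggle edges at 0.0, 0.002, 0.010 and 0.012 s with the pattern ["processing", "idle"] label the first and third intervals as processing and the second as idle time.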
If the test involves USB communication with the device, different OAE commands
can be dynamically launched to modify test parameters and keep track of progress.
If USB is not enabled on the device, the expected number of toggles can be calculated
beforehand, and data can then be acquired periodically while counting the measured
toggles to estimate progress.
In any case, toggles help to pack several tests with different parameters into a single
execution and to later label the portions of the measurement corresponding to
individual tests and test fragments.
DPOAE profiling
The parameters used to perform the tests, as presented in Chapters 2 and 3, are the
following:
I Audio codec sampling frequencies [Hz]: 16000, 24000, 32000, 48000.
I Audio codec sample size [bits]: 16, 24.
I OAE buffer length [samples]: 512, 1024, 2048.
I Partitioning scheme: All five possible, as outlined in 3.2.3.
I Host communication: USB.
I Number of recorded buffers: 20.
I OAE extraction rate: 4.
I Number of extracted frequency components: 5.
16-bit samples are stored as 16-bit signed integers, while 24-bit samples are stored
as 32-bit signed integers to keep computations manageable. Because of this greater
size, the combination of a 2048-sample buffer length and a 24-bit sample size is avoided
to prevent running out of RAM.
Apart from this restriction, all possible parameter combinations are carried out, for a
total of 100 tests. The profiled code has been compiled using gcc's optimization
level O2. Experiments with unoptimized code were also conducted; however, they
shall not be discussed in this document, as they led to much lower performance
and would thus not be a valid option in a real scenario.
First test: USB link
The first set of tests that were profiled included USB transmission of the results.
They will be considered the reference implementation for the rest of the document.
After taking the measurements, with the help of the digital toggle all measured current
values are classified as belonging either to a processing stage, to the idle time before
a new buffer recording is completed, or to irrelevant inter-test data. Figure 4.2 shows
what a labeled measurement looks like.
Figure 4.2.: DPOAE profiling capture. This frame corresponds to a frequency extraction partition. Seven processing periods can be observed in the figure. In two of them the frequency components are actually extracted, which is plotted in blue. In the rest, only AR and averaging are performed. It is possible to confirm visually that the extraction rate is 4. Pink data corresponds to idle time between processing periods. Each peak during idle time indicates a CPU wakeup to update the DMA at a callback level.
The classified measurement is analyzed as follows: for each test, the current is
averaged separately for processing periods (yielding iA) and for idle time (yielding
iI). The overall mean current (i) of the test is also calculated by averaging both
processing and idle periods together. Additionally, the CPU load (denoted as τ) is
estimated as the fraction of time the CPU remains active over the total test time.
These four parameters are not independent of each other, as the overall mean
current can be calculated according to Equation 4.9.
i = τ · iA + (1− τ) · iI (4.9)
Once these features have been computed for all tests, individual tests' features can
be averaged together according to a certain common parameter, e.g. sampling
frequency. In this way, it can be observed how this specific parameter affects
performance. For the sake of conciseness, only a small selection of the extracted results
will be discussed in the body of the document; the rest can be looked up in
Appendices A.1 and A.2.
Figure 4.3.: USB test. Sampling frequency performance. (a) Artifact rejection; (b) Standalone.
Figure 4.3 shows these averaged features, leaving sampling frequency as a free
parameter and choosing two different partitions. The following important conclusions
can be extracted from the plot:
I As intuition dictates, CPU load increases proportionally to sampling frequency.
I Idle current is independent of both sampling frequency and partitioning, which
is also expected.
I Active current is also independent of sampling frequency, but it differs across
partitions. A possible explanation is the different mix of instructions used in
each processing stage.
If buffer length is analyzed in the same way, the main difference to be found is that
CPU load now decreases slightly, as Figure 4.4 points out. To explain this, the
relation between the two quantities can be analyzed using the parameters r, ts and c.
ts = 1/fs is the sampling period, i.e. the time a sample takes to be acquired.
r is the time the MCU needs to process one sample, and c is the static amount of
time spent while processing a buffer, which is independent of the buffer size.
These three parameters relate the buffer size N to the CPU load, as can
be seen in Equation 4.10.
τ = (N · r + c)/(N · ts) = (r + c/N) · (1/ts) = (r + c/N) · fs   (4.10)
When N is increased, the CPU load approaches r/ts asymptotically. This formula
also describes the linear behavior with sampling frequency for a fixed or averaged buffer
length.
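Equation 4.10 can be explored numerically; the default values of r and c below are illustrative, not measured ones:

```python
def cpu_load(n, fs, r=1e-6, c=1e-4):
    """Equation 4.10: CPU load for buffer length n and sampling rate fs,
    with per-sample processing time r and static per-buffer time c."""
    return (r + c / n) * fs

# Load decreases with buffer length towards the asymptote r * fs,
# while remaining proportional to the sampling rate.
```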
Figure 4.4.: USB test. Buffer length performance. (a) Artifact rejection; (b) Standalone.
Analysis of sample size yields conclusions analogous to those for the parameters
already discussed. Regarding the results in Figure 4.5, it can be noted that CPU load is approximately
7% higher for 24 bits in the first two partitioning schemes. The reason is
that in these cases the sent data is integer, and thus occupies double the size in
24 bits (24-bit samples are handled as 32-bit signed integers). From averaging on,
samples are converted to floating point and the algorithm presents no further
differences with regard to sample size. Byte reversal in 24 bits, which is carried out
during the idle time, is responsible for the higher idle current at this sample size.
As a last step in this version's profiling, the processing stages are examined separately
for every test, and both CPU load and energy consumption are computed. In this
case, all processing periods from a test are discarded except those at positions that
are multiples of the extraction rate. In that way, it is ensured that for the Frequency
extraction and SNR computation stages only OAE extraction periods are averaged.
Referring back to Figure 4.2, only the second and sixth periods perform OAE extraction.
Figure 4.5.: USB test. Sample size performance. (a) Averaged current; (b) CPU load; (c) Active current; (d) Idle current.
Figures 4.6a and 4.6b depict averages of these energy and CPU load analyses. The
most important remark to be made is the predominance of the idle periods
in the total energy consumption. This is neither the usual nor the desired scenario for a
real-time embedded application, so idle consumption is an issue that requires
further insight.
Second test: No USB
A first attempt to decrease idle consumption is to shut off USB communication.
This is coherent with the final goal of implementing a wireless device, as in that
scenario no USB protocol would be present.
The new averaged energy consumption can be seen in Figure 4.7a. For this test, the
“Tx” stage is no longer present. Accordingly, the bar labeled as “No processing”
is composed only of idle periods. Energy values are approximately half of those from
the previous section. Idle current is also reduced to half its value, from ∼ 15 mA
to ∼ 8 mA, as Figure 4.7b indicates.
Figure 4.6.: USB test. Partition performance. (a) Energy distribution; (b) CPU load.
As already mentioned, the DMA and I2S peripherals keep working in idle
mode and make use of the high-frequency peripheral clock, which limits the
available low-energy modes of the microcontroller to Sleep Mode. Both modules are
probably the largest contributors to these 8 mA of idle consumption.
Figure 4.7.: Test without USB. fclk = 48 MHz. (a) Energy distribution of partitions; (b) Idle current.
Impact of clock frequency
Disabling USB brought idle consumption down, but for many configurations most
energy is still spent during idle time.
By looking at measurements such as the one in Figure 4.2, the first aspect that may be
discussed is the CPU wakeups within idle periods due to the DMA implementation. While
optimizing the DMA would indeed reduce idle consumption, the narrow temporal width
of these bursts indicates that the potential gain would not be crucial even in the best case.
Apart from this, there are not many options left from the point of view of software
optimization. Nevertheless, as the CPU load is low enough in all considered cases,
the clock frequency of the MCU can be reduced in order to observe any improvement in
consumption.
Before the experiments take place, a model can be built to analyze the impact of
clock frequency reduction.
The dependency of dynamic power on frequency was already addressed in Chapter 2, and
it can be linked to current:

Power_dynamic = v · i ⇒ i = Power_dynamic / v ∝ f   (4.11)
As voltage is constant, current can also be related to frequency through a linear
model for active (iA) and idle (iI) components:
iA = qA · f + IA (4.12)
iI = qI · f + II (4.13)
Here qA and qI represent the consumed coulombs per clock cycle for both active and
idle modes, while IA and II are the static currents that do not depend on frequency.
Looking at the CPU load τ, it can be assumed to be inversely proportional to the clock
frequency:

τ = fτ / f   (4.14)

The range of τ is [0, 1]: τ → 0 for f → ∞ and τ = 1 for f = fτ, so f ∈ [fτ, ∞). fτ
can be regarded as the full-load frequency.
The overall consumption current is the average of iA and iI , weighted by τ :
i = τ · iA + (1− τ) · iI = (iA − iI) · τ + iI (4.15)
Now, substituting 4.12, 4.13 and 4.14 into 4.15, the total current can be expressed in
terms of frequency:

i(f) = (qA · f + IA − qI · f − II) · (fτ/f) + qI · f + II =
     = qI · f + fτ · (IA − II)/f + fτ · (qA − qI) + II   (4.16)
And by differentiating, the optimal operating frequency where consumption is
minimized can be found:

∂i(fopt)/∂f = qI − fτ · (IA − II)/fopt² = 0
qI = fτ · (IA − II)/fopt²
fopt = √(fτ · (IA − II)/qI)   (4.17)

imin = i(fopt) = 2 · √(fτ · qI · (IA − II)) + fτ · (qA − qI) + II   (4.18)
This holds only if fopt ≥ fτ. If fopt lies outside the function's domain, then f′opt = fτ and
consequently:
i′min = i (fτ ) = qA · fτ + II (4.19)
On the platform, measurements were taken at fmax/2 and fmax/3 without USB
running, and the obtained active and idle currents and CPU load were compared
with those at fmax, which had already been measured for the repetition test without USB.
While iI is reasonably constant across all tests at a given f, CPU load and iA can
differ greatly. Table 4.1 shows the results, choosing iA from the averaging partition
and τ as a rough average of that same partition.
Table 4.1.: Measured values for different clock frequencies. Chosen partition: Averaging

                    iA        iI       τ
fmax   = 48 MHz     18.5 mA   7.8 mA   5.6%
fmax/2 = 24 MHz     11.1 mA   5.6 mA   11.0%
fmax/3 = 16 MHz     8.3 mA    4.5 mA   16.0%
Using linear regression, the following model parameters were calculated:
qA = 316.3 pC
qI = 100.5 pC
IA = 3.4 mA
II = 3.0 mA
fτ = 2.5 MHz
These lead to a minimum consumption current of 4.1 mA for fopt = 2.9 MHz.
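This fit can be reproduced from the values in Table 4.1 with a few lines of Python (plain least squares, no external libraries):

```python
def linfit(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx

f = [48e6, 24e6, 16e6]                             # clock frequencies [Hz]
qA, IA = linfit(f, [18.5e-3, 11.1e-3, 8.3e-3])     # active current model
qI, II = linfit(f, [7.8e-3, 5.6e-3, 4.5e-3])       # idle current model
f_tau = 2.5e6                                      # full-load frequency
f_opt = (f_tau * (IA - II) / qI) ** 0.5            # Equation 4.17
i_min = (2 * (f_tau * qI * (IA - II)) ** 0.5
         + f_tau * (qA - qI) + II)                 # Equation 4.18
```

Running this reproduces qA ≈ 316.3 pC, qI ≈ 100.5 pC, fopt ≈ 2.9 MHz and imin ≈ 4.1 mA.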
Figure 4.8.: Impact of clock frequency on current consumption for the averaging partition. The green dotted line shows a linear approximation of the total current, ignoring the inverse term with f in Equation 4.16. Large dots represent experimental data.
The difference between IA and II is smaller than 0.5 mA. This explains why fopt is so
close to fτ and why the inverse term is almost negligible, as Figure 4.8 indicates. In
fact, if static currents for both modes were equal (IA = II = IS), Equations 4.16, 4.17
and 4.19 would be transformed as follows:
i(f) = qI · f + fτ · (IS − IS)/f + fτ · (qA − qI) + IS = qI · f + fτ · (qA − qI) + IS   (4.20)

fopt = √(fτ · (IS − IS)/qI) = 0; thus f′opt = fτ   (4.21)

i′min = i(fτ) = qA · fτ + IS   (4.22)
The inverse term would disappear, and i(f) would therefore become linear. The
different terms in Equation 4.20 can be explained in this way:
I qI · f : the only term related to frequency; it represents the idle dynamic current
and depends only on how fast iI grows with f.
I fτ · (qA − qI): the difference (qA − qI) represents the coulombs per clock cycle
that the CPU alone consumes. It is multiplied by the full-load frequency, so
it accounts for the current that the CPU would draw if it were permanently
working. Its independence of f conveys that a CPU task cannot
consume less by reducing the frequency, as the changes in CPU load and active current
cancel out.
I IS : the static current.
Whether to consider the nonlinear term in Equation 4.16 depends on the obtained
IA for a specific set of test parameters. It can be expected that, for this particular
platform, IA will not be much larger than II, as CPU activation should increase
dynamic consumption much more heavily than static consumption.
In any case, the overall conclusion valid for all cases is that the idle dynamic
current increases with clock frequency at a rate of qI = 100.5 pC per cycle, and that
II = 3.0 mA.
Averaging schemes
Averaging takes up to 6% of the energy consumption in the reference implementation.
Consequently, it is worth trying to optimize this stage of the processing
chain with a different approach.
For instance, CMA was the chosen algorithm because it was assumed that
the averaged buffer should represent a physically meaningful average of the recordings
at every buffer iteration. Nevertheless, two facts advise against this
normalization:
1. The averaged buffer is only further processed every nth buffer, where n equals
the extraction rate. In this sense, normalization needs to be performed only one out
of every n times.
2. SNR expresses a ratio between magnitudes. If these magnitudes are scaled by
the same factor, then the scaling has no impact on the ratio. Scaling is
only needed when absolute physical values are required (e.g. if the algorithm
calculates the SPL at f1 and f2 to ensure that it adapts to the specified L1 and
L2 values).
For these reasons, normalization of summed data was not included in the analysis.
In spite of this, the conducted experiments did profile normalization, to confirm that
it is generally not a good strategy.
Apart from this, artifact rejection has so far only been considered in its
simplest form, as a way of deciding whether a buffer should be computed into the average
or discarded. As outlined in Section 2.2.2, the AR score can also act as an averaging
weighting factor rather than a discarding one, making the most of poor-quality
data.
Another dimension of averaging is the data type. Samples from the codec always come
in fixed-point format, whereas the actual implementation operates with floating-point
arithmetic. Floating point is more expensive in terms of computation and energy,
but easier to implement thanks to its extended dynamic range. Fixed point is more
efficient, but it involves extra concerns about over- and underflow in order to keep
computations correct.
In Table 4.2, a taxonomy of the different averaging schemes is outlined. One classification
aspect is whether they use AR to reject buffers or to weight them; the
other is whether the data is normalized at each iteration or not.
Table 4.2.: Averaging schemes

                  Artifact rejection   Artifact weighting
Non-normalized    Sum                  Weighted sum
Normalized        CMA                  Weighted CMA
For cumulative averaging, an averaged buffer (~a) is preserved between iterations,
whereas for summation the preserved one is a summed buffer (~s). In either case,
the contents of a sample buffer (~x) are added to the preserved buffer. This sample
buffer is no longer required after averaging, so weighting can be done in place.
For fixed point, the mentioned sample buffer is simply the buffer into which the DMA
moves the samples from the codec. For floating point, though, an intermediate floating-point
sample buffer must be allocated to perform the data-type conversion, requiring
extra memory.
However, this is only required if averaging is performed block-wise. If memory saving
is critical, samples can instead be converted individually.
For block processing, ARM provides a DSP library with different vectorized functions.
The ones used here are:
I vector value conversion: Converts values from a vector with a certain data type
into a vector with another data type. In this section, fixed-point to floating-point
conversion is used.
I vector scale: Multiplies vector values by a constant factor into a destination
vector. If division is required, the reciprocal of the dividing factor must be used
as the scale factor. It allows in-place computation, so that source and destination
are the same memory region.
I vector shift: Shifts the fixed-point values of a vector a certain number of bits
either right or left. It also allows in-place computation.
I vector add: Adds the values of two vectors into a destination vector.
I vector sub: Subtracts one vector’s values from the other vector’s values into a
destination vector.
While the documentation does not specify whether adds and subs can be computed in
place, experiments on the platform proved that it is at least possible on the
Wonder Gecko.
As a shorthand for the formulas of the averaging schemes, Equations 4.23
and 4.24 introduce the notation for buffer averaging at the nth iteration:
~a_n = (Σ_{i=1}^{n} ~x_i) / n = ~s_n / n   (4.23)

~a^W_n = (Σ_{i=1}^{n} ω_i · ~x_i) / (Σ_{i=1}^{n} ω_i) = ~s^W_n / W_n   (4.24)

where W_n is the total sum of all weights up to the nth iteration:

W_n = Σ_{i=1}^{n} ω_i = Σ_{i=1}^{n−1} ω_i + ω_n = W_{n−1} + ω_n   (4.25)
Provided that at the beginning of the nth buffer iteration the sample buffer (~x_n) and
either the summed (~s_{n−1}) or averaged (~a_{n−1}) buffer from the last iteration are
available, the update algorithms for the average variants are the following:
Sum: ~s_n = Σ_{i=1}^{n} ~x_i = ~s_{n−1} + ~x_n

Weighted sum: ~s^W_n = Σ_{i=1}^{n} ω_i · ~x_i = ~s^W_{n−1} + ω_n · ~x_n

CMA: ~a_n = (~s_{n−1} + ~x_n)/n = ((n − 1) · ~a_{n−1} + ~x_n)/n = ~a_{n−1} + (~x_n − ~a_{n−1})/n

Weighted CMA: ~a^W_n = ~s^W_{n−1}/W_n + (ω_n/W_n) · ~x_n = (W_{n−1}/W_n) · ~a^W_{n−1} + (ω_n/W_n) · ~x_n =
             = (1 − α_n) · ~a^W_{n−1} + α_n · ~x_n, with α_n = ω_n/W_n
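The four update rules translate directly into code; a plain Python sketch operating on lists, whereas the device versions work on fixed- or floating-point buffers:

```python
def sum_update(s, x):
    """Sum: s_n = s_{n-1} + x_n."""
    return [si + xi for si, xi in zip(s, x)]

def weighted_sum_update(s, x, w):
    """Weighted sum: s_n = s_{n-1} + w_n * x_n."""
    return [si + w * xi for si, xi in zip(s, x)]

def cma_update(a, x, n):
    """CMA: a_n = a_{n-1} + (x_n - a_{n-1}) / n, for the nth buffer (n >= 1)."""
    return [ai + (xi - ai) / n for ai, xi in zip(a, x)]

def weighted_cma_update(a, x, w, w_prev_total):
    """Weighted CMA: a_n = (1 - alpha) * a_{n-1} + alpha * x_n,
    where alpha = w_n / W_n and W_n = W_{n-1} + w_n."""
    alpha = w / (w_prev_total + w)
    return [(1.0 - alpha) * ai + alpha * xi for ai, xi in zip(a, x)]
```

With all weights equal, the weighted variants reduce to their unweighted counterparts, which is a convenient sanity check.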
Fixed-point data types as specified by ARM's C libraries have 1.7, 1.15 and 1.31
formats. This means that the most significant bit is used as a sign and the remaining
bits represent the fractional part of a real number in the range [−1, 1). But
these data types are defined as signed integers of N bits, which lie in the range
[−2^{N−1}, 2^{N−1} − 1]. Consequently, the floating-point conversion in the DSP library
hides an implicit extra scaling, as described by Equation 4.26. The principles that
led to discarding average normalization are also valid here, but this scaling is
already integrated in the block processing functions, so it can only be spared in
sample-by-sample processing.

x_float = x_fixed / b = x_fixed / 2^{N−1}   (4.26)
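As a minimal sketch, the conversion with its implicit scaling looks as follows in Python:

```python
def q_to_float(x_fixed, n_bits):
    """Equation 4.26: interpret an n-bit two's-complement integer as a
    1.(n-1) fixed-point value in [-1, 1)."""
    return x_fixed / (1 << (n_bits - 1))
```

For the 1.15 format (16-bit samples), −32768 maps to −1.0 and 16384 to 0.5, while the largest positive code stays just below 1.0.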
Table 4.3 summarizes the arithmetic operations needed for every averaging approach.
Table 4.3.: Averaging operations (Div. = Divisions, Mult. = Multiplications, Sub. = Subtrac-tions, Add. = Additions)
Div. Mult. Sub. Add.Sum 1
Weighted sum 1 1CMA 1 1 1
Weighted CMA 2 1
The operation columns are sorted by typical relative computational cost. That
is, divisions are typically the most expensive operation, followed by multiplications,
which are in turn more expensive than the rest. Taking this into account,
it can be noted that the averaging schemes along the rows are also roughly sorted
from expectedly cheap to expectedly expensive.
But this situation can be tweaked to optimize some averaging variants, as both
multiplications and divisions can be regarded as vector scalings. In this way, divisions
can be transformed into multiplications by the reciprocal, which should yield
better performance. The table is then transformed into Table 4.4.
Table 4.4.: Optimized averaging operations (Scl. = Scalings, Sub. = Subtractions, Add. = Additions)

               Scl.   Sub.   Add.
Sum            –      –      1
Weighted sum   1      –      1
CMA            1      1      1
Weighted CMA   2      –      1
The energy and time performance of these four functions were measured for
different buffer lengths in three different ways: in a straightforward implementation, in
an optimized approach according to Table 4.4, and using the mentioned DSP
functions. DSP functions can also be referred to as block processing functions, as opposed
to the other variants, which are executed sample-by-sample inside a loop and can
thus be referred to as in-loop variants.
All of these measurements were repeated for 32- and 16-bit fixed-point and for floating-point
data types.
Then, time and energy consumption for all variants were plotted as a function of
buffer length, and linear behavior was observed in almost all cases (see Appendix A.3).
By dividing the slopes of these plots by the slope of a reference averaging scheme,
the resulting values represent the proportion of time and energy
that each scheme takes in comparison with the reference one. These are called time
ratios and energy ratios.
In Section 4.2, the straightforward floating-point CMA was used as the averaging scheme,
so it will serve as the reference scheme. Time ratios for the optimized and
DSP implementations under this reference are summarized in Table 4.5.
The obtained results generally confirm the theoretical discussion. The first conclusion
to be drawn is that optimized software represents a major performance
improvement. By using the optimized implementation of CMA instead of the
straightforward one, computation time per sample drops to 41.9% of the
original runtime.
Table 4.5.: Time ratios for different averaging schemes

               Floating point      Fixed point 32 bits   Fixed point 16 bits
               Optimized   DSP     Optimized   DSP       Optimized   DSP
Weighted CMA   44.3%       75.7%   30.3%       87.9%     30.2%       47.1%
CMA            41.9%       93.2%   30.2%       62.3%     32.5%       36.7%
Weighted sum   41.9%       64.6%   28.0%       50.1%     25.6%       27.9%
Sum            37.2%       44.7%   25.6%       12.2%     25.6%       8.7%
Furthermore, implementations using block processing functions are generally outperformed by the optimized in-loop variants. As the DSP library presumably uses SIMD instructions inside its functions, the code generated for in-loop functions at the O2 optimization level presumably does as well, which would explain this apparent contradiction. Only for the summing scheme in fixed point is this tendency reversed.
As for the most efficient data type, 32-bit fixed point obtains the best scores for the
optimized version and 16-bit fixed point for DSP.
Normalization was also included in the experiments, both through arithmetic scaling and through shifting. For this purpose, two extra functions were profiled in which normalization is added to the summing stage. Results in Table 4.6 indicate that they are noticeably more expensive than summing alone. This cost is caused not only by the arithmetic operations but also by the fact that normalized data has to be written to a different buffer than summed data, which doubles the number of memory writes.
Table 4.6.: Time ratios for normalized averaging schemes

                  Floating point      Fixed point 32 bits   Fixed point 16 bits
                  Optimized   DSP     Optimized   DSP       Optimized   DSP
Sum and scaling   47.0%       57.0%   34.9%       50.0%     34.9%       27.9%
Sum and shift     –           –       34.9%       22.1%     39.6%       19.8%
FFT and Goertzel algorithm
In the reference implementation, all frequency extractions are performed through Goertzel's algorithm. In its original form, the algorithm consists of a first stage, where the signal is processed through a digital filter, and a final stage, where the complex DFT component is computed from the last two values of the filtered signal. However, if the complex value is not required, the same two values can be combined differently to obtain the squared magnitude. This is the approach taken in the implementation, as phase information is not needed.
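A compact sketch of this magnitude-only variant (a generic Goertzel formulation, assumed rather than taken from the thesis code):

```python
import math

def goertzel_power(x, k):
    """Squared DFT magnitude |X[k]|^2 via Goertzel's algorithm, skipping
    the complex final stage since phase information is not needed."""
    n_len = len(x)
    coeff = 2.0 * math.cos(2.0 * math.pi * k / n_len)
    s1 = s2 = 0.0
    for sample in x:                 # first stage: second-order IIR filter
        s0 = sample + coeff * s1 - s2
        s2, s1 = s1, s0
    # final stage: squared magnitude from the last two filter states
    return s1 * s1 + s2 * s2 - coeff * s1 * s2

# Pure tone at bin k = 5 of a 64-sample buffer: |X[k]|^2 = (N/2)^2 = 1024
N, k = 64, 5
x = [math.cos(2.0 * math.pi * k * n / N) for n in range(N)]
assert abs(goertzel_power(x, k) - (N / 2) ** 2) < 1e-6
```

The final stage avoids the complex multiplication of the original formulation, which is exactly the simplification described above.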
The consulted literature recommends Goertzel's algorithm only for a small number of DFT terms, as computing the whole spectrum with it presents quadratic complexity [5]. Goertzel is said to be more efficient than the FFT if the number of extracted terms K satisfies:

K < log2(N) (4.27)
where N is the length of the signal buffer in samples.
To determine whether this also holds on the device, the actual implementation (also referred to as in-loop Goertzel) was compared with the real-FFT functions from ARM's DSP library. To ensure that both versions are compared under the same conditions, a final stage was added to the FFT to calculate the squared magnitude of the K computed complex values.
A third strategy was added to the test, where Goertzel is also performed with the
help of the DSP library, in a block processing approach. In particular, the first stage
filter is realized through a function that performs biquad filtering.
Similarly to the previous section, all tests were executed for 16-bit fixed point, 32-bit fixed point and floating point, for different buffer lengths.
Just as before, the results show that DSP functions do not accelerate Goertzel's algorithm; instead, they are around 100% slower. As for the FFT, it outperforms Goertzel for a lower number of terms than predicted by Equation 4.27; in fact, this quantity grows more slowly with N than the theory suggests. Regarding the in-loop Goertzel implementation, the FFT is preferable in floating point if K is greater than 7, and in 32-bit fixed point if K is greater than 14. For 16-bit fixed point, the FFT is already faster when only 5 terms are computed. The number of squared terms K does not seem to have a great impact on the overall FFT performance.
As for the most convenient data type to operate with, if the pure FFT is compared, then 16-bit fixed point is the best choice. Surprisingly, floating point scores better than 32-bit fixed point in both time and energy efficiency. Regarding Goertzel, there are practically no differences among data types for the straightforward implementation. Figure 4.9 gives an overview of the experiment's results, and Appendix A.4 provides a deeper analysis.
[Figure: four panels plotting duration [ms] against K (Goertzel DFT terms) for the FFT, the software Goertzel algorithm and the DSP Goertzel filter: (a) floating point, N = 512; (b) floating point, N = 2048; (c) 16-bit fixed point, N = 512; (d) 16-bit fixed point, N = 2048.]

Figure 4.9.: Goertzel and FFT time performance
A last remark can be made about the DSP implementations of filtering and FFT regarding memory: neither supports in-place computation, which demands a destination buffer of N samples in addition to the source buffer.
Audio codec
The audio codec in use was also profiled for sampling frequencies between 16 kHz and 48 kHz and for SPLs between 50 and 70 dB. The measured currents stayed within a range of 6.5 to 7.4 mA, which shows the low influence of the considered parameters on consumption.
Taking 7 mA as a representative value, power consumption equals 23.1 mW (at 3.3 V). This complies with the values provided in the datasheet for PLL use.
5 Case scenarios
“Yo soy yo y mi circunstancia, y si no
la salvo a ella no me salvo yo”
“I am I and my circumstance, and if I
don’t save it I don’t save myself”
José Ortega y Gasset
This chapter applies the obtained results to specific case scenarios, using wireless
consumption models as a guide for the behavior of a complete system.
Global model
Device characterization
Chapter 4 has drawn the following conclusions in relation to the parameter choices
for the studied DPOAE algorithm:
I Sampling rate: A higher sampling rate increases the CPU load linearly, which means a rise in consumption.
I Buffer length: Long buffers soften the impact of processing overhead at the expense of using more RAM.
I Averaging and DFT: The algorithms have been ranked by performance; the most efficient of the possible implementations should be used.
I Fixed vs. floating point: It may be advantageous to work with fixed point instead of floating point in some cases, even though not all implications of fixed point have been examined in detail.
I CPU frequency: It should be brought down to an optimal value. This will
typically lie close to the full CPU load frequency or even below, but in any
case the chosen frequency must respect the limit imposed by fτ .
I Partition stage: If the device takes over more data processing, data throughput
is reduced.
This last point indicates the potential reduction in wireless consumption through heavier on-device processing. Once all other parameter choices are fixed, a wireless consumption model helps determine the best partitioning point in the communication-computation trade-off.
Wireless consumption model
The chosen wireless technology must support the data throughput required by the partition stage while consuming as little as possible. The required throughput can be higher than 1 Mbit/s for raw audio transmission or practically negligible for a standalone version.
Because of this disparity, two technologies have been initially considered: standard
Bluetooth and Bluetooth Low Energy (BLE).
The characteristics of Bluetooth have been extracted from [12]. According to this source, Bluetooth's maximum data rate is 720 kbps, and it consumes 102.6 mW during transmission. As the reference supply voltage throughout this study has been 3.3 V, this power corresponds to a current of 31.1 mA.
The BLE protocol has been considered under the terms discussed in [13]. According to this source, BLE is capable of sending up to four notifications every connInterval seconds. A notification is a message in a high layer of the protocol stack that can carry up to 20 bytes of data payload, and connInterval is the time elapsed between two of these transmissions. Considering maximum sizes, BLE carries 80 bytes per connInterval. For the minimum value of connInterval (7.5 ms), BLE can theoretically achieve a throughput of 85.33 kbps in an error-free environment. Increasing connInterval decreases both throughput and consumption. An estimate of how the consumption current is affected by this parameter can be found in Figure 5.1.
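These figures can be verified with a small back-of-the-envelope calculation (a sketch assuming the best case of 80 payload bytes per connection event; the helper name is hypothetical):

```python
# Best-case BLE application throughput: 4 notifications x 20 B per connInterval.
def ble_throughput_kbps(conn_interval_s, payload_bytes=80):
    """Theoretical throughput in kbit/s for a given connInterval."""
    return payload_bytes * 8 / conn_interval_s / 1000.0

# Minimum connInterval of 7.5 ms gives the quoted maximum of 85.33 kbps
assert abs(ble_throughput_kbps(0.0075) - 85.33) < 0.01
```

Doubling connInterval halves the achievable throughput, which is the trade-off shown in Figure 5.1.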
Systematic setting of parameters
With the help of the built models and the gathered information, the following method can be used to guide the choice of system settings for DPOAE:
1. Fix sampling rate and buffer length for a given frequency resolution. The aim in
this first step is to minimize sampling frequency and maximize buffer length,
as this leads to better performance. Nevertheless, two restrictions must be
considered:
Figure 5.1.: BLE consumption current. Image source: [13]
a) Nyquist criterion: sampling rate must be greater than twice the highest
frequency of interest.
b) Available RAM: A longer buffer improves performance at the expense of RAM occupation. The minimal setup consists of two stimulus buffers, two ping-pong reception buffers and an average buffer. This leads to a minimum RAM occupation described by Equation 5.28:

5 × buffer length × sample size (5.28)
Extra memory costs are not considered here, but in-ear calibration tables, extracted frequency values and additional variables also take up RAM. Stimulus buffers can be shorter than the rest, but this reduces the achievable frequency resolution for f1 and f2.
2. Choose averaging and frequency extraction implementations, either for fixed
or floating point. Choice criteria here include microcontroller architecture,
available RAM and number of significant DFT terms.
3. Taking the OAE extraction rate into account, predict values of τ and iA at the
maximum clock frequency for each partition and find the optimal operating
frequency that minimizes consumption current. For simplicity, a linear model
with IA = II is assumed.
4. Choose the most suitable wireless technology and estimate its consumption for each partition, according to the required data throughput. Add it to the computation consumption and pick the lowest result.
From the method above, the required application specifications can be deduced:
I From the perspective of the medical application: frequency resolution, highest frequency of interest, span of the observation region and refresh rate. The last one can be defined as the rate at which new information must be presented on the host. Regarding the highest frequency of interest, the frequency range of stimulus signals in DPOAE normally spans from 2000 to 4000 Hz [6].
I The selected microcontroller imposes some restrictions through its architecture
and the size of its RAM.
To exemplify this decision-making process, two case scenarios will be presented. Both exhibit similar medical specifications for different hardware. In both cases the application is based on ARM's Cortex-M4.
First case scenario: High-performance hardware
For the first case scenario, the application will be executed on a Cortex-M4F (same as in the Wonder Gecko) with 64 KB RAM. The frequency resolution cannot be worse than 15 Hz, and the noise will always be averaged over a region around the OAE at 2f1 − f2 with a total frequency span of f2 − f1 (f2 = 1.22f1). The host must receive new data every 0.125 seconds. Buffers must be weighted according to a score obtained through large magnitude artifact rejection.
As the maximum frequency of interest is 4000 Hz, any sampling rate above 8000 Hz is valid. The lowest profiled sampling rate will be selected, namely 16000 Hz. This forces the buffer length to fulfill the condition:

buffer length > fs/∆f = 16000/15 ≈ 1067 samples

Consequently, the next power of 2 is chosen: 2048. The achieved frequency resolution is 16000/2048 = 7.8125 Hz.
No requirement is given for the number of sampling bits, so 16 bits will be chosen for now. This puts the basic memory consumption at 5 × 2048 × 2 B = 20 KB, according to Equation 5.28. Therefore, 44 KB of RAM are still left.
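The sizing steps above can be reproduced with a short sketch (values from the text; the variable names are assumptions):

```python
import math

fs, df_req = 16000, 15            # sampling rate [Hz], required resolution [Hz]
min_len = fs / df_req             # about 1066.7 -> at least 1067 samples
buf_len = 2 ** math.ceil(math.log2(min_len))   # round up to the next power of 2
df = fs / buf_len                 # achieved frequency resolution [Hz]

sample_size = 2                   # 16-bit samples
ram_bytes = 5 * buf_len * sample_size          # Equation 5.28

assert buf_len == 2048
assert abs(df - 7.8125) < 1e-12
assert ram_bytes == 20 * 1024     # 20 KB of the available 64 KB
```
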
The observation region is (f2 − f1) = 0.22f1, which in the worst case (f1 = 4000 Hz) equals 880 Hz, or 880/7.8125 ≈ 113 DFT terms (this number must be odd so that the region is symmetric around the OAE).
A new buffer is acquired every 2048/16000 = 0.128 seconds, which forces the extraction rate to equal one in order to refresh every 0.128 seconds. This is slightly above the requirement, but as the refresh rate is not a critical parameter, it can still be considered valid.
As for algorithm choices, there are multiple suitable alternatives. As the core is a Cortex-M4F and there is enough RAM, it is sensible to work in floating point. Thus, averaging will be performed through a floating point implementation of the weighted sum algorithm. For this averaging scheme, the in-loop variant is preferred over the use of DSP functions. The high number of extracted frequency components also speaks in favor of using the FFT, again in a floating point version. In addition, Repeatability AR is not needed, because weighting is based only on Large magnitude AR.
These algorithm choices demand more RAM than originally estimated. Averaged samples are now single precision floating point, so the averaged buffer doubles in size, while an extra floating point buffer of length 2048 is needed for the FFT. RAM occupation becomes:

4 × buffer size × 2 + 2 × buffer size × 4 = 16 × buffer size = 16 × 2048 B = 32 KB

Half of the RAM space is still free for the stack, calibration tables and other variables.
In order to predict τ and iA for all partitions, some reference values are needed for the current consumption and duration of the different processing stages. These are obtained from the standalone version for the already selected sampling rate, buffer length and data type, in the test without USB at maximum clock frequency.
Experimental data shows that current values do not change substantially between
different implementations of the same algorithms, but speed does. For this reason,
the measured durations for all stages are recalculated according to the experimental
results:
I Because only one value is needed, it is still advisable to use Goertzel for Large magnitude AR.
I According to Table 4.5, the selected averaging algorithm reduces time to 41.9%.
I According to experimental data, the FFT is 1.4 times slower than Goertzel in floating point for the original 5 extracted DFT terms. On the other hand, if Goertzel were used for 113 terms, it would be roughly 113/5 = 22.6 times slower. It will be assumed that the overall FFT speed for N = 2048 is not greatly affected by squaring 113 terms instead of 5.
I SNR computation also becomes slower as a result of the increase in DFT terms. Thus, the original time must be multiplied by 113/5 = 22.6.
Table 5.1 summarizes the current istage and duration tstage of the different processing stages, and also indicates the new duration t′stage that results from the software changes. As discussed, performance decreases in Frequency extraction and SNR computation.
Table 5.1.: Summary of stage performance for first case scenario at full clock frequency

                       istage    tstage   t′stage
Large AR               19.1 mA   0.9 ms   0.9 ms
Averaging              16.0 mA   1.8 ms   0.8 ms
Frequency extraction   21.8 mA   2.9 ms   4.0 ms
SNR computation        17.2 mA   0.6 ms   12.8 ms

Now, iA and τ can be computed for the different partitions. τ is calculated as the
sum of involved active stages’ t′stage over buffer period (128 ms). iA is the average
of involved istage weighted by t′stage. Thanks to the assumption IA = II , qA can be
calculated from Equation 4.12 just as:
qA = (iA|f=fmax − II) / fmax (5.29)
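As a cross-check of the CPU-load column of Table 5.2 (a sketch; it assumes fmax = 48 MHz, the top of the clock-frequency set used in this chapter, and the t′stage values of Table 5.1):

```python
# Per-stage durations t'_stage in ms, taken from Table 5.1.
t_stage_ms = {"large_ar": 0.9, "averaging": 0.8, "freq_ext": 4.0, "snr": 12.8}

def cpu_load(stages, period_ms=128.0):
    """tau: fraction of the 128 ms buffer period spent in the given stages."""
    return sum(t_stage_ms[s] for s in stages) / period_ms

# "SNR computation" partition runs all four stages on the device.
tau = cpu_load(["large_ar", "averaging", "freq_ext", "snr"])
f_tau_mhz = tau * 48.0          # minimum clock frequency that avoids overrun
assert abs(tau - 0.144) < 0.001        # 14.4% in Table 5.2
assert abs(f_tau_mhz - 6.9) < 0.05     # 6.9 MHz in Table 5.2
```

The same computation with fewer stages reproduces the 4.4% and 1.3% rows for the "Frequency extraction" and "Averaging" partitions.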
After that, the optimal clock frequency and overall computation current can be estimated using Equation 4.20. f must be greater than fτ to avoid overruns, so for simplicity a set of available clock frequencies from 1 to 48 MHz in steps of 500 kHz has been defined, and fτ is rounded up to the next value in this set. The computation current iC, clock frequency fclk and all intermediate values for this case scenario can be found in Table 5.2.
Throughput can also be estimated by determining the size of the message that is created every buffer period. The BLE parameter connInterval can then be estimated as the size of a connection message divided by the throughput. The results for this case scenario are shown in Table 5.3.
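This estimate can be sketched as follows (assuming the 128 ms buffer period and 80 B of payload per BLE connection event; the helper name is hypothetical):

```python
# Throughput and required connInterval per partition, as behind Table 5.3.
def partition_link(message_bytes, period_s=0.128, conn_payload=80):
    """Return (throughput in kbit/s, required connInterval in ms)."""
    bytes_per_s = message_bytes / period_s
    throughput_kbps = bytes_per_s * 8 / 1000.0
    conn_interval_ms = conn_payload / bytes_per_s * 1000.0
    return throughput_kbps, conn_interval_ms

# "Frequency extraction" partition: 452 B message every buffer period
kbps, ci = partition_link(452)
assert abs(kbps - 28.25) < 0.01    # 28.25 kbps in Table 5.3
assert abs(ci - 22.7) < 0.1        # 22.7 ms in Table 5.3
```
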
Table 5.2.: Estimation of partition computation consumption. First case scenario

                       iA        τ       fτ        fclk      iC
No processing          7.8 mA    0.0%    0 MHz     1 MHz     3.1 mA
Averaging              17.7 mA   1.3%    0.6 MHz   1 MHz     3.2 mA
Frequency extraction   20.6 mA   4.4%    2.1 MHz   2.5 MHz   3.8 mA
SNR computation        18.3 mA   14.4%   6.9 MHz   7 MHz     5.2 mA
Because all buffers are averaged through weighting, AR does not reduce throughput in this case and is not an issue within the communication-computation trade-off. The AR partition is therefore not considered here.
“Frequency extraction” and “SNR computation” are the only partitions with valid
connInterval values. This means that “No processing” and “Averaging” partitions
should use regular Bluetooth, whose consumption is much greater than the differences in computation consumption. This is a sufficient argument to discard these first two partitions.

Table 5.3.: Estimation of partition throughput. First case scenario

                       Message size   Throughput    connInterval   Comm. current
No processing          4096 B         256.00 kbps   2.5 ms         ∼ 30 mA
Averaging              8192 B         512.00 kbps   1.3 ms         ∼ 30 mA
Frequency extraction   452 B          28.25 kbps    22.7 ms        ∼ 2 mA
SNR computation        4 B            0.25 kbps     2560.1 ms      ∼ 0.3 mA
In the case of SNR computation, the connInterval value is much bigger than the required refresh interval, which would cause an unacceptable delay. Instead, connInterval = 125 ms (the refresh interval) would preferably be used, where the size of the connection message is now only four bytes.
Looking back at Figure 5.1, 125 ms yields a value between 0.1 and 1 mA; let 0.3 mA be an approximation of it. For 22.7 ms, the current is likely to lie between 2 and 3 mA. The difference in computation consumption between the last two partitions is only 1.4 mA, while the difference in communication consumption is apparently greater. Although the available data for BLE consumption only allows such a rough estimation, in this case scenario SNR computation is the partition most likely to be the most efficient.
Second case scenario: Mid-performance hardware
In this second scenario, the available MCU has a Cortex-M4 core with 8 KB RAM.
This core is similar to the Cortex-M4F but lacks a hardware FPU, which may cause
floating point operations to take longer to execute.
The medical requirements are less demanding in this case: the frequency resolution must be around 30 Hz, and the noise is calculated by averaging over a region of 125 Hz around the OAE. Buffers must still be weighted according to Large magnitude AR, and the refresh interval now only needs to be less than 0.2 s.
The sampling frequency should be kept as low as possible. If the buffer length equals 512:

∆f = 16000/512 = 31.25 Hz

Under this ∆f, the 125 Hz observation region corresponds to 5 frequency bins, as in the reference implementation. The frequency resolution cannot be pushed below this figure, as the basic RAM consumption, assuming 16 bits per sample, already takes up more than half of the RAM:
5 × 512 × 2 B = 5 KB
Furthermore, since there is no FPU, all arithmetic should be fixed point. In that case, averaging and frequency extraction should make use of 32-bit registers to handle range issues, making the averaged buffer double in size:

4 × 512 × 2 + 512 × 4 B = 6 KB

Only 2 KB are left for the stack, calibration tables and static variables.
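The RAM budget above can be checked with a few lines (values from the text; variable names are assumptions):

```python
# Second-scenario RAM budget: 512-sample buffers, 8 KB of total RAM.
buf_len = 512
ram_total = 8 * 1024

basic = 5 * buf_len * 2                          # Eq. 5.28, 16-bit samples
with_32bit_avg = 4 * buf_len * 2 + buf_len * 4   # averaged buffer widened to 32 bits

assert basic == 5 * 1024                  # 5 KB, more than half the RAM
assert with_32bit_avg == 6 * 1024         # 6 KB with fixed-point range headroom
assert ram_total - with_32bit_avg == 2 * 1024   # 2 KB left for everything else
```
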
Goertzel shall be used instead of FFT because it is faster in 32-bit fixed point for 5
frequency bins. Weighted sum is also preferably done in-loop for this data type, as
it was in the former case scenario for floating point.
The buffer period is 512/16000 = 32 ms. As a result, the extraction rate has to be set to 6 so that:

refresh interval = 6 × 32 ms = 192 ms < 0.2 s
If Table 5.1 is repeated under these circumstances, it becomes:

Table 5.4.: Summary of stage performance for second case scenario at full clock frequency

                       istage    tstage   t′stage
Large AR               17.7 mA   0.2 ms   0.2 ms
Averaging              16.1 mA   0.5 ms   0.1 ms
Frequency extraction   20.4 mA   1.0 ms   1.0 ms
SNR computation        17.2 mA   0.6 ms   0.6 ms
The only stage experiencing improvement this time is averaging. The first reason for this is that the number of extracted frequency terms remains 5, as in the reference implementation. For AR and Frequency extraction, the explanation is completed by the fact that Goertzel's performance varies very little among data types. As for SNR computation, the conducted experiments have found no big differences when the FPU is not used.
An extraction rate greater than one influences the calculation of clock frequency and average current. For the last two partitions, frequency components and SNR are only obtained every six buffers. τ is then the weighted average of the CPU load during OAE extraction (denoted τext) and that of the “Averaging” partition. The same applies to iA, where iA−ext now describes the averaged current when all stages in the partition are executed.
Table 5.5 gathers some of the values used for that estimation. Note that the operating clock frequency is chosen conservatively, considering τext rather than τ, so that OAEs can be extracted before the buffer period is over.
Table 5.5.: Estimation of partition computation consumption. Second case scenario

                       iA−ext    τext   fext      fclk      iC
No processing          7.8 mA    0%     0 MHz     1 MHz     3.1 mA
Averaging              17.1 mA   1.1%   0.5 MHz   1 MHz     3.2 mA
Frequency extraction   19.5 mA   4.2%   2.0 MHz   2.5 MHz   3.4 mA
SNR computation        18.8 mA   6.0%   2.9 MHz   3 MHz     3.5 mA
Table 5.6.: Estimation of partition throughput. Second case scenario

                       Message size   Throughput    connInterval   Comm. current
No processing          1024 B         256.00 kbps   2.5 ms         ∼ 30 mA
Averaging              2048 B         85.33 kbps    7.5 ms         ∼ 10 mA
Frequency extraction   20 B           0.83 kbps     768.0 ms       ∼ 0.1 mA
SNR computation        4 B            0.17 kbps     3840.1 ms      ∼ 0.1 mA
When examining Table 5.6 for communication consumption, it can be noted that, thanks to the longer refresh interval, the Averaging partition can now be implemented with BLE, as it yields a valid connInterval value. However, it leads to a wireless consumption of 10 mA, which makes it inappropriate.
For Frequency extraction and SNR computation, connInterval is too high and causes latency. In both cases it should be set to the refresh interval, roughly 200 ms, which corresponds to a consumption of around 0.1 mA in Figure 5.1 and is in any case the same value for both. The computation consumption gap between the two partitions therefore remains, and the Frequency extraction partition stands as the new best alternative, although SNR computation does not lie far behind.
6 Conclusions
“Mientras haya un misterio para el
hombre, ¡habrá poesía!”
“As long as there is a mystery for man,
there will be poetry!”
Gustavo A. Bécquer, Rima IV
This study has provided a methodical way to analyze the impact of software choices on the performance of OAE algorithms. As a result, it has led to a model that can predict the behavior of such algorithms when a set of conditions is provided.
It has been concluded that the sampling rate should be set as low as the Nyquist
criterion permits and the buffer length as large as the device’s RAM allows, in order
to get the best results in terms of frequency resolution and energy consumption.
Clock frequency should be adjusted to the optimum value, which will typically lie
close to the full load frequency.
Regarding algorithm implementations, average normalization should be avoided and
Goertzel may be used instead of FFT if the number of extracted frequency compo-
nents does not reach a certain threshold. All these algorithms may also be imple-
mented in fixed point for better efficiency.
In a wireless scenario, algorithm partitioning emerges as a parameter with which to minimize overall energy consumption. The case scenarios show that the best partition strategy implies performing at least Artifact Rejection, averaging and frequency extraction on the device.
Future development
In spite of the accomplished progress, there are still a number of topics that could be explored using this work as a basis. Some of them are listed below:
I Codec profiling: The audio codec, despite being vital for the application, has not been profiled in depth. Its different parts could be integrated into the system and examined regarding both performance and consumption, and this new variable could then be added to the equation to achieve a better solution.
I Wireless implementation: Wireless communication has only been addressed theoretically. Using the predictions from this document, a wireless version of the device can be implemented and the accuracy of the predictions assessed.
I TEOAE analysis: This other method could be studied in a similar manner to DPOAE, providing a global view of OAE algorithms.
I Calibration analysis: In-ear calibration involves processing similar to that of OAE algorithms. On/off-device decisions and the communication-computation trade-off also apply to it, which makes it a suitable research topic.
I Fixed point implementations: Fixed point has been discussed only superficially. If this is a real option, then actual fixed point versions of the algorithms should be implemented (not only profiled) and their accuracy compared against floating point.
I Clinical significance: The clinical performance of the algorithms has been omitted in favor of computational performance. Interdisciplinary work to evaluate the system under realistic parameters, along with real clinical testing, is essential to deliver a reliable final product.
A Appendices
DPOAE with USB. Current and CPU Load Profile
Sampling frequency
[Figure: CPU load [%] and consumption current [mA] (iI, iA, i, τ) against sampling frequency (16000 to 48000 Hz) for the No processing, Artifact Rejection, Averaging, Frequency Extraction and Standalone partitions.]
Buffer length
[Figure: CPU load [%] and consumption current [mA] (iI, iA, i, τ) against buffer length (512, 1024 and 2048 samples) for the No processing, Artifact Rejection, Averaging, Frequency Extraction and Standalone partitions.]
Sample size
[Figure: averaged current, CPU load, active current and idle current for 16-bit and 24-bit samples, for the No process, AR, Averaging, Freq-Ext. and Standalone partitions.]
DPOAE with USB. Energy and Partitions Profile
Energy consumption is averaged for a single processing cycle. Only extraction cycles
are considered.
No process ARAveraging
Freq-Ext.Standalone
0.0
0.2
0.4
0.6
0.8
1.0
1.2
1.4
1.6
Ener
gy[m
J]
95.5% 91.8% 86.2% 88.0% 85.4%
4.5% 4.6%8.0%
5.3% 5.2%
1.53 mJ 1.56 mJ 1.57 mJ 1.6 mJ 1.62 mJ
Energy consumption. 16 kHz, 512 samples, 16 bits.
Large AR
Rep. AR
Avg.
DFT
SNR
Tx
Idle
No process ARAveraging
Freq-Ext.Standalone
0
20
40
60
80
100
Tim
eocc
upat
ion
[%]
95.8% 93.6% 88.8% 92.4% 90.6%
4.2% 4.3%7.6%
CPU occupation. 16 kHz, 512 samples, 16 bits.
Large AR
Rep. AR
Avg.
DFT
SNR
Tx
Idle
No process ARAveraging
Freq-Ext.Standalone
0.0
0.2
0.4
0.6
0.8
1.0
1.2
1.4
1.6
Ener
gy[m
J]
91.8% 88.6% 86.9% 88.7% 86.2%
8.2% 8.1% 8.1%5.3% 5.2%
1.55 mJ 1.58 mJ 1.59 mJ 1.62 mJ 1.63 mJ
Energy consumption. 16 kHz, 512 samples, 24 bits.
Large AR
Rep. AR
Avg.
DFT
SNR
Tx
Idle
No process ARAveraging
Freq-Ext.Standalone
0
20
40
60
80
100
Tim
eocc
upat
ion
[%]
92.4% 90.5% 89.3% 93.0% 91.2%
7.6% 7.6% 7.7%
CPU occupation. 16 kHz, 512 samples, 24 bits.
Large AR
Rep. AR
Avg.
DFT
SNR
Tx
Idle
62 APPENDICES
No process ARAveraging
Freq-Ext.Standalone
0.0
0.5
1.0
1.5
2.0
2.5
3.0
Ener
gy[m
J]
95.8% 93.0% 87.3% 89.9% 88.6%
4.2% 4.1%7.6%
4.6% 4.5%
3.07 mJ 3.11 mJ 3.14 mJ 3.18 mJ 3.2 mJ
Energy consumption. 16 kHz, 1024 samples, 16 bits.
Large AR
Rep. AR
Avg.
DFT
SNR
Tx
Idle
No process ARAveraging
Freq-Ext.Standalone
0
20
40
60
80
100
Tim
eocc
upat
ion
[%]
96.1% 94.5% 89.7% 93.9% 93.0%
7.2%
CPU occupation. 16 kHz, 1024 samples, 16 bits.
Large AR
Rep. AR
Avg.
DFT
SNR
Tx
Idle
No process ARAveraging
Freq-Ext.Standalone
0.0
0.5
1.0
1.5
2.0
2.5
3.0
Ener
gy[m
J]
92.2% 89.6% 87.9% 90.5% 89.1%
7.8% 7.7% 7.6%4.5% 4.5%
3.11 mJ 3.14 mJ 3.16 mJ 3.21 mJ 3.23 mJ
Energy consumption. 16 kHz, 1024 samples, 24 bits.
Large AR
Rep. AR
Avg.
DFT
SNR
Tx
Idle
No process ARAveraging
Freq-Ext.Standalone
0
20
40
60
80
100
Tim
eocc
upat
ion
[%]
92.8% 91.3% 90.2% 94.4% 93.4%
7.2% 7.2% 7.2%
CPU occupation. 16 kHz, 1024 samples, 24 bits.
Large AR
Rep. AR
Avg.
DFT
SNR
Tx
Idle
No process ARAveraging
Freq-Ext.Standalone
0
1
2
3
4
5
6
Ener
gy[m
J]
95.9% 93.5% 87.7% 90.9% 90.2%
4.1%7.6%
4.2% 4.2%
6.13 mJ 6.2 mJ 6.26 mJ 6.35 mJ 6.36 mJ
Energy consumption. 16 kHz, 2048 samples, 16 bits.
Large AR
Rep. AR
Avg.
DFT
SNR
Tx
Idle
No process ARAveraging
Freq-Ext.Standalone
0
20
40
60
80
100
Tim
eocc
upat
ion
[%]
96.2% 94.9% 90.0% 94.6% 94.2%
7.2%
CPU occupation. 16 kHz, 2048 samples, 16 bits.
Large AR
Rep. AR
Avg.
DFT
SNR
Tx
Idle
No process ARAveraging
Freq-Ext.Standalone
0.0
0.2
0.4
0.6
0.8
1.0
Ener
gy[m
J]
92.9% 87.9%79.6% 82.5% 78.9%
7.1%6.8%
12.0%7.4% 7.3%
1.03 mJ 1.05 mJ 1.07 mJ1.1 mJ 1.12 mJ
Energy consumption. 24 kHz, 512 samples, 16 bits.
Large AR
Rep. AR
Avg.
DFT
SNR
Tx
Idle
No process ARAveraging
Freq-Ext.Standalone
0
20
40
60
80
100
Tim
eocc
upat
ion
[%]
93.4% 90.4%83.1%
88.6% 86.0%
6.6%6.4%
11.5%4.4% 4.4%
CPU occupation. 24 kHz, 512 samples, 16 bits.
Large AR
Rep. AR
Avg.
DFT
SNR
Tx
Idle
DPOAE with USB. Energy and Partitions Profile 63
[Figure: Energy consumption. 24 kHz, 512 samples, 24 bits.]
[Figure: CPU occupation. 24 kHz, 512 samples, 24 bits.]
[Figure: Energy consumption. 24 kHz, 1024 samples, 16 bits.]
[Figure: CPU occupation. 24 kHz, 1024 samples, 16 bits.]
[Figure: Energy consumption. 24 kHz, 1024 samples, 24 bits.]
[Figure: CPU occupation. 24 kHz, 1024 samples, 24 bits.]
[Figure: Energy consumption. 24 kHz, 2048 samples, 16 bits.]
[Figure: CPU occupation. 24 kHz, 2048 samples, 16 bits.]
[Figure: Energy consumption. 32 kHz, 512 samples, 16 bits.]
[Figure: CPU occupation. 32 kHz, 512 samples, 16 bits.]
[Figure: Energy consumption. 32 kHz, 512 samples, 24 bits.]
[Figure: CPU occupation. 32 kHz, 512 samples, 24 bits.]
[Figure: Energy consumption. 32 kHz, 1024 samples, 16 bits.]
[Figure: CPU occupation. 32 kHz, 1024 samples, 16 bits.]
[Figure: Energy consumption. 32 kHz, 1024 samples, 24 bits.]
[Figure: CPU occupation. 32 kHz, 1024 samples, 24 bits.]
[Figure: Energy consumption. 32 kHz, 2048 samples, 16 bits.]
[Figure: CPU occupation. 32 kHz, 2048 samples, 16 bits.]
[Figure: Energy consumption. 48 kHz, 512 samples, 16 bits.]
[Figure: CPU occupation. 48 kHz, 512 samples, 16 bits.]
[Figure: Energy consumption. 48 kHz, 512 samples, 24 bits.]
[Figure: CPU occupation. 48 kHz, 512 samples, 24 bits.]
[Figure: Energy consumption. 48 kHz, 1024 samples, 16 bits.]
[Figure: CPU occupation. 48 kHz, 1024 samples, 16 bits.]
[Figure: Energy consumption. 48 kHz, 1024 samples, 24 bits.]
[Figure: CPU occupation. 48 kHz, 1024 samples, 24 bits.]
[Figure: Energy consumption. 48 kHz, 2048 samples, 16 bits.]
[Figure: CPU occupation. 48 kHz, 2048 samples, 16 bits.]
Averaging Profile
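For reference, the averaging variants profiled in this appendix can be sketched in a few lines. This is an illustrative Python sketch, not the profiled embedded implementation; it assumes that CMA stands for cumulative moving average and WCMA for its weighted counterpart, as the legend labels suggest.

```python
import numpy as np

def sum_average(blocks):
    """Sum: accumulate all blocks, divide by the count once at the end."""
    acc = np.zeros_like(blocks[0], dtype=np.float64)
    for b in blocks:
        acc += b
    return acc / len(blocks)

def cma(blocks):
    """CMA: cumulative moving average, updated after every block."""
    avg = np.zeros_like(blocks[0], dtype=np.float64)
    for n, b in enumerate(blocks, start=1):
        avg += (b - avg) / n               # running-mean update
    return avg

def wcma(blocks, weights):
    """WCMA: weighted cumulative moving average (weights must be positive)."""
    avg = np.zeros_like(blocks[0], dtype=np.float64)
    wsum = 0.0
    for b, w in zip(blocks, weights):
        wsum += w
        avg += (b - avg) * (w / wsum)      # weighted running-mean update
    return avg
```

All three agree for uniform weights; the difference that matters on the microcontroller is when the division happens (once at the end versus once per block) and how the intermediate accumulator scales, which is what the Sum-Scale and Sum-Shift variants address in fixed point.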
Single precision floating point
Each figure in this profile plots energy [mJ] or execution time [ms] against buffer length (0 to 2048 samples) for the averaging variants WCMA, Weighted Sum, CMA, Sum-Scale and Sum (plus Sum-Shift in the fixed-point profiles).

[Figure: Energy vs. buffer length. Single-precision floating point, straightforward implementation.]
[Figure: Time vs. buffer length. Single-precision floating point, straightforward implementation.]
[Figure: Energy vs. buffer length. Single-precision floating point, optimized implementation.]
[Figure: Time vs. buffer length. Single-precision floating point, optimized implementation.]
[Figure: Energy vs. buffer length. Single-precision floating point, DSP implementation.]
[Figure: Time vs. buffer length. Single-precision floating point, DSP implementation.]
16 bit fixed point
[Figure: Energy vs. buffer length. 16-bit fixed point, straightforward implementation.]
[Figure: Time vs. buffer length. 16-bit fixed point, straightforward implementation.]
[Figure: Energy vs. buffer length. 16-bit fixed point, optimized implementation.]
[Figure: Time vs. buffer length. 16-bit fixed point, optimized implementation.]
[Figure: Energy vs. buffer length. 16-bit fixed point, DSP implementation.]
[Figure: Time vs. buffer length. 16-bit fixed point, DSP implementation.]
32 bit fixed point
[Figure: Energy vs. buffer length. 32-bit fixed point, straightforward implementation.]
[Figure: Time vs. buffer length. 32-bit fixed point, straightforward implementation.]
[Figure: Energy vs. buffer length. 32-bit fixed point, optimized implementation.]
[Figure: Time vs. buffer length. 32-bit fixed point, optimized implementation.]
[Figure: Energy vs. buffer length. 32-bit fixed point, DSP implementation.]
[Figure: Time vs. buffer length. 32-bit fixed point, DSP implementation.]
FFT and Goertzel Profile
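The software Goertzel algorithm profiled below evaluates individual DFT bins at O(N) cost per bin instead of computing the whole spectrum. A minimal single-bin Python sketch (the standard textbook recurrence, not the thesis's fixed-point implementation):

```python
import math

def goertzel(x, k):
    """Compute the k-th DFT bin of x via the Goertzel recurrence."""
    n = len(x)
    w = 2.0 * math.pi * k / n
    coeff = 2.0 * math.cos(w)
    s_prev, s_prev2 = 0.0, 0.0
    for sample in x:                       # one multiply-add per sample
        s = sample + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    # Finalize: X[k] = e^{jw} * s[N-1] - s[N-2]
    return complex(math.cos(w), math.sin(w)) * s_prev - s_prev2
```

Evaluating K bins repeats this K times, which is why the Goertzel cost grows linearly with K in the figures while the FFT cost does not depend on K.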
Single precision floating point
Each figure in this profile plots energy [mJ] or duration [ms] against K, the number of Goertzel DFT terms (5 to 30), comparing the FFT, the software Goertzel algorithm and the DSP Goertzel filter.

[Figure: Energy vs. K. SP floating point, N = 256.]
[Figure: Duration vs. K. SP floating point, N = 256.]
[Figure: Energy vs. K. SP floating point, N = 512.]
[Figure: Duration vs. K. SP floating point, N = 512.]
[Figure: Energy vs. K. SP floating point, N = 1024.]
[Figure: Duration vs. K. SP floating point, N = 1024.]
[Figure: Energy vs. K. SP floating point, N = 2048.]
[Figure: Duration vs. K. SP floating point, N = 2048.]
16 bit fixed point
[Figure: Energy vs. K. 16-bit fixed point, N = 256.]
[Figure: Duration vs. K. 16-bit fixed point, N = 256.]
[Figure: Energy vs. K. 16-bit fixed point, N = 512.]
[Figure: Duration vs. K. 16-bit fixed point, N = 512.]
[Figure: Energy vs. K. 16-bit fixed point, N = 1024.]
[Figure: Duration vs. K. 16-bit fixed point, N = 1024.]
[Figure: Energy vs. K. 16-bit fixed point, N = 2048.]
[Figure: Duration vs. K. 16-bit fixed point, N = 2048.]
32 bit fixed point
[Figure: Energy vs. K. 32-bit fixed point, N = 256.]
[Figure: Duration vs. K. 32-bit fixed point, N = 256.]
[Figure: Energy vs. K. 32-bit fixed point, N = 512.]
[Figure: Duration vs. K. 32-bit fixed point, N = 512.]
[Figure: Energy vs. K. 32-bit fixed point, N = 1024.]
[Figure: Duration vs. K. 32-bit fixed point, N = 1024.]
[Figure: Energy vs. K. 32-bit fixed point, N = 2048.]
[Figure: Duration vs. K. 32-bit fixed point, N = 2048.]
FFT comparison
[Figure: FFT Arithmetic comparative analysis. Energy [mJ] vs. FFT size (256 to 2048) for 32-bit fixed point, 16-bit fixed point and SP floating point.]
[Figure: FFT Arithmetic comparative analysis. Duration [ms] vs. FFT size (256 to 2048) for 32-bit fixed point, 16-bit fixed point and SP floating point.]
The last two figures compare FFT performance across the considered data types. The pure FFT is measured, without computing any squared magnitude terms.
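A back-of-the-envelope operation count explains the shape of the FFT-versus-Goertzel comparisons. Assuming roughly one multiply-accumulate per sample per Goertzel bin and (N/2)·log2(N) butterflies for a radix-2 FFT (constants and memory traffic ignored, so this is a model rather than a prediction of the measurements), the crossover point beyond which the full FFT becomes cheaper can be estimated as:

```python
import math

def goertzel_cost(n, k):
    """Rough multiply-accumulate count: one MAC per sample, per bin."""
    return k * n

def fft_cost(n):
    """Rough radix-2 FFT cost: N/2 * log2(N) butterflies."""
    return (n // 2) * int(math.log2(n))

def crossover(n):
    """Smallest number of bins K at which the full FFT becomes cheaper."""
    return next(k for k in range(1, n) if goertzel_cost(n, k) > fft_cost(n))

for n in (256, 512, 1024, 2048):
    print(n, crossover(n))   # crossover grows only logarithmically with N
```

Under this model the crossover sits near log2(N)/2 bins; the measured crossover differs with implementation constants, but the key trend is that Goertzel only pays off for a handful of DFT terms even at large FFT sizes.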