
Page 1: Faculty of Informatics and Information Technologiesfogelton/projects/BP-2017-Sevcik/zaverecna_prace.pdf · For face and eye detection, Viola-Jones algorithm [Viola – Jones 2001]

Slovak University of Technology in Bratislava

Faculty of Informatics and Information Technologies

FIIT-5212-74260

Michal Ševčík

Eye Blink Detection Using Webcam

Bachelor Thesis

Degree Course: Informatics

Field of study: 9.2.1 Informatics

Place of development: Institute of Computer Engineering and Applied Informatics,

FIIT STU Bratislava

Supervisor: Ing. Andrej Fogelton

May 2017


Slovak University of Technology in Bratislava

Faculty of Informatics and Information Technologies

BACHELOR THESIS ASSIGNMENT

Student Name: Ševčík Michal

Degree Course: Informatics

Field of Study: Informatics

Bachelor Thesis: Eye Blink Detection Using Webcam

Assignment:

Dry eye syndrome is a common disease of computer users. Users tend to decrease their blink frequency while looking at a computer screen, due to which the tear film is insufficiently spread on the eye. Analyze available approaches to blink detection. Propose an algorithm that will detect the user's blinks, with the aim to analyze their frequency. Suggest modifications of existing approaches to increase their detection performance. Test the modifications and evaluate them on existing datasets.

The thesis has to contain:

Annotation in Slovak and English language

Analysis of the problem

Description of the solution

Evaluation

Technical documentation

Bibliography

Electronic medium with developed product with documentation

Place of development: Institute of Computer Engineering and Applied Informatics,

FIIT STU Bratislava

Supervisor: Ing. Andrej Fogelton

The deadline to submit for the winter semester: 13.12.2016

The deadline to submit for the summer semester: 9.5.2017


ANOTÁCIA

Slovenská technická univerzita v Bratislave

FAKULTA INFORMATIKY A INFORMAČNÝCH TECHNOLÓGIÍ

Študijný odbor: Informatika

Autor: Michal Ševčík

Bakalárska práca: Detekcia žmurkania webkamerou

Vedúci bakalárskej práce: Ing. Andrej Fogelton

máj 2017

Bakalárska práca opisuje tému detekcie žmurkania pomocou webovej kamery. V práci analyzujeme oblasť detekcie žmurkania a poskytujeme nenáročný algoritmus, ktorý vie pracovať v reálnom čase s použitím obyčajnej webovej kamery. Počas vývoja algoritmu upravujeme už existujúce algoritmy. Opisujeme, prečo je daný algoritmus potrebný a v ktorých oblastiach sa dajú takéto algoritmy využívať. Ďalej rozoberáme niekoľko existujúcich riešení problému, medzi nimi aj metódu využívajúcu analýzu pohybových vektorov, keďže v oblasti dosahuje najlepšie výsledky a pracuje v reálnom čase. Opísaná je tiež metóda váženého gradientu; kvôli jej dobrým výsledkom vyzerá sľubne, ale nepracuje v reálnom čase.

Zameriavame sa na metódu váženého gradientu. Urobili sme niekoľko zmien, aby sme zlepšili jej použiteľnosť v reálnom čase. Prvou zmenou je zmena počítanej vlastnosti z pôvodného váženého priemeru gradientov na priemernú y pozíciu gradientov. Následne sme vytvorili stavový automat pre detekciu žmurkania.

Predstavená metóda spĺňa požiadavku použitia v reálnom čase. Na datasete Basler5 dosahuje v priemere lepšie výsledky ako originálna metóda, nanešťastie znižuje výsledky originálnej metódy o 4 % na datasete ZJU.


ANNOTATION

Slovak University of Technology in Bratislava

FACULTY OF INFORMATICS AND INFORMATION TECHNOLOGIES

Degree Course: Informatics

Author: Michal Ševčík

Bachelor thesis: Eye Blink Detection Using Webcam

Supervisor: Ing. Andrej Fogelton

May 2017

The bachelor thesis describes the problem of blink detection using a webcam. We analyze existing blink detection algorithms and propose an algorithm capable of running in real time while using only a webcam. We adjust existing algorithms to improve their real-time performance. First, we describe the importance of such an algorithm and its possible applications. Several state-of-the-art blink detection methods are described, among them a blink detection method based on motion vector analysis, which achieves the best results among the state-of-the-art methods. The Weighted Gradient Descriptor method is explained in more detail because of its promising results, although it does not provide real-time performance.

We focus on the Weighted Gradient Descriptor method and make several changes to improve its real-time performance. The computed feature is changed from the weighted average of gradients to the average y position of gradients. Afterwards, we designed a state machine for blink detection.

The proposed method fulfills the real-time requirement. It achieves better average F1 score on the Basler5 dataset than the original method; unfortunately, it deteriorates the performance of the original method by 4% on the ZJU dataset.


Declaration of Honor

I hereby declare that I wrote this thesis independently under the professional supervision of Ing. Andrej Fogelton, using the cited bibliography.

May 2017, Bratislava        Signature


Acknowledgment

I want to thank my supervisor Ing. Andrej Fogelton for his professional guidance, valuable advice and patience while writing this thesis. I also want to thank my family and friends for their support. Big thanks goes to the faculty for the opportunity to study what I like and for all the valuable knowledge it gave me throughout the years of study, which made the writing of this thesis easier.


Contents

1 Introduction 1

1.1 Requirements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2

2 State of The Art 3

2.1 Weighted Gradient Descriptor . . . . . . . . . . . . . . . . . . . . . . . . 5

2.2 Eye blink detection based on motion vectors analysis . . . . . . . . . . . . 8

2.3 Local Binary Patterns . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10

2.3.1 Blink detection using Local Binary Patterns . . . . . . . . . . . . . 12

3 Adjusted Weighted Gradient Descriptor 15

3.1 Average Gradient Positions . . . . . . . . . . . . . . . . . . . . . . . . . . 15

3.2 Blink Detection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16

4 Evaluation 19

4.1 Annotations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19

4.2 ZJU . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20

4.3 Researcher’s Night . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20

4.4 Basler5 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20

4.5 Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20

5 Experiments 23

5.1 Experiment 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23

5.2 Experiment 2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24

5.3 Experiment 3 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25

5.4 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27


6 Conclusion 29

A Technical Documentation 33

B Plan Review 37

C Resumé 39

C.1 Úvod . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39

C.2 Aktuálny stav oblasti . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40

C.2.1 Deskriptor váženého gradientu . . . . . . . . . . . . . . . . . . . . 40

C.2.2 Detekcia žmurkania založená na analýze pohybových vektorov . . . 41

C.2.3 Lokálne Binárne Vzory . . . . . . . . . . . . . . . . . . . . . . . . 41

C.3 Upravený deskriptor váženého gradientu . . . . . . . . . . . . . . . . . . . 41

C.3.1 Priemerná pozícia gradientov . . . . . . . . . . . . . . . . . . . . 42

C.3.2 Detekcia žmurkania . . . . . . . . . . . . . . . . . . . . . . . . . 42

C.4 Evaluácia . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42

C.5 Záver . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42

D DVD Contents 43


Chapter 1

Introduction

Blinking is important for the health of the eyes. During a blink, the eyelids spread the tear film on the eye surface to serve as protection from dust and bacteria. Blink rate is defined as the number of blinks in a minute. The average blink rate at rest is 22 Blinks Per Minute (BPM) [Portello – Chu 2013]. While reading or using computers, it can decrease drastically to 7 BPM. There are more factors that affect blink rate:

• allergies,

• mood,

• drugs,

• pregnancy, and others.

According to Yolton et al. [Yolton et al. 1994], gender also affects blink rate. The blink rate of 59 males and 86 females was measured for 5 minutes; 44 of the females were on birth control pills, which also affected blink rate. The results show that men blinked on average 14.5 times per minute and women not taking birth control pills blinked on average 14.9 times per minute, which is lower compared to women on birth control pills, who averaged 19.6 blinks per minute.

A decreased blink rate is dangerous for our eyes. It can often cause an eye disease commonly known as Dry Eye Syndrome (DES) [Blehm et al. 2005]. DES decreases the quality and quantity of tears in our eyes. The eyes of a DES patient become dry faster compared to those of a healthy person. There can be many reasons behind DES, for example:

• prolonged computer usage,

• bad air conditioning,

• allergies,

• age,

• medication,

• contact lens use, and others.

A person with DES often suffers from a decreased blink rate; therefore, the tear film is not spread properly and the eye becomes dry and more vulnerable to dust and bacteria. DES is accompanied by many symptoms, mainly:

• eye irritation,


• eye redness,

• eye fatigue,

• blurry vision,

• sensitivity to light, and others.

To prevent DES, a person can use artificial tears to support eye moisture. People with DES can improve room conditions so that they do not sit where the air flow is drying their eyes. They should keep their blink rate high enough so that the tear film is spread properly on the eyes. Ignoring DES symptoms and not responding to them can lead to serious diseases such as cornea thinning, cornea perforation or loss of goblet cells [Javadi – Feizi 2011]. There are several ways to diagnose DES:

• Fluorescein Tear Break-Up Time (TBUT) test – describes the quality of the tear film in the eye. It measures the interval between a complete blink and the first appearance of a dry spot. If this interval is less than 3 seconds, the person is diagnosed with dry eye.

• Schirmer test – filter paper (Schirmer strips) is placed on the lower lid to measure the moisture of the eye over an interval of 5 minutes. The patient is allowed to blink normally. Afterwards the strip is analyzed, and if the tear soaking is less than 10 millimeters, dry eye is diagnosed.

There are several statistics we can observe from blinks. Blink rate is one of them, and it is important to keep it at 14 to 20 blinks per minute so that the tear film is spread properly on the eye.

Blink detection is used to monitor driver drowsiness. Dinges et al. [Dinges et al. 1998] proved that the PERcentage of eye CLOSure (PERCLOS) statistic is a valuable sign of drowsiness. While becoming drowsy, our blink rate increases and the blink duration increases accordingly; therefore, the more drowsy we are, the higher the PERCLOS is. It is used in combination with lip tracking to monitor the driver. If the driver is becoming drowsy, the system warns him to take a break [Lenskiy – Lee 2012]. Systems like this are used to prevent micro-sleep while driving.

Blink detection can also be used in algorithms that help disabled people interact with computers. Królak and Strumiłło [Królak – Strumiłło 2009] created a system that tracks the eyes to simulate mouse movement and uses long voluntary blinks as mouse clicks. The authors detect two types of blinks: short blinks (shorter than 200 ms), which are considered spontaneous, and long blinks (longer than 200 ms), which are used for interaction purposes. Their algorithm is used to control the mouse and emulate the keyboard, enter text into a text editor, or turn off the computer.

1.1 Requirements

We focus on a blink detection algorithm that will run in real time. To maximize the algorithm's usability, a blink has to be detected with a maximum acceptable delay of 0.5 second. A blink has to be distinguished from closed eyes, where a closed eye is a long voluntary eye closure. To make the algorithm widely usable, it has to run on most consumer-quality computers and notebooks; therefore, low CPU usage is required. As a sensor we will use a common webcam with a minimum resolution of 640 × 480 at 30 Frames Per Second (FPS).


Chapter 2

State of The Art

Most of the eye blink detection algorithms consist of these three steps:

• face detection,

• eye detection,

• eye blink detection.

For face and eye detection, the Viola-Jones algorithm [Viola – Jones 2001] is often used.

Motion vectors are used to describe the movement of pixels within the image, and in combination with other techniques they can be used to detect the state of the eye. Motion vectors are mostly computed by optical flow [Brox 2014]. For example, Fogelton and Benesova [Fogelton – Benesova 2016] propose an algorithm based on the standard deviation of motion vectors in the eye region. Motion vectors are calculated for each pixel in the detected eye region. Normalization with the intraocular distance takes place to achieve invariance to face region size. The average motion vector and its standard deviation are calculated. During head movements, motion vectors are similar in magnitude and orientation, whereas during a blink the magnitude of the motion vectors corresponding to the eyelid increases significantly. The average motion vector and standard deviation are fed to a state machine. Each eye has its own state machine. A blink is detected when one of the state machines goes into its blink detection state. The left and right blink are considered as one if their Intersection Over Union (IOU) is higher than a given threshold. The authors tested this method on various datasets. It achieves 100% precision and 98.08% recall on the ZJU dataset and 100% precision and 90.16% recall on the Talking Face dataset.

Color and intensity of pixels can also be used for blink detection. Dinh, Jovanov and Adhami [Dinh et al. 2012] use the Intensity Vertical Projection (IVP) to detect blinks. The IVP is the total intensity of the pixels in one row. There are two local minima in the IVP, because the eyebrow and iris areas should be darker than the skin area. The IVP and the Eye Opening are calculated for the eye region; the Eye Opening is the ratio of the iris area to the skin area (Figure 2.1). The eye is considered opened if the Eye Opening is higher than a threshold. This method was tested on the authors' own dataset with a detection rate of 94.8%.
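The IVP itself is just a per-row intensity sum. A small numpy sketch with a toy eye image (function and variable names here are illustrative, not from the paper):

```python
import numpy as np

def intensity_vertical_projection(eye_gray: np.ndarray) -> np.ndarray:
    """Total pixel intensity of each row of a grayscale eye image."""
    return eye_gray.astype(np.float64).sum(axis=1)

# Toy "eye" image: dark rows (eyebrow, iris) produce local minima in the IVP.
eye = np.full((5, 4), 200, dtype=np.uint8)
eye[1, :] = 20   # eyebrow row
eye[3, :] = 40   # iris row
ivp = intensity_vertical_projection(eye)
# rows 1 and 3 are the two local minima of the projection
```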

Figure 2.1: Example of the Intensity Vertical Projection computed from the detected eye area, with annotated points used to calculate the Eye Opening ratio [Dinh et al. 2012].

Eyelid State Detection (ESD) [Ayudhya – Srinark 2009] is a measurement used to classify the state of the eyelid. The bottom half of the detected eye image is thresholded: beginning with a threshold equal to 0, a median blur is applied to the image. If the resulting image contains at least one black pixel (a pixel with value 1 after image binarization and median blurring), the algorithm returns the threshold as the ESD value. Otherwise, the threshold is increased and the same sequence follows. The ESD calculation is shown in Algorithm 1. A graph of ESD values is created: high peaks in ESD signalize that the eye is closed and low peaks signalize that the eye is opened. Eventually, a finite state machine is used to detect blinks; it uses statistics calculated from the ESD graph to change the state of the machine. This method achieves a detection accuracy of 92.6% on the authors' own dataset, with 4 persons captured for 45 minutes with a built-in webcam at 320 × 240 resolution and a frame rate of 30 FPS.

Algorithm 1 Calculation of Eyelid State Detection [Ayudhya – Srinark 2009]
procedure ComputeESD(imageInput)        ▷ imageInput = bottom half of the detected eye
    threshold = 0
    threshold(imageInput)
    medianBlur(imageInput)
    while numOfBlackPixels(imageInput) == 0 do
        threshold++
        threshold(imageInput)
        medianBlur(imageInput)
    end while
    return threshold
end procedure
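Algorithm 1 can be sketched in Python. The exact thresholding and blur settings are not given here, so the binarization convention (dark pixels map to 1) and the 3 × 3 median filter below are our assumptions:

```python
import numpy as np

def median3x3(img: np.ndarray) -> np.ndarray:
    """3x3 median filter with edge replication (a stand-in for medianBlur)."""
    p = np.pad(img, 1, mode="edge")
    h, w = img.shape
    stack = [p[dy:dy + h, dx:dx + w] for dy in range(3) for dx in range(3)]
    return np.median(np.stack(stack), axis=0)

def compute_esd(eye_bottom_gray: np.ndarray) -> int:
    """Return the smallest threshold at which a dark pixel survives
    binarization plus median filtering, as in Algorithm 1."""
    threshold = 0
    while True:
        binary = (eye_bottom_gray <= threshold).astype(np.uint8)  # dark -> 1
        filtered = median3x3(binary)
        if np.count_nonzero(filtered == 1) > 0:
            return threshold
        threshold += 1

# A dark 3x3 "iris" patch on bright skin: the ESD equals the patch intensity.
eye_bottom = np.full((8, 8), 200, dtype=np.uint8)
eye_bottom[2:5, 2:5] = 10
esd = compute_esd(eye_bottom)
```

Because the median filter removes isolated dark pixels, only coherent dark regions (such as a visible iris) stop the loop, which is what makes the returned threshold informative about the eyelid state.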

Color and texture segmentation is used in another blink detection algorithm [Lenskiy – Lee 2012]. A histogram representation of skin color in the image is created. Face detection is done with the authors' own pyramid implementation to decrease computation time. It begins in the top layer of the pyramid, where the image size is decreased, and a binary mask of skin color regions is created. The resulting binary map is interpolated to the next layer; coordinates that correspond to zeros in the binary map are removed in the lower layer of the pyramid. This repeats until the bottom layer of the pyramid is reached. The face is taken as the region with the largest number of skin color pixels. Each image is processed to calculate Speeded-Up Robust Features (SURF) [Bay et al. 2006]. Features are grouped by location into six groups: eyebrows, opened eyes, closed eyes, nose, lips and the rest of the face. The probability of a SURF descriptor belonging to a given class is estimated; for each descriptor, a six-dimensional vector with the probabilities of the SURF feature belonging to each class is created. Six probability density functions over the image area are estimated, so every pixel can then be classified into one of the facial features. Pixels around the eyes correspond either to opened or closed eyes. The method shows an average detection rate of 96.3% on the authors' own dataset captured with a Charge-Coupled Device (CCD) camera (768 × 576) and a web camera (640 × 480). There are 15 videos, each around 400 frames long.

Radlak and Smolka [Radlak – Smolka 2013] proposed an algorithm that detects blinks using a weighted gradient descriptor. The algorithm calculates gradients in the detected eye image over time. A waveform is calculated from these weighted gradients for a given video and then analyzed and searched for blinks. There are several ways of calculating the waveform; the best approach achieves a detection rate of 98.83% on the ZJU Eyeblink Database, captured with a web camera at 30 FPS and 320 × 240 resolution.

Malik and Smolka proposed an algorithm based on Local Binary Patterns (LBP) [Malik – Smolka 2014]. To compute the LBP for one pixel, a 3 × 3 array of its neighbors is created with the current pixel in the middle. The neighbors are thresholded with the value of the current pixel: if the value of a neighbor is greater than the current pixel, it is set to 1, otherwise to 0. An 8-bit number is created from this array. This is done over the whole detected eye area to create an LBP histogram. A template LBP histogram is created for the opened eye. If the correlation score between the opened eye template and the current LBP histogram reaches a certain threshold, a blink is detected. The method was tested on the ZJU database with a detection rate of 99.2%.
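The basic 3 × 3 LBP code and its histogram can be sketched with numpy as follows. The neighbor ordering and the correlation-based matching score are our assumptions; the authors' details may differ:

```python
import numpy as np

def lbp_histogram(gray: np.ndarray) -> np.ndarray:
    """Normalized 256-bin histogram of basic 3x3 Local Binary Patterns.
    Border pixels are skipped for simplicity."""
    h, w = gray.shape
    codes = np.zeros((h - 2, w - 2), dtype=np.uint8)
    center = gray[1:-1, 1:-1]
    # 8 neighbours, clockwise from top-left; each contributes one bit.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    for bit, (dy, dx) in enumerate(offsets):
        neighbour = gray[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        codes |= (neighbour > center).astype(np.uint8) << bit
    hist = np.bincount(codes.ravel(), minlength=256).astype(np.float64)
    return hist / hist.sum()

def correlation(h1: np.ndarray, h2: np.ndarray) -> float:
    """Pearson correlation between two histograms (one possible score)."""
    return float(np.corrcoef(h1, h2)[0, 1])

# A uniform patch has no brighter neighbours, so every LBP code is 0.
hist = lbp_histogram(np.full((5, 5), 7, dtype=np.uint8))
```

In the described method, `hist` would be compared against an opened-eye template histogram; a score dropping below a threshold indicates the eye has closed.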

Facial landmarks are used for blink detection in the work of Soukupová and Čech [Soukupová – Cech 2016]. The algorithm uses Active Shape Models (ASM) to detect eyes. Six points are automatically detected in every frame: one point in each corner of the eye (p1, p4), two points on the upper eyelid (p2, p3) and two on the lower eyelid (p5, p6). The Eye Aspect Ratio (EAR) is calculated from these six points (Equation 2.1). The EAR is smaller while the eye is closed compared to an open eye. The EAR of both eyes is averaged to distinguish a blink from closed eyes. The algorithm uses a 13-dimensional feature that includes the EAR from ±6 frames. This feature is classified by a trained linear Support Vector Machine (SVM) called EAR SVM. This method was tested on the ZJU and Eyeblink8 datasets.

EAR = ( ||p2 − p6|| + ||p3 − p5|| ) / ( 2 ||p1 − p4|| )    (2.1)
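Equation 2.1 translates directly into code; a small sketch with illustrative landmark coordinates (the points below are made up for demonstration):

```python
import numpy as np

def eye_aspect_ratio(p1, p2, p3, p4, p5, p6) -> float:
    """EAR from the six eye landmarks of Equation 2.1 (points as (x, y))."""
    p1, p2, p3, p4, p5, p6 = map(np.asarray, (p1, p2, p3, p4, p5, p6))
    vertical = np.linalg.norm(p2 - p6) + np.linalg.norm(p3 - p5)
    horizontal = 2.0 * np.linalg.norm(p1 - p4)
    return float(vertical / horizontal)

# Open eye: tall contour -> larger EAR; closed eye: flat contour -> EAR near 0.
open_ear = eye_aspect_ratio((0, 0), (1, -2), (2, -2), (3, 0), (2, 2), (1, 2))
closed_ear = eye_aspect_ratio((0, 0), (1, 0), (2, 0), (3, 0), (2, 0), (1, 0))
```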

Some of the mentioned algorithms achieve detection rates above 95%. We focus on the algorithm proposed by Fogelton and Benesova [Fogelton – Benesova 2016]; together with the work of Soukupová and Čech [Soukupová – Cech 2016], it achieves the most interesting results and is considered the best in the area. We also focus on the algorithm proposed by Radlak and Smolka [Radlak – Smolka 2013]: since it calculates gradients over time, it is similar to motion vectors. The main problem with this algorithm is that it does not run in real time, which is our main requirement. Therefore we experiment with it.

2.1 Weighted Gradient Descriptor

Radlak and Smolka [Radlak – Smolka 2013] proposed an algorithm that detects blinks using a weighted gradient descriptor. Their algorithm was inspired by the work of Polikovsky [Polikovsky et al. 2009] and his 3D gradient descriptor, which can detect small movements in a video sequence. They simplified this approach and proposed a blink detection algorithm.

The first step of the algorithm is face and eye detection using the Viola-Jones algorithm [Viola – Jones 2001]. After eye localization, spatial and temporal derivatives are calculated from the pixels in the detected eye region. During their work the authors tested various methods to detect blinks, one of which uses only temporal derivatives. The spatial derivative Iy(x, y, t) and the temporal derivative It(x, y, t) are given in Equation 2.2.

Figure 2.2: Visual representation of the calculated weighted vectors. v(t)↑ and v(t)↓ are the weighted vectors for pixels whose temporal derivatives are greater than and less than zero, respectively. The difference between these two vectors creates d(t), which is used to calculate the waveform D(t) [Radlak – Smolka 2012].

Iy(x, y, t) = I(x, y + 1, t) − I(x, y − 1, t)
It(x, y, t) = I(x, y, t + 1) − I(x, y, t − 1)    (2.2)

The corresponding derivatives are combined into one spatio-temporal vector Iyt(x, y, t) = [Iy(x, y, t), It(x, y, t)], which is used to describe movement between frames over time. To create the weighted vectors, these spatio-temporal vectors are divided into two groups: one where the It component of the vector is greater than zero and one where It is less than zero. If only the It component of the vector is used, zero is replaced by the constants ε and −ε; the authors use ε = 0.2. Weighted vectors for both groups are calculated. To compute these vectors, the initial (Equation 2.3) and terminal (Equation 2.4) positions of the vectors are required, where vx0 and vy0 form the initial point of vector v(t)↑ and vx1 and vy1 form its terminal point. The parameters δx and δy are used to better visualize the vectors. The vector v(t)↓ is calculated analogously. Figure 2.2 shows an example of the calculated vectors.

vx0(t)↑ = ( Σ_{x,y ∈ I : It(x,y,t) > 0} |It(x, y, t)| · x ) / ( Σ_{x,y ∈ I : It(x,y,t) > 0} |It(x, y, t)| )

vy0(t)↑ = ( Σ_{x,y ∈ I : It(x,y,t) > 0} |It(x, y, t)| · y ) / ( Σ_{x,y ∈ I : It(x,y,t) > 0} |It(x, y, t)| )    (2.3)

vx1(t)↑ = ( Σ_{x,y ∈ I : It(x,y,t) > 0} |It(x, y, t)| · (x + δx · Iy(x, y, t)) ) / ( Σ_{x,y ∈ I : It(x,y,t) > 0} |It(x, y, t)| )

vy1(t)↑ = ( Σ_{x,y ∈ I : It(x,y,t) > 0} |It(x, y, t)| · (y − δy · It(x, y, t)) ) / ( Σ_{x,y ∈ I : It(x,y,t) > 0} |It(x, y, t)| )    (2.4)


Figure 2.3: Example of the D(t) waveform obtained from Equation 2.5, with annotations [Radlak – Smolka 2012].

The difference between v(t)↑ and v(t)↓ is used to create the waveform D(t) (Equation 2.5), which is used to detect blinks. An example of the D(t) waveform can be found in Figure 2.3. There are two more approaches to the D(t) calculation: Equation 2.6 uses vectors created from both spatial and temporal derivatives, and after removing the spatial derivatives from the vector computations, Equation 2.7 is used.

D(t) = d(t) · ( ||v(t)↑|| + ||v(t)↓|| )    (2.5)

D(t) = d(t) · ||v(t)↑|| · ||v(t)↓||    (2.6)

D(t) = d(t) · ( Σ_{x,y : It(x,y,t) > ε} |It(x, y, t)| ) · ( Σ_{x,y : It(x,y,t) < −ε} |It(x, y, t)| ),    (2.7)

where d(t) = vy0(t)↓ − vy0(t)↑.

Because of small head movements, there is noise in the created D(t) waveform. Head movements need to be distinguished from eye blinks, and therefore denoising of the waveform is required. Invariance to head movement and eye size is based on the detector being run on every frame.

The blink detection from the D(t) waveform works as follows [Radlak – Smolka 2013]:

• Calculate the maximum value Dmax and the minimum value Dmin in the waveform.

• Find a local maximum argument tmax for which D(tmax) > Dmax/n1, where n1 is a scaling parameter, and in the subsequent k frames try to find a local minimum argument tmin for which D(tmin) < Dmin/n2, where n2 is another scaling parameter and k determines the number of frames after which the local minimum should appear. If a new local maximum argument t′max is found before the local minimum, then tmax = t′max.

• If tmax and tmin are found, estimate a linear regression for the data x = tmax, tmax + 1, . . . , tmin and y = D(tmax), D(tmax + 1), . . . , D(tmin).

• If the slope of the regression line is smaller than λ, the blink is detected between the local maximum and minimum. The zero-crossing point is calculated as the mean of tmax and tmin.

Table 2.1: Results for the dataset obtained with a Basler camera. The three column groups correspond to the three D(t) variants. Blinks Total - the number of blinks in the video. DB - detected blinks. DR - detection rate. FP - number of false positives [Radlak – Smolka 2013].

              Blinks Total |  DB    DR      FP |  DB    DR      FP |  DB    DR      FP
  Person 1         32      |  31  96.88%   13  |  28  87.50%    6  |  28  87.50%    2
  Person 2         33      |  32  96.97%    6  |  32  96.97%    4  |  32  96.97%    3
  Person 3         99      |  88  88.89%   14  |  85  85.86%    7  |  87  87.88%    0
  Person 4         55      |  51  92.73%    7  |  47  85.45%    6  |  39  70.91%    5
  Person 5         81      |  80  98.77%    5  |  80  98.77%    8  |  77  95.06%    2
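The search over D(t) described above can be sketched as follows. The parameter values n1, n2, k and the slope limit are illustrative, not the authors' tuned values:

```python
import numpy as np

def detect_blinks(waveform, n1=3.0, n2=3.0, k=10, slope_limit=0.0):
    """Sketch of the D(t) search: pair a strong local maximum with a strong
    local minimum within k frames, then accept the pair as a blink if the
    regression slope over the segment is below slope_limit."""
    d = np.asarray(waveform, dtype=np.float64)
    d_max, d_min = d.max(), d.min()
    blinks = []
    t = 1
    while t < len(d) - 1:
        is_local_max = d[t] > d[t - 1] and d[t] > d[t + 1]
        if is_local_max and d[t] > d_max / n1:
            for j, value in enumerate(d[t + 1:t + 1 + k], start=t + 1):
                if value > d[t]:      # a newer, higher maximum appeared
                    break
                is_local_min = (0 < j < len(d) - 1
                                and d[j] < d[j - 1] and d[j] < d[j + 1])
                if is_local_min and value < d_min / n2:
                    x = np.arange(t, j + 1)
                    slope = np.polyfit(x, d[t:j + 1], 1)[0]
                    if slope < slope_limit:
                        blinks.append((t + j) // 2)  # zero-crossing estimate
                    t = j
                    break
        t += 1
    return blinks

# One synthetic blink: a sharp rise followed by a fall through zero.
wave = [0, 0, 1, 8, 3, -2, -8, -1, 0, 0]
blinks = detect_blinks(wave)
```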

The authors tested the method on a dataset captured by a 100 FPS Basler camera at 640 × 480 resolution. There are five persons in these videos; one of them is wearing glasses. The method was evaluated using the Detection Rate (Equation 2.8)

DR = (DB/AB) · 100, (2.8)

where DR is the detection rate, DB is the number of detected blinks and AB is the number of all blinks. False positives count how many times the algorithm reported a blink when one did not occur. Their approach had a 98.83% detection rate on the ZJU database with only one false positive. Results for the three D(t) variants on the Basler dataset are shown in Table 2.1.

2.2 Eye blink detection based on motion vectors analysis

Fogelton and Benesova proposed an algorithm based on motion vectors [Fogelton – Benesova 2016]. Their work builds upon a similar algorithm [Drutarovsky – Fogelton 2015].

The Viola-Jones algorithm [Viola – Jones 2001] is used to detect the face. Eye corners are localized with CLandmark [Uricár et al. 2015]. The eye location is defined as a circle around the detected eye corners. The Farnebäck algorithm [Farnebäck 2003] is used for motion vector estimation in the detected eye area. Motion vectors are normalized with the IntraOcular Distance (IOD) to keep invariance to eye area size. The IOD is calculated from the detected eye corners.

The average motion vector µ (Equation 2.9) and the standard deviation σ (Equation 2.10) are computed, where yi is the vertical component of a motion vector. Only the vertical component is used because the authors assume that the person in front of the camera does not rotate their head significantly. High peaks in σ can be used to detect blinks. Zero crossing in µ takes place to help distinguish blinks from head movements.

µ = ( Σi yi ) / n    (2.9)


Figure 2.4: State machine used for eye blink detection. Conditions are checked in the order of the blue numbers inside the states [Fogelton – Benesova 2016].

σ = sqrt( ( Σi (µ − yi)² ) / n )    (2.10)
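Equations 2.9 and 2.10 translate directly into code (note the standard deviation uses the plain 1/n normalization, matching Equation 2.10):

```python
import numpy as np

def motion_statistics(vertical_components):
    """Average vertical motion mu (Eq. 2.9) and standard deviation sigma
    (Eq. 2.10) over the n motion vectors of the eye region."""
    y = np.asarray(vertical_components, dtype=np.float64)
    mu = y.sum() / y.size
    sigma = np.sqrt(((mu - y) ** 2).sum() / y.size)
    return float(mu), float(sigma)

mu, sigma = motion_statistics([1.0, 3.0])  # mu = 2.0, sigma = 1.0
```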

A state machine is used to decide the state of the eye. The average motion vector µ, the standard deviation σ and the time difference between frames Δt are fed into the state machine as input. The state machine is visualized in Figure 2.4. It consists of four states:

• (0) the initial state,

• (1) the eyelid moves down state,

• (2) the eyelid moves up state,

• (3) the eye blink detected state.

Eyelid movement is defined as a sequence of σ values with ∑σ > 3T and the magnitude of µ higher than the threshold T, where T is the threshold for σ. The state machine also takes time into consideration: at every transition, a time variable is incremented by Δt. If the time variable exceeds a threshold Tt (500 ms), the state machine returns to the initial state. If eyelid movement down occurs in State 0, the machine changes its state to State 1. If eyelid movement up occurs in State 1, the machine changes its state to State 2. If the machine is in State 1 with insufficient movement down (µ < −T and σ > T) and movement up appears, it changes its state to State 0.

Each eye has its own state machine. An eye blink is detected when one of the machines reaches its eye blink detected state. Intersection Over Union (IOU), IOU = (A ∧ B)/(A ∨ B), is used to decide whether the blinks of the right and left eye are simultaneous. If IOU > 0.2, the left and right blinks are merged together; otherwise, the blinks are considered separate. An example of merged blinks can be seen in Figure 2.5.
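The merging rule can be sketched with hypothetical helpers; for example, the intervals (0, 6) and (4, 9) give IOU = 2/9 ≈ 0.22, matching the example in Figure 2.5, and are therefore merged:

```python
def interval_iou(a, b):
    """IOU = (A ∧ B)/(A ∨ B) for two frame intervals (start, end)."""
    inter = min(a[1], b[1]) - max(a[0], b[0])
    if inter <= 0:
        return 0.0
    union = max(a[1], b[1]) - min(a[0], b[0])
    return inter / union

def merge_blinks(left, right, threshold=0.2):
    """Merge left/right-eye blink intervals whose IOU exceeds the threshold.

    A merged blink is averaged from the two intervals (a sketch of the rule;
    helper names and the greedy matching order are our assumptions)."""
    merged, used = [], set()
    for a in left:
        match = None
        for j, b in enumerate(right):
            if j not in used and interval_iou(a, b) > threshold:
                match = j
                break
        if match is not None:
            b = right[match]
            used.add(match)
            merged.append(((a[0] + b[0]) // 2, (a[1] + b[1]) // 2))
        else:
            merged.append(a)
    merged.extend(b for j, b in enumerate(right) if j not in used)
    return sorted(merged)
```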

The authors present a new dataset called Researcher's Night. There are 107 different videos of people reading articles on a computer screen. They act naturally: moving their heads, talking to someone, etc. There are two variants of the dataset, Researcher's Night 15 and 30, one captured with a 15 FPS and the other with a 30 FPS camera, both at 640 × 480 resolution. There are 223 116 frames in total. These videos were manually annotated with


Figure 2.5: Example of merged blinks where IOU = 2/9 = 0.22 > 0.2. Result is average from leftand right eye blink [Fogelton – Benesova 2016].

Table 2.2: Results of the algorithm. BT – Blinks Total, DB – Detected Blinks, FN – False Negatives, FP – False Positives, TP – True Positives [Fogelton – Benesova 2016].

Dataset                Precision  Recall  BT    DB    FN   FP   TP
ZJU                    100%       98.08%  261   256   5    0    256
Talking face           100%       90.16%  61    55    6    0    55
Basler 5               95.58%     93.67%  300   294   19   13   281
Eyeblink 8             94.69%     91.91%  408   396   33   21   375
Researcher's night 15  86.91%     80.87%  706   657   135  86   571
Researcher's night 30  86.72%     80.57%  1143  1062  222  141  921

face and eye coordinates, so that the evaluation of the blink detection algorithm is independent of face and eye localization algorithms. The algorithm is also tested on the Talking Face dataset, which contains 5000 frames captured with a 25 FPS camera at 720 × 576 resolution. The other datasets used are: the ZJU dataset, the Eyeblink8 dataset (70 992 frames, 640 × 480 resolution) and the Basler5 dataset (100 FPS, 640 × 480, 58 884 frames).

The algorithm is evaluated with the following metrics: precision = TP / (TP + FP) and recall = TP / (TP + FN), where TP is the number of True Positives, FP the number of False Positives and FN the number of False Negatives. Table 2.2 shows the results on different datasets.

2.3 Local Binary Patterns

Ojala et al. propose a computationally simple, gray-scale and rotation invariant texture description method [Ojala et al. 2002]. The method is based on uniform Local Binary Patterns (LBP), which are considered fundamental properties of image texture.

The local texture T is first defined as a joint distribution of the gray values of the center pixel and its P neighbors:

T = t(gc, g0, ..., gP−1),  (2.11)

where gc is the gray value of the center pixel and gp (p = 0, 1, ..., P−1) are the gray values of the P surrounding pixels. The notation LBP_P,R denotes an LBP computed from P neighbor pixels sampled on a circle of radius R around the center. Figure 2.6 visualizes different P, R combinations. If a sampling point does not fall at the center of a pixel, its value is calculated with bilinear interpolation.

To make the LBP invariant to gray-scale values, the gray value of the center pixel is subtracted from all the neighbor pixels:

T = t(g0 − gc, g1 − gc, ..., gP−1 − gc)  (2.12)


Figure 2.6: Different combinations of P and R for LBP_P,R (panels: P = 4, R = 1; P = 8, R = 1; P = 12, R = 1.5). If a sampling point does not fall at the center of a pixel, its value is calculated with bilinear interpolation [Ojala et al. 2002].

Figure 2.7: Visualization of the uniform patterns, numbered 0–8. Black and white represent the 0 and 1 bits in the number representation, while the number in the center is the LBP^{riu2}_{P,R} value [Ojala et al. 2002].

The differences gp − gc are not affected by changes in luminance; therefore the descriptor is invariant to gray-scale shifts. The final binary number is created by considering only the signs of the differences, not their values:

T = t(s(g0 − gc), s(g1 − gc), ..., s(gP−1 − gc)),  (2.13)

where

s(x) = 1 if x ≥ 0, and s(x) = 0 if x < 0.  (2.14)

The final value of LBP_P,R is this P-bit binary number converted to decimal. It is important to index the neighbor pixels so that they form a circular chain.

The neighbor g0 is always to the right of the center pixel; therefore, when the image is rotated, the LBP of the same region gives different values. This does not apply to patterns containing only zeros or only ones, which always stay the same. To create a unique identifier for every LBP that is invariant to rotation, the authors defined the following equation:

LBP^{ri}_{P,R} = min{ ROR(LBP_P,R, i) | i = 0, 1, ..., P − 1 },  (2.15)

where ROR performs a circular right shift of the LBP number i times. In effect, the pattern is rotated until the maximal number of the most significant bits are 0.

LBP^{ri}_{P,R} describes a local image feature, for example an edge, and can therefore be used as a feature descriptor. LBP^{ri}_{8,1} can take up to 36 of these unique values. The first nine of them are called uniform and are visualized in Figure 2.7. Each of these uniform patterns describes a different spot in the image; as stated in the paper, pattern number 8, for example, detects dark spots and flat areas.


Because the individual 36 patterns differ greatly in how well they describe texture, the authors made another adjustment to improve the rotation invariance: they use only the uniform LBP, which are visualized in Figure 2.7. Since an LBP value is a local description of texture, a histogram of LBP values is created to obtain a description of an area. The uniform LBP are a special type of LBP with one property in common: all of them have at most two transitions between 0 and 1 in the circular pattern. The uniformity U is computed with the following formula:

U(LBP_P,R) = |s(gP−1 − gc) − s(g0 − gc)| + ∑_{p=1}^{P−1} |s(gp − gc) − s(gp−1 − gc)|  (2.16)

After U is computed, LBP^{riu2}_{P,R} is calculated as follows, where the superscript riu2 denotes that only rotation-invariant uniform LBP values are used:

LBP^{riu2}_{P,R} = ∑_{p=0}^{P−1} s(gp − gc)  if U(LBP_P,R) ≤ 2,  and P + 1 otherwise.  (2.17)

A histogram of these uniform LBP values is created, where each uniform value has its own bin and all non-uniform values share one special bin labeled P + 1.
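Equations 2.13–2.17 can be sketched directly in NumPy. The padding width, the sampling geometry (neighbor 0 to the right of the center, proceeding around the circle) and the helper names below are our assumptions:

```python
import numpy as np

def lbp_riu2(image, P=8, R=1.0):
    """Rotation-invariant uniform LBP codes (Eqs. 2.13-2.17, Ojala et al. 2002).

    Returns one code in {0, ..., P+1} per interior pixel."""
    img = np.asarray(image, dtype=np.float64)
    h, w = img.shape
    pad = int(np.ceil(R)) + 1          # keep all bilinear samples in bounds
    angles = 2.0 * np.pi * np.arange(P) / P
    codes = np.zeros((h - 2 * pad, w - 2 * pad), dtype=np.int32)
    for y in range(pad, h - pad):
        for x in range(pad, w - pad):
            gc = img[y, x]
            bits = []
            for a in angles:
                sy, sx = y + R * np.sin(a), x + R * np.cos(a)
                y0, x0 = int(np.floor(sy)), int(np.floor(sx))
                dy, dx = sy - y0, sx - x0
                # bilinear interpolation for off-grid sampling points
                gp = (img[y0, x0] * (1 - dy) * (1 - dx)
                      + img[y0, x0 + 1] * (1 - dy) * dx
                      + img[y0 + 1, x0] * dy * (1 - dx)
                      + img[y0 + 1, x0 + 1] * dy * dx)
                bits.append(1 if gp - gc >= 0 else 0)   # s() of Eq. 2.14
            # uniformity U (Eq. 2.16): circular count of 0/1 transitions
            U = sum(abs(bits[p] - bits[p - 1]) for p in range(P))
            codes[y - pad, x - pad] = sum(bits) if U <= 2 else P + 1  # Eq. 2.17
    return codes

def lbp_histogram(codes, P=8):
    # one bin per uniform code 0..P, plus bin P+1 for all non-uniform codes
    return np.bincount(codes.ravel(), minlength=P + 2)
```

On a constant region every difference is zero, so all bits are 1 and the code is P (a flat area); an isolated bright pixel yields code 0 at its location.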

2.3.1 Blink detection using Local Binary Patterns

Malik and Smolka use Local Binary Patterns to propose a blink detection method [Malik – Smolka 2014].

The method begins by calculating a template histogram of LBP values for an open eye. Afterwards, a histogram is computed for every frame and compared to the template; this technique is called template matching. If the eye is closed, the difference between the template and the current histogram is expected to be large enough to allow blink detection.

The Kullback-Leibler divergence is used to measure the distance between two histograms. This produces a waveform of differences between the histograms computed during the video and the template histogram, which is computed from the first 50 seconds of the video. The waveform is visualized in Figure 2.8.
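A straightforward realization of the histogram distance looks as follows (the epsilon smoothing used to avoid log(0) on empty bins is our assumption):

```python
import numpy as np

def kl_divergence(p, q, eps=1e-10):
    """Kullback-Leibler divergence D(P||Q) between two LBP histograms.

    Histograms are normalized to probability distributions; eps guards
    against log(0) and division by zero on empty bins."""
    p = np.asarray(p, dtype=np.float64)
    q = np.asarray(q, dtype=np.float64)
    p = p / p.sum()
    q = q / q.sum()
    return float(np.sum(p * np.log((p + eps) / (q + eps))))
```

The divergence is zero for identical histograms and grows as the current eye histogram drifts away from the open-eye template, which is exactly the peak the detector watches for.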

The signal is then smoothed with a Savitzky-Golay filter (SGF) [Krishnan – Seelamantula 2013] to remove unwanted noise from the waveform. The filter works as follows:

Cp = ∑_{i=−m}^{m} (Qi − yi)² = ∑_{i=−m}^{m} ( ∑_{k=0}^{p} ak i^k − yi )²,  (2.18)

where m is the half-width of the approximation interval, Qi is the polynomial approximation and yi is the noisy waveform. The smoothed waveform can be seen in Figure 2.9.
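A minimal version of the filter can be written directly from Equation 2.18 by solving the local least-squares fit at every sample (scipy.signal.savgol_filter computes the same result with precomputed convolution coefficients; the edge handling below is our simplification):

```python
import numpy as np

def savgol_smooth(y, m=5, p=3):
    """Minimal Savitzky-Golay smoothing built on Equation 2.18: at every
    sample, a degree-p polynomial is least-squares fitted over a window of
    half-width m and evaluated at the window center. Edge samples are left
    untouched in this sketch."""
    y = np.asarray(y, dtype=np.float64)
    out = y.copy()
    i = np.arange(-m, m + 1)
    for t in range(m, len(y) - m):
        coeffs = np.polyfit(i, y[t - m:t + m + 1], p)  # minimizes Cp of Eq. 2.18
        out[t] = np.polyval(coeffs, 0)                 # polynomial value at i = 0
    return out
```

A useful property for checking the implementation: any signal that is itself a polynomial of degree at most p passes through the filter unchanged.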

To detect blinks, the signal is scanned for high peaks. Peak detection is done with Grubbs' test [Grubbs 1969]. The method was tested on the Basler5 dataset with the following statistics: Detection Rate (DR) – ratio of detected blinks to all blinks, Detected Blinks (DB), and number of False Positives (FP). The results are shown in Table 2.3.
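The core of the peak test is the Grubbs statistic; a sketch is shown below. Comparing G against the tabulated critical value for the sample size and significance level (or computing it from the t-distribution) is omitted here:

```python
import numpy as np

def grubbs_statistic(x):
    """Grubbs' test statistic G = max|x_i - mean(x)| / s (sample std, ddof=1)
    [Grubbs 1969]. Returns G and the index of the candidate outlier; the
    sample is declared an outlier/peak when G exceeds the critical value."""
    x = np.asarray(x, dtype=np.float64)
    g = np.abs(x - x.mean()) / x.std(ddof=1)
    idx = int(np.argmax(g))
    return float(g[idx]), idx
```

For a nearly flat waveform with one high peak, the peak dominates both the deviation and the statistic, so it is flagged as the candidate blink.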


Figure 2.8: Waveform (value per frame) created from the calculated distance between the template histogram and the histograms computed during the video [Malik – Smolka 2014].

Figure 2.9: Waveform (value per frame) of distances between the template histogram and the histograms computed during the video, after smoothing [Malik – Smolka 2014].

Table 2.3: Results for the dataset obtained with the Basler camera. Blinks Total – number of blinks in the video, DB – detected blinks, DR – detection rate (ratio between detected and total blinks), FP – number of false positives [Malik – Smolka 2014].

Video     Blinks Total  DB  DR     FP
Person 1  34            34  100    5
Person 2  33            33  100    1
Person 3  99            94  94.95  3
Person 4  55            54  98.18  1
Person 5  81            81  100    1


Chapter 3

Adjusted Weighted Gradient Descriptor

Our method is based on the Weighted Gradient Descriptor (WGD) [Radlak – Smolka 2013]. The WGD method begins by calculating gradients over time (Equation 2.2). Gradients are split into two groups, positive and negative, using a threshold ε = 0.02. Afterwards, a weighted average in the vertical direction is calculated for both groups, and the distance d(t) = vy0(t)↓ − vy0(t)↑ between the two averages is computed. The final feature is the product of the averages and the computed d(t). The feature is calculated for all frames in the video, which gives a waveform that can be analyzed to detect blinks. The global maximum and minimum are searched for in the whole waveform. Afterwards, a local maximum and minimum are found that are higher/lower than a fraction of the previously found global extremes. When the local extremes are found, a regression slope is calculated between them; if the regression slope is higher than a defined threshold, a blink is detected. The problem with the WGD method is that if the global maximum or minimum comes from a non-blink movement, the peak might be too high and detection in the whole video fails. The requirement to find global extremes also prevents real-time use. To overcome these difficulties, we decided to process the average positions of the time gradients, and we changed the blink detection to use a state machine.

3.1 Average Gradient Positions

We compute gradients between two frames as in WGD, using Equation 2.2. The gradients are calculated from an eye region defined as a circle, computed from the eye center and half of the eye radius. The radius and corners are computed from the dataset annotations; we take this idea and the annotations from Fogelton and Benesova [Fogelton – Benesova 2016]. All tested datasets are annotated so that the detection algorithm is independent of face recognition algorithms. All computed gradients are taken into consideration and are split into two groups based on their sign. We split them by sign because we observed that splitting them by an ε threshold hinders real-time use, as the threshold had to be set differently for different videos. We visualize the two groups of gradients in Figure 3.1, where we can see that the formed groups swap their positions during a blink. While the eye is closing, the positive group of gradients is in the top part of the eye and the negative group in the bottom; during eye opening, the negative group is in the top part and the positive group in the bottom. This observed behavior is the blink pattern that we detect using a state machine.


Figure 3.1: Visual representation of the calculated gradients over time. The eye area is represented as a circle defined by the eye center and half of the eye radius. Red represents gradients with negative values, blue gradients with positive values. The top row shows their behavior during a blink, the bottom row during a non-blink. Gradients are not calculated where there is no change in intensity.

The average y-coordinate is calculated for both of these groups:

avg_y(t)↑ = (∑_{i=1}^{N} yi) / N,  (3.1)

where avg_y(t)↑ is the average y-coordinate of the positive gradient values, N is the number of positive gradient values and yi is the y-coordinate of one positive gradient. Both averages are normalized by the eye radius divided by two, mapping the values to the interval ⟨0, 1⟩ so that the feature is invariant to the eye size. The center between the two averages is computed, and the final value of each average is taken relative to this midpoint, so that the final range is (−1, 1) and avg_y(t)↑ and avg_y(t)↓ are symmetric. The waveform created from this feature can be seen in Figure 3.2; it is the input of our state machine. It follows the pattern mentioned before: the average positions swap during a blink, and the dimensionality of the feature is significantly decreased.
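The feature computation can be sketched as follows, under our reading of the text (the helper name, the handling of empty gradient groups and the exact normalization constant are assumptions):

```python
import numpy as np

def average_gradient_positions(prev_gray, cur_gray, cx, cy, radius):
    """Sketch of the Section 3.1 feature: temporal gradients inside the eye
    circle are split by sign, the average y-coordinate of each group is taken
    relative to their midpoint and normalized by the circle radius, giving two
    symmetric values in (-1, 1)."""
    diff = np.asarray(cur_gray, dtype=np.float64) - np.asarray(prev_gray, dtype=np.float64)
    ys, xs = np.mgrid[0:diff.shape[0], 0:diff.shape[1]]
    mask = (xs - cx) ** 2 + (ys - cy) ** 2 <= radius ** 2

    def mean_y(group):
        sel = group & mask
        return ys[sel].mean() if sel.any() else float(cy)

    avg_up = mean_y(diff > 0)    # positive temporal gradients
    avg_down = mean_y(diff < 0)  # negative temporal gradients
    mid = (avg_up + avg_down) / 2.0
    return (avg_up - mid) / radius, (avg_down - mid) / radius
```

When the top of the eye region brightens while the bottom darkens, the positive group sits above the negative one, so the first value is negative and the second positive; the swap of these signs over time is the blink pattern.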

3.2 Blink Detection

A state machine is designed to detect blinks using avg_y(t)↑ and avg_y(t)↓. There are two major peaks in the created waveform, as visualized in Figure 3.2: first a larger peak in avg_y(t)↑ and afterwards a slightly smaller peak in avg_y(t)↓. Both waveforms change their sign, which can be observed as a swap. The state machine first looks for a peak in avg_y(t)↑; this peak needs to be higher than a defined threshold T. If the peak is found, the state machine switches to the second state, where it waits for a peak in avg_y(t)↓ higher than T/2. Afterwards, the blink is detected. The maximum blink length is defined by a threshold Ttime = 0.33 seconds to overcome false positives caused by head movements or other non-blink peaks in the waveform. If Ttime is reached without detecting a blink, the state machine resets and detection starts over from frame ts + 1, where ts is the last frame at which the state machine switched to State 1. The state machine is visualized in Figure 3.3.
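The two-state machine just described can be sketched as follows. T = 0.4 is an illustrative value (the text only fixes Ttime = 0.33 s), and the function name is ours:

```python
def detect_blinks(avg_up, avg_down, fps, T=0.4, T_time=0.33):
    """Sketch of the two-state blink detector from Figure 3.3.

    avg_up / avg_down are the per-frame avg_y(t)↑ / avg_y(t)↓ values."""
    blinks = []
    state, t_start, t = 0, 0, 0
    while t < len(avg_up):
        if state == 0:
            if avg_up[t] > T:          # peak in avg_y(t)↑ opens a candidate blink
                state, t_start = 1, t
        else:
            if (t - t_start) / fps > T_time:
                # timeout: reset and rescan from the frame after t_start
                state, t = 0, t_start
            elif avg_down[t] > T / 2:  # smaller peak in avg_y(t)↓ confirms it
                blinks.append((t_start, t))
                state = 0
        t += 1
    return blinks
```

An up-peak followed within the time limit by a down-peak yields one detected interval; an up-peak with no confirmation times out and is discarded.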


Figure 3.2: Waveform of the calculated feature from average y-coordinates, where the red waveform is avg_y(t)↓ and the blue is avg_y(t)↑; blink intervals are marked along the axis. Axis t represents frames in the video.

Figure 3.3: State machine used to detect blinks, where T is the threshold, t is the current frame, avg(t)↑ is the average y-position of positive gradient values for frame t and avg(t)↓ is the average y-position of negative gradient values for frame t. For every frame there is also a check whether the time limit Ttime has been reached.

We detect blinks for each eye separately. To increase accuracy, blinks from both eyes are merged; the merging reduces duplicate blink detections. The interval of the same blink can start and end on different frames for the two eyes, and the intervals may overlap. To overcome these problems, we merge blinks using the Intersection over Union metric [Fogelton – Benesova 2016], IOU = (A ∧ B)/(A ∨ B), where A is the interval of the first blink and B is the interval of the second blink. The IOU metric decides whether the blinks from the left and right eye are the same blink or separate blinks: if IOU is greater than 0.2, the blinks are merged together; otherwise we consider them separate blinks.


Chapter 4

Evaluation

We use several datasets to evaluate our algorithm. All datasets are manually annotated to make the blink detection results independent of face and eye detection algorithms. We use the following datasets:

• ZJU,

• Researcher’s Night (RN) [Fogelton – Benesova 2016],

• Basler5, the dataset introduced with the WGD method [Radlak – Smolka 2013], whose name is taken from the work of Fogelton and Benesova [Fogelton – Benesova 2016].

We compare our best method with the state-of-the-art methods on these datasets.

4.1 Annotations

All datasets were annotated by annotators in the work of Fogelton and Benesova [Fogelton – Benesova 2016]. The annotations include the following information:

• Frame ID – id of the frame the annotation belongs to,

• blink ID – id of the blink, if there is a blink in the current frame,

• non frontal face – whether the subject is not looking into the camera while a blink occurs,

• left eye – whether the left eye is visible,

• right eye – whether the right eye is visible,

• eye fully closed – whether the eye is fully or only partially closed,

• eye not visible – whether the eye is not visible because of the recording conditions,

• face bounding box – x and y position, width and height of the face bounding box,

• left and right corner positions – eye corner positions for both the left and right eye.

These annotations are used during the feature computation to calculate the Region Of Interest (ROI) from which the feature is extracted.


4.2 ZJU

The ZJU dataset consists of 80 videos captured at 30 FPS with a resolution of 320 × 240. Twenty persons are captured with and without glasses, in frontal and upward view, which gives the total of 80 videos. There are 261 GT blinks in the annotation of Fogelton and Benesova [Fogelton – Benesova 2016]; in the work of Radlak and Smolka [Radlak – Smolka 2013] there are 258 GT blinks, because of a different definition of what counts as a blink. Some of these blinks are much longer than natural blinks, so during evaluation it is sometimes necessary to increase Ttime for the detection to work correctly. The dataset was captured in a laboratory environment; there is almost no head movement and people are quite still. There is a lot of noise in the dataset, caused by the low-resolution camera and occasionally bad light conditions.

4.3 Researcher’s Night

Researcher's Night is a dataset introduced in the work of Fogelton and Benesova [Fogelton – Benesova 2016]. It was captured at an event called Researcher's Night. There are 107 videos of people captured at 15 and 30 FPS, both at 640 × 480 resolution. There is often more than one person in a video, so the annotations are required, since we always want to focus only on the person that is in the video for the longest time. The dataset was recorded in a real-world environment with varying light conditions; people were asked to read something from the computer screen. A lot of head movement makes the dataset challenging for current methods, and the light conditions cause a lot of unwanted noise. Some blinks also take longer, since people knew they were being recorded for blinks. Both the 15 and 30 FPS subsets are split into train, validation and test sets. There is a total of 1849 GT blinks: the 15 FPS subset has 706 GT blinks and the 30 FPS subset has 1143 GT blinks.

4.4 Basler5

Radlak and Smolka use this dataset to test the Weighted Gradient Descriptor [Radlak – Smolka 2013]. It is a part of The Silesian Deception Database [Radlak 2015]; we use the same name for this dataset as Fogelton and Benesova. The dataset contains 5 videos of 5 different persons, taken with a high-quality Basler camera at 100 FPS and a resolution of 640 × 480. There is a total of 300 blinks; one person wears glasses. The dataset was recorded in a laboratory environment with a high-FPS camera and good light conditions, which makes it less noisy and overall favorable for current detection methods. There is some head movement in the videos, people also talk during the recordings, and the blinks are more natural.

4.5 Evaluation

The state machine was tested on various datasets. There are slight changes in Ttime between the Researcher's Night (Ttime = 0.33 seconds) and ZJU (Ttime = 0.51 seconds) datasets. This


Table 4.1: Results of our method. GT – ground truth, DB – detected blinks, FN – false negatives, FP – false positives, TP – true positives, F1 score.

Dataset                     Precision  Recall  GT    DB   FN   FP   TP   F1
ZJU                         95%        95%     261   259  14   14   245  95%
Researcher's night 15       80%        67%     706   575  227  116  459  73%
Researcher's night 30       67%        58%     1143  993  467  324  669  63%
Researcher's night 15 Test  71%        75%     270   269  62   78   191  73%
Researcher's night 30 Test  67%        54%     497   400  226  132  268  60%
Basler - Person 1           77%        72%     32    30   9    7    23   74%
Basler - Person 2           93%        85%     33    30   5    2    28   89%
Basler - Person 3           98%        87%     99    88   13   2    86   92%
Basler - Person 4           92%        89%     55    53   6    4    49   91%
Basler - Person 5           92%        94%     81    83   5    7    76   93%

Table 4.2: Comparison between our best method – [1] and state-of-the-art methods on the ZJU dataset. [2] – [Fogelton – Benesova 2016], [3] – [Radlak – Smolka 2013]. [3] has a different GT value because of a different definition of a blink.

Method - Dataset  Precision  Recall  GT   DB   FN  FP  TP   F1
[1] - ZJU         95%        95%     261  259  14  14  245  95%
[2] - ZJU         100%       98%     261  256  5   0   256  99%
[3] - ZJU         100%       98%     258  255  4   1   254  99%

is because blinks in ZJU are longer and sometimes unnatural. Table 4.1 shows the detection results on the various datasets. We use the following statistics to evaluate the method: Ground Truth (GT) – number of blinks in a dataset, Detected Blinks (DB) – number of blinks detected by the proposed method, False Negatives (FN) – number of GT blinks that were not detected, False Positives (FP) – number of detected blinks that do not correspond to a GT blink, True Positives (TP) – number of correctly detected GT blinks, and the F1 score:

F1 = 2 × (precision × recall) / (precision + recall),  precision = TP / (TP + FP),  recall = TP / (TP + FN).
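As a sanity check, the three measures can be computed directly from the counts; plugging in the ZJU row of Table 4.1 (TP = 245, FP = 14, FN = 14) reproduces the reported 95% values:

```python
def detection_metrics(tp, fp, fn):
    """Precision, recall and F1 score exactly as defined above."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```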

We compare the results of our method with the state-of-the-art methods.

A comparison between our best approach, the state-of-the-art method [Fogelton – Benesova 2016] and the original method [Radlak – Smolka 2013] on the ZJU dataset is shown in Table 4.2. Table 4.3 shows the comparison on the Basler5 dataset. Results on the RN dataset are compared in Table 4.4.


Table 4.3: Comparison between our best method – [1] and state-of-the-art methods on the Basler5 dataset. [2] – [Fogelton – Benesova 2016], [3] – D(t) [Radlak – Smolka 2013].

Method - Dataset  Precision  Recall  GT  DB  FN  FP  TP  F1
[1] - Basler5 - 1  77%   72%  32  30  9   7  23  74%
[2] - Basler5 - 1  84%   97%  32  37  1   6  31  90%
[3] - Basler5 - 1  93%   81%  32  28  6   2  26  87%
[1] - Basler5 - 2  93%   85%  33  30  5   2  28  89%
[2] - Basler5 - 2  97%   97%  33  33  1   1  32  97%
[3] - Basler5 - 2  90%   88%  33  32  4   3  29  85%
[1] - Basler5 - 3  98%   87%  99  88  13  2  86  92%
[2] - Basler5 - 3  100%  89%  99  88  11  0  88  94%
[3] - Basler5 - 3  100%  88%  99  87  12  0  87  94%
[1] - Basler5 - 4  92%   89%  55  53  6   4  49  91%
[2] - Basler5 - 4  93%   91%  55  54  5   4  50  92%
[3] - Basler5 - 4  87%   62%  55  39  21  5  34  72%
[1] - Basler5 - 5  92%   94%  81  83  5   7  76  93%
[2] - Basler5 - 5  99%   99%  81  82  1   2  80  99%
[3] - Basler5 - 5  99%   93%  81  77  6   2  75  95%

Table 4.4: Comparison between our best method – [1] and state-of-the-art methods on the RN dataset. [2] – [Fogelton – Benesova 2016].

Method - Dataset  Precision  Recall  GT    DB    FN   FP   TP   F1
[1] - RN 15       79%  66%  706   575   227  116  459  72%
[2] - RN 15       87%  81%  706   657   135  86   575  84%
[1] - RN 30       67%  59%  1143  993   467  324  669  63%
[2] - RN 30       87%  81%  1143  1062  222  141  921  84%
[1] - RN 15 Test  62%  68%  270   296   84   110  186  65%
[2] - RN 15 Test  92%  81%  270   238   50   18   220  86%
[1] - RN 30 Test  51%  67%  497   658   159  322  336  58%
[2] - RN 30 Test  82%  74%  497   453   126  82   371  78%


Chapter 5

Experiments

We performed several experiments while converging to our final method. In this chapter, we describe the experiments with their respective results and the reasoning behind our decisions.

5.1 Experiment 1

We began our experiments by implementing the algorithm described in the original paper [Radlak – Smolka 2013]. The idea was to evaluate the feature D(t) on more difficult datasets than ZJU and their own dataset, sometimes referred to as Basler5 [Fogelton – Benesova 2016]. We made a few changes while implementing the algorithm. Our method uses a circular area to compute the gradients, whereas the original method used a Gaussian kernel to normalize the created gradients; the results are similar, because the kernel gives more importance to the gradients in the center of the eye, while we use only the gradients in the center of the eye. The second change is that we normalize the feature with the IntraOcular Distance. We used the same blink detection as described in the original paper. It is based on calculating the global maximum and minimum over the whole video. Afterwards, the whole video is processed again, looking for a local maximum higher than half of the global maximum. A local minimum is then found, and a regression slope is calculated between the maximum and the minimum. Blink detection is based on thresholding: a blink is detected if the calculated regression slope is higher than the threshold λ = 0.2. An example of this waveform during a blink can be seen in Figure 5.1. The results of this base implementation can be seen in Table 5.1.
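The slope test at the heart of this detector can be sketched as follows (the helper name is ours; np.polyfit performs the least-squares line fit between the two extremes):

```python
import numpy as np

def regression_slope(waveform, i_max, i_min):
    """Slope of the least-squares line fitted to the D(t) waveform between a
    local maximum and the following local minimum; a blink is declared when
    the slope magnitude exceeds the threshold (0.2 in the original paper)."""
    lo, hi = min(i_max, i_min), max(i_max, i_min)
    t = np.arange(lo, hi + 1)
    slope, _intercept = np.polyfit(t, waveform[lo:hi + 1], 1)
    return slope
```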

Table 5.1: Results of our first experiment, using a method similar to the original paper [Radlak – Smolka 2013] with only a few adjustments. GT – ground truth, DB – detected blinks, FN – false negatives, FP – false positives, TP – true positives, F1 score.

Dataset                     Precision  Recall  GT    DB    FN   FP   TP   F1
ZJU                         99%        97%     261   254   7    2    252  98%
Researcher's night 15       62%        70%     706   772   201  286  486  66%
Researcher's night 30       57%        70%     1143  1397  339  599  798  62%
Researcher's night 15 Test  62%        68%     270   296   84   110  186  65%
Researcher's night 30 Test  51%        67%     497   658   159  322  336  58%


Figure 5.1: Waveform from our first experiment using D(t) and the thresholding method, with the regression slope calculated between the local maximum (localMax) and the local minimum (localMin). This waveform was later used in another experiment using a state machine to detect blinks.

Table 5.2: Results of our second experiment, using a state machine to detect blinks. GT – ground truth, DB – detected blinks, FN – false negatives, FP – false positives, TP – true positives, F1 score.

Dataset                     Precision  Recall  GT    DB    FN   FP    TP   F1 Score
ZJU                         93%        62%     261   177   97   12    165  75%
Researcher's night 15       52%        63%     706   864   255  412   452  57%
Researcher's night 30       32%        57%     1143  2011  489  1352  659  41%
Researcher's night 15 Test  49%        64%     270   355   96   180   175  55%
Researcher's night 30 Test  33%        51%     497   775   242  519   256  40%

5.2 Experiment 2

Since we want to achieve real-time use, finding the global maximum is not a viable option. Another problem with the previous experiment is that the blink detection has to be set up differently for different datasets. The results show that the method can work very well on easy datasets captured in a laboratory environment, but not on more difficult ones taken in a live environment. To solve these problems, we created a simple state machine to detect blinks in the computed feature D(t). The waveform is fed to the state machine. If a D(t) value above a threshold T (which needs to be set differently for different datasets) is found in the waveform, the state machine switches to State 1 and the D(t) value is added to a sum. While the state machine is in State 1, every value above T is added to the sum. If a value lower than −T appears and the sum is greater than the sum threshold Tsum, the state machine switches to State 2. While in State 2, the state machine checks for D(t) values below −T; each found value is added to the sum. If the sum in State 2 drops below −Tsum, a regression slope is calculated between the highest peaks in the positive and negative parts of the waveform. If this slope is higher than 0.2, a blink is detected. The state machine is visualized in Figure 5.2. Results are in Table 5.2.

It is often difficult to set up a state machine with so many parameters; different datasets need different thresholds. This again does not fulfill our real-time requirement. There is also a huge decrease in the detection results. The biggest difference is on the simplest dataset, ZJU,


Figure 5.2: State machine used to detect blinks using the old D(t) waveform. D(t) is the value for frame t, t is the current frame, T is the threshold for D(t) values and Tsum is the threshold for the sums.

where the results dropped by 23% in F1 Score.

We performed several further experiments (not presented here) with the settings of the state machine, but none of them achieved better results. We realized that the feature D(t) is not well suited for use in a state machine; therefore, we designed a new approach to the feature calculation. This way we converged to the final feature.

5.3 Experiment 3

After finishing the experiments on our best method, we made several experiments with Local Binary Patterns, based on the work of Malik and Smolka [Malik – Smolka 2014], whose method uses template matching to detect blinks in a video. Since we require real-time usage, we decided to experiment with this feature.

The LBP computation is visualized in Figure 5.3. We subtract the value of the center pixel from all of the neighbor pixels to increase invariance to gray-scale shifts. Afterwards, the differences are thresholded with a value γ = 4, which serves as a filter to decrease noise in the picture. After the neighbors are thresholded, an 8-bit binary number is created, starting to the right of the center pixel and continuing downwards in a circular pattern.

The LBP number is checked to determine whether it is a uniform pattern. A uniform pattern contains at most two transitions between 0 and 1. We check the uniform pattern with Equation 2.16, described in the paper by Ojala et al. [Ojala et al. 2002].

We then calculate a histogram of uniform patterns for the current and previous frames, where each uniform pattern has its own bin and all non-uniform patterns go into one special bin. We use the cosine distance (Equation 5.1) between the two histograms to calculate our feature:


Figure 5.3: LBP computation. First, the central pixel gc is subtracted from all neighbors to achieve gray-scale shift invariance. Afterwards a binary number is created following a circular pattern, where each neighbor is thresholded with γ to give either 0 or 1.

Figure 5.4: Waveform created from cosine distances between histograms in time; gray areas represent blinks. Comparison between two videos to visualize the lack of a repeating pattern.

dist = \frac{\sum_{i=0}^{n} A_i B_i}{\sqrt{\sum_{i=0}^{n} A_i^2}\,\sqrt{\sum_{i=0}^{n} B_i^2}} (5.1)

where A_i and B_i represent the bins of the current and previous histograms. This creates a waveform that we use for blink detection. An example of the waveform can be seen in Figure 5.4.

The main problem is that we have not found a repeating pattern in the waveform that could be used for blink detection. A peak is not always present during a blink. Moreover, peaks often occur during head movement or because of significant noise in the video. We defined a simple state machine for blink detection on our waveform, but after achieving only 54% F1 Score on the ZJU dataset we concluded that the feature is affected by far too many factors to give precise information about blinking.


Table 5.3: Comparison between experiments on the Researcher's Night Test dataset, which includes both the 30 and 15 FPS test subsets of the RN dataset.

Method         Precision  Recall  GT   DB    FN   FP   TP   F1
Experiment 1   56%        73%     767  972   197  421  551  64%
Experiment 2   35%        62%     767  1354  290  877  477  45%
Our method     65%        60%     767  688   302  235  453  63%

Table 5.4: Comparison between experiments for the ZJU dataset.

Method         Precision  Recall  GT   DB   FN  FP  TP   F1
Experiment 1   99%        97%     261  254  7   2   252  98%
Experiment 2   93%        62%     261  177  97  12  165  75%
Our method     95%        95%     261  259  14  14  245  95%

5.4 Discussion

We compare the results of our experiments with the final method. Table 5.3 shows a comparison of the results of our experiments on the Researcher's Night Test dataset, which includes both the 30 and 15 FPS subsets of the test dataset from Researcher's Night. Table 5.4 shows the results for the ZJU dataset.

We compare only the results of the first and second experiments. The third experiment achieved only 56% F1 Score on the ZJU dataset, which is 19% less than our worst experiment, therefore we do not include it in the comparison.

The first experiment achieves the best results on both the ZJU and RN datasets, because there are minimal changes to the original method; however, it still does not work in real-time. There is a visible drop in performance between the ZJU and RN datasets. This is because in the RN dataset there is considerable head movement, which creates high peaks in the waveform; the global extremes are too high and the method has difficulties detecting blinks.

Our second experiment works in real-time, but still uses the same waveform as the first experiment. Since the state machine is based on thresholds, it has difficulties detecting blinks with different peak heights. We normalized the waveform so that the blink peaks are similar; all blink peaks lie in a similar range, but the peaks caused by head movements are still larger.

Our third experiment also works in real-time. However, the results of this method are unsatisfying, because we have not found a repeating blink pattern in the waveform that would allow us to design a better state machine.


Chapter 6

Conclusion

This bachelor thesis focuses on blink detection using a web camera. Several use cases of a blink detection algorithm are described, including the prevention of dry eye syndrome. State-of-the-art methods in blink detection are described. We focus more on the Weighted Gradient Descriptor proposed by Radlak and Smolka [Radlak – Smolka 2013], the Motion Vector Analysis proposed by Fogelton and Benesova [Fogelton – Benesova 2016] and the Local Binary Patterns texture description method proposed by Ojala et al. [Ojala et al. 2002].

We made modifications to the Weighted Gradient Descriptor method with the aim of increasing its real-time usability. We did several experiments before converging to our final method. We changed the way the feature is calculated and created a state machine to detect blinks. Our feature uses the average y coordinate of the calculated gradients over time. Such a feature reduces the dimensionality of the original feature and is a better input for state machine detection, which makes it possible to detect blinks in real-time. We compare our results on three datasets: ZJU, Basler5 and Researcher's Night. Compared to the original method, we make the algorithm work in real-time; unfortunately the performance on ZJU decreases by 4% in F1 Score. On the Basler5 dataset we have results comparable to Radlak and Smolka, since our F1 Score decreases by 13% on the first video but increases by 19% on the fourth video; still, we have an 8% worse F1 Score than Fogelton and Benesova. The proposed method achieves a lower F1 Score on the RN dataset compared to Fogelton and Benesova, by 12% to 21%.


Bibliography

AYUDHYA, C. D. N. – SRINARK, T. A Method for Real-Time Eye Blink Detection and Its Application. 6th International Joint Conference on Computer Science and Software Engineering (JCSSE). 2009.

BAY, H. – TUYTELAARS, T. – VAN GOOL, L. SURF: Speeded Up Robust Features. In: Computer Vision – ECCV 2006: 9th European Conference on Computer Vision, Graz, Austria, May 7-13, 2006. Proceedings, Part I, pages 404–417, 2006. ISBN 978-3-540-33833-8.

BLEHM, C. et al. Computer Vision Syndrome: A Review. Survey of Ophthalmology. 2005, vol. 50, no. 3, pages 253–262. ISSN 0039-6257.

BROX, T. Optical Flow. In: Computer Vision: A Reference Guide, pages 565–569, 2014. ISBN 978-0-387-31439-6.

DINGES, D. F. et al. Evaluation of Techniques for Ocular Measurements as an Index of Fatigue and the Basis for Alertness Management. National Highway Traffic Safety Administration, 1998.

DINH, H. – JOVANOV, E. – ADHAMI, R. Eye blink detection using intensity vertical projection. In: Eye Blink Detection Using Intensity Vertical Projection, pages 40–45, 2012.

DRUTAROVSKY, T. – FOGELTON, A. Eye Blink Detection Using Variance of Motion Vectors. In: Computer Vision - ECCV 2014 Workshops: Zurich, Switzerland, September 6-7 and 12, 2014, Proceedings, Part III, pages 436–448, 2015. ISBN 978-3-319-16199-0.

FARNEBÄCK, G. Two-Frame Motion Estimation Based on Polynomial Expansion. In: Image Analysis: 13th Scandinavian Conference, SCIA 2003, Halmstad, Sweden, June 29 – July 2, 2003, Proceedings, pages 363–370, 2003. ISBN 978-3-540-45103-7.

FOGELTON, A. – BENESOVA, W. Eye blink detection based on motion vectors analysis. Computer Vision and Image Understanding. 2016, vol. 148, pages 23–33. ISSN 1077-3142.

GRUBBS, F. E. Procedures for detecting outlying observations in samples. Technometrics. 1969, vol. 11, no. 1, pages 1–21.

JAVADI, M.-A. – FEIZI, S. Dry eye syndrome. Journal of ophthalmic & vision research. 2011, vol. 6, no. 3, pages 192–198.

KRISHNAN, S. – SEELAMANTULA, C. On the selection of optimum Savitzky-Golay filters. Signal Processing, IEEE Transactions on. 2013, vol. 61, no. 2, pages 380–391.

KRÓLAK, A. – STRUMIŁŁO, P. Eye-Blink Controlled Human-Computer Interface for the Disabled. In: Human-Computer Systems Interaction: Backgrounds and Applications, pages 123–133, 2009. ISBN 978-3-642-03202-8.


LENSKIY, A. A. – LEE, J.-S. Driver's eye blinking detection using novel color and texture segmentation algorithms. International Journal of Control, Automation and Systems. 2012, vol. 10, no. 2, pages 317–327. ISSN 2005-4092.

MALIK, K. – SMOLKA, B. Eye blink detection using Local Binary Patterns. In: Multimedia Computing and Systems (ICMCS), 2014 International Conference on, pages 385–390, 2014.

OJALA, T. – PIETIKAINEN, M. – MAENPAA, T. Multiresolution gray-scale and rotation invariant texture classification with local binary patterns. IEEE Transactions on Pattern Analysis and Machine Intelligence. 2002, vol. 24, no. 7, pages 971–987. ISSN 0162-8828.

POLIKOVSKY, S. – KAMEDA, Y. – OHTA, Y. Facial micro-expressions recognition using high speed camera and 3D-gradient descriptor. In: 3rd International Conference on Crime Detection and Prevention (ICDP 2009), pages 1–6, 2009.

PORTELLO, M. J. K. R. – CHU, C. A. Blink Rate, Incomplete Blinks and Computer Vision Syndrome. Optometry & Vision Science. 2013, vol. 90, no. 5. ISSN 1040-5488.

RADLAK, B. S. M. B. Silesian deception database – presentation and analysis. In: Proceedings of the 2015 ACM on Workshop on Multimodal Deception Detection, pages 29–35, USA, 2015.

RADLAK, K. – SMOLKA, B. A novel approach to the eye movement analysis using a high speed camera. In: Advances in Computational Tools for Engineering Applications (ACTEA), 2012 2nd International Conference on, pages 145–150, 2012.

RADLAK, K. – SMOLKA, B. Blink Detection Based on the Weighted Gradient Descriptor. In: Proceedings of the 8th International Conference on Computer Recognition Systems CORES 2013, pages 691–700, 2013. ISBN 978-3-319-00969-8.

SOUKUPOVÁ, T. – CECH, J. Real-Time Eye Blink Detection using Facial Landmarks. 21st Computer Vision Winter Workshop. 2016.

URICÁR, M. et al. Real-time multi-view facial landmark detector learned by the structured output SVM. In: Automatic Face and Gesture Recognition (FG), 2015 11th IEEE International Conference and Workshops on, volume 02, pages 1–8, 2015.

VIOLA, P. – JONES, M. Rapid object detection using a boosted cascade of simple features. In: Computer Vision and Pattern Recognition, 2001. CVPR 2001. Proceedings of the 2001 IEEE Computer Society Conference on, volume 1, pages 511–518, 2001.

YOLTON, D. et al. The effects of gender and birth control pill use on spontaneous blink rates. Journal of the American Optometric Association. 1994, vol. 65, no. 11, pages 763–770. ISSN 0003-0244.


Appendix A

Technical Documentation

The following tools were used to program the algorithm:

• Qt Creator 4.0.2,

• OpenCV 2.4.6.

We highly recommend using the same tools we used for the algorithm to work correctly. To run the program, open NewComputePoinDiff for feature calculation and DebugEvalPointDiff for blink detection on a given feature. While evaluating the algorithm you can define either #define debug for waveform visualization or #define runTest for evaluation on the dataset set by the dataSetFolder variable. Version 2.4.6 of OpenCV is required for the algorithm to work correctly because of the ffmpeg codec used, so that frame numbers fit the annotation numbers.

The following section presents code from our implementation, including our best method and the experiment with Local Binary Patterns.

Listing A.1 shows gradient computation and division into two groups based on sign. Listing A.2 shows the feature computation from our best method for the right eye; the left eye is computed analogously. Listing A.3 shows the state machine used in our final method to detect blinks.


Listing A.1: Gradient computation and division into two groups based on sign.

for (int i = previousRightEyeCenter.x - previousRightEyeCenter.radius;
     i < previousRightEyeCenter.x + previousRightEyeCenter.radius; ++i){
    for (int j = previousRightEyeCenter.y - previousRightEyeCenter.radius;
         j < previousRightEyeCenter.y + previousRightEyeCenter.radius; ++j){
        // check if point is in circular area
        if (pow(i - previousRightEyeCenter.x, 2)
                + pow(j - previousRightEyeCenter.y, 2) > compareRadius)
            continue;
        // calculation of gradient
        g = (presentImage.at<uchar>(j,i) - previousImage.at<uchar>(j,i));
        // dividing gradients into two groups based on sign
        if (g > 0){
            Gradient gradientUp;
            gradientUp.n = g;
            gradientUp.x = i - (previousRightEyeCenter.x
                                - previousRightEyeCenter.radius);
            gradientUp.y = j - (previousRightEyeCenter.y
                                - previousRightEyeCenter.radius);
            gradientsUp.push_back(gradientUp);
        } else if (g < 0){
            Gradient gradientDown;
            gradientDown.n = g;
            gradientDown.x = i - (previousRightEyeCenter.x
                                  - previousRightEyeCenter.radius);
            gradientDown.y = j - (previousRightEyeCenter.y
                                  - previousRightEyeCenter.radius);
            gradientsDown.push_back(gradientDown);
        }
    }
}


Listing A.2: Average y position feature calculation.

std::sort(gradientsDown.begin(), gradientsDown.end());
std::sort(gradientsUp.begin(), gradientsUp.end());
// keepPercentage is set to keep n gradients (set to 100% in our final method)
std::vector<Gradient> finalGradientsDown(gradientsDown.begin(),
    gradientsDown.begin() + gradientsDown.size() * keepPercentage);
std::vector<Gradient> finalGradientsUp(gradientsUp.begin()
    + gradientsUp.size() * (1 - keepPercentage), gradientsUp.end());
avg = 0;
for (std::vector<Gradient>::iterator it = finalGradientsUp.begin();
     it != finalGradientsUp.end(); ++it){
    vectorUp.y0 += ((Gradient)*it).y;
}
vectorUp.y0 /= (double)finalGradientsUp.size();
for (std::vector<Gradient>::iterator it = finalGradientsDown.begin();
     it != finalGradientsDown.end(); ++it){
    vectorDown.y0 += ((Gradient)*it).y;
}
vectorDown.y0 /= (double)finalGradientsDown.size();

vectorUp.y0 /= sqrt(compareRadius);
vectorDown.y0 /= sqrt(compareRadius);

avg = (vectorUp.y0 + vectorDown.y0) / 2;

vectorUp.y0 -= avg;
vectorDown.y0 -= avg;

if (std::isnan(vectorUp.y0) || ratio < keepRatio){
    upVector.push_back(0);
} else {
    upVector.push_back(vectorUp.y0);
}
if (std::isnan(vectorDown.y0) || ratio < keepRatio){
    downVector.push_back(0);
} else {
    downVector.push_back(vectorDown.y0);
}


Listing A.3: State machine used to detect blinks in the avgy waveform.

// Note: cmp, thDiv and the Blink struct are defined elsewhere in the
// surrounding implementation.
std::vector<Blink> detectBlinksStateMachine(std::vector<double> upOrigin,
        std::vector<double> downOrigin, std::vector<float> deltaTime,
        double threshold, double thresholdStd, double frameLimit){
    std::vector<Blink> blinks;
    int state = 0;
    double numOfFrames = 0;
    int lastFrame = 0;
    int startFrame = 0;
    for (uint i = 0; i < upOrigin.size(); i++){
        if (state == 1 && numOfFrames >= frameLimit / 2.0){
            state = 0;
            lastFrame = i;
            i = startFrame;
            numOfFrames = 0;
        } else if (state != 0 && numOfFrames >= frameLimit){
            state = 0;
            cmp = 0;
            lastFrame = i;
            i = startFrame;
            numOfFrames = 0;
        } else if (state == 0 && numOfFrames >= frameLimit){
            numOfFrames = 0;
        } else if (state == 0 && downOrigin.at(i) > threshold){
            startFrame = i;
            state = 1;
            cmp += downOrigin.at(i);
            numOfFrames = deltaTime.at(i);
        } else if (state == 0){
            numOfFrames += deltaTime.at(i);
        } else if (state == 1 && upOrigin.at(i) > threshold * thDiv){
            state = 0;
            Blink temp;
            temp.start = startFrame;
            temp.end = i;
            numOfFrames = 0;
            blinks.push_back(temp);
        } else if (state == 1){
            numOfFrames += deltaTime.at(i);
        }
    }
    return blinks;
}


Appendix B

Plan Review

We carried out the analysis of state-of-the-art methods in the winter semester. At the end of the semester we made the following plan for the summer semester:

• Implement algorithm based on Weighted Gradient Descriptor method,

• adjust the algorithm so it can work in real time, remove the dependency on global maxima and minima of the computed waveform, and decrease the overall number of parameters,

• test the algorithm on more datasets,

• compare results with the state-of-the-art methods,

• focus on different feature based on color in the eye area,

• implement algorithm that will compute the new feature,

• adjust evaluating algorithm for the new feature,

• test algorithm on different datasets,

• fix possible errors and improve algorithm,

• test the algorithm again,

• compare results with the state-of-the-art methods,

• summarize results of our work and write conclusion.

We managed to fulfill this plan except for the new algorithm based on color. However, we did a lot more work on the adjusted weighted gradient method. We did an experiment with Local Binary Patterns, which was meant to be the new algorithm; unfortunately this experiment had poor results on the ZJU dataset, therefore we decided to discontinue work on it.


Appendix C

Resumé

C.1 Introduction

Blinking is important for keeping our eyes healthy. During a blink, a protective film is spread on the surface of the eyes, which serves as protection against dust and bacteria. Several factors influence the number of blinks per minute:

• allergies,

• mood,

• medication, and so on.

A decreased blink rate is dangerous and can cause serious health problems. One of them is dry eye syndrome, which can be caused by several things:

• long computer use,

• poor air conditioning in the room,

• allergies,

• age,

• contact lenses, and so on.

L’udia trpiaci syndrómom suchého oka trpia niekol’kými symptómami: podráždenie ocí, únava ocí,rozmazané videnie, a tak d’alej.

Several statistics can be collected from blinking. In the case of dry eye syndrome, the most important one is the number of blinks per minute. In a healthy person this number should be between 14 and 20 blinks per minute.

Blink detection can also be used in other areas. It is often used as a substitute form of human-computer interaction in the case of disability: a disabled person uses eye movement as a mouse pointer and a blink is taken as a click. Similarly, the statistic measuring how long the eye stays closed during a given time interval can be used to determine the fatigue of a driver behind the wheel. If this statistic is high, the driver can be warned to take a rest, which can prevent accidents caused by microsleep.


C.2 State of the Art

Most algorithms consist of three steps: face detection, eye detection and blink detection.

The Viola-Jones algorithm [Viola – Jones 2001] is most often used for face and eye detection.

Many methods exist for blink detection. The work of Dinh, Jovanov and Adhami [Dinh et al. 2012] uses intensity vertical projection, which is the sum of pixel intensities in one row. For detection, an eye openness metric is computed as the distance between two local minima in the vertical projection of intensities. This method achieves 94.8% accuracy on their own dataset.

The eyelid state is tracked in the work of Ayudhya and Srinark [Ayudhya – Srinark 2009]. The method uses an algorithm that computes a value called the eyelid state for a given image. A graph of these values is created, over which a state machine detects blinks. The method achieves 92.6% accuracy on a dataset containing 4 persons, captured by a camera recording 30 frames per second at a resolution of 320 x 240.

The work of Soukupová and Cech [Soukupová – Cech 2016] uses detectors that locate key landmarks on the face, from which an eye aspect ratio is computed. There are 6 points: two in the eye corners, two on the upper eyelid and two on the lower eyelid. The ratio of these points is smaller when the eye is closed. A support vector machine is trained on this behavior and used to detect blinks.

Some of the mentioned algorithms achieve detection accuracy above 95%. However, we want to focus more on the Weighted Gradient Descriptor method, which achieves good results but does not fulfill one of our main requirements, the possibility of real-time detection.

C.2.1 Weighted Gradient Descriptor

Radlak and Smolka [Radlak – Smolka 2013] created an algorithm that detects blinks using a weighted gradient. First, the face and eyes are detected using the Viola-Jones algorithm [Viola – Jones 2001].

In the eye region, temporal gradients are computed using the following equation:

I_t(x, y, t) = I(x, y, t + 1) - I(x, y, t - 1) (C.1)

These gradients are divided into two groups based on a constant epsilon: one group contains gradients greater than 0.02, the other contains gradients smaller than −0.02. The rest are considered noise and therefore are not taken into account. For both groups, a weighted average of the gradients is computed. The value d(t) represents the distance between these two averages. Finally, the feature D(t) is computed by multiplying d(t) by the weighted average of both groups.

The graph of this feature is used for blink detection. First, the global maximum and minimum of the graph are found. Then a local maximum greater than a fraction of the global maximum is searched for. If such a local maximum is found, a minimum is searched for similarly. When the minimum is found, the falling slope is computed. If the falling slope is smaller than 0.2, the segment is considered a blink.

This method achieves 98.83% accuracy on the ZJU dataset. The method is also tested on their own dataset containing 5 persons, one of them wearing glasses.


C.2.2 Blink Detection Based on Motion Vector Analysis

Fogelton and Benesova created a blink detection method based on motion vectors [Fogelton – Benesova 2016].

The Viola-Jones algorithm [Viola – Jones 2001] is used for face detection. The eyes are detected using Clandmark [Uricár et al. 2015]. Motion vectors in the eye region are computed using Farnebäck's algorithm [Farnebäck 2003].

The average direction of movement in the eye region is computed, as well as its standard deviation. These two metrics are used in a state machine for blink detection. The state machine has 4 states. State zero is the initial state. The first state represents the eyelid moving down, the second state the eyelid moving up. The third and last state represents the state in which a blink is detected. During the whole detection it is also checked whether the blink is not taking too long.

The motion vector computation and the detection run separately for the left and right eye, and the detections are later merged using a metric called intersection over union. This metric returns a number that determines whether the blinks from the right and left eye are the same blink or two separate blinks.

This method achieves 99% accuracy on the ZJU dataset. The authors also introduced a new dataset, which they named Researcher's Night after the eponymous event. On this dataset the method achieves 83% accuracy.

C.2.3 Local Binary Patterns

Ojala et al. created the method of uniform local binary patterns [Ojala et al. 2002].

The method is based on creating a local pattern for P neighbors within a radius R. For example, if P = 8 and R = 1, an 8-bit number is computed as follows:

• the value of the center pixel is subtracted from the surrounding pixels,

• based on the sign, each neighbor becomes either 1 or 0,

• the resulting binary number is converted to decimal.

For a pattern to be uniform, it must contain at most two transitions between 0 and 1 or 1 and 0. A histogram is created from the resulting uniform patterns, where every uniform pattern has its own bin and all non-uniform patterns go into one extra bin.

Malik and Smolka [Malik – Smolka 2014] use this method to detect blinks. They create a template histogram of an open eye, which they then compare with the histograms computed over the video. This yields a graph of distances between the histograms, in which they search for blinks as large changes.

C.3 Adjusted Weighted Gradient Descriptor

Our method is based on the Weighted Gradient Descriptor. We made several adjustments, both to the computation of the feature on which we detect blinks and to the way we detect blinks. These adjustments were made to fulfill our requirement of real-time computation.


C.3.1 Average Gradient Position

The gradients are divided into groups based on sign. For both groups the average y position is then computed. This position is normalized by the eye diameter to achieve the domain <0, 1>. Afterwards the midpoint between the two groups is computed. The final positions are determined relative to this midpoint so that the final domain is (−1, 1). These points form our observed feature for every frame of the video.

C.3.2 Blink Detection

Blink detection is performed by a state machine consisting of three states. In state zero, the automaton waits until the point of positive gradients rises above a threshold value. In the first state it waits until this point falls below the negative threshold value. If both of these states pass correctly, the automaton moves to state two, where a blink is detected.

C.4 Evaluation

The method was tested on annotated datasets so that we could focus on evaluating the blink detection algorithm without being influenced by the face and eye detection algorithms.

C.5 Conclusion

The thesis deals with blink detection using a web camera. Several use cases of such an algorithm are described at the beginning of the thesis. We focused mainly on the Weighted Gradient Descriptor method, which we tried to adjust so that it fulfills our requirement of real-time computation.

In comparison with other methods in this field we did not achieve the best results: compared to the original method of Radlak and Smolka, after averaging the results we obtained results better by only 1.2%. The method, however, works in real time. We fall behind Fogelton and Benesova by 12 to 21% on various datasets.


Appendix D

DVD Contents

/datasets/ – datasets which were used to test the algorithm
/install/ – OpenCV 2.4.6 library
/sources/ – C++ sources of the algorithm
/thesis/ – PDF version of the thesis