The International Arab Conference on Information Technology (ACIT’2013)

Real Time Finger Binary Recognition Using Relative Finger-Tip Position

ASADUZ ZAMAN #1, MD ABDUL AWAL ǂ2, CHIL WOO LEE §3 and MD ZAHIDUL ISLAM #4

#Dept. of Information & Communication Engineering, Islamic University, Kushtia, Bangladesh [email protected], [email protected]

ǂDept. of Computer Science, College of Engineering & Technology, IUBAT, Dhaka, Bangladesh 2 [email protected]

§School of Computer Engineering, Chonnam National University, Gwangju, South Korea [email protected]

Abstract: In this paper, we propose a method to recognize, in real time, finger binary numbers shown by hand gesture using relative finger-tip positions in a procedural and logical way. Using knowledge of relative finger-tip positions, the process can identify the binary sequence of finger-tip states, opening a way to recognize hundreds of finger gestures using only two hands. The proposed method uses color based segmentation to identify skin areas in image frames and connected component detection to detect the hand region. Experimental results show that the process recognizes finger binary with satisfactory accuracy.

Keywords: Finger binary, relative finger-tip position, color based segmentation, connected component.

1. Introduction

The present world of technology is seeking another interface for Human Computer Interaction (HCI), one that provides an environment where a user can interact with a computer, or more generally a machine, in a more user friendly way. Several systems have been developed to solve this problem, but most fail to meet the ultimate demands of users: wearing special gloves is inconvenient, and speech recognition suffers greatly from background noise. Gesture recognition, in contrast, is a more convenient user interface because image and video capturing devices are available at very low cost. And for providing numerical input, finger binary recognition is a particularly efficient method.

In a Finger Binary [1] recognition system, a computer or machine responds to the binary number represented by the hand. The binary string is simply a combination of 1s and 0s over the finger tips: 1 for a shown finger tip and 0 for a hidden one. Considering one hand, we can provide 0 to 31 (2^5 − 1); considering two hands at a time, we can provide 0 to 1023 (2^10 − 1). We can also represent negative numbers by dedicating one finger as a sign flag.

2. Related Work

Several methods exist to recognize hand gestures. They can be divided into two major categories: data-glove based and vision based. Pragati Garg et al. provide a survey of these methods [2]. Data-glove based methods use sensor devices to digitize hand and finger motions into multi-parametric data. The extra sensors make it easy to capture hand configuration and movement. However,

the devices are quite expensive and cumbersome for users [3]. Vision based systems, in contrast, provide a more natural and user-friendly way. Rehg and Kanade use a 3D hand model based approach [4]. Freeman and Roth propose a pattern recognition technique using orientation histograms [5]. Feng-Sheng Chen et al. propose a real time tracking method using HMMs for gesture recognition [6]. Numerous other works on hand gesture recognition exist: some emphasize template based approaches [7], some use Haar-like features [8] and some use SIFT with AdaBoost [9]. Most of these systems, intentionally or not, simply count the fingers in a hand pose.

The main focus of the proposed method is to increase the number of gestures that can be represented and recognized. A fully capable version of this method can recognize 32 numeric counting gestures from one hand, where the traditional approach is limited to 5 to 10. Though finding the relative positions of the fingers is computationally more expensive than counting fingers, it opens the door to recognizing more than a thousand hand gestures using just two hands and an imaging device.
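To make the encoding concrete, the minimal sketch below (a hypothetical helper, not code from the paper) maps five finger states to a decimal value; treating the thumb as the least significant bit is an assumption, since the paper does not fix the bit order.

```cpp
#include <array>

// Hypothetical helper: fingers[0] = thumb ... fingers[4] = pinky,
// true when that finger-tip is shown. Bit order (thumb = LSB) is an
// assumption; the paper does not specify it.
int fingerBinaryValue(const std::array<bool, 5>& fingers) {
    int value = 0;
    for (int i = 0; i < 5; ++i)
        if (fingers[i]) value |= (1 << i);  // weight 2^i per finger
    return value;                           // 0..31 for one hand
}
```

With two such hands, combining the values as high and low 5-bit halves yields the paper's full 0 to 1023 range.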

3. Proposed Framework

The proposed method uses a color based segmentation approach to detect skin areas and connected component detection to detect the hand region. Relative finger-tip positions are acquired in the features extraction stage. A block diagram of the finger binary recognition process is shown in Figure 1.

A. Image Acquisition

Each frame of the input video stream, either real time or locally stored video, is processed for finger binary recognition. Any frame larger than 320x240 is resized to 320x240.

B. Segmentation

The input frame is segmented for further processing in this stage. Segmentation is crucial because it isolates the task-relevant data from the image background before passing it to subsequent recognition stages. Several features are used for image segmentation, such as skin color, shape, motion and anatomical hand models. We use a skin color based segmentation approach. For skin color segmentation, illumination invariant color spaces are preferred; normalized RGB, HSV, YCrCb and YUV are examples. Terrillon et al. [10] review different skin chromaticity models and evaluate their performance. We use the YCrCb color space for segmentation. After skin-color segmentation, the frame is passed through Gaussian blur, erosion and dilation to remove noise. Figure 2 shows the result of our skin color segmentation process.
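A minimal OpenCV sketch of this stage might look as follows. The Cr/Cb bounds are commonly used skin-color values, not thresholds taken from the paper, and the kernel sizes are illustrative; treat all of them as assumptions.

```cpp
#include <opencv2/opencv.hpp>

// Resize, convert to YCrCb, threshold skin color, then denoise with
// Gaussian blur, erosion and dilation. Cr/Cb bounds are assumptions.
cv::Mat skinMask(const cv::Mat& bgr) {
    cv::Mat frame = bgr, ycrcb, mask;
    if (bgr.cols > 320 || bgr.rows > 240)
        cv::resize(bgr, frame, cv::Size(320, 240));
    cv::cvtColor(frame, ycrcb, cv::COLOR_BGR2YCrCb);
    cv::inRange(ycrcb, cv::Scalar(0, 133, 77),
                cv::Scalar(255, 173, 127), mask);
    cv::GaussianBlur(mask, mask, cv::Size(5, 5), 0);
    cv::erode(mask, mask, cv::Mat(), cv::Point(-1, -1), 2);
    cv::dilate(mask, mask, cv::Mat(), cv::Point(-1, -1), 2);
    return mask;  // binary skin mask for the next stage
}
```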

C. Hand Region Detection

The binary image frame produced by the segmentation stage is the input of this stage, which tries to detect the hand region in the input image. As stated in our environment setting, we assume that the biggest skin region of the input frame is the hand. Finding the biggest skin region is done by finding the contours of the binary skin image and taking the biggest contour as the hand region. To be on the safe side, the input color image is also passed through a Haar-classifier based face detector with a scale factor of 1.5 and a minimum size of one fifth of the original image, for faster processing. If any face is found, hand region detection proceeds excluding the face area. Figure 3 shows the result of this process with the hand region and face area marked with rectangles.
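The sketch below shows one way to implement this stage with OpenCV; the rectangle-overlap test used to exclude faces is our assumption, and the cascade classifier is assumed to be loaded by the caller.

```cpp
#include <opencv2/opencv.hpp>
#include <vector>

// Largest skin contour taken as the hand, skipping contours whose
// bounding box overlaps a detected face. Overlap test is an assumption.
std::vector<cv::Point> findHandContour(const cv::Mat& mask,
                                       const cv::Mat& gray,
                                       cv::CascadeClassifier& faceCascade) {
    std::vector<cv::Rect> faces;
    // Scale factor 1.5 and a minimum size of one fifth of the frame,
    // as described in the text.
    faceCascade.detectMultiScale(gray, faces, 1.5, 3, 0,
                                 cv::Size(gray.cols / 5, gray.rows / 5));

    std::vector<std::vector<cv::Point>> contours;
    cv::findContours(mask.clone(), contours, cv::RETR_EXTERNAL,
                     cv::CHAIN_APPROX_SIMPLE);

    std::vector<cv::Point> best;
    double bestArea = 0;
    for (const auto& c : contours) {
        cv::Rect box = cv::boundingRect(c);
        bool onFace = false;
        for (const auto& f : faces)
            if ((box & f).area() > 0) { onFace = true; break; }
        double area = cv::contourArea(c);
        if (!onFace && area > bestArea) { bestArea = area; best = c; }
    }
    return best;  // empty if no skin contour survives
}
```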

D. Features Extraction

Features extraction is the most important stage for finger binary recognition. First we find the contour of the hand region found in the hand region detection stage; then we find the convex hull of that contour. The convex hull provides a set of convexity defects, each containing four pieces of information: (a) start point, (b) end point, (c) depth point and (d) depth. Convexity defects of a hand figure are shown in Figure 4.
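In OpenCV this step reduces to two calls, sketched below. Each cv::Vec4i defect stores indices of the start point, end point and depth (farthest) point, plus the depth in fixed-point units (divide by 256 for pixels).

```cpp
#include <opencv2/opencv.hpp>
#include <vector>

// Convex hull (as indices) and convexity defects of the hand contour.
std::vector<cv::Vec4i> handDefects(const std::vector<cv::Point>& contour) {
    std::vector<int> hull;
    cv::convexHull(contour, hull, false, false);  // indices, not points
    std::vector<cv::Vec4i> defects;
    if (hull.size() > 3)  // convexityDefects needs more than 3 hull points
        cv::convexityDefects(contour, hull, defects);
    return defects;  // each: [start idx, end idx, depth idx, depth*256]
}
```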

Each start point and end point is a possible finger-tip position. To prevent the system from detecting false finger-tips, it uses equation (1).

Figure 4. Convexity defects: the dark contour line is a convex hull around the hand; the gridded regions (A–H) are convexity defects in the hand contour relative to the convex hull. Image courtesy: Learning OpenCV by Gary Bradski and Adrian Kaehler.

Figure 3: (a) input skin color segmented image, (b) contour detection with hand region and face area marked. Notice that when face areas are excluded, the biggest contour is the hand region.

Figure 1. Block diagram of the proposed framework.

$$f(x) = \begin{cases} 1, & x.\mathit{depth} > \mathit{max} \times \alpha \\ 0, & \text{otherwise} \end{cases} \qquad (1)$$

where x is a convexity defect, max is the maximum depth over all convexity defects, and α is a threshold value. An output of 1 or 0 indicates whether or not the convexity defect is a potential finger-tip bearer. The start and end points of all potential convexity defects are then taken as possible finger-tip positions. Actual finger-tip positions are detected using equation (2).

$$f(x) = \begin{cases} 1, & FT = \mathit{NULL} \\ 0, & \exists\, y \in FT : ED(x, y) < \alpha \\ 1, & \text{otherwise} \end{cases} \qquad (2)$$

where x is a potential finger-tip position, FT is the finger-tip array, initially set to NULL, ED(x, y) is the Euclidean distance between x and y, and α is a threshold value giving the minimum distance between two distinct finger-tips. The function returns 1 if FT is NULL, and x is stored in FT. If ED(x, y) is less than α for any member y of FT, the point is discarded; otherwise x is stored in FT. This stage extracts all finger-tip positions along with a center point of the hand region, taken as the center of the bounding rectangle of the detected hand region. Finally, this stage stores some information for later use. Initially the system asks the user to show binary number 11111, i.e. the all-finger-tips-shown pose. When the user shows 11111, the system learns features that make further communication smooth. In this stage the system stores a structure with the information below (a code sketch of the finger-tip filtering follows this list):

1. Angular relations from each finger to all other fingers, e.g. t2ir, t2mr, i2mr, meaning thumb to index, thumb to middle and index to middle angular relations.

2. c2tA [Palm center to thumb angle with reference to x-axis]

3. c2tD [palm center to thumb distance]

4. hand [bounding rectangle of hand region]
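The sketch below illustrates the finger-tip filtering of equations (1) and (2); the defect layout follows OpenCV's cv::Vec4i convention, and both default threshold values are illustrative assumptions rather than the paper's settings.

```cpp
#include <algorithm>
#include <cmath>
#include <opencv2/opencv.hpp>
#include <vector>

// Equation (1): keep defects deeper than alpha1 times the maximum depth.
// Equation (2): accept a candidate start/end point only if it is at
// least alpha2 pixels away from every finger-tip accepted so far.
// Both default thresholds are illustrative assumptions.
std::vector<cv::Point> fingerTips(const std::vector<cv::Point>& contour,
                                  const std::vector<cv::Vec4i>& defects,
                                  double alpha1 = 0.3, double alpha2 = 20.0) {
    float maxDepth = 0.0f;
    for (const auto& d : defects)
        maxDepth = std::max(maxDepth, d[3] / 256.0f);

    std::vector<cv::Point> FT;  // the finger-tip array of equation (2)
    auto tryAdd = [&](const cv::Point& p) {
        for (const auto& t : FT)
            if (std::hypot(p.x - t.x, p.y - t.y) < alpha2)
                return;  // too close to an accepted tip: discard
        FT.push_back(p);
    };
    for (const auto& d : defects) {
        if (d[3] / 256.0f <= maxDepth * alpha1) continue;  // equation (1)
        tryAdd(contour[d[0]]);  // start point
        tryAdd(contour[d[1]]);  // end point
    }
    return FT;
}
```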

E. Decision Making

The last stage of the system is the decision making stage, which provides the recognized finger binary number as the system output. Recognition of binary 00000 and binary 00001 is handled separately as they provide quite distinguishable features. All other recognition is done by predicting whether each specific finger-tip is shown or not.

i) Recognizing 00000

Recognizing 00000 is quite an easy task: when a user shows 00000, the hand region has its smallest area. If the current hand region's height and width are both below a threshold level, we detect the case as 00000. The system uses equation (3) for this case.

$$f(x) = \begin{cases} 1, & x.\mathit{height} < \alpha \text{ and } x.\mathit{width} < \beta \\ 0, & \text{otherwise} \end{cases} \qquad (3)$$

where x is the current frame's hand region. This case is shown in Figure 5.

ii) Recognizing 00001

Recognizing 00001 is almost the same as recognizing 00000; the only difference is whether the thumb is shown. The thumb extends the 00000 hand region width beyond a threshold fraction of the actual hand width. The system uses equation (4) for this case.

$$f(x) = \begin{cases} 1, & x.\mathit{height} < \alpha \text{ and } x.\mathit{width} > \beta \\ 0, & \text{otherwise} \end{cases} \qquad (4)$$

where x is the current frame's hand region. This case is shown in Figure 6.
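Equations (3) and (4) translate to two bounding-box checks. Interpreting α and β as fractions of the stored open-hand rectangle (0.8 in Figures 5 and 6) is our reading of the text, so treat the comparison form as an assumption.

```cpp
#include <opencv2/opencv.hpp>

// Binary 00000: both dimensions shrink below the thresholds.
bool isAllClosed(const cv::Rect& hand, const cv::Rect& stored,
                 double alpha = 0.8, double beta = 0.8) {
    return hand.height < alpha * stored.height &&
           hand.width  < beta  * stored.width;   // equation (3)
}

// Binary 00001: the shown thumb keeps the width above the threshold.
bool isThumbOnly(const cv::Rect& hand, const cv::Rect& stored,
                 double alpha = 0.8, double beta = 0.8) {
    return hand.height < alpha * stored.height &&
           hand.width  > beta  * stored.width;   // equation (4)
}
```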

iii) Predicting Thumb Position

Using information stored in the features extraction stage, the thumb position is predicted in each frame. Predicting the thumb position is very important because the system uses it as the reference position for finding relative finger-tip positions. The system uses equations (5) and (6) to predict the thumb position.

Figure 6. Recognizing 00001. Here α and β are both taken as 0.8.

Figure 5. Recognizing 00000. Here α and β are both taken as 0.8.

$$T.x = \mathit{c2tD} \times \cos(\mathit{c2tA}) + \mathit{center}.x \qquad (5)$$
$$T.y = \mathit{c2tD} \times \sin(\mathit{c2tA}) + \mathit{center}.y \qquad (6)$$

where T is the thumb point, c2tD and c2tA are the center-to-thumb distance and the center-to-thumb angle with reference to the x-axis from the saved features, and center is the current frame's center position. The red dots in Figures 7(a), 7(b), 7(d), 7(e), 7(g), 7(h), 7(i) and 7(j) are predicted thumb positions.
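Equations (5) and (6) amount to projecting the stored center-to-thumb vector from the current palm center, as in the sketch below; we assume c2tA is stored in radians.

```cpp
#include <cmath>
#include <opencv2/opencv.hpp>

// Equations (5) and (6): predicted thumb point from the saved
// center-to-thumb distance and angle. c2tA is assumed to be in radians.
cv::Point predictThumb(double c2tD, double c2tA, const cv::Point& center) {
    return cv::Point(cvRound(c2tD * std::cos(c2tA) + center.x),
                     cvRound(c2tD * std::sin(c2tA) + center.y));
}
```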

iv) Predicting Whether Finger-tips Are Shown

To predict whether a finger-tip is shown, we first measure the angle among the predicted thumb position, the current center position and each of the finger-tips found in the features extraction stage. Then we label each finger-tip as shown or not. Letting the measured angle be angle, the system decides which finger is shown using equations (7) to (11).

$$\mathit{thumb} = \begin{cases} 1, & \mathit{angle} < 45 \\ 0, & \text{otherwise} \end{cases} \qquad (7)$$

$$\mathit{index} = \begin{cases} 1, & 45 < \mathit{angle} < 80 \\ 0, & \text{otherwise} \end{cases} \qquad (8)$$

$$\mathit{middle} = \begin{cases} 1, & 95 < \mathit{angle} < 100 \\ 0, & \text{otherwise} \end{cases} \qquad (9)$$

$$\mathit{ring} = \begin{cases} 1, & 105 < \mathit{angle} < 115 \\ 0, & \text{otherwise} \end{cases} \qquad (10)$$

$$\mathit{pinky} = \begin{cases} 1, & \mathit{angle} > 115 \\ 0, & \text{otherwise} \end{cases} \qquad (11)$$

Note that angles are measured in degrees. From equations (7) to (11) we can see that some angle ranges are omitted: 80–95 and 100–105. These ranges are possible positions for more than one finger: 80–95 for the index and middle fingers, and 100–105 for the middle and ring fingers. To determine which finger is actually shown, we use our stored information, which we update in every frame using equation (12).
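The sketch below implements the angle test of equations (7) to (11) together with one plausible reading of "the angle among the predicted thumb, the center and a finger-tip" (the angle at the center between the two rays); both the angle convention and the label strings are our assumptions.

```cpp
#include <cmath>
#include <opencv2/opencv.hpp>
#include <string>

// Angle (degrees) at the palm center between the predicted thumb and a
// candidate finger-tip; one plausible reading of the paper's measure.
double angleAtCenter(const cv::Point& thumb, const cv::Point& center,
                     const cv::Point& tip) {
    double a1 = std::atan2(double(thumb.y - center.y),
                           double(thumb.x - center.x));
    double a2 = std::atan2(double(tip.y - center.y),
                           double(tip.x - center.x));
    double deg = std::abs(a1 - a2) * 180.0 / CV_PI;
    return deg > 180.0 ? 360.0 - deg : deg;
}

// Equations (7)-(11): the gaps 80-95 and 100-105 fall through to the
// stored angular-relation check of equations (12) and (13).
std::string classifyByAngle(double angle) {
    if (angle < 45)                 return "thumb";   // (7)
    if (angle > 45 && angle < 80)   return "index";   // (8)
    if (angle > 95 && angle < 100)  return "middle";  // (9)
    if (angle > 105 && angle < 115) return "ring";    // (10)
    if (angle > 115)                return "pinky";   // (11)
    return "ambiguous";  // resolve with stored relations
}
```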

$$R = \ldots \qquad (12)$$

where R is the previous angular relation between fingers. We then compute the summed distance of the relations using equation (13).

$$w = \sum_i (\mathit{sr}_i - \mathit{cr}) \qquad (13)$$

where sr_i is the i-th stored relation and cr is the current relation found by equation (12). The finger-tip with the minimum value of w is taken as the predicted finger-tip.
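Since the exact form of equation (12) is missing from the source, the sketch below only illustrates equation (13): scoring a candidate by the summed difference between stored and current angular relations. Using the absolute difference is our assumption.

```cpp
#include <cmath>
#include <vector>

// Equation (13): distance between stored relations sr_i and the current
// relation cr. Absolute difference is an assumption; the source does
// not give equation (12)'s exact form.
double relationDistance(const std::vector<double>& storedRelations,
                        double currentRelation) {
    double w = 0.0;
    for (double sr : storedRelations)
        w += std::abs(sr - currentRelation);
    return w;  // the candidate finger-tip with minimum w wins
}
```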

4. Experimental Results

TABLE 1: EXPERIMENTAL RESULTS

Sample No | Total Frames | Gesture Frames | Correct Output | Recognition Rate
S#1       | 692          | 187            | 154            | 82.35%
S#2       | 998          | 333            | 237            | 74.55%
S#3       | 783          | 223            | 175            | 78.47%
Average   | 824          | 247            | 188            | 78.45%

Some frames are affected by the dynamic nature of the real time approach; for this reason, some gestures could not be recognized. Some gestures are also very hard to perform because of the articulation of the human hand. At this point of development, the system is not scale invariant. The most significant issue is skin-color segmentation: the process would do better if a better skin-color segmentation approach were applied. But as our focus is not skin-color segmentation, we have used a traditional approach with the best possible localization. We used the OpenCV library for our system, which runs on a Windows machine with the gcc compiler; the system can also run on a Linux machine with gcc. Figure 7 shows some outcomes of the proposed system.

5. Conclusion

Although the recognition accuracy in our experimental results is nearly 80%, it is noteworthy that the total number of distinguishable gestures is greatly increased by this process. It is also notable that the process uses very simple mathematical calculations to recognize gestures, making it computationally inexpensive. The system's accuracy could be increased by using a more sophisticated skin-color segmentation approach; lighting conditions also currently affect system performance. If we use both hands for gesture recognition, it is possible to recognize all 1024 finger binary gestures.

References

[1] Finger binary, http://www.en.wikipedia.org/wiki/Finger_binary

[2] P. Garg, N. Aggarwal, and S. Sofat, "Vision based hand gesture recognition," World Academy of Science, Engineering and Technology, vol. 49, no. 1, pp. 972-977, 2009.

[3] Mulder, "Hand gestures for HCI," Technical Report 96-1, Simon Fraser University, 1996.

[4] J. M. Rehg and T. Kanade, "Visual tracking of high DOF articulated structures: an application to human hand tracking," in Proc. European Conference on Computer Vision, 1994.

[5] W. T. Freeman and M. Roth, "Orientation histograms for hand gesture recognition," in International Workshop on Automatic Face and Gesture Recognition, vol. 12, 1995.

[6] F.-S. Chen, C.-M. Fu, and C.-L. Huang, "Hand gesture recognition using a real-time tracking method and hidden Markov models," Image and Vision Computing, vol. 21, no. 8, pp. 745-758, 2003.

[7] Stenger, "Template based hand pose recognition using multiple cues," in Proc. 7th Asian Conference on Computer Vision (ACCV), 2006.

[8] C. C. Wang and K. C. Wang, "Hand posture recognition using Adaboost with SIFT for human robot interaction," Springer Berlin, ISSN 0170-8643, vol. 370, 2008.

Asaduz Zaman completed his B.Sc. in the Department of Information & Communication Engineering, Islamic University, Kushtia, Bangladesh. He is currently working on gesture recognition systems and has research interests in visual object tracking. He is currently a graduate student in the Department of Information & Communication Engineering, Islamic University, Kushtia, Bangladesh.

Md Abdul Awal holds an M.S. in Telecommunication from the University of Information Technology & Science (UITS) and a Bachelor of Computer Engineering from American International University – Bangladesh (AIUB). In 2007 he joined the Department of Computer Science under the College of Engineering & Technology at the International University of Business Agriculture and Technology (IUBAT), Dhaka, Bangladesh, as a faculty member. He has also been working with IBM Bangladesh Private Limited.

Chil Woo Lee received his B.Sc. and M.Sc. degrees in Electronic Engineering from Chung-Ang University, Seoul, Korea in 1986 and 1988 respectively, and his Ph.D., also in Electronic Engineering, from the University of Tokyo, Japan in 1992. Since 1996 he has been a professor in the Department of Computer Engineering, Chonnam National University, Korea. He worked as a senior researcher at the Laboratories of Image Information Science and Technology for four years, from 1992 to 1996, during which he also held a post as visiting researcher at Osaka University, Osaka, Japan. From January 2001 he visited North Carolina A&T University as a visiting researcher and jointly worked on several digital signal processing projects.

Figure 7: Finger binary recognition results. (a), (b), (c), (d), (e), (f) and (g) show correct recognition of some gestures; (h) shows a false detection of the pinky finger which is corrected; (i) and (j) show incorrect recognition.

Md Zahidul Islam received his B.Sc. and M.Sc. degrees from the Department of Applied Physics and Electronic Engineering, University of Rajshahi (RU), Bangladesh, in 2000 and 2002 respectively. In 2003 he joined the Department of Information & Communication Engineering, Islamic University (IU), Kushtia, Bangladesh, as a lecturer. He did his Ph.D. research on visual object tracking in the Intelligent Image Media and Interface Lab, Department of Computer Engineering, Chonnam National University (CNU), South Korea, and was awarded his Ph.D. by the same department in August 2011. He also completed a research internship in the 3D Vision Lab at the Samsung Advanced Institute of Technology (SAIT), Suwon, South Korea. Dr. Islam's other research interests include computer vision, 3D objects, human and motion tracking, articulated body tracking and genetic algorithms. He is currently an Associate Professor and head of the Department of Information & Communication Engineering, Islamic University (IU), Kushtia, Bangladesh.