Proceedings of the seminar on "Recent tools for Dimensionality Reduction in Understanding Medical Data" – 22nd August 2013
Medical Data Understanding: An
Overview
Dr. H S Nagendraswamy
Professor, DOS in Computer Science, Manasagangothri, Mysore-570006
1. Introduction
With the advancement of science and technology, automation has taken place in various sectors such as banking, business, education, medicine, and agriculture. The major goals of any automation task are to minimize effort, maximize productivity, and enhance the quality of service. In the field of medicine, automation systems such as intelligent expert systems and decision support systems help physicians and medical practitioners diagnose diseases effectively and make the right decisions in treating a patient. In order to design expert systems or decision support systems, a huge volume of heterogeneous data, possibly of high dimension, needs to be gathered, pre-processed, represented, analyzed and interpreted. So, from the automation point of view, understanding medical data is very important both for professionals who design expert systems and for physicians who validate the designed systems.
2. Medical Data and its Importance
A medical datum is any single observation of a patient - for example, a temperature reading, a red-blood-cell count, a past history of rubella, or a blood-pressure reading. Data provide the basis for categorizing the problems a patient may be having, or for identifying subgroups within a population of patients. They also help a physician decide what additional information is needed and what actions should be taken to gain a greater understanding of a patient's problem or to treat most effectively the problem that has been diagnosed.
Types of Medical Data
The types of medical data in the practice of medicine and the allied health sciences range from narrative textual data to numerical measurements, recorded signals, drawings, and even photographs.
Some narrative data are loosely coded with shorthand conventions
known to health personnel, particularly data collected during the
physical examination, in which recorded observations reflect the
stereotypic examination process taught to all practitioners. It is
common, for example, to find the notation “PERRLA” under the eye
examination in a patient’s medical record. This encoded form indicates
that the patient’s “Pupils are Equal (in size), Round, and Reactive to
Light and Accommodation”.
Many data used in medicine take on discrete numeric values. These
include such parameters as laboratory tests, vital signs (such as
temperature and pulse rate), and certain measurements taken during
the physical examination. When such numerical data are interpreted,
however, the issue of precision becomes important. Can a physician
distinguish reliably between a 9-cm and a 10-cm liver span when she
examines the patient’s abdomen? Does it make sense to report a serum
sodium level to two-decimal-place accuracy? Is a 1-kg fluctuation in
weight from one week to the next significant? Was the patient weighed on the same scale both times (that is, could the different values reflect variation between measurement instruments rather than changes in the patient)?
In some fields of medicine, analog data in the form of continuous
signals are particularly important. The best known example is an
electrocardiogram (ECG), a tracing of the electrical activity from a
patient’s heart. When such data are stored in medical records, a
graphical tracing frequently is included, with a written interpretation of
its meaning. There are clear challenges in determining how such data
are best managed in computer storage systems.
Visual images are another important category of data, which are either
acquired from machines or sketched by the physician. Radiologic
images are obvious examples of this type. It also is common for a
physician to draw simple pictures to represent abnormalities that
she/he has observed; such drawings may serve as a basis for
comparison when she or another physician next sees the patient. For
example, a sketch is a concise way of conveying the location and size of
a nodule in the prostate gland.
3. Data Measurement
Precise measurement of medical data is very important; imprecise measurement may otherwise lead to wrong conclusions. Medical data are multiple observations about a patient. A single datum generally is viewed as being defined by four elements (a small sketch follows the list):
• The patient in question.
• The parameter being observed (for example, liver size, urine-
sugar value, history of rheumatic fever, heart size on chest X-ray
film).
• The value of the parameter in question (for example, weight is 70 kg, temperature is 98.6 °F, profession is steel worker).
• The time of the observation (for example, 2:30 A.M. on 14 FEB
2013).
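To make this four-element view concrete, the following sketch models a single datum as a small record. It is a minimal illustration in Python; all names (MedicalDatum, patient_id, and so on) are hypothetical rather than part of any standard medical schema.

```python
# Minimal sketch: a medical datum as the four elements described above.
from dataclasses import dataclass
from datetime import datetime

@dataclass
class MedicalDatum:
    patient_id: str        # the patient in question
    parameter: str         # the parameter being observed (e.g., "liver size")
    value: str             # the value of the parameter (e.g., "70 kg")
    observed_at: datetime  # the time of the observation

datum = MedicalDatum(
    patient_id="P-1024",                      # hypothetical identifier
    parameter="body temperature",
    value="98.6 F",
    observed_at=datetime(2013, 2, 14, 2, 30),  # 2:30 A.M. on 14 FEB 2013
)
```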
It is important to keep a record of the circumstances under which a
datum was obtained. For example,
• Was the blood pressure taken in the arm or leg?
• Was the patient lying or standing?
• Was it obtained just after exercise?
• During sleep?
• What kind of recording device was used?
• Was the observer reliable?
It is rare that an observation – even by a skilled clinician - can be
accepted with absolute certainty. A related issue is the uncertainty in
the values of data.
• An adult patient reports a childhood illness with fevers and a red
rash in addition to joint swelling. Could he have had scarlet fever?
The patient does not know what his pediatrician called the disease.
• A physician listens to the heart of an asthmatic child and thinks that
she hears a heart murmur— but she is not certain because of the
patient’s loud wheezing.
• A radiologist looking at a shadow on a chest X-ray film is not sure
whether it represents overlapping blood vessels or a lung tumor.
• A confused patient is able to respond to simple questions about his
illness, but his physician is uncertain how much of his reported
history is reliable.
4. Uses of Medical Data
Medical data are recorded for a variety of purposes. They may be needed
to support the proper care of the patient from whom they were obtained.
They also may contribute to the good of society through the aggregation
and analysis of data regarding populations of individuals.
Create the Basis for the Historical Record
• What is the patient's history (development of a current illness; other diseases that coexist or have resolved; pertinent family, social, and demographic information)?
• What symptoms has the patient reported?
• What physical signs have been noted on examination?
• How have signs and symptoms changed over time?
• What laboratory results have been or are now available?
• What radiologic and other special studies have been performed?
• What interventions have been undertaken?
• What is the reasoning behind those management decisions?
Each new patient complaint and its management can be viewed as a therapeutic experiment, inherently confounded by uncertainty, with the goal of answering three questions:
1. What was the nature of the disease or symptom?
2. What was the treatment decision?
3. What was the outcome of that treatment?
Anticipate Future Health Problems
Data gathered routinely in the ongoing care of a patient may suggest
that he is at high risk of developing a specific problem, even though he
may feel well and be without symptoms at present. Medical data
therefore are important in screening for risk factors, following patients' risk profiles over time, and providing a basis for specific patient education or preventive interventions, such as diet, medication, or exercise.
Record Standard Preventive Measures
The medical record also serves as a source of data on interventions that
have been performed to prevent common or serious disorders. The best
examples of such interventions are immunizations: the vaccinations
that begin in early childhood and may continue throughout life,
including special treatments administered when a person will be at
particularly high risk (for example, injections of gamma globulin to
protect people from hepatitis, administered before travel to areas where
hepatitis is endemic).
Identify Deviations from Expected Trends
Data often are useful in medical care only when viewed as part of a
continuum over time. An example is the routine monitoring of children
for normal growth and development by pediatricians. Single data points
regarding height and weight generally are not useful by themselves; it is
the trend in such data points observed over months or years that may
provide the first clue to a medical problem. It is accordingly common for
such parameters to be recorded on special charts or forms that make
the trends easy to discern at a glance.
Provide a Legal Record
Another use of medical data, once they are charted and analyzed, is as
the foundation for a legal record to which the courts can refer if
necessary. The medical record is a legal document; most of the clinical
information that is recorded must be signed by the responsible
individual. In addition, the chart generally should describe and justify
both the presumed diagnosis for a patient and the choice of management. A well maintained record is a source of protection for both patients and their physicians.
Support Clinical Research
Another use of medical data is to support clinical research through the
aggregation and statistical analysis of observations gathered from
populations of patients.
5. Weaknesses of the Traditional Medical-Record System
The traditional way of keeping records has certain limitations. A few such limitations are highlighted below.
Pragmatic and Logistical Issues
The data cannot effectively serve the delivery of health care unless they
are recorded. Their optimal use is dependent on positive responses to
the following questions:
• Can I find the data I need when I need them?
• Can I find the medical record in which they are recorded?
• Can I find the data within the record?
• Can I find what I need quickly?
• Can I read and interpret the data once I find them?
• Can I update the data reliably with new observations in a
form consistent with the requirements for future access by myself
or other people?
Redundancy and Inefficiency
In order to be able to find data quickly in the chart, health professionals
have developed a variety of techniques that provide redundant recording
to match alternate modes of access.
The Passive Nature of Paper Records
A manual archival system is inherently passive; the charts sit waiting
for something to be done with them. They are insensitive to the
characteristics of the data recorded within their pages, such as
legibility, accuracy, or implications for patient management.
Computational techniques for data storage, retrieval, and analysis make
it feasible to develop record systems that (1) monitor their contents and
generate warnings or advice for providers based on single observations
or on logical combinations of data; (2) provide automated quality
control, including the flagging of potentially erroneous data; or (3)
provide feedback of patient-specific or population-based deviations from
desirable standards.
6. The Structure of Medical Data
Scientific disciplines generally develop precise terminology or notation
that is standardized and accepted by all workers in the field.
Imprecision and the lack of a standardized vocabulary are particularly
problematic when we wish to aggregate data recorded by multiple
health professionals or to analyze trends over time. Without a
controlled, predefined vocabulary, data interpretation is inherently
complicated, and the automatic summarization of data may be
impossible. For example, one physician might note that a patient has
“shortness of breath.” Later, another physician might note that she has
“dyspnea.” Unless these terms are designated as synonyms, an
automated flowcharting program will fail to indicate that the patient
had the same problem on both occasions.
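In software, such a controlled vocabulary can be approximated by mapping free-text terms to canonical concept codes, so that synonymous entries aggregate correctly. The sketch below is a minimal illustration in Python; the concept identifier is written in the style of a UMLS code but should be treated as a hypothetical example.

```python
# Minimal sketch: mapping synonymous free-text findings to one canonical
# concept so automated summarization recognizes them as the same problem.
SYNONYMS = {
    "shortness of breath": "C0013404",  # hypothetical concept code
    "dyspnea": "C0013404",
}

def canonical_concept(term: str) -> str:
    """Return the canonical concept code for a recorded term."""
    return SYNONYMS.get(term.strip().lower(), "UNKNOWN")

# Both physicians' notes now map to the same concept:
assert canonical_concept("Dyspnea") == canonical_concept("shortness of breath")
```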
Coding Systems
Because of the need to know about health trends for populations and
to recognize epidemics in their early stages, there are various health-
reporting requirements for hospitals (as well as other public
organizations) and practitioners. For example, cases of gonorrhea,
syphilis, and tuberculosis generally must be reported to local public-
health organizations, which code the data to allow trend analyses over
time. The Centers for Disease Control (CDC) in Atlanta then pool
regional data and report national as well as local trends in disease
incidence, bacterial-resistance patterns, and the like. Researchers at
many institutions have worked for over a decade to develop the Unified Medical Language System (UMLS), a common structure that ties together the various vocabularies that have been created.
The Data-to-Knowledge Spectrum
A datum, as a single observational point, generally can be regarded as the value of a specific parameter for a particular object (for example, a patient) at a given point in time. Knowledge, then, is derived through the formal or informal analysis (or interpretation) of data. The term information is more generic in that it encompasses both organized data and knowledge.
The observation that patient Brown has a blood pressure of 180/110
is a datum, as is the report that the patient has had a myocardial
infarction (heart attack). When researchers pool and analyze such data,
they may determine that patients with high blood pressure are more
likely to have heart attacks than are patients with normal or low blood
pressure. This data analysis has produced a piece of knowledge about
the world.
7. The Computer and Collection of Medical Data
Physicians may be asked to fill out structured paper data sheets, or
such sheets may be filled out by data abstractors who review patient
charts, but the actual entry of data into the database is done by paid
transcriptionists. In some applications, it is possible for data to be
entered automatically into the computer by the device that measures or
collects them. Certain data can be entered directly by patients; there are
systems, for example, that take the patient’s history by presenting on a
terminal multiple-choice questions that follow a branching logic. The
patient’s responses to the questions are used to generate hardcopy
reports for physicians, and also may be stored directly in a computer
database for subsequent use in other settings.
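The branching logic mentioned above can be sketched as a small decision structure in which each answer selects the next question. The following is a minimal hypothetical illustration in Python, not a description of any deployed history-taking system:

```python
# Minimal sketch of branching multiple-choice history taking: each node
# holds a question, its choices, and the next node for each choice.
QUESTIONS = {
    "start": ("Do you have chest pain?", {"yes": "pain_onset", "no": "end"}),
    "pain_onset": ("Did the pain begin during exercise?",
                   {"yes": "end", "no": "end"}),
}

def take_history():
    node, answers = "start", {}
    while node != "end":
        question, branches = QUESTIONS[node]
        reply = input(question + " (" + "/".join(branches) + "): ").strip().lower()
        if reply in branches:
            answers[question] = reply
            node = branches[reply]  # branch to the next question
    return answers                  # stored for the physician's report
```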
Summary
Medical data play an important role in designing intelligent expert systems and decision support systems. Various types of medical data, such as textual data, discrete numerical data, analog data, recorded signals, visual images and photographs, are normally used by experts for analyzing patients. Precise measurement of medical data, and handling the uncertainty associated with measurement, is very important. Medical data are used for a wide variety of reasons. Traditional medical records have some serious drawbacks, which can be alleviated through automation; this in turn requires structuring of medical data for storage, retrieval and manipulation.
*-*-*-*-*
Recent Tools and Techniques for Medical Image Processing
Dr. Vinay K
Assistant Professor
PG Department of Computer Science JSS College of Arts, Commerce and Science
Ooty Road, Mysore [email protected]
1.0 INTRODUCTION
Medical imaging is the technique and process used to create images of the human body for clinical purposes or medical science. Medical imaging offers a number of techniques that can be used as non-intrusive methods of looking into the body: the body does not have to be opened surgically to examine various organs and areas. It can be used to assist the diagnosis or treatment of different medical conditions. Since the discovery of X-rays by Wilhelm Conrad Röntgen in 1895, medical images have become a major component of diagnostics and treatment planning.
Today, doctors are able to diagnose, treat and cure patients without surgically opening the body and without causing harmful side effects. The use of medical image processing has enabled doctors to gain a more immediate and accurate understanding of a patient's condition than ever before without having to cut inside the body.
Medical imaging also helps us learn more about neurobiology and
human behaviors. Medical images are used for education,
documentation, and research describing morphology as well as physical
and biological functions in 1D, 2D, 3D, and even 4D image data.
Medical imaging includes different imaging modalities and processes to image the human body for diagnostic and treatment purposes, and therefore has an important role in the improvement of public health in all population groups. Furthermore, medical imaging allows clinicians to follow the course of a disease already diagnosed or treated. The area of medical imaging is very complex and, depending on the context, requires the complementary activities of medical doctors, medical physicists, biomedical engineers and technicians. Cost effectiveness therefore matters, and it can be enhanced by more efficient data handling in hospitals, which has become possible through the digitization of diagnostic information.
Medical imaging technology has been developed to satisfy the huge demand for information on medical imaging, a demand made not only by radiologists but also by cardiologists, physicians and senior healthcare managers. For example, a report published in 2008 on brain-scan image processing described how brain imaging is being used to understand why some people become long-term cocaine addicts and some do not.
1.1 ADVANCES IN MEDICAL IMAGE PROCESSING
The field of medical imaging, influenced by advances in digital and communication technologies, has grown tremendously in recent years. New imaging techniques that reveal greater anatomical detail are available in most imaging modalities.
Medical imaging is continually evolving and advancing, all with the goal of improving patient care. A few examples are listed below:
• The migration of X-rays from film to digital files.
• The evolution of MRIs from slow and fuzzy to fast and highly detailed.
• The portability of ultrasound.
• Single-chip ultrasound and high-intensity focused ultrasound (HIFU).
DLP Hyperspectral Imaging
By applying an optical semiconductor technology commonly used in digital color projectors to an imaging technique, researchers have opened up an array of potential optical medical imaging applications. The resulting hyperspectral imaging system could help reduce the risk of complications during various medical procedures, and the associated liability. When performing open or endoscopic surgery, it is often difficult to differentiate between neighboring tissues. For example, when removing the gallbladder, it is important not to damage the common bile duct. If we could non-invasively distinguish the bile duct from surrounding arteries, the surgeon would know better where to cut.
Electromagnetic Acoustic Imaging
Electromagnetic acoustic imaging (EMAI) is a new imaging technique that uses long-wavelength RF electromagnetic (EM) waves to induce ultrasound emission. Signal intensity and image contrast have been found to depend on the spatially varying electrical conductivity of the medium, in addition to conventional acoustic properties. The resultant conductivity-weighted ultrasound data may enhance the diagnostic performance of medical ultrasound in cancer and cardiovascular applications because of the known changes in conductivity of malignancy and blood-filled spaces.
Wafer-Scale Mega Microchip
A newly developed large microchip is designed to enhance medical imaging applications. Measuring a whopping 12.8 cm square, the chip could eventually aid in the diagnosis of cancer, enabling doctors to see the impact of radiotherapy treatment more precisely. The wafer-scale chip produces images that clearly show the effects of radiation on tumors, helping doctors to detect them earlier; and because it is robust, the chip can survive many years of exposure to radiation.
3-D Metamaterial
Although ultrasound imaging is ubiquitous in the medical field, it has been limited by an inability to obtain high-resolution, detailed images. By using a 3-D metamaterial to achieve deep-subwavelength imaging, researchers believe they can enhance ultrasound resolution by a factor of 50. If realized, the metamaterial could be incorporated into current ultrasound probes to capture high-resolution medical images, thereby improving patient care.
MRI Heart Imaging
MRI heart imaging has revolutionized technology for imaging the beating heart. Produced in one of the world's most powerful MRI systems, with field strength equivalent to 150,000 times Earth's magnetic field, the images provide much higher detail than standard cardiac images. The ultrahigh-field approach also delineates clearly between blood and heart muscle. The new method could advance the capabilities of cardiac research and care, enabling earlier diagnosis, monitoring, and treatment of cardiac malfunctions.
1.3 Visualization and Analysis System for Medical Imaging
As far as the system of any computer-aided diagnosis (CAD) is concerned, it broadly consists of the following modules: image formation, enhancement, visualization, analysis and management. Some of these modules have a pipelined architecture and some a parallel-processing architecture. One important module is enhancement. In this module, medical imaging algorithms are applied to extract the region of interest (ROI), filter the noise present in the raw image, and segment the voxels of interest (VOI). Such processing steps are carried out using many traditional as well as advanced techniques. The visualization part also gets much importance in a medical imaging system, because medical images often require a high-end graphics system: each and every pixel in a medical image is very important, there should be no compromise in the viewing system, and different perceptions will yield different meanings. Hence advanced computer graphics algorithms and high-end hardware systems are required for the medical image processing task.
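As a concrete illustration of the enhancement module, the sketch below denoises a raw grayscale image and segments a simple region of interest. It is a minimal example assuming Python with NumPy and SciPy; the mean-intensity threshold is a stand-in for the advanced segmentation techniques mentioned above.

```python
import numpy as np
from scipy import ndimage

def enhance_and_segment(raw: np.ndarray) -> np.ndarray:
    """Denoise a grayscale image and return a binary region-of-interest mask."""
    # Median filtering suppresses impulse noise while preserving edges.
    denoised = ndimage.median_filter(raw, size=3)
    # Simple global threshold at the mean intensity (illustrative only).
    mask = denoised > denoised.mean()
    # Keep only the largest connected component as the ROI.
    labels, n = ndimage.label(mask)
    if n == 0:
        return mask
    sizes = ndimage.sum(mask, labels, range(1, n + 1))
    return labels == (np.argmax(sizes) + 1)
```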
Figure-1: Steps and Modules in Medical Image Processing System
2.0 Recent tools in Medical image Processing
In the last decade, much remarkable research has been carried out in the field of medical image processing. Many industry- and healthcare-standard CAD systems have been designed and effectively employed in medical services. Research and development companies such as GE and Siemens are designing many commercial devices, and many research groups and research consortia are working on designing efficient algorithms. Numerous software packages are distributed as open source under the GPL (General Public License) and are freely available for research purposes: ITK/VTK, ImageJ, MANGO, MIPAV and MeVisLab are some of the standard tools. Apart from these imaging packages, many data mining and pattern recognition tools such as PRLab, the SPM toolkit, MATLAB and the R programming language are available for medical image analysis.
CONCLUSION
Medical image processing is a rapidly growing field. Many dedicated research labs are working on developing sophisticated algorithms for processing medical data. As time progresses, many newer imaging modalities are evolving, and in parallel, advanced research is taking place in the medical field. Medical image processing is therefore a field that never gets saturated: newer image modalities involve different anatomical structures, and hence newer algorithms are required to address such issues.
Dimensionality Reduction
Dr. S Manjunath
JSS College of Arts, Commerce and Science
Ooty Road, Mysore, Karnataka, India [email protected]
Nowadays, there is a great demand for medical data analysis and understanding. In this process, raw medical data (medical images from different modalities, or medical data in the form of text) are available in abundance. In order to understand and analyze the raw data, it is necessary to bring it into a structured format. Normally, the simple way of representing the data is called a pattern matrix or feature matrix, where rows correspond to patterns (objects or observations) and columns represent the attributes (features or random variables). This type of representation is not free from noise, redundant data, or irrelevant data. In order to represent the data in a more efficient way, we have a family of techniques called dimensionality reduction. In this session, we are going to look into the basics of dimensionality reduction techniques and the tools available for dimensionality reduction.

Dimensionality reduction is a process of elimination, reduction, or transformation of random variables or features while preserving the structure of the original data. Dimensionality reduction helps us in solving the problem of the curse of dimensionality when we have a huge number of features. Also, it provides better visualization of high-dimensional data, along with being a good noise removal and data compression technique. On the other hand, dimensionality reduction techniques suffer from the possibility of information loss, and recovering the original data may be difficult if they perform poorly on the original data.
To name a few areas, applications of dimensionality reduction can be found in medical data analysis, text categorization, biometrics, document image processing, compression, etc.

Dimensionality reduction techniques can be classified into two classes, viz., 1. feature selection methods and 2. feature extraction methods. In feature selection methods, the original set of features is analyzed and an optimal subset of features is selected for further processing, whereas in feature extraction methods the original set of features in some space (say, Euclidean space) is transformed onto some other space preserving the structure of the data, and the features are selected in the transformed space. The selection of features depends on an objective, and the objective function depends on the application for which dimensionality reduction is being used. In order to understand dimensionality reduction techniques, let us consider 'N' patterns, each with 'd' features.
The feature selection method tries to find an optimal subset of size 'k' out of the 'd' features of the 'N' patterns. This involves four stages (a small code sketch follows the list):
1. Subset generation: In this step a subset of features is generated each time. In order to generate subsets, one can think of exhaustive search, complete search, heuristic search, probabilistic search and hybrid search methods.
2. Evaluation: The generated subset is evaluated to check whether it meets the objective criterion or not. The evaluation technique can be a filter, wrapper, or hybrid approach. Filter-based approaches are independent of an inductive algorithm, whereas wrapper methods use an inductive algorithm as the evaluation function. Hybrid approaches are combinations of filter and wrapper approaches.
3. Stopping criteria: This step is responsible for stopping the subset generation and evaluation steps. If the subset of features generated in the subset generation stage satisfies the objective criterion, then the feature selection process is stopped; else the process continues with stage 1 by generating a new subset of features.
4. Validation stage: This is an optional stage, where the selected features are validated to check whether they are good enough on real data or not.
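As promised above, here is a minimal sketch of a filter-style feature selection in Python, under the assumption that scikit-learn is available; each feature is scored independently of any inductive algorithm and the best k are kept.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif

# N patterns with d features, plus class labels (random stand-in data).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 20))       # N = 100, d = 20
y = rng.integers(0, 2, size=100)

# Filter method: rank features by an ANOVA F-score computed from the
# data alone (no inductive algorithm), then keep the best k = 5.
selector = SelectKBest(score_func=f_classif, k=5)
X_reduced = selector.fit_transform(X, y)
print(X_reduced.shape)                      # (100, 5)
print(selector.get_support(indices=True))   # indices of selected features
```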
In feature extraction methods, a pattern $X_i \in \mathbb{R}^d$ is transformed to $Y_i \in \mathbb{R}^k$ using a suitable transformation function $T$. Analysis is then carried out on the projected data in the transformed domain to select the suitable features:
$$X_i \in \mathbb{R}^d \xrightarrow{\;T\;} Y_i \in \mathbb{R}^k.$$
Feature extraction methods can be classified as unsupervised (Principal Component Analysis [1], Independent Component Analysis [2], Latent Semantic Analysis [3]), supervised (Fisher Linear Discriminant Analysis [4]) and semi-supervised approaches. In the case of supervised approaches, information about the patterns (their class labels) is given during transformation, whereas in the case of unsupervised approaches no information about the samples is provided. In the real world, it is observed that among the entire data some samples may carry label information and some may not; in such cases one can think of applying semi-supervised techniques.
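As an unsupervised example, PCA [1] can be sketched in a few lines of NumPy: center the pattern matrix and project it onto the top k principal directions obtained from the SVD. This is a minimal illustration, not a full-featured implementation:

```python
import numpy as np

def pca_transform(X: np.ndarray, k: int) -> np.ndarray:
    """Project an N x d pattern matrix X onto its first k principal components."""
    X_centered = X - X.mean(axis=0)          # remove the mean pattern
    # Rows of Vt are the principal directions, ordered by explained variance.
    _, _, Vt = np.linalg.svd(X_centered, full_matrices=False)
    return X_centered @ Vt[:k].T             # N x k transformed patterns

Y = pca_transform(np.random.rand(100, 50), k=2)   # e.g., for visualization
```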
In the market we can see plenty of tools for reducing the dimensionality of data, such as MATLAB, Weka, R, etc. In this session, a demonstration of MATLAB on different medical data, such as heart disease and lung cancer data, will be given.
References:
1. Duda R. O., Hart P. E., and Stork D. G. (2002). Pattern Classification. John Wiley & Sons.
2. Comon P. (1994). Independent Component Analysis: a new concept? Signal Processing, Vol. 36, No. 3, pp. 287-314.
3. Kurimo M. (1999). Indexing audio documents by using latent semantic analysis and SOM. In: Oja, Erkki and Kaski, Samuel (Eds.), Kohonen Maps, Elsevier, Amsterdam, pp. 363-374.
4. Fisher R. A. (1938). The Statistical Utilization of Multiple Measurements. Annals of Eugenics, vol. 8, pp. 376-386.
*-*-*-*-*
PAPERS PRESENTED
An Enhanced Natural Scene Classification Based Image Browsing and
Retrieval System
Srinidhi S, Dr. Vinay K, Dr. G. Hemantha Kumar Department of Studies in Computer Science, University of Mysore
Manasagangotri-570006, Mysore, INDIA.
[email protected], [email protected]
Abstract
The objective of natural scene classification is to classify images into pre-defined scene categories. The number of digital images that need to be acquired, analyzed, classified, stored and retrieved in scene classification is growing exponentially. Accordingly, scene classification and retrieval has become a popular topic in recent years. In order to find an image, the image has to be described or represented by certain features. Since our problem is to classify scene images, the description of images using texture and shape analysis is predominant. Texture and shape are important visual features of an image, and there are many texture and shape representation and classification techniques in the literature. In the proposed work, we review some important shape and texture feature extraction techniques. Finally, the performance of each feature and of the fusion of features is tested using a k-Nearest Neighbor learning framework and a Probabilistic Neural Network learning framework.
Keywords: Natural scene classification, k-Nearest Neighbor,
Probabilistic Neural Network, texture feature and shape feature.
------------------------ ---------------------------
1. INTRODUCTION
Natural scene classification comprises a set of simple techniques and procedures that work on scene images in order to classify them.

Natural scene classification is usually divided into three stages. The first stage acquires the scene image. The second stage extracts features. In the final stage, classification techniques are applied.

For many years, research has been carried out on developing systems to facilitate natural scene classification. The current tools for browsing large databases are not prepared to deal with this type of data, and moreover natural scene classification has shown itself to be an expensive option from a computational point of view. Therefore, it is necessary to set up other categories of information retrieval systems in order to extend the treatment to poor-quality scene images. For this purpose, another kind of system is developed which is mainly focused on retrieval of scene images without explicit recognition; such a system is called a "natural scene classification" system.
A natural scene classification system is a system for browsing, searching and retrieving images from a large database of digital images. The query comprises an actual example image from the dataset. An extremely important aspect of the retrieval procedure is the image representation, which works on features. The scene classification procedure is used in a supervised manner.
In the literature, many authors have presented different techniques to classify scenes. In [1], scene classification has been performed by extracting a global feature, which is mapped to other extracted features; K-SVD is applied to individual pixels, and an SVM classifier is applied to the whole image to determine the performance. In [2], the Fast Fourier Transform (FFT) and Gabor filters were used to remove noise, GLCM and energy-based features were extracted and combined, and individually trained SVMs were applied for classification. In [3], the JSEG algorithm was introduced for segmentation of the image; this model works successfully for the kth-nearest-neighbor classifier with a Bayesian combination scheme. In [4], Independent Component Analysis (ICA) was used for classifying scenes; the sharpest spikes are determined using a histogram, and an SVM classifier is used, showing better performance and good generalization.
In this paper, we aim to develop a natural scene classification technique for semi-scene images using texture and shape features, which will be described in the coming sections. The rest of the paper is organized as follows: section 2 describes the proposed model, section 3 gives experimental results, and section 4 concludes the paper with a brief summary.
2. Proposed model
The acquired scene is considered for natural scene classification. Scene images may contain noise; hence noise is removed using a Gabor filter. Then feature extraction is carried out to extract features for the image. The proposed model is simple to use and understand. The architecture is shown in the block model in Figure 1:
Fig. 1: Block diagram of proposed system
2.1 Feature extraction
We extract two kinds of features from a scene image, texture and shape features, which are described as follows. The first extracted feature is texture. Texture is a set of metrics calculated in image processing designed to quantify the perceived texture of an image; it gives us information about the spatial arrangement of colour or intensities in an image or a selected region of an image. Texture feature extraction based on Gabor filters and Gray-Level Co-occurrence Matrices (GLCM) is used here. For extracting features of an image, the system uses Gabor filters: each point is represented by local Gabor filter responses. A 2-D Gabor filter is obtained by modulating a 2-D sine wave at particular frequencies and orientations with a Gaussian envelope. We follow the notation in [5], [6]. The 2-D Gabor filter kernel is defined by
$$f(x, y, \lambda, \theta_k) = \exp\left\{-\frac{1}{2}\left[\frac{(x\cos\theta_k + y\sin\theta_k)^2}{\sigma_x^2} + \frac{(x\sin\theta_k - y\cos\theta_k)^2}{\sigma_y^2}\right]\right\} \cdot \exp\left\{\frac{2\pi i\,(x\cos\theta_k + y\sin\theta_k)}{\lambda}\right\}$$
where $\sigma_x$ and $\sigma_y$ are the standard deviations of the Gaussian envelope along the x- and y-dimensions, respectively, and $\lambda$ and $\theta_k$ are the wavelength and orientation, respectively. The spread of the Gaussian envelope is defined using the wavelength $\lambda$. A rotation of the x-y plane by an angle $\theta_k$ results in a Gabor filter at orientation $\theta_k$, defined by
$$\theta_k = \frac{\pi(k-1)}{n}, \qquad k = 1, 2, \ldots, n$$
where n denotes the number of orientations. The Gabor local feature at a point (x, y) of an image can be viewed as the response of all the different Gabor filters located at that point. A filter response is obtained by convolving the filter kernel (with a specific $\lambda$, $\theta_k$) with the image. Here we use Gabor kernels with two orientations [0.7854, 1.5708] and two scales [2, 5].

A co-occurrence matrix is a matrix or distribution that is defined over an image to be the distribution of co-occurring values at a given offset. From this matrix, four standard features, namely contrast, correlation, energy and homogeneity, are calculated by the formulae
$$\mathrm{Contrast} = \sum_{i,j} |i-j|^2\, p(i,j), \qquad \mathrm{Correlation} = \sum_{i,j} \frac{(i-\mu_i)(j-\mu_j)\, p(i,j)}{\sigma_i\, \sigma_j},$$
$$\mathrm{Energy} = \sum_{i,j} p(i,j)^2, \qquad \mathrm{Homogeneity} = \sum_{i,j} \frac{p(i,j)}{1+|i-j|}.$$
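The Gabor kernel defined above can be instantiated directly. The sketch below builds the kernel with NumPy and convolves it with an image using SciPy; the orientations and scales follow the values quoted above, while tying the envelope widths to the wavelength is a simplifying assumption of this illustration, not the authors' exact setting.

```python
import numpy as np
from scipy.signal import fftconvolve

def gabor_kernel(lam, theta, sigma_x, sigma_y, half=15):
    """2-D Gabor kernel f(x, y, lambda, theta) as defined above."""
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)   # rotated coordinates
    yr = x * np.sin(theta) - y * np.cos(theta)
    envelope = np.exp(-0.5 * ((xr / sigma_x) ** 2 + (yr / sigma_y) ** 2))
    carrier = np.exp(2j * np.pi * xr / lam)      # complex sinusoid
    return envelope * carrier

image = np.random.rand(64, 64)   # stand-in for a scene image
responses = [np.abs(fftconvolve(image, gabor_kernel(lam, th, lam, lam),
                                mode="same"))
             for th in (0.7854, 1.5708)   # the two orientations above
             for lam in (2, 5)]           # the two scales above
```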
The second extracted feature is shape. Shape parameters such as major axis length, minor axis length, area, rectangularity, eccentricity, Euler number, equivalent diameter, convex area, and orientation are among the features extracted.
2.2 Classification
Classification is a computational procedure that sorts images into
groups ("classes") according to their similarities. Images can be similar
in all kinds of ways, but in EM-related image processing we use a very
strict measure of similarity that is based on a pixel-by-pixel
comparison: the mean squared difference, generalized Euclidean
distance. Two classifiers are adopted. The first is the k-Nearest Neighbor (k-NN) algorithm, a non-parametric method for classifying objects based on the closest training examples in the feature space. There are two major design choices to make: the value of k, and the distance function to use. When there are two alternative classes, the most common choice for k is a small odd integer, for example k = 3, in order to avoid ties. If there are more than two classes, then ties are possible even when k is odd. Ties can also arise when two distance values are the same. An implementation of k-NN needs a sensible algorithm to break ties; there is no consensus on the best way to do this. When each example is a fixed-length vector of real numbers, the most common distance function is the Euclidean distance:
$$d(\mathbf{x}, \mathbf{y}) = \sqrt{\sum_{i=1}^{d} (x_i - y_i)^2}$$
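A minimal k-NN classifier along these lines, with Euclidean distance and a majority vote whose ties are broken by the single nearest neighbour, might look as follows; this is an illustrative Python sketch, not the system's actual implementation.

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x, k=3):
    """Classify vector x by majority vote among its k nearest neighbours."""
    dists = np.sqrt(((X_train - x) ** 2).sum(axis=1))  # Euclidean distances
    nearest = np.argsort(dists)[:k]                    # k closest examples
    votes = Counter(y_train[i] for i in nearest).most_common()
    tied = [label for label, c in votes if c == votes[0][1]]
    # Break ties by preferring the class of the single nearest neighbour.
    return y_train[nearest[0]] if len(tied) > 1 else tied[0]

X_train = np.array([[0., 0.], [0., 1.], [1., 0.], [5., 5.]])
y_train = np.array([0, 0, 0, 1])
print(knn_predict(X_train, y_train, np.array([0.2, 0.2]), k=3))  # -> 0
```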
The second classification technique is the Probabilistic Neural Network (PNN). Probabilistic neural networks are feed-forward networks built with three layers, derived from Bayes decision networks. They train quickly, since training is done in one pass over each training vector rather than several. Probabilistic neural networks estimate the probability density function for each class based on the training samples.
3. Experimental Results and Analysis
The experimental results are reported in terms of correctly classified
scenes of the testing dataset. The accuracy rate of Individual features
and fusion features of k-Nearest Neighbor are as shown in Table 1 and
Table 2, respectively.
k    gabor   glcm   shape
2    77.3    67.8   80
3    76.9    69.2   78
4    76.8    70.3   76.6
5    76.6    70.8   74.3
6    73.7    72.3   73.8
7    74.5    70.3   76
8    74.3    69.9   75
9    77.3    68.1   76.1
Table 1: Accuracy rate of Individual feature (k-NN)
k    gab_glc   gab_shap   glc_shap   all_comb
2    79.3      77.4       66.7       77.4
3    76.9      74.8       65.6       74.8
4    76.7      74.6       65.3       74.6
5    76.6      74.4       65.3       74.4
6    73.7      71.2       65.6       71.3
7    74.5      72.2       65.9       72
8    74.3      72.4       64.9       72.5
9    75.3      72.7       66.1       72.7
Table 2: Accuracy rate of Fusion feature (k-NN)
The experimental results are reported in terms of correctly classified
scenes of the testing dataset. The accuracy rate of Individual features
and fusion features of Probabilistic Neural Network are as shown in
Table 3 and Table 4, respectively.
Train:Test   GABOR   GLCM   SHAPE
50 - 50      79.3    76     80
60 - 40      80.8    75.8   80.8
Table 3: Accuracy rate of Individual feature (PNN)
Train:Test GAB_GLC GAB_SHA GLC_SHA ALL_COMB
50 - 50 79.3 79.3 76.7 78.7
60 - 40 80.8 78.3 77.5 77.5
Table 4: Accuracy rate of Fusion feature (PNN)
As the tables show, the proposed model performs less well with the k-NN classifier. Therefore, the classification technique can be extended by fusing various classifiers together to obtain better accuracy for the proposed methodology.
4. Conclusion
In this work, we have used texture and shape features for natural scene classification, and both were found to be effective. The proposed model efficiently handles individual feature extraction and the fusion of features, employing the k-Nearest Neighbor (k-NN) and Probabilistic Neural Network (PNN) classifiers. It is observed that individual features yield good results compared with the fused features. Experimental results and analysis on our dataset show that the Probabilistic Neural Network classifier, at 80.8%, achieves better natural scene classification performance than the k-Nearest Neighbor at 80%, for both individual and fused features. The study revealed that the proposed approach performs well: it is simple, effective for different types of natural scene, and robust to variations of light intensity in natural scenes. The highest rate of accuracy was obtained with the PNN classifier, i.e., 80.8%.
Reference
[1] Fengcai Li, Guanghua Gu, and Chengru Wang: Scene categorization based on integrated feature description and local weighted feature mapping, in: Computers and Electrical Engineering 38 (2012) 917-925.
[2] Zhan-Li Sun, Deepu Rajan, and Liang-Tien Chia: Scene classification using multiple features in a two-stage probabilistic classification framework, in: Neurocomputing 73 (2010) 2971-2979.
[3] Deng and Jianhua Zhang: Combining Multiple Precision-Boosted Classifiers for Indoor-Outdoor Scene Classification, in: The Information Science Discussion Paper Series, May 2006, ISSN 1172-6024.
[4] Jiebo Luo and Matthew Boutell: Natural scene classification using overcomplete ICA, in: Pattern Recognition 38 (2005) 1507-1519.
[5] Hamamoto, Y.: A Gabor Filter-based Method for Fingerprint Identification, in: Intelligent Biometric Techniques in Fingerprint and Face Recognition, eds. L.C. Jain et al., CRC Press, NJ, pp. 137-151, 1999.
[6] Resmana Lim and M.J.T. Reinders: Facial Landmark Detection using a Gabor Filter Representation and a Genetic Search Algorithm, in: Proceedings of the ASCI 2000 Conference.
Kannada Handwritten Word Recognition in Bank Cheque: A Study
Nandeesh P Asst. Professor, Department of Computer Science
JSS College for Women, Chamarajanagar [email protected]
ABSTRACT
This paper presents a study of automating the recognition of Kannada handwritten words on bank cheques, in which the main challenge is the recognition of handwritten words and signatures. The identification of Kannada handwritten words plays an important role because in the Kannada script many words have the same size and shape. The recognition of Kannada handwritten bank cheque words and signatures is an important application in banks and other organizations. In this work we consider a dataset of 50 documents, where each document contains 119 Kannada handwritten words and 28 signatures; in total, 120 classes are considered. We extract three different types of features, namely Gabor, LBP (local binary pattern) and LBPV (local binary pattern variance). The effect of each feature and their combination in word and signature classification is analyzed using k-nearest neighbour classifiers. It is common to combine multiple categories of features into a single feature vector for classification, and we also apply a dimensionality reduction technique. Classification results are calculated based on the feature extraction methods, varying the k values and randomly splitting the testing and training samples.
Keywords Document Image Processing, Cheque analysis, Segmentation
------------------------ ---------------------------
1. INTRODUCTION
The Kannada script is the visual form of the Kannada language. It originated from the southern Brahmi script of the Ashoka period and underwent periodic modifications in the reigns of the Sathavahanas, Kadambas, Gangas, Rashtrakutas, and Hoysalas. Even before the seventh century, the Telugu-Kannada script was used in the inscriptions of the Kadambas of Banavasi and the early Chalukyas of Badami in the west. From the middle of the seventh century, the archaic variety of the Telugu-Kannada script developed into a middle variety, and the modern Kannada and Telugu scripts emerged in the thirteenth century. The Kannada script is also used to write the Tulu, Konkani and Kodava languages [3].

Bank cheques are still widely used all over the world for financial transactions, and handwritten bank cheques are processed manually every day in developing countries. In such manual verification, the handwritten information on each cheque, including the signature and the legal amount in words, has to be visually verified. Here we review the identification of Kannada handwritten bank cheque words. In this work we consider the legal amount written in words, the amount written in numerals, the account number and the signature blocks. The main challenges are the recognition of handwritten words, the recognition of a person based on their signature, and the recognition of numerals. The identification of handwritten words plays an important role in analyzing Kannada handwriting, and the recognition of the handwritten legal amount in words in the Kannada language is challenging because of the similar size and shape of many words; moreover, many words have the same suffixes or prefixes.
2. RELATED WORK
Jayadevan R. et al. [1] proposed a recognition technique that combines two approaches. The first approach is based on gradient, structural and cavity (GSC) features along with a binary vector matching (BVM) technique. The second approach is based on the vertical projection profile (VPP) feature and dynamic time warping (DTW). A number of highly matched words in both approaches are considered for the recognition step in the combined approach, based on a ranking scheme. The dataset has been grouped into three sub-datasets, namely DB1, DB2 and DB3. DB1 contains data collected from 90 individuals in the Marathi language, where each individual contributed 114 word templates and a handwritten cheque; DB1 thus has 10,260 (114×90) handwritten words and 90 handwritten cheques in Marathi. DB2 also has data in Marathi, collected from 70 individuals with comparatively poor handwriting; DB2 has 7,980 (114×70) handwritten words and 70 handwritten cheques. DB3 contains data in the Hindi language collected from 80 individuals, each of whom contributed 106 word templates and a handwritten cheque; DB3 has 8,480 (106×80) handwritten words and 80 handwritten cheques in Hindi. The three sub-datasets collectively have 26,720 handwritten Devanagari words and 240 handwritten cheques. The recognition result ranges from 55.2% to 80.23% depending on the dataset.
Shreedharamurthy S. K. et al. [12] developed a neural-network-based Kannada numeral recognition system, presenting a novel approach to feature extraction in the spatial domain to recognize segmented Kannada numerals using artificial neural networks. Handwritten numerals are scan-converted to binary images and normalized; the features are extracted using spatial coordinates and are classified using a feed-forward neural network classifier. They used 100 samples of numerals from the created database, of which 80 patterns were used for the training phase and 20 samples for the testing phase.
Mehta M. et al. [2] developed an Automatic Cheque Processing System (for English) that considers forgery detection. An account holder gives cheques to another person as account-payee or self cheques, and a number of forgery cases have been registered in which some person has forged the signature of another person and provided a self cheque to himself. They propose a mechanism for the recognition of cheque fields, like name and amount, that also verifies the signature and its authenticity. It is a unique two-stage model of automatic cheque processing that detects skilled forgery in the signature by combining two feature types, namely sum graph and HMM, and classifying them with a knowledge-based classifier and a probabilistic neural network; the HMM is used as a feature rather than as a classifier, as widely proposed by most authors in signature recognition. A good correct classification rate is reported for any number of classes: the lowest rate of correct classification is 86% and the highest is 92%.
Dhandra B. V. et al. [17] developed zone-based features for mixed handwritten and printed Kannada digit recognition. In the field of Optical Character Recognition (OCR), zoning is used to extract topological information from patterns. They propose zone-based features for recognition of a mixture of handwritten and printed Kannada digits: a digit image is divided into 64 zones and the pixel density is computed for each zone, sequentially over the entire image, so that finally 64 features are extracted for classification and recognition. Some zone columns/rows may have no foreground pixels; the feature value of such a zone column/row in the feature vector is zero. KNN classifiers are used to classify the mixed handwritten and printed Kannada digits, obtaining 97.32% and 98.30% recognition rates for mixed handwritten and printed Kannada digits, respectively.
Dinesh Acharya U. [11] proposed multilevel classifiers for the recognition of handwritten Kannada numerals. The recognition of handwritten numerals is an important area of research because of its applications in post offices, banks and other organizations. The paper presents automatic recognition of handwritten Kannada numerals based on structural features. Five different types of features, namely profile-based 10-segment string, water reservoir, vertical and horizontal strokes, end points, and average boundary length from the minimal bounding box, are used in the recognition of numerals. The effect of each feature and their combination in numeral classification is analyzed using nearest neighbor classifiers. It is common to combine multiple categories of features into a single feature vector for classification; instead, separate classifiers can be used to classify based on each visual feature individually, and the final classification can be obtained by combining the separate base classification results.
Dhandra B. V. et al. [18] developed a script-independent approach for handwritten bilingual Kannada and Telugu digit recognition based on zone features. The digit image is divided into 64 zones and, for each zone, the pixel density is computed. Feature extraction is the problem of extracting the relevant information from the preprocessed data for classification of the underlying objects/characters; the preprocessed digit image is used as the input for feature extraction. For extracting the potential features from the handwritten digit image, the frame containing the preprocessed/normalized image is divided into non-overlapping zones of size 8 x 8, giving 64 zones. For each zone the pixel density is computed, and these pixel densities are used as features for recognition; hence, a 64-feature vector is used for recognition of a digit. KNN classifiers are employed to classify the Kannada and Telugu handwritten digits independently, achieving average recognition accuracies of 95.50%, 96.22% and 99.83%, 99.80%, respectively. For bilingual digit recognition, the KNN classifiers achieve average recognition accuracies of 96.18% and 97.81%, respectively.
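The zone-based feature extraction described in [17] and [18] reduces a preprocessed digit image to 64 pixel densities. A minimal sketch of that idea in Python, assuming a binary image whose sides are divisible by 8:

```python
import numpy as np

def zone_density_features(binary_img: np.ndarray, grid=8) -> np.ndarray:
    """Split a binary digit image into grid x grid zones; return pixel densities."""
    h, w = binary_img.shape
    zh, zw = h // grid, w // grid
    # A zone with no foreground pixels naturally yields a density of zero.
    feats = [binary_img[r * zh:(r + 1) * zh, c * zw:(c + 1) * zw].mean()
             for r in range(grid) for c in range(grid)]
    return np.array(feats)                # 64 features when grid = 8

digit = np.random.rand(64, 64) > 0.5      # stand-in for a normalized digit
print(zone_density_features(digit).shape)  # (64,)
```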
José Eduardo Bastos dos Santos [7] developed text extraction in bank cheque images using shape features: mean, standard deviation, skewness, range, solidity, extent, and area. A subtraction approach is used, meaning the empty cheque template is subtracted from the filled cheque; in this approach some information is lost owing to geometrical distortions, cheque alignment, and graphical security elements. The accuracy depends on the data sample; the reported result is 93%.
3. APPLICATIONS
The applications can be found in the following areas:
Banking system: used for money transactions and withdrawal of amounts.
Insurance companies: used to pay premiums and withdraw the annual amount.
Gold loan companies: used for repaying loans.
Government offices: used for issuing salaries to employees.
4. CHALLENGES
Segmentation of cheque
Identification of handwritten words
Identification of word denomination
Identification of numeral denomination
Comparison of word and numeral denomination
Person identification based on signature
Writer identification
5. MOTIVATION
Manual cheque processing is time consuming
Processing handwritten text is challenging, especially for regional languages
Only a few works are reported in the literature on handwritten word recognition in Kannada
6. OBJECTIVES
Collection of dataset.
Pre-processing.
Identification of word and numeral denomination.
Study of suitable classifier models will be carried out in this work.
7. CONCLUSION
In this paper we have reviewed work on the recognition of Kannada handwritten bank cheque words. The recognition of Kannada handwritten words is a very difficult task because handwriting differs from person to person.
8. REFERENCES
[1] Jayadevan R., Kolhe S.R., Patil P.M. and Pal U., 2011. Database development and recognition of handwritten Devanagari legal amount words (Hindi). International Conference on Document Analysis and Recognition.
[2] Mehta M., Sanchati R. and Marchya A., 2010. Automatic cheque processing system.
[3] Patil P.B. and Ramakrishna A.G., 2008. Word level multi-script identification. Pattern Recognition Letters, Vol. 29, pp. 1218-1229.
[4] Guru D.S., Ravikumar M. and Harish B.S., 2011. A review on offline handwritten script identification. International Journal of Computer Applications (0975-8878), National Conference on Advanced Computing and Communications.
[5] Madasu V.K., Yusof M.H.M., Hanmandlu M. and Kubik K., 2010. Automatic extraction of signatures from bank cheques and other documents. Intelligent Real-Time Imaging and Sensing group, School of Information Technology and Electrical Engineering, University of Queensland, QLD 4072, Australia.
[6] Jayadevan R., Kolhe S.R., Patil P.M. and Pal U., 2011. Automatic processing of handwritten bank cheque images: a survey.
[7] Santos J.E.B.D., 2010. Text extraction in bank cheque images: a prospective view.
[8] Mingqiang Y., Kidiyo K. and Joseph R., 2008. A survey of shape feature extraction techniques. In: Pattern Recognition, Peng-Yeng Yin (Ed.), pp. 43-90.
[9] Zhang D. and Guojun Lu, 2004. Review of shape representation and description techniques. Pattern Recognition, Vol. 37.
[10] Pal U., Sharma N. and Kimura F., 2006. Recognition of handwritten Kannada numerals. 9th International Conference on Information Technology (ICIT'06), pp. 133-136.
[11] Dhandra B.V., Mukarambi G. and Hangarge M., 2011. Kannada and English numeral recognition system. International Journal of Computer Applications (0975-8887), Vol. 26, No. 9, July 2011.
[12] Shreedharamurthy S.K. and Sudarshana Reddy H.R., 2012. Neural network based Kannada numerals recognition system. International Journal of Computer Applications (0975-8878), National Conference on Advanced Computing and Communications - NCACC, April 2012.
[13] Kruizinga P., Petkov N. and Grigorescu S.E., 1999. Comparison of texture features based on Gabor filters. Proceedings of the 10th International Conference on Image Analysis and Processing, Venice, Italy, September 27-29, 1999, pp. 142-147.
[14] Amayeh G., Tavakkoli A. and Bebis G., 2008. Accurate and efficient computation of Gabor features in real-time applications.
[15] Liao S., Zhu X., Lei Z., Zhang L. and Li S.Z., 2011. Learning multi-scale block local binary patterns for face recognition; Heikkilä M., Pietikäinen M. and Schmid C., 2011. Description of interest regions with local binary patterns. INRIA Grenoble, 655 Avenue de l'Europe, 38330 Montbonnot, France.
[16] Ilonen J., Kämäräinen J.-K. and Kälviäinen H., 2005. Efficient computation of Gabor features. Department of Information Technology, Lappeenranta University of Technology, P.O. Box 20, FIN-53851 Lappeenranta, Finland.
[17] Mallikarjun Hangarge and Dhandra B.V., 2010. Offline handwritten script identification in document images. International Journal of Computer Applications, Vol. 4, pp. 6-10.
[18] Dhandra B.V., Mallikarjun Hangarge and Gururaj Mukarambi, 2010. Spatial features for handwritten Kannada and English character recognition. Special Issue on RTIPPR-10, International Journal of Computer Applications, pp. 146-150, Aug 2010.
A Review on Automation of Ayurvedic Plant Recognition
Pradeep Kumar N PG Department of Computer Science,
JSS College of Arts, Commerce and Science, Ooty Road, Mysore-25
Abstract
Plant classification is a more challenging task than the classification of other categories such as faces or objects. The tribal people in India classify plants according to their medicinal values. In the system of medicine called Ayurveda, identification of medicinal plants is considered an important activity in the preparation of herbal medicines. We have obtained around 3000 images of medicinal plants, classified into two types: regular medicinal plants and Bonsai medicinal plants. For each type, around 70 classes of plant species were created, each class with around 25 images. The images of medicinal plants are of different poses, with cluttered backgrounds, under various lighting and climatic conditions.
Keywords: Ayurveda, computer vision, database, challenges
------------------------ ---------------------------
1. Introduction:
The tribal people in India classify plants according to their
medicinal values. In the system of medicine called Ayurveda,
identification of medicinal plants is considered an important activity in
the preparation of herbal medicines.
Ayurveda, the science of life, prevention and longevity, is the oldest and most holistic medical system available on the planet today. Originating in India over 5000 years ago, it was said to be a world medicine dealing with both body and spirit. Ayurvedic medicine, in the United States, is an "alternative" medical practice that claims to be based on the traditional medicine of India.
Ayurveda is derived from two Sanskrit terms: ayu, meaning life, and veda, meaning knowledge or science; hence Ayurveda is "the knowledge for long life". Ayurveda has long been the main system of health care in India, although conventional (Western) medicine is becoming more widespread there, especially in urban areas. About 70 percent of India's population lives in rural areas, and about two-thirds of rural people still use Ayurvedic medicinal plants to meet their primary health care needs.
Medicinal plants form the backbone of the system of medicine called Ayurveda and are useful in the treatment of certain chronic diseases. Ayurveda is considered a form of alternative to allopathic medicine in the world. This Indian system of medicine has a rich history, and ancient epigraphic literature speaks of its strength. Ayurveda brings substantial revenue to India through the export of Ayurvedic medicines, as many countries are inclining towards this system of medicine. Owing to considerable depletion in the populations of
certain species of medicinal plants, we need to grow these plants in India.
It is necessary to make people realize the importance of medicinal
plants before their extinction. It is important for Ayurveda practitioners
and also traditional botanists to know how to identify and classify the
medicinal plants through computers.
The project relates to the development of design methodologies in image processing and pattern recognition, together with knowledge of the taxonomy of Ayurvedic plants. The proposal is interdisciplinary: two different fields, Computer Science and Ayurveda, are brought together for the purpose of preserving information about Ayurvedic plants for future generations.
Medicinal plants are classified based on internal and external features, the external features being helpful in their identification. According to plant taxonomy, plants are classified based on the shapes of their leaves and flowers; classification based on color histograms, edge direction and edge histograms, by contrast, is not attempted by human beings.
2. Related work
S M Patil proposed a content-based image retrieval system using color, texture and shape features. They use 700 images to train the system. They use a color histogram in the HSV color model to represent the color distribution of an image, and recall and precision measures to evaluate retrieval performance.
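A minimal sketch of such an HSV colour histogram follows (the bin counts are our assumption, since the paper does not report its quantization; the file name is a placeholder):

    import cv2

    img = cv2.imread("plant.jpg")                 # placeholder file name
    hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)    # move to the HSV colour space

    # 3-D colour histogram: 8 hue x 4 saturation x 4 value bins
    hist = cv2.calcHist([hsv], [0, 1, 2], None, [8, 4, 4],
                        [0, 180, 0, 256, 0, 256])
    hist = cv2.normalize(hist, hist).flatten()    # 128-dimensional feature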
Dubey R.S. et al. proposed a multi-feature content-based image retrieval system. They use a color histogram to represent the distribution of colors in an image, working with the RGB and HSV color space models, and they use the first-order moment (mean), second-order moment (standard deviation) and third-order moment (skewness) of each channel.
They use the color histogram, color moments, texture and an edge histogram. To extract the edge features, digital filters are applied to each image block in the spatial domain: the image is divided into 4 sub-blocks, the average gray level is assigned to each sub-block, and filter coefficients for the vertical, horizontal, 45-degree diagonal and 135-degree diagonal directions are applied. They use the Euclidean distance as the similarity measure and combine all four features in this work. They use 11 images, one as test data and the others as training images, and obtain 78% accuracy for the color histogram with the 10th image.
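The three colour moments used above have a direct implementation: per channel they are the mean, the standard deviation and the skewness of the pixel values. A small sketch (the function name is ours):

    import numpy as np
    from scipy.stats import skew

    def color_moments(image):
        """First three moments (mean, standard deviation, skewness) of
        each colour channel of an H x W x 3 image."""
        feats = []
        for ch in range(image.shape[2]):
            pixels = image[:, :, ch].ravel().astype(float)
            feats += [pixels.mean(), pixels.std(), skew(pixels)]
        return np.array(feats)  # 9 values for an RGB or HSV image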
Singh S.M. and Hemachandran K. worked on content-based image retrieval using color moments and Gabor texture features, employing low-level color and texture features. For color feature extraction they convert the RGB image to the HSV color space, divide the image into three equal horizontal non-overlapping regions, and extract the moments from those regions. The distance between the query image and each database image is then calculated using the Canberra distance measure. They obtained an accuracy of 43.6% using color moments on the whole image, and 59.0% using color moments with the image divided into three equal non-overlapping regions.
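For reference, the Canberra distance between feature vectors x and y is d(x, y) = sum_i |x_i - y_i| / (|x_i| + |y_i|); a direct NumPy version (the eps guard is our addition to avoid division by zero):

    import numpy as np

    def canberra(x, y, eps=1e-12):
        """Canberra distance between two feature vectors."""
        x, y = np.asarray(x, float), np.asarray(y, float)
        return np.sum(np.abs(x - y) / (np.abs(x) + np.abs(y) + eps))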
Bhuravarjula H.H.P.K. and Kumar V.N.S.V. proposed a novel content-based image retrieval method using variance color moments. As the creation and digitization of images and image retrieval have become easier, huge image databases have become more popular, and the area of retrieving images based on the visual content of a query picture has intensified recently, demanding a wide spectrum of image processing methodology. Content-Based Image Retrieval (CBIR) has therefore evolved into a necessity, and it is very important to design CBIR systems that retrieve images from a database efficiently. The paper proposes a color image retrieval method based on the primitives of color moments. First, the image is divided into four segments. Then the color moments of all segments are extracted and clustered into four classes. Next, the mean moments of each class are taken as primitives of the image. All the primitives are used as features, and the class means are merged into a single class mean. The distance between the query image mean and the corresponding database image means is calculated using the sum of absolute differences (SAD) method. The analysis results showed that CBIR using this method performs better than the existing method.
Husin et al. (2012) introduced an embedded portable device for herb leaf recognition using image processing techniques and a neural network algorithm. The paper presents a device capable of herb species recognition and classification based on the structural characteristics of leaves. A novel individual-leaf-extraction computer program was developed based on grayscale conversion, the Canny edge detector and a back-propagation neural network algorithm. With the use of a computer and an embedded system, automated classification of herb leaves becomes more convenient and efficient. Using the BPNN, rapid
recognition of twenty kinds of herb species leaves was realized, and the average correct recognition rate reached 98.9%. The training set contains a minimum of 30 samples for each type of leaf in each data file; the larger the number of samples used in the training set, the higher the number of output nodes, which enhances the recognition ability.
Guru D.S. et al. investigate the effect of texture features on the classification of flower images. A flower image is segmented by eliminating the background using a threshold-based method. The texture features, namely color texture moments, the gray-level co-occurrence matrix and Gabor responses, are extracted, and combinations of these three are considered in the classification of flowers. In this work, a probabilistic neural network is used as the classifier. To corroborate the efficacy of the proposed method, an experiment was conducted on their own data set of 35 classes of flowers, each with 50 samples. The data set has different flower species with similar appearance (small inter-class variations) across different classes and varying appearance (large intra-class variations) within a class. Also, the images of flowers are of different poses, with cluttered backgrounds, under various lighting and climatic conditions. The experiment was conducted for various dataset sizes to study the effect on classification accuracy, and the results show that combining multiple features vastly improves performance, from 35% for the best single feature to 79% for the combination of all features. A qualitative comparative analysis of the proposed method with other well-known state-of-the-art flower classification methods is also given in the paper to highlight the superiority of the proposed method.
Yu et al. introduced color texture moments (CTM) for content-based image retrieval (CBIR). They proposed CTM as a novel low-level feature for content-based image retrieval systems, adopting the local Fourier transform (LFT) as a texture representation scheme and deriving eight characteristic maps that describe different aspects of the co-occurrence relations of image pixels. They then calculate the first and second moments of these maps as a representation of the distribution of natural color image pixels. The LFT is operated in a color space that not only corresponds to visual perception but also overcomes some shortcomings of the HSV color space. Experiments on an image library containing 10,000 Corel images and 200 queries demonstrate the effectiveness of the new method.
Singh and Hemachandran worked on content-based image retrieval using color moments and Gabor texture features, employing low-level color and texture features. To extract the low-level texture features they apply Gabor filters to the image at 4 scales and 6 orientations, obtaining an array of magnitudes; the mean and standard deviation of the magnitudes are used to create the texture feature. The Canberra distance measure is used for computing distances. For this work they use the WANG database, containing 1000 Corel images in JPEG format, arranged in 10 categories with 100 images per category. They combine the color and texture features, and use precision and recall measures to assess performance. They obtained an accuracy of 43.6% with the Gabor texture feature, both for the whole image and with the image divided into three equal non-overlapping regions, and an accuracy of 61.0% when combining the Gabor texture feature and the color moment feature with the image divided into three equal non-overlapping regions.
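A sketch of such a 4-scale, 6-orientation Gabor feature extractor is given below; the kernel size and wavelength schedule are illustrative assumptions, as the paper does not list its exact filter settings:

    import cv2
    import numpy as np

    def gabor_features(gray, scales=4, orientations=6):
        """Mean and standard deviation of Gabor response magnitudes for a
        bank of scales x orientations filters."""
        feats = []
        for s in range(scales):
            for o in range(orientations):
                theta = o * np.pi / orientations
                lambd = 4.0 * (2 ** s)              # wavelength grows with scale
                kernel = cv2.getGaborKernel((31, 31), sigma=lambd / 2,
                                            theta=theta, lambd=lambd,
                                            gamma=0.5, psi=0)
                response = cv2.filter2D(gray.astype(np.float32), -1, kernel)
                mag = np.abs(response)
                feats += [mag.mean(), mag.std()]
        return np.array(feats)                      # 4 x 6 x 2 = 48 values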
Wu et al. worked on a leaf recognition algorithm for plant classification using a probabilistic neural network. They employ a Probabilistic Neural Network (PNN) with image and data processing techniques to implement a general-purpose automated leaf recognition system for plant classification. 12 leaf features are extracted and orthogonalized into 5 principal variables, which constitute the input vector of the PNN. The PNN is trained on 1800 leaves to classify 32 kinds of plants with an accuracy greater than 90%. Compared with other approaches, the algorithm is an accurate artificial-intelligence approach that is fast in execution and easy to implement.
In the preprocessing stage they convert the RGB image into a grayscale image and then into a binary image. To enhance the boundary of the leaf they apply smoothing and then filter the image using a 3 x 3 Laplacian spatial mask.
In this work they extract five basic geometric features: diameter (the longest distance between any two points on the leaf margin), physiological length (the distance between the two terminals of the main vein of the leaf), physiological width (the longest distance between points on intersection pairs), leaf area (the number of pixels of binary value one in the smoothed leaf image) and leaf perimeter (the number of pixels making up the leaf margin).
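Of these, leaf area and perimeter reduce to pixel counting on the binary mask; the sketch below (our formulation, not the authors' code) counts foreground pixels for the area and foreground pixels touching the background for the perimeter:

    import numpy as np

    def area_and_perimeter(mask):
        """Leaf area and perimeter from a binary leaf mask (1 = leaf)."""
        mask = np.asarray(mask, dtype=bool)
        area = int(mask.sum())
        padded = np.pad(mask, 1, constant_values=False)
        # a pixel is interior if all four 4-neighbours are also foreground
        interior = (padded[:-2, 1:-1] & padded[2:, 1:-1] &
                    padded[1:-1, :-2] & padded[1:-1, 2:])
        perimeter = int((mask & ~interior).sum())
        return area, perimeter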
Based on these 5 basic features, 12 digital morphological features are defined for leaf recognition: smooth factor, aspect ratio, form factor, rectangularity, narrow factor, perimeter ratio of diameter, perimeter ratio of physiological length and width, and five vein features.
The 12 digital morphological shape features are thus derived from the 5 basic geometric features. To reduce the dimension of the input vector they use principal component analysis (PCA). The probabilistic neural network is used as the classifier; it simulates the thinking process of the human brain and has a fast training speed. With this setup they obtained about 90% accuracy.
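The orthogonalization step corresponds to standard principal component analysis; a minimal sketch with scikit-learn (the random matrix stands in for the real 1800 x 12 feature table):

    import numpy as np
    from sklearn.decomposition import PCA

    X = np.random.rand(1800, 12)      # stand-in for 12 morphological features
    pca = PCA(n_components=5)         # orthogonalize into 5 principal variables
    X_reduced = pca.fit_transform(X)  # PNN input vectors, shape (1800, 5)
    print(pca.explained_variance_ratio_.sum())  # fraction of variance retained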
Arribas et al. introduced leaf classification in sunflower crops by computer vision and neural networks. They present an automatic leaf-image classification system for sunflower crops using neural networks, which could be used in selective herbicide application. The system comprises four main stages. First, a segmentation based on the rgb color space is performed. Second, many different features are detected and extracted from the segmented image. Third, the most discriminative set of features is selected. Finally, the Generalized Softmax Perceptron (GSP) neural network architecture is used in conjunction with the recently proposed Posterior Probability Model Selection (PPMS) algorithm for complexity selection, in order to detect the leaves in an image and classify them as sunflower or non-sunflower. The experimental results show that the proposed system achieves a high level of accuracy with only five selected discriminative features, obtaining an average correct classification rate of 85% and an area under the receiver operating characteristic curve of over 90% on the test set. From the literature survey it is understood that color, texture and shape features play a major role in plant classification.
They propose 13 morphological features: perimeter, centroid, area, major axis of the best-fit ellipse, minor axis of the best-fit ellipse, height, width, area-to-length ratio, compactness (ratio of area to the perimeter squared), elongation, logarithm of the height/width ratio, perimeter-to-broadness ratio (PTB), and length-to-perimeter ratio.
From this literature survey it is clear that substantial research has been done on plant classification based on the flower and leaf parts. Research on plant classification based on the entire plant is much scarcer, and such approaches achieve lower accuracy. This motivated us to make a small attempt in this direction.
The most challenging tasks in the recognition of Ayurvedic plants are as follows:
1. Segmentation of plants from the background
2. Selection of suitable features for identification of plants
3. Study of suitable classifiers
4. The dataset also poses a challenge, as there is no standard dataset for this purpose.
Hence, in this work our main interest is to study the effect of shape, color and texture features in plant classification, with the primary objective of creating a large database of medicinal plants. Collecting the images is the main task in this work, and we have collected more than 3000 images. Image acquisition is challenging because, in an uncontrolled environment, lighting and brightness change with the climatic conditions and the seasons.
3. Ayurvedic Plant Database
Creation of the Ayurvedic plant database involved two stages, namely:
1. Image Acquisition.
2. Plant Segmentation.
3.1. Image Acquisition
We have obtained around 3000 images of medicinal plants using a color digital camera with a resolution of 8 megapixels. Views in all 8 directions, namely right, left, top, bottom and the diagonals, are obtained. The images of plants are captured in such a way that the plant trunk and leafy portion are prominent, with a clear difference in color and edge pattern.
3.2. Plant Segmentation
Before features are extracted from a plant image, the plant has to be segmented. The images of medicinal plants are filtered manually to remove any noise introduced at the time of acquisition. The goal is to automatically segment out the plant given only that the image is known to contain a plant, with no other information on the class or pose. Plants in images are often surrounded by greenery in the background, which makes segmentation difficult.
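The segmentation method is not fixed here; as a crude colour-threshold baseline (entirely our assumption, not the method of this work), one might keep pixels whose hue falls in a green band, although with greenery in the background this alone is insufficient, which is precisely why segmentation is listed among the challenges:

    import cv2

    img = cv2.imread("plant.jpg")                       # placeholder file name
    hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)
    mask = cv2.inRange(hsv, (25, 40, 40), (95, 255, 255))  # rough green band
    plant = cv2.bitwise_and(img, img, mask=mask)        # candidate plant pixels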
3.3. Dataset Information
Images of different medicinal plants are considered in this work. We have collected plants like Tulasi (Ocimum sanctum), Dodda Patre (Cuban oregano), Aloe vera (Laval sara), Ekki (Gigantea), Tumbe (Dronapuspi), Vajravalli, etc. These plants are very useful in daily life as medicine.
3.3.1. Dataset Collection
In this project work we have created our own database. In order to create the database, we took photographs of plants found in and around Mysore city, Karnataka, India.
Table.1 Dataset Collection Information
TYPE NO. OF CLASSES NO. OF IMAGES PER CLASS
Regular Ayurvedic Plants 72 [20-25]
Bonsai Plants 67 [20-25]
3.3.2. Regular medicinal plants database
Table.2 Regular Medicinal Plants Dataset List
REGULAR MEDICINAL PLANTS DATABASE
Sl. No. Scientific Name Kannada Name
1 Acacia Catechu Kaggali
2 Adhatoda Vasica Aadusoge
3 Alangium Lamarckii Ankole Mara
4 Aloe Vera Loka Sara
5 Alpinia Officinarum Sannaraashme
6 Annona Squamosa Seethaphala
7 Areca Catechu Adake
8 Artabotrys Odouratissimus Manoranjini
9 Asparagus Adscendens Safeidh Musalee
10 Asparagus Racemosus Aashaadhi beru
11 Averroha Carambola Kamaraakshi
12 Azadirachta Indica (Neem) Bevu
13 Bauhinia Tomentosa Kanchuvala
14 Calatropis Procera Yakka
15 Calophyllum Inophyllum SuraHonne
16 Cassia Auriculata Aavarike
17 Cassia Fistula Kakke
18 Citrus Medica Maadhala
19 Clerodendron Infortunatum Vishaprahari
20 Clitorea Terneatea Girikarnike
21 Coleus Spicatus Dhoddapathri
22 Dalbergia Latifolia Thodagatta or Beete
23 Datura Alba Bili Ummatti
24 Eclipta Alba Garagadha Soppu
25 Emblica Officinalis Nelli
26 Eriodendron Anfractuosum BiliBhooruga
27 Eugenia Jambolana Jamnerele
28 Ficus Religiosa Arali
29 Gardenia Gummifera Dikkaamali
30 Gauzuma Tomentosa Bhadhraaksha
31 Gmelina Arborea Shivani
32 Grewia Asiatica Dhadasala hannu Phaalasaa
33 Hibiscus Rosa Sinensis Dhasavaala
34 Ixora Coccinea Kepala
35 Lawsonia Alba Gowrantee
36 Moringa oleiera Nugge
37 Murraya Koenigii Karibevu
38 Musa Sapientum Baale
39 Myrtus caryophyllata Lavanga
40 Ocimum Basilicum Tulasi
41 Psidium Guajava Sheebe
42 Pterocarpus Santalinus RakthaChandhana
43 Pterospermum Acerifolium Kanakachampa
44 Punica Granatum Dhalimbe
45 Putranjiva Roxburghii Puthrajivi
46 Santalum Album Shri Gandha
47 Sapindus Trifoliatus Antuvaala
48 Saraca Ashoka Ashoka
49 Solanum Indicum Heggulla
50 Solanum Trilobatum Habbu Sonde gida
51 Stereospermum Suaveolens Phaadhari
52 Streblus Asper Mettlimara
53 Tabebuia Rosea NA
54 Tamarindus Indica Hunuse
55 Terminalia Tomentosa Karimatti
56 Thevetia Nerifolia Haladhi Kanigale
57 Withania Somnifera Hiremaddhu
58 NA Amrutha Balli
59 NA Ala
60 NA Bhraamhi
61 NA Thengu
62 NA Dharbe
63 NA Gasagase
64 NA Halasu
65 Zizyphus Jujuba Elachi
66 NA Maavu
67 NA Nimbe
68 NA Papaya(parangi)
69 NA Shunti
70 NA Thumbe
71 NA Vajravalli
72 NA Veelya
3.3.3. Bonsai plants database
Table.3 Bonsai Plant Dataset List
BONSAI PLANTS DATABASE
Sl. No. Scientific Name Kannada Name
1 Acacia Catechu Kaggali
2 Acacia Catechu Kaggali
3 Akasha Mallige NA
4 Almonda Hindhola
5 Aloe Barbadensis Kumaari
6 Angle Marmelos Chitha(Billwa)
7 Aralia Cordata NA
8 Artocarpus Heterophyllus Halasu
9 Australian Ficus NA
10 Australian Natalensis Brundhavana Saaranga
11 Borassus Flabellifer Toddy Palm
12 Bougainvillea NA
13 Brazilan Rain Tree NA
14 Candle Tree NA
15 Casuarina Equisetifolia NA
16 Cicca Acida NA
17 deva parijatha deva manohari
18 Divi Divi Kalyani
19 Eragrostis Cynosuroides Darbe
20 Ficus Apolo NA
21 Ficus Bengalensis NA
22 Ficus Benjamina NA
23 Ficus Blackenea NA
24 Ficus Bodhi NA
25 Ficus Curlie NA
26 Ficus Glomerata (Vrushabha) NA
27 Ficus Hispida Vajra kanthi
28 Ficus Infectoria NA
29 Ficus Infectoria Basari(Uthara)
30 Ficus Jaquinia NA
31 Ficus Jaquinia on rock NA
32 Ficus Jaqvinia NA
33 Ficus Lipstic NA
34 Ficus Lipstick (Cascade) NA
35 Ficus Long Island Hari Narayani
36 Ficus Microcarpa NA
37 Ficus Mysorensis NA
38 Ficus Natalensis NA
39 Ficus Nuda NA
40 Ficus Pilkhan NA
41 Ficus Religiosa NA
42 Ficus Retusa Rasika ranjini
43 Ficus Retusa Vajra kanthi
44 Ficus Specis NA
45 Ficus Tallboti NA
46 Grevillea Robusts Ralia NA
47 Jaquina Ruscifolia NA
48 Juniperus (semi Cascade) NA
49 Kannada gowla NA
50 kirni punnagavaraali
51 Christmas Tree Malaya Maarutha
52 Mangifera Indica NA
53 Mangifera Indica Maavu
54 Mimsops Elengi Ranjalu
55 Phyllanthus distichus lavali
56 Phyllanthus Emblica Betta nallikayi
57 Podocarpus Polystachyus NA
58 Punarvasu Bambo (Bhidhiru)
59 Sand Paper Ficus NA
60 Sapota (Variegated) NA
61 Saraca Indica Ashoka
62 Sceffera Arboricola Cultivar NA
63 Schefflera NA
64 Sonefflera arboricola NA
65 Spondias Mangifera Aamate(Hastha)
66 Tamarindus Indica NA
67 Vitis Quadrangularis Asthi Shrunkala
Conclusion
In this work we have created a database of two different classes of medicinal plants. The images were captured and segmented, and the database is made publicly available.
REFERENCES
Anami B.S., Nandyal S.S. and Govardhan A., 2010. A combined color, texture and edge features based approach for identification and classification of Indian medicinal plants. International Journal of Computer Applications (0975-8887), Vol. 6, No. 12.
Wu S.G., Bao F.S., Xu E.Y., Wang Y.X. and Xiang Q.L., 2007. A leaf recognition algorithm for plant classification using probabilistic neural network.
Yu H., Li M., Zhang H.J. and Feng J. Color texture moments for content-based image retrieval.
Husin Z., Shakaff A.Y.M., Aziz A.H.A., Farook R.S.M., Jaafar M.N., Hashim U. and Harun A., 2012. Embedded portable device for herb leaves recognition using image processing techniques and neural network algorithm. Computers and Electronics in Agriculture, Vol. 89, pp. 18-29.
Arribas J.I., Sanchez-Ferrero G.V., Ruiz-Ruiz G. and Gomez-Gil J., 2011. Computers and Electronics in Agriculture, Vol. 78, pp. 9-18.
Patil S.M. International Journal of Computer Science & Engineering Technology.
Dubey R.S., Choubey R. and Bhattacharjee J., 2010. International Journal on Computer Science and Engineering, Vol. 02, No. 06, p. 2149.
Singh S.M. and Hemachandran K., 2012. International Journal of Computer Science Issues, Vol. 9, Issue 5, No. 1, September.
Bhuravarjula H.H.P.K. and Kumar V.N.S.V., 2012. A novel content based image retrieval using variance color moment. International Journal of Computer and Electronics Research, Vol. 1, Issue 3.
Guru D.S., Kumar Y.H.S. and Manjunath S., 2011. Texture features in flower classification. Mathematical and Computer Modelling, Vol. 54, pp. 1030-1036.
Guru D.S., Mallikarjuna P.B., Manjunath S. and Shenoi M.M., 2012. Intelligent Automation and Soft Computing, Vol. 18, No. 5, pp. 577-586.
A Review on Neurological Disorders
Mr. Maheswara Prasad1, Dr. Manjunatha Rao L2, Research Scholar, CMJ University, Meghalaya, India
Abstract:
Defining the major neurodegenerative diseases is like defining the continent of Europe: part history, part science, part politics, and, to cap it, both can have an effect on health and prosperity. A big advantage of the term is that it is a concept patients can relate to through parallels in everyday life: the wearing out, over time, of certain components, sometimes replaceable and sometimes not. It encompasses the principle of selective neuronal death as a primary event, with age as a major risk factor and good remedies patchy.
Though the causes may differ, patients with neurodegenerative disorders are likely to show localized to generalized atrophy of brain cells, leading to compromise of both mental and physical functions. Mentally, patients may exhibit forgetfulness, poor memory, decreased mental capacity, emotional disturbance, poor speech, etc. Physically, patients may exhibit partial to complete incontinence, aspiration of food particles, tremor, poor balance, muscle rigidity, muscle paralysis, etc. These decreases in mental and physical functions dramatically reduce the quality of life of patients and increase the burden on family and care-takers.
------------------------ ---------------------------
1.0 Introduction:
Defining neurodegenerative diseases is like defining the continent of Europe: part history, part science, part politics, and, to cap it, both can have an effect on health and prosperity. A big advantage of the term is that it is a concept patients can relate to through parallels in everyday life: the wearing out, over time, of certain components, sometimes replaceable and sometimes not. It encompasses the principle of selective neuronal death as a primary event, with age as a major risk factor and good remedies patchy.
Though the causes may differ, patients with neurodegenerative disorders are likely to show localized to generalized atrophy of brain cells, leading to compromise of both mental and physical functions. Mentally, patients may exhibit forgetfulness, poor memory, decreased mental capacity, emotional disturbance, poor speech, etc. Physically, patients may exhibit partial to complete incontinence, aspiration of food particles, tremor, poor balance, muscle rigidity, muscle paralysis, etc. These decreases in mental and physical functions dramatically reduce the quality of life of patients and increase the burden on family and care-takers.
1.1. Trace elements.
The elemental constituents of a biological material can be placed into three groups: the major, minor and trace elements. The major and minor elements make up 99% of the total constituents of biological matter, while the remaining 1% are known as the trace elements. The trace elements act primarily as catalysts in the enzyme systems of cells, where they serve a wide range of functions [Val71].
1.1.1. Essential and non-essential trace elements.
Approximately twenty-six of the ninety naturally occurring elements are known to be essential for animal and human life [Und77]. The majority of trace elements essential to life lie between atomic numbers 23 and 34. Many definitions have been formulated to determine whether an element is essential, such as that of Mertz, who stated that "an element is to be considered essential if its deficiency consistently results in impairment of a function from optimal to sub-optimal" [Mer70]. Cotzias [Cot67] also gave a comprehensive definition listing six sets of criteria for determining whether or not a trace element is essential.
1.1.2. Toxic trace elements.
As well as the essential and non-essential trace elements, there is a further group of 'toxic' trace elements. The definition of a toxic element is complicated, as elements which are essential to everyday life can also be deemed toxic when their concentration is either too high or too low. This can change elements such as iron, iodine and copper from being essential under normal conditions into toxic elements, and categorise them with lead, cadmium and mercury, which have potentially toxic properties at even the lowest concentrations. Such imbalances can have a disastrous effect on the long-term welfare of human and animal populations. Examples of essential elements known to have a toxic or detrimental effect at high or low concentrations are iron, iodine and copper: iron deficiency causes anaemia, and iodine deficiency is associated with goitre [Hey84]. But it must always be remembered that 'safe' dietary levels of these potentially toxic trace elements also exist. The required concentration of an element for 'normal' function can also vary depending on the extent to which other
elements that affect its absorption and retention are present. These considerations apply to all the trace elements to varying degrees, but some are affected more than others.
1.2 PARKINSON’S DISEASE
1.2.1 Introduction : Parkinson's disease may be one of the most baffling
and complex of the neurological disorders. Its cause remains a mystery
but research in this area is active, with new and intriguing findings
constantly being reported.
Parkinson's disease belongs to a group of conditions called motor
system disorders. The four primary symptoms are tremor or trembling
in hands, arms, legs, jaw, and face; rigidity or stiffness of the limbs and
trunk; bradykinesia or slowness of movement; and postural instability
or impaired balance and coordination. As these symptoms become more
pronounced, patients may have difficulty walking, talking, or
completing other simple tasks.
1.2.2. Causes : Parkinson's disease occurs when certain nerve cells, or
neurons, in an area of the brain known as the substantia nigra die or
become impaired. Normally, these neurons produce an important brain
chemical known as dopamine. Dopamine is a chemical messenger
responsible for transmitting signals between the substantia nigra and
the next "relay station" of the brain, the corpus striatum, to produce
smooth, purposeful muscle activity. Loss of dopamine causes the nerve
cells of the striatum to fire out of control, leaving patients unable to
direct or control their movements in a normal manner.
1.2.4. Treatment: Even for an experienced neurologist, making an
accurate diagnosis in the early stages of Parkinson's disease can be
difficult. There are, as yet, no sophisticated blood or laboratory tests
available to diagnose the disease. The physician may need to observe
the patient for some time until it is apparent that the tremor is
consistently present and is joined by one or more of the other classic
symptoms. Since other forms of parkinsonism have similar features but
require different treatments, making a precise diagnosis as soon as
possible is essential for starting a patient on proper medication.
We believe that computational models will play an increasingly important role in understanding the pathophysiology of movement disorders such as Parkinson's disease, and a number of groups are beginning to apply methodologies used in understanding central pattern generators and neuronal oscillations to the study of Parkinson's tremor. These studies may yield insights that will eventually lead to better treatments for these disorders.
The National Institute of Neurological Disorders and Stroke of the National Institutes of Health, NIH Publication No. 94-139, Parkinson's Disease: Hope Through Research, September 1994, last revised September 15, 1999. (Online) http://www.ninds.nih.gov/patients/disorder/parkinso/pdhtr.htm
1.3. Alzheimer's Disease:
1.3.1. Introduction: Alzheimer’s disease (AD) is a degenerative brain
disorder that affects 3-4 million Americans and accounts for over 70%
of all cases of dementia (Katzman, 1986). The disease is slowly
progressive; rates of progression from initial symptoms to end stage
dementia range from 2 to 25 years, with most patients in the 8 to 12
year range.
AD begins slowly. At first, the only symptom may be mild forgetfulness.
People with AD may have trouble remembering recent events, activities,
or the names of familiar people or things. Simple math problems may
become hard for these people to solve. Such difficulties may be a
bother, but usually they are not serious enough to cause alarm. AD
may have a very long preclinical course of 20 or more years with
biochemical and pathological changes preceding clinical symptoms.
Scientists at research centers across the country are trying to learn
what causes AD and how to prevent it. They also are studying how
memory loss happens. They are looking for better ways to diagnose and
treat AD, to improve the abilities of people with the disease, and to
support caregivers. Currently, no treatment can stop AD. However, for
some people in the early and middle stages of the disease, medications
including tacrine, donepezil, and velnacrine may alleviate some
cognitive symptoms.
1.4. Primary Health Care Models
People with learning disabilities are susceptible to many physical
illnesses, affecting virtually all organs and bodily systems. The
prevalence of such disorders is greater than that in the general
population and can have a considerable impact on the life of a person
with learning disabilities. Significant emotional and/or behavioural
disturbance and loss of adaptive skills may result. Subsequently
significant physical morbidity may remain inadequately treated.
Roy and Martin (1998) review the models of primary health care which
have been used so far. They are :
1.4.1. General Practitioner lead approach
Howells (1986) offered health checks to people with learning disabilities
attending a training centre and found a significant number of
unmanaged physical disorders.
Kerr (1996) carried out a comprehensive health check for 28 people with learning disabilities and compared them to matched controls in a practice-based study. The outcome was that the study group received less of the regular screening (i.e. immunisation and cervical cytology) but had more outpatient appointments and saw more specialists.
1.4.2. Specialist led approach
In this model, the specialist, usually the Consultant Psychiatrist, took
the lead in health checks. The study by Wilson and Haire (1990) in
Nottingham amongst people with learning disabilities attending an
adult training centre is an example of this model. Beange (1995) carried
out a community based study in the Northshore district of Sydney,
Australia .As in other studies, the higher incidence of unmanaged
health problems was demonstrated in both .
1.4.3. Collaborative Models
In this model, there is collaboration between the primary health care
team and the specialist services to provide comprehensive health
checks. In the first of these, a facilitator co-ordinated people with
learning disabilities having a health check at their own general
practitioner’s surgery (Martin et al., 1997). Bollard (1998) discusses a model in which the Community Learning Disability Nurse’s role was extended to work with the primary health care team; health checks offered at GP practices were performed mostly by practice nurses. Cassidy et al. (1998) describe joint clinics for people with learning disabilities where physical and psychiatric health checks are offered during a single visit to the surgery.
In order to reduce the undetected health problems which people with
learning disabilities have, there needs to be further training for medical students and general practitioners. Plant (1997) demonstrated through a confidential postal questionnaire that general practitioners often lacked confidence in caring for people with learning disabilities. There is some confusion about what constitutes a learning disability, the degrees of learning disability, the health needs that people with learning disabilities have, and the configuration of specialist services, though general practitioners are often in a position of knowing a great deal about these individuals’ social situations (Whitfield et al., 1996; Marshall et al., 1995).
(J Geriatr Psychiatry Neurol 2002; 15:38–43).
1.7. Summary:
The success of applying computational methods to understanding
neurological disease will depend upon a number of factors. Models
should be sufficiently detailed to capture the effects of the major
contributing anatomical, physiological and pharmacological processes.
However, once an understanding of the system is gained, the model
should be simple enough to allow interpretation of its results. Whereas
most modeling efforts culminate with reproduction of some subset of
the known data, it is more valuable to use the model to try out new
experimental predictions.
In general, the advantage of incorporating more biological detail into a
model is the level of detail at which predictions can be made. The major
disadvantage is the increase in model complexity as it becomes more
realistic.
The choice of level always depends upon the particular problem; however, for many problems a safe choice is to construct the model at the level of integrate-and-fire units. Such units sum their inputs and generate an individual spike of activity whenever the firing threshold is exceeded. A growing body of evidence suggests that the temporal dynamics of cell activity are critical to neural function.
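As an illustration of this level of description, a toy leaky integrate-and-fire unit can be written in a few lines (the parameter values are arbitrary and for demonstration only):

    def leaky_integrate_and_fire(inputs, tau=10.0, threshold=1.0, dt=1.0):
        """Membrane potential v decays with time constant tau, sums its
        input, and emits a spike (resetting v) when the threshold is
        exceeded."""
        v, spikes = 0.0, []
        for i in inputs:
            v += dt * (-v / tau + i)
            if v >= threshold:
                spikes.append(1)
                v = 0.0
            else:
                spikes.append(0)
        return spikes

    # a constant drive produces a regular spike train
    print(leaky_integrate_and_fire([0.15] * 50))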
Perhaps the greatest challenge to the computational approach is to
begin to explain how functional behavior emerges from the operation of
cellular-level processes. This approach may reveal unsuspected
common mechanisms operating in different disease processes (relations
between Parkinsonism and Schizophrenia, or related problems in
Alzheimer’s disease, epilepsy and dyslexia). The goal is to move from the
current situation, in which the “standard model” for a disease process is
a flow chart of interconnections between brain regions, to a conceptual
model that integrates, through simulations, the wealth of information at
the molecular, cellular, network, systems and behavioral levels.
Machine vision system for identification
of diseases in mulberry plants: A Review
Chaithra D R
PG Department of Computer Science, JSS College of Arts, Commerce and Science, Ooty Road, Mysore-25
Abstract
Machine vision has many applications in day-to-day life. In particular, the research community has given importance to the field of agriculture in developing machine-vision-based systems. In this direction, we present an overview of the development of a machine vision system for identification of diseases in mulberry plants, especially their leaves. The work carried out in this and similar areas is presented in this paper, along with the challenges that need to be addressed in developing a machine-vision-based system for identification of diseases in mulberry plants.
Keywords: Machine vision system, mulberry plant, disease identification
------------------------ ---------------------------
Introduction
Human vision lets a person see an object, image or picture and take decisions about it; in the same way, a computer can be given the power of vision to take decisions, and it is humans who give that power to the system. Computer vision is a field that includes methods for acquiring, processing, analysing and understanding images and, in general, high-dimensional data from the real world, in order to produce numerical or symbolic information, e.g. in the form of decisions.
Machine vision systems have applications in a variety of fields, such as medicine, agriculture, industry, biometrics, the military, banking and so on, with major applications in agriculture. Plant classification, i.e. identifying and classifying a plant within a given set of plant species, is one application; identification of diseases in crops is another; identification of medicinal plants and identification of the leaves of different plants are further applications.
The dramatic and worldwide spread of plant diseases is the driving force and motivation for the development of machine vision systems to identify these diseases. Naked-eye observation by experts is the main approach adopted in practice for detection and identification of plant diseases. However, this requires continuous monitoring by experts, which might be prohibitively expensive on large farms. Further, in some developing countries farmers may have to travel long distances to contact experts, which makes consulting them expensive and time consuming. There is a need for systems that can help crop producers and farmers, particularly in remote areas, to identify early symptoms of plant disease by means of analysis of digital images of crop samples.
The success of machine learning for image pattern recognition also suggests applications in the identification of plant diseases; finding a fast, automatic, inexpensive and accurate method to detect plant disease is of great practical significance.
Disease identification in plants plays a prominent role in the field of agriculture. Diseases arise in plants for different reasons: from disease-causing agents such as viruses, fungi and bacteria, from flying insects, and from deficiency of soil nutrients such as iron, nitrogen, zinc and potash. The diseases affect different regions of the plant, such as the leaves, stems and roots. Images of these affected parts of the plant are given to the system, which then identifies the disease.
Sericulture plays a major role in the field of agriculture in India and is an important commercial activity, in which the mulberry plant takes the major part. Mulberry is a hardy, perennial, deep-rooted plant with several varieties; in Karnataka the species grown are Morus alba and Morus indica. Mulberry leaves are the only food that the silkworm will eat. They are also used in the medical field to make mulberry syrup, and they contain calcium, iron, fibre, and plenty of potassium, magnesium and other minerals, so they are used for medicinal purposes.
Mulberry is affected by several diseases caused by fungi, bacteria, mycoplasma, viruses and nematodes. Diseases are also caused by
flying insects. The incidence of, and loss due to, these diseases vary with season, variety and cultivation practices.
Among the fungal diseases, leaf spot is caused by Cercospora moricola, its symptom being irregular brownish spots on the leaves; powdery mildew is caused by Phyllactinia corylea, its symptom being mildew spots on the under-surface of leaves; and leaf rust is caused by Aecidium mori or Cerotelium fici, the pathogen affecting the woody portion, which results in swelling and deformity. The main viral disease is mulberry leaf mosaic, caused by a virus transmitted by grafting or by insect vectors; its common symptom is wrinkling of the leaves, mostly on the ventral surface. Among the diseases caused by insects, Tukra is caused by the mealy bug, with malformation of the apical tips and wrinkled dark-green leaves as symptoms, while leaf roller causes rolling and binding of leaves on the apical portion of the plant. Deficiency of iron, nitrogen or zinc turns the leaf yellow.
1. Literature Survey
Guru et al. proposed a machine vision system for classifying tobacco leaves as ripe, unripe or overripe. They use 244 images of tobacco leaves, the CIELAB color space model in MATLAB to segment the leaf, and a K-NN classifier with textural features to classify the leaves. Of the 244 images, 83 samples are unripe, 102 are ripe and 59 are overripe. They studied three models: Gray Level Texture Patterns (GLTP), Local Binary Patterns (LBP) and Local Binary Pattern Variance (LBPV). The GLTP model achieved 80% accuracy, the highest performance among the models.
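For concreteness, a common member of the LBP family mentioned above is the uniform local binary pattern histogram, sketched here with scikit-image (the neighbourhood size is our choice, not necessarily the one used in the paper):

    import numpy as np
    from skimage.feature import local_binary_pattern

    def lbp_histogram(gray, points=8, radius=1):
        """Normalized histogram of uniform local binary patterns."""
        codes = local_binary_pattern(gray, points, radius, method="uniform")
        n_bins = points + 2                  # uniform patterns plus "other"
        hist, _ = np.histogram(codes, bins=n_bins, range=(0, n_bins))
        return hist / hist.sum()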
Gulhane and Gurjar worked on identification of diseases in the cotton plant. They collected a dataset of 117 images of cotton crops and used a box-counting algorithm with a support vector machine as the classifier. They obtained 83% accuracy when using 53 features; selecting 45 features gave the best accuracy of 93.1%.
Camargo and Smith (2009) worked on the classification of leaves of different plant species. They use 20 different leaf species, with 100 images per species, the singular value decomposition method and a back-propagation neural network (BPNN) as the classifier, obtaining 98.9% accuracy with the BPNN.
Rumpf et al. (2010) used inoculated and 15 non-inoculated sugar beet leaves for early detection and classification of plant diseases. They use SPAD values with decision trees (DT), ANNs and SVMs as different classifiers, obtaining accuracies of 95.33% (DT), 96.63% (ANNs) and 97.12% (SVMs) for Cercospora leaf spot disease; the decision tree accuracy was thus lower than that of the ANNs, which in turn was lower than that of the SVMs.
Chen et al. (2002) use 20 diseased samples and 25 non-diseased samples to detect disease images. A fuzzy feature selection approach (fuzzy curves and fuzzy surfaces) is used together with neural network techniques and an SVM as the classifier, giving 90.5% accuracy.
Mallikarjuna and Guru worked on performance evaluation of segmentation and classification of tobacco seedling diseases. They extracted 950 lesion areas from 120 infected leaves, together with 50 uninfected areas, and use 4 texture features: uniformity, entropy, smoothness and coarseness. They use a Probabilistic Neural
Network (PNN) as the classifier and the CIELAB color model in MATLAB. They measure segmentation performance using the Dice Coefficient (DC), Error Rate (ER), Measure of Overlapping (MOL), Measure of Under-Segmentation (MUS), Measure of Over-Segmentation (MOS), Precision (P) and Recall (R). Their proposed segmentation algorithm shows higher performance than the previous one.
Guru et al. proposed segmentation and classification of tobacco seedling diseases. They segment lesion areas from the tobacco seedling leaf and use a probabilistic neural network as the classifier, with the statistical texture features smoothness and coarseness to classify the tobacco diseases frog-eye spot and anthracnose. First the leaf is transformed to a B-channel gray-scale image. They use 750 lesion areas, of which 500 are anthracnose, 200 are frog-eye spot and 50 are uninfected areas. Using the gray-level co-occurrence matrix and first-order statistical texture features, they obtained accuracies of 85.78% for anthracnose, 82% for frog-eye spot and 98% for uninfected areas.
Challenges
1. Collecting and processing the dataset: We select three different diseases as three classes: Tukra, powdery mildew, and deficiency of Zn and Fe. We collected 155 images for the Tukra class, 84 for powdery mildew and 102 for Zn/Fe deficiency, and noise and unnecessary parts of the images are removed using MATLAB functions.
2. Feature extraction: Leaf spots, image color and changes in shape are considered the important indicators of the existence of disease; these are the different features for identification of disease.
3. Classification: We choose a classification technique to assign an image to one of the 3 disease classes.
Testing can be done with different images. To generate training and testing datasets we use a cross-validation or bootstrapping technique, as sketched below. The output of this work identifies the input image as belonging to one of the three classes, labelled with the disease name.
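A minimal sketch of the cross-validation route follows (the classifier choice and the 10-dimensional features are placeholders of ours; the class sizes match the image counts listed above):

    import numpy as np
    from sklearn.model_selection import cross_val_score
    from sklearn.neighbors import KNeighborsClassifier

    # stand-ins for real features: 155 Tukra, 84 powdery mildew,
    # 102 Zn/Fe-deficiency samples with hypothetical 10-D feature vectors
    X = np.random.rand(341, 10)
    y = np.array([0] * 155 + [1] * 84 + [2] * 102)

    clf = KNeighborsClassifier(n_neighbors=3)   # any classifier could be used
    scores = cross_val_score(clf, X, y, cv=5)   # 5-fold cross-validation
    print(scores.mean())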
Conclusion
Development of a machine vision system for mulberry leaves is a challenging task. In this paper we have discussed the stages involved in the design of a pattern recognition system, briefly reviewed the work carried out in this direction, and discussed the challenges involved in designing such a system.
References
Camargo A. and Smith J.S., 2009. Image pattern classification for the identification of disease causing agents in plants. Computers and Electronics in Agriculture, vol. 66, pp. 121–125.
Chen Y.R., Chao K., and Kim M.S., 2002. Machine vision technology for agricultural applications. Computers and Electronics in Agriculture, vol. 36, pp. 173–191.
Gulhane V.A. and Gurjar A.A. Detection of Diseases on Cotton Leaves and Its Possible Diagnosis.
Guru D.S., Mallikarjuna P.B. and Manjunath S. Segmentation and Classification of Tobacco Seedling Diseases.
Guru D.S., Mallikarjuna P.B., Manjunath S. and Shenoi M.M., 2012. Intelligent Automation and Soft Computing, Vol. 18, No. 5, pp. 577–586.
Mallikarjuna P.B. and Guru D.S., 2011. Performance Evaluation of Segmentation and Classification of Tobacco Seedling Diseases. International Journal of Machine Intelligence, Vol. 3, pp. 204–211.
Rumpf T., Mahlein A.K., Steiner U., Oerke E.C., Dehne H.W. and Plümer L., 2010. Early detection and classification of plant diseases with Support Vector Machines based on hyperspectral reflectance. Computers and Electronics in Agriculture, vol. 74, pp. 91–99.
Recognition of Image inside Multiple
Images
Mr. Rajesh K M 1, Dr. Manjunath Rao L 2
1Research Scholar, CMJ University, Shillong, Meghalaya State, India
ABSTRACT
Visual cryptography is a kind of image encryption. It differs from traditional cryptography in that it does not need complex computation to decrypt. With current technology, most visual cryptography schemes embed a secret using only two shares, which is limiting. Visual cryptography encodes n images in such a way that only the human visual system can decrypt the hidden message, without any cryptographic computations, when all shares are stacked together. This paper presents an improved algorithm based on Chang and Yu's visual cryptography scheme for hiding a colored image in multiple colored cover images. The scheme achieves lossless recovery and reduces the noise in the cover images without adding any computational complexity.
KEYWORDS: Image processing, visual Cryptography, secret
sharing.
------------------------ ---------------------------
I. INTRODUCTION:
Visual cryptography, introduced by Naor and Shamir in 1995 [2], is a cryptographic scheme in which the ciphertext is decoded by the human visual system. Hence, there is no need for any complex cryptographic computation for decryption. The idea is to hide a secret message (text, handwriting, a picture, etc.) in different images called shares or cover images. When the shares (transparencies) are stacked together so as to align the subpixels, the secret message can be recovered. The simplest case is the 2-out-of-2 scheme, where the secret message is hidden in 2 shares, both needed for a successful decryption [2]. This can be further extended to the k-out-of-n scheme, where a secret message is encrypted into n shares but only k shares are needed for decryption, with k ≤ n. Any k−1 shares give no information about the secret message. Naor and Shamir applied this idea to black-and-white images only. A few years later, Verheul and van Tilborg [4] developed a scheme that can be applied to colored images. The drawback of these new schemes is that they use meaningless shares to hide the secret, and the quality of the recovered plaintext is poor. More advanced schemes based on visual cryptography were introduced in [1,3,5], where a colored image is hidden in multiple meaningful cover images. Chang et al. [3] introduced in 2000 a new colored secret sharing and hiding scheme based on visual cryptography schemes (VCS) in which the traditional stacking operation of subpixels and row interrelations is modified [5]. This new technique does not require transparency stacking and hence is more convenient to use in real applications. However, it requires the use and storage of a Color Index Table (CIT) in order to losslessly recover the
secret image. The CIT requires space for storage and time for table lookup. Also, as the number of colors c in the secret image increases, the CIT becomes bigger and the pixel expansion factor becomes significant, which results in severe loss of resolution in the camouflage images. Chang and Yu introduced in [1] an advanced scheme for hiding a colored image in multiple images that does not require a CIT. This technique achieves a lossless recovery of the secret image, but the generated shares (camouflage images) contain excessive noise. Here we introduce an improved scheme based on Chang's technique in order to enhance the quality of the cover images while achieving lossless recovery and without increasing the computational complexity of the algorithm.
II. DEVELOPMENT:
Chang et al.'s Algorithm
Chang et al. proposed in 2002 a new secret color image sharing scheme
[1] based on modified visual cryptography. The proposed approach uses
meaningful shares (cover images) to hide the colored secret image and
the recovery process is lossless. The scheme defines a new stacking
operation (XOR) and requires a sequence of random bits to be generated
for each pixel.
Method description
Assume that a gray image with 256 colors constitutes the secret to be hidden. Each color can be represented as an 8-bit binary vector. The main idea is to expand each colored pixel into m subpixels and embed them into n shares. This scheme uses m = 9 as the expansion factor. The
resulting structure of a pixel can be represented by an n×9 Boolean matrix S = [Sij], where 1 ≤ i ≤ n and 1 ≤ j ≤ 9, and Sij = 1 if and only if the jth subpixel in the ith share has a non-white color. To recover the color of the original secret pixel, an XOR operation on the stacked rows of the n shares is performed.
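As a minimal sketch of this recovery rule (the 2-share layout below is an assumption for illustration; the full scheme also involves the random index rp and the function F(.,.) described next):

import numpy as np

def recover_color(S: np.ndarray) -> int:
    # XOR the n rows of an n x 9 Boolean share matrix and read the first
    # 8 entries of the result as the bits k1..k8 of the hidden color.
    stacked = np.bitwise_xor.reduce(S, axis=0)
    return int("".join(str(int(b)) for b in stacked[:8]), 2)

# Two 9-subpixel rows (one per share) whose XOR encodes one gray value.
S = np.array([[1, 0, 1, 1, 0, 0, 1, 0, 1],
              [0, 1, 1, 0, 0, 1, 1, 0, 0]], dtype=np.uint8)
print(recover_color(S))  # 212 for this example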
1.1 Hiding Algorithm
For a 2-out-of-2 scheme, the construction can be described by a collection C of 2×9 Boolean matrices. If a pixel with color k = (k1k2…k8)2 needs to be shared, a dealer randomly picks an integer r between 1 and 9 inclusive, as well as one matrix in C. The construction is considered valid if the conditions of equation (1) are satisfied. Note that the number of 1's in the first row of S must exceed the number of 0's by one.
Steps of the Algorithm
Take a colored secret image IHL of size H×L and choose any two arbitrary cover images O1HL and O2HL of size H×L.
Scan through IHL and convert each pixel Iij to an 8-bit binary string denoted as k = (k1k2…k8)2.
According to rp and k for each pixel, construct S so as to satisfy equation (1).
Scan through O1 and, for each pixel of color K1p, arrange row i of S as a 3×3 block B1p and fill the subpixels valued 1 with the color K1p.
Do the same for O2 and construct B2p. The resulting blocks B1p and B2p are the subpixels of the pth pixel after the expansion.
After processing all the pixels in IHL, two camouflage colored images O1' and O2' are generated. In order to losslessly recover IHL, both O1' and O2' as well as the sequence of random bits R = {r1, r2, …, rHL} are needed.
Figure 1 describes the (2,2) scheme for hiding one pixel. This
process is repeated for all pixels in IHL to construct both
camouflage images O1’ and O2’.
1.2 Recovering Algorithm
In order to recover the secret image in a 2-out-of-2 scheme, both camouflage images O1' and O2' as well as the string of random bits R are required for the recovery process (Fig. 2). The camouflage images are t times bigger than IHL in each dimension due to the subpixel expansion factor (here t = 3).
Steps of the Algorithm
Extract the first 3×3 blocks V1r and V2r from the camouflage images O1' and O2', respectively.
Re-arrange V1r and V2r in a 2×9 matrix format Sr.
Select the first random bit rp, corresponding to the first encrypted pixel.
Input Sr and rp to the F(.,.) function corresponding to equation (1).
Recover kp, the first pixel in IHL.
Repeat for all 3×3 blocks in O1' and O2'.
2. Improved image generation scheme
In this section, we introduce a modification of Chang's algorithm to generate better-quality camouflage images. Most of the modifications are applied to the subpixel expansion block described in the next section.
2.1 Hiding Algorithm
Before subpixel expansion, add one to all pixels in the cover images and limit their maximum value to 255. This ensures that no pixels with value 0 exist in the images. When the images are expanded, replace all the 0's in S0, S1 by values corresponding to k1−1 in B1 and k2−1 in B2 (Figure 3) instead of leaving them transparent. Also, adjust all pixel values to lie between 0 and 255.
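A one-line sketch of this preprocessing step (the function and variable names are illustrative):

import numpy as np

def preprocess_cover(cover: np.ndarray) -> np.ndarray:
    # Shift every pixel up by one and cap at 255, so that no pixel keeps
    # the value 0 and k-1 is always a valid color in the expanded blocks.
    return np.clip(cover.astype(np.int16) + 1, None, 255).astype(np.uint8)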
2.2 Decryption algorithm
To recover the secret image, both camouflage images O1', O2' and the string of random bits R are required.
Steps of the Algorithm
Take all regions of size t×t in the camouflage images.
Re-structure the square matrices as 1×m vectors.
Scan through the 9 subpixels in the vector and note the coordinates of the K1 and K1−1 colors previously encrypted.
Count the numbers of k−1 and k pixels in the processed vector, denoted as countk−1 and countk, respectively.
If countk−1 < countk, the transparent pixel is color k−1; otherwise, set it to k.
Use the K1 and K2 colors to find the secret pixel using the F(.,.) function and the random number previously transmitted.
Repeat for all t×t blocks in the camouflage images.
III. CONCLUSION:
This paper presented a new technique based on Chang et al.'s algorithm [1] to hide a color secret image in multiple colored images. The developed method does not require any additional cryptographic computations and achieves a lossless recovery of the secret image. In addition, the camouflage images obtained using the modified algorithm appear less suspicious of containing a secret message than the ones obtained using the original method.
IV. REFERENCES:
[1] Chang, C. C. and Yu, T. X., Sharing a Secret Gray Image in Multiple Images, in Proceedings of the International Symposium on Cyber Worlds: Theories and Practice, Tokyo, Japan, Nov. 2002, pp. 230–237.
[2] M. Naor and A. Shamir, Visual cryptography. Advances in Cryptology - EUROCRYPT '94. Lecture Notes in Computer Science, (950):1–12, 1995.
[3] C. Chang, C. Tsai, and T. Chen, A new scheme for sharing secret color images in computer network. In Proceedings of the International Conference on Parallel and Distributed Systems, pages 21–27, July 2000.
[4] E. Verheul and H. van Tilborg, Constructions and properties of k out of n visual secret sharing schemes. Designs, Codes and Cryptography, 11(2):179–196, 1997.
[5] C. Yang and C. Laih, New colored visual secret sharing schemes. Designs, Codes and Cryptography, 20:325–335, 2000.
[6] G. Ateniese, C. Blundo, A. D. Santis, and D. Stinson. Visual cryptography for general access structures. Information and Computation, 129(2):86–106, 1996.
[7] R. J. Hwang and C. C. Chang, “Some Secret Sharing Schemes and Their Applications,” PhD dissertation, National Chung Cheng University, Taiwan, 1998.
Interpretation of Indian Classical Mudras: A Pattern Recognition
Approach
Manikanta P PG Department of Computer Science,
JSS College for Arts, Commerce and Science, Ooty road, Mysore-25. [email protected]
Abstract
This project deals with the detection and recognition of hand gestures.
Images of the hand gestures are taken using a digital camera and
matched with the images in the database and the best match is
returned. Gesture recognition is one of the essential techniques to build
user-friendly interfaces. For example, a robot that can recognize hand
gestures can take commands from humans, and for those who are
unable to speak or hear, having a robot that can recognize sign
language would allow them to communicate with it. Hand gesture
recognition could help in video gaming by allowing players to interact
with the game using gestures instead of using a controller. However,
such an algorithm needs to be more robust to account for the myriad of
possible hand positions in three-dimensional space. It also needs to
work with video rather than static images. That is beyond the scope of
our project.
------------------------ ---------------------------
INTRODUCTION
Mudra is a Sanskrit word meaning sign or seal. It is a gesture or position, usually of the hands. Mudras lock and guide energy flow and reflexes to the brain by curling, crossing, stretching and touching the fingers and hands. We can talk to the body and mind, as each area of the hand corresponds to a certain part of the mind or body.
Gesture is a form of non-verbal communication in which visible bodily
actions communicate particular messages, either in place of speech or
together and in parallel with words. Gestures include movement of the
hands, face, or other parts of the body.
There are two categories of gestures: static and dynamic. A static
gesture is a particular hand configuration and pose, represented by a
single image. A dynamic gesture is a moving gesture, represented by a
sequence of images. We focus on the recognition of static gestures,
although our method generalizes in a natural way to dynamic gestures.
For the broadest possible application, a gesture recognition algorithm
should be fast to compute.
Computer recognition of hand gestures may provide a more natural
human-computer interface, allowing people to point, or rotate a CAD
model by rotating their hands. Interactive computer games would be
enhanced if the computer could understand players' hand gestures.
Gesture recognition is useful for processing information from humans that is not conveyed through speech or typing. Moreover, there are various types of gestures which can be identified by computers.
Sign language recognition: Just as speech recognition can transcribe
speech to text, certain types of gesture recognition software can
transcribe the symbols represented through sign language into text.
Directional indication through pointing: Pointing has a very specific
purpose in our society, to reference an object or location based on its
position relative to ourselves. The use of gesture recognition to
determine where a person is pointing is useful for identifying the
context of statements or instructions. This application is of particular
interest in the field of robotics.
RELATED WORK
Hasan and Mishra (2012) applied a multivariate Gaussian distribution to recognize hand gestures using non-geometric features. The input hand image is segmented using two different methods, including skin-color-based thresholding techniques. Some operations are performed to capture the shape of the hand and extract hand features; a modified direction analysis algorithm is adopted to find a relationship between statistical parameters (variance and covariance) of the data, which is used to compute the object (hand) slope and trend by finding the direction of the hand gesture.
Li (2003) recognized hand gestures using the fuzzy c-means (FCM) clustering algorithm for a mobile remote application, employing FCM clustering for gesture classification. The system was implemented under an intricate background and invariant lighting conditions, and achieved a recognition accuracy of 85.83%.
Kulkarni and Lokhande (2010) recognized static postures of American Sign Language using a neural network algorithm. The input images are converted to the HSV color model, resized to 80×64, and some image preprocessing operations are applied to segment the hand from a uniform background; features are extracted using a histogram technique and the Hough algorithm. A feed-forward neural network with three layers
is used for gesture classification. Eight samples are used for each of the 26 characters in the sign language; for each gesture, 5 samples are used for training and 3 for testing. The system achieved a 92.78% recognition rate.
Stergiopoulou and Papamarkos (2009) suggested a new Self-Growing and Self-Organized Neural Gas (SGONG) network for hand gesture recognition. For hand region detection, a color segmentation technique based on a skin-color filter in the YCbCr color space was used, and an approximation of the hand shape morphology was detected using the SGONG network. Three features were extracted using a finger identification process, which determines the number of raised fingers and characteristics of the hand shape, and a Gaussian distribution model was used for recognition.
Trigo and Pellegrino (2010) used geometric shape descriptors for gesture recognition. A webcam was used to collect the image database, which was segmented manually. Several experiments were performed, each containing one or more of the three defined feature groups: the invariant-moments group, with seven moments; the K-curvature group, with the features number of fingers, angle between two fingers and distance-radius relation; and the geometric-shape-descriptors group, with the features aspect ratio, circularity, spreadness, roundness and solidity. A multi-layer perceptron (MLP) was used for classification. The geometric-shape-descriptors group gave the best classification performance.
Freeman and Michal (1995) presented a method for gesture recognition based on a local orientation histogram calculated for each image. The system consists of a training phase and a running phase. In the training
phase, several histograms are stored in the computer as feature vectors; in the running phase, the feature vector of the input gesture is extracted and compared with all the feature vectors stored in the computer, with the Euclidean distance metric used to recognize gestures.
Elmezain et al. (2008) proposed a system to recognize isolated and meaningful gestures for Arabic numbers (0 to 9). A Gaussian Mixture Model (GMM) was used for skin color detection. For feature extraction, the orientation between the centroid points of the current frame and the previous frame was determined by vector quantization. The hand motion path was recognized using different HMM topologies and the Baum-Welch (BW) algorithm. The system relied on zero-codeword detection models to distinguish meaningful gestures from continuous gestures.
Kouichi and Hitomi (1999) presented a posture recognition system using a neural network to recognize 42 finger-alphabet symbols, and a gesture recognition system to recognize 10 words. A back-propagation neural network was used for posture recognition and an Elman recurrent neural network for gesture recognition. The two systems were integrated in such a way that, after receiving the raw data, the posture system determined the sampling start time of the input image and, if it was decided to be a gesture, sent it to the gesture system.
METHODOLOGY
In this work we try to automate mudra recognition. Mudras are very important in dance and also for common people. Figure 1 shows the block diagram of this work, which is explained as follows.
Figure 1: Block diagram of the proposed model
We take hand images from a camera with a black background, convert the color images into gray-scale images, resize the images,
and then convert them into black-and-white format, as sketched below.
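A minimal sketch of this preprocessing pipeline, assuming OpenCV (the file name and target size are illustrative):

import cv2

img = cv2.imread("mudra.jpg")                  # hand image on a black background
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)   # color -> gray scale
gray = cv2.resize(gray, (128, 128))            # resize to a fixed size
_, bw = cv2.threshold(gray, 0, 255,            # gray -> black & white (Otsu)
                      cv2.THRESH_BINARY + cv2.THRESH_OTSU)

After this preprocessing we extract the following features.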
A. SHAPE FEATURES:
We extract the following shape features:
Area: It specifies the actual number of pixels in the region. (This value
might differ slightly from the value returned by bwarea, which
weights different patterns of pixels differently).
Euler number: It specifies the number of objects in the region minus
the number of holes in those objects. This property is supported only
for 2-D input label matrices.
Orientation: It specifies the angle (in degrees ranging from -90 to 90
degrees) between the x-axis and the major axis of the ellipse that has
the same second-moments as the region. This property is supported
only for 2-D input label matrices.
Extent: It specifies the ratio of pixels in the region to pixels in the
total bounding box. Computed as the Area divided by the area of the
bounding box.
Perimeter: It specifies the distance around the boundary of the region.
regionprops computes the perimeter by calculating the distance
between each adjoining pair of pixels around the border of the region.
If the image contains discontiguous regions, regionprops returns
unexpected results.
Convex area: It specifies the number of pixels in 'ConvexImage'.
Filled area: It specifies the number of on pixels in the filled image.
Solidity: It specifies the proportion of the pixels in the convex hull that are also in the region. Computed as Area/Convex area.
Eccentricity: It specifies the eccentricity of the ellipse that has the same second moments as the region. The eccentricity is the ratio of the distance between the foci of the ellipse and its major axis length.
MajorAxisLength: It specifies the length (in pixels) of the major axis of the ellipse that has the same normalized second central moments as the region.
EquivDiameter: It specifies the diameter of a circle with the same area
as the region. Computed as sqrt(4*Area/pi).
MinorAxisLength: It specifies the length (in pixels) of the minor axis of
the ellipse that has the same normalized second central moments as
the region.
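These properties correspond to MATLAB's regionprops; as a hedged sketch, the same measurements can be taken with scikit-image (note that its orientation is returned in radians rather than degrees):

import numpy as np
from skimage.measure import label, regionprops

def shape_features(bw: np.ndarray) -> list:
    # Take the largest connected component of the binary image as the hand
    # region and read off the shape properties listed above.
    region = max(regionprops(label(bw)), key=lambda r: r.area)
    return [region.area, region.euler_number, region.orientation,
            region.extent, region.perimeter, region.convex_area,
            region.filled_area, region.solidity, region.eccentricity,
            region.major_axis_length, region.equivalent_diameter,
            region.minor_axis_length]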
B. GABOR FEATURES: The Gabor filter is a linear filter used for edge detection. The frequency and orientation representations of Gabor filters are similar to those of the human visual system.
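A small sketch of Gabor feature extraction, assuming scikit-image (the frequencies and orientations chosen are illustrative):

from skimage.filters import gabor

def gabor_features(gray, frequencies=(0.1, 0.2),
                   thetas=(0.0, 0.785, 1.571, 2.356)):
    # Filter the gray image at several frequencies/orientations and keep
    # simple statistics of the response magnitude as the feature vector.
    feats = []
    for f in frequencies:
        for theta in thetas:
            real, imag = gabor(gray, frequency=f, theta=theta)
            mag = (real.astype(float) ** 2 + imag.astype(float) ** 2) ** 0.5
            feats += [mag.mean(), mag.std()]
    return feats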
C. EDH FEATURES: EDH stands for Edge Direction Histogram. The basic idea is to build a histogram of the directions of the gradients at the edges (borders or contours) that can be detected in an image.
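A hedged sketch of such a histogram, using Sobel gradients (the bin count is illustrative):

import numpy as np
from scipy import ndimage

def edge_direction_histogram(gray, bins=8):
    # Histogram the gradient directions, weighted by gradient magnitude,
    # so that strong edges dominate the descriptor.
    gx = ndimage.sobel(gray.astype(float), axis=1)
    gy = ndimage.sobel(gray.astype(float), axis=0)
    hist, _ = np.histogram(np.arctan2(gy, gx), bins=bins,
                           range=(-np.pi, np.pi), weights=np.hypot(gx, gy))
    return hist / (hist.sum() + 1e-12)  # normalize to sum to one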
After feature extraction we design a classifier for classification. A number of classifiers are available, from which the best one has to be selected; here we use the K-nearest-neighbor (KNN) classifier and then check the accuracy.
EXPERIMENTS AND RESULTS
We collected 25 types of Indian classical dance mudras, with each mudra considered as one class. Table 1 shows the number of classes and the number of images per class.
Table 1: Details of the data set used for experimentation

No. of classes        No. of images per class
25                    100-150
After collecting the images we segment them manually, then extract the features, namely shape features, Gabor features and EDH features. We use the K-nearest-neighbor classifier for classification. Before using the classifier we split the images into two parts, a training set and a testing set. We used the nearest-neighbor classifier with Euclidean distance as the distance measure; Table 2 shows the corresponding accuracy under varying training and testing samples.
Table 2: Recognition accuracy of the proposed model using Euclidean
distance measure
Training/Testing    60/40      50/50      40/60      30/70
K=1                 59.4872    59.3965    55.7462    58.6370
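A minimal sketch of this experimental protocol with scikit-learn (random stand-in features replace the real shape, Gabor and EDH vectors):

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
X = rng.normal(size=(2500, 30))        # stand-in for the extracted features
y = rng.integers(0, 25, size=2500)     # stand-in labels for the 25 mudra classes

# 60/40 split and K=1 nearest neighbor under Euclidean distance, as in Table 2.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.4, random_state=0)
knn = KNeighborsClassifier(n_neighbors=1, metric="euclidean").fit(X_tr, y_tr)
print(accuracy_score(y_te, knn.predict(X_te)))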
CONCLUSION AND FUTURE WORK
We used a nearest-neighbor classifier in this project and obtained an accuracy of 59.3965% (for the 50/50 split).
The designed application demonstrated the capability of a range camera for real-time applications. Though the process seems promising, further work is required to improve the segmentation speed, and different classifiers such as SVMs and neural networks have to be tried to improve the accuracy.
Future work includes not only improvement of the designed strategy but also taking into account more challenges, such as dynamic gestures involving both hands and/or multiple cameras. Our final objective involves gestures with a high degree of freedom, which may require detection of fingers and articulated hands.
REFERENCES
Elmezain M., Ayoub A. and Bernd M., 2008. HIDDEN MARKOV MODEL BASED ISOLATED AND MEANINGFUL HAND GESTURE RECOGNITION, World Academy of Science, Vol. 41, pp. 393-400.
Freeman W.T. and Michal R., 1995. ORIENTATION HISTOGRAMS OF HAND GESTURE RECOGNITION, International Workshop on Face and Gesture Recognition, Zurich.
Hasan M.M. and Mishra P.K., 2012. FEATURE FITTING USING MULTIVARIATE GAUSSIAN DISTRIBUTION FOR HAND GESTURE
RECOGNITION, International Journal of Computer Science and Emerging Technologies, Vol. 3(2).
Kouichi M. and Hitomi T., 1999. GESTURE RECOGNITION USING RECURRENT NEURAL NETWORKS, ACM SIGCHI Conference on Human Factors in Computing Systems, pp. 237-242.
Kulkarni V.S. and Lokhande S.D., 2010. APPEARANCE BASED RECOGNITION OF AMERICAN SIGN LANGUAGE USING GESTURE SEGMENTATION, IJCSE, Vol. 2(3), pp. 560-565.
Li X., 2003. GESTURE RECOGNITION BASED ON FUZZY C-MEANS CLUSTERING ALGORITHM, Department of Computer Science, The University of Tennessee, Knoxville.
Stergiopoulou E. and Papamarkos N., 2009. HAND GESTURE RECOGNITION USING A NEURAL NETWORK SHAPE FITTING TECHNIQUE, Elsevier Engineering Applications of Artificial Intelligence, Vol. 22(8), pp. 1141-1158.
Trigo T.R. and Pellegrino R.M., 2010. AN ANALYSIS OF HAND GESTURE CLASSIFICATION, IJIP, Vol. 6, No. 1, pp. 635-646.
Current Challenges in Plagiarism
Detection
Nagaraju L J
PG Department of Computer Science
JSS College of Arts, Commerce and Science, Ooty Road, Mysore [email protected]
Abstract
Plagiarism detection can be divided into external and intrinsic methods. External plagiarism detection requires reference documents for finding plagiarism, whereas intrinsic plagiarism detection is based on discrepancies in style within a suspicious document and does not use any references. Our work is completely focused on external plagiarism detection. For document matrix representation almost all researchers use the Vector Space Model, which has the limitation of not preserving the order of terms, essential for preserving the actual meaning of the document. In this project we use the status matrix representation, which is capable of preserving the order of terms. For finding plagiarism we use the order of frequencies of the words present in both the source and the suspicious document, and plagiarized lines or passages are detected by analyzing the status matrices.
Keywords: Plagiarism Detection, Status Matrix.
------------------------ ---------------------------
Introduction
Plagiarism is defined as the theft of intellectual property (Meyer zu Eissen et al., 2006), an act of fraud, or the use of another's production without crediting the source. There exist different forms of plagiarism, ranging from simply copying and pasting original passages to more elaborate paraphrased and translated plagiarism. Anecdotal evidence and studies such as (Sheard et al., 2002) strengthen the suspicion that plagiarism is on the rise, facilitated by new media such as the World Wide Web. Growing information sources ease plagiarism, while plagiarism prevention and detection become harder. Combating these problems manually is very expensive and not feasible for large document collections, so we need automated plagiarism detectors.
Plagiarism detection is the process of locating instances of plagiarism within a work or document. It splits into two tasks: external plagiarism detection and intrinsic plagiarism detection. External plagiarism detection deals with the problem of finding plagiarized passages in a suspicious document based on a reference corpus. Intrinsic plagiarism detection does not use external knowledge and tries to identify discrepancies in style within a suspicious document (Zechner et al., 2009).
External plagiarism detection is the approach where suspicious documents are compared against a set of possible references. From exact document copying to paraphrasing, different levels of plagiarism techniques have been used in several contexts (Eissen et al., 2006).
External plagiarism detection relies on a reference corpus composed of
documents from which passages might have been plagiarized. A
passage could be made up of paragraphs, a fixed-size block of words, a
block of sentences and so on. A suspicious document is checked for
plagiarism by searching for passages that are duplicates or near
duplicates of passages in documents within the reference corpus. An
external plagiarism system then reports these findings to a human
controller who decides whether the detected passages are plagiarized or
not (Zechner et al., 2009).
The major applications of plagiarism detection are to detect and avoid plagiarism in project reports and assignments submitted by students in colleges; in the same manner, plagiarized text can be detected on the World Wide Web and in newly published research papers, books, journals and magazines.
From the literature survey it is understood that almost all works are based on the VSM, which has the limitation of not preserving the order of terms, essential for preserving the actual meaning of the document. This is the most challenging task in information retrieval. As the size of the document increases, the number of terms also increases, which increases the dimension of the feature vector. The vector space representation scheme yields a sparse matrix, which is a bottleneck for analysis.
LITERATURE SURVEY
Many researchers have worked on plagiarism detection; in this section we briefly discuss some related work, covering both external and intrinsic plagiarism detection.
External Plagiarism Detection
Zechner et al. introduced a model based on nearest-neighbor search in a high-dimensional term vector space. The model contains three steps. In the first stage, the passages of each document in the reference corpus are vectorized, i.e., all reference documents are represented in matrix form using the Vector Space Model; the reference corpus vector space is then partitioned and the centroid of each partition is computed using the k-means algorithm. Additionally, a sorted list of similarities is stored for each cluster, holding the similarities between the centroid of the cluster and the sentence vectors associated with that cluster. In the second stage, the passages of a suspicious document are vectorized and the nearest cluster to each query sentence is determined based on the cosine similarity between the centroids and the query sentence; the position at which the query sentence would be inserted in the sorted similarity list is found based on its similarity to the cluster centroid, and each passage's nearest neighbor(s) in the reference corpus vector space are located. Plagiarism in a suspicious document is then detected from its nearest-neighbor list via similarity thresholding, as sketched below. In the final stage, the detected plagiarized passages are post-processed, merging subsequent plagiarized passages into a single block: it is checked whether sentences marked as plagiarized are in sequence in the suspicious document and, if so, they are merged; this is repeated until no more merging is possible.
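As a hedged sketch of the similarity-thresholding step (the vectorization and threshold value are assumptions for illustration):

import numpy as np

def flag_plagiarized(suspicious_vecs, reference_vecs, threshold=0.8):
    # Mark a suspicious sentence as plagiarized when its best cosine
    # similarity against the reference sentence vectors exceeds a threshold.
    A = suspicious_vecs / (np.linalg.norm(suspicious_vecs, axis=1, keepdims=True) + 1e-12)
    B = reference_vecs / (np.linalg.norm(reference_vecs, axis=1, keepdims=True) + 1e-12)
    return (A @ B.T).max(axis=1) >= threshold  # one Boolean per suspicious sentence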
External plagiarism detection is similar to textual information retrieval (IR). Given a set of query terms, an IR system returns a ranked set of documents from a corpus that best matches the query terms. The most common structure for answering such queries is an inverted index. An external plagiarism detection system using an inverted index indexes
the passages of the reference corpus documents. For each passage in a suspicious document a query is sent to the system and the returned ranked list of reference passages is analyzed. Such a system was presented in (Hoad and Zobel, 2003) for finding duplicate or near-duplicate documents.
Marta et al. proposed an approach that consists of two steps plus post-processing. In the first step, they build an information retrieval system based on Solr/Lucene, segmenting both suspicious and source documents into smaller texts, and then perform a search based on bag-of-words, which provides a first selection of potentially plagiarized texts. For segmentation they choose 100 words with 50% overlap. For each document segment used as a query, the top-ranked match is considered a plagiarism candidate to be further investigated in the second step. There, they apply a sliding-window approach that computes cosine distances between overlapping text segments from both the source and suspicious documents on a pairwise basis. As a result, a similarity matrix between text segments is obtained, which is smoothed by means of low-pass 2-D filtering. From the smoothed similarity matrix, plagiarized segments are identified using image processing techniques. Finally, they perform a post-processing step that compacts all overlapped sections.
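A brief sketch of the segmentation step described above (the window size and overlap are taken from the text):

def segments(tokens, size=100, overlap=0.5):
    # Split a token list into fixed-size segments with fractional overlap;
    # 100 words with 50% overlap gives a stride of 50.
    stride = max(int(size * (1 - overlap)), 1)
    return [tokens[i:i + size]
            for i in range(0, max(len(tokens) - size, 0) + 1, stride)]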
Another method for finding duplicates and near duplicates is based on hashing or fingerprinting. Such methods produce one or more fingerprints that describe the content of a document or passage. A suspicious document's passages are compared to the reference corpus based on
their hashes or fingerprints. Duplicate and near-duplicate passages are assumed to have similar fingerprints (Brin et al., 1995).
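One common variant of this idea (a hedged sketch, not the exact method of Brin et al.) hashes word n-grams and keeps the smallest hashes as the passage's fingerprint:

import hashlib

def fingerprint(words, n=5, k=20):
    # Hash every word n-gram and keep the k smallest hashes; passages with
    # much shared text end up with overlapping fingerprints.
    grams = (" ".join(words[i:i + n]) for i in range(len(words) - n + 1))
    hashes = sorted(int(hashlib.md5(g.encode()).hexdigest(), 16) for g in grams)
    return set(hashes[:k])

def resemblance(fp_a, fp_b):
    return len(fp_a & fp_b) / max(len(fp_a | fp_b), 1)  # Jaccard overlap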
Kasprzak & Brandejs introduced their model for automatic external
plagiarism detection.
It consists of two main phases; the first is to build the index of the
documents, while in the second the similarities are computed. This
approach uses word n-grams, with n ranging from 4 to 6, and takes into
account the number of matches of those n-grams between the
suspicious documents and the source documents for computing the
detections.
Thomas Gottron's work uses standard IR technologies for candidate selection and efficient data structures for the detailed analysis between a suspicious and a candidate document. When provided with a suspicious document, the pre-selection component uses the Lucene engine to retrieve candidate documents from the source collection for a detailed comparison. The detailed analysis then provides tuples of sequences from the suspicious and candidate documents that already represent detected plagiarized content. A series of post-processing filters takes care of removing pathological cases. Prior to building the Lucene index, all non-English documents were translated into English using Google's translation service. Essentially, this corresponds to a standard cross-language indexing approach. To be able to easily map the translated parts back onto the original texts, they were translated in small chunks of a few paragraphs. The information about which parts of the texts correspond to each other was stored for later backward resolution of character positions in the plagiarized parts.
Another method is based on two phases: first, it executes a plagiarism search space reduction method, and then it executes an exhaustive search to find plagiarized passages. The search space reduction method aims at quickly identifying those pairs of documents that potentially have some text in common, possibly one of them having plagiarized from the other. For this, the method's general tactic is to remove stop words and consider word 4-grams. If two documents have at least two word 4-gram coincidences close enough to be in the same paragraph, the documents are given to the next phase; otherwise the pair is discarded (Oberreuter et al., 2010).
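A simplified sketch of this reduction test (the same-paragraph proximity check is omitted here for brevity):

def word_ngrams(words, n=4):
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def is_candidate_pair(doc_a_words, doc_b_words, stopwords=frozenset()):
    # After stop-word removal, pass a document pair to the exhaustive phase
    # if the two documents share at least two word 4-grams.
    a = [w for w in doc_a_words if w not in stopwords]
    b = [w for w in doc_b_words if w not in stopwords]
    return len(word_ngrams(a) & word_ngrams(b)) >= 2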
Intrinsic Plagiarism Detection
Intrinsic plagiarism detection only recently received attention from the scientific community. It was first introduced in (Meyer zu Eissen and Stein, 2006) and defined as detecting plagiarized passages in a suspicious document without a reference collection or any other external knowledge. A suspicious document is first decomposed into passages. For each passage a feature vector is constructed. Features are derived from stylometric measures, such as the average sentence length or the average word length, known from the field of authorship analysis. These features have to be topic-independent so as to capture the style of an author and not the domain she writes about. Next, a difference vector is constructed for each passage that captures the passage's deviation from the document mean vector. Meyer zu Eissen and Stein (2006) assume that a ground truth is given, marking passages actually written by the author of the suspicious document. A model is then trained based on one-class classification, using the ground truth as the training set. The model is then used to
determine which passages are plagiarized. However, it is not clear how the ground truth is derived from a suspicious document when no information about the document is known beforehand.
Zechner et al. present work based on the ideas of Meyer zu Eissen and Stein: they determine whether passages in a suspicious document are plagiarized based only on changes in style within the document. An author's style is also of importance in the field of authorship classification; both problems rely on so-called stylometric features, which should be topic- and genre-independent and reflect an author's style of writing. Changes of style within a document can be detected by various methods; they chose a simple outlier detection scheme based on a vector space spanned by various stylometric features. The system is composed of three stages: in the first stage, each sentence in the suspicious document is vectorized; in the second, outlier sentences are determined based on the document's mean vector; finally, the detected outlier sentences are post-processed.
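A hedged sketch of such an outlier detection scheme, using just two stylometric features (sentence length and average word length) and a simple distance-to-mean rule:

import numpy as np

def style_vector(sentence):
    words = sentence.split()
    avg_word = np.mean([len(w) for w in words]) if words else 0.0
    return np.array([len(words), avg_word])

def outlier_sentences(sentences, z=2.0):
    # Flag sentences whose stylometric vector lies unusually far from the
    # document mean vector (a crude stand-in for the vectorization stage).
    V = np.array([style_vector(s) for s in sentences])
    d = np.linalg.norm(V - V.mean(axis=0), axis=1)
    return [s for s, di in zip(sentences, d) if di > d.mean() + z * d.std()]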
Conclusion
From the literature survey it is understood that plagiarism detection is a challenging issue. Comparing external and intrinsic plagiarism detection, intrinsic detection is more challenging, as no reference documents are available. With respect to representation, conventional vector space models, which work based on word frequency, do not provide the actual semantic information that is essential in plagiarism detection. It is in this direction that the work has to be carried out.
References
Brin S., Davis J. and Garcia-Molina H., 1995. Copy detection mechanisms for digital documents. In ACM International Conference on Management of Data.
Dinesh R., Harish B.S., Guru D.S. and Manjunath S., 2009. Concept of Status Matrix in Classification of Text Documents.
Hoad, Timothy C. and Justin Zobel, 2003. Methods for identifying versioned and plagiarized documents. J. Am. Soc. Inf. Sci. Technol., Vol. 3, No. 54, pp. 203–215.
Kasprzak J. and Brandejs M., 2010. Improving the reliability of the plagiarism detection system: Lab report for PAN at CLEF 2010.
Meyer zu Eissen S., Stein B. and Kulig M., 2006. Plagiarism detection without reference collections, pp. 359–366.
Oberreuter G., L'Huillier G., Ríos S.A. and Velásquez J.D., 2010. Finding approximated segments of n-grams for document copy detection: Lab report for PAN at CLEF 2010.
Sheard, Judy, Dick M., Markham S., Macdonald I. and Walsh M., 2002. Cheating and plagiarism: perceptions and practices of first year IT students. SIGCSE Bull., Vol. 3, No. 34, pp. 183–187.
Zechner M., Muhr M. and Kern R., 2009. External and Intrinsic Plagiarism Detection Using Vector Space Models, 3rd PAN workshop.
A Mathematical Overview of Vision Processing
Mr. Ashwin Kumar H N1, Dr. Manjunatha Rao L2
1Research Scholar, CMJ University, Meghalaya, India
Abstract
Vision Processing for Real-time 3-D Data Acquisition Based on Coded
Structured Light system provides an idea for real-time acquisition of 3-
D surface data by a specially coded vision system. To achieve 3-D
measurement for a dynamic scene, the data acquisition must be
performed with only a single image. A principle of uniquely color-
encoded pattern projection is proposed to design a color matrix for
improving the reconstruction efficiency. The matrix is produced by a
special code sequence and a number of state transitions.
A color projector is controlled by a computer to generate the desired
color patterns in the scene. The unique indexing of the light codes is
crucial here for color projection since it is essential that each light grid
be uniquely identified by incorporating local neighborhoods so that 3-D
reconstruction can be performed with only local analysis of a single
image. The term structured light is defined as the projection of simple
or encoded light patterns onto the illuminated scene. The main benefit
of using structured light is that features in the images are better
defined. As a result, both the detection and extraction of image features
are simplified and made more robust.
------------------------ ---------------------------
CONSTRUCTION OF PATTERNS
The codes are created by sequences of color values in which any two consecutive values are different. By correctly deriving the code for a given resolution, each row results in a pattern made of grids or stripes to be projected. The image is saved after projection.
PRINCIPLE OF UNIQUELY COLOR-ENCODED PATTERN:
“The matrix M should consist of the color primitives of a given color set P, so that there are no two identical words in the matrix. Furthermore, every element has a color different from its adjacent neighbors in the word.”
Let P be a set of color primitives, P = {1, 2, …, p}, where the numbers represent different colors. These color primitives are assigned to an m×n matrix M to form the encoded pattern, which may be projected onto the scene. We define a word from M by the color value at location (i, j) in M together with the color values of its 4-adjacent neighbors: if Xij is the color assigned at row i and column j in matrix M, then the word defining this location, Wij, is the sequence {Xij, Xij-1, Xi-1j, Xij+1, Xi+1j}, where i = 1, 2, …, m and j = 1, 2, …, n.
Condition 1: We need to assign the color primitives of P to the matrix M so that there are no two identical words in the matrix:
W = {wij | wij ≠ wkl, (i,j) ≠ (k,l), 2 ≤ i,k ≤ m-1, 2 ≤ j,l ≤ n-1}
Condition 2:
Furthermore, every element has a color different from its adjacent neighbors in the word:
M = {xij | xij ≠ xi-1j, xij ≠ xi+1j, xij ≠ xij+1, xij ≠ xij-1, 1 ≤ i ≤ m, 1 ≤ j ≤ n}
PATTERN CODIFICATION
Given the number of colors used to create the code, the codes are defined as the color elements found by traversing the square matrix row by row from the top left corner.
First, with a given color set P, we try to build the longest horizontal code sequence
Sh = [c1, c2, c3, …, cm]
where m is the sequence length. Any adjacent color pair satisfies
ci ≠ ci+1, 1 ≤ i < m
and any triplet of adjacent colors, T3i = [ci, ci+1, ci+2], is unique in the sequence:
T3i ≠ T3j, i ≠ j, 1 ≤ i, j ≤ m-2
The maximal length of the horizontal sequence Sh is
Length(Sh) = p(p-1)(p-1) + 2
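A hedged sketch of constructing such a sequence by backtracking (a small search; the maximal length p(p-1)(p-1) + 2 is reached because each valid triplet is used exactly once):

def code_sequence(p):
    # Build a color sequence over {1..p} in which adjacent colors differ
    # and every adjacent triplet is unique, of maximal length p(p-1)(p-1)+2.
    target = p * (p - 1) * (p - 1)
    seq, used = [1, 2], set()

    def extend():
        for c in range(1, p + 1):
            if c == seq[-1]:
                continue
            triplet = (seq[-2], seq[-1], c)
            if triplet in used:
                continue
            seq.append(c)
            used.add(triplet)
            if extend():
                return True
            seq.pop()
            used.discard(triplet)
        return len(used) == target

    extend()
    return seq

print(len(code_sequence(4)))  # 4*3*3 + 2 = 38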
CALIBRATION
The calibration process computes the intrinsic and extrinsic
parameters. This is mandatory for gathering accurate and robust 3D
measurements.
CAMERA CALIBRATION:
The intrinsic parameters of the camera C are defined by the 3×3 matrix such that

m = λ C MC,  C = [ αu   -αu cot θ   u0
                   0     αv / sin θ  v0
                   0     0           1 ]   --------------- (A)

where
m, expressed in the image coordinate system (u, v), is the ideal (undistorted) projection of the world point MC.
MC is expressed in the camera coordinate system (XC, YC, ZC).
The parameters αu and αv are the scaling values in the XC and YC directions, respectively.
The principal point (u0, v0) is the point where the optical axis intersects the image plane.
The parameter θ is the skew angle between the image axes u and v.
The parameter λ = 1/ZC is a scale factor.
The extrinsic parameters of the camera C are defined by the rigid displacement (R1, t1) such that

MS = ESC MC,  ESC = [ R1    t1
                      O3^T  1 ]   ------------- (B)

where
R1 is a 3×3 rotation matrix,
t1 is a 3×1 translation vector, and O3 is a 3×1 null vector. The 4×4 matrix ESC represents the transformation from the camera coordinate system to the scene coordinate system.
PROJECTOR CALIBRATION:
The intrinsic parameters of the light projector are given by the characteristics of the projection pattern used. The extrinsic parameters of the projector P are defined by the rigid displacement (R, t) such that

MC = ECP MP,  ECP = [ R     t
                      O3^T  1 ]   ------------- (C)

where the 4×4 matrix ECP represents the transformation from the projector coordinate system to the camera coordinate system. The rigid displacement (R2, t2) between the scene and projector coordinate systems can be analytically computed from expressions (B) and (C).
TRIANGULATION PROCESS
Triangulation is the process that determines the 3D position of a point given its 2D positions on the perspective projections of the camera and the projector, respectively. Consider a 3D point X expressed in the camera coordinate system, and its two projections xC and xP, expressed in normalized coordinates in the camera and projector
coordinate systems, respectively. If (R, t) is the rigid transformation between the camera and projector coordinate systems, then the camera ray and the projector ray can both be expressed in the camera coordinate system as

X = λ xC  and  X = oP + μ dP   --------------- (D)

for some scalars λ and μ, where

oP = R OP + t is the projector centre (the origin OP of the projector frame) expressed in camera coordinates, and dP = R xP is the direction of the projector ray   ------------- (E)

Writing xC = (uC, vC, 1), oP = (ox, oy, oz) and dP = (dx, dy, dz), and eliminating μ between the components of expression (D), where the depth Z of the point X equals λ, we obtain the simplified system

Z (uC - dx/dz) = ox - oz dx/dz  and  Z (vC - dy/dz) = oy - oz dy/dz   -------------- (F)

whose resolution gives

Z = (ox - oz dx/dz) / (uC - dx/dz)  or  Z = (oy - oz dy/dz) / (vC - dy/dz)   ------- (G)

Finally, with the previously computed Z value and expression (D), the coordinates of the 3D point X can be fully acquired.
Conclusion
Real-time, low-cost, reliable and accurate 3-D data acquisition is a dream for the vision community. While the available technology is still not able to deliver all the desired features, such as speed, accuracy and processing of only a single image, together, this project makes significant progress toward that goal. An idea was presented and implemented for generating a specially color-coded light pattern, which combines the advantages of fast 3-D vision processing from a single image with the reliability and accuracy of structured light systems. With a given set of color primitives, the patterns generated are guaranteed to form a large matrix of the desired shape, with the restriction that each word in the pattern matrix must be unique. By using such a light pattern, the correspondence problem is solved within a single image and, therefore, the method can be used in a dynamic environment for real-time applications. Furthermore, the method does not place a limit on the smoothness of object surfaces, since it only requires analyzing a small part of the scene and identifies the coordinates by local image processing, which greatly improves the 3-D acquisition efficiency.
Taxonomy of Multicast routing protocols for Mobile
ad-hoc networks
Mr. Jagadeesh Krishna S1, Dr. Manjunatha Rao L2
1Research Scholar, CMJ University, Meghalaya, India
ABSTRACT
Mobile nodes self-organize to form a network over radio links. A mobile ad-hoc network (MANET) is composed of mobile nodes without any infrastructure. The goal of MANETs is to extend mobility into the autonomous, mobile and wireless domains, where a set of nodes forms the network routing infrastructure in an ad-hoc fashion. The majority of applications of MANETs are in areas where rapid deployment and dynamic reconfiguration are necessary and a wired network is not available. These include military battlefields, emergency search and rescue sites, classrooms and conventions, where participants share information dynamically using their mobile devices. These applications lend themselves well to multicast operations. In addition, within a wireless medium, it is crucial to reduce the transmission overhead and power consumption. Multicasting can improve the efficiency of the wireless link when sending multiple copies of messages by exploiting the inherent broadcast property of wireless transmission. Hence, reliable multicast routing plays a significant role in MANETs. However, offering effective and reliable multicast routing is difficult and challenging. In recent years, various multicast routing protocols have been proposed for MANETs. These protocols have distinguishing features and employ different recovery mechanisms. To provide a comprehensive understanding of these multicast routing protocols and to better organize existing ideas and work so as to facilitate multicast routing design for MANETs, we present a taxonomy of the multicast routing
protocols, their properties and their design features. This paper aims to aid MANET researchers and application developers in selecting appropriate multicast routing protocols for their work.
Keywords: Mobile ad-hoc network (MANET); Multicast routing
protocol; Taxonomy; Mobile node; Routing table.
------------------------ ---------------------------
INTRODUCTION
Multicasting is the transmission of packets to a group of zero or more hosts identified by a single destination address [1]. Multicasting is intended for group-oriented computing, where the membership of a host group is typically dynamic; that is, hosts may join and leave groups at any time. There is no restriction on the location or number of members in a host group. A host may be a member of more than one group at a time. Also, a host does not have to be a member of a group to send packets to the members of the group. In wired environments, there are two popular network multicast schemes: the shortest path multicast tree and the core-based tree. The shortest path multicast tree method guarantees the shortest path to each destination, but each source has to build a tree; therefore, too many trees exist in the network. The core-based tree method cannot guarantee the shortest path from a source to a destination, but only one tree needs to be constructed for each group; therefore, the number of trees is greatly reduced.
Currently, one particularly challenging environment for multicast is in
MANETs [2,3]. A MANET is a self-organizing collection of wireless mobile
nodes that form a temporary and dynamic wireless network established
by a group of mobile nodes on a shared wireless channel without the
aid of a fixed networking infrastructure or centralized administration. A
communication session is achieved either through single-hop
transmission if the recipient is within the transmission range of the
source node, or by relaying through intermediate nodes otherwise. For
this reason, MANETs are also called multi-hop packet radio networks.
However, the transmission range of each low-power node is limited to
each other’s proximity, and out-of-range nodes are routed through
intermediate nodes.
Mobile nodes in MANETs are capable of communicating with each other without any network infrastructure or centralized administration. Mobile nodes are not bound to any centralized control like base stations or mobile switching centers. Due to the limited transmission range of wireless network interfaces, multiple hops may be needed for one node to exchange data with another across the network. In such a network, each mobile node operates not only as a host but also as a router, forwarding packets for other mobile nodes in the network that may not be within direct wireless transmission range of each other. Each node participates in an ad-hoc routing function that allows it to discover multi-hop paths through the network to any other node.
Related work
As a promising network type for future mobile applications, MANETs are attracting more and more researchers [2,3]. In the field of multicast routing protocols, some research on the taxonomy of multicast routing protocols over MANETs has been carried out. Tariq Omari et al. [4] classify multicast routing protocols into tree-based, mesh-based, stateless, hybrid and flooding protocols and evaluate the performance and capacity of multicast routing protocols for MANETs. Two distinct on-demand multicast protocols, the Forwarding Group Multicast Protocol (FGMP) and the core-assisted mesh protocol, are described in [5], and other multicast protocols used in MANETs are also briefly summarized there. In [6], AODV, ODMRP, PBM, PAST-DM and PUMA are explained. In [7], Cordeiro et al. provide information about the current state of the art in multicast protocols for MANETs and compare them with respect to several performance metrics. In [7,8], the authors classify these protocols into four categories based on how
routes are created to the members of the group: tree-based approaches,
mesh-based approaches, stateless multicast and hybrid approaches.
Multicast routing protocols
The majority of applications for MANETs are in areas where rapid
deployment and dynamic reconfiguration are necessary and the wired
network is not available. These include military battlefields, emergency
search and rescue sites, classrooms, and conventions where
participants share information dynamically using their mobile devices.
These applications lend themselves well to multicast operation. In
addition, within a wireless medium, it is even more crucial to reduce the
transmission overhead and power consumption. Multicasting improves the
efficiency of the wireless link when the same message must be delivered
to multiple receivers, because it exploits the inherent broadcast
nature of wireless transmission rather than sending multiple unicast
copies. Multicast therefore plays an important role in MANETs [2].
In the wired environments, there are two popular network multicast
approaches, namely the shortest-path multicast tree and the core-based
tree [9]. The shortest-path multicast tree guarantees the shortest path
to each destination, but each source needs to build a tree. Usually, there
exist too many trees in the network, so the overhead tends to be large.
In contrast, the core-based tree constructs only one tree for each group
and the number of trees is greatly reduced. Unlike typical wired
multicast routing protocols, multicast routing for MANETs must
address a diverse range of issues due to the characteristics of MANETs,
such as low bandwidth, mobility and low power. MANETs deliver lower
bandwidth than wired networks; therefore, the information collection
during the formation of a routing table is expensive. Mobility of nodes,
which causes topological changes of the underlying network, also
increases the volatility of network information. In addition, the
limitation of power often leads users to disconnect mobile units.
Multicast routing protocols have emerged as one of the most intensively
studied areas in the field of MANETs. There are three basic categories
of multicast methods [9] in MANETs:
1. A basic method is to simply flood the network: every node
receiving a message re-broadcasts it to its neighbors. Without
duplicate suppression, flooding acts like a chain reaction that
can result in exponential traffic growth (a minimal sketch
follows this list).
2. The proactive approach pre-computes paths to all possible
destinations and stores this information in the routing table. To
maintain an up-to-date database, routing information is
periodically distributed through the network.
3. The final method is to create paths to other nodes on demand.
The idea is based on a query response mechanism or reactive
multicast. In the query phase, a node explores the environment.
Once the query reaches the destination, the response phase
starts and establishes the path.
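
To make the flooding behavior of the first category concrete, here is a
minimal Python sketch (illustrative only; the adjacency-map topology
and node identifiers are our own assumptions, not part of any cited
protocol). The "seen" set is the duplicate suppression a real flooding
protocol needs in order to stop the chain reaction:

    import collections

    def flood(topology, source):
        """Flood a message from source over an adjacency map.

        topology: dict mapping each node to a list of its neighbors.
        Returns the number of link transmissions performed. A node
        re-broadcasts a message only the first time it sees it;
        without the 'seen' set the message would circulate forever
        on any cyclic topology.
        """
        seen = {source}
        queue = collections.deque([source])
        transmissions = 0
        while queue:
            node = queue.popleft()
            for neighbor in topology[node]:
                transmissions += 1          # one transmission per link use
                if neighbor not in seen:    # duplicate suppression
                    seen.add(neighbor)
                    queue.append(neighbor)
        return transmissions

    # Example: a four-node ring; the message reaches every node.
    ring = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
    print(flood(ring, source=0))  # 8 transmissions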
Recently, many multicast routing protocols have been newly proposed
to perform multicasting in MANETs. These include the ad-hoc multicast
routing protocol utilizing increasing id numbers (AMRIS) [10], multicast
ad-hoc on-demand distance vector (MAODV) [11], the core-assisted mesh
protocol (CAMP) [13], lightweight adaptive multicast (LAM) [12],
location-guided tree (LGT) [14], the on-demand multicast routing
protocol (ODMRP) [15], the forwarding group multicast protocol (FGMP)
[16], ad-hoc multicast
routing (AMRoute) [17], multicast core extraction distributed ad-hoc
routing (MCEDAR) [18] and differential destination multicast (DDM)
[19]. Most of these multicast routing protocols are primarily based on
flavors of distance-vector or link-state routing plus additional
functionalities to assist the routing operations in particular ways. The
goals of all these protocols include minimizing control overhead,
minimizing processing overhead, maximizing multi-hop routing
capability, coping with dynamic topology and preventing routing loops
in the network.
However, many multicast routing protocols do not perform well in
MANETs because in a highly dynamic environment, nodes move
arbitrarily, thus network topology changes frequently and
unpredictably. Moreover, bandwidth and battery power are limited.
These constraints, in combination with the dynamic network topology,
make the design of multicast routing protocols for MANETs extremely
challenging.
Taxonomy of multicast routing protocols
To compare and analyze multicast routing protocols, appropriate
classification methods are important. Classification methods help
researchers and designers to understand the distinct characteristics of
different multicast routing protocols and find out the internal
relationship among them. Therefore, we present protocol characteristics
which are used to group and compare different approaches. These
characteristics are mainly related to the information which is exploited
for MANETs and the roles which nodes may take in the multicast
routing process.
1. Tree, mesh and hybrid multicast routing protocols
One of the most popular methods to classify multicast routing protocols
for MANETs is based on how distribution paths among group members
are constructed. According to this method, existing multicast routing
approaches for MANETs can be divided into tree-based multicast
protocols, mesh-based multicast protocols and hybrid multicast
protocols. Tree-based multicast routing protocols can be further divided
into source-rooted and core-rooted schemes according to the roots of
the multicast trees. In a source-rooted tree-based multicast routing
protocol, source nodes are roots of multicast trees and execute
algorithms for distribution tree construction and maintenance. This
requires a source to be aware of the topology information and addresses
of all its receivers in the multicast group. Therefore, source-rooted
tree-based multicast routing protocols suffer from high traffic
overhead when used in dynamic networks. AMRoute is an example of a
source-rooted tree-based multicast routing protocol.
In a core-rooted tree multicast routing protocol, cores are nodes with
special functions such as multicast data distribution and membership
management. Some core-rooted multicast routing protocols utilize tree
structures. But unlike source-rooted tree-based multicast routing,
multicast trees are rooted only at core nodes. In different core-rooted
multicast routing protocols, core nodes may perform various routing and
management functions. The Shared-Tree Ad-hoc Multicast Protocol
(STAMP) [20] and the Adaptive Core-based Multicast routing Protocol
(ACMP) [21] are core-based multicast routing protocols proposed for
MANETs.
Tree-based protocols provide high data forwarding efficiency at the
expense of low robustness. Their advantage is their simplicity. Their
disadvantage is that, until the tree is reconstructed after a node
moves, packets may have to be dropped.
In a mesh-based multicast routing protocol, packets are distributed
along mesh structures that are a set of interconnected nodes. Route
discovery and mesh building are accomplished in two ways: by using
broadcasting to discover routes or by using core or central points for
mesh building. Mesh-based protocols perform better in high-mobility
situations because they provide redundant paths from source to
destinations while forwarding data packets. However, mesh-based
approaches sacrifice multicast efficiency in comparison with tree-based
approaches. The Mesh-based Multicast Routing Protocol with Consolidated
Query Packets (CQMP) [22], the Enhanced On-Demand Multicast Routing
Protocol (E-ODMRP) [23] and the Bandwidth Optimized and Delay Sensitive
(BODS) protocol [24] are mesh-based multicast routing protocols
proposed for MANETs.
Hybrid multicast routing protocols combine the advantages of both
tree-based and mesh-based approaches. Hence, hybrid protocols address
both efficiency and robustness. Using this scheme, it is possible to
obtain multiple routing paths, and duplicate messages can reach a
receiver through different paths. However, hybrid protocols may create
non-optimal trees under node mobility. The Efficient Hybrid Multicast
Routing Protocol (EHMRP) [25] is an instance of a hybrid multicast
routing protocol.
2. Proactive and reactive multicast routing protocols
Another classification method is based on how routing information is
acquired and maintained by mobile nodes. Using this method, multicast
routing protocols can be divided into proactive routing and reactive
routing. A proactive multicast routing protocol is also called a ‘‘table-driven”
multicast routing protocol. In a network utilizing a proactive routing
protocol, every node maintains one or more tables representing the
entire topology of the network. These tables are updated regularly in
order to maintain up-to-date routing information from each node to
every other node. To maintain up-to-date routing information, topology
information needs to be exchanged between the nodes on a regular
basis, leading to relatively high overhead on the network. On the other
hand, routes will always be available on request. There are some typical
proactive multicast routing protocols, such as CAMP, LGT and AMRIS.
A reactive multicast routing protocol is also called an ‘‘on-demand”
multicast routing protocol. Reactive protocols seek to set up routes on-
demand. If a node wants to initiate communication with a node to
which it has no route, the routing protocol will try to establish such a
route. Reactive multicast routing protocols have better scalability than
proactive multicast routing protocols. However, when using reactive
multicast routing protocols, source nodes may suffer from long delays
for route searching before they can forward data packets. ACMP and
CQMP are examples of reactive multicast routing protocols for MANETs.
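
The query/response mechanism behind on-demand routing can be
illustrated with a small sketch. The following Python fragment is
purely illustrative (the route cache, the topology snapshot and the
function names are our own simplifications, not the specification of
any particular protocol): a route is computed only when a node actually
requests one, then cached for reuse.

    import collections

    route_cache = {}  # (source, dest) -> list of hops, filled on demand

    def discover_route(topology, source, dest):
        """On-demand route discovery, simplified to a breadth-first
        search over a topology snapshot. In a real reactive protocol
        the query phase floods route-request packets and the response
        phase travels back along the reverse path.
        """
        if (source, dest) in route_cache:       # route already known
            return route_cache[(source, dest)]
        parents = {source: None}
        queue = collections.deque([source])
        while queue:
            node = queue.popleft()
            if node == dest:                    # query reached the destination
                path = []
                while node is not None:         # response phase: unwind the path
                    path.append(node)
                    node = parents[node]
                path.reverse()
                route_cache[(source, dest)] = path
                return path
            for neighbor in topology[node]:
                if neighbor not in parents:
                    parents[neighbor] = node
                    queue.append(neighbor)
        return None  # destination currently unreachable

    topology = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
    print(discover_route(topology, "A", "D"))  # ['A', 'B', 'C', 'D']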
3. Evaluating capacity, architecture and location for multicast
routing protocols
Most of the multicast routing protocols assume a physically flat network
architecture with mobile nodes having homogeneous capability in terms
of network resources and computing power. In practice, however, this
assumption often does not hold, since there exist various types of
mobile nodes with different roles, capabilities and mobility patterns. In an
architecture-based multicast routing protocol, MANETs have physically
hierarchical architectures, which are formed by different types of mobile
nodes. For example, Hierarchical QoS Multicast Routing Protocol
(HQMRP) for MANETs builds a multicast structure at each level of the
hierarchy for efficient and scalable multicast message delivery. Self-
Organizing Map (SOM) is also a typical hierarchical architecture, which
provides a way for automatically organizing the hierarchical
architecture. In location-based multicast routing protocols, the
availability of the Global Positioning System (GPS), Bluetooth or other
location systems makes it easy to obtain geographical information about
mobile nodes when needed. Each node determines its own location through
the use of GPS or some other type of positioning service. A location
service is used by the sender of a packet to determine the location of
the destination. The routing decision at each forwarding node is then
based on the location information of its neighbors and of the
destination nodes. Location-based Geocasting and Forwarding (LGF), LGT
and the Scalable Position-Based Multicast (SPBM) protocol are typical
location-based multicast routing protocols for MANETs.
4. Quality of service
Another protocol classification is based on metrics used for multicast
routing construction as criteria for MANETs. Most conventional
multicast routing protocols are designed to minimize data traffic in
the network or to minimize the average number of hops for delivering a
packet. When Quality of Service (QoS) is considered, some protocols may
be unsatisfactory or impractical due to the lack of resources,
excessive computation overhead, lack of knowledge about the global
network state, or excessive message processing overhead. Indeed, some
multicast routing protocols, such as LGT, AMRIS and CAMP, are designed
without explicitly considering QoS. QoS multicast routing not only
requires finding a route from a source to a destination, but also
satisfying the end-to-end QoS requirement, often given in terms of
bandwidth or delay. QoS is more difficult to guarantee in MANETs than
in other types of networks, because the wireless bandwidth is shared
among adjacent nodes and the network topology changes as the nodes
move. This requires extensive collaboration between the nodes, both to
establish the routes and to secure the resources necessary to provide
the QoS. With the extensive application of MANETs in many domains,
appropriate QoS metrics, such as bandwidth, delay, packet loss rate and
cost, should be used for multicast routing protocols. QoS multicast
routing protocols therefore face the challenge of delivering data to
destinations through multi-hop routes in the presence of node movements
and topology changes. Multicast Core Extraction Distributed Ad-hoc
Routing (MCEDAR) is an example of a QoS-based multicast routing
protocol for MANETs.
5. Energy efficiency
MANETs are a set of nodes that agree to form a spontaneous, temporary
network without any centralized administration or fixed infrastructure.
Nodes are typically powered by batteries with a limited energy supply,
and each node ceases to function when its battery is exhausted.
Therefore, given the energy constraints placed on the network's nodes,
designing energy-efficient multicast routing protocols is an important
issue for MANETs, as it maximizes the lifetime of the nodes and thus of
the network itself. Minimum Weight Incremental Arborescence (MWIA),
RB-MIDP and D-MIDP are examples of energy-efficient multicast routing
protocols.
6. Network coding
The advent of coding at the packet level, commonly called network
coding, changes many aspects of networking. Given a network with
capacity constraints on its links, one problem in designing multicast
routing protocols is to maximize the multicast throughput between a
source node and a set of receivers. The main advantage of using network
coding can be seen in multicast scenarios. Network coding enables
better resource utilization and achieves the max-flow bound, the
theoretical upper limit of network resource utilization, by allowing a
network node, such as a router, to encode its received data before
forwarding it. Each node implementing the network coding function
receives information from all of its input links, encodes it, and sends
the encoded information to all of its output links. For multicast
connections, a coded network lends itself to a cost optimization that
not only outperforms traditional tree-based routing approaches but also
admits a distributed implementation and a dynamic implementation under
changing conditions, such as mobility.
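
The classic illustration of the gain from network coding is the
two-source butterfly network, where the bottleneck node XORs the two
packets it receives and each receiver recovers the packet it did not
hear directly by XORing again. The short Python sketch below is an
illustrative toy, not a protocol implementation; the packet contents
are invented:

    def xor_bytes(a, b):
        """Bitwise XOR of two equal-length byte strings."""
        return bytes(x ^ y for x, y in zip(a, b))

    # Two sources each inject one packet toward both receivers.
    p1 = b"AAAA"  # packet from source 1
    p2 = b"BBBB"  # packet from source 2

    # The bottleneck node forwards one coded packet instead of two,
    # which is how coding reaches the max-flow bound in this example.
    coded = xor_bytes(p1, p2)

    # Receiver 1 hears p1 directly plus the coded packet -> recovers p2.
    assert xor_bytes(coded, p1) == p2
    # Receiver 2 hears p2 directly plus the coded packet -> recovers p1.
    assert xor_bytes(coded, p2) == p1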
7. Reliable multicast routing protocols
In ad-hoc environments, every link is wireless and every node is mobile.
These features make data loss likely and multicasting inefficient and
unreliable, so reliable multicast routing becomes a very challenging
research problem for MANETs. The design of reliable multicasting
depends on three decisions: (1) by whom errors are detected; (2) how
error messages are signaled; and (3) how missing packets are
retransmitted.
In the sender-initiated approach, the sender is responsible for the error
detection. Error messages are signaled using ACK signals sent from
each receiver. A missing piece of data at a receiver is detected if the
sender does not receive an ACK from the receiver. In this case, the need
to retransmit a missing packet is handled by retransmitting the missing
data from the source through a unicast. When several receivers have
missing packets, the sender may decide to re-multicast the missing
packets to all receivers in the multicast group. In the receiver-initiated
approach, each receiver is responsible for error detection. Instead of
acknowledging each multicast packet, each receiver sends a NACK once
it detects a missing packet. If multicast packets are stamped with a
sequence number, a missing packet can be detected by a gap between the
sequence numbers of the received packets. When the sender-initiated
approach is applied, only the sender is responsible for retransmitting
the missing packet, and the corresponding retransmission method is
called sender-oriented. Note that when the sender receives ACK
signals from all the receivers, the corresponding packet can be removed
from the history. There are three ways to retransmit the missing packet
when the receiver-initiated approach is used: (1) sender-oriented, (2)
neighborhood-oriented, and (3) fixed-neighborhood-oriented.
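
As an illustration of receiver-initiated error detection, the following
minimal Python sketch (our own simplification; the NACK callback and
the in-order numbering assumption are invented for the example) detects
gaps in the sequence numbers of arriving packets and signals a NACK for
each missing one:

    class GapDetector:
        """Receiver-side loss detection based on sequence numbers.

        Assumes packets arrive in increasing sequence order; handling
        reordering would require a small buffer, omitted here.
        """
        def __init__(self, send_nack):
            self.expected = 0           # next sequence number we expect
            self.send_nack = send_nack  # callback that signals a NACK

        def on_packet(self, seq):
            if seq > self.expected:
                # A gap: every number in [expected, seq) was missed.
                for missing in range(self.expected, seq):
                    self.send_nack(missing)
            self.expected = max(self.expected, seq + 1)

    detector = GapDetector(send_nack=lambda n: print("NACK for packet", n))
    for seq in [0, 1, 4, 5]:  # packets 2 and 3 were lost in transit
        detector.on_packet(seq)
    # Output: NACK for packet 2, then NACK for packet 3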
8. Overlay multicast routing protocols
In most protocols, both group members and non-members on a
tree/mesh link must maintain the multicast states to forward data
packets. Thus, multicast protocols must detect and restore link
failure, which can be a result of migration by non-group members as
well as group members. As a result, many control messages are issued
to repair broken links. To provide data forwarding without involvement
of non-group members and to constrain the protocol states on group
members, overlay multicast protocols for MANETs enhance the packet
delivery ratio by reducing the number of reconfigurations caused by
non-group members’ unexpected migration in a tree or mesh structure.
The advantages of overlay multicast come at the cost of lower packet
delivery efficiency and longer delay: when constructing the virtual
infrastructure, it is very hard to prevent different unicast tunnels
from sharing physical links, which results in redundant traffic on the
physical links. Overlay multicast based on heterogeneous forwarding
(OMHF) [42] is an example of an overlay multicast routing protocol for
MANETs.
9. Single and multiple source multicast routing protocols
A multicast group may contain multiple sources due to different kinds
of services or applications simultaneously provided by the networks.
Each single source multicast routing protocol induces a lot of overhead
and thus wastes tremendous network resources in multi-source
multicast environments. In multiple source multicast routing protocols,
using the clustering technique, a large network can be divided into
several sub-networks with only a few cluster heads needing to maintain
local information, thus preventing flooding of useless packets and
avoiding wasting bandwidth. To achieve efficient multicasting in a
multi-source multicast environment, the clustering technique is
employed to design an efficient multicast routing protocol for multi-
source multicasting. Multiple-source routing is essential for load
balancing and for offering quality of service. Other benefits of
multiple-source routing include reduced computing time required of
routers' CPUs, high resilience to path breaks, a high call acceptance
ratio (in voice applications) and better security.
Performance criteria
Many multicast routing protocols are proposed for MANETs based on
different design points of view to meet specific requirements from
different application domains. There are three different ways to
evaluate and compare the performance of multicast routing protocols for
MANETs:
1. The first is based on user parameters and configurations, such
as the average multicast degree, the control overhead, the
average delay, the throughput and the multicast service cost.
2. The second compares different multicast routing updating
methods. A multicast routing update can be done in one of three
ways: (a) store and update: store the information in a routing
table and update it by listening to routing messages; (b) delete
all and refresh: discard all old routes (on timeout) and start
over; and (c) unicast protocol support: use the services of a
separate unicast routing protocol for route updating. In another
approach, the performance of multicast routing protocols is
evaluated with different simulation tools, such as NS-2, OPNET
and MATLAB.
3. With the growing popularity of MANETs, and considering their
dynamic network features, integrated criteria for evaluating the
performance of MANET multicast routing protocols should be
proposed to meet the requirements of different mobile
applications in different environments and with different design
targets.
Conclusion
A MANET consists of a dynamic collection of low-power nodes with
a quickly changing multi-hop topology that is usually composed of
relatively low-bandwidth wireless links. These constraints make
multicasting in MANETs challenging. General solutions to these problems
are to avoid global flooding and advertising, to construct routes on
demand, and to maintain group memberships dynamically. Multicasting can
efficiently support a wide variety of applications that are
characterized by a close degree of collaboration, typical of many
MANETs. The design of multicast routing protocols for MANETs is driven
by specific goals and requirements based on respective assumptions
about the network properties or the application area. All protocols
have their own advantages and disadvantages. Some construct multicast
trees to reduce end-to-end latency, while others build meshes to ensure
robustness. Some protocols create overlay networks and use unicast
routing to forward packets. Energy-aware multicast protocols optimize
either the total energy consumption or the system lifetime of the
multicast tree.
Future work
As mentioned earlier, research in the area of multicast over MANETs is
far from exhaustive. Much of the effort so far has been on devising
routing protocols to support effective and efficient communication
between nodes that are part of a multicast group. It is very difficult
to design a multicast routing protocol that addresses all the
above-mentioned issues, and there are still many topics that deserve
further investigation:
1. Scalability. This issue is related not only to multicast in
MANETs but also to the ad-hoc network itself. A multicast routing
protocol should remain scalable under the constraints posed by
MANETs.
2. Address configuration. In ad-hoc environments, a different
addressing approach may be required. Special care must be
taken so that other groups do not reuse a multicast address
used by a group at the same time. Node movement and network
partitioning make the task of synchronizing multicast addresses
in a MANET really difficult.
3. Multicast service support. The multicast protocol defines
conditions for joining and leaving groups; multicast participants
should be able to join or leave groups at will. On the other
hand, service providers must be convinced to support multicast
protocols.
4. Security. How can the network secure itself from malicious or
compromised nodes? Due to the broadcast nature of the wireless
medium, security provisioning becomes more difficult.
Further research is needed to investigate how to stop an
intruder from joining an ongoing multicast session or stop a
node from receiving packets from other sessions.
5. Traffic control. Both source and core-based approaches
concentrate traffic on a single node. In stateless multicast,
group membership is controlled by the source, which makes
multicast protocols for MANETs vulnerable. How to efficiently
distribute traffic from a central node to the other member nodes
still needs to be investigated.
6. QoS. QoS defines a guarantee given by the network to satisfy a
set of predetermined service performance constraints for the
user in terms of end-to-end delay, jitter, and available
bandwidth. Therefore, multicast routing protocols must make it
feasible for all kinds of constrained multicast applications to run
well in a MANET. However, it is a significant technical challenge
to define a comprehensive framework for QoS support, due to the
dynamic topology, distributed management and multi-hop
connections of MANETs.
7. Power control. For power-constrained wireless networks, a
crucial issue in routing and multicasting is to conserve as much
power as possible while still achieving good throughput
performance.
8. Multiple sources. Most of the existing multicast routing
protocols in ad-hoc networks are designed for single source
multicasting. However, a multicast group may contain multiple
sources due to different kinds of services or applications
simultaneously provided by the networks. Each single source
multicast routing protocol induces a lot of overhead and thus
wastes tremendous network resources in a multi-source
multicast environment.
References
[1] D.P. Agrawal, Q.A. Zeng, Introduction to wireless and mobile systems, Brooks/Cole, 2003.
[2] Luo Junhai, Ye Danxia, et al., Research on topology
discovery for IPv6 networks, IEEE, SNPD 2007 3 (2007) 804–809.
[3] S. Toumpis, Wireless ad-hoc networks, in: Vienna Sarnoff Symposium, Telecommunications Research Center, April 2004. Available
from: http://www.eng.ucy.ac.cy/toumpis/publications/sarnoff04.pdf.
[4] O. Tariq, F. Greg, W. Murray, On the effect of traffic model to the performance evaluation of multicast protocols in MANET,
Canadian Conference on Electrical and Computer Engineering
(2005) 404–407.
[5] X. Chen, J. Wu, Multicasting Techniques in Mobile Ad-hoc Networks, Computer Science Department, Southwest Texas State University, San Marcos.
[6] T.A. Dewan, Multicasting in Ad-hoc Networks, University of
Ottawa, 2005, pp. 1–9.
[7] C. de Morais Cordeiro, H. Gossain, D.P. Agrawal, Multicast over
wireless mobile ad-hoc networks: present and future directions, IEEE Network (2003) 52–59.
[8] Z.C. Huang, C.C. Shen, A comparison study of omnidirectional and directional MAC protocols for ad-hoc
networks, IEEE Global Telecommunications Conference (2002) 57–61.
[9] X. Chen, J. Wu, Multicasting techniques in mobile ad-hoc networks, The Handbook of Ad-hoc Wireless Networks (2003) 25–40.
[10] C.W. Wu, Y.C. Tay, C.K. Toh, Ad-hoc Multicast Routing
Protocol Utilizing Increasing Id-numbers (AMRIS) Functional Specification, Internet draft, November 1998.
[11] E.M. Royer, C.E. Perkins, Multicast operation of the ad-hoc on-demand distance-vector routing protocol, ACM MOBICOM (August 1999) 207–218.
[12] L. Ji, M.S. Corson, A lightweight adaptive multicast
algorithm, GLOBECOM (1998) 1036–1042.
[13] J.J. Garcia-Luna-Aceves, E.L. Madruga, The core-
assisted mesh protocol, IEEE JSAC (August 1999) 1380–1394.
[14] K. Chen, K. Nahrstedt, Effective location-guided tree construction algorithms for small group multicast in MANET,
Proceedings of INFOCOM (2002) 1180–1189.
[15] M. Gerla, S.J. Lee, W. Su, On-Demand Multicast Routing
Protocol (ODMRP) for Ad-hoc Networks, Internet draft,draft-ietf-manet-odmrp-02.txt, 2000.
[16] C.C. Chiang, M. Gerla, L. Zhang, Forwarding group multicast protocol (FGMP) for multi-hop mobile wireless networks, J. Cluster Computing, Special Issue on Mobile
Computing, vol. 1 (2), 1998, pp. 187–196.
[17] E. Bommaiah et al., AMRoute: Ad-hoc Multicast Routing Protocol, Internet draft, August 1998.
[18] P. Sinha, R. Sivakumar, V. Bharghavan, MCEDAR: multicast core-extraction distributed ad-hoc routing, in:
IEEE Wireless Communications and Networking Conference, September 1999, pp. 1313–1317.
[19] L. Ji, M.S. Corson, Differential destination multicast: a MANET multicast routing protocol for small groups,
Proc. INFOCOM (2001) 1192–1201.
[20] L. Canourgues, J. Lephay, Soyer, et al., STAMP: shared-
tree ad-hoc multicast protocol, MILCOM 2006 (October 2006) 1–7.
[21] B. Kaliaperumal, A. Ebenezer Jeyakumar, Adaptive core-based scalable multicasting networks, INDICON, 2005 Annual IEEE (December 2005) 198–202.
[22] H. Dhillon, H.Q. Ngo, CQMP: a mesh-based multicast
routing protocol with consolidated query packets, in: IEEE Wireless Communications and Networking Conference, WCNC 2005, pp. 2168–2174.
[23] S.Y. Oh, J.S. Park, M. Gerla, E-ODMRP: enhanced ODMRP with motion adaptive refresh, in: ISWCS 2005 –
Conference Proceedings, 2005, pp. 130–134.
[24] E.R. Inn, K.G.S. Winston, Distributed Steiner-like multicast path setup for mesh-based multicast routing in ad-hoc networks, in: Proceedings of the IEEE International Conference
on Sensor Networks, Ubiquitous and Trustworthy Computing, TIME 2006, pp. 192–197.
[25] J. Biswas, M. Barai, S.K. Nandy, Efficient hybrid multicast routing protocol for ad-hoc wireless networks, in: 29th Annual IEEE International Conference on Local Computer Networks, November 2004, pp. 180–187.
Security Approaches on Progressive Authentication Method Accessing
Multiple Information in Mobile Devices
Santhosh Kumar
Asst. Professor, GFGC, Hunsur
Abstract
Mobile device security has become increasingly important in mobile
computing. It is of particular concern as it relates to the security of
personal information now stored on smart-phones. Mobile devices face
an array of threats that take advantage of numerous vulnerabilities
commonly found in such devices. These vulnerabilities can be the result
of inadequate technical controls, but they can also result from poor
security practices. Security controls are not always consistently
implemented on mobile devices. The importance of enabling security
controls on mobile devices, and of adopting a security approach towards
authentication, is the subject of this paper. Mobile users are often
faced with a trade-off between
security and convenience. Either users do not use any security lock and
risk compromising their data, or they use security locks but then have
to inconveniently authenticate every time they use the device. Rather
than exploring a new authentication scheme, we address the problem of
deciding when to surface authentication and for which applications. We
believe reducing the number of times a user is requested to
authenticate lowers the barrier of entry for users who currently do not
use any security. Progressive authentication, the approach we propose,
combines multiple signals (biometric, continuity, possession) to
determine a level of confidence in a user’s authenticity. Based on this
confidence level and the degree of protection the user has configured for
his applications, the system determines whether access to them
requires authentication. The system thus represents an attractive
solution for users who currently do not use any security mechanism on
their devices.
------------------------ ---------------------------
1. INTRODUCTION
1.1 Mobile devices often do not have passwords enabled. Mobile devices
often lack passwords to authenticate users and control access to data
stored on the devices. Many devices have the technical capability to
support passwords, personal identification numbers (PIN), or pattern
screen locks for authentication. Some mobile devices also include a
biometric reader to scan a fingerprint for authentication. Additionally, if
users do use a password or PIN they often choose passwords or PINs
that can be easily determined or bypassed, such as 1234 or 0000.
Without passwords or PINs to lock the device, there is an increased
risk that the information on stolen or lost phones could be accessed by
unauthorized users, who could view sensitive information and misuse the
device [15].
1.2 Two-factor authentication is not always used when conducting
sensitive transactions on mobile devices. According to studies, users
generally use static passwords instead of two-factor authentication
when conducting sensitive online transactions on mobile devices. Using
static passwords for authentication has security
drawbacks: passwords can be guessed, forgotten, written down and
stolen, or eavesdropped. Two-factor authentication generally provides a
higher level of security than traditional passwords and PINs, and this
higher level may be important for sensitive transactions. Two-factor
refers to an authentication system in which users are required to
authenticate using at least two different "factors" — something you
know, something you have, or something you are — before being
granted access. Mobile devices can be used as a second factor in some
two-factor authentication schemes. The mobile device can generate pass
codes, or the codes can be sent via a text message to the phone.
Without two-factor authentication, increased risk exists that
unauthorized users could gain access to sensitive information and
misuse mobile devices.
1.3 Wireless transmissions are not always encrypted. Information such
as e-mails sent by a mobile device is usually not encrypted while in
transit. In addition, many applications do not encrypt the data they
transmit and receive over the network, making it easy for the data to be
intercepted. For example, if an application transmits data over an
unencrypted WiFi network using HTTP (rather than HTTPS), the data can
be easily intercepted by eavesdroppers, who may gain unauthorized
access to sensitive information.
1.4 Mobile devices may contain malware. Users may download
applications that contain malware. Users download malware
unknowingly because it can be disguised as a game, security patch,
utility, or other useful application. It is difficult for users to tell the
difference between a legitimate application and one containing malware.
For example, an application could be repackaged with malware and a
consumer could inadvertently download it onto a mobile device.
1.5 Mobile devices often do not use security software. Many mobile
devices do not come preinstalled with security software to protect
against malicious applications, spyware, and malware-based attacks.
Further, users do not always install security software, in part because
mobile devices often do not come preloaded with such software. While
such software may slow operations and affect battery life on some
mobile devices[28,29], without it, the risk may be increased that an
attacker could successfully distribute malware such as viruses, Trojans,
spyware, and spam to lure users into revealing passwords or other
confidential information.
1.6 Operating systems may be out-of-date. Security patches or fixes for
mobile devices' operating systems are not always installed on mobile
devices in a timely manner. It can take weeks to months before security
updates are provided to users' devices. Depending on the nature of the
vulnerability, the patching process may be complex and involve many
parties. For example, Google develops updates to fix security
vulnerabilities in the Android OS, but it is up to device manufacturers
to produce a device-specific update incorporating the vulnerability fix,
which can take time if there are proprietary modifications to the device's
software. Once a manufacturer produces an update, it is up to each
carrier to test it and transmit the updates to users' devices. However,
carriers can be delayed in providing the updates because they need time
to test whether they interfere with other aspects of the device or the
software installed on it.
In addition, mobile devices that are older than two years may not
receive security updates because manufacturers may no longer support
these devices. Many manufacturers stop supporting smart-phones as
soon as 12 to 18 months after their release. Such devices may face
increased risk if manufacturers do not develop patches for newly
discovered vulnerabilities.
1.7 Software on mobile devices may be out-of-date. Security patches for
third-party applications are not always developed and released in a
timely manner. In addition, mobile third-party applications, including
web browsers, do not always notify users when updates are available.
Unlike traditional web browsers, mobile browsers rarely get updates.
Using outdated software increases the risk that an attacker may exploit
vulnerabilities associated with these devices.
1.8 Mobile devices often do not limit Internet connections. Many mobile
devices do not have firewalls to limit connections. When the device is
connected to a wide area network it uses communications ports to
connect with other devices and the Internet. A hacker could access the
mobile device through a port that is not secured. A firewall secures
these ports and allows the user to choose what connections he wants to
allow into the mobile device. Without a firewall, the mobile device may
be open to intrusion through an unsecured communications port, and
an intruder may be able to obtain sensitive information on the device
and misuse it.
1.9 Mobile devices may have unauthorized modifications. The process of
modifying a mobile device to remove its limitations so users can add
features (known as "jail-breaking" or "rooting") changes how security for
the device is managed and could increase security risks. Jail-breaking
allows users to gain access to the operating system of a device so as to
permit the installation of unauthorized software functions and
applications and/or to not be tied to a particular wireless carrier. While
some users may jailbreak or root their mobile devices specifically to
install security enhancements such as firewalls, others may simply be
looking for a less expensive or easier way to install desirable
applications. In the latter case, users face increased security risks,
because they are bypassing the application vetting process established
by the manufacturer and thus have less protection against
inadvertently installing malware. Further, jail-broken devices may not
receive notifications of security updates from the manufacturer and
may require extra effort from the user to maintain up-to-date software.
1.10 Communication channels may be poorly secured. Having
communication channels, such as Bluetooth communications, "open" or
in "discovery" mode (which allows the device to be seen by other
Bluetooth-enabled devices so that connections can be made) could allow
an attacker to install malware through that connection, or
surreptitiously activate a microphone or camera to eavesdrop on the
user. In addition, using unsecured public wireless Internet networks or
WiFi spots could allow an attacker to connect to the device and view
sensitive information.
2. Present methods of accessing mobile devices using different
authentication techniques
2.1 Multi-level authentication
Multi-level authentication has been considered before. As in progressive
authentication, data and applications are categorized in different levels
of authorization, variously called “hats” [23], “usage profiles” [13],
“spheres” [21], “security levels” [4], or ”sensitive files” [25]. With the
exception of Treasure-Phone [21] and MULE [25], most of this work has
been conceptual, with no actual implementation. Treasure-Phone
divides applications into multiple access spheres and switches from one
sphere to another using the user’s location, a personal token, or
physical “actions” (e.g., locking the home door would switch from the
“Home” to the “Closed” sphere). However, these sphere switching criteria
have flaws. First, location is rather unreliable and inaccurate, and
when used in isolation, it is difficult to choose the appropriate
sphere (e.g., being alone at home is different from being at home
during a party).
Second, the concept of personal tokens requires users to carry more
devices. Third, monitoring physical “actions” assumes that the device
can sense changes in the physical infrastructure, something that is not
yet viable. Conversely, progressive authentication enables automatic
switching among the multiple levels of authentication by relying on
higher-accuracy, simpler and more widely available multi-modal
sensory information.
MULE proposes to encrypt sensitive files stored in laptops based
on their location: if the laptop is not at work or at home, these files are
encrypted. Location information is provided by a trusted location device
that is contacted by the laptop in the process of regenerating decryption
keys. Progressive authentication protects applications, not files, and it
uses multiple authentication factors, unlike MULE, which uses location
exclusively.
2.2 Automatic authentication
Other forms of automatic authentication use a single authentication
factor such as proximity [6, 7, 12], behavioral patterns [22], and
biometrics, such as typing patterns [1,16], hand motion and button
presses [3]. Most of these techniques are limited to desktop computers,
laptops or specific devices (e.g., televisions [3]). The closest to our work
is Implicit Authentication [22], which records a user’s routine tasks
such as going to work or calling friends, and builds a profile for each
user. Whenever deviations from the profile are detected, the user is
required to explicitly authenticate. Progressive authentication differs
from this work in that it uses more sensory information to enable real-
time, finer granularity modeling of the device’s authentication state. On
the other hand, any of these proximity, behavioral and biometric
signals could be plugged into our system. Transient authentication [6, 7]
requires the user to wear a small token and authenticate with it from
time to time. This token is used as a proximity cue to automate laptop
authentication. This approach requires the user to carry and
authenticate with an extra token, but its proximity-based approach is
relevant to our work in that it also leverages nearby user-owned devices
(i.e., the tokens) as authentication signals.
2.3 Mobile device authentication
The design of more intuitive and less cumbersome authentication
schemes has been a popular research topic. Current approaches can be
roughly classified into knowledge-based, multi-factor, and biometric
authentication techniques. All three are orthogonal to progressive
authentication. Our goal is not to provide a new “explicit”
authentication mechanism, but instead to increase the usability of
current mechanisms by reducing the frequency at which the user must
authenticate. When explicit authentication is required, any of these
techniques can be used. Knowledge-based approaches assume that a
secret (e.g., a PIN) is shared between the user and the device, and must
be provided every time the device is used. Due to the limited size of
phone screens and on-screen keyboards, this can be a tedious process
[5], especially when it is repeated multiple times per day. In multi-factor
authentication, more than one type of evidence is required. For
instance, two-factor authentication [2,20,24] requires a PIN and secured
element such as a credit card or USB dongle. This practice presents
major usability issues, as the need to carry a token such as Secure-ID
[20] goes against the user’s desire to carry fewer devices. Biometric
schemes [5, 10, 17] leverage biometrics [11] or their combinations [8, 9],
such as face recognition and fingerprints, to authenticate the user with
high accuracy. Even though very secure, biometric identification comes
with acceptability, cost and privacy concerns [17], and is especially
cumbersome on small devices.
3. Proposed Progressive Authentication Method for Accessing Multiple
Information in Mobile Devices
3.1 Enable user authentication: Devices can be configured to require
passwords or PINs to gain access. In addition, the password field can be
masked to prevent it from being observed, and the devices can activate
idle-time screen locking to prevent unauthorized access.
3.2 Enable two-factor authentication for sensitive transactions: Two-
factor authentication can be used when conducting sensitive
transactions on mobile devices. Two-factor authentication provides a
higher level of security than traditional passwords. Two-factor refers to
an authentication system in which users are required to authenticate
using at least two different "factors" — something you know, something
you have, or something you are — before being granted access. Mobile
devices themselves can be used as a second factor in some two-factor
authentication schemes used for remote access. The mobile device can
generate pass codes, or the codes can be sent via a text message to the
phone. Two-factor authentication may be important when sensitive
transactions occur, such as for mobile banking or conducting financial
transactions.
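
As an illustration of how a device can generate pass codes, the
following sketch implements a time-based one-time password in the style
of RFC 6238 using only the Python standard library (the shared secret
and time step are example values; a real deployment would provision the
secret securely and verify codes server-side):

    import hmac, hashlib, struct, time

    def totp(secret: bytes, time_step: int = 30, digits: int = 6) -> str:
        """Time-based one-time password in the style of RFC 6238.

        The counter is the number of time_step intervals since the
        Unix epoch; device and server derive the same short-lived
        code from the shared secret with no network round trip.
        """
        counter = int(time.time()) // time_step
        msg = struct.pack(">Q", counter)                  # 8-byte big-endian counter
        digest = hmac.new(secret, msg, hashlib.sha1).digest()
        offset = digest[-1] & 0x0F                        # dynamic truncation
        code = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
        return str(code % 10 ** digits).zfill(digits)

    shared_secret = b"example-shared-secret"  # illustrative only
    print(totp(shared_secret))                # e.g. '492039', valid for ~30 s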
3.3 Verify the authenticity of downloaded applications: Procedures can
be implemented for assessing the digital signatures of downloaded
applications to ensure that they have not been tampered with.
3.4 Install antimalware capability: Antimalware protection can be
installed to protect against malicious applications, viruses, spyware,
infected secure digital cards, and malware-based attacks. In addition,
such capabilities can protect against unwanted (spam) voice messages,
text messages, and e-mail attachments.
3.5 Install a firewall: A personal firewall can protect against
unauthorized connections by intercepting both incoming and outgoing
connection attempts and blocking or permitting them based on a list of
rules.
3.6 Install security updates: Software updates can be automatically
transferred from the manufacturer or carrier directly to a mobile device.
Procedures can be implemented to ensure these updates are
transmitted promptly.
3.7 Remotely disable lost or stolen devices: Remote disabling is a
feature for lost or stolen devices that either locks the device or
completely erases its contents remotely. Locked devices can be
unlocked subsequently by the user if they are recovered.
3.8 Enable encryption for data stored on device or memory card: File
encryption protects sensitive data stored on mobile devices and memory
cards. Devices can have built-in encryption capabilities or use
commercially available encryption tools.
3.9 Enable white-listing: White-listing is a software control that permits
only known safe applications to execute commands.
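
A white-list check can be conceptually very simple. The fragment below
is a minimal sketch under our own assumptions (a digest-based registry
of vetted packages; real platforms typically check code-signing
certificates instead):

    import hashlib

    # SHA-256 digests of vetted application packages (hypothetical values).
    APPROVED_DIGESTS = {
        "3a7bd3e2360a3d29eea436fcfb7e44c735d117c42d1c1835420b6b9942dd4f1b",
    }

    def is_whitelisted(package_bytes: bytes) -> bool:
        """Permit execution only if the package digest is on the white-list."""
        return hashlib.sha256(package_bytes).hexdigest() in APPROVED_DIGESTS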
3.10 Establish a mobile device security policy: Security policies define
the rules, principles, and practices that determine how an organization
treats mobile devices, whether they are issued by the organization or
owned by individuals. Policies should cover areas such as roles and
responsibilities, infrastructure security, device security, and security
assessments. By establishing policies that address these areas,
agencies can create a framework for applying practices, tools, and
training to help support the security of wireless networks.
3.11 Provide mobile device security training: Training employees in an
organization's mobile security policies can help to ensure that mobile
devices are configured, operated, and used in a secure and appropriate
manner.
3.12 Establish a deployment plan: Following a well-designed
deployment plan helps to ensure that security objectives are met.
3.13 Perform risk assessments: Risk analysis identifies vulnerabilities
and threats, enumerates potential attacks, assesses their likelihood of
success, and estimates the potential damage from successful attacks on
mobile devices.
3.14 Perform configuration control and management: Configuration
management ensures that mobile devices are protected against the
introduction of improper modifications before, during, and after
deployment [21].
CONCLUSION
Connecting to an unsecured wireless network could let an attacker
access personal information from a device, putting users at risk of
data and identity theft. One type of attack that exploits such networks
is known as man-in-the-middle, where an attacker inserts himself into
the middle of the communication stream and steals information. Areas
related to progressive authentication, such as multi-level
authentication systems, context-based and automatic authentication, and
primary mobile device authentication, are in general methods of
accessing the mobile device. The important insight of our research is
to combine multiple authentication signals to determine the user's
level of authenticity, and to surface authentication only when this
level is too low for the content being requested. Overall, progressive
authentication offers a new point in the design of mobile authentication
and provides users with more options in balancing the security and
convenience of their mobile devices.
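
The decision logic at the heart of this approach can be sketched in a
few lines. The signal names, weights and per-application thresholds
below are invented for illustration; an actual progressive
authentication system would derive them from its sensors and from the
protection levels the user has configured:

    # Illustrative weights for each authentication signal (assumed values).
    SIGNAL_WEIGHTS = {"face_match": 0.5, "device_possession": 0.3, "continuity": 0.2}

    def confidence(signals):
        """Weighted combination of per-signal scores in [0, 1]."""
        return sum(SIGNAL_WEIGHTS[name] * score for name, score in signals.items())

    def needs_explicit_auth(signals, app_threshold):
        """Surface a PIN/password prompt only when the confidence level
        is below the protection level configured for the application."""
        return confidence(signals) < app_threshold

    signals = {"face_match": 0.9, "device_possession": 1.0, "continuity": 0.8}
    print(needs_explicit_auth(signals, app_threshold=0.6))   # False: open silently
    print(needs_explicit_auth(signals, app_threshold=0.95))  # True: prompt (e.g., banking)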
References
[1] Bergadano, F., Gunetti, D., and Picardi, C. User authentication through keystroke dynamics. ACM Trans. Inf. Syst. Secur. 5 (November
2002), 367–397.
[2] C.G.Hocking, S.M.Furnell, N.L.Clarke, and P.L.Reynolds. A distributed and cooperative user authentication framework. In Proc. of IAS ’10 (August 2010), pp. 304–310.
[3] Chang, K.-H., Hightower, J., and Kveton, B. Inferring identity using accelerometers in television remote controls. In Proc. of Pervasive ’09
(2009), pp. 151–167.
[4] Clarke, N., Karatzouni, S., and Furnell, S. Towards a Flexible, Multi-
Level Security Framework for Mobile Devices. In Proc. of the 10th Security Conference (May 4–6 2011).
[5] Clarke, N. L., and Furnell, S. M. Authentication of users on mobile telephones - A survey of attitudes and practices. Computers and
Security 24, 7 (Oct. 2005), 519–527.
[6] Corner, M. D., and Noble, B. Protecting applications with transient
authentication. In Proc. of MobiSys ’03 (2003), USENIX.
[7] Corner, M. D., and Noble, B. D. Zero-interaction authentication. In Proc. of MobiCom ’02 (2002), ACM, pp. 1–11.
[8] Greenstadt, R., and Beal, J. Cognitive security for personal devices. In Proc. of the 1st ACM workshop on AISec (2008), ACM,pp. 27–30.
[9] Hong, L., and Jain, A. Integrating faces and fingerprints for personal identification. IEEE Trans. Pattern Anal. Mach. Intell. 20 (December
1998), 1295–1307.
[10] Jain, A., Bolle, R., and Pankanti, S. Biometrics: Personal
Identification in a Networked Society. Kluwer Academic Publ., 1999.
[11] Jain, A., Hong, L., and Pankanti, S. Biometric identification.
Commun. ACM 43 (February 2000), 90–98.
[12] Kalamandeen, A., Scannell, A., de Lara, E., Sheth, A., and LaMarca, A. Ensemble: cooperative proximity-based authentication. In
Proc. of MobiSys ’10 (2010), pp. 331–344.
[13] Karlson, A. K., Brush, A. B., and Schechter, S. Can I borrow your phone?: Understanding concerns when sharing mobile phones. In Proc. of CHI ’09 (2009), ACM, pp. 1647–1650.
[14] Lu, H., Brush, A. J. B., Priyantha, B., Karlson, A. K., and Liu, J. SpeakerSense: Energy Efficient Unobtrusive Speaker Identification on
Mobile Phones. In Proc. of Pervasive 2011 (June 12–15 2011), pp. 188–205.
[15] Mobile wallet offered to UK shoppers. http://www.bbc.co.uk/news/technology-13457071.
[16] Nisenson, M., Yariv, I., El-Yaniv, R., and Meir, R. Towards behaviometric security systems: Learning to identify a typist. In Proc. of
PKDD ’03 (2003), Springer, pp. 363–374.
[17] Prabhakar, S., Pankanti, S., and Jain, A. K. Biometric recognition: Security and privacy concerns. IEEE Security and Privacy 1 (2003), 33–42.
[18] Priyantha, B., Lymberopoulos, D., and Liu, J. LittleRock: Enabling Energy Efficient Continuous Sensing on Mobile Phones. Tech. Rep. MSR-TR-2010-14, Microsoft Research, February 18, 2010.
[19] Priyantha, B., Lymberopoulos, D., and Liu, J. LittleRock: Enabling Energy-Efficient Continuous Sensing on Mobile Phones. IEEE Pervasive Computing 10 (2011), 12–15.
[20] RSA SecurID. http://www.rsa.com/node.aspx?id=1156.
[21] Seifert, J., De Luca, A., Conradi, B., and Hussmann, H. TreasurePhone: Context-Sensitive User Data Protection on Mobile Phones. In Proc. of Pervasive ’10. 2010, pp. 130–137.
[22] Shi, E., Niu, Y., Jakobsson, M., and Chow, R. Implicit
authentication through learning user behavior. In Proc. of ISC ’10 (October 2010), pp. 99–113.
[23] Stajano, F. One user, many hats; and, sometimes, no hat – towards a secure yet usable PDA. In In Proc. of Security Protocols Workshop (2004).
[24] Stajano, F. Pico: No more passwords! In Proc. of Security Protocols Workshop (March 28–30 2011).
[25] Studer, A., and Perrig, A. Mobile user location-specific encryption (MULE): using your access as your password. In Proc. of WiSec ’10
(2010), ACM, pp. 151–162.
[26] Texas Instruments. OMAP™ 5 mobile applications platform, 13 July 2011. Product Bulletin.
HYBRID VEHICLES
Abhilash A.S & Yashwanth N
([email protected]) 7th Sem Mechanical Engineering
ATME COLLEGE OF ENGINEERING MYSORE
ABSTRACT
As modern culture and technology continue to develop, the
growing presence of global warming and irreversible climate change
draws increasing amounts of concern from the world’s population.
Earth’s climate is beginning to transform, as evidenced by frequent severe
storms, the drastic shrinking of polar ice caps and mountain glaciers,
the increased flooding of coastal areas, and longer droughts in arid
regions of the world. There are large holes in the ozone layer of
the earth’s atmosphere, and smog levels are ever increasing, leading to
decreased air quality. Countries around the world are working to
drastically reduce CO2 emissions as well as other harmful
environmental pollutants.
Amongst the most notable producers of these pollutants are
automobiles, which are almost exclusively powered by internal
combustion engines and spew out unhealthy emissions. Cars and
trucks are responsible for almost 25% of CO2 emissions, and other
major transportation methods account for another 12%. In the opinion
of many, cars are a large contributor to pollution levels and, in the
bigger picture, global warming. With immense quantities of cars on the
road today, pure combustion engines are quickly becoming a target of
global warming blame.
Internal combustion engines account for a lot of the pollution
problems, but the issue still stands as to what system will drive the
next wave of automotive vehicles. One potential alternative to the
world’s dependence on standard combustion engine vehicles is hybrid
cars. Hybrids, as their name suggests, are vehicles that utilize multiple
forms of fuel to power their engines. In the majority of modern hybrids,
cars are powered by a combination of traditional gasoline power and
an electric motor. In this sort of hybrid engine, the combustion engine
is used at high speeds over long distances, such as on the highway,
and the electric motor at low speeds and short distances, such as in
urban areas. By incorporating alternative energy drivetrains into
vehicles that also use combustion engines, hybrids allow for a
considerably cleaner mode of transportation.
Key words: Vehicles, pollution, engines, hybrid.
------------------------ ---------------------------
Introduction:
What is a Hybrid Engine System?
A hybrid engine system is a system that uses two or more distinct
power sources to move the vehicle. It is typically a fusion of an internal
combustion engine and an electric motor; however, other mechanisms
can be used to capture and utilize power through different power-source
combinations.
Automobile hybrid systems combine two motive power sources, such as
an internal combustion engine and an electric motor, to take advantage
of the benefits provided by these power sources while compensating for
each other’s shortcomings, resulting in highly efficient driving
performance. Although hybrid systems use an electric motor, they do
not require external charging, as electric vehicles do. They are also
called Hybrid Electric Vehicles (HEVs).
Hybrid System Configurations
The following three major types of hybrid systems are being used in the
hybrid vehicles currently on the market:
1) SERIES HYBRID SYSTEM
The engine drives a generator, and an electric motor uses this generated
electricity to drive the wheels. This is called a series hybrid system
because the power flows to the wheels in series, i.e., the engine power
and the motor power are in series. A series hybrid system can run a
small-output engine in the efficient operating region relatively steadily,
generate and supply electricity to the electric motor and efficiently
charge the battery. It has two motors—a generator (which has the same
structure as an electric motor) and an electric motor. This system is
being used in the Coaster Hybrid.
2) PARALLEL HYBRID SYSTEM
In a parallel hybrid system, both the engine and the electric motor drive
the wheels, and the drive power from these two sources can be utilized
according to the prevailing conditions. This is called a parallel hybrid
system because the power flows to the wheels in parallel. In this
system, the battery is charged by switching the electric motor to act as
a generator, and the electricity from the battery is used to drive the
wheels. Although it has a simple structure, the parallel hybrid system
cannot drive the wheels from the electric motor while simultaneously
charging the battery since the system has only one motor.
3) SERIES/PARALLEL HYBRID SYSTEM
This system combines the series hybrid system with the parallel hybrid
system in order to maximize the benefits of both systems. It has two
motors, and depending on the driving conditions, uses only the electric
motor or the driving power from both the electric motor and the engine,
in order to achieve the highest efficiency level. Furthermore, when
necessary, the system drives the wheels while simultaneously
generating electricity using a generator. This is the system used in the
Prius and the Estima Hybrid.
Characteristics of Hybrid Systems
Hybrid systems possess the following characteristics:
1. ENERGY-LOSS REDUCTION
The system automatically shuts off the engine when the vehicle comes
to a stop and restarts it when the accelerator is pressed. This prevents
wasted energy from idling.
2. ENERGY RECOVERY AND REUSE
The energy that would normally be wasted as heat during deceleration
and braking is recovered as electrical energy, which is then used to power the
starter and the electric motor.
3. MOTOR ASSIST
The electric motor assists the engine in accelerating, passing, or hill
climbing. This allows a smaller, more efficient engine to be used. In
some vehicles, the motor alone provides power for low-speed driving
conditions where internal combustion engines are least efficient.
4. HIGH-EFFICIENCY OPERATION CONTROL
The system maximizes the vehicle’s overall efficiency by using the
electric motor to run the vehicle under operating conditions in
which the engine’s efficiency is low and by generating electricity
under operating conditions in which the engine’s efficiency is
high. The series/parallel hybrid system has all of these
characteristics and therefore provides both superior fuel efficiency
and driving performance.
5. REGENERATIVE BRAKING
The system uses a process called regenerative braking to store the
kinetic energy generated by brake use in the batteries, which in
turn will power the electric motor. The electric motor applies
resistance to the drive train causing the wheels to slow down. In
return, the energy from the wheels turns the motor, which
functions as a generator, converting energy normally wasted
during coasting and braking into electricity, which is stored in a
battery until needed by the electric motor.
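As an illustrative, back-of-the-envelope calculation (the numbers here are assumptions, not figures from this paper): the kinetic energy of a vehicle of mass m travelling at speed v is E = (1/2)mv^2. A 1,500 kg car braking from 50 km/h (about 13.9 m/s) to rest therefore gives up E = 0.5 x 1500 x 13.9^2, roughly 145 kJ. A regenerative braking system recovers a fraction of this energy into the battery instead of dissipating all of it as heat in the friction brakes.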
6. IMPROVED AERODYNAMICS
Part of the reason that SUVs get such bad fuel economy is the
aerodynamic drag on the vehicle: a box-shaped car or truck has to
exert more force to move through the air, causing more stress on the
engine and making it work harder. Improving the shape and
aerodynamics of a car is a good way to improve the fuel economy and
the handling at the same time.
7. OTHERS
Low rolling resistance tires can be used (tires have often been designed
for a quiet, smooth ride and high grip, with efficiency as a lower
priority). Tires cause mechanical drag, once again making the engine
work harder and consume more fuel. Hybrid cars may use special tires
that are inflated to a higher pressure than regular tires and are stiffer,
or that, through the choice of carcass structure and rubber compound,
have lower rolling resistance while retaining acceptable grip, thereby
improving fuel economy whatever the power source.
Powering the A/C, power steering, and other auxiliary pumps electrically,
as and when needed, reduces mechanical losses compared
with driving them continuously with traditional engine belts.
These features make a hybrid vehicle particularly efficient for city traffic
where there are frequent stops, coasting and idling periods. In addition,
noise emissions are reduced, particularly at idling and low operating
speeds, in comparison to conventional engine vehicles.
Benefits of Hybrid Cars: There are many great benefits of hybrid cars.
1. Built with lightweight materials, these cars are very compact in size.
The engine is built to be very fuel efficient. When the vehicle stops at a
traffic light, the engine will automatically turn off and restart whenever
the car is put into gear.
2. These cars have the benefit of being run by both a gasoline engine and
an electric motor, which is used for acceleration.
3. The batteries of the electric motor recharge themselves by
utilizing the kinetic energy generated during braking.
4. Hybrid vehicle engines generate fewer emissions, provide good
mileage, idle less, and are very fuel efficient. These hybrid vehicles can
help save the planet.
5. The aerodynamic architecture lessens drag and the tires are built
with a unique rubber which lessens friction.
6. The inbuilt battery has a large capacity and is composed of
nickel-metal hydride.
7. The power-train equipment permits the utilization of two power
sources and improves mileage.
8. There are numerous options to choose from. Honda, Ford, Toyota,
GMC, and Chevrolet are a few worth mentioning.
9. If you select a hybrid vehicle, the US Government will
appreciate your selection by providing you with considerable tax breaks.
10. Driving a hybrid implies that you are active in ensuring that the
environment is clean and that you care for your planet. It also indicates
that you are a responsible citizen who wants to save valuable fuel.
ANTI-LOCK BRAKING SYSTEM
Dhanush N.S & Sachin Pande M.P
([email protected]) 7th Sem Mechanical Engineering
ATME COLLEGE OF ENGINEERING MYSORE
ABSTRACT
Motor driving is a skill which requires extensive practice. Much
has been said and presented about accidents that happen due to errors
in driving. It is also interesting to note that while driving we move
through different emotions based on the route, mode and various other
factors involved in driving.
Any error in driving can lead to an accident, which happens in a
fraction of a second. The mental presence of the driver during the
entire process of driving is itself an expertise a driver has to have.
More complicated still, we have to drive not only for ourselves but may
also have to anticipate the driving of other drivers in the vehicles
around us.
The type, class, speed and other features of various vehicles will
bring different complexities to driving.
All these complexities can be reduced to some extent by the
concept of self-locking and unlocking of wheels when heavy braking is
applied. This concept helps the driver not only to control the vehicle but
also to steer it to safety.
The Anti-lock Braking System (ABS) is a mechatronic system
where mechanical braking, a hydraulic system and an electronic sensor
system work in tandem to prevent wheel locking during heavy braking.
This allows the driver to maintain steering control while stopping the
vehicle in the shortest distance possible. Since ABS will not allow the tire to
stop rotating, one can brake and steer at the same time. The braking
and steering ability of the vehicle is limited by the amount of traction
the tire can generate.
This paper mainly focuses on the functioning, types and
applications of the anti-lock braking system.
Key words: vehicle, speed, braking, self-unlocking.
------------------------ ---------------------------
INTRODUCTION
In a little over 100 years since automobiles took hold of people’s
imagination, technologies have been designed to accelerate them faster
and reach higher speeds. Car manufacturers worldwide are vying with
each other to invent more reliable gadgets, thereby coming closer to the
dream of the ‘Advanced safety vehicle’ or ‘Ultimate safety vehicle’, on
which research and development has been going on for the past several
years. The most recent advancement in braking systems is the anti-lock
braking system (ABS). Wheel lockup during braking causes skidding,
which in turn causes a loss of traction and vehicle control. This reduces
the ability to steer and change direction, so the car slides out of control.
A road wheel that is still rotating, however, can be steered, and that is
what ABS is all about. With such a system, the driver can brake hard,
take evasive action and still be in control of the vehicle in any road
condition, at any speed and under any load. ABS does not reduce
stopping distance, but compensates for changing traction or tyre
loading by preventing wheel lockup.
CONCEPT OF ABS
The theory behind anti-lock brakes is simple. A skidding wheel has less
traction than a non-skidding wheel. If a vehicle is stuck on ice and its
wheels are spinning, the vehicle has no traction, because the contact
patch is sliding relative to the ice. Good drivers have always pumped
the brake pedal during panic stops to avoid wheel lockup and the loss
of steering control. ABS simply gets the pumping job done much faster
and in a much more precise manner than the fastest human foot.
By keeping the wheels from skidding while you slow down, anti-lock
brakes have two benefits. You'll stop faster and you’ll be able to steer
while you stop.
ABS COMPONENTS
Many different ABS designs are found on today’s vehicles. These
designs vary in their basic layout, operation and components. The main
ABS components are:
Speed sensors
Valves
Pump
Controller
Speed sensors
The anti-lock braking system needs some way of knowing when a wheel
is about to lock up. The speed sensors, which are located at each wheel,
or in some cases in the differential, provide this information.
Valves
There is a valve in the brake line of each brake controlled by the ABS.
On some systems, the valve functions in three positions:
In position one, the valve is open; pressure from the
master cylinder is passed right through to the brake.
In position two, the valve blocks the line, isolating
that brake from the master cylinder. This prevents
the pressure from rising further should the driver
push the brake pedal harder.
In position three, the valve releases some of the
pressure from the brake.
Pump
When the ABS system operates, the brake lines lose pressure. The pump
re-pressurizes the system.
Controller
The controller is an electronic control unit in the car which receives
information from each individual wheel speed sensor. If a wheel
loses traction, a signal is sent to the controller; the controller then
limits the brake force (EBD) and activates the ABS modulator, which
actuates the braking valves on and off.
OPERATION
Typically ABS includes a central electronic control unit (ECU),
four wheel speed sensors, and at least two hydraulic valves within the
brake hydraulics. The ECU constantly monitors the speed of each
wheel. If it detects a wheel rotating significantly slower than the others,
a condition indicative of impending wheel lock, it actuates the valves to
reduce the hydraulic pressure to the brake at the affected wheel, thus
reducing the braking force on that wheel; the wheel then turns faster.
Conversely, if the ECU detects a wheel turning significantly faster than
the others, it actuates the valves to increase the hydraulic pressure to
the brake at the affected wheel so that the braking force is reapplied,
slowing down the wheel. This process is repeated continuously, and can
be detected by the driver via brake pedal pulsation. Some anti-lock
systems can apply or release braking pressure 15 times per second.
Because of this, the wheels of cars equipped with ABS are practically
impossible to lock even during panic braking in extreme conditions.
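The control cycle described above can be sketched in code. The following is a minimal, illustrative Python sketch of one simplified ABS control iteration; the slip threshold, function names and valve interface are assumptions for illustration, not taken from any production system:

    # Illustrative sketch of a simplified ABS control cycle.
    # All thresholds, names and interfaces are assumptions.

    SLIP_THRESHOLD = 0.15  # assumed wheel-slip ratio indicating impending lockup

    def abs_control_cycle(wheel_speeds, vehicle_speed, valves):
        """One iteration of a simplified ABS loop (run 15 or more times per second)."""
        for wheel, speed in enumerate(wheel_speeds):
            # Slip ratio: how much slower the wheel turns than the vehicle moves.
            slip = (vehicle_speed - speed) / vehicle_speed if vehicle_speed > 0 else 0.0
            if slip > SLIP_THRESHOLD:
                # Wheel is about to lock: release brake pressure (valve position three).
                valves[wheel] = "release"
            elif slip < SLIP_THRESHOLD / 2:
                # Wheel has recovered: pass master-cylinder pressure through (position one).
                valves[wheel] = "open"
            else:
                # Hold the current pressure so it does not rise further (position two).
                valves[wheel] = "hold"
        return valves

    # Example: the front-left wheel is much slower than the vehicle, so its
    # valve is set to "release" while the other wheels stay "open".
    print(abs_control_cycle([5.0, 11.8, 11.9, 12.0], 12.0, ["open"] * 4))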
Modern ABS applies individual brake pressure to all four wheels
through a control system of hub-mounted sensors and a
dedicated micro-controller. ABS is offered or comes standard on most
road vehicles produced today and is the foundation for electronic
stability control systems, which are rapidly increasing in popularity.
TYPES OF ANTILOCK BRAKE SYSTEMS
Four-channel, four-sensor ABS
This is the best scheme. There is a speed sensor on all four wheels and a
separate valve for each of the four wheels. With this setup the controller
monitors each wheel individually to make sure it is achieving maximum
braking force.
Three-channel, four-sensor ABS
There is a speed sensor on all four wheels and a separate valve for each
of the front wheels, but only one valve for both of the rear wheels.
Three-channel, three-sensor ABS
This scheme, commonly found on pickup trucks with four-wheel
ABS, has a speed sensor and a valve for each of the front wheels, with
one valve and one sensor for both rear wheels. The speed sensor for the
rear wheels is located in the rear axle.
The rear wheels are monitored together; they both have to start to lock
up before the ABS is activated on the rear wheels. With this system, it is
possible that one of the rear wheels will lock during a stop, reducing
brake effectiveness.
One-channel, one-sensor ABS
This scheme is commonly found on pickup trucks with rear-wheel ABS.
It has one valve, which controls both rear wheels, and one speed sensor,
located in the rear axle. This system operates the same way as the rear
end of the three-channel system. The rear wheels are monitored together
and both have to start to lock up before the ABS kicks in. In this system
it is also possible that one of the rear wheels will lock, reducing brake
effectiveness.
ADVANTAGES OF ABS
It allows the driver to maintain directional stability and control over steering during braking.
It is safe and effective.
It automatically changes the brake fluid pressure at each wheel to maintain optimum brake performance.
ABS absorbs the unwanted turbulence shock waves and modulates the pulses, thus permitting the wheel to continue turning under maximum braking pressure.
DISADVANTAGES OF ABS
It is very costly.
The maintenance cost of a car equipped with ABS is higher.
SURVEY
A June 1999 National Highway Traffic Safety Administration (NHTSA)
study found that ABS increased stopping distances on loose gravel by
an average of 27.2 percent.
The Insurance Institute for Highway Safety released a study in 2010
that found that motorcycles with ABS are 37 percent less likely to be
involved in a fatal crash than models without ABS.
A 2004 Australian study by the Monash University Accident
Research Centre found that ABS:
reduced the risk of multiple-vehicle crashes by 18 percent, and
increased the risk of run-off-road crashes by 35 percent.
CONCLUSION
ABS has so far been developed into a system which provides rapid,
automatic braking in response to signs of incipient wheel locking by
alternately increasing and decreasing hydraulic pressure in the brake
line. Statistics show that approximately 40% of automobile accidents
are due to skidding. In real world conditions, even an alert and
experienced driver without ABS would find it difficult to match or
improve on the performance of a typical driver with a modern ABS-
equipped vehicle. ABS reduces chances of crashing, and/or the severity
of impact. In gravel, sand and deep snow, ABS tends to increase
braking distances. On these surfaces, locked wheels dig in and stop the
vehicle more quickly. ABS prevents this from occurring. If there is an
ABS failure, the system will revert to normal brake operation. Normally
the ABS warning light will turn on and let the driver know there is a
fault.
GET THE PNR STATUS FROM
RAILWAY TICKET
Mr. Raghunatha B (1), Dr. Y. H. Sharath Kumar (2); (1) Research Scholar, University of Mysore, India
Abstract
In this project, we have designed a system which provides the PNR
status in audio/visual form. We have also created a railway ticket
database for experimentation purposes. In the hierarchy of the
proposed system, the railway ticket images are first localized and then
segmented. After segmentation, the numerals are recognized using a
template-based matching technique. Then a Text-To-Speech (TTS) tool
is used to convert the textual PNR status into audio information. The
performance of the proposed PNR status system is validated by the
accuracy measure.
------------------------ ---------------------------
1. INTRODUCTION
Number recognition is the ability to recognize numbers out of order and
to understand how numbers relate to objects; without number
recognition, addition and subtraction are impossible. Students need
plenty of opportunities to practice these skills in many different ways.
Number recognition is classified into two types: off-line and on-line
recognition methods. In off-line recognition, the writing is usually
captured optically by a scanner and the completed writing is available
as an image. In an on-line system, the two-dimensional coordinates of
successive points are represented as a function of time, and the order of
strokes made by the writer is also available. The on-line methods have
been shown to be superior to their off-line counterparts in recognizing
characters due to the temporal information available with the former.
However, in off-line systems, pattern recognition has been successfully
used to yield comparably high
recognition accuracy levels. Several applications including mail sorting,
bank processing, document reading and postal address recognition
require number recognition systems.
Fig 1.1 Railway ticket status system
2. Passenger name record (PNR)
Passenger Name Record (PNR) is a record in the database of a computer
reservation system (CRS) that contains the details of a group of
passengers travelling together. In the Indian Railways passenger
reservation system, the PNR is a 10-digit number. These records
include: PNR number, name of passenger, age, gender, number of
passengers, telephone number, address, train number, class of travel,
coach number, seat/berth number, and the status of the ticket (e.g.,
confirmed).
Fig 1.2 Sample railway tickets
3. Motivation
The PNR status tool checks the PNR number for any train in India to
confirm the railway reservation status. Checking the ticket status
is easy, but people often get stuck because they do not
understand the meaning of it.
4. Applications
This project is useful for blind people, who can easily get the PNR status
from this system. It can also be used by illiterate people and is an aid
for the elderly.
5. Proposed segmentation module
Fig 1.4 Proposed segmentation module
6. Challenges of number recognition
The project can face the following challenges:
Position irregularity of PNR number on railway ticket.
Fig 1.5 Sample PNR cropped image
7. Original image and segmented output image
Fig 1.6 Original image Fig 1.7 Segmented output image
8. Number dataset
Fig 1.3 Sample number dataset
9. Template matching:
Template matching is a technique in digital image processing for
finding small parts of an image which match a template image. It can be
used in manufacturing as part of quality control, as a way to navigate a
mobile robot, or as a way to detect edges in images.
Fig 1.8 Template matching
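As a concrete illustration of this idea, the following minimal Python sketch uses OpenCV's normalized cross-correlation to locate a digit template inside a ticket image; the file names and the acceptance threshold are placeholders, not values from this project:

    # Minimal template-matching sketch using OpenCV; file names are placeholders.
    import cv2

    ticket = cv2.imread("ticket.png", cv2.IMREAD_GRAYSCALE)     # cropped PNR region
    template = cv2.imread("digit_7.png", cv2.IMREAD_GRAYSCALE)  # one digit template

    # Slide the template over the image and score every position.
    scores = cv2.matchTemplate(ticket, template, cv2.TM_CCOEFF_NORMED)
    _, best_score, _, best_loc = cv2.minMaxLoc(scores)

    if best_score > 0.8:  # assumed acceptance threshold
        print("Digit found at", best_loc, "with score", best_score)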
10. Web Technology
JavaScript: JavaScript is an interpreted computer programming
language. It was originally implemented as part of web browsers so that
client-side scripts could interact with the user, control the browser, and
communicate asynchronously.
HyperText Markup Language (HTML): HTML is the markup language for
creating web pages and other information that can be displayed in
a web browser. HTML is written in the form of HTML elements consisting
of tags enclosed in angle brackets (like <html>).
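The recognized PNR status text is finally converted to audio. The paper does not name the specific TTS tool used; the sketch below uses the pyttsx3 library as one possible choice, with a hypothetical status string:

    # Minimal text-to-speech sketch; pyttsx3 is one possible offline TTS library.
    import pyttsx3

    engine = pyttsx3.init()
    engine.say("PNR 1234567890: ticket status confirmed")  # hypothetical status text
    engine.runAndWait()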
CBIR Based Matching and Retrieval of Drug Pill Images
Chinmaya T M
Asst. Prof. Govt College, Mandya
Abstract
Automatic illicit drug pill matching and retrieval is becoming an
important problem due to an increase in the number of tablet-type illicit
drugs being circulated in our society. Here we propose an automatic
method to match drug pill images based on the imprints appearing on
the tablet. This will help identify the source and manufacturer of the
illicit drugs. The feature vector extracted from tablet images is based on
edge localization and invariant moments. Instead of storing a single
template for each pill type, we generate multiple templates during the
edge detection process. This helps overcome the difficulties during
matching caused by variations in illumination and viewpoint.
------------------------ ---------------------------
Introduction
Illicit drugs, widely circulated in the international market, are one of the major factors influencing criminal activities. They also lead to additional enforcement and tracking expense for law enforcement units. Drug trafficking is also one of the major factors behind violent and other illegal activities. Illicit drug makers use imprints, color, and shape to identify the chemical substances and their quantities in each pill. Special imprints on the pills are also used for advertisement purposes. Available drug information includes a chemical and a physical description, where the physical description includes shape, color, imprint, etc. It is important to develop an image-based matching tool to automatically identify illicit drug pills based on their imprint, size, shape, color, etc. Keyword-based search terms describe the size, shape, and color of the pill (e.g., round, diamond, rectangle, oval), but they do not utilize the imprint. Keyword-based retrieval has a number of known limitations, namely that keywords are subjective and do not capture all the information about the pill needed for accurate retrieval. To develop a successful automatic pill image matching system, it is important to compensate for variations in the appearance of the pills due, for instance, to changes in viewpoint, illumination or occlusion. For this reason, we utilize the gradient magnitude information to characterize the imprint patterns on the drug pill images. Gradient magnitude is more stable than color or gray scale, especially against illumination variations. Given the gradient magnitude image, Scale Invariant Feature Transform (SIFT) and Multi-scale Local Binary Pattern (MLBP) descriptors are used to generate feature vectors. In addition, invariant moment features proposed by Hu (1962) and a color histogram are used to generate shape and color feature vectors, respectively.
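To make the descriptor stage concrete, a multi-scale LBP feature vector of the kind mentioned above can be sketched in Python with scikit-image; the radii, sampling points and bin counts below are illustrative assumptions, not the parameters of the proposed method:

    # Sketch of multi-scale LBP histogram features using scikit-image.
    # Radii and bin choices are illustrative assumptions.
    import numpy as np
    from skimage.feature import local_binary_pattern

    def mlbp_features(gray, radii=(1, 2, 3)):
        """Concatenate uniform-LBP histograms computed at several radii."""
        feats = []
        for r in radii:
            p = 8 * r  # sampling points on the circle of radius r
            lbp = local_binary_pattern(gray, p, r, method="uniform")
            hist, _ = np.histogram(lbp, bins=p + 2, range=(0, p + 2), density=True)
            feats.append(hist)
        return np.concatenate(feats)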
Related Work
Tao and Grosky (1998) developed image matching for the OBIR system with feature point histograms. The traditional database approach to modeling the real world is based on manual annotations of its salient features in terms of alphanumeric data. However, all such annotations are inherently subjective. In some cases, it is rather difficult to characterize certain important real-world concepts, entities, and attributes by means of text only. Shape and spatial constraints are important data in many applications, ranging from complex space exploration and satellite information management to medical research and entertainment.
In order to overcome these problems, several schemes for data modeling
and image representation have been proposed. Once a measure of
similarity is determined, the corresponding actual images are retrieved from the database. Due to the lack of any unified framework for image
representation, storage, and retrieval, these symbolic representation schemes and retrieval techniques have greatly facilitated image
management. The OBIR system is written primarily in Java. Windows and menus of the system provide user-friendly interfaces. A brief description of each of the system components is as follows.
This component of our system allows users to index a URL-
referenced image. The actual image is not stored in the database. In general, those image features which characterize image object shapes and spatial relations of multiple image objects can be represented as a
set of points. These points can be tagged with labels to capture any necessary semantics. For example, a corner point of an image region has a precise location and can be labeled with the region’s identifier, and a color histogram of an image region can be represented by a point placed at the center-of-mass of the region and labeled by the histogram. We call each of these individual points representing shape and spatial features of image objects feature points. Corner points, which are generally high-curvature points located along the crossings of image objects’ edges or boundaries, serve as the feature points for our experiment. We have argued for representing an image object by the collection of its corner points and proposed a quadtree-based technique for indexing such collections. SUSAN (Smallest Univalue Segment Assimilating Nucleus) is used for corner point detection,
because SUSAN provides better results than traditional corner detection algorithms under varying levels of image brightness. OBIR is a generic object-based image retrieval system that works in a web-based
environment. Using its modular design, we may plug-in various image representation schemes as well as indexing methodologies. In this
paper, we have introduced OBIR and have demonstrated the efficacy of our symbolic image representation scheme on a small database of images. This image representation scheme crucially depends on the
quality of the technique used to find corner points. This is the weak link in our approach, but even so, we have shown that our system works well in certain environments. To refine and extend the OBIR system, we
intend to use better image processing algorithms to extract more precise image feature points. In this problem we have used various nearest neighbor approaches to directly access relevant images. We also intend to extend our approach to work with video indexing [1].
Singha M. and Hemachandran K. (Feb 2012) proposed an image retrieval system, called Wavelet-Based Color Histogram Image Retrieval (WBCHIR), based on the combination of color and texture features: the color histogram is used for the color feature, and a wavelet representation for the texture and location information of an image. This reduces the processing time for retrieval of an image with more promising representatives. The extraction of color features from digital images depends on an understanding of the theory of color and the representation of color in digital images. The distance formulas used by many researchers for image retrieval include the Histogram Euclidean Distance and the Histogram Intersection Distance. Texture is also considered one of the feature extraction attributes by many researchers.
Although there is no formal definition of texture, intuitively this descriptor provides measures of properties such as smoothness and regularity. The texture features of an image are mainly analyzed through statistical, structural and spectral methods. In this paper some features, like the color feature, are extracted. The color feature has been widely used in CBIR systems because of its easy and fast computation. The extraction of color features from digital images depends on an understanding of the theory of color and the representation of color in digital images. The color histogram is one of the most commonly used color feature representations in image retrieval. A color histogram represents the distribution of colors in an image through a set of bins, where each histogram bin corresponds to a color in the quantized color space. A color histogram for a given image is represented by a vector:
H = {H[0], H[1], H[2], ..., H[i], ..., H[n]}
where i is the color bin in the color histogram, H[i] represents the number of pixels of color i in the image, and n is the total number of bins used in the color histogram. Typically, each pixel in an image will be assigned to a bin of the color histogram, so in the color histogram of an image the value of each bin gives the number of pixels that have the same corresponding color. In order to compare images of different sizes, color histograms should be normalized. The normalized color histogram H' is given as:
H' = {H'[0], H'[1], H'[2], ..., H'[i], ..., H'[n]}, where H'[i] = H[i]/p
and p is the total number of pixels in the image.
Like color, texture is a powerful low-level feature for image search
and retrieval applications. Much work has been done on texture analysis, classification, and segmentation over the last four decades, and there is still a lot of potential for research. “Texture is an attribute representing the spatial arrangement of the grey levels of the pixels in a region or image.” The commonly known texture descriptors are the Wavelet Transform, Gabor filters, and co-occurrence matrices. The proposed method has been implemented using Matlab 7.3 and tested on a general-purpose WANG database containing 1,000 photo images in JPEG
format of size 384x256 and 256x386. The search is usually based on similarity rather than an exact match. We have followed the image retrieval technique on different quantization schemes. In this paper, we presented a novel approach for Content Based Image Retrieval by combining the color and texture features, called Wavelet-Based Color Histogram Image Retrieval (WBCHIR). Similarity between the images is ascertained by means of a distance function. The experimental result shows that the proposed method outperforms the other retrieval methods in terms of Average Precision [2]. Several researchers have
proposed drug identification systems using content based image retrieval (CBIR). CBIR is a popular image recognition technology which extracts physical features such as color or shape to describe an image of an object. These features are then used for drug recognition. In this paper we propose an Automated Drug Image Identification System (ADIIS), using content based image retrieval to extract the features of drug images, and using neural networks to perform drug recognition. The features used to recognize drugs include colors, shapes, ratios, magnitudes and textures. The query image is matched with database images of drugs by the weighted Euclidean distance to calculate a similarity distance. The system then retrieves the ten images most similar to the target drug image, allowing the user to correctly identify the drug and obtain information about it. The major contributions and advantages of this paper are to construct an Automated Drug Image Identification System based on five features and dynamic weights, to identify drugs and improve the recognition accuracy of drugs even when they are white circular drugs. The term Content Based Image Retrieval (CBIR), also known as query by image content, refers to the application of computer vision techniques to the image retrieval problem of searching for digital images in large databases; the term is also used to describe the procedures necessary to retrieve images from a large collection based on syntactic image features. Though current CBIR systems typically use low-level features such as texture, color and shape, systems that use higher-level features are becoming common. Using CBIR, we apply different image processing techniques to extract the features of drugs to query a drug database.
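To make the retrieval step concrete, the following minimal Python sketch computes normalized color histograms and ranks database images by a weighted Euclidean distance, in the spirit of the systems described above; the bin count, the uniform weights and the data layout are illustrative assumptions:

    # Sketch of histogram-based retrieval with a weighted Euclidean distance.
    # Bin count, weights and data layout are illustrative assumptions.
    import numpy as np

    def color_histogram(image, bins=8):
        """Normalized RGB histogram: H'[i] = H[i] / p, with p the number of pixels."""
        hist, _ = np.histogramdd(image.reshape(-1, 3),
                                 bins=(bins, bins, bins), range=[(0, 256)] * 3)
        return hist.ravel() / (image.shape[0] * image.shape[1])

    def weighted_euclidean(h1, h2, weights):
        return np.sqrt(np.sum(weights * (h1 - h2) ** 2))

    def retrieve(query_img, database, top_k=10):
        """Return the names of the top_k database images most similar to the query.

        database is a dict mapping name -> H x W x 3 uint8 image array.
        """
        weights = np.ones(8 ** 3)  # uniform weights as a placeholder
        q = color_histogram(query_img)
        ranked = sorted(database.items(),
                        key=lambda kv: weighted_euclidean(q, color_histogram(kv[1]), weights))
        return [name for name, _ in ranked[:top_k]]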
Lin et al. proposed a tablet drug image retrieval system to improve the recognition of white tablet drugs. Lin’s system extracts features including the shape, color and size. However, Lin’s method is not effective in identifying drugs, because many drugs are similar and have the same size and color. This system still cannot effectively extract the representative features of drugs. Xie et al. captured drug features that users select for system identification, such as drug size, shape, weight
and color. Xie’s system provides a way for the user to select the features extracted. Because of their popularity, white circular drugs and their features are hard to represent, so the system cannot recognize the specific appearance of these drugs. Drug colors are divided into the following colors: white, gray, black, purple, blue, green, orange, red and cyan. The RGB values of the images were converted into HSV values.
For the shape feature, the Canny edge algorithm is used to define the edges of the drug images and convert them into binary images. The edge image is then divided into four equal blocks. An Edge Histogram Descriptor is used for the edge distribution of the drug images. Five edge types were used to indicate the various shapes of possible edges.
For linear Gabor features, the filter responses that result from the application of a filter bank of Gabor filters can be used directly as texture features, though none of the approaches described in the literature employs such texture features; in this study, linear Gabor features are used only for comparison. In our experiments we used two filter banks, one with symmetric and one with antisymmetric Gabor filters. This choice is motivated by the properties of simple cells in the visual cortex, which can be modeled by the Gabor filter. The spatial frequency bandwidth and the spatial aspect ratio determine the orientation bandwidth of the filter, which is about half response and is constant for all filters in the bank used. Three different preferred spatial frequencies and eight different preferred orientations were used, resulting in a bank of 24 Gabor filters. The application of such a filter bank results in a 24-dimensional feature vector at each point of the image, i.e. a 24-dimensional vector field for the whole image. For threshold Gabor features, in contrast to the linear features described above, most Gabor-filter-related texture features are obtained by applying non-linear post-processing.
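Such a bank of 24 filters can be sketched in a few lines of Python with OpenCV; the kernel size, wavelengths and the remaining parameter values below are illustrative assumptions, not those of the cited study:

    # Sketch of a Gabor filter bank: 3 spatial frequencies x 8 orientations = 24 filters.
    # Parameter values are illustrative assumptions.
    import cv2
    import numpy as np

    def gabor_bank(ksize=31, sigma=4.0):
        kernels = []
        for wavelength in (4.0, 8.0, 16.0):  # three preferred spatial frequencies
            for k in range(8):               # eight preferred orientations
                theta = k * np.pi / 8
                kernels.append(cv2.getGaborKernel((ksize, ksize), sigma, theta,
                                                  wavelength, gamma=0.5, psi=0))
        return kernels

    def gabor_features(gray_image):
        """24-dimensional response vector at every pixel, stacked along the last axis."""
        responses = [cv2.filter2D(gray_image, cv2.CV_32F, k) for k in gabor_bank()]
        return np.stack(responses, axis=-1)  # shape: (H, W, 24)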
Application
Illicit drug makers use imprints, color & shape to identify the
chemical substances and their quantities in each pill.
Special imprints on the pills are also used for advertisement purposes.
Challenges
Large intra-class variations and small inter-class variations: as the pills of different classes may have the same color, shape or texture, the problem becomes difficult.
Identification of appropriate features: as pill images of different classes have similar color, shape and texture, identification of features is a more challenging task.
Selection of a suitable classifier model is another challenging task.
Objectives
Study of suitable feature extraction based on color, shape and texture of pill images
Study of fusion techniques for better representation of pill images
Study of classifiers for effective classification
Creation of a large database of pill images with complete information
Motivation
In general, many people in society do not know the correct information about the right drug for a particular disease, and in the current market several types of illicit drugs are available. So finding the correct information about a particular drug for a particular disease is a challenging task.
It is important to note that image retrieval does not solve the general image understanding problem. The retrieval system presents similar images; the user should define what the similarity between images has to be. For example, segmentation and complete features may not be necessary for image similarity. So, when we want to develop an efficient CBIR system, some problems have to be solved. The first problem is to select the image features that will represent the image. Naturally, images are stored with information and features that can be used for image retrieval. Some of the features can be visual (color, texture, shape) and some can be human descriptions of the image, like impressions. The second problem is how the system extracts the features of an image and deals with large image databases. We have to keep in mind that large image databases are used for testing and retrieval. This motivated us to build a correct drug identification system.
Dataset samples
Some different types of dataset samples, with their corresponding molecules, are shown below.
Conclusion
In this work we extracted features, namely shape features, Gabor features, EDH features, LBP features and LBPV features, applied a KNN classifier, and noted all the accuracy results; the best accuracy result is shown in table format.
Future work
In future we are going to study the combination of different color, shape and texture features.
A study on classifier combination shall also be made.
Selection of appropriate features will be studied and designed.
An application based on this work will be designed and carried out in future.
References
1. Y. Tao and W. I. Grosky, Department of Computer Science, Wayne State University, Detroit, Michigan 48202, USA. Email: [email protected], [email protected].
2. Manimala Singha and K. Hemachandran, “Content Based Image Retrieval Using Color and Texture”, Department of Computer Science, Assam University, Silchar, India, Pin 788011. Email: [email protected], [email protected].
3. Rung-Ching Chen and Yung-Kuan Chan, “An automatic drug image identification system based on multiple image features and dynamic weights”, Department of Information Management, Chaoyang University of Technology, No. 168, Jifeng E. Rd., Wufeng District, Taichung 41349, Taiwan. Email: [email protected].
4. P. Kruizinga, N. Petkov and S. E. Grigorescu, “Comparison of texture features based on Gabor filters”, University of Groningen, P.O. Box 800, 9700 AV Groningen, The Netherlands. Email: [email protected], [email protected].
5. Zhao G. and Pietikäinen M., “Dynamic texture recognition using local binary patterns with an application to facial expressions”, IEEE Trans. Pattern Anal. Mach. Intell. 29(6), 915-928 (2007).
6. Avneet Kaur, research scholar, M.Tech (Electronics and Communication Engineering), Department of Electronics & Communication Engineering, Amritsar College of Engineering and Technology, Amritsar, Punjab, India (e-mail: [email protected]); Vijay Kumar Banga, Professor and Head, Department of Electronics & Communication Engineering, Amritsar College of Engineering and Technology, Amritsar, Punjab, India (e-mail: [email protected]).
7. “Fusion of Colour, Shape and Texture Features for Content Based Image Retrieval”, Pratheep Anantharatnasamy, Kaavya Sriskandaraja, Vahissan Nandakumar and Sampath Deegalla, Department of Computer Engineering, Faculty of Engineering, University of Peradeniya, Sri Lanka. Email: [email protected], [email protected], [email protected], [email protected].
8. “Image Retrieval Based on Content Using Color Feature”, Ahmed Jamal Afifi, Jan 2, 2012.
9. “Content Based Information Retrieval in Forensic Image Databases”, Zeno Geradts, Department of Digital Evidence, Netherlands Forensic Institute, Ministry of Justice, Volmerlaan 17, 2288 GD Rijswijk, Netherlands. Email: [email protected]. December 2001.
10. “Content Based Medical Image Retrieval Using Texture Features”, Harikrishnan S., PG Scholar/CSE, Paavai Engineering College; Yogapriya J., Assistant Professor/CSE, Paavai Engineering College.
11. “A Statistical Approach to Texture Classification from Single Images”, Manik Varma and Andrew Zisserman, Robotics Research Group, Dept. of Engineering Science, University of Oxford, Oxford, OX1 3PJ, UK. Email: (manik,az)@robots.ox.ac.uk.
12. “Efficient computation of Gabor features”, J. Ilonen, J.-K. Kamarainen, H. Kälviäinen, Department of Information Technology, Lappeenranta University of Technology, P.O. Box 20, FIN-53851 Lappeenranta, Finland.
Proceedings of Seminar
RECENT TOOLS FOR DIMENSIONALITY REDUCTION
IN UNDERSTANDING MEDICAL DATA
22nd AUGUST 2013
CHAMARAJANAGAR, INDIA
CHIEF EDITOR
Prof. MD Pushpavathi
ASSOCIATE EDITORS
Prof. A G Shivakumar
Prof. Annapoorneswara
Dr. Shankarappa S
Dr. Prathibha S
Sri. Rajesh K M
Smt. Shubha L N
Organized by
Department of Computer Science JSS College for Women
Chamarajanagar-571313, Karnataka, India
© JSS College for Women
Chamarajanagar
August 2013
ISBN NO: 978-81-928386-0-1
Published by:
Department of Computer Science
JSS College for Women
Chamarajanagara-571313
INDIA
Preface
With the advancement of science and technology, automation took place in
various sectors such as banking, business, education, medicine, agriculture
etc. The major goal of any automation task is to minimize the effort,
maximize the productivity and to enhance the quality of service. In the field
of medicine, automation systems such as intelligent experts systems and
decision support systems help physicians and medical practitioners
effectively diagnose the diseases and make right decisions in treating a
patient. In order to design expert systems or decision support systems, a
huge volume of heterogeneous data of possibly high dimension need to be
gathered, pre-processed, represented, analyzed and interpreted. So, from the
automation point of view, understanding medical data and the tools for
dealing with medical data analysis is very much important for professionals
who design experts systems as well as for physicians who validate the
designed system.
The first state level seminar on Recent Tools for Dimensionality
Reduction in Understanding Medical Data is the forum for researchers to
present the state of the art works in the areas of Medical Image Processing,
Pattern Recognition, Dimensionality Reduction, Data Analysis, Data Mining,
and other related areas.
The seminar was more related to understanding the new avenues in medical
data understanding and the tools for analyzing the heterogeneous type of
medical data (Text, Image, and Video). The objective of this seminar was to
highlight the challenges in medical data analysis and to open up the
research issues in this area of research. Since the field of Computer Science
and Engineering plays an important role in addressing the issues in medical
data analysis and the possible solutions, it is very much essential to identify
the prominent areas of research in Computer Science related to medical
data analysis and hence this seminar has provided the platform for many
budding researchers to discuss the thrust areas of research in this
domain and motivated many young students towards research in discipline
of Computer Science and Engineering. Thus the seminar is an attempt to
provide a greater insight into the latest research works in the field of medical
data and to bring together the executives of medical sector, researchers,
teachers and students for a meaningful interaction in this regard.
We are grateful to Mr. Yogeesh, dealer, Microtek UPS, Mysore; Mr.
Murugesh, Pixel Computer, Chamarajanagar; and Munna Furniture,
Chamarajanagar. We thank the resource person Prof. D. S. Guru for his
valuable suggestions and guidance in organizing this seminar. We also
thank Dr. H. S. Nagendraswamy, Dr. Vinay, and Dr. S. Manjunath for having
delivered resourceful lectures on a comprehensive review of
understanding medical data, medical image processing challenges and
dimensionality reduction techniques respectively. We also extend our
sincere thanks to Dr. Raju, District Leprosy Officer, Chamarajanagar.
The interest shown by the researchers in presenting the papers is highly
appreciated. Among the 30 papers presented at the seminar, 15 papers were
finalized by the review committee for publication. We had papers on image
retrieval, medicinal plant recognition, neurological disorders, multiple
images, video processing etc. A special thanks to our authors who have
submitted papers related to the fields of medicine, computing and image
processing all complying with the theme of the seminar.
We would like to express our gratitude to Prof. M D Pushpavathi, Principal
of the college, advisors and members of the organizing committee and staff
of JSS College for their unstinted support and encouragement in bringing
out the proceedings of the seminar. We are thankful to Degula Mudranalaya
for printing these proceedings.
Chamarajanagar Editors
From Chairperson’s desk
I am extremely happy that the Department of Computer Science of our
college has organized this seminar to enlighten teachers and students about
the fascinating link between Medical Data and Computer Science. The
seminar is significant in the context of the thrust being given to higher
education in order to promote research in basic sciences. The objective of
the seminar is to expose young minds to the various facets of the birth and
evolution of the Universe and the great triumphs in the field of Medical data
in relation to computer science. I hope the seminar will rekindle interest in
the fundamental problems of Computer science.
The seminar has been organized with blessings and encouragement from
His Holiness Jagadguru Sri Shivarathri Deshikendra Mahaswamiji,
President, JSS Mahavidyapeetha. I am grateful to Sri B N Betkerur,
Executive Secretary, JSS Mahavidyapeetha and Prof. T D Subbanna,
Director, College Development Section, JSS Mahavidyapeetha for their
enthusiastic support.
I am extremely thankful to distinguished Professors Dr. D S Guru, Dr.
Nagendraswamy, Dr. Manjunath S, and Dr. Vinay for enriching the seminar
with their presentations. I am also thankful to Dr. Raju District Leprosy
Officer, Chamarajanagar, and Technical Committee members for their
valuable guidance and technical support.
The proceedings of the seminar are a testimony to the enthusiasm and interest
shown by the paper presenters and my special thanks to them. I would like
to place on record my sincere thanks to the Chief Editor, Dr. D S Guru,
editors, Dr. Nagendra Swamy, Dr. Manjunath S, Dr. Vinay and Sri. Rajesh K
M for their efforts in bringing out the proceedings.
I am extremely grateful to University Grants Commission and all other
sponsors for providing financial assistance for organizing the seminar.
I would like to acknowledge the persistent and untiring efforts of Sri. Rajesh
K M, Organizing secretary and faculty of Department of Computer Science of
our college in organizing the seminar.
The organization of this seminar is due to the dedicated efforts of the
Advisors and Members of the Seminar committee, teaching and non-
teaching staff and students of the college. I would like to place on record my
sincere thanks to all of them.
Prof. M D Pushpavathi, Principal.
SEMINAR COMMITTEE
Chief Patron
HIS HOLINESS JAGADGURU
SRI SHIVARATHRI DESHIKENDRA MAHASWAMIJI
President, JSS Mahavidyapeetha, Mysore
Advisors
Sri B N BETKERUR Executive Secretary, JSS Mahavidyapeetha, Mysore
Prof. S P MANJUNATH Deputy Secretary-I, JSS Mahavidyapeetha, Mysore
Prof. S SHIVAKUMARASWAMY Deputy Secretary-II, JSS Mahavidyapeetha, Mysore
Prof. T D SUBBANNA Director, CDS, JSS Mahavidyapeetha, Mysore
Sri. B NIRANJAN MURTHY Asst. Director, CDS, JSS Mahavidyapeetha, Mysore
*-*-*-*-*-
Technical Advisory Committee
Chair
Dr. D S Guru Professor, DOS in CS, Manasagangothri, Mysore- 570006
Members
Dr. T Vasudev Prof. & HOD, DOS in CS, MIT, Mysore
Dr. Manjunath Rao L Director-DSSIT, Bangalore
Dr. H S Nagendraswamy Asst. Prof., DOS in CS, UOM, Mysore
Dr. Raghuveer R Asst. Prof., DOS in CS, NIE, Mysore
Dr. Bajanthri Asst. Prof., DOS in CS, GEC, Chamarajanagar
Dr. C N Ravikumar Prof., DOS in CS, SJCE, Mysore
Review Committee
Dr. Manjunath S Asst. Prof. DOS in CS, JSS College, Ooty Road, Mysore
Dr. Vinay Asst. Prof., DOS in CS, JSS College, Ooty Road, Mysore
Dr. Harish Asst. Prof., DOS in CS, SJCE, Mysore
Dr. Nagasundara Asst. Prof., DOS in CS, IT, Mysore
*-*-*-*-*
Organizing Committee
Chairperson
Prof. M D Pushpavathi Principal, JSS College for Women, Chamarajanagar
Organizing Secretary
Sri. Rajesh K .M Head, Dept of Computer Science & BCA
JSS College for Women, Chamarajanagar
Members
Prof. A G Shivakumar Vice- Principal & PRO
Prof. VijayaKumar M V
Head, Dept. Zoology
Prof. Veeranna Head, Dept. Economics
Prof. Shivanna Head, Dept. History
Dr. Shankarappa S Head, Dept. Commerce
Prof. Poornima S Head, Dept. English
Dr. S Prathibha Head, Dept. Botany
Sri. Siddaraju S
Head, Dept. Chemistry
Smt. Arunashree K S Head, Dept. Mathematics
*-*-*-*-*-*
CONTENTS
Preface .... iii
From Chairperson’s Desk .... v
Seminar Committee .... vii
Invited Talks
1. Understanding Medical Data: An Overview, Dr. H S Nagendraswamy .... 1
2. Medical Image Processing, Dr. Vinay .... 12
3. Dimensionality Reduction, Dr. S Manjunath .... 19
Papers Presented
1. An Enhanced Natural Scene Classification Based Image Browsing and Retrieval System, Srinidhi .... 24
2. Kannada Handwritten Word Recognition in Bank Cheque: A Study, Nandeesh P .... 33
3. A Review on Automation of Ayurvedic Plant Recognition, Pradeepkumar N .... 43
4. A Review on Neurological Disorders, Maheswara Prasad S .... 60
5. Disease Identification in Mulberry Leaves: Review, Chaithra D .... 69
6. Recognition of Image Inside Multiple Images, Rajesh K M .... 77
7. Interpretation of Indian Classical Mudras: A Pattern Recognition Approach, Manikanta P .... 85
8. Current Changes in Plagiarism Detection, Nagaraju L J .... 95
9. A Mathematical Overview of Vision Processing, Ashwin Kumar H N .... 104
10. Taxonomy of Multicast Routing Protocols for Mobile Ad-Hoc Networks, Jagadeeshkrishna S .... 112
11. Security Approaches on Progressive Authentication Method Accessing Multiple Information in Mobile Devices, Santhoshkumar B N .... 133
12. Hybrid Vehicles, Abhilash A S and Yashavanth N .... 148
13. Anti-Lock Braking System, Dhanush N S and Sachin Pande M P .... 156
14. Get the PNR Status from Your Railway Ticket, Raghunath .... 165
15. CBIR Based Matching and Retrieval of Drug Pill Images, Chinmaya T M .... 171