5/22/2018 SouravSaha008_CiiTSep2013
Abstract: Contemporary researchers in Bio-informatics have witnessed an exponential growth in the amount of biological information over the years. The increasing volume of DNA sequences has of late created interest among many scientists in computational approaches to DNA sequence analysis. Much computer analysis of DNA sequences is directed toward meaningful interpretation of biologically significant patterns. Pattern classification forms one of the most important foundations for extraction of knowledge from the enormous DNA sequence databases. This paper reports a cheap and efficient DNA pattern classifier based on the sparse network of Cellular Automata.
Keywords: Bio-informatics, Cellular Automata, DNA, Pattern Classification, Sequence Analysis
I. INTRODUCTION
Cellular Automata (CA) is a simple model of a spatially extended decentralized system made up of a number of individual components (cells) [1]. The communication between constituent cells is limited to local interaction. The state of each individual cell changes over time depending on the states of its neighbors [2], [3]. The overall structure can be viewed as a parallel processing computer [4].
Cellular Automata are used by Bio-informatics researchers for automated recognition, description, classification and grouping of patterns [5], [6]. For example, CA based models have been reported to recognize genetic disorders in cells responsible for the development of cancer [7], [8]. A CA based pattern classification model generally comprises two basic operations: exploration of CA for supervised classification, and prediction of the class of unknown patterns.
DNA (deoxyribonucleic acid) consists of two long strands, each strand being made of units called phosphates, deoxyribose sugars and nucleotides (adenine [A], guanine [G], cytosine [C], and thymine [T]) linked in series. For ease of understanding, biologists commonly represent DNA molecules simply by their different nucleotides using the symbols {A, G, C, T}. The DNA in each cell provides the full genetic blueprint for that cell.
Manuscript received October 9, 2013.
Tamal Chakrabarti is with the Computer Science and Engineering Department, Institute of Engineering and Management, Y-12, Block-EP, Sector-V, Salt Lake Electronics Complex, Kolkata-700091, West Bengal, India (e-mail: [email protected]).
Sourav Saha is with the Computer Science and Engineering Department, Institute of Engineering and Management, Y-12, Block-EP, Sector-V, Salt Lake Electronics Complex, Kolkata-700091, West Bengal, India (e-mail: [email protected]).
Devadatta Sinha is with the Computer Science and Engineering Department, Calcutta University, 92 Acharya Prafulla Chandra Road, Kolkata-700009, West Bengal, India (e-mail: [email protected]).
The identification of genes in a given DNA sequence is an emerging field of study in Bio-informatics. One popular approach is to develop a predictive computer model from a database of known gene sequences and use the resulting model to predict where genes are likely to be in newly generated sequence information. Discovery of coding regions in DNA sequences can therefore be viewed as a pattern recognition problem. The explosive growth in biological data demands that the most advanced and powerful ideas in machine learning, such as cellular automata, should be brought to bear on such problems [9], [10].
A pattern classification/recognition algorithm using CA usually has two phases [11]: the learning or training phase and the testing phase. In the training phase, the machine/network/algorithm is trained with some benchmark patterns [12]. In the testing phase, new patterns are tested against the trained model built in the previous step [13].
This paper proposes a unique DNA classification scheme using Cellular Automata (CA), which is evolved through the Simulated Annealing heuristic technique.
II. RELATED WORKS
Many research works deal with the problem of classification of DNA patterns from genomic sequences. One of the vital tasks in the study of genomes is DNA sequence identification [14]. Recently, researchers have attempted various soft-computing techniques for DNA sequence identification. Peterson et al. used the proper orthogonal decomposition (POD) technique to recognize various cancerous patterns in DNA sequences [15]. Since the performance of a classification strategy heavily depends on the selection of a similarity or distance measure, there has been a demand for exploration of various similarity metrics for DNA classification. Priness et al. compared various unsupervised classification techniques on DNA sequences with respect to the Euclidean distance and the Pearson correlation [16]. Kulp et al. proposed a generalized Hidden Markov Model (GHMM) based framework to recognize human genes in DNA [17]. Kumar et al. integrated pattern mining and neural network-based approaches to classify DNA sequences with reduced dimensions using Multi-linear Principal Component Analysis (MPCA) [18]. However, the VLSI-implementation-friendly sparse structure of CA has not yet been extensively utilized as a DNA pattern classifier. The proposed work evolves a CA based classification framework for DNA pattern prediction in linear time. In order to derive a desirable CA for DNA pattern classification, the proposed scheme employs the simulated annealing heuristic with the Fuzzy Levenshtein distance as the similarity-cost measure among DNA patterns. The simple structure of CA renders the proposed model easily implementable on a VLSI chip suitable for embedded applications, which demand high speed.

A Cellular Automata Based DNA Pattern Classifier
Tamal Chakrabarti, Sourav Saha, and Devadatta Sinha
III. CELLULAR AUTOMATA AS DNA PATTERN CLASSIFIER
A Cellular Automaton (CA) consists of a number of cells organized in the form of a lattice [19]. It evolves in discrete space and time, and can be viewed as an autonomous finite state machine (FSM) [20]. Each cell stores a discrete variable $q_i^t$ at time $t$ that refers to the current state of the cell. The next state of the cell, $q_i^{t+1}$ at time $(t + 1)$, is affected by its current state and the states of its neighbors at time $t$. For example, in the case of a 3-neighborhood CA, the state transition depends on the cell itself and its left and right neighbors, such that:
$$q_i^{t+1} = f_i\left(q_{i-1}^t,\; q_i^t,\; q_{i+1}^t\right) \qquad (1)$$
where $q_{i-1}^t$ and $q_{i+1}^t$ are the current states of the left and right neighbors of the $i$th CA cell at time $t$, and $f_i$ is the $i$th state transition function.
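Equation (1) can be illustrated with a minimal Python sketch of one synchronous update. The boundary handling (missing neighbors treated as 0) and the two example rules `xor_all` and `xor_sides` are assumptions made for this illustration; the paper does not prescribe them at this point.

```python
from typing import Callable, List, Sequence

# A local rule maps (left, self, right) to the next state of one cell.
LocalRule = Callable[[int, int, int], int]


def ca_step(cells: Sequence[int], rules: Sequence[LocalRule]) -> List[int]:
    """Apply one synchronous update of a 3-neighborhood CA.

    Cells outside the lattice are assumed to hold 0 (null boundary).
    """
    n = len(cells)
    nxt = []
    for i in range(n):
        left = cells[i - 1] if i > 0 else 0        # assumed null boundary
        right = cells[i + 1] if i < n - 1 else 0   # assumed null boundary
        nxt.append(rules[i](left, cells[i], right))
    return nxt


# Two hypothetical XOR rules, used only to exercise ca_step.
def xor_all(left: int, own: int, right: int) -> int:
    return left ^ own ^ right


def xor_sides(left: int, own: int, right: int) -> int:
    return left ^ right


state = [0, 1, 1, 0, 1]
rules = [xor_all, xor_sides, xor_all, xor_sides, xor_all]
print(ca_step(state, rules))  # [1, 1, 0, 0, 1]
```

Each cell carries its own rule, which mirrors the per-cell transition function $f_i$ in (1).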
Every CA gives rise to a state transition graph consisting of a number of cyclic and acyclic states [21]. The state transition graph of an arbitrary CA is shown in Fig. 1. The set of non-cyclic states of the CA as depicted in Fig. 1 forms inverted trees rooted at the cyclic states. The cyclic states are referred to as attractors [22]. The states of a tree rooted at the cyclic state $\alpha$ form the $\alpha$-basin [23].
Fig. 1 State Transition Diagram
A CA with multiple basins may be viewed as a natural
classifier [24], [25]. It tends to classify a given set of patterns
into multiple disjoint state transition graphs (Fig. 1) with each
disjoint graph representing a class falling in their respective
attractor basin.
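For a small CA, the attractor basins described above can be enumerated by brute force. The following Python sketch groups all n-bit states by the cycle they eventually enter. The function name `basins` and the 3-bit rule `toy_step` are hypothetical and serve only as an illustration; the rule is not taken from Fig. 1.

```python
from collections import defaultdict
from typing import Callable, Dict, FrozenSet, List


def basins(n_bits: int, step: Callable[[int], int]) -> Dict[FrozenSet[int], List[int]]:
    """Group every n-bit state by the attractor (cycle) it ends up in."""
    attractor_of: Dict[int, FrozenSet[int]] = {}
    for start in range(1 << n_bits):
        path: List[int] = []
        seen: Dict[int, int] = {}
        s = start
        # Walk forward until we revisit a state or hit an already-labelled one.
        while s not in seen and s not in attractor_of:
            seen[s] = len(path)
            path.append(s)
            s = step(s)
        if s in attractor_of:
            cycle = attractor_of[s]
        else:
            cycle = frozenset(path[seen[s]:])  # states on the newly found cycle
        for p in path:
            attractor_of[p] = cycle
    grouped: Dict[FrozenSet[int], List[int]] = defaultdict(list)
    for state, cycle in attractor_of.items():
        grouped[cycle].append(state)
    return dict(grouped)


def toy_step(s: int) -> int:
    """Hypothetical 3-bit rule: b2 stays fixed, b1 becomes 0, b0 takes old b1."""
    return (s & 0b100) | ((s & 0b011) >> 1)


for cycle, members in basins(3, toy_step).items():
    print(sorted(cycle), sorted(members))
```

With this toy rule the eight states split into two basins, one rooted at 000 and one at 100, which is the kind of two-way partition a two-class classifier relies on.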
As an example, let us consider the DNA sequences depicted in Table I. We have encoded the four nucleotides as A = 00, T = 01, G = 10, C = 11. This binary encoding scheme gives rise to the following binary codes for the DNA sequences under consideration.

TABLE I
BINARY ENCODING OF DNA SEQUENCES
Serial Nr.   DNA Sequence   Binary Code ($b_9 b_8 \ldots b_1 b_0$)
1            AATTC          0000010111
2            ATTTC          0001010111
3            ATTGA          0001011000
4            CATTC          1100010111
5            AATTA          0000010100
6            GCGCT          1011101101
7            GTGCT          1001101101
8            GTGCC          1001101111
9            TTGCT          0101101101
10           GCGCC          1011101111
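The encoding of Table I is easy to reproduce. A minimal Python sketch, using the mapping A = 00, T = 01, G = 10, C = 11 stated in the text; the helper names `encode` and `decode` are chosen for this illustration.

```python
# Two-bit code per nucleotide, as given in the text.
CODE = {"A": "00", "T": "01", "G": "10", "C": "11"}
INVERSE = {bits: base for base, bits in CODE.items()}


def encode(seq: str) -> str:
    """Return the binary code of a DNA sequence, leftmost base first."""
    return "".join(CODE[base] for base in seq.upper())


def decode(bits: str) -> str:
    """Invert encode(); expects an even-length bit string."""
    return "".join(INVERSE[bits[i:i + 2]] for i in range(0, len(bits), 2))


print(encode("AATTC"))  # 0000010111
print(encode("GCGCT"))  # 1011101101
```

A sequence of length k therefore becomes a pattern of 2k bits, so the five-base examples map to the 10-bit patterns $b_9 \ldots b_0$.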
To classify the given set of DNA sequences into two classes, we need to design a CA based classifier for two pattern sets $P_1$ and $P_2$, such that two arbitrary patterns $p_1 \in P_1$ and $p_2 \in P_2$ should fall into different attractor basins.
Let us use the rules of state transition as depicted in Table II. Here $\oplus$ means the bitwise XOR operation.

TABLE II
STATE TRANSITION RULES
Bit Position   Rule
0              $b_1 \oplus b_0$
1              $b_2 \oplus b_1$
2              $b_3 \oplus b_2 \oplus b_1$
3              $b_4 \oplus b_3 \oplus b_2$
4              $b_4 \oplus b_3$
5              $b_5$
6              $b_7 \oplus b_5$
7              $b_7$
8              $b_9 \oplus b_8 \oplus b_7$
9              $b_9 \oplus b_8$
Using the given CA rule set, we observe that the sequences given in Table I can be classified into two classes, the 0-attractor basin (an attractor with all zeros) and the non-zero attractor basins, as depicted in Fig. 2.

Fig. 2 Classifying the DNA Patterns into Attractor Basins
0-Basin: 0000010100, 0001010111, 0001011000, 1100010111, 0000010111
Non 0-Basin: 1011101101, 1001101101, 1001101111, 0101101101, 1011101111

Using the above CA, the given DNA sequences can be categorized into two sets as shown in Fig. 3.

Fig. 3 Categorization of the DNA Patterns into two Classes
Class A: AATTC, ATTTC, ATTGA, CATTC, AATTA
Class B: GCGCT, GTGCT, GTGCC, TTGCT, GCGCC
Any CA rule with only the XOR operation can be emulated by a coefficient-matrix multiplication scheme as illustrated below. The next state $B'$ of a binary pattern $B = b_{n-1} b_{n-2} \ldots b_1 b_0$ can be derived by multiplying it with the corresponding CA-Coefficient matrix, with all additions taken modulo 2.
$$B' = C \cdot B = \begin{pmatrix} c_{0,0} & c_{0,1} & \cdots & c_{0,n-1} \\ c_{1,0} & c_{1,1} & \cdots & c_{1,n-1} \\ \vdots & \vdots & \ddots & \vdots \\ c_{n-1,0} & c_{n-1,1} & \cdots & c_{n-1,n-1} \end{pmatrix} \begin{pmatrix} b_0 \\ b_1 \\ \vdots \\ b_{n-1} \end{pmatrix}$$
In order to build the CA-Coefficient matrix for a CA rule, the following equation is used.
$$c_{i,j} = \begin{cases} 1, & \text{if the next state of the } i\text{th bit depends on the } j\text{th bit} \\ 0, & \text{otherwise} \end{cases} \qquad i, j \in \{0, 1, \ldots, n-1\}$$
From the equation it is clear that the (i, j)th position of the coefficient matrix holds one if and only if the CA rule at the ith bit depends on the jth bit for the XOR operation; otherwise it holds zero. For example, the corresponding CA-Coefficient matrix (C) for the CA rule set shown in Table II can be derived as follows.

C =
      b9 b8 b7 b6 b5 b4 b3 b2 b1 b0
b9    1  1  0  0  0  0  0  0  0  0
b8    1  1  1  0  0  0  0  0  0  0
b7    0  0  1  0  0  0  0  0  0  0
b6    0  0  1  0  1  0  0  0  0  0
b5    0  0  0  0  1  0  0  0  0  0
b4    0  0  0  0  0  1  1  0  0  0
b3    0  0  0  0  0  1  1  1  0  0
b2    0  0  0  0  0  0  1  1  1  0
b1    0  0  0  0  0  0  0  1  1  0
b0    0  0  0  0  0  0  0  0  1  1
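The matrix form lends itself to a direct check. The Python sketch below multiplies the coefficient matrix of Table II by a pattern over GF(2). It assumes patterns are written as bit strings ordered b9 ... b0, matching the row and column order of C; the name `next_state` is chosen for this illustration. It computes a single successor state only and does not attempt to reproduce the basin assignment of Fig. 2.

```python
from typing import List

# CA-Coefficient matrix for the rules of Table II.
# Rows and columns are ordered b9, b8, ..., b0.
C: List[List[int]] = [
    [1, 1, 0, 0, 0, 0, 0, 0, 0, 0],  # b9' = b9 ^ b8
    [1, 1, 1, 0, 0, 0, 0, 0, 0, 0],  # b8' = b9 ^ b8 ^ b7
    [0, 0, 1, 0, 0, 0, 0, 0, 0, 0],  # b7' = b7
    [0, 0, 1, 0, 1, 0, 0, 0, 0, 0],  # b6' = b7 ^ b5
    [0, 0, 0, 0, 1, 0, 0, 0, 0, 0],  # b5' = b5
    [0, 0, 0, 0, 0, 1, 1, 0, 0, 0],  # b4' = b4 ^ b3
    [0, 0, 0, 0, 0, 1, 1, 1, 0, 0],  # b3' = b4 ^ b3 ^ b2
    [0, 0, 0, 0, 0, 0, 1, 1, 1, 0],  # b2' = b3 ^ b2 ^ b1
    [0, 0, 0, 0, 0, 0, 0, 1, 1, 0],  # b1' = b2 ^ b1
    [0, 0, 0, 0, 0, 0, 0, 0, 1, 1],  # b0' = b1 ^ b0
]


def next_state(bits: str, matrix: List[List[int]] = C) -> str:
    """Multiply the matrix by the bit vector, adding modulo 2 (XOR)."""
    vec = [int(ch) for ch in bits]
    out = []
    for row in matrix:
        acc = 0
        for coeff, bit in zip(row, vec):
            acc ^= coeff & bit
        out.append(str(acc))
    return "".join(out)


print(next_state("0000010111"))  # successor of AATTC: 0000010000
print(next_state("1011101101"))  # successor of GCGCT: 1010110011
```

Because every rule is a pure XOR of selected bits, each row of C is simply the indicator of the bits that feed the corresponding cell.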
In the case of an XOR-CA rule, the following theorem relates a pattern to its basin.
Theorem 1: If any pair of arbitrary patterns $B_1$ and $B_2$ ever reach the $\alpha$-basin on consecutive applications of an XOR-CA rule, then the pattern $B = B_1 \oplus B_2$ will reach the zero-basin on consecutive applications of the XOR-CA rule.
Proof:
Let $B_1^0$ and $B_2^0$ be two arbitrary patterns falling in the $\alpha$-basin. The pattern $B_1^0$ reaches the attractor $\alpha$ after the $k$th consecutive application of the XOR-CA rule. Also, let $B_1^i$ denote the pattern which is derived after the $i$th consecutive application of the XOR-CA rule on $B_1^0$, and let $C$ be the CA-Coefficient matrix of the rule. The above assumption implies the following equations.
$$B_1^1 = C \cdot B_1^0, \quad B_1^2 = C \cdot B_1^1, \quad \ldots, \quad B_1^{i+1} = C \cdot B_1^i, \quad \ldots, \quad B_1^k = C \cdot B_1^{k-1} = \alpha$$
$$C^k \cdot B_1^0 = \alpha$$
Similarly, if the pattern $B_2^0$ reaches the attractor $\alpha$ after $k'$ consecutive applications of the XOR-CA rule and $k > k'$, then we can state the following equation, since the attractor pattern will not change even after further applications of the XOR-CA rule.
$$C^k \cdot B_2^0 = \alpha$$
Now, $C^k \cdot B_1^0 \oplus C^k \cdot B_2^0 = \alpha \oplus \alpha = 0$ leads to $C^k \cdot (B_1^0 \oplus B_2^0) = 0$. The above equation implies that the pattern $B = B_1^0 \oplus B_2^0$ also reaches the zero-basin.
Hence the proof.
Example 1: In Fig. 1, two patterns $B_1 = 01000$ and $B_2 = 10000$ are in the zero-basin, with $B = B_1 \oplus B_2 = 11000$ also falling in the same zero-basin.
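The step the proof rests on is that an XOR-CA is linear over GF(2), so iterating the rule commutes with XOR. A short Python sketch can check this numerically on an arbitrary rule. The random 8-bit matrix below is an assumption for illustration, not a rule from the paper, and the helper names are hypothetical.

```python
import random
from typing import List

Vector = List[int]
Matrix = List[List[int]]


def apply_rule(matrix: Matrix, state: Vector) -> Vector:
    """One XOR-CA step: matrix-vector product with addition modulo 2."""
    return [sum(c & b for c, b in zip(row, state)) % 2 for row in matrix]


def iterate(matrix: Matrix, state: Vector, k: int) -> Vector:
    """Apply the rule k times in succession."""
    for _ in range(k):
        state = apply_rule(matrix, state)
    return state


def xor(u: Vector, v: Vector) -> Vector:
    return [a ^ b for a, b in zip(u, v)]


def superposition_holds(matrix: Matrix, b1: Vector, b2: Vector, k: int) -> bool:
    """Check that k steps on (b1 XOR b2) equal (k steps on b1) XOR (k steps on b2)."""
    lhs = iterate(matrix, xor(b1, b2), k)
    rhs = xor(iterate(matrix, b1, k), iterate(matrix, b2, k))
    return lhs == rhs


rng = random.Random(0)  # fixed seed so the run is repeatable
n = 8
T = [[rng.randint(0, 1) for _ in range(n)] for _ in range(n)]
b1 = [rng.randint(0, 1) for _ in range(n)]
b2 = [rng.randint(0, 1) for _ in range(n)]
print(all(superposition_holds(T, b1, b2, k) for k in range(12)))  # True
```

In particular, whenever the two right-hand terms coincide after k steps, their XOR is the all-zero pattern, which is the conclusion of Theorem 1.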
Theorem 1 confirms that the Hamming distance between a pair of patterns falling in the same basin gets reflected in the zero-basin patterns. This result leads to the fact that XOR-CA rules with patterns in the zero-basin close to each other with respect to their Hamming distance can act as effective pattern classifiers. The state-transition characteristics of such CA are desirable for DNA-pattern classification, wherein similar patterns will tend to fall in the zero-basin.
Design of Multi-stage Hierarchical Classifier: A two-class XOR-CA classifier is favourable for implementation due to its simplicity, but it has several limitations due to its linear characteristics. The limitations of a single-stage XOR-CA classifier can be avoided to a certain extent by designing a multi-stage hierarchical classifier. In a multi-stage classification scheme, the single-stage classifier is repeatedly employed at every stage, leading to a hierarchical tree-like structure with each node corresponding to a single-stage CA classifier (Fig. 4).
IV. EVOLUTION OF THE CA BY SIMULATED ANNEALING
Simulated annealing is a generalization of a Monte Carlo method for examining the equations of state and frozen states of n-body systems [26]. We employ and appropriately tune Simulated Annealing to arrive at the desired CA with patterns in the zero-basin close to each other.
In Simulated Annealing an initial temperature (Ti) is set. The temperature decreases exponentially during the process [27]. At each temperature point (Tp) some action is taken based on the value of the cost function. The entire process continues until the temperature becomes zero. To evaluate the CA rules as a DNA pattern classifier, we design a heuristic cost function as described below. Let us assume that we are given two distinct classes of DNA sequences, represented by class A = {AATTC, ATTTC, ATTGA, CATTC, AATTA} and class B = {GCGCT, GTGCT, GTGCC, TTGCT, GCGCC}. We initially create a randomly generated CA rule, represented by the coefficient matrix C. For training the classifier we arbitrarily select NA DNA sequences from class A and NB DNA sequences from class B. Let us assume that out of the (NA + NB) sequences, NAB patterns fall in the zero-basin on applying the CA rules and the rest of the sequences fall in the non-zero basins. Next we emit the consensus sequence CSeq for these NAB DNA sequences using HMMER [28], which is an online DNA sequence analysis tool based on Hidden Markov Models. Let L be the average Levenshtein distance of these NAB DNA sequences as determined with respect to the consensus sequence CSeq. The Levenshtein distance Lev(x, y) between two DNA sequences x and y of lengths m and n respectively is given by
$$\mathrm{Lev}_{x,y}(m, n) = \begin{cases} \max(m, n), & \text{if } \min(m, n) = 0 \\ \min \begin{cases} \mathrm{Lev}_{x,y}(m-1, n) + 1 \\ \mathrm{Lev}_{x,y}(m, n-1) + 1 \\ \mathrm{Lev}_{x,y}(m-1, n-1) + [x_m \neq y_n] \end{cases} & \text{otherwise} \end{cases}$$
where $[x_m \neq y_n]$ is 1 if the two characters differ and 0 otherwise. The Levenshtein distance is an integer, which gives a measure of similarity between two DNA sequences. To compute the Fuzzy Levenshtein distance [29], the percentage similarity between two DNA sequences is computed. To transform the Levenshtein distance into a percentage, the number of edits required is divided by the length of the longest string and the result is subtracted from 1.0. The Fuzzy Levenshtein distance is obtained by multiplying the resulting value by 100. The Fuzzy Levenshtein distances of a set of three DNA sequences in the same basin from their consensus sequence are illustrated in the table below:
TABLE III
COMPUTATION OF FUZZY LEVENSHTEIN DISTANCE
Sequence   Consensus Sequence   Fuzzy Levenshtein distance
CAGAT      CAGTT                0.8
AGGTT      CAGTT                0.2
CAATT      CAGTT                0.6
The fitness cost of a CA as a solution is then calculated as the average of the Fuzzy Levenshtein distances of each sequence in the alignment to the consensus alignment. For example, the fitness cost of the solution in the previous example is 0.53. The lower the value of L, the better the fitness cost of the CA rule as a classifier.
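The recurrence for the Levenshtein distance and the averaging step can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the score is returned as a fraction (one minus edits divided by the longer length, without the final scaling by 100), the function names are hypothetical, and the consensus sequence is passed in directly rather than obtained from HMMER. Only the first row of Table III (CAGAT against CAGTT) is checked here; the sketch is not claimed to reproduce every table entry.

```python
from typing import Sequence


def levenshtein(x: str, y: str) -> int:
    """Edit distance by dynamic programming, keeping one row at a time."""
    prev = list(range(len(y) + 1))
    for i, cx in enumerate(x, start=1):
        cur = [i]
        for j, cy in enumerate(y, start=1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (cx != cy)))   # substitution or match
        prev = cur
    return prev[-1]


def fuzzy_levenshtein(x: str, y: str) -> float:
    """Similarity as a fraction: 1 - edits / length of the longer string."""
    longest = max(len(x), len(y))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(x, y) / longest


def fitness_cost(sequences: Sequence[str], consensus: str) -> float:
    """Average fuzzy score of the sequences against the consensus."""
    return sum(fuzzy_levenshtein(s, consensus) for s in sequences) / len(sequences)


print(levenshtein("CAGAT", "CAGTT"))        # 1
print(fuzzy_levenshtein("CAGAT", "CAGTT"))  # 0.8
print(round((0.8 + 0.2 + 0.6) / 3, 2))      # 0.53, the average of the Table III scores
```

Keeping only two rows of the dynamic-programming table keeps memory linear in the sequence length, which matters once sequences grow beyond the short examples used here.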
There are two types of solutions based on cost value: Best Solution (BS) and Current Solution (CS). A New Solution (NS) at the next Tp compares its cost value with CS. If NS has a better cost value than CS, then NS becomes CS. The new solution (NS) is also compared with BS, and if NS is better, then NS becomes BS. Even if NS is not as good as CS, NS is accepted with a probability. This step is typically done to avoid any local minima. The complete algorithm is presented below:
Algorithm SA_EvolveCA
// Input: Pattern Size (n), Pattern Set (S), Initial Temp. (Ti)
// Output: CA Rule.
1   Tp = Ti
2   CS = BS = NULL
3   while (Tp > 0) {
4       if (Tp > 0.5 * Ti) {
5           Randomly generate a CA as guess solution
6       }
7       else {
8           Generate a new solution from CS
9       }
10      Generate state transition table and rule table
11      NS = CA-Rule
12      Δcost = cost-value(NS) - cost-value(CS)
13      if (Δcost < 0) {
14          CS = NS
15          if (cost-value(NS) < cost-value(BS)) {
16              BS = NS
17          }
18      }
19      else
20          accept CS = NS with probability e^(-Δcost / Tp)
21      Reduce Tp exponentially
22  }

Fig. 4 Multi-stage hierarchical classifier (tree levels: {S1, S2, S3, S4}; {S1, S2} and {S3, S4}; S1, S2, S3, S4)

The above mentioned simulated annealing algorithm continues to explore the CA search space with a heuristic approach for obtaining the desired CA rule as long as the temperature remains positive. The temperature (Tp) in simulated annealing is initialized with a large value (line 1) and at every attempt it is reduced (line 21) gradually to get to the termination phase. A CA rule as a candidate solution is randomly generated (line 5) through random synthesis of the CA-Coefficient matrix. In order to obtain a neighbor candidate solution to CS (i.e. the current solution derived so far), a few bits of the CA-Coefficient matrix corresponding to CS are altered (line 8). The probability of accepting a new candidate solution as the current solution depends on the fitness-cost value. It is evident from the algorithm that every new candidate solution has the possibility to become the current solution irrespective of its fitness cost. However, with the temperature approaching zero, i.e. as the algorithm approaches the termination phase, the probability of accepting a less-fit CA rule also diminishes (line 21). During the exploration, the algorithm records the best explored CA rule as BS (line 16).
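The annealing loop of SA_EvolveCA can be sketched in Python as below. Several details are assumptions made so that the sketch runs: the loop stops at a small threshold `t_min` rather than at zero, the temperature is multiplied by a constant `decay`, the initial current solution is a random matrix instead of NULL, the acceptance probability is the usual exp(-Δcost / Tp), and `toy_cost` is a hypothetical stand-in for the Fuzzy Levenshtein based cost, which would need basin computation and a consensus sequence.

```python
import math
import random
from typing import Callable, List, Tuple

Matrix = List[List[int]]


def random_matrix(n: int, rng: random.Random) -> Matrix:
    """Random synthesis of an n x n 0/1 coefficient matrix (line 5)."""
    return [[rng.randint(0, 1) for _ in range(n)] for _ in range(n)]


def neighbour(matrix: Matrix, rng: random.Random, flips: int = 2) -> Matrix:
    """Alter a few bits of the current matrix (line 8)."""
    new = [row[:] for row in matrix]
    n = len(new)
    for _ in range(flips):
        i, j = rng.randrange(n), rng.randrange(n)
        new[i][j] ^= 1
    return new


def sa_evolve_ca(n: int, cost: Callable[[Matrix], float], t_init: float = 1.0,
                 t_min: float = 1e-3, decay: float = 0.95,
                 seed: int = 0) -> Tuple[Matrix, float]:
    rng = random.Random(seed)
    cs = random_matrix(n, rng)           # assumed starting point instead of NULL
    cs_cost = cost(cs)
    bs, bs_cost = cs, cs_cost
    tp = t_init
    while tp > t_min:                    # assumed stopping threshold
        ns = random_matrix(n, rng) if tp > 0.5 * t_init else neighbour(cs, rng)
        ns_cost = cost(ns)
        delta = ns_cost - cs_cost
        # Better solutions are always taken; worse ones with probability exp(-delta/tp).
        if delta < 0 or rng.random() < math.exp(-delta / tp):
            cs, cs_cost = ns, ns_cost
        if ns_cost < bs_cost:            # record the best explored rule (line 16)
            bs, bs_cost = ns, ns_cost
        tp *= decay                      # exponential cooling (line 21)
    return bs, bs_cost


def toy_cost(matrix: Matrix) -> float:
    """Hypothetical cost: number of entries that differ from the identity matrix."""
    n = len(matrix)
    return float(sum(matrix[i][j] != (1 if i == j else 0)
                     for i in range(n) for j in range(n)))


best, best_cost = sa_evolve_ca(4, toy_cost, seed=1)
print(best_cost)
```

Passing the cost function as a parameter keeps the search loop independent of how fitness is measured, so the toy cost can be swapped for a domain-specific one without touching the loop.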
V. RESULT
This section reports experimental observations during evaluation of our proposed CA based DNA classification scheme. To analyse the performance of the proposed CA based DNA-pattern classifier, the experiment has been performed on synthetic datasets. All the experiments have been conducted under the following setup.
Hardware
o Processor: Intel Core i7-3610QM CPU @ 2.30GHz × 8
o RAM: 8 GB
o Disk: 1000 GB
Software
o Operating system: openSUSE, Kernel version 3.1.0-1.2-desktop
o OS type: 32-bit
o Compiler used: javac version 4.6.2 (SUSE Linux)
During the experimentation, emphasis has been put on the behavior of our model in response to varying DNA sequence length as well as the number of DNA trainee patterns (i.e. trainee-size). The given set of DNA sequences has been randomly divided into a trainee-set and a testing-set. The most desirable CA is evolved through the simulated annealing heuristic algorithm, and the CA is assumed to be the best explored solution which can classify the trainee DNA patterns efficiently with respect to their Fuzzy Levenshtein distances, as discussed in the previous section. The testing-set is used to measure the class-prediction accuracy of the proposed model built with randomly chosen trainee patterns in comparison with the actual class membership. The overall performance of the proposed scheme is represented in the form of the following graphs, plotted with variations of DNA sequence size and number of trainee sequences. Each of the figures presents the classification accuracy of the proposed model for DNA patterns of a given sequence length against varying trainee pattern size. Fig. 5 displays the classification accuracy of the proposed model with DNA sequence length 20, whereas Fig. 6 reports the classification accuracy with DNA sequence length 40 against various trainee pattern sizes. The observation reveals several interesting facts on the behavior of the model. In both cases, the accuracy level has been observed as ranging from about 60 percent to above 90 percent, showing linear improvement with the increase in the number of trainee DNA patterns.
Fig. 5 Classification Accuracy vs. Number of Trainee Patterns for a sequence length of 20
Fig. 6 Classification Accuracy vs. Number of Trainee Patterns for a sequence length of 40

Data labels of the two charts, in order of appearance (panel titles: Sequence Length = 20, Sequence Length = 40):
Number of Trainee Patterns    20      40      60      80      100
Classification Accuracy       67.19   74.45   80.13   89.03   93.77
Classification Accuracy       62.44   66.33   69.06   78.11   83.04
The behavior of the proposed scheme does not vary drastically with respect to the other sequence lengths, as is evident in Fig. 7 (DNA Sequence Length 60), Fig. 8 (DNA Sequence Length 80), and Fig. 9 (DNA Sequence Length 100). However, it is evident from each graph that as the number of trainee patterns increases, the accuracy level also increases almost linearly. One of the interesting observations is that as long as the trainee pattern size remains below 60 percent, the performance of the model does not vary too much with the variation in DNA sequence length. However, when the number of trainee patterns exceeds 60 percent of the given set, the classification accuracy of the model falls with the increase in DNA sequence length. The outcome also indicates that as the sequence length increases, the average performance of the scheme slides down a bit. But the accuracy rate rises sharply with the increase in the number of trainee patterns. It is evident from our observation that the proposed classification scheme has the potential to classify DNA sequences with a reasonable accuracy rate.

Fig. 7 Classification Accuracy vs. Number of Trainee Patterns for a sequence length of 60
Fig. 8 Classification Accuracy vs. Number of Trainee Patterns for a sequence length of 80
Fig. 9 Classification Accuracy vs. Number of Trainee Patterns for a sequence length of 100

Data labels of the three charts, in order of appearance (panel titles: Sequence Length = 60, Sequence Length = 80, Sequence Length = 100):
Number of Trainee Patterns    20      40      60      80      100
Classification Accuracy       61.78   65.31   70.07   79.99   87.09
Classification Accuracy       60.03   68.65   71.01   73.76   84.88
Classification Accuracy       54.97   57.06   61.21   69.33   73.07

VI. CONCLUSION
The identification and classification of genes in new DNA sequence information is not a trivial problem. The researcher is quite often faced with hundreds of gigabytes of data to be analyzed. This difficulty is compounded by the many competing choices for the parameters: in choosing the algorithm, in choosing the similarity metric, in selecting the classification model and finally in selecting a terminating criterion. This paper has presented the idea of a cellular automata based DNA pattern classifier, which is low-cost, high-speed and works with high accuracy. The proposed technique of DNA pattern classification would open up a wide scope of investigative studies with a goal to explore further improvements in this area.

REFERENCES
[1] Stefania Bandini. Guest Editorial - Cellular Automata. Future Generation Computer Systems, 18:v-vi, August 2002.
[2] A. Albicki, S. K. Yap, M. Khare, and S. Pamper. Prospects on Cellular Automata Application to Test Generation. Technical Report EL-88-0, Dept. of Electrical Engg., Univ. of Rochester, 1988.
[3] H. Baltzer, W. P. Braun, and W. Kohler. Cellular Automata Model for Vegetable Dynamics. Ecological Modelling, 107:113-125, 1998.
[4] S. Wolfram. Theory and Application of Cellular Automata. World Scientific, 1986.
[5] P. H. Bardell. Analysis of Cellular Automata used as Pseudo-Random Pattern Generators. In International Test Conference, pages 762-76…, 1990.
[6] C. Burks and D. Farmer. Towards Modeling DNA Sequences as Automata. Physica D, 10:157-167, 1984.
[7] J. H. Moore and L. W. Hahn. A Cellular Automata-based Pattern Recognition Approach for Identifying Gene-Gene and Gene-Environment Interactions. American Journal of Human Genetics, 67(52), 2000.
[8] J. H. Moore and L. W. Hahn. Multilocus Pattern Recognition using Cellular Automata and Parallel Genetic Algorithms. In Proc. of the Genetic and Evolutionary Computation Conference (GECCO-2001), page 1452, 7-11 July 2001.
[9] A. Albicki and M. Khare. Cellular Automata used for Test Pattern Generation. In Proc. ICCD, pages 56-59, 1987.
[10] A. Albicki and S. K. Yap. Covering a Set of Test Patterns by a Cellular Automata. Research Review, Dept. of Comp. Sc. and Engg., Univ. of Rochester, 1987.
[11] E. R. Banks. Information Processing and Transmission in Cellular Automata. PhD thesis, M. I. T., 1971.
[12] S. C. Benjamin and N. F. Johnson. A Possible Nanometer-scale Computing Device based on an Adding Cellular Automaton. Applied Physics Letters, 1997.
[13] A. M. Barbe. A Cellular Automata Ruled by an Eccentric Conservation Law. Physica D, 45:49-62, 1990.
[14] Jianbo Gao, Yan Qi, Yinhe Cao, and Wen-wen Tung, "Protein Coding Sequence Identification by Simultaneously Characterizing the Periodic and Random Features of DNA Sequences", Journal of Biomedicine and Biotechnology, Vol. 2, pp. 139-146, 2005.
[15] Peterson, D.; Lee, C.H., "A DNA-based pattern recognition technique for cancer detection," Engineering in Medicine and Biology Society, 2004. IEMBS '04. 26th Annual International Conference of the IEEE, vol. 2, pp. 2956-2959, 1-5 Sept. 2004. doi: 10.1109/IEMBS.2004.1403839
[16] Ido Priness, Oded Maimon and Irad Ben-Gal, Evaluation of gene-expression clustering via mutual information distance measure, BMC Bioinformatics 2007, 8:111. doi:10.1186/1471-2105-8-111
[17] David Kulp, David Haussler, Martin G. Reese, Frank H. Eeckman, A Generalized Hidden Markov Model for the Recognition of Human Genes in DNA, ISMB-96 Proceedings, 1996.
[18] Sathish Kumar S, N. Duraipandian, An Effective Identification of Species from DNA Sequence: A Classification Technique by Integrating DM and ANN, International Journal of Advanced Computer Science and Applications, Vol. 3, No. 8, pp. 104-114, 2012.
[19] A. W. Burks. Essays on Cellular Automata. Technical Report, Univ. of Illinois, Urbana, 1970.
[20] S. Bhattacharjee, J. Bhattacharya, and P. Pal Chaudhuri. An Efficient Data Compression based on Cellular Automata. In Data Compression Conference (DCC95), 1995.
[21] Stephen A Billings and Yingxu Yang. Identification of Probabilistic Cellular Automata. IEEE Transaction on System, Man and Cybernetics, Part B, pages 1-12, 2002.
[22] M. S. Capcarrere. Cellular Automata and Other Cellular System: Design and Evolution. PhD thesis, Swiss Federal Institute of Technology, Lausanne, 2002.
[23] S. Chakraborty, D. Roy Chowdhury, and P. Pal Chaudhuri. Theory and Application of Non-Group Cellular Automata for Synthesis of Easily Testable Finite State Machines. IEEE Trans. on Computers, 45(7):769-781, July 1996.
[24] S. Chattopadhyay, S. Adhikari, S. Sengupta, and M. Pal. Highly Regular, Modular, and Cascadable Design of Cellular Automata-based Pattern Classifier. IEEE Transaction on VLSI Systems, 8(6):724-735, December 2000.
[25] N. Ganguly, P. Maji, S. Dhar, B. K. Sikdar, and P. Pal Chaudhuri. Evolving Cellular Automata as Pattern Classifier. In Proc. of Fifth International Conference on Cellular Automata for Research and Industry, ACRI 2002, Switzerland, pages 56-68, October 2002.
[26] E. H. L. Aarts and J. Korst. Simulated Annealing and Boltzmann Machines. John Wiley & Sons, Essex, U.K., 1989.
[27] De Vicente, Juan; Lanchares, Juan; Hermida, Román (2003). "Placement by thermodynamic simulated annealing". Physics Letters A 317 (5-6): 415-423.
[28] HMMER 3.1 (February 2013); http://hmmer.org/
[29] Hjelmqvist, Sten (March 2012), Fast, memory efficient Levenshtein algorithm (http://www.codeproject.com/Articles/13525/Fast-memory-efficient-Levenshtein-algorithm)
Prof. Tamal Chakrabarti is currently Assistant Professor, Department of Computer Science and Engineering, Institute of Engineering and Management. He started his career with Wipro Technologies, India as a Software Engineer. Then he joined Flextronics Software Systems, India as a Technical Leader. After that he was associated with IBM India Pvt. Limited, where he was leading a software development team. Subsequently, he worked with Infosys Technologies, India, as a Project Manager. Since 2009, he has been teaching in Institute of Engineering and Management. He did his graduation (B.Sc., Hons.) in Physics from Calcutta University in 1997, and B.Tech. in Computer Science and Engineering from Calcutta University in 2000. In 2006 he received his Master's degree from BITS Pilani, India. He has been presented with numerous awards from professional bodies and academia, including Feather in My Cap Award (twice) by Wipro Technologies, Spot Award by Lucent Technologies, Bravo Award by IBM India Pvt. Ltd. and Award of Excellence for contribution in the International Conference on innovative techno-management solution for social sector, in 2012. He has participated in various projects in India, Belgium and Ireland. IBM India Pvt. Ltd. had honored him with Mentor Award for guiding a project in The Great Mind Challenge 2011. Prof. Chakrabarti is a member of the Computer Society of India (CSI). He has authored numerous papers in journals and conferences. His research interests include Bio-informatics, Programming Languages and Design and Analysis of Algorithms.

Prof. Sourav Saha is currently Assistant Professor, Department of Computer Science and Engineering, Institute of Engineering and Management. He started his career working in the R&D sector at various companies. Since 2011, he has been teaching in Institute of Engineering and Management. He did his graduation (B.Tech) in Computer Science & Engineering from Kalyani University in 2000, and obtained his Master of Engineering (M.E.) degree in Computer Science and Engineering from Bengal Engineering and Science University in 2002. He was awarded the university medal for securing the highest mark in M.E. and also received an award from the Indian National Academy of Engineering for best innovative bachelor level project in 2000. He has numerous international and national publications in reputed journals and conferences to his credit throughout his career. His research interests include Cellular Automata, Pattern Recognition, Bio-Medical Engineering, Bio-Informatics etc.

Prof. (Dr.) Devadatta Sinha is currently Professor, Department of Computer Science and Engineering of University of Calcutta, India. He joined the department as a Reader in 1989. Prior to this, he worked as Assistant Professor, Department of Computer Engineering, B.I.T. Mesra Ranchi and as Lecturer and Senior Lecturer (Computer Science) at the Department of Mathematics, Jadavpur University. He obtained his Ph.D. from Jadavpur University in 198 and his area of research was Program Testing. He has published more than 5 papers and articles in different national and international journals, proceedings, periodicals and monographs. His areas of interest include Software Engineering, Parallel and Distributed Computing, Bioinformatics, Cryptography. He has guided a number of doctoral and masters theses in Computer Science. He worked as Head of the Department of Computer Science and Engineering, University of Calcutta for two terms of two years each. He worked as Chairman, undergraduate studies in Computer Science, University of Calcutta and is currently the Convener, Ph.D. Committee in Computer Science and Engineering, University of Calcutta. He is associated with a number of academic institutions as member in their academic bodies. He is involved in a number of national and international conferences in the capacity of Chairman of PC/OC. He served as Chairman, Computer Society of India, Kolkata Chapter and is a Patron of the chapter. He was Sectional President, Section of Computer Science, Indian Science Congress Association in 1993-94.