souravsaha008_ciitsep2013


Abstract: Contemporary researchers of Bio-informatics have witnessed an exponential growth in the amount of biological information over the years. The increasing volume of DNA sequences has of late created interest among many scientists in computational approaches to DNA sequence analysis. Much of the computer analysis of DNA sequences is directed toward meaningful interpretation of biologically significant patterns. Pattern classification forms one of the most important foundations for extraction of knowledge from the enormous DNA sequence databases. This paper reports a cheap and efficient DNA pattern classifier based on the sparse network of Cellular Automata.

Keywords: Bio-informatics, Cellular Automata, DNA, Pattern Classification, Sequence Analysis

I. INTRODUCTION

Cellular Automata (CA) is a simple model of a spatially extended decentralized system made up of a number of individual components (cells) [1]. The communication between constituent cells is limited to local interaction. The state of each individual cell changes over time depending on the states of its neighbors [2], [3]. The overall structure can be viewed as a parallel processing computer [4].

Cellular Automata is used by Bio-informatics researchers for automated recognition, description, classification and grouping of patterns [5], [6]. For example, CA based models have been reported to recognize genetic disorders in cells responsible for the development of cancer [7], [8]. A CA based pattern classification model generally comprises two basic operations: exploration of CA for supervised classification, and prediction of the class of unknown patterns.

DNA (deoxyribonucleic acid) consists of two long strands, each strand being made of units called phosphates, deoxyribose sugars and nucleotides (adenine [A], guanine [G], cytosine [C], and thymine [T]) linked in series. For ease of understanding, biologists commonly represent DNA molecules simply by their different nucleotides using the symbols {A, G, C, T}. The DNA in each cell provides the full genetic blueprint for that cell.

Manuscript received October 9, 2013.

Tamal Chakrabarti is with the Computer Science and Engineering Department, Institute of Engineering and Management, Y-12, Block-EP, Sector-V, Salt Lake Electronics Complex, Kolkata-700091, West Bengal, India (e-mail: [email protected]).

Sourav Saha is with the Computer Science and Engineering Department, Institute of Engineering and Management, Y-12, Block-EP, Sector-V, Salt Lake Electronics Complex, Kolkata-700091, West Bengal, India (e-mail: [email protected]).

Devadatta Sinha is with the Computer Science and Engineering Department, Calcutta University, 92 Acharya Prafulla Chandra Road, Kolkata-700009, West Bengal, India (e-mail: [email protected]).

The identification of genes in a given DNA sequence is an emerging field of study in Bio-informatics. One popular approach is to develop a predictive computer model from a database of known gene sequences and use the resulting model to predict where genes are likely to be in newly generated sequence information. Discovery of coding regions in DNA sequences can therefore be viewed as a pattern recognition problem. The explosive growth in biological data demands that the most advanced and powerful ideas in machine learning, such as cellular automata, should be brought to bear on such problems [9], [10].

A pattern classification/recognition algorithm using CA usually has two phases [11]: the learning or training phase and the testing phase. In the training phase, the machine/network/algorithm is trained with some benchmark patterns [12]. In the testing phase, new patterns are tested against the trained model [13] built in the previous step.

This paper proposes a unique DNA-classification scheme using Cellular Automata (CA), which is evolved through the Simulated Annealing heuristic technique.

II. RELATED WORKS

Plenty of research works deal with the problem of classification of DNA patterns from genomic sequences. One of the vital tasks in the study of genomes is DNA sequence identification [14]. Recently researchers have attempted various soft-computing techniques for DNA sequence identification. Peterson et al. used the proper orthogonal decomposition (POD) technique to recognize various cancerous patterns in DNA sequences [15]. Since the performance of a classification strategy heavily depends on the selection of a similarity or distance measure, there has been a demand for exploration of various similarity metrics for DNA classification. Priness et al. compared various unsupervised classification techniques on DNA sequences with respect to the Euclidean distance and the Pearson correlation [16]. Kulp et al. proposed a generalized Hidden Markov Model (GHMM) based framework to recognize human genes in DNA [17]. Kumar et al. integrated pattern mining and neural network-based approaches to classify DNA sequences with reduced dimensions using Multi-linear Principal Component Analysis (MPCA) [18]. However, the VLSI-implementation-friendly sparse structure of CA has not yet been extensively utilized as a DNA pattern classifier. The proposed work evolves a CA based classification framework for DNA pattern prediction in linear time. In order to derive a desirable CA for DNA pattern classification, the proposed scheme employs the simulated annealing heuristic with the Fuzzy Levenshtein distance as the similarity-cost measure among DNA patterns. The simple structure of CA renders the proposed model easily implementable on a VLSI chip suitable for embedded applications, which demand high speed.

A Cellular Automata Based DNA Pattern Classifier

Tamal Chakrabarti, Sourav Saha, and Devadatta Sinha

III. CELLULAR AUTOMATA AS DNA PATTERN CLASSIFIER

A Cellular Automaton (CA) consists of a number of cells organized in the form of a lattice [19]. It evolves in discrete space and time, and can be viewed as an autonomous finite state machine (FSM) [20]. Each cell stores a discrete variable $q_i^{t}$ at time $t$ that refers to the current state of the cell. The next state $q_i^{t+1}$ of the cell at time $(t + 1)$ is affected by its current state and the states of its neighbors at time $t$. For example, in the case of a 3-neighborhood CA, the state transition depends on the cell itself and its left and right neighbors, such that:

$$q_i^{t+1} = f_i\left(q_{i-1}^{t},\; q_i^{t},\; q_{i+1}^{t}\right) \qquad (1)$$

where $q_{i-1}^{t}$ and $q_{i+1}^{t}$ are the current states of the left and right neighbors of the $i$th CA cell at time $t$, and $f_i$ is the $i$th state transition function.

Every CA gives rise to a state transition graph consisting of a number of cyclic and acyclic states [21]. The state transition graph of an arbitrary CA is shown in Fig. 1. The set of non-cyclic states of the CA as depicted in Fig. 1 forms inverted trees rooted at the cyclic states. The cyclic states are referred to as attractors [22]. The states of a tree rooted at the cyclic state $\alpha$ form the $\alpha$-basin [23].
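The notions of attractor and basin can be made concrete with a small enumeration. The sketch below is only an illustration: the 3-cell rule in `step` is a hypothetical toy CA chosen for brevity (it is not the CA of Fig. 1), and the helper names are assumptions introduced here.

```python
from itertools import product

def step(state):
    """One synchronous update of a hypothetical 3-cell XOR CA (cells c2, c1, c0)."""
    c2, c1, c0 = state
    return (c2 ^ c1, c2 ^ c1, c0)

def attractor_of(state, step_fn):
    """Follow the trajectory of `state` until a state repeats; return the cycle it enters."""
    path = []
    while state not in path:
        path.append(state)
        state = step_fn(state)
    return frozenset(path[path.index(state):])

def attractor_basins(n_cells, step_fn):
    """Map each attractor (set of cyclic states) to the set of states in its basin."""
    basins = {}
    for state in product((0, 1), repeat=n_cells):
        basins.setdefault(attractor_of(state, step_fn), set()).add(state)
    return basins

basins = attractor_basins(3, step)
for attractor, members in basins.items():
    print(sorted(attractor), "<-", sorted(members))
```

For this toy rule the eight states split into two basins of four states each, one rooted at the all-zero state and one rooted at 001, which is the kind of partition a two-class CA classifier relies on.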

Fig. 1 State Transition Diagram

A CA with multiple basins may be viewed as a natural classifier [24], [25]. It tends to classify a given set of patterns into multiple disjoint state transition graphs (Fig. 1), with each disjoint graph representing a class whose patterns fall in the respective attractor basin.

As an example, let us consider the DNA sequences depicted in Table I. We have encoded the four nucleotides as A = 00, T = 01, G = 10, C = 11. This binary encoding scheme gives rise to the following binary codes for the DNA sequences under consideration.

TABLE I
BINARY ENCODING OF DNA SEQUENCES

Serial Nr.   DNA Sequence   Binary Code (b9 b8 ... b1 b0)
1            AATTC          0000010111
2            ATTTC          0001010111
3            ATTGA          0001011000
4            CATTC          1100010111
5            AATTA          0000010100
6            GCGCT          1011101101
7            GTGCT          1001101101
8            GTGCC          1001101111
9            TTGCT          0101101101
10           GCGCC          1011101111
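The encoding of Table I can be reproduced with a few lines of code. This is a minimal sketch; the names `NUCLEOTIDE_BITS` and `encode_dna` are illustrative assumptions, while the two-bit codes are the ones stated in the text.

```python
# Two-bit code per nucleotide, as stated in the text: A = 00, T = 01, G = 10, C = 11.
NUCLEOTIDE_BITS = {"A": "00", "T": "01", "G": "10", "C": "11"}

def encode_dna(sequence):
    """Return the binary code of a DNA sequence, leftmost nucleotide first."""
    return "".join(NUCLEOTIDE_BITS[base] for base in sequence.upper())

for seq in ("AATTC", "ATTGA", "GCGCC"):
    print(seq, encode_dna(seq))
```

A sequence of five nucleotides therefore yields a 10-bit pattern, with the first nucleotide occupying bits b9 b8 and the last one occupying b1 b0.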

To classify the given set of DNA sequences into two classes, we need to design a CA based classifier for two pattern sets P1 and P2, such that two arbitrary patterns p1 ∈ P1 and p2 ∈ P2 fall into different attractor basins.

Let us use the rules of state transition depicted in Table II. Here ⊕ means the bitwise XOR operation.

TABLE II
STATE TRANSITION RULES

Bit Position   Rule
0              b1 ⊕ b0
1              b2 ⊕ b1
2              b3 ⊕ b2 ⊕ b1
3              b4 ⊕ b3 ⊕ b2
4              b4 ⊕ b3
5              b5
6              b7 ⊕ b5
7              b7
8              b9 ⊕ b8 ⊕ b7
9              b9 ⊕ b8
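To make the rule table concrete, the following sketch applies the Table II rules once to a 10-bit pattern. The dictionary `RULES` and the function `next_state` are names introduced here for illustration; the code assumes the pattern is written as b9 b8 ... b1 b0, as in Table I.

```python
# For each bit position, the positions whose current values are XOR-ed together (Table II).
RULES = {
    0: (1, 0), 1: (2, 1), 2: (3, 2, 1), 3: (4, 3, 2), 4: (4, 3),
    5: (5,), 6: (7, 5), 7: (7,), 8: (9, 8, 7), 9: (9, 8),
}

def next_state(code):
    """Apply the XOR rules once to a bit string written as b9 b8 ... b1 b0."""
    n = len(code)

    def bit(k):
        return int(code[n - 1 - k])   # b_k is counted from the right-hand end

    out = []
    for position in range(n - 1, -1, -1):   # emit b9 first, b0 last
        value = 0
        for k in RULES[position]:
            value ^= bit(k)
        out.append(str(value))
    return "".join(out)

print(next_state("0000010111"))   # one step from the code of AATTC
```

Repeated calls to `next_state` trace the trajectory of a pattern through the state transition graph of this CA.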

Using the given CA rule set, we observe that the sequences given in Table I can be classified into two classes, the 0-attractor basin (an attractor with all zeros) and the non-zero attractor basins, as depicted in Fig. 2.

Fig. 2 Classifying the DNA Patterns into Attractor Basins
(0-Basin: 0000010100, 0001010111, 0001011000, 1100010111, 0000010111; Non 0-Basin: 1011101101, 1001101101, 1001101111, 0101101101, 1011101111)

Using the above CA, the given DNA sequences can be categorized into two sets as shown in Fig. 3.

Fig. 3 Categorization of the DNA Patterns into two Classes

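Any CA rule with only the XOR operation can be emulated by a coefficient-matrix multiplication scheme as illustrated below. The next state of a binary pattern $B = b_{n-1} b_{n-2} \ldots b_1 b_0$ can be derived by multiplying it with the corresponding CA-Coefficient matrix, with all additions carried out modulo 2 (XOR):

$$
\begin{bmatrix} b_0^{t+1} \\ b_1^{t+1} \\ \vdots \\ b_{n-1}^{t+1} \end{bmatrix}
=
\begin{bmatrix}
c_{0,0} & c_{0,1} & \cdots & c_{0,n-1} \\
c_{1,0} & c_{1,1} & \cdots & c_{1,n-1} \\
\vdots & \vdots & \ddots & \vdots \\
c_{n-1,0} & c_{n-1,1} & \cdots & c_{n-1,n-1}
\end{bmatrix}
\begin{bmatrix} b_0^{t} \\ b_1^{t} \\ \vdots \\ b_{n-1}^{t} \end{bmatrix}
$$

In order to build the CA-Coefficient matrix for a CA rule, the following equation is used.

$$
c_{i,j} =
\begin{cases}
1, & \text{if the rule for bit } i \text{ depends on bit } j \\
0, & \text{otherwise}
\end{cases}
\qquad i, j \in \{0, 1, \ldots, n-1\}
$$

From the equation it is clear that the (i, j)th position of the coefficient matrix holds one if and only if the CA rule at the ith bit depends on the jth bit for the XOR operation; otherwise it holds zero. For example, the corresponding CA-Coefficient matrix (C) for the CA rule set shown in Table II can be derived as follows.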

           b9  b8  b7  b6  b5  b4  b3  b2  b1  b0
     b9     1   1   0   0   0   0   0   0   0   0
     b8     1   1   1   0   0   0   0   0   0   0
     b7     0   0   1   0   0   0   0   0   0   0
     b6     0   0   1   0   1   0   0   0   0   0
C =  b5     0   0   0   0   1   0   0   0   0   0
     b4     0   0   0   0   0   1   1   0   0   0
     b3     0   0   0   0   0   1   1   1   0   0
     b2     0   0   0   0   0   0   1   1   1   0
     b1     0   0   0   0   0   0   0   1   1   0
     b0     0   0   0   0   0   0   0   0   1   1
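The construction of the coefficient matrix and the modulo-2 multiplication can be sketched as follows. The function names are assumptions made for this illustration, and the matrix is indexed from b0 upward internally, so a row must be reversed to compare it with the printed layout (b9 leftmost).

```python
# XOR dependencies of each bit position, taken from Table II.
RULES = {
    0: (1, 0), 1: (2, 1), 2: (3, 2, 1), 3: (4, 3, 2), 4: (4, 3),
    5: (5,), 6: (7, 5), 7: (7,), 8: (9, 8, 7), 9: (9, 8),
}

def coefficient_matrix(rules, n):
    """Entry [i][j] is 1 exactly when the rule for bit b_i XORs in bit b_j."""
    return [[1 if j in rules[i] else 0 for j in range(n)] for i in range(n)]

def multiply_mod2(matrix, bits):
    """Matrix-vector product over GF(2): ordinary product followed by mod 2."""
    return [sum(c * b for c, b in zip(row, bits)) % 2 for row in matrix]

def next_state(code, matrix):
    """Advance a bit string written as b9 ... b0 by one CA step."""
    bits = [int(ch) for ch in reversed(code)]   # bits[k] holds b_k
    new_bits = multiply_mod2(matrix, bits)
    return "".join(str(b) for b in reversed(new_bits))

C = coefficient_matrix(RULES, 10)
print(C[8][::-1])                  # row b8 in the printed column order b9 ... b0
print(next_state("0000010111", C))
```

Because every rule uses only XOR, one matrix multiplication per time step is enough to evolve a pattern, which is what makes the linear-algebraic treatment of basins possible.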

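In the case of an XOR-CA rule, the following theorem relates a pattern to its basin.

Theorem 1: If any pair of arbitrary patterns $B_1$ and $B_2$ ever reach the $\alpha$-basin on consecutive applications of the XOR-CA rule, then the pattern $B = B_1 \oplus B_2$ will reach the zero-basin on consecutive applications of the XOR-CA rule.

Proof: Let $B_1^0$ and $B_2^0$ be two arbitrary patterns falling in the $\alpha$-basin, and let $C$ be the CA-Coefficient matrix. Suppose the pattern $B_1^0$ reaches the attractor $\alpha$ after the $k$th consecutive application of the XOR-CA rule. Also, let $B_1^i$ denote the pattern derived after the $i$th consecutive application of the XOR-CA rule on $B_1^0$. The above assumption implies the following equations.

$$C \cdot B_1^0 = B_1^1, \quad C \cdot B_1^1 = B_1^2, \quad \ldots, \quad C \cdot B_1^{i} = B_1^{i+1}, \quad \ldots, \quad C \cdot B_1^{k-1} = B_1^{k} = \alpha$$

$$\alpha = \left(\prod_{i=1}^{k} C\right) \cdot B_1^0 = C^{k} \cdot B_1^0$$

Similarly, if the pattern $B_2^0$ reaches the attractor $\alpha$ after $k'$ consecutive applications of the XOR-CA rule and $k > k'$, then we can state the following equation, since the attractor pattern will not change even after further application of the XOR-CA rule.

$$C^{k} \cdot B_2^0 = C^{k'} \cdot B_2^0 = \alpha$$

Now, $C^{k} \cdot B_1^0 = \alpha = C^{k} \cdot B_2^0$ leads to $C^{k} \cdot (B_1^0 \oplus B_2^0) = 0$. The above equation implies that the pattern $B = B_1^0 \oplus B_2^0$ also reaches the zero-basin.

Hence the proof.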
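The key step of the proof is that an XOR-only rule is linear, so evolving $B_1 \oplus B_2$ gives the same result as evolving the two patterns separately and XOR-ing the outcomes. The sketch below checks this numerically for the rules of Table II; the helper names and the random test patterns are assumptions made for illustration.

```python
import random

# XOR dependencies of each bit position, taken from Table II.
RULES = {
    0: (1, 0), 1: (2, 1), 2: (3, 2, 1), 3: (4, 3, 2), 4: (4, 3),
    5: (5,), 6: (7, 5), 7: (7,), 8: (9, 8, 7), 9: (9, 8),
}

def step(bits):
    """One CA step; bits[k] holds b_k."""
    out = []
    for i in range(len(bits)):
        value = 0
        for j in RULES[i]:
            value ^= bits[j]
        out.append(value)
    return out

def evolve(bits, k):
    """Apply the CA rule k times."""
    for _ in range(k):
        bits = step(bits)
    return bits

def xor(a, b):
    return [x ^ y for x, y in zip(a, b)]

def superposition_holds(x, y, k):
    """Check C^k (x XOR y) == C^k x XOR C^k y for one pair of patterns."""
    return evolve(xor(x, y), k) == xor(evolve(x, k), evolve(y, k))

rng = random.Random(1)
pairs = [([rng.randint(0, 1) for _ in range(10)],
          [rng.randint(0, 1) for _ in range(10)]) for _ in range(50)]
print(all(superposition_holds(x, y, k) for x, y in pairs for k in range(6)))

# Two patterns that meet after one step: only b9 set, and only b8 set.
x = [0] * 9 + [1]
y = [0] * 8 + [1, 0]
print(evolve(x, 1) == evolve(y, 1), evolve(xor(x, y), 1) == [0] * 10)
```

The last line illustrates the theorem on a small case: the two patterns reach a common state after one step, and their XOR reaches the all-zero state after the same number of steps.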

Example 1: In Fig. 1, two patterns B1 = 01000 and B2 = 10000 are in the zero-basin, with B = B1 ⊕ B2 = 11000 also falling in the same zero-basin.

Theorem 1 confirms that the Hamming distance between a pair of patterns falling in the same basin gets reflected in the zero-basin patterns. This result leads to the fact that XOR-CA rules whose zero-basin patterns are close to each other with respect to their Hamming distance can act as effective pattern classifiers. The state-transition characteristics of such CA are desirable for DNA pattern classification, wherein similar patterns will tend to fall in the zero-basin.

Design of Multi-stage Hierarchical Classifier: A two-class XOR-CA classifier is favourable for implementation due to its simplicity but has several limitations due to its linear characteristics. The limitations of a single-stage XOR-CA classifier can be avoided to a certain extent by designing a multi-stage hierarchical classifier. In the multi-stage classification scheme, the single-stage classifier is repeatedly employed at every stage, leading to a hierarchical tree-like structure with each node corresponding to a single-stage CA classifier (Fig. 4).

(Fig. 3: Class A = {AATTC, ATTTC, ATTGA, CATTC, AATTA}; Class B = {GCGCT, GTGCT, GTGCC, TTGCT, GCGCC})

IV. EVOLUTION OF THE CA BY SIMULATED ANNEALING

Simulated annealing is a generalization of a Monte Carlo method for examining the equations of state and frozen states of n-body systems [26]. We employ and appropriately tune Simulated Annealing to arrive at the desired CA with patterns in the zero-basin close to each other.

In Simulated Annealing an initial temperature (Ti) is set. The temperature decreases exponentially during the process [27]. At each temperature point (Tp) some action is taken based on the value of the cost function. The entire process continues till the temperature becomes zero. To evaluate the CA rules as a DNA pattern classifier, we design a heuristic cost function as described below. Let us assume that we are given two distinct classes of DNA sequences, represented by class A = {AATTC, ATTTC, ATTGA, CATTC, AATTA} and class B = {GCGCT, GTGCT, GTGCC, TTGCT, GCGCC}. We initially create a randomly generated CA rule, represented by the Coefficient matrix C. For training the classifier we arbitrarily select NA DNA sequences from class A and NB DNA sequences from class B. Let us assume that out of the (NA + NB) sequences, NAB patterns fall in the zero-basin on applying the CA rules and the rest of the sequences fall in the non-zero basins. Next we emit the consensus sequence CSeq for these NAB DNA sequences using HMMER [28], which is an online DNA sequence analysis tool based on Hidden Markov Models. Let L be the average Levenshtein distance of these NAB DNA sequences as determined with respect to the consensus sequence CSeq. The Levenshtein distance Lev(x, y) between two DNA sequences x and y of lengths m and n respectively is given by

$$
\mathrm{Lev}_{x,y}(m, n) =
\begin{cases}
\max(m, n), & \text{if } \min(m, n) = 0 \\[4pt]
\min
\begin{cases}
\mathrm{Lev}_{x,y}(m-1, n) + 1 \\
\mathrm{Lev}_{x,y}(m, n-1) + 1 \\
\mathrm{Lev}_{x,y}(m-1, n-1) + [x_m \neq y_n]
\end{cases}
& \text{otherwise}
\end{cases}
$$

The Levenshtein distance is an integer, which gives a measure of similarity between two DNA sequences. To compute the Fuzzy Levenshtein distance [29], the percentage similarity between two DNA sequences is computed. To transform the Levenshtein distance into a percentage, the number of edits required is divided by the length of the longer string and the result is subtracted from 1.0. The Fuzzy Levenshtein distance is obtained by multiplying the resulting value by 100. The Fuzzy Levenshtein distances of a set of three DNA sequences in the same basin from their consensus sequence are illustrated in the table below:

TABLE III
COMPUTATION OF FUZZY LEVENSHTEIN DISTANCE

Sequence   Consensus Sequence   Fuzzy Levenshtein distance
CAGAT      CAGTT                0.8
AGGTT      CAGTT                0.2
CAATT      CAGTT                0.6
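The recurrence for the Levenshtein distance and its normalisation by the longer string can be sketched as below. This is an illustrative implementation under the reading that the normalised value is 1.0 minus the edit count divided by the longer length, expressed as a fraction rather than a percentage; the function names are assumptions and this is not the code of [29].

```python
def levenshtein(x, y):
    """Dynamic-programming edit distance between strings x and y."""
    previous = list(range(len(y) + 1))          # distances from the empty prefix of x
    for i, cx in enumerate(x, start=1):
        current = [i]
        for j, cy in enumerate(y, start=1):
            current.append(min(
                previous[j] + 1,                # delete cx
                current[j - 1] + 1,             # insert cy
                previous[j - 1] + (cx != cy),   # substitute, free on a match
            ))
        previous = current
    return previous[-1]

def fuzzy_levenshtein(x, y):
    """1.0 minus (edits / length of the longer string), as a fraction."""
    longest = max(len(x), len(y))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(x, y) / longest

print(levenshtein("CAGAT", "CAGTT"))
print(fuzzy_levenshtein("CAGAT", "CAGTT"))
```

Keeping only two rows of the dynamic-programming table keeps the memory use proportional to the length of one sequence, which matters when many sequences are compared against one consensus.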

The fitness cost of a CA as a solution is then calculated as the average of the Fuzzy Levenshtein distances of each sequence in the alignment to the consensus alignment. For example, the fitness cost of the solution in the previous example is 0.53. The lower the value of L, the better the fitness cost of the CA rule as a classifier.

There are two types of solutions based on cost value: Best Solution (BS) and Current Solution (CS). A New Solution (NS) at the next Tp compares its cost value with that of CS. If NS has a better cost value than CS, then NS becomes CS. The new solution (NS) is also compared with BS and if NS is better, then NS becomes BS. Even if NS is not as good as CS, NS is accepted with a probability. This step is typically done to avoid getting trapped in local minima. The complete algorithm is presented below:

Algorithm SA_EvolveCA

// Input: Pattern Size (n), Pattern Set (S), Initial Temp. (Ti)
// Output: CA Rule.

1   Tp = Ti
2   CS = BS = NULL
3   while (Tp > 0) {
4     if (Tp > 0.5 * Ti) {
5       Randomly generate a CA as guess solution
6     }
7     else {
8       Generate a new solution from CS
9     }
10    Generate state transition table and rule table
11    NS = CA-Rule
12    Δcost = cost-value(NS) − cost-value(CS)
13    if (Δcost < 0) {
14      CS = NS
15      if (cost-value(NS) < cost-value(BS)) {
16        BS = NS
17      }
18    }
19    else
20      accept CS = NS with probability e^(−Δcost / Tp)
21    Reduce Tp exponentially
22  }

The above-mentioned simulated annealing algorithm continues to explore the CA search space with a heuristic approach for obtaining the desired CA rule as long as the temperature remains positive. The temperature (Tp) in simulated annealing is initialized with a large value (line 1) and at every attempt it is reduced (line 21) gradually to get to the termination phase. A CA rule as a candidate solution is randomly generated (line 5) through random synthesis of the CA-Coefficient matrix. In order to obtain a neighbor candidate solution to CS (i.e. the current solution derived so far), a few bits of the CA-Coefficient matrix corresponding to CS are altered (line 8). The probability of accepting a new candidate solution as the current solution depends on the fitness-cost value. It is evident from the algorithm that every new candidate solution has the possibility to become the current solution irrespective of its fitness cost. However, with the temperature approaching zero, i.e. as the algorithm approaches the termination phase, the probability of accepting a less-fit CA rule also diminishes (line 21). During the exploration, the algorithm records the best explored CA rule as BS (line 16).

Fig. 4 Multi-stage hierarchical classifier (root node: S1, S2, S3, S4; second level: {S1, S2} and {S3, S4}; leaves: S1, S2, S3, S4)
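The control flow of the annealing loop can be sketched in runnable form as follows. This is a minimal sketch under several assumptions that are not from the paper: the cost function `toy_cost` is a hypothetical stand-in (it merely counts coefficients outside the 3-neighborhood band) for the Fuzzy-Levenshtein-based cost, the search starts from a random matrix instead of NULL, the cooling factor and the small cut-off temperature `t_min` are arbitrary choices, and the acceptance probability uses the usual exponential form.

```python
import math
import random

def random_matrix(n, rng):
    """Random n x n CA-Coefficient matrix over {0, 1}."""
    return [[rng.randint(0, 1) for _ in range(n)] for _ in range(n)]

def neighbour(matrix, rng, flips=2):
    """Copy the matrix and flip a few randomly chosen coefficients."""
    new = [row[:] for row in matrix]
    n = len(new)
    for _ in range(flips):
        i, j = rng.randrange(n), rng.randrange(n)
        new[i][j] ^= 1
    return new

def toy_cost(matrix):
    """Hypothetical cost: number of ones outside the 3-neighborhood band."""
    return sum(value for i, row in enumerate(matrix)
               for j, value in enumerate(row) if abs(i - j) > 1)

def evolve_ca(n, cost, t_initial=10.0, cooling=0.95, t_min=1e-3, seed=0):
    """Simulated annealing over coefficient matrices; returns the best one seen."""
    rng = random.Random(seed)
    current = best = random_matrix(n, rng)
    temperature = t_initial
    while temperature > t_min:
        if temperature > 0.5 * t_initial:
            candidate = random_matrix(n, rng)       # hot phase: fresh random guess
        else:
            candidate = neighbour(current, rng)     # cool phase: perturb current solution
        delta = cost(candidate) - cost(current)
        # Always accept improvements; accept worse candidates with probability exp(-delta / T).
        if delta < 0 or rng.random() < math.exp(-delta / temperature):
            current = candidate
            if cost(current) < cost(best):
                best = current
        temperature *= cooling                      # exponential cooling
    return best

best = evolve_ca(5, toy_cost)
print(toy_cost(best))
```

Replacing `toy_cost` with a function that evolves the training patterns, collects those reaching the zero-basin and scores them against their consensus sequence would bring the sketch closer to the scheme described above.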

V. RESULT

This section reports experimental observations during evaluation of our proposed CA based DNA classification scheme. To analyse the performance of the proposed CA based DNA pattern classifier, the experiment has been performed on synthetic datasets. All the experiments have been conducted under the following setup.

• Hardware
  o Processor: Intel Core i7-3610QM CPU @ 2.30GHz × 8
  o RAM: 8 GB
  o Disk: 1000 GB
• Software
  o Operating system: openSUSE, kernel version 3.1.0-1.2-desktop
  o OS type: 32-bit
  o Compiler used: javac version 4.6.2 (SUSE Linux)

During the experimentation, emphasis has been put on the behavior of our model in response to varying DNA sequence length as well as the number of DNA trainee patterns (i.e. trainee-size). The given set of DNA sequences has been randomly divided into a trainee set and a testing set. The most desirable CA is evolved through the simulated annealing heuristic algorithm, and the CA is assumed to be the best explored solution which can classify the trainee DNA patterns efficiently with respect to their Fuzzy Levenshtein distances as discussed in the previous section. The testing set is used to measure the class-prediction accuracy of the proposed model built with randomly chosen trainee patterns in comparison with the actual class membership. The overall performance of the proposed scheme is represented in the form of the following graphs plotted with variations of DNA sequence size and number of trainee sequences. Each of the figures presents the classification accuracy of the proposed model for DNA patterns of a given sequence length against varying trainee pattern size. Fig. 5 displays the classification accuracy of the proposed model with DNA sequence length 20, whereas Fig. 6 reports the classification accuracy with DNA sequence length 40 against various trainee pattern sizes. The observation reveals several interesting facts on the behavior of the model. In both cases, the accuracy level has been observed as ranging from about 60 percent to a little over 90 percent, showing linear improvement with the increase in the number of trainee DNA patterns.

Fig. 5 Classification Accuracy vs. Number of Trainee Patterns for a sequence length of 20

Number of Trainee Patterns     20      40      60      80      100
Classification Accuracy        67.19   74.45   80.13   89.03   93.77

[Figure: Classification Accuracy vs. Number of Trainee Patterns]

Number of Trainee Patterns     20      40      60      80      100
Classification Accuracy        62.44   66.33   69.06   78.11   83.04



The behavior of the proposed scheme does not vary drastically with respect to the other sequence lengths, as is evident in Fig. 7 (DNA Sequence Length 60), Fig. 8 (DNA Sequence Length 80), and Fig. 9 (DNA Sequence Length 100). However, it is evident from each graph that as the number of trainee patterns increases the accuracy level also increases almost linearly. One of the interesting observations is that as long as the trainee pattern size remains below 60 percent, the performance of the model does not vary too much with the variation in DNA sequence length. However, while dealing with a number of trainee patterns exceeding 60 percent of the given set, the classification accuracy of the model falls with the increase in DNA sequence length. The outcome also indicates that as the sequence length increases the average performance of the scheme slides down a bit. But the accuracy rate rises sharply with the increase in the number of trainee patterns. It is evident from our observation that the proposed classification scheme has the potential to classify DNA sequences with a reasonable accuracy rate.


VI. CONCLUSION

The identification and classification of genes in new DNA sequence information is not a trivial problem. The researcher is quite often faced with hundreds of gigabytes of data to be analyzed. This difficulty is compounded by the many competing choices for the parameters: in choosing the algorithm, in choosing the similarity metric, in selecting the classification model and finally in selecting a terminating criterion. This paper has presented the idea of a cellular automata based DNA pattern classifier, which is low-cost, high-speed and works with high accuracy. The proposed technique of DNA pattern classification would open up a wide scope of investigative studies with a goal to explore further improvements in this area.

REFERENCES

[1] Stefania Bandini. Guest Editorial - Cellular Automata. Future Generation Computer Systems, 18:v–vi, August 2002.

[2] A. Albicki, S. K. Yap, M. Khare, and S. Pamper. Prospects on Cellular Automata Application to Test Generation. Technical Report EL-88-0, Dept. of Electrical Engg., Univ. of Rochester, 1988.

[3] H. Baltzer, W. P. Braun, and W. Kohler. Cellular Automata Model for Vegetable Dynamics. Ecological Modelling, 107:113–125, 1998.

[4] S. Wolfram. Theory and Application of Cellular Automata. World Scientific, 1986.

[5] P. H. Bardell. Analysis of Cellular Automata used as Pseudo-Random Pattern Generators. In International Test Conference, pages 762–76, 1990.

[6] C. Burks and D. Farmer. Towards Modeling DNA Sequences as Automata. Physica D, 10:157–167, 1984.

[7] J. H. Moore and L. W. Hahn. A Cellular Automata-based Pattern Recognition Approach for Identifying Gene-Gene and Gene-Environment Interactions. American Journal of Human Genetics, 67(52), 2000.

[Figures: Classification Accuracy vs. Number of Trainee Patterns]

Number of Trainee Patterns     20      40      60      80      100
Classification Accuracy        61.78   65.31   70.07   79.99   87.09

Number of Trainee Patterns     20      40      60      80      100
Classification Accuracy        60.03   68.65   71.01   73.76   84.88

Number of Trainee Patterns     20      40      60      80      100
Classification Accuracy        54.97   57.06   61.21   69.33   73.07

[8] J. H. Moore and L. W. Hahn. Multilocus Pattern Recognition using Cellular Automata and Parallel Genetic Algorithms. In Proc. of the Genetic and Evolutionary Computation Conference (GECCO-2001), page 1452, 7-11 July 2001.

[9] A. Albicki and M. Khare. Cellular Automata used for Test Pattern Generation. In Proc. ICCD, pages 56–59, 1987.

[10] A. Albicki and S. K. Yap. Covering a Set of Test Patterns by a Cellular Automata. Research Review, Dept. of Comp. Sc. and Engg., Univ. of Rochester, 1987.

[11] E. R. Banks. Information Processing and Transmission in Cellular Automata. PhD thesis, M. I. T., 1971.

[12] S. C. Benjamin and N. F. Johnson. A Possible Nanometer-scale Computing Device based on an Adding Cellular Automaton. Applied Physics Letters, 1997.

[13] A. M. Barbe. A Cellular Automata Ruled by an Eccentric Conservation Law. Physica D, 45:49–62, 1990.

[14] Jianbo Gao, Yan Qi, Yinhe Cao, and Wen-wen Tung. Protein Coding Sequence Identification by Simultaneously Characterizing the Periodic and Random Features of DNA Sequences. Journal of Biomedicine and Biotechnology, Vol. 2, pp. 139–146, 2005.

[15] D. Peterson and C. H. Lee. A DNA-based pattern recognition technique for cancer detection. Engineering in Medicine and Biology Society, 2004. IEMBS '04. 26th Annual International Conference of the IEEE, vol. 2, pp. 2956–2959, 1-5 Sept. 2004. doi: 10.1109/IEMBS.2004.1403839

[16] Ido Priness, Oded Maimon and Irad Ben-Gal. Evaluation of gene-expression clustering via mutual information distance measure. BMC Bioinformatics 2007, 8:111. doi:10.1186/1471-2105-8-111

[17] David Kulp, David Haussler, Martin G. Reese, and Frank H. Eeckman. A Generalized Hidden Markov Model for the Recognition of Human Genes in DNA. ISMB-96 Proceedings, 1996.

[18] Sathish Kumar S and N. Duraipandian. An Effective Identification of Species from DNA Sequence: A Classification Technique by Integrating DM and ANN. International Journal of Advanced Computer Science and Applications, Vol. 3, No. 8, pp. 104–114, 2012.

[19] A. W. Burks. Essays on Cellular Automata. Technical Report, Univ. of Illinois, Urbana, 1970.

[20] S. Bhattacharjee, J. Bhattacharya, and P. Pal Chaudhuri. An Efficient Data Compression based on Cellular Automata. In Data Compression Conference (DCC95), 1995.

[21] Stephen A. Billings and Yingxu Yang. Identification of Probabilistic Cellular Automata. IEEE Transaction on System, Man and Cybernetics, Part B, pages 1–12, 2002.

[22] M. S. Capcarrere. Cellular Automata and Other Cellular System: Design and Evolution. PhD thesis, Swiss Federal Institute of Technology, Lausanne, 2002.

[23] S. Chakraborty, D. Roy Chowdhury, and P. Pal Chaudhuri. Theory and Application of Non-Group Cellular Automata for Synthesis of Easily Testable Finite State Machines. IEEE Trans. on Computers, 45(7):769–781, July 1996.

[24] S. Chattopadhyay, S. Adhikari, S. Sengupta, and M. Pal. Highly Regular, Modular, and Cascadable Design of Cellular Automata-based Pattern Classifier. IEEE Transaction on VLSI Systems, 8(6):724–735, December 2000.

[25] N. Ganguly, P. Maji, S. Dhar, B. K. Sikdar, and P. Pal Chaudhuri. Evolving Cellular Automata as Pattern Classifier. In Proc. of Fifth International Conference on Cellular Automata for Research and Industry, ACRI 2002, Switzerland, pages 56–68, October 2002.

[26] E. H. L. Aarts and J. Korst. Simulated Annealing and Boltzmann Machines. John Wiley & Sons, Essex, U.K., 1989.

[27] Juan De Vicente, Juan Lanchares, and Román Hermida. Placement by thermodynamic simulated annealing. Physics Letters A, 317(5–6):415–423, 2003.

[28] HMMER 3.1 (February 2013); http://hmmer.org/

[29] Sten Hjelmqvist. Fast, memory efficient Levenshtein algorithm, March 2012. (http://www.codeproject.com/Articles/13525/Fast-memory-efficient-Levenshtein-algorithm)

Prof. Tamal Chakrabarti is currently Assistant Professor, Department of Computer Science and Engineering, Institute of Engineering and Management. He started his career with Wipro Technologies, India as a Software Engineer. Then he joined Flextronics Software Systems, India as a Technical Leader. After that he was associated with IBM India Pvt. Limited, where he was leading a software development team. Subsequently, he worked with Infosys Technologies, India, as a Project Manager. Since 2009, he has been teaching in Institute of Engineering and Management. He did his graduation (B.Sc., Hons.) in Physics from Calcutta University in 1997, and B.Tech. in Computer Science and Engineering from Calcutta University in 2000. In 2006 he received his Master's degree from BITS Pilani, India. He has been presented with numerous awards from professional bodies and academia, including Feather in My Cap Award (twice) by Wipro Technologies, Spot Award by Lucent Technologies, Bravo Award by IBM India Pvt. Ltd. and Award of Excellence for contribution in the International Conference on innovative techno-management solution for social sector, in 2012. He has participated in various projects in India, Belgium and Ireland. IBM India Pvt. Ltd. had honored him with Mentor Award for guiding a project in The Great Mind Challenge 2011. Prof. Chakrabarti is a member of the Computer Society of India (CSI). He has authored numerous papers in journals and conferences. His research interests include Bio-informatics, Programming Languages and Design and Analysis of Algorithms.

Prof. Sourav Saha is currently Assistant Professor, Department of Computer Science and Engineering, Institute of Engineering and Management. He started his career working in the R&D sector at various companies. Since 2011, he has been teaching in Institute of Engineering and Management. He did his graduation (B.Tech) in Computer Science & Engineering from Kalyani University in 2000, and obtained his Master of Engineering (M.E.) degree in Computer Science and Engineering from Bengal Engineering and Science University in 2002. He was awarded the university medal for securing the highest marks in M.E. and also received an award from the Indian National Academy of Engineering for best innovative bachelor level project in 2000. He has numerous international and national publications in reputed journals and conferences to his credit. His research interests include Cellular Automata, Pattern Recognition, Bio-Medical Engineering, Bio-Informatics etc.

Prof. (Dr.) Devadatta Sinha is currently Professor, Department of Computer Science and Engineering of University of Calcutta, India. He joined the department as a Reader in 1989. Prior to this, he worked as Assistant Professor, Department of Computer Engineering, B.I.T. Mesra, Ranchi and as Lecturer and Senior Lecturer (Computer Science) at the Department of Mathematics, Jadavpur University. He obtained his Ph.D. from Jadavpur University in the 1980s and his area of research was Program Testing. He has published more than 5 papers and articles in different national and international journals, proceedings, periodicals and monographs. His areas of interest include Software Engineering, Parallel and Distributed Computing, Bioinformatics and Cryptography. He has guided a number of doctoral and masters theses in Computer Science. He worked as Head of the Department of Computer Science and Engineering, University of Calcutta for two terms of two years each. He worked as Chairman, undergraduate studies in Computer Science, University of Calcutta and is currently the Convener, Ph.D. Committee, Computer Science and Engineering, University of Calcutta. He is associated with a number of academic institutions as a member of their academic bodies. He is involved in a number of national and international conferences in the capacity of Chairman of PC/OC. He served as Chairman, Computer Society of India, Kolkata Chapter and is a Patron of the chapter. He was Sectional President, Section of Computer Science, Indian Science Congress Association in 1993-94.