Signal Processing of DNA and Protein Sequences
Nitesh Kumar Singh
SIGNAL PROCESSING OF PROTEIN SEQUENCES AND DNA
Signal -
A signal is a flow of information. Mathematically, signals are functions of an independent variable, such as time (for example, a speech signal) or position (for example, an image).
Biomedical Signal –
Electrical signals generated in a biological system (human or animal), or originating from a physiological process, due to electrochemical changes accompanied by the conduction of signals. Examples are the EEG and ECG.
Signal Processing Methods –
Analog or Continuous Time Signal Processing
Digital or Discrete Time Signal Processing
Advantages of DSP over ASP (analog signal processing) -
Stable, robust, and accurate.
Flexible and easy to upgrade.
Easily stored.
Easy operation in a short time.
Multiplexing, as done in the Integrated Services Digital Network (ISDN).
DSP In Biomedical Signals -
Processing of biomedical signals from the biological as well as the synthetic biological world. The signals are recorded and then processed digitally. Example: EEG, ECG, etc.
DSP in medical imaging. Example: CT scanners, ultrasound, endoscopes, etc.
Manufacturing healthcare instruments. Example: heart rate meter, Aspect bispectral index (BIS) monitor.
Diagnostic purposes, such as analyzing heartbeat signals to check for abnormalities and, likewise, analyzing protein sequences to study the genomes of living beings.
Biomedical application domain using DSP -
Information gathering : Measurement of phenomena to understand the biological system.
Diagnosis : Detection of the malfunction, abnormality, pathology.
Monitoring : To obtain periodic or continuous information about the biological system.
Therapy and Control : Modify the behavior of the system and ensure the result.
Evaluation : Objective analysis, i.e. proof of performance, quality control, effect of treatment.
Processing of Biomedical Signals -
Transducers
Amplifiers and Filters
Analog to Digital conversion
Filtering to remove artifacts
Detection of events and components
Analysis of events and waves; Feature extraction
Pattern recognition, classification and diagnostic decisions
Computer aided diagnostic therapy
[Figure: block diagram of the processing chain, with blocks labeled "Biomedical signals", "Signal data acquisition", and "Signal processing"]
IN THE GENOMICS WORLD
DNA and proteins are mathematically represented as character strings, in which each character is a letter of an alphabet.
For example, DNA has an alphabet of size 4, with the letters A, T, C and G.
Proteins have an alphabet of size 20, one letter per amino acid.
REVISING SOME BIOLOGICAL FUNDAMENTALS
DNA:
It is made up of many linked smaller components, called nucleotides.
Each nucleotide is one of 4 types, designated by A, G, T, C, and has a 3’ end and a 5’ end.
The 3’ end of one nucleotide is linked to the 5’ end of the next by a strong covalent bond.
A strand is always read in a specific direction, from left to right: 5’ → 3’.
Cont.
DNA occurs as a pair of strands.
The two strands of a pair are complementary to each other.
The nucleotide chains are held together by hydrogen bonds, with
A = T
C ≡ G
The 2 strands in a DNA molecule run opposite to each other (antiparallel).
CENTRAL DOGMA
Each DNA molecule is made up of 2 types of regions: genes and intergenic spaces.
Genes contain the information for proteins.
Each gene is responsible for the production of a protein.
A gene further has 2 kinds of sub-regions: introns and exons.
Genes are first transcribed into single-stranded RNA, or mRNA.
Introns are then removed from the RNA by the process of splicing.
Cont.
After splicing, each mRNA is divided into groups of 3 adjacent bases.
Each group of three bases is called a codon. E.g., AGT, AAC, TGC, TAC, etc.
A codon identifies an amino acid, and the sequence of amino acids defines a protein.
There are 4³ = 64 possible codons, but only 20 amino acids.
Many codons can define a single amino acid (many-to-one).
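The codon arithmetic above can be checked with a few lines of Python. This is a minimal sketch, not part of the original slides: the function name `split_codons` and the choice to drop an incomplete trailing triplet are assumptions made for illustration.

```python
from itertools import product

BASES = "ATCG"  # the 4-letter DNA alphabet used in the slides


def split_codons(seq):
    """Split a base sequence into consecutive, non-overlapping triplets.

    Assumption for this sketch: trailing bases that do not fill a
    complete codon are dropped.
    """
    usable = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, usable, 3)]


# Every possible triplet over a 4-letter alphabet: 4 ** 3 of them.
all_codons = ["".join(p) for p in product(BASES, repeat=3)]

# The example codons from the slide, written as one continuous sequence.
codons = split_codons("AGTAACTGCTAC")
```

Because there are more codons than amino acids, any table from codons to amino acids has to be many-to-one.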
Cont.
The process of converting mRNA to protein is called translation.
Translation is aided by adaptor molecules called transfer RNA, or tRNA.
DNA SEQUENCES AND DSP
Macromolecular biological sequences, corresponding to chains of nucleotides or amino acids, are analyzed by considering them to be strings of characters; for DNA, the characters are “A,” “T,” “C,” and “G.” In DSP of these sequences, the characters are assigned numerical values.
Suppose we assign the number a to the character ‘A’, t to ‘T’, c to ‘C’, and g to ‘G’, where a, t, c and g are complex numbers.
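Cont.
If we take t = a* and g = c* (where * denotes complex conjugation), we can get the complementary DNA sequence, read in its own 5’ to 3’ direction, by:
$$\tilde{x}[n] = x^{*}[N-1-n], \qquad n = 0, 1, \ldots, N-1$$
where x[n] is the numerical sequence of the original strand and N is its length.
We can also obtain numerical sequences for proteins by assigning numerical values to the amino acids.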
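A short sketch of the conjugate-pair idea follows. The specific values in `VALUES` are illustrative assumptions (the slides only require t = a* and g = c*), and the helper names are hypothetical. With such an assignment, conjugating and reversing the numerical sequence gives the same result as mapping the reverse-complement string directly.

```python
# Illustrative complex values chosen so that t = conj(a) and g = conj(c).
# These particular numbers are an assumption, not taken from the slides.
VALUES = {"A": 1 + 1j, "T": 1 - 1j, "C": -1 - 1j, "G": -1 + 1j}

# Watson-Crick pairing, used only to cross-check the numeric result.
PAIR = {"A": "T", "T": "A", "C": "G", "G": "C"}


def to_numeric(seq):
    """Map a DNA string to a list of complex numbers."""
    return [VALUES[base] for base in seq]


def complement_numeric(x):
    """Complementary strand in numeric form: conjugate and reverse."""
    return [value.conjugate() for value in reversed(x)]


def reverse_complement(seq):
    """Complementary strand in character form, read 5' to 3'."""
    return "".join(PAIR[base] for base in reversed(seq))


seq = "ATGCCA"
numeric_route = complement_numeric(to_numeric(seq))
string_route = to_numeric(reverse_complement(seq))
```

Here `numeric_route` and `string_route` are equal, which is the point of choosing conjugate pairs for complementary bases.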
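Indicator Sequence
The indicator sequence of adenine in a DNA sequence is defined as:
$$u_A[n] = \begin{cases} 1, & \text{if the base at position } n \text{ is A} \\ 0, & \text{otherwise} \end{cases}$$
where A denotes adenine and n indexes the positions along the DNA sequence.
Similarly, we can obtain the indicator sequences for the remaining 3 bases.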
Cont.
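The total spectrum of a symbolic sequence is often defined as the sum of the squared moduli of the DFTs of the indicator sequences, that is:
$$S[k] = |U_A[k]|^2 + |U_T[k]|^2 + |U_C[k]|^2 + |U_G[k]|^2$$
where U_X[k] is the N-point DFT of the indicator sequence of base X.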
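The indicator sequences and the total spectrum can be sketched in plain Python as follows. The function names are hypothetical, and a direct O(N²) DFT is used only to keep the example dependency-free; an FFT routine would normally replace it.

```python
import cmath


def indicator(seq, base):
    """Binary indicator sequence: 1 where `base` occurs, 0 elsewhere."""
    return [1 if b == base else 0 for b in seq]


def dft(x):
    """Direct N-point DFT (slow, but needs no external libraries)."""
    n_points = len(x)
    return [
        sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / n_points)
            for n in range(n_points))
        for k in range(n_points)
    ]


def total_spectrum(seq):
    """Sum over the four bases of |DFT of indicator sequence|^2."""
    spectra = [dft(indicator(seq, base)) for base in "ATCG"]
    return [sum(abs(u[k]) ** 2 for u in spectra) for k in range(len(seq))]


# A perfectly period-3 toy sequence of length N = 30.
spectrum = total_spectrum("ATG" * 10)
```

For this toy input the spectrum is concentrated at k = 0 and at k = N/3 = 10 (and its mirror k = 20), which is the period-3 peak that the gene prediction slides rely on.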
Spectral Envelope
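Consider the n × 4 matrix whose columns are the indicator sequences,
$$U = [\,u_A \;\; u_C \;\; u_G \;\; u_T\,]$$
and the vector of real weights,
$$w = [\,a \;\; c \;\; g \;\; t\,]^{T}$$
The sequence z = Uw then corresponds to the mapping
A → a, C → c, G → g, T → t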
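The product z = Uw can be written out directly. This is a small sketch under assumptions: the column order A, C, G, T follows the mapping listed above, and the weight values are arbitrary numbers chosen only to make the result easy to read.

```python
ORDER = "ACGT"  # assumed column order of the indicator matrix


def indicator_matrix(seq):
    """Build the n x 4 matrix with one row per position, one column per base."""
    return [[1.0 if base == column else 0.0 for column in ORDER] for base in seq]


def apply_weights(matrix, weights):
    """Compute z = U w as a plain matrix-vector product."""
    return [sum(u * w for u, w in zip(row, weights)) for row in matrix]


# Arbitrary illustrative weights for (a, c, g, t).
w = [0.0, 1.0, 2.0, 3.0]
z = apply_weights(indicator_matrix("GATTACA"), w)
```

Each row of the matrix has exactly one nonzero entry, so each element of `z` is simply the weight of the base at that position.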
DNA walk
It is a graphical representation of a DNA sequence, termed a “fractal landscape” or “DNA walk”.
In a random walk model, a walker moves either up (u(i) = +1) or down (u(i) = −1) one unit length for each step i of the walk.
In an uncorrelated walk, the direction of each step is independent of the previous steps.
In a correlated random walk, the direction of each step depends on the history (“memory”) of the walker.
Cont.
The DNA walk is defined by the rule that the walker steps up (u(i) = +1) if a pyrimidine occurs at position i along the DNA chain, and steps down (u(i) = −1) if a purine occurs at position i.
This gives a measure of the degree of correlation in the base-pair sequence, which is directly visualized by calculating the “net displacement” of the walker after a given number of steps.
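The walk rule can be sketched as a running sum of the steps u(i). The function name `dna_walk` is hypothetical, and raising an error on characters outside A, C, G, T is a choice made for this example only.

```python
from itertools import accumulate

PYRIMIDINES = {"C", "T"}  # step up
PURINES = {"A", "G"}      # step down


def dna_walk(seq):
    """Return the net displacement of the walker after each step."""
    steps = []
    for base in seq:
        if base in PYRIMIDINES:
            steps.append(+1)
        elif base in PURINES:
            steps.append(-1)
        else:
            raise ValueError(f"unexpected base: {base!r}")
    # Net displacement after step i is the sum of all steps up to i.
    return list(accumulate(steps))


walk = dna_walk("CTAGC")
```

Plotting `walk` against position gives the “fractal landscape” picture; long runs that drift in one direction suggest correlated regions.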
Gene Prediction
Characteristics of protein-coding DNA regions:
Base sequences in the protein-coding regions of DNA molecules have a period-3 component because of the codon structure involved in the translation of base sequences into amino acids.
For eukaryotes (cells with a nucleus), for example, this periodicity has mostly been observed within the exons and not within the introns.
Cont.
Filtering:
The fragment of the DNA sequence is filtered with an IIR antinotch filter.
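The slides do not give the filter coefficients, so the following is only a sketch of one simple filter of this kind: a two-pole resonator with poles at r·exp(±j2π/3), which emphasizes the period-3 component of an indicator sequence and attenuates the rest. The pole radius `r` and the function name are assumptions.

```python
def antinotch_period3(x, r=0.9):
    """Two-pole IIR resonator centered on the frequency 2*pi/3.

    Denominator: 1 - 2*r*cos(2*pi/3)*z^-1 + r^2*z^-2 = 1 + r*z^-1 + r^2*z^-2,
    so the difference equation is y[n] = x[n] - r*y[n-1] - r^2*y[n-2].
    A pole radius r closer to 1 gives a sharper peak.
    """
    y = []
    y1 = y2 = 0.0  # previous two outputs
    for value in x:
        out = value - r * y1 - r * r * y2
        y.append(out)
        y2, y1 = y1, out
    return y


# Indicator sequence with perfect period 3 versus a constant sequence.
periodic_out = antinotch_period3([1.0, 0.0, 0.0] * 100)
flat_out = antinotch_period3([1.0] * 300)
```

After the initial transient, the output for the period-3 input is several times larger than for the constant input, so regions with strong period-3 content stand out in the filtered signal.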
Cont.
DNA Spectrogram:
The appearance of spectrograms provides significant information about signals.
Spectrograms provide local frequency information for all four bases by displaying the resulting three magnitudes as a superposition of the corresponding three primary colors:
red for x, green for y, blue for z
Cont.
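Identification of protein-coding DNA regions:
First, the DFTs of the indicator sequences of the different bases are calculated by the formula
$$U_X[k] = \sum_{n=0}^{N-1} u_X[n]\, e^{-j 2\pi k n / N}, \qquad X \in \{A, T, C, G\}$$
with k = N/3, so that, writing A = U_A[N/3], T = U_T[N/3], C = U_C[N/3] and G = U_G[N/3]:
W = aA + tT + cC + gG.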
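A sliding-window version of this computation can be sketched as below. The weights in `WEIGHTS`, the window length, and the step size are illustrative assumptions, not the optimized values referred to in the slides. The code uses the fact that the weighted sum of the four indicator DFTs at k = N/3 equals a single sum over positions, each position contributing the weight of its own base.

```python
import cmath

# Illustrative complex weights a, t, c, g (an assumption for this sketch).
WEIGHTS = {"A": 1 + 1j, "T": 1 - 1j, "C": -1 - 1j, "G": -1 + 1j}


def w_at_one_third(window, weights=WEIGHTS):
    """W = aA + tT + cC + gG for one window, evaluated at k = N/3.

    At k = N/3 the DFT kernel exp(-j*2*pi*k*n/N) reduces to
    exp(-j*2*pi*n/3). The window length should be a multiple of 3.
    """
    return sum(
        weights[base] * cmath.exp(-2j * cmath.pi * n / 3)
        for n, base in enumerate(window)
    )


def sliding_power(seq, win=30, step=3):
    """|W|^2 for successive windows along the sequence."""
    return [
        abs(w_at_one_third(seq[i:i + win])) ** 2
        for i in range(0, len(seq) - win + 1, step)
    ]


power = sliding_power("ATG" * 10 + "A" * 30)
```

In this toy input the first windows cover a strictly period-3 stretch and give a large |W|², while the final window covers a run of a single base and gives a value near zero.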
Color coding and color map approach
Since the number of primary colors is the same as the number of possible coding reading frames, a color-coding scheme is applied. In this scheme,
the value Θ = 0° is assigned to the color RED
the value Θ = 120° is assigned to the color BLUE
the value Θ = −120° is assigned to the color GREEN
Cont.
In-between values are color-coded in a linear manner, with the three axes labeled R, G, and B corresponding to the primary colors red, green, and blue.
Cont.
In the color map, the intensity is modulated by the squared magnitude multiplied by 700 and clipped to the interval [0, 1].
Disadvantages
The obstacles involved include large amounts of data, the lack of complete a priori knowledge of the genome length, and recognizing nucleotide symbol identity with complete accuracy.
These impediments are typical of ones encountered in standard telecommunications problems.
When Fourier transforms are used, the choice of symbolic-to-numeric mapping may either expose or hide some frequency information.
Furthermore, there might be no biochemical meaning for the ordering and arithmetic structure that result from the symbolic-to-numeric mapping.
Conclusion -
Signal processing-based computational and visual tools are meant to synergistically complement character-string-domain tools that have successfully been used for many years by computer scientists.
The assignment of optimized, complex numerical values to nucleotides and amino acids provides a new computational framework, which may also result in new techniques for the solution of useful problems in bioinformatics, including sequence alignment, macromolecular structure analysis, and phylogeny.
A field of computer science, bioinformatics, has emerged, focusing on the use of computers for efficiently deriving, storing, and analyzing these character strings to help solve problems in molecular biology.
THANK YOU!!