Nucleic Acid Sequence Analysis and Structure Prediction. Lecturer: Zhang Jun, Department of Cell Biology and Genetics
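Section 1: Data forms of nucleic acid sequences. 1. A string is an ordered arrangement of symbols or characters drawn from the finite set {A, T, G, C}. A sequence is the same concept as a string. Example: s = ATTGCATATG. The length of the string is written |s|; the character at position i of s is written s_i, with 1 ≤ i ≤ |s|. In particular, the string of length 0 is called the empty string and is denoted ε. 2. Substring and subsequence are not the same concept. Substring and superstring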
-
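1. String: an ordered arrangement of symbols or characters from the finite set {A, T, G, C}. Sequence is the same concept.
s = ATTGCATATG; the length of the string is |s|; the character at position i of s is s_i, 1 ≤ i ≤ |s|
The string of length 0 is the empty string, written ε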
-
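2. Substring and subsequence are not the same concept.
s = ATGCGGTA; t = TGCGG: t is a substring of s, and s is a superstring of t
s = ATGCGGTA; t = TGTA: t is a subsequence of s, but not a substring
Interval
s = ATGCGGTACGTATACG; u = CG: u occurs as the interval s[i, i+1] for i = 4, 9 and 15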
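The difference between substring, subsequence and interval can be checked mechanically. The following is a minimal sketch in Python; the function names are illustrative and not part of the lecture, and positions are reported 1-based to match the s[i, i+1] notation.

```python
def is_substring(t, s):
    """True if t occurs in s as one contiguous block."""
    return t in s


def is_subsequence(t, s):
    """True if t can be obtained from s by deleting characters, keeping order."""
    remaining = iter(s)
    # "ch in remaining" consumes the iterator up to the first match,
    # so the characters of t must be found from left to right.
    return all(ch in remaining for ch in t)


def occurrences(u, s):
    """1-based start positions i such that the interval s[i, i+|u|-1] equals u."""
    k = len(u)
    return [i + 1 for i in range(len(s) - k + 1) if s[i:i + k] == u]


if __name__ == "__main__":
    print(is_substring("TGCGG", "ATGCGGTA"))
    print(is_subsequence("TGTA", "ATGCGGTA"))
    print(occurrences("CG", "ATGCGGTACGTATACG"))
```

Every substring is also a subsequence, but not the other way round, which is why TGTA passes only the second test.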
-
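3. Concatenation: the concatenation of two strings u and w, written uw, is u followed by w.
s = ATGCGGTA; t = TGCGG; st = ATGCGGTATGCGG; ts = TGCGGATGCGGTA
s = AT; sss = ATATAT = s^3
Prefix
s = ATGCGGTAGC; prefix(s, 3) = ATG; prefix(s, 0) = ε
If there is a string u such that s = tu, then t is a prefix of s.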
-
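Suffix
s = ATGCGGTAGC
suffix(s, 3) = AGC; suffix(s, 2) = GC; suffix(s, 0) = ε
If there is a string u such that s = ut, then t is a suffix of s.
Killer agent: a special symbol, written κ here, that deletes 1 character from the string it is concatenated with.
Its length is |κ| = -1.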
-
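Example with the killer agent κ, s = ATGCGGTAGC
κs = TGCGGTAGC; sκ = ATGCGGTAG. ATGC κ GGTAG = ? Grouped as (ATGCκ)GGTAG it gives ATGGGTAG; grouped as ATGC(κGGTAG) it gives ATGCGTAG.
stu = (st)u = s(tu) holds for ordinary strings, but the grouping matters once a killer agent is involved.
|st| = |s| + |t| for a concatenation st, which is consistent with |κ| = -1.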
-
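With the killer agent κ (|κ| = -1), intervals, prefixes and suffixes can be written as:
s[i..j] = κ^(i-1) s κ^(|s|-j)
prefix(s, k) = s κ^(|s|-k)
suffix(s, k) = κ^(|s|-k) s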
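The prefix, suffix and interval operations map directly onto string slicing. The sketch below is an illustration, not the lecture's own code: `kill_left` and `kill_right` are hypothetical helper names that model applying the killer agent n times on the left or right of a string.

```python
def kill_left(s, n):
    """Apply the killer agent n times on the left: drop the first n characters."""
    return s[n:]


def kill_right(s, n):
    """Apply the killer agent n times on the right: drop the last n characters."""
    return s[:len(s) - n]


def prefix(s, k):
    """First k characters of s, i.e. s with |s|-k characters killed on the right."""
    return kill_right(s, len(s) - k)


def suffix(s, k):
    """Last k characters of s, i.e. s with |s|-k characters killed on the left."""
    return kill_left(s, len(s) - k)


def interval(s, i, j):
    """s[i..j] with 1-based inclusive positions."""
    return kill_right(kill_left(s, i - 1), len(s) - j)


if __name__ == "__main__":
    s = "ATGCGGTAGC"
    print(prefix(s, 3), suffix(s, 3), suffix(s, 2), repr(suffix(s, 0)))
    print(interval(s, 2, 6))
```

Writing `suffix` through `kill_left` avoids the `s[-0:]` pitfall in Python, where a zero-length suffix would otherwise return the whole string.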
-
Homology: orthologous and paralogous
Similarity
-
Orthologous: a1 in species I and a1 in species II. Paralogous: a1 and a2 in species I.
-
Alignment: s = GACGGATTAG, t = GATCGGAATAG

Alignment 1:
  GACGGATTAG
  GATCGGAATAG

Alignment 2:
  GA-CGGATTAG
  GATCGGAATAG
-
()
The 4-letter DNA alphabet: {A, C, G, T}
The extended IUPAC alphabet
-
IUPAC nucleotide codes

Code  Bases             Meaning
G     G                 Guanine
A     A                 Adenine
T     T                 Thymine
C     C                 Cytosine
R     G or A            Purine
Y     T or C            Pyrimidine
M     A or C            Amino
K     G or T            Keto
S     G or C            Strong interaction (3 H bonds)
W     A or T            Weak interaction (2 H bonds)
H     A or C or T       Not G
B     G or T or C       Not A
V     G or C or A       Not T (not U)
D     G or A or T       Not C
N     G or A or T or C  Any
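One practical use of the IUPAC codes is to turn a degenerate pattern into a regular expression and search a sequence with it. The sketch below assumes DNA input (no U) and uses Python's standard `re` module; `iupac_to_regex` and `find_sites` are illustrative names, and the test sequence is made up for the example.

```python
import re

# Each IUPAC code mapped to the set of bases it stands for (DNA only).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "M": "AC", "K": "GT",
    "S": "CG", "W": "AT", "H": "ACT", "B": "CGT",
    "V": "ACG", "D": "AGT", "N": "ACGT",
}


def iupac_to_regex(pattern):
    """Translate an IUPAC pattern such as NTATN into a regex string."""
    parts = []
    for code in pattern.upper():
        bases = IUPAC[code]
        parts.append(bases if len(bases) == 1 else "[" + bases + "]")
    return "".join(parts)


def find_sites(pattern, dna):
    """Return (1-based position, matched text) for every match, overlaps included."""
    # The lookahead makes each match zero-width, so overlapping sites are found.
    rx = re.compile("(?=(" + iupac_to_regex(pattern) + "))")
    return [(m.start() + 1, m.group(1)) for m in rx.finditer(dna.upper())]


if __name__ == "__main__":
    print(iupac_to_regex("NTATN"))
    print(find_sites("NTATN", "GGTTATGCATATA"))
```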
-
DNA
-
1
2
3
4
54
-
Global alignment
s = ATTGCATATG, t = ATTGATATC
  s = ATTGCATATG
  t = ATTG-ATATC
-
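Scoring: match +1, mismatch -1, gap -2
The similarity of 2 sequences s and t is the highest score over all their alignments: sim(s, t) = max{score_i}
  s = ATTGCATATG
  t = ATTG-ATATC    score = 8 + (-2) + (-1) = 5
  s = ATTGCATATG
  t = ATTGATATC-    score = 4 + (-2) + (-1) × 5 = -3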
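The maximum in sim(s, t) = max{score_i} does not require enumerating every alignment. A standard way to compute it is dynamic programming (the Needleman-Wunsch scheme), sketched below with the scoring used in this example: match +1, mismatch -1, gap -2. The function name and return shape are illustrative choices.

```python
def global_alignment(s, t, match=1, mismatch=-1, gap=-2):
    """Return (sim(s, t), aligned s, aligned t) for a global alignment."""
    n, m = len(s), len(t)
    # score[i][j] = best score aligning s[:i] with t[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if s[i - 1] == t[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right corner to recover one optimal alignment.
    a, b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and s[i - 1] == t[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            a.append(s[i - 1]); b.append(t[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            a.append(s[i - 1]); b.append("-"); i -= 1
        else:
            a.append("-"); b.append(t[j - 1]); j -= 1
    return score[n][m], "".join(reversed(a)), "".join(reversed(b))


if __name__ == "__main__":
    sim, a, b = global_alignment("ATTGCATATG", "ATTGATATC")
    print(sim)
    print(a)
    print(b)
```

Several alignments can share the best score, so the traceback returns one of them; only the score itself is unique.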
-
2. Local alignment
A local alignment of s and t looks for the best-matching pair of substrings of s and t.
s = AATTGCATATG, t = ATTGT
s(2,3,4,5) = t(1,2,3,4) = ATTG
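A common way to find such a best-matching pair of substrings is the Smith-Waterman variant of the dynamic programming scheme, in which a cell score is never allowed to drop below 0. The sketch below assumes the same scoring as the global example (match +1, mismatch -1, gap -2), which the slide does not state for the local case.

```python
def local_alignment(s, t, match=1, mismatch=-1, gap=-2):
    """Return (best score, (start, end) in s, (start, end) in t), 1-based inclusive."""
    n, m = len(s), len(t)
    H = [[0] * (m + 1) for _ in range(n + 1)]
    best, bi, bj = 0, 0, 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if s[i - 1] == t[j - 1] else mismatch
            H[i][j] = max(0,
                          H[i - 1][j - 1] + sub,
                          H[i - 1][j] + gap,
                          H[i][j - 1] + gap)
            if H[i][j] > best:
                best, bi, bj = H[i][j], i, j
    # Trace back from the best cell until the score reaches 0.
    i, j = bi, bj
    while i > 0 and j > 0 and H[i][j] > 0:
        sub = match if s[i - 1] == t[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + sub:
            i -= 1; j -= 1
        elif H[i][j] == H[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    return best, (i + 1, bi), (j + 1, bj)


if __name__ == "__main__":
    print(local_alignment("AATTGCATATG", "ATTGT"))
```

For the slide's example this recovers positions 2 to 5 of s and 1 to 4 of t, the shared block ATTG.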
-
3. Two alignments of s = AATTGCATATG and t = ATTGT that differ in where the gaps fall:
  s = AATTGCATATG      s = AATTGCATATG
  t = -ATTGT-----      t = A-TTG--T---
2
-
The similarity of 2 sequences: sim(s, t) = max{score_i}
Transforming s into t, with s = TTA and t = AGCTT: TTA -> --TTA -> AGTTA -> AGCTA -> AGCTT
-
1 2
Each series of operations has a cost.
The distance between s and t is the minimum cost: dist(s, t) = min{cost_i}
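The minimum in dist(s, t) = min{cost_i} can be computed with the same kind of table as the similarity. The sketch below assumes unit costs (1 per substitution, insertion or deletion, i.e. the Levenshtein distance); the cost values used on the slide did not survive, so both are exposed as parameters.

```python
def edit_distance(s, t, sub_cost=1, indel_cost=1):
    """Minimum total cost of substitutions, insertions and deletions turning s into t."""
    # prev[j] = distance between the current prefix of s and t[:j]
    prev = [j * indel_cost for j in range(len(t) + 1)]
    for i in range(1, len(s) + 1):
        cur = [i * indel_cost]
        for j in range(1, len(t) + 1):
            same = s[i - 1] == t[j - 1]
            cur.append(min(prev[j - 1] + (0 if same else sub_cost),  # match or substitute
                           prev[j] + indel_cost,                      # delete from s
                           cur[j - 1] + indel_cost))                  # insert into s
        prev = cur
    return prev[-1]


if __name__ == "__main__":
    print(edit_distance("TTA", "AGCTT"))
```

With unit costs, the series TTA -> AGTTA -> AGCTA -> AGCTT (2 insertions and 2 substitutions) has cost 4, and no cheaper series exists.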
-
ACCGACAATATGCATA
ATAGGTATAACAGTCA

ACCGACAATATGCATA
ACTGACAATATGGATA
-
RNA
-
1 2
-
1 1
-
108108
-
aHomo sapiensPongo pygmaeusb108 (a) (b)
-
DNA
-
DNA
DNA
Fragment assembly
Fragment
-
Fragments: ATTGGGCA; CGATT; TGGGCAGA
  --ATTGGGCA--
  CGATT-------
  ----TGGGCAGA
Assembled sequence: CGATTGGGCAGA
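The layout above can be reproduced with a simple greedy heuristic: repeatedly merge the two fragments with the longest suffix-prefix overlap. This is only a sketch of one common approach to the shortest common superstring model, not necessarily the method taught in the lecture. It assumes error-free fragments that all come from the same strand and a non-empty input list, and it does not guarantee the shortest result in general.

```python
def overlap(a, b):
    """Length of the longest suffix of a that is also a prefix of b."""
    for k in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(fragments):
    """Merge fragments greedily by largest overlap; returns one superstring."""
    # Remove duplicates and fragments fully contained in another fragment.
    unique = list(dict.fromkeys(fragments))
    frags = [f for f in unique if not any(f != g and f in g for g in unique)]
    while len(frags) > 1:
        best_k, best_i, best_j = -1, 0, 1
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    k = overlap(a, b)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        merged = frags[best_i] + frags[best_j][best_k:]
        frags = [f for idx, f in enumerate(frags) if idx not in (best_i, best_j)]
        frags.append(merged)
    return frags[0]


if __name__ == "__main__":
    print(greedy_assemble(["ATTGGGCA", "CGATT", "TGGGCAGA"]))
```

Here ATTGGGCA and TGGGCAGA are merged first (overlap 6, TGGGCA), then CGATT is attached through the overlap ATT.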
-
Shortest common superstring
Reconstruction
Multicontig
-
DNA DNA
-
DNADNADNADNARNA
Functional sites on DNA: promoter, terminator sequence, splice site
-
DNA
-
Training set and control set
-
Training set and control set
-
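Sn (sensitivity) and Sp (specificity)
Tp (true positives), Tn (true negatives), Fn (false negatives), Fp (false positives)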
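The formulas that combine these four counts did not survive on the slide, so the sketch below uses the standard definitions and labels them as such. Sensitivity is Tp / (Tp + Fn). For specificity two conventions are in use: the statistical one, Tn / (Tn + Fp), and the one common in gene-prediction benchmarks, Tp / (Tp + Fp). Which one the lecture used is not known, so both are shown; the counts in the example are invented.

```python
def sensitivity(tp, fn):
    """Sn: fraction of real sites that were predicted."""
    return tp / (tp + fn)


def specificity_statistical(tn, fp):
    """Sp in the statistical sense: fraction of non-sites correctly rejected."""
    return tn / (tn + fp)


def specificity_prediction(tp, fp):
    """Sp as often used in gene prediction: fraction of predictions that are real."""
    return tp / (tp + fp)


if __name__ == "__main__":
    # Hypothetical counts from a control set, for illustration only.
    tp, tn, fn, fp = 80, 90, 20, 10
    print(sensitivity(tp, fn))
    print(specificity_statistical(tn, fp))
    print(specificity_prediction(tp, fp))
```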
-
Functional site, functional sequence, motif, signal
-
functional region
-
A common consensus : NTATN
-
1
215
-
Code  Bases             Note
G     G
A     A
T     T
C     C
R     G or A
Y     T or C
M     A or C
K     G or T
S     G or C            3 H bonds
W     A or T            2 H bonds
H     A or C or T       Not G
B     G or T or C       Not A
V     G or C or A       Not T (U)
D     G or A or T       Not C
N     G or A or T or C
-
: (1) N(2) (3) 54(4) 2(5) N1
-
Worked example. Input sequences: TTATG, ATATA, TACGC, TTGTC, TCCAC
Candidate TNNNN covers tTATG, tACGC, tTGTC, tCCAC
Candidate TNNNC covers tACGc, tTGTc, tCCAc
Refined to TNSNC over tACGc, tTGTc, tCCAc
Consensus 1: TNSNC
Consensus 2: NTATN, from the remaining sequences TTATG and ATATA
-
DNA
-
B
4n 4n
-
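M(a_j, j) is the weight of base a_j at position j, with a ∈ {A, T, G, C}
-
For a sequence s = a_1 a_2 ... a_n, the weight is W_s = M(a_1, 1) + M(a_2, 2) + ... + M(a_n, n)
Example: S = ATTGCA, W_s = 1 + 6 + 14 - 5 + 8 + 19 = 43. W_s is compared with a threshold T to decide whether S is accepted as a site.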
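Scoring a candidate site with a weight matrix is a sum of one table lookup per position. In the sketch below only the six weights that appear in the worked sum (1, 6, 14, -5, 8, 19 for ATTGCA) come from the slide; every other matrix entry is a placeholder set to 0, and accepting a window when its weight is at least the threshold is an assumption.

```python
# Weight matrix M[base][j] for a site of length 6 (j is 0-based here).
# Only the entries used by ATTGCA are taken from the example; zeros are placeholders.
M = {
    "A": [1, 0, 0, 0, 0, 19],
    "C": [0, 0, 0, 0, 8, 0],
    "G": [0, 0, 0, -5, 0, 0],
    "T": [0, 6, 14, 0, 0, 0],
}


def weight(site, matrix=M):
    """W_s = sum over positions j of M(a_j, j)."""
    return sum(matrix[base][j] for j, base in enumerate(site))


def scan(dna, threshold, matrix=M):
    """Return (1-based position, window) for every window with weight >= threshold."""
    n = len(matrix["A"])
    hits = []
    for i in range(len(dna) - n + 1):
        window = dna[i:i + n]
        if weight(window, matrix) >= threshold:
            hits.append((i + 1, window))
    return hits


if __name__ == "__main__":
    print(weight("ATTGCA"))
    print(scan("CCATTGCACC", threshold=40))
```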
-
MA+ A- 1M 23-6 3SiSi A+ 4Si A-5 4WSi TMSiM16 5WSi TMSiM16 67 7M2
-
MM
-
DNA
-
ORF: open reading frame
-
()
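In a random sequence, a stop codon is expected about once every 21 codons (64/3), because 3 of the 64 codons are stop codons.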
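A simple ORF scan walks each reading frame codon by codon, opens an ORF at ATG and closes it at the first in-frame stop codon. The sketch below covers only the three forward frames and assumes the standard stop codons TAA, TAG and TGA; the `min_codons` cutoff is a hypothetical parameter for filtering out very short ORFs.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna, min_codons=2):
    """Return (start, end, frame) for ORFs on the forward strand.

    start and end are 0-based, end exclusive, so dna[start:end] runs from
    ATG through the stop codon. frame is 0, 1 or 2.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # open a new ORF at the first ATG
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None  # close the ORF and look for the next ATG
    return orfs


if __name__ == "__main__":
    seq = "CCATGAAATAGGG"
    for start, end, frame in find_orfs(seq):
        print(frame, seq[start:end])
```

Because a stop codon turns up by chance roughly every 21 codons, long ORFs are unlikely in random sequence, which is why a length cutoff is a useful first filter.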
-
DNAORFORFORF
-
() 641
DNA36:4:1
DNA
-
DNAORFORFORF
ORF
-
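Let f_abc be the frequency of codon abc. For the sequence a1, b1, c1, a2, b2, c2, ..., a(n+1), b(n+1), the reading frame starting at a1 b1 c1 contains n codons.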
-
n
-
i
3nPiPi
-
()
-
() :
-
Sensitivity (Sn) and specificity (Sp)
-
() EST
-
() 53
-
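Gene structure: exons and introns alternate as e1, i1, ..., i(n-1), en, running from the start codon ATG in exon 1 to a stop codon (UAG) in exon n.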
-
Donor site: gt; acceptor site: ag
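The gt/ag rule gives a quick way to list candidate introns: every region that starts with gt and ends with ag. The sketch below is a naive enumeration for illustration only. Real splice-site prediction uses much more context than these two dinucleotides, and `min_len` is a hypothetical parameter.

```python
def candidate_introns(dna, min_len=4):
    """Return (start, end) pairs, 0-based and end-exclusive, such that
    dna[start:end] begins with the donor gt and ends with the acceptor ag."""
    dna = dna.lower()
    donors = [i for i in range(len(dna) - 1) if dna[i:i + 2] == "gt"]
    acceptors = [j for j in range(len(dna) - 1) if dna[j:j + 2] == "ag"]
    # Keep pairs where the acceptor lies downstream and the region is long enough;
    # min_len >= 4 guarantees the two dinucleotides do not overlap.
    return [(d, a + 2) for d in donors for a in acceptors if a + 2 - d >= min_len]


if __name__ == "__main__":
    seq = "CCGTAAAAGCC"
    for start, end in candidate_introns(seq):
        print(start, end, seq[start:end])
```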
-
gene A
-
Segments: i_0, e_1, i_1, ..., e_n, i_n, with i_j for 0 ≤ j ≤ n and e_l for 1 ≤ l ≤ n; i_0 and i_n are the two end segments.
-
DNA 13
2
3-i0, e1-en, in
-
Source and sink
-
() DNARNA
-
HPESE-mailWSwebCL/EXSC
-
RNA
RNA
-
GCG (Genetics Computer Group)
140
-
Databases used with GCG. Nucleic acid: GenBank, EMBL. Protein: PIR, SWISS-PROT, SP-TrEMBL
-
1. Pairwise comparison: Gap, BestFit, FrameAlign, Compare, DotPlot, GapShow, ProfileGap
-
2. Multiple comparison: PileUp, HmmerAlign, PlotSimilarity, Pretty, PrettyBox, MEME, HmmerBuild, HmmerCalibrate, ProfileMake, ProfileGap, Overlap, NoOverlap, OldDistances
-
3. Database reference searching: LookUp, StringSearch, Names
-
4. Database sequence searching: BLAST, NetBLAST, FastA, Ssearch, TFastA/TfastX/FastX, FrameSearch, MotifSearch, HmmerSearch, ProfileSearch, ProfileSegments, FindPatterns, Motifs, WordSearch, HmmerPfam, Segments
-
5. DNA/RNA secondary structure: Mfold (DNA or RNA), PlotFold (plots Mfold output), StemLoop
-
6. Evolution: PAUPSearch, PAUPDisplay, Distances, Diverge
-
7. Fragment assembly: GelStart, GelEnter, GelMerge, GelAssemble, GelView, GelDisassemble
-
8. Gene finding and pattern recognition: TestCode, CodonPreference, Frames, Repeat, Composition, CodonFrequency, Correspond
-
9. Mapping: Map, MapPlot, MapSort, PeptideMap, PlasmidMap, PeptideSort
-
10. Primer selection: Prime, PrimePair, MeltTemp
-
11. Protein analysis: ProfileScan, CoilScan, HTHScan, SPScan, Isoelectric, PepPlot, PeptideStructure, PlotStructure
-
12. Utilities: Reverse, Shuffle, Corrupt, Sample, DataSet, GCGToBLAST