cpu time 051010
CPU Time Applications
Modeling Bayesian Phylogenetic Inference in Protein
Data Analysis by Using Mr. Bayes, Proml, Consensus
Applications
Mr. Bayes vs. Proml (maximum likelihood)
[Chart: horizontal axis 1 to 21; vertical axis 0 to 12000; Series1, Series2]
CPU time/Mr. Bayes/Proml
[Chart: horizontal axis ticks 1, 9, 17; vertical axis 0 to 4500; Series1, Series2, Series3]
Diff. of Maximum Likelihood(Mr. Bayes – Proml)
vs. CPU (sec)
[Chart "maximum likelihood": x-axis diff (postml - proml), 0 to 800; y-axis cpu time (sec), 0 to 4500; Series1]
Diff. of Maximum Likelihood(Mr. Bayes – Proml)
vs. CPU (sec)
[Chart "maximum likelihood": x-axis diff (postml - preml), 1 to 19; y-axis cpu time (sec), 0 to 4500; Series1, Series2]
Linear Regression in Testing Datasets
[Chart "linear regression": horizontal axis 0 to 15000; vertical axis 0 to 12000; Series1]
Testing Datasets Plus One/Two Long Branch’s Datasets
[Chart "mrbayes vs proml (plus AB,CD data)": horizontal axis 1 to 34; vertical axis 0 to 16000; Series1, Series2]
Linear Regression After Bayesian Correction for Testing Datasets & One/Two Long Branch’s Datasets
[Chart: horizontal axis 0 to 20000; vertical axis 0 to 16000; Series1]
Phylogeny for All Testing Datasets
[Chart "phy all": x-axis no., 0 to 300; y-axis length, -0.5 to 3; Series1]
Phylogeny for All Datasets
[Chart "phylogeny for all datasets": x-axis no., 0 to 400; y-axis length, -0.5 to 3; Series1]
One Long Branch Datasets
[Chart "one long branch": x-axis no., 0 to 60; y-axis length, -0.5 to 2.5; Series1]
Two Long Branches Datasets
[Chart "two long branches": x-axis no., 0 to 60; y-axis length, 0 to 1.6; Series1]
Phylogeny (sequence length from Proml)
[Chart "phy07": x-axis species, 0 to 15; y-axis length, -0.05 to 0.3; Series1]
[Chart "AB50J": x-axis no., 0 to 8; y-axis length, -0.5 to 2; Series1]
[Chart "CD20J": x-axis no., 0 to 8; y-axis length, 0 to 1.2; Series1]
[Chart "phy06": x-axis species, 0 to 10; y-axis length, -0.1 to 0.6; Series1]
One/Two Long Branch’s Datasets(Maximum Likelihood)
CD10J: 1200, 0.74, 179, 1132, 5.10, 468
CD50J: 1264, 8.05, 314, 1193, 6.72, 682
[Chart: axis 11000 to 14000; Series1]
Data Analysis
• Testing datasets: phy01 ~ phy21, nexus01 ~ nexus21
• Experimental datasets: one long branch (AB10J ~ AB70aj), two long branches (CD10J ~ CD70aj)
• Operating system: Mac OS X ver. 10.3.9
• Dual 800 MHz PowerPC G4
• 256 MB SDRAM
• Mr. Bayes 3.1.1
• Phylip 3.67 (Proml, Consensus)
Data Analysis (continued)
• Testing sample size: 21x2
• Experimental samples: 7x2
• Degrees of freedom: 20
• Chi-square: 283.1561 > 31.41 (alpha = 0.05)
• Proml and Mr. Bayes are the two dependent variables
• ANOVA: SSw = 2669051, SSb = 24253093
• SStotal = 50943143.71
• Eta squared = 0.476081596
• Type I error = 0.05
• Type II error = 1.83%
• Power = 98.17%
• Instrument threshold = 1E-8
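Several of the figures on this slide follow from one another by simple arithmetic. The following is a minimal Python sketch, assuming that eta squared was taken as SSb / SStotal (which reproduces the reported 0.476) and that power is 1 minus the Type II error. The function and constant names are illustrative and are not part of the original analysis; the chi-square critical value is the one quoted on the slide, not recomputed.

```python
def eta_squared(ss_between: float, ss_total: float) -> float:
    """Share of the total sum of squares attributed to the between-group effect."""
    if ss_total <= 0:
        raise ValueError("ss_total must be positive")
    return ss_between / ss_total


def power_from_type_ii(beta: float) -> float:
    """Statistical power is the complement of the Type II error rate."""
    return 1.0 - beta


# Values copied from the slide.
SS_BETWEEN = 24253093.0
SS_TOTAL = 50943143.71
CHI_SQUARE = 283.1561
CHI_CRITICAL = 31.41  # quoted critical value for df = 20, alpha = 0.05

eta2 = eta_squared(SS_BETWEEN, SS_TOTAL)
power = power_from_type_ii(0.0183)
significant = CHI_SQUARE > CHI_CRITICAL

print(f"eta squared = {eta2:.6f}")
print(f"power       = {power:.4f}")
print(f"chi-square exceeds critical value: {significant}")
```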
Testing Datasets
Testing samples:
        CPU        character   diff (Mr. Bayes - Proml)
mean    1576.467   296.3333    226.959
sd      878.5355   131.7778    104.6243
Linear regression between Mr. Bayes and Proml for the testing datasets:
slope       1.058352
intercept   14.79771
correl      0.999724
y(Mr. Bayes) = 1.058351726 x(Proml) + 14.79771
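The fitted line for the testing datasets can be applied directly to a Proml value. The sketch below assumes that x and y are CPU times in seconds, as the chart labels suggest. The function name and the 1000-second input are hypothetical and chosen only for illustration.

```python
# Coefficients as reported for the testing datasets.
SLOPE = 1.058351726
INTERCEPT = 14.79771


def predict_mrbayes_cpu(proml_cpu_seconds: float) -> float:
    """Estimate the Mr. Bayes value from a Proml value using the reported line."""
    return SLOPE * proml_cpu_seconds + INTERCEPT


# Hypothetical Proml CPU time of 1000 seconds.
example = predict_mrbayes_cpu(1000.0)
print(f"Predicted Mr. Bayes value: {example:.3f}")
```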
Experimental samples:
        AB (one long branch)   CD (two long branches)
mean    11802.88               13122.86
sd      179.5717               364.0589
Linear regression between experimental samples:
slope       0.492193
intercept   5343.856
correl      0.996717
t-test      3.47E-05
f-test      0.109857
Linear Regression for All
After Bayesian modeling:
        AB (one long branch)   CD (two long branches)
mean    12506.4                13903.5
sd      190.05                 385.302
Linear regression for all datasets (including experimental and testing):
slope       1.058352
intercept   14.79764
correl      0.999959
y(Mr. Bayes) = 1.058352 x(Proml) + 14.79764
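The slope, intercept, and correlation values in these regression summaries are the usual outputs of an ordinary least-squares fit. Here is a minimal sketch of that calculation in plain Python. The slides do not say which tool produced the fits, so this is not the original procedure, and the sample data are hypothetical rather than the study's measurements.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept, plus Pearson r."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally long sequences with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must each vary")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    correl = sxy / math.sqrt(sxx * syy)
    return slope, intercept, correl


# Hypothetical CPU times in seconds, for illustration only.
proml = [100.0, 200.0, 300.0, 400.0]
mrbayes = [121.0, 226.0, 334.0, 437.0]

slope, intercept, correl = fit_line(proml, mrbayes)
print(f"slope={slope:.4f} intercept={intercept:.4f} correl={correl:.6f}")
```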
Tree Hierarchical Structure: AB10J
AB10J.JTT
        +----------seq.7
        |
  +-----5   +---------seq.4
  |     |   |
  |     +---2         +-------seq.6
  |         |    +----4
  |         +----3    +----------seq.5
  |              |
  |              +-----------seq.2
  |
  1------------------------------------------seq.3
  |
  +--------seq.1
AB10J.consensus
                +--------------------seq.4
                |
         +--1.0-|             +------seq.6
         |      |      +--1.0-|
         |      +--1.0-|      +------seq.5
  +------|             |
  |      |             +-------------seq.2
  |      |
  |      |                    +------seq.1
  |      +----------------1.0-|
  |                           +------seq.3
  |
  +----------------------------------seq.7
Histogram AB10J
[Chart "AB10J": x-axis no., 0 to 8; y-axis length, 0 to 0.8; Series1]
Tree Hierarchical Structure: CD10J
CD10J.JTT
     +----seq.7
     |
  +--5 +----seq.4
  |  | |
  |  +-2    +-------------------seq.6
  |    |  +-4
  |    +--3 +-----seq.5
  |       |
  |       +-----seq.2
  |
  1---seq.3
  |
  +---------------------seq.1
CD10J.consensus
                +--------------------seq.4
                |
         +--1.0-|             +------seq.6
         |      |      +--1.0-|
         |      +--1.0-|      +------seq.5
  +------|             |
  |      |             +-------------seq.2
  |      |
  |      |                    +------seq.1
  |      +----------------1.0-|
  |                           +------seq.3
  |
  +----------------------------------seq.7
Histogram CD10J
[Chart "CD10J": x-axis no., 0 to 8; y-axis length, 0 to 0.8; Series1]
Discussion
• Bayesian modeling can be used to evaluate Type I and Type II errors, eta squared, power, chi-square (X2), ANOVA, the correlation coefficient, linear regression, etc.
• It is possible to design a 2x2 table in order to evaluate risk measures such as RD, RR, and RO.
• The Proml and Consensus outputs yield a histogram profile together with a hierarchical tree structure, and peak-area integration of the histogram is possible.
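The 2x2 table mentioned above can be sketched as follows. This assumes that RD, RR, and RO stand for risk difference, relative risk, and odds ratio; the slides do not expand these abbreviations, so that reading is an assumption. The cell layout, function name, and counts are all hypothetical.

```python
def risk_measures(a: int, b: int, c: int, d: int) -> dict:
    """Risk measures from a 2x2 table.

    a: exposed with outcome      b: exposed without outcome
    c: unexposed with outcome    d: unexposed without outcome
    """
    if (a + b) == 0 or (c + d) == 0 or c == 0 or b == 0:
        raise ValueError("table has a zero cell or row that makes a ratio undefined")
    risk_exposed = a / (a + b)
    risk_unexposed = c / (c + d)
    return {
        "RD": risk_exposed - risk_unexposed,   # risk difference
        "RR": risk_exposed / risk_unexposed,   # relative risk
        "OR": (a * d) / (b * c),               # odds ratio
    }


# Hypothetical counts, for illustration only.
result = risk_measures(20, 80, 10, 90)
print(result)
```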
Questions
• CPU time can be used to count all hydrogen-bond activity through a kinesthetic module in the computer, including the hydrogen-bond configurations of DNA matches from the pairs A-T, A-U, and C-G, and/or DNA alignment from the separate genetic codes A, T, U, C, G.
• It may be possible to use CPU time to count all triggering by stem cell activity through functional proteins.
• CPU time has already been used in forensic science to count pattern differences in suspect samples in judicial investigations.