cpu time 051010

Modeling Bayesian Phylogenetic Inference in Protein Data Analysis by Using Mr. Bayes, Proml, Consensus Applications

Mr. Bayes vs. Proml (maximum likelihood)

[Figure: Series1 and Series2 plotted for datasets 1 to 21; vertical axis 0 to 12000.]

CPU time/Mr. Bayes/Proml

[Figure: Series1, Series2 and Series3; vertical axis 0 to 4500.]

Diff. of Maximum Likelihood (Mr. Bayes – Proml) vs. CPU (sec)

[Figure: maximum likelihood; x-axis: diff (postml - proml), 0 to 800; y-axis: cpu time (sec), 0 to 4500; Series1.]

Diff. of Maximum Likelihood (Mr. Bayes – Proml) vs. CPU (sec)

[Figure: maximum likelihood; x-axis: diff (postml - preml), datasets 1 to 19; y-axis: cpu time (sec), 0 to 4500; Series1, Series2.]

Linear Regression in Testing Datasets

[Figure: linear regression; x-axis 0 to 15000; y-axis 0 to 12000; Series1.]

Testing Datasets Plus One/Two Long Branch’s Datasets

[Figure: mrbayes vs proml (plus AB, CD data); datasets 1 to 34; vertical axis 0 to 16000; Series1, Series2.]

Linear Regression After Bayesian Correction for Testing Datasets & One/Two Long Branch’s Datasets

[Figure: x-axis 0 to 20000; y-axis 0 to 16000; Series1.]

Phylogeny for All Testing Datasets

[Figure: phy all; x-axis: no., 0 to 300; y-axis: length, -0.5 to 3; Series1.]

Phylogeny for All Datasets

[Figure: phylogeny for all datasets; x-axis: no., 0 to 400; y-axis: length, -0.5 to 3; Series1.]

One Long Branch Datasets

[Figure: one long branch; x-axis: no., 0 to 60; y-axis: length, -0.5 to 2.5; Series1.]

Two Long Branches Datasets

[Figure: two long branches; x-axis: no., 0 to 60; y-axis: length, 0 to 1.6; Series1.]

Phylogeny (sequence length from Proml)

[Figure, four panels, each plotting Series1:
phy07: x-axis: species, 0 to 15; y-axis: length, -0.05 to 0.3
AB50J: x-axis: no., 0 to 8; y-axis: length, -0.5 to 2
CD20J: x-axis: no., 0 to 8; y-axis: length, 0 to 1.2
phy06: x-axis: species, 0 to 10; y-axis: length, -0.1 to 0.6]

One/Two Long Branch’s Datasets (Maximum Likelihood)

CD10J   1200   0.74   179   1132   5.10   468
CD50J   1264   8.05   314   1193   6.72   682

[Figure: Series1; axis 11000 to 14000.]

Data Analysis

• Testing datasets: phy01 ~ phy21 (nexus01 ~ nexus21)

• Experimental datasets: one long branch (AB10J ~ AB70aj), two long branches (CD10J ~ CD70aj)

• Operating system: Mac OS X ver. 10.3.9

• Dual 800 MHz PowerPC G4

• 256 MB SDRAM

• Mr. Bayes 3.1.1

• Phylip 3.67 (Proml, Consensus)

Data Analysis (continued)

• Testing sample size: 21 x 2

• Experimental samples: 7 x 2

• Degrees of freedom: 20

• Chi square: 283.1561 > 31.41 (alpha = 0.05)

• Proml and Mr. Bayes are the two dependent variables

• ANOVA: Ssw = 2669051, Ssb = 24253093

• Sstotal = 50943143.71

• Eta square = 0.476081596

• Type I error = 0.05

• Type II error = 1.83%

• Power = 98.17%

• Instrument threshold = 1E-8
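The summary statistics listed here are linked by simple arithmetic, which the short sketch below reproduces. It assumes the usual definitions (eta square as the between-group sum of squares divided by the total sum of squares, power as one minus the type II error); the variable names are illustrative, and the critical value 31.41 is taken from the slide rather than computed. The last line prints Sstotal minus Ssb so that it can be compared with the printed Ssw.

```python
# Values copied from the slide; variable names are illustrative only.
ss_between = 24253093.0       # Ssb
ss_total = 50943143.71        # Sstotal
chi_square = 283.1561         # observed chi-square statistic
chi_square_critical = 31.41   # df = 20, alpha = 0.05, as printed on the slide
type_ii_error = 0.0183        # 1.83 %

# Assumed definitions: eta^2 = SSb / SStotal, power = 1 - beta.
eta_squared = ss_between / ss_total
power = 1.0 - type_ii_error
reject_null = chi_square > chi_square_critical

# Within-group sum of squares implied by the other two values.
ss_within_implied = ss_total - ss_between

print(f"eta squared = {eta_squared:.9f}")
print(f"power = {power:.2%}")
print(f"chi-square exceeds critical value: {reject_null}")
print(f"Sstotal - Ssb = {ss_within_implied:.2f}")
```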

Testing Datasets

Testing samples:

        CPU        character   diff(Mr.Bayes-Proml)
mean    1576.467   296.3333    226.959
sd      878.5355   131.7778    104.6243

Testing datasets in linear regression between Mr. Bayes and Proml:

slope       1.058352
intercept   14.79771
correl      0.999724

y(Mr.Bayes) = 1.058351726 x(Proml) + 14.79771
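As a rough illustration of how a slope, intercept and correlation coefficient like those reported for the testing datasets can be obtained, the sketch below fits an ordinary least-squares line to paired CPU times. The function name `linear_fit` and the sample values are hypothetical; the measurements behind the slide are not available here, and the slide does not say which tool produced its regression.

```python
from math import sqrt


def linear_fit(x, y):
    """Return (slope, intercept, r) for the least-squares line y = slope*x + intercept."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of squares and cross-products about the means.
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must each contain at least two distinct values")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / sqrt(sxx * syy)
    return slope, intercept, r


# Hypothetical CPU times in seconds (NOT the data behind the slide).
proml_cpu = [100.0, 500.0, 1500.0, 4000.0]
mrbayes_cpu = [121.0, 544.0, 1602.0, 4248.0]

slope, intercept, r = linear_fit(proml_cpu, mrbayes_cpu)
print(f"y(Mr. Bayes) = {slope:.6f} x(Proml) + {intercept:.5f}, correl = {r:.6f}")
```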

Experimental samples:

        AB (one long branch)   CD (two long branches)
mean    11802.88               13122.86
sd      179.5717               364.0589

Linear regression between experimental samples:

slope       0.492193    t-test   3.47E-05
intercept   5343.856    f-test   0.109857
correl      0.996717
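The regression values for the experimental samples can be cross-checked against the sample means, because a least-squares line always passes through the point of means. The sketch below does that check. Treating AB as the dependent variable and CD as the independent variable is an assumption; the slide does not state the direction, but this choice reproduces the printed intercept to within rounding.

```python
# Summary statistics as printed on the slide.
mean_ab = 11802.88   # AB (one long branch)
mean_cd = 13122.86   # CD (two long branches)
slope = 0.492193
intercept = 5343.856

# A least-squares line passes through (mean_x, mean_y), so
# intercept = mean_y - slope * mean_x.
# Assumption: y = AB and x = CD (not stated on the slide).
intercept_from_means = mean_ab - slope * mean_cd

print(f"intercept from means: {intercept_from_means:.3f}")
print(f"intercept on slide:   {intercept:.3f}")
```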

Linear Regression for All

After Bayesian modeling:

        AB (one long branch)   CD (two long branches)
mean    12506.4                13903.5
sd      190.05                 385.302

Linear regression for all datasets (including experimental and testing):

slope       1.058352
intercept   14.79764
correl      0.999959

y(Mr. Bayes) = 1.058352 x(Proml) + 14.79764
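The "After Bayesian modeling" figures appear to follow from applying the regression line y = 1.058352 x + 14.79764 to the experimental sample statistics (AB mean 11802.88, sd 179.5717; CD mean 13122.86, sd 364.0589). That reading is an assumption, since the slide does not describe the step, but the sketch below shows that the numbers agree to within rounding. The function name `corrected` is illustrative.

```python
SLOPE = 1.058352
INTERCEPT = 14.79764


def corrected(value):
    """Map a value through the regression line y = SLOPE * x + INTERCEPT."""
    return SLOPE * value + INTERCEPT


# Experimental sample statistics before correction, as printed on the slides.
experimental = {
    "AB (one long branch)": {"mean": 11802.88, "sd": 179.5717},
    "CD (two long branches)": {"mean": 13122.86, "sd": 364.0589},
}

for name, stats in experimental.items():
    new_mean = corrected(stats["mean"])
    # A linear map shifts the mean but only rescales the spread,
    # so the standard deviation is multiplied by the slope alone.
    new_sd = SLOPE * stats["sd"]
    print(f"{name}: mean {new_mean:.1f}, sd {new_sd:.3f}")
```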

Tree Hierarchical Structure: AB10J

AB10J.JTT

        +----------seq.7
        |
  +-----5   +---------seq.4
  |     |   |
  |     +---2         +-------seq.6
  |         |    +----4
  |         +----3    +----------seq.5
  |              |
  |              +-----------seq.2
  |
  1------------------------------------------seq.3
  |
  +--------seq.1

AB10J.consensus

                +--------------------seq.4
                |
         +--1.0-|             +------seq.6
         |      |      +--1.0-|
         |      +--1.0-|      +------seq.5
  +------|             |
  |      |             +-------------seq.2
  |      |
  |      |                    +------seq.1
  |      +----------------1.0-|
  |                           +------seq.3
  |
  +----------------------------------seq.7

Histogram AB10J

[Figure: AB10J; x-axis: no., 0 to 8; y-axis: length, 0 to 0.8; Series1.]

Tree Hierarchical Structure: CD10J

CD10J.JTT

     +----seq.7
     |
  +--5 +----seq.4
  |  | |
  |  +-2    +-------------------seq.6
  |    |  +-4
  |    +--3 +-----seq.5
  |       |
  |       +-----seq.2
  |
  1---seq.3
  |
  +---------------------seq.1

CD10J.consensus

                +--------------------seq.4
                |
         +--1.0-|             +------seq.6
         |      |      +--1.0-|
         |      +--1.0-|      +------seq.5
  +------|             |
  |      |             +-------------seq.2
  |      |
  |      |                    +------seq.1
  |      +----------------1.0-|
  |                           +------seq.3
  |
  +----------------------------------seq.7

Histogram CD10J

[Figure: CD10J; x-axis: no., 0 to 8; y-axis: length, 0 to 0.8; Series1.]

Discussion

• Bayesian modeling can be used to evaluate type I and type II errors, eta square, power, chi square (χ²), ANOVA, the correlation coefficient, linear regression, etc.

• A 2x2 table can be designed to evaluate risk measures such as RD, RR and RO.

• The Proml and Consensus features produce a histogram profile together with the hierarchical tree structure, which makes peak area integration possible.
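The 2x2 table mentioned in the discussion could be evaluated along the lines of the sketch below. It assumes that RD, RR and RO stand for risk difference, relative risk and relative odds (odds ratio); the slides do not expand the abbreviations. The function name, the table layout and the counts are hypothetical and are not taken from the study.

```python
def risk_measures(a, b, c, d):
    """Risk measures for a 2x2 table.

    Assumed layout (hypothetical, not from the slides):
                    event   no event
        exposed       a        b
        unexposed     c        d
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive for these ratios")
    risk_exposed = a / (a + b)
    risk_unexposed = c / (c + d)
    return {
        "RD": risk_exposed - risk_unexposed,   # risk difference
        "RR": risk_exposed / risk_unexposed,   # relative risk
        "RO": (a * d) / (b * c),               # relative odds (odds ratio)
    }


# Hypothetical counts, for illustration only.
result = risk_measures(a=30, b=70, c=10, d=90)
for name, value in result.items():
    print(f"{name} = {value:.3f}")
```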

Questions

• CPU time can be used to count all hydrogen-bond activity through a kinesthetic module in the computer, including the hydrogen-bond configurations of matching base pairs (A-T, A-U, C-G) and/or DNA alignment from the separate genetic codes A, T, U, C, G.

• CPU time could be used to count all triggering of stem cell activity through functional proteins.

• CPU time has already been used in forensic science to count pattern differences from suspect samples in judicial investigations.
