Group 6 Bikash Shakya Emma Lang Jorge Diaz

Upload: christiana-jordan

Post on 28-Dec-2015


Page 1: Bikash Shakya Emma Lang Jorge Diaz.  BLASTx entire sequence against 9 plant genomes. RepeatMasker  55.47% repetitive sequences  82.5% retroelements

Group 6

Bikash Shakya, Emma Lang, Jorge Diaz

Sequence 6

• BLASTx of the entire sequence against 9 plant genomes
• RepeatMasker
  ◦ 55.47% repetitive sequences
  ◦ 82.5% retroelements
  ◦ 13.0% DNA transposons
• EMBOSS explorer
  ◦ 74 CpG islands
  ◦ 54 inverted repeats
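The slides report the CpG island count from EMBOSS explorer but not the settings used. As a rough illustration of the statistics such tools typically evaluate per window, here is a minimal Python sketch. The function name `cpg_stats` is hypothetical, and the thresholds mentioned in the comments (GC fraction above 0.5, observed/expected ratio above 0.6) are the commonly cited rule of thumb, assumed here rather than taken from the group's analysis.

```python
def cpg_stats(window):
    """Return (GC fraction, observed/expected CpG ratio) for one sequence window.

    Illustrative only: a window is often called CpG-island-like when the
    GC fraction exceeds 0.5 and the observed/expected ratio exceeds 0.6.
    """
    seq = window.upper()
    n = len(seq)
    if n == 0:
        return 0.0, 0.0
    c = seq.count("C")
    g = seq.count("G")
    cpg = seq.count("CG")  # "CG" cannot overlap itself, so this count is exact
    gc_fraction = (c + g) / n
    expected = (c * g) / n  # CpG count expected if C and G were placed independently
    obs_exp = cpg / expected if expected else 0.0
    return gc_fraction, obs_exp
```

A full scan would slide this window along the sequence and merge adjacent windows that pass both thresholds.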

GENE PREDICTION

• Masked sequence
  ◦ GeneMark: 12 genes
  ◦ FGENESH: 10 genes
• Unmasked sequence
  ◦ GeneMark: 27 genes
  ◦ FGENESH: 28 genes
• BLASTx: 7 most promising genes

Basis for selection:
• START & STOP codons
• High GC content
• No repeats
• Good E-value
• Proper splice sites
• Both programs agreed
• No mobile elements
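Some of the selection criteria above can be checked mechanically. The sketch below is a minimal Python illustration, not the group's actual procedure: the names `check_cds` and `models_agree` are hypothetical, the input is assumed to be a spliced coding sequence on the forward strand, and no GC cutoff is applied because the slides do not state one.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def check_cds(cds):
    """Report start/stop codon presence and GC fraction for a spliced CDS."""
    seq = cds.upper()
    # Split into complete codons, ignoring any trailing partial codon.
    codons = [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]
    return {
        "has_start": seq.startswith("ATG"),
        "has_stop": len(seq) % 3 == 0 and bool(codons) and codons[-1] in STOP_CODONS,
        "internal_stop": any(codon in STOP_CODONS for codon in codons[:-1]),
        "gc_fraction": (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0,
    }


def models_agree(exons_a, exons_b):
    """True when two predictors report exactly the same exon (start, end) pairs."""
    return sorted(exons_a) == sorted(exons_b)
```

For example, `check_cds("ATGGCCTAA")` reports a start codon, a terminal stop codon, and no internal stop.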

GENE I: Zea mays uncharacterized protein LOC100194332

• Both programs predicted the exact same 3 exons
• RNA evidence: BLAST search in the refseq_rna database
  ◦ Zea mays uncharacterized LOC100194332 (LOC100194332), mRNA (cDNA)
  ◦ Identity: 100%, E-value: 0
• Sequence alignment with the translated sequences
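For readers unfamiliar with the identity figure quoted above: BLAST reports percent identity as the number of identical aligned positions divided by the alignment length, gaps included. A minimal Python sketch of that calculation follows; the function name `percent_identity` and the example strings in the usage note are illustrative assumptions, not data from the slides.

```python
def percent_identity(aligned_query, aligned_subject):
    """Percent identity over a pairwise alignment, with '-' marking gaps.

    Both strings must already be aligned, so they have the same length.
    """
    if len(aligned_query) != len(aligned_subject):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_query:
        return 0.0
    matches = sum(
        1
        for q, s in zip(aligned_query.upper(), aligned_subject.upper())
        if q == s and q != "-"
    )
    return 100.0 * matches / len(aligned_query)
```

With the made-up alignment `"ACGT-A"` against `"ACCTTA"`, four of six columns match, so the function returns about 66.7.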

Final Gene Model Prediction

Four genes: I, II, III, IV

EST Evidence

• Identity: 99%, E-value: 0.0
• EST data covered both exons 1 & 2 except 114 bases
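A statement such as "covered ... except 114 bases" amounts to an interval calculation: count the exon positions that no EST alignment overlaps. The Python sketch below shows one simple way to do that. The function name `uncovered_bases` and the coordinates in the usage note are hypothetical; the slides do not give the actual exon or EST coordinates.

```python
def uncovered_bases(exons, est_hits):
    """Count exon bases not covered by any EST alignment.

    Intervals are (start, end) pairs in genomic coordinates,
    1-based and inclusive. Exons are assumed not to overlap each other.
    """
    covered = set()
    for start, end in est_hits:
        covered.update(range(start, end + 1))
    missing = 0
    for start, end in exons:
        missing += sum(1 for pos in range(start, end + 1) if pos not in covered)
    return missing
```

For instance, with made-up exons `[(1, 100), (201, 300)]` and EST hits `[(1, 100), (201, 250)]`, the function returns 50.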

GENE I Protein Function

• Conserved domain: Myb DNA binding
• Predicted to be a MYB-related transcription factor
• Myb proteins bind to DNA and regulate gene expression

Gene II

• 6 exons
• 241 amino acids
• Membrane protein with 7 transmembrane helices
• Sugar efflux transporter

Image from: http://bp.nuap.nagoya-u.ac.jp

Gene II RNA and EST Evidence

• 99% match to “Zea mays seven-transmembrane-domain protein 1” (LOC100284352) mRNA (cDNA)
• EST data covered all of exons 1, 2, 3, and 4 plus the beginning of exon 5
  ◦ All EST sequences used had 98-99% identity with gene II

GENE II Protein Function

• Conserved domain: MtN3_slv
• Sugar efflux transporter
• Involved in seed and pollen development

Gene III

• 1 exon
• 899 amino acids
• Soluble protein
• 1,4-alpha-glucan-branching enzyme 3 / starch branching enzyme 3
• Matched orthologs in 5 other plant genomes

Image: starch branching enzyme I from rice. Image from: http://pdb.rcsb.org

Gene III RNA and EST Evidence

• 99% match to “Zea mays starch branching enzyme III (sbe3)” mRNA (not cDNA)
• EST data covered almost all of gene III, with 1 gap (intron?)
  ◦ All EST sequences used had 99%-100% identity with gene III

Segment without EST data aligns to starch branching enzyme III in A. thaliana – not an intron

GENE III Protein Function

• Conserved domains for 1,4-alpha-glucan-branching enzyme
• Top HHpred result was starch branching enzyme 1 in rice (e-value: 2e-128)
• These enzymes catalyze the formation of the alpha-1,6-glucosidic linkages in starch.

Gene IV

• 5 exons
• 583 amino acids
• Membrane protein with 10 transmembrane helices
• Amino acid transporter
• Matched orthologs in wheat and sorghum genomes

Gene IV: RNA Evidence

• 96% match to “Zea mays LOC100193963 (si486073c04), mRNA” (E=0.00) (not cDNA)
• Other good match was to “XM_002455881.1 Sorghum bicolor hypothetical protein, mRNA” (94%, E=0.0)

Gene IV: EST Evidence

• EST best matches:
  ◦ ZM_BFc Zea mays cDNA clone ZM_BFc0171C07 5' (95%, E=0.0)
  ◦ ZM_BFc Zea mays cDNA clone ZM_BFc0038P24 5' (96%, E=2e-158)
• EST data also have two gaps.

Gene IV: BlastX in Nov. 2012

GENE IV: Protein Function

• Conserved domains:
  ◦ NCBI BlastX
  ◦ InterProScan