-
Transcriptome sequencing, characterization and polymorphism detection in Big Sagebrush (Artemisia tridentata) subspecies
Prabin Bajgain, Joshua Udall (BYU, Provo)
Bryce Richardson (USDA-RMRS, Provo)
-
- Ecologically, one of the most important shrub species in the intermountain United States
- Three main widespread subspecies: ssp. tridentata (basin ecotype), ssp. vaseyana (mountain ecotype), ssp. wyomingensis (Wyoming ecotype); two less common subspecies: ssp. spiciformis, ssp. xericensis
- Numerous mammals, insects and birds are dependent on big sagebrush for food and shelter – some are obligates while some are semi-obligates
- Human encroachment and wildfires followed by cheatgrass invasion are threatening big sagebrush habitat, and those dependent on it
Big Sagebrush (Artemisia tridentata)
-
Goals
Entrez records
Database name     Subtree links   Direct links
Nucleotide        32              31
Protein           14              14
Popset            3               3
SNP*              20,953          20,953
PubMed Central    34              34
Taxonomy          2               1
- Create a reliable and relatively large sequence database for big sagebrush
- Develop markers on the gene sequences
- Make the data publicly available for population, ecological and evolutionary studies
Entrez records source: NCBI Taxonomy Browser, Feb 17 2011
* Bajgain et al., ‘Transcriptome characterization and polymorphism detection in subspecies of Artemisia tridentata (big sagebrush)’ (in press)
-
- RNA extraction and cDNA library prep
- 454 sequencing (sspp. tridentata & vaseyana)
- Sequence assembly
- Pfam & BLASTx search
- Gene annotation (using BLASTx results)
- Marker detection
- SNP mapping
- Secondary metabolite genes
- Illumina sequencing (ssp. wyomingensis)
- Hybridization theory
Workflow
-
EST Sequence assembly
Subspecies                Assembly     Count       Average length   Total bases
ssp. tridentata (basin)   Reads        823,392     403.91           332,578,737
ssp. tridentata (basin)   Singletons   191,745     403.62           77,391,754
ssp. tridentata (basin)   Contigs      20,357      716              14,587,705
ssp. vaseyana (mtn)       Reads        702,001     333.13           233,854,535
ssp. vaseyana (mtn)       Singletons   179,189     331.51           59,402,844
ssp. vaseyana (mtn)       Contigs      20,250      624              12,641,189
ssp. combined             Reads        1,525,393   371.34           566,433,272
ssp. combined             Singletons   275,866     370.18           102,121,262
ssp. combined             Contigs      29,541      796              23,521,465
Summary report of individual and combined de novo assembly
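A summary row of this kind (count, average length, total bases) can be derived from sequence lengths alone. The sketch below is a minimal Python illustration, not the assembler's own report: the function name `summarize_lengths` and the toy lengths are hypothetical. The two closing checks use figures from the slide and confirm that the combined read set is the sum of the two subspecies read sets.

```python
def summarize_lengths(lengths):
    """Return (count, average length, total bases) for a list of sequence lengths."""
    count = len(lengths)
    total = sum(lengths)
    average = total / count if count else 0.0
    return count, round(average, 2), total


# Toy input (hypothetical lengths, not real sagebrush sequences).
count, average, total = summarize_lengths([500, 700, 900])
print(count, average, total)

# Cross-check with figures from the slide: combined reads equal the sum of
# the two subspecies read sets, in both count and total bases.
assert 823_392 + 702_001 == 1_525_393
assert 332_578_737 + 233_854_535 == 566_433_272
```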
-
Assembly annotation
• BLASTx:
  • against NR protein database
  • e-value of 1e-15
• BLAST2GO for annotation
  • 21,436 (72.6%) sequences had hits
[Figure panels: Biological Process, Molecular Function, Cellular Component]
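As a rough illustration of applying an e-value cutoff such as 1e-15, the Python sketch below filters BLAST tabular output and collects the queries that keep at least one hit. It assumes the standard 12-column tabular format (`-outfmt 6`), where column 1 is the query ID and column 11 is the e-value, and it treats the cutoff as "less than or equal to". Both are assumptions, since the slide does not say how the results were stored or filtered. The function name and the example records are hypothetical.

```python
def queries_with_hits(tabular_lines, max_evalue=1e-15):
    """Collect query IDs that have at least one hit at or below max_evalue.

    Assumes BLAST tabular output (-outfmt 6): column 1 is the query ID,
    column 11 is the e-value.
    """
    hits = set()
    for line in tabular_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        if float(fields[10]) <= max_evalue:
            hits.add(fields[0])
    return hits


# Hypothetical records: contig and subject IDs are made up.
example = [
    "contig1\tsubjA\t85.0\t200\t30\t0\t1\t600\t1\t200\t1e-50\t210",
    "contig2\tsubjB\t40.0\t80\t48\t0\t1\t240\t5\t84\t1e-05\t45",
    "contig3\tsubjC\t70.0\t150\t45\t0\t1\t450\t1\t150\t1e-15\t120",
]
annotated = queries_with_hits(example)
print(sorted(annotated))

# Share of sequences with hits, using counts from the slides
# (21,436 sequences with hits out of 29,541 combined contigs).
print(f"{21_436 / 29_541:.1%}")
```

The reported 72.6% is consistent with the 29,541 combined contigs being the denominator.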
-
Enzyme (no. of hits)                                                  ssp. tridentata   ssp. vaseyana
MEP pathway
  1-deoxy-D-xylulose 5-phosphate synthase (DXS)                       51                100
  1-deoxy-D-xylulose-5-phosphate reductoisomerase (DXP)               83                118
  2-C-methyl-D-erythritol 2,4-cyclodiphosphate synthase (MCT)         22                22
  4-diphosphocytidyl-2-C-methyl-D-erythritol kinase (CMK)             63                126
  2-C-methyl-D-erythritol 2,4-cyclodiphosphate synthase (MCS)         22                22
  1-hydroxy-2-methyl-2-(E)-butenyl 4-diphosphate synthase (HDS)       0                 0
  1-hydroxy-2-methyl-2-(E)-butenyl 4-diphosphate reductase (HDR)      20                12
  isopentenyl diphosphate/dimethylallyl diphosphate isomerase (IDI)   36                20
  isopentenyl diphosphate/dimethylallyl diphosphate synthase (IDS)    0                 0
MVA pathway
  acetoacetyl-coenzyme A thiolase (AACT)                              39                21
  3-hydroxy-methylglutaryl coenzyme A synthase (HMGS)                 0                 0
  3-hydroxy-methylglutaryl coenzyme A reductase (HMGR)                0                 0
  mevalonate kinase (MK)                                              0                 50
  phosphomevalonate kinase (PMK)                                      0                 0
  mevalonate diphosphate decarboxylase (MDC)                          20                0
Coumarin biosynthesis pathway
  phenylalanine ammonia lyase                                         29                45
  cinnamate 4-hydroxylase                                             28                70
  4-coumarate CoA ligase                                              322               215
Secondary metabolite genes
-
SNP detection
• SNP = Single Nucleotide Polymorphism
• parameters:
  • 8x coverage; 90% nucleotide frequency; 20% minor allele frequency
• 20,952 ‘true’ SNPs, average coverage 20x
[Figure: Distribution of the number of SNPs by read coverage depth. x-axis: SNP coverage depth (8 to 60); y-axis: Number of SNPs (0 to 2,500)]
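Two of the listed thresholds, minimum coverage and minimum minor allele frequency, can be expressed as a per-position filter. The Python sketch below shows one possible reading, not the pipeline actually used: the function name `passes_snp_filter` is hypothetical, `bases` stands for the read bases observed at one alignment position, and the 90% nucleotide frequency criterion is left out because the slide does not define how it was applied.

```python
from collections import Counter


def passes_snp_filter(bases, min_coverage=8, min_minor_freq=0.20):
    """Check one alignment column against coverage and minor-allele thresholds.

    `bases` is the list of read bases observed at the position.
    """
    coverage = len(bases)
    if coverage < min_coverage:
        return False
    counts = Counter(bases).most_common()
    if len(counts) < 2:
        return False  # monomorphic position, no second allele
    minor_count = counts[1][1]  # count of the second most common base
    return minor_count / coverage >= min_minor_freq


print(passes_snp_filter(list("AAAAAAGGGG")))  # 10 reads, minor allele at 40%
print(passes_snp_filter(list("AAAAAAAAAG")))  # 10 reads, minor allele at 10%
print(passes_snp_filter(list("AAAGGG")))      # only 6 reads
```

Only the first call passes: the second fails the minor allele threshold and the third fails the coverage threshold.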
-
SNP detection
Sample                 tridentata SNPs   vaseyana SNPs   Both SNP types   Total
Montana wyomingensis   138               306             251              695
Utah wyomingensis      157               424             458              1,039
• suggests origin of tetraploid ssp. wyomingensis via mixed ancestry
• more similar to ssp. vaseyana
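The arithmetic behind the "more similar to ssp. vaseyana" reading can be checked directly from the counts on this slide. The short Python sketch below recomputes the row totals and each category's share; the dictionary layout and the helper name `shares` are illustrative choices, while the numbers are copied from the slide.

```python
# SNP counts per ssp. wyomingensis sample, copied from the slide.
counts = {
    "Montana wyomingensis": {"tridentata": 138, "vaseyana": 306, "both": 251},
    "Utah wyomingensis": {"tridentata": 157, "vaseyana": 424, "both": 458},
}


def shares(row):
    """Return each category's fraction of the row total."""
    total = sum(row.values())
    return {name: value / total for name, value in row.items()}


for sample, row in counts.items():
    parts = ", ".join(f"{name} {frac:.1%}" for name, frac in shares(row).items())
    print(f"{sample}: total {sum(row.values())}; {parts}")
```

In both samples the vaseyana-type SNPs outnumber the tridentata-type SNPs by more than two to one, and the row totals reproduce the 695 and 1,039 shown on the slide.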
-
SSR detection
• parameters
  • 2-7 3-5 4-5 5-5 6-5 7-5 8-5 9-4 10-4 (SSR motif length – repeat number)
  • 100 bp interruption distance
• 1,003 SSRs in basin
• 507 SSRs in mtn
[Figure: Frequency and distribution of SSRs in two big sagebrush subspecies. x-axis: Repeat motif (di, tri, tetra, penta, hexa); y-axis: Number of repeats (0 to 800); series: tridentata, vaseyana]
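The "motif length – repeat number" pairs can be read as minimum repeat counts per motif size, for example at least 7 repeats for a dinucleotide motif. The Python sketch below is one way to apply those thresholds with regular expressions. It is not the tool used for the slide: the names `find_ssrs` and `is_primitive` are hypothetical, only perfect repeats are reported, and the 100 bp interruption distance (which would join nearby repeats into compound SSRs) is not implemented.

```python
import re

# Minimum repeat counts per motif length, copied from the slide
# (motif length -> minimum number of repeats).
MIN_REPEATS = {2: 7, 3: 5, 4: 5, 5: 5, 6: 5, 7: 5, 8: 5, 9: 4, 10: 4}


def is_primitive(motif):
    """True if the motif is not itself a repeat of a shorter unit (e.g. 'ATAT')."""
    n = len(motif)
    return not any(
        n % k == 0 and motif[:k] * (n // k) == motif for k in range(1, n)
    )


def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs in seq."""
    seq = seq.upper()
    found = []
    for size, min_rep in sorted(min_repeats.items()):
        # One motif copy followed by at least (min_rep - 1) further copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (size, min_rep - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            if is_primitive(motif):
                found.append((match.start(), motif, len(match.group(0)) // size))
    return sorted(found)


# Hypothetical test sequence: an (AT)8 run and a (GCA)5 run.
seq = "GG" + "AT" * 8 + "CC" + "GCA" * 5 + "TT"
print(find_ssrs(seq))
```

Motifs that are themselves repeats of a shorter unit are skipped so that, for instance, an (AT)n run is not also reported as (ATAT)n.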
-
From here?
- Evolution, intermixing and more evolution of big sagebrush subspecies
- Phylogenetic relationship among big sagebrush populations distributed in the intermountain US
- Sequence capture approach (~350 genes, 55 populations)
-
- Common garden studies to look at variation among the populations
- Later, link traits with genes in Artemisia tridentata populations
From here?
-
Acknowledgements
- Funding: USDA-FS, RMRS, National Fire Plan, GBNPSIP
- Rich Cronn
- Jared Price
- Nancy Shaw
- Covey Jones
- Brian Knaus
- Felix Jimenez
- Scott Yourstone