Transcriptome sequencing, characterization and polymorphism detection in Big Sagebrush (Artemisia tridentata) subspecies. Prabin Bajgain, Joshua Udall (BYU, Provo); Bryce Richardson (USDA-RMRS, Provo)

Post on 26-Jan-2021


  • Transcriptome sequencing, characterization and polymorphism detection in Big Sagebrush (Artemisia tridentata) subspecies

    Prabin Bajgain, Joshua Udall (BYU, Provo)

    Bryce Richardson (USDA-RMRS, Provo)

  • Big Sagebrush (Artemisia tridentata)

    - Ecologically, one of the most important shrub species in the intermountain United States

    - Three main widespread subspecies: ssp. tridentata (basin ecotype), ssp. vaseyana (mountain ecotype), ssp. wyomingensis (Wyoming ecotype); two less common subspecies: ssp. spiciformis, ssp. xericensis

    - Numerous mammals, insects and birds depend on big sagebrush for food and shelter; some are obligates, others semi-obligates

    - Human encroachment and wildfires followed by cheatgrass invasion are threatening big sagebrush habitat and the species that depend on it

  • Goals

    - To create a reliable and relatively large sequence database for big sagebrush

    - Develop markers on the gene sequences

    - Make the data publicly available for population, ecological and evolutionary studies

    Entrez records (NCBI Taxonomy Browser, Feb 17 2011)

    Database name     Subtree links   Direct links
    Nucleotide        32              31
    Protein           14              14
    Popset            3               3
    SNP*              20,953          20,953
    PubMed Central    34              34
    Taxonomy          2               1

    * Bajgain et al., ‘Transcriptome characterization and polymorphism detection in subspecies of Artemisia tridentata (big sagebrush)’ (in press)

    Table entries link to NCBI Entrez searches of the form http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=<database>&cmd=Search&dopt=DocSum&term=txid55611[Organism:exp] for subtree links and term=txid55611[Organism:noexp] for direct links; the Taxonomy row uses term=txid55611[Subtree] and term=55611[uid].
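The Entrez links on this slide search NCBI by taxonomy ID (txid55611), with `[Organism:exp]` for subtree records and `[Organism:noexp]` for direct records. As a minimal sketch, the same search terms can be passed to NCBI's E-utilities ESearch endpoint to request current record counts. The function name `entrez_count_url` and the choice of databases are illustrative assumptions, not part of the original analysis, and the code only builds URLs without contacting NCBI.

```python
from urllib.parse import urlencode

# NCBI E-utilities ESearch endpoint (the slide itself links to the older query.fcgi interface).
EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def entrez_count_url(db: str, taxid: int = 55611, subtree: bool = True) -> str:
    """Build an ESearch URL asking for the number of records for one taxon.

    subtree=True  uses [Organism:exp]   (taxon plus all descendants)
    subtree=False uses [Organism:noexp] (records attached directly to the taxon)
    """
    field = "Organism:exp" if subtree else "Organism:noexp"
    params = {"db": db, "term": f"txid{taxid}[{field}]", "rettype": "count"}
    return f"{EUTILS_ESEARCH}?{urlencode(params)}"


if __name__ == "__main__":
    # Hypothetical selection of databases, mirroring rows of the table above.
    for db in ("nucleotide", "protein", "popset", "pmc"):
        print(db, entrez_count_url(db, subtree=True))
```

Counts returned today will differ from the Feb 17 2011 snapshot shown in the table.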

  • Workflow

    RNA extraction and cDNA library prep

    454-sequencing (sspp. tridentata & vaseyana)

    Sequence assembly

    Pfam & BLASTx search

    Gene annotation (using BLASTx results)

    Marker detection

    SNP mapping

    Secondary metabolite genes

    Illumina sequencing (ssp. wyomingensis)

    Hybridization theory

  • EST Sequence assembly

    Assembly                    Count        Average length   Total bases
    ssp. tridentata (basin)
      Reads                       823,392    403.91           332,578,737
      Singletons                  191,745    403.62            77,391,754
      Contigs                      20,357    716               14,587,705
    ssp. vaseyana (mtn)
      Reads                       702,001    333.13           233,854,535
      Singletons                  179,189    331.51            59,402,844
      Contigs                      20,250    624               12,641,189
    Combined
      Reads                     1,525,393    371.34           566,433,272
      Singletons                  275,866    370.18           102,121,262
      Contigs                      29,541    796               23,521,465

    Summary report of individual and combined de novo assembly
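The three columns of the summary table (count, average length, total bases) can be recomputed from any FASTA file of reads, singletons or contigs. The following is a minimal sketch; the function names and the toy records are illustrative assumptions and say nothing about which assembler or reporting tool produced the numbers above.

```python
from typing import Iterable, Iterator, Tuple


def read_fasta_lengths(lines: Iterable[str]) -> Iterator[int]:
    """Yield the length of each sequence in FASTA-formatted lines."""
    length = None  # None until the first header line is seen
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if length is not None:
                yield length
            length = 0
        elif line and length is not None:
            length += len(line)
    if length is not None:
        yield length


def summarize(lengths: Iterable[int]) -> Tuple[int, float, int]:
    """Return (count, average length, total bases)."""
    lengths = list(lengths)
    total = sum(lengths)
    count = len(lengths)
    mean = total / count if count else 0.0
    return count, mean, total


if __name__ == "__main__":
    # Toy records, not real sagebrush data.
    toy = [">contig1", "ACGT", "AC", ">contig2", "ACGTACGT"]
    count, mean, total = summarize(read_fasta_lengths(toy))
    print(f"count={count} average={mean:.2f} total={total}")
```

For a real file, pass an open file handle instead of the toy list, for example `summarize(read_fasta_lengths(open(path)))` with `path` pointing at your own FASTA file.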

  • Assembly annotation

    • BLASTx:
        • against NR protein database
        • e-value of 1e-15

    • BLAST2GO for annotation

    • 21,436 (72.6%) sequences had hits

    [Figure panels: Biological Process, Molecular Function, Cellular Component]
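As a sketch of how the e-value cutoff translates into a hit count, the code below collects the query sequences that have at least one BLASTx hit at or below 1e-15. It assumes the standard 12-column tabular BLAST output, where the query ID is column 1 and the e-value is column 11; the slide does not say which output format was actually used, and the function names are hypothetical.

```python
from typing import Iterable, Set

EVALUE_CUTOFF = 1e-15  # threshold stated on the slide


def queries_with_hits(rows: Iterable[str], cutoff: float = EVALUE_CUTOFF) -> Set[str]:
    """Return IDs of queries with at least one hit whose e-value is <= cutoff.

    Assumes tab-separated rows in the standard 12-column tabular layout:
    qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
    """
    hits = set()
    for row in rows:
        if not row.strip() or row.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = row.rstrip("\n").split("\t")
        query_id, evalue = fields[0], float(fields[10])
        if evalue <= cutoff:
            hits.add(query_id)
    return hits


def hit_fraction(n_hits: int, n_queries: int) -> float:
    """Fraction of query sequences with at least one qualifying hit."""
    return n_hits / n_queries


if __name__ == "__main__":
    # 21,436 hits out of the 29,541 combined contigs is consistent with the 72.6% on the slide.
    print(f"{hit_fraction(21436, 29541):.1%}")
```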

  • Secondary metabolite genes

    Enzymes                                                               No. of hits         No. of hits
                                                                          (ssp. tridentata)   (ssp. vaseyana)
    MEP pathway
      1-deoxy-D-xylulose 5-phosphate synthase (DXS)                        51                 100
      1-deoxy-D-xylulose 5-phosphate reductoisomerase (DXR)                83                 118
      2-C-methyl-D-erythritol 4-phosphate cytidylyltransferase (MCT)       22                  22
      4-diphosphocytidyl-2-C-methyl-D-erythritol kinase (CMK)              63                 126
      2-C-methyl-D-erythritol 2,4-cyclodiphosphate synthase (MCS)          22                  22
      1-hydroxy-2-methyl-2-(E)-butenyl 4-diphosphate synthase (HDS)         0                   0
      1-hydroxy-2-methyl-2-(E)-butenyl 4-diphosphate reductase (HDR)       20                  12
      isopentenyl diphosphate/dimethylallyl diphosphate isomerase (IDI)    36                  20
      isopentenyl diphosphate/dimethylallyl diphosphate synthase (IDS)      0                   0
    MVA pathway
      acetoacetyl-coenzyme A thiolase (AACT)                               39                  21
      3-hydroxy-methylglutaryl coenzyme A synthase (HMGS)                   0                   0
      3-hydroxy-methylglutaryl coenzyme A reductase (HMGR)                  0                   0
      mevalonate kinase (MK)                                                0                  50
      phosphomevalonate kinase (PMK)                                        0                   0
      mevalonate diphosphate decarboxylase (MDC)                           20                   0
    Coumarin biosynthesis pathway
      phenylalanine ammonia lyase                                          29                  45
      cinnamate 4-hydroxylase                                              28                  70
      4-coumarate CoA ligase                                              322                 215

  • SNP detection

    • SNP = Single Nucleotide Polymorphism

    • parameters:
        • 8x coverage; 90% nucleotide frequency; 20% minor allele frequency

    • 20,952 ‘true’ SNPs, average coverage 20x

    [Figure: Distribution of the number of SNPs by read coverage depth. x-axis: SNP coverage depth (8 to 60); y-axis: Number of SNPs (0 to 2,500)]
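Two of the listed SNP parameters, minimum coverage (8x) and minimum minor allele frequency (20%), can be sketched as a simple filter over per-site allele counts. This is an illustration under assumptions: the input layout (a mapping from base to read count) and the function name are hypothetical, and the 90% nucleotide frequency criterion is left out because the slide does not define how it was applied.

```python
from typing import Mapping

MIN_COVERAGE = 8              # "8x coverage" from the slide
MIN_MINOR_ALLELE_FREQ = 0.20  # "20% minor allele frequency" from the slide


def passes_snp_filter(allele_counts: Mapping[str, int]) -> bool:
    """Check one candidate site against the coverage and minor-allele thresholds.

    allele_counts maps each observed base to the number of reads supporting it,
    e.g. {"A": 12, "G": 5}. This input layout is an assumption for illustration.
    """
    coverage = sum(allele_counts.values())
    if coverage < MIN_COVERAGE:
        return False
    ranked = sorted(allele_counts.values(), reverse=True)
    if len(ranked) < 2:
        return False  # only one allele observed, so the site is not polymorphic
    minor = ranked[1]  # second most frequent allele
    return minor / coverage >= MIN_MINOR_ALLELE_FREQ


if __name__ == "__main__":
    for site in ({"A": 6, "G": 2}, {"A": 5, "G": 2}, {"A": 18, "G": 2}):
        print(site, passes_snp_filter(site))
```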

  • SNP detection

                            tridentata SNPs   vaseyana SNPs   Both SNP types   Total
    Montana wyomingensis    138               306             251                695
    Utah wyomingensis       157               424             458              1,039

    • suggests origin of tetraploid ssp. wyomingensis via mixed ancestry

    • more similar to ssp. vaseyana
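The claim that ssp. wyomingensis is more similar to ssp. vaseyana can be checked by converting the counts on this slide into shares of each sample's total. The sketch below assumes the three numbers in each row follow the header order (tridentata SNPs, vaseyana SNPs, both SNP types); the dictionary layout and function name are illustrative.

```python
from typing import Dict

# Counts from the slide; column assignment assumed from the header order.
SNP_COUNTS: Dict[str, Dict[str, int]] = {
    "Montana wyomingensis": {"tridentata": 138, "vaseyana": 306, "both": 251},
    "Utah wyomingensis": {"tridentata": 157, "vaseyana": 424, "both": 458},
}


def shares(row: Dict[str, int]) -> Dict[str, float]:
    """Convert raw SNP counts into fractions of the row total."""
    total = sum(row.values())
    return {category: count / total for category, count in row.items()}


if __name__ == "__main__":
    for sample, row in SNP_COUNTS.items():
        parts = ", ".join(f"{k}: {v:.1%}" for k, v in shares(row).items())
        print(f"{sample} (total {sum(row.values())}): {parts}")
```

Under that column assignment, the row totals match the slide (695 and 1,039), and in both samples the vaseyana-type share exceeds the tridentata-type share (roughly 44% versus 20% in the Montana sample).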

  • SSR detection

    • parameters
        • 2-7 3-5 4-5 5-5 6-5 7-5 8-5 9-4 10-4 (SSR motif length - repeat number)
        • 100 bp interruption distance

    • 1,003 SSRs in ssp. tridentata (basin)

    • 507 SSRs in ssp. vaseyana (mtn)

    [Figure: Frequency and distribution of SSRs in two big sagebrush subspecies. x-axis: Repeat motif (di, tri, tetra, penta, hexa); y-axis: Number of repeats (0 to 800); series: tridentata, vaseyana]
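The parameter string on this slide pairs each SSR motif length with a minimum number of repeats (for example, 2-7 means a dinucleotide motif repeated at least seven times). A minimal regex-based sketch of that rule follows. It is not the tool used in the study, which the slide does not name, and it ignores the 100 bp interruption distance used for compound SSRs; the function names are hypothetical.

```python
import re
from typing import List, Tuple

# Motif length -> minimum repeat number, taken from the slide.
MIN_REPEATS = {2: 7, 3: 5, 4: 5, 5: 5, 6: 5, 7: 5, 8: 5, 9: 4, 10: 4}


def is_primitive(motif: str) -> bool:
    """True if the motif is not itself a repetition of a shorter unit (e.g. 'ATAT')."""
    return (motif + motif).find(motif, 1) == len(motif)


def find_ssrs(seq: str) -> List[Tuple[int, str, int]]:
    """Return (start, motif, repeat_count) for each perfect SSR meeting the thresholds."""
    seq = seq.upper()
    found = []
    for motif_len, min_rep in MIN_REPEATS.items():
        # A motif of motif_len bases followed by at least (min_rep - 1) exact copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (motif_len, min_rep - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            if is_primitive(motif):
                found.append((match.start(), motif, len(match.group(0)) // motif_len))
    return sorted(found)


if __name__ == "__main__":
    # Toy sequence: (AT)7 and (AAG)5 separated by short flanks.
    toy = "GGC" + "AT" * 7 + "CCT" + "AAG" * 5 + "T"
    for start, motif, repeats in find_ssrs(toy):
        print(start, motif, repeats)
```

The primitive-motif check keeps a run such as (AT)10 from also being reported as (ATAT)5, and drops mononucleotide runs, which have no entry in the parameter list.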

  • From here?

    - Evolution, intermixing and more evolution of big sagebrush subspecies

    - Phylogenetic relationships among big sagebrush populations distributed in the intermountain US

    - Sequence capture approach (~350 genes, 55 populations)

  • From here?

    - Common garden studies to look at variation among the populations

    - Later, link traits with genes in Artemisia tridentata populations

  • Acknowledgements

    - Funding: USDA-FS, RMRS, National Fire Plan, GBNPSIP

    - Rich Cronn

    - Jared Price

    - Nancy Shaw

    - Covey Jones

    - Brian Knaus

    - Felix Jimenez

    - Scott Yourstone