Additional Research Goals using Transcriptomic Data (Honavar/Transcriptomics-notes-tuggle-bcb570.pdf)


Posted on 10-May-2020



My definition of Transcriptomics Research

Research on the profile and abundance of RNA transcripts under specific conditions
High-throughput, high-dimensional data
Purposes:
♦ Define the RNA profile expressed in a specific tissue or cell type (the “transcriptome”)
♦ Identify RNAs responding to treatment
♦ Identify RNAs responding to genetic differences

Additional Research Goals using Transcriptomic Data

♦ Identify gene sets coordinately responding to treatment/genetic change
♦ Identify co-regulation
♦ Identify transcriptional regulatory proteins
♦ Identify regulatory pathways
♦ Identify dependencies among pathways
♦ Integrate with QTL mapping to find QT loci regulating expression

Methods used to Generate Transcriptomic data

1. High-volume cDNA sequencing (Expressed Sequence Tag (EST) projects)
2. Quantitative PCR
3. Serial Analysis of Gene Expression (SAGE) and new high-throughput sequencing methods
4. Microarray-based methods and results


1. EST Sequencing: What is an EST and how can it be used to investigate the genome?

Figure (cDNA library creation): transcription creates a cell-specific mRNA population (poly-A tails, AAAA...). Individual cDNA inserts are sequenced to generate ESTs. The ESTs are then used for comparative mapping, comparative sequence analysis, expression analysis and protein functional analysis.

EST: a 400-500 bp single-pass sequence from the expressed portion of the genome, 97-99% accurate

EST Coverage in dbEST

http://www.ncbi.nlm.nih.gov/dbEST/dbEST_summary.html


Large Pig EST Projects (> 5,000 sequences)*

*As of June 2006, a total of 566,277 pig ESTs had been deposited; the total is now about 1.7 million.

Institution | Contact name | ESTs submitted
USDA-ARS Meat Animal Research Center | Smith TPL | 197,149
National Institute of Agrobiological Sciences (Japan) | Uenishi H | 137,092
Roslin Institute (U.K.) | Anderson SI/Archibald A | 56,364
University of Missouri-Columbia | Prather RS | 37,806
Institut National de la Recherche Agronomique (France) | Tosser-Klopp G/Bonnet A | 24,956
Iowa State University | Tuggle CK | 20,983
Animal Technology Institute (Taiwan) | Lee W-C | 14,266
USDA-Plum Island | Neilan JG | 14,240
Oklahoma State University | DeSilva U | 12,825
Michigan State University | Ernst C/Coussens P | 12,804
Nevada Department of Agriculture | Rink A | 11,556
National Chung-Hsing University (Taiwan) | Huang M-C | 9,373
University of Nebraska-Lincoln | Pomp D | 5,414


Sino-Danish EST Data Released

♦ Largest single porcine EST project, containing 823,871 novel ESTs and 398,837 public ESTs
♦ Tissues came from both fetal and adult pigs: brain, eye, circulatory (heart, aorta), gut, bone marrow, cartilage, glandular (suprarenal, thyroid, mammary, lymphatic), muscle, mucosal membrane and reproductive
♦ 97 different non-normalized libraries, averaging ~9,000 sequences per library
♦ They have estimated expression levels based on EST frequency within their dataset
♦ They have also identified putative SNPs within these sequences (the libraries come from multiple breeds)
♦ www.piggenome.dk
♦ Current total: over 1,700,000 porcine ESTs in NCBI databases (O.C.: starting point to annotate the pig Affymetrix array)

Non-normalized versus normalized libraries

♦ Normalization of libraries is useful for sequencing the RNA complement of a specific tissue more deeply
♦ RNA frequencies differ greatly among genes, spanning 4-5 orders of magnitude!
♦ Normalization uses hybridization among the highly abundant members of the RNA pool to remove those sequences. This brings the RNA frequencies, and thus the sampling, closer to a “normal” distribution
♦ But only sequence data from non-normalized libraries can be used to estimate expression levels


Confirmation of Normalization in Pig cDNA Libraries- ISU

Library name | Tissue source** | ESTs generated | Clusters in library | Unique clusters | Fraction unique
A1-A3 | Ant. Pituitary | 1,235 | 1,054 | 560 | 0.45
NA | Normalized Ant. Pit. | 963* | 835* | 430* | 0.51*
AY0 | Term Placenta | 1,411 | 1,040 | 563 | 0.40
AY1 | Term Placenta-Norm. | 3,565 | 2,899 | 2,028 | 0.57
CP0 | Uterus (D12/14) | 1,669 | 1,229 | 701 | 0.42
CP1 | Uterus (D12/14)-Norm. | 1,836 | 1,635 | 1,057 | 0.58
E3-E6 | Whole Embryo/Fetus | 1,274 | 1,061 | 603 | 0.47
H1-H5 | Hypothalamus (0, 5, 12 DE) | 1,104 | 1,031 | 542 | 0.49
O1-O3 | Ovary (0, 5, 12 DE) | 1,429 | 1,289 | 710 | 0.50
Totals | | 14,607* | 12,048* | 7,194* |

*Estimates based on analysis of the first 560 sequences
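The “Fraction unique” column appears to equal unique clusters divided by ESTs generated (560 / 1,235 ≈ 0.45 for A1-A3). That reading is our inference from the numbers; the slide does not state it, and it does not reproduce the starred estimate for the NA library. The Python sketch below recomputes the column for the two tissues that have both a non-normalized and a normalized library. It shows the higher fraction unique after normalization.

```python
# Assumed (inferred from the table): fraction unique = unique clusters / ESTs generated.
LIBRARIES = {
    # name: (ESTs generated, unique clusters), copied from non-starred rows of the table
    "AY0 Term Placenta": (1411, 563),
    "AY1 Term Placenta-Norm.": (3565, 2028),
    "CP0 Uterus (D12/14)": (1669, 701),
    "CP1 Uterus (D12/14)-Norm.": (1836, 1057),
}


def fraction_unique(n_ests: int, n_unique: int) -> float:
    """Share of sequenced ESTs that fall into clusters unique to this library."""
    return n_unique / n_ests


for name, (ests, unique) in LIBRARIES.items():
    print(f"{name}: {fraction_unique(ests, unique):.2f}")
```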

Use of mixed tagged libraries for efficient EST production


Production of Tissue-tagged cDNA libraries

1. Anneal the tag-T18 primer to the mRNA
2. Create cDNA from mRNA
3. Size selection to obtain the best clones
4. Ligate into the pT7T3-Pac plasmid
5. Insert the new plasmids into bacteria to make clone libraries
6. Create mixed normalized libraries
7. Sequence


Library Tag Identification

....TAAGCTTGCGGCCGCCAAACTTTTTTTTTT....

Figure: a sequencing read runs from the plasmid vector through the Not I site and a library tag (example), then into the poly-T tail and the cDNA.
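The sketch below shows how a tagged read could be assigned back to its library. It assumes the layout shown above: the Not I site (GCGGCCGC), then the library tag, then a poly-T run. The minimum poly-T length of 8 and the tag-to-library table are hypothetical, because the slides give only one example sequence.

```python
import re

NOT_I_SITE = "GCGGCCGC"  # Not I recognition site, as labeled in the figure
# Assumed layout: Not I site, then the library tag, then a poly-T run of >= 8 T's.
TAG_PATTERN = re.compile(NOT_I_SITE + r"([ACGT]+?)T{8,}")

# Hypothetical tag-to-library table; real tag sequences are not given in the notes.
TAG_TO_LIBRARY = {"CAAAC": "example library"}


def extract_library_tag(read: str):
    """Return the bases between the Not I site and the poly-T tail, or None."""
    match = TAG_PATTERN.search(read.upper())
    return match.group(1) if match else None


def assign_library(read: str) -> str:
    """Look the extracted tag up in the (hypothetical) library table."""
    return TAG_TO_LIBRARY.get(extract_library_tag(read), "unassigned")


read = "....TAAGCTTGCGGCCGCCAAACTTTTTTTTTT...."
print(extract_library_tag(read), assign_library(read))
```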


3. Serial Analysis of Gene Expression: Digital Analysis

http://www.sagenet.org/findings/index.html

Serial analysis of gene expression (SAGE) is a method for comprehensive analysis of gene expression patterns; no individual cDNA clones are produced.

Three principles underlie the SAGE methodology:

1. A short sequence tag (10-14 bp) contains sufficient information to uniquely identify a transcript, provided that the tag is obtained from a unique position within each transcript (this may not be universally true)

2. Sequence tags can be linked together and then cloned and sequenced

3. Quantification of the number of times a particular tag is observed provides the expression level of the corresponding transcript.

A digital analysis of expression
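Principle 3 amounts to counting. The Python sketch below assumes the tags have already been extracted from the sequenced concatemers; that step is omitted, and the 10-bp tags are made up. Each tag's count, and its share of all tags, is its digital expression level.

```python
from collections import Counter


def sage_expression(tags):
    """Map each tag to (count, fraction of all tags observed)."""
    counts = Counter(tags)
    total = sum(counts.values())
    return {tag: (n, n / total) for tag, n in counts.most_common()}


# Hypothetical 10-bp tags, already extracted from sequenced concatemers.
observed = ["GTGAAACCCC", "GTGAAACCCC", "TTCATACACC", "GTGAAACCCC", "CCCATCGTCC"]
for tag, (n, frac) in sage_expression(observed).items():
    print(tag, n, f"{frac:.2f}")
```

A tag seen once in a small sample may belong to a rare transcript or may be a sequencing error. This is why rare transcripts need many more tags, as the next slide notes.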


Digital Analysis of Gene Expression: a limitation of SAGE is that large numbers of tags must be sampled to find rare transcripts.

New extremely high throughput sequencing technologies may solve this problem.

Example: Illumina technology

A single-copy transcript is estimated to occur at a frequency of about 1 per 350,000.

Thus a single-copy transcript will be “read” about 3 times per 1,000,000 reads, or about 3 TPM.

With 4 million reads per run, even a single-copy transcript should be read about 12 times, and more abundant transcripts more often.

http://www.illumina.com/pagesnrn.ilmn?ID=70#234
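The arithmetic above can be checked in a few lines of Python. The 1-in-350,000 frequency comes from the notes. Treating read counts as Poisson-distributed is our added assumption; it is used only to estimate how often a single-copy transcript would be missed entirely. Note that 4,000,000 / 350,000 ≈ 11.4, so the “about 12 reads” on the slide comes from rounding to 3 TPM first.

```python
import math

SINGLE_COPY_FREQUENCY = 1 / 350_000  # from the notes: ~1 single-copy transcript per 350,000


def expected_reads(n_reads: int, frequency: float = SINGLE_COPY_FREQUENCY) -> float:
    """Expected number of reads for a transcript at the given frequency."""
    return n_reads * frequency


def prob_detected(n_reads: int, frequency: float = SINGLE_COPY_FREQUENCY, min_reads: int = 1) -> float:
    """P(at least min_reads reads), assuming Poisson-distributed counts (an assumption)."""
    lam = expected_reads(n_reads, frequency)
    p_below = sum(math.exp(-lam) * lam**k / math.factorial(k) for k in range(min_reads))
    return 1.0 - p_below


print(f"per million reads: {expected_reads(1_000_000):.2f}")   # ~2.86, i.e. about 3 TPM
print(f"per 4 million reads: {expected_reads(4_000_000):.1f}")  # ~11.4
print(f"P(>=1 read in 4M reads): {prob_detected(4_000_000):.6f}")
```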

Illumina Digital Analysis of Gene Expression: overview of the extremely high-throughput Illumina technology


The process can be scaled as well: if you need more data on low-level transcripts, you can simply sequence a second time

Illumina Digital Analysis of Gene Expression:

Reproducible results

http://www.illumina.com/pagesnrn.ilmn?ID=70#234

Comparable results with QPCR


2. Quantitative Real-time-PCR- testing expression one gene at a time

http://www.sigmaaldrich.com/Life_Science/Molecular_Biology/PCR/Key_Resources/Probed_based_QPCR_Animation.html

Real time: refers to the fact that amplification of the specific sequence is measured as it occurs, rather than by more traditional endpoint analyses on gels

Animation of fluorescent-probe-based Real-time PCR

Figure: real-time amplification plot (fluorescence), with the threshold cycle (CT) marked.

Quantitative Real time-PCR- ΔΔ Ct method

Ref: http://pathmicro.med.sc.edu/pcr/realtime-home.htm

Normalization of data using a “control” gene assumes that the control gene is not affected by treatment

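Fold change calculation:

$2^{\Delta\Delta C_T}$, with $\Delta C_T = C_T(\text{target}) - C_T(\text{control gene})$ and $\Delta\Delta C_T = \Delta C_T(\text{untreated}) - \Delta C_T(\text{treated})$: a 2,702-fold increase in IL1b RNA due to treatment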
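Here is a minimal Python sketch of the ΔΔCt calculation. It follows the widely used Livak convention:

- ΔCt = Ct(target) - Ct(control gene).
- ΔΔCt = ΔCt(treated) - ΔCt(untreated).
- Fold change = 2^(-ΔΔCt).

This equals the slide's 2^ΔΔCt when the subtraction is taken the other way round. The sketch assumes roughly 100% PCR efficiency, meaning the product doubles each cycle, and the Ct values are invented.

```python
import math


def delta_ct(ct_target: float, ct_control_gene: float) -> float:
    """Normalize a target gene's Ct to the control gene measured in the same sample."""
    return ct_target - ct_control_gene


def fold_change(ct_target_treated, ct_control_treated, ct_target_untreated, ct_control_untreated):
    """Livak 2^(-ddCt), ddCt = dCt(treated) - dCt(untreated); assumes ~100% efficiency."""
    ddct = delta_ct(ct_target_treated, ct_control_treated) - delta_ct(
        ct_target_untreated, ct_control_untreated
    )
    return 2.0 ** (-ddct)


# Invented Ct values: the target crosses threshold 5 cycles earlier after treatment.
print(fold_change(20.0, 15.0, 25.0, 15.0))  # 32.0
# Cycles of ddCt corresponding to the 2,702-fold IL1b change on the slide.
print(round(math.log2(2702), 1))  # 11.4
```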


4. Transcriptional Profiling using Microarrays

Two general methods to make microarrays:
♦ In situ synthesis: Affymetrix GeneChip
♦ Spotting cDNA segments or oligos onto glass slides
(Adapted from Nuwaysir et al., 1999)

Generic Two-color Microarray Procedure
♦ Oligos or cDNA fragments spotted
♦ Data analysis critical!!


Important aspects to consider when using microarray technology to reliably measure biology:
♦ Sources of variability
♦ Experimental design
♦ Statistical analyses of data
♦ Validation
♦ Standardization

Sources of variability

For cDNA spotted arrays:

♦ Accuracy of clone tracking and PCR amplification of cDNAs
♦ Spotting quality
♦ Spot detection/analysis
♦ Identity of the cDNA spotted: annotation of a sequence that is often partial


Sources of variability

For both cDNA and oligonucleotide spotted arrays:
♦ Amount of nucleic acid spotted across arrays
♦ Spotting consistency and scanning quantification
♦ Variable hybridization protocols and results
♦ Inter-laboratory comparison

Affymetrix Pros and Cons

Benefits
♦ Consistency from chip to chip due to manufacturing technology and QC
♦ No clone tracking or spotting variation
♦ Multiple values collected for each transcript: high data depth
♦ Mismatch control improves specificity
♦ High level of coverage of genes (especially compared to currently available livestock spotted arrays)

Limitations
♦ Inflexible design; difficult to rapidly change feature content
♦ High cost may decrease critical biological replication

Experimental Design Issues

♦ Reference design versus Loop design » Most relevant to spotted arrays

D. Nettleton


Reference Design

♦ Heavily used in initial pre-clinical settings- to compare “normal” to “cancerous”

♦ Could be useful to determine an “abnormal” expression pattern
♦ Problem: the “known” (reference) sample is measured the most!

Loop Design

♦ All samples are measured the same number of times in the “loop”
♦ Works well for multiple treatments, a logical series of treatments (concentrations of drug, etc.), and time series after treatment
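The Python sketch below makes the “measured the most” problem concrete. It lays out both designs for four hypothetical samples and counts how often each sample is hybridized. Treating the first sample of each pair as Cy5 and the second as Cy3 is our convention; with it, every sample in the loop receives each dye once.

```python
from collections import Counter


def reference_design(samples, reference="REF"):
    """One array per sample: (sample as Cy5, common reference as Cy3)."""
    return [(s, reference) for s in samples]


def loop_design(samples):
    """One array per sample: sample i (Cy5) vs. sample i+1 (Cy3), closing the loop."""
    n = len(samples)
    return [(samples[i], samples[(i + 1) % n]) for i in range(n)]


def measurements_per_sample(arrays):
    """Count how many times each sample appears across all arrays."""
    return Counter(s for pair in arrays for s in pair)


samples = ["T1", "T2", "T3", "T4"]
print(measurements_per_sample(reference_design(samples)))  # REF appears on every array
print(measurements_per_sample(loop_design(samples)))       # every sample appears twice
```

With the same number of arrays, the reference design spends half of all measurements on the reference. The loop design measures each treatment sample twice.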


Statistics

♦ Initial work simply compared Cy3 and Cy5 signal levels and set an ad hoc 2-fold difference in expression as the cutoff (i.e., reference design).

♦ It became clear that standard linear-model ANOVA methods are more appropriate, and that the effect of multiple testing must also be included

♦ False discovery rate (FDR) calculations: described later
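The notes defer the FDR details. One common choice, though not necessarily the one used in these analyses, is the Benjamini-Hochberg procedure, which converts a list of p-values into q-values. Here is a plain-Python sketch with made-up p-values:

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values (q-values) in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that q-values never decrease with rank.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        q[i] = running_min
    return q


p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205]
print([round(x, 3) for x in benjamini_hochberg(p)])
```

Calling genes at q < 0.05 then means that, on average, about 5% of the called genes are expected to be false discoveries.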

Validation

To verify the expression patterns of key genes showing differential expression in the profiling experiment, one gene at a time.
♦ Main tool here is real-time quantitative RT-PCR
♦ Labor-intensive: each assay must be developed
♦ Other approaches are coming forward

Standardization
♦ Data warehousing with public access: NCBI GEO
♦ Meta-analysis: improved power
♦ Suggestions from a NIST meeting* on the use of microarrays in the clinic: RNA reference materials
» Known/verifiable set of RNAs for validation of methods
» Spike-in set of artificial RNAs for validation of specific hybridizations

* Cronin et al., Clin. Chem. 50:1464 (2004)


Application of transcriptomics: an RNA expression-based clinical test available commercially

van 't Veer et al., Gene expression profiling predicts clinical outcome of breast cancer, Nature, 2002 415: 530.

- Expression patterns of 70 genes in breast cancer tissue samples were found to accurately predict metastatic outcome (using historical tissue samples)
- This “gene expression signature” was found in a large expression profiling study comparing normal and cancerous tissue samples
- Microarray-based analyses were used

“MammaPrint” diagnostic test for breast cancer prognosis
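Signatures of this kind are often applied by correlating a tumor's expression of the signature genes with an average good-prognosis profile, then comparing the correlation to a cutoff. The Python sketch below illustrates that idea with a hypothetical 5-gene template and a hypothetical threshold of 0.4. It shows correlation-to-template classification in general, not the MammaPrint algorithm.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def classify(profile, good_template, threshold):
    """Label a tumor profile by its correlation with a good-prognosis template."""
    r = pearson(profile, good_template)
    return ("good prognosis" if r >= threshold else "poor prognosis"), r


# Hypothetical 5-gene template and tumor profiles (log ratios); the real signature uses 70 genes.
template = [1.0, -0.5, 0.8, -1.2, 0.3]
tumor_a = [0.9, -0.4, 0.7, -1.0, 0.2]
tumor_b = [-0.8, 0.6, -0.9, 1.1, -0.2]
for name, profile in [("tumor_a", tumor_a), ("tumor_b", tumor_b)]:
    label, r = classify(profile, template, threshold=0.4)  # threshold is an assumption
    print(name, label, round(r, 2))
```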

Validation of “Gene Expression Signature” in a second cohort of patients

van de Vijver et al., A gene expression signature as a predictor of survival in breast cancer, New England Journal of Medicine, 2002 347: 19.

Validation of “Gene Expression Signature”

van de Vijver et al., New England Journal of Medicine, 2002 347: 19.


Project 1. Validating a New Porcine Oligonucleotide Array

Qiagen-Operon synthesized a large (~13,000) set of oligonucleotides last year in collaboration with a USDA-NRSP-8 committee. We have validated this new microarray (Zhao et al., 2005 Genomics 86:618):
♦ Evaluated utility of the sequence set for biology
» Identify number of spots with signal above background
» Determine expression pattern for four tissues
♦ Tested specificity and annotation of oligonucleotides
» Determine correlation of expression pattern for selected spots with the expected pattern found for the annotated match in human/mouse

Testing of Porcine Array

Four tissues: liver, lung, muscle, small intestine. RNA labeled using Cy3 and Cy5. Six biological replicates per tissue, each measured twice (once with each dye), for 48 measurements on 24 slides.

Data analysis:
♦ Normalization: LOWESS procedure
♦ Linear model ANOVA (dye and tissue as fixed effects; slide and animal as random effects)
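Below is a minimal sketch of intensity-dependent (LOWESS-style) dye normalization. It assumes background-corrected Cy5 and Cy3 intensities per spot and works in three steps:

1. Compute M = log2(Cy5/Cy3) and A = mean log2 intensity.
2. Fit a local linear regression of M on A with tricube weights.
3. Subtract the fit.

The sketch uses NumPy and does a single pass without LOWESS's robustness iterations. Its span of frac = 0.3 is an arbitrary choice. Real analyses would use an established implementation before the ANOVA step.

```python
import numpy as np


def ma_values(cy5, cy3):
    """Per-spot M = log2(Cy5/Cy3) and A = mean log2 intensity."""
    cy5 = np.asarray(cy5, dtype=float)
    cy3 = np.asarray(cy3, dtype=float)
    return np.log2(cy5 / cy3), 0.5 * np.log2(cy5 * cy3)


def lowess_fit(a, m, frac=0.3):
    """Single-pass local linear fit of m on a with tricube weights (no robustness steps)."""
    n = len(a)
    k = max(3, int(np.ceil(frac * n)))          # points in each local neighborhood
    fitted = np.empty(n)
    for i in range(n):
        dist = np.abs(a - a[i])
        idx = np.argsort(dist)[:k]              # k spots closest in intensity
        h = dist[idx].max() or 1.0              # neighborhood radius
        w = (1.0 - (dist[idx] / h) ** 3) ** 3   # tricube weights
        X = np.column_stack([np.ones(k), a[idx]])
        xtw = X.T * w                           # X^T W without forming W
        beta = np.linalg.solve(xtw @ X, xtw @ m[idx])
        fitted[i] = beta[0] + beta[1] * a[i]
    return fitted


def lowess_normalize(cy5, cy3, frac=0.3):
    """Return dye-bias-corrected log ratios: M minus the fitted trend in A."""
    m, a = ma_values(cy5, cy3)
    return m - lowess_fit(a, m, frac)


# Simulated spots with an intensity-dependent dye bias (hypothetical data).
a_true = np.linspace(6, 14, 40)
bias = 0.3 + 0.05 * a_true
cy5 = 2 ** (a_true + bias / 2)
cy3 = 2 ** (a_true - bias / 2)
print(np.abs(lowess_normalize(cy5, cy3)).max())  # close to zero: the bias is removed
```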


Comparison of labeling same target RNA with Cy3 and Cy5

Using current protocols:
• Only 50 spots (0.38%) have a greater than 2-fold difference between targets
• Technical reproducibility good

Further testing/use of Porcine Array

♦ Evaluate utility of the sequence set for biology
» Identify number of spots with signal above background
» Determine expression pattern for four tissues
♦ Test specificity and annotation of oligonucleotides
» Determine correlation of expression pattern for selected spots with the expected pattern found for the annotated match in human/mouse

Example Slide Scan


Comparison of labeling different target RNA: small intestine (Cy5) vs. liver (Cy3)

How many spots represent expressed genes in each tissue?
Background: average of Arabidopsis spot signals

Tissue Selective Expression (expected false positives: 13 per tissue)
Lung | Liver | Muscle | SI
266 | 147 | 405 | 538


Real-time quantitative RT-PCR (Q-PCR)

Columns Liver through Small intestine: tissue expression level‡

Oligo ID | Gene symbol | CT* or ΔCT† | Liver | Lung | Muscle | Small intestine | Agree with microarray results?
SS00002529 | NOS2A | CT ± SD | 37.8 ± 0.7 | 32.5 ± 0.3 | 36.1 ± 0.8 | 36.6 ± 0.9 |
 | | ΔCT ± SD | 20.7 ± 0.6a | 15.8 ± 0.5c | 18.7 ± 0.4b | 19.8 ± 1.1ab | yes
SS00010183 | ICAM1 | CT ± SD | 30.4 ± 1.8 | 28.2 ± 0.7 | 30.1 ± 0.9 | 31.7 ± 0.4 |
 | | ΔCT ± SD | 13.4 ± 1.9ab | 11.6 ± 1.0b | 12.7 ± 0.8ab | 14.9 ± 0.6a | yes
SS00000872 | CASP1 | CT ± SD | 23.1 ± 0.8 | 20.3 ± 0.6 | 25.6 ± 0.2 | 20.3 ± 0.1 |
 | | ΔCT ± SD | 6.0 ± 0.2b | 3.7 ± 0.3c | 8.1 ± 0.3a | 3.5 ± 0.2c | yes
SS00006633 | INDO | CT ± SD | 27.9 ± 1.3 | 21.1 ± 0.6 | 28.4 ± 0.8 | 27.7 ± 0.3 |
 | | ΔCT ± SD | 10.8 ± 1.2a | 4.4 ± 0.4b | 11.0 ± 0.8a | 10.9 ± 0.5a | yes
SS00004427 | STAT6 | CT ± SD | 27.7 ± 1.9 | 27.2 ± 1.1 | 27.9 ± 1.2 | 29.4 ± 0.4 |
 | | ΔCT ± SD | 10.6 ± 1.9a | 10.5 ± 1.3a | 10.5 ± 1.1a | 12.2 ± 0.6a | no
SS00002396 | IRF1 | CT ± SD | 22.2 ± 1.2 | 20.4 ± 0.5 | 23.6 ± 0.8 | 21.1 ± 0.2 |
 | | ΔCT ± SD | 5.1 ± 0.6b | 3.7 ± 0.2c | 6.2 ± 0.4a | 4.4 ± 0.3bc | yes
SS00002273 | IRF2 | CT ± SD | 23.8 ± 0.7 | 22.8 ± 0.5 | 25.0 ± 0.6 | 23.8 ± 0.2 |
 | | ΔCT ± SD | 6.7 ± 0.8ab | 6.2 ± 0.2b | 7.6 ± 0.5a | 7.0 ± 0.3ab | yes
SS00007514 | MAPK14 | CT ± SD | 20.2 ± 0.5 | 20.6 ± 0.6 | 21.8 ± 0.7 | 21.9 ± 0.2 |
 | | ΔCT ± SD | 3.1 ± 0.5c | 3.9 ± 0.3b | 4.4 ± 0.3b | 5.2 ± 0.1a | yes
SS00008774 | MAPK1 | CT ± SD | 21.0 ± 0.7 | 20.2 ± 0.4 | 21.6 ± 0.5 | 20.8 ± 0.4 |
 | | ΔCT ± SD | 3.9 ± 0.3ab | 3.5 ± 0.3b | 4.2 ± 0.1a | 4.1 ± 0.9a | yes
SS00000832 | TGFβ1 | CT ± SD | 28.4 ± 0.3 | 25.4 ± 0.6 | 28.7 ± 0.5 | 27.1 ± 0.3 |
 | | ΔCT ± SD | 11.3 ± 0.5a | 8.8 ± 0.8b | 11.3 ± 0.7a | 10.4 ± 0.5a | no
SS00000662 | TGFβ2 | CT ± SD | 24.7 ± 1.0 | 20.6 ± 0.6 | 22.2 ± 0.4 | 23.7 ± 0.1 |
 | | ΔCT ± SD | 7.6 ± 0.7a | 3.9 ± 0.4b | 4.8 ± 0.1b | 6.9 ± 0.2a | yes
SS00004196 | RPL32 | CT ± SD | 17.1 ± 0.7 | 16.7 ± 0.4 | 17.4 ± 0.5 | 16.8 ± 0.2 | yes

9 of 11 genes agree with the microarray results:
♦ 9 show a statistical difference
♦ 1 more has the same direction as the microarray (MA)
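The ΔCT rows appear to be CT(gene) - CT(RPL32), with RPL32 as the control gene. For NOS2A, the differences of the reported CT means reproduce the reported ΔCT values exactly. The Python sketch below recomputes them and converts a ΔCT difference into a fold difference between tissues, assuming about 100% amplification efficiency. For some genes the difference of means differs from the reported mean ΔCT in the last digit: for ICAM1, 30.4 - 17.1 = 13.3 versus 13.4 reported. This is presumably because ΔCT was averaged per animal.

```python
# CT means copied from the table above (control gene: RPL32).
RPL32 = {"Liver": 17.1, "Lung": 16.7, "Muscle": 17.4, "Small intestine": 16.8}
NOS2A = {"Liver": 37.8, "Lung": 32.5, "Muscle": 36.1, "Small intestine": 36.6}


def delta_ct_by_tissue(gene_ct, control_ct):
    """dCT = CT(gene) - CT(control gene), per tissue, rounded like the table."""
    return {tissue: round(gene_ct[tissue] - control_ct[tissue], 1) for tissue in gene_ct}


def relative_expression(dct, tissue, baseline):
    """Fold difference of tissue vs. baseline: 2^-(dCT_tissue - dCT_baseline)."""
    return 2.0 ** -(dct[tissue] - dct[baseline])


dct = delta_ct_by_tissue(NOS2A, RPL32)
print(dct)
print(f"NOS2A, lung vs. liver: {relative_expression(dct, 'Lung', 'Liver'):.1f}-fold")
```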

Additional verification

How else can we check the oligo expression data? Check the data against expression of the ortholog in other species. [Also check the position of the gene in the genome, to confirm conservation of gene order across species.]

Example: pig to mouse patterns

Figure: expression of Pyruvate Kinase M2 in skeletal muscle, SI, liver and lung, comparing mouse data with pig data.


Project 2. Infection Response Transcriptomics

Current Purposes:
♦ To use bioinformatics to investigate expression profiles in porcine mesenteric lymph node during Salmonella infection
♦ Initiate characterization of the regulatory pathways controlling host response to Salmonella challenge
♦ Test the new 23K Affymetrix Porcine GeneChip

Long-term Goal:
♦ Identify genes to target for improving disease resistance. Few QTL studies have identified genome regions important for resistance to Salmonella or other bacteria in pigs

Why lymph node?

The lymph node is the place where the innate (early, non-specific) immune response talks to the adaptive (later, specific) immune system

Why mesenteric lymph node?

The mesenteric lymph node (MLN) is the place where Salmonella usually enters the body


Caveats for studying Immune Response at the MLN transcriptome level

♦ The advantage of sampling the mesenteric lymph node is that we are studying at least a portion of the real host response
♦ The disadvantage is that the results are the combined efforts of a number of cell types, which may or may not have responded to Salmonella
♦ We chose to study the gut-associated lymphoid tissue of specifically challenged animals. This experiment is part-way between the most controlled type of study (single-cell, single-pathogen challenge) and the least controlled type of study (naturally challenged field population)

Experimental Design

• Pigs infected with 1 billion cfu of S. Choleraesuis or S. Typhimurium.

• Lymph nodes collected: Uninfected, 8h, 24h, 48h, and 21d post-infection.

• Three pigs per time point:

Wang et al., 2007

Affymetrix Probeset Annotation

1. Sequence-based similarity using the Affymetrix consensus sequence (2004 porcine data)
♦ Used BLAST against the RefSeq database
♦ BLASTN used; criteria: E-value e-10, maximum score
♦ TBLASTX used; criteria: E-value e-5, maximum score
♦ Hit rate to human RefSeq: 14,949/23,937* or 62%
• *Low-value annotations (918): 480 Chr xx orf yy; 111 FLJxxxxx hypothetical protein; 159 KIAA cDNAs; 120 LOCxxxxx annotations; 48 MGC hypothetical protein

2. GO function/component/process annotation
♦ Used GO terms associated with mouse RefSeq at NCBI
♦ 1 or more GO terms were matched to 10,820 Affymetrix probesets
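The Python sketch below applies the best-hit rule above: an E-value cutoff, then the maximum score. The Hit record and the probeset and RefSeq names are hypothetical placeholders standing in for parsed BLAST tabular output. The 62% hit rate uses the counts from the slide.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    probeset: str   # query (Affymetrix consensus sequence)
    refseq: str     # subject (RefSeq record)
    evalue: float
    score: float


def best_hits(hits, max_evalue=1e-10):
    """Keep, per probeset, the highest-scoring hit whose E-value passes the cutoff."""
    best = {}
    for h in hits:
        if h.evalue > max_evalue:
            continue
        if h.probeset not in best or h.score > best[h.probeset].score:
            best[h.probeset] = h
    return best


def hit_rate(n_annotated, n_total):
    return n_annotated / n_total


# Hypothetical parsed hits (placeholder names, not real probesets or accessions).
hits = [
    Hit("probeset_1", "refseq_A", 1e-50, 300.0),
    Hit("probeset_1", "refseq_B", 1e-30, 200.0),
    Hit("probeset_2", "refseq_C", 1e-6, 90.0),   # fails the BLASTN cutoff
]
print({p: h.refseq for p, h in best_hits(hits).items()})
print(f"{hit_rate(14_949, 23_937):.0%}")  # 62%
```

For TBLASTX hits, the same function would be called with max_evalue=1e-5.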


Figure: number of differentially expressed genes (up-regulated and down-regulated) over the time course (8h/0, 24h/0, 48h/0, 21d/0), shown separately for S. Typhimurium (ST) and S. Choleraesuis (SC) infection.

Summary of differentially expressed genes
• p < 0.01 and fold change > 2.0
• FDR ranges from 0.04 to 0.26
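The Python sketch below applies the two-part cutoff (p < 0.01 and fold change > 2.0) to split genes into up- and down-regulated lists for one time-point comparison, as in the figure. Two choices here are our assumptions: fold change is expressed as an infected/uninfected ratio, and a ratio below 0.5 counts as a >2-fold decrease. The gene results are hypothetical.

```python
def split_de_genes(genes, p_cutoff=0.01, fold_cutoff=2.0):
    """Return (up, down) lists of genes passing p < p_cutoff and a > fold_cutoff change.

    genes: iterable of (name, p_value, ratio), with ratio = infected / uninfected.
    A > fold_cutoff decrease is taken as ratio < 1 / fold_cutoff (an assumption).
    """
    up, down = [], []
    for name, p, ratio in genes:
        if p >= p_cutoff:
            continue
        if ratio > fold_cutoff:
            up.append(name)
        elif ratio < 1.0 / fold_cutoff:
            down.append(name)
    return up, down


# Hypothetical results for one comparison (e.g. 48h/0).
results = [
    ("gene_a", 0.001, 4.0),   # significant, up
    ("gene_b", 0.004, 0.3),   # significant, down
    ("gene_c", 0.03, 5.0),    # large change but p too high
    ("gene_d", 0.002, 1.6),   # significant but change too small
]
up, down = split_de_genes(results)
print(len(up), "up-regulated;", len(down), "down-regulated")
```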

Analysis of differentially expressed gene expression patterns

• How to further study ~1,000+ differentially expressed genes?

• Recognize patterns of expression of sets of genes using clustering tools

• Correlate with known biology through annotation:
  • to understand specific immune response(s)
  • to establish benchmark patterns marking health/disease
  • to identify and characterize transcriptional regulatory networks

Why transcription factors and regulatory networks

• Transcription factor (TF) function is very close to RNA expression data

• Emphasize comparative analysis to use wealth of information from human/mouse data

• Meta-analysis of pathogen response shows fundamental pathways in common across many host cell types and pathogens (Jenner and Young 2005)

Figure: a gene and its promoter DNA, with the transcription factor recognition sequence marked.


Conserved immune response networks (Jenner and Young 2005)

• Meta-analysis of 782 experiments and 77 different host-pathogen interactions studied.

• Epithelia, Endothelia, Macrophage, PBMC, DC, Liver, skin, fibroblast, stomach, T cell, B cell….

• 12+ viruses, 10+ bacteria, plus stimulants (LPS, etc.)

• Clustering analysis of data: the direction of expression response was compared across all experiments

• 511 genes showed a similar pattern of expression upon infection --> co-expressed

• Due to co-regulation?

Common host response networks (RG Jenner and RA Young, 2005)

Hierarchical Clustering of Genes

• 848 differentially expressed genes in ST infection

• 1,853 differentially expressed genes in SC infection

• p<0.01; q < 0.24; fold change >2

• all pair-wise comparisons used

• Heat map was built

• genecluster3 and treeview software
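The heat map relies on hierarchical clustering of gene profiles. The NumPy sketch below shows one common configuration: average linkage with 1 - Pearson correlation as the distance. The slides do not state the genecluster3 settings used for this analysis, and the four gene profiles are hypothetical.

```python
import numpy as np


def correlation_distance(profiles):
    """1 - Pearson correlation between every pair of gene profiles (rows)."""
    return 1.0 - np.corrcoef(profiles)


def average_linkage(profiles, n_clusters):
    """Agglomerative clustering: repeatedly merge the two clusters with the
    smallest mean pairwise distance until n_clusters remain."""
    dist = correlation_distance(np.asarray(profiles, dtype=float))
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = dist[np.ix_(clusters[i], clusters[j])].mean()
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Hypothetical log2 fold changes at 8h, 24h, 48h, 21d for four genes.
profiles = [
    [1.0, 2.0, 3.0, 0.5],
    [1.2, 2.1, 3.3, 0.4],
    [-0.5, -1.0, -2.0, 0.0],
    [-0.4, -1.1, -2.2, 0.1],
]
print(average_linkage(profiles, n_clusters=2))
```

Correlation distance groups genes by the shape of their time course rather than by absolute magnitude. That shape is what an up- or down-regulated cluster in the heat map reflects.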


Hierarchical Clustering of DE genes: ST infection (up-regulated and down-regulated clusters)

Up-regulated in ST infection at 24 hr, but highest at 48 hr in SC infection: 105 genes

“Inflammatory immune response/NFκB” cluster contains 6 chemokines, 6 interferons/interleukins and many NFκB targets

Q-PCR confirms Affymetrix MLN Gene Expression Patterns

Most are known NFκB target genes

Regulatory Networks Revealed by Time-Course “Co-Expression” Data

- Looked at genes induced at two stages of the acute SC infection (annotated only)

- Early response genes (E)- 83 genes; up at 8 and/or 24 hpi

- Late response (L)- 320 genes; induced only at 48 hpi
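The E/L grouping rule can be written directly. The Python sketch below uses hypothetical up-regulation calls at 8, 24 and 48 hpi for three annotated genes.

```python
def response_stage(up_8h: bool, up_24h: bool, up_48h: bool):
    """Early (E): up at 8 and/or 24 hpi; Late (L): induced only at 48 hpi; else None."""
    if up_8h or up_24h:
        return "E"
    if up_48h:
        return "L"
    return None


# Hypothetical up-regulation calls (8h, 24h, 48h) for three annotated genes.
calls = {
    "gene_1": (True, False, False),
    "gene_2": (False, True, True),
    "gene_3": (False, False, True),
}
groups = {gene: response_stage(*flags) for gene, flags in calls.items()}
print(groups)
```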


Text-mining to annotate Regulatory Networks

- Pathway Studio analysis of all common regulators of E genes

- Protein or complex that has a PubMed abstract with regulatory links to at least two genes in list

- 50 E genes could be linked in this way to other members of the E group

- 20 of these 50 genes are known to be regulated by the NFκB complex
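The selection rule above keeps a regulator only if it has literature links to at least two genes in the list. It can be sketched in Python as follows. The links are hypothetical placeholders, not Pathway Studio output, and the code does not model how those links were mined.

```python
from collections import defaultdict


def common_regulators(links, gene_list, min_targets=2):
    """Regulators with literature links to at least min_targets genes in gene_list.

    links: iterable of (regulator, target) pairs, e.g. mined from PubMed abstracts.
    """
    genes = set(gene_list)
    targets = defaultdict(set)
    for regulator, target in links:
        if target in genes:
            targets[regulator].add(target)
    return {reg: tgts for reg, tgts in targets.items() if len(tgts) >= min_targets}


def linked_genes(regulators):
    """Genes in the list connected through at least one common regulator."""
    return set().union(*regulators.values())


# Hypothetical links and gene list (placeholder names).
links = [("reg_X", "g1"), ("reg_X", "g2"), ("reg_X", "g3"), ("reg_Y", "g4"), ("reg_Y", "g9")]
early_genes = ["g1", "g2", "g3", "g4", "g5"]
regs = common_regulators(links, early_genes)
print(regs, linked_genes(regs))
```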

Specific regulatory network: NFκB known co-regulated genes in E group

- 30 other “co-expressed” genes---potential novel NFκB targets?

- How to test this?


Evidence for TF Regulatory Network

Can we provide additional evidence that NFκB is regulating these gene sets in the pig?

Initial step: use human promoter sequence as a surrogate to look for regulatory sequences known to mediate NFκB activity at target genes

DNA Motifs at DE Genes: Evidence for TF Regulatory Network

1. Data after MLN expression analysis
2. Group DE genes by expression similarity (clustering or other criteria)
3. Find human orthologous promoter (-1500 to +500 bp); orthology based on BLAST of human RefSeq; Perl scripts extract sequences from GenBank
4. Motif-finding software (TFM-Explorer) identified shared “windows” across the group that contain NFκB motifs
5. Result: promoter regions with over-represented NFκB motifs
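TFM-Explorer searches for windows with over-represented motifs. The much simpler Python sketch below only scans a promoter for exact matches to a κB consensus on both strands, reporting positions relative to the transcription start site. The consensus GGGRNWYYCC and the assumption that the extracted sequence starts at -1500 are ours. This is not TFM-Explorer's method.

```python
import re

# kappaB consensus GGGRNWYYCC (R=A/G, N=any, W=A/T, Y=C/T); an assumption here.
KAPPA_B = re.compile(r"(?=(GGG[AG][ACGT][AT][CT][CT]CC))")  # lookahead allows overlaps
SITE_LEN = 10
UPSTREAM = 1500  # promoter window assumed to run from -1500 to +500 relative to the TSS


def revcomp(seq):
    """Reverse complement of a DNA string (non-ACGT characters left unchanged)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def kappa_b_sites(promoter):
    """Return sorted (position relative to TSS, strand, site) for consensus matches."""
    seq = promoter.upper()
    n = len(seq)
    hits = [(m.start() - UPSTREAM, "+", m.group(1)) for m in KAPPA_B.finditer(seq)]
    for m in KAPPA_B.finditer(revcomp(seq)):
        start = n - (m.start() + SITE_LEN)  # map back to forward-strand coordinates
        hits.append((start - UPSTREAM, "-", m.group(1)))
    return sorted(hits)


# Synthetic 2,000-bp promoters (hypothetical), each carrying one site.
plus_strand = "A" * 1000 + "GGGAATTTCC" + "A" * 990
minus_strand = "A" * 1500 + "GGAAATTCCC" + "A" * 490
print(kappa_b_sites(plus_strand), kappa_b_sites(minus_strand))
```

Finding a site this way is only a first filter. The slides' approach asks whether motif-containing windows are shared and over-represented across a co-expressed group.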

Evidence for Novel NFκB Regulatory Targets

Figure: putative NFκB target genes found by motif analysis of DE genes. The figure shows the number of genes with significant windows containing over-represented NFκB motifs in the Early, Late and All groups. Genes are split into known NFκB targets found / not found and unknown NFκB targets found / not found.

Putative unrecognized NFκB target genes?


Example: Is UBD a putative NFκB target gene? Are there NFκB motifs at UBD?

Figure: UBD promoters (-1500 to +500 bp) for human, mouse and pig, with an NFκB motif marked in each species (figure labels: -1150/414, -1061/341, -1180/316).

Testing by in vitro binding

EMSA using the porcine UBD motif and mouse macrophage cell nuclear extract
Lanes: LPS (-, +, +, +); competitor (-, -, SP, NSP)
Band: NFκB-DNA complex

All four porcine promoter motifs tested by EMSA were bound by nuclear proteins

Collaborators

Tuggle Lab
♦ Dr. Shu-hong Zhao
♦ Dr. Yan-fang Wang
♦ Oliver Couture
♦ Sarah Orley
♦ Sender Lkhagvadorj

Microarray development
NRSP8 Swine Genome User Committee: Chris Tuggle (Co-Chair), Daniel Pomp (Co-Chair), Max Rothschild (Coordinator), Jon Beever, Cathy Ernst, Diane Moody, Mike Murtaugh
Qiagen-Operon: Sajeev Batra
University of Minnesota: Vivek Kapur, Archana Deshpande

Dan Nettleton
♦ Justin Recknor
♦ Long Qu

Univ. Iowa Bioinformatics
♦ Dr. Tom Casavant
♦ Dr. Todd Scheetz
♦ Bart Brown

USDA-ARS-Beltsville
♦ Dr. Joan Lunney
♦ Dr. Harry Dawson
♦ Dr. Daniel Kuhar