HOW TO AVOID MISINTERPRETING MICROARRAY DATA Sungchul Ji, Ph.D. Department of Pharmacology and Toxicology Rutgers University Piscataway, N.J. 08855 sji@rci.rutgers.edu (DIMACS Workshop on Machine Learning Techniques in Bioinformatics, Center for Discrete Mathematics and Theoretical Computer Science, Rutgers University, Piscataway, July 11-12, 2006)


Page 1

HOW TO AVOID MISINTERPRETING MICROARRAY DATA

Sungchul Ji, Ph.D.
Department of Pharmacology and Toxicology
Rutgers University
Piscataway, N.J. 08855
sji@rci.rutgers.edu

(DIMACS Workshop on Machine Learning Techniques in Bioinformatics, Center for Discrete Mathematics and Theoretical Computer Science, Rutgers University, Piscataway, July 11-12, 2006)

Page 2

The DNA microarray technology [1] inaugurated a new era in cell biology in the mid-1990s. Integrating the measurement, analysis, and interpretation of genome-wide expression data is essential for applying this revolutionary technology successfully to cell biology. Of these three aspects, interpretation is the least developed, as evidenced by the misinterpretations of DNA microarray data (secondary to conflating transcription rates with transcript levels) found in numerous publications.

[Figure: DNA Microarray Technology, with its three aspects: Measurement, Analysis, Interpretation.]

Page 3

• A theoretical model of the living cell is deemed essential for interpreting DNA microarray data.

• The Bhopalator model of the cell, proposed in 1985, may provide a useful starting point [2].

• Because mRNA is broken down into nucleotides (step 2) as rapidly as it is synthesized (step 1), its concentration at any time is determined by the rates of both transcription (step 1) and degradation (step 2).

• It has been common practice in the microarray field to assume that mRNA levels are determined mainly by the transcription rate, but this assumption remains to be substantiated. On the contrary, simultaneous measurements of transcript levels (TL) and transcription rates (TR) in budding yeast subjected to a glucose-galactose shift [3] indicate that TL is controlled by the dual actions of transcription and transcript degradation, or occasionally by degradation alone [3a].

• Another error commonly committed in the field is to conflate “gene expression”, which is a rate process (i.e., a concentration change per unit time), with “mRNA levels”, which are just concentrations.

Figure 1. The molecular model of the cell known as the Bhopalator [2]. (Diagram labels: Input, Output, Genes, RNA, Proteins, Gradients, Amino Acids, Ribonucleotides; steps numbered 1 to 10.)

Page 4

How To Avoid Misinterpreting DNA Microarray Data:

Do Not Say Gene Expression.

Say mRNA Levels.

• ‘Gene expression’ interpreted as transcription is not the same as mRNA levels: the former is a process and the latter a concentration.

• The gene expression rate, $dn_S/dt$, and the mRNA level, $n$, are mathematically related as follows:

$dn/dt = \alpha\, dn_S/dt - \beta\, dn_D/dt$,

where $\alpha$ and $\beta$ are constants, and $dn_D/dt$ is the rate of mRNA degradation. By a suitable integration, we obtain the following equation:

$\Delta n = \alpha \int dn_S - \beta \int dn_D$,

where $\int$ denotes integration over a given time period and $\int dn_S$ can be calculated as the AUC (Area Under the Curve) of the function $dn_S/dt = f(t)$.

• Mathematically speaking, $\Delta n$ is a functional of $dn_S/dt$. That is, mRNA levels are functionals, not functions, of the rates of gene expression.
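The integration described on this slide can be sketched numerically. The following is a minimal illustration, not the author's procedure: it assumes the synthesis and degradation rates are sampled at discrete time points and approximates each AUC with the trapezoidal rule. The function names and the sample data are hypothetical.

```python
from typing import Sequence


def trapezoid_auc(times: Sequence[float], rates: Sequence[float]) -> float:
    """Area under a sampled rate curve, approximated by the trapezoidal rule."""
    if len(times) != len(rates) or len(times) < 2:
        raise ValueError("need at least two matching (time, rate) samples")
    area = 0.0
    for i in range(len(times) - 1):
        dt = times[i + 1] - times[i]
        area += 0.5 * (rates[i] + rates[i + 1]) * dt
    return area


def delta_n(times, synthesis_rates, degradation_rates, alpha=1.0, beta=1.0):
    """Change in mRNA level: alpha * AUC(synthesis) - beta * AUC(degradation)."""
    n_s = trapezoid_auc(times, synthesis_rates)
    n_d = trapezoid_auc(times, degradation_rates)
    return alpha * n_s - beta * n_d


# Hypothetical rates in molecules/cell/min, sampled at 0, 10 and 20 minutes.
times = [0.0, 10.0, 20.0]
steady = delta_n(times, [2.0, 2.0, 2.0], [1.0, 1.0, 1.0])
pulsed = delta_n(times, [2.0, 6.0, 2.0], [1.0, 1.0, 1.0])
print(steady, pulsed)
```

The two synthesis curves start and end at the same rate but differ in between, so they give different values of Δn. This is the sense in which Δn depends on the whole history of the rate (a functional) rather than on its value at one instant.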

Page 5

The Six Modules of RNA Metabolism Observed in Budding Yeast Undergoing Glucose-Galactose Shift

• The mechanisms of interaction between transcription and transcript degradation induced by the glucose-galactose shift in budding yeast have been studied using the dual measurements of TL (transcript levels) and TR (transcription rates) made by Perez-Ortin and his coworkers [3, 3a].

• Δni denotes the change in the number of the ith mRNA molecules per cell during a given time period, and nS,i and nD,i denote the numbers of the ith mRNA molecules per cell synthesized and degraded, respectively, between two time points. The subscript i is omitted below for convenience.

• Six, and only six, modules of mRNA level control were observed in yeast during the glucose-galactose shift, labeled A, B, C, D, E and F in the following table. Each module is characterized by a distinct value or range of the degradation-to-transcription ratio, nD/nS. This ratio was calculated from the experimentally measured values of Δn and nS in [3, 3a], using the equation relating the change in transcript abundance (Δn) to transcript synthesis (nS) and transcript degradation (nD):

Δn = nS – nD (1)

• The above equation can be visualized as a plane (hyperplane) in the 3-dimensional space of Δn, nS and nD, as shown in the figure next to the table in the following slide.

Page 6

Δn | Relative sizes of nS and nD | nD/nS | Modules of mRNA Metabolism
+ | nS > nD > 0 | < 1 (ascending state with dual control) | A
+ | nS > nD = 0 | = 0 (ascending state with transcriptional control) | B
0 | nS = nD > 0 | = 1 (steady state) | C
0 | nS = nD = 0 | mathematically undefinable (equilibrium state) | D
- | 0 < nS < nD | > 1 (descending state with dual control) | E
- | 0 = nS < nD | infinity (descending state with degradational control) | F
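The six rows of the table translate directly into a small decision procedure. The sketch below is only an illustration of that mapping. The function names are hypothetical, and exact equality tests are used for simplicity, whereas real measurements would need a tolerance.

```python
def classify_module(n_s: float, n_d: float) -> str:
    """Return the module label (A to F) for the numbers of mRNA molecules
    synthesized (n_s) and degraded (n_d) between two time points."""
    if n_s < 0 or n_d < 0:
        raise ValueError("n_s and n_d are counts and cannot be negative")
    if n_s == 0 and n_d == 0:
        return "D"  # equilibrium state, ratio undefinable
    if n_d == 0:
        return "B"  # ascending, transcriptional control only
    if n_s == 0:
        return "F"  # descending, degradational control only
    if n_s > n_d:
        return "A"  # ascending, dual control
    if n_s == n_d:
        return "C"  # steady state
    return "E"      # descending, dual control


def degradation_to_transcription_ratio(n_s: float, n_d: float) -> float:
    """n_d / n_s, with infinity for module F and NaN for module D."""
    if n_s == 0:
        return float("nan") if n_d == 0 else float("inf")
    return n_d / n_s


for n_s, n_d in [(10, 4), (10, 0), (5, 5), (0, 0), (4, 10), (0, 3)]:
    print(n_s, n_d, classify_module(n_s, n_d),
          degradation_to_transcription_ratio(n_s, n_d))
```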

Page 7

The Cell-Brain-Computer Relation

• Ontologically, cells gave rise to the human brain (step 1), which in turn gave rise to computers (step 2).

• Epistemologically, the well-known properties of computers will facilitate our understanding of how the brain (step 3) and the cell (step 5) function. Knowing how the brain works can also help us understand how the cell works (step 4), as exemplified by the recent proposal that cells use a language whose principles share commonalities with those of human language [4].

• As an example of computer science helping biologists understand the workings of the cell (step 5), one may cite the application of SVM (support vector machine) approaches [5] to the analysis of DNA microarray data.

Page 8

The Cell as the Smallest DNA-Based Molecular Computer (S. Ji, BioSystems 52, 123-133, 1999)

[Figure: Cells, Brains and Computers connected by arrows numbered 1 to 5, with DNA labeled on the diagram.]

Ontogeny = 1, 2

Epistemology = 3, 4, and 5

Page 9

The Budding Yeast as the Hydrogen Atom of Cell Biology: The DNA microarray technique may play the role in cell biology that the atomic spectroscopic technique played in physics, where it helped unravel the structure of the hydrogen atom [6].

Page 10

The Cytoskeleton (Mouse Embryonic 3T3 cell)

Page 11
Page 12
Page 13

The Complementary (+/-) Relations among the Various DNA and RNA Molecules Involved in Microarray Experiments

(+) DNA --RP--> (-) mRNA --RT--> (+) DNA --H--> (-) DNA

RP = RNA polymerase
RT = Reverse transcriptase
H = Hybridizes to; no enzymes needed

(-) DNA, made with DNA polymerase or a DNA synthesizer, is used to fabricate DNA microarrays.

Page 14

DNA Microarrays [1]

• There are two kinds of DNA microarrays: cDNA (or EST) microarrays and Gene Chips.

• One microarray can measure 10^4 mRNA levels simultaneously.

• mRNA levels in the cell are determined by both mRNA synthesis (Vsyn) and mRNA hydrolysis or degradation (Vhyd), because the rate of change of the mRNA level (R) inside the cell is always:

dR/dt = Vsyn - Vhyd (2)

• Only when certain kinetic conditions are met (discussed below) can mRNA levels measured with DNA microarrays be interpreted as reflecting rates of gene expression [1, 3a].

• Each square on a microarray recognizes one kind of mRNA molecule.

Page 15

How DNA Microarray Experiments are Done [1]

1. Isolate mRNA from broken cells.

2. Synthesize fluorescently labeled cDNA from the mRNA using reverse transcriptase and fluorescently labeled nucleotides.

3. Prepare a microarray with either ESTs (Expressed Sequence Tags) or oligonucleotides (synthesized directly on the microarray surface; see Affymetrix, Inc.).

4. Pour the fluorescently labeled cDNA preparation over the microarray surface to effect hybridization. Wash off excess debris.

5. Measure the fluorescently labeled cDNA hybridized to the microarray using a computer-assisted microscope.

6. The final result is a table of numbers. Each number registers the fluorescence intensity, which is in turn proportional to the concentration of cDNA (and hence ultimately of mRNA), at row x and column y, where the row identifies the gene and the column identifies the condition under which the mRNA level was measured.

[Figure: schematic of the experiment, with steps labeled 1 to 6.]

Page 16

Covalent and Noncovalent Interactions in Microarray Experiments

[Figure: the sequence CTAATGT (original DNA) followed through steps 1 to 3.]

1) Transcription inside the cell

2) Reverse transcription inside the test tube

3) Hybridization on the microarray surface

4) Probably millions of cDNA molecules are attached to each square of a DNA microarray.

5) To the extent that mRNA is stable, the amount of mRNA formed in Step 1 can be estimated from the amount of cDNA bound to the microarray surface in Step 3.

6) But mRNA molecules inside the cell are unstable, because they are rapidly hydrolyzed into ribonucleotides by various ribonucleases. Therefore, it is impossible to estimate how many mRNA molecules are formed in Step 1 by measuring only how many cDNA molecules are bound to the microarray surface in Step 3 (more on this later).

Page 17

Cluster Analysis

The changes in the mRNA levels of human fibroblasts (connective-tissue cells that synthesize and secrete fibrillar procollagen, fibronectin, and collagenase), measured with DNA microarrays over a period of 24 hours.

Green represents a decrease in mRNA level, black no change, and red an increase.

Each kind of mRNA molecule is represented by a single row of colored boxes, and each measurement time point by a single column.

Notice that the mRNA molecules belonging to cluster A started to decrease around 8 hours after the beginning of the experiment.

The mRNA molecules belonging to cluster E began to increase around 5 hours after the beginning of the experiment.

In statements like those above, the phrase “mRNA levels” is almost always replaced by “gene expression” (a phenomenon that may be referred to as the “gene bias”). Strictly speaking, this substitution is logically fallacious and can lead to false positive and false negative conclusions about the identities of the genes responsible for mRNA level changes.

Page 18

The Duality of Transcription and Transcript Degradation. The Principle of Dual Control of mRNA Levels by Transcription and Transcript Degradation.

The decreases in mRNA levels measured with DNA microarrays cannot be accounted for without invoking the mRNA degradation step. If there were no transcript degradation step, the mRNA levels inside the cell could only increase or remain constant, but never decrease.

Page 19

Simultaneous Measurements of Genome-Wide Transcript Levels (TL) and Transcription Rates (TR) from the Yeast [3] (I)

• Most DNA array measurements reported in the literature since the beginning of the DNA array era [1] have involved only mRNA levels. The exceptions are Fan et al. [7] and Garcia-Martinez et al. [3], who measured both transcript levels (TL) and transcription rates (TR), the latter using nuclear run-on methods.

• Because TL is determined by a dynamic balance between transcription and transcript degradation, TL is a function of both TR and the transcript degradation rate, TD:

TL = f(TR, TD) (3)

Eq. (3) has three variables. Hence none of them can be determined without the numerical values of the other two.

• Most workers in the field have in effect been trying to solve Eq. (3) for TR by inputting just one of the two remaining values, namely TL, while ignoring TD. This is mathematically impossible and logically indefensible. Ignoring TD in an attempt to determine TR from TL measurements alone is tantamount to violating the Principle of Insufficient Reason, according to which if there is no sufficient reason for something's nonbeing, then it will exist.

• The significance of the TL and TR data obtained by Fan et al. [7] and Garcia-Martinez et al. [3] is that they allowed TD in Eq. (3) to be determined genome-wide for the first time.

Page 20

Simultaneous Measurements of Genome-Wide Transcript Levels (TL) and Transcription Rates (TR) from the Yeast [3] (II)

• Garcia-Martinez et al. [3] measured TL and TR in budding yeast at six time points: 0, 5, 120, 360, 450 and 850 minutes after replacing glucose with galactose.

• Typical plots of TR vs. TL revealed nonlinear trajectories, as is evident in the next slide. Each trajectory is divided into 5 directed segments (hence called TR-TL vectors). These vectors seem to assume all possible directions.

• A total of 5,725 genes of budding yeast were analyzed, and the directions (measured as shown in the next slide) of their component vectors (5 per trajectory, 28,625 vectors in total) were calculated from their coordinates in the TR vs. TL plane. The directions of these vectors were grouped into 9 categories based on their measured angles (in degrees) as follows: 1 = -3 to +3; 2 = 3 to 87; 3 = 87 to 93; 4 = 93 to 177; 5 = 177 to 183; 6 = 183 to 267; 7 = 267 to 273; 8 = 273 to 357; and 9 = 0 or undefinable. (I want to thank Dr. WonSsk Yoo for carrying out these calculations.)

• The measured percentages of the vectors belonging to each category are as follows, with the expected percentages (those expected if all directions were equally likely) given in parentheses: 1 = 2.94% (1.67); 2 = 26.07% (23.33); 3 = 1.91% (1.67); 4 = 29.73% (23.33); 5 = 1.80% (1.67); 6 = 24.91% (23.33); 7 = 2.38% (1.67); 8 = 10.26% (23.33); 9 = (not determined). These values are represented graphically as a histogram in the next slide.

• Three conclusions can be drawn from these measurements:

(i) The TR-TL vectors are distributed non-randomly over 7 of the 8 categories of directions.

(ii) TL can increase even when TR decreases or undergoes no change.

(iii) TL can decrease even when TR increases.

• Therefore, TL and TR can vary independently of each other.

• Before these measurements were made, most workers assumed that TL and TR were related linearly.
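The binning of vector directions into the nine categories can be sketched as follows. This is an illustration under stated assumptions, not the calculation actually performed for the slide: the angle is assumed to be measured counterclockwise from the positive TL axis (TL horizontal, TR vertical), each interval is treated as half-open, and the function names are hypothetical. The expected percentages are simply each category's angular width divided by 360 degrees.

```python
import math

# (low, high, category) in degrees, from the slide. Treating each interval
# as half-open [low, high) is an assumption made for this sketch.
BOUNDS = [
    (3, 87, 2), (87, 93, 3), (93, 177, 4), (177, 183, 5),
    (183, 267, 6), (267, 273, 7), (273, 357, 8),
]


def direction_category(d_tl: float, d_tr: float) -> int:
    """Category (1 to 9) of a vector with components (d_tl, d_tr).

    Assumes TL is the horizontal axis and TR the vertical axis.
    """
    if d_tl == 0 and d_tr == 0:
        return 9  # zero-length vector: direction undefinable
    angle = math.degrees(math.atan2(d_tr, d_tl)) % 360.0
    for low, high, category in BOUNDS:
        if low <= angle < high:
            return category
    return 1  # remaining range: 357 to 360 and 0 to 3 degrees


def expected_percentages() -> dict:
    """Percentage of vectors per category if all directions were equally likely."""
    widths = {1: 6.0}
    for low, high, category in BOUNDS:
        widths[category] = float(high - low)
    return {cat: 100.0 * width / 360.0 for cat, width in widths.items()}


print(direction_category(-1.0, 1.0))  # TL falls while TR rises
print({cat: round(p, 2) for cat, p in sorted(expected_percentages().items())})
```

Under these assumptions, category 4 collects the vectors in which TL decreases while TR increases, and category 8 those in which TL increases while TR decreases.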

Page 21

[Figure: TR vs. TL trajectories for four yeast genes, (a) YBL091C-A, (b) YNL162W, (c) YLR084C and (d) YHR029C, with points labeled 1 and 6; axes: TL and TR. A further panel with axes TL and TR is labeled with the category numbers 1 to 9.]

Page 22

Kinetic Equations Needed for Analyzing mRNA Metabolism

• TL = Transcript Level, arbitrary unit

• TR = Transcription Rate, arbitrary unit/min

• $n$ = mRNA molecules per cell $= f(TL) = a\,TL + b$ (4), where $a$ and $b$ are constants determined empirically.

• $dn_S/dt$ = rate of mRNA synthesis, molecules/cell/min $= g(TR) = a'\,TR + b'$, where $a'$ and $b'$ are constants determined empirically.

• $dn_D/dt$ = rate of mRNA degradation, molecules/cell/min

• $dn/dt = \alpha\, dn_S/dt - \beta\, dn_D/dt$, where $\alpha$ and $\beta$ are constants

• $\Delta n = \alpha \int dn_S - \beta \int dn_D$ (5)

• $\Delta n = \alpha n_S - \beta n_D + \varepsilon$ (6), where $\varepsilon$ is a constant and $n_S$ and $n_D$ are the numbers of mRNA molecules synthesized and degraded, respectively, between two time points. If it is assumed that $\alpha$ and $\beta$ are unity and $\varepsilon$ is zero, the above equation reduces to:

$\Delta n = n_S - n_D$ (7)

which is visualized as the mRNA hyperplane in the following slide.

(I want to thank Drs. R. Miura, N. Fefferman & W. Chaovalitwongse for helpful suggestions in formulating these expressions.)
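As a rough illustration of how these expressions fit together, the sketch below converts TL and TR readings into molecule counts and synthesis rates with the linear calibrations of Eq. (4) and its TR counterpart, integrates the synthesis rate over each interval, and then solves Eq. (7) for the number of molecules degraded. The calibration constants, the trapezoidal integration and the sample readings are all assumptions made for the example, not values from the study.

```python
def segment_balance(times, tl, tr, a=1.0, b=0.0, a_prime=1.0, b_prime=0.0):
    """For each pair of adjacent time points return (delta_n, n_s, n_d).

    n   = a * TL + b              (Eq. 4)
    v_s = a_prime * TR + b_prime  (rate of synthesis, dn_S/dt)
    n_s = trapezoidal integral of v_s over the interval (an assumption)
    n_d = n_s - delta_n           (Eq. 7 rearranged)
    """
    n = [a * x + b for x in tl]
    v_s = [a_prime * x + b_prime for x in tr]
    segments = []
    for i in range(len(times) - 1):
        dt = times[i + 1] - times[i]
        n_s = 0.5 * (v_s[i] + v_s[i + 1]) * dt
        delta_n = n[i + 1] - n[i]
        segments.append((delta_n, n_s, n_s - delta_n))
    return segments


# Hypothetical readings at 0, 5 and 120 minutes (calibration constants left
# at their defaults, i.e. TL and TR are taken to be in molecular units already).
for delta_n, n_s, n_d in segment_balance([0, 5, 120], [10, 7, 4], [1.0, 0.5, 0.5]):
    print(delta_n, n_s, n_d)
```

With noisy measurements the computed n_d can come out negative, which would signal a calibration or measurement problem rather than a real quantity.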

Page 23

The RNA Hyperplane: A Geometric Representation of the Six Modules of mRNA Metabolism Regulating mRNA Levels in Budding Yeast. The symbols are defined in the table in a previous slide. Please note that Δn can be positive, negative or zero, but nS and nD are never negative. (The delta signs in front of nS and nD seem unnecessary.)

Page 24

Kinetics of Genome-Wide mRNA Level Changes Induced by Glucose-Galactose Shift in Budding Yeast

• The genome-wide average TL values are plotted against time in Slide #25.

• During the first 5 minutes after replacing glucose with galactose, the average TL value drops by about 30%, decreasing maximally by 60% during the next two hours.

• The TL level begins to rise at about 360 minutes, reaching a maximal value of 70% by 450 minutes. The level then decreases again to 55% by 850 minutes.

• The initial decline probably results from the decrease in the ATP level in the cell due to inhibition of glycolysis, the main metabolic pathway for generating ATP in the presence of glucose (see Slide #28).

• The abrupt rise in TL beginning at 360 minutes is most likely due to the induction of the enzymes needed for metabolizing galactose, in part forming glucose (see the left-hand side of Slide #28). One piece of evidence for this conjecture is the induction of the mRNA molecules (GAL1, 2, 3, 7 & 10) coding for the proteins required for galactose metabolism beginning at 120 minutes (see Slide #27) and their suppression at around 360 minutes, probably due to the presence of glucose newly synthesized from galactose (see hi Glu in Slide #29).

Page 25

[Figure: Genome-Wide Average mRNA. x-axis: time, min; y-axis: mRNA, arbitrary unit.]

Page 26

[Figure: Genome-Wide Average Transcription Rate. x-axis: time, min; y-axis: mRNA/cell/min.]

Page 27

[Figure: GAL1, 2, 3, 7 & 10. x-axis: Time, min; y-axis: Average TR (arbitrary unit).]

Page 28
Page 29
Page 30

Kinetics of the Glycolytic and Respiratory (or Oxidative Phosphorylation) mRNA Metabolism in Glucose-Derepressed Budding Yeast

• Between 5 and 360 minutes after replacing glucose with galactose, the average mRNA levels of glycolytic and respiratory genes change in opposite directions (see Slide #31a).

• Strikingly, the average TR values for these two groups of genes change in parallel, as shown in Slide #31b.

• Therefore, the opposite changes in the TL values of glycolytic and respiratory genes must be attributed to opposite changes in their rates of transcript degradation (TD).

• The average degradation-to-transcription (D/T) ratios for glycolytic and respiratory genes at the 5 time points (corresponding to the mid-points of the 5 time segments, namely, 0-5, 5-120, 120-360, 360-450 and 450-850 minutes) were calculated using nD/nS = 1 – Δn/nS, derived from Equation (1) in Slide #5. These ratios are plotted in Slide #31c.

• Based on the D/T ratios, we can assign the following sets of labels to the glycolytic and respiratory TL trajectories:

Glycolysis = ECCAC

Respiration = EAAAC
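The ratio formula used for these labels follows from Equation (1) in one step, provided nS is greater than zero. The numerical values in the last line are hypothetical and serve only to show how a ratio maps to a module label.

```latex
\begin{align*}
\Delta n &= n_S - n_D && \text{(Equation 1)} \\
n_D &= n_S - \Delta n \\
\frac{n_D}{n_S} &= 1 - \frac{\Delta n}{n_S} && (n_S > 0) \\[4pt]
\text{e.g. } \Delta n = -3,\ n_S = 3.75 &\;\Rightarrow\; \frac{n_D}{n_S} = 1 + \frac{3}{3.75} = 1.8 > 1 && \text{(module E)}
\end{align*}
```

A ratio above 1 corresponds to a falling mRNA level (module E), a ratio of exactly 1 to a steady state (module C), and a ratio between 0 and 1 to a rising level (module A).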

Page 31

[Figure, panel a: Average mRNA Levels (glycolysis and oxphos). x-axis: Time, min; y-axis: mRNA, molecules/cell.]

[Figure, panel b: Time - v_S plots (glycolysis and oxphos). x-axis: Time, min; y-axis: v_S, molecules/cell/min.]

[Figure, panel c: Degradation/Transcription (D/T) Ratios vs Time (glycolysis and oxphos). x-axis: Time, min; y-axis: D/T ratios.]

Page 32

How to Interpret DNA Microarray Data (I)

What we measure with DNA microarrays are changes in fluorescence intensities.

The changes in fluorescence intensities can be divided into two categories: artifactual and non-artifactual. The present state of development of the microarray technique is such that artifactual fluorescence intensity changes probably account for about 50%. This is why it is common practice to use the notion of “fold changes”, referring to fluorescence intensity changes that are greater than 100% (or a one-fold change).

Only the non-artifactual fluorescence intensities can be related to mRNA levels.

mRNA levels measured with DNA microarrays can be divided into two categories: steady state and non-steady state. The difference between these two categories can be represented mathematically as follows, where R is an mRNA level and t is time:

Steady state: $dR/dt = 0$ (8)
Non-steady state: $dR/dt \neq 0$

The steady-state mRNA levels divide into two categories: dynamic and equilibrium. The intracellular levels of mRNA molecules are always determined by two terms: the source term (i.e., the rate of mRNA synthesis, denoted by $dR_S/dt$) and the sink term (i.e., the rate of mRNA hydrolysis into smaller fragments, denoted by $dR_D/dt$):

$dR/dt = dR_S/dt - dR_D/dt$ (9)

There are two ways of making Eq. (9) equal to zero: when $dR_S/dt$ and $dR_D/dt$ are equal and non-zero, and when $dR_S/dt$ and $dR_D/dt$ are both equal to zero:

Dynamic steady state: $dR_S/dt = dR_D/dt$ (10)
Equilibrium steady state: $dR_S/dt = dR_D/dt = 0$ (11)

Page 33

How to Interpret DNA Microarray Data (II)

The non-steady-state mRNA levels divide into two categories:

On-the-way-up, or ascending: $dR/dt > 0$ (12)
On-the-way-down, or descending: $dR/dt < 0$ (13)

It is probably safe to assume that $dR_S/dt$ is always independent of R (i.e., gene expression is turned on or off by factors other than the intracellular levels of the corresponding mRNA). But $dR_D/dt$ may often (if not always) depend on R, leading to the conclusion that there are at least two categories of dynamic steady states:

Zero-order dynamic steady state: $dR_D/dt = kR^0 = k$ (14)
First-order dynamic steady state: $dR_D/dt = kR$ (15)

These results can be summarized as follows:

Combining Equations (10) and (15) leads to the following useful relation:

$dR_S/dt = dR_D/dt = kR$ (16)

Equation (16) states that, under the conditions of a first-order dynamic steady state, the mRNA levels, R, measured with DNA microarrays are directly proportional to the rates of expression of their corresponding genes, $dR_S/dt$, because these rates are equal to the rates of transcript degradation, $dR_D/dt = kR$.

An important corollary of Equation (16) is that, under all other conditions, there is no direct proportionality between mRNA levels and the rates of expression of their corresponding genes.
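Equation (16) and its corollary can be checked with a small simulation. The sketch below is a hypothetical illustration: it assumes a constant synthesis rate and first-order degradation as in Eq. (15), and integrates Eq. (9) with a simple forward Euler step. The rate constant, step size and function name are arbitrary choices made for the example.

```python
def simulate_mrna(v_s: float, k: float, r0: float, dt: float, steps: int):
    """Forward-Euler integration of dR/dt = v_s - k * R.

    v_s: constant rate of synthesis (dR_S/dt), molecules/cell/min
    k:   first-order degradation constant, 1/min
    Returns the list of mRNA levels R at each step, starting with r0.
    """
    levels = [r0]
    r = r0
    for _ in range(steps):
        r += (v_s - k * r) * dt
        levels.append(r)
    return levels


K = 0.1  # assumed degradation constant, 1/min
low = simulate_mrna(v_s=2.0, k=K, r0=0.0, dt=0.1, steps=2000)
high = simulate_mrna(v_s=4.0, k=K, r0=0.0, dt=0.1, steps=2000)

# After 200 simulated minutes both levels have settled near v_s / k, so
# doubling the synthesis rate doubles the steady-state level.
print(low[-1], high[-1])

# One minute after synthesis is switched on, k * R is still far below v_s:
# reading the synthesis rate off the level would underestimate it badly.
print(K * low[10], 2.0)
```

Only the late, settled values satisfy the proportionality of Equation (16). The early values belong to the ascending non-steady state of Eq. (12), where the level says little about the rate of synthesis.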

Page 34

Dissipative and Equilibrium Networks in the Living Cell

I. Prigogine (1917-2003) distinguished between two fundamental classes of structures in nature: equilibrium and dissipative structures. The former can exist without any input of free energy, whereas the latter exist if and only if they are supported by a continuous dissipation of free energy. Similarly, it is proposed here that ‘dissipative networks’ in cells (e.g., some protein-protein interaction networks) disappear upon cessation of free energy input, while ‘equilibrium networks’ (e.g., Krebs cycle, glycolytic pathway, etc.) can persist without any dissipation of free energy. The protein network is unique in that it is the only network that can tap free energy from chemical reactions. Dissipative networks may also be referred to as the “Self-Organizing-Whenever-and-Wherever-Needed (SOWAWN) Machine”.

Page 35

References:

[1] Watson, S. J., and Akil, H. (1999). Gene Chips and Arrays Revealed: A Primer on Their Power and Their Uses. Biol. Psychiatry 45:533-543.

[2] Ji, S. (1985). The Bhopalator – A Molecular Model of the Living Cell Based on the Concepts of Conformons and Dissipative Structures. J. theoret. Biol. 116:399-426.

[3] Garcia-Martinez, J., Aranda, A., and Perez-Ortin, J. E. (2004). Genomic Run-On Evaluates Transcription Rates for All Yeast Genes and Identifies Gene Regulatory Mechanisms. Mol. Cell 15:303-313.

[3a] Ji, S., Chaovalitwongse, W., Fefferman, N., and Perez-Ortin, J. E. (2006). The Six Modules of Transcript Control Revealed by Genome-Wide Expression Data from Glucose-Derepressed Saccharomyces cerevisiae. (in preparation).

[4] Ji, S. (2004). Molecular Information Theory: Solving the Mysteries of DNA. In: Modeling in Molecular Biology (Ciobanu, G., and Rozenberg, G., eds.), Springer, Berlin. Pp. 141-150.

[5] Cristianini, N., and Shawe-Taylor, J. (2000). An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods. Cambridge University Press, Cambridge.

[6] Ji, S. (2005). Semiotics of Life: A Unified Theory of Molecular Machines, Cells, the Mind, Peircean Signs and the Universe Based on the Principle of Information and Energy Complementarity. Reports, Research Group on Mathematical Linguistics, Rovira i Virgili University, Tarragona, Spain. See the section entitled An Analogy between Atomic Physics and Cell Biology on pp. 58-61, available at http://www.grlmc.com, under Publications.

[7] Fan, J., Yang, X., Wang, W., Wood, W. H., Becker, K. G., and Gorospe, M. (2002). Global Analysis of Stress-Regulated mRNA Turnover by Using cDNA Arrays. Proc. Natl. Acad. Sci. USA 99(16):10611-10616.