
Statistical Applications in Genetics and Molecular Biology

Volume 9, Issue 1 2010 Article 29

Generalizing Moving Averages for Tiling Arrays Using Combined P-Value Statistics

Katerina J. Kechris∗ Brian Biehs†

Thomas B. Kornberg‡

∗University of Colorado Denver
†University of California, San Francisco
‡University of California, San Francisco

Copyright ©2010 The Berkeley Electronic Press. All rights reserved.


Abstract

High density tiling arrays are an effective strategy for genome-wide identification of transcription factor binding regions. Sliding window methods that calculate moving averages of log ratios or t-statistics have been useful for the analysis of tiling array data. Here, we present a method that generalizes the moving average approach to evaluate sliding windows of p-values by using combined p-value statistics. In particular, the combined p-value framework can be useful in situations when taking averages of the corresponding test statistic for the hypothesis may not be appropriate or when it is difficult to assess the significance of these averages. We exhibit the strengths of the combined p-value methods on Drosophila tiling array data and assess their ability to predict genomic regions enriched for transcription factor binding. The predictions are evaluated based on their proximity to target genes and their enrichment of known transcription factor binding sites. We also present an application for the generalization of the moving average based on integrating two different tiling array experiments.

KEYWORDS: transcription factor, binding sequence, tiling array, combined p-value

∗This work was supported in part by NIH K01 AA016922 (KK) and R01 GM077407 (TK). The support of facilities by NIH R24 AA013162 to Boris Tabakoff is gratefully acknowledged. We thank Tony Southall for providing the Prospero data set, Bernhard Spangl and Peter Ruckdeschel for their assistance with the “robust-ts” package and Gary Grunwald, Elizabeth Siewert and Nichole Carlson for helpful discussions.

1 Introduction

Genomic tiling arrays are an adaptation of microarray technology where probes on the array are not only representative of genes but are designed to span the entire genome at regular intervals (Royce et al., 2006). Applications include the mapping of transcribed sequences and DNA methylation sites (Yazaki et al., 2007). Tiling arrays can also be used to map regions containing sequences that are bound by transcription factors, proteins important for regulating transcription, the first stage of gene expression. Transcription factors interact with their binding sequences in the genome and can either activate or repress the transcription of target genes. The identification of these binding sequences is important for understanding gene expression regulation and tiling arrays offer a high-throughput approach for determining their locations in the genome.

Tiling arrays are often used to probe sequences obtained from chromatin immunoprecipitation (ChIP). In ChIP, DNA fragments binding to the transcription factor of interest are enriched using antibodies specific to that transcription factor. The use of ChIP followed by hybridization to an array is called ChIP-chip. An alternative to the initial ChIP step is the DNA adenine methyltransferase ID (DamID) method (van Steensel et al., 2001). DamID is based on a fusion protein of the transcription factor of interest to the Dam enzyme, which methylates adenines in the sequence GATC along the genome. When the fused transcription factor-Dam protein binds to the respective binding sequence for the transcription factor, the fused Dam enzyme will deposit methylation tags close to the binding sequence. The methylated sequences can then be isolated using methyl sensitive restriction enzymes. DamID is useful when an antibody for the protein of interest is not available to perform ChIP.

For either of the initial procedures, DamID or ChIP, the result of the tiling array experiment is a series of intensity measurements along the genome, typically evenly spaced (see example in Figure 1). Computational methods are available for analyzing this type of tiling array data to predict regions of transcription factor binding (see review by Liu, 2007). Many methods are based on sliding window averages of the intensity values across the genome (Buck et al., 2005, Keles et al., 2006).

The strengths of the sliding window approaches are that they are simple and fast. However, when initially introduced they did not account for dependencies among neighboring probes, which occur because genomic fragments bound by the transcription factor, obtained by DamID or ChIP, may span multiple probes. The average fragment length is 500-1000 base pairs (bp) for ChIP and 2000-5000 bp for DamID (Buck and Lieb, 2004, van Steensel et al., 2001), while probes may be 25-75 bp long separated by 100-300 bp depending on the platform. Using ChIP-chip data, Bourgon (2006) illustrated that probes more than 1000 bp apart are positively correlated if the average fragment length is 500 bp. An alternative method for tiling array analysis that takes first order dependencies into account is the use of hidden Markov models (Ji and Wong, 2005, Munch et al., 2006). However, computation may become costly if longer range dependencies are taken into account. More recently, incorporating correlations among probes into sliding window averages has been shown to improve performance in the work of Kuan et al. (2008) (see also Bourgon, 2006).

Kechris et al.: Combined P-Value Statistics for Tiling Arrays

Published by The Berkeley Electronic Press, 2010

In a typical analysis of tiling arrays, each probe is assessed for positive intensity (or log ratio intensity) to test whether intensity values significantly differ from zero at a location. However, with more complicated experimental designs, other hypotheses may be evaluated using different types of statistics, e.g., an F-statistic when there are multiple factors, or non-parametric test statistics. In these cases, a moving average of the statistic may not be appropriate. Furthermore, the sliding windows that are typically used may be too short to provide a large enough sample size for the average to be approximately normally distributed. Therefore, in this work, we develop a generalization of the sliding window average method that allows for the “averaging” of results from a hypothesis test at each probe. Our method is based on “averaging” of p-values instead of test statistics and thus expands the capabilities of current moving average approaches.

Once a p-value resulting from a hypothesis test of interest is determined at each probe, we present a sliding window “average” of the p-values based on the application of combined p-value statistics (Loughin, 2004) that also incorporates correlation adjustments (Bourgon, 2006, Kuan et al., 2008). We refer to our method as a “generalized” moving average and make evaluations on DamID tiling array data for transcription factors where there are target genes and binding site information to confirm predictions of transcription factor binding regions.

The paper is organized as follows. In Section 2, we describe the three benchmark data sets. In Section 3 we describe the Methods, including the method for generating p-values at each probe, the two combined p-value statistics that are applied for the generalized moving average across probes and the three metrics for evaluating predictions. In Section 4, we show two sets of results. The first set evaluates the performance of the generalized moving average compared with other methods, including what we define as the “standard” moving average approach for taking averages of signal intensities, log ratios or t-statistics. To compare the different methods in this application, the results are based on t-statistics or t-test p-values for the standard and generalized moving average methods respectively. The second set of results illustrates an application of the generalized moving average to an analysis question that may not be straightforward for the standard moving average methods. In Section 5, we end with a conclusion and discussion of the methods.


Statistical Applications in Genetics and Molecular Biology, Vol. 9 [2010], Iss. 1, Art. 29

http://www.bepress.com/sagmb/vol9/iss1/art29
DOI: 10.2202/1544-6115.1434

Figure 1: Example of Data. The chromosome arm and positions are indicated on top. The first row shows log ratio intensity values (y-axis) in a region of the chromosome for one replicate. The second row indicates the location of the hkb gene (y-axis is for illustrative purposes). Plots were produced using the SignalMap software (NimbleGen).




2 Data

2.1 Ci Data Sets

Two data sets are based on a study of DNA binding activity of Cubitus interruptus (Ci), the mediating transcription factor of the Hedgehog (Hh) signaling pathway in Drosophila melanogaster (fruit fly). Two distinct forms of the Ci protein have repressor and activator activities with respect to transcription. In cells which receive the Hh ligand, the full length form of Ci is converted into a transcriptional activator, directly mediating the up-regulation of known Hh targets. This process is required for a variety of developmental events, both in the embryo and larva. In cells that do not receive the Hh ligand, full-length Ci is processed into a repressor form. Since both forms of Ci retain the same DNA binding motif and recognize the same DNA sequence, it is thought that both forms bind the same DNA targets. However, the extent to which both Ci forms bind the same sequences is unknown.

To identify the genomic regions that interact with either the activator or repressor form of Ci, transgenic fly strains were constructed that express a constitutively active form of Ci fused to the DAM enzyme. The DamID technique results in the local methylation of DNA regions that are specific to the activator form of Ci. The methylated genomic fragments were isolated from fly embryos, PCR amplified, and labeled with a fluorescent dye. As a control comparison, methylated DNA from embryos expressing DAM alone was treated in the same manner. For Ci Activator (CiA) and Ci Repressor (CiR) separately, three independent replicates of labeled DamCi/Dam alone samples, which include one dye-swap Dam/DamCi to account for dye bias, were hybridized to Roche NimbleGen 385K Whole-Genome Tiling Arrays for Drosophila (UCSC DM2 version, FlyBase v4.0). The NimbleGen 385,000 feature arrays consist of 60mer oligos spaced approximately 300 bp apart spanning the whole genome. Hybridized arrays were scanned and fluorescent intensity ratios were calculated for the basis of this study (Biehs et al., submitted manuscript).

Note that the Ci transcription factor in general, regardless of form, will be referred to as Ci and that the collection of data for the two forms, CiA and CiR, will be referred to as the Ci data sets. The individual data from one or the other Ci form will be referred to by its abbreviation, CiA or CiR, respectively.




2.2 Prospero Data Set

A third data set is based on a study of DNA binding activity of Prospero (Pros), a transcription factor that helps mediate the choice between stem cell self-renewal or differentiation in Drosophila melanogaster (Choksi et al., 2006). The DamID technique was also used in this study to identify genomic regions bound by Pros and the tiling array has a similar design to the NimbleGen array used in the Ci study, with details provided in Choksi et al. (2006). The Pros data set was provided by the authors and consists of four replicates, two of which are dye-swaps.

3 Methods

3.1 Pre-processing

All analyses were performed in R 2.7.1 (R Development Core Team, 2005) and Bioconductor 2.2 (Gentleman et al., 2004). R code developed for the methods described in 3.3 is available upon request. The data were normalized, applying the “loess” option for within-array normalization and “Aquantile” option for between-array normalization based on the methods used in Wormald et al. (2006). Normalization results were also inspected visually. At each probe, accounting for the dye swap replicate(s), the log ratios between the target and control sample were calculated using the limma package (Smyth, 2005).

3.2 P-values across Replicates

This work presents an application of existing combined p-value methods for the analysis of tiling array data. The purpose of introducing these methods to tiling array analysis, where moving averages are commonly calculated to identify enrichment regions in the genome, is that they can be used to determine a “moving” average of p-values from any hypothesis test performed at each probe. The combined p-value methods applied and described below are general for any test performed at a probe. In Sections 4.1 and 4.2, the results are based on t-tests. The use of t-tests facilitates comparisons with other methods that can be applied using the t-statistic, the test statistic of the t-test. Then, in Section 4.3, we show results from a more general application.




3.2.1 T-test Application

For the results in Sections 4.1 and 4.2, we are interested in testing the null hypothesis that the mean log ratio intensity is equal to 0, $H_{0i}: \mu_i = 0$ for each probe i. Because only positive intensity signals are indicative of enriched binding sequences, we only consider the one-sided alternative $H_{Ai}: \mu_i > 0$ and use a one-sided one-sample t-test. However, tiling array experiments may have small sample sizes (e.g., n = 3 in the Ci data set) and estimates of the standard error in the t-statistic denominator can be unstable. Several modifications of the t-statistic have been suggested (Smyth, 2005, Tusher et al., 2001) that pool information from all genes to obtain more stable variance estimates. We applied an empirical Bayes procedure (Smyth, 2005), which is available in the limma package in R. First, a linear model for the replicates was fit for each probe i with the lmFit function and then moderated t-statistics $t_i$ were returned for each probe using the eBayes function.

Under the null hypothesis, assuming a linear model fit, independence between probes, specified prior distributions and approximate normality of the estimators, the moderated t-statistic follows a t-distribution with augmented degrees of freedom (Smyth, 2005). These assumptions are often violated in microarray and tiling array experiments, especially with small sample size. We found evidence of this as well, since the calculated moderated t-statistic values are more extreme than expected (see example in Figure S1). Therefore, the moderated t-statistics t were further normalized using $(t - t^*)/\mathrm{sd}(t_{neg})$, where $t^*$ is the mode of the distribution of t and $\mathrm{sd}(t_{neg})$ is the standard deviation of a symmetrical distribution based on the negative values, $t_{neg}$, of t. The mode and sd were both calculated after removing the most extreme 5% of probes. The mode is used because the distribution of t tends to be asymmetrical around zero and the standard deviation is calculated based on the negative values because they are considered to be background (Buck and Lieb, 2004). This procedure follows techniques used in other tiling array analysis methods (Buck and Lieb, 2004, Kuan et al., 2008) and the effect is illustrated in Figure S1, where now the t-statistics more closely follow the expected distribution, but there are still more extreme positive values indicative of binding regions. One-sided test p-values, $p_i$, were then determined by the moderated t-statistic and augmented degrees of freedom returned by the eBayes function.

3.2.2 New Application

In Section 4.3, we show an example where a t-statistic or corresponding p-value of a t-test is not applicable for the analysis. To predict binding enrichment regions for CiA-CiR, we can perform a Union-Intersection Test (Casella and Berger, 2002), $H_0: \cap_k \{\mu_i^k = 0\}$ and $H_A: \cup_k \{\mu_i^k > 0\}$ at each probe i (where k = 1,2). A p-value for this test is determined by the Stouffer-Liptak Test (without adjustments) for each probe (see Section 3.3 below). This test was selected over the Fisher’s Combined Probability Test, which puts relatively more emphasis on small p-values (Loughin, 2004) and may be problematic here since the enrichment patterns between the data sets vary so widely. Now at each probe, a p-value has been determined for the test of interest, and we can apply the generalized moving average methods described below in Section 3.3.

3.3 P-values Across Probes - Combined P-value Approach

Because transcription factor bound fragments are typically longer than array probes, it is of interest to find a genomic region spanning multiple probes with high signal intensity called an enrichment region (ER). Several authors have used a sliding window procedure to find ERs, where an average intensity value is calculated for a series of consecutive probes. We have adapted the sliding window procedure so that consecutive p-values across the region are combined rather than working with a test statistic such as the average intensity across the region. In particular, for each probe, a window size of w adjacent probes is considered on each side of the probe. A combined p-value is then based on the $l = 2w + 1$ p-values in the w-neighborhood for that probe. We discuss the selection of w below.

There are l tests in the w-neighborhood and the combined null hypothesis $H_0$ is that each of the $i = 1, \ldots, l$ independent null hypotheses $H_{0i}$ is true (Loughin, 2004). The combined alternative $H_A$ is that at least one of the null hypotheses $H_{0i}$ is false. A combined p-value test statistic $C_P^i$ is then calculated for each probe i based on the l p-values. This test statistic can be used to test $H_0$ vs $H_A$ (i.e., the mean intensity is equal to zero for all probes in the window versus at least one probe in the window has mean intensity greater than zero). Significant probes according to the combined p-values are predicted to be in ERs. Following a review of other applications of combined p-value methods, we detail several different combined p-value test statistics $C_P$ and variations of these test statistics that account for dependent hypotheses.

3.3.1 Use of Combined P-values

Combined p-value methods are commonly used for meta-analysis and have a long history (see reviews in Folks, 1984, Hedges and Olkin, 1985 and Loughin, 2004), with Fisher’s Combined Probability Test dating back to 1932. In genomics, combined p-values are often used to integrate data in a meta-analysis of different studies (e.g., in Rhodes et al., 2002). But there have also been applications of the combined p-value methods within a study. For example, Zaykin et al. (2002) combine




p-values for neighboring genomic markers in a genetic association study using the dependence correction described in 3.3.3 and a truncated p-value product method. In the area of microarray analysis, Affymetrix probe level p-values were combined to obtain probe set level p-values (Hess and Iyer, 2007). For tiling arrays, Ghosh et al. (2006) calculate Wilcoxon signed-rank test p-values across probes and then combine the p-values across multiple replicates. In this work, we perform tests using the replicates and then combine the p-values across probes. Below, we describe the combined p-value methods that are applied in this work.

3.3.2 Independent Tests

Under the null hypothesis, the p-value $p_i$ for a test statistic with a continuous null distribution is uniformly distributed in the interval [0,1]. A general framework for combining p-values based on this feature is given by the quantile combination methods described in Loughin (2004). In this framework, a parametric cumulative distribution function F is chosen and the p-values are transformed into quantiles according to $q_i = F^{-1}(p_i)$, for $i = 1, \ldots, l$. The combined test statistic $C_P = \sum_{i=1}^{l} q_i$ is a sum of independent and identically distributed random variables $q_i$, each of which follows the corresponding probability density function for F. This can be used to obtain the p-value for $C_P$ using the additivity property of independent and identically distributed random variables. Below, two common combined p-value test statistics are described that fall under this framework.

1. Fisher’s Combined Probability Test

In Fisher’s Combined Probability Test (Fisher, 1932), F is selected as the cumulative $\chi^2$ distribution with 2 degrees of freedom. Therefore, $F(x) = 1 - \exp(-x/2)$ and $q_i = F^{-1}(p_i) = -2\ln(1 - p_i)$ and each $q_i$ follows the probability density function for a $\chi^2$ distribution with 2 degrees of freedom. The combined p-value test statistic $C_P = -2\sum_{i=1}^{l} \ln(1 - p_i)$ follows a $\chi^2$ distribution with 2l degrees of freedom due to the additivity property of independent $\chi^2$. The combined p-value is $p^* = F^*(C_P)$, where $F^*$ is the cumulative distribution function for the $\chi^2$ distribution with 2l degrees of freedom. These results also follow directly from the fact that -2 times the logarithm of a uniformly distributed random variable is $\chi^2$ with 2 degrees of freedom. Alternatively, note that if $C_P = -2\sum_{i=1}^{l} \ln(p_i)$, then $p^* = 1 - F^*(C_P)$.

2. Stouffer-Liptak Test

In the Stouffer-Liptak test (Liptak, 1958, Stouffer et al., 1949), F is selected as the cumulative standard normal distribution N(0,1). Therefore, $F(x) = \Phi(x)$, where $\Phi$ is the cumulative distribution function of the standard normal, and $q_i = \Phi^{-1}(p_i)$. Each $q_i$ follows the probability density function of a standard normal and using the additivity property of independent random variables, the combined p-value test statistic $C_P = \sum_{i=1}^{l} q_i / \sqrt{l}$ follows a standard normal distribution. The combined p-value is $p^* = \Phi(C_P)$. Note that if $p_i$ is based on a t-test, then for large sample sizes when the distribution of the t-statistic is similar to the normal distribution, the Stouffer-Liptak test will be equivalent to the “standard” moving average approach where t-statistics are averaged.

3.3.3 Dependent Tests

Due to the dependencies in intensity values for neighboring probes, we considered variations of the two combined p-value test statistics that account for correlations among probes as introduced first in Kuan et al. (2008). Since the correlations are positive, if adjustments are not made, the test statistics tend to be inflated and therefore statistical significance is exaggerated.

1. Dependence Adjustment to Fisher’s Combined Probability Test

When the $p_i$’s are dependent, the distribution of Fisher’s Combined Probability test statistic $C_P$ can be approximated by a scaled $\chi^2$ distribution, $c\chi^2_f$, with scaling factor c and degrees of freedom f, such that $C_P/c \sim \chi^2_f$ (Brown, 1975). Equating $E[C_P]$ with $E[c\chi^2_f] = cf$ and $\mathrm{Var}(C_P)$ with $\mathrm{Var}(c\chi^2_f) = 2c^2 f$, we can solve for c and f and obtain

$$c = \frac{\mathrm{Var}(C_P)}{2E[C_P]}, \qquad f = \frac{2E[C_P]^2}{\mathrm{Var}(C_P)}. \tag{1}$$

These terms depend on the mean and variance for $C_P$ which are

$$E[C_P] = 2l \tag{2}$$

and

$$\mathrm{Var}(C_P) = 4l + 2\sum_{j<k} \mathrm{Cov}(-2\ln p_j, -2\ln p_k), \tag{3}$$

since $C_P \sim \chi^2_{2l}$.

Exact computation of the covariance terms in $\mathrm{Var}(C_P)$ requires numerical integration but Kost and McDermott (2002) provide approximations by fitting a polynomial regression to the true values using a grid approach ranging values for the degrees of freedom ($9 \le v \le 125$) and the correlations




($-0.98 \le \rho \le 0.98$). Based on their approximations, when the p-values $p_i$ are determined from t-statistics with v degrees of freedom and where the correlations $\rho_{jk}$ between probes j and k are known, then

$$\mathrm{Cov}(-2\ln p_j, -2\ln p_k) = 3.263\,\rho_{jk} + 0.710\,\rho_{jk}^2 + 0.027\,\rho_{jk}^3 + 0.727\,\frac{1}{v} + 0.327\,\frac{\rho_{jk}}{v} - 0.768\,\frac{\rho_{jk}^2}{v} - 0.331\,\frac{\rho_{jk}^3}{v}. \tag{4}$$

This approximation can be used to calculate $\mathrm{Var}(C_P)$, which is then used to calculate c and f. The combined p-value for $C_P$ is then determined by using the approximating distribution $C_P/c \sim \chi^2_f$. As stated above, the approximations are based on the assumption that the p-values are determined from t-statistics with v degrees of freedom. In Sections 4.1 and 4.2 we present results based on applying this correction to p-values from moderated t-statistics for the three data sets where the degrees of freedom v vary from 7.53 for CiR, 8.69 for CiA to 9.39 for Pros, which are slightly below or within the grid values for the approximations.

2. Dependence Adjustment to Stouffer-Liptak Test

This dependence adjustment is based on transforming the correlated quantiles $Q = \{q_i\}$ into independent quantiles $Q^*$ as in Zaykin et al. (2002), and then applying the Stouffer-Liptak test on $Q^*$. To perform this transformation, the $p_i$’s are assumed to be correlated according to a non-degenerate correlation matrix $\Sigma$. Assuming that $\Sigma$ is positive definite, the Cholesky factor C exists such that $\Sigma = CC'$. By applying the transformation $Q^* = C^{-1}Q$, the $q_i^*$’s are now independent and follow a standard normal distribution (Zaykin et al., 2002). The Stouffer-Liptak test can then be applied to the $q_i^*$’s. This method is used for Section 4.3 since this adjustment, unlike the adjustment for the Fisher’s Combined Probability test, does not depend on the assumption that the p-values are determined from t-statistics.

3.3.4 Auto Correlation

Both dependence adjustments rely on correlations between probes. The correlation of the moderated t-statistic between neighboring probes j and k is determined using the entire data. However, since regions of interest are contained within the entire chromosome, to estimate the correlation between probes, we use an estimation procedure robust to outliers in R available in the robust-ts package (Spangl et al., 2009), which is based on the work in Spangl (2008). The auto correlation is calculated on each chromosome arm using the “ACF” function in robust-ts with the Spearman’s rank option, for each lag x = 1, 2, .... Probes that neighbored gaps (defined as more than 700 base pairs between probes) were removed from the calculation. For probes k > j, the correlation is estimated as $\rho_{jk} = \mathrm{ACF}(k - j)$. We use window sizes of 3-8 which would correspond to a maximum lag of 16. Since spacings between probes are roughly 300 bp, these window sizes correspond to total lengths of ∼1800-4800 bp ($2 \times w \times 300$), which is the range of average fragment lengths in DamID experiments.

Figure 2 shows that the auto correlation drops under 0.2 after a lag of 5 probes for chromosome arm 2R for the Ci data sets and under 0.4 after a lag of 13 for the Pros data set. Results are similar for other chromosome arms (data not shown). Even with the robust alternative, all data sets had unusually high correlation (> 0) at far lags, especially for the Pros data set, suggestive of a long-term memory process. This may be a feature of DamID data sets where the fragment lengths tend to be longer. To use the auto correlation to estimate the correlation between probes, we are assuming that the probes are regularly spaced. Although there are exceptions, most of the probes are evenly spaced approximately 300 bp from each other: 81% of the probes are spaced within 350 bp of the neighboring probes and 99% are spaced within 600 bp. In the few cases where there are large gaps between probes (> 700 bp) a combined p-value was not calculated for the probes at the border of the gaps or probes within w probes of the gaps.

For chromosome arm 2L, which has overall length ∼ 22 megabases, there is a ∼ 1.5 megabase stretch of the chromosome arm with probes every ∼ 100 bp in addition to those that are every ∼ 300 bp. In this overlapping region, the lag-3 correlation of the 100 bp density probes is similar to the lag-1 correlation of the 300 bp density probes (data not shown). Since the locations of the 100 bp spaced probes fall within 1-2 bp of the 300 bp spaced probes, overlapping probes are averaged to obtain one set of probe p-values that are regularly spaced 100 bp apart in this region. The auto correlation is calculated separately for this region to determine the combined p-values. The rest of chromosome arm 2L, which has ∼ 300 bp spacing between probes, is analyzed in the same manner as the other chromosome arms. Finally, the combined p-value methods assume stationarity, i.e., the same auto correlation is used across the entire chromosome. We examined this assumption by dividing the chromosomes into blocks and calculating the auto correlation for each block. Except for the blocks at chromosome ends, the auto correlation appeared to be consistent (data not shown).
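The two dependence adjustments of Section 3.3.3 can be sketched in a few lines once autocorrelation values are available. The Python below is an illustrative sketch, not the authors' R code: the function names (`correlation_matrix`, `kost_cov`, `brown_scale_and_df`, `cholesky`, `adjusted_stouffer`), the autocorrelation values and the p-values are all made up for the example. It builds the matrix with entries ρ_jk = ACF(k − j), evaluates Equations (1) to (4) to obtain c and f, and applies the Cholesky transformation before the Stouffer-Liptak test. Turning C_P/c into a p-value needs a χ² distribution function with non-integer degrees of freedom, which the Python standard library does not provide, so that last step is left out here.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: supplies Phi and its inverse


def correlation_matrix(acf, l):
    """Build the l x l matrix with rho_jk = ACF(|k - j|); acf[0] is lag 1."""
    return [[1.0 if j == k else acf[abs(k - j) - 1] for k in range(l)]
            for j in range(l)]


def kost_cov(rho, v):
    """Polynomial approximation of Cov(-2 ln p_j, -2 ln p_k), Equation (4)."""
    return (3.263 * rho + 0.710 * rho**2 + 0.027 * rho**3 + 0.727 / v
            + 0.327 * rho / v - 0.768 * rho**2 / v - 0.331 * rho**3 / v)


def brown_scale_and_df(sigma, v):
    """Scaling factor c and degrees of freedom f from Equations (1)-(3)."""
    l = len(sigma)
    mean = 2.0 * l
    var = 4.0 * l + 2.0 * sum(kost_cov(sigma[j][k], v)
                              for j in range(l) for k in range(j + 1, l))
    return var / (2.0 * mean), 2.0 * mean**2 / var


def cholesky(sigma):
    """Lower-triangular C with sigma = C C' (sigma must be positive definite)."""
    n = len(sigma)
    c = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sigma[i][j] - sum(c[i][k] * c[j][k] for k in range(j))
            c[i][j] = math.sqrt(s) if i == j else s / c[j][j]
    return c


def adjusted_stouffer(pvals, sigma):
    """Stouffer-Liptak p-value after decorrelating quantiles, Q* = C^{-1} Q.

    Every p-value must lie strictly between 0 and 1.
    """
    q = [STD_NORMAL.inv_cdf(p) for p in pvals]
    c = cholesky(sigma)
    q_star = []
    for i in range(len(q)):  # forward substitution solves C q* = q
        q_star.append((q[i] - sum(c[i][k] * q_star[k] for k in range(i))) / c[i][i])
    return STD_NORMAL.cdf(sum(q_star) / math.sqrt(len(q_star)))


acf = [0.5 ** lag for lag in range(1, 7)]            # made-up ACF for lags 1..6
pvals = [0.04, 0.01, 0.03, 0.02, 0.05, 0.01, 0.06]   # l = 2w + 1 = 7, so w = 3
sigma = correlation_matrix(acf, len(pvals))
c, f = brown_scale_and_df(sigma, v=9.0)
fisher_stat = -2.0 * sum(math.log(p) for p in pvals)  # C_P in the -2 ln(p) form
print("c =", c, "f =", f, "C_P / c =", fisher_stat / c)
print("adjusted Stouffer-Liptak p-value:", adjusted_stouffer(pvals, sigma))
```

With an all-zero autocorrelation vector the correlation matrix is the identity, so `adjusted_stouffer` reduces to the unadjusted Stouffer-Liptak test.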




Figure 2: Auto Correlation for Chromosome Arm 2R in the Three Data Sets. The y-axis is the robust estimate of the auto correlation described in Methods. The x-axis is the lag measured in number of probes. Panels: Ci Activator, Ci Repressor, Prospero; each plots ACF (0.0 to 1.0) against Lag (0 to 40).
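The kind of quantity plotted in Figure 2 can be illustrated with a plain rank-based lag correlation. The sketch below is only an analogue under stated assumptions: it is not the robust estimator from the robust-ts package, it does not remove probes next to gaps, and the helper names `ranks`, `pearson` and `rank_acf` are hypothetical.

```python
import math


def ranks(values):
    """Average ranks (1-based); tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            r[order[k]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return r


def pearson(x, y):
    """Pearson correlation; undefined (division by zero) for constant input."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def rank_acf(series, max_lag):
    """Spearman-type autocorrelation for lags 1..max_lag.

    For each lag, the ranks of x[t] are correlated with the ranks of x[t + lag].
    """
    return [pearson(ranks(series[:-lag]), ranks(series[lag:]))
            for lag in range(1, max_lag + 1)]


t_stats = [0.2, 0.5, 1.9, 2.4, 2.1, 0.3, -0.4, -0.1, 0.6, 1.1]  # made-up values
print(rank_acf(t_stats, 3))
```

The returned list is indexed so that element 0 is the lag-1 value, which matches the convention ρ_jk = ACF(k − j) for probes k > j.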

3.4 Determining Enrichment Regions

Each probe is associated with a combined p-value using one of the methods described above. P-values were corrected for false discovery rate (FDR) control using the Benjamini & Hochberg procedure (Benjamini and Hochberg, 1995). Then, ERs were defined by scanning probes in genomic order. If a probe had an adjusted p-value below some set cutoff a new ER was formed. The next probe passing the cutoff is then evaluated to see if it is within w probes and 700 bp of the last probe in an ER. If so, it is added to that ER, otherwise a new ER is designated. This procedure is continued in a step-wise fashion until the last probe on the chromosome arm is evaluated. Several different FDR cutoffs were selected to construct ERs in Section 4.1. Alternatively, results are also presented where a set number of top probes ranked by a specific procedure are used to construct ERs in Section 4.2. Note that the FDR cutoffs are used to indicate false discovery control at the probe-level and not at the level of ERs.
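The scanning procedure above can be sketched as follows. This is a minimal illustration, assuming that "within w probes and 700 bp" means both conditions must hold relative to the last probe already in the current ER; the function names `bh_adjust` and `build_ers`, the probe positions and the p-values are hypothetical and are not taken from the authors' code.

```python
def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    n = len(pvals)
    order = sorted(range(n), key=lambda i: pvals[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


def build_ers(positions, adj_pvals, cutoff, w, max_gap=700):
    """Group probes passing the cutoff into enrichment regions (ERs).

    positions: probe coordinates in bp, sorted along one chromosome arm.
    Returns a list of ERs, each a list of probe indices.
    """
    ers = []
    for i, p in enumerate(adj_pvals):
        if p >= cutoff:
            continue  # probe does not pass the cutoff
        if ers:
            last = ers[-1][-1]
            # Assumed reading: the probe-count and bp conditions must both hold.
            if i - last <= w and positions[i] - positions[last] <= max_gap:
                ers[-1].append(i)
                continue
        ers.append([i])  # start a new ER
    return ers


positions = [300 * i for i in range(10)]  # made-up, evenly spaced probes
combined_p = [0.001, 0.002, 0.5, 0.003, 0.8, 0.9, 0.7, 0.004, 0.6, 0.95]
adj = bh_adjust(combined_p)
print(build_ers(positions, adj, cutoff=0.05, w=3))
```

As in the text, the cutoff here controls the FDR at the probe level only; nothing in the sketch assesses significance at the level of whole ERs.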




3.5 Comparison with Other Methods

We compare the generalized moving average approach, based on combined p-values, with other existing methods for tiling array analysis. First, comparisons are made with a “standard” moving average approach based on taking averages of signal intensities, log ratios or t-statistics. For this purpose, we use the CMARRT (Correlation, Moving Average, Robust and Rapid method on Tiling array) software based on a method that also incorporates correlations among probes in the moving average. In CMARRT, a Gaussian approximation is used to determine a p-value for the moving average statistic where the variance includes covariance terms determined from the auto correlation function. The R package for CMARRT (Kuan et al., 2008) was run with the window.opt option set to fixed.probe and the frag.length option set to 2000, 2600, 3200, 3800, 4400 and 5000, which corresponds approximately to the window sizes described above of 3-8, respectively, assuming probes are spaced ∼ 300 bp apart. As with the combined p-value methods, the input for CMARRT was the moderated t-statistic described in Section 3.2, which was also suggested by the CMARRT manual for two-channel arrays such as NimbleGen. One line of the original code of the ma.stat function was edited to accommodate the larger ∼300 bp spacing between probes in the NimbleGen arrays. CMARRT was also run with the independent option to get results based on a simple moving average without correlation adjustment. The ER definition described above was applied to the correlation and independent based p-values for probes returned by CMARRT in two ways. First, only FDR adjusted probes that survived a specified FDR control cutoff were used for the results in Section 4.1. Second, only FDR adjusted probes that were in a ranked list of a set number of probes were used for the results in Section 4.2.

Another existing software package for tiling array analysis is TileMap, where a first order hidden Markov model can be used to combine neighboring probes and identify regions that have hybridization patterns of interest. As with the other methods, input for TileMap was the moderated t-statistic for each data set. We ran the HMM option with Expected hybridization length set to 10 (for a typical hybridization region of 3000) and Maximal gap allowed set to 700 to correspond to settings used in the other methods or to settings that are appropriate for the DamID data. For each probe, the HMM option returns posterior probabilities that a probe is in a region of interest. We used the direct posterior probability approach in Newton et al. (2004) to control the FDR. The ER definition described above was applied to probes in two ways as with CMARRT. Note that the HMM option does not have a window option, so when TileMap results are plotted with the other methods for different window sizes, the same results are repeated. TileMap only takes first order dependencies into account and does not incorporate longer range corre-

13



3.6 EvaluationAll methods were applied to the CiA, CiR and Pros tiling array data describedabove. For both transcription factors a consensus motif has been identified andtarget genes have been predicted using gene expression data. Taking advantage ofthese two sources of external information, we have used three methods to evaluatethe methods of ER prediction.

3.6.1 Target Gene Enrichment

Using gene locations from the v4.0 FlyBase collection (Wilson et al., 2008), a listof genes associated with an ER were determined by extracting the closest upstreamand downstream gene to the ER and any genes that overlap an ER. This procedureresulted in at least two genes predicted for each ER. Genes appearing multiple timesfor different ERs were only counted once. The collection of all genes predicted to benear or at ER, called “ER genes”, were then compared to a set of genes determinedby a series of gene expression experiments to be likely targets of Ci or Pros called“target genes”.

Ci is the mediating transcription factor of the Hedgehog signaling pathway.Target genes were identified by having to show a median gene expression fold in-duction of ∼ 1.4 in genetic backgrounds where Hedgehog signaling was retainedcompared to a situation where Hedgehog signaling was absent (Biehs et al., sub-mitted manuscript). We only examined the 199 of 230 target genes that also hadannotation in the v4.0 FlyBase collection. The Pros target genes were obtained fromthe list of the authors’ differentially expressed genes (Choksi et al., 2006) that hadat least log fold 2 change (n=214). Of these, 201 could be mapped to annotationsin the v4.0 FlyBase collection.

The overlap of ER genes with the predicted Ci and Pros target genes wereevaluated using the “target gene enrichment ratio”, which is equal to the percentageof target genes that intersect the ER genes divided by percentage of all genes that areER genes. For example using Ci, if one set of ERs has 2695 neighboring ER genes(∼ 20% of the 13472 total genes), which intersect with 60 target genes, (∼ 30% ofthe 199 target genes), then target gene enrichment is ∼ .3

.2 = 1.5.Gene Ontology (GO) analysis for Table S1 was performed using the GOstat

software (Beissbarth and Speed, 2004) with default options and “False DiscoveryRate correction (Benjamini)” for multiple testing correction.

lations as do the various dependence adjusted procedures. In Figures 3 and 4, wealso included the TileMap ER predictions directly based on using the HMM option.Since there is no FDR cutoff in this case, the x-axis locations of these results arefor illustrative purposes only.

14
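The target gene enrichment ratio of Section 3.6.1 can be illustrated with a short Python sketch. The function name and the synthetic gene identifiers are ours and purely illustrative; only the gene counts are taken from the Ci example in the text.

```python
def target_gene_enrichment_ratio(er_genes, target_genes, all_genes):
    """Fraction of target genes that are ER genes, divided by the
    fraction of all annotated genes that are ER genes."""
    all_genes = set(all_genes)
    er = set(er_genes) & all_genes            # each gene is counted once
    targets = set(target_genes) & all_genes   # keep only annotated targets
    if not er or not targets:
        raise ValueError("need at least one ER gene and one target gene")
    target_fraction = len(er & targets) / len(targets)
    background_fraction = len(er) / len(all_genes)
    return target_fraction / background_fraction


# Synthetic identifiers sized to match the Ci example in the text:
# 13472 genes, 199 target genes, 2695 ER genes, 60 of them targets.
all_genes = range(13472)
target_genes = range(199)
er_genes = list(range(60)) + list(range(199, 199 + 2635))

ratio = target_gene_enrichment_ratio(er_genes, target_genes, all_genes)
print(round(ratio, 2))
```

With unrounded percentages the ratio comes out close to, but not exactly, the value of 1.5 quoted from the rounded 30% and 20% figures.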



3.6.2 Motif Enrichment

A consensus motif for Ci, “tgggtggtc”, has been identified both by its occurrence in known Ci enhancers (Alexandre et al., 1996, Kinsler and Vogelstein, 1990) and by transcription-factor binding affinity experiments (Hallikas et al., 2006). A consensus motif for Pros, “taagacg”, is reported in Choksi et al. (2006). We searched for occurrences of each motif, along with its reverse complement, in the set of ER sequences. This observed number was compared to the expected number, which is the frequency of occurrences of the motif in the entire genome multiplied by the total length of the ER sequences,

$$E[\#\ \text{tgggtggtc in ERs}] = \text{total length of ERs} \times \frac{\#\ \text{tgggtggtc in genome}}{\text{length of genome}}. \tag{5}$$

The “motif enrichment ratio” is the ratio of observed versus expected counts.

Motif enrichment is sensitive to the lengths and number of ERs. Especially for the longer Ci consensus, there may be few or no occurrences if the total length of predicted ERs is short. In Figure 3, where the total ER lengths for each prediction method may be different and may bias the results, the motif enrichment ratio is divided by the total length of the ERs for a set of predictions. This number is then multiplied by 10^5 for plotting purposes and is referred to as the “motif enrichment score.”

3.6.3 Distance to Gene

For each ER, the shortest distance to a gene was determined by comparing the ER locations with all transcription start positions from the v4.0 FlyBase collection of genes. Then, different distance cutoffs were used to find the “gene distance percentage”, which is the percentage of ERs whose closest gene, upstream or downstream, is less than or equal to a defined distance away (e.g., 1KB and 2KB).

This metric is also sensitive to the lengths and number of ERs. For a large set of randomly selected ERs, this metric increases because the fly genome is relatively gene dense (median distance between transcription start sites is 3500-4500 bp depending on the chromosome arm) and by chance ERs will be selected that are close to genes (see Results). Therefore, as with the motif enrichment score, the percentage is divided by the total length of the ERs for a set of predictions. This number, the “gene distance score”, is then multiplied by 10^5 for plotting purposes in Figure 4. To evaluate the locations of random ERs relative to genes, ERs were constructed from a set number of random probes selected over all chromosomes. This was repeated 10 times and, for each group of 10 random ER sets based on a set number of probes, the average percentage of ERs within a given distance of a transcription start site was tallied and displayed in Figures 6, S8 and S9.
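The motif enrichment ratio (Section 3.6.2) and the gene distance percentage (Section 3.6.3) can be sketched in Python as follows. This is an illustration under our own assumptions rather than the code used for the analysis: overlapping motif matches are counted, all coordinates lie on a single chromosome, and the distance from an ER to a transcription start site is measured from the nearest ER edge (zero when the site lies inside the ER). All function names are hypothetical.

```python
from bisect import bisect_left

_COMPLEMENT = str.maketrans("acgt", "tgca")


def reverse_complement(seq):
    return seq.lower().translate(_COMPLEMENT)[::-1]


def count_motif(seq, motif):
    """Count positions matching the motif or its reverse complement."""
    seq = seq.lower()
    patterns = {motif.lower(), reverse_complement(motif)}
    k = len(motif)
    return sum(seq[i:i + k] in patterns for i in range(len(seq) - k + 1))


def motif_enrichment_ratio(er_seqs, motif, genome_count, genome_length):
    """Observed motif count in the ERs over the expected count (Equation 5)."""
    total_length = sum(len(s) for s in er_seqs)
    observed = sum(count_motif(s, motif) for s in er_seqs)
    expected = total_length * genome_count / genome_length
    return observed / expected


def gene_distance_percentage(ers, tss_positions, cutoff):
    """Percentage of ERs (start, end) whose nearest TSS is within cutoff bp."""
    tss = sorted(tss_positions)
    n_close = 0
    for start, end in ers:
        i = bisect_left(tss, start)          # first TSS at or after ER start
        if i < len(tss) and tss[i] <= end:
            distance = 0                     # TSS falls inside the ER
        else:
            candidates = []
            if i < len(tss):
                candidates.append(tss[i] - end)        # nearest downstream
            if i > 0:
                candidates.append(start - tss[i - 1])  # nearest upstream
            distance = min(candidates)
        n_close += distance <= cutoff
    return 100.0 * n_close / len(ers)


def length_corrected_score(value, total_er_length, scale=1e5):
    """Divide a metric by total ER length and rescale for plotting."""
    return value / total_er_length * scale


# Toy data: one ER sequence containing the Ci motif on both strands.
toy_er = "tgggtggtc" + "aaaa" + "gaccaccca"
toy_ratio = motif_enrichment_ratio([toy_er], "tgggtggtc",
                                   genome_count=10, genome_length=2200)
toy_percentage = gene_distance_percentage([(100, 200), (5000, 5100)],
                                          [250, 9000], cutoff=1000)
```

The same `length_corrected_score` helper covers both the motif enrichment score and the gene distance score, since each divides a metric by the total ER length and multiplies by 10^5.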



4 Results

We report the evaluation of different ER prediction methods based on independent sources of target gene and motif data (see Methods). Although in theory, the generalized moving average approach based on combined p-values is designed for any hypothesis test, to facilitate comparison with existing methods, in Sections 4.1 and 4.2, we apply the method using p-values based on t-tests. Then, we run other methods such as CMARRT and TileMap using t-statistics, the test statistic for the t-test. The purpose of this comparison is to observe whether any performance is lost by using a generalized moving average based on p-values instead of the test statistics themselves. Then, in Section 4.3, we present an example of an application that can be more difficult to assess using other methods, but fits into the generalized moving average framework. We use three metrics to evaluate predicted sets of ERs: target gene enrichment, motif enrichment and proportion within a set distance to a gene. For all three metrics, larger values are more favorable.

4.1 Evaluation of Moving Average Methods and Effect of Dependence Adjustment

First, the generalized moving average method, based on combining p-values from t-tests, is compared to an existing or “standard” moving average approach, based on averaging t-statistics. In particular, two different options of the combined p-value method, Fisher’s Combined Probability Test (F) and the Stouffer-Liptak Test (SL), are compared to the moving average procedure of CMARRT (C). Second, the effect of incorporating correlations among probes is evaluated by comparing the different moving average procedures with or without the dependence adjustments and by evaluating a third procedure, TileMap (see Methods). All results in this section are based on the Ci Activator data set.
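For orientation, the two combined p-value statistics without dependence adjustment (F and SL) can be sketched in Python as below. This is a minimal illustration of the textbook forms of the tests, assuming independent p-values strictly between 0 and 1; the dependence adjusted versions (FwD, SLwD) are not reproduced here. The window layout (every run of `window` consecutive probes) and the function names are our own assumptions.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def fisher_combined(pvals):
    """Fisher: X = -2 * sum(log p_i) is chi-square with 2k df under H0."""
    k = len(pvals)
    half_x = -sum(math.log(p) for p in pvals)   # X / 2
    # Closed-form chi-square survival function for even df = 2k:
    # exp(-x/2) * sum_{j<k} (x/2)^j / j!
    term, total = 1.0, 1.0
    for j in range(1, k):
        term *= half_x / j
        total += term
    return min(1.0, math.exp(-half_x) * total)


def stouffer_liptak_combined(pvals):
    """Stouffer-Liptak: Z = sum(z_i) / sqrt(k) is N(0, 1) under H0."""
    k = len(pvals)
    # z_i = Phi^{-1}(1 - p_i), written as -Phi^{-1}(p_i) for stability
    z = sum(-_STD_NORMAL.inv_cdf(p) for p in pvals) / math.sqrt(k)
    return _STD_NORMAL.cdf(-z)


def sliding_combined(pvals, window, combine):
    """Combined p-value for every run of `window` consecutive probes."""
    if window < 1 or window > len(pvals):
        raise ValueError("window must be between 1 and the number of probes")
    return [combine(pvals[i:i + window])
            for i in range(len(pvals) - window + 1)]


probe_pvals = [0.40, 0.03, 0.01, 0.02, 0.60, 0.75]
fisher_windows = sliding_combined(probe_pvals, 3, fisher_combined)
sl_windows = sliding_combined(probe_pvals, 3, stouffer_liptak_combined)
```

In this toy series both statistics give their smallest combined p-value to the window covering the three small probe-level p-values.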

Figure 3 and Figure S2 show the target gene enrichment value for the different combinations of tests, window size and FDR cutoffs for determining ERs. Each FDR cutoff (at the probe-level) on the x-axis determines a set of probes that are used to construct ERs. For all tests, the target gene enrichment increases as the FDR cutoff becomes smaller, indicating that as the threshold for significant probes becomes more stringent, relatively more target genes are included in the ER gene set than expected by chance. As the FDR cutoff becomes more stringent, the dependent tests consistently have higher enrichment values than the independent tests for all window sizes and for both the combined p-value and moving average methods. By not accounting for the dependence, p-values are much smaller and inflate the significance of a probe. For example, using window size 4, a FDR cutoff of 10^-12 for Fisher’s Combined Probability Test is necessary to achieve a similar size of predicted ERs as a FDR cutoff of 10^-3 for Fisher’s Combined Probability Test with Dependence (FwD).
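The probe-level selection step can be illustrated with a minimal Python sketch of the Benjamini and Hochberg (1995) adjustment, which is the FDR control applied by the moving average methods considered here. The function names and the toy p-values are illustrative only.

```python
def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    n = len(pvals)
    order = sorted(range(n), key=lambda i: pvals[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


def probes_passing(pvals, fdr_cutoff):
    """Indices of probes whose adjusted p-value is at or below the cutoff."""
    adjusted = benjamini_hochberg(pvals)
    return [i for i, q in enumerate(adjusted) if q <= fdr_cutoff]


window_pvals = [0.01, 0.04, 0.03, 0.005]
adjusted = benjamini_hochberg(window_pvals)
selected = probes_passing(window_pvals, fdr_cutoff=0.025)
```

The probes in `selected` are the ones that would then be passed to the ER definition; tightening `fdr_cutoff` shrinks this set, which is the effect traced along the x-axis of Figure 3.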

The maximum value of target gene enrichment, ∼7.2, occurs for window size 7 and FwD. In general, there is a slight advantage for FwD over Stouffer-Liptak Test with Dependence (SLwD) and both have higher target gene enrichment values than the t-statistic moving average method, CMARRT, with dependence (CwD) except for large FDR cutoffs. The three methods without dependence adjustments perform very similarly (F, SL and C) and achieve a maximum value of target gene enrichment of ∼2.0. They appear to be less sensitive to window size than the methods with dependence. For comparison, TileMap (TM) was also applied and performs slightly better than the unadjusted versions, but not as well as the adjusted methods.

Figure 3 and Figure S3 show the enrichment of the Ci transcription factor binding site motif (see Methods) for the different combinations of tests, window size and probe-level FDR cutoffs for determining ERs. Again, the unadjusted methods perform more poorly compared to the dependence adjusted methods. In general, the unadjusted methods identify many and long ERs; so although their motif enrichment ratio may be high, after adjusting for the overall length of the ERs, they have a relatively worse motif enrichment score than the adjusted methods. Within the dependence adjusted methods, depending on the window size, one of the combined p-value methods (FwD or SLwD) performs slightly better than the moving average approach (CwD) for smaller window sizes and FDR cutoffs. For larger window sizes, all three methods perform similarly. TileMap again performs slightly better than the unadjusted methods, but not as well as the adjusted methods.

Figure 4 and Figures S4 and S5 show the enrichment of ERs close to transcription start sites. These results are consistent with the previous metrics in that the adjusted methods tend to predict ERs closer to genes, after correcting for overall ER length, compared to the unadjusted methods and TileMap and that there is also some advantage for SLwD for smaller windows and FDR cutoffs, but this diminishes for larger windows and FDR cutoffs.

4.2 Evaluation of Additional Data Sets

The generalized and existing moving average methods that incorporate correlations among probes appear to make the best ER predictions using the three evaluation metrics on the CiA data set. To explore their performance on two additional data sets, these methods were also applied to data sets based on Ci Repressor (CiR) and Prospero (Pros) (see Methods). However, because these are more challenging data sets, there were very few predictions. For example, at window size 3 and a relatively relaxed probe-level FDR cutoff of 0.10, only 7 ERs were predicted for a total of 9113 bases using CwD and none using FwD. This may be due to a relatively smaller enrichment signal in the genome for these data sets or other factors that are data set specific (e.g., Pros has relatively high auto correlation values compared to the Ci data sets, see Figure 2).

[Figure 3: two panels for Window = 5. Left: Target Gene Enrichment versus FDR Cutoff (10^-3 to 10^-1). Right: Motif Enrichment Score versus FDR Cutoff (10^-3 to 10^-1). Legend: F, SL, C, TM, FwD, SLwD, CwD, TM-H.]

Figure 3: Target Gene and Motif Enrichment. The x-axis corresponds to probe-level FDR cutoffs for determining ERs. On the left, the y-axis corresponds to the “target gene enrichment ratio” (see Methods). Values greater than one, indicated by the line, correspond to relatively more target genes predicted than expected based on their frequency in the genome. On the right, the y-axis corresponds to the “motif enrichment score”, which is the motif enrichment ratio corrected for overall ER lengths (see Methods). Larger values correspond to relatively more motifs in the ERs than expected based on their frequency in the genome. The results are displayed for four combined p-value statistics, Fisher’s Combined Probability Test (F), Fisher’s Combined Probability Test with Dependence (FwD), Stouffer-Liptak Test (SL) and Stouffer-Liptak Test with Dependence (SLwD), two moving average methods, CMARRT (C) and CMARRT with Dependence (CwD), and TileMap (TM) using FDR cutoffs or the HMM prediction method (TM-H). See Figures S2 and S3 for all window sizes.

[Figure 4: two panels for Window = 5. Left: Gene Distance Score (1KB) versus FDR Cutoff (10^-3 to 10^-1). Right: Gene Distance Score (2KB) versus FDR Cutoff (10^-3 to 10^-1).]

Figure 4: Gene Distance Score. The x-axis corresponds to probe-level FDR cutoffs for determining ERs. The y-axis corresponds to the “gene distance score” corrected for overall ER lengths (see Methods). Larger values correspond to relatively more ERs close to genes. Distances less than or equal to 1KB and 2KB are used. See Figure 3 for details on the legend and Figures S4 and S5 for all window sizes.

Therefore, for these two data sets, we took a set number of top probes, ranked by the respective method, and constructed ERs based on those probes. These ERs were compared using the previously described evaluation criteria. This was also repeated on the original Ci activator (CiA) data set. In addition to exploring the behavior of the methods on the noisier data sets, the advantage of this comparison is that because the same number of probes is being used, the total ER lengths should be roughly the same. For example, for window size 3, using the top 1000 probes, FwD has 240 ERs (251234 bp) while CwD has 215 ERs (229699 bp), which are roughly the same magnitude. Therefore, ER total length corrections are not necessary as in Figures 3 and 4. We compared one of the generalized moving average methods based on combined p-values with dependence (FwD) with the moving average method with dependence (CwD).
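The ranked-probe construction can be sketched in Python as follows. The ER definition itself is given in Methods and is not reproduced here, so this sketch substitutes a simple stand-in rule, which is our assumption: selected probes on one chromosome are merged into a region whenever consecutive selected probes lie within a hypothetical `max_gap` of each other. The function names and the default gap are illustrative.

```python
def top_probes(pvals, n_top):
    """Indices of the n_top probes with the smallest p-values."""
    return sorted(range(len(pvals)), key=lambda i: pvals[i])[:n_top]


def build_regions(positions, selected, max_gap=700):
    """Merge selected probes into (start, end) regions on one chromosome.

    Stand-in rule (an assumption, not the paper's ER definition):
    consecutive selected probes no more than max_gap bp apart are joined.
    """
    regions = []
    for pos in sorted(positions[i] for i in selected):
        if regions and pos - regions[-1][1] <= max_gap:
            regions[-1][1] = pos          # extend the current region
        else:
            regions.append([pos, pos])    # start a new region
    return [(start, end) for start, end in regions]


probe_positions = [0, 300, 600, 2000, 2300, 5000]
probe_pvals = [0.001, 0.002, 0.5, 0.003, 0.004, 0.6]

chosen = top_probes(probe_pvals, n_top=4)
regions = build_regions(probe_positions, chosen)
total_length = sum(end - start for start, end in regions)
```

Because every method contributes the same number of probes, `total_length` stays in a similar range across methods, which is why no length correction is applied in this comparison.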

Figures 5 and S6 show the target gene enrichment for all three data sets and two different methods. FwD performs the best on the CiA and CiR data sets for all window sizes, with an enrichment value of 7.0. CwD achieves the highest value of 2.5 for Pros. In general, the results on Pros are strikingly worse. Either the target gene list provided by the authors has too many false positives or negatives, the enrichment signal for Pros is relatively low or the high auto correlation makes prediction difficult.

For motif enrichment in the CiA data set (Figures 5 and S7), except for window size 3, where FwD achieves the maximum enrichment, FwD does better at the higher and lower ranked probes, but then CwD achieves larger values in the middle of the probe rankings. CwD also achieves the highest motif enrichment ratio for CiR, with some advantage for FwD in the middle ranked probes. For Pros the results are all poor, perhaps due to the specificity of the published consensus motif for Pros (the authors of the original Pros study also did not find evidence for enrichment of the motif, personal communication). However, FwD achieves the maximum value of ∼1.6 for window size 8.

For the gene distance metric in Figure 6 and Figures S8 and S9, all methods are very close, but FwD performs best for all three data sets for the highest ranking probes. The differences between the methods are smaller for the Pros data set and the differences also decrease as the window size becomes smaller. Based on the gene density of the Drosophila genome, with median distance 3500-4500 bp between transcription start sites, even random ERs may be within a certain distance to a gene. Roughly 20% and 35% of random ERs are within 1KB and 2KB of a transcription start site respectively, with a slight increase as more probes are used to construct the ERs (see Methods and Figures S8 and S9). After a certain set of top probes are included, almost all methods for all data sets find ERs closer to genes than the random sets.

[Figure 5: two panels for Window = 5. Left: Target Gene Enrichment versus Ranked Probes (0 to 25000). Right: Motif Enrichment Ratio versus Ranked Probes (0 to 25000). Legend: CiA-FwD, CiA-CwD, CiR-FwD, CiR-CwD, P-FwD, P-CwD.]

Figure 5: Target Gene and Motif Enrichment for Top Predictions. The x-axis corresponds to the number of top ranked probes used to construct ERs. On the left, the y-axis corresponds to the “target gene enrichment ratio” (see Methods). Values greater than one, indicated by the line, correspond to relatively more target genes predicted than expected based on their frequency in the genome. On the right, the y-axis corresponds to the “motif enrichment ratio” (see Methods). Values greater than one, indicated by the line, correspond to relatively more motifs in the ERs than expected by what is observed in the overall genome. The results are displayed for Fisher’s Combined Probability Test with Dependence (FwD) and the moving average method CMARRT with Dependence (CwD). See Figure 3 for details and Figures S6 and S7 for all window sizes.

4.3 New Application

In the previous sections, the results of the generalized moving average methods showed either comparable or slightly enhanced performance relative to standard moving average methods. Therefore, for the t-test application, there does not appear to be a loss in performance by using results of the hypothesis test instead of the test statistics themselves and in some cases there appears to be a gain. However, the generalized moving average approach may also be useful in situations where it is difficult to assess significance of the moving average of certain statistics and/or asymptotic normality does not hold. In these situations, if a problem can be expressed as a hypothesis test at each probe and a p-value can be derived, then the generalized moving average approach can be applied as an alternative.

We present an example using the Ci data sets, where it is also of interest to identify enrichment regions that overlap between the two forms of the Ci transcription factor. Since both forms have the same DNA binding domain, it would be expected that they would bind to similar sequences. On the other hand, the overall shapes of the two protein forms are vastly different and one might expect that one form is capable of binding to low affinity sites better than the other. These differences may also explain why the two forms have very different levels of enrichment across the genome. To explore this problem in more detail, we are interested in finding ERs that overlap between the two forms. One strategy would be to take the two sets of ERs and examine the genomic regions where they overlap, or the overlap of genes near the ERs. However, as discussed above, using even relatively relaxed FDR cutoffs, very few or no ERs are predicted for CiR for most methods. Therefore, we have analyzed the two data sets to make integrated predictions of ERs for CiA and CiR.

We express the problem as a hypothesis test at each probe and p-values are determined such that the generalized moving average approach can be applied to the resulting p-values (see Methods). We use the Stouffer-Liptak combined p-value test (see Methods) with dependence adjustment and predict 1657 ERs for CiA-CiR using window size 4 and an FDR cutoff of .01, which resulted in 1029 associated ER genes. In Table S1, the GO terms associated with the ER genes range from general categories (e.g., multicellular organismal process) to specific (e.g., neurogenesis or wing disc development). However, the developmental programs that require Hh signaling in the fly are reflected in the ER genes grouped according to their GO terms. For example, Hh signaling has largely been characterized in the context of the fly wing imaginal disc and we identify a group of genes that fall into this category. In addition, Hh signaling has been shown to be required for the development of specialized cells in the nervous system and we identify genes that are involved in that process (see Table S1, neurogenesis and generation of neurons). Although several conserved signaling pathways participate in the above mentioned processes, we attribute these findings to the identification of Hh target genes. This conclusion is based in part on the presence of known Hh target genes in these lists.

[Figure 6: two panels for Window = 5. Left: Gene Distance Percentage (1KB) versus Ranked Probes (0 to 25000). Right: Gene Distance Percentage (2KB) versus Ranked Probes (0 to 25000). Legend: CiA-FwD, CiA-CwD, CiR-FwD, CiR-CwD, P-FwD, P-CwD, Random.]

Figure 6: Gene Distance Percentage for Top Predictions. The x-axis corresponds to the number of top ranked probes used to construct ERs. The y-axis corresponds to the “gene distance percentage” (see Methods). Larger values correspond to relatively more ERs close to genes. Distances less than or equal to 1KB and 2KB are used. See Figure 5 for details and Figures S8 and S9 for all window sizes.

5 Conclusion and Discussion

We have described a new method for analyzing tiling arrays which generalizes the moving average approach to scenarios where a problem can be presented as a hypothesis test at each probe and a moving “average” can be applied to the p-values resulting from this test. Our method was applied to data for the Ci and Pros transcription factors, where we had information on both their target genes and binding sites.

The first set of results based on a t-test were used to compare the performance of the method to the standard moving average approaches where the moving average of a log-ratio or t-statistic can be evaluated (Sections 4.1-4.2). In general, we found that the results were comparable between the methods on the three metrics used. Since the t-statistic moving average approach is a special case that falls into the generalized moving average framework, it is reassuring that the results were consistent. Furthermore, for large sample sizes, the Stouffer-Liptak test should be equivalent to the standard moving average approach.
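The large-sample equivalence can be made explicit with a short derivation. It is a sketch under assumptions that are ours: one-sided (upper tail) p-values, a window of k probes with t-statistics t_1, ..., t_k, and no dependence adjustment. As the degrees of freedom grow, the reference distribution of each t-statistic approaches the standard normal distribution with cumulative distribution function Φ, so that:

```latex
\begin{align*}
p_i &\approx 1 - \Phi(t_i)
  && \text{(large degrees of freedom)} \\
z_i &= \Phi^{-1}(1 - p_i) \approx t_i \\
Z_{SL} &= \frac{1}{\sqrt{k}} \sum_{i=1}^{k} z_i
  \approx \frac{1}{\sqrt{k}} \sum_{i=1}^{k} t_i
  = \sqrt{k}\,\bar{t}
\end{align*}
```

The Stouffer-Liptak statistic is then the window average of the t-statistics scaled by √k, which is the standardized moving average under independence, so the two approaches rank windows identically in this limit.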

The generalized moving average generally performed at least as well as the existing moving average methods, with some variation in the best performer due to the data set and evaluation criteria. As in Kuan et al. (2008), an adjustment was made based on correlations between probes and our results (Figures 3-4) are consistent with previous work that showed improvement in predictions due to the dependence adjustments (Bourgon, 2006, Kuan et al., 2008). TileMap, which only accounts for first order dependencies, performs more similarly to the unadjusted methods than the dependence adjusted methods.

The ER predictions were compared using several evaluation metrics (target gene enrichment, motif enrichment and distance to gene). These types of information are useful for investigators to evaluate the quality of ER predictions. However, there are caveats regarding their use. First, the consensus is based on current knowledge and may not completely specify the binding specificity and second, the sets of target genes are likely to suffer from both false positives and false negatives. This may be especially true for the Pros results, which are strikingly worse than the Ci results. However, the Pros results may also be due to a relatively poorer enrichment signal or considerably higher auto correlation, which may make prediction difficult. Despite the inherent problems with the different benchmarks, using the combination of the three can still be useful for comparing methods.

The initial set of results can also be used to evaluate how different cutoffs and options affect the results. As expected, the more stringent FDR control cutoffs provided more accurate ER predictions based on the metrics. However, there are two caveats regarding these cutoffs. First, many moving average methods, including those applied here, use Benjamini & Hochberg FDR control assuming tests are independent despite tests being locally correlated. The effect of correlation may make FDR estimation unstable and little is known about how to account for correlation (Efron, 2007, Schwartzman and Liny, 2009). Second, the FDR cutoffs are at the probe-level, not the ER-level, which is also common for many tiling array analysis methods. It would be preferable for investigators to have FDR cutoffs that correspond to the ER-level test.

We also evaluated the effect of the window size option. Window sizes of 3-8 were selected to correspond to a typical DamID fragment length of 1800-4800 bp. The combined p-value methods appear to be more sensitive to window length. Larger window sizes result in the prediction of longer ERs in general and better target gene enrichment, but worse motif enrichment and gene distance scores. Intermediate window sizes in the range 4-5 appear to balance these two effects.

Finally, two different versions of the combined p-value methods were introduced: Fisher’s Combined Probability Test and the Stouffer-Liptak Test. For the most part, there appears to be an advantage of FwD compared to SLwD on smaller window sizes based on the evaluation metrics. In general, Fisher’s Combined Probability Test puts relatively more emphasis on small p-values than the Stouffer-Liptak Test (Loughin, 2004). This would be more of a problem when the p-values that are combined are very disparate in magnitude, as in the analysis of the two Ci data sets in Section 4.3.
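The different emphasis of the two tests can be seen in a small numerical illustration. The Python sketch below uses the standard independent-test forms of both statistics and made-up p-values; it does not reproduce the dependence adjusted versions, and the function names are ours.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def fisher_p(pvals):
    """Fisher's combined p-value for independent p-values in (0, 1)."""
    half_x = -sum(math.log(p) for p in pvals)
    term, total = 1.0, 1.0
    for j in range(1, len(pvals)):   # chi-square tail with even df = 2k
        term *= half_x / j
        total += term
    return min(1.0, math.exp(-half_x) * total)


def stouffer_liptak_p(pvals):
    """Stouffer-Liptak combined p-value for independent p-values in (0, 1)."""
    z = sum(-_STD_NORMAL.inv_cdf(p) for p in pvals) / math.sqrt(len(pvals))
    return _STD_NORMAL.cdf(-z)


# One very small p-value among unremarkable ones.
disparate = [1e-8, 0.5, 0.5, 0.5]
# Four moderately small p-values of equal size.
homogeneous = [0.05, 0.05, 0.05, 0.05]

for label, pvals in (("disparate", disparate), ("homogeneous", homogeneous)):
    print(label, fisher_p(pvals), stouffer_liptak_p(pvals))
```

For the disparate set, Fisher's test returns a far smaller combined p-value than the Stouffer-Liptak test because the single extreme value dominates the sum of logarithms. For the homogeneous set the ordering reverses.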

Although t-statistics were used in the first set of results, the use of the generalized moving average need not depend on that specific type of test as long as the problem can be expressed in a hypothesis testing framework and p-values can be determined for each probe. In the second set of results (Section 4.3), we presented another application to illustrate this point and the generalized moving average with dependence adjustment could be applied across probes. Using a standard moving average in this scenario may not be appropriate, since the moderated t-statistics at each probe from the two different experiments (CiA and CiR) have different degrees of freedom and it is of interest to identify regions where there is enrichment of at least one or both signals. Alternatively, TileMap provides the flexibility to make comparisons under multiple experimental methods (e.g., mutant 1 < wild type < mutant 2), but in this application, it was not easily adapted to test for binding under the two conditions (personal communication). In this context, the generalized moving average method provided an alternative analysis method with predictions near genes that are consistent with annotation for known Hh targets.

References

Alexandre, C., A. Jacinto, and P. Ingham (1996): “Transcriptional Activation of hedgehog Target Genes in Drosophila is Mediated Directly by the Cubitus interruptus Protein, a Member of the GLI Family of Zinc Finger DNA-Binding Proteins,” Genes and Development, 10, 2003–2013.

Beissbarth, T. and T. Speed (2004): “GOstat: Find Statistically Overrepresented Gene Ontologies within a Group of Genes,” Bioinformatics, 20, 1464–1465.

Benjamini, Y. and Y. Hochberg (1995): “Controlling the False Discovery Rate - A Practical and Powerful Approach to Multiple Testing,” Journal of the Royal Statistical Society Series B-Methodological, 57, 289–300.

Bourgon, R. (2006): Chromatin-Immunoprecipitation and High-Density Tiling Microarrays: A Generative Model, Methods for Analysis, and Methodology Assessment in the Absence of a “Gold Standard”, Ph.D. thesis, University of California, Berkeley.

Brown, M. (1975): “A Method for Combining Non-Independent, One-Sided Tests of Significance,” Biometrics, 31, 987–992.

Buck, M. and J. Lieb (2004): “ChIP-chip: Considerations for the Design, Analysis, and Application of Genome-Wide Chromatin Immunoprecipitation Experiments,” Genomics, 83, 349–360.

Buck, M., A. Nobel, and J. Lieb (2005): “ChIPOTle: A User-Friendly Tool for the Analysis of ChIP-chip Data,” Genome Biology, 6.

Casella, G. and R. Berger (2002): Statistical Inference, Belmont, CA: Duxbury Press, second edition.

Choksi, S., T. Southall, T. Bossing, K. Edoff, E. de Wit, B. Fischer, B. van Steensel, G. Micklem, and A. Brand (2006): “Prospero Acts as a Binary Switch between Self-Renewal and Differentiation in Drosophila Neural Stem Cells,” Developmental Cell, 6, 775–789.



Efron, B. (2007): “Correlation and Large-Scale Simultaneous Hypothesis Testing,” Journal of the American Statistical Association, 102, 93–103.

Fisher, R. (1932): Statistical Methods for Research Workers (4th ed.), London: Oliver and Boyd.

Folks, J. (1984): “Combination of independent tests,” in P. Krishnaiah, ed., Handbook of Statistics 4: Nonparametric Methods, Amsterdam: Elsevier, 113–121.

Gentleman, R. C., V. J. Carey, D. M. Bates, B. Bolstad, M. Dettling, S. Dudoit, B. Ellis, L. Gautier, Y. Ge, J. Gentry, K. Hornik, T. Hothorn, W. Huber, S. Iacus, R. Irizarry, F. Leisch, C. Li, M. Maechler, A. J. Rossini, G. Sawitzki, C. Smith, G. Smyth, L. Tierney, J. Y. H. Yang, and J. Zhang (2004): “Bioconductor: Open Software Development for Computational Biology and Bioinformatics,” Genome Biology, 5, R80.

Ghosh, S., H. Hirsch, E. Sekinger, K. Struhl, and T. Gingeras (2006): “Rank-Statistics Based Enrichment-Site Prediction Algorithm Developed for Chromatin Immunoprecipitation on Chip Experiments,” BMC Bioinformatics, 7.

Hallikas, O., K. Palin, N. Sinjushina, R. Rautiainen, J. Partanen, E. Ukkonen, and J. Taipale (2006): “Genome-wide Prediction of Mammalian Enhancers Based on Analysis of Transcription-Factor Binding Affinity,” Cell, 124, 47–59.

Hedges, L. and I. Olkin (1985): Statistical Methods for Meta-Analysis, London: Academic Press.

Hess, A. and H. Iyer (2007): “Fisher’s Combined P-value for Detecting Differentially Expressed Genes using Affymetrix Expression Arrays,” BMC Genomics, 8.

Ji, H. and W. Wong (2005): “TileMap: Create Chromosomal Map of Tiling Array Hybridizations,” Bioinformatics, 21, 3629–3636.

Keles, S., M. van der Laan, S. Dudoit, and S. Cawley (2006): “Multiple Testing Methods for ChIP-Chip High Density Oligonucleotide Array Data,” Journal of Computational Biology, 13, 579–613.

Kinsler, K. and B. Vogelstein (1990): “The GLI Gene Encodes a Nuclear Protein which Binds Specific Sequences in the Human genome,” Molecular and Cellular Biology, 10, 634–642.

Kost, J. and M. McDermott (2002): “Combining Dependent P-values,” Statistics & Probability Letters, 60, 183–190.



Kuan, P., H. Chun, and S. Keles (2008): “CMARRT: A Tool for the Analysis ofChIP-chip Data from Tiling Arrays by Incorporating the Correlation Structure,”in R. Altman, A. Dunker, L. Hunter, T. Murray, and T. Klein, eds., Pacific Sym-posium on Biocomputing, World Scientific, 515–526.

Liptak, T. (1958): “On the Combination of Independent Tests,” Magyar TudomnyosAkadmia Matematikai Kutat Intezetenek Kozlemenyei, 3, 1971–1977.

Liu, X. S. (2007): “Getting Started in Tiling Microarray Analysis,” PLoS Compu-tational Biology, 3.

Loughin, T. (2004): “A Systematic Comparison of Methods for Combining P-values from Independent Tests,” Computational Statistics & Data Analysis, 47,467–485.

Munch, K., P. Gardner, P. Arctander, and A. Krogh (2006): “A Hidden MarkovModel Approach for Determining Expression from Genomic Tiling Micro Ar-rays,” BMC Bioinformatics, 7.

Newton, M., A. Noueiry, D. Sarkar, and P. Ahlquist (2004): “Detecting Differential Gene Expression with a Semiparametric Hierarchical Mixture Method,” Biostatistics, 5, 155–176.

R Development Core Team (2005): R: A Language and Environment for Statistical Computing, R Foundation for Statistical Computing, Vienna, Austria, URL http://www.R-project.org.

Rhodes, D. R., T. R. Barrette, M. A. Rubin, D. Ghosh, and A. M. Chinnaiyan (2002): “Meta-Analysis of Microarrays: Interstudy Validation of Gene Expression Profiles Reveals Pathway Dysregulation in Prostate Cancer,” Cancer Res, 62, 4427–4433.

Royce, T., J. Rozowsky, N. Luscombe, O. Emanuelsson, H. Yu, X. Zhu, M. Snyder, and M. Gerstein (2006): “Extrapolating Traditional DNA Microarray Statistics to Tiling and Protein Microarray Technologies,” Methods in Enzymology, 411, 282–311.

Schwartzman, A. and X. Lin (2009): “The Effect of Correlation in False Discovery Rate Estimation,” Harvard University Biostatistics Working Paper Series 106, Harvard University.

Smyth, G. (2005): “Limma: Linear Models for Microarray Data,” in R. Gentleman, V. Carey, S. Dudoit, R. Irizarry, and W. Huber, eds., Bioinformatics and Computational Biology Solutions using R and Bioconductor, New York: Springer, 397–420.

Spangl, B. (2008): On Robust Spectral Density Estimation, Ph.D. thesis, Vienna University of Technology.

Spangl, B., K. Boudt, P. Ruckdeschel, R. Fried, C. Agostinelli, and M. Kohl (2009): “robust-ts: Robust Time Series project,” http://r-forge.r-project.org/projects/robust-ts/.

Stouffer, S., E. Suchman, L. DeVinney, S. Star, R. Williams Jr., A. Lumsdaine, M. Lumsdaine, M. Smith, I. Janis, and L. Cottrell Jr. (1949): The American Soldier, Volume I: Adjustment During Army Life, Princeton, NJ: Princeton University Press.

Tusher, V., R. Tibshirani, and G. Chu (2001): “Significance Analysis of Microarrays Applied to the Ionizing Radiation Response,” Proceedings of the National Academy of Sciences of the United States of America, 98, 5116–5121.

van Steensel, B., J. Delrow, and S. Henikoff (2001): “Chromatin Profiling Using Targeted DNA Adenine Methyltransferase,” Nature Genetics, 27, 304–308.

Wilson, R., J. Goodman, V. Strelets, and FlyBase Consortium (2008): “FlyBase: Integration and Improvements to Query Tools,” Nucleic Acids Research, 36, D588–D593.

Wormald, S., D. Hilton, G. Smyth, and T. Speed (2006): “Proximal Genomic Localization of STAT1 Binding and Regulated Transcriptional Activity,” BMC Genomics, 7, 254.

Yazaki, J., B. Gregory, and J. Ecker (2007): “Mapping the Genome Landscape Using Tiling Array Technology,” Current Opinion in Plant Biology, 10, 534–542.

Zaykin, D., L. Zhivotovsky, P. Westfall, and B. Weir (2002): “Truncated Product Method for Combining P-values,” Genetic Epidemiology, 22, 170–185.
