enhanced maps of transcription factor binding sites …...2019/07/25  · 85 note that not all...

34
1 Short title: Comparing motif mappers for network inference 1 2 To whom correspondence should be addressed. Tel: +32 9 3313822; Fax: +32 9 3313809; Email: 3 [email protected] 4 5 Enhanced maps of transcription factor binding sites improve 6 regulatory networks learned from accessible chromatin data 7 8 Shubhada R. Kulkarni 1,2,3 , D. Marc Jones 1,2,3 and Klaas Vandepoele 1,2,3 9 10 1 Ghent University, Department of Plant Biotechnology and Bioinformatics, Technologiepark 71, 9052 11 Ghent, Belgium 12 2 VIB Center for Plant Systems Biology, Technologiepark 71, 9052 Ghent, Belgium 13 3 Bioinformatics Institute Ghent, Ghent University, Technologiepark 71, 9052 Ghent, Belgium 14 15 One-sentence summary: Evaluation of transcription factor binding site mapping tools and a protocol to 16 map gene regulatory networks from open chromatin datasets 17 18 AUTHOR CONTRIBUTIONS 19 S.R.K., D.M.J., and K.V. designed the research, S.R.K., and K.V. performed the analyses and S.R.K., D.M.J., 20 and K.V. wrote the manuscript. 21 22 Plant Physiology Preview. Published on July 25, 2019, as DOI:10.1104/pp.19.00605 Copyright 2019 by the American Society of Plant Biologists www.plantphysiol.org on July 9, 2020 - Published by Downloaded from Copyright © 2019 American Society of Plant Biologists. All rights reserved.

Upload: others

Post on 26-Jun-2020

10 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Enhanced maps of transcription factor binding sites …...2019/07/25  · 85 note that not all functional binding events are evolutionary conserved (Muino et al., 2016). 86 87 Recent

1

Short title: Comparing motif mappers for network inference 1

2

To whom correspondence should be addressed. Tel: +32 9 3313822; Fax: +32 9 3313809; Email: 3

[email protected] 4

5

Enhanced maps of transcription factor binding sites improve 6

regulatory networks learned from accessible chromatin data 7

8

Shubhada R. Kulkarni1,2,3, D. Marc Jones1,2,3 and Klaas Vandepoele1,2,3 9 10 1Ghent University, Department of Plant Biotechnology and Bioinformatics, Technologiepark 71, 9052 11

Ghent, Belgium 12 2VIB Center for Plant Systems Biology, Technologiepark 71, 9052 Ghent, Belgium 13 3Bioinformatics Institute Ghent, Ghent University, Technologiepark 71, 9052 Ghent, Belgium 14

15

One-sentence summary: Evaluation of transcription factor binding site mapping tools and a protocol to 16

map gene regulatory networks from open chromatin datasets 17

18

AUTHOR CONTRIBUTIONS 19

S.R.K., D.M.J., and K.V. designed the research, S.R.K., and K.V. performed the analyses and S.R.K., D.M.J., 20

and K.V. wrote the manuscript. 21

22

Plant Physiology Preview. Published on July 25, 2019, as DOI:10.1104/pp.19.00605

Copyright 2019 by the American Society of Plant Biologists

www.plantphysiol.orgon July 9, 2020 - Published by Downloaded from Copyright © 2019 American Society of Plant Biologists. All rights reserved.

Page 2: Enhanced maps of transcription factor binding sites …...2019/07/25  · 85 note that not all functional binding events are evolutionary conserved (Muino et al., 2016). 86 87 Recent

2

ABSTRACT 23

Determining where transcription factors (TFs) bind in genomes provides insight into which 24

transcriptional programs are active across organs, tissue types, and environmental conditions. Recent 25

advances in high-throughput profiling of regulatory DNA have yielded large amounts of information 26

about chromatin accessibility. Interpreting the functional significance of these datasets requires 27

knowledge of which regulators are likely to bind these regions. This can be achieved by using 28

information about TF binding preferences, or motifs, to identify TF binding events that are likely to be 29

functional. Although different approaches exist to map motifs to DNA sequences, a systematic 30

evaluation of these tools in plants is missing. Here we compare four motif mapping tools widely used in 31

the Arabidopsis (Arabidopsis thaliana) research community and evaluate their performance using 32

chromatin immunoprecipitation datasets for 40 TFs. Downstream gene regulatory network (GRN) 33

reconstruction was found to be sensitive to the motif mapper used. We further show that the low recall 34

of FIMO, one of the most frequently used motif mapping tools, can be overcome by using an Ensemble 35

approach, which combines results from different mapping tools. Several examples are provided 36

demonstrating how the Ensemble approach extends our view on transcriptional control for TFs active in 37

different biological processes. Finally, a protocol is presented to effectively derive more complete cell 38

type-specific GRNs through the integrative analysis of open chromatin regions, known binding site 39

information, and expression datasets. This approach will pave the way to increase our understanding of 40

GRNs in different cellular conditions. 41

42

43

44

www.plantphysiol.orgon July 9, 2020 - Published by Downloaded from Copyright © 2019 American Society of Plant Biologists. All rights reserved.

Page 3: Enhanced maps of transcription factor binding sites …...2019/07/25  · 85 note that not all functional binding events are evolutionary conserved (Muino et al., 2016). 86 87 Recent

3

INTRODUCTION 45

Plants are exposed to a wide variety of internal and external signals which need to be correctly 46

processed to facilitate growth and development and to trigger defense responses against environmental 47

stimuli. An important mechanism mediating these signal processing pathways is the control of gene 48

expression. Gene expression is regulated by transcription factors (TFs), proteins which often bind to 49

specific, short DNA sequences and influence gene expression. The identification of functional TF binding 50

is an important step in understanding the biological roles of these regulators. Regulatory links between 51

TFs and target genes together form a gene regulatory network (GRN), which can be used to understand 52

the dynamics of plant processes, such as diverse cellular functions, responses to various external stimuli, 53

and organ development (Song et al., 2016; Sparks et al., 2016; Varala et al., 2018). 54

55

An early and important step in the characterization of GRNs is understanding TF binding preferences, or 56

motifs, as determining potential binding locations of a TF within a genome assists identification of 57

putative target genes. Advancements in technologies that profile regulatory DNA have successfully 58

characterized the binding preferences of many plant TFs (reviewed in Franco-Zorrilla and Solano, 2017). 59

Protein binding microarrays (PBMs), a high throughput experimental technique, determine sequence 60

preferences of TFs by allowing fluorescently labeled proteins to bind to an array of oligonucleotides. 61

Using this technology, TF binding profiles were determined for 63 Arabidopsis (Arabidopsis thaliana) 62

TFs, representing 25 TF families, while Weirauch and co-workers identified motifs for >1000 TFs across 63

131 species (Franco-Zorrilla et al., 2014; Weirauch et al., 2014). Another in vitro assay, DNA affinity 64

purification sequencing (DAP-seq), combines in vitro expressed TFs with next-generation sequencing of a 65

genomic DNA library. Using this technique, binding profiles for 529 TFs in Arabidopsis have been 66

elucidated (O’Malley et al., 2016). In recent years, numerous TF chromatin immunoprecipitation (ChIP) 67

experiments have been performed, expanding our knowledge of TF binding in plants (Heyndrickx et al., 68

2014; Song et al., 2016). Collectively, these binding profiles offer an interesting resource to study TF 69

binding in the Arabidopsis genome for over 900 TFs (Kulkarni et al., 2018). 70

71

The simplest approach to delineate GRNs from these profiles is by naively mapping the TF motifs to the 72

nearest gene promoter. However, the high rate of false positives when mapping motifs to a DNA 73

sequence, especially if the motif is short and degenerate, results in low specificity to identify functional 74

regulatory events (Baxter et al., 2012). To overcome these issues, additional sources of evidence, such as 75

gene co-regulation or evolutionary sequence conservation, are frequently used to define functional 76

binding sites. Based on the hypothesis that a set of co-regulated genes are regulated by a similar cohort 77

of TFs, identification of over-represented sequences in the promoters of these genes can enrich for 78

functional true positives (Michael et al., 2008; Vandepoele et al., 2009; Hickman et al., 2017; Kulkarni et 79

al., 2018). An alternative approach involves filtering potential binding sites using conservation 80

information over large evolutionary distances. This method assumes that functionally important binding 81

sites will be under purifying selection, and as such, will be conserved between species. Filtering motif 82

matches using this metric substantially reduces the false positive rate (Vandepoele et al., 2006; Haudry 83

www.plantphysiol.orgon July 9, 2020 - Published by Downloaded from Copyright © 2019 American Society of Plant Biologists. All rights reserved.

Page 4: Enhanced maps of transcription factor binding sites …...2019/07/25  · 85 note that not all functional binding events are evolutionary conserved (Muino et al., 2016). 86 87 Recent

4

et al., 2013; Van de Velde et al., 2014; Burgess et al., 2015; Yu et al., 2015), although it is important to 84

note that not all functional binding events are evolutionary conserved (Muino et al., 2016). 85

86

Recent advances in the profiling of open chromatin have increased our understanding of regulatory DNA 87

in Arabidopsis (Zhang et al., 2012; Sullivan et al., 2014; Lu et al., 2017). Combined with cell type-specific 88

nuclear purification, methods such as Assay for Transposase‐Accessible Chromatin followed by DNA 89

sequencing (ATAC-Seq) offer unprecedented opportunities to identify cell type‐specific TF networks (Lu 90

et al., 2017; Maher et al., 2018; Sijacic et al., 2018). Nevertheless, elucidation of GRNs from chromatin 91

accessibility data requires detailed information about TF binding preferences in order to identify 92

potential binding sites within accessible regions of the genome, and therefore infer TF-target gene 93

regulatory interactions. 94

95

Based on the importance of motif mapping to find locations of potentially functional TF binding, in this 96

study we compared four frequently used motif mapping tools and performed a detailed evaluation of 97

their global performance for 40 TFs in Arabidopsis. We evaluated the similarities and differences 98

between these tools at a TF level and found that differences in tool sensitivity and specificity affect the 99

inference of GRNs. By combining the results from two tools into an Ensemble, we were able to improve 100

the identification of TF-target regulatory interactions in different experimental datasets. Using this 101

Ensemble approach we developed a protocol to elucidate cell type-specific GRNs from ATAC-Seq defined 102

accessible genomic regions. The results of this analysis, relative to the original study, offer a more 103

complete view on gene regulation in shoot apical meristem (SAM) stem and mesophyll cells in 104

Arabidopsis. 105

106

RESULTS 107

Performance of individual motif mapping tools to identify in vivo binding events 108

A wide variety of tools are used in the plant research community to map TF motifs (Supplemental Table 109

S1). We selected and evaluated four frequently used tools to map TF motifs in Arabidopsis: FIMO, 110

Cluster-Buster (CB), matrix-scan (MS), and MOODS (Frith et al., 2003; Turatsinze et al., 2008; Korhonen 111

et al., 2009; Grant et al., 2011). These tools were used to map a set of 66 motifs (corresponding to 40 112

TFs and 19 TF families; see Supplemental Table S2 for TF motif details) onto the Arabidopsis genome 113

(see "Materials and Methods"). The motifs, mainly derived from protein binding microarrays and DAP-114

Seq, were selected based on the availability of experimental ChIP-Seq datasets for the profiled TF. The 115

set of TFs included in this analysis have diverse roles in processes such as the cell cycle, flower 116

development, response to light or hormones, and defense responses. Motif matches (referred to as TF 117

binding sites; TFBSs) reported by the different tools were evaluated by counting the number of TFBSs 118

confirmed by ChIP-Seq datasets (precision). Recall for each tool was calculated as the fraction of regions 119

identified by ChIP-Seq that were covered by a motif match, that is, how many target genes are correctly 120

recovered by motif matches (median values of performance statistics in Table 1). FIMO produced the 121

lowest number of motif matches (2.4 million matches versus 19-34 million matches for the other tools) 122

www.plantphysiol.orgon July 9, 2020 - Published by Downloaded from Copyright © 2019 American Society of Plant Biologists. All rights reserved.

Page 5: Enhanced maps of transcription factor binding sites …...2019/07/25  · 85 note that not all functional binding events are evolutionary conserved (Muino et al., 2016). 86 87 Recent

5

and showed the highest precision among all tools. The median precision for FIMO is 5%, compared to 123

2.2-2.4% for the other tools (Figure 1A), indicating that FIMO reports a higher fraction of experimentally 124

supported matches. However, recall is low with the FIMO results as a consequence of the tool predicting 125

approximately 10-fold fewer matches relative to the other tools (22% median recall versus 36-48% for 126

the other tools; Figure 1B). Overall, these results suggest that FIMO misses some real TFBSs based on 127

the ChIP-Seq data, considering all matches. Due to the large variation in the total number of matches 128

predicted by each tool, we also evaluated the tool performance considering only the 7000 highest 129

scoring matches (top7000). The size of this subset was chosen to optimize the compromise between 130

precision and recall for CB (see "Materials and Methods"). Using this subset of matches, the median 131

precision and recall of all tools are similar (Figure 1A and 1B). In order to assess the false positive rate 132

(FPR) for each tool, TF motifs were mapped using shuffled promoter sequences of Arabidopsis genes 133

(see "Materials and Methods"). Due to its stringency, FIMO has the lowest FPR (Figure 1C), while 134

MOODS, which identifies the highest number of matches, has the highest FPR compared to the other 135

tools. Together with the above results, this suggests that many of the matches identified by MOODS are 136

false positives. Overall, the FPR for all tools was below 10%. 137

138

Following the evaluation of mapping tool performances, we next studied the effect of TF motif 139

complexity on the precision and recall values, using the information content (IC) of each motif. Given the 140

clustering pattern in Supplemental Figure S1, all matches predicted by FIMO and CB were considered for 141

this analysis. To examine the effect of motif complexity on the performance measures, 21 TFs were 142

selected for which more than one motif was available. For CB, for 15 TFs, the F1 score increased with 143

increasing motif complexity (Figure 2). For FIMO, however, this trend was observed for 8 TFs only. FIMO, 144

besides implementing a p-value threshold for calling motif matches, has an internal cut-off to restrict 145

spurious matches when used with low complexity motifs. This additional threshold is likely responsible 146

for the quality of the TFBSs found with FIMO being less dependent upon TF motif complexity. Of the TFs 147

selected for the above analysis, 13 had motifs from different sources, such as CisBP and DAP-Seq. We 148

next checked if the source of motifs had an impact on the performance measures. CisBP motifs, derived 149

from PBMs, were on an average shorter than motifs derived from DAP-Seq (average lengths for 150

CisBP=11.67 and DAP-Seq=14.55) and were less complex (average IC for CisBP=8 and DAP-Seq=10; 151

Supplemental Table S2). For ABI5, AGL15, ERF115, GFB3, HB7, and WRKY33, the F1 score was higher for 152

DAP-Seq motifs compared to CisBP. For the remainder of the TFs, where the complexity between the 153

two motifs did not vary, the F1 scores were similar. 154

155

Evaluation of unique motif matches reveals complementary between mapping tools 156

For the TFs included in our benchmark, the varying recovery of true positive matches suggests that each 157

tool performs differently depending on the complexity of the motif (Supplemental Figure S1). To 158

investigate the differences between tools further, we compared the motif matches confirmed by ChIP-159

Seq data for each tool. To account for the large differences in the number of matches reported by each 160

tool, only the top7000 matches per tool and per motif were used in this analysis. Conducting pairwise 161

comparisons between tools reveals that for 12 out of 66 motifs, the TFBSs identified uniquely by CB 162

have high recall rates (Supplemental Figure S2). This pattern is retained when matches found uniquely 163

www.plantphysiol.orgon July 9, 2020 - Published by Downloaded from Copyright © 2019 American Society of Plant Biologists. All rights reserved.

Page 6: Enhanced maps of transcription factor binding sites …...2019/07/25  · 85 note that not all functional binding events are evolutionary conserved (Muino et al., 2016). 86 87 Recent

6

by a particular tool, relative to all matches of the other tools combined, are used (Supplemental Figure 164

S3). For 65 motifs, the recall of the top7000 matches uniquely found with CB was larger than zero, 165

making it the only tool to identify functional matches for 98% of all TF motifs considered in this study 166

(Figure 3A). Moreover, for 21 of 65 motifs, the motif mappings from CB were able to achieve recall 167

values between 10% and 38%, considerably higher than the recall rates of other tools, which did not 168

exceed 10% (Figure 3B). This result highlights that CB is able to identify a unique set of functional 169

matches with high ChIP recovery for 32% of the studied motifs. 170

171

Another aspect in which the tools differ is in the length of TFBSs reported. The average motif match, 172

considering all matches, is 17.73 bp for CB, whereas for other tools, it is 13.72 bp (Table 1). In some 173

cases, the TFBSs identified by CB are longer than the motifs mapped. This difference is due to CB 174

merging TFBSs that are located closer than a specified gap parameter, with the default value of this 175

parameter set to 35 bp. To evaluate if the high recall rate of TFBSs unique to CB is due to the merging of 176

close TFBSs, the above analysis was repeated with the gap parameter set to 1 bp. As in the previous 177

results (Supplemental Figure S3), CB is distinct from the other tools by having high precision and recall 178

for a number of samples clustered at the bottom of the heatmap (Supplemental Figure S4). However, 179

relative to the findings when the default values were used, using a 1 bp gap parameter results in the 180

maximum recall reducing from 40% to 25%, potentially due to the unmerged matches of CB no longer 181

being unique to the tool. As a result of these findings, unless specified, all analyses performed with CB in 182

this study use the default gap parameter value. 183

184

Given the observation that the two clusters of tools in Supplemental Figure S3 capture complementary 185

sets of functional TFBS in their top-scoring matches, we next explored how these results can be unified 186

into an Ensemble approach. Comparing the global similarity of unique motif matches reveals that the 187

results from FIMO cluster with those of MOODS and MS, while the results from CB are distinct from the 188

other tools (Supplemental Figures S2 and S3). Due to the similarity of results from FIMO, MOODS, and 189

MS, only the results from FIMO were selected for the Ensemble. FIMO was selected as it achieved the 190

highest precision of the three tools, with the least number of motif matches. All matches found by FIMO 191

were combined with a set of quality matches from CB to overcome the recall problem of FIMO. The top 192

7000 matches, determined as the optimal number of matches to select based on considerations of 193

precision and recall (Supplemental Figure S5), were integrated into all matches of FIMO to form the 194

Ensemble set of matches (see "Materials and Methods"). 195

196

Ensemble motif mapping yields additional target genes when characterizing gene regulatory 197

networks from TF perturbation experiments 198

One of the fields in which motif mapping plays an important role is GRN inference. To validate the 199

applicability of the Ensemble approach to study GRNs in plants, we compared the regulatory links 200

predicted from the motif mapping to lists of genes that are differentially expressed after TF perturbation 201

(DE gene sets). TFs for which perturbation experiments have been conducted covered a wide range of 202

biological processes, such as AGL15 in embryogenesis, AP3 and PI in flower development, BES1 in plant 203

growth and development, FHY3 and PIFs in response to light, WRKY33 in defense response, and EIN3 in 204

www.plantphysiol.orgon July 9, 2020 - Published by Downloaded from Copyright © 2019 American Society of Plant Biologists. All rights reserved.

Page 7: Enhanced maps of transcription factor binding sites …...2019/07/25  · 85 note that not all functional binding events are evolutionary conserved (Muino et al., 2016). 86 87 Recent

7

response to ethylene (see "Materials and Methods"). To test the recall of the Ensemble, we investigated 205

whether the TFBSs corresponding to the perturbed TF (referred to as the correct TFBSs) were 206

significantly enriched in the promoters of the DE genes (hypergeometric test, FDR corrected p-207

value≤0.01). Furthermore, the subset of genes from the DE gene set that contained a correct TFBS were 208

compared with experimental ChIP-Seq data for the TF to identify bona fide target genes (see "Materials 209

and Methods"). For 9 of the 10 DE gene sets, a significant enrichment of the correct TFBSs were found 210

for the DE gene sets using the Ensemble. Out of these 9 sets of DE genes, the Ensemble showed better 211

recovery of ChIP-confirmed target genes for 5 sets (PIF4, WRKY33, EIN3, FHY3, and PI) compared to 212

FIMO. For the remaining sets, the rate of recovery was comparable to FIMO. In total, the Ensemble 213

method identified 41 target genes that were missed by FIMO for 10 DE sets (referred to as ‘extra 214

targets’), out of which 32 (78%) were confirmed using ChIP-Seq datasets for the respective TFs (Table 215

S3). For WRKY33 and PI, the Ensemble yielded the largest number of additional ChIP-confirmed target 216

genes: 16 for WRKY33 (Figure 4) and 8 for PI. Moreover, the FIMO matches lacked a significant 217

enrichment of the WRKY33 motif for WRKY33 perturbed genes. The target genes of WRKY33 which were 218

missed by FIMO included ZFAR1/CZF1 (AT2G40140) and ERF1 (AT3G23240), which are both involved in 219

defense response to biotic stimulus (Table 2). Other examples of target genes detected by Ensemble and 220

missed by FIMO included: HEC1 (AT5G67060) in the PI DE gene set, a well-known TF involved in 221

gynoecium development (Gremski et al., 2007), and BLH1 (AT2G35940) in the FHY3 DE set, a gene 222

known to be involved in the response to far-red light (Staneloni et al., 2009). A detailed intersection of 223

the Ensemble motif matches, the DE genes after TF perturbation, and the ChIP targets is shown in 224

Supplemental Figure S6. 225

226

An improved protocol to identify gene regulatory networks starting from accessible 227

chromatin regions 228

The identification of highly accessible open chromatin regions throughout the genome helps to 229

determine the location of potential regulatory elements. Recent advancements in experimental 230

technologies have allowed researchers to map open chromatin regions in specific plant cell types 231

(Maher et al., 2018; Sijacic et al., 2018). However, identifying which TFs are likely to bind these regions, 232

and how they affect gene expression, is still a major challenge. A recent study used ATAC-Seq to identify 233

transposase hypersensitive sites (THSs) specific to stem cells of the shoot apical meristem (SAM) and 234

leaf mesophyll cells (Sijacic et al., 2018). To identify potential TFs that bind to these THSs, Sijacic and co-235

workers used de novo motif discovery to identify over-represented motifs in these regions. Motifs found 236

de novo were compared to known TF motifs to identify potential regulators. The genomic locations of 237

these over-represented motifs, determined using FIMO, were then assigned to the closest gene to 238

identify potential cell type-specific target genes of the associated TFs. By selecting TFs showing cell type-239

specific expression, measured using high rank ratios (RR) in each cell (see "Materials and Methods"), and 240

that also had at least one de novo motif assigned to it, Sijacic et al. reported 23 and 128 TFs in SAM stem 241

and leaf mesophyll cells, respectively. 242

243

This traditional pipeline, besides having multiple steps to identify cell type-specific GRNs, is dependent 244

on de novo motif discovery tools and parameters. Furthermore, linking motifs found de novo with 245

www.plantphysiol.orgon July 9, 2020 - Published by Downloaded from Copyright © 2019 American Society of Plant Biologists. All rights reserved.

Page 8: Enhanced maps of transcription factor binding sites …...2019/07/25  · 85 note that not all functional binding events are evolutionary conserved (Muino et al., 2016). 86 87 Recent

8

known TFs can be challenging (Castro-Mondragon et al., 2017). In order to overcome some of these 246

problems, we developed a novel protocol in which the enrichment of TFBSs was directly compared 247

against a set of 2,132 SAM stem cell and 1,508 mesophyll-specific THSs, to identify putative regulators 248

and targets (see "Materials and Methods"). 249

250

Starting from 59 SAM stem cell-specific and 158 mesophyll cell-specific TFs having high RR, we 251

determined which motifs were significantly enriched in the corresponding cell type THSs using both 252

FIMO and Ensemble motif mappings (see "Materials and Methods"). The Ensemble approach reported a 253

larger number of significantly enriched TFBSs compared to FIMO in both cell types. Of 59 TF motifs 254

mapped, 29 were significantly enriched in SAM stem cell-specific THSs when Ensemble mappings were 255

used, whereas 25 motifs were enriched when FIMO motif mappings were used. Whereas 13 motifs 256

correspond to TFs also reported in the original study, 16 of the 29 significantly enriched motifs from the 257

Ensemble set corresponded to TFs that were not described by Sijacic et al. (2018). These TFs include 258

BRC2 from the TCP family, AGL24, AGL27, AGL31, and AGL70 belonging to the MADS family, IDD15 from 259

the C2H2 family, and additional TFs from the zf-HD and C2C2-Dof families. For mesophyll cells, 55% (87 260

out of 158) of the motifs were enriched for mesophyll THS regions using Ensemble motif mappings, 261

whereas for FIMO only 48% (77 out of 158) of the motifs showed a significant enrichment. Eleven of the 262

87 TFs found enriched using Ensemble motif mappings were not reported by Sijacic et al., and included 263

DREB2, AT1G33760, and RAP2.4 belonging to the AP2-EREBP family, CCA1 and LCL1 from a MYB-related 264

family, AT5G50915 from the bHLH family, bZIP60 from the bZIP family, SPL13 from the SBP family, 265

AT1G14580 from the C2H2 family, and WRKY30 from the WRKY family. 266

267

The TFs binding to all enriched motifs identified using the novel protocol in both SAM stem and 268

mesophyll cells were compared with the previously reported 23 and 128 TFs in the respective cell types 269

(Sijacic et al., 2018). Results from the Ensemble method showed enrichment for 13 of 23 motifs, 270

whereas FIMO TFBSs were enriched in 11 of 23 cases. Similarly, for mesophyll cells, 76 and 69 of 128 TF 271

motifs were found to be enriched using the Ensemble and FIMO, respectively (Supplemental Table S4). 272

Overall, our one-step protocol identified 116 regulators showing both significant TFBS enrichment for 273

THSs and cell type-specific expression, of which 23% (n=27) were not reported in the original study. 274

Conversely, for 62 TFs reported by Sijacic and co-workers no significant TFBS enrichment was found 275

using our protocol, suggesting the corresponding motifs do not occur more in the THSs than expected by 276

chance. 277

278

To understand how the choice of motif mapping tool affects GRN construction, we investigated the 279

differences between the Ensemble and FIMO motif mappings based on the putative target genes they 280

identify. In total, the Ensemble identified 6,917 targets for 29 significantly enriched motifs in SAM stem 281

cells, whereas FIMO identified 6,428 targets. To determine whether the extra targets identified using 282

the Ensemble are potentially functional, we evaluated their gene expression in each cell type. We 283

initially counted how many of the targets exhibit a two-fold expression difference (|log2(RR)| > 1) in 284

either of the cell types. Out of 489 extra targets identified by the Ensemble approach in SAM stem cells, 285

171 genes (35%) were expressed and 93 genes (19%) showed cell type-specific expression (-log2(RR) > 1; 286

Supplemental Table S4). The fraction of Ensemble-unique target genes which are expressed in a cell 287

www.plantphysiol.orgon July 9, 2020 - Published by Downloaded from Copyright © 2019 American Society of Plant Biologists. All rights reserved.

Page 9: Enhanced maps of transcription factor binding sites …...2019/07/25  · 85 note that not all functional binding events are evolutionary conserved (Muino et al., 2016). 86 87 Recent

9

type-specific manner are consistent across the different TFs (Figure 5A, TFs labelled blue indicate new 288

regulators). Nine of the cell type-specific genes show more than 6-fold higher expression in SAM stem 289

cells and are therefore likely to be bona fide target genes within the SAM stem cell-specific GRN (Table 290

3). Three of these nine genes (AT4G11211, AT5G02450, and AT5G13340) lack experimental evidence 291

about their biological function. The remaining 6 genes are known to be involved in a number of 292

processes based on experimental Gene Ontology annotations: primary root development (ATHB13), 293

xylem development (KNAT1), response to cold (DIN10), salt stress and abscisic acid (GASA14), and 294

defense response to bacterium (ERD5, TGA4). These genes are regulated by a diverse array of TFs, such 295

as IDD7, TCP7, ALC, AGL70, DAG2, AGL27, BRC2, JKD, KNAT1, AGL31, and AGL24, that were both 296

described in the original study or identified here. Interestingly, KNAT1 and TGA4, being TF themselves, 297

are regulated by multiple TFs (IDD7 and JKD regulate KNAT1, ALC and KNAT1 regulate TGA4), suggesting 298

some new transcriptional cascades in the SAM stem cell-specific GRN (Figure 5B). 299

300

For mesophyll cells, 574 of 1,660 new target genes (35%) were expressed in either of the cell types, 301

which is a similar fraction as that reported for SAM stem cells. The percentage of cell type-specific 302

targets identified using the Ensemble motif mappings was 24% for mesophyll cells, corresponding to 402 303

identified targets with higher expression (-log2(RR) < -1) only in mesophyll cells. 29 of these genes had 304

more than 6-fold expression in mesophyll cells (Table 3). The mesophyll cell-specific GRN of highly 305

expressed genes contained 77 regulatory interactions between 29 targets and 42 TFs, with many of 306

these new target genes being regulated by multiple TFs (Figure 5C). Several of the new target genes 307

have a role in hormone signaling, such as AOS, LOX3, and LOX4, reported to be jasmonic acid responsive, 308

and RRTF1, involved in ethylene biosynthesis. Examples of new unknown target genes are AT5G54165 309

(regulated by BIM1, BIM2, BIM3, PIF7, UNE10, and bHLH105), AT3G51660 (regulated by TCP2, TCP17, 310

and SPL1), and AT4G12005 (regulated by ARF7, LRL2, SPL14, CBF1, and CBF4). A complete set of 311

interactions between TFs and target genes in SAM stem and mesophyll cells predicted using the 312

Ensemble TFBS enrichment protocol are available as a Cytoscape network session file (Supplemental 313

Dataset S1). 314

315

DISCUSSION 316

Recent technological developments have made it possible to profile the chromatin state of particular 317

cell types with high specificity (Maher et al., 2018; Sijacic et al., 2018). This specificity has extended to 318

the level of single cells, allowing cell-to-cell variability in chromatin accessibility to be assessed. 319

However, the impact of these studies is dependent on determining the biological relevance of the 320

accessible regions, particularly if those regions are not located within genes. In addition, as the cost of 321

sequencing decreases and as long read sequencing technologies improve, the number of available 322

genome sequences will increase. While methods to annotate genes are relatively mature, methods to 323

annotate non-coding, regulatory regions are less so. One method of understanding the relevance of 324

accessible chromatin regions, and of annotating potential regulatory sequences, is to map known TF 325

binding preferences onto DNA sequences to identify likely locations bound by TFs. While many tools 326

www.plantphysiol.orgon July 9, 2020 - Published by Downloaded from Copyright © 2019 American Society of Plant Biologists. All rights reserved.

Page 10: Enhanced maps of transcription factor binding sites …...2019/07/25  · 85 note that not all functional binding events are evolutionary conserved (Muino et al., 2016). 86 87 Recent

10

exist to perform this mapping, each makes certain biological assumptions, and consequently it can be 327

unclear which tool leads to more reliable results in a particular situation. 328

329

In order to address this problem, we performed a detailed evaluation of motif mapping tools to 330

determine regulatory relationships in Arabidopsis. Precision and recall were determined for each tool 331

using ChIP-Seq data to assess true positives, revealing that although vastly different numbers of matches 332

were found for each tool, the ability to identify sites that are supported experimentally was similar when 333

similarly-sized subsets of top scoring matches were taken for each tool. FIMO, which is widely used in 334

the plant science community, gave the best precision within its predicted motif matches, but fails to 335

recover some true motif matches due to its stringent settings. Using a benchmark dataset consisting of 336

40 TFs, we observed that FIMO and CB offer a complementary view on functional TFBSs. We found that 337

when focusing on top7000 matches, despite having a higher false positive rate than FIMO using default 338

settings, CB identified a set of unique motif matches, up to 38% of which were confirmed by ChIP-Seq 339

data. Combining the results of FIMO and CB into an Ensemble set of motif mappings resulted in 340

improved recall relative to FIMO, when motif enrichment of TF perturbation DE gene sets was 341

performed. Overlap of enriched motifs with the ChIP-Seq datasets revealed that, for 5 of the DE gene 342

sets, the Ensemble identified 32 extra functional targets missed by FIMO. Several of these additional TF-343

target regulatory interactions identified using the Ensemble approach are supported by literature. In 344

independent WRKY33 perturbation experiments (Birkenbihl et al., 2012; Sham et al., 2017), half of the 345

extra targets identified by the Ensemble approach were also found to be DE between wrky33 mutants 346

and wild-type Arabidopsis plants (ERF1, FLP / MYB124, DOF1, WRKY15, WRKY30, SZF2, AT2G43140, and 347

AT3G55950). In addition to the known role of WRKY33 in defense responses (Birkenbihl et al., 2012), 348

expression of this TF is also associated with broad stress conditions such as cold, salinity, wounding, and 349

biotic stress (Ma and Bohnert, 2007). Of the 16 extra WRKY33 targets, MYB124, WRKY15, WRKY22, 350

WRKY30, SZF1, and SZF2 have all been found to play roles in a range of stress responses (Sun et al., 351

2007; Xie et al., 2010; Vanderauwera et al., 2012; Scarpeci et al., 2013; Kloth et al., 2016), supporting the 352

proposed function of WRKY33 as a central stress response factor. The identification of BLH1 as an 353

additional target of FHY3, which integrates responses to far-red light and abscisic acid signaling (Wang 354

and Deng, 2002; Tang et al., 2013), suggests a role for FHY3 during germination and early seedling 355

development, as BLH1 is known to be involved in ABA-mediated signaling pathway acting during early 356

plant development (Kim et al., 2013). The finding of FLOWERING BHLH 1 (FBH1) / bHLH80 as an 357

additional target of FHY3 is also consistent with the role the gene has in light signaling, as FBH1 has been 358

found to control CONSTANS, a key photoperiod gene, and influence the response of the circadian clock 359

to temperature (Ito et al., 2012; Nagel et al., 2014). Finally, the function of PI as a floral homeotic gene 360

in the shoot apical meristem to ensure correct floral organ determination (Goto and Meyerowitz, 1994) 361

is in line with the regulation of HEC1, a bHLH transcription factor which also acts downstream of 362

WUSCHEL to control stem cell proliferation (Schuster et al., 2014). The supporting literature for these 363

interactions strongly suggests that the additional targets identified by the Ensemble motif mappings are 364

functional. 365

366

Next, we introduced a novel protocol to learn GRNs from accessible open chromatin regions profiled 367

using cell type-specific ATAC-Seq. Starting from all TFs for which motif information was available, TFBS 368

www.plantphysiol.orgon July 9, 2020 - Published by Downloaded from Copyright © 2019 American Society of Plant Biologists. All rights reserved.

Page 11: Enhanced maps of transcription factor binding sites …...2019/07/25  · 85 note that not all functional binding events are evolutionary conserved (Muino et al., 2016). 86 87 Recent

11

enrichment was combined with information about cell type-specific expression to infer GRNs. Both 369

traditional and our new protocol inherently depend on the availability of TF motifs, which is a limitation. 370

However, the protocol employed in this study is independent of both de novo motif discovery and 371

similarity searches of de novo found motifs against motif databases, which is an important step in 372

traditional pipelines to learn regulatory interactions from open chromatin regions (Sullivan et al., 2014; 373

Maher et al., 2018; Sijacic et al., 2018). Moreover, the protocol not only reduces the number of steps to 374

go from cell type-specific THSs to GRNs, but also identifies TFs missed in the previous study by Sijacic et 375

al. (29 and 87 additional significantly enriched TF motifs in SAM stem cells and leaf mesophyll cells, 376

respectively). Conversely, 62 TFs described in the original study were not found to be enriched using our 377

protocol, suggesting there is still room for improvement to learn complete GRNs starting from cell type-378

specific accessible regions. Apart from identifying additional regulators, we observed that the 379

performance of the Ensemble approach surpassed that of FIMO when used to map motifs as part of the 380

protocol reported here. Additional enriched TF motifs were identified using the Ensemble, with 4 381

additional regulators out of 29 total TFs in SAM stem cells and 10 additional TFs out of 87 in mesophyll 382

cells. A striking addition to the set of TF motifs enriched in the SAM stem cell THSs are those of the 383

MADS-box containing genes MAF1/FLM, MAF2, MAF3, and AGL24. All of these genes have been found 384

to influence flowering time, and have positions within a TF network in the SAM that integrates 385

environmental and developmental signals to control flowering (Yu et al., 2002; de Folter et al., 2005; 386

Werner et al., 2005; Rosloski et al., 2010; Capovilla et al., 2017). The motifs of these TFs were found 387

enriched in THSs specific to the SAM stem cells, suggesting that signal integration is occurring in the 388

stem cells at the apex. In addition to these motifs, the motifs corresponding to KNAT1 and AtCSP2 were 389

also enriched. Correspondingly, the expression of both genes has previously been found to be localized 390

to the SAM, with KNAT1 being a homeodomain important for leaf morphogenesis and AtCSP2 involved 391

in the transition to flowering and silique development (Lincoln et al., 1994; Nakaminami et al., 2009). In 392

contrast to the SAM, the additional mesophyll cell-specific enriched motifs contain TFs known to be 393

involved with stress responses, the circadian clock, and growth. DREB2 is involved in controlling 394

drought-responsive genes (Sakuma et al., 2006), while WRKY30 has been found to be important for both 395

biotic and abiotic stress responses (Scarpeci et al., 2013). In addition to stress responses, motifs from 396

TFs involved in the age-related flowering time pathway (SPL13) and the circadian clock (AtCCA1) are 397

enriched (Wang and Tobin, 1998; Xu et al., 2016), consistent with the leaf playing a key role in 398

environmental sensing. Finally, ARF7 is an auxin regulated TF which promotes leaf expansion (Wilmoth 399

et al., 2005). 400

401

Taken together, the additional enriched motifs identified in the SAM stem cell and leaf mesophyll 402

specific THSs are consistent with the central role of the SAM in flowering time control, and of the leaf 403

responding to stress elicitors and circadian clock entrainment. This demonstrates that the Ensemble-404

based approach leads to biologically relevant results that contribute towards a more complete picture of 405

the GRNs active in these two tissues, and that might otherwise be missed when using de novo motif 406

based methods. In addition, the extra target genes identified by the Ensemble, comprising 93 and 402 407

target genes for SAM stem and mesophyll cells, respectively, were found to be highly expressed in the 408

corresponding cell types, suggesting that the unique regulators as well as their targets identified by the 409

Ensemble are biologically relevant. 410

www.plantphysiol.orgon July 9, 2020 - Published by Downloaded from Copyright © 2019 American Society of Plant Biologists. All rights reserved.

Page 12: Enhanced maps of transcription factor binding sites …...2019/07/25  · 85 note that not all functional binding events are evolutionary conserved (Muino et al., 2016). 86 87 Recent

12

411

In conclusion, we have shown that an integrative approach, utilizing two complementary motif mapping 412

tools, results in improved power to detect functional TFBSs relative to FIMO, the most frequently used 413

tool. This approach facilitates more accurate inference of GRNs and will be especially important as 414

chromatin accessibility data continues to be collected. While motif mapping alone is insufficient to 415

accurately map functional regulatory interactions, determining likely positions can help direct future 416

experimental work. A supplemental website offering the Ensemble TFBS mapping results for 1,793 TF 417

motifs corresponding to 916 Arabidopsis TFs is available at 418

http://bioinformatics.psb.ugent.be/cig_data/motifmappings_ath/ as a file in BED format. 419

420

MATERIALS AND METHODS 421

Collection of TFBSs 422

The motif collection used for this analysis consisted of 66 Arabidopsis (Arabidopsis thaliana) position 423

weight matrices (PWMs) representing 40 TFs from different sources including CisBP (Weirauch et al., 424

2014), Franco-Zorilla and co-workers (Franco-Zorrilla et al., 2014), Plant Cistrome Database (O’Malley et 425

al., 2016), and JASPAR 2016 (Mathelier et al., 2015). IC of PWMs was calculated using the ‘convert-426

matrix’ command from rsa-tools version 2012-05-25 with –return option set to ‘info’ (Turatsinze et al., 427

2008). TFs were assigned to gene families based on the PlnTFDB 3.0 database (Pérez-Rodríguez et al., 428

2009). 429

430

PWM mapping using different tools 431

Four mapping tools that are widely used in the plant science community were evaluated in this study. 432

Cluster-Buster (version ‘Compiled on Sep 22 2017’; (Frith et al., 2003)) was run with the '–c' parameter 433

set to zero, as the other tools do not offer prediction of motif clusters. For FIMO, default parameters 434

were used (meme version 4.11.4; (Grant et al., 2011)). For MOODS (version 1.9.3; (Korhonen et al., 435

2009)), a p-value threshold of <0.0001 was used to enable comparison to FIMO. This threshold was also 436

used for matrix-scan while all other parameters were set to default (rsa-tools version 2012-05-25; 437

(Turatsinze et al., 2008)). The command lines for the different tools are: 438

439

cbust-linux $PWMfile $seqFile -c 0 -f 1 440

fimo -o $output $PWMfile $seqFile 441

moods_dna.py -m $PWMfile -s $seqFile -p $threshold --batch -o $output 442

matrix-scan -v 1 -matrix_format cb -m $PWMfile -i $seqFile -2str -return limits -return sites -seq_format 443

fasta -o $output 444

445

$threshold was set to 0.0001 (default value for FIMO and matrix-scan) 446

447

www.plantphysiol.orgon July 9, 2020 - Published by Downloaded from Copyright © 2019 American Society of Plant Biologists. All rights reserved.

Page 13: Enhanced maps of transcription factor binding sites …...2019/07/25  · 85 note that not all functional binding events are evolutionary conserved (Muino et al., 2016). 86 87 Recent

13

Extraction of promoter regions 448

In addition to the TAIR10 Arabidopsis genome annotation, a set of 5,711 non-coding RNAs (ncRNAs) 449

described by (Liu et al., 2012) were added, resulting in a dataset covering 38,966 genes (Lamesch et al., 450

2012). For all genes a promoter region 5000bp upstream of the translation start site and 1000bp 451

downstream of the translation end site, including introns, were used. If another gene was present 452

upstream of the gene, the region is cut where this upstream gene starts or ends. 453

454

Estimation of recall, precision, and false positive rate 455

For each TF, all PWM matches from each mapping tool were overlapped with publicly available TF ChIP-456

Seq data (Table S5). BEDTools was used to intersect the BED files, using the ‘-f’ option set to 1 for 457

complete overlap (Quinlan, 2014). Precision was calculated as the number of TFBS matches confirmed 458

by ChIP-Seq divided by the total number of matches. Recall was calculated as the number of ChIP-Seq 459

peaks for the studied TF that were covered by motif matches, divided by the total number of ChIP-Seq 460

peaks. 461

462

To calculate the false positive rate (FPR) of the motif mappers, shuffled promoters (n=38,966) were 463

generated by shuffling the sequences of the real promoters. The 66 TFBSs were mapped to these 464

shuffled promoters. Following (Jayaram et al., 2016), actual negatives (ANs) were calculated for every 465

promoter and every motif as the length of the promoter divided by the length of motif. The FPR was 466

then calculated as the number of matches predicted by a specific tool divided by AN. The FPR value for a 467

TFBS is the average over all promoters. 468

469

Selection of optimal number of top scoring matches 470

To define the set of matches of CB to combine with FIMO, we took progressively larger sets of CB 471

matches and evaluated which set size resulted in the highest F1 score, a metric which combines 472

precision and recall (Supplemental Figure S5). The F1 score is the harmonic average of the precision and 473

recall, where an F1 score reaches its best value at 1 (perfect precision and recall) and worst at 0. An 474

optimal F1 score was observed between 7000 and 9000 matches (Supplemental Figure S5). Based on 475

this observation, the top scoring 7000 matches of were selected to keep an optimal balance between 476

precision and recall for the CB matches. The same number was also used to identify performance of 477

individual mapping tool by considering equal number of top scoring matches for Figure 1. 478

479

Enrichment on DE genes after TF perturbation 480

10 publicly available DE gene sets after TF perturbation were used to determine motif enrichment (Table 481

S6). We determined, for each TF, the number of DE genes with a proximal TFBS. The significance of this 482

overlap was determined using the hypergeometric distribution. For each enriched motif, the multiple 483

testing corrected p-value (or q-value) of enrichment is determined using the Benjamini-Hochberg 484

correction. Only q-values ≤0.01 were considered significant. For the motifs that are both enriched in the 485

DE and correspond to the perturbed TF, the subset of genes having that motif were retrieved and 486

www.plantphysiol.orgon July 9, 2020 - Published by Downloaded from Copyright © 2019 American Society of Plant Biologists. All rights reserved.

Page 14: Enhanced maps of transcription factor binding sites …...2019/07/25  · 85 note that not all functional binding events are evolutionary conserved (Muino et al., 2016). 86 87 Recent

14

compared with TF ChIP-Seq binding data (denoted ‘ChIP confirmed hits’ in Table 2). The ChIP-Seq data 487

sets used are the same as discussed in the section ‘Estimation of recall, precision, and false positive 488

rate’, Table S5. 489

490

Case study on cell type-specific transposase hypersensitive sites (THSs) 491

Based on the ATAC-Seq datasets from (Sijacic et al., 2018), we defined a set of THSs for stem cell and 492

mesophyll cells. Candidate regulators were predicted using the TFBS information present in the mapping 493

file. We identified a set of specific THSs for both cell types, based on a 2-fold (or higher) difference in the 494

ratio between the stem cell and mesophyll counts, yielding two region files with 2,132 stem cell THSs 495

and 1,508 mesophyll THSs (Table S2). Using the TFBS mappings from FIMO and Ensemble, the 496

significance of the overlap between a specific TFBS and a THS region file was assessed. To select the TF 497

motifs for enrichment analysis, RR for each gene was computed by considering expression ranks from 498

Sijacic et al. (2018). RR was calculated as the ratio of expression rank in stem and mesophyll cells. Genes 499

with -log2(RR) > 1 were called SAM stem cell-specific and -log2(RR) < -1 were called mesophyll specific 500

genes. After this selection, 59 and 158 TFs for the SAM stem cell and mesophyll cell respectively were 501

considered for the analysis. These TFs included the TFs reported in Sijacic et al. (2018). 502

503

The THS region file and the mapped TFBSs for a given tool (after running BEDTools merge per TFBS) 504

were formatted as BED files and the overlap between both files was determined using the BEDTools 505

function intersectBed with ‘-u’ parameter and the ‘–f’ parameter set to 0.5. As such, we obtained for 506

each THS region file and each TFBS an observed number of mapped TFBSs overlapping with THSs 507

(Supplemental Figure S7). To determine the significance of this observed overlap, the expected amount 508

of overlapping TFBS with the same THS region file was determined by shuffling the TFBS mapping bed 509

file 1000 times, using shuffleBed with the ‘-noOverlapping’ option enabled across the pre-defined 510

promoter regions (described in “Extraction of promoter regions” section). The overlap with the THS 511

region file was determined for each shuffled file and the median number of TFBS over all shuffled files 512

was used as a measure for the expected overlap. This estimation was used to calculate the fold 513

enrichment, defined as the ratio between observed overlap and expected overlap by chance. An 514

empirical p-value was determined by counting how many times the expected overlap was bigger than or 515

equal to the observed overlap. Only cases where p-value ≤0.01 were considered as significant. 516

517

Command line for the pipeline 518

# find how many TFBSs ($motiffile) overlap with HS sites ($regionBed) using Bedtools 519

realNumber=`bedtools intersect -a $motiffile -b $regionBed -u -f 0.5 | wc -l` 520

# for nShuffling times generate the shuffled TFBSs, check their overlap with HS sites and save the 521

numbers in “shufflednumbers” file. 522

for i in `seq 1 $nShuffling`; 523

do 524

shuffledFile="shuffled_"$motifid"_"$i".out" 525

bedtools shuffle -i $motiffile -g $chromLength -noOverlapping -excl $motiffile -incl $promoterBed 526

> $shuffledFile 527

www.plantphysiol.orgon July 9, 2020 - Published by Downloaded from Copyright © 2019 American Society of Plant Biologists. All rights reserved.

Page 15: Enhanced maps of transcription factor binding sites …...2019/07/25  · 85 note that not all functional binding events are evolutionary conserved (Muino et al., 2016). 86 87 Recent

15

number=`bedtools intersect -a $shuffledFile -b $regionBed -u -f 0.5 | wc -l` 528

echo -e "$i\t$number" >> $shufflednumbers 529

done 530

# calculate the p-value of enrichment 531

countBigger=0 532

for eachNumber in`cat $shufflednumbers`; 533

do 534

if [ $eachNumber -ge realNumber]; 535

then 536

countBigger=$($countBigger+1) 537

done 538

pvalue=$countBigger/$nShuffling 539

540

Statistical analyses 541

542

The significance of the overlap between DE genes and the presence of a proximal TFBS was determined 543

by performing a hypergeometric test using a custom script. Benjamini-Hochberg multiple testing 544

correction was performed on the calculated p-values using the p.adjust function in the statistical 545

programming language R. To determine whether the overlap between TFBSs and THSs was significant, 546

an empirical p-value was calculated by shuffling the promoter sequences as detained in “Materials and 547

Methods”. 548

549

550

SUPPLEMENTAL DATA 551

Supplemental Figure S1. TF level performance of TFBS mapping tools. 552

Supplemental Figure S2. TF level performance of unique matches considering pairwise combinations of 553

tools for top-scoring 7000 matches. 554

Supplemental Figure S3. TF level performance of unique matches considering one tool against all other 555

tools for top-scoring 7000 matches. 556

Supplemental Figure S4. TF level performance of unique matches considering one tool against all other 557

tools for top-scoring 7000 matches and CB gap parameter set to 1. 558

Supplemental Figure S5. The relationship between F1 score and subset size suggests the top 7000 559

highest scoring matches of Cluster-Buster should be used in the Ensemble. 560

Supplemental Figure S6. Overlap analysis for perturbation experiments. 561

Supplemental Figure S7. Cartoon for an improved protocol to identify GRNs starting from accessible 562

chromatin regions. 563

564

Supplemental Table S1. List of publications in plant science community using different mapping tools. 565

Supplemental Table S2. Overview of 66 TF motifs selected to evaluate the performance of motif 566

mapping tools. 567

www.plantphysiol.orgon July 9, 2020 - Published by Downloaded from Copyright © 2019 American Society of Plant Biologists. All rights reserved.

Page 16: Enhanced maps of transcription factor binding sites …...2019/07/25  · 85 note that not all functional binding events are evolutionary conserved (Muino et al., 2016). 86 87 Recent

16

Supplemental Table S3. TFBS enrichment results for DE gene sets. 568

Supplemental Table S4. List of TFs considered for ATAC-seq case-study with the distribution of their 569

target genes in stem and mesophyll cells. 570

Supplemental Table S5. Overview of TF ChIP-Seq datasets used for estimation of precision and recall. 571

Supplemental Table S6. Overview of DE gene sets after TF perturbation used for the case-study. 572

573

Supplemental Dataset S1. Cystoscope session file with GRNs in SAM stem and mesophyll cells described 574

in the case-study. 575

576

FUNDING 577

S.R.K. is funded by Research Foundation–Flanders [G001015N]. 578

579

ACKNOWLEDGMENTS 580

We thank Francois Bucchini for the technical assistance to set up the supplemental website. 581

582

www.plantphysiol.orgon July 9, 2020 - Published by Downloaded from Copyright © 2019 American Society of Plant Biologists. All rights reserved.

Page 17: Enhanced maps of transcription factor binding sites …...2019/07/25  · 85 note that not all functional binding events are evolutionary conserved (Muino et al., 2016). 86 87 Recent

17

TABLES 583

Table 1. Performance statistics of mapping 66 TF motifs using different motif mapping tools and an 584

Ensemble approach. 585

586

Mapping tool

Total Matches

Total Matches Confirmed

#Bases Average Base Length

Median Precision

Median Recall

Median F1 Score

CB 26,930,509 1,605,815 477,611,367 17.73 2.26% 36.14% 4.36%

FIMO 2,447,772 232,549 33,849,397 13.83 4.91% 22.09% 8.38%

MOODS 34,338,371 1,956,294 467,805,766 13.62 2.37% 48.47% 4.49%

MS 19,970,225 1,030,288 273,845,141 13.71 2.43% 39.32% 4.67%

Ensemble1 2,837,772 291,794 58,252,718 20.53 5.04% 23.72% 8.17% 1 Ensemble = All matches of FIMO + top 7000 scoring matches of CB 587

588

589

www.plantphysiol.orgon July 9, 2020 - Published by Downloaded from Copyright © 2019 American Society of Plant Biologists. All rights reserved.

Page 18: Enhanced maps of transcription factor binding sites …...2019/07/25  · 85 note that not all functional binding events are evolutionary conserved (Muino et al., 2016). 86 87 Recent

18

Table 2. List of ChIP confirmed targets only identified by the Ensemble approach when comparing 590

enriched TFBSs with perturbed DE gene sets. 591

592

TF #Extra ChIP confirmed

targets

ChIP confirmed targets

WRKY33 16 AT3G23240(ERF1), AT1G14350(MYB124), AT1G51700(DOF1), AT2G23320(WRKY15), AT2G43140, AT2G01940(IDD15), AT2G40140(SZF2), AT3G55950 (CCR3),

AT2G36960(TKI1), AT3G55980(SZF1), AT3G27785(MYB118), AT4G11070(WRKY41), AT4G01250(WRKY22), AT5G56960, AT5G24110(WRKY30), AT5G56550

FHY3 6 AT2G35940(BLH1), AT1G13020(EIF4B2), AT1G35460(bHLH80), AT2G33860(ARF3), AT2G39130, AT5G01780

PI 8 AT5G67060(HEC1), AT1G08570(ACHT4), AT1G12240(VI2), AT2G19110(HMA4), AT3G56360, AT3G56370, AT4G01120(bZIP54), AT5G07350(Tudor1)

593

594

www.plantphysiol.orgon July 9, 2020 - Published by Downloaded from Copyright © 2019 American Society of Plant Biologists. All rights reserved.

Page 19: Enhanced maps of transcription factor binding sites …...2019/07/25  · 85 note that not all functional binding events are evolutionary conserved (Muino et al., 2016). 86 87 Recent

19

Table 3. List of bona fide targets identified in SAM stem and mesophyll cells using the novel TFBS 595

enrichment protocol. 596

597

Cell type # targets showing cell type-specific expression

Target genes

SAM stem 9 AT1G69780 (ATHB13), AT3G30775 (ERD5), AT4G08150 (KNAT1), AT4G11211, AT5G20250 (DIN10), AT5G13340, AT5G10030 (TGA4), AT5G02450, AT5G14920 (GASA14)

Mesophyll 29 AT1G14580, AT1G17420 (LOX3), AT1G19450, AT1G51090, AT1G59870 (PDR8), AT1G61890, AT1G72520 (LOX4), AT1G77760 (NR1), AT2G15020, AT2G26530 (AR781),

AT2G27310, AT2G36990 (SIG6), AT2G39200 (MLO12), AT3G21670 (NPF6.4), AT3G22060, AT3G24190, AT3G51660,

AT3G51895 (AST12), AT3G54020 (AtIPCS1), AT4G02970 (AT7SL-1), AT4G12005, AT4G21570, AT4G34410 (RRTF1),

AT5G41740, AT5G42650 (AOS), AT5G44070 (ARA8), AT5G49520 (WRKY48), AT5G49730 (FRO6), AT5G54165

598

599

www.plantphysiol.orgon July 9, 2020 - Published by Downloaded from Copyright © 2019 American Society of Plant Biologists. All rights reserved.

Page 20: Enhanced maps of transcription factor binding sites …...2019/07/25  · 85 note that not all functional binding events are evolutionary conserved (Muino et al., 2016). 86 87 Recent

20

FIGURE LEGENDS 600

Figure 1. Global performances of motif mapping tools in Arabidopsis. (A) and (B) Precision and recall of 601

motif matches considering all matches (in red) and top-scoring 7000 matches (in cyan). (C) Boxplot 602

showing the false positive rate (FPR) for every tool. Boxes indicate the interquartile range of the data, 603

with the median indicated as a horizontal line within the box. The whiskers show the range of the data. 604

The precision, recall, and FPR values were calculated for each of the 66 motifs. 605

606

Figure 2. Variation of motif mapping accuracy in function of TF motif complexity. Scatterplot showing 607

the effect of motif complexity, quantified using the information content of a motif, on F1 scores for 608

Cluster-Buster (CB) and FIMO for 21 TFs. Each motif is visualized through a specific shape indicating the 609

TF it belongs to and colored based on the source of that motif. An increase and decrease in F1 score in 610

function of motif complexity is marked with blue and red, respectively. 611

612

Figure 3. Evaluation of unique motif matches predicted by each tool. (A) Barplot showing the fraction 613

of transcription factor binding sites (TFBSs) with recall>0 considering unique matches in top7000 of one 614

tool compared to the matches of all other tools combined. Whereas green series indicates motifs for 615

which the unique matches reported by that tool do not show a recall above 10% compared to the ChIP-616

Seq data, orange, purple and pink series depict unique motif matches with a recall higher than 10%, 20% 617

and 30%, respectively. (B) Heatmap showing the recall for each tool for TFBSs where CB outperforms the 618

other tools. Only motifs part of the orange, purple, and pink series from panel A where the recall of CB 619

was >10% are shown. 620

621

Figure 4. Overlap analysis for WRKY33 perturbation experiment. UpSetR plot showing the overlap 622

between the FIMO targets, the Ensemble, the perturbed differentially expressed (DE) genes, and the 623

ChIP targets for WRKY33. Overlap of target genes predicted by FIMO and Ensemble with ChIP targets 624

show a better recovery of ChIP-confirmed targets using the Ensemble. 625

626

Figure 5. Results of the new protocol to identify potential regulators in SAM stem and mesophyll cell-627

specific ATAC-seq regions. (A) Barplot showing extra target genes obtained using the Ensemble 628

approach for shoot apical meristem (SAM) stem cell. Grey sections show how many of the extra targets 629

have an expression in either of the two cell types. Shaded regions show the target genes that are 630

specific to SAM stem cells. TFs labelled in brown are the TFs reported in Sijacic et al., 2018 and TFs 631

labelled in blue are the TFs only identified using the new protocol. Only TFs that have ≥ 10 extra targets 632

are shown. (B) and (C) A gene regulatory network (GRN) showing SAM stem and mesophyll cell-specific 633

targets identified by enriched TFs, respectively. Round green/white circles refer to the genes that have 634

higher expression in SAM stem/mesophyll cell, while diamonds represent TFs. All nodes that have an 635

incoming edge are target genes having high SAM stem/mesophyll cell-specific expression. The size of 636

each node corresponds to the expression specificity, determined using the ratio of expression rank (RR), 637

of the gene in the respective cell type. 638

639

www.plantphysiol.orgon July 9, 2020 - Published by Downloaded from Copyright © 2019 American Society of Plant Biologists. All rights reserved.

Page 21: Enhanced maps of transcription factor binding sites …...2019/07/25  · 85 note that not all functional binding events are evolutionary conserved (Muino et al., 2016). 86 87 Recent

21

LITERATURE CITED 640

641

Baxter L, Jironkin A, Hickman R, Moore J, Barrington C, Krusche P, Dyer NP, Buchanan-Wollaston V, 642

Tiskin A, Beynon J, Denby K, Ott S (2012) Conserved noncoding sequences highlight shared 643

components of regulatory networks in dicotyledonous plants. Plant Cell 24: 3949-3965 644

Birkenbihl RP, Diezel C, Somssich IE (2012) Arabidopsis WRKY33 is a key transcriptional regulator of 645

hormonal and metabolic responses toward Botrytis cinerea infection. Plant Physiol 159: 266-285 646

Burgess DG, Xu J, Freeling M (2015) Advances in understanding cis regulation of the plant gene with an 647

emphasis on comparative genomics. Curr Opin Plant Biol 27: 141-147 648

Capovilla G, Symeonidi E, Wu R, Schmid M (2017) Contribution of major FLM isoforms to temperature-649

dependent flowering in Arabidopsis thaliana. J Exp Bot 68: 5117-5127 650

Castro-Mondragon JA, Jaeger S, Thieffry D, Thomas-Chollier M, van Helden J (2017) RSAT matrix-651

clustering: dynamic exploration and redundancy reduction of transcription factor binding motif 652

collections. Nucleic Acids Res 45: e119 653

de Folter S, Immink RG, Kieffer M, Parenicova L, Henz SR, Weigel D, Busscher M, Kooiker M, Colombo 654

L, Kater MM, Davies B, Angenent GC (2005) Comprehensive interaction map of the Arabidopsis 655

MADS Box transcription factors. Plant Cell 17: 1424-1433 656

Franco-Zorrilla JM, López-Vidriero I, Carrasco JL, Godoy M, Vera P, Solano R (2014) DNA-binding 657

specificities of plant transcription factors and their potential to define target genes. Proceedings 658

of the National Academy of Sciences 111: 2367-2372 659

Franco-Zorrilla JM, Solano R (2017) Identification of plant transcription factor target sequences. Biochim 660

Biophys Acta 1860: 21-30 661

Frith MC, Li MC, Weng Z (2003) Cluster-Buster: Finding dense clusters of motifs in DNA sequences. 662

Nucleic Acids Res 31: 3666-3668 663

Goto K, Meyerowitz EM (1994) Function and regulation of the Arabidopsis floral homeotic gene 664

PISTILLATA. Genes Dev 8: 1548-1560 665

Grant CE, Bailey TL, Noble WS (2011) FIMO: scanning for occurrences of a given motif. Bioinformatics 666

27: 1017-1018 667

Gremski K, Ditta G, Yanofsky MF (2007) The HECATE genes regulate female reproductive tract 668

development in Arabidopsis thaliana. Development 134: 3593-3601 669

Haudry A, Platts AE, Vello E, Hoen DR, Leclercq M, Williamson RJ, Forczek E, Joly-Lopez Z, Steffen JG, 670

Hazzouri KM (2013) An atlas of over 90,000 conserved noncoding sequences provides insight 671

into crucifer regulatory regions. Nature genetics 45: 891-898 672

Heyndrickx KS, Van de Velde J, Wang C, Weigel D, Vandepoele K (2014) A functional and evolutionary 673

perspective on transcription factor binding in Arabidopsis thaliana. Plant Cell 26: 3894-3910 674

Hickman R, Van Verk MC, Van Dijken AJH, Mendes MP, Vroegop-Vos IA, Caarls L, Steenbergen M, Van 675

der Nagel I, Wesselink GJ, Jironkin A, Talbot A, Rhodes J, De Vries M, Schuurink RC, Denby K, 676

Pieterse CMJ, Van Wees SCM (2017) Architecture and Dynamics of the Jasmonic Acid Gene 677

Regulatory Network. Plant Cell 29: 2086-2105 678

Ito S, Song YH, Josephson-Day AR, Miller RJ, Breton G, Olmstead RG, Imaizumi T (2012) FLOWERING 679

BHLH transcriptional activators control expression of the photoperiodic flowering regulator 680

CONSTANS in Arabidopsis. Proc Natl Acad Sci U S A 109: 3582-3587 681

Jayaram N, Usvyat D, AC RM (2016) Evaluating tools for transcription factor binding site prediction. 682

BMC Bioinformatics 683

Kim D, Cho YH, Ryu H, Kim Y, Kim TH, Hwang I (2013) BLH1 and KNAT3 modulate ABA responses during 684

germination and early seedling development in Arabidopsis. Plant J 75: 755-766 685

www.plantphysiol.orgon July 9, 2020 - Published by Downloaded from Copyright © 2019 American Society of Plant Biologists. All rights reserved.

Page 22: Enhanced maps of transcription factor binding sites …...2019/07/25  · 85 note that not all functional binding events are evolutionary conserved (Muino et al., 2016). 86 87 Recent

22

Kloth KJ, Wiegers GL, Busscher-Lange J, van Haarst JC, Kruijer W, Bouwmeester HJ, Dicke M, Jongsma 686

MA (2016) AtWRKY22 promotes susceptibility to aphids and modulates salicylic acid and 687

jasmonic acid signalling. J Exp Bot 67: 3383-3396 688

Korhonen J, Martinmaki P, Pizzi C, Rastas P, Ukkonen E (2009) MOODS: fast search for position weight 689

matrix matches in DNA sequences. Bioinformatics 25: 3181-3182 690

Kulkarni SR, Vaneechoutte D, Van de Velde J, Vandepoele K (2018) TF2Network: predicting 691

transcription factor regulators and gene regulatory networks in Arabidopsis using publicly 692

available binding site information. Nucleic Acids Res 46: e31 693

Lamesch P, Berardini TZ, Li D, Swarbreck D, Wilks C, Sasidharan R, Muller R, Dreher K, Alexander DL, 694

Garcia-Hernandez M, Karthikeyan AS, Lee CH, Nelson WD, Ploetz L, Singh S, Wensel A, Huala E 695

(2012) The Arabidopsis Information Resource (TAIR): improved gene annotation and new tools. 696

Nucleic Acids Res 40: D1202-1210 697

Lincoln C, Long J, Yamaguchi J, Serikawa K, Hake S (1994) A knotted1-like homeobox gene in 698

Arabidopsis is expressed in the vegetative meristem and dramatically alters leaf morphology 699

when overexpressed in transgenic plants. Plant Cell 6: 1859-1876 700

Liu J, Jung C, Xu J, Wang H, Deng S, Bernad L, Arenas-Huertero C, Chua NH (2012) Genome-wide 701

analysis uncovers regulation of long intergenic noncoding RNAs in Arabidopsis. Plant Cell 24: 702

4333-4345 703

Lu Z, Hofmeister BT, Vollmers C, DuBois RM, Schmitz RJ (2017) Combining ATAC-seq with nuclei sorting 704

for discovery of cis-regulatory regions in plant genomes. Nucleic Acids Res 45: e41 705

Ma S, Bohnert HJ (2007) Integration of Arabidopsis thaliana stress-related transcript profiles, promoter 706

structures, and cell-specific expression. Genome Biol 8: R49 707

Maher KA, Bajic M, Kajala K, Reynoso M, Pauluzzi G, West DA, Zumstein K, Woodhouse M, Bubb K, 708

Dorrity MW, Queitsch C, Bailey-Serres J, Sinha N, Brady SM, Deal RB (2018) Profiling of 709

Accessible Chromatin Regions across Multiple Plant Species and Cell Types Reveals Common 710

Gene Regulatory Principles and New Control Modules. Plant Cell 30: 15-36 711

Mathelier A, Fornes O, Arenillas DJ, Chen C-y, Denay G, Lee J, Shi W, Shyr C, Tan G, Worsley-Hunt R 712

(2015) JASPAR 2016: a major expansion and update of the open-access database of transcription 713

factor binding profiles. Nucleic acids research: gkv1176 714

Michael TP, Mockler TC, Breton G, McEntee C, Byer A, Trout JD, Hazen SP, Shen R, Priest HD, Sullivan 715

CM, Givan SA, Yanovsky M, Hong F, Kay SA, Chory J (2008) Network discovery pipeline 716

elucidates conserved time-of-day-specific cis-regulatory modules. PLoS Genet 4: e14 717

Muino JM, de Bruijn S, Pajoro A, Geuten K, Vingron M, Angenent GC, Kaufmann K (2016) Evolution of 718

DNA-Binding Sites of a Floral Master Regulatory Transcription Factor. Mol Biol Evol 33: 185-200 719

Nagel DH, Pruneda-Paz JL, Kay SA (2014) FBH1 affects warm temperature responses in the Arabidopsis 720

circadian clock. Proc Natl Acad Sci U S A 111: 14595-14600 721

Nakaminami K, Hill K, Perry SE, Sentoku N, Long JA, Karlson DT (2009) Arabidopsis cold shock domain 722

proteins: relationships to floral and silique development. J Exp Bot 60: 1047-1062 723

O’Malley RC, Huang S-sC, Song L, Lewsey MG, Bartlett A, Nery JR, Galli M, Gallavotti A, Ecker JR (2016) 724

Cistrome and Epicistrome Features Shape the Regulatory DNA Landscape. Cell 165: 1280-1292 725

Pérez-Rodríguez P, Riano-Pachon DM, Corrêa LGG, Rensing SA, Kersten B, Mueller-Roeber B (2009) 726

PlnTFDB: updated content and new features of the plant transcription factor database. Nucleic 727

acids research: gkp805 728

Quinlan AR (2014) BEDTools: The Swiss-Army Tool for Genome Feature Analysis. Curr Protoc 729

Bioinformatics 47: 11 12 11-34 730

Rosloski SM, Jali SS, Balasubramanian S, Weigel D, Grbic V (2010) Natural diversity in flowering 731

responses of Arabidopsis thaliana caused by variation in a tandem gene array. Genetics 186: 732

263-276 733

www.plantphysiol.orgon July 9, 2020 - Published by Downloaded from Copyright © 2019 American Society of Plant Biologists. All rights reserved.

Page 23: Enhanced maps of transcription factor binding sites …...2019/07/25  · 85 note that not all functional binding events are evolutionary conserved (Muino et al., 2016). 86 87 Recent

23

Sakuma Y, Maruyama K, Osakabe Y, Qin F, Seki M, Shinozaki K, Yamaguchi-Shinozaki K (2006) 734

Functional analysis of an Arabidopsis transcription factor, DREB2A, involved in drought-735

responsive gene expression. Plant Cell 18: 1292-1309 736

Scarpeci TE, Zanor MI, Mueller-Roeber B, Valle EM (2013) Overexpression of AtWRKY30 enhances 737

abiotic stress tolerance during early growth stages in Arabidopsis thaliana. Plant Mol Biol 83: 738

265-277 739

Schuster C, Gaillochet C, Medzihradszky A, Busch W, Daum G, Krebs M, Kehle A, Lohmann JU (2014) A 740

regulatory framework for shoot stem cell control integrating metabolic, transcriptional, and 741

phytohormone signals. Dev Cell 28: 438-449 742

Sham A, Moustafa K, Al-Shamisi S, Alyan S, Iratni R, AbuQamar S (2017) Microarray analysis of 743

Arabidopsis WRKY33 mutants in response to the necrotrophic fungus Botrytis cinerea. PLoS One 744

12: e0172343 745

Sijacic P, Bajic M, McKinney EC, Meagher RB, Deal RB (2018) Changes in chromatin accessibility 746

between Arabidopsis stem cells and mesophyll cells illuminate cell type-specific transcription 747

factor networks. Plant J 94: 215-231 748

Song L, Huang S-sC, Wise A, Castanon R, Nery JR, Chen H, Watanabe M, Thomas J, Bar-Joseph Z, Ecker 749

JR (2016) A transcription factor hierarchy defines an environmental stress response network. 750

Science 354: aag1550 751

Sparks EE, Drapek C, Gaudinier A, Li S, Ansariola M, Shen N, Hennacy JH, Zhang J, Turco G, Petricka JJ, 752

Foret J, Hartemink AJ, Gordan R, Megraw M, Brady SM, Benfey PN (2016) Establishment of 753

Expression in the SHORTROOT-SCARECROW Transcriptional Cascade through Opposing Activities 754

of Both Activators and Repressors. Dev Cell 39: 585-596 755

Staneloni RJ, Rodriguez-Batiller MJ, Legisa D, Scarpin MR, Agalou A, Cerdan PD, Meijer AH, Ouwerkerk 756

PB, Casal JJ (2009) Bell-like homeodomain selectively regulates the high-irradiance response of 757

phytochrome A. Proc Natl Acad Sci U S A 106: 13624-13629 758

Sullivan AM, Arsovski AA, Lempe J, Bubb KL, Weirauch MT, Sabo PJ, Sandstrom R, Thurman RE, Neph 759

S, Reynolds AP (2014) Mapping and dynamics of regulatory DNA and transcription factor 760

networks in A. thaliana. Cell reports 8: 2015-2030 761

Sun J, Jiang H, Xu Y, Li H, Wu X, Xie Q, Li C (2007) The CCCH-type zinc finger proteins AtSZF1 and AtSZF2 762

regulate salt stress responses in Arabidopsis. Plant Cell Physiol 48: 1148-1158 763

Tang W, Ji Q, Huang Y, Jiang Z, Bao M, Wang H, Lin R (2013) FAR-RED ELONGATED HYPOCOTYL3 and 764

FAR-RED IMPAIRED RESPONSE1 transcription factors integrate light and abscisic acid signaling in 765

Arabidopsis. Plant Physiol 163: 857-866 766

Turatsinze JV, Thomas-Chollier M, Defrance M, van Helden J (2008) Using RSAT to scan genome 767

sequences for transcription factor binding sites and cis-regulatory modules. Nat Protoc 3: 1578-768

1588 769

Van de Velde J, Heyndrickx KS, Vandepoele K (2014) Inference of transcriptional networks in 770

Arabidopsis through conserved noncoding sequence analysis. The Plant Cell 26: 2729-2745 771

Vandepoele K, Casneuf T, Van de Peer Y (2006) Identification of novel regulatory modules in 772

dicotyledonous plants using expression data and comparative genomics. Genome Biol 7: R103 773

Vandepoele K, Quimbaya M, Casneuf T, De Veylder L, Van de Peer Y (2009) Unraveling transcriptional 774

control in Arabidopsis using cis-regulatory elements and coexpression networks. Plant Physiol 775

150: 535-546 776

Vanderauwera S, Vandenbroucke K, Inze A, van de Cotte B, Muhlenbock P, De Rycke R, Naouar N, Van 777

Gaever T, Van Montagu MC, Van Breusegem F (2012) AtWRKY15 perturbation abolishes the 778

mitochondrial stress response that steers osmotic stress tolerance in Arabidopsis. Proc Natl 779

Acad Sci U S A 109: 20113-20118 780

www.plantphysiol.orgon July 9, 2020 - Published by Downloaded from Copyright © 2019 American Society of Plant Biologists. All rights reserved.

Page 24: Enhanced maps of transcription factor binding sites …...2019/07/25  · 85 note that not all functional binding events are evolutionary conserved (Muino et al., 2016). 86 87 Recent

24

Varala K, Marshall-Colon A, Cirrone J, Brooks MD, Pasquino AV, Leran S, Mittal S, Rock TM, Edwards 781

MB, Kim GJ, Ruffel S, McCombie WR, Shasha D, Coruzzi GM (2018) Temporal transcriptional 782

logic of dynamic regulatory networks underlying nitrogen signaling and use in plants. Proc Natl 783

Acad Sci U S A 115: 6494-6499 784

Wang H, Deng XW (2002) Arabidopsis FHY3 defines a key phytochrome A signaling component directly 785

interacting with its homologous partner FAR1. EMBO J 21: 1339-1349 786

Wang ZY, Tobin EM (1998) Constitutive expression of the CIRCADIAN CLOCK ASSOCIATED 1 (CCA1) gene 787

disrupts circadian rhythms and suppresses its own expression. Cell 93: 1207-1217 788

Weirauch MT, Yang A, Albu M, Cote AG, Montenegro-Montero A, Drewe P, Najafabadi HS, Lambert 789

SA, Mann I, Cook K, Zheng H, Goity A, van Bakel H, Lozano JC, Galli M, Lewsey MG, Huang E, 790

Mukherjee T, Chen X, Reece-Hoyes JS, Govindarajan S, Shaulsky G, Walhout AJM, Bouget FY, 791

Ratsch G, Larrondo LF, Ecker JR, Hughes TR (2014) Determination and inference of eukaryotic 792

transcription factor sequence specificity. Cell 158: 1431-1443 793

Werner JD, Borevitz JO, Warthmann N, Trainer GT, Ecker JR, Chory J, Weigel D (2005) Quantitative trait 794

locus mapping and DNA array hybridization identify an FLM deletion as a cause for natural 795

flowering-time variation. Proc Natl Acad Sci U S A 102: 2460-2465 796

Wilmoth JC, Wang S, Tiwari SB, Joshi AD, Hagen G, Guilfoyle TJ, Alonso JM, Ecker JR, Reed JW (2005) 797

NPH4/ARF7 and ARF19 promote leaf expansion and auxin-induced lateral root formation. Plant J 798

43: 118-130 799

Xie Z, Li D, Wang L, Sack FD, Grotewold E (2010) Role of the stomatal development regulators 800

FLP/MYB88 in abiotic stress responses. Plant J 64: 731-739 801

Xu M, Hu T, Zhao J, Park MY, Earley KW, Wu G, Yang L, Poethig RS (2016) Developmental Functions of 802

miR156-Regulated SQUAMOSA PROMOTER BINDING PROTEIN-LIKE (SPL) Genes in Arabidopsis 803

thaliana. PLoS Genet 12: e1006263 804

Yu CP, Chen SC, Chang YM, Liu WY, Lin HH, Lin JJ, Chen HJ, Lu YJ, Wu YH, Lu MY, Lu CH, Shih AC, Ku MS, 805

Shiu SH, Wu SH, Li WH (2015) Transcriptome dynamics of developing maize leaves and 806

genomewide prediction of cis elements and their cognate transcription factors. Proc Natl Acad 807

Sci U S A 112: E2477-2486 808

Yu H, Xu Y, Tan EL, Kumar PP (2002) AGAMOUS-LIKE 24, a dosage-dependent mediator of the flowering 809

signals. Proc Natl Acad Sci U S A 99: 16336-16341 810

Zhang W, Zhang T, Wu Y, Jiang J (2012) Genome-wide identification of regulatory DNA elements and 811

protein-binding footprints using signatures of open chromatin in Arabidopsis. The Plant Cell 24: 812

2719-2731 813

814

www.plantphysiol.orgon July 9, 2020 - Published by Downloaded from Copyright © 2019 American Society of Plant Biologists. All rights reserved.

Page 25: Enhanced maps of transcription factor binding sites …...2019/07/25  · 85 note that not all functional binding events are evolutionary conserved (Muino et al., 2016). 86 87 Recent

Cluster-Buster FIMO matrix-scan MOODS

A B C

Cluster-Buster FIMO matrix-scanMOODS

Pre

cisi

on

Rec

all

FP

R

Cluster-Buster FIMO matrix-scan MOODS

Figure 1. Global performances of motif mapping tools in Arabidopsis.(A) and (B) Precision and recall of motif matches considering all matches (in red) and top-scoring 7000 matches (in cyan). (C) Boxplot showing the false positive rate (FPR) for every tool. Boxes indicate the interquartile range of the data, with the medianindicated as a horizontal line within the box. The whiskers show the range of the data. The precision, recall, and FPR values were calculated for each of the 66 motifs.

www.plantphysiol.orgon July 9, 2020 - Published by Downloaded from Copyright © 2019 American Society of Plant Biologists. All rights reserved.

Page 26: Enhanced maps of transcription factor binding sites …...2019/07/25  · 85 note that not all functional binding events are evolutionary conserved (Muino et al., 2016). 86 87 Recent

Cluster-Buster

FIMO

Motif source

Increase in F1 score with complexity

Information content of motifs

F1

scor

eF

1 sc

ore

Decrease in F1 score with complexity

Figure 2. Variation of motif mapping accuracy in function of TF motif complexity. Scatterplot showing

the effect of motif complexity, quantified using the information content of a motif, on F1 scores for Cluster-

Buster (CB) and FIMO for 21 TFs. Each motif is visualized through a specific shape indicating the TF it

belongs to and colored based on the source of that motif. An increase and decrease in F1 score in function

of motif complexity is marked with blue and red, respectively

Information content of motifs

www.plantphysiol.orgon July 9, 2020 - Published by Downloaded from Copyright © 2019 American Society of Plant Biologists. All rights reserved.

Page 27: Enhanced maps of transcription factor binding sites …...2019/07/25  · 85 note that not all functional binding events are evolutionary conserved (Muino et al., 2016). 86 87 Recent

Recall

Cluster-Buster FIMO matrix-scan MOODS

Cluster-BusterFIMO matrix-scanMOODS

A B

Figure 3. Evaluation of unique motif matches predicted by each tool. (A) Barplot showing the fraction of TFBS with recall>0

considering unique matches in top7000 of one tool compared to the matches of all other tools combined. Whereas green series

indicates motifs for which the unique matches reported by that tool do not show a recall above 10% compared to the ChIP-Seq data,

orange, purple and pink series depict unique motif matches with a recall higher than 10%, 20% and 30%, respectively. (B) Heatmap

showing the recall for each tool for TFBSs where CB outperforms the other tools. Only motifs part of orange, purple and pink series

from panel A where the recall of CB was >10% are shown.

www.plantphysiol.orgon July 9, 2020 - Published by Downloaded from Copyright © 2019 American Society of Plant Biologists. All rights reserved.

Page 28: Enhanced maps of transcription factor binding sites …...2019/07/25  · 85 note that not all functional binding events are evolutionary conserved (Muino et al., 2016). 86 87 Recent

17,86216,432

Number of genes

Figure 4. Overlap analysis for WRKY33 perturbation experiment. UpSetR plot showing the overlap between the FIMO targets, the Ensemble, the perturbed DE genes and the ChIP targets for WRKY33. Overlap of target genes predicted by FIMO and Ensemble with ChIP targets show a better recovery of ChIP-confirmed targets using Ensemble.

1311,558

www.plantphysiol.orgon July 9, 2020 - Published by Downloaded from Copyright © 2019 American Society of Plant Biologists. All rights reserved.

Page 29: Enhanced maps of transcription factor binding sites …...2019/07/25  · 85 note that not all functional binding events are evolutionary conserved (Muino et al., 2016). 86 87 Recent

A B

Targets genes with SAM stem cell-specific expression

TFs from Sijacic et al. TFs from advanced protocol

KN

AT

1

Targets genes with mesophyll cell-specific expression

AGL27

AGL31

ALC

TGA4

AT4G11211

IDD7

BRC2

KNAT1

ERD5

DIN10

AGL70 TCP7

GASA14

AGL24ATHB13

AT5G02450

JKD

AT5G13340

DAG2

AOS

ARA8

DREB2

AT1G76580

LOX4

TCP2

AT1G14580

RVE1

At2g45680

AT7SL-1

ATCBF2

AtIPCS1

ATWRKY28

AtWIND1

AT2G15020

LOX3ATWRKY27

AT1G75490

AT1G19210

AT4G28140

ANAC029

RAP2.4

CBF1

NLP7

AT5G41740

AST12

ATMLO12

ARF1-BPATSPL14

AT3G24190

ARF7

AT1G61890

AT1G51090

AT3G51660

SPL1

CAMTA5

NR1

RRTF1

CAMTA3AT4G21570

AtNPF6.4

CBF4LRL2

AT4G12005

AT1G19450

BIM3

PDR8ATFRO6

AT3G22060

ATSIG6

ATWRKY48

AT5G50915 PTF1

CAMTA2

TCP17

AT2G27310

AtbZIP16

AT5G54165

RAP21

AGL20

BIM1

UNE10

ANAC055BIM2

bHLH105

ATBZIP54

PIF7

ATWRKY33 ATWRKY40

AR781

C

Figure 5. Results of the new protocol to identify potential regulators in SAM stem and mesophyll cell-specific ATAC-seq regions. (A) Barplot showing extra target genes obtained

using the Ensemble approach for shoot apical meristem (SAM) stem cell. Grey sections show how many of the extra targets have expression in either of the two cell types. Shaded regions

show the target genes that are specific to SAM stem cells. TFs labelled in brown are the TFs reported in Sijacic et al., 2018 and TFs labelled in blue are the TFs only identified using the new

protocol. Only the TFs that have ≥ 10 extra targets are shown. (B) and (C) A gene regulatory network (GRN) showing SAM stem and mesophyll cell-specific targets identified by enriched TFs

respectively. Round green/white circles refer to the genes that have higher expression in SAM stem/mesophyll cell, while diamonds represent TFs. All nodes that have an incoming edge are

target genes having high SAM stem/mesophyll cell-specific expression. The size of each node corresponds to the expression specificity, determined using the ratio of expression rank (RR),

of the gene in the respective cell type.

Cell-specific targets

Expressed targets

www.plantphysiol.orgon July 9, 2020 - Published by Downloaded from Copyright © 2019 American Society of Plant Biologists. All rights reserved.

Page 30: Enhanced maps of transcription factor binding sites …...2019/07/25  · 85 note that not all functional binding events are evolutionary conserved (Muino et al., 2016). 86 87 Recent

Parsed CitationsBaxter L, Jironkin A, Hickman R, Moore J, Barrington C, Krusche P, Dyer NP, Buchanan-Wollaston V, Tiskin A, Beynon J, Denby K, OttS (2012) Conserved noncoding sequences highlight shared components of regulatory networks in dicotyledonous plants. Plant Cell24: 3949-3965

Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title

Birkenbihl RP, Diezel C, Somssich IE (2012) Arabidopsis WRKY33 is a key transcriptional regulator of hormonal and metabolicresponses toward Botrytis cinerea infection. Plant Physiol 159: 266-285

Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title

Burgess DG, Xu J, Freeling M (2015) Advances in understanding cis regulation of the plant gene with an emphasis on comparativegenomics. Curr Opin Plant Biol 27: 141-147

Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title

Capovilla G, Symeonidi E, Wu R, Schmid M (2017) Contribution of major FLM isoforms to temperature-dependent flowering inArabidopsis thaliana. J Exp Bot 68: 5117-5127

Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title

Castro-Mondragon JA, Jaeger S, Thieffry D, Thomas-Chollier M, van Helden J (2017) RSAT matrix-clustering: dynamic exploration andredundancy reduction of transcription factor binding motif collections. Nucleic Acids Res 45: e119

Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title

de Folter S, Immink RG, Kieffer M, Parenicova L, Henz SR, Weigel D, Busscher M, Kooiker M, Colombo L, Kater MM, Davies B,Angenent GC (2005) Comprehensive interaction map of the Arabidopsis MADS Box transcription factors. Plant Cell 17: 1424-1433

Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title

Franco-Zorrilla JM, López-Vidriero I, Carrasco JL, Godoy M, Vera P, Solano R (2014) DNA-binding specificities of plant transcriptionfactors and their potential to define target genes. Proceedings of the National Academy of Sciences 111: 2367-2372

Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title

Franco-Zorrilla JM, Solano R (2017) Identification of plant transcription factor target sequences. Biochim Biophys Acta 1860: 21-30Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title

Frith MC, Li MC, Weng Z (2003) Cluster-Buster: Finding dense clusters of motifs in DNA sequences. Nucleic Acids Res 31: 3666-3668Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title

Goto K, Meyerowitz EM (1994) Function and regulation of the Arabidopsis floral homeotic gene PISTILLATA. Genes Dev 8: 1548-1560Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title

Grant CE, Bailey TL, Noble WS (2011) FIMO: scanning for occurrences of a given motif. Bioinformatics 27: 1017-1018Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title

Gremski K, Ditta G, Yanofsky MF (2007) The HECATE genes regulate female reproductive tract development in Arabidopsis thaliana.Development 134: 3593-3601

Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title

Haudry A, Platts AE, Vello E, Hoen DR, Leclercq M, Williamson RJ, Forczek E, Joly-Lopez Z, Steffen JG, Hazzouri KM (2013) An atlas ofover 90,000 conserved noncoding sequences provides insight into crucifer regulatory regions. Nature genetics 45: 891-898

Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title

Heyndrickx KS, Van de Velde J, Wang C, Weigel D, Vandepoele K (2014) A functional and evolutionary perspective on transcriptionfactor binding in Arabidopsis thaliana. Plant Cell 26: 3894-3910

Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title

Hickman R, Van Verk MC, Van Dijken AJH, Mendes MP, Vroegop-Vos IA, Caarls L, Steenbergen M, Van der Nagel I, Wesselink GJ,Jironkin A, Talbot A, Rhodes J, De Vries M, Schuurink RC, Denby K, Pieterse CMJ, Van Wees SCM (2017) Architecture and Dynamics ofthe Jasmonic Acid Gene Regulatory Network. Plant Cell 29: 2086-2105

Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title www.plantphysiol.orgon July 9, 2020 - Published by Downloaded from

Copyright © 2019 American Society of Plant Biologists. All rights reserved.

Page 31: Enhanced maps of transcription factor binding sites …...2019/07/25  · 85 note that not all functional binding events are evolutionary conserved (Muino et al., 2016). 86 87 Recent

Ito S, Song YH, Josephson-Day AR, Miller RJ, Breton G, Olmstead RG, Imaizumi T (2012) FLOWERING BHLH transcriptional activatorscontrol expression of the photoperiodic flowering regulator CONSTANS in Arabidopsis. Proc Natl Acad Sci U S A 109: 3582-3587

Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title

Jayaram N, Usvyat D, AC RM (2016) Evaluating tools for transcription factor binding site prediction. BMC BioinformaticsPubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title

Kim D, Cho YH, Ryu H, Kim Y, Kim TH, Hwang I (2013) BLH1 and KNAT3 modulate ABA responses during germination and earlyseedling development in Arabidopsis. Plant J 75: 755-766

Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title

Kloth KJ, Wiegers GL, Busscher-Lange J, van Haarst JC, Kruijer W, Bouwmeester HJ, Dicke M, Jongsma MA (2016) AtWRKY22promotes susceptibility to aphids and modulates salicylic acid and jasmonic acid signalling. J Exp Bot 67: 3383-3396

Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title

Korhonen J, Martinmaki P, Pizzi C, Rastas P, Ukkonen E (2009) MOODS: fast search for position weight matrix matches in DNAsequences. Bioinformatics 25: 3181-3182

Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title

Kulkarni SR, Vaneechoutte D, Van de Velde J, Vandepoele K (2018) TF2Network: predicting transcription factor regulators and generegulatory networks in Arabidopsis using publicly available binding site information. Nucleic Acids Res 46: e31

Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title

Lamesch P, Berardini TZ, Li D, Swarbreck D, Wilks C, Sasidharan R, Muller R, Dreher K, Alexander DL, Garcia-Hernandez M,Karthikeyan AS, Lee CH, Nelson WD, Ploetz L, Singh S, Wensel A, Huala E (2012) The Arabidopsis Information Resource (TAIR):improved gene annotation and new tools. Nucleic Acids Res 40: D1202-1210

Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title

Lincoln C, Long J, Yamaguchi J, Serikawa K, Hake S (1994) A knotted1-like homeobox gene in Arabidopsis is expressed in thevegetative meristem and dramatically alters leaf morphology when overexpressed in transgenic plants. Plant Cell 6: 1859-1876

Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title

Liu J, Jung C, Xu J, Wang H, Deng S, Bernad L, Arenas-Huertero C, Chua NH (2012) Genome-wide analysis uncovers regulation of longintergenic noncoding RNAs in Arabidopsis. Plant Cell 24: 4333-4345

Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title

Lu Z, Hofmeister BT, Vollmers C, DuBois RM, Schmitz RJ (2017) Combining ATAC-seq with nuclei sorting for discovery of cis-regulatory regions in plant genomes. Nucleic Acids Res 45: e41

Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title

Ma S, Bohnert HJ (2007) Integration of Arabidopsis thaliana stress-related transcript profiles, promoter structures, and cell-specificexpression. Genome Biol 8: R49

Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title

Maher KA, Bajic M, Kajala K, Reynoso M, Pauluzzi G, West DA, Zumstein K, Woodhouse M, Bubb K, Dorrity MW, Queitsch C, Bailey-Serres J, Sinha N, Brady SM, Deal RB (2018) Profiling of Accessible Chromatin Regions across Multiple Plant Species and Cell TypesReveals Common Gene Regulatory Principles and New Control Modules. Plant Cell 30: 15-36

Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title

Mathelier A, Fornes O, Arenillas DJ, Chen C-y, Denay G, Lee J, Shi W, Shyr C, Tan G, Worsley-Hunt R (2015) JASPAR 2016: a majorexpansion and update of the open-access database of transcription factor binding profiles. Nucleic acids research: gkv1176

Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title

Michael TP, Mockler TC, Breton G, McEntee C, Byer A, Trout JD, Hazen SP, Shen R, Priest HD, Sullivan CM, Givan SA, Yanovsky M,Hong F, Kay SA, Chory J (2008) Network discovery pipeline elucidates conserved time-of-day-specific cis-regulatory modules. PLoSGenet 4: e14

Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title

Muino JM, de Bruijn S, Pajoro A, Geuten K, Vingron M, Angenent GC, Kaufmann K (2016) Evolution of DNA-Binding Sites of a FloralMaster Regulatory Transcription Factor. Mol Biol Evol 33: 185-200 www.plantphysiol.orgon July 9, 2020 - Published by Downloaded from

Copyright © 2019 American Society of Plant Biologists. All rights reserved.

Page 32: Enhanced maps of transcription factor binding sites …...2019/07/25  · 85 note that not all functional binding events are evolutionary conserved (Muino et al., 2016). 86 87 Recent

Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title

Nagel DH, Pruneda-Paz JL, Kay SA (2014) FBH1 affects warm temperature responses in the Arabidopsis circadian clock. Proc Natl AcadSci U S A 111: 14595-14600

Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title

Nakaminami K, Hill K, Perry SE, Sentoku N, Long JA, Karlson DT (2009) Arabidopsis cold shock domain proteins: relationships to floraland silique development. J Exp Bot 60: 1047-1062

Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title

O'Malley RC, Huang S-sC, Song L, Lewsey MG, Bartlett A, Nery JR, Galli M, Gallavotti A, Ecker JR (2016) Cistrome and EpicistromeFeatures Shape the Regulatory DNA Landscape. Cell 165: 1280-1292

Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title

Pérez-Rodríguez P, Riano-Pachon DM, Corrêa LGG, Rensing SA, Kersten B, Mueller-Roeber B (2009) PlnTFDB: updated content andnew features of the plant transcription factor database. Nucleic acids research: gkp805

Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title

Quinlan AR (2014) BEDTools: The Swiss-Army Tool for Genome Feature Analysis. Curr Protoc Bioinformatics 47: 11 12 11-34Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title

Rosloski SM, Jali SS, Balasubramanian S, Weigel D, Grbic V (2010) Natural diversity in flowering responses of Arabidopsis thalianacaused by variation in a tandem gene array. Genetics 186: 263-276

Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title

Sakuma Y, Maruyama K, Osakabe Y, Qin F, Seki M, Shinozaki K, Yamaguchi-Shinozaki K (2006) Functional analysis of an Arabidopsistranscription factor, DREB2A, involved in drought-responsive gene expression. Plant Cell 18: 1292-1309

Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title

Scarpeci TE, Zanor MI, Mueller-Roeber B, Valle EM (2013) Overexpression of AtWRKY30 enhances abiotic stress tolerance duringearly growth stages in Arabidopsis thaliana. Plant Mol Biol 83: 265-277

Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title

Schuster C, Gaillochet C, Medzihradszky A, Busch W, Daum G, Krebs M, Kehle A, Lohmann JU (2014) A regulatory framework for shootstem cell control integrating metabolic, transcriptional, and phytohormone signals. Dev Cell 28: 438-449

Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title

Sham A, Moustafa K, Al-Shamisi S, Alyan S, Iratni R, AbuQamar S (2017) Microarray analysis of Arabidopsis WRKY33 mutants inresponse to the necrotrophic fungus Botrytis cinerea. PLoS One 12: e0172343

Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title

Sijacic P, Bajic M, McKinney EC, Meagher RB, Deal RB (2018) Changes in chromatin accessibility between Arabidopsis stem cells andmesophyll cells illuminate cell type-specific transcription factor networks. Plant J 94: 215-231

Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title

Song L, Huang S-sC, Wise A, Castanon R, Nery JR, Chen H, Watanabe M, Thomas J, Bar-Joseph Z, Ecker JR (2016) A transcriptionfactor hierarchy defines an environmental stress response network. Science 354: aag1550

Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title

Sparks EE, Drapek C, Gaudinier A, Li S, Ansariola M, Shen N, Hennacy JH, Zhang J, Turco G, Petricka JJ, Foret J, Hartemink AJ,Gordan R, Megraw M, Brady SM, Benfey PN (2016) Establishment of Expression in the SHORTROOT-SCARECROW TranscriptionalCascade through Opposing Activities of Both Activators and Repressors. Dev Cell 39: 585-596

Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title

Staneloni RJ, Rodriguez-Batiller MJ, Legisa D, Scarpin MR, Agalou A, Cerdan PD, Meijer AH, Ouwerkerk PB, Casal JJ (2009) Bell-likehomeodomain selectively regulates the high-irradiance response of phytochrome A. Proc Natl Acad Sci U S A 106: 13624-13629

Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title

Sullivan AM, Arsovski AA, Lempe J, Bubb KL, Weirauch MT, Sabo PJ, Sandstrom R, Thurman RE, Neph S, Reynolds AP (2014) Mappingand dynamics of regulatory DNA and transcription factor networks in A. thaliana. Cell reports 8: 2015-2030 www.plantphysiol.orgon July 9, 2020 - Published by Downloaded from

Copyright © 2019 American Society of Plant Biologists. All rights reserved.

Page 33: Enhanced maps of transcription factor binding sites …...2019/07/25  · 85 note that not all functional binding events are evolutionary conserved (Muino et al., 2016). 86 87 Recent

Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title

Sun J, Jiang H, Xu Y, Li H, Wu X, Xie Q, Li C (2007) The CCCH-type zinc finger proteins AtSZF1 and AtSZF2 regulate salt stressresponses in Arabidopsis. Plant Cell Physiol 48: 1148-1158

Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title

Tang W, Ji Q, Huang Y, Jiang Z, Bao M, Wang H, Lin R (2013) FAR-RED ELONGATED HYPOCOTYL3 and FAR-RED IMPAIREDRESPONSE1 transcription factors integrate light and abscisic acid signaling in Arabidopsis. Plant Physiol 163: 857-866

Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title

Turatsinze JV, Thomas-Chollier M, Defrance M, van Helden J (2008) Using RSAT to scan genome sequences for transcription factorbinding sites and cis-regulatory modules. Nat Protoc 3: 1578-1588

Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title

Van de Velde J, Heyndrickx KS, Vandepoele K (2014) Inference of transcriptional networks in Arabidopsis through conservednoncoding sequence analysis. The Plant Cell 26: 2729-2745

Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title

Vandepoele K, Casneuf T, Van de Peer Y (2006) Identification of novel regulatory modules in dicotyledonous plants using expressiondata and comparative genomics. Genome Biol 7: R103

Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title

Vandepoele K, Quimbaya M, Casneuf T, De Veylder L, Van de Peer Y (2009) Unraveling transcriptional control in Arabidopsis using cis-regulatory elements and coexpression networks. Plant Physiol 150: 535-546

Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title

Vanderauwera S, Vandenbroucke K, Inze A, van de Cotte B, Muhlenbock P, De Rycke R, Naouar N, Van Gaever T, Van Montagu MC,Van Breusegem F (2012) AtWRKY15 perturbation abolishes the mitochondrial stress response that steers osmotic stress tolerance inArabidopsis. Proc Natl Acad Sci U S A 109: 20113-20118

Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title

Varala K, Marshall-Colon A, Cirrone J, Brooks MD, Pasquino AV, Leran S, Mittal S, Rock TM, Edwards MB, Kim GJ, Ruffel S, McCombieWR, Shasha D, Coruzzi GM (2018) Temporal transcriptional logic of dynamic regulatory networks underlying nitrogen signaling and usein plants. Proc Natl Acad Sci U S A 115: 6494-6499

Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title

Wang H, Deng XW (2002) Arabidopsis FHY3 defines a key phytochrome A signaling component directly interacting with its homologouspartner FAR1. EMBO J 21: 1339-1349

Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title

Wang ZY, Tobin EM (1998) Constitutive expression of the CIRCADIAN CLOCK ASSOCIATED 1 (CCA1) gene disrupts circadian rhythmsand suppresses its own expression. Cell 93: 1207-1217

Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title

Weirauch MT, Yang A, Albu M, Cote AG, Montenegro-Montero A, Drewe P, Najafabadi HS, Lambert SA, Mann I, Cook K, Zheng H, GoityA, van Bakel H, Lozano JC, Galli M, Lewsey MG, Huang E, Mukherjee T, Chen X, Reece-Hoyes JS, Govindarajan S, Shaulsky G,Walhout AJM, Bouget FY, Ratsch G, Larrondo LF, Ecker JR, Hughes TR (2014) Determination and inference of eukaryotic transcriptionfactor sequence specificity. Cell 158: 1431-1443

Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title

Werner JD, Borevitz JO, Warthmann N, Trainer GT, Ecker JR, Chory J, Weigel D (2005) Quantitative trait locus mapping and DNA arrayhybridization identify an FLM deletion as a cause for natural flowering-time variation. Proc Natl Acad Sci U S A 102: 2460-2465

Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title

Wilmoth JC, Wang S, Tiwari SB, Joshi AD, Hagen G, Guilfoyle TJ, Alonso JM, Ecker JR, Reed JW (2005) NPH4/ARF7 and ARF19 promoteleaf expansion and auxin-induced lateral root formation. Plant J 43: 118-130

Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title

Xie Z, Li D, Wang L, Sack FD, Grotewold E (2010) Role of the stomatal development regulators FLP/MYB88 in abiotic stress responses.Plant J 64: 731-739

Pubmed: Author and Title www.plantphysiol.orgon July 9, 2020 - Published by Downloaded from Copyright © 2019 American Society of Plant Biologists. All rights reserved.

Page 34: Enhanced maps of transcription factor binding sites …...2019/07/25  · 85 note that not all functional binding events are evolutionary conserved (Muino et al., 2016). 86 87 Recent

Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title

Xu M, Hu T, Zhao J, Park MY, Earley KW, Wu G, Yang L, Poethig RS (2016) Developmental Functions of miR156-Regulated SQUAMOSAPROMOTER BINDING PROTEIN-LIKE (SPL) Genes in Arabidopsis thaliana. PLoS Genet 12: e1006263

Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title

Yu CP, Chen SC, Chang YM, Liu WY, Lin HH, Lin JJ, Chen HJ, Lu YJ, Wu YH, Lu MY, Lu CH, Shih AC, Ku MS, Shiu SH, Wu SH, Li WH(2015) Transcriptome dynamics of developing maize leaves and genomewide prediction of cis elements and their cognatetranscription factors. Proc Natl Acad Sci U S A 112: E2477-2486

Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title

Yu H, Xu Y, Tan EL, Kumar PP (2002) AGAMOUS-LIKE 24, a dosage-dependent mediator of the flowering signals. Proc Natl Acad Sci US A 99: 16336-16341

Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title

Zhang W, Zhang T, Wu Y, Jiang J (2012) Genome-wide identification of regulatory DNA elements and protein-binding footprints usingsignatures of open chromatin in Arabidopsis. The Plant Cell 24: 2719-2731

Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title

www.plantphysiol.orgon July 9, 2020 - Published by Downloaded from Copyright © 2019 American Society of Plant Biologists. All rights reserved.