protein dna binding

Upload: burgerman2

Post on 10-Jan-2016

231 views

Category:

Documents


0 download

DESCRIPTION

Protein DNA Binding

TRANSCRIPT

  • SURVEY AND SUMMARY

    ProteinDNA binding: complexities andmulti-protein codesTrevor Siggers1,* and Raluca Gordan2,*

    1Department of Biology, Boston University, Boston, MA 02215, USA, 2Departments of Biostatistics andBioinformatics, Computer Science, and Molecular Genetics and Microbiology, Institute for Genome Sciencesand Policy, Duke University, Durham, NC 27708, USA

    Received September 6, 2013; Revised October 16, 2013; Accepted October 22, 2013

    ABSTRACT

    Binding of proteins to particular DNA sites acrossthe genome is a primary determinant of specificityin genome maintenance and gene regulation. DNA-binding specificity is encoded at multiple levels,from the detailed biophysical interactions betweenproteins and DNA, to the assembly of multi-proteincomplexes. At each level, variation in the mechan-isms used to achieve specificity has led todifficulties in constructing and applying simplemodels of DNA binding. We review the complexitiesin proteinDNA binding found at multiple levels anddiscuss how they confound the idea of simple rec-ognition codes. We discuss the impact of new high-throughput technologies for the characterization ofproteinDNA binding, and how these technologiesare uncovering new complexities in proteinDNArecognition. Finally, we review the concept ofmulti-protein recognition codes in which new DNA-binding specificities are achieved by the assemblyof multi-protein complexes.

    INTRODUCTION

    In the mid 1970s, motivated by early X-ray crystal struc-tures of proteins and DNA, Seeman et al. (1) proposed aproteinDNA recognition code based on hydrogenbonding patterns between amino acids and bases. Forexample, in the DNA major groove, an arginine side-chain can make two hydrogen bonds with a guaninebase, but not with any other base. Therefore, an Arg-Gua residue base pairing provides a mechanismorcodefor proteins to preferentially select for guanines.Although aesthetically pleasing, 30 years of subsequentanalyses of proteinDNA complexes and interactionshave demonstrated myriad complications with such a

    simple recognition code. We briey summarize ndingsfrom structural and biochemical studies that explain whysimple proteinDNA recognition codes do not exist.

    Side-chain flexibility and water molecules

    Protein side-chains and water molecules in the bindinginterface create a exible network of interactions thatcan readily adopt new conformations in response tochanges in DNA or protein sequence (25). Twoexamples illustrate the relevant complexities. ComparingproteinDNA crystal structures for wild-type and mutantzinc nger protein Zif268/Egr1, it was demonstrated that asingle amino acid mutation (Zif268 D20A) could lead toaltered protein side-chain conformations and DNA-binding preferences (6). Further, the side-chain re-arrangement occurred without a change in the proteindocking geometry, but resulted in a new, positionedwater molecule in the binding interface for one of theDNA sequences. Structural comparison of two Cre re-combinase variants in complex with different DNA se-quences revealed that both DNA and protein differencesaffect the contacts made in the binding interface (7). Incomplex with the same LoxM7 DNA sequence, two Crevariants, which differ at only three residue positions, haddistinctly different conformations for the common side-chains mediating base contacts. Further, comparison ofone of the variants bound to a different LoxP DNA siterevealed that recognition of the two DNA sequences,LoxM7 and LoxP, led to altered side-chain conformationsand different side-chain and water-mediated contacts.These studies highlight the complex interplay betweenside-chains, water molecules and DNA bases in thebinding interface.

    DNA structure and indirect readout

    Direct readout (also known as base readout) refers tothe situation where proteins discriminate different bases ina DNA sequence via direct (or water-mediated)

    *To whom correspondence should be addressed. Tel: +1 617 358 7118; Fax: +1 617 353 6340; Email: [email protected] may also be addressed to Raluca Gordan. Tel: +919 684 9881; Fax: +919 668 0795; Email: [email protected]

    Published online 16 November 2013 Nucleic Acids Research, 2014, Vol. 42, No. 4 20992111doi:10.1093/nar/gkt1112

    The Author(s) 2013. Published by Oxford University Press.This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercialre-use, please contact [email protected]

  • interactions with the DNA bases. In contrast, indirectreadout (also known as shape readout) describes a dif-ferent mechanism of recognition in which the sequence-dependent deformability or structural differences betweenDNA molecules contribute to their discrimination (8).A recent review categorized different types of shapereadout and distinguished between local shape readout,in which deviations from B-form DNA are localized alongthe sequence, and global shape readout, in which muchof the DNA molecule is deformed or bent (2). In localshape readout, sequence-dependent kinks in the DNAmolecule or localized deviations in DNA groove widthcan lead to preferential binding to the cognate protein.Examples of widely used types of local shape readoutare the preferential binding of arginine and histidineresidues into narrow DNA minor grooves DNA (9,10)or binding to DNA kinks observed at YpR steps, suchas seen for TATA-box binding protein (9,11,12). Inglobal shape readout, larger deformations are involved.For example, a mechanism proposed for DNA bindingof the human papillomavirus E2 protein is that DNA se-quences in the unbound form adopt global conformationssimilar to that found in the bound proteinDNA complex(13). A particular DNA sequence that was already pre-bent would require less energy to deform and hence bepreferentially bound by a protein (2,13). What complicatesprediction of the effects of indirect readout on the bindingspecicity is that DNA deformation is a structuralproperty that involves multiple, distributed interactions(residuebase and basebase) and is sensitive to the con-formation adopted in the bound complex. Even localshape readout (e.g., the local narrowing of the DNAminor grove) involves integrating the individual structuralpropensities over multiple bases (2). Therefore, a recogni-tion code for many proteins will need to include or accountfor the detailed structure of the bound proteinDNAcomplex.

    Docking geometry and spatial relationships

    Energetically favorable residuebase interactions, particu-larly hydrogen bond interactions, depend on the relativespatial orientation of the protein residue and the DNAbase (1,1416). Subsequently, the docking geometry of aproteinDNA complex, dened by the orientation of theprotein backbone relative to the DNA, is critical to estab-lishing favorable amino acidbase interactions (14,15,17).Comparisons of the docking geometry for differentproteinDNA complexes have revealed complications toa simple recognition code. First, the docking geometry ofdifferent protein folds can differ dramatically; therefore,favorable interactions made by different residue types willdepend strongly on the protein fold (i.e. not all arginineswill be able to make optimal contacts with a guanine).Second, even structurally homologous proteins can dockon the DNA in different orientations (14,15,17,18).Therefore, the distributed proteinDNA interactionsthat contribute to the overall docking geometry, bothbase-specic and non-specic (i.e., with the DNAbackbone), can all potentially affect the proteinDNAinteractions. In other words, effects are non-local and

    residues distributed throughout the protein will affectDNA-binding specicity.Studies have also shown that the docking geometry of the

    same protein can vary when bound to different DNA se-quences (14,18,19). In an illustrative case, crystal structuresof the nuclear factor kappa B (NF-kB)-family RelAhomodimer in complex with different DNA sequences ex-hibited markedly different orientations for the dimersubunits. In the structure of RelA homodimer bound tothe pseudo-symmetric sequence 50-GGAA(A)TTTC-30,one subunit makes a canonical set of hydrogen bond inter-actions with the 50-GGAA half-site sequence. In contrast,the other subunit undergoes a large rotation and translationand makes no base-specic contacts with the apposing 50-GAAA half-site, yet allows the protein to bind well to DNA(20). This contrasts starkly with the structure of RelAbound to the more symmetric 50-GGAA(T)TTCC-30 sitewhere the same docking and DNA-base contacts are sym-metric for the two subunits (21). Flexibility of the proteindocking geometry in response to different DNA sequencespresents a major difculty in predicting the interactions andsubsequent DNA-binding specicity of a protein (17,22).The complexities described here that result from exible

    proteinDNA interactions, indirect DNA readout andprotein docking geometry make it difcult to establishsimple recognition rules that can provide a comprehensivedescription of proteinDNA binding. Structural and bio-chemical studies have now identied the major specicity-determining residues for many different protein structuralfamilies and subfamilies, providing a detailed picture ofthe basic mechanisms of DNA recognition used by differ-ent proteins (2,3). However, despite these structural andmechanistic descriptions, simple recognition codes havenot emerged that can accurately characterize the bindingafnities of a protein to the vast number of potentialDNA-binding site sequences and has spurred the develop-ment of more complex models (2326).In the last 10 years, newly developed high-throughput

    (HT) technologies have revolutionized our ability tocharacterize proteinDNA binding specicity. Suchtechnologies include: bacterial one-hybrid (B1H) (27,28),protein-binding microarrays (PBMs) (2933), total internalreectance uorescence-PBM (34), mechanically inducedtrapping of molecular interactions (MITOMI) (35), Bind-n-seq (36), EMSA-seq (37), HT-SELEX/SELEX-seq (3840), microarray evaluation of genomic aptamers by shift(MEGAshift) (41), cognate site identier (CSI) (42), andHT sequencing uorescent ligand interaction proling(HiTS-FLIP) (43). The rich datasets provided by thesenew HT technologies are facilitating the development ofimproved models of proteinDNA recognition (44,45),while at the same time revealing new complications formodels of binding. Here we will briey review the mostwidely used models and outline features of proteinDNAbinding that complicate their development and application.

    COMPLEXITIES IN PROTEINDNA RECOGNITION

    Consensus sequences in which base preferences are repre-sented using letters (e.g., A=Ade, Y=Cyt or Thy) are

    2100 Nucleic Acids Research, 2014, Vol. 42, No. 4

  • widely used to represent proteinDNA binding specicity.These models are appropriate for high-specicity DNA-interacting proteins such as restriction enzymes. Forexample, enzyme HincII will cut DNA sites that matchthe consensus GTYRAC (R=Ade or Gua), but theafnity for all other DNA sites is orders of lower magni-tude (46). However, transcription factor (TF)DNAbinding is much more degenerate and is oftencharacterized by a gradual transition from high to lowafnity sites (24,29,35,43,4648). To account for thesequence degeneracy in TF binding, the concept ofposition weight matrices (PWMs) was proposed andremains the most widely used representation of TF-binding specicity (46,49). A PWM is a matrix of scores(or weights) for each DNA base pair along a binding site.PWM models can be learned from a variety of data types,from small collections of known binding sites to largedatasets generated using HT technologies. PWMs alsohave the benet of being easy to visualize as DNAlogos, providing an intuitive feel for the TF-bindingspecicity (50).One drawback of the PWM formalism is that it makes

    the implicit assumption that individual base pairs within abinding site contribute independently to the proteinDNAbinding afnity. It has been shown that this assumptiondoes not always hold (5153), and more complex modelsof proteinDNA specicity have been developed toaccount for position dependencies within proteinDNAbinding sites (52,5456). Some studies have focused onextending traditional PWM models into higher-ordermodels by including contributions from dinucleotidesand trinucleotides (45,5762). A drawback of includingnon-independent contributions is that the number ofmodel parameters increases signicantly, which makesthe models harder to learn and prone to overtting.Thus, selecting the relevant dinucleotides andtrinucleotides is critical for building higher-order modelsthat take into account dependencies between positions inTF-binding sites (58,60). We note that other approacheshave also been proposed to model DNA-binding speci-city, however, a full survey of these various approaches isoutside the scope of this review. Beyond inherent sequencedegeneracy and positional dependencies, several otheraspects of proteinDNA binding complicate the develop-ment and application of binding models and are reviewedbelow.

    Low-afnity binding sites

    Regardless of how degeneracy is represented in proteinDNA binding models, it is critically important to be ableto capture the full breadth of binding sites used in vivo.Complicating this goal is the fact that TFs can specicallyutilize low-afnity DNA-binding sites to regulate genes.TF binding to low-afnity DNA sites can provide a mech-anism for interpreting both spatial (6366) and temporal(67,68) TF gradients that often arise during developmentto control where and when genes are expressed. Analysisof genome-wide binding data has also provided evidencethat low-afnity sites are under wide-spread evolutionaryselection (69,70) and that their inclusion can greatly

    improve quantitative models of TF binding and generegulation used for predicting segmentation patternsduring early embryonic development in Drosophila (71).Utilization of sites selected to be lower afnity than anoptimal sequence opens the door for functionallyrelevant sites to deviate strongly from the consensus se-quence and may not be well represented by a particularbinding model. For example, a comprehensive analysis ofDNA binding by NF-kB dimers identied numerouslower afnity, non-traditional sites that differ signicantlyfrom the consensus sites and are not captured by thewidely used PWMs (37,72). Disagreement with a PWMmodel may be due to: (i) a protein having multiplebinding modes, which will require multiple PWMs (dis-cussed more below) or (ii) poor or biased parameterizationof the PWM model. PWMs can capture low-afnitybinding sites but must be explicitly parameterized usinglow-afnity binding data (59). In either case, by focusingonly on the highest afnity sites, proteinDNA bindingmodels are likely to miss functionally important inter-actions that occur through lower afnity sites; however,making binding models too exible or encompassing,without proper parameterization, runs the risk ofincreasing the rate of false-positive predictions.

    TF-specic preferences

    Another complexity highlighted by recent HT proteinDNA binding studies is that closely related TFs oftenexhibit both common and TF-specic binding preferences(23,24,35,37,40,7279). In a study of 104 mouse TFs, itwas found that even proteins with a high degree of simi-larity in their DNA-binding domains (DBDs) (as high as67% amino acid sequence identity) can exhibit distinctDNA-binding proles (24). In many cases, the highestafnity site is identical for several members of a DBDfamily, but individual proteins within the family have dif-ferent preferences for lower afnity sites. For example,mouse TFs Irf4 and Irf5 share a strong preference forsequences containing the 50-CGAAAC-30 site, but prefer,albeit with lower afnity, 50- TGAAAG-30 or 50-CGAGAC-30, respectively. Homeodomain proteins have been ex-tensively studied using HT approaches (23,77,78), reveal-ing rich and diverse sequence preferences for memberseven within protein subfamilies. For example, the Lhxsubfamily binds with high afnity sites containing thecore 50-TAATTA-30, but different subfamily membershave different preferences for medium- and low-afnitysites: Lhx2 prefers 50-TAATGA-30 and 50-TAACGA-30,whereas Lhx4 prefers 50-TAACGA-30 and 50-TAATCT-30. These TF-specic preferences cause difculties forDNA-binding models, as the models must accuratelycharacterize the common binding sites while at the sametime capture the sites specic to each TF.

    Flanking DNA

    Even in the case of TFs with well-dened core DNAbinding sites, complexities can still arise from the DNAsequence anking the core, which can affect bindingafnity. Two such examples have been highlighted inrecent HT studies (45). The TF Gcn4 binds to the core

    Nucleic Acids Research, 2014, Vol. 42, No. 4 2101

  • motif 50-TGACTCA-30. Binding measurements of Gcn4 tothousands of sequences containing this core motif revealeda wide range of binding afnities, from no appreciablebinding to high-afnity binding (43). Nucleotides immedi-ately adjacent to the core 7-mer were important forbinding afnity, but even the two nucleotides ankingthe best 9-mer affected afnity by an order of magnitude.Another recent study examined the DNA-binding prefer-ences of the paralogous yeast TFs Cbf1 and Tye7 tohundreds of genomic sequences containing the E-boxmotif 50-CACGTG-30 (45). The study revealed a strongdependence on anking DNA and computationalanalyses suggested that the sequence beyond the canonicalE-box motif contributes to binding specicity byinuencing the three-dimensional structure of the DNA-binding site. Importantly, the additional specicity comingfrom sequences anking the core motif could not bedescribed by simply extending the core (45), which isnot surprising given that the inuence of anking DNAis likely to be exerted through DNA shape and notthrough specic DNA contacts. DNA shape is afunction of the DNA sequence; however, this relationshipis complex (80) and cannot be captured by simple PWMmodels. Thus, in order to account for the inuence ofanking regions on the afnity of DNA-binding sites,future models of binding specicity will likely use DNAshape characteristics explicitly.

    Multiple modes of DNA binding

    Complicating a simple model of DNA binding for manyproteins is that they can bind DNA using two or moredistinct modes (Figure 1). These different modes of inter-action can lead to fundamentally different DNA-bindingpreferences (i.e., motifs), complicating simple bindingmodels that do not explicitly account for these multiplemodes. HT studies are increasingly revealing unknowndiversity in DNA-binding preferences of numerousproteins, many of which may be the results of differentbinding modes (29,72,75,81). Here, we classify variablebinding modes into four categories (Figure 1) and brieyreview each category.

    Variable spacingSome proteins that bind to bipartite DNAmotifs (i.e. motifscomposed of two half-sites), can recognize distinct classes ofmotifs in which the half-sites are separated by differentnumbers of bases (Figure 1A). This phenomenon was rstobserved more than 20 years ago for basic leucine zipper(bZIP) proteins, which can bind overlapping or adjacentTGAC half-sites (82). Subsequent studies have classiedthe bZIP family into two subfamilies based on proteinsequence: (i) Activator protein 1 (AP-1) proteins, which gen-erally prefer overlapping half-sites, but bind to adjacent half-sites with almost equal afnity and (ii) Activatingtranscription factor (ATF)/cAMP response element-binding protein (CREB) proteins, which generally preferadjacent half-sites and bind very poorly to overlappinghalf-sites (82,83). However, a recent PBM-based studyfound that at least one ATF/CREB protein in yeast(Yap3) binds with high afnity to both adjacent and

    overlapping half-sites (74). Therefore, although some ofthe residues that dened bZIP half-site spacing have beenidentied (84), additional residues are also involved.Variable half-site spacing has also been well documentedfor the nuclear receptors (NRs). For example, both the per-oxisome proliferator activated receptors (PPARs) andretinoic acid receptors (RARs) form heterodimers with theretinoid X receptor (RXR) and can bind to responseelements composed of direct repeats of the 50-AGGTCA-30half-site separated by different length spacers (85).PPAR:RXR dimers bind direct repeats spaced by 1 or 2bases (i.e. DR1=50-AGGTCANAGGTCA-30; DR2=50-AGGTCANNAGGTCA-30), while RAR:RXR dimersbind to repeats spaced by 1 (DR1) or 5 (DR5) bases. Thevariable spacing has even been shown to affect in vivofunction of DNA-bound NR dimers (86). RAR:RXRbound to DR5 elements can activate transcription inresponse to ligand, but will not activate transcription whenbound to the DR1 elements (86).

    Multiple DBDsDNA-binding proteins can contain multiple, independentDBDs (3,87). Multiple DBDs can allow a protein to alter-natively recognize different DNA elements using differentDBDs. For example, the zinc nger (ZF) protein Evi1contains 10 ZF domains separated into two autonomouslyfunctioning DBDsan N-terminal seven ZF DBD and aC-terminal three ZF DBDthat recognize completely dif-ferent DNA motifs (8890). A still more complicatedscenario is seen for the mouse TF Oct-1 that has twoDBDs known as the POU (Pit-Oct-Unc) homeodomain(POUHD) and the POU-specic domain (POUS) (Figure1B) (24,91,92). Oct-1 can bind to three distinct DNAmotifs using different combinations of the two DBDs:the POUHD site is recognized by the POUHD domain,the POUS site is recognized by POUS domain and thecomposite POU site is recognized using both domains.

    Multi-meric bindingSelective multimerization provides another means toexpand or diversify the DNA-binding specicity of aprotein. Proteins such as Oct-1(93) and RXR (94) havebeen reported to bind DNA as either monomers orhomodimers. Recent large-scale studies using HT-SELEX assays (39,95) have revealed that the dimericmode of binding might be more common than previouslyappreciated and even proteins that are known to bindDNA primarily as monomers, such as Elk1, (Figure 1C)can also form dimers with specic orientation and spacingpreferences. These dimeric sites are enriched withingenomic regions bound in vivo, suggesting that thedimeric proles are biologically relevant. Furthermore,the dimer orientation and spacing preferences can some-times distinguish individual members of the same TFfamily (95). For example, T-box factors share a commonmonomeric binding specicity but show seven distinctdimeric spacing/orientation preferences.

    Alternate structural conformations

    The recognition of distinct DNA motifs is expected whena protein contains multiple DBDs or binds as a multi-mer;

    2102 Nucleic Acids Research, 2014, Vol. 42, No. 4

  • however, some proteins with only a single DBD can alsobind to multiple, distinctly different DNA sites. One suchexample is mouse TF sterol regulatory element-bindingfactor 1 (SREBF1) (Figure 1D), the ortholog of humanTF sterol regulatory element-binding protein 1 (SREBP)(59,96). SREBF/SREBP proteins belong to the basichelixloophelix (bHLH) family of TFs that typically rec-ognize symmetric E-boxes (50-CAnnTG-30). Unlike mostbHLH proteins, SREBF/SREBP can bind the asymmetricsterol regulatory element 50-ATCACnCCAC-30. A co-crystal structure of the DNA-binding domain of humanSREBP-1a bound to DNA revealed an asymmetric DNAprotein interface with one monomer binding the E-boxhalf-site (50-ATCAC-30) using proteinDNA contactstypical of bHLH proteins and the second monomerrecognizing the non-E-box half-site (50-GTGGG-30)using entirely different proteinDNA contacts (96,97).

    Thus, in the case of SREBF/SREBP, residues in theDBD are responsible for the multiple modes of DNAbinding. Regions outside the DNA-binding domain ofthe protein can also enable alternate binding modes. Forexample, a region N-terminal to the basic DBD of yeastTF Hac1 is required for dual recognition modes.Mutations in this region were shown to preferentiallyreduce binding to one of the modes, whereas mutationof an arginine in the basic domain was crucial for theother mode of binding (98). This indicates that theprotein can bind DNA using two distinct conformations,and individual residues (within and outside the DBD) playspecic roles within each binding mode (98).The ability of proteins to utilize different DNA-binding

    modes presents difculties or even precludes the construc-tion of simple DNA-binding models for many proteins.The presence of distinct modes usually requires that

    Multiple Modes of DNA Binding

    Gcn4 dimers can bind to bipartite sites with half-sites separated byvariable-length spacers (82); motifsfrom (73,74)

    Multiple DBDs

    Oct-1 can bind to different DNA sites using different arrangementsof its two DNA-binding domains (91,92);motifs from (24)

    Variable Spacing

    GAT

    ACT

    TA

    CTTGTCGTATAGTAGAT

    Alternate Structural Conformations

    SREBP can bind to different DNA sites by adopting alternate structural conformations (96,97); motifs from (44)

    Alternate conformation

    CGA

    AT

    AGC

    GA

    TGC

    ACG

    CT

    TCG

    TA

    GCT

    Multi-meric Binding Elk1 can bind both as a monomeror as a dimer (95)

    Monomer site

    GA

    TGC

    ATTCCGCGCGGATACAGCT

    CTGA

    GAC

    A

    CGGATACAGGACTGADimer site

    GCAG

    ATC

    AT

    CA

    TA

    ATGTTAATGGTC

    POU site HD

    ATG

    GTCA

    TATAATCAGTTACGCTA

    POU site S POU site

    CGATTGCTACGGATACAGCT TGACTTGATCAGTACGACT

    A

    B

    C

    D

    Overlapping half-sites Adjacent half-sites

    CAG

    ATCTGAGTCACGCCAGTC

    Figure 1. Multiple modes of DNA binding. Schematized are examples illustrating four mechanistic categories by which proteins recognize differentDNA sites via distinct structural modes.

    Nucleic Acids Research, 2014, Vol. 42, No. 4 2103

  • multiple DNA-binding models are used to represent aproteins specicity. This can cause difculties when thedata available to construct the DNA-binding modelcontains a mixture of multiple types of binding motifs,such as in genome-wide chromatin immunoprecipitation(ChIP)-seq datasets. In some situations, the DNA-bindingmodes can potentially be separated and independentlycharacterized. For example, one could independentlycharacterize the binding of individual DBDs when aprotein contains multiple DBDs. However, even thesesituations can lead to difculties as illustrated by thecase of Oct-1 where the autonomous DBDs can functioneither independently or together, in which case examiningthem separately will cause one to miss the binding sitesrecognized when both DBDs are involved (Figure 1B).

    Multi-protein recognition codes

    The DNA-binding specicity of a TF is a primary deter-minant of where it will bind in the genome (99,100).However, transcriptional regulation often involves theassembly of multi-protein complexes on DNA (101,102),and it has been shown that multi-protein complexes canexhibit novel DNA-binding specicities not predictablefrom measurements of the individual proteins (40,48).Therefore, in efforts aimed at understanding the biophys-ical determinants of specicity in gene regulation it is ofprimary importance to also consider how multi-proteincomplexes can lead to new or enhanced specicities.Here, we will review the varied mechanism by whichnovel DNA-binding specicities can arise when proteins,both DNA-binding and non-DNA-binding, assembletogether. We suggest that these mechanisms representhigher-order multi-protein recognition codesmechan-isms by which genomic targeting of regulatory factors isencoded in multi-protein complexes.

    Cooperative binding

    Cooperative binding of TFs to DNAusually viaproteinprotein interactions between adjacently boundTFsstabilizes the proteins on the DNA and canenhance their individual contributions to transcription ofa gene (3,103). Cooperative binding can also affect theDNA-binding specicity and lead to recognition of newbinding sites (3) (Figure 2A). One way in which coopera-tive binding alters specicity is by extending the binding ofa TF to include lower afnity sites as a result of theenhanced overall afnity of the cooperative complex forDNA. A well-studied example is the binding of yeast TFsMATa2, MATa1 and Mcm1, which regulate mating-typegenes (103,104). MATa2 binds DNA cooperatively withcofactors MATa1 or Mcm1 to repress different setsof genes. MATa2:MATa1 heterodimers bind to sitesfound in the promoter regions of haploid-specic genesand repress their expression, whereas MATa2:Mcm1heterotetramers bind different sites found in mating-typea-specic gene promoters and repress gene expression.Both complexes repress gene expression via recruitmentof co-repressor proteins Tup1 and Ssn6 by interactionswith MATa2. Therefore, MATa2 functions to recruitco-repressors and repress gene expression, but its

    DNA-binding specicityand subsequently its targetgenesare mediated by cooperative binding with cell-type-specic cofactors. In a recent study, a more subtleform of specicity alteration by cooperative binding waspresented in which DNA-binding of one protein can sta-bilize or destabilize the DNA-binding of another proteinvia the deformation of the DNA structure (105). Notably,the cooperative binding was not mediated by directprotein contacts, but by an allosteric mechanism whereDNA structural deformations propagate along the DNAhelix. The cooperative allosteric effects were shown tooperate even to 16 bp away.A second, more active, mechanism has been described

    where cooperative binding alters the residuebase contactsthat a TF can make with the DNA, thereby altering itsinherent binding specicity (3,40) (Figure 2A). Oneexample is the binding of the TFs Ets-1 and Pax5, wherePax5 alters the DNA contacts made by Ets-1, making itmore permissive to alternate DNA sequences (106).Bound alone to DNA, Ets-1 binds with high-afnity to50-GGAA-30 and with lower afnity to 50-GGAG-30.A tyrosine residue (Y395) in the Ets-1 recognition helixinteracts with this fourth base position (underlined) andmediates the preference for adenine over guanine.However, in complex with Pax-5, the Y395 residueadopts an alternate conformation and no longer interactswith the DNA at this base position. This has the effect ofmaking the Ets-1 DNA binding permissive for both se-quences, and therefore broadens the specicity of thePax-5:Ets-1 complex to include the suboptimal50-GGAG0-30 Ets-1 half-site. Another example has beenelucidated in a series of studies on Drosophila Hoxproteins and their cofactor Extradenticle (Exd) (40,107).The Hox protein Sex combs reduced (Scr) binds with thecofactor protein Exd preferentially to a paralog-specicsite (fkh250), found in the fork head (fkh) gene, over aconsensus site (fkh250con) recognized by other Hox:Exdcomplexes (107,108). The preferential binding of theScr:Exd complex to the fkh250 site involves the stabiliza-tion of a exible arm region N-terminal to the Scrhomeodomain (107). Two of the stabilized residues inthe N-terminal arm region (Arg3 and His-12) insert intothe fkh250 DNA minor groove, which is narrower thanthe same region in the fkh250con sequence and moreelectrostatically favorable for the Arg and His residues.Therefore, the enhanced specicity of the Scr:Exdcomplex to the fhk250 sequence is due to local DNAshape (i.e., minor groove width) readout by the Scr Argand His residues when stabilized by the cooperativelybound Exd cofactor. This work highlights how the intrin-sic specicity of a monomeric homeodomain (Scr) can bealtered or rened through interaction with a proteincofactor (Exd), a characteristic called latent specicity.A recent comprehensive analysis of other Hox proteinsextended these results and demonstrated latent specicitiesfor all the Drosophila Hox proteins (40). A HT-SELEX/SELEX-seq assay was used to measure and compare theDNA-binding specicity of all eight Drosophila Hoxproteins alone and in complex with Exd. As seen forScr:Exd, binding with the Exd cofactor revealed latent

    2104 Nucleic Acids Research, 2014, Vol. 42, No. 4

  • specicity differences between the Hox proteins, not ob-servable when Hox proteins were examined as monomers.

    Cooperative cofactor recruitment

    The recruitment of non-DNA-binding cofactors (i.e.coactivators and corepressors) to promoters and enhan-cers by DNA-binding TFs is central to gene regulation(101). Cooperative recruitment occurs when a cofactorcan make simultaneous interactions with multiple, DNA-bound TFs; the result is a synergistic enhancement in thecofactor recruitment (109,110). Cooperative cofactor re-cruitment integrates the contributions from multiple TFsto achieve maximal gene expression and is a powerfulmechanism for establishing an integrative, AND-type

    regulatory logic in gene transcription (111114). At thesame time, cooperative recruitment enhances the bindingspecicity of the cofactor. The genomic location of thecofactor is determined not by the presence of a singleTF, but by the less-frequent (i.e., more specic), coinci-dent presence of multiple TFs (Figure 2B).A paradigm for cooperative cofactor recruitment are

    multi-protein, enhanceosome complexes in which multipleTFs cooperatively assemble on DNA and coordinatelyrecruit a common cofactor. A well-studied example isthe enhanceosome that binds to the Interferon beta(IFNb) gene promoter and cooperatively recruits thecoactivator CREB-binding protein (CBP) (111,112). Inresponse to viral infection, the TFs ATF-2, c-Jun, Irf3,Irf7 and NF-kB, facilitated by the architectural factor

    Multi-Protein Recognition Codes

    Enhanced complex stability due to cooperativity allows binding to lower-affinity (weak) sites (103,104,106) Weak site

    (assisted binding)

    Inter-protein interactions alter or stabilizeprotein-DNA contacts, altering DNA-binding specificity (40,106,107)

    Altered sidechain conformation

    Cooperative recruitment

    Cofactor recruitment requires multiple factors (rather than only one), allowing more specific cofactor targeting (109-114)

    Weak or no recruitment

    Strong cooperative recruitment

    Allostery Allosteric control of cofactor recruitmentlimits cofactor recruitment to only a subset of the TF binding sites (116-121,

    Recruiting siteNon-recruiting site

    Cofactor-based targeting

    Enhanced binding of multi-protein complexto specialized composite sites is mediated by interactions between non-DNA-bindingcofactor and an auxiliary motif (48,129)

    Enhanced binding to composite site

    Stabilization of contacts(e.g., N-terminal tail)

    Weak site (no binding)

    Strong site

    Cooperative binding

    A

    B

    C

    D

    124,125)

    Figure 2. Multi-protein recognition codes. Schematized are examples illustrating four mechanistic categories by which targeting of proteins todistinct DNA sites involves the assembly of multi-protein complexes.

    Nucleic Acids Research, 2014, Vol. 42, No. 4 2105

  • Hmga1, bind cooperatively to the IFNb promoter.Multiple proteins in the DNA-bound complex contributeto the cooperative recruitment of CBP, and operate syn-ergistically to enhance transcription (111). While NF-kB,the ATF-2:c-Jun dimer and Irf factors can recruit CBPindividually, studies have demonstrated that removal ofthe activation domains from IRF or NF-kB factorsresulted in complete loss of CBP recruitment, highlightingthe cooperative nature of the CBP cofactor recruitment(111). A similar situation occurs in the cooperative recruit-ment of class II, major histocompatibility complextransactivator (CIITA) to a set of conserved transcrip-tional control elements found in the promoters of majorhistocompatibility complex class II (MHC-II) genes(113,115). The control sequences contain four DNAelements termed the W, X, X2 and Y boxes, with aconserved sequence, spacing and orientation across thedifferent promoters. The TFs X-box binding protein(RFX), X2-binding protein (X2BP) and Nuclear factorY (NF-Y) bind to the X, X2 and Y boxes, respectively,and form a cooperative, nucleoprotein, enhanceosomecomplex. This multi-protein complex recruits the CIITAto the MHC-II promoters via multiple weak interactionsmediated by different components of the enhanceosomecomplex (113). Removal of interactions from any ofthese component proteins leads to signicant loss ofCIITA recruitment.

    Allosteric effects in cofactor recruitment

    DNA can act as a sequence-specic allosteric ligand thatalters the function of a DNA-bound TF (116121).A common mechanism by which this functions is byDNA sequence-dependent effects on cofactor recruitment.A TF bound to one set of DNA sites will recruit a speciccofactor, whereas the same TF bound to a different set ofsites will not recruit the cofactor. Therefore, allostericcontrol of cofactor recruitment functionally partitionsDNA binding sites of a TF and consequently renesbinding specicity of the recruited cofactor (Figure 2C).Allosteric mechanisms have been reported for both the

    glucocorticoid (GR) and the estrogen (ER) nuclear recep-tors (116,118). The DNA sequence bound by ER can in-uence its transcriptional activity and recruitment ofLXXLL-containing peptides and p160 cofactors (SRC-1,GRIP1, ACTR) (116). The DNA sequence bound by GR,called the GR response element (GRE), was shown to in-uence gene expression, but the effect did not correlatewith GR-binding afnity, suggesting a mechanism thatinvolves differential recruitment of cofactor proteins(118). Furthermore, knock down of Brahma, a subunitof the SWI/SNF nucleosome remodeling complex,affected GR-dependent gene expression in a GREsequence-dependent manner, suggesting that the compos-ition of the DNA-bound protein complexes wereallosterically regulated. Allosteric control of cofactor re-cruitment has also been described for different NF-kBfamily dimers (117,120). Single-base changes in NF-kBbinding sites have been demonstrated to affect recruitmentof the TF Irf3 protein by DNA-bound p65/Rel-containingdimers (117) as well as the recruitment of the cofactors

    histone de-acetylase 3 (HDAC3) and Tip60 to a DNA-bound, ternary complex containing p50 homodimers andBcl3 (120).The mechanism of allosteric control for GR and NF-kB

    appears to be moderated by structural differences betweenTFs bound to different DNA sites. Structural studies ofGR bound to different GREs demonstrated that the struc-ture of a lever arm loop region, connecting the DNArecognition helix and the ligand binding domain, washighly dependent on the GRE sequence (118). Studies ofNF-kB dimers bound to different kB sites have shown thatDNA sequences differences can result in structuralrearrangements (rotations and translations) of one dimersubunit (19,21,122). Protease protection assays anddetailed binding studies have also suggested DNAsequence-dependent changes in protein conformation(72,123).In contrast to the situation for GR and NF-kB where

    allostery is mediated by structural differences betweenthe DNA-bound complexes, allostery involving the POUfactors operates via larger, more global differences in qua-ternary structure when bound to different DNA sites.POU factors, such as Oct-1 described above, havetwo DBDs connected by a exible linker: POUS anda POUHD (121). POU factors can homo- andheterodimerize to a diverse set of DNA-binding sites inwhich the relative orientations and spacing of the POUSand POUHD can vary and subsequently lead to differentialcofactor recruitment (121,124). The POU factors Oct-1and Oct-2 can bind as homodimers to two distinctresponse elements, PORE and MORE, found in the pro-moters of B-cell-specic genes (121). When bound to thePORE sequence, Oct dimers can recruit the transcriptionalcoactivator Oct-binding factor 1 (OBF-1); however,dimers bound to the MORE sequence do not recruitOBF-1. Crystal structures of the DNA-bound Oct-1dimer have revealed that in the MORE-bound dimer theOBF-1 binding site on Oct-1 is blocked due to the relativeorientation of the dimer subunits, but in the PORE-bounddimer the site is exposed, allowing OBF-1 recruitment(125,126). A similar situation arises with the Pit-1 POUfactor in which binding site differences lead to cell-type-specic expression patterns and differential recruitment ofthe N-CoR co-repressor complex (124,127,128). Studiesrevealed that an extra 2 bp between the POUS andPOUHD half-sites found in the growth hormone (GH)gene promoter, compared with a similar afnity site inthe prolactin gene promoter, altered the quaternary struc-ture of the DNA-bound Pit-1 and enabled the recruitmentof the N-CoR co-repressor complex to the GH site (124).

    Cofactor-based specicity

    Targeting of non-DNA-binding cofactors to DNA is trad-itionally viewed as being mediated solely via proteinprotein interactions with DNA-bound TFs. In otherwords, the cofactor is recruited and interacts withDNA indirectly through the TFs. However, severalstudies have described a provocative alternative mechan-ism in which recruited cofactors interact directly withDNA when part of a larger, multi-protein complex

    2106 Nucleic Acids Research, 2014, Vol. 42, No. 4

  • (48,129). Further, as part of this larger complex, the non-DNA-binding cofactors mediate sequence-specic inter-actions that preferentially stabilize the binding to compos-ite DNA sites containing specic auxiliary motifs (Figure2D). This represents yet another mechanism to enhancethe DNA sequence specicity of multi-protein regulatorycomplexes.The regulation of sulfur metabolism genes in yeast

    involves a multi-protein complex composed of asequence-specic TF Cbf1 and two non-DNA-bindingcofactors Met28 and Met4 (130,131). Cbf1 is a bHLHprotein and binds as a homodimer to E-box sites with aconsensus 50-CACGTG-30 core (131). A PBM-basedanalysis revealed that the Met4:Met28:Cbf1 complexbinds preferentially to composite DNA sites in which theCbf1 E-box is anked by an adjacent 50-RYAAT-30recruitment motif (i.e. 50-RYAATNNCACGTG-30)(48). High-afnity binding to the composite sites ishighly cooperative and requires the full heteromericcomplex. Sequence analysis, modeling and bindingassays suggest that the recruitment motif is recognizedby the non-DNA-binding Met28 subunit. A highlysimilar situation has also been described for the multi-protein Oct-1:HCF-1:VP16 complex that regulates expres-sion of the herpes simplex virus immediate-early genes bybinding to a composite 50-TAATGARAT sequence in thegene promoters (129). The complex is composed of theDNA-binding TF Oct-1 and two non-DNA-bindingcofactors: the viral activator VP16 and the host cellfactor 1 (HCF-1). Oct-1 binds to the 50-TAAT sequenceand a DNA-binding surface of VP16 mediates interactionswith the 50-GARAT sequence. Similar to the situation inyeast, the VP16 cofactor does not interact with DNA wellon its own, but when stabilized on DNA as part of a largermulti-protein complex it can adopt a conguration thatmediates recognition of the 50-GARAT subsequence.Therefore, in both situations multi-protein complexespreferentially bind to composite binding sites composedof a recruitment motif (50-RYAAT-30 and 50-GARAT-30)and an adjacent TF-binding motif (50-CACGTG-30 and50-TAAT-30) that are recognized by a non-DNA-bindingcofactor (Met28 and VP16) and sequence-specic TF(Cbf1 and Oct-1) subunits, respectively.

    SUMMARY

    Complexities in DNA-binding operate at multiple levelsand complicate efforts to construct simple models, or rec-ognition codes, of DNA-binding. Flexible protein-DNAinteractions and non-local structural effects confoundsimple descriptions of DNA base preferences, andrequire the use of binding models, such as PWMs, thatcan account for the resulting sequence degeneracy. Theability of proteins to bind DNA via multiple modesfurther complicates the situation and leads to the require-ment of multiple binding models for many proteins.Fortunately, HT technologies are not only increasingthe rate at which DNA-binding proteins are beingcharacterized, but are providing the comprehensivebinding data needed to construct models that involve

    multiple DNA-binding modes (23,24,27,28,35,3840,42,43,48,59,73,74,79,132134). An added benet of thecomprehensive nature of certain HT datasets, such ascomprehensive k-mer or binding site level data, is thatpredictions of binding sites in the genome can be per-formed directly using the measured binding data(23,24,40,43,133).Multi-protein complexes add tremendously to the

    DNA-binding diversity of proteins and provide a keymechanism to integrate signals in gene regulation.However, these higher-order multi-protein mechanismscause difculties in constructing models, primarily dueto the fact that DNA-binding models of proteinsexamined in isolation may not capture the binding sitesutilized in vivo when cofactors are present. Here again, HTtechnologies are being used successfully to characterizebinding of multi-protein complexes, revealing recognitioncodes mediated by cooperative binding (40,133) andcofactor-mediated targeting (48). The continued applica-tion of HT technologies to examine DNA binding ofmulti-protein complexes will undoubtedly provide increas-ingly rened binding models and provide new insights intogene targeting in vivo. Furthermore, just as the applicationof HT technologies has drawn our attention to the preva-lence of TFs that exhibit subtle binding preferences orbind DNA via multiple modes, studies examining multi-protein complexes will likely identify wide spread use ofhigher-order mechanisms such as allostery, cooperativerecruitment and cofactor-mediated targeting. Integratingthese datasets with whole genome chromatin immunopre-cipitation (ChIP) and expression datasets will lead tomuch more complete and sophisticated descriptions ofspecicity in gene regulation.

    FUNDING

    NIH [K22AI093793 to T.S.]; PhRMA FoundationResearch Starter Grant (to R.G.); Basil OConnorResearch Award #5-FY13-212 from March of DimesFoundation (to R.G.). Funding for open access charge:Waived by Oxford University Press.

    Conict of interest statement. None declared.

    REFERENCES

    1. Seeman,N.C., Rosenberg,J.M. and Rich,A. (1976) Sequence-specic recognition of double helical nucleic acids by proteins.Proc. Natl Acad. Sci. USA, 73, 804808.

    2. Rohs,R., Jin,X., West,S.M., Joshi,R., Honig,B. and Mann,R.S.(2010) Origins of specicity in protein-DNA recognition. Annu.Rev. Biochem., 79, 233269.

    3. Garvie,C.W. and Wolberger,C. (2001) Recognition of specicDNA sequences. Mol. Cell, 8, 937946.

    4. Jayaram,B. and Jain,T. (2004) The role of water in protein-DNArecognition. Annu. Rev. Biophys. Biomol. Struct., 33, 343361.

    5. Reddy,C.K., Das,A. and Jayaram,B. (2001) Do water moleculesmediate protein-DNA recognition? J. Mol. Biol., 314, 619632.

    6. Miller,J.C. and Pabo,C.O. (2001) Rearrangement of side-chains ina Zif268 mutant highlights the complexities of zinc nger-DNArecognition. J. Mol. Biol., 313, 309315.

    7. Baldwin,E.P., Martin,S.S., Abel,J., Gelato,K.A., Kim,H.,Schultz,P.G. and Santoro,S.W. (2003) A specicity switch in

    Nucleic Acids Research, 2014, Vol. 42, No. 4 2107

  • selected cre recombinase variants is mediated by macromolecularplasticity and water. Chem. Biol., 10, 10851094.

    8. Otwinowski,Z., Schevitz,R.W., Zhang,R.G., Lawson,C.L.,Joachimiak,A., Marmorstein,R.Q., Luisi,B.F. and Sigler,P.B.(1988) Crystal structure of trp repressor/operator complex atatomic resolution. Nature, 335, 321329.

    9. Rohs,R., West,S.M., Sosinsky,A., Liu,P., Mann,R.S. andHonig,B. (2009) The role of DNA shape in protein-DNArecognition. Nature, 461, 12481253.

    10. Chang,Y.P., Xu,M., Machado,A.C., Yu,X.J., Rohs,R. andChen,X.S. (2013) Mechanism of origin DNA recognition andassembly of an initiator-helicase complex by SV40 large tumorantigen. Cell Reports, 3, 11171127.

    11. Werner,M.H., Gronenborn,A.M. and Clore,G.M. (1996)Intercalation, DNA kinking, and the control of transcription.Science, 271, 778784.

    12. Kim,J.L., Nikolov,D.B. and Burley,S.K. (1993) Co-crystalstructure of TBP recognizing the minor groove of a TATAelement. Nature, 365, 520527.

    13. Rohs,R., Sklenar,H. and Shakked,Z. (2005) Structural andenergetic origins of sequence-specic DNA bending: Monte Carlosimulations of papillomavirus E2-DNA binding sites. Structure,13, 14991509.

    14. Pabo,C.O. and Nekludova,L. (2000) Geometric analysis andcomparison of protein-DNA interfaces: why is there no simplecode for recognition? J. Mol. Biol., 301, 597624.

    15. Suzuki,M., Gerstein,M. and Yagi,N. (1994) Stereochemical basisof DNA recognition by Zn ngers. Nucleic Acids Res., 22,33973405.

    16. Kortemme,T., Morozov,A.V. and Baker,D. (2003) An orientation-dependent hydrogen bonding potential improves prediction ofspecicity and structure for proteins and protein-proteincomplexes. J. Mol. Biol., 326, 12391259.

    17. Siggers,T.W. and Honig,B. (2007) Structure-based prediction ofC2H2 zinc-nger binding specicity: sensitivity to dockinggeometry. Nucleic Acids Res., 35, 10851097.

    18. Siggers,T.W., Silkov,A. and Honig,B. (2005) Structural alignmentof proteinDNA interfaces: insights into the determinants ofbinding specicity. J. Mol. Biol., 345, 10271045.

    19. Chen,F.E. and Ghosh,G. (1999) Regulation of DNA binding byRel/NF-kappaB transcription factors: structural views. Oncogene,18, 68456852.

    20. Chen,Y.Q., Ghosh,S. and Ghosh,G. (1998) A novel DNArecognition mode by the NF-kappa B p65 homodimer. Nat.Struct. Biol., 5, 6773.

    21. Chen,Y.Q., Sengchanthalangsy,L.L., Hackett,A. and Ghosh,G.(2000) NF-kappaB p65 (RelA) homodimer uses distinctmechanisms to recognize DNA targets. Structure, 8, 419428.

    22. Yanover,C. and Bradley,P. (2011) Extensive protein and DNAbackbone sampling improves structure-based specicity predictionfor C2H2 zinc ngers. Nucleic Acids Res., 39, 45644576.

    23. Berger,M.F., Badis,G., Gehrke,A.R., Talukder,S.,Philippakis,A.A., Pena-Castillo,L., Alleyne,T.M., Mnaimneh,S.,Botvinnik,O.B., Chan,E.T. et al. (2008) Variation inhomeodomain DNA binding revealed by high-resolution analysisof sequence preferences. Cell, 133, 12661276.

    24. Badis,G., Berger,M.F., Philippakis,A.A., Talukder,S.,Gehrke,A.R., Jaeger,S.A., Chan,E.T., Metzler,G., Vedenko,A.,Chen,X. et al. (2009) Diversity and complexity in DNArecognition by transcription factors. Science, 324, 17201723.

    25. Ramirez,C.L., Foley,J.E., Wright,D.A., Muller-Lerch,F.,Rahman,S.H., Cornu,T.I., Winfrey,R.J., Sander,J.D., Fu,F.,Townsend,J.A. et al. (2008) Unexpected failure rates formodular assembly of engineered zinc ngers. Nat. Methods, 5,374375.

    26. Wei,G.H., Badis,G., Berger,M.F., Kivioja,T., Palin,K., Enge,M.,Bonke,M., Jolma,A., Varjosalo,M., Gehrke,A.R. et al. (2010)Genome-wide analysis of ETS-family DNA-binding in vitro andin vivo. EMBO J., 29, 21472160.

    27. Meng,X., Smith,R.M., Giesecke,A.V., Joung,J.K. and Wolfe,S.A.(2006) Counter-selectable marker for bacterial-based interactiontrap systems. Biotechniques, 40, 179184.

    28. Noyes,M.B., Meng,X., Wakabayashi,A., Sinha,S., Brodsky,M.H.and Wolfe,S.A. (2008) A systematic characterization of factors

    that regulate Drosophila segmentation via a bacterial one-hybridsystem. Nucleic Acids Res., 36, 25472560.

    29. Berger,M.F., Philippakis,A.A., Qureshi,A.M., He,F.S.,Estep,P.W. 3rd and Bulyk,M.L. (2006) Compact, universal DNAmicroarrays to comprehensively determine transcription-factorbinding site specicities. Nat. Biotechnol., 24, 14291435.

    30. Bulyk,M.L., Gentalen,E., Lockhart,D.J. and Church,G.M. (1999)Quantifying DNA-protein interactions by double-stranded DNAarrays. Nat. Biotechnol., 17, 573577.

    31. Mukherjee,S., Berger,M.F., Jona,G., Wang,X.S., Muzzey,D.,Snyder,M., Young,R.A. and Bulyk,M.L. (2004) Rapid analysis ofthe DNA-binding specicities of transcription factors with DNAmicroarrays. Nat. Genet., 36, 13311339.

    32. Linnell,J., Mott,R., Field,S., Kwiatkowski,D.P., Ragoussis,J. andUdalova,I.A. (2004) Quantitative high-throughput analysis oftranscription factor binding specicities. Nucleic Acids Res., 32,e44.

    33. Field,S., Udalova,I. and Ragoussis,J. (2007) Accuracy andreproducibility of protein-DNA microarray technology. AdvBiochem. Eng. Biotechnol., 104, 87110.

    34. Bonham,A.J., Neumann,T., Tirrell,M. and Reich,N.O. (2009)Tracking transcription factor complexes on DNA using totalinternal reectance uorescence protein binding microarrays.Nucleic Acids Res., 37, e94.

    35. Maerkl,S.J. and Quake,S.R. (2007) A systems approach tomeasuring the binding energy landscapes of transcription factors.Science, 315, 233237.

    36. Zykovich,A., Korf,I. and Segal,D.J. (2009) Bind-n-Seq: high-throughput analysis of in vitro protein-DNA interactions usingmassively parallel sequencing. Nucleic Acids Res., 37, e151.

    37. Wong,D., Teixeira,A., Oikonomopoulos,S., Humburg,P.,Lone,I.N., Saliba,D., Siggers,T., Bulyk,M., Angelov,D.,Dimitrov,S. et al. (2011) Extensive characterization of NF-kappaBbinding uncovers non-canonical motifs and advances theinterpretation of genetic functional traits. Genome Biol., 12, R70.

    38. Zhao,Y., Granas,D. and Stormo,G.D. (2009) Inferring bindingenergies from selected binding sites. PLoS Comput. Biol., 5,e1000590.

    39. Jolma,A., Kivioja,T., Toivonen,J., Cheng,L., Wei,G., Enge,M.,Taipale,M., Vaquerizas,J.M., Yan,J., Sillanpaa,M.J. et al. (2010)Multiplexed massively parallel SELEX for characterization ofhuman transcription factor binding specicities. Genome Res., 20,861873.

    40. Slattery,M., Riley,T., Liu,P., Abe,N., Gomez-Alcala,P., Dror,I.,Zhou,T., Rohs,R., Honig,B., Bussemaker,H.J. et al. (2011)Cofactor binding evokes latent differences in DNA bindingspecicity between Hox proteins. Cell, 147, 12701282.

    41. Tantin,D., Gemberling,M., Callister,C. and Fairbrother,W.G.(2008) High-throughput biochemical analysis of in vivo locationdata reveals novel distinct classes of POU5F1(Oct4)/DNAcomplexes. Genome Res., 18, 631639.

    42. Warren,C.L., Kratochvil,N.C., Hauschild,K.E., Foister,S.,Brezinski,M.L., Dervan,P.B., Phillips,G.N. Jr and Ansari,A.Z.(2006) Dening the sequence-recognition prole of DNA-bindingmolecules. Proc. Natl Acad. Sci. USA, 103, 867872.

    43. Nutiu,R., Friedman,R.C., Luo,S., Khrebtukova,I., Silva,D., Li,R.,Zhang,L., Schroth,G.P. and Burge,C.B. (2011) Directmeasurement of DNA afnity landscapes on a high-throughputsequencing instrument. Nat. Biotechnol., 29, 659664.

    44. Weirauch,M.T. and Hughes,T.R. Dramatic changes intranscription factor binding over evolutionary time. Genome Biol.,11, 122.

    45. Gordan,R., Shen,N., Dror,I., Zhou,T., Horton,J., Rohs,R. andBulyk,M.L. (2013) Genomic regions anking E-box binding sitesinuence DNA binding specicity of bHLH transcription factorsthrough DNA shape. Cell Reports, 3, 10931104.

    46. Stormo,G.D. (2000) DNA binding sites: representation anddiscovery. Bioinformatics, 16, 1623.

    47. Maniatis,T., Ptashne,M., Backman,K., Kield,D., Flashman,S.,Jeffrey,A. and Maurer,R. (1975) Recognition sequences ofrepressor and polymerase in the operators of bacteriophagelambda. Cell, 5, 109113.

    48. Siggers,T., Duyzend,M.H., Reddy,J., Khan,S. and Bulyk,M.L.(2011) Non-DNA-binding cofactors enhance DNA-binding

    2108 Nucleic Acids Research, 2014, Vol. 42, No. 4

  • specicity of a transcriptional regulatory complex. Mol. Syst.Biol., 7, 555.

    49. Staden,R. (1984) Computer methods to locate signals in nucleicacid sequences. Nucleic Acids Res., 12, 505519.

    50. Workman,C.T., Yin,Y., Corcoran,D.L., Ideker,T., Stormo,G.D.and Benos,P.V. (2005) enoLOGOS: a versatile web tool for energynormalized sequence logos. Nucleic Acids Res., 33, W389W392.

    51. Man,T.K. and Stormo,G.D. (2001) Non-independence of Mntrepressor-operator interaction determined by a new quantitativemultiple uorescence relative afnity (QuMFRA) assay. NucleicAcids Res., 29, 24712478.

    52. Bulyk,M.L., Johnson,P.L. and Church,G.M. (2002) Nucleotidesof transcription factor binding sites exert interdependent effectson the binding afnities of transcription factors. Nucleic AcidsRes., 30, 12551261.

    53. Tomovic,A. and Oakeley,E.J. (2007) Position dependencies intranscription factor binding sites. Bioinformatics, 23, 933941.

    54. Zhou,Q. and Liu,J.S. (2004) Modeling within-motif dependencefor transcription factor binding site predictions. Bioinformatics,20, 909916.

    55. Agius,P., Arvey,A., Chang,W., Noble,W.S. and Leslie,C. (2010)High resolution models of transcription factor-DNA afnitiesimprove in vitro and in vivo binding predictions. PLoS Comput.Biol., 6, pii: e1000916.

    56. Ben-Gal,I., Shani,A., Gohr,A., Grau,J., Arviv,S., Shmilovici,A.,Posch,S. and Grosse,I. (2005) Identication of transcription factorbinding sites with variable-order Bayesian networks.Bioinformatics, 21, 26572666.

    57. Gershenzon,N.I., Stormo,G.D. and Ioshikhes,I.P. (2005)Computational technique for improvement of the position-weightmatrices for the DNA/protein binding sites. Nucleic Acids Res.,33, 22902301.

    58. Zhao,Y., Ruan,S., Pandey,M. and Stormo,G.D. (2012) Improvedmodels for transcription factor binding site identication usingnonindependent interactions. Genetics, 191, 781790.

    59. Weirauch,M.T., Cote,A., Norel,R., Annala,M., Zhao,Y.,Riley,T.R., Saez-Rodriguez,J., Cokelaer,T., Vedenko,A.,Talukder,S. et al. (2013) Evaluation of methods for modelingtranscription factor sequence specicity. Nat. Biotechnol., 31,126134.

    60. Mordelet,F., Horton,J., Hartemink,A.J., Engelhardt,B.E. andGordan,R. (2013) Stability selection for regression-based modelsof transcription factor-DNA binding specicity. Bioinformatics,29, i117i125.

    61. Siddharthan,R. (2010) Dinucleotide weight matrices for predictingtranscription factor binding sites: generalizing the position weightmatrix. PLoS One, 5, e9722.

    62. Mathelier,A. and Wasserman,W.W. (2013) The next generation oftranscription factor binding site prediction. PLoS Comput. Biol.,9, e1003214.

    63. Jiang,J. and Levine,M. (1993) Binding afnities and cooperativeinteractions with bHLH activators delimit threshold responses tothe dorsal gradient morphogen. Cell, 72, 741752.

    64. White,M.A., Parker,D.S., Barolo,S. and Cohen,B.A. (2012) Amodel of spatially restricted transcription in opposing gradients ofactivators and repressors. Mol. Syst Biol., 8, 614.

    65. Scardigli,R., Baumer,N., Gruss,P., Guillemot,F. and Le Roux,I.(2003) Direct and concentration-dependent regulation of theproneural gene Neurogenin2 by Pax6. Development, 130,32693281.

    66. Struhl,G., Struhl,K. and Macdonald,P.M. (1989) The gradientmorphogen bicoid is a concentration-dependent transcriptionalactivator. Cell, 57, 12591273.

    67. Rowan,S., Siggers,T., Lachke,S.A., Yue,Y., Bulyk,M.L. andMaas,R.L. (2010) Precise temporal control of the eye regulatorygene Pax6 via enhancer-binding site afnity. Genes Dev., 24,980985.

    68. Gaudet,J. and Mango,S.E. (2002) Regulation of organogenesis bythe Caenorhabditis elegans FoxA protein PHA-4. Science, 295,821825.

    69. Jaeger,S.A., Chan,E.T., Berger,M.F., Stottmann,R., Hughes,T.R.and Bulyk,M.L. (2010) Conservation and regulatory associationsof a wide afnity range of mouse transcription factor bindingsites. Genomics, 95, 185195.

    70. Tanay,A. (2006) Extensive low-afnity transcriptional interactionsin the yeast genome. Genome Res., 16, 962972.

    71. Segal,E., Raveh-Sadka,T., Schroeder,M., Unnerstall,U. andGaul,U. (2008) Predicting expression patterns from regulatorysequence in Drosophila segmentation. Nature, 451, 535540.

    72. Siggers,T., Chang,A.B., Teixeira,A., Wong,D., Williams,K.J.,Ahmed,B., Ragoussis,J., Udalova,I.A., Smale,S.T. and Bulyk,M.L.(2012) Principles of dimer-specic gene regulation revealed by acomprehensive characterization of NF-kappaB family DNAbinding. Nat. Immunol., 13, 95102.

    73. Zhu,C., Byers,K.J., McCord,R.P., Shi,Z., Berger,M.F.,Newburger,D.E., Saulrieta,K., Smith,Z., Shah,M.V.,Radhakrishnan,M. et al. (2009) High-resolution DNA-bindingspecicity analysis of yeast transcription factors. Genome Res., 19,556566.

    74. Gordan,R., Murphy,K.F., McCord,R.P., Zhu,C., Vedenko,A. andBulyk,M.L. (2011) Curated collection of yeast transcription factorDNA binding specicity data reveals novel structural and generegulatory insights. Genome Biol., 12, R125.

    75. Nakagawa,S., Gisselbrecht,S.S., Rogers,J.M., Hartl,D.L. andBulyk,M.L. (2013) DNA-binding specicity changes in theevolution of forkhead transcription factors. Proc. Natl Acad. Sci.USA, 110, 1234912354.

    76. Bolotin,E., Chellappa,K., Hwang-Verslues,W., Schnabl,J.M.,Yang,C. and Sladek,F.M. (2011) Nuclear receptor HNF4alphabinding sequences are widespread in Alu repeats. BMC Genomics,12, 560.

    77. Chu,S.W., Noyes,M.B., Christensen,R.G., Pierce,B.G., Zhu,L.J.,Weng,Z., Stormo,G.D. and Wolfe,S.A. (2012) Exploring theDNA-recognition potential of homeodomains. Genome Res., 22,18891898.

    78. Noyes,M.B., Christensen,R.G., Wakabayashi,A., Stormo,G.D.,Brodsky,M.H. and Wolfe,S.A. (2008) Analysis of homeodomainspecicities allows the family-wide prediction of preferredrecognition sites. Cell, 133, 12771289.

    79. Fang,B., Mane-Padros,D., Bolotin,E., Jiang,T. and Sladek,F.M.(2012) Identication of a binding motif specic to HNF4 bycomparative analysis of multiple nuclear receptors. Nucleic AcidsRes., 40, 53435356.

    80. Zhou,T., Yang,L., Lu,Y., Dror,I., Dantas Machado,A.C.,Ghane,T., Di Felice,R. and Rohs,R. (2013) DNAshape: a methodfor the high-throughput prediction of DNA structural features ona genomic scale. Nucleic Acids Res., 41, W56W62.

    81. Badis,G., Chan,E.T., van Bakel,H., Pena-Castillo,L., Tillo,D.,Tsui,K., Carlson,C.D., Gossett,A.J., Hasinoff,M.J., Warren,C.L.et al. (2008) A library of yeast transcription factor motifs revealsa widespread function for Rsc3 in targeting nucleosome exclusionat promoters. Mol. Cell, 32, 878887.

    82. Kim,J. and Struhl,K. (1995) Determinants of half-site spacingpreferences that distinguish AP-1 and ATF/CREB bZIP domains.Nucleic Acids Res., 23, 25312537.

    83. Vinson,C., Myakishev,M., Acharya,A., Mir,A.A., Moll,J.R. andBonovich,M. (2002) Classication of human B-ZIP proteins basedon dimerization properties. Mol. Cell. Biol., 22, 63216335.

    84. Kuo,D., Licon,K., Bandyopadhyay,S., Chuang,R., Luo,C.,Catalana,J., Ravasi,T., Tan,K. and Ideker,T. (2010) Coevolutionwithin a transcriptional network by compensatory trans and cismutations. Genome Res., 20, 16721678.

    85. Khorasanizadeh,S. and Rastinejad,F. (2001) Nuclear-receptorinteractions on DNA-response elements. Trends Biochem. Sci., 26,384390.

    86. Kurokawa,R., DiRenzo,J., Boehm,M., Sugarman,J., Gloss,B.,Rosenfeld,M.G., Heyman,R.A. and Glass,C.K. (1994) Regulationof retinoid signalling by receptor polarity and allosteric control ofligand binding. Nature, 371, 528531.

    87. Luscombe,N.M., Austin,S.E., Berman,H.M. and Thornton,J.M.(2000) An overview of the structures of protein-DNA complexes.Genome Biol., 1, REVIEWS001.

    88. Perkins,A.S., Fishel,R., Jenkins,N.A. and Copeland,N.G. (1991)Evi-1, a murine zinc nger proto-oncogene, encodes a sequence-specic DNA-binding protein. Mol. Cell. Biol., 11, 26652674.

    89. Delwel,R., Funabiki,T., Kreider,B.L., Morishita,K. and Ihle,J.N.(1993) Four of the seven zinc ngers of the Evi-1 myeloid-transforming gene are required for sequence-specic binding to

    Nucleic Acids Research, 2014, Vol. 42, No. 4 2109

  • GA(C/T)AAGA(T/C)AAGATAA. Mol. Cell. Biol., 13,42914300.

    90. Funabiki,T., Kreider,B.L. and Ihle,J.N. (1994) The carboxyldomain of zinc ngers of the Evi-1 myeloid transforming genebinds a consensus sequence of GAAGATGAG. Oncogene, 9,15751581.

    91. Klemm,J.D., Rould,M.A., Aurora,R., Herr,W. and Pabo,C.O.(1994) Crystal structure of the Oct-1 POU domain bound to anoctamer site: DNA recognition with tethered DNA-bindingmodules. Cell, 77, 2132.

    92. Verrijzer,C.P., Alkema,M.J., van Weperen,W.W., VanLeeuwen,H.C., Strating,M.J. and van der Vliet,P.C. (1992) TheDNA binding specicity of the bipartite POU domain and itssubdomains. EMBO J., 11, 49935003.

    93. Kemler,I., Schreiber,E., Muller,M.M., Matthias,P. andSchaffner,W. (1989) Octamer transcription factors bind to twodifferent sequence motifs of the immunoglobulin heavy chainpromoter. EMBO J., 8, 20012008.

    94. Kersten,S., Gronemeyer,H. and Noy,N. (1997) The DNA bindingpattern of the retinoid X receptor is regulated by ligand-dependent modulation of its oligomeric state. J. Biol. Chem., 272,1277112777.

    95. Jolma,A., Yan,J., Whitington,T., Toivonen,J., Nitta,K.R.,Rastas,P., Morgunova,E., Enge,M., Taipale,M., Wei,G. et al.(2013) DNA-binding specicities of human transcription factors.Cell, 152, 327339.

    96. Kim,J.B., Spotts,G.D., Halvorsen,Y.D., Shih,H.M.,Ellenberger,T., Towle,H.C. and Spiegelman,B.M. (1995) DualDNA binding specicity of ADD1/SREBP1 controlled by asingle amino acid in the basic helix-loop-helix domain. Mol.Cell. Biol., 15, 25822588.

    97. Parraga,A., Bellsolell,L., Ferre-DAmare,A.R. and Burley,S.K.(1998) Co-crystal structure of sterol regulatory element bindingprotein 1a at 2.3 A resolution. Structure, 6, 661672.

    98. Fordyce,P.M., Pincus,D., Kimmig,P., Nelson,C.S., El-Samad,H.,Walter,P. and DeRisi,J.L. (2012) Basic leucine zippertranscription factor Hac1 binds DNA in two distinct modes asrevealed by microuidic analyses. Proc. Natl Acad. Sci. USA,109, E3084E3093.

    99. Pique-Regi,R., Degner,J.F., Pai,A.A., Gaffney,D.J., Gilad,Y. andPritchard,J.K. (2011) Accurate inference of transcription factorbinding from DNA sequence and chromatin accessibility data.Genome Res., 21, 447455.

    100. Bulyk,M.L. (2003) Computational prediction of transcription-factor binding site locations. Genome Biol., 5, 201.

    101. Levine,M. and Tjian,R. (2003) Transcription regulation andanimal diversity. Nature, 424, 147151.

    102. Johnson,A.D. (1995) Molecular mechanisms of cell-typedetermination in budding yeast. Curr. Opin Genet. Dev., 5,552558.

    103. Wolberger,C. (1999) Multiprotein-DNA complexes intranscriptional regulation. Annu. Rev. Biophys. Biomol. Struct.,28, 2956.

    104. Johnson,A.D. (1992) In: McKnight,S.L. and Yamamoto,K.R.(eds), Transcriptional Regulation. Cold Spring Harbor Lab.Press, Cold Spring Harbor, NY, pp. 9751006.

    105. Kim,S., Brostromer,E., Xing,D., Jin,J., Chong,S., Ge,H.,Wang,S., Gu,C., Yang,L., Gao,Y.Q. et al. (2013) Probingallostery through DNA. Science, 339, 816819.

    106. Garvie,C.W., Hagman,J. and Wolberger,C. (2001) Structuralstudies of Ets-1/Pax5 complex formation on DNA. Mol. Cell, 8,12671276.

    107. Joshi,R., Passner,J.M., Rohs,R., Jain,R., Sosinsky,A.,Crickmore,M.A., Jacob,V., Aggarwal,A.K., Honig,B. andMann,R.S. (2007) Functional specicity of a Hox proteinmediated by the recognition of minor groove structure. Cell,131, 530543.

    108. Ryoo,H.D. and Mann,R.S. (1999) The control of trunk Hoxspecicity and activity by Extradenticle. Genes Dev., 13,17041716.

    109. Carey,M. (1998) The enhanceosome and transcriptional synergy.Cell, 92, 58.

    110. Ptashne,M. and Gann,A. (1997) Transcriptional activation byrecruitment. Nature, 386, 569577.

    111. Merika,M., Williams,A.J., Chen,G., Collins,T. and Thanos,D.(1998) Recruitment of CBP/p300 by the IFN beta enhanceosomeis required for synergistic activation of transcription. Mol. Cell,1, 277287.

    112. Kim,T.K. and Maniatis,T. (1997) The mechanism oftranscriptional synergy of an in vitro assembled interferon-betaenhanceosome. Mol. Cell, 1, 119129.

    113. Masternak,K., Muhlethaler-Mottet,A., Villard,J., Zufferey,M.,Steimle,V. and Reith,W. (2000) CIITA is a transcriptionalcoactivator that is recruited to MHC class II promoters bymultiple synergistic interactions with an enhanceosome complex.Genes Dev., 14, 11561166.

    114. del Blanco,B., Garcia-Mariscal,A., Wiest,D.L. and Hernandez-Munain,C. (2012) Tcra enhancer activation by inducibletranscription factors downstream of pre-TCR signaling.J. Immunol., 188, 32783293.

    115. Benoist,C. and Mathis,D. (1990) Regulation of majorhistocompatibility complex class-II genes: X, Y and other lettersof the alphabet. Annu. Rev. Immunol., 8, 681715.

    116. Hall,J.M., McDonnell,D.P. and Korach,K.S. (2002) Allostericregulation of estrogen receptor structure, function, andcoactivator recruitment by different estrogen response elements.Mol. Endocrinol., 16, 469486.

    117. Leung,T.H., Hoffmann,A. and Baltimore,D. (2004) Onenucleotide in a kappaB site can determine cofactor specicity forNF-kappaB dimers. Cell, 118, 453464.

    118. Meijsing,S.H., Pufall,M.A., So,A.Y., Bates,D.L., Chen,L. andYamamoto,K.R. (2009) DNA binding site sequence directsglucocorticoid receptor structure and activity. Science, 324,407410.

    119. Mrinal,N., Tomar,A. and Nagaraju,J. (2011) Role of sequenceencoded kappaB DNA geometry in gene regulation by Dorsal.Nucleic Acids Res., 39, 95749591.

    120. Wang,V.Y., Huang,W., Asagiri,M., Spann,N., Hoffmann,A.,Glass,C. and Ghosh,G. (2012) The transcriptional specicity ofNF-kappaB dimers is coded within the kappaB DNA responseelements. Cell Reports, 2, 824839.

    121. Remenyi,A., Scholer,H.R. and Wilmanns,M. (2004)Combinatorial control of gene expression. Nat. Struct. Mol.Biol., 11, 812815.

    122. Chen,Y.Q., Ghosh,S. and Ghosh,G. (1998) A novel DNArecognition mode by the NF-kappa B p65 homodimer. Nat.Struct. Biol., 5, 6773.

    123. Fujita,T., Nolan,G.P., Ghosh,S. and Baltimore,D. (1992)Independent modes of transcriptional activation by the p50 andp65 subunits of NF-kappa B. Genes Dev., 6, 775787.

    124. Scully,K.M., Jacobson,E.M., Jepsen,K., Lunyak,V., Viadiu,H.,Carriere,C., Rose,D.W., Hooshmand,F., Aggarwal,A.K. andRosenfeld,M.G. (2000) Allosteric effects of Pit-1 DNA sites onlong-term repression in cell type specication. Science, 290,11271131.

    125. Remenyi,A., Tomilin,A., Pohl,E., Lins,K., Philippsen,A.,Reinbold,R., Scholer,H.R. and Wilmanns,M. (2001) Differentialdimer activities of the transcription factor Oct-1 by DNA-induced interface swapping. Mol. Cell, 8, 569580.

    126. Chasman,D., Cepek,K., Sharp,P.A. and Pabo,C.O. (1999)Crystal structure of an OCA-B peptide bound to an Oct-1 POUdomain/octamer DNA complex: specic recognition of a protein-DNA interface. Genes Dev., 13, 26502657.

    127. Ingraham,H.A., Chen,R.P., Mangalam,H.J., Elsholtz,H.P.,Flynn,S.E., Lin,C.R., Simmons,D.M., Swanson,L. andRosenfeld,M.G. (1988) A tissue-specic transcription factorcontaining a homeodomain species a pituitary phenotype. Cell,55, 519529.

    128. Ingraham,H.A., Flynn,S.E., Voss,J.W., Albert,V.R.,Kapiloff,M.S., Wilson,L. and Rosenfeld,M.G. (1990) The POU-specic domain of Pit-1 is essential for sequence-specic, highafnity DNA binding and DNA-dependent Pit-1-Pit-1interactions. Cell, 61, 10211033.

    129. Babb,R., Huang,C.C., Auero,D.J. and Herr,W. (2001) DNArecognition by the herpes simplex virus transactivator VP16: anovel DNA-binding structure. Mol. Cell Biol., 21, 47004712.

    130. Kuras,L., Cherest,H., Surdin-Kerjan,Y. and Thomas,D. (1996) Aheteromeric complex containing the centromere binding factor 1

    2110 Nucleic Acids Research, 2014, Vol. 42, No. 4

  • and two basic leucine zipper factors, Met4 and Met28, mediatesthe transcription activation of yeast sulfur metabolism. EMBOJ., 15, 25192529.

    131. Blaiseau,P.L. and Thomas,D. (1998) Multiple transcriptionalactivation complexes tether the yeast activator Met4 to DNA.EMBO J., 17, 63276336.

    132. Wong,D., Teixeira,A., Oikonomopoulos,S., Humburg,P., Lone,I.N.,Saliba,D., Siggers,T., Bulyk,M., Angelov,D., Dimitrov,S. et al.(2011) Extensive characterization of NF-KappaB binding uncoversnon-canonical motifs and advances the interpretation of geneticfunctional traits. Genome Biol., 12, R70.

    133. Ferraris,L., Stewart,A.P., Kang,J., DeSimone,A.M.,Gemberling,M., Tantin,D. and Fairbrother,W.G. (2011)Combinatorial binding of transcription factors in thepluripotency control regions of the genome. Genome Res., 21,10551064.

    134. Bolotin,E., Liao,H., Ta,T.C., Yang,C., Hwang-Verslues,W.,Evans,J.R., Jiang,T. and Sladek,F.M. (2010) Integrated approachfor the identication of human hepatocyte nuclear factor 4alphatarget genes using protein binding microarrays. Hepatology, 51,642653.

    Nucleic Acids Research, 2014, Vol. 42, No. 4 2111