Structural basis for the complex DNA binding behavior of the plant stem cell regulator WUSCHEL
Nature Communications volume 11, Article number: 2223 (2020)
Stem cells are one of the foundational evolutionary novelties that allowed the independent emergence of multicellularity in the plant and animal lineages. In plants, the homeodomain (HD) transcription factor WUSCHEL (WUS) is essential for the maintenance of stem cells in the shoot apical meristem. WUS has been reported to bind to diverse DNA motifs and to act as transcriptional activator and repressor. However, the mechanisms underlying this remarkable behavior have remained unclear. Here, we quantitatively delineate WUS binding to three divergent DNA motifs and resolve the relevant structural underpinnings. We show that WUS exhibits a strong binding preference for TGAA repeat sequences, while retaining the ability to weakly bind to TAAT elements. This behavior is attributable to the formation of dimers through interactions of specific residues in the HD that stabilize WUS DNA interaction. Our results provide a mechanistic basis for dissecting WUS dependent regulatory networks in plant stem cell control.
Plant stem cells are embedded into specialized tissues that promote their life-long maintenance, which are called meristems. These meristems are located at the growth points of all plants, namely the shoot and root tips, as well as the vascular cylinder, to support apical-basal and lateral growth, respectively 1 . Similar to animal stem cell systems, signals controlling stem cell identity and activity within the meristem emanate from niche cells located adjacently to stem cells 2 , 3 . However, the cellular and molecular mechanisms of this communication are highly divergent between the two kingdoms. While cell–cell contact and secreted ligands play a central role in animals, direct cytoplasmic connections between neighboring cells, called plasmodesmata, take center stage for the maintenance of plant stem cells 4 . Interestingly, the related homeodomain (HD) transcription factors (TFs) that define the niche cells in shoot and root, namely WUSCHEL (WUS) and WUSCHEL HOMEOBOX 5 (WOX5), move to stem cells and execute their function primarily in these cells 4 , 5 , 6 . Consequently, there is no need for downstream niche to stem cell signaling cascades, since the DNA binding specificities of these TFs will directly dictate the repertoire of genes expressed in stem cells.
Prokaryotes and eukaryotes use different strategies to target TFs to distinct genomic locations. In bacteria, it seems to be sufficient that TFs recognize an extended DNA sequence, whereas TFs in eukaryotes typically bind to shorter DNA recognition motifs and therefore require clustering of sites to achieve specificity 7 . Numerous gene regulatory proteins of higher eukaryotes, such as leucine zipper and zinc finger TFs, bind as symmetric dimers to DNA in a sequence-specific manner, which allows each monomer to bind in a similar fashion and greatly increases the DNA binding affinity 8 , 9 . Consistently, the DNA recognition sequences that are bound by these TFs are often arranged as inverted or everted repeat elements. Many TFs, however, can also associate with nonidentical proteins to form heterodimers composed of two different subunits. As heterodimers are typically composed of two distinct proteins with different DNA-binding specificities, the combination of multiple TFs immensely expands the repertoire of recognized DNA sequences and greatly improves the binding specificity 10 .
The eukaryotic superfamily of homeobox TFs is characterized by the presence of an HD, a short stretch of amino acids (60–66 residues) that forms a helix–loop–helix–turn–helix DNA-binding domain consisting of three alpha helices 11 . HD-TFs play a wide variety of roles in developmental and growth processes such as embryonic patterning, stem cell maintenance, and organ formation in all kingdoms of life 11 , 12 , 13 . In animals, the HOX TFs are the best studied family of HD proteins and specify segment identity during embryo development along the head–tail axis 14 , 15 .
Several studies have addressed WUS DNA binding in some detail and at least three divergent sequence motifs bound by WUS, specifically sequences with a TAAT core, a G-Box like and a TGAA repeat element, have been identified 6 , 16 , 17 , 18 , 19 , 20 , 21 . The TAAT sequence was originally identified since it represents the canonical binding element for HD proteins and was subsequently experimentally confirmed to be bound by WUS by electrophoretic mobility shift assays (EMSAs) and reporter genes for multiple independent targets 6 , 16 , 17 , 20 . More recently, dimerization of WUS on TAAT repeats was suggested to control expression of the stem cell specific signaling factor CLAVATA3 (CLV3) based on EMSA and reporter gene assays 21 . The G-Box like (TCACGTGA) motif was found in a combination of systematic evolution of ligands by exponential enrichment (SELEX) and in vivo WUS chromatin binding data derived from chromatin immunoprecipitation followed by detection by microarrays (ChIP-chip) 18 . SELEX using a recombinant WUS-HD fragment resulted in the enrichment of TCA containing sequences and the G-Box like element, which represented an inverted repeat of TCA bases, was found to be the most overrepresented DNA sequence in chromatin regions bound by WUS 18 . Binding of WUS to this motif was confirmed by EMSA and reporter gene analysis. Lastly, the TGAA repeat motif was identified in a large-scale approach using recombinantly expressed TFs and genomic DNA in a highly parallel protein–DNA interaction screen, but has not been verified independently so far 19 . In addition to binding to multiple DNA motifs, WUS also exhibits further functional complexity by acting as transcriptional activator and repressor 17 , 18 , 21 , 22 and the mechanistic basis for both unusual behaviors has remained largely elusive so far.
Here, we have combined molecular, biochemical, and structural approaches to address how the WUS-HD recognizes specific DNA target sites. We find that the DNA-binding preferences of WUS-HD depend on appropriately arranged sequence motifs in a direct tandem repeat and that homodimerization is one of the key determinants to achieve high sequence specificity. We also show that disrupting the dimer interface, either on the protein level or the DNA level, severely reduces DNA-binding affinity.
WUS has a canonical HD fold with unique structural features
As an entry point to elucidate the mechanisms by which the WUS-HD carries out its functions, we recombinantly produced and purified a fragment containing residues 34–103 of WUS (Fig. 1a , Supplementary Fig. 1 ) and determined its crystal structure to a resolution of 1.4 Å (Fig. 1b , Table 1 ). The overall WUS-HD fold reflected that of a canonical HD structure, consisting of a three-helix bundle with an N-terminal arm. Interestingly, superposition with the structure of Engrailed (En) (PDB code 3HDD 23 ) and comparative sequence analyses identified unique structural features in WUS-HD. First, the loop regions connecting the three α-helices are expanded (Fig. 1b, c ). Loop region I is slightly longer compared to other HDs and is characterized by a distortion at the end of helix α1. This so called π-helix or π-bulge is typically characterized by a single amino acid insertion into an existing α-helix (Y54 in WUS) and usually correlates with a particular functional role 24 , 25 . Loop region II has an even longer insertion, which also extends helix α2 by an additional turn compared to En. Secondly, the N-terminal arm is anchored by docking of a tryptophan residue (W39) into a groove formed by helices α2 and α3, while a F, Y, or I residue typically performs this function in canonical HDs (Fig. 1c, d ). Furthermore, this docking residue is shifted by one residue relative to the conserved DNA-contacting arginine (R38) in contrast to three residues in canonical HDs (Fig. 1c ). Electrostatic surface calculations showed a large positively charged surface formed by the C-terminal recognition helix (α3) and the N-terminal arm (Fig. 1e ), which represent the most conserved part of the HD (Fig. 1f ). Finally, comparison with the sequence and structure of En suggested that the readout of DNA bases is mediated by the conserved residues R38, N90, and R94 of WUS-HD (Fig. 1b, c, f ).
Fig. 1: Structure and conservation of the A. thaliana WUS-HD.
a Domain organization of WUS from A. thaliana. WUS contains an N-terminal homeodomain (HD; light blue) and two short linear motifs at its C-terminus, namely the WUS-box (green) and the EAR motif (red), respectively. The domain boundaries are given in residue numbers. b Superimposition of WUS-HD structure (light blue) with HD from En (orange, PDB 3HDD 23 ), illustrating the characteristic three-helix bundle fold of HDs. Encircled are loop regions I and II of WUS, which highlight structural differences compared to classical HDs. c Multiple sequence alignment of representative HDs from plants and animals. The sequences of Arabidopsis thaliana (At), Drosophila melanogaster (Dm) and Homo sapiens (Hs) were aligned using Clustal Omega and visualized with ESPRIPT. Numbering and secondary structure assignment is according to A. thaliana WUS. Loop regions I and II are depicted below the sequences by black lines and the anchoring residue of the N-terminal arm is illustrated by yellow boxes. Highly conserved residues are highlighted (red boxes) and residues making direct DNA-base contact (blue asterisk) and residues involved in the dimer interface (blue up-pointing triangle) are indicated. d Comparison of different anchoring mechanisms of the N-terminal arm for En (orange, PDB 3HDD 23 ) and WUS (light blue). Surface representation of the HDs (gray) highlighting the hydrophobic pocket formed by helices α1 and α2. Amino acids responsible for fixing the N-terminal arm are shown as sticks. e Electrostatic surface potential (red: negative, blue: positive, contoured at ±5kBT) of WUS-HD. f ConSurf analysis showing the degree of amino acid conservation (magenta: conserved, cyan: variable) mapped on to the surface of WUS-HD. Highly conserved amino acids typically involved in DNA base interactions are indicated.
WUS prefers tandemly arranged DNA recognition sequences
WUS has been reported to bind to at least three divergent DNA sequences and these motifs have been proposed to be important to determine the transcriptional output of WUS 16 , 18 , 19 . Despite the obvious importance of this issue for resolving the mechanisms of WUS activity in vivo, quantitative data comparing the binding affinities to these motifs were still lacking. In order to elucidate the DNA binding preferences of WUS-HD, we therefore analyzed its interaction with three of the best studied sequences (Fig. 2a, b ), namely a TAAT element from the AG enhancer 16 , a G-Box from the CLV1 promoter 18 and a TGAA repeat element identified in a large in vitro screen 19 . Both TGAA and G-Box harbor two atypical HD recognition motifs 26 , 27 that are arranged as a direct repeat and inverted repeat, respectively (Supplementary Fig. 2a, b ), while the TAAT DNA only contains one typical HD recognition motif (Fig. 2b ).
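The distinction between direct and inverted repeats drawn above can be checked mechanically. The following is a minimal Python sketch, not part of the original analysis; the function names are hypothetical, and the sequences are the core elements quoted in the text (TGAATGAA and TCACGTGA). An inverted repeat is taken here to be a sequence identical to its own reverse complement, and a direct repeat to be a tandem copy of one 4-bp unit.

```python
# Illustrative helpers (hypothetical names), assuming uppercase A/C/G/T input.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def is_inverted_repeat(seq):
    """True if the sequence reads the same on both strands (palindromic)."""
    return seq == reverse_complement(seq)


def is_direct_repeat(seq, unit_len=4):
    """True if the sequence is two or more tandem copies of its first unit."""
    if len(seq) < 2 * unit_len or len(seq) % unit_len != 0:
        return False
    unit = seq[:unit_len]
    return seq == unit * (len(seq) // unit_len)


for name, core in [("TGAA repeat", "TGAATGAA"), ("G-Box", "TCACGTGA")]:
    print(name, "direct:", is_direct_repeat(core),
          "inverted:", is_inverted_repeat(core))
```

Under these definitions the TGAA element is classified as a direct repeat and the G-Box as an inverted repeat, matching the description of the two probes.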
Fig. 2: Characterization of WUS-HD DNA-binding behavior in vitro, in vivo, and by crystallography.
a DNA-binding affinity of YFP-WUS determined by MST for TGAA, G-Box, and TAAT DNA. All measurements were performed in triplicate and the respective dissociation constant (Kd) is indicated. A control MST reaction was performed with YFP alone in the presence of TGAA DNA. Data are means ± SEM (error bars), n = 3. Source data are provided as a Source Data file. b Comparison of the fraction bound for WUS-HD binding to different sequences of a 16-bp DNA fragment. The respective DNA recognition motifs are highlighted for TGAA (red), G-Box (green), and TAAT (blue). c Analysis of WUS chromatin binding in vivo by ChIP-seq. Binding probabilities of WUS to TGAATGAA (red), TCACGTGA (G-Box, green), and TTAATGG (blue) containing chromatin regions. Curves shifted to the right indicate a higher probability of a given sequence element to be associated with chromatin of high WUS occupancy and hence high affinity binding. d Overall structures of WUS–DNA complexes showing the mode of binding of two WUS-HD molecules per DNA for TGAA (top) and three WUS-HD molecules per DNA for G-Box (center) and TAAT (bottom). WUS-HDs are in teal, dark, and light blue and DNA-strands are in gray and green. e Schematic representation of DNA contact sites of WUS-HD for TGAA (top), G-Box (center), and TAAT (bottom). Ovals indicate amino acids that mainly contact a DNA base or a sugar-phosphate backbone moiety, and ovals with arrowheads specify amino acids that make multiple contacts with DNA bases and/or the sugar-phosphate backbone. Numbering of DNA bases is arbitrary starting from position 1 at each 5′-end.
We employed microscale thermophoresis (MST) with an N-terminal YFP fusion of WUS-HD and 16-bp double stranded DNA probes corresponding to naturally occurring regulatory sequences containing either the TAAT, G-Box, or TGAA repeat motif. In line with earlier results 18 and in agreement with the low affinity generally reported for other HDs 11 , the WUS-HD bound the TAAT probe with lower affinity compared to the G-Box containing probe with dissociation constants (Kd) of 10.60 ± 1.67 µM and 3.78 ± 0.42 µM, respectively; ± indicates standard deviation, n = 3 (Fig. 2a, b ).
Intriguingly, the TGAA repeat probe was bound by WUS-HD with much higher affinity (Kd = 0.27 ± 0.03 µM; ± indicates standard deviation, n = 3) than the other two sequences (Fig. 2a, b ). As the 4-bp recognition motifs of the TGAA sequence are not significantly different from the G-Box sequence (Supplementary Fig. 2b ), we hypothesized that the relative position of recognition motifs may be a major determinant of binding specificity. To rule out any contribution of the YFP-tag to DNA-binding specificity, a control measurement with YFP alone was performed, which showed no binding to DNA (Fig. 2a ).
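To give a feel for what these dissociation constants imply, the following minimal Python sketch evaluates a simple 1:1 binding isotherm at a fixed protein concentration. This is an illustration only: the isotherm assumes the labeled species is present well below the Kd and ignores the multi-site binding discussed later, and the 1 µM titrant concentration is an arbitrary choice, not a condition from the experiments. The Kd values are the MST values reported in the text.

```python
# Reported MST dissociation constants for WUS-HD (µM), taken from the text.
KD_UM = {"TGAA": 0.27, "G-Box": 3.78, "TAAT": 10.60}


def fraction_bound(titrant_um, kd_um):
    """Simple 1:1 isotherm: fraction bound = [X] / (Kd + [X]).

    Assumes the binding partner is in large excess over the labeled
    species; this is a simplification, not the fit used in the study.
    """
    return titrant_um / (kd_um + titrant_um)


TITRANT_UM = 1.0  # arbitrary illustrative concentration (assumption)
occupancy = {name: fraction_bound(TITRANT_UM, kd) for name, kd in KD_UM.items()}

for name, value in occupancy.items():
    print(f"{name}: {value:.2f} bound at {TITRANT_UM} µM")
```

At this illustrative concentration the TGAA probe would be mostly bound while the TAAT probe would be mostly free, which is the qualitative picture conveyed by the fraction-bound comparison in Fig. 2b.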
In order to compare the observed DNA-binding of WUS-HD to that of the full length (FL) protein, we expressed a fusion of WUS-FL to maltose-binding protein (MBP), and performed EMSA experiments (Supplementary Fig. 3 ). In accordance with our MST results, WUS-FL also exhibited divergent DNA-binding behavior, comparable to WUS-HD, when probed with TAAT, G-Box, and TGAA fluorescently labeled oligonucleotides. Consistent with the observations from the MST analysis, the TGAA repeat sequence had the highest binding affinity (Kd = 0.36 ± 0.06 µM; ± indicates standard deviation, n = 3) of the three probes (Supplementary Fig. 3a ). In addition, WUS-FL bound the G-Box probe with higher affinity compared to the TAAT containing probe (Kd = 1.68 ± 0.30 µM and Kd = 3.15 ± 0.26 µM, respectively; ± indicates standard deviation, n = 3) (Supplementary Fig. 3b, c ).
Overall, the Kd values from MST and EMSA experiments were in good agreement and deviations were mainly within measurement errors. Earlier studies had shown that WUS has the ability to homodimerize via protein domains outside the HD, which was suggested to be critical for WUS function 18 , 21 . However, our results showed that DNA-binding preference of WUS is dictated by the WUS-HD alone. To test whether these results reflect WUS chromatin binding behavior in living plant cells, we analyzed WUS ChIP-seq data 28 using read counts associated with the three DNA-binding motifs as a proxy for affinity (Fig. 2c ). Specifically, we analyzed the probability of TAAT, G-Box, or 2xTGAA repeat motifs to be present in chromatin regions strongly bound by WUS and therefore being covered by a large number of ChIP-seq reads. Since the motifs occur in the genome at vastly divergent numbers, we converted read counts into relative binding probabilities. To this end, we plotted the relative occurrence of an individual sequence in all WUS binding peaks against the number of ChIP-seq reads in a ±25 bp window around the motif. In such an analysis, very steep curves in the left part of the coordinate system indicate motifs that occur most frequently in peak regions with low ChIP-seq coverage, whereas curves that are shifted to the right indicate an association of the motif with peaks of higher ChIP-seq reads and hence are suggestive of higher affinity (Supplementary Data 1 ). Our analyses showed that native WUS was indeed associated with 2xTGAA repeat sequences more often than with G-Box containing genomic regions, which was followed by TTAATGG sites. Taken together, our results demonstrated that WUS strongly prefers the TGAA repeat sequence over the G-Box motif and the canonical TAAT element, both in vitro and in vivo.
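One plausible way to express such a normalization is an empirical cumulative distribution: for each motif, the fraction of its occurrences in peaks whose local read count is at or below a given threshold. The Python sketch below illustrates this reading; it is an assumption about the form of the curves, not the authors' pipeline, and the read counts and motif labels are made-up toy values chosen only to show the behavior.

```python
from bisect import bisect_right


def cumulative_fraction(read_counts, thresholds):
    """Fraction of motif occurrences with read count <= each threshold.

    Dividing by the number of occurrences makes motifs with very different
    genomic frequencies comparable on the same axis.
    """
    ordered = sorted(read_counts)
    total = len(ordered)
    return [bisect_right(ordered, t) / total for t in thresholds]


# Toy read counts in a window around each motif occurrence (invented data).
toy_reads = {
    "motif_A": [5, 8, 10, 12, 40],      # mostly in weakly covered peaks
    "motif_B": [30, 45, 60, 80, 120],   # mostly in strongly covered peaks
}
thresholds = [10, 50, 100]

curves = {name: cumulative_fraction(counts, thresholds)
          for name, counts in toy_reads.items()}
for name, curve in curves.items():
    print(name, curve)
```

In this toy example the curve for motif_A rises steeply at low read counts, whereas the curve for motif_B is shifted to the right, which is the pattern the text associates with higher-affinity motifs.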
WUS-HD uses a general binding mode for different DNA sequences
To elucidate the structural basis for these differential interactions, we solved crystal structures of WUS-HD bound to TAAT, G-Box and TGAA repeat probes to resolutions of 2.8, 2.7 and 1.6 Å, respectively (Fig. 2d ). In all crystal structures of WUS-HD/DNA complexes, the unit cell contained two DNA molecules which were occupied by at least two WUS-HDs (Supplementary Fig. 4 ). The structure of the HD fold of each WUS molecule was not modified by the formation of the ternary complex, with an overall root mean square deviation (rmsd) before and after DNA-binding of 0.9 Å over 62 residues, although the length of the N- or C-termini vary in a context dependent manner. Interestingly, in the case of G-Box and TAAT, one of the two protein–DNA complexes in the asymmetric unit contained an additional bound HD, whereas both TGAA structures only included two HDs per DNA (Fig. 2d , Supplementary Fig. 4 ). In both cases, this additional molecule inserted its C-terminal recognition helix into a major groove of the DNA on the complementary strand of one of the prevalent recognition motifs, but did not make contact to the other two protein molecules. In the G-Box structure, this extra HD (Fig. 2d teal) was stabilized by crystal contacts from the neighboring complex and thus likely represents a crystallization artifact (Supplementary Fig. 4c ). Furthermore, initial low resolution (>3 Å) crystal structures of the G-Box complex only ever included two HDs per DNA molecule, similar to the structure seen in Supplementary Fig. 5a .
The binding behavior in the TAAT structure was more complex. Whilst the additional HD in the TAAT complex (Fig. 2d, e light blue) bound on the opposite side of the TAAT recognition motif, one of the other HDs (Fig. 2d, e teal) contacted an unexpected DNA sequence, with less clear DNA-base interactions and an overall higher flexibility, as indicated by elevated B-factors (Supplementary Fig. 6 ). In order to understand the significance of these protein–DNA contacts and to delineate the critical binding regions of the TAAT sequence we performed EMSA experiments, which clearly demonstrated a change in the DNA-binding behavior of WUS-HD to the TAAT sequence probes (Supplementary Fig. 7 ). DNA binding to the T4C probe was largely impaired and no distinct band shifts were visible, indicating that this DNA position is important to form a stable protein–DNA complex (Supplementary Fig. 7a ). In contrast, the T12C probe gave a similar band shift as the wild-type (wt) TAAT sequence, suggesting no or little interference with WUS-HD binding. The band shift for the double mutant T12C, T15C was only slightly modified, suggesting that this DNA position either plays only a minor role in TAAT DNA-binding or the detection of the WUS-HD at this sequence represents an artifact of crystal packing (Supplementary Fig. 7b ).
Collectively, the structural analysis and the EMSA results suggested that two HDs, which bind on opposite sides of the TAAT recognition motif (light and dark blue in Fig. 2d, e ), are crucial for an efficient interaction with the TAAT DNA. An additional HD observed in the TAAT crystal structure (teal in Fig. 2d, e ) seems to be less important for DNA-binding in solution, consistent with a less defined structure and more ambiguous contact sites.
All WUS-HDs were bound to the expected 4-bp recognition motifs, except in the TAAT structure, where one molecule occupied a sequence distinct from the TAAT motif (Fig. 2e ). Despite this, comparison of the ternary complex structures revealed a very similar mode of DNA-binding for each HD; the N-terminal arm spanned the DNA minor groove, whereas the C-terminal recognition helix inserted into the major groove (Supplementary Fig. 8 ). Helix α3 made extensive backbone contacts, whilst both N- and C-terminal regions were engaged in establishing base-specific contacts. In almost all WUS-HD molecules, the N-terminal arm inserted R38 into the minor groove to hydrogen bond with base pairs and typically specified a thymine at the −2 position 29 (Fig. 2e , Supplementary Fig. 8 ). The hydrogen bond donor–acceptor pattern was specific to neither the sense nor the antisense strand; however, the readout by R38 was mediated by base pair recognition and may also have been dependent on DNA shape 30 . The majority of DNA contacts were established by major groove interactions (Fig. 2e , Supplementary Fig. 5b ), involving extensive backbone contacts as well as base pair recognition.
The readout of bases in the major groove was mediated by the conserved residues Q89, N90, and R94 (Figs. 1 f, 2e , and Supplementary Fig. 5b ). N90, which specified the adenine at position 0 that is crucial for HD binding, appeared to be the most relevant 23 , 31 . R94 favored a guanine at position −1 from the adenine (Supplementary Fig. 2b ), in agreement with the specificity of atypical HDs 26 , 27 . Interestingly, in the TAAT structure position −1 was an adenine, similar to the DNA recognition motif of typical HDs; thus, in this complex R94 was not involved in base recognition and instead contacted the sugar phosphate backbone (Fig. 2e , Supplementary Fig. 8 ). The role of Q89 was less clearly defined by the structures; in some cases, it did not interact directly with DNA and in others it contacted a base at position +2 or +3 on either side of the double strand, consistent with the idea that the conserved Q89 promotes the recognition of bases at these positions 32 , 33 .
Residues K82, N83, and Y86 formed a cluster which bound consecutive phosphate groups of the DNA backbone. K92, R96, and R100 were also involved in backbone contacts, although to a lesser extent as these interactions were not present in all structures and thus presumably were dependent on protein–protein interactions (Fig. 2e , Supplementary Fig. 5b ). Importantly, all side-chains involved in the readout of base pairs were among the most highly conserved residues of the WUS-HD (Figs. 1 c and 2e ). In addition, the observed protein–DNA contacts were consistent with results obtained with other HDs, where specific interactions are established with a 4–7 bp DNA binding site 26 , 27 . Most other conserved amino acids without indicated functions appeared to have structural roles in maintaining the overall HD fold (Fig. 1c ).
WUS-HD prefers the atypical TGAA over the typical TAAT motif
Having identified the DNA recognition preferences of WUS, we compared the binding mode with typical and atypical HDs from metazoans with similar interaction motifs (Fig. 3 ). The “typical” Antennapedia (Antp) HD binds to a core TAAT motif (PDB code 4XID 34 ) as found in our AG derived TAAT probe. The residues involved in establishing base-specific contacts are conserved, however there are notable differences in the interactions formed by Antp-HD and WUS-HD (Fig. 3b ). Commonly, arginine (or lysine) as residue R2 or R3 enables specific read-out of the adenine in the −1 position 29 , 35 , the hallmark of the typical recognition motif. In contrast, the equivalent residues in WUS-HD (T35 and S36) did not form this base-specific contact. However, the N-terminal arm of WUS-HD still contacted adenine −1 via R38, equivalent to the highly conserved R5 that conventionally reads out the −2 position only (Fig. 3a, b ).
Fig. 3: Comparison of sequence specificity with typical and atypical HDs.
a Schematic representation of DNA sequence specificity for typical and atypical HDs. The DNA base-recognition details are depicted for a typical HD in orange (top) and an atypical HD in red (bottom). Note the specific read-out of the adenine at position 0 by N51, characteristic for HD proteins. The numbering of residues is according to the Antp-HD 34 . b DNA base-recognition details of WUS compared to other HDs. Top, showing hydrogen bond interactions of Antp (orange, PDB 4XID 34 ) and WUS with the same typical core DNA recognition motif (yellow). Bottom, showing hydrogen bond interactions of Exd (red, PDB 2R5Y 30 ) and WUS with the same atypical core DNA recognition motif. Diagrams below each cartoon representation summarize DNA-base contacts made by each HD.
The binding of the “atypical” Extradenticle (Exd) HD (PDB code 2R5Y 30 ) to a core TGAT motif, although very similar in sequence, also shows some differences in the hydrogen bond pattern compared to WUS-HD bound to the G-Box probe (Fig. 3b ). Notably, the conserved R38 of the N-terminal arm in our structure neither contacted the −1 nor the −2 position of the sense-strand. Instead, R38 bound to positions −2 and −3 of the antisense-strand, which highlighted the broader specificity of HDs at position −1 and −2 of the core recognition motif 26 , 27 . However, the guanine at position −1 was specified by R94 in the C-terminal recognition helix, following the usual mechanism for HDs contacting atypical recognition motifs (Fig. 3 ). Taken together, our crystal structures indicated that WUS-HD is able to establish the canonical base-specific contact with guanine in position −1 but not adenine. As the nucleotide base in this position is the main determinant in preferential recognition of typical or atypical motifs, this would suggest that WUS-HD prefers binding to atypical motifs. Regardless, WUS-HD is still able to form specific interactions with the typical TAAT motif, likely reflecting the inherent broad specificity of HDs for DNA sequence recognition 26 , 27 .
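The position numbering used in this comparison can be made explicit with a short Python sketch. It is a deliberately simplified illustration with hypothetical function names: it assumes a 4-bp core spanning positions −2 to +1, requires the adenine at position 0, and labels a core as typical or atypical solely from the base at position −1 (adenine or guanine, respectively), as described in the text. Real HD specificity is broader than this rule.

```python
def annotate_core(core):
    """Map a 4-bp core motif onto positions -2, -1, 0 and +1."""
    if len(core) != 4:
        raise ValueError("expected a 4-bp core motif")
    return dict(zip((-2, -1, 0, 1), core.upper()))


def classify_core(core):
    """Label a core as 'typical', 'atypical' or 'other'.

    Simplified rule from the text: adenine at position 0 is required;
    adenine at -1 marks a typical motif, guanine at -1 an atypical one.
    """
    positions = annotate_core(core)
    if positions[0] != "A":
        return "other"
    if positions[-1] == "A":
        return "typical"
    if positions[-1] == "G":
        return "atypical"
    return "other"


for core in ("TAAT", "TGAA", "TGAT"):
    print(core, annotate_core(core), classify_core(core))
```

With this rule, TAAT is labeled typical, whereas TGAA and TGAT are labeled atypical, in line with the motif classes compared in Fig. 3.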
WUS-HD binding specificity depends on DNA shape
We noticed that position +1 of the DNA motif made no hydrogen bonds to the protein in any of our structures and therefore we analyzed whether WUS had any base-preference at this position (Fig. 4a ). Intriguingly, MST experiments showed a strong preference of WUS-HD for A or T, with binding to A containing probes roughly twofold stronger than to probes with a T at this position (Supplementary Fig. 9a, b ). In contrast, probes with C or G were bound less tightly, decreasing affinity by ∼7-fold and ∼26-fold, respectively (Supplementary Fig. 9c, d ). Interestingly, the recognition motifs in the TGAA repeat sequence each have adenine at position +1, consistent with the observation that this sequence was bound with a higher affinity than the other two crystallized variants (Fig. 2a, b ).
Fig. 4: Molecular basis for preference of the +1 recognition motif position.
a MST-analysis of WUS sequence recognition specificity for DNA position +1, based on an atypical recognition motif. The tandem recognition motif is in bold letters and position +1 is adenine (red), thymine (blue), cytosine (cyan), and guanine (green). b Predicted minor groove width (MGW) profiles for atypical DNA sequences differing only in the +1 position of the WUS-HD recognition sequence. The color scheme is the same as in (a) and the binding position of WUS R38 inserting into the minor groove is indicated. The DNA sequence is shown at the bottom, where N represents any of the four nucleotides (A,T,C,G). c Structure of WUS (light blue) bound to DNA (green) highlighting the insertion of R38 into the minor groove. The minor groove width (MGW) is indicated by gray arrows. d Structural basis of adenine (TGAA, left) and thymine (G-Box, right) preference at the +1 position. Shown are the residues of the WUS recognition helix making hydrophobic contacts with the C5 methyl group of thymine. The conserved Asn90 residue is shown as a reference and the 2Fo–Fc electron density maps (blue mesh) are contoured at 1.0σ.
Why does the +1 DNA motif position have such a strong influence on WUS-HD affinity, despite the fact that this base is not involved in hydrogen bond interactions? Since DNA shape can have a substantial effect on specificity and affinity of HD–DNA complexes 36 , we computationally investigated potential structural differences in the DNA sequences experimentally tested. To this end, we used the DNAShape tool 37 to predict the intrinsic conformation of unbound DNA probes differing in the +1 position focusing on minor groove width (MGW) (Fig. 4b ). Consistent with the overall similarity of the sequences, the predicted MGW profiles are similar in all cases with two MGW minima occurring around the different nucleotides at the +1 position. Interestingly, these minima spatially coincided with the position of the WUS R38 side chain insertion for contacting thymine 7 and thymine 11 (Fig. 4b, c ).
Our analysis showed that the local MGW minima for the 3xTGAA and 3xTGAT sequences are much more pronounced than in the 3xTGAC and 3xTGAG sequences. The 3xTGAA sequence, which had the highest affinity of the four sequences (Kd = 0.06 ± 0.01 µM; ± indicates standard deviation, n = 3), showed two strong minima that overlapped best with the binding position of WUS R38 (Fig. 4b ). In contrast, the 3xTGAT sequence, which had a slightly weaker affinity (Kd = 0.12 ± 0.01 µM; ± indicates standard deviation, n = 3), also exhibited two strong minima, although they were shifted to position +2. In addition, the 3xTGAC and 3xTGAG sequences had even weaker affinities (Kd = 0.39 ± 0.05 µM and Kd = 1.42 ± 0.17 µM, respectively; ± indicates standard deviation, n = 3), consistent with the local MGW minima being in a different position and a less narrow minor groove. Consequently, the DNAShape tool predictions suggested that the 3xTGAC and 3xTGAG sequences are less well pre-organized for WUS DNA-binding and require larger conformational changes compared to the 3xTGAA and 3xTGAT sequences.
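The affinity differences between the four +1 variants can be put on a common scale by converting Kd ratios into apparent binding free energy differences, ΔΔG = RT ln(Kd / Kd,ref). The Python sketch below does this for the Kd values quoted above, using 3xTGAA as the reference. The temperature of 298.15 K is an assumption made for illustration, since the measurement temperature is not stated in this section, and the function name is hypothetical.

```python
import math

R_KCAL_PER_MOL_K = 1.987e-3  # gas constant in kcal/(mol*K)

# Kd values (µM) for the +1 position variants, taken from the text.
KD_UM = {"3xTGAA": 0.06, "3xTGAT": 0.12, "3xTGAC": 0.39, "3xTGAG": 1.42}
REFERENCE = "3xTGAA"


def ddg_kcal_per_mol(kd, kd_ref, temp_k=298.15):
    """Apparent free energy penalty relative to the reference probe.

    temp_k defaults to 298.15 K, which is an assumed value.
    """
    return R_KCAL_PER_MOL_K * temp_k * math.log(kd / kd_ref)


fold = {name: kd / KD_UM[REFERENCE] for name, kd in KD_UM.items()}
ddg = {name: ddg_kcal_per_mol(kd, KD_UM[REFERENCE]) for name, kd in KD_UM.items()}

for name in KD_UM:
    print(f"{name}: {fold[name]:.1f}-fold weaker, "
          f"ddG = {ddg[name]:.2f} kcal/mol")
```

Under this assumption, even the weakest variant differs from 3xTGAA by less than 2 kcal/mol, a magnitude that small differences in DNA shape and hydrophobic contacts could plausibly account for.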
Besides the widely recognized hydrogen bond interactions of specific bases, hydrophobic contacts can also be an important determinant for protein–DNA specificity 38 . Analysis of the DNA contacts in the structures of TGAA and G-Box, where the +1 motif position was an adenine or thymine, respectively, revealed that hydrophobic residues of WUS-HD made contact to the C5 methyl group of a thymine base (Fig. 4d ). In the G-Box crystal structure, Y86 formed Van der Waals interactions with thymine of the +1 position of the TGAT motif. In addition, the aliphatic chain of K82 interacted with Y86 and thus contributed to the local hydrophobic environment. In contrast, in the TGAA structure A93 contacted the thymine from the complementary strand, which base paired with the adenine of the +1 position (Fig. 4d ). Thus, despite the fact that the +1 DNA motif position was not involved in base-recognition via hydrogen bonds with WUS-HD, the bases at this position had a substantial influence on local DNA conformation and, together with Van der Waals contacts of hydrophobic side chains from WUS-HD, led to a strong preference for A/T over G/C. These findings were also consistent with the experimental observation that the G-Box probe, which contained TGAG and TGAT motifs, was bound with much lower affinity than the TGAA probe, which has two TGAA motifs (Fig. 2a, b ). Furthermore, thymine is the most common base at position +1 in typical recognition motifs and correlates with the presence of an aliphatic residue contacting this position 26 , 27 .
The WUS-HD undergoes DNA-mediated dimerization
One of the surprising findings of our crystallization experiments was that two WUS-HDs were found to bind every DNA molecule, even though the canonical TAAT motif is usually only bound by a single HD 26 , 27 . In addition, we observed that irrespective of the probe sequence, the two WUS-HD molecules are engaged in protein–protein contacts (Fig. 5a , Supplementary Fig. 4 ) even though multi angle light scattering (MALS) demonstrated that the WUS-HD was monomeric in solution (Supplementary Fig. 10a ). Interestingly, these DNA-bound dimers had a unique relative orientation in all structures. Bound to the G-Box probe, the two monomers were positioned head-to-head on the same side of the DNA and had almost identical binding features, probably due to the palindromic nature of the DNA recognition sequence and the negligible interaction between them (Figs. 2 d, 5a , and Supplementary Fig. 11a ). In contrast, the two HD molecules interacting with the TAAT and the TGAA repeat probes were on opposite sides of the DNA and formed specific protein–protein interactions with each other (Figs. 2 d and 5a ). One of the WUS-HD molecules bound to the TGAA probe made additional DNA contacts through stabilization of the helix α3 C-terminus by the other WUS molecule (Fig. 2e , Supplementary Fig. 11b ), which might explain the higher affinity for the TGAA sequence (Fig. 2a, b ). In particular, R96 and R100 of the recognition helix established new contacts to the DNA-backbone, not observed in any of the other WUS–DNA complexes. Similarly, extensive protein–protein interactions with the HD bound to the typical core TAAT sequence likely allowed an additional WUS-HD molecule to occupy an unexpected position in the TAAT structure (Figs. 2 d and 5a ). However, in this case, the interaction with the DNA was less important and the structure was not well resolved in the electron density, as indicated by elevated B-factors (Supplementary Fig. 6 , Supplementary Fig. 11c ).
In contrast, in the other, likely more relevant configuration, the two WUS-HD molecules did not exhibit any protein–protein interface, but individually formed a stable protein–DNA complex with the TAAT probe (Fig. 2d , Supplementary Fig. 7 ).
Fig. 5: DNA sequence specificity depends on WUS-HD dimerization.
a DNA-facilitated protein interactions between individual HDs of WUS bound to TGAA (top), G-Box (center), and TAAT (bottom). For clarity, DNA was omitted and side chains mediating the protein–protein interface are shown. Amino acids contributing most to the buried surface area are indicated and color schemes correspond to those in Fig. 2. b MST analysis of the DNA-facilitated WUS dimerization interface. Single point mutants were introduced (I66A in red, F85A in green, and F101A in blue) and binding was quantified for a 2xTGAA motif. c MST analysis of orientation preferences for WUS binding toward different arrangements of a 2xTGAA DNA motif. The head-to-head arrangement is in blue and the tail-to-tail arrangement is in red, whilst the same arrangements with a 1 bp spacer are shown in cyan and orange, respectively. As a reference, the binding toward the tandem repeat 2xTGAA DNA motif is shown in black. d MST analysis of spacing preferences for WUS DNA-binding activity. Binding affinity was measured for three TGAA recognition motifs (red) and with a 4 bp spacer (orange), and for two TGAA recognition motifs (blue), with a 1 bp spacer (cyan) and a 4 bp spacer (green).
Cooperative binding determines WUS-HD sequence specificity
Further analysis of the DNA-binding activity of WUS-HD by MST measurements clearly demonstrated a gain in binding affinity with increasing number of recognition motifs, indicating that the binding of multiple HD molecules per DNA molecule occurs in solution as well as in our crystal structures (Supplementary Fig. 12a ). Although in this experimental setup, the derived Kd values are not directly comparable due to the variation in the number of binding sites, the increase in affinity from one (1xTGAA, Kd = 10.50 ± 2.30 µM; ± indicates standard deviation, n = 3) to two binding sites (2xTGAA, Kd = 0.30 ± 0.04 µM; ± indicates standard deviation, n = 3) was still higher than expected for two independent binding sites (Kd ≈2–4 µM). Thus, we hypothesized that this must be a cooperative effect due to favorable interactions between the protein molecules. Interestingly, the affinity of an ideal 2xTGAA DNA repeat was similar to that of the naturally occurring TGAA repeat probe from the CLV1 locus (Kd = 0.27 ± 0.03 µM; ± indicates standard deviation, n = 3) (Fig. 2a , Supplementary Fig. 9e ), suggesting the binding of two WUS-HD molecules per DNA as seen in our TGAA crystal structure (Fig. 2d ). Indeed, the CLV1 derived sequence has two TGAA and one TGTA motif (Fig. 2b ), demonstrating the importance of adenine at the 0 position, crucial for HD binding 23 , 31 .
To test the relevance of cooperativity for chromatin binding of WUS in vivo, we quantified reads of our ChIP-seq data aligning to sequences containing one, two, or three TGAA recognition motifs (Supplementary Fig. 13a ). Consistent with the increase in binding affinity seen by MST, multiple TGAA repeat motifs were bound by WUS much more frequently compared to individual TGAA sequences, demonstrating that cooperative binding is a relevant mechanism for WUS chromatin interaction in vivo.
In order to assess the complex stoichiometry of WUS-HD/DNA complexes we performed MALS analysis with the same DNA probes containing tandem TGAA recognition motifs (Supplementary Fig. 10b–e ). However, the determined molecular mass (MMcalc) for the complex fraction was always approximately 9 kDa lower than the expected theoretical molecular mass (MMtheo), if all recognition motifs were occupied. This suggested the absence of one WUS-HD monomer in the final protein–DNA complex and could be due to a dilution effect during gel filtration.
To unambiguously determine the number of binding events and to probe for cooperativity, we used isothermal titration calorimetry (ITC) to quantify the binding thermodynamics of WUS-HD with a 2xTGAA recognition motif (Supplementary Fig. 12b ). In line with our expectations, binding of the 2xTGAA DNA could be fit best by a sequential binding model, indicating two binding events (Kd,1 = 1.24 ± 0.14 µM and Kd,2 = 0.82 ± 0.07 µM; ± indicates standard deviation, n = 3) with positive cooperativity and a protein to DNA stoichiometry of 2:1.
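One way to read the two ITC constants is to compare them with the reference case of two identical, non-interacting sites. In a sequential model that reference case gives macroscopic dissociation constants that differ only by a statistical factor, Kd,2 = 4 × Kd,1, so the ratio 4·Kd,1/Kd,2 equals 1 without cooperativity and exceeds 1 with positive cooperativity. The minimal sketch below assumes that the reported values are macroscopic (stepwise) constants and that the two TGAA sites are equivalent; neither assumption is stated in the text. The temperature of 298.15 K and all function names are placeholders chosen for illustration.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def cooperativity_factor(kd1_uM: float, kd2_uM: float) -> float:
    """Return 4*Kd1/Kd2 for a two-step sequential binding model.

    Assumes two equivalent sites and macroscopic (stepwise) constants.
    A value of 1 means independent sites; > 1 means positive cooperativity.
    """
    return 4.0 * kd1_uM / kd2_uM


def coupling_free_energy(alpha: float, temperature_K: float = 298.15) -> float:
    """Convert a cooperativity factor into a coupling free energy (kcal/mol).

    Negative values mean the second binding event is favored by the first.
    The default temperature is an assumed placeholder.
    """
    return -R_KCAL * temperature_K * math.log(alpha)


# Kd values reported for the 2xTGAA probe (sequential model, in µM)
alpha = cooperativity_factor(1.24, 0.82)
ddg = coupling_free_energy(alpha)
print(f"cooperativity factor: {alpha:.2f}")
print(f"coupling free energy: {ddg:.2f} kcal/mol")
```

Under these assumptions the factor comes out well above 1, in line with the positive cooperativity stated above. If the fitted constants were instead microscopic (per-site) values, the statistical factor of 4 would not apply and the plain ratio Kd,1/Kd,2 would be the relevant quantity.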
Dimerization drives cooperative binding of repeat motifs
To mechanistically dissect the positive cooperativity for WUS-HD binding to atypical TGAA repeat sequences, we investigated the protein–protein interactions between the DNA-bound HD dimers observed in our crystal structures in more detail (Fig. 5a ). Although the interaction surface area between the WUS-HD molecules was very small in all cases, covering only 2–6% (90–290 Å2) of the solvent accessible surface area, a few hydrophobic residues (I66, F85, and F101) were notably more buried in both the TAAT and TGAA crystal structures (Fig. 5a ). To functionally test the contribution of the dimerization interface to the DNA-binding activity of WUS, we independently substituted these residues with alanine. MST analysis revealed a reduced DNA-binding affinity of all three WUS-HD variants (Fig. 5b ), supporting our hypothesis that high affinity DNA-binding to TGAA repeat probes requires WUS homodimerization. Interestingly, only positions I66 and F85, lining the interface of helices α2 and α3 of WUS-HD, are conserved within the WOX family, suggesting that F101 may represent a specific feature of WUS compared to other WOX members (Fig. 1c ). Consistent with these findings, we observed that the F101A mutation only reduced binding affinity by a factor of about three (Kd = 0.80 ± 0.12 µM; ± indicates standard deviation, n = 3) compared to >20 (Kd = 8.20 ± 0.84 µM; ± indicates standard deviation, n = 3) and >20 (Kd = 6.81 ± 0.87 µM; ± indicates standard deviation, n = 3) for the I66A and the F85A mutations, respectively (Fig. 5b , Supplementary Fig. 9o–q ).
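The fold reductions quoted for the three variants can be recomputed from the Kd values given in the text. The sketch below takes the wild-type 2xTGAA value (Kd = 0.30 ± 0.04 µM) as the reference, which is an assumption about which measurement the comparison refers to, and propagates the standard deviations to first order as if the two measurements were independent. The function name is hypothetical.

```python
import math


def fold_change(kd_variant, sd_variant, kd_ref, sd_ref):
    """Return (ratio, propagated SD) for kd_variant / kd_ref.

    Uses first-order error propagation and assumes independent errors.
    """
    ratio = kd_variant / kd_ref
    rel_error = math.sqrt((sd_variant / kd_variant) ** 2 + (sd_ref / kd_ref) ** 2)
    return ratio, ratio * rel_error


# Reference assumed to be wild-type WUS-HD on the 2xTGAA probe (µM)
WILD_TYPE = (0.30, 0.04)

# Kd ± SD for the alanine variants on the 2xTGAA probe (µM), from the text
VARIANTS = {
    "I66A": (8.20, 0.84),
    "F85A": (6.81, 0.87),
    "F101A": (0.80, 0.12),
}

for name, (kd, sd) in VARIANTS.items():
    ratio, err = fold_change(kd, sd, *WILD_TYPE)
    print(f"{name}: {ratio:.1f} ± {err:.1f}-fold weaker than wild type")
```

With this reference, F101A lands near the factor of about three stated above, and both I66A and F85A exceed 20-fold.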
To test whether these substitution alleles indeed modify the dimerization status rather than indirectly reducing DNA binding affinity by more globally affecting WUS-HD structure, we analyzed the interaction between the TGAA direct repeat probe and the mutants by EMSA experiments (Supplementary Fig. 14 ). In accordance with our MST results, all variants exhibited divergent DNA binding behavior compared to wild-type WUS-HD when probed with the 2xTGAA repeat sequence. Consistent with the observations from the MST analysis, DNA binding of the I66A and F85A variants was largely impaired and no distinct band shifts were visible, suggesting that these conserved residues may play an important role for the overall fold of WUS-HD rather than only mediating dimerization (Supplementary Fig. 14b, c ). In contrast, the F101A variant still bound DNA with reasonable affinity as observed in MST, but the resulting complex was predominantly monomeric, in comparison to the mostly dimeric form observed with wild-type WUS-HD (Fig. 5b, Supplementary Fig. 14a, d ).
Therefore, these results confirmed that WUS-HD forms a homodimer upon DNA-binding, where the interaction between the monomers is scaffolded by DNA and limited to a few amino acid contacts with an important role for F101. In addition, the newly identified dimerization sites of WUS greatly contribute to the cooperative DNA-binding of tandemly arranged TGAA recognition motifs, such as the ones observed in the TGAA direct repeat or CLV1 derived TGAA probes (Fig. 2a, b ).
Since the arrangement of recognition motifs is likely to influence WUS-HD binding affinities for all probes (Fig. 2a, b), we further examined how WUS-HD DNA-binding depends on the orientation or spacing of two identical core recognition motifs using the TGAA interaction as a model. Interestingly, changing the relative position of the TGAA core recognition motif from a direct tandem repeat into an inverted (tail-to-tail) or everted (head-to-head) repeat configuration on opposite strands led to a drastic decrease in binding affinity of up to ∼10-fold (Fig. 5c). The affinity of the head-to-head sequence probe (Kd = 3.17 ± 0.35 µM; ± indicates standard deviation, n = 3) was similar to that of the naturally occurring G-Box probe (Kd = 3.78 ± 0.42 µM; ± indicates standard deviation, n = 3) from the CLV1 locus, consistent with the observation that this sequence was bound with a lower affinity compared to the TGAA probe (Fig. 2a, b). This reduction was also observed when changing the orientation from a head-to-head arrangement to a tail-to-tail arrangement (Kd = 1.90 ± 0.20 µM; ± indicates standard deviation, n = 3) (Fig. 5c, Supplementary Fig. 9h, j). To test whether these observations are relevant for WUS chromatin binding in vivo, we again mined our ChIP-seq data. In accordance with the MST results, the binding probabilities showed a clear correlation with the orientation of two TGAA motifs (Supplementary Fig. 13b). The direct TGAA repeat motif was bound significantly more often than the head-to-head and the tail-to-tail configurations, which both had similar read distributions, consistent with the binding affinities determined by MST. These results confirmed that high affinity binding of WUS-HD to direct TGAA repeat sequences is dependent on the protein–protein interactions observed in our crystal structure, and that these interactions are highly relevant in vivo.
To test this further, we analyzed direct tandem repeat sequences with variable spacing between the TGAA motifs by MST (Fig. 5d ). In line with our hypothesis that interactions between neighboring WUS-HD molecules are required for high affinity binding, we observed a substantial reduction in DNA-binding affinity when we separated the TGAA motifs. Additional spacing by one nucleotide led to a reduction by ∼7-fold (Kd = 2.21 ± 0.25 µM; ± indicates standard deviation, n = 3) and ∼15-fold (Kd = 0.84 ± 0.12 µM; ± indicates standard deviation, n = 3) for 2xTGAA and 3xTGAA respectively (Fig. 5d , Supplementary Fig. 9l, m ). Increasing the spacer length up to four nucleotides (Kd = 3.58 ± 0.44 µM; ± indicates standard deviation, n = 3) did not lead to a more pronounced effect. Notably, this effect was not observed when we introduced a spacer between motifs situated on different DNA strands of inverted and everted repeat probes (Fig. 5c ), a motif arrangement that does not allow protein–protein interactions to begin with. Consistent with the observations from the MST analysis, the ChIP-seq data also showed a reduction in binding probability when two TGAA motifs were separated by an additional nucleotide (Supplementary Fig. 13c ). Taken together with the cooperativity shown by ITC, these results strongly suggested that stabilizing protein-protein interactions between WUS-HDs promote high affinity binding to DNA containing direct repeats of tandemly arranged TGAA recognition motifs.
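To put the orientation and spacing effects on a common energetic scale, the Kd values can be converted into apparent binding free energy differences relative to the direct 2xTGAA repeat, using ΔΔG = RT·ln(Kd,probe/Kd,reference). The sketch below is illustrative only: it treats the MST-derived Kd values as apparent constants, assumes a temperature of 298.15 K that is not given in the text, and omits the 4 bp spacer value because the text does not make clear which probe it belongs to. The names are hypothetical.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)
TEMPERATURE_K = 298.15  # assumed placeholder, not taken from the text

# Apparent Kd values in µM for 2xTGAA probes, as quoted in the text
REFERENCE_KD = 0.30  # direct tandem repeat
ARRANGEMENTS = {
    "head-to-head": 3.17,
    "tail-to-tail": 1.90,
    "direct repeat, 1 bp spacer": 2.21,
}


def apparent_ddg(kd_probe: float, kd_ref: float = REFERENCE_KD) -> float:
    """Return RT*ln(kd_probe/kd_ref) in kcal/mol.

    Positive values mean weaker binding than the reference probe.
    """
    return R_KCAL * TEMPERATURE_K * math.log(kd_probe / kd_ref)


# List arrangements from the largest to the smallest apparent penalty
for name, kd in sorted(ARRANGEMENTS.items(), key=lambda item: -item[1]):
    print(f"{name}: {kd / REFERENCE_KD:.1f}-fold, ΔΔG ≈ {apparent_ddg(kd):.2f} kcal/mol")
```

Under these assumptions, each rearrangement costs on the order of 1 kcal/mol relative to the direct repeat, a magnitude compatible with the loss of a small protein–protein interface rather than the loss of DNA binding itself.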
Base specific contacts are crucial for sequence specificity
Characterization of the WUS-HD DNA-binding specificity has shown that WUS-HD prefers atypical TGAA repeat sequences, while typical TAAT elements were bound less efficiently (Figs. 2b and 3b). Hence, we wanted to identify the mechanisms responsible for this behavior and test whether we could reprogram the DNA-binding preferences of WUS from an atypical TGAA motif to a typical TAAT motif. In atypical HDs (e.g., Exd 30), an arginine (R94 in WUS) reads out the guanine at position −1 of the DNA recognition sequence. However, in typical HDs this residue is commonly a lysine, which contacts the sugar-phosphate backbone (Fig. 6a). In addition, typical HDs (e.g., Antp 34 and En 23) usually contain one or two positively charged residues at their N-terminal arm that specify an adenine at position −1 of the DNA recognition motif. In WUS these residues are T35 and S36, which were either not visible in the crystal structures of the WUS/DNA complexes or not involved in DNA contacts.
Fig. 6: Altered DNA specificity of the WUS-HD.
a Schematic representation showing the sequence specificity of WUS-HD wild-type (left) and the RRK mutant (right) for typical TAAT (top panel) and atypical TGAA recognition motifs (bottom panel). Hydrogen bond interactions involved in base-recognition are highlighted (dashed lines) and relevant residues are indicated. Altered residues in the RRK mutant are shown in blue. b Electrophoretic mobility shift assays (EMSAs) of altered DNA specificity of WUS-HD for TGAA (left) and TAAT (right) probes. Monomer (M) and dimer (D) bound forms of WUS are indicated and the tested construct is given on top of the gel. c Quantification of WUS-HD DNA-binding affinity for TGAA (left) and TAAT (right) DNA by MST for the single point mutants T35R (blue), S36R (green), and R94K (red). As a reference the DNA-binding affinity of WUS-HD wt is shown in black.