Trinucleotide repeats: triggers for genomic disorders?

Among the various sequence repeats that shape the human genome, trinucleotide repeats have attracted special interest because of their involvement in a class of human genetic disorders known as triplet repeat expansion diseases. Long TGG repeat tracts have recently been shown to be involved in a genomic disorder resulting from a deletion at chromosome 14q32.2. Several different mechanisms might trigger this deletion, and a structural biology perspective may help to distinguish between them. Deeper insight into repeated sequences and their features may shed light on the mechanisms involved in this microdeletion and in similar genomic rearrangements.

insertions and a class of diseases known as genomic disorders, caused by deletions or insertions of tens of thousands to several million base pairs. The group of genomic disorders with identified mutation mechanisms is constantly increasing, with major mechanisms including non-allelic homologous recombination (NAHR), non-homologous end joining (NHEJ) and replication fork stalling and template switching (FoSTeS) [14].

TGG repeats trigger recurrent microdeletion
A recently published article [15] shows a link between a TNR sequence and a human genomic disorder (OMIM #608149). The authors demonstrated that a recurrent 1.11 Mb microdeletion in the long arm of the paternal chromosome 14 (14q32.2) is mediated by long tracts of interrupted TGG repeats (approximately 500 bp in size) that lie on both sides of the deletion and share 88% sequence similarity (Figure 1d). An identical heterozygous deletion was found in two unrelated patients with several clinical features (such as growth retardation, hypotonia, precocious puberty and mental retardation) characteristic of maternal uniparental disomy of chromosome 14 (UPD(14)mat). UPD is defined as the inheritance of both copies of a chromosome from only one parent (here, the mother) and is associated with parent-specific imprinting of some genes. The deleted 14q32.2 region harbors 13 protein-coding genes, as well as small nucleolar RNA (snoRNA) and microRNA loci [15] (Figure 1d). Two of these genes, Delta-like homolog 1 (DLK1) and retrotransposon-like 1 (RTL1), are maternally imprinted (paternally expressed), which explains several of the disease symptoms [15].
Figure 1 (partial caption). (b) Expanded CTG repeats (60 to a few thousand) in the 3' UTR of the DMPK gene are transcribed but not translated. Long CUG repeat hairpins cause a toxic dominant RNA gain-of-function effect mediated by sequestration of nuclear RNA-binding proteins, such as the alternative splicing regulator muscleblind-like 1 (MBNL1). There is clear evidence of an RNA gain-of-function effect in at least five TREDs: DM1, DM2 (expanded CCTG repeats), fragile X-associated tremor ataxia syndrome (FXTAS; expanded CGG repeats), Huntington's disease-like 2 (HDL2) and SCA8 (expanded CTG repeats). (c) The mutated HTT gene with expanded CAG repeats (40 to 100 repeats) in the coding region is transcribed and translated into a toxic protein containing an abnormally long polyglutamine domain. Intracellular aggregation of the mutant protein is responsible for the pathogenesis of HD. A similar pathological mechanism is postulated for several dominant disorders known as polyglutamine expansion diseases: seven different spinocerebellar ataxias (SCA1, 2, 3, 6, 7, 8 and 17), dentatorubral-pallidoluysian atrophy (DRPLA) and spinal and bulbar muscular atrophy (SBMA). (d) Diseases caused by long TGG repeat tracts. The dominant UPD(14)mat-like phenotype is caused by the deletion of a 1.11 Mb fragment of chromosome 14q32, which is mediated by two interrupted TGG repeat tracts (red boxes A and B). The deleted fragment contains about a dozen protein-coding and short RNA-coding genes, including paternally (green) and maternally (red) imprinted genes. The phenotype results from loss of function of two genes, DLK1 and RTL1, and haploinsufficiency of the others.

Figure 2 (partial caption). (a) [Nucleotide sequences of the interrupted TGG repeat tracts A and B; sequence not reproduced.] (c) Blue bars indicate the number of pure TNR tracts with at least eight repeat units according to [4] (the length required for stable G-quadruplex formation); red bars indicate the number of interrupted TNR tracts with at least 100 units (the minimal sequence length required for catalyzing NAHR is 300 bp), according to the Simple Repeats track of the UCSC Genome Browser (hg18) (our unpublished data). (d) G-quadruplex structure formed by eight GGA DNA repeats, (GGA)8 [21]. The 5'-most and 3'-most nucleotides are shown, and arrows indicate the 5' to 3' direction of the DNA strand. A similar structure can be expected for TGG repeats based on the results of an RNA study [23].

The authors [15] considered several possible deletion mechanisms (Figure 2b). First, the deletion may be mediated by NAHR between the two TGG repeat tracts. Alternatively, it may result from inherent instability of the repeat and/or from a stable secondary structure that the repeated sequence is very likely to form; either factor could promote the other two candidate mechanisms, NHEJ and FoSTeS. NAHR best explains genomic rearrangements whose breakpoints are flanked by highly similar sequences, and most recurrent genomic rearrangements that share a common size and fixed breakpoints are thought to arise by NAHR [14]. However, none of the recurrent genomic disorders known so far, with the possible exception of some cases of Jacobsen syndrome [16], has recombination hotspots located in triplet repeat tracts. Typically, NAHR breakpoints lie in LCRs 10 to 300 kb in size that share over 95% similarity [14]. NAHR hotspots are typically 300 to 500 bp in size and contain sequences associated with non-B DNA structures and double-stranded DNA (dsDNA) breaks, such as palindromes, DNA transposons and minisatellites, but not microsatellites [17]. Short tandem repeats (STRs, that is, microsatellites) are instead typically associated with a second recombination mechanism, NHEJ (Figure 2b), which evolved to repair dsDNA breaks [17] and therefore does not require sequence similarity at the breakpoints. A third mechanism, FoSTeS, involves switching of the replicating strand to another replication fork (Figure 2b), which could also occur at TGG repeats [14]. None of these three mechanisms requires TGG repeat expansion, but repeat-length polymorphisms could modulate the deletion frequency.
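Whether two repeat tracts are plausible NAHR substrates depends partly on their length and similarity: 88% for the TGG tracts flanking the 14q32.2 deletion, compared with over 95% for typical LCRs. The sketch below is a minimal illustration, not the analysis of [15]. It assumes the two tracts have already been aligned (alignment not shown), uses percent identity as a simple stand-in for sequence similarity, and the `nahr_candidate` helper and its thresholds (300 bp, 85%) are hypothetical.

```python
def percent_identity(aln_a: str, aln_b: str) -> float:
    """Percent identity of two equal-length aligned sequences ('-' marks a gap).

    Columns that are gaps in both sequences are skipped; a gap opposite a
    base counts as a mismatch.
    """
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    matches = columns = 0
    for a, b in zip(aln_a.upper(), aln_b.upper()):
        if a == "-" and b == "-":
            continue
        columns += 1
        if a == b:
            matches += 1
    return 100.0 * matches / columns if columns else 0.0


def nahr_candidate(aln_a: str, aln_b: str,
                   min_length: int = 300, min_identity: float = 85.0) -> bool:
    """Hypothetical filter: both tracts long and similar enough to be NAHR substrates."""
    # Length is measured on the ungapped sequences; thresholds are assumptions.
    shorter = min(len(aln_a.replace("-", "")), len(aln_b.replace("-", "")))
    return shorter >= min_length and percent_identity(aln_a, aln_b) >= min_identity


if __name__ == "__main__":
    print(percent_identity("TGGTGGTGG", "TGGTGATGG"))  # 8 of 9 columns match
```

Identity is only one ingredient: real NAHR hotspot prediction would also weigh the spacing and orientation of the tracts and the presence of hotspot-associated sequence features.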

Structural insight into TGG repeats
A closer inspection of the nucleotide sequences of the TGG repeat segments (Figure 2a) may help to assess the likelihood of the proposed mechanisms. Both segments (A and B in Figure 2a) contain approximately 60 repeat interruptions, mainly single-nucleotide substitutions. The longest uninterrupted TGG tract is 15 repeat units long, and 12 tracts contain at least 8 units. Pure repeat tracts of this length probably show only moderate repeat-number polymorphism [18]. The interruptions are mostly TGA, TAG and AGG triplets in one repeat tract and TGA, TGT and TAC in the other (Figure 2a). Interrupting triplets may protect repeated sequences from expansion, as is known for interrupted CGG and CAG repeats in the genes implicated in FXS, SCA1 and SCA2 [19]. Repeat expansion in these genes requires the prior loss of repeat interruptions, which are thought to inhibit inter-strand slippage and suppress intra-strand interactions [7,19]. Bena et al. [15] consider the possibility that the TGG repeat tracts are unstable, and they show that TGG repeat tracts are, on average, much longer than those of any other TNR in the genome. Our own analysis using the same constraints (unpublished work) shows the frequency of TNR tracts in the genome and reveals that AGG and TGG repeats most frequently form the longest tracts, of at least 100 units (300 bp), which may facilitate NAHR (Figure 2c). Considering only pure repeat tracts of at least 8 units, which may be implicated in repeat instability, the total number of TGG repeat tracts in the genome is similar to that of AGG and much lower than that of TAA and CAA repeats (Figure 2c) [4]. From a structural perspective, repeated sequences become transiently single-stranded during DNA replication, recombination, repair and transcription, which allows non-B DNA structures to form, with various downstream effects [20].
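Counts like those above (the longest pure run, the number of runs of at least 8 units, the kinds of interrupting triplets) can be obtained with a simple scan. This is a minimal sketch, not the procedure used for Figure 2: it assumes substitution-only interruptions so that a tract can be read in frame, the function names are hypothetical, the toy sequence is not one of the real tracts, and cyclic permutations of the motif (GTG, GGT) are not merged.

```python
import re
from collections import Counter


def pure_runs(seq: str, motif: str = "TGG", min_units: int = 8):
    """Return (start, units) for each maximal uninterrupted run of `motif`
    with at least `min_units` copies (0-based start; motif read as written)."""
    pattern = re.compile("(?:%s){%d,}" % (re.escape(motif), min_units))
    return [(m.start(), (m.end() - m.start()) // len(motif))
            for m in pattern.finditer(seq.upper())]


def interruptions(tract: str, motif: str = "TGG") -> Counter:
    """Tally in-frame triplets that differ from `motif`.

    Assumes the tract starts in phase with the motif and that interruptions
    are substitutions; an indel would shift the frame, so every triplet after
    it would be miscounted. A trailing partial triplet is ignored.
    """
    tract = tract.upper()
    k = len(motif)
    units = [tract[i:i + k] for i in range(0, len(tract) - k + 1, k)]
    return Counter(u for u in units if u != motif)


if __name__ == "__main__":
    toy = "TGGTGGTGATGGTGGTGGTAGTGG"  # toy tract, not real data
    print(pure_runs(toy, "TGG", min_units=2))
    print(interruptions(toy, "TGG"))
```

On the toy tract, the two interruptions (TGA and TAG) split the sequence into pure runs of two and three TGG units, which is the kind of fragmentation that limits both slippage and structure formation.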
The repeat interruptions within the TGG repeat tracts are likely to influence their ability to form G-quadruplexes and to diversify the resulting G-quadruplex structures. The repeated sequence will probably form a heterogeneous mixture of structural variants, whose core elements may resemble the G-quadruplex structures described for AGG repeats (Figure 2d) [21]. Notably, the longest repeat tracts, of at least 100 units, consist of AGG and TGG repeats (Figure 2c), both of which are capable of forming G-quadruplex structures. For both repeat types, just four repeats are sufficient to form a minimal G-quadruplex (Figure 2d), and such units can stack on each other to form more stable structures. One lesson from our analysis of the putative mechanisms underlying the 14q32.2 deletion is that deeper insight into the features of repeated sequences may be needed to identify and better understand the mechanism involved.
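To see how a single interrupting triplet can disrupt a minimal G-quadruplex, the following sketch scans for four G-runs separated by short loops, in the spirit of common pattern-based G-quadruplex predictors. The run length (at least two Gs, giving two G-quartets), the 1 to 7 nt loop range and the greedy grouping into non-overlapping blocks are assumptions rather than the methods of [21] or [23], and the scan says nothing about actual folding or stability.

```python
import re


def g_runs(seq: str, min_g: int = 2):
    """Return (start, end) spans of maximal runs of at least `min_g` guanines."""
    return [m.span() for m in re.finditer("G{%d,}" % min_g, seq.upper())]


def putative_g4_blocks(seq: str, min_g: int = 2, max_loop: int = 7):
    """Greedily group consecutive G-runs into non-overlapping blocks of four
    whose three connecting loops are 1..max_loop nt long.

    Each block is one putative minimal G-quadruplex (two G-quartets when
    min_g = 2). Returns the (start, end) span of each block.
    """
    runs = g_runs(seq, min_g)
    blocks = []
    i = 0
    while i + 3 < len(runs):
        window = runs[i:i + 4]
        loops = [window[j + 1][0] - window[j][1] for j in range(3)]
        if all(1 <= loop <= max_loop for loop in loops):
            blocks.append((window[0][0], window[3][1]))
            i += 4  # these four runs are used up
        else:
            i += 1
    return blocks


if __name__ == "__main__":
    for s in ("TGG" * 4, "TGG" * 8, "AGG" * 4, "TGGTGATGGTGG"):
        print(s, putative_g4_blocks(s))
```

For example, (TGG)4 and (AGG)4 each yield one block and (TGG)8 yields two adjacent blocks that could stack, whereas replacing the second TGG of (TGG)4 with TGA removes a G-run, so no block is found.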

The tip of the iceberg or a scarce phenomenon?
Whatever the exact mechanism underlying the 14q32.2 deletion [15], the involvement of TGG repeat tracts in this deletion cannot be questioned. An important issue to address now is how general this kind of mechanism might be. If NAHR operates, similar TNR-mediated genomic rearrangements should be predictable, as was shown earlier for LCR sequences [22]. If stable secondary structure is important, the analysis can be narrowed to repeats with the potential to form G-quadruplex (TGG, AGG and CGG) or hairpin (CNG, GAC and GTC) structures [23,24]. If repeat instability is essential, more attention needs to be paid to the nature, density and localization of the repeat interruptions. Genome-wide copy-number variation discovery studies (for example, [25]) may provide important information on this intriguing question.
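If NAHR is at work, candidate TNR-mediated rearrangements could be enumerated much as they were for LCRs [22]: pair tracts of the same repeat type on one chromosome and read the outcome from their relative orientation (direct repeats predict deletions or duplications, inverted repeats predict inversions). This is a minimal sketch, assuming tract annotations are already available, for example from a repeat-annotation track. The `Tract` record, the 50 kb to 10 Mb spacing window and the toy coordinates are hypothetical, and the sketch checks neither tract length nor sequence similarity.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Tract:
    """Hypothetical repeat-tract annotation (0-based, half-open interval)."""
    chrom: str
    start: int
    end: int
    motif: str   # repeat unit as read on `strand`, e.g. "TGG"
    strand: str  # "+" or "-"


def candidate_rearrangements(tracts, min_span=50_000, max_span=10_000_000):
    """Pair same-motif tracts on one chromosome and classify the rearrangement
    that NAHR between them would produce (spacing window is an assumption)."""
    ordered = sorted(tracts, key=lambda t: (t.chrom, t.start))
    results = []
    for a, b in combinations(ordered, 2):  # a precedes b on a shared chromosome
        if a.chrom != b.chrom or a.motif != b.motif:
            continue
        span = b.start - a.end  # length of sequence between the two tracts
        if not min_span <= span <= max_span:
            continue
        # Direct repeats -> deletion/duplication; inverted repeats -> inversion.
        kind = "deletion/duplication" if a.strand == b.strand else "inversion"
        results.append((a, b, span, kind))
    return results


if __name__ == "__main__":
    toy = [  # toy coordinates, not real genome annotations
        Tract("chrA", 100_000, 100_500, "TGG", "+"),
        Tract("chrA", 1_210_500, 1_211_000, "TGG", "+"),
        Tract("chrA", 3_000_000, 3_000_400, "TGG", "-"),
        Tract("chrB", 500_000, 500_600, "AGG", "+"),
    ]
    for a, b, span, kind in candidate_rearrangements(toy):
        print(a.chrom, a.start, b.start, span, kind)
```

Candidates produced this way would still need to be intersected with copy-number variation data, such as that from the genome-wide studies mentioned above, before any of them could be considered a real recurrent rearrangement.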