Genomic disorders ten years on

It is now becoming generally accepted that a significant amount of human genetic variation is due to structural changes of the genome rather than to base-pair changes in the DNA. As for base-pair changes, knowledge of gene and genome function has been informed by structural alterations that convey clinical phenotypes. Genomic disorders are a class of human conditions that result from structural changes of the human genome that convey traits or susceptibility to traits. The path to the delineation of genomic disorders is intertwined with the evolving technologies that have enabled the resolution of human genome analyses to continue increasing. Similarly, the ability to perform high-resolution human genome analysis has fueled the current and future clinical implementation of such discoveries in the evolving field of genome medicine.

Genomic disorders are diseases that result from rearrangements of the human genome rather than from DNA sequence base changes. Moreover, such rearrangements occur because of architectural features of the genome that incite genome instability. The idea of genomic disorders emanated from locus-specific studies of the common autosomal dominant peripheral neuropathies: Charcot-Marie-Tooth disease type 1A (CMT1A; Mendelian Inheritance in Man (MIM) database ID 118220 [1]) and hereditary neuropathy with liability to pressure palsies (HNPP; MIM 162500). A careful re-read of the early reports on these conditions reveals nearly all the key concepts of genomic disorders, including genomic duplication [2,3] and deletion [4], gene dosage (PMP22) [5-8] and specific gene copy number variation (CNV) [6-8]. The concepts of genome architecture and low-copy repeats (LCRs) or segmental duplications (SDs) were well described before there was either a draft or a finished reference genome sequence [9,10] (Figure 1). The term LCR was first introduced by Bernice Morrow following her studies of DiGeorge syndrome (MIM 188400) rearrangement breakpoints [11], whereas the term SD was introduced by Evan Eichler [12,13] to explain his observations from genome-wide studies. The concepts of non-allelic homologous recombination (NAHR [9], although the specific term NAHR was not introduced until later [14]), reciprocal recombination resulting in duplication/deletion of the same genomic interval [9,10], recombination hotspots [15,16] and the effects of CNV (such as duplication) on the interpretation of the segregation of marker genotypes [2,17] also began to emerge at this early stage.
Nevertheless, progress was blocked by both technological and conceptual limitations. Technically, we had no way to view the entire human genome simultaneously at a level of resolution that would enable insights into molecular mechanisms. Conceptually, locus-specific thinking had permeated genetics for over a century, with genocentric (gene-specific) views and base-pair changes as the one form of mutation predominating during the latter half of the 20th century and often blindly biasing genetic thinking to this day. The significant heritability and uncertain molecular basis of common disorders have been approached with such genocentric and 'point mutation' genetic thinking. Even now, we witness this as a recurrent theme with an excessive focus on genome-wide association studies (GWASs) evaluating ancient SNPs, as contrasted with the potential involvement of recent or new mutations and/or CNV.
At the time of the early studies leading to the concept of genomic disorders, the only way to visualize the entire human genome was through chromosome studies, usually by the G-banded karyotype provided by clinical cytogenetics. We were thus fascinated and excited to find that our studies of a microdeletion syndrome, the Smith-Magenis syndrome (SMS; MIM 182290), which results from a 3.7 Mb genomic deletion rearrangement large enough to be visualized by microscopy, revealed similar observations to those found for CMT1A/HNPP, including recurrent breakpoints [18-20], a surrounding genomic architecture consisting of LCRs (repeat gene clusters in this case) [21], reciprocal recombination [22,23] and occurrence by NAHR [21] (Figures 1 and 2).
These findings crystallized and solidified the concept of genomic disorders [24]. The concept is predicated on two general ideas: first, that genomic disorders occur by rearrangements of our genome (the human genome is disordered) and not by DNA-sequence-based changes (that is, not by base-pair changes or by SNPs that cause disease); and second, that genome architecture incites genome instability. The article introducing the concept [24] stated that structural characteristics of the human genome predispose it to rearrangements that result in human disease traits, and that genome alterations can occur through many mechanisms, including homologous recombination between region-specific LCRs. This first mechanism was later termed NAHR [14]. The term NAHR stresses the mechanism by which these particular rearrangements of the human genome occur, including the requirement for homologous substrates and the observations of gene conversion and recombination hotspots. Furthermore, NAHR can cause duplication, deletion and inversion. In contrast, unequal crossing-over usually [...]

Figure caption (partial). (b) Southern hybridization with a CMT1A-REP probe. There are two cross-hybridizing signals in human genomic DNA (lane 1), none in the mouse and hamster genomic DNA (lanes 2 and 3), and the same two in a monochromosomal hybrid (MH22-6, lane 4) retaining human chromosome 17. Both copies map to the CMT1A duplication region at 17p12. This is interpreted as showing that there are two copies of CMT1A-REP, both mapping to the CMT1A duplication locus, and both of which evolved late in the mammalian radiation as they are not present in mouse or hamster [9]. (c) Three copies of SMS-REP (arrows) on chromosome 17 [21]. We used the term REP because at the time my laboratory was working with prokaryotic repeated sequences (REP) and had developed a technique we referred to as rep-PCR [157,158].
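The reciprocal outcome of NAHR described in the text, in which a crossover between misaligned copies of a direct repeat yields a duplication on one product and a deletion on the other, can be illustrated with a toy Python sketch. This is a minimal illustration under strong simplifying assumptions, not a model of the recombination machinery: a chromosome is reduced to an ordered list of labelled segments, the function name `nahr_crossover` is hypothetical, the flanking labels "proximal" and "distal" are placeholders, and inversion (which would require inverted repeats) is not modelled.

```python
from typing import List, Tuple

Chromosome = List[str]  # ordered segment labels; a deliberately crude abstraction


def nahr_crossover(
    chrom_a: Chromosome, chrom_b: Chromosome, repeat_a: int, repeat_b: int
) -> Tuple[Chromosome, Chromosome]:
    """Cross over within two paired repeat copies and return both products.

    repeat_a and repeat_b are the list indices of the paired repeat copies
    on chrom_a and chrom_b. When the indices differ, the pairing is
    non-allelic and the products are unequal in length.
    """
    if chrom_a[repeat_a] != chrom_b[repeat_b]:
        raise ValueError("paired segments must be copies of the same repeat")
    # Each product keeps one chromosome up to its repeat copy and continues
    # with the other chromosome after that chromosome's repeat copy.
    product_1 = chrom_a[: repeat_a + 1] + chrom_b[repeat_b + 1 :]
    product_2 = chrom_b[: repeat_b + 1] + chrom_a[repeat_a + 1 :]
    return product_1, product_2


# Toy locus: a dosage-sensitive gene flanked by two direct repeat copies.
homolog = ["proximal", "CMT1A-REP", "PMP22", "CMT1A-REP", "distal"]

# Misalignment: the distal repeat (index 3) of one homolog pairs with the
# proximal repeat (index 1) of the other.
duplication, deletion = nahr_crossover(homolog, homolog, 3, 1)

print(duplication)
# ['proximal', 'CMT1A-REP', 'PMP22', 'CMT1A-REP', 'PMP22', 'CMT1A-REP', 'distal']
print(deletion)
# ['proximal', 'CMT1A-REP', 'distal']
```

In this sketch the duplication product carries two copies of the gene and three repeat copies, whereas the reciprocal deletion product carries no gene copy and a single repeat copy. Pairing allelic copies (equal indices) returns two unchanged chromosomes.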
Submicroscopic duplications have now been revealed as a cause of X-linked mental retardation [128,129] and other mental retardation syndromes [130,131]. Many new genomic disorders caused by submicroscopic duplications and deletions continue to be described and are catalogued in the DECIPHER database [132].
Continued systematic investigations of rearrangements associated with genomic disorders have uncovered a new mechanism for rearrangements within our genome. As explained above, research on recurrent rearrangements with breakpoint clustering at LCRs/SDs enabled the elucidation of the NAHR mechanism. Recent studies of genomic disorders caused by non-recurrent rearrangements (rearrangements of different sizes and with different breakpoints in each individual) have uncovered a new replication-based human genomic rearrangement mechanism termed FoSTeS (fork stalling and template switching). First unveiled through studies of PLP1 duplications associated with Pelizaeus-Merzbacher disease [133], a genomic disorder by the criteria originally defined [24], the mechanism has now been shown to cause some LIS1 duplications [134], MECP2 duplications [93], PMP22 and RAI1 duplications [135], PMP22 exon deletions [135] and some interstitial 9q34 deletions thought to represent terminal deletions [136]. The FoSTeS mechanism, as described based upon the phenomenology of breakpoint/join point sequence analysis in human genomic disorders, has been generalized and the molecular details refined, including through genetic and genomic observations on chromosomal rearrangements in other model organisms (for example, Escherichia coli and yeast), resulting in the microhomology-mediated break-induced replication (MMBIR) model that may be operative in all life forms [137]. MMBIR can explain many complex rearrangements [137], such as duplication-triplication-duplication (Figure 3). It may be a novel repair pathway for one-ended, double-stranded DNA generated from collapsed replication forks [137]. Such collapsed forks can occur as a replication fork proceeds through a nick or single-strand region generated by local genome architecture.
Furthermore, MMBIR predicts that complex human genomic rearrangements will often be accompanied by extensive loss of heterozygosity and, in some cases, by loss of imprinting because the chromosome that is copied may be either the sister or the homolog [137]. Such loss of heterozygosity could lead to regional uniparental disomy [138] as a novel mechanism for disease.
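The serial template switching proposed in the FoSTeS/MMBIR models discussed above can be caricatured in a few lines of Python. The sketch below is only a toy under stated assumptions: the template is a list of segment labels copied left to right, each switch is a hypothetical `(stall_at, resume_at)` pair of indices, and the function name `replicate_with_switches` is invented for illustration. It ignores microhomology at the join points, strand orientation (so inversions cannot arise) and the choice between sister and homolog as template, all of which matter in the actual models.

```python
from typing import List, Tuple


def replicate_with_switches(
    template: List[str], switches: List[Tuple[int, int]]
) -> List[str]:
    """Copy `template`, jumping to `resume_at` when the fork reaches `stall_at`.

    Switches are consumed once each, in the order given. A backward jump
    re-copies segments (a gain); a forward jump skips segments (a loss).
    A switch whose stall position is never reached is ignored.
    """
    for _, resume_at in switches:
        if not 0 <= resume_at <= len(template):
            raise ValueError("resume position lies outside the template")
    pending = list(switches)
    copied: List[str] = []
    position = 0
    while position < len(template):
        if pending and position == pending[0][0]:
            # Fork stalls here and resumes synthesis at another position.
            _, position = pending.pop(0)
            continue
        copied.append(template[position])
        position += 1
    return copied


template = ["A", "B", "C", "D", "E"]

# One backward switch: segments B and C are copied twice (duplication).
simple_gain = replicate_with_switches(template, [(3, 1)])

# One forward switch: segments B and C are skipped (deletion).
simple_loss = replicate_with_switches(template, [(1, 3)])

# Two backward switches: B and D end up in two copies and C in three,
# a duplication-triplication-duplication pattern of copy number.
complex_gain = replicate_with_switches(template, [(3, 1), (4, 2)])

print(simple_gain)   # ['A', 'B', 'C', 'B', 'C', 'D', 'E']
print(simple_loss)   # ['A', 'D', 'E']
print(complex_gain)  # ['A', 'B', 'C', 'B', 'C', 'D', 'C', 'D', 'E']
```

The point of the sketch is that a small number of switches along one replication pass is enough to generate copy-number patterns with different breakpoints in each run, in contrast to the recurrent, repeat-bounded products of NAHR.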
In addition to NAHR and FoSTeS/MMBIR, other mechanisms may remain to be uncovered that fulfill the original conception of genomic disorders. Genome architecture may be different for individuals as a result of structural variation within a particular population [50-54,139], so particular individuals may be more susceptible than others to having either a genomic disorder or an offspring with one. Furthermore, other mechanisms, such as nonhomologous end joining and retrotransposition, can lead to structural variation that results in genomic disorders [140], and unique genome architectural features other than LCR/SD, such as AT-rich palindromes [141,142] and non-B DNA conformations [86,143], can incite genome instability. Systematic studies of disorders that occur by such mechanisms may provide insights into local genome architecture that could potentially influence susceptibility to rearrangement; they may thus delineate the 'rules' for FoSTeS/MMBIR as was done for NAHR.
It was initially not known whether human genomic rearrangements reflected random DNA breaks or perhaps selection/survival of genomic regions that could tolerate the gains and losses of CNV. Over the past decade, our thinking has evolved and we can now speak of specific mechanisms (NAHR, MMBIR/FoSTeS, nonhomologous end joining and retrotransposition), and elucidation of the rules for such mechanisms has enabled powerful predictions that have had a direct clinical impact. We have also learnt some of the 'rules' regarding genome architecture. It seems that each rearrangement mechanism can occur anywhere in the human genome, but one mechanism may be preferred over another at a given locus depending on local genome architecture (for example, LCR/SD or non-B DNA). We have realized that CNVs are as important as SNPs to human mutation and perhaps even more important with regard to human sporadic traits [87,127]. Whether CNV or SNP is the more favored mutational event at a given locus may again reflect what the local genome architecture is around that locus [140]. The elucidation of both the mechanisms of CNV formation [144] and how CNVs affect genes to convey phenotypes [145], whether the latter occurs through altered copy number [75,146], gene dysregulation or position effect, has to a large extent come from studies of genomic disorders [147]. The clinical phenotype allows the ascertainment of the genomic rearrangement from the population to enable the molecular studies.
The 'rules' for MMBIR/FoSTeS remain to be further defined with respect to the human genome architecture that might stimulate the events [93,133]. Unquestionably, many more genomic disorders are still to be defined and many Mendelian and complex traits may be shown to be caused by CNV, rather than SNPs of a given gene in selected patients. Thus, a potentially more fruitful and cost-efficient approach to the study of human complex traits may be to examine a few hundred patients for CNV associated with the trait, rather than perform SNP-based GWASs. Such an approach recently yielded insights into Wolff-Parkinson-White syndrome, a common pre-excitation phenomenon resulting in a characteristic electrocardiographic pattern [148]. Certainly all GWASs should look for CNV and not just focus on SNPs [149].
Perhaps the most significant findings regarding the human genome that were not anticipated by the human genome project [45-47,77,78] were the elucidation of genomic disorders and the discovery of the extent to which we vary from each other genetically as a result of CNV. In fact, the establishment of a reference haploid versus diploid genome truly reflects our naiveté with regard to the importance of CNV for human traits. With further widespread clinical implementation of high-resolution human genome analysis, submicroscopic genomic duplications and deletions will probably be identified at an increasing rate. Potentially, the vast majority of the human genome could be involved in CNV; perhaps more of the genome will be subject to, or tolerate, duplication CNV than deletion, as observed for chromosomal studies [150,151]; and 'reverse genomics' could be used to systematically delineate genomotype-phenotype correlations [134]. The genomic change accompanying a CNV results in a genomotype that may include either more than one gene or no genes involved in conveying the specific phenotype, and thus is distinct from a genotype. Such studies will directly address the question: what is the genomic code? This is needed because the genetic code has only addressed the functions of under 2% of the human genome: the coding exons. Systematic analyses of the size, extent and genomic content of CNV and associated phenotypes might lead to a new understanding of 'cis-genetics', the phenotypic consequences of CNV encompassing multiple genes and/or regulatory sequences on one chromosome homolog, as opposed to the 'trans-genetics' focus of Mendelian segregation and transmission of homologous chromosomes. Furthermore, the extent to which human genomic rearrangements occur somatically in mitotic cells is only beginning to be explored [135,152-156]. Thus, genomic disorders will probably continue to be a fruitful area for ongoing and future research.
Abbreviations