A clinical survey of mosaic single nucleotide variants in disease-causing genes detected by exome sequencing

Background Although mosaic variation has been known to cause disease for decades, high-throughput sequencing technologies with the analytical sensitivity to consistently detect variants at reduced allelic fractions have only recently emerged as routine clinical diagnostic tests. To date, few systematic analyses of mosaic variants detected by diagnostic exome sequencing for diverse clinical indications have been performed.
Methods To investigate the frequency, type, allelic fraction, and phenotypic consequences of clinically relevant somatic mosaic single nucleotide variants (SNVs) and characteristics of the corresponding genes, we retrospectively queried reported mosaic variants from a cohort of ~12,000 samples submitted for clinical exome sequencing (ES) at Baylor Genetics.
Results We found 120 mosaic variants involving 107 genes, including 80 mosaic SNVs in proband samples and 40 in parental/grandparental samples. Average mosaic alternate allele fraction (AAF) detected in autosomes and in X-linked disease genes in females was 18.2% compared with 34.8% in X-linked disease genes in males. Of these mosaic variants, 74 variants (61.7%) were classified as pathogenic or likely pathogenic and 46 (38.3%) as variants of uncertain significance. Mosaic variants occurred in disease genes associated with autosomal dominant (AD) or AD/autosomal recessive (AR) (67/120, 55.8%), X-linked (33/120, 27.5%), AD/somatic (10/120, 8.3%), and AR (8/120, 6.7%) inheritance. Of note, 1.7% (2/120) of variants were found in genes in which only somatic events have been described. Nine genes had recurrent mosaic events in unrelated individuals which accounted for 18.3% (22/120) of all detected mosaic variants in this study. The proband group was enriched for mosaicism affecting Ras signaling pathway genes.
Conclusions In sum, an estimated 1.5% of all molecular diagnoses made in this cohort could be attributed to a mosaic variant detected in the proband, while parental mosaicism was identified in 0.3% of families analyzed. As ES design favors breadth over depth of coverage, this estimate of the prevalence of mosaic variants likely represents an underestimate of the total number of clinically relevant mosaic variants in our cohort.
Electronic supplementary material The online version of this article (10.1186/s13073-019-0658-2) contains supplementary material, which is available to authorized users.


Background
Mosaicism is defined by the presence of different genotypic variants among cells of an individual that are derived from the same zygote [1]. Depending on the timing of mutation acquisition, mosaicism may be restricted to the germline (gonadal mosaicism) or non-germ cell tissues (somatic mosaicism) or may involve both (gonosomal mosaicism) [2]. It is estimated that three base substitution mutations arise per cell division in early human embryogenesis [3]. Postzygotic mutations dynamically accumulate and/or are negatively selected during the developmental process [4,5], rendering each individual a complex mosaic of multiple genetically unique cell lines [1,4].
Somatic mutations have been well known for their critical role in tumorigenesis [6] and overgrowth syndromes [5]. Mosaic variation has been reported also in asymptomatic individuals. In healthy donors, mutant allele fractions within organ samples ranged from 1.0 to 29.7% [7]. Mosaic variants may be clinically silent for several possible reasons: (1) the mutation is functionally inconsequential, (2) it is restricted to tissues not pertinent to the gene in which the mutation has arisen, (3) it may have occurred after a critical time frame for gene function, or (4) the mutation may be so disadvantageous that selective pressures favor survival and proliferation of cells carrying the reference allele.
Clinically relevant mosaicism is easily recognizable when cutaneous manifestations are present as with segmental neurofibromatosis or McCune-Albright syndrome [8]. However, in the absence of overt skin findings, recognizing underlying mosaicism may present a clinical challenge, particularly when the expressed phenotype deviates substantially from what has been reported in patients with non-mosaic variation. As patients with atypical phenotypes are often referred for exome sequencing (ES), an assessment of the performance of ES for detecting mosaic variation is warranted. Previous studies have evaluated the frequency and type of mosaic variation detectable by ES in specific disease populations, including neurodevelopmental disorders [9], autism [10,11], and congenital heart disease [12]. However, few systematic analyses of mosaic variants detected by diagnostic ES for diverse clinical indications have been performed [13].
To address this gap in the literature and to lay a framework for additional studies of mosaicism in clinically relevant genes, we present a retrospective review of all reported mosaic variants detected in nearly 12,000 consecutive patients referred for diagnostic ES at Baylor Genetics (BG).

Study cohort
Laboratory reports for 11,992 consecutive unrelated patients referred for ES were queried to ascertain all clinically relevant mosaic variants reported between Nov 2011 and Aug 2018. Exome analyses were performed as trio ES in 19.8% (n = 2373) and proband-only ES in 80.2% (n = 9619) of cases. One hundred twenty clinical reports with mosaic variants were analyzed for this study; this included 30 cases (25%) analyzed by trio ES and 90 cases (75%) by proband-only ES. Only mosaic variants detected in DNA samples from peripheral blood were analyzed.

Exome sequencing and analysis
ES was performed at BG laboratories as previously described [14,15] (Additional file 1: Supplementary Methods). The validated ES protocol achieves a mean coverage of 130× with over 95% of targeted regions, including coding and untranslated exons, reaching a minimum coverage of 20×. All samples were concurrently analyzed by the HumanOmni1-Quad or HumanExome-12 v1 array (Illumina) for sample identity confirmation and to screen for copy-number variants and regions of homozygosity. Variant classification was performed in accordance with the American College of Medical Genetics and Genomics (ACMG) and Association for Molecular Pathology (AMP) guidelines for variant interpretation [16]. Mosaic variants of uncertain significance in our cohort that were reported prior to the publication of the ACMG/AMP guidelines were reassessed and classified according to the updated criteria.

Computational analyses
To better assess the somatic mosaicism burden in ES data, we performed additional computational analyses of AAF distribution for heterozygous single nucleotide variants (SNVs) in 900 ES trios and simulation experiments for evaluating the effect of potential alignment biases.
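As a rough illustration of this kind of AAF analysis, the sketch below computes the alternate allele fraction from read counts and a one-sided exact binomial p-value against the 50% fraction expected for a constitutional heterozygous SNV. The function names and the example read counts are hypothetical, and this is not the pipeline used in the study.

```python
from math import comb


def alt_allele_fraction(alt_reads: int, total_reads: int) -> float:
    """Alternate allele fraction (AAF) = alternate reads / total reads at a site."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return alt_reads / total_reads


def lower_tail_binomial_p(alt_reads: int, total_reads: int, expected: float = 0.5) -> float:
    """P(X <= alt_reads) for X ~ Binomial(total_reads, expected)."""
    return sum(
        comb(total_reads, k) * expected**k * (1 - expected) ** (total_reads - k)
        for k in range(alt_reads + 1)
    )


# Hypothetical site: 24 alternate reads out of 130 total reads.
aaf = alt_allele_fraction(24, 130)
p_value = lower_tail_binomial_p(24, 130)
print(f"AAF = {aaf:.3f}, lower-tail p = {p_value:.2e}")
```

A small p-value only indicates that the AAF is lower than expected for a heterozygous call; alignment bias or copy-number changes can produce the same skew, which is why simulation and orthogonal confirmation remain necessary.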

Mosaic variants in probands
In proband samples, 80 mosaic variants were found in 72 genes in 33 female patients, 45 male patients, and two fetuses. The vast majority were reported in genes associated with AD (47.5%) and X-linked (30.0%) disorders. Mean AAF in proband samples was 32.6% ± 24.4% (n = 15) for X-linked variants in males and 20.2% ± 9.8% (n = 65) for autosomal variants and variants in X-linked disease genes in females (Table 1, Additional file 3: Table S4). For 65 of the 80 probands with mosaic variants, both parental samples were available for inheritance determination. Eight probands had only one parental sample available, and 7 probands had no parental samples available for analysis. The majority of mosaic variants detected in probands (63/65) were deemed de novo due to the absence of the variant in parental DNA by Sanger sequencing. Parental chromosome of origin could not be determined due to a lack of informative SNPs flanking the mosaic variants. In patient 55F, a c.1077dupT (p.L362fs) change in ZMPSTE24 (an autosomal recessive disease gene) was found at an AAF of 80% due to suspected uniparental disomy (UPD) involving chromosome 1. In patient 52F, an inherited c.1129A>T (p.K377*) change in COX15 (also an autosomal recessive disease gene) was found at an AAF of 12% due to suspected segmental UPD involving chromosome 10.
Of the mosaic variants detected in the proband samples, 58.8% (n = 47) were classified as pathogenic (P) or likely pathogenic (LP). Genotype-phenotype analysis was performed for 47 patients with mosaic P/LP variants (Additional file 4) [17]. Eighty-three percent of the patients had core phenotypes that were consistent with what had been previously reported in association with heterozygous variants, with no evidence of disease attenuation related to the mosaic status of the variant. However, patient 43F carrying a c.38G>A (p.G13D) variant  We also found three patients with dual molecular diagnoses in whom a second non-mosaic pathogenic variant was considered contributory to the patient's phenotype (patients 12U, 27F, and 35M). Two patients had multiple mosaic variants detected, including patient 3M who had 17 mosaic variants, only two of which were clinically reported and included in this analysis (see "Discussion"). Patient 12U had eight mosaic variants detected, but only one was found in a known disease-associated gene; the remaining mosaic variants were excluded from this analysis. In both cases, it was unclear whether the mosaic variants had contributed to the patient's phenotype or if they were a consequence of an underlying predisposition to somatic mutation in the context of a precancerous or cancerous state.

Mosaic variants in parental samples
Forty mosaic variants in 37 genes were detected in 40 parental samples, including one variant detected in a grandparental sample (Table 2). Seven mosaic variants were identified by trio ES analysis whereas the remaining 33 variants were found by Sanger sequencing. Thirty-two of 33 mosaic variants detected by Sanger sequencing were confirmed by PCR-based amplicon NGS. The average AAF of variants detected in autosomal chromosomes and in X-linked disease genes in maternal samples was 14.6 ± 8.0% (Additional file 3:

Discussion
Each cell division brings with it a risk of a new mutation. Mutations that occur after fertilization lead to the formation of distinct cell lineages or a state of genetic mosaicism. Depending on the functional consequence of the mutation, the timing of its acquisition, and its tissue distribution, the effect of a mosaic variant on patient phenotype can range from negligible to catastrophic. Although mosaic variation has been known to cause disease for decades, high-throughput sequencing technologies with the analytical sensitivity to consistently detect variants at reduced allelic fractions have only recently emerged as routine clinical diagnostic tests. Therefore, empirical studies of the frequency of mosaicism in large patient populations are only now being performed and published. The incidence of mosaic CNVs and aneuploidy found in patients referred for microarray testing has been estimated at 0.55-1% [18,19]. Without additional verification studies, it is challenging in routine ES analyses to distinguish real somatic variants from apparently de novo heterozygous variants with highly skewed (lower than 0.36) AAF. Therefore, we have focused here only on clinically relevant SNVs. The rate of clinically relevant mosaic variant detection in large cohorts of individuals referred for ES with heterogeneous clinical presentations has not been systematically assessed and requires further investigation [13]. We endeavored to study the frequency, type, allelic fraction, and phenotypic consequences of reportable mosaic SNVs in a cohort of nearly 12,000 consecutive unrelated patients referred for clinical ES. A total of 120 mosaic variants in 107 established disease genes were detected and reported in either proband (n = 80) or parental (n = 39)/grandparental (n = 1) samples. Mosaic variation was considered definitely or possibly contributory to disease in approximately 1% of 11,992 subjects in this study.
Assuming a molecular diagnosis was ascertained in 25% of patients in this cohort [14], an estimated 1.5% of all molecular diagnoses could be attributed to a mosaic variant detected in the proband samples. The fact that these estimates are low relative to other published cohorts was anticipated, as existing reports have studied mosaicism in specific genes [9,20] or phenotypes [10,11,21], and/or have assessed the frequency of rare mosaic variants [11] but not specifically clinically reportable variants.
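The arithmetic behind these cohort-level percentages can be sketched as follows. The counts are taken from the text; using the 47 P/LP proband variants as the numerator for the share of molecular diagnoses is an assumption made here for illustration, since the exact numerator is not stated, and it gives a value of about 1.6%, close to the reported estimate of roughly 1.5%.

```python
# Counts reported in the text.
cohort_size = 11_992
total_mosaic_variants = 120
parental_mosaic_variants = 40

# ASSUMPTION: P/LP mosaic variants in probands used as the numerator.
proband_plp_mosaic_variants = 47
# Diagnostic rate assumed in the text (25%).
assumed_diagnostic_rate = 0.25

molecular_diagnoses = cohort_size * assumed_diagnostic_rate
share_any_mosaic = total_mosaic_variants / cohort_size
share_parental = parental_mosaic_variants / cohort_size
share_of_diagnoses = proband_plp_mosaic_variants / molecular_diagnoses

print(f"Estimated molecular diagnoses: {molecular_diagnoses:.0f}")
print(f"Mosaic variants per subject:   {share_any_mosaic:.1%}")
print(f"Parental mosaicism per family: {share_parental:.1%}")
print(f"Share of molecular diagnoses:  {share_of_diagnoses:.1%}")
```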
To assess the phenotypic effects of mosaicism in our cohort, we analyzed the provided clinical information and compared the phenotype of each patient to descriptions in the literature and/or in Online Mendelian Inheritance in Man (OMIM) of individuals with predominantly non-mosaic mutations. In the vast majority of probands with mosaic P/LP variants in AD/X-linked/somatic genes and no confounding factors (e.g., presence of multiple mosaic variants, underlying structural variation), the clinical presentation was not appreciably diminished in severity. In contrast, among parents with mosaic variants, only two (82M-Mo, 120F-Fa) were reported to have a phenotype that could be attributed to the identified mosaic mutation. Excluding mosaic variants detected in X-linked genes in males, a comparison of the AAF of mosaic variants in parental samples (14.6% ± 8.0%) relative to proband samples (20.0% ± 9.8%) showed that unaffected parents with mosaic variants have a significantly lower AAF (p = 0.004, t-test). It is intriguing that mosaic variants whose mean AAFs differ by only ~5% can result in either mild or absent phenotypes or clinically significant manifestations. One explanation would be that the impact of any given postzygotic variant is likely to be dependent on the biological function of the gene and the distribution of the mutation in critical tissues. This notion is supported by the mosaic variants found in MTOR, PIK3CA, and CACNA1A in our study. Mosaic variants in MTOR and PIK3CA with AAFs ranging from 12.7 to 24.4% were detected in affected probands with Smith-Kingsmore syndrome [MIM: 616638], Cowden syndrome 5 [MIM: 615108], and/or megalencephaly-capillary malformation-polymicrogyria syndrome [MIM: 602501]. Conversely, mosaic variants in CACNA1A with similar AAFs ranging from 15.7 to 29.5% were all detected in asymptomatic parents.
The contrasting severity of phenotypes seen in probands versus clinically unaffected parents highlights the challenge of predicting phenotypic outcomes based on genetic testing alone. It also raises the question of how variant mosaicism should be weighed in the course of variant classification given that both pathogenic and benign effects are possible depending on the clinical context in which the variant is detected.
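The comparison of parental and proband AAFs discussed above can be approximated from summary statistics alone. The sketch below uses Welch's unequal-variance t statistic, which is a choice made here because the text says only "t-test". The proband group size (n = 65) comes from the Results; the parental group size of 35 is a hypothetical placeholder, as the number of parental samples remaining after excluding X-linked variants in males is not given in this section.

```python
from math import sqrt


def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom
    computed from group means, standard deviations, and sizes."""
    v1 = sd1**2 / n1
    v2 = sd2**2 / n2
    t = (mean1 - mean2) / sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return t, df


# Proband AAF: 20.0 +/- 9.8 (n = 65, from the text).
# Parental AAF: 14.6 +/- 8.0; n = 35 is HYPOTHETICAL, not a reported value.
t_stat, dof = welch_t(20.0, 9.8, 65, 14.6, 8.0, 35)
print(f"t = {t_stat:.2f}, df = {dof:.1f}")
```

The statistic depends on the assumed parental group size, so this should be read as an order-of-magnitude check rather than a reproduction of the reported p-value.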
Interestingly, recurrent mosaic variants in a subset of nine genes (MTOR, CREBBP, CACNA1A, DDX3X, DNM1, DYRK1A, GRIA3, KMT2D, and PIK3CA) accounted for 18.3% (22/120) of all detected mosaic variants in the analyzed cohort. Mosaic variants in several of these genes have been reported previously in the literature: MTOR [11], CREBBP [22], CACNA1A [23], DNM1 [24], KMT2D [25], and PIK3CA [26]. In some cases, e.g., the MTOR and PIK3CA genes, somatic variants are the predominant or the only form of disease-causing mutation described in affected individuals. We have also noted that 10 (12.5%) of the 80 de novo mosaic variants detected in the proband samples were found in a gene associated with the Ras or PI3K-AKT-mTOR pathway, including one variant each in BRAF, NF1, HRAS, and KRAS, and three variants each in PIK3CA and MTOR. Heterozygous variants in the same six genes were reported in less than 1% of the entire cohort, indicating that mosaic variation is disproportionately likely to affect this pathway. In fact, mosaic events in this pathway have been commonly observed [27]. The reason for enrichment of mosaicism in the Ras or PI3K-AKT-mTOR signaling pathway is unclear; possible explanations include (1) preferential expansion of hematologic clones with variants in these genes increasing the likelihood of mosaic variant detection, (2) high penetrance of mosaic variants in Ras pathway genes relative to other genes, and (3) a preponderance of intragenic mutation-prone residues.
The recognition that certain genes are more prone to pathogenic postzygotic mutation critically informs recurrence risk counseling and enables optimization of test development and data interpretation in the diagnostic lab setting. Panel-based tests targeting genes with recurrent mosaic variants should have sufficient depth of coverage and, to account for the risk of parental mosaicism, should include recommendations for parental testing. AAF filters are often utilized for comprehensive genomic assays such as exome and whole genome sequencing to exclude variants that are likely to represent sequencing artifact, a practice that can preclude detection of low-level mosaicism. Even with an average ES read depth of 130×, mosaic variants with AAF of less than 10% may be filtered out and excluded from review. For these methodologies, relaxing AAF filters for a defined subset of phenotypically relevant genes in which recurrent mosaic events are known to occur may help to optimize mosaic variant detection. Additionally, testing of tissues distant from the hematopoietic lineage (e.g., urine or hair follicles) could be performed to confirm mosaic status [7].
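To make the trade-off between read depth and AAF filtering concrete, the sketch below models alternate read counts as binomial draws and estimates the chance that a mosaic variant reaches a minimum number of supporting reads, alongside a gene-aware AAF filter. The thresholds (a 10% default cutoff, a 5% relaxed cutoff, and 13 supporting reads), the function names, and the choice of relaxed genes are all hypothetical illustrations, not the laboratory's actual settings.

```python
from math import comb


def detection_probability(depth: int, aaf: float, min_alt_reads: int) -> float:
    """P(at least min_alt_reads alternate reads), alt reads ~ Binomial(depth, aaf)."""
    p_below = sum(
        comb(depth, k) * aaf**k * (1 - aaf) ** (depth - k)
        for k in range(min_alt_reads)
    )
    return 1.0 - p_below


def passes_aaf_filter(gene, aaf, default_min_aaf=0.10,
                      relaxed_genes=None, relaxed_min_aaf=0.05):
    """Apply a relaxed AAF cutoff to a defined gene subset (hypothetical thresholds)."""
    relaxed = relaxed_genes or set()
    cutoff = relaxed_min_aaf if gene in relaxed else default_min_aaf
    return aaf >= cutoff


# At 130x depth, compare true AAFs of 10% and 5% against a 13-read requirement.
for true_aaf in (0.10, 0.05):
    prob = detection_probability(130, true_aaf, 13)
    print(f"AAF {true_aaf:.0%}: P(>= 13 alt reads at 130x) = {prob:.2f}")

# A variant at 7% AAF is retained only in a gene on the relaxed list.
print(passes_aaf_filter("PIK3CA", 0.07, relaxed_genes={"PIK3CA", "MTOR"}))
print(passes_aaf_filter("COX15", 0.07, relaxed_genes={"PIK3CA", "MTOR"}))
```

The binomial model ignores sequencing error and mapping bias, so real-world sensitivity at low AAF is likely lower than these figures suggest.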
Adding to the complexity of mosaic variant interpretation, several patients in our cohort were found to harbor more than one mosaic variant. One patient (12U) with multiple congenital malformations was found to have compound heterozygous variants in RAD51C, a gene associated with Fanconi anemia (FA) [28], a mosaic variant of uncertain significance in ENG, and seven additional mosaic variants in genes with no definitive disease association. Genomic instability resulting from spontaneous chromosome breakage is a hallmark of FA [29] and previous studies have shown an increased risk of mosaic copy-number and structural variants in affected individuals [30]. However, the impact of underlying FA on acquisition of somatic single nucleotide and small insertion/deletion variants has not been clearly elucidated. Therefore, although a causal link is likely, the mosaic variants detected in this patient cannot be unequivocally attributed to the FA diagnosis. Multiple mosaic variants (n = 17) were also detected in patient 3M referred for ES with a history of malignant astrocytoma, myelodysplasia, and dysmorphic features. The mosaic mutations detected in this individual were likely related to the patient's recent history of myelodysplastic syndrome. Although the phenomenon of mutation acquisition in pre-cancerous and cancerous states is not novel [31], multiple mosaic events stemming from malignancy can be an unexpected finding on assays like ES that are generally performed for the detection of germline, rather than somatic mutations. These findings are also challenging from the standpoint of clinical follow-up, as guidelines do not exist to direct management of incidentally ascertained cancer variants in individuals without a known malignancy.
Finally, we have noted that apparent SNV mosaicism can also be explained by chromosomal abnormalities. Patient 52F with developmental delay and microcephaly was found to have a pathogenic variant in the COX15 gene detected at an AAF of 12%. Analysis of the parental samples for the pathogenic change indicated that the father was heterozygous and the mother was negative for the variant. Because the AAF of the purportedly inherited COX15 variant was unexpectedly low in the proband, the SNP array data were reviewed, revealing mosaic maternal uniparental disomy of distal chromosome 10q encompassing the COX15 gene. In a second case, patient 55F with macrocephaly, dysmorphic features, and digital anomalies was found to have a mosaic pathogenic variant in ZMPSTE24 at an AAF of 80%. The pathogenic variant was found to be heterozygous in the mother and negative in the father. Analysis of the SNP array data again revealed mosaic copy-neutral absence of heterozygosity (AOH) suspicious for UPD involving chromosome 1 and encompassing the ZMPSTE24 gene, which presumably served as the "second hit" for the autosomal recessive disorder.
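The relationship between the observed AAF and the proportion of cells carrying UPD in these two cases can be sketched with a simple two-population model. This is an illustrative assumption, not a calculation reported by the authors: the sample is treated as a mixture of cells heterozygous for the inherited variant (AAF 50%) and cells with copy-neutral UPD that either lost the variant allele (AAF 0%) or became homozygous for it (AAF 100%), with no allelic bias in sequencing.

```python
def upd_cell_fraction(observed_aaf: float, variant_retained: bool) -> float:
    """Fraction f of cells with copy-neutral UPD under a two-population model.

    Variant lost in UPD cells:      AAF = 0.5 * (1 - f)      ->  f = 1 - 2 * AAF
    Variant retained (homozygous):  AAF = 0.5 * (1 - f) + f  ->  f = 2 * AAF - 1
    """
    if not 0.0 <= observed_aaf <= 1.0:
        raise ValueError("observed_aaf must be between 0 and 1")
    if variant_retained:
        fraction = 2 * observed_aaf - 1
    else:
        fraction = 1 - 2 * observed_aaf
    if fraction < 0:
        raise ValueError("observed AAF is inconsistent with this model")
    return fraction


# Patient 52F: paternally inherited COX15 variant at 12% AAF, maternal UPD
# removes the variant allele, giving a UPD cell fraction of about 0.76.
frac_52f = upd_cell_fraction(0.12, variant_retained=False)

# Patient 55F: maternally inherited ZMPSTE24 variant at 80% AAF, UPD cells
# are homozygous for the variant, giving a UPD cell fraction of about 0.60.
frac_55f = upd_cell_fraction(0.80, variant_retained=True)

print(f"52F: {frac_52f:.2f}, 55F: {frac_55f:.2f}")
```

Under this model, a heterozygous inherited variant with an AAF far from 50% is a prompt to review SNP array data for mosaic AOH before concluding that the SNV itself arose postzygotically.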
The many variables that complicate mosaic variant interpretation can also be leveraged in research studies to make inferences about variant pathogenicity and to provide insights into gene function. For example, from the observation that activating mutations in GNAS (associated with McCune-Albright syndrome, OMIM 174800) are detected only in the mosaic state, one can infer that constitutional activating mutations in this gene are incompatible with life [8,32]. It is plausible that studies of affected individuals, including analyses of AAF by tissue type, would help to define key aspects of gene function, including after what critical developmental period the mutation must occur to ensure viability. For example, conditional PIK3CA activation in mouse cortex showed that abnormal mTOR activation in excitatory neurons and glia, but not interneurons, is sufficient for abnormal cortical overgrowth [33].
Although our cohort comprises nearly 12,000 families and we have detected and reported 120 mosaic mutations, only a minority of individuals were found to have mosaic variants in the same gene, which limits our