Reanalysis of exome negative patients with rare disease: a pragmatic workflow for diagnostic applications

Approximately two thirds of patients with a rare genetic disease remain undiagnosed after exome sequencing (ES). As part of our post-test counseling procedures, patients without a conclusive diagnosis are advised to recontact their referring clinician to discuss new diagnostic opportunities in due time. We performed a systematic study of genetically undiagnosed patients 5 years after their initial negative ES report to determine the efficiency of diverse reanalysis strategies. We revisited a cohort of 150 pediatric neurology patients originally enrolled at Radboud University Medical Center, of whom 103 initially remained genetically undiagnosed. We monitored uptake of physician-initiated routine clinical and/or genetic re-evaluation (ad hoc re-evaluation) and performed systematic reanalysis, including ES-based resequencing, of all genetically undiagnosed patients (systematic re-evaluation). Ad hoc re-evaluation was initiated for 45 of 103 patients and yielded 18 diagnoses (including 1 non-genetic). Subsequent systematic re-evaluation identified another 14 diagnoses, increasing the diagnostic yield in our cohort from 31% (47/150) to 53% (79/150). New genetic diagnoses were established by reclassification of previously identified variants (10%, 3/31), reanalysis with enhanced bioinformatic pipelines (19%, 6/31), improved coverage after resequencing (29%, 9/31), and new disease-gene associations (42%, 13/31). Crucially, our systematic study also showed that 11 of the 14 further conclusive genetic diagnoses were made in patients without a genetic diagnosis who had not recontacted their referring clinician. We find that upon re-evaluation of undiagnosed patients, both reanalysis of existing ES data and resequencing strategies are needed to identify additional genetic diagnoses. Importantly, not all patients are routinely re-evaluated in clinical care, which prolongs their diagnostic trajectory unless systematic reanalysis is facilitated.
We have translated our observations into considerations for systematic and ad hoc reanalysis in routine genetic care.

Background
Exome sequencing (ES) is a genetic diagnostic approach used to shorten the diagnostic odyssey of patients, in particular children with complex neurological disorders of presumed genetic origin [1,2]. Understanding the genetic defect may provide information on prognosis, improve patient management, guide therapeutic choices, and allow for more informed reproductive options for family members [3]. In addition, a diagnosis often facilitates access to supportive care systems [4][5][6][7].
As part of post-test counseling procedures, patients without a conclusive diagnosis are advised to recontact their referring clinician to discuss new diagnostic opportunities in due time. A number of studies in pediatric neurology have shown a diagnostic yield of ES of ~30% [1,2,8], but this is likely to increase with time through new developments. Improved enrichment technologies, optimized sequencing chemistry, and more sophisticated analytical tools continuously allow the discovery of previously unrecognized clinically relevant variants [9]. In addition, systematic re-evaluation of existing ES datasets allows reinterpretation and novel diagnoses because of new disease-gene associations that were unknown at the time of initial analysis. Estimates suggest that the diagnostic yield of ES could be increased by ~15% when up-to-date software, literature, and phenotypic information are used for reinterpretation [3][10][11][12][13][14][15]. However, previous systematic studies have limited themselves to reanalysis of the data existing at the time of their research, disregarding the effects of reanalysis in a patient-initiated diagnostic workflow. Moreover, although ongoing technological developments in next-generation sequencing (NGS) may provide additional diagnoses, models of clinical reanalysis are limited to the reuse of existing data and rarely include resequencing [13,14,16], despite recommendations of the American College of Medical Genetics (ACMG) [17].
To emphasize the significance of ES reanalysis, we revisited a cohort of 150 children with a complex neurological disorder, originally enrolled at Radboud University Medical Center for a clinical utility study on the performance of ES. This cohort was shown to be representative of the broad phenotypic spectrum of disorders seen in the routine pediatric neurology diagnostic trajectory [1]. To gain insight into the relative contribution of ES reanalysis strategies, we monitored the increase in diagnostic yield over a 5-year period resulting from routine care, based on patients who did recontact their referring clinician. Subsequently, we performed a systematic reanalysis, including resequencing with an advanced clinical ES pipeline (i.e., including low-quality variants, copy number variant (CNV) analysis, and up-to-date disease-gene panels), of all patients in this cohort without a genetic diagnosis, including those who did not initiate reanalysis. This allowed us to translate the findings into a generalizable and effective strategy for clinical reanalysis.

Clinical cohort
In this study, we revisited the 103 genetically undiagnosed patients from an original cohort of 150 consecutive patients with complex neurological symptoms of suspected genetic origin who were seen at the department of pediatric neurology at the Radboud University Medical Center [1]. Patients (and their unaffected parents) were included between November 2011 and January 2015 [1]. The original study was approved by the Medical Ethics Review Committee Arnhem-Nijmegen under 2011/188 and the systematic evaluation of diagnostic follow-up and innovation under 2020-7142.

Systematic ES reanalysis
A three-step process was used to examine the increase of the diagnostic yield within the original pediatric neurology cohort of 150 patients [1]. In brief, we first retrospectively collected all genetic diagnostic tests that were performed after the original data freeze of July 2015. Physician-initiated analyses (ad hoc analyses) consisted of reanalysis of existing exome data or analysis of newly generated exome data (i.e., resequencing). Updated pipelines for variant calling allowed CNV analysis as well as the analysis of variants that previously failed quality control (QC) parameters, e.g., total number of reads or percentage of variant reads. Bioinformatic methods were as described before; in short, sequence reads were aligned to the hg19 reference genome using BWA, CNVs were called by CoNIFER [18], and SNVs/indels were called by the GATK UnifiedGenotyper [19]. Reannotation incorporated gene-panel updates, the Variant Effect Predictor (VEP) [20] for prioritization, and knowledge bases such as gnomAD [21], in-house variant frequencies, OMIM phenotypes, and new literature searches. Reinterpretation of previous class 3 variants/variants of uncertain clinical significance (VUS) included literature studies, segregation analysis, and/or functional follow-up.
Keywords: NGS-based resequencing, Systematic reanalysis, Rare disease, Diagnostic implementation of recommended ACMG guideline, Longitudinal follow-up of systematic cohort
Second, for all patients who did not have genetic diagnostic follow-up since 2015, we performed systematic reanalysis of the original ES data using the updated pipelines and knowledge databases, provided that the original data were compatible with current diagnostic bioinformatic pipelines [18,19].
Third, for patients who remained undiagnosed after reanalysis under steps one and two, we performed resequencing following diagnostic procedures [1], using Twist Bioscience Human Core Exome+ RefSeq Panel Enrichment Kit (TWIST Bioscience, San Francisco, CA, USA) and the Illumina NovaSeq 6000 at 100x coverage (Illumina, San Diego, CA, USA).
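The three steps above can be read as a simple per-patient triage rule. The following is a minimal sketch under stated assumptions, not the laboratory's actual implementation: the function name and its boolean flags are hypothetical bookkeeping fields introduced only for illustration, and the rule that incompatible data lead directly to resequencing is our reading of the workflow.

```python
from typing import Optional


def next_systematic_step(
    diagnosed: bool,
    had_ad_hoc_follow_up: bool,
    systematically_reanalysed: bool,
    data_compatible: bool,
) -> Optional[str]:
    """Return the next systematic action for one patient, or None if solved.

    All argument names are hypothetical flags, not fields from the study data.
    """
    if diagnosed:
        return None  # conclusive diagnosis reached, nothing further to do
    # Step 2 applies only to patients without genetic follow-up since 2015
    # whose original ES data still run through the current pipeline.
    if not had_ad_hoc_follow_up and not systematically_reanalysed and data_compatible:
        return "step 2: reanalyse original ES data"
    # Patients still undiagnosed after step 1 or 2, or with incompatible
    # original data (assumption), move on to resequencing.
    return "step 3: resequence and reanalyse"


if __name__ == "__main__":
    # A patient who never returned to the clinic and has usable data.
    print(next_systematic_step(False, False, False, True))
```

Keeping the rule as a pure function of a few flags makes it easy to audit which patients fall into which step before any data are reprocessed.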

Variant classification
Variant interpretation of ES (re)analyses was performed as follows. First, a disease-gene panel strategy was applied by in silico enrichment of single-nucleotide and copy-number variants (SNVs and CNVs) in genes with established disease associations related to the patient's phenotype [1]. Subsequently, (de novo) variants (SNVs, indels, and CNVs) outside these gene panels were evaluated for pathogenicity as well as disease relevance (i.e., open exome strategy). Prioritization of the variants was based on conservation and predicted impact using the VEP [20] and gnomAD [21]. For classification of SNVs and indels, we used a classification based on the guidelines jointly established by the Association for Clinical Genetic Science (ACGS) and the Dutch Society of Clinical Genetic Laboratory Specialists (VKGL) [22]: (1) benign/likely benign, (2) variant of uncertain significance (VUS), or (3) likely pathogenic/pathogenic. CNVs were classified according to the European guidelines for constitutional cytogenomic analysis (class 1 to class 5) [23].
All remaining VUS were reassessed once more in February 2021. Finally, the outcomes of the ES analyses were described according to the categories in the original study: (1) no diagnosis (e.g., absence of variants explaining the disease phenotype), (2) a possible diagnosis (VUS in a known disease gene related to the patient's phenotype, or a (likely) pathogenic variant in a candidate disease gene), or (3) a conclusive diagnosis ((likely) pathogenic variant explaining the patient's phenotype) [1].
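The three outcome categories can be expressed as a small decision function. This is an illustrative sketch only: the string labels, the function name, and the assumption that a conclusive diagnosis requires a known disease gene are ours, and real classification rests on expert review rather than on three flags.

```python
from enum import Enum


class Outcome(Enum):
    NO_DIAGNOSIS = 1
    POSSIBLE_DIAGNOSIS = 2
    CONCLUSIVE_DIAGNOSIS = 3


def categorize(variant_class: str, gene_status: str, explains_phenotype: bool) -> Outcome:
    """Map one prioritized variant to an outcome category.

    variant_class: "benign", "vus", or "likely_pathogenic_or_pathogenic" (hypothetical labels)
    gene_status:   "known" or "candidate" disease gene (hypothetical labels)
    """
    pathogenic = variant_class == "likely_pathogenic_or_pathogenic"
    is_vus = variant_class == "vus"
    # (Likely) pathogenic variant in a known gene that explains the phenotype.
    if pathogenic and gene_status == "known" and explains_phenotype:
        return Outcome.CONCLUSIVE_DIAGNOSIS
    # (Likely) pathogenic variant in a candidate disease gene.
    if pathogenic and gene_status == "candidate":
        return Outcome.POSSIBLE_DIAGNOSIS
    # VUS in a known disease gene related to the patient's phenotype.
    if is_vus and gene_status == "known" and explains_phenotype:
        return Outcome.POSSIBLE_DIAGNOSIS
    return Outcome.NO_DIAGNOSIS


if __name__ == "__main__":
    print(categorize("vus", "known", True).name)
```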

Results
Previously, we identified genetic diagnoses in 47 of 150 patients with disorders of presumed genetic origin, by studying the clinical utility of exome sequencing [1]. Of the 103 patients without a genetic diagnosis in 2015 (Fig. 1A), 45 revisited our clinics and received additional ad hoc diagnostic testing as well as clinical re-evaluation by a pediatric neurologist. For one of these patients, MRI determined that the origin of disease was an acquired cerebral palsy rather than a disorder of genetic origin. The other 44 patients had reanalysis of existing exome data (n = 29) or resequencing followed by reanalysis (n = 15, Additional file 1: Table S1). This yielded 17 new conclusive genetic diagnoses (Table 1, step 1; Fig. 1B).
Systematic follow-up of all patients without a conclusive diagnosis after ad hoc genetic diagnostic testing (n = 27), and of those for whom ad hoc testing was not performed (n = 58), revealed another 14 new conclusive genetic diagnoses; 9 resulted from bioinformatic improvements, and 5 required resequencing (Table 1, Additional file 1: Table S1). In total, 36% (37/103) of the patients in this study required resequencing for reanalysis.
All analyses together elevated the total diagnostic yield in the cohort of 150 patients from 31% (n = 47) to 53% (n = 79; Fig. 1). Of the 31 novel conclusive genetic diagnoses, 12 were based on variants previously reported as possibly pathogenic [1], and 19 were based on variants that were not identified in the initial analysis (Table 1, Fig. 2).
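The yield figures above follow from simple bookkeeping, sketched here as a check. The counts are taken from the text; the variable names are ours, and the sketch assumes that the reported total of 79 includes the one non-genetic (acquired) diagnosis found during ad hoc re-evaluation.

```python
COHORT_SIZE = 150
INITIAL_GENETIC = 47      # diagnoses from the original ES study
AD_HOC_GENETIC = 17       # new genetic diagnoses after ad hoc re-evaluation
AD_HOC_NON_GENETIC = 1    # acquired cerebral palsy identified by MRI
SYSTEMATIC_GENETIC = 14   # new genetic diagnoses after systematic follow-up


def percent(part: int, whole: int) -> int:
    """Percentage rounded to the nearest integer."""
    return round(100 * part / whole)


new_genetic = AD_HOC_GENETIC + SYSTEMATIC_GENETIC
total_diagnosed = INITIAL_GENETIC + new_genetic + AD_HOC_NON_GENETIC

initial_yield = percent(INITIAL_GENETIC, COHORT_SIZE)
final_yield = percent(total_diagnosed, COHORT_SIZE)

if __name__ == "__main__":
    print(f"new genetic diagnoses: {new_genetic}")
    print(f"yield: {initial_yield}% -> {final_yield}% ({total_diagnosed}/{COHORT_SIZE})")
```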

From VUS to conclusive diagnosis
For 10/12 previous possible diagnoses (Table 1, Fig. 2C), publications establishing the disease-gene association appeared after our initial report of a possible diagnosis in 2015. Examples include CSNK2A1, which was reported to be causative for Okur-Chung neurodevelopmental syndrome (OCNDS; OMIM #617062) in 2016 [24], and ADPRHL2, which was associated with stress-induced childhood-onset neurodegeneration with variable ataxia and seizures (CONDSIAS; OMIM #618170) in 2018 [25]. Likewise, for one variant, a broadening of the phenotypes related to variants in TBC1D24 was reported in 2019, now including rolandic epilepsy with paroxysmal exercise-induced dystonia and writer's cramp (EPRPDC; OMIM #608105) [26]. On average, the time between the initial report of the VUS and publication of the novel disease-gene association was 3.3 years, leading to an average time to final diagnosis of 4.4 years and an average time from publication to diagnosis of 1.2 years (Additional file 2: Fig. S1). The remaining 2/12 new conclusive diagnoses were variants reclassified based on additional testing, either segregation analysis (ACSL4) or metabolic investigation (HSD17B10) (Fig. 2B).

Conclusive diagnoses based on variants not identified in the initial analysis
For 19 patients, the conclusive genetic diagnosis was based on variants that were not detected or prioritized in the initial ES analysis (Table 1). Nine of those variants were only detected after resequencing combined with updated bioinformatic analyses and interpretation. Evaluation of the underlying explanation showed that there was no variant call in the original analysis; for 2 variants, the genomic locus was not targeted for enrichment, whereas in the other 7 cases, enrichment failed, resulting in no coverage of the targeted sequence (Fig. 2D, Additional file 1: Table S1).
For 10 of the 19 new diagnoses, the variants were already present in the original ES data, and updates in the diagnostic pipelines allowed their detection. In 4 of these, the underlying reason was a failure to pass quality criteria, such as poor coverage (n = 1), low quality of sequence reads (n = 1), and too low a percentage of variant reads (n = 2). In a further 2 patients, enhanced CNV calling (n = 1) and annotation of deep(er) splice-site variants (n = 1) allowed for recognition of pathogenic variants that had escaped attention in the initial ES analysis (Fig. 2A). At the level of variant annotation used for prioritization, updates of in silico disease-gene panels acknowledged new disease-gene associations (n = 1) and broader phenotypic spectra of existing disease-gene associations (n = 2) (Fig. 2C). Finally, in the last case, an affected sib in the family allowed for overlap analysis with the exome of the index, identifying a variant that was not prioritized by analyzing the index alone (Fig. 2B).

Relative contribution of changes in diagnostic analysis to increase diagnostic yield
For the 31 new genetic diagnoses, we next retrospectively categorized the reasons for reaching a conclusive diagnosis. Overall, (new) disease-gene associations accounted for 42% (13/31), follow-up of variants by segregation or functional analysis accounted for 10% (3/31), reanalysis of ES data with improved diagnostic pipelines was responsible for 19% (6/31) of the additional diagnostic yield, and resequencing was essential for the last 29% (9/31) (Fig. 2).
[Figure caption, fragment] Step 2 (C) involved the reanalysis of available exome data when data was suitable*, and step 3 (D) included the systematic resequencing and reanalysis for the remaining unsolved cases. Of note, two diagnoses were made by reclassification of variants of unknown significance (VUS) detected in the ad hoc analysis#, and two VUS were rejected based on population frequency^
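The category shares reported for the 31 new genetic diagnoses can be reproduced with a short tally. The counts come from the text; the dictionary keys and variable names are our own shorthand, and this is a sketch for checking the arithmetic rather than part of the study's analysis.

```python
# Number of new genetic diagnoses per reason for reaching a conclusive diagnosis.
contributions = {
    "disease-gene associations": 13,
    "segregation or functional follow-up": 3,
    "improved diagnostic pipelines": 6,
    "resequencing": 9,
}

total_new = sum(contributions.values())

# Share of each category, rounded to whole percentages.
shares = {
    reason: round(100 * count / total_new)
    for reason, count in contributions.items()
}

if __name__ == "__main__":
    for reason, share in shares.items():
        print(f"{reason}: {contributions[reason]}/{total_new} ({share}%)")
```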

Discussion
Exome sequencing has been used in routine genetic testing to diagnose children with complex neurological disorders of presumed genetic origin [1,2], with a diagnostic yield of around 30% [1,27]. In this study of the contribution of ES reanalysis, the diagnostic yield in our cohort increased from 31% to 53%. This increase of more than 20 percentage points exceeds previous research reporting additional diagnostic yields of around 10-13%, with reanalysis intervals ranging between < 6 months and 3 years [10][11][12][13]. In part, this can be explained by the relatively long period of 5 years between the first and last analysis. More importantly, however, this study did not depend on ad hoc diagnostic requests alone (17 diagnoses) but also included a systematic follow-up of the remainder of the cohort (14 diagnoses; Fig. 1). In addition, our systematic reanalysis included not only the reinterpretation of existing data, responsible for an increase of 15% (22 of 31 new genetic diagnoses), but also the generation of novel data according to the latest standard of sequencing, adding another 6% (9 diagnoses). Together, our data underscore that systematic reanalysis, in addition to ad hoc re-evaluation, can shorten the diagnostic trajectory.

Re-evaluation and follow-up of variants of unknown significance should be standard care
In total, of the 31 novel conclusive genetic diagnoses, 14 were based on variants previously reported as possibly pathogenic, thereby contributing most to the increase in diagnostic yield. These observations are in line with the increases observed in novel genotype-phenotype associations in OMIM, the expansion of phenotypes for genes with known genotype-phenotype associations, and the number of pathogenic variants in the disease variant databases and Decipher [28][29][30]. Examples of reclassification of VUS in our study include a novel and distinctive phenotype for the SRCAP gene [31], previously associated with Floating-Harbor syndrome only, a VUS in PRPS1 recently also reported by others [32], and small genotype-phenotype case series for LMBRD2 [33] and PAX3 [34] (Additional file 2: Fig. S1). Moreover, a future increase in diagnostic yield is to be expected, as matchmaking exchange programs [35] have made it easier to establish (inter)national collaborations to generate and strengthen the disease-gene association for genes in which VUS are reported. This also underscores the need for periodic re-evaluation of VUS, as novel genotype-phenotype associations can be published at virtually any time. We experienced this in our own study: for some patients, multiple reanalyses of the same VUS were needed before the VUS was "upgraded" to a (likely) pathogenic variant. For instance, a variant in KAT8 was found to be of unknown significance twice before its likely pathogenicity was reported [36]. Detailed analysis of all novel diagnoses based on a previous VUS in our study indicates a re-evaluation period of about 1 year for a 10-20% increase of diagnostic yield, as has been suggested by others [11,12], and could lead to an "upgrade" of up to 30% of the VUS after 2 years [28]. Importantly, re-evaluation and follow-up lead not only to "upgrading" of VUS but also to "downgrading" in some instances. In our dataset, two X-linked VUS were rejected as a possible cause after re-evaluation of the variants based on the frequency of the variant in the population.
Fig. 2 Relative contribution of changes in diagnostic analysis to increase diagnostic yield. Distribution of different reasons for finding new diagnoses in a pediatric neurology cohort. A Reanalysis after an update of the diagnostic pipeline was responsible for the detection of previously unrecognized copy number and (deep) intronic single nucleotide variants (CNV and SNV) and variants that failed quality criteria. For instance, including interpretation of deeper intronic variants with a possible splice effect identified a variant in FOXP1, which after follow-up analysis was reclassified to likely pathogenic. Both (B) reclassification of variants based on supporting evidence from segregation analysis or metabolic investigation and (C) reanalysis after publication of new or broadened disease-gene associations allowed for the conclusive diagnoses of variants that were previously reported as possibly pathogenic, either in this study or in the initial WES analysis. D Resequencing and subsequent reanalysis identified variants that were either not targeted or not covered in the initial analysis. For instance, resequencing identified a likely pathogenic variant in NUS1 for which the position was poorly covered in the original WES data because there was no target in the original exome capture

Diagnostic reanalysis, including resequencing, is successful and should be supported
Reanalysis focuses on existing data. However, resequencing was responsible for 29% (9/31) of the novel conclusive genetic diagnoses in our study. For a few, this was because the original ES data were no longer compatible with the current bioinformatic pipelines. For the others, however, technological advances in both enrichment strategies and sequencing chemistry led to higher-quality data, mostly through better and more uniform coverage. For instance, resequencing identified a likely pathogenic variant in NUS1 for one case. This gene was poorly covered in the original ES data, so the variant was missed by the variant-calling algorithms when those data were reanalyzed (Fig. 2D). Hence, assessing improvements in technology since the original investigation can guide the decision whether to resequence or to reanalyze existing data.

Reanalysis benefits from updates on phenotypic information
Another important factor to consider is updating clinical information before revisiting genetic data. As young patients may not (yet) display the full characteristic phenotype of a certain syndrome, reassessment of the patient's phenotype might reveal new features indicative of specific syndromes. Such evaluations may be crucial for genetic reanalysis. Of note, such clinical reassessment may also uncover that the phenotype is not genetic in origin but acquired, as observed for one of our patients. Secondly, assessment of the parental phenotypes is also important: if full penetrance is assumed and the parents are apparently unaffected, inherited variants in the index may be disregarded during interpretation [37]. It has similarly been found that incomplete penetrance or variable expressivity complicates the discovery of novel genes underlying developmental disorders [9]. This apparent nonpenetrance in clinically unaffected fathers may partly reflect underascertainment of paternal phenotypes [38]. We observed an example of this in the identification of a duplication of 11q in a father assumed to be unaffected, who was later found to have macrocephaly as the only feature and who passed on the variant in a dominant manner.
Fig. 3 Considerations for resequencing and/or reanalysis in clinical exome sequencing. This figure depicts the considerations for each form of reanalysis, as for each individual case, it must be decided which is most suitable. Reanalysis can be initiated ad hoc or systematically based on selected time intervals or bioinformatic enhancements. Reassessment of variants of unknown significance (VUS) as well as follow-up should be performed first, using up-to-date phenotypic information and literature or additional tests for reinterpretation. When there is no conclusive diagnosis, existing data needs to be suitable for the current analysis pipeline; if not, or if state-of-the-art approaches are available, resequencing should be offered

Towards a future sustainable clinical reanalysis strategy
We show that all reanalysis strategies contribute to obtaining novel diagnoses: ad hoc reanalysis upon the patient's or physician's request, but also systematic strategies. Thus, a combination of approaches is needed to uncover all genetic diagnoses: follow-up of VUS and reassessment of data, with or without resequencing of the samples (Fig. 3). Re-evaluation of previously reported VUS should always be performed first. Second, as diagnostic data should comply with the FAIR Guiding Principles [39] (Findable, Accessible, Interoperable, and Reusable), existing data could be reanalyzed. Such reanalysis can be initiated either by a (time-driven) periodic system, for instance every 1 or 2 years, or by bioinformatic enhancements, such as the implementation of analysis tools [40]. Third, we have learned that there were additional benefits from using state-of-the-art technology. It is reasonable to expect that other (future) diagnostic applications, such as (short- and long-read) genome sequencing [41][42][43], methylation profiling [44], or optical mapping [45], will also increase diagnostic yield. The feasibility of implementing such (automated) systems for reanalysis may, however, depend on available local infrastructure, bioinformatic support, and budget. It should also be noted that reanalysis can only take place with proper patient consent [46,47]. For the ad hoc analysis, the initiative lies with the patient (who thus consents). As not all patients return to the clinic for ad hoc analysis, we propose to request consent from patients, after the initial negative diagnostic analysis, to allow for systematic reanalysis, including the use of new technologies, to maximize benefits.
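The two proposed triggers for systematic reanalysis, a fixed time interval and a bioinformatic enhancement, could be combined with the consent requirement roughly as follows. This is a minimal sketch under stated assumptions: the function name, its parameters, and the default 2-year interval are hypothetical choices for illustration and do not describe an existing system.

```python
from datetime import date


def reanalysis_due(
    consented: bool,
    last_analysis: date,
    today: date,
    interval_years: int = 2,
    pipeline_updated: bool = False,
) -> bool:
    """Decide whether systematic reanalysis should be started for a patient.

    All parameters are hypothetical; interval_years reflects the suggested
    periodic re-evaluation of every 1 or 2 years.
    """
    if not consented:
        return False  # reanalysis requires proper patient consent
    if pipeline_updated:
        return True  # enhancement-driven trigger
    # Time-driven trigger: count full years elapsed since the last analysis.
    before_anniversary = (today.month, today.day) < (last_analysis.month, last_analysis.day)
    full_years = today.year - last_analysis.year - int(before_anniversary)
    return full_years >= interval_years


if __name__ == "__main__":
    print(reanalysis_due(True, date(2015, 7, 1), date(2017, 7, 1)))
```

Checking consent first keeps the rule conservative: without consent, neither trigger can start a reanalysis.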

Conclusions
We provide considerations for reanalysis of clinical exome data based on a 5-year follow-up of a pediatric neurology cohort of 150 patients. The diagnostic yield in this cohort increased from 31% to 53% through a combination of ad hoc clinical and genetic diagnostic work-up and subsequent systematic reanalysis. Each reanalysis strategy, consisting of follow-up of VUS, reinterpretation of existing data, clinical reassessment of patient and parental phenotypes, and resequencing, contributed to the additional diagnostic yield. Based on these experiences, we provide practical considerations to increase novel conclusive genetic diagnoses through reanalysis.