Study design
Participants were enrolled in two studies from 2011 to 2018: a rare disease study, Idiopathic Diseases of huMan (IDIOM), and a post-mortem genetic testing study in early sudden death, Molecular Autopsy (MA). The inclusion criteria, prospective recruitment strategy, phenotyping, and initial analysis approach for these studies are described in detail elsewhere [16, 17]. In brief, the IDIOM study aims to discover novel gene–disease relationships and to provide molecular genetic diagnoses and treatment guidance for individuals with novel diseases, using genome sequencing integrated with clinical assessment and multidisciplinary case review. The MA study seeks to incorporate prospective genetic testing into the post-mortem examination of cases of sudden unexplained death in the young (<45 years old). Under these protocols, we recruited 101 analyzable proband participants in total: 51 proband participants (including 4 singletons) were enrolled in the IDIOM study from 2011 to 2018, and 50 deceased individuals, together with their living relatives, were enrolled in the MA study from 2014 to 2018. The IDIOM study (IRB-11-5723) and the Scripps Molecular Autopsy study (IRB-14-6386) were both approved by the Scripps Institutional Review Board.
Whole-exome sequencing
Detailed procedures for whole-exome sequencing (WES) have been described previously [16, 17, 19, 20]. In brief, whole blood samples were preserved in PAXgene DNA tubes (PreAnalytiX, Hombrechtikon, CH), and genomic DNA was extracted using the QIAamp system (Qiagen, Valencia, CA). Exome libraries were enriched using a variety of Agilent SureSelect systems according to the manufacturer’s instructions (Agilent, Santa Clara, CA). Final libraries were generated using Illumina TruSeq sample preparation kits and underwent 100 bp paired-end sequencing on a HiSeq 2500 (Illumina, San Diego, CA). Samples were sequenced to a median coverage of 98X across the two studies combined.
Variant calling and annotation
The original downstream analysis procedure has been described in detail previously [16]. In brief, alignment and variant calling were performed following BWA-GATK best practices [21], which changed substantially over time, particularly over the duration of the IDIOM protocol. Annotation and variant prioritization were performed using the SG-ADVISER system.
For our re-analysis, each WES sample was processed using the Genoox platform, which uses the Burrows–Wheeler Aligner (BWA; version 0.7.16) [22] to map short-read sequences to the hg19 reference genome, and the Genome Analysis Toolkit (GATK; version 4.0.7.0) [23, 24] and FreeBayes (version 1.1.0) [25] to call low-frequency single-nucleotide variants (SNVs), multiple nucleotide variants (MNVs), and INDELs.
Variant filtration and prioritization
After annotation, an automated variant filtration pipeline was applied to narrow down the number of candidate diagnostic SNVs and INDELs using the following rules: (1) segregation-based filtration, retaining variants that follow disease segregation in the family, including families with multiple probands; (2) functional impact-based filtration, retaining only variants that are non-synonymous, frameshift, or nonsense, or that affect canonical splice donor/acceptor sites; and (3) frequency-based filtration, retaining variants with a minor-allele frequency (MAF) < 1% in population-level allele frequency data derived from the Exome Aggregation Consortium (ExAC), the 1000 Genomes Project (1000G), the Exome Sequencing Project (ESP; Exome Variant Server), the UK10K project (UK10K), the Genome Aggregation Database (gnomAD), and internal data from our studies.
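The three filtration rules above can be illustrated with a minimal sketch. It is not the pipeline used in the study: the `Variant` record, the consequence labels, and the decision to compare the highest allele frequency across all listed databases against the 1% cutoff are assumptions made for illustration, as is treating a variant absent from every database as having a frequency of zero.

```python
from dataclasses import dataclass, field

# Hypothetical consequence labels; real annotators use their own vocabularies.
RETAINED_CONSEQUENCES = {
    "non_synonymous",
    "frameshift",
    "nonsense",
    "splice_donor",
    "splice_acceptor",
}

MAF_THRESHOLD = 0.01  # rule 3: keep variants with MAF < 1%


@dataclass
class Variant:
    """Hypothetical annotated variant record."""
    variant_id: str
    consequence: str
    segregates_with_disease: bool  # rule 1, assumed to be precomputed per family
    # Database name -> allele frequency; an empty dict means "not observed".
    population_af: dict = field(default_factory=dict)


def max_population_af(variant):
    """Highest allele frequency across all databases (assumption: 0.0 if absent)."""
    return max(variant.population_af.values(), default=0.0)


def passes_filters(variant):
    """Apply the segregation, functional impact, and frequency rules together."""
    return (
        variant.segregates_with_disease
        and variant.consequence in RETAINED_CONSEQUENCES
        and max_population_af(variant) < MAF_THRESHOLD
    )


def filter_candidates(variants):
    """Return only the variants that satisfy all three rules."""
    return [v for v in variants if passes_filters(v)]
```

Using the maximum frequency across databases is the more conservative reading of rule (3); a pipeline could equally apply the threshold per database or per ancestry group.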
Automated variant classification engine
Further variant prioritization was then performed by combining annotation information into a summary interpretation of variant pathogenicity. For our initial studies, variant interpretation was carried out as previously described, in accordance with the criteria set by the American College of Medical Genetics and Genomics (ACMG)/Association for Molecular Pathology (AMP) guidelines [26, 27]. In addition, we incorporated the recommendation of the ClinGen Sequence Variant Interpretation (SVI) working group to exclude the two reputable-source criteria, PP5 and BP6, from ACMG-AMP variant classification because of their questionable validity [28]. For our re-analysis, we used Genoox (https://www.genoox.com), an artificial intelligence-based variant classification and interpretation engine that builds disease association and deleteriousness prediction models at the gene and variant level by integrating information from various gene and variant classification sources (e.g., ClinVar, ClinGen, UniProt, gnomAD, ExAC, Orphanet) [29]. Because the evidence on which a submission is based (e.g., in ClinVar, UniProt, and the literature) is currently unstructured and difficult to extract computationally, the classification engine applies PP5/BP6 to help prioritize and flag variants that have been previously reported or that may be clinically relevant. The strength of this evidence is estimated from features such as the number of submitters, submission dates, type of submitters, and number of publications. To comply with the new recommendations, the evidence reported under the PP5/BP6 rules is then manually reassigned to the relevant criteria in place of PP5/BP6. This step does not change the resulting classification; it changes only how the supporting evidence is presented.
Variants were classified into one of five categories: benign (B), likely benign (LB), variant of uncertain significance (VUS), likely pathogenic (LP), and pathogenic (P). VUS were then further subclassified using a combination of in silico prediction tools: (1) missense deleteriousness prediction tools (including REVEL, MetaLR, MutationTaster, MutationAssessor, FATHMM, SIFT, CADD, and PolyPhen-2) [30], (2) splicing defect prediction tools (dbscSNV Ada, SpliceAI), (3) conserved region annotation (GERP), and (4) whole-genome functional annotation (GenoCanyon, fitCons, ncER [31]). The VUS subclassifications were (1) VUS-PB (possibly benign), if additional evidence supported the variant as being benign (e.g., a non-coding variant not predicted to influence splicing); (2) VUS-U, if there was some evidence for pathogenicity based on variant class but limited additional evidence of deleteriousness (e.g., a non-synonymous variant predicted to be tolerated by some prediction tools and damaging by others); and (3) VUS-PP (possibly pathogenic), if there was strong computational evidence supporting a deleterious effect on the gene or gene product, but not sufficient evidence to meet the likely pathogenic classification according to the ACMG-AMP guidelines [27].
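As a rough illustration of the VUS subclassification logic, the sketch below assigns one of the three sub-tiers from summarized in silico evidence. The text does not state numeric cutoffs, so the `VusEvidence` fields and the `STRONG_DAMAGING_FRACTION` threshold are hypothetical and serve only to make the decision order concrete.

```python
from dataclasses import dataclass

# Hypothetical cutoff: the fraction of tools that must call a variant damaging
# before the computational evidence is treated as "strong".
STRONG_DAMAGING_FRACTION = 0.8


@dataclass
class VusEvidence:
    """Hypothetical summary of in silico evidence for one VUS."""
    is_coding: bool
    predicted_splice_effect: bool
    damaging_calls: int   # number of tools predicting a damaging effect
    tolerated_calls: int  # number of tools predicting a tolerated effect


def subclassify_vus(evidence):
    """Return 'VUS-PB', 'VUS-U', or 'VUS-PP' for a variant already classed as VUS."""
    # Possibly benign: e.g., non-coding and not predicted to influence splicing.
    if not evidence.is_coding and not evidence.predicted_splice_effect:
        return "VUS-PB"

    # Possibly pathogenic: predictions agree strongly on a deleterious effect.
    total_calls = evidence.damaging_calls + evidence.tolerated_calls
    if total_calls and evidence.damaging_calls / total_calls >= STRONG_DAMAGING_FRACTION:
        return "VUS-PP"

    # Otherwise the evidence is mixed or limited.
    return "VUS-U"
```

In practice each tool has its own score scale, so the counts used here would come from tool-specific thresholds that are outside the scope of this sketch.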
Gene-level evidence
Genes with candidate variants were considered for return if the gene had at least a strong level of evidence for association with a monogenic disease, as outlined in the ACMG/AMP guidelines. Variants in genes with moderate evidence were also returned if this was agreed upon after discussion with the broader research team and physician review panel.
For sudden death cases, a variant was considered diagnostic only if the gene was present in our curated list of confirmed or probable genes associated with sudden unexplained death (SUD), sudden cardiac death (SCD), and sudden unexpected death in epilepsy (SUDEP). Our gene panel was drawn from multiple sources, including the Human Gene Mutation Database (HGMD), Online Mendelian Inheritance in Man (OMIM), ClinVar, UniProt, and a combination of several gene panels associated with sudden cardiac death, sudden death in epilepsy, channelopathies, and genetic connective tissue disorders. The content of the list evolved throughout the study as sources were updated. The list contains a total of 1608 genes, all of which have been previously cataloged in the Genetic Testing Registry (GTR) and the Genomics England PanelApp (https://panelapp.genomicsengland.co.uk/panels/) as associated with the following conditions. GTR: arrhythmogenic right ventricular cardiomyopathy, comprehensive cardiology, arrhythmia, cardiac arrhythmia, long QT/Brugada syndrome, inherited cardiovascular diseases and sudden death, cardiomyopathies, comprehensive cardiomyopathy, comprehensive arrhythmia, catecholaminergic polymorphic ventricular tachycardia, cardiac arrhythmia, sudden death syndrome, comprehensive cardiovascular, cardiovascular diseases, familial aneurysm, connective tissue disorders, epilepsy, and seizure. PanelApp: dilated cardiomyopathy - adult and teen, dilated cardiomyopathy and conduction defects, idiopathic ventricular fibrillation, long QT syndrome, sudden death in young people, molecular autopsy, Brugada syndrome, mitochondrial disorders, familial hypercholesterolemia, thoracic aortic aneurysm or dissection, epilepsy - early onset or syndromic, and genetic epilepsy syndromes.
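The gene-level rules can be summarized in a short sketch. The ordered evidence levels in `GeneEvidence`, the `panel_approved` flag, and the placeholder gene symbols are assumptions for illustration; the text specifies only the "strong" and "moderate" levels and the requirement that sudden death genes appear in the curated list.

```python
from enum import IntEnum


class GeneEvidence(IntEnum):
    """Hypothetical ordered scale of gene-disease evidence."""
    LIMITED = 1
    MODERATE = 2
    STRONG = 3
    DEFINITIVE = 4


def gene_considered_for_return(evidence, panel_approved=False):
    """Strong or better evidence qualifies; moderate evidence needs panel agreement."""
    if evidence >= GeneEvidence.STRONG:
        return True
    return evidence == GeneEvidence.MODERATE and panel_approved


def in_curated_sudden_death_list(gene_symbol, curated_genes):
    """For MA cases, a diagnostic gene must appear in the curated gene list."""
    return gene_symbol in curated_genes


# Usage with placeholder gene symbols (not taken from the actual 1608-gene list).
curated_genes = {"GENE_A", "GENE_B"}
print(gene_considered_for_return(GeneEvidence.MODERATE, panel_approved=True))
print(in_curated_sudden_death_list("GENE_C", curated_genes))
```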
Combined evidence for reporting
The final assessment of pathogenicity was determined by integrating patient assessment, variant evaluation, inheritance, and clinical fit. The following final classifications were used for reporting:
Category 1. Diagnostic variants (DV): Known pathogenic or likely pathogenic variant(s) either (1) in a known disease gene associated with the reported phenotype provided for the IDIOM proband or (2) in a known gene associated with sudden death for deceased MA individuals. Findings in this category are reported as positive.
Category 2. Possible diagnostic variants (PDV): Pathogenic variant(s) in known disease genes possibly associated with the reported IDIOM phenotype, or possibly pathogenic variants in genes known to be associated with sudden death in MA. This category also includes single pathogenic or likely pathogenic variants identified in a gene associated with an autosomal recessive disorder consistent or overlapping with the provided IDIOM phenotype. Findings in this category are reported as plausible but negative.
Category 3. Variants of uncertain diagnostic significance (VUDS): Variant(s) predicted to be deleterious in a novel candidate gene not previously implicated in human disease, or with an uncertain pathogenic role, in the presence of additional supporting data. Such data may include animal models, copy number variant data, tolerance of the gene to sequence variation, tissue or developmental timing of expression, or knowledge of the gene function and pathway analysis. Further research is required to evaluate and confirm any of the suggested candidate genes. Findings in this category are reported as negative.
Category 4. Negative result: No variants in genes associated with the reported phenotype were identified. Findings in this category are reported as negative.
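The mapping from reporting category to reported result can be written down compactly. In the sketch below, the category names and reported results follow the definitions above, while the rule that a case takes the highest-ranked category among its findings is an assumption made for illustration.

```python
from enum import IntEnum


class Category(IntEnum):
    DIAGNOSTIC = 1            # DV
    POSSIBLE_DIAGNOSTIC = 2   # PDV
    UNCERTAIN_DIAGNOSTIC = 3  # VUDS
    NEGATIVE = 4              # no relevant variants identified


# Reported result for each category, as stated in the category definitions.
REPORTED_AS = {
    Category.DIAGNOSTIC: "positive",
    Category.POSSIBLE_DIAGNOSTIC: "plausible but negative",
    Category.UNCERTAIN_DIAGNOSTIC: "negative",
    Category.NEGATIVE: "negative",
}


def overall_category(finding_categories):
    """Assumption: a case takes its highest-ranked (lowest-numbered) category.

    A case with no findings falls into Category 4.
    """
    return min(finding_categories, default=Category.NEGATIVE)


def report_status(finding_categories):
    """Reported result for a case, given the categories of its findings."""
    return REPORTED_AS[overall_category(finding_categories)]
```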
Read-level data were visually inspected for variants considered for reporting, and variants were validated by Sanger sequencing when deemed necessary. Amended reports were returned to the referring physician when new diagnostic variants were identified. Each amended report includes a full interpretation of any newly identified variants and updated classifications of previously identified variants where applicable.