Calculation of etiological fraction for significantly enriched variant classes
The etiological fraction (EF) estimates the proportion of risk that can be attributed to a specific exposure, in a population with disease who have been exposed to a risk factor [3]. In the context of Mendelian disease, exposure refers to a rare protein-altering variant in a particular gene, and the EF estimates the proportion of cases with a rare variant in whom that variant is disease-causing. The EF is derived from the attributable risk percent (ARP) among exposed, i.e. expressing the risk as a proportion rather than a percentage, and derived from the odds ratio (OR) as described below, where the OR provides an accurate estimate of the relative risk (RR)—the ratio of risk among exposed to risk among unexposed [16]. The odds ratio (OR) is calculated by Altman [17]:
$$ \mathrm{OR}=\left(a/b\right)/\left(c/d\right) $$
where a = disease cases with a variant, b = controls/reference population with a variant, c = disease cases without a variant, and d = controls/reference population without a variant. The 95% confidence intervals (CI) for OR values are calculated by:
$$ 95\%\mathrm{CI}=\exp \left(\mathit{\ln}\left(\mathrm{OR}\right)-1.96\ast \mathrm{SE}\left\{\mathit{\ln}\left(\mathrm{OR}\right)\right\}\right)\ \mathrm{to}\ \exp \left(\mathit{\ln}\left(\mathrm{OR}\right)+1.96\ast \mathrm{SE}\left\{\mathit{\ln}\left(\mathrm{OR}\right)\right\}\right) $$
where the standard error of the log OR was given by:
$$ \mathrm{SE}\left\{\mathit{\ln}(OR)\right\}=\sqrt{\frac{1}{a}+\frac{1}{b}+\frac{1}{c}+\frac{1}{d}} $$
The EF is derived from the OR:
$$ EF=\left( OR-1\right)/ OR $$
95% CIs for EF values are calculated as described by Hildebrandt et al. [18].
EF and OR values were calculated for both truncating (frameshift, nonsense, splice donor site, splice acceptor site) and non-truncating (missense, small in-frame insertions/deletions) variants in HCM genes where a significant excess of rare variants in cases over the ExAC reference population was observed [19]. For the eight core sarcomeric genes (MYBPC3, MYH7, TNNT2, TNNI3, TPM1, MYL2, MYL3, ACTC1), the case cohorts were derived from published data from the Oxford Molecular Genetics Laboratory (OMGL) and the Laboratory of Molecular Medicine (LMM), Partners Healthcare, comprising between 4185 and 6179 unrelated HCM probands [3, 13]. The OMGL cohort comprises apparently unrelated index cases referred from Clinical Genetics centres across the UK, with initial clinical diagnosis of HCM made by a consultant cardiologist. Data on patient ethnicity is not available for this cohort but is expected to be broadly representative of the UK population. The LMM HCM cohort comprised unrelated probands referred for HCM clinical genetic testing. Any individuals with an unclear clinical diagnosis of HCM, or with left ventricular hypertrophy due to an identified syndrome such as Fabry or Danon disease, or unaffected individuals with a family history of HCM were excluded. The LMM cohort was 62% Caucasian (see Alfares et al. for full details on ethnicity [13]), but data on ethnicity for individual patients was not available. For the minor HCM genes (CSRP3, FHL1, PLN, TNNC1), combined cohorts from OMGL and LMM, a prospective research cohort from our laboratory and published cohorts were used as previously described [19], comprising between 2061 and 5440 unrelated HCM probands. For FLNC and FHOD3, recently published cohorts of 448 [20] and 3189 [21] HCM patients respectively were used. All rare variants were included for these calculations, regardless of the clinical classification of the variants.
For all genes, ExAC was used as the reference population database for background variation as previously described [3]. To account for variable coverage of the exome sequencing in ExAC, the sample total for each gene was adjusted by calculating the mean number of called genotypes for each variant. Rare variants were defined as those with a filtering allele frequency in ExAC below the maximum credible allele frequency for HCM [22], defined as 4 × 10−5 (prevalence = 1 in 500, allelic heterogeneity = 0.02, penetrance = 0.5, monoallelic inheritance, as calculated at http://cardiodb.org/allelefrequencyapp/). To confirm that this frequency was the most appropriate threshold to use, a sensitivity analysis was performed with other thresholds (0.0001, 0.0005, 0.001), showing that OR and EF values decreased with higher frequency thresholds (Additional file 1: Table S1).
EFs as a means of quantifying performance of variant classifiers
The EF is dependent on the relative frequencies of variants in cases and population controls. While applying strict thresholds for rarity will focus on variants more likely to be disease-causing, thereby increasing the EF, this is usually not sufficient to adequately distinguish between benign and pathogenic variation for non-truncating variants. Therefore, additional methods are required to discriminate between causative and background variants. A perfect discriminator of pathogenic and benign variants will identify the proportion of causative variants that is equal to the case excess and yield an EF of 1.0, with the proportion of benign variants equal to the population reference frequency of ExAC (and an EF of 0)—see hypothetical example in Fig. 1. In practice, it is unlikely that full discrimination will be achieved but this EF-based approach allows us to evaluate methods that aim to differentiate between pathogenic and benign variants. In this study, we compare the widely used and generic missense functional prediction scores with gene and disease-specific variant clustering. This EF-based approach also offers the advantage of not requiring predefined lists of irrefutable pathogenic and benign variants, which can be limited when performing analyses on specific genes.
Assessing performance of missense functional prediction scores in HCM genes
Functional prediction scores from the dbNSFP database [23] (version 3.2) were downloaded for all missense variants in the 13 HCM genes. Eight scores that provide binary predictions, i.e. damaging vs benign/neutral, were assessed—fathmm-MKL coding, FATHMM, LRT, Mutation assessor, MutationTaster, Polyphen2-HDIV, PROVEAN and SIFT, as well as the CADD algorithm (damaging variants were defined with a CADD phred score ≥ 15). A consensus prediction between the 9 scores was defined as being damaging if greater than 50% of the scores that predicted a damaging effect. Additionally, two consensus algorithms, MetaLR and MetaSVM [24], were also evaluated. The proportion of available predictions for each score for all potential missense variants in each gene was calculated to identify algorithms that do not provide comprehensive predictions for specific genes.
To test the effectiveness of these prediction scores for individual HCM genes, missense variants of known consequence (pathogenic and benign missense) were identified. Pathogenic variants were defined by rarity in ExAC as described above and:
-
1)
Classified as pathogenic (P) or likely pathogenic (LP) in HCM patients by two or more clinical laboratories (OMGL, LMM and ClinVar submitters)
-
2)
Classified as P/LP by one clinical laboratory with no conflicting classifications (VUS or benign) by other laboratories
-
3)
Significantly enriched in the OMGL/LMM cohorts compared to ExAC (Fisher’s exact test)
Benign variants were defined as:
-
1)
Presence in more than one individual in ExAC and not associated with any disease in ClinVar (P/LP/VUS) or HGMD
-
2)
Associated with disease in ClinVar (though not P or LP) or HGMD but at a frequency > 0.001 in ExAC.
The sensitivity (true positive rate) and specificity (true negative rate) was calculated for the 9 functional prediction scores and 3 consensus scores for each of the 8 core sarcomeric genes (there were insufficient known pathogenic variants for the minor genes). As an alternative method for assessing these predictors, EFs were calculated for deleterious variants using the case and ExAC cohorts described above.
Clustering algorithm to detect regional enrichment of variants
Protein regions enriched for rare variants were identified using a bespoke unsupervised clustering algorithm developed within this project. The algorithm is based on a sliding window scanning the protein sequences from their N-terminal to C-terminal residues, with a binomial test used to detect whether there is significant variation enrichment within the tested window compared with the rest of the protein.
The results of this first step are influenced by the size of the sliding window, with a spectrum ranging from small windows enabling detection of smaller, highly enriched variation hotspots but prone towards overfitting (in the most extreme case each residue with multiple variant alleles is considered a cluster), to large windows enabling detection of more extended enriched regions such as large protein domains but at the risk of too low a resolution (in the most extreme case, a unique cluster starting at the first variant residue and ending at the last). In terms of model performance, the former situation is characterised by specificity = 1 (no variant-free residues are within clusters) and sensitivity close to 0 (the vast majority of variant residues are excluded from clusters), whereas the latter results in the opposite situation (many variant-free residues are included in the unique cluster [specificity close to 0] but also all variant amino-acids are [sensitivity = 1]). For this reason, the algorithm automatically selects the optimal window size for each protein by searching for one minimising the difference between sensitivity and specificity (in this case the mean difference between cases and controls for each gene). Of note, the sparseness of the data (resulting in a strong imbalance between positive data points [variant residues] and negative data points [variant-free residues]) make all classic model performance measures (e.g. accuracy, AUC, PPV) biased towards results obtained with smaller window sizes.
To look for the optimal window size, the algorithm starts by testing 19 different sizes ranging from 5% of the protein to 95%. Subsequently, the algorithm picks the best one (if any) and tests 18 sizes around it at a 10-fold finer resolution (e.g. if the initial best window size is 10%, the next iteration will be on windows between 5.5% and 14.5%). This iterative process is repeated until a performance plateau is reached (i.e. none of the 18 new window sizes decreases the difference between sensitivity and specificity by more than 0.001 compared with the previous iteration). Once the optimal window size is detected, multiple testing correction is applied to each definitive window significantly enriched for variation, on the basis of the average number of times each protein residue has been tested (which depends on the number of iterations made, and on the size of the tested windows). Whenever a significant enrichment is detected within a window, its coordinates (start/end) are stored until the whole protein is scanned and, subsequently, merged with any other significantly enriched window to obtain a first “raw” set of variation-rich clusters.
After this first step, the algorithm performs a “boundary trimming” procedure at both ends of each cluster. This step controls for potential inclusion of variant-free (or non-enriched) distal cluster tails that may have been included within a significantly enriched window due to variants occurring more proximally. The algorithm performs the same procedure at both the N- and the C-terminal cluster boundaries, starting with a single-residue window including only the most external amino-acid, and iteratively extending it as far as the cluster median residue. Before each extension, the binomial test is used to check if there is a significant depletion of variants compared to the rest of the cluster. The algorithm stores each test’s p value and tested region coordinates and eventually trims the cluster by removing the most (if any) significantly variation-depleted tail, to obtain a final, refined set of clusters. One last binomial test is performed on the refined clusters to measure the significance of their rare variant enrichment.
Distinguishing pathogenic from benign variants using clustering in case and control cohorts
EFs were calculated based on these clusters and compared to those produced by a consensus of missense functional prediction scores from the dbNSFP database [23] (MetaLR, MetaSVM and a consensus of 9 individual predictors as described above). These consensus scores were also evaluated in genes where no clustering of case variants was observed.
Using EFs to increase the yield of putatively pathogenic variants in HCM cohorts
Sarcomeric gene rare variants in the OMGL/LMM clinical cohort [3] were re-assessed based on the analysis described above. The proportion of patients with variants that would be upgraded to likely pathogenic based on the revised ACMG/AMP guidelines was calculated, i.e. those previously classified as VUS but in a variant class with an EF ≥ 0.95 for missense variants or EF ≥ 0.90 for inframe indels (as inframe indels will also activate the PM4 rule regarding variants that change protein length and therefore only the moderate PM1 rule would be required for a likely pathogenic classification).
Analysis of prospective HCM cohort
The effect of the new EF-based ACMG/AMP rules on the yield of actionable variants was assessed on a prospective cohort of 684 HCM patients recruited at the Royal Brompton & Harefield Hospitals NHS Foundation Trust, London, UK [19]. The ACMG/AMP rules described below were used to classify variants from the valid HCM genes defined in this study, with rule implementation as described in the CardioClassifier resource [6]. The following rules could be activated by automated script:
-
PM2—filtering allele frequency in ExAC < 4 × 10−5. This rule must be activated to denote a causative variant for this analysis.
-
PVS1—truncating variants in MYBPC3, TNNT2, TNNI3, CSRP3, FHL1, PLN (genes statistically enriched in HCM cohorts versus ExAC).
-
PS4—individual variant statistically enriched in cases over controls, based on LMM/OMGL cohort versus ExAC with the rule activated if the case count was > 2 and the Fisher’s exact test p value < 1.79 × 10−6 (Bonferroni correction).
-
PM4—protein length changing variant, i.e. an inframe indel or stop lost variant.
-
PP3—missense variant with multiple lines of computational evidence suggesting a deleterious effect, i.e. of the 8 predictors assessed (SIFT, PolyPhen2 var., LRT, Mutation Taster, Mutation Assessor, FATHMM, CADD and Grantham scores), only 1 predicts benign and < 3 have unknown classifications, or if ≥ 3 have unknown classifications, all others predict damaging.
-
PM5/PS1—novel missense change at an amino acid where a different missense variant is pathogenic (PM5) or novel missense variant with same amino acid change as an established pathogenic variant (PS1). Pathogenicity here is defined as a pathogenic classification in ClinVar by multiple submitters with no conflicting evidence.
Rare variants (i.e. with rule PM2 activated) were then manually assessed for human genetic evidence in ClinVar entries and published reports using the following rules:
-
PP1—co-segregation with disease. This rule was defined as supporting for ≥ 3 observed meioses, moderate for ≥ 5 meioses and strong for ≥ 7 meioses.
-
PS2/PM6—de novo inheritance (with/without confirmed paternity and maternity).
-
The PS3 rule relating to effects in functional studies was not applied due to the lack of standardisation and validation in functional assays for HCM variants.
The number of patients with variants that still remained as VUS, i.e. unactionable according to current guidelines, but that would be upgraded to at least likely pathogenic based on the revised ACMG/AMP guidelines was calculated as described for the clinical HCM cohort, i.e. those in a variant class with an EF ≥ 0.95 for missense variants (activating PM1_strong) or EF ≥ 0.90 for inframe indels (activating PM1_moderate).
Genotype-phenotype analyses to validate variant pathogenicity
The clinical characteristics of two HCM cohorts were used to support the pathogenicity of variants upgraded on the basis of an EF ≥ 0.95. For the prospective HCM cohort, left ventricular (LV) mass values indexed to body surface area were derived from cardiac magnetic resonance imaging and compared between cases with pathogenic or likely pathogenic variants (current ACMG/AMP guidelines), VUS upgraded to likely pathogenic with EF rules, other VUS and genotype-negative cases (only variants in thick filament genes MYH7 and MYBPC3 were analysed due to the distinctive patterns of LV hypertrophy observed in cases with variants in thin filament genes [25]).
Outcome data was assessed using the Sarcomeric Human Cardiomyopathy Registry (SHaRe), a multi-centre international repository that aggregates clinical and genetic data from patients with cardiomyopathies including HCM. A total of 2694 HCM patients with both right-censored outcome data and known sarcomeric genotype were analysed—1254 patients with at least one pathogenic or likely pathogenic variant in any of the 8 sarcomeric genes; 1199 patients with no sarcomeric variants; and 241 patients with VUS in any of the sarcomeric genes. Of the 241 patients with VUS, 69 were reclassified as pathogenic as they had variants with an EF ≥ 0.95. Survival curves were calculated by Kaplan-Meier analysis with log-rank test for the proportion of patients free of the overall composite outcome for each of the four genotype groups [7].
Detection of enriched variant clusters in RYR2
Non-truncating variants in RYR2 are causative in up to 50% of patients with catecholaminergic polymorphic ventricular tachycardia (CPVT), a rare inherited arrhythmia affecting approximately 1 in 10,000 people. Variants in cases have been shown to cluster in specific regions of RYR2, which codes for a ryanodine receptor 4867 amino acids in length, with a subset of its 105 exons thought to account for the majority of disease-causing variation [26]. However, the relatively high level of background benign variation in RYR2 increases the uncertainty in interpreting novel variants detected in CPVT cases [27]. For RYR2 variants in CPVT, rare variants were defined as those with a filtering allele frequency in ExAC below the maximum credible allele frequency for CPVT [22], defined as 1 × 10−5 (prevalence = 1 in 10,000, allelic heterogeneity = 0.1, penetrance = 0.5, monoallelic inheritance, as calculated at http://cardiodb.org/allelefrequencyapp/). For variants without a filtering allele frequency, i.e. detected in a maximum of one individual for any major ExAC sub-population, those with an overall ExAC allele count less than three were deemed to be rare. All missense and single amino acid inframe insertions/deletions were included as non-truncating variants.
Case clusters were defined as described above using a recently published cohort of 1200 referral cases for CPVT and 155 well-phenotyped cases (78 classified as strong CPVT and 77 classified as possible CPVT) [27]. For calculating EFs, only the 1200 referral cases were used for comparison with ExAC. Although this will yield more conservative EFs than using definitively diagnosed cases or a mix of diagnosed and referral (the yield of RYR2 variants in the referral series was 18.2% compared to 59% in the well-phenotypes cases [27]), the EFs generated will be more relevant and applicable to real world referral genetic testing for CPVT. We compared results from our detected clusters with previously defined hotspot regions (exons 3–15, 44–50, 83–90 and 93–105) [26] and a recently refined set of exons based on variant enrichment in cases (3, 8, 14, 43, 47–49, 81, 83, 88–90, 93, 95, 97–101, 103, 105), using the same case cohort as here but calculated on an exon-by-exon basis [27].