Beyond gene-disease validity: capturing structured data on inheritance, allelic requirement, disease-relevant variant classes, and disease mechanism for inherited cardiac conditions
Genome Medicine volume 15, Article number: 86 (2023)
As the availability of genomic testing grows, variant interpretation will increasingly be performed by genomic generalists, rather than domain-specific experts. Demand is rising for laboratories to accurately classify variants in inherited cardiac condition (ICC) genes, including secondary findings.
We analyse evidence for inheritance patterns, allelic requirement, disease mechanism and disease-relevant variant classes for 65 ClinGen-curated ICC gene-disease pairs. We present this information for the first time in a structured dataset, CardiacG2P, and assess application in genomic variant filtering.
For 36/65 gene-disease pairs, loss of function is not an established disease mechanism, and protein truncating variants are not known to be pathogenic. Using the CardiacG2P dataset as an initial variant filter allows for efficient variant prioritisation whilst maintaining a high sensitivity for retaining pathogenic variants compared with two other variant filtering approaches.
Access to evidence-based structured data representing disease mechanism and allelic requirement aids variant filtering and analysis and is a prerequisite for scalable genomic testing.
Inherited cardiac conditions (ICCs) are a group of disorders that share the potential for devastating outcomes, including heart failure and sudden cardiac death at a young age.
Early diagnosis is vital and allows prompt treatment, risk stratification, and primary prevention for sudden cardiac arrest in high-risk individuals. Genetic testing is a routine part of evaluation and can aid diagnosis and alter clinical management [1,2,3].
The scope of genetic testing for ICC-associated genes is growing. In addition to patients undergoing evaluation for confirmed or suspected disease, opportunistic screening for secondary findings is increasing as more patients undergo exome (ES) or genome sequencing (GS) in diverse clinical settings or via consumer-initiated testing. A recent statement by the American Heart Association (AHA) highlights the challenges in interpreting incidental and secondary findings. Of the 90 medically actionable gene-disease pairs on the American College of Medical Genetics and Genomics Secondary Findings list (ACMG SF V3.1), 47 are related to cardiovascular (CV) disease. The ACMG recommends that these genes be analysed whenever clinical ES or GS is performed and that pathogenic or likely pathogenic (P/LP) variants be reported back to patients. Therefore, many laboratories, regardless of their expertise, will soon need the capability to rapidly interpret variants in CV genes. This creates the potential for variant misclassification and/or poor communication of the interpretation of secondary findings to clinicians, which could have significant downstream effects on patients and their families.
As access to sequencing and sharing of genomic data has improved, the number of genes and variants reported to be associated with any given disease has grown. Bioinformatic filtering pipelines often prioritise protein-truncating variants, which are indeed enriched for disease-causing variants in aggregate but may not be pathogenic if loss of function (LoF) is not a mechanism for the relevant disease. At best, this results in time-consuming false positives; at worst, it can lead to misinterpretation of genomic test results. For ICCs, incomplete penetrance, genetic heterogeneity, oligogenic and modifying variants, overlapping phenotypes, and different disease mechanisms make variant interpretation particularly challenging.
There are international efforts underway to re-evaluate the validity of previously published gene-disease relationships. The Gene Curation Coalition (GenCC) is a consortium of parties engaged in gene curation, and theGenCC.org (https://search.thegencc.org/) is a harmonised repository of curated gene-disease relationships from many groups. Once a robust gene-disease relationship has been established, clinical interpretation of variation within a disease gene is critically dependent on an understanding of the allelic requirement for the disease, the mechanism of pathogenicity, and the disease-relevant variant classes. These data have not previously been consistently available in a structured format for variant prioritisation.
Here, we have analysed the inheritance, allelic requirement, disease mechanism, and disease-relevant variant classes for robust ICC-associated gene-disease pairs using a standardised terminology recently developed by the GenCC. The results of this analysis have been approved by international multidisciplinary expert review panels composed of scientists and clinicians with expertise in ICCs. Structured datasets with this type of information do not currently exist; we share our results here and as a publicly available resource, CardiacG2P, to aid in filtering and analysis of ICC genetic variants.
CardiacG2P is an evidence-based dataset hosted on G2P (https://www.ebi.ac.uk/gene2phenotype), an online system set up to establish, curate and distribute datasets for diagnostic variant filtering. Each dataset entry annotates a disease with an allelic requirement, information pertaining to the disease mechanism (represented as a disease-associated variant consequence), and known disease-relevant variant classes at a defined locus. This dataset is compatible with the existing G2P Ensembl Variant Effect Predictor (VEP) plugin to support automated filtering of genomic variants accounting for inheritance pattern and mutational consequence. Other G2P datasets for developmental disorders and ophthalmic conditions have shown that this approach can help to discriminate between variants, improving the precision of diagnostic variant filtering [10, 12]. G2P data are also available through the GenCC hub. Here we assess CardiacG2P and show its impact on the efficiency of variant prioritisation.
Analysis of inheritance and disease-associated variant consequences in genes implicated in inherited cardiac conditions
We analysed evidence to determine the inheritance pattern, allelic requirement, disease mechanism and disease-relevant variant classes for 65 gene-disease pairs for major ICCs (Fig. 1). We analysed genes classified with “Definitive” or “Strong” evidence by The Clinical Genome Resource (ClinGen) Gene Curation Expert Panels (GCEPs) for seven CV diseases under a Mendelian (monogenic) model (accessed November 2020) [13, 14]: hypertrophic cardiomyopathy (HCM), dilated cardiomyopathy (DCM), arrhythmogenic right ventricular cardiomyopathy (ARVC), long QT syndrome (LQTS), Brugada syndrome (BrS), catecholaminergic polymorphic ventricular tachycardia (CPVT), and short QT syndrome (SQTS) [15,16,17,18,19,20]. Information on these ClinGen expert panels, membership, and curation activity can be found at www.clinicalgenome.org. For HCM, we included both genes causing typical HCM and also genes associated with syndromic disorders where apparently isolated left ventricular hypertrophy (LVH) may be the presenting feature (genocopies).
Seven channelopathy gene-disease pairs classified by ClinGen as having “Moderate” strength of evidence for monogenic disease are included (CALM1-CPVT, CALM2-CPVT, CALM3-CPVT, CASQ2-CPVT, KCNE1-JLN, SLC4A3-SQTS, KCNJ2-SQTS), following discussion with the channelopathy expert review panel for this project, and where there were sufficient data to adjudicate the required fields. SLC22A5 was also evaluated as a phenotypic mimic of SQTS: although it is classified as “Disputed” by the ClinGen Short QT GCEP in relation to true SQTS, it is definitively associated with systemic primary carnitine deficiency disease, which can present similarly to SQTS and might reasonably be included in gene panels for diagnostic assessment of patients presenting with this phenotype. See Tables 1 and 2 and Additional file 3: Table S1 for a complete list of the gene-disease pairs evaluated.
Inheritance, allelic requirement, and disease-associated variant consequences (as a proxy for disease mechanism) are described using previously agreed standardised terms developed by the GenCC. These terms are formalised in the Sequence Ontology (SO) and the Human Phenotype Ontology (HPO). Briefly, since the precise disease mechanism is not always known, six high-level variant-consequence terms are used to describe disease-associated variant consequences. These are assigned depending on which variant classes are associated with disease (see Tables 2 and 3 in Roberts et al.). As examples, “decreased gene product level” [SO:0002316] is used when disease is caused by variants that decrease the level or amount of gene product produced (e.g. variants leading to premature termination codons (PTCs) that trigger nonsense-mediated decay (NMD), and gene deletions), and “altered gene product sequence” [SO:0002318] is used for non-truncating variants that instead alter the sequence of the gene product, such as the amino acid sequence of a protein (e.g. missense variants, inframe insertions or deletions (indels), PTCs predicted to escape NMD, and stop loss). Variants producing PTCs are often referred to as “loss of function (LoF)” variants, but a PTC could lead to LoF, gain of function (GoF) through loss of a terminal regulatory region, or a dominant negative effect. Similarly, missense variants can cause GoF, LoF, or dominant negative effects. Using known pathogenic variant classes to describe which consequences, at a sequence level, have been associated with disease allows prediction of which other variant classes may be pathogenic, whilst recognising that the downstream mechanisms following a particular sequence consequence can be diverse. More than one disease-associated variant consequence term can be used for each gene-disease pair.
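To make the consequence-based terminology concrete, the sketch below assigns one of the two high-level terms named above to a variant from its SO consequence term. It is a simplified illustration, not the GenCC or G2P implementation: the function and set names are hypothetical, only the variant classes mentioned in this paragraph are covered, the remaining high-level terms are omitted, and the `escapes_nmd` flag is assumed to come from an upstream NMD prediction.

```python
from typing import Optional

# High-level terms named in the text, with their Sequence Ontology identifiers.
DECREASED_LEVEL = "decreased gene product level"   # SO:0002316
ALTERED_SEQUENCE = "altered gene product sequence"  # SO:0002318

# Hypothetical groupings of VEP/SO consequence terms (not an exhaustive list).
PTC_TERMS = {"stop_gained", "frameshift_variant"}   # introduce a premature termination codon
DELETION_TERMS = {"transcript_ablation"}            # deletion of the whole transcript
NON_TRUNCATING_TERMS = {
    "missense_variant", "inframe_insertion", "inframe_deletion", "stop_lost",
}


def high_level_consequence(so_term: str, escapes_nmd: bool = False) -> Optional[str]:
    """Return the high-level consequence for one SO term, or None if not covered.

    `escapes_nmd` is assumed to come from an upstream NMD prediction: PTCs that
    trigger NMD reduce product level; PTCs that escape it alter product sequence.
    """
    if so_term in DELETION_TERMS:
        return DECREASED_LEVEL
    if so_term in PTC_TERMS:
        return ALTERED_SEQUENCE if escapes_nmd else DECREASED_LEVEL
    if so_term in NON_TRUNCATING_TERMS:
        return ALTERED_SEQUENCE
    return None  # e.g. synonymous or intronic variants are not mapped in this sketch


print(high_level_consequence("stop_gained"))                    # decreased gene product level
print(high_level_consequence("stop_gained", escapes_nmd=True))  # altered gene product sequence
print(high_level_consequence("missense_variant"))               # altered gene product sequence
```

Because the mapping works at the level of sequence consequence, an NMD-escaping PTC lands in the same class as a missense variant, mirroring the distinction drawn above.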
Evidence was collected primarily from published, peer-reviewed literature, but also from publicly accessible resources such as ClinGen and variant databases (e.g. ClinVar). Building on the previous work by ClinGen GCEPs to determine gene-disease validity, each gene-disease pair was analysed by an individual curator following a standard operating procedure for determining inheritance and disease-associated variant consequences (see Additional file 1). Curation results were then reviewed by panels of international experts (clinicians and scientists) drawn from the relevant disease area.
Development of CardiacG2P
A structured representation of the resulting data is available in Additional files 2 and 3 and also through G2P (https://www.ebi.ac.uk/gene2phenotype/downloads), which is also searchable through the GenCC portal.
For each curation entry, a gene or locus is linked to a disease via a disease-associated variant consequence (as a proxy for disease mechanism) and allelic requirement. Additional information including a confidence category of gene-disease validity (as previously assigned by ClinGen), a narrative summary describing key messages from the expert review, and relevant publication identifiers is also stored.
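A single curation entry of this kind can be pictured as a small record. The sketch below is a hypothetical Python representation: the field names, the example allelic-requirement string and the `accepts` helper are our own illustration and do not reproduce the column names of the G2P download files. The SCN5A-BrS values follow the Results (both PTCs and non-truncating variants are associated with BrS) and are shown for illustration only.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, List


@dataclass
class G2PEntry:
    """One gene-disease curation record (hypothetical field names)."""
    gene: str
    disease: str
    allelic_requirement: str              # e.g. "monoallelic_autosomal" (illustrative label)
    variant_consequences: FrozenSet[str]  # disease-associated high-level consequences
    confidence: str                       # ClinGen gene-disease validity category
    narrative: str = ""                   # key messages from expert review
    pmids: List[str] = field(default_factory=list)  # supporting publication identifiers

    def accepts(self, consequence: str) -> bool:
        """True if a variant's high-level consequence is disease-associated here."""
        return consequence in self.variant_consequences


# Illustrative entry based on the SCN5A-BrS findings described in the Results.
scn5a_brs = G2PEntry(
    gene="SCN5A",
    disease="Brugada syndrome",
    allelic_requirement="monoallelic_autosomal",
    variant_consequences=frozenset(
        {"decreased gene product level", "altered gene product sequence"}
    ),
    confidence="Definitive",
)
print(scn5a_brs.accepts("decreased gene product level"))
```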
Unless specifically mentioned, genes previously curated for validity by ClinGen but not classified as “Definitive” or “Strong” for cardiac disease are included on the panel for completeness. The panel reports the gene-disease validity classification (e.g. “Limited” evidence) but does not speculate on inheritance and mechanism terms where the gene-disease relationship is not established (for information, see the current version of the ClinGen gene-disease validity SOP).
We evaluated the utility of CardiacG2P by comparing a variant prioritisation pipeline incorporating data from this structured resource against two alternative generic approaches available to an analyst without disease-specific expertise (see Fig. 2). All three pipelines interrogate the same gene list which includes the 21 HCM and 12 DCM genes evaluated here.
Pipeline 1: Generic bioinformatics analysis pipeline with a 3-step filtering approach: filtering on gene symbol (for 33 gene-disease relationships classified by ClinGen as “Strong” or “Definitive” for HCM and/or DCM), retaining only rare variants (gnomAD global allele frequency <0.0001), and retaining only protein-altering variants (PAVs).
Pipeline 2: Generic bioinformatics analysis pipeline with a 4-step filtering approach: filtering on gene symbol, retaining only rare variants (gnomAD global allele frequency <0.0001), and retaining variants that are either high impact (i.e. protein-truncating variants (e.g. stop gained, frameshift) AND predicted to result in loss of function with high confidence by LOFTEE, a VEP plugin) OR previously classified in ClinVar as P/LP (as annotated by VEP version 104).
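The two generic pipelines can be sketched as simple predicates over an annotated variant. This is an illustration under stated assumptions, not the code used in the study: the gene list is a three-gene subset of the 33 genes, the sets of VEP consequence terms treated as protein-altering or protein-truncating are our assumption, variants absent from gnomAD are treated as rare, and LOFTEE and ClinVar annotations are assumed to have already been parsed into the hypothetical `loftee` and `clinvar` fields.

```python
from dataclasses import dataclass
from typing import Optional, Set

# Illustrative subset only; the study filtered on 33 HCM/DCM gene-disease relationships.
PANEL_GENES = {"MYBPC3", "MYH7", "TTN"}
MAX_AF = 0.0001  # gnomAD global allele frequency cut-off

# Assumed term sets; the exact VEP consequence lists used in the study may differ.
PAV_TERMS = {
    "missense_variant", "stop_gained", "stop_lost", "start_lost", "frameshift_variant",
    "inframe_insertion", "inframe_deletion", "splice_acceptor_variant", "splice_donor_variant",
}
PTV_TERMS = {"stop_gained", "frameshift_variant", "splice_acceptor_variant", "splice_donor_variant"}
CLINVAR_PLP = {"pathogenic", "likely_pathogenic", "pathogenic/likely_pathogenic"}


@dataclass
class Variant:
    gene: str
    gnomad_af: Optional[float]      # None when absent from gnomAD (treated as rare)
    consequences: Set[str]          # VEP consequence terms
    loftee: Optional[str] = None    # LOFTEE call: "HC", "LC" or None
    clinvar: Optional[str] = None   # ClinVar clinical significance, lower case


def gene_and_rare(v: Variant) -> bool:
    """Shared first steps: gene-symbol filter and rarity filter."""
    return v.gene in PANEL_GENES and (v.gnomad_af is None or v.gnomad_af < MAX_AF)


def pipeline1(v: Variant) -> bool:
    """Keep rare panel variants with any protein-altering consequence."""
    return gene_and_rare(v) and bool(v.consequences & PAV_TERMS)


def pipeline2(v: Variant) -> bool:
    """Keep rare panel variants that are high-confidence LoF PTVs or ClinVar P/LP."""
    high_conf_lof = bool(v.consequences & PTV_TERMS) and v.loftee == "HC"
    return gene_and_rare(v) and (high_conf_lof or v.clinvar in CLINVAR_PLP)


novel_missense = Variant("MYH7", None, {"missense_variant"})
print(pipeline1(novel_missense), pipeline2(novel_missense))
```

A novel rare missense variant in MYH7 passes pipeline 1 but not pipeline 2; this is the behaviour that costs pipeline 2 sensitivity in genes where LoF is not a disease mechanism.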
Pipeline 3 (CardiacG2P): Using the CardiacG2P dataset, variants were filtered on gene symbol, retaining only rare variants (gnomAD global allele frequency <0.0001), and on allelic requirement, variant consequence, and gene-specific annotations of a restricted repertoire of pathogenic alleles, all appropriate for the disease under interrogation (e.g. restricted variant classes, specific variants, or restricted regions of the protein). Specific examples include removing all TTN missense variants apart from three with segregation evidence. In addition, for MYBPC3, all intronic variants were retained given recent work identifying deep intronic variants associated with disease. This information is available in either the restricted repertoire of pathogenic variants or the narrative summaries.
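Pipeline 3 differs in that the filter is driven by a per-gene entry rather than a fixed consequence list, and the allelic requirement is checked across all qualifying variants in a gene. The sketch below is a simplified, hypothetical rendering of that logic, not the G2P VEP plugin itself: it assumes each variant has already been assigned a high-level consequence, uses a placeholder identifier for the TTN allowlist (not one of the three real variants), and reduces allelic requirement to "monoallelic" versus "biallelic".

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Optional

DECREASED = "decreased gene product level"
ALTERED = "altered gene product sequence"
MAX_AF = 0.0001  # gnomAD global allele frequency cut-off


@dataclass
class Entry:
    """Per-gene filtering rules (hypothetical structure)."""
    allelic_requirement: str                 # "monoallelic" or "biallelic" (simplified)
    consequences: FrozenSet[str]             # disease-associated high-level consequences
    allowlist: FrozenSet[str] = frozenset()  # specific variants kept regardless of class
    keep_intronic: bool = False              # e.g. MYBPC3: retain all intronic variants


@dataclass
class Variant:
    vid: str                      # variant identifier
    gene: str
    gnomad_af: Optional[float]    # None when absent from gnomAD (treated as rare)
    consequence: Optional[str]    # high-level consequence, or None if unmapped
    intronic: bool = False
    allele_count: int = 1         # 1 = heterozygous, 2 = homozygous


def qualifies(v: Variant, e: Entry) -> bool:
    """Per-variant checks: rarity, then allowlist/intronic rule, then consequence."""
    if v.gnomad_af is not None and v.gnomad_af >= MAX_AF:
        return False
    if v.vid in e.allowlist or (e.keep_intronic and v.intronic):
        return True
    return v.consequence in e.consequences


def pipeline3(variants: List[Variant], panel: Dict[str, Entry]) -> List[Variant]:
    """Keep qualifying variants in panel genes whose allelic requirement is met."""
    hits: Dict[str, List[Variant]] = defaultdict(list)
    for v in variants:
        entry = panel.get(v.gene)  # gene-symbol filter
        if entry is not None and qualifies(v, entry):
            hits[v.gene].append(v)
    kept: List[Variant] = []
    for gene, gene_hits in hits.items():
        alleles = sum(v.allele_count for v in gene_hits)
        if panel[gene].allelic_requirement == "biallelic" and alleles < 2:
            continue  # one heterozygous hit cannot satisfy a biallelic requirement
        kept.extend(gene_hits)
    return kept


# Illustrative panel; the allowlist ID is a placeholder, not a real TTN variant.
panel = {
    "MYBPC3": Entry("monoallelic", frozenset({DECREASED}), keep_intronic=True),
    "TTN": Entry("monoallelic", frozenset({DECREASED}),
                 allowlist=frozenset({"TTN_missense_placeholder"})),
}
```

Here a TTN missense variant is dropped unless it is on the allowlist, while intronic MYBPC3 variants are kept whatever their annotated consequence, mirroring the two examples in the text.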
Set 1: To assess sensitivity
Set 1 contains 285 unique gold-standard true positive variants classified as P/LP for HCM and DCM in the last 3 years by the Clinical Genetics & Genomics Laboratory of the NHS Genomic Medicine Service South-East Genomics Laboratory Hub at the Royal Brompton Hospital, London, one of 4 NHS England specialist cardiovascular genetics laboratories. These variants were identified using a custom gene panel, with Agilent SureSelect QXT library preparation and sequencing on Illumina MiSeq or NextSeq platforms. All variants were evaluated following guidelines produced by the ACMG/AMP and the Association for Clinical Genomic Science (ACGS) using an in-house validated pipeline.
For this study, a variant call format (VCF) file was created using these variants, annotated using VEP version 104, and filtered according to the 3 pipelines. We compared the number of P/LP variants retained by each of the 3 methods.
Set 2: To assess the positive rate—the number of variants retained for further analysis
Set 2a contains data from 200 patients with cardiomyopathy (either HCM or DCM) from the Royal Brompton & Harefield Hospitals Cardiovascular Research Biobank. Set 2b contains data from 200 healthy volunteers recruited for the digital heart project. Participants provided written informed consent, and the research had ethics committee approval. No individual patient data are reported. The GRCh37 reference genome assembly (Ensembl/GENCODE version 19) was used for sequencing and analysis. Details of the sequencing panels and platforms and the bioinformatics pipelines used for variant calling have been reported previously. Briefly, samples were sequenced using the Illumina TruSight Cardio Sequencing Kit, which includes 174 genes reported as associated with ICCs, on the Illumina MiSeq and NextSeq platforms. Targeted DNA libraries were prepared according to manufacturers’ protocols before paired-end sequencing. For this study, merged VCF files containing single nucleotide variants (SNVs) and insertion or deletion variants were annotated using VEP version 104 and filtered according to the 3 pipelines described above.
Since it is not possible to define a gold-standard classification for these variants that does not incorporate the same expert knowledge captured in CardiacG2P (except potentially for a very small number of variants with orthogonal segregation data), we report the total number of variants retained by each of the three methods (the positive rate), rather than positive predictive value. This is indicative of the analytical burden for a diagnostic laboratory manually interpreting variants of interest retained by a filtering pipeline. We have included a healthy cohort to represent the potential analytical burden of secondary findings.
Inheritance and disease-associated variant consequences in established ICC genes
Forty cardiomyopathy gene-disease pairs (22 for HCM, 12 for DCM, and 6 for ARVC; overall 33 unique genes) were analysed for inheritance pattern, allelic requirement, disease-associated variant consequences, and variant classes reported with evidence of pathogenicity. These are presented in Table 1 (typical HCM, DCM, and ARVC) and Additional file 3: Table S1 (syndromic disorders that include HCM where LVH may be a presenting feature). Twenty-five channelopathy gene-disease pairs (11 for LQTS, 1 for BrS, 8 for CPVT, and 5 for SQTS; overall 15 unique genes) are presented in Table 2. Narrative summaries accompany each gene-disease pair, with content including relevant transcripts, specific pathogenic variants, mutational hotspots, phenotype notes, and other important information raised during the expert panel reviews and discussion (see Additional file 2 or Additional file 3: Tables S6–S7).
Cardiomyopathy genes are predominantly characterised by autosomal dominant inheritance with incomplete penetrance. However, 3/6 ARVC genes demonstrate both autosomal dominant and recessive inheritance; JUP-related Naxos disease (a syndrome characterised by ARVC, woolly hair, and palmoplantar keratoderma) is exclusively inherited in an autosomal recessive manner; and 3/14 syndromic HCM genes (FHL1, GLA and LAMP2) are X-linked.
Importantly, only one of the eight core sarcomere-encoding HCM-associated genes (MYBPC3) causes disease through haploinsufficiency. LoF is not an established mechanism for the other 7 core HCM genes (as listed in Table 1) and NMD-competent PTCs are not known to be pathogenic. Instead, missense variants and variants predicted to escape NMD leading to an altered gene product sequence rather than decreased gene product level should be prioritised. This is also the case for 8/14 syndromic HCM (CACNA1C, FLNC, PRKAG2, PTPN11 (Noonan), PTPN11 (Noonan syndrome with multiple lentigines), RAF1, RIT1, TTR), 3/12 DCM (DES, TNNC1 and TNNT2), and 2/6 ARVC (JUP, TMEM43) gene-disease pairs.
Additional useful information for variant filtering is captured in individual narrative summaries. For example, for TTN-related DCM, only PTCs in exons constitutively expressed in both major adult cardiac isoforms (percent spliced in, PSI > 0.9) should be prioritised [28, 30, 31]. Very few pathogenic missense variants in TTN-related DCM have been identified: to our knowledge, only three have been reported with segregation evidence [32,33,34]. Individually rare missense variants in TTN are collectively extremely common in the population (carried by >50% of individuals, depending on the allele frequency cut-off), and there are few established approaches to prioritise these in the absence of an informative pedigree. In some instances, evidence for disease comes primarily from one variant class, such as missense variants only in MYL2-, MYL3-, and TPM1-related HCM, or from a single well-characterised variant, such as the founder missense variant NM_024334.3(TMEM43):c.1073C>T (p.S358L) in TMEM43-related ARVC. Pathogenicity of other variant classes, or indeed other missense variants, is not established for TMEM43, and this should guide the interpretation of variants in these gene-disease relationships.
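The TTN rule above is a positional filter on PTCs. A minimal sketch, assuming exon-level PSI values are supplied as a dictionary, might look like the following; the function name is hypothetical, the example exon numbers and PSI values are invented for illustration, and exons without PSI data are not prioritised, an assumption made here for simplicity rather than a documented rule.

```python
from typing import Dict

PSI_CUTOFF = 0.9  # exon must be constitutively expressed in the adult heart


def prioritise_ttn_ptc(exon: int, exon_psi: Dict[int, float]) -> bool:
    """Prioritise a TTN PTC only if its exon has cardiac PSI above the cut-off.

    `exon_psi` maps exon number to PSI; exons missing from it are not prioritised.
    """
    psi = exon_psi.get(exon)
    return psi is not None and psi > PSI_CUTOFF


# Invented PSI values, for illustration only.
example_psi = {101: 0.99, 102: 0.12}
print(prioritise_ttn_ptc(101, example_psi), prioritise_ttn_ptc(102, example_psi))
```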
For some gene-disease relationships, there are gene regions where there is a high confidence for pathogenicity, for example exon 9 in RBM20-related DCM (RS motif, amino acids 634-638). Other examples of mutational hotspots are referenced in individual curations.
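Region-level annotations such as the RBM20 RS motif can be stored as protein intervals keyed by gene-disease pair. The sketch below uses only the interval stated in the text (RBM20-related DCM, amino acids 634-638); the `Hotspot` structure and `hotspot_for` function are hypothetical, and further curated hotspots would be added from the narrative summaries.

```python
from typing import List, NamedTuple, Optional


class Hotspot(NamedTuple):
    gene: str
    disease: str
    start: int  # first amino acid, 1-based, inclusive
    end: int    # last amino acid, inclusive
    label: str


HOTSPOTS: List[Hotspot] = [
    Hotspot("RBM20", "DCM", 634, 638, "RS motif (exon 9)"),
]


def hotspot_for(gene: str, disease: str, aa_pos: int) -> Optional[str]:
    """Return the hotspot label if the residue falls in a curated region, else None."""
    for h in HOTSPOTS:
        if h.gene == gene and h.disease == disease and h.start <= aa_pos <= h.end:
            return h.label
    return None


print(hotspot_for("RBM20", "DCM", 636))
```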
The channelopathy genes are predominantly characterised by autosomal dominant inheritance, though 7/25 gene-disease pairs demonstrate autosomal recessive inheritance.
For 7/11 LQTS, 4/7 CPVT and 5/5 SQTS, disease is due to altered gene product sequence and not a decrease in gene product level. For these gene-disease relationships, it is missense variants and other non-truncating variants that should be prioritised and assessed for pathogenicity.
Many of the channelopathy genes are implicated in more than one phenotype, or overlapping phenotypes; 25 gene-disease relationships are evaluated here but only 15 unique genes. Importantly, for several genes, distinct variant classes drive different phenotypes through distinct mechanisms. As an example, both PTCs and missense variants leading to LoF of KCNQ1 are associated with LQTS and Jervell Lange-Nielsen syndrome. In contrast, almost all evidence for KCNQ1 as a cause of SQTS is derived from a single missense variant, NM_000218.3(KCNQ1):c.421G>A (p.Val141Met), and functional studies in cell models have confirmed GoF as the mechanism [36, 37]. Similarly, both PTCs and non-truncating variants leading to LoF of SCN5A are associated with BrS, whereas SCN5A-related LQTS is caused by pathogenic missense variants and inframe indels leading to GoF.
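Because the relevant consequences depend on the disease as well as the gene, a lookup should be keyed by the gene-disease pair rather than by the gene alone. The hypothetical table below encodes only the four relationships described in this paragraph, using the two high-level consequence terms; it is an illustration, not an extract of CardiacG2P.

```python
DECREASED = "decreased gene product level"
ALTERED = "altered gene product sequence"

# Keyed by (gene, disease); values summarise the relationships described above.
MECHANISMS = {
    ("KCNQ1", "LQTS"): {DECREASED, ALTERED},  # PTCs and LoF missense
    ("KCNQ1", "SQTS"): {ALTERED},             # essentially one GoF missense variant
    ("SCN5A", "BrS"): {DECREASED, ALTERED},   # PTCs and LoF non-truncating variants
    ("SCN5A", "LQTS"): {ALTERED},             # GoF missense and inframe indels
}


def relevant(gene: str, disease: str, consequence: str) -> bool:
    """True if the consequence is disease-associated for this gene-disease pair."""
    return consequence in MECHANISMS.get((gene, disease), set())


# The same SCN5A PTC is relevant for BrS but not for LQTS.
print(relevant("SCN5A", "BrS", DECREASED), relevant("SCN5A", "LQTS", DECREASED))
```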
For certain gene-disease pairs, there are gene regions where there is a higher confidence for pathogenicity such as, for non-truncating variants, the transmembrane regions and C-terminus domains for KCNQ1-related LQTS [38, 39], and the ion channel transmembrane regions and specific N-terminus and C-terminus domains for KCNH2-related LQTS . There are other examples of mutational hotspots referenced in individual curations (see Additional file 2 or Additional file 3: Tables S6–S7).
CardiacG2P reduces the number of variants prioritised, without compromising sensitivity to detect true positives
We assessed variant filtering using the CardiacG2P dataset for the identification of known P/LP variants previously classified by the cardiovascular laboratory of the NHS Genomic Medicine Service South-East Genomics Laboratory Hub at the Royal Brompton Hospital, London. A total of 285 P/LP variants in 16 HCM/DCM genes were used to assess the performance of the CardiacG2P dataset compared with two generic pipelines (see Fig. 3A). CardiacG2P correctly retained 281/285 variants, a sensitivity of 98.6%. This was superior to both alternative approaches (pipeline 1, 272/285, sensitivity 95.4%, Fisher's exact P = 0.046; pipeline 2, 198/285, sensitivity 69.5%, Fisher's exact P ≤ 0.0001). Four variants were not retained using the CardiacG2P dataset: 1 TTN missense variant, and 2 intronic variants and 1 synonymous variant in LMNA. All four variants were classified as P/LP by the clinical laboratory because of their impact on splicing, so the limited sensitivity reflects incomplete upstream annotation of the variant consequence rather than an “error” in downstream filtering.
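The sensitivity figures and P values can be checked against the counts above. The sketch below implements a two-sided Fisher's exact test with the standard library, summing the hypergeometric probabilities of all tables with the same margins that are no more probable than the observed one. It assumes each comparison is a 2x2 table of retained versus missed variants for two pipelines, which is our reading of the text; under that reading the test gives values consistent with those reported.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Exact integer weights are used, so ties with the observed table are
    detected without floating-point tolerance.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    lo, hi = max(0, row1 + col1 - n), min(row1, col1)
    weights = {x: comb(col1, x) * comb(n - col1, row1 - x) for x in range(lo, hi + 1)}
    observed = weights[a]
    total = sum(weights.values())
    return sum(w for w in weights.values() if w <= observed) / total


TOTAL = 285  # P/LP variants in Set 1
retained = {"CardiacG2P": 281, "Pipeline 1": 272, "Pipeline 2": 198}
for name, kept in retained.items():
    print(f"{name}: sensitivity {100 * kept / TOTAL:.1f}%")

g2p = retained["CardiacG2P"]
for other in ("Pipeline 1", "Pipeline 2"):
    k = retained[other]
    p = fisher_exact_two_sided(g2p, TOTAL - g2p, k, TOTAL - k)
    print(f"CardiacG2P vs {other}: P = {p:.2g}")
```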
Assessing variant prioritisation—the number of variants retained for further analysis
We compared the number of variants retained by the 3 pipeline filters to assess the positive rate of each approach (see Fig. 3B). A pipeline with a high positive rate requires more downstream human effort for final variant adjudication.
First, we compared sequencing data (5681 unique variants) from 200 individuals with a confirmed diagnosis of HCM or DCM. CardiacG2P prioritised 67 variants, pipeline 1 prioritised 111 variants, and pipeline 2 prioritised 17 variants.
Since the cardiomyopathy cohort would be very substantially enriched for true positives, we also assessed the positive rate in a healthy cohort, indicative of variants that may require follow-up during opportunistic screening for secondary findings. Each pipeline was applied to 6060 unique variants found in 200 healthy volunteers: CardiacG2P prioritised 37 variants, pipeline 1 prioritised 73 variants, and pipeline 2 prioritised 3 variants.
Pipeline 2 prioritises the fewest variants in both contexts (17/5681 and 3/6060, respectively). This is to be expected, as it retains only high-impact LoF variants or variants classified as P/LP in ClinVar. However, this method also demonstrated the lowest sensitivity for P/LP variants (69.5%), because LoF is not a known mechanism for many of the ICC genes, and any pathogenic missense or other non-truncating variants not already classified as P/LP in ClinVar will be wrongly discarded by this method. In the disease cohort, CardiacG2P demonstrated more efficient variant prioritisation than pipeline 1, which retains all PAVs, retaining significantly fewer variants (67 vs. 111 variants, Fisher's exact P = 0.001). In the healthy cohort, where we would expect a higher number of false-positive variants to be prioritised, CardiacG2P retained half as many variants as pipeline 1 (37 vs. 73 variants, Fisher's exact P ≤ 0.001). CardiacG2P also maintained the highest sensitivity of the 3 pipelines, at 98.6%.
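The positive rates behind these comparisons are simple proportions of the unique variants in each cohort. The short calculation below restates the reported counts; the dictionary layout and the `positive_rate` helper are ours, for illustration.

```python
# Reported counts: (unique variants in cohort, variants retained per pipeline).
COHORTS = {
    "disease": (5681, {"CardiacG2P": 67, "Pipeline 1": 111, "Pipeline 2": 17}),
    "healthy": (6060, {"CardiacG2P": 37, "Pipeline 1": 73, "Pipeline 2": 3}),
}


def positive_rate(retained: int, total: int) -> float:
    """Fraction of all unique variants passed on for manual review."""
    return retained / total


for cohort, (total, kept) in COHORTS.items():
    for pipeline, n in kept.items():
        print(f"{cohort:8} {pipeline:11} {n:4d}/{total} = {100 * positive_rate(n, total):.2f}%")
    reduction = 1 - kept["CardiacG2P"] / kept["Pipeline 1"]
    print(f"{cohort:8} CardiacG2P retains {100 * reduction:.0f}% fewer variants than pipeline 1")
```

With these counts, CardiacG2P retains roughly 40% fewer variants than pipeline 1 in the disease cohort and roughly 49% fewer in the healthy cohort.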
Accurate variant classification in ICC genes requires a robust gene-disease relationship and knowledge of inheritance pattern, disease mechanism, and pathogenic variant classes. The literature is constantly expanding with newly reported variants and re-evaluations of historical variant classifications. In ClinVar alone, over 1 million variants have been submitted. Over 49,000 have conflicting interpretations, and others are submitted under multiple phenotypes, making the relevant disease for the variant classification unclear. Variant classification is expanding beyond laboratories with long-standing interest and expertise in cardiovascular genetics. The ACMG secondary findings list means that other laboratories will need to rapidly acquire proficiency in reporting variants in CV genes. The AHA has recently published guidance and a framework to aid the interpretation and clinical application of variants in monogenic cardiovascular disease genes. To assist this process, we have curated the mode of inheritance, allelic requirement, and disease-associated variant consequences for 65 ClinGen-curated ICC gene-disease pairs (48 unique genes) and, following review by multidisciplinary expert panels, present this information as a publicly available structured dataset, both here and via CardiacG2P (https://www.ebi.ac.uk/gene2phenotype/downloads), to aid variant analysis. This dataset is compatible with the existing G2P plugin for the widely used Ensembl Variant Effect Predictor.
Overall, for 36/65 gene-disease relationships, the disease is due to altered gene product sequence, not a decrease in gene product level. Therefore, for over half of the ICC gene-disease relationships evaluated here, current data caution against default prioritisation of predicted protein-truncating variants as pathogenic with LoF as the presumed mechanism. The majority of the ICC genes are characterised by autosomal dominant inheritance with incomplete penetrance; however, there are notable examples of autosomal recessive and X-linked inheritance and of more fully penetrant variants.
As well as the structured data, we have included narrative summaries to capture key notes that arose during evidence collection and expert discussion that may also aid variant filtering and interpretation. Throughout these discussions, several themes that relate to all the ICC genes emerged. It is widely accepted that ICC genes often display incomplete penetrance; however, given that most penetrance estimates have been made using cases, expert opinion and emerging evidence agree that overall penetrance may be lower than previously reported. This is particularly relevant and should be considered when assessing patients who have a pathogenic variant identified as a secondary finding outside of families with known disease [41, 42].
There are many examples of autosomal dominant ICC gene-disease relationships where compound heterozygous and homozygous variants, or variants in more than 1 known disease gene, are also reported. Approximately 10% of genotype-positive LQTS patients have >1 pathogenic variant in ≥1 LQTS-related gene [43, 44]. There was debate amongst the expert panel on how this should be recorded. In those instances where phenotypic features of people with biallelic variants are truly different to those with monoallelic variants (e.g. Jervell Lange-Nielsen Syndrome), this may represent true autosomal recessive or digenic inheritance and should be recorded as such. However, it was recognised that for many of the ICC genes, disease severity and penetrance are often the main distinguishing features between monoallelic and biallelic disease. In this circumstance, autosomal dominant inheritance is recorded with further information in the narrative summary acknowledging that if a second P/LP variant is identified, the disease often appears to be more penetrant and more severe [45,46,47,48] and can even lead to neonatal lethality.
It is important to interpret variants in the context of a gene-disease relationship rather than in the gene alone. There are several ICC genes implicated in more than one phenotype. For some, distinct mechanisms drive different diseases, e.g. MYH7-related HCM and MYH7-related DCM. Although both are caused primarily by missense variants in MYH7 altering the gene product sequence, distinct alleles have opposing effects on sarcomere force generation and drive different phenotypes [50, 51]. In contrast, although DSP is also associated with multiple phenotypes (including DCM, DCM with cutaneous features, ARVC, and Carvajal syndrome), these are overlapping and it does not appear that distinct mechanisms drive different presentations. Similarly, although the phenotype most frequently shown by patients with CALM pathogenic variants is LQTS, others display CPVT and sudden unexplained death and some CALM variants have been associated with both LQTS and CPVT, without evidence of distinct mechanisms underlying different phenotypic manifestations [49, 52].
Here we have evaluated CardiacG2P as a first-tier variant filter. This approach, which takes account of variant consequence and allelic requirement, increases the efficiency of variant prioritisation without compromising sensitivity, in comparison with two generic bioinformatic filtering pipelines (see Fig. 3). CardiacG2P retains significantly fewer variants than a pipeline where all PAVs are prioritised. The difference between CardiacG2P and the generic pipelines is even more marked in a healthy cohort, highlighting benefits in reducing the analytical burden of assessing secondary findings. Further refinement is also possible using additional variant information stored in the narrative summaries. CardiacG2P correctly identified 281/285 previously classified P/LP variants. The four variants that were not retained comprised 1 TTN missense variant, and 2 intronic variants and 1 synonymous variant in LMNA. All 4 variants were predicted to have a significant impact on splicing by SpliceAI. Functional data are available to support the splicing effect of 2 of the LMNA variants, and the TTN missense variant had previously been detected in 4 in-house DCM patients. CardiacG2P filters are based on the consequence assigned by VEP, and upstream annotation by VEP had not recorded these 4 variants as impacting splicing. Improvements in the prediction of variant consequence, especially for variants impacting splicing, will allow these to be retained. While our framework recognises that some intronic or coding variants can impact splicing, this is not an expected consequence for the vast majority of such variants, and therefore these will not be routinely retained. Rarely, pathogenic variants will be filtered out by G2P if the upstream consequence annotation is incomplete or incorrect, so we caution against simply discarding all non-prioritised variants, and tools for variant consequence annotation must continue to improve.
In the meantime, utilising tools such as SpliceAI and filtering on known P/LP variants in ClinVar will improve the identification of variants impacting splicing and the sensitivity of variant filtering pipelines.
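One way to act on this is a rescue step applied to variants that the G2P filter did not retain. The sketch below is hypothetical: the `Annotated` record and `retain` function are ours, SpliceAI delta scores (acceptor gain, acceptor loss, donor gain, donor loss) and a ClinVar significance are assumed to have been annotated already, and the cut-off of 0.2 is one of the thresholds suggested by the SpliceAI authors that favours recall over precision; the appropriate cut-off for clinical use is a separate decision.

```python
from dataclasses import dataclass
from typing import Optional

SPLICEAI_CUTOFF = 0.2  # assumed, recall-oriented threshold
CLINVAR_PLP = {"pathogenic", "likely_pathogenic", "pathogenic/likely_pathogenic"}


@dataclass
class Annotated:
    g2p_retained: bool            # outcome of the CardiacG2P filter
    ds_ag: float = 0.0            # SpliceAI delta score: acceptor gain
    ds_al: float = 0.0            # acceptor loss
    ds_dg: float = 0.0            # donor gain
    ds_dl: float = 0.0            # donor loss
    clinvar: Optional[str] = None # ClinVar clinical significance, lower case


def retain(v: Annotated) -> bool:
    """Keep G2P-retained variants, plus predicted splice-altering or ClinVar P/LP ones."""
    splice_score = max(v.ds_ag, v.ds_al, v.ds_dg, v.ds_dl)
    return v.g2p_retained or splice_score >= SPLICEAI_CUTOFF or v.clinvar in CLINVAR_PLP
```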
We recognise the limitation of using relatively small numbers of variants and patients from a single site to compare CardiacG2P with other methods. We also acknowledge that we have compared CardiacG2P with two generic pipelines rather than with a clinical diagnostic pipeline. However, we maintain that many clinical laboratories that do not specialise in cardiovascular disease will not have ready access to the expert knowledge collated here.
As our knowledge of the genes and specific variants contributing to ICCs expands, the CardiacG2P dataset can be updated dynamically and the new information incorporated into the VEP G2P plugin.
As variant reporting moves away from laboratories with expertise in specific disease areas, it is vital that accurate variant classification is maintained. Here, we present evidence-based curations of inheritance and variant consequence for robustly associated ICC genes, informed by expert review and opinion. We present these data for the first time in a structured format using new standardised terminology. The dataset, CardiacG2P, is a publicly available resource, and we have demonstrated its utility in filtering genomic variants in ICC genes.
Availability of data and materials
All data generated during this study are included in this published article. For convenience, a structured representation of the results is also available online through (i) G2P (https://www.ebi.ac.uk/gene2phenotype/downloads), which is also searchable through the GenCC portal (https://thegencc.org/); (ii) a publicly accessible GitHub repository (https://doi.org/10.5281/zenodo.8434146); and (iii) https://www.cardiodb.org/cardiac_g2p/Cardiac_G2P_Curations.html.
Abbreviations
Association for Clinical Genomic Science
ACMG SF V3.1: American College of Medical Genetics and Genomics Secondary Findings list
American Heart Association
Arrhythmogenic right ventricular cardiomyopathy
The Clinical Genome Resource
Catecholaminergic polymorphic ventricular tachycardia
Gene Curation Expert Panels
Gene Curation Coalition
Gain of function
Human phenotype ontology
Inherited cardiac conditions
Insertions or deletions
Jervell and Lange-Nielsen
Loss of function
Long QT syndrome
Left ventricular hypertrophy
Protein altering variants
Primary systemic carnitine deficiency
Percent spliced in
Premature termination codons
Single nucleotide variants
Short QT syndrome
Variant call format
Variant Effect Predictor
Musunuru K, Hershberger RE, Day SM, Klinedinst NJ, Landstrom AP, Parikh VN, et al. Genetic testing for inherited cardiovascular diseases: a scientific statement from the American Heart Association. Circ Genom Precis Med. 2020;13:373–85. https://doi.org/10.1161/HCG.0000000000000067.
Hershberger RE, Givertz MM, Ho CY, Judge DP, Kantor PF, McBride KL, et al. Genetic evaluation of cardiomyopathy: a clinical practice resource of the American College of Medical Genetics and Genomics (ACMG). Genet Med. 2018;20(9):899–909. https://doi.org/10.1038/s41436-018-0039-z.
Wilde AAM, Semsarian C, Márquez MF, Sepehri Shamloo A, Ackerman MJ, Ashley EA, et al. European Heart Rhythm Association (EHRA)/Heart Rhythm Society (HRS)/Asia Pacific Heart Rhythm Society (APHRS)/Latin American Heart Rhythm Society (LAHRS) Expert Consensus Statement on the State of Genetic Testing for Cardiac Diseases. Heart Rhythm. 2022;19(7):e1–60. https://doi.org/10.1016/J.HRTHM.2022.03.1225.
Landstrom AP, Chahal AA, Ackerman MJ, Cresci S, Milewicz DM, Morris AA, et al. Interpreting incidentally identified variants in genes associated with heritable cardiovascular disease: a scientific statement from the American Heart Association. Circ Genom Precis Med. 2023;16(2):e000092. https://doi.org/10.1161/HCG.0000000000000092.
Miller DT, Lee K, Abul-Husn NS, Amendola LM, Brothers K, Chung WK, et al. ACMG SF v3.1 list for reporting of secondary findings in clinical exome and genome sequencing: a policy statement of the American College of Medical Genetics and Genomics (ACMG). Genet Med. 2022;24(7):1407–14. https://doi.org/10.1016/j.gim.2022.04.006.
Green RC, Berg JS, Grody WW, Kalia SS, Korf BR, Martin CL, et al. ACMG Recommendations for Reporting of Incidental Findings in Clinical Exome and Genome Sequencing. Genet Med. 2013;15(7):565. https://doi.org/10.1038/GIM.2013.73.
DiStefano MT, Goehringer S, Babb L, Alkuraya FS, Amberger J, Amin M, et al. The Gene Curation Coalition: a global effort to harmonize gene-disease evidence resources. Genet Med. 2022;24(8):1732. https://doi.org/10.1016/J.GIM.2022.04.017.
DiStefano MT, Goehringer S, Babb L, Alkuraya FS, Amberger J, Amin M, et al. The GenCC database. https://search.thegencc.org/. Accessed 3 April 2022.
Roberts AM, DiStefano MT, Rooney Riggs E, Josephs KS, Alkuraya FS, Amberger J, et al. Towards robust clinical genome interpretation: developing a consistent terminology to characterize disease-gene relationships - allelic requirement, inheritance modes and disease mechanisms. MedRxiv. 2023. https://doi.org/10.1101/2023.03.30.23287948.
Thormann A, Halachev M, McLaren W, Moore DJ, Svinti V, Campbell A, et al. Flexible and scalable diagnostic filtering of genomic variants using G2P with Ensembl VEP. Nat Commun. 2019;10(1):2373. https://doi.org/10.1038/S41467-019-10016-3.
McLaren W, Gil L, Hunt SE, Riat HS, Ritchie GRS, Thormann A, et al. The Ensembl Variant Effect Predictor. Genome Biol. 2016;17(1):122. https://doi.org/10.1186/S13059-016-0974-4.
Lenassi E, Carvalho A, Thormann A, Abrahams L, Arno G, Fletcher T, et al. EyeG2P: an automated variant filtering approach improves efficiency of diagnostic genomic testing for inherited ophthalmic disorders. J Med Genet. 2023;1:0–9. https://doi.org/10.1136/jmedgenet-2022-108618.
Rehm HL, Berg JS, Brooks LD, Bustamante CD, Evans JP, Landrum MJ, et al. ClinGen — The Clinical Genome Resource. N Engl J Med. 2015;372(23):2235–42. https://doi.org/10.1056/NEJMsr1406261.
Clinical Genome Resource. Clinical Domain Working Groups. https://clinicalgenome.org/working-groups/clinical-domain/. Accessed 1 Nov 2020.
Adler A, Novelli V, Amin AS, Abiusi E, Care M, Nannenberg EA, et al. An international, multicentered, evidence-based reappraisal of genes reported to cause congenital long QT syndrome. Circulation. 2020;141(6):418–28. https://doi.org/10.1161/CIRCULATIONAHA.119.043132.
Hosseini SM, Kim R, Udupa S, Costain G, Jobling R, Liston E, et al. Reappraisal of reported genes for sudden arrhythmic death: evidence-based evaluation of gene validity for Brugada syndrome. Circulation. 2018;138(12):1195. https://doi.org/10.1161/CIRCULATIONAHA.118.035070.
Walsh R, Adler A, Amin AS, Abiusi E, Care M, Bikker H, et al. Evaluation of gene validity for CPVT and short QT syndrome in sudden arrhythmic death. Eur Heart J. 2021. https://doi.org/10.1093/EURHEARTJ/EHAB687.
James CA, Jongbloed JDH, Hershberger RE, Morales A, Judge DP, Syrris P, et al. International evidence based reappraisal of genes associated with arrhythmogenic right ventricular cardiomyopathy using the Clinical Genome Resource framework. Circ Genom Precis Med. 2021;14:273–84. https://doi.org/10.1161/CIRCGEN.120.003273.
Ingles J, Goldstein J, Thaxton C, Caleshu C, Corty EW, Crowley SB, et al. Evaluating the clinical validity of hypertrophic cardiomyopathy genes. Circ Genom Precis Med. 2019;12(2):57–64. https://doi.org/10.1161/CIRCGEN.119.002460.
Jordan E, Peterson L, Ai T, Asatryan B, Bronicki L, Brown E, et al. Evidence-based assessment of genes in dilated cardiomyopathy. Circulation. 2021;144(1):7–19. https://doi.org/10.1161/CIRCULATIONAHA.120.053033.
Eilbeck K, Lewis SE, Mungall CJ, Yandell M, Stein L, Durbin R, et al. The Sequence Ontology: a tool for the unification of genome annotations. Genome Biol. 2005;6(5):1–12. https://doi.org/10.1186/GB-2005-6-5-R44.
Gargano M, Matentzoglu N, Carmody LC, Lewis-Smith D, Vasilevsky NA, Danis D, et al. The Human Phenotype Ontology in 2021. Nucleic Acids Res. 2020;49(2):1207–17. https://doi.org/10.1093/nar/gkaa1043.
Landrum MJ, Lee JM, Benson M, Brown GR, Chao C, Chitipiralla S, et al. ClinVar: improving access to variant interpretations and supporting evidence. Nucleic Acids Res. 2018;46. https://doi.org/10.1093/nar/gkx1153.
Clinical Genome Resource. Gene-Disease Validity Training Materials - ClinGen | Clinical Genome Resource. https://clinicalgenome.org/curation-activities/gene-disease-validity/training-materials. Accessed 3 April 2022.
Karczewski KJ, Francioli LC, Tiao G, Cummings BB, Alföldi J, Wang Q, et al. The mutational constraint spectrum quantified from variation in 141,456 humans. Nature. 2020;581:434–43. https://doi.org/10.1038/s41586-020-2308-7.
Richards S, Aziz N, Bale S, Bick D, Das S, Gastier-Foster J, et al. Standards and guidelines for the interpretation of sequence variants: a joint consensus recommendation of the American College of Medical Genetics and Genomics and the Association for Molecular Pathology. Genet Med. 2015;17(5):405. https://doi.org/10.1038/GIM.2015.30.
Ellard S, Baple EL, Callaway A, Berry I, Forrester N, Turnbull C, et al. ACGS Best Practice Guidelines for Variant Classification in Rare Disease 2020 Recommendations ratified by ACGS Quality Subcommittee on 4 th. 2020; https://doi.org/10.1101/531210.
Roberts AM, Ware JS, Herman DS, Schafer S, Baksi J, Bick AG, et al. Integrated allelic, transcriptional, and phenomic dissection of the cardiac effects of titin truncations in health and disease. Sci Transl Med. 2015;7(270):270ra6. https://doi.org/10.1126/SCITRANSLMED.3010134.
Walsh R, Buchan R, Wilk A, John S, Felkin LE, Thomson KL, et al. Defining the genetic architecture of hypertrophic cardiomyopathy: re-evaluating the role of non-sarcomeric genes. Eur Heart J. 2017; ehw603. https://doi.org/10.1093/eurheartj/ehw603.
Schafer S, de Marvao A, Adami E, Fiedler LR, Ng B, Khin E, et al. Titin truncating variants affect heart function in disease cohorts and the general population. Nat Genet. 2017;49(1):46. https://doi.org/10.1038/NG.3719.
Morales A, Kinnamon DD, Jordan E, Platt J, Vatta M, Dorschner MO, et al. Variant interpretation for dilated cardiomyopathy (DCM): refinement of the ACMG/ClinGen Guidelines for the DCM Precision Medicine Study. Circ Genom Precis Med. 2020;13(2):e002480. https://doi.org/10.1161/CIRCGEN.119.002480.
Gerull B, Gramlich M, Atherton J, Mcnabb M, Trombitás K, Sasse-Klaassen S, et al. Mutations of TTN, encoding the giant muscle filament titin, cause familial dilated cardiomyopathy. Nat Genet. 2002;30. https://doi.org/10.1038/ng815.
Herrero Galán E. Conserved cysteines in titin sustain the mechanical function of cardiomyocytes. https://doi.org/10.1101/2020.09.05.282913.
Hastings R, de Villiers CP, Hooper C, Ormondroyd L, Pagnamenta A, Lise S, et al. Combination of whole genome sequencing, linkage, and functional studies implicates a missense mutation in titin as a cause of autosomal dominant cardiomyopathy with features of left ventricular noncompaction. Circ Cardiovasc Genet. 2016;9(5):426–35. https://doi.org/10.1161/CIRCGENETICS.116.001431.
Merner ND, Hodgkinson KA, Haywood AFM, Connors S, French VM, Drenckhahn JD, et al. Arrhythmogenic right ventricular cardiomyopathy type 5 is a fully penetrant, lethal arrhythmic disorder caused by a missense mutation in the TMEM43 gene. Am J Hum Genet. 2008;82(4):809. https://doi.org/10.1016/J.AJHG.2008.01.010.
Lee HC, Rudy Y, Liang H, Chen CC, Luo CH, Sheu SH, et al. Pro-arrhythmogenic effects of the V141M KCNQ1 mutation in short QT syndrome and its potential therapeutic targets: insights from modeling. J Med Biol Eng. 2017;37(5):780. https://doi.org/10.1007/S40846-017-0257-X.
Hong K, Piper D, Diazvaldecantos A, Brugada J, Oliva A, Burashnikov E, et al. De novo KCNQ1 mutation responsible for atrial fibrillation and short QT syndrome in utero. Cardiovasc Res. 2005;68(3):433–40. https://doi.org/10.1016/j.cardiores.2005.06.023.
Kapa S, Tester DJ, Salisbury BA, Harris-Kerr C, Pungliya MS, Alders M, et al. Genetic testing for long QT syndrome - distinguishing pathogenic mutations from benign variants. Circulation. 2009;120(18):1752. https://doi.org/10.1161/CIRCULATIONAHA.109.863076.
Walsh R, Lahrouchi N, Tadros R, Kyndt F, Glinge C, Postema PG, et al. Enhancing rare variant interpretation in inherited arrhythmias through quantitative analysis of consortium disease cohorts and population controls. Genet Med. 2021;23(1):47. https://doi.org/10.1038/S41436-020-00946-5.
Arbustini E, Behr ER, Carrier L, van Duijn C, Evans P, Favalli V, et al. Interpretation and actionability of genetic variants in cardiomyopathies: a position statement from the European Society of Cardiology Council on cardiovascular genomics. Eur Heart J. 2022;43(20):1901–16. https://doi.org/10.1093/EURHEARTJ/EHAB895.
Lorenzini M, Norrish G, Field E, Ochoa JP, Cicerchia M, Akhtar MM, et al. Penetrance of hypertrophic cardiomyopathy in sarcomere protein mutation carriers. J Am Coll Cardiol. 2020;76(5):550. https://doi.org/10.1016/J.JACC.2020.06.011.
de Marvao A, McGurk KA, Zheng SL, Thanaj M, Bai W, Duan J, et al. Phenotypic expression and outcomes in individuals with rare genetic variants of hypertrophic cardiomyopathy. J Am Coll Cardiol. 2021;78(11):1097–110. https://doi.org/10.1016/J.JACC.2021.07.017.
Tester DJ, Will ML, Haglund CM, Ackerman MJ. Compendium of cardiac channel mutations in 541 consecutive unrelated patients referred for long QT syndrome genetic testing. Heart Rhythm. 2005. https://doi.org/10.1016/j.hrthm.2005.01.020.
Kapplinger JD, Tester DJ, Salisbury BA, Carr JL, Harris-Kerr C, Pollevick GD, et al. Spectrum and prevalence of mutations from the first 2,500 consecutive unrelated patients referred for the FAMILION® long QT syndrome genetic test. Heart Rhythm. 2009;6(9):1297. https://doi.org/10.1016/J.HRTHM.2009.05.021.
Bhonsale A, Groeneweg JA, James CA, Dooijes D, Tichnell C, Jongbloed JDH, et al. Impact of genotype on clinical course in arrhythmogenic right ventricular dysplasia/cardiomyopathy-associated mutation carriers. Eur Heart J. 2015;36:847–55. https://doi.org/10.1093/eurheartj/ehu509.
Kolokotronis K, Kühnisch J, Klopocki E, Dartsch J, Rost S, Huculak C, et al. Biallelic mutation in MYH7 and MYBPC3 leads to severe cardiomyopathy with left ventricular noncompaction phenotype. Hum Mutat. 2019;40:1101–14. https://doi.org/10.1002/humu.23757.
Alders M, Bikker H, Christiaans I. Long QT syndrome. 2003. https://www.ncbi.nlm.nih.gov/books/ .
Girolami F, Ho CY, Semsarian C, Baldi M, Will ML, Baldini K, et al. Clinical features and outcome of hypertrophic cardiomyopathy associated with triple sarcomere protein gene mutations. J Am Coll Cardiol. 2010;55(14):1444–53. https://doi.org/10.1016/J.JACC.2009.11.062.
Thaxton C, Goldstein J, DiStefano M, Wallace K, Witmer PD, Haendel MA, et al. Lumping versus splitting: how to approach defining a disease to enable accurate genomic curation. Cell Genom. 2022;2(5):100131. https://doi.org/10.1016/J.XGEN.2022.100131.
Ujfalusi Z, Vera CD, Mijailovich SM, Svicevic M, Yu EC, Kawana M, et al. Dilated cardiomyopathy myosin mutants have reduced force-generating capacity. J Biol Chem. 2018;293(23):9017. https://doi.org/10.1074/JBC.RA118.001938.
Sommese RF, Sung J, Nag S, Sutton S, Deacon JC, Choe E, et al. Molecular consequences of the R453C hypertrophic cardiomyopathy mutation on human β-cardiac myosin motor function. Proc Natl Acad Sci USA. 2013;110(31):12607–12. https://doi.org/10.1073/PNAS.1309493110.
Crotti L, Spazzolini C, Tester DJ, Ghidoni A, Baruteau AE, Beckmann BM, et al. Calmodulin mutations and life-threatening cardiac arrhythmias: insights from the International Calmodulinopathy Registry. Eur Heart J. 2019;40(35):2964. https://doi.org/10.1093/EURHEARTJ/EHZ311.
Jaganathan K, Kyriazopoulou Panagiotopoulou S, McRae JF, Darbandi SF, Knowles D, Li YI, et al. Predicting splicing from primary sequence with deep learning. Cell. 2019;176(3):535-548.e24. https://doi.org/10.1016/J.CELL.2018.12.015.
The following authors have taken part in the ClinGen Cardiovascular Clinical Domain Working Group https://clinicalgenome.org/working-groups/clinical-domain/cardiovascular/ and/or are members of a ClinGen Gene Curation Expert Panel (GCEP) affiliated to this working group: Roddy Walsh, Matthew Edwards, Courtney Thaxton, Melanie Care, Wojciech Zareba, Arnon Adler, Amy C. Sturm, Valeria Novelli, Emma Owens, Lucas Bronicki, Olga Jarinova, Bert Callewaert, Stacey Peters, Tom Lumbers, Elizabeth Jordan, Babken Asatryan, Neesha Krishnan, Ray E. Hershberger, C. Anwar A. Chahal, Andrew P. Landstrom, Cynthia James, Elizabeth M. McNally, Daniel P. Judge, Peter van Tintelen, Arthur Wilde, Michael Gollob, Jodie Ingles, and James S. Ware.
JSW was supported by the Sir Jules Thorn Trust [21JTA], Wellcome Trust [107469/Z/15/Z; 200990/A/16/Z], Medical Research Council (UK), British Heart Foundation [RE/18/4/34215], NHLI Foundation Royston Centre for Cardiomyopathy Research, and the NIHR Imperial College Biomedical Research Centre. KSJ was supported by the Wellcome Trust [222883/Z/21/Z]. AMR was supported by the British Heart Foundation Fellowship [FS/CRLF/21/23011]. PT was supported by the Wellcome Trust [200990/A/16/Z]. This publication was supported in part by the National Human Genome Research Institute of the National Institutes of Health through the following grants: U24HG009650. AW and PvT are supported by CVON/Dutch Heart Foundation PREDICT2 (2018-30); RT is supported by the Canada Research Chairs program; TL receives support from BHF Research Accelerator; BC is a Senior Clinical Investigator of the Research Foundation – Flanders; EMM is supported by NIH HL128075, American Heart Association.
For the purpose of open access, the authors have applied a CC BY public copyright licence to any Author Accepted Manuscript version arising from this submission.
The views expressed in this work are those of the authors and not necessarily those of the funders.
Ethics approval and consent to participate
No individual patient data are reported.
Royal Brompton and Harefield Hospitals Cardiovascular Research Biobank participants provided written informed consent, HRA research ethics approval: South Central Hampshire B Research Ethics Committee 19/SC/0257. Healthy volunteers in the digital heart project provided written informed consent, HRA research ethics committee approval: London – West London and GTAC Research Ethics Committee 09/H0707/69. The research conformed to the principles of the Helsinki Declaration.
Consent for publication
Competing interests
EMM is a Consultant for Amgen, AstraZeneca, Avidity Biosciences, Cytokinetics, PepGen, Pfizer, Stealth Biotherapeutics, and Tenaya Therapeutics and founder of Ikaika Therapeutics. CJ is a Consultant for Pfizer Inc (paid), StrideBio Inc (unpaid), and Tenaya Inc (unpaid). TL has research grant support from Pfizer. DPJ is a Consultant for Alexion, Alleviant, Cytokinetics, Novo Nordisk, Pfizer, and Tenaya Therapeutics. JI has research grant support from Bristol Myers Squibb. JSW has received research support or consultancy fees from Myokardia, Bristol-Myers Squibb, Pfizer, and Foresite Labs. The other authors declare that they have no competing interests.
Standard operating procedure for gene-disease curations. This document provides a template and standard operating procedure for curating inheritance, allelic requirement and disease mechanism, using standardised terminology, for gene-disease pairs already curated by ClinGen.
Inheritance and mechanism curation summaries for all gene-disease pairs. Curation data for each gene-disease pair are presented in individual tables, each with a narrative summary of the key messages from the expert review and the relevant publication identifiers.
Table S1. Curation of syndromic forms of (hypertrophic) cardiomyopathy that can present with isolated left ventricular hypertrophy: structured representation of inheritance, allelic requirement, disease-associated variant consequence, and variant classes reported with evidence of pathogenicity for each gene-disease pair. Tables S2–S5. Details of the filtering process of each pipeline for the 3 datasets (Table S2, Set 1; Table S3, Set2a; Table S4, Set2b). Demographics of the cohorts used in Set2a and Set2b are given in Table S5. Tables S6–S8. The information presented in Additional File 2, in xls format. Table S6 (CardiacG2P) contains a structured representation of inheritance and mechanism data for all curated gene-disease pairs; it also includes information for 7 genes related to a syndrome in which LVH is seen only with overt syndromic features. Table S7 (Narr_sum) contains narrative summaries for each gene-disease pair as plain free text. Table S8 (Other_limited) lists gene-disease pairs with no established relationship (per the ClinGen gene-disease validity assertion); these are included for completeness.
Cite this article
Josephs, K.S., Roberts, A.M., Theotokis, P. et al. Beyond gene-disease validity: capturing structured data on inheritance, allelic requirement, disease-relevant variant classes, and disease mechanism for inherited cardiac conditions. Genome Med 15, 86 (2023). https://doi.org/10.1186/s13073-023-01246-8