ClinGen Pathogenicity Calculator: a configurable system for assessing pathogenicity of genetic variants
- Ronak Y. Patel†1Email author,
- Neethu Shah†1,
- Andrew R. Jackson1,
- Rajarshi Ghosh1,
- Piotr Pawliczek1,
- Sameer Paithankar1,
- Aaron Baker1,
- Kevin Riehle1,
- Hailin Chen1,
- Sofia Milosavljevic1,
- Chris Bizon2,
- Shawn Rynearson3,
- Tristan Nelson4,
- Gail P. Jarvik9,
- Heidi L. Rehm5, 6,
- Steven M. Harrison5,
- Danielle Azzariti5,
- Bradford Powell7,
- Larry Babb8,
- Sharon E. Plon1,
- Aleksandar Milosavljevic1Email author and
- on behalf of the ClinGen Resource
© The Author(s). 2017
Received: 17 August 2016
Accepted: 7 December 2016
Published: 12 January 2017
The success of the clinical use of sequencing based tests (from single gene to genomes) depends on the accuracy and consistency of variant interpretation. Aiming to improve the interpretation process through practice guidelines, the American College of Medical Genetics and Genomics (ACMG) and the Association for Molecular Pathology (AMP) have published standards and guidelines for the interpretation of sequence variants. However, manual application of the guidelines is tedious and prone to human error. Web-based tools and software systems may not only address this problem but also document reasoning and supporting evidence, thus enabling transparency of evidence-based reasoning and resolution of discordant interpretations.
In this report, we describe the design, implementation, and initial testing of the Clinical Genome Resource (ClinGen) Pathogenicity Calculator, a configurable system and web service for the assessment of pathogenicity of Mendelian germline sequence variants. The system allows users to enter the applicable ACMG/AMP-style evidence tags for a specific allele with links to supporting data for each tag and generate guideline-based pathogenicity assessment for the allele. Through automation and comprehensive documentation of evidence codes, the system facilitates more accurate application of the ACMG/AMP guidelines, improves standardization in variant classification, and facilitates collaborative resolution of discordances. The rules of reasoning are configurable with gene-specific or disease-specific guideline variations (e.g. cardiomyopathy-specific frequency thresholds and functional assays). The software is modular, equipped with robust application program interfaces (APIs), and available under a free open source license and as a cloud-hosted web service, thus facilitating both stand-alone use and integration with existing variant curation and interpretation systems. The Pathogenicity Calculator is accessible at http://calculator.clinicalgenome.org.
By enabling evidence-based reasoning about the pathogenicity of genetic variants and by documenting supporting evidence, the Calculator contributes toward the creation of a knowledge commons and more accurate interpretation of sequence variants in research and clinical care.
KeywordsGenome sequencing Exome sequencing Clinical genome sequencing Clinical exome sequencing ACMG guidelines ClinVar ClinGen Clinical Genome Resource Knowledge commons Data commons Data sharing Linked Data Big Data
Successful clinical diagnostic application of exome and genome sequencing hinges on the ability of clinical laboratories to consistently and accurately assess pathogenicity of genetic variants. Several studies have identified inconsistencies in variant classification across laboratories, reference databases, and even within individual databases [1–4]. Interpretation requires the use of professional judgment and non-standardized evidence types, thus it is unrealistic to expect interpretation concordance of all variants across all laboratories and data sources. However, it is anticipated that knowledge sharing, standardization of best practices, and transparent documentation of evidence-based reasoning will have significant positive impact on both consistency and accuracy.
To improve the standardization of variant interpretation by clinical testing laboratories, the American College of Medical Genetics (ACMG) and the Association of Molecular Pathologists (AMP) have recently updated guidelines for the interpretation of sequence variants . The new guidelines provide significantly more information about the types and weight of evidence used to classify variants compared to the prior guidelines . Overall, they define 28 different evidence codes capturing different types of evidence for or against pathogenicity of a variant, only a subset of codes typically being applicable for a specific variant. There are 20 rules for combining the evidence codes for a given variant in order to reach one of five conclusions about pathogenicity (Pathogenic, Likely Pathogenic, Uncertain Significance, Likely Benign, or Benign).
The formal structuring of evidence-based arguments for or against pathogenicity and their implementation in software provide several opportunities for enhancing the emerging knowledge of genetic variation. One such opportunity is efficient evidence tracking that facilitates critical evaluation of variant classifications. Recent studies [1, 4] suggests that a framework for organizing and communicating evidential support and the reasoning behind specific assessments of variant pathogenicity facilitates resolution of inter-laboratory discordances. However, currently available software systems and web services provide only rudimentary support for guideline-based reasoning  and do not facilitate full evidence tracking or collaborative resolution of discordances. They also do not enable guideline configuration to meet ACMG guideline specification and customization for specific genes and diseases, nor integration with existing variant curation systems. We here address these gaps by describing the design, development, and application of the ClinGen Pathogenicity Calculator, thus fostering the improvement of clinical classification of variants, a key goal of the NIH supported Clinical Genome Resource (ClinGen) .
The main interface
Allele registration and data linking
As illustrated in Additional file 2: Figure S2, the Calculator utilizes the services of the ClinGen Allele Registry to identify a unique variant and collates information about it from external sources to aid in classification. The registry accepts wide variety of Human Genome Variation Society (HGVS) recommended expressions for substitution, insertion, deletion, and indels . It also supports query using HUGO Gene Nomenclature Committee (HGNC) gene identifiers. If an allele is not present in the Registry, the Calculator may register the new allele within seconds upon request by an authenticated user. More information about the ClinGen Allele Registry is available at http://reg.clinicalgenome.org (manuscript in preparation).
The registry enables linking and viewing of pathogenicity assessments that are independently generated by different users (displayed under different tabs in the second panel in Fig. 1) as well as linking of relevant external supporting information about the variant. The current set of externally linked information includes ClinVar , dbSNP , dbNSFP , COSMIC , ExAC , and myvariant.info .
Modeling and storage of semi-quantitative guidelines and assessments
For the purpose of modeling and storing configurable semi-quantitative guidelines and assessments for individual variants, the Calculator utilizes GenboreeKB, a system for document-oriented modeling and storage. The information is stored within four document collections within GenboreeKB. The first two collections store semi-quantitative guidelines such as the ACMG/AMP guidelines using one document collection for guideline rules and metarules and another for evidence codes. The third document collection defines the cell (row and column) for each evidence tag within the evidence grid (fourth panel in Fig. 1 or alternatively Fig. 1d). The fourth document collection stores the complete pathogenicity assessment information for individual variants.
The reasoning engine
One core module of the Pathogenicity Calculator is a generic reasoner written in the R programming language. The reasoner takes the guidelines (defined using rules, metarules, and evidence tags) and aggregated evidence for the variant being pathogenic or benign (encoded as a set of evidence tags) and produces a conclusion (Fig. 1). The reasoner provides documentation of reasoning for each conclusion and identifies conclusions that are a few evidence tags away from being satisfied. The output of the reasoner is a JSON document that is used by the Pathogenicity Calculator interface to display the conclusion, the evidence codes, and any other conclusions that are close to being reached.
Support for collaboration
Data modeling, storage, and application program interfaces
GenboreeKB utilizes MongoDB for data storage and exposes in an access-controlled way. The complete data contents of a Calculator is exposed as JSON via Genboree REST HTTP application program interfaces (APIs). GenboreeKB stores both data models and the data itself as document collections and exposes them via the APIs.
Integration with existing variant curation and evaluation tools and systems
Numerous tools and systems for variant curation and evaluation have been developed and used by various laboratories. Recognizing that some of those tools may benefit from the functionalities of the Calculator, all the data and functionality of the Calculator are programmatically accessible via REST APIs. The Calculator may be used as a web service (with authentication and collaborative features such as group-based access control) or by installing it on local hardware.
The original ACMG/AMP guidelines can be translated into an algorithm with only modest adaptations
Two adaptations of the original guidelines had to be introduced to facilitate formal algorithmic reasoning. First, several metarules were introduced to capture implicit “common sense” rules practitioners were expected to apply without explicit instruction. For example, if there are two strong and three moderate-strength evidence tags in favor of pathogenicity, then the variant can be pathogenic (≥2 strong evidence tag) or likely pathogenic (≥3 moderate-strength evidence tags). In such a case the metarules specify that the variant is pathogenic. Conversely, evidence tags that can be combined to yield both Likely Benign and Benign are scored as Benign. The Calculator distinguishes between uncertain significance that arises due to insufficient evidence from uncertain significance due to conflicting evidence.
Second, the guidelines allow the practitioners to change the strength level originally assigned to certain tags based on perceived strength of supporting data for the tag (“To provide critical flexibility to variant classification, some criteria listed as one weight can be moved to another weight using professional judgment, depending on the evidence collected,” from ). To accommodate this flexibility within the formal algorithmic reasoning system of the Calculator, multiple new tags were defined for each original tag where the new tags are identical except that their evidence strength was formally redefined. For example, the original “PVS1” tag constitutes “Very Strong” evidence for pathogenicity while the newly defined “PVS1-Strong” tag is defined as the same type of evidence except that it constitutes only “Strong” evidence for pathogenicity.
The calculator reduces human error when applying variant assessment guidelines
The ACMG/AMP guidelines and the Calculator were tested in the context of a multi-site study  involving nine molecular diagnostic laboratories involved in the Clinical Sequencing Exploratory Research (CSER) consortium and 99 variants that were evaluated by multiple laboratories who applied the ACMG/AMP rules. The Calculator helped identify human error when these groups applied the guidelines. The Calculator identified that for 16 of 353 (5%) of the variant assessments, the ACMG/AMP evidence codes listed by the participating laboratory did not support the conclusion provided by the laboratory; of these, nine (2.5%) of the 353 were errors. The remaining seven were cases when the labs used professional judgment to override the ACMG/AMP rules.
The user can then add evidence tags using one of two available interfaces. A click on an individual cell in Panel 4 activates a new panel where one or more tags that belong to that cell may be applied (Fig. 2b). A user may enter an optional free-text summary and may also associate evidence tags with multiple links to supporting data (Fig. 2c). Figure 2d shows a snapshot of a panel to manage tags using an alternative interface. This panel lists all possible tags for a selected type of variant interpretation data (e.g. population, functional, etc.). This interface is useful when the user does not know the exact location of a tag within the evidence grid (fourth panel in Fig. 1). Immediately upon addition or deletion of a tag (using any of the two interfaces), conclusions in the third panel (Fig. 1) are updated. The conclusions reached based on current tag selection appear highlighted at the top of this panel. A click on the icon to the right of this conclusion highlights the columns in the evidence grid (Additional file 1: Figure S1) that contain supporting evidence for the conclusion. In addition, potential conclusions that are close to being reached are listed, together with the total number of tags that need to be turned on for a conclusion to be reached. A click on the icon to the right of such conclusions highlights the columns in the evidence grid where the missing evidence tags reside (Additional file 1: Figure S1).
Advanced users (application developers) can access the functionalities of the Calculator via Genboree REST-APIs. Using the services, users can upload evidence tags for several variants at a time. The uploaded data become accessible through the interface as well as through REST-APIs. In the future, the bulk upload functionality will be made accessible even for regular users without any background in computer programming.
The calculator facilitates critical evaluation of pathogenicity assertions
The calculator helps detect and resolve discordant conclusions
Recent studies indicate that the initial application of the ACMG/AMP guidelines did not increase concordance in laboratory assessment ; however, it did help facilitate resolution of the inter-laboratory discordances by providing a framework for organizing and communicating evidential support behind independent assessments of the same variant. To facilitate this process, the ability of the Calculator to track the reasoning and evidence discussed above is necessary but not sufficient. Two more features are essential. First, independent assessments of the same allele must be standardized, as different labs/groups may be using different transcripts or genome builds. This feature is accomplished through the services of ClinGen Allele registry (top panel, Fig. 1). Second, public or group-based sharing of assessments is critical, which is accomplished by using group access management features of the Genboree system. As a result, for each assessment for a given allele that is either owned by a user or available to him/her for public or group-based sharing, the user sees separate tabs in the second panel (Fig. 1) of the Calculator. By clicking on the tabs, the user may access complete information about other assessments (in panels 3 and 4, Fig. 1, Additional file 1: Figure S1 and Additional file 3: Figure S3). This group-based comparison of variant interpretation can also be used for proficiency testing challenges where different sites can be asked to interpret the same variant. The interpretation at the end of a challenge can then be compared to evaluate the degree of overall concordance and identify specific discordances.
The Calculator does not share information about pathogenicity of variant publicly without explicit user request. The information entered is protected using a login and password. It provides optional free-text fields summarizing tag assignments and links to supporting evidence. The Calculator does not require or have fields for providing identifying information about patients. Public sharing per user request will be enabled in future releases.
The calculator guidelines are configurable without programming to accommodate guideline evolution and customizations for specific conditions
Although ACMG/AMP guidelines provide a general framework for Mendelian sequence variant interpretation, it was always recognized that this framework would evolve and be customized to meet requirements for specific genes and disease conditions [2, 16–19]. To address these anticipated needs, the Calculator is designed to support guideline evolution and customization without programming. By using a lower-level GenboreeKB interface (not shown), an “admin” user may fully configure the Calculator by editing the following three structured documents: (1) guideline rules and metarules; (2) evidence codes and their assignments to the columns and rows of the evidence grid (fourth panel in Fig. 1); and (3) the rows and columns of the evidence grid itself. These three documents completely define a formal “guideline” used by the Calculator. Considering this configurability, it is possible to customize the Calculator for specific purposes (e.g. to implement guidelines for somatic variant interpretation and/or multifactorial inheritance).
While each assessment refers to a specific guideline, different guidelines may exist within the same system. For example, for the purpose of assessing variants within genes causing inborn errors of metabolism, metabolite levels could be used as an evidence type in pathogenicity assessments. Because the original ACMG/AMP guidelines do not specify this type of evidence or potentially others that will be added in the future, we have designed flexibility into the Calculator to allow for addition of new evidence types (new rows in the evidence grid) and corresponding evidence tags and of new rules without programming.
By automating the formal guideline-based assessment of genetic variants, the ClinGen Pathogenicity Calculator can reduce human error. The Calculator extends functionality of other currently available calculators  along several dimensions. First, it enables critical evaluation of pathogenicity assessments by documenting the complete reasoning from conclusions to relevant rules, evidence tags and supporting data needed to reach a more definitive conclusion (e.g. from likely pathogenic to pathogenic; Additional file 1: Figure S1). Second, the Calculator facilitates collaborative resolution of discordances by linking multiple independent assessments of the same variant and by enabling their group-based sharing. Third, by being configurable, the Calculator accommodates evolution of ACMG/AMP guidelines and their customizations for specific genes and diseases [1, 16–18]. Fourth, the modular reasoning engine facilitates integration of more quantitative guidelines such as those based on Bayesian inference that are used by the community for the assessment of some cancer predisposition genes . Finally, the Calculator supports programmatic integration of rule-based reasoning into existing variant interpretation and curation systems by providing robust APIs implemented using free open source software. Efforts are under way to integrate one or more functionalities of the Calculator within the variant curation interface that is under development by the ClinGen Resource.
By providing an intuitive user interface and by addressing other key computational requirements, the Calculator facilitates formal and collaborative evidence-based reasoning about the pathogenicity of genetic variants, thus paving the way toward more effective and accurate clinical genomic interpretation.
The authors are thankful to Selina Dwight, Matthew Wright, Tam Sneddon, and Erin Riggs for their discussions and contributions.
This work was funded by National Human Genome Research Institute (NHGRI) under award numbers U01 HG007436-01 and U41HG006834. The ClinGen Consortium is funded by the NHGRI, in conjunction with additional funding from the National Institute of Child Health and Development (NICHD) and National Cancer Institute (NCI), through the following grants and contracts: 1U41HG006834-01A1, 1U01HG007437-01 1U01HG007436-01 and HHSN261200800001E. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health. Data provided by the Clinical Sequencing Exploratory Research bake-off were supported by NIH NHGRI grants U01HG006500, U01HG006507, and U01HG007307.
Availability of data and materials
The Calculator software is modular and available under a free open source license as part of Genboree system distribution and as a service hosted on the commercial cloud (currently hosted at Rackspace, Inc.), thus facilitating both stand-alone use and integration with existing variant curation and interpretation systems.
The web services provided by ClinGen Pathogenicity Calculator are available at http://calculator.clinicalgenome.org. The Calculator is available under a free open source license as part of the Genboree server distribution http://www.genboree.org/site/software.
RYP, SEP, and, AM wrote the manuscript. AM, RYP, and ARJ conceptualized and designed the functioning of the Calculator. RYP, NS, SP, ARJ, AB, and KR coded various aspects of the Calculator. PP helped in deployment. SH, RG, and SEP helped in the interpretation of ACMG/AMP guidelines for germline sequence variants and also provided suggestions for Calculator improvements. HR and GJ provided data from CSER bake-off projects and provided a significant contribution to improving the draft of the manuscript. HC implemented a preliminary version of the reasoning engine. HR, SH, LB, CB, TN, RG, BP, SM, PP, and DA contributed to the testing, refinement, and/or conceptual design of the Pathogenicity Calculator and/or ClinGen allele registry. All authors read and approved the final manuscript.
Baylor College of Medicine (BCM) and Miraca Holdings, Inc. have formed a joint venture with shared ownership and governance of the Baylor Genetics, which performs exome sequencing. Dr. Plon is an employee of BCM and serves on the Scientific Advisory board of Baylor Genetics. Dr. Milosavljevic is an employee of BCM and performs integration consulting services for BCM-developed software including Genboree through IP Genesis, Inc. Mr. Babb is an employee of GeneInsight a Sunquest Information Systems Company, which offers a full line of laboratory software solutions including but not limited to GeneInsight. The remaining authors declare that they have no competing interests.
Consent for publication
Ethics approval and consent to participate
Open AccessThis article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
- Amendola LM, Jarvik GP, Leo MC, McLaughlin HM, Akkari Y, Amaral MD, et al. Performance of ACMG-AMP variant-interpretation guidelines among nine laboratories in the Clinical Sequencing Exploratory Research Consortium. Am J Hum Genet. 2016;98(6):1067–76.View ArticlePubMedPubMed CentralGoogle Scholar
- Pepin MG, Murray ML, Bailey S, Leistritz-Kessler D, Schwarze U, Byers PH. The challenge of comprehensive and consistent sequence variant interpretation between clinical laboratories. Genet Med. 2016;18(1):20–4.View ArticlePubMedGoogle Scholar
- Vail PJ, Morris B, van Kan A, Burdett BC, Moyes K, Theisen A, et al. Comparison of locus-specific databases for BRCA1 and BRCA2 variants reveals disparity in variant classification within and among databases. J Community Genet. 2015;6(4):351–9.View ArticlePubMedPubMed CentralGoogle Scholar
- Harrison S, Dolinsky J, Vincent L, Chao E, Azzariti D, Das S, et al. Clinical laboratories implement the ACMG/AMP guidelines to resolve differences in variant interpretations submitted to ClinVar. In: ACMG Annual Clinical Genetics Meeting. Tampa, Florida; 2016. http://epostersonline.com/acmg2016/node/624.Google Scholar
- Richards S, Aziz N, Bale S, Bick D, Das S, Gastier-Foster J, et al. Standards and guidelines for the interpretation of sequence variants: a joint consensus recommendation of the American College of Medical Genetics and Genomics and the Association for Molecular Pathology. Genet Med. 2015;17(5):405–24.View ArticlePubMedPubMed CentralGoogle Scholar
- Richards CS, Bale S, Bellissimo DB, Das S, Grody WW, Hegde MR, et al. ACMG recommendations for standards for interpretation and reporting of sequence variations: Revisions 2007. Genet Med. 2008;10(4):294–300.View ArticlePubMedGoogle Scholar
- Kleinberger J, Maloney KA, Pollin TI, Jeng LJ. An openly available online tool for implementing the ACMG/AMP standards and guidelines for the interpretation of sequence variants. Genet Med. 2016;18(11):1165.View ArticlePubMedGoogle Scholar
- Rehm HL, Berg JS, Brooks LD, Bustamante CD, Evans JP, Landrum MJ, et al. ClinGen--the clinical genome resource. N Engl J Med. 2015;372(23):2235–42.View ArticlePubMedPubMed CentralGoogle Scholar
- den Dunnen JT, Dalgleish R, Maglott DR, Hart RK, Greenblatt MS, McGowan-Jordan J, et al. HGVS recommendations for the description of sequence variants: 2016 update. Hum Mutat. 2016;37(6):564–9.View ArticleGoogle Scholar
- Landrum MJ, Lee JM, Benson M, Brown G, Chao C, Chitipiralla S, et al. ClinVar: public archive of interpretations of clinically relevant variants. Nucleic Acids Res. 2016;44(D1):D862–8.View ArticlePubMedGoogle Scholar
- Sherry ST, Ward MH, Kholodov M, Baker J, Phan L, Smigielski EM, et al. dbSNP: the NCBI database of genetic variation. Nucleic Acids Res. 2001;29(1):308–11.View ArticlePubMedPubMed CentralGoogle Scholar
- Liu X, Jian X, Boerwinkle E. dbNSFP v2.0: a database of human non-synonymous SNVs and their functional predictions and annotations. Hum Mutat. 2013;34(9):E2393–402.View ArticlePubMedPubMed CentralGoogle Scholar
- Forbes SA, Beare D, Gunasekaran P, Leung K, Bindal N, Boutselakis H, et al. COSMIC: exploring the world’s knowledge of somatic mutations in human cancer. Nucleic Acids Res. 2015;43(Database issue):D805–11.View ArticlePubMedGoogle Scholar
- Lek M, Karczewski KJ, Minikel EV, Samocha KE, Banks E, Fennell T, et al. Analysis of protein-coding genetic variation in 60,706 humans. Nature. 2016;536(7616):285–91.View ArticlePubMedGoogle Scholar
- Xin J, Mark A, Afrasiabi C, Tsueng G, Juchler M, Gopal N, et al. High-performance web services for querying gene and variant annotation. Genome Biol. 2016;17(1):91.View ArticlePubMedPubMed CentralGoogle Scholar
- Vincent L. Optimization and utilization of ACMG variant classification criteria for the RASopathies: A ClinGen initiative. In: American Society of Human Genetics. 2015Google Scholar
- Yang Y. Adaptation of the ACMG/AMP standards and guidelines for variant interpretation: experience within a clinical laboratory. In: American Society of Human Genetics. 2015.Google Scholar
- Jarvik GP, Browning BL. Consideration of cosegregation in the pathogenicity classification of genomic variants. Am J Hum Genet. 2016;98(6):1077–81.View ArticlePubMedPubMed CentralGoogle Scholar
- Maxwell KN, Hart SN, Vijai J, Schrader KA, Slavin TP, Thomas T, et al. Evaluation of ACMG-guideline-based variant classification of cancer susceptibility and non-cancer-associated genes in families affected by breast cancer. Am J Hum Genet. 2016;98(5):801–17.View ArticlePubMedPubMed CentralGoogle Scholar
- Iversen Jr ES, Couch FJ, Goldgar DE, Tavtigian SV, Monteiro AN. A computational method to classify variants of uncertain significance using functional assay data with application to BRCA1. Cancer Epidemiol Biomark Prev. 2011;20(6):1078–88.View ArticleGoogle Scholar