- Open Access
Systems genetics analysis identifies calcium-signaling defects as novel cause of congenital heart disease
Genome Medicine volume 12, Article number: 76 (2020)
Congenital heart disease (CHD) occurs in almost 1% of newborn children and is considered a multifactorial disorder. CHD may segregate in families due to significant contribution of genetic factors in the disease etiology. The aim of the study was to identify pathophysiological mechanisms in families segregating CHD.
We used whole exome sequencing to identify rare genetic variants in ninety consenting participants from 32 Danish families with recurrent CHD. We applied a systems biology approach to identify developmental mechanisms influenced by accumulation of rare variants. We used an independent cohort of 714 CHD cases and 4922 controls for replication and performed functional investigations using zebrafish as in vivo model.
We identified 1785 genes, in which rare alleles were shared between affected individuals within a family. These genes were enriched for known cardiac developmental genes, and 218 of these genes were mutated in more than one family. Our analysis revealed a functional cluster, enriched for proteins with a known participation in calcium signaling. Replication in an independent cohort confirmed increased mutation burden of calcium-signaling genes in CHD patients. Functional investigation of zebrafish orthologues of ITPR1, PLCB2, and ADCY2 verified a role in cardiac development and suggests a combinatorial effect of inactivation of these genes.
The study identifies abnormal calcium signaling as a novel pathophysiological mechanism in human CHD and confirms the complex genetic architecture underlying CHD.
Congenital heart disease (CHD) represents malformations of the heart or intra-thoracic vessels, which affect cardiac function and occur in almost 1% of live births . Most patients survive until adulthood, and the number of adults with CHD has increased to above 3 million in Europe and the USA alone . These patients are challenged by serious cardiovascular complications, which require specialized care, and their children have significantly increased CHD risk .
Families presenting with recurrent cases suggest that mutations with large effects segregate with CHD in such families. Recent targeted next generation sequencing (NGS) efforts have shown that causative mutations may be identified in one third of CHD families . In the majority of familial cases, the unidentified pathogenic variants may be explained by a “burden of genetic variation” model, which hypothesizes that CHD may occur if the developing embryo is exposed to a certain burden of rare and common genetic variants, possibly in combination with epigenetic, environmental, or stochastic effects .
Genomic information obtained by NGS analyses may further clarify genetic and molecular mechanisms causing CHD and thus contribute to improved health care and counseling of the patients and families. However, because of the complex genetic architecture of CHD, more sophisticated methods for interpretation of genetic variation need to be developed before such information may be translated into clinical use.
Here, we performed whole exome sequencing (WES) in a cohort of 32 Danish families in which several family members presented with CHD. Utilizing a systems biology approach, we discovered recurrent mutation of genes involved in calcium signaling. Loss-of-function analyses of three of the genes confirmed their crucial role in embryonic heart development.
CHD families were identified through the Danish National Patient Registry (DNPR) and contacted by letter. Clinical diagnoses were validated by manual review of the patient files, and detailed pedigrees were constructed based on interview with patients (Additional file 1: Fig. S1). There was no known consanguinity in the families. All family members presenting with CHD were non-syndromic, except III.1 in family 545, who had a diagnosis of Turner syndrome. Cardiac malformations are listed below affected family members in Additional file 1: Fig. S1. The age at diagnosis of affected family members ranged from 1 day to 65 years (median age, 1 year; Q1, 29 days; Q3, 8 years). Clinical evaluation for CHD was not performed on asymptomatic family members. Termination of pregnancy was performed in fewer than 5 cases (data not shown).
DNA samples were obtained from a cohort consisting of 90 individuals in 32 families. Affected family members were not screened for 22q11 deletions, but index patients were screened for mutations in the NKX2-5 gene prior to WES. A frameshift mutation (c.112delG) was found to segregate with ASD in a single family . This family was excluded from the cohort prior to exome sequencing.
Relatedness of family members was determined using the algorithm implemented in VCFtools. Pairwise relatedness of all 90 individuals is shown in Additional file 1: Fig. S2A. Relatedness between individuals not belonging to the same family peaks around zero, confirming that the families are unrelated. Individuals within families have positive relatedness score, confirming that they are related (Additional file 1: Fig. S2B).
The self-reported ethnicity of the individuals in our cohort was Danish. To verify the ethnic homogeneity of the 90 samples, we used the algorithm for detection of outliers implemented in PLINK. For each individual in our cohort, we calculated the population distance to the 20 nearest neighbors using an identity-by-state (IBS) matrix based on 208,163 variants. These distances were compared to the mean of the population in terms of standard deviations (Z-scores). This test showed that no samples had Z-scores below − 2.5 confirming the ethnic homogeneity of the samples (Additional file 1: Fig. S3). Principal component analysis (PCA) was performed using FlashPCA (https://github.com/gabraham/flashpca). PCA was based on a selected set of common (MAF > 5%) high-quality genetic variants that overlap with the 1000 Genomes data . The Danish samples which were analyzed together with reference samples of different Super Populations from 1000 Genomes appear similar to European populations (Additional file 1: Fig. S4). We identified one outlier sample, but this particular individual had two siblings with similar type of CHD and was not excluded from the study.
Whole exome sequencing
Ninety exomes corresponding to 79 patients and 11 obligate carriers were sequenced. Capture followed the protocol corresponding to an Agilent Sure Select exome v4 kit, and 100-bp paired-end sequencing reads were generated on an Illumina Hiseq 2000 machine. Both exome capture and sequencing were performed at BGI Europe’s facility in Copenhagen. The average sequencing coverage was 91.8×. Eighty-six percent of the samples had a sequence coverage of 30× or higher in 80% of the exome. Cumulative sequencing coverage is shown in Additional file 1: Fig. S5A. Mean number of sequencing reads per sample was 64,396,134 (range 47,821,039–85,327,021). The distribution of sequencing reads is shown in Additional file 1: Fig. S5B.
Variants were stored in a mySQL database to facilitate filtering and comparison between both individuals and families. Population-wide allele frequencies were used to remove variants with a minor allele frequency higher than 1%. Finally, variants with unclear impact on gene function were removed (see definitions below).
Bioinformatics pipeline for variant calling
Bioinformatics analysis of the sequencing data followed standard practices in the field and included assessment of data quality with FastQC; mapping to the human reference genome (hg19/Grhc37) with BWA mem ; removal of PCR duplicates, local indel realignment, base quality score recalibration, and variant calling with HaplotypeCaller were performed with GATK . Only variants with Phred scores ≥ 30 were considered. The functional consequences of variants were assessed with Ensembl’s Variant Effect Predictor (VEP) tool . This step included the prediction of the pathogenicity of identified variants with both SIFT  and Polyphen-2 . Variants were stored in a mySQL database to facilitate filtering and comparison between both individuals and families.
Filtering of variants
Following genotype calling, variants were filtered to meet a number of sequencing quality requirements prior to consideration. The variants should correspond to single nucleotide events both in the reference and alternative alleles, be supported by read depth of at least 30 at the genomic position, and have a Phred call quality greater than 30. Similarly, variants were required to be present in all affected members of a family, either in heterozygosity or in homozygosity.
Population-wide allele frequencies were used to further filter the candidate variants under the prerogative that the observed incidence of CHD is not coherent with highly frequent variants being causative factors. Variants were filtered from our analysis if they were observed with a minor allele frequency (MAF) higher than 1% in either any of the main populations (AFR, AMR, ASN, CEU) comprised by the 1000 Genomes (1000G) project , the catalog of Exome Annotation Consortium  (ExAC), or its European (non-Finnish) subpopulation. Similarly, we exploited the Danish ancestry of the patients for further reduction of the number of candidate variants. First, we discarded from further analyzing those variants with MAFs higher than 1% in 2000 Danish exomes . Second, variants present in 5% of the alleles of the 100 parents included in the Genome Denmark cohort [15, 16] were excluded in further analyses. This step implied a lift-over  of variants between the coordinates of reference genomes hg38 and hg19; in cases where multiple variants remapped to the same genomic position, the highest allele frequency for each possible allele was considered.
A final filtering exploited the functional consequences of variants according to affected genomic elements. Variants with unclear impact on gene function were disregarded. These included all noncoding variants (except intronic variants in splice sites), stop codon variants where the stop codon is retained, and nonsense-mediated decay (NMD) transcript variants.
We did not filter variants according to in silico prediction of functional consequence (e.g., Polyphen or SIFT scores).
Enrichment of known CHD genes
We curated two lists of known CHD genes. A list of 829 genes which cause CHD when mutated in mice was derived from data in the Mouse Genome Informatics database (Additional file 1: Table S1). A list of 144 human CHD disease genes was curated from the literature (Additional file 1: Table S2). Statistical significance of overlaps was calculated using one-tailed Fisher’s exact test, with a significance level of p < 0.05.
Definition of enriched protein-protein interaction modules
Protein-protein interactions (PPIs) were obtained from InWeb (InWeb5.5rc3), a scored high-confidence PPI database which contains 87% of human proteins and more than 500,000 interactions . PPIs were represented as a network; nodes represent proteins and edges represent interactions between these proteins. InWeb contains protein interactions for 1310 of the 1785 candidate genes identified in our families. We used data from InWeb to generate a PPI network of these 1310 genes and their first-degree interactors. The PPI network was pruned to include only high-confidence relationships; we disregarded high-throughput yeast-2-hybrid experiments (“Matrix” interactions according to InWeb’s terminology) and interactions with a confidence below 0.1. After confidence filtering, the PPI network included a total of 8186 genes and 29,463 high-quality interactions.
The PPI network was partitioned into overlapping modules (clusters) with ClusterOne  using the confidence score derived from InWeb to weight the clustering. Clusters were considered significant when below a corrected (Benjamini-Hochberg FDR method) one-sided p value of 0.05 and a minimum of five proteins. Highly connected clusters of proteins identified in this fashion constitute a proxy for functionally related protein complexes.
Enrichment of candidate genes was determined for each cluster by permutation analysis (k = 10e4) using random clusters of comparable size. Correction for multiple testing was performed using the procedure of Bonferroni-Holm. A one-sided p value of 0.05 was used as significance level to identify clusters significantly enriched for candidate genes.
Probability of being loss-of-function intolerant (pLI) values of genes encoding the 27 proteins in the calcium-signaling module, 829 and 144 known CHD genes from mouse models and patients, respectively, was obtained from the ExAC database. The distributions were compared to all 18,225 genes listed in the ExAC database using a Kruskal-Wallis one-way analysis of variance on ranks. Significance level was 0.05.
WES data from a previously published, independent cohort was used for replication, please see Sifrim et al. for details ; WES data from a total of 714 non-syndromic CHD cases and 4922 controls of European ancestry was analyzed.
Control samples were participants in the INTERVAL randomized controlled trial which were recruited with the active collaboration of NHS Blood and Transplant England, which has supported fieldwork and other elements of the trial. DNA extraction and genotyping were funded by the National Institute of Health Research (NIHR), the NIHR BioResource, and the NIHR Cambridge Biomedical Research Centre. The academic coordinating center for INTERVAL was supported by core funding from the NIHR Blood and Transplant Research Unit in Donor Health and Genomics, UK Medical Research Council (G0800270), British Heart Foundation (SP/09/002), and NIHR Research Cambridge Biomedical Research Centre.
Replication was performed by comparing the number of cases and controls harboring very rare (MAF < 0.001) variants in any of the genes ADCY2, ADCY5, CACNA1D, CACNA1H, CACNA1I, CACNA1S, GRIA4, ITPR1, NFAT5, and PLCB2. Data from CACNA1F was missing; thus, this gene was excluded from the analysis. The MAF cutoff (0.001) was chosen by performing a burden test for synonymous variants in 10,000 random sets of genes of the same size as the set above (two-sided Fisher’s exact test). For two different scenarios (MAF < 0.01 and MAF < 0.001), we observed a better match between expected vs. observed p values using MAF < 0.001 (Additional file 1: Fig. S12).
Synonymous variants, protein altering variants (PAV) (inframe_insertion, start_lost, stop_retained, stop_lost, missense, inframe_deletion, protein_altering, start_retained), and protein truncating variants (PTV) (stop_gained, splice_donor, splice_acceptor, frameshift) were identified using the Variant Effect Predictor tool (VEP API v90). Quality control and filtering were performed using Hail 0.2. In brief, both samples and variants were included for further analysis if they met the following filtering criteria: call rate ≥ 0.95, genotype quality average ≥ 25, and depth average ≥ 15. In addition, only genotypes (heterozygous) with allelic balance (AB) within the range 0.25–0.75 were retained in the analysis. Deleterious variants were identified by assigning a Combined Annotation Dependent Depletion (CADD) score  and a score based on regional missense constraint (MPC score) . MPC score > 2 was used to identify pathogenic variants. Previous analysis of de novo variants identified in 5620 cases with neurodevelopmental disorders shoved that variants with MPC > 2 have a rate 5.79 times higher in cases than controls. Statistical significance was calculated using two-tailed Fisher’s exact test, with a significance level of p < 0.05.
To test the specificity of the observed burden of rare variants, we created 10,000 random sets of 10 genes with the same size distribution as the gene set of 10 calcium-signaling genes, which were analyzed in the replication study. For each of the 10,000 random gene sets, we counted the number of cases with pathogenic variants (MPC score > 2).
Zebrafish maintenance and microinjection
AB/TL zebrafish strain (obtained from the Zebrafish International Resource Center) was used in the experiments. Embryos were maintained and staged as previously described [23, 24]. 1.5, 3, and 6 ng of adcy2a-SP-MO (5′-GGATGAGGGTAACTCACCTGACATT-3′), itpr1b-SP-MO (5′-GTGCATAAACGCGGCCTTACCTCGA-3′), plcb2-SP-MO (5′-CTGTAGTTTCTGTTCACCTCATCAG-3′), and standard-MO (5′-CCTCTTACCTCAGTTACAATTTATA-3′) and 0.5, 1, and 2 ng of adcy2a-SP-MO2 (5′-CCCCCAGTCTCCAAACACTCACCAG-3′), itpr1b-SP-MO2 (5′-CCAGACTGTAGACAAGAGAGACATG-3′), and plcb2-SP-MO2 (5′-TGTGGTAAAGGATACTCCACCCAGT-3′) (Gene Tools, LLC) were injected into one-cell stage embryos and harvested at 48 hpf. Embryos were imaged under Zeiss AxioZoom V16 (Carl Zeiss, Brock Michelsen A/S, Denmark).
In order to verify SP-MO efficiencies, embryos injected with SP-MOs were collected at 48 hpf and RNA was isolated by QIAzol reagent (Qiagen). cDNA synthesis was performed using iScript Reverse Transcription Supermix (Bio-Rad). SP-MO knockdown was assessed by PCR using gene-specific primers (Additional file 1: Table S3) spanning the SP-MO-targeted exon.
Whole-mount in situ hybridization
Digoxigenin (DIG)-labeled anti-sense myl7 and mef2cb  riboprobes were synthesized from linearized pGEM-T easy and pCMV-SPORT6.1 vectors respectively using the DIG RNA labeling mix (Roche) and the T7 RNA polymerase (Roche). Embryos collected at 48 hpf were raised in the presence of 0.2 mM 1-phenyl-2-thiourea (PTU) upon gastrulation for optical clarity . For analysis of myl7 and mef2cb expression, embryos at 48 hpf and 10 somite stages, respectively, were dechorionated and fixed in freshly prepared 4% paraformaldehyde in phosphate-buffered saline (PBS, pH 7.4) overnight. Whole-mount in situ hybridization was performed as previously described with minor modifications . Embryos were imaged under Zeiss AxioZoom V16 (Carl Zeiss, Brock Michelsen A/S, Denmark). The staining intensity of mef2cb riboprobe was measured as integrated density (IntDen, ImageJ software-NIH, USA), and mean values from three embryos of each group were plotted relative to wild-type value. Data were shown as mean ± std dev.
Real-time quantitative RT-PCR
Total RNA from pools of 50 zebrafish embryos was extracted using TRIzol (Ambion) and a RNeasy mini kit (Qiagen) and used to synthesize random-primed cDNA (SuperScript II Reverse Transcriptase, Invitrogen). A Brilliant III Ultra-Fast SYBR® Green QPCR Master Mix (Agilent Technologies) was used for cDNA amplification. Samples were analyzed using a 7500 fast real-time PCR system (Applied Biosystems). Data were normalized to the average expression of housekeeping genes actb1 and rpl13a. RT-PCR primers are listed in Additional file 1: Table S3.
Exome sequencing data analysis
We performed whole exome sequencing of 90 individuals belonging to 32 multiplex CHD families (Additional file 1: Figs. S1-S4). To identify potentially disease-causing genes, we analyzed rare variants shared by affected family members. The analysis was performed after removing the variants that are likely sequencing artifacts, that occupy genomic elements with a mild functional consequence, or that can be found widespread in the general population (see the “Methods” section, workflow shown in Additional file 1: Fig. S6). We identified 3698 rare inherited variants in 1785 genes, denoted candidate disease genes (CDGs) hereafter. The number of CDGs per family ranged from 56 to 507 (Additional file 1: Fig. S7).
Recurrent candidate disease genes
To test if particular candidate genes were overrepresented, we calculated the fraction of CDGs shared by all possible pairs of families in the cohort (Fig. 1a, Additional file 1: Fig. S8). Pairs of families only share a small fraction of their CDGs (median = 0.05, 72.3% of pairwise values < 0.2).
To analyze this in more detail, we calculated the number of families with rare inherited variants in each of the 1785 CDGs. None of the 1785 CDGs was mutated in more than seven families (Fig. 1b, Additional file 1: Fig. S9) after disregarding genes that rely on the accumulation of variants to exert their biological function, like hypermutated genes in the MHC machinery or genes encoding olfactory receptors.
We identified 218 CDGs shared between two or more families. For variants shared between two and three families, less than 20% were identical across families (Fig. 1c, d). We propose that these variants might partially explain the origin of the disease. For example, three different rare alleles of DNAH5 were identified in affected members of families 489, 732, and 1121, where patients presented with ASD, VSD, and outflow tract defects. Mutation of DNAH5 is associated with primary ciliary dyskinesia (PCD, OMIM #608644) . A subset of PCD patients present with CHD . Another example of a CDG where rare variants were found in more than one family is KMT2D, which encodes a histone methyl transferase involved in heart development . Mutation of KMT2D is associated with Kabuki syndrome (OMIM #147920), a rare developmental disorder which includes CHD as part of a wide phenotypical spectrum .
CDGs are enriched for known disease genes
We expect that a subgroup of variants in our 1785 CDGs could be causative mutations leading to CHD. In such a scenario, a diverse but limited number of genomic variants, acting by the accumulation of additive effects, could explain the occurrence of CHD in individual families. This etiological diversity at the genomic level would hinder detection with traditional association analysis, but we hypothesized that the CDGs would be enriched for known CHD disease genes. To test this, we calculated the overlap between the 1785 CDGs in our families and curated lists of genes known to cause CHD in mouse models and patients, when mutated (Additional file 1: Table S1 and S2). We observed significant enrichment of known CHD genes among CDGs affecting five or fewer families (Fig. 2a, b). When only variants scored pathogenic by SIFT/Polyphen-2 are considered, this enrichment increases Additional file 1: Fig. S10). In order to validate the CDG list and explore the possibility of a selection bias in our sets of known CHD genes, we produced 10,000 random gene sets (one for each of the two sets) and ascertained the overlap with the CDGs. This analysis corroborated the statistical significance of the observations (p value< 0.0001 for both sets).
Distribution of known CHD genes across families is shown in Additional file 1: Fig. S11 and Additional file 1: Table S4. Individual families often present with rare mutations in more than one known CHD gene, suggesting a substantial fraction of the observed heart defects might be explained by a combination of rare mutations inherited within the families. Under the hypothesis that CHD runs within individual families as the result of such a combined effect from mutations in several developmental genes, we would expect that families present significant enrichment of mutations harbored by known CHD genes. Enrichment of CHD genes from mouse models per family was determined using a permutation test (n = 10,000). Affected individuals from 23 families share rare mutations in known CHD genes. In 19 of these 23 families (78.3%), enrichment of known CHD genes among the CDGs is statistically significant at p < 0.05 (Fig. 2c).
For comparison, we filtered our variants following the strategy, recently applied by Bayrak et al. . In this approach, we only included variants, shared between at least two affected individuals per family and with population MAF < 0.001 (across all GenomAD populations and in the GenomAD European, not Finnish population). In addition, genes with a Gene Damage Index (http://lab.rockefeller.edu/casanova/GDI) prediction of “high” were removed from the list of CDGs. With this approach, we identified a total of 719 variants in 577 genes. When we compared with our curated lists of CHD genes in mouse models and patients (Additional file 1: Table S1 and S2), we identified 36 CDGs overlapping with mouse CHD genes and five CDGs overlapping with human CHD genes. However, the overlaps were not statistically significant (p = 0.1406 and 0.8178, respectively, Fisher’s exact test), suggesting that very stringent filtering removes a significant number of causative variants.
Rare variants affect functional modules in a protein-protein interaction network
We further investigated whether CHD was caused by the disruption of cooperative protein functionality at the systems level rather than at the individual gene level. A protein-protein interaction network was generated using 1310 seed genes (the subset of the 1785 CDGs for which InWeb has recorded protein interactions). Their first-degree neighbors and their interactions from the current version of InWeb (InWeb5.5rc3)  were added as well, summing up to a total of 8186 proteins and 29,463 interactions. Using the graph clustering algorithm ClusterOne , we identified 230 significant PPI clusters considering a threshold p value of 0.05 and a minimum number of five genes. Of these, 25 clusters presented two or more genes mutated in at least one of the families (Additional file 1: Table S5). By permutation analysis (k = 10,000), we identified two clusters accommodating more CDGs than expected by chance (p values of 0.039 and 0.0033, respectively).
One of these corresponds to a small module, encoded by of five genes (INSL1, RXFP2, RLN1, RLN2, RXFP1) where the three first are CDGs. Due to the small size and low connectivity of this cluster, we decided to focus on the second cluster.
The second significant cluster corresponds to a highly interconnected group of 27 proteins, encoded by eleven CDGs and their first-degree interaction partners (Fig. 3a). These CDGs present mutations in ten different families. Affected individuals from three families shared rare mutations in more than one of the eleven genes, and two genes (ITPR1 and CACNA1S) were each mutated in two different families (Additional file 1: Table S6).
The 27 proteins in the cluster constitute calcium channels or signal transduction enzymes such as adenylate cyclase that interacts with calcium-dependent protein kinases. A Gene Ontology term enrichment analysis using AmiGO  for the 27 proteins in the cluster showed enrichment in biological processes involved in calcium signaling (Additional file 1: Table S7).
To investigate the functional importance of the genes encoding proteins in the cluster, we compared the probability of being loss-of-function intolerant (pLI) of these 27 genes with known CHD genes and all 18,225 genes listed in the Exome Aggregation Consortium database. The distributions of pLI scores of known CHD genes have median pLI values of 0.63 and 0.86, respectively, and the distributions suggest that more than half of known CHD genes are intolerant against loss-of-function mutations (Fig. 3b). The distribution of pLI scores of the 27 genes encoding proteins in the significant cluster is skewed towards the higher values of the distribution (median = 0.98), suggesting intolerance to loss-of-function mutations and supporting the hypothesis that these genes might potentially play an active role in the etiology of CHD.
Using WES data from an independent cohort of 714 CHD cases and 4922 controls , we tested the mutation burden of the same calcium-signaling gene set, as we found mutated in our families (genes listed in Additional file 1: Table S6). In this gene set, we analyzed the distribution of CADD and MPC variant scores between CHD cases and controls, and observed significant different score distributions of rare (MAF < 0.001) variants, predicted to alter the gene products (i.e., PAV and PTV) (Fig. 4). Variants with MPC score above 2 (MPC2 variants) have been shown to be significantly associated with pathogenesis . Thus, to test for burden of pathogenic mutations, we calculated the number of cases and controls harboring MPC2 variants in the calcium-signaling gene set. We observed more than 2-fold enrichment (OR 2.68, p value 3.7e−04) of such variants in CHD cases (Additional file 1: Table S8). To test if this enrichment was specific for the calcium-signaling gene set, we created 10,000 random gene set with the same size distribution as the calcium genes. For each random set, we counted the number of CHD cases harboring MPC2 variants. Only four of the random gene sets were found with more MPC2 variants in cases than the calcium gene set (Additional file 1: Fig. S13). These results confirm an increased burden of pathogenic mutations in genes involved in calcium signaling among CHD patients.
Knockdown of adcy2a, itpr1b, and plcb2 causes cardiac malformations in zebrafish
We used zebrafish as a model to address the functional relevance in heart development, of genes within the identified cluster. In the cluster, 11 out of 27 genes were mutated in the families we assessed. From these 11 genes, we assessed the consequence of loss-of-function for three zebrafish genes, adcy2a, itpr1b, and plcb2. These genes are orthologues to the human genes ADCY2, ITPR1, and PLCB2, encoding calcium-signaling proteins, of which roles in cardiac development have been elusive.
We found that knockdown of either of the three genes by morpholino oligonucleotides gave rise to abnormal morphology or laterality of the heart (Fig. 5a). The abnormal morphology included aberrant atrioventricular canal (AVC) formation where the ring structure of AVC was hindered, mis-looping of the heart, and narrowing in the atrium or ventricle. Laterality defects reflected straight or reversed hearts. Injection of adcy2a-MO, itpr1b-MO, and plcb2-MO resulted in cardiac defects in 53%, 73%, and 66% of embryos, respectively (Fig. 5b).
In addition to the cardiac defects, knockdown of itpr1b caused edema in the brain, whereas curved tail was seen by knockdown of plcb2 (Fig. 5a).
Knockdown of more than one gene by injections of both efficient and sub-efficient doses of MO resulted in an increase of the number of embryos with heart defects, suggesting a combinatorial effect (Fig. 5b, Additional file 1: Fig. S14).
We showed that the morpholinos work efficiently causing splicing defects (Additional file 1: Fig. S15), and we validated the specificity of the morpholinos, by using a second set of morpholinos and co-injection of in vitro expressed mRNAs encoding wild-type proteins (Fig. 5b, Additional file 1: Fig. S15). Unfortunately, we were not able to clone the cDNA of itpr1b, presumably due to the large size of the transcript (8.4 kbp coding region), but co-injection of wild-type adcy2a and plcb2 mRNA resulted in partial rescue of the cardiac phenotype, confirming the specificity of the morpholinos.
To investigate effects of gene knockdown at the molecular level, we analyzed the expression of mef2c, a transcription factor which is involved in cardiac morphogenesis and myogenesis and known to be a specific target for Ca2+-dependent control of gene expression in cardiomyocytes [33, 34]. In zebrafish, two copies of mef2c exist. We analyzed the expression of mef2ca and mef2cb in wild-type, control, and morphant embryos at 10 somite stages, where both genes are expressed in the bilateral heart fields at the anterior lateral plate mesoderm. We observed significantly reduced expression of mef2cb in adcy2a, itpr1b, and plcb2 morphants (Fig. 5c, d).
Interpretation of variants detected in exomes from CHD patients is challenging due to the extreme heterogeneity which characterize CHD . Analysis of complex networks has previously been applied successfully to interpret genetic variants [31, 35,36,37,38]. Such analyses often use stringent variant filtering criteria to identify candidate genes, followed by network analysis to interpret biological function. Here, we present an alternative approach, based on lenient filtering of variants, followed by stringent network analysis using a high-confidence human PPI network .
We analyzed the exomes of 90 individuals in 32 multiplex CHD families for rare variants, which were shared among affected individuals within a family. Considering that CHD rarely has a monogenic cause, but may be caused by a burden of rare and common genetic variants, we selected an arbitrary filtering threshold of MAF above 0.01 and we did not filter the variants for in silico predictions of consequence. We expect that our filtering has removed a significant amount of neutral variants, while keeping possible variants with medium to strong effects. However, it is likely that we also have removed a number of causative variants by applying this criterion, as well as our focus on the exome limits the number of possible causative variants identified in our study.
We identified 1785 candidate disease genes. The large number of candidate genes is partly a consequence of our non-stringent filtering strategy and underlines the challenge in interpreting variants in CHD patients, even in multiplex families. However, our analysis showed that the 1785 CDGs were enriched for genes known to cause heart defects and that application of very stringent filtering conditions removes a significant part of causative variants.
Approximately 12% of the CDGs were found in more than one family, but none of them was represented in more than seven families, and on average, less than 10% of the CDGs found in a family were shared with another family. Thus, our data support that CHD is an extremely heterogeneous disorder and consistent with a model, where several rare genetic variants contribute to the pathogenesis in the individual patient. However, we do not expect that all of our CDGs are true CHD disease genes; thus, a significant proportion of the rare variants, which are shared by affected individuals, may be of limited functional consequence.
To identify pathophysiological mechanisms associated with CHD, we integrated PPI data in our analysis of CDGs and discovered that eleven of the CDGs converge in a functional cluster of genes, which encode proteins involved in calcium signaling. Analysis of an independent case-control cohort confirmed increased mutation burden of this calcium-signaling gene set in CHD patients, thus supporting that defects in calcium signaling are associated with CHD.
Intracellular calcium plays essential roles in physiology and pathophysiology of the adult heart. In the healthy heart, intracellular calcium fluxes control cardiomyocyte contraction and mutations in calcium-handling genes may cause arrhythmia . In addition, calcium also modulates transcription in cardiomyocytes through a complex signaling network, and abnormal calcium handling and signaling are part of the pathophysiology in congestive heart failure [33, 40].
Defects in calcium signaling have not previously been associated with CHD in humans, but animal studies implicate an important role of calcium signaling in heart development. Targeted deletion of Nfatc1is embryonic lethal in mice and causes malformation of the cardiac valves, outflow tract, and ventricular septum [41, 42]. Overexpression of activated calcineurin rescues cardiac developmental defects in calreticulin-deficient mice . Pharmacological blockade of L-type calcium channels during embryonic development causes heart defects in mice . However, epidemiological studies show that therapeutic doses of calcium channel blockers have very low teratogenicity [45, 46].
Two of the 27 proteins within the cluster we identified were previously implicated in cardiac development and CHD. Knockout of Nfat5 is embryonic lethal in mice and results in reduced compaction of cardiomyocytes in the ventricular wall and trabeculae . Double knockout of Itpr1 and Itpr3 results in aberrant development of the outflow tract and right ventricle, while mice defective of only one of the two genes develop normally . Likewise, double knockout of Itpr1 and Itpr2 is embryonic lethal and results in cardiac malformation , together suggesting a redundant role of IP3 receptors in mammalian heart development.
Analysis of embryonic mice has shown that NFAT5 and IP3 receptors are expressed in the heart during embryonic development, but also in several other tissues [47,48,49]. Similarly, many well-characterized CHD genes are also expressed in extra-cardiac embryonic tissues, and some of these genes have been associated with both isolated and syndromic forms of CHD [3, 50]. Thus, future detailed genotype-phenotype analyses of CHD disease genes may be useful for further dissection of embryonic developmental mechanisms.
Functional validation of ADCY2, ITPR1, and PLCB2 in a zebrafish model confirmed a critical role during embryonic heart development. We observed defects in cardiac morphology and laterality in a significant proportion of embryos injected with morpholinos targeting adcy2a, itpr1b, and plcb2. Importantly, we observed more embryos with cardiac defects when combinations of two and all three genes were knocked down, supporting that the three genes interact functionally in heart development. Specificity of the morpholinos used in the experiments was confirmed by rescue experiments and further corroborated by significant reduction in the expression of mef2cb, a zebrafish orthologue of Mef2c, which is a well-established target of calcium signaling in cardiomyocytes [48, 51]. Thus, the results of the zebrafish experiments support the hypothesis that expression and interaction of the three genes play important roles in heart development. However, it is important to note that depletion of the gene products precipitated by morpholino injections in zebrafish does not represent a true model of the molecular pathology of CHD in affected family members. Firstly, the majority of mutations, which we identified in calcium-signaling genes, were missense mutations and carriers were all heterozygous. Thus, we do not know if the effects of the mutations were haploinsufficiency or gain-of-function. And second, it is possible that factors other than calcium-signaling genes may play a role in the molecular pathology of individual patients.
The cardiac malformations, present in our family cohort and replication cohort, were unselected with respect to severity or anatomical similarity. Thus, our data indicate that calcium-signaling defects are associated with both familial and sporadic CHD and unrelated to specific groups of malformations.
We have recently shown that specific malformations co-occur in families, suggesting that specific developmental programs may be responsible for certain groups of heart malformations . We suggest that WES or whole genome sequencing of cohorts of families selected for specific malformations, combined with systems-based analysis, as presented here, may be a useful strategy for identification of pathophysiological mechanisms in CHD.
Our data support a model where CHD is caused by a combinatorial effect of rare and common genetic variants. Our systems-level analysis of rare genetic variants, shared by affected individuals in families, and functional analysis in zebrafish identified defects in calcium signaling as a novel pathophysiological mechanism in CHD.
Availability of data and materials
A list of rare variants, shared between affected family members, which our results and conclusions are based on, is available upon request. Individual exome sequencing data cannot be shared due to concerns over patient privacy. Other data generated or analyzed during this study are included in the main paper or its additional files.
van der Linde D, Konings EE, Slager MA, Witsenburg M, Helbing WA, Takkenberg JJ, et al. Birth prevalence of congenital heart disease worldwide: a systematic review and meta-analysis. J Am Coll Cardiol. 2011;58(21):2241–7.
Warnes CA. Adult congenital heart disease: the challenges of a lifetime. Eur Heart J. 2017;38(26):2041–7.
Pierpont ME, Brueckner M, Chung WK, Garg V, Lacro RV, McGuire AL, et al. Genetic basis for congenital heart disease: revisited: a scientific statement from the American Heart Association. Circulation. 2018;138(21):e653–711.
Blue GM, Kirk EP, Giannoulatou E, Sholler GF, Dunwoodie SL, Harvey RP, et al. Advances in the genetics of congenital heart disease: a clinician’s guide. J Am Coll Cardiol. 2017;69(7):859–70.
Blue GM, Kirk EP, Giannoulatou E, Dunwoodie SL, Ho JW, Hilton DC, et al. Targeted next-generation sequencing identifies pathogenic variants in familial congenital heart disease. J Am Coll Cardiol. 2014;64(23):2498–506.
Ellesoe SG, Johansen MM, Bjerre JV, Hjortdal VE, Brunak S, Larsen LA. Familial atrial septal defect and sudden cardiac death: identification of a novel NKX2-5 mutation and a review of the literature. Congenit Heart Dis. 2016;11(3):283–90.
Auton A, Brooks LD, Durbin RM, Garrison EP, Kang HM, Korbel JO, et al. A global reference for human genetic variation. Nature. 2015;526(7571):68–74.
Li H, Durbin R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics. 2009;25(14):1754–60.
DePristo MA, Banks E, Poplin R, Garimella KV, Maguire JR, Hartl C, et al. A framework for variation discovery and genotyping using next-generation DNA sequencing data. Nat Genet. 2011;43(5):491–8.
McLaren W, Pritchard B, Rios D, Chen Y, Flicek P, Cunningham F. Deriving the consequences of genomic variants with the Ensembl API and SNP Effect Predictor. Bioinformatics. 2010;26(16):2069–70.
Ng PC, Henikoff S. SIFT: predicting amino acid changes that affect protein function. Nucleic Acids Res. 2003;31(13):3812–4.
Adzhubei IA, Schmidt S, Peshkin L, Ramensky VE, Gerasimova A, Bork P, et al. A method and server for predicting damaging missense mutations. Nat Methods. 2010;7(4):248–9.
Lek M, Karczewski KJ, Minikel EV, Samocha KE, Banks E, Fennell T, et al. Analysis of protein-coding genetic variation in 60,706 humans. Nature. 2016;536(7616):285–91.
Lohmueller KE, Sparso T, Li Q, Andersson E, Korneliussen T, Albrechtsen A, et al. Whole-exome sequencing of 2,000 Danish individuals and the role of rare coding variants in type 2 diabetes. Am J Hum Genet. 2013;93(6):1072–86.
Besenbacher S, Liu S, Izarzugaza JM, Grove J, Belling K, Bork-Jensen J, et al. Novel variation and de novo mutation rates in population-wide de novo assembled Danish trios. Nat Commun. 2015;19(6):5969.
Maretty L, Jensen JM, Petersen B, Sibbesen JA, Liu S, Villesen P, et al. Sequencing and de novo assembly of 150 genomes from Denmark as a population reference. Nature. 2017;548(7665):87–91.
Hinrichs AS, Karolchik D, Baertsch R, Barber GP, Bejerano G, Clawson H, et al. The UCSC Genome Browser Database: update 2006. Nucleic Acids Res. 2006;34(Database issue):D590–8.
Li T, Wernersson R, Hansen RB, Horn H, Mercer J, Slodkowicz G, et al. A scored human protein-protein interaction network to catalyze genomic interpretation. Nat Methods. 2017;14(1):61–4.
Nepusz T, Yu H, Paccanaro A. Detecting overlapping protein complexes in protein-protein interaction networks. Nat Methods. 2012;9(5):471–2.
Sifrim A, Hitz MP, Wilsdon A, Breckpot J, Turki SH, Thienpont B, et al. Distinct genetic architectures for syndromic and nonsyndromic congenital heart defects identified by exome sequencing. Nat Genet. 2016;48(9):1060–5.
Kircher M, Witten DM, Jain P, O'Roak BJ, Cooper GM, Shendure J. A general framework for estimating the relative pathogenicity of human genetic variants. Nat Genet. 2014;46(3):310–5.
Samocha KE, Kosmicki JA, Karczewski KJ, O'Donnell-Luria AH, Pierce-Hoffman E, MacArthur DG, Neale BM, Daly MJ. Regional missense constraint improves variant deleteriousness prediction. BioRxiv. https://www.biorxiv.org/content/10.1101/148353v1.
Kimmel CB, Ballard WW, Kimmel SR, Ullmann B, Schilling TF. Stages of embryonic development of the zebrafish. Dev Dyn. 1995;203(3):253–310.
Hinits Y, Pan L, Walker C, Dowd J, Moens CB, Hughes SM. Zebrafish Mef2ca and Mef2cb are essential for both first and second heart field cardiomyocyte differentiation. Dev Biol. 2012;369(2):199–210.
Thisse C, Thisse B. High-resolution in situ hybridization to whole-mount zebrafish embryos. Nat Protoc. 2008;3(1):59–69.
Monnich M, Borgeskov L, Breslin L, Jakobsen L, Rogowski M, Doganli C, et al. CEP128 localizes to the subdistal appendages of the mother centriole and regulates TGF-beta/BMP signaling at the primary cilium. Cell Rep. 2018;22(10):2584–92.
Zariwala MA, Knowles MR, Omran H. Genetic defects in ciliary structure and function. Annu Rev Physiol. 2007;69:423–50.
Kennedy MP, Omran H, Leigh MW, Dell S, Morgan L, Molina PL, et al. Congenital heart disease and other heterotaxic defects in a large cohort of patients with primary ciliary dyskinesia. Circulation. 2007;115(22):2814–21.
Ang SY, Uebersohn A, Spencer CI, Huang Y, Lee JE, Ge K, et al. KMT2D regulates specific programs in heart development via histone H3 lysine 4 di-methylation. Development. 2016;143(5):810–21.
Bogershausen N, Wollnik B. Unmasking Kabuki syndrome. Clin Genet. 2013;83(3):201–11.
Sevim BC, Zhang P, Tristani-Firouzi M, Gelb BD, Itan Y. De novo variants in exomes of congenital heart disease patients identify risk genes and pathways. Genome Med. 2020;12(1):9–0709.
Carbon S, Ireland A, Mungall CJ, Shu S, Marshall B, Lewis S. AmiGO: online access to ontology and annotation data. Bioinformatics. 2009;25(2):288–9.
Dewenter M, von der Lieth A, Katus HA, Backs J. Calcium signaling and transcriptional regulation in cardiomyocytes. Circ Res. 2017;121(8):1000–20.
Lin Q, Schwarz J, Bucana C, Olson EN. Control of mouse cardiac morphogenesis and myogenesis by transcription factor MEF2C. Science. 1997;276(5317):1404–7.
Gustafsson M, Nestor CE, Zhang H, Barabási AL, Baranzini S, Brunak S, et al. Modules, networks and systems medicine for understanding disease and aiding diagnosis. Genome Med. 2014;6(10):82.
Glessner JT, Bick AG, Ito K, Homsy J, Rodriguez-Murillo L, Fromer M, et al. Increased frequency of de novo copy number variants in congenital heart disease by integrative analysis of single nucleotide polymorphism array and exome sequence data. Circ Res. 2014;115(10):884–96.
Liu X, Yagi H, Saeed S, Bais AS, Gabriel GC, Chen Z, et al. The complex genetics of hypoplastic left heart syndrome. Nat Genet. 2017;49(7):1152–9.
Lage K, Greenway SC, Rosenfeld JA, Wakimoto H, Gorham JM, Segre AV, et al. Genetic and environmental risk factors in congenital heart disease functionally converge in protein networks driving heart development. Proc Natl Acad Sci U S A. 2012;109(35):14035–40.
Landstrom AP, Dobrev D, Wehrens XHT. Calcium signaling and cardiac arrhythmias. Circ Res. 2017;120(12):1969–93.
Houser SR, Piacentino V III, Weisser J. Abnormalities of calcium cycling in the hypertrophied and failing heart. J Mol Cell Cardiol. 2000;32(9):1595–607.
Ranger AM, Grusby MJ, Hodge MR, Gravallese EM, de la Brousse FC, Hoey T, et al. The transcription factor NF-ATc is essential for cardiac valve formation. Nature. 1998;392(6672):186–90.
de la Pompa JL, Timmerman LA, Takimoto H, Yoshida H, Elia AJ, Samper E, et al. Role of the NF-ATc transcription factor in morphogenesis of cardiac valves and septum. Nature. 1998;392(6672):182–6.
Guo L, Nakamura K, Lynch J, Opas M, Olson EN, Agellon LB, et al. Cardiac-specific expression of calcineurin reverses embryonic lethality in calreticulin-deficient mouse. J Biol Chem. 2002;277(52):50776–9.
Porter GA Jr, Makuck RF, Rivkees SA. Intracellular calcium plays an essential role in cardiac development. Dev Dyn. 2003;227(2):280–90.
Davis RL, Eastman D, McPhillips H, Raebel MA, Andrade SE, Smith D, et al. Risks of congenital malformations and perinatal events among infants exposed to calcium channel and beta-blockers during pregnancy. Pharmacoepidemiol Drug Saf. 2011;20(2):138–45.
Weber-Schoendorfer C, Hannemann D, Meister R, Eléfant E, Cuppers-Maarschalkerweerd B, Arnon J, et al. The safety of calcium channel blockers during pregnancy: a prospective, multicenter, observational study. Reprod Toxicol. 2008;26(1):24–30.
Mak MC, Lam KM, Chan PK, Lau YB, Tang WH, Yeung PK, et al. Embryonic lethality in mice lacking the nuclear factor of activated T cells 5 protein due to impaired cardiac development and function. PLoS One. 2011;6(7):e19186.
Nakazawa M, Uchida K, Aramaki M, Kodo K, Yamagishi C, Takahashi T, et al. Inositol 1,4,5-trisphosphate receptors are essential for the development of the second heart field. J Mol Cell Cardiol. 2011;51(1):58–66.
Uchida K, Aramaki M, Nakazawa M, Yamagishi C, Makino S, Fukuda K, et al. Gene knock-outs of inositol 1,4,5-trisphosphate receptors types 1 and 2 result in perturbation of cardiogenesis. PLoS One. 2010;5(9):10.
Andersen TA, Troelsen KL, Larsen LA. Of mice and men: molecular genetics of congenital heart disease. Cell Mol Life Sci. 2014;71(8):1327–52.
Faustino RS, Behfar A, Groenendyk J, Wyles SP, Niederlander N, Reyes S, et al. Calreticulin secures calcium-dependent nuclear pore competency required for cardiogenesis. J Mol Cell Cardiol. 2016;92:63–74.
Ellesoe SG, Workman CT, Bouvagnet P, Loffredo CA, McBride KL, Hinton RB, et al. Familial co-occurrence of congenital heart defects follows distinct patterns. Eur Heart J. 2018;39(12):1015–22.
The mef2cb probe plasmid was a kind gift of Dr. Yaniv Hinits, King’s College London. We thank Matthew E. Hurles, Wellcome Trust Sanger Institute, for providing access to published WES control data. Control samples were participants in the INTERVAL randomized controlled trial.
This work is supported by The Danish National Advanced Technology Foundation (The Genome Denmark platform, grant 019-2011-2), The Novo Nordisk Foundation (grants NNF14CC0001 and NNF12OC0001790), Aase og Ejnar Danielsens Fond, Børnehjertefonden, The Danish Heart Association, Dagmar Marshalls fond, Arvid Nilssons Fond, Oda og Hans Svenningsens Fond, Eva & Henry Frænkels Mindefond, Kong Christian Den Tiendes Fond, The A.P. Møller Foundation for the Advancement of Medical Sciences, The Lundbeck Foundation (R209-2015-2604), and Villum Fonden. JB is supported by the Van de Werf fund for cardiovascular research and a clinical research fund of UZ Leuven. AS is supported by the FWO (Postdoctoral Fellow number 12W7318N). The Clinical Academic Group in Precision Diagnostics in Cardiology under Greater Copenhagen Health Science Partners is also acknowledged.
Ethics approval and consent to participate
This project was approved by the Danish Data Protection Agency (2009-41-3570) and the Danish National Board of Health (H-D-2009-070). The study conformed to the principles of the Helsinki Declaration. Health records were accessed only after explicit consent by the patients. All zebrafish research was approved by and conducted under license from the Danish Animal Experiments Inspectorate.
Consent for publication
Consent for publication was obtained from the participants.
The authors declare that they have no competing interests.
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Pedigrees of 32 Danish multiplex CHD families (Fig. S1). Relatedness of the 90 individuals included in the study (Fig. S2). Homogeneity of the cohort (Fig. S3). Principle component analysis (Fig. S4). Sequencing coverage and number of reads (Fig. S5). Overview of the sequencing and data analysis processes (Fig. S6). Number of CDGs per family when all rare variants (left) or only high severity variants (right) were considered (Fig. S7). Overlap between CDGs in pairs of families (Fig. S8). Families with rare inherited variants in CDGs (Fig. S9). Overlap between the 1,785 CDGs in our families and a curated list of 829 genes known to cause CHD in mice (Fig. S10). Distribution of CHD genes across families (Fig. S11). Quantile-quantile plots (Fig. S12). Distribution of pathogenic mutations in random gene-sets (Fig. S13). Injection of sub-efficient doses of MOs and quantification of heart phenotypes in WT, controls and morphants injected with second set of MOs (Fig. S14). Efficiency of the splice blocking morpholinos (MOs) used against adcy2a, itpr1b and plcb2 (Fig. S15). Human orthologues to 829 genes known to be associated with CHD in mouse models (Table S1). A list of 144 Human CHD disease genes (Table S2). Primers used in the study (Table S3). Variants identified in known human CHD genes (Table S4). Significant protein-protein interaction clusters with at least two CDGs (Table S5). Rare calcium signaling gene variants shared among affected individuals in multiplex CHD families (Table S6). Gene ontology term enrichment of 27 genes within the cluster shown in Fig. 3a (Table S7). Replication using WES data from 714 CHD cases and 4922 controls (Table S8).
About this article
Cite this article
Izarzugaza, J.M.G., Ellesøe, S.G., Doganli, C. et al. Systems genetics analysis identifies calcium-signaling defects as novel cause of congenital heart disease. Genome Med 12, 76 (2020). https://doi.org/10.1186/s13073-020-00772-z
- Congenital heart disease
- Whole exome sequencing
- Developmental biology
- Systems biology
- Calcium signaling