Investigation of an LPA KIV-2 nonsense mutation in 11,000 individuals: the importance of linkage disequilibrium structure in LPA genetics

Objective Elevated Lp(a) plasma concentrations are determined mainly genetically by the LPA gene locus, but up to 70% of the coding sequence is located in the so-called “kringle IV type 2” (KIV-2) copy number variation. This region is not resolved by common genotyping technologies and large epidemiological studies on this region are therefore missing. The Arg21Ter variant (R21X, variant frequency ≈2%) is a functional variant in this region, but it has never been analyzed in large cohorts and it is unknown whether it is captured by genome-wide association studies. Approach and Results We developed a highly sensitive allele-specific qPCR assay and genotyped R21X in 10,910 individuals from three populations (GCKD, KORA F3, KORA F4). R21X carriers showed significantly lower mean Lp(a) concentrations (−11.7 mg/dL [−15.5;−7.82], p=3.39e-32). Of particular note, virtually all R21X carriers also carried the splice mutation rs41272114 (D’=0.957, R2=0.275), as confirmed by pulsed-field gel electrophoresis and long-range haplotyping. This suggests that the R21X mutation arose on the background of the rs41272114 splice variant. Conclusions We performed the largest epidemiological study on an LPA KIV-2 variant so far. Interestingly, R21X is located on the same haplotype as the splice mutation rs41272114, creating “double-null” LPA alleles that are inactivated by two independent mutations. The effect of the R21X nonsense mutation can thus not be distinguished from the effect of the rs41272114 splice site mutation. This emphasizes the importance of assessing the complex LD structure within LPA even for functional variants.


Introduction
The lipoprotein(a) [Lp(a)] plasma concentration is one of the most prominent genetically determined risk factors for cardiovascular diseases [1][2][3][4][5][6][7] . Elevated Lp(a) concentrations affect 14-25% of the general population 2 and increase the cardiovascular disease (CVD) risk more than three-fold for very high concentrations above the 99th percentile 8 . More than 90% of Lp(a) variance is genetically determined by the LPA gene locus 9 , which encodes apolipoprotein(a) [apo(a)], the distinctive structural protein of the Lp(a) particle. The apo(a) protein consists of 10 types of so-called kringle-IV domains (KIV-1 to -10), one kringle V domain and a protease domain. The KIV-2 domain is encoded by a hypervariable copy number variation (CNV), which is present in up to ≈40 copies per gene allele, resulting in a pronounced size polymorphism of the apo(a) protein 10 11,12 . In contrast, same-sized alleles within families (i.e. alleles that are identical-by-descent) are associated with a much smaller variation in Lp(a) (typically <2.5-fold 11 ). This indicates that genetic variants exist that regulate the Lp(a) concentrations in addition to isoform size 12,13 . Genome-wide association studies (GWAS) have identified dozens of SNPs distributed over a two-megabase region [14][15][16] , but few bona fide functional variants have been identified 13,[17][18][19] .
The KIV-2 region encompasses up to 70% of the LPA coding sequence 13 and is therefore an obvious candidate region to search for functional variants affecting Lp(a) concentrations. Very little is known about the impact of sequence variation in the KIV-2 CNV on Lp(a) concentrations because commonly used sequencing and genotyping technologies are not able to resolve variation within this region. Accordingly, it is also unclear whether KIV-2 variants are captured by known GWAS hits via linkage disequilibrium (LD) or whether they are indeed mostly independent. Some studies suggested a strong LD structure spanning the whole kringle region 20-23 , but few details have been reported.
Recently, an ultra-deep next generation sequencing (NGS) approach with a customized bioinformatic analysis pipeline has allowed cataloging variation within the KIV-2 region 24 . This revealed hundreds of variants and provided several new putative regulators of Lp(a) levels. For example, the G4925A variant 13 , found in 20% of the population, is associated with an Lp(a) reduction of up to ≈30 mg/dL and explains a considerable fraction of the individuals presenting low Lp(a) concentrations despite carrying a low molecular weight (LMW) isoform. This aspect of the relationship between Lp(a) concentrations and apo(a) isoform size is poorly understood so far.
The nonsense mutation KIV-2 R21X (g. 61 C>T in 19 , 640C>T in 24 ) is another likely causal single nucleotide polymorphism (SNP) in the KIV-2 region. It leads to a truncated protein that is rapidly degraded 19 . Parson et al. 19 identified it by a laborious cloning approach and reported a minor allele frequency (MAF) of 1.67% in 405 individuals. However, because standard genotyping technologies like TaqMan genotyping assays and SNP microarrays are not specific and sensitive enough to detect a variant that is present in only one (or a few) of up to 80 nearly identical repeats (i.e. 1.2% mutation level), the R21X variant was not further explored in large epidemiological studies. Accordingly, R21X has never been put into the context of the findings from GWAS on Lp(a) [14][15][16] and it is unknown whether any of the LPA SNPs detected in GWAS is in LD with R21X.
We developed an allele-specific TaqMan PCR assay (ast-PCR) targeting the R21X variant, as well as the previously described KIV-2 variant G4925A 13 , and assessed the effect of R21X on Lp(a) concentrations in nearly 11,000 individuals. We then used genome-wide SNP data from the German Chronic Kidney Disease (GCKD) study to assess the LD of R21X with SNPs outside the KIV-2 region and thereby link it to available GWAS datasets. Finally, given that the effect of a functional LPA SNP depends strongly on the gene allele on which it is located, we determined the allelic location of R21X by pulsed-field gel electrophoresis (PFGE) and show the existence of moderately frequent "double-null" LPA alleles that are inactivated by two independent loss-of-function mutations.

Populations
Our study involved 10,910 individuals from three studies, namely GCKD 25 , KORA F3 and KORA F4.

Ast-PCR for R21X typing
We designed an allele-specific triplex TaqMan PCR assay (ast-PCR) amplifying the mutant bases of R21X 19 (Supplemental Tables I and II). The assay was run on a 384-well ThermoFisher QuantStudio 6 qPCR system. The R21X assay was validated both against ultra-deep NGS data from Coassin and Schönherr et al, 2019 24 and against a commercial castPCR assay (ThermoFisher; as used in 13 ) with a sensitivity of 0.2% mutant fraction (determined according to manufacturer instructions on an NGS-validated sample). For validation, our assay was run on 376 samples from KORA F4, identifying 14 R21X carriers, all of which were also confirmed by the commercial castPCR assay. The reproducibility was tested on 477 samples run in duplicate.
Additionally, each 384-well qPCR plate (n=34) contained a positive control sample. A slightly modified ast-PCR protocol was used to genotype the gene alleles separated by PFGE (Supplemental Methods). This creates two Ct distributions whose widths are defined by stochastic fluctuation in the amplification of the target (e.g. due to slightly varying input amount) and, for the mutant, the fraction of KIV-2 repeats affected ( Figure 1A). DNA input was 20 ng in all samples. Previous data 13,24 indicate that no more than one to three repeats are affected by R21X, which translates to a maximum difference of ≈1.6 cycles due to the mutation level.
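The arithmetic behind the ≈1.6 cycle figure and the ≈1.2% mutation level mentioned in the Introduction can be written out as a short sketch. It assumes ideal amplification (template doubling in every cycle), which is a simplification and not a parameter reported here; the function names are illustrative only.

```python
import math

def ct_shift(copies_high: int, copies_low: int, efficiency: float = 1.0) -> float:
    """Expected Ct difference between two samples that differ only in the
    number of mutant template copies.

    Assumes a constant amplification efficiency (1.0 = perfect doubling per
    cycle), which is an idealization.
    """
    if copies_high <= 0 or copies_low <= 0:
        raise ValueError("copy numbers must be positive")
    return math.log(copies_high / copies_low, 1.0 + efficiency)

def mutation_level(mutant_repeats: int, total_repeats: int) -> float:
    """Fraction of KIV-2 repeats that carry the variant."""
    return mutant_repeats / total_repeats

# Three affected repeats versus one: log2(3), i.e. about 1.6 cycles
print(round(ct_shift(3, 1), 2))

# One mutant repeat among 80 KIV-2 repeats (both alleles together), in percent
print(round(100 * mutation_level(1, 80), 2))
```

With a lower assumed efficiency the same three-fold difference in template would spread over slightly more cycles, so the value should be read as an approximation.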

Ast-PCR data analysis
To avoid human bias and to assign samples systematically rather than by purely visual clustering of the amplification curves, the optimal discrimination threshold between the Ct distributions of carriers and non-carriers was estimated using a bagged clustering algorithm 30 implemented in the R function 'classIntervals' (package 'classInt'). Two normal distributions were then fitted to the two Ct distributions using the R package 'VGAM' 31 ( Figure 1B). Details are provided in the Supplemental Methods. Samples that could not be assigned unambiguously to one of the Ct distributions (i.e. that could not be identified unambiguously as carriers or non-carriers) were excluded. The exclusion rate was 0.7% in GCKD (35/4974), 1.6% in KORA F3 (52/3157) and 0.5% in KORA F4 (15/3063). The custom R function used for analysis is available at https://github.com/scogi/r21x_analysis.
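The general idea of this classification step can be illustrated with a minimal Python sketch. It is not the published R function: plain one-dimensional two-means clustering stands in for the bagged clustering, a median/MAD estimate stands in for the normal fit, and the rule for calling a sample ambiguous (within three spreads of exactly one distribution, otherwise excluded) is a hypothetical choice. It also assumes that carriers amplify earlier, i.e. have the lower Ct values.

```python
from statistics import mean, median

def two_means_1d(values, max_iter=100):
    """Split 1-D values into a low and a high cluster (k-means with k=2).
    Stand-in for bagged clustering; not the method used in the study."""
    if min(values) == max(values):
        raise ValueError("need at least two distinct Ct values")
    c_low, c_high = min(values), max(values)
    for _ in range(max_iter):
        cut = (c_low + c_high) / 2.0
        low = [v for v in values if v <= cut]
        high = [v for v in values if v > cut]
        new_low, new_high = mean(low), mean(high)
        if new_low == c_low and new_high == c_high:
            break
        c_low, c_high = new_low, new_high
    return low, high

def robust_normal(values):
    """Centre and spread of a normal fit, estimated by median and scaled MAD."""
    centre = median(values)
    mad = median(abs(v - centre) for v in values)
    return centre, 1.4826 * mad

def call_carriers(ct_values, z=3.0):
    """Label each Ct as 'carrier', 'non-carrier' or 'ambiguous'.
    Assumption: carriers form the low-Ct cluster. A sample is ambiguous when
    it lies within z spreads of both fitted distributions or of neither."""
    low, high = two_means_1d(ct_values)
    mu_c, sd_c = robust_normal(low)
    mu_n, sd_n = robust_normal(high)
    calls = []
    for ct in ct_values:
        in_c = abs(ct - mu_c) <= z * sd_c
        in_n = abs(ct - mu_n) <= z * sd_n
        if in_c and not in_n:
            calls.append("carrier")
        elif in_n and not in_c:
            calls.append("non-carrier")
        else:
            calls.append("ambiguous")
    return calls

# Hypothetical Ct values: four early, one intermediate, six late
example = [24.1, 24.8, 25.3, 24.5, 29.5, 33.9, 34.5, 35.2, 34.1, 34.8, 35.0]
for ct, call in zip(example, call_carriers(example)):
    print(ct, call)
```

A robust spread estimate is used here because a single intermediate sample would otherwise inflate the standard deviation of the small carrier cluster and be absorbed into it.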

Lp(a) phenotyping
The Lp(a) concentrations and apo(a) isoforms were determined by ELISA and Western blotting, respectively, as described earlier 32,33 . All analyses were performed in the same laboratory at the Institute of Genetic Epidemiology, Medical University of Innsbruck, Austria and evaluated by the same experienced researcher.

Identification of proxy SNPs
We used the genome-wide SNP data available for the GCKD study to search for a proxy SNP for R21X that would allow linking R21X to existing results from GWAS. We reasoned that a SNP in LD with R21X will show a similar effect on Lp(a) and that the observed effect of R21X on Lp(a) should have been easily detected by our recent GWAS on Lp(a) (n=13,781) 14 . We therefore created a contingency table of R21X carrier status against each of the 66 top hits of the isoform-adjusted model of our recent GWAS meta-analysis on Lp(a) concentrations 14 and analyzed it using Fisher's exact test. We selected the two SNPs with the most significant p-values (rs2489940, rs41272114) and calculated the LD using CubeX 34 .
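The screening step can be sketched as follows: a two-sided Fisher's exact test is computed for each 2x2 table of R21X carrier status against SNP carrier status, and the SNPs are ranked by p-value. The implementation below uses only the Python standard library; the SNP labels and all counts are hypothetical placeholders, not data from this study.

```python
from math import comb

def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    The p-value is the summed probability of all tables with the same
    margins that are at most as probable as the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(x: int) -> float:
        # Hypergeometric probability of x in the top-left cell
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    probs = (prob(x) for x in range(lo, hi + 1))
    return sum(p for p in probs if p <= p_obs * (1 + 1e-7))

# Hypothetical tables: rows = R21X carrier / non-carrier,
# columns = SNP minor-allele carrier / non-carrier
tables = {
    "snp_A": (70, 10, 180, 4700),
    "snp_B": (12, 68, 600, 4280),
}
ranked = sorted(tables, key=lambda snp: fisher_exact_two_sided(*tables[snp]))
print(ranked)
```

The small relative tolerance in the comparison guards against floating-point ties between equally probable tables.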

Pulsed-field gel electrophoresis
We performed LPA PFGE-based genotyping 13,[35][36][37] to assess on which gene allele R21X is located and to confirm the co-localization of R21X and rs41272114 experimentally. Two different restriction enzymes were used for LPA PFGE. KpnI excises a region from KIV-1 to KIV-5 [35][36][37] and allows precise allele sizing. Conversely, Kpn2I digestion excises a much larger fragment that spans from LPA to MAP3K4 38 (Supplemental Figure III) and allows long-range haplotyping of variants by performing the genotyping directly on the previously separated gene alleles.
The LPA gene alleles of eight R21X carriers and three R21X-negative samples were separated by PFGE and detected by Southern blotting using a probe against KIV-2 (detailed in 36 and 38 ). In brief, DNA agarose plugs were prepared as previously described 38 and digested for 4 hours at 37°C (KpnI) or 55°C (Kpn2I). Half a plug of each sample was loaded twice onto the agarose gel and separated on a Bio-Rad CHEF Mapper system (for technical details see Supplemental Table III), and the separated alleles were isolated from the gel as described previously 13,36 . DNA was extracted from the gel slices using the peqGOLD Gel Extraction Kit (VWR). Genotyping was done using a modified ast-PCR protocol for R21X (Supplemental Methods) and Sanger sequencing for rs41272114 (Supplemental Table IV).

Statistical methods
Differences in medians were assessed by Wilcoxon tests. The association between the LPA KIV-2 variant R21X and the Lp(a) levels was assessed by linear regression analysis in each population, adjusted for age and sex. The GCKD analysis was also repeated with additional adjustment for the estimated glomerular filtration rate (eGFR; estimated according to the CKD-EPI equation 39 ) and the urine albumin-to-creatinine ratio. Since R21X has previously been shown to cause a null LPA allele 19 and therefore completely abolishes the respective isoform in plasma, all remaining Lp(a) is produced by the non-mutant allele. Therefore, the regression analysis was not adjusted for isoform, as this would amount to adjusting for a major part of the Lp(a) concentration itself. β-estimates were obtained on the original scale of Lp(a), while p-values and coefficients of determination were derived after inverse-normal transformation of the Lp(a) concentrations due to the skewed distribution. All analyses were done in R software version 3.5.0 (www.r-project.org). The R package metafor 40 (www.metafor-project.org) was used for fixed-effect meta-analysis.
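For readers unfamiliar with the transformation, a rank-based inverse-normal transformation can be sketched as below. The exact variant used in the study is not stated; the offset of 0.5 and the averaging of tied ranks are assumptions made for illustration, and the function names are hypothetical.

```python
from statistics import NormalDist

def average_ranks(values):
    """Ranks from 1..n in ascending order, with ties given their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def inverse_normal_transform(values, offset=0.5):
    """Map each value to the standard normal quantile of its rescaled rank,
    (rank - offset) / (n - 2 * offset + 1). The offset is an assumption."""
    n = len(values)
    std_normal = NormalDist()
    return [
        std_normal.inv_cdf((r - offset) / (n - 2 * offset + 1))
        for r in average_ranks(values)
    ]

# Hypothetical, right-skewed Lp(a) values in mg/dL
lpa = [2.1, 4.5, 8.0, 11.3, 19.7, 35.2, 88.4, 140.0]
print([round(z, 2) for z in inverse_normal_transform(lpa)])
```

The transformed values depend only on the ranks, so extreme Lp(a) concentrations no longer dominate the test statistics, while the effect sizes remain interpretable only on the original scale.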

Assay performance
We established a cost-effective ast-PCR for the detection of carriers of the R21X variant 19 and G4925A 13 in large epidemiological sample collections. Our ast-PCR is a high throughput-capable assay with three multiplexed targets: two KIV-2 variants (R21X 19 and G4925A 13 ) and an amplification control fragment in PNPLA3. In this manuscript we report the results for the R21X variant. The results of G4925A have already been reported earlier using a different assay approach 13 .
Our assay showed excellent sensitivity down to 0.5% mutant fraction and no amplification at 0% (Supplemental Figure II). The R21X assay also correctly classified six samples from Coassin and Schönherr et al. 24 [...] (Supplemental Table V). Moreover, each 384-well qPCR plate (n=34) included the same positive control sample, with consistent results over all plates. Sample call rates of the single studies ranged from 97.8% to 99.0% (Supplemental Table V).

R21X is associated with reduced lipoprotein(a) concentrations
We determined the R21X carrier status in 10,910 samples. Carrier frequency was 1.6% in GCKD, 1.8% in KORA F3 and 2.1% in KORA F4, resulting in 193 carriers in the combined data set. The R21X variant was associated with reduced Lp(a) levels in all three populations, with consistent effect estimates ( Figure 2, Table 2). A fixed-effect meta-analysis resulted in an overall effect estimate of -11.7 mg/dL (95% confidence interval (CI): -15.5 to -7.82; p=1.08e-32). Adjustment in GCKD for eGFR (and urine albumin-to-creatinine ratio) altered the estimates only marginally ( Table 2, footnote). Positive R21X mutation carrier status explained 1.1% to 1.5% of the inverse-normal Lp(a) variance (Table 2).
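The pooling step of a fixed-effect meta-analysis can be illustrated with inverse-variance weighting, which is the standard fixed-effect estimator. The sketch below is a plain Python illustration rather than the metafor call used for the analysis, and the per-study estimates and standard errors are hypothetical placeholders, since only the pooled result and the range of study estimates are given in the text.

```python
from math import sqrt
from statistics import NormalDist

def fixed_effect_meta(betas, ses, level=0.95):
    """Inverse-variance weighted pooled estimate with its standard error
    and a normal-approximation confidence interval."""
    if len(betas) != len(ses) or not betas:
        raise ValueError("need one standard error per effect estimate")
    weights = [1.0 / se ** 2 for se in ses]
    total = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total
    se = sqrt(1.0 / total)
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    return beta, se, (beta - z * se, beta + z * se)

# Hypothetical per-study effects (mg/dL) and standard errors
betas = [-10.0, -12.0, -13.0]
ses = [3.0, 3.5, 4.0]
pooled, pooled_se, ci = fixed_effect_meta(betas, ses)
print(round(pooled, 2), round(pooled_se, 2), [round(x, 2) for x in ci])
```

Studies with smaller standard errors receive larger weights, so the pooled estimate lies closest to the most precise study.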

PFGE shows location of R21X on moderately large alleles
We assessed the allelic location of R21X by PFGE in eight individuals. The LPA alleles separated by PFGE were isolated from the gel and genotyped using our ast-PCR. In all analyzed individuals R21X was located on high molecular weight (HMW) alleles in the range of 27-32 KIV. This is in line with the observed effect magnitude. The PFGE genotypes and the gene allele carrying the variant are reported in Supplemental Table VI.
Kpn2I digestion excises a large fragment spanning from LPA to MAP3K4 (Supplemental Figure III), and performing SNP genotyping on the separated alleles allows direct long-range haplotyping 38 . In all tested individuals, the LPA allele carrying the R21X mutation also carried the rs41272114 splice site mutation (i.e. the two variants formed one haplotype). Accordingly, the association between R21X and Lp(a) vanished when the linear regression model for R21X on Lp(a) in GCKD was adjusted for rs41272114 (β=-0.67 (95% CI: -9.14; 7.81), p=0.504, adjusted for age, sex and eGFR). Conversely, rs41272114 was still associated with Lp(a) when the linear regression was adjusted for R21X (β=-12.26 (95% CI: -17.00; -7.55), p=3.18e-29) and when the linear regression was performed only in R21X-negative samples (β=-12.26 (95% CI: -17.06; -7.46), p=3.90e-28). No difference in median Lp(a) was found between heterozygous individuals carrying both variants and those carrying rs41272114 but not R21X (p=0.47, Figure 3).

Discussion
The KIV-2 repeat polymorphism in the LPA gene, and thus the apo(a) isoform, explains 30-70% 1 [...] The LPA KIV-2 R21X variant is a nonsense mutation located in the KIV-2 region and results in a truncated protein, which is degraded quickly 19 . Until recently 13,24 , the R21X variant was the only functional KIV-2 variant that had been investigated in a relatively large sample set (n=405 19 ). However, no study up to now has investigated the contribution of R21X to the Lp(a) levels in general or high-risk populations, nor is it known whether this variant is captured by any of the many GWAS hits that have been reported for LPA [14][15][16] .
Using a newly developed high-throughput capable PCR assay, we determined the carrier status for R21X in 10,910 individuals from three independent studies and found that R21X is associated with a reduction of mean Lp(a) concentrations by 9.9 to 13.0 mg/dL (Table 2). This effect is of moderate magnitude for a nonsense mutation. Since location on LMW LPA alleles would likely be associated with a much stronger Lp(a) decrease (it is e.g. ≈30 mg/dL for G4925A, which is located in the isoform range 19-25 13 ), the observed effect magnitude suggests that R21X is located on rather large LPA alleles. To investigate this assumption we separated the LPA gene alleles of eight individuals by PFGE and typed R21X on the separated gene alleles. As postulated, the R21X mutation was located on HMW LPA alleles in all investigated samples. Surprisingly, the best proxy SNP for R21X among all top hits of a genome-wide association meta-analysis on the Lp(a) concentrations 14 was rs41272114 (MAF=2.6%). This SNP is a well-known 17,41,42 splice site mutation in the KIV-8 domain of LPA and also results in a null apo(a) allele 17 . The combination of a low coefficient of determination (R2=0.27) and a high Lewontin's D' (D'=0.957) indicates that R21X-carrying alleles constitute a subset of the more frequent rs41272114-carrying alleles: virtually all R21X carriers also carry rs41272114 (indicated by the high D'), but not all rs41272114 carriers carry R21X (indicated by the low R2). This suggests that R21X is a more recent mutation than rs41272114 and indeed arose on the background of an rs41272114-carrying haplotype. Accordingly, R21X (termed 640C>T in the supplementary materials of Coassin and Schönherr et al, 2019 24 ) is found mostly in Europeans and South Asians but is absent in Africans 24 , while rs41272114 is rare but also present in Africans (MAF = 0.7%) 45 .
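Why a nested haplotype structure produces a high D' together with a low R2 can be shown with a short worked example. The sketch computes both measures directly from haplotype frequencies; it does not reproduce CubeX, which first estimates haplotype frequencies from unphased genotype counts. The R21X allele frequency of 0.8% is a rough assumption (about half of the GCKD carrier frequency), complete nesting is assumed for simplicity, and the function name is illustrative.

```python
def ld_stats(p_ab: float, p_a: float, p_b: float):
    """Lewontin's D' and r^2 for two biallelic variants.

    p_ab: frequency of the haplotype carrying both minor alleles
    p_a, p_b: minor allele frequencies of the two variants
    """
    d = p_ab - p_a * p_b
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = d / d_max if d_max > 0 else 0.0
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d_prime, r2

# Assumed frequencies: R21X allele 0.8% (hypothetical), rs41272114 2.6%,
# and every R21X allele also carrying rs41272114 (complete nesting)
d_prime, r2 = ld_stats(p_ab=0.008, p_a=0.008, p_b=0.026)
print(round(d_prime, 3), round(r2, 3))
```

Under complete nesting D' reaches its maximum, whereas r2 is bounded well below 1 because the two allele frequencies differ; the rarer variant can never predict the status of the more frequent one perfectly.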
By separating the LPA gene alleles and typing them independently, we have been able to confirm this statistical inference also experimentally. This demonstrated that R21X-carrying alleles indeed represent "double-null" LPA alleles that are inactivated by two independent loss-of-function variants.
Since both variants are located on the same haplotype, no significant difference in Lp(a) concentrations is found between rs41272114-only carriers and double-null allele carriers ( Figure 3) and causality cannot be assigned to a single variant. Therefore, although R21X is a nonsense mutation and would clearly be functional in isolation (e.g. in vitro), within its proper genomic context its effect is masked by rs41272114. The effect of R21X on Lp(a) and cardiovascular outcomes therefore merges with that of rs41272114, which has been repeatedly shown to be protective against coronary artery disease

Strengths and limitations of the study
Our high-throughput ast-PCR, which is capable of typing two variants within the KIV-2 CNV (R21X and G4925A) in a single multiplex reaction, can be seen as a major technical strength of this work. Some commercial high-sensitivity assays like castPCR (ThermoFisher Scientific), Agena MALDI-TOF Ultraseek 47 and droplet digital PCR 48 are also able to type mutations in the KIV-2, but their exceedingly high costs (several Euro per sample) preclude their application to large epidemiological studies. In the study at hand, we typed nearly 11,000 individuals, making this study the largest assessment of a variant located in the LPA KIV-2 region performed so far.
The allelic location of functional LPA mutations is rarely assessed in Lp(a) epidemiology. However, to fully understand the effect size of an LPA mutation, it matters greatly whether the mutation is located on a low or a high molecular weight LPA allele. We have experimentally demonstrated the allelic localization of R21X on moderately large alleles and also experimentally confirmed the co-localization of two loss-of-function variants on the same gene allele.
Conversely, the relatively low number of samples assessed by PFGE is a limitation of our study. PFGE requires preparation of agarose-plug embedded DNA. This requires buffy coat, which is not commonly available in population studies. The low MAF of R21X further complicates the retrieval of a large number of individuals for PFGE. Therefore, only a limited number of suitable samples were available in our laboratory and the results of our PFGE experiments might not be fully generalizable. However, the localization of R21X on medium to large sized alleles is in line with the effect magnitude observed in the whole dataset. Furthermore, the co-localization of rs41272114 and R21X on the same haplotype is supported by three independent lines of evidence: (1) the experimental PFGE data from five individuals, (2) the regression analysis in the whole dataset, where the effect of R21X, but not of rs41272114, vanishes after reciprocal adjustment, and (3) the R2/D' values in GCKD.

Conclusion
We developed a high-throughput capable assay for the KIV-2 variant R21X and found that this variant is located on high molecular weight apo(a) alleles, is associated with 11.7 mg/dL lower Lp(a) and, most surprisingly, is in nearly complete LD (D'=0.957) with another null mutation (rs41272114). These two variants create LPA alleles that are inactivated by two independent loss-of-function mutations and their effects cannot be genetically separated. While previous studies have shown the impact of LD between SNPs and apo(a) isoforms 13,43 , our study is the first example of a strong LD between two clearly functional LPA variants. This emphasizes the complexity of LPA genetics and exemplifies the importance of assessing LD patterns even for seemingly obvious functional variants.