An imprinted rheumatoid arthritis methylome signature reflects pathogenic phenotype

Background A DNA methylation signature has been characterized that distinguishes rheumatoid arthritis (RA) fibroblast like synoviocytes (FLS) from osteoarthritis (OA) FLS. The presence of epigenetic changes in long-term cultured cells suggest that rheumatoid FLS imprinting might contribute to pathogenic behavior. To understand how differentially methylated genes (DMGs) might participate in the pathogenesis of RA, we evaluated the stability of the RA signature and whether DMGs are enriched in specific pathways and ontology categories. Methods To assess the RA methylation signatures the Illumina HumanMethylation450 chip was used to compare methylation levels in RA, OA, and normal (NL) FLS at passage 3, 5, and 7. Then methylation frequencies at CpGs within the signature were compared between passages. To assess the enrichment of DMGs in specific pathways, DMGs were identified as genes that possess significantly differential methylated loci within their promoter regions. These sets of DMGs were then compared to pathway and ontology databases to establish enrichment in specific categories. Results Initial studies compared passage 3, 5, and 7 FLS from RA, OA, and NL. The patterns of differential methylation of each individual FLS line were very similar regardless of passage number. Using the most robust analysis, 20 out of 272 KEGG pathways and 43 out of 34,400 GO pathways were significantly altered for RA compared with OA and NL FLS. Most interestingly, we found that the KEGG 'Rheumatoid Arthritis' pathway was consistently the most significantly enriched with differentially methylated loci. Additional pathways involved with innate immunity (Complement and Coagulation, Toll-like Receptors, NOD-like Receptors, and Cytosolic DNA-sensing), cell adhesion (Focal Adhesion, Cell Adhesion Molecule), and cytokines (Cytokine-cytokine Receptor). Taken together, KEGG and GO pathway analysis demonstrates non-random epigenetic imprinting of RA FLS. Conclusions The DNA methylation patterns include anomalies in key genes implicated in the pathogenesis of RA and are stable for multiple cell passages. Persistent epigenetic alterations could contribute to the aggressive phenotype of RA synoviocytes and identify potential therapeutic targets that could modulate the pathogenic behavior.


Background
RA is a chronic inflammatory disease marked by synovial hyperplasia and invasion into cartilage and bone. This process is mediated, in part, by cytokines like IL-1, IL-6, and TNF that activate a broad array of cell signaling mechanisms and leads to the release of destructive enzymes [1]. Fibroblast-like synoviocytes (FLS), which form the inner lining of the synovium, play an active role in joint destruction by invading intra-articular cartilage and other support structures of the joint [2]. These mesenchymal cells normally produce hyaluronic acid and other lubricants that facilitate joint movement and a low friction environment.
FLS display an aggressive phenotype in RA that persists in long-term culture [2,3]. These imprinted cells can migrate between joints and exhibit characteristics of locally invasive transformed cells [4]. The mechanism that contribute to functional alterations in RA FLS are only partially understood and include somatic mutations, alterations in cell survival and apoptosis genes, and persistent activation of signaling pathways [3]. Epigenetic changes, including aberrant miRNA expression [5], can also contribute to the aggressive RA FLS phenotype.
More recently, a characteristic DNA methylation signature that could affect cell function and distinguishes RA from osteoarthritis (OA) FLS was discovered [6]. Differentially methylated loci (DML) involve many key genes implicated in inflammation, immune responses, cell-cell interactions, and matrix regulation. The original study defining the RA methylation pattern was performed on a relatively limited number of cell lines and did not evaluate the stability of the signature over multiple passages. For the present analysis, we increased the number of OA and RA cell lines and included normal (NL) synoviocytes. The greater number of cells permitted a focused evaluation of promoter sequences and a more detailed pathway analysis. The results demonstrate a pattern of differentially methylated pathways in RA FLS that define pathogenic processes that could permit identification of novel therapeutic targets.

FLS and patient phenotype
FLS were isolated from synovial tissues obtained from 11 RA and 11 OA patients at the time of joint replacement as described previously [7]. The diagnosis of RA conformed to the American College of Rheumatology 1987 revised criteria [8]. The protocol was approved by the UCSD Human Subjects Research Protection Program. Synoviocytes were used from passage 3 through 7, when FLS were a homogeneous population with < 1% CD11b, < 1% phagocytic, and < 1% FcR II positive cells. Normal human synoviocytes were provided by the San Diego Tissue Bank from autopsy specimens. Preparation of the genomic DNA from early, middle, and late passage cells (passages 3, 5, and 7, respectively) for the Infinium HumanMethylation450 BeadChip (Illumina; San Diego, CA) and calculation of methylation frequencies (Beta values) was performed as previously reported [6]. The BeadChip data from the original 11 samples in reference 6 are available through the Gene Expression Omnibus (GEO) under the accession [GSE46364].

BeadChip processing and validation
BeadChip data were processed in R 2.15 using the methylumi and minfi packages. Data were normalized via the minfi package (preprocessIllumina function). CpGs with detection P values > 0.001 were filtered out. For chip validation, biologic replicates within chips were performed and demonstrated a mean correlation coefficient r 2 = 0.9338. Replicates between chips demonstrated a mean correlation coefficient r 2 = 0.9858. Regarding potential performance differences between Infinium I and II probes on the BeadChip, we reasoned that since CpGs were tested independently, probe type-specific bias would be equally present in both phenotype populations. Therefore, the risk of false-positive detection due to probe-type differences is low.

Methylation heat maps and histograms
Methylation frequencies of previously identified RA-OA differentially methylated CpGs [6] across FLS samples were multiplied by 100 to obtain values interpreted as methylation percentages. The Euclidian distances between FLS samples and CpGs were calculated and hierarchically clustered using complete linkage. The results were visualized in a heat map using the heatmap.2 function in the gplots R package as previously described [6]. Missing values were represented with a white color. For each FLS cell line, beta value differences of the previously identified 1,859 CpG signature between passages were calculated and plotted as histograms with bins for every 0.01 interval. The bar areas were normalized so that their total sum equaled unity.

Correlations between FLS passage number
The FLS lines were experimentally processed the same day across three BeadChips. This approach minimized intra-dataset (that is, KEGG and passage FLS datasets) batch effects like BeadChip lot, reagent lot, lab processing, or temporal variability). The tight hierarchical clustering of P5 FLS samples across the KEGG and passage datasets demonstrates that inter-dataset batch effects are minor relative to the RA methylation signature (see Results). Furthermore, validation studies of duplicate samples on the same BeadChip, on different BeadChips, or performed on different runs showed very strong correlations (data not shown). For each cell line, the Spearman Rank correlation of beta values of the previously identified 1,859 CpG signatures was calculated across passages. For each pairwise correlation calculation, CpGs with missing data were omitted. For the correlation between replicates of passage 5 comparisons were made within the same cell line. To estimate passage correlation, the Spearman correlation coefficients between passage 5 and the other passages were calculated per cell line. An average was calculated based on these correlation coefficients.

Identifying differentially methylated genes (DMGs)
A Welch's t-test was used to calculate the significance of differential methylation. We chose to apply Welch's t-test as the samples may have unequal variance. All loci that are present on the array were tested and the resulting P values were converted into q values [9]. Additionally, the average difference in mean values at each position was calculated. Loci with q values < 0.05 and average mean differences > 0.1, or < -0.1, were labelled as differential methylated loci (DML). Missing values and values with detection P values > 0.01 were not included in this analysis. At a specific locus, if values were absent but at least three values present, for each of the two groups being considered, then comparison was made between only the existing values. If there were less than three values present in a single group, at a specific locus, then this locus was ignored from the analysis. The DMLs within the promoters of genes were identified and the DMLs were linked to the corresponding genes. The locations of refseq genes were obtained from the University of California, Santa Cruz. If a promoter contained one or more DMLs within a 3 kb window around a gene's transcription start site (TSS) (-2500 bps to +500 bps from the TSS) then the gene was labelled as a DMG.
The significance of DMG in 271 KEGG human pathways was assessed [10]. Enrichment factors (EF) were calculated as: EF = total DMGs in pathway total genes in pathway total DMGs total genes in KEGG The 'total genes in the pathway' and 'total genes in KEGG' only considered genes that are present in a KEGG pathway and are covered by the BeadChip.
The hypergeometric distribution was used to calculate enrichment P values that were converted into q values.
Resulting q values represented the fraction of randomly selected background gene sets that were at least as enriched in genes found in the tested pathway as the DMG set. A q value threshold of 0.05 determined significance. The significance of 34,449 GO categories was tested in the same way. The background for the GO analysis included all the genes whose promoters are covered by the methylation array and that are present in the GO database.

Stability of the methylome signature in RA FLS
We previously identified a methylome signature in RA comprised of 1,859 DML and predicts the phenotype of passage 5 FLS (RA vs. OA). One critical question to answer before performing additional pathway analysis with more cell lines is whether this signature is stable. Therefore, we assayed the methylomes of nine FLS cell lines (three RA, three OA, and three NL) at the 3rd, 5th, and 7th passage (P3, P5, and P7, respectively). Passage 3 cells were the earliest samples evaluated because passage 1 and 2 lines can include other cell types. By the 3rd passage, FLS are a homogeneous population of cells (< 1% CD11b positive, < 1% phagocytic, and < 1% FcR II and FcR III receptor positive) [2].
We first performed hierarchical clustering of RA, OA, and NL samples from the three passages. In addition, we included P5 samples from our previous study as an internal biological sample control. The RA, OA, and NL groups clustered separately at all passages, confirming that the RA methylome signature is stable over many passages (see Figure 1A for representative examples). Of note, the replicates of P5 clustered very closely and were also only slightly different from the other passages.
To further explore the stability of the RA methylome signature over time, we considered the difference in methylation percentages between different passages and the replicates of P5. Figure 1B shows that variation between replicates of P5 is comparable to the variation between P5-P3 and P5-P7. The correlation across the nine cell lines between P5 replicates is 0.950 while the correlation between P3-P5 is 0.943. These data show that P3 and P5 are as highly correlated as replicates of the same passage and that there is very little change in the methylation frequencies between P3 and P5. There is a slight decrease in correlation between P3 to P7 (0.883) as the cells approach senescence.

Determination of differentially methylated genes in RA FLS
Having confirmed that the RA methylome signature is stable we then investigated its relationship with clinical phenotype on a systems level. Previously, we had carried out KEGG and GO analysis on a limited set of samples (six RA and five OA FLS lines). We have advanced on this previous analysis by increasing the sample size to 11 RA and 11 OA and adding six NL lines. The increased number of samples from 11 to 28 allowed us to focus on the subset of DMLs within the gene promoters (-2500 to +500 bps from the TSS). DMLs were identified by calculating P values using Welch's t test. Then the resulting P values were corrected for multiple testing to produce q values. Genes that contain DMLs within their promoters were labeled DMGs. An average mean difference of 0.1 or greater was required for a DML to be considered significant.
Identification of DMGs was carried out by comparing the RA samples to five combinations of OA and NL: 'RA vs. OA', 'RA vs. NL', 'RA vs. OA+NL' (which combines the OA and NL databases), 'RA vs. OA or NL' (which includes DMGs that are significant in either RA vs. OA or RA vs. NL) and 'RA vs. (OA or NL) or (OA +NL)' (denoted as 'combined'). We focused on comparisons that take account of both OA and NL as these are of greatest relevance to differentiating RA phenotype from non-RA. Furthermore, smaller NL sample size identified a reduced set of DMGs making this set of DMGs unsuitable for systems level analysis.
A summary of the number of genes identified in each of the comparisons is shown in Table 1 (for a full list of genes, see Supplemental Table 1). The majority (89%) of the 2,375 total genes identified (the combined) as DMGs were identified solely by comparison of RA and OA.

Determination of differentially methylated KEGG pathways in RA FLS
Having established sets of DMGs, we identified the biological pathways and gene ontologies that were enriched within the sets. To identify enriched pathways, the DMGs where mapped to the KEGG pathways and GO databases. Enrichment P values were then calculated using the hypergeometric distribution. Then these P values were corrected for multiple testing to produce q values. A q values cut-off of < 0.05 was considered significant.
A summary of the total number of significant KEGG pathways and GO categories that were found significant is given in

Replicates of P5
Methylation Frequency Difference  Table 3 using the various groupings described above. The top ranked pathway is the KEGG 'rheumatoid arthritis' pathway (see Figure 2) with 2.47-fold enrichment (P = 1.729e-05 and q = 0.0027) and 24 out of 89 genes identified as DMGs in the union set. This confirms that the observed alterations in DNA methylation are highly relevant to RA. Furthermore, eight additional immunological pathways relevant to RA were also identified as significantly enriched with DMGs in RA FLS. For example, KEGG 'Complement and coagulation cascades' pathway (see Figure 3) is 1.99-fold enriched (P = 0.0064 and q = 0.0494) with 15 out of 69 genes labeled as DMGs in the union set. The KEGG 'Focal Adhesion' pathway (see Figure 4) is 1.61-fold enriched (P = 0.0027 and q = 0.0420) with 35 out of 199 genes labeled as DMGs in the union set. The KEGG 'Toll-like receptor signaling'   pathway (see Figure 5) is 1.89-fold enriched (P = 0.0042 and q = 0.0377) with 19 out of 92 genes labeled as DMGs in the union set. Certain KEGG pathways and GO categories are significantly enriched in 'OA+NL' or 'OA or NL' but are not significantly enriched in 'combined'. This can occur because the increase in the total number of significant genes can affect the enrichment of a subset of genes within a particular pathway. For example, the KEGG pathway 'cytosolic DNA sensing' is significantly enriched in 'OA+NL' but not in combined despite the 'OA+NL' containing five DMGs in the pathway whereas 'combined' contains eight. This occurs because the total number of DMGs increases from 472 in 'OA+NL' to 2,375 in 'combined'.

Determination of differentially methylated GO pathways in RA FLS
The enriched GO categories out of the full 34,449 category dataset are summarized in Table 4 where 10 categories especially relevant to RA are listed at the top of the table. Of particular interest is the overlap with KEGG because of the preponderance of matrix, adhesion, and signaling GO categories.

Discussion
Fibroblast-like synoviocytes (FLS), which form the synovial intimal lining, play an integral role in the pathogenesis of RA by producing cytokines, small molecule mediators, and proteases [1]. The FLS are responsible for cartilage damage by virtue of their ability to adhere to and invade the cartilage extracellular matrix [11]. Rheumatoid FLS exhibit a unique aggressive phenotype that contributes to the cytokine milieu and joint destruction [12]. Functional studies suggest that RA cells are imprinted in situ and maintain these features after many passages in tissue culture. For example, RA FLS, unlike OA or NL synoviocytes, invade cartilage explants in SCID mice [13]. RA FLS can grow under Table 3 Gene ontology terms that are enriched in two or more sets of genes (Continued)  anchorage-independent conditions, are less susceptible to contact inhibition, and resistant to apoptosis [14,15]. DNA methylation could play a critical role in joint damage by epigenetic imprinting FLS in RA. Evidence of abnormal methylation and a distinct methylation signature was recently described in RA synoviocytes [6]. Using a limited number of cell lines, hyper-and hypomethylated CpG sites were demonstrated in nearly 2,000 loci located in 1,200 genes. Methylation status determined by array analysis was confirmed by bisulfite sequencing and correlated with gene expression in that study. Additional analysis identified over 200 genes with multiple differentially methylated loci. Preliminary KEGG and GO analysis suggested that pathways involved with inflammation, matrix regulation, and immune responses are differentially methylated. The present study greatly expands upon the initial dataset by doubling the number of RA and OA FLS and adding normal FLS. The increased sample size also allowed us to focus our analysis on differentially methylated promoter sites. These data refined the KEGG and GO analysis and led to identification of additional key pathways implicated in disease.
Before extending our computational studies, we determined if the RA methylation signature is stable. Comparison of the methylation patterns between different passages of RA, OA, and NL FLS showed that the signature changes very little over time. These results correlate with previous studies demonstrating that the FLS transcriptome is also stable in culture [2,16]. By the 7th passage, a slight increase in methylation variability was detected, but the correlation was only slightly lower than between replicates of the same passage. Furthermore, the most significant methylation changes that may be associated with pathogenic processes appear to be very stable over time in FLS lines. These data suggest that the majority of the variation is a result of noise in the bead array assay.
Methylome stability has been observed with other longterm cultured cell lines. For instance, multiple passages of the cell lines IMR90 (human fetal lung fibroblast) and H1 (human embryonic stem cells) showed high concordance [17]. The methylome is not immutable, however, and can be modulated by the environment, different developmental stages [18], or prolonged tissue culture [19]. For example, dynamic changes in methylome occur in human embryonic stem cells, a fibroblastic differentiated derivative of the human embryonic stem cells, and neonatal fibroblasts. Immortalized fibroblasts that evolve separately over 300 generations show stochastic methylation changes. Despite the random nature of changes, they resulted in a deterministic remodeling of the methylome that correlates with histone modification and CTCF binding.
The stability of the RA methylome signature from the earliest possible passage (P3) to cells approaching senescence (P7) suggests that it is imprinted in FLS rather than a transient phenomenon. However, it does not tell us the origin of the RA methylome signature. In particular, we are still unsure whether the imprinting predates disease, is brought about by interaction with the pre-RA synovium environment, or is modified by RA in a way that influences the behavior of FLS. Our recent study show that DNMT expression and function are suppressed in FLS when exposed to IL-1 [20]. However, this effect is transient and is reversed 2 weeks after the cytokine is removed from the cultures. Therefore, cytokines can potentially contribute to altered DNA methylation in FLS but probably do not account for the long-lasting effects. Evaluation of FLS from patients at high risk for RA but without clinical disease would be needed to resolve this issue with greater certainty. Also, specificity for RA, as opposed to other forms of inflammatory arthritis, will require evaluation of FLS from other forms of synovitis.
Systems level analysis of the RA methylome signature in our expanded dataset reveals that DMLs are highly enriched in the promoters of genes that belong to characteristic functional categories. Because we had three distinct cell populations, we used several different ways to evaluate how RA differed from OA and/or NL FLS. There were some differences in the genes and pathways that were differentially methylated depending on how the phenotypes were grouped. Relatively broad agreement among the various types of analyses was observed. The most gratifying was the RA pathway in the KEGG analysis, which was found in the three most robust comparisons and confirmed the relevance of our analysis to this disease.
We also found the enrichment of many pathways that are related to the RA phenotype and immune responses. The KEGG and GO analysis of differentially methylated pathways is, therefore, non-random and likely essential for establishment or maintenance of the RA FLS phenotype. Even many of the seemingly less relevant categories that were identified as enriched result from overlap in sets of genes that are shared with pathways that are more clearly relevant to RA. For example, the pathway 'Staphylococcus aureus infection' contains 16 DMGs; however, six of these are also present in the RA pathway.
The nature of the RA-associated pathways enriched in our KEGG and GO analysis provides clues regarding the pathogenesis of the disease. For example, the role of complement is well documented in animal studies [21]. This innate immune mechanism is also strongly implicated in the initiation and acute inflammatory reaction of RA [22]. Several complement components are produced by FLS in the intimal lining as well as cultured synoviocytes [23]. The pathway analysis indicates that regulation of complement is abnormal in addition to the fact that it is robustly consumed in the joints.
Altered methylation and, presumably, gene regulation in FLS is also observed for many other components of innate immunity. Notably, TLR, cytosolic DNA-sensing, and NOD-like receptor pathways are significantly enriched in the KEGG analysis for differentially methylated genes. Each pathway has been implicated as fundamental mechanisms that regulate the inflammatory response in RA [24][25][26]. Genes regulating innate immunity are emerging as potential therapeutic targets for RA, and the methylation data supports their participation in rheumatoid synovitis. Similarly, multiple cytokines are differentially methylated in the RA pathway and in the cytokine-cytokine receptor pathway, including TNF and other critical mediators of RA. Additional types of pathways involving inflammation, host defense, and immune responses are enriched in the GO analysis and are consistent with the insights gleaned from the KEGG pathways.
The data suggest that pathway analysis can provide other clues for disease pathogenesis and novel therapeutic interventions. For example, cell-cell and cell-matrix interactions are differentially methylated in the focal adhesion pathway and the cell adhesion molecule pathways. Several potential therapeutic targets are within these pathways and could be explored for diseases like RA. The specific genes that are differentially methylated are not necessarily the ones that should be the focus of drug development. Instead, more attractive proteins in the pathway that are more amenable to inhibition or modulation could be selected and achieve the same result.