Functional genomics and rheumatoid arthritis: where have we been and where should we go?

Studies in model organisms and humans have begun to reveal the complexity of the transcriptome. In addition to serving as passive templates from which genes are translated, RNA molecules are active, functional elements of the cell whose products can detect, interact with, and modify other transcripts. Gene expression profiling is the method most commonly used thus far to enrich our understanding of the molecular basis of rheumatoid arthritis in adults and juvenile idiopathic arthritis in children. The feasibility of this approach for patient classification (for example, active versus inactive disease, disease subsets) and improving prognosis (for example, response to therapy) has been demonstrated over the past 7 years. Mechanistic understanding of disease-related differences in gene expression must be interpreted in the context of interactions with transcriptional regulatory molecules and epigenetic alterations of the genome. Ongoing work regarding such functional complexities in the human genome will likely bring both insight and surprise to our understanding of rheumatoid arthritis.

Olsen and colleagues [9] demonstrated that peripheral blood mononuclear cells (PBMCs) from patients with early and late RA showed distinctly different gene expression profiles. This group [10] also demonstrated two features of RA expression profiles that have been corroborated in several, but not all [11], subsequent studies: (1) differentially expressed genes in RA do not reflect an orderly, patterned immune response (for example, as one sees after immunization of healthy controls), and (2) many of the differentially regulated genes show no apparent immune function at all. Nevertheless, the success of microarray technologies in classifying patients has held out the promise that this approach might be used as the basis for diagnostic assays [12], and the field seems to be approaching that point now. A recent report by van Baarsen and colleagues [13] provides an example of the potential for such clinical applications. The authors demonstrated that gene expression profiling of autoantibody-positive patients (IgM-rheumatoid factor (IgM-RF) and/or anti-citrullinated protein antibodies) with arthralgia could distinguish those patients fated to develop frank arthritis over a 7-month period.
Gene expression profiling is also beginning to show potential clinical utility for RA in the area of predicting responses to therapy, specifically to tumor necrosis factor (TNF)-α blockers. This is a critical issue, given the expense and intrusiveness of these therapies, and the fact that as many as 30% of patients do not respond to their first TNF inhibitor [14]. In 2006, Lequerré and colleagues [15] demonstrated that responses to the anti-TNF monoclonal antibody infliximab can be predicted on the basis of gene expression profiling. More recently, Tanino and colleagues [16] replicated this finding in a cohort of Japanese patients, and validated their candidate biomarkers (that is, the genes whose expression levels best predicted response to therapy) in a prospective cohort, while Koczan et al. [17] in Germany reported similar results with etanercept. However, it is important to note that the predictive genes showed no overlap between the Japanese and German cohorts. Whether this was due to the differences in array platforms, underlying clinical or genetic differences in the two populations studied, or differences in how TNF inhibitors are used in the clinical setting in the two countries is unclear. At the present time, we can only conclude that, while these preliminary studies suggest that it may be feasible to develop array-based prognostic biomarkers, a common, internationally applicable set of gene expression biomarkers has yet to emerge. Of special interest is that some of the most informative biomarkers in each cohort emerged by observing the dynamics of gene expression after the initiation of therapy. Our group has found similarly informative gene dynamics in the polyarticular form of juvenile idiopathic arthritis (JIA) [18]. Thus, future studies will need to incorporate gene dynamics as well as static studies; it is likely that these dynamic studies will also provide unprecedented insight into the biology of response to therapy.

Insights into pathogenesis
While patient stratifications for clinical and therapeutic prognoses are useful in themselves, they represent only two potential uses of functional genomics as applied to RA. There remains considerable interest in using gene expression profiling to better understand disease pathogenesis and the complex interactions between genes and environment that are believed to be the basis of this disease [19]. There have already been some surprises, and these surprises in themselves demonstrate the value of 'discovery science' uninformed by a specific hypothesis.
An interesting observation that has emerged from several microarray studies of RA has been the prominence of genes associated with innate immunity. It has long been assumed that RA is an autoimmune disease, although the initiating or perpetuating autoantigen(s) are poorly understood. Gene expression signatures demonstrating critical involvement of the innate immune system suggest a complex interplay between innate and adaptive immunity rather than an antigen-driven event [20]. Our own work in the polyarticular form of JIA (which phenotypically carries a strong resemblance to adult RA) suggests that a focused look at innate immunity may be fruitful [21,22].
Another interesting observation, revealed first in the work by Olsen et al. [9], is the finding that many of the differentially expressed genes identified in patients with RA (compared with healthy age- and sex-matched controls) are not genes directly associated with immune function as we currently understand it. Differential expression of cell cycle regulators, genes encoding signal transduction molecules, transcription factors, and DNA repair enzymes has been seen in multiple microarray experiments [10]. Clearly there is a need for further experimental work and interdisciplinary cooperation to decipher the clues hidden by these findings.
The currently published literature on the use of gene expression profiling in RA has largely used relatively straightforward computational biology approaches to analyze the data. Published studies have used hierarchical cluster analysis to classify patients (for example, van Baarsen et al. [13], and van der Pouw Kraan et al. [11]) and various methods for assigning function (known or putative) to groups of differentially expressed genes, but only recently have there been attempts to understand disease pathogenesis by linking differentially expressed genes into interactive regulatory networks [23,24]. This approach can be quite powerful in understanding disease pathology. Until recently, it was assumed that biological systems adhered to classical network theory as articulated by Erdős and Rényi [25]. This theory assumes that constituents in a network ('nodes') are connected randomly to other constituents. Furthermore, the number of links between nodes is similar and follows a Poisson distribution related to the number of constituents in the system. Over the past 10 years, it has become clear that biological systems exhibit features of scale-free networks [26,27]. Computer modeling derived from genome sequencing, metabolic studies, and known biochemical functions of specific proteins suggests that there are both 'hubs' with high connectivity and peripheral nodes with significantly less connectivity within networks. An interesting feature of such scale-free networks is that they are highly resistant to errors or perturbation [28], making them highly relevant to the study of disease. In homogeneous systems, disruption of a single node can have significant effects on the whole system, since each node has approximately the same number of (linear) connections. In contrast, scale-free systems are relatively resistant to perturbations because most nodes show only limited connectivity.
Modulation of hubs, however, has significant effects on the system, because of the high levels of connectivity of hubs to other parts of the system. This can be seen intuitively in a thought experiment with the international air traffic system, which also shows a hub-and-node structure: disruption of traffic into or out of London Heathrow airport or John F Kennedy airport can have serious ramifications for international travelers all over the world, while disruption in Rapid City, South Dakota, or Burlington, Vermont, has a significantly smaller impact.
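The contrast between disrupting hubs and disrupting peripheral nodes can be illustrated with a small simulation. The following is only a minimal sketch under stated assumptions, not an analysis drawn from any of the studies cited here: it grows a toy network by preferential attachment (a common generative model for scale-free networks, used here as a stand-in for a real gene regulatory network), and then compares the size of the largest connected component after removing five randomly chosen nodes with its size after removing the five most highly connected nodes. The function names, the network size (500 nodes), the single link per new node, and the random seed are all illustrative choices.

```python
import random
from collections import defaultdict


def preferential_attachment_graph(n, m, rng):
    """Grow a graph in which each new node links to m existing nodes,
    chosen with probability proportional to their current degree."""
    adj = defaultdict(set)
    # Start from a small, fully connected core of m + 1 nodes.
    for u in range(m + 1):
        for v in range(u + 1, m + 1):
            adj[u].add(v)
            adj[v].add(u)
    # Each node appears in this list once per edge endpoint, so uniform
    # sampling from the list is sampling in proportion to degree.
    endpoints = [u for u in adj for _ in adj[u]]
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(endpoints))
        for t in targets:
            adj[new].add(t)
            adj[t].add(new)
            endpoints.extend([new, t])
    return adj


def largest_component(adj, removed):
    """Size of the largest connected component after deleting `removed`."""
    seen = set(removed)  # removed nodes are treated as already visited
    best = 0
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        stack, size = [start], 0
        while stack:  # depth-first traversal of one component
            u = stack.pop()
            size += 1
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        best = max(best, size)
    return best


rng = random.Random(0)  # fixed seed so repeated runs give the same network
net = preferential_attachment_graph(n=500, m=1, rng=rng)

# Rank nodes by degree; the top of the list plays the role of the 'hubs'.
by_degree = sorted(net, key=lambda u: len(net[u]), reverse=True)
hubs = by_degree[:5]
random_nodes = rng.sample(sorted(net), 5)

print("largest component, intact network:   ", largest_component(net, []))
print("after removing 5 random nodes:       ", largest_component(net, random_nodes))
print("after removing the 5 best-connected: ", largest_component(net, hubs))
```

With one link per new node the toy network is a tree, which exaggerates fragmentation relative to a denser network; raising `m` gives a more redundant topology, in which more hubs would typically have to be removed to see a comparable effect.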
We have found that the complex relationships between products of differentially expressed genes derived from childhood rheumatic diseases also demonstrate the 'hub-and-node' structure of physiologic systems [29]. Interestingly, most differentially expressed genes occur as nodes, while genes represented in hubs frequently encode transcription factors and signaling molecules whose functions may be modified by post-translational processing rather than by differences in levels of RNA or protein.
If gene expression profiling is to be used to identify new targets for therapy, it may be critical to look at network structures in order to identify those places where disruption is likely to be most effective. While there are serious limits to 'off-the-shelf' network modeling programs whose databases are derived primarily from the existing literature, they provide an easy-to-use starting point from which one might build more sophisticated computational biology approaches.

Interpreting gene expression profiles: studying mechanisms that regulate gene expression
While considerable progress has been made, and new computational resources continue to enrich the utility of existing and future gene expression databases, it will also be critical to use insight gained from studies of transcriptional regulation of model organisms to understand the meaning of expression profiles in complex diseases such as RA. In this regard, investigators have traditionally studied mechanisms that regulate the expression of a limited number of genes, as if the expression of each gene were an independent event. However, studies from model organisms have shown that, rather than occurring independently, transcription of large groups of genes is tightly coordinated across the genome [30]. Each step in gene transcription, including chromatin remodeling, activation and interactions between transcription factors, and transcriptional processing, appears to be elegantly orchestrated with complementary processes in other genes.
Related to this issue are mechanisms currently being elucidated in the area of epigenetics. Although there are redundant mechanisms through which the emergence of cell 'identity' and regulation of gene expression occur, biochemical alterations of DNA [31] and associated histones [32] in response to environmental changes appear to be critical. However, at this early stage, use of such information to treat RA has been limited, and the outcomes are controversial [33].
Furthermore, we are learning that differential gene expression patterns in diseases such as RA are also coordinated by elements within the non-protein-coding parts of the genome, formerly referred to as 'junk DNA'. While there is still a great deal to be learned about functional non-coding elements within the genome, there is reason to be optimistic that the systematic efforts of the National Institutes of Health ENCODE project, organized to identify all the functional elements in the human genome [34], will provide a platform for the development of novel insights into complex human diseases. Even with only a small percentage of the functional elements characterized, some startling insights have emerged in the preliminary report encompassing the pilot phase of the project [35]. Rather than transcripts merely serving as passive templates from which genes are translated, RNA molecules of eukaryotic organisms are active, functional elements of the cell whose products detect, interact with, and modify other transcripts. The abundance of long intergenic non-coding RNAs has added to our understanding of the complexity of transcriptional control [36], and it can be anticipated that study of these new regulators in the context of complex human diseases will be highly informative. Similarly, studying small non-coding RNAs (small interfering RNA, microRNA) is very likely to provide important insights into the mechanisms behind the RA gene expression profiles already generated [37,38]. Collectively, these molecules are likely to transform our understanding of the dysregulation of gene expression in RA and other rheumatic diseases.
If we are to fully exploit the information and methods that are emerging from the ENCODE project to understand the pathology of RA at the molecular level, then we have very likely reached the limits of what we can achieve while studying mixed populations of cells (except for the development of biomarkers and prognostic assays). A problem in interpreting many of the published studies of gene expression profiling in RA patients is the fact that the profiles have typically been generated from PBMCs, a mixed population of cells that includes monocytes, T cells, B cells, and natural killer cells. Relatively pure subpopulations of cells of the innate or adaptive immune systems from patients with RA have been used in only limited cases [39,40]. Epigenetic markers (DNA methylation, histone modifications, non-coding RNA expression, and so on) are also cell specific. In order to derive a mechanistic understanding of how gene transcription is regulated over the course of RA (for example, in response to therapeutic agents), it will be critical to observe these changes over time in specific cell types, preferably in conjunction with a simultaneously obtained gene expression profile. Genome-wide mapping of disease-specific transcription factor binding sites by chromatin immunoprecipitation (ChIP)-chip or ChIP-sequencing, particularly for those transcription factors found to be hubs using systems biology approaches, is likely to provide crucial insight into RA gene expression profiles. As these new results unfold, we may begin to regard RA less as an autoimmune disease that is triggered by inappropriate recognition of a self antigen by a T cell and more as a disease characterized by loss of transcriptional regulation in cells of both innate and adaptive immunity.

Conclusions
The past 7 years have shown us the promise of using functional genomics to gain insight into the prognosis and pathogenesis of RA. The future will likely take investigators in two very different directions. Prospective validation of prognostic biomarkers of therapeutic response will build on the promising work of several groups and facilitate the development of relatively simple, clinically useful assays [41]. Meanwhile, rheumatology investigators, computational biologists, and cell biologists focused on transcriptional regulation will take on the challenge of interpreting the complex biology reflected in existing RA gene expression databases and those to be generated in single-cell populations in the near future.
As the American College of Rheumatology indicates, finding a cure for RA may be 'within our reach'. We think, however, that the state of the art is better summarized by the 1980s rock duo Timbuk3, 'The future's so bright, I gotta wear shades' [42].