- Open Access
Whole genome sequencing in support of wellness and health maintenance
Genome Medicine volume 5, Article number: 58 (2013)
Whole genome sequencing is poised to revolutionize personalized medicine, providing the capacity to classify individuals into risk categories for a wide range of diseases. Here we begin to explore how whole genome sequencing (WGS) might be incorporated alongside traditional clinical evaluation as a part of preventive medicine. The present study illustrates novel approaches for integrating genotypic and clinical information for assessment of generalized health risks and to assist individuals in the promotion of wellness and maintenance of good health.
Whole genome sequences and longitudinal clinical profiles are described for eight middle-aged Caucasian participants (four men and four women) from the Center for Health Discovery and Well Being (CHDWB) at Emory University in Atlanta. We report multivariate genotypic risk assessments derived from common variants reported by genome-wide association studies (GWAS), as well as clinical measures in the domains of immune, metabolic, cardiovascular, musculoskeletal, respiratory, and mental health.
Polygenic risk is assessed for each participant for over 100 diseases and reported relative to baseline population prevalence. Two approaches for combining clinical and genetic profiles for the purposes of health assessment are then presented. First, we propose conditioning individual disease risk assessments on observed clinical status for type 2 diabetes, coronary artery disease, hypertriglyceridemia, hypertension, and obesity. Genetic prediction and observed sub-clinical disease are concordant approximately twice as often as they are discordant. Subsequently, we show how a more holistic combination of genetic, clinical and family history data can be achieved by visualizing risk in eight sub-classes of disease. Having identified where their profiles are broadly concordant or discordant, an individual can focus on individual clinical results or genotypes as they develop personalized health action plans in consultation with a health partner or coach.
The CHDWB will facilitate longitudinal evaluation of wellness-focused medical care based on comprehensive self-knowledge of medical risks.
Whole genome sequencing (WGS) and exome sequencing are rapidly being incorporated as routine components of diagnosis and explanation of rare disorders, and the trend is moving toward utilization of these for risk assessment for common diseases as well [1, 2]. Each month, novel mutations (either de novo or transmitted) that are causal for conditions such as autism, primary immunodeficiency, and craniofacial abnormalities are reported [3–8]. In parallel, widespread adoption of genome-wide association studies (GWAS) has identified thousands of loci that contribute to multifactorial diseases as diverse as diabetes, asthma, and depression [9, 10]. Because environmental factors make a substantial contribution to these latter conditions, 'prediction' is too strong a claim for genomic medicine [11, 12], but risk stratification is certainly feasible. Here we explore how WGS might be incorporated alongside traditional clinical evaluation as part of preventive medical care and the maintenance of good health. The term 'whole genome sequencing' does not imply that the entire genome of each individual is sequenced, but rather should be taken to mean that total genomic DNA was sequenced to high depth, in contrast to targeted or whole exome sequencing.
We have previously shown in multiple settings how GWAS results can be used in combination with personal WGS to evaluate individual risk. The probability of developing each of approximately 100 diseases was estimated for individuals by integrating allelic risk effects for multiple well-validated common variants with population-specific pre-test baseline lifetime risk of disease estimates. We first showed that the pseudo-individual represented by the Human Reference Genome (hg19) would have an increased risk for type 1 diabetes (T1D). We then reported on three Caucasian individuals. The first had known familial risk of heart disease, which was borne out by his genomic profile of both common variant and rare deleterious variant-associated risk for coronary artery disease (CAD). The second was found to have an unexpected predisposition to type 2 diabetes (T2D), which revealed itself in an extended period of hyperglycemia following a respiratory viral infection, and parallel longitudinal transcriptomic, proteomic, and immune profiling supported the inference of a change in health status. For the third case, the immunological and pharmacological risk assessment was advanced by incorporating new methodologies for analysis of family quartets. Most recently, we also evaluated the WGS of a South Asian woman from Kerala, and showed how genetic risk distribution for a number of diseases varies in different populations.
Although the focus of most genomic medicine is on disease, routine incorporation into primary medical care also calls for its inclusion in assessment and promotion of wellness and health. Currently, health promotion programs utilize primary preventative measures such as exercise, diet, weight loss, and stress management. Additionally, clinical indicators and risk factors such as blood pressure, glucose, and lipids are being incorporated into screening, and the potential value of combining these with large-scale genomic and molecular measurements has been discussed but not yet assessed in the context of health promotion. Further, use of these measures is on the 10-year agenda for the United States Department of Health and Human Services Healthy People 2020 health promotion program.
The Center for Health Discovery and Well Being (CHDWB) is a joint initiative of Emory University and the Georgia Institute of Technology, which has the objective of assessing whether comprehensive annual health evaluation combined with regular discussions with a 'health partner' (an individual trained to interpret clinical profiles and coach on health-related behavior) can help people make more informed and better personal health decisions that maintain wellness, and potentially reduce morbidity and medical treatment for chronic disease [23, 24]. In this program, we obtain comprehensive clinical data pertaining to metabolic, cardiovascular, skeletal, and mental health, we carry out a survey assessment of nutrition, behavior, and family history of disease every 6 to 12 months, and we are performing WGS and other deep genomic profiling for a subset of participants.
The objective of this report is to show how these two types of analysis, namely clinical and genomic, can be considered as complementary views of participant health. Clear instances of agreement and of discordance are described, and strategies for conditioning genomic risk assessment on clinical data are considered. We conclude with a discussion of how the complex and voluminous quantity of data might in the near future be distilled to support actionable medical inference and personal lifestyle choices.
Eight Caucasian individuals (four men and four women) were selected from a longitudinal cohort of healthy adult volunteers at the CHDWB at Emory University Midtown Hospital (Atlanta). The CHDWB participants were broadly representative of Emory employees and were free of any known acute illness at the time of recruitment. The eight selected individuals are drawn from a panel of 500 CHDWB participants who had completed at least three visits during the first 2 years of the Center's existence, and were chosen pseudo-randomly to represent a range of diversity for metabolic and cardiovascular phenotypes (for summary, see Additional file 1). The eight individuals were a non-random sample in the sense that they were selected from the upper or lower deciles for body mass index, percentage body fat, high-density lipoprotein cholesterol (HDL-C), and triglyceride levels, and the Beck Depression Index and Augmentation Index values were used to capture slightly different clinical profiles. They thus represent classically 'fit' and 'unfit' phenotypes, which were nevertheless different with respect to blood fat and sugar. The individuals were a random sample in the sense that another 30 individuals with similar profiles could easily have been chosen. Their Framingham Risk Scores (FRS) for diabetes and cardiovascular risk were distributed across the observed range in the entire cohort, as were their genotypic risks for both diseases (see Additional file 2).
The study was performed in accordance with the Declaration of Helsinki. It was approved by the institutional review boards (IRBs) of Emory University (IRB00007243) and the Georgia Institute of Technology (H09364) for collection of clinical and genomic data following written consent, and although we openly discussed their clinical data with participants, approval for provision of genetic results to individual participants has not yet been sought or provided. Because of this, in the interests of participant privacy, data in some figures and tables are presented as z-scores so as to preclude personal identification. We carried out analyses (see Additional file 3) to confirm that participants would not be able to identify themselves unambiguously from the data presented here, as most have clinical profiles that are very similar to those of other individuals in the study. In addition, rare variants were for the most part excluded from discussion, as the IRB considers these to be more potentially disturbing in the absence of professional consultation were an individual to suspect that they are represented.
Details about the recruitment of participants and collection of biomedical and health status data at CHDWB have been described previously. Participants underwent extensive clinical measurements to assess their health status at study initiation, and 6 and 12 months later, and most are continuing to participate with annual evaluations. Data gathered include anthropometric measurements, laboratory tests including complete blood counts, metabolic and lipid profiles, urinary and serum biomarkers for oxidative stress, inflammation, and immune function, pulse wave velocity assessment of cardiovascular function (SphygmoCor; AtCor Medical, Sydney, Australia), whole body densitometry, and assessment of mental and behavioral health (NexAde; NexSig Neurological Examination Technologies Ltd, Herzliya, Israel) as described previously. Self-reported family and personal medical histories were also recorded, along with extensive online surveys that were filled in at the participants' convenience at or around the time of each visit. Blood samples were collected at each visit for all the participants, and DNA was extracted from buffy coats isolated at first visit.
Predictions of the 8-year risk of diabetes and the 10-year risk of cardiovascular disease were calculated using the equations provided online by the Framingham Heart Study. We derived z-scores for continuous clinical variables using the entire CHDWB dataset of over 500 individuals by subtracting the mean and dividing by the standard deviation. The mean of the first three visits was considered in all assessments reported here.
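As an illustration of the standardization just described, the following minimal sketch averages each participant's first three visits and then converts the averages to cohort z-scores. The function names, the participant identifiers, and the example values are hypothetical; averaging visits before standardizing, and using the sample standard deviation, are assumptions rather than details stated in the text.

```python
from statistics import mean, stdev


def visit_average(measurements, n_visits=3):
    """Mean of the first n_visits measurements for one participant."""
    return mean(measurements[:n_visits])


def cohort_z_scores(cohort):
    """Return {participant_id: z-score} for one clinical variable.

    cohort maps a hypothetical participant id to that participant's
    per-visit values. Averaging visits before standardizing, and using
    the sample standard deviation, are assumptions of this sketch.
    """
    averaged = {pid: visit_average(vals) for pid, vals in cohort.items()}
    values = list(averaged.values())
    mu, sd = mean(values), stdev(values)
    return {pid: (x - mu) / sd for pid, x in averaged.items()}


# Made-up values for three hypothetical participants; only the first
# three visits of P3 are used.
example = {
    "P1": [20.0, 22.0, 24.0],
    "P2": [30.0, 30.0, 30.0],
    "P3": [38.0, 38.0, 38.0, 50.0],
}
z = cohort_z_scores(example)
```

In practice the mean and standard deviation would come from the full cohort of over 500 individuals rather than from three values.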
Whole genome sequencing
WGS was performed by the Illumina Genome Sequencing Network at the University of Washington on HiSeq2000 (Illumina Inc., San Diego, CA, USA) automated sequencers. Briefly, 100 μl of genomic DNA (> 60 ng/μl concentration) was sheared to give a mean fragment size of 500 bp, and sequencing libraries were generated. Imaging and analysis of 100 bp paired-end read data was performed using standard Illumina software. Approximately 125 billion bases that passed the Illumina analysis filter were obtained for each genome. Mean non-N reference (that is, after excluding gaps) coverage was approximately 36X, with 95.5% (mean) of the positions having coverage of at least 10X. The genome sequences were aligned against the Human Reference Genome assembly (hg19 sequence) using CASAVA (Consensus Assessment of Sequence And Variation) software (Illumina). On average, 87% of each individual's quality filtered reads were aligned. High-confidence variants with a quality score above 20 were retained. The accuracy of the generated genome sequences was confirmed by comparison with previously determined genotypes from Illumina OmniQuad arrays, which showed over 99% concordance for all individuals.
Genetic risk assessment based on common variants
Genetic risk predictions for various diseases were generated using our VARIMED (Variants Informing Medicine) database of complex disease associations, and our previously reported pipeline for combining odds ratios (ORs) of robustly associated single-nucleotide polymorphisms (SNPs) with diseases and traits. An individual's genetic risk for a disease was calculated as their 'post-test probability'. We first computed likelihood ratios (LRs) for each SNP as the ratio of the probability of the genotype in an affected person to that of an unaffected person. LRs for each locus were computed from each case-control study by dividing the genotype frequency in cases by the frequency in controls, weighted by the sample size of the study. Thus, for an individual with genotype g in SNP x found in i = 1, ..., s studies, each of size S(i), the LR is computed as:
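The displayed equation is not reproduced in this copy. One form consistent with the verbal description above, assuming weights directly proportional to S(i) and normalization by the total sample size (both assumptions, not confirmed by the source), would be:

```latex
\[
LR_{x,g} \;=\;
\frac{\displaystyle\sum_{i=1}^{s} S(i)\,
      \frac{P(g \mid \text{case})_i}{P(g \mid \text{control})_i}}
     {\displaystyle\sum_{i=1}^{s} S(i)}
\]
```

Here P(g | case)_i and P(g | control)_i denote the frequency of genotype g among cases and controls in study i.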
Only loci that have been found in GWAS in individuals of European ancestry in at least one study with P < 1 × 10⁻⁶ were used in estimating LR, using only the most significant site in a haplotype block (r² ≥ 0.8). Next, all LRs were combined with pre-test probabilities, namely the baseline lifetime risk for disease, to estimate the post-test probability. Sex-appropriate pre-test probabilities estimated from published reports were used to estimate the post-test probabilities by converting them to pre-test odds using the formula pre-test probability/(1 − pre-test probability), multiplying by the genotypic LR to give the post-test odds, and then converting these to post-test probability by dividing the post-test odds by (1 + post-test odds).
For example, for n SNPs, the post-test odds are given by:
\[ \text{post-test odds} = \text{pre-test odds} \times LR_1 \times LR_2 \times \cdots \times LR_n \]
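The conversion described above (probability to odds, multiplication by each SNP's LR, and back to probability) can be sketched as follows. This is a minimal illustration with hypothetical function names and made-up numbers, not the VARIMED pipeline itself; it treats the per-SNP LRs as independent, as the multiplication of LRs implies.

```python
from math import prod


def prob_to_odds(p):
    """Convert a probability to odds: p / (1 - p)."""
    return p / (1.0 - p)


def odds_to_prob(odds):
    """Convert odds back to a probability: odds / (1 + odds)."""
    return odds / (1.0 + odds)


def post_test_probability(pre_test_prob, likelihood_ratios):
    """Combine a baseline risk with per-SNP likelihood ratios.

    The LRs are multiplied together, which assumes the SNPs
    contribute independently.
    """
    post_odds = prob_to_odds(pre_test_prob) * prod(likelihood_ratios)
    return odds_to_prob(post_odds)


# Made-up example: 20% lifetime risk and three SNPs with LRs 2.0, 0.5, 2.0.
# Pre-test odds are 0.25, the combined LR is 2.0, so post-test odds are 0.5.
risk = post_test_probability(0.20, [2.0, 0.5, 2.0])
```

With no informative SNPs (an empty list of LRs) the function returns the baseline risk unchanged.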
Clinical risk assessment
Each person was classified into one of five levels of risk (very low, low, intermediate, high, or very high) for a disease/trait according to whether they were in the upper or lower one or two standard deviation unit bins of measured clinical markers for that disease/trait in the entire CHDWB cohort (Figure 1). Clinical risk was assessed in eight major disease categories: immunological, metabolic, cardiovascular, musculoskeletal, respiratory, cognitive, psychiatric and oncological. A list of diseases and the measured clinical attributes considered in each category is provided (see Additional file 4), but note that this list will vary depending on the range of clinical attributes that are available in any study or clinic. The strategy is to average multiple clinical measures so as to place individuals in five bins for each category. We are currently evaluating computational strategies for combining the scores in a weighted manner that also accounts for co-variance, but in this study we used the simpler strategy of taking the unweighted average of the contributing z-scores. Additional categories might include organ failure and reproductive health, but we did not have data pertaining to these at this time.
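The binning strategy can be illustrated with a short sketch. The cut points at ±1 and ±2 standard deviations are one reading of 'upper or lower one or two standard deviation unit bins' and should be treated as an assumption, as should the function names and example values; the z-scores are assumed to be oriented so that higher values mean higher risk.

```python
from statistics import mean


def risk_level(z):
    """Map a z-score to one of five risk levels.

    Cut points at -2, -1, +1 and +2 SD are an assumption of this sketch.
    """
    if z < -2.0:
        return "very low"
    if z < -1.0:
        return "low"
    if z <= 1.0:
        return "intermediate"
    if z <= 2.0:
        return "high"
    return "very high"


def domain_risk(z_scores):
    """Unweighted average of the contributing z-scores, then binned."""
    return risk_level(mean(z_scores))


# Made-up z-scores for three markers in one disease category.
level = domain_risk([1.5, 2.0, 0.4])  # mean is 1.3
```

A weighted combination that accounts for co-variance among markers would replace the plain mean in `domain_risk`.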
Integration of genetic and clinical data
For joint clinical and genetic risk assessment, we describe two exploratory approaches.
The first approach directly matches GWAS results with individual diseases. This approach has two limitations: for some diseases (such as asthma and cancer) there are no appropriate clinical biomarkers in our cohort, and some biomarkers are precisely the endophenotype of the disease or trait investigated in GWAS (for example, triglycerides for hypertriglyceridemia or body mass index (BMI) for obesity), so in a sense the two are redundant. Nevertheless, we proceeded to use the strategy shown in Figure 2 for five inter-related and prevalent conditions: CAD, T2D, hypertension, obesity, and hypertriglyceridemia. There are two analytical issues, namely assessing each person's relative risk, and adjusting the post-test probability based on that risk. For CAD, T2D, and hypertension we computed FRS for each person at each visit across the entire CHDWB database. These scores were averaged over the first three visits to generate an individual's average FRS, which was divided by the sample mean to generate the LR. We used the relationship that post-test probability = pre-test probability × LR/(1 + (pre-test probability × (LR − 1))) to generate an adjusted baseline, which can then be modified by genotypic risks. For obesity and hypertriglyceridemia, the intention is to show an individual whether their risk is due to a combination of both clinical and genetic factors. For instance, individuals with incident obesity or hypertriglyceridemia have the disease, but it is nevertheless possible to report risk factors of less than 100%. We proceeded by noting that the relative environmental and genetic contributions are reflected in the heritability, which can thus be used to scale the clinical contribution as a proxy for the environment. Estimates vary in the literature, but here we assumed heritability of 50% for obesity and 30% for hypertriglyceridemia. Each individual's z-score was computed and the LR for individuals with the same clinical z-score was identified.
The pre-test probability was multiplied by 2 × h² × LR (namely the LR for obesity or 60% of the LR for hypertriglyceridemia), providing a newly scaled pre-test probability that then seeded the genotypic adjustment.
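The two baseline adjustments described above can be written out as a minimal sketch. Function names and numbers are hypothetical. The formulas follow the text: an FRS-based LR applied through post-test probability = pre × LR/(1 + pre × (LR − 1)), and a heritability-scaled baseline of pre × 2 × h² × LR. No upper bound is applied to the scaled baseline, because the text does not describe one.

```python
def frs_likelihood_ratio(individual_mean_frs, cohort_mean_frs):
    """LR from an individual's visit-averaged FRS relative to the cohort mean."""
    return individual_mean_frs / cohort_mean_frs


def adjust_baseline(pre_test_prob, lr):
    """Adjusted baseline: pre * LR / (1 + pre * (LR - 1)).

    Algebraically the same as converting to odds, multiplying by LR,
    and converting back to a probability.
    """
    return pre_test_prob * lr / (1.0 + pre_test_prob * (lr - 1.0))


def heritability_scaled_baseline(pre_test_prob, h2, clinical_lr):
    """Scaled baseline: pre * 2 * h2 * LR.

    With h2 = 0.5 the multiplier is the LR itself; with h2 = 0.3 it is
    60% of the LR. The result is not capped at 1 in this sketch.
    """
    return pre_test_prob * 2.0 * h2 * clinical_lr


# Made-up numbers for illustration only.
lr = frs_likelihood_ratio(0.12, 0.08)                 # about 1.5
cad_baseline = adjust_baseline(0.20, 2.0)             # 0.4 / 1.2
obesity_baseline = heritability_scaled_baseline(0.30, 0.5, 1.5)
tg_baseline = heritability_scaled_baseline(0.10, 0.3, 2.0)
```

Either adjusted baseline would then take the place of the population pre-test probability before the genotypic LRs are applied.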
The second approach combined multiple clinical and genetic measures in order to generate an overall portrait of risk in the eight major disease categories mentioned above (Figure 3). The z-scores for clinical parameters were adjusted with respect to risk predisposition: for traits that are known to confer risk at a lower level (such as HDL-C, hyperemia) we plotted the additive inverse. Similarly, each participant's genetic risk score was ranked according to percentiles into five categories. The 'gridiron plot' (Figure 3C) then showed the relationship between estimates, and allowed an individual to immediately see for which classes of disease they have an increased, reduced, or discordant risk. They could then consult individual clinical and genetic measures (Figure 3A, B) to discover exactly which attributes they may consider in developing a health action plan.
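A sketch of the two preprocessing steps behind such a plot is given below: flipping the sign of z-scores for traits where lower values confer risk, and ranking a genetic risk score into five percentile categories. The quintile boundaries, the concordance labels, and all names are assumptions made for illustration; the text does not specify them.

```python
def orient(z, lower_is_riskier):
    """Return the additive inverse for traits where low values confer risk."""
    return -z if lower_is_riskier else z


def percentile_category(value, cohort_values):
    """Rank a value into categories 1 (lowest) to 5 (highest).

    Uses the share of cohort values strictly below `value` and equal-width
    20-percentile bands, which is an assumption of this sketch.
    """
    below = sum(v < value for v in cohort_values)
    pct = 100.0 * below / len(cohort_values)
    return min(int(pct // 20) + 1, 5)


def gridiron_cell(clinical_cat, genetic_cat):
    """Label one domain from its clinical and genetic categories (1 to 5)."""
    if clinical_cat >= 4 and genetic_cat >= 4:
        return "concordant high"
    if clinical_cat <= 2 and genetic_cat <= 2:
        return "concordant low"
    if (clinical_cat >= 4 and genetic_cat <= 2) or (clinical_cat <= 2 and genetic_cat >= 4):
        return "discordant"
    return "intermediate"


# Made-up cohort of ten genetic risk scores.
cohort = [float(i) for i in range(10)]
top = percentile_category(9.0, cohort)      # 90th percentile band
middle = percentile_category(5.0, cohort)   # 50th percentile band
flipped = orient(1.2, lower_is_riskier=True)
```

The labels returned by `gridiron_cell` are only one way to summarize agreement; the plot itself conveys the same information graphically.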
Clinical and genomic measures of eight participants in the CHDWB
In general, clinical parameters show much higher inter-individual than intra-individual variance, shown in Figure 1 by the progression of values over time (left to right) for four traits in each of the eight individuals in the study. Trends were in a hopeful direction for several of the participants. For example, BMI dropped consistently for the most overweight individual (CHD-3, Figure 1A). Occasionally, one or more individuals showed hypervariable measurements over time (triglyceride levels for CHD-8; Figure 1B), but for the purposes of clinical risk evaluation, average values reflect the rank order of individuals, and are sufficient to place individuals in broad categories, from very low (light blue shading) to very high (pink).
The numbers of SNPs, indels, and copy number variants (CNVs) or structural polymorphisms called by the Illumina CASAVA algorithm, as well as the numbers of homozygous coding variants and predicted splice site variants, are summarized in Table 1. On average, each individual had almost 3.7 million SNPs, 623,000 indels, and 4,100 CNVs, consistent with published estimates of polymorphism from WGS [28, 29], which are also summarized in the table.
Genotypic risk prediction
Genotypic risk assessments were generated for each of the participants, and are presented as 'risk-o-gram' plots (see Additional file 5). Baseline pre-test odds were simply taken from published epidemiological data on gender, age, and ethnicity-adjusted disease prevalence. The number of SNPs per disease or trait ranged from 1 to 66 (for Crohn's disease), with an average of 9 and median of 4, where 25 conditions were evaluated from at least 10 SNPs. Representative short risk-o-grams are shown for only the common diseases for two individuals on the linear scale (Figure 2).
Because not all the individuals within a population of the same sex, age, and with the same environment have the same overall risk, we propose that pre-test odds can be conditioned on an individual's clinical profiles. These partially capture the lifetime of individual-specific genetic and environmental factors that have shaped that individual's health status. An intuitive way to perform this conditioning is to modify the baseline risk using a multiplier that is a function of the heritability, along with each individual's z-scores for relevant clinical parameters that are related to the specific disease.
A simple implementation of the approach resulted in the further modified risk-o-grams (Figure 2C,D). In this case, there was no change in relative genotypic risk for a given disease, but the absolute risk predictions were in some cases modified substantially, and as a result of this, the rank order of diseases changed. For example, the hypertension and hypertriglyceridemia profiles were reversed for individual CHD-5 on the left, because the clinical data imply low blood pressure but high serum triglycerides, despite the currently understood genetic risk. Triglyceride levels were even higher than expected for CHD-8 on the right, which contributed to a higher T2D FRS, but other factors have moderated this individual's CAD risk. Our representations (Figure 2) are provided as illustrations of the principle of how genotypic risks can be adjusted according to clinical status, and should be interpreted in light of the numerous methodological challenges that need to be addressed. Further research is required to refine the implementation, and to establish whether or not this will or should affect an individual's health behavior choices and/or future clinical outcomes.
Comparison of clinical and genotypic risk assessments
An alternative to computing overall disease probabilities by combining both genotypic and clinical LRs is to report both evaluations in a simple graphical manner (Figure 3). Commercial providers of personal genome services currently present results in browsable disease-by-disease or gene-by-gene formats, which do not lend themselves to data integration, and arguably either overwhelm individuals or focus attention on a few key results. By contrast, our proposal is to present participants with summaries of their risk factors and biological indicators across different aspects of wellness. We developed this notion in the form of a gridiron plot of risk (Figure 3C) assessed in the aforementioned eight domains of health along the x-axis, from both clinical (y-axis) and genotypic (z-axis) data. These eight proposed domains reflect concerns that most relatively healthy middle-aged people have about joint and back pain, body weight, infections, and irritability or sleep deficits, few of which are directly measured by GWAS, yet can conceivably be related to GWAS disease results. These domains correspond to established categories used to classify disease, such as those in the International Classification of Diseases (ICD), but a more refined classification might be based on the human disease network built around shared genetic etiology. Additional domains can be considered, such as reproductive health, eyesight and hearing, and renal health. These domains are most likely to be of concern for subscribers to wellness and/or preventative care programs, such as the CHDWB, in which health maintenance and disease prevention take medical priority.
One individual, CHD-5 (Figure 3C), is a very fit Caucasian woman in her 50s. There was no indication of great concern for the musculoskeletal, respiratory, cognitive, psychiatric and oncological domains, but interesting findings were suggested by the first three domains to the left of the gridiron. Her metabolic profiles showed general concordance, as her low BMI and low percentage of body fat were matched by her genotypic risk scores for metabolic disease and T2D in the low to moderate range. There was also clear concordance between her clinical and genotypic profiles in the cardiovascular disease (CVD) domain, as she was shown to have moderately high risks of CAD, myocardial infarction, and stroke (each approximately two-fold ORs combined from a total of 80 SNPs), and her standard measures of cardiovascular health (augmentation index, a measure of arterial stiffness, and hyperemia, a measure of peripheral oxygenation) both placed her in the high-risk category, contrary to the Framingham risk assessment that generated her revised estimate in Figure 2 (black dot). In the immunological domain, she had few indicators of immune impairment other than self-reported intestinal complaint, so it is interesting to note her genotypic risk for ulcerative colitis was relatively high, whereas that for Crohn's disease was low. The utility of the gridiron plot is that it is designed to help the individual pay attention to specific aspects of their health. A drawback of the necessarily superficial summarization is that conflicting genotypic (or clinical) assessments within a domain can cancel each other out. More detailed representation of how the data types can be combined and presented in each domain, across multiple individuals, is shown (Figure 3A, B).
Perusal of Table 2 indicates many examples where common variant risk evaluation was concordant with clinical data (that is, the two types of risk are in the same direction), with an approximate 2:1 ratio of concordance to discordance. For example, the four individuals with consistently low weight and BMI measures (CHD-1, CHD-4, CHD-5, and CHD-7) were all found to have a genotypic risk for obesity that was below the population average (see also Figure 3A), whereas the overweight individual CHD-3 had a slightly increased genetic risk for obesity. For triglycerides, the evaluation was split, with two of the three individuals with very high triglyceride levels (CHD-6 and CHD-8) also showing a greatly increased genetic risk, whereas CHD-3 showed a reduced genetic risk. There were no diabetic subjects in the sample, and the two individuals with very high T2D risk had normal fasting glucose and insulin levels at the time of assessment. For CAD (Figure 3B), the three individuals with the highest genotypic risk had variable clinical profiles: their FRS for CAD were not particularly high, and two of them were at the opposite ends of the augmentation index and hyperemia score ranges (CHD-5 being at high risk, and CHD-7 at low risk, by both criteria), while the third had intermediate clinical CAD-related scores. The results for hypertension were less concordant, possibly owing to the small number of variants considered, but it is noteworthy that the individual with very high blood pressure (CHD-3) had no obvious genetic risk. This may be a case where this individual's lifestyle is a major component of her risk, and notably her systolic blood pressure has dropped consistently over the first 2 years that she has been in the CHDWB program (Figure 1C).
Note that these analyses only include highly significant genotypes from GWAS that have been independently replicated, thus they capture a minor proportion of the suspected genetic variance, and consequently there is not a strong expectation at this stage of genomic medicine that the relatively small genotypic samples should be predictive [31, 32].
Individual CHD-1 is a woman in her 50s who might be described as 'super-fit' with a body fat percentage under 20% and one of the highest maximal oxygen consumption (VO2max) estimates in the entire cohort. Her only current health concern at the time of assessment was recurrent bladder or kidney infections for which she takes medications, but nothing in her genetic profile pointed to this condition. She had increased common variant predictions for T2D and age-related macular degeneration, and a slight increase in CAD-related traits. She is homozygous for the ApoE2 genotype (present in <1% of Caucasians), which is protective against late-onset Alzheimer's disease, but leads to type III hyperlipoproteinemia in 2% of cases, thereby increasing risk of atherosclerosis.
Individual CHD-2 is a man in his mid-40s, also with a very high fitness level, although his low-density lipoprotein cholesterol (LDL-C) levels are toward the upper end of the range typically observed in healthy people. He had some digestive concerns, and an increased T2D prediction. While depression was suggested by his genotype (although very little of the variance in the population for depression is explained by common variants at this time) and his siblings are affected by depression, his mental health score shows no sign of depression or anxiety.
Individual CHD-3 is a woman in her 60s who has high blood pressure and cholesterol, is classified as obese, and has had cancer. Her risk-o-gram was concordant with obesity, CAD, and stroke, and she is also apparently at increased risk for asthma, Parkinson's disease, Alzheimer's disease, and Paget's bone disease, among others. She appears to be someone for whom genomic medicine alongside longitudinal clinical profiling could have important implications for health maintenance.
Individual CHD-4 is another very fit woman in her 40s, whose primary health concern is allergies, which run in three generations of her family. Like many in the study, she takes supplements for heart and bone health, which may offer some protection against her increased risk of CAD and Paget's disease from common variants. She has a family history of cancer, and increased breast cancer risk was indicated genetically, so careful surveillance may be advisable.
Individual CHD-5 is a woman in her late 50s, whose profiles are highlighted in Figure 3. We observed strong concordance of cardiovascular genetic and clinical risk, as well as a history of intestinal issues that would be consistent with a genotypic liability to ulcerative colitis. However, her genetically increased T2D risk is not indicated by her excellent fitness, low BMI, and normal clinical indicators of diabetes. Two rare variants (not shown) suggest visual impairment and color vision deficiency, but there is no indication that these are issues for this woman.
Individual CHD-6 is a relatively young man in excellent health except for triglyceride levels at the high extreme for the entire cohort, which is consistent with the very strong genetic prediction of hypertriglyceridemia from his common variants.
Individual CHD-7 is a man in his early 60s whose most distinguishing clinical feature was that he had the lowest triglyceride levels in the cohort, and he also had remarkably low signs of inflammation, in that his interleukin (IL)-6, IL-8, and tumor necrosis factor-α levels were all in the bottom few percent of the CHDWB sample, and his neutrophil-to-lymphocyte ratio was also low. Deep analysis of his genome may be revealing with respect to the mechanisms responsible for his low level of inflammation.
Individual CHD-8 is a man in his 40s who was discordant for multiple indicators of heart disease, including high diastolic blood pressure, arterial stiffness, and serum lipids, combined with a high FRS for CAD. These were only mildly indicated by common variant evaluation, but rare homozygous variants have been linked to cardiomyopathy and to carotid stenosis or thromboembolism. Alcoholism was reported in his family, so this person certainly is a good candidate for careful clinical and possibly genetic consideration in development of his health behaviors.
It is inevitable that genome sequence information will be incorporated into individualized medical care over the next couple of decades, but just how it will be utilized remains to be seen. A spectrum of applications ranging from explanation of rare conditions at or before birth to enhancement of medical interventions is likely, and to some extent, genome-wide data may be used to predict and potentially help prevent early onset of chronic disease. Clinicians already utilize family history and clinical information for disease prognosis and diagnosis in a similar manner, while recognizing that these are also not individually definitive indicators of the likelihood of disease progression. Family history and polygenic genotype scores from SNPs identified by GWAS have similar predictive ability for common diseases, but genotypes already outperform family history for many rare conditions.
The holistic 'total evidence' approach to integration of clinical and genetic factors in medical evaluation will surely see dramatic improvements in the near future, and will be advanced by developments in several aspects of genomic risk assessment. First, there is ample room for improvement of baseline risk assessment. In Figure 2, we proposed that adjustment of the population prevalence by clinical status has the potential to directly integrate genetic and clinical risk prediction. We emphasize that more research needs to be carried out before this strategy can be considered to be robust, and that medical utility remains to be demonstrated. Note that the OR approach to computation of genetic risk is just one of several methods that could be used. It improves on simple allelic sum measures through the incorporation of allele frequency and effect sizes in the computation, but does not account for epistatic or genotype × environment interactions, and it is not yet clear how well it captures the actual distribution of genetic liability for common disease. In addition, there are important theoretical issues surrounding the computation of genetic risk [35, 36], particularly in populations of mixed ancestry. Most importantly, GWAS have as yet discovered only a minority of the variants that contribute to any given disease, in most cases explaining no more than 15% of the risk. This amount of explained variance does not translate into significant risk prediction, for example by receiver operating characteristic analysis, although there are reasons to believe that it does classify individuals who are toward the tails of the distribution. Even in the presence of complete knowledge of the genetic contributions, risk prediction is limited to the square root of the heritability, but we emphasize that none of the scores available to date approach this limit.
As the sample sizes of GWAS continue to increase to hundreds of thousands of cases for more common diseases, expanding discovery from dozens to hundreds of loci, genotypic risk assessment will certainly improve.
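To make the OR-based computation discussed above more concrete, the following is a minimal sketch, not the pipeline used in this study. It assumes independent SNPs, a multiplicative per-allele model, and Hardy-Weinberg genotype proportions, and it treats the odds ratio as acting on the population-average odds. The names `Snp`, `genotype_relative_odds`, and `polygenic_risk`, and all numeric values, are hypothetical. Conditioning on clinical status, as proposed for Figure 2, would correspond here simply to passing a different baseline prevalence.

```python
import math
from typing import NamedTuple, Sequence


class Snp(NamedTuple):
    """One GWAS variant for one person (all values are illustrative)."""
    risk_allele_freq: float   # population frequency of the risk allele
    odds_ratio: float         # per-allele odds ratio from GWAS
    risk_allele_count: int    # 0, 1 or 2 copies carried by the individual


def genotype_relative_odds(snp: Snp) -> float:
    """Odds of the observed genotype relative to the population average.

    Assumes Hardy-Weinberg proportions and a multiplicative model, so the
    population-average odds multiplier is (q + p * OR) ** 2.
    """
    p = snp.risk_allele_freq
    q = 1.0 - p
    population_mean = (q + p * snp.odds_ratio) ** 2
    return snp.odds_ratio ** snp.risk_allele_count / population_mean


def polygenic_risk(baseline_prevalence: float, snps: Sequence[Snp]) -> float:
    """Combine a baseline prevalence with per-SNP relative odds.

    Works on the log-odds scale and treats SNPs as independent; epistasis
    and genotype x environment interactions are ignored.
    """
    log_odds = math.log(baseline_prevalence / (1.0 - baseline_prevalence))
    for snp in snps:
        log_odds += math.log(genotype_relative_odds(snp))
    odds = math.exp(log_odds)
    return odds / (1.0 + odds)


if __name__ == "__main__":
    # Hypothetical genotypes; not taken from any participant in the study.
    profile = [Snp(0.30, 1.20, 2), Snp(0.10, 1.35, 1), Snp(0.45, 1.10, 0)]
    # Unconditioned population prevalence versus a prevalence adjusted
    # upward for an (assumed) adverse clinical status.
    print(polygenic_risk(0.10, profile))
    print(polygenic_risk(0.25, profile))
```

Unlike a simple count of risk alleles, this formulation weights each variant by its effect size and centers it on the population average through the allele frequency, so a person with an average genotype stays at the baseline prevalence.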
This study was conceived as a pilot investigation of how WGS may be utilized in the context of health maintenance. Participants in the CHDWB interact regularly with a health partner who helps them to interpret their clinical profiles in the context of their own medical issues, and to develop a health action plan. These typically focus on diet, exercise, and stress reduction, but can include specific attention to issues such as low bone density, high blood pressure, or loneliness, and/or lead to physician referral for indications of previously unrecognized heart or metabolic disease. Across the full cohort of almost 700 participants, there are encouraging trends toward improved wellness [23, 24], and this is clear for some of the individuals reported here, in terms of significant reduction in BMI and inflammation. It will, however, take prospective and longitudinal studies to evaluate whether wellness genomic profiling is beneficial either to individuals (in terms of maintained wellness) or as a matter of public policy (reduced healthcare costs, improved employee performance).
We do not currently have IRB approval to share the genomic profiles with the participants, so we cannot yet evaluate how self-knowledge of gene sequences might also affect health behavior. Instead, we propose a strategy for presenting the diverse data types in a manner that we suspect will help individuals see connections between their genetics, their clinical profiles, and their own health perception. In the short term, the utility of the approach is more likely to be measurable in terms of modification of health behaviors than in economic or life-long health benefits. We show a modified version of Figure 3 (see Additional file 6) that captures the type of data that may be most influential for a hypothetical individual, where an overall view of the health domains is associated with an individual's genotypic and clinical scores relative to the population, along with a list of rare variants of interest. A recent study suggests that clinical geneticists are reluctant to report incidental findings on genetic mutations to patients unless the mutation is known to be pathogenic. However, because the expectations are different in the context of wellness, where subjects actively seek data, we envision that a physician, genetic counselor, or health partner would discuss the summary and appropriate specific details of the evidence with the individual, who would thus be empowered to consider whether they should act upon the genetic evidence.
Clearly, our ability to integrate genomics into health maintenance will improve with experience and the incorporation of more data, including environmental exposures and behaviors. Family history of disease and presence of rare deleterious variants are two obvious types of information that will be highly relevant: risk assessment relative to other family members who have been genotyped will allow a person to evaluate where they stand, given the known sources of variance in the family [16, 35], and de novo mutations may sometimes explain specific discontinuity between clinical and common variant assessments. We are also gathering data on the metabolome, transcriptome, and epigenome for these eight individuals, and expect these functional genomic data types to provide complementary information that we will evaluate later.
This pilot study of eight individuals from the CHDWB proposes two approaches for combining and conditioning clinical and genetic profiles, which could facilitate longitudinal evaluation of wellness-focused medical care based on comprehensive self-knowledge of medical risks. The study shows an excess of concordance between genetic prediction and observed sub-clinical disease. Further, we illustrate how more holistic combination of genetic and clinical data can be achieved by visualizing risk in sub-classes of disease. The visualization of concordance and discordance in the genetic and clinical profiles might help develop personalized health action plans in consultation with a health partner. We acknowledge that the data presented here fall short of the gold standards of evidence of inference that are typically required in genetic analysis of causation, but argue that the objective of 'personalized genomics' is not necessarily to predict disease with any certainty, but rather to provide another line of evidence that physicians and other medical practitioners can consider in their interactions with patients. Ultimately, the utility of the approach described here will require prospective evaluation in a cohort of healthy adults followed longitudinally for decades. As the volume of personalized information increases, the issue of who will be responsible for interpreting and explaining the assessments to individuals becomes more acute, and suggests the need for training of a new class of genomic healthcare professional and development of novel ways to present the information.
BMI: Body mass index
CAD: Coronary artery disease
CASAVA: Consensus Assessment of Sequence And Variation
CHDWB: Center for Health Discovery and Well Being
CNV: Copy number variant
FRS: Framingham Risk Scores
GWAS: Genome-wide association studies
IRB: Institutional review board
T1D: Type 1 diabetes
T2D: Type 2 diabetes
VARIMED: Variants Informing Medicine
WGS: Whole genome sequencing.
Thompson R, Drew CJ, Thomas RH: Next generation sequencing in the clinical domain: clinical advantages, practical, and ethical challenges. Adv Protein Chem Struct Biol. 2012, 89: 27-63.
Bick D, Dimmock D: Whole exome and whole genome sequencing. Curr Opin Pediatr. 2011, 23: 594-600. 10.1097/MOP.0b013e32834b20ec.
O'Roak BJ, Vives L, Girirajan S, Karakoc E, Krumm N, Coe BP, Levy R, Ko A, Lee C, Smith JD, Turner EH, Stanaway IB, Vernot B, Malig M, Baker C, Reilly B, Akey JM, Borenstein E, Rieder MJ, Nickerson DA, Bernier R, Shendure J, Eichler EE: Sporadic autism exomes reveal a highly interconnected protein network of de novo mutations. Nature. 2012, 485: 246-250. 10.1038/nature10989.
Neale BM, Kou Y, Liu L, Ma'ayan A, Samocha KE, Sabo A, Lin CF, Stevens C, Wang LS, Makarov V, Polak P, Yoon S, Maguire J, Crawford EL, Campbell NG, Geller ET, Valladares O, Schafer C, Liu H, Zhao T, Cai G, Lihm J, Dannenfelser R, Jabado O, Peralta Z, Nagaswamy U, Muzny D, Reid JG, Newsham I, Wu Y, Lewis L, Han Y, Voight BF, Lim E, Rossin E, Kirby A, Flannick J, Fromer M, Shakir K, Fennell T, Garimella K, Banks E, Poplin R, Gabriel S, DePristo M, Wimbish JR, Boone BE, Levy SE, Betancur C, Sunyaev S, Boerwinkle E, Buxbaum JD, Cook EH, Devlin B, Gibbs RA, Roeder K, Schellenberg GD, Sutcliffe JS, Daly MJ: Patterns and rates of exonic de novo mutations in autism spectrum disorders. Nature. 2012, 485: 242-245. 10.1038/nature11011.
Sanders SJ, Murtha MT, Gupta AR, Murdoch JD, Raubeson MJ, Willsey AJ, Ercan-Sencicek AG, DiLullo NM, Parikshak NN, Stein JL, Walker MF, Ober GT, Teran NA, Song Y, El-Fishawy P, Murtha RC, Choi M, Overton JD, Bjornson RD, Carriero NJ, Meyer KA, Bilguvar K, Mane SM, Sestan N, Lifton RP, Günel M, Roeder K, Geschwind DH, Devlin B, State MW: De novo mutations revealed by whole-exome sequencing are strongly associated with autism. Nature. 2012, 485: 237-241. 10.1038/nature10945.
Chou J, Ohsumi TK, Geha RS: Use of whole exome and genome sequencing in the identification of genetic causes of primary immunodeficiencies. Curr Opin Allergy Clin Immunol. 2012, 12: 623-628. 10.1097/ACI.0b013e3283588ca6.
Ghosh S, Krux F, Binder V, Gombert M, Niehues T, Feyen O, Laws HJ, Borkhardt A: PID-NET: German Network on Primary Immunodeficiency Diseases: Array-based sequence capture and next-generation sequencing for the identification of primary immunodeficiencies. Scand J Immunol. 2012, 75: 350-354. 10.1111/j.1365-3083.2011.02658.x.
Need AC, Shashi V, Hitomi Y, Schoch K, Shianna KV, McDonald MT, Meisler MH, Goldstein DB: Clinical application of exome sequencing in undiagnosed genetic conditions. J Med Genet. 2012, 49: 353-361. 10.1136/jmedgenet-2012-100819.
Visscher PM, Brown MA, McCarthy MI, Yang J: Five years of GWAS discovery. Am J Hum Genet. 2012, 90: 7-24. 10.1016/j.ajhg.2011.11.029.
Ku CS, Loy EY, Pawitan Y, Chia KS: The pursuit of genome-wide association studies: where are we now?. J Hum Genet. 2010, 55: 195-206. 10.1038/jhg.2010.19.
Meigs JB, Shrader P, Sullivan LM, McAteer JB, Fox CS, Dupuis J, Manning AK, Florez JC, Wilson PW, D'Agostino RB, Cupples LA: Genotype score in addition to common risk factors for prediction of type 2 diabetes. N Engl J Med. 2008, 359: 2208-2219. 10.1056/NEJMoa0804742.
Lyssenko V, Jonsson A, Almgren P, Pulizzi N, Isomaa B, Tuomi T, Berglund G, Altshuler D, Nilsson P, Groop L: Clinical risk factors, DNA variants, and the development of type 2 diabetes. N Engl J Med. 2008, 359: 2220-2232. 10.1056/NEJMoa0801869.
Feero WG, Guttmacher AE, Collins FS: Genomic medicine - an updated primer. N Engl J Med. 2010, 362: 2001-2011. 10.1056/NEJMra0907175.
Chen R, Butte AJ: The reference human genome demonstrates high risk of type 1 diabetes and other disorders. Pac Symp Biocomput. 2011, 231-242.
Ashley EA, Butte AJ, Wheeler MT, Chen R, Klein TE, Dewey FE, Dudley JT, Ormond KE, Pavlovic A, Morgan AA, Pushkarev D, Neff NF, Hudgins L, Gong L, Hodges LM, Berlin DS, Thorn CF, Sangkuhl K, Hebert JM, Woon M, Sagreiya H, Whaley R, Knowles JW, Chou MF, Thakuria JV, Rosenbaum AM, Zaranek AW, Church GM, Greely HT, Quake SR, Altman RB: Clinical assessment incorporating a personal genome. Lancet. 2010, 375: 1525-1535. 10.1016/S0140-6736(10)60452-7.
Dewey FE, Chen R, Cordero SP, Ormond KE, Caleshu C, Karczewski KJ, Whirl-Carrillo M, Wheeler MT, Dudley JT, Byrnes JK, Cornejo OE, Knowles JW, Woon M, Sangkuhl K, Gong L, Thorn CF, Hebert JM, Capriotti E, David SP, Pavlovic A, West A, Thakuria JV, Ball MP, Zaranek AW, Rehm HL, Church GM, West JS, Bustamante CD, Snyder M, Altman RB, Klein TE, Butte AJ, Ashley EA: Phased whole-genome genetic risk in a family quartet using a major allele reference sequence. PLoS Genet. 2011, 7: e1002280-10.1371/journal.pgen.1002280.
Chen R, Mias GI, Li-Pook-Than J, Jiang L, Lam HY, Chen R, Miriami E, Karczewski KJ, Hariharan M, Dewey FE, Cheng Y, Clark MJ, Im H, Habegger L, Balasubramanian S, O'Huallachain M, Dudley JT, Hillenmeyer S, Haraksingh R, Sharon D, Euskirchen G, Lacroute P, Bettinger K, Boyle AP, Kasowski M, Grubert F, Seki S, Garcia M, Whirl-Carrillo M, Gallardo M, Blasco MA, Greenberg PL, Snyder P, Klein TE, Altman RB, Butte AJ, Ashley EA, Gerstein M, Nadeau KC, Tang H, Snyder M: Personal omics profiling reveals dynamic molecular and medical phenotypes. Cell. 2012, 148: 1293-1307. 10.1016/j.cell.2012.02.009.
Gupta R, Ratan A, Rajesh C, Chen R, Kim HL, Burhans R, Miller W, Santhosh S, Davuluri RV, Butte A, Schuster SC, Seshagiri S, Thomas G: Sequencing and analysis of a South Asian-Indian personal genome. BMC Genomics. 2012, 13: 440-10.1186/1471-2164-13-440.
Chen R, Corona E, Sikora M, Dudley JT, Morgan AA, Moreno-Estrada A, Nilsen GB, Ruau D, Lincoln SE, Bustamante CD, Butte AJ: Type 2 diabetes risk alleles demonstrate extreme directional differentiation among human populations, compared to other diseases. PLoS Genet. 2012, 8: e1002621-10.1371/journal.pgen.1002621.
Goetzel RZ, Ozminkowski RJ: The health and cost benefits of work site health-promotion programs. Annu Rev Public Health. 2008, 29: 303-323. 10.1146/annurev.publhealth.29.020907.090930.
Omenn GS: Overview of the symposium on public health significance of genomics and eco-genetics. Annu Rev Public Health. 2010, 31: 1-8. 10.1146/annurev.publhealth.012809.103639.
Healthy People 2020 Framework: The vision, mission, and goals of Healthy People 2020. Accessed October 30, 2012, [http://www.healthypeople.gov]
Brigham KL: Predictive health: the imminent revolution in health care. J Am Geriatr Soc. 2010, 58 (Suppl 2): S298-302.
Rask KJ, Brigham KL, Johns MME: Integrating comparative effectiveness research programs into predictive health: A unique role for academic health centers. Acad Med. 2011, 86: 718-723. 10.1097/ACM.0b013e318217ea6c.
D'Agostino RB, Vasan RS, Pencina MJ, Wolf PA, Cobain M, Massaro JM, Kannel WB: General cardiovascular risk profile for use in primary care: the Framingham Heart Study. Circulation. 2008, 117: 743-753. 10.1161/CIRCULATIONAHA.107.699579.
Framingham Heart Study. Risk score profiles. [http://www.framinghamheartstudy.org/risk/index.html]
Morgan AA, Chen R, Butte AJ: Likelihood ratios for genome medicine. Genome Med. 2010, 2: 30-10.1186/gm151.
Pelak K, Shianna KV, Ge D, Maia JM, Zhu M, Smith JP, Cirulli ET, Fellay J, Dickson SP, Gumbs CE, Heinzen EL, Need AC, Ruzzo EK, Singh A, Campbell CR, Hong LK, Lornsen KA, McKenzie AM, Sobreira NL, Hoover-Fong JE, Milner JD, Ottman R, Haynes BF, Goedert JJ, Goldstein DB: The characterization of twenty sequenced human genomes. PLoS Genet. 2010, 6: e1001111-10.1371/journal.pgen.1001111.
Tennessen JA, Bigham AW, O'Connor TD, Fu W, Kenny EE, Gravel S, McGee S, Do R, Liu X, Jun G, Kang HM, Jordan D, Leal SM, Gabriel S, Rieder MJ, Abecasis G, Altshuler D, Nickerson DA, Boerwinkle E, Sunyaev S, Bustamante CD, Bamshad MJ, Akey JM, Broad GO, Seattle GO, NHLBI Exome Sequencing Project: Evolution and functional impact of rare coding variation from deep sequencing of human exomes. Science. 2012, 337: 64-69. 10.1126/science.1219240.
Goh KI, Cusick ME, Valle D, Childs B, Vidal M, Barabási AL: The human disease network. Proc Natl Acad Sci USA. 2007, 104: 8685-8690. 10.1073/pnas.0701361104.
Wray NR, Goddard ME, Visscher PM: Prediction of individual genetic risk of complex disease. Curr Opin Genet Dev. 2008, 18: 257-263. 10.1016/j.gde.2008.07.006.
Zhao J, Arafat D, Gibson G: Genetic risk prediction in a small cohort of healthy adults in Atlanta. Genet Res (Camb). 2013, 95: 30-37. 10.1017/S0016672313000025.
Breslow JL, Zannis VI, SanGiacomo TR, Third JL, Tracy T, Glueck CJ: Studies of familial type III hyperlipoproteinemia using as a genetic marker the apoE phenotype E2/2. J Lipid Res. 1982, 23: 1224-1235.
Do CB, Hinds DA, Francke U, Eriksson N: Comparison of family history and SNPs for predicting risk of complex disease. PLoS Genet. 2012, 8: e1002973-10.1371/journal.pgen.1002973.
Evans DM, Visscher PM, Wray NR: Harnessing the information contained within genome-wide association studies to improve individual prediction of complex disease risk. Hum Mol Genet. 2009, 18: 3525-3531. 10.1093/hmg/ddp295.
Kruppa J, Ziegler A, König IR: Risk estimation and risk prediction using machine-learning methods. Hum Genet. 2012, 131: 1639-1654. 10.1007/s00439-012-1194-y.
Wray NR, Yang J, Goddard ME, Visscher PM: The genetic interpretation of area under the ROC curve in genomic profiling. PLoS Genet. 2010, 6: e1000864-10.1371/journal.pgen.1000864.
Lango Allen H, Estrada K, Lettre G, Berndt SI, Weedon MN, Rivadeneira F, Willer CJ, Jackson AU, Vedantam S, Raychaudhuri S, Ferreira T, Wood AR, Weyant RJ, Segrè AV, Speliotes EK, Wheeler E, Soranzo N, Park JH, Yang J, Gudbjartsson D, Heard-Costa NL, Randall JC, Qi L, Vernon Smith A, Mägi R, Pastinen T, Liang L, Heid IM, Luan J, Thorleifsson G, Winkler TW, et al: Hundreds of variants clustered in genomic loci and biological pathways affect human height. Nature. 2010, 467: 832-838. 10.1038/nature09410.
Green RC, Berg JS, Berry GT, Biesecker LG, Dimmock DP, Evans JP, Grody WW, Hegde MR, Kalia S, Korf BR, Krantz I, McGuire AL, Miller DT, Murray MF, Nussbaum RL, Plon SE, Rehm HL, Jacob HJ: Exploring concordance and discordance for return of incidental findings from clinical sequencing. Genet Med. 2012, 14: 405-410. 10.1038/gim.2012.21.
Shen H, Li J, Zhang J, Xu C, Jiang Y, Wu Z, Zhao F, Liao L, Chen J, Lin Y, Tian Q, Papasian CJ, Deng HW: Comprehensive characterization of human genome variation by high coverage whole-genome sequencing of forty four Caucasians. PLoS One. 2013, 8: e59494-10.1371/journal.pone.0059494.
We particularly thank the staff and participants of the Center for Health Discovery and Well Being, which is supported by Emory University and the Georgia Institute of Technology. This work was funded by start-up funds from the Georgia Tech Research Foundation to GG, whose laboratory generated the genomic sequence data. CP is supported by a post-doctoral fellowship from the Stanford Prevention Research Center (NHLBI T32 HL007034). AJB is supported by the Lucile Packard Foundation for Children's Health. RC is an employee of Personalis, Inc., a genetic testing company. AM is a consultant to Personalis Inc. AB is a founder and consultant to Personalis, Inc. Stanford University holds the intellectual property on any genotypic risk assessment technologies described in the paper that may be licensed to various companies.
AB is a founder and consultant to Personalis, Inc., a genetic testing company. RC is an employee of, and AM is a consultant to, Personalis, Inc. The remaining authors declare that they have no competing interests. Stanford University holds the intellectual property on genotypic risk assessment technologies described in the manuscript that may be licensed to various companies, but the data visualization strategies presented in Figure 3 and Additional File 6 are not intellectual property of Personalis, Inc. The CHDWB also offers fee-for-service clinical assessment and health partner evaluation for a small number of individuals not included in this study.
CJP, AS, and RT performed all of the genome analysis and risk-assessment computations, and wrote the paper with GG. TP analyzed the potential function of amino acid mutations. JZ provided support for genome sequence feature extraction. DA prepared the DNA samples for sequencing. RC and AM provided software, databases, and advice for risk assessment. GM and KB direct the CHDWB, and performed clinical assessments. AJB, CJP, and GG conceived the study approach and supervised all analyses. All authors read and approved the final manuscript.
Chirag J Patel, Ambily Sivadas, Atul J Butte and Greg Gibson contributed equally to this work.