Personalized prediction of circadian time from gene expression in human blood

Measuring the body’s response to treatments for circadian dysfunction, as well as optimizing the daily timing of treatments for other health conditions, requires a method for accurately tracking the state of a person’s circadian clock. We used a recently developed method called ZeitZeiger to predict circadian time (CT, time of day according to the circadian clock) from genome­wide gene expression in human blood. In 10­fold cross­validation on 498 samples from 60 individuals across three publicly available datasets, ZeitZeiger predicted CT with a median absolute error of 2.1 h. The predictor trained on all 498 samples used 15 genes, only two of which are part of the core circadian clock. We then extended ZeitZeiger to make predictions for groups of samples and to personalize predictions using samples from only the respective individual. Each of these strategies improved accuracy by ~20%. Our results are an important step towards precision circadian medicine.


Introduction
Much of human physiology, from sleep to immune function, has a daily rhythm (1) .Driving many of these rhythms is a cellautonomous molecular oscillator called the circadian clock, which is present in nearly every tissue in the body (2) .In animal models, disrupting the circadian system genetically or environmentally can have a wide range of phenotypic consequences (3)(4)(5) .In humans, circadian dysfunction is linked to a number of health conditions, including cancer (6) , major depressive disorder (7) , and obesity (8) .At least some of the circadian dysfunction in humans seems to be a result of multiple features of the modern environment, e.g., shift work and reduced exposure to sunlight (9,10) .Consequently, improving circadian function by photic, behavioral, or other means, which has been called chronomedicine, could have a large positive effect on human health (11) .
At the same time, increasing evidence suggests that circadian rhythms influence the efficacy and toxicity of therapeutics (12,13) as well as the metabolic consequences of food intake (14) .For example, over half of the 100 bestselling drugs in the U.S. target a protein whose mRNA in mice shows a circadian rhythm in at least one organ (15) .Using knowledge of the body's 1/28 circadian rhythms to optimize the timing of interventions has been called chronotherapy (16,17) .
Unfortunately, largescale implementation of chronotherapy may be more difficult than first thought, as multiple lines of evidence suggest that at any given time of day, different individuals' circadian rhythms are at different points in the cycle.For instance, circadian phase of entrainment (as measured by the Munich Chronotype Questionnaire) varies highly between individuals (18) , as well as with age and between dayworkers and shiftworkers (19) .Furthermore, recent work found that diurnal preference correlated with the circadian phase of clock gene expression in hair follicle cells (20) .These observations imply that the optimal timing of a given intervention may vary from one person to another.Thus, one critical component of both chronotherapy and chronomedicine, which together might be called precision circadian medicine, is a method to determine the state of a person's circadian clock(s), ideally in realtime.For chronotherapy, such a method would be an input; for chronomedicine, an output.One method for detecting the state of the clock in humans is to measure the sleepwake rhythm, either by questionnaires, sleep logs, or actigraphy (21) .Although measuring the sleepwake rhythm is noninvasive and has led to valuable insights (22,23) , sleepwake is also influenced by noncircadian processes and seems to have a complex relationship with the various clocks throughout the body (24) .
The other common method for assessing the state of the clock is to measure melatonin in either plasma or saliva.In particular, the dim light melatonin onset (DLMO) is the gold standard for circadian phase (25)(26)(27) .Determining DLMO, however, requires collecting many samples under controlled conditions over at least several hours, making it impractical for widespread use or for monitoring the circadian clock in real time.Furthermore, DLMO only reflects the phase of the central clock in the suprachiasmatic nucleus (which controls secretion of melatonin by the pineal gland), making it unable to report on clocks in other tissues.In an effort to address these limitations, Ueda et al. previously developed the moleculartimetable method and used it to estimate CT from metabolite levels in human blood (28)(29)(30) .
More recently, we developed a machine learning method called ZeitZeiger and used it to achieve stateoftheart prediction accuracy of CT in multiple mouse organs (31) .ZeitZeiger's prediction accuracy is enabled by three important features: fitting periodic behavior using splines instead of sinusoids, focusing on the variation in the data associated with the periodic variable of interest (e.g., time of day), and using regularization to prevent overfitting.
Here we applied ZeitZeiger to three publicly available datasets of genomewide circadian gene expression in human blood.We found that ZeitZeiger learned to use a small set of genes to accurately predict the CT of a single sample.This allowed ZeitZeiger to detect how circadian gene expression is affected by various perturbations to the lightdark and sleepwake cycles.We then investigated two ways to improve prediction accuracy: first, by using groups of samples, and second, by combining the initial prediction with that from a personal predictor 2/28 trained only on samples from the respective individual.Our results are an important step towards precision circadian medicine.

Predicting circadian time of single samples in three datasets
We assembled three publicly available datasets of genomewide gene expression in human blood (Table 1) (32)(33)(34) .Each dataset consisted of ~8 control samples taken throughout the day for each individual.We merged and batchcorrected the gene expression measurements (35,36) and standardized the time of day values (Materials and Methods).
Using the combined data, we then performed 10fold crossvalidation, in which ZeitZeiger learned to predict a sample's CT based on its gene expression.We ran crossvalidation with a range of values for ZeitZeiger's two main parameters, sumabsv (which controls the amount of regularization) and nSPC (which controls how many sparse principal components, SPCs, are used for prediction).Samples from the same individual were always in the same fold.
We evaluated the results of crossvalidation in terms of absolute error (absolute difference between predicted and observed CT; Fig. 1A).The optimal parameter values achieved a median absolute error of 2.1 h (interquartile range 2.8 h).The expected absolute error of a random predictor is 6 h.Similar to our experience predicting CT using gene expression in mice (31) , prediction accuracy plateaued at sumabsv=2 and nSPC=2.Prediction accuracy was similar across the three datasets, although somewhat worse in samples from GSE56931 (Fig. 1B).On average, the predictors from crossvalidation trained with sumabsv=2 and nSPC=2 were based on the expression of 15 genes (Fig. 1C).These results suggest that ZeitZeiger can use the expression of a small number of genes in human blood to accurately predict circadian time from a single sample.
Using the parameter values sumabsv=2 and nSPC=2, we trained a predictor on all control samples from the three datasets.The SPCs calculated by ZeitZeiger, each of which is a linear combination of genes, are designed to explain variation in gene expression associated with circadian time and are discouraged from being correlated with each other.Indeed, the predictor's two SPCs showed times of peak expression that were shifted from each other by ~6 h (Fig. 2A), similar to the multiorgan predictor of CT that we trained from gene expression in mice (31) .Interestingly, however, the expression of SPC 1 as a function of CT was markedly nonsinusoidal.Moreover, of the 15 genes that formed the two SPCs (Fig. 2B), only two, NR1D2 ( REVERBβ ) and PER1 , are thought to be part of the core circadian clock.Consistent with this observation, the signaltonoise ratio of circadian rhythmicity was generally lower for clock genes than for the 15 genes in the predictor (fig.S1).When we allowed ZeitZeiger to predict CT using only core clock genes, absolute error on crossvalidation increased by a median of 27% 3/28 (P = 7⋅ 10 6 by paired Wilcoxon ranksum test; fig.S2), demonstrating ZeitZeiger's ability to select the most informative genes.

Analyzing the effects of sleep perturbations
In addition to including samples from a control condition, each dataset also included samples from a condition in which sleep (and the lightdark cycle) was somehow perturbed.To investigate how the sleep perturbations would affect predictions of CT, we followed a leaveonestudyout strategy, in which we trained a predictor (sumabsv=2 and nSPC=2) on control samples from two datasets, then tested on all samples from the third dataset.
We observed several effects of the sleep perturbations (fig.S3).First, six days of restricted sleep opportunity (6 h per night) worsened prediction accuracy by 16%, consistent with weaker circadian oscillations in gene expression (32) .Second, a single night of complete sleep deprivation (38 h of continuous wakefulness with lights on (34) ) appeared to induce a phase delay of 2.1 h.Finally, four 28h days (forceddesynchrony protocol in GSE48113, which decouples the sleepwake rhythm from the melatonin rhythm (33) ) induced a phase delay of 2 h and increased the variability in prediction error by 42% (based on circular standard deviation).
We also used all the samples from GSE48113 ("in phase", i.e., control, and "out of phase") to train a predictor of time relative to DLMO, obtaining prediction accuracy similar to that obtained by predicting circadian time of control samples (fig.S4AC).The predictor trained on all samples from GSE48113 (sumabsv=2 and nSPC=2) included many of the same genes that were in the predictor trained on control samples from all three datasets (fig.S4DE).We therefore focused on the control samples for the remainder of our analysis.

Predicting circadian time using multiple samples
Although ZeitZeiger was originally designed to predict the circadian time of a single sample, we wondered if prediction accuracy could be improved by using multiple samples from the same individual.We therefore extended ZeitZeiger to make predictions for groups of samples, where the time difference between each sample in the group is known (Materials and Methods).For each individual in the three datasets, we then constructed groups of two samples (taken either 89 h or 12 h apart) or three samples (taken over 12 h).We predicted the CT of each group in 10fold crossvalidation using sumabsv=2 and nSPC=2 (group information is only used during testing, not during training).
Compared to predictions based on a single sample, predictions based on two samples taken either 89 h apart were ~21% more accurate (0.43 h reduction in median absolute error; Fig. 3).Predictions based on two samples taken 12 h apart or on three samples showed a slight additional increase in accuracy (P > 0.3 by Wilcoxon ranksum test).These results suggest that ZeitZeiger can use multiple relatively noisy samples to make better predictions.

Personalizing predictions of circadian time
We previously found that ZeitZeiger can make accurate predictions even given small training sets with low time resolution (31) .Because the three datasets included multiple samples per individual (median 8.5), we wondered if ZeitZeiger could learn to accurately predict CT using only the samples from a single individual.To test this, we performed leaveonesampleout crossvalidation for each individual (sumabsv=2 and nSPC=2).Unfortunately, these personalized predictions were only slightly better than random and much worse than those from the original 10fold crossvalidation (fig.S5).
Given that the merged dataset used throughout this paper included the expression of 17,477 genes, we suspected that ~8 samples per individual might not be enough to prevent ZeitZeiger from overfitting.We therefore devised a procedure for training personal predictors with "universal guidance", which removes all features (i.e., genes) from the personal training set except for those selected by the "universal" predictor (i.e., the predictor trained on samples from multiple other individuals; Materials and Methods and Fig. 4A).
Using universal guidance, the personal predictors achieved similar accuracy on leaveonesampleout crossvalidation to the universal predictors on 10fold crossvalidation (Fig. 4BC).We then combined the universal and personal predictors into an ensemble using the circular mean (Materials and Methods and Fig. 4A).Strikingly, the ensemble predictor was ~20% more accurate than either single predictor, both on a persample and perindividual basis (Fig. 4BC and fig.S6).The improvement in accuracy was robust for predictions based on at least seven personal training samples (fig.S7).Applying this strategy of ensemble learning to groups of samples did not improve accuracy further, likely due to the small number of samples in the personal training sets (fig.S8).Taken together, these results suggest that predictions based on a large training set from multiple individuals can be finetuned by predictions based on a carefully constructed training set from the individual of interest.
To further investigate the differences in circadian gene expression between individuals, we trained one predictor for each individual, using as universal guidance the 15 genes selected in the predictor trained on all samples.We found that the subset of genes selected by ZeitZeiger varied from one personal predictor to another (fig.S9A).Furthermore, although the difference between peak times of SPC 1 and SPC 2 was largely consistent across individuals (fig.S9B), the actual value of peak times was not (fig.S9C).These results suggest that even the 15 "consensus" genes show meaningful interindividual variation in circadian expression.

Discussion
Developing treatments that improve the function of or that account for the circadian system has the potential to improve multiple areas of human health.Realizing this potential, however, requires a robust method for tracking an individual's circadian rhythm.Here we developed a 5/28 .predictor of circadian time in human blood by applying machine learning to genomewide gene expression.We demonstrated accurate prediction for single samples using a small set of genes, then developed two strategies that each improved accuracy by ~20%.
Both our strategies for improving prediction accuracy are based on the idea of an ensemble.Importantly, both rely on having multiple observations per individual.The first, using multiple samples with a known time difference between them, is effectively an ensemble of observations.The second, combining universal and personal predictors, fits the traditional definition of an ensemble in machine learning.In this case, the ensemble combines the relatively highbiaslowvariance universal predictor with the lowbiashighvariance (but not too high, thanks to universal guidance) personal predictor.This strategy for personalizing predictions is similar to an approach called customized training, which involves finding training observations that look similar to a given test observation (37) .In the future, it may be possible to personalize predictions using samples from multiple individuals (e.g., individuals with similar chronotype or phase of entrainment).
Comparing our current results in human blood to our previous results in multiple mouse organs, two main differences emerge.First, our predictions here are less accurate, a consequence of circadian gene expression in humans being noisier (although here we analyzed only blood, we have observed similar levels of noise in human brain (38) ).This increased noise is likely due to both genetic and environmental variation.The caveat is that some of the data from mice is based on tissue pooled from multiple animals, so a fair interspecies comparison of the variation in circadian gene expression in a given tissue has yet to be made.
Second, in contrast to the predictor we developed in mice, most of the genes in the human bloodbased predictor are not thought to be part of the core circadian clock.This difference is likely due to the fact that the mouse predictor was trained on data from 12 organs, which discouraged ZeitZeiger from selecting genes whose circadian expression was tissuespecific and resulted in a strong enrichment for core clock genes.In this study, the dominant gene for SPC 1 was DDIT4 (REDD1), which encodes a protein that inhibits mTOR signaling as part of the response to cellular stress (39) .The dominant gene for SPC 2 was FOSL2 (FRA2), which encodes a subunit of the AP1 transcription factor and is therefore involved in numerous aspects of cell proliferation (40) .
Because only two of the 15 genes in the predictor are part of the core clock and because circadian rhythms are normally aligned with the sleepwake (and feeding) rhythm, it seems reasonable to wonder exactly which rhythm ZeitZeiger learned to detect.For three reasons, we believe the 15gene ZeitZeiger predictor is primarily detecting of the progression of the clock.First, five of the ten core clock genes we analyzed showed a signaltonoise ratio in the top 1.5% of all genes.Second, the control condition in two of the three datasets attempted to separate the clock from sleep, either by keeping subjects in constant conditions (GSE39445) or by allowing the clock to freerun (GSE48113).Third, we obtained similar results when training ZeitZeiger on control samples from all three datasets compared to training ZeitZeiger on all 6/28 samples from the forceddesynchrony protocol of GSE48113.Further work is necessary to understand how the clock regulates the expression of the 13 noncore clock genes.Our results imply, however, that the forceddesynchrony protocol affects not just the central clock driving the melatonin rhythm, but also the peripheral clock in blood cells.
One limitation of this study is that, because not all of the datasets included individuallevel information about DLMO, we had no direct measurement for the internal time of each individual's circadian clock.Consequently, we had to train ZeitZeiger to predict the externally measured time of day.Some of the inaccuracy of the predictions could therefore be due to interindividual variation in the alignment of external and internal time of day, i.e., the phase of entrainment.Such variation could explain why the personal predictors in the ensemble improved accuracy: they helped adjust for each individual's circadian phase.Before our approach can be integrated into clinical trials for chronotherapy or chronomedicine, it will need to be validated in prospective studies.Such studies will likely involve testing ZeitZeiger and/or the 15gene set alongside melatonin and actigraphy outside the laboratory setting.
Although here we have focused on genomewide gene expression in blood, our methodology can be applied to any type of data from any tissue.We are therefore hopeful that in addition to its utility in chronotherapy and chronomedicine, ZeitZeiger will support efforts to (1) study how circadian function in various tissues is altered in pathophysiological conditions and (2) develop biomarkers for sleep and circadianrelated disorders (41) .Furthermore, as the number of observations for each individual increases, e.g., in electronic medical records, our framework for personalizing predictions may prove useful in many areas of precision medicine.

Processing time of day and other metadata
The nomenclature for time of day in chronobiology is complicated (42) .Complicating our analysis even further, each of the three datasets used a different experimental design (in particular, a different lightdark regimen) and only one of the datasets included individuallevel information for DLMO.
For both GSE48113 and GSE56931, the first samples were collected after subjects had been in the lab no more than a day.For GSE39445, the first samples were collected after subjects had been in the lab for 9 days, but for each subject, the midpoint of sleep opportunities in the lab coincided with the midpoint of sleep in that subject's habitual sleepwake schedule.In all three datasets then, the phase of each subject's circadian clock should be based largely on the natural lightdark cycle.Therefore, we calculated the time of day for each sample in each dataset (e.g., 8:00 am) relative to sunrise time, using either the dates and geographic location provided by the authors (GSE56931) or the average sunrise time in the respective geographic location (GSE39445 and GSE48113).We refer to this adjusted time of day as "circadian time".

7/28
. This approach to standardizing time of day is not perfect, but it should be reasonably accurate, and we believe combining the three datasets makes our results more robust.For GSE48113, we calculated time relative to DLMO using the average DLMO for each condition (21:59 for "in phase", 23:03 for "out of phase"), as provided in the original publication (33) .
Unless otherwise specified, we used only the samples from the control condition in each dataset.This corresponded to "sleep extension" in GSE39445, "in phase with respect to melatonin" in GSE48113, and "baseline" in GSE56931.

Processing the gene expression data
Gene expression from the three microarray datasets was processed using MetaPredict (36) ( https://github.com/jakejh/metapredict), which maps probes to Entrez Gene IDs, performs intrastudy normalization and logtransformation, and uses ComBat (35) to perform crossstudy normalization.The merged data of control samples from all three datasets consisted of 17,477 genes measured in 498 samples.

Using ZeitZeiger to predict circadian time
ZeitZeiger is a supervised learning method for periodic variables, i.e. variables that are continuous and bounded and for which the maximum value is equivalent to the minimum value.Conceptually, ZeitZeiger uses the training observations to learn a sparse representation of the variation associated with the periodic variable, then makes a prediction for a test observation using maximum likelihood (31) ( https://github.com/jakejh/zeitzeiger).
Training a ZeitZeiger predictor involves the following steps: (1) fitting a periodic smoothing spline to the intensity of each feature (e.g., the expression of each gene) as a function of the periodic variable (43) , (2) discretizing and scaling the spline fits, (3) using the discretized and scaled fits to calculate sparse principal components (SPCs; linear combinations of a small set of features) (44) , and (4) fitting a periodic smoothing spline to the intensity of each SPC as a function of the periodic variable.Making predictions requires two steps: (1) projecting the test observation from featurespace to SPCspace and (2) using the spline fits of the SPCs from the training data to perform maximum likelihood estimation.The two main parameters of ZeitZeiger are sumabsv and nSPC.The former corresponds to the amount of regularization used to L 1 calculate the SPCs, while the latter corresponds to the number of SPCs used for prediction.Other parameters of ZeitZeiger include the number of knots for spline fitting and the number of timepoints for discretization.In this study, we always used three knots (which constrains the spline's flexibility and makes it more resistant to noise) and 12 timepoints.
Tenfold crossvalidation was performed such that all samples from a given individual were in the same fold.The folds were identical when predicting CT for groups of samples and when 8/28 training universal predictors to provide universal guidance to the personal predictors in leaveonesampleout crossvalidation.Because only three datasets were available (two of which were from the same research group), we elected not to perform leaveonestudyout crossvalidation or to have a separate group of validation samples from one or more of the datasets.Instead, we only performed 10fold crossvalidation across all controls samples from all three datasets.This means we may be underestimating generalization error (perhaps cancelling out the imperfect standardization of time of day), but also makes it simpler to use all the control samples when testing strategies for improving accuracy.
We did use a leaveonestudyout strategy when analyzing the effects of sleepwake perturbations.For each dataset in turn, we trained a ZeitZeiger predictor (sumabsv=2 and nSPC=2) on only the control samples from two datasets, then tested on all samples (control and "treatment") from the third.Thus, prediction accuracies from this analysis are not directly comparable to those from 10fold crossvalidation, but within this analysis, one can still compare results for control and treatment samples within each dataset.
The signaltonoise ratio of circadian rhythmicity for gene was calculated as j where is the expression of gene as a function of time , and is the root mean squared (t) f j j t s j error of the periodic spline fit.

Providing universal guidance when training personal predictors
For each fold of 10fold crossvalidation (performed across individuals) and each sample of leaveonesampleout crossvalidation (performed on samples from a single individual), our procedure for universal guidance worked as follows (Fig. 4A).First, samples from the other nine folds (the universal training set) were used to train a "universal" predictor (the same predictor used for 10fold crossvalidation in Fig. 1).Next, the training samples from the current individual (i.e., the personal training set) were filtered to include only those genes used in the universal predictor, resulting in the shrunken personal training set.Thus, universal guidance here exploits the fact that ZeitZeiger performs feature selection.Finally, the shrunken personal training set was used to train the personal predictor.

Extending ZeitZeiger for multiple samples and multiple predictors
When making a prediction for a single sample , ZeitZeiger calculates the loglikelihood , x L (t|x) where is the scaled periodic variable (e.g., time of day).The predicted time is then 0 Now suppose we have a group of samples, and for each sample, we have the measurements n and a time difference , which is the time of the th sample relative to the time of a particular x i τ i i sample in the group.In the simplest case, and .For two samples taken antiphase to n = 1 τ = 0 each other, we could have and .Now we can combine the loglikelihood for each τ 1 = 0 .5 τ 2 = 0 sample and make one prediction for the entire group as follows: where the operator means that times less than 0 or greater than 1 "wrap around" to be od m between 0 and 1, and is the estimated time at which .t ˆτ = 0 Combining predictors can be done in two ways, the first of which works similarly to combining samples.Suppose we have a group of predictors, where for a given sample , is the m x L j (t|x) loglikelihood for predictor .For the situation of one universal and one personal predictor, j .The ensemble prediction is then m = 2 The second way to combine predictors is to use the circular mean.In this case, the ensemble prediction is This second way is simpler and (on our data) provides a slightly larger improvement in accuracy, so all ensemble predictions in this study are based on the circular mean.
Our current implementations implicitly weight each sample or each predictor equally, but one could imagine incorporating explicit weights into any of these calculations, then learning the weights through an additional round of crossvalidation.Tenfold crossvalidation to predict CT using only the core clock genes (otherwise identical to Fig. 1).

Fig. S3
Applying ZeitZeiger to gene expression from various sleep perturbations.For each of the three datasets, a predictor was trained on control samples from the other two datasets, then tested on all samples from the dataset of interest.Boxplots of (A) error, (B) absolute error, and (C) loglikelihood of predicted circadian time for each condition in each dataset.Numbers above the boxplots indicate pvalues less than 0.1 (circular ANOVA for error and Wilcoxon ranksum test for absolute error).For ease of visualization, plots of absolute error do not show three outliers in GSE39445, one in GSE48113, and three in GSE56931.The leftmost condition in each dataset is the control.In GSE39445, "restriction" refers to seven consecutive days of 6h sleep opportunity per night.In GSE48113, "out of phase" refers to a forceddesynchrony protocol that decouples the sleepwake rhythm from the central circadian clock.In GSE56931, "deprivation" refers to one full day of sleep deprivation, while "recovery" refers to a normal sleep opportunity the next night.(A) Boxplots of absolute error for universal (standard 10fold crossvalidation, identical to Fig. 3), personal (leavegroupout crossvalidation for each individual), and ensemble (circular mean of universal and personal) predictors.

22/28
(B) Improvement in absolute error between universal predictor and ensemble predictor as a function of the number of personal training samples for that group (equal to the number of samples for that individual minus two).

Fig. S9
Genes and SPCs of the personal predictors trained with universal guidance.
(A) Heatmap of genes present in personal predictors trained with universal guidance (using 15 genes present in predictor shown in Fig. 2).Rows correspond to genes and columns correspond to individuals.Black indicates the gene was present in the predictor for that individual.Rows and columns were sorted by hierarchical clustering.
(B) Histogram of difference between peak times of SPC 1 and SPC 2.
(C) Circadian times of peak expression for SPC 1 and SPC 2 in the personal predictors.Each point corresponds to one individual.For ease of visualization, some peak times for SPC 2 were shifted by 24 hours.28/28

Figures
Figures

Using
ZeitZeiger to predict circadian time across three datasets in 10fold crossvalidation.(A) Boxplots of absolute error for various values of sumabsv (regularization parameter) and nSPC (number of SPCs).(B) Boxplots of absolute error for each dataset at sumabsv=2 and nSPC=2.(C) Mean number of genes in the predictors from crossvalidation for various values of sumabsv and nSPC.15/28.