KLRD1-expressing natural killer cells predict influenza susceptibility

Background
Influenza infects tens of millions of people every year in the USA. Other than notable risk groups, such as children and the elderly, it is difficult to predict what subpopulations are at higher risk of infection. Viral challenge studies, where healthy human volunteers are inoculated with live influenza virus, provide a unique opportunity to study infection susceptibility. Biomarkers predicting influenza susceptibility would be useful for identifying risk groups and designing vaccines.

Methods
We applied cell mixture deconvolution to estimate immune cell proportions from whole blood transcriptome data in four independent influenza challenge studies. We compared immune cell proportions in the blood between symptomatic shedders and asymptomatic nonshedders across three discovery cohorts prior to influenza inoculation and tested results in a held-out validation challenge cohort.

Results
Natural killer (NK) cells were significantly lower in symptomatic shedders at baseline in both discovery and validation cohorts. Hematopoietic stem and progenitor cells (HSPCs) were higher in symptomatic shedders at baseline in discovery cohorts. Although the HSPCs were higher in symptomatic shedders in the validation cohort, the increase was statistically nonsignificant. We observed that a gene associated with NK cells, KLRD1, which encodes CD94, was expressed at lower levels in symptomatic shedders at baseline in discovery and validation cohorts. KLRD1 expression in the blood at baseline negatively correlated with influenza infection symptom severity. KLRD1 expression 8 h post-infection in the nasal epithelium from a rhinovirus challenge study also negatively correlated with symptom severity.

Conclusions
We identified KLRD1-expressing NK cells as a potential biomarker for influenza susceptibility. Expression of KLRD1 was inversely correlated with symptom severity. Our results support a model where an early response by KLRD1-expressing NK cells may control influenza infection.

Electronic supplementary material
The online version of this article (10.1186/s13073-018-0554-1) contains supplementary material, which is available to authorized users.


Data Collection and Preprocessing
We identified 4 influenza challenge studies consisting of 52 whole blood samples from the NCBI Gene Expression Omnibus (GEO) database (Table 1) [1]. We supplemented the influenza challenge cohorts with 7 acute viral infection studies consisting of 16 cohorts of 771 whole blood, peripheral blood mononuclear cell (PBMC), and nasal epithelium samples, also from GEO (Table 2) [1]. We excluded challenge studies with fewer than 5 asymptomatic-nonshedders or fewer than 5 symptomatic-shedders. We used phenotypic labels as reported by the original authors. All datasets used were publicly available as described below.
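The inclusion criterion above can be sketched as a simple filter. This is a minimal illustration in Python, not the authors' workflow: the cohort names, the label strings, and the counts are hypothetical, and only the threshold of 5 per group comes from the text.

```python
from collections import Counter

MIN_PER_GROUP = 5  # threshold stated in the text


def meets_inclusion_criteria(labels, min_per_group=MIN_PER_GROUP):
    """Return True if a cohort has enough subjects in both outcome groups."""
    counts = Counter(labels)
    return (counts["asymptomatic_nonshedder"] >= min_per_group
            and counts["symptomatic_shedder"] >= min_per_group)


# Hypothetical cohorts: names and group sizes are made up for illustration.
cohorts = {
    "cohort_A": ["symptomatic_shedder"] * 9 + ["asymptomatic_nonshedder"] * 8,
    "cohort_B": ["symptomatic_shedder"] * 6 + ["asymptomatic_nonshedder"] * 3,
}

included = [name for name, labels in cohorts.items()
            if meets_inclusion_criteria(labels)]
print(included)  # cohort_B is dropped: only 3 asymptomatic nonshedders
```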
All datasets were downloaded using the MetaIntegrator R Package [2]. Unless otherwise specified, we used gene expression data that had been preprocessed by the original authors, after verifying normalization and log2-transformation.
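The text does not describe how normalization and log2-transformation were verified. One plausible check, sketched here in Python under assumed heuristics, looks at the value range (log2 intensities rarely exceed about 20, whereas raw intensities reach the thousands) and at how closely the per-sample medians agree. The thresholds `max_plausible` and `tolerance` are illustrative assumptions, and the example values are made up.

```python
from statistics import median


def looks_log2_transformed(matrix, max_plausible=32.0):
    """Heuristic: a genes x samples matrix on the log2 scale has a small maximum."""
    return max(max(row) for row in matrix) <= max_plausible


def sample_medians(matrix):
    """Median expression per sample (one column per sample)."""
    return [median(col) for col in zip(*matrix)]


def looks_normalized(matrix, tolerance=1.0):
    """Heuristic: per-sample medians lie within `tolerance` log2 units of each other."""
    meds = sample_medians(matrix)
    return max(meds) - min(meds) <= tolerance


# Made-up values: rows are genes, columns are samples.
log_scale = [[7.1, 7.3, 6.9], [10.2, 10.0, 10.4], [4.5, 4.8, 4.4]]
raw_scale = [[137.0, 158.0], [1176.0, 1024.0], [23.0, 28.0]]

print(looks_log2_transformed(log_scale), looks_normalized(log_scale))
print(looks_log2_transformed(raw_scale))
```

A dataset that fails either check would need closer inspection before use; passing both does not prove that the preprocessing was appropriate.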

Liu et al. 2016 cohorts (GSE73072)
The Liu et al. 2016 viral challenge cohorts were obtained from GEO (GSE73072) [3]. Three cohorts within GSE73072 fit our inclusion criteria as influenza challenge studies with at least 5 asymptomatic-nonshedders and 5 symptomatic-shedders. These cohorts (which we renamed for clarity) were DEE2 (Challenge B), DEE3 (Challenge A), and DEE5 (Challenge C). To maintain the heterogeneity of the cohorts, the .CEL files were downloaded from GEO, and each cohort was separately RMA-normalized and log2-transformed. Total symptom scores were obtained for DEE2 (Challenge B) and DEE3 (Challenge A) from an earlier published version of those cohorts (GSE52428).
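To illustrate what normalizing each cohort separately means in practice, the following Python sketch applies quantile normalization followed by a log2 transform to each cohort independently, so that no cohort's intensity distribution influences another's. It is not RMA: it omits background correction and median-polish probe summarization, and it works on a made-up genes x samples matrix rather than on .CEL files. The intensity values are hypothetical; only the cohort names come from the text.

```python
from math import log2


def quantile_normalize(matrix):
    """Quantile-normalize a genes x samples matrix given as a list of rows.

    Ties within a sample are broken by row order, which is a simplification.
    """
    n_genes = len(matrix)
    columns = [list(col) for col in zip(*matrix)]
    # Reference distribution: mean of the k-th smallest value across samples.
    sorted_cols = [sorted(col) for col in columns]
    reference = [sum(vals) / len(vals) for vals in zip(*sorted_cols)]
    normalized_cols = []
    for col in columns:
        order = sorted(range(n_genes), key=lambda i: col[i])
        new_col = [0.0] * n_genes
        for rank, gene_idx in enumerate(order):
            new_col[gene_idx] = reference[rank]
        normalized_cols.append(new_col)
    return [list(row) for row in zip(*normalized_cols)]


def normalize_each_cohort(raw_by_cohort):
    """Quantile-normalize, then log2-transform, each cohort on its own."""
    result = {}
    for name, matrix in raw_by_cohort.items():
        normalized = quantile_normalize(matrix)
        result[name] = [[log2(v) for v in row] for row in normalized]
    return result


# Made-up raw intensities (rows are genes, columns are samples).
raw_by_cohort = {
    "DEE2": [[2.0, 4.0], [8.0, 16.0], [4.0, 2.0]],
    "DEE3": [[64.0, 32.0], [16.0, 128.0], [32.0, 16.0]],
}
normalized_by_cohort = normalize_each_cohort(raw_by_cohort)
```

After this step every sample within a cohort shares the same set of values, while the distributions of different cohorts remain distinct.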

Davenport et al. 2015 cohort (GSE61754)
The Davenport et al. 2015 cohort (GSE61754) contained gene expression microarray data from whole blood of healthy individuals challenged with influenza H3N2 [4]. We utilized the dataset as preprocessed by the original authors, after verifying normalization and log2-transformation. The symptom scores and demographic information were obtained through correspondence with the authors.

Proud et al. 2008 cohort (GSE11348)
The Proud et al. 2008 cohort (GSE11348) contained gene expression microarray data from the nasal scrapings of individuals experimentally infected with Human Rhinovirus (HRV-16) [5]. This cohort also contained sham-infected individuals, whom we excluded from this study. We downloaded the cohort from GEO (GSE11348) and utilized the data as preprocessed by the original authors, after verifying normalization and log2-transformation.

Do et al. 2018 cohort (GSE97742 and GSE97741)
The Do et al. 2018 cohort contained gene expression microarray data from nasopharyngeal swabs (GSE97742) or whole blood (GSE97741) of children < 2 years old hospitalized with lower respiratory tract infection, sampled at hospital admission (acute infection) and at discharge [6]. We downloaded the cohort from GEO (GSE97742, GSE97741) and utilized the data as preprocessed by the original authors, after verifying normalization and log2-transformation. We separated GSE97742 and GSE97741 each into three cohorts by virus: Human Rhinovirus (HRV), Respiratory Syncytial Virus (RSV), and RSV coinfected with other pathogens (RSVco).

Hoang et al. 2014 cohort (GSE61821)
GSE61821 (Hoang et al. 2014) contained gene expression microarray data from whole blood of individuals aged 5-73 years with natural influenza infection [7]. We downloaded the cohort from GEO (GSE61821) and utilized the data as preprocessed by the original authors, after verifying normalization and log2-transformation. We separated GSE61821 into five cohorts according to virus and infection severity: Mild H1N1, Mild H3N2, Severe H1N1, Severe H3N2, and Pandemic H1N1 (Pand. H1N1). We defined severe infection as requiring hospitalization; thus, patients labeled as either "moderate" or "severe" in GSE61821 are categorized as "severe" in our work. Due to its small sample size, the Pandemic H1N1 cohort was not separated into "mild" and "severe".
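The regrouping described above can be expressed as a small mapping. The Python sketch below is illustrative only: the virus and severity strings passed in are assumed spellings and may not match the annotation fields in GSE61821, while the five cohort names follow the text.

```python
# Original severity labels collapsed to the two levels used in this work.
# "moderate" and "severe" both required hospitalization, so both map to "Severe".
SEVERITY_MAP = {"mild": "Mild", "moderate": "Severe", "severe": "Severe"}


def assign_cohort(virus, severity):
    """Return the cohort name for one sample (input spellings are hypothetical)."""
    if virus == "Pandemic H1N1":
        # Not split by severity because of the small sample size.
        return "Pand. H1N1"
    return f"{SEVERITY_MAP[severity.lower()]} {virus}"


print(assign_cohort("H3N2", "moderate"))
print(assign_cohort("Pandemic H1N1", "severe"))
```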

Zhai et al. 2015 cohorts (GSE68310)
GSE68310 (Zhai et al. 2015) contained gene expression microarray data from whole blood of adults with naturally acquired respiratory infections [8]. Individuals in GSE68310 were first profiled at a baseline healthy time point prior to viral infection and then returned for additional transcriptional profiling within 48 hours of symptom onset (study Day 0) of a naturally acquired respiratory viral infection. We downloaded the cohort from GEO (GSE68310) and utilized the data as preprocessed by the original authors, after verifying normalization and log2-transformation. We separated GSE68310 into two cohorts based on virus: Influenza A and Human Rhinovirus (HRV). Individuals infected with other viruses were excluded due to low sample size.

Sun et al. 2013 cohort (GSE43777)
The Sun et al. 2013 cohort (GSE43777) contained gene expression microarray data from peripheral blood mononuclear cells (PBMCs) of humans during the acute, late acute, and convalescent phases of dengue infection [9]. We downloaded the normalized and log2-transformed data from GEO (GSE43777). Individuals with dengue hemorrhagic fever were excluded. Samples of late acute infection were excluded. GSE43777 was profiled on two microarray platforms, but only samples profiled on Affymetrix Human HG-Focus Target Array (GPL201) were used due to its larger sample size.
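The three exclusion rules for this cohort can be written as one sample-level filter. In the Python sketch below, the dictionary keys and the label strings are hypothetical stand-ins for whatever annotation fields accompany GSE43777; only the platform accession GPL201 and the rules themselves come from the text.

```python
def keep_sample(sample):
    """Apply the three exclusion rules described for GSE43777."""
    return (sample["platform"] == "GPL201"
            and sample["diagnosis"] != "dengue hemorrhagic fever"
            and sample["phase"] != "late acute")


# Made-up sample annotations for illustration.
samples = [
    {"id": "s1", "platform": "GPL201", "diagnosis": "dengue fever", "phase": "acute"},
    {"id": "s2", "platform": "GPL201", "diagnosis": "dengue hemorrhagic fever", "phase": "acute"},
    {"id": "s3", "platform": "GPL201", "diagnosis": "dengue fever", "phase": "late acute"},
    {"id": "s4", "platform": "other", "diagnosis": "dengue fever", "phase": "convalescent"},
    {"id": "s5", "platform": "GPL201", "diagnosis": "dengue fever", "phase": "convalescent"},
]

kept_ids = [s["id"] for s in samples if keep_sample(s)]
print(kept_ids)
```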

Kwissa et al. 2014 cohort (GSE51808)
The Kwissa et al. 2014 cohort (GSE51808) contained gene expression microarray data from whole blood of individuals infected with dengue and healthy controls [10]. We downloaded the normalized and log2-transformed data from GEO (GSE51808). Individuals with dengue hemorrhagic fever were excluded. Only samples from acute dengue infection or healthy controls were included.

Heinonen et al. 2016 cohort (GSE67059)
The Heinonen et al. 2016 cohort (GSE67059) contained gene expression microarray data from whole blood of children < 2 years old symptomatically infected with HRV, asymptomatically infected with HRV, and healthy controls [11]. We log2-transformed the normalized data obtained from GEO (GSE67059). Asymptomatically infected individuals were excluded.