Genetic regulation of gene expression

Understanding how and to what extent inter-individual genetic variation determines gene function in normal and pathological conditions can provide important insights into disease etiology. To this end, the rapid accumulation of large transcriptomic datasets across different tissues has prompted several population-based studies of gene expression variation [1]. In many of these studies, typical transcriptional analyses are carried out within or between whole tissue(s), with the aim of pinpointing gene expression signatures and/or (tissue-specific) genetic regulation of gene expression. Even at this level, context-dependent genetic regulation of gene expression has been shown to be important, and the underlying regulatory variants have more complex effects than previously anticipated [2]. For instance, characterizing different cis-regulatory mechanisms between tissues (such as opposite allelic effects) is important to understand the tissue-specific function exerted by disease-associated genetic variants.
The genetic variants that are associated with gene expression variation are commonly called expression quantitative trait loci (eQTLs). These can be mapped to the genome by modeling quantitative variation in gene expression and genetic variation (for example, single nucleotide polymorphisms (SNPs)) that have been assessed in the same population, family or segregating population. Essentially, mRNA levels can be treated as a quantitative phenotype and as such can be mapped to discrete genomic regions (genetic loci) that harbor DNA sequence variation affecting gene expression. In many cases, eQTL studies have provided direct insights into the complex regulatory mechanisms of gene expression, for instance by allowing researchers to differentiate cis (or local) from trans (or distant) control of gene expression in a given tissue, experimental condition or developmental stage. Furthermore, eQTL analyses can be integrated with clinical genome-wide association studies (GWAS) to identify disease-associated variants [3,4]. Despite this recent, exciting progress in 'genetical genomics' (that is, eQTL studies), the growing number of single-cell transcriptomic analyses now prompts re-evaluation of our understanding of how heritable variations affect gene function in the cell.
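The idea of treating mRNA level as a quantitative phenotype can be illustrated with a minimal sketch, assuming a simple additive model in which expression is regressed on allele dosage (0, 1 or 2 copies of one allele) at a single SNP. The function name `eqtl_effect` and the toy data are hypothetical and are not taken from any of the cited studies.

```python
from statistics import mean


def eqtl_effect(dosages, expression):
    """Least-squares slope and R^2 for expression regressed on allele dosage.

    Assumes one SNP, one gene, and one expression value per individual.
    """
    if len(dosages) != len(expression):
        raise ValueError("dosages and expression must have the same length")
    g_bar = mean(dosages)
    e_bar = mean(expression)
    sxx = sum((g - g_bar) ** 2 for g in dosages)
    syy = sum((e - e_bar) ** 2 for e in expression)
    sxy = sum((g - g_bar) * (e - e_bar) for g, e in zip(dosages, expression))
    if sxx == 0 or syy == 0:
        raise ValueError("genotype and expression must both vary across individuals")
    slope = sxy / sxx                      # expression change per allele copy
    r_squared = sxy ** 2 / (sxx * syy)     # share of expression variance explained
    return slope, r_squared


# Hypothetical data: six individuals, illustrative numbers only.
dosages = [0, 0, 1, 1, 2, 2]
expression = [1.0, 1.2, 2.0, 2.2, 3.0, 3.2]

slope, r_squared = eqtl_effect(dosages, expression)
print(f"effect per allele: {slope:.2f}, variance explained: {r_squared:.2f}")
```

In practice an eQTL scan repeats a test of this kind over many SNP-gene pairs, usually with covariates and a correction for multiple testing, which this sketch leaves out.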

Neglected single-cell differences and other hidden factors
Establishing a robust link between SNPs and gene expression variation is a nontrivial exercise when multiple cell types are jointly modeled. To aid this process, ad hoc methodological approaches that borrow information among tissues have been recently developed [5,6]. Nonetheless, emerging concepts such as single-cell transcriptomics have started changing our understanding of the genetic regulation of gene expression in individual cells, which can be hidden in ensemble-averaged experiments. In a recent study published in Nature Biotechnology, Holmes and colleagues [7] carried out single-cell quantification of gene expression for 92 genes in approximately 1,500 individual cells to disentangle the effect of gene variants on cell-to-cell variability, temporal dynamics or cell-cycle dependence in gene expression.

RESEARCH HIGHLIGHT
Single cell expression quantitative trait loci and complex traits
Enrico Petretto*

Abstract
The recently developed ability to quantify mRNA abundance and noise in single cells has allowed the effect of heritable variations on gene function to be re-evaluated. A recent study has shown that major sources of variation are masked when gene expression is averaged over many cells. Heritable variations that determine single-cell expression phenotypes may exert a regulatory function in specific cellular processes underlying disease. Masked effects on gene expression should therefore be modeled, not ignored.

© 2010 BioMed Central Ltd

The authors looked at selected genes in fresh, naive B lymphocytes from three individuals and clearly showed how gene expression had much greater variability between cells within an individual than between individuals. This observation set the scene for a comprehensive investigation of the distributions of single-cell gene expression and the properties of gene expression noise in a larger population of cells. These analyses were focused on 92 genes affected by Wnt signaling (that can be chemically perturbed by a Wnt pathway agonist), of which 46 genes were also listed in the Catalog of Genome-Wide Association Studies, and resulted in four important outcomes.
First, perturbing the system with a Wnt pathway agonist exposed significant changes not only in whole-tissue gene expression but also in gene expression noise. Given the intrinsic stochastic nature of gene expression, it was expected that mRNA copy numbers would vary from cell to cell, as previously shown in isogenic bacterial cell populations [8]. The single-cell transcriptomic analyses reported by Holmes and colleagues [7] highlight the large effect of fluctuations of mRNA copy numbers in HapMap lymphoblastoid cell lines, which has been mostly neglected and might influence eQTL detection in this system to a large extent.
Second, single-cell transcriptomic analysis allowed Holmes and colleagues to quantify both the noise from the regulation of transcription and the noise of RNA turnover, which therefore can be modeled independently. In keeping with previous observations [9], genes differed from each other primarily in terms of burst size (that is, the amount of RNA produced when the gene is switched on), resulting in an expression variance between cells that exceeded the expression mean. The expression 'Fano factor' (the gene expression variance divided by the mean) quantifies this phenomenon, and it represents another commonly neglected component that might be important in eQTL studies.
Third, when gene expression distributions were described in terms of heterogeneous cell subpopulations with respect to different stages of the cell cycle, Holmes and colleagues showed that the majority of genes analyzed had altered expression between G1 and early S phases. These apparent differences in cell cycle subpopulation proportions between samples represent another determinant of gene expression variation, which is expected to contribute significantly to gene regulation.
Finally, single-cell transcriptomics enabled the reliable quantification of the gene expression noise in the system. The latter can be considered as another source of variability, which can then be used to infer an expression network for each sample. Traditional gene co-expression networks assess gene-gene associations by correlating gene expression profiles across multiple samples. By contrast, in the Nature Biotechnology article, expression networks were built by correlating gene expression across multiple cells, which were profiled in the same lymphoblastoid cell line. For instance, one expression network built with approximately 200 cells from one of the lymphoblastoid cell lines revealed changes in cell-to-cell gene correlations in response to chemical perturbation of Wnt signaling, which were not detectable at the level of whole-tissue expression. This approach allowed the authors to assess the extent to which the network connectivity of each gene varies in the system in response to other perturbations (for example, chemical, genetic), unmasking an additional factor that is potentially relevant for eQTL analysis.
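Two of the single-cell quantities described above, the Fano factor and cell-to-cell gene correlations within one sample, can be sketched as follows. This is an illustration under stated assumptions, not the method used in [7]: the counts are invented, the population variance is used for the Fano factor, and the correlation cut-off of 0.8 for drawing a network edge is an arbitrary choice. The function names are hypothetical.

```python
from itertools import combinations
from math import sqrt
from statistics import mean, pvariance


def fano_factor(counts):
    """Variance of per-cell counts divided by their mean (population variance assumed)."""
    m = mean(counts)
    if m == 0:
        raise ValueError("Fano factor is undefined for a gene with zero mean expression")
    return pvariance(counts) / m


def pearson(x, y):
    """Pearson correlation between two equal-length lists of per-cell values."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a gene does not vary across cells")
    return sxy / sqrt(sxx * syy)


def correlation_edges(cells_by_gene, threshold=0.8):
    """Gene pairs whose absolute cell-to-cell correlation reaches the (assumed) threshold."""
    return [
        (g1, g2)
        for g1, g2 in combinations(sorted(cells_by_gene), 2)
        if abs(pearson(cells_by_gene[g1], cells_by_gene[g2])) >= threshold
    ]


# Hypothetical mRNA counts for three genes measured in the same six cells
# of a single sample (illustrative numbers only).
cells_by_gene = {
    "gene_a": [0.0, 0.0, 10.0, 0.0, 10.0, 0.0],  # bursty: variance well above the mean
    "gene_b": [1.0, 0.0, 9.0, 1.0, 11.0, 0.0],   # rises and falls with gene_a
    "gene_c": [5.0, 4.0, 4.0, 5.0, 5.0, 4.0],    # steady: variance below the mean
}

fano = {gene: fano_factor(counts) for gene, counts in cells_by_gene.items()}
edges = correlation_edges(cells_by_gene)
print(fano)
print(edges)
```

Because the correlations are computed across cells of one sample rather than across samples, each sample yields its own network, and a gene's connectivity (its number of edges) becomes a per-sample phenotype.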

Single-cell quantitative trait loci
After demonstrating (and quantifying) the important effect on gene function of a number of factors that reflect single-cell differences, Holmes and colleagues tested how each of these factors (alone or in combination) contributed to the detection of cis-eQTLs (that is, regulatory SNPs within 50 kb of the gene) [7]. This is an important question because integrated eQTL and clinical GWAS analyses are commonly employed to identify genes and pathways underlying disease, and eventually generate new hypotheses concerning diagnostic and prognostic biomarkers or potential therapeutic targets [10]. First, the eQTL associations detected at -log10 P = 3 for whole-tissue gene expression (at both baseline and after chemical perturbation of Wnt signaling) represented only a small fraction of the total number of eQTLs in the system (Figure 1). Overall, many more eQTL signals were detected for the other single-cell expression phenotypes tested. This highlights the extent to which different masked sources of variation (detailed above) can significantly affect the detection of cis-eQTLs in the system. Furthermore, it turns out that the complex spatio-temporal expression variability quantified by single-cell analysis ('single-cell expression') is more heritable than, or at least comparable to, gene expression levels averaged over many cells ('whole-tissue expression'), such that the authors of the study named this new class of associated genetic variants 'single-cell quantitative trait loci' (scQTLs) [7].
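The cis definition and significance cut-off mentioned above can be made concrete with a small sketch. It assumes that the 50 kb window is measured from the gene boundaries (inclusive), that all positions are base-pair coordinates on the same chromosome, and that an association is reported when -log10 P is at least 3; the exact conventions in [7] may differ. The function names, SNP identifiers and P values are hypothetical.

```python
from math import log10

CIS_WINDOW_BP = 50_000          # "within 50 kb of the gene"
NEG_LOG10_P_THRESHOLD = 3.0     # assumed reporting cut-off, i.e. P <= 0.001


def is_cis(snp_pos, gene_start, gene_end, window=CIS_WINDOW_BP):
    """True if the SNP lies inside the gene or within `window` bp of either end."""
    return gene_start - window <= snp_pos <= gene_end + window


def significant_cis_snps(snps, gene_start, gene_end):
    """Return IDs of cis SNPs passing the threshold.

    `snps` is a list of (snp_id, position, p_value) tuples with p_value > 0.
    """
    hits = []
    for snp_id, pos, p_value in snps:
        if is_cis(pos, gene_start, gene_end) and -log10(p_value) >= NEG_LOG10_P_THRESHOLD:
            hits.append(snp_id)
    return hits


# Hypothetical gene and association results (illustrative numbers only).
gene_start, gene_end = 1_000_000, 1_020_000
snps = [
    ("snp_a", 960_000, 1e-5),     # 40 kb upstream, strong association
    ("snp_b", 1_010_000, 0.2),    # inside the gene, weak association
    ("snp_c", 1_200_000, 1e-6),   # strong association but 180 kb away
]

hits = significant_cis_snps(snps, gene_start, gene_end)
print(hits)
```

The same filter would be applied once per expression phenotype (whole-tissue level, noise, burst size, cell-cycle proportion, network connectivity), which is how the counts compared in Figure 1 arise.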
Notably, GWAS eQTL genes in particular demonstrated greater cell-cycle (G1 and early S phase) inter-individual variability compared with other genes and greater inter-individual variability of their network connectivities [7]. The implications of these results are twofold: first, these studies urge caution in the interpretation of eQTL data published to date where only whole-tissue expression was considered; and second, they prompt a deeper evaluation (and accurate modeling) of these 'masked' sources of variation resulting from single-cell differences. It will be intriguing to extend these analyses to the study of more distant genetic control of gene expression at the single-cell level (that is, single-cell trans-eQTLs) and to investigate the functional relevance of scQTLs on whole-body phenotypes in human and animal models. With the growing accessibility of single-cell technologies for transcriptomic studies, the time is right for a deep rethinking of the key factors determining the observed complexity of gene expression and its regulation.

Competing interests
The author declares that he has no competing interests.

Figure 1. Supplementary Table 1 from [7] is represented as a doughnut chart. Several different phenotypes derived from single-cell transcriptomic analysis were modeled as described in [7], and tested for association with single nucleotide polymorphisms within 50 kb of the gene. Beyond signals coming from cells with undetected expression (grey), a substantial number of single-cell quantitative trait loci associated with single-cell transcriptional variation due to cell cycle, gene burst, gene-gene correlation, network connectivity and expression noise were detected. The highlighted sector (black) denotes the relatively small contribution of whole-tissue expression quantitative trait loci, which were obtained using gene expression levels averaged over many cells.