Pharmacogene regulatory elements: from discovery to applications

Regulatory elements play an important role in the variability of individual responses to drug treatment. This has been established through studies on three classes of elements that regulate RNA and protein abundance: promoters, enhancers and microRNAs. Each of these elements, and genetic variants within them, are being characterized at an exponential pace by next-generation sequencing (NGS) technologies. In this review, we outline examples of how each class of element affects drug response via regulation of drug targets, transporters and enzymes. We also discuss the impact of NGS technologies such as chromatin immunoprecipitation sequencing (ChIP-Seq) and RNA sequencing (RNA-Seq), and the ramifications of new techniques such as high-throughput chromosome capture (Hi-C), chromatin interaction analysis by paired-end tag sequencing (ChIA-PET) and massively parallel reporter assays (MPRA). NGS approaches are generating data faster than they can be analyzed, and new methods will be required to prioritize laboratory results before they are ready for the clinic. However, there is no doubt that these approaches will bring about a systems-level understanding of the interplay between genetic variants and drug response. An understanding of the importance of regulatory variants in pharmacogenomics will facilitate the identification of responders versus non-responders, the prevention of adverse effects and the optimization of therapies for individual patients.

Most pharmacogenomics studies to date have focused on coding variants of pharmacologically important proteins. However, well-supported examples of variants in regulatory elements of genes involved in drug response, such as drug metabolizing enzymes and transporters (see review by Georgitsi et al. [1]), show that variants in noncoding regulatory sequences are also likely to be important (Table 1). Three classes of regulatory elements have been studied in this context: promoters, enhancers and microRNAs (miRNAs). Each of these has a direct impact on the abundance of messenger RNA (mRNA) (in the case of promoters and enhancers) and protein (in the case of miRNAs). Genetic variation within each of these elements has been linked to human disease as well as interindividual differences in drug response. For example, a single nucleotide polymorphism (SNP) in the promoter of VKORC1, the gene encoding vitamin K epoxide reductase complex subunit 1, radically affects an individual's response to the anticoagulant warfarin [2]. Likewise, a SNP within an enhancer in the vicinity of several solute carrier family (SLC) drug transporters is associated with increased clearance of methotrexate (MTX) [3], and a SNP within a 3'-untranslated region (UTR) miRNA binding site prevents resistance to the chemotherapeutic cisplatin [4].
While a rough estimate of the number of coding genes exists [5], it is unclear how many regulatory elements there are in the genome. The difficulty in defining critical regulatory elements is compounded by the fact that the search space for regulatory elements is vast (98% of our genome is noncoding) and lacks clear sequence cues such as open reading frames. Next-generation sequencing (NGS) approaches are rapidly changing the status quo, revealing the location and function of regulatory elements on a genomic scale. Robust, high-throughput DNA sequencing platforms emerged as the Human Genome Project came to a close in 2003, as did a desire to establish a reference human epigenome [6]. Key technical advances have brought this goal closer to reality by enabling rapid de novo detection of DNA methylation [7,8], enhancers [9,10] and RNA transcripts [11][12][13] on a genome-wide level (Table 2).
In this review, we discuss the role of each class of regulatory element in drug response variability, and how our understanding of these mechanisms has been impacted by NGS approaches. We also discuss NGS technologies such as deoxyribonuclease I sequencing (DNase-Seq), formaldehyde-assisted isolation of regulatory elements sequencing (FAIRE-Seq), high-throughput chromosome capture (Hi-C) and chromatin interaction analysis by paired-end tag sequencing (ChIA-PET), which have not yet been applied to pharmacogene regulation but will greatly improve our ability to interpret noncoding genetic variation. Finally, we comment on the need for more efficient functional validation, and discuss other challenges that need to be considered in moving NGS data into the clinic.

Pharmacogene regulatory elements
There are many different classes of gene regulatory elements, including promoters, enhancers, miRNAs, silencers and insulators (Figure 1; for detailed reviews see Maston et al. [14] and Noonan et al. [15]). In this review, we will focus on the first three classes, each of which has been linked to multiple pharmacogenomic phenotypes (Table 1). That is not to say that other classes of regulatory elements are not important for pharmacogene regulation; they most likely are, but have not yet been identified. As functional validation assays improve, a more complete picture of regulatory mechanisms will no doubt emerge.

Promoters
Gene promoters are located at the 5' terminus of their target gene, and often have two separate domains, known as the core and proximal promoter regions. The core promoter is where the transcription machinery assembles [16,17] and is usually 35 to 40 base pairs (bp) long. The proximal promoter is located in the immediate vicinity (-250 bp to +250 bp) of the gene's transcription start site (TSS). It contains several transcription factor binding sites (TFBS) and is thought to serve as a tethering element for enhancers, enabling them to interact with the core promoter [18]. Additional elements up to 5 kb upstream of the proximal promoter are often considered to be part of the 'promoter region', and are designated as such in this review. Mammalian promoters often contain CpG islands: sequences at least 200 bp in length with >50% G/C content [19]. CpG islands are unmethylated in tissues in which their target gene is expressed [20,21], but can be silenced by methylation in disease states.
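The length and G/C-content thresholds quoted for CpG islands can be expressed as a simple sliding-window test. The sketch below is illustrative only: the function names and the toy sequence are hypothetical, and it applies just the two criteria stated in the text, whereas published CpG island callers typically add further criteria that are not covered here.

```python
def gc_fraction(seq: str) -> float:
    """Return the fraction of G and C bases in seq (case-insensitive)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def candidate_cpg_windows(seq: str, window: int = 200, min_gc: float = 0.5):
    """Yield start positions of `window`-bp windows whose G/C content exceeds min_gc.

    Only the criteria quoted in the text are applied (>= 200 bp, > 50% G/C).
    """
    for start in range(0, len(seq) - window + 1):
        if gc_fraction(seq[start:start + window]) > min_gc:
            yield start


# Hypothetical toy sequence: a GC-rich 200 bp block flanked by AT-rich sequence.
toy = "AT" * 100 + "GC" * 100 + "AT" * 100
hits = list(candidate_cpg_windows(toy))
print(f"{len(hits)} candidate windows, first start={hits[0]}, last start={hits[-1]}")
```

Overlapping hits would normally be merged into a single candidate region before any downstream annotation.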
Numerous studies have shown that genetic variations in promoters have functional effects on drug response. Two well-studied pharmacogene promoter variants are those of the genes encoding VKORC1 and UDP glucuronosyltransferase 1 family polypeptide A1 (UGT1A1), which have been linked to the anticoagulant response to warfarin [22] and to the diarrhea and neutropenia caused by irinotecan [23,24], respectively. VKORC1 is targeted by warfarin (Figure 2), and a common variant (rs9923231, global minor allele frequency 0.467) in its promoter region (-1639G>A) results in the formation of a novel E-box binding site, leading to lower mRNA expression of VKORC1 [2] and a lower effective dose of warfarin. This site is thought to recruit transcription factors that suppress gene expression by activating repressive histone modification complexes. This variant can explain much of the variability in average dose requirements among Caucasians, and is incorporated in the warfarin-dosing algorithm to improve warfarin treatment outcome [25][26][27]. The active form of irinotecan, SN-38, is metabolized through glucuronidation by the UGT1A1 enzyme [28], whose gene has five to eight copies of 'TA' in its promoter. There is a negative correlation between the number of TA repeats and UGT1A1 expression levels, and the presence of seven repeats (denoted as UGT1A1*28) was shown to have a significant association with higher-grade neutropenia and diarrhea for patients treated with irinotecan [29][30][31].

Table 2 (technique: description [references])

DNA methylation
MethylC-Seq: Unmethylated cytosines are converted into uracil by bisulfite treatment, and the converted DNA is sequenced. Methylation is detected with single-base resolution by comparing the sequence of the converted DNA with that of the unconverted DNA, assuming efficient cytosine conversion [7,8]
RRBS: Restriction digested DNA is size-selected, treated with bisulfite for cytosine conversion, and sequenced. Methylation is detected as described for MethylC-Seq. The reduced representational approach lessens the complexity of analysis and covers a reproducible fraction of the genomic space, but regions with a lack of restriction sites might be missed [80,81]
MeDIP: Methylated DNA elements are enriched with an antibody that binds to 5-methylcytosine, and then sequenced [82]
MBD-Seq: Methylated DNA elements are enriched with recombinant methyl CpG binding domain of MBD2, and then sequenced [83]
CAP-Seq: The chromatography-based method uses the CXXC domain, which has a high affinity for unmethylated CpG sites, to enrich for DNA elements with unmethylated CpG sites [84]
MRE-Seq: A methylation-sensitive restriction enzyme is used, leaving unmethylated CpG sites available for sequencing [85]

Identification of regulatory elements
ChIP-Seq: DNA is crosslinked and isolated via immunoprecipitation, typically using an antibody for a transcription factor, regulatory co-factor or chromatin mark. Subsequent sequencing allows identification of the binding sites of the DNA-associated protein or histone mark [9,10,86-90]
Hi-C: DNA elements in close spatial proximity are crosslinked, ligated and sequenced. It allows for a genome-wide, high-resolution analysis of interacting DNA elements [127]
ChIA-PET: DNA elements interacting with a protein of interest are enriched via chromatin immunoprecipitation, crosslinked and ligated so that long-range interactions can be identified by sequencing [128]

CAP, CXXC affinity purification; ChIA-PET, chromatin interaction analysis by paired-end tag sequencing; ChIP, chromatin immunoprecipitation; DNase, deoxyribonuclease I; FAIRE, formaldehyde-assisted isolation of regulatory elements; Hi-C, high-throughput chromosome capture; MBD, methylated DNA binding domain; MeDIP, methylated DNA immunoprecipitation; MRE, methylation-sensitive restriction enzyme; RRBS, reduced representation bisulfite sequencing.
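The bisulfite logic described for MethylC-Seq in Table 2 can be illustrated with a minimal sketch. It assumes a read that has already been aligned, without gaps, to the same strand of the reference, and it assumes complete cytosine conversion; the function name and the toy sequences are hypothetical, and real pipelines aggregate many reads per site rather than calling from one.

```python
def call_methylation(reference: str, bisulfite_read: str):
    """Classify each reference cytosine from one aligned, same-strand bisulfite read.

    After bisulfite treatment and amplification, an unmethylated C is read as T,
    whereas a methylated C is still read as C.
    Returns a list of (position, status) tuples.
    """
    if len(reference) != len(bisulfite_read):
        raise ValueError("reference and read must be aligned to the same length")
    calls = []
    pairs = zip(reference.upper(), bisulfite_read.upper())
    for pos, (ref_base, read_base) in enumerate(pairs):
        if ref_base != "C":
            continue  # only cytosines are informative
        if read_base == "C":
            calls.append((pos, "methylated"))
        elif read_base == "T":
            calls.append((pos, "unmethylated"))
        else:
            calls.append((pos, "ambiguous"))  # sequencing error or variant
    return calls


# Hypothetical toy alignment: the cytosine at position 4 was converted.
reference = "ACGTCCGA"
read = "ACGTTCGA"
calls = call_methylation(reference, read)
print(calls)
```

In practice, the per-site methylation level would be estimated as the fraction of reads retaining a C at that position.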
Several large-scale pharmacogene promoter sequencing studies have been conducted, and illustrate the potential advantages of using NGS technologies in the future. In one such example, promoters of 107 different ATP-binding cassette (ABC) transporters and SLC drug-associated transporters were resequenced in an ethnically diverse cohort of 272 individuals, identifying several variants that affect expression levels [32]. Another study systematically identified non-coding expression quantitative trait loci (eQTLs) affecting expression of liver cytochrome P450 superfamily (CYP) enzymes [33], which play key roles in drug metabolism and toxicity. Studies such as these provide valuable functional annotations that can be mined in future pharmacogenomic association studies and whole-genome datasets.

Enhancers
Enhancers interact with promoters, instructing the promoters when, where and at what level to express their target gene. They can regulate in cis, meaning that they regulate a nearby gene on the same chromosome, or in trans, regulating genes on a different chromosome [34]. cis enhancers can be located 5' or 3' distal to the regulated gene, in introns or even within a coding exon of their target gene [35,36]. The Sonic Hedgehog (SHH) limb enhancer is approximately 1,000,000 bp away from its TSS, highlighting the difficulty in linking such elements with a target gene [37]. Enhancers are thought to direct tissue-specific expression in a modular fashion, and therefore a gene that is active in many tissues is likely to be influenced by multiple enhancers [38,39]. While genetic variation within enhancers can have direct consequences in human disease states [15,40-42], information regarding their role in interindividual drug response is scarce.
A useful example of pharmacogene enhancers from the literature is that of liver CYPs, which metabolize the vast majority of pharmaceutical compounds. Of these, CYP3A4 is the most abundantly expressed in sites of drug disposition in the liver [43] and is also thought to single-handedly catalyze the metabolism of >50% of prescribed pharmaceutical agents. CYP3A4 activity can vary 5- to 20-fold between individuals (depending on the substrate) [44] and its protein expression level can vary up to 40-fold [45]. Of the 28 common SNPs in the CYP3A4 locus, none has been linked to variability in its expression [46], suggesting regulatory variation. Two regions 7.7 kb and 10.5 kb upstream of the gene encoding CYP3A4 were shown to drive its expression in transgenic mouse studies [47,48]. A trinucleotide insertion within this region is present in about 3.1% of the French population, and leads to reduced induction of CYP3A4 expression in cell culture models [48]. Although this insertion is relatively rare in other populations and its effects on adverse drug reactions are unclear, this study provides evidence that enhancer variants can lead to interindividual differences in drug response. Distal enhancers have also been discovered for genes encoding other CYP family members, including CYP1A1 [49], CYP1B1 [49], CYP2E1 [50] and CYP2B6 [51], as well as genes encoding other liver enzymes, such as alcohol dehydrogenase 4 (ADH4) [52] and UGT1A1 [53]. The phenotypic effects of variants within these regions are mostly unknown, mainly due to the difficulty in carrying out physiologically relevant studies.
Our laboratory recently used comparative genomics to identify evolutionarily conserved regions (ECRs) in proximity to nine liver membrane transporters, which were screened in vivo using the hydrodynamic tail vein assay [3]. In this technique, a large volume of plasmid-containing delivery solution is rapidly injected into the adult mouse tail vein, causing specific expression of a reporter such as luciferase in the liver. Five ECRs in the vicinity of the genes encoding ABCB11, SLC10A1, SLCO1B1, SLCO1A2 and SLC47A1 were identified as enhancers using this approach. Common human genetic variants within these regions were further functionally characterized, one of which was associated with reduced mRNA expression of SLCO1A2 in human liver tissue samples. Another variant was associated with increased clearance of MTX, a chemotherapeutic substrate of SLCO1A2 that is used to treat several malignancies, as well as psoriasis and rheumatoid arthritis. As NGS techniques become more widely adopted, we will be able to rapidly identify distal regulatory elements and key variants within them.

MicroRNAs
miRNAs are small (18 to 25 nucleotide) noncoding RNAs that regulate gene expression by binding to complementary 3' UTRs of target genes. They are endogenously transcribed as precursors and processed [54][55][56] into mature forms. Mature miRNAs harbor a 'seed' region spanning nucleotides 2 to 8 at the 5' end of the miRNA that is crucial for binding to target mRNA. Upon binding, the miRNA initiates translational repression or cleavage of its target mRNA [57,58]. SNPs within the seed region of the miRNA or the binding site on the target mRNA (miRSNPs) affect targeting of the miRNA and can lead to interindividual expression differences. Rarer variants can occur in genes involved in miRNA biogenesis and maturation, leading to more severe, syndromic phenotypes [59][60][61][62]. Compared with enhancers, miRNAs are relatively easy to identify using computational tools [63,64] and extensive databases of known and predicted miRNAs are publicly available [65]. Despite abundant evidence that miRNAs participate in almost all aspects of cell biology [66][67][68], there are only a handful of examples of their role in interindividual drug response. Overexpression of miR-27b, which binds the 3' UTRs of CYP1B1 [69] and CYP3A4 [70], leads to CYP3A4 downregulation and increased sensitivity to cyclophosphamide [70]. miR-27a and miR-451 activate expression of P-glycoprotein, the ABCB1 gene product that renders cancer cells resistant to chemotherapeutics. Treatment with miR-27a and miR-451 antagomirs, small synthetic RNAs complementary to their target miRNAs, results in increased accumulation of doxorubicin in drug-resistant cells [71]. It has been reported recently that miR-200c is downregulated in patients resistant to breast cancer therapy, as well as in human breast cancer cell lines resistant to doxorubicin [72], suggesting additional mechanisms in this pathway.

[Figure 2: panels contrast the reference allele response with the regulatory variant response for coagulation (requires normal versus lower warfarin dose) and for cisplatin resistance.]
An example of a mutation affecting pharmacogene miRNA targeting is the C829T variant near the miR-24 binding site in the 3' UTR of the gene encoding human dihydrofolate reductase (DHFR). DHFR is a key metabolic enzyme important in DNA synthesis. MTX binds DHFR with high affinity, inhibiting its activity in malignant cells. The C829T variant interferes with miR-24 targeting, resulting in DHFR overexpression and MTX resistance [73]. Another example is rs1045385 A>C in the miR-200b/200c/429 binding site in the 3' UTR of the gene encoding transcription factor AP-2α (TFAP2A) (Figure 2). AP-2α acts as a tumor suppressor by regulating key genes involved in cell proliferation and apoptosis, and can be induced by the chemotherapeutic agent cisplatin. Cancerous cells from certain endometrial and esophageal tumors can become resistant to cisplatin by upregulating miR-200b, miR-200c and miR-429. These molecules bind to the 3' UTR of AP-2α mRNA, repressing protein translation. The AP-2α rs1045385 C SNP interferes with miR-200b/200c/429 targeting of AP-2α, thus preventing its downregulation and resulting in effective cisplatin treatment [4].
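The way a miRSNP can abolish targeting is easy to illustrate with a toy seed-match search. The sketch below takes the seed as nucleotides 2 to 8 of the mature miRNA (a common convention, assumed here) and requires a perfect match; the miRNA and 3' UTR sequences are invented for illustration and do not correspond to miR-24, miR-200 or any real transcript. Real target prediction tools weigh many additional features.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def seed_site(mirna: str) -> str:
    """Return the mRNA sequence (5'->3') that pairs perfectly with miRNA nucleotides 2-8."""
    seed = mirna.upper().replace("T", "U")[1:8]
    return "".join(COMPLEMENT[base] for base in reversed(seed))


def find_seed_sites(utr: str, mirna: str):
    """Return 0-based start positions of perfect seed matches in a 3' UTR."""
    site = seed_site(mirna)
    utr = utr.upper().replace("T", "U")
    return [i for i in range(len(utr) - len(site) + 1) if utr[i:i + len(site)] == site]


# Hypothetical sequences, for illustration only.
mirna = "UAACGGUCAUGCAUGCAUGCAU"
utr_ref = "AAAUUU" + seed_site(mirna) + "CCCAAA"
# Introduce a single-nucleotide change (C>U) inside the seed-match site.
utr_alt = utr_ref[:8] + "U" + utr_ref[9:]

ref_sites = find_seed_sites(utr_ref, mirna)
alt_sites = find_seed_sites(utr_alt, mirna)
print("reference allele sites:", ref_sites)
print("variant allele sites:", alt_sites)
```

In this toy case the reference allele carries one seed-match site and the variant allele carries none, mirroring how a single base change in a 3' UTR can release a transcript from miRNA repression.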
Wang et al. [74] recently performed a pair-wise correlation coefficient analysis on expression levels of 366 miRNAs and 14,174 mRNAs in 90 immortalized lymphoblastoid cell lines. They identified 7,207 significantly correlated miRNA-mRNA pairs, with a good representation of metabolic enzymes (for example, CYP family) and drug transporters (for example, ABC and SLC family). Datasets such as these provide excellent troves of candidate regulatory elements for functional validation. The use of NGS methods such as RNA-Seq will greatly aid in the generation of such datasets and will help shed light on the role of miRNAs in regulating drug responses.
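A pair-wise correlation screen of this kind can be sketched in a few lines. The example below uses the Pearson coefficient and a fixed cut-off purely as assumptions (the coefficient and significance criteria used in the cited study are not specified here), and the gene names and expression values are hypothetical. A real analysis over millions of pairs would also need multiple-testing correction.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant profile carries no correlation signal
    return sxy / sqrt(sxx * syy)


def correlated_pairs(mirna_expr, mrna_expr, threshold=0.8):
    """Return (miRNA, mRNA, r) for every pair with |r| >= threshold (arbitrary cut-off)."""
    pairs = []
    for mirna, mirna_values in mirna_expr.items():
        for gene, gene_values in mrna_expr.items():
            r = pearson(mirna_values, gene_values)
            if abs(r) >= threshold:
                pairs.append((mirna, gene, round(r, 3)))
    return pairs


# Hypothetical expression values across five cell lines.
mirna_expr = {"miR-A": [1, 2, 3, 4, 5]}
mrna_expr = {"GENE1": [10, 8, 6, 4, 2], "GENE2": [3, 1, 4, 1, 5]}
pairs = correlated_pairs(mirna_expr, mrna_expr)
print(pairs)
```

Negative correlations, such as the miR-A and GENE1 pair in this toy data, are the ones most readily interpreted as candidate miRNA-mediated repression.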

NGS approaches for investigating pharmacogene regulatory elements
Although gene promoters are easily identified by their location, their activity in different cell types and disease states can be altered by a myriad of intrinsic regulatory factors and genetic variants. Epigenetic factors such as DNA methylation can also affect promoter activity, resulting in differential response to drug treatment [75][76][77][78][79]. For example, methylation of the promoter of the gene encoding O-6-methylguanine-DNA methyltransferase (MGMT) is a good predictor of the efficacy of temozolomide in the treatment of glioblastoma patients [78,79]. NGS technologies that can analyze DNA methylation on a genomic scale (Table 2), such as MethylC-Seq [7,8], reduced representation bisulfite sequencing (RRBS) [80,81], methylated DNA immunoprecipitation sequencing (MeDIP-Seq) [82], methylated DNA binding domain sequencing (MBD-Seq) [83], CXXC affinity purification plus deep sequencing (CAP-Seq) [84] and methylation-sensitive restriction enzyme sequencing (MRE-Seq) [85] will facilitate future systematic epigenetic studies of pharmacogene regulation.
Regulatory proteins influence gene expression by interacting with specific DNA sequences. Determining which proteins bind to which sites in the genome is the first step to understanding regulatory mechanisms. Chromatin immunoprecipitation (ChIP) approaches have been widely used for this purpose. More recently, coupling ChIP with NGS (ChIP-Seq; Figure 3) [86] has become the de facto standard as it provides an unbiased, genome-wide look at enhancer binding with a high signal-to-noise ratio [87]. In addition to antibodies against specific regulatory proteins, specific, high-quality antibodies against histone modification marks have been used to characterize chromatin regulatory states [10,88,89]. The nucleosome core consists of histone proteins, which can be modified post-translationally (for example, by methylation, acetylation, phosphorylation, ubiquitination and sumoylation). These modifications determine the regulatory state of the genomic region they are in (active, silent, and so on) and can be used to detect various gene regulatory elements such as promoters or enhancers. For example, developmentally active enhancers can be identified by the acetylation of the 27th lysine of histone H3 (H3K27ac) [90].
Large-scale, multi-center efforts have mapped the binding of dozens of regulatory proteins in a variety of human cell lines [91]. These include the treatment of cells such as the human hepatocyte cell line HepG2 with factors such as forskolin, insulin and pravastatin. Transcription factors such as signal transducer and activator of transcription 1 (STAT1) or 3 (STAT3) [92,93], co-activators such as the CREB binding protein (CREBBP/CBP) and E1A binding protein p300 (EP300/p300) [94,95] that colocalize with enhancers, p160 co-regulators [95] and nuclear receptor proteins such as farnesoid X receptor (FXR/NR1H4) and pregnane X receptor (PXR/NR1I2) [96,97] have been successfully used in ChIP-Seq assays. However, only one study to date has used ChIP-Seq to interrogate a pharmacogenomically relevant drug response in vivo. Cui et al. [98] mapped PXR binding in mice before and after treatment with pregnenolone-16α-carbonitrile (PCN). PCN is analogous to rifampin, an antibiotic with hepatotoxic side effects, and is thought to activate many of the same targets. In addition to identifying many novel PXR-bound loci, the authors identified a new DNA motif recognized by the factor. Results such as these are invaluable in elucidating complex drug response mechanisms. Furthermore, regulatory regions that are enriched using ChIP-Seq approaches only after the addition of a drug make attractive candidates for interindividual drug response variant discovery.
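Selecting regions that are bound only after drug treatment amounts to an interval comparison between two peak sets. The following minimal sketch assumes that peaks have already been called and are represented as hypothetical (chromosome, start, end) tuples with half-open coordinates; the function names and coordinates are illustrative and are not taken from the study by Cui et al.

```python
def overlaps(a, b):
    """True if two (chrom, start, end) half-open intervals share at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def drug_induced_peaks(treated, control):
    """Return treated-sample peaks that overlap no peak in the untreated control."""
    return [peak for peak in treated if not any(overlaps(peak, c) for c in control)]


# Hypothetical peak calls before (control) and after (treated) drug exposure.
control = [("chr1", 100, 300), ("chr2", 500, 700)]
treated = [("chr1", 150, 350), ("chr1", 1000, 1200), ("chr2", 700, 900)]

induced = drug_induced_peaks(treated, control)
print(induced)
```

The quadratic scan is adequate for a toy example; genome-scale peak sets would normally be sorted or indexed first. A presence/absence comparison is also a simplification, since quantitative changes in binding at shared sites may be equally informative.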
Over the past two decades, microarray-based methods have substantially improved our ability to quantify gene transcription through gains in throughput. RNA-Seq (Figure 3) has the potential to push the boundaries of our knowledge further by offering an unbiased approach that requires no prior knowledge of transcript variants, and offers single base pair resolution and high dynamic range [11][12][13]. RNA-Seq is currently the only method that can rapidly detect novel splice isoforms [99][100][101] and mRNA sequence variants (for example, RNA editing) on a genome-wide scale. Current commercially available RNA-Seq sample preparation kits require as little as 10 pg of total RNA [102], allowing the possibility of strand-specific sequencing of mRNA species from single cells.
A primary use for RNA-Seq in pharmacogenomic studies is the determination of gene expression profiles that can be correlated with drug response phenotypes. Different proteins are responsible for pharmacokinetic interactions with the drug (how the drug enters the cells and reaches its target) and pharmacodynamic interactions (how the drug exerts its cellular effects), and it is therefore rarely sufficient to focus on any one particular gene. For example, breast cancer cell lines demonstrate differential responses to drugs based on their gene expression profiles [103]. In addition, expression levels of several genes were shown to be statistically associated with response to various common chemotherapy agents such as etoposide [104], cisplatin [105] and carboplatin [106]. The Pharmacogenomics Knowledge Base [107,108] project curates pharmacogenomic data from a wide variety of basic and clinical reports, using them to construct drug pathways. The complexity of these pathways, which routinely involve a dozen or more genes and multiple deleterious variants, highlights the need for genome-wide profiling approaches.
For miRNA and miRSNP discovery and profiling, RNA-Seq offers unprecedented scale and depth. For example, Lee et al. [109] conducted a comprehensive survey of miRNA sequence variations from human and mouse samples using RNA-Seq. This study demonstrated the complexity of the human miRNA spectrum through deep sequencing of isomiRs (miRNA sequence variants generated from the same precursor by different processing mechanisms). So far, only a few studies have used NGS methodologies to identify miRNAs for diagnostic or prognostic applications. Most of this research was carried out on tumor development and progression [110][111][112][113][114], with sporadic reports of miRNA profiling in noncancer diseases (for example, endometriosis, preeclampsia) [115,116].
The systematic identification and characterization of other classes of regulatory elements will improve our knowledge of how regulatory nucleotide variants affect drug response. Silencers (also termed repressors), which can be thought of as the opposite of enhancers, turn off gene expression at specific time points and locations [14,15,117,118]. Insulators create cis-regulatory boundaries that prevent the transcriptional activity of one gene from affecting neighboring genes [14,15,119,120]. Variants in these elements almost certainly influence interindividual drug responses and remain to be identified.

Future directions for pharmacogene regulatory element discovery
Several emerging NGS techniques have not yet been directly applied to pharmacogenomics, but promise to greatly improve our ability to interpret regulatory variants. Accessible DNA elements residing in active chromatin often harbor regulatory sequences such as promoters, enhancers, silencers and insulators. Deoxyribonuclease I (DNase I) hypersensitive sites (HS) cluster around TSSs, but a significant portion also maps to regions distal to known TSSs [121]. These sites can be mapped genome-wide by DNase-Seq (Figure 3), requiring no prior knowledge of specific transcription factors. A related approach, known as FAIRE [122], can identify open chromatin by phase separation (Figure 3). There is a high level of correlation between FAIRE and DNase HS sites in general, but unique sites are discovered by each because of slight differences between the techniques [123]. Both DNase-Seq and FAIRE-Seq will be useful in broadly identifying regulatory elements in pharmacologically relevant tissues and will help prioritize SNP discovery efforts.
Enhancers are thought to interact with promoters through chromatin looping (Figure 1) [124]. These looping interactions can be identified through techniques such as chromatin conformation capture (3C) and several of its derivatives (4C [125], 5C [126]). With the advent of NGS, whole-genome adaptations of this technique have been introduced, such as Hi-C [127] and ChIA-PET [128] (Figure 3). A great advantage of these techniques over ChIP-Seq, DNase-Seq and FAIRE is that they can link regulatory elements with their target genes. They could therefore be employed to systematically link variants with individual expression profiles, much like eQTLs but with the power to identify long-distance and trans interactions.
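How interaction data could link a noncoding variant to a candidate target gene can be sketched as a lookup over contact pairs. The example below is a simplified illustration under stated assumptions: contacts are given as hypothetical pairs of (chromosome, start, end) anchors, promoters as a hypothetical gene-to-interval mapping, and all names and coordinates are invented. It does not model contact frequency or significance, which real Hi-C and ChIA-PET analyses depend on.

```python
def contains(region, position):
    """True if a (chrom, pos) position lies inside a (chrom, start, end) half-open region."""
    chrom, start, end = region
    return chrom == position[0] and start <= position[1] < end


def regions_overlap(a, b):
    """True if two (chrom, start, end) half-open regions overlap."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def genes_contacting_variant(variant, contacts, promoters):
    """Return genes whose promoter overlaps the partner anchor of any contact
    in which the other anchor contains the variant."""
    linked = set()
    for anchor_a, anchor_b in contacts:
        # Contacts are undirected, so test the variant against both anchors.
        for anchor, partner in ((anchor_a, anchor_b), (anchor_b, anchor_a)):
            if contains(anchor, variant):
                for gene, promoter in promoters.items():
                    if regions_overlap(promoter, partner):
                        linked.add(gene)
    return sorted(linked)


# Hypothetical variant, contact anchors and promoter intervals.
variant = ("chr7", 1_050_000)
contacts = [
    (("chr7", 1_000_000, 1_100_000), ("chr7", 2_000_000, 2_100_000)),
    (("chr7", 3_000_000, 3_100_000), ("chr7", 4_000_000, 4_100_000)),
]
promoters = {
    "GENE_A": ("chr7", 2_050_000, 2_051_000),
    "GENE_B": ("chr7", 4_050_000, 4_051_000),
}

linked_genes = genes_contacting_variant(variant, contacts, promoters)
print(linked_genes)
```

Because anchors may sit on different chromosomes, the same lookup would also cover trans interactions.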
NGS technologies are constantly being improved, allowing higher throughput and the ability to ask biological questions on a genomic scale. 'Third-generation', single molecule sequencing platforms are forthcoming and are reviewed in detail elsewhere [129]. Besides higher throughput and longer read lengths, they offer the advantage of eliminating the amplification step, minimizing non-linear biases and thus alleviating some of the informatics and statistical challenges associated with analyses of RNA-Seq and ChIP-Seq data [130][131][132]. A significant limitation of current ChIP-Seq protocols is the low resolution (about 200 to 300 bp) with which TFBSs within regulatory elements can be identified. ChIP-exo partially eliminates this problem, using lambda exonuclease to facilitate strand-specific 5'-3' degradation, removing DNA not directly involved in the protein-DNA interaction [133]. This modification to the ChIP-Seq protocol significantly increases the signal-to-noise ratio, enabling much more precise peak-calling.
A major obstacle to overcome is our inability to functionally characterize candidate regulatory elements and variants with high throughput. Techniques such as ChIP-Seq, DNase-Seq, FAIRE, Hi-C and ChIA-PET are descriptive in nature. The development of techniques that will allow the functional characterization of thousands of these sequences in a high-throughput manner is critical. One technique that can potentially overcome this hurdle is the use of transcribed barcodes in massively parallel reporter assays, the abundance of which can be measured by RNA-Seq. Using this methodology, thousands of promoter variants were tested in a single experiment [134], and key bases that negatively impacted expression were identified. This methodology has been recently followed up with enhancers in vitro using human cell lines [135] and in vivo using the mouse hydrodynamic tail vein assay [136]. Further development of such approaches will permit the high-throughput functional characterization of regulatory elements and nucleotide variants within them.
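The barcode-counting step of a massively parallel reporter assay can be illustrated with a short sketch. It assumes, as is common but not stated in the text, that RNA barcode counts are normalized to the counts of the same barcodes in the input plasmid DNA, and that activity is summarized as a log2 ratio with a pseudocount; the barcodes, counts and function name are all hypothetical.

```python
from collections import defaultdict
from math import log2


def mpra_activity(barcode_to_variant, rna_counts, dna_counts, pseudocount=1.0):
    """Summarize reporter activity per tested sequence as log2(RNA / DNA).

    Counts from all barcodes tagging the same sequence are pooled before the
    ratio is taken; the pseudocount avoids division by zero.
    """
    rna_sum = defaultdict(float)
    dna_sum = defaultdict(float)
    for barcode, variant in barcode_to_variant.items():
        rna_sum[variant] += rna_counts.get(barcode, 0)
        dna_sum[variant] += dna_counts.get(barcode, 0)
    return {
        variant: log2((rna_sum[variant] + pseudocount) / (dna_sum[variant] + pseudocount))
        for variant in dna_sum
    }


# Hypothetical design: two barcodes per allele of one regulatory element.
barcode_to_variant = {"AAAC": "ref", "AAAG": "ref", "CCCA": "alt", "CCCT": "alt"}
rna_counts = {"AAAC": 30, "AAAG": 33, "CCCA": 7, "CCCT": 8}
dna_counts = {"AAAC": 15, "AAAG": 16, "CCCA": 15, "CCCT": 16}

activity = mpra_activity(barcode_to_variant, rna_counts, dna_counts)
print(activity)
```

Comparing the reference and alternative allele values for the same element then gives a direct estimate of the effect of the variant on reporter expression.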

Translational implications of pharmacogene regulation
As we learn more about how specific variants in regulatory sequences contribute to differential drug responses, it will become more commonplace to personalize drug dosing. Warfarin has become a poster child for pharmacogenomics due to the frequency with which it is prescribed and the importance of genetic testing for proper dosing. Warfarin has a very narrow therapeutic index: there is a small difference between clinically beneficial and toxic doses and a large variation in the maintenance dose. Several reports have confirmed VKORC1 as the target of warfarin and CYP2C9 as the principal enzyme responsible for its metabolism [137][138][139]. Together with non-genetic information, such as age, weight and drug interactions, variants affecting the expression of these genes can explain approximately 50% of the variability of warfarin maintenance dose [26]. A prospective study demonstrated the therapeutic benefits of genotyping known CYP2C9 and VKORC1 variant alleles in ensuring an optimal dosage of warfarin [137]. The clinical implications of the VKORC1 -1639G>A regulatory variant and a coding variant of CYP2C9 prompted the Food and Drug Administration (FDA) to add this information to warfarin labeling [140].
Another example of a regulatory variant that has been translated to the clinic is the UGT1A1*28 promoter allele, which alters an individual's response to the anticancer drug irinotecan. Information about the UGT1A1 variant and a summary of the clinically significant adverse reactions, related to severe neutropenia and diarrhea, have been added to the 'Warnings' section of the FDA-approved drug label [140]. If a patient is known to be homozygous for the UGT1A1*28 allele, clinicians are instructed to reduce the prescribed dose of irinotecan by one level. Patients who are taking irinotecan are often monitored for adverse reactions to allow early relief of side effects. Genotyping tests for pharmacogene variants are becoming more widely available, along with guidelines to help clinicians with dosing and dosing adjustment [26,141-143].
Despite the fact that we have discovered many functional pharmacogene variants, the uptake of pharmacogenomic testing in the clinic has been slow. There is mounting evidence that pharmacogenomics data can play an important role in identifying responders and non-responders to drugs, avoiding side effects and allowing optimized dosing for patients. However, the link between biologically significant correlations and the therapeutic impact of adopting new clinical practices is unclear. It is vital that we develop a useful framework to sift through and prioritize functional variants for clinical study. At the same time, it will also be necessary to promote training and education among health professionals about the value of pharmacogenomic testing before new policies can be widely adopted.

Concluding remarks
Over the past few years, NGS technologies have greatly accelerated the identification of regulatory elements. However, their use has mainly been limited to genome annotation of physiologically normal cells and tissues. With time, their use in interpreting pharmacogenomic drug-gene interactions will grow rapidly. With each individual genome having millions of nucleotide variants, and reference epigenomic datasets soon to be widely available, there will be a vital need for ways to limit the search space for biologically relevant variants. The use of these technologies in a drug-specific manner, such as the study carried out by Cui et al. [98] to map PCN-induced PXR binding sites in mice, will provide the opportunity to highlight drug response-associated regions in these whole-genome datasets.
A major challenge will be to bring these experimental results to a clinical setting. Strong collaboration will be needed between scientists, software engineers, clinicians and pharmacists in order to generate tools to visualize and interpret genomic variations. Several ethical issues will also need to be addressed, such as the privacy and confidentiality of these genomic data, how they will be stored and who can access them, keeping in mind that this information will be extremely important throughout the entire prescription process. In addition, the development of rapid high-throughput assays to functionally characterize variants in pharmacogene-associated regulatory elements is still needed. Techniques that use transcribed barcodes alongside NGS technologies, as mentioned previously [134][135][136], hold great promise. However, these techniques need to allow the rapid functional assessment of the uncharacterized nucleotide variants of each individual, so that the functional consequences of these variants can be analyzed before a drug is prescribed. The existence of such technologies could also be extremely useful in cancer treatment by allowing assessment of how de novo mutations alter the efficacy of a drug treatment. The ultimate goal of these studies would be to provide information to a physician or pharmacist regarding an individual's genomic sequence and any drug-associated gene or regulatory element variants so that the most efficient and least harmful drug for each patient can be prescribed.

Competing interests
The authors declare that they have no competing interests.