Mobile elements in the human genome: implications for disease

Perhaps as much as two-thirds of the mammalian genome is composed of mobile genetic elements ('jumping genes'), a fraction of which is still active or can be reactivated. By their sheer number and mobility, retrotransposons, DNA transposons and endogenous retroviruses have shaped our genotype and phenotype both on an evolutionary scale and on an individual level. Notably, at least the non-long terminal repeat retrotransposons are still able to cause disease by insertional mutagenesis, recombination, providing enzymatic activities for other mobile DNA, and perhaps by transcriptional overactivation and epigenetic effects. Currently, there are nearly 100 examples of known retroelement insertions that cause disease. In this review, we highlight those genome-scale technologies that have expanded our knowledge of the diseases that these mobile elements can elicit, and we discuss the potential impact of these findings for medicine. It is now likely that at least some types of cancer and neurological disorders arise as a result of retrotransposon mutagenesis.

pseudogenes, which are cellular mRNAs that are reverse transcribed and inserted into the genome [14].
DNA transposons are considered to be immobile in the human genome. Accordingly, no human disease is known to arise as a result of their activity. Also, HERVs are thought to be retrotransposition defective in humans, but some may have retained their ability to move. For example, HERV-K113 has intact ORFs and has insertional polymorphisms in the human population, implying recent evolutionary activity [15]. Although no insertional mutagenesis by HERVs has been described, oncogenic ETV1 (ets variant 1)-HERV-K fusions generated by chromosomal translocation have been observed in prostate cancer [16,17], and HERV expression has been suggested as a potential contributor to autoimmune diseases [18]. Furthermore, syncytin, an endogenous retroviral envelope protein playing a role in placental trophoblast cell fusion, is involved in breast cancer-endothelial cell and endometrial carcinoma cell fusions [19,20]. Also, an LTR of a MaLR human endogenous retrovirus has been shown to aberrantly activate a proto-oncogene, thereby causing lymphoma [21].
The predominant mechanism by which L1s cause disease is insertional mutagenesis into or near genes [22,23]. L1 insertions are often accompanied by 3' transduction, the comobilization of DNA sequences downstream of an L1 as a consequence of transcriptional read-through resulting from a weak L1 poly(A) signal [24][25][26][27]. Alu sequences predominantly cause disease by homologous recombination between two Alu sequences, but insertion into or near exons, and aberrant Alu splicing from introns, also frequently result in pathological conditions [28][29][30]. Furthermore, Alu RNA toxicity, a new disease-causing phenomenon, has been recently proposed to result in macular degeneration by DICER1 deficit [31]. Regarding the role of Alu elements in eye disorders, it is interesting that an Alu insertion polymorphism in the ACE (angiotensin I converting enzyme) gene has been associated with protection from the dry/atrophic form of age-related macular degeneration [32]. SVA elements also have the ability to interrupt genes through insertional mutagenesis that can be coupled with 3' transduction, genomic deletion or aberrant splicing [11][33][34][35]. The wide spectrum of disease cases caused by retrotransposons ranges from hemophilia to muscular dystrophy and cancer, and has been thoroughly reviewed [36][37][38]. There have been 96 known retrotransposon insertions in disease cases, of which 25 are caused by L1s, while the other 71 are also L1-mediated. Among the latter, 60 cases are attributable to Alus, 7 to SVAs, and 4 to truncated inserts with only poly(A) sequence remaining [38]. Overall, retrotransposon insertions account for about 1 in 250 (0.4%) of disease-causing mutations [29].
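The counts quoted above can be checked with a few lines of arithmetic. The following is only an illustrative tally of the figures given in the text [29,38]; the variable names are ours and carry no meaning beyond this sketch.

```python
# Disease-causing retrotransposon insertions as quoted in the text [38].
insertions = {"L1": 25, "Alu": 60, "SVA": 7, "poly(A)-only": 4}

total = sum(insertions.values())

# Insertions that are not L1 copies themselves but are still L1-mediated.
other_l1_mediated = total - insertions["L1"]

print(f"total insertions: {total}")
print(f"other L1-mediated insertions: {other_l1_mediated}")
for element, count in insertions.items():
    print(f"{element}: {count} ({count / total:.1%} of all insertions)")

# Roughly 1 in 250 disease-causing mutations is a retrotransposon insertion [29].
fraction = 1 / 250
print(f"share of disease-causing mutations: {fraction:.1%}")
```

The per-element percentages printed here are derived only from the 96 reported cases and should not be read as population-level frequencies.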
Processed pseudogenes have not yet been found to cause human disease by de novo insertional mutagenesis, but facioscapulohumeral dystrophy has been demonstrated to arise as a result of the contraction of macrosatellite repeats leading to aberrant expression of an array of DUX4 retrogenes residing within the repeats [39,40]. In addition, mutations in functional processed pseudogenes can cause disease. For instance, mutations in UTP14C have been associated with male infertility [41], mutations in TACSTD2/M1S1 result in gelatinous drop-like corneal dystrophy [42,43], and PTENP1 is selectively lost in human cancers [44,45]. The main characteristics of mobile elements capable of causing human disease are summarized in Figure 1.
In the next two sections, we discuss large-scale genome, transcriptome and methylation profiling studies of mobile elements in human diseases. We also discuss some non-genome-scale studies that support or contradict the implications of these novel findings.

Genome-scale approaches to identify new retrotransposon insertions
High-throughput sequencing has increased our capacity to generate large datasets at an unprecedented resolution. It is now possible to characterize genome sequences of scarce samples or even single cells. A next-generation sequencing technique with a high coverage of germline polymorphic human-specific L1 (L1Hs) retrotransposition events has been developed by Ewing and Kazazian, comprising hemispecific PCR coupled to Illumina sequencing [46]. Using this approach, it has been demonstrated that many L1Hs elements are population-specific [46,47], and recapitulate genetic ancestry in a manner similar to Alu insertion polymorphisms [48]. Retrotransposons are not only excellent markers for exploring population history, but can also give rise to population-specific diseases. For example, a homozygous Alu insertion in an exon of the MAK (male germ cell-associated kinase) gene has been identified in 21 patients of Jewish ancestry who were diagnosed with retinitis pigmentosa [49]. Paradoxically, in this study, which used Agilent exome capture and subsequent Illumina and ABI sequencing, it was the attempt to remove repetitive sequences from the analysis that led to the identification of the insertion [49]. Another population-specific disease caused by retrotransposon mutagenesis is Fukuyama-type congenital muscular dystrophy. It is one of the most common autosomal recessive disorders in Japan and was the first human disease found to result from ancestral insertion of an SVA element [35,50,51]. A similar example of an apparently ethnic-specific retrotransposon allele-mediated disease is an L1-mediated orphan 3' transduction into the dystrophin gene leading to Duchenne muscular dystrophy in a Japanese boy [52,53].
An unexpected finding of state-of-the-art, large-scale approaches to study retrotransposon insertions has been that highly active (or 'hot' [54]) L1s are much more abundant in humans than previously appreciated. The outcome of a fosmid-based paired-end DNA sequencing strategy, coupled with a cell culture assay for retrotransposition, was that over half of newly identified L1s are hot, expanding the number of known hot L1s from 6 to 43 [55]. These L1s are not only expected to be a major source of inter-individual genetic variation [55], but they also account for most examples of disease-causing insertions [54].
Another unusual finding of genome-scale approaches in retrotransposon biology has been that retrotransposition occurs at a very high frequency in somatic cells. Specifically, the brain was announced to be a bona fide territory for retrotransposition. Among three somatic tissues tested, this organ supported the highest level of endogenous L1 copy number, as assessed by quantitative PCR (qPCR) [56]. In another study that awaits further validation, over 7,700 L1, 13,600 Alu and 1,300 SVA putative somatic insertions were found in the hippocampus of three individuals using retrotransposon capture sequencing, which is based on transposon array capture followed by Illumina paired-end sequencing [57]. Surprisingly, in this study, L1 and Alu insertions were over-represented in protein-coding genes and targeted genes, such as HDAC1 (histone deacetylase 1) and RAI1 (retinoic acid induced 1), which are known to be mutated in neurological disorders [57]. These findings suggest that if a retrotransposon inserts into a gene that functions in neurological development or psychological functioning early in development, it might affect a large enough area of the brain to lead to disease. One might further speculate that retrotransposition in a single brain cell could have some physiological consequences or impact memory formation through altered extracellular signaling to neighboring neurons. If such neuronal plasticity exists, it could affect behavioral phenotypes, and could be modulated by environmental factors [58]. Conversely, knockdown of an L1-regulating cellular factor has demonstrated an effect on L1 retrotransposition in the neurodevelopmental disorder Rett syndrome [59]. MeCP2 (methyl CpG binding protein 2) has been shown to repress L1 expression and retrotransposition [60], and increased L1 retrotransposition has been observed in induced pluripotent stem cells of patients with Rett syndrome who carry MECP2 mutations [59].
Solyom and Kazazian Genome Medicine 2012, 4:12 http://genomemedicine.com/content/4/2/12
Ten brain tumors were examined for somatic L1 insertions by 454 pyrosequencing, but interestingly no retrotransposon insertions were discovered [61]. However, nine somatic L1 insertions were found in 6 out of 20 lung tumors with the same technique [61]. It was not determined, though, whether the normal tissues also contained some number of L1 insertions relative to the tumor tissue, and thus whether insertions in the cancer represented an elevated level of retrotransposon mobilization. Furthermore, it is not known if these insertions are transcribed or affect gene expression, and whether they were drivers or merely passengers of the tumorigenic process. The genome-wide methylation status of the lung tumors and adjacent normal tissue was also examined using an Illumina platform. All 6 patient DNA samples exhibiting tumor-specific L1 insertions were clustered together as hypomethylated, compared with 13 out of the remaining 14 samples that lacked somatic insertions. These data imply that a methylation signature distinguishes L1-permissive tumors from non-permissive tumors [61].
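To give a feel for how strongly such a split separates the two tumor groups, the counts can be placed in a 2 x 2 table and evaluated with a one-sided Fisher's exact test. The sketch below is only an illustration and is not an analysis reported in [61]. It assumes one particular reading of the sentence above, namely that all 6 insertion-positive tumors were hypomethylated and that 13 of the 14 insertion-negative tumors were not; the function name is hypothetical.

```python
from math import comb


def fisher_one_sided(a: int, b: int, c: int, d: int) -> float:
    """One-sided Fisher's exact p-value for enrichment in the top-left cell.

    Table layout:
                        hypomethylated   not hypomethylated
        insertions            a                  b
        no insertions         c                  d
    """
    row1 = a + b          # tumors with somatic insertions
    col1 = a + c          # hypomethylated tumors
    n = a + b + c + d     # all tumors
    denom = comb(n, row1)
    p = 0.0
    # Sum hypergeometric probabilities of tables at least as extreme as observed.
    for k in range(a, min(row1, col1) + 1):
        p += comb(col1, k) * comb(n - col1, row1 - k) / denom
    return p


# ASSUMED counts (our reading of the text, not taken from the original data):
# 6 insertion-positive tumors, all hypomethylated;
# 14 insertion-negative tumors, 1 hypomethylated and 13 not.
p_value = fisher_one_sided(6, 0, 1, 13)
print(f"one-sided Fisher's exact p = {p_value:.2e}")
```

Under a different reading of the 13-out-of-14 figure the table, and therefore the p-value, would change, so the number is useful only as an order-of-magnitude illustration.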
Another genome-scale method to genotype common retrotransposon insertion polymorphisms (RIPs) to identify genotype-phenotype associations uses array-based technology. Commonly, single nucleotide polymorphisms (SNPs) and copy-number variants have been used as markers in genome-wide association studies (GWASs) to map loci involved in human disease. RIPs are a valuable resource to investigate the role of these elements in phenotypic variation and disease. In general, a RIP is also much more likely to be the causal variant than a SNP, because a large insertion is more likely to be disruptive of gene function than a single nucleotide alteration, and retrotransposons have many features that can interfere with gene expression (reviewed by Goodier and Kazazian [62]). On the other hand, strong selection exists against retroelement insertions into coding regions, where they are under-represented compared with SNPs [63]. To date, only one array-based study has been conducted to detect retrotransposon insertions in human disease. Using transposon insertion profiling by microarray (TIP-chip), several novel L1 insertions on the X chromosome were discovered in male probands with presumptively X-linked intellectual disability [64]. Interestingly, one of the insertions occurred in the NHS gene, which is mutated in Nance-Horan syndrome, a condition associated with intellectual disability. Another promising insertion occurred in the DACH2 (dachshund homolog 2) gene that regulates neuronal differentiation [64]. However, confirmation studies are needed to demonstrate whether these insertions are the underlying cause of intellectual disability in these patients.
Except for the Baillie et al. study [57], which analyzed L1, Alu and SVA somatic insertions, the studies mentioned above concentrated on genome-wide detection of new L1 insertions. A notable study by Witherspoon et al. [65] developed a robust technique, termed mobile element scanning, to find new insertions of young Alu elements using PCR methods coupled with high-throughput sequencing. The group found approximately 500 de novo Alu insertions [65]. Their technique is applicable to all mobile elements, and is amenable to significant multiplexing of a number of DNA samples in one sequencing run.

Genome-wide methylation studies and transcriptome analysis of retrotransposons
It is speculated that one of the main roles of DNA methylation, in addition to epigenetic reprogramming, is to silence transposable elements [66]. Most methylation studies of human transposons have investigated malignancies and showed consistent hypomethylation (for example, [67,68]), the extent of which, however, was variable in different tissues [69]. As the malignant phenotype is inherently associated with global as well as tumor type-specific methylation changes [70], and transposable elements comprise the majority of the human genome, it is difficult to establish the role of transposon demethylation per se in tumorigenesis, especially without accompanying functional studies. It is possible that pathogenic cellular stress responses could result in local or global transposon deregulation, for example via demethylation or chromatin modification. Once out of control, such an epigenetic deregulation might result in single or multiple retrotransposition events.
Retrotransposons located 5' of protein-coding loci frequently function as alternative promoters. They might also express non-coding RNAs, and retrotransposons in the 3' UTR (untranslated region) of genes show strong evidence of reducing the expression of the respective gene, as assessed by cap analysis gene expression and pyrosequencing [71]. Thus, an altered retrotransposon methylation state is expected to affect either the transcription of the retrotransposon itself or that of nearby genes. Accordingly, it has been shown that hypomethylation of L1s can cause altered gene expression. Specifically, an L1 is located in the MET (hepatocyte growth factor receptor) oncogene, and hypomethylation of a promoter in this L1 induced an alternative MET transcript within the urothelium of tumor-bearing bladders. At the same time, in the bladder epithelium of cancer-free donors the methylation level of this L1 promoter was high and expression of the alternative MET transcript was low [72].
There are few studies that correlate human retrotransposon methylation with transcription level on a genome-wide scale. According to one study, expression of L1 5' and 3' UTR sequences in prostate cancer was decreased, despite significant hypomethylation of the L1 promoter. Different HERV-K families showed opposite trends in expression levels, and the expression of evolutionarily young Alu families was restricted to individual prostate tumors as assessed by RT-qPCR and pyrosequencing [73]. In agreement with that study, transcriptional activation of L1s was not observed in globally hypomethylated hepatocellular carcinoma compared with matched normal tissue, as assessed by RT-qPCR [74]. Of note, the quantification of some types of expressed retroelements by classical methods may prove ambiguous. Since Alu sequences are abundant in RNA polymerase II transcripts, quantification of the relatively rare RNA polymerase III-transcribed Alu transcripts by RT-qPCR is fraught with potential error [75]. Transcribed L1 sequences embedded in genes may similarly confound the results of such quantitative measurements of L1 expression.
Using a custom GeneChip microarray for transcriptome analysis of several HERV families, it was shown that numerous HERV-W loci were overexpressed in testicular cancer [76]. Interestingly, one of these was an ERVWE1 transcript whose expression is usually restricted to the placenta. Methylation was severely or completely diminished at HERV-W sequences in the tumor DNA, suggesting that DNA methylation and HERV-W expression are interrelated in this tumor context [76]. With a genome-wide technique termed selective differential display of RNAs containing interspersed repeats and with its modified version, termed L1 chimera display, it has also been demonstrated that the levels of many HERV-K LTR transcripts differ between normal and testicular germ cell tumor tissues [77], and that the L1 antisense promoter gives rise to novel chimeric transcripts that are unique to tumor samples [78]. Furthermore, the cancer-specific chimeric L1 transcripts could be induced in non-malignant cells by using the demethylating drug 5-azacytidine [78].
It will be interesting to learn if tumor-specific retrotransposon profiles reveal enhanced retroelement mobility. For example, L1 retrotransposition is associated with genetic instability [79], a hallmark of cancer [80]. Thus, local or global overactivation of L1s could have the potential to contribute to tumorigenesis. In particular, germ cell tumors are good candidates to examine cancer-specific retroelement activity, because the genome of germ cells goes through epigenetic reprogramming through methylation at CpG sites. Thus, deregulation of this process might easily lead to the derepression of transposable elements, and potentially to germ cell tumors. In support of this hypothesis, the L1 ORF1 protein was overexpressed in all 62 cases of investigated childhood malignant germ cell tumors relative to adjacent normal tissue and was associated with poor differentiation [81]. Testicular germ cell tumors should also be examined for L1-conferred hereditary disease, as no high-penetrance susceptibility genes have been identified in this condition. With pyrosequencing of bisulfite-treated DNA using L1-specific primers, transgenerational inheritance of L1 methylation was found to be associated with testicular cancer risk [82].
Thus, L1s are attractive candidates for both somatic drivers and hereditary predisposition factors in germ cell tumors and possibly in other cancer types. However, currently their functional impact in malignancy is poorly understood.

Concluding remarks and future directions
Genome-scale technologies now provide us with the opportunity to investigate retrotransposon biology in unprecedented detail. Ultimately, it will be important to test the functional consequences of these results, such as the effect of RIPs on gene function, and their role in cancer and neurological disorders. This outcome might be accomplished by classical functional studies, or by combining the results of several genome-scale experiments. For instance, if comprehensive RIP profiles were coupled with next-generation RNA sequencing data, it would allow testing of hypotheses pertaining to retrotransposons and their effects upon gene expression. Such platforms would also be useful to explore whether there is a role for common RIPs in common disease and if these RIPs convey the disease phenotype through expression. In a similar manner, one could incorporate chromatin/methyl-seq/RNA/ChIP-seq profiles for DNA-binding or RNA-binding proteins with the respective RIP profiles. It would also be advantageous to carry out studies to explore whether any overlap between a GWAS hit and a known RIP exists, as the RIP might indeed be the causal variant.
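As a rough illustration of the last suggestion, a first-pass screen could simply list the known RIPs that lie within a fixed distance of each GWAS hit. The sketch below makes several assumptions that are not part of the studies discussed here: the function name, the 50 kb window, and all identifiers and coordinates are hypothetical, and physical proximity is used only as a crude stand-in for linkage disequilibrium.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict


def rips_near_hits(gwas_hits, rips, window=50_000):
    """Map each GWAS hit to the RIPs within `window` bp on the same chromosome.

    Both inputs are iterables of (identifier, chromosome, position) tuples.
    """
    # Group RIPs by chromosome and sort them by position for binary search.
    grouped = defaultdict(list)
    for rip_id, chrom, pos in rips:
        grouped[chrom].append((pos, rip_id))
    index = {}
    for chrom, entries in grouped.items():
        entries.sort()
        index[chrom] = ([p for p, _ in entries], [r for _, r in entries])

    nearby = {}
    for hit_id, chrom, pos in gwas_hits:
        positions, ids = index.get(chrom, ([], []))
        lo = bisect_left(positions, pos - window)
        hi = bisect_right(positions, pos + window)
        nearby[hit_id] = ids[lo:hi]
    return nearby


# Hypothetical example data; none of these identifiers or coordinates are real.
gwas_hits = [("snp_A", "chr1", 1_000_000), ("snp_B", "chrX", 5_000_000)]
rips = [
    ("L1_rip_1", "chr1", 1_020_000),
    ("Alu_rip_2", "chr1", 2_000_000),
    ("SVA_rip_3", "chrX", 4_940_000),
]

result = rips_near_hits(gwas_hits, rips)
for hit, found in result.items():
    print(hit, found)
```

In a real analysis one would presumably replace the fixed window with measured linkage disequilibrium between each RIP and the associated SNP, since a nearby insertion is only a candidate causal variant if it is actually inherited together with the GWAS signal.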
As an alternative genome-scale approach to understand the impact of human transposons on disease, functional genetic screening strategies could be developed in cell culture. For instance, haploid cell lines [83,84] and BLM (Bloom syndrome, RecQ helicase-like)-deficient cells that can be converted to generate a genome-wide library of homozygous mutant cells [85][86][87] are available to be mutagenized and screened for any desirable phenotype, such as altered retrotransposition activity, using suitable read-out systems. One such system could be a retrotransposition assay, where an L1 reporter construct has been designed so that translation of the reporter (drug-resistance gene or enhanced green fluorescent protein) occurs only after L1 reverse transcription and insertion of its cDNA copy into the genome [88,89]. Also, genome-wide mutagenesis might be accomplished with mobile elements themselves, such as retroviruses or DNA transposons [85][86][87]. Similarly, large-scale small interfering RNA (siRNA) and cDNA functional genetics screening strategies could be designed to identify host cell factors modulating L1 activity. One should also investigate whether some host factors elicit a disease phenotype through deregulated retrotransposon activity. For example, the remarkable finding of the role of Alu RNA toxicity due to DICER1 deficiency in macular degeneration [31] needs to be replicated by alternative methods. Those methods should exclude the possibility that what is really being detected is amplification of the closely related 7SL RNA or Alu sequences contained in RNA polymerase II transcripts, which vastly outnumber the RNA polymerase III-transcribed Alu elements [75]. Also, functional genetic follow-up studies should circumvent, if at all feasible, non-specific toxicity arising as a result of ectopic Alu overexpression or antisense oligo-mediated downregulation of essential RNA polymerase II transcripts with embedded Alu sequences.
DICER1 may also have a role in tumorigenesis through retrotransposon overexpression, as germline mutations in this gene have been found in familial pleuropulmonary blastoma [90] and in familial multinodular goiter with ovarian Sertoli-Leydig cell tumors [91]. Sense and antisense transcripts derived from L1 promoters could be processed to siRNAs that might suppress retrotransposition by RNA interference [92], and DICER1 has been implicated in this process [93]. These data raise the possibility that genomic instability in some malignancies could arise, at least partly, from retrotransposon overdose as a consequence of a mutated small non-coding RNA pathway. This could lead eventually to retrotransposon RNA toxicity [31], genotoxic stress through DNA nicking by ORF2 [94], or elevated insertional mutagenesis [61].
For the future of personalized medicine it will be vital not to exclude the transposon profile of patients, as exemplified by the case of an Alu insertion in a retinitis pigmentosa proband [49]. Another aspect of personalized medicine is gene therapy. In one form of gene therapy, the use of antisense oligonucleotides that block aberrant splicing into an intronic SVA that causes Fukuyama muscular dystrophy has been suggested [35]. Another aspect of gene therapy is the use of DNA transposons that hold the promise of lower immunogenicity, enhanced safety profile and reduced manufacturing costs compared with viral vectors [95]. Two DNA transposons from non-mammalian species have emerged as gene therapy tools based on their efficient transposition in humans: the reconstructed Tc1/mariner element Sleeping Beauty from salmonid fish and piggyBac from the baculovirus genome [95,96]. The first ex vivo gene therapy clinical trial using Sleeping Beauty has been approved [97], and induced pluripotent stem cells are now being generated after targeted gene correction using piggyBac technology [98]. Once the potential side-effects of these therapies (such as secondary mutagenesis resulting from transposon hopping or activation of nearby genes) are overcome, the roles of mobile elements can be redefined from being just 'junk' or 'enemy' to 'life-guards' of our genomes.