Genetic architecture of retinal and macular degenerative diseases: the promise and challenges of next-generation sequencing

Inherited retinal degenerative diseases (RDDs) display wide variation in their mode of inheritance, underlying genetic defects, age of onset, and phenotypic severity. Molecular mechanisms have not been delineated for many retinal diseases, and treatment options are limited. In most instances, genotype-phenotype correlations have not been elucidated because of extensive clinical and genetic heterogeneity. Next-generation sequencing (NGS) methods, including exome, genome, transcriptome and epigenome sequencing, provide novel avenues towards achieving comprehensive understanding of the genetic architecture of RDDs. Whole-exome sequencing (WES) has already revealed several new RDD genes, whereas RNA-Seq and ChIP-Seq analyses are expected to uncover novel aspects of gene regulation and biological networks that are involved in retinal development, aging and disease. In this review, we focus on the genetic characterization of retinal and macular degeneration using NGS technology and discuss the basic framework for further investigations. We also examine the challenges of NGS application in clinical diagnosis and management.

medium wave-length (M); and blue, short wave-length (S)) are associated with daylight vision, color perception and high visual acuity [3,4]. The distribution of rod and cone photoreceptors in the retina is not uniform; in humans, for example, the central part of the retina (fovea) and a 6 mm² cone-rich area around the fovea (macula) are responsible for high-resolution central vision (Figure 1b) [5]. Rhodopsin is the visual pigment in the rods, whereas each of the three kinds of cones in humans contains a distinct visual pigment (L-, M- and S-opsin in L-, M- and S-cones, respectively) [4]. Development and homeostasis in the retina must be stringently controlled for normal vision [6].
The dysfunction or death of retinal photoreceptors is the primary cause of vision loss in the majority of retinal degenerative diseases (RDDs) [7,8], which are clinically and genetically heterogeneous. In general, loss of photoreceptors in the macular region is termed macular degeneration and results in central vision defects. By contrast, loss of peripheral vision, as in retinitis pigmentosa (RP), generally starts with rod dysfunction or death that is followed by cone degeneration. RDDs are associated with a diverse spectrum of phenotypes. Some, such as RP, congenital and early-onset retinal degenerations including Leber congenital amaurosis (LCA), and cone-rod dystrophies, are monogenic and non-syndromic. Some syndromic diseases, including Bardet-Biedl syndrome (BBS) and Usher syndrome, are also monogenic and exhibit a highly penetrant retinal degeneration phenotype together with involvement of multiple tissues. Other RDDs, such as age-related macular degeneration (AMD), glaucoma, and diabetic retinopathy, are complex multifactorial entities.
Monogenic RDDs are, by definition, caused by mutations in a single gene and show a Mendelian pattern of inheritance (dominant, recessive, or X-linked). Genetic linkage studies in large pedigrees have led to the identification of almost 200 causative genes [9]. Genetic defects in monogenic RDDs are highly penetrant, with little influence of non-genetic factors; nevertheless, clinical severity has often been difficult to correlate with the genetic mutation because of variable penetrance, mutations in more than one gene, or modifier variants. The clinical manifestations of complex multifactorial RDDs result from the interplay among multiple susceptibility genes and epigenetic and/or environmental factors; conventional linkage and positional cloning methods have therefore not been very effective in identifying the underlying genetic cause(s). Candidate gene studies and genome-wide association studies (GWAS) have been successful in identifying susceptibility loci for complex RDDs. A recent meta-analysis of GWAS discovered at least 19 genes and associated cellular pathways involved in AMD pathogenesis; these included the complement system (CFH, C2/CFB, C3 and CFI), the high-density lipoprotein pathway (LIPC and CETP), the extracellular or collagen matrix pathway (TIMP3, COL8A1 and COL10A1) and the angiogenesis signaling pathway (VEGFA) [10]. Several loci have also been identified for primary open angle glaucoma [11-13] and diabetic retinopathy [14]. The associated alleles do not, however, establish causality, and further genetic and functional dissection of susceptibility loci will be crucial for understanding their roles in disease pathogenesis.
The discovery of new RDD genes helps in the elucidation of the molecular pathways that underlie retinal development and homeostasis, and in delineating the genetic underpinnings of the degenerative process. Several recent reviews have provided excellent synopses of the genetic and biological features of human RDDs [7,8,15-18]. We therefore focus on recent advances in next-generation sequencing (NGS) methods [19] (Boxes 1 and 2) that have provided an unprecedented opportunity for the unbiased discovery of genes and causal variants, and for the comprehensive dissection of the genetic architecture of RDDs. We also highlight the merits and limitations associated with different NGS methods, and present a framework for integrative analyses to elucidate a more complete, multidimensional view of genomic function in retinal health and disease. Finally, we discuss the progress and challenges in the application of NGS approaches for diagnosis and management of RDDs.

NGS approaches for gene identification in Mendelian RDDs
Extensive genetic and clinical heterogeneity is observed in RDDs that have Mendelian inheritance. Although almost 200 genes have been identified, the genetic defects in many patients are still unknown. NGS offers a rapid, high-throughput and cost-effective approach to identify mutations underlying monogenic disorders.

Whole-exome sequencing
Most Mendelian diseases are caused by highly penetrant mutations that disrupt protein-coding or splice-site sequences. Conventional methods of gene identification involve linkage analysis followed by sequencing of candidate genes in the critical linked region. However, small pedigrees and simplex cases (Box 1) with uncertain inheritance patterns make it difficult to identify genetic defects using the positional-candidate strategy. Whole-exome sequencing (WES) involves the capture and sequencing of all coding exons (1 to 2% of the whole genome) (Box 1) and has become an ideal choice for Mendelian disease gene discovery with or without prior linkage information [20-24] (Figure 2). Most commercially available exome capture kits target, with high confidence, a subset of genes annotated by the Consensus Coding Sequence (CCDS) project [25] and the RefSeq collection [26]. Since WES was first described in 2009 as a proof of concept for Freeman-Sheldon syndrome [21], several studies have used it successfully to identify dozens of genes in inherited diseases. For retinal phenotypes, it was first applied to an Ashkenazi Jewish family, in which WES of three affected siblings revealed a mutation in a novel gene, DHDDS, as a cause of RP [27]. Table 1 provides a summary of published reports of novel retinal disease genes discovered by WES.

Figure 1. (a) The vertebrate retina consists of six major types of neurons: rod and cone photoreceptors, horizontal, bipolar, amacrine and ganglion cells. The rod and cone photoreceptors are specialized light-sensing neurons that capture photons and transduce visual signals to the inner retina. The retinal pigment epithelium (RPE) serves as a barrier between the choroidal capillaries and the neural retina and is crucial for photoreceptor survival. Bipolar cells relay signals to the amacrine and ganglion cells through synapses in the inner plexiform layer. Ganglion cell axons project towards the optic nerve head and carry signals to the brain. (b) Ocular fundus photograph of a healthy retina showing retinal blood vessels, optic disc, macula (5 to 6 mm diameter), and fovea (central pit of the macula). Photograph provided by Dr Emily Chew (National Eye Institute, National Institutes of Health, Bethesda, MD).
Box 1. Glossary
Next-generation sequencing (NGS): also known as massively parallel or high-throughput sequencing; these technologies sequence multiple samples in parallel and produce millions of sequence reads concurrently, considerably reducing sequencing cost and time.
Mendelian disease: also called monogenic disease; results from single-gene defect(s) that are passed on to the next generation in a Mendelian inheritance pattern (dominant, recessive, or X-linked).
Complex disease: a multifactorial disease that does not exhibit a simple Mendelian inheritance pattern and is caused by complex interplay among genetic, epigenetic and environmental factors.
Simplex case: also known as a sporadic case; a case in which only a single individual in a family has a clinical diagnosis and no relatives are affected.
Whole-genome sequencing (WGS): sequencing of the complete DNA of an organism's genome.
Whole-exome sequencing (WES): capture and sequencing of the coding exons of all genes, comprising roughly 1 to 2% of the genome.
Targeted re-sequencing: sequencing of a defined (targeted) region of the genome; helpful in identifying disease variant(s) when a linkage region has been established.
Transcriptome: the complete set of coding and non-coding transcripts (RNAs) expressed in a specific cell type, tissue or organism.
RNA-Seq: sequencing of the transcriptome using NGS; RNA-Seq covers only the transcribed sequences of the genome.
RNA CaptureSeq: use of tiling arrays to capture and sequence selected portions of the transcriptome.
Chromatin immunoprecipitation (ChIP): a method for identifying the genomic DNA region(s) that interact with a protein in vivo, by immunoprecipitating the protein, together with its bound DNA, using a specific antibody.
ChIP-Seq: combines ChIP with NGS to generate a genome-wide map of DNA-binding proteins; can also be used to identify genome-wide epigenetic marks on histones.
cis-Regulatory element (CRE): DNA sequence in or near a gene (in cis) containing binding sites for transcription regulatory proteins that control the spatiotemporal expression of the target gene.
Single-end sequencing: sequencing of a DNA molecule from one end only; because NGS reads can be short, single-end sequencing may map low-complexity and repetitive regions of the genome poorly.
Paired-end sequencing: sequencing of a DNA molecule from both ends, resulting in superior alignment across the genome.
Mapping sequence assembly: mapping of NGS reads to a reference genome.
De novo assembly: generation of large assemblies of NGS reads in the absence of a reference genome.
Single nucleotide polymorphisms (SNPs): genetic variations that alter a single nucleotide in the genome.
Insertion-deletion variation (indel): any sequence change that adds or removes one or more nucleotides relative to the reference genome.
eQTL analysis: analysis of genomic loci whose genotypes show a statistically significant correlation with gene expression levels, and which may therefore regulate gene expression.
Copy number variation (CNV): a segment of DNA (>1 kb) that is present at a variable copy number compared to the reference genome, altering the diploid status at a particular locus.
Deleterious variant: a DNA change that reduces fitness by altering the expression or function of the encoded protein or RNA, and that is therefore subject to selection of varying strength.
Neutral variant: a variant that neither alters the function of a gene nor affects fitness.
Conserved variant (or conserved residue): a DNA nucleotide or amino acid residue that is identical among closely related species.
Private variant: a genetic variant that is confined to a single individual or a family.
Incidental or secondary findings: any genetic finding that has health or reproductive relevance for the patient but was not obtained as part of the original goals of the study.

Candidate exome capture
In a slight variation of WES, candidate exome sequencing captures the selected coding regions that are relevant to a specific genetic trait. Consequently, the examination of a smaller genomic region allows better utilization of resources and the analysis of a larger group of patients. This approach has been useful for ciliopathies, where almost 2,500 genes that are implicated in ciliary function can serve as potential candidates [58]. Candidate exome capture of about 13,000 exons from 828 candidate genes recently identified mutations in the SDCCAG8 gene in patients with retinal-renal ciliopathies [56]. Mutations in a single gene can cause overlapping syndromic phenotypes, and so the candidate exome capture strategy helps in refining genotype-phenotype correlations.

Targeted re-sequencing
Mendelian traits have often been mapped using multigenerational families, narrowing the genomic search space for causal gene identification. Owing to the limited number of meiotic crossovers, however, linkage intervals often span several megabases and include tens or hundreds of transcribed sequences. Selecting and sequencing candidate genes on the basis of informed guesses is labor-intensive and expensive, and often yields data limited to the coding regions of annotated genes. NGS can now be used to capture and sequence large genomic regions of interest, expediting the discovery of causal genes (Table 1).
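Linkage evidence from such families is summarized as a LOD score, and Figure 2 uses a LOD of 3 to choose between targeted re-sequencing and exome- or genome-wide approaches. The sketch below computes the classical two-point LOD score under a textbook simplification (phase-known, fully informative meioses, counted as recombinants and non-recombinants) and maps the maximum LOD onto the Figure 2 decision. The function names and the 0.01 grid for the recombination fraction are illustrative assumptions; real pedigree analyses also model penetrance, missing genotypes and unknown phase.

```python
import math

LOD_THRESHOLD = 3.0  # conventional threshold for conclusive linkage (Figure 2)


def lod_score(recombinants: int, nonrecombinants: int, theta: float) -> float:
    """Two-point LOD score for phase-known, fully informative meioses.

    LOD(theta) = log10[theta^R * (1 - theta)^NR] - (R + NR) * log10(0.5)
    """
    if not 0.0 <= theta < 0.5:
        raise ValueError("theta must lie in [0, 0.5)")
    if theta == 0.0 and recombinants > 0:
        return float("-inf")  # any recombinant rules out complete linkage
    log_lik = 0.0
    if recombinants:
        log_lik += recombinants * math.log10(theta)
    if nonrecombinants:
        log_lik += nonrecombinants * math.log10(1.0 - theta)
    return log_lik - (recombinants + nonrecombinants) * math.log10(0.5)


def max_lod(recombinants: int, nonrecombinants: int) -> tuple[float, float]:
    """Maximize the LOD over the grid theta = 0.00, 0.01, ..., 0.49."""
    grid = [i / 100 for i in range(50)]
    best = max(grid, key=lambda t: lod_score(recombinants, nonrecombinants, t))
    return best, lod_score(recombinants, nonrecombinants, best)


def suggest_strategy(lod: float) -> str:
    """Mirror the decision described in the Figure 2 legend."""
    if lod >= LOD_THRESHOLD:
        return "targeted re-sequencing of the linked interval"
    return "whole-exome/genome or candidate exome capture"
```

For example, ten informative meioses with no recombinant give a maximum LOD of 10 × log10(2), about 3.01 at theta = 0, which just crosses the threshold; one recombinant among ten meioses lowers the maximum to about 1.60 at theta = 0.1, pointing towards exome- or genome-wide approaches.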

Whole-genome sequencing
WES, candidate exome capture and targeted re-sequencing can examine only a fraction of the genome and present additional challenges (Table 2). A role for variants beyond coding regions has occasionally been reported in Mendelian traits, including in vision disorders; for example, intronic mutations in RP1 [59] and OFD1 [53] are implicated as causes of RP, and CSPG2 variants may cause congenital vitreoretinopathies [60]. The inability of WES to assess the impact of non-coding, conserved or regulatory regions, and other long-range genomic alterations has encouraged researchers to move towards whole-genome sequencing (WGS), which is becoming more cost-effective. Nevertheless, progress has been slow because of the technological limitations associated with the analysis and handling of large datasets.

Box 2. Next-generation sequencing methods
Next-generation sequencing: also known as massively parallel or high-throughput sequencing, this process enables the sequencing of a large number of DNA molecules in a single reaction; NGS has revolutionized the genomics field by producing fast, inexpensive and accurate sequencing data.
Platforms: several NGS platforms are commercially available, including Roche/454, Illumina/Solexa, ABI SOLiD, Life/APG, Helicos BioSciences and Pacific Biosciences; Illumina is widely used because it provides high throughput at low cost.
Template preparation: starting material (genomic DNA, cDNA or immunoprecipitated DNA) is sheared into small fragments of 30 to 400 base pairs and ligated to universal adapters. The DNA can then be amplified by emulsion PCR or bridge amplification before sequencing or, for single-molecule templates, sequenced directly.
Sequencing reaction: NGS chemistries (cyclic reversible termination, sequencing by ligation, pyrosequencing and real-time sequencing) vary among platforms; sequencing by synthesis (SBS), in which DNA polymerase adds nucleotides stepwise, is a commonly used approach, whereas sequencing by ligation relies on DNA ligase.
Output: high throughput is a key feature of NGS protocols, with a single run producing several gigabases (Gb) of sequence data. Output varies among platforms; for example, the Illumina HiSeq 2000 produces 95 to 600 Gb and the ABI SOLiD 5500 produces 90 to 300 Gb in a single run.
Assembly and mapping: once sequencing is complete, initial base calling is performed by proprietary software on the sequencing platform, followed by alignment of the data to a reference genome, if available, or by de novo assembly; short read lengths can make mapping in repetitive regions challenging.
Analysis: downstream analysis depends on the biological question; DNA-Seq data are subjected to variant and CNV detection, RNA-Seq data are used to characterize the transcriptome, and ChIP-Seq data are used for large-scale analysis of chromatin features.

RNA-Seq or transcriptome sequencing
The transcriptome represents the collection of all transcribed sequences (RNAs), both protein-coding and non-coding, in a cell type or particular tissue at a specific stage of development or age. Although microarray technology and serial analysis of gene expression (SAGE) have been valuable, transcriptome profiling using NGS technology (RNA-Seq) is becoming popular because it surveys the transcriptome in a high-throughput, quantitative manner and at low cost [61]. In addition, RNA-Seq has proven useful for annotating protein-coding genes, discovering novel alternatively spliced transcripts and non-coding RNAs (ncRNAs), single nucleotide polymorphism (SNP) profiling, and detecting gene fusions or rearrangements [62]. DNA-Seq approaches generate a large number of variants, and secondary filtering is often required to prioritize candidate disease genes. Tissue-specific expression profiles therefore offer a valuable first-level screen to identify relevant disease-causing variants (Figure 2).
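One way to use a tissue expression profile as a first-level screen is sketched below, assuming that RNA-Seq read counts and transcript lengths are already available for the tissue of interest (for example, retina). Counts are normalized to transcripts per million (TPM), a common choice, and candidate genes below a cutoff are set aside. The 1-TPM cutoff and the gene names (GENE_A and so on) are hypothetical; a strict cutoff can discard genes expressed at low levels or only in a minor cell population.

```python
def tpm(counts: dict[str, int], lengths_bp: dict[str, int]) -> dict[str, float]:
    """Transcripts per million: length-normalize counts, then scale to 1e6."""
    per_kb = {gene: counts[gene] / (lengths_bp[gene] / 1_000) for gene in counts}
    total = sum(per_kb.values())
    return {gene: rate / total * 1_000_000 for gene, rate in per_kb.items()}


def expressed_candidates(candidates: list[str], tpm_values: dict[str, float],
                         min_tpm: float = 1.0) -> list[str]:
    """Keep candidate genes whose TPM in the tissue profile reaches min_tpm.

    Genes absent from the profile are treated as not expressed (TPM 0).
    """
    return [gene for gene in candidates if tpm_values.get(gene, 0.0) >= min_tpm]
```

With hypothetical counts of 500, 10 and 0 reads for GENE_A, GENE_B and GENE_C, only GENE_A survives from the candidate list ["GENE_A", "GENE_C", "GENE_D"]; GENE_C has no reads and GENE_D is absent from the profile.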

Profiling of alternative transcripts
In humans, a majority of genes (>90%) undergo alternative splicing to generate tissue-specific and functionally diverse protein isoforms [63]. The role of novel transcripts in diverse pathways and disease causation is slowly being recognized [64,65]. Retina-specific isoforms of RPGR and BBS3 have been implicated in X-linked RP and BBS, respectively [66,67]. In addition, mutations in the spliceosome-component genes (PRPF31, PRPF3, and PRPF8) are associated with RP [68]. Thus, retina-specific RNA or transcriptome profiling provides an excellent opportunity to identify novel functionally relevant transcripts. As novel transcripts may be present at low-copy number, RNA CaptureSeq (Box 1) can provide an alternative approach for enrichment [69].

Non-coding RNA
ncRNAs appear to play prominent and diverse roles in normal development, physiology, and disease. Several specific microRNAs (miRNAs) are expressed in the retina (miR-96, miR-182, and miR-183) [70,71] and RPE (miR-204/211) [72]. Long antisense (non-coding) transcripts have been associated with eight transcription factors that are involved in eye development [73], and two ncRNAs, TUG1 [74] and Six3OS [75], have been linked to retinal differentiation. Inactivation of DICER1, an RNase III endonuclease that is essential for the production and function of mature miRNAs, has been implicated in retinal degeneration [76], and Alu RNA toxicity has been suggested to play a role in AMD [77]. Thus, further exploration of ncRNAs seems essential, and RNA-Seq offers a starting point for the identification of novel ncRNAs in development and disease.

SNP profiling
Genetic variations within the transcribed (coding or non-coding) regions of the genome can alter the expression or function of the encoded sequence. RNA-Seq can therefore profile genetic variants in both the quasi-complete set of transcribed genes (mRNAs) and ncRNAs cost-effectively and without target-probe hybridization, a necessary but inefficient step in capture-based methods. However, RNA-Seq should be applied with caution: variants that result in the loss of a gene product can be missed, because the affected transcript may be degraded or may not be expressed in the sampled tissue.
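The caution above can be made concrete with a naive genotype call from RNA-Seq allele counts at a single site, sketched below. The minimum depth and the allele-fraction thresholds are arbitrary assumptions for illustration, not values from the literature cited here. A heterozygous nonsense variant whose mutant transcript is degraded contributes few or no alternate reads and would be called homozygous reference, and a gene that is not expressed yields no call at all.

```python
def call_genotype(ref_reads: int, alt_reads: int, min_depth: int = 10,
                  het_low: float = 0.2, het_high: float = 0.8) -> str:
    """Naive genotype call from RNA-Seq read counts at one site.

    Thresholds are illustrative assumptions. Allele-specific expression or
    degradation of the mutant transcript distorts the alternate-allele
    fraction, so a call from expressed alleles is not a DNA genotype.
    """
    depth = ref_reads + alt_reads
    if depth < min_depth:
        return "no call"  # too few reads, e.g. gene not expressed in this tissue
    alt_fraction = alt_reads / depth
    if alt_fraction < het_low:
        return "0/0"
    if alt_fraction > het_high:
        return "1/1"
    return "0/1"
```

For example, 30 reference and 28 alternate reads give "0/1", whereas 40 reference and 1 alternate read give "0/0", even if the DNA genotype is heterozygous and the alternate transcript is simply being degraded.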

Figure 2. Strategies for the identification of disease-causing variants in Mendelian diseases. Linkage or homozygosity mapping analysis can serve as the starting point for mutation identification by NGS. If linkage is conclusive (LOD score ≥3), the linked region can be analyzed by targeted re-sequencing. In cases of multiple suggestive linkage peaks (LOD score <3), whole-exome/genome sequencing or candidate exome capture is more suitable. Filtering and prioritization of variants can be customized depending on the availability of genetic information and NGS data. 1000G, 1,000 Genomes; dbSNP, Single Nucleotide Polymorphism database; EVS/ESP, Exome Variant Server (EVS) for the NHLBI Exome Sequencing Project (ESP); SIFT, sorting intolerant from tolerant.

ChIP-Seq-based approaches
The genomic location and function of regulatory elements contribute significantly to the development of human diseases. Chromatin immunoprecipitation followed by NGS (ChIP-Seq) can be used to profile cis-regulatory elements (CREs) (Box 1), which include transcription factor binding sites clustered within promoters, enhancers and silencers [78]. ChIP-Seq has been employed to generate genome-wide maps of CREs for two key photoreceptor-specific transcription factors, CRX [79] and NRL [80]. These data have also permitted the prioritization of disease-associated genes or variants for further study. Not surprisingly, several CRX and NRL target genes are associated with RDDs [79,80], and CRX ChIP-Seq data were recently used to filter candidates, leading to the identification of mutations in MAK (encoding a regulator of ciliary length) as a cause of autosomal recessive RP [57].
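A simple way to use ChIP-Seq peaks for prioritization is to keep only candidate variant sites that fall inside peaks for a transcription factor such as CRX. The sketch below is one such filter, not necessarily the procedure used in the MAK study, which filtered at the level of candidate genes. It assumes peaks and sites share 0-based coordinates, with peaks as half-open [start, end) intervals as in BED files; overlapping peaks are merged so that a binary search per site suffices. All coordinates are hypothetical.

```python
from bisect import bisect_right
from collections import defaultdict

Peak = tuple[str, int, int]  # (chromosome, start, end), 0-based half-open like BED
Site = tuple[str, int]       # (chromosome, 0-based position)


def merge_peaks(peaks: list[Peak]) -> dict[str, tuple[list[int], list[int]]]:
    """Merge overlapping peaks per chromosome into sorted, disjoint intervals."""
    by_chrom: dict[str, list[tuple[int, int]]] = defaultdict(list)
    for chrom, start, end in peaks:
        by_chrom[chrom].append((start, end))
    merged = {}
    for chrom, intervals in by_chrom.items():
        intervals.sort()
        disjoint = [list(intervals[0])]
        for start, end in intervals[1:]:
            if start <= disjoint[-1][1]:  # overlaps or touches the previous interval
                disjoint[-1][1] = max(disjoint[-1][1], end)
            else:
                disjoint.append([start, end])
        merged[chrom] = ([s for s, _ in disjoint], [e for _, e in disjoint])
    return merged


def sites_in_peaks(sites: list[Site], peaks: list[Peak]) -> list[Site]:
    """Keep candidate variant sites that fall inside any ChIP-Seq peak."""
    merged = merge_peaks(peaks)
    kept = []
    for chrom, pos in sites:
        if chrom not in merged:
            continue
        starts, ends = merged[chrom]
        i = bisect_right(starts, pos) - 1  # last interval starting at or before pos
        if i >= 0 and pos < ends[i]:
            kept.append((chrom, pos))
    return kept
```

Merging first is a deliberate design choice: without it, a long peak that starts early could be skipped when a shorter peak starts closer to the site.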
ChIP-Seq can also be employed for genome-wide profiling of epigenetic modifications, such as DNA methylation and covalent modifications of histones [81]. Epigenetic variations can influence gene expression and developmental programming [82,83]. Evidence suggests the involvement of epigenetic mechanisms in RDDs, but such modifications are more likely to be associated with complex disorders, such as AMD [84], glaucoma and diabetic retinopathy.
Table 1 note: all of the diseases listed, except AMD, are monogenic; AMD is a multifactorial and complex disease.
Epigenetic profiling of the retina, RPE and other ocular tissues or cells, combined with gene expression profiling, should provide valuable insights into disease mechanisms. The Encyclopedia of DNA Elements (ENCODE) project has systematically integrated gene expression data with information on regulatory elements, transcription factor binding, and epigenetic modifications for as much as 80% of the genome [85]. Expression profiles and other data relevant to retinal and macular diseases have not, however, been incorporated as yet.

NGS approaches for complex traits
The identification of the genetic susceptibility variants underlying complex multifactorial disorders requires extensive efforts including large patient cohorts and cumbersome analytical tools. GWAS have been successful in uncovering associated loci for numerous diseases, but such studies only examine common (tagging) variants in populations and additional investigations are necessary to identify causality. In this section, we provide an overview of the possible applications of NGS, with an emphasis on the study design for complex diseases.

GWAS and meta-analysis
GWAS have begun to unravel the genetic architecture of complex traits. Hundreds of susceptibility loci associated with multifactorial diseases have now been discovered [86]. AMD provided the first example of GWAS success with the identification of the CFH susceptibility locus [87]. Multiple GWAS and large-scale meta-analyses have to date revealed as many as 19 AMD susceptibility loci [10,88]. However, the associated variants are not necessarily causal and leave a substantial fraction of the heritability unexplained. Rare and structural variants at these susceptibility loci might help to explain causality and the missing heritability [89]. NGS approaches have made the identification of rare alleles feasible and have ushered in a second generation of association studies in complex diseases (Figure 3).

Table 2 (excerpt)
Row label not recovered. Limitations: does not identify novel variants; limited to coding regions; limited representation of intronic and regulatory variants. Applications: genome-wide association analysis with rare variants.
RNA-Seq. Advantages: array-independent profiling of the transcriptome. Limitations: high coverage required for the identification of low-copy transcripts; not applicable for the identification of variants that cause loss of protein; limited by tissue- or cell-type availability. Applications: genome-wide expression profiling; alternative transcript identification; non-coding RNA detection; SNP profiling; eQTL analysis.
ChIP-Seq. Advantages: genome-wide profiling of epigenetic marks (DNA methylation and histone modifications) and cis-regulatory elements. Limitations: dependent on antibody quality; requires high input; analysis methods still evolving; high coverage needed for accurate profiling. Applications: DNA methylation; histone modifications; tissue-specific enhancer profiling.

Rare variant identification
The hypothesis that rare variants influencing a complex trait should co-localize with associated common alleles has accelerated targeted re-sequencing of GWAS loci [90]. Such studies have identified rare coding variants associated with AMD, such as R1210C in CFH [49] and G119R in CFI [12]. WES studies are also being performed to test the association of rare coding variants with complex phenotypes [91,92]. However, such studies require large sample sizes to achieve the statistical power needed to detect associations that remain significant after correction for multiple testing. In the absence of such datasets, an extreme-phenotype study design, in which samples from both ends of the phenotype distribution are analyzed, can serve as a suitable alternative (Figure 3) [93]. In addition, inherited macular dystrophies share clinical characteristics with AMD, and genes identified in heritable forms, such as TIMP3 [94] and ABCA4 [95], have occasionally been associated with AMD [88,96]. Thus, WES and WGS in families with macular dystrophies (and in AMD families, where available) can uncover rare variants that might contribute to disease pathophysiology.
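Rare variants are usually too infrequent to test one at a time, so a common approach is a gene-level collapsing ("burden") test: individuals are counted as carriers if they have any qualifying rare variant in the gene, and carrier counts are compared between groups. In an extreme-phenotype design, the two groups are the two tails of the phenotype distribution. The sketch below uses a two-sided Fisher's exact test written with the standard library; the function names and counts are hypothetical, and real analyses add covariates and correct for the number of genes tested.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more likely than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(x: int) -> float:
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_observed = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # A small relative tolerance guards against floating-point ties.
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_observed * (1 + 1e-7))
    return min(1.0, p)


def burden_test(case_carriers: int, n_cases: int,
                control_carriers: int, n_controls: int) -> float:
    """Compare carriers of any qualifying rare variant in one gene between
    two groups, e.g. the two extremes of a phenotype distribution."""
    return fisher_exact_two_sided(case_carriers, n_cases - case_carriers,
                                  control_carriers, n_controls - control_carriers)
```

With 8 carriers among 10 individuals at one extreme and 1 among 6 at the other, the p-value is about 0.035: suggestive for a single gene, but nowhere near an exome-wide threshold, which is why such studies need large samples.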

Copy number variations
Large DNA stretches (>1 kb) that exhibit variable copy number when compared to the reference genome contribute significantly to population dynamics and evolution. Comparative genomic hybridization (CGH) arrays and SNP arrays have been commonly used to detect copy number variations (CNVs) that are implicated in neurodevelopmental disorders (autism, schizophrenia, and intellectual disability) and immune-related diseases (Crohn's disease, psoriasis, HIV/AIDS, rheumatoid arthritis, and type 1 diabetes) [97]. The role of CNVs is under investigation in AMD [98-100]. An 84-kb deletion spanning CFHR1 and CFHR3 has been associated with protection against the development of AMD [101-103]. A recent GWAS has identified CNVs at NPHP1 and EFEMP1 as potential candidates for AMD association [100]. CNVs in additional genes, such as CCR3, CFH, CX3CR1, ERCC6, HTRA1, VEGF, GSTM1, and GSTT1, have also been associated with AMD [104]. Nevertheless, the complete spectrum of CNVs in complex diseases (including AMD) has not yet been characterized because of the limited resolution of current methods. NGS-based CNV detection uses high-coverage WGS data for unbiased detection of CNVs at much higher resolution than has previously been available, providing information about CNV breakpoints and the location of copy number gains [105,106]. However, methodologies for CNV detection using NGS lack well-defined workflows, protocols, and quality-control measures, imposing substantial computational and bioinformatic challenges [107]. In addition, databases of human structural variation are limited and contain inadequate information on most breakpoints. A validated pipeline for structural variation analysis using NGS data is highly desirable for exploiting the full potential of CNVs in complex trait analysis.

Exome-chip
A common-variant-based GWAS approach has not explained the full genetic variance observed in complex traits. The missing heritability might be explained by rare (minor allele frequency (MAF) below 1%) and low-frequency (MAF of 1 to 5%) variants [90,108,109]. NGS studies, such as the 1,000 Genomes Project [110] and the NHLBI Exome Sequencing Project [111], have identified a large number of such variants, leading to a second-generation genotyping array (the exome chip) for testing the association of rare variants with complex traits [112]. Ideally, such exome chips can also include all common GWAS variants associated with distinct phenotypes. Although this approach cannot identify novel or regulatory rare variants, exome chips offer an economical and rapid platform for testing the hypothesis that certain rare variants are causal alleles in common diseases. Such studies are currently in progress as part of a large AMD consortium and should identify novel variants and genes in the near future.
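The frequency classes used above follow directly from genotype counts. The sketch below computes the MAF of a biallelic variant in diploid individuals and bins it using the common convention of rare (<1%), low-frequency (1 to 5%) and common (>5%); the function names are hypothetical and the boundaries are a convention rather than a fixed standard.

```python
def minor_allele_frequency(n_hom_ref: int, n_het: int, n_hom_alt: int) -> float:
    """MAF from biallelic genotype counts in diploid individuals."""
    n = n_hom_ref + n_het + n_hom_alt
    if n == 0:
        raise ValueError("no genotyped individuals")
    alt_freq = (2 * n_hom_alt + n_het) / (2 * n)  # each individual carries two alleles
    return min(alt_freq, 1.0 - alt_freq)


def frequency_class(maf: float) -> str:
    """Bin a variant: rare < 1%, low-frequency 1 to 5%, common > 5%."""
    if maf < 0.01:
        return "rare"
    if maf <= 0.05:
        return "low-frequency"
    return "common"
```

For instance, 10 heterozygotes among 1,000 individuals give a MAF of 0.005 (rare), and 40 heterozygotes give 0.02 (low-frequency).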

Expression quantitative trait loci analysis
A vast majority of disease-associated SNPs identified in GWAS are reportedly located in non-coding regions of the genome, and their functional role in causing the disease is generally not understood. Arguably, such variants might regulate gene expression and act as expression quantitative trait loci (eQTL) [113]. Combined analysis of genotyping with RNA-Seq data provides a unique opportunity to correlate genetic variations and expression level at disease-associated loci [114]. Expression profiling is more relevant, however, when implemented in disease-affected tissues or cell types. Although little has been reported on the importance of eQTLs in RDDs, the integration of data on modifier and susceptibility variants with the NGS expression data would facilitate the elucidation of complex regulatory networks that can provide insights into novel intervention strategies.
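At its simplest, the combined genotype and RNA-Seq analysis described above regresses the expression level of a gene on genotype dosage (0, 1 or 2 copies of the alternate allele) at a nearby SNP. The sketch below fits that additive model by least squares and reports the slope (expression change per allele), Pearson's r and the t statistic with n - 2 degrees of freedom. It is a minimal illustration with hypothetical data; real eQTL analyses use normalized expression, include covariates such as ancestry and batch, and correct for the many SNP-gene pairs tested.

```python
import math


def eqtl_regression(dosages: list[int], expression: list[float]) -> tuple[float, float, float]:
    """Least-squares fit of expression on genotype dosage (0, 1 or 2 alt alleles).

    Returns (slope, Pearson r, t statistic with n - 2 degrees of freedom).
    """
    n = len(dosages)
    if n != len(expression) or n < 3:
        raise ValueError("need at least three paired observations")
    mean_g = sum(dosages) / n
    mean_e = sum(expression) / n
    sxx = sum((g - mean_g) ** 2 for g in dosages)
    syy = sum((e - mean_e) ** 2 for e in expression)
    sxy = sum((g - mean_g) * (e - mean_e) for g, e in zip(dosages, expression))
    if sxx == 0 or syy == 0:
        raise ValueError("genotype or expression is constant")
    slope = sxy / sxx  # change in expression per alternate allele
    r = sxy / math.sqrt(sxx * syy)
    if abs(r) >= 1.0:
        return slope, r, math.copysign(math.inf, r)
    t = r * math.sqrt((n - 2) / (1.0 - r * r))
    return slope, r, t
```

For six hypothetical samples with dosages [0, 0, 1, 1, 2, 2] and expression [1.0, 1.2, 2.0, 2.2, 3.0, 3.2], the slope is 1.0 expression unit per allele and t is about 16.3 on 4 degrees of freedom.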

Making sense of the vast amount of NGS data for disease gene identification
The identification of relevant candidate disease-causing variants from NGS data requires filtering strategies that depend on multiple factors, such as the availability of well-phenotyped patient cohorts, knowledge of the mode of inheritance, and large sample sizes. Computational tools that predict the impact of a variant on protein function can help to separate deleterious variants from neutral ones [115]. General guidelines for candidate variant identification are provided for Mendelian (Figure 2) and complex RDDs (Figure 3), but each biological question might require a unique approach. For example, homozygosity and linkage data can complement WES or targeted sequencing in Mendelian RDDs [30,31,45-47]. Similarly, the integration of GWAS data with rare variant, eQTL or pathway-based analyses can yield meaningful results for complex traits such as AMD. Ultimately, validation of genetic causality requires additional investigations using in vitro assays and/or model organisms.
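The Mendelian filtering cascade in Figure 2 can be sketched as a few successive filters, assuming each variant has already been annotated with its highest population allele frequency (for example, from 1000G, dbSNP or EVS), a deleteriousness prediction (for example, from SIFT) and a segregation flag. The Variant fields, the 1% frequency cutoff and the gene names are hypothetical choices for illustration. Under recessive inheritance, two heterozygous variants in the same gene are only candidate compound heterozygotes until parental testing shows that they lie on different alleles.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Variant:
    gene: str
    genotype: str                # "0/1" (heterozygous) or "1/1" (homozygous alternate)
    population_af: float         # highest allele frequency in reference databases
    predicted_deleterious: bool  # output of a prediction tool, e.g. SIFT
    segregates: bool             # co-segregates with disease in the family


def filter_variants(variants: list[Variant], mode: str,
                    max_af: float = 0.01) -> list[Variant]:
    """Apply frequency, predicted-impact, segregation and inheritance filters."""
    passing = [v for v in variants
               if v.population_af <= max_af and v.predicted_deleterious and v.segregates]
    if mode == "dominant":
        return [v for v in passing if v.genotype == "0/1"]
    if mode == "recessive":
        homozygous = [v for v in passing if v.genotype == "1/1"]
        het_per_gene = Counter(v.gene for v in passing if v.genotype == "0/1")
        # Two heterozygous hits in one gene are only candidate compound
        # heterozygotes; phase (in trans) must be confirmed in the parents.
        compound = [v for v in passing
                    if v.genotype == "0/1" and het_per_gene[v.gene] >= 2]
        return homozygous + compound
    raise ValueError(f"unsupported inheritance mode: {mode!r}")
```

Keeping each criterion as a separate, explicit condition makes it easy to relax one filter (for example, the frequency cutoff for a founder population) without touching the others.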

NGS in diagnostics and disease management for RDDs
Over 200 genes have been implicated in RDDs, offering an opportunity to clarify etiology, provide prognosis, and calculate associated risk(s). Nevertheless, molecular diagnosis and counseling are complicated by genetic heterogeneity and extensive phenotypic variability. Mutations in the same gene can cause different phenotypes, and similar clinical findings can result from mutations in different genes. For example, RPGR and RP2 are the primary causative genes in X-linked RP, but recent studies have reported RP2 and RPGR mutations even in males with simplex retinal degeneration [116] and in pedigrees with 'apparent' autosomal dominant inheritance of RP [117]. Thus, the boundaries of distinct clinical entities can be blurred, demanding more comprehensive methods of molecular evaluation [118].
Customized arrays have been developed for screening patients with RP [119,120] and other retinal dystrophies [121,122]. The National Eye Institute has established the National Ophthalmic Disease Genotyping and Phenotyping Network (eyeGENE®), which offers molecular diagnosis as a service [123]. The eyeGENE® network currently includes Clinical Laboratory Improvement Amendments (CLIA)-certified diagnostic laboratory partners, and over 4,400 samples from over 270 registered clinical organizations (with 500 registered users across the United States and Canada) have been analyzed. eyeGENE® is also working towards setting up high-throughput genotyping and sequencing technologies for improved clinical sequencing. These efforts, however, need continuous validation of the reliability, robustness and reproducibility of the technology employed. Implementation of NGS hardware requires substantial in-house infrastructure and standardized guidelines for NGS protocols. Issues related to the ownership and confidentiality of genomic data and the handling of incidental or secondary findings (Box 1) must also be addressed before NGS technology moves into routine clinical practice.

Understanding the biology of disease
The extreme phenotypic variability in RDDs can be attributed to allelic heterogeneity, modifier loci, epigenetic and environmental factors, or a combination of these. A modifier gene is predicted to alter the phenotypic outcome of a given genotype by interacting with the primary disease gene or by functioning in the same or a related biological pathway, thereby affecting penetrance, expressivity and pleiotropy [124,125]. Several examples illustrate this. Digenic inheritance has been reported for mutations in ROM1 and peripherin/RDS resulting in RP [126] and for ROM1 and ABCA4 mutations in macular dystrophy [127]. CNOT3 can modify the phenotype of a PRPF31 mutation in RP [128]. The potential involvement of mutations in more than one causative gene has also been described in LCA [129]. Phenotypic differences in ciliopathies are also attributed to modifiers [130,131]. For example, a common allele in RPGRIP1L, Ala229Thr, is reported to be a modifier of retinal degeneration in ciliopathy patients [132]. AHI1 seems to act as a modifier of CEP290 [133] and NPHP1 [134], and PDZD7 can modify the phenotype in Usher syndrome patients who have a homozygous USH2A mutation [135]. The search for modifiers has, however, been limited by their non-Mendelian segregation, by the restriction of most studies to known RDD genes, and by the abundance of normal genetic variation in humans. NGS approaches offer an expanded platform for genome-wide evaluation of modifier variants, which would permit a better understanding of phenotypic variability and progression in RDDs.
Mouse models have provided valuable insights into RDD pathogenesis, but complex interactions among retinopathy proteins and additional variants can exacerbate or ameliorate disease phenotypes. For example, mice carrying a combination of Cep290 and Mkks disease alleles had better sensory function than mice with either mutation alone [136]. The development of therapeutic strategies will therefore require a comprehensive understanding of RDD gene interaction networks and cellular pathways. NGS technology should expedite the integration of genetic variants in RDD and modifier genes with retinal transcriptome and epigenome profiles.

Conclusions, challenges and future prospects
It is an exciting time in the genetic analysis of RDDs, as NGS has provided unprecedented access to diverse genome-wide datasets. NGS has proven successful in identifying the genetic cause of monogenic RDDs in many patients and families and offers great promise for the genetic dissection of complex RDDs. With the generation and analysis of NGS data becoming more accessible and affordable, a comprehensive catalog of variants for most (if not all) vision-related traits seems a viable prospect. Whole-transcriptome and epigenome analysis in retinal tissues would greatly facilitate the elucidation of important pathways and networks underlying development and disease. In this review, we have highlighted the challenges and the opportunities in applying NGS for gene discovery and clinical diagnosis of RDDs. A major goal ahead is to develop a unified framework for identifying all disease-relevant variants and genes. Methods for downstream bioinformatic analyses are still evolving and represent a major bottleneck in NGS applications. There is significant room for improvement in mapping and variant-calling methods, especially for small insertions and deletions and for CNVs. Better tools are required for combining information across studies that have used different sequencing platforms or even distinct methods of data analysis. As each kind of NGS data has its own merits, integrated, multidimensional analyses of biological systems combined with relevant clinical records would be valuable for intervention and personalized medicine.

Competing interests
The authors declare that they have no competing interests.