From sequence to functional understanding: the difficult road ahead

DNA sequencing has become cheap, rapid and accurate, allowing us to access thousands of genomes and reveal the extensive variation among individuals. The major problem that arises from this is distinguishing between neutral and pathogenic variants. A recent study by Davis et al., in which a functional screen of all the non-synonymous variants of a newly discovered gene was performed, highlights the value and necessity of characterizing the functional consequences of each genomic variant discovered. This is the main challenge for the advancement of genomic medicine in the years to come.

The recent development of high-throughput sequencing (HTS) and its application to sequencing the exomes or genomes of thousands of people (including participants of the 1000 Genomes Project) has provided experimental evidence of the extensive variability of the human genome, both in single nucleotide polymorphisms (SNPs) and in copy number variations (CNVs). Within the coding fraction of the genome (the exome), each individual is estimated to have approximately 8,500 to 10,500 non-synonymous variants, 350 to 400 of which are predicted to cause loss-of-function alleles affecting 250 to 300 genes [1]. HTS data have also provided experimental evidence that the mutation rate of the human genome is approximately 10^-8 per nucleotide per generation, resulting in two to seven new variants in each individual exome [2].
The most difficult challenge for HTS projects aiming to discover pathogenic variants is the correct identification of the disease-causing mutations among thousands of additional variants that could be either contributing to unrecognized phenotypes or neutral [3]. At present, most HTS projects focus on the known functional elements of the genome. Protein-coding genes are at the heart of this analysis, along with non-coding transcripts and highly conserved non-coding sequences. The rules of heredity, gene expression data, evolutionary principles and protein structure-function relationships provide the current set of criteria for deciding between potential contributing and non-contributing variants relative to the phenotype in question. The phenotype is also an important consideration because identified variants may contribute to other phenotypes but not the one in question. Furthermore, the correlation between genome variation and phenotypic variation is relatively simple for monogenic/oligogenic phenotypes and highly penetrant variants, but is complicated for polygenic phenotypes and for medium- or low-penetrance variants.
More precise examples of these criteria are: the presence of the variants and their allelic composition in affected and non-affected individuals according to the mode of inheritance imposed or hypothesized; the mapping position of the variants following linkage or association studies in families and populations; the predicted functional consequence of the variant (missense, nonsense, frameshift or splice-site); the evolutionary conservation of the affected codon; the expected disruption of the protein's structure; the frequency of the variant in the population without the phenotype in question; the potential disruption of a protein network; and the predicted 'recessive' or 'dominant' nature of variants in a gene of interest. There are computer prediction programs using some of these criteria for predicting the likely pathogenicity of non-synonymous variants [4].
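Some of the criteria listed above can be combined into a simple prioritization filter. The following Python code is only a minimal sketch under stated assumptions: the `Variant` fields, the score weights and the 1% control-frequency cutoff are hypothetical choices made for illustration, and they do not correspond to any published prediction program.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    """Hypothetical record for one candidate variant (illustrative fields only)."""
    gene: str
    consequence: str          # e.g. "missense", "nonsense", "frameshift", "splice-site"
    control_frequency: float  # allele frequency in individuals without the phenotype
    conserved: bool           # is the affected codon evolutionarily conserved?
    segregates: bool          # consistent with the hypothesized mode of inheritance?


# Consequences assumed here to be the most likely to disrupt the protein.
DAMAGING = {"nonsense", "frameshift", "splice-site"}


def priority_score(v, max_control_frequency=0.01):
    """Return 0 for excluded variants, otherwise a crude additive score.

    The weights are arbitrary assumptions, not validated values.
    """
    # Hard filters: inheritance pattern and frequency in unaffected individuals.
    if not v.segregates or v.control_frequency > max_control_frequency:
        return 0
    score = 1
    if v.consequence in DAMAGING:
        score += 2
    elif v.consequence == "missense":
        score += 1
    if v.conserved:
        score += 1
    return score


def rank_candidates(variants):
    """Drop excluded variants and sort the rest by descending score."""
    scored = [(priority_score(v), v) for v in variants]
    kept = [(s, v) for s, v in scored if s > 0]
    return [v for s, v in sorted(kept, key=lambda pair: pair[0], reverse=True)]
```

A ranking of this kind only orders candidates for follow-up; it does not establish pathogenicity.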

Establishing the function of human genomic variants
However, the 'prior probability' for the pathogenicity of the majority of non-synonymous variants is not satisfactory, the gray zone of uncertainty is extensive, and most investigators ultimately require experimental evidence for the functionality of each variant. A recently published paper by Davis et al. [5] provides an excellent example of such a functional screening study. The authors studied the TTC21B gene in 753 patients with ciliopathies and 398 controls in order to examine the spectrum and disease contribution of variants. The TTC21B gene encodes the IFT139 protein, which is involved in retrograde intraflagellar transport in cilia and negatively modulates Sonic Hedgehog signal transduction [6]. Forty non-synonymous variants of TTC21B were identified in patients, and all of these were studied in a functional assay using zebrafish embryos to establish pathogenicity. Briefly, the embryonic phenotype associated with reduced levels of the zebrafish TTC21B ortholog can be rescued using human TTC21B mRNA. Different mRNAs carrying HTS-identified non-synonymous variants of this gene either failed to rescue or partially or completely rescued the phenotype; these represent functional null, hypomorphic and benign alleles, respectively.
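The three-way reading of a rescue assay (no rescue, partial rescue, complete rescue) can be sketched in Python as follows. This is an illustration under assumptions, not the scoring procedure of Davis et al. [5]: the normalized rescue fraction, the function names and the 0.1 and 0.9 cutoffs are all hypothetical, and a real analysis would rest on replicate experiments and statistical testing.

```python
def rescue_fraction(affected_knockdown, affected_wildtype, affected_variant):
    """Normalize a variant's rescue to a 0-1 scale (hypothetical metric).

    Each argument is the fraction of embryos showing the phenotype:
    with knockdown alone, with knockdown plus wild-type human mRNA,
    and with knockdown plus variant mRNA.
    0 means no better than knockdown alone; 1 means as good as wild type.
    """
    span = affected_knockdown - affected_wildtype
    if span <= 0:
        raise ValueError("wild-type mRNA must reduce the affected fraction")
    raw = (affected_knockdown - affected_variant) / span
    # Clip so that experimental noise cannot push the value outside 0-1.
    return min(1.0, max(0.0, raw))


def classify_allele(fraction, null_max=0.1, benign_min=0.9):
    """Map a rescue fraction to an allele class using assumed cutoffs."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("rescue fraction must lie between 0 and 1")
    if fraction <= null_max:
        return "functional null"   # variant mRNA fails to rescue
    if fraction >= benign_min:
        return "benign"            # variant mRNA rescues like wild type
    return "hypomorphic"           # variant mRNA rescues only partially
```

For example, with invented numbers, a variant that leaves 50% of embryos affected when knockdown alone gives 80% and wild-type mRNA gives 20% has a rescue fraction of about 0.5 and would be labeled hypomorphic under these assumed cutoffs.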
The functional studies provided evidence for TTC21B causative variants in ciliopathies such as Jeune asphyxiating thoracic dystrophy (JATD) and nephronophthisis (NPHP); furthermore, other TTC21B variants function as modifier alleles in additional ciliopathies. The functional evidence for each allelic variant is pivotal in the understanding of the observed phenotype. A caveat, however, is that we cannot always predict the effect of a variant on the human phenotype from the experiments in model organisms. This is even more relevant in cases such as those studied by Davis et al. [5], in which a dysfunctional protein may result in different disorders.
For proteins for which there are functional assays, one could predict that databases will be developed with the functional results for all variants detected for specific proteins. Functional validation of non-synonymous variants could be performed using several laboratory models, using either whole organisms (such as yeast, Drosophila, fish or mice), or cells (such as cell-based models derived from humans or other organisms and in vitro-differentiated cells). The advantage of such functional assays is that they provide not only the functional proof of the pathogenicity of a variant, but also novel insights into protein function and perhaps even the mechanism of disease. Unfortunately, there are no TTC21B-like functional assays for the majority of proteins, and most of the methods to test functionality are not amenable to large-scale screening approaches. Thus, considerable effort should be made to develop large-scale screening assays for all possible non-synonymous variants for all human proteins.

The challenges ahead
This is only the tip of the iceberg for the characterization of pathogenic variants. Assays need to be developed for the assessment of variants in all functional genomic elements outside the protein-coding genes. There is a sea of non-coding transcripts [7,8], hundreds of thousands of genomic regions with potential regulatory function [9], and hundreds of thousands of conserved non-coding regions with unknown but presumably important function [10,11]. This substantial fraction of the genome, for which we do not know the functional rules and constraints, could harbor variants for which functional assays need to be developed. This is obviously a major obstacle in the evaluation of the majority of the genomic variability. It is expected that the technology used in the ENCODE [9] and other projects will enhance our knowledge of the functional elements of our genomes. In addition, it is well known that the contribution of pathogenic variants to the phenotype is modified by the overall genomic variability of each individual, a notion in genetics known as 'penetrance'. Thus, an experimentally proven pathogenic allele may result in a phenotype in some individuals, but not in others.
We now have the ability to read almost entire individual genomes in a reasonable time-frame, and this is cause for celebration. On the other hand, the daunting task in front of us is the functional understanding of the extensive genomic variation (common and rare) that now populates the hard disks of supercomputers and biobanks. The next decade at the leading edge of genetic medicine will certainly be dedicated to this effort. And as the new graduate students and physicians in training now realize: sequencing is simple; functional understanding is still a dream.