Exome sequencing resolves apparent incidental findings and reveals further complexity of SH3TC2 variant alleles causing Charcot-Marie-Tooth neuropathy
© Lupski et al.; licensee BioMed Central Ltd. 2013
Received: 11 March 2013
Accepted: 27 June 2013
Published: 27 June 2013
The debate regarding the relative merits of whole genome sequencing (WGS) versus exome sequencing (ES) centers around comparative cost, average depth of coverage for each interrogated base, and their relative efficiency in the identification of medically actionable variants from the myriad of variants identified by each approach. Nevertheless, few genomes have been subjected to both WGS and ES, using multiple next generation sequencing platforms. In addition, no personal genome has been so extensively analyzed using DNA derived from peripheral blood as opposed to DNA from transformed cell lines that may either accumulate mutations during propagation or clonally expand mosaic variants during cell transformation and propagation.
We investigated a genome that was studied previously by SOLiD chemistry using both ES and WGS, and now performed six independent ES assays (Illumina GAII (x2), Illumina HiSeq (x2), Life Technologies' Personal Genome Machine (PGM) and Proton), and one additional WGS (Illumina HiSeq).
We compared the variants identified by the different methods and provide insights into the differences among variants identified between ES runs in the same technology platform and among different sequencing technologies. We resolved the true genotypes of medically actionable variants identified in the proband through orthogonal experimental approaches. Furthermore, ES identified an additional SH3TC2 variant (p.M1?) that likely contributes to the phenotype in the proband.
ES identified additional medically actionable variant calls and helped resolve ambiguous single nucleotide variants (SNVs), documenting the power of increased depth of coverage of the captured targeted regions. Comparative analyses of WGS and ES reveal that pseudogenes and segmental duplications may explain some instances of apparent disease mutations in unaffected individuals.
Exome sequencing (ES) is an approach to human genome analysis and genetic variant detection that focuses on the coding exons of genes and closely linked functional elements. ES is less expensive than comparable approaches based on whole genome sequencing (WGS), because there is overall less raw DNA sequence data required. Furthermore, coding regions contain the changes that are currently the most easily interpretable, and knowledge gained from ES may therefore have more immediate medical utility and application. There is a generally held belief, however, that WGS may be more informative than ES, even within the boundaries of exome regions, as the random distribution of individual sequence reads offers an overall higher likelihood of effective testing of individual sites in the genome. Further, biases inherent to the capture process may result in missing or poorly covered bases in a subset of the exome regions.
We previously applied WGS using massively parallel next-generation sequencing (NGS) methods to a subject with molecularly undefined, autosomal recessive, Charcot-Marie-Tooth (CMT) neuropathy and identified apparently causative compound heterozygous SH3TC2 variants: nonsense p.R954X (g. chr5: 148,406,435 G>A) and missense p.Y169H alleles (g. chr5: 148,422,281 A>G). The two alleles satisfied multiple criteria for disease causation: both were validated with alternate technologies; the nonsense p.R954X allele had been previously reported in unrelated CMT families and corroborated with functional studies; each segregated faithfully in the pedigree and no other lesion was discovered in the locus despite adequate coverage of all coding bases with NGS reads. Further, each allele faithfully independently segregated with additional different mild phenotypes observed in other family members - susceptibility to carpal tunnel syndrome (presumably due to loss-of-function nonsense mutation) and an electrophysiologically identified subclinical axonopathy (missense allele).
In the period since the initial CMT study, technologies for both WGS and ES have advanced considerably. We developed a series of improved ES methods and reagents that reliably assay >95% of the coding regions of the human genome [5, 6]. The current ES assay design involves routine 'oversampling' of the targeted regions, and ensures that a minimum of 95% (average 97%) of these positions are tested with >20-fold NGS coverage. The new methods also take advantage of longer individual DNA sequence reads generated on the Illumina HiSeq platform (2 × 100 bp) versus the previous SOLiD platform (2 × 50 bp) used for WGS.
This study conforms to the Helsinki Declaration regarding ethical principles for medical research involving human subjects. Subject BAB195 and additional family members were recruited and consented for genomic and DNA studies under a research protocol approved by the Institutional Review Board of Baylor College of Medicine. All patients gave informed consent for participation in this study.
DNA preparation methods
Blood was directly collected in PAXgene Blood DNA tubes and DNA was isolated using the PAXgene Blood DNA kit (PreAnalytiX, Qiagen, Valencia, CA, USA). The quality of the DNA sample was ascertained by electrophoresis and determined to be of high quality (size >23 kb) with no visible degradation. Quantity was determined using standard Pico Green assays.
Description of NimbleGen VCRome 2.1 exome capture design
In the current methods, the HGSC-design 'VCRome 2.1' reagent was used for exome capture. This NimbleGen probe set targets the Vega, CCDS, and RefSeq gene models, including predicted genes within RefSeq, as well as microRNA (miRNA) and regulatory regions for a total target size of 42 Mb. Thus, exome capture sequencing will identify SNPs and indels in protein coding regions, exon flanking sequences (including intron splice sites), regulatory regions, and small non-coding RNAs (for example, microRNAs).
Illumina exome sequencing on GAII and HiSeq
Genomic DNA samples were constructed into Illumina PairEnd pre-capture libraries according to the manufacturer's protocol (Illumina Multiplexing_SamplePrep_Guide_1005361_D) with modifications as described in the BCM-HGSC Illumina Barcoded Paired-End Capture Library Preparation protocol. The complete library and capture protocol, as well as oligonucleotide sequences are accessible from the HGSC website.
Briefly, 1 ug of genomic DNA in 100 uL volume was sheared into fragments of approximately 300 base pairs in a Covaris microTube with the E2 system (Covaris, Inc., Woburn, MA, USA) followed by end-repair, A-tailing, and ligation of the Illumina multiplexing PE adaptors. Pre-capture Ligation Mediated-PCR (LM-PCR) was performed for seven cycles of amplification using the 2X SOLiD Library High Fidelity Amplification Mix (custom product manufactured by Invitrogen). Universal primer IMUX-P1.0 and pre-capture barcoded primer IBC were used in the PCR amplification. Purification was performed with Agencourt AMPure XP beads. Quantification and size distribution of the pre-capture LM-PCR product was determined using the Agilent Bioanalyzer 2100 DNA 7500 chip.
Pre-capture libraries (1 ug) were hybridized in solution to the VCRome 2.1 exome capture platform (HGSC design, NimbleGen) described above according to the manufacturer's protocol NimbleGen SeqCap EZ Exome Library SR User's Guide (Version 2.2) with minor revisions. Human CotI DNA and full-length Illumina adaptor-specific blocking oligonucleotides were added into the hybridization mix to block repetitive genomic sequences and the adaptor sequences. Post-capture LM-PCR amplification was performed using the 2X SOLiD Library High Fidelity Amplification Mix with 14 cycles of amplification. After the final XP bead purification, quantity and size of the capture library was analyzed using the Agilent Bioanalyzer 2100 DNA Chip 7500. The efficiency of the capture was evaluated by performing a qPCR-based quality check on the four standard NimbleGen internal controls. Successful enrichment of the capture libraries corresponded to a ΔCt of 7 to 9 relative to the non-enriched samples.
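To give a sense of scale for the qPCR quality check, a ΔCt can be converted to an approximate fold enrichment. The sketch below assumes ideal amplification (a doubling of product per cycle), which the text does not state; the function name and the `efficiency` parameter are illustrative only and are not part of the NimbleGen protocol.

```python
def fold_enrichment(delta_ct: float, efficiency: float = 1.0) -> float:
    """Approximate fold enrichment implied by a qPCR delta-Ct.

    efficiency = 1.0 means perfect doubling per cycle (an assumption);
    real assays usually amplify somewhat less efficiently.
    """
    return (1.0 + efficiency) ** delta_ct


if __name__ == "__main__":
    # Delta-Ct range reported for the capture libraries
    for delta_ct in (7, 8, 9):
        print(delta_ct, fold_enrichment(delta_ct))
```

Under the ideal-doubling assumption, a ΔCt of 7 to 9 would correspond to roughly 128-fold to 512-fold enrichment of the control loci.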
Illumina sequencing on GAIIx and HiSeq 2000
Library templates were prepared for sequencing using Illumina's cBot cluster generation system with the corresponding TruSeq PE Cluster Kits for the GA and HiSeq. Briefly, these libraries were denatured with sodium hydroxide and diluted to 3 to 6 pM in hybridization buffer in order to achieve a load density of 400 to 550 k clusters/mm2 on the GAIIx and 700 to 900 k clusters/mm2 on the HiSeq 2000. Barcoded libraries were loaded in two independent lanes of a GAIIx flow cell and then merged for analysis. Due to the higher capacity of the HiSeq 2000, each barcoded sample was pooled (in a set of three) for loading onto a single lane of a HiSeq flowcell and then deconvoluted for analysis based on barcode sequence. All lanes were spiked with 2% phage phiX DNA control library for run quality control. After loading onto the flow cell, sample libraries underwent bridge amplification to form clonal clusters, and the sequencing primer was hybridized.
Sequencing runs were performed in paired-end mode using the Illumina Genome Analyzer IIx (GAIIx) and HiSeq 2000 platforms. Using the TruSeq SBS Kits for the GA and the HiSeq, sequencing-by-synthesis reactions were extended for 101 cycles from each end, with an additional seven cycles for the index read for runs that included pooled samples. Real-time analysis (RTA) software was used to process the image analysis and base calling. Sequencing runs generated approximately 40-65 million and 350-500 million successful reads per lane for the GAIIx and HiSeq, respectively. With these run yields, each sample library achieved 10 to 12 Gb of raw DNA sequence data, which enabled a minimum of 20x coverage for 95% to 97% of the bases targeted in the exome.
Illumina whole genome sequencing on HiSeq 2000
The WGS library was sequenced with the Illumina HiSeq 2000 platform using the template preparation and sequencing methods described above for the HiSeq 2000, with the following exception: a total of three flow cell lanes were loaded, yielding approximately 148 Gb of sequence for the WGS dataset.
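As a rough consistency check, the sequence yield can be converted to an expected average depth of coverage. The sketch below assumes a haploid genome size of about 3.1 Gb (an approximation that is not given in the text) and ignores unmapped and duplicate reads, so it is an upper-bound estimate rather than the measured coverage.

```python
def mean_depth(yield_bases: float, target_bases: float) -> float:
    """Naive average depth: total sequenced bases divided by target size."""
    return yield_bases / target_bases


# Assumed approximate size of the human reference genome (not from the text)
ASSUMED_GENOME_BASES = 3.1e9

# WGS yield reported for the three HiSeq 2000 lanes
wgs_depth = mean_depth(148e9, ASSUMED_GENOME_BASES)

if __name__ == "__main__":
    print(f"Estimated WGS depth: {wgs_depth:.1f}x")
```

The estimate falls between 47x and 48x, in line with the approximately 47x depth of coverage cited for the Illumina HiSeq WGS in the Results.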
Illumina analysis and SNP calls
Illumina sequence analysis was performed using the HGSC Mercury analysis pipeline that manages all aspects of data processing and analyses, moving data step by step through various analysis tools from the initial sequence generation on the instrument to annotated variant calls (SNPs and intra-read in/dels). First, the primary analysis software on the instrument produces .bcl files that are transferred off-instrument into the HGSC analysis infrastructure by the HiSeq Real-time Analysis module (1.13.48). Once the run is complete and all .bcl files are transferred, Mercury runs the vendor's primary analysis software (CASAVA v1.8.0), which de-multiplexes pooled samples and generates sequence reads and base-call confidence values (qualities). The next step is the mapping of reads and qualities to the GRCh37 human reference genome assembly using the Burrows-Wheeler aligner (BWA [13, 14]) to produce a BAM (binary alignment/map) file. The third step involves quality recalibration (using GATK [16, 17]) and, where necessary, the merging of separate sequence-event BAMs into a single sample-level BAM. BAM sorting, duplicate read marking, and realignment to improve in/del discovery all occur at this step. Next, we used the Atlas2 suite (Atlas-SNP and Atlas-indel) to call variants and produce a variant call file (VCF). Finally, annotation data is added to the VCF using the Cassandra suite of annotation tools that brings together relevant annotation information using ANNOVAR with UCSC and RefSeq gene models, as well as a host of other internal and external data resources.
Ion Torrent exome sequencing using PGM and Proton
Library construction: The pre-capture libraries for the Ion Torrent platforms (PGM and Proton) were constructed using the Ion Xpress Plus Fragment Library Kit (Life Technologies) and the experimental procedures followed the manufacturer's Ion Xpress Plus gDNA and Amplicon Library Preparation User Guide with minor modifications.
The PGM pre-capture library was generated using 1 ug of genomic DNA which was fragmented to approximately 200 base pairs by the Covaris S2 system (Covaris, Inc., Woburn, MA, USA) and purified with 1.2x Agencourt Ampure XP Beads (Beckman Coulter, Cat. No. A63882). Fragmentation was followed by end-repair, blunt-end ligation of the Ion Xpress Barcode and Ion P1 adaptors as well as nick translation. Pre-capture ligation-mediated PCR (LM-PCR) was performed for eight cycles of amplification using 2X SOLiD Library High Fidelity Amplification Mix (custom product by Invitrogen).
The Proton pre-capture library was generated in a similar manner as above starting with fragmentation of genomic DNA to 100 base pairs using the Covaris S2 system and purified with 1.8x Agencourt Ampure XP Beads. Blunt-end ligation was performed using the non-barcoded Ion A and Ion P1 adaptors. Post-ligation size selection was performed using a 3% agarose gel cassette with the target size of 180 bp. Final pre-capture LM-PCR was performed again using 2X SOLiD Library High Fidelity Amplification Mix (custom product by Invitrogen), for 12 cycles of amplification. The above enzymatic reactions for both the Ion Torrent PGM and Proton libraries were followed by AMPure XP bead purification (1.4x for the PGM library and 1.8x for the Proton library, respectively). Quantity and size distribution of the PCR product were analyzed using the Agilent Bioanalyzer 2100 DNA 7500 chip.
Capture methods: For target enrichment, 1 ug of each pre-capture library was hybridized in solution to the previously described HGSC VCRome 2.1 exome capture design following the Manual of Ion TargetSeq Custom Enrichment Kits with minor modifications. Human CotI DNA and adaptor A and P1-specific blocking oligonucleotides were added into the hybridization reactions to block repetitive genomic sequences and the common adaptor sequences. Post-capture LM-PCR amplification was performed using 2X SOLiD Library High Fidelity Amplification Mix with 14 to 16 cycles of amplification. Quantities and sizes of the capture libraries were analyzed using the Agilent Bioanalyzer 2100 DNA Chip 7500. The efficiency of the capture was again evaluated by performing a qPCR-based quality check on the four standard NimbleGen internal controls. Successful enrichment of the capture libraries corresponded to a ΔCt of 7 to 9 relative to the non-enriched samples.
Template preparation and sequencing: Library templates were prepared for sequencing using the Life Technologies Ion Xpress and Ion OneTouch protocols and reagents. Briefly, library fragments were clonally amplified onto ion sphere particles (ISPs) through emulsion PCR and then enriched for template-positive ISPs. More specifically, PGM emulsion PCR reactions utilized the Ion Xpress 200 Template Kit (Life Technologies) as specified by the vendor. Emulsions were generated with an IKA Ultra-Turrax, amplification followed using standard thermal cycling methods, and the ISPs were recovered with a SOLiD emulsion collection tray (Life Technologies) through centrifugation. In some instances, these amplification and recovery steps were automated for the PGM reactions using the Ion OneTouch System and the Ion OneTouch Template Kit v2, and similarly, the Proton emulsion PCR reactions were performed using the Ion Proton I Template OT2 Kit (Life Technologies), with amplification and recovery steps automated with the Ion OneTouch 2 System. Following recovery, enrichment was completed by selectively binding the ISPs containing amplified library fragments to streptavidin coated magnetic beads, removing empty ISPs through washing steps, and denaturing the library strands to allow for collection of the template-positive ISPs. For all reactions, these steps were accomplished using the Life Technologies ES module of the Ion OneTouch 2 System, and template-positive ISPs were quantified using the Guava EasyCyte 5 (Millipore Technologies), obtaining >90% enrichment efficiency for all reactions.
For PGM runs, approximately 35 million template-positive ISPs per run were deposited onto the Ion 318C chips (Life Technologies) by a series of centrifugation steps that incorporated alternating the chip directionality. Sequencing was performed with the Ion PGM 200 Sequencing Kit (Life Technologies) using the 440 flow ('200 bp') run format. For the single Proton run, the Proton PI Chip was first pre-rinsed and incubated with NaOH for 1 min before loading in order to minimize residual contaminants and decrease background signal. Approximately 300-350 million template-positive ISPs were deposited onto the Proton PI Chip and then underwent sequencing using the Proton's 260 flow ('100 bp') run format and the Life Technologies Ion Proton I Sequencing Kit. A total of nine PGM runs and one Proton run were sequenced, which yielded a total of 6.2 and 7.9 Gb of data, respectively. Analyses showed that 91% to 92% of the targeted exome bases were covered to a depth of at least 20x.
Ion Torrent analysis methods: Sequence data from either Ion Torrent Personal Genome Machine (PGM) or Proton were processed on the instrument server using the Torrent Suite v3.2 software package. Processing included signal processing, base calling, and mapping. Reads were mapped to the GRCh37 human reference genome assembly via the TMAP short read aligner. In the case of the Proton platform, reads identified with the same start coordinate and 3' adaptor flow position were considered duplicates and were removed via a proprietary method in the Torrent Suite. For the PGM, duplicate reads were identified solely by start position and were removed using the Picard MarkDuplicates tool. Where multiple sequencing runs were generated on the same library, BAM files were combined into a single BAM file with Picard MergeSamFiles prior to identifying and removing duplicates.
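The idea of identifying duplicates by start position can be illustrated with a toy example. This is a simplified sketch and not the algorithm used by Picard MarkDuplicates or the Torrent Suite; the `Read` record, the function name, and the choice to include strand in the key are all assumptions made for illustration.

```python
from typing import Iterable, List, NamedTuple


class Read(NamedTuple):
    """Hypothetical minimal read record (not a real BAM record)."""
    name: str
    chrom: str
    start: int
    strand: str


def drop_start_duplicates(reads: Iterable[Read]) -> List[Read]:
    """Keep the first read seen for each (chrom, start, strand) key.

    Later reads sharing the same key are treated as duplicates.
    Including strand in the key is an assumption of this sketch.
    """
    seen = set()
    kept = []
    for read in reads:
        key = (read.chrom, read.start, read.strand)
        if key in seen:
            continue  # same start position already represented
        seen.add(key)
        kept.append(read)
    return kept


example_reads = [
    Read("r1", "chr5", 100, "+"),
    Read("r2", "chr5", 100, "+"),  # duplicate of r1
    Read("r3", "chr5", 100, "-"),  # same start, opposite strand
    Read("r4", "chr5", 250, "+"),
]

if __name__ == "__main__":
    print([r.name for r in drop_start_duplicates(example_reads)])
```

Production tools consider additional information, such as base qualities, when choosing which read of a duplicate set to retain; keeping the first read encountered is only the simplest possible policy.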
Single nucleotide variants (SNVs) were identified using VarIONt, an extensible and highly configurable variant caller pipeline developed at the HGSC and specifically tuned for Ion Torrent sequence. It utilizes SAMtools pileup to generate a pileup string at every mapped base coordinate. It filters out portions of the read that may be unreliable, and then identifies alternate alleles from the remainder. Part of VarIONt's capability is to define thresholds at which variants are called. For this application, we required that the interrogation site have a minimum read coverage of 10X and a minor allele fraction (maf) of at least 0.05. High confidence variants must have a minimum read depth of 30X and a maf of at least 0.1. Once the variants were called, the previously described Cassandra annotation suite was used to annotate the identified variants.
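The two-tier thresholds described above can be expressed as a small decision function. This is a minimal sketch of the stated cutoffs only, not VarIONt's implementation; the function name, the category labels, and the treatment of the thresholds as inclusive are assumptions.

```python
def classify_site(depth: int, alt_reads: int) -> str:
    """Classify a site using the coverage and allele-fraction cutoffs in the text.

    Returns one of three illustrative labels:
      "no_call"         - below 10x coverage or below 0.05 allele fraction
      "high_confidence" - at least 30x coverage and at least 0.1 allele fraction
      "low_confidence"  - called, but not meeting the high-confidence cutoffs
    """
    if depth < 10:
        return "no_call"
    fraction = alt_reads / depth
    if fraction < 0.05:
        return "no_call"
    if depth >= 30 and fraction >= 0.1:
        return "high_confidence"
    return "low_confidence"


if __name__ == "__main__":
    for depth, alt in [(8, 4), (20, 2), (40, 8), (100, 3)]:
        print(depth, alt, classify_site(depth, alt))
```

Separating a permissive calling threshold from a stricter high-confidence threshold allows low-fraction candidates to be retained for inspection without being reported with the same weight as well-supported calls.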
SOLiD whole genome sequencing
SOLiD data used for comparative analyses, referred to here as SOLiD WGS, were originally obtained as specified in Lupski et al. In summary, whole genome sequence data were generated at 29.6x average depth of coverage using the Sequencing by Oligonucleotide Ligation and Detection (SOLiD) technology (Life Technologies, formerly Applied Biosystems), with a mappable yield of 89.6 Gb. Mapping and variant calling analyses were performed using the analysis suite Corona Lite.
This study has been registered at the National Center for Biotechnology Information (NCBI) under BioProject ID 203659 (Accession: PRJNA203659) for BioSample: SAMN00009513. Data are publicly available through the Sequence Read Archive (SRA) under accession numbers: SRX286243, SRX286245, SRX286282, SRX286417, SRX286419, SRX286832.
Data analyses and comparison of variants between the multiple sequencing runs were performed using custom Perl scripts.
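The custom Perl scripts are not reproduced here. As an illustration of the kind of comparison involved, the following Python sketch computes concordance between two call sets, keying each variant on chromosome, position, reference allele, and alternate allele. The function name, the key definition, and the toy variants are assumptions for illustration and are not data from this study.

```python
from typing import Set, Tuple

# (chromosome, position, reference allele, alternate allele)
Variant = Tuple[str, int, str, str]


def concordance(calls_a: Set[Variant], calls_b: Set[Variant]) -> Tuple[int, float]:
    """Return the number of shared variants and that number as a
    percentage of the calls in the first set."""
    shared = calls_a & calls_b
    percent = 100.0 * len(shared) / len(calls_a) if calls_a else 0.0
    return len(shared), percent


# Toy call sets with made-up coordinates
run_a = {
    ("chr1", 1000, "A", "G"),
    ("chr1", 2000, "C", "T"),
    ("chr2", 500, "G", "A"),
    ("chr3", 750, "T", "C"),
}
run_b = {
    ("chr1", 1000, "A", "G"),
    ("chr1", 2000, "C", "T"),
    ("chr2", 500, "G", "A"),
    ("chr4", 100, "A", "T"),
}

if __name__ == "__main__":
    shared_count, percent_of_a = concordance(run_a, run_b)
    print(shared_count, percent_of_a)
```

Expressing concordance relative to one call set makes the statistic asymmetric, so the choice of denominator should be stated whenever such a percentage is reported.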
Confirmation of variants
Variants of interest were confirmed by Sanger sequencing of amplified PCR products. Primers specific to the region containing the variant to be tested were designed. Standard end-point PCR was performed using QIAGEN HotStar Taq polymerase (QIAGEN Sciences, Maryland, USA). For ABCD1 fragment amplification, long-range PCR was performed using the QIAGEN Long-range PCR mix. Amplification of specific PCR fragments was confirmed by agarose gel electrophoresis. Endonuclease restriction digestion was performed to orthogonally test the genotype and segregation of some of the variants of interest. The amplified PCR products were digested using specific restriction enzymes (New England BioLabs) to identify the site of the mutations to be tested. Specific endonuclease digestion was verified by agarose gel electrophoresis.
Results and Discussion
SNV identified by replicated ES of CMT proband.
Reported metrics: total reads produced; duplicate reads (%); total reads aligned (%); aligned reads on target (%); targets hit (%); targeted bases with 10+, 20+, and 40+ coverage (%).
Of note, comparison between the two WGS approaches, one being the original SOLiD WGS at approximately 30x average depth of coverage and the second one the Illumina HiSeq WGS at approximately 47x depth of coverage resulted in 3,090,120 concordant SNPs across the whole genome between the two experiments (90.34% of the original SOLiD WGS SNPs).
Resolution of incidental findings from WGS
Observed pharmacogenetic variants by WGS versus ES.
Surviving entries: T/T (r = 2); C/T (r = 65).
Elsewhere, the greater sensitivity of the ES data resolved false positive calls from the WGS. For example, the dominant ABCD1 ALD allele that had been previously reported was resolved as 'wild-type' (normal) with the further sequencing on the HiSeq platform. Examination of the individual sequence reads in the original WGS data showed that the available data had only six reads with three reference reads (two unique) and three variant reads (three unique). In contrast, the new ES data provided 232 reads at this base position. Thirty-five of these reads did contain the variant; however, the mapping algorithm (BWA) that was used to align the reads also showed all 35 were distinct, providing evidence for non-unique alignment. Hence, these 35 reads were more likely to represent other related genomic loci, such as a pseudogene or segmental duplications of the genomic interval containing this gene, and not the true ABCD1 locus.
Identification of a complex SH3TC2 allele associated with CMT
Comparison of SH3TC2 alleles in six exome sequencing experiments and one whole-genome sequencing.
Retrospective analysis of the prior WGS SOLiD data revealed that the total accumulated DNA sequence reads covering this site had satisfied the applied filters (total of 15 unique sequencing reads covering position chr5:148,442,585). The fraction of those reads, however, that represented the variant allele was below the accepted threshold (two reads of the total 15, 13.3% versus the filter's cutoff >20% variant allele fraction). Hence, the site was adequately covered overall, but not by enough reads containing the variant allele for the bioinformatic algorithm to call it a variable position. Others have suggested an intrinsic allelic bias in the ligation based SOLiD sequencing method; however, this individual example can be readily explained stochastically [27, 28].
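The arithmetic behind this observation can be made explicit. The sketch below computes the variant allele fraction and, under an idealized assumption that each read at a true heterozygous site samples either allele independently with probability 0.5, the probability of seeing two or fewer variant reads among 15. The binomial model and the function names are assumptions added for illustration; the model ignores capture, mapping, and chemistry biases. `math.comb` requires Python 3.8 or later.

```python
from math import comb


def variant_fraction(alt_reads: int, total_reads: int) -> float:
    """Fraction of reads at a site that carry the variant allele."""
    return alt_reads / total_reads


def passes_fraction_filter(alt_reads: int, total_reads: int, cutoff: float = 0.20) -> bool:
    """True if the variant allele fraction exceeds the cutoff (>20% in the text)."""
    return variant_fraction(alt_reads, total_reads) > cutoff


def prob_at_most(k: int, n: int, p: float = 0.5) -> float:
    """Binomial probability of observing k or fewer variant reads out of n."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


if __name__ == "__main__":
    print(f"{100 * variant_fraction(2, 15):.1f}%")
    print(passes_fraction_filter(2, 15))
    print(f"{prob_at_most(2, 15):.4f}")
```

Under this idealized model the probability is 121/32768, or about 0.4% per site. That is small for any single site, but when the same test is applied across the many heterozygous positions in a genome, some sites would be expected to fall below the allele fraction cutoff by chance alone.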
These new data regarding SH3TC2 variant alleles suggest three possibilities: (1) the newly identified p.M1? variant is the causative allele in cis with a benign p.Y169H; (2) the p.M1? variant alone has little effect because of possible re-initiation of translation from an alternative methionine initiation codon 90 amino acids downstream and hence pathogenicity of this allele results from the downstream p.Y169H; or (3) the CMT causality requires the combination of variants in the complex allele (p.M1?; p.Y169H) plus the co-segregating nonsense allele p.R954X. This latter hypothesis would be consistent with a 'mutational load' model, in which a partial loss-of-function hypomorphic complex allele is independently segregating with the electrophysiologically identified axonopathy as observed for the homozygous PMP22 T118M mutation. Rare variant alleles and combinations of such alleles at a locus may aggregate from parental contributions or may occur by new mutations within recent ancestors; a concept referred to as clan genomics. It is distinctly possible that the p.M1? variant occurred de novo on a haplotype that contains p.Y169H within the clan. Of note, while the p.M1? variant is novel, since our original report the p.Y169H variant was observed in additional individuals. However, it is always observed in the heterozygous state and never as a homozygous variant, making its functional significance elusive.
Western blotting of lymphoblastoid cell lines derived from the subject and family members carrying different variant alleles at the SH3TC2 locus identified, using a specific anti-SH3TC2 antibody, only one smaller band of a size consistent with the protein product of a predicted spliced variant of SH3TC2 (Additional File 1). Thus, in these lymphoblastoid cell lines from the subject and other family members segregating the alleles, no evidence of a truncated protein, perhaps reflecting nonsense mediated decay of the nonsense mutation bearing transcript, nor evidence of a predicted sized protein due to re-initiation of translation from an internal AUG initiation codon could be obtained.
These new findings from ES studies further support the original study conclusions of: (1) pathogenic involvement of the SH3TC2 gene; and (2) spurious interpretations of 'incidental' findings that may occur based upon current database entries. Further, these new data illuminate how structural variation unique to individuals from clinical populations might challenge interpretation of some variant alleles.
Variant call comparison across all platforms for variants mentioned throughout the text.
The authors would like to thank Dr. Arthur Beaudet and Dr. Pawel Stankiewicz for their critical review of the manuscript; and Joep de Ligt for helpful discussions of analysis results. This work was supported in part by US National Institute of Neurological Disorders and Stroke (NINDS) grant R01NS058529, and US National Human Genome Research Institute (NHGRI) grants U54HG006542 and U54HG003273.
- Gonzaga-Jauregui C, Lupski JR, Gibbs RA: Human genome sequencing in health and disease. Annu Rev Med. 2012, 63: 35-61. 10.1146/annurev-med-051010-162644.
- Lupski JR, Reid JG, Gonzaga-Jauregui C, Rio Deiros D, Chen DC, Nazareth L, Bainbridge M, Dinh H, Jing C, Wheeler DA, McGuire AL, Zhang F, Stankiewicz P, Halperin JJ, Yang C, Gehman C, Guo D, Irikat RK, Tom W, Fantin NJ, Muzny DM, Gibbs RA: Whole-genome sequencing in a patient with Charcot-Marie-Tooth neuropathy. N Engl J Med. 2010, 362: 1181-1191. 10.1056/NEJMoa0908094.
- Dvoráková L, Storkánová G, Unterrainer G, Hujová J, Kmoch S, Zeman J, Hrebícek M, Berger J: Eight novel ABCD1 gene mutations and three polymorphisms in patients with X-linked adrenoleukodystrophy: The first polymorphism causing an amino acid exchange. Hum Mutat. 2001, 18: 52-60. 10.1002/humu.1149.
- Tachi N, Kikuchi S, Kozuka N, Nogami A: A new mutation of IGHMBP2 gene in spinal muscular atrophy with respiratory distress type 1. Pediatr Neurol. 2005, 32: 288-290. 10.1016/j.pediatrneurol.2004.11.003.
- Bainbridge MN, Wang M, Burgess DL, Kovar C, Rodesch MJ, D'Ascenzo M, Kitzman J, Wu YQ, Newsham I, Richmond TA, Jeddeloh JA, Muzny D, Albert TJ, Gibbs RA: Whole exome capture in solution with 3 Gbp of data. Genome Biol. 2010, 11: R62. 10.1186/gb-2010-11-6-r62.
- Bainbridge MN, Wang M, Wu Y, Newsham I, Muzny DM, Jefferies JL, Albert TJ, Burgess DL, Gibbs RA: Targeted enrichment beyond the consensus coding DNA sequence exome reveals exons with higher variant densities. Genome Biol. 2011, 12: R68. 10.1186/gb-2011-12-7-r68.
- Wilming LG, Gilbert JG, Howe K, Trevanion S, Hubbard T, Harrow JL: The vertebrate genome annotation (Vega) database. Nucleic Acids Res. 2008, 36 (Database): D753-D760.
- Griffiths-Jones S, Saini HK, van Dongen S, Enright AJ: miRBase: tools for microRNA genomics. Nucleic Acids Res. 2008, 36 (Database): D154-D158.
- Li H, Durbin R: Fast and accurate short read alignment with Burrows-Wheeler Transform. Bioinformatics. 2009, 25: 1754-1760. 10.1093/bioinformatics/btp324.
- Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, Marth G, Abecasis G, Durbin R, 1000 Genome Project Data Processing Subgroup: The Sequence alignment/map (SAM) format and SAMtools. Bioinformatics. 2009, 25: 2078-2079. 10.1093/bioinformatics/btp352.
- DePristo M, Banks E, Poplin R, Garimella K, Maguire J, Hartl C, Philippakis A, del Angel G, Rivas MA, Hanna M, McKenna A, Fennell T, Kernytsky A, Sivachenko A, Cibulskis K, Gabriel S, Altshuler D, Daly M: A framework for variation discovery and genotyping using next-generation DNA sequencing data. Nat Genet. 2011, 43: 491-498. 10.1038/ng.806.
- Challis D, Yu J, Evani US, Jackson AR, Paithankar S, Coarfa C, Milosavljevic A, Gibbs RA, Yu F: An integrative variant analysis suite for whole exome next-generation sequencing data. BMC Bioinformatics. 2012, 13: 8. 10.1186/1471-2105-13-8.
- Danecek P, Auton A, Abecasis G, Albers CA, Banks E, DePristo MA, Handsaker RE, Lunter G, Marth GT, Sherry ST, McVean G, Durbin R, 1000 Genomes Project Analysis Group: The variant call format and VCFtools. Bioinformatics. 2011, 27: 2156-2158. 10.1093/bioinformatics/btr330.
- Bainbridge MN, Wiszniewski W, Murdock DR, Friedman J, Gonzaga-Jauregui C, Newsham I, Reid JG, Fink JK, Morgan MB, Gingras MC, Muzny DM, Hoang LD, Yousaf S, Lupski JR, Gibbs RA: Whole-genome sequencing for optimized patient management. Sci Transl Med. 2011, 3: 87re3. 10.1126/scitranslmed.3002243.
- Wang K, Li M, Hakonarson H: ANNOVAR: Functional annotation of genetic variants from next-generation sequencing data. Nucleic Acids Res. 2010, 38: e164. 10.1093/nar/gkq603.
- Braun A, Kammerer S, Ambach H, Roscher AA: Characterization of a partial pseudogene homologous to the adrenoleukodystrophy gene and application to mutation detection. Hum Mutat. 1996, 7: 105-108. 10.1002/(SICI)1098-1004(1996)7:2<105::AID-HUMU3>3.0.CO;2-B.
- Hedges DJ, Guettouche T, Yang S, Bademci G, Diaz A, Andersen A, Hulme WF, Linker S, Mehta A, Edwards YJ, Beecham GW, Martin ER, Pericak-Vance MA, Zuchner S, Vance JM, Gilbert JR: Comparison of three targeted enrichment strategies on the SOLiD sequencing platform. PLoS One. 2011, 6: e18595. 10.1371/journal.pone.0018595.
- McKernan KJ, Peckham HE, Costa GL, McLaughlin SF, Fu Y, Tsung EF, Clouser CR, Duncan C, Ichikawa JK, Lee CC, Zhang Z, Ranade SS, Dimalanta ET, Hyland FC, Sokolsky TD, Zhang L, Sheridan A, Fu H, Hendrickson CL, Li B, Kotler L, Stuart JR, Malek JA, Manning JM, Antipova AA, Perez DS, Moore MP, Hayashibara KC, Lyons MR, Beaudoin RE, et al: Sequence and structural variation in a human genome uncovered by short-read, massively parallel ligation sequencing using two-base encoding. Genome Res. 2009, 19: 1527-1541. 10.1101/gr.091868.109.
- Shroyer NF, Lewis RA, Yatsenko AN, Lupski JR: Null missense ABCR (ABCA4) mutations in a family with Stargardt disease and retinitis pigmentosa. Invest Ophthalmol Vis Sci. 2001, 42: 2757-2761.
- Shy ME, Scavina MT, Clark A, Krajewski KM, Li J, Kamholz J, Kolodny E, Szigeti K, Fischer RA, Saifi GM, Scherer SS, Lupski JR: T118M PMP22 mutation causes partial loss of function and HNPP-like neuropathy. Ann Neurol. 2006, 59: 358-364. 10.1002/ana.20777.
- Lupski JR, Belmont JW, Boerwinkle E, Gibbs RA: Clan genomics and the complex architecture of human disease. Cell. 2011, 147: 32-43. 10.1016/j.cell.2011.09.008.
- Lifton RP: Individual genomes on the horizon. N Engl J Med. 2010, 362: 1235-1236. 10.1056/NEJMe1001090.
- Nelson MR, Wegmann D, Ehm MG, Kessner D, St Jean P, Verzilli C, Shen J, Tang Z, Bacanu SA, Fraser D, Warren L, Aponte J, Zawistowski M, Liu X, Zhang H, Zhang Y, Li J, Li Y, Li L, Woollard P, Topp S, Hall MD, Nangle K, Wang J, Abecasis G, Cardon LR, Zöllner S, Whittaker JC, Chissoe SL, Novembre J, et al: An abundance of rare functional variants in 202 drug target genes sequenced in 14,002 people. Science. 2012, 337: 100-104. 10.1126/science.1217876.
- Clark MJ, Chen R, Lam HY, Karczewski KJ, Chen R, Euskirchen G, Butte AJ, Snyder M: Performance comparison of exome DNA sequencing technologies. Nat Biotechnol. 2011, 29: 908-914. 10.1038/nbt.1975.
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.