BAIT: Organizing genomes and mapping rearrangements in single cells
© Hills et al.; licensee BioMed Central Ltd. 2013
Received: 29 May 2013
Accepted: 9 September 2013
Published: 13 September 2013
Strand-seq is a single-cell sequencing technique to finely map sister chromatid exchanges (SCEs) and other rearrangements. To analyze these data, we introduce BAIT, software that assigns templates and identifies and localizes SCEs. We demonstrate that BAIT can refine completed reference assemblies, identifying approximately 21 Mb of incorrectly oriented fragments and placing over half (2.6 Mb) of the orphan fragments in mm10/GRCm38. BAIT also stratifies scaffold-stage assemblies, potentially accelerating the assembly and finishing of reference genomes. BAIT is available at http://sourceforge.net/projects/bait/.
SCEs are the outcome of the repair of double-strand breaks, and their accumulation is an early indicator of genomic instability. Strand-seq data allow these events to be identified and mapped at unprecedented resolution. The frequency of SCEs has been used as a surrogate for assessing the toxicity of mutagens, and as a diagnostic marker for disorders such as Bloom's syndrome, which are characterized by a high frequency of SCEs. Strand-seq can also detect translocations, inversions, deletions, and amplifications. Deletions and amplifications present as a loss or gain of reads over particular regions and locate to the same region in every library, making them easy to identify. Translocations and inversions appear identical to SCE events in individual libraries (Figure 1c), but can be resolved when event locations are compiled across multiple libraries, because they all occur over the same region. Preliminary data suggest that this approach works well for identifying and localizing chromosomal abnormalities (manuscript in preparation). Strand-seq can further be applied to estimate the frequency of genomic rearrangements in a heterogeneous population of cells.
We showed previously that Strand-seq can also be used to correct misoriented portions of the mouse reference assemblies. Reference assemblies have become essential tools for aligning sequences and identifying variations, and thus a complete and accurate reference genome is essential for any organism of interest. At present, a variety of organisms have been targeted for genome sequencing projects, and more established genomes are continually being updated. For example, the mouse reference genome was first published in 2002, and has been periodically updated with more complete and corrected assembly versions. Most such iterations of reference assemblies contain both gaps of unknown length within the sequence (typically regions that are difficult to sequence) and 'orphan scaffolds' that have yet to be mapped to particular chromosomes or to regions on specific chromosomes (these are likely to map within gaps, and lack the tiling needed to form contiguous sequences). Although PCR-based approaches, forms of restriction mapping [10, 11] and optical mapping can be used to bridge these gaps or connect orphan scaffolds, there are still 628 gaps and 44 orphan scaffolds in the latest mouse reference assembly (GRCm38/mm10), and 357 gaps and 65 orphan scaffolds in the latest iteration of the human assembly (GRCh37/hg19). Many of the gaps are unbridged, representing spaces of unknown length in the genome build, and, importantly, the relative orientation of the sequences on either side of these gaps is also unknown. Furthermore, many early-build genome projects are underway, most of which remain at the contig stage, consisting of thousands of contiguous sequences that are unplaced with respect to each other and not localized to any chromosome.
With recent efforts aiming to rapidly generate reference genomes for 10,000 organisms [13, 14], alternative approaches for building the thousands of contigs of scaffold-level genomes into usable reference assemblies are urgently needed, and here we show that Strand-seq can play a pivotal role in this.
Strand-seq has many applications in the study of tumor heterogeneity and evolution and of genome instability in diseases of aging, as well as enormous potential for rapidly building and refining the growing repertoire of reference assemblies. It is also an efficient technique, with the ability to sequence up to 200 indexed libraries simultaneously on a single lane. However, analyzing Strand-seq features across such large datasets requires an intuitive software package that automates the process. Here we describe new open source software, Bioinformatic Analysis of Inherited Templates (BAIT), which builds upon our previously described plotting function and enables high-throughput analysis of Strand-seq data. BAIT is a command-line application for UNIX platforms, available under the two-clause Berkeley Software Distribution (BSD) license.
Data management and processing
BAIT provides a core framework for Strand-seq analysis, including functionality to plot W and C template strands, count aneuploid chromosomes, and map and enumerate SCE events (see Additional file 1: Figure S1). Extending these core functions for genome assembly, BAIT leverages strand-inheritance data to identify misoriented contigs, localize orphan scaffolds to specific chromosome regions on late-build genomes, and assemble early-build genomes de novo from non-overlapping fragments, using only one lane of sequencing containing up to 200 indexed libraries. In concert with Strand-seq, BAIT has major applications in detecting SCEs, analyzing sister chromatid segregation, and building and finishing genome assemblies.
BAIT accepts sequencing data in BAM format and parses it with SAMtools to remove duplicate reads, apply a quality threshold, and determine read direction. These data are then passed to multiple R scripts (incorporating packages from Bioconductor), which bin the data (200 kb windows by default), compute strand inheritance, perform SCE analysis, and plot chromosome ideograms showing read density, directionality, and predicted SCE events (Figure 1). Additional command-line options provide alternate forms of output, additional plotting parameters, and conversion of the data into BED files that are auto-formatted for upload to the UCSC genome browser using the BEDtools package.
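As a rough illustration of the binning step, the hypothetical Python sketch below (not BAIT's own R code) counts W and C reads in fixed windows, using the 200 kb default stated above, and labels each window WW, WC, or CC from its fraction of W reads. It assumes reads have already been deduplicated and quality-filtered and that each read has been labeled W or C from its direction; the function names and the 0.2/0.8 cut-offs are assumptions, not BAIT parameters.

```python
from collections import Counter

BIN_SIZE = 200_000  # BAIT's default window size, per the text


def bin_reads(reads, bin_size=BIN_SIZE):
    """Count W and C reads per window.

    reads: iterable of (position, strand) with strand 'W' or 'C'
    (assumed already deduplicated and quality-filtered).
    Returns {bin_index: Counter({'W': n, 'C': m})}.
    """
    bins = {}
    for pos, strand in reads:
        bins.setdefault(pos // bin_size, Counter())[strand] += 1
    return bins


def call_state(counts, low=0.2, high=0.8):
    """Label one window by its W-read fraction.

    low/high are hypothetical cut-offs chosen for illustration only.
    """
    total = counts['W'] + counts['C']
    if total == 0:
        return None  # no reads, no call
    w_frac = counts['W'] / total
    if w_frac >= high:
        return 'WW'
    if w_frac <= low:
        return 'CC'
    return 'WC'
```

Thresholding the W fraction is the simplest possible state caller; a real pipeline would also need to account for the library background discussed next.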
The ability of BAIT to accurately assess SCE events and perform genome build analyses can be confounded by technical variability in the Strand-seq protocol, including spurious or constant low-background reads, or variable read depths. Much of this variability presumably arises from variation in BrdU uptake by the cell and in the subsequent removal of the BrdU-incorporated (non-template) strand from the pre-amplified library. To aid decisions on removing low-quality libraries from further analysis, BAIT calculates a library background metric: it first performs an unfiltered prediction of strand inheritance, then computes the library background as the average frequency of spurious non-template-strand reads (C reads on chromosomes that inherited homozygous W template strands, and vice versa). This value is reported as a background percentage on each library ideogram.
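A minimal Python sketch of the background calculation described above, assuming per-chromosome W/C read counts and an unfiltered first-pass state call per chromosome. Reads on the minority strand of a homozygous (WW or CC) chromosome are counted as spurious; heterozygous chromosomes are skipped because they carry no background signal. Averaging the per-chromosome fractions is our reading of "average frequency", and the function name and input layout are hypothetical.

```python
def library_background(chrom_counts, chrom_states):
    """Percentage of spurious non-template reads in one library.

    chrom_counts: {chrom: (w_reads, c_reads)}
    chrom_states: {chrom: 'WW' | 'WC' | 'CC'} from an unfiltered first pass.
    Returns None if no homozygous chromosome is available.
    """
    fractions = []
    for chrom, state in chrom_states.items():
        w, c = chrom_counts[chrom]
        if state == 'WC' or w + c == 0:
            continue  # heterozygous chromosomes cannot reveal background
        spurious = c if state == 'WW' else w  # reads on the "wrong" strand
        fractions.append(spurious / (w + c))
    if not fractions:
        return None
    return 100 * sum(fractions) / len(fractions)
```

For example, a WW chromosome with 95 W and 5 C reads and a CC chromosome with 2 W and 98 C reads give a background of about 3.5%.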
A summary file is also generated (see Additional file 2: Supplemental Data File 1), including the frequency of WW, WC, and CC template inheritance for each intact chromosome, for the analysis of sister chromatid segregation. The distributions of template strands are presented as pie charts, showing the P-value significance of χ2 tests after Holm correction. BAIT also plots the template inheritance across each bin of every chromosome (see Additional file 2: Supplemental Data File 1), and creates BED files of the locations of all SCE events, which are useful for subsequent analyses of Strand-seq data, such as mapping SCEs and genomic rearrangements.
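The segregation statistics can be sketched as follows. For each chromosome, a χ2 goodness-of-fit test compares the observed WW:WC:CC counts with a 1:2:1 ratio, which is what independent inheritance of each homolog's template would give under random segregation; that this is BAIT's null hypothesis is our assumption. Holm's step-down procedure then adjusts the P values. With three categories the test has two degrees of freedom, so the P value has the closed form exp(-χ2/2) and no statistics library is needed. Function names are hypothetical.

```python
import math


def chi2_121(ww, wc, cc):
    """Chi-square goodness of fit against a 1:2:1 WW:WC:CC ratio.

    Assumes at least one library (ww + wc + cc > 0).
    Returns (statistic, p_value); df = 2, so p = exp(-stat / 2).
    """
    n = ww + wc + cc
    expected = (n / 4, n / 2, n / 4)
    stat = sum((o - e) ** 2 / e for o, e in zip((ww, wc, cc), expected))
    return stat, math.exp(-stat / 2)  # survival function of chi2 with 2 df


def holm(pvalues):
    """Holm step-down adjusted P values, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # Multiply the k-th smallest P value by (m - k) and enforce monotonicity.
        running_max = max(running_max, (m - rank) * pvalues[i])
        adjusted[i] = min(1.0, running_max)
    return adjusted
```

For instance, 40 WW, 20 WC, and 40 CC libraries give χ2 = 36, a strong departure from random segregation.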
The 62 Strand-seq libraries used in this study are publicly available from the Sequence Read Archive (SRA055924), and have been published previously. BAIT took 81 minutes to process these libraries, which had an average of 3,235,111 reads each, using a single core of an Intel i7-870 2.93 GHz processor on a computer with 16 GB of RAM.
Detection of sister chromatid exchanges, misorientations, and genomic rearrangements
BAIT first makes gross event calls using the circular binary segmentation algorithm, as implemented in the Bioconductor copy number variation (CNV) package DNAcopy, to locate each SCE event to a two-bin interval. It then recalculates the template-strand ratio by segmenting this interval into five new bins (80 kb each with the default bin size), narrowing the location of the SCE interval further. BAIT applies this binning-based DNAcopy detection method iteratively, decreasing the bin size by a factor of five each time (Figure 2b), until the read density is no longer sufficient to make accurate calls (that is, when an interval has fewer than 50 reads, or when DNAcopy can no longer predict a single event; Figure 2c). To identify SCE events that fall on bin boundaries, BAIT pads each interval by one-half of the interval length in each direction (Figure 2b,c; red arrows).
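The iterative narrowing can be sketched in Python as below. This is not BAIT's method: DNAcopy's circular binary segmentation is swapped out for a much simpler stand-in, namely choosing the bin boundary with the largest difference in W-read fraction between its two sides. The new interval keeps one bin on each side of that boundary, which is our simplified reading of the padding rule. The five-bin split and the 50-read stopping rule follow the text; everything else, including the function name, is an assumption.

```python
def refine_sce_interval(reads, start, end, n_bins=5, min_reads=50):
    """Narrow an SCE interval [start, end) by repeated re-binning.

    reads: list of (position, strand) pairs, strand 'W' or 'C'.
    Each round splits the interval into n_bins bins, picks the boundary
    with the largest change in W-read fraction (a stand-in for DNAcopy),
    and keeps the two bins flanking it. Stops when fewer than min_reads
    reads remain or bins would be narrower than 1 bp.
    """
    while True:
        inside = [(p, s) for p, s in reads if start <= p < end]
        width = (end - start) // n_bins
        if len(inside) < min_reads or width < 1:
            return start, end
        best_boundary, best_diff = None, -1.0
        for k in range(1, n_bins):
            boundary = start + k * width
            left = [s for p, s in inside if p < boundary]
            right = [s for p, s in inside if p >= boundary]
            if not left or not right:
                continue
            diff = abs(left.count('W') / len(left) - right.count('W') / len(right))
            if diff > best_diff:
                best_boundary, best_diff = boundary, diff
        if best_boundary is None:
            return start, end
        # One bin of padding on each side of the chosen boundary.
        start, end = best_boundary - width, best_boundary + width
```

On a simulated 10 kb region that switches from WW to WC at position 6,000, this narrows the interval to a few hundred base pairs around the switch before the read count drops below 50.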
BAIT then refines the gross interval using a simple walker algorithm that analyzes reads starting from the homozygous state, and reports the first read on the opposite template that represents a switch to a heterozygous state (Figure 2c; green box). For each candidate read, the walker checks that the 10 preceding reads map to the homozygous state, and that at least 4 of the 20 following reads map to the opposite template state (Figure 2c). If these criteria are not met, as may be the case where the background is high, BAIT continues to walk across the interval until they are. These checks improved the localization of SCE events (see Additional file 3: Figure S2), and varying these thresholds changed the results little. Through this two-step process, BAIT automatically detects and localizes SCEs with a high degree of confidence, plots them on ideograms, and creates a UCSC-formatted BED file of all SCE event intervals.
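The walker's acceptance rule translates almost directly into code. In this hypothetical Python sketch, `strands` is the sequence of read strands in genomic order across the refined interval, starting on the homozygous side; the 10-read, 4-of-20 thresholds are the ones given in the text, while the function name and the choice to count "following" reads after (not including) the candidate are assumptions.

```python
def walk_to_switch(strands, hom='W', n_before=10, n_after=20, min_opposite=4):
    """Index of the first read marking a homozygous-to-heterozygous switch.

    strands: read strands ('W'/'C') in genomic order across the interval.
    hom: strand of the homozygous side ('W' for WW, 'C' for CC).
    A candidate is an opposite-strand read whose n_before preceding reads
    are all on `hom` and whose next n_after reads contain at least
    min_opposite opposite-strand reads. Returns None if no read passes.
    """
    opp = 'C' if hom == 'W' else 'W'
    for i in range(n_before, len(strands)):
        if strands[i] != opp:
            continue
        before = strands[i - n_before:i]
        after = strands[i + 1:i + 1 + n_after]
        if all(s == hom for s in before) and after.count(opp) >= min_opposite:
            return i
    return None  # keep walking failed: no convincing switch in this interval
```

An isolated background C read inside a WW run is rejected because too few C reads follow it, and the walker moves on to the next candidate.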
BAIT amalgamates all called SCE events across libraries to identify any locations associated with multiple SCE events. It reports any SCE-like event that occurs over the same interval in more than one library as a potential structural (genomic rearrangement) event, and calculates its number of occurrences. Events occurring at the same location in multiple libraries either are regions of recurrent SCE, or represent translocations, deletions, or inversions (Figure 1c). In addition, duplications are identified using the CNV function across each chromosome, and chromosomal aneuploidy is calculated by comparing the read depth of each chromosome with the average read depth within the (diploid) library. A chromosomal read depth of half the library average corresponds to a single copy (monosomy), whereas 1.5× the library average corresponds to three copies (trisomy).
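Both steps in the paragraph above can be sketched in a few lines of hypothetical Python. `copy_number` applies the stated rule (half the library average means one copy, 1.5× means three) by scaling a diploid baseline; it assumes depth has already been normalized for chromosome length. `recurrent_events` merges overlapping SCE-like intervals across libraries with a simple sort-and-sweep, which is our choice of method, and reports clusters seen in more than one library.

```python
def copy_number(chrom_depth, ploidy=2):
    """Copies per chromosome from length-normalized read depth.

    chrom_depth: {chrom: reads per Mb}. The library mean is taken as the
    diploid baseline, so 0.5x mean -> 1 copy and 1.5x mean -> 3 copies.
    """
    mean_depth = sum(chrom_depth.values()) / len(chrom_depth)
    return {c: round(ploidy * d / mean_depth) for c, d in chrom_depth.items()}


def recurrent_events(events):
    """Group overlapping SCE-like intervals from different libraries.

    events: list of (library, chrom, start, end).
    Returns [(chrom, start, end, n_libraries)] for clusters in > 1 library.
    """
    clusters = []
    for lib, chrom, start, end in sorted(events, key=lambda e: (e[1], e[2])):
        last = clusters[-1] if clusters else None
        if last and last[0] == chrom and start < last[2]:
            last[2] = max(last[2], end)  # extend the current cluster
            last[3].add(lib)
        else:
            clusters.append([chrom, start, end, {lib}])
    return [(c, s, e, len(libs)) for c, s, e, libs in clusters if len(libs) > 1]
```

Because the library mean includes any aneuploid chromosomes, a median baseline would be more robust for highly aneuploid cells; the text describes the mean, so the sketch uses it.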
Whereas SCEs show a transition from a homozygous to a heterozygous template state (WW to WC, or CC to WC) in Strand-seq libraries, transitions between the two homozygous template states (WW to CC and CC to WW) identify misoriented fragments in the reference genome. Previously, we manually identified and localized these events to unbridged gaps, and confirmed a subset of misorientations by hybridization of directional probes. BAIT distinguishes these events from SCEs, and writes their locations to a separate CSV file. Because a misorientation in the reference genome invariably presents as a template-strand switch in every Strand-seq library, BAIT also computes the concordance across all libraries as a measure of the robustness of each misorientation call. Because BAIT already calculates chromosomal aneuploidy, an SCE event on a monosomic chromosome (W to C or C to W) will not be erroneously called as a misorientation (WW to CC or CC to WW).
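The distinction between SCEs and misorientations can be sketched as a small classifier, together with a per-site concordance score. This hypothetical Python sketch uses the copy-number result so that a W-to-C change on a monosomic chromosome is treated as an SCE. In the concordance, libraries that are WC on the left flank are skipped, on the reasoning that a flipped WC region still reads as WC and so cannot reveal the flip; that filtering is our assumption, not a documented BAIT rule.

```python
HOMOZYGOUS = ('WW', 'CC')


def classify_switch(before, after, copies=2):
    """Label a change in template state between two adjacent regions.

    States are 'WW'/'WC'/'CC' for disomic chromosomes, 'W'/'C' when copies == 1.
    """
    if before == after:
        return None
    if copies == 1:
        return 'SCE'  # W <-> C on a monosomic chromosome is an ordinary SCE
    if before in HOMOZYGOUS and after in HOMOZYGOUS:
        return 'misorientation'  # WW <-> CC
    return 'SCE'  # homozygous <-> WC


def misorientation_concordance(calls):
    """Fraction of informative libraries showing a WW <-> CC flip at a site.

    calls: per-library (before, after) pairs, or None where no call was made.
    Libraries that are WC on the left flank are skipped (uninformative).
    """
    informative = [c for c in calls if c is not None and c[0] in HOMOZYGOUS]
    if not informative:
        return 0.0
    flips = sum(classify_switch(b, a) == 'misorientation' for b, a in informative)
    return flips / len(informative)
```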
Stratification of early-build genome assemblies
Early-build genome assemblies consist of many contigs, which are effectively unanchored and unordered. However, performing Strand-seq on cells derived from organisms with early assemblies will yield directional strand information for each contig, and contigs residing on the same chromosome will inherit the same templates. Contigs from different chromosomes inherit template strands independently, and by chance their templates will be the same in only half of all libraries. In contrast, adjacent contigs will inherit the same template strands across all libraries. By comparing all contigs with one another, it is possible to cluster them into putative chromosomes based on the concordance between them.
BAIT initially excludes libraries where every contig has inherited WC templates (probably a failed Strand-seq library), as well as individual contigs that have inherited WC templates in all libraries (probably a contig with degenerate sequences that cannot be placed). It then uses a two-stage approach to assemble the remaining contigs into a putative assembly. First, it clusters all contigs with highly similar template inheritance into linkage groups that represent individual chromosomes. It does this by comparing the two contigs represented across the most libraries, and assessing template-strand concordance between them; if they share a high concordance, they are classified together in a single linkage group, otherwise they are classified into separate linkage groups. Each remaining contig in the assembly is individually compared with the groups already assigned, and is then either added to a linkage group if it shares a high similarity with that group, or is classified into a new linkage group if it does not. This process continues until all contigs have been stratified into linkage groups or classified as single unlinked contigs. Ideally, the number of linkage groups is equal to double the number of chromosomes within the organism (a plus-strand and minus-strand linkage group for each chromosome).
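The first stage (filtering, then greedy linkage grouping) can be sketched in Python as follows. The filters match the text: libraries in which every called contig is WC are dropped, then contigs that are WC in every remaining library. Contigs called in the most libraries are placed first, and each subsequent contig joins the first group whose mean concordance with it meets a threshold, or else starts a new group. The 0.9 threshold, the use of mean concordance against all group members, and the function names are assumptions for illustration.

```python
def concordance(a, b):
    """Fraction of co-called libraries in which two contigs share a template call."""
    pairs = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not pairs:
        return 0.0
    return sum(x == y for x, y in pairs) / len(pairs)


def cluster_contigs(states, threshold=0.9):
    """Greedy linkage grouping of contigs by template-strand concordance.

    states: {contig: [per-library call 'WW', 'WC', 'CC', or None]}
    Returns (linkage_groups, excluded_contigs).
    """
    n_libs = len(next(iter(states.values())))
    # Drop libraries in which every called contig is WC (likely failed).
    keep = [i for i in range(n_libs)
            if any(s[i] not in ('WC', None) for s in states.values())]
    filtered = {c: [s[i] for i in keep] for c, s in states.items()}
    # Drop contigs that are WC (or uncalled) in every remaining library.
    excluded = [c for c, s in filtered.items() if all(x in ('WC', None) for x in s)]
    candidates = [c for c in filtered if c not in excluded]
    # Contigs called in the most libraries are placed first.
    candidates.sort(key=lambda c: -sum(x is not None for x in filtered[c]))
    groups = []
    for contig in candidates:
        for group in groups:
            mean_conc = sum(concordance(filtered[contig], filtered[m])
                            for m in group) / len(group)
            if mean_conc >= threshold:
                group.append(contig)
                break
        else:
            groups.append([contig])  # no group matched: start a new one
    return groups, excluded
```

Because a contig assembled in reverse orientation shows CC wherever its neighbors show WW, it falls into a separate group, which is consistent with the expectation of a plus-strand and a minus-strand linkage group per chromosome.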
The second stage of BAIT scaffolding is performed individually on each linkage group (putative chromosome) by analyzing the contigs within it. These contigs are compared with each other, and a relative order is computed based on template-strand concordance. If a chromosome had no SCEs in any of the libraries analyzed, every contig from that chromosome will share an identical template-strand inheritance, and their order cannot be determined. However, because SCEs switch template-strand inheritance along chromosomes, every SCE event will switch template strands along a linkage group (LG), and therefore stratify the contigs within it. A single SCE event will split an LG into a cluster of contigs with homozygous WW or CC template inheritance on one side of the SCE event, and a cluster of contigs with heterozygous WC templates on the other side. In this way, the cumulative SCEs on any particular chromosome can be compiled across all libraries to help order contigs within the LG.
Similar to how meiotic recombination is used to create a genetic linkage map between loci, SCE events along the chromosome can be used to determine a genetic distance between contigs on the same chromosome, allowing them to be ordered. Adjacent contigs have a lower probability of an SCE between them, and thus a higher chance of inheriting the same template strands across all libraries, than contigs at opposite ends of the chromosome, which are far more likely to have an SCE event between them. BAIT uses template-strand inheritance and SCE localization to build an inter-contig distance matrix for each linkage group. Then, using a traveling salesman algorithm (analogous to finding the shortest route that visits each of several destinations exactly once), BAIT calculates the shortest path through the distance matrix of each chromosome, thereby inferring the relative order of contigs within a linkage group.
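A minimal Python sketch of the ordering step, under simplifying assumptions. The distance between two contigs is taken as the fraction of co-called libraries in which their template calls differ, which uses template-strand inheritance only and ignores the SCE localization that BAIT also incorporates. The shortest open path is then found by brute force over all permutations; that is exact but only practical for a handful of contigs, and it is not the solver BAIT uses (the cited Croes method is a heuristic for larger instances). Function names are hypothetical.

```python
from itertools import permutations


def discordance_matrix(states):
    """Pairwise distance = fraction of co-called libraries with differing calls.

    states: {contig: [per-library call or None]}. Returns (names, matrix).
    """
    names = list(states)

    def dist(a, b):
        pairs = [(x, y) for x, y in zip(states[a], states[b])
                 if x is not None and y is not None]
        return sum(x != y for x, y in pairs) / len(pairs) if pairs else 1.0

    return names, [[dist(a, b) for b in names] for a in names]


def shortest_open_path(dist):
    """Exact shortest Hamiltonian path by brute force (small groups only)."""
    n = len(dist)

    def cost(path):
        return sum(dist[path[i]][path[i + 1]] for i in range(n - 1))

    return list(min(permutations(range(n)), key=cost))
```

A path and its reverse have the same cost, so the inferred order is relative: Strand-seq alone does not say which end of the linkage group is which.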
Stratification of late-build genome assemblies
By comparing the template states of orphan scaffolds with those of each chromosome across a batch of libraries, BAIT localizes these scaffolds to particular chromosomes. For each orphan scaffold with sufficient reads, BAIT assigns a template state, compares it against the template state of each chromosome within a particular library, and then iterates this process to compute the concordance across all libraries. In practice, concordance is never 100%, owing to libraries with high background, orphan scaffolds with too few reads to call strands accurately, SCE events within gaps between the scaffolds, and the 5 to 10% error rate of BAIT in SCE detection. Nevertheless, BAIT still achieves high-quality predictions of scaffold location by taking the highest-concordance chromosome. Chromosomes are further split based on SCE locations, allowing orphan scaffolds to be localized to particular chromosomal regions (Figure 4). Because orphan scaffolds are likely to be located within gap regions rather than within contiguous sequence, BAIT can use a user-provided BED-format gap file to cross-reference all mapped orphan scaffold locations with gaps within the same interval. BAIT outputs a BED file containing both the best predicted region for each fragment and any candidate gaps within that region.
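The highest-concordance rule can be sketched as below. The hypothetical Python function scores an orphan scaffold's per-library template calls against each chromosome (or chromosome region) and keeps the best match. Because an orphan scaffold's orientation relative to the chromosome is unknown, the sketch also scores it with WW and CC swapped; that orientation handling is our assumption rather than a documented BAIT step.

```python
FLIP = {'WW': 'CC', 'CC': 'WW', 'WC': 'WC', None: None}


def place_scaffold(scaffold, chromosomes):
    """Return (best_chromosome, concordance, flipped) for an orphan scaffold.

    scaffold: per-library template calls ('WW'/'WC'/'CC'/None).
    chromosomes: {name: per-library calls for that chromosome or region}.
    The scaffold is also scored in reverse orientation (WW <-> CC).
    """
    def conc(a, b):
        pairs = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
        return sum(x == y for x, y in pairs) / len(pairs) if pairs else 0.0

    flipped = [FLIP[s] for s in scaffold]
    best = None
    for name, calls in chromosomes.items():
        for is_flipped, states in ((False, scaffold), (True, flipped)):
            score = conc(states, calls)
            if best is None or score > best[1]:
                best = (name, score, is_flipped)
    return best
```

Splitting chromosomes at SCE locations, as described above, amounts to passing regions rather than whole chromosomes in `chromosomes`.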
Results and discussion
Accurate localization and mapping of SCEs
To assess its ability to computationally identify SCE events, BAIT predictions were compared with 528 SCE events from 62 murine embryonic stem cell Strand-seq libraries that had previously been identified manually. Manual processing of SCE events involved uploading BED-formatted Strand-seq data into the UCSC genome browser and identifying the interval at which the templates switch. Initial comparisons showed that although BAIT identified over 97% of the SCEs called manually, it also had a high false-discovery rate. To reduce this rate, we incorporated a user-adjustable threshold that excludes any bins whose read depth deviates from the average, that is, bins with fewer or more reads than expected.
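A minimal Python sketch of such a depth filter, assuming the threshold is expressed as a fold range around the mean bin count; the `low` and `high` values are hypothetical defaults, not BAIT's.

```python
def filter_bins_by_depth(bin_counts, low=0.5, high=2.0):
    """Indices of bins whose read count lies within [low, high] x the mean."""
    mean = sum(bin_counts) / len(bin_counts)
    return [i for i, n in enumerate(bin_counts) if low * mean <= n <= high * mean]
```

With counts of roughly 100 reads per bin plus one empty bin and one bin of 500 reads, only the two outliers are dropped. Since extreme bins inflate the mean itself, a median baseline is a more robust alternative when outliers are frequent.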
Of the correctly identified SCE events, a comparison of the location of the SCE interval between automated and manual calls showed a median difference of just 34 bp (see Additional file 3: Figure S2). Almost two-thirds (65.8%) of the predictions were within 100 bp of the manual calls, with 74.7% of predictions within 10 kb. A summary of SCE distribution across all libraries was plotted, together with a histogram reporting the distance between events, helping to identify significant clustering of SCEs (see Additional file 2: Supplemental Data File 1). The accurate identification of SCEs is also important for the functions of BAIT which assemble and refine reference genomes (see sections below).
BAIT facilitates SCE analyses by rapidly counting and locating events, providing a pipeline that can be incorporated into high-throughput strategies. BAIT accurately refines the interval between reads in which the template switch occurs, allowing regions with a high propensity to undergo SCE to be identified (for example, fragile sites or sites of recurrent DNA damage). Accurate interval identification is also important when looking for genomic rearrangements such as translocations, and BAIT is able to detect these and assign a frequency to each rearrangement within the pool of libraries, requiring a far lower read depth than conventional split-pair read sequencing. A caveat to these analyses is that SCEs and genomic rearrangements are more difficult to detect on chromosomes present in more than two copies per cell, potentially limiting the use of BAIT in highly polyploid cancer cells. Taken together, our results show that BAIT is very accurate and efficient at predicting SCE intervals, and will be indispensable for future high-throughput analysis of Strand-seq data.
Improving early-stage reference genome builds
To test the ability of BAIT to build genomes de novo, we realigned our libraries to the first build of the mouse genome (MGSCv3). Of the 224,713 contigs in this assembly version, we included in the analysis the 77,258 that were over 10 kb, representing 2,006 Mb of DNA (81.0% of the total assembly). After remerging and reorienting similar clusters, BAIT assigned 54,832 contigs, representing 1,742 Mb (64.9%) of the assembly, into 20 primary LGs (Figure 3a). Allosomes in these male-derived ESCs are effectively monosomic, so contigs derived from the sex chromosomes can be identified separately, as they inherit only a single W or C template strand, never both. After the locations of MGSCv3 contigs were cross-referenced to GRCm38/mm10 coordinates, the majority of LGs clustered to only one chromosome (see Additional file 4: Figure S3), and the majority of chromosomes consisted of only one linkage group (Figure 3b). When more than one chromosome was attributed to the same linkage group, these groups could be split into two subclusters (see Additional file 4: Figure S3).
Similar results were seen when we simulated an early-stage reference by splitting the GRCm38/mm10 genome into a scaffold of the 403 chromosomal Giemsa bands (based on coordinates from the UCSC genome browser), and realigned our libraries to this new reference version (see Additional file 5: Figure S4). Using disrupted concordance from SCEs as a genetic distance indicator, it was further possible to infer the relative order of the contigs in each linkage group.
The accuracy of ordering fragments is dependent on the frequency of SCEs, the number of libraries used in the analysis, and the level of library background (high-background libraries are more likely to have incorrect template calls). If the template strands of contigs are identical in all libraries (because no SCE events have occurred between them) their relative order remains unknown.
Taken together, these data show that with only a single lane of sequencing and just 62 Strand-seq libraries, BAIT can aid in the rough draft assembly of a scaffold-level reference genome. Importantly, preliminary sequencing efforts in lesser-studied organisms typically have fewer resources for deep sequencing and for the subsequent curation and refinement of reference genome assemblies. With several ambitious sequencing projects in development, there is an increasing need for rapid and cost-effective construction of accurate and useful reference genomes. Arranging contigs to build chromosome-level and genome-level hierarchy represents an attractive advance toward this goal, especially in conjunction with existing technologies. We have shown that BAIT can effectively 'stitch' contigs together based on shared template inheritance and rapidly construct a useful skeleton assembly that can be built upon, and we believe this technique will be widely adopted in standard genome assembly pipelines.
Refining and finishing completed reference assemblies
We have previously shown using Strand-seq that over 20 Mb of the MGSCv37/mm9 Mus musculus reference assembly is misoriented, involving 17 regions flanked by unbridged gaps. In the more recent GRCm38/mm10 build of the genome, 35% (7,079.49 kb) of these identified misorientations were subsequently corrected, independently validating the Strand-seq calls with other approaches. To identify misorientations in the newest GRCm38/mm10 assembly, we repeated these analyses using the automated function of BAIT, identifying a total of 15 misoriented regions: 5 autosomal, with the remaining 10 on the X chromosome (see Additional file 6: Table S1). Because the X chromosome exists as only one copy (monosomy) in the male embryonic stem cells (ESCs) of our dataset, misorientations on it are indistinguishable from SCEs in individual libraries, and were identified as events occurring over the same region in all libraries (see Additional file 2: Supplemental Data File 1). In this way, using just a single lane of sequencing, we were able to orient the majority of contigs (those larger than 10 kb with minimal segmental duplications) with respect to their flanking contigs. Thus, with Strand-seq and BAIT and relatively low-coverage sequencing, the relative orientation of all reference contigs can be determined, effectively bridging all gaps in an assembly.
Table 1. Locations of unplaced scaffolds on GRCm38/mm10 (columns: scaffold size, kb; localizes to, Mb)
BAIT provides the functionality to realize several powerful and exciting applications of Strand-seq: strand inheritance, SCE analysis, genomic rearrangements, and finishing genomes. With a robust strand-inheritance analysis tool and accurate SCE calling, BAIT is able to interrogate Strand-seq data to follow template-strand segregation patterns, and, together with Strand-seq, is currently the most informative technique for testing such patterns [29–32]. Because it can identify SCE events at kilobase resolution in one cell division (compared with megabase resolution and two cell divisions for standard cytogenetic analysis [33, 34]), Strand-seq offers a unique tool to examine regions of recurrent damage and to enumerate events in cells that have differing genetic backgrounds or have been subjected to different damaging agents. Crucially, these events can be independently assayed and mapped in individual chromosomes at very high resolution without relying on cytogenetic expertise. In addition, we present here a novel use of template-strand analysis to localize fragments and orient contigs, which has yielded a more refined mouse reference assembly with 20.8 Mb of contigs corrected (see Additional file 6: Table S1) and 2.7 Mb of orphan scaffolds localized to specific regions (Table 1). The ability to refine assemblies can be extended to systematically stratify the thousands of scaffolds that make up early-version reference genome endeavors, without the need for overlapping contigs to determine orientation or relative order. Taken together, BAIT will be indispensable for future Strand-seq studies, and we foresee its widespread adoption in a number of applications, most notably for refining and finishing assemblies at various levels of completeness.
Availability and requirements
Project name: BAIT.
Project homepage: http://sourceforge.net/projects/bait/.
Operating system: Linux.
Programming language: BASH and R.
Other requirements: SAMtools version 1.17 or higher, BEDtools version 2.17.0 or higher, R version 3.0 or higher, DNAcopy R package, gplots R package.
License: Two-clause BSD.
Restrictions for non-academics: license needed.
Abbreviations
BAIT: Bioinformatic Analysis of Inherited Templates
BAM: Binary alignment map
BED: Browser Extensible Data
BSD: Berkeley Software Distribution
CNV: Copy number variation
ESC: Embryonic stem cell
GRC: Genome Reference Consortium
NCBI: National Center for Biotechnology Information
SCE: Sister chromatid exchange
UCSC: University of California Santa Cruz
We are grateful for advice and discussion from Ulrike Naumann, Ashley D Sanders, Elizabeth A Chavez, Duncan F Locke and Steven SS Poon. We thank Marianna Bevova for critical review of the manuscript. The Lansdorp laboratory is supported by grants from the Canadian Institutes of Health Research (RMF-92093 and 105265), the US National Institutes of Health (R01GM094146), and the Terry Fox Foundation (018006). PML is a recipient of an Advanced Grant from the European Research Council.
- Falconer E, Hills M, Naumann U, Poon SS, Chavez EA, Sanders AD, Zhao Y, Hirst M, Lansdorp PM: DNA template strand sequencing of single-cells maps genomic rearrangements at high resolution. Nat Methods. 2012, 9: 1107-1112. 10.1038/nmeth.2206.
- Falconer E, Lansdorp PM: Strand-seq: a unifying tool for studies of chromosome segregation. Semin Cell Dev Biol. 2013.
- Aguilera A, Gomez-Gonzalez B: Genome instability: a mechanistic view of its causes and consequences. Nat Rev Genet. 2008, 9: 204-217. 10.1038/nrg2268.
- Wilson DM, Thompson LH: Molecular mechanisms of sister-chromatid exchange. Mutat Res. 2007, 616: 11-23. 10.1016/j.mrfmmm.2006.11.017.
- Wu L: Role of the BLM helicase in replication fork management. DNA Repair (Amst). 2007, 6: 936-944. 10.1016/j.dnarep.2007.02.007.
- Nagarajan N, Pop M: Sequence assembly demystified. Nat Rev Genet. 2013, 14: 157-167.
- Pagani I, Liolios K, Jansson J, Chen IM, Smirnova T, Nosrat B, Markowitz VM, Kyrpides NC: The Genomes OnLine Database (GOLD) v. 4: status of genomic and metagenomic projects and their associated metadata. Nucleic Acids Res. 2012, 40: D571-D579. 10.1093/nar/gkr1100.
- Waterston RH, Lindblad-Toh K, Birney E, Rogers J, Abril JF, Agarwal P, Agarwala R, Ainscough R, Alexandersson M, An P, et al: Initial sequencing and comparative analysis of the mouse genome. Nature. 2002, 420: 520-562. 10.1038/nature01262.
- Tettelin H, Radune D, Kasif S, Khouri H, Salzberg SL: Optimized multiplex PCR: efficiently closing a whole-genome shotgun sequencing project. Genomics. 1999, 62: 500-507. 10.1006/geno.1999.6048.
- Samad A, Huff EF, Cai W, Schwartz DC: Optical mapping: a novel, single-molecule approach to genomic analysis. Genome Res. 1995, 5: 1-4. 10.1101/gr.5.1.1.
- Marra MA, Kucaba TA, Dietrich NL, Green ED, Brownstein B, Wilson RK, McDonald KM, Hillier LW, McPherson JD, Waterston RH: High throughput fingerprint analysis of large-insert clones. Genome Res. 1997, 7: 1072-1084.
- Levy-Sakin M, Ebenstein Y: Beyond sequencing: optical mapping of DNA in the age of nanotechnology and nanoscopy. Curr Opin Biotechnol. 2013, 24: 690-698. 10.1016/j.copbio.2013.01.009.
- Genome 10K Community of Scientists: Genome 10K: a proposal to obtain whole-genome sequence for 10,000 vertebrate species. J Hered. 2009, 100: 659-674.
- Bernardi G, Wiley EO, Mansour H, Miller MR, Orti G, Haussler D, O'Brien SJ, Ryder OA, Venkatesh B: The fishes of Genome 10K. Mar Genomics. 2012, 7: 3-6.
- BAIT. http://sourceforge.net/p/bait/wiki/Home/.
- Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, Marth G, Abecasis G, Durbin R: The Sequence Alignment/Map format and SAMtools. Bioinformatics. 2009, 25: 2078-2079. 10.1093/bioinformatics/btp352.
- Gentleman RC, Carey VJ, Bates DM, Bolstad B, Dettling M, Dudoit S, Ellis B, Gautier L, Ge Y, Gentry J, et al: Bioconductor: open software development for computational biology and bioinformatics. Genome Biol. 2004, 5: R80-R80.16. 10.1186/gb-2004-5-10-r80.
- Quinlan AR, Hall IM: BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics. 2010, 26: 841-842. 10.1093/bioinformatics/btq033.
- Holm S: A simple sequentially rejective multiple test procedure. Scand J Stat. 1979, 6: 65-70.
- Olshen AB, Venkatraman ES, Lucito R, Wigler M: Circular binary segmentation for the analysis of array-based DNA copy number data. Biostatistics. 2004, 5: 557-572. 10.1093/biostatistics/kxh008.
- Seshan VE, Olshen A: DNAcopy: DNA copy number data analysis. R package version 1.16.0. 2010, http://www.bioconductor.org/packages/2.3/bioc/html/DNAcopy.html.
- Copeland NG, Jenkins NA: Development and applications of a molecular genetic linkage map of the mouse genome. Trends Genet. 1991, 7: 113-118.
- Croes GA: A method for solving traveling-salesman problems. Oper Res. 1958, 6: 791-812. 10.1287/opre.6.6.791.
- Nagarajan N, Cook C, Di Bonaventura M, Ge H, Richards A, Bishop-Lilly KA, DeSalle R, Read TD, Pop M: Finishing genomes with limited resources: lessons from an ensemble of microbial genomes. BMC Genomics. 2010, 11: 242-251. 10.1186/1471-2164-11-242.
- Kent WJ, Sugnet CW, Furey TS, Roskin KM, Pringle TH, Zahler AM, Haussler D: The human genome browser at UCSC. Genome Res. 2002, 12: 996-1006.
- Sutherland GR, Baker E, Seshadri RS: Heritable fragile sites on human chromosomes. V. A new class of fragile site requiring BrdU for expression. Am J Hum Genet. 1980, 32: 542-548.
- Chen W, Kalscheuer V, Tzschach A, Menzel C, Ullmann R, Schulz MH, Erdogan F, Li N, Kijas Z, Arkesteijn G, et al: Mapping translocation breakpoints by next-generation sequencing. Genome Res. 2008, 18: 1143-1149. 10.1101/gr.076166.108.
- Meyer LR, Zweig AS, Hinrichs AS, Karolchik D, Kuhn RM, Wong M, Sloan CA, Rosenbloom KR, Roe G, Rhead B, et al: The UCSC Genome Browser database: extensions and updates 2013. Nucleic Acids Res. 2013, 41: D64-D69. 10.1093/nar/gks1048.
- Cairns J: Mutation selection and the natural history of cancer. Nature. 1975, 255: 197-200. 10.1038/255197a0.
- Potten CS, Hume WJ, Reid P, Cairns J: The segregation of DNA in epithelial stem cells. Cell. 1978, 15: 899-906. 10.1016/0092-8674(78)90274-X.
- Lansdorp PM: Immortal strands? Give me a break. Cell. 2007, 129: 1244-1247. 10.1016/j.cell.2007.06.017.
- Falconer E, Chavez EA, Henderson A, Poon SS, McKinney S, Brown L, Huntsman DG, Lansdorp PM: Identification of sister chromatids by DNA template strand sequences. Nature. 2010, 463: 93-97. 10.1038/nature08644.
- Kato H: Spontaneous sister chromatid exchanges detected by a BUdR-labelling method. Nature. 1974, 251: 70-72. 10.1038/251070a0.
- Allen JW, Latt SA: Analysis of sister chromatid exchange formation in vivo in mouse spermatogonia as a new test system for environmental mutagens. Nature. 1976, 260: 449-451. 10.1038/260449a0.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.