- Open Access
Reconstruction of full-length circular RNAs enables isoform-level quantification
- Yi Zheng†1, 2,
- Peifeng Ji†1,
- Shuai Chen1, 2,
- Lingling Hou1 and
- Fangqing Zhao1, 2, 3Email authorView ORCID ID profile
© The Author(s). 2019
- Received: 3 July 2018
- Accepted: 10 January 2019
- Published: 19 January 2019
Currently, circRNA studies are shifting from the identification of circular transcripts to understanding their biological functions. However, such endeavors have been limited by large-scale determination of their full-length sequences and also by the inability of accurate quantification at the isoform level. Here, we propose a new feature, reverse overlap (RO), for circRNA detection, which outperforms back-splice junction (BSJ)-based methods in identifying low-abundance circRNAs. By combining RO and BSJ features, we present a novel approach for effective reconstruction of full-length circRNAs and isoform-level quantification from the transcriptome. We systematically compared the difference between the BSJ-level and isoform-level differential expression analyses using human liver tumor and normal tissues and highlight the necessity of deepening circRNA studies to the isoform-level resolution. The CIRI-full software can be accessed at https://sourceforge.net/projects/ciri.
- Alternative splicing
- Circular RNA (circRNA)
- Transcript reconstruction
- Isoform quantification
Circular RNA (circRNA) is a type of RNA molecules in which both ends are covalently linked. Advances in deep sequencing and identification algorithms have resulted in a huge number of circRNAs from fly to human [1–6]. Most recently, new subclasses of circRNAs, including non-exonic circRNAs [7, 8] and exon-intron circRNAs , have been explored. Subsequent studies unveiled the ubiquity of alternative splicing (AS) events within circRNAs and revealed a profound difference in the expression of circRNAs and mRNAs . Profiting from these identified circRNAs, most recent studies have shifted to the efforts of revealing the biological functions of circRNAs. As a heterogeneous class, circRNAs may participate in various aspects of biological processes. In addition to the well-studied function of microRNA sponges [3, 4], studies have illustrated that these circular transcripts may be involved in gene regulation , development , innate immune response , and diseases [14–21]. Recent efforts have shown that N6-methyladenosine (m6A) promotes the efficient initiation of protein translation from circRNAs . Subsequently, Zhou et al. demonstrated the prevalence of m6A in circRNAs . Collectively, these intriguing findings illustrate the complexity of circRNA functions and show that our understanding of how circRNAs participate in biological processes is still rudimentary.
Evolutionary analyses of gene sequences and expression patterns have provided essential insights into gene functional study. Increasing attention has also been paid to evolutionary analysis of circRNAs among different species. Rybak-Wolf et al. systematically compiled a catalog of neuronal human and mouse circRNAs and found that these circRNAs were preferentially enriched in mammalian brain and that the same circRNAs were often expressed in both species and were well conserved in sequence . As the first systematic analysis of circRNAs in mammalian brains, this work represents an important step toward further elaboration of circRNA functions. However, owing to the lack of full-length circRNAs, sequence conservation comparison is restricted to flanking introns and coding DNA sequences (CDSs). Considering the prevalence of circRNA isoforms generated by combinations of internal components within back-splice junctions (BSJs), current cross-species conservation analyses based on partial sequences may lead to biased estimation of circRNA conservation with respect to expression pattern and sequence composition. Moreover, using BSJs to represent a collection of different circRNA isoforms hampers our understanding of specific circRNA functions and makes it difficult to achieve further evolutionary insights into circRNAs among species. A fully automated method that can identify large-scale full-length circRNAs from RNA-seq data has yet to be developed.
A number of efforts have recently been made to explore the internal landscape of circRNAs. Some studies utilized a straightforward strategy that simply combined all known mRNA exons in a sequential order as putative full-length circRNA [2, 25]. This method, however, relies on an unsupported assumption that circular and linear transcripts share the same composition and thus may lead to misunderstanding in downstream analyses. Other methods, such as CIRI-AS , CIRCexplorer2 , and FUCHS , involve identifying the internal components of the BSJ. CIRI-AS employs a spliced junction signature-based algorithm and enables, for the first time, high-throughput detection of internal components of circRNAs based on short-read sequencing. Similar to CIRI-AS, FUCHS predicts circRNA internal sequences by extracting the mapping results of BSJ reads. In an alternative manner, CIRCexplorer2 detects alternative splicing events through comparison analysis between poly(A)+ and poly(A)− RNA-seq data sets. However, without considering combinations of these components, whole-sequence prediction of circular isoforms with complicated AS events is still beyond the reach of these methods. Most recently, Ye et al. employed a strategy similar to CIRI-AS to assemble full-length sequences of circRNAs using BSJ read pairs, but this approach still faces an inherent challenge in that only a small fraction of circRNAs can be identified by assembling BSJ reads . As a result, without whole-sequence of circRNAs, the reconstruction of circular isoforms within a certain BSJ extends far beyond the scope of current analysis and accurate quantification of circRNAs at the isoform level remains an insurmountable obstacle. Therefore, the inability of reconstructing full-length circRNAs and quantifying circular isoforms places limitations on the discovery of previous unknown biological phenomena, which may restrict our ability to understand the diversity and expression patterns of circRNA isoforms.
To address this challenge, we present a new feature, reverse overlap (RO), for full-length circRNA reconstruction and isoform-level quantification. RO is especially suitable for identifying low-abundance circRNAs that are difficult to identify using the BSJ feature. Considering that a vast majority of circRNAs in various transcriptomes are in low expression levels, the detection of such low-abundance transcripts is extremely important in circRNA studies. Moreover, we develop an accurate, high-throughput approach (CIRI-full) that uses both BSJ and RO features to reconstruct full-length circRNAs and circular isoforms within them from RNA-seq data sets. Several recent independent studies demonstrated that CIRI2 exhibited remarkably balanced sensitivity, reliability, running time, and RAM usage on circRNA detection [29–32]. Most recently, Thomas B. Hansen further systematically compared 11 circRNA detection algorithms and found that CIRI2 was one of the best algorithms for circRNA identification and performed comparably to annotation-based algorithms . In CIRI-full, CIRI2 is employed to detect cirexons (circRNA’s exon) and to determine the boundaries of circRNAs. The RO feature, which is deduced from reversely overlapped paired-end reads, is used to explore the detailed cirexon landscape within boundary sites and to assemble into full-length sequence. Based on the assembled full-length circRNAs, a forward splice graph (FSG)-based algorithm is employed to reconstruct all full-length isoforms within them and to determine their abundances. Compared with previous methods, CIRI-full is not only efficient at determining complete sequences of circRNAs, but, more importantly, enables the analysis of circRNAs at the isoform level. We applied CIRI-full to survey circRNA expression patterns in samples from the brains of six vertebrates and also explored circRNA expression divergence between tumor and normal tissues at both BSJ- and isoform-level resolution, which uncovered distinct expression patterns between circRNAs and their isoforms. This study presents an important approach to assemble and quantify circRNAs and will greatly improve our understanding of their biogenesis and functions.
Overview of CIRI-full
The CIRI-full algorithm is a four-step process that includes RO read detection and verification, BSJ and cirexon detection, combined assembly of both RO and BSJ reads, and isoform reconstruction and quantification. RO read detection and verification is designed to detect RO reads from paired-end reads based on their 5′ reverse overlaps and to rule out linear transcripts and false positives resulting from lariat structures using sophisticated post-alignment filters. An RO-merged read is identified as full-length circRNA if the genomic alignments of both its ends either have overlaps or are located in the same cirexon. The BSJ and cirexon detection step was developed to detect BSJs and cirexons and to identify single-splice events. If the BSJ read pairs are exclusively located on all the cirexons within the BSJ, the complete sequence of this circRNA can be reconstructed using these cirexons. RNA-seq reads are processed separately in the first two steps, resulting in the identification of a number of full-length circRNAs in addition to BSJs and RO-merged reads that are not sufficient for reconstructing full-length circRNA independently. Therefore, these unused but informative RO-merged reads and BSJ reads are integrated in make a combined assembly. Finally, for a given BSJ, all forward splice junctions are recognized and quantified, and then an adapted FSG [34–36] is built to estimate the isoform expression abundance.
RO read detection and verification
During the reverse transcription step of library preparation, transcription begins at the primers and walks along the RNA template. Owing to the unique structure of circRNAs, these circular transcripts will be repeatedly reverse-transcribed. When sequencing these reverse-transcribed cDNAs, peculiar reverse overlap features on the read pair will be observed on the pair-end reads if the library insert length is greater than the length of the circRNA. Specifically, the 5′- or 3′-ends on both paired reads are reversely overlapped with each other, which can be used as an indicator of circRNA. Moreover, it should be noted that the presence of 3′-RO on both reads indicates that the complete sequence of the circRNA has been read, and thus, its whole sequence can be reconstructed.
The 5′ RO reads are identified based on the following strategy. For each read pair, the first 10 bp at the 5′-end of one read is divided into three subsequences with a window size of 8 bp and a step size of 1 bp. These subsequences are then used as seeds to search for matches at the 5′-end of the other read. Once all these seeds have matches (match base pair ≥ 7 bp) on the other read, the sequence with location ranging from the 5′ terminus to the matches on this read is extracted and aligned to the counterpart read. Both members of this pair of reads are taken as candidate 5′ RO reads if the alignment satisfies two criteria: (i) aligned length ≥ 13 bp and (ii) nucleotide acid identity ≥ 95%. Subsequently, these two reads are merged into a long read based on the alignment, and the long read is treated as a candidate RO-merged read for further validation.
During the library preparation step, the size of fragments is not strictly the same as the library insert length but varies around the insert length. Moreover, if the fragment size is shorter than the sequenced read length, this will yield partial adapter sequences attached to the 3′-ends of paired-end reads. The presence of this type of read, which indeed originates from linear transcripts, can lead to false-positive 5′ RO features when performing candidate RO read detection. To rule out such false-positive RO reads, a mapping-based filtration strategy is employed. First, the precise location of the candidate RO-merged read is determined. Specifically, the candidate RO-merged read is aligned to the reference genome using BWA-MEM (-T 19, minimum score to output), which outputs split alignments of the read on the genome. The alignments are then collected and sorted according to their alignment lengths. The longest alignment with mapping quality greater than 15 is used as an anchor alignment, and the accumulated mapping length within a 100-Kbp interval on both sides of this anchor alignment is calculated. If the summed length exceeds half of the read length, this candidate RO-merged read is reserved for further analysis. Otherwise, it is discarded. Finally, the reserved candidate RO-merged reads are remapped using a local realignment strategy to accurately determine their locations. Briefly, highly reliable mapping fragments are determined using the BSJ position or, when no BSJ position is available, the anchor position. Then, the precise locations of the unmapped or abnormal mapped fragments are obtained using dynamic programming. After determining the locations of the candidate RO-merged reads, linear transcript-derived reads are identified and ruled out if they satisfy two criteria : the reads contain no BSJ and  the subsequences with the same length on both ends of the read have no hit around the anchor alignment.
Because BWA-MEM was originally designed for mapping DNA sequencing reads, this tool does not consider GT/AG splicing signals and fails to obtain the accurate boundaries of split alignments in which the mapping position may deviate by a few base pairs from the true boundary position. Moreover, use of BWA-MEM may also lead to the inclusion of lariat structures in the candidate RO reads. To justify the boundaries and remove the lariat structures, the alignments for each candidate RO-merged read are revisited by checking for the presence of a GT/AG splicing signal. For each read, the GT/AG splicing site of the aligned fragments is first checked, and the read is filtered if any aligned fragment does not contain a GT/AG splicing site. Then, the GT/AG splicing sites on the remaining candidate RO reads are justified. Next, the alignments of two 5-bp subsequences from both sides of the junction site on each read are extracted, and whether these alignments have mapping gaps or mismatches is determined. If there is no gap or mismatch in these alignments, this read is taken as a highly reliable RO-merged read. Otherwise, it is discarded.
BSJ and cirexon detection
BSJs in the RNA-seq reads are detected using CIRI2 , and single-splice events within these BSJs are inferred from CIRI-AS (parameter -d yes) . Within each BSJ, all cirexons inferred from the single-splice events are collected, sorted, and recorded.
Combined assembly of RO and BSJ reads
After identifying RO-merged reads and BSJ boundaries, full-length circRNAs are identified separately using these two types of information. Specifically, an RO-merged read is identified as a full-length circRNA if the genomic alignments of its both ends either have overlaps (3′ RO) or are located on the same cirexon. Otherwise, the RO-merged reads are reserved for further combined assembly. For each BSJ, the alignments of all the paired-end BSJ reads are collected; if the BSJ reads are all exclusively located on cirexons, the complete sequence of this circRNA is reconstructed by linearly connecting the cirexons. Otherwise, the BSJ and the cirexons within them are recorded for further combined assembly.
The BSJ feature provides an efficient method of detecting the existence of circRNAs but cannot be used to identify the internal composition of the circRNA. The RO feature greatly facilitates determination of the internal compositions of circRNAs but sometimes may fail to cover the BSJ site owing to the limitation of read length. Hence, CIRI-full employs a combined strategy that uses the advantages of both BSJ and RO features to reconstruct a more comprehensive full-length circRNA repository. By utilizing the unused RO-merged reads and BSJ reads, full-length circRNAs are reconstructed using the following process. The RO-merged reads and BSJ reads are sorted and clustered according to the BSJ. If both types of read are observed within a BSJ, the reads are used to reconstruct full-length circRNAs. For each circRNA, the RO-merged reads are used to determine additional cirexons that were not identified by the BSJ reads, and the alignments of all the BSJ read pairs are re-checked. If these reads are all exclusively located on cirexons, the complete sequence is reconstructed by linearly connecting the cirexons. Otherwise, the identified cirexons within the BSJ are outputted and marked as a partially reconstructed circRNA.
Quantifying the expression of circRNAs at the isoform level
After obtaining the reconstructed circRNAs, CIRI-full builds a forward splice graph (FSG) using all BSJ reads and RO-merged reads within each BSJ. The nodes in the FSG represent cirexons, and the edges represent their connections (i.e., forward splice junctions between cirexons). Theoretically, the FSG covers all possible circular isoforms that are consistent to the mapped reads within the BSJ, and the traversing path from the start cirexon to the end cirexon represents a candidate splicing isoform. It should be noted that the FSG is a closed circuit owing to the nature of circRNAs. Then, an adapted depth-first search (DFS) algorithm is performed to exhaustively decompose the FSG graph into paths. Specifically, the DFS algorithm starts iteratively at each node and stops at the breakpoints (without continuous splicing events) or the start cirexon. After obtaining these paths, short paths are merged into longer paths and redundant paths are then filtered. To avoid a large number of false-positive paths which will significantly affect the efficiency of later iteration steps, CIRI-full screens out a certain number (by default, set to 10) of paths. In detail, the edges on the FSG graph are categorized into four types: (i) BSJ, (ii) phasing FSJ, where the splicing event is exclusively occupied by only one circular isoform, (iii) co-occurred FSJs, where the number of splicing events is supported by the same RO read, and (iv) the remaining FSJs. Paths containing phasing FSJ and co-occurred FSJs will give the top priority for screening, and the corresponding paths are referred to as phased isoforms, because these circular isoforms undoubtedly exist. Regarding the paths that contain the remaining FSJs, they are sorted using the node sequencing depth and the paths with high-sequencing depth are retained to fill up the threshold (by default, set to 10). The resulting outputted paths are referred to as candidate isoforms.
Next, the following steps are to determine the relative abundance for each path. In detail, a Monte Carlo simulation method is used to simulate the BSJ-reads distribution on each path based on the insert length distribution of the RNA-seq library, which is inferred from mapping distance of paired-end reads. According to the distribution of simulated reads, the abundance of nodes and edges (splicing events) on each path can be calculated in the latter steps. To quantify the relative abundance of each path, an approximate exhaustive search algorithm is proposed. Specifically, this approach starts by assigning a random putative abundance (positive integral value) to each path, where the summed abundance for all paths should be equal to the total number of BSJ reads. Based on the assigned putative abundance and the distribution of simulated BSJ reads, the putative abundances of nodes and edges on each path are computed. Based on the resulting abundance of nodes and edges, accumulated putative abundance of nodes and edges are calculated. Then, the distance between putative and real abundance (inferred from mapped BSJ reads) of nodes and edges is calculated and recorded. The putative abundances of paths are adapted to real abundance of nodes and edges. Next, it iterates through the following steps. The putative abundances of nodes and edges on each path are re-calculated. Accumulated putative abundance of nodes and edges are computed. The distance between putative and real abundance (inferred from mapped BSJ reads) of nodes and edges is calculated and compared with pre-recorded distance. If this distance is larger than the pre-recorded distance, it means that the path abundance adaption process goes wrong and new path abundance adaption process is performed. Otherwise, the distance is recorded, new iteration starts. This iteration process stops when the distance converges. By this method, we can obtain the relative abundance for circular isoforms of a certain circRNA.
Simulated data sets for circRNA identification and reconstruction
To make the simulated circRNA more closely resemble real data, the length distribution and expression distribution of the circRNAs were also considered. We first applied CIRI-AS to RNA-seq data from the HeLa cell line (SRA accession number: SRR3476956) to predict the size of identified circRNAs by summing all cirexon lengths with supporting BSJ reads > = 5.
To validate the robustness of the RO feature, two groups of simulated paired-end transcriptomic data sets were used to test the performance of the RO detection method. The first group was designed to test the performance of the RO feature under different sequencing depths. This group consisted of four circular transcript sequencing datasets with different average depths (2×, 5×, 10×, and 15×). The read length was set to 200 bp, and the insert length distribution was set to μ = 350 bp, σ = 200 bp. The second group was constructed to test the performance of the RO feature under different sequencing lengths. This group contained four sets of circular transcript sequencing data with different read lengths (75 bp, 100 bp, 150 bp, and 200 bp). The average depth of these four datasets was 10×, and the insert length followed the same distribution as that of the former group. Furthermore, BSJ-based circRNA detection tools, including CIRI2 , find_circ , CIRCexplorer2 , and KINFE , were used for performance comparison.
Simulated data sets for circRNA isoform quantification
To compare the performance of circRNA isoform quantification between CIRI-full and CIRI-AS, we simulated circRNA-containing RNA-seq data sets, where two different isoforms were added for each circRNA by simulating additional exon skipping event. Note that the number of isoforms for each circRNA was set to two, because only in this situation, CIRI-AS can estimate the relative abundance of the two isoforms within a certain circRNA using the PSI values. Consequently, two data sets were simulated. The first one simulated transcripts with different sequencing depth (25×, 50×, 100×, and 150×, respectively) and uniform read length of 150 bp with insert length of 350 ± 200 bp. The second one simulated transcripts with different read length (100, 150, 200, 250, and 300 bp, respectively), insert length of 350 ± 200 bp and sequencing depth of 75×.
To further evaluate the sensitivity and accuracy of CIRI-full on circRNA isoform detection and quantification, we also simulated circRNA-containing RNA-seq data sets with three isoforms for each circRNA. The transcript sequencing depth was set to 50× with sequencing length of 150 bp and insert length of 350 ± 200 bp.
Generation of HeLa cell RNA-seq data
Total RNA was isolated using TRIZOL (Invitrogen) from HeLa cells grown in standard medium under standard conditions. The RNA was divided into three samples containing equal amounts of RNA. The quality of these samples was manually controlled to produce different RIN values. Specifically, the RIN value of the low-quality RNA sample was 5 and that of the two high-quality samples was 10. Next, a RiboMinus kit (Invitrogen, Carlsbad, CA, USA) was utilized to deplete ribosomal RNA in these samples. The resulting RNA was incubated at 37 °C and treated with 10 U μg− 1 RNase R (Epicenter, Madison, WI, USA). One of the high-RIN samples and the low-RIN sample were used separately as templates for cDNA libraries following the TruSeq protocol (Illumina, San Diego, CA, USA); the other high-RIN sample was used to construct a sequencing library using the same protocol but without the fragmentation step. Fragments with a broad range of fragment size (300–800 bp) were selected for library construction. The three libraries (low RIN/fragmented, high RIN/fragmented, high RIN/unfragmented) were sequenced on the Illumina HiSeq 2500 platform of the Research Facility Center at the Beijing Institutes of Life Science, CAS, with a read length of 250 bp. The sequencing data sets have been deposited to SRA with the following project ID (PRJNA475651) .
Generation of whole brain RNA-seq data
Human, macaque, and rabbit whole brain RNA samples were purchased from Zyagen (San Diego, CA, USA). Mouse, rat, and chicken whole brain tissues were obtained from the Research Facility Center at the Beijing Institutes of Life Science, CAS. RNA samples were isolated using TRIZOL (Invitrogen). For each species, three types of cDNA library were prepared. Specifically, a Ribo-/RNase R library was constructed using RNA samples that had been treated with the RiboMinus kit (Invitrogen) and then incubated at 37 °C with 10 U μg−1 RNase R, a Ribo-/cDNA library was constructed using RNA samples that were only treated with the RiboMinus kit, and a poly-A library was prepared according to the TruSeq v2 guide. Poly-A and Ribo-RNA samples were used as templates for cDNA libraries according to the TruSeq protocol (Illumina); Ribo-/RNase R-treated samples used the same protocol but without fragmentation. These libraries were sequenced on the Illumina HiSeq 2500 platform of the Research Facility Center at Beijing Institutes of Life Science, CAS. PolyA+ and Ribo-/RNase R libraries were sequenced with paired-end 250-bp reads, and Ribo-libraries were sequenced with paired-end 150-bp reads. The sequencing data sets have been deposited to SRA with the following project ID (PRJNA475651) .
HeLa cell and brain RNA-seq data processing
CircRNAs from three HeLa cell RNA-seq data sets were identified using CIRI-full with default parameters. The RNA-seq data sets of brain samples that had undergone RiboMinus/RNase R treatment from six species were processed using CIRI-full. To normalize the number of resulting circRNAs, circRNAs with no RO read support and those with BSJ read support ≤ 5 were filtered in human, mouse, rat, and rabbit according to the total BSJ number. Brain data sets from RiboMinus libraries were processed by CIRI2 with default parameters, and expression level was normalized to data set size. Poly-A-selected RNA-seq data sets of brain samples of the six vertebrates were analyzed using Hisat2  and StringTie [40, 41] with default parameters.
To validate the predictions made by the RO method, outward-facing primer sets were designed to amplify 21 circRNAs (15 for BSJ validation and 6 for full-length circRNA validation). PCRs were performed using 35 cycles, and the sequences of the PCR products were determined via Sanger sequencing. Among 21 validated circRNAs, all BSJs and FSJs of six circRNAs were supported by Sanger sequencing, indicating the accuracy of the reconstructed full-length sequences of these circRNAs. The remaining 15 circRNAs were also validated by the confirmed BSJ sites using Sanger sequencing.
To verify predicted circRNA isoforms and their relative abundances, outward-facing primers were designed to quantify the expression of 17 isoforms within 8 circRNAs. Specifically, HeLa cells were grown in standard media and conditions. Total RNA was isolated using TRIZOL and converted to cDNA using random hexameters primers within the FastKing RT Kit (TIANGEN). Resulting cDNA was used as templates and real-time qPCR was performed using primer pairs specific for circRNA isoforms and two negative controls (GAPDH and b-actin) with SYBR FAST qPCR Kits (Kapa Biosystems). The reaction volume was 20 μl, which contained 1 μl of serial diluted cDNA, 10 μl of qPCR SYBR Green Master Mix, 0.5 μl each of forward and reverse primers, and 8 μl of water. Thermal cycling was carried out on StepOnePlus (Applied Biosystems) using the following conditions: 95 °C for 5 min and followed by 40 cycles of 95 °C for 10 s and 60 °C for 30 s. Fluorescent signals were detected at the step of annealing/extension (60 °C).
Differential expression analysis of circRNAs and their isoforms in HCC patients
RiboMinus treated RNA-seq data sets of tumor (SRX1558046-SRX1558064) and normal liver samples (SRX1558026-SRX1558045) from 20 HCC patients generated in a previous study  were downloaded from the NCBI SRA database. Full-length circRNAs and their isoforms were obtained by running CIRI-full on these data sets using default parameters. The statistical significances of differentially expressed circRNAs between normal and tumor samples were calculated using Mann–Whitney U test.
Currently available approaches on circRNA identification are exclusively based on the detection of back-splice junctions (BSJs). In this study, we propose a new feature, reverse overlap (RO), for full-length circRNA detection, which outperforms previous BSJ-based methods in detecting circRNAs, even for highly degraded RNA samples. We further develop an accurate and high-throughput approach, CIRI-full, that uses both BSJ and RO features to reconstruct full-length circRNAs and their isoforms from RNA-seq data sets.
The CIRI-full approach
The CIRI-full pipeline involves four different steps, RO detection and reconstruction, BSJ and cirexon detection, combined assembly of RO and BSJ reads (Fig. 1e) and circular isoform detection and quantification (Fig. 1f). Several recent studies demonstrated that CIRI2 exhibited remarkably balanced sensitivity, reliability, running time, and RAM usage on circRNA detection [29–32] and performed comparably to annotation-based algorithms , and thus, CIRI2 was employed to detect BSJ reads in this pipeline. RNA-seq reads are processed separately in the first two steps, thus yielding a number of full-length circRNAs, as well as BSJ and RO-merged reads, which are not sufficient for reconstructing full-length circRNA independently. Therefore, these unused but informative reads are integrated in the next step to generate a combined assembly. Based on the identified full-length circRNAs, circular isoforms within the BSJs of a circRNA are then detected and quantified by employing statistic-based models. In the first step, an RO-merged read is identified as full-length circRNA if the genome alignments of both of its ends satisfy one of two criteria : they have overlap on the genome (Additional file 1: Figure S4A) or  they do not have overlap but locate on the same cirexon (Additional file 1: Figure S4B). In the second step, CIRI-AS is employed to detect BSJ and cirexons of circRNAs. Then, for each circRNA, the locations of BSJ read pairs within the BSJ are checked; if all the reads are exclusively located on the cirexons, the complete sequence of this circRNA is assembled by linearly connecting the cirexons (Additional file 1: Figure S4C). In the third step, incomplete information from the first two steps is clustered according to the BSJ, and the RO reads and cirexons for each BSJ are combined to complement each other to reconstruct full-length circRNAs (Additional file 1: Figure S4D). However, under certain conditions, full-length circRNAs cannot be reconstructed; these may include long circRNAs whose length is twofold greater than the library size (Additional file 1: Figure S5D) and circRNAs with incomplete cirexons due to low expression levels (Additional file 1: Figure S5A–C). Finally, a forward splice graph (FSG) is constructed using the BSJ and RO-merged reads alignments within the BSJs for each assembled circRNA (Fig. 1f). The resulting FSG is dissected into paths by using an adapted deep-first search method, which iteratively traverses from different source node to find all non-redundant paths (Additional file 1: Figure S6). Next, the following steps are to estimate the abundance of each circular isoform (Additional file 1: Figure S7). First, a Monte Carlo method is employed to simulate the distribution of BSJ reads on each path according to the insert length of RNA-seq library, which will be used to estimate the coverage of each node and edge of this path. Then, an approximate exhaustive search method is employed to find the optimum solution of the abundance of each path. Specifically, CIRI-full initially assigns a random value to each path and then calculates the abundance of every node and the splicing events of each edge on the path based on the simulated BSJ-reads distribution. Consequently, CIRI-full calculates the distance between the accumulated putative abundance of each node and splicing events of each edge. This distance score represents the discrepancy between the putative and real abundance of each path. To obtain the smallest distance, CIRI-full corrects the putative abundance of each path iteratively according to the FSG until the distance scores get converged. After iterative computation, the optimum solution will be output, and thus, the abundance of each path is determined (Fig. 1f).
RO feature facilitates identification of low-abundance circRNAs
To obtain a more comprehensive understanding of the RO feature, we further compared the relationship between RO reads and BSJ reads used for circRNA identification. As shown in Fig. 2c, 69% (30% + 39%) of the RO-merged reads were derived from 78% of the circRNAs (27% + 51%) having BSJ read support ≥ 2, and more than half of these RO-merged reads had 3′ RO also, thus indicating that they could directly generate full-length circRNA transcripts. The remaining RO-merged reads comprised the 22% of the circRNAs that were exclusively identified by the RO-based method, thus suggesting that the RO-based method offers a distinct advantage compared with previous BSJ-based methods, in which low-abundance circRNAs with limited BSJ read support are usually discarded by setting an arbitrary threshold (e.g., #BSJ reads > 2 or more). We further surveyed the base depth distribution of circRNAs and found that the RO reads produced a more uniform depth distribution along the normalized circRNA transcript than the BSJ reads (Fig. 2d). Based on the foregoing results, we conclude that the RO feature performs well in identifying low-abundance circRNAs and generating more uniform read distributions along circular transcripts. This improvement will greatly facilitate downstream reconstruction of full-length circRNAs.
The FSG-based algorithm accurately identifies and quantifies circular isoforms
To validate the quantification accuracy of the FSG-based algorithm, we simulated circRNA-containing transcriptomic datasets with read length varying from 100 to 300 bp and sequencing depth at 25–150 fold (Fig. 2e, f). We sought to evaluate the performance of this approach by comparing with CIRI-AS. Therefore, all the circRNAs in these data sets were designed to possess two isoforms, where CIRI-AS can estimate the relative abundance of the two isoforms within a certain circRNA using the PSI values. We then compared the accuracy of these two tools by calculating the discrepancy between predicted and real abundance of each isoform. As shown in Fig. 2e, both of these two approaches achieved a high level of accuracy, especially for those isoforms with an abundance over 50 fold. Moreover, the FSG-based quantification approach exhibited increased levels of accuracy with increasing read length, relative to the splicing events-based method (Fig. 2f). For more complicated splicing pattern, we further designed a simulated dataset containing 994 circRNAs with three isoforms, which were referred to as major, medium, and minor isoform according to their abundance, respectively. We performed CIRI-full on this dataset, detected circular isoforms within each circRNA, and determined their abundances. Among these simulated circRNAs, 73% (726/994) of them could be precisely recognized for all three isoforms. For isoform quantification, an average of 79% of these isoforms can be correctly determined (Fig. 2g–i). We further used four circRNA data sets with biological replicates and two simulated datasets to evaluate the reliability of isoform quantification by CIRI-full. As shown in Additional file 1: Figure S11, 76.2 ~ 85.6% of moderately or highly expressed isoforms (#BSJ reads > = 30) in the real datasets, including human brain tissue, human liver tissue, and Hs68 cell line, could be accurately quantified using CIRI-full. Similar findings were also found in the simulated datasets.
To experimentally evaluate our approach on quantifying circRNA isoforms, we performed real-time RT-PCR to validate eight randomly selected circRNAs with two or three isoforms from a transcriptomic data set of HeLa cells (Additional file 1: Figure S12–14). Each of these circRNAs was predicted to contain at least two isoforms. We designed 17 pairs of primers to amplify fragments containing both BSJ and alternatively spliced cirexons and quantified their abundance using real-time RT-PCR. As shown in Fig. 2j and Additional file 1: Table S1, the abundances of circRNAs determined by CIRI-full and qRT-PCR show a high level of consistency, demonstrating the reliability of the FSG-based method for isoform-level circRNA quantification.
Reconstruction of full-length circRNAs based on CIRI-full
For further validation, we generated 16.2 Gb of sequencing data from human brain samples based on RiboMinus RNA sequencing with RNase R treatment and performed CIRI-full on this dataset. Most of the circRNAs identified in this dataset were full- or nearly full-length (Fig. 3d). The number of circRNAs in the brain dataset was greater than that in HeLa cells when the sizes of the data sets were normalized. The length distribution of full-length circRNAs and the recognition patterns of the RO and BSJ features were similar to those identified in HeLa cells (Fig. 3e, f and Additional file 1: Figure S14B). Notably, a large majority of cirexons that were specifically identified by RO reads were enriched in both RiboMinus-treated and RiboMinus/RNase R-treated samples as compared to the poly (A) enrichment sample (Additional file 1: Figure S15).
To verify the reconstructed circRNAs based on RO features, both computational and experimental approaches were employed. Firstly, over 80% of circRNAs detected by the RO-based method could be supported by at least one BSJ-based method (Additional file 1: Figure S16A). The remaining circRNAs solely detected by the RO-based method also exhibited a typical reversely mapping signature when aligned them to the reference genome (Additional file 1: Figure S16B). Secondly, we performed experimental validation by randomly selecting 15 circRNA loci in the HeLa sample; 3, 6, and 6 of these loci were highly, moderately, and weakly transcribed, respectively. Outward-facing primers were designed to amplify fragments containing BSJs, and the sequences of the PCR products were determined via Sanger sequencing. As shown in Additional file 1: Figure S17–19, all 15 of the loci were successfully validated. For additional validation, six predicted full-length circRNAs were randomly selected and validated using the same approach. As shown in Additional file 1: Figure S20, all of these predicted full-length circRNAs were successfully verified using the experimental method. This solid evidence demonstrates the excellent reliability of CIRI-full in reconstructing full-length circRNAs.
To measure the robustness of using RO and BSJ features for circRNA reconstruction, RNA-seq libraries were constructed using both high-quality RNA (RNA Integrity Number, RIN = 10) and degraded RNA (RIN = 5) of HeLa cells treated by RNase R and RiboMinus. For the high-quality RNA sample, two different methods (with or without fragmentation) were used to construct RNA-seq libraries. For these three libraries, 9.2, 5.3, and 11.6 Gb of sequencing data were generated, respectively. We employed CIRI-full to detect RO and BSJ reads from these data; the ratio of RO reads to the sum of RO and BSJ reads was then calculated. As expected, compared with the unfragmented library, the RO reads ratio decreased considerably in the fragmented and low-quality libraries across all the compared distribution levels (Fig. 3g, h). In particular, the low-quality library, in which the circRNAs suffered from degradation, exhibited the lowest RO reads ratio. We next investigated whether the number of RO-identified circRNAs was also affected (Additional file 1: Figure S21). Therefore, we performed CIRI-full on each dataset and calculated the ratio of RO-identified circRNAs to the sum of RO- and BSJ-identified circRNAs. We found that although the ratio of RO reads decreased, the number of RO-identified circRNAs was unaffected by RNA fragmentation (Fig. 3i) and only weakly influenced by RNA degradation (Fig. 3j). These findings further confirm that the RO feature can be used to efficiently detect low-abundance circRNAs even if most of the RO reads are degraded or fragmented.
In addition to its high sensitivity and robustness in circRNA identification, CIRI-full also offers high accuracy for exploring detailed internal components within circRNAs. In contrast, previous studies simply combined all known or aligned mRNA exons in a sequential order as putative full-length circRNAs (hereafter referred to as the reference-based method). We measured the accuracy of full-length circRNAs predicted using the reference-based method by aligning them with the circular transcripts reconstructed using CIRI-full. As shown in Fig. 3k, only 34% of these predicted full-length circRNAs in human brain predicted by the reference-based method perfectly matched the circular transcripts reconstructed using CIRI-full, thus suggesting the former method’s low level of accuracy. The errors can be classified into three categories: false cirexons, missing ICFs, and a mixture of the former two errors (Additional file 1: Figure S22). False cirexons, representing the insertion of false additional exons, accounted for 23% of the predicted circRNAs, and the average number of false exons per circRNA was 2.4. Moreover, 21% of the errors were identified as missing ICFs, referring to missing intronic/intergenic circular fragments; these included an average of 25.5% of the full-length circular transcripts. A mixture of both false cirexons and missing ICFs accounted for up to 22% of the errors. A similar error rate was also consistently observed in HeLa cells and mouse brain samples (Additional file 1: Figure S23), strongly indicating that the reference-based method is error-prone and not reliable for resolving the internal structure of circRNAs. Compared with previous approaches, which focused on the determination of internal sequence or alternative splicing events, CIRI-full exhibited a high efficiency on reconstructing circular transcripts using the combination of RO and BSJ features (Additional file 1: Figure S24).
Profiling full-length circular RNAs in vertebrate brains
We further examined the orthologous circRNAs present in these six vertebrates and calculated the number of orthologous circRNAs that possessed the same sequences and BSJs and the number of shared genes on their ancestral nodes (denoted by “a”, “b”, “c”, and “d”). As shown in Fig. 4a, the number of shared full-length orthologous circRNAs and BSJs decreased rapidly with increased genetic distance. In contrast, the number of shared orthologous circRNAs decreased much more slowly, thus indicating that although derived from the same genes, the circRNAs of different species diverged rapidly in terms of sequences and BSJs. Next, the expression levels of orthologous circRNAs and genes on each node were estimated. As shown in Fig. 4c, the shared orthologous circRNAs exhibited increased levels of expression compared with lineage-specific circRNAs, whereas this scenario was not observed in the shared orthologous genes from which the circRNAs were derived. These findings suggest that there is a distinct evolutionary conservation pattern of orthologous circRNAs and that these shared circRNAs provide valuable targets for further functional screening. We next surveyed circRNA AS events in the six species. All four types of AS events could be detected within circRNAs in all these species (Fig. 4d). Exon skipping (ES) was the most prevalent AS type in circRNAs. Alternative 3′-splicing site (A3SS) and alternative 5′-splicing site (A5SS) were also major circular AS types, in agreement with our previous study showing that AS events not only occur in mRNAs but are also prevalent in circRNAs . To this end, we investigated the expression level of conserved circRNA isoforms in these species. As shown in Fig. 4e, conserved circRNA isoforms exhibited similar splicing patterns in closely related species. For example, the expression and splicing patterns of these circRNAs were more similar between human and macaque than between human and other species.
Read length is a key determinant of circRNA identification but not quantification
Isoform-level quantification helps filter false positives in differential circRNA expression analysis
To better understand the discrepancy between BSJ-level and isoform-level differential expression analysis, we extracted top 1000 most abundant circRNAs that expressed in at least 80% of the 40 samples and found that 778 of them could be fully reconstructed. Then, we investigated their expression changes between normal and tumor tissues at both BSJ and isoform levels. Specifically, Mann–Whitney U test was employed to calculate the significance of expression alternation and signs of P value were used as proxies for directions of changes in expression (Fig. 6d). Among these 778 circRNAs, 587 of them showed the same significance values between BSJ-level and isoform-level differential expression analysis, because each of these circRNAs only expressed a single isoform, with 66 and 230 significantly up- and downregulated in tumor tissues, respectively (Fig. 6e). Regarding the remaining 191 circRNAs that expressed multiple isoforms, the significance level of their differential expression was generally overestimated because different isoforms of circRNAs with a certain BSJ cannot be distinguished from each other if they are quantified at the BSJ level. After corrected for isoform-level quantification, only 32% of them were still significantly upregulated in tumor samples. The same phenomenon was also observed in downregulated circRNAs, where only 35% of them were kept after correction. Notably, a small number of circRNAs were recognized as differentially expressed isoforms only by the isoform-level quantification, indicating that other isoforms within the same BSJ may interfere with the performance of differential expression analysis solely based on the BSJ-level quantification.
To further investigate whether there were alternative splicing isoform switches present between normal and tumor samples, we extracted the top 50 most highly expressed circRNAs with multiple isoforms and calculated the expression fold change for each isoform between tumor and normal samples. We found that 10 out of 50 circRNAs underwent isoform switches, where the striking expression changes occur in the most abundant isoform (green and red circles, Fig. 6f). In contrast, the BSJ-level differential expression analysis cannot distinguish such scenario. For example, circRNA chr2:207144264|207162097 locating on the ZDBF2 gene could express four circular isoforms (Fig. 6g). The alignments of BSJ reads on the second and fourth cirexons, as well as the splicing events between the fourth and sixth exons, exhibited distinct read supports between tumor and normal samples, indicating the existence of alternative splicing isoform switches. Indeed, the circular isoform with sequence length of 477 nt was the dominant isoform in normal samples, whereas its expression dropped rapidly in tumor samples and the expression of this circRNA was dominated by the circular isoform with sequence length of 334 nt. Moreover, eight circRNAs were illustrated to serve as examples to detail the expression changes at both the BSJ and isoform levels between normal and tumor tissues of 20 patients (Fig. 6h). Although four of these circRNAs were found to be differentially expressed by the traditional BSJ-level quantification, it could not distinguish the real differentially expressed isoforms. In addition, there were also a few cases that certain circular isoforms exhibited increased expression levels across tumor samples, which were missed by the BSJ-level differential expression analysis. Collectively, these results highlight the limitation of BSJ-based quantification and the necessity of extending differential circRNA expression to the isoform level.
Most approaches to circRNA identification rely exclusively on recognizing BSJs, and these methods are of limited application in determining the internal structure of circRNAs. In this study, for the first time, we propose an RO feature-based method for circRNA detection. This new approach is an important addition to the BSJ feature-based method, with each approach having its own advantages and limitations. Compared with the BSJ feature-based method, the RO feature-based method has the following distinct advantages. First, the RO feature provides more solid evidence for identifying full-length circRNAs. This greatly facilitates genome-wide full-length circRNA identification and thereby offers an indispensable advantage for downstream analyses, including functional and evolutionary analyses. Second, the RO feature facilitates the detection of weakly transcribed circRNAs, and even extremely low-abundance circRNAs could be efficiently and accurately identified. For instance, in the HeLa cell dataset, 3887 circRNAs were identified; 14.4% of these were supported by only one read, and 92% were successfully identified by CIRI-full. Considering that a vast majority of circRNAs in various transcriptomes are in low expression levels, the detection of such low-abundance transcripts is extremely important in exploring circRNA profiles. For example, when detecting circRNAs from RNA-seq data sets without RNase R treatment, most of the circRNAs are in low abundance compared with their linear counterparts. Therefore, the ability of identifying and reconstructing low-abundance circRNAs is not trivial in circRNA studies. Third, compared with BSJ reads, RO reads produce a more uniform depth distribution along the normalized circRNA transcript, which greatly facilitates quantification of circRNAs and determination of their internal structures. Compared with the BSJ feature, RO-based circRNA identification also has limitations. First, the RO-based approach requires longer reads to obtain an entire circRNA sequence. Considering that most circRNAs are between 200 and 800 bp in length, it should be possible to easily obtain RO reads as sequencing technology continues to advance. Second, a large library size is required to produce high-quality RO reads during library preparation.
Considering the potential significance of the biogenesis and functions of full-length circRNAs, several computational algorithms have been developed to determine the internal components of these circular transcripts. The reference-based prediction method has been revealed to be error-prone; up to 66% of the predicted full-length circRNAs in our study were demonstrated to contain errors, including false cirexons and missing ICFs. Despite the fact that spliced junction signature-based methods such as CIRI-AS represent an important step forward by facilitating accurate cirexon identification, full-length prediction of circular isoforms with complicated AS events is still not feasible. An alternative approach involves utilization of new sequencing technologies such as PacBio long read sequencing, which promises increases in read length of several orders of magnitude, thereby making full-length circRNA identification considerably easier. However, this is achieved at the expense of higher cost per base and lower throughput. Furthermore, it is not cost-effective to use PacBio long reads to obtain the complete sequences of circRNAs, especially considering that their lengths range from 250 to 800 bp. In contrast, CIRI-full generates high-throughput “long” reads that are sufficient for determining the complete sequences of most full-length circRNAs in an economical and practical manner. Without the need for additional library preparation and with bypassing of the fragmentation step, RO reads are easily obtained using general paired-end sequencing, greatly expanding the applicability of this method.
Besides circular transcript reconstruction, CIRI-full is also the first tool to provide isoform-level quantification for circRNAs. Previous studies revealed the prevalence of AS events within circRNAs and the importance of accurate quantification of circRNA expression to confirm their crucial functions in many biological processes. However, current state-of-the-art transcript quantification approaches largely focus on linear RNAs, which estimate expression abundance at both gene and transcript levels. Till now, there is no available method for isoform-level circRNA quantification. In this study, CIRI-full can successfully determine the abundance of circRNAs at both BSJ and isoform levels by employing the FSG-based algorithm, which reconstructs all full-length isoforms within each assembled full-length circRNA. By applying this approach to 40 RNA-seq data sets of human HCC tumor tissues and their adjacent normal tissues, we identified thousands of circRNAs of which most were assembled into full-length, and we systematically explored the discrepancy between BSJ-level and isoform-level differential expression analysis. We found that the significance level for differential expression of most circRNAs that expressed multiple isoforms was generally overestimated at the BSJ level. A large majority of differentially expressed circRNAs measured at the BSJ level tended to be false positives, as BSJ-level quantification cannot make a distinction among different isoforms within a certain circRNA. Consequently, for circRNAs with multiple isoforms, BSJ-level quantification does not necessarily reflect the real expression changes of certain circular isoforms between normal and tumor samples. These findings not only provide more accurate candidates for functional screening, but also unveil the complexity of circRNA isoform expression.
This study, for the first time, presents a high-throughput approach, CIRI-full, that employs a new feature for full-length circRNA reconstruction and isoform-level quantification. Extensive evaluations demonstrate that CIRI-full exhibits excellent performance in circRNA identification and whole-sequence assembly, as well as isoform reconstruction and quantification. We applied CIRI-full to investigate the evolutionary conservation of circRNAs in brain transcriptomes across six vertebrates. In addition, we systematically compared the difference between the BSJ-level and isoform-level differential expression analyses using human liver tumor and normal tissues and found that a large majority of differentially expressed circRNAs measured at the BSJ level tended to be false positives. For circRNAs with multiple isoforms, isoform-level quantification instead of BSJ-level quantification can reflect the real expression changes of certain circular isoforms between normal and tumor samples. This study provides an indispensable approach for circRNA transcript reconstruction and quantification and highlights the necessity of deepening circRNA studies to the isoform-level resolution.
The availability and requirements are listed as follows:
Project name: CIRI-full.
Project home page: https://sourceforge.net/projects/ciri.
Operating system(s): Linux, Mac.
Programming language: Java, Perl.
We are grateful to National Science Foundation of China, Chinese Academy of Sciences for the support. We also thank the Sequencing and Computing Facilities at Beijing Institute of Life Sciences, Chinese Academy of Sciences that supported this work.
This work was supported by NSFC grants (31722031, 91640117, 91531306 and 31701148), National Key R&D Program (2018YFC0910400), Beijing Natural Science Foundation (JQ18020) and the Strategic Priority Research Program of the Chinese Academy of Sciences [XDB13000000].
Availability of data and materials
The sequencing data generated in this study was deposited to SRA with the following project ID (PRJNA475651) .
FZ conceived the study and proposed the method. YZ implemented the algorithm and designed the software. YZ and PJ analyzed the data. SC and LH performed sequencing and experimental validations. PJ and YZ drafted the manuscript, and FZ revised the manuscript. All authors read and approved the final manuscript.
Ethics approval and consent to participate
Consent for publication
The authors declare that they have no competing interests.
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Open AccessThis article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
- Ashwal-Fluss R, Meyer M, Pamudurti NR, Ivanov A, Bartok O, Hanan M, Evantal N, Memczak S, Rajewsky N, Kadener S. circRNA biogenesis competes with pre-mRNA splicing. Mol Cell. 2014;56:55–66.View ArticleGoogle Scholar
- Guo JU, Agarwal V, Guo H, Bartel DP. Expanded identification and characterization of mammalian circular RNAs. Genome Biol. 2014;15:409.View ArticleGoogle Scholar
- Hansen TB, Jensen TI, Clausen BH, Bramsen JB, Finsen B, Damgaard CK, Kjems J. Natural RNA circles function as efficient microRNA sponges. Nature. 2013;495:384–8.View ArticleGoogle Scholar
- Memczak S, Jens M, Elefsinioti A, Torti F, Krueger J, Rybak A, Maier L, Mackowiak SD, Gregersen LH, Munschauer M, et al. Circular RNAs are a large class of animal RNAs with regulatory potency. Nature. 2013;495:333–8.View ArticleGoogle Scholar
- Jeck WR, Sorrentino JA, Wang K, Slevin MK, Burd CE, Liu J, Marzluff WF, Sharpless NE. Circular RNAs are abundant, conserved, and associated with ALU repeats. RNA. 2013;19:141–57.View ArticleGoogle Scholar
- Salzman J, Gawad C, Wang PL, Lacayo N, Brown PO. Circular RNAs are the predominant transcript isoform from hundreds of human genes in diverse cell types. PLoS One. 2012;7:e30733.View ArticleGoogle Scholar
- Salzman J, Chen RE, Olsen MN, Wang PL, Brown PO. Cell-type specific features of circular RNA expression. PLoS Genet. 2013;9:e1003777.View ArticleGoogle Scholar
- Gao Y, Wang J, Zhao F. CIRI: an efficient and unbiased algorithm for de novo circular RNA identification. Genome Biol. 2015;16:4.View ArticleGoogle Scholar
- Li Z, Huang C, Bao C, Chen L, Lin M, Wang X, Zhong G, Yu B, Hu W, Dai L, et al. Exon-intron circular RNAs regulate transcription in the nucleus. Nat Struct Mol Biol. 2015;22:256–64.View ArticleGoogle Scholar
- Gao Y, Wang J, Zheng Y, Zhang J, Chen S, Zhao F. Comprehensive identification of internal structure and alternative splicing events in circular RNAs. Nat Commun. 2016;7:12060.View ArticleGoogle Scholar
- Chen YG, Satpathy AT, Chang HY. Gene regulation in the immune system by long noncoding RNAs. Nat Immunol. 2017;18:962–72.View ArticleGoogle Scholar
- Kristensen LS, Hansen TB, Veno MT, Kjems J. Circular RNAs in cancer: opportunities and challenges in the field. Oncogene. 2017;37:555–65.Google Scholar
- Li X, Liu CX, Xue W, Zhang Y, Jiang S, Yin QF, Wei J, Yao RW, Yang L, Chen LL. Coordinated circRNA biogenesis and function with NF90/NF110 in viral infection. Mol Cell. 2017;67:214–227 e217.View ArticleGoogle Scholar
- Salta E, De Strooper B. Noncoding RNAs in neurodegeneration. Nat Rev Neurosci. 2017;18:627–40.View ArticleGoogle Scholar
- Fei T, Chen Y, Xiao T, Li W, Cato L, Zhang P, Cotter MB, Bowden M, Lis RT, Zhao SG, et al. Genome-wide CRISPR screen identifies HNRNPL as a prostate cancer dependency regulating RNA splicing. Proc Natl Acad Sci U S A. 2017;114:E5207–15.View ArticleGoogle Scholar
- Hirsch S, Blatte TJ, Grasedieck S, Cocciardi S, Rouhi A, Jongen-Lavrencic M, Paschka P, Kronke J, Gaidzik VI, Dohner H, et al. Circular RNAs of the nucleophosmin (NPM1) gene in acute myeloid leukemia. Haematologica. 2017;102:2039–47.View ArticleGoogle Scholar
- Legnini I, Di Timoteo G, Rossi F, Morlando M, Briganti F, Sthandier O, Fatica A, Santini T, Andronache A, Wade M, et al. Circ-ZNF609 is a circular RNA that can be translated and functions in myogenesis. Mol Cell. 2017;66:22–37 e29.View ArticleGoogle Scholar
- Piwecka M, Glazar P, Hernandez-Miranda LR, Memczak S, Wolf SA, Rybak-Wolf A, Filipchyk A, Klironomos F, Cerda Jara CA, Fenske P, et al. Loss of a mammalian circular RNA locus causes miRNA deregulation and affects brain function. Science. 2017;357.Google Scholar
- Yang Y, Gao X, Zhang M, Yan S, Sun C, Xiao F, Huang N, Yang X, Zhao K, Zhou H, et al. Novel role of FBXW7 circular RNA in repressing glioma tumorigenesis. J Natl Cancer Inst. 2018;110.Google Scholar
- Yu CY, Li TC, Wu YY, Yeh CH, Chiang W, Chuang CY, Kuo HC. The circular RNA circBIRC6 participates in the molecular circuitry controlling human pluripotency. Nat Commun. 2017;8:1149.View ArticleGoogle Scholar
- Zheng Q, Bao C, Guo W, Li S, Chen J, Chen B, Luo Y, Lyu D, Li Y, Shi G, et al. Circular RNA profiling reveals an abundant circHIPK3 that regulates cell growth by sponging multiple miRNAs. Nat Commun. 2016;7:11215.View ArticleGoogle Scholar
- Yang Y, Fan X, Mao M, Song X, Wu P, Zhang Y, Jin Y, Yang Y, Chen LL, Wang Y, et al. Extensive translation of circular RNAs driven by N(6)-methyladenosine. Cell Res. 2017;27:626–41.View ArticleGoogle Scholar
- Zhou C, Molinie B, Daneshvar K, Pondick JV, Wang J, Van Wittenberghe N, Xing Y, Giallourakis CC, Mullen AC. Genome-wide maps of m6A circRNAs identify widespread and cell-type-specific methylation patterns that are distinct from mRNAs. Cell Rep. 2017;20:2262–76.View ArticleGoogle Scholar
- Rybak-Wolf A, Stottmeister C, Glazar P, Jens M, Pino N, Giusti S, Hanan M, Behm M, Bartok O, Ashwal-Fluss R, et al. Circular RNAs in the mammalian brain are highly abundant, conserved, and dynamically expressed. Mol Cell. 2015;58:870–85.View ArticleGoogle Scholar
- Zhang XO, Wang HB, Zhang Y, Lu X, Chen LL, Yang L. Complementary sequence-mediated exon circularization. Cell. 2014;159:134–47.View ArticleGoogle Scholar
- Zhang XO, Dong R, Zhang Y, Zhang JL, Luo Z, Zhang J, Chen LL, Yang L. Diverse alternative back-splicing and alternative splicing landscape of circular RNAs. Genome Res. 2016;26:1277–87.View ArticleGoogle Scholar
- Metge F, Czaja-Hasse LF, Reinhardt R, Dieterich C. FUCHS-towards full circular RNA characterization using RNAseq. PeerJ. 2017;5:e2934.View ArticleGoogle Scholar
- Ye CY, Zhang X, Chu Q, Liu C, Yu Y, Jiang W, Zhu QH, Fan L, Guo L. Full-length sequence assembly reveals circular RNAs with diverse non-GT/AG splicing signals in rice. RNA Biol. 2017;14:1055–63.View ArticleGoogle Scholar
- Gao Y, Zhao F. Computational strategies for exploring circular RNAs. Trends Genet. 2018;34:389–400.View ArticleGoogle Scholar
- Gao Y, Zhang J, Zhao F. Circular RNA identification based on multiple seed matching. Brief Bioinform. 2018;19:803–10.View ArticleGoogle Scholar
- Zeng X, Lin W, Guo M, Zou Q. A comprehensive overview and evaluation of circular RNA detection tools. PLoS Comput Biol. 2017;13:e1005420.View ArticleGoogle Scholar
- Wang J, Liu K, Liu Y, Lv Q, Zhang F, Wang H. Evaluating the bias of circRNA predictions from total RNA-Seq data. Oncotarget. 2017;8:110914–21.PubMedPubMed CentralGoogle Scholar
- Hansen TB. Improved circRNA identification by combining prediction algorithms. Front Cell Dev Biol. 2018;6:20.View ArticleGoogle Scholar
- Heber S, Alekseyev M, Sze SH, Tang H, Pevzner PA. Splicing graphs and EST assembly problem. Bioinformatics. 2002;18(Suppl 1):S181–8.View ArticleGoogle Scholar
- Feng J, Li W, Jiang T. Inference of isoforms from short sequence reads. J Comput Biol. 2011;18:305–21.View ArticleGoogle Scholar
- Li JJ, Jiang CR, Brown JB, Huang H, Bickel PJ. Sparse linear modeling of next-generation mRNA sequencing (RNA-Seq) data for isoform discovery and abundance estimation. Proc Natl Acad Sci U S A. 2011;108:19867–72.View ArticleGoogle Scholar
- Szabo L, Morey R, Palpant NJ, Wang PL, Afari N, Jiang C, Parast MM, Murry CE, Laurent LC, Salzman J. Statistically based splicing detection reveals neural enrichment and tissue-specific induction of circular RNA during human fetal development. Genome Biol. 2015;16:126.View ArticleGoogle Scholar
- Zheng, Y., Ji, P., Chen, S., Hou, L. and Zhao, F. (2018) RNA-seq data sets from human brain and Hela cell line. https://www.ncbi.nlm.nih.gov/bioproject/PRJNA475651/ Google Scholar
- Kim D, Langmead B, Salzberg SL. HISAT: a fast spliced aligner with low memory requirements. Nat Methods. 2015;12:357–60.View ArticleGoogle Scholar
- Pertea M, Pertea GM, Antonescu CM, Chang TC, Mendell JT, Salzberg SL. StringTie enables improved reconstruction of a transcriptome from RNA-seq reads. Nat Biotechnol. 2015;33:290–5.View ArticleGoogle Scholar
- Pertea M, Kim D, Pertea GM, Leek JT, Salzberg SL. Transcript-level expression analysis of RNA-seq experiments with HISAT, StringTie and Ballgown. Nat Protoc. 2016;11:1650–67.View ArticleGoogle Scholar
- Yang Y, Chen L, Gu J, Zhang H, Yuan J, Lian Q, Lv G, Wang S, Wu Y, Yang Y-CT, et al. Recurrently deregulated lncRNAs in hepatocellular carcinoma. Nat Commun. 2017;8:14421.View ArticleGoogle Scholar
- Haas BJ, Papanicolaou A, Yassour M, Grabherr M, Blood PD, Bowden J, Couger MB, Eccles D, Li B, Lieber M, et al. De novo transcript sequence reconstruction from RNA-seq using the Trinity platform for reference generation and analysis. Nat Protoc. 2013;8:1494.View ArticleGoogle Scholar