Diagnostic laboratories are rapidly adopting high throughput genomic sequencing for clinical genetic tests. This transition is enabling a dramatic expansion in our ability to diagnose and screen heterogeneous monogenic disorders. One critical aspect of a clinical genomics test is the bioinformatics pipeline used to analyse the sequencing data and output variants for clinical consideration. Thus far, most clinical sequencing analysis pipelines have been driven by individual laboratories, which have either developed their own bioinformatics capability for processing data, relied on commercial products, or partnered with research institutions to acquire the expertise needed. This approach has enabled rapid adoption, but has resulted in a wide diversity of implementation approaches and great variability in the methods used for evaluation, interpretation and reporting of variants. When pipelines have been developed primarily for research use, they often lack the robustness, provenance and quality control features, maintainability and high degree of automation required in the clinical diagnostic setting. Additionally, many such analysis pipelines are designed without prioritising the ability to generalise to different diseases, technologies or computational contexts. Commercial pipelines can address some of these problems. However, they are inevitably constrained in the level of customisation and transparency they can offer due to their commercial nature. Additionally, commercial pipelines can be expensive for laboratories to acquire, evaluate and deploy. Altogether, these issues impede the standardisation of bioinformatics pipelines for routine diagnostics across multiple clinics and healthcare systems. An analysis pipeline that is specifically designed for the clinical setting, and that can be informed and iteratively improved by the clinical diagnostic community, has the potential to offer the most effective diagnostic value.
Recognising these issues, the Melbourne Genomics Health Alliance was formed as a collaboration between seven institutions, including hospitals, diagnostic laboratories, universities and research institutes, with the aim of developing a common approach to the analysis and management of genomic data within Australia’s publicly funded healthcare system. A key outcome of the Alliance has been the development of a consensus bioinformatics pipeline, which we have called Cpipe. Cpipe is founded upon best practice analysis components that are emerging in the global clinical sequencing community and are already being employed by many of the members of the Alliance. However, the goal of Cpipe is not to improve upon these core bioinformatics analysis methods, nor is it ultimately to focus on any particular tool set. Rather, the aim of Cpipe is to create a common framework for applying the tools that can be readily adapted for a diverse range of diagnostic settings and clinical indications.
We identified three key requirements for a clinical bioinformatics pipeline that differ from those of a pipeline intended for research use. First, a clinical pipeline must be designed with a greater emphasis on robust and reproducible analysis. There must be clear records of what analysis was performed and what files were used to generate results. Second, a number of specialised bioinformatics steps are required in clinical settings. For example, one key difference in a clinical setting is the need for variants to be assessed for their relevance to a given patient. It therefore becomes vital to filter and prioritise variants, which speeds up this assessment and reduces the time clinicians spend on it. Finally, the pipeline must be highly transparent and modular, so that the individual steps as well as the overall flow of the pipeline are easy to understand and modify. These qualities are critical in the clinical environment to allow laboratories to maintain and adapt pipelines to their needs without compromising on quality.
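The first requirement, a clear record of what analysis was performed and which files were used, can be illustrated with a small sketch. This is only an illustration under stated assumptions and does not reflect how Cpipe or Bpipe actually record provenance: the function names, the record fields and the choice of SHA-256 checksums are all hypothetical.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def provenance_record(step: str, tool: str, version: str,
                      inputs: List[Path]) -> Dict:
    """Build a record of one analysis step (hypothetical field names).

    The record names the step, the tool and version that ran it, and a
    checksum for every input file, so a result can later be traced back
    to the exact files that produced it.
    """
    return {
        "step": step,
        "tool": tool,
        "version": version,
        "inputs": [{"file": p.name, "sha256": sha256_of(p)} for p in inputs],
    }
```

A record of this kind could be serialised (for example as JSON) and stored alongside each output file. Checksums are used rather than file names alone because a file can be replaced without its name changing.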
There have been a number of previous efforts to create publicly available analysis pipelines for high throughput sequencing data. Examples include Omics-Pipe, bcbio-nextgen, TREVA and NGSane. These pipelines offer a comprehensive, automated process that can analyse raw sequencing reads and produce annotated variant calls. However, the main audience for these pipelines is the research community. Consequently, there are many features required by clinical pipelines that these examples do not fully address. Other groups have focused on improving specific features of clinical pipelines. The Churchill pipeline uses specialised techniques to achieve high performance while maintaining reproducibility and accuracy. However, it is not freely available to clinical centres and it does not try to improve broader clinical aspects such as robustness, detailed quality assurance reports and specialised variant filtering. The Mercury pipeline offers a comprehensive system that addresses many clinical needs: it uses an automated workflow system (Valence) to ensure robustness, abstract computational resources and simplify customisation of the pipeline. Mercury also includes detailed coverage reports provided by ExCID, and supports compliance with US privacy laws (HIPAA) when run on DNANexus, a cloud computing platform specialised for biomedical users. Although Mercury offers a comprehensive solution for clinical users, it does not achieve our desired level of transparency, modularity and simplicity in the pipeline specification and design. Further, Mercury does not perform specialised variant filtering and prioritisation that is specifically tuned to the needs of clinical users.
Cpipe focuses on implementing or improving the three key aspects of clinical analysis pipelines that we have identified. The first aspect includes features that support the robustness and quality of the pipeline operation, and these are provided automatically in Cpipe by the underlying pipeline framework, Bpipe. The second aspect is the addition of specialised bioinformatics steps that are required for clinical settings. These include detailed quality reports, additional filtering and prioritisation of variants, and carefully designed output formats that accelerate clinical interpretation. Finally, Cpipe aims to be highly transparent and modular, so that it is easy to understand and modify the underlying tools used. This is critical to ensuring that Cpipe can be deployed in diverse clinical settings and can be updated and shared between different organisations, while still maintaining a common underlying framework.
Cpipe has been developed in close consultation with many different stakeholders from the clinical and research sequencing community in Melbourne, Australia. It is being actively used by three separate institutions for clinical sequencing, and is undergoing accreditation for diagnostic use. By adopting Cpipe, a solution that has already been tested in a diagnostic context, a laboratory can save significant effort in developing a pipeline. Perhaps even more importantly, by adopting Cpipe they can become part of a community of users and developers, and can benefit from the ongoing maintenance and active development that will occur over time. The open source license of Cpipe (GPLv3) will allow users of Cpipe to become contributors to the project, further ensuring its ongoing maintenance and development.