
Table 2 Key analysis considerations and practical guidance for clinical neoantigen workflows

From: Best practices for bioinformatic characterization of neoantigens for clinical utility

Analysis area

Guidance

Reference genome sequences

The choice of human reference genome sequences can have important implications for various analysis steps throughout neoantigen characterization workflows. A consistent build or assembly (e.g., GRCh38 or GRCh37) of the genome should be used throughout the analysis. Even if two resources provide annotations that are based on the same assembly, they may organize or name sequences differently and might follow different conventions for representing ambiguous or repetitive sequences. They may also drop some sequences (e.g., alternative contigs) or add sequences that are not part of the official assembly (e.g., ‘decoy’ sequences). The use of reference files from multiple sources for different tools is difficult to avoid but should be pursued cautiously. For example, the naming of chromosomes and contigs used for DNA read alignment and variant calling should be identical to that used in transcript annotations; otherwise, mismatches may prevent correct prediction of the protein sequences of neoantigens
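
As a rough illustration of this compatibility check, the following Python sketch (assuming the pysam library is available; the file names are hypothetical) compares contig names in a BAM header against those used in a transcript annotation GTF:

```python
# Minimal sketch: verify that contig names in a BAM header match those in a
# transcript GTF before annotating variants. File names are hypothetical.
import pysam

def check_contig_compatibility(bam_path: str, gtf_path: str) -> set:
    """Return GTF contigs absent from the BAM header (e.g., 'chr1' vs '1')."""
    bam_contigs = set(pysam.AlignmentFile(bam_path, "rb").references)
    gtf_contigs = set()
    with open(gtf_path) as gtf:
        for line in gtf:
            if not line.startswith("#"):
                gtf_contigs.add(line.split("\t", 1)[0])
    return gtf_contigs - bam_contigs

missing = check_contig_compatibility("tumor.bam", "annotation.gtf")
if missing:
    print(f"Warning: {len(missing)} GTF contigs absent from BAM header, "
          f"e.g. {sorted(missing)[:5]}")
```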

Use of alternative contigs in the reference genome

The inclusion or exclusion of alternative contigs from the latest human reference genome build can have important implications for HLA typing tools such as xHLA [74]. In particular, if a tool assumes that all relevant reads for HLA typing can be extracted from an existing alignment (rather than performing de novo re-alignment of all reads), it matters whether some of these reads may have been placed on alternative contigs for the HLA locus of chromosome 6. Some HLA typing approaches avoid this issue by aligning all reads directly to a database of known HLA gene sequences (e.g., from the IPD-IMGT/HLA resource). This has the disadvantage that without competitive alignment of each read to the whole genome, some reads may be misaligned to the known HLA sequences and this may affect accuracy during HLA typing. A reference genome alignment approach, in which the diversity of HLA loci is properly represented in the reference, avoids this concern and has the potential to leverage alignments that may have already been produced for variant calling. For example, all reads aligning to the HLA loci of chromosome 6, the corresponding alternative contigs (if present in the reference), and unaligned reads could be extracted from a BAM file and used for HLA typing
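
The read-extraction strategy described in the final sentence might be sketched as follows with pysam; the MHC coordinates are approximate GRCh38 values and the file paths are placeholders, so both should be adjusted to the reference actually used:

```python
import pysam

MHC_REGION = ("chr6", 28_510_000, 33_500_000)  # approximate GRCh38 MHC locus

def extract_hla_reads(bam_path: str, out_path: str) -> None:
    """Write MHC-region, HLA-alt-contig, and unmapped reads to a new BAM."""
    bam = pysam.AlignmentFile(bam_path, "rb")   # requires a BAM index
    out = pysam.AlignmentFile(out_path, "wb", template=bam)
    for read in bam.fetch(*MHC_REGION):         # chromosome 6 HLA region
        out.write(read)
    for contig in bam.references:               # HLA alternative contigs, if any
        if "HLA" in contig:
            for read in bam.fetch(contig):
                out.write(read)
    for read in bam.fetch(until_eof=True):      # full pass to collect unmapped reads
        if read.is_unmapped:
            out.write(read)
    out.close()  # placed-but-unmapped mates may be duplicated; deduplicate downstream
```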

Transcript annotation build versions

Transcript annotation resources (e.g., Ensembl, RefSeq, GENCODE, and Havana) update their transcript sequences and associated annotations more frequently than new reference genome sequence builds/assemblies are released. For example, Ensembl is currently on version 96, the 21st update since the latest release of the human reference genome, build GRCh38. As with reference genome builds, it is highly desirable to use a consistent set of transcript annotations across the steps of a neoantigen characterization workflow. For example, the transcripts used to annotate somatic variants should be the same as those used to estimate transcript and gene abundance from RNA data

Variant detection sensitivity

Correct neoantigen identification and prioritization rely on somatic and germline variant detection (for proximal variant analysis) and variant expression analysis. QC analysis of both DNA and RNA data should be performed to assess the potential for a high false-negative rate in the detection of somatic variants that might lead to neoantigens, in the identification of germline variants that are in phase with somatic variants and influence the peptide sequence bound by MHC, or in the assessment of the expression of these variants. Tumor samples vary significantly in their level of purity and genetic heterogeneity. Common strategies to achieve high sensitivity in variant detection involve increasing the average sequencing depth and combining results from multiple variant callers

Combining variants from multiple callers

The majority of somatic variant callers now use the widely adopted variant call format (VCF). Furthermore, many toolkits now exist for the manipulation of these files, including merging. However, because of the complexity and flexibility of the VCF specification (https://samtools.github.io/hts-specs/VCFv4.2.pdf), the existence of multiple versions of the specification, and the varying interpretations of VCF rules observed in the output of somatic variant callers, great care must be taken when combining multiple VCFs and using these merged results. Important considerations include: (i) variant justification and parsimony such as left aligning or trimming variants to harmonize those that can be correctly represented at multiple positions without changing the resulting sequence (e.g., GATK LeftAlignAndTrimVariants); (ii) normalization of multi-allelic variants by separating multiple variant alleles that occur at a single position into multiple lines in a VCF (e.g., vt decompose); (iii) harmonization of sequence depths, allele depth, and allele fraction values that may be calculated inconsistently by different variant callers through the use of an independent counting tool, such as bam-readcount (https://github.com/genome/bam-readcount); (iv) determining the final status for each variant (PASS or filters failed; e.g., GATK SelectVariants); and (v) choosing the variant INFO and FORMAT fields to represent in the final merged VCF
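
A minimal sketch of the caller-combination logic, assuming each VCF has already been decomposed and left-aligned/trimmed as in (i) and (ii) so that variant keys are directly comparable; the caller names and file paths are hypothetical:

```python
from collections import defaultdict
import gzip

def variant_keys(vcf_path):
    """Yield (chrom, pos, ref, alt) keys from a (possibly gzipped) VCF."""
    opener = gzip.open if vcf_path.endswith(".gz") else open
    with opener(vcf_path, "rt") as vcf:
        for line in vcf:
            if line.startswith("#"):
                continue
            chrom, pos, _, ref, alt = line.split("\t")[:5]
            yield (chrom, int(pos), ref, alt)

callers = {"mutect2": "mutect2.norm.vcf.gz",   # hypothetical, pre-normalized
           "strelka2": "strelka2.norm.vcf.gz",
           "varscan2": "varscan2.norm.vcf.gz"}
support = defaultdict(set)
for caller, path in callers.items():
    for key in variant_keys(path):
        support[key].add(caller)

# Keep variants reported by at least two callers; depths, allele depths, and
# VAFs would then be recomputed uniformly with a tool such as bam-readcount
consensus = [k for k, seen in support.items() if len(seen) >= 2]
```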

Variant refinement (‘manual review’)

Somatic variant calling pipelines remain subject to high rates of false positives, particularly in cases of low tumor purities or of insufficient depth of sequencing of tumor (or matched normal) samples or sub-clones. Prior to final neoantigen selection, all somatic variants should be carefully reviewed for possible alignment artifacts, systematic sequencing errors, nearby in-phase proximal variants, and other issues using a standard operating procedure for variant refinement, such as that outlined by Barnell et al. [27]

Choosing RNA and DNA variant allele fraction (VAF) cutoffs

It is impossible to define universal VAF recommendations because of the varying distribution of VAFs observed for tumor samples with different sequencing depths, tumor purity/cellularity, genetic heterogeneity, and degree of aneuploidy. The interpretation of each individual candidate may be influenced by one or more of these factors. In general, however, neoantigens corresponding to somatic variants with higher VAFs (in both DNA and RNA) will be considered with higher priority. Estimating the overall purity of the DNA sample by VAF distribution and distinguishing founding clones from sub-clones requires accurate assignment of each variant to a copy number estimate. Accepting or rejecting candidates on the basis of VAF requires a nuanced approach that takes the characteristics of each tumor into account. For example, a variant with a relatively low DNA VAF may be accepted in some cases if sequencing depth at the variant position was marginal, leading to a less accurate VAF estimate. Conversely, a variant with a relatively high DNA VAF may be rejected if RNA-seq analysis shows strong evidence of allele-specific expression of the wild-type allele
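
For illustration only, the following sketch shows why sequencing depth matters when applying a VAF cutoff: at low depth, a simple binomial confidence interval around the observed VAF can easily span the cutoff (the interval construction and values are placeholders, not recommendations):

```python
import math

def vaf_with_ci(alt_reads: int, depth: int, z: float = 1.96):
    """Return (VAF, half-width of an approximate 95% CI)."""
    vaf = alt_reads / depth
    half_width = z * math.sqrt(vaf * (1 - vaf) / depth)  # normal approximation
    return vaf, half_width

vaf, hw = vaf_with_ci(alt_reads=4, depth=30)  # low depth -> wide interval
print(f"VAF = {vaf:.2f} +/- {hw:.2f}")        # 0.13 +/- 0.12 spans many cutoffs
```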

Interpretations that depend on RNA quality assessment

Attempting to define expressed and unexpressed variants by RNA-seq analysis is a common feature of many neoantigen characterization workflows. Applying hard filters in this area should be pursued with great caution. All interpretation of RNA-seq should be accompanied by comprehensive QC analysis of the data [204]. A lack of evidence for expression in RNA-seq data may not be definitive evidence of non-expression of a variant because not all genes can be robustly profiled by RNA-seq (for example, very small genes may be poorly detected by standard RNA-seq libraries [205]). Tumor samples that are obtained in clinical workflows, particularly those involving FFPE, may frequently result in poor-quality RNA samples. In these cases, the requirements for expression support may be relaxed when nominating neoantigen candidates. Furthermore, some variants occur within a region of a gene that is difficult to align reads to. In these cases, robust apparent expression of the gene may still be used to nominate a neoantigen even in the absence of evidence supporting the expression of the variant allele itself. Use of spike-in control reagents and routine profiling of reference samples can be helpful in determining consistent expression value cutoffs (e.g., FPKM or TPM values) across samples. In the absence of reliable gene or variant expression readout for an individual tumor, robust expression of the gene in tumors of the same type may be used to prioritize neoantigens

Assessing variant clonality

A major consideration in the interpretation of DNA VAFs of variants is the assessment of tumor clonality. Neoantigens corresponding to variants that reside in the founding clone are inherently more valuable therapeutically than those residing in tumor sub-clones, because the former have the potential to target the elimination of all tumor cells. In personalized cancer vaccine designs, after correcting for ploidy and tumor purity, VAFs should be interpreted to prioritize neoantigens that correspond to founding clones
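
One common approximation for this correction converts a VAF into a cancer cell fraction (CCF) using tumor purity, local copy number, and an assumed mutation multiplicity. The sketch below illustrates only the arithmetic; actual pipelines infer multiplicity and copy number from the data:

```python
def cancer_cell_fraction(vaf, purity, tumor_cn, normal_cn=2, multiplicity=1):
    """Approximate the fraction of tumor cells carrying the variant."""
    return vaf * (purity * tumor_cn + (1 - purity) * normal_cn) / (purity * multiplicity)

# A variant with VAF 0.25 in a 50%-pure diploid tumor is consistent with a
# clonal heterozygous event (CCF ~ 1.0):
print(cancer_cell_fraction(vaf=0.25, purity=0.5, tumor_cn=2))
```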

Variant types and agretopicity

‘Agretopicity’ (also known as the ‘differential agretopicity index’ [121] or ‘wild-type/mutant binding affinity fold change’) is an estimate of the degree to which a neoantigen’s ability to bind to MHC differs from that of its corresponding wild-type sequence. This calculation thus depends on the ability to define a wild-type counterpart for each neoantigen sequence. For non-synonymous SNVs, the wild-type counterpart sequence is assumed to be a peptide of the same length without the amino acid substitution. For many other variant types, defining a counterpart wild-type sequence is much less obvious because the variant may lead to a sequence that is entirely novel and shares little or no homology with the wild-type sequences encoded from the region of the variant. These include frameshift mutations caused by deletions or insertions, translocations that lead to in-frame or frame-shifted RNA fusions, alternative isoforms caused by aberrant RNA splicing that lead to partial or complete intron retention, novel exon junctions, and so on. In these cases, agretopicity values are typically not calculated and may be reported as not applicable. This should be taken into consideration when prioritizing variants of mixed type using these values. Interpretation of agretopicity is primarily relevant when the mutant amino acid(s) involve anchor residues of the MHC [206]
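
A sketch of the agretopicity calculation described above, treated as the wild-type/mutant IC50 fold change and returning no value when a wild-type counterpart cannot be defined:

```python
from typing import Optional

def agretopicity(mt_ic50: float, wt_ic50: Optional[float]) -> Optional[float]:
    """Return WT/MT binding-affinity fold change, or None if undefined."""
    if wt_ic50 is None:          # e.g., frameshift- or fusion-derived peptide
        return None
    return wt_ic50 / mt_ic50     # >1 means the mutant binds more strongly

print(agretopicity(mt_ic50=50.0, wt_ic50=5000.0))  # 100-fold improved binder
print(agretopicity(mt_ic50=50.0, wt_ic50=None))    # not applicable
```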

HLA naming conventions

Neoantigen characterization workflows should consistently adopt the widely used standards and definitions for the communication of histocompatibility typing information [207]. Briefly, HLA alleles are named using an HLA prefix followed by a hyphen, a gene designation, an asterisk separator, and four fields of digits delimited by colons (e.g., HLA-A*02:101:01:02N). The four fields (typically of two or three digits each) represent the allele group, specific HLA protein, synonymous changes in the coding region, and non-coding differences, respectively. Several popular HLA typing bioinformatics tools only report two-field HLA types. The first two fields are generally sufficient for pMHC binding affinity predictions because they describe any polymorphisms that influence the protein sequence of MHC. However, three-field typing might be desirable for patient-specific assessment of expression, because even silent variations in the DNA sequence of the HLA locus may influence read assignments to specific alleles
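
This naming convention might be parsed as in the following simplified sketch; the regular expression covers the cases described above but is not a complete implementation of the official nomenclature:

```python
import re

# Simplified pattern: optional fields 2-4 and a single optional suffix letter
HLA_RE = re.compile(
    r"HLA-(?P<gene>[A-Z0-9]+)\*"
    r"(?P<allele_group>\d+)"
    r"(?::(?P<protein>\d+))?"
    r"(?::(?P<synonymous>\d+))?"
    r"(?::(?P<noncoding>\d+))?"
    r"(?P<suffix>[NLSCAQ])?"
)

m = HLA_RE.fullmatch("HLA-A*02:101:01:02N")
print(m.group("gene"), m.group("allele_group"), m.group("protein"))
# Two-field form, as used for pMHC binding affinity prediction:
print(f"HLA-{m.group('gene')}*{m.group('allele_group')}:{m.group('protein')}")
```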

HLA typing (class I vs II typing)

Accurate HLA typing is critical to neoantigen characterization workflows. Without accurate knowledge of the HLA alleles of an individual, it is not possible to predict pMHC binding and presentation on tumor cells or cross presentation by APCs. Many clinical- or research-grade HLA typing assays are available, and they rely on PCR amplification or, more recently, NGS data. HLA typing results from a CAP/CLIA-regulated assay are expected to be robust and remain the gold standard. In addition to clinical HLA typing, there are now several bioinformatics tools and pipelines available for HLA typing from whole genome, exome, or RNA-seq data (Table 1). Several groups have now conducted comparisons between the results of these tools and clinical assay results and have reported high concordance, particularly for class I typing. Class II typing remains challenging, with fewer tools available and poorer consistency between the results of these tools and clinical assays. Use of clinical-typing results remains advisable for class II. As in other areas of neoantigen analysis, the use of a consensus approach involving multiple tools has become a common strategy to increase confidence in HLA typing results [208]
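
A minimal sketch of such a consensus approach, using majority voting over hypothetical calls from three tools (handling of homozygous alleles and per-locus structure is omitted for brevity):

```python
from collections import Counter

calls = {  # hypothetical two-field class I calls from three tools
    "optitype": ["A*02:01", "A*24:02", "B*07:02", "B*08:01"],
    "hla-hd":   ["A*02:01", "A*24:02", "B*07:02", "B*08:01"],
    "xhla":     ["A*02:01", "A*24:02", "B*07:02", "B*44:02"],
}
counts = Counter(a for alleles in calls.values() for a in alleles)
consensus = sorted(a for a, n in counts.items() if n >= 2)
discordant = sorted(a for a, n in counts.items() if n < 2)
print("consensus:", consensus)         # majority-supported alleles
print("review manually:", discordant)  # e.g., the B*08:01 vs B*44:02 disagreement
```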

HLA typing (selection of data type and samples)

Several options are available for input data when performing HLA typing from NGS data, including DNA (WES or WGS) or RNA-seq data. RNA-seq data often exhibit highly variable coverage across the HLA loci, potentially leading to variable typing accuracy for each locus. Coverage from exome data may vary depending on the exome reagent’s design (probes selected against HLA regions) and capture efficiency. Care should be taken to confirm sufficient read coverage for each HLA locus when assessing HLA-typing confidence. WGS data may exhibit comprehensive breadth of coverage, but generally at the expense of overall depth of coverage (again, the coverage achieved for the HLA loci specifically should be evaluated).

In addition to data type, there is also the choice of whether to perform HLA typing using data from the tumor itself or a reference normal sample. The normal sample has the advantage that it should represent the germline HLA alleles present in both the initiating cells of the tumor and the antigen presenting cells of the immune system (relevant for cross-presentation). In many clinical and research workflows, the quality of genomic DNA may be higher in the normal sample than in the tumor (often a FFPE-preserved sample). The genomic DNA of the tumor may also be complicated by aneuploidy that affects the HLA loci (which is important to observe and has the potential to interfere with HLA typing). HLA typing using the tumor DNA data has the advantage that it may more accurately reflect the MHC binding and presentation of neoantigens on the surface of the targeted tumor cells. However, it is important to note that HLA-typing tools are, for the most part, not designed for de novo HLA typing; instead, they seek to determine which of a list of known alleles best explain the sequence reads of a given data set. HLA-typing tools also generally do a poor job of reporting HLA-typing confidence. At present, identification of the loss of expression or a somatic mutation of an HLA allele in a tumor is perhaps best treated as a separate exercise from HLA typing. One strategy for choice of data for HLA typing is to use all of the datasets available (DNA and RNA, normal and tumor), to note any discrepancies, and to investigate them

HLA expression and mutation

Loss of expression of MHC molecules by HLA deletion (or downregulation) and somatic mutation of HLA loci have both been identified as possible resistance mechanisms for immunotherapies [76]. It is therefore desirable for neoantigen characterization workflows to incorporate examination of HLA expression and somatic mutation in the tumor. Unfortunately, very few tools and best practices exist for these examinations. Given the sequence diversity of the HLA loci across individuals, when estimating the expression of HLA transcripts in a tumor, it is desirable to customize the reference transcripts used (e.g., from the IPD-IMGT/HLA resource) for each individual’s HLA type by using the results of HLA genotyping to select the matching transcript sequences (three-field matched) for expression abundance estimation (for example, with Kallisto)
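
This customization step might look like the following sketch, which selects matching sequences from an IPD-IMGT/HLA FASTA file for downstream indexing; the header format assumed in the comment, the file names, and the example alleles are illustrative:

```python
patient_alleles = {"A*02:01:01", "A*24:02:01", "B*07:02:01"}  # three-field types

def write_matched_fasta(hla_fasta: str, out_fasta: str) -> None:
    """Copy FASTA records whose header mentions a typed patient allele."""
    keep = False
    with open(hla_fasta) as src, open(out_fasta, "w") as out:
        for line in src:
            if line.startswith(">"):
                # assumes headers like '>HLA:HLA00005 A*02:01:01:01 ...'
                keep = any(allele in line for allele in patient_alleles)
            if keep:
                out.write(line)

write_matched_fasta("hla_gen.fasta", "patient_hla.fasta")
# then index for abundance estimation, e.g. with Kallisto:
#   kallisto index -i patient_hla.idx patient_hla.fasta
```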

Class I versus class II allele specification for binding prediction algorithms

Class I HLA alleles are typically supplied to binding affinity prediction algorithms using a standard two-field format (e.g., HLA-A*02:01). However, class II alleles are often supplied as a pair using valid two-field pairing combinations (e.g., DQA1*01:01-DQB1*06:02) to reflect the functional dimers of class II MHC. Peptide MHC prediction tools will typically document the syntax and list the valid pairings for which binding-affinity predictions are supported

Proximal variation

Neoantigen selection pipelines often focus entirely on one variant or position at a time, and consider it to be independent of all nearby variations. It is important to examine candidates carefully to determine whether nearby variation exists that is both in phase (on the same allele) and close enough to influence the peptide sequence and therefore the MHC binding predictions [117]
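
A simple form of this check is sketched below: a nearby variant can alter the predicted peptides only if it is in phase with the somatic variant and falls within the genomic window covered by the longest peptide considered (coding-strand positions and an SNV-only case are assumed for simplicity):

```python
def is_proximal(somatic_pos: int, other_pos: int, max_peptide_len: int = 11,
                same_haplotype: bool = True) -> bool:
    """True if 'other' can change peptides containing the somatic variant."""
    # up to (max_peptide_len - 1) flanking codons on either side, in nucleotides
    window_nt = 3 * (max_peptide_len - 1)
    return same_haplotype and abs(other_pos - somatic_pos) <= window_nt

print(is_proximal(1000, 1012))                        # True: same 11-mer window
print(is_proximal(1000, 1012, same_haplotype=False))  # False: opposite allele
```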

Peptide-length considerations

Many human class I pMHC binding affinity prediction tools support a range of peptide lengths for each individual HLA allele (e.g., IEDB supports lengths of 8–14 amino acids for class I for HLA-A*01:01). Typically, although multiple lengths are supported, the peptides that are found to have strong binding will be highly biased towards the lengths actually favored by the allele (for example, many human HLA alleles strongly favor nonamers). The open binding groove of MHC class II is thought to support a greater range of peptide lengths. This is reflected in some class II binding prediction tools, although it should be noted that the IEDB API and web resource currently enforce a length of 15 amino acids only

Relationship between genomic variants and short peptides

There is a complex relationship between genomic variants and the short peptide neoantigen candidates that they might represent. Though rare, it is possible for multiple distinct somatic variations to result in the same amino acid change (for example, several single-nucleotide substitutions affecting the same triplet codon), and these might therefore lead to identical neoantigens. If these variations were to occur on opposite alleles, it might be important to analyze them separately because they could differ in expression level and/or their proximal variants, giving rise to distinct peptides. Other ways in which a single genomic variant can give rise to distinct short peptides for pMHC binding prediction include: (i) a homozygous somatic variant representing two distinct alleles; if these alleles are in phase with one or more nearby heterozygous proximal variants, distinct peptide sequences may result; (ii) SNVs expressed in different RNA transcripts or isoforms that differ in their reading frame at the position of the variant, in the inclusion or exclusion of nearby alternative exons, or in the nearby use of alternative RNA splicing donor or acceptor sites; and (iii) multiple short peptides that result simply from shifting the ‘register’ of the somatic variant in a short sequence or from the use of multiple peptide lengths (e.g., 8–11-mers) during the prediction of pMHC binding affinity.

To some extent, these largely overlapping peptide sequences do not matter in peptide vaccine design, because a single longer peptide will ultimately incorporate several of them. However, pMHC binding prediction algorithms require a short sequence of a specific length, with the variant in a particular register, and each of these choices leads to a different predicted binding affinity value. Making decisions about how to summarize, collapse, filter, and select representatives is one of the complexities addressed by pipelines such as pVACtools
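
The enumeration of registers and lengths described above might be sketched as follows; the sequence shown is an illustrative KRAS fragment carrying G12V, and real pipelines such as pVACtools handle transcript frames and proximal variants in addition to this windowing:

```python
def candidate_peptides(protein: str, mut_index: int, lengths=range(8, 12)):
    """Yield (peptide, length, register) tuples covering the mutant residue."""
    for k in lengths:
        for start in range(max(0, mut_index - k + 1),
                           min(mut_index, len(protein) - k) + 1):
            yield protein[start:start + k], k, mut_index - start

# Illustrative KRAS N-terminal fragment with G12V ('V' at 0-based index 11)
seq = "MTEYKLVVVGAVGVGKSALTIQLIQNHFVDE"
peptides = list(candidate_peptides(seq, mut_index=11))
print(len(peptides), peptides[0])  # 38 candidate 8-11-mers for one SNV
```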

Importance of transcript annotation quality and choice to select a single transcript variant annotation

Peptides that are considered as potential neoantigens are generally derived from the anticipated open reading frame of a known or predicted transcript sequence. A common consideration in variant effect annotation is whether to allow annotations for each variant against multiple transcripts or whether a single representative transcript should be selected. If choosing a single transcript for each gene, multiple strategies exist including the following: (i) use of a pre-selected automatically determined or manually curated choice of ‘canonical’ transcript for each gene; or (ii) considering all transcripts but selecting the single transcript that results in the most confident and/or consequential predicted functional impact. The latter is the basic intent of the ‘--pick’ option of the Ensembl Variant Effect Predictor (VEP), which chooses one block of annotations for each variant using an ordered set of criteria (refer to the VEP documentation for extensive details). The benefit of choosing a single transcript for the annotation of each variant is simplicity, and in many cases, it will result in the selection of a suitable peptide sequence for neoantigen analysis. However, the downside is that distinct peptides may not be considered and the peptide corresponding to the selected annotation is not guaranteed to be the best.
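
Strategy (ii) might be sketched as below; the consequence-severity ordering is abbreviated and illustrative, and the transcript identifiers are fabricated placeholders, whereas VEP’s ‘--pick’ applies a richer, documented set of ordered criteria:

```python
SEVERITY = ["frameshift_variant", "stop_gained", "missense_variant",
            "splice_region_variant", "synonymous_variant"]  # most to least severe

def pick_annotation(annotations):
    """annotations: list of (transcript_id, consequence); return most severe."""
    return min(annotations, key=lambda a: SEVERITY.index(a[1]))

print(pick_annotation([("ENST0000000001", "synonymous_variant"),
                       ("ENST0000000002", "missense_variant")]))
```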

Note that a single variant may be assigned annotations for multiple genes, multiple transcripts of the same gene, and multiple effects for the same transcript. For example, a single variant can be annotated as splicing-relevant (near the edge of an exon causing exon skipping) and also as missense (causing a single amino acid substitution). The same variant could be silent for a different transcript of the same gene and have a regulatory impact on a transcript of another gene. Making sensible automated choices about how to choose and report neoantigen candidates that correspond to these variants is a complexity that neoantigen characterization workflows seek to address

Importance of transcript annotation quality

When using VEP, it can be important to consider the Transcript Support Level assigned by Ensembl. As described above, this classification is one of many factors that are considered in choosing a single ‘best’ transcript for the annotation of variants. Occasionally, a variant annotation will be reported with a dramatic effect (e.g., nonsense) but on further inspection, it is found that this effect is only true for a transcript that is poorly supported by sequence evidence, and another more reliable transcript would lead to different candidate neoantigen sequences

Selection of pMHC binding affinity prediction cutoff(s)

Many pMHC binding prediction tools report binding strength as an IC50 value in nanomolar (nM) units. Peptides that have a binding affinity of less than 500 nM are commonly selected as putative strong binding peptides. However, the widespread use of this common binding strength metric may provide a false sense of consistency. Trusting a simple cutoff of 500 nM from a single algorithm should be avoided, but combining scores from multiple algorithms should also be pursued very cautiously. The range, median, and even shape of distribution of IC50 scores varies dramatically across algorithms, even when applied to exactly the same peptides [8]. Further complicating the selection process, the accuracy of the IC50 estimates varies across HLA alleles (reflecting the biased and variable strength of experimental evidence used to train generalized predictive models). Partially addressing this concern, the IEDB now provides recommended ‘per allele’ binding-score thresholds for the selection of strong binders
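
For illustration, a filter using per-allele thresholds rather than a universal 500 nM cutoff might look like the following; the threshold values here are placeholders, and the IEDB’s published per-allele recommendations should be used in practice:

```python
ALLELE_IC50_CUTOFF = {       # hypothetical per-allele thresholds (nM)
    "HLA-A*02:01": 255.0,
    "HLA-B*07:02": 687.0,
}
DEFAULT_CUTOFF = 500.0       # fallback for alleles without a specific threshold

def is_strong_binder(allele: str, ic50: float) -> bool:
    return ic50 <= ALLELE_IC50_CUTOFF.get(allele, DEFAULT_CUTOFF)

print(is_strong_binder("HLA-A*02:01", 300.0))  # False under the stricter cutoff
print(is_strong_binder("HLA-B*07:02", 600.0))  # True under the looser cutoff
```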

Interpretation of binding affinity from multiple binding prediction algorithms

Given the variability in IC50 predictions across binding prediction algorithms, some neoantigen workflows involve the use of multiple binding prediction tools and attempt to calculate or infer a consensus. Best practices for determining such a consensus are poorly articulated, and limited gold-standard independent validation data sets exist to evaluate the accuracy of divergent predictions. Unsophisticated but pragmatic approaches currently involve reporting the best score observed, calculating the median score, determining average rank values, or manually visualizing the range of predictions across algorithms for promising candidates, before making a qualitative assessment
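
The pragmatic strategies mentioned above reduce to simple summary statistics, as in this sketch over hypothetical predictions for a single peptide-allele pair:

```python
from statistics import median

predictions = {  # hypothetical IC50 predictions (nM) from four algorithms
    "netmhc": 120.0, "netmhcpan": 85.0, "pickpocket": 900.0, "smm": 210.0,
}
best = min(predictions.values())
med = median(predictions.values())
print(f"best = {best} nM, median = {med} nM")  # note the spread across algorithms
```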

Neoantigen candidate reporting, visualization, and final prioritization

Prior to the final review of candidates, the automated filtering of variants and peptides that do not meet basic criteria (VAFs, binding affinity, and so on) is performed to provide a more interpretable result. As discussed above, a single genomic variant can lead to many candidate peptide sequences (resulting from alternative reading frames, peptide lengths, registers, and so on). At the time of final candidate review and selection, a common strategy is to use a pipeline that will automatically choose a single representative (best) peptide for each variant in a filtered result. Similarly, a condensed report may be generated to present only the most important information about each candidate. Final assessment of a candidate neoantigen can easily involve the consideration of 20–50 specific data fields. Review of this data in spreadsheet form can be time-consuming and inefficient, and can make it difficult to consider some data in the context of a cohort of comparators (for example, expression values are often best interpreted relative to reference samples). Tools such as pVACviz are now emerging to facilitate more efficient visual interfaces for neoantigen candidate review

Vaccine manufacturing strategy

In the case of personalized cancer vaccine trials, the method of vaccine delivery can influence bioinformatics tool selection and other analysis considerations. For example, if candidates are to be encoded in a DNA vector, a tool such as pVACvector may be used to determine the optimal ordering of the peptide candidates. Owing to the combinatorial nature of candidate peptide sequence ordering, and the need to examine all pairs for junctional epitopes, this is currently one of the most computationally expensive and time-consuming steps of these workflows. Similarly, if peptides are to be synthesized for a peptide vaccine, there is a need to predict possible problems with synthesizing each peptide (for example, by calculating ‘manufacturability’ scores)
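
The junctional-epitope aspect of this problem might be sketched as follows: every junction in a candidate ordering creates novel k-mers that must themselves be screened, and exhaustive search over orderings is factorial in the number of candidates. The peptide sequences and the ‘predicted binder’ stub below are hypothetical; real workflows substitute pMHC binding predictions:

```python
from itertools import permutations

def junction_kmers(ordering, k=9):
    """Yield every k-mer spanning a junction of the concatenated ordering."""
    for left, right in zip(ordering, ordering[1:]):
        junction = left[-(k - 1):] + right[:k - 1]
        for i in range(len(junction) - k + 1):
            yield junction[i:i + k]

peptides = ["SIINFEKLMA", "GILGFVFTLT", "KVAELVHFLL"]  # illustrative sequences

def problem_count(ordering, predicted_binders):
    return sum(kmer in predicted_binders for kmer in junction_kmers(ordering))

# Hypothetical stub standing in for pMHC predictions over junction k-mers
predicted_binders = {"FTLTKVAEL"}
best = min(permutations(peptides),
           key=lambda o: problem_count(o, predicted_binders))
print(best)  # an ordering whose junctions create no predicted binders
```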

A detailed summary of analysis and interpretation best practices and nuances that should be considered when implementing a neoantigen identification workflow. Topics are covered in an order that corresponds to the flow of major steps discussed in the main body and depicted in Fig. 1. For further nuanced details on how to put this guidance into practice, please refer to our tutorial on precision medicine bioinformatics (https://pmbio.org/). Abbreviations: CAP College of American Pathologists, CLIA Clinical Laboratory Improvement Amendments, FPKM fragments per kilobase of exon model per million reads mapped, TPM transcripts per million