Fig. 1 | Genome Medicine


From: A robust deep learning workflow to predict CD8+ T-cell epitopes

Fig. 1

Schematic of the TRAP workflow and cross-species variation in T-cell recognition features. A Schematic diagram of TRAP (T-cell recognition potential of HLA-I presented peptides), a robust deep learning-based workflow to predict CD8+ T-cell epitopes from MHC-I presented pathogenic or self-peptides. Once NetMHCpan has predicted that peptides bind HLA alleles, TRAP takes the peptide sequence and the NetMHCpan rank scores as inputs to predict the immunogenicity of the peptide given the respective HLA binding affinity. The TRAP workflow outputs a TRAP prediction score together with the confidence of that prediction. If a prediction has low confidence, we recommend using TESLA [52] to predict cancer neoepitopes, as it relies on more general features such as agretopicity and dissimilarity to the self-proteome, and RSAT (relative similarity to autoantigens or tumour-associated antigens) to predict pathogenic peptides. B, C Distribution of MHC binding rank scores predicted by NetMHCpan (B) and of hydrophobicity (C) for peptides derived from different pathogenic species. CMV: cytomegalovirus; EBV: Epstein-Barr virus; HCV: hepatitis C virus; HBV: hepatitis B virus; SARS-2: SARS-CoV-2; VACV: vaccinia virus; YFV: yellow fever virus. D Statistics of peptides in the cross-species datasets (i.e. non-vaccinia virus (non-VACV) peptides for training and VACV peptides for testing; non-SARS-CoV-2 (non-SARS2) peptides for training and SARS-2 peptides for testing) and in data randomly divided into 90% training and 10% test sets, resembling 10-fold cross-validation. E, F Models trained on cross-species datasets could not effectively predict the immunogenicity of peptides derived from unseen pathogens. ROC curves with AUC values of XGBoost classifiers under 10-fold cross-validation on the training data, namely the 90% random split, non-SARS-2 and non-VACV peptides (E), and on the test datasets, namely the 10% random split, SARS-2 and VACV peptides (F).
G Sequence logos of amino acids enriched in epitopes (Positive) compared with non-epitopes (Negative) at contact positions for the randomly split data (i), SARS-2 data (ii) and vaccinia virus data (iii). H, I High performance may reflect HLA bias. ROC curves of the DeepImmuno algorithm on single HLA alleles, for peptides bound to HLA-A*02:01 (H) or HLA-A*24:02 (I). J Performance of the DeepImmuno algorithm on a per-HLA down-sampled dataset, i.e. the number of peptides has been down
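Panel A describes TRAP's inputs (peptide sequence plus NetMHCpan rank) and the recommended fallback for low-confidence predictions. The sketch below illustrates only that input/routing logic under stated assumptions: the one-hot encoding, the maximum peptide length of 11, the confidence threshold of 0.5 and the function names `encode_input` and `recommend_tool` are all hypothetical and are not TRAP's documented implementation. The deep learning model and its confidence estimate are not reproduced here.

```python
from typing import List

# Standard 20 amino acids; the ordering is an arbitrary choice for this sketch.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def encode_input(peptide: str, netmhcpan_rank: float, max_len: int = 11) -> List[float]:
    """Hypothetical encoding: one-hot peptide (zero-padded to max_len) plus NetMHCpan rank.

    One-hot encoding and max_len=11 are assumptions, not TRAP's documented scheme.
    Non-standard residues raise KeyError.
    """
    if len(peptide) > max_len:
        raise ValueError(f"peptide longer than max_len={max_len}")
    features = [0.0] * (max_len * len(AMINO_ACIDS))
    for pos, aa in enumerate(peptide.upper()):
        # One block of 20 slots per position; set the slot for this residue.
        features[pos * len(AMINO_ACIDS) + AA_INDEX[aa]] = 1.0
    # The NetMHCpan rank score is appended as a single scalar feature.
    features.append(netmhcpan_rank)
    return features


def recommend_tool(confidence: float, peptide_source: str, threshold: float = 0.5) -> str:
    """Route low-confidence predictions to the tools suggested in the legend.

    `threshold` is a hypothetical cutoff; the legend does not state one.
    """
    if confidence >= threshold:
        return "TRAP"
    if peptide_source == "cancer":
        return "TESLA"  # neoepitopes: agretopicity, dissimilarity to self-proteome
    if peptide_source == "pathogen":
        return "RSAT"   # relative similarity to autoantigens / tumour-associated antigens
    raise ValueError(f"unknown peptide source: {peptide_source!r}")
```

For a 9-mer, `encode_input` returns 11 × 20 + 1 = 221 features, with the padded positions left as zeros; a fixed-length vector keeps the sketch model-agnostic.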

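Panels D to F contrast a cross-species split (train on all species except one, test on the held-out species) with a random 90%/10% split, evaluated by ROC AUC. The following is a minimal sketch of those two splitting schemes and of ROC AUC computed as the probability that a randomly chosen positive outscores a randomly chosen negative (ties counted as 0.5). The record layout (a dict with a hypothetical `"species"` key) and the function names are assumptions for illustration; the XGBoost training step is omitted.

```python
import random
from typing import Dict, List, Sequence, Tuple

Record = Dict[str, object]  # hypothetical layout, e.g. {"species": "VACV", ...}


def cross_species_split(records: List[Record], held_out: str) -> Tuple[List[Record], List[Record]]:
    """Train on every species except `held_out`; test only on `held_out` (e.g. non-VACV vs VACV)."""
    train = [r for r in records if r["species"] != held_out]
    test = [r for r in records if r["species"] == held_out]
    return train, test


def random_split(records: List[Record], test_frac: float = 0.1,
                 seed: int = 0) -> Tuple[List[Record], List[Record]]:
    """Shuffle a copy of the records and hold out `test_frac` of them (the 90%/10% split)."""
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_frac)
    return shuffled[n_test:], shuffled[:n_test]


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC via pairwise comparison of positive and negative scores (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

The pairwise loop is quadratic but makes the definition explicit; a rank-based (Mann-Whitney) formula gives the same value more efficiently on large datasets.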