

Fig. 1 | Genome Medicine


From: A bioinformatic framework for immune repertoire diversity profiling enables detection of immunological status


Rendering HTS repertoire data suitable for machine learning-based immunodiagnostics. a The clonal distribution and diversity of lymphocyte repertoires may represent a fingerprint of an individual’s current immunological status (e.g., healthy, vaccinated, diseased/infected). b Lymphocyte repertoire 1 is a uniform repertoire (e.g., resembling that of a healthy individual), whereas lymphocyte repertoire 2 shows a large clonal expansion (a few clones dominate the repertoire, e.g., as a result of disease/infection or vaccination). Each color denotes one lymphocyte clone (usually defined by its CDR3). c The immediate output of HTS is a set of immune repertoire clonal frequency distributions, composed of the frequency of each clone (where frequency is the proportion of sequencing reads bearing the same clonal identifier [e.g., CDR3 amino acid sequence]). These distributions differ in clonal composition even among inbred mice [9, 15] (Additional file 4); this makes the direct application of machine learning approaches (f) highly problematic, as they require inputs of identical composition. d Diversity (αD, derived from the Rényi entropy) alleviates the problem of incomparable datasets by projecting clonal frequency distributions onto the same (reduced) alpha space. Shannon diversity (alpha = 1) and Simpson’s index (alpha = 2) are widely used for diversity comparisons but, depending on the dataset structure, can yield qualitatively inconsistent Diversity values (Additional file 2). e For each alpha, the Diversity value αD corresponds to the size of an equivalent repertoire in which all clones are equally abundant. These equivalent repertoires represent different portions of the original repertoires, with only the top clones remaining as alpha tends towards infinity. f Diversity profiles (vectors of αD values across a range of alpha values) have an identical (alpha-)composition and are therefore suitable for cross-repertoire comparisons by machine learning approaches, allowing for their potential application in next-generation immunodiagnostics.
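For reference, the standard relation between the Rényi entropy of order α and the Diversity αD (the Hill number) is given below. We assume this is the form meant in panel d; f_i denotes the frequency of clone i among the N clones of a repertoire.

```latex
{}^{\alpha}H = \frac{1}{1-\alpha}\ln\sum_{i=1}^{N} f_i^{\alpha},
\qquad
{}^{\alpha}D = \exp\!\left({}^{\alpha}H\right) = \left(\sum_{i=1}^{N} f_i^{\alpha}\right)^{\frac{1}{1-\alpha}}

{}^{0}D = N,
\qquad
{}^{1}D = \lim_{\alpha \to 1} {}^{\alpha}D = \exp\!\left(-\sum_{i=1}^{N} f_i \ln f_i\right),
\qquad
{}^{2}D = \frac{1}{\sum_{i=1}^{N} f_i^{2}},
\qquad
{}^{\infty}D = \frac{1}{\max_i f_i}
```

For a uniform repertoire (f_i = 1/N), every αD equals N, which is why αD can be read as the number of clones in an equivalent, perfectly even repertoire. As α grows, the sum is increasingly dominated by the largest f_i, so only the top clones contribute. Note that ²D is the inverse of the Simpson index.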

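The pipeline from reads to a Diversity profile can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it assumes αD = exp(Rényi entropy) (the Hill number), and the function names, the alpha grid, and the toy CDR3 strings are all hypothetical.

```python
import math
from collections import Counter
from typing import Iterable, List, Sequence


def clonal_frequencies(clone_ids: Iterable[str]) -> List[float]:
    """Return each clone's frequency: the fraction of reads sharing its clonal identifier."""
    counts = Counter(clone_ids)  # reads per clonal identifier (e.g. CDR3 amino acid sequence)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("repertoire contains no reads")
    return [n / total for n in counts.values()]


def hill_diversity(freqs: Sequence[float], alpha: float) -> float:
    """Diversity of order alpha, assumed here to be exp(Renyi entropy), i.e. the Hill number."""
    if alpha < 0:
        raise ValueError("alpha must be non-negative")
    p = [f for f in freqs if f > 0]
    if alpha == 0:
        return float(len(p))  # richness: number of distinct clones
    if alpha == 1:
        # Limit alpha -> 1: exponential of the Shannon entropy
        return math.exp(-sum(f * math.log(f) for f in p))
    if math.isinf(alpha):
        # Limit alpha -> infinity: only the most abundant clone matters
        return 1.0 / max(p)
    return sum(f ** alpha for f in p) ** (1.0 / (1.0 - alpha))


def diversity_profile(freqs: Sequence[float], alphas: Sequence[float]) -> List[float]:
    """Vector of Diversity values over a fixed alpha grid (same length for every repertoire)."""
    return [hill_diversity(freqs, a) for a in alphas]


if __name__ == "__main__":
    # Hypothetical alpha grid and toy repertoires (CDR3 strings are placeholders)
    alphas = [0, 0.5, 1, 2, 4, 8, math.inf]
    uniform = clonal_frequencies(["CASSA", "CASSB", "CASSC", "CASSD"])
    expanded = clonal_frequencies(["CASSA"] * 7 + ["CASSB", "CASSC", "CASSD"])
    print([round(d, 3) for d in diversity_profile(uniform, alphas)])
    print([round(d, 3) for d in diversity_profile(expanded, alphas)])
```

Both toy repertoires contain four clones, so they share ⁰D = 4. The uniform profile stays flat at 4 for every alpha, whereas the expanded repertoire's profile falls steadily towards 1/0.7 as alpha grows. Because the alpha grid is fixed, every repertoire maps to a vector of the same length regardless of how many clones it contains, which is the property that makes the profiles usable as machine learning features.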