Fig. 5 | Genome Medicine

From: Sequence dependencies and mutation rates of localized mutational processes in cancer

Hotspots capture highly mutated 11-mer sets. a Reference base distribution scaled by the mutational profile of signature 17b. The frequency logo (left) shows the percentage of each base at a given position. The information logo (right) shows the Kullback–Leibler divergence (bits) of each base relative to the base distribution in the reference genome (chromosomes 1–22; A = 29.5%, C = 20.5%, G = 20.5%, T = 29.5%). This signature-scaled base distribution is used as the background input to the probability logo software. b Interpretation of positional dependencies as visualized by kpLogo. The bases of a given k-mer (k ≤ 4) are stacked vertically at the position where the k-mer starts, with the top base (A1) at the start (position -5) and the bottom base (A4) at the end (position -2). The vertical k-mer (A1A2A3A4) should be read as the most significant sequence of bases starting at that position (-5). Only the most significant k-mer is shown at each position. Because the logo software (pLogo and kpLogo) maxes out at p-value = 10⁻³⁰⁰ (equivalent to z-scores above 38.5), significance is reported as z-scores. c Example of motif visualization for signature 17b using four types of logo plots. The frequency logo and the information logo are produced as in panel a. pLogo and kpLogo quantify the surprise of observing a letter under a binomial distribution; kpLogo shows only the most surprising k-mer (k ≤ 4) at each position. pLogo and kpLogo use as background the expected base distribution under a given signature; for signature 17b, this background is equivalent to the base distribution in panel a. d Signature 17b-assigned 11-mers at all recurrence levels (1+; top horizontal panels), 11-mers with a hotspot in at least one of their instances (2+; middle horizontal panels), and 11-mers with a highly recurrent hotspot in at least one of their instances (5+; bottom horizontal panels).
Information logo plots use the base distribution of the reference genome as background (left logo plot). Distribution of genomic span (y-axis) across mutation rates (x-axis; middle histogram). Cancer type distribution within the cohort (right stacked bar plot), colored as in Fig. 1a. e UV-signature 7a-assigned 11-mers with a highly recurrent hotspot in at least one of their instances (5+). Plots are interpreted as in panel d. f POLE-signature 62-assigned 11-mers with a highly recurrent hotspot in at least one of their instances (5+). Plots are interpreted as in panel d. g Signature 72-assigned 11-mers with a highly recurrent hotspot in at least one of their instances (5+). Plots are interpreted as in panel d. h Signature 19-assigned 11-mers with a highly recurrent hotspot in at least one of their instances (5+). Plots are interpreted as in panel d.
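
The information logo described in panel a can be sketched as follows. This is a minimal illustration assuming the common convention that a column's total height is the Kullback–Leibler divergence D(p ‖ q) = Σ p_b log2(p_b / q_b) in bits, and that each base's share of that height is proportional to its frequency p_b. The background q is the reference-genome distribution given in panel a; the function names and the example column are hypothetical, and the logo software used for the figure may scale letters differently.

```python
import math

# Reference-genome base distribution (chromosomes 1-22) from panel a.
BACKGROUND = {"A": 0.295, "C": 0.205, "G": 0.205, "T": 0.295}


def column_kl_bits(freqs, background=BACKGROUND):
    """Kullback-Leibler divergence D(p || q), in bits, for one logo column.

    freqs maps each base to its observed frequency at this position (summing to 1).
    Bases with zero frequency contribute nothing (0 * log 0 is taken as 0).
    """
    return sum(p * math.log2(p / background[b]) for b, p in freqs.items() if p > 0)


def letter_heights(freqs, background=BACKGROUND):
    """Hypothetical letter heights: base frequency times the column's KL divergence."""
    total = column_kl_bits(freqs, background)
    return {b: p * total for b, p in freqs.items()}


# Illustrative column where C dominates (not data from the figure).
column = {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1}
print(column_kl_bits(column))
print(letter_heights(column))
```

A column that matches the background exactly has zero height, and a column containing only C reaches log2(1 / 0.205) bits, the maximum possible for C under this background.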

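Panels b and c report significance as z-scores because pLogo and kpLogo stop at p = 10⁻³⁰⁰. The sketch below illustrates why a z-score avoids that floor. Under stated assumptions, it computes an exact one-sided binomial tail P(X ≥ k) for a base observed k times among n sequences with background probability q, summing in log space, alongside a simple normal-approximation z-score. It is not necessarily the exact score computed by pLogo or kpLogo; the function names and the counts are hypothetical.

```python
import math


def binomial_log10_pvalue(k, n, q):
    """log10 of P(X >= k) for X ~ Binomial(n, q), with 0 <= k <= n and 0 < q < 1.

    Terms are summed in log space (log-sum-exp), so the result stays finite
    even when the p-value itself is far below 1e-300.
    """
    log_terms = [
        math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
        + i * math.log(q) + (n - i) * math.log1p(-q)
        for i in range(k, n + 1)
    ]
    top = max(log_terms)
    log_p = top + math.log(sum(math.exp(t - top) for t in log_terms))
    return log_p / math.log(10)


def binomial_z_score(k, n, q):
    """Normal-approximation z-score for k occurrences among n aligned sequences."""
    return (k - n * q) / math.sqrt(n * q * (1 - q))


# Illustrative counts (not from the figure): C observed at one position in
# 2000 of 4000 sequences, against a background C frequency of 0.205.
k, n, q = 2000, 4000, 0.205
print(binomial_log10_pvalue(k, n, q))  # below -300, i.e. past a 1e-300 floor
print(binomial_z_score(k, n, q))  # about 46.2
```

The log-space p-value keeps growing in magnitude with stronger enrichment, and so does the z-score, which is why a z-score scale can rank positions that would all be reported at the same capped p-value.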