Total matches and average number of other genes patented plotted against k-mer size. For each fragment size (k-mer), we searched for matches between every gene and all other genes. We fragmented each Consensus Coding Sequence (CCDS) gene (n = 18,382) into k-mers of varying length (x-axis). (a) For each k-mer size, we calculated the percentage of genes with at least one k-mer matching another CCDS gene (y-axis). (b) For each gene, we counted the number of other genes matched by any of its k-mers and plotted the distribution of these cross-gene matches (y-axis) across all CCDS genes for each k-mer size (x-axis). Boxplots show the median (middle line), the 75th and 25th percentiles (top and bottom of the red boxes, respectively) and outlying data points (dots).
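The k-mer comparison described in this caption can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names (`kmers`, `cross_gene_matches`, `percent_matching`) and the toy sequences are hypothetical, and it assumes exact, forward-strand string matching only, with no reverse complements or mismatches.

```python
from collections import defaultdict


def kmers(seq, k):
    """Return the set of all length-k substrings of seq (empty if len(seq) < k)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def cross_gene_matches(genes, k):
    """Map each gene name to the set of OTHER genes sharing at least one k-mer.

    genes: dict of gene name -> coding sequence (hypothetical input format).
    Assumption: exact matches on the given strand only.
    """
    # Index every k-mer by the genes that contain it.
    index = defaultdict(set)
    for name, seq in genes.items():
        for kmer in kmers(seq, k):
            index[kmer].add(name)

    # A k-mer seen in more than one gene links all of those genes together.
    matches = {name: set() for name in genes}
    for names in index.values():
        if len(names) > 1:
            for name in names:
                matches[name].update(names - {name})
    return matches


def percent_matching(matches):
    """Percentage of genes with at least one cross-gene match (panel a)."""
    if not matches:
        return 0.0
    hit = sum(1 for others in matches.values() if others)
    return 100.0 * hit / len(matches)


# Toy sequences for illustration only; these are not CCDS genes.
toy_genes = {
    "geneA": "ACGTACGTAA",
    "geneB": "TTACGTACCC",
    "geneC": "GGGGGGGGGG",
}

for k in (5, 8):
    m = cross_gene_matches(toy_genes, k)
    per_gene_counts = {name: len(others) for name, others in m.items()}
    print(k, round(percent_matching(m), 1), per_gene_counts)
```

In this sketch, `percent_matching` corresponds to the quantity in panel (a), and the per-gene counts are the values whose distribution would be drawn as a boxplot in panel (b). Indexing k-mers once avoids comparing every pair of genes directly.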