Fig. 4 | Genome Medicine

From: An expansion of rare lineage intestinal microbes characterizes rheumatoid arthritis

Prediction model of the gut microbiota for RA status based on genus-level relative abundances using random forests. a Classification error of the random forests model compared with guessing, a baseline that always predicts the majority class label in the training data set. The boxplots summarize results from 200 bootstrap samples; random forests achieved a significantly lower classification error. b Predictive power of individual genera as assessed by the Boruta feature selection algorithm. Blue boxplots show the minimal, average, and maximum importance Z scores of shadow genera, which are shuffled versions of the real genera added to the random forests classifier as a benchmark for detecting truly predictive genera. Red, yellow, and cyan indicate genera that Boruta rejected, marked as tentative, and confirmed, respectively. Three genera, Eggerthella, Faecalibacterium, and Collinsella, were confirmed by Boruta selection; Collinsella was not identified by univariate tests. c Many RA samples show a large increase in the abundance of Collinsella. Solid and dashed lines indicate mean and median values, respectively. d Heat map based on the abundance ranks of the three Boruta-confirmed genera. Red and blue indicate high and low abundance, respectively. Hierarchical clustering (Euclidean distance, complete linkage) shows that RA samples tend to cluster together.
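
The clustering in panel d (rank-transformed abundances, Euclidean distance, complete linkage) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' pipeline. It assumes ranks are computed within each genus across samples, with tied values given their average rank. It cuts the tree into two clusters purely for illustration. It also uses invented toy abundances for six hypothetical samples rather than study data. The helper names average_ranks, rank_transform, euclidean, and complete_linkage are hypothetical.

```python
import math
from itertools import combinations


def average_ranks(values):
    """Rank values 1..n in ascending order; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # sorted positions i..j correspond to ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def rank_transform(abundance):
    """Replace each genus column of a samples x genera matrix with within-genus ranks."""
    columns = [average_ranks(list(col)) for col in zip(*abundance)]
    return [list(row) for row in zip(*columns)]


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def complete_linkage(points, n_clusters):
    """Agglomerative clustering: repeatedly merge the two clusters whose
    farthest-apart members are closest, until n_clusters remain."""
    clusters = [[i] for i in range(len(points))]
    dist = {(i, j): euclidean(points[i], points[j])
            for i, j in combinations(range(len(points)), 2)}

    def d(i, j):
        return dist[(i, j)] if i < j else dist[(j, i)]

    def cluster_distance(a, b):
        # Complete linkage: the maximum pairwise distance between members.
        return max(d(i, j) for i in a for j in b)

    while len(clusters) > n_clusters:
        a, b = min(combinations(range(len(clusters)), 2),
                   key=lambda p: cluster_distance(clusters[p[0]], clusters[p[1]]))
        merged = sorted(clusters[a] + clusters[b])
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
    return sorted(clusters)


# Hypothetical toy relative abundances (not study data).
# Columns: Eggerthella, Faecalibacterium, Collinsella; rows: six samples.
abundance = [
    [0.05, 7.2, 0.5],
    [0.08, 6.5, 0.3],
    [0.11, 8.1, 0.2],
    [0.60, 0.4, 4.6],
    [0.75, 0.9, 2.8],
    [0.90, 0.6, 3.9],
]
ranked = rank_transform(abundance)
print(complete_linkage(ranked, n_clusters=2))  # [[0, 1, 2], [3, 4, 5]]
```

In practice, scipy.stats.rankdata (average ranks by default) and scipy.cluster.hierarchy.linkage with method="complete" and metric="euclidean" provide the same building blocks. The pure-Python version only makes the complete-linkage merge rule explicit.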
