Heat Map Variables Sequences Do Not Sum to Same Amount

Nicholas_Bokulich · July 14, 2021, 6:31am

Hi @Chantel ,
The heatmaps output by q2-sample-classifier display only the top N most important features, not all features, so:

Because the heatmap picks out only the most important features, the displayed counts will not sum to the same amount in every sample. Even if the total sequence counts for all samples are the same going in, the counts for individual features will likely differ, so a subset of features will not have the same sum across samples.

This is in part for the same reason as above: after rarefying you have the same number of sequences in each sample, but not in each feature. So the subset of important features will still not sum to the same amount across all samples.
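To make this concrete, here is a small sketch with made-up numbers (the table, the depth of 100, and the choice of "important" features are all assumptions for illustration, not output from q2-sample-classifier). Every sample sums to the same depth, as it would after rarefying, but the totals over a subset of features differ:

```python
import numpy as np

# Hypothetical rarefied feature table: rows are samples, columns are features.
# Each sample sums to the same depth (100), as it would after rarefying.
table = np.array([
    [50, 30, 15, 5],
    [10, 20, 30, 40],
    [25, 25, 25, 25],
])
sample_totals = table.sum(axis=1)

# Assume features 0 and 1 are the "top N" important features (illustrative only).
top_features = [0, 1]
subset_totals = table[:, top_features].sum(axis=1)

print(sample_totals)  # [100 100 100]
print(subset_totals)  # [80 30 50]
```

The full table has equal sums per sample, but the subset does not, which is what you are seeing in the heatmap.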

Random Forests is also rather robust to differences in sequence counts, so unless you have very skewed sequencing depths across samples I would not expect rarefying to impact the results.

This post gives a nice explanation, and also a link for where to learn more about the algorithms used:

Good luck!