I have an experiment with multiple sample types, some of which have wildly differing numbers of sOTU features (adult mouse faeces and colonic mucosa vs. placenta and pup meconium).
These differences were expected due to differences in microbial load.
We have included negative extraction and sequencing controls and processed the feature table with decontam so I am relatively sure that any features remaining are a true biological signal.
My question is whether it is valid to rarefy at different depths (for alpha and beta diversity calculation) for high vs. low microbial biomass samples?
I agree with the principle of rarefaction to avoid biasing diversity by sequencing depth, but feel that imposing a very low depth on more diverse samples will undersample them and lose biological information.
Surely it would be more appropriate to rarefy to different depths for each sample type to avoid technical variation within sample types while retaining biological variation between sample types?
My plan is to rarefy to each depth in QIIME2, then calculate diversity metrics and ordinate with vegan or phyloseq.
If you have not already, I recommend reading this article (at least figure 1!):
Do not rarefy at different depths, as this will introduce significant technical variation between sample types. It would be better not to rarefy at all than to rarefy at different depths. If read counts differ substantially between samples, you could instead consider using diversity metrics that are insensitive to sampling depth, e.g.:
This is a very valid concern. Rarefaction is only needed for classical diversity analyses, however, and not for other steps, e.g., differential abundance testing or qualitative comparisons of taxonomic composition. So the biological information is only lost when estimating alpha and beta diversity. You can use alpha and beta rarefaction methods (i.e., with repeated subsampling, see the actions in q2-diversity) to determine whether the sampling depth is sufficient for making a fair comparison between high- and low-biomass samples, or whether the lower rarefaction depth (needed to include the low-biomass samples in the comparison) leads to the loss of too much information from the high-biomass samples.
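To make the repeated-subsampling idea concrete, here is a minimal NumPy sketch of an alpha rarefaction curve. It is not the q2-diversity implementation; the function name rarefaction_curve and the count vectors are made up for illustration, and observed features is used as the only metric.

```python
import numpy as np

def rarefaction_curve(counts, depths, iterations=10, seed=0):
    """Mean number of observed features at each depth, averaged over
    repeated random subsamples drawn without replacement."""
    counts = np.asarray(counts, dtype=np.int64)
    rng = np.random.default_rng(seed)
    curve = {}
    for depth in depths:
        if depth > counts.sum():
            # The sample has fewer reads than this depth, so it drops out.
            curve[depth] = float("nan")
            continue
        # Each row is one subsample of `depth` reads from the sample.
        draws = rng.multivariate_hypergeometric(counts, depth, size=iterations)
        curve[depth] = float((draws > 0).sum(axis=1).mean())
    return curve

# Hypothetical per-feature read counts (not real data).
high_biomass = [500, 300, 120, 40, 20, 10, 5, 3, 1, 1]  # 1000 reads, 10 features
low_biomass = [60, 30, 10]                               # 100 reads, 3 features

depths = [1, 10, 100, 1000]
high_curve = rarefaction_curve(high_biomass, depths)
low_curve = rarefaction_curve(low_biomass, depths)
print(high_curve)
print(low_curve)
```

If the curve for the high-biomass sample is still climbing steeply at the depth the low-biomass samples can support, that depth is discarding a lot of its richness; if it has already flattened, little is lost by using the shared depth.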
The other option, of course, is to rarefy at different depths, provided you do not compare the high- vs. low-biomass samples and only make comparisons within these groups (i.e., only compare samples rarefied at the same depth).
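As a rough sketch of what that within-group approach could look like, the snippet below rarefies each sample to a depth chosen for its group and keeps the groups in separate tables, so that nothing rarefied at one depth is ever compared with something rarefied at another. The sample names, group labels ("high", "low"), depths, and counts are all hypothetical, and rarefy_within_groups is an illustrative helper, not a QIIME 2 function.

```python
import numpy as np

def rarefy_within_groups(table, groups, depths, seed=0):
    """Rarefy each sample to the depth assigned to its group.

    Returns {group: {sample: rarefied counts}}. Samples with fewer reads
    than their group's depth are dropped.
    """
    rng = np.random.default_rng(seed)
    rarefied = {}
    for sample, counts in table.items():
        group = groups[sample]
        depth = depths[group]
        counts = np.asarray(counts, dtype=np.int64)
        if counts.sum() < depth:
            continue  # too shallow for this group's depth
        # Subsample `depth` reads without replacement.
        subsample = rng.multivariate_hypergeometric(counts, depth)
        rarefied.setdefault(group, {})[sample] = subsample
    return rarefied

# Hypothetical feature counts per sample (four features each).
table = {
    "faeces_1": [500, 300, 150, 50],
    "faeces_2": [1400, 400, 160, 40],
    "meconium_1": [30, 10, 0, 0],
    "meconium_2": [15, 20, 5, 10],
    "meconium_3": [3, 2, 0, 0],
}
groups = {
    "faeces_1": "high", "faeces_2": "high",
    "meconium_1": "low", "meconium_2": "low", "meconium_3": "low",
}
depths = {"high": 1000, "low": 40}

rarefied = rarefy_within_groups(table, groups, depths)
for group, samples in rarefied.items():
    print(group, {name: counts.tolist() for name, counts in samples.items()})
```

Diversity metrics would then be computed separately on each group's table, and values from the two tables should not be placed in the same test or ordination.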