I'm wondering what is the correct answer for this "Question" in the moving pictures tutorial:
When grouping samples by “body-site” and viewing the alpha rarefaction plot for the “observed_features” metric, the line for the “right palm” samples appears to level out at about 40, but then jumps to about 140. What do you think is happening here? (Hint: be sure to look at both the top and bottom plots.) QIIME 2 View
(When I look at the graph, I think there is an error in the numbers 40 and 135, but the question still stands.)
The number of samples starts at 9 and drops to 3 at a depth of about 900; at the same time, the observed features rise from 45 to 135 (which means the 6 other samples pull down the observed-features median for as long as they are included).
So I wonder which depth threshold we should use for the body-site analysis: 9 samples, to increase the power of the statistical analysis, or only 3 samples but with much greater depth. This choice can completely change the results.
As the 3 samples have more depth, maybe they are of better sequencing quality (DNA extraction...) than the 6 others, so leaving those 6 out of the analysis would be a good choice. Or maybe the 3 samples simply differ in observed features for some other reason.
Can you help me with the interpretation, please?
More broadly, why does the alpha rarefaction plotting not come before the alpha and beta diversity analyses in this tutorial? I think the alpha rarefaction plot can help to choose the depth threshold, no?
Thank you for creating QIIME 2 and for the support you provide.
I have already received multiple answers to my other questions (and quickly); this is really helpful. Thanks!
Thanks again for your patience! I am happy to discuss all of your questions below:
I first want to call out your note at the end here - that 40 and 135 are inaccurate. The process of rarefaction is inherently random. So if you ran the commands in this tutorial and were seeing slightly different results in the alpha rarefaction plot, that is actually to be expected. Here are a couple of examples of the exact same dataset being rarefied, with slightly different results:
I would highly recommend watching this brief tutorial on rarefaction to help clarify what is happening behind the scenes here, and why we say about 40 and 140 (instead of precisely 40 and 140).
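To make the randomness concrete, here is a minimal sketch in plain Python (not QIIME 2's own implementation). It assumes rarefaction means drawing a fixed number of reads without replacement from one sample; the `sample` counts and the helper names `rarefy` and `observed_features` are made up for illustration and are not taken from the tutorial data.

```python
import random

def rarefy(counts, depth, seed):
    """Draw `depth` reads without replacement from a list of per-feature counts."""
    # Expand the counts into one feature index per read.
    reads = [feature for feature, n in enumerate(counts) for _ in range(n)]
    picked = random.Random(seed).sample(reads, depth)
    rarefied = [0] * len(counts)
    for feature in picked:
        rarefied[feature] += 1
    return rarefied

def observed_features(counts):
    """Number of features with at least one read."""
    return sum(1 for n in counts if n > 0)

# Hypothetical sample (not from the tutorial): 4 abundant features,
# 20 moderate ones and 100 singletons, for 2,500 reads in total.
sample = [500] * 4 + [20] * 20 + [1] * 100

# Same sample, same depth, different random draws.
for seed in range(3):
    rarefied = rarefy(sample, depth=1000, seed=seed)
    print(f"seed={seed}: {observed_features(rarefied)} observed features")
```

Each seed keeps a different subset of the rare features, so the observed-features count will usually differ a little from run to run, which is why the plot values are only approximate.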
Regarding the actual question in our tutorial - we are seeing a change in the number of observed features (as well as the number of samples, if you're looking at both the top and bottom plots in this example) with respect to the sampling depth.
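As a rough illustration of how the two plots interact, the sketch below uses an invented group of 9 samples (6 shallow, 3 deep) and reports, for each depth, how many samples are still present and the median number of observed features. The data, the helper names and the choice of depths are all assumptions made for this example, and it is not how QIIME 2 computes the plot internally.

```python
import random
from statistics import median

def observed_after_rarefying(counts, depth, rng):
    """Observed features after drawing `depth` reads without replacement."""
    reads = [feature for feature, n in enumerate(counts) for _ in range(n)]
    return len(set(rng.sample(reads, depth)))

def summarize(samples, depth, seed=0):
    """Return (samples retained, median observed features) at one depth."""
    rng = random.Random(seed)
    # Samples with fewer reads than `depth` cannot be rarefied and drop out.
    kept = [s for s in samples if sum(s) >= depth]
    values = [observed_after_rarefying(s, depth, rng) for s in kept]
    return len(kept), median(values)

# Hypothetical group: 6 shallow samples and 3 deep, richer samples.
shallow = [20] * 45   # 900 reads, 45 features
deep = [60] * 150     # 9,000 reads, 150 features
group = [shallow] * 6 + [deep] * 3

for depth in (300, 600, 900, 1200, 4000):
    n, med = summarize(group, depth)
    print(f"depth={depth}: {n} samples, median observed features = {med}")
```

While all 9 samples are present, the median is set by the shallow samples; once the depth exceeds their read count, only the 3 deep samples remain and the median jumps, even though no individual sample changed.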
Depending on what type of analysis you are doing and/or what data you are focusing on, you may want to pick a sampling depth that retains either most of the features or most of the samples. One consideration to help guide your decision could be the study's sample size - if you have a study with hundreds of samples, you may not need to retain as many as in a smaller study (such as the PD Mice Tutorial) where we are working with a very small number of samples.
Taking this one step further, you could examine the number of samples that are present within the subgroups you are using for your study. For example, if you had a study of 1,000 patients and your control group only made up 10 or 20 of those patients, you may want to pick a sampling depth that would retain all of those patients' samples.
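One way to check this before committing to a depth is to count how many samples each subgroup would keep at a few candidate depths. The sketch below is only an illustration: the per-sample read totals, the group labels and the function name `retained_per_group` are hypothetical, and in practice you would take these numbers from your own feature table summary and metadata.

```python
from collections import Counter

def retained_per_group(total_reads, groups, depth):
    """Count, per group, the samples that have at least `depth` reads."""
    kept = Counter()
    for sample_id, reads in total_reads.items():
        if reads >= depth:
            kept[groups[sample_id]] += 1
    # Report every group, including those that lose all of their samples.
    return {g: kept.get(g, 0) for g in sorted(set(groups.values()))}

# Hypothetical per-sample read totals and group labels.
total_reads = {"s1": 950, "s2": 1100, "s3": 9200,
               "s4": 8800, "s5": 1020, "s6": 4300}
groups = {"s1": "control", "s2": "control", "s3": "treated",
          "s4": "treated", "s5": "control", "s6": "treated"}

for depth in (900, 1000, 1100, 4000):
    print(depth, retained_per_group(total_reads, groups, depth))
```

If a small group such as the controls starts losing samples at a given depth, that is a sign to consider a lower depth, or at least to be aware of the trade-off.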
Other considerations would be what specifically you are attempting to test, what conclusions you are hoping to draw, etc. Does your study require an analysis of ALL of the observed features seen in a given body site? If that were the case, you'd want to pick a sampling depth that will retain most (if not all) of your observed features.
Great observation! Depending on the specific analysis you are performing, you may be exactly right - that you would want to perform the alpha rarefaction plotting first. However, in the Moving Pictures Tutorial we include the alpha-rarefaction step after our diversity metrics simply because it's a convenient time to talk about things.
If we rarefy to 889 to keep the maximum number of samples, we will have a sequencing depth effect that is not erased by the rarefaction, since samples with few sequences have fewer observed features than the others: see the figure below. In the figure we can clearly see that the 3 samples with ~9000 depth have more observed features than the 6 other samples, even when rarefying at a depth of 889.
Doesn't that bias the alpha and beta diversity analyses?
@JeremyTournayre, yep you're exactly right! If you were to use a sampling depth of ~900, you would definitely see that each body site has a different number of observed features relative to each other than they do at a sampling depth greater than ~1200. This is another important consideration to take into account, and why these rarefaction plots can be such a helpful tool - you want to make sure the sampling depth you select accurately depicts the relationship between your sub-samples! I've included a screenshot of the alpha rarefaction plot for these samples and marked the sampling depth you mentioned above so you can more clearly see this.