Hello! I am currently analyzing microbiome 16S samples and have a question about my alpha rarefaction plots. My feature table frequency per sample summary looks like this:
When I performed the alpha rarefaction, I set max depth = 20,000 (similar to the median frequency from the summary table). Instead of leveling out as sampling depth increases, the curves level out for a bit, then continue to fluctuate (see my observed features alpha rarefaction curve below):
I am new to 16S analyses and have never seen this before. Is there anything that could have caused the curves to look like this? Is it possible that I did something wrong during the initial data cleaning (I used DADA2)? Thank you for your help!
Alpha rarefaction plots are built by randomly subsampling each sample at every depth. In my experience, fluctuations like the ones you are observing are more likely when the plot is aggregated by a metadata column. How many samples do you have? Could you share your QZV file so we can check whether the plot is aggregated by metadata?
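To make the "random sampling for each point" idea concrete, here is a minimal sketch in plain Python. It is not how QIIME 2 implements rarefaction; the function name `observed_features_at_depth` and the toy count vector are assumptions made up for illustration.

```python
import random

def observed_features_at_depth(counts, depth, rng):
    """Subsample `depth` reads without replacement and count distinct features.

    Returns None when the sample has fewer than `depth` reads, which is
    the case where a sample drops out of the plot at that depth.
    """
    # Expand the count vector into one entry per read, labelled by feature index.
    reads = [feature for feature, n in enumerate(counts) for _ in range(n)]
    if depth > len(reads):
        return None
    return len(set(rng.sample(reads, depth)))

rng = random.Random(0)  # fixed seed so repeated runs give the same draws
toy_counts = [500, 300, 100, 50, 30, 10, 5, 3, 1, 1]  # hypothetical sample, 1000 reads
for depth in (10, 100, 500, 1000, 2000):
    print(depth, observed_features_at_depth(toy_counts, depth, rng))
```

Each point on a curve comes from a draw like this one, so shallow depths are noisy, and a sample simply has no value at depths beyond its own total frequency.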
I normally set alpha rarefaction max depth according to the biggest sample, so I can see full curves for each sample. Then, I use that plot in order to decide a sensible threshold for diversity metrics.
Looking at your max and min frequencies, I would discard your shallowest samples (you have at least one with a frequency below 1,000), but that is completely up to you. You can decide once you see the full rarefaction curves at full depth.
Thank you for all of your help! I have 145 total samples. I am planning to set my sampling depth at 3000 for this analysis. I have messaged you my .qzv file. I believe I did aggregate by metadata to get these results, though I am confused as to how aggregating by metadata would produce fluctuations in the results. Is there any way I can avoid this?
Thank you again!
Sandra
Hi @Sandra, I suspect what might be happening is that some samples are dropping out as you get to the right end of the plot, and that's why you're seeing some instability there. In other words, as you move right in the plot you're averaging over fewer samples.
The graph below this one in your .qzv shows the number of samples in each metadata group at each sampling depth, so that will allow you to confirm this.
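A toy illustration of that dropout effect, in plain Python. Everything here is hypothetical: the four samples, their plateau values, and the `toy_curve` shape are assumptions chosen only to show the mechanism, not real rarefaction output.

```python
# Hypothetical metadata group: (total reads, plateau richness) per sample.
samples = [(4000, 80), (9000, 200), (15000, 90), (25000, 100)]

def toy_curve(depth, plateau, half=500):
    """Illustrative saturating curve; a stand-in for a real rarefaction curve."""
    return plateau * depth / (depth + half)

def group_mean(depth):
    """Mean over the samples that still have at least `depth` reads."""
    kept = [toy_curve(depth, p) for total, p in samples if total >= depth]
    if not kept:
        return None, 0
    return sum(kept) / len(kept), len(kept)

for depth in (2000, 4000, 6000, 9000, 12000, 15000, 20000):
    mean, n = group_mean(depth)
    print(f"depth={depth:>6}  n_samples={n}  mean_observed={mean:.1f}")
```

Every individual curve here rises smoothly, yet the group mean jumps each time a sample runs out of reads, because the average is then taken over a different set of samples.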
I looked at your QZV file and what @gregcaporaso said is completely true. That's why I suggested generating the alpha rarefaction curves at sample level, i.e. not aggregated by metadata.
That "sensible threshold" may be one at which you don't lose too many samples in exchange for having sufficient depth.
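One rough way to look at that trade-off is to count how many samples survive each candidate depth. This is a minimal sketch in plain Python; the `freqs` list is made-up data standing in for the per-sample frequencies in a feature table summary, and `samples_retained` is a hypothetical helper, not a QIIME 2 function.

```python
def samples_retained(frequencies, depth):
    """Number of samples with at least `depth` reads."""
    return sum(1 for f in frequencies if f >= depth)

# Hypothetical per-sample frequencies (not taken from the thread).
freqs = [850, 2900, 3100, 5200, 12000, 19500, 21000, 40000]

for depth in (1000, 3000, 5000, 10000, 20000):
    kept = samples_retained(freqs, depth)
    print(f"depth={depth:>6}  kept={kept}/{len(freqs)}")
```

Reading the output next to the sample-level rarefaction curves helps pick a depth where most curves have flattened and few samples are lost.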