Hello guys. Firstly, happy holidays! I hope everyone is well and safe.
I have a question that I hope someone can answer. The sample with the lowest feature count in my table.qzv has around 58,000 feature counts. However, when I visualize the table as an alpha rarefaction curve for observed_features (using qiime diversity alpha-rarefaction, with --p-max-depth set to the maximum feature count in table.qzv, which is around 100k), the sequencing depth shown for this sample does not match the feature count shown in table.qzv. In table.qzv the sample has 58,000 feature counts, but in the rarefaction curve it has a sequencing depth of around 43,000 when I check the observed_features.csv file. I'm assuming here that the feature counts listed in table.qzv correspond to the sequencing depth (the x-axis) of the alpha rarefaction curve. Is this normal? Are they computed differently? How can I explain the discrepancy?
This is the general alpha rarefaction command I ran:
qiime diversity alpha-rarefaction \
--i-table table.qza \
--p-max-depth INTEGER \
--m-metadata-file metadata.tsv \
INTEGER is the highest feature count in table.qzv.
I am running QIIME 2 v2021.8 on Ubuntu in Oracle VirtualBox.
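To make my assumption about the x-axis concrete, here is a minimal sketch of the depths I would expect to be plotted. It assumes the depths are evenly spaced between --p-min-depth and --p-max-depth across --p-steps points (the help text lists defaults of 1 and 10, as far as I can tell). The function name depth_grid and the numbers are hypothetical and only for illustration; this is not QIIME 2's actual code.

```shell
# Hypothetical helper (not part of QIIME 2): print the depths I would expect
# on the x-axis, ASSUMING they are evenly spaced between min and max depth.
depth_grid() {
  dg_min=$1; dg_max=$2; dg_steps=$3
  dg_i=0
  while [ "$dg_i" -lt "$dg_steps" ]; do
    # Integer arithmetic, so any fractional depth is truncated.
    echo $(( dg_min + dg_i * (dg_max - dg_min) / (dg_steps - 1) ))
    dg_i=$(( dg_i + 1 ))
  done
}

# Illustrative values only: min depth 1, max depth 100000, 10 steps.
depth_grid 1 100000 10
```

If that assumption holds, a sample would only ever be plotted at these grid values, so the last depth shown for it would be the largest grid value that does not exceed its feature count, not the feature count itself.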
Edit: P.S. I tried setting --p-min-depth to the lowest feature count (58,000) and I can still see the sample. But if I increase --p-min-depth from 58,000 to 58,001, the sample disappears from the list. That makes sense to me based on the feature counts in table.qzv, since its feature count (58,000) is below the cut-off value (58,001 in this example). I don't really understand how rarefaction works in this sense, and I'm not stats-savvy, so I'm not sure I completely understand the jargon in the documentation and in some of the forum posts. The tutorials I've seen do not seem to address this. Nonetheless, my samples are plateauing.
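The behaviour I observed with --p-min-depth can be written as a simple rule. This is only a sketch of what I saw, not the actual implementation: kept_at_depth is a hypothetical helper, and the rule that a sample is kept only if its feature count is at least the requested depth is my assumption, based on the two runs above.

```shell
# Hypothetical helper (not part of QIIME 2): report whether a sample with a
# given total feature count would still be shown at a given depth, ASSUMING
# a sample is kept only when its count is at least that depth.
kept_at_depth() {
  kd_count=$1; kd_depth=$2
  if [ "$kd_count" -ge "$kd_depth" ]; then
    echo kept
  else
    echo dropped
  fi
}

kept_at_depth 58000 58000   # feature count equals the depth
kept_at_depth 58000 58001   # depth is one above the feature count
```

The first call corresponds to the run where the sample was still listed, and the second to the run where it disappeared.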
Another edit: I tried setting --p-max-depth to 58,000 and all samples appear. So now I'm confused about how sampling/sequencing depth is calculated. In the second line graph, when I set --p-max-depth to the highest feature count, it tells me that I will exclude this sample if I exceed 50,000, but when I set --p-max-depth to 58,000, all samples still appear.