Hello guys. Firstly, happy holidays! I hope everyone is well and safe.
I have a question I hope someone can answer. I'm puzzled by a mismatch for the sample with the lowest feature count in my table.qzv (around 58,000). When I view the same table through the observed_features alpha rarefaction curve (using qiime diversity alpha-rarefaction with --p-max-depth set to the highest feature count in table.qzv, which is around 100,000), the sequencing depth shown for this sample does not match its feature count in table.qzv: the sample has 58,000 feature counts, but in the observed_features.csv file its curve only reaches a sequencing depth of around 43,000. I'm assuming that the feature counts listed in table.qzv correspond to the sequencing depth (the x-axis) of the alpha rarefaction curve. Is this normal? Are they computed differently? How can I explain the discrepancy?
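To make my assumption concrete, here is a minimal sketch of what I understand the two quantities to be. The toy table, the sample and feature names, and both helper functions are made up for illustration and are not QIIME 2 code: the per-sample total is what I'm reading as "feature count" in table.qzv (and expecting on the x-axis), while observed features is the number of distinct features present (the y-axis).

```python
# Toy feature table: feature ID -> {sample ID -> count}. All values are made up.
table = {
    "feature_a": {"sample_1": 30, "sample_2": 0},
    "feature_b": {"sample_1": 20, "sample_2": 5},
    "feature_c": {"sample_1": 0, "sample_2": 15},
}


def per_sample_totals(table):
    """Sum the counts over all features for each sample."""
    totals = {}
    for counts in table.values():
        for sample, count in counts.items():
            totals[sample] = totals.get(sample, 0) + count
    return totals


def observed_features(table):
    """Count the features with a nonzero count in each sample."""
    observed = {}
    for counts in table.values():
        for sample, count in counts.items():
            observed[sample] = observed.get(sample, 0) + (1 if count > 0 else 0)
    return observed


print(per_sample_totals(table))   # {'sample_1': 50, 'sample_2': 20}
print(observed_features(table))   # {'sample_1': 2, 'sample_2': 2}
```

If this reading is right, a sample with a total of 58,000 should be usable at any sequencing depth up to 58,000, which is why the 43,000 in the CSV surprised me.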
This is the general form of the alpha rarefaction command I ran:
qiime diversity alpha-rarefaction \
--i-table table.qza \
--p-max-depth INTEGER \
--m-metadata-file metadata.tsv \
--o-visualization alphararefaction.qzv
where INTEGER is the highest feature count in table.qzv.
I am running QIIME 2 v2021.8 on Ubuntu in Oracle VirtualBox.
Thank you!
Edit (P.S.): I tried setting --p-min-depth to the lowest feature count (58,000), and I can still see the sample. But if I increase --p-min-depth from 58,000 to 58,001, the sample disappears from the list. That makes sense to me based on the feature counts in table.qzv, since the sample's feature count (58,000) is below the cut-off (58,001 in this example). I don't really understand how rarefaction works here, and I'm not stats savvy, so I'm not sure I fully understand the jargon in the documentation and in some of the forum posts. Some tutorials I've seen do not seem to address this. Nonetheless, my samples are plateauing.
Another edit: I tried setting --p-max-depth to 58,000, and all samples appear. So now I'm confused about how the sampling/sequencing depth is calculated: when I set --p-max-depth to the highest feature count, the second line graph tells me that this sample is excluded once the depth exceeds 50,000, but when I set it to 58,000, all samples still appear.
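Here is a rough sketch of what I suspect might be going on. It rests on an assumption I have not confirmed in the source: that the curve is only evaluated at a fixed number of evenly spaced depths between --p-min-depth and --p-max-depth (I left --p-min-depth and --p-steps at what I believe are their defaults, 1 and 10), and that a sample is dropped at any depth larger than its total count. The function names are hypothetical, and 58,000 and 100,000 are the rounded values from my table.qzv.

```python
def depth_grid(min_depth, max_depth, steps):
    """Evenly spaced integer depths from min_depth to max_depth, inclusive.

    Hypothetical helper; assumes steps >= 2.
    """
    span = max_depth - min_depth
    return [int(min_depth + i * span / (steps - 1)) for i in range(steps)]


def last_depth_retained(sample_total, depths):
    """Largest grid depth that does not exceed the sample's total count."""
    eligible = [d for d in depths if d <= sample_total]
    return max(eligible) if eligible else None


sample_total = 58_000  # rounded total for my smallest sample

# Max depth at the largest sample (rounded): the grid skips over 58,000.
wide = depth_grid(1, 100_000, 10)
print(wide)
# [1, 11112, 22223, 33334, 44445, 55556, 66667, 77778, 88889, 100000]
print(last_depth_retained(sample_total, wide))    # 55556

# Max depth at the smallest sample: the last grid point is exactly 58,000.
narrow = depth_grid(1, 58_000, 10)
print(last_depth_retained(sample_total, narrow))  # 58000
```

With these rounded numbers the sample's last point lands on a grid depth below 58,000 rather than at 58,000 itself. My real maximum is not exactly 100,000, so the actual grid will differ, but if the assumption holds, it would explain why the depth in observed_features.csv is lower than the count in table.qzv, and why all samples appear when --p-max-depth is 58,000.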