P Sampling Depth Error?

So I'm not sure whether or not this question goes in User Support or Technical Support, but here we go:

I'm attempting to create a Bray-Curtis PCoA plot for the samples I sequenced and successfully ran through QIIME 2 (qiime2-2021.4, installed through conda), and I am working with 28 samples. The sampling depth I selected, 10,210, should cut out 10 samples, leaving 18 on the PCoA plot. That is not the case, however: only 12 samples are shown on the plot at this sampling depth. It cut out all of the samples below 10,210 reads, but it also cut out a ton of samples above 10,210. For example, two samples had 11,618 and 12,525 reads, and they simply aren't present. This didn't just happen at 10,210, either: when I selected lower sampling depths, it continued to cut out certain samples above the specified sampling depth.

Does anybody have any thoughts?

The code I ran was this:

qiime diversity core-metrics-phylogenetic \
  --i-phylogeny rooted-tree.qza \
  --i-table table.qza \
  --p-sampling-depth 10210 \
  --m-metadata-file sample-metadata.tsv \
  --output-dir core-metrics-results

Thanks for the help!

I don't know if I'm right, but I have the impression that you separated 10 and 210 with a comma not only here on the forum but also in your command. If that is the case, could you try writing 10210 without any separator?

Not sure if that is what happened there, but now I am curious.
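One way to catch this before running the pipeline is to confirm that the depth is a plain integer, since `--p-sampling-depth` takes an integer value. A minimal Bash sketch; the helper name `check_depth` is hypothetical and not part of QIIME 2:

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of QIIME 2): accept a sampling depth only
# if it is a plain integer, i.e. digits only with no thousands separator
# such as the comma in "10,210".
check_depth() {
  local depth="$1"
  if [[ "$depth" =~ ^[0-9]+$ ]]; then
    echo "$depth"
    return 0
  fi
  echo "invalid sampling depth: '$depth' (use digits only, e.g. 10210)" >&2
  return 1
}
```

For example, `depth=$(check_depth 10210)` succeeds and stores 10210, while `check_depth 10,210` prints an error to stderr and returns a nonzero status, so the result can gate the `qiime diversity core-metrics-phylogenetic` call.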

Apologies, I didn't use any commas in my code. It won't let me edit the original post, lol.

That's interesting. Are you sure that the table.qzv or alpha-rarefaction.qzv you are using to judge the number of sequences in each sample was created from the latest feature table? For example, you might have created a visualization and then filtered your feature table further; in that case, the two files will report different counts.

Sorry, I'm not entirely sure I'm following (I will say I'm pretty new to bioinformatics/QIIME as a whole, lol). I looked at the number of sequences in demultiplexed.qzv, which is how I determined the sampling depth, and I did everything in one fell swoop, so everything should have been created with the latest feature table.

Usually that file is created before the DADA2 step. After DADA2, the number of retained sequences will be lower because of quality filtering, chimera removal, and reads that failed to merge. Researchers can also perform additional filtering after DADA2, such as removing rare features and sequences from organelles, which further decreases the number of sequences left for analysis.
You can visualize your latest feature table artifact and check how many sequences are in each sample to reevaluate the sampling depth.
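To see which samples a given depth will keep, compare it against the per-sample frequencies of the final feature table (for example, those shown in the visualization produced by `qiime feature-table summarize --i-table table.qza --o-visualization table.qzv`), not against the demultiplexing summary. When rarefying, samples whose total frequency is below the sampling depth are dropped. A minimal Bash sketch, assuming those per-sample totals have been saved as a headerless, tab-separated file of sample ID and frequency; that file layout and the function name `preview_depth` are assumptions for illustration:

```shell
#!/usr/bin/env bash
# Hypothetical helper: given a sampling depth and a TSV of
# "<sample-id><TAB><total frequency>" lines taken from the FINAL feature
# table (after DADA2 and any extra filtering), report whether each sample
# would be kept (frequency >= depth) or dropped (frequency < depth).
preview_depth() {
  local depth="$1" counts_file="$2"
  awk -F '\t' -v depth="$depth" '
    NF >= 2 {
      gsub(/,/, "", $2)   # tolerate "11,618"-style numbers
      status = ($2 + 0 >= depth + 0) ? "KEEP" : "DROP"
      print status "\t" $1 "\t" $2
    }
  ' "$counts_file"
}
```

For example, `preview_depth 10210 sample-frequencies.tsv | grep -c '^KEEP'` (the file name is hypothetical) counts the samples that should appear on the plot at a depth of 10210.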


Perfect, I think this worked, but I'll know for certain once I make the PCoA plots tomorrow. Thanks for your help, Timur!
