Unable to detect taxa with few assignments via classify-sklearn

ahmet · October 27, 2022, 12:12pm

Hi everyone,

I've been using QIIME 2 for a few weeks and can analyze my paired-end 16S soil data, but I'm seeing some differences from my QIIME 1 results. I compared the taxonomy assignments from QIIME 1 (97% OTU clustering with vsearch and the RDP classifier, Greengenes database) to those from QIIME 2 (DADA2 denoising plus the pre-trained SILVA classifier from the resources webpage). In the taxonomy barplot, QIIME 1 had assignments to many more distinct OTUs.

I have checked various posts and adjusted my DADA2 settings, which improved my assignments in the QIIME 2 pipeline quite a lot (since many more reads now pass the filters). In the end, I get a similar total number of features assigned to different taxa. QIIME 2 gives a much better level of specificity (more genus/species-level assignments), but QIIME 1 gives me a few hundred more OTUs, most of which have only 1 to 10 features assigned to them. Should I treat the OTUs with a small number of assigned reads from the QIIME 1 pipeline as potential false positives? Or should I change something in the classifier step to include more bacteria that have only a small number of features assigned to them? This difference is quite important to me because I'm trying to quantify how many different bacteria I have in my test samples, so that I can extrapolate (a ballpark estimate based on beta diversity analyses) how many samples I would need to reach a specific number of different bacteria.

Thanks a lot for your time!

colinbrislawn · October 28, 2022, 3:58pm

Yeah, I would! QIIME 1 has an issue with OTU inflation / false positives. Check out all the spurious OTUs from QIIME 1 in this table:

I would not use QIIME 1 to do that, for exactly the reasons you describe.
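For the "how many samples do I need" part, one common way to get a ballpark is a sample-based accumulation curve: shuffle the sample order many times and track how many distinct taxa have been seen after 1, 2, 3, ... samples. Note this is a different technique from the beta diversity approach mentioned in the question. Here is a minimal sketch in plain Python. The `accumulation_curve` function, the sample IDs, and the taxon labels are all made up for illustration; in practice you would build `samples` from your own taxonomy-collapsed feature table.

```python
import random


def accumulation_curve(samples, n_permutations=100, seed=0):
    """Mean number of distinct taxa seen after 1..N randomly ordered samples.

    samples: dict mapping sample ID -> set of taxa observed in that sample.
    Returns a list where entry k-1 is the mean richness after k samples.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    ids = list(samples)
    totals = [0] * len(ids)
    for _ in range(n_permutations):
        rng.shuffle(ids)
        seen = set()
        for k, sample_id in enumerate(ids):
            seen |= samples[sample_id]  # add this sample's taxa
            totals[k] += len(seen)      # cumulative distinct taxa so far
    return [total / n_permutations for total in totals]


# Hypothetical toy data: four samples with overlapping taxa.
samples = {
    "s1": {"A", "B", "C"},
    "s2": {"B", "C", "D"},
    "s3": {"A", "E"},
    "s4": {"C", "F"},
}

curve = accumulation_curve(samples)
for n_samples, mean_taxa in enumerate(curve, start=1):
    print(n_samples, round(mean_taxa, 2))
```

If the curve is still climbing steeply at your last sample, you have not sampled enough to see most of the taxa. Spurious low-count OTUs inflate exactly this curve, which is why the input table matters so much here.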

This is the way
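If you want to see how much the low-count OTUs discussed above change your taxon counts, you can drop features below a minimum total frequency and recount. In QIIME 2 itself, `qiime feature-table filter-features` with `--p-min-frequency` does this kind of filtering. The sketch below shows the same idea in plain Python on a toy table. The function names, feature IDs, taxon labels, and the threshold of 10 reads are assumptions for illustration only, not a recommended cutoff.

```python
def filter_low_abundance(table, min_total=10):
    """Keep features whose read count summed over all samples is >= min_total.

    table: dict mapping feature ID -> dict of sample ID -> read count.
    """
    return {
        feature_id: counts
        for feature_id, counts in table.items()
        if sum(counts.values()) >= min_total
    }


def count_taxa(table, taxonomy):
    """Number of distinct taxon labels among the features in the table."""
    return len({taxonomy[feature_id] for feature_id in table})


# Hypothetical toy data: two abundant features and two rare ones.
feature_table = {
    "f1": {"s1": 120, "s2": 98},
    "f2": {"s1": 3, "s2": 0},
    "f3": {"s1": 0, "s2": 45},
    "f4": {"s1": 1, "s2": 1},
}
taxonomy = {
    "f1": "g__Bacillus",
    "f2": "g__RareTaxonA",
    "f3": "g__Streptomyces",
    "f4": "g__RareTaxonB",
}

filtered = filter_low_abundance(feature_table, min_total=10)
print("taxa before filtering:", count_taxa(feature_table, taxonomy))
print("taxa after filtering:", count_taxa(filtered, taxonomy))
```

Running the same comparison at a few different thresholds shows how sensitive your "number of different bacteria" estimate is to rare features.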
