Hi,
I compared Greengenes1 and Greengenes2 and noticed a significant drop in features (from 1,220 to 420) and in total frequency (from ~4M to ~2M) in the Greengenes2 table.
Is that expected?
I used the qiime greengenes2 non-v4-16s command for Greengenes2 and the gg-13-8-99-515-806-nb-classifier.qza classifier for Greengenes1.
My data are V4 sequences, but when I run qiime greengenes2 filter-features and then qiime greengenes2 taxonomy-from-table I get a ‘No requested tips found’ error. After reading this post I understood that I can also run the non-v4-16s command.
Interesting, thank you! If you don't mind, what environment are these data from? Do you recover more total frequency with a relaxed clustering threshold?
Given the trimming and truncation, I would not expect your ASVs to be found by filter-features. We primarily placed 90, 100, and 150 nt sequences based on the EMP protocol. These sequences tend to start with TAC at the 5' end, as that is immediately proximal to the 515F forward primer.
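As a rough way to see whether a set of ASVs has the shape described above, here is a minimal Python sketch. It is only a heuristic built on the two properties mentioned (a 5' TAC and a length of 90, 100, or 150 nt); the function names and the toy sequences are hypothetical, and passing this check does not guarantee that a sequence is present in the tree.

```python
# Heuristic only: the lengths and the TAC prefix come from the description
# above; the function names and example sequences are made up.
EXPECTED_LENGTHS = {90, 100, 150}


def looks_like_emp_v4(seq):
    """Return True if seq starts with TAC and has one of the placed lengths."""
    seq = seq.strip().upper()
    return seq.startswith("TAC") and len(seq) in EXPECTED_LENGTHS


def count_emp_like(seqs):
    """Count how many sequences pass the heuristic."""
    return sum(1 for s in seqs if looks_like_emp_v4(s))


# Toy sequences (not real ASVs): only the first has both properties.
toy_asvs = [
    "TAC" + "G" * 97,   # 100 nt, starts with TAC
    "ACG" + "G" * 97,   # 100 nt, wrong 5' end
    "TAC" + "G" * 117,  # 120 nt, length not in the placed set
]
print(count_emp_like(toy_asvs), "of", len(toy_asvs), "look EMP-like")
```

If most ASVs fail a check like this, exact matching against the tree is unlikely to work and the non-v4-16s route is probably the more suitable one.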
Thank you Daniel,
I didn't completely understand your response. We work with human and mouse samples, and the problem I described is from a mouse dataset - we did not have similar issues with a human vaginal dataset. I am using qiime2 2023.7. In all cases, we trimmed only the barcodes from the front end of the sequences.
Can you explain what you mean by the relaxed clustering? I used the default clustering threshold in DADA2 and did not specify a threshold in the gg2 command.
Can you explain what the gg2 filtering does exactly?
If I would like to retain my taxa table, including unnamed taxa, can I use gg2 as a classifier (in the qiime feature-classifier classify-sklearn command) rather than using the filtering function? In the past, I used gg1 only as a classifier.
By relaxed threshold, I mean reducing the level of similarity used with vsearch via non-v4-16s.
GG2 filtering computes the set intersection between the features in your table and the features in the tree, so the features need to match exactly. In practice, particularly for well-studied environments like human and mouse samples, I would expect most forward-read V4 ASVs at 90, 100, or 150 nt (relative to the EMP 16S primers) to match and to represent most of the sequence data, as we placed ~20M fragments from over 300,000 public and private microbiome samples.
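The set-intersection idea can be sketched in a few lines of plain Python. This is an illustration of the concept only, not the plugin's implementation: the table is modelled as a hypothetical dict of feature ID to total frequency, the tree as a set of tip names, and all IDs and counts are made-up toy values.

```python
# Conceptual sketch, not the q2-greengenes2 code. All names and data here
# are hypothetical.
def filter_to_tree(table, tree_tips):
    """Keep only features whose IDs appear exactly among the tree tips."""
    shared = set(table) & set(tree_tips)
    return {fid: count for fid, count in table.items() if fid in shared}


def fraction_retained(table, filtered):
    """Fraction of the total frequency that survives the filter."""
    total = sum(table.values())
    return sum(filtered.values()) / total if total else 0.0


# Toy feature table: feature ID -> total frequency.
table = {"TACGGAGG": 120, "TACGTAGG": 60, "GGAGGATC": 20}
# Toy set of tree tip names.
tree_tips = {"TACGGAGG", "TACGTAGG", "TACGAAGG"}

filtered = filter_to_tree(table, tree_tips)
print(len(filtered), "of", len(table), "features kept")
print("frequency retained:", fraction_retained(table, filtered))
```

Because the match is exact, a feature that differs from a tip by a single base or by a few trimmed positions is dropped, which is why trimming and truncation settings matter so much here.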
Pre-computed Naive Bayes classifiers that are compatible with the feature-classifier classify-sklearn action are available in the QIIME 2 data resources.