Greengenes 2 confidence threshold

Emily_Yu · May 30, 2024, 4:52pm

Hi! I am curious whether there are best practices for determining the confidence threshold when using the Greengenes 2 classifier for taxonomic assignments?

wasade · May 30, 2024, 6:31pm

Hi @Emily_Yu,

We recommend using the phylogenetic taxonomy if using V4. Otherwise, Naive Bayes. I don't have specific recommendations on confidence thresholds for Naive Bayes but am unaware of a reason prior guidance for other resources would differ here.

Best,
Daniel

Emily_Yu · June 7, 2024, 9:42pm

6/13 Update: I think all my questions have been answered either from other threads or some extra looking, thanks!

Thank you!

I think I may have asked the wrong question originally, my apologies. I will try to clarify.

I have paired-end V4 data that I am trying to classify using Greengenes2. I ended up using the non-v4-16s action, which performs closed-reference OTU picking against the full-length 16S sequences in Greengenes2. According to the QIIME2 doc on "clustering sequences into OTUs using q2-vsearch", clustering is performed at 85% identity against the Greengenes 13_8 85% OTUs reference database. Can I assume that clustering is also performed at 85% against the full-length 16S sequences in Greengenes2, and is there a way to increase this percent identity threshold?
^ Update 6/13: Ok, I see that I can choose the percent identity to cluster at using the non-v4-threshold.

As an aside: is there anything methodologically wrong with clustering my sequences into OTUs for taxonomic classification if the rest of my analyses on alpha and beta diversity are done on ASVs? It feels methodologically wrong to do analyses at two different resolutions, and I am leaning towards not using Greengenes2.
^ Update 6/13: After some searching, it seems like OTU clustering on an ASV table is not technically wrong but not recommended because you will lose resolution (discussion here).
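
To make the "lose resolution" point concrete, here is a minimal sketch in plain Python. It is not QIIME 2 or q2-greengenes2 code; the ASV IDs, OTU IDs, counts, and the `collapse_to_otus` helper are all made up for illustration. It only shows what happens to a feature table when several ASVs hit the same reference sequence in closed-reference clustering.

```python
from collections import defaultdict

# Hypothetical ASV counts for one sample (IDs and numbers are made up).
asv_counts = {"ASV_1": 120, "ASV_2": 30, "ASV_3": 50, "ASV_4": 7}

# Hypothetical closed-reference mapping: each ASV -> the reference OTU it hit.
# ASV_4 had no hit above the identity threshold, so it has no entry here.
asv_to_otu = {"ASV_1": "OTU_A", "ASV_2": "OTU_A", "ASV_3": "OTU_B"}


def collapse_to_otus(counts, mapping):
    """Sum ASV counts per reference OTU; return (otu_counts, unmatched_ids)."""
    otu_counts = defaultdict(int)
    unmatched = []
    for asv, n in counts.items():
        otu = mapping.get(asv)
        if otu is None:
            # Closed-reference picking discards features with no reference hit.
            unmatched.append(asv)
        else:
            otu_counts[otu] += n
    return dict(otu_counts), unmatched


otu_counts, unmatched = collapse_to_otus(asv_counts, asv_to_otu)
print(otu_counts)  # {'OTU_A': 150, 'OTU_B': 50}
print(unmatched)   # ['ASV_4']
```

In this toy case ASV_1 and ASV_2 become indistinguishable after clustering, and ASV_4 drops out entirely, which is why diversity results on the ASV table and on the clustered table are not directly comparable.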

wasade · June 13, 2024, 8:06pm

Hi @Emily_Yu,

I do apologize for the delay in a generalized fragment insertion method. Note though that a phylogeny based on aligned short reads has the potential to be quite bad, in which case phylogenetic analyses based off recruitment to a backbone could yield more interpretable results.

Best,
Daniel

Emily_Yu · June 13, 2024, 8:20pm

No worries at all! Sometimes I find that I am not searching for the right questions/keywords in the QIIME forum, so threads that have answers to my questions don't pop up.

Thanks, I will keep that note about phylogenies in mind.