Greengenes 2 confidence threshold

Hi! Are there best practices for choosing the confidence threshold when using the Greengenes 2 classifier for taxonomic assignment?

We recommend using the phylogenetic taxonomy if your data are V4; otherwise, use Naive Bayes. I don't have specific recommendations on confidence thresholds for Naive Bayes, but I am unaware of any reason that prior guidance for other resources would differ here.
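For anyone unsure what the threshold controls, here is a minimal sketch of the general idea: a lineage is truncated at the first rank whose confidence falls below the threshold. This is only an illustration, not the classifier's actual implementation; the function name, the lineage, and the per-rank confidence values are all hypothetical.

```python
def truncate_by_confidence(ranks, confidences, threshold=0.7):
    """Keep leading ranks until one falls below the threshold.

    Hypothetical helper for illustration; 0.7 is just an example default.
    """
    kept = []
    for rank, conf in zip(ranks, confidences):
        if conf < threshold:
            break  # everything below this rank is left unassigned
        kept.append(rank)
    return "; ".join(kept) if kept else "Unassigned"


# Made-up lineage and per-rank confidences, for illustration only.
ranks = ["d__Bacteria", "p__Firmicutes", "c__Bacilli",
         "o__Lactobacillales", "f__Lactobacillaceae", "g__Lactobacillus"]
confs = [1.0, 0.99, 0.97, 0.91, 0.78, 0.55]

lenient = truncate_by_confidence(ranks, confs, threshold=0.5)
default = truncate_by_confidence(ranks, confs, threshold=0.7)
strict = truncate_by_confidence(ranks, confs, threshold=0.95)

for label, lineage in [("0.50", lenient), ("0.70", default), ("0.95", strict)]:
    print(label, lineage)
```

A higher threshold gives shallower but more conservative assignments; a lower one gives deeper assignments with more risk of being wrong at the tips.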


I think I may have asked the wrong question originally; my apologies. I will try to clarify.

I have paired-end V4 data that I am trying to classify using Greengenes2. I ended up using the non-v4-16s action, which performs closed-reference OTU picking against the full-length 16S sequences in Greengenes2. According to the QIIME 2 documentation on "clustering sequences into OTUs using q2-vsearch", clustering is performed at 85% identity against the Greengenes 13_8 85% OTUs reference database. Can I assume that clustering is also performed at 85% identity against the full-length 16S sequences in Greengenes2, and is there a way to increase this percent identity threshold?
^ Update 6/13: Ok, I see that I can choose the percent identity to cluster at using the non-v4-threshold.

As an aside: is there anything methodologically wrong with clustering my sequences into OTUs for taxonomic classification if the rest of my analyses (alpha and beta diversity) are done on ASVs? It feels wrong to run analyses at two different resolutions, and I am leaning towards not using Greengenes2.
^ Update 6/13: After some searching, it seems that OTU clustering of an ASV table is not technically wrong, but it is not recommended because you lose resolution (discussion here).
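To make the loss of resolution concrete, here is a toy sketch of what collapsing an ASV table onto reference OTUs does to the counts. It is only an illustration under made-up data: the ASV IDs, reference IDs, counts, and the `collapse_to_otus` helper are all hypothetical, and it is not how the plugin is implemented.

```python
from collections import defaultdict


def collapse_to_otus(asv_counts, asv_to_otu):
    """Sum ASV counts per matched reference OTU (hypothetical helper)."""
    otu_counts = defaultdict(int)
    for asv, count in asv_counts.items():
        otu = asv_to_otu.get(asv)
        if otu is None:
            # In closed-reference clustering, ASVs with no reference
            # match at the chosen identity are not carried forward.
            continue
        otu_counts[otu] += count
    return dict(otu_counts)


# Made-up counts for one sample and a made-up ASV-to-reference mapping.
asv_counts = {"asv1": 120, "asv2": 30, "asv3": 8, "asv4": 5}
asv_to_otu = {"asv1": "ref_A", "asv2": "ref_A", "asv3": "ref_B"}

otu_counts = collapse_to_otus(asv_counts, asv_to_otu)
print(otu_counts)
```

Here asv1 and asv2 become indistinguishable once both map to ref_A, and asv4 is dropped because it has no match, so the collapsed table has fewer features than the ASV table.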


I do apologize for the delay in providing a generalized fragment insertion method. Note, though, that a phylogeny built from aligned short reads has the potential to be quite poor, in which case phylogenetic analyses based on recruitment to a backbone could yield more interpretable results.


