Issues assigning taxonomy when using V4V5 region

SoilRotifer · March 15, 2023, 6:41pm

Can you provide more details on how your data was sequenced? Some sequencing facilities provide data in "mixed orientation", meaning that a portion of your reads are flipped relative to the rest. If you run the naive Bayes classifier on such reads, many of them will be poorly classified or unassigned, because they are not in the same orientation as the reference sequences. If this is the case, you'll need to orient your reads correctly prior to classification, and it is probably a good idea to do this before denoising.
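To make the idea of re-orienting reads concrete, here is a minimal Python sketch, not a tested pipeline step. It assumes single-end or merged reads that still carry their primers at the 5' end, so a read starting with the forward primer is kept and a read starting with the reverse primer is reverse-complemented. The function names, the `max_mismatches` threshold, and the short primers in the usage note are all hypothetical placeholders; substitute your real V4V5 primers.

```python
# IUPAC nucleotide codes mapped to the bases each one can stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
# Translation table that complements each base, including degenerate codes.
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def starts_with_primer(read, primer, max_mismatches=1):
    """True if the 5' end of `read` matches `primer` within the mismatch limit."""
    if len(read) < len(primer):
        return False
    mismatches = sum(
        base not in IUPAC[code] for code, base in zip(primer, read)
    )
    return mismatches <= max_mismatches


def orient_read(read, fwd_primer, rev_primer, max_mismatches=1):
    """Return the read in forward orientation, or None if neither primer is found.

    Assumption: primers have NOT been trimmed yet.
    """
    read = read.upper()
    if starts_with_primer(read, fwd_primer, max_mismatches):
        return read  # already in the forward orientation
    if starts_with_primer(read, rev_primer, max_mismatches):
        return reverse_complement(read)  # flipped read, turn it around
    return None  # orientation cannot be determined from the primers
```

For example, with toy primers `"GTGYCAGC"` (forward) and `"CCGYCAAT"` (reverse), `orient_read("CCGTCAATGGGAAA", "GTGYCAGC", "CCGYCAAT")` returns the reverse complement of the read, while a read with neither primer returns `None`. For paired-end data that has not been merged, flipped pairs would need R1 and R2 swapped rather than reverse-complemented, which this sketch does not handle.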

If that is not the case, and the taxonomic assignments for the other amplicon regions make sense, then the likely issue is that the V4V5 primers do not match the reference sequences very well, i.e. there are too many mismatches. Using primers to extract your target region can be an issue in some cases, as we warn here. I recommend trying this approach to building your classifier.
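If you want a rough feel for how well a primer matches a given reference sequence, a sketch like the following may help. It slides the primer along the reference and reports the position with the fewest mismatches, treating degenerate IUPAC codes in the primer as matching any of the bases they represent. This is only an illustration with hypothetical function names, not how any particular extraction tool scores primer hits; a reverse primer would need to be reverse-complemented before searching.

```python
# IUPAC nucleotide codes mapped to the bases each one can stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def count_mismatches(primer, window):
    """Count positions where the reference base is not allowed by the primer code."""
    return sum(base not in IUPAC[code] for code, base in zip(primer, window))


def best_primer_hit(primer, reference):
    """Return (start, mismatches) for the best ungapped primer placement.

    Returns None when the reference is shorter than the primer.
    """
    primer, reference = primer.upper(), reference.upper()
    best = None
    for start in range(len(reference) - len(primer) + 1):
        window = reference[start:start + len(primer)]
        mismatches = count_mismatches(primer, window)
        # Keep the earliest placement with the lowest mismatch count.
        if best is None or mismatches < best[1]:
            best = (start, mismatches)
    return best
```

Running `best_primer_hit` over a sample of reference sequences and tallying how many have more mismatches than your extraction step tolerates would show whether the primers are dropping a large share of the references.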