How to train my classifier for the V3-V4 16S rRNA gene region

Not necessarily. You can validate by running against the pre-made full-length Greengenes or SILVA classifiers. If you obtain similar results, then it is likely that one or more of the following is occurring:

  • contamination
  • too many off-target taxa were sequenced
  • your sequence data is in mixed or reverse orientation
    • That is, your data is the reverse complement with respect to the classifier.
    • You can try running qiime feature-classifier classify-consensus-vsearch ... as this does not care about orientation. If you get reasonable results, then your sequence orientation is the issue. However, you'll still need to fix the mixed orientation, because any sequence alignment or phylogeny built from those reads will be inaccurate.
    • If your vsearch results are still not reasonable, then I'd suspect that one of the prior issues, or something else, is occurring.
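
If you want a quick sanity check on orientation before re-running anything, here is a minimal Python sketch of the idea. It is not part of QIIME 2, and the helper names (reverse_complement, kmers, guess_orientation) are made up for illustration. It assumes you have pulled out one representative read and one reference sequence from the same region as plain strings, and it simply asks whether the read or its reverse complement shares more k-mers with the reference.

```python
# Illustrative sketch only: these helpers are hypothetical and not part of QIIME 2.
# Assumes plain DNA strings (A, C, G, T, N); other IUPAC codes are left unchanged.

COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def kmers(seq: str, k: int = 8) -> set:
    """Return the set of all k-mers in seq (upper-cased)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def guess_orientation(read: str, reference: str, k: int = 8) -> str:
    """Compare shared k-mers for the read as given and for its reverse complement.

    Returns "forward", "reverse", or "unknown" when the counts tie.
    """
    ref_kmers = kmers(reference, k)
    forward_hits = len(kmers(read, k) & ref_kmers)
    reverse_hits = len(kmers(reverse_complement(read), k) & ref_kmers)
    if forward_hits == reverse_hits:
        return "unknown"
    return "forward" if forward_hits > reverse_hits else "reverse"


if __name__ == "__main__":
    # Toy sequences for demonstration, not real 16S data.
    reference = "ACGTTGCAAGGCTTACCGATGGCATTCGA"
    read = reference[5:25]
    print(guess_orientation(read, reference))
    print(guess_orientation(reverse_complement(read), reference))
```

If a sample of your reads comes back as a mix of "forward" and "reverse", that points to the mixed-orientation problem described above. This is only a rough check on a handful of sequences, not a replacement for properly re-orienting the data.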

You can find other discussions in the forum about these issues.
