q2-clawback and fit-classifier-naive-bayes, which is better for training classifier?


As the article https://www.nature.com/articles/s41467-019-12669-6.pdf states, incorporating environment-specific taxonomic abundance information can significantly increase species-level classification accuracy.

Another post said that in QIIME 2, k-mers (not sequence counts) are used as the features for the naive Bayes trainer. So it seems that environment-specific taxonomic abundance information is not essential when using the naive Bayes trainer in QIIME 2.
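To make the "k-mers as features" point concrete, here is a minimal stdlib Python sketch of turning a read into overlapping k-mer counts. The function name `kmer_counts` and the default k = 7 are illustrative assumptions. q2-feature-classifier does its own feature extraction internally (it is built on scikit-learn), so this is not its code. Note that these features only describe the sequence itself. They say nothing about how common each taxon is in a given environment, which is a separate ingredient in naive Bayes (the class prior).

```python
from collections import Counter


def kmer_counts(sequence: str, k: int = 7) -> Counter:
    """Count overlapping k-mers in a DNA sequence (illustrative feature extraction)."""
    sequence = sequence.upper()
    # Slide a window of width k across the sequence; each window is one feature.
    # Sequences shorter than k produce no k-mers (an empty Counter).
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


if __name__ == "__main__":
    features = kmer_counts("ACGTACGTAC", k=4)
    print(features.most_common(3))
```

The same k-mer profile is produced for a read regardless of which environment it came from, which is why abundance information has to be supplied some other way.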

Therefore, I want to ask a noob question: should we use q2-clawback (see the q2-clawback post) to retrain a classifier that was first trained with fit-classifier-naive-bayes against the SILVA database? :man_shrugging:


Hi @nmgduan,
Just to clarify some general points before answering your questions:

  1. q2-clawback allows you to extract and generate taxonomic weights for training classifiers with q2-feature-classifier, so this is not an either/or question; it is a question of whether to incorporate q2-clawback into your pipeline.
  2. Taxonomic weights are incorporated during classifier training; you cannot add them to an already-trained classifier (you train a new classifier with the weights instead).
  3. q2-clawback/taxonomic weights are not necessary, but, as you read, they will improve taxonomic accuracy if you have environment-appropriate taxonomic weights.
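The three points above can be illustrated with a toy, stdlib-only Python sketch of a multinomial naive Bayes taxonomy classifier. It is not q2-feature-classifier's implementation; the names `fit_naive_bayes`, `predict`, and `class_weight`, the k-mer length k = 4, and the Laplace smoothing are all illustrative assumptions. What it shows is where taxonomic weights enter: as the class prior, fixed at training time, and uniform when none are given.

```python
import math
from collections import Counter, defaultdict


def kmers(seq, k):
    """Overlapping k-mers of a sequence (the classifier's features)."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def fit_naive_bayes(references, class_weight=None, k=4, alpha=1.0):
    """Train a toy multinomial naive Bayes taxonomy classifier (hypothetical helper).

    references: list of (sequence, taxon) pairs.
    class_weight: optional dict of positive weights per taxon; uniform if None.
    The weights become log priors stored inside the returned model, so using
    different weights means calling this function again (a new training run).
    """
    counts = defaultdict(Counter)
    vocab = set()
    for seq, taxon in references:
        words = kmers(seq, k)
        counts[taxon].update(words)
        vocab.update(words)
    taxa = sorted(counts)
    if class_weight is None:
        class_weight = {t: 1.0 for t in taxa}  # no weights given: uniform prior
    total = sum(class_weight[t] for t in taxa)
    log_prior = {t: math.log(class_weight[t] / total) for t in taxa}
    # Laplace-smoothed per-taxon k-mer probabilities (the likelihood term).
    log_lik = {}
    for t in taxa:
        denom = sum(counts[t].values()) + alpha * len(vocab)
        log_lik[t] = {w: math.log((counts[t][w] + alpha) / denom) for w in vocab}
    return {"k": k, "log_prior": log_prior, "log_lik": log_lik}


def predict(model, seq):
    """Pick the taxon maximizing log prior + summed k-mer log likelihoods."""
    scores = {}
    for t, prior in model["log_prior"].items():
        lik = model["log_lik"][t]
        # k-mers never seen in the references are ignored.
        scores[t] = prior + sum(lik[w] for w in kmers(seq, model["k"]) if w in lik)
    return max(scores, key=scores.get)
```

`fit_naive_bayes(refs)` and `fit_naive_bayes(refs, class_weight={...})` yield two separate models; the sketch has no step for swapping the prior into an existing model, which mirrors point 2.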

Regarding the accuracy gain reported in the article: that’s right, provided you have appropriate taxonomic weights for your environment. You can either follow that tutorial to create them yourself, or grab pre-generated weights for different sample types here: https://github.com/benkaehler/readytowear

Regarding whether abundance information is essential with the naive Bayes trainer: no, it’s not. If you do not incorporate taxonomic weights, the classifier assumes uniform taxonomic weights.
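To see why "uniform taxonomic weights" is a neutral assumption rather than an absence of one, here is a minimal Python sketch of the final decision step, assuming the trained model has already produced a log likelihood for each candidate taxon. The function `classify`, the taxon names, and all numbers are hypothetical and made up for illustration. With a uniform prior every taxon gets the same additive term, so the ranking is decided by the k-mer likelihoods alone; environment-specific weights can tip close calls, which is plausibly where the species-level gains come from.

```python
import math


def classify(log_likelihoods, weights=None):
    """Return the taxon with the highest posterior score (hypothetical helper).

    log_likelihoods: dict taxon -> log P(read k-mers | taxon) from a trained model.
    weights: dict taxon -> positive taxonomic weight; None means uniform.
    """
    taxa = list(log_likelihoods)
    if weights is None:
        weights = {t: 1.0 for t in taxa}  # uniform prior: no environment information
    total = sum(weights[t] for t in taxa)
    # Posterior (up to a constant) = log prior + log likelihood.
    scores = {t: math.log(weights[t] / total) + log_likelihoods[t] for t in taxa}
    return max(scores, key=scores.get)
```

For example, `classify({"species_X": -50.0, "species_Y": -50.5})` returns `"species_X"`, while adding `weights={"species_X": 0.05, "species_Y": 0.95}` returns `"species_Y"`.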

Regarding retraining: clawback does not train or retrain the classifier; it just provides you with the appropriate taxonomic weights. See the tutorial for more details on how to do this. If your sample type and reference database are included in the readytowear taxonomic weights, using those will save you the time and trouble of running clawback yourself. However, if you have other source data or a very specific sample type, assembling your own taxonomic weights will improve accuracy: the more similar your taxonomic weights are to your target samples, the better your classifications will be.
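As a rough illustration of what "assembling your own taxonomic weights" involves, the sketch below averages each taxon's per-sample relative abundance across a set of environment-matched samples and normalizes the result. This is a simplified stand-in, not q2-clawback's actual algorithm (see its tutorial for that). The function name `taxonomic_weights`, the input format, and the small pseudocount that keeps unobserved reference taxa from getting zero weight are all assumptions.

```python
def taxonomic_weights(samples, reference_taxa, pseudocount=1e-6):
    """Average per-sample relative abundance per taxon, normalized to sum to 1.

    samples: list of dicts mapping taxon -> read count (one dict per sample).
    reference_taxa: every taxon in the reference database; taxa never observed
    receive only the pseudocount, so they keep a small nonzero weight.
    """
    rel_sums = {t: 0.0 for t in reference_taxa}
    used = 0
    for sample in samples:
        depth = sum(sample.values())
        if depth == 0:
            continue  # skip empty samples so they do not dilute the average
        used += 1
        for taxon, count in sample.items():
            if taxon in rel_sums:  # ignore taxa absent from the reference
                rel_sums[taxon] += count / depth
    if used == 0:
        raise ValueError("no non-empty samples to derive weights from")
    raw = {t: s / used + pseudocount for t, s in rel_sums.items()}
    total = sum(raw.values())
    return {t: w / total for t, w in raw.items()}
```

For example, two samples with 90/10 and 70/30 reads of taxa A/B give A a weight of roughly 0.8, B roughly 0.2, and an unobserved reference taxon C a tiny positive weight. Samples closer to your own environment shift these weights toward what you will actually sequence.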

I hope that helps!
