Adding reference sequences to feature classifier

Hello qiime2 people!

I am having some trouble classifying my features using qiime feature-classifier classify-sklearn.

Since I work with endospore-forming Bacteria, annotating the feature table using the GG or SILVA datasets leads to a lot of sequences not being recognised (endospores do not figure in most environmental studies because of their resistance to DNA extraction).

To solve this problem I downloaded the Greengenes classifier (with taxonomy and reference sequences separated) from the qiime Data resources page. I then proceeded to add some endospore reference sequences and taxonomy I had. Finally, I tried to build and train my expanded classifier as shown in the “training feature classifier” tutorial.

This invariably fails when I use the command:

qiime feature-classifier fit-classifier-naive-bayes \
  --i-reference-reads exp-ref-seqs.qza \
  --i-reference-taxonomy exp-ref-taxonomy.qza \
  --o-classifier exp-classifier.qza

ValueError: rep_set/99_otus_TSP.fasta is not a QIIME archive

I was wondering whether there is a way to add sequences to a pre-existing classifier without causing qiime2 to freak out. Also, is it a good idea to do so? Could this lead to a bad classification?

Thank you in advance, and apologies if this question was already asked. I looked around in this forum and elsewhere but - to my surprise - found nothing.


No, you cannot add sequences to a pre-trained classifier.

You have the correct workflow: add the sequences and taxonomy to your fasta and taxonomy files, then import to QIIME 2 and train the classifier as described in the tutorial.
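
Before importing, it can be worth checking that the expanded FASTA and taxonomy files still agree with each other. Here is a minimal sketch of such a check. It assumes a headerless, tab-separated taxonomy file with the ID in the first column, and the function names (read_fasta_ids, read_taxonomy_ids, check_reference) are made up for this example; they are not part of QIIME 2.

```python
from collections import Counter


def read_fasta_ids(fasta_path):
    """Return the sequence IDs (first token after '>') in a FASTA file."""
    ids = []
    with open(fasta_path) as handle:
        for line in handle:
            if line.startswith(">"):
                tokens = line[1:].split()
                ids.append(tokens[0] if tokens else "")
    return ids


def read_taxonomy_ids(taxonomy_path):
    """Return the first-column IDs of a tab-separated taxonomy file.

    Assumption: the file has no header row.
    """
    ids = []
    with open(taxonomy_path) as handle:
        for line in handle:
            line = line.rstrip("\n")
            if line:
                ids.append(line.split("\t")[0])
    return ids


def check_reference(fasta_path, taxonomy_path):
    """Report duplicate IDs and IDs that appear in only one of the two files."""
    seq_counts = Counter(read_fasta_ids(fasta_path))
    tax_counts = Counter(read_taxonomy_ids(taxonomy_path))
    return {
        "duplicate_seq_ids": sorted(i for i, n in seq_counts.items() if n > 1),
        "duplicate_tax_ids": sorted(i for i, n in tax_counts.items() if n > 1),
        "missing_taxonomy": sorted(set(seq_counts) - set(tax_counts)),
        "missing_sequence": sorted(set(tax_counts) - set(seq_counts)),
    }
```

Call check_reference with the paths to your expanded FASTA and taxonomy files. If all four lists come back empty, the IDs in the two files line up. This only checks the IDs, not the sequences or the taxonomy strings themselves.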

This error does not really fit with the command that you showed. Could you please provide a little more context, and maybe also a minimal example (small test QZA files) that reproduces this error?
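
In the meantime, one rough check you can run yourself: QIIME 2 artifacts (.qza) are zip archives, whereas a plain FASTA file is not. The sketch below only tests for that. The helper name looks_like_qiime_archive is made up for this example, and a True result does not prove the file is a valid artifact, only that it is a zip file.

```python
import zipfile


def looks_like_qiime_archive(path):
    """Rough check only: True if the file at `path` is a zip archive.

    A .qza artifact is a zip file, so a plain FASTA or TSV passed by
    mistake returns False. Missing files also return False.
    """
    return zipfile.is_zipfile(path)
```

Running this on each file passed to an --i-... option may show which input the error is actually about.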

And yes, expanding the reference is absolutely a fine idea: as long as the new sequences are from the same marker gene, it should work fine.

