Adding reference sequences to feature classifier

Hello qiime2 people!

I am having some trouble classifying my features using qiime feature-classifier classify-sklearn.

Since I work with endospore-forming bacteria, annotating the feature table using the Greengenes (GG) or SILVA datasets leaves a lot of sequences unrecognised (endospores do not figure in most environmental studies because of their resistance to DNA extraction).

To solve this problem I downloaded the Greengenes classifier data (with taxonomy and reference sequences as separate files) from the QIIME 2 Data resources page. I then proceeded to add some endospore reference sequences and taxonomy I had. Finally, I tried to build and train my expanded classifier as shown in the "training feature classifier" tutorial.

This invariably fails when I use the command:

qiime feature-classifier fit-classifier-naive-bayes \
  --i-reference-reads exp-ref-seqs.qza \
  --i-reference-taxonomy exp-ref-taxonomy.qza \
  --o-classifier exp-classifier.qza

ValueError: rep_set/99_otus_TSP.fasta is not a QIIME archive

I was wondering whether there is a way to add sequences to a pre-existing classifier without causing qiime2 to freak out. Also, is it a good idea to do so? Could this lead to a bad classification?

Thank you in advance, and apologies if this question was already asked. I looked around in this forum and elsewhere but, to my surprise, found nothing.

Giacomo

No, you cannot add sequences to a pre-trained classifier.

You have the correct workflow: add the sequences and taxonomy to your fasta and taxonomy files, then import to QIIME 2 and train the classifier as described in the tutorial.
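The workflow above can be sketched roughly as follows. This is a minimal sketch, not the tutorial's exact commands: the helper names (`merge_files`, `unmatched_ids`, `import_expanded_reference`) and the input file names are hypothetical, the taxonomy file is assumed to be a headerless two-column TSV, and the `--input-format` flag may be spelled differently in older QIIME 2 releases.

```shell
# Concatenate files line by line. `awk 1` also adds a final newline where one
# is missing, so the last record of one file cannot fuse with the next file.
merge_files() {
    awk 1 "$@"
}

# Print IDs found in only one of the two files (FASTA vs. taxonomy TSV).
# No output means every sequence has a taxonomy line and vice versa.
unmatched_ids() {
    seq_ids=$(mktemp)
    tax_ids=$(mktemp)
    # FASTA IDs: header lines without '>' and without any description text.
    grep '^>' "$1" | sed 's/^>//' | awk '{print $1}' | LC_ALL=C sort -u > "$seq_ids"
    # Taxonomy IDs: first tab-separated column.
    cut -f1 "$2" | LC_ALL=C sort -u > "$tax_ids"
    LC_ALL=C comm -3 "$seq_ids" "$tax_ids" | tr -d '\t'
    rm -f "$seq_ids" "$tax_ids"
}

# Merge, check, import. All four input file names are placeholders.
import_expanded_reference() {
    merge_files gg_seqs.fasta spore_seqs.fasta > exp-ref-seqs.fasta
    merge_files gg_taxonomy.txt spore_taxonomy.txt > exp-ref-taxonomy.txt
    if [ -n "$(unmatched_ids exp-ref-seqs.fasta exp-ref-taxonomy.txt)" ]; then
        echo "sequence and taxonomy IDs do not match" >&2
        return 1
    fi
    qiime tools import --type 'FeatureData[Sequence]' \
        --input-path exp-ref-seqs.fasta --output-path exp-ref-seqs.qza
    qiime tools import --type 'FeatureData[Taxonomy]' \
        --input-format HeaderlessTSVTaxonomyFormat \
        --input-path exp-ref-taxonomy.txt --output-path exp-ref-taxonomy.qza
}
```

Calling `import_expanded_reference` in the directory holding the four input files should leave the two .qza artifacts that the fit-classifier-naive-bayes command expects.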

This error does not really fit with the command that you showed. Could you please provide a little more context, maybe also a minimal example (small test QZA files) that reproduce this error?
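One way to gather that context is to inspect the files passed to the command. The sketch below relies on two things: QIIME 2 artifacts (.qza) are zip archives, and `qiime tools peek` prints an artifact's UUID, type and format. The helper names `looks_like_zip` and `inspect_artifacts` are hypothetical, and the byte check is only a rough pre-filter, not a validation of the artifact.

```shell
# Rough pre-check: zip files start with the two bytes "PK". A plain FASTA or
# TSV file fails this check. Passing it does not prove the file is a valid
# QIIME 2 artifact.
looks_like_zip() {
    [ "$(head -c 2 "$1")" = "PK" ]
}

# Report on each file; run `qiime tools peek` only where the pre-check passes.
inspect_artifacts() {
    for f in "$@"; do
        if looks_like_zip "$f"; then
            qiime tools peek "$f"
        else
            echo "$f: not a zip archive, so not a .qza artifact"
        fi
    done
}
```

For the command in the question this would be run as `inspect_artifacts exp-ref-seqs.qza exp-ref-taxonomy.qza`. A file that fails the pre-check was probably never imported with `qiime tools import`.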

Expanding the reference set is absolutely a fine idea: as long as the new sequences are from the same marker gene, it should work fine.
