Input and output files of qiime feature-classifier classify-sklearn have different feature IDs

Hi, I have a problem when classifying sequences. The taxonomy output file has different feature IDs than the input rep-seqs file. Only a few feature IDs are the same. Otherwise the taxonomy file looks fine.
I use the latest version of Qiime2 (2020.2) and run this command:
qiime feature-classifier classify-sklearn --i-classifier silva-132-99-nb-classifier.qza --i-reads B-rep-seqs.qza --o-classification B-taxonomy.qza
I downloaded the pre-trained Silva classifier from the data resources page on the Qiime2 website (Silva 132 99% OTUs full-length sequences).
I cannot find a similar previous issue in the forum, so I am hoping it is a simple mistake on my part. Hopefully someone can help.

Hi @AsaJac, could you please post both of these files here:

You can send via DM if these data need to be kept private

This will help us diagnose. Thanks!

Hi, the B samples worked, while the A samples did not. I have attached the files.

A-rep-seqs.qza (246.7 KB) A-taxonomy.qza (164.8 KB)


Hi @AsaJac,
It looks like the issue is a mix-up in filenames (or processing parameters).

Take a look at the provenance in those two files (you can view this with… both artifacts come from the same original data, but diverge at the filter-samples step.

Specifically, the rep-seqs.qza were filtered where "[Primers]='A'" but the taxonomy was filtered where "[Primers]='B'", so the samples (and as a consequence the features) will not overlap 100% in the resulting table.
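
If you want to check this kind of thing without opening the provenance viewer, here is a rough sketch in Python. It assumes that a .qza file is a zip archive and that each recorded step is stored as an action.yaml file under a provenance/ directory, with filter parameters written as a "where:" line. The function name find_where_clauses is made up for this example and is not part of QIIME 2, and the returned values are the raw YAML text (quotes included), not parsed strings.

```python
import re
import zipfile


def find_where_clauses(qza_path):
    """Return the sorted, unique raw 'where:' values found in an artifact's provenance.

    Assumption: provenance steps are stored as action.yaml files somewhere
    under a 'provenance/' directory inside the zip archive.
    """
    # Matches lines such as "    -   where: '[Primers]=''A'''"
    pattern = re.compile(r"^\s*-?\s*where:\s*(.+?)\s*$")
    found = set()
    with zipfile.ZipFile(qza_path) as archive:
        for name in archive.namelist():
            if "/provenance/" not in name or not name.endswith("action.yaml"):
                continue
            text = archive.read(name).decode("utf-8", errors="replace")
            for line in text.splitlines():
                match = pattern.match(line)
                if match:
                    found.add(match.group(1))
    return sorted(found)


# Example usage (file names taken from this thread):
# print(find_where_clauses("A-rep-seqs.qza"))
# print(find_where_clauses("A-taxonomy.qza"))
```

If the two artifacts report different "where" values, they were derived from differently filtered tables.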

In other words, you filtered your rep seqs two different times and are trying to compare the taxonomy to the wrong rep seqs artifact.
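
To confirm whether a given rep-seqs artifact and taxonomy artifact actually belong together, you could compare their feature IDs directly. The sketch below is only an illustration: it assumes the sequences are stored as data/dna-sequences.fasta and the taxonomy as data/taxonomy.tsv (tab-separated, one header row) inside each .qza zip archive, and all function names here are hypothetical helpers, not QIIME 2 API.

```python
import zipfile


def _read_member(qza_path, suffix):
    """Return the text of the first archive member whose path ends with suffix."""
    with zipfile.ZipFile(qza_path) as archive:
        for name in archive.namelist():
            if name.endswith(suffix):
                return archive.read(name).decode("utf-8")
    raise FileNotFoundError(f"{suffix} not found in {qza_path}")


def rep_seq_ids(qza_path):
    """Collect feature IDs from FASTA header lines (assumed file layout)."""
    text = _read_member(qza_path, "/data/dna-sequences.fasta")
    return {
        line[1:].split()[0]
        for line in text.splitlines()
        if line.startswith(">") and line[1:].strip()
    }


def taxonomy_ids(qza_path):
    """Collect feature IDs from the first column, skipping the header row."""
    text = _read_member(qza_path, "/data/taxonomy.tsv")
    rows = [line.split("\t") for line in text.splitlines() if line.strip()]
    return {row[0] for row in rows[1:]}


def compare_feature_ids(rep_seqs_qza, taxonomy_qza):
    """Report which feature IDs are shared and which appear in only one artifact."""
    seqs = rep_seq_ids(rep_seqs_qza)
    taxa = taxonomy_ids(taxonomy_qza)
    return {
        "shared": seqs & taxa,
        "only_in_rep_seqs": seqs - taxa,
        "only_in_taxonomy": taxa - seqs,
    }


# Example usage (file names taken from this thread):
# result = compare_feature_ids("A-rep-seqs.qza", "A-taxonomy.qza")
# print({key: len(ids) for key, ids in result.items()})
```

For a matching pair, both "only_in" sets should be empty; a small "shared" set with many unmatched IDs suggests the taxonomy was computed from a different rep-seqs file.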

I hope that helps!


That makes sense. I will look back through the filtering steps and see if I can fix it. I am new to Qiime2, so thank you very much for your assistance; it is highly appreciated.

