Filtering 18S taxonomy table trained by Silva


I am trying to filter taxa from my 18S table by keyword. I’m classifying taxonomy using:

qiime feature-classifier classify-sklearn \
  --i-classifier silva-132-99-nb-classifier.qza \
  --i-reads rep-seqs.qza \
  --o-classification taxonomy.qza \
  --p-n-jobs 5

Then, I’m filtering the table using:

qiime taxa filter-table \
  --i-table table.qza \
  --i-taxonomy taxonomy.qza \
  --p-exclude Archaeplastida,Arthropoda,Chordata,Mollusca,Unassigned,Bacteria \
  --o-filtered-table filtered-table-2.qza

However, it is failing to remove all of the taxa that match the keywords. I’ve done some digging, and I think the problem is that not all of the levels are named in the taxonomy table: it skips levels. For example, one taxon that should be filtered but isn’t is written in the table as
D_0__Eukaryota;D_1__Opisthokonta;D_2__Holozoa;D_3__Metazoa (Animalia);D_10__Neoptera;D_11__Coleoptera.

It should be filtered out by the Arthropoda keyword, but it skips that level of the taxonomy in the label. I think this is probably because I’m using a Silva classifier (since it’s 18S), which has many more levels than greengenes. Is there something I should be including in my feature classifier step that gets it to maintain the full taxonomy? Or is there any other fix you can suggest besides individually typing each species that I want removed?

Thanks so much!

Good afternoon,

You have found the problem all right:

The filter just looks at the levels that were used to train the silva-132-99-nb-classifier.qza, so you can only filter using those taxonomy names. :frowning_face:

That's a great idea! By downloading the full silva database and training a new classifier, you could get all the levels in Silva.

The easy solution would be to use a level included in the pre-built classifier, but you can only get exactly Arthropoda by using your own classifier.
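To see which names are usable as keywords for a given feature, here is a minimal Python sketch using the taxonomy string from the question. It assumes the filter does a case-insensitive substring match on the whole annotation (my understanding of the default "contains" mode, so treat that as an assumption); `level_names` and `matches` are hypothetical helpers, not part of QIIME 2.

```python
# Taxonomy string copied from the question above.
taxon = (
    "D_0__Eukaryota;D_1__Opisthokonta;D_2__Holozoa;"
    "D_3__Metazoa (Animalia);D_10__Neoptera;D_11__Coleoptera"
)


def level_names(taxonomy):
    """Parse 'D_<n>__<name>' labels into a {rank_number: name} dict."""
    names = {}
    for label in taxonomy.split(";"):
        prefix, _, name = label.strip().partition("__")
        rank = int(prefix.split("_")[1])  # 'D_10' -> 10
        names[rank] = name
    return names


def matches(taxonomy, keyword):
    """Assumed filter behaviour: case-insensitive substring match."""
    return keyword.lower() in taxonomy.lower()


names = level_names(taxon)
print(sorted(names))                  # ranks present: [0, 1, 2, 3, 10, 11]
print(matches(taxon, "Arthropoda"))   # False: that level is not in the label
print(matches(taxon, "Coleoptera"))   # True: this level is in the label
```

Ranks 4 through 9 are absent from the label, which is why Arthropoda never matches, while a name that does appear (such as Coleoptera or Neoptera) would.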


Great, I’ll try that! Just to make sure I do this right, though, is there something specific that I would need to do differently (compared to how the pre-trained classifiers were made) to make sure that it maintains the full taxonomy? Because, to me, it seems like the problem might be in the classification step as opposed to the classifier itself - it is assigning taxonomy at high resolution, it’s just not saving the whole taxa name in the taxonomy table it makes.


The problem comes from the database, not from the classifier. In the SILVA 7-level taxonomy file, not all sequences have the same taxonomy levels listed, so those levels are missing from the raw data. The classifier can't report levels that are not in the raw data...

However, using the full database will result in an error with the classify-sklearn method, since it has an uneven number of taxonomic levels. So you will either need to use another classification method, such as classify-consensus-vsearch, or figure out how to fix the 7-level taxonomy!
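A quick way to check whether a taxonomy has an uneven number of levels is to count the semicolon-separated labels in each string. This is only a sketch: `depth_counts` is a hypothetical helper, and the second example string is a made-up shortened lineage for illustration, not a real SILVA entry. With a real taxonomy file you would feed in its taxonomy column instead.

```python
from collections import Counter


def depth_counts(taxonomy_strings):
    """Count how many taxonomy strings have each number of levels."""
    return Counter(
        len([part for part in tax.split(";") if part.strip()])
        for tax in taxonomy_strings
    )


example = [
    # From the question above: six labels.
    "D_0__Eukaryota;D_1__Opisthokonta;D_2__Holozoa;"
    "D_3__Metazoa (Animalia);D_10__Neoptera;D_11__Coleoptera",
    # Hypothetical shorter lineage, for illustration only: three labels.
    "D_0__Eukaryota;D_1__Opisthokonta;D_2__Holozoa",
]

counts = depth_counts(example)
print(dict(counts))  # {6: 1, 3: 1}

# More than one distinct depth means the taxonomy is uneven.
print(len(counts) > 1)  # True
```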


Thanks so much for your help! I was able to train my own classifier with the full SILVA taxonomy instead of the SILVA-7 taxonomy, and that solved my problem. I hadn’t realized that the pre-trained classifier used the 7-level SILVA taxonomy, so that was really my problem.

Thanks again! :blush: :tada:

