Faulty taxonomic assignment?


Looking at the .csv file exported (genus level) from the taxonomy barplot, I got confused: a lot of the taxa are not classified down to genus level, which I understand, but then the same taxa classified only to a certain level, let’s say class level, appear twice - e.g.

k__Bacteria;p__Proteobacteria;c__Alphaproteobacteria; __ ; __ ; __ and k__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__;f__;g__

And this appears in several instances. Why is that? I am trying to produce a filtered bar plot in Excel with taxa having relative abundance >1% and I don’t know if I should add the values for such columns together or not. I didn’t realise this until I was creating a legend for the figure and ended up with two “Alphaproteobacteria (c)” entries, which confused me.



Hello Alex,

This also used to happen in QIIME 1. At the time, it meant:

The o__ means that there is a best match in Greengenes, but that best match doesn’t have an assignment at the order level.

In contrast, the __; means that there were multiple matches at this level and taxonomy is uncertain.

So I’m curious too: Is this still the case in QIIME 2? Is there official documentation that spells out this meaning for new users?


Hey @Alex_14262 and @colinbrislawn,

In QIIME 2, if you see a __; that just means a visualization is padding out the taxonomy string. It should be treated as not existing (in contrast to the Greengenes prefixed version: o__;).
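To make that distinction concrete, here is a minimal Python sketch. It is not part of QIIME 2; the function name `trim_taxonomy` and its `drop_empty_ranks` flag are made up for illustration. It assumes labels are `;`-separated as in the exported .csv. By default it only strips the trailing `__` padding, and it removes Greengenes-style empty ranks such as `o__` only if you ask it to.

```python
def trim_taxonomy(label, drop_empty_ranks=False):
    """Return a taxonomy label without its trailing padding levels.

    Hypothetical helper, not a QIIME 2 function.
    "__" (padding added by the visualization) is always dropped from the end.
    Empty prefixed ranks such as "o__" are dropped only if drop_empty_ranks
    is True, because they come from the reference taxonomy itself.
    """
    levels = [level.strip() for level in label.split(";")]

    def is_blank(level):
        if level in ("", "__"):
            return True
        # A single-letter rank prefix with nothing after it, e.g. "o__".
        is_empty_rank = len(level) == 3 and level.endswith("__")
        return drop_empty_ranks and is_empty_rank

    # Only trailing levels are removed; anything classified is kept.
    while levels and is_blank(levels[-1]):
        levels.pop()
    return ";".join(levels)


padded = "k__Bacteria;p__Proteobacteria;c__Alphaproteobacteria; __ ; __ ; __"
prefixed = "k__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__;f__;g__"

print(trim_taxonomy(padded))
print(trim_taxonomy(prefixed))
print(trim_taxonomy(prefixed, drop_empty_ranks=True))
```

With `drop_empty_ranks=True` both example labels end at `c__Alphaproteobacteria`, which is what you would want if you decide the two cases do not need to be told apart.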


There is no official documentation for this :frowning:. I also wonder if we should maybe remove the padding altogether, since it seems to generate confusion often enough. @thermokarst, any thoughts on that?


Thanks @ebolyen! So for the sake of plotting then, would it be alright to group the two counts together? Essentially, that SV was only classified down to a certain level (e.g. Alphaproteobacteria). Whether further classification was not possible because there was no match or because there was no assignment in Greengenes is not that relevant when presenting results, is it? Unless I am missing something.

Another issue related to this topic… For extracting the reference reads and training the classifier I used the 85% OTU dataset as done in the tutorial. However, I have just realised that you used it there only to reduce computation time, whereas I took it for granted… Would that impact the taxonomic assignment further down the line? Should I have used 97% OTUs or 99% OTUs, and where could I find these files to download? I have redone in QIIME 2 an analysis that my supervisor had done in QIIME 1, and I obtained far fewer taxa assigned to genus level. I thought this might be the reason why.

Hello Alex,

That’s exactly what I do to make the plots more readable.
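If you would rather do the merging in a script than by hand in Excel, here is a rough Python sketch of the idea. Everything in it is an assumption for illustration: the function names are made up, the table is a plain dict of label to per-sample counts rather than the real exported .csv layout, and "above 1%" is taken to mean above 1% in at least one sample.

```python
import re

# Trailing run of unclassified levels: padding ("__") or empty ranks ("o__").
_UNCLASSIFIED_TAIL = re.compile(r"(?:;\s*[a-z]?__\s*)+$")


def merge_unclassified(table):
    """Sum rows whose labels differ only in their unclassified tail.

    table maps a taxonomy label to a list of counts, one per sample.
    """
    merged = {}
    for label, counts in table.items():
        key = _UNCLASSIFIED_TAIL.sub("", label.strip())
        if key in merged:
            merged[key] = [a + b for a, b in zip(merged[key], counts)]
        else:
            merged[key] = list(counts)
    return merged


def keep_abundant(table, threshold=0.01):
    """Convert counts to relative abundance per sample and keep taxa that
    exceed the threshold in at least one sample (an assumed criterion)."""
    n_samples = len(next(iter(table.values())))
    totals = [sum(counts[i] for counts in table.values())
              for i in range(n_samples)]
    kept = {}
    for label, counts in table.items():
        rel = [c / t if t else 0.0 for c, t in zip(counts, totals)]
        if max(rel) > threshold:
            kept[label] = rel
    return kept


# Made-up counts for two samples, purely to show the mechanics.
example = {
    "k__Bacteria;p__Proteobacteria;c__Alphaproteobacteria; __ ; __ ; __": [30, 10],
    "k__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__;f__;g__": [20, 5],
}
print(merge_unclassified(example))
```

Relative abundance is computed after merging, so the two Alphaproteobacteria columns are judged against the 1% cutoff as a single taxon rather than separately.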

Yes, it would! The 99% OTUs should have better taxonomic resolution and get rid of some of the __; levels.

All the downloads are on the Data Resources page, including SILVA and Greengenes databases at 99% with just the V4 region cut out. I think that’s what you want :slight_smile:

Keep in touch!


Amazing, thanks @colinbrislawn! :blush:

