Looking at the .csv file exported (genus level) from the taxonomy barplot, I got confused: a lot of the taxa are not classified down to genus level, which I understand, but then the same taxon classified only to a certain level, let’s say class level, appears twice - e.g.
k__Bacteria;p__Proteobacteria;c__Alphaproteobacteria; __ ; __ ; __ and k__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__;f__;g__
In QIIME 2, if you see a __; that just means a visualization is padding out the taxonomy string. It should be treated as not existing (in contrast to the Greengenes prefixed version: o__;).
Exactly!
There is not. I also wonder if we should maybe remove the padding altogether; it seems to generate confusion often enough. @thermokarst, any thoughts on that?
Thanks @ebolyen! So for the sake of plotting then, would it be alright to group the two counts together? Essentially, that SV was only classified down to a certain level (e.g. Alphaproteobacteria). Whether further classification was not possible because there was no match or because there was no assignment in Greengenes is not that relevant when presenting results? Unless I am missing something.
Another issue related to this topic… For extracting the reference reads and training the classifier I used the 85% OTU dataset, as done in the tutorial. However, I have just realised that the reason you used it was to reduce computation time, while I took it for granted… Would that impact the taxonomic assignment further down the line? Should I have used 97% OTUs or 99% OTUs, and where could I find these files to download? I have redone an analysis in QIIME 2 that had been done in QIIME 1 by my supervisor, and I obtained far fewer taxa assigned to genus level - I thought this might be the reason why.
That's exactly what I do to make the plots more readable.
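For anyone wanting to do that grouping in a script rather than by hand, here is a minimal sketch in plain Python. It assumes the exported labels are semicolon-delimited, and that both the bare `__` padding and an empty Greengenes prefix such as `o__` mean "no assignment at this rank". The function names and the counts are made up for illustration; they are not part of QIIME 2.

```python
from collections import defaultdict

# Bare "__" padding and empty Greengenes prefixes are both treated as
# "no assignment at this rank" (an assumption of this sketch).
EMPTY_RANKS = {"", "__", "k__", "p__", "c__", "o__", "f__", "g__", "s__"}


def trim_taxonomy(label):
    """Drop trailing unassigned ranks from a semicolon-delimited label."""
    ranks = [rank.strip() for rank in label.split(";")]
    while ranks and ranks[-1] in EMPTY_RANKS:
        ranks.pop()
    return ";".join(ranks)


def group_counts(counts):
    """Sum counts of labels that become identical after trimming."""
    totals = defaultdict(int)
    for label, count in counts.items():
        totals[trim_taxonomy(label)] += count
    return dict(totals)


# Hypothetical counts, not taken from any real dataset.
counts = {
    "k__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;__;__;__": 12,
    "k__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__;f__;g__": 30,
    "k__Bacteria;p__Firmicutes;c__Bacilli;o__Bacillales;f__Bacillaceae;g__Bacillus": 7,
}
grouped = group_counts(counts)
for label, count in grouped.items():
    print(label, count)
```

Only trailing empty ranks are removed, so a label with an empty rank in the middle but a named rank after it is left untouched.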
Yes it would! The 99% OTUs should have better taxonomic resolution and get rid of some of the __; levels.
All the downloads are on the Data Resources page, including SILVA and Greengenes databases at 99% with just the V4 region cut out. I think that's what you want.