meaning of ambiguous taxonomic labels in Greengenes

Hello,
k__Bacteria;p__;c__;o__;f__;g__
k__Bacteria;;;;;__
k__Bacteria;p__Actinobacteria;c__MB-A2-108;o__;f__;g__

I have many of these in my taxa file. I am wondering whether I need to filter out the unassigned ones, and also the ones that are distinguished from others but lack annotations at some taxonomic levels. How important is it to filter these out, and is there a way to do this in QIIME 2? Some of them are also assigned unusual names like DS-18 and iii1-8; do those numbers denote anything (k__Bacteria;p__Acidobacteria;c__iii1-8;o__DS-18)? For further analysis such as LEfSe for differential abundance, is it common to combine all the taxonomic levels, or should I just go with one of the levels?

qiime feature-classifier classify-sklearn --i-classifier gg-13-8-99-515-806-nb-classifier.qza --i-reads rep_seqs.qza --o-classification taxonomy.qza
I used the Greengenes (gg) classifier.

Thanks in advance

Hi @Ishanmanandhar,
Please see this topic for more details about what those different labels mean:

I would filter this one but not the others (there is some discussion elsewhere on this forum regarding motivations for this):

Names like DS-18 and iii1-8 probably refer to uncultivated/unknown clades of bacteria. That nomenclature is specific to the reference database and does not come from QIIME 2, so you should contact the authors of the database for more details on their nomenclature.
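
To see how deeply each of your labels is actually annotated before deciding what to filter, a small script along these lines may help. This is only a sketch: the function names `annotated_ranks` and `deepest_rank` are made up for illustration, and it assumes Greengenes-style labels with `k__`, `p__`, ... prefixes separated by semicolons. It does not decide for you which labels to drop; it only reports the deepest rank that carries a name.

```python
# Hypothetical helper: report the deepest named rank in a Greengenes-style label.
# Assumes rank prefixes k, p, c, o, f, g, s followed by "__" and an optional name.
RANKS = ["k", "p", "c", "o", "f", "g", "s"]


def annotated_ranks(label):
    """Map each rank prefix to its name, skipping empty fields such as 'p__' or '__'."""
    found = {}
    for field in label.split(";"):
        prefix, sep, name = field.strip().partition("__")
        if sep and prefix in RANKS and name:
            found[prefix] = name
    return found


def deepest_rank(label):
    """Return the prefix of the deepest rank that has a name, or None if there is none."""
    found = annotated_ranks(label)
    deepest = None
    for rank in RANKS:
        if rank in found:
            deepest = rank
    return deepest


if __name__ == "__main__":
    labels = [
        "k__Bacteria;p__;c__;o__;f__;g__",
        "k__Bacteria;;;;;__",
        "k__Bacteria;p__Actinobacteria;c__MB-A2-108;o__;f__;g__",
        "k__Bacteria;p__Acidobacteria;c__iii1-8;o__DS-18",
    ]
    for label in labels:
        print(deepest_rank(label), label)
```

Labels where the deepest rank is only `k` carry no information below kingdom level. Within QIIME 2 itself, `qiime taxa filter-table` with its `--p-include` and `--p-exclude` options is the usual way to apply this kind of taxonomy-based filter to a feature table; check the plugin help for the exact matching behaviour with your database.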

Good luck!
