taxa collapse and Greengenes2 taxonomy confusion

Hello everyone,

I ran taxonomy classification with the following command:

qiime feature-classifier classify-sklearn --i-classifier 2022.10.backbone.full-length.nb.qza --i-reads rep-seqs.qza --o-classification taxonomy.qza

I then ran taxa collapse with taxonomy.qza to get the collapsed (taxon-level) tables:

qiime taxa collapse --i-table table.qza --i-taxonomy taxonomy.qza --p-level 6 --o-collapsed-table genus_table.qza

I am confused about the entries in the genus table for which no genus has been assigned. Why are these in the genus table? And why do they end differently?

These are the different endings I am getting in genus_table for entries that have no genus name. They end in one of the following ways:

1. Family;__ 
2. Family;g__
3. Order;__;__
4. Order;f__;g__
5. Class;__;__;__
6. Class;o__;f__;g__
7. d__Bacteria;__;__;__;__;__

Can someone please help me interpret this nomenclature? How do 1 and 2, 3 and 4, and 5 and 6 differ, when each pair ends at the family, order, and class name respectively?

Below are some examples of these from the genus table.

d__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Enterobacterales_A_737866;f__Enterobacteriaceae_A;__
d__Bacteria;p__Firmicutes_D;c__Bacilli;o__RFN20;__;__
d__Bacteria;p__Firmicutes_D;c__Bacilli;o__RF39;f__UBA660;g__
d__Bacteria;p__Firmicutes_D;c__Bacilli;o__Lactobacillales;f__Lactobacillaceae;__
d__Bacteria;p__Firmicutes_D;c__Bacilli;o__Lactobacillales;f__Enterococcaceae;__
d__Bacteria;p__Firmicutes_D;c__Bacilli;o__Lactobacillales;__;__
d__Bacteria;p__Firmicutes_D;c__Bacilli;o__Erysipelotrichales;f__Coprobacillaceae;g__
d__Bacteria;p__Firmicutes_C;c__Negativicutes;o__Veillonellales;f__Megasphaeraceae;__
d__Bacteria;p__Firmicutes_C;c__Negativicutes;__;__;__
d__Bacteria;p__Firmicutes_A;c__Clostridia_258483;o__TANB77;f__CAG-508;g__
d__Bacteria;p__Firmicutes_A;c__Clostridia_258483;o__Peptostreptococcales;f__Peptostreptococcaceae_256921;__
d__Bacteria;p__Firmicutes_A;c__Clostridia_258483;o__Peptostreptococcales;f__Anaerovoracaceae;g__
d__Bacteria;p__Firmicutes_A;c__Clostridia_258483;o__Peptostreptococcales;f__Anaerovoracaceae;__
d__Bacteria;p__Firmicutes_A;c__Clostridia_258483;o__Oscillospirales;f__Oscillospiraceae_88309;__
d__Bacteria;p__Firmicutes_A;c__Clostridia_258483;o__Oscillospirales;f__Butyricicoccaceae;g__
d__Bacteria;p__Firmicutes_A;c__Clostridia_258483;o__Oscillospirales;f__Acutalibacteraceae;g__
d__Bacteria;p__Firmicutes_A;c__Clostridia_258483;o__Lachnospirales;f__Lachnospiraceae;g__
d__Bacteria;p__Firmicutes_A;c__Clostridia_258483;o__Lachnospirales;f__Lachnospiraceae;__
d__Bacteria;p__Firmicutes_A;c__Clostridia_258483;o__Lachnospirales;f__Anaerotignaceae;g__
d__Bacteria;p__Firmicutes_A;c__Clostridia_258483;o__Christensenellales;f__CAG-74;g__
d__Bacteria;p__Firmicutes_A;c__Clostridia_258483;o__Christensenellales;f__;g__
d__Bacteria;p__Firmicutes_A;c__Clostridia_258483;o__Christensenellales;__;__
d__Bacteria;p__Firmicutes_A;c__Clostridia_258483;o__;f__;g__
d__Bacteria;p__Cyanobacteria;c__Vampirovibrionia;o__Gastranaerophilales;f__Gastranaerophilaceae;__
d__Bacteria;p__Bacteroidota;c__Bacteroidia;o__Bacteroidales;f__Muribaculaceae;__
d__Bacteria;p__Bacteroidota;c__Bacteroidia;o__Bacteroidales;f__Bacteroidaceae;__
d__Bacteria;p__Bacteroidota;c__Bacteroidia;o__Bacteroidales;__;__
d__Bacteria;p__Actinobacteriota;c__Coriobacteriia;o__Coriobacteriales;f__Eggerthellaceae;__
d__Bacteria;p__Actinobacteriota;c__Coriobacteriia;o__Coriobacteriales;f__Atopobiaceae;g__
d__Bacteria;p__Actinobacteriota;c__Coriobacteriia;o__Coriobacteriales;__;__
d__Bacteria;__;__;__;__;__
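
To sort labels like the ones above by how they end, a small Python sketch along these lines might help. It is only an illustration: `describe_ending` and `RANK_NAMES` are hypothetical names, not part of QIIME 2, and it assumes a rank counts as "named" whenever some text follows the double underscore.

```python
# Hypothetical helper, not part of QIIME 2: describe how a collapsed label ends.
RANK_NAMES = ["domain", "phylum", "class", "order", "family", "genus", "species"]

def describe_ending(label):
    """Return (deepest named rank, style of the trailing unnamed ranks)."""
    ranks = [r.strip() for r in label.split(";")]
    # Assumption: a rank is "named" when text follows the double underscore.
    named = [i for i, r in enumerate(ranks) if r.split("__", 1)[-1]]
    last = named[-1] if named else -1
    trailing = ranks[last + 1:]
    if not trailing:
        style = "fully named"
    elif all(r == "__" for r in trailing):
        style = "bare __"        # e.g. o__RFN20;__;__
    elif all(r != "__" for r in trailing):
        style = "prefix only"    # e.g. f__UBA660;g__
    else:
        style = "mixed"
    deepest = RANK_NAMES[last] if 0 <= last < len(RANK_NAMES) else "none"
    return deepest, style

examples = [
    "d__Bacteria;p__Firmicutes_D;c__Bacilli;o__RFN20;__;__",
    "d__Bacteria;p__Firmicutes_D;c__Bacilli;o__RF39;f__UBA660;g__",
    "d__Bacteria;__;__;__;__;__",
]
for label in examples:
    print(describe_ending(label), label)
```

Applied to every feature ID in the collapsed table, this would group the entries into the seven patterns listed earlier.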

Hi @Alka_Srivastava,

Would it be possible to share the taxonomy.qza file?

Best,
Daniel

Hello @wasade,

Please find the taxonomy.qza attached.
taxonomy.qza (83.4 KB)

Thanks
Alka

Thanks, @Alka_Srivastava. The taxonomy file seems well formed, so I suspect there may be a bug in the collapse method of q2-taxa. I'll send an internal inquiry about it.

Best,
Daniel

Thanks @wasade.

I also want to bring to your notice that the taxonomy.qza file itself contains a few such entries. Four representative examples are shown below:

d__Bacteria; p__Firmicutes_D; c__Bacilli; o__Erysipelotrichales; f__Coprobacillaceae; g__; s__
d__Bacteria; p__Firmicutes_A; c__Clostridia_258483; o__Oscillospirales; f__CAG-272; g__RUG13077; s__
d__Bacteria; p__Firmicutes_A; c__Clostridia_258483; o__Christensenellales; f__; g__; s__
d__Bacteria; p__Firmicutes_A; c__Clostridia_258483; o__; f__; g__; s__

Warm Regards
Alka

Thanks, @Alka_Srivastava! Those indicate unidentified genera, families, etc. However, the lack of classification comes in part from feature classification with Naive Bayes, and is not specific to Greengenes2.
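
One possible way to see how both kinds of ending could arise is sketched below in Python. It rests on an assumption rather than on anything confirmed in this thread: that collapsing splits each lineage on `;`, pads lineages that stop early with bare `__` placeholders, and leaves ranks that are present but unnamed (such as `g__`) untouched. The function `collapse_lineage` and its `max_depth` parameter are hypothetical names and this is not the q2-taxa source code.

```python
# Illustrative sketch only; not the q2-taxa implementation.
def collapse_lineage(lineage, level, max_depth=7):
    """Truncate a ';'-separated lineage to `level` ranks, padding short ones."""
    ranks = [r.strip() for r in lineage.split(";")]
    # Assumption: lineages that stop early are padded with bare "__".
    if len(ranks) < max_depth:
        ranks.extend(["__"] * (max_depth - len(ranks)))
    return ";".join(ranks[:level])

# Lineage from taxonomy.qza in which family and genus are present but unnamed.
full = ("d__Bacteria; p__Firmicutes_A; c__Clostridia_258483; "
        "o__Christensenellales; f__; g__; s__")
# Hypothetical lineage in which the classifier stopped at order.
short = ("d__Bacteria; p__Firmicutes_A; c__Clostridia_258483; "
         "o__Christensenellales")

print(collapse_lineage(full, 6))
print(collapse_lineage(short, 6))
```

Under that assumption, the first call yields a label ending in `f__;g__` and the second a label ending in `__;__`, which matches the two Christensenellales entries in the genus table above.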
