Weird letters and numbers after taxa ID - f__Bacillaceae_H_294103

Dear qiimers,

First of all I'm using qiime2-amplicon-2023.9, with a single-end approach (using only the forward reads).

I'm having some issues with my taxonomy, which I generated using the pre-trained Greengenes2 classifier (Greengenes2 2022.10, full-length sequences).

My analysis went without major problems, but when I ran ANCOM-BC I noticed that some taxon IDs carry odd letters and numbers, for example: f__Bacillaceae_H_294103; f__Bacillaceae_G_310392; f__Burkholderiaceae_A_592522

I want to analyze possible microbe interactions. ANCOM-BC reported that f__Bacillaceae_H and f__Bacillaceae_G show different patterns of differential abundance in relation to Wolbachia (the taxon that I'm investigating).

How different are these taxa? If they are closely related, is it possible to group them into a single taxon? And what do the numbers mean?

Here are ANCOM-BC, taxonomy, rep-seqs, and feature-table files:

barplot-family.qzv (340.7 KB)
taxonomy-gg.qzv (1.8 MB)
rep-seqs-single_2.qzv (701.6 KB)
Jess_collapsed_table_level_5.qza (192.7 KB)

Hi there,

I can't speak to the biological side of your question (I don't know whether those two Bacillaceae are the same, and I'm not familiar with Greengenes), but if you think they should be annotated as the same taxon, you can always use the QIIME 2 plugin RESCRIPt to relabel them, as in this section of the RESCRIPt tutorial.

I haven't tried it, but your command should look something like:

qiime rescript edit-taxonomy \
    --i-taxonomy my-taxonomy.qza \
    --p-search-strings 'f__Bacillaceae_H_294103' 'f__Bacillaceae_G_310392' \
    --p-replacement-strings 'f__Bacillaceae' 'f__Bacillaceae' \
    --o-edited-taxonomy my-taxonomy-fixed.qza

If you have a lot of IDs to change, you can also follow the tutorial instructions to create a replacements metadata file and pass it to the command.
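In case it helps, here is a minimal Python sketch of how such a replacements file could be generated from a list of labels. I haven't run it against RESCRIPt: the `id` / `replacements` column headers and the suffix pattern (optional uppercase letters, then digits) are assumptions based on the examples in this thread, and the function names are made up for illustration, so please check the tutorial for the exact file layout.

```python
import csv
import io
import re

# Assumption: the trailing part looks like "_H_294103" or "_294103"
# (optional uppercase lineage letters, then a numeric identifier).
SUFFIX = re.compile(r"(?:_[A-Z]+)?_\d+$")


def build_replacement_rows(labels):
    """Return (search, replacement) pairs for labels that carry a suffix."""
    rows = []
    for label in labels:
        stripped = SUFFIX.sub("", label)
        if stripped != label:  # skip labels that have nothing to strip
            rows.append((label, stripped))
    return rows


def write_replacement_map(rows, handle):
    """Write a tab-separated map; the header names are an assumption."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(["id", "replacements"])
    writer.writerows(rows)


labels = [
    "f__Bacillaceae_H_294103",
    "f__Bacillaceae_G_310392",
    "f__Burkholderiaceae_A_592522",
    "f__Bacillaceae",
]
rows = build_replacement_rows(labels)
buffer = io.StringIO()  # swap for open("replacements.tsv", "w") to write a file
write_replacement_map(rows, buffer)
print(buffer.getvalue())
```

Keep in mind that this strips every suffix it finds, so it merges all lineages that share a name.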

Best wishes :dolphin:


Disclaimer: I'm just another user, just like you. Please don't take my answer as a ground truth. A Forum Moderator would probably provide you with a more accurate answer.


Hi @joaomiranda and @salias,

Remember that taxonomy ≠ phylogeny in most places, and just because two bacteria are given the same name doesn't mean that they're closely related.

To rectify this in GTDB, they added a "lineage" designation. So, f__Bacillaceae_H and f__Bacillaceae_G are separate, monophyletic lineages that both have the name "f__Bacillaceae". I'm less sure about the numbers.

I would not personally recommend merging them into the same clade (there is an "f__Bacillaceae" clade somewhere) because you lose information. This is especially true if you're seeing different behavior. FWIW, I tend to report things like, "ASVs from the X lineage of Y..." and then let people sort out the taxonomy in the wash.



Hi @joaomiranda and @salias,

More information on the Greengenes2 taxon labels can be found here. Briefly, as @jwdebelius notes, the labels in Greengenes2 map to the phylogeny and are effectively clade coordinates. Clade-specific identifiers are used to disambiguate names that are not supported as monophyletic in the phylogeny.
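To make the label structure concrete, here is a rough Python sketch that splits a label into rank, name, lineage suffix, and numeric clade identifier, then groups labels that share a name. The pattern is inferred only from the examples in this thread (e.g. f__Bacillaceae_H_294103), not from a formal Greengenes2 specification, and `TaxonLabel`, `parse_label`, and `group_by_name` are hypothetical names, so treat it as illustrative.

```python
import re
from collections import defaultdict
from typing import NamedTuple, Optional


class TaxonLabel(NamedTuple):
    rank: str                 # e.g. "f" for family
    name: str                 # e.g. "Bacillaceae"
    lineage: Optional[str]    # e.g. "H" (lineage suffix), if present
    clade_id: Optional[str]   # e.g. "294103" (clade identifier), if present


# Assumption: rank prefix, name, optional uppercase suffix, optional digits.
LABEL = re.compile(
    r"^(?P<rank>[a-z])__(?P<name>.+?)"
    r"(?:_(?P<lineage>[A-Z]+))?(?:_(?P<clade_id>\d+))?$"
)


def parse_label(label):
    """Split one label into its parts; raise if it does not fit the pattern."""
    match = LABEL.match(label)
    if match is None:
        raise ValueError(f"unrecognised label: {label!r}")
    return TaxonLabel(**match.groupdict())


def group_by_name(labels):
    """Group parsed labels by (rank, name) to show which lineages share a name."""
    groups = defaultdict(list)
    for label in labels:
        parsed = parse_label(label)
        groups[(parsed.rank, parsed.name)].append(parsed)
    return dict(groups)


labels = [
    "f__Bacillaceae_H_294103",
    "f__Bacillaceae_G_310392",
    "f__Burkholderiaceae_A_592522",
]
for key, members in group_by_name(labels).items():
    print(key, [(m.lineage, m.clade_id) for m in members])
```

Labels that end up in the same group share a name but are still distinct clades, so a grouping like this is for inspection rather than a reason to merge them.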



Oh, I see it now. Thank you so much for the clarifications @jwdebelius and @wasade . I was not aware of the way Greengenes differentiates lineages, so I answered with what you can do if those lineages are different (that is, nothing) or if they are the same (use RESCRIPt).

Off-topic: over the last few days I asked a couple of questions in the forum (and got really good answers from moderators), so I felt I should give something back to the community and started answering some new questions on my own. I think from now on I'm going to add a sentence at the end of my answers clarifying that I'm just another user and maybe my answer needs to be "peer-reviewed" by a moderator :point_right:t4: :point_left:t4:



Thanks for your help around the forum! We are all learning here!
Happy :qiime2:ing!


Hi @salias , @jwdebelius and @wasade,

I'm very grateful for your insights! Both RESCRIPt and the explanation about the clades will be useful for my future analyses :smile: