How to get feature ID or sequence from unidentified taxon in taxa bar plot

Hello. There is an unidentified taxon, “Bacteria; p_GN02; c_BD1-5”, that differs in abundance across treatments, and I want to see the sequence associated with it. However, there are a number of sequences with the same general taxon name in my “taxonomy.qzv” file. Is there any way to get a feature ID from a taxa bar plot?

Hey there @Lisa_Crummett!

That box on the barplot represents all of the features that have the same taxon name in your taxonomy artifact. In other words, it isn't necessarily one single feature, but rather all features that share the same annotation at the selected taxonomic level (this will depend on the classification method, the reference database used, the representative sequences supplied, etc.).

So it sounds like you already have what you need ("there are a number of sequences"): those are the sequences you are looking for.
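If it helps to pull those features out programmatically, here is a minimal sketch in plain Python. It assumes the taxonomy and representative sequences have been exported with `qiime tools export`, giving a `taxonomy.tsv` with `Feature ID`, `Taxon` and `Confidence` columns and a `dna-sequences.fasta`; those file names and the column layout are assumptions, and the function names are hypothetical. It keeps every feature whose leading ranks match the given taxon string, which mirrors how the barplot groups features at that level.

```python
import csv
import io


def split_ranks(taxon: str) -> list[str]:
    """Split a semicolon-delimited taxon string into trimmed ranks."""
    return [rank.strip() for rank in taxon.split(";")]


def features_for_taxon(taxonomy_tsv: str, taxon_prefix: str) -> list[str]:
    """Return feature IDs whose leading ranks equal those of taxon_prefix.

    taxonomy_tsv is the text of an exported taxonomy table; the
    'Feature ID' and 'Taxon' column names are assumed.
    """
    wanted = split_ranks(taxon_prefix)
    reader = csv.DictReader(io.StringIO(taxonomy_tsv), delimiter="\t")
    hits = []
    for row in reader:
        # Compare only as many ranks as the query has, so deeper
        # (possibly empty) ranks such as 'o__' do not block a match.
        if split_ranks(row["Taxon"])[: len(wanted)] == wanted:
            hits.append(row["Feature ID"])
    return hits


def sequences_for_features(fasta_text: str, feature_ids: list[str]) -> dict[str, str]:
    """Collect the (possibly multi-line) FASTA sequences for feature_ids."""
    wanted = set(feature_ids)
    seqs: dict[str, list[str]] = {}
    current = None
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # The ID is the first whitespace-delimited token of the header.
            fid = line[1:].split()[0]
            current = fid if fid in wanted else None
            if current is not None:
                seqs[current] = []
        elif current is not None:
            seqs[current].append(line)
    return {fid: "".join(parts) for fid, parts in seqs.items()}
```

For example, `features_for_taxon(open("taxonomy.tsv").read(), "k__Bacteria; p__GN02; c__BD1-5")` should list the IDs in that bar. Greengenes annotations in the exported file use double underscores (`p__GN02`), so the query needs to match the strings exactly as they appear there.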

Keep us posted! :qiime2:

Thank you for your speedy reply, Matt. I have 69 features with that same generic unidentified taxonomic name in my taxonomy.qzv file. It is possible that only 1 or 2 of the 69 are driving the pattern of higher abundance in one treatment vs. another seen in my taxa bar plot, and it sounds like there is no way to figure out which of the 69 have the most influence, since all 69 are assigned the same generic name, right? By the way, I used the default Greengenes classifier provided in the tutorial. Do you know what “GN02” or “BD1-5” might stand for? Are they just arbitrary classification codes? Thanks again :slight_smile:

Hello Lisa,

Looks like those strange taxa codes are newly characterized phyla.

According to the Greengenes manuscript, there is some disagreement about this: some argue that GN02 is part of the phylum BD1-5!

I have no strong opinions about taxonomy names. :man_shrugging:


> which ones out of the 69 might have the most significant influence since all 69 are assigned the same generic name right?

Now this is a good question! While their names are the same and they all get combined in the barplot, you can still run differential abundance testing on them individually to find the features that differ most between treatments. Check out this method in the composition plugin:
https://docs.qiime2.org/2018.6/plugins/available/composition/ancom/
Unlike the barplot, ANCOM won't combine all these mystery features.
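Because a bar in the barplot is just the sum of its member features' relative frequencies, one descriptive way to check whether 1 or 2 of the 69 features dominate is to split the between-treatment difference feature by feature. This is a minimal sketch, not ANCOM and not a significance test; it assumes the feature table has been exported into a plain `{sample: {feature: count}}` dict and that a sample-to-treatment mapping is available. All function names are hypothetical.

```python
from collections import defaultdict


def relative_abundances(table):
    """Convert {sample: {feature: count}} into per-sample proportions."""
    rel = {}
    for sample, counts in table.items():
        total = sum(counts.values())
        rel[sample] = {f: c / total for f, c in counts.items()} if total else {}
    return rel


def rank_contributions(table, sample_groups, feature_ids, group_a, group_b):
    """Rank features by their share of the group_a vs group_b difference.

    The collapsed bar equals the sum of its features' relative
    abundances, so the difference in group means for the bar equals
    the sum of the per-feature differences returned here.
    Returns (feature_id, difference) pairs, largest |difference| first.
    """
    rel = relative_abundances(table)
    members = defaultdict(list)
    for sample, group in sample_groups.items():
        if sample in rel:
            members[group].append(sample)
    for group in (group_a, group_b):
        if not members[group]:
            raise ValueError(f"no samples in group {group!r}")

    def group_mean(feature, group):
        samples = members[group]
        # Features absent from a sample count as zero abundance.
        return sum(rel[s].get(feature, 0.0) for s in samples) / len(samples)

    diffs = [(f, group_mean(f, group_a) - group_mean(f, group_b)) for f in feature_ids]
    diffs.sort(key=lambda pair: abs(pair[1]), reverse=True)
    return diffs
```

Passing the 69 IDs from the taxonomy file as `feature_ids` would show how the bar's difference is distributed among them; if the top one or two entries account for most of the sum, they are the likely drivers. Whether that difference is meaningful is still a question for ANCOM on the uncollapsed table.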

I hope that helps,
Colin


Thank you for your feedback, Colin. I did perform an ANCOM analysis, since I was following the Moving Pictures tutorial, which discusses using ANCOM. I still get the same “Bacteria; p_GN02; c_BD1-5; o_” name in the ANCOM percentile abundance output, but no feature ID is provided, so there is no way for me to find out which “Bacteria; p_GN02; c_BD1-5; o_” feature or sequence it is associated with in the rep-seqs.qzv file.

I will check out the Greengenes manuscript you linked to read more about “Bacteria; p_GN02; c_BD1-5; o_”.

Cheers,
Lisa


Hey there @Lisa_Crummett! Did you try running a feature table filtered to just these features of interest through the feature-table heatmap visualizer? This would allow you to identify individual features and their relative frequencies.
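One way to get that filtered table is to write the feature IDs of interest into a feature metadata file and pass it to `qiime feature-table filter-features` through `--m-metadata-file`, which keeps only the listed features. Below is a minimal sketch of preparing that file, assuming `feature-id` is accepted as the ID column header and that an ID-only file is valid metadata; it also tabulates raw counts for a quick look before building the heatmap, assuming the table has been exported into a plain `{sample: {feature: count}}` dict. The helper names are hypothetical.

```python
def feature_id_metadata(feature_ids):
    """Build ID-only metadata TSV text listing each feature once.

    The 'feature-id' header is an assumption about accepted ID headers.
    Duplicates are dropped while the original order is preserved.
    """
    unique_ids = list(dict.fromkeys(feature_ids))
    return "\n".join(["feature-id", *unique_ids]) + "\n"


def count_matrix_tsv(table, feature_ids):
    """Tabulate raw counts with features as rows and samples as columns."""
    samples = sorted(table)
    rows = ["feature-id\t" + "\t".join(samples)]
    for fid in dict.fromkeys(feature_ids):
        # A feature missing from a sample is reported as a zero count.
        counts = [str(table[s].get(fid, 0)) for s in samples]
        rows.append(fid + "\t" + "\t".join(counts))
    return "\n".join(rows) + "\n"
```

Writing `feature_id_metadata(ids)` to a file such as `gn02-ids.tsv` (a hypothetical name) gives something to pass to `--m-metadata-file`; the count matrix is just a plain-text preview of what the heatmap will display.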

Thanks, Matt. I have not tried this. The command-line instructions for the heatmap aren’t as straightforward as the ones you provide in the tutorials, but I will give it a try!

Cheers,
Lisa
