Repeated classifications in the Greengenes taxonomy file

When I tried to go deeper into the details, I found that many Greengenes IDs have the same classification in the taxonomy file. For example, I found these Greengenes IDs in the 97%-similarity taxonomy file:

3002161 k__Bacteria; p__Firmicutes; c__Erysipelotrichi; o__Erysipelotrichales; f__Erysipelotrichaceae; g__[Eubacterium]; s__dolichum
539581 k__Bacteria; p__Firmicutes; c__Erysipelotrichi; o__Erysipelotrichales; f__Erysipelotrichaceae; g__[Eubacterium]; s__dolichum
1143784 k__Bacteria; p__Firmicutes; c__Erysipelotrichi; o__Erysipelotrichales; f__Erysipelotrichaceae; g__[Eubacterium]; s__dolichum
4396877 k__Bacteria; p__Firmicutes; c__Erysipelotrichi; o__Erysipelotrichales; f__Erysipelotrichaceae; g__[Eubacterium]; s__dolichum

My questions are: if these Greengenes IDs are rRNA sequences from the same species, how can their sequences be so different that they form different OTUs at 97% similarity? And does QIIME 2 take it into account when some of my sequencing reads map to multiple OTUs whose representative sequences belong to the same species?

Thanks!!

Hi @Yang_Liao!

I'll let @wasade, the Greengenes maintainer, get back to you on that!

QIIME 2 will only take your features' taxonomic annotations into account when performing taxonomy-based analyses. Otherwise, when you provide a feature table to QIIME 2, the features will be treated independently of their taxonomic annotations. In some cases the features aren't treated completely independently of each other: for example, if you're using UniFrac or Faith's Phylogenetic Diversity metrics, you'll provide a phylogenetic tree describing the evolutionary relationships between your features. But all of this is based on sequence similarity and phylogenetics and doesn't consider taxonomic annotation.

If you're interested in performing taxonomy-based analyses in QIIME 2, you can use qiime taxa barplot to visualize the taxonomic composition of your samples, and in this case QIIME 2 will combine all features with the same taxonomic annotation (at a given taxonomic level). With qiime taxa collapse, you can create a feature table that has your original features combined at a given taxonomic level. Then you can use this "collapsed" feature table to perform other analyses in QIIME 2 that operate generically on feature tables.
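
To illustrate the idea of collapsing, here is a minimal Python sketch. It is not QIIME 2's implementation, just the concept: counts for features that share the same taxonomy string (truncated to a chosen level) are summed. The function names and the read counts are made up for illustration; the IDs and taxonomy string are the ones from the question.

```python
from collections import defaultdict


def truncate_taxonomy(taxonomy, level):
    """Keep the first `level` ranks of a semicolon-delimited taxonomy string."""
    ranks = [rank.strip() for rank in taxonomy.split(";")]
    return "; ".join(ranks[:level])


def collapse_by_taxonomy(counts, taxonomy, level):
    """Sum per-feature counts over features sharing a taxonomy at `level`."""
    collapsed = defaultdict(int)
    for feature_id, count in counts.items():
        collapsed[truncate_taxonomy(taxonomy[feature_id], level)] += count
    return dict(collapsed)


species = (
    "k__Bacteria; p__Firmicutes; c__Erysipelotrichi; o__Erysipelotrichales; "
    "f__Erysipelotrichaceae; g__[Eubacterium]; s__dolichum"
)
taxonomy = {
    "3002161": species,
    "539581": species,
    "1143784": species,
    "4396877": species,
}
# Hypothetical read counts for a single sample (not real data).
counts = {"3002161": 10, "539581": 5, "1143784": 0, "4396877": 2}

# Level 7 corresponds to species in a seven-rank taxonomy string.
collapsed = collapse_by_taxonomy(counts, taxonomy, level=7)
print(collapsed)
```

In this toy example the four OTUs end up as a single species-level feature whose count is the sum of the four original counts. Before collapsing, each of the four is its own feature.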

Hi @Yang_Liao,

It’s possible that there is reasonable copy number variation, that the source annotations were inaccurate, or that the sequences contained errors or low-level chimeras which passed the filters. My guess is that the latter two are more likely. The original annotations used as input to the taxonomy decoration were from the GenBank records, and we cannot independently verify their isolates. What the phylogeny suggests is that this group is polyphyletic, and the taxonomy is ultimately based on the phylogeny.

Best,
Daniel
