Differences in depths of taxonomic assignments within dataset?

I have a question about taxonomic assignments using the feature classifier and how the depth is labeled for some taxonomic assignments.

I’ve noticed that in both my data processing and in the “Moving Pictures” tutorial, the taxon labels seem to not be consistent, even if the assignment is down to the same taxonomic depth.

For instance, in the “Moving Pictures” taxonomy file, Feature ID 02ef9a59d6da8b642271166d3ffd1b52 is assigned “k__Bacteria; p__Firmicutes; c__Clostridia; o__Clostridiales; f__Ruminococcaceae; g__; s__” and Feature ID 73291cac0e802b6a1fb25ae7079390ef is assigned “k__Bacteria; p__Firmicutes; c__Clostridia; o__Clostridiales; f__Ruminococcaceae”.

My interpretation of these assignments is that both Feature IDs have been assigned to the family Ruminococcaceae, but the assignment cannot go more specific than the family level. Why does one Feature ID get an additional “g__; s__” but the other does not?

I’m doing some downstream analysis in R using phyloseq, and I think I may run into some issues with taxonomic ranks if some taxonomic assignments have additional labelling.

Is the solution to export the taxonomy file to .tsv (which I’m already doing to create a phyloseq-importable .biom file), and then make the taxonomic labeling consistent (i.e. add “p__; c__; o__;” etc. all the way down to “s__”)?

Thanks in advance!


Hi @Nick,
Thanks for posting!

This is not a consistency problem in QIIME2, but a quirk of the Greengenes annotations, as discussed in this post. The feature whose label ends at f__Ruminococcaceae could only be classified to family level. The feature whose label ends in “g__; s__” was successfully classified to species level, but to a reference taxon that has no genus or species annotation in Greengenes.

One solution is to collapse to a taxonomic level of interest. E.g., you can collapse at species level, and any features classified to the same group would then be merged into a single feature. The taxonomic label containing g__;s__ would be unchanged, but the other label (only classified to f__Ruminococcaceae) would have its empty levels filled out automatically and become:
k__Bacteria; p__Firmicutes; c__Clostridia; o__Clostridiales; f__Ruminococcaceae; __; __

Yes, exporting the taxonomy to .tsv and padding the labels yourself would be the only other solution if you need labels of even depth in phyloseq and collapsing features is a problem.
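If you go the padding route, something like the following might work as a starting point. It is only a rough sketch in plain Python, not anything built into QIIME2: the function names `pad_taxonomy` and `pad_taxonomy_tsv` are made up for illustration, and it assumes Greengenes-style labels separated by semicolons plus an exported table with a column named "Taxon". Adjust those if your export looks different.

```python
import csv

# Greengenes-style rank prefixes, kingdom through species.
RANK_PREFIXES = ["k__", "p__", "c__", "o__", "f__", "g__", "s__"]


def pad_taxonomy(taxon, prefixes=RANK_PREFIXES):
    """Pad a taxonomy string with empty rank labels down to species."""
    ranks = [r.strip() for r in taxon.split(";") if r.strip()]
    # Leave labels such as "Unassigned" alone (assumption: a classified
    # label always starts with the kingdom prefix).
    if not ranks or not ranks[0].startswith(prefixes[0]):
        return taxon
    # Append the empty prefixes for every rank that is missing.
    ranks.extend(prefixes[len(ranks):])
    return "; ".join(ranks)


def pad_taxonomy_tsv(in_handle, out_handle, column="Taxon"):
    """Copy a taxonomy TSV, padding the labels in the given column."""
    reader = csv.DictReader(in_handle, delimiter="\t")
    writer = csv.DictWriter(
        out_handle,
        fieldnames=reader.fieldnames,
        delimiter="\t",
        lineterminator="\n",
    )
    writer.writeheader()
    for row in reader:
        row[column] = pad_taxonomy(row[column])
        writer.writerow(row)
```

You would open the exported file for reading and a new file for writing, pass both handles to `pad_taxonomy_tsv`, and then add the padded file to your .biom as before. Labels that already reach species level, including the ones ending in "g__; s__", pass through unchanged.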

I hope that helps!
