Compute relative abundance at different taxonomic levels

I am trying to compute relative abundance at each taxonomic level, so I collapsed the OTU table to each level. Some of the resulting features end in f__; or g__; which means they have no annotation at that taxonomic level. How can I filter these features out? Should I just discard them and then calculate relative abundance from the remaining features at each level?

Hi @qindan,

Greengenes does not annotate those sequences because multiple sequences with different annotations at that taxonomic level were clustered into the same OTU.

I would not recommend discarding these. After all, these are presumably valid sequences in your dataset; they just have the misfortune of being most closely related to ambiguous OTUs. Discarding them will distort the relative abundances in your samples and remove potentially important information.
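To make the distortion concrete, here is a minimal sketch in plain Python. The feature IDs, counts, and taxonomy strings are made up for illustration, and `collapse`, `relative_abundance`, and `is_unannotated` are hypothetical helpers, not part of QIIME or any other tool. It assumes Greengenes-style taxonomy strings with ranks separated by semicolons.

```python
from collections import defaultdict

# Hypothetical counts for a single sample (illustrative numbers only).
counts = {"otu1": 60, "otu2": 30, "otu3": 10}

# Hypothetical Greengenes-style annotations; otu2 has an empty genus (g__).
taxonomy = {
    "otu1": "k__Bacteria; p__Firmicutes; c__Clostridia; o__Clostridiales; f__Lachnospiraceae; g__Blautia",
    "otu2": "k__Bacteria; p__Firmicutes; c__Clostridia; o__Clostridiales; f__Lachnospiraceae; g__",
    "otu3": "k__Bacteria; p__Bacteroidetes; c__Bacteroidia; o__Bacteroidales; f__Bacteroidaceae; g__Bacteroides",
}

def collapse(counts, taxonomy, level):
    """Sum counts of features that share the first `level` taxonomic ranks."""
    collapsed = defaultdict(int)
    for feature_id, count in counts.items():
        ranks = [r.strip() for r in taxonomy[feature_id].split(";")]
        ranks = [r for r in ranks if r]  # drop empties left by a trailing ';'
        collapsed["; ".join(ranks[:level])] += count
    return dict(collapsed)

def relative_abundance(collapsed):
    """Divide each collapsed count by the sample total."""
    total = sum(collapsed.values())
    return {taxon: count / total for taxon, count in collapsed.items()}

def is_unannotated(taxon):
    """True when the deepest rank is a bare prefix such as 'g__' or 'f__'."""
    return taxon.split(";")[-1].strip().endswith("__")

genus = collapse(counts, taxonomy, level=6)

# Option A: keep every feature, including the one with an empty genus.
kept = relative_abundance(genus)

# Option B: discard unannotated features first, then renormalize.
discarded = relative_abundance(
    {taxon: n for taxon, n in genus.items() if not is_unannotated(taxon)}
)

for taxon in genus:
    print(taxon.split(";")[-1].strip(), kept[taxon], discarded.get(taxon))
```

In this toy sample, keeping the unannotated feature leaves Blautia at 60% of the sample, while discarding it pushes Blautia to roughly 86% (60 out of the remaining 70 counts), even though nothing about the sample changed.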

You could try to:

  1. Use the 99% OTUs instead of the 97% OTUs for classification (maybe you already have). This won't fix the problem 100%, but the annotations may be a little more specific.
  2. Use the SILVA reference database instead.

If you must filter these features out, see this tutorial.

I hope that helps!

Thank you very much for your reply. I will try it.
