What should be reported as the number of OTUs?

I have a few very basic questions:

  1. What should be reported as the number of OTUs for samples?
    I am confused between two outputs: the sequence counts in table.qzv and observed_otus.qza.
    It would be really helpful to understand how the number of OTUs is reflected in each.

The sequence counts in table.qzv are the numbers of reads per sample that survived filtering, regardless of which features they were assigned to. You should also be able to determine the total number of OTUs across the whole data set from the same summary, based on the feature count data.

Observed OTUs (observed_otus.qza) is the number of OTUs observed in each sample. However, it’s generally recommended that you normalize your data before doing diversity calculations. Rarefaction is the current standard (although there’s some interesting stuff coming out).

With rarefaction, the data is resampled so that the same number of sequences is drawn from each sample (say, 5000 per sample), and the number of OTUs at that depth is calculated. Since the number of OTUs is a function of the sequencing depth, it’s not valid to compare the number of OTUs in a sample with 5000 sequences with the number of OTUs in a sample with 50,000 sequences.
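The resampling step can be sketched in a few lines of plain Python. This is only an illustration of the idea, not how QIIME 2 implements it: the function names `rarefy` and `observed_otus`, the toy sample, and the depth of 50 are all made up for the example.

```python
import random

def rarefy(counts, depth, seed=None):
    """Draw `depth` reads without replacement from one sample.

    `counts` maps a feature (OTU) ID to its read count in that sample.
    Hypothetical helper for illustration only.
    """
    # Expand the counts into one entry per read, e.g. {"a": 2} -> ["a", "a"].
    reads = [feature for feature, n in counts.items() for _ in range(n)]
    if depth > len(reads):
        raise ValueError("sample has fewer reads than the rarefaction depth")
    rng = random.Random(seed)
    rarefied = {}
    for feature in rng.sample(reads, depth):
        rarefied[feature] = rarefied.get(feature, 0) + 1
    return rarefied

def observed_otus(counts):
    """Number of features with at least one read in the sample."""
    return sum(1 for n in counts.values() if n > 0)

# Invented sample: 20 OTUs with 10 reads each (200 reads in total).
sample = {f"otu_{i}": 10 for i in range(20)}
rarefied = rarefy(sample, depth=50, seed=1)
print(observed_otus(sample), observed_otus(rarefied))
```

Samples with fewer reads than the chosen depth cannot be rarefied to it, which is why the sketch raises an error in that case instead of returning a partial result.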


Thank you for the prompt reply.
I understood the explanation well, but it is also stated that the feature table (table.qzv) of QIIME 2 is analogous to the OTU table of QIIME 1.
Secondly, can’t we determine the observed OTUs without reference to sequencing depth? I am still slightly confused by this concept, since the number of OTUs obtained at a given sequencing depth will still vary and is relative.
I would like to understand what absolute number of OTUs can be reported or published.
Thank you.

The feature table is analogous to the OTU table in QIIME 1, that’s correct. The output of qiime feature-table summarize is analogous to that of biom summarize-table.
In some papers, I’ve seen a count of total OTUs published, which you can determine from the number of features field in the summary. This is not a per-sample OTU description, but a description of the overall recruitment. I think this is a fine number to publish as is.
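To make the difference between these numbers concrete, here is a small sketch on an invented feature table. The table contents and the helper names (`total_features`, `sequences_per_sample`, `observed_per_sample`) are assumptions for illustration and are not part of QIIME 2 or biom.

```python
# Toy feature table: outer keys are features (OTUs), inner keys are samples.
# All IDs and counts are invented for illustration.
table = {
    "otu_1": {"sample_A": 120, "sample_B": 0},
    "otu_2": {"sample_A": 30, "sample_B": 45},
    "otu_3": {"sample_A": 0, "sample_B": 5},
}

def total_features(table):
    """Study-wide OTU count: features with at least one read in any sample."""
    return sum(1 for counts in table.values() if any(n > 0 for n in counts.values()))

def sequences_per_sample(table):
    """Per-sample sequence counts: total reads per sample."""
    totals = {}
    for counts in table.values():
        for sample, n in counts.items():
            totals[sample] = totals.get(sample, 0) + n
    return totals

def observed_per_sample(table):
    """Per-sample observed OTUs: features with a nonzero count in that sample."""
    observed = {}
    for counts in table.values():
        for sample, n in counts.items():
            observed[sample] = observed.get(sample, 0) + (1 if n > 0 else 0)
    return observed

print(total_features(table))        # one number for the whole study
print(sequences_per_sample(table))  # reads per sample
print(observed_per_sample(table))   # OTUs per sample, not yet rarefied
```

Here the study-wide total is 3 OTUs, while each sample on its own contains only 2 of them, so the two kinds of count answer different questions.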

However, per-sample diversity is a different question. Reporting per-sample OTU counts without normalizing is, like many things, technically possible, but it’s not terribly advisable.

As a toy example, imagine that I have 3 replicates of a mock community in which I put 100 unique species in an even mixture. I got 10 sequences in R1, 100 in R2, and 10,000 in R3. In R1, I can detect, at most, 10 features. In R2, I could theoretically detect all 100, but only if I was very lucky and each feature was sequenced exactly once. In R3, I can detect all 100 with confidence.
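The toy example can be put into numbers. As a simplifying assumption (not something stated above), treat each read as an independent draw from the 100 equally abundant species. The chance that a given species is missed by all n reads is then (1 - 1/100)^n, so the expected number of species detected is 100 * (1 - (1 - 1/100)^n). The function name `expected_detected` is made up for this sketch.

```python
def expected_detected(n_species, n_reads):
    """Expected number of distinct species seen in `n_reads` reads.

    Assumes an even community and independent draws (with replacement).
    """
    p_missed = (1 - 1 / n_species) ** n_reads  # chance one species is never drawn
    return n_species * (1 - p_missed)

for name, reads in [("R1", 10), ("R2", 100), ("R3", 10_000)]:
    print(name, round(expected_detected(100, reads), 1))
```

Under this assumption, R1 is expected to show about 9.6 species, R2 about 63.4, and R3 essentially all 100, even though the three replicates come from the same community.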

This idea is illustrated by a rarefaction curve, where the same sample will have a different number of observed OTUs depending on the sequencing depth (check out the example in the Moving Pictures tutorial).

So, when I report the number of observed OTUs in each sample (or the PD whole tree diversity, or even the Shannon diversity), I always report it with a rarefaction depth. That way, readers can be sure that I was making my comparisons on the same scale. Otherwise, they have no way of knowing whether the diversity differences are due to different sequencing depths or to something biologically interesting about the community.

