Different OTUs detected using different workflows

Hi all,

I tried to replicate an OTU clustering analysis (pipeline A, using QIIME2). I created one and ran it on a mock sample (only 10 genera expected).
When I compared it with another OTU clustering analysis (pipeline B, using QIIME), both recovered the 10 expected genera, although the counts differ.

But when I ran both pipelines on an actual sample and compared the output, the number of OTUs detected was very different: pipeline A detects 2-3x more OTUs than pipeline B. However, the top 10 OTUs are almost the same.
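
A comparison like this can be sketched in a few lines of Python. This is only an illustration: the counts are made up, the function names `top_n` and `compare` are hypothetical, and it assumes the OTUs from both pipelines have already been collapsed to comparable labels (for example genus names), since de novo OTU IDs are not shared between runs.

```python
def top_n(counts, n=10):
    """Return the n most abundant labels; ties are broken alphabetically."""
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [label for label, _ in ranked[:n]]


def compare(counts_a, counts_b, n=10):
    """Summarise OTU richness and top-n overlap for two pipelines."""
    top_a, top_b = set(top_n(counts_a, n)), set(top_n(counts_b, n))
    return {
        "n_otus_a": len(counts_a),
        "n_otus_b": len(counts_b),
        "ratio": len(counts_a) / len(counts_b),
        "shared_top": sorted(top_a & top_b),
    }


# Made-up read counts for one sample (not real data).
pipeline_a = {"Bacteroides": 500, "Prevotella": 300, "Blautia": 120,
              "rare1": 2, "rare2": 1, "rare3": 1}
pipeline_b = {"Bacteroides": 480, "Prevotella": 310, "Blautia": 100}

result = compare(pipeline_a, pipeline_b, n=3)
print(result)
```

In this toy case pipeline A reports twice as many OTUs, yet the top 3 are identical, which is the same pattern I see in my real sample: the extra OTUs are all in the low-abundance tail.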

I am wondering whether the workflow plays an important role here. Most of the steps are similar, but here are some differences.

Pipeline B:

  • run the reference-based chimera removal first then do de novo clustering

Pipeline A:

  • run quality trimming before importing into QIIME2, then filter with --p-min-quality 3 after merging reads.
  • run de novo clustering first, then do de novo chimera removal.

I would like to reduce the number of OTUs detected by my pipeline, since I believe many of them are ambiguous/spurious rather than real OTUs.

I would appreciate any input on this.
Thank you in advance!

Hi @afinaa ,

Those are some very big differences. De Novo vs. Reference-based OTU clustering is like comparing apples and oranges, and with de novo we would indeed expect to see many more OTUs than reference-based OTU clustering, because many of the de novo OTUs will indeed be spurious. See this article for more details:



Hi @Nicholas_Bokulich,

Thank you for the quick reply.

I don't understand what you mean by that.
Both pipelines use de novo clustering..? The chimera removal differs, though: pipeline A uses a de novo method while pipeline B uses reference-based chimera removal.

Ah, understood, I misread the first time. It must be the de novo vs. reference-based chimera removal, then. You might also want to bump up the min-quality threshold; 3 is low (see the paper I linked above for a point of reference):
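
To make the effect of the threshold concrete, here is a simplified sketch of Phred-based read truncation. It is not the QIIME2 implementation: `truncate_read`, `max_low_run` and `min_length_fraction` are names assumed for illustration, the quality scores are made up, and the value 20 is only there for contrast, not a recommendation.

```python
def truncate_read(quals, min_quality, max_low_run=3, min_length_fraction=0.75):
    """Return how many bases of a read are kept.

    The read is cut where the first run of more than `max_low_run`
    consecutive scores below `min_quality` begins. If the kept part is
    shorter than `min_length_fraction` of the read, the read is dropped (0).
    """
    run = 0
    keep = len(quals)
    for i, q in enumerate(quals):
        if q < min_quality:
            run += 1
            if run > max_low_run:
                keep = i - run + 1  # index where the low-quality run started
                break
        else:
            run = 0
    if keep < min_length_fraction * len(quals):
        return 0
    return keep


# Made-up Phred scores: one read with a short poor tail, one half poor.
reads = {
    "good_tail": [30] * 16 + [10, 8, 6, 5],
    "bad_half": [30] * 10 + [5] * 10,
}

kept_at_3 = {name: truncate_read(q, min_quality=3) for name, q in reads.items()}
kept_at_20 = {name: truncate_read(q, min_quality=20) for name, q in reads.items()}
print(kept_at_3)   # {'good_tail': 20, 'bad_half': 20}
print(kept_at_20)  # {'good_tail': 16, 'bad_half': 0}
```

With a threshold of 3 both reads pass untouched, including the one whose second half is all Q5, so those error-prone bases go straight into clustering and can show up as extra OTUs.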

Good luck!
