I am trying to replicate an OTU clustering analysis. I built a pipeline (pipeline A, using QIIME2) and ran it on a mock sample (10 genera expected).
When I compared it with another OTU clustering pipeline (pipeline B, using QIIME), both recovered the expected 10 genera, although the counts differed.
But when I ran both on an actual sample and compared the outputs, the number of OTUs detected was very different: pipeline A detects 2-3x more OTUs than pipeline B. However, the top 10 OTUs are almost the same.
I am wondering whether the workflow plays an important role here. Most of the workflow is similar, but here are some differences:
- run reference-based chimera removal first, then de novo clustering
- run quality trimming before importing into QIIME2, then apply --p-min-quality 3 after merging the reads
- run de novo clustering first, then de novo chimera removal
I would like to reduce the number of OTUs detected by my pipeline, since I believe many of them could be ambiguous/spurious OTUs rather than real ones.
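To make the idea concrete, here is a minimal sketch of the kind of low-abundance filter I have in mind. It is only an illustration: the function name `filter_low_abundance`, the thresholds, and the toy counts are all hypothetical and are not taken from either pipeline.

```python
from typing import Dict


def filter_low_abundance(
    otu_counts: Dict[str, int],
    min_count: int = 2,
    min_fraction: float = 0.0,
) -> Dict[str, int]:
    """Keep OTUs whose total count and relative abundance reach the thresholds.

    otu_counts   : mapping of OTU ID -> total read count across samples
    min_count    : minimum absolute count (2 drops singletons)
    min_fraction : minimum share of all reads (0.0 disables this check)
    """
    total = sum(otu_counts.values())
    if total == 0:
        return {}
    return {
        otu: count
        for otu, count in otu_counts.items()
        if count >= min_count and count / total >= min_fraction
    }


# Hypothetical toy table, not real data from either pipeline.
toy_table = {"OTU_1": 500, "OTU_2": 320, "OTU_3": 4, "OTU_4": 1, "OTU_5": 1}
filtered = filter_low_abundance(toy_table, min_count=2)
print(len(toy_table), "->", len(filtered))  # prints "5 -> 3"
```

In QIIME2 itself, I assume the equivalent step would be qiime feature-table filter-features with --p-min-frequency, but I am not sure what threshold is reasonable.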
I would appreciate any input on this.
Thank you in advance!