Subsampled open reference OTU clustering

Nicholas_Bokulich · April 5, 2018, 3:33pm

This does sound unusual, particularly if both runs used the exact same dataset.

QIIME 1 and QIIME 2 can perform essentially the same steps, and from your description you appear to be running the same series of steps as the QIIME 1 pipeline (OTU picking, chimera filtering, singleton removal). However, the two versions use entirely different algorithms for OTU picking and chimera filtering, so in principle there could be differences in performance. As far as I know, the OTU seeding process is also pseudo-random, so it can affect the number and centroids of OTUs, leading to some stochasticity (but probably not 3-fold differences in counts).
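To find where the counts diverge, it can help to compare the two tables at the same stage, for example before and after singleton removal. Here is a minimal sketch in plain Python. It assumes each table has been exported to a dict mapping OTU ID to per-sample counts. The function names and the toy data are hypothetical, and "singleton" is taken to mean an OTU with a total count below 2, which may not match the definition either pipeline uses.

```python
def remove_singletons(table, min_total=2):
    """Keep OTUs whose total count across all samples is at least min_total."""
    return {otu: counts for otu, counts in table.items() if sum(counts) >= min_total}


def summarize(table):
    """Return (number of OTUs, total sequence count) for a table."""
    return len(table), sum(sum(counts) for counts in table.values())


# Toy tables for illustration only, not real data.
q1_table = {"otu_a": [10, 5], "otu_b": [1, 0], "otu_c": [0, 3]}
q2_table = {"otu_a": [9, 6], "otu_d": [0, 1], "otu_e": [1, 0], "otu_c": [0, 3]}

for name, table in (("q1", q1_table), ("q2", q2_table)):
    before = summarize(table)
    after = summarize(remove_singletons(table))
    print(name, "before:", before, "after:", after)
```

If the OTU counts are close before singleton removal but far apart afterwards (or the reverse), that points to the step worth looking at more closely.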

Make sure you are using the same reference database for open-reference OTU picking and also for chimera filtering if you are using reference-based chimera detection.
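One quick way to confirm that both runs really used the same reference is to compare checksums of the underlying sequence files. This is a minimal sketch using only the Python standard library; the function names are hypothetical. It assumes you compare the raw FASTA files rather than .qza artifacts, since an artifact is an archive that also carries its own metadata and so may differ even when the sequences inside are identical.

```python
import hashlib


def file_sha256(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_reference(path_a, path_b):
    """True if the two files are byte-for-byte identical."""
    return file_sha256(path_a) == file_sha256(path_b)
```

A mismatch only shows that the files differ byte for byte (line wrapping or record order would be enough), so treat it as a prompt to look closer, not as proof that the databases differ in content.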

Also make sure you are using the same quality filtering steps prior to OTU picking. q2-quality-filter performs QIIME 1-style quality filtering (apply the same parameter settings if you wish to replicate the QIIME 1 results).

This tutorial covers different OTU picking strategies with q2-vsearch, so it might give some insight, though it does not discuss open-reference clustering specifically.

At the end of the day, though, I would generally recommend using denoising methods instead of OTU picking if you can. DADA2 and Deblur are going to provide much more sensitive information, and better error filtering, than OTU picking (in QIIME 1 or QIIME 2).

I hope that helps!