Subsampled open reference OTU clustering

Hi all,
I’ve previously used subsampled open-reference OTU clustering in qiime1 (Rideout et al. 2014, PeerJ), and I’m now updating to qiime2.

To run it in qiime2, I’m following the directions from here: https://docs.qiime2.org/2017.12/plugins/available/vsearch/cluster-features-open-reference/

However, I am not getting a comparable total number of OTUs at the end. For a set of samples where I would expect ~6,000-7,000 OTUs (based on qiime1 results), I’m getting over 23,000 (this is after removing global singletons).

  • What are the differences between qiime1 and qiime2 for this pipeline? I have not found much documentation on using open-reference OTU clustering in qiime2 besides the above link and this link: Script for open_reference_OTU picking in QIIME2

    Clustering step

    qiime vsearch cluster-features-open-reference \
      --i-table derep_table.qza \
      --i-sequences derep_seqs.qza \
      --i-reference-sequences /sting/tax_db/pr2_4.75.qza \
      --o-clustered-table openref/table_openref_97.qza \
      --o-clustered-sequences openref/rep-seqs_openref_97.qza \
      --o-new-reference-sequences openref/new_rep-seqs_openref_97.qza \
      --p-perc-identity 0.97 \
      --p-threads 8

    Chimera detect and filtration

    qiime vsearch uchime-denovo --i-table table_openref_97.qza --i-sequences rep-seqs_openref_97.qza --output-dir uchime_out_dir
    qiime feature-table filter-features --i-table table_openref_97.qza --m-metadata-file uchime_out_dir/nonchimeras.qza --o-filtered-table table_openref_97_nc.qza
    qiime feature-table filter-seqs --i-data rep-seqs_openref_97.qza --m-metadata-file uchime_out_dir/nonchimeras.qza --o-filtered-data rep-seqs_openref_97_nc.qza

This does sound unusual, particularly if this is on the same exact dataset.

QIIME1 and QIIME2 can perform essentially the same steps, and as far as I can tell you are running the same series of steps that the QIIME1 pipeline performs (OTU picking, chimera filtering, singleton removal). However, q1 and q2 use entirely different algorithms for OTU picking and chimera filtering, so in principle their results can differ. The OTU seeding process is also pseudo-random, as far as I know, so it can affect the number and centroids of OTUs, leading to some stochasticity (but probably not 3-fold differences in counts).
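To compare the two runs on equal footing, the singleton-removal and counting steps could look something like the sketch below. This is not the exact command used earlier in the thread: the function name and the output file names are made up for illustration, and it assumes that "global singleton" means a feature whose total count across all samples is below 2, which is what `--p-min-frequency 2` filters on.

```shell
# Hypothetical helper: drop global singletons, then summarize the table so the
# number of features (OTUs) can be read from the resulting visualization.
# $1 = input feature table (.qza), $2 = output prefix (both are placeholders).
remove_singletons_and_summarize() {
    # Keep only features with a total frequency of at least 2 across all samples.
    qiime feature-table filter-features \
        --i-table "$1" \
        --p-min-frequency 2 \
        --o-filtered-table "${2}_no_singletons.qza" || return 1

    # The summary reports the number of features and samples in the table.
    qiime feature-table summarize \
        --i-table "${2}_no_singletons.qza" \
        --o-visualization "${2}_no_singletons.qzv"
}
```

For example, `remove_singletons_and_summarize table_openref_97_nc.qza table_openref_97_nc` would run it on the chimera-filtered table from the post above. Checking the feature count before and after each step (clustering, chimera filtering, singleton removal) should show where the q2 numbers start to diverge from q1.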

Make sure you are using the same reference database for open-reference OTU picking and also for chimera filtering if you are using reference-based chimera detection.
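If you do want reference-based chimera detection in q2, a rough sketch is below. It assumes your q2-vsearch release provides `uchime-ref` (check `qiime vsearch --help`); the function name and argument order are just for illustration.

```shell
# Hypothetical helper: reference-based chimera check with q2-vsearch.
# $1 = feature table, $2 = representative sequences,
# $3 = reference sequences (the same database used for clustering),
# $4 = output directory for the results.
uchime_ref_check() {
    qiime vsearch uchime-ref \
        --i-table "$1" \
        --i-sequences "$2" \
        --i-reference-sequences "$3" \
        --output-dir "$4"
}
```

With the files from the original post this would be called as `uchime_ref_check table_openref_97.qza rep-seqs_openref_97.qza /sting/tax_db/pr2_4.75.qza uchime_ref_out_dir` (the output directory name is a placeholder). The nonchimeras output can then be used to filter the table and sequences the same way as after `uchime-denovo`.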

Make sure you are also using the same quality filtering steps prior to OTU picking. q2-quality-filter performs qiime1-style quality filtering (make sure the same parameter settings are applied if you wish to replicate).
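One way to make the comparison explicit is to spell out every threshold instead of relying on defaults, roughly as in the sketch below. The values shown are what I believe the plugin defaults to, so treat them as placeholders: check `qiime quality-filter q-score --help` for your release and substitute whatever your QIIME1 run actually used. The function name and output file names are hypothetical.

```shell
# Hypothetical helper: run q2-quality-filter with every threshold written out,
# so each value can be checked against the settings of the QIIME1 run.
# $1 = demultiplexed sequences (.qza), $2 = output prefix (placeholders).
quality_filter_explicit() {
    # Threshold values below are assumptions; replace them with your own.
    qiime quality-filter q-score \
        --i-demux "$1" \
        --p-min-quality 4 \
        --p-quality-window 3 \
        --p-min-length-fraction 0.75 \
        --p-max-ambiguous 0 \
        --o-filtered-sequences "${2}_filtered.qza" \
        --o-filter-stats "${2}_filter_stats.qza"
}
```

Usage would be something like `quality_filter_explicit demux.qza demux`, where `demux.qza` stands in for your own demultiplexed artifact.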

This tutorial covers different OTU picking strategies with q2-vsearch so might give some insight, though it does not discuss open-reference specifically.

At the end of the day, though, I would generally recommend using denoising methods instead of OTU picking if you can. dada2 and deblur are going to provide much more sensitive information, and better error filtering, than OTU picking (in q1 or q2).

I hope that helps!

Yes, I’ve been keeping the QC steps and reference database consistent. I know results will vary to some degree, but if the algorithm has changed, I would like to know exactly how (yes, I’ve read through the tutorial you suggested). The qiime1 open-reference page had a description of the steps.

At the end of the day, though, I would generally recommend using denoising methods instead of OTU picking if you can. dada2 and deblur are going to provide much more sensitive information, and better error filtering, than OTU picking (in q1 or q2).

Yes. I’m constantly evolving our OTU clustering (and ASV!) approach, but am currently trying to compare to what we’ve previously run in q1.

You can see the documentation on vsearch (OTU clustering algorithm used in QIIME2) and uclust (default OTU clustering algorithm in QIIME1) for more details.

The open-reference OTU picking pipeline should still follow those same steps, as far as I know, but using vsearch instead of uclust. I would not expect such a dramatic difference, but it is possible.

I hope that helps!