Major differences in alpha diversity between q1 and q2

Following this posted question in the qiime1 forum, I wanted to ask if such differences are acceptable?

I also see rather similar results: I compared the alpha diversity of 2 libraries (~170 stool samples), and while qiime1 shows an observed species count of 700 (on average), qiime2 gives an average of 200.

Now, Colin Brislawn, in his answer to Meirav, says these differences are expected due to the different OTU picking implementations in q1 and q2. But, to my understanding, the number of species should remain the same, and changes in the OTU picking implementation cannot (or should not) cause such major changes to it.

My guess was that something is wrong with the way I used q2. But since I followed the “moving pictures tutorial” step by step, using the DADA2 option (since I couldn’t use Deblur on paired-end reads), I really don’t know where I could have gone wrong…

Is there a published benchmark comparison I can refer to?


Hi @uria,
The short answer: yes, these differences are expected, sensible, and acceptable.

@colinbrislawn’s answer on the qiime1 forum does a great job of summarizing the cause of this discrepancy — the methods used by that user (and yourself) in qiime1 and qiime2 are very different — but does not go into the nitty gritty details. Let’s discuss those here.

The long answer: The differences that you are observing are not differences between qiime1 and qiime2, strictly speaking. Each of these implements a number of different methods for OTU picking and denoising, and the methods that you and Meirav happen to be comparing are external methods wrapped in qiime1 and qiime2 that take very, very different approaches. (I should note that qiime2 does implement a number of OTU picking methods via the q2-vsearch plugin, which should approximately replicate the OTU picking methods in qiime1.)

The “qiime1” method that you are using is probably uclust for OTU picking, which does not perform any quality control in and of itself — it is only clustering sequences at a defined similarity threshold. Other methods, such as chimera filtering and additional QC, should be performed downstream.

The “qiime2” method that you are using, dada2, performs error profiling and chimera filtering to remove erroneous sequences before dereplication, generating “amplicon sequence variants” (ASVs), essentially 100% OTUs with errors removed.

Out of the box, these two methods will (generally) give very different OTU/ASV counts. OTU-picked data should be augmented with additional chimera filtering and QC (e.g., with q2-quality-filter) to yield results more similar to dada2.
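To make the effect of missing QC concrete, here is a toy Python sketch. Everything in it is made up for illustration: the reads, the `observed_features` helper, and the `min_abundance` cutoff are assumptions, and the abundance filter is only a crude stand-in for real QC. It is not how dada2's error model or any chimera filter actually works. It only shows how a handful of erroneous reads can inflate an "observed species" count.

```python
from collections import Counter

# Made-up reads for one sample: two abundant "true" sequences plus four
# singletons standing in for sequencing errors or chimeras.
reads = (
    ["ACGTACGTAC"] * 50
    + ["TTGACCGTAA"] * 30
    + ["ACGTACGTAA", "ACGAACGTAC", "TTGACCGTAC", "TTGTCCGTAA"]
)

counts = Counter(reads)


def observed_features(counts, min_abundance=1):
    """Count distinct sequences seen at least `min_abundance` times."""
    return sum(1 for n in counts.values() if n >= min_abundance)


# Without any QC, every erroneous singleton counts as its own feature.
raw = observed_features(counts)

# Crude stand-in for QC (an assumption, NOT dada2's method): drop singletons.
filtered = observed_features(counts, min_abundance=2)

print(raw, filtered)  # 6 2
```

In this toy sample, two thirds of the "observed" features disappear once the singletons are dropped, even though they account for only 4 of 84 reads.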

On whether the number of species should remain the same: no, not really. Even if your data contained zero sequencing errors and these methods were 100% accurate (and nothing is ever perfect! :wink:), they are still providing very different outputs. Your qiime1 OTUs are probably clustered at 97% similarity and trimmed to different lengths (where quality drops off), whereas your qiime2 dada2 seqs are clustered at 100% similarity and probably trimmed to the same lengths.
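A toy Python sketch of the threshold effect alone, assuming error-free, equal-length sequences. The sequences and the `identity`, `greedy_cluster`, and `mutate` helpers are hypothetical, and the greedy scheme is a simplification rather than uclust's or vsearch's actual algorithm. The same set of true sequences yields a different feature count at 97% than at 100% identity.

```python
def identity(a: str, b: str) -> float:
    """Fraction of matching positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("this toy example only compares equal-length sequences")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_cluster(seqs: list[str], threshold: float) -> list[str]:
    """Return cluster centroids: a sequence joins the first centroid it
    matches at >= threshold identity, otherwise it founds a new cluster."""
    centroids: list[str] = []
    for seq in seqs:
        if not any(identity(seq, c) >= threshold for c in centroids):
            centroids.append(seq)
    return centroids


def mutate(seq: str, positions: list[int]) -> str:
    """Hypothetical helper: swap the base at each position (A<->C, G<->T)."""
    swap = {"A": "C", "C": "A", "G": "T", "T": "G"}
    chars = list(seq)
    for p in positions:
        chars[p] = swap[chars[p]]
    return "".join(chars)


base = "ACGT" * 25  # made-up 100 nt "true" sequence
true_variants = [
    base,
    mutate(base, [10]),              # 99% identical to base
    mutate(base, [10, 50]),          # 98% identical to base
    mutate(base, list(range(40))),   # 60% identical: a clearly distinct taxon
]

otus_97 = greedy_cluster(true_variants, 0.97)
asvs_100 = greedy_cluster(true_variants, 1.00)

print(len(otus_97), len(asvs_100))  # 2 4
```

With no errors at all, the 97% clusters collapse the three close variants into one feature, while 100% identity keeps all four apart.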

So in a perfect world (or at least low-error sequencing run followed by best practices), OTU picking would actually yield fewer observed “species” than dada2, as some forum users have reported.

As for something being wrong with the way you used q2: not at all :smile:. dada2 is giving you the expected (and typical) lower ASV count compared to OTUs (especially if additional QC has not been applied to those OTUs).

The best reference to read and cite would be the dada2 publication. That does not include a thorough comparison of different OTU methods and other denoising methods/parameters (e.g., other approaches supported in QIIME2), so further benchmarks are forthcoming, but the results in that paper should really put your concerns to rest, since it demonstrates the differences between dada2 and qiime1-style OTU picking (without stringent additional QC, if I recall correctly). :sleeping_bed:

I hope that helps! :tractor: :sunflower: :sunflower: :sunflower:


Thanks a lot,
