Number of inferred OTUs vs SVs

blau · November 6, 2017, 4:02pm

Hi,
I have analyzed multiple datasets in qiime1 and dada2, examining the number of inferred features with each method. I understand that dada2 is expected to find fewer SVs than qiime1’s OTUs, and I can confirm that I observed that in a couple of the datasets. However, it is not consistent: in several of my datasets, the number of features inferred by dada2 is actually greater than the number of OTUs found by qiime1.

I have tried different settings but did not see any meaningful change.
What could be the reason for this? In what sort of dataset would you expect it to happen? Could it be that in more complex datasets there’s an inflation in the number of SVs?

I’m curious to hear your opinion regarding this issue.
Thanks,
Dor

Nicholas_Bokulich · November 7, 2017, 12:53am

Hi @blau,
dada2 typically produces fewer features than OTU clustering, as the error correction and chimera checking steps employed by dada2 will remove spurious features.

However, dada2 is also more sensitive than OTU picking in that it resolves exact sequence variants (100% OTUs) that would be collapsed together into fewer OTUs during OTU clustering at, say, 97% identity. So under some circumstances (e.g., where sequencing error or chimera occurrence is low), it is possible for OTU clustering to detect fewer unique features than dada2.
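
To make the collapsing effect concrete, here is a toy Python sketch. It is not the algorithm used by dada2 or by QIIME 1; the sequences are invented, the helper names (`identity`, `greedy_cluster`, `mutate`) are hypothetical, and identity is computed as a simple per-position match over equal-length sequences. It only illustrates how several exact variants that differ by one or two bases can end up in a single 97% cluster.

```python
def identity(a, b):
    """Fraction of matching positions between two equal-length sequences."""
    assert len(a) == len(b)
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_cluster(seqs, threshold=0.97):
    """Toy greedy clustering: a sequence joins the first centroid it
    matches at >= threshold, otherwise it becomes a new centroid."""
    centroids = []
    for s in seqs:
        if not any(identity(s, c) >= threshold for c in centroids):
            centroids.append(s)
    return centroids


def mutate(seq, positions):
    """Return a copy of seq with a substitution at each given position."""
    swap = {"A": "C", "C": "G", "G": "T", "T": "A"}
    chars = list(seq)
    for p in positions:
        chars[p] = swap[chars[p]]
    return "".join(chars)


# Invented 100 nt "amplicon" and close variants (1-2 mismatches each).
base = "ACGT" * 25
close_variants = [
    base,
    mutate(base, [10]),
    mutate(base, [50]),
    mutate(base, [10, 50]),
]
# One clearly different sequence (40 mismatches, 60% identity to base).
distant = mutate(base, range(40))

reads = close_variants + [distant]

n_exact = len(set(reads))          # features if every exact variant is kept
n_otus = len(greedy_cluster(reads))  # features after 97% clustering

print("exact variants:", n_exact)
print("97% clusters:  ", n_otus)
```

In this made-up example the four close variants are all at least 98% identical to the first one, so they fall into one cluster, while exact-variant counting keeps them as separate features. Whether the same happens in a real dataset depends on how much genuine near-identical diversity it contains and on how many spurious features the error correction and chimera removal steps eliminate.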

Additionally, the number of features detected by QIIME1 also depends on additional quality filtering steps, e.g., the filters applied during split_libraries_fastq.py and whether chimera checking is used. So your results may not match those reported by others if they did not apply the same filtering methods.

@gregcaporaso and @benjjneb may have additional thoughts to add.

I hope that helps!

system · December 8, 2017, 6:53am

This topic was automatically closed 31 days after the last reply. New replies are no longer allowed.