Differing Feature Identification Results Between QIIME2/DADA2 and QIIME1/UPARSE

This is more of a troubleshooting question: I'm trying to work out where my pipeline goes wrong. I hope the answers will also help me in future projects, where this will inevitably happen again.

I'm working with raw data that a commercial metabarcoding sequencing and analysis firm generated in 2021. Sequencing was done on an Illumina MiSeq with 2×250 bp paired-end (PE) reads. The firm ran its own analyses on these data, and I am attempting to replicate its results.

They used QIIME v1.9.1 to filter reads, remove primers, and merge the PE reads. They then used UPARSE to cluster OTUs at 97% identity and remove chimeras.

I used QIIME 2 v2022.2 to import the PE data, then DADA2 to denoise, demultiplex, trim adapters/primers, remove chimeras, and identify features. Following the Moving Pictures tutorial, I ran qiime feature-table summarize on the DADA2 table.qza to inspect the features and decide on a rarefaction depth.
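
Choosing a rarefaction depth is a trade-off: every sample with fewer reads than the chosen depth is discarded, and every retained sample is subsampled to exactly that depth. The feature-table summarize visualization shows this interactively; the minimal Python sketch below reproduces the same arithmetic. The function names and the per-sample read counts are hypothetical, chosen only for illustration.

```python
def samples_retained(sample_counts: dict[str, int], depth: int) -> list[str]:
    """IDs of samples that survive rarefying to `depth` reads.

    Rarefying subsamples every sample to exactly `depth` reads, so any sample
    with fewer reads than `depth` is dropped from the table.
    """
    return sorted(s for s, n in sample_counts.items() if n >= depth)


def depth_tradeoff(sample_counts: dict[str, int], depth: int) -> tuple[int, int]:
    """(number of samples kept, total reads kept) at a given sampling depth."""
    kept = samples_retained(sample_counts, depth)
    return len(kept), len(kept) * depth


# Hypothetical per-sample read counts (not from the dataset in this thread).
counts = {"S1": 12000, "S2": 8500, "S3": 900, "S4": 15000}
for depth in (900, 8000, 12000):
    n_samples, n_reads = depth_tradeoff(counts, depth)
    print(f"depth {depth}: {n_samples} samples, {n_reads} reads retained")
```

A table in which most samples contain no features gives no usable depth at all, which is itself a sign that something went wrong upstream of rarefaction.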

The sequencing company found OTUs in most samples, 1698 unique OTUs in total. In contrast, my DADA2 feature table (pe-table-dada.qzv) contained only 26 features, and most samples had no features at all.

I used the default filtering parameters in DADA2, and the methods above are as detailed as the information I have from the sequencing company. Did I over-filter my data? Or, as you mention in the Moving Pictures tutorial, might this reflect a difference between QIIME 1 and QIIME 2, meaning my dataset is more trustworthy?

Clustering my features with qiime vsearch cluster-features-de-novo shouldn't produce MORE OTUs than the features I already have. But could the difference come from comparing the features (ASVs) in my dataset to their OTUs?
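
The intuition that clustering cannot increase the count holds for greedy centroid clustering, the general abundance-ordered approach used by de novo OTU pickers such as VSEARCH: each input sequence either joins an existing cluster or founds a new one, so the number of OTUs is at most the number of input features. The sketch below is a deliberately simplified illustration, not VSEARCH's or UPARSE's actual implementation: it assumes equal-length sequences and measures identity as the fraction of matching positions, whereas real tools compute identity from a pairwise alignment and apply additional rules. The function names are hypothetical.

```python
def identity(a: str, b: str) -> float:
    """Fraction of identical positions (simplification: equal-length sequences
    only; real clustering tools derive identity from an alignment)."""
    if len(a) != len(b) or not a:
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_cluster(features: dict[str, int], threshold: float = 0.97) -> dict[str, list[str]]:
    """Greedy centroid clustering of {sequence: abundance}.

    Sequences are processed from most to least abundant; each joins the first
    existing centroid it matches at >= threshold identity, otherwise it becomes
    a new centroid. Every input lands in exactly one cluster, so
    len(result) <= len(features).
    """
    clusters: dict[str, list[str]] = {}
    for seq in sorted(features, key=features.get, reverse=True):
        for centroid in clusters:
            if identity(seq, centroid) >= threshold:
                clusters[centroid].append(seq)
                break
        else:
            # No centroid was close enough: start a new cluster (OTU).
            clusters[seq] = [seq]
    return clusters
```

So a large gap like 26 features versus 1698 OTUs points to reads lost before or during denoising rather than to the choice between ASVs and OTUs.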

What else might produce such stark differences in results?

Thanks for your help.

Hi @alexkrohn,
I think we need more information on what you did for your DADA2 step and before, to be able to help you.
Did you use 'qiime demux summarize' to visualize the sequences before the DADA2 denoising step? How many sequences are there? Can you share the plots of your reads' quality profiles?
Which 16S region is in the analysis (V4, V3-V4, or another)? In the DADA2 step, what trimming settings did you use? Do your truncation lengths in DADA2 (--p-trunc-len-f and --p-trunc-len-r) leave at least 12 bases of overlap between the forward and reverse reads? My first hypothesis is that your trimming settings prevent most of the sequences from overlapping.
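
A quick way to check the overlap question is simple arithmetic: after truncation, the forward read covers the first trunc_len_f bases of the amplicon and the reverse read covers the last trunc_len_r bases, so they share trunc_len_f + trunc_len_r - amplicon_len bases. A minimal Python sketch, assuming amplicon_len is the amplicon length as it appears in the reads (including primers if they were not removed) and ignoring natural length variation between taxa; the function names and the 460 bp amplicon in the usage lines are hypothetical.

```python
MIN_OVERLAP = 12  # the 12-base minimum mentioned in the reply above


def expected_overlap(amplicon_len: int, trunc_len_f: int, trunc_len_r: int) -> int:
    """Bases shared by the truncated forward and reverse reads.

    Forward read: amplicon positions 1..trunc_len_f.
    Reverse read: the last trunc_len_r positions.
    A negative result means the reads leave a gap and cannot merge.
    """
    return trunc_len_f + trunc_len_r - amplicon_len


def can_merge(amplicon_len: int, trunc_len_f: int, trunc_len_r: int,
              min_overlap: int = MIN_OVERLAP) -> bool:
    """True if the truncated reads should overlap by at least min_overlap bases."""
    return expected_overlap(amplicon_len, trunc_len_f, trunc_len_r) >= min_overlap


# Hypothetical ~460 bp amplicon sequenced with 2x250 bp reads.
for f, r in [(200, 200), (236, 236), (250, 230)]:
    print(f, r, expected_overlap(460, f, r), can_merge(460, f, r))
```

With long amplicons and 2×250 bp reads there is little room to truncate aggressively for quality, so the truncation lengths often have to balance read quality against the overlap needed for merging.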
If you used '--o-denoising-stats stats.qza', you can visualize (for example with 'qiime metadata tabulate') how many sequences pass each DADA2 denoising step.
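
The stats can also be checked programmatically to find the step where each sample loses the most reads. This is a minimal sketch under assumptions: it assumes the artifact was exported to a tab-separated file (for example with 'qiime tools export'), that the header row starts with 'sample-id', that the read-count columns are named input, filtered, denoised, merged and non-chimeric, and that comment rows such as '#q2:types' begin with '#'. Check these column names against your own export; the function names are hypothetical.

```python
import csv

# Assumed column names for per-step read counts in the exported DADA2 stats.
STEPS = ["input", "filtered", "denoised", "merged", "non-chimeric"]


def read_stats(text: str) -> dict[str, dict[str, int]]:
    """Parse exported DADA2 stats text into {sample_id: {step: reads}}.

    Blank lines and lines starting with '#' (e.g. a '#q2:types' row) are skipped.
    """
    lines = [ln for ln in text.splitlines() if ln and not ln.startswith("#")]
    reader = csv.DictReader(lines, delimiter="\t")
    return {
        row["sample-id"]: {s: int(float(row[s])) for s in STEPS if s in row}
        for row in reader
    }


def worst_step(counts: dict[str, int]) -> tuple[str, float]:
    """Step that kept the smallest fraction of the previous step's reads.

    Returns ("", 1.0) if no step lost any reads.
    """
    present = [s for s in STEPS if s in counts]
    worst, worst_frac = "", 1.0
    for prev, step in zip(present, present[1:]):
        if counts[prev] == 0:
            continue  # nothing left to lose at this point
        frac = counts[step] / counts[prev]
        if frac < worst_frac:
            worst, worst_frac = step, frac
    return worst, worst_frac
```

To use it on an exported file, read the whole file into a string, pass it to read_stats, and call worst_step on each sample's counts; a pipeline whose truncation lengths are too short would typically report "merged" as the worst step for most samples.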

Hope it helps

Hi @llenzi

You are exactly right. This was a filtering and merging error on my end. I saw that most of the reads were being lost at the merging step. I raised my trimming thresholds a bit, which allowed more reads to merge and led to maaaaaany more features being discovered. In fact, I now have ~2600 unique features in my dataset!

Thank you for your help and your suggestion to investigate the denoising stats qzv.
