Different taxonomy results between Deblur and DADA2

Kry4tle · September 25, 2020, 12:03pm

Hi everyone! I am a new user, and I hope to discuss the following problems with you!

(My qiime2 version is 2020.6, platform is Ubuntu 18.04.)

I used both DADA2 and Deblur to denoise the same data. After taxonomic annotation, I found that the results were quite different (as follows).

The demux result looks like this:

For DADA2, I kept positions 10-220 of the forward reads and 0-220 of the reverse reads. The result looks like this:

For Deblur, I kept positions 0-300 of the joined reads. The result looks like this:

I tried many different trim lengths, but the results didn't vary much: the taxonomy still differed between DADA2 and Deblur after running the feature classifier. I guess it has something to do with the classifier I chose. I sequenced the V3-V4 region of the 16S rRNA gene, but used the full-length classifier posted on the QIIME 2 official website, "silva-138-99-nb-classifier.qza".

I would like to ask whether the large difference between DADA2 and Deblur is caused by my choice of classifier or by the way I processed the data.

##################

Here is the code I used to process the data:

qiime vsearch join-pairs \
  --i-demultiplexed-seqs paired-end-demux.qza \
  --o-joined-sequences demux-joined.qza

qiime quality-filter q-score-joined \
  --i-demux demux-joined.qza \
  --o-filtered-sequences joined-filtered.qza \
  --o-filter-stats joined-filtered-stats.qza

time qiime deblur denoise-16S \
  --i-demultiplexed-seqs joined-filtered.qza \
  --p-trim-length 300 \
  --p-left-trim-len 0 \
  --p-min-reads 10 \
  --p-jobs-to-start 60 \
  --o-representative-sequences rep-seqs.qza \
  --o-table table.qza \
  --p-sample-stats \
  --o-stats deblur-stats.qza

time qiime dada2 denoise-paired \
  --i-demultiplexed-seqs paired-end-demux.qza \
  --p-trim-left-f 10 \
  --p-trim-left-r 0 \
  --p-trunc-len-f 220 \
  --p-trunc-len-r 220 \
  --o-representative-sequences rep-seqs-dada2.qza \
  --o-table table-dada2.qza \
  --p-n-threads 0 \
  --o-denoising-stats denoising-stats.qza

qiime feature-classifier classify-sklearn \
  --i-classifier /home/dell/micro/database/2020.6/silva-138-99-nb-classifier.qza \
  --i-reads rep-seqs.qza \
  --p-n-jobs 60 \
  --o-classification taxonomysslivaFL.qza

Looking forward to your help!
Thanks, and I hope you have a wonderful day!
Kry4tle

ChrisKeefe · October 13, 2020, 2:15am

Thanks for your patience, @Kry4tle! I've reclassified this as a General Discussion question because it's not so much about a problem encountered in QIIME 2 analysis as it is about how to interpret two different outcomes. Please feel free to clarify your question and change the category back if you think this interpretation is incorrect.

Now, on to the good stuff!

I'm assuming you used the same pre-trained classifier for both DADA2 and Deblur outputs. I'm not sure exactly how that classifier does what it does under the hood, but my gut feeling tells me that if you gave it two sets of very similar data, it would probably produce very similar results. If not, there's a problem somewhere.
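One rough way to put numbers on "very similar results" is to tally how many features land in each taxon for each pipeline. The sketch below is not part of the original workflow. It assumes each taxonomy artifact has been exported with `qiime tools export`, giving a `taxonomy.tsv` with a header line and tab-separated `Feature ID`, `Taxon`, and `Confidence` columns, and that rank labels are separated by `; ` as in the SILVA classifiers. The function name `tally_taxa` and the directory names in the usage note are hypothetical. It counts features, not read abundance, so it is only a first look.

```shell
# Hypothetical helper: count features per taxon label, truncated to the
# first `depth` ranks, from an exported QIIME 2 taxonomy.tsv.
# Assumed layout: one header line, then Feature ID <TAB> Taxon <TAB> Confidence.
tally_taxa() {
    tsv="$1"
    depth="${2:-2}"   # number of leading ranks to keep (default: 2)
    awk -F'\t' -v depth="$depth" '
        NR == 1 { next }                      # skip the header line
        {
            n = split($2, ranks, "; *")       # split the taxon string into ranks
            label = ranks[1]
            for (i = 2; i <= depth && i <= n; i++) label = label ";" ranks[i]
            count[label]++
        }
        END { for (l in count) print count[l] "\t" l }
    ' "$tsv" | LC_ALL=C sort -t "$(printf '\t')" -k2,2
}
```

For example, `tally_taxa deblur-taxonomy/taxonomy.tsv 2` and `tally_taxa dada2-taxonomy/taxonomy.tsv 2` (hypothetical export directories) print one line per label, and the two listings can be compared side by side or with `diff`. Note that the commands in the question only show `rep-seqs.qza` being classified, so this assumes `rep-seqs-dada2.qza` was classified the same way.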

Working from that idea, I suspect the data you're getting from DADA2 and Deblur are pretty different. Your Deblur sequences are 300 bp long, while your DADA2 sequence lengths may show some variation, and will depend on how much your sequences overlap (and therefore the target amplicon length). In addition, the two algorithms handle sequencing errors very differently. DADA2 attempts to correct those errors, and groups the resulting like sequences. Deblur, IIRC, drops erroneous sequences over a given threshold. You have more than 20k more sequences from DADA2 in sample B1, and those sequences may be slightly different from the ones produced by Deblur, given their different approaches.
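The length difference described above can be checked directly. The following is a minimal sketch, assuming the representative sequences have been exported to FASTA with `qiime tools export` (which should place a `dna-sequences.fasta` file in the output directory). The function name `fasta_length_hist` and the export directory names are hypothetical.

```shell
# Export steps, using the artifact names from the question
# (the output directory names are hypothetical):
#   qiime tools export --input-path rep-seqs.qza --output-path deblur-export
#   qiime tools export --input-path rep-seqs-dada2.qza --output-path dada2-export

# Hypothetical helper: print "length<TAB>count" for every sequence length
# found in a FASTA file, sorted by length. Sequences wrapped over several
# lines are handled by summing line lengths until the next header.
fasta_length_hist() {
    awk '
        /^>/ { if (len) hist[len]++; len = 0; next }   # new record: store the previous length
        { len += length($0) }                          # accumulate sequence characters
        END {
            if (len) hist[len]++                       # store the last record
            for (l in hist) print l "\t" hist[l]
        }
    ' "$1" | sort -n
}
```

Running `fasta_length_hist deblur-export/dna-sequences.fasta` should report a single length if every Deblur sequence was trimmed to 300 bp, while the DADA2 export would be expected to show a spread of merged-read lengths. If the two sets cover different stretches of the amplicon, some difference in classification would not be surprising.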

It's worth looking both papers over. Benchmarks show they both perform quite well, but one approach might be a better fit for your study, or may approach denoising in a way you like better.

Happy :qiime2:-ing!
Chris
