dada2 vs deblur in relative frequency tables

pau · April 27, 2020, 4:49pm

Hi everyone!
I'm having trouble explaining some differences between the results of the same pipeline when only the denoising method changes (dada2 vs deblur).
The dataset (16S rRNA, Illumina MiSeq) comes from 3 different runs (60 samples), and the pipeline was exactly the same except for denoising. After importing the sequences, I ran both denoising methods. Then I created a taxonomy.qza from each set of rep-seqs (dada2 and deblur) using an sklearn classifier trained on V3-V4, and built relative-frequency tables collapsed at level 6 for both methods. Next, I kept only the taxa present in both tables (a core of approx. 500 taxa, more than 99% of the total relative frequency in both dada2 and deblur), and from those I extracted the taxa unclassified at level 6 (200 taxa were unclassified at level 6 and present in both tables). Looking at the relative frequency of these taxa, I saw that they account for 35% of the deblur relative-frequency table and 44% of the dada2 one! Meanwhile, the unclassified taxa found exclusively in one table did not exceed 0.5% in either case!
How can this be possible? I did this to see which denoising method gives me the better taxonomy assignment!
I hope it's understandable...
Thanks a lot in advance!
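
The comparison described in this post can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two tiny tables, the taxon strings, and the rule that a level-6 label ending in `g__` counts as unclassified are all invented for the example and are not taken from the real data.

```python
# Hypothetical level-6 relative-frequency tables, one per denoiser.
# Keys are taxonomy strings, values are relative frequencies that sum to 1.
# All names and numbers are invented for illustration.
dada2 = {
    "f__Lachnospiraceae;g__Blautia": 0.300,
    "f__Lachnospiraceae;g__": 0.440,
    "f__Ruminococcaceae;g__Faecalibacterium": 0.255,
    "f__Bacteroidaceae;g__": 0.005,    # present only in the DADA2 table
}
deblur = {
    "f__Lachnospiraceae;g__Blautia": 0.350,
    "f__Lachnospiraceae;g__": 0.350,
    "f__Ruminococcaceae;g__Faecalibacterium": 0.297,
    "f__Prevotellaceae;g__": 0.003,    # present only in the Deblur table
}


def is_unclassified(taxon):
    """Assumed rule: the genus rank is empty, i.e. the last field is 'g__'."""
    return taxon.split(";")[-1].strip() == "g__"


def summarize(table, other):
    """Relative frequency held by shared taxa and by unclassified taxa."""
    shared = set(table) & set(other)
    return {
        # frequency covered by taxa found in both tables
        "shared": sum(table[t] for t in shared),
        # unclassified taxa found in both tables
        "shared_unclassified": sum(
            table[t] for t in shared if is_unclassified(t)
        ),
        # unclassified taxa found only in this table
        "exclusive_unclassified": sum(
            f for t, f in table.items()
            if t not in shared and is_unclassified(t)
        ),
    }


print("dada2 ", summarize(dada2, deblur))
print("deblur", summarize(deblur, dada2))
```

The same taxon label can carry a different share of each table, which is the pattern reported above: the set of shared unclassified taxa is identical, but the reads behind them are not.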

Mehrbod_Estaki · April 27, 2020, 6:53pm

Hi @pau,
There are numerous places in your pipeline where the two datasets might start to diverge, and we would need a lot more information about your data to give more concrete answers. To get us started, here are some things you should consider right off the bat:

1. Are these single-end or paired-end data? I'm guessing PE if you are using a V3-V4 trained classifier, but just double-checking. Longer reads like those of V3-V4 cause Deblur to conservatively remove more reads than DADA2 (in general). If you wanted a true comparison between the two methods, I would suggest trimming both outputs to a shorter length like 150-250, but this might not be desirable?
2. Are you running DADA2 denoising separately for each of the 3 runs and then merging the outputs? If not, you should be, keeping in mind that your trim parameters need to be the same across all 3 runs. This isn't required for Deblur; you can run them all at once.
3. How are you pre-merging your reads before Deblur? Are you running any quality-score-based filtering before Deblur? Different filtering and merging processes can lead to different results.
4. What are your trimming/truncation parameters across these runs (Deblur and DADA2)? One of the biggest culprits behind differing taxonomic assignments is comparing features of different lengths: a feature may be assigned to the genus level at a given length, but with a slightly shorter read it might only be assigned to the family level.
5. By default, Deblur discards any feature with fewer than 10 reads in total across all your samples, whereas DADA2 only removes singletons. Something to consider as well.
6. Have you seen this impartial benchmark paper or this one, both of which compare denoisers including DADA2 and Deblur?

pau · April 28, 2020, 7:31am

Hi @Mehrbod_Estaki, thanks a lot for your quick and detailed answer!
I am sorry I forgot to add a lot of information!
1: Yes, it is paired-end data, truncated to 250 (forward) and 240 (reverse) based on demux.qzv (I have added the commands in point 4).
2: No, I ran dada2 only once. I was advised to run dada2 a single time so that all my samples share the same error model. So do I need to switch to 3 separate denoising runs?
3: Yes, I am running the pre-filtering recommended in the QIIME 2 “Moving Pictures” tutorial:

qiime quality-filter q-score \
  --i-demux /scratch/varagon/analisi/definitiu/trimmed-demux.qza \
  --o-filtered-sequences deblur_files/demux-filtered.qza \
  --o-filter-stats deblur_files/demux-filter-stats.qza

4: I think they are the same…

qiime deblur denoise-16S \
  --i-demultiplexed-seqs deblur_files/demux-filtered.qza \
  --p-trim-length 240 \
  --p-left-trim-len 5 \
  --p-jobs-to-start 16 \
  --o-table deblur_files/table-deblur.qza \
  --o-representative-sequences deblur_files/rep-seqs-deblur.qza \
  --o-stats deblur_files/denoising-stats-deblur.qza

qiime dada2 denoise-paired \
  --i-demultiplexed-seqs /scratch/varagon/analisi/definitiu/trimmed-demux.qza \
  --p-trim-left-f 0 \
  --p-trim-left-r 5 \
  --p-trunc-len-f 250 \
  --p-trunc-len-r 240 \
  --o-table DADA2_files/table-dada2.qza \
  --o-representative-sequences DADA2_files/rep-seqs-dada2.qza \
  --o-denoising-stats DADA2_files/denoising-stats-dada2.qza \
  --p-n-threads 24

6: I had seen the first one, and I am taking a look at the second! In general I don't think one is better than the other; that's why I was checking which one gave me the best results. It's also true that, depending on the goal, it makes sense to choose one or the other, but I think both have pros and cons.

Thanks again for your time!

Mehrbod_Estaki · April 29, 2020, 3:33am

Hi @pau,
Thanks for the updates.

Hmm, can I ask where you got this recommendation from? In particular, if it's somewhere on this forum, we can correct it there. If your samples were sequenced in 3 different runs (or even in different PCRs within the same sequencing run), then the error profiles are going to be specific to those batches. So when we train an error model for DADA2 to use, it is best to train it on each batch separately. This is why the DADA2 developer recommends running them separately and joining the results afterwards. For Deblur this is not an issue because it uses a pre-trained error model which is applied to each run, so you can combine them and run them together. Again, just to reiterate: your trimming parameters across these 3 DADA2 runs should be the same in order to obtain features of the exact same length and position.
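
To illustrate why the trimming parameters must match across the per-run DADA2 jobs, here is a minimal Python sketch. It is not how QIIME 2 merges tables internally; the toy sequence, the sample names and the `merge_tables` helper are hypothetical. The only point is that features are identified by their exact sequence, so the same organism truncated to two different lengths ends up as two unrelated features after merging.

```python
from collections import defaultdict


def merge_tables(*tables):
    """Combine per-run feature tables of the form {sample: {sequence: count}}.

    Counts are summed per (sample, sequence). Sample names are assumed to be
    unique across runs.
    """
    merged = defaultdict(lambda: defaultdict(int))
    for table in tables:
        for sample, features in table.items():
            for seq, count in features.items():
                merged[sample][seq] += count
    return {sample: dict(features) for sample, features in merged.items()}


def feature_ids(table):
    """Set of distinct feature sequences present anywhere in the table."""
    return {seq for features in table.values() for seq in features}


amplicon = "ACGTTGCAAGGCTTAACCGT"  # toy stand-in for one organism's amplicon

run1 = {"s1": {amplicon[:16]: 120}}       # run 1, truncated at 16
run2_same = {"s2": {amplicon[:16]: 95}}   # run 2, same truncation
run2_diff = {"s2": {amplicon[:14]: 95}}   # run 2, different truncation

consistent = merge_tables(run1, run2_same)
inconsistent = merge_tables(run1, run2_diff)

print(len(feature_ids(consistent)))    # the organism is a single feature
print(len(feature_ids(inconsistent)))  # the organism is split into two features
```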

So, this is an important one in your comparison. In DADA2 your truncation parameters are applied before merging, so by the time your reads are merged you will have features that are ~420 bp (for V3-V4, excluding primers). There is naturally variability in this region across taxa, which is why I say ~420. But in Deblur you are applying your 240 trimming after joining, so all of your reads that are ~420 bp will end up being exactly 240 bp post-Deblur. Now you can imagine the difference in taxonomic classification when your Deblur reads are exactly 240 bp and your DADA2 reads are ~420 bp. There's quite a bit of room here for discrepancies. You could try increasing the Deblur trim length to 420 for a closer comparison, but then you will lose quite a few reads as a result of the longer read length; if you can afford to lose some reads, this might be a good option. Or you could trim all of your DADA2 reads to 240, but then you are tossing potentially useful data.
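
The length argument here is simple arithmetic, sketched below in Python. The 420 bp amplicon is the approximate figure from this post; the 12-base minimum overlap is an assumption about DADA2's merging default, and both functions are simplified models rather than the tools' actual code (left trimming, for instance, is ignored).

```python
def dada2_merged_length(amplicon_len, trunc_f, trunc_r, min_overlap=12):
    """Length of a merged read pair, or None if the pair cannot be merged.

    Truncation happens before merging, so the merged read spans the whole
    amplicon as long as the truncated reads still overlap by min_overlap.
    """
    overlap = trunc_f + trunc_r - amplicon_len
    return amplicon_len if overlap >= min_overlap else None


def deblur_output_length(joined_len, trim_length):
    """Length after fixed-length trimming of an already joined read.

    Reads shorter than trim_length are dropped (None); all others are cut
    to exactly trim_length.
    """
    return trim_length if joined_len >= trim_length else None


# Amplicon lengths around the ~420 bp quoted for V3-V4 (illustrative values).
for amplicon_len in (400, 420, 440):
    print(
        amplicon_len,
        dada2_merged_length(amplicon_len, trunc_f=250, trunc_r=240),
        deblur_output_length(amplicon_len, trim_length=240),
    )
```

Under these assumptions the DADA2 features keep their natural, variable length, while every Deblur feature is exactly 240 bases, so the classifier sees much less sequence per feature in the Deblur table.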

Don't forget about #5. If you are truly trying to compare the 2 workflows, then either not discarding rare features in Deblur or filtering the DADA2 results to remove features with a total abundance below 10 will bring the two closer to each other.
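
As a rough illustration of #5, the sketch below applies a Deblur-like minimum total count to a DADA2-like table. The table and the `filter_min_total` helper are hypothetical; the threshold of 10 is the Deblur default mentioned earlier in the thread.

```python
def filter_min_total(table, min_total=10):
    """Keep features whose count summed over all samples is >= min_total.

    `table` maps sample -> {feature: count}.
    """
    totals = {}
    for features in table.values():
        for feature, count in features.items():
            totals[feature] = totals.get(feature, 0) + count
    keep = {feature for feature, total in totals.items() if total >= min_total}
    return {
        sample: {f: c for f, c in features.items() if f in keep}
        for sample, features in table.items()
    }


# Invented counts: featB is rare in each sample, featC is rare overall.
dada2_like = {
    "s1": {"featA": 500, "featB": 4, "featC": 2},
    "s2": {"featA": 450, "featB": 7},
}

filtered = filter_min_total(dada2_like)
print(filtered)
```

Note that `featB` survives because its total is 11 even though no single sample reaches 10; the threshold applies to the sum across samples, not to each sample separately.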

That's exactly the right conclusion. It all depends. For example, for the analysis of a stand-alone study, especially one with a longer region like V3-V4, I personally find DADA2 to be a bit more giving, as it is able to retain more reads and allows for variable-length reads. But for larger studies and meta-analyses, especially those targeting shorter regions like V4, Deblur is my go-to, as it is computationally less intensive and makes it easier to compare data with other datasets, since it is the denoiser used in Qiita.

pau · April 29, 2020, 7:06am

Hi @Mehrbod_Estaki, thanks for your quick answer again!

I can’t remember where I got this recommendation from, but it was not on this forum. I’m sorry. But in the end, won't having 3 different error models in one dataset be a problem? Excuse my ignorance on this point.

So, maybe it’s the truncation parameters that have the biggest impact on these differences. Maybe it’s better not to change anything, to avoid losing data, but to take this into account when comparing results.

Yeah, sure about #5!

Thanks a lot for your useful help!

Mehrbod_Estaki · April 30, 2020, 6:15am

No apologies needed! I just wanted to be sure conflicting recommendations are not going around the forum.

Nope. As I mentioned, this is the DADA2 developer's own recommendation. The problem would be losing sensitivity if you merged first and then trained your model on the average of the 3 runs.
There are a lot of different reasons why the results could differ; we have just highlighted some of the main ones.
Good luck!
