Is Deblur preferred over DADA2 for combining data?

ranxx005 · January 30, 2020, 10:37pm

I looked into the differences between DADA2 and Deblur. In the QIIME 2 training conference 2019.5, I was advised to use Deblur for two reasons: 1. it is easy to combine analyzed data from different runs, and 2. Deblur is designed around Illumina data, whereas DADA2 is designed for Ion Torrent and makes it hard to combine analyzed data from different runs. Later, while searching the forum, I found a post saying that Deblur is conservative, and QIIME 2 has been updated twice since then. So I am wondering whether, in terms of combining analyzed data from different runs, Deblur is still preferred over DADA2. Thank you very much!

Mehrbod_Estaki · January 31, 2020, 1:34am

Hi @ranxx005,
Comparisons between Deblur and DADA2 have been made a few times on this forum before. I would recommend searching through those topics and reviewing them; there are some very good insights there.
Performance-wise, this impartial review may shed some light on the topic too.

Important to clarify, regarding the claim that DADA2 is designed for Ion Torrent:

This is not true at all. DADA2 has been tested on and can handle Illumina, Ion Torrent, 454, and most recently PacBio data. The QIIME 2 implementation of DADA2 currently doesn't have PacBio support, but I believe this is arriving soon enough. Deblur, on the other hand, is only suited to Illumina data, because it uses a static error model that is precalculated from Illumina data.
The two tools were created with different primary goals in mind, but both perform very well.

Because Deblur uses the same error model for every run, it is faster and easier to combine data from different studies and run them together. With DADA2, each 'run' has to be denoised separately, but the results can still be combined afterwards. Since both tools output exact sequence variants, the onus is on the user to ensure that all the sequences being compared cover the exact same location (both in length and region).
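To illustrate why separately denoised runs can still be merged, and why length and region matter, here is a minimal Python sketch. It is not QIIME 2 code: the table layout, the `feature_id` and `merge_tables` helpers, and the toy sequences are all hypothetical, and using an MD5 hash of the sequence as the feature ID is an assumption made for the example. In QIIME 2 itself, the merging step is typically done with `qiime feature-table merge` and `qiime feature-table merge-seqs`.

```python
import hashlib
from collections import defaultdict


def feature_id(sequence: str) -> str:
    """Derive a feature ID from the exact sequence (MD5 is an assumption here)."""
    return hashlib.md5(sequence.encode("ascii")).hexdigest()


def merge_tables(*tables):
    """Merge per-run tables shaped like {sequence: {sample_id: count}}.

    Features are matched on the exact sequence, so two runs share a feature
    only if they produced the identical string.
    """
    merged = defaultdict(dict)
    for table in tables:
        for sequence, samples in table.items():
            fid = feature_id(sequence)
            for sample_id, count in samples.items():
                merged[fid][sample_id] = merged[fid].get(sample_id, 0) + count
    return dict(merged)


# Hypothetical toy data: runs denoised separately.
run1 = {"ACGTACGTAC": {"run1_s1": 10}, "TTGACCGTAA": {"run1_s1": 4}}
run2 = {"ACGTACGTAC": {"run2_s1": 7}}
# Same amplicon as the first sequence, but truncated two bases shorter.
run3_shorter = {"ACGTACGT": {"run3_s1": 5}}

same_length = merge_tables(run1, run2)
mixed_length = merge_tables(run1, run3_shorter)

shared = feature_id("ACGTACGTAC")
print(len(same_length))             # features after merging run1 + run2
print(sorted(same_length[shared]))  # samples that contain the shared feature
print(len(mixed_length))            # features after merging run1 + run3_shorter
```

In this sketch, run1 and run2 collapse onto a shared feature because the sequences are identical, while the shorter read in run3_shorter becomes a separate feature even though it comes from the same organism. That is the practical reason to use the same primers and the same trim and truncation lengths across every run you intend to merge.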
In my experience, Deblur does become much more conservative than DADA2 as read length increases; the reason for this is explained here. Keep in mind, though, that Deblur was designed to work mainly with shorter regions like V4. That isn't to say it doesn't perform well with other regions, but it is optimized for the shorter ones.

Yes! QIIME 2 does update frequently, at the moment every ~3-4 months, and there is a new version coming soon as well. I recommend always using the most up-to-date version.

This is totally up to you and what you need from your data. Both can be used to combine data, each in its own way, and both produce reliable results. If you plan on comparing your data to others in Qiita, then you'll be using Deblur regardless; if you are just comparing a few of your own datasets to each other, then DADA2 will be just fine too.

ranxx005 · February 3, 2020, 7:56pm

Thank you very much!