Vsearch-Deblur vs DADA2

Hi All,
I’ve followed the vsearch method for joining and analysing MiSeq demultiplexed 16S PE reads, as described in the "Analyzing paired end reads in QIIME 2" tutorial.

My reads had both adapters and barcodes trimmed, but primers were not removed.
What I noticed is that my FeatureTable had a much lower number of sequence counts per sample compared to the original method where I used DADA2. On average there was at least a 4-fold reduction in sequence counts using the vsearch-deblur method!

Why is there this difference between the two methods? Isn’t DADA2 supposed to join my PE reads?

Thanks for your help

Hi @asr17,

This is a bit surprising but not necessarily wrong — they are different methods for denoising sequences so may behave differently (particularly as joining reads prior to denoising may alter this behavior).

I suspect that the particular commands you used may be at fault here, though. A likely problem area is the trim length that you set in deblur — if this is higher than the length of some joined reads, those reads will be dropped. Could you please double-check on those values to make sure that they are appropriate, given the quality and length of joined reads (as discussed in the tutorial that you linked to)?

If that is not the cause, could you please share:

  1. the exact commands that you used for vsearch, deblur, and dada2
  2. the demux summaries (quality plots) for your inputs (i.e., the joined input to deblur and the paired input to dada2).
  3. the output of qiime deblur visualize-stats

Yes, dada2 does join reads after denoising.

I hope that helps!

Hi @Nicholas_Bokulich,

Thanks for your help and sorry for my late reply. I set my trim length at 292 based on the quality score plots.

1- Here are the commands I used:
Vsearch-deblur method:

qiime vsearch join-pairs \
  --i-demultiplexed-seqs paired-end-demux.qza \
  --o-joined-sequences demux-joined.qza

qiime quality-filter q-score-joined \
  --i-demux demux-joined.qza \
  --o-filtered-sequences demux-joined-filtered.qza \
  --o-filter-stats demux-joined-filter-stats.qza

qiime deblur denoise-16S \
  --i-demultiplexed-seqs demux-joined-filtered.qza \
  --p-trim-length 292 \
  --o-representative-sequences rep-seqs.qza \
  --o-table table.qza \
  --p-sample-stats \
  --o-stats deblur-stats.qza

In DADA2:

qiime demux summarize \
  --i-data demux.qza \
  --o-visualization demux.qzv

qiime dada2 denoise-paired \
  --i-demultiplexed-seqs demux.qza \
  --o-table table \
  --o-representative-sequences rep-seqs \
  --p-trim-left-f 13 \
  --p-trim-left-r 13 \
  --p-trunc-len-f 250 \
  --p-trunc-len-r 250

2- Demux summaries: demux-joined-filtered.qzv (284.0 KB)
demux.qzv (280.7 KB)

3- Deblur output attached: deblur-stats.qzv (192.2 KB)

Thanks again!

Hi @asr17,
I’m not certain what is going on, but I think the issue might be with your setting for --p-trim-length. Can you try setting that to 286 instead of 292 to see if that dramatically increases your read count? Based on the demux-joined-filtered.qzv file that you attached to your last message, it looks like 292 is just about the length of most of your reads, so I’m wondering whether dropping that slightly will help. That same visualization tells me that all of the sampled reads were at least 286 bases long, so that’s where I got that value. It’s possible that you can increase it from 286, but I’d like you to first try that value so we can determine whether this is the cause of the issue. Your new command should be:

$ qiime deblur denoise-16S \
  --i-demultiplexed-seqs demux-joined-filtered.qza \
  --p-trim-length 286 \
  --o-representative-sequences rep-seqs.qza \
  --o-table table.qza \
  --p-sample-stats \
  --o-stats deblur-stats.qza

After re-running this, can you please include the resulting deblur-stats.qzv file in your reply?
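To illustrate the reasoning behind 286: deblur drops any joined read that is shorter than --p-trim-length, so the largest value that keeps every read is the length of the shortest sampled read. A minimal Python sketch, using made-up read lengths (not values from the actual .qzv files):

```python
# Hypothetical joined-read lengths (illustrative only).
read_lengths = [286, 290, 292, 293, 295, 301]

def max_safe_trim_length(lengths):
    """Largest trim length that keeps every read, since deblur drops shorter reads."""
    return min(lengths)

def reads_kept(lengths, trim_length):
    """Count reads that survive trimming to trim_length."""
    return sum(1 for n in lengths if n >= trim_length)

print(max_safe_trim_length(read_lengths))  # 286
print(reads_kept(read_lengths, 292))       # only 4 of 6 reads survive at 292
print(reads_kept(read_lengths, 286))       # all 6 survive at 286
```

The same logic explains why lowering the trim length can only increase (never decrease) the number of reads retained.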

Hi @gregcaporaso,
I decreased the trim length to 286 as you suggested; it improved the sequence counts by a mere 1-2%, which is still far below the DADA2 method. I am attaching the new deblur-stats file: deblur-stats.qzv (192.2 KB)
Thanks for your help!

Hi, @asr17,

I had the same situation as you. Based on my experience, I think --p-trim-length does not have the same meaning as the trimming parameters in DADA2. It seems that reads shorter than the trim length (286 in your case) are all dropped. You can try setting --p-trim-length to -1 (no trimming) to see; the bigger the number you set, the fewer reads you will have. That was just my experience.

But I still do not understand how to choose an appropriate trim length for the reads in deblur. Looking forward to suggestions from @Nicholas_Bokulich and @gregcaporaso.

In DADA2, we can trim on the left and truncate on the right. But in deblur there is only one parameter, --p-trim-length, and some people say all reads will be trimmed to the same length (as the tutorial describes). How could that be? If my read lengths are mostly distributed in two peaks, one at 250 bp and the other at 200 bp, how should I set it? Still confusing.
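As I understand it (my reading of the documented behavior, not deblur's actual source code), --p-trim-length works roughly like this: reads shorter than the value are dropped, and the remaining reads are truncated to exactly that length. A toy Python sketch with a hypothetical bimodal library like the one described above:

```python
# Sketch of deblur's length handling (an assumption, not official deblur code):
# reads shorter than trim_length are dropped; the rest are truncated to it.
def deblur_trim(reads, trim_length):
    if trim_length < 0:          # -1 disables length trimming/filtering
        return list(reads)
    return [r[:trim_length] for r in reads if len(r) >= trim_length]

# Hypothetical library: one peak near 250 bp (60 reads), another near 200 bp (40 reads).
reads = ["A" * 250] * 60 + ["A" * 200] * 40

print(len(deblur_trim(reads, 250)))  # 60  -> the entire 200 bp peak is lost
print(len(deblur_trim(reads, 200)))  # 100 -> all kept, but truncated to 200 bp
print(len(deblur_trim(reads, -1)))   # 100 -> no length filtering at all
```

So with two peaks you face a trade-off: trimming at the longer peak discards the shorter one, while trimming at the shorter peak keeps everything at the cost of truncating the longer reads.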

Hello,

To better understand the DADA2 and deblur parameters, plus the primer-removal difference, I would suggest taking a look at this post and making sure to identify the assumptions that each of these tools makes and how those relate to their parameters.

Now, IMO it’s really hard to make direct comparisons between the two methods, mainly because of the error models. Remember that DADA2 creates an error model per run (“on the fly”), while deblur uses a general predefined model.

Anyway, thanks for sharing your summaries. One thing that is clear from the three files (demux.qzv, demux-joined-filtered.qzv, deblur-stats.qzv) is that demux.qzv doesn’t have the same initial input as the other two. Is this expected? Perhaps you can try using the same inputs. Also, out of curiosity, what are the results if you use --p-trim-length 240 in deblur?

Thanks


Hi @Lu_Yang @antgonza,

You can try setting --p-trim-length to -1 (no trimming) to see; the bigger the number you set, the fewer reads you will have.

In deblur, your joined reads should be trimmed based on the quality scores; check out the tutorial Alternative methods of read-joining in QIIME 2 — QIIME 2 2018.2.0 documentation, and also the [Deblur vs DADA2 Questions] post that @antgonza referred to.

Now, IMO it’s really hard to make direct comparisons between the two methods, mainly because of the error models

I am not sure about that, but the huge drop in total frequencies made me a little wary of the deblur method.

is that demux.qzv doesn’t have the same initial input as the other two. Is this expected? Perhaps you can try using the same inputs.

I had to re-import my fastq files after removing underscores from the filenames in order to use deblur, which seemed to be sensitive to "_". This is why the inputs may look a bit different, but overall they are pretty much comparable.

Also, out of curiosity, what are the results if you use --p-trim-length 240 in deblur?

Using a trim length of 240, we see an increase in total frequency from ~14,000 up to ~55,000, but that is still far from the ~234,000 total frequency in DADA2. deblur-stats-240.qzv (192.2 KB)

Thanks for your comments!

Hello again,

Thanks for all the tests and replies.

Could you also post the stats .qzv for DADA2 and, if possible, the summary .qzv for each of the resulting feature tables (bioms)? Basically, I would like to find a few sequences that show up in one method but not in the other, and try to understand why this is happening.

Agreed, it's always good to understand why; just note that more sequences does not always mean that the method/algorithm is right/better. For example, no Q/C at all will yield the most sequences.

Best,
