My reads had both adapters and barcodes trimmed, but primers were not removed.
What I noticed is that my FeatureTable had a very low number of sequence counts per sample compared to the original method, where I used DADA2. On average there was at least a 4-fold reduction in sequence counts using the vsearch-deblur method!
Why is there such a difference between the two methods? Isn't DADA2 supposed to join my PE reads?
This is a bit surprising but not necessarily wrong: they are different denoising methods, so they may behave differently (particularly as joining reads prior to denoising can change the results).
I suspect that the particular commands you used may be at fault here, though. A likely problem area is the trim length that you set in deblur: if this is greater than the length of some joined reads, those reads will be dropped. Could you please double-check those values to make sure that they are appropriate, given the quality and length of your joined reads (as discussed in the tutorial that you linked to)?
If that is not the cause, could you please share:
1. the exact commands that you used for vsearch, deblur, and dada2 (a sketch of what these typically look like is included below for reference)
2. the demux summaries (quality plots) for your inputs (i.e., the joined input to deblur and the paired input to dada2)
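For context, the vsearch + deblur workflow from that tutorial, the corresponding DADA2 run, and the demux summaries typically look something like the sketch below. The filenames and the trimming/truncation values are placeholders to be replaced with your own, and some action/option names differ slightly between QIIME 2 releases (e.g., join-pairs vs. merge-pairs, q-score-joined vs. q-score), so treat this only as a template for reporting your exact commands.

```
# Join paired-end reads with vsearch, then quality-filter the joined reads
qiime vsearch join-pairs \
  --i-demultiplexed-seqs demux.qza \
  --o-joined-sequences demux-joined.qza

qiime quality-filter q-score-joined \
  --i-demux demux-joined.qza \
  --o-filtered-sequences demux-joined-filtered.qza \
  --o-filter-stats demux-joined-filter-stats.qza

# Denoise the joined reads with deblur (trim length chosen from the joined-read quality plot)
qiime deblur denoise-16S \
  --i-demultiplexed-seqs demux-joined-filtered.qza \
  --p-trim-length 292 \
  --p-sample-stats \
  --o-representative-sequences rep-seqs-deblur.qza \
  --o-table table-deblur.qza \
  --o-stats deblur-stats.qza

# For comparison: denoise the unjoined paired-end reads with DADA2
# (the truncation values below are examples only)
qiime dada2 denoise-paired \
  --i-demultiplexed-seqs demux.qza \
  --p-trim-left-f 0 --p-trim-left-r 0 \
  --p-trunc-len-f 250 --p-trunc-len-r 250 \
  --o-representative-sequences rep-seqs-dada2.qza \
  --o-table table-dada2.qza \
  --o-denoising-stats dada2-stats.qza

# Demux summaries (quality plots) for both inputs
qiime demux summarize --i-data demux-joined-filtered.qza --o-visualization demux-joined-filtered.qzv
qiime demux summarize --i-data demux.qza --o-visualization demux.qzv
```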
Hi @asr17,
I'm not certain what is going on, but I think the issue might be with your setting for --p-trim-length. Can you try setting that to 286 instead of 292 to see if that dramatically increases your read count? Based on the demux-joined-filtered.qzv file that you attached to your last message, it looks like 292 is just about the length of most of your reads, so I'm wondering if dropping that slightly will help. That same visualization tells me that all of the sampled reads were at least 286 bases long, which is where I got that value from. It's possible that you can increase it from 286, but I'd like you to first try with that value so we can determine whether this is the cause of the issue. Your new command should be:
Hi @gregcaporaso,
I decreased the trim length to 286 as you suggested; it improved the sequence counts by a mere 1-2%, which is still way below the DADA2 method. I am attaching the new deblur stats file: deblur-stats.qzv (192.2 KB)
Thanks for your help!
I had the same situation as you. Based on my experience, --p-trim-length does not have the same meaning as the DADA2 truncation parameters: it seems that all reads shorter than the trim length (286 in your case) are simply dropped. You can try setting --p-trim-length -1 (no trimming) to see the difference; the larger the value you set, the fewer reads you will keep. That was just my experience.
But I still cannot work out the appropriate trim length for reads in deblur. Looking forward to suggestions from @Nicholas_Bokulich and @gregcaporaso.
In DADA2 we can trim on both the left and the right, but in deblur there is only one parameter, --p-trim-length, and some people say all reads will be trimmed to the same length (as the tutorial describes). How can that be? If my read lengths are mostly distributed in two peaks, one at 250 bp and the other at 200 bp, how should I set it? It is still confusing.
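For anyone wanting to check where their joined reads actually fall before picking a trim length, a quick tally outside of QIIME 2 looks something like this (the filename is just an example):

```
# Print the length of each read (the sequence is line 2 of every 4-line fastq record),
# then count how many reads fall at each length
zcat sample1_joined.fastq.gz | awk 'NR % 4 == 2 {print length($0)}' | sort -n | uniq -c
```

If the counts really do show two peaks, trimming at the shorter length keeps all reads (the longer ones just get truncated), while trimming at the longer length drops the entire shorter peak.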
To better understand the DADA2 and deblur parameters, plus the primer-removal difference, I would suggest taking a look at this post, being sure to identify the assumptions that each of these tools makes and how those assumptions relate to their parameters.
Now, IMO it's really hard to make direct comparisons between the 2 methods, mainly because of the error models. Remember, DADA2 learns an error model per run ("on the fly"), while deblur uses a general predefined model.
Anyway, thanks for sharing your summaries. Something clear from the 3 files (demux.qzv, demux-joined-filtered.qzv, deblur-stats.qzv) is that demux.qzv doesn't have the same initial input as the other 2. Is this expected? Perhaps you can try using the same inputs. Also, out of curiosity, what are the results if you use --p-trim-length 240 in deblur?
Now, IMO it’s really hard to make direct comparisons between the 2 methods, mainly because of the error models
I am not sure about that, but the huge drop in total frequencies made me a little wary of the deblur method.
is that demux.qzv doesn’t have the same initial input as the other 2. Is this expected? Perhaps you can try using the same inputs.
I had to re-import my fastq files after removing underscores from the filenames in order to use deblur, which seemed to be sensitive to "_". This is why the inputs may look a bit different, but overall they are pretty much comparable.
Also, out of curiosity, what are the results if you use --p-trim-length 240 in deblur?
Using a trim length of 240, we see an increase in total frequency from ~14,000 up to ~55,000, but that is still far from the ~234,000 total frequency in DADA2. deblur-stats-240.qzv (192.2 KB)
Could you also post the stats qzv for DADA2 and, if possible, a summary qzv for each of the resulting feature tables? Basically, I would like to find a few sequences that show up in one method but not in the other, and try to understand why this is happening.
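In case it helps, the visualizations I have in mind can be generated roughly like this (the artifact names are placeholders for your own files):

```
# Tabulate the DADA2 denoising stats
qiime metadata tabulate \
  --m-input-file dada2-stats.qza \
  --o-visualization dada2-stats.qzv

# Summarize each resulting feature table
qiime feature-table summarize \
  --i-table table-dada2.qza \
  --o-visualization table-dada2.qzv

qiime feature-table summarize \
  --i-table table-deblur.qza \
  --o-visualization table-deblur.qzv

# Tabulate the representative sequences so specific features can be compared across methods
qiime feature-table tabulate-seqs \
  --i-data rep-seqs-dada2.qza \
  --o-visualization rep-seqs-dada2.qzv

qiime feature-table tabulate-seqs \
  --i-data rep-seqs-deblur.qza \
  --o-visualization rep-seqs-deblur.qzv
```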
Agreed, it's always good to understand why; just note that more sequences does not always mean that the method/algorithm is right/better. For example, doing no Q/C at all will yield the most sequences.