How to choose a reasonable trim length for the Deblur plugin?

Hey everyone,

Happy to post my first topic as a new member of the QIIME 2 forum.

I recently taught myself about denoising protocols, and a few points about read truncation and trimming still confuse me:
1. When we use the --p-trunc and --p-trim options in DADA2, the lower-quality nucleotides at both the 5' and 3' ends are discarded. Deblur, however, trims reads to a fixed length regardless of the quality at the 5' end (did I misunderstand the methods?). Do DADA2 and Deblur differ in how they trim? If so, will the differences have a large impact on the downstream analysis?
2. Although I have learned to use the interactive quality plots to guide truncation and trimming, I have not yet figured out how to choose a reasonable trim length for Deblur.

Here are my visualization results for two paired-end samples:


I noticed lower quality scores at 5 nt and 300 nt in the forward reads (the median quality score is below Q20), and likewise at 5 nt and 270 nt in the reverse reads. However, after merging the forward and reverse reads, I could not find any position with a quality score below Q30 in the interactive plot. This makes it difficult for me to choose a reasonable trim length for paired-end reads in the Deblur protocol.

More details can be found in demux.qzv (313.6 KB) and merged-demux.qzv (298.0 KB).

To resolve these questions, I reviewed several similar topics on the QIIME 2 forum, but they did not fully address my confusion. Any ideas or help would be appreciated. Thank you sincerely.


Hello @peccat,

When we use the --p-trunc and --p-trim options in DADA2, the lower-quality nucleotides at both the 5' and 3' ends are discarded. Deblur, however, trims reads to a fixed length regardless of the quality at the 5' end (did I misunderstand the methods?).

The --p-trunc-* and --p-trim-* parameters in DADA2 truncate/trim at the specified positions and do not take quality scores into account. There are other parameters, such as --p-trunc-q, for quality-score-aware truncation.
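
To make "positional, not quality-based" concrete, here is a minimal Python sketch. It is only an illustration, not DADA2's actual implementation: the function name `positional_trim` and the toy reads are made up, and dropping reads shorter than the truncation length is an assumption based on how the plugin describes truncation.

```python
def positional_trim(seq, trim_left, trunc_len):
    """Keep positions trim_left .. trunc_len-1 of a read.

    Quality scores are never consulted; only positions matter.
    Returns None when the read is shorter than trunc_len
    (ASSUMPTION: such reads are discarded rather than kept short).
    A trunc_len of 0 means "do not truncate".
    """
    if trunc_len and len(seq) < trunc_len:
        return None
    end = trunc_len if trunc_len else len(seq)
    return seq[trim_left:end]


# Toy reads, invented for illustration only.
reads = ["ACGTACGTAC", "ACGTAC", "ACGTACGTACGT"]
trimmed = [positional_trim(r, trim_left=2, trunc_len=8) for r in reads]
print(trimmed)
```

Every surviving read ends up with the same length (trunc_len minus trim_left), whatever its quality profile looked like.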

Do DADA2 and Deblur differ in how they trim? If so, will the differences have a large impact on the downstream analysis?

As far as the trimming/truncating options go, no, there is no difference between the two: both allow you to trim any number of bases from the 5' end and any number of bases from the 3' end. However, Deblur only supports single-end reads. We generally recommend using DADA2 when possible.

Although I have learned to use the interactive quality plots to guide truncation and trimming, I have not yet figured out how to choose a reasonable trim length for Deblur.

In your case I don't think Deblur will be possible, because you have paired-end reads.

I noticed lower quality scores at 5 nt and 300 nt in the forward reads (the median quality score is below Q20), and likewise at 5 nt and 270 nt in the reverse reads. However, after merging the forward and reverse reads, I could not find any position with a quality score below Q30 in the interactive plot. This makes it difficult for me to choose a reasonable trim length for paired-end reads in the Deblur protocol.

I don't think the small dips in quality at the 5' end of the reads are big enough to warrant trimming. From looking at the plots, I would expect truncating the forward reads at around 260 and the reverse reads at around 220 to give good results. The size of the sequenced amplicon will also play a role (the truncated reads need to overlap by at least 12 bases).
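
The overlap requirement is simple arithmetic, sketched below in Python. The truncation lengths (260 and 220) and the 12-base minimum come from the reply above; the amplicon length of 460 is a hypothetical placeholder, so substitute the length of your own sequenced fragment (as it appears in the reads) before drawing any conclusion.

```python
MIN_OVERLAP = 12  # minimum overlap quoted in the reply above


def overlap_after_truncation(trunc_f, trunc_r, amplicon_len):
    """Bases shared by the truncated forward and reverse reads."""
    return trunc_f + trunc_r - amplicon_len


def longest_mergeable_amplicon(trunc_f, trunc_r, min_overlap=MIN_OVERLAP):
    """Longest amplicon that still leaves min_overlap bases of overlap."""
    return trunc_f + trunc_r - min_overlap


amplicon_len = 460  # HYPOTHETICAL value, not taken from the thread
ov = overlap_after_truncation(260, 220, amplicon_len)
print(ov, ov >= MIN_OVERLAP)
print(longest_mergeable_amplicon(260, 220))
```

If the computed overlap falls below 12 for your amplicon, the truncation positions need to move toward the 3' ends, at the cost of keeping more low-quality bases.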

Hello @peccat,

As @cherman2 pointed out, you can use Deblur with merged reads if you prefer it over DADA2. In that case you would use the vsearch merge-pairs action first. It's still probably best to use DADA2 if possible.


Hello @colinvwood,

As @cherman2 pointed out, you can use Deblur with merged reads if you prefer it over DADA2. In that case you would use the vsearch merge-pairs action first. It's still probably best to use DADA2 if possible.

Thank you for your kind advice.

Because of some errors, the DADA2 plugin doesn't work in my environment. My current goal is to learn the 16S amplicon analysis protocol and then test it, so I think it is fine to keep using Deblur for denoising for now, without reinstalling QIIME 2.

In your case I don't think Deblur will be possible, because you have paired-end reads.

Actually, before importing the reads, I had already used the vsearch method you mentioned to merge my forward and reverse reads, and then generated the summary visualization to view the interactive plots.

And in terms of quality,

I don't think the small dips in quality at the 5' end of the reads are big enough to warrant trimming. From looking at the plots, I would expect truncating the forward reads at around 260 and the reverse reads at around 220 to give good results. The size of the sequenced amplicon will also play a role (the truncated reads need to overlap by at least 12 bases).

Your suggestion for using DADA2 is really helpful. Thank you so much; I will revisit the relevant guide to take the overlap into account.

Besides, do you have any other ideas about the trim length for Deblur? Looking forward to your suggestions.

Hello @peccat,

Besides, do you have any other ideas about the trim length for Deblur? Looking forward to your suggestions.

I don't think that trimming the merged reads makes much sense. As we said earlier, your 5' ends don't need trimming and now that the 3' ends have been overlapped, it's no longer possible to truncate there.
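
A rough way to picture why the 3' ends are no longer reachable is to lay the two reads out on the merged read, as in the Python sketch below. It is an idealized model, not what vsearch actually computes: it assumes a plain (non-staggered) merge, and the read lengths and overlap are hypothetical numbers chosen only for illustration.

```python
def merged_layout(len_f, len_r, overlap):
    """Return (merged_len, forward_span, reverse_span, overlap_span).

    Spans are 0-based, half-open intervals on the merged read.
    ASSUMPTION: 0 < overlap <= min(len_f, len_r), i.e. a simple merge
    in which neither read extends past the other's 5' end.
    """
    merged_len = len_f + len_r - overlap
    forward_span = (0, len_f)
    reverse_span = (merged_len - len_r, merged_len)
    overlap_span = (reverse_span[0], forward_span[1])
    return merged_len, forward_span, reverse_span, overlap_span


# HYPOTHETICAL numbers: two 300-nt reads overlapping by 140 bases.
merged_len, fwd, rev, ovl = merged_layout(300, 300, 140)
print(merged_len, fwd, rev, ovl)

# The forward read's 3' end lands at merged position fwd[1] and the
# reverse read's 3' end at rev[0]; both fall inside the merged read.
print(0 < rev[0] < fwd[1] < merged_len)
```

In this layout both ends of the merged read are 5' ends of the original reads, so cutting the merged read from either side removes the better-quality bases rather than the weaker 3' tails.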


Hello @colinvwood,

I don't think that trimming the merged reads makes much sense. As we said earlier, your 5' ends don't need trimming and now that the 3' ends have been overlapped, it's no longer possible to truncate there.

It is fine if no trimming is needed. Thanks again for your patient replies.
