DADA2 paired-end inquiry (trunc settings)

Hi guys,

I've run DADA2 for a single-end job before, and with --p-trunc-len 150 all my sequences came out normalized to 150 bp, for example.

However, for paired-end sequences, I've noticed that with --p-trunc-len-f 150 and --p-trunc-len-r 150, the output isn't actually normalized to 150 bp. Instead, 150 bp becomes the minimum length. I would rather have a normalized output, if that is possible.

I may be misunderstanding how paired-end DADA2 runs. If anyone can clarify, it'd be much appreciated.

Hi @sabasu,
When you run DADA2 in single-end mode, the truncation value is used to trim all your reads at position 150, and any reads shorter than 150 bp are discarded, so it makes sense that in that case all your reads are exactly 150 bp.
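To make the single-end behaviour concrete, here is a minimal Python sketch of the truncate-and-discard rule described above. It is only an illustration, not DADA2's actual code: the function name `truncate_single_end` and the toy reads are made up for this example, and a truncation length of 8 stands in for 150.

```python
def truncate_single_end(reads, trunc_len):
    """Cut every read to trunc_len bases and drop reads shorter than that.

    Hypothetical helper for illustration; it only mimics the rule
    described in the post, not DADA2's real filtering step.
    """
    return [read[:trunc_len] for read in reads if len(read) >= trunc_len]


# Toy reads of length 10, 6 and 12; truncation length 8 stands in for 150.
reads = ["ACGTACGTAC", "ACGTAC", "ACGTACGTACGT"]
kept = truncate_single_end(reads, 8)

print(kept)                    # the 6 bp read is gone
print({len(r) for r in kept})  # every surviving read has the same length
```

Every read that survives has exactly the truncation length, which is why single-end output looks normalized.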
When you run DADA2 in paired-end mode, however, there is an extra step in which the corresponding forward and reverse reads are merged. Depending on the size of the overlap region and the natural variability of that region, the merged reads will not all be the same length. For example, in single-end mode (forward reads only):

AAATTTGGGCCC <- Forward read only
------------ <- total length is 12bp

In paired-end mode:
AAATTTGGGCCC___ <- Forward read
___AAACCCGGGTTT <- Reverse read
--------------- <- combined length is 15bp

Now imagine that the size of the overlap above naturally varies across taxa, so some merged reads will be longer and some shorter. That is why you see length variability in paired-end mode compared to single-end mode. Hope that clarifies it a bit.
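The arithmetic behind the diagram can be sketched in a few lines of Python. This is a simplified illustration under the assumption that a merged read is just the forward read plus the reverse read minus the bases they share; `merged_length` is a hypothetical helper, not a DADA2 or QIIME 2 function, and the overlap values are invented.

```python
def merged_length(len_forward, len_reverse, overlap):
    """Length of a merged pair: both reads minus the shared overlap.

    Hypothetical helper for illustration only.
    """
    if overlap < 0 or overlap > min(len_forward, len_reverse):
        raise ValueError("overlap must be between 0 and the shorter read length")
    return len_forward + len_reverse - overlap


# The diagram above: two 12 bp reads sharing 9 positions give 15 bp.
print(merged_length(12, 12, 9))

# Both reads truncated to 150 bp, with made-up overlaps for three taxa.
for overlap in (20, 30, 47):
    print(overlap, merged_length(150, 150, overlap))
```

With both reads truncated to 150 bp, the merged length depends only on the overlap, so amplicons of different lengths produce merged reads of different lengths even though every input read was cut to the same size.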

@Mehrbod_Estaki

Extremely thorough and helpful!

Appreciate the prompt reply. Have a great day.

@Mehrbod_Estaki

Just another quick question... I know that Deblur is 'agnostic' to single- or paired-end data, but would you say that it is an inaccurate method for paired-end data?

Hi @sabasu,
Good question. Deblur is designed to work on single reads, but it can technically work with paired-end reads as long as you merge them before running Deblur. It will just treat each merged pair as one longer read, which is why it is described as 'agnostic'. It can do this because Deblur uses a static error model for denoising, which means it doesn't care what the quality scores of your reads are. Instead, it uses its own a priori error model, developed previously from Illumina data.
This also means that the error model becomes more stringent as read length increases. In my own experience Deblur works wonderfully for what it was designed for, which is single-end reads, but it becomes a bit too conservative for my liking as read lengths increase (e.g. when working with paired-end reads). That's just my preference, though. In short, it works totally fine with paired-end data and it will be faster than DADA2, but with paired-end data, especially longer amplicons like V3-V4, you generally lose more reads because of the conservative nature of its error model. Why not try both and see for yourself how it works out!
