Using merged sequences from vsearch join-pairs and DADA2?

The DADA2 plugin offers the possibility to run the analysis on single-end reads by using the denoise-single method. This method requires, as input, an artifact of type either SampleData[SequencesWithQuality] or SampleData[PairedEndSequencesWithQuality].

In principle, this makes sense (although I don't see why paired-end reads are accepted as input, since the method is meant for single-end data). However, if I want to merge R1 and R2 beforehand, I have to use the vsearch plugin's join-pairs function, which returns an artifact whose type is SampleData[JoinedSequencesWithQuality], making it impossible to use with DADA2 denoise-single. Maybe there is a way of transforming a JoinedSequencesWithQuality artifact into a SequencesWithQuality one, but I wasn't able to find it.

Am I missing something here?

Hi @gabt ,

There is a reason for this. Using vsearch to join the reads modifies the per-base quality scores, which in turn will mess up dada2's error model. Hence, joined reads should not be passed as input to dada2.

You should instead:

  1. use dada2 denoise-paired, which will denoise the forward and reverse reads independently and then join (merge) them.
  2. OR use join-pairs but then pass to q2-deblur for denoising, instead of dada2 (deblur uses a static error model and disregards the per-base quality scores, so is fine to use on joined reads).
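In case it helps to see the two routes side by side, here is a rough sketch that only builds and prints the command lines; it does not run QIIME 2. All file names and the truncation/trim lengths are hypothetical placeholders, and the flag names are written as I understand them for these plugins, so please check them against `--help` for your installed version.

```python
import shlex

# Sketch only: assembles the QIIME 2 CLI calls for the two options above.
# File names and length values are hypothetical placeholders, not
# recommendations; verify the flags with `qiime <plugin> <action> --help`.


def dada2_paired_command(demux="demux-paired.qza", trunc_f=240, trunc_r=200):
    """Option 1: DADA2 denoises R1 and R2 independently, then merges them."""
    return [
        "qiime", "dada2", "denoise-paired",
        "--i-demultiplexed-seqs", demux,
        "--p-trunc-len-f", str(trunc_f),
        "--p-trunc-len-r", str(trunc_r),
        "--o-table", "table-dada2.qza",
        "--o-representative-sequences", "rep-seqs-dada2.qza",
        "--o-denoising-stats", "stats-dada2.qza",
    ]


def join_then_deblur_commands(demux="demux-paired.qza", trim_length=250):
    """Option 2: join with vsearch first, then denoise with deblur."""
    joined = "demux-joined.qza"
    join = [
        "qiime", "vsearch", "join-pairs",
        "--i-demultiplexed-seqs", demux,
        "--o-joined-sequences", joined,
    ]
    deblur = [
        "qiime", "deblur", "denoise-16S",
        "--i-demultiplexed-seqs", joined,  # output of the join step
        "--p-trim-length", str(trim_length),
        "--o-table", "table-deblur.qza",
        "--o-representative-sequences", "rep-seqs-deblur.qza",
        "--o-stats", "stats-deblur.qza",
    ]
    return [join, deblur]


if __name__ == "__main__":
    print(shlex.join(dada2_paired_command()))
    for cmd in join_then_deblur_commands():
        print(shlex.join(cmd))
```

The point of the second function is that the joined artifact only ever goes to deblur, never to dada2.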

Note that denoise-single only uses the forward reads; the reverse gets discarded. This is not relevant to your current query, but just want to add that in case you were planning to run this on paired-end reads.

Good luck!

@Nicholas_Bokulich, thank you very much for the nice explanation, which I found very interesting. I didn't know that vsearch changes the reads' quality scores although, now that you mention it, it makes total sense, since the overlapping part has to be given a score too (I guess that is the reason). So, in general, DADA2 should not be used with reads that were joined beforehand, if I understood correctly. And this is true with or without QIIME 2. So there is no way of merging R1 and R2 first and only then running DADA2 on the merged reads, am I right?

Exactly. I think the Q score at the overlapping part becomes the higher of the two Q scores (from the overlapping bases) but I cannot recall exactly how this is calculated as it is different for different joining methods.
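To make the effect concrete, here is a toy illustration. It assumes the "higher of the two" rule mentioned above purely for the sake of the example; it is not vsearch's actual calculation, and the Phred values and the helper `merge_overlap_quality` are made up.

```python
# Toy illustration only: assumes the merged Q score in the overlap is the
# higher of the two input scores. Real joining tools compute this
# differently; the numbers below are invented Phred scores.


def merge_overlap_quality(fwd_q, rev_q, overlap):
    """Merge two per-base quality lists that overlap by `overlap` bases.

    fwd_q: quality scores of the forward read (5' to 3').
    rev_q: quality scores of the reverse read, already reverse-complemented
           so that it runs in the same direction as the forward read.
    """
    if overlap <= 0 or overlap > min(len(fwd_q), len(rev_q)):
        raise ValueError("overlap must be between 1 and the shorter read length")
    head = fwd_q[:-overlap]   # forward-only region
    tail = rev_q[overlap:]    # reverse-only region
    middle = [max(f, r) for f, r in zip(fwd_q[-overlap:], rev_q[:overlap])]
    return head + middle + tail


fwd = [38, 37, 35, 30, 22, 15]  # quality decays toward the 3' end
rev = [18, 25, 33, 36, 38, 38]  # reverse read after reverse-complementing
merged = merge_overlap_quality(fwd, rev, overlap=3)
print(merged)  # [38, 37, 35, 30, 25, 33, 36, 38, 38]
```

In the merged read, the scores in the overlap (30, 25, 33) match neither original read, so they no longer describe how the sequencer behaved at those cycles, which is the information dada2 learns its error model from.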

Correct

Correct

Correct. The two options I gave above are the options you have for denoising paired-end reads.

Good luck!

@Nicholas_Bokulich alright, thanks a lot!
