how to convert DADA2 plugin so that it will merge all paired reads after denoising

ek_97 · January 31, 2020, 8:18pm

Hi, we are using QIIME2-2019.10 installed by Conda.
We have had trouble merging paired reads using DADA2 for 16S rRNA gene reads. As a test, with CLC Genomics Workbench about 50% of read pairs merged successfully, but with DADA2 we are lucky to get 5%. This has been a consistent problem across different sequence sets when using DADA2, which we are otherwise happy with. As a result we have used single-end reads, as others have suggested.

Our question is how difficult it would be to alter the DADA2 plugin in QIIME2 so that it allows the mergePairs(..., justConcatenate=TRUE) option, as is possible when you run DADA2 natively. Would doing this within QIIME2 require us to alter the code of the DADA2 plugin, and if so, how hard would that be?

thanks, Elissa Kim and Guy Adami

Mehrbod_Estaki · January 31, 2020, 10:59pm

Hi Elissa and Guy,
The optimization of DADA2 truncating parameters in order to get proper merging has been thoroughly covered in the forum. Ultimately you will need a minimum of 12 nt overlap between your forward and reverse reads for DADA2 to merge them.
Can you describe your setup so that we can perhaps help with that before looking at other options? What primers are you using, what is the length of the overlap region with those, what sequencing platform did you use, and how long are your reads (2x300)?

When your reads do not have sufficient overlap, I wouldn't trust using paired-end reads (in any software). How do you know the true position of those reads and the profile of the insertion? Not to mention that it is nearly impossible to compare the results to other datasets. Without proper merging I would stick with just the forward reads.

That being said, if you still really wanted to use justConcatenate, I would just stick with native DADA2 in R and then import your results into Qiime2 afterwards. Otherwise you would have to clone yourself a copy of the Qiime2 repo and manually change the q2-dada2 script to include justConcatenate, which is likely more of a hassle than it's worth, to be honest.
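
For readers wondering what justConcatenate actually produces, here is a minimal Python sketch of the idea. It is not DADA2 or q2-dada2 code; it only mimics the behaviour described in the DADA2 documentation for mergePairs (forward read, a spacer of 10 Ns, then the reverse-complemented reverse read). The function names and the toy reads are made up for illustration.

```python
# Illustrative sketch only: mimics the concept behind DADA2's
# mergePairs(..., justConcatenate=TRUE). Not DADA2 or q2-dada2 code.
# Function names and example reads are hypothetical.

COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def just_concatenate(forward: str, reverse: str, spacer_len: int = 10) -> str:
    """Join forward read + N spacer + reverse-complemented reverse read.

    No alignment is attempted, so any true overlap between the two
    reads is ignored.
    """
    return forward + "N" * spacer_len + reverse_complement(reverse)


fwd = "ACGTACGTAC"  # toy forward read
rev = "TTGGCCAATT"  # toy reverse read, as sequenced
joined = just_concatenate(fwd, rev)
print(joined)
print(len(joined))
```

The output length is always len(forward) + 10 + len(reverse), whether or not the reads really overlap. If they do overlap, the shared region ends up in the concatenated sequence twice, which is one reason to be cautious with this option.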

Nicholas_Bokulich · February 3, 2020, 9:45pm

I sort of disagree with this, but it is just a matter of differing perspectives, I suppose. One advantage of "doing it all in QIIME 2" is that everything is stored in provenance. These could also be changes that you consider contributing to the q2-dada2 source code, though doing so would require 1) approval of the q2-dada2 developers and 2) a little more work to expose these options, so get in touch if you want to contribute.

Here is a topic where I described the process for a similar problem, and link at the end to the file that should be altered in your local branch:

Or here is a tutorial for importing dada2 objects into QIIME 2:

Choose your own adventure

ek_97 · February 3, 2020, 10:03pm

Hi Mehrbod,
Thanks for the information. The reads are 2x300, from a MiSeq machine, with 280 bases of usable sequence for the forward reads and 260 for the reverse (PHRED quality 25) which gives an overlap of about 33 bases. The primers we use to sequence are at the 3' ends of CS1_27 and CS2_534. If we do concatenate sequences we will test how the artificially merged read pairs compare to both forward reads and reverse reads examined separately in regard to taxa identification and numbers to validate the approach.
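
The overlap estimate above can be reproduced with a quick back-of-the-envelope calculation. This is only a sketch: the amplicon length of 507 is an assumption taken from the primer names (534 - 27), real amplicons vary in length between taxa, and the second pair of truncation lengths is hypothetical. The 12 nt threshold is the minimum overlap quoted earlier in the thread.

```python
# Rough estimate of read overlap after truncation.
# ASSUMPTION: amplicon length is approximated from the primer names
# (534 - 27 = 507); the true length differs between taxa.

MIN_OVERLAP = 12  # minimum overlap quoted earlier in the thread


def expected_overlap(fwd_len: int, rev_len: int, amplicon_len: int) -> int:
    """Bases shared by forward and reverse reads (negative means a gap)."""
    return fwd_len + rev_len - amplicon_len


amplicon_len = 534 - 27  # 507, assumed

# Usable lengths reported in the post: 280 forward, 260 reverse.
overlap = expected_overlap(280, 260, amplicon_len)
print(overlap, overlap >= MIN_OVERLAP)

# Hypothetical, more aggressive truncation for comparison.
short_overlap = expected_overlap(250, 230, amplicon_len)
print(short_overlap, short_overlap >= MIN_OVERLAP)
```

With 280 and 260 usable bases the estimate is 33 bases of overlap, matching the figure in the post. With the hypothetical 250 and 230 the result is negative, meaning the reads would no longer meet at all, so truncation lengths chosen at the denoising step can by themselves decide whether merging is possible.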

It is interesting that we can use DADA2 natively and then import the result into QIIME2. Unfortunately our skill level in R is limited. So do you think it would be a lot of work for someone who is familiar with programming to adapt the DADA2 plugin to include the justConcatenate option?

thanks, Guy

Mehrbod_Estaki · February 3, 2020, 10:59pm

Hi @ek_97,
Sounds like you should have enough overlap to properly merge these with DADA2; perhaps your run just needs some parameter optimization. Would you mind posting the demux summary visualization so we can figure out why the merging is failing? Given that your reads technically overlap, I think using justConcatenate will actually make things even worse by artificially increasing the length of your reads. Overall, I just don't see this as being a good option at all.

That being said, if you do end up using justConcatenate for whatever reason, I would again be very careful with the results.

Running DADA2 natively and importing the results is actually a rather straightforward process, and doesn't require too much skill if you're willing to dedicate a little time to it. The DADA2 tutorials are very easy to follow, and so is the tutorial for importing DADA2 results into QIIME 2. Most of it would be copy & paste.

I suggest you don't hold your breath for this feature to be implemented in the immediate future, simply because it is not a priority. That's not to say it won't be implemented somewhere down the line, just not at the moment.

Btw, as to @Nicholas_Bokulich's comment, I should clarify that I agree: I think it would be wonderful to have q2-dada2 parameters on par with the native version through user contributions, especially so that everything can be done in Qiime2 and stored within provenance. I just meant that the justConcatenate option specifically is, to me, not really worth anyone's time.