Can DADA2 be run on pre-joined reads?


(Sindhu Mohandas) #1

I have a similar question. I have 16S V1-V3 amplicon data in which the forward and reverse reads have already been assembled and the primers, adapters, and tags trimmed.
I treated it as single-end data for analysis, and for DADA2 I chose --p-trim-left 0 and --p-trunc-len 500 (so essentially not trimming much). Is this an acceptable way of using this data for analysis?

(Mehrbod Estaki) #2

Hi @sindhu_mohandas,
No, actually. DADA2 will not perform properly if your reads have been pre-joined with another tool. The main issue is that joining conflicts with the way DADA2 builds its error model. When you join reads, the joining tool assigns new quality scores in the overlapping region (each tool calculates them differently), and these are not the original q-scores that DADA2 relies on. So you want to run DADA2 on the raw data, without any prior modifications. What you’ve done is more appropriate for Deblur, however, as Deblur uses a static error model and does not rely on quality scores.
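
To make the quality-score point concrete, here is a minimal Python sketch. It does not reproduce how any particular merging tool works; the two rules below (`merge_q_sum` and `merge_q_max`) are hypothetical stand-ins for the different ways tools can recompute a score where the forward and reverse bases agree. The only point is that the same pair of input scores can come out as different merged scores, and neither is the sequencer's original estimate.

```python
# Illustrative only: two hypothetical rules for assigning a quality score
# to an overlapping base that both reads agree on. Real merging tools each
# use their own formula; these rules are assumptions for demonstration.

def phred_to_prob(q):
    """Convert a Phred quality score to an error probability."""
    return 10 ** (-q / 10)

def merge_q_sum(q_fwd, q_rev, cap=41):
    """Hypothetical rule A: add the two scores, capped at `cap`."""
    return min(q_fwd + q_rev, cap)

def merge_q_max(q_fwd, q_rev):
    """Hypothetical rule B: keep the higher of the two scores."""
    return max(q_fwd, q_rev)

q_fwd, q_rev = 20, 15
for name, rule in [("sum", merge_q_sum), ("max", merge_q_max)]:
    q = rule(q_fwd, q_rev)
    print(f"{name}: Q{q}, error probability {phred_to_prob(q):.4f}")
```

Whichever rule a tool applies, the merged scores in the overlap no longer mean what the sequencer's scores meant, which is the problem for an error model learned from those scores.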

(Sindhu Mohandas) #3

Thanks, appreciate the response! So I could use Deblur and proceed with the analysis that way? I have been using the Moving Pictures tutorial to work through the data. Does using V1-V3 change anything else, since the tutorial seems to use V3-V4?

Also, how do I make a classifier? The primers were 8F and 534R. Specifically, what values should I use for --p-trunc-len, --p-min-length, and --p-max-length in the following step:
qiime feature-classifier extract-reads \
  --i-sequences 85_otus.qza \
  --p-trunc-len 120 \
  --p-min-length 100 \
  --p-max-length 400 \
  --o-reads ref-seqs.qza

Thanks a lot for helping with the questions

(Mehrbod Estaki) #5

Hi @sindhu_mohandas,

The Moving Pictures tutorial actually uses the V4 region, but the only thing that would really change for you is the denoising step, since a different region calls for different truncation parameters in DADA2. This is well covered in various other threads on the forum.

It looks like you have already seen the great tutorial here that shows exactly how to train your own classifier. You should follow that tutorial closely, right from the beginning.
The command you have above needs to be modified to match your case. For example, you need to supply your own primer sequences rather than the ones in the tutorial, as those are specific to the V4 region. Also, as mentioned in one of the blue ‘note boxes’ in that tutorial, paired-end reads end up variable in length after merging, so it is recommended that you don’t do any truncating when extracting your reads. Another point from the tutorial: do not use the 85% Greengenes OTUs file, which is there only for the sake of the tutorial.
Have a read there and let us know if you have any other questions. Good luck!

(Mehrbod Estaki) #6

Oops, ignore that, I forgot you were asking about Deblur.
You’ll just need to set a truncation length, as Deblur requires all reads to be the same length. You could pick a length equal to or shorter than the shortest real read you expect from your primer set. Looking at the demux summary of your merged reads should also help.
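
As a rough way to weigh candidate lengths, the sketch below counts how many reads would survive each one, on the understanding that reads shorter than the chosen length are dropped and longer ones are cut down to it. The function name `fraction_retained` and the read lengths are made up for illustration; in practice the lengths would come from your own demux summary.

```python
def fraction_retained(read_lengths, trim_length):
    """Fraction of reads at least `trim_length` long (shorter reads are lost)."""
    if not read_lengths:
        return 0.0
    kept = sum(1 for n in read_lengths if n >= trim_length)
    return kept / len(read_lengths)

# Hypothetical merged-read lengths, for illustration only.
lengths = [480, 495, 500, 500, 502, 510, 310, 498, 505, 499]

# Compare a few candidate truncation lengths.
for trim in (300, 450, 500):
    print(trim, fraction_retained(lengths, trim))
```

A shorter length keeps more reads but discards sequence from every one of them, so the choice is a trade-off between read count and read length.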

(Sindhu Mohandas) #8

Thanks again, very helpful information. I will work on the classifier as you suggested.
Here is the demux summary

I picked 500 as the truncation length, but maybe I should go with 300?

Also, when I try to run Deblur, it fails with this error:

Plugin error from deblur:

Duplicate sample IDs!

(Sindhu Mohandas) #9

Just an update: I also tried DADA2 and was able to proceed without any problems. In fact, the taxa bar plots looked like this, and they seem consistent with what I would expect

and the diversity plots are also consistent with an analysis of this study done by someone else using other software.

Do you think use of DADA2 in this scenario makes the results I got unreliable?

(Mehrbod Estaki) #11

Certainly not! In fact, for paired-end reads I would personally use DADA2 over Deblur. I just want to check: when you said you used DADA2 instead, did you use the pre-joined reads as your input, or the unjoined reads? For DADA2 you want to be using the unjoined reads, with forward and reverse still separate.

As for your earlier question regarding your run with Deblur, how exactly did you merge your reads? I ask because those quality plots don’t look like the quality plots I would expect from merged reads. The overlap region (which would be in the middle of your plot) usually has much higher quality scores, whereas your plot looks like the reads were simply stitched end to end without any overlap.

On second look, you mentioned you are using the 8F and 534R primers, and your quality plots look like they come from a 2x250 PE run. Is that correct? If so, since your target is over 500 bp, the reads would have no overlap at all, so I’m wondering how DADA2 didn’t give you an error: no reads should have been retained, for lack of merging.
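
The overlap arithmetic behind that last point can be sketched as follows. The amplicon length of 520 bp is an assumed round figure for an 8F/534R product, the 290 bp figure is likewise an assumed V4-sized amplicon for contrast, and `min_overlap=12` is an assumed threshold; substitute the values that apply to your own run and DADA2 version.

```python
def expected_overlap(read_len_fwd, read_len_rev, amplicon_len):
    """Bases shared by the two reads; a negative value means a gap between them."""
    return read_len_fwd + read_len_rev - amplicon_len

def can_merge(read_len_fwd, read_len_rev, amplicon_len, min_overlap=12):
    """True if the reads overlap by at least `min_overlap` bases."""
    return expected_overlap(read_len_fwd, read_len_rev, amplicon_len) >= min_overlap

# 2x250 run against an assumed ~520 bp amplicon
print(expected_overlap(250, 250, 520))
print(can_merge(250, 250, 520))

# Same run against an assumed ~290 bp amplicon
print(can_merge(250, 250, 290))
```

Any truncation applied before merging shortens the reads and reduces the overlap further.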
