Why does deblur work and dada2 does not?

Carla_Uranga · July 28, 2020, 5:19am

Hi, I am using 4 forward and 4 reverse primers to amplify the V1-V9 16S regions. We first sequenced 150 bp. Dada2 does not find any overlapping sequences; however, deblur does. Since we only sequenced 150 bp, is the data obtained from deblur wrong? We are now sequencing 300 bp reads to improve overlap, but I am interested in finding out why deblur works but dada2 does not. Thank you!

Carla

llenzi · July 28, 2020, 7:59am

Hi @Carla_Uranga

Can you explain your experimental design in more detail, please?
When you say 4 primer pairs, do you mean you co-sequence them in the same library, or do you have a different barcode for each primer pair?

I am asking because both dada2 and deblur expect all sequences to originate from the same region (I am sure for dada2, not 100% sure for deblur, but I would be very surprised if not!).

In terms of differences between dada2 and deblur: deblur works on single reads only and does not try to merge sequences at all! Hence, your results should contain sequences from the forward reads only (what does the length profile of the results look like?). In your case you should pre-merge your pairs before the deblur denoising, as described in "Alternative methods of read-joining in QIIME 2" (QIIME 2 2020.6.0 documentation).
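To make the merging step concrete, here is a minimal Python sketch of joining a read pair by exact overlap. It is an illustration only, not how dada2 or the QIIME 2 read-joining methods are implemented (real mergers tolerate mismatches and use quality scores). The function names, the `min_overlap` argument, and the toy sequences are all hypothetical.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp[base] for base in reversed(seq))


def merge_pair(fwd: str, rev: str, min_overlap: int = 12):
    """Merge a forward read with its reverse mate by exact overlap.

    Returns the merged sequence, or None when the end of the forward
    read and the start of the reverse-complemented mate do not share
    an exact overlap of at least `min_overlap` bases.
    Simplifying assumption: no mismatches are tolerated.
    """
    rev_rc = reverse_complement(rev)
    longest = min(len(fwd), len(rev_rc))
    # Try the longest candidate overlap first, then shorter ones.
    for k in range(longest, min_overlap - 1, -1):
        if fwd[-k:] == rev_rc[:k]:
            return fwd + rev_rc[k:]
    return None


# Toy data: a made-up 30 nt amplicon read as 2 x 20 nt, so the true overlap is 10 nt.
amplicon = "ACGTTGCAAGGCTTACCGATAGGCTAACTG"
fwd = amplicon[:20]
rev = reverse_complement(amplicon[-20:])

strict = merge_pair(fwd, rev, min_overlap=12)   # overlap too short for this threshold
relaxed = merge_pair(fwd, rev, min_overlap=6)   # threshold low enough to merge
```

With these toy reads, `strict` is `None` and `relaxed` equals the full amplicon, which is the same kind of behaviour you see when the real overlap is shorter than the required minimum.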

Cheers

Carla_Uranga · July 28, 2020, 6:56pm

Hi, so we sequenced 150 bp (paired-end) using four primer pairs designed to amplify the V1-V9 regions, to see if we can identify microbes to the species level. We are currently sequencing 300 bp to see if we can improve overlap, since we are obviously not getting good overlap. I ran dada2 in R and was able to change the minimum overlap parameter to 6. Is it possible to change this parameter in Qiime2? If so, how? Thanks!

llenzi · July 28, 2020, 7:24pm

Hi,
I'm sorry, but I don't think a parameter to change the minimum overlap is available in the qiime2 plugin, so you will be forced to use the R version (if I am wrong, I am sure someone will be happy to correct me!).

However, I wanted to point out that within dada2 you should denoise the sequences for each primer pair separately.

So you may have something like the following (in which the expected amplicons may or may not overlap):

F1 F2 R1 R2
--> --> <-- <--
±±±±±±±±±±±±±±±
V1-V3

You should perform one denoising step for sequences originating from F1/R1, another for sequences from F2/R2, and so on.
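A minimal Python sketch of that splitting step, assuming exact primer matches at the start of each read, could look like this. The primer sequences below are placeholders, not the real kit primers, and the helper names are hypothetical. Real primers often contain degenerate bases and reads contain errors, so in practice a dedicated primer-trimming tool such as cutadapt is the more realistic choice.

```python
from collections import defaultdict

# Placeholder primers for illustration only (NOT real 16S primer sequences).
PRIMER_PAIRS = {
    "pair1": ("ACGTAC", "TTGCAA"),
    "pair2": ("GGATCC", "CCTAGG"),
}


def assign_primer_pair(fwd, rev, primer_pairs=PRIMER_PAIRS):
    """Return the name of the primer pair both reads start with, else None."""
    for name, (f_primer, r_primer) in primer_pairs.items():
        if fwd.startswith(f_primer) and rev.startswith(r_primer):
            return name
    return None


def split_by_primer_pair(read_pairs, primer_pairs=PRIMER_PAIRS):
    """Group (forward, reverse) read pairs by matching primer pair.

    Pairs that match no expected combination (for example F1 with R2)
    are collected under the key "unassigned".
    """
    bins = defaultdict(list)
    for fwd, rev in read_pairs:
        name = assign_primer_pair(fwd, rev, primer_pairs)
        bins[name if name is not None else "unassigned"].append((fwd, rev))
    return dict(bins)


reads = [
    ("ACGTACAAAA", "TTGCAAGGGG"),  # F1 with R1
    ("GGATCCTTTT", "CCTAGGAAAA"),  # F2 with R2
    ("ACGTACAAAA", "CCTAGGAAAA"),  # F1 with R2, a mixed combination
]
groups = split_by_primer_pair(reads)
```

Each group in `groups` could then be written out and denoised on its own, with trimming settings chosen for that amplicon.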

It may be obvious, and may be what you are doing already, but I wanted to be sure because it is not clear to me from what you wrote above!

Carla_Uranga · July 28, 2020, 9:21pm

Hi, training classifiers for each primer pair is something we are contemplating, but we are looking for a faster way of getting species-level assignments, bioinformatically speaking. I have been reading that qiime2 has a complete 16S classifier, and am wondering how it was generated. I also tried training my own classifier using the V1 forward and V9 reverse primers. Would this be a valid approach? Would the trained classifier then cover the entire V1-V9 16S region? Thanks!

llenzi · July 28, 2020, 9:36pm

Hi,

On the denoising side, you cannot denoise a pool of sequences containing, e.g., V1 and V3 (at least with dada2 or deblur, I mean).
How to train your classifier in your case is a different problem, and I am not sure I can fully help with it.

However, I think you should look at closed-reference clustering if you want to have amplicons from more than one primer pair in the same library, so you could avoid both the denoising and classifier issues!

I hope that makes sense!

Carla_Uranga · July 29, 2020, 4:27am

Well, honestly, it's tough to find the logic of these algorithms. I don't want a black-box level understanding; I really want to understand what we are doing at every step. Our samples were amplified with 4 different primer pairs, so you can imagine we are getting all sorts of fragments amplified from all of the possible combinations of these 8 primers in each fastq file. However, keep in mind that right now we only have 150 bp from each end, so very little overlap is occurring. In R, the dada2 statistics showed our average overlap was 14 nt, which does not allow us to use the qiime2 paired-end analysis because its default minimum is 20 nt.

So what do you mean that I cannot denoise a pool of sequences containing V1-V9? We are using the Swift 16S kit, which multiplexes 4 primer pairs and technically amplifies this entire region. We are sequencing it in 150 bp fragments, but will soon have 300 bp fragments, which will theoretically overlap. Is qiime2 not able to merge overlapping sequences? In R, it seems dada2 is able to do this and then assign taxonomy from the resulting mergers.

llenzi · July 29, 2020, 2:38pm

I certainly agree with your principle of not using a tool as a black box (although sometimes it is handy for a speedy result)!

I think we are going really off-topic now, so I'll try to add more of my thoughts here, but after this I would ask you to be so kind as to open one (or more, if you need) additional topic(s).

On the expected minimum overlap, please note that it is now 12 bp, as discussed in the thread:

A possible way to change this threshold is described here:

On the denoising step, let's see if I can explain myself better.
When denoising sequences of different lengths (different amplicons) together, I would be worried that one trimming setting won't fit them all, because all the sequences shorter (and expected to be shorter) than the chosen setting will be lost under dada2's normal behaviour.
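As a rough illustration of that concern, the Python sketch below applies one fixed truncation length to reads of mixed lengths. It mimics the documented behaviour of dada2's `truncLen` option (reads shorter than the truncation length are discarded), but the helper name and the toy reads are hypothetical and this is not dada2's actual code.

```python
def truncate_reads(reads, trunc_len):
    """Truncate reads to `trunc_len` bases, dropping reads that are shorter.

    Returns a tuple (kept_reads, n_discarded).
    """
    kept = [read[:trunc_len] for read in reads if len(read) >= trunc_len]
    n_discarded = len(reads) - len(kept)
    return kept, n_discarded


# Toy pool mixing reads from a short amplicon (100 nt) with longer ones.
pool = ["A" * 100, "C" * 140, "G" * 150]
kept, n_discarded = truncate_reads(pool, trunc_len=140)
```

Here the 100 nt read is lost while the other two are kept at 140 nt, so a setting tuned for the longer amplicons silently removes the shorter one.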

If you would like to try, there are alternative merging methods such as join-pairs: "Join paired-end reads" (QIIME 2 2020.6.0 documentation).
You could try this to get an idea of how good the merging potential is in your sequences!

In my mind, your analysis would be close to the closed-reference clustering described in "Clustering sequences into OTUs using q2-vsearch" (QIIME 2 2020.6.0 documentation).

Hope it helps
