DADA2: losing too many reads after filtering

Hi timanix,

I checked other forum posts; if I run it with 230 for both sides, I think I will have a 4 bp gap, right?

(forward read) + (reverse read) - (length of amplicon) = overlap
230 + 230 - 464 = -4 bp,
then my reads cannot merge at all because they don't overlap.
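The arithmetic above can be written as a quick sanity check (a plain-Python sketch; the 464 bp amplicon length is the value used in this thread):

```python
def expected_overlap(trunc_f, trunc_r, amplicon_len):
    """Bases of overlap left after truncation: fwd + rev - amplicon length."""
    return trunc_f + trunc_r - amplicon_len

# Truncating both reads at 230 with a ~464 bp V3-V4 amplicon:
print(expected_overlap(230, 230, 464))  # -4: a gap, so the reads cannot merge
```

A negative value means the truncated reads leave a gap instead of an overlap, so merging fails for every read pair.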

Or, should I disable truncation and lower the minimum overlap value, which may save the reads lost at the merging step?


In that case, it is probably the optimal solution! Decreasing the minimum overlap with truncation disabled may save some reads :clap:.

Hi! Sorry for bothering you again!

I tried the minimum overlap parameter and set --p-min-overlap 4, but the result doesn't actually change much. It is basically the same as the no-truncation result. It seems there is nothing else I can do to improve my results? Do you have any suggestions? Thanks!

I am running out of ideas :thinking:

Based on your results, you are still losing reads at the merging step after filtering. On the one hand, you have enough reads to proceed; on the other, your results may be biased towards bacteria with shorter amplicons.
So option 1 is to just proceed with truncation disabled in DADA2, at the risk of biased data.
Option 2 is to use only the forward reads. This way you will keep most of the reads, but taxonomy annotation will be performed with shorter reads.
Option 3 is to try to merge your reads with the q2-vsearch plugin, where you can not only decrease the overlap region but also allow some mismatches, and then denoise with Deblur to see if it improves the output.


Hi @Xinming_Xu!
Got a hint that you may also want to take a look at the Figaro tool (not part of QIIME 2), which may help determine the best truncation parameters.

Hi, I really appreciate all your suggestions! I've seen people using Figaro, so I gave this method a shot first.

So, the recommended forward truncation position is 238 and the recommended reverse truncation position is 249.

        "trimPosition": [
        "maxExpectedError": [
        "readRetentionPercent": 81.36,
        "score": 76.36009224060184

Then I ran it in DADA2, and also set maxExpectedError to 2 and 3. Here is what I got:

No reads passed the filter. trunc_len_f (238) or trunc_len_r (249) may be individually longer than read lengths, or trunc_len_f + trunc_len_r may be shorter than the length of the amplicon + 12 nucleotides (the length of the overlap). Alternatively, other arguments (such as max_ee or trunc_q) may be preventing reads from passing the filter.

Then I realized that maybe it's because I forgot to set the primer length in Figaro, as I'm using the original data without trimming. But it still recommends the same truncation positions.

Figaro seems like a great tool for finding truncation values, but I'm confused by the result I got when running DADA2.
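One way to sanity-check truncation values before rerunning DADA2 is to test the two conditions named in the error message: each truncation position must not exceed the actual read length (reads shorter than the truncation position are discarded by the filter), and the two truncation lengths together must span the amplicon plus the minimum overlap (12 nt by default). A minimal sketch, assuming ~245 bp reads and the ~464 bp amplicon discussed in this thread:

```python
def truncation_ok(read_len_f, read_len_r, trunc_f, trunc_r,
                  amplicon_len, min_overlap=12):
    """Check the two failure modes named in DADA2's error message."""
    # Reads shorter than their truncation position are discarded outright.
    if trunc_f > read_len_f or trunc_r > read_len_r:
        return False
    # Truncated reads must still cover the amplicon plus the minimum overlap.
    return trunc_f + trunc_r >= amplicon_len + min_overlap

# Figaro's suggestion, if the reverse reads are only ~245 bp long:
print(truncation_ok(245, 245, 238, 249, 464))  # False: trunc_r exceeds read length
```

Under these assumed read lengths, the reverse truncation of 249 alone would explain "No reads passed the filter", regardless of the overlap arithmetic.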

Hi @Xinming_Xu
So I would suggest proceeding with option 2, or trying option 3 to see if it improves the number of reads.


Hi, I've had a similar problem with some soil DNA. What I found worked best was to use only the FWD reads and adjust --p-min-overlap to 4. Then I was able to keep more than 70% of reads after DADA2, whereas before adjusting I was only keeping 30-40%. Interestingly, this is only happening for certain soil samples; other soil samples I collected in a different location are working much better. Just out of curiosity, what kind of soil/DNA extraction method are you using?



Thanks for your suggestion. I used the V3-V4 341F-804R region, and the forward/reverse reads are around 245 bp. So is it the primer set used, or the soil DNA itself, that made FWD-only reads the best case for you? I used the PowerSoil kit (Qiagen) to extract my DNA.

I have used both the PowerSoil and a phenol:chloroform extraction method; the PowerSoil does seem to give me better quality DNA, especially after I add some powdered skim milk to the bead-beating step. I'm sequencing reads on a MiSeq v3 600, also with the 341F-804R primer pair. The extractions I performed with the PowerSoil kit don't need --p-min-overlap dropped to 4 to keep over 70% of reads after DADA2. But I do just use the FWD reads; the REV reads I usually get are just not good quality. :confused:

Hi Qiime2 fellas,

I'm working with similar data (DNA extracted with PowerSoil, the same primer pair plus an organellar blocking primer set, MiSeq sequencing).

I also lost a lot of data during the DADA2 step (see image below), even after trying all variations of the parameters (FYI, I truncated forward at 290 and reverse at 260, and trimmed the first 10 bp; this last one helped a lot).

I might sound mean, but with the combination of protocols we used, we apparently get low throughput; I even have a sample in which only 37% of reads survived the whole DADA2 process.

Luis Alfonso.

Hi @Xinming_Xu,

Just checking in on this! Were the suggestions from @16sIceland and @WeedCentipede helpful for you, or do you need further assistance with this?


Hi Liz,

Sorry for the late reply. I decided to use no truncation and a minimum overlap value, even though it still loses many reads. But most of the samples still retain 20,000 reads, which I assume is still enough. Thanks for all your help!


An off-topic reply has been split into a new topic: DADA2: losing a high number of reads after filtering

Please keep replies on-topic in the future.

Hi @timanix
Thank you for the information you provided in your answers. I'm new to QIIME 2 and bioinformatics, and you have helped a lot. I was wondering if you could help me with the same issue I am having.

So I used the 515F and 806RB primers, and the V4 region is ~252 nts long. Add 39 bases of primers (515F = 19, 806R = 20) and you are up to 291 nts. I only have 2x150 = 300 nts of total sequencing, so just ~9 nts of overlap.
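The overlap arithmetic in that paragraph can be written out explicitly (a sketch using the approximate lengths quoted above):

```python
v4_region = 252                 # ~length of the V4 region between the primers
primers = 19 + 20               # 515F (19 nt) + 806R (20 nt)
amplicon = v4_region + primers  # total amplicon length including primers
sequenced = 2 * 150             # 2x150 paired-end run

overlap = sequenced - amplicon
print(amplicon, overlap)  # 291 9
```

With only ~9 nt of overlap to begin with, any truncation eats directly into the merge region, which is why the default 12 nt minimum overlap fails here.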

With the default --p-min-overlap setting I was getting near 0% merging. I then tried --p-min-overlap 6 and am getting much more merging. I tried truncating at 129 for the forward read and 130 for the reverse read; merging is 60% to 80%, but I have a lot of samples at 40% and it is worrying me, so I tried truncating at 0 and am getting much better merging. The quality after trimming the primers with cutadapt was not bad. Do you think I should use trunc = 0? Or should I lower the minimum overlap some more? And does lowering the minimum overlap affect the results negatively?

here are some pictures:
Truncating at 129 and 130

truncating at 0

quality of trimmed reads

length summary

Thank you!

Hi @Nayla_Higazy ,
Looks like you already managed to get it working! If disabling the truncation parameter by providing a 0 gives you better output, you should use it. Your last output already looks fine to me, and you can proceed with it.

If you still want to try to adjust the settings, I would:

  1. Set the trunc parameter to 0 for forward reads and 140 for reverse, with 4 bp as the overlap
  2. Run with 0 for both trunc parameters and 4 as the overlap
  3. Choose the best parameters from the settings you have already run and the new ones.

@timanix Thank you so much for your help and for replying so fast! In that case, I think I will proceed with my data right away.

This topic was automatically closed 31 days after the last reply. New replies are no longer allowed.