DADA2: losing too many reads after filtering

Hi!
I would try running it again, but this time with several sets of lower truncation values, then compare the results with the one you already have and decide how to proceed based on that. As you can see, you are also losing reads at the merging step, and truncating the bad-quality ends of the reads may improve it.
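For example, one such run could look roughly like this (a minimal sketch; demux.qza and the output names are placeholders, and the truncation values are just one of the combinations to compare):

# one of several truncation combinations to compare
qiime dada2 denoise-paired \
  --i-demultiplexed-seqs demux.qza \
  --p-trunc-len-f 230 \
  --p-trunc-len-r 240 \
  --o-table table_230_240.qza \
  --o-representative-sequences rep-seqs_230_240.qza \
  --o-denoising-stats denoising-stats_230_240.qza \
  --verbose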

1 Like

Hi, timanix!

Thanks for your suggestions! I have two questions now after reading through some other posts.

  1. Based on the post below, it says --p-trunc-len discards sequences greater than the value given, so in my case it would discard sequences longer than 244 rather than shorter than 244. Is my understanding correct? I got confused because that post and your reply give me opposite answers.

You're exactly correct, good catch! I'll make sure we get that documentation updated.

  2. I actually ran it again with lower truncation values, such as

--p-trunc-len-f 230 \
--p-trunc-len-r 240

and

--p-trunc-len-f 240 \
--p-trunc-len-r 240

Both gave worse results, and with the lower value for the forward reads I retained less than 10% of reads as non-chimeric.

In case you also need to check the stats files, I am uploading them here.
denoising-stats_230_240.qzv (1.2 MB) denoising-stats_240_240.qzv (1.2 MB) denoising-stats_no_truncation.qzv (1.2 MB)

What other parameters can I set to get rid of low-quality reads? Also, what causes the worse results after I lower the truncation values, since results should improve once the bad-quality ends are truncated?

Again, really appreciate your help!

Nope, it will discard reads shorter than the given value and truncate longer reads to that value.

  1. Could you try running it with 230 for both forward and reverse reads? If you are worried about the overlap region, in the latest versions of QIIME 2 you can set the minimum overlap to a value lower than 12.

You lost a lot of reads when you applied 240/240 because many of your forward reads are shorter than 240 and were discarded.
230/240 looks similar to the stats with no truncation.
I would suggest trying 230/230 to see whether it improves the stats, and then choosing the run with the best output.

Hi timanix,

I checked another forum post; if I run it with 230 on both sides, I think I will have a 4 bp gap, right?

(forward read) + (reverse read) - (length of amplicon) = overlap
230 + 230 - 464 = -4 bp,
then my reads cannot merge at all because they don't overlap.

Or should I drop truncation altogether and lower the minimum overlap value, which may save the reads being lost at the merging step?

1 Like

In that case, that is probably the optimal solution! Decreasing the minimum overlap with truncation disabled may save some reads :clap:.
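Such a run could look roughly like this (a sketch; --p-min-overlap is only available in recent QIIME 2 releases, a trunc-len of 0 disables truncation, and the file names are placeholders):

# no truncation, relaxed minimum overlap for merging
qiime dada2 denoise-paired \
  --i-demultiplexed-seqs demux.qza \
  --p-trunc-len-f 0 \
  --p-trunc-len-r 0 \
  --p-min-overlap 4 \
  --o-table table.qza \
  --o-representative-sequences rep-seqs.qza \
  --o-denoising-stats denoising-stats.qza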

Hi! Sorry for bothering you again!

I tried the minimum overlap parameter and set --p-min-overlap 4, but the result doesn't change much; it is basically the same as the no-truncation result. It seems there is nothing else I can do to improve my results? Do you have any suggestions? Thanks!

Hi!
I am running out of ideas :thinking:

Based on your results, you are still losing reads at the merging step after filtering. On the one hand, you have enough reads to proceed; on the other, your results may be biased towards bacteria with shorter amplicons.
So option 1 would be to simply proceed with truncation disabled in DADA2, accepting the risk of biased data.
Option 2 is to use only the forward reads. This way you will keep most of the reads, but taxonomy annotation will be performed on shorter sequences.
Option 3 is to merge your reads with the vsearch plugin, where you can not only decrease the required overlap but also allow some mismatches, and then denoise with Deblur to see if it improves the output (see the sketch below).
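For option 3, the steps could look roughly like this (a sketch only; option names are from the q2-vsearch, q2-quality-filter, and q2-deblur plugins in recent QIIME 2 releases, the file names and the overlap/mismatch/trim-length values are placeholders, and depending on the release merge-pairs may also require an --o-unmerged-sequences output):

# merge with a relaxed minimum overlap and some allowed mismatches
qiime vsearch merge-pairs \
  --i-demultiplexed-seqs demux.qza \
  --p-minovlen 4 \
  --p-maxdiffs 5 \
  --o-merged-sequences merged.qza

# basic quality filtering before Deblur
qiime quality-filter q-score \
  --i-demux merged.qza \
  --o-filtered-sequences merged-filtered.qza \
  --o-filter-stats filter-stats.qza

# denoise the merged reads with Deblur
qiime deblur denoise-16S \
  --i-demultiplexed-seqs merged-filtered.qza \
  --p-trim-length 400 \
  --p-sample-stats \
  --o-table table.qza \
  --o-representative-sequences rep-seqs.qza \
  --o-stats deblur-stats.qza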

2 Likes

Hi @Xinming_Xu!
I got a hint that you may also want to take a look at the Figaro tool (not part of QIIME 2), which may help determine the best truncation parameters.

Hi, I really appreciate all your suggestions! I've seen people using Figaro, so I gave this method a shot first.

So, the recommended forward truncation position is 238 and the recommended reverse truncation position is 249.

[
    {
        "trimPosition": [
            238,
            249
        ],
        "maxExpectedError": [
            2,
            3
        ],
        "readRetentionPercent": 81.36,
        "score": 76.36009224060184
    },

Then I ran it through DADA2, also setting the maximum expected errors to 2 and 3, and here is what I got:

No reads passed the filter. trunc_len_f (238) or trunc_len_r (249) may be individually longer than read lengths, or trunc_len_f + trunc_len_r may be shorter than the length of the amplicon + 12 nucleotides (the length of the overlap). Alternatively, other arguments (such as max_ee or trunc_q) may be preventing reads from passing the filter.

Then I realized it may be because I forgot to set the primer length in Figaro, since I'm using the original data without primer trimming. But it still recommends the same truncation positions.

Figaro seems like a great tool for picking truncation values, but I'm confused by the result I got when running DADA2.
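For reference, my understanding is that primers could be removed before Figaro/DADA2 with the q2-cutadapt plugin, something like this (a sketch only; <FWD_PRIMER> and <REV_PRIMER> are placeholders for the actual primer sequences, and demux.qza is a placeholder artifact name):

# strip primers so that downstream truncation positions refer to primer-free reads
qiime cutadapt trim-paired \
  --i-demultiplexed-sequences demux.qza \
  --p-front-f <FWD_PRIMER> \
  --p-front-r <REV_PRIMER> \
  --p-discard-untrimmed \
  --o-trimmed-sequences demux-trimmed.qza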

Hi @Xinming_Xu
So I would suggest proceeding with option 2, or trying option 3 to see if it improves the number of reads retained.

2 Likes

Hi, I've had a similar problem with some soil DNA. What I found worked best was to use only the FWD reads and to adjust the minimum overlap (--p-min-overlap) to 4. Then I was able to keep more than 70% of reads after DADA2, whereas before adjusting I was only keeping 30-40%. Interestingly, this only happens for certain soil samples; other soil samples I collected in a different location are working much better. Just out of curiosity, what kind of soil/DNA extraction method are you using?
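In case it helps, a forward-reads-only run could look roughly like this (a sketch; if I remember correctly, denoise-single uses only the forward reads when given a paired-end artifact, the file names are placeholders, and the truncation value should come from your own quality plots):

# forward reads only; denoise-single should use just the forward reads of a paired-end artifact
qiime dada2 denoise-single \
  --i-demultiplexed-seqs demux.qza \
  --p-trunc-len 230 \
  --o-table table-fwd.qza \
  --o-representative-sequences rep-seqs-fwd.qza \
  --o-denoising-stats denoising-stats-fwd.qza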

1 Like

Hi,

Thanks for your suggestion! I used the V3-V4 region (341F-804R), and the forward/reverse reads are around 245 bp. So is it the primer set, or the fact that it is soil DNA, that makes forward-reads-only the best option in your case? I used the PowerSoil kit (Qiagen) to extract my DNA.

I have used both the PowerSoil and a phenol:chloroform extraction method - the PowerSoil does seem to give me better-quality DNA, especially after I add some powdered skim milk to the bead-beating step. I'm sequencing on a MiSeq v3 600-cycle kit, also with the 341F-804R primer pair. The extractions I performed with the PowerSoil kit don't need the minimum overlap dropped to 4 to keep over 70% of reads after DADA2. But I do just use the FWD reads; the REV reads I usually get are just not good quality. :confused:

Hi Qiime2 fellas,

I'm working with similar data (DNA extracted with PowerSoil, the same primer pair plus an organellar blocking primer set, MiSeq sequencing).

I also lost a lot of data during the DADA2 step (see image below), even after trying many variations of the parameters (FYI, I truncated fwd at 290 and rev at 260 and trimmed the first 10 bp; this last one helped a lot).

I might sound discouraging, but with the combination of protocols we used we apparently get low throughput; I even had a sample in which only 37% of reads survived the whole DADA2 process.

Cheers,
Luis Alfonso.

Hi @Xinming_Xu,

Just checking in on this! Were the suggestions from @16sIceland and @WeedCentipede helpful for you, or do you need further assistance with this?

Cheers,
Liz

Hi Liz,

Sorry for the late reply. I decided to go with no truncation and the lowered minimum overlap value, even though it still loses many reads. But most of the samples still retain about 20,000 reads, which I assume is still enough. Thanks for all your help!

1 Like