DADA2: losing a lot of reads

EDIT: While waiting for my post to be approved, I made the parameters much lower (110 & 110) and now 99.99% of my reads are passing filter. This feels like I may have swung too far in the other direction. I'd appreciate any advice on setting these truncation lengths.

Hi all, I'm super new to QIIME and testing out merging reads on a single sample (18S V9 sequences, averaging 167 bp in length after adapter trimming in fastp; reads are paired-end 2x300 bp). Here is my code:

qiime dada2 denoise-paired \
  --i-demultiplexed-seqs /home/areaume/miseqruns/18SCh1/clean/qiime-import-sample5.qza \
  --p-trim-left-f 17 --p-trim-left-r 24 \
  --p-trunc-len-f 190 --p-trunc-len-r 100 \
  --p-n-threads 18 \
  --o-denoising-stats /home/areaume/miseqruns/18SCh1/clean/dns \
  --o-table /home/areaume/miseqruns/18SCh1/clean/table \
  --o-representative-sequences /home/areaume/miseqruns/18SCh1/clean/rep-seqs

I'm losing a lot of reads during filtering:

After reading a few threads, I understand this is likely due to my truncation lengths. I've tried adjusting the parameters a few times, but without much luck. Here are my Q score plots.

What can I do to improve my reads passing filter?

Hi @areaume,
Can you provide the updated command that resulted in 99.99% of the reads passing the filter, along with the .qzv file for DADA2's denoising stats for both of the runs?

Thanks!

Hi Greg,

Here is the new command:

qiime dada2 denoise-paired \
  --i-demultiplexed-seqs /home/areaume/miseqruns/18SCh1/clean/qiime-import-sample5.qza \
  --p-trim-left-f 17 --p-trim-left-r 24 \
  --p-trunc-len-f 110 --p-trunc-len-r 110 \
  --p-n-threads 18 \
  --o-denoising-stats /home/areaume/miseqruns/18SCh1/clean/dns2 \
  --o-table /home/areaume/miseqruns/18SCh1/clean/table \
  --o-representative-sequences /home/areaume/miseqruns/18SCh1/clean/rep-seqs

And here are the .qzv files for my first and second attempts:
dns.qzv (1.2 MB)
dns2.qzv (1.2 MB)

I've been working with only one sample from my run at the moment because I'm still troubleshooting.

Thanks!

Hi @areaume,
Thanks for passing these on. First off, it looks like you're in good shape now with your current parameter settings so I recommend using those to process all of the samples in that run (rather than just the single sample you used for testing). A couple of other thoughts:

  1. I think your trim parameters might be unnecessary, as you appear to have very high-quality base calls at the beginning of your sequences. You mentioned trimming adapters with fastp prior to running DADA2, so I'm guessing the --p-trim-left values aren't an attempt to remove adapters. If you are using them to remove primers, I recommend qiime cutadapt trim-paired instead, as it's better targeted to adapter/primer removal.
  2. I'm surprised that you were losing so many reads to the filter in your first attempt. The quality dips a little out around 190 bases, but nothing I would be very concerned about yet. Generally I look for where the median quality crosses some threshold, such as Q30 or Q25, but there are no hard-and-fast rules for where trimming/truncating should be performed. Have you tried intermediate values for those parameters to find the longest truncation you can use without a big drop in the number of reads retained?
  3. Since most of your reads are now passing the filter and merging, it probably doesn't matter a whole lot whether you set a truncation length higher than 110. When joining paired-end reads, you mainly need enough read length for the pairs to overlap and merge. If you can truncate less, however, that might give you slightly longer sequences, which may be slightly more informative for taxonomic assignment.
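As a quick sanity check on point 3, you can work out the expected overlap with a bit of shell arithmetic. This is a sketch under two assumptions: that your ~167 bp average amplicon length still includes the primers, and that DADA2 in QIIME 2 requires a minimum overlap of 12 nt to merge a pair (the --p-min-overlap default).

```shell
# Sketch: does a 110/110 truncation leave enough overlap to merge?
# Assumes the 167 bp average includes both primers (17 fwd, 24 rev).
amplicon=167; primer_f=17; primer_r=24
trunc_f=110;  trunc_r=110

# Biological target length once primers are trimmed off
biological=$((amplicon - primer_f - primer_r))

# Bases retained across both reads after trim-left and truncation
retained=$(( (trunc_f - primer_f) + (trunc_r - primer_r) ))

# Overlap between the two reads; DADA2 needs >= 12 nt to merge
overlap=$((retained - biological))
echo "overlap: ${overlap} nt"
```

With these numbers the overlap comes out well above the 12 nt minimum, which is consistent with your pairs merging successfully at 110/110.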

Thanks Greg, your insight is appreciated! I'll try running all my samples and testing some intermediate values.
Also, yes, the trim parameters were for the primers; I believe fastp only removed the adapters. I'll use the command you suggested instead.
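For anyone following along, I think the cutadapt step would look something like this (FWD_PRIMER and REV_PRIMER are placeholders for my actual 18S V9 primer sequences, and --p-discard-untrimmed drops reads where no primer was found):

```shell
qiime cutadapt trim-paired \
  --i-demultiplexed-sequences /home/areaume/miseqruns/18SCh1/clean/qiime-import-sample5.qza \
  --p-front-f FWD_PRIMER \
  --p-front-r REV_PRIMER \
  --p-discard-untrimmed \
  --o-trimmed-sequences /home/areaume/miseqruns/18SCh1/clean/primer-trimmed.qza
```

The output artifact would then go into denoise-paired in place of the original import, with the --p-trim-left parameters dropped.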


This topic was automatically closed 31 days after the last reply. New replies are no longer allowed.