qiime dada2 denoise-paired parameters adjust

sdpapet · February 7, 2022, 8:41pm

I need to denoise a paired-end library (MiSeq, 2x300 bp). The sequencing center told us the quality was not good when they did the QC, so I know my sequencing results may be bad.

I can't use my usual denoising parameters because I want to save more reads. I didn't even trim anything (--p-trim-left-f and --p-trim-left-r are both 0). However, fewer than 5% of the reads remain after denoising.

I am wondering whether I can adjust any other default settings to retain more reads after QC. If so, which ones should I adjust?

Also, I assume this step automatically reverse-complements and pairs the reads to find the overlaps, as I can't find any settings related to complementing.

Thanks

Mehrbod_Estaki · February 8, 2022, 1:06am

Hi @sdpapet,
Can you please provide us with some more information:

What region did you sequence, and what is the expected overlap?
Can you share with us your demux visualization, or at least a screen-shot of the quality score plot?
Are all non-biological reads (including primers) already removed from your reads?
Can you share with us your dada2 stats visualization artifact?

Finally, recall that if the quality of your reads is poor, it tends to drop toward the 3' tails, so the trim-left parameters, which remove bases from the 5' end, won't help with that. The trunc parameters remove bases from the 3' end, which is probably what you want to be using. But all of this depends on your data, which we currently don't know anything about.
Also, if your quality scores are bad, you can always use just your forward reads; that way you won't have merging issues.
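To make the trim/trunc distinction concrete, here is a minimal Python sketch of what the two kinds of parameters do to a single read. It is a toy model, not DADA2's code: the function name and the 30-nt read are made up for illustration, and it assumes the usual DADA2 behaviour that a trunc length of 0 means "no truncation" and that reads shorter than the trunc length are discarded.

```python
from typing import Optional


def trim_and_truncate(read: str, trim_left: int = 0, trunc_len: int = 0) -> Optional[str]:
    """Toy model of --p-trim-left-* (5' end) and --p-trunc-len-* (3' end).

    trunc_len == 0 means "do not truncate". A read shorter than trunc_len
    is discarded, which is represented here by returning None.
    """
    if trunc_len > 0:
        if len(read) < trunc_len:
            return None
        read = read[:trunc_len]  # cut the 3' tail
    return read[trim_left:]      # drop bases from the 5' end


# Hypothetical 30-nt read: 5 nt of "primer", 20 nt of signal, 5 nt of poor tail.
read = "A" * 5 + "C" * 20 + "N" * 5

print(trim_and_truncate(read, trim_left=5))                # tail still there
print(trim_and_truncate(read, trunc_len=25))               # tail removed
print(trim_and_truncate(read, trim_left=5, trunc_len=25))  # both ends cleaned
```

Only the trunc setting touches the low-quality 3' tail, which is why leaving trim-left at 0 does nothing for tail quality.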

sdpapet · February 8, 2022, 1:38am

Hi, I just used the common 515F-806R primer set. Please see the attached file.

demux plots.zip (235.0 KB)

Mehrbod_Estaki · February 11, 2022, 8:38pm

Hi @sdpapet,
Looks like some of our communication was lost during the recent forum outage.
To recap the parts that were cut off:
The zip file you sent above contained PDFs of the demux summary visualizations, and they were cut off, so I couldn't look through them. I asked that you send the actual artifacts instead, which you did provide in a .zip file. By the way, QIIME 2 artifacts are already just regular "zipped" files with a .qza/.qzv extension, so you don't actually need to zip them before sending. I also re-asked my questions about non-biological sequences in your reads, which you still haven't answered, and that might be an important point in solving your issue.

sdpapet · February 11, 2022, 9:14pm

Hi, Thank you for your reply. I am sorry about missing your questions.

Before using the DADA2 workflow (qiime dada2 denoise-paired), I ran qiime demux emp-paired. I assumed this step would remove all non-biological sequences, such as barcodes and primers.

I did almost exactly what this tutorial does:

https://docs.qiime2.org/2021.11/tutorials/atacama-soils/

Mehrbod_Estaki · February 11, 2022, 11:13pm

Hi @sdpapet,

According to that protocol, the demultiplexing step is not removing anything from your reads, because the barcodes were in a separate file from your F/R reads to begin with. That means you can still have other non-biological sequences in your reads, such as primers, spacers/padding, etc., depending on the design. For DADA2 you certainly want to remove your primers from your reads before denoising. This is why I was asking about them. You should check with whoever prepared your samples and/or your sequencing facility to see exactly what has been done with the reads after sequencing.

As per your original questions:

To be honest, I wouldn't call this bad quality. It is actually pretty clean and typical of a MiSeq run. And the fact that you have paired-end reads gives you lots of room to truncate those poor-quality tails.

As for your DADA2 step, you are losing lots of reads at the initial filtering step:

I looked at your provenance and noticed that your truncation parameter is set to 300 for both F and R reads. This is why you are losing so many reads at that initial step: you are essentially telling DADA2 not to remove any of your poor-quality tails, and because the quality there is bad, it drops those reads before it even gets to the denoising step. With your primer set you have lots of overlap, so try truncating more of your reads. For example, start by truncating at something like F=240 and R=200. See how that goes, then fine-tune from there.
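As a rough check on how much room a truncation choice leaves for merging, the arithmetic can be sketched in a few lines of Python. This is only a back-of-the-envelope estimate under stated assumptions: the amplicon length of 253 nt is an assumed typical value for 515F-806R V4 with primers already removed (use your own amplicon length, and add the primer lengths if they are still in the reads), and the minimum overlap of 12 nt is an assumed DADA2 default.

```python
def expected_overlap(trunc_f: int, trunc_r: int, amplicon_len: int) -> int:
    """Overlap left between forward and reverse reads after truncation.

    Valid as long as neither truncated read is longer than the amplicon.
    """
    return trunc_f + trunc_r - amplicon_len


AMPLICON_LEN = 253  # assumption: typical V4 amplicon length without primers
MIN_OVERLAP = 12    # assumption: DADA2 default minimum overlap for merging

for trunc_f, trunc_r in [(240, 200), (180, 120), (150, 110)]:
    overlap = expected_overlap(trunc_f, trunc_r, AMPLICON_LEN)
    verdict = "should merge" if overlap >= MIN_OVERLAP else "too short to merge"
    print(f"F={trunc_f} R={trunc_r}: overlap {overlap} nt, {verdict}")
```

Real amplicons vary in length by a few bases, so it is safer to leave a margin above the minimum rather than aim for it exactly.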

Just as a heads up, fine-tuning these DADA2 parameters is one of the most commonly asked questions on the forum so there are tons of posts you can browse through regarding how to determine your trim/truncate parameters and how to troubleshoot low DADA2 outputs.

Good luck!

sdpapet · February 13, 2022, 4:27pm

Hello, Thank you very much for the input. Just have some quick follow-up questions.

1> Non-biological reads: This is the first time I have used this sequencing center, but I will check whether they have removed the primers. My former sequencing center always removed them, so I didn't think about this. I received three files from this new sequencing center: a forward fastq file, a reverse fastq file, and barcodes.fastq. This is exactly the same as in your tutorial (“Atacama soil microbiome” tutorial — QIIME 2 2021.11.0 documentation).
metadata.tsv (8.0 KB)

I assumed they had already extracted the barcodes. As for the primers, if they didn't remove the primer sequences, are there QIIME scripts that can remove them for me? I have checked the DADA2 workflow settings, but there is no option for this.

2> Thanks for the suggestions on the QC trim settings. I did notice the poor quality at the end of the cycles, but I didn't trim anything. Since the sequencing center told me the library quality was poor and the sequencing could be poor, I thought I shouldn't trim anything so as to save more reads. I am also worried that I won't be able to pair the forward and reverse reads if I trim too much (lack of overlap) or trim unevenly (e.g. F=240 and R=200).

PS: I did try the F=240 and R=200 setting, but I still lose a lot of reads (see attached stats file). After the DADA2 workflow, most samples will only lose 20%-30% of total reads. Can you tell me what percentage of reads you usually retain after the DADA2 workflow?

I know this library (DNA quality) is poor. I just want to find out whether this is because I didn't run QIIME 2 properly, so I can decide whether to re-sequence it.

Thanks again,

Mehrbod_Estaki · February 14, 2022, 5:17am

Hi @sdpapet,

It may be, but all we know so far is that you have three files like in the tutorial. What exactly is in your reads can differ depending on what was done to them after sequencing, so you certainly want to confirm this. DADA2 assumes primers have been removed.

With DADA2 you can trim the 5' end of your reads using the trim parameters, but these only accept a fixed number of bases. If your primers have a constant length, it is probably OK to remove them with DADA2. However, primers sometimes come with heterogeneity spacers, which add a variable length that DADA2's trim won't be able to deal with.
I prefer doing all my pre-DADA2 processing with q2-cutadapt.
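If you want a quick look at whether primers are still present before deciding how to remove them, a small Python check along these lines could help. It is only a sketch: the primer string is the original 515F sequence as I recall it (an assumption; confirm the exact primer and any spacer design with your facility), the toy reads are made up, and the function names are hypothetical. It turns the IUPAC-degenerate primer into a regular expression and reports the fraction of reads that start with it, optionally after a short variable-length spacer.

```python
import re

# IUPAC nucleotide codes expanded to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def primer_regex(primer: str, max_spacer: int = 0) -> re.Pattern:
    """Regex matching the primer at the read start, after 0..max_spacer bases."""
    body = "".join(IUPAC[base] for base in primer.upper())
    return re.compile(rf"^[ACGTN]{{0,{max_spacer}}}{body}")


def fraction_with_primer(reads, primer: str, max_spacer: int = 0) -> float:
    """Fraction of reads that begin with the primer (allowing a spacer)."""
    reads = list(reads)
    pattern = primer_regex(primer, max_spacer)
    hits = sum(1 for read in reads if pattern.match(read))
    return hits / len(reads) if reads else 0.0


PRIMER_515F = "GTGCCAGCMGCCGCGGTAA"  # assumption: original 515F, verify yours

toy_reads = [  # made-up sequences for illustration only
    "GTGCCAGCAGCCGCGGTAATACGTAGG",  # primer at position 0 (M = A)
    "GTGCCAGCCGCCGCGGTAATACGGAGG",  # primer at position 0 (M = C)
    "TACGTAGGGTGCAAGCGTTAATCGG",    # no primer
    "ACGTGCCAGCAGCCGCGGTAATACG",    # primer after a 2-nt spacer
]

print(fraction_with_primer(toy_reads, PRIMER_515F))                # fixed position
print(fraction_with_primer(toy_reads, PRIMER_515F, max_spacer=3))  # with spacers
```

If the fraction only becomes high once a spacer is allowed, the primer offset is not constant, and a fixed trim length will not remove it cleanly.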

I personally don't think this is a poor-quality run, especially for the typical V4 primers on a 2x300 run, but people have different standards.

While counter-intuitive, this can actually have the reverse effect: the more poor-quality sequence you keep in your reads, the more reads DADA2 drops during its quality filtering step, as was the case in your first run. You essentially want to truncate as much of your F and R reads as possible without compromising the merging, which with DADA2's defaults requires an overlap of 12 nt.
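The reason keeping the tails costs reads can be illustrated with the expected-error calculation that quality filters of this kind rely on. The Python sketch below is a toy example, not DADA2's code: the quality profile is invented, and the threshold of 2.0 expected errors is an assumption about the default maximum (check the value used in your own run's provenance).

```python
def expected_errors(quals) -> float:
    """Sum of per-base error probabilities, with p = 10 ** (-Q / 10)."""
    return sum(10 ** (-q / 10) for q in quals)


MAX_EE = 2.0  # assumption: default maximum expected errors per read

# Invented 300-nt quality profile: 200 good bases, then a poor 100-nt tail.
quals = [35] * 200 + [15] * 100

full_ee = expected_errors(quals)         # untruncated read
trunc_ee = expected_errors(quals[:200])  # read truncated at position 200

for label, ee in [("full read", full_ee), ("truncated at 200", trunc_ee)]:
    verdict = "kept" if ee <= MAX_EE else "dropped"
    print(f"{label}: expected errors = {ee:.2f}, {verdict}")
```

In this toy profile the untruncated read exceeds the threshold purely because of its tail, while the same read truncated at 200 passes easily.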

Your new run again looks to have lost a lot of reads, and the DADA2 output is lower than I would expect, but this time the issue is arising at a different step. You can see that this time most of your reads make it past the filtering step, but there is a major drop during the actual denoising step, at least in most samples. This is peculiar to me. It could be related to primers not being removed or to something else; it's hard to say. Some samples look better than others too. Can you think of any biological or technical reason for this? For example, what is the difference between samples C1-T1R6-C and C1-T1R6-A? What type of samples are these? How were the DNA quality and the amplification process?
That being said, in some of your samples a big chunk of the reads is being lost at other steps, such as merging, which raises the question: are you sure these are the 515F-806R V4 primers and not V3-V4?
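To see at a glance which step is costing each sample the most reads, the stats table can be summarized with a few lines of Python. This is a sketch on made-up data: the sample names and counts below are hypothetical, and the column names are an assumption about how the exported DADA2 stats table is laid out, so adjust them to match your own file.

```python
import csv
import io

# Made-up stats table for illustration; replace with the contents of your file.
STATS_TSV = """sample-id\tinput\tfiltered\tdenoised\tmerged\tnon-chimeric
#q2:types\tnumeric\tnumeric\tnumeric\tnumeric\tnumeric
sampleA\t10000\t9000\t4000\t3800\t3700
sampleB\t10000\t8800\t8500\t5000\t4900
"""

# Assumed column names, in processing order.
STEPS = ["input", "filtered", "denoised", "merged", "non-chimeric"]


def biggest_loss(row):
    """Return (step, fraction of input lost) for the step losing the most reads."""
    counts = [int(row[step]) for step in STEPS]
    losses = {STEPS[i + 1]: counts[i] - counts[i + 1] for i in range(len(STEPS) - 1)}
    worst = max(losses, key=losses.get)
    return worst, losses[worst] / counts[0]


report = {}
for row in csv.DictReader(io.StringIO(STATS_TSV), delimiter="\t"):
    if row["sample-id"].startswith("#"):
        continue  # skip the type-directive line
    report[row["sample-id"]] = biggest_loss(row)

for sample, (step, fraction) in report.items():
    print(f"{sample}: largest loss at '{step}' ({fraction:.0%} of input)")
```

A large loss at filtering points toward truncation settings, a loss at merging toward insufficient overlap, and a loss at denoising is the less usual pattern described above.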

Can you also please share the actual DADA2 stats visualization (the .qzv file) instead of the underlying data that you unzipped and shared as a text file? That way we have access to the provenance tab and can check the parameters that were actually used. It makes troubleshooting much easier.

Let us know once you hear from your sequencing facility.

system · March 17, 2022, 11:18am

This topic was automatically closed 31 days after the last reply. New replies are no longer allowed.