Help with DADA2-quality filtering

Hi everyone. I'd like to ask for an opinion on the DADA2 parameters.

I am using QIIME2 for the first time, and I am asking because I lost a lot of data after the DADA2 quality filtering.

My files can be found here (dropbox-link) EDIT: I uploaded the files here, away from my dropbox.

At first (OLD-files) I used trim 6 and trunc-len 202 for both forward and reverse reads.
Then I tried trim 6 with trunc-len 212 for the forward reads and 143 for the reverse reads (NEW-files). It still doesn't look good.

I would be grateful to get an opinion about which parameters I should use. Thank you in advance.

EDIT: Here are the original files to be viewed.
OLD-table.qzv (334.9 KB)
demux.qzv (285.9 KB)
NEWEST-table.qzv (450.3 KB)

Hi @hennihi,
Thank you for posting! Indeed, even the “NEW-files” are not looking very good. Your “NEW” trimming parameters seem sensible, given the quality plots, so I do not think that parameter choice is necessarily the issue here.

What primers are you using? What is the total length of the expected amplicon? If you truncate your reads too much in dada2, the forward and reverse reads may no longer overlap enough to be joined, which would explain such a low number of output reads. If that is the case, you may need to use only the forward reads and discard the reverse reads, which are of much lower quality (as often occurs).

You could also try running dada2 on just the forward reads and on just the reverse reads; if you get more output reads that way, that would indicate that the issue is occurring during the read-joining stage.

@benjjneb, do you have any other ideas what may be going wrong here?

Agree with @Nicholas_Bokulich on the key questions: Are your primers on your reads? And what is the length of the region you are amplifying?

I ask because the cause is probably one of the following:

(1) Unremoved primers on reads -> spurious chimera detection -> few reads making it through pipeline.

OR

(2) Truncated reads not long enough to overlap in middle -> failed merging -> few reads…
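
To make cause (2) concrete, here is a rough back-of-the-envelope sketch in Python. It assumes that each read starts at one end of the amplicon (primers included), that merging needs about 12 overlapping bases (the default `minOverlap` of DADA2's `mergePairs`), and that the amplicon is 460 bases long. That last number is purely hypothetical, since the real length has not been given in this thread, and the function name is made up for illustration.

```python
def expected_overlap(trunc_len_f, trunc_len_r, amplicon_len):
    """Bases shared by the truncated forward and reverse reads.

    Assumes each read starts at one end of the amplicon, so the forward
    read covers the first trunc_len_f bases and the reverse read covers
    the last trunc_len_r bases. A negative result means a gap.
    """
    return trunc_len_f + trunc_len_r - amplicon_len


MIN_OVERLAP = 12    # assumed: default minOverlap of DADA2's mergePairs
AMPLICON_LEN = 460  # hypothetical amplicon length, primers included

# trunc-len pairs from the original post (forward, reverse)
for trunc_f, trunc_r in [(202, 202), (212, 143)]:
    overlap = expected_overlap(trunc_f, trunc_r, AMPLICON_LEN)
    verdict = "can merge" if overlap >= MIN_OVERLAP else "too short to merge"
    print(f"{trunc_f}/{trunc_r}: overlap {overlap} -> {verdict}")
```

Under that assumed length, both pairs leave a gap instead of an overlap. Real amplicons also vary in length, so it is safer to leave some margin above the minimum.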

See also the discussion here: Lost of data with dada2

Thank you for the answers. I can’t answer all your questions, but I will contact the sequencing service to get the details. However I can say that the primers are on the reads, that’s all I can say right now. I will get back to you when I get the answers.

I also ran the dada2 quality control again with new values (trim 6, trunc-len 277) and put the files in the same folder (file names start with 1). Does it look any better?

Hi @hennihi,
Thanks for the partial update. Those new results look much better, which suggests that insufficient read overlap was at least part of the issue you were having previously.

However, it sounds like you could improve your yield even more by following @benjjneb’s advice and removing the primers from the reads. This can be done by setting trim (`--p-trim-left-f` and `--p-trim-left-r` in `qiime dada2 denoise-paired`) to the length of the primer on each read.
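
As a small illustration, the trim value for each direction is simply the length of the primer on that read. The sequences below are the widely used 515F/806R pair, chosen only as an example; the primers used for this dataset are not known at this point in the thread. This only works if the primer starts at the very first base of every read.

```python
# Example primers only (515F / 806R); substitute the primers actually used.
FORWARD_PRIMER = "GTGCCAGCMGCCGCGGTAA"
REVERSE_PRIMER = "GGACTACHVGGGTWTCTAAT"

# Number of bases to remove from the start of each read
trim_left_f = len(FORWARD_PRIMER)
trim_left_r = len(REVERSE_PRIMER)

print(f"forward trim: {trim_left_f}, reverse trim: {trim_left_r}")
```

Note that in DADA2 the trim is taken out of the truncated read, so the final read length is trunc-len minus trim.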

Please let us know the other details when you hear from the sequencing center, and if primer trimming solves your issue.

Good luck!

It’s also worth verifying that there is no “heterogeneity spacer” at the 3’ end of your primers when using a fixed primer trimming length, since a variable-length spacer means a single trim value will not fit every read. Many sequencing centers use one to maximize yield per run.
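
One rough way to check this is to look at where the primer actually starts in a sample of reads. The sketch below is only an illustration: it covers the case where the spacer sits before the primer in the read (so the primer start shifts from read to read), it requires an exact match apart from IUPAC degenerate bases, and the function names, the example primer (515F), and the reads are all made up.

```python
from collections import Counter

# IUPAC nucleotide codes mapped to the bases they stand for
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def primer_offset(read, primer, max_offset=10):
    """Return the first position (0..max_offset) where the primer matches
    the read, honouring IUPAC codes in the primer, or None if not found."""
    for offset in range(max_offset + 1):
        window = read[offset:offset + len(primer)]
        if len(window) < len(primer):
            break  # read too short to contain the primer from here on
        if all(base in IUPAC[code] for base, code in zip(window, primer)):
            return offset
    return None


def offset_counts(reads, primer, max_offset=10):
    """Count how often the primer starts at each offset."""
    return Counter(primer_offset(r, primer, max_offset) for r in reads)


# Made-up reads: the same primer preceded by 0, 2 and 3 extra bases
primer = "GTGCCAGCMGCCGCGGTAA"
reads = [
    "GTGCCAGCAGCCGCGGTAATACGGAGG",
    "TCGTGCCAGCCGCCGCGGTAATACGGAGG",
    "ACTGTGCCAGCAGCCGCGGTAATACGGAGG",
]
print(offset_counts(reads, primer))
```

If nearly all reads report the same offset, a fixed trim length should be fine. A spread of offsets suggests a spacer, in which case a dedicated primer-removal tool is likely a better choice than a fixed trim.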

Hi all, and sorry for the long pause. I finally got the information from the sequencing centre and re-ran the dada2 quality control (the files that start with NEWEST). It looks better, I think. What do you think, could I continue with analyzing the data?

Hi @hennihi,
Yes, these new data look a lot better! It looks like you have many more features per sample, and a much better looking observed frequency for these features.

To help others with the same issue who are reading this forum post, would you mind letting us know what changes you made to improve quality? E.g., did any of @benjjneb, @jnesme, or my own hints solve your problem?

Thanks!

This is great to hear! I trimmed the primers as @benjjneb suggested, then carefully chose the trunc-len values so that the overlap is sufficient. Those are the results!

Thank you everyone for your help! And good luck to everyone with the same problem :slight_smile:
