DADA2 and quality filtering?

Hi, I was trying to better understand why we don't use the q-score quality filter plugin when using DADA2 and came across this thread and your post below. Still, there must be some occasional, really crappy sequences that should be removed from the data set. In the dada2 plugin, does "--p-max-ee FLOAT" get rid of really poor quality sequences? I am not sure what the default value of 2 means. Is this the estimated number of errors? Is the q-score of the read used at all (that seems like a good indicator)?

Hi @jessicalmetcalf,

dada2 uses the q-scores to model error frequency in your reads, and removes reads with more than max-ee predicted errors. So yes, dada2 does use the q-scores; yes, it does remove very low-quality sequences; and yes, max-ee can be used to adjust your predicted error threshold.

I am not 100% certain how quality-filter would impact this (have not benchmarked it myself), but pre-filtering very noisy reads would:

  1. At best, be an entirely redundant step creating an unnecessary intermediate file, since those reads will be filtered anyway
  2. At worst, disturb the error model by removing those erroneous reads. Whether this would lead to more stringent or more permissive filtering, I just don't know.

@benjjneb, do you have any advice to add?

This. The current denoise-single and denoise-paired functions are full workflow calls, which include an initial error filtering step controlled primarily by the max-ee parameter. So pre-filtering is redundant.

Also, in the case of paired reads, pre-filtering will break denoise-paired if the filter doesn't maintain matching between filtered forward/reverse reads.
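To illustrate the matching problem, here is a minimal Python sketch. It is not dada2 or QIIME 2 code: the record layout, the function names, and the minimum-quality `keep` rule are all hypothetical stand-ins for whatever pre-filter you might apply. It only shows how filtering each file on its own can leave the forward and reverse reads out of step, while filtering pair by pair keeps them aligned.

```python
def filter_independently(records, keep):
    """Filter one file on its own, ignoring the mate file."""
    return [rec for rec in records if keep(rec)]


def filter_pairs(forward, reverse, keep):
    """Keep a pair only when BOTH mates pass, so outputs stay aligned."""
    kept_fwd, kept_rev = [], []
    for fwd, rev in zip(forward, reverse):
        if keep(fwd) and keep(rev):
            kept_fwd.append(fwd)
            kept_rev.append(rev)
    return kept_fwd, kept_rev


def keep(record):
    """Hypothetical stand-in filter: every base must be at least Q10."""
    _read_id, quals = record
    return min(quals) >= 10


# Toy records as (read_id, per-base Phred scores); values are made up.
forward = [("read1", [38, 37, 36]), ("read2", [35, 12, 4]), ("read3", [38, 38, 37])]
reverse = [("read1", [36, 35, 30]), ("read2", [34, 33, 30]), ("read3", [30, 8, 2])]

# Independent filtering drops different reads from each file.
broken_fwd = filter_independently(forward, keep)
broken_rev = filter_independently(reverse, keep)
print([r[0] for r in broken_fwd], [r[0] for r in broken_rev])

# Pair-aware filtering drops the whole pair if either mate fails.
paired_fwd, paired_rev = filter_pairs(forward, reverse, keep)
print([r[0] for r in paired_fwd], [r[0] for r in paired_rev])
```

In the independent case the second forward record and the second reverse record no longer belong to the same read, which is the kind of mismatch that would trip up a paired workflow.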

Yep, exactly. max-ee is the estimated number of errors in the read, calculated from its quality scores. You can read more on that here: Expected errors predicted by Phred (Q) scores
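For a concrete feel, here is a small Python sketch of the expected-errors idea. It uses the standard Phred definition, where a score Q corresponds to an error probability of 10^(-Q/10), and sums those probabilities over the read. The function name, the Phred+33 offset default, and the toy quality strings are my assumptions for illustration; this is not dada2's actual code.

```python
def expected_errors(quality_string, phred_offset=33):
    """Sum of per-base error probabilities, p = 10 ** (-Q / 10).

    Assumes Phred+33 ASCII encoding, as in typical modern FASTQ files.
    """
    return sum(10 ** (-(ord(ch) - phred_offset) / 10) for ch in quality_string)


max_ee = 2.0  # the default threshold discussed above

# Toy quality strings (made up): 'I' encodes Q40, '+' encodes Q10.
reads = {
    "good_read": "I" * 100,            # 100 bases at Q40
    "poor_read": "I" * 50 + "+" * 50,  # quality collapses in the second half
}

for name, quals in reads.items():
    ee = expected_errors(quals)
    verdict = "kept" if ee <= max_ee else "discarded"
    print(f"{name}: expected errors = {ee:.3f} ({verdict})")
```

The all-Q40 read comes out at about 0.01 expected errors and passes, while the read with fifty Q10 bases comes out at about 5 and would be dropped at a threshold of 2.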
