Cutadapt and DADA2 in QIIME2

I have a couple of questions about the Cutadapt and DADA2 plugins for filtering samples.

Cutadapt:
I am using ‘qiime cutadapt trim-paired’ to trim my sequences.
https://docs.qiime2.org/2019.7/plugins/available/cutadapt/trim-paired/

Could someone please explain the following parameters:
(1)
--p-overlap [default = 3]
Require at least ‘overlap’ bases of overlap between read and adapter for an adapter to be found (i.e. 3 bases with the default).

I am unsure what this statement means, so further clarification would be useful.
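The cutadapt guide linked in the reply explains the reason for this setting: cutadapt also accepts partial matches in which only the start of a 3' adapter hangs off the end of the read, so very short matches happen by chance (roughly a quarter of random reads end with the adapter's first base). The sketch below is a simplification, not cutadapt's algorithm: it looks only for exact matches (cutadapt aligns with mismatches and indels), covers only the 3' case (front adapters such as primers passed with --p-front-f are the mirror image), and the function names and the adapter sequence are hypothetical.

```python
from typing import Optional


def find_3prime_adapter(read: str, adapter: str, min_overlap: int = 3) -> Optional[int]:
    """Return the index at which to cut `read`, or None if no adapter is found.

    Hypothetical helper: exact matches only, no errors allowed.
    """
    if min_overlap < 1:
        raise ValueError("min_overlap must be at least 1")
    # A complete copy of the adapter anywhere in the read always counts.
    pos = read.find(adapter)
    if pos != -1:
        return pos
    # Partial match: a prefix of the adapter equals a suffix of the read.
    # Try the longest possible overlap first and stop at min_overlap.
    longest = min(len(adapter) - 1, len(read))
    for overlap in range(longest, min_overlap - 1, -1):
        if read.endswith(adapter[:overlap]):
            return len(read) - overlap
    return None


def chance_of_random_match(overlap: int) -> float:
    """Probability that `overlap` random, uniformly distributed bases
    happen to equal the first `overlap` bases of the adapter."""
    return 0.25 ** overlap


read = "ACGTTTCA"     # ends in "A" purely by chance
adapter = "AGATCGG"   # hypothetical adapter sequence

print(find_3prime_adapter(read, adapter, min_overlap=1))  # 7: the final "A" is trimmed
print(find_3prime_adapter(read, adapter, min_overlap=3))  # None: read left intact
print(chance_of_random_match(1), chance_of_random_match(3))  # 0.25 0.015625
```

Raising the minimum overlap trades a few missed genuine short adapter remnants for far fewer reads trimmed by coincidence.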

(2)
--p-discard-untrimmed [default = False]
I understand what this does, but I don't understand why the default would be False. Surely, if you have sequenced something, you only want the reads that contain the primer/adapter sequence to be taken forward? Is there any situation where you would want to include sequences that didn't have the primer?

(3)
--p-error-rate PROPORTION Range(0, 1, inclusive_end=True)
Maximum allowed error rate. [default = 0.1]

Is this the same as quality filtering? E.g. does the default of 0.1 equate to a 1-in-1,000 error rate allowance? If not, what does it mean?
E.g. Phred quality score = 30 --> 1 in 1,000 probability of an incorrect base call --> base call accuracy of 99.9%. If this is correct, is a sliding-window approach being used here?
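Cutadapt's error rate and a Phred score measure different things, and a short sketch makes the contrast concrete. According to the cutadapt documentation, the number of errors tolerated in an adapter match is the error rate multiplied by the length of the matching region, rounded down, so with the default 0.1 a 20-nt primer may contain up to 2 errors. A Phred score, by contrast, is a per-base probability that the call is wrong, Q = -10·log10(p). The helper names are hypothetical, and only substitutions are counted here, whereas cutadapt by default also allows insertions and deletions.

```python
import math


def allowed_errors(error_rate: float, match_length: int) -> int:
    """Errors tolerated in an adapter match: error rate x match length, rounded down."""
    return math.floor(error_rate * match_length)


def mismatches(a: str, b: str) -> int:
    """Count substitutions between two equal-length sequences (no indels)."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def adapter_matches(observed: str, adapter: str, error_rate: float) -> bool:
    """True if the observed sequence is within the allowed number of errors."""
    return mismatches(observed, adapter) <= allowed_errors(error_rate, len(adapter))


def phred_to_error_prob(q: float) -> float:
    """Phred score -> probability that the base call is wrong: p = 10^(-Q/10)."""
    return 10 ** (-q / 10)


# The example from the reply: AAAA vs AAAC differ at 1 of 4 positions (25%).
print(adapter_matches("AAAC", "AAAA", error_rate=0.1))  # False: floor(0.4) = 0 errors allowed
print(adapter_matches("AAAC", "AAAA", error_rate=0.3))  # True: floor(1.2) = 1 error allowed
print(allowed_errors(0.1, 20))  # 2 errors allowed across a 20-nt primer
print(phred_to_error_prob(30))  # ~0.001, i.e. 1 wrong call in 1,000
```

The error rate never looks at quality scores: it only limits how different the read's sequence may be from the adapter you supplied.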

(4)
I then move on to DADA2:
qiime dada2 denoise-paired https://docs.qiime2.org/2019.7/plugins/available/dada2/denoise-paired/
Is there a quality score filtering step here?
I have read the DADA2 manuscript (Callahan et al, 2016) but am still unclear.

--p-max-ee-f NUMBER  Forward reads with number of expected errors higher than this value will be discarded. [default: 2.0]
--p-max-ee-r NUMBER  Reverse reads with number of expected errors higher than this value will be discarded. [default: 2.0]
If the reads were 250 bp long and this were set to 2.0, would that equate to a quality score of 32.5, i.e. 1 error in 125 bases? Though I think this is just discarding the sequence once it comes across a second error.
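DADA2's expected-errors filter does not count errors or stop at a "second error". Each base's quality score is converted to an error probability, p = 10^(-Q/10), these probabilities are summed over the read (after truncation), and the read is discarded if the sum is higher than --p-max-ee-f/--p-max-ee-r. A minimal sketch with hypothetical function names, not q2-dada2's code. If every base had the same quality, an expected-error total of 2.0 over 250 bases corresponds to a per-base error probability of 2/250 = 0.008, which is Q ≈ 21 rather than 32.5.

```python
import math
from typing import Sequence


def phred_to_error_prob(q: float) -> float:
    """Probability that a base call with Phred score q is wrong: 10^(-q/10)."""
    return 10 ** (-q / 10)


def expected_errors(quals: Sequence[int]) -> float:
    """Expected number of errors in a read: the sum of per-base error probabilities."""
    return sum(phred_to_error_prob(q) for q in quals)


def passes_max_ee(quals: Sequence[int], max_ee: float = 2.0) -> bool:
    """Keep the read unless its expected errors are higher than max_ee."""
    return expected_errors(quals) <= max_ee


def uniform_q_for_ee(max_ee: float, read_len: int) -> float:
    """Phred score every base would need for the read to sit exactly at max_ee."""
    return -10 * math.log10(max_ee / read_len)


all_q30 = [30] * 250                  # EE = 250 * 0.001 = 0.25
all_q20 = [20] * 250                  # EE = 250 * 0.01  = 2.5
two_bad_bases = [40] * 248 + [2, 2]   # EE ~ 0.025 + 2 * 0.63 ~ 1.29

print(passes_max_ee(all_q30))        # True
print(passes_max_ee(all_q20))        # False
print(passes_max_ee(two_bad_bases))  # True, despite two very low-quality bases
print(round(uniform_q_for_ee(2.0, 250), 1))  # 21.0
```

Because the threshold applies to a sum over the read, a uniformly mediocre read can fail while a read with a couple of terrible bases passes.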

Results from my own data after cutadapt and DADA2:
(5) For two of my samples I end up with 0 reads after the DADA2 step.
E.g.:
input reads: 206,853
output reads: 177,935
denoised reads: 176,508
merged reads: 0
non-chimeric reads: 0

This seems wrong. My amplicon is ~435 bp, so I trimmed the reads to 250 bp to allow sufficient overlap. I plan to check that I haven't mixed up the barcodes of the forward and reverse reads. Is there any other reason these samples might have failed like this in the DADA2 pipeline? The FastQC plots looked alright after the cutadapt step.
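A quick check of the overlap arithmetic: when both reads are truncated to a fixed length, the expected overlap is the sum of the two read lengths minus the length of the region they span. The sketch treats the ~435 bp amplicon as that span (if cutadapt removed the primers, the span is shorter and the overlap correspondingly larger) and uses DADA2's own mergePairs default minimum overlap of 12 nt; whether the QIIME 2 wrapper in your version uses the same value is an assumption. DADA2 also requires the overlapping bases to match exactly by default, so overlap length alone does not guarantee merging. Function names are hypothetical.

```python
DADA2_DEFAULT_MIN_OVERLAP = 12  # DADA2 mergePairs default; assumed to apply here


def expected_overlap(trunc_len_f: int, trunc_len_r: int, amplicon_len: int) -> int:
    """Bases shared by the forward and reverse reads of one amplicon.

    A negative value means the reads do not reach each other at all.
    """
    return trunc_len_f + trunc_len_r - amplicon_len


def overlap_is_sufficient(trunc_len_f: int, trunc_len_r: int, amplicon_len: int,
                          min_overlap: int = DADA2_DEFAULT_MIN_OVERLAP) -> bool:
    """True if the expected overlap meets the minimum required for merging."""
    return expected_overlap(trunc_len_f, trunc_len_r, amplicon_len) >= min_overlap


print(expected_overlap(250, 250, 435))       # 65
print(overlap_is_sufficient(250, 250, 435))  # True
print(expected_overlap(200, 200, 435))       # -35: reads would never meet
```

With the numbers given, the reads should overlap by about 65 nt, so if the amplicon length is right, the truncation settings alone would not obviously explain zero merged reads in just two samples.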

Any advice on the above would be much appreciated, thanks.

Hi @LVC, regarding cutadapt, all parameter text and defaults are taken directly from cutadapt.

  1. https://cutadapt.readthedocs.io/en/stable/guide.html#minimum-overlap-reducing-random-matches
  2. The default is False in q2-cutadapt because the default is False in cutadapt.
  3. No, this is not the same as quality filtering. This has to do with the number of mismatched nts in the adapter sequence. AAAA vs AAAC are 25% different. Setting --p-error-rate 0.3 would make this a match, while --p-error-rate 0.1 would make this a miss.
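  4. Yes, there is quality filtering here, but the per-base quality cutoff isn't max-ee. Check out the --p-trunc-q parameter. (max-ee does use the quality scores, but it filters whole reads on their expected number of errors rather than acting on individual bases.)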
  5. None of the reads are merging for that sample, which likely means there is insufficient overlap.
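Answer 4 points to --p-trunc-q. The q2-dada2 help describes it as truncating reads at the first base whose quality score is less than or equal to this value (default 2); reads that end up shorter than --p-trunc-len-f/--p-trunc-len-r are then discarded, and a trunc-len of 0 disables truncation and length filtering. Below is a minimal sketch of that logic with hypothetical function names. It ignores DADA2's other filters (such as trim-left and max-ee) and makes no claim about their exact order.

```python
from typing import Optional, Sequence


def first_low_quality_cut(quals: Sequence[int], trunc_q: int = 2) -> int:
    """Number of bases kept when the read is cut at the first base with Q <= trunc_q."""
    for i, q in enumerate(quals):
        if q <= trunc_q:
            return i
    return len(quals)


def trunc_q_then_len(quals: Sequence[int], trunc_q: int = 2,
                     trunc_len: int = 0) -> Optional[int]:
    """Return the final read length, or None if the read is discarded."""
    kept = first_low_quality_cut(quals, trunc_q)
    if trunc_len == 0:          # 0 = no truncation or length filtering
        return kept
    if kept < trunc_len:        # too short after the quality cut: discarded
        return None
    return trunc_len            # otherwise cut to exactly trunc_len


read = [35] * 100 + [2] + [35] * 149   # 250 bases with a single Q2 call at index 100

print(trunc_q_then_len(read, trunc_q=2, trunc_len=250))  # None: only 100 bases survive
print(trunc_q_then_len(read, trunc_q=2, trunc_len=90))   # 90
print(trunc_q_then_len(read, trunc_q=2, trunc_len=0))    # 100
```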

In the future please try and limit a post to one or two questions, this allows for faster response time and easier searching for future readers. Thanks! :t_rex:
