I have a question about how denoised reads are counted in the denoising stats of DADA2.
When I run DADA2 in R as shown in the DADA2 tutorial (https://benjjneb.github.io/dada2/tutorial.html), I get different numbers of denoised reads for the forward and reverse reads (see below), because DADA2 denoises the forward and reverse reads separately.
DADA2 in R

| sample | input | filtered | denoisedF | denoisedR | merged | nonchim |
|---|---|---|---|---|---|---|
| sample1 | 71446 | 68040 | 64057 | 66183 | 58193 | 57484 |
| sample2 | 75665 | 72427 | 68460 | 70667 | 62922 | 62207 |
| sample3 | 67449 | 64553 | 61193 | 62852 | 55972 | 55399 |
| sample4 | 63607 | 60688 | 57677 | 59132 | 53403 | 52997 |
| sample5 | 35311 | 31254 | 28333 | 23661 | 17436 | 16806 |
| ... | | | | | | |
However, when I denoised the same dataset with the qiime dada2 denoise-paired command, the output stats file showed only a single column for the number of denoised sequences (see below).
DADA2 in QIIME2-2020.2

| sample-id | input | filtered | % of input passed filter | denoised | merged | % of input merged | non-chimeric | % of input non-chimeric |
|---|---|---|---|---|---|---|---|---|
| sample1 | 71446 | 68040 | 95.23 | 64082 | 58171 | 81.42 | 57344 | 80.26 |
| sample2 | 75665 | 72427 | 95.72 | 68472 | 62829 | 83.04 | 61901 | 81.81 |
| sample3 | 67449 | 64553 | 95.71 | 61196 | 55945 | 82.94 | 55244 | 81.9 |
| sample4 | 63607 | 60688 | 95.41 | 57675 | 53327 | 83.84 | 52836 | 83.07 |
| sample5 | 35311 | 31254 | 88.51 | 28352 | 17573 | 49.77 | 16958 | 48.02 |
| ... | | | | | | | | |
In the results of DADA2 in R, samples 1-4 had more denoised reverse reads (denoisedR) than denoised forward reads (denoisedF), whereas sample 5 had more denoised forward reads than denoised reverse reads.
The QIIME2 output seems to report only the number of denoised forward reads.
Is that correct? How are the denoised sequences counted in QIIME2?
Correct, in Q2 only the forward denoised count is reported.
You do see a small difference between the denoised forward read numbers in the plugin and in your R output, but small differences like that are expected because the plugin uses a different version of the R package (1.10) from the one you are running in R (1.14).
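The match can be checked from the numbers posted above. Here is a minimal Python sketch (counts copied from the two tables; the variable and function names are only illustrative) that compares the QIIME 2 `denoised` column against both R columns. It should show differences of a few tens of reads at most against denoisedF, and of more than a thousand reads against denoisedR:

```python
# Denoised read counts copied from the two tables in this thread.
r_denoised_f = {"sample1": 64057, "sample2": 68460, "sample3": 61193,
                "sample4": 57677, "sample5": 28333}
r_denoised_r = {"sample1": 66183, "sample2": 70667, "sample3": 62852,
                "sample4": 59132, "sample5": 23661}
q2_denoised = {"sample1": 64082, "sample2": 68472, "sample3": 61196,
               "sample4": 57675, "sample5": 28352}


def abs_diffs(q2, r):
    """Absolute per-sample difference between two count dictionaries."""
    return {sample: abs(q2[sample] - r[sample]) for sample in q2}


diff_f = abs_diffs(q2_denoised, r_denoised_f)  # QIIME 2 vs denoisedF
diff_r = abs_diffs(q2_denoised, r_denoised_r)  # QIIME 2 vs denoisedR

for sample in q2_denoised:
    print(f"{sample}: |Q2 - denoisedF| = {diff_f[sample]:>5}, "
          f"|Q2 - denoisedR| = {diff_r[sample]:>5}")
```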
The numbers of denoised forward and reverse reads (and their percentages) could help with troubleshooting in q2-dada2. They show whether reads are lost during denoising or during merging. If the loss occurs during merging, I can adjust the overlap length and the trimming positions of the reads. If the loss occurs during denoising even though suitable quality-filtering parameters are used, it seems difficult to fix.
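To illustrate the kind of diagnosis I mean, here is a rough Python sketch using the R counts for sample 1 and sample 5 from the table above. The function name and the step labels are my own, and expressing the merge step relative to the smaller of denoisedF and denoisedR is my assumption (a merged pair needs both reads), not something DADA2 reports:

```python
# Read counts copied from the "DADA2 in R" table above.
track = {
    "sample1": {"input": 71446, "filtered": 68040, "denoisedF": 64057,
                "denoisedR": 66183, "merged": 58193, "nonchim": 57484},
    "sample5": {"input": 35311, "filtered": 31254, "denoisedF": 28333,
                "denoisedR": 23661, "merged": 17436, "nonchim": 16806},
}


def step_retention(counts):
    """Percent of reads kept at each step, relative to the previous step."""
    # Assumption: merging is limited by the direction with fewer denoised reads.
    denoised_min = min(counts["denoisedF"], counts["denoisedR"])
    return {
        "filter": 100 * counts["filtered"] / counts["input"],
        "denoiseF": 100 * counts["denoisedF"] / counts["filtered"],
        "denoiseR": 100 * counts["denoisedR"] / counts["filtered"],
        "merge": 100 * counts["merged"] / denoised_min,
        "chimera": 100 * counts["nonchim"] / counts["merged"],
    }


for sample, counts in track.items():
    kept = step_retention(counts)
    summary = ", ".join(f"{step} {pct:.1f}%" for step, pct in kept.items())
    print(f"{sample}: {summary}")
```

With only the single `denoised` column in the QIIME 2 stats, the `denoiseR` figure cannot be computed, which is why I think the reverse count would be useful.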
In my samples, the read quality of sample 5 was much lower than that of samples 1-4. Changing the overlap length (30, 50, 70 bp) did not substantially reduce the loss of reads at the merging step. I will remake the DNA library for this sample and try an additional run.
Please excuse me for asking another related question. Which step in q2-dada2 removes singletons (sequences that appear only once across all samples)?