How are the numbers of denoised reads counted in denoising-stats of dada2?

I have a question about how the number of denoised reads is counted in the denoising stats of dada2.

When I ran dada2 in R as shown in the dada2 tutorial (https://benjjneb.github.io/dada2/tutorial.html), I obtained different numbers of denoised reads for the forward and reverse reads (see below), because dada2 denoises the forward and reverse reads separately.

DADA2 in R

| sample | input | filtered | denoisedF | denoisedR | merged | nonchim |
|---|---|---|---|---|---|---|
| sample1 | 71446 | 68040 | 64057 | 66183 | 58193 | 57484 |
| sample2 | 75665 | 72427 | 68460 | 70667 | 62922 | 62207 |
| sample3 | 67449 | 64553 | 61193 | 62852 | 55972 | 55399 |
| sample4 | 63607 | 60688 | 57677 | 59132 | 53403 | 52997 |
| sample5 | 35311 | 31254 | 28333 | 23661 | 17436 | 16806 |

.
.

However, when I denoised the same dataset using the qiime dada2 denoise-paired command, the output stats file showed only a single column for the number of denoised sequences (see below).

DADA2 in QIIME2-2020.2

| sample-id | input | filtered | % of input passed filter | denoised | merged | % of input merged | non-chimeric | % of input non-chimeric |
|---|---|---|---|---|---|---|---|---|
| sample1 | 71446 | 68040 | 95.23 | 64082 | 58171 | 81.42 | 57344 | 80.26 |
| sample2 | 75665 | 72427 | 95.72 | 68472 | 62829 | 83.04 | 61901 | 81.81 |
| sample3 | 67449 | 64553 | 95.71 | 61196 | 55945 | 82.94 | 55244 | 81.9 |
| sample4 | 63607 | 60688 | 95.41 | 57675 | 53327 | 83.84 | 52836 | 83.07 |
| sample5 | 35311 | 31254 | 88.51 | 28352 | 17573 | 49.77 | 16958 | 48.02 |

.
.
.
In the dada2 results from R, samples 1-4 had more denoised reverse reads (denoisedR) than denoised forward reads (denoisedF), whereas sample 5 had more denoised forward reads than denoised reverse reads.

The QIIME2 output seems to report only the number of denoised forward reads.
Is that correct? How are the denoised sequences counted in QIIME2?

I used dada2 v1.14.1 in R and QIIME2-2020.2.

Thanks,


Hello @yuuhirose,

Welcome to the forums! :qiime2:

Yeah, those numbers do vary a bit between dada2 and the q2-dada2 plugin. :thinking:
Let's see if @benjjneb can comment on these differences.

Colin

P.S. I put your numbers into markdown tables. Sorry for the edit.


Correct, in Q2 only the forward denoised count is reported.

You do see a small difference between the denoised forward read counts from the plugin and from your R output, but small differences like that are expected because the plugin uses a different version of the R package (1.10) than the one you are running in R (1.14).
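
As a quick check of this against the numbers posted in the question, here is a minimal Python sketch that compares the QIIME2 "denoised" column with the denoisedF and denoisedR columns from R. The counts are copied from the two tables above; the variable and function names are illustrative only and are not part of dada2 or q2-dada2.

```python
# Counts copied from the tables in the question (five posted samples only).
R_DENOISED_F = {"sample1": 64057, "sample2": 68460, "sample3": 61193,
                "sample4": 57677, "sample5": 28333}
R_DENOISED_R = {"sample1": 66183, "sample2": 70667, "sample3": 62852,
                "sample4": 59132, "sample5": 23661}
Q2_DENOISED = {"sample1": 64082, "sample2": 68472, "sample3": 61196,
               "sample4": 57675, "sample5": 28352}


def closest_column(sample):
    """Return which R column the QIIME2 count is nearer to, plus both differences."""
    diff_f = Q2_DENOISED[sample] - R_DENOISED_F[sample]
    diff_r = Q2_DENOISED[sample] - R_DENOISED_R[sample]
    name = "denoisedF" if abs(diff_f) < abs(diff_r) else "denoisedR"
    return name, diff_f, diff_r


for sample in Q2_DENOISED:
    name, diff_f, diff_r = closest_column(sample)
    print(f"{sample}: Q2 - denoisedF = {diff_f:+d}, "
          f"Q2 - denoisedR = {diff_r:+d}, closest: {name}")
```

For every posted sample the QIIME2 count is within a few tens of reads of denoisedF and more than a thousand reads away from denoisedR, which fits a forward-only count combined with a small version-related difference.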


@benjjneb Thank you for the response.

Reporting the numbers of denoised forward and reverse reads (and their percentages) might help with troubleshooting in q2-dada2. These numbers show whether the loss of reads occurs during denoising or during merging. If the loss occurs during merging, I can change the overlap length and the trimming positions of the reads. If the loss occurs during denoising even though proper quality-filtering parameters are used, it seems difficult to fix.

In my samples, the read quality of sample 5 was much lower than that of samples 1-4. Changing the overlap length (30, 50, 70 bp) did not substantially reduce the loss of reads at merging. I will remake the DNA library for this sample and try an additional run.
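
To illustrate this kind of troubleshooting, here is a minimal Python sketch that computes the fraction of reads retained at each step from the "DADA2 in R" table in the question. The function names are hypothetical, and using the smaller of denoisedF and denoisedR as the denominator for the merge step is an assumption made for illustration, not something dada2 reports.

```python
# Per-sample counts copied from the "DADA2 in R" table in the question.
COLUMNS = ("input", "filtered", "denoisedF", "denoisedR", "merged", "nonchim")
R_STATS = {
    "sample1": (71446, 68040, 64057, 66183, 58193, 57484),
    "sample2": (75665, 72427, 68460, 70667, 62922, 62207),
    "sample3": (67449, 64553, 61193, 62852, 55972, 55399),
    "sample4": (63607, 60688, 57677, 59132, 53403, 52997),
    "sample5": (35311, 31254, 28333, 23661, 17436, 16806),
}


def step_retention(counts):
    """Fraction of reads kept at each step, relative to the preceding step."""
    row = dict(zip(COLUMNS, counts))
    # Assumption: a pair can only merge if both of its reads survived
    # denoising, so the smaller denoised count bounds the merged count.
    merge_base = min(row["denoisedF"], row["denoisedR"])
    return {
        "filter": row["filtered"] / row["input"],
        "denoise_forward": row["denoisedF"] / row["filtered"],
        "denoise_reverse": row["denoisedR"] / row["filtered"],
        "merge": row["merged"] / merge_base,
        "chimera_removal": row["nonchim"] / row["merged"],
    }


def weakest_step(counts):
    """Name of the step that keeps the smallest fraction of reads."""
    retention = step_retention(counts)
    return min(retention, key=retention.get)


for sample, counts in R_STATS.items():
    retention = step_retention(counts)
    summary = ", ".join(f"{step}={frac:.1%}" for step, frac in retention.items())
    print(f"{sample}: {summary}; weakest step: {weakest_step(counts)}")
```

On the posted numbers, sample 5 keeps a noticeably smaller share of its reverse reads than of its forward reads through denoising, which is a loss that a forward-only denoised column cannot show.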

Please excuse me for asking another related question. Which step in q2-dada2 removes the singletons (sequences that appear only once across all samples)?


That occurs during denoising.

(which reminds me, I need to finish up adding pseudo-pooling to the plugin).


Thank you very much !!
