How are the numbers of denoised reads counted in denoising-stats of dada2?

yuuhirose · April 16, 2020, 2:45pm

I have a question about how the number of denoised reads is counted in the denoising-stats output of dada2.

When I ran dada2 in R as shown in the dada2 tutorial (DADA2 Pipeline Tutorial (1.16)), I obtained different numbers of denoised reads for the forward and reverse reads (see below), because dada2 denoises the forward and reverse reads separately.

DADA2 in R

| sample | input | filtered | denoisedF | denoisedR | merged | nonchim |
|---|---:|---:|---:|---:|---:|---:|
| sample1 | 71446 | 68040 | 64057 | 66183 | 58193 | 57484 |
| sample2 | 75665 | 72427 | 68460 | 70667 | 62922 | 62207 |
| sample3 | 67449 | 64553 | 61193 | 62852 | 55972 | 55399 |
| sample4 | 63607 | 60688 | 57677 | 59132 | 53403 | 52997 |
| sample5 | 35311 | 31254 | 28333 | 23661 | 17436 | 16806 |

.
.

However, when I denoised the same dataset using the qiime dada2 denoise-paired command, the output stats file showed only a single column for the number of denoised sequences (see below).

DADA2 in QIIME2-2020.2

| sample-id | input | filtered | % of input passed filter | denoised | merged | % of input merged | non-chimeric | % of input non-chimeric |
|---|---:|---:|---:|---:|---:|---:|---:|---:|
| sample1 | 71446 | 68040 | 95.23 | 64082 | 58171 | 81.42 | 57344 | 80.26 |
| sample2 | 75665 | 72427 | 95.72 | 68472 | 62829 | 83.04 | 61901 | 81.81 |
| sample3 | 67449 | 64553 | 95.71 | 61196 | 55945 | 82.94 | 55244 | 81.9 |
| sample4 | 63607 | 60688 | 95.41 | 57675 | 53327 | 83.84 | 52836 | 83.07 |
| sample5 | 35311 | 31254 | 88.51 | 28352 | 17573 | 49.77 | 16958 | 48.02 |

.
.
.

In the dada2 results from R, samples 1-4 had more denoised reverse reads (denoisedR) than denoised forward reads (denoisedF), whereas sample 5 had more denoised forward reads.

The QIIME2 output seems to report only the number of denoised forward reads.
Is that correct? How are the denoised sequences counted in QIIME2?

I used dada2 v1.14.1 in R and QIIME2-2020.2.

Thanks,

colinbrislawn · April 16, 2020, 8:35pm

Hello @yuuhirose,

Welcome to the forums!

Yeah, those numbers vary a bit between dada2 and the q2-dada2 plugin.
Let's see if @benjjneb can comment on these differences.

Colin

P.S. I put your numbers into markdown tables. Sorry for the edit.

benjjneb · April 17, 2020, 1:38pm

Correct, in Q2 only the forward denoised count is reported.

You do see a small difference between the denoised forward read counts in the plugin and in your R output, but small differences like that are expected because the plugin uses a different version of the R package (1.10) than the one you are using in R (1.14).
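For anyone who wants to check this against the numbers posted above, here is a minimal Python sketch. It only uses the counts copied from the two tables in this thread; the helper names (`closest_direction`, `relative_diff_vs_forward`) are made up for illustration and are not part of dada2 or q2-dada2.

```python
# Counts copied from the two tables in this thread.
# R (dada2 1.14.1): sample -> (denoisedF, denoisedR)
r_denoised = {
    "sample1": (64057, 66183),
    "sample2": (68460, 70667),
    "sample3": (61193, 62852),
    "sample4": (57677, 59132),
    "sample5": (28333, 23661),
}

# QIIME 2 2020.2: sample -> value in the single "denoised" column
q2_denoised = {
    "sample1": 64082,
    "sample2": 68472,
    "sample3": 61196,
    "sample4": 57675,
    "sample5": 28352,
}


def closest_direction(sample):
    """Return 'forward' or 'reverse', whichever R count is nearer the Q2 count."""
    fwd, rev = r_denoised[sample]
    q2 = q2_denoised[sample]
    return "forward" if abs(q2 - fwd) <= abs(q2 - rev) else "reverse"


def relative_diff_vs_forward(sample):
    """Relative difference between the Q2 count and the R forward count."""
    fwd, _ = r_denoised[sample]
    return abs(q2_denoised[sample] - fwd) / fwd


for sample in r_denoised:
    pct = 100 * relative_diff_vs_forward(sample)
    print(f"{sample}: closest to {closest_direction(sample)} reads, "
          f"differs from denoisedF by {pct:.3f}%")
```

For these five samples the Q2 `denoised` value sits next to the R `denoisedF` value and far from `denoisedR`, which is consistent with the plugin reporting the forward count only.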

yuuhirose · April 17, 2020, 3:21pm

@benjjneb Thank you for the response.

Reporting the numbers of denoised forward and reverse reads (and their percentages) may help with troubleshooting in q2-dada2. These numbers show whether reads are being lost during denoising or during merging. If the loss occurs during merging, I can change the overlap length and the trimming positions of the reads. If the loss occurs during denoising, and proper parameters are already being used for quality filtering, it seems difficult to fix.

In my samples, the read quality of sample 5 was much lower than that of samples 1-4. Changing the overlap length (30, 50, 70 bp) did not substantially reduce the loss of reads at merging. I will remake the DNA library for this sample and try an additional run.

Please excuse me for asking another related question. Which step removes the singletons (sequences that appear only once across all the samples) in q2-dada2?
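The troubleshooting idea above can be sketched in Python using the R-pipeline counts from the first table in this thread. This is only a rough illustration: the function name `step_retention` is hypothetical, and using the smaller of denoisedF and denoisedR as the denominator for the merge step is an assumption (a pair can only merge if both reads survived denoising, so the smaller count is an upper bound on mergeable pairs).

```python
# R-pipeline counts copied from the first table in this thread:
# sample -> (input, filtered, denoisedF, denoisedR, merged, nonchim)
track = {
    "sample1": (71446, 68040, 64057, 66183, 58193, 57484),
    "sample2": (75665, 72427, 68460, 70667, 62922, 62207),
    "sample3": (67449, 64553, 61193, 62852, 55972, 55399),
    "sample4": (63607, 60688, 57677, 59132, 53403, 52997),
    "sample5": (35311, 31254, 28333, 23661, 17436, 16806),
}


def step_retention(counts):
    """Fraction of reads kept at each step, relative to the previous step."""
    inp, filt, den_f, den_r, merged, nonchim = counts
    return {
        "filter": filt / inp,
        "denoiseF": den_f / filt,
        "denoiseR": den_r / filt,
        # Assumption: min(denoisedF, denoisedR) approximates the number of
        # pairs that were still available for merging.
        "merge": merged / min(den_f, den_r),
        "chimera": nonchim / merged,
    }


for sample, counts in track.items():
    kept = step_retention(counts)
    summary = ", ".join(f"{step}={frac:.3f}" for step, frac in kept.items())
    print(f"{sample}: {summary}")
```

With these numbers, sample 5 keeps a clearly smaller fraction of reads than samples 1-4 at both the reverse-read denoising step and the merging step, which fits the lower read quality described for that sample. The same breakdown is not possible from the q2-dada2 stats alone, since they contain a single denoised column.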

benjjneb · April 17, 2020, 8:13pm

That occurs during denoising.

(which reminds me, I need to finish up adding pseudo-pooling to the plugin).

yuuhirose · April 17, 2020, 11:44pm

Thank you very much!!
