dada2 low percent merged in some samples

Nick_Jeffery · December 7, 2022, 8:15pm

Dear qiime2 users,
I have a problem with losing a large number of reads to dada2 merging despite >90% of my reads passing the filtering step. This problem is similar to other threads I have read, but changing my truncate lengths has not helped at all. For reference, I am sequencing 12S eDNA with the MiFishU forward and reverse primers in qiime2-2022.2 using the dada2 plugin, after trimming primers in cutadapt.

I'm mostly confused by why some samples have a good merging and non-chimera statistic, though most don't. Why would truncation parameters work well for some samples but not others?

I can successfully use just the forward reads instead of merging paired reads, but would like to understand why merging isn't working well with my input. Thanks in advance.

Dada2 denoising stats:

My code:
qiime cutadapt trim-paired \
  --i-demultiplexed-sequences 12S-combined-demux.qza \
  --p-cores 40 \
  --p-front-f GTCGGTAAAACTCGTGCCAGC \
  --p-front-r CATAGTGGGGTATCTAATCCCAGTTTG \
  --p-error-rate 0.11 \
  --p-discard-untrimmed True \
  --p-match-read-wildcards \
  --p-match-adapter-wildcards \
  --p-minimum-length 100 \
  --o-trimmed-sequences 12s-demux-trimmed.qza \
  --output-dir trimmed \
  --verbose

# have tried --p-trunc-len-f values from 110 to 140
qiime dada2 denoise-paired \
  --i-demultiplexed-seqs 12s-demux-trimmed.qza \
  --p-trunc-len-f 128 \
  --p-trunc-len-r 120 \
  --p-n-threads 0 \
  --p-pooling-method independent \
  --output-dir Denoised2 \
  --verbose

colinbrislawn · December 8, 2022, 1:26am

That is strange!

Would you be willing to post your quality score plots so we can take a look?

EDIT: Is this a variable-length amplicon?

Nick_Jeffery · December 9, 2022, 2:09pm

Hi @colinbrislawn,
Do you prefer these quality score plots or something from FastQC? I pasted the screenshot from QIIME here:

I don't think the amplicon is variable-length - the primers target a roughly 170bp amplicon within a 12S variable region, though there may be some variation of +/- 10 nucleotides - I'll look into this. If it is in fact variable length, does this mean selecting a shorter truncate length might be better? I tried as short as 110 truncation, but it didn't help with the merging stat.
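As a rough sanity check on the truncation question, the expected read overlap can be estimated as forward truncation length plus reverse truncation length minus amplicon length. The sketch below assumes the ~170 bp figure is the amplicon length after primer removal (an assumption, not something confirmed in this thread) and compares the result with the 12 nt minimum overlap that DADA2's mergePairs uses by default. The function name `expected_overlap` is made up for illustration.

```shell
#!/bin/sh
# Expected overlap (nt) between truncated forward and reverse reads.
# Args: trunc_len_f trunc_len_r amplicon_len
# Assumption: amplicon_len is the length after primers are trimmed.
expected_overlap() {
  echo $(( $1 + $2 - $3 ))
}

MIN_OVERLAP=12  # DADA2 mergePairs default minOverlap

# Amplicon lengths of 170 bp +/- 10, with the truncation lengths used above
for amplicon in 160 170 180; do
  ov=$(expected_overlap 128 120 "$amplicon")
  if [ "$ov" -ge "$MIN_OVERLAP" ]; then
    status="ok"
  else
    status="too-short"
  fi
  echo "amplicon=${amplicon} overlap=${ov} ${status}"
done
```

Under these assumptions the overlap stays well above 12 nt across the 160 to 180 bp range, even with both reads truncated to 110, which would be consistent with shorter truncation lengths not changing the merging stat.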

Thank you for your help!

EDIT: Wondering if this could be because this is from the NovaSeq? dada2 has worked fine in the past with MiSeq data, but with Novaseq data dada2 only uses 1 sample to learn error rates. This may not be an issue for merging, but here is the dada2 information as it starts running:
R version 4.1.3 (2022-03-10)
Loading required package: Rcpp
DADA2: 1.22.0 / Rcpp: 1.0.8.3 / RcppParallel: 5.1.5

Filtering ....................................................................................................
Learning Error Rates
150913332 total bases in 1300977 reads from 1 samples will be used for learning the error rates.
140505516 total bases in 1300977 reads from 1 samples will be used for learning the error rates.
Denoise samples

colinbrislawn · December 9, 2022, 4:29pm

Yes, the NovaSeq could be part of it: NovaSeq uses binned quality scores, and this is an issue with DADA2.

Learning error rates from only 1 sample is also an issue, as we want representative data for training.

Merging and processing with a pipeline other than DADA2 may be worth trying. Deblur comes to mind.

Nick_Jeffery · December 13, 2022, 9:28pm

Thanks for your reply,
I know about the dada2 and NovaSeq issues, but I was under the impression that it simply wouldn't run. I am still confused about why some samples merge well while others don't.
For now though I'll try deblur or putting just the forward reads through dada2 which seemed to work well for another marker of mine.
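For reference, a forward-read-only run could look roughly like the sketch below. It wraps `qiime dada2 denoise-single` in a small function so nothing runs until it is called; the function name and the `*-fwd.qza` output filenames are hypothetical, and the truncation length of 128 is simply carried over from the paired run above rather than tuned for this data.

```shell
#!/bin/sh
# Sketch: denoise forward reads only with DADA2 in QIIME 2.
# Args: input artifact (.qza), forward truncation length
# Output filenames below are placeholders chosen for illustration.
run_forward_only() {
  qiime dada2 denoise-single \
    --i-demultiplexed-seqs "$1" \
    --p-trunc-len "$2" \
    --p-n-threads 0 \
    --o-table table-fwd.qza \
    --o-representative-sequences rep-seqs-fwd.qza \
    --o-denoising-stats denoising-stats-fwd.qza \
    --verbose
}

# Example call (requires an activated QIIME 2 environment):
# run_forward_only 12s-demux-trimmed.qza 128
```

Since no merging step is involved, the resulting denoising stats have no merged column to compare against, so the per-sample filtered and non-chimeric counts are the numbers to look at.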
