Qiime2 demux summary

Hi All,
I received paired-end sequencing data from the sequencing center, which used a 2x300-cycle kit on an NS2K with the 515F and 806R primers. I used cutadapt to trim my primer sequences, then reran demux and got the attached summary.
Could you help me understand, from the demultiplexed summary, why I have so few reads in the 2% and 9% bins?
Just using positional cutoffs based on the quality plots (for example, truncating at 285 for forward and 240 for reverse), DADA2 fails at the initial step, with less than 1% of the reads passing the filter.
Any help or suggestions on where I might have gone wrong?
Since this is the V4 region and I used the 600-cycle kit, my reads are more than long enough to ensure overlap!

Hello @lcbio,

From the command-line help text for the --p-trunc-len-f option of the qiime dada2 denoise-paired action:

Position at which forward read sequences should be truncated due to decrease in quality. This truncates the 3' end of the input sequences, which will be the bases that were sequenced in the last cycles. Reads that are shorter than this value will be discarded. After this parameter is applied there must still be at least a 12 nucleotide overlap between the forward and reverse reads. If 0 is provided, no truncation or length filtering will be performed.
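As a rough check of the 12-nucleotide overlap rule, the overlap left after truncation is the two truncated read lengths minus the amplicon length. This is a minimal sketch, assuming a primer-trimmed 515F/806R V4 amplicon of about 253 bp (real lengths vary by a few bases between taxa); `overlap_after_truncation` and `can_merge` are hypothetical helper names, not part of QIIME 2 or DADA2.

```python
# Hypothetical helpers: estimate how much truncated forward and reverse reads overlap.
# ASSUMPTION: primer-trimmed V4 (515F/806R) amplicon of ~253 bp.
MIN_OVERLAP = 12  # minimum overlap stated in the --p-trunc-len-f help text


def overlap_after_truncation(trunc_f: int, trunc_r: int, amplicon_len: int = 253) -> int:
    """Bases shared by the truncated reads within the amplicon.

    Any part of a read longer than the amplicon runs past it (read-through),
    so each read is capped at the amplicon length before subtracting.
    """
    usable_f = min(trunc_f, amplicon_len)
    usable_r = min(trunc_r, amplicon_len)
    return usable_f + usable_r - amplicon_len


def can_merge(trunc_f: int, trunc_r: int, amplicon_len: int = 253) -> bool:
    """True if the truncated reads still meet the minimum overlap."""
    return overlap_after_truncation(trunc_f, trunc_r, amplicon_len) >= MIN_OVERLAP


for trunc_f, trunc_r in [(285, 240), (220, 160), (150, 100)]:
    print(trunc_f, trunc_r, overlap_after_truncation(trunc_f, trunc_r), can_merge(trunc_f, trunc_r))
```

With an amplicon this short, even aggressive truncation leaves a large overlap, so the overlap rule alone is unlikely to be what removes most reads.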

You can see from your demux summary that the 91st percentile forward read length is 283. That's why --p-trunc-len-f 285 is dropping so many reads.
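The length filter in the help text can be sketched in a few lines: a read shorter than the truncation length is dropped, and longer reads are cut to that length. The read lengths below are hypothetical, chosen only so that most of them sit just under 285, similar in spirit to the summary described above; `apply_trunc_len` and `fraction_discarded` are illustrative names.

```python
# Sketch of the truncation/length filter described in the help text.
# The read lengths are HYPOTHETICAL, not taken from the poster's data.


def apply_trunc_len(read_lengths: list[int], trunc_len: int) -> list[int]:
    """Lengths of reads surviving truncation; 0 means no truncation or filtering."""
    if trunc_len == 0:
        return list(read_lengths)
    # Reads shorter than trunc_len are discarded; the rest are cut to trunc_len.
    return [trunc_len for n in read_lengths if n >= trunc_len]


def fraction_discarded(read_lengths: list[int], trunc_len: int) -> float:
    kept = apply_trunc_len(read_lengths, trunc_len)
    return 1 - len(kept) / len(read_lengths)


# Hypothetical primer-trimmed forward read lengths
lengths = [210, 245, 270, 279, 281, 282, 283, 283, 283, 300]
for t in (285, 283, 240, 0):
    print(t, round(fraction_discarded(lengths, t), 2))
```

When nearly all reads are a few bases shorter than the chosen truncation length, almost everything is discarded, which matches the behavior described above.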

This was just an example. Even when I set the truncation parameters to 0, only about 25-60% of reads pass the filter, and I then lose 80% of those during chimera removal. I am pretty sure I don't have any non-biological sequences in the dataset, unless the forward reads are somehow running into the reverse reads and that is causing an issue?
Thoughts?
Thanks
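Read-through is easy to test for: when a read is longer than the amplicon, the 3' end of a forward read contains the reverse complement of the reverse primer, followed by adapter sequence. A minimal sketch, assuming the EMP version of 806R (GGACTACNVGGGTWTCTAAT; substitute the exact primer your sequencing center used) and requiring an exact IUPAC match, whereas tools such as cutadapt also tolerate mismatches. The function names are hypothetical.

```python
# Sketch: flag forward reads that run through the amplicon into the reverse-primer site.
# ASSUMPTIONS: 806R = GGACTACNVGGGTWTCTAAT (EMP version), exact IUPAC matching only.
import re

IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
# Complement of each IUPAC code (e.g. V = A/C/G pairs with B = C/G/T).
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def primer_regex(primer: str) -> re.Pattern:
    """Turn a degenerate primer into a regex of character classes."""
    return re.compile("".join(f"[{IUPAC[base]}]" for base in primer))


REV_PRIMER = "GGACTACNVGGGTWTCTAAT"
# On the forward read, read-through appears as the reverse complement of 806R.
READ_THROUGH = primer_regex(reverse_complement(REV_PRIMER))


def read_through_position(read: str) -> int:
    """Index where the reverse-primer site starts in the read, or -1 if absent."""
    match = READ_THROUGH.search(read.upper())
    return match.start() if match else -1


# Toy read: 20 nt of insert, the 806R site, then an arbitrary tail standing in for adapter
toy = "ACGT" * 5 + "ATTAGAAACCCTTGTAGTCC" + "G" * 10
print(read_through_position(toy))
```

Running this over real reads and counting hits near position ~253 would show whether read-through is common in the data.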

Hello @lcbio,

Would you mind sharing the dada2 stats qzv?

rem_denoising-stats_2.qzv (1.2 MB)

Hey,
Just checking in to see if you guys have any suggestions.
Thanks

Hello @lcbio,

I don't see any obvious errors with your analysis here. You can read the docs for the dada2 action here, and try adjusting the --p-chimera-method and --p-min-fold-parent-over-abundance options to see how they affect chimera removal. It's always possible that there are a large number of true PCR chimeras in your data. Another option you have is to use only your forward reads because the V4 region you amplified will be mostly covered by them.

Thanks @colinvwood. I tried the --p-min-fold-parent-over-abundance option and set it to 8, and only a negligible number of chimeras were filtered out (pass percentage >80% of the reads). My main worry now is whether I have somehow tricked DADA2 into allowing true chimeras to pass the filters (forgive my lack of knowledge in this regard; I am still learning and haven't been able to find good resources explaining this).
Another option I tried was truncating my sequences all the way to 220 (forward) and 160 (reverse), and I got a decent pass percentage (>50%) but again not a lot of chimeras being filtered out. With this approach my concern is losing the shorter sequences that might be biologically relevant!

Hello @lcbio,

I tried the --p-min-fold-parent-over-abundance option and set it to 8, and only a negligible number of chimeras were filtered out (pass percentage >80% of the reads). My main worry now is whether I have somehow tricked DADA2 into allowing true chimeras to pass the filters

I wouldn't say you've tricked dada2 into allowing true chimeras to pass the filters, but rather that you've chosen a different tradeoff between sensitivity and specificity. The chimera detection algorithms aren't perfect. In fact, from glancing through some dada2 source code, this min-fold-parent-over-abundance parameter defaults to 2 everywhere I saw (I'm not sure why we wrap it with default=1).
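To see why raising the threshold flags fewer chimeras, here is a deliberately simplified sketch of de novo bimera detection. It is not DADA2's implementation (which also allows shifts and near-matches); it only asks whether a sequence is exactly the left part of one candidate parent joined to the right part of another, where a candidate parent must be at least `min_fold` times as abundant as the sequence. All names and the toy abundances are hypothetical.

```python
# Simplified sketch of de novo bimera detection with a min-fold parent over-abundance
# threshold. NOT DADA2's algorithm; toy sequences and counts are hypothetical.


def is_bimera(query: str, parents: list[str]) -> bool:
    """True if query == prefix of parent a + suffix of a different parent b."""
    n = len(query)
    for a in parents:
        for b in parents:
            if a == b or len(a) != n or len(b) != n:
                continue
            for cut in range(1, n):
                if query[:cut] == a[:cut] and query[cut:] == b[cut:]:
                    return True
    return False


def flag_chimeras(abundances: dict[str, int], min_fold: float) -> set[str]:
    """Flag sequences explainable by parents at least min_fold times as abundant."""
    flagged = set()
    for seq, count in abundances.items():
        parents = [p for p, c in abundances.items() if p != seq and c >= min_fold * count]
        if is_bimera(seq, parents):
            flagged.add(seq)
    return flagged


toy = {"AAAAAAAAAA": 100, "CCCCCCCCCC": 40, "AAAAACCCCC": 10}
for fold in (1, 2, 4, 8):
    print(fold, sorted(flag_chimeras(toy, fold)))
```

The putative chimera is flagged only while its less abundant parent still clears the threshold; at `min_fold = 8` that parent no longer qualifies, so nothing is flagged. This is the sensitivity/specificity tradeoff in miniature.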

You can always investigate this chimera issue more thoroughly by using other bespoke chimera-detection tools on the dada2 output sequences for a second opinion.

I am still learning and haven't been able to find good resources explaining this

If you search on this forum for "min fold parent over abundance" there are lots of interesting discussions.

Another option I tried was truncating my sequences all the way to 220 (forward) and 160 (reverse), and I got a decent pass percentage (>50%) but again not a lot of chimeras being filtered out.

Is this a typo? Did you mean to say that you "again got a lot of chimeras being filtered out"? If not, this is good news, right?

With this approach my concern is losing the shorter sequences that might be biologically relevant!

I'm not sure I follow this. The shorter the truncation lengths, the more of the shorter sequences you retain, because of the point I mentioned above: reads shorter than the truncation length are discarded.
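It may also help to see that truncation does not shorten the final merged sequence: as long as the truncated reads still overlap, merging reconstructs the full amplicon. This is a minimal sketch, assuming error-free reads, a made-up random 253-nt amplicon, and exact-match merging; DADA2's own merging step is more involved, and `merge_pair` and `simulate_pair` are hypothetical helpers.

```python
# Sketch: merged length is set by the amplicon, not by the truncation lengths.
# ASSUMPTIONS: error-free reads, random 253-nt amplicon, exact-match merging.
import random
from typing import Optional

COMP = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    return seq.translate(COMP)[::-1]


def merge_pair(fwd: str, rev: str, min_overlap: int = 12) -> Optional[str]:
    """Merge a forward read with a reverse read (reported 5'->3' on the other strand)."""
    rev_rc = revcomp(rev)
    # Try the longest overlap first: the end of fwd must equal the start of rev_rc.
    for ov in range(min(len(fwd), len(rev_rc)), min_overlap - 1, -1):
        if fwd[-ov:] == rev_rc[:ov]:
            return fwd + rev_rc[ov:]
    return None  # not enough overlap to merge


rng = random.Random(0)
amplicon = "".join(rng.choice("ACGT") for _ in range(253))


def simulate_pair(trunc_f: int, trunc_r: int) -> Optional[str]:
    fwd = amplicon[:trunc_f]                # forward read, truncated
    rev = revcomp(amplicon)[:trunc_r]       # reverse read, truncated
    return merge_pair(fwd, rev)


for trunc_f, trunc_r in [(250, 200), (220, 160), (150, 100)]:
    merged = simulate_pair(trunc_f, trunc_r)
    print(trunc_f, trunc_r, None if merged is None else len(merged))
```

Both overlapping settings yield the full 253-nt sequence; only the pair without enough overlap fails to merge.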
