dada2 output: low %merged reads

jessica.song · July 7, 2020, 2:30pm

Hi,

I am currently working on a dataset of 2 x 300bp (Illumina MiSeq V3) 16S reads and am at the denoising step of the analysis. I have, however, encountered some confusion and was hoping to get some advice on the matter.

Raw demultiplexed sequences (primers 341f/785r) were imported and denoised using dada2 with the following parameters:

all-demux.qzv (290.3 KB)

qiime dada2 denoise-paired \
--i-demultiplexed-seqs filepath/all-demux.qza \
--p-trunc-len-f 280 \
--p-trim-left-f 17 \
--p-trunc-len-r 240 \
--p-trim-left-r 21 \
--p-max-ee-f 2.0 \
--p-max-ee-r 5.0 \
--p-trunc-q 0 \
--p-chimera-method 'consensus' \
--p-min-fold-parent-over-abundance 4 \
--o-table filepath/table.qza \
--o-representative-sequences filepath/repseqs.qza \
--o-denoising-stats filepath/stats.qza \
--verbose

These were the parameters with which I obtained the most promising results. However, as you can see, the %merged reads and %chimeric are still on the high side (36 - 82% merged and 20 - 65% chimeric).

metadata-280-240.tsv (6.0 KB)

I have also tried it with p-trunc-q 4 as well as with p-min-fold-parent-over-abundance 8 but the results were similar if not 'worse'.

My feeling is that there is a problem at the merging step but, having already tried multiple truncation lengths, I am a little stumped as to how to further improve these findings. It might also be worth noting that I have attempted this with single reads, as previously suggested in other forum threads, and obtained very good results:

metadata-280-single.tsv (4.9 KB)

I would greatly appreciate any advice you can give me on how to improve the output or if there was a fundamental mistake I've made given that I am still very new to this.

Thank you!

Mehrbod_Estaki · July 9, 2020, 9:55pm

Hi @jessica.song,
Welcome to the forum! And thanks for providing all your files and a detailed explanation of what you’ve done and the commands you ran. Always makes life easier.

I think everything you have done so far is pretty sensible and I’m not sure how much more improvement we can get. As it is, it looks to me that you have pretty decent coverage on most of your samples to move forward. The poor samples, with ~30-40% final numbers, tend to be problematic samples anyway, with low starting numbers. Maybe these are control or low biomass samples?

One thing that may be worth trying, if you haven’t already, is to truncate more from the poor quality tails. This should slightly improve the # of reads passing the initial filter, and perhaps also improve the quality of the overlap region for merging.

So based on your primer set I calculate 444 bp amplicons (785 - 341), and you have 2 x 300 = 600 nt of read length, so we expect a 156 bp overlap. Taking into account a minimum of 12 nt for DADA2 merging plus an extra 20-30 bp for natural length variation in the V3-V4 region, that leaves us with roughly 114-124 bp to truncate safely. You are currently truncating a total of 80 (20 from the forward reads and 60 from the reverse), which means we can afford to truncate another ~30-40 or so, which may help improve the initial filtering step and merging.
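The overlap arithmetic can be checked with a few lines of shell. This is only a rough sketch: the variable names are made up for illustration, the 30 bp allowance is the conservative end of the 20-30 bp range, and it assumes the amplicon length is simply the difference between the primer positions.

```shell
# Overlap budget for paired-end truncation, using the numbers from this thread.
amplicon_len=$((785 - 341))   # 444 bp, from primer positions 341f/785r
read_len=300                  # 2 x 300 bp MiSeq V3 reads
min_overlap=12                # minimum overlap assumed for DADA2 merging
variation=30                  # assumed allowance for natural length variation

full_overlap=$((2 * read_len - amplicon_len))        # overlap with no truncation
budget=$((full_overlap - min_overlap - variation))   # bases that can be truncated in total

# overlap_after TRUNC_F TRUNC_R: overlap remaining after truncating both reads
overlap_after() {
  echo $(( $1 + $2 - amplicon_len ))
}

current=$(overlap_after 280 240)
proposed=$(overlap_after 270 220)

echo "full overlap: ${full_overlap} bp, truncation budget: ${budget} bp"
echo "overlap at 280/240: ${current} bp, at 270/220: ${proposed} bp"
```

Any pair of truncation lengths whose remaining overlap stays above min_overlap + variation (42 bp here) should still leave room for merging; trimming from the left (5') end does not change the overlap.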
So, give this a try and see how that goes:

Keep the trimming the same, as I assume those values are there to remove primers.
Truncate the forward reads at 270 and the reverse reads at 220.
Leave everything else at the defaults. You can keep --p-min-fold-parent-over-abundance 4 if you think DADA2 was discarding too many real reads as chimeras. I generally tend not to modify that unless I have reason to do so.
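Put together, the suggestions above might look something like the sketch below. It only assembles and prints the command (a dry run), so the flags can be checked before committing to a long DADA2 run. The filepath/... locations are the placeholders from the original post, and the -270-220 output names are hypothetical, chosen so the earlier results are not overwritten.

```shell
# Dry run: build the suggested denoise-paired call and print it for review.
# Everything not listed here is left at its default value.
set -- qiime dada2 denoise-paired \
  --i-demultiplexed-seqs filepath/all-demux.qza \
  --p-trim-left-f 17 \
  --p-trim-left-r 21 \
  --p-trunc-len-f 270 \
  --p-trunc-len-r 220 \
  --o-table filepath/table-270-220.qza \
  --o-representative-sequences filepath/repseqs-270-220.qza \
  --o-denoising-stats filepath/stats-270-220.qza \
  --verbose

cmd_str="$*"      # the assembled command as a single string
echo "$cmd_str"

# To actually run it inside an activated QIIME 2 environment, uncomment:
# "$@"
```

If you want to keep the relaxed chimera setting, add --p-min-fold-parent-over-abundance 4 to the list.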

If that doesn’t improve things it might be easier to just move forward with your existing run.

Good luck, keep us posted!
