DADA2: %filtered and %merged very low

Hello all,

I know this issue has been posted about several times, and I have gone through many forum posts describing the same problem, but I was hoping someone might be able to help me out. Any help would be very much appreciated.

Basically, my problem is a low %filtered and %merged after running DADA2. I'm sequencing the V3-V4 region, so as I understand it, my forward and reverse reads together need to span at least (805 - 341) + 20 = 484 bp.
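A minimal sketch of that overlap arithmetic, assuming the reads start at the primer sites so the sequenced fragment (primers included) is roughly 805 - 341 = 464 bp; real V3-V4 lengths vary between taxa. The helper name `check_overlap` is hypothetical. It deliberately ignores `--p-trim-left-f`/`--p-trim-left-r`: those remove bases from the 5' start of each read, which lies outside the region where the pairs overlap, so under this assumption only the truncation lengths determine the overlap. The default minimum of 12 nt matches the default `minOverlap` of DADA2's `mergePairs()`; passing 20 reproduces the margin used in the 484 bp figure.

```shell
#!/usr/bin/env bash
# Sketch: expected read-pair overlap after DADA2 truncation.
# Assumes both reads start at the primer sites, so the sequenced fragment
# (primers included) is about 805 - 341 = 464 bp; real V3-V4 lengths vary.
# Trim-left values are ignored on purpose: they remove bases from the 5'
# start of each read, outside the region where the pairs overlap.
check_overlap() {
  local trunc_f=$1 trunc_r=$2 amplicon=$3 min_overlap=${4:-12}
  local overlap=$(( trunc_f + trunc_r - amplicon ))
  if (( overlap >= min_overlap )); then
    echo "overlap=${overlap} OK"
  else
    echo "overlap=${overlap} TOO SHORT (need >= ${min_overlap})"
  fi
}

check_overlap 290 200 464      # overlap=26 OK
check_overlap 290 200 464 20   # overlap=26 OK (equivalent to needing 484 bp in total)
check_overlap 250 200 464      # overlap=-14 TOO SHORT (need >= 12)
```

If the primers were already removed before import, the fragment length to pass in is shorter than 464, so adjust the third argument accordingly.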

So to begin with, the quality scores of my reverse reads are not great (see the attached Baywide-full-demux-paired-end.qzv).

But I decided to plow on with paired-end denoising because, as someone pointed out, sequencing is expensive, so it's worth trying to use everything I have first. These were my trimming and truncation parameters:

qiime dada2 denoise-paired \
  --i-demultiplexed-seqs Baywide_microbiome/all_samples/Baywide-full-demux-paired-end.qza \
  --p-trim-left-f 18 \
  --p-trim-left-r 18 \
  --p-trunc-len-f 290 \
  --p-trunc-len-r 200 \
  --p-n-threads 48 \
  --o-representative-sequences Baywide_microbiome/all_samples/trim290f-200r/full-rep-seqs-dada2-trim290f-200r-18left.qza \
  --o-table Baywide_microbiome/all_samples/trim290f-200r/full-table-dada2-trim290f-200r-18left.qza \
  --o-denoising-stats Baywide_microbiome/all_samples/trim290f-200r/full-stats-dada2-trim290f-200r-18left.qza

I realise this fails the overlap requirement, since (290 - 18) + (200 - 18) = 454 < 484, which might explain my low %merged. But I don't think I can keep more of the reverse reads (i.e. raise the reverse truncation length) because of their poor quality. The full denoising stats are in the attached full-stats-dada2-trim290f-200r-18left.qzv.

I've sorted them from lowest to highest, and my %filtered ranges from 0% to 71.27%. I wonder about the first four samples, which had very little input to begin with. But even discounting them, my %filtered only ranges from 52.58% to 71.27%.
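The "discount the low-input samples, then report the range" step can be scripted once the stats are exported to a TSV. A minimal sketch, assuming `qiime tools export` writes the denoising stats as `stats.tsv` with a header row containing the columns `input` and `percentage of input passed filter`, followed by a `#q2:types` row; check the header of your own file before relying on it. The function name `summarise_filtered` and the 1000-read cut-off are hypothetical choices.

```shell
#!/usr/bin/env bash
# Sketch: min/max "% passed filter" across samples, skipping low-input ones.
# Assumes a tab-separated table whose header row names the columns
# "input" and "percentage of input passed filter"; rows starting with "#"
# (e.g. the "#q2:types" row) are skipped.
summarise_filtered() {
  local tsv=$1 min_input=${2:-1000}
  awk -F'\t' -v min="$min_input" '
    NR == 1 {
      # Locate the columns by name rather than by position.
      for (i = 1; i <= NF; i++) {
        if ($i == "input") in_col = i
        if ($i == "percentage of input passed filter") pct_col = i
      }
      next
    }
    /^#/ { next }
    $in_col + 0 >= min {
      p = $pct_col + 0
      if (n == 0 || p < lo) lo = p
      if (n == 0 || p > hi) hi = p
      n++
    }
    END { printf "samples=%d min=%.2f max=%.2f\n", n, lo, hi }
  ' "$tsv"
}

# Example usage (paths from the post; export folder name is hypothetical):
# qiime tools export \
#   --input-path Baywide_microbiome/all_samples/trim290f-200r/full-stats-dada2-trim290f-200r-18left.qza \
#   --output-path stats_export
# summarise_filtered stats_export/stats.tsv 1000
```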

So I thought, OK fine, let's just try the forward reads then. I used basically the same parameters, since the forward reads look pretty good.

qiime dada2 denoise-single \
  --i-demultiplexed-seqs Baywide_microbiome/all_samples/only_forward/Baywide-full-demux-single-end.qza \
  --p-trim-left 18 \
  --p-trunc-len 290 \
  --p-n-threads 48 \
  --o-representative-sequences Baywide_microbiome/all_samples/only_forward/rep-seqs-dada2-trim18-290.qza \
  --o-table Baywide_microbiome/all_samples/only_forward/table-dada2-trim18-290.qza \
  --o-denoising-stats Baywide_microbiome/all_samples/only_forward/stats-dada2-trim18-290.qza

However, that didn't really solve my %filtered problem: it still ranges from 19.17% to 77.75%. I guess the low end has improved, but why hasn't the high end? :persevere:
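One thing that may help explain why dropping to forward-only reads barely moved %filtered: DADA2 discards reads whose expected errors, EE = sum of 10^(-Q/10) over the retained bases, exceed the max-EE threshold (`--p-max-ee` for `denoise-single`; `qiime dada2 denoise-single --help` lists the default for your release). Reads shorter than the truncation length, for example after truncation at a low-quality base via `--p-trunc-q`, are also discarded. Because the EE sum runs over every base kept up to `--p-trunc-len`, a poor tail can push an otherwise good read over the limit. The sketch below uses made-up quality values (250 bases at Q35 followed by 40 at Q10), not data from these runs; `expected_errors` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Sketch of DADA2's expected-error (EE) statistic for one read:
#   EE = sum over retained bases of 10^(-Q/10)
# Input: whitespace-separated Phred scores on stdin, one read per line.
expected_errors() {
  awk '{ ee = 0; for (i = 1; i <= NF; i++) ee += 10 ^ (-$i / 10); printf "%.3f\n", ee }'
}

# Hypothetical read: 250 bases at Q35 followed by a 40-base tail at Q10.
good=$(printf '35 %.0s' $(seq 250))
bad_tail=$(printf '10 %.0s' $(seq 40))

echo "$good $bad_tail" | expected_errors   # keep all 290 bases -> 4.079
echo "$good" | expected_errors             # truncate at 250    -> 0.079
```

In this made-up case, the full-length read would fail a max-EE of 2 while the read truncated at 250 passes easily, which is why a somewhat shorter truncation can raise %filtered.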

I am pretty new to all this and have also been consulting a more experienced user, but I figured extra eyes would definitely help.

I am running QIIME 2 2021.2.0 on Linux (Ubuntu).

Thanks everyone!
Attached: Baywide-full-demux-paired-end.qzv, full-stats-dada2-trim290f-200r-18left.qzv, Baywide-full-demux-single-end.qzv, stats-dada2-trim18-290.qzv

Hi @ymt89!

I think you can extend the reverse reads quite a bit; personally, I would try truncating them at 240.

I think it's safe to say those first 4 samples aren't recoverable, so that puts the effective range at 67%-77%, which is perfectly reasonable IMO and isn't a cause for concern.


Hello @thermokarst! Thanks so much for replying.

I've tried your suggested parameters. I'm guessing you were happy to keep everything else the same and just truncate the reverse reads at 240? Hopefully, because that's what I did, haha. Unfortunately it hasn't improved my DADA2 stats. If, like you said, we ignore the first 4 samples, my %filtered still only ranges from 19.42% to 59.92% and my %merged from 3.73% to 33.11% :weary:
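Because %merged is reported as a percentage of the input, it already includes every read lost at the filtering step. Dividing merged by denoised isolates the merging step itself: a high ratio would suggest most of the loss happens during filtering rather than because read pairs fail to overlap. A minimal sketch, assuming the exported paired-end stats table has columns named `denoised` and `merged` in its header row (verify against your own file); `merge_rates` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Sketch: per-sample merge rate = merged / denoised, from an exported
# paired-end DADA2 stats table. Assumes tab-separated columns named
# "denoised" and "merged" in the header row; "#" rows are skipped.
merge_rates() {
  awk -F'\t' '
    NR == 1 {
      # Locate the columns by name rather than by position.
      for (i = 1; i <= NF; i++) {
        if ($i == "denoised") d_col = i
        if ($i == "merged") m_col = i
      }
      next
    }
    /^#/ { next }
    {
      d = $d_col + 0
      if (d > 0) printf "%s\t%.1f%%\n", $1, 100 * $m_col / d
      else printf "%s\tNA\n", $1
    }
  ' "$1"
}

# Example usage (export folder name is hypothetical):
# merge_rates stats_export/stats.tsv
```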

Attached: full-stats-dada2-trim290f-240r-18left.qzv

@ymt89, you'll have to play around with this a bit to figure out the best path forward. If you haven't already read the DADA2 paper and docs, I highly recommend starting there:

http://benjjneb.github.io/dada2/
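One way to "play around" systematically is to rerun the same command over several reverse truncation lengths and compare the resulting stats side by side. A minimal sketch that only prints the commands: the input path and the other parameters are copied from the original `denoise-paired` call, while the candidate values, the `print_sweep` name, and the per-run output file names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: print one denoise-paired command per candidate reverse
# truncation length, keeping every other parameter from the original post.
# Candidate values and output naming are hypothetical. Nothing is
# executed here; review the output, then pipe it to bash to run it.
base=Baywide_microbiome/all_samples

print_sweep() {
  local trunc_r out
  for trunc_r in 180 200 220 240; do
    out="${base}/trim290f-${trunc_r}r"
    # echo joins its arguments with spaces, giving one command per line.
    echo "mkdir -p ${out} &&" \
      "qiime dada2 denoise-paired" \
      "--i-demultiplexed-seqs ${base}/Baywide-full-demux-paired-end.qza" \
      "--p-trim-left-f 18 --p-trim-left-r 18" \
      "--p-trunc-len-f 290 --p-trunc-len-r ${trunc_r}" \
      "--p-n-threads 48" \
      "--o-representative-sequences ${out}/rep-seqs.qza" \
      "--o-table ${out}/table.qza" \
      "--o-denoising-stats ${out}/stats.qza"
  done
}

print_sweep
```

Once the printed commands look right, `print_sweep | bash` runs them one after another, and each run writes its stats to its own folder for comparison.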