Losing reads after DADA2 filtering

Dear forum members,

I need your kind help to figure out why I lose almost 90% of reads in 10 samples out of 30 at the DADA2 filtering step.


trimmed-demux.qzv (319.9 KB)

I did the calculations as follows:

Bacterial 16S amplicon V3/V4 (341_F/805_R)

805 - 341 = 464 (length of amplicon)
trunc-len-f + trunc-len-r - length-of-amplicon = overlap
280 + 200 - 464 = overlap
overlap = 16
16 bp gap!! :frowning:

Please see the attachment for reference; it shows the quality of my forward and reverse reads.

Many thanks in advance.


805 - 341 = 464 (length of amplicon)
trunc-len-f + trunc-len-r - length-of-amplicon = overlap
280 + 200 - 464 = overlap
overlap = 16

That's correct! The reads are expected to overlap by 16 base pairs.
(There is no gap, which is good!)

It looks like many of your reads are merging and there are thousands of reads in most samples, so this may be okay!

If I had data like this, I would continue with analysis and return to this step if I discovered issues later.
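For context, DADA2's paired-end merging requires at least 12 bp of overlap by default, so 16 bp clears that bar, though with little margin. If it helps, a denoise-paired call matching your truncation lengths would look roughly like this (the output names are just placeholders):

# Denoise paired-end reads with the truncation lengths from your calculation
qiime dada2 denoise-paired \
  --i-demultiplexed-seqs trimmed-demux.qza \
  --p-trunc-len-f 280 \
  --p-trunc-len-r 200 \
  --o-table table.qza \
  --o-representative-sequences rep-seqs.qza \
  --o-denoising-stats denoising-stats.qza

The denoising stats report, per sample, how many reads survive filtering, denoising, and merging, which is the first place to look when reads disappear.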

@colinbrislawn thank you so much for your quick response!

Unfortunately, the 8 samples I lose at the DADA2 filtering step are important for my study. They all have over 30k reads, but fewer than 1k remain after the filtering step.

Is there any way I could diagnose why I lose them, even though I should have enough overlap for merging?

Yes. Review how the DADA2 algorithm works.

Because filtering is the very first step, it's pretty easy to change DADA2 filtering settings and see how many reads now pass the filter!
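As a sketch (the values here are examples to test, not recommendations; older QIIME 2 releases expose a single --p-max-ee instead of the -f/-r pair), you could relax the expected-error thresholds and compare the stats:

# Rerun denoising with looser expected-error filtering
qiime dada2 denoise-paired \
  --i-demultiplexed-seqs trimmed-demux.qza \
  --p-trunc-len-f 280 \
  --p-trunc-len-r 200 \
  --p-max-ee-f 5 \
  --p-max-ee-r 7 \
  --o-table table-relaxed.qza \
  --o-representative-sequences rep-seqs-relaxed.qza \
  --o-denoising-stats stats-relaxed.qza

# View the per-sample denoising stats
qiime metadata tabulate \
  --m-input-file stats-relaxed.qza \
  --o-visualization stats-relaxed.qzv

Comparing the filtered and merged columns will show whether reads are lost at the quality filter or at the merge step.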

Unfortunately, the 8 samples I lose at the DADA2 filtering step are important for my study.

Are these samples related in some way?
Is there any biological reason they may be different from the other samples, like low biomass?

No biological reason; the failed samples are random, from different treatments, while their replicates are OK. But with those failing, I end up with only one replicate per treatment!

I tried raising max-ee to 7 with DADA2 but got the same results. Does this mean it is not a sequencing issue? If so, then what?

Using only the forward reads runs OK, but then my taxonomic resolution suffers, right?

I tried using Deblur instead of DADA2, following this tutorial. Joining seems to work, and all reads are retained after joining, but I eventually lose about 90% after denoising with Deblur.

deblur-stats_l_427.qzv (214.7 KB)
deblur-table_427.qzv (432.0 KB)
merged-vsearch-seqs.qzv (303.9 KB)
filter-stats.qzv (1.2 MB)

Here are the steps I followed:

# Merge paired-end reads with vsearch
qiime vsearch merge-pairs \
  --i-demultiplexed-seqs trimmed-demux.qza \
  --o-merged-sequences merged-vsearch-seqs.qza \
  --o-unmerged-sequences unmerged-vsearch-seqs.qza

# Quality-filter the merged reads by Q-score
qiime quality-filter q-score \
  --i-demux merged-vsearch-seqs.qza \
  --o-filtered-sequences filtered-seqs.qza \
  --o-filter-stats filter-stats.qza

# Denoise with Deblur, trimming all reads to 427 bp
qiime deblur denoise-16S \
  --i-demultiplexed-seqs filtered-seqs.qza \
  --p-trim-length 427 \
  --o-table deblur-table_427.qza \
  --o-representative-sequences deblur-rep-seqs_427.qza \
  --o-stats deblur-stats_l_427.qza

However, the Deblur stats .qzv looks corrupted. It was generated with the command below; what is wrong here, please?

qiime deblur visualize-stats \
  --i-deblur-stats deblur-stats_l_427.qza \
  --o-visualization deblur-stats_l_427.qzv

My main question with this new approach: am I choosing the wrong trim length for Deblur?

When I tried --p-trim-length 385, since that was the minimum read length in the subsampled summary, I got much better results, but I am still not sure that is the best choice to move forward with! (If I understand correctly, Deblur discards reads shorter than --p-trim-length, so trimming at 427 would drop every merged read shorter than that, which might explain the loss.)
deblur-table_385.qzv (465.2 KB)

Extremely grateful for your assistance :smiling_face:

Okay, good to know!

Yeah, I don't see the data columns either. Perhaps you can try to run that command again and see if the rerun fixes the file?
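If the rerun doesn't help, a generic way to peek at the underlying stats (this is plain QIIME 2, nothing Deblur-specific) is to unpack the artifact:

# Unpack the artifact to inspect its raw contents
qiime tools extract \
  --input-path deblur-stats_l_427.qza \
  --output-path extracted-stats

The data directory inside should contain the raw stats file, so you can check whether the columns are there at all.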

I'm not very familiar with the deblur plugin, so I will not offer any advice here.

I think trying DADA2 denoise-single for just your forward reads is a good idea.

The taxonomic resolution may be reduced, but it's not 'bad'. The first amplicon studies using the Illumina MiSeq used <100 bp reads, and they got published. Your forward reads alone are more than twice that.
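If you try the forward-read route, a minimal sketch would look like this (the truncation length is a placeholder to pick from your quality plot; as far as I know, denoise-single will use just the forward reads of a paired-end artifact):

# Denoise forward reads only
qiime dada2 denoise-single \
  --i-demultiplexed-seqs trimmed-demux.qza \
  --p-trunc-len 280 \
  --o-table table-single.qza \
  --o-representative-sequences rep-seqs-single.qza \
  --o-denoising-stats stats-single.qza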

Here is one last clue that may be related to your problem:

It's by benjjneb, the developer of DADA2, and it relates to the V3-V4 primers you are using:

If these 8 samples happen to have lots of microbes with the longer ~460 bp amplicon, then they need that extra length to merge.
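To put numbers on it, with your truncation lengths a 470 bp amplicon leaves

280 + 200 - 470 = 10

base pairs of overlap, which is below the 12 bp minimum DADA2 needs to merge by default, so reads from those taxa would be discarded at the merge step.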