DADA2 filtered out half of my 16S sequences

Manuss_Ponce · September 25, 2018, 11:42am

Hello,

I'm looking for some advice regarding how DADA2 filtered my 16S data (attached files).
I used the following DADA2 configuration, after taking a look at demux-paired-summary.qzv:

qiime dada2 denoise-paired \
  --i-demultiplexed-seqs UPSTREAM/paired-end-demux.qza \
  --p-trunc-len-f 280 \
  --p-trunc-len-r 223 \
  --p-trim-left-f 19 \
  --p-trim-left-r 22 \
  --p-n-threads 0 \
  --o-table resultados-dada2/table.qza \
  --o-representative-sequences resultados-dada2/rep-seqs.qza \
  --o-denoising-stats resultados-dada2/stats-dada2.qza

I see that my sequence count dropped dramatically (from 4M to 2M) after the filtering process. I tried again with --p-trunc-len-r set to 200, but the results were worse (due to merging issues, as expected).
I don't understand why DADA2 filtered out so much, and I can't find what's wrong with my data.

I would appreciate some help! Thanks in advance

demux-paired-summary.qzv (290.5 KB)
table.qzv (389.8 KB)
stats-dada2.qzv (1.2 MB)
rep-seqs.qzv (535.4 KB)

ebolyen · September 27, 2018, 10:35pm

Hey @Manuss_Ponce,

Thanks for the files! It looks like your reads do not have a very consistent length. At your current truncation parameters, only about 50% of the random subsample used by the demux summary is long enough to be kept, so it's not surprising that we see similar results for the full dataset in the stats-dada2 table.
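
To make the length effect concrete: DADA2's filter discards any read shorter than the truncation length, so the share of reads at or above the truncation length limits how many can survive. Below is a minimal Python sketch of that check. The read lengths are made-up illustration values (not taken from the attached files), and `fraction_retained` is a hypothetical helper, not part of QIIME 2 or DADA2.

```python
def fraction_retained(read_lengths, trunc_len):
    """Fraction of reads that are at least trunc_len long.

    Reads shorter than the truncation length are dropped by the filter,
    so this is an upper bound on what survives truncation alone.
    """
    if not read_lengths:
        return 0.0
    kept = sum(1 for n in read_lengths if n >= trunc_len)
    return kept / len(read_lengths)


# Hypothetical forward-read lengths after upstream quality trimming.
example_lengths = [301, 301, 280, 279, 250, 301, 240, 290, 200, 265]

for trunc in (280, 250, 200):
    print(trunc, fraction_retained(example_lengths, trunc))
```

With paired-end data both reads of a pair have to pass, so the real retention can be lower than either read's fraction on its own.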

What kind of sequencing platform did you use for this, what is your primer pair, and what (if any) processing was done to the reads prior to QIIME 2 import?

Manuss_Ponce · October 1, 2018, 7:06am

Hi, thanks for the answer! Our sequencing department used a MiSeq platform (Nextera library protocol). The primers target the V3-V4 region of 16S (expected amplicon length after merging pairs: 460 bp). Prior to QIIME 2, the only processing I'm aware of is the use of Prinseq lite to demultiplex the sequences.
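
As a rough check on the merging issue, the expected overlap between the truncated reads can be estimated from the numbers in this thread. This Python sketch assumes the ~460 bp figure is measured from the untrimmed read starts and ignores natural length variation in V3-V4. `expected_overlap` is a hypothetical helper, and the 12 nt threshold is assumed from the default `minOverlap` of DADA2's `mergePairs`; a given QIIME 2 release may enforce something different.

```python
def expected_overlap(trunc_len_f, trunc_len_r, amplicon_len):
    """Bases shared by forward and reverse reads after truncation.

    Both truncation positions and the amplicon length are measured from
    the untrimmed read starts, so 5' trimming (trim-left) cancels out.
    """
    return trunc_len_f + trunc_len_r - amplicon_len


AMPLICON_LEN = 460  # expected V3-V4 length quoted in the thread
MIN_OVERLAP = 12    # assumed threshold: mergePairs default minOverlap

for trunc_r in (223, 200):
    overlap = expected_overlap(280, trunc_r, AMPLICON_LEN)
    print(trunc_r, overlap, overlap >= MIN_OVERLAP)
```

Under these assumptions, truncating the reverse reads at 200 leaves only about 20 bases of overlap, which gives little margin for amplicons longer than 460 bp and would be consistent with the merging losses reported above.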

ebolyen · October 5, 2018, 6:45pm

Hey @Manuss_Ponce,

That all sounds great!

I suspect Prinseq lite is trimming your reads based on quality. Is there a way to use it only for demultiplexing?

Reads that have only been demultiplexed, without quality trimming, are the kind of input that works best with DADA2, since it can use the quality information (up to a point) to correct the sequences instead of discarding the base calls (and then you should have no problem during the merge step).
