I'm looking for some advice regarding how DADA2 filtered my 16S data (attached files).
I used the following DADA2 configuration, after taking a look at demux-paired-summary.qzv:
I see that my sequence count dropped dramatically (from 4M to 2M) after the filtering process. I tried again after setting the p-trunc-len-r parameter to 200, but I obtained worse results (due to merging issues, logically).
I don't understand why DADA2 filtered out so much. I couldn't find anything wrong with my data.
Thanks for the files! It looks like your reads are not a very consistent length. DADA2 discards any read shorter than the truncation length, and at your current trunc params only about 50% of the random subsample used by the demux summary is long enough to be kept, so it’s not surprising that we see similar results from the full set of data in the stats-dada2 table.
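To make that concrete, here is a minimal Python sketch of the check I'm describing, assuming you already have the forward and reverse read lengths as lists of integers. The function names and the example lengths are hypothetical and only for illustration; this is not part of QIIME 2 or DADA2, and it only models the length-based discard, not the quality filtering.

```python
def fraction_retained(read_lengths, trunc_len):
    """Fraction of reads that are at least trunc_len long.

    Reads shorter than the truncation length are discarded.
    A trunc_len of 0 is treated as "no truncation", so every read is kept.
    """
    if not read_lengths:
        return 0.0
    if trunc_len == 0:
        return 1.0
    kept = sum(1 for n in read_lengths if n >= trunc_len)
    return kept / len(read_lengths)


def paired_fraction_retained(fwd_lengths, rev_lengths, trunc_len_f, trunc_len_r):
    """Fraction of read pairs where BOTH mates survive truncation."""
    if len(fwd_lengths) != len(rev_lengths):
        raise ValueError("forward and reverse length lists must be paired")
    if not fwd_lengths:
        return 0.0
    kept = sum(
        1
        for f, r in zip(fwd_lengths, rev_lengths)
        if (trunc_len_f == 0 or f >= trunc_len_f)
        and (trunc_len_r == 0 or r >= trunc_len_r)
    )
    return kept / len(fwd_lengths)


# Made-up lengths, only to show the effect of inconsistent read lengths.
fwd = [300, 250, 180, 300]
rev = [300, 190, 250, 210]
print(fraction_retained(fwd, 240))                  # 0.75
print(paired_fraction_retained(fwd, rev, 240, 200)) # 0.5
```

With variable-length reads, raising either truncation length can only lower the retained fraction, which is why pre-trimmed input loses so many pairs before denoising even starts.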
What kind of sequencing platform did you use for this, what is your primer pair, and what (if any) processing was done to the reads prior to QIIME 2 import?
Hi, thanks for the answer! Our sequencing department used a MiSeq platform (Nextera library protocol). The primers target the V3-V4 region of 16S (expected amplicon length after merging pairs: 460 bp). Prior to QIIME 2, the only processing I am aware of is PRINSEQ lite, which was used to demultiplex the sequences.
I suspect PRINSEQ lite is trimming your reads based on quality. Is there a way to use it only for demultiplexing?
Demultiplexed but otherwise untrimmed reads are the kind of input that works best with DADA2, since it is able to use the quality information (up to a point) to correct the sequences instead of discarding the base calls (and then you should have no problem during the merge step).
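On the merge step, a rough back-of-the-envelope check may help when picking truncation lengths. This is only a sketch under assumptions: it takes the 460 bp amplicon length you mentioned at face value, it assumes a minimum overlap of 12 nt (which I believe is the DADA2 default, so please double-check for your version), and the forward truncation values below are hypothetical since I don't know what you used.

```python
def expected_overlap(trunc_len_f, trunc_len_r, amplicon_len):
    """Bases shared by the truncated forward and reverse reads.

    A negative value means the two reads do not reach each other at all.
    """
    return trunc_len_f + trunc_len_r - amplicon_len


def can_merge(trunc_len_f, trunc_len_r, amplicon_len, min_overlap=12):
    """True if the truncated reads overlap by at least min_overlap bases.

    min_overlap=12 is an assumption about the DADA2 default.
    """
    return expected_overlap(trunc_len_f, trunc_len_r, amplicon_len) >= min_overlap


AMPLICON_LEN = 460  # expected V3-V4 amplicon length quoted in this thread

# Hypothetical forward truncation of 270 with the reverse truncation of 200:
print(expected_overlap(270, 200, AMPLICON_LEN))  # 10
print(can_merge(270, 200, AMPLICON_LEN))         # False

# Hypothetical longer truncations that leave room to merge:
print(can_merge(280, 250, AMPLICON_LEN))         # True
```

Real amplicons vary in length, so in practice you would want some margin above the minimum rather than sitting right at it.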