I’ve been struggling with DADA2 for the past few weeks, and while I think I have a basic understanding of the program, I am unable to stop it from filtering out more than 50% of my reads.
I’m running paired-end 18S Illumina reads; the adapters and primers have been trimmed off, leaving 300 bp paired-end reads. The quality plots look as follows:
I’m using a subset of two samples out of nine to configure dada2 parameters.
I fed the raw reads into DADA2 without any pre-filtering and ran the following command:
qiime dada2 denoise-paired --i-demultiplexed-seqs paired-end-demux.qza --p-trunc-len-f 220 --p-trunc-len-r 160 --p-n-threads 4 --o-denoising-stats stats.qza --o-table dada-table.qza --o-representative-sequences rep-seqs.qza --verbose
The first sample had 60k reads and the second 70k reads. After filtering, ~45k reads were left in each sample, after denoising ~27k, and after merging only 9k. So I decided to try the forward reads only.
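To put those losses in proportion, here is a rough sketch of the arithmetic. It only uses the rounded counts quoted above (the "~" values are approximations, and I am assuming the same post-step counts for both samples, as stated); the names `INPUT_READS`, `STEP_COUNTS` and `retention` are just illustrative and are not part of QIIME 2 or DADA2.

```python
# Rounded read counts quoted in this post; "~" values are approximate.
INPUT_READS = {"sample_1": 60_000, "sample_2": 70_000}

# Approximate reads remaining after each step (roughly the same for both samples).
STEP_COUNTS = {"filtered": 45_000, "denoised": 27_000, "merged": 9_000}


def retention(input_reads, step_counts):
    """Return the percentage of input reads still present after each step."""
    return {
        step: round(100 * count / input_reads, 1)
        for step, count in step_counts.items()
    }


for sample, total in INPUT_READS.items():
    print(sample, retention(total, STEP_COUNTS))
```

With these rounded numbers, the merging step alone discards about two thirds of the denoised reads (27k down to 9k), which is why I looked at the forward reads on their own next.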
I ran the following denoise-single command:
qiime dada2 denoise-single --i-demultiplexed-seqs paired-end-demux.qza --p-trunc-len 220 --p-n-threads 4 --o-denoising-stats stats.qza --o-table dada-table.qza --o-representative-sequences rep-seqs.qza --verbose
This led to the following stats:
I’m still losing over 50% of my data during the DADA2 step. Denoising seems to be the biggest factor in the forward-read run, but I am unable to improve the numbers by tweaking the filtering and truncation parameters. I’m starting to feel the problem may be with the data itself, but I have no clue where it might lie. Any help with this issue is much appreciated. I have uploaded the files to my Dropbox; they can be downloaded with the following link: