Dear all,
I did a little research on this topic and hope I present it clearly enough. I tested different options based on what others suggested in similar topics. Here it goes (sorry for the long post):
- dada2, PE reads, default quality truncation (--p-trunc-q = 2)
qiime dada2 denoise-paired \
  --i-demultiplexed-seqs FastQ-LB18_31-demux.qza \
  --p-trim-left-f 8 \
  --p-trim-left-r 8 \
  --p-trunc-len-f 240 \
  --p-trunc-len-r 240 \
  --p-n-threads 0 \
  --p-chimera-method consensus \
  --o-representative-sequences rep-seqs-dada2-LB18_31-default.qza \
  --o-table table-dada2-LB18_31-default.qza \
  --o-denoising-stats stats-dada2_LB18_31-default.qza
Result: stats-dada2_LB18_31-default.qzv (1.2 MB)
- dada2, PE reads, no chimera removal (--p-chimera-method none), --p-trunc-q 20
qiime dada2 denoise-paired \
  --i-demultiplexed-seqs FastQ-LB18_31-demux.qza \
  --p-trunc-q 20 \
  --p-trunc-len-f 0 \
  --p-trunc-len-r 0 \
  --p-n-threads 0 \
  --p-chimera-method none \
  --o-representative-sequences rep-seqs-dada2-LB18_31-nochim.qza \
  --o-table table-dada2-LB18_31-nochim.qza \
  --o-denoising-stats stats-dada2_LB18_31-nochim.qza
Result: stats-dada2_LB18_31-nochim.qzv (1.2 MB)
- dada2, only forward reads
qiime dada2 denoise-single \
  --i-demultiplexed-seqs FastQ-LB18_31-demux-R1.qza \
  --p-trim-left 8 \
  --p-trunc-len 240 \
  --p-n-threads 0 \
  --p-chimera-method consensus \
  --o-representative-sequences rep-seqs-dada2-LB18_31-R1-noQS.qza \
  --o-table table-dada2-LB18_31-R1-noQS.qza \
  --o-denoising-stats stats-dada2_LB18_31-R1-noQS.qza
Result: stats-dada2_LB18_31-R1-noQS.qzv (1.2 MB)
I also ran the forward reads with other --p-trunc-q values. Chimera filtering stayed 'consensus' in all of these runs:
--p-trunc-q 20; stats-dada2_LB18_31-R1.qzv (1.2 MB)
--p-trunc-q 10; stats-dada2_LB18_31-R1-10.qzv (1.2 MB)
Additionally, to cross-check all of the above, I used another denoising method, deblur, on the forward reads only:
qiime deblur denoise-16S \
  --i-demultiplexed-seqs FastQ-LB18_31-demux-R1.qza \
  --p-trim-length 240 \
  --o-representative-sequences rep-seqs-deblur.qza \
  --o-table table-deblur.qza \
  --p-sample-stats \
  --o-stats deblur-stats.qza
Result: deblur-stats.qzv (198.3 KB)
What I noticed is:
- What @thermokarst and @jnesme mentioned may be one of the reasons: the --p-trunc-q value should probably be decreased to its default. Yet I always thought that 20 is a kind of threshold that I should check first. I still don't think it is the main reason for such a huge read loss.
- Another possibility, which I include here, is a problem with merging the reads, so I also ran the analysis on the forward reads only.
- deblur denoising showed that reads-raw and reads-derep differ a lot. Here ('click') it was said that if they are not similar, it may mean I have many singletons. Could you confirm this? All in all, the result is similar to the forward-read-only dada2 denoising with --p-trunc-q 20.
What is a reasonable % of reads that should pass filtering (if the sequencing is of good quality, of course)?
I calculated the % of reads that passed denoising for all runs. It looks like this:
The top chart shows the forward reads (R1) analysed with different parameters; the bottom chart shows the forward (R1) and reverse (R2) reads.
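For anyone who wants to recompute these percentages, here is a minimal sketch of how it could be done from the command line. It assumes the denoising stats have already been exported to a tab-separated file (for example with qiime tools export, which should give a stats.tsv) and that the table has a header row with columns named input and non-chimeric. Those column names can differ between QIIME 2 versions, so check the header first. The function name percent_retained is just something I made up for this example.

```shell
# Hypothetical helper: percentage of input reads that reach a given column
# of a tab-separated DADA2 stats table (header on the first line).
# Usage: percent_retained stats.tsv [final-column-name]
# ASSUMPTION: the columns are named "input" and (by default) "non-chimeric".
percent_retained() {
    awk -F'\t' -v final="${2:-non-chimeric}" '
        NR == 1 {
            # Find the columns by header name rather than by position.
            for (i = 1; i <= NF; i++) {
                if ($i == "input") in_col = i
                if ($i == final)   out_col = i
            }
            if (!in_col || !out_col) {
                print "required column not found" > "/dev/stderr"
                exit 1
            }
            next
        }
        /^#/ { next }   # skip a "#q2:types" directive line, if present
        $in_col > 0 {
            # sample id, then percent of input reads retained
            printf "%s\t%.1f\n", $1, 100 * $out_col / $in_col
        }
    ' "$1"
}
```

Calling percent_retained stats.tsv prints one line per sample with the sample id and the percentage of input reads left after chimera removal. Passing a second argument such as merged reports the percentage that survived up to that step instead, which may help to see where in the pipeline most reads are lost.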