DADA2: filtered reads low

For my V4-V6 amplicon of about 549 bp, I ran the denoising step on my demux.qza with the following command:
qiime dada2 denoise-paired \
--i-demultiplexed-seqs demuxV-trimmed.qza \
--o-table tableV.qza \
--o-representative-sequences rep-seqsV.qza \
--o-denoising-stats Dada2-statsV.qza \
--p-chimera-method consensus \
--p-trim-left-f 5 \
--p-trim-left-r 12 \
--p-trunc-len-f 280 \
--p-trunc-len-r 280 \
--p-n-threads 16 \
--verbose

demuxV-trimmed.qzv (295.1 KB)

After this I have been losing a substantial number of reads at the filtering step. I have tried readjusting the trunc-len values, and I have already followed the instructions from past DADA2 filtering topics, but the problem still persists.

While running the above commands, I received the following message:

  1) Filtering
     The filter removed all reads: /local/tmp/tmpedyg6x93/filt_f/VE6F_79_L001_R1_001.fastq.gz and /local/tmp/tmpedyg6x93/filt_r/VE6F_174_L001_R2_001.fastq.gz not written.
     Some input samples had no reads pass the filter.

(Just so you know) Since I also seem to lose reads at the chimera-removal stage, I tried adjusting the --p-min-fold-parent-over-abundance parameter, but I still obtain the same number of filtered reads. Could you please suggest anything? Thanks in advance.

Hi! You are probably truncating too strictly and no longer have an overlapping region, so try running it with longer reads (increase 280 to 290).
Outside of QIIME 2, in the DADA2 R package, you can also decrease the minimum overlap required for merging.
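To make the overlap point concrete, here is a quick sketch of the arithmetic (the 549 bp amplicon length is taken from the original post; DADA2 requires at least 12 nt of overlap to merge read pairs). Note that trim-left removes bases at the outer ends of the amplicon, so it does not eat into the overlap region in the middle:

```shell
# Sketch: estimate the merge overlap left by the chosen truncation lengths.
AMPLICON=549     # approximate V4-V6 amplicon length from the post
TRUNC_F=280      # --p-trunc-len-f
TRUNC_R=280      # --p-trunc-len-r
OVERLAP=$((TRUNC_F + TRUNC_R - AMPLICON))
echo "overlap: ${OVERLAP} bp"   # prints: overlap: 11 bp (below the 12 nt minimum)
```

Raising both truncation lengths to 290 gives 290 + 290 - 549 = 31 bp, comfortably above the minimum — which is why the suggestion above helps, provided the read quality out to position 290 is good enough to survive filtering.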

Hi @microme,
Welcome to the forum, and thanks for providing us with your .qzv file. The big loss in your data comes right at the initial filtering step, not at the paired-end merging. Looking at your quality scores, they start to drop well before the 280 bp position you've selected, which means many of your reads are discarded before denoising even occurs. Usually in this scenario we would say to reduce your trunc parameters, but since this is a very long region, that would unfortunately mean your reads could no longer merge. My advice would be to discard the reverse reads and just use your forward reads.

3 Likes

Thank you for the suggestion. I used the forward reads for all my samples (more than 100).
The rep-seqs.qza file that was generated didn't have any link to the NCBI database.
Since I was unable to figure out the problem, and my sample sequence quality is variable, I selected a subset of samples, which also reduced the waiting time.
So I used the following commands on the forward reads of the few samples:
qiime dada2 denoise-single \
--i-demultiplexed-seqs demuxSE.qza \
--o-table tableSE.qza \
--o-representative-sequences rep-seqsSE.qza \
--o-denoising-stats Dada2SE.qza \
--p-chimera-method consensus \
--p-trim-left 0 \
--p-trunc-len 180 \
--p-n-threads 16 \
--verbose


Although I think the reads don't look bad, the rep-seqs still do not have the direct link to the NCBI database, which suggests there is an issue in one of my steps.
So this time I used the trunc-q parameter with a score of 20:
qiime dada2 denoise-single \
--i-demultiplexed-seqs demuxSE.qza \
--o-table table22.qza \
--o-representative-sequences rep-seqs22.qza \
--o-denoising-stats Dada.qza \
--p-trunc-len 0 \
--p-trunc-q 20 \
--p-trim-left 5 \
--p-chimera-method consensus \
--verbose

This is what happens:

R version 3.5.1 (2018-07-02)
Loading required package: Rcpp
DADA2: 1.10.0 / Rcpp: 1.0.1 / RcppParallel: 4.4.2
1) Filtering ......
2) Learning Error Rates
50250848 total bases in 369342 reads from 6 samples will be used for learning the error rates.
3) Denoise samples ......
4) Remove chimeras (method = consensus)
5) Report read numbers through the pipeline
6) Write output

I still got the same DADA2 stats and rep-seqs. I have tried changing the parameters, but the result doesn't change.
I also tried the paired-end reads of these few samples using the commands below:
qiime dada2 denoise-paired \
--i-demultiplexed-seqs demuxtrimmed.qza \
--o-table table.qza \
--o-representative-sequences rep-seqs.qza \
--o-denoising-stats Dada2.qza \
--p-chimera-method consensus \
--p-trim-left-f 6 \
--p-trim-left-r 18 \
--p-trunc-len-f 280 \
--p-trunc-len-r 280 \
--p-n-threads 16 \
--verbose

And this was happening:
R version 3.5.1 (2018-07-02)
Loading required package: Rcpp
DADA2: 1.10.0 / Rcpp: 1.0.1 / RcppParallel: 4.4.2
1) Filtering ........
2) Learning Error Rates
11596776 total bases in 42324 reads from 8 samples will be used for learning the error rates.
11088888 total bases in 42324 reads from 8 samples will be used for learning the error rates.
3) Denoise remaining samples ........
4) Remove chimeras (method = consensus)
6) Write output

And got this as my DADA2 stats and rep-seqs:


I also modified the commands above, changing the trunc-len to 230 and even 280, and I still get the same problem. Could you point out what is going wrong here, and what I can do about it?

Sorry, I am attaching here the demux summary of my single-end reads.
demuxSE.qzv (293.7 KB)

Hi @microme,

Hmm, this is a bit odd, could you share the rep-seqs.qza file with us please?

Aside from that, your DADA2 stats look good; on their own there doesn't seem to be any reason for concern.

In the second scenario, you don't truncate manually but instead rely on the q=20 trimming, meaning each read is cut at the first instance of q<20. You end up with more reads; however, as you can see in your rep-seqs file, you also end up with some very short reads which are not very useful and will need to be discarded anyway. If you were to set a minimum-length parameter, I bet you would end up with even more similar results between the two runs. Generally speaking, though, the default method of using maximum expected error (maxEE) is superior to quality-score-based trimming, so I would stick with your first run.
In your third scenario, we are right back at the beginning, where you are losing most of your reads in the initial filtering step because the quality in the tails is tanking and you are truncating much later.
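As a concrete follow-up to the minimum-length idea: one option is to drop the very short representative sequences after denoising with feature-table filter-seqs. This is only a sketch — the filenames follow the ones in this thread, and the length(Sequence)-based WHERE clause relies on filter-seqs evaluating --p-where against the sequences viewed as metadata (a SQLite expression), so please double-check it against your QIIME 2 version:

```shell
# Sketch (verify against your QIIME 2 version): keep only rep-seqs longer
# than 100 bp by filtering the sequences against themselves as metadata.
qiime feature-table filter-seqs \
  --i-data rep-seqs22.qza \
  --m-metadata-file rep-seqs22.qza \
  --p-where 'length(Sequence) > 100' \
  --o-filtered-data rep-seqs22-filtered.qza

# Then drop the corresponding features from the table so the two stay in sync.
qiime feature-table filter-features \
  --i-table table22.qza \
  --m-metadata-file rep-seqs22-filtered.qza \
  --o-filtered-table table22-filtered.qza
```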

1 Like

Thank you. Please find attached the .qza file. This file (single end reads) is the one where I had set a quality score.
rep-seqs22.qza (30.6 KB)

In case you need more, here is another file (single-end reads) for which I did not use a quality score.
rep-seqsSE.qza (170.9 KB)

1 Like

Hi @microme,
I had no problem converting either of those files into a rep-seqs.qzv with the hyperlinked visualization (e.g. rep-seqsSE.qzv (548.6 KB)). I'm guessing you were using qiime metadata tabulate, which does still visualize your rep-seqs table but doesn't have the hyperlink feature. For that you'll want to use qiime feature-table tabulate-seqs instead. This is what the Moving Pictures tutorial uses too, if you are following that.
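For completeness, a sketch of that command with the filenames used earlier in this thread:

```shell
# tabulate-seqs renders each representative sequence with a link that
# submits it to NCBI BLAST; metadata tabulate does not add those links.
qiime feature-table tabulate-seqs \
  --i-data rep-seqsSE.qza \
  --o-visualization rep-seqsSE.qzv
```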

1 Like

This topic was automatically closed 31 days after the last reply. New replies are no longer allowed.