Stuck on DADA2 denoising

demux.qzv (292.6 KB)
trimmed_sequences.qzv (296.8 KB)
Hello,
I have paired-end, multiplexed sequences with barcodes. Illumina adapters were ligated on after PCR, so some of the reads in the R1 file are reverse reads that contain no barcode, and some of the reads in the R2 file are forward reads that do contain a barcode.

Based on advice from this forum, I concatenated R1 + R2 into “forward” reads and R2 + R1 into “reverse” reads as follows:
cat Mary-RS-16s-72314_S1_L001_R1_001.fastq Mary-RS-16s-72314_S1_L001_R2_001.fastq > forward.fastq
cat Mary-RS-16s-72314_S1_L001_R2_001.fastq Mary-RS-16s-72314_S1_L001_R1_001.fastq > reverse.fastq
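
A quick sanity check on the concatenated files can be sketched as below. This is only an illustration on tiny made-up FASTQ files (the file names `R1.fastq` and `R2.fastq` and the reads are hypothetical), assuming standard 4-line FASTQ records and headers where the read ID is the first whitespace-delimited field. It confirms that both files hold the same number of records and that the read IDs line up in the same order.

```shell
set -eu

# Scratch directory with tiny made-up FASTQ files (hypothetical data).
tmp=$(mktemp -d)
printf '@read1 1:N:0\nACGT\n+\nIIII\n@read2 1:N:0\nTTGA\n+\nIIII\n' > "$tmp/R1.fastq"
printf '@read1 2:N:0\nGGCA\n+\nIIII\n@read2 2:N:0\nCCAT\n+\nIIII\n' > "$tmp/R2.fastq"

# Same concatenation pattern as in the post.
cat "$tmp/R1.fastq" "$tmp/R2.fastq" > "$tmp/forward.fastq"
cat "$tmp/R2.fastq" "$tmp/R1.fastq" > "$tmp/reverse.fastq"

# Count records (4 lines per record in standard FASTQ).
n_fwd=$(awk 'END { print NR / 4 }' "$tmp/forward.fastq")
n_rev=$(awk 'END { print NR / 4 }' "$tmp/reverse.fastq")

# Extract the read ID (first field of every header line).
awk 'NR % 4 == 1 { print $1 }' "$tmp/forward.fastq" > "$tmp/fwd.ids"
awk 'NR % 4 == 1 { print $1 }' "$tmp/reverse.fastq" > "$tmp/rev.ids"

# cmp -s exits 0 only if the two ID lists are identical and in the same order.
if cmp -s "$tmp/fwd.ids" "$tmp/rev.ids"; then
  ids_match=yes
else
  ids_match=no
fi

echo "forward records: $n_fwd, reverse records: $n_rev, IDs in sync: $ids_match"
```

On real data, the same `awk` and `cmp` lines can be pointed at `forward.fastq` and `reverse.fastq` directly.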

I then imported and demultiplexed these data using the following commands:

qiime tools import --type MultiplexedPairedEndBarcodeInSequence --input-path ~/Riley/AXRS/Fastq --output-path ~/Riley/AXRS/Fastq/multiplexed-seqs.qza

qiime cutadapt demux-paired \
  --i-seqs ~/Riley/AXRS/Fastq/multiplexed-seqs.qza \
  --m-forward-barcodes-file ~/Riley/AXRS/RS_mappingfile_SCFAs_completed_study_only_w_responders.tsv \
  --m-forward-barcodes-column BarcodeSequence \
  --p-error-rate 0 \
  --o-per-sample-sequences ~/Riley/AXRS/Demux-0/demux.qza \
  --output-dir ~/Riley/AXRS/output-dir-0

I tried running DADA2 denoising, but some of the sequences in the rep-seqs file generated by this step still seemed to contain barcodes and primer sequences. I therefore took the output of the demux-paired command and used the cutadapt trim-paired command to find and remove the adapters:

qiime cutadapt trim-paired \
  --i-demultiplexed-sequences ~/Riley/AXRS/Demux-0/demux.qza \
  --p-front-f GTGTGCCAGCMGCCGCGGTAA \
  --p-error-rate 0 \
  --output-dir ~/Riley/AXRS/Demux-0/CutAdapt

I looked at the raw output and the trimming appeared to be successful (I could no longer find the adapters in the sequences).
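
A check of this kind can be sketched as follows. It is a minimal illustration on a made-up FASTQ file (hypothetical reads), assuming the sequences are available as plain, uncompressed FASTQ. The only degenerate base in the primer is `M`, the IUPAC code for A or C, so it is expanded to the character class `[AC]` before searching. Only the orientation given in the post is searched, not the reverse complement.

```shell
set -eu

# Primer from the post; M is the IUPAC code for "A or C".
primer='GTGTGCCAGCMGCCGCGGTAA'
# Expand the one degenerate base used here into a regex character class.
pattern=$(printf '%s' "$primer" | sed 's/M/[AC]/g')

tmp=$(mktemp -d)
# Tiny made-up FASTQ: one read still carries the primer, two do not.
{
  printf '@r1\nGTGTGCCAGCAGCCGCGGTAATACGGAGG\n+\nIIIIIIIIIIIIIIIIIIIIIIIIIIIII\n'
  printf '@r2\nTACGGAGGGTGCAAGCG\n+\nIIIIIIIIIIIIIIIII\n'
  printf '@r3\nTACGTAGGTGGCAAGCG\n+\nIIIIIIIIIIIIIIIII\n'
} > "$tmp/sample.fastq"

# Keep only sequence lines (line 2 of each 4-line record), then count matches.
total=$(awk 'NR % 4 == 2' "$tmp/sample.fastq" | wc -l | tr -d ' ')
# grep -c exits non-zero when the count is 0, hence the "|| true".
with_primer=$(awk 'NR % 4 == 2' "$tmp/sample.fastq" | grep -c -E "$pattern" || true)

echo "$with_primer of $total reads still contain the primer"
```

A count of 0 on the trimmed reads would be consistent with the primer having been removed in that orientation.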

qiime demux summarize \
  --i-data ~/Riley/AXRS/Demux-0/CutAdapt-0/trimmed_sequences.qza \
  --o-visualization ~/Riley/AXRS/Demux-0/CutAdapt-0/trimmed_sequences.qzv

I then tried running DADA2 denoising on these trimmed sequences as follows, but it just gets stuck at “Denoise remaining samples”:
qiime dada2 denoise-paired \
  --i-demultiplexed-seqs ~/Riley/AXRS/Demux-0/CutAdapt-0/trimmed_sequences.qza \
  --p-trim-left-f 21 \
  --p-trunc-len-f 205 \
  --p-trim-left-r 21 \
  --p-trunc-len-r 206 \
  --o-representative-sequences ~/Riley/AXRS/DADA2-0/rep-seqs-dada2.qza \
  --o-table ~/Riley/AXRS/DADA2-0/table-dada2.qza \
  --output-dir ~/Riley/AXRS/DADA2-0 \
  --verbose
R version 3.4.1 (2017-06-30)
Loading required package: Rcpp
DADA2 R package version: 1.6.0

  1. Filtering ........................................................................

  2. Learning Error Rates
    2a) Forward Reads
    Initializing error rates to maximum possible estimate.
    Sample 1 - 509215 reads in 113988 unique sequences.
    Sample 2 - 111937 reads in 29605 unique sequences.
    Sample 3 - 114140 reads in 31532 unique sequences.
    Sample 4 - 423736 reads in 99168 unique sequences.
    selfConsist step 2
    selfConsist step 3
    selfConsist step 4
    selfConsist step 5
    selfConsist step 6
    selfConsist step 7
    Convergence after 7 rounds.
    2b) Reverse Reads
    Initializing error rates to maximum possible estimate.
    Sample 1 - 509215 reads in 99541 unique sequences.
    Sample 2 - 111937 reads in 28305 unique sequences.
    Sample 3 - 114140 reads in 28180 unique sequences.
    Sample 4 - 423736 reads in 114631 unique sequences.
    selfConsist step 2
    selfConsist step 3
    selfConsist step 4
    selfConsist step 5
    selfConsist step 6
    Convergence after 6 rounds.

  3. Denoise remaining samples ......................

I have attached the qzv files from before and after trimming for reference. Any advice on what I'm doing wrong?

Thank you,
Riley

Hey there @rlhughes!

I don't think you're doing anything wrong. Maybe you just need to wait for DADA2 to do its thing... :mantelpiece_clock:

I read through the summary of your workflow, and everything seems to make sense to me. I think you should consider giving DADA2 enough time to do what it does. If you haven't seen this note about runtime estimates, please take a peek:

https://benjjneb.github.io/dada2/bigdata.html#how-long-does-it-take

Looking at your demux summary, there are over 25 million reads in this dataset, which certainly seems like a lot to me! Of course, there are duplicates, since you did the concatenation business, but still...
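
For a rough sense of scale, the DADA2 log reports each sample as "N reads in M unique sequences". Similar counts can be approximated for any plain FASTQ file with a sketch like the one below. The file here is made up (hypothetical reads), and because DADA2 counts reads after its own filtering and truncation, the numbers would not be expected to match the log exactly.

```shell
set -eu

tmp=$(mktemp -d)
# Made-up FASTQ with 4 reads, 2 of which share the same sequence.
{
  printf '@a\nACGTACGT\n+\nIIIIIIII\n'
  printf '@b\nACGTACGT\n+\nIIIIIIII\n'
  printf '@c\nTTGACCAA\n+\nIIIIIIII\n'
  printf '@d\nGGCATTCA\n+\nIIIIIIII\n'
} > "$tmp/sample.fastq"

# Sequence lines are line 2 of every 4-line record.
reads=$(awk 'NR % 4 == 2' "$tmp/sample.fastq" | wc -l | tr -d ' ')
# sort -u collapses identical sequences before counting.
uniques=$(awk 'NR % 4 == 2' "$tmp/sample.fastq" | sort -u | wc -l | tr -d ' ')

echo "$reads reads in $uniques unique sequences"
```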

Keep us posted! :clock10:

I did leave it for more than 12 hours but I will try again and get back to you!

Just glanced at your command again. It looks like you aren't taking advantage of the --p-n-threads parameter. Setting it to a value other than the default should provide a speedup, provided you have the appropriate computational resources available.
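
As a sketch of what that might look like, the snippet below detects the number of available cores and prints the same denoise-paired command from the original post with `--p-n-threads` added. It only prints the command rather than running it, and the core detection (via `nproc` or `getconf`) is an assumption about the machine, not something from the post.

```shell
set -eu

# Detect the number of available cores; fall back to 1 if neither tool works.
if command -v nproc >/dev/null 2>&1; then
  threads=$(nproc)
elif getconf _NPROCESSORS_ONLN >/dev/null 2>&1; then
  threads=$(getconf _NPROCESSORS_ONLN)
else
  threads=1
fi

# Same options as in the original post, plus --p-n-threads.
# The command is printed, not executed; "~" is left for the shell that runs it.
cmd="qiime dada2 denoise-paired \
--i-demultiplexed-seqs ~/Riley/AXRS/Demux-0/CutAdapt-0/trimmed_sequences.qza \
--p-trim-left-f 21 \
--p-trunc-len-f 205 \
--p-trim-left-r 21 \
--p-trunc-len-r 206 \
--p-n-threads $threads \
--o-representative-sequences ~/Riley/AXRS/DADA2-0/rep-seqs-dada2.qza \
--o-table ~/Riley/AXRS/DADA2-0/table-dada2.qza \
--output-dir ~/Riley/AXRS/DADA2-0 \
--verbose"

echo "$cmd"
```

On a shared machine it may be kinder to pass a smaller explicit number than the full core count.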

Hi Matthew,
I waited 48 hours and it was still stuck, so I aborted and tried running it again with --p-n-threads 0. After almost 24 hours, it is still stuck on that last step, “Denoise remaining samples”. The DADA2 denoising worked before I concatenated the R1 and R2 files, so do you think there is any way to run my samples without this step? I realized that the hyperlink didn't transfer in my original post, so here is the forum post where I got the idea to concatenate the files (Problem with demux - #6 by Nicholas_Bokulich).

It finally finished! Here is the table.qzv file: table.qzv (984.3 KB)
