DADA2 has been running for 618 hours. Is this common when working with DADA2?

Sai_Ravi_Chandra · March 25, 2019, 9:14am

Hello,
I'm new to QIIME and have been stuck on this for a long time.
This is the command I used to run DADA2:
qiime dada2 denoise-paired \
  --i-demultiplexed-seqs demux.qza \
  --p-trim-left-f 1 \
  --p-trim-left-r 1 \
  --p-trunc-len-f 100 \
  --p-trunc-len-r 100 \
  --p-n-threads 40 \
  --o-table table.qza \
  --o-representative-sequences rep-seqs.qza \
  --o-denoising-stats denoising-stats.qza
These are the quality plots for the forward and reverse reads:
[Quality plots for the forward and reverse reads; image not included]

Any help with this would be appreciated.

Thank you.

jwdebelius · March 25, 2019, 9:18am

Hi @Sai_Ravi_Chandra,

DADA2 tends to run long. Looking at your command, this does seem very long, but I've also seen it take a week on multiple sequencing runs. I find Deblur tends to be much faster, if speed is a concern.

I'm also concerned that, even after it finishes, a fair number of your reads will fail to pair. Your reverse reads become low quality around position 70, which will likely cause merging failures, and your forward reads seem to drop off around 90. So, watch for that. If pairing fails, I'd recommend using only the forward reads. That will also cut processing time, regardless of which algorithm you choose.
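A rough way to see why the truncation positions matter for pairing: for forward and reverse reads to merge, the parts kept after truncation must still overlap. The sketch below assumes the two reads start at opposite ends of the amplicon and that the minimum overlap is 12 bp (the default of DADA2's R function `mergePairs`; whether your QIIME 2 version uses the same value is an assumption). The amplicon length is not given in this thread, so `amplicon_len` is a hypothetical input, and all function names are illustrative.

```python
# Minimal sketch: do these truncation lengths still leave enough overlap to merge?
# Assumption: MIN_OVERLAP mirrors the default of DADA2's mergePairs (12 bp).
MIN_OVERLAP = 12


def expected_overlap(trunc_len_f: int, trunc_len_r: int, amplicon_len: int) -> int:
    """Number of amplicon positions covered by both truncated reads.

    The forward read covers positions [0, trunc_len_f) and the reverse read
    covers [amplicon_len - trunc_len_r, amplicon_len). Trimming from the left
    removes bases at the outer ends, so (as long as those bases lie outside
    the overlap) it does not change this value.
    """
    return trunc_len_f + trunc_len_r - amplicon_len


def max_mergeable_amplicon(trunc_len_f: int, trunc_len_r: int,
                           min_overlap: int = MIN_OVERLAP) -> int:
    """Longest amplicon (bp) that could still merge with these truncations."""
    return trunc_len_f + trunc_len_r - min_overlap


def can_merge(trunc_len_f: int, trunc_len_r: int, amplicon_len: int,
              min_overlap: int = MIN_OVERLAP) -> bool:
    """True if the truncated reads overlap by at least min_overlap bases."""
    return expected_overlap(trunc_len_f, trunc_len_r, amplicon_len) >= min_overlap


# Truncation from the command above: 100 bp on each read.
print(max_mergeable_amplicon(100, 100))  # 188
# Truncating where quality drops (forward ~90, reverse ~70):
print(max_mergeable_amplicon(90, 70))    # 148
```

With reads of about 101 bp, any amplicon longer than roughly 190 bp cannot merge at these settings regardless of quality, which is another reason the forward-only route can be the safer choice.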

Best,
Justine

Sai_Ravi_Chandra · March 25, 2019, 11:17am

Hi @jwdebelius
Thanks for the reply.

As you suggested, I will try running with only the forward reads and will get back to you.
One more thing: I also tried running Deblur, and it has now been running for 20 hours. I don't know where I am going wrong.

Here is what I did from the beginning:
I downloaded the files from SRA (Sequence Read Archive).
Using fastq-dump --split-3 ERRxxx.sra, I got 2 FASTQ files, each with 33,466,587 reads.
I imported these FASTQ files using the PairedEndFastqManifestPhred33 format.
Then I ran DADA2 and Deblur, and I am still waiting for the output.
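For the import step, the PairedEndFastqManifestPhred33 format expects a comma-separated manifest with the columns sample-id, absolute-filepath and direction, one row per file. The sketch below writes such a file. The function name, the sample ID and the ERRxxx_1.fastq / ERRxxx_2.fastq file names (assumed to follow fastq-dump's usual --split-3 naming) are placeholders, so adjust them to your own data.

```python
import csv
import os


def write_paired_manifest(manifest_path, samples):
    """Write a paired-end manifest in the CSV layout of PairedEndFastqManifestPhred33.

    `samples` maps a sample ID to a (forward_fastq, reverse_fastq) tuple.
    Paths are converted to absolute paths, since the manifest requires them.
    """
    with open(manifest_path, "w", newline="") as handle:
        # Use plain "\n" line endings rather than the csv module's default "\r\n".
        writer = csv.writer(handle, lineterminator="\n")
        writer.writerow(["sample-id", "absolute-filepath", "direction"])
        for sample_id, (fwd, rev) in samples.items():
            writer.writerow([sample_id, os.path.abspath(fwd), "forward"])
            writer.writerow([sample_id, os.path.abspath(rev), "reverse"])


# Example (hypothetical sample ID and file names):
# write_paired_manifest("manifest.csv", {"sample-1": ("ERRxxx_1.fastq", "ERRxxx_2.fastq")})
```

The resulting file is what --input-path points to when importing with --input-format PairedEndFastqManifestPhred33.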
This is what my FASTQ data looks like:
@ERR476713.1 HWI-ST1018:8:1101:1326:2207 length=101
NGAATCATACACTACTTGAATTCTCAGGCCCGTAGCGTCCCTCGGTGCGCGGAGGGCCGCCCCTCTGCTGCCTTGTGGGGAAGACTCACGGGGAGGGGCCC
+ERR476713.1 HWI-ST1018:8:1101:1326:2207 length=101
#1=DDDEFHHHHHJJJJJIIIIIJJJJJIJJIDHIJJJJJIJJJI;CGIHHFDDDD@DDDDDD?BCDDDCDDDDCDCB<<?@?BCD@:ABBDB5>BDD9<B
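In Phred+33 FASTQ, each character on the quality line encodes one base's score as its ASCII code minus 33. A minimal sketch that decodes the record above (the helper name is illustrative):

```python
PHRED_OFFSET = 33  # Phred+33, the encoding declared by PairedEndFastqManifestPhred33

RECORD = """\
@ERR476713.1 HWI-ST1018:8:1101:1326:2207 length=101
NGAATCATACACTACTTGAATTCTCAGGCCCGTAGCGTCCCTCGGTGCGCGGAGGGCCGCCCCTCTGCTGCCTTGTGGGGAAGACTCACGGGGAGGGGCCC
+ERR476713.1 HWI-ST1018:8:1101:1326:2207 length=101
#1=DDDEFHHHHHJJJJJIIIIIJJJJJIJJIDHIJJJJJIJJJI;CGIHHFDDDD@DDDDDD?BCDDDCDDDDCDCB<<?@?BCD@:ABBDB5>BDD9<B
"""


def decode_quality(qual: str, offset: int = PHRED_OFFSET) -> list[int]:
    """Convert a FASTQ quality string into per-base Phred scores."""
    return [ord(ch) - offset for ch in qual]


header, seq, plus, qual = RECORD.splitlines()
scores = decode_quality(qual)

print(len(seq), len(scores))     # 101 101
print(seq[0], scores[0])         # N 2  ('#' encodes Q2)
print(min(scores), max(scores))  # 2 41
```

The leading N carries Q2, which is consistent with trimming the first base (--p-trim-left-f 1). A single read says little about the whole run, though; the truncation advice in this thread is based on the position-wise quality plots.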

I'm assuming it's taking DADA2 and Deblur so long to produce output because each file has 33,466,587 reads.
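Since read count drives run time, it can be worth confirming that the input is what you expect. This sketch counts read pairs and checks that the two files stay in sync. It assumes plain or gzipped 4-line FASTQ and that mates share the same first header token (as in the @ERR476713.1 record above); the function names are hypothetical.

```python
import gzip
from itertools import zip_longest
from pathlib import Path


def open_fastq(path):
    """Open a FASTQ file as text, handling .gz compression transparently."""
    path = Path(path)
    if path.suffix == ".gz":
        return gzip.open(path, "rt")
    return open(path, "rt")


def read_ids(path):
    """Yield the read ID (first header token, without '@') of each 4-line record."""
    with open_fastq(path) as handle:
        for line_no, line in enumerate(handle):
            if line_no % 4 == 0:
                yield line.split()[0][1:]


def count_pairs(forward_path, reverse_path):
    """Return the number of read pairs; raise ValueError if the files are out of sync."""
    n = 0
    for fwd_id, rev_id in zip_longest(read_ids(forward_path), read_ids(reverse_path)):
        if fwd_id is None or rev_id is None:
            raise ValueError(f"files have different read counts (diverge after {n} reads)")
        if fwd_id != rev_id:
            raise ValueError(f"mate IDs differ at read {n + 1}: {fwd_id} != {rev_id}")
        n += 1
    return n


# Example (placeholder names from fastq-dump --split-3):
# print(count_pairs("ERRxxx_1.fastq", "ERRxxx_2.fastq"))
```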

Please correct me if I did anything wrong in the process above.

Thanks.

jwdebelius · March 25, 2019, 11:20am

Hi @Sai_Ravi_Chandra,

I'm not sure about SRA downloads. However, with 33M reads, it doesn't surprise me that this is a slow process; at that size, you'll probably be running for several days. Again, this depends on your computational resources and how much you can parallelise.

Best,
Justine

Sai_Ravi_Chandra · March 25, 2019, 12:18pm

Hi @jwdebelius,
Thanks for the reply.

We are using a Linux server with 320 GB of RAM that supports parallelization. Since, as you mentioned, this will be time-consuming, I'll let it run for a couple of days and get back to you. One more thing: I'm planning to run Deblur on the forward reads only. Could you suggest a trim length for that? That would be really helpful.

Thanks.

jwdebelius · March 25, 2019, 1:35pm

Hi @Sai_Ravi_Chandra,

I would trim at 90 bp for your run.
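A trim length like this is usually read off the quality plots by eye. One way to make that choice reproducible is to take the median quality at each position across reads and keep positions up to the first one whose median falls below a cutoff. This is a simplified heuristic of my own, not what Deblur or the QIIME 2 quality summary does internally; the Q30 cutoff and the function names are assumptions.

```python
from statistics import median

PHRED_OFFSET = 33


def per_position_median(quality_strings, offset=PHRED_OFFSET):
    """Median Phred score at each position, up to the shortest read length."""
    length = min(len(q) for q in quality_strings)
    return [median(ord(q[i]) - offset for q in quality_strings) for i in range(length)]


def suggest_trim_length(quality_strings, threshold=30, offset=PHRED_OFFSET):
    """Number of leading positions kept before the median first drops below threshold."""
    medians = per_position_median(quality_strings, offset)
    for position, q in enumerate(medians):
        if q < threshold:
            return position
    return len(medians)


# Toy example: three 5 bp reads whose quality collapses at the last position.
print(suggest_trim_length(["JJJJ#", "JJJ##", "JJJJJ"]))  # 4
```

In practice you would feed it quality lines sampled from the forward FASTQ; a single dip at one position can cut the length short, so the result is a starting point to compare against the plots rather than a final answer.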

Best,
Justine

Sai_Ravi_Chandra · March 25, 2019, 1:52pm

Hi @jwdebelius,

Thanks for the suggestion. I will update this thread once I get the output.

Thanks for the help @jwdebelius.

system · April 25, 2019, 7:52pm

This topic was automatically closed 31 days after the last reply. New replies are no longer allowed.