Dada2 denoise-paired running time over 12 hours for 16 samples

I am test-running dada2 denoise-paired on a subsample of 16 samples (~4.5 GB) of bacterial 16S sequences from a MiSeq run (I have 350 samples in total). I imported the demultiplexed files like this; the resulting .qza file is 981 MB:

qiime tools import \
  --type 'SampleData[PairedEndSequencesWithQuality]' \
  --input-path "my path" \
  --output-path bobcat_sas_paired-end-demux.qza \
  --source-format PairedEndFastqManifestPhred33

I inspected the files:

image

I have now been running dada2 denoise-paired for more than 12 hours. Is this a normal run time? It seems longer than what others have reported. Could it be a problem with how I imported the files into QIIME? What run time can I expect for the full set of 350 samples? I have not gotten any error yet, but I don't see any verbose output because I had to relaunch the terminal window (I am running QIIME through remote access).

(qiime2-2018.6) ubuntu@aerosol0:~/mdw/Line_files$ qiime dada2 denoise-paired \
  --i-demultiplexed-seqs bobcat_sas_paired-end-demux.qza \
  --o-table bobcatsastable.qza \
  --o-representative-sequences bobcatsas_rep-seqs.qza \
  --o-denoising-stats bobcatsas_denoising-stats.qza \
  --verbose --p-n-threads 0 --p-trunc-len-r 0 --p-trunc-len-f 0 &

I have been running DADA2 denoise-paired for more than 12 hours, using several threads, on a subsample of 16 samples. My real set consists of 350 samples. It is currently using 425% CPU :confused: I imported the demultiplexed files like this:

qiime tools import \
  --type 'SampleData[PairedEndSequencesWithQuality]' \
  --input-path "mypath" \
  --output-path bobcat_sas_paired-end-demux.qza \
  --source-format PairedEndFastqManifestPhred33

The 16 samples are ~4.5 GB and the resulting .qza file is 981 MB.

I inspected the file:

image

And then ran

(qiime2-2018.6) ubuntu@aerosol0:~/mdw/Line_files$ qiime dada2 denoise-paired \
  --i-demultiplexed-seqs bobcat_sas_paired-end-demux.qza \
  --o-table bobcatsastable.qza \
  --o-representative-sequences bobcatsas_rep-seqs.qza \
  --o-denoising-stats bobcatsas_denoising-stats.qza \
  --verbose --p-n-threads 0 --p-trunc-len-r 0 --p-trunc-len-f 0

Unfortunately the remote connection broke, so I am not seeing any verbose output, but dada2 is still running. Based on other reports, the run time seems unusually long. What run time can I expect for the full set of 350 samples? Could it be something with the import into QIIME? Any suggestions would be helpful.

Hi @Linevmo,
As you have seen discussed across the forum, runtime depends on many many things — see here. So other users’ experiences may not extrapolate to your own.

If it is still running, it is still running — you will most likely get an error (e.g., a memory error) if it fails. 12 hours is not a long time for dada2 — even for a subset of samples if those samples have many sequences. I would recommend just waiting it out… 2-3 days of runtime are not unusual.

Sorry I don’t have a more concrete response! I just recommend being a little more patient on this one…

If you want to track the command's verbose output, I suggest running it with nohup. That's what I do. And yes, DADA2 takes a long time.

nohup all_the_command > report.txt 2> error.txt &

a) nohup keeps the command running after the terminal closes or the connection drops. The > redirects all standard output (the verbose output) to report.txt, so you can keep checking that file for progress.
b) 2> tells the system to redirect any error output to error.txt, so if there is a problem with your run it will be written to that file.

Hope it helps.
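
To make the nohup pattern above concrete, here is a minimal sketch. The job is a hypothetical stand-in (a few echo calls) and not the real qiime command, and the file names report.txt and error.txt are the ones suggested above. The wait is only there so the example finishes deterministically; in real use you would log out and check the files later.

```shell
#!/bin/sh
# Hypothetical stand-in for a long-running job: writes progress to stdout
# and a warning to stderr. Replace it with the real command.
job='echo "step 1 done"; echo "warning: example" >&2; echo "step 2 done"'

# nohup keeps the job alive after a hangup; > and 2> send stdout and stderr
# to separate files; & puts the job in the background.
nohup sh -c "$job" > report.txt 2> error.txt &
job_pid=$!

# Wait only so this example is deterministic; a real run would be left alone.
wait "$job_pid"

# Inspect what was captured.
cat report.txt
cat error.txt
```

While a real run is still in progress, `tail -f report.txt` follows the log as new lines are written.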


Thanks for the response. (And I am sorry for double posting; I thought the first post had not been sent!) Dada2 has stopped running now (after almost 48 h). However, the only output file I got was table.qza. I did not get any error messages.

Thanks for the hint about nohup. I will certainly use that the next time!

That sounds highly unusual — there would be an error message (and no outputs) if anything went wrong. Please double-check and give us more details on the command you ran and the location of the output file.

The output table.qza file was in the same folder where I ran the command. I did not see any error messages printed. Could that be because I ran it in the background? I guess I will just rerun it and write the verbose and error output to files.

I searched for all .qza files with find . -type f -name *qza

ubuntu@aerosol0:~/Desktop/mdw$ find . -type f -name *qza
./Line_files/silva-132-99-nb-classifier.qza
./Line_files/bobcatsas_qiime2analysis/bobcat_sas_paired-end-demux.qza
./Line_files/bobcatsas_qiime2analysis/table.qza
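
A side note on the find command above: an unquoted *qza is expanded by the shell first whenever a matching file exists in the current directory, which can change what find actually searches for. A minimal sketch with the pattern quoted, using a throwaway directory and hypothetical file names:

```shell
#!/bin/sh
# Build a small throwaway directory tree (hypothetical file names).
demo_dir=$(mktemp -d)
mkdir -p "$demo_dir/analysis"
touch "$demo_dir/top.qza" "$demo_dir/analysis/table.qza" "$demo_dir/analysis/notes.txt"

# Quoting the pattern lets find, not the shell, interpret the wildcard,
# so .qza files in subdirectories are matched even though top.qza
# sits in the directory the search starts from.
found=$(cd "$demo_dir" && find . -type f -name '*.qza' | sort)
echo "$found"

# Clean up the throwaway directory.
rm -rf "$demo_dir"
```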

I suspect something went sour here --- good call on re-running everything.

Keep us posted.


I re-ran the samples and it all worked out fine. :slight_smile:

