DADA2 Stopping But Job Still Running

Just as a warning I am a beginner with all of this stuff.

I am having issues with DADA2 stopping while running. I am running this on my university's server on 20 cores, 1 node, with 2560 MB per CPU. It is odd because the output files (rep seqs, table, and stats) are created after roughly 4 hours of running, but if I then ssh into the node the job is running on and use the top command, the job no longer appears and the CPUs are not being utilized. At the beginning of the job the %CPU is utilized, and you can see that the CPUs are running R. What is strange is that the wall clock on my job on the server keeps going; the scheduler will not stop the job even though nothing is happening (I think)!

I have 36 samples of paired-end reads that come demultiplexed, and I have tried changing the truncation parameters. I believe I have set them so there is enough overlap for the reads to merge while still keeping the quality high. I have also run one sample individually, and that job completes successfully within a few hours. I have also made taxonomy bar charts, a beta diversity PCA, and alpha diversity box plots with the output that is produced, and I have data that maps to all of my samples; however, I am fearful that it is not complete.
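To lay out my overlap reasoning (assuming a ~2x300 MiSeq run and a ~400 bp amplicon with ~20 bp primers on each end, which I should double-check): the forward reads keep 280 - 20 = 260 nt after trimming, the reverse reads keep 220 - 20 = 200 nt, and the primer-free amplicon is about 400 - 40 = 360 nt, so the expected overlap is roughly 260 + 200 - 360 = 100 nt, well above the ~12 nt minimum DADA2 needs to merge.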

I've uploaded some screenshots of my quality score plots, as well as the output from the table.qzv file that is generated from all of my samples after the job has run for 4 hours. Something that may be odd is that some of my samples have 3x or more the sequence counts of others.

The following is the code:

#BSUB -J FullSampleDenoise #Set the job name to "FullSampleDenoise"
#BSUB -L /bin/bash #Use the bash login shell to initialize the job's execution environment.
#BSUB -W 96:00 #Set the wall clock limit to 96 hours.
#BSUB -n 20 #Request 20 cores.
#BSUB -R "span[ptile=20]" #Request 20 cores per node.
#BSUB -R "rusage[mem=2560]" #Reserve 2560 MB per process (CPU) for the job.
#BSUB -M 2560 #Set the per-process enforceable memory limit to 2560 MB.
#BSUB -o Denoisefullsample.%J #Send stdout and stderr to "Denoisefullsample.[jobID]"

module load Anaconda/3-5.0.0.1
source activate qiime2-2018.8
module load R_tamu/3.4.2-intel-2017A-Python-2.7.12-default-mt

qiime dada2 denoise-paired \
--i-demultiplexed-seqs demux-paired-end-full062819.qza \
--p-trim-left-f 20 \
--p-trim-left-r 20 \
--p-trunc-len-f 280 \
--p-trunc-len-r 220 \
--p-n-threads 20 \
--o-representative-sequences rep-seqs-full062819.qza \
--o-table table-dada-full062819.qza \
--o-denoising-stats stats-dada-full062819.qza

I also want to add the --verbose flag, but I'm not sure where the output would be produced since I'm running this on a remote server.
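If I understand the #BSUB -o directive correctly, I assume the --verbose output would simply be appended to the job's stdout file, e.g. by tacking the flag onto the end of the command:

--o-denoising-stats stats-dada-full062819.qza \
--verbose

so that the DADA2 progress messages end up in Denoisefullsample.[jobID]? Please correct me if that's wrong.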

Any help would be much appreciated.

Hey there @msirven! It sounds to me like q2-dada2 is done and has created the resulting outputs.

Hmm, this sounds to me like either an issue with the job submission itself, or maybe something deeper in the queuing system. Does this only happen on the q2-dada2 jobs?
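One thing you could try (a rough sketch; the exact behavior depends on your cluster's LSF setup, and <jobID> is a placeholder) is asking the scheduler directly what it thinks the job is doing, instead of relying on top:

bjobs -l <jobID>   # detailed status of the job as LSF sees it
bhist -l <jobID>   # history of the job's state changes (PEND, RUN, DONE, etc.)
bpeek <jobID>      # peek at the stdout/stderr captured so far

If bpeek shows the DADA2 run finished but bjobs still reports the job as RUN, that would point at the queuing system rather than q2-dada2.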

Hello @thermokarst

Thanks so much for your quick response.

What is odd is that when I ran a smaller sample set (1 sample) with the same job specifications, it completed successfully and the wall clock stopped. This issue seems to only occur when I run the whole sample set. I'm not sure whether this happens only with DADA2.

My fear is that it is stopping somewhere and is not able to finish writing the files. Or will the output files not be created at all if there is an error?

Thanks,

Maritza

The QZA files won't be created unless DADA2 completes successfully. I think you are good to go… :t_rex:
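If you want extra peace of mind, and if your QIIME 2 release includes it, you could also ask QIIME 2 to sanity-check the outputs (filenames taken from your script above):

qiime tools validate table-dada-full062819.qza
qiime tools validate rep-seqs-full062819.qza

If those pass, the artifacts were written out completely.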