Hello,
I've been processing a HiSeq 2x250 bp data set whose forward and reverse read files are each ~138 GB. I know this is a huge chunk of data, and I can't find much information on QIIME 2 runs with data sets of this size. I'm currently on the DADA2 denoise step, which has been running on our supercomputer cluster with 16 threads for just shy of 7 days.
Command used:
qiime dada2 denoise-paired \
  --i-demultiplexed-seqs demux.qza \
  --p-trim-left-f 0 \
  --p-trunc-len-f 250 \
  --p-trim-left-r 0 \
  --p-trunc-len-r 250 \
  --p-chimera-method consensus \
  --o-representative-sequences rep-seqs-dada2.250.qza \
  --o-table table-dada2.250.qza \
  --p-n-threads 16 \
  --o-denoising-stats denoising-stats-dada2-250.qza
I can't figure out how to tell how far along the command is, and I stupidly did not pass the --verbose flag. I did find a temp file that says the following:
Filtering The filter removed all reads: /ddnB/work/areige1/Ch1Qiime/temp/tmpwqvw9sli/filt_f/G33MB.B_455_L001_R1_001.fastq.gz and /ddnB/work/areige1/Ch1Qiime/temp/tmpwqvw9sli/filt_r/G33MB.B_455_L001_R2_001.fastq.gz not written.
Some input samples had no reads pass the filter.
..........................................................................................................................................................................................................................................................x............................................................................................................................................................................................................ -
Learning Error Rates
2a) Forward Reads
Initializing error rates to maximum possible estimate.
Sample 1 - 237205 reads in 56239 unique sequences.
Sample 2 - 287159 reads in 70043 unique sequences.
Sample 3 - 50801 reads in 21595 unique sequences.
Sample 4 - 164914 reads in 32588 unique sequences.
Sample 5 - 329611 reads in 48024 unique sequences.
selfConsist step 2
selfConsist step 3
selfConsist step 4
selfConsist step 5
Convergence after 5 rounds.
2b) Reverse Reads
Initializing error rates to maximum possible estimate.
Sample 1 - 237205 reads in 51137 unique sequences.
Sample 2 - 287159 reads in 64957 unique sequences.
Sample 3 - 50801 reads in 20618 unique sequences.
Sample 4 - 164914 reads in 32320 unique sequences.
Sample 5 - 329611 reads in 54306 unique sequences.
selfConsist step 2
selfConsist step 3
selfConsist step 4
Convergence after 4 rounds.
Denoise remaining samples .............................................................................................................................................................................................................................................................................................................................................................................................................................
Does this mean that it's still denoising? I can see that the average load per thread changes every so often, so I'm assuming the command is working, but is it possible it's stuck in some kind of loop and just lingering forever?
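In case it helps anyone checking the same thing, here is a rough way to gauge progress from that temp file. It is only a sketch and rests on an assumption I have not confirmed: that each "." is one processed sample and each "x" in the filtering step is a sample with no reads left after filtering. The function names and the log path are made up for illustration.

```shell
# Samples that passed filtering: dots on the progress line made up only of
# '.' and 'x' (assumption: one character per sample, 'x' = nothing passed).
passed_filter() {
  grep -E '^[.x]+[ -]*$' "$1" | head -n 1 | tr -cd '.' | wc -c | tr -d ' '
}

# Samples denoised so far: dots that follow "Denoise remaining samples".
denoised_so_far() {
  grep 'Denoise remaining samples' "$1" | tail -n 1 \
    | sed 's/.*Denoise remaining samples//' | tr -cd '.' | wc -c | tr -d ' '
}

# Usage (the path is hypothetical; point it at the temp log you found):
#   LOG=/path/to/temp/log
#   echo "$(denoised_so_far "$LOG") of $(passed_filter "$LOG") samples denoised"
```

If the second count keeps growing between checks, the job is presumably still moving through samples and not stuck.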
I have limited time available on the supercomputer, so I'm trying to figure out what my options are if I can't get this job to finish in the next 96 hours. Does this seem like an excessive amount of time to be running DADA2 on this much data?
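To judge whether 96 hours is enough, a back-of-the-envelope estimate could look like the sketch below. It assumes every sample takes about the same time to denoise, which is probably not true given how much read counts vary between samples, and it counts the filtering and error-learning time as if it were denoising time, so treat the result as a loose guess. `eta_hours` is a hypothetical helper, and the numbers in the usage line are placeholders, not my actual counts.

```shell
# Estimate the hours remaining from samples done, total samples, and hours elapsed.
# Assumption: linear progress, i.e. remaining = elapsed * (total - done) / done.
eta_hours() {
  awk -v d="$1" -v t="$2" -v e="$3" 'BEGIN {
    if (d <= 0) { print "unknown"; exit }   # no progress yet, nothing to extrapolate
    printf "%.1f\n", e * (t - d) / d
  }'
}

# Usage with placeholder numbers (300 of 400 samples done after 168 hours):
#   eta_hours 300 400 168
```

If the estimate comes out well above the remaining allocation, that would argue for stopping and rerunning on a subset or with different settings before the time runs out.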
Thank you!
Alicia Reigel