The dada2 denoise-paired command is taking a very long time to run. I have allocated 28 processors and 200 GB of RAM to QIIME 2 (my host computer has 32 processors and 256 GB of RAM). However, while the program was running it used only 2-4 cores and less than 90 GB of RAM. Is there anything wrong with how I ran the script?
My script is as follows:
time qiime dada2 denoise-paired \
  --i-demultiplexed-seqs SDSS-2020-demux-paired-end.qza \
  --p-trim-left-f 11 \
  --p-trim-left-r 10 \
  --p-trunc-len-f 247 \
  --p-trunc-len-r 247 \
  --p-n-threads 0 \
  --o-table SDSS-2020-table.qza \
  --o-representative-sequences SDSS-2020-rep-seqs.qza \
  --o-denoising-stats SDSS-2020-denoising-stats.qza
That is a good question. I know setting --p-n-threads 0 should tell the command to use all available cores, but maybe it's incorrectly reading the number of available cores from the virtual machine. Have you tried setting --p-n-threads 28? Additionally, are you sure you were monitoring the process during its peak usage? It is possible that at some point the process was using more CPU resources than that.
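To check what the virtual machine actually exposes before choosing a thread count, something like the following rough sketch may help. The pick_threads helper is hypothetical (it is not part of QIIME 2), and the reserve of 0 cores is only an assumption for illustration.

```shell
# Hypothetical helper: choose a thread count from the visible cores,
# keeping some cores in reserve for the rest of the system.
# $1: total visible cores, $2: cores to keep free.
pick_threads() {
    n=$(( $1 - $2 ))
    # Never request fewer than one thread.
    if [ "$n" -lt 1 ]; then n=1; fi
    echo "$n"
}

# Cores visible to this shell (inside the VM, not on the host).
visible_cores="$(nproc 2>/dev/null || getconf _NPROCESSORS_ONLN)"
threads="$(pick_threads "$visible_cores" 0)"
echo "Visible cores: $visible_cores, threads to request: $threads"

# Then pass the explicit value instead of 0, for example:
#   qiime dada2 denoise-paired ... --p-n-threads "$threads" ...
```

If the number printed here is lower than the 28 processors assigned to the VM, the limit is coming from the VM configuration rather than from DADA2.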
Thanks for your reply, @Oddant1. I tried that, but it did not work. I monitored the running process: QIIME 2 used 10 or 12 cores only for the first 5-10 minutes, and after that only 2 or 3 cores were used for computation. For 222 samples (around 20 GB), it took about 13 hours. That does not seem too bad. But I have another error:
"Denoise remaining samples ......................Error in dada_uniques(names(derep[[i]]$uniques), unname(derep[[i]]$uniques), :
Memory allocation failed."
Hi Matthew Ryan Dillon,
When I ran the "denoise-paired" command in QIIME 2 2020.2, it returned an error. The error log is attached. Could you please check it and help me solve this error?
qiime2-q2cli-err-ao9rxm00 log.txt (3.1 KB)
I have allocated 28 processors and 200 GB of RAM (my host computer has 32 processors and 256 GB of RAM).
Here is the command I used:
time qiime dada2 denoise-paired \
  --i-demultiplexed-seqs SDSS-2020-demux-paired-end.qza \
  --p-trim-left-f 11 \
  --p-trim-left-r 10 \
  --p-trunc-len-f 247 \
  --p-trunc-len-r 247 \
  --p-n-threads 0 \
  --o-table SDSS-2020-table.qza \
  --o-representative-sequences SDSS-2020-rep-seqs.qza \
  --o-denoising-stats SDSS-2020-denoising-stats.qza
@yi_zhou, I have merged this with your public question, as they are basically duplicates. In future, please post questions only once, on the public forum, so that information isn't scattered in multiple places.
@yi_zhou, it sounds like the times and resource usage you're getting are actually pretty typical.
I monitored the running process, the qiime2 used 10 or 12 cores only in the first 5 -10min, after that only 2 or 3 core was used for computation.
Not every part of DADA2 is actually parallelizable, so you aren't going to get extremely high thread usage the entire way through. That sounds like fairly normal thread utilization.
for 222 samples around 20GB, it takes about 13hours
This also sounds about right. DADA2 is performing some pretty heavy computations, and they can take a very large amount of time to run especially on sizeable data sets.
The "Memory allocation failed" error probably happened because you tried to create too many threads. Each thread adds significantly to the amount of memory your program is using, especially when you're trying to process significant amounts of data (like 20 GB). No matter how much RAM you have, you will eventually use it all if you create enough threads.
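One way to act on this is to rerun with an explicit, smaller thread count and watch peak memory. The sketch below only prints the command so it can be reviewed first. The denoise_cmd helper is hypothetical, and the value 8 is an assumption for illustration, not a recommendation; the right number depends on your data and RAM.

```shell
# Hypothetical helper: print the denoise-paired command from this thread
# with an explicit thread count. $1: number of threads to request.
denoise_cmd() {
    echo "qiime dada2 denoise-paired" \
        "--i-demultiplexed-seqs SDSS-2020-demux-paired-end.qza" \
        "--p-trim-left-f 11 --p-trim-left-r 10" \
        "--p-trunc-len-f 247 --p-trunc-len-r 247" \
        "--p-n-threads $1" \
        "--o-table SDSS-2020-table.qza" \
        "--o-representative-sequences SDSS-2020-rep-seqs.qza" \
        "--o-denoising-stats SDSS-2020-denoising-stats.qza"
}

# Assumed value for illustration only; tune it to your data and RAM.
THREADS=8
denoise_cmd "$THREADS"

# To actually run it and, on Linux, record peak memory with GNU time:
#   /usr/bin/time -v $(denoise_cmd "$THREADS")
```

GNU time's verbose output includes a "Maximum resident set size" line, which gives a rough idea of how memory use changes as the thread count goes up or down.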
Additionally, I'm told you messaged another moderator about this issue. As stipulated in the Code of Conduct, please be patient with response times. We will try to get back to you as quickly as we can.