As you can see in the attached picture, I am currently at the second step (assuming I will have the same output as in the previous forum thread). I have let it run for 2 days and it has yet to proceed to the next step. May I know how long it usually takes to finish step 2?
Hello!
The running time depends on multiple factors, such as the technical specifications of the machine you are using, the number of threads you provided in the command, the number of sequences you are denoising, and other factors I forgot to mention or am not aware of.
In my experience it may take anywhere from several hours to several days.
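For example, the number of threads DADA2 uses is set with the `--p-n-threads` parameter of the denoising command. A minimal sketch, assuming paired-end reads (the file names and truncation lengths are placeholders, not your actual values):

```bash
# Hypothetical example: denoise paired-end reads with DADA2,
# using all available cores (--p-n-threads 0).
qiime dada2 denoise-paired \
  --i-demultiplexed-seqs demux.qza \
  --p-trunc-len-f 240 \
  --p-trunc-len-r 200 \
  --p-n-threads 0 \
  --o-table table.qza \
  --o-representative-sequences rep-seqs.qza \
  --o-denoising-stats denoising-stats.qza
```

Leaving `--p-n-threads` at its default of 1 is a common reason a denoising run takes much longer than it needs to.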
But if you are running a lot of samples from different sequencing runs, you should be aware that it is recommended to denoise samples from each sequencing run separately, using the same settings, and then merge the feature tables and representative sequences.
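For the merging step, a minimal sketch in QIIME 2 (the per-run file names are hypothetical):

```bash
# Hypothetical example: merge per-run DADA2 outputs into one
# feature table and one set of representative sequences.
qiime feature-table merge \
  --i-tables table-run1.qza table-run2.qza \
  --o-merged-table merged-table.qza

qiime feature-table merge-seqs \
  --i-data rep-seqs-run1.qza rep-seqs-run2.qza \
  --o-merged-data merged-rep-seqs.qza
```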
I killed the previous command because I forgot to specify the path to save my output. Currently I can see that the command is actively running, using around 24% of my CPU and memory.
I wonder whether the low memory consumption is making it run for a longer time, so I tried running `sudo ulimit -m unlimited` before the subsequent command, as shown in this post. Yet only around 24% of my CPU and memory is allocated to running the command.
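For reference, this is roughly how I have been checking CPU and memory usage while the command runs (just standard Linux tools, nothing QIIME-specific):

```bash
# Interactive view of CPU and memory usage per process.
top

# Free and used memory in human-readable units.
free -h

# Processes matching "qiime" with their CPU and memory percentages.
ps aux | grep -i qiime
```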
I see. My qza file is around 5 GB, yet it is still running after 2 weeks. May I know how much CPU and memory the command consumed when you ran it?
That's a lot of reads! I am not surprised that it is running for a long time.
I am now curious why you have such a big qza file. I can think of 2 possible scenarios:
You have a lot of samples (thousands). In that case it is better, as I wrote above, to split the samples by sequencing run, denoise each run separately with the same settings, and merge the outputs.
Sequencing depth is really high. I once had a dataset with 1M reads per sample, so I just subsampled each sample to a fraction of 0.1 (see the sketch below).
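If subsampling would also make sense for your data, a minimal sketch (assuming paired-end demultiplexed reads; file names are placeholders):

```bash
# Hypothetical example: keep roughly 10% of the reads per sample
# before denoising, to cut the running time.
qiime demux subsample-paired \
  --i-sequences demux.qza \
  --p-fraction 0.1 \
  --o-subsampled-sequences demux-subsampled.qza
```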
Well, in my opinion 5 GB should run within about 6 hours on my specs (11 cores, 18 GB RAM, M3).
Two weeks is too long, and something is wrong. Please have a look at my qza stats for reference.
It was using around 15 GB of RAM, and I couldn't check the CPU usage. But I'm sure it used the maximum, as I allocated 10 cores out of 11. It was hot and noisy, with the cooling fan running almost throughout the run.
In case you have not found it already, here are the official DADA2 docs for working with 'Big Data'. The strategies discussed there could be helpful for you, even though the examples use DADA2 directly in R rather than through the QIIME 2 plugin.