feature-table summarize: killed not generating table.qzv

jborin · July 9, 2021, 12:36am

I'm having trouble generating a table.qzv when running feature-table summarize.

I used dada2 on demultiplexed paired reads to generate the table.qza. However, when I run feature-table summarize, the command fails after ~1 hr. I am using qiime-2021.4 in the conda env on the UCSD TSCC, and this command fails in both submitted jobs and interactive mode. The error returned is "Killed", with no error code and no other description.

I double-checked my sample metadata and used keemei to check it. All looks good and it is a .tsv.

I tried running feature-table summarize without including metadata and the command still fails.

I tried including --verbose, but it doesn't describe the failure; it just says "Killed".

Here are my commands without and with sample metadata:

qiime feature-table summarize \
  --i-table table.qza \
  --verbose \
  --o-visualization table-nometa.qzv

qiime feature-table summarize \
  --i-table table.qza \
  --o-visualization table.qzv \
  --verbose \
  --m-sample-metadata-file sample-metadata.tsv

Thanks in advance for your guidance with this! I sincerely appreciate it.

timanix · July 9, 2021, 4:26am

Hi!
Just yesterday I had the same problem on a server. The reason was that I had allocated too little memory: the job required 32 GB of RAM for 1600 samples and took almost 2 h to finish. Try allocating more RAM and time.

jborin · July 9, 2021, 9:27pm

Thanks for your suggestion timanix!

Last night I allocated 8 cores but it was still killed. Today, I requested 2 nodes with 8 cores per node. It was still running after 3 hrs (which is promising) and then I ran out of walltime/credit on my account. When I get more funds I will try this again.

However, I wanted to mention that I only have 254 samples. Is it surprising/concerning that it is taking so long to run when I have <20% of the samples that you did?

Thanks,

thermokarst · July 9, 2021, 9:29pm

Unfortunately, increasing the number of cores (or nodes) won't address what @timanix suggested above. You need to request a node with more memory (RAM), and/or request more time when allocating resources. You should chat with your sysadmin: they can help you figure out which of those was the reason your job was killed, and can help you resolve the problem.
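
For anyone trying to work out how much memory to request, one option is to measure the peak memory of a run directly. The following is a minimal sketch, not something from this thread: it assumes GNU time is installed at /usr/bin/time (the shell builtin `time` does not report memory), and `measure_peak_mem` and the log file name are hypothetical names chosen for illustration. If the scheduler kills the whole job, the log may never be written, so this is most useful on a run that completes or on a smaller test table.

```shell
#!/usr/bin/env bash
# Sketch: wrap a command with GNU time so its peak memory use is logged.
# Assumption: GNU time lives at /usr/bin/time; if it is missing, the
# command is simply run without measurement.

measure_peak_mem() {
    local log="$1"   # file that receives the resource report
    shift            # remaining arguments are the command to run
    if [ -x /usr/bin/time ]; then
        # -v prints detailed statistics; -o writes them to the log file
        /usr/bin/time -v -o "$log" "$@"
    else
        echo "GNU time not found; running without measurement" >&2
        "$@"
    fi
}

# Example usage (file names taken from the original post):
#   measure_peak_mem summarize-time.log \
#       qiime feature-table summarize \
#       --i-table table.qza \
#       --o-visualization table.qzv
#   grep "Maximum resident set size" summarize-time.log
```

The "Maximum resident set size" line in the log (reported in kilobytes by GNU time) gives a starting point for the memory request; adding some headroom on top of it is probably wise.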

timanix · July 9, 2021, 10:17pm

That still makes sense if you did not filter your feature table to remove features that are found in only a few samples and features whose frequencies are low.
The same 1600-sample table I mentioned above finished in 15 minutes with 16 GB of RAM after filtering (I believe it would run with less RAM; I just allocated 16 to be safe). So I suggest filtering the feature table to remove features with low frequency (less than 50, for example) and features found in a small number of samples (fewer than 5, for example). After that, memory and time requirements should drop.
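
The filtering described above could be sketched roughly as follows, using the `filter-features` action of the q2-feature-table plugin. The thresholds (50 and 5) are just the example values from this post, the output file names are placeholders, and `filter_and_summarize` is a hypothetical helper name; adjust all of them for your own data.

```shell
#!/usr/bin/env bash
# Sketch: drop rare and low-frequency features, then summarize the
# smaller table. Thresholds are the example values from the post above.
MIN_FREQUENCY=50   # minimum total frequency a feature must have
MIN_SAMPLES=5      # minimum number of samples a feature must appear in

filter_and_summarize() {
    local table="$1" filtered="$2" viz="$3"
    # Remove features below either threshold
    qiime feature-table filter-features \
        --i-table "$table" \
        --p-min-frequency "$MIN_FREQUENCY" \
        --p-min-samples "$MIN_SAMPLES" \
        --o-filtered-table "$filtered" || return 1
    # Summarize the filtered table
    qiime feature-table summarize \
        --i-table "$filtered" \
        --o-visualization "$viz"
}

# Only run when QIIME 2 is on the PATH and the input table exists
if command -v qiime >/dev/null 2>&1 && [ -f table.qza ]; then
    filter_and_summarize table.qza filtered-table.qza filtered-table.qzv
else
    echo "qiime or table.qza not found; nothing was run" >&2
fi
```

Sample metadata can still be passed to the summarize step with --m-sample-metadata-file, as in the original commands.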

jborin · July 21, 2021, 6:56pm

Thanks @timanix and @thermokarst for all your help!

Processors are ~proportional to memory on the cluster I'm using. Upping the processors to 28 allowed me to complete the command in 3 h with the full dataset. Filtering also helps speed things up!
