Hello friends,
I encountered the following error when using the DADA2 pipeline to perform taxonomic training after demultiplexing my amplicon sequencing data: "Plugin error from feature-classifier: A worker process managed by the executor was unexpectedly terminated. This could be caused by a segmentation fault while calling the function or by an excessive memory usage causing the Operating System to kill the worker."
Could anyone tell me where the problem might be? My code is:
qiime dada2 denoise-paired \
  --i-demultiplexed-seqs $OUTPATH/demux-paired-trimmed.qza \
  --p-trim-left-f 0 \
  --p-trim-left-r 0 \
  --p-trunc-len-f 0 \
  --p-trunc-len-r 0 \
  --p-n-threads 12 \
  --p-n-reads-learn 360000 \
  --o-table $OUTPATH/table.qza \
  --o-representative-sequences $OUTPATH/rep-seqs.qza \
  --o-denoising-stats $OUTPATH/denoising-stats.qza \
  --output-dir $OUTPATH/unspecified-dada2 \
  --verbose
Hello and welcome to the forum!
It looks like your error is related to the amount of RAM available on your machine.
To run this command successfully, you need to either allocate more memory (if you are working on a cluster) or get access to a more powerful machine.
Another option worth considering is to filter your feature table to remove rare and/or low-prevalence features (there are often many features that, for example, were found in only one sample and only once; do you really need them?). After filtering the feature table, you can filter the representative sequences based on the filtered table and use those files for taxonomy classification and other analyses, as sketched below.
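A minimal sketch of that filtering step, reusing the $OUTPATH file names from the command above; the min-frequency and min-samples thresholds of 2 are illustrative assumptions, so tune them to your data:

# drop features seen fewer than 2 times in total or in fewer than 2 samples
qiime feature-table filter-features \
  --i-table $OUTPATH/table.qza \
  --p-min-frequency 2 \
  --p-min-samples 2 \
  --o-filtered-table $OUTPATH/table-filtered.qza

# keep only the representative sequences still present in the filtered table
qiime feature-table filter-seqs \
  --i-data $OUTPATH/rep-seqs.qza \
  --i-table $OUTPATH/table-filtered.qza \
  --o-filtered-data $OUTPATH/rep-seqs-filtered.qza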
Hi, thank you for your prompt reply, but I think the available RAM under my server cluster account is sufficient: when I run QIIME 2, there is still about 350 GB free. Could there be other possible reasons? Additionally, I added an option to qiime feature-classifier classify-sklearn: --p-classify--chunk-size 5000. Can this resolve my issue?
I'd strongly recommend against using many CPUs with the classifier. Essentially, for each CPU you use, you load another copy of the reference database into memory. With large reference databases like SILVA, expect to use anywhere from 32-124 GB of RAM per CPU. With a V4-only classifier, you can assume roughly 32-64 GB of RAM per CPU. With 350 GB of RAM, I'd use at most 4-6 CPUs. You could probably use more CPUs with --p-classify--chunk-size, but I think it'd be far simpler to just use 4-6 CPUs and see if that works.
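For concreteness, a sketch of such a run; the classifier and output file names are assumptions (substitute your own), and --p-reads-per-batch is classify-sklearn's batching parameter, which plays the same role as the chunk size discussed above:

qiime feature-classifier classify-sklearn \
  --i-classifier silva-138-99-nb-classifier.qza \
  --i-reads $OUTPATH/rep-seqs-filtered.qza \
  --p-n-jobs 4 \
  --p-reads-per-batch 5000 \
  --o-classification $OUTPATH/taxonomy.qza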