q2-phylogeny mafft job "killed"

I would like to use rep-seqs.qza, which was generated from 24 samples of about 100k reads each, for phylogenetic analysis, but I get an error. I would appreciate it if you could tell me how to deal with this problem.

#input command
qiime phylogeny align-to-tree-mafft-fasttree \
  --p-n-threads 30 \
  --i-sequences rep-seqs.qza \
  --o-alignment aligned-rep-seqs.qza \
  --o-masked-alignment masked-aligned-rep-seqs.qza \
  --o-tree unrooted-tree.qza \
  --o-rooted-tree rooted-tree.qza

#output log
File "/home/ysuzuki/miniconda3/envs/qiime2-2022.11/lib/python3.8/site-packages/qiime2/sdk/action.py", line 234, in bound_callable
Reallocating..done. *alloclen = 125739

Reallocating..done. *alloclen = 125739
Reallocating..done. *alloclen = 125739
Reallocating..done. *alloclen = 206192
/home/ysuzuki/miniconda3/envs/qiime2-2022.11/bin/mafft: line 2817: 1223186 Killed "$prefix/disttbfast" -q $npickup -E $cycledisttbfast -V "-"$gopdist -s $unalignlevel $legacygapopt $mergearg -W $tuplesize $termgapopt $outnum $addarg $add2ndhalfarg -C $numthreads-$numthreadstb $memopt $weightopt $treeinopt $treeoutopt $distoutopt $seqtype $model -g $gexp -f "-"$gop -Q $spfactor -h $aof $param_fft $algopt $treealg $scoreoutarg $anchoropt -x $maxanchorseparation $oneiterationopt < infile > pre 2>> "$progressfile"
Traceback (most recent call last):
File "/home/ysuzuki/miniconda3/envs/qiime2-2022.11/lib/python3.8/site-packages/q2cli/commands.py", line 352, in call
results = action(**arguments)
File "", line 2, in align_to_tree_mafft_fasttree
File "/home/ysuzuki/miniconda3/envs/qiime2-2022.11/lib/python3.8/site-packages/qiime2/sdk/action.py", line 234, in bound_callable
outputs = self.callable_executor(scope, callable_args,
File "/home/ysuzuki/miniconda3/envs/qiime2-2022.11/lib/python3.8/site-packages/qiime2/sdk/action.py", line 475, in callable_executor
outputs = self._callable(scope.ctx, **view_args)
File "/home/ysuzuki/miniconda3/envs/qiime2-2022.11/lib/python3.8/site-packages/q2_phylogeny/_align_to_tree_mafft_fasttree.py", line 19, in align_to_tree_mafft_fasttree
aligned_seq, = mafft(sequences=sequences, n_threads=n_threads,
File "", line 2, in mafft
File "/home/ysuzuki/miniconda3/envs/qiime2-2022.11/lib/python3.8/site-packages/qiime2/sdk/action.py", line 234, in bound_callable
outputs = self.callable_executor(scope, callable_args,
File "/home/ysuzuki/miniconda3/envs/qiime2-2022.11/lib/python3.8/site-packages/qiime2/sdk/action.py", line 381, in callable_executor
output_views = self._callable(**view_args)
File "/home/ysuzuki/miniconda3/envs/qiime2-2022.11/lib/python3.8/site-packages/q2_alignment/_mafft.py", line 128, in mafft
return _mafft(sequences_fp, None, n_threads, parttree, False)
File "/home/ysuzuki/miniconda3/envs/qiime2-2022.11/lib/python3.8/site-packages/q2_alignment/_mafft.py", line 100, in _mafft
run_command(cmd, result_fp)
File "/home/ysuzuki/miniconda3/envs/qiime2-2022.11/lib/python3.8/site-packages/q2_alignment/_mafft.py", line 26, in run_command
subprocess.run(cmd, stdout=output_f, check=True)
File "/home/ysuzuki/miniconda3/envs/qiime2-2022.11/lib/python3.8/subprocess.py", line 516, in run
raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['mafft', '--preservecase', '--inputorder', '--thread', '30', '/tmp/qiime2/ysuzuki/data/76cc47d7-5a79-4b0e-9f00-8f6dda5f3280/data/dna-sequences.fasta']' returned non-zero exit status 1.

Hello!
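It looks like your machine is running out of RAM and cannot complete the task. The "Killed" message from mafft in your log usually means the operating system terminated the process, most often because it ran out of memory.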

You can try setting the number of threads to 4 (or another number lower than 30) to reduce the RAM requirements.

Another option is to filter the feature table to remove features with low total counts (fewer than 10, for example), and then filter the rep-seqs file based on the filtered feature table. This should substantially reduce the RAM needed to complete the task.

Best,


Thanks for your comment.
I started the analysis with 4 threads.
For reference, I have 48 GB of memory installed; is that still not enough?
Also, how much memory would you recommend for running with 30 threads?

And is it OK to use the following command to filter the table?

qiime feature-table filter-samples \
  --i-table table.qza \
  --p-min-frequency 10 \
  --o-filtered-table sample-frequency-filtered-table.qza

It depends on the number of sequences in your rep-seqs file. In my experience, 48 GB is enough for most 16S datasets, but sometimes I need 64 or even 128 GB. After filtering, 32 GB is usually enough.
A high number of threads does not always reduce the overall run time, and I prefer not to use more than 6 or 8 threads, so I really do not know how much RAM is needed to run it with 30 threads.

In that case you need to filter features, not samples:
link

After that, use the filtered feature table to filter the rep-seqs file:
link
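The two filtering steps described above can be sketched as a small shell helper. This is a minimal sketch, not an official QIIME 2 script: the function name filter_low_abundance, its argument order, the default threshold of 10 and the output prefix are all hypothetical choices for illustration. It relies on qiime feature-table filter-features (where --p-min-frequency is the minimum total frequency a feature needs across all samples to be kept) followed by qiime feature-table filter-seqs (which keeps only the sequences whose IDs remain in the filtered table).

```shell
# Hypothetical helper: drop low-abundance features, then keep only their sequences.
# File names and the default threshold of 10 are assumptions; adjust to your data.
filter_low_abundance() {
  local table="$1" seqs="$2" min_freq="${3:-10}" prefix="${4:-filtered}"

  # Step 1: remove features whose total frequency across all samples is below min_freq.
  qiime feature-table filter-features \
    --i-table "$table" \
    --p-min-frequency "$min_freq" \
    --o-filtered-table "${prefix}-table.qza" || return 1

  # Step 2: keep only the representative sequences still present in the filtered table.
  qiime feature-table filter-seqs \
    --i-data "$seqs" \
    --i-table "${prefix}-table.qza" \
    --o-filtered-data "${prefix}-rep-seqs.qza"
}
```

For example, filter_low_abundance table.qza rep-seqs.qza 10 filtered would write filtered-table.qza and filtered-rep-seqs.qza, and filtered-rep-seqs.qza would then be the input to align-to-tree-mafft-fasttree. Note that the first argument must be the FeatureTable[Frequency] artifact, not rep-seqs.qza.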

I understand that about 8 threads is sufficient for the analysis.
I just tried it with 4 threads and am getting the same error.
I may need to do some filtering.

output log

Running external command line application. This may print messages to stdout and/or stderr.
The command being run is below. This command cannot be manually re-run as it will depend on temporary files that no longer exist.

Command: mafft --preservecase --inputorder --thread 4 /tmp/qiime2/ysuzuki/data/76cc47d7-5a79-4b0e-9f00-8f6dda5f3280/data/dna-sequences.fasta

inputfile = orig
78518 x 3683 - 27 d
nthread = 4
nthreadpair = 4
nthreadtb = 4
ppenalty_ex = 0
stacksize: 8192 kb->15335 kb
generating a scoring matrix for nucleotide (dist=200) ... done
Gap Penalty = -1.53, +0.00, +0.00
Making a distance matrix ..
done.
Constructing a UPGMA tree (efffree=0) ...
done.
Progressive alignment 1/2...
Reallocating..done. *alloclen = 125739
Reallocating..done. *alloclen = 206192
/home/ysuzuki/miniconda3/envs/qiime2-2022.11/bin/mafft: line 2817: 1412851 Killed "$prefix/disttbfast" -q $npickup -E $cycledisttbfast -V "-"$gopdist -s $unalignlevel $legacygapopt $mergearg -W $tuplesize $termgapopt $outnum $addarg $add2ndhalfarg -C $numthreads-$numthreadstb $memopt $weightopt $treeinopt $treeoutopt $distoutopt $seqtype $model -g $gexp -f "-"$gop -Q $spfactor -h $aof $param_fft $algopt $treealg $scoreoutarg $anchoropt -x $maxanchorseparation $oneiterationopt < infile > pre 2>> "$progressfile"
Traceback (most recent call last):
File "/home/ysuzuki/miniconda3/envs/qiime2-2022.11/lib/python3.8/site-packages/q2cli/commands.py", line 352, in call
results = action(**arguments)
File "", line 2, in align_to_tree_mafft_fasttree
File "/home/ysuzuki/miniconda3/envs/qiime2-2022.11/lib/python3.8/site-packages/qiime2/sdk/action.py", line 234, in bound_callable
outputs = self.callable_executor(scope, callable_args,
File "/home/ysuzuki/miniconda3/envs/qiime2-2022.11/lib/python3.8/site-packages/qiime2/sdk/action.py", line 475, in callable_executor
outputs = self._callable(scope.ctx, **view_args)
File "/home/ysuzuki/miniconda3/envs/qiime2-2022.11/lib/python3.8/site-packages/q2_phylogeny/_align_to_tree_mafft_fasttree.py", line 19, in align_to_tree_mafft_fasttree
aligned_seq, = mafft(sequences=sequences, n_threads=n_threads,
File "", line 2, in mafft
File "/home/ysuzuki/miniconda3/envs/qiime2-2022.11/lib/python3.8/site-packages/qiime2/sdk/action.py", line 234, in bound_callable
outputs = self.callable_executor(scope, callable_args,
File "/home/ysuzuki/miniconda3/envs/qiime2-2022.11/lib/python3.8/site-packages/qiime2/sdk/action.py", line 381, in callable_executor
output_views = self._callable(**view_args)
File "/home/ysuzuki/miniconda3/envs/qiime2-2022.11/lib/python3.8/site-packages/q2_alignment/_mafft.py", line 128, in mafft
return _mafft(sequences_fp, None, n_threads, parttree, False)
File "/home/ysuzuki/miniconda3/envs/qiime2-2022.11/lib/python3.8/site-packages/q2_alignment/_mafft.py", line 100, in _mafft
run_command(cmd, result_fp)
File "/home/ysuzuki/miniconda3/envs/qiime2-2022.11/lib/python3.8/site-packages/q2_alignment/_mafft.py", line 26, in run_command
subprocess.run(cmd, stdout=output_f, check=True)
File "/home/ysuzuki/miniconda3/envs/qiime2-2022.11/lib/python3.8/subprocess.py", line 516, in run
raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['mafft', '--preservecase', '--inputorder', '--thread', '4', '/tmp/qiime2/ysuzuki/data/76cc47d7-5a79-4b0e-9f00-8f6dda5f3280/data/dna-sequences.fasta']' returned non-zero exit status 1.

Hi @ysuzuki,

If you run qiime phylogeny align-to-tree-mafft-fasttree --help, you'll find a potentially useful flag: --p-parttree. Try running with this flag and let us know if it works.
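A call with that flag might look like the sketch below. The wrapper name build_tree_parttree and its default of 4 threads are hypothetical; the output names are copied from the original command. The parameter is spelled --p-parttree, matching the parttree argument visible in the traceback. It switches MAFFT's guide-tree step to the PartTree method, which is intended for aligning very large numbers of sequences.

```shell
# Hypothetical wrapper around the QIIME 2 pipeline; output names follow the original post.
build_tree_parttree() {
  local seqs="$1" threads="${2:-4}"
  # --p-parttree makes MAFFT build its guide tree with PartTree,
  # a method intended for very large numbers of sequences.
  qiime phylogeny align-to-tree-mafft-fasttree \
    --i-sequences "$seqs" \
    --p-n-threads "$threads" \
    --p-parttree \
    --o-alignment aligned-rep-seqs.qza \
    --o-masked-alignment masked-aligned-rep-seqs.qza \
    --o-tree unrooted-tree.qza \
    --o-rooted-tree rooted-tree.qza
}
```

For example: build_tree_parttree filtered-rep-seqs.qza 8. Combining this with the abundance filtering and a moderate thread count reduces the load on MAFFT from several directions at once.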

-Mike


I tried filtering but got the following error. Do I need to process my table.qza with filter-samples?

output log

(1/1) Invalid value for '--i-table': Expected an artifact of at least type FeatureTable[Frequency]. An artifact of type FeatureData[Sequence] was provided.

It looks like you made an error and provided the rep-seqs.qza file instead of the feature table. Take a closer look at the plugin descriptions. In the first step (first link), the feature table is filtered. In the second step (second link), the filtered feature table is provided along with the rep-seqs.qza artifact, and the latter is filtered based on the filtered feature table.

Best,

Thanks, timanix.
Your comment made me realize a basic mistake.
Before the analysis, I had merged multiple rep-seqs artifacts and multiple table artifacts. When I reviewed my script for merging rep-seqs, I found that its output file name was table.qza, so the merged table had been overwritten.
I will redo all the analysis.
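A mix-up like an overwritten table.qza can be caught before a long run by checking each artifact's semantic type. The helper below is a hypothetical sketch: the name require_type is made up, and it assumes that qiime tools peek prints a line of the form "Type: <semantic type>", as recent QIIME 2 releases do.

```shell
# Hypothetical helper: fail early if an artifact does not have the expected semantic type.
# Assumes `qiime tools peek` prints a line of the form "Type:  <semantic type>".
require_type() {
  local artifact="$1" expected="$2" actual
  # Pull the second field of the "Type:" line from the peek output.
  actual=$(qiime tools peek "$artifact" | awk '$1 == "Type:" {print $2}')
  if [ "$actual" != "$expected" ]; then
    echo "$artifact is $actual, expected $expected" >&2
    return 1
  fi
}
```

For example, require_type table.qza 'FeatureTable[Frequency]' && require_type rep-seqs.qza 'FeatureData[Sequence]' would have reported that table.qza actually contained sequences. Quote the type string so the shell does not treat the brackets as a glob pattern.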

I followed your advice and ran it with the "--p-parttree" option, but got the following error.
Is this also caused by insufficient memory?


Plugin error from phylogeny:

Command '['FastTreeMP', '-quote', '-nt', '/tmp/qiime2/ysuzuki/data/fcb10cdf-a46c-4a15-9912-3daea14f9472/data/aligned-dna-sequences.fasta']' died with <Signals.SIGKILL: 9>.

Debug info has been saved to /tmp/qiime2-q2cli-err-k28rci6q.log


Hi @ysuzuki,

Yes, this is quite often a memory issue. Can you provide us with more details on how you generated your rep-seqs.qza file? What gene are you sequencing? Did you perform any quality control, denoising, dereplication, etc.? How many representative sequences (i.e. features) does your data contain?
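One way to answer the last question is to export the artifact with qiime tools export --input-path rep-seqs.qza --output-path exported-rep-seqs, which writes a dna-sequences.fasta file, and then count the records. The awk helper below is a hypothetical sketch (the name fasta_stats is made up). It reports the number of sequences and the longest sequence length, and it handles sequences wrapped over several lines. The mafft log earlier in the thread prints "78518 x 3683", which appears to be the same two figures.

```shell
# Hypothetical helper: count sequences and report the longest one in a FASTA file,
# e.g. dna-sequences.fasta exported from rep-seqs.qza with `qiime tools export`.
fasta_stats() {
  awk '
    # A header line starts a new record; close out the length of the previous one.
    /^>/ { n++; if (len > max) max = len; len = 0; next }
    # Sequence lines (possibly wrapped) add to the current record length.
    { len += length($0) }
    END { if (len > max) max = len; printf "%d sequences, longest %d bp\n", n, max }
  ' "$1"
}
```

For example: fasta_stats exported-rep-seqs/dna-sequences.fasta. Very long maximum lengths for an amplicon dataset can be a hint that some unexpected sequences ended up in the merged rep-seqs file.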
