q2-phylogeny mafft job "killed"

ysuzuki · February 16, 2023, 6:35am

I would like to use rep-seqs.qza, which was built from 24 samples of about 100k reads each, for phylogeny analysis, but I get an error. I would appreciate it if you could tell me how to deal with this problem.

#input command
qiime phylogeny align-to-tree-mafft-fasttree \
  --p-n-threads 30 \
  --i-sequences rep-seqs.qza \
  --o-alignment aligned-rep-seqs.qza \
  --o-masked-alignment masked-aligned-rep-seqs.qza \
  --o-tree unrooted-tree.qza \
  --o-rooted-tree rooted-tree.qza

output log
File "/home/ysuzuki/miniconda3/envs/qiime2-2022.11/lib/python3.8/site-packages/qiime2/sdk/action.py", line 234, in bound_callable
Reallocating..done. *alloclen = 125739

Reallocating..done. *alloclen = 125739
Reallocating..done. *alloclen = 125739
Reallocating..done. *alloclen = 206192
/home/ysuzuki/miniconda3/envs/qiime2-2022.11/bin/mafft: line 2817: 1223186 Killed "$prefix/disttbfast" -q $npickup -E $cycledisttbfast -V "-"$gopdist -s $unalignlevel $legacygapopt $mergearg -W $tuplesize $termgapopt $outnum $addarg $add2ndhalfarg -C $numthreads-$numthreadstb $memopt $weightopt $treeinopt $treeoutopt $distoutopt $seqtype $model -g $gexp -f "-"$gop -Q $spfactor -h $aof $param_fft $algopt $treealg $scoreoutarg $anchoropt -x $maxanchorseparation $oneiterationopt < infile > pre 2>> "$progressfile"
Traceback (most recent call last):
File "/home/ysuzuki/miniconda3/envs/qiime2-2022.11/lib/python3.8/site-packages/q2cli/commands.py", line 352, in __call__
results = action(**arguments)
File "", line 2, in align_to_tree_mafft_fasttree
File "/home/ysuzuki/miniconda3/envs/qiime2-2022.11/lib/python3.8/site-packages/qiime2/sdk/action.py", line 234, in bound_callable
outputs = self.callable_executor(scope, callable_args,
File "/home/ysuzuki/miniconda3/envs/qiime2-2022.11/lib/python3.8/site-packages/qiime2/sdk/action.py", line 475, in callable_executor
outputs = self._callable(scope.ctx, **view_args)
File "/home/ysuzuki/miniconda3/envs/qiime2-2022.11/lib/python3.8/site-packages/q2_phylogeny/_align_to_tree_mafft_fasttree.py", line 19, in align_to_tree_mafft_fasttree
aligned_seq, = mafft(sequences=sequences, n_threads=n_threads,
File "", line 2, in mafft
File "/home/ysuzuki/miniconda3/envs/qiime2-2022.11/lib/python3.8/site-packages/qiime2/sdk/action.py", line 234, in bound_callable
outputs = self.callable_executor(scope, callable_args,
File "/home/ysuzuki/miniconda3/envs/qiime2-2022.11/lib/python3.8/site-packages/qiime2/sdk/action.py", line 381, in callable_executor
output_views = self._callable(**view_args)
File "/home/ysuzuki/miniconda3/envs/qiime2-2022.11/lib/python3.8/site-packages/q2_alignment/_mafft.py", line 128, in mafft
return _mafft(sequences_fp, None, n_threads, parttree, False)
File "/home/ysuzuki/miniconda3/envs/qiime2-2022.11/lib/python3.8/site-packages/q2_alignment/_mafft.py", line 100, in _mafft
run_command(cmd, result_fp)
File "/home/ysuzuki/miniconda3/envs/qiime2-2022.11/lib/python3.8/site-packages/q2_alignment/_mafft.py", line 26, in run_command
subprocess.run(cmd, stdout=output_f, check=True)
File "/home/ysuzuki/miniconda3/envs/qiime2-2022.11/lib/python3.8/subprocess.py", line 516, in run
raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['mafft', '--preservecase', '--inputorder', '--thread', '30', '/tmp/qiime2/ysuzuki/data/76cc47d7-5a79-4b0e-9f00-8f6dda5f3280/data/dna-sequences.fasta']' returned non-zero exit status 1.

timanix · February 16, 2023, 7:57am

Hello!
It looks like your machine is running out of RAM and cannot complete the task (the "Killed" message in the MAFFT log typically means the operating system terminated the process).

You can try setting the number of threads to 4 (or any number lower than 30) to reduce the RAM requirement.

Another option is to filter the feature table to remove features with low counts (fewer than 10, for example), and then filter the rep-seqs file based on the filtered feature table. This should significantly reduce the RAM needed for the task.

Best,

ysuzuki · February 16, 2023, 8:55am

Thanks for your comment.
I started the analysis with 4 threads.
In case you are wondering, I have 48 GB of memory installed. Is that still not enough?
Please also tell me the recommended amount of memory for running with 30 threads.

Also, is it ok to use the following command to filter the table?

qiime feature-table filter-samples \
  --i-table table.qza \
  --p-min-frequency 10 \
  --o-filtered-table sample-frequency-filtered-table.qza

timanix · February 16, 2023, 11:32am

It depends on the number of sequences in your rep-seqs file. In my experience, 48 GB is enough for most 16S datasets, but sometimes I need 64 or even 128 GB. After filtering, 32 GB is usually enough.
A higher number of threads does not always shorten the overall run time, and I prefer not to use more than 6 or 8 threads, so I really do not know how much RAM is needed to run it with 30 threads.

In that case you need to filter features, not samples:
link

After that, use the filtered feature table to filter the rep-seqs file:
link
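
For readers without the links, the two steps can be sketched roughly as follows. This is only a minimal sketch: the file names and the helper functions `build_filter_commands` and `run_filtering` are made up for illustration, and the threshold of 10 is the example value mentioned earlier in the thread. It prints the `qiime feature-table filter-features` and `qiime feature-table filter-seqs` commands and runs them only if asked to and if `qiime` is on the PATH.

```python
import shlex
import shutil
import subprocess

# Placeholder file names (assumptions); adjust them to your own artifacts.
TABLE = "table.qza"
REP_SEQS = "rep-seqs.qza"
FILTERED_TABLE = "feature-frequency-filtered-table.qza"
FILTERED_SEQS = "filtered-rep-seqs.qza"


def build_filter_commands(min_frequency=10):
    """Return the two QIIME 2 CLI commands as argument lists."""
    # Step 1: drop features whose total frequency is below the threshold.
    filter_features = [
        "qiime", "feature-table", "filter-features",
        "--i-table", TABLE,
        "--p-min-frequency", str(min_frequency),
        "--o-filtered-table", FILTERED_TABLE,
    ]
    # Step 2: keep only the sequences still present in the filtered table.
    filter_seqs = [
        "qiime", "feature-table", "filter-seqs",
        "--i-data", REP_SEQS,
        "--i-table", FILTERED_TABLE,
        "--o-filtered-data", FILTERED_SEQS,
    ]
    return [filter_features, filter_seqs]


def run_filtering(min_frequency=10, dry_run=True):
    """Print each command; run it only if dry_run is False and qiime exists."""
    for cmd in build_filter_commands(min_frequency):
        print(shlex.join(cmd))
        if not dry_run and shutil.which("qiime"):
            subprocess.run(cmd, check=True)


run_filtering()  # dry run: only prints the two commands
```

The filtered rep-seqs file, not the original one, would then be passed to align-to-tree-mafft-fasttree.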

ysuzuki · February 16, 2023, 11:44am

I understood that about 8 threads is sufficient for analysis.
I just tried it with 4 threads and am getting the same error.
I may need to do some filtering.

output log

Running external command line application. This may print messages to stdout and/or stderr.

The command being run is below. This command cannot be manually re-run as it will depend on temporary files that no longer exist.

Command: mafft --preservecase --inputorder --thread 4 /tmp/qiime2/ysuzuki/data/76cc47d7-5a79-4b0e-9f00-8f6dda5f3280/data/dna-sequences.fasta

inputfile = orig

78518 x 3683 - 27 d

nthread = 4

nthreadpair = 4

nthreadtb = 4

ppenalty_ex = 0

stacksize: 8192 kb->15335 kb

generating a scoring matrix for nucleotide (dist=200) ... done

Gap Penalty = -1.53, +0.00, +0.00

Making a distance matrix ..

done.

Constructing a UPGMA tree (efffree=0) ...

done.

Progressive alignment 1/2...

Reallocating..done. *alloclen = 125739
Reallocating..done. *alloclen = 206192
/home/ysuzuki/miniconda3/envs/qiime2-2022.11/bin/mafft: line 2817: 1412851 Killed "$prefix/disttbfast" -q $npickup -E $cycledisttbfast -V "-"$gopdist -s $unalignlevel $legacygapopt $mergearg -W $tuplesize $termgapopt $outnum $addarg $add2ndhalfarg -C $numthreads-$numthreadstb $memopt $weightopt $treeinopt $treeoutopt $distoutopt $seqtype $model -g $gexp -f "-"$gop -Q $spfactor -h $aof $param_fft $algopt $treealg $scoreoutarg $anchoropt -x $maxanchorseparation $oneiterationopt < infile > pre 2>> "$progressfile"
Traceback (most recent call last):
File "/home/ysuzuki/miniconda3/envs/qiime2-2022.11/lib/python3.8/site-packages/q2cli/commands.py", line 352, in __call__
results = action(**arguments)
File "", line 2, in align_to_tree_mafft_fasttree
File "/home/ysuzuki/miniconda3/envs/qiime2-2022.11/lib/python3.8/site-packages/qiime2/sdk/action.py", line 234, in bound_callable
outputs = self.callable_executor(scope, callable_args,
File "/home/ysuzuki/miniconda3/envs/qiime2-2022.11/lib/python3.8/site-packages/qiime2/sdk/action.py", line 475, in callable_executor
outputs = self._callable(scope.ctx, **view_args)
File "/home/ysuzuki/miniconda3/envs/qiime2-2022.11/lib/python3.8/site-packages/q2_phylogeny/_align_to_tree_mafft_fasttree.py", line 19, in align_to_tree_mafft_fasttree
aligned_seq, = mafft(sequences=sequences, n_threads=n_threads,
File "", line 2, in mafft
File "/home/ysuzuki/miniconda3/envs/qiime2-2022.11/lib/python3.8/site-packages/qiime2/sdk/action.py", line 234, in bound_callable
outputs = self.callable_executor(scope, callable_args,
File "/home/ysuzuki/miniconda3/envs/qiime2-2022.11/lib/python3.8/site-packages/qiime2/sdk/action.py", line 381, in callable_executor
output_views = self._callable(**view_args)
File "/home/ysuzuki/miniconda3/envs/qiime2-2022.11/lib/python3.8/site-packages/q2_alignment/_mafft.py", line 128, in mafft
return _mafft(sequences_fp, None, n_threads, parttree, False)
File "/home/ysuzuki/miniconda3/envs/qiime2-2022.11/lib/python3.8/site-packages/q2_alignment/_mafft.py", line 100, in _mafft
run_command(cmd, result_fp)
File "/home/ysuzuki/miniconda3/envs/qiime2-2022.11/lib/python3.8/site-packages/q2_alignment/_mafft.py", line 26, in run_command
subprocess.run(cmd, stdout=output_f, check=True)
File "/home/ysuzuki/miniconda3/envs/qiime2-2022.11/lib/python3.8/subprocess.py", line 516, in run
raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['mafft', '--preservecase', '--inputorder', '--thread', '4', '/tmp/qiime2/ysuzuki/data/76cc47d7-5a79-4b0e-9f00-8f6dda5f3280/data/dna-sequences.fasta']' returned non-zero exit status 1.
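
A rough back-of-the-envelope check of the numbers in this log may help explain why lowering the thread count did not help. It rests on two assumptions about MAFFT's log format that the thread does not confirm: that "78518 x 3683 - 27 d" means 78,518 sequences with lengths between 27 and 3,683, and that "alloclen" is the per-sequence alignment buffer length in characters. The 4-byte pairwise table is purely hypothetical, and real memory use depends on MAFFT internals.

```python
N_SEQS = 78518      # assumed: number of sequences, from "78518 x 3683 - 27 d"
ALLOCLEN = 206192   # assumed: per-sequence buffer length, from the last "alloclen" line


def gigabytes(n_bytes):
    """Convert a byte count to decimal gigabytes."""
    return n_bytes / 1e9


# One character per column for every sequence in a single alignment buffer.
alignment_bytes = N_SEQS * ALLOCLEN

# Number of unordered sequence pairs, and the size of a hypothetical
# table that stores one 4-byte value per pair.
n_pairs = N_SEQS * (N_SEQS - 1) // 2
pairwise_bytes = n_pairs * 4

print(f"one alignment buffer: {gigabytes(alignment_bytes):.1f} GB")
print(f"pairwise table:       {gigabytes(pairwise_bytes):.1f} GB")
```

Under these assumptions a single copy of the alignment is already about 16 GB, and none of these terms depends on the thread count. The buffer length (206,192) is also far larger than the longest input sequence (3,683), which would suggest a very gappy alignment.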

SoilRotifer · February 16, 2023, 5:20pm

Hi @ysuzuki,

If you run qiime phylogeny align-to-tree-mafft-fasttree --help, you'll find that there is a potentially useful flag: --p-parttree. Try running with this flag and let us know if it works.
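
As a sketch of what that rerun might look like, the snippet below assembles the original command with the flag added (it is spelled `--p-parttree` in the help text). The helper `build_align_command` is hypothetical, the thread count of 4 is only the value tried earlier in the thread, and the file names are copied from the first post.

```python
import shlex


def build_align_command(n_threads=4, parttree=True):
    """Assemble the align-to-tree-mafft-fasttree command as an argument list."""
    cmd = [
        "qiime", "phylogeny", "align-to-tree-mafft-fasttree",
        "--p-n-threads", str(n_threads),
        "--i-sequences", "rep-seqs.qza",
        "--o-alignment", "aligned-rep-seqs.qza",
        "--o-masked-alignment", "masked-aligned-rep-seqs.qza",
        "--o-tree", "unrooted-tree.qza",
        "--o-rooted-tree", "rooted-tree.qza",
    ]
    if parttree:
        # Boolean flag: it takes no value on the command line.
        cmd.append("--p-parttree")
    return cmd


# Print a copy-and-paste version of the command.
print(shlex.join(build_align_command()))
```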

-Mike

ysuzuki · February 17, 2023, 2:50am

I tried filtering, following the tutorial below, but got the following error. Do I need to process my table.qza with filter-samples?

github.com

qiime2/docs/blob/master/source/tutorials/filtering.rst

output log

(1/1) Invalid value for '--i-table': Expected an artifact of at least type

FeatureTable[Frequency]. An artifact of type FeatureData[Sequence] was

provided.

timanix · February 17, 2023, 8:08am

It looks like you made an error and provided the rep-seqs.qza file instead of the feature table. Take a closer look at the plugin descriptions. In the first step (first link), the feature table is filtered; in the second step (second link), the filtered feature table is provided along with the rep-seqs.qza artifact. The latter is then filtered based on the filtered feature table.

Best,

ysuzuki · February 18, 2023, 12:26am

Thanks timanix.
Your comment made me realize a basic mistake.
Before the analysis I had merged multiple rep-seqs files and multiple table files separately, but when I reviewed the script for merging rep-seqs, I found that its output file name was table.qza, so the merged table had been overwritten.
I will redo all the analysis.

ysuzuki · February 20, 2023, 8:34am

I followed your advice and ran it with the "--p-parttree" option, but got the following error.
Is this also caused by insufficient memory?

Plugin error from phylogeny:

Command '['FastTreeMP', '-quote', '-nt', '/tmp/qiime2/ysuzuki/data/fcb10cdf-a46c-4a15-9912-3daea14f9472/data/aligned-dna-sequences.fasta']' died with <Signals.SIGKILL: 9>.

Debug info has been saved to /tmp/qiime2-q2cli-err-k28rci6q.log

SoilRotifer · February 21, 2023, 2:35pm

Hi @ysuzuki,

Yes, that error is quite often related to a memory issue. Can you provide us with more details on how you generated your rep-seqs.qza file? Which gene are you sequencing? Did you perform any quality control, denoising, dereplication, etc.? How many representative sequences (i.e. features) does your data contain?
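
One way to answer the last question is to count the FASTA records inside the artifact. The sketch below relies on a .qza file being a zip archive that stores the sequences at `<uuid>/data/dna-sequences.fasta`, which matches the path shown in the tracebacks above. The function name `count_rep_seqs` is made up for this example.

```python
import os
import zipfile


def count_rep_seqs(qza_path):
    """Count FASTA records in a FeatureData[Sequence] artifact (.qza)."""
    with zipfile.ZipFile(qza_path) as archive:
        # The sequences are expected at <uuid>/data/dna-sequences.fasta.
        fasta_members = [name for name in archive.namelist()
                         if name.endswith("data/dna-sequences.fasta")]
        if not fasta_members:
            raise ValueError(f"no dna-sequences.fasta found in {qza_path}")
        with archive.open(fasta_members[0]) as handle:
            # Every FASTA record starts with a ">" header line.
            return sum(1 for line in handle if line.startswith(b">"))


# Only runs when the artifact is present in the current directory.
if os.path.exists("rep-seqs.qza"):
    print(count_rep_seqs("rep-seqs.qza"))
```

Running it before and after filtering gives a quick check of how much the input to MAFFT has shrunk.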

system · March 24, 2023, 8:36pm

This topic was automatically closed 31 days after the last reply. New replies are no longer allowed.