Hi,
I am trying to build a phylogenetic tree from Nanopore reads. I am currently using the qiime phylogeny align-to-tree-mafft-fasttree
command. However, since I did not perform any clustering, every sequence is aligned against all the others, and this is probably what makes my job run out of memory.
In fact, when using only 1000 reads, I am able to obtain the desired phylogenetic tree.
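To illustrate why I suspect the all-against-all comparison, here is a rough sketch of how the number of sequence pairs grows with the number of reads. It assumes memory use scales with the number of pairs, which I have not verified for MAFFT's default strategy; `pair_count` is a hypothetical helper and 100000 is a made-up dataset size, not my real read count.

```shell
# Number of unordered sequence pairs among n sequences: n * (n - 1) / 2.
# pair_count is a hypothetical helper used only for this illustration.
pair_count() {
  n="$1"
  echo $(( n * (n - 1) / 2 ))
}

pair_count 1000      # the subset that works: 499500 pairs
pair_count 100000    # assumed full-dataset size, for illustration only
```

Going from 1000 to 100000 reads multiplies the number of pairs by roughly 10000, so a job that fits in memory on the subset could easily fail on the whole dataset.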
This is the command I am using:
qiime phylogeny align-to-tree-mafft-fasttree \
  --i-sequences rep-seqs.qza \
  --o-alignment aligned-rep-seqs.qza \
  --o-masked-alignment masked-aligned-rep-seqs.qza \
  --o-tree unrooted-tree.qza \
  --o-rooted-tree rooted-tree.qza \
  --p-n-threads $threads
and this is the error I get when using the whole dataset:
/home/simone/miniconda3/envs/MetONTIIME_env/bin/mafft: line 2440: 21008 Killed "$prefix/disttbfast" -q $npickup -E $cycledisttbfast -V "-"$gopdist -s $unalignlevel $legacygapopt $mergearg -W $tuplesize $termgapopt $outnum $addarg $add2ndhalfarg -C $numthreadstb $memopt $weightopt $treeinopt $treeoutopt $distoutopt $seqtype $model -f "-"$gop -Q $spfactor -h $aof $param_fft $algopt $treealg $scoreoutarg < infile > pre 2>> "$progressfile"
Running external command line application. This may print messages to stdout and/or stderr.
The command being run is below. This command cannot be manually re-run as it will depend on temporary files that no longer exist.
Command: mafft --preservecase --inputorder --thread 24 /tmp/qiime2-archive-akns4pmq/9f1e0e1e-83a5-4908-8020-fe8f8ee29c69/data/dna-sequences.fasta
Traceback (most recent call last):
File "/home/simone/miniconda3/envs/MetONTIIME_env/lib/python3.6/site-packages/q2cli/commands.py", line 327, in call
results = action(**arguments)
File "</home/simone/miniconda3/envs/MetONTIIME_env/lib/python3.6/site-packages/decorator.py:decorator-gen-226>", line 2, in align_to_tree_mafft_fasttree
File "/home/simone/miniconda3/envs/MetONTIIME_env/lib/python3.6/site-packages/qiime2/sdk/action.py", line 240, in bound_callable
output_types, provenance)
File "/home/simone/miniconda3/envs/MetONTIIME_env/lib/python3.6/site-packages/qiime2/sdk/action.py", line 477, in callable_executor
outputs = self._callable(scope.ctx, **view_args)
File "/home/simone/miniconda3/envs/MetONTIIME_env/lib/python3.6/site-packages/q2_phylogeny/_align_to_tree_mafft_fasttree.py", line 18, in align_to_tree_mafft_fasttree
aligned_seq, = mafft(sequences=sequences, n_threads=n_threads)
File "</home/simone/miniconda3/envs/MetONTIIME_env/lib/python3.6/site-packages/decorator.py:decorator-gen-481>", line 2, in mafft
File "/home/simone/miniconda3/envs/MetONTIIME_env/lib/python3.6/site-packages/qiime2/sdk/action.py", line 240, in bound_callable
output_types, provenance)
File "/home/simone/miniconda3/envs/MetONTIIME_env/lib/python3.6/site-packages/qiime2/sdk/action.py", line 383, in callable_executor
output_views = self._callable(**view_args)
File "/home/simone/miniconda3/envs/MetONTIIME_env/lib/python3.6/site-packages/q2_alignment/_mafft.py", line 85, in mafft
run_command(cmd, aligned_fp)
File "/home/simone/miniconda3/envs/MetONTIIME_env/lib/python3.6/site-packages/q2_alignment/_mafft.py", line 27, in run_command
subprocess.run(cmd, stdout=output_f, check=True)
File "/home/simone/miniconda3/envs/MetONTIIME_env/lib/python3.6/subprocess.py", line 418, in run
output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['mafft', '--preservecase', '--inputorder', '--thread', '24', '/tmp/qiime2-archive-akns4pmq/9f1e0e1e-83a5-4908-8020-fe8f8ee29c69/data/dna-sequences.fasta']' returned non-zero exit status 1.
Which approach would you suggest? I was considering these possibilities:
- Subsample some reads from rep-seqs.qza and build the tree from their alignment
- Subsample a smaller number of reads from each sample (the same number for each sample), re-import them, and use them to build the tree
- Use an alternative method (if any exists) that reduces memory usage
- Retrieve from the database the reference sequences to which at least x reads have been assigned, and build the tree from those 'error-free' representative sequences
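For the first option, this is roughly what I have in mind. It is only a minimal sketch: `subsample_fasta` is a hypothetical helper, it assumes GNU `shuf` and `awk` are available and that FASTA headers contain no tab characters, and the file names in the commented usage lines (other than rep-seqs.qza and dna-sequences.fasta) are placeholders. The `qiime tools export` / `qiime tools import` flags may differ between QIIME 2 releases.

```shell
# Hypothetical helper: randomly keep up to N records from a FASTA file.
# Each record is first flattened to "header<TAB>sequence" on one line,
# then N lines are drawn at random and unfolded back into two-line FASTA.
# Assumes GNU shuf and that headers contain no tab characters.
subsample_fasta() {
  in_fasta="$1"
  n_records="$2"
  awk '
    /^>/ { if (NR > 1) printf "\n"; printf "%s\t", $0; next }
         { printf "%s", $0 }
    END  { if (NR > 0) printf "\n" }
  ' "$in_fasta" \
    | shuf -n "$n_records" \
    | tr '\t' '\n'
}

# Usage sketch (not run here; output names are placeholders):
#   qiime tools export --input-path rep-seqs.qza --output-path exported
#   subsample_fasta exported/dna-sequences.fasta 1000 > subsampled.fasta
#   qiime tools import --type 'FeatureData[Sequence]' \
#     --input-path subsampled.fasta --output-path rep-seqs-subsampled.qza
```

The draw is not seeded, so repeated runs would give different subsets; a fixed random source would be needed for reproducibility.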
Thanks in advance,
Simone