Align-to-tree-mafft-fasttree error

I tried "qiime phylogeny align-to-tree-mafft-fasttree --i-sequences rep-seqs-nonchimeric.qza --o-alignment aligned-rep-seqs.qza --o-masked-alignment masked-aligned-rep-seqs.qza --o-tree unrooted-tree.qza --o-rooted-tree rooted-tree.qza
"
and got
"Plugin error from phylogeny:

Command '['mafft', '--preservecase', '--inputorder', '--thread', '1', '/tmp/qiime2-archive-b3ktqfzt/243e8d70-800b-4fb1-8eaa-c32d62c3829a/data/dna-sequences.fasta']' returned non-zero exit status 1
"
after de novo chimera filtering with VSEARCH.
What can I do now?

Hello Nikonov,

Could you post the full text of your error message or any log files you have? The full error text or log file should include more clues about the 'non-zero exit status' that will help us solve this problem.

Colin


Hi.
"Plugin error from phylogeny:

Command '['mafft', '--preservecase', '--inputorder', '--thread', '1', '/tmp/qiime2-archive-b3ktqfzt/243e8d70-800b-4fb1-8eaa-c32d62c3829a/data/dna-sequences.fasta']' returned non-zero exit status 1

Debug info has been saved to /tmp/qiime2-q2cli-err-qlk0t_u4.log"
log: https://drive.google.com/file/d/1nxFuG1pSTzNYW2tz97Me1PX4P4He_TST/view?usp=sharing


Hi @5cr34m,

What kind of environment are you running in?

It looks like the process was killed:

/home/qiime2/miniconda/envs/qiime2-2018.8/bin/mafft: line 2440:  4048 Killed                  "$prefix/disttbfast" -q $npickup -E $cycledisttbfast -V "-"$gopdist -s $unalignlevel $legacygapopt $mergearg -W $tuplesize $termgapopt $outnum $addarg $add2ndhalfarg -C $numthreadstb $memopt $weightopt $treeinopt $treeoutopt $distoutopt $seqtype $model -f "-"$gop -Q $spfactor -h $aof $param_fft $algopt $treealg $scoreoutarg < infile > pre 2>> "$progressfile"

That might happen if you run out of memory or walltime (if on a cluster).

What kind of data are you working with? If a reference tree exists for your amplicon, you could always use q2-fragment-insertion for very large datasets (which are common if you aren't using a denoising algorithm like DADA2 or Deblur).
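If you go that route, a SEPP insertion run might look like the sketch below. This is a hedged example, not a prescription: the output filenames are my own placeholders, and the exact parameters vary by QIIME 2 release, so check `qiime fragment-insertion sepp --help` in your environment first.

```shell
# Insert representative sequences into a reference phylogeny with SEPP.
# Output filenames here are illustrative placeholders.
qiime fragment-insertion sepp \
  --i-representative-sequences rep-seqs-nonchimeric.qza \
  --o-tree insertion-tree.qza \
  --o-placements insertion-placements.qza
```

Note that newer QIIME 2 releases also require an explicit reference database (via `--i-reference-database`), whereas older releases bundled a default Greengenes reference.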

Hi.
Virtual machine (linux-64) on a Windows Server 2012 host with 32 GB of memory.
table-nonchimeric.qza - 49 MB
rep-seqs-nonchimeric - 79 MB
Is that too big to build a distance matrix?

Hi @5cr34m,

No, that's an excellent machine. I don't think I have a good explanation for why this happened.

If you were to re-run the command, does it still fail? Has the virtual machine been allocated enough of the hardware? (I assume the answer is yes, since you're running a Windows Server install, but it doesn't hurt to check.)

I tried rerunning with different memory limits, from 16 GB to 28 GB, but got the same error.

Yikes, alright. Would you be able to send me a DM with your rep-seqs-nonchimeric.qza? I'll see what I can do to reproduce and figure out what's going on (I won't be able to start that until next week).

Thanks.
rep-seqs-nonchimeric: https://drive.google.com/open?id=12sBon7QzLCr0gNZDSQfBvR1droRj_451

forgot table: https://drive.google.com/open?id=1Ani651-RjNCOMTKFJBnVqM69zfUtYrhy

You have around 1 million input sequences. The sequences look fine, but this is going to have a high memory demand.

Since you are doing OTU clustering instead of denoising, I strongly recommend removing low-frequency sequences from your table and sequences before proceeding. This will dramatically reduce memory requirements. But memory is not the main reason I recommend it: it is best practice, since low-frequency OTUs are usually noisy, erroneous sequences that can negatively impact results.
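A minimal sketch of that filtering with q2-feature-table might look like this (the `--p-min-frequency 10` threshold and the output filenames are assumptions; tune the threshold to your data):

```shell
# Drop features observed fewer than 10 times across all samples
# (the threshold of 10 is an illustrative assumption).
qiime feature-table filter-features \
  --i-table table-nonchimeric.qza \
  --p-min-frequency 10 \
  --o-filtered-table table-filtered.qza

# Keep only the representative sequences still present in the filtered table.
qiime feature-table filter-seqs \
  --i-data rep-seqs-nonchimeric.qza \
  --i-table table-filtered.qza \
  --o-filtered-data rep-seqs-filtered.qza
```

You would then run align-to-tree-mafft-fasttree on rep-seqs-filtered.qza instead of the unfiltered sequences.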

Do you want to give that a try and see if it also eliminates this memory issue?


Thanks.
How can I do that?