Align-to-tree-mafft-fasttree error

I tried "qiime phylogeny align-to-tree-mafft-fasttree --i-sequences rep-seqs-nonchimeric.qza --o-alignment aligned-rep-seqs.qza --o-masked-alignment masked-aligned-rep-seqs.qza --o-tree unrooted-tree.qza --o-rooted-tree rooted-tree.qza
"
and got
"Plugin error from phylogeny:

Command '['mafft', '--preservecase', '--inputorder', '--thread', '1', '/tmp/qiime2-archive-b3ktqfzt/243e8d70-800b-4fb1-8eaa-c32d62c3829a/data/dna-sequences.fasta']' returned non-zero exit status 1
"
after de novo chimera filtering with VSEARCH.
What can I do now?

Hello Nikonov,

Could you post the full text of your error message or any log files you have? The full error text or log file should include more clues about the 'non-zero exit status' that will help us solve this problem.

Colin


Hi.
"Plugin error from phylogeny:

Command '['mafft', '--preservecase', '--inputorder', '--thread', '1', '/tmp/qiime2-archive-b3ktqfzt/243e8d70-800b-4fb1-8eaa-c32d62c3829a/data/dna-sequences.fasta']' returned non-zero exit status 1

Debug info has been saved to /tmp/qiime2-q2cli-err-qlk0t_u4.log"
log: https://drive.google.com/file/d/1nxFuG1pSTzNYW2tz97Me1PX4P4He_TST/view?usp=sharing


Hi @5cr34m,

What kind of environment are you running in?

It looks like the process was killed:

/home/qiime2/miniconda/envs/qiime2-2018.8/bin/mafft: line 2440:  4048 Killed                  "$prefix/disttbfast" -q $npickup -E $cycledisttbfast -V "-"$gopdist -s $unalignlevel $legacygapopt $mergearg -W $tuplesize $termgapopt $outnum $addarg $add2ndhalfarg -C $numthreadstb $memopt $weightopt $treeinopt $treeoutopt $distoutopt $seqtype $model -f "-"$gop -Q $spfactor -h $aof $param_fft $algopt $treealg $scoreoutarg < infile > pre 2>> "$progressfile"

That might happen if you run out of memory or walltime (if on a cluster).

What kind of data are you working with? If a reference tree exists for your amplicon, you could always use q2-fragment-insertion for very large datasets (which are common if you aren't using a denoising algorithm like DADA2 or Deblur).
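If you go that route, a SEPP insertion run might look like the sketch below. This is a hedged example, not a prescription: the output filenames are my own placeholders, and the exact parameters vary by QIIME 2 release, so check `qiime fragment-insertion sepp --help` in your environment first.

```shell
# Insert representative sequences into a reference phylogeny with SEPP.
# Output filenames here are illustrative placeholders.
qiime fragment-insertion sepp \
  --i-representative-sequences rep-seqs-nonchimeric.qza \
  --o-tree insertion-tree.qza \
  --o-placements insertion-placements.qza
```

Note that newer QIIME 2 releases also require an explicit reference database (via `--i-reference-database`), whereas older releases bundled a default Greengenes reference.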

Hi.
Virtual machine (linux-64) on a Windows Server 2012 host with 32 GB of memory.
table-nonchimeric.qza - 49 MB
rep-seqs-nonchimeric - 79 MB
Is that too big to build a distance matrix?

Hi @5cr34m,

No, that's an excellent machine. I don't think I have a good explanation for why this happened.

If you were to re-run the command, does it still fail? Has the virtual machine been allocated enough of the hardware? (I assume the answer is yes, since you're running a Windows Server install, but it doesn't hurt to check.)

I tried rerunning with different memory limits, from 16 GB to 28 GB, but got the same error.

Yikes, alright. Would you be able to send me a DM with your rep-seqs-nonchimeric.qza? I'll see what I can do to reproduce and figure out what's going on (I won't be able to start that until next week).

Thanks.
rep-seqs-nonchimeric: https://drive.google.com/open?id=12sBon7QzLCr0gNZDSQfBvR1droRj_451

forgot table: https://drive.google.com/open?id=1Ani651-RjNCOMTKFJBnVqM69zfUtYrhy

You have around 1 million input sequences. The sequences look fine, but this is going to have a high memory demand.

Since you are doing OTU clustering instead of denoising, I strongly recommend removing low-frequency sequences from your table and sequences before proceeding. This will dramatically reduce memory requirements. But memory is not the main reason I recommend it: it is best practice, since low-frequency OTUs are usually noisy, erroneous sequences that can negatively impact results.
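A minimal sketch of that filtering with q2-feature-table might look like this (the `--p-min-frequency 10` threshold and the output filenames are assumptions; tune the threshold to your data):

```shell
# Drop features observed fewer than 10 times across all samples
# (the threshold of 10 is an illustrative assumption).
qiime feature-table filter-features \
  --i-table table-nonchimeric.qza \
  --p-min-frequency 10 \
  --o-filtered-table table-filtered.qza

# Keep only the representative sequences still present in the filtered table.
qiime feature-table filter-seqs \
  --i-data rep-seqs-nonchimeric.qza \
  --i-table table-filtered.qza \
  --o-filtered-data rep-seqs-filtered.qza
```

You would then run align-to-tree-mafft-fasttree on rep-seqs-filtered.qza instead of the unfiltered sequences.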

Do you want to give that a try and see if it also eliminates this memory issue?


Thanks.
How can I do that?