Phylogeny error

EugeniaSH · June 23, 2024, 10:43pm

Hello,
I am using QIIME 2 (2022.11). After more than 24 hours of running the phylogeny command:

qiime phylogeny align-to-tree-mafft-fasttree \
  --i-sequences rep-seqs_juntas3.qza \
  --o-alignment aligned-rep-seq_juntas3f.qza \
  --o-masked-alignment masked-aligned-rep-seqs_juntas3f.qza \
  --o-tree unrooted-tree_juntas3f.qza \
  --o-rooted-tree rooted-tree_juntas3f.qza

I got the following error:
Plugin error from phylogeny:

Each sequence's length must match the number of positions in the MSA: 31766 != 31767

Debug info has been saved to /var/folders/6l/z8w2yh2j2jv0yk0kmtygpl340000gn/T/qiime2-q2cli-err-fyavxt2n.log

My rep-seqs-juntas3.qza is 4.9 MB.

My computer has a 4 GHz Intel Core i7 processor and 32 GB of 1600 MHz DDR3 memory.

In case it is a memory issue, can I split the command into two separate steps?

All help will be greatly appreciated.
I also do not know how to view the debug information.

Thanks

colinvwood · June 24, 2024, 4:32pm

Hello @EugeniaSH,

Could you please re-run this command with the --verbose flag added, so we get a little more information about what went wrong?
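
For reference, a minimal sketch of what that re-run could look like, with the output also saved to a file so it can be pasted here afterwards. The helper names `run_phylogeny_verbose` and `show_debug_log` and the file name `phylogeny-verbose.log` are made up for illustration; the QIIME 2 flags are the ones from the original command plus `--verbose`. The debug log named in the error message should be a plain text file, so `tail` or `less` is likely enough to read it, provided the temporary file still exists.

```shell
# Hypothetical helper: re-run the pipeline with --verbose and keep a copy
# of everything it prints (stdout and stderr) in a log file.
run_phylogeny_verbose() {
  log_file="${1:-phylogeny-verbose.log}"
  qiime phylogeny align-to-tree-mafft-fasttree \
    --i-sequences rep-seqs_juntas3.qza \
    --o-alignment aligned-rep-seq_juntas3f.qza \
    --o-masked-alignment masked-aligned-rep-seqs_juntas3f.qza \
    --o-tree unrooted-tree_juntas3f.qza \
    --o-rooted-tree rooted-tree_juntas3f.qza \
    --verbose 2>&1 | tee "$log_file"
}

# Hypothetical helper: print the last lines of a debug log.
# $1 is the log path, $2 is the number of lines (default 40).
show_debug_log() {
  tail -n "${2:-40}" "$1"
}
```

Calling `run_phylogeny_verbose` starts the run, and `show_debug_log` followed by the path printed after "Debug info has been saved to" shows the end of that log.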

EugeniaSH · June 24, 2024, 5:02pm

Thank you,
Here is what I've got so far (18 hrs in):

Running external command line application. This may print messages to stdout and/or stderr.
The command being run is below. This command cannot be manually re-run as it will depend on temporary files that no longer exist.

Command: mafft --preservecase --inputorder --thread 1 /var/folders/6l/z8w2yh2j2jv0yk0kmtygpl340000gn/T/qiime2/eugeniash/data/797e5b43-2f87-4072-aed5-e5da456e8468/data/dna-sequences.fasta

inputfile = orig
62614 x 428 - 220 d
nthread = 1
nthreadpair = 1
nthreadtb = 1
ppenalty_ex = 0
stacksize: 8192 kb->12229 kb
generating a scoring matrix for nucleotide (dist=200) ... done
Gap Penalty = -1.53, +0.00, +0.00

Making a distance matrix ..
62601 / 62614 (thread 0)
done.

Constructing a UPGMA tree (efffree=0) ...
62610 / 62614
done.

Progressive alignment 1/2...
STEP 701 / 62613 f
Reallocating..done. *alloclen = 1921
STEP 22801 / 62613 f
Reallocating..done. *alloclen = 2963
STEP 25101 / 62613 f
Reallocating..done. *alloclen = 4014
STEP 26101 / 62613 f
Reallocating..done. *alloclen = 5470
STEP 28201 / 62613 f
Reallocating..done. *alloclen = 6867
STEP 33301 / 62613 f
Reallocating..done. *alloclen = 9381
STEP 35701 / 62613 f
Reallocating..done. *alloclen = 10388
STEP 36601 / 62613 f
Reallocating..done. *alloclen = 11921
STEP 37801 / 62613 f
Reallocating..done. *alloclen = 13679
STEP 40101 / 62613 f
Reallocating..done. *alloclen = 20882
STEP 49601 / 62613 f
Reallocating..done. *alloclen = 23989
STEP 50301 / 62613 f
Reallocating..done. *alloclen = 28147
STEP 53901 / 62613 f
Reallocating..done. *alloclen = 30844
STEP 56501 / 62613 f
Reallocating..done. *alloclen = 32206
STEP 62201 / 62613 f
Reallocating..done. *alloclen = 37807

len1=33428, len2=592, Switching to the memsave mode
STEP 62601 / 62613 mDP 00001 / 00001 1
done.

Making a distance matrix from msa..
62600 / 62614 (thread 0)
done.

Constructing a UPGMA tree (efffree=1) ...
62610 / 62614
done.

Progressive alignment 2/2...
STEP 16501 / 62613 f
Reallocating..done. *alloclen = 1858
STEP 30401 / 62613 h
Reallocating..done. *alloclen = 2869
STEP 51801 / 62613 f
Reallocating..done. *alloclen = 4015
STEP 54201 / 62613 f
Reallocating..done. *alloclen = 5025
STEP 57201 / 62613 f
Reallocating..done. *alloclen = 7479
STEP 57701 / 62613 f
Reallocating..done. *alloclen = 8990
STEP 58101 / 62613 f
Reallocating..done. *alloclen = 10731
STEP 58501 / 62613 h
Reallocating..done. *alloclen = 11996
STEP 58801 / 62613 f
Reallocating..done. *alloclen = 14419
STEP 58901 / 62613 f
Reallocating..done. *alloclen = 21247
STEP 59001 / 62613 f
Reallocating..done. *alloclen = 28728
STEP 60301 / 62613 f
Reallocating..done. *alloclen = 34132
STEP 62401 / 62613 f
len1=30289, len2=440, Switching to the memsave mode
STEP 62501 / 62613 m hDP 00001 / 00001
Reallocating..done. *alloclen = 35357
STEP 62601 / 62613 mDP 00001 / 00001
done.

disttbfast (nuc) Version 7.508
alg=M, model=DNA200 (2), 1.53 (4.59), -0.00 (-0.00), noshift, amax=0.0
0 thread(s)

Strategy:
FFT-NS-2 (Fast but rough)
Progressive method (guide trees were built 2 times.)

If unsure which option to use, try 'mafft --auto input > output'.
For more information, see 'mafft --help', 'mafft --man' and the mafft page.

The default gap scoring scheme has been changed in version 7.110 (2013 Oct).
It tends to insert more gaps into gap-rich regions than previous versions.
To disable this change, add the --leavegappyregion option.

Running external command line application. This may print messages to stdout and/or stderr.
The command being run is below. This command cannot be manually re-run as it will depend on temporary files that no longer exist.

Command: FastTree -quote -nt /var/folders/6l/z8w2yh2j2jv0yk0kmtygpl340000gn/T/qiime2/eugeniash/data/0936c8b6-c6ae-40dc-a119-0ad20095acf0/data/aligned-dna-sequences.fasta

FastTree Version 2.1.11 Double precision (No SSE3)
Alignment: /var/folders/6l/z8w2yh2j2jv0yk0kmtygpl340000gn/T/qiime2/eugeniash/data/0936c8b6-c6ae-40dc-a119-0ad20095acf0/data/aligned-dna-sequences.fasta
Nucleotide distances: Jukes-Cantor Joins: balanced Support: SH-like 1000
Search: Normal +NNI +SPR (2 rounds range 10) +ML-NNI opt-each=1
TopHits: 1.00*sqrtN close=default refresh=0.80
ML Model: Jukes-Cantor, CAT approximation with 20 rate categories
Initial topology in 19224.71 seconds 61224 01 of 61227 seqs 1200)
Refining topology: 64 rounds ME-NNIs, 2 rounds ME-SPRs, 32 rounds ML-NNIs
Total branch-length 3354.442 after 22880.96 sec 61225 splits 2 changes (max delta 0.000)
ML-NNI round 1: LogLk = -5426594.009 NNIs 15530 max delta 39.71 Time 26317.61s (max delta 39.706)
Switched to using 20 rate categories (CAT approximation)20 of 20
Rate categories were divided by 0.818 so that average rate = 1.0
CAT-based log-likelihoods may not be comparable across runs
Use -gamma for approximate but comparable Gamma(20) log-likelihoods
ML-NNI round 2: LogLk = -4861202.655 NNIs 9675 max delta 79.30 Time 29814.23s (max delta 79.297)
ML-NNI round 3: LogLk = -4853176.337 NNIs 5091 max delta 76.04 Time 31487.91s (max delta 76.038)
ML-NNI round 4: LogLk = -4849304.945 NNIs 2610 max delta 72.57 Time 32657.60s (max delta 72.574)
ML-NNI round 5: LogLk = -4847390.216 NNIs 1490 max delta 41.76 Time 33391.13s (max delta 41.762)
33783.40 seconds: ML NNI round 6 of 32, 11401 of 61225 splits, 702 changes (max delta 33.212)

EugeniaSH · June 24, 2024, 9:25pm

I repeated the command with --verbose, and it completed the task.
I am not sure why.

The last output was:
Turning off heuristics for final round of ML NNIs (converged)
ML-NNI round 27: LogLk = -4838107.834 NNIs 1625 max delta 11.97 Time 38167.68 (final)
Optimize all lengths: LogLk = -4837879.773 Time 38755.44
Total time: 49311.52 seconds Unique: 61227/62614 Bad splits: 251/61224 Worst delta-LogLk 14.601
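
For anyone who hits the same "Each sequence's length must match the number of positions in the MSA" message, one possible check is to export the alignment and count how many records have each length. This is only a sketch: the function name `msa_length_summary` is hypothetical, and it assumes the alignment has first been exported to a FASTA file (for example with `qiime tools export`, which should write `aligned-dna-sequences.fasta` into the chosen output directory).

```shell
# Hypothetical helper: for an aligned FASTA file ($1), print one line per
# distinct record length in the form "<length> <number of records>".
# A consistent alignment should produce exactly one line.
msa_length_summary() {
  awk '
    /^>/ {                          # header line starts a new record
      if (seen) counts[len]++       # store the length of the previous record
      seen = 1; len = 0; next
    }
    {
      gsub(/[ \t\r]/, "")           # ignore whitespace and carriage returns
      len += length($0)             # records may span several lines
    }
    END {
      if (seen) counts[len]++       # store the last record
      for (l in counts) print l, counts[l]
    }
  ' "$1" | sort -n
}
```

If the summary shows more than one length, the records with the odd length are the ones to inspect.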

cherman2 · June 25, 2024, 2:31pm

Hi @EugeniaSH,
Just confirming: is this error no longer occurring?
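
On the earlier question about splitting the command: as far as I know, `align-to-tree-mafft-fasttree` is a pipeline that chains four actions (MAFFT alignment, masking, FastTree, and midpoint rooting), and each can be run on its own. A minimal sketch, reusing the file names from the original command; the function name `split_phylogeny` is made up for illustration, and it is worth checking each action's `--help` for the options available in your release.

```shell
# Sketch: the same work as align-to-tree-mafft-fasttree, one action at a
# time. Each step runs only if the previous one succeeded.
# File names follow the original command; the function name is hypothetical.
split_phylogeny() {
  qiime alignment mafft \
    --i-sequences rep-seqs_juntas3.qza \
    --o-alignment aligned-rep-seq_juntas3f.qza &&
  qiime alignment mask \
    --i-alignment aligned-rep-seq_juntas3f.qza \
    --o-masked-alignment masked-aligned-rep-seqs_juntas3f.qza &&
  qiime phylogeny fasttree \
    --i-alignment masked-aligned-rep-seqs_juntas3f.qza \
    --o-tree unrooted-tree_juntas3f.qza &&
  qiime phylogeny midpoint-root \
    --i-tree unrooted-tree_juntas3f.qza \
    --o-rooted-tree rooted-tree_juntas3f.qza
}
```

Splitting the steps probably does not reduce memory use by itself, but each intermediate artifact is saved, so a failure in a later step does not require repeating the alignment.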

system · July 26, 2024, 8:32pm

This topic was automatically closed 31 days after the last reply. New replies are no longer allowed.