phylogeny align-to-tree-mafft-fasttree

Hello,
I am running the QIIME 2 2024.10 amplicon distribution as a Singularity container under Nextflow on a Slurm HPC cluster.

The steps before phylogeny align-to-tree-mafft-fasttree (e.g. import, dada2, classification with scikit, ...) work fine once the environment variables and bindings are set, but MAFFT is driving me close to crazy.

In nextflow.config, the following entries seem relevant:

env {
    SINGULARITY_TMPDIR = '/tmp'
    MPLCONFIGDIR = '/tmp'
    NUMBA_CACHE_DIR = '/tmp'
}

conda {
    useMamba = false
    enabled = true
    cacheDir = "/data/users/${username}/Nextflow/CondaCache"
    createTimeout = "2 h"
}

singularity {
    enabled = true
    autoMounts = false
    cacheDir = "/data/users/${username}/Nextflow/SingularityCache"
    runOptions = "-B /data/users/${username}:/data/users/${username} -B \${TMP_LOCAL}:/tmp "
    envWhitelist = ['SINGULARITY_TMPDIR','MPLCONFIGDIR','NUMBA_CACHE_DIR']
}

process {
    withLabel: 'qiime2' {
        beforeScript = """
            module add singularity
        """
        container = 'docker://quay.io/qiime2/amplicon:2024.10'
    }
}

I don't know whether I am missing important env variables or bindings, but I feel that for the most part I should be ready to sail.

The verbose output of phylogeny align-to-tree-mafft-fasttree:

ERROR ~ Error executing process > 'analyzeDiversity:makeTree (Creating phylogenetic tree)'

Caused by:
  Process `analyzeDiversity:makeTree (Creating phylogenetic tree)` terminated with an error exit status (1)


Command executed:

  qiime phylogeny align-to-tree-mafft-fasttree --i-sequences repseqs.qza --p-n-threads 18 --o-alignment alignemt.qza --o-masked-alignment masked_alignment.qza --o-tree unrooted_tree.qza --o-rooted-tree rooted_tree.qza --verbose

Command exit status:
  1

Command output:
  Running external command line application. This may print messages to stdout and/or stderr.
  The command being run is below. This command cannot be manually re-run as it will depend on temporary files that no longer exist.
  
  Command: mafft --preservecase --inputorder --thread 18 /tmp/qiime2/#username#/data/88de1b5e-2ff3-4639-914c-7a951a6bb348/data/dna-sequences.fasta

Command error:
  /opt/conda/envs/qiime2-amplicon-2024.10/bin/mafft: line 1147: /_codonpos: Read-only file system
  /opt/conda/envs/qiime2-amplicon-2024.10/bin/mafft: line 1148: /_codonscore: Read-only file system
  /opt/conda/envs/qiime2-amplicon-2024.10/bin/mafft: line 1149: /_seedtablefile: Read-only file system
  /opt/conda/envs/qiime2-amplicon-2024.10/bin/mafft: line 1150: /_lara.params: Read-only file system
  /opt/conda/envs/qiime2-amplicon-2024.10/bin/mafft: line 1151: /pdblist: Read-only file system
  /opt/conda/envs/qiime2-amplicon-2024.10/bin/mafft: line 1152: /ownlist: Read-only file system
  /opt/conda/envs/qiime2-amplicon-2024.10/bin/mafft: line 1153: /_externalanchors: Read-only file system
  grep: /infile: No such file or directory
  /opt/conda/envs/qiime2-amplicon-2024.10/bin/mafft: line 1828: [: -gt: unary operator expected
  grep: /infile: No such file or directory
  /opt/conda/envs/qiime2-amplicon-2024.10/bin/mafft: line 1837: [: -eq: unary operator expected
  /opt/conda/envs/qiime2-amplicon-2024.10/bin/mafft: line 1844: [: too many arguments
  mv: cannot stat 'infile': No such file or directory
  inputfile = orig
  Cannot open orig
  Traceback (most recent call last):
    File "/opt/conda/envs/qiime2-amplicon-2024.10/lib/python3.10/site-packages/q2cli/commands.py", line 530, in __call__
      results = self._execute_action(
    File "/opt/conda/envs/qiime2-amplicon-2024.10/lib/python3.10/site-packages/q2cli/commands.py", line 608, in _execute_action
      results = action(**arguments)
    File "<decorator-gen-609>", line 2, in align_to_tree_mafft_fasttree
    File "/opt/conda/envs/qiime2-amplicon-2024.10/lib/python3.10/site-packages/qiime2/sdk/action.py", line 299, in bound_callable
      outputs = self._callable_executor_(
    File "/opt/conda/envs/qiime2-amplicon-2024.10/lib/python3.10/site-packages/qiime2/sdk/action.py", line 651, in _callable_executor_
      outputs = self._callable(ctx, **view_args)
    File "/opt/conda/envs/qiime2-amplicon-2024.10/lib/python3.10/site-packages/q2_phylogeny/_align_to_tree_mafft_fasttree.py", line 19, in align_to_tree_mafft_fasttree
      aligned_seq, = mafft(sequences=sequences, n_threads=n_threads,
    File "<decorator-gen-911>", line 2, in mafft
    File "/opt/conda/envs/qiime2-amplicon-2024.10/lib/python3.10/site-packages/qiime2/sdk/context.py", line 125, in deferred_action
      return action_obj._bind(
    File "<decorator-gen-915>", line 2, in mafft
    File "/opt/conda/envs/qiime2-amplicon-2024.10/lib/python3.10/site-packages/qiime2/sdk/action.py", line 299, in bound_callable
      outputs = self._callable_executor_(
    File "/opt/conda/envs/qiime2-amplicon-2024.10/lib/python3.10/site-packages/qiime2/sdk/action.py", line 570, in _callable_executor_
      output_views = self._callable(**view_args)
    File "/opt/conda/envs/qiime2-amplicon-2024.10/lib/python3.10/site-packages/q2_alignment/_mafft.py", line 133, in mafft
      return _mafft(sequences_fp, None, n_threads, parttree, False, False)
    File "/opt/conda/envs/qiime2-amplicon-2024.10/lib/python3.10/site-packages/q2_alignment/_mafft.py", line 105, in _mafft
      run_command(cmd, result_fp)
    File "/opt/conda/envs/qiime2-amplicon-2024.10/lib/python3.10/site-packages/q2_alignment/_mafft.py", line 26, in run_command
      subprocess.run(cmd, stdout=output_f, check=True)
    File "/opt/conda/envs/qiime2-amplicon-2024.10/lib/python3.10/subprocess.py", line 526, in run
      raise CalledProcessError(retcode, process.args,
  subprocess.CalledProcessError: Command '['mafft', '--preservecase', '--inputorder', '--thread', '18', '/tmp/qiime2/#username#/data/88de1b5e-2ff3-4639-914c-7a951a6bb348/data/dna-sequences.fasta']' returned non-zero exit status 1.
  
  Plugin error from phylogeny:
  
    Command '['mafft', '--preservecase', '--inputorder', '--thread', '18', '/tmp/qiime2/#username#/data/88de1b5e-2ff3-4639-914c-7a951a6bb348/data/dna-sequences.fasta']' returned non-zero exit status 1.
  

The first error, /opt/conda/envs/qiime2-amplicon-2024.10/bin/mafft: line 1147: /_codonpos: Read-only file system, suggests to me that MAFFT needs write access inside the container?

-B ${TMP_LOCAL}:/tmp should give me plenty of space bound to /tmp, since TMP_LOCAL exists precisely to overcome the limitations of /tmp.

I have already looked for alternatives to MAFFT, but I do not want to skip the phylogenetic diversity metrics completely.

I am hoping for clues on where to adjust things.

A little update....

Since I saw in another forum post that an updated version of the container was available, because of a problem not entirely unlike mine (not MAFFT, but I/O and permissions),
I deleted my Singularity image and restarted my pipeline.

I then ran into the error [30] with permissions on /home/qiime2, and processes that had worked before no longer did. After further reading in the forum, I added more Singularity bindings while keeping autoMounts false.

In case it helps someone stumbling across this,
the updated bindings in the Nextflow config:

runOptions = "-B /data/users/${username}:/data/users/${username} -B \${TMP_LOCAL}:/tmp -B /data/users/${username}/Nextflow/qiime2home:/home/qiime2"

/data/users/${username} is my data storage.
TMP_LOCAL is a shared /tmp on my cluster and gives qiime2 a writable /tmp.

/home/qiime2 is bound additionally to provide a writable folder. With that, every process except MAFFT works again; MAFFT still fails with the same error as before:

...
  /opt/conda/envs/qiime2-amplicon-2024.10/bin/mafft: line 1147: /_codonpos: Read-only file system
  /opt/conda/envs/qiime2-amplicon-2024.10/bin/mafft: line 1148: /_codonscore: Read-only file system
  /opt/conda/envs/qiime2-amplicon-2024.10/bin/mafft: line 1149: /_seedtablefile: Read-only file system
...

I then tried binding a readable folder to /opt, which predictably made the qiime command disappear, because the conda env lives in /opt inside the container and I cannot override it.

So as far as I can tell, something MAFFT-specific is happening that involves writing inside the container?
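
One way to read the root-level paths in the log (/_codonpos, /infile), offered as an assumption and not checked against the MAFFT source: if a wrapper script builds its scratch paths as prefix/name and the prefix comes out empty, every file lands directly under /, which is read-only inside the image. The helper name scratch_path below is made up for illustration.

```shell
# Hypothetical helper (not taken from the mafft script): joins a
# temp-directory prefix and a file name as "prefix/name".
scratch_path() {
    printf '%s/%s\n' "$1" "$2"
}

# With a usable prefix, the file lands inside the scratch directory.
scratch_path "/tmp/mafft.abc123" "_codonpos"

# With an empty prefix, the same join yields a path at the filesystem root,
# matching the shape of the paths in the error messages above.
scratch_path "" "_codonpos"
```

If that reading holds, the fix is less about making the container writable and more about making sure the temp-directory variable resolves to something usable.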

The FASTA itself seems fine:
path in tmp, artifact name and number of sequences (~3000) seem ok.

I will try to find out whether a writable sandbox container helps, though I was trying to avoid that.

I found this section on MAFFT-specific temp dirs, under "MPI-parallelization of MAFFT":

The location of temporary directory can be specified by the MAFFT_TMPDIR environmental variable. If this is not set, $HOME/maffttmp/ is automatically created and used as temporary directory. The temporary directory must be shared by all hosts. If your system has high-speed shared filesystem, such as Lustre, then use it as temporary directory.

I've not worked with Singularity in quite a while. I wish I could be of more help here.
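
As a rough sketch of how that could be applied: pick a writable directory first, then export it as MAFFT_TMPDIR. The helper first_writable_dir is hypothetical, and the candidate order (TMP_LOCAL from this thread, then TMPDIR, then /tmp) is an assumption, not a MAFFT recommendation.

```shell
# Hypothetical helper: print the first argument that is an existing,
# writable directory; return non-zero if none qualifies.
first_writable_dir() {
    for dir in "$@"; do
        if [ -n "$dir" ] && [ -d "$dir" ] && [ -w "$dir" ]; then
            printf '%s\n' "$dir"
            return 0
        fi
    done
    return 1
}

# Candidate order is an assumption; adjust to the cluster at hand.
if scratch="$(first_writable_dir "${TMP_LOCAL:-}" "${TMPDIR:-}" /tmp)"; then
    export MAFFT_TMPDIR="$scratch"
    echo "MAFFT_TMPDIR=$MAFFT_TMPDIR"
else
    echo "no writable scratch directory found" >&2
fi
```

Inside a container, the chosen path has to be writable from the container's point of view, so it should be one of the bound directories.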

...
[3b/ab3826] process > analyzeDiversity:makeTree (Creating phylogenetic tree)                                     [100%] 1 of 1 ✔
Completed at: 05-Dec-2024 18:25:10
Duration    : 1m 42s
CPU hours   : 2.4 (90.5% cached)
Succeeded   : 6
Cached      : 11

I would say that you could not have been of more help.
Thank you!

Our HPC's glibc is too old, so I had to switch from conda to learning Singularity the hard way...

Key nextflow.config settings (they translate to Singularity without Nextflow) in case someone else stumbles across this with similar issues:

username = System.getenv('USER')

env {
    MPLCONFIGDIR = '/tmp'
    NUMBA_CACHE_DIR = '/tmp'
    MAFFT_TMPDIR = '/tmp'
}

singularity {
    enabled = true
    autoMounts = false
    cacheDir = "/data/users/${username}/Nextflow/SingularityCache"
    runOptions = "-B /data/users/${username}:/data/users/${username} -B \${TMP_LOCAL}:/tmp -B /data/users/${username}/Nextflow/qiime2home:/home/qiime2"
    envWhitelist = ['MPLCONFIGDIR', 'NUMBA_CACHE_DIR', 'MAFFT_TMPDIR']
}
// /data/users/${username}/ is cluster-wide accessible storage, bound as is to keep paths identical.
// ${TMP_LOCAL} is cluster-wide accessible fast storage, bound to /tmp.
// Writable storage is bound to /home/qiime2 for QIIME 2 caching.
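
For anyone running without Nextflow, the same bindings and variables might translate to a plain singularity exec call roughly as follows. This is an untested sketch: the function name print_qiime_cmd is made up, the fallbacks for USER and TMP_LOCAL are assumptions, and --env needs a reasonably recent Singularity or Apptainer release (older ones take variables through the SINGULARITYENV_ prefix instead). It only prints the command, so it can be inspected before anything is launched.

```shell
# Dry run: assemble the singularity command and print it instead of running it.
print_qiime_cmd() {
    # Storage path from the thread; "username" is only a placeholder fallback.
    user_dir="/data/users/${USER:-username}"
    # Falls back to /tmp when TMP_LOCAL is unset (assumption).
    tmp_local="${TMP_LOCAL:-/tmp}"

    set -- singularity exec \
        -B "${user_dir}:${user_dir}" \
        -B "${tmp_local}:/tmp" \
        -B "${user_dir}/Nextflow/qiime2home:/home/qiime2" \
        --env MPLCONFIGDIR=/tmp \
        --env NUMBA_CACHE_DIR=/tmp \
        --env MAFFT_TMPDIR=/tmp \
        docker://quay.io/qiime2/amplicon:2024.10 \
        qiime phylogeny align-to-tree-mafft-fasttree \
        --i-sequences repseqs.qza \
        --p-n-threads 18 \
        --o-alignment alignment.qza \
        --o-masked-alignment masked_alignment.qza \
        --o-tree unrooted_tree.qza \
        --o-rooted-tree rooted_tree.qza

    echo "$*"
}

print_qiime_cmd
```

The bind sources (including the qiime2home directory) have to exist on the host before the container starts.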