...temporary files that no longer exist.

ilasadar · November 28, 2024, 8:01pm

Hi,
I am using:

QIIME 2 Plugin 'rescript' version 2024.5.1 (from package 'rescript' version 2024.5.1).
Cluster (--mem=200G, -c 12)

Partial RESCRIPt code:
qiime rescript extract-seq-segments \
  --i-input-sequences ${X} \
  --i-reference-segment-sequences /scratch/username/rescript/filtered/${Y}.qza \
  --p-perc-identity 0.8 \
  --p-min-seq-len ${min_seq_len} \
  --p-threads 12 \
  --o-extracted-sequence-segments ${extracted_output} \
  --o-unmatched-sequences ${unmatched_output} \
  --verbose

One of the slurm*.out files gives me:
SLURM_ARRAY_TASK_ID: 188
X: /scratch/username/rescript/filtered/genbank_8_b/genbank_8_b.fasta.split/genbank_8_b.part_035.qza
Y: ref_ch
min_seq_len: 29
vsearch v2.22.1_linux_x86_64, 376.4GB RAM, 32 cores
https://github.com/torognes/vsearch

Reading file /scratch/username/metacurator/temp/qiime2/username/data/0f6bab7d-ff06-41ad-9942-81d12653a6a3/data/dna-sequences.fasta 100%
171767 nt in 1174 seqs, min 109, max 211, avg 146
Masking 100%
Counting k-mers 100%
Creating k-mer index 100%
Searching 100%
Matching unique query sequences: 95 of 769 (12.35%)
Running external command line application. This may print messages to stdout and/or stderr.
The command being run is below. This command cannot be manually re-run as it will depend on temporary files that no longer exist.

Command: vsearch --usearch_global /scratch/username/metacurator/temp/qiime2/username/data/d90eac1e-3c72-4a54-a72f-a606981bc119/data/dna-sequences.fasta --db /scratch/username/metacurator/temp/qiime2/username/data/0f6bab7d-ff06-41ad-9942-81d12653a6a3/data/dna-sequences.fasta --id 0.8 --strand plus --threads 12 --qmask none --qsegout /scratch/username/metacurator/temp/q2-DNAFASTAFormat-4gbjoyb9 --notmatched /scratch/username/metacurator/temp/q2-DNAFASTAFormat-_lpy2p7u --minseqlength 29

Saved FeatureData[Sequence] to: /scratch/username/rescript/filtered/genbank_8_b/genbank_8_b.fasta.split/processed/genbank_8_b.part_035.qza_ref_ch_extracted.qza
Saved FeatureData[Sequence] to: /scratch/username/rescript/filtered/genbank_8_b/genbank_8_b.fasta.split/processed/genbank_8_b.part_035.qza_ref_ch_unmatched.qza

I am running multiple files against the same reference sequences (4 trnL reference sequences: CD, CH, GH, GD). It seems the reference (seed) sequence is always saved under the same temporary folder (--db /scratch/username/metacurator/temp/qiime2/username/data/0f6bab7d-ff06-41ad-9942-81d12653a6a3/data/dna-sequences.fasta). In contrast, the query path (--usearch_global /scratch/username/metacurator/temp/qiime2/username/data/d90eac1e-3c72-4a54-a72f-a606981bc119/data/dna-sequences.fasta) appears only once in the SLURM output. The following command confirms this, as the same temporary folder ID appears in multiple SLURM output files:

grep -l "0f6bab7d-ff06-41ad-9942-81d12653a6a3" ./slurm*
./slurm-4266924_158.out
./slurm-4266924_161.out
./slurm-4266924_164.out
./slurm-4266924_167.out
./slurm-4266924_170.out
./slurm-4266924_173.out
./slurm-4266924_176.out
./slurm-4266924_179.out
./slurm-4266924_182.out
./slurm-4266924_185.out
./slurm-4266924_188.out
./slurm-4266924_191.out
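
The same check can be widened to every temporary folder ID at once instead of grepping for a single one. The following is only a rough sketch: the function name count_uuid_logs is made up for illustration, and it assumes the IDs in the logs are standard lowercase UUIDs and that the logs sit in the current directory.

```shell
# Hypothetical helper (not part of QIIME 2 or RESCRIPt): for every UUID that
# appears in the given log files, print how many distinct files mention it.
count_uuid_logs() {
  # -o prints only the matched UUID, -H prefixes each match with its file name.
  grep -oE -H '[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}' "$@" \
    | sort -u \
    | awk -F: '{ n[$NF]++ } END { for (u in n) print n[u], u }' \
    | sort -rn
  # sort -u keeps one line per (file, UUID) pair; awk then counts files per
  # UUID, using the last colon-separated field so colons in paths are harmless.
}
```

Calling count_uuid_logs ./slurm-*.out prints one line per ID, with the most widely shared ID first. An ID shared by many logs would be the reference; IDs seen in a single log would be the per-task query files.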

This behavior may lead to the message: "The command being run is below. This command cannot be manually re-run as it will depend on temporary files that no longer exist."

However, the output file is still created, likely using a partially converted seed FASTA file from another run.

My question is: Is my understanding correct? If so, this implies that I cannot run multiple files using the same seed sequence simultaneously. Do you have any suggestions to circumvent this issue?

(I have hundreds of files to run.)

lizgehret · December 2, 2024, 7:15pm

Hi @ilasadar,

Welcome to the QIIME 2 forum!

That message refers to the intermediate files that are created during the subprocess call happening behind the scenes. It won't affect your ability to re-run your actual input command (as that depends on files that you already have on your HPC).

Hope this helps! Let us know if you run into any issues, but you should be completely fine to run this in parallel on your HPC.
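
For extra reassurance after a parallel run, one option is to confirm that every task wrote both of its outputs. This is a minimal sketch, not anything built into QIIME 2: the helper names expected_outputs and report_missing are hypothetical, and the naming pattern (a processed/ folder next to the input, with _<reference>_extracted.qza and _<reference>_unmatched.qza suffixes) is simply taken from the "Saved FeatureData[Sequence] to:" lines in the log above.

```shell
# Hypothetical helper: print the two output paths expected for one
# input artifact ($1) and one reference name ($2), following the log's naming.
expected_outputs() {
  _base="$(dirname "$1")/processed/$(basename "$1")_$2"
  printf '%s\n' "${_base}_extracted.qza" "${_base}_unmatched.qza"
}

# Hypothetical helper: list any expected output that is absent or empty.
# Returns 0 when both files exist and are non-empty, 1 otherwise.
report_missing() {
  _status=0
  while IFS= read -r _f; do
    if [ ! -s "$_f" ]; then
      printf 'missing: %s\n' "$_f"
      _status=1
    fi
  done <<EOF
$(expected_outputs "$1" "$2")
EOF
  return "$_status"
}
```

For example, report_missing "${X}" "${Y}" could be called at the end of each array task. A non-empty file is only a coarse signal; qiime tools validate can be run on an individual .qza for a more thorough check.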

Cheers

ilasadar · December 3, 2024, 5:54pm

Hi Liz,
Thank you for your response. That was very helpful and put my mind at ease.
