hmmsearch (HMMER) failure during q2-itsxpress run

Hello Shay and Valentyn,

I'm new here, and I have the same problem.
I downloaded single-end sequences from NCBI and prepared the .qza file without any difficulty.
However, I ran into the same problem when running ITSxpress.

Here is the command I used:

qiime itsxpress trim-single \
  --i-per-sample-sequences single-end-demux.qza \
  --p-region ITS1 \
  --p-taxa F \
  --p-threads 10 \
  --o-trimmed demux-single-end-trimmed-itsxpress.qza

I used --verbose to check the details, and here is the result:

Reading file /tmp/itsxpress_tdz_i0dj/seq.fq.gz 100%
0 nt in 0 seqs
Masking 100%
Sorting by abundance 100%
Counting k-mers 100%
Clustering 100%
Sorting clusters 100%
Writing clusters 100%
Clusters: 0
Singletons: 0

ERROR:root:Could not perform ITS identification with hmmserach. The error was:
 
Error: Sequence file /tmp/itsxpress_tdz_i0dj/rep.fa is empty or misformatted

Traceback (most recent call last):
  File "/home/chih/miniconda3/envs/qiime2-2022.11/lib/python3.8/site-packages/itsxpress/main.py", line 564, in _search
    p4.check_returncode()
  File "/home/chih/miniconda3/envs/qiime2-2022.11/lib/python3.8/subprocess.py", line 448, in check_returncode
    raise CalledProcessError(self.returncode, self.args, self.stdout,
subprocess.CalledProcessError: Command '['hmmsearch', '--domtblout', '/tmp/itsxpress_tdz_i0dj/domtbl.txt', '-T', '10', '--cpu', '10', '--tformat', 'fasta', '--F1', '1e-6', '--F2', '1e-6', '--F3', '1e-6', '/home/chih/miniconda3/envs/qiime2-2022.11/lib/python3.8/site-packages/itsxpress/ITSx_db/HMMs/F.hmm', '/tmp/itsxpress_tdz_i0dj/rep.fa']' returned non-zero exit status 1.
Traceback (most recent call last):
  File "/home/chih/miniconda3/envs/qiime2-2022.11/lib/python3.8/site-packages/q2cli/commands.py", line 352, in __call__
    results = action(**arguments)
  File "<decorator-gen-476>", line 2, in trim_single
  File "/home/chih/miniconda3/envs/qiime2-2022.11/lib/python3.8/site-packages/qiime2/sdk/action.py", line 234, in bound_callable
    outputs = self._callable_executor_(scope, callable_args,
  File "/home/chih/miniconda3/envs/qiime2-2022.11/lib/python3.8/site-packages/qiime2/sdk/action.py", line 381, in _callable_executor_
    output_views = self._callable(**view_args)
  File "/home/chih/miniconda3/envs/qiime2-2022.11/lib/python3.8/site-packages/q2_itsxpress/_itsxpress.py", line 116, in trim_single
    results = main(per_sample_sequences=per_sample_sequences,
  File "/home/chih/miniconda3/envs/qiime2-2022.11/lib/python3.8/site-packages/q2_itsxpress/_itsxpress.py", line 212, in main
    sobj._search(hmmfile=hmmfile, threads=threads)
  File "/home/chih/miniconda3/envs/qiime2-2022.11/lib/python3.8/site-packages/itsxpress/main.py", line 567, in _search
    raise e
  File "/home/chih/miniconda3/envs/qiime2-2022.11/lib/python3.8/site-packages/itsxpress/main.py", line 564, in _search
    p4.check_returncode()
  File "/home/chih/miniconda3/envs/qiime2-2022.11/lib/python3.8/subprocess.py", line 448, in check_returncode
    raise CalledProcessError(self.returncode, self.args, self.stdout,
subprocess.CalledProcessError: Command '['hmmsearch', '--domtblout', '/tmp/itsxpress_tdz_i0dj/domtbl.txt', '-T', '10', '--cpu', '10', '--tformat', 'fasta', '--F1', '1e-6', '--F2', '1e-6', '--F3', '1e-6', '/home/chih/miniconda3/envs/qiime2-2022.11/lib/python3.8/site-packages/itsxpress/ITSx_db/HMMs/F.hmm', '/tmp/itsxpress_tdz_i0dj/rep.fa']' returned non-zero exit status 1.

Plugin error from itsxpress:

  Command '['hmmsearch', '--domtblout', '/tmp/itsxpress_tdz_i0dj/domtbl.txt', '-T', '10', '--cpu', '10', '--tformat', 'fasta', '--F1', '1e-6', '--F2', '1e-6', '--F3', '1e-6', '/home/chih/miniconda3/envs/qiime2-2022.11/lib/python3.8/site-packages/itsxpress/ITSx_db/HMMs/F.hmm', '/tmp/itsxpress_tdz_i0dj/rep.fa']' returned non-zero exit status 1.

See above for debug info.

I checked the /tmp/itsxpress_tdz_i0dj/rep.fa file. It's empty.
However, I used qiime tools validate single-end-demux.qza to check my .qza file, and the .qza file is valid.
Result single-end-demux.qza appears to be valid at level=max.

I am trying to figure out what causes this error, but I don't know where to begin.
Could anyone help me, please? Thank you very much!!

Best,
Betty

Hello Betty,

I moved your reply to a separate thread, as it's a different issue.
In this case, the failure occurs during the HMMER run.
Try running the command from the plugin error message directly on the command line, without the brackets, commas, and quotes; this should give more information on why hmmsearch fails.
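
As a rough sketch of that clean-up (an illustration only, not part of the plugin), the bracketed list can be piped through sed to strip those characters. The helper name list_to_command is made up for this example, and it assumes that no argument contains spaces, commas, brackets, or quotes of its own:

```shell
#!/usr/bin/env bash
# Turn the Python-list form printed in a "Plugin error" message into a plain
# command line by deleting square brackets, commas, and single quotes.
# Assumption: no argument itself contains any of those characters or spaces.
list_to_command() {
    sed "s/[][,']//g"
}

# Example with a shortened argument list (not the full command from the error):
printf '%s\n' "['hmmsearch', '--cpu', '10', 'F.hmm', 'rep.fa']" | list_to_command
# prints: hmmsearch --cpu 10 F.hmm rep.fa
```

The printed line can then be pasted into the terminal and run as is.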

Cheers,
V

Hello Valentyn,

Thanks for your reply. :slight_smile:

I tried the command from plugin error:

hmmsearch --domtblout /tmp/itsxpress_tdz_i0dj/domtbl.txt -T 10 --cpu 10 --tformat fasta --F1 1e-6 --F2 1e-6 --F3 1e-6 /home/chih/miniconda3/envs/qiime2-2022.11/lib/python3.8/site-packages/itsxpress/ITSx_db/HMMs/F.hmm /tmp/itsxpress_tdz_i0dj/rep.fa

Here is the outcome:
Error: Sequence file /tmp/itsxpress_tdz_i0dj/rep.fa is empty or misformatted

What should I do now?

Best,
Betty

The error says the sequence file is empty or misformatted.
That's suspicious. Could you please check the rep.fa file,
or attach it here if you're not sure?
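
A quick way to check, as a sketch: count the FASTA records in rep.fa and the reads in the seq.fq.gz file, which the log above reports as "0 nt in 0 seqs". The two function names are hypothetical helpers, and the temporary directory may already have been cleaned up after the run:

```shell
#!/usr/bin/env bash
# Count FASTA records (lines starting with ">") in a plain-text file.
count_fasta_records() {
    # grep -c prints 0 but exits non-zero when nothing matches, so mask the status.
    grep -c '^>' "$1" || true
}

# Count reads in a gzipped FASTQ file, assuming the usual 4 lines per record.
count_fastq_gz_reads() {
    gzip -dc "$1" | awk 'END { print int(NR / 4) }'
}

# Usage (paths taken from the error message):
#   count_fasta_records /tmp/itsxpress_tdz_i0dj/rep.fa
#   count_fastq_gz_reads /tmp/itsxpress_tdz_i0dj/seq.fq.gz
```

If the FASTQ count is already zero, the sequences were lost before hmmsearch was ever called.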

Cheers,
V

Hello Valentyn,

I checked the rep.fa file. It's empty. :smiling_face_with_tear:
I don't know if the problem was caused by my .qza file. Here (ITSxpress - Google Drive) is my .qza file, manifest.tsv, metadata.tsv, and the empty rep.fa file.

Here is how I prepared the .qza file:

  1. Download the sequences from NCBI using the SRA Toolkit:
~/apps/sratoolkit.3.0.2-ubuntu64/bin/prefetch -O ~/NCBI_Data/Toju2018 --option-file Toju2018_Runs.txt
  2. Then prepare the .qza file following the tutorial here:
#move all the .sra out of their folders.
find . -name '*.sra' -print0 | xargs -0 mv -t .
find . -type d -empty -delete
#Convert .sra files into fastq files
ls *.sra | parallel -j0 fastq-dump --split-files --origfmt {}
mkdir fastq
mv *.fastq fastq
mkdir sra
mv *.sra sra

#Prepare manifest.tsv file
mkdir manifest
echo "# single-end PHRED 33 fastq manifest file for forward reads" > manifest1.txt
echo -e "sample-id\tabsolute-filepath" >> manifest1.txt
ls *.fastq | cut -d "_" -f 1 | sort | uniq | parallel -j0 --keep-order 'echo -e "{/}\t"$PWD"/{/}_1.fastq"' | tr -d "'" > manifest2.txt
cat manifest1.txt manifest2.txt > manifest/manifest.tsv
rm *.txt

#import data into qiime2
conda activate qiime2-2022.11
NCORES=24
qiime tools import \
--type 'SampleData[SequencesWithQuality]' \
--input-path fastq/manifest/manifest.tsv \
--output-path single-end-demux.qza \
--input-format SingleEndFastqManifestPhred33V2 

#Then the .qza file is prepared
Imported fastq/manifest/manifest.tsv as SingleEndFastqManifestPhred33V2 to single-end-demux.qza

I'm not sure if I made any mistake in the .qza file preparation.
Could you help me, please? Thank you very much!

Best,
Betty

Hello, Betty!

Up to this step, everything looks okay and the data appear to be imported correctly. You can create a demux quality visualization to check whether the quality scores are displayed correctly; if they are, the import went fine.
Did you do any other preprocessing before q2-itsxpress?

Cheers
V

Hello Valentyn,

Thank you for your reply. :blush:
Yes, I checked the .qzv file, and the quality scores are fine (>30).
Here is the .qzv file: single-end-demux.qzv (294.0 KB)

I ran qiime itsxpress right after obtaining the .qza file from qiime2, with no other preprocessing in between.

By the way, I tried qiime cutadapt today using the same .qza file I gave to qiime itsxpress.
Everything went smoothly.

I noticed that the .qzv file obtained from cutadapt looks similar to the one from the untrimmed sequences (the single-end-demux.qzv file I attached above).
Here is the .qzv file obtained from qiime cutadapt: cutadapt-demux-single-trimmed-seqs.qzv (299.5 KB).
Maybe the authors uploaded sequences that were already trimmed to SRA, instead of the raw sequences.

Could that be the reason why qiime itsxpress doesn't work?
Here is the original paper.

Best,
Betty.

That might be the case; the stats are identical. People upload all kinds of data to SRA, and there is no control over it.
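
If you want to probe this further, one rough check is to count how many reads still begin with the forward primer. This is only a sketch: count_primer_starts is a made-up helper, PRIMER_SEQUENCE and the file name are placeholders to fill in from the paper, and it only matches an exact, non-degenerate primer at the very start of each read:

```shell
#!/usr/bin/env bash
# Print "<reads starting with primer> <total reads>" for a plain FASTQ file.
# Assumes 4 lines per record and an exact, non-degenerate primer string.
count_primer_starts() {
    local fastq=$1 primer=$2
    awk -v p="$primer" '
        NR % 4 == 2 { total++; if (index($0, p) == 1) hits++ }
        END { print hits + 0, total + 0 }
    ' "$fastq"
}

# Usage (both arguments are placeholders):
#   count_primer_starts fastq/SAMPLE_1.fastq PRIMER_SEQUENCE
```

A count near zero would be consistent with reads that were already trimmed before upload, although it does not prove it.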

However, my expertise with ITS and itsxpress is limited. The plugin developer visits the forum once in a while, and he might be more knowledgeable on this topic.

Cheers,
V

Hello Valentyn,

Thanks for sharing. I will try itsxpress with another dataset.
I hope the problem can be solved soon. Thank you for discussing the topic with me. :grin:

Best,
Betty.