Plugin error from itsxpress with AVITI data (FASTQ quality value (45) above qmax (41))

Irshad · September 5, 2024, 8:44pm

I am using the university HPC. I installed qiime2 locally with miniforge. My samples were sequenced on the AVITI platform (300 bp; paired end; 16S amplicon as well as ITS2). I have run into a problem while running the following command:

% qiime itsxpress trim-pair-output-unmerged \
--i-per-sample-sequences ITS2_analyses_qiime2/sequences-trimmed-primers-ends.qza \
--p-region ITS2 \
--p-taxa F \
--p-cluster-id 1.0 \
--p-threads 40 \
--o-trimmed ITS2_analyses_qiime2/sequences-trimmed-primers-ends-exact_ITS2.qza

This is the error I got:

more qiime2-q2cli-err-y0hubkek.log
ERROR:root:Could not perform read merging with vsearch. Error from vsearch was:
vsearch v2.22.1_linux_x86_64, 503.3GB RAM, 128 cores

Merging reads

Fatal error: FASTQ quality value (45) above qmax (41)
By default, quality values range from 0 to 41.
To allow higher quality values, please use the option --fastq_qmax 45
Traceback (most recent call last):
File "/users/3/ihaq/.conda/envs/qiime2-amplicon-2024.5/lib/python3.9/site-packages/itsxpress/SeqSamplePaired.py", line 63, in merge_reads
p1.check_returncode()
File "/users/3/ihaq/.conda/envs/qiime2-amplicon-2024.5/lib/python3.9/subprocess.py", line 460, in check_returncode
raise CalledProcessError(self.returncode, self.args, self.stdout,
subprocess.CalledProcessError: Command '['vsearch', '--fastq_mergepairs', '/users/3/ihaq/qiime2/ihaq/data/e206104f-2103-4d9f-9ac5-3253e4373b69/data/B10_ITS2_1_L001_R1_001.fastq.gz', '--reverse', '/users/3/ihaq/qiime2/ihaq/data/e206104f-2103-4d9f-9ac5-3253e4373b69/data/B10_ITS2_170_L001_R2_001.fastq.gz', '--fastqout', '/users/3/ihaq/itsxpress_omlg5na/seq.fq', '--fastq_maxdiffs', '40', '--fastq_maxee', '2', '--threads', '40', '--fastq_allowmergestagger']' returned non-zero exit status 1.
Traceback (most recent call last):
File "/users/3/ihaq/.conda/envs/qiime2-amplicon-2024.5/lib/python3.9/site-packages/q2cli/commands.py", line 520, in call
results = self._execute_action(
File "/users/3/ihaq/.conda/envs/qiime2-amplicon-2024.5/lib/python3.9/site-packages/q2cli/commands.py", line 581, in _execute_action
results = action(**arguments)
File "", line 2, in trim_pair_output_unmerged
File "/users/3/ihaq/.conda/envs/qiime2-amplicon-2024.5/lib/python3.9/site-packages/qiime2/sdk/action.py", line 342, in bound_callable
outputs = self.callable_executor(
File "/users/3/ihaq/.conda/envs/qiime2-amplicon-2024.5/lib/python3.9/site-packages/qiime2/sdk/action.py", line 576, in callable_executor
output_views = self._callable(**view_args)
File "/users/3/ihaq/.conda/envs/qiime2-amplicon-2024.5/lib/python3.9/site-packages/itsxpress/q2_itsxpress.py", line 148, in trim_pair_output_unmerged
results = main(per_sample_sequences=per_sample_sequences,
File "/users/3/ihaq/.conda/envs/qiime2-amplicon-2024.5/lib/python3.9/site-packages/itsxpress/q2_itsxpress.py", line 197, in main
sobj = _set_fastqs_and_check(
File "/users/3/ihaq/.conda/envs/qiime2-amplicon-2024.5/lib/python3.9/site-packages/itsxpress/q2_itsxpress.py", line 76, in _set_fastqs_and_check
sobj._merge_reads(threads=threads,stagger=allow_staggered_reads)
File "/users/3/ihaq/.conda/envs/qiime2-amplicon-2024.5/lib/python3.9/site-packages/itsxpress/SeqSamplePaired.py", line 67, in _merge_reads
raise e
File "/users/3/ihaq/.conda/envs/qiime2-amplicon-2024.5/lib/python3.9/site-packages/itsxpress/SeqSamplePaired.py", line 63, in merge_reads
p1.check_returncode()
File "/users/3/ihaq/.conda/envs/qiime2-amplicon-2024.5/lib/python3.9/subprocess.py", line 460, in check_returncode
raise CalledProcessError(self.returncode, self.args, self.stdout,
subprocess.CalledProcessError: Command '['vsearch', '--fastq_mergepairs', '/users/3/ihaq/qiime2/ihaq/data/e206104f-2103-4d9f-9ac5-3253e4373b69/data/B10_ITS2_1_L001_R1_001.fastq.gz', '--reverse', '/users/3/ihaq/qiime2/ihaq/data/e206104f-2103-4d9f-9ac5-3253e4373b69/data/B10_ITS2_170_L001_R2_001.fastq.gz', '--fastqout', '/users/3/ihaq/itsxpress_omlg5na/seq.fq', '--fastq_maxdiffs', '40', '--fastq_maxee', '2', '--threads', '40', '--fastq_allowmergestagger']' returned non-zero exit status 1.

Prior to this I used cutadapt as follows:


qiime cutadapt trim-paired \
--p-cores 30 \
--i-demultiplexed-sequences ITS2_analyses_qiime2/sequences.qza \
--p-front-f TCGATGAAGAACGCAGCG \
--p-front-r TCCTCCGCTTATTGATATGC \
--p-match-read-wildcards \
--p-match-adapter-wildcards \
--p-discard-untrimmed \
--o-trimmed-sequences ITS2_analyses_qiime2/sequences-trimmed-primers-ends.qza \
--verbose

Here is the qzv file
sequences-trimmed-primers-ends.qzv (324.8 KB)

I also tried cutadapt with the following:

qiime cutadapt trim-paired \
--p-cores 30 \
--i-demultiplexed-sequences ITS2_analyses_qiime2/sequences.qza \
--p-front-f TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGTCGATGAAGAACGCAGCG \
--p-front-r GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGTCCTCCGCTTATTGATATGC \
--p-match-read-wildcards \
--p-match-adapter-wildcards \
--p-discard-untrimmed \
--o-trimmed-sequences ITS2_analyses_qiime2/sequences-trimmed-primers.qza \
--verbose

Here is the qzv file
sequences-trimmed-primers.qzv (324.9 KB)

Here is the original qzv file (prior to processing with cutadapt).
sequences.qzv (319.3 KB)

Can someone help me overcome this issue? Thank you for your time and help.

salias · September 10, 2024, 9:01am

Hi @Irshad! And welcome back to the QIIME 2 forum.

Sorry for the delay, I've been thinking how to address this issue properly and time flies!

The error you report is raised because vsearch, by default, expects quality scores no higher than 41, which is the ceiling for FASTQ files generated from Illumina sequencing. However, since AVITI was used in this case, it is possible to obtain quality scores as high as Q50!
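To check whether a given file actually contains scores above 41, the quality lines can be scanned directly. The following is only a minimal sketch: it assumes Phred+33 encoding and strict four-line FASTQ records, and `max_phred` is a hypothetical helper name, not part of QIIME 2, ITSxpress, or vsearch.

```python
import io

PHRED_OFFSET = 33  # assumption: Phred+33 (Sanger / Illumina 1.8+) encoding


def max_phred(fastq_handle):
    """Return the highest Phred score in a 4-line-per-record FASTQ stream."""
    highest = 0
    for line_number, line in enumerate(fastq_handle):
        # Quality strings are the 4th line of each record (0-based 3, 7, 11, ...).
        if line_number % 4 == 3:
            quals = line.rstrip("\n")
            if quals:
                highest = max(highest, max(ord(c) for c in quals) - PHRED_OFFSET)
    return highest


# Made-up toy record: 'I' encodes Q40 and 'N' encodes Q45.
example = io.StringIO("@read1\nACGT\n+\nIINI\n")
print(max_phred(example))  # 45
```

For a real gzipped file, the same function could be fed a handle from `gzip.open(path, "rt")`.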

The solution to the issue appears to be what the error suggests: using the --fastq_qmax 45 option in vsearch (one of the steps within the ITSxpress pipeline). AFAIK we cannot modify that option within q2-itsxpress, so maybe the best option here is to adapt your data to the Illumina scoring convention.

Looking at the q2-itsxpress tutorial, in the PacBio experimental section, they reformat PacBio scoring to Illumina scoring convention using bbmap reformat.sh. This is their example:

If you look up maxcalledquality in the bbmap reformat.sh documentation on GitHub:

maxcalledquality=41     Quality scores capped at this upper bound.

I've never used reformat.sh so I'm not sure how it works. My intuition tells me that using it with maxcalledquality=41 would convert any quality over 41 to 41. We would lose information but that could be an option.
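The capping behaviour guessed at here can be illustrated in a few lines of Python. This is a sketch of the idea only, not the BBTools implementation: it assumes Phred+33 encoding and four-line FASTQ records, and `cap_quality` and `cap_fastq_lines` are hypothetical names.

```python
PHRED_OFFSET = 33  # assumption: Phred+33 encoding


def cap_quality(quality_string, max_q=41):
    """Replace every quality character above max_q with the character for max_q."""
    cap_char_code = max_q + PHRED_OFFSET
    return "".join(chr(min(ord(c), cap_char_code)) for c in quality_string)


def cap_fastq_lines(lines, max_q=41):
    """Yield FASTQ lines unchanged, except quality lines, which are capped."""
    for i, line in enumerate(lines):
        if i % 4 == 3:  # 4th line of each record holds the quality string
            newline = "\n" if line.endswith("\n") else ""
            yield cap_quality(line.rstrip("\n"), max_q) + newline
        else:
            yield line


# 'I' = Q40 is kept; 'N' = Q45 and 'S' = Q50 both become 'J' = Q41.
print(cap_quality("IINS"))  # IIJJ
```

Sequence and header lines pass through untouched, so only the information carried by scores above the cap is lost.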

This option also looks interesting:

recalibrate=f           (recal) Recalibrate quality scores.  Must first generate matrices with CalcTrueQuality

But again, I'm not familiar with the tool. This is just in case you want to take the adventure of exploring.

Of course, another option could be just skipping ITSxpress. But I understand this is not what you want.

Sergio

Adam_Rivers · September 10, 2024, 1:40pm

Hi, I'm the ITSxpress developer. Using BBTools reformat on your input data is probably the fastest option, and it would have no impact on the final results. ITSxpress only uses the quality scores temporarily, in the merging step, and the difference between a 40 (1:10,000 error rate) and a 50 (1:100,000 error rate) is irrelevant there. The scores only matter when there is a base pair mismatch in the alignment of the reads, and when a base with a 0.0001 error rate meets a base with a 0.00001 error rate, the chance of them not matching is extremely small. Plus, all the other overlapping sites are used for the alignment too. Finally, the original, higher read scores are returned at the end of the analysis.
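The error rates quoted above follow from the standard Phred definition, Q = -10 * log10(p). A short worked example, where the function name is hypothetical and the product treats the two base calls as independent (an assumption made only for illustration):

```python
def phred_to_error_prob(q):
    """Convert a Phred quality score to the probability of a wrong base call."""
    return 10 ** (-q / 10)


p40 = phred_to_error_prob(40)
p50 = phred_to_error_prob(50)

print(f"Q40: {p40:.0e}")  # Q40: 1e-04
print(f"Q50: {p50:.0e}")  # Q50: 1e-05

# Chance that both overlapping base calls are wrong, assuming independence.
print(f"both wrong: {p40 * p50:.0e}")  # both wrong: 1e-09
```

On this scale, capping a Q50 at Q41 changes the per-base error estimate from 1 in 100,000 to roughly 1 in 12,600, which is still very small.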

That said, it is annoying to have to reformat your reads, so I'll test changing an internal setting to allow for higher quality scores, so that this error does not occur for future "high scoring" users.

As a side note, I don't think you need to use cutadapt before ITSxpress since the ITS cut sites are internal to the primer sites, but I've never tested the difference.

salias · September 10, 2024, 1:52pm

Thank you @Adam_Rivers for the explanation!

I forgot to mention that! As a regular ITSxpress user, I can confirm.

Irshad · September 10, 2024, 4:09pm

Hi @salias,

Thank you for your suggestions. I will take the reformat route and see if it works.

Irshad · September 10, 2024, 4:15pm

Hi @Adam_Rivers,

Thank you for contributing itsxpress to qiime2. It would be a great addition if future releases addressed this error for high-quality data. I will get back here with further updates after I run the analysis with the reformatted data.

Irshad · September 16, 2024, 6:30pm

@salias @Adam_Rivers The problem with FASTQ quality was resolved by running bbmap reformat.sh on my input data.

Thank you for the support.

Adam_Rivers · September 17, 2024, 4:06pm

Would you be willing to share your sequences with me, either sequences-trimmed-primers-ends.qza or sequences.qza or a small sample from them? Fastq format would work too. I'm testing a fix for this issue and it would be convenient to use your data if you are able to share it.

Adam_Rivers · September 19, 2024, 8:03pm

I added support for quality scores over 41 to ITSxpress version 2.1.1, which I released today on GitHub. It may take a day or two for the updated version to appear on Bioconda.

salias · September 19, 2024, 8:13pm

Thank you @Adam_Rivers !
