q2-ITSxpress generates empty sequences?

Hey All,
I'm trying to run q2-ITSxpress on my samples. The plugin has crashed, and the error message says that file "BC77_73_L001_R2_001.fastq.gz" (which doesn't exist) has a missing sequence on line 17.

My fastq files were processed by Casava and were not otherwise trimmed or merged. File "BC77_S75_L001_R2_001.fastq.gz" has no missing sequences, as far as I can tell. Those files were processed by qiime2-2017.9 (without q2-ITSxpress, of course) without a problem.

From the log file it seems that vsearch crashes after ITSxpress clustering has completed. Could ITSxpress generate empty reads?

More details follow. Thanks!

The command used to run ITSxpress:
qiime itsxpress trim-pair-output-unmerged --i-per-sample-sequences sequences.qza --p-region ITS2 --p-taxa F --o-trimmed trimmed.qza --p-threads 4

File seq.fq.gz (1.7 MB)

log file:

Reading file /tmp/itsxpress_cxfbgk37/seq.fq.gz 100%
8295000 nt in 26665 seqs, min 36, max 588, avg 311
Masking 100%
Sorting by abundance 100%
Counting k-mers 100%
Clustering 100%
Sorting clusters 100%
Writing clusters 100%
Clusters: 541 Size min 1, max 10572, avg 49.3
Singletons: 246, 0.9% of seqs, 45.5% of clusters

in write seqs
in write seqs
Traceback (most recent call last):
File "/home/omer/anaconda3/envs/qiime2-2018.8/lib/python3.5/site-packages/qiime2/plugin/model/file_format.py", line 24, in validate
self.validate(level)
File "/home/omer/anaconda3/envs/qiime2-2018.8/lib/python3.5/site-packages/q2_types/per_sample_sequences/_format.py", line 159, in validate
self._check_n_records(record_count_map[level])
File "/home/omer/anaconda3/envs/qiime2-2018.8/lib/python3.5/site-packages/q2_types/per_sample_sequences/_format.py", line 130, in _check_n_records
% (i * 4 + 1))
qiime2.plugin.model.base.ValidationError: Missing sequence for record beginning on line 17

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "/home/omer/anaconda3/envs/qiime2-2018.8/lib/python3.5/site-packages/q2cli/commands.py", line 274, in call
results = action(**arguments)
File "", line 2, in trim_pair_output_unmerged
File "/home/omer/anaconda3/envs/qiime2-2018.8/lib/python3.5/site-packages/qiime2/sdk/action.py", line 231, in bound_callable
output_types, provenance)
File "/home/omer/anaconda3/envs/qiime2-2018.8/lib/python3.5/site-packages/qiime2/sdk/action.py", line 390, in callable_executor
spec.qiime_type, output_view, spec.view_type, prov)
File "/home/omer/anaconda3/envs/qiime2-2018.8/lib/python3.5/site-packages/qiime2/sdk/result.py", line 244, in _from_view
result = transformation(view)
File "/home/omer/anaconda3/envs/qiime2-2018.8/lib/python3.5/site-packages/qiime2/core/transform.py", line 68, in transformation
self.validate(view)
File "/home/omer/anaconda3/envs/qiime2-2018.8/lib/python3.5/site-packages/qiime2/core/transform.py", line 143, in validate
view.validate('min')
File "/home/omer/anaconda3/envs/qiime2-2018.8/lib/python3.5/site-packages/qiime2/plugin/model/directory_format.py", line 171, in validate
getattr(self, field)._validate_members(collected_paths, level)
File "/home/omer/anaconda3/envs/qiime2-2018.8/lib/python3.5/site-packages/qiime2/plugin/model/directory_format.py", line 101, in _validate_members
self.format(path, mode='r').validate(level)
File "/home/omer/anaconda3/envs/qiime2-2018.8/lib/python3.5/site-packages/qiime2/plugin/model/file_format.py", line 29, in validate
) from e
qiime2.plugin.model.base.ValidationError: /tmp/q2-SingleLanePerSamplePairedEndFastqDirFmt-k502v7lm/BC77_73_L001_R2_001.fastq.gz is not a(n) FastqGzFormat file:

Missing sequence for record beginning on line 17
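
The line number in the error maps to a record index: the traceback formats it as `i * 4 + 1`, so "line 17" points at the fifth record (i = 4) of a FASTQ file with four lines per record. A minimal sketch for checking a file by hand, assuming unwrapped four-line records; `find_empty_records` is a hypothetical helper, not part of QIIME 2 or ITSxpress:

```python
import gzip
from itertools import zip_longest


def find_empty_records(path):
    """Return 1-based starting line numbers of records with an empty sequence.

    Assumes a gzipped FASTQ file with exactly four lines per record.
    """
    bad_lines = []
    with gzip.open(path, "rt") as handle:
        # Group lines into chunks of four: header, sequence, '+', quality.
        records = zip_longest(*[handle] * 4, fillvalue="")
        for i, (header, seq, plus, qual) in enumerate(records):
            if not seq.strip():
                # Same arithmetic as in the traceback: record i starts on line i*4+1.
                bad_lines.append(i * 4 + 1)
    return bad_lines
```

The raw input files may pass this check while the trimmed output does not, which would be consistent with the error naming a file in a temporary output directory.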

Hi @omera.WIS!

Unfortunately I don’t know enough to answer (hopefully @Adam_Rivers can help), but I just wanted to say how well structured your topic is, so thanks!


If q2-ITSxpress does generate empty sequences as a rule, we’ll need to figure out another way to handle that information. One option might be to return multiple artifacts, one being the successfully handled reads and the other being the collection of failures.


Hi @omera.WIS,

This is an error I’ve never seen before, I’ll take a look. Could you run these commands and let me know the two version numbers?

pip show itsxpress
pip show q2-itsxpress

Do you have the command(s) you ran to create the sequences.qza file? If you are able to share the sequences.qza file, that might be the easiest way to recreate the error. Feel free to PM me a link to the file for download if you like.

I’ve never run a qza file created with an old version like 2017.9 through a new QIIME version like 2018.8. @ebolyen, is it possible that changes to the way q2-SingleLanePerSamplePairedEndFastqDirFmt is formatted could cause the FastqGzFormat error?

Theoretically, although I don't think anything has changed for that format (and ideally formats in general don't change, we just make a new name).

That being said, validation of formats has definitely changed in the interim, so it's possible that what QIIME 2 2017.9 thought was OK, 2018.8 might balk at.

@omera.WIS, we could test that by running qiime tools export and qiime tools import successively to see if the data still works (or perhaps easier, you could just run qiime tools validate on the current qza you have).

@ebolyen, thanks :slight_smile:

@Adam_Rivers I apologize, I didn’t explain myself properly. The files were imported using qiime2-2018.8. An older version of QIIME was used previously; I wasn’t satisfied with the quality of the results, so I thought to improve the pipeline with ITSxpress. Technically speaking, those files were processed by qiime2-2017.9 successfully.

Files were imported with
qiime tools import --type 'SampleData[PairedEndSequencesWithQuality]' --input-path $PWD/sample/ --input-format CasavaOneEightSingleLanePerSampleDirFmt --output-path demux-paired-end.qza

Please note that I renamed demux-paired-end.qza to sequences.qza manually, without giving it much thought. I don’t know if that’s “allowed”, but the error seems unrelated to me anyway.

I’ll upload the files somewhere and contact you. The itsxpress version is 1.7.1 and q2-itsxpress is 1.7.2. Both are freshly installed.

Thanks for the quick response!

Filenames are always at your discretion :slight_smile: we have some "conventions", but the type is all QIIME 2 actually looks at.


Could you send me a link to your input file? All the testing for ITSxpress was done with data imported in PairedEndFastqManifestPhred33 format, not CasavaOneEightSingleLanePerSampleDirFmt. I think ITSxpress may be making some incorrect assumptions about the structure of the input data. This issue came up a second time in the topic “Trimming fungal demultiplexed paired-end sequences before dada2”.

I’m still happy to look at your files directly, but another user had this issue and it turned out that their file contained contaminants rather than fungi, so it was not generating any fungal reads. I’m planning to update itsxpress to give a helpful error message and process the other samples rather than just crashing when it encounters this error. More information on this error is in the topic “Trimming fungal demultiplexed paired-end sequences before dada2”.

Apologies for the late response, I wasn’t around. I’ll send you a link to the file.

I assume it is the same issue, I know that I have some contaminated samples. For now I’ll use the standalone version on each sample separately.

Thanks!

ITSxpress v1.7.2 fixes bug causing “Line 17” errors.

@omera.WIS @einamart @bsen2018 @Nicholas_Bokulich @ebolyen

Impact

  • Fixes an error that occasionally caused some qiime itsxpress runs to crash.
  • Fixes an error in the way that HMMER data was parsed that caused about 0.2% of reads to be incorrectly trimmed, including some that were trimmed to length 0, which was causing the QIIME 2 issue.
  • For more details see: https://github.com/USDA-ARS-GBRU/itsxpress/issues/8
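
For output produced before updating, one possible stopgap is to drop read pairs in which either mate was trimmed to length 0. The sketch below is a workaround under stated assumptions, not the fix shipped in 1.7.2 (which corrects the HMMER parsing): it assumes gzipped FASTQ files with four lines per record and mates in the same order in both files. `read_fastq` and `drop_empty_pairs` are hypothetical helpers.

```python
import gzip
from itertools import zip_longest


def read_fastq(path):
    """Yield (header, sequence, plus, quality) tuples from a gzipped FASTQ."""
    with gzip.open(path, "rt") as handle:
        for record in zip_longest(*[handle] * 4, fillvalue=""):
            yield tuple(line.rstrip("\n") for line in record)


def drop_empty_pairs(fwd_in, rev_in, fwd_out, rev_out):
    """Copy read pairs to the outputs, skipping pairs with an empty mate.

    Returns a (kept, dropped) tuple of pair counts.
    """
    kept = dropped = 0
    with gzip.open(fwd_out, "wt") as f_out, gzip.open(rev_out, "wt") as r_out:
        for fwd, rev in zip(read_fastq(fwd_in), read_fastq(rev_in)):
            if fwd[1] and rev[1]:
                # Both mates have a non-empty sequence: keep the pair in sync.
                f_out.write("\n".join(fwd) + "\n")
                r_out.write("\n".join(rev) + "\n")
                kept += 1
            else:
                dropped += 1
    return kept, dropped
```

Dropping the whole pair rather than only the empty mate keeps the forward and reverse files the same length, which paired-end tools generally expect.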

How to fix

To update, make sure that both itsxpress and q2-itsxpress are at version 1.7.2.

From conda (should be available in a few hours):
conda install -c bioconda itsxpress==1.7.2

Or, if you can’t wait, from pip:
pip install itsxpress==1.7.2

The plugin is versioned differently and requires itsxpress. You must update itsxpress to get the fix. You should check that the qiime plugin q2-itsxpress is also up to date:

pip install q2-itsxpress==1.7.2
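
As an alternative to reading the `pip show` output by eye, the installed versions can be checked from Python inside the QIIME 2 environment. This is a rough sketch, assuming plain dotted numeric version strings such as 1.7.2; `installed_version`, `version_tuple`, and `is_at_least` are hypothetical helper names.

```python
def installed_version(dist_name):
    """Return the installed version string of a distribution, or None."""
    try:
        # Standard library on Python 3.8 and later.
        from importlib.metadata import version, PackageNotFoundError
    except ImportError:
        # Fallback for older interpreters (the thread's environment is 3.5).
        import pkg_resources
        try:
            return pkg_resources.get_distribution(dist_name).version
        except pkg_resources.DistributionNotFound:
            return None
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


def version_tuple(version_string):
    """Convert '1.7.2' into (1, 7, 2); assumes numeric components only."""
    return tuple(int(part) for part in version_string.split("."))


def is_at_least(dist_name, required):
    """True if dist_name is installed at version `required` or newer."""
    found = installed_version(dist_name)
    return found is not None and version_tuple(found) >= version_tuple(required)


if __name__ == "__main__":
    for name in ("itsxpress", "q2-itsxpress"):
        print(name, installed_version(name), is_at_least(name, "1.7.2"))
```

Both lines should report True before the plugin is rerun.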

