ITSxpress error: Missing sequence for record beginning on line ...

Dear @Adam_Rivers and all,

I have been trying to trim some ITS sequences using ITSxpress 1.8.0 (both plugin and standalone) on QIIME2 2022.2.1. The primers used were ITS1f (CTTGGTCATTTAGAGGAAGTAA) and ITS2 (GCTGCGTTCTTCATCGATGC), so only the ITS1 region should be amplified, which as far as I know has a size of 250-600 bp. 2x250 bp reads were sequenced on Illumina MiSeq. First I ran the following:

qiime itsxpress trim-pair-output-unmerged \
  --i-per-sample-sequences ITS-demux-paired-end.qza \
  --p-region ITS1 \
  --p-taxa F \
  --p-cluster-id 1.0 \
  --p-threads 10 \
  --o-trimmed ITS_trimmed_exact.qza \
  --verbose

Then I got an error saying:

Plugin error from itsxpress:

/tmp/q2-CasavaOneEightSingleLanePerSampleDirFmt-5ow5617m/ITS_15_040322-15-05_treefern_Actae01_74_L001_R2_001.fastq.gz is not a(n) FastqGzFormat file:

Missing sequence for record beginning on line 5

See above for debug info.

I have seen two or three posts dealing with a very similar matter and was wondering how I could solve this. Is it that the fragments could be too long for the reads to merge, or could it be something else? How could I get past this? I could DM the demultiplexed data if it would help.

Best,
Tsvetoslav

You can send me the data. Could you unpack your qza object and verify that the input file giving the error is actually gzipped as expected?
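
One way to run that check, as a minimal sketch: a .qza artifact is an ordinary zip archive, so Python's standard library can unpack it and try to decompress every .fastq.gz inside. The function name `check_qza_fastq_gz` and the output directory are hypothetical names chosen for illustration; `qiime tools export` followed by `gzip -t` would serve the same purpose.

```python
import gzip
import zipfile
import zlib
from pathlib import Path


def check_qza_fastq_gz(qza_path, out_dir):
    """Unpack a .qza (a zip archive) and test each .fastq.gz inside it.

    Returns a dict mapping file name to True (decompresses cleanly) or False.
    Hypothetical helper, not part of QIIME 2 or ITSxpress.
    """
    out_dir = Path(out_dir)
    with zipfile.ZipFile(qza_path) as archive:
        archive.extractall(out_dir)

    results = {}
    for fastq in sorted(out_dir.rglob("*.fastq.gz")):
        try:
            with gzip.open(fastq, "rb") as handle:
                # Read to the end in 1 MiB chunks so that truncated or
                # non-gzip files raise an error.
                while handle.read(1 << 20):
                    pass
            results[fastq.name] = True
        except (OSError, EOFError, zlib.error):
            results[fastq.name] = False
    return results
```

Calling `check_qza_fastq_gz("ITS-demux-paired-end.qza", "unpacked")` would list every read file with a flag saying whether it is valid gzip. It only tests compression, not whether the FASTQ records themselves are well formed.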

An update on this: I ran your data and got a FastqGzFormat validation error on a different output file. @seinarsson and I are going to look into this issue more and see what we can figure out.

Okay, this happened because many of your reverse mate-pair reads are very short. Illumina has started providing trimming options during data acquisition, so you no longer always get 300 bases for every read on a 2x300 run if your sequence provider has done in-run trimming.

In some cases, if the input read is very short, ITSxpress will trim away the 5.8S region of your reverse read and nothing will be left. The command-line version of ITSxpress just writes an empty reverse read to the FASTQ file without complaining, but QIIME 2 has a validation step that raises the error you encountered if any read has length 0.
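
To see how many reads in a file are affected, a rough check like the following could be used. It is a sketch that assumes standard four-line FASTQ records (no wrapped sequences); `count_empty_reads` is a hypothetical helper, not an ITSxpress or QIIME 2 function.

```python
import gzip


def count_empty_reads(fastq_gz_path):
    """Return (total_records, empty_records) for a gzipped FASTQ file.

    Assumes four lines per record: header, sequence, '+', quality.
    """
    total = 0
    empty = 0
    with gzip.open(fastq_gz_path, "rt") as handle:
        for line_number, line in enumerate(handle):
            # The sequence is the second line of each record (index 1 of 4).
            if line_number % 4 == 1:
                total += 1
                if not line.strip():
                    empty += 1
    return total, empty
```

Any non-zero second value for a trimmed R2 file would be consistent with the "Missing sequence for record beginning on line ..." error above.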

For your data, my advice is to use only your forward reads and run ITSxpress and DADA2 in single-read mode.

I will release a new version of ITSxpress shortly that will not write a trimmed mate-pair if either read is length 0 and will raise a warning informing the user of how many 0-length reads are encountered.
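
The behaviour described above could look roughly like this. It is only an illustrative sketch, not the actual ITSxpress implementation; `drop_empty_pairs` and its input format (plain sequence strings rather than full FASTQ records) are assumptions made for brevity.

```python
import warnings


def drop_empty_pairs(pairs):
    """Keep only pairs in which both trimmed reads are non-empty.

    `pairs` is an iterable of (forward_seq, reverse_seq) strings.
    Returns the kept pairs and warns with the number of pairs dropped.
    """
    kept = []
    dropped = 0
    for forward_seq, reverse_seq in pairs:
        if len(forward_seq) == 0 or len(reverse_seq) == 0:
            # Skip the whole pair so R1 and R2 stay in sync.
            dropped += 1
        else:
            kept.append((forward_seq, reverse_seq))
    if dropped:
        warnings.warn(
            f"{dropped} read pair(s) dropped because a read had length 0"
        )
    return kept
```

Dropping the whole pair, rather than only the empty read, keeps the forward and reverse files aligned record for record, which downstream paired-end tools expect.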

Thanks to @seinarsson for working with me to figure this out.
