Error importing with q2-itsxpress

My imported ITS2 data produces the error below when I run qiime itsxpress trim-pair-output-unmerged.

The same data goes straight into DADA2 without problems, but ITSxpress fails with this error:

Plugin error from itsxpress:

/tmp/q2-CasavaOneEightSingleLanePerSampleDirFmt-_dtvfh39/NEG08_Rep3_767_L001_R1_001.fastq.gz is not a(n) FastqGzFormat file:

Missing sequence for record beginning on line 5

Debug info has been saved to /tmp/qiime2-q2cli-err-p7gr5zhi.log

Hello Michelle,

Welcome to the forums! :qiime2:

Looks like there's something wrong with that file! Can you post the first few lines by running a command like this?

gzip -dc NEG08_Rep3_767_L001_R1_001.fastq.gz | head -n 6

It's giving me this:

@M00307:70:000000000-K3RD6:1:1101:16278:18368 2:N:0:1

AGCGGAGGAGTCATAGCTGTTTCCTGAGGGTGTAGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCCGTATCATTAAAAAAAAAAAGAAATAAGAAAGGTAGAATACTACACCAAATACTACTCTCATCAATACTGTTCAATTACTTTATACATTCAATTCTCAATACACATATCGTGTGATCATATTTCAATTCACATAGTCACTCAATCTATACACACGACACATGCACACATTGTTA

+

GGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG[email protected]+5,,,7,,,,4,,7,3,,,,6,,,2,6>,,,,,,2,4,@,:,,5,+21;,4+1,,?,,,,,+++41+11+5+++3+++4+++22++3;).32+11++).+21**)0)1).)/)*))))((,)0*+)).)(-,)...

@M00307:70:000000000-K3RD6:1:1102:25804:8964 2:N:0:1

ATTGAATTTAGCGGCCGCGAATTCGCCCTTTTATAATTGGAGGATTTGGTAATTGGCATCGTAGGTAATCCCGGGTCGCCCCTGATATAGCTTTCCCCCGGTCCGGGTCCGGTGCATCAGACGATAGGCGGAACGTCGACACGCTTGGCCTGAAATCCGCCTACTTTTCGTTGGGAAAAGCTGTACCGATTGAACCGACGGATAGCGGGAGTGGATTTTGCAGCCGGCGTACACTCGTTCCTTCATGGAAAGCTACTCAA

@M00307:70:000000000-K3RD6:1:1101:16278:18368 1:N:0:1

TGCGTTCACTGGCCGTCGTTTTACATAATTACTGAGATCGGAAGAGCACACGTCTGAACTCCAGTCACATCTCGTATGCCGTCTTCTGCTTGAAAAAAAAAAAAAATTAGAGCCACTGATCTCTTTCCTATTCACTCGTACAACACTTACACTATCACTCAGAACTCATATGAACTAGTATAATATCTATGTACTATACATATCTAATTTAATTGTACTCCATTACCATATCTCATTCCTATACTCCGTACTCATCTACTTACATCACGATAT

GGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGCFGGGGGGGGGGBFFGGGGGFFGGGGGFCGGGG88+6,,,,,,,,@D,,8,3,3,8,,,,=,,7;,,7*@,2,,,,,62>,,,,2,,,,,,,,5,6,=,,,6,,6,515;9++315++++5=+<++++2++3+:9+++11+++++++)++1+2+++/202C:.*****(.)).2.-).9)**1)/))(-((-

@M00307:70:000000000-K3RD6:1:1102:25804:8964 1:N:0:1

TTAAGTTCAGCGGGTACTCCTACCTGATTTGAGGTCACCCTGCTATTTACCGCTGCGCTACTGTGGTGAGTTCTTGCGACTAGAGCAATTGGGACCGCGCTCTGCCGAAAAGACTGCTGGAAGCGCAGAACTACATAATACCAAGGAGCGCAATGTGCGTTCAAAGATTCGATGATTCACGCTCGCAGATGTAGCTACGCTGCGTTCTTCATCGATGCGAGAACCAAGCGATCCGTTGTTGAGTAGCTTTCCATGAAGGAACGAGTGTACGCC

I see the problem!

I've copied the first few characters from your post into this text block so it's easy to view:

@M00307:70:000000000-K3RD6:1:1101:16278:18368 2:N:0:1
AGCGGAGGAGTCATAGCTGTTTCCTGAGGGTGTAGATCGGAAGAG
+
GGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG
@M00307:70:000000000-K3RD6:1:1102:25804:8964 2:N:0:1
ATTGAATTTAGCGGCCGCGAATTCGCCCTTTTATAATTGGAGGATTT
@M00307:70:000000000-K3RD6:1:1101:16278:18368 1:N:0:1
TGCGTTCACTGGCCGTCGTTTTACATAATTACTGAGATCGGAAGAGC
+
GGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG

Note how the first and third reads follow the correct format for fastq files:

@readID
SequenceATCGATCG
+
QualityGGGGGGGGG

But the second read has been messed up and is missing its quality score information!
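For anyone who wants to check a whole file rather than eyeballing the first few lines, here is a minimal Python sketch of a structural check for the four-line layout above. This is not the validator that QIIME 2 runs: the function name first_fastq_problem and its messages are made up for illustration, and it assumes a gzip-compressed file with exactly four lines per record (no wrapped sequences).

```python
import gzip
from itertools import islice


def first_fastq_problem(path):
    """Return a description of the first malformed record, or None if none is found.

    Hypothetical helper for illustration only; assumes 4 lines per record.
    """
    with gzip.open(path, "rt") as handle:
        line_no = 1  # 1-based line number where the current record starts
        while True:
            record = [line.rstrip("\n") for line in islice(handle, 4)]
            if not record:
                return None  # reached end of file without finding a problem
            where = f"record beginning on line {line_no}"
            if len(record) < 4:
                return f"{where}: only {len(record)} of 4 lines present"
            header, seq, plus, qual = record
            if not header.startswith("@"):
                return f"{where}: header does not start with '@'"
            if not seq:
                return f"{where}: empty sequence line"
            if not plus.startswith("+"):
                return f"{where}: third line does not start with '+'"
            if len(qual) != len(seq):
                return f"{where}: quality length differs from sequence length"
            line_no += 4
```

Calling first_fastq_problem("NEG08_Rep3_767_L001_R1_001.fastq.gz") on the file from the error message would then point at the first record that breaks the layout, if there is one.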

Can you get a copy of these fastq files directly from the Illumina sequencer? The original reads should be uncorrupted.

All reads have their quality scores. What I pasted came from reading only the first 6 lines, so the second record is cut off after its sequence:

line 1 @readID1
line 2 SequenceATCGATCG
line 3 +
line 4 QualityGGGGGGGGG
line 5 @readID2
line 6 SequenceATCGATCG

With 8 lines, the output is:

@M00307:70:000000000-K3RD6:1:1101:16278:18368 2:N:0:1
AGCGGAGGAGTCATAGCTGTTTCCTGAGGGTGTAGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCCGTATCATTAAAAAAAAAAAGAAATAAGAAAGGTAGAATACTACACCAAATACTACTCTCATCAATACTGTTCAATTACTTTATACATTCAATTCTCAATACACATATCGTGTGATCATATTTCAATTCACATAGTCACTCAATCTATACACACGACACATGCACACATTGTTA
+
GGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG[email protected]+5,,,7,,,,4,,7,3,,,,6,,,2,6>,,,,,,2,4,@,:,,5,+21;,4+1,,?,,,,,+++41+11+5+++3+++4+++22++3;).32+11++).+21**)0)1).)/)*))))((,)0*+)).)(-,)...
@M00307:70:000000000-K3RD6:1:1102:25804:8964 2:N:0:1
ATTGAATTTAGCGGCCGCGAATTCGCCCTTTTATAATTGGAGGATTTGGTAATTGGCATCGTAGGTAATCCCGGGTCGCCCCTGATATAGCTTTCCCCCGGTCCGGGTCCGGTGCATCAGACGATAGGCGGAACGTCGACACGCTTGGCCTGAAATCCGCCTACTTTTCGTTGGGAAAAGCTGTACCGATTGAACCGACGGATAGCGGGAGTGGATTTTGCAGCCGGCGTACACTCGTTCCTTCATGGAAAGCTACTCAA
+
7FG<FEFGGEEGDGGGGGGGCF<[email protected]=FCFFGFGFFGGGGGGEDCFGGGGGFEGDC>[email protected]:BF7FEC<FEDFFDFGG8AEEE;>GFF5F8<9FGE5EEEGCFGGGGGG6*:5?5;+/36<<:<CC7;ECGF5CDDDGGDD47/)9):7*:4*:0246C<>73>7D:@64[email protected]>FFF<[email protected])0)5
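Since each FASTQ record spans four lines, previewing a multiple of four lines avoids cutting a record in half. A small Python sketch of the same idea, assuming a gzip-compressed file with unwrapped four-line records (head_records is a hypothetical name, not part of QIIME 2 or ITSxpress):

```python
import gzip
from itertools import islice


def head_records(path, n_records=2):
    """Return the first n_records FASTQ records, each as a list of 4 lines.

    Hypothetical helper; if the file ends early, the last list may be shorter.
    """
    with gzip.open(path, "rt") as handle:
        # Read whole records only: 4 lines per record.
        lines = [line.rstrip("\n") for line in islice(handle, 4 * n_records)]
    return [lines[i:i + 4] for i in range(0, len(lines), 4)]
```

For example, head_records("NEG08_Rep3_767_L001_R1_001.fastq.gz", 2) would return the same two records as the 8-line output above.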

Ok good!

In the original error, the file name is NEG08_Rep3_767_L001_R1_001.fastq.gz. Is that really a compressed .gz file, or is that a normal .fastq file just named with .fastq.gz?

It was originally a fastq.gz file, demultiplexed using cutadapt and imported into QIIME 2 as a .qza file.

Was it really compressed, or did the file name just end with .gz? I've seen that cause issues before...
(You could run head -c 20 file.fastq.gz to check: a plain .fastq will print a readable header starting with @, while a truly compressed file will print unreadable binary.)
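If the binary output is hard to judge by eye, the same check can be done on the first two bytes: gzip streams begin with the magic bytes 0x1f 0x8b. A minimal Python sketch (looks_gzipped is a hypothetical helper name, and a match only shows the file starts like a gzip stream, not that the whole file is intact):

```python
def looks_gzipped(path):
    """Return True if the file starts with the gzip magic bytes 0x1f 0x8b."""
    with open(path, "rb") as handle:
        return handle.read(2) == b"\x1f\x8b"
```

A False result for a file named .fastq.gz would suggest it is plain text that only carries the .gz extension.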

Hello! I seem to be having a very similar problem. I'm using the same method and getting the same error message, again for line 5. I tried viewing the file as you suggested, but this is what I got, and I don't see anything missing:

@M02300:41:000000000-K3GLV:1:1101:15611:1334 2:N:0:NGCTTGATA+CGCCTCGGT
NGGCTGCGTTCTTCATCGATGCGAGAGCCAAGG
+
#1>A>>A>A>>AF3FGGGCE3EEGCEGG0B0FF
@M02300:41:000000000-K3GLV:1:1101:14200:1504 2:N:0:GGCTTGATA+CGCCTCGGT
CGGCTGCGTTCTTCATCGATGCGAGAGCCAAGAGATCCTTTGTTGAAAGTTTTTACTTTAGAACAGATATATATTAAGGAGTTATTGCTTTAATGCGACGG
+

A33A2>AABGGCGGGGGGGGGGGGHFHFHFH2GFH5BFBGGFFDFFHHHH1B3GHH5535BA33B55D55555D533?23D55555D4BFF4B1>//>
@M02300:41:000000000-K3GLV:1:1101:14521:1605 2:N:0:GGCTTGATA+CGCCTCGGT
CGGCTGCGTTCTTCATCGATGCGAGAGCCAAGAGATCCTTTGTTGAAAGCTTTAATTCTTAGAATTGATTCAGACACAAATCGGCTATGAGCATGAAAGCGGCG

11>>>@AD1>>AGGGGGFGGGGCFCGEHHGGHFHGG1FBBFHBGHFHFF01FGGFFHGFFFFGFEHBHHGHHHHFFFHGHEBEEC/ED1BGFHHFEGHBB///>
@M02300:41:000000000-K3GLV:1:1101:13653:1616 2:N:0:GGCTTGATA+CGCCTCGGT
CGGCTGCGTTCTTCATCGATGCAAGAGCCAAGGGATCCTTTGTTGAAAGTTTTAATTCTAGATCAGATTCAGACACAGATGATATTGCATTAATGCGACGCCACGGGGGCGTCGAAGCACGGGTCGATTCCACGGTTTTTTGGG
+
3>>>[email protected]FFGFHFFHHHHHHGHGHHFEHHGHHHHHGFECGCG?EE/E?C/<<@CEG?GB-@CGGGC.<GB0<.C<CGEE--::