Problem importing fastq.gz files

Hi,
I have paired-end fastq files that I converted to fastq.gz using the command below:

gzip extracted/metagseqs_fastq/*.fastq

However, when I try importing the files using

qiime tools import --type "SampleData[PairedEndSequencesWithQuality]" --input-path fastq_folder --source-format CasavaOneEightSingleLanePerSampleDirFmt --output-path demux-paired-end.qza

I get the error below:

/Users/Jewelna/miniconda3/envs/qiime2-2017.9/lib/python3.5/site-packages/skbio/io/registry.py:922: FormatIdentificationWarning: '_fastq_sniffer' has encountered a problem.
Please send the following to our issue tracker at
https://github.com/biocore/scikit-bio/issues

Traceback (most recent call last):
  File "/Users/Jewelna/miniconda3/envs/qiime2-2017.9/lib/python3.5/site-packages/skbio/io/registry.py", line 914, in wrapped_sniffer
    return sniffer(fh)
  File "/Users/Jewelna/miniconda3/envs/qiime2-2017.9/lib/python3.5/site-packages/skbio/io/format/fastq.py", line 320, in _fastq_sniffer
    if split_length == 10 and description[1] in 'YN':
IndexError: list index out of range

  FormatIdentificationWarning)
There was a problem importing extracted/metagseqs_fastq:

  extracted/metagseqs_fastq/G1_S35_L001_R2_001.fastq.gz is not a(n)
  FastqGzFormat file

Could someone help explain what is happening?

Hi @Jewelna_Osei-Poku! It sounds like scikit-bio, which is used behind the scenes to process your fastq files, is upset! It looks like the description portion of the fastq records isn’t formatted the way scikit-bio expects. Could you provide at least the first few records of G1_S35_L001_R2_001.fastq.gz? That would really help us pin down the issue. Thanks!
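
In case it helps, here is a minimal sketch of one way to pull the first few records out of a gzipped fastq file using only the Python standard library. It assumes plain four-line records (header, sequence, "+", quality); the function name head_fastq_gz is hypothetical and is not part of QIIME 2 or scikit-bio.

```python
import gzip
from itertools import islice


def head_fastq_gz(path, n_records=3):
    """Return the first n_records records as lists of four lines each.

    Assumption: every record is exactly four lines
    (header, sequence, '+', quality), with no line wrapping.
    """
    with gzip.open(path, "rt") as handle:
        # Read only the lines we need instead of the whole file.
        lines = [line.rstrip("\n") for line in islice(handle, 4 * n_records)]
    # Group the flat list of lines into records of four lines.
    return [lines[i:i + 4] for i in range(0, len(lines), 4)]
```

Calling head_fastq_gz("G1_S35_L001_R2_001.fastq.gz") and printing each returned record should give something that can be pasted into a reply; the header line (the one starting with "@") is the part that carries the description.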

To add on to @thermokarst’s answer: we ended up swapping out some of this code in 2017.10. Since you are using 2017.9, I would recommend upgrading. That should also fix the issue.

Thank you @thermokarst for the response. Please find below the requested records from the file in question. However, I should say that the error comes up with other file names when I re-run the command, so it is not always the same file that triggers it.

@M01232:58:000000000-B8WPP:1:1101:9955:2217 2:N:0:35
GCTGCTGGCACGAAGTTAGCCGGTGCTTATTCTTTGGGTACCGTCAGAACAATCGGGTATTAGCCGACTGCTTTTCTTTCCCAACAAAAGGGCTTTACAACCCGAAGGCCTTCTTCACCCACGCGGTATGGCTGGATCAGGCTTGCGCCCATTGTCCACTATTCCCCACTGCTGCCCCCCGTAGGAGACTGGACCGTGTTTCCGTTCCAGATTGGCTGAGCATCCTCTCAGACCAACTACGCATCGTCCCCTTGGGGGTCCTCTACACACACACATAGATAAAACGACATCGGCC
+
GGGDEGGCFFFEGGEFEEFFCFDFECFGFGGFGGCGGGGCFEFFGEDGCCAFGFGG>ECFFD<<EFEFG+CFGGFEGGFC,5C=F,[email protected]<EEFFFGGGGGGGGGG,@FEG=:B:D8FGE?+<B:7DE,@DC3,3:E,E;:>:E786D,>,4=4:C=E8EE,=+>7E*=224:3+*++37/0)94)//97)8>)9).1)1):7DF44)17))):09C7517:77)/***)))())8.
@M01232:58:000000000-B8WPP:1:1101:17718:2864 2:N:0:35
GCGGCTGCTGGTAATCGGGAACCTGCACGGGGCCTTGCACCTGGACGGGGTACACGCCACACCGCTGCTGCTGCGAATCGGGAACCTGCACGGGGCCTTGCACCTGGACGGGGTACGCACCACTCCGCTGCTGCTGCGTCACGGCCCGCTGCGCGGGCCCCTGCCCCCTGACGGGGTCCGCACCACCCCGCTGCGGGTGCGACACCGCCCGGTGCCGCAGACGCATCCCGCTACCGTGGTCGCCCGACTGCCGCTCCGGGTCTGGCAGCTTCAGGAGGCGGTGGCGCAGGCGCA
+
GGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGFGGGGGGGGGG7FGGGFGGGGDGGGGGGGGGG7FFGCD:>DFGGGEE++=>CFFFF<<DFFCF[email protected]+8DFGG>[email protected]:FF6
>1>****4<CF
1211/2***A;:8/22**/2::E/881157)77:)97))18119))0)0/))0)/7)17)8.100))1)))1)00)9./)0)01)944)**+1):4)73))9)).))-))))).
@M01232:58:000000000-B8WPP:1:1101:18878:3414 2:N:0:35
GCGGCTGCTGGCACGTAGTTGGCCGGAGCTTCTTCTGCAGGTACCGTCATTATCGTCCCTGCTTGAACGAGGTTTACAATCCGAAGACCGTCATCCCTCACGCGGCGTTGCTGCGTCACGCTTTCGCCCATTGCGCAAGATTCCCCACTGCTGCCTCCCGTAGGAGTCTGGGCCGTGTCTCAGTCCCAGCGTGGCTGATCGTCCTCTCAGACCAGCTACCCGTCGTTGCCTTGGTGAGCCCTTACCCCCCCAACAAGCTGATAGGCCGCGAGCCCCCCCCGAAGCCGAACCCCTT
+
[email protected]<[email protected]@EFGGGGGGFGGGFGGGGGGG8<[email protected],?,?CFG:[email protected]:FFFGEFDGFGGF::B:+>=A<,:3>[email protected][email protected],><CFGGC[email protected]:DF,98,D8EC,:E5CFBF?C,A=BCFGGGFGEC9>902=8;/+A*/9CF99:?FFGGGG9>7).1<09:9297C42)9C627)>3)0384>96=:;)77CC:7(/94).8)7)))4)72>))1).)
@M01232:58:000000000-B8WPP:1:1101:22497:4170 2:N:0:35
GCGGCTGCTGGCACGTAGTTGGCCGGGGCTTCTTCTGCAGGTACCGTCATCTTCGTCCCTGCTGAAAGGGGGTTACAACCCGAGAGCCTTCATCCCCCACGCGGTGTTGCTGCCTCAGGCTTGCGCCCATTGGGCAAGATTCCCTACTGCTGCCTCCCGTAGGAGTCTGGGCCGTGTCTCGGTCCCAGTGTGGCTGACCATCCTCTCAGACCAGCTACCGATCGTCGCCTTTGTCCGCCCTTACCCCACCAAATAGCTAATCGGACGCAGGCCCCTCACACAGCCACATGACCGC
+
[email protected]DFGFGGGGGGGGGGGGGGGGGGGGGGGGEEGFFFFGGGGG:FFCFFDGGGGGGGCG,@FFG>FGGGGCCFECFGFGGGGGGGGFCFC,2:[email protected],:CEGGFE+>C8:*/><0++45=ECC;668+8C>7C7>07EGGGGG4C)>CCC>5
:7**).9))07>0/17>54C)2/:7682.77C)))1.49C(5))/)04)).02)))
@M01232:58:000000000-B8WPP:1:1101:19938:4272 2:N:0:35
GCGGCTGCTGGCACGTAGTTAGCCGGTGCTTCTTTACCCATTACCGTCACTCACGCTTCGTCACAGGCGAAAGCGGTTTCCAACCCGCAGGCCGTCATCCCCCACGCGGCGTTGCTGCATCTGGCTTCCGCCCATTGTGCACTATTCCCCACTGCTGCCTCCCGTAGTAGTCTGGGCCGTATCCCAGTCCCAATGTGGCCGGTCACCCTCTCCGGCCGGCTACCCGTCACAGCCATTGTAAGCCCCTACCCCACCAACAAGCTGCTCCGCCGCGCTTACACCCCCAACCCCCCA
+
FGFGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGA9<[email protected]GGGGFCEFEE,[email protected]<FG,@DGGDE:<@@:@C,[email protected][email protected][email protected]FEF:@3CC9><>3D,@9,>CFG
>CGC,@F9,[email protected]?C7B,46,<:C88BE81,6+>+A8A***[email protected]+>+0?C69CF513:7)//2?4::7/)))799971<9>5**19/0;[email protected]<FG47)[email protected]>F692)1*:>>))())//**0)(.),)11C35
@M01232:58:000000000-B8WPP:1:1101:15116:4846 2:N:0:35
CGGCTGCTGGCACGTAGTTAGCCGTGGCTTTCTGGTTAGATACCGTCAAGGGATGAACAGTTACTCTCATCCTTGTTCTTCTCTAACAACAGAGTTTTACGATCCGAAAACCTTCTTCACTCACGCGGCGTTGCTCGGTCAGACTTTCGTCCATTGCCGAAGATTCCCTACTGCTGCCTCCCGTAGGAGTCTGGGCCGTGTCTCAGTTCCAGTGTGGCTGACCATCCTCTCAGACCAGCTAACGAACGTCGCCTTGGGCCGCCATTACCCCTCCCACCAGCTAATCGAACGCA

Actually, Evan Bolyen was right. Upgrading to 2017.10 solved the issue. Thanks guys.
