Problem importing fastq.gz files

Hi,
I have paired-end fastq files that I converted to fastq.gz using the command below:

gzip extracted/metagseqs_fastq/*.fastq

However, when I try importing the files using

qiime tools import --type "SampleData[PairedEndSequencesWithQuality]" --input-path fastq_folder --source-format CasavaOneEightSingleLanePerSampleDirFmt --output-path demux-paired-end.qza

I get the error below:

/Users/Jewelna/miniconda3/envs/qiime2-2017.9/lib/python3.5/site-packages/skbio/io/registry.py:922: FormatIdentificationWarning: '_fastq_sniffer' has encountered a problem.
Please send the following to our issue tracker at
https://github.com/biocore/scikit-bio/issues

Traceback (most recent call last):
  File "/Users/Jewelna/miniconda3/envs/qiime2-2017.9/lib/python3.5/site-packages/skbio/io/registry.py", line 914, in wrapped_sniffer
    return sniffer(fh)
  File "/Users/Jewelna/miniconda3/envs/qiime2-2017.9/lib/python3.5/site-packages/skbio/io/format/fastq.py", line 320, in _fastq_sniffer
    if split_length == 10 and description[1] in 'YN':
IndexError: list index out of range

  FormatIdentificationWarning)
There was a problem importing extracted/metagseqs_fastq:

  extracted/metagseqs_fastq/G1_S35_L001_R2_001.fastq.gz is not a(n)
  FastqGzFormat file

Could someone help explain what is happening?

Hi @Jewelna_Osei-Poku! It sounds like scikit-bio, which is used behind the scenes to process your fastq files, is upset! It looks like the description portion of the fastq records isn't formatted the way scikit-bio expects. Could you provide at least the first few records of G1_S35_L001_R2_001.fastq.gz? That would really help us pin down the issue. Thanks!
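
In case it helps with pulling those records out, here is a minimal sketch for looking at the header lines of a gzipped fastq file. The traceback fails on `description[1]`, which suggests the sniffer splits the text after the read ID on `:`, so the sketch does the same split for inspection. It assumes plain 4-line records, and `first_headers` and `description_fields` are hypothetical helper names, not part of scikit-bio or QIIME 2.

```python
import gzip
from itertools import islice


def first_headers(path, n_records=5):
    """Return the header lines of the first n_records FASTQ records."""
    with gzip.open(path, "rt") as fh:
        # Assumption: each record is exactly 4 lines
        # (header, sequence, '+', quality).
        lines = list(islice(fh, 4 * n_records))
    return [line.rstrip("\n") for line in lines[0::4]]


def description_fields(header):
    """Split the part of a header after the first whitespace on ':'."""
    parts = header.split(None, 1)
    description = parts[1] if len(parts) > 1 else ""
    return description.split(":") if description else []


if __name__ == "__main__":
    import sys

    # Usage: python inspect_headers.py some_file.fastq.gz
    for header in first_headers(sys.argv[1]):
        print(header, "->", description_fields(header))
```

A header with no text after the read ID gives an empty list here, which is the kind of record where an index like `description[1]` would fail.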

To add on to @thermokarst's answer: we ended up swapping out some of this code in 2017.10. Since you are using 2017.9, I would recommend upgrading, which should also fix the issue.

Thank you, @thermokarst, for the response. Please find below the records requested for the file in question. However, I should say that the error names a different file each time I re-run the command, so it is not always the same file that gives the error.

@M01232:58:000000000-B8WPP:1:1101:9955:2217 2:N:0:35
GCTGCTGGCACGAAGTTAGCCGGTGCTTATTCTTTGGGTACCGTCAGAACAATCGGGTATTAGCCGACTGCTTTTCTTTCCCAACAAAAGGGCTTTACAACCCGAAGGCCTTCTTCACCCACGCGGTATGGCTGGATCAGGCTTGCGCCCATTGTCCACTATTCCCCACTGCTGCCCCCCGTAGGAGACTGGACCGTGTTTCCGTTCCAGATTGGCTGAGCATCCTCTCAGACCAACTACGCATCGTCCCCTTGGGGGTCCTCTACACACACACATAGATAAAACGACATCGGCC
+
GGGDEGGCFFFEGGEFEEFFCFDFECFGFGGFGGCGGGGCFEFFGEDGCCAFGFGG>ECFFD<<EFEFG+CFGGFEGGFC,5C=F,C49EFC+BFGFDE?A=FGFC@EDGEEFEF<EEFFFGGGGGGGGGG,@FEG=:B:D8FGE?+<B:7DE,@DC3,3:E,E;:>:E786D,>,,4=4:C=E8EE,=+>7E*=224:3+*++37/0)94)//97)8>)9).1)1):7DF44)17))):09C7517:77)/***)))())8.
@M01232:58:000000000-B8WPP:1:1101:17718:2864 2:N:0:35
GCGGCTGCTGGTAATCGGGAACCTGCACGGGGCCTTGCACCTGGACGGGGTACACGCCACACCGCTGCTGCTGCGAATCGGGAACCTGCACGGGGCCTTGCACCTGGACGGGGTACGCACCACTCCGCTGCTGCTGCGTCACGGCCCGCTGCGCGGGCCCCTGCCCCCTGACGGGGTCCGCACCACCCCGCTGCGGGTGCGACACCGCCCGGTGCCGCAGACGCATCCCGCTACCGTGGTCGCCCGACTGCCGCTCCGGGTCTGGCAGCTTCAGGAGGCGGTGGCGCAGGCGCA
+
GGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGFGGGGGGGGGG7FGGGFGGGGDGGGGGGGGGG7FFGCD:>DFGGGEE++=>CFFFF<<DFFCFF@FGCCB+8DFGG>BF@FCG:FF6
>1>****4<CF
1211/2***A;:8/22**/2::E/881157)77:)97))18119))0)0/))0)/7)17)8.100))1)))1)00)9./)0)01)944)**+1):4)73))9)).))-))))).
@M01232:58:000000000-B8WPP:1:1101:18878:3414 2:N:0:35
GCGGCTGCTGGCACGTAGTTGGCCGGAGCTTCTTCTGCAGGTACCGTCATTATCGTCCCTGCTTGAACGAGGTTTACAATCCGAAGACCGTCATCCCTCACGCGGCGTTGCTGCGTCACGCTTTCGCCCATTGCGCAAGATTCCCCACTGCTGCCTCCCGTAGGAGTCTGGGCCGTGTCTCAGTCCCAGCGTGGCTGATCGTCCTCTCAGACCAGCTACCCGTCGTTGCCTTGGTGAGCCCTTACCCCCCCAACAAGCTGATAGGCCGCGAGCCCCCCCCGAAGCCGAACCCCTT
+
BEFCGG@FGF-FFFFCFGGG<CFGG7+@@EFGGGGGGFGGGFGGGGGGG8<EEFC@CAAFGF,?,?CFG:?F@:FFFGEFDGFGGF::B:+>=A<,:3>=+8@CEC9++@CF,><CFGGC3@AFGCF7FG:DF,98,D8EC,:E5CFBF?C,,,A=BCFGGGFGEC9>902=8;/+A*/9CF99:?FFGGGG9>7).1<09:9297C42)9C627)>3)0384>96=:;)77CC:7(/94).8)7)))4)72>))1).)
@M01232:58:000000000-B8WPP:1:1101:22497:4170 2:N:0:35
GCGGCTGCTGGCACGTAGTTGGCCGGGGCTTCTTCTGCAGGTACCGTCATCTTCGTCCCTGCTGAAAGGGGGTTACAACCCGAGAGCCTTCATCCCCCACGCGGTGTTGCTGCCTCAGGCTTGCGCCCATTGGGCAAGATTCCCTACTGCTGCCTCCCGTAGGAGTCTGGGCCGTGTCTCGGTCCCAGTGTGGCTGACCATCCTCTCAGACCAGCTACCGATCGTCGCCTTTGTCCGCCCTTACCCCACCAAATAGCTAATCGGACGCAGGCCCCTCACACAGCCACATGACCGC
+
CDGGGGGDFGGGGGGGGGGGGGFGGGGGGGGGGGGGGGGFFFGGGGGGEGGGGGGGGGGGGF@FDFGFGGGGGGGGGGGGGGGGGGGGGGGGEEGFFFFGGGGG:FFCFFDGGGGGGGCG,@FFG>FGGGGCCFECFGFGGGGGGGGFCFC,2:DE8EC@CEGGGCCFGCF,:CEGGFE+>C8:*/><0++45=ECC;668+8C>7C7>07EGGGGG4C)>CCC>5
:7**).9))07>0/17>54C)2/:7682.77C)))1.49C(5))/)04)).02)))
@M01232:58:000000000-B8WPP:1:1101:19938:4272 2:N:0:35
GCGGCTGCTGGCACGTAGTTAGCCGGTGCTTCTTTACCCATTACCGTCACTCACGCTTCGTCACAGGCGAAAGCGGTTTCCAACCCGCAGGCCGTCATCCCCCACGCGGCGTTGCTGCATCTGGCTTCCGCCCATTGTGCACTATTCCCCACTGCTGCCTCCCGTAGTAGTCTGGGCCGTATCCCAGTCCCAATGTGGCCGGTCACCCTCTCCGGCCGGCTACCCGTCACAGCCATTGTAAGCCCCTACCCCACCAACAAGCTGCTCCGCCGCGCTTACACCCCCAACCCCCCA
+
FGFGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGA9<E8@FFGGGGGGGGFCEFEE,6C@CF<FG,@DGGDE:<@@:@C,CCF=CC7+4@+6+48@F5EE+44@FFEF:@3CC9><>3D,@9,>CFG
>CGC,,@F9,72727@F2?C7B,,46,<:C88BE81,6+>+A8A***=@+>+0?C69CF513:7)//2?4::7/)))799971<9>5**19/0;0967@<FG47).1.5B@5>F692)1*:>>))())//**0)(.),)11C35
@M01232:58:000000000-B8WPP:1:1101:15116:4846 2:N:0:35
CGGCTGCTGGCACGTAGTTAGCCGTGGCTTTCTGGTTAGATACCGTCAAGGGATGAACAGTTACTCTCATCCTTGTTCTTCTCTAACAACAGAGTTTTACGATCCGAAAACCTTCTTCACTCACGCGGCGTTGCTCGGTCAGACTTTCGTCCATTGCCGAAGATTCCCTACTGCTGCCTCCCGTAGGAGTCTGGGCCGTGTCTCAGTTCCAGTGTGGCTGACCATCCTCTCAGACCAGCTAACGAACGTCGCCTTGGGCCGCCATTACCCCTCCCACCAGCTAATCGAACGCA
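
Because the error names a different file on each run, one way to rule out a damaged file is to check every `.fastq.gz` in the folder independently of QIIME 2. The following is only a rough sketch under the assumption of 4-line records; it is not the check that QIIME 2 or scikit-bio performs, and `check_fastq_gz` and `check_directory` are hypothetical names.

```python
import gzip
from pathlib import Path


def check_record(record, number):
    """Return problems found in one 4-line FASTQ record."""
    header, seq, plus, qual = record
    problems = []
    if not header.startswith("@"):
        problems.append("record {}: header does not start with '@'".format(number))
    if not plus.startswith("+"):
        problems.append("record {}: third line does not start with '+'".format(number))
    if len(seq) != len(qual):
        problems.append(
            "record {}: sequence length {} != quality length {}".format(
                number, len(seq), len(qual)
            )
        )
    return problems


def check_fastq_gz(path):
    """Stream one gzipped FASTQ file and return a list of problems."""
    problems = []
    record = []
    number = 0
    try:
        with gzip.open(str(path), "rt") as fh:
            for line in fh:
                record.append(line.rstrip("\n"))
                if len(record) == 4:
                    number += 1
                    problems.extend(check_record(record, number))
                    record = []
    except (OSError, EOFError, UnicodeDecodeError) as exc:
        # Not gzip data, a truncated archive, or undecodable bytes.
        problems.append("cannot decompress: {}".format(exc))
        return problems
    if record:
        problems.append(
            "file ends with an incomplete record ({} trailing line(s))".format(len(record))
        )
    return problems


def check_directory(directory):
    """Map each *.fastq.gz file name in directory to its list of problems."""
    return {
        p.name: check_fastq_gz(p)
        for p in sorted(Path(directory).glob("*.fastq.gz"))
    }
```

Calling `check_directory("extracted/metagseqs_fastq")` returns a dictionary keyed by file name. An empty list for every file would suggest the files themselves are intact and the problem lies in how they are being read.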

Actually, Evan Bolyen was right. Upgrading to 2017.10 solved the issue. Thanks guys.
