Import of demultiplexed paired end fastq with manifest

Hello. I am also having difficulty importing fastq (not fastq.gz) files using the manifest approach. I am using a manifest because the sequencing facility's in-house script changes the fastq file names. The file names do contain underscores, which may play into this issue. I checked my manifest file for trailing spaces and correct headers. I am running QIIME 2 in VirtualBox.

My qiime script:

qiime tools import \
  --type 'SampleData[PairedEndSequencesWithQuality]' \
  --input-path /media/sf_qiime/MMPC/Berryman/pe-33-manifest.csv \
  --output-path /media/sf_qiime/MMPC/Berryman/paired-end-demux.qza \
  --source-format PairedEndFastqManifestPhred33

Example of a portion of my manifest file:

sample-id,absolute-filepath,direction
G691,/media/sf_qiime/MMPC/Berryman/TKnotts0517_G691_1.fastq,forward
G693,/media/sf_qiime/MMPC/Berryman/TKnotts0517_G693_1.fastq,forward
G725,/media/sf_qiime/MMPC/Berryman/TKnotts0517_G725_1.fastq,forward
G691,/media/sf_qiime/MMPC/Berryman/TKnotts0517_G691_2.fastq,reverse
G693,/media/sf_qiime/MMPC/Berryman/TKnotts0517_G693_2.fastq,reverse
G725,/media/sf_qiime/MMPC/Berryman/TKnotts0517_G725_2.fastq,reverse
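
For readers who want to automate the manifest checks mentioned above (header spelling, trailing spaces, forward/reverse pairing), here is a minimal Python sketch. The function name `check_manifest` and its messages are made up for illustration and are not part of QIIME 2. It only inspects the manifest text and does not verify that the files exist.

```python
import csv
import io
from collections import defaultdict

# Header used by the manifest shown in this post.
EXPECTED_HEADER = ["sample-id", "absolute-filepath", "direction"]


def check_manifest(text):
    """Return a list of human-readable problems found in the manifest text."""
    problems = []
    rows = list(csv.reader(io.StringIO(text)))
    if not rows or rows[0] != EXPECTED_HEADER:
        problems.append("header must be exactly: " + ",".join(EXPECTED_HEADER))
        return problems

    directions = defaultdict(set)
    for lineno, row in enumerate(rows[1:], start=2):
        if not row:
            continue  # skip blank lines
        if len(row) != 3:
            problems.append(f"line {lineno}: expected 3 fields, got {len(row)}")
            continue
        for field in row:
            if field != field.strip():
                problems.append(f"line {lineno}: stray whitespace in {field!r}")
        sample, _path, direction = (field.strip() for field in row)
        if direction not in ("forward", "reverse"):
            problems.append(f"line {lineno}: unknown direction {direction!r}")
            continue
        directions[sample].add(direction)

    # Paired-end import needs both directions for every sample.
    for sample, seen in sorted(directions.items()):
        if seen != {"forward", "reverse"}:
            problems.append(f"{sample}: missing forward or reverse entry")
    return problems
```

Calling `check_manifest(open("pe-33-manifest.csv").read())` returns an empty list when none of these problems are present.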

Error message seems to be similar to what others have reported.

'/home/qiime2/miniconda/envs/qiime2-2017.8/lib/python3.5/site-packages/skbio/io/registry.py:922: FormatIdentificationWarning: '_fastq_sniffer' has encountered a problem.
Please send the following to our issue tracker at
https://github.com/biocore/scikit-bio/issues

Traceback (most recent call last):
  File "/home/qiime2/miniconda/envs/qiime2-2017.8/lib/python3.5/site-packages/skbio/io/registry.py", line 914, in wrapped_sniffer
    return sniffer(fh)
  File "/home/qiime2/miniconda/envs/qiime2-2017.8/lib/python3.5/site-packages/skbio/io/format/fastq.py", line 320, in _fastq_sniffer
    if split_length == 10 and description[1] in 'YN':
IndexError: list index out of range

  FormatIdentificationWarning)
Traceback (most recent call last):
  File "/home/qiime2/miniconda/envs/qiime2-2017.8/bin/qiime", line 6, in <module>
    sys.exit(q2cli.__main__.qiime())
  File "/home/qiime2/miniconda/envs/qiime2-2017.8/lib/python3.5/site-packages/click/core.py", line 722, in __call__
    return self.main(*args, **kwargs)
  File "/home/qiime2/miniconda/envs/qiime2-2017.8/lib/python3.5/site-packages/click/core.py", line 697, in main
    rv = self.invoke(ctx)
  File "/home/qiime2/miniconda/envs/qiime2-2017.8/lib/python3.5/site-packages/click/core.py", line 1066, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/qiime2/miniconda/envs/qiime2-2017.8/lib/python3.5/site-packages/click/core.py", line 1066, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/qiime2/miniconda/envs/qiime2-2017.8/lib/python3.5/site-packages/click/core.py", line 895, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/qiime2/miniconda/envs/qiime2-2017.8/lib/python3.5/site-packages/click/core.py", line 535, in invoke
    return callback(*args, **kwargs)
  File "/home/qiime2/miniconda/envs/qiime2-2017.8/lib/python3.5/site-packages/q2cli/tools.py", line 111, in import_data
    view_type=source_format)
  File "/home/qiime2/miniconda/envs/qiime2-2017.8/lib/python3.5/site-packages/qiime2/sdk/result.py", line 192, in import_data
    return cls._from_view(type_, view, view_type, provenance_capture)
  File "/home/qiime2/miniconda/envs/qiime2-2017.8/lib/python3.5/site-packages/qiime2/sdk/result.py", line 217, in _from_view
    result = transformation(view)
  File "/home/qiime2/miniconda/envs/qiime2-2017.8/lib/python3.5/site-packages/qiime2/core/transform.py", line 62, in transformation
    other.validate(new_view)
  File "/home/qiime2/miniconda/envs/qiime2-2017.8/lib/python3.5/site-packages/qiime2/core/transform.py", line 131, in validate
    view.validate()
  File "/home/qiime2/miniconda/envs/qiime2-2017.8/lib/python3.5/site-packages/qiime2/plugin/model/directory_format.py", line 168, in validate
    getattr(self, field)._validate_members(collected_paths)
  File "/home/qiime2/miniconda/envs/qiime2-2017.8/lib/python3.5/site-packages/qiime2/plugin/model/directory_format.py", line 100, in _validate_members
    self.format(path, mode='r').validate()
  File "/home/qiime2/miniconda/envs/qiime2-2017.8/lib/python3.5/site-packages/qiime2/plugin/model/file_format.py", line 31, in validate
    % (self.path, self.__class__.__name__))
ValueError: InPath('/tmp/q2-SingleLanePerSamplePairedEndFastqDirFmt-wvek0f3l/G860_38_L001_R2_001.fastq.gz') is not formatted as a FastqGzFormat file.'

Please let me know if you need further information to help troubleshoot this issue.
Thanks!
Trina

Hey @taknotts!

Could you provide the first few records of the reverse fastq file for sample G860 (I’m guessing TKnotts0517_G860_2.fastq is the file)? That seems to be where things go wrong.

Thanks so much!

Hi Evan.
Actually, it errors on a different sequence file each time I run it. I'm not sure whether it just picks the first file at random, in which case it may have an issue with any and all of my sequence files. Here is the output of head on TKnotts0517_G860_2.fastq:

@M01533:390:000000000-B4NPH:1:1101:15039:1594:N:0:0/2
TACGGAGGATGCGAGCGTTATCCGGATTTATTGGGTTTAAAGGGTGCGCAGGCGGTCTGTTAAGTCAGCGGTCAAAGCCCGGGGCTCAACCCCGGTCCGCCGTTGAAACTGGCAGTCTCGAGTTGGAGAGAAGTATGCGGAATGCGCGGTGTAGCGGTGAAATGCATAGATATCGCGCAGAACTCCGATTGCGAAGGCAGCATGCCGGCTCCACACTGACGCTGAGGCACGAAAGCGTGGGTATCGAACC
+
BBBAABBBBFBFGFGGEEGEGGGEGGGDHHHHHHGGGGFGHGHGHGCGFGGGFGGGGGHHHHHHHHHHH?EEEGFFHHHHGGGGGGHHHHHGGGGGGGGGGGGGGHHGHFHGHHGHHHHH0EGFHFGHFGFHHHGGGHGGGDFGFGFGGABEFFCFAFF9/[email protected][email protected]/.;[email protected];A:/.-
@M01533:390:000000000-B4NPH:1:1101:14571:1636:N:0:0/2
TACGGAGGATGCGAGCGTTATCCGGATTTATTGGGTTTAAAGGGTGCGCAGGCGGTCTGTTAAGTCAGCGGTCAAAGCCCGGGGCTCAACCCCGGTCCGCCGTTGAAACTGGCAGTCTCGAGTTGGAGAGAAGTATGCGGAATGCGCGGTGTAGCGGTGATATGCATAGATATCGCGCAGAACTCCGATTGCGAAGGCAGCATGCCGGCTCCACACTGACGCTGAGGCACGAAAGCGTGGGTATCGAACA
+
AAABAABBBFFFGGGGCEGEGGHGEFGGGHHH5FGEFEFGHHHGHGGGGGGGGGGGGGHGGFHHHHFHH1EFGE3GFHHHGGGGBCGHHHHGGGG?DGGGGGGGGHHFHHHHHHGGHFHHCDGGFFGFGDEHHHHHGHGGFGGGFDGCG.BAFFFFA9BEFFFFFFFFFFFFFFEDFFDB=B;[email protected]?FFFFFBFF;BFFDEFFFFFFFFFFFFFFFFEFFFBFFF?AD;CDBBFD9B;>
@M01533:390:000000000-B4NPH:1:1101:13598:1664:N:0:0/2
TACGGAGGATGCGAGCGTTATCCGGATTTATTGGGTTTAAAGGGTGCGCAGGCGGCACGGCAAGCCGGCGGTCAAATCGCGGGGCTCAACCCCGCCCAGCCGTCGGAACTGCCGGGCTAGAGTGGGCGAGAAGTGCGCGGAATGCGTGGTGTAGCGGTGAAATGCATAGATATCACGCAGAACTCCGATTGCGAAGGCAGCGCACCGGCGCCCGACTGACGCTCGGGCACGAAAGCGTGGGTATCGAAAA

Thanks,

Trina

Hi @taknotts,

It looks like you’ve got the perfect fastq header to trip up scikit-bio’s format detection: the header has 10 segments separated by : but no description field. There’s nothing wrong with your data; there’s just a bug in scikit-bio, which makes an assumption that it shouldn’t.
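
To make the failure concrete, here is a small standalone Python sketch of the check that appears in the traceback (`split_length == 10 and description[1] in 'YN'`). It does not import scikit-bio. The helper names are invented for illustration, and splitting the header into id and description at the first whitespace is an assumption about how the parser behaves. The guarded variant is just one possible way to avoid the crash, not necessarily how the library addresses it.

```python
def split_header(header_line):
    """Split a FASTQ header into (id, description) at the first whitespace.

    Assumption: this mirrors how the parser separates the two fields.
    """
    parts = header_line.lstrip("@").split(None, 1)
    seq_id = parts[0]
    description = parts[1] if len(parts) > 1 else ""
    return seq_id, description


def naive_is_illumina18(header_line):
    """Same condition as in the traceback: indexes the description unguarded."""
    seq_id, description = split_header(header_line)
    split_length = len((seq_id + description).split(":"))
    desc_fields = description.split(":")
    # With an empty description, desc_fields is [''] and [1] raises IndexError.
    return split_length == 10 and desc_fields[1] in "YN"


def guarded_is_illumina18(header_line):
    """Hypothetical guard: only index the description if it has enough fields."""
    seq_id, description = split_header(header_line)
    split_length = len((seq_id + description).split(":"))
    desc_fields = description.split(":")
    return (split_length == 10
            and len(desc_fields) > 1
            and desc_fields[1] in ("Y", "N"))


standard = "@M01533:390:000000000-B4NPH:1:1101:15039:1594 2:N:0:0"
from_thread = "@M01533:390:000000000-B4NPH:1:1101:15039:1594:N:0:0/2"

print(naive_is_illumina18(standard))
print(guarded_is_illumina18(from_thread))
try:
    naive_is_illumina18(from_thread)
except IndexError as err:
    print("IndexError:", err)
```

Both headers split into 10 colon-separated segments, but only the standard one has a description to index into, so the unguarded check raises `IndexError` on the header from this thread.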

I’m afraid there aren’t a lot of options at the moment for importing this data as is.

Where did you get this data from? This is the first time we’ve seen this error, but I’m sure others will run into it eventually.

In the meantime, I’ll discuss our options with the team and figure out how we’ll fix this. I’ll post an update when I know more.

Thanks so much for reporting this!

Thanks Evan.

It is part of the sequencing facility’s in-house script. They were able to provide me with a Perl script to run to fix the headers. I am happy to report that I was able to import my sequences and got my paired-end-demux artifact file. Thanks for your help diagnosing the error. On to QIIME 2-ing!
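
The facility's script is not shown in this thread. For anyone in the same situation without such a script, here is a hypothetical Python sketch of one way to rewrite headers of the form shown above (`...:1594:N:0:0/2`) into the common Illumina 1.8 layout (`...:1594 2:N:0:0`). The regular expression, the function names, and the target layout are all assumptions for illustration. The sketch also assumes plain four-line FASTQ records.

```python
import re

# Matches headers like the ones in this thread:
#   @<7 colon-separated fields>:<filter flag>:<control>:<index>/<read number>
ODD_HEADER = re.compile(
    r"^@((?:[^:\s]+:){6}[^:\s]+):([YN]):([^:\s]+):([^:/\s]+)/([12])$"
)


def rewrite_header(line):
    """Move the trailing fields into a separate description field.

    Lines that do not match the expected pattern are returned unchanged.
    """
    match = ODD_HEADER.match(line.rstrip("\n"))
    if match is None:
        return line
    coords, flag, control, index, read = match.groups()
    newline = "\n" if line.endswith("\n") else ""
    return f"@{coords} {read}:{flag}:{control}:{index}{newline}"


def rewrite_fastq(lines):
    """Yield lines, rewriting only the first line of each 4-line record.

    Quality lines may also start with '@', so they are never touched.
    """
    for i, line in enumerate(lines):
        yield rewrite_header(line) if i % 4 == 0 else line
```

Usage would look like `out.writelines(rewrite_fastq(src))` with `src` and `out` opened as text files. Write to a new file and keep the originals until the import succeeds.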

We’ve fixed our format detection for fastq in the newly released QIIME 2 2017.10, so hopefully no one else will run into this problem moving forward!