Problem with "--source-format" for importing paired-end demultiplexed data

Hello,

I am very new to QIIME 2 and am trying to create a QIIME 2 artifact. I am following the Casava 1.8 paired-end demultiplexed fastq example, but the "--source-format" option is confusing me. I am trying to run this command:

qiime tools import --type 'SampleData[PairedEndSequencesWithQuality]' --input-path liver_ --source-format LiverDir --output-path liver-demux-paired-end.qza

My files are named like this: TL9e_1.fq.gz and TL9e_2.fq.gz, i.e. sampleID_1.fq.gz for the forward reads and sampleID_2.fq.gz for the reverse reads.

I am getting the error "TypeError: No format: LiverDir" (full error message below). Any help or advice would be appreciated. Thanks!

Traceback (most recent call last):
  File "/s/angus/index/common/tools/miniconda3/envs/qiime2/lib/python3.5/site-packages/qiime2-2017.2.0-py3.5.egg/qiime2/sdk/util.py", line 91, in parse_format
    format_record = pm.formats[format_str]
KeyError: 'LiverDir'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/s/angus/index/common/tools/miniconda3/envs/qiime2/bin/qiime", line 6, in <module>
    sys.exit(q2cli.__main__.qiime())
  File "/s/angus/index/common/tools/miniconda3/envs/qiime2/lib/python3.5/site-packages/click/core.py", line 722, in __call__
    return self.main(*args, **kwargs)
  File "/s/angus/index/common/tools/miniconda3/envs/qiime2/lib/python3.5/site-packages/click/core.py", line 697, in main
    rv = self.invoke(ctx)
  File "/s/angus/index/common/tools/miniconda3/envs/qiime2/lib/python3.5/site-packages/click/core.py", line 1066, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/s/angus/index/common/tools/miniconda3/envs/qiime2/lib/python3.5/site-packages/click/core.py", line 1066, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/s/angus/index/common/tools/miniconda3/envs/qiime2/lib/python3.5/site-packages/click/core.py", line 895, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/s/angus/index/common/tools/miniconda3/envs/qiime2/lib/python3.5/site-packages/click/core.py", line 535, in invoke
    return callback(*args, **kwargs)
  File "/s/angus/index/common/tools/miniconda3/envs/qiime2/lib/python3.5/site-packages/q2cli-2017.2.0-py3.5.egg/q2cli/tools.py", line 62, in import_data
  File "/s/angus/index/common/tools/miniconda3/envs/qiime2/lib/python3.5/site-packages/qiime2-2017.2.0-py3.5.egg/qiime2/sdk/result.py", line 158, in import_data
    view_type = qiime2.sdk.parse_format(view_type)
  File "/s/angus/index/common/tools/miniconda3/envs/qiime2/lib/python3.5/site-packages/qiime2-2017.2.0-py3.5.egg/qiime2/sdk/util.py", line 93, in parse_format
    raise TypeError("No format: %s" % format_str)
TypeError: No format: LiverDir
(qiime2) weinroth@angus:/s/angus/index/projs/mega_tylan/analysis/XIT/livers$

Hi @mweinroth! Your question looks similar to this topic (which we just followed up on while you were writing your post). Can you see if that solves your problem?

Thank you for the response. I changed my --source-format to CasavaOneEightSingleLanePerSampleDirFmt, so my command now reads:

qiime tools import --type 'SampleData[PairedEndSequencesWithQuality]' --input-path raw_data --source-format CasavaOneEightSingleLanePerSampleDirFmt --output-path liver-demux-paired-end.qza

I am now getting the following error:

ValueError: Missing one or more files for CasavaOneEightSingleLanePerSampleDirFmt: '.+_.+_L[0-9][0-9][0-9]_R[12]_001\.fastq\.gz'

(Full error below.) Also, I am currently using version 2017.2. My fastq files do not have barcodes associated with them because the sequencing company already removed them. Would that matter?

Traceback (most recent call last):
  File "/s/angus/index/common/tools/miniconda3/envs/qiime2/bin/qiime", line 6, in <module>
    sys.exit(q2cli.__main__.qiime())
  File "/s/angus/index/common/tools/miniconda3/envs/qiime2/lib/python3.5/site-packages/click/core.py", line 722, in __call__
    return self.main(*args, **kwargs)
  File "/s/angus/index/common/tools/miniconda3/envs/qiime2/lib/python3.5/site-packages/click/core.py", line 697, in main
    rv = self.invoke(ctx)
  File "/s/angus/index/common/tools/miniconda3/envs/qiime2/lib/python3.5/site-packages/click/core.py", line 1066, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/s/angus/index/common/tools/miniconda3/envs/qiime2/lib/python3.5/site-packages/click/core.py", line 1066, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/s/angus/index/common/tools/miniconda3/envs/qiime2/lib/python3.5/site-packages/click/core.py", line 895, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/s/angus/index/common/tools/miniconda3/envs/qiime2/lib/python3.5/site-packages/click/core.py", line 535, in invoke
    return callback(*args, **kwargs)
  File "/s/angus/index/common/tools/miniconda3/envs/qiime2/lib/python3.5/site-packages/q2cli-2017.2.0-py3.5.egg/q2cli/tools.py", line 62, in import_data
  File "/s/angus/index/common/tools/miniconda3/envs/qiime2/lib/python3.5/site-packages/qiime2-2017.2.0-py3.5.egg/qiime2/sdk/result.py", line 191, in import_data
    return cls._from_view(type_, view, view_type, provenance_capture)
  File "/s/angus/index/common/tools/miniconda3/envs/qiime2/lib/python3.5/site-packages/qiime2-2017.2.0-py3.5.egg/qiime2/sdk/result.py", line 216, in _from_view
    result = transformation(view)
  File "/s/angus/index/common/tools/miniconda3/envs/qiime2/lib/python3.5/site-packages/qiime2-2017.2.0-py3.5.egg/qiime2/core/transform.py", line 57, in transformation
    self.validate(view)
  File "/s/angus/index/common/tools/miniconda3/envs/qiime2/lib/python3.5/site-packages/qiime2-2017.2.0-py3.5.egg/qiime2/core/transform.py", line 114, in validate
    view.validate()
  File "/s/angus/index/common/tools/miniconda3/envs/qiime2/lib/python3.5/site-packages/qiime2-2017.2.0-py3.5.egg/qiime2/plugin/model/directory_format.py", line 168, in validate
    getattr(self, field)._validate_members(collected_paths)
  File "/s/angus/index/common/tools/miniconda3/envs/qiime2/lib/python3.5/site-packages/qiime2-2017.2.0-py3.5.egg/qiime2/plugin/model/directory_format.py", line 104, in _validate_members
    self.pathspec))
ValueError: Missing one or more files for CasavaOneEightSingleLanePerSampleDirFmt: '.+_.+_L[0-9][0-9][0-9]_R[12]_001\\.fastq\\.gz'

Hi @mweinroth! It looks like your input file names don't quite match what the CasavaOneEightSingleLanePerSampleDirFmt is looking for. This directory format expects to see filenames formatted like this:

L2S357_15_L001_R1_001.fastq.gz

The underscore-separated fields in this file name are the sample identifier (you should make sure this matches your sample id), the barcode sequence or a barcode identifier (this shouldn't matter, you can put 01 here if you want), the lane number (this shouldn't matter, you can put 001 here), the read number (these should match whichever read direction you choose), and the set number (must be 001).

[Source]
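
If it helps, here is a quick way to check filenames against that pattern before importing. This is only a sketch: the regular expression is copied from the ValueError message above, and I am assuming a whole-filename match (the helper name matches_casava is made up for this example, it is not part of QIIME 2).

```python
import re

# Pattern copied from the ValueError message in this thread.
CASAVA_PATTERN = re.compile(r".+_.+_L[0-9][0-9][0-9]_R[12]_001\.fastq\.gz")


def matches_casava(filename):
    """Return True if the whole filename matches the pattern.

    Assumption: a full match is used here; QIIME 2's own check may differ.
    """
    return CASAVA_PATTERN.fullmatch(filename) is not None


# One name taken from the format description, one from the original post.
for name in ["L2S357_15_L001_R1_001.fastq.gz", "TL9e_1.fq.gz"]:
    print(name, matches_casava(name))
```

The first name should pass and the second should not, which is consistent with the error you are seeing.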

So as you mentioned, the barcodes are stripped out of your filenames, which is why importing isn't working! You have a few options here.

The first is to rename your files to match this format, using bogus barcode info (the barcode in the filename isn't used by QIIME 2 for anything): TL9e_1.fq.gz -> TL9e_01_L001_R1_001.fastq.gz, for example. Note that the extension has to change from .fq.gz to .fastq.gz as well, since the format matches on that too. This can be a pain if you have many files that need renaming (you could script out the rename action, but that is another story).

Your next option is to wait for the next release of QIIME 2 (2017.4), which should be coming out within the next week or two. It includes a new source format that will allow you to keep your existing files named as-is; you would then create a MANIFEST file with some metadata about your files:

sample-id,absolute-filepath,direction
TL9e,/data/project/TL9e_1.fq.gz,forward
TL9e,/data/project/TL9e_2.fq.gz,reverse
...
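
If you have many samples, a file like the one above could be generated rather than typed by hand. Here is a rough sketch, assuming your files all follow the sampleID_1.fq.gz / sampleID_2.fq.gz naming from your post, with 1 meaning forward and 2 meaning reverse. The function names and the column layout simply mirror the example above; the final format may differ once 2017.4 is released.

```python
import csv
import re
from pathlib import Path

# Assumed naming scheme from the original post: <sampleID>_<1|2>.fq.gz
READ_PATTERN = re.compile(r"(?P<sample>.+)_(?P<read>[12])\.fq\.gz")
DIRECTIONS = {"1": "forward", "2": "reverse"}


def manifest_rows(data_dir):
    """Collect (sample-id, absolute-filepath, direction) for matching files."""
    rows = []
    for path in sorted(Path(data_dir).iterdir()):
        match = READ_PATTERN.fullmatch(path.name)
        if match is None:
            continue  # skip anything that does not look like a read file
        rows.append((match.group("sample"),
                     str(path.resolve()),
                     DIRECTIONS[match.group("read")]))
    return rows


def write_manifest(data_dir, manifest_path):
    """Write a header line plus one comma-separated line per read file."""
    with open(manifest_path, "w", newline="") as handle:
        writer = csv.writer(handle, lineterminator="\n")
        writer.writerow(["sample-id", "absolute-filepath", "direction"])
        writer.writerows(manifest_rows(data_dir))
```

You would call it with something like write_manifest("raw_data", "MANIFEST"), where both paths are placeholders for your own directory and output file.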

There will be new documentation and tutorials about using this new source format when the release comes out (we will announce the release here on the forum).
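
In the meantime, if you would rather go the renaming route, here is a minimal sketch of how it could be scripted. It assumes the sampleID_1.fq.gz / sampleID_2.fq.gz naming from your post and fills in placeholder barcode ("01") and lane ("001") values; casava_name and rename_all are made-up helper names, not QIIME 2 functions. It defaults to a dry run so you can review the planned names first, and I would suggest working on a copy of your data.

```python
import re
from pathlib import Path

# Assumed naming scheme from the original post: <sampleID>_<1|2>.fq.gz
READ_PATTERN = re.compile(r"(?P<sample>.+)_(?P<read>[12])\.fq\.gz")


def casava_name(filename, barcode="01", lane="001"):
    """Build a Casava-style name, or return None if the file does not match.

    The barcode and lane values are placeholders; the set number is fixed at 001.
    """
    match = READ_PATTERN.fullmatch(filename)
    if match is None:
        return None
    return "{}_{}_L{}_R{}_001.fastq.gz".format(
        match.group("sample"), barcode, lane, match.group("read"))


def rename_all(data_dir, dry_run=True):
    """Return the (old, new) name pairs; rename on disk only if dry_run is False."""
    planned = []
    for path in sorted(Path(data_dir).iterdir()):
        new_name = casava_name(path.name)
        if new_name is None:
            continue  # leave unrelated files alone
        planned.append((path.name, new_name))
        if not dry_run:
            path.rename(path.with_name(new_name))
    return planned
```

Running rename_all("raw_data") only reports what would change; rename_all("raw_data", dry_run=False) actually renames the files ("raw_data" is the input path from the command earlier in this thread).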

Hope that helps!
