Problem with "--source-format" for importing paired-end demultiplexed data

Hi @mweinroth! It looks like your input file names don't quite match what the CasavaOneEightSingleLanePerSampleDirFmt is looking for. This directory format expects to see filenames formatted like this:

L2S357_15_L001_R1_001.fastq.gz

The underscore-separated fields in this file name are, in order: the sample identifier (make sure this matches your sample ID), the barcode sequence or a barcode identifier (this shouldn't matter, you can put 01 here if you want), the lane number (this shouldn't matter either, you can put 001 here), the read number (R1 for forward reads, R2 for reverse reads), and the set number (must be 001).
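To make the field layout concrete, here is a minimal Python sketch that splits a filename into those five fields. The regular expression is my own approximation written from the description above, not QIIME 2's actual validation code, and `parse_casava_name` is a hypothetical helper name.

```python
import re

# Assumed pattern, written from the field description in this post;
# it is NOT taken from QIIME 2's own validation code.
CASAVA_NAME = re.compile(
    r"^(?P<sample_id>.+)_(?P<barcode>[^_]+)_L(?P<lane>\d{3})"
    r"_R(?P<read>[12])_(?P<set>001)\.fastq\.gz$"
)


def parse_casava_name(filename):
    """Return the five fields as a dict, or None if the name does not match."""
    match = CASAVA_NAME.match(filename)
    return match.groupdict() if match else None


# A name in the expected layout yields its fields; a stripped-down name yields None.
print(parse_casava_name("L2S357_15_L001_R1_001.fastq.gz"))
print(parse_casava_name("TL9e_1.fq.gz"))
```

Running something like this over your directory listing is a quick way to see which files would be rejected before you try the import.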

[Source]

As you mentioned, the barcodes have been stripped out of your filenames, which is why the import isn't working. You have two options here.

The first is to rename your files to match this format, using placeholder barcode info (QIIME 2 doesn't use the barcode in the filename for anything): TL9e_1.fq.gz -> TL9e_01_L001_R1_001.fastq.gz, for example (note the .fastq.gz extension, as in the example filename above). This can be a pain if you have many files to rename, although the rename can be scripted.

The second is to wait for the next QIIME 2 release (2017.4), which should be out within the next week or two. It includes a new source format that lets you keep your existing filenames as-is; you would then create a MANIFEST file with some metadata about your files:

sample-id,absolute-filepath,direction
TL9e,/data/project/TL9e_1.fq.gz,forward
TL9e,/data/project/TL9e_2.fq.gz,reverse
...
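If you have many samples, a manifest like the one above can be generated rather than typed by hand. The following is only a sketch: it assumes your files are all named `<sample-id>_<1|2>.fq.gz` (as in TL9e_1.fq.gz) and sit in a single directory, and the function names `manifest_rows` and `write_manifest` are made up for illustration. The column names are the ones shown above.

```python
import csv
import re
from pathlib import Path

# Assumed naming scheme for the existing files: <sample-id>_<1|2>.fq.gz
READ_FILE = re.compile(r"^(?P<sample_id>.+)_(?P<read>[12])\.fq\.gz$")
DIRECTIONS = {"1": "forward", "2": "reverse"}


def manifest_rows(data_dir):
    """Collect (sample-id, absolute-filepath, direction) for each matching file."""
    rows = []
    for path in sorted(Path(data_dir).iterdir()):
        match = READ_FILE.match(path.name)
        if match is None:
            continue  # skip anything that is not a recognised read file
        rows.append(
            (match["sample_id"], str(path.resolve()), DIRECTIONS[match["read"]])
        )
    return rows


def write_manifest(data_dir, manifest_path):
    """Write the header line plus one row per read file."""
    with open(manifest_path, "w", newline="") as handle:
        writer = csv.writer(handle, lineterminator="\n")
        writer.writerow(["sample-id", "absolute-filepath", "direction"])
        writer.writerows(manifest_rows(data_dir))
```

For example, `write_manifest("/data/project", "MANIFEST")` would write one row per read file found in that directory. It is worth checking that every sample ends up with exactly one forward and one reverse row.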

There will be new documentation and tutorials about using this new source format when the release comes out (we will announce the release here on the forum).
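If you would rather not wait for the release, the renaming option can be scripted. Here is a rough sketch, again assuming your files are named `<sample-id>_<1|2>.fq.gz`. The helper names, the placeholder barcode 01, and the placeholder lane 001 are my own choices, and the new names use the .fastq.gz extension to follow the example filename earlier in this post. It defaults to a dry run that only reports the planned renames.

```python
import re
from pathlib import Path

# Assumed naming scheme for the existing files: <sample-id>_<1|2>.fq.gz
READ_FILE = re.compile(r"^(?P<sample_id>.+)_(?P<read>[12])\.fq\.gz$")


def casava_name(filename, barcode="01", lane="001"):
    """Map e.g. TL9e_1.fq.gz to a Casava-style name, or None if it does not match."""
    match = READ_FILE.match(filename)
    if match is None:
        return None
    return f"{match['sample_id']}_{barcode}_L{lane}_R{match['read']}_001.fastq.gz"


def rename_to_casava(data_dir, dry_run=True):
    """Return the (old, new) name pairs; only touch the files when dry_run is False."""
    plan = []
    for path in sorted(Path(data_dir).iterdir()):
        new_name = casava_name(path.name)
        if new_name is None:
            continue  # leave unrelated files alone
        plan.append((path.name, new_name))
        if not dry_run:
            path.rename(path.with_name(new_name))
    return plan


print(casava_name("TL9e_1.fq.gz"))  # TL9e_01_L001_R1_001.fastq.gz
```

Look over the list returned by `rename_to_casava(your_dir)` first, and only rerun with `dry_run=False` once the planned names look right (ideally on a copy of the data).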

Hope that helps!
