File names change when importing and mismatch manifest

Hello,
I seem to have found a bug when importing data in CasavaOneEightLanelessPerSampleDirFmt format.
I found a workaround, but thought you should be aware of the bug so it can be fixed.

My original sample names are in this format:
eDNA468_S1_R2_001.fastq.gz
eDNA469_S2_R1_001.fastq.gz
eDNA469_S2_R2_001.fastq.gz
eDNA470_S3_R1_001.fastq.gz

I was able to import them with this code:

qiime tools import \
  --type 'SampleData[PairedEndSequencesWithQuality]' \
  --input-path PATH \
  --input-format CasavaOneEightLanelessPerSampleDirFmt \
  --output-path ARTIFACT.qza

The import works, but any downstream step fails with an error.
I then noticed that after import, within the data directory of the .qza artifact, all of the files have been renamed with a lane number inserted.
For example:
eDNA468_S1_R2_001.fastq.gz becomes eDNA468_S1_L001_R2_001.fastq.gz upon import

In the manifest file, however, all of the filenames are the same as they were originally.
For example: eDNA468,eDNA468_S1_R2_001.fastq.gz,reverse

Thus there is a mismatch between the filenames and the manifest within the .qza artifact.

If I manually change the filenames in the manifest file to match the renamed files (by adding the lane number L001 to each entry), I can get all of the downstream analyses to run fine.
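
For anyone hitting the same thing, the manual edit could be scripted roughly as below. This is only a sketch in Python, not anything QIIME 2 provides: the function names `add_lane` and `fix_manifest_line` are made up for illustration, and it assumes the manifest lines look like the example above (sample ID, filename, direction, separated by commas) and that the lane to insert is always L001.

```python
import re

# Matches laneless names such as eDNA468_S1_R2_001.fastq.gz.
# Names that already contain a lane (e.g. ..._S1_L001_R2_...) do not match.
LANELESS = re.compile(
    r"^(?P<sample>.+)_S(?P<index>\d+)_R(?P<read>[12])_001\.fastq\.gz$"
)


def add_lane(filename, lane="L001"):
    """Insert a lane field between the sample index and the read field."""
    m = LANELESS.match(filename)
    if m is None:
        # Not a laneless name (or already has a lane): leave it alone.
        return filename
    return f"{m['sample']}_S{m['index']}_{lane}_R{m['read']}_001.fastq.gz"


def fix_manifest_line(line):
    """Rewrite the filename column of one 'id,filename,direction' line."""
    parts = line.rstrip("\n").split(",")
    if len(parts) != 3:
        # Unexpected shape: return the line untouched (minus the newline).
        return line.rstrip("\n")
    sample_id, filename, direction = parts
    return ",".join([sample_id, add_lane(filename), direction])


if __name__ == "__main__":
    print(fix_manifest_line("eDNA468,eDNA468_S1_R2_001.fastq.gz,reverse"))
```

A header line passes through unchanged because its filename column does not match the laneless pattern.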

It would be nice, however, if the bug were fixed so that the filenames are not changed when importing in this format.

Thanks

Hey there @Trodgers, this is definitely a bug, thanks for reporting! I will write up a github issue some time next week, keep an eye on this post for more details. In the meantime, using a manifest format would be the recommended work-around, and doesn’t involve you mucking around with system internals.
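
As a rough illustration of the manifest work-around, a manifest could be generated from a directory of laneless files along these lines. This is a sketch under assumptions, not an official recipe: it assumes the CSV manifest layout with a `sample-id,absolute-filepath,direction` header (as used by formats such as PairedEndFastqManifestPhred33), assumes the sample ID is everything before the `_S<number>` field, and the helper name `manifest_rows` is hypothetical.

```python
import re
from pathlib import Path

# Laneless Casava-style names, e.g. eDNA468_S1_R2_001.fastq.gz
NAME = re.compile(r"^(?P<sample>.+)_S\d+_R(?P<read>[12])_001\.fastq\.gz$")


def manifest_rows(directory):
    """Return manifest lines (header first) for the fastq.gz files found."""
    rows = ["sample-id,absolute-filepath,direction"]
    for path in sorted(Path(directory).glob("*.fastq.gz")):
        m = NAME.match(path.name)
        if m is None:
            # Skip files that do not follow the expected naming scheme.
            continue
        direction = "forward" if m["read"] == "1" else "reverse"
        rows.append(f"{m['sample']},{path.resolve()},{direction}")
    return rows


if __name__ == "__main__":
    # "PATH" stands in for the directory holding the reads.
    print("\n".join(manifest_rows("PATH")))
```

The resulting file would then be passed to `qiime tools import` as the input path with a manifest input format instead of CasavaOneEightLanelessPerSampleDirFmt.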

I’ve created an issue to track this:

Thanks again @Trodgers!