Problems importing data

I’ve been having problems importing my data. It is demultiplexed data from an Illumina MiSeq, and I used it in QIIME 1 with no issues. It is paired-end data, but I am only using R1 because the R2 data is lousy. I can’t get it into QIIME 2 using:
qiime tools import \
  --type 'SampleData[SequencesWithQuality]' \
  --input-path /Users/***/Hal-qiime2 \
  --source-format CasavaOneEightSingleLanePerSampleDirFmt \
  --output-path /Users/***/qiime2halASV/halplus-demux.qza

I get this error:

There was a problem importing /Users/Paige/Hal-qiime2:

Missing one or more files for CasavaOneEightSingleLanePerSampleDirFmt: ‘.+_.+_L[0-9][0-9][0-9]_R[12]_001\.fastq\.gz’
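
For anyone hitting the same error: the pattern quoted in the message can be used to check a directory before importing. This is only a minimal sketch. It assumes the pattern has to match the whole file name (the ^ and $ anchors are an assumption, not part of the message), and list_non_casava is a made-up helper name, not a QIIME 2 command.

```shell
# Pattern copied from the error message, with ^ and $ added (assumption).
CASAVA_PATTERN='^.+_.+_L[0-9][0-9][0-9]_R[12]_001\.fastq\.gz$'

# Print every file name in a directory that does NOT match the pattern.
# list_non_casava is a hypothetical helper, not part of QIIME 2.
list_non_casava() {
    for f in "$1"/*; do
        [ -e "$f" ] || continue            # nothing to do for an empty directory
        name=$(basename "$f")
        if ! printf '%s\n' "$name" | grep -Eq "$CASAVA_PATTERN"; then
            printf '%s\n' "$name"
        fi
    done
}
```

Running list_non_casava on the import directory prints the names that would need attention; no output means every name fits the pattern.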

Hey @bmillerlab!

I suspect your filenames don’t exactly match the default Illumina naming scheme. Could you provide a screenshot of the directory you are trying to import into an artifact?

Thanks!

Hi @bmillerlab! Thanks for the screenshot, that was really helpful. Your filenames match the Illumina naming scheme, but they aren’t gzipped. You have two options:

  • gzip all of your files, one at a time.
  • Use a fastq-manifest format, which will gzip your files at import time.
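
The first option can be done with a short loop rather than by hand. A minimal sketch, assuming the uncompressed files end in .fastq and all sit in one directory; gzip_fastq_dir is a hypothetical helper name. Since gzip replaces each original with the compressed copy, it is probably safest to try this on a copy of the folder first.

```shell
# Compress every .fastq file in the given directory, one at a time.
# gzip_fastq_dir is a hypothetical helper name.
gzip_fastq_dir() {
    for f in "$1"/*.fastq; do
        [ -e "$f" ] || continue    # skip if there are no .fastq files
        gzip "$f"                  # sample.fastq becomes sample.fastq.gz
    done
}
```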

Hope that helps, and let us know if you get stuck! :t_rex:

Hello, I guess I should have mentioned I tried it first with the files in .gz format. I thought that was the mistake I made. I went back to my old folder and tried again - here is the output and a screenshot of the file names this time. I also tried to make a manifest format file, but I used Excel and, despite removing the weird spaces it put in (I used BBEdit for that), QIIME 2 said it wasn't the correct manifest file format - so I gave up on that before I tried this. Thanks for helping.

(qiime2-2017.10) miller3:~ Paige$ qiime tools import \
  --type 'SampleData[SequencesWithQuality]' \
  --input-path /Users/Paige/Hal_gz \
  --source-format CasavaOneEightSingleLanePerSampleDirFmt \
  --output-path /Users/Paige/qiime2halASV/halplus-demux.qza
There was a problem importing /Users/Paige/Hal_gz:

Unrecognized file (/Users/Paige/Hal_gz/MOCK-3_L001_R1_001.fastq.gz) for CasavaOneEightSingleLanePerSampleDirFmt.

Hi @bmillerlab!

The file MOCK-3_L001_R1_001.fastq.gz doesn’t match the naming convention that the rest of your files follow: it is missing the barcode identifier, as outlined in the importing doc. You could rename it to something like MOCK-3_XYZ_L001_R1_001.fastq.gz, which puts a dummy value in the filename for the barcode identifier.
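
If more than one file is missing that field, the rename can be scripted. This is a rough sketch only: add_dummy_barcode and fix_names_in_dir are hypothetical helper names, XYZ is just the placeholder mentioned above, and the sketch assumes the sample name itself contains no underscore.

```shell
# Insert a dummy barcode field into names of the form SAMPLE_L001_R1_001.fastq.gz.
# Names that already have a barcode field are printed unchanged.
add_dummy_barcode() {
    printf '%s\n' "$1" |
        sed -E 's/^([^_]+)(_L[0-9]{3}_R[12]_001\.fastq\.gz)$/\1_XYZ\2/'
}

# Rename every affected .fastq.gz file in a directory.
fix_names_in_dir() {
    for f in "$1"/*.fastq.gz; do
        [ -e "$f" ] || continue
        old=$(basename "$f")
        new=$(add_dummy_barcode "$old")
        [ "$old" = "$new" ] || mv "$f" "$1/$new"   # only touch names that changed
    done
}
```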

You can see in your screenshot that this file was edited much more recently than the rest of the files and is also much smaller, so you might want to double-check it before proceeding.
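
One quick way to double-check a suspicious file is to test the gzip stream and count the reads in it. A minimal sketch, assuming standard four-line FASTQ records; check_fastq_gz is a hypothetical helper name.

```shell
# Verify that a .fastq.gz file decompresses cleanly, then print its read count.
# check_fastq_gz is a hypothetical helper; it assumes 4 lines per FASTQ record.
check_fastq_gz() {
    gzip -t "$1" || return 1                      # fails if the file is corrupt or truncated
    gzip -cd "$1" | awk 'END { print NR / 4 }'    # total lines divided by 4
}
```

Comparing the count against the other samples (or against the original copy of the file) should show whether something was lost when it was edited.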

If you don’t want to rename, you could go with the manifest format, but it sounds like you had some issues there. Feel free to share your file here and we can help diagnose, but it is probably easier for you to just rename the file that is causing this issue and go with the Casava format import.
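
If you do want to retry the manifest route, generating the file from the shell avoids the stray characters a spreadsheet can add. A hedged sketch: it assumes the comma-separated manifest layout (sample-id, absolute-filepath, direction) described in the importing doc for QIIME 2 releases of this period, forward reads only, and sample IDs taken from the text before the first underscore in each file name. write_manifest is a hypothetical helper name.

```shell
# Print a single-end fastq manifest for the R1 files in a directory.
# write_manifest is a hypothetical helper. Taking the sample ID from the text
# before the first underscore is an assumption about the file names.
write_manifest() {
    dir=$(cd "$1" && pwd)                  # manifest paths need to be absolute
    echo "sample-id,absolute-filepath,direction"
    for f in "$dir"/*_R1_001.fastq*; do
        [ -e "$f" ] || continue
        name=$(basename "$f")
        printf '%s,%s,forward\n' "${name%%_*}" "$f"
    done
}
```

For example, write_manifest /Users/Paige/Hal_gz > manifest.csv would write the manifest to a file, which is then passed as the --input-path with the manifest source format listed in the importing doc for your release.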

Also, it looks like you have one set of reverse reads in the files labeled MOCK-5 and NEG-CON-1... — are you sure that these are reverse reads?

Thank you. That worked!! I don’t know why but I have 4 files with names like that from Illumina, I did not know they were a problem. When I remade the folder I had messed up those 2 files and I fixed them too.
