Hello! I've been stuck on importing my Illumina data for a few days, and I think the problem is with the file path. I've had two types of error messages, below:
--type EMPSingleEndSequences
--input-path /Users/cybelecollins/qiime2-skate3/emp-paired-end-sequences/barcodes.fastq.gz
--output-path skate.qza
There was a problem importing /Users/cybelecollins/qiime2-skate3/emp-paired-end-sequences/barcodes.fastq.gz:
Importing 'EMPSingleEndDirFmt' requires a directory, not /Users/cybelecollins/qiime2-skate3/emp-paired-end-sequences/barcodes.fastq.gz
I think you are actually quite close to the correct command; I believe you have just one small error.
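In the first command, the location you are providing for the input files appears to be correct. However, you are giving the path to a specific file, whereas the import type you are passing expects a path to a directory. For example:
--type EMPSingleEndSequences
--input-path /Users/cybelecollins/qiime2-skate3/emp-paired-end-sequences
--output-path skate.qza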
In the above example, the emp-paired-end-sequences directory should contain a file named sequences.fastq.gz and a second file named barcodes.fastq.gz. Notice how I removed the file name from the path.
In the second command, the error is telling you that the file was not found at all. You can check this for yourself by running: ls qiime2-skate3/emp-paired-end-sequences/barcodes.fastq.gz
If the file is not present, the terminal will output: ls: cannot access 'qiime2-skate3/emp-paired-end-sequences/barcodes.fastq.gz': No such file or directory
If the file is present it will simply return the path.
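The same check can be sketched in Python for the whole import directory at once. This is only an illustration: the function name `missing_emp_files` is hypothetical, and the two required file names are simply the ones mentioned in this reply.

```python
from pathlib import Path

# File names taken from the reply above; adjust if your import type differs.
REQUIRED_FILES = ("sequences.fastq.gz", "barcodes.fastq.gz")


def missing_emp_files(input_dir):
    """Return the required file names that are absent from input_dir.

    Raises NotADirectoryError when input_dir is a file or does not exist,
    which is the situation behind the "requires a directory" import error.
    """
    directory = Path(input_dir)
    if not directory.is_dir():
        raise NotADirectoryError(f"{directory} is not a directory")
    return [name for name in REQUIRED_FILES if not (directory / name).is_file()]
```

Calling `missing_emp_files("qiime2-skate3/emp-paired-end-sequences")` would return an empty list when both files are in place, and a list of the missing names otherwise.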
Thanks very much! I did not have the right kind of file for barcodes, but I might have a more serious issue. The sequencing facility works with BaseSpace and when my PI wrote to them, the response was problematic - enough that I might have to start a different thread and search more for what I can do now:
“BaseSpace doesn’t generate an index read file. The data just comes demultiplexed. I’ve been trying to get my off-instrument computer to generate a index read file for you, but it looks like a necessary file for this may have been corrupted or lost during data transfer. I’ll keep working on this, but I’m not sure how successful I’ll be. I’m not sure what you’re looking at in the Fastq files, but I think that the Illumina software does allow for a 1 bp difference to still assign it to the correct index.”
Yep! There is no requirement that you work with multiplexed data - in fact, according to an informal survey we did here, most people seem to jump into QIIME 2 directly with demux data. Sounds like you are on the right track!
I am working with MiSeq samples from BaseSpace that have R1 and R2, and I am trying to find a protocol, since the BaseSpace data is already demultiplexed. I did make a metadata file with indexes, but I am not sure where this fits.
These are manifest files, as I understand, and I am entering these commands:
But for each file, I get:
(qiime2-2018.6) Cybeles-MacBook-Pro:~ cybelecollins$ sample-1,/Users/cybelecollins/qiime2-skate3/paired-end-sequences/CC1_S1_L001_R1_001.fastq.gz,forward
-bash: sample-1,/Users/cybelecollins/qiime2-skate3/paired-end-sequences/CC1_S1_L001_R1_001.fastq.gz,forward: No such file or directory
This is not necessary, since your data is already demultiplexed (meaning, split by sample). The only reason you would need barcodes in your metadata file is if you had multiplexed reads that still needed to be demultiplexed (the barcode is how you tell the computer which sample a particular read belongs to).
The manifest file is a file, not a command. As the docs say, this should be a CSV file (comma-separated values) --- this file just tells QIIME 2 which file belongs to which sample, and that file's read orientation.
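If typing the manifest by hand is error-prone, a short script can write the CSV file for you. The sketch below makes several assumptions: the header row `sample-id,absolute-filepath,direction` is the one I believe the CSV manifest format used in QIIME 2 releases of this era (check the importing docs for your version), the file names follow the Illumina pattern seen in this thread (`CC1_S1_L001_R1_001.fastq.gz`), and the function names are hypothetical.

```python
import csv
import re
from pathlib import Path

# Assumed header for the CSV manifest format; verify against your QIIME 2 docs.
HEADER = ["sample-id", "absolute-filepath", "direction"]

# Illumina/BaseSpace naming, e.g. CC1_S1_L001_R1_001.fastq.gz
FASTQ_NAME = re.compile(
    r"^(?P<sample>.+)_S\d+_L\d{3}_R(?P<read>[12])_001\.fastq\.gz$"
)


def manifest_rows(fastq_dir):
    """Build one manifest row per matching FASTQ file found in fastq_dir."""
    rows = []
    for path in sorted(Path(fastq_dir).glob("*.fastq.gz")):
        match = FASTQ_NAME.match(path.name)
        if match is None:
            continue  # skip files that do not follow the Illumina pattern
        direction = "forward" if match["read"] == "1" else "reverse"
        rows.append([match["sample"], str(path.resolve()), direction])
    return rows


def write_manifest(fastq_dir, manifest_path):
    """Write the manifest as a plain CSV file and return the number of rows."""
    rows = manifest_rows(fastq_dir)
    with open(manifest_path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(HEADER)
        writer.writerows(rows)
    return len(rows)
```

The resulting file is what you pass to the import command; its lines are never typed at the shell prompt themselves.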
Hello again - I'm afraid that I'm still at the point from two weeks ago. I've attached what I see on my screen. I think this has to do with file paths. Thanks!
Those files don't look like they exist to me, either: there are no files sitting in the paired-end-sequence dir in your screenshot! Perhaps you meant to write:
I assume there are files in that dir, but I don't know what their names are because the screenshot hasn't expanded that folder.
Okay, the last thing about your manifest that jumped out at me --- you listed the forward reads for pair CC1 as sample-1 and the reverse reads as sample-2 --- this will import as two separate samples, but I suspect what you really want is to import the pair as sample CC1. Just update the first column to have the sample name for both rows --- whatever name you give it here is the name the sample will have for the rest of the analysis.
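A quick way to catch this kind of pairing mistake before importing is to group the manifest rows by sample ID. The following is a minimal sketch, assuming the manifest has a header row with `sample-id` and `direction` columns; the function name `unpaired_samples` is hypothetical.

```python
import csv
from collections import defaultdict


def unpaired_samples(manifest_path):
    """Return {sample_id: directions} for samples that do not have
    exactly one forward row and one reverse row in the manifest."""
    directions = defaultdict(list)
    with open(manifest_path, newline="") as handle:
        for row in csv.DictReader(handle):
            directions[row["sample-id"]].append(row["direction"])
    return {
        sample: sorted(found)
        for sample, found in directions.items()
        if sorted(found) != ["forward", "reverse"]
    }
```

With the manifest as originally written, this would report `sample-1` as having only a forward read and `sample-2` as having only a reverse read; once both rows use the same sample name, the result is an empty dictionary.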
Thanks! I now wonder whether there's a problem with the files themselves, as even the Illumina analysis showed very low PF scores and the sequencing facility admitted to a possible loading issue. I got this message:
An unexpected error has occurred: Decoded Phred score is out of range [0, 62].
This error is because you specified PairedEndFastqManifestPhred64 in your import command --- this should only be used if you know that your reads are Phred 64 encoded. Most likely they aren't, so you should use PairedEndFastqManifestPhred33. Please take some time to review the docs - this info is listed there.
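If you are unsure which encoding a file uses, you can look at the quality characters directly. The sketch below is a common heuristic, not part of QIIME 2: characters with ASCII codes below 59 can only occur in Phred 33 data, while a file whose characters all sit at 64 or above and reach past 74 ('J') is probably Phred 64. It assumes plain four-line FASTQ records, and the function name `guess_phred_offset` is hypothetical.

```python
import gzip


def guess_phred_offset(fastq_path, max_records=10000):
    """Guess the quality offset (33 or 64) from the quality characters.

    Returns 33, 64, or None when the characters seen are ambiguous.
    Assumes four lines per record (no wrapped sequence lines).
    """
    lowest, highest = 255, 0
    opener = gzip.open if str(fastq_path).endswith(".gz") else open
    with opener(fastq_path, "rt") as handle:
        for line_number, line in enumerate(handle):
            if line_number >= 4 * max_records:
                break
            if line_number % 4 == 3:  # every fourth line holds the qualities
                codes = [ord(char) for char in line.rstrip("\r\n")]
                if codes:
                    lowest = min(lowest, min(codes))
                    highest = max(highest, max(codes))
    if lowest < 59:
        return 33  # such low characters cannot occur with an offset of 64
    if lowest >= 64 and highest > 74:
        return 64  # nothing below '@' and values beyond the usual Phred 33 top
    return None
```

This also shows where the error comes from: a character such as '5' (ASCII 53) decodes to a score of 20 with an offset of 33, but to -11 with an offset of 64, which is outside the allowed range.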
With total gratitude, and perhaps to reduce the volume of questions: if you are updating this doc at some point and would like to make it more accessible to those without much experience, a few tweaks might reduce errors. These can be hard to catch when something is new and overwhelming. For example, the tutorial gives “pe-64-manifest” as the example rather than “pe-64-manifest.csv”, which might be obvious to some people since it is the full file name, but not when imitating something blindly. Also, the main doc does link to a way to determine the format, but only by implication (a highlighted link) rather than as a clear direction within the main text.
But yes, for the most part, the information is all there, and hopefully I will be able to self-correct more as familiarity increases. One just feels very blind at first, so this forum has been invaluable.