I received paired-end MiSeq sequences, and when I tried to import the files as follows:
qiime tools import \
  --type 'SampleData[PairedEndSequencesWithQuality]' \
  --input-path sequences \
  --input-format CasavaOneEightSingleLanePerSampleDirFmt \
  --output-path demux-paired-end.qza
I got this error message:
There was a problem importing sequences:
Missing one or more files for CasavaOneEightSingleLanePerSampleDirFmt: '.+_.+_L[0-9][0-9][0-9]_R[12]_001\.fastq\.gz'
My guess is that the sequences I received are named differently from those in the QIIME 2 tutorial.
For example, one of my files is named "MI.M05812_0186.001.FLD0289.P1_R1.fastq.gz" (the sample ID is P1),
whereas the tutorial's example (L2S357_15_L001_R2_001.fastq.gz) starts with the sample identifier.
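To see which file names would satisfy the pattern quoted in the error message, a quick check along the following lines may help. It is only a sketch: the helper name matches_casava is made up for illustration, the regular expression is copied from the error message, and anchoring it to the whole file name is an assumption.

```shell
# Pattern copied from the error message; ^ and $ are added here so that the
# whole file name has to match (an assumption about how the check is applied).
casava_pattern='^.+_.+_L[0-9][0-9][0-9]_R[12]_001\.fastq\.gz$'

# matches_casava (hypothetical helper): succeeds if the file name fits the pattern.
matches_casava() {
    printf '%s\n' "$1" | grep -Eq "$casava_pattern"
}

for name in "MI.M05812_0186.001.FLD0289.P1_R1.fastq.gz" "L2S357_15_L001_R2_001.fastq.gz"; do
    if matches_casava "$name"; then
        echo "$name: matches"
    else
        echo "$name: does not match"
    fi
done
```

The facility-style name lacks the _L001 lane field and the trailing _001, so it is reported as not matching, while the tutorial name matches.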
I would appreciate your clarification on this issue.
Thanks!
But in that example the sample ID appears at the beginning of the file name, whereas my samples look like this:
sample-id forward-absolute-filepath reverse-absolute-filepath
P1 $PWD/some/filepath/MI.M05812_0186.001.FLD0289.P1_R1.fastq.gz $PWD/some/filepath/MI.M05812_0186.001.FLD0289.P1_R2.fastq.gz
Ahh, so you want to import using a manifest? This was not clear in your initial post. I thought you were indeed trying to import using the CASAVA format.
In your case, the --input-format value should be PairedEndFastqManifestPhred64V2 or PairedEndFastqManifestPhred33V2. Thus, your final command should be something like:
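As a sketch only, reusing the flags from the original command and assuming the manifest is saved as manifest.txt: the import_manifest helper and the file names below are illustrative and not part of QIIME 2. Taking the format as an argument makes it easy to try either of the two manifest formats named above.

```shell
# import_manifest (hypothetical helper): wraps the import so the format can be swapped.
#   $1 = manifest file, $2 = input format, $3 = output artifact
import_manifest() {
    qiime tools import \
        --type 'SampleData[PairedEndSequencesWithQuality]' \
        --input-path "$1" \
        --input-format "$2" \
        --output-path "$3"
}

# Example call, assuming manifest.txt is in the current directory:
# import_manifest manifest.txt PairedEndFastqManifestPhred33V2 demux-paired-end.qza
```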
Hi,
I changed into a folder named demo, which contains the selected sequences to import plus the manifest file.
I created manifest.txt for the selected demo sequences, intending to apply the same approach to the full set of sequences once it works. At first I used the sample-id values from my metadata file (P1, P2, ...) and added the columns Forward-absolute-filepath and Reverse-absolute-filepath. Then I got this error:
There was a problem importing manifest.txt:
manifest.txt is not a(n) PairedEndFastqManifestPhred33V2 file:
'forward-absolute-filepath' is not a column in the metadata. Available columns: 'Forward-absolute-filepath', 'Reverse-absolute-filepath'
So I changed the sample-id as below (to include the multiplex key)
and got the same error. I am not sure how to create a manifest file (as an Excel sheet saved as a .txt file), or exactly what data to include in each column. According to the information provided by the genomics facility:
MI.M05812_0186 is the run name,
FLD0289 is the multiplex key,
MI.M05812_0186.001.FLD0289.P1 is the read set ID, followed by the direction of the read (R1 or R2),
the quality offset is 33
In my previous project I imported with the Casava 1.8 paired-end demultiplexed fastq format, because those sequences were compatible with the CASAVA naming. Here, however, I do not know how to proceed.
Could you please explain this to me and, if possible, tell me how I can convert these sequences into CASAVA format?
It looks like you changed the column headers to begin with upper case. These should be as you originally had them: sample-id forward-absolute-filepath reverse-absolute-filepath
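A quick way to inspect the header row, and optionally lower-case it, might look like the following. It is a rough sketch that assumes the manifest is tab-separated with Unix line endings; both function names are hypothetical.

```shell
# Expected header row, with real tab characters between the column names.
expected_header=$(printf 'sample-id\tforward-absolute-filepath\treverse-absolute-filepath')

# check_manifest_header (hypothetical helper): succeeds only if the first line
# of the manifest is exactly the expected tab-separated header.
check_manifest_header() {
    [ "$(head -n 1 "$1")" = "$expected_header" ]
}

# lowercase_manifest_header (hypothetical helper): prints the manifest with its
# first line lower-cased and every other line left untouched.
lowercase_manifest_header() {
    awk 'NR == 1 { print tolower($0); next } { print }' "$1"
}
```

For example, lowercase_manifest_header manifest.txt > manifest.fixed.txt writes a corrected copy, and check_manifest_header manifest.fixed.txt then confirms the header before the import is retried.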
Please see the Import documentation I linked earlier, and follow the manifest format description exactly. Have you tried downloading and running the examples provided there to make sure they work?
Did you try the other option I suggested, e.g. PairedEndFastqManifestPhred64V2? Try this after you try importing with PairedEndFastqManifestPhred33V2 using the corrected headers.
In the manifest file, the sample-id can be whatever you'd like to label the sample. It does not need to match the file names or anything else for that matter.
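If typing the manifest by hand is error-prone, it could also be generated from the file names. The sketch below assumes every read pair is named like the example in this thread (ending in _R1.fastq.gz and _R2.fastq.gz) and that the label after the last dot (P1 in MI.M05812_0186.001.FLD0289.P1) is an acceptable sample-id. The build_manifest helper is hypothetical, and any other labeling scheme would work just as well.

```shell
# build_manifest (hypothetical helper): prints a tab-separated manifest for every
# *_R1.fastq.gz file in directory $1 that has a matching *_R2.fastq.gz.
build_manifest() {
    dir=$(cd "$1" && pwd) || return 1   # absolute path of the read directory
    printf 'sample-id\tforward-absolute-filepath\treverse-absolute-filepath\n'
    for fwd in "$dir"/*_R1.fastq.gz; do
        [ -e "$fwd" ] || continue        # skip the unexpanded glob when nothing matches
        rev="${fwd%_R1.fastq.gz}_R2.fastq.gz"
        if [ ! -e "$rev" ]; then
            echo "no reverse read for $fwd, skipping" >&2
            continue
        fi
        base=$(basename "$fwd" _R1.fastq.gz)   # e.g. MI.M05812_0186.001.FLD0289.P1
        sample_id="${base##*.}"                # text after the last dot, e.g. P1
        printf '%s\t%s\t%s\n' "$sample_id" "$fwd" "$rev"
    done
}
```

Running build_manifest some/filepath > manifest.txt would write the header plus one row per complete read pair, with absolute paths in the two file path columns.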