Help with importing data. Neither the Casava nor the manifest file option is working.

Hi guys,
I am trying to start working with QIIME, and it seems I am not able to import my data.
I am trying to import data from one sample (I thought I should keep it simple in the beginning, but it is not) obtained by MiSeq (PE250). I have one folder containing one forward (I guess) and one reverse file. They should contain the sequences without primers and barcodes.
First I used the “Casava 1.8 paired-end demultiplexed fastq” import option and got the usual error message:

Missing one or more files for CasavaOneEightSingleLanePerSampleDirFmt: ‘.+_.+_L[0-9][0-9][0-9]_R[12]_001\.fastq\.gz’

I changed the names of my files according to the QIIME naming format and still got the error message.

So I tried to import them via a manifest file.
Now I get this error message:
There was a problem importing manifest:

manifest is not a(n) PairedEndFastqManifestPhred33V2 file:

Filepath on line 1 and column “forward-absolute-filepath” could not be found (/Ips-larvae-B/Ips.larvae.B/H000613-1a_01_L001_R1_001.fq.gz) for sample “sample1”.

Can you please help?

This is what my manifest file looks like:

sample-id forward-absolute-filepath reverse-absolute-filepath
sample1 /Ips-larvae-B/Ips.larvae.B/H000613-1a_01_L001_R1_001.fq.gz /Ips-larvae-B/Ips.larvae.B/H000613-1a_01_L001_R2_001.fq.gz

And this is what my reads look like:

@A01045:95:HJ3K2DRXX:1:2101:16984:1063 1:N:0:TAGCTGAG
@A01045:95:HJ3K2DRXX:1:2101:10917:1235 1:N:0:TAGCTGAG

Thank you.

Hello @iuliachiciudean, I notice that, according to your manifest, your files have the extension .fq.gz, not .fastq.gz. Can you try changing the extensions on your files to .fastq.gz? Let me know if that works. Thank you.
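
If there are several files to rename, a short script can do it. The following is only a sketch in Python (the function name `rename_fq_to_fastq` is made up for illustration and is not part of QIIME 2). It renames files in place, so it is probably best tried on a copy of the data first.

```python
from pathlib import Path


def rename_fq_to_fastq(directory):
    """Rename every *.fq.gz file in `directory` to *.fastq.gz.

    Hypothetical helper: returns the list of new paths and refuses to
    overwrite a file that already exists under the new name.
    """
    renamed = []
    for old in sorted(Path(directory).glob("*.fq.gz")):
        # Swap only the trailing ".fq.gz"; the rest of the name is untouched.
        new = old.with_name(old.name[: -len(".fq.gz")] + ".fastq.gz")
        if new.exists():
            raise FileExistsError(f"refusing to overwrite {new}")
        old.rename(new)
        renamed.append(new)
    return renamed
```

Calling `rename_fq_to_fastq("/path/to/your/reads")` (placeholder path) would then leave the R1 and R2 files with the .fastq.gz extension, and the paths in the manifest need to be updated to match.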

Hi again,

Now I am getting another error message:

There was a problem importing /home/iulia/Ips-larvae-B/Ips.larvae.B/metadata2.csv:

/home/iulia/Ips-larvae-B/Ips.larvae.B/metadata2.csv is not a(n) PairedEndFastqManifestPhred33V2 file:

‘forward-absolute-filepath’ is not a column in the metadata. Available columns: ‘absolute-filepath’, ‘direction’

I double checked everything (in my opinion):

  • I checked the metadata with Keemei and it is fine.
  • Now the metadata looks like this:
sample-id absolute-filepath direction
sample1 /home/iulia/Ips-larvae-B/Ips.larvae.B/H000613-1a_01_L001_R1_001.fastq.gz forward
sample2 /home/iulia/Ips-larvae-B/Ips.larvae.B/H000613-1a_01_L001_R2_001.fastq.gz revers
  • all the files are in the same folder: “Ips.larvae.B”.
  • the command that I used is:
    qiime tools import \
      --type 'SampleData[PairedEndSequencesWithQuality]' \
      --input-path /home/iulia/Ips-larvae-B/Ips.larvae.B/metadata2.csv \
      --output-path paired-end-demux.qza \
      --input-format PairedEndFastqManifestPhred33V2

I have no idea what else I can do to fix this situation.
Please help. I am getting veeeeery frustrated here.
Is it going to be this hard all the way with QIIME?

Thank you.

Hi @Oddant1,

This did not fix it. :thinking:

Try changing to --input-format PairedEndFastqManifestPhred33, not the V2 format.
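
Alternatively, it may be simpler to keep the V2 format and restructure the manifest instead. Going by the error message above, V2 expects one row per sample with the columns sample-id, forward-absolute-filepath and reverse-absolute-filepath, separated by tabs. Here is a minimal Python sketch (the helper name `write_v2_manifest` is hypothetical) that writes such a file and fails early if a read file cannot be found:

```python
import csv
from pathlib import Path


def write_v2_manifest(samples, manifest_path):
    """Write a tab-separated, one-row-per-sample paired-end manifest.

    `samples` maps a sample ID to a (forward_path, reverse_path) pair.
    Hypothetical helper, not part of QIIME 2.
    """
    header = ["sample-id", "forward-absolute-filepath", "reverse-absolute-filepath"]
    rows = []
    for sample_id, (forward, reverse) in samples.items():
        row = [sample_id]
        for read_path in (forward, reverse):
            absolute = Path(read_path).resolve()
            # Catch "could not be found" problems before running the import.
            if not absolute.is_file():
                raise FileNotFoundError(absolute)
            row.append(str(absolute))
        rows.append(row)
    with open(manifest_path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        writer.writerow(header)
        writer.writerows(rows)
```

The resulting file would then be passed as --input-path together with --input-format PairedEndFastqManifestPhred33V2.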

Hi again,
Did that and now the error message is:

/home/iulia/Ips-larvae-B/Ips.larvae.B/metadata2.csv is not a(n) PairedEndFastqManifestPhred33 file:

Found header on line 1 with the following labels: [‘sample-id\tabsolute-filepath\tdirection’], expected: [‘sample-id’, ‘absolute-filepath’, ‘direction’]

Any new ideas?
Thanks for all your help, @Oddant1.

Hi again,
So I think I am getting closer to the problem here, but it is still a problem.
I went back to the “Casava 1.8 paired-end demultiplexed fastq” import option. Now that I have changed the extension from .fq.gz to .fastq.gz, it seems that things are improving. I am getting this error message:
There was a problem importing /home/iulia/Ips-larvae-B/Ips.larvae.B:

/home/iulia/Ips-larvae-B/Ips.larvae.B/FDMP20H000613-1a_01_L001_R1_001.fastq.gz is not a(n) FastqGzFormat file:

Missing sequence for record beginning on line 27169

And now the problem is visible. At line 27169 I can see it:
@A01045:95:HJ3K2DRXX:1:2114:14859:8750 1:N:0:TAGCTGAG

@A01045:95:HJ3K2DRXX:1:2114:17029:8782 1:N:0:TAGCTGAG
@A01045:95:HJ3K2DRXX:1:2114:4833:8954 1:N:0:TAGCTGAG

Some data are missing.
But I don’t know how to fix the problem without damaging the sequencing data.
Why are those data missing in the first place?
Should I have a discussion with the sequencing company?

Maybe you can help figure this one out.

Yeah it looks like that sequence is just straight up missing. Does that record appear anywhere else in your file (with an associated sequence)? If so, I’d say you’re safe to remove it from the line it’s causing an error on and proceed. If that isn’t the case you can still remove it and proceed but you’ll be down a sequence.

Otherwise, if these are the reads you got from the sequencing center, I suggest you contact them about that missing sequence. If you’ve made any changes to the reads or done anything that could have affected them since you received them, I would suggest you go back through those changes and see if that sequence was ever there before contacting the sequencing center.
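
To see whether this is the only affected record, you could scan the whole file before editing anything. Here is a rough Python sketch (the function name is made up, and it assumes the usual four lines per FASTQ record: header, sequence, `+`, quality) that lists every record whose sequence line is empty:

```python
import gzip


def find_records_missing_sequence(fastq_gz_path):
    """Return (line_number, header) for each record with an empty sequence.

    Assumes a gzip-compressed FASTQ file with exactly four lines per record.
    Line numbers are 1-based and point at the record's header line.
    """
    problems = []
    with gzip.open(fastq_gz_path, "rt") as handle:
        for index, line in enumerate(handle):
            position = index % 4
            if position == 0:
                # First line of a record: remember where it starts.
                header_line_number = index + 1
                header = line.rstrip("\n")
            elif position == 1 and not line.strip():
                # Second line of a record should hold the sequence.
                problems.append((header_line_number, header))
    return problems
```

Since these are paired-end reads, any record removed from the R1 file would presumably also need its mate removed from the R2 file so that the two files stay in sync.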

Additionally, the issue you were having previously in this post came from the values in your manifest being tab-separated rather than comma-separated, which is what the non-V2 manifest format expects.
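
To illustrate, here is a small Python sketch using only the standard csv module (the function name `tsv_manifest_to_csv` is hypothetical). Read as comma-separated, a tab-separated header comes back as one single label, which matches the error above; rewriting the same rows with commas gives the layout that PairedEndFastqManifestPhred33 was looking for.

```python
import csv
import io


def tsv_manifest_to_csv(tsv_text):
    """Re-encode tab-separated manifest text as comma-separated text."""
    rows = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(rows)
    return out.getvalue()


header = "sample-id\tabsolute-filepath\tdirection\n"

# Parsed with the default comma delimiter, the whole line is a single field.
as_csv = next(csv.reader(io.StringIO(header)))

# Parsed with a tab delimiter, the three expected labels appear.
as_tsv = next(csv.reader(io.StringIO(header), delimiter="\t"))
```

As far as I know, in this older layout the forward and reverse rows of one pair also share the same sample-id, and the direction values are spelled forward and reverse.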

Hi @Oddant1,

I have managed to import the data.
First I filtered the reads with cutadapt, then re-compressed them into the .fastq.gz format, and all was perfect. No more error messages.

Thank you for all your support!
Let’s see how it goes from now on.

Best regards,
