Missing sequence for record beginning on line 13/ qiime2

amaria.gallego · March 17, 2020, 5:02pm

Cordial greetings. I am running the following command on my paired-end sequences. I also validated the manifest with keemei.org and everything was OK (the manifest is attached).

qiime tools import --type 'SampleData[PairedEndSequencesWithQuality]' --input-path MetadataMilk5.tsv --output-path paired-end-demux.qza --input-format PairedEndFastqManifestPhred33V2

Then I got the following error:

/tmp/q2-SingleLanePerSamplePairedEndFastqDirFmt-u0c_sxxj/DFbaby2_11_L001_R1_001.fastq.gz is not a(n) FastqGzFormat file:

  Missing sequence for record beginning on line 13

I double-checked the lines and they look OK. I cannot understand what is wrong. Thanks for the help!

MetadataMilk5.tsv (13.1 KB)

jwdebelius · March 17, 2020, 5:04pm

Hi @amaria.gallego,

I think the error is telling you that it can't find the sequences associated with the sample on line 13 (which should be either DFbaby1 or DFbaby2). So, I would double-check your filepaths and make sure the file exists and is in that location.

Best,
Justine

amaria.gallego · March 17, 2020, 7:39pm

I already checked and everything is OK. Could it be something else? I am using the latest version of QIIME 2.

Thanks for help!!

thermokarst · March 17, 2020, 8:07pm

Hi @amaria.gallego - I am going to step in for @jwdebelius. The error is a little confusing, but what it is saying is that one of your FASTQ files is corrupt (it might be missing sequence data around line 13 of a specific file). To figure out which file, we can work backwards:

The filename in the error message tells us two things: which sample is impacted, and which read direction is impacted. It looks like the sample ID is DFbaby2, and the read direction is forward, because of the R1. Now, working backwards, in your manifest file, looking up the filepath for the forward reads of DFbaby2:

/home_unal/Biop_Temp/LECHEMATERNA/LECHE/Delivery/result/00.RawData/Sequences/FDMP19H003467-1a_L1_DFbaby2_1.fq
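The lookup described above can also be done from the command line. This is a minimal sketch, assuming the manifest is a tab-separated file with the sample ID in column 1 and the forward and reverse filepaths in columns 2 and 3, as in the paired-end V2 manifest format. The function name manifest_paths is hypothetical, chosen only for illustration.

```shell
# Hypothetical helper: print the forward and reverse filepaths that the
# manifest records for one sample ID.
# Assumes a tab-separated manifest: sample ID, forward path, reverse path.
manifest_paths() {
  manifest="$1"
  sample_id="$2"
  awk -F '\t' -v id="$sample_id" '$1 == id { print $2; print $3 }' "$manifest"
}

# Example call (not run here): manifest_paths MetadataMilk5.tsv DFbaby2
```

The first line printed is the forward-read file and the second is the reverse-read file for that sample.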

So, go ahead and take a look at that file, specifically. You can run the following and share the results with us, if you wish:

head -n 20 /home_unal/Biop_Temp/LECHEMATERNA/LECHE/Delivery/result/00.RawData/Sequences/FDMP19H003467-1a_L1_DFbaby2_1.fq

One thing to keep in mind - depending on how your files were pre-processed, this error could be more widespread than just this one FASTQ file - we will have to take care of these as they might crop up.
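To see whether the problem is more widespread, every file listed in the manifest could be scanned in one pass. This is only a sketch under a few assumptions: the manifest is tab-separated with a header row, the forward and reverse paths are in columns 2 and 3, the FASTQ files are uncompressed, and each record is exactly four lines long. The function name scan_manifest is hypothetical.

```shell
# Hypothetical helper: for every FASTQ file listed in a manifest, report
# records whose sequence line is empty.
# Output columns: sample ID, filepath, line number where the record starts.
scan_manifest() {
  tab=$(printf '\t')
  # Skip the header row and any "#" comment rows, then read the columns.
  tail -n +2 "$1" | grep -v '^#' |
    while IFS="$tab" read -r sample fwd rev rest || [ -n "$sample" ]; do
      for f in "$fwd" "$rev"; do
        # In a 4-line record, line 2 is the sequence and line 1 the header.
        awk -v s="$sample" \
          'NR % 4 == 2 && length($0) == 0 { print s "\t" FILENAME "\t" (NR - 1) }' "$f"
      done
    done
}

# Example call (not run here): scan_manifest MetadataMilk5.tsv
```

No output would mean that no empty sequence lines were found under those assumptions. It does not rule out other kinds of corruption.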

Thanks for double-checking your manifest/metadata - luckily the error message doesn't seem to be indicating that QIIME 2 is upset with that, so we should be all good there.

Keep us posted! :qiime2:

amaria.gallego · March 17, 2020, 8:36pm

I ran your command and uploaded a screenshot. What do you think?
To be sure, is the manifest OK?

thermokarst · March 17, 2020, 8:39pm

Yep, no problem there.

Thanks for sharing! I think you have at least one "empty" record:

For some reason that record has no sequence data (or quality data) associated with it, which is a bit strange. Where did you get these files from? What kind of pre-processing was applied to them?
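Empty records like this can be located without reading the file by eye. This is a minimal sketch, assuming an uncompressed FASTQ file in which every record is exactly four lines (header, sequence, "+", quality). The function name find_empty_records is hypothetical.

```shell
# Hypothetical helper: print the starting line number of every 4-line
# FASTQ record whose sequence line (line 2 of the record) is empty.
find_empty_records() {
  awk 'NR % 4 == 2 && length($0) == 0 { print NR - 1 }' "$1"
}

# Example call (not run here), using the path from the manifest:
# find_empty_records /home_unal/Biop_Temp/LECHEMATERNA/LECHE/Delivery/result/00.RawData/Sequences/FDMP19H003467-1a_L1_DFbaby2_1.fq
```

Each number printed is the line on which an affected record begins, which is how the error message above identifies the record.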

amaria.gallego · March 17, 2020, 8:46pm

I got the files from the sequencing center.
The center provided me with pre-processed reads, described as follows:

Amplicon sequencing was performed on a paired-end Illumina platform to generate 250 bp paired-end raw reads (Raw PE). Paired-end reads were truncated by cutting off the barcode and primer sequences. These are my input files, with barcodes and primers removed.

In that case, how can I proceed with this FASTQ file? Should I start from the raw data?

Thanks!

amaria.gallego · March 19, 2020, 8:18pm

Cordial greeting and thanks @thermokarst!

I edited the corrupt FASTQ file with vim and was able to import the data into QIIME 2.
Thanks

Best!
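For anyone who hits the same error with many affected records, hand-editing may not be practical. The following is a sketch of a scripted alternative, not the method used above. It assumes an uncompressed FASTQ file with exactly four lines per record, and the function name drop_empty_records is hypothetical.

```shell
# Hypothetical helper: write a copy of a FASTQ file to standard output,
# leaving out every 4-line record whose sequence line is empty.
drop_empty_records() {
  awk '
    { rec[(NR - 1) % 4] = $0 }            # buffer the current record
    NR % 4 == 0 && length(rec[1]) > 0 {   # record complete, sequence present
      print rec[0]; print rec[1]; print rec[2]; print rec[3]
    }
  ' "$1"
}

# Example call (not run here): drop_empty_records input.fq > cleaned.fq
```

Because these are paired-end reads, dropping a record from only one file can leave the forward and reverse files out of step, so the matching record in the mate file likely needs the same treatment. It is also worth keeping the original files untouched and writing the cleaned copy elsewhere.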