Importing Data Failing. Not sure about the right protocol

Hello All,
I have a set of old raw FASTQ files and am not sure how to import them (I am new to QIIME). All the data (141 samples) are packed into four fastq.gz files, whose names are listed below. I also have a metadata file (already validated in Keemei).
I used the “Multiplexed paired-end FASTQ with barcodes in sequence” protocol for importing, but QIIME 2-2020.8 gave me an error message saying that it cannot find ‘forward.fastq.gz’.
Should I be using another import protocol instead, such as Casava 1.8 paired-end demultiplexed fastq? The problem is that I don’t have two fastq.gz files per sample, since all 141 of my samples are packed into the four files below.
Samp1-66_S1_L001_R1_001.fastq.gz
Samp1-66_S1_L001_R2_001.fastq.gz
Samp67-91_S2_L001_R1_001.fastq.gz
Samp67-91_S2_L001_R2_001.fastq.gz

What should I do? Any help would be greatly appreciated. Thank you so much.

Mauricio

Hi @RodLan, welcome!

Do you have a separate file with barcode sequences? Or are the barcodes in-line in the fwd or rev reads?

Thanks Matt,

Yes, I have a separate file with barcodes. It looks like this (pasted below):

sampleid  BarcodeSequence  LinkerPrimerSequence  BarcodeName  ProjectName  Name  Site     Time Point  Colony
14        CTGGAAGT         AGRGTTTGATCMTGGCTCAG  Ill27Fbar68  102516MRL3   14    Coffins  14-Apr      COL-21
16        CTGGACTG         AGRGTTTGATCMTGGCTCAG  Ill27Fbar69  102516MRL3   16    Coffins  14-Apr      COL-25
20        CTGGATCC         AGRGTTTGATCMTGGCTCAG  Ill27Fbar70  102516MRL3   20    Coffins  14-Apr      COL-37
23        CTGGTGTG         AGRGTTTGATCMTGGCTCAG  Ill27Fbar72  102516MRL3   23    Coffins  14-Apr      COL-48
28        CTGGTATT         AGRGTTTGATCMTGGCTCAG  Ill27Fbar71  102516MRL3   28    Coffins  14-Apr      COL-39
32        CTACCAAG         AGRGTTTGATCMTGGCTCAG  Ill27Fbar26  102516MRL3   32    Pickle   14-Apr      COL-52
33        CTACTCGC         AGRGTTTGATCMTGGCTCAG  Ill27Fbar29  102516MRL3   33    Pickle   14-Apr      COL-89
39 CTAGCTGG
.
.
.
(141 samples in total)

Thanks @RodLan - that appears to be your sample metadata, with a column of barcodes, which is certainly helpful. What I am looking for is information about where your barcodes are with respect to your sequences. For example, the EMP protocol generates a separate barcodes.fastq.gz file, while other sequencing protocols keep the barcodes in the forward reads. We need something to tie all of these pieces together. Put another way: how can we put that barcode metadata to use to actually identify which sequences belong to which samples? If you don't know, no worries! Just ask your sequencing center and they will help you piece it all together.
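While waiting on the sequencing center, one rough way to check whether the barcodes sit at the start of the forward reads is to count how many reads begin with a barcode from the BarcodeSequence column. The Python sketch below is only an illustration and is not part of QIIME 2; the function names `read_sequences` and `count_barcode_prefixes` are made up for this example, and it assumes standard four-line FASTQ records and exact (no-mismatch) barcode matches at the very start of each read.

```python
import gzip
from collections import Counter
from itertools import islice


def read_sequences(fastq_path, limit=10000):
    """Yield the sequence line of up to `limit` records from a gzipped FASTQ file."""
    with gzip.open(fastq_path, "rt") as handle:
        # Assumption: each record is exactly four lines; the sequence is line 2.
        for i, line in enumerate(islice(handle, limit * 4)):
            if i % 4 == 1:
                yield line.strip()


def count_barcode_prefixes(sequences, barcodes):
    """Count reads that start with one of the known barcodes (exact match only)."""
    barcodes = set(barcodes)
    # Try longer barcodes first in case barcode lengths differ.
    lengths = sorted({len(b) for b in barcodes}, reverse=True)
    counts = Counter()
    total = 0
    for seq in sequences:
        total += 1
        for n in lengths:
            prefix = seq[:n]
            if prefix in barcodes:
                counts[prefix] += 1
                break
    return counts, total
```

For example, `count_barcode_prefixes(read_sequences("Samp1-66_S1_L001_R1_001.fastq.gz"), ["CTGGAAGT", "CTGGACTG"])` returns the per-barcode counts and the number of reads inspected. If most reads start with a known barcode, the barcodes are probably in-line in the forward reads; if almost none do, they may be in the reverse reads, in a separate file, or already removed, which is something only the sequencing center can confirm.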

Thanks. Yes, I have just reached out to the sequencing center. They did the sequencing three years ago.

Now, with the “Multiplexed paired-end FASTQ with barcodes in sequence” protocol, we could use the metadata file along with the FASTQ files, without a barcodes.fastq file, correct?

Correct - but only if the barcodes are in the sequences. The big point to take away here is that each read has to have a barcode value associated with it; otherwise there is no way to map the read back to an individual sample.
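If the barcodes do turn out to be in the reads, the "cannot find ‘forward.fastq.gz’" error from the first post suggests that this import format looks for fixed filenames rather than the Illumina-style names. A minimal Python sketch for staging one R1/R2 pair under those names follows. The helper `stage_for_import` is hypothetical, and `reverse.fastq.gz` is assumed to be the matching name for the R2 file, since only `forward.fastq.gz` appears in the error message.

```python
import shutil
from pathlib import Path


def stage_for_import(r1_path, r2_path, out_dir):
    """Copy one R1/R2 pair into out_dir under the fixed names the importer looks for."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    # 'forward.fastq.gz' is taken from the error message in this thread;
    # 'reverse.fastq.gz' is an assumption for the matching R2 filename.
    fwd = out / "forward.fastq.gz"
    rev = out / "reverse.fastq.gz"
    shutil.copyfile(r1_path, fwd)
    shutil.copyfile(r2_path, rev)
    return fwd, rev
```

Because the target names are fixed, each pair (Samp1-66 and Samp67-91) would need its own directory, for example `stage_for_import("Samp1-66_S1_L001_R1_001.fastq.gz", "Samp1-66_S1_L001_R2_001.fastq.gz", "run1")`, and each directory would then be imported separately.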
