Import barcodes

stella · December 10, 2020, 5:01pm

Hi
I received the following from a company that sequenced for us:
Forward.fastq, Reverse.fastq (sequences with barcodes) and a mapping file (#SampleID BarcodeSequence LinkerPrimerSequence).
However, the index.fastq file was not provided… Hence I'm not able to use the qiime tools import command. Is there a way to produce a barcodes.fastq from the barcodes I got in the xls file?
If I try, it gives me the following error: Header on line 1 is not FASTQ, records may be misaligned
thanks

SoilRotifer · December 10, 2020, 5:11pm

Hi @stella, welcome to the forum.

Can you provide the list of the actual file names you received from your sequencing company? Or are Forward.fastq and Reverse.fastq the actual file names?

Are these single-end or paired-end reads? I assume single-end, as it sounds like your reverse reads may actually be barcodes / indices. You can peek inside the file by running the following at the command line:

head Reverse.fastq

If you see short sequences, e.g. ~12 bp, then these are likely your barcodes.
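If eyeballing the output of head is not conclusive, a small script can tally read lengths over more records. The following is only a minimal sketch, assuming a standard four-line FASTQ layout; the function name and the 1000-record limit are illustrative choices, not part of QIIME 2.

```python
import gzip
from collections import Counter
from itertools import islice


def read_length_summary(path, max_records=1000):
    """Count sequence lengths in the first `max_records` FASTQ records."""
    # Accept both plain and gzip-compressed files.
    opener = gzip.open if str(path).endswith(".gz") else open
    lengths = Counter()
    with opener(path, "rt") as handle:
        # A FASTQ record is 4 lines: header, sequence, '+', qualities.
        # Index 1 within each group of four is the sequence line.
        for i, line in enumerate(islice(handle, max_records * 4)):
            if i % 4 == 1:
                lengths[len(line.strip())] += 1
    return lengths


# Example call (file name taken from the thread):
# print(read_length_summary("Reverse.fastq").most_common(5))
```

If nearly every read has the same short length, the file most likely holds index reads; a spread of long lengths points to genuine sequence reads.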

The sequencing company should provide you with details outlining the sequencing protocol and how to demultiplex and use the data. You should ask them for this information if you do not have it.

-Mike

stella · December 10, 2020, 5:44pm

Hi Mike
Thanks
so I want to use this plugin:
qiime tools import \
  --type EMPPairedEndSequences \
  --input-path emp-paired-end-sequences \
  --output-path emp-paired-end-sequences.qza
For the input I need 3 fastq.gz files, corresponding to the forward, reverse, and barcode (i.e., index) reads.
The company provided me only with the forward and reverse reads as well as an Excel mapping file. Can I somehow convert this mapping file to a barcode fastq file needed for the import?

SoilRotifer · December 10, 2020, 7:05pm

I'd suggest contacting them to obtain further details. They may have simply forgotten to supply you with the index (barcodes) fastq file. You should confirm that you have everything you need.

The "import command" you use is largely dependent on the format of your data. If these are indeed multiplexed paired-end data then you need to have the company supply you with the index (barcodes) fastq file. However, your initial post described the files as "Forward.fastq , Reverse.fastq (Sequence with barcodes)".

That makes me wonder if the barcodes may be within the reverse reads. If so, then you may want to try one of the cutadapt demultiplex options.
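One way to test that suspicion before reaching for cutadapt is to count how many reads begin with a barcode listed in the mapping file. This is a rough sketch, assuming uncompressed four-line FASTQ and barcodes located at the very start of each read; the function name is made up for illustration, and the sample-to-barcode dictionary would have to be filled in from the BarcodeSequence column.

```python
from collections import Counter
from itertools import islice


def count_leading_barcodes(fastq_path, barcodes, max_records=10000):
    """Tally how many reads begin with each known barcode.

    `barcodes` maps a sample ID to its barcode sequence.
    Reads that match no barcode are counted under the key None.
    """
    hits = Counter()
    with open(fastq_path) as handle:
        for i, line in enumerate(islice(handle, max_records * 4)):
            if i % 4 != 1:  # keep only the sequence line of each record
                continue
            seq = line.strip().upper()
            # First sample whose barcode is a prefix of the read, else None.
            match = next(
                (sid for sid, bc in barcodes.items()
                 if seq.startswith(bc.upper())),
                None,
            )
            hits[match] += 1
    return hits


# Example call with placeholder barcodes (not real values from the thread):
# count_leading_barcodes("Forward.fastq", {"SampleA": "ACGTAC"})
```

If most reads land on a sample ID rather than on None, the barcodes are inline and the data still need demultiplexing.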

Otherwise, the company may have mis-labeled your index reads as reverse reads. Did you happen to peek at the contents of the file as I suggested to confirm?

If this is the case then you would want to use EMPSingleEndSequences. Then you can rename reverse.fastq to barcodes.fastq. See the import documentation here.
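For that scenario, the EMP single-end import reads a directory containing sequences.fastq.gz and barcodes.fastq.gz, so the files need compressing as well as renaming. Here is a minimal sketch of that staging step; the helper name and the output directory are hypothetical, and it only makes sense if Reverse.fastq really holds the index reads.

```python
import gzip
import shutil
from pathlib import Path


def stage_emp_single_end(reads_fastq, barcodes_fastq, out_dir):
    """Gzip two FASTQ files into the names the EMP single-end import expects."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    targets = {
        "sequences.fastq.gz": reads_fastq,
        "barcodes.fastq.gz": barcodes_fastq,
    }
    for name, src in targets.items():
        # Stream each source file into a gzip-compressed copy.
        with open(src, "rb") as fin, gzip.open(out / name, "wb") as fout:
            shutil.copyfileobj(fin, fout)
    return out


# Example call (file names from the thread, directory name assumed):
# stage_emp_single_end("Forward.fastq", "Reverse.fastq", "emp-single-end-sequences")
```

The resulting directory would then be passed to qiime tools import with --type EMPSingleEndSequences and --input-path pointing at it.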

-Mike

stella · December 10, 2020, 8:07pm

Hi Mike

You are right! I applied your suggestion and realized that the barcodes are still in the sequences, in the forward as well as the reverse… (each 6bp).

However, I now see another folder they provided with demultiplexed sequences already named by sample: for each sample a fwd & rev sequence file, without primers/barcodes. How would I be able to import these demultiplexed fastq files and merge them into one so I can continue and run the DADA2 plugin?

SoilRotifer · December 10, 2020, 8:30pm

Great news @stella!

It appears that you have a couple of options. You can try to demultiplex the data yourself using cutadapt, or use the Manifest format to import your already demultiplexed data. I’d suggest the manifest format, as it is much easier than dealing with demultiplexing dual-indexed reads.
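A paired-end manifest is a tab-separated file with the columns sample-id, forward-absolute-filepath and reverse-absolute-filepath, one row per sample. With many samples it can be generated rather than typed by hand. The sketch below assumes files named like SampleA_fwd.fastq and SampleA_rev.fastq; that naming scheme and the function name are hypothetical, so the suffixes will need adjusting to whatever the company actually delivered.

```python
import csv
from pathlib import Path


def write_paired_manifest(demux_dir, manifest_path,
                          fwd_suffix="_fwd.fastq", rev_suffix="_rev.fastq"):
    """Write a tab-separated paired-end manifest for per-sample FASTQ files.

    The default suffixes are assumptions; change them to match the real names.
    Returns the number of samples written.
    """
    demux = Path(demux_dir).resolve()  # the manifest needs absolute paths
    rows = []
    for fwd in sorted(demux.glob(f"*{fwd_suffix}")):
        sample_id = fwd.name[: -len(fwd_suffix)]
        rev = demux / f"{sample_id}{rev_suffix}"
        if not rev.exists():
            raise FileNotFoundError(f"no reverse file for sample {sample_id}")
        rows.append((sample_id, str(fwd), str(rev)))
    with open(manifest_path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        writer.writerow(["sample-id", "forward-absolute-filepath",
                         "reverse-absolute-filepath"])
        writer.writerows(rows)
    return len(rows)


# Example call (directory and output names assumed):
# write_paired_manifest("demultiplexed", "manifest.tsv")
```

The manifest can then be imported with qiime tools import using --type 'SampleData[PairedEndSequencesWithQuality]', --input-path manifest.tsv and --input-format PairedEndFastqManifestPhred33V2 (assuming Phred33 quality scores). The import produces a single artifact covering all samples, so the per-sample files do not need to be merged beforehand.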

After you import the data, check out the tutorials, specifically the Atacama tutorial, as it runs through processing paired-end data.

stella · December 10, 2020, 10:50pm

Thanks for your fast replies & help! I followed the manifest format and it worked out…
