Importing older Solexa data

Hello,

I have been unable to import this older Solexa data into QIIME 2 because of the headers.
For example:
@SOLEXA1_0069_FC:3:1:16444:1031#ACAGTG/2 AGTCAACAGGATTAGATACCCGGGTAGTCCACGCCGTAAACGATGAATGTTAGCCGTCGGGCAGTATACTGTTCGG-TCAGAGGGTTGCGCTCGTTGCGGGACTTAACCCAACATCTCACGACACGAGCTGACGACA
GCCATGCAGCACCTGT
TCAGAGGGTTGCGCTCGTTGCGGGACTTAACCCAACATCTCACGACACGAGCTGACGACAGCCATGCAGCACCTGT
+SOLEXA1_0069_FC:3:1:16444:1031#ACAGTG/2
hhghfhhhghhghfhfhhhgehhhghhhhfhhhhhhfghhhchhhghheaghhhahhcg_gge_dhfccfbceadg

Does anyone have an idea on how to import it? Also, I don’t have the index reads.

Hey @raw937,

You are probably going to need to cook something custom up.

One thing you should know about Solexa fastq files is that the quality scores are not Phred scores. Here’s a paper describing the formula for conversion.

I think Biopython (and Bioperl/Biostars) can do the quality score conversion for you, but I haven’t done that myself.
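For reference, here is a minimal sketch of that conversion in plain Python. It assumes the quality strings really are Solexa-scaled scores with an ASCII offset of 64; the function names are made up for illustration. If the files instead come from Illumina pipeline 1.3 or later, the scores are already Phred values at offset 64 and only the offset needs shifting. Biopython's `SeqIO.convert` with the `fastq-solexa` (or `fastq-illumina`) input format and `fastq` output format should do the equivalent job on whole files.

```python
import math

SOLEXA_OFFSET = 64  # ASCII offset used by Solexa / early Illumina quality strings
PHRED_OFFSET = 33   # ASCII offset of standard (Sanger) FASTQ


def solexa_to_phred(q_solexa: int) -> int:
    """Map a Solexa-scaled score to the nearest integer Phred score.

    Uses Q_phred = 10 * log10(10 ** (Q_solexa / 10) + 1).
    """
    return round(10 * math.log10(10 ** (q_solexa / 10) + 1))


def convert_quality_line(line: str) -> str:
    """Re-encode one Solexa+64 quality string as Phred+33 (assumed input encoding)."""
    return "".join(
        chr(solexa_to_phred(ord(char) - SOLEXA_OFFSET) + PHRED_OFFSET)
        for char in line
    )


if __name__ == "__main__":
    # First few quality characters from the record posted above.
    print(convert_quality_line("hhghfhhhgh"))
```

The two scales agree closely for high scores and only diverge noticeably below about 10, so most of the change for good reads is the offset shift.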

As far as the barcodes/index goes, I don’t know enough about that technology to really advise as to what the protocol would have been. But without barcodes, QIIME 2 can’t do much of anything.

I don’t suppose you were lucky and these are already demultiplexed? It would have been really uncommon at the time, but maybe these were already processed a little bit?

Oh, great point!!! I had forgotten about this Solexa vs. Illumina. I will take a look at Biopython to convert them.
As for the barcodes, I had two rounds of barcoding/indexing.
With these older files, the index is in the header.
See the part in bold: ACAGTG is the index, and /2 marks the read as R2.
@SOLEXA1_0069_FC:3:1:16444:1031**#ACAGTG/2** AGTCAACAGGATTAGATACCCGGGTAGTCCACGCCGTAAACGATGAATGTTAGCCGTCGGGCAGTATACTGTTCGG-TCAGAGGGTTGCGCTCGTTGCGGGACTTAACCCAACATCTCACGACACGAGCTGACGACA
GCCATGCAGCACCTGT
TCAGAGGGTTGCGCTCGTTGCGGGACTTAACCCAACATCTCACGACACGAGCTGACGACAGCCATGCAGCACCTGT
+SOLEXA1_0069_FC:3:1:16444:1031#ACAGTG/2
hhghfhhhghhghfhfhhhgehhhghhhhfhhhhhhfghhhchhhghheaghhhahhcg_gge_dhfccfbceadg
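The header layout in this record (read ID, `#`, index, `/`, read number) can be pulled apart with a small regular expression. This is only a sketch based on the single header shown here; the pattern and the `parse_header` name are illustrative and may need adjusting for other lanes or runs.

```python
import re

# Matches headers such as "@SOLEXA1_0069_FC:3:1:16444:1031#ACAGTG/2".
HEADER_RE = re.compile(
    r"^@(?P<read_id>[^#\s]+)#(?P<index>[ACGTN]+)/(?P<read>[12])"
)


def parse_header(header: str):
    """Return (read_id, index, read_number) for an old-style header line."""
    match = HEADER_RE.match(header)
    if match is None:
        raise ValueError(f"unrecognised header: {header!r}")
    return match.group("read_id"), match.group("index"), int(match.group("read"))


if __name__ == "__main__":
    print(parse_header("@SOLEXA1_0069_FC:3:1:16444:1031#ACAGTG/2"))
```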

I was able to demultiplex them in both the first and second rounds.
The second index is the first four letters, and I used ea-utils to demultiplex the second file on it.

So, I have all the files demultiplexed.
I guess I am lost on how to import these into QIIME 2. Which --type should I use for these files, single-end and paired-end?

qiime tools import --type ? -- ?
Also, will these headers cause issues?

Hey @raw937,

Excellent, you should be able to import these with a fastq manifest format. Alternatively you might be able to use the Casava format instead (further down on that page), but your filenames have to match exactly.
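As a rough sketch of the manifest route for paired-end files: the script below writes a tab-separated manifest in the layout used by the newer `PairedEndFastqManifestPhred33V2` format. The sample IDs and file paths are placeholders, and the exact format name and column headers depend on your QIIME 2 release (older releases used a comma-separated manifest with a `direction` column), so check the importing docs for your version.

```python
import os


def build_paired_manifest(samples):
    """Build manifest text from {sample_id: (forward_fastq, reverse_fastq)}."""
    lines = ["sample-id\tforward-absolute-filepath\treverse-absolute-filepath"]
    for sample_id, (forward, reverse) in sorted(samples.items()):
        # The manifest needs absolute paths to each per-sample file.
        lines.append(
            "\t".join([sample_id, os.path.abspath(forward), os.path.abspath(reverse)])
        )
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical sample IDs and filenames, for illustration only.
    demo = {
        "sample-1": ("demux/sample-1_R1.fastq", "demux/sample-1_R2.fastq"),
        "sample-2": ("demux/sample-2_R1.fastq", "demux/sample-2_R2.fastq"),
    }
    print(build_paired_manifest(demo), end="")

# Assumed import command for a recent release, once the text is saved as manifest.tsv:
#   qiime tools import \
#     --type 'SampleData[PairedEndSequencesWithQuality]' \
#     --input-path manifest.tsv \
#     --input-format PairedEndFastqManifestPhred33V2 \
#     --output-path demux.qza
```

For single-end files the type would be `SampleData[SequencesWithQuality]`, with a two-column manifest (`sample-id`, `absolute-filepath`). There are also Phred64 variants of the manifest formats, which may help if the scores turn out to be Phred values at offset 64 rather than true Solexa scores.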

Nope, the read headers themselves are pretty unimportant, since we don't keep all of the reads in a single file and instead store a directory of per-sample files.


I think the only messy bit here will be converting the quality scores to Phred, and it sounds like you have a handle on that!
