I’m getting an error, and when I open the file it says:
Plugin error from vsearch:
[Errno 32] Broken pipe
Debug info has been saved to /var/folders/2p/18k5mj456zs496j5kkqs_2sw0000gn/T/qiime2-q2cli-err-zxl78i8l.log
Fatal error: FASTQ quality value (43) out of range (0-41).
Please adjust the FASTQ quality base character or range with the
--fastq_ascii, --fastq_qmin or --fastq_qmax options. For a complete
diagnosis with suggested values, please run vsearch --fastq_chars file.
Following the suggestion in the error message, I tried running it with --fastq_qmax 43, but it said that is not an option.
And then import into QIIME 2. There are other options to convert from Phred+64 to Phred+33 (you could even use a custom script) but I like this one.
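If you go the custom-script route, a minimal sketch could look like the one below. It is not the tool recommended above, just an illustration: the function names are made up for this example, and it assumes plain four-line FASTQ records (header, sequence, '+', quality) with no line wrapping.

```python
def phred64_to_phred33(quality_string):
    """Re-encode one quality line from Phred+64 to Phred+33.

    Each character is shifted down by 64 - 33 = 31 ASCII codes.
    """
    converted = []
    for char in quality_string:
        code = ord(char) - 31
        if code < 33:
            # Would be a negative score, so the input was not Phred+64.
            raise ValueError(f"{char!r} is not a valid Phred+64 character")
        converted.append(chr(code))
    return "".join(converted)


def convert_fastq(lines):
    """Yield FASTQ lines with every fourth line (the quality line) re-encoded.

    Assumes unwrapped, four-line records; the other three lines pass through.
    """
    for index, line in enumerate(lines):
        text = line.rstrip("\n")
        if index % 4 == 3:
            text = phred64_to_phred33(text)
        yield text + "\n"
```

To convert a file you could open the input and output yourself and call something like `out.writelines(convert_fastq(inp))`; the file names are up to you.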
More on Phred scores, Phred+33 and Phred+64
Phred scores (Q) measure the quality of each base call and are derived from the probability of a sequencing error (P) in this way: Q = -10 × log10(P)
So, if Q is:
10, it means the probability of error is 1/10 (90% accuracy)
20, it means the probability of error is 1/100 (99% accuracy)
30, it means the probability of error is 1/1000 (99.9% accuracy)
…and so on
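If you want to check these numbers yourself, here is a small sketch of the conversion in both directions. The function names are only for illustration.

```python
import math


def phred_to_error_prob(q):
    """Probability that a base call is wrong, from P = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p):
    """Phred score for an error probability, from Q = -10 * log10(P)."""
    return -10 * math.log10(p)


# Print the error probability and accuracy for a few common scores.
for q in (10, 20, 30, 40):
    p = phred_to_error_prob(q)
    print(f"Q={q}: P(error)={p:g}, accuracy={100 * (1 - p):g}%")
```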
In FASTQ files (and in bioinformatics file formats in general) we want to keep file size down and store as much information as we can in as little (digital) space as possible. So instead of saving Phred scores as 30, 33, 34, 32… we map each one to a single character. The choice here is ASCII characters. So we could take, e.g., a Phred score of 22 and, instead of typing ‘22’, write the ASCII character with code 22.
The problem here is that the first ASCII characters are non-printable control characters and the space (ASCII codes 0 to 32). We don’t want characters that we can’t see to represent the quality of our data. That’s why we use the Phred+33 encoding. To use Phred+33 encoding, take the Phred score, add 33 to it, then use the ASCII character corresponding to the sum. For example, a Phred score of 30 becomes ASCII code 63 (30 + 33), which is ‘?’.
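In code, the Phred+33 mapping is just a shift of 33 on the ASCII code. A minimal sketch, with function names chosen only for this example:

```python
def encode_phred33(scores):
    """Turn a list of Phred scores into a Phred+33 quality string."""
    return "".join(chr(q + 33) for q in scores)


def decode_phred33(quality_string):
    """Turn a Phred+33 quality string back into a list of Phred scores."""
    return [ord(char) - 33 for char in quality_string]


print(encode_phred33([30]))   # the example above: 30 + 33 = 63
print(decode_phred33("?I!"))  # scores behind three quality characters
```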
For older data (which seems to be the case for yours), quality scores are often encoded in Phred+64, which works the same way as Phred+33 except that you add 64 instead of 33. FASTQ files don’t state whether they are in Phred+33 or Phred+64, so if you don’t know beforehand, you need to work it out. Normally it is easy to differentiate: if reading the file as Phred+33 gives Phred scores of 42 or more, you have Phred+64; otherwise it is normally Phred+33.
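That rule of thumb can be written as a quick check over the quality lines of a file. This is only a sketch of the heuristic described here, not a full diagnosis like the one vsearch --fastq_chars gives; the function name is made up, and the cutoff of 41 is an assumption taken from the 0-41 range in the error message above.

```python
def guess_phred_offset(quality_strings, max_phred33=41):
    """Guess whether quality lines use Phred+33 or Phred+64.

    Decodes every character as Phred+33; if any score exceeds
    max_phred33 (assumed cutoff), the data is taken to be Phred+64.
    """
    codes = [ord(char) for line in quality_strings for char in line]
    if not codes:
        raise ValueError("no quality characters to inspect")
    highest_as_phred33 = max(codes) - 33
    return 64 if highest_as_phred33 > max_phred33 else 33


print(guess_phred_offset(["IIII?5"]))  # highest char 'I' decodes to 40
print(guess_phred_offset(["hhhhgf"]))  # highest char 'h' decodes to 71
```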
In theory, a terrible Phred+64 FASTQ file could also pass for a really good Phred+33 file, but that is an edge case I’ve never seen (at least in my data).