Quality Scores too high for Qiime2

zu_mu · September 23, 2025, 6:15am

Hello, I’m analyzing COI data and I just ran:

qiime vsearch fastq-stats \
  --i-sequences paired-end-demux.qza \
  --p-threads 1 \
  --o-visualization VIEWABLE_ee_paired-end-demux.qzv

I’m getting this error:

Plugin error from vsearch:

  [Errno 32] Broken pipe

Debug info has been saved to /var/folders/2p/18k5mj456zs496j5kkqs_2sw0000gn/T/qiime2-q2cli-err-zxl78i8l.log

When I open the log file, it says:

Fatal error: FASTQ quality value (43) out of range (0-41).
Please adjust the FASTQ quality base character or range with the
--fastq_ascii, --fastq_qmin or --fastq_qmax options. For a complete
diagnosis with suggested values, please run vsearch --fastq_chars file.

Following the error's suggestion, I tried adding --fastq_qmax 43, but it said that this is not an option:

qiime vsearch fastq-stats \
  --i-sequences paired-end-demux.qza \
  --p-threads 1 \
  --fastq_qmax 43 \
  --o-visualization VIEWABLE_ee_paired-end-demux.qzv

I found one suggestion to convert the data to Phred+33. What are the pros and cons of this option? Is there anything else I can do?

salias · September 23, 2025, 8:11am

Hello @zu_mu

It looks like your FASTQ file is encoded in Phred+64 (instead of Phred+33).

There is no downside: it is just another way of representing the same Phred scores.

If you want to convert to Phred+33, you can use reformat.sh from BBMap with a command like:

reformat.sh in=reads-p64.fastq out=reads-p33.fastq qin=64 qout=33

Then import the converted file into QIIME 2. There are other ways to convert from Phred+64 to Phred+33 (you could even write a custom script), but I like this one.
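
For the custom-script route, here is a minimal shell/awk sketch. It is not how BBMap does it, and the function name phred64_to_phred33 is hypothetical. It assumes an uncompressed FASTQ with exactly four lines per record and standard Phred+64 qualities (no characters below '@'). Because both encodings store the same score, the conversion just subtracts 31 (64 - 33) from the ASCII code of every quality character.

```bash
#!/usr/bin/env bash
# Sketch: rewrite the quality lines of a 4-line-per-record FASTQ file
# from Phred+64 to Phred+33. Function name is hypothetical.
# Assumes standard Phred+64 input (no quality characters below "@").
phred64_to_phred33() {
  awk '
    BEGIN {
      # Lookup table: character -> ASCII code (POSIX awk has no ord function).
      for (i = 33; i <= 126; i++) ord[sprintf("%c", i)] = i
    }
    # Line 4 of every record (FNR = 4, 8, 12, ...) holds the quality string.
    FNR % 4 == 0 {
      # Same Phred score, new offset: subtract 64 - 33 = 31 from each code.
      out = ""
      for (j = 1; j <= length($0); j++) {
        out = out sprintf("%c", ord[substr($0, j, 1)] - 31)
      }
      print out
      next
    }
    # Header, sequence and "+" lines are copied unchanged.
    { print }
  ' "$@"
}
```

For example: phred64_to_phred33 reads-p64.fastq > reads-p33.fastq, or for gzipped reads: gzip -dc reads-p64.fastq.gz | phred64_to_phred33 | gzip > reads-p33.fastq.gz. On large files, a dedicated tool such as reformat.sh will likely be much faster than awk.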

More on Phred scores, Phred+33 and Phred+64

Phred scores (Q) are a measure of the quality of each base call, and they depend on the probability of a sequencing error (P) in this way: Q = -10 × log₁₀(P)

So, if Q is:

10, it means the probability of error is 1/10 (90% accuracy)
20, it means the probability of error is 1/100 (99% accuracy)
30, it means the probability of error is 1/1000 (99.9% accuracy)
…and so on
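
To check the numbers above yourself, here is a small awk-based sketch (the function names are hypothetical). It inverts the formula, P = 10^(-Q/10), to go from a Phred score to an error probability, and uses log₁₀(P) = ln(P) / ln(10) to go back, since awk only provides the natural logarithm.

```bash
#!/usr/bin/env bash
# Q -> P: invert Q = -10 log10(P), i.e. P = 10^(-Q/10).
phred_to_error() {
  awk -v q="$1" 'BEGIN {
    p = 10 ^ (-q / 10)
    printf "Q=%s P=%g accuracy=%.4f%%\n", q, p, (1 - p) * 100
  }'
}

# P -> Q: awk only has the natural log, so log10(P) = log(P) / log(10).
error_to_phred() {
  awk -v p="$1" 'BEGIN { printf "%.1f\n", -10 * log(p) / log(10) }'
}
```

For example, phred_to_error 20 prints Q=20 P=0.01 accuracy=99.0000%, matching the second line of the list.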

In FASTQ files (and bioinformatics files in general), we want to keep file size small and store as much information as we can in as little (digital) space as possible. So instead of saving Phred scores as 30, 33, 34, 32… we map each score to a single character, and the characters chosen are ASCII characters. For example, we could take a Phred score of 22 and, instead of writing ‘22’, write the ASCII character with code 22.

The problem is that some ASCII characters are spaces or non-printable characters (ASCII codes 0 to 32). We don’t want invisible characters to represent the quality of our data. That’s why we use the Phred+33 encoding: take the Phred score, add 33 to it, and use the ASCII character corresponding to the sum. For example, a Phred score of 30 becomes ASCII code 63 (30 + 33), which is ‘?’.
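
The Phred+33 mapping can be tried directly in a shell. This is a sketch with hypothetical function names, assuming only printable ASCII quality characters: encode_phred33 turns one score into its character, and decode_phred33 turns a whole quality string back into scores.

```bash
#!/usr/bin/env bash
# Encode one Phred score as its Phred+33 character: ASCII code = Q + 33.
encode_phred33() {
  # Build an octal escape such as \077 for code 63, then let printf print it.
  printf "\\$(printf '%03o' "$(( $1 + 33 ))")"
}

# Decode a whole Phred+33 quality string into space-separated scores.
decode_phred33() {
  printf '%s\n' "$1" | awk '
    BEGIN { for (i = 33; i <= 126; i++) ord[sprintf("%c", i)] = i }
    {
      out = ""; sep = ""
      for (j = 1; j <= length($0); j++) {
        out = out sep (ord[substr($0, j, 1)] - 33)
        sep = " "
      }
      print out
    }'
}
```

For example, encode_phred33 30 prints ?, and decode_phred33 'I?5+!' prints 40 30 20 10 0.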

Older data sequenced some years ago (which seems to be your case) often have quality scores encoded in Phred+64, which works the same way as Phred+33 except that you add 64 instead of 33. FASTQ files don’t indicate whether they are in Phred+33 or Phred+64, so if you don’t know it beforehand, you need to find out. Normally it is easy to tell them apart: if, reading the file as Phred+33, you find Phred scores of 42 or more, you have Phred+64; otherwise it is normally Phred+33.
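
As a rough illustration of this heuristic (vsearch --fastq_chars, suggested in the error message above, does a more thorough version of this check), the sketch below reports the lowest and highest quality characters in a FASTQ file and makes a guess. The function name and exact thresholds are assumptions: characters below ';' (ASCII 59) cannot appear in Phred+64, while a maximum above 'J' (Q41 read as Phred+33) with no low characters points to Phred+64. Some newer platforms do write Phred+33 scores above 41, so the minimum is the more telling value.

```bash
#!/usr/bin/env bash
# Heuristic sketch (hypothetical function name): find the lowest and highest
# quality characters in a 4-line-per-record FASTQ and guess the encoding.
guess_phred_offset() {
  awk '
    BEGIN {
      # Lookup table: character -> ASCII code, for printable ASCII only.
      for (i = 33; i <= 126; i++) ord[sprintf("%c", i)] = i
      lo = 127; hi = 0
    }
    # Line 4 of every record holds the quality string.
    FNR % 4 == 0 {
      for (j = 1; j <= length($0); j++) {
        c = ord[substr($0, j, 1)]
        if (c < lo) lo = c
        if (c > hi) hi = c
      }
    }
    END {
      if (hi == 0) { print "no quality lines found"; exit 1 }
      if (lo < 59) {
        # Codes below ";" (59) cannot occur in Phred+64 (nor Solexa+64).
        guess = "Phred+33"
      } else if (hi > 74) {
        # Read as Phred+33, this is a score of 42 or more ("K" and above).
        guess = "probably Phred+64"
      } else {
        # Codes 59-74 fit both encodings: the edge case described below.
        guess = "ambiguous"
      }
      printf "min=%d (%c) max=%d (%c) guess=%s\n", lo, lo, hi, hi, guess
    }' "$@"
}
```

To run it on reads already imported into QIIME 2, you could first export them with qiime tools export --input-path paired-end-demux.qza --output-path exported-reads and then run, for example, gzip -dc exported-reads/<file>.fastq.gz | guess_phred_offset, where <file> stands for one of the exported FASTQ files.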

In theory, a very poor-quality Phred+64 FASTQ file could also look like a really good Phred+33 file, but that is an edge case I’ve never seen (at least in my data).

Best,

Sergio

ebolyen · September 23, 2025, 6:06pm

You could also use this very rarely used import command:

It will convert the quality scores to Phred+33, although I expect it to be slower than @salias's solution.

system · October 25, 2025, 1:33pm

This topic was automatically closed 31 days after the last reply. New replies are no longer allowed.