Interactive quality plot interpretation and colors

Hello again!

Ignore my request for your data -- @ebolyen figured out what's going on. This is definitely a bug and will be fixed in the next QIIME 2 release. We'll follow up here when that happens!

TL;DR: The "minimum sequence length observed during subsampling (N bases)" that appears in the red warning text below the interactive plot is incorrect. The number being reported is the global minimum sequence length, but the subsampled minimum sequence length should be reported here. The way that you interpret these plots (blue vs. red) is the same though, and the cutoff between blue and red plots is correct. So this is a pretty minor bug but definitely makes things confusing when interpreting these warnings.

Longer explanation:

What's happening is that the shortest sequence in your data is 40bp. However, only 10,000 sequences are randomly subsampled, and it is their quality scores that get plotted. By chance, the subsample did not include that really short (40bp) sequence, so the box plots include quality scores from every subsampled sequence up to position 149, the length of the shortest subsampled sequence. Position 150 is where the subsampled sequences begin to differ in length, so the interactive plot is correctly warning (and coloring the box plots in red) from there on: some subsampled sequences have a base at that position and others don't. The bug is that 40bp (the global minimum) is being reported in the warning text, when it really should be reporting the subsampled minimum of 149bp.
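If it helps to see the difference between the two minimums, here is a rough Python sketch of the idea. This is not the QIIME 2 code: the function names, the toy read lengths, and the seed are all made up for illustration, and I'm assuming 1-based positions to match the numbers above.

```python
import random


def draw_subsample(lengths, n=10_000, seed=None):
    """Randomly pick up to n read lengths without replacement (hypothetical helper)."""
    rng = random.Random(seed)
    return rng.sample(lengths, min(n, len(lengths)))


def summarize(lengths, subsample):
    """Compare the global minimum length with the subsampled minimum."""
    subsampled_min = min(subsample)
    return {
        # What the buggy warning text reports
        "global_min": min(lengths),
        # What the warning text should report
        "subsampled_min": subsampled_min,
        # Assumption: positions are 1-based, so this is the first position
        # that at least one subsampled read is too short to cover
        "first_red_position": subsampled_min + 1,
    }


# Toy data: mostly 150bp reads, some 149bp reads, and a single 40bp read.
lengths = [150] * 999_000 + [149] * 999 + [40]
subsample = draw_subsample(lengths, n=10_000, seed=42)
print(summarize(lengths, subsample))
```

With a single 40bp read among a million, a 10,000-read subsample will usually miss it, so the subsampled minimum typically ends up well above the global minimum. That is the same mismatch you saw in the warning text.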

Let me know if you have any more questions about interpreting these plots!
