After importing data

I encountered the following problem: “The plot at position 249 was generated using a random sampling of 9837 out of 545984 sequences without replacement. This position (249) is greater than the minimum sequence length observed during subsampling (251 bases). As a result, the plot at this position is not based on data from all of the sequences, so it should be interpreted with caution when compared to plots for other positions. Outlier quality scores are not shown in box plots for clarity.” I don’t know what it means. Thanks for your help!

Hi @alexximalaya! This bug has been fixed in QIIME 2 2017.12, which was released yesterday! If you regenerate this visualization with the latest version you shouldn’t encounter this bug anymore. Thanks! :bug:

To tag onto @thermokarst's answer:

The minimum sequence length was the buggy part: it was out of sync with the random sample.


But if your question is more generally about what the text is saying:

The goal is to indicate that not all of your sequences have exactly the same length (variable-length adapters can cause this). It's not really something to worry about unless you see a dramatic difference in length, or a large number of sequences that fall short of a reasonable length.

For example, if your entire plot were red instead of blue, there would be something to explore. But if just the last few bars are red, it's not really a problem. The distribution and summary statistics of those red bars can't fairly be compared to the blue ones, but only because some of the reads were too short to be counted at those positions.
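To make the idea concrete, here is a minimal sketch in Python. It assumes reads are summarized only by their lengths and that positions are 1-based. The helpers `subsample_lengths` and `coverage_by_position` are hypothetical and are not QIIME 2's actual implementation. The sketch draws a random subsample without replacement, computes the minimum length from that *same* subsample (the step that was out of sync in the buggy version), and then, for each position, counts how many reads are long enough to reach it. Positions beyond the shortest read are the ones not backed by every read, which correspond to the red bars described above.

```python
import random
from typing import Dict, List, Tuple


def subsample_lengths(lengths: List[int], n: int, seed: int = 0) -> List[int]:
    """Hypothetical helper: draw n read lengths at random WITHOUT replacement."""
    rng = random.Random(seed)
    return rng.sample(lengths, n)


def coverage_by_position(sample: List[int]) -> Dict[int, Tuple[int, bool]]:
    """Hypothetical helper: per 1-based position, count the reads reaching it.

    Returns {position: (reads_covering, based_on_all_reads)}.
    """
    # The minimum is taken from the same subsample that is summarized,
    # so the "not based on all sequences" flag stays in sync with the data.
    min_len = min(sample)
    max_len = max(sample)
    result = {}
    for pos in range(1, max_len + 1):
        # A read of length L contributes a quality score at position pos only if L >= pos.
        covering = sum(1 for length in sample if length >= pos)
        result[pos] = (covering, pos <= min_len)
    return result


cov = coverage_by_position([3, 5, 5])
print(cov[3], cov[4])  # (3, True) (2, False)
```

In this toy example, every read reaches position 3, but only the two 5-base reads reach position 4. Statistics at position 4 are therefore computed from fewer reads and shouldn't be compared one-to-one with those at positions 1 to 3.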

