Hi Qiime2 community!
After running the demux-paired-end step, I saw the following message under the bar plot:
The plot at position 239 was generated using a random sampling of 9990 out of 19179393 sequences without replacement. This position (239) is greater than the minimum sequence length observed during subsampling (301 bases). As a result, the plot at this position is not based on data from all of the sequences, so it should be interpreted with caution when compared to plots for other positions. Outlier quality scores are not shown in box plots for clarity.
I read in a post that seeing this message at some positions is not a problem, but I get it at every position and all my bars are red.
When I had no problems with this step, the message was:
These plots were generated using a random sampling of 10000 out of 16248858 sequences without replacement. The minimum sequence length identified during subsampling was 301 bases. Outlier quality scores are not shown in box plots for clarity.
The difference between the two messages is the number of randomly sampled sequences (9990 in one case and 10000 in the other). I don’t know how to interpret this. Should I be worried about the first message? Is the sequencing process OK? Can I continue my analysis?
Thank you very much!
The message can be a bit tricky to interpret. It is telling you that, at the position you hovered over with your mouse (e.g. 239), of the 10,000 reads randomly subsampled from your 19,179,393 total reads, 9,990 were at least 239 bp long (this is normal and what you would expect) and 10 were shorter than 239 bp, which may or may not be an issue.
Several things can produce short reads. Say that in a previous step you used cutadapt to remove barcodes, primers, or some other non-biological sequence from your reads, and for some reason this trimmed some of them to shorter than 239 bp. Or you simply had a number of short reads as a consequence of sequencing error, contaminants, or chimeras. Any of these scenarios can trigger this warning, which just means that some of your sequences are shorter than that position.
This is typically not an issue as long as the short reads are a small percentage of the total, as in your case (10 of 10,000, or 0.1%). But if that number grows and, say, most of your reads are shorter than expected, you may have an upstream issue that is worth looking at in more detail.
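If it helps to see the arithmetic, here is a minimal Python sketch of the count behind that message. It is only an illustration, not how QIIME 2 computes it internally: the function name `reads_covering` and the read lengths (9,990 reads of 301 bp plus 10 reads of 200 bp) are made up to mirror the numbers in your message.

```python
def reads_covering(lengths, position):
    """Count the reads that are at least `position` bases long."""
    return sum(1 for n in lengths if n >= position)


# Hypothetical subsample of 10,000 reads: 9,990 full-length (301 bp)
# and 10 truncated (200 bp). These lengths are invented for illustration.
lengths = [301] * 9990 + [200] * 10

position = 239
covering = reads_covering(lengths, position)
short_pct = 100 * (len(lengths) - covering) / len(lengths)

print(f"{covering} of {len(lengths)} reads reach position {position} "
      f"({short_pct:.1f}% are shorter)")
# prints: 9990 of 10000 reads reach position 239 (0.1% are shorter)
```

With real data you would fill `lengths` with the lengths of your own reads. A short fraction well under 1%, as here, is the situation described above as usually harmless.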
Hope this clarifies it a bit. Let us know if you have additional questions.