Red interactive quality plot

thermokarst · July 12, 2018, 1:40am

Thank you so much for searching the forum before posting! That is such a huge help for us. Unfortunately, the error report you linked to doesn't apply here (please allow me to explain!). In that issue, the bug was in the values of the message itself: "This position (249) is greater than the minimum sequence length observed during subsampling (251 bases)". According to my calculations, 249 is less than 251 (not greater than), which is probably why that error was reported in the first place.

Okay, so on to your data. This isn't a bug, or even an error; it is just a cautionary warning:

This position (37) is greater than the minimum sequence length observed during subsampling (35 bases)

What that means is that the shortest read observed in these sequences was 35 nts long. The warning appears because fewer reads contribute to the boxplot at this position than at earlier positions: a read shorter than 37 nts simply has no base at position 37. Keep that in mind when comparing the boxplots, since it isn't exactly a fair comparison if the "sample size" behind one boxplot differs from another.
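To make the "sample size" point concrete, here is a minimal Python sketch (not the actual q2-demux code) that counts how many reads contribute a base at each position, given a list of read lengths. The function names and the read lengths are made up for illustration; the real plot is built from a random subsample of your reads.

```python
# Minimal sketch, not q2-demux's implementation: count how many reads
# contribute a base at each position of a per-position quality boxplot.


def observations_per_position(read_lengths, max_position):
    """Number of reads long enough to have a base at each 1-based position."""
    return [sum(1 for n in read_lengths if n >= pos)
            for pos in range(1, max_position + 1)]


def warned_positions(read_lengths, max_position):
    """Positions past the shortest read, i.e. where some reads drop out."""
    shortest = min(read_lengths)
    return list(range(shortest + 1, max_position + 1))


# Hypothetical read lengths (nts) from a subsample.
lengths = [35, 120, 210, 210, 250]
counts = observations_per_position(lengths, 40)
print("position 35:", counts[34], "reads")  # all 5 reads reach position 35
print("position 37:", counts[36], "reads")  # the 35-nt read has dropped out

# Print a message in the same wording as the warning for each affected position.
for pos in warned_positions(lengths, 37):
    print(f"This position ({pos}) is greater than the minimum sequence "
          f"length observed during subsampling ({min(lengths)} bases)")
```

Every box from position 36 onward is drawn from fewer reads than the boxes before it, which is all the warning is flagging.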

Another way to look at this is to check out the "Demultiplexed sequence length summary": the low end of your read length distribution is only 35 nts, while it quickly jumps up to ~210 nts. If all of your reads were the same length, you would see the same number for each percentile in this seven-number summary.
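For some intuition about that table, here is a rough Python sketch of a seven-number summary over read lengths. It assumes the 2nd, 9th, 25th, 50th, 75th, 91st and 98th percentiles and uses `statistics.quantiles` from the standard library, so the exact values may differ slightly from what q2-demux reports. The lengths are invented to mimic your distribution.

```python
import statistics

# Assumed percentiles for the seven-number summary (may differ from q2-demux).
PERCENTILES = (2, 9, 25, 50, 75, 91, 98)


def seven_number_summary(read_lengths):
    """Map each percentile to the read length at that percentile."""
    # n=100 yields the 1st..99th percentile cut points; "inclusive" treats
    # the lengths as the whole population rather than a sample.
    cuts = statistics.quantiles(read_lengths, n=100, method="inclusive")
    return {p: cuts[p - 1] for p in PERCENTILES}


# Hypothetical subsample: 5% of reads are 35 nts, the rest are 210 nts.
spread = [35] * 5 + [210] * 95
print(seven_number_summary(spread))

# If every read had the same length, every percentile would be identical.
uniform = [250] * 100
print(set(seven_number_summary(uniform).values()))  # {250.0}
```

With those made-up lengths the 2nd percentile sits at 35 nts and the 9th is already at 210 nts, which is the kind of jump you are seeing.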

Well, it looks like you have a wide spread of read lengths, so the trim/trunc parameters you are running DADA2 with might not leave sufficient overlap for merging the reads (DADA2 needs ~20 nt of overlap). Have you tried running just your forward reads through q2-dada2?
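Here is a back-of-the-envelope Python sketch of that overlap check. It assumes the forward and reverse reads start at opposite ends of an amplicon of known length (as sequenced, including any primer bases still in the reads), so the overlap left after truncation is `trunc_len_f + trunc_len_r - amplicon_length`. The helper names, the amplicon length, and the truncation values are hypothetical, and the 20 nt threshold is just the rough figure mentioned above.

```python
# Back-of-the-envelope overlap check for paired-end merging.
# Assumption: forward and reverse reads start at opposite ends of the amplicon.

MIN_OVERLAP = 20  # rough figure quoted above, not an exact DADA2 setting


def expected_overlap(amplicon_length, trunc_len_f, trunc_len_r):
    """Bases shared by the truncated forward and reverse reads (negative = gap)."""
    return trunc_len_f + trunc_len_r - amplicon_length


def can_merge(amplicon_length, trunc_len_f, trunc_len_r, min_overlap=MIN_OVERLAP):
    """True if the truncated reads still overlap by at least min_overlap bases."""
    return expected_overlap(amplicon_length, trunc_len_f, trunc_len_r) >= min_overlap


# Hypothetical 460-nt amplicon with two different truncation choices.
print(expected_overlap(460, 240, 200), can_merge(460, 240, 200))  # -20 False
print(expected_overlap(460, 250, 230), can_merge(460, 250, 230))  # 20 True
```

If the numbers come out short for your amplicon, that is a good sign the merge step is where reads are being lost, and running the forward reads alone sidesteps the problem entirely.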

Hope that helps! :qiime2: