Sequence length after denoise paired-end reads

Hi @kyle_paddock,

Welcome to the :qiime2: forum!

This is a great question - and your impression is correct: your sequence lengths should be around 253 bp. There are a couple of good things to point out about your data set. Your mean length is 253.11, which is right where it should be! Additionally, the standard deviation is only about 3 bp, which means that most of your data is sitting right around that 253 bp mark.

However, you do seem to have some outliers on both ends (the min and max lengths) - especially that max length, which is much longer than the V4 region. The short answer is that these lengths can be related to non-target DNA, which you will most likely want to filter out unless it is of interest to you. You could try filtering out anything that's shorter than 240 bp or longer than 255 bp and then see what the statistics look like; that should remove the outliers you're seeing.
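To make the idea concrete, here is a minimal sketch of that length filter in plain Python. It is not a QIIME 2 command. The feature IDs, the toy sequences, and the helper names `length_stats` and `filter_by_length` are all made up for illustration, and I'm assuming the 240 and 255 bp cutoffs are inclusive (so a 240 bp or 255 bp sequence is kept).

```python
from statistics import mean, pstdev


def length_stats(seqs):
    """Return (min, max, mean, sd) of the sequence lengths in a dict of id -> sequence."""
    lengths = [len(s) for s in seqs.values()]
    return min(lengths), max(lengths), mean(lengths), pstdev(lengths)


def filter_by_length(seqs, min_len=240, max_len=255):
    """Keep only sequences whose length falls within [min_len, max_len] (inclusive)."""
    return {fid: s for fid, s in seqs.items() if min_len <= len(s) <= max_len}


# Toy data (hypothetical): three sequences near 253 bp plus one short and one long outlier.
seqs = {
    "feat1": "A" * 253,
    "feat2": "A" * 252,
    "feat3": "A" * 254,
    "feat4": "A" * 120,  # short outlier
    "feat5": "A" * 430,  # long outlier
}

kept = filter_by_length(seqs)

print("before:", length_stats(seqs))
print("after: ", length_stats(kept))
```

With this toy data, only the three sequences near 253 bp survive, so the min and max after filtering land right next to the mean. On your real data you would apply the same cutoffs to your representative sequences and then regenerate the summary to check the new min and max.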

This is a great forum post that goes into more detail on this situation, for your reference.

Hope this helps! Cheers :lizard:
