Analyzing variable length joined paired-end reads with Deblur

Nicholas_Bokulich · May 1, 2018, 12:51pm

The latter. You can use q2-cutadapt to trim out adapters/primers.

Yes, unless those reads are absurdly short. If you are using 16S amplicons, there should not be much length variation, though sometimes those length variants are interesting, rare, but real organisms. The most abundant seqs should probably fall within a narrow length distribution, and anything much shorter than that may be an artifact or a poorly joined read. If you're concerned, you could BLAST a couple of these before deciding whether to use a higher length threshold.
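If it helps to see what "a narrow distribution with a short tail" looks like in practice, here is a minimal sketch that summarizes joined-read lengths and flags reads much shorter than the median. The function name and the `min_fraction` cutoff are hypothetical choices for illustration, not a QIIME 2 or Deblur setting; reads in the flagged set are the ones you might BLAST before picking a threshold.

```python
from collections import Counter
from statistics import median


def length_summary(seqs, min_fraction=0.9):
    """Summarize joined-read lengths and flag unusually short reads.

    min_fraction is a hypothetical cutoff: reads shorter than
    min_fraction * median length are flagged as possible artifacts
    or poorly joined reads.
    """
    lengths = [len(s) for s in seqs]
    # Most common read length (the peak of the distribution).
    mode_len, _ = Counter(lengths).most_common(1)[0]
    med = median(lengths)
    cutoff = min_fraction * med
    # Reads far below the typical length are candidates for inspection.
    short = [s for s in seqs if len(s) < cutoff]
    return {"mode": mode_len, "median": med, "cutoff": cutoff, "short": short}
```

For example, with five 253 nt reads, two 252 nt reads and one 120 nt read, the mode and median are both 253 and only the 120 nt read is flagged.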

You will of course lose some information, but no, it should not impact classification much. As I mentioned above, 16S gene domains should have a fairly narrow length distribution, so trimming to the shortest joined read should still get you "in the ballpark" of that distribution. If you are using ITS or another length-variable marker gene, then yes, you may be losing information that is useful for classification (but such variable genes are often also so heterogeneous that the truncated read may still contain enough information for a good classification).
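The idea of trimming every read to a common length can be sketched as follows. This is only an illustration, not Deblur's code: the function name is hypothetical, and it assumes that reads shorter than an explicit trim length are discarded rather than padded, while the default (no explicit length) trims to the shortest read so that nothing is dropped.

```python
def trim_to_common_length(seqs, trim_length=None):
    """Truncate every read to one common length (hypothetical helper).

    If trim_length is None, the shortest read length is used, so no read
    is discarded. With an explicit trim_length, reads shorter than it are
    dropped (assumed behavior for a fixed trim length).
    """
    if not seqs:
        return []
    if trim_length is None:
        trim_length = min(len(s) for s in seqs)
    # Keep only reads long enough, then cut them to the common length.
    return [s[:trim_length] for s in seqs if len(s) >= trim_length]
```

With `["ACGTAC", "ACGT", "ACGTA"]`, the default trims all three reads to `"ACGT"`, whereas `trim_length=5` keeps two reads (`"ACGTA"`) and drops the 4 nt one. This trade-off is the same one described above: a shorter common length keeps more reads but discards the tail of the longer ones.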

See the text and notes in this tutorial section. We do recommend trimming (for 16S), and yes, it does impact classification quality, but in my experience it does not make much of a difference. Extracting the correct domain with the correct PCR primers is more impactful than trimming to the precise length. If you are constrained (e.g., by memory or time), then this step is not critical and using an appropriate pre-trained classifier will be fine.
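To make "extracting the correct domain with the correct PCR primers" concrete, here is a rough sketch of pulling the amplicon region out of a reference sequence. In QIIME 2 this step is done with `qiime feature-classifier extract-reads`; the code below is not that implementation, only an illustration that assumes exact matching (no mismatches) with IUPAC degenerate bases expanded into regex character classes. The primers and reference in the example are made-up short sequences, and the helper names are hypothetical.

```python
import re

# IUPAC degenerate nucleotide codes expanded to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
# Complement of each IUPAC code (e.g. R=[AG] pairs with Y=[CT]).
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def revcomp(seq):
    """Reverse complement, preserving degenerate codes."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def primer_regex(primer):
    """Turn a (possibly degenerate) primer into a regex pattern."""
    return "".join(IUPAC[base] for base in primer.upper())


def extract_amplicon(ref, fwd, rev):
    """Return the region between the forward primer and the reverse
    complement of the reverse primer (primers excluded), or None.

    Exact degenerate matching only; real tools also tolerate mismatches.
    """
    pattern = primer_regex(fwd) + "(.*?)" + primer_regex(revcomp(rev))
    match = re.search(pattern, ref.upper())
    return match.group(1) if match else None
```

For example, `extract_amplicon("GGACGTAAAAGCCAAGG", "ACGY", "TTGN")` returns `"AAAAG"`, and a reference with no primer sites returns `None`. Because the primers decide which region the classifier is trained on, a wrong primer pair here changes the result far more than a few bases of trim length would.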

I hope that helps!