Read length selection during extraction from reference databases

Mehrbod_Estaki · February 19, 2020, 4:53am

Hi @Negin,
I moved this discussion to a new thread.
So, to clarify: when you extract reads from the SILVA database using your primer sets, some of the extracted reads fall well outside your expected length range, some as short as 53 bp.
I never really thought about this as a potential issue, and it may not be one, assuming your feature-table doesn't contain any reads that short. But if your primers hit short regions in the reference database, the same thing could happen in your real data too. In that case I would certainly impose some length restrictions; something like 100 bp above and below your expected amplicon length should be good enough. If you still have doubts about reads near the cutoff, say ones that are 99 bp shorter than you expect, I would BLAST those and then decide whether to keep or discard them. You can always do a second round of trimming after denoising using this nifty approach by @thermokarst so you don't need to re-run DADA2.
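To make the length restriction concrete, here is a rough sketch in plain Python (standard library only, not a QIIME 2 command). It splits extracted reference reads into those within a window around the expected length and those outside it. The function names, the expected length of 250 bp, and the toy sequences are all made up for illustration; swap in your own amplicon length and FASTA file.

```python
from typing import Dict, Iterable, Iterator, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def filter_by_length(
    records: Iterable[Tuple[str, str]],
    expected_len: int,
    tolerance: int = 100,
) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Split records into (kept, dropped) using expected_len +/- tolerance."""
    lo, hi = expected_len - tolerance, expected_len + tolerance
    kept, dropped = {}, {}
    for header, seq in records:
        target = kept if lo <= len(seq) <= hi else dropped
        target[header] = seq
    return kept, dropped


if __name__ == "__main__":
    # Toy input standing in for extracted reference reads (hypothetical data).
    demo = [
        ">ref1", "A" * 253,        # close to the expected length
        ">ref2", "ACGT" * 13 + "A",  # 53 bp, far too short
        ">ref3", "A" * 151,        # 99 bp shorter than expected, still inside the window
    ]
    kept, dropped = filter_by_length(read_fasta(demo), expected_len=250)
    print("kept:", sorted(kept))
    print("dropped:", sorted(dropped))
```

With real data you could pass an open file handle to `read_fasta` instead of the list. Borderline reads like `ref3` stay in the kept set, so those are the ones worth BLASTing before making a final call.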
Keep us posted.