Can you pre-filter data by raw sequence read length before denoising/truncating/clustering?

Hi @reige012,
The issue of host contamination is common enough, especially in samples from low-microbe, high-host-DNA environments. In my experience, even mouse colon tissue, which carries loads of microbes, can suffer from this if the extraction protocol heavily disrupts the host cells. Anecdotally, I see this especially often with the V4 region. Rather than trimming these reads prior to denoising (and risking introducing bias), I would recommend simply filtering the host-associated sequences from your feature table after the fact. In Deblur this is done automatically using a positive filter based on the Greengenes database, but you would have to do it manually if you are using DADA2 for denoising.
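To illustrate what "filtering after the fact" means, here is a minimal plain-Python sketch. It assumes the feature table is a dict mapping feature (ASV) IDs to per-sample counts, and that the set of host-associated feature IDs has already been identified by some separate step (for example, aligning your representative sequences against the host genome). The names `drop_host_features` and `sample_totals` are hypothetical and not part of QIIME 2; in a real analysis you would use the QIIME 2 filtering commands rather than hand-rolled code.

```python
from typing import Dict, Set

# Hypothetical representation: feature ID -> {sample ID -> read count}.
FeatureTable = Dict[str, Dict[str, int]]


def drop_host_features(table: FeatureTable, host_ids: Set[str]) -> FeatureTable:
    """Return a copy of `table` without the features listed in `host_ids`."""
    return {fid: dict(counts) for fid, counts in table.items() if fid not in host_ids}


def sample_totals(table: FeatureTable) -> Dict[str, int]:
    """Sum counts per sample, e.g. to see how many reads each sample keeps."""
    totals: Dict[str, int] = {}
    for counts in table.values():
        for sample, n in counts.items():
            totals[sample] = totals.get(sample, 0) + n
    return totals


if __name__ == "__main__":
    table = {
        "asv1": {"S1": 120, "S2": 30},
        "asv2": {"S1": 500, "S2": 900},  # assume this ASV was flagged as host-derived
        "asv3": {"S1": 15, "S2": 60},
    }
    filtered = drop_host_features(table, {"asv2"})
    print(sorted(filtered))         # ['asv1', 'asv3']
    print(sample_totals(filtered))  # {'S1': 135, 'S2': 90}
```

Because the filter runs on denoised features, the error model used during denoising still sees every read, and checking per-sample totals afterwards shows which samples lost most of their depth to host sequences.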
This has been discussed on the forum before, most recently here, and there is a lengthier discussion of it here. I believe both links include examples of how to do the filtering. Let us know if that helps.
