Remove host sequences prior to DADA2

Hey everyone,

I have some V4-region 16S amplicon samples that contain a low amount of bacterial DNA but a considerable amount of host DNA.

I was wondering if it would be useful to align the sequences and remove the host ones prior to running DADA2?

Thanks in advance

Hi @asbarros,

A couple thoughts/questions.

Your primers target the gene of interest, in this case the V4 region of the 16S gene. They can have some off-target effects (IIRC, sometimes you get 12S amplification if you're not careful), but for the most part, because you're amplifying based on a target, you're selecting for the target. So, the host DNA you're considering is primarily off-target amplification (plus maybe some mitochondria or chloroplast, since those can be host contaminants that do amplify in the V4 region).

Let's assume the worst-case scenario, where you had untargeted amplification and picked up a bunch of host DNA (this is unlikely). I would expect the Illumina error profile to be similar between the host and bacterial DNA, so I don't think it will substantially affect the way your error correction model is trained. It might affect chimera detection, and the host reads will probably get poor taxonomic annotation.

So, I guess my advice would be that you can pre-remove host reads, but IMO it's kind of computationally expensive for something that is unlikely to be a problem. Instead, just make sure that when you do your taxonomic annotation, you have a good outgroup, and filter out features that aren't annotated to at least the class level.
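
To make the filtering step concrete, here is a minimal Python sketch of the idea, not a definitive implementation. It assumes taxonomy strings use rank prefixes such as `d__`, `p__`, `c__` separated by semicolons, with domain, phylum, and class as the first three ranks. The helper names (`annotated_depth`, `keep_feature`) and the example ASV IDs and labels are hypothetical. It also drops anything labeled mitochondria or chloroplast, which goes slightly beyond the class-level rule, so adjust it to whatever your reference database actually emits.

```python
# Illustrative only: the taxonomy format and all names below are assumptions.
ORGANELLE_TERMS = ("mitochondria", "chloroplast")
EMPTY_LABELS = {"unassigned", "unclassified"}


def annotated_depth(taxon: str) -> int:
    """Count leading ranks that carry a real name ('c__' alone counts as empty)."""
    depth = 0
    for rank in taxon.split(";"):
        rank = rank.strip()
        # Strip a rank prefix such as 'p__' when one is present.
        name = rank.split("__", 1)[1] if "__" in rank else rank
        if not name or name.lower() in EMPTY_LABELS:
            break
        depth += 1
    return depth


def keep_feature(taxon: str, min_depth: int = 3) -> bool:
    """Keep features annotated to at least class level (depth 3) that are not organelles."""
    lowered = taxon.lower()
    if any(term in lowered for term in ORGANELLE_TERMS):
        return False
    return annotated_depth(taxon) >= min_depth


# Hypothetical feature IDs mapped to hypothetical taxonomy assignments.
taxonomy = {
    "asv1": "d__Bacteria; p__Firmicutes; c__Bacilli; o__Lactobacillales",
    "asv2": "d__Bacteria; p__Proteobacteria; c__Alphaproteobacteria; o__Rickettsiales; f__Mitochondria",
    "asv3": "d__Bacteria",
    "asv4": "Unassigned",
    "asv5": "d__Bacteria; p__Cyanobacteria; c__Cyanobacteriia; o__Chloroplast",
    "asv6": "d__Eukaryota; p__; c__",
}

kept = sorted(fid for fid, taxon in taxonomy.items() if keep_feature(taxon))
print(kept)
```

In this toy table only `asv1` survives: the others are either organelle sequences or are not resolved past the domain level, which is roughly what you'd expect host-derived reads to look like after classification against a bacterial reference.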