16S V3-V4 length: what to do with sequences shorter than the expected length after DADA2?

Hi @Liang_Cheng,
The expected 460 bp amplicon length includes the primer sites. However, you have likely removed those (as you should) prior to DADA2, so the difference in length is due to that.
The V3-V4 primers can hit some non-specific targets, as you have seen. My experience is that they can hit quite a few mouse host genes if the sample is high in host cells. Either way, removing these reads is important and I would recommend doing it.
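
To make the primer arithmetic concrete, here is a minimal sketch. The primer pair is an assumption on my part (the commonly used 341F/805R V3-V4 pair), since the actual primers aren't stated in this thread, so swap in your own sequences.

```python
# Assumed primer pair (NOT stated in this thread): 341F / 805R for V3-V4.
FORWARD_PRIMER = "CCTACGGGNGGCWGCAG"      # 17 nt
REVERSE_PRIMER = "GACTACHVGGGTATCTAATCC"  # 21 nt

# Expected amplicon length with primers still attached (figure quoted above).
EXPECTED_WITH_PRIMERS = 460


def length_without_primers(amplicon_len: int, fwd: str, rev: str) -> int:
    """Expected length once both primer sites have been trimmed off."""
    return amplicon_len - len(fwd) - len(rev)


trimmed = length_without_primers(EXPECTED_WITH_PRIMERS, FORWARD_PRIMER, REVERSE_PRIMER)
print(trimmed)
```

With these assumed primers, 38 nt come off in total, so merged reads noticeably shorter than 460 are expected. Natural length variation in the region between taxa adds some spread on top of that.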

You have a couple of options here. To get rid of non-16S reads, you can use a permissive positive filter like the one Deblur applies by default. I believe it uses the Greengenes OTUs clustered at 88% as the reference. You can then run quality-control exclude-seqs against that reference with a very permissive threshold, such as 65% identity and 50% coverage. This basically tosses out any reads that look nothing like bacterial 16S. I've found this method works really well and is very fast. Alternatively, you can build your taxonomy file first and then use taxonomy-based filtering to discard reads that aren't classified to at least the phylum level against a bacterial database. I prefer the first approach, but I don't have any benchmarking data to recommend one over the other. See what works best for you.
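
As a rough illustration of the second (taxonomy-based) option, here is a small sketch of the logic in plain Python. It is not QIIME 2's implementation. It assumes Greengenes-style taxonomy strings (`k__`, `p__` prefixes), and the feature IDs and labels are made up for the example.

```python
def has_bacterial_phylum(taxon: str) -> bool:
    """True if a Greengenes-style label is bacterial and resolved to phylum."""
    ranks = [rank.strip() for rank in taxon.split(";")]
    is_bacteria = "k__Bacteria" in ranks
    # First phylum-level entry, or "" if the label stops above phylum.
    phylum = next((rank for rank in ranks if rank.startswith("p__")), "")
    # A bare "p__" means the classifier left the phylum unresolved.
    return is_bacteria and len(phylum) > len("p__")


# Hypothetical feature IDs and taxonomy labels, for illustration only.
taxonomy = {
    "asv1": "k__Bacteria; p__Firmicutes; c__Clostridia",
    "asv2": "k__Bacteria",
    "asv3": "Unassigned",
    "asv4": "k__Bacteria; p__",
}

kept = sorted(fid for fid, taxon in taxonomy.items() if has_bacterial_phylum(taxon))
print(kept)
```

Only `asv1` survives here. On real data you would apply the equivalent include/exclude terms with the taxonomy-based filtering in QIIME 2 rather than a hand-rolled script. Label formats differ between reference databases, so check what your classifier actually emits before choosing the search terms.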

I doubt it, since these are real targets hit by the primers and not chimeras. Also note that DADA2 already includes a chimera removal step, so you don't need to do this again separately.

I would advise you to read the literature and the various posts on this forum on why you should, or more likely shouldn't, use OTU picking instead of sticking with your ASVs.

Hope this helps!
