A collaborator and I are trying to understand our beta diversity results, with an eye toward potential lab protocol problems. These are V4 16S rRNA data from relatively high-biomass gut samples, amplified with the EMP 515F/806R primers, from three extraction batches and two MiSeq runs.
When we visualize beta diversity, we see no meaningful separation in the PCoA ordinations for any of the core-metrics distance metrics except unweighted UniFrac (UU):
UU gives us strong separation along the first axis, which explains 39.56% of the variation. Unfortunately, none of our metadata explains this separation.
In addition to the variables of interest, we have tried coloring by extraction batch, sequencing run, cage, and breeder without finding a pattern that fits. The only decent match is a metadata column recording whether the gel shows unexpected banding, which many samples do (in blue above; example gel image below):
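Rather than eyeballing colorings one at a time, one way to screen metadata against the UU axis is to ask what fraction of the PC1 variance each categorical column explains (eta squared, the one-way ANOVA effect size: between-group sum of squares over total sum of squares). This is a minimal sketch, assuming the PC1 scores have already been exported from the unweighted UniFrac PCoA results into a plain list; the scores and column names below are hypothetical. It is a quick screen on a single axis, not a substitute for a formal test such as PERMANOVA on the full distance matrix.

```python
from collections import defaultdict
from statistics import fmean


def eta_squared(scores, groups):
    """Fraction of variance in `scores` explained by the categorical `groups`.

    eta^2 = SS_between / SS_total: 1.0 means the grouping fully separates
    the axis, 0.0 means it explains nothing.
    """
    if len(scores) != len(groups):
        raise ValueError("scores and groups must have the same length")
    grand_mean = fmean(scores)
    ss_total = sum((s - grand_mean) ** 2 for s in scores)
    if ss_total == 0:
        return 0.0
    # Collect the scores belonging to each metadata category.
    by_group = defaultdict(list)
    for s, g in zip(scores, groups):
        by_group[g].append(s)
    ss_between = sum(
        len(vals) * (fmean(vals) - grand_mean) ** 2 for vals in by_group.values()
    )
    return ss_between / ss_total


# Hypothetical PC1 scores and metadata columns, for illustration only.
pc1 = [-0.30, -0.28, -0.25, 0.22, 0.27, 0.31]
metadata = {
    "run": ["A", "B", "A", "B", "A", "B"],
    "gel_banding": ["yes", "yes", "yes", "no", "no", "no"],
}

# Rank metadata columns by how much of PC1 each one explains.
ranked = sorted(metadata, key=lambda col: eta_squared(pc1, metadata[col]), reverse=True)
for column in ranked:
    print(f"{column}: eta^2 = {eta_squared(pc1, metadata[column]):.3f}")
```

Columns with many levels (cage, for example) inflate eta squared simply because each group absorbs its own mean, so a high value for a many-level column deserves more skepticism than the same value for a two-level one.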
According to some QC figures we looked over, roughly 1 in 37 (about 2.7%) of the amplicons are 600 to 700 bp long just before sequencing, and we're trying to figure out what could cause that. Our non-template controls show no contamination on the gels, and I would expect to see run or extraction effects if contamination had been introduced during extraction or sequencing.
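To express a figure like "1 in 37" the same way for every sample, the share of signal in a size window can be computed directly from an exported fragment-size table. This is a minimal sketch, assuming the QC instrument export has been reduced to hypothetical (size in bp, signal) pairs; whether "signal" means molarity or fluorescence depends on the instrument, and the resulting fraction inherits that meaning.

```python
def fraction_in_window(trace, low_bp, high_bp):
    """Share of total signal from fragments with low_bp <= size <= high_bp."""
    total = sum(signal for _, signal in trace)
    if total == 0:
        return 0.0
    in_window = sum(signal for size, signal in trace if low_bp <= size <= high_bp)
    return in_window / total


# Hypothetical binned trace for one sample: (fragment size in bp, signal).
trace = [(390, 900.0), (450, 50.0), (620, 15.0), (680, 10.0), (1000, 25.0)]
frac = fraction_in_window(trace, 600, 700)
print(f"600-700 bp: {frac:.1%} of signal (about 1 in {1 / frac:.0f})")
```

Computed per sample, this fraction could then be compared against the gel-banding column instead of relying on a single pooled QC figure.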
- Do any of you lovely people have experience with gels like these? Any insights into why this might be occurring, or how we might correct for it in future runs?
- Are there other likely explanatory variables we’ve overlooked here that we should explore?
- I read here about host DNA contamination in V4 data. The proportion of reads classified only as Bacteria (no assignment below the domain level) seems pretty low in our samples, and it is uncorrelated with gel banding. Does host DNA contamination still seem like a probable cause?
- If you have attempted to learn about potential contaminants by BLAST-ing (or similar), how successful was the exercise? Would you do it again, or would you just filter and move on with your analysis?
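For the host-DNA question, one way to make "low and uncorrelated" concrete is to compute, per sample, the share of reads whose taxonomy stops at the domain rank (Bacteria with nothing below it), then correlate that share with the gel-banding flag. This is a minimal sketch, assuming the feature table and taxonomy have been exported to plain dictionaries (e.g. via `qiime tools export`) and use `d__`/`k__`-style rank prefixes; the feature IDs, counts, and banding flags below are hypothetical. The point-biserial coefficient is simply Pearson's r computed against a 0/1 flag.

```python
from statistics import fmean, pstdev


def is_bacteria_only(taxon):
    """True if a taxonomy string resolves no further than the domain Bacteria.

    Assumes semicolon-separated ranks where empty placeholders such as
    'p__' may appear; adjust to your classifier's output format.
    """
    ranks = [r.strip() for r in taxon.split(";") if r.strip()]
    resolved = [r for r in ranks if not r.endswith("__")]
    return len(resolved) == 1 and resolved[0].split("__")[-1] == "Bacteria"


def bacteria_only_fraction(counts, taxonomy):
    """Fraction of one sample's reads assigned only to domain Bacteria."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    hits = sum(n for fid, n in counts.items() if is_bacteria_only(taxonomy[fid]))
    return hits / total


def point_biserial(values, flags):
    """Pearson correlation between a continuous value and a 0/1 flag."""
    mx, my = fmean(values), fmean(flags)
    sx, sy = pstdev(values), pstdev(flags)
    if sx == 0 or sy == 0:
        return 0.0
    cov = fmean([(x - mx) * (y - my) for x, y in zip(values, flags)])
    return cov / (sx * sy)


# Hypothetical taxonomy, per-sample feature counts, and banding flags.
taxonomy = {
    "f1": "d__Bacteria; p__Firmicutes; c__Bacilli",
    "f2": "d__Bacteria; p__Bacteroidota",
    "f3": "d__Bacteria",
    "f4": "Unassigned",
}
samples = {
    "S1": {"f1": 500, "f2": 480, "f3": 20},
    "S2": {"f1": 600, "f2": 390, "f3": 10},
    "S3": {"f1": 450, "f2": 530, "f3": 15, "f4": 5},
    "S4": {"f1": 550, "f2": 430, "f3": 20},
}
banding = {"S1": 1, "S2": 0, "S3": 1, "S4": 0}

ids = sorted(samples)
fractions = [bacteria_only_fraction(samples[s], taxonomy) for s in ids]
r = point_biserial(fractions, [banding[s] for s in ids])
for s, f in zip(ids, fractions):
    print(f"{s}: {f:.1%} Bacteria-only")
print(f"point-biserial r vs. gel banding: {r:.2f}")
```

With only a handful of samples the coefficient is unstable; running the same calculation over every sample gives a more meaningful read on whether off-target reads track the banding.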
Thanks for reading my novel!