Separation Anxiety and Banding on Gels?

ChrisKeefe · June 4, 2021, 12:44am

A collaborator and I are trying to understand our beta diversity results, with an eye toward potential lab protocol problems. These are V4 16S data from relatively high-biomass gut samples (EMP protocol, 515-806 primers), from three extractions and two MiSeq runs.

When we visualize beta diversity, we see no meaningful separation in PCoA ordinations of any of the core-metrics metrics except for unweighted unifrac (UU):

UU gives us great separation on the most important (39.56%) axis. Unfortunately, this separation is not explained by any of our metadata.

In addition to the variables of interest, we have considered extraction, sequencing run, cage, and breeder without finding a coloring that fits. The only decent match is metadata describing whether gels show unexpected banding, which many of them do (in blue above, example gel image below):

Based on some QC figures we looked over, roughly 1/37 of the amplified sequences are between 600 and 700 bp long just before sequencing, and we're trying to figure out what could cause that. We don't seem to have contamination in our non-template controls at the gel phase, and I would expect to see run or extraction effects if we were dealing with contamination during sequencing/extraction.

Questions

Do any of you lovely people have experience with gels like these? Any insights into why this might be occurring, or how we might correct for it in future runs?
Are there other likely explanatory variables we've overlooked here that we should explore?
I read here about host DNA contamination on v4. Levels of Bacteria-only classification seem pretty low, and uncorrelated to gel banding in our case. Does host DNA contamination still seem like a probable cause?
If you have attempted to learn about potential contaminants by BLAST-ing (or similar), how successful was the exercise? Would you do it again, or would you just filter and move on with your analysis?

Thanks for reading my novel!
Chris

Mehrbod_Estaki · June 4, 2021, 2:25am

Hi @ChrisKeefe ,
Yikes!
The first thing that pops to my head, especially because this is separating so clearly on UU (and I'm guessing on Jaccard as well), is the host contamination. Are these Deblur or DADA2 denoised?
With Deblur, I wouldn't expect to see this so clearly because of its built-in positive filtering. With DADA2, any host contamination would be retained, so you'd have to remove it yourself. Have you looked at the taxonomies yet? Try BLASTing a bunch of the unknown or unclassified sequences. In my experience, if you're using beads to disrupt the cells, some mouse DNA can get amplified and show up in your results like this. For gut samples I generally use a positive filter plus a taxonomy filter to remove anything that doesn't have at least phylum-level classification, and that seems to do OK.
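To make the taxonomy filter idea concrete, here is a minimal sketch in plain Python. It assumes taxonomy strings use rank prefixes such as `p__` (as in Greengenes- or SILVA-style labels); the feature IDs and the function name `keep_phylum_classified` are hypothetical and are not part of any QIIME 2 API.

```python
from typing import Dict


def has_phylum(taxon: str) -> bool:
    """Return True if the taxonomy string has a non-empty phylum-level label.

    Assumption: ranks are separated by ';' and the phylum rank starts with 'p__'.
    """
    for rank in taxon.split(";"):
        rank = rank.strip()
        # 'p__' alone means the classifier could not assign a phylum.
        if rank.startswith("p__") and len(rank) > len("p__"):
            return True
    return False


def keep_phylum_classified(taxonomy: Dict[str, str]) -> Dict[str, str]:
    """Keep only features classified to at least phylum level."""
    return {fid: taxon for fid, taxon in taxonomy.items() if has_phylum(taxon)}


# Hypothetical feature IDs and labels, for illustration only.
example_taxonomy = {
    "asv1": "k__Bacteria; p__Firmicutes; c__Clostridia",
    "asv2": "k__Bacteria",
    "asv3": "Unassigned",
    "asv4": "k__Bacteria; p__",
}
kept = keep_phylum_classified(example_taxonomy)
print(sorted(kept))
```

The discarded IDs (here `asv2`, `asv3`, `asv4`) are the ones worth BLASTing before throwing them away.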

Some other thoughts:

You can try doing a 2-step cell disruption to minimize host contamination. Step 1) Cut tissue samples into smaller bits with scissors and do a high-intensity vortex (no beads) step with some detergent. This should ideally remove bacteria from the tissue sample, and then you can separate the remaining big tissue chunks either by a short centrifuge or by pipetting around them. Step 2) Do your usual bead-beating extraction without all the host cells there. The issue here is that this will require some benchmarking on your end to make sure you're not discarding some bacterial cells alongside the host tissue in a biased way. This may also have the unwanted consequence of making the data look different compared to previous data that simply "discarded the unknown".

Date of animal batches? Animals from the same vendor can still show changes in their microbiome over time (though certainly not THIS much).

Finally, can you clarify this gel, by the way? Are the brightest bands the V4 target? What are the bands above and below it? What is their weight? Adding ladder weights to the image could help. Also, are the wells in between the bright ones your no-template controls? It's good that those are coming up blank; that supports the idea of host contamination more.

This is strange indeed, though gels are not super accurate for that. You may want to look at some of them on a Bioanalyzer or something similar. What is your expected band weight with the target + barcodes + primers + adapters + linkers included?
Also, those gels can run weird on the outside wells, so you can try putting one of your ladders in a middle well, though I don't think this is the issue here. It could just be an imperfect ladder?

jwdebelius · June 4, 2021, 2:35pm

Hi @ChrisKeefe,

I think @Mehrbod_Estaki has excellent advice!

I'll just throw out that, diagnostically, I might look at an empire plot. I've had my fair share of splits in PCoA space that were explained by nothing in my data, but ended up being a rare organism that decided to make my life difficult. So, the empire plot might help you visually identify the problematic point.

Best,
Justine

ChrisKeefe · June 4, 2021, 11:55pm

Thanks for your wonderful feedback, @Mehrbod_Estaki and @jwdebelius. This morning was like Forum Christmas! We've got a lot to poke at now, and I'm very excited. I'll drop notes and questions in here while we work through things, in case the process is useful to anyone.

My collaborator tells me the gel banding issue also came up in a recent run of fecals, which may call the host DNA hypothesis into question. I'm going to poke around with positive filtering and taxonomy stuff, and see what I can see. (Thanks for the empire plot recommendation, @jwdebelius! This'll be my first time playing with Empress, and I'm geekin hard.)

Jaccard shows no separation, and the three top axes explain less than 20% of the difference.

The brightest bands are the v4 target, and are showing up in the mid-300s as expected. The bands below are probably primer-dimer, and the bands above, roughly between 600 and 700 bp, are our mystery.

I'm not sure what QC analysis my figures were from (because I have no idea what lab work is), but it was a pretty comprehensive and reasonably reliable report, I'm told.

jbisanz · June 6, 2021, 5:13am

I believe the ~600-700 bp bands are PCR artifacts, which I have variably heard called concatemers/heteroduplexes/PCR bubbles/etc. It has always been my understanding that these are indicative of over-amplification. Most of the time when I have seen this occur, it has been with fecal samples amplified for >30 cycles. Using the classic EMP amplification protocol (35 cycles), this is very common, and I have also noted very high rates of chimeric reads. Have you tried any crude biplot or differential abundance/presence test between your two groups to get an idea of which features could be driving this difference?
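A crude presence test of that kind can be sketched in a few lines of plain Python. This is only an illustration under assumptions: the two groups would be the two clusters seen on the main UU axis, and all sample and feature IDs below are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def prevalence_gap(
    presence: Dict[str, Set[str]],
    group_a: Set[str],
    group_b: Set[str],
) -> List[Tuple[str, float]]:
    """Rank features by the difference in prevalence between two sample groups.

    presence maps a feature ID to the set of sample IDs where it was observed.
    The returned gap is (fraction of group_a samples) - (fraction of group_b samples).
    """
    ranked = []
    for feature_id, samples in presence.items():
        frac_a = len(samples & group_a) / len(group_a)
        frac_b = len(samples & group_b) / len(group_b)
        ranked.append((feature_id, frac_a - frac_b))
    # Largest absolute gap first: these are the candidate drivers of the split.
    ranked.sort(key=lambda item: abs(item[1]), reverse=True)
    return ranked


# Hypothetical data: three samples per cluster, three features.
cluster_a = {"s1", "s2", "s3"}
cluster_b = {"s4", "s5", "s6"}
observed = {
    "asv_common": {"s1", "s2", "s3", "s4", "s5", "s6"},
    "asv_odd": {"s1", "s2", "s3"},
    "asv_rare": {"s4"},
}
ranking = prevalence_gap(observed, cluster_a, cluster_b)
for feature_id, gap in ranking:
    print(feature_id, round(gap, 2))
```

Features at the top of the ranking are the ones to pull and BLAST first; this is not a statistical test, just a way to shortlist suspects.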

Since your Jaccard doesn't show separation, I am going to go out on a limb and say you have a sketchy feature or two in your phylogenetic tree. I will often plot the tree before doing any analysis (especially de novo trees) and look for problematic tips like in the attached example. Usually you can figure out what they are by pulling the sequence and doing a quick BLAST.
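The reasoning here (one long-branched tip can dominate unweighted UniFrac while barely moving Jaccard) can be shown with a toy calculation. The tree, branch lengths, and sample contents below are entirely made up for illustration and are not from this dataset; the functions are a simplified sketch, not the implementation used by QIIME 2.

```python
from typing import Dict, List, Set

# Hypothetical toy tree. Each tip lists the branches on its path to the root.
BRANCH_LENGTH: Dict[str, float] = {
    "bact": 0.1,  # internal branch shared by all ordinary tips
    "t1": 0.1, "t2": 0.1, "t3": 0.1, "t4": 0.1, "t5": 0.1,
    "odd": 2.0,   # a single long-branched tip, e.g. an off-target sequence
}
PATH_TO_ROOT: Dict[str, List[str]] = {
    "t1": ["t1", "bact"], "t2": ["t2", "bact"], "t3": ["t3", "bact"],
    "t4": ["t4", "bact"], "t5": ["t5", "bact"],
    "odd": ["odd"],
}


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard distance on feature presence/absence."""
    return 1 - len(a & b) / len(a | b)


def covered_branches(sample: Set[str]) -> Set[str]:
    """All branches leading from the sample's tips to the root."""
    branches: Set[str] = set()
    for tip in sample:
        branches.update(PATH_TO_ROOT[tip])
    return branches


def unweighted_unifrac(a: Set[str], b: Set[str]) -> float:
    """Branch length unique to one sample divided by branch length in either."""
    in_a, in_b = covered_branches(a), covered_branches(b)
    unique = sum(BRANCH_LENGTH[x] for x in in_a ^ in_b)
    total = sum(BRANCH_LENGTH[x] for x in in_a | in_b)
    return unique / total


clean = {"t1", "t2", "t3", "t4"}
with_odd = clean | {"odd"}            # same community plus the long-branched tip
shuffled = {"t1", "t2", "t3", "t5"}   # one ordinary tip swapped for another

print("clean vs with_odd:", jaccard(clean, with_odd), unweighted_unifrac(clean, with_odd))
print("clean vs shuffled:", jaccard(clean, shuffled), unweighted_unifrac(clean, shuffled))
```

In this toy case Jaccard rates `with_odd` as closer to `clean` than `shuffled` is (about 0.2 versus 0.4), while unweighted UniFrac rates it as much farther (about 0.8 versus 0.33), purely because of the one long branch.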

jbisanz · June 6, 2021, 5:19am

Here is a nice description and explanation. The tip-off is that they are often ~2x your expected product size.
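As a quick sanity check of the "~2x" rule of thumb against the numbers in this thread, here is a small sketch. The 350 bp value is a stand-in for "mid-300s", and the 15% tolerance is an arbitrary assumption chosen to reflect that gel sizing is imprecise.

```python
def looks_like_double_product(
    expected_bp: float, observed_bp: float, tolerance: float = 0.15
) -> bool:
    """Return True if the observed band is within `tolerance` of 2x the expected size.

    tolerance is a fraction of the doubled size; 0.15 is an arbitrary assumption.
    """
    target = 2 * expected_bp
    return abs(observed_bp - target) <= tolerance * target


expected = 350  # stand-in for the "mid-300s" V4 product described above
for band in (350, 600, 650, 700):
    print(band, looks_like_double_product(expected, band))
```

With these assumptions, bands at 600 to 700 bp fall inside the window around twice the expected product, while the main product itself does not.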

ChrisKeefe · June 7, 2021, 4:04pm

Thanks for your insight and a great resource, @jbisanz! That Illumina article is great, and offers another possible cause of "bubble" formation: too much nucleic acid input may have the same effect as too many PCR cycles. We're running 30 cycles on these tissue samples, but we may need to scale back in other places.

Thanks again, everyone, for your input! We'll keep you posted on what we discover.

jbisanz · June 7, 2021, 8:39pm

Great! I know some people normalize the input template to 5 ng/µl, but I'm not a huge fan of that method, as it is time-consuming and doesn't achieve the desired effect if you have a mix of prokaryotic/eukaryotic DNA. I personally use a variation of the approach in "Systematic improvement of amplicon marker gene methods for increased accuracy in microbiome studies" (Nature Biotechnology), wherein you do a dilution series of your sample in smaller reactions and then pick the appropriate dilution to get late-exponential-phase amplification. The example image is from mouse feces extracted with the Zymo magbead kit.

ChrisKeefe · June 7, 2021, 8:41pm

Nice! We've gone solidly above my pay grade here, but I'll pass this along to our local resident lab-whiz!

ChrisKeefe · June 14, 2021, 11:10pm

TLDR: Forward progress is happening, and you're all amazing!

Just some notes here for posterity:
Mouse mitochondria were indeed out and about, as well as something classified as a shrimp fungus.

Filtering under-assigned taxa went smoothly, and the empire plots made it easy to spot a few other oddballs (mitochondria and a few archaea). These were present only in very small quantities, and weren't in the 8 most important features shown in the emperor biplot, so I let them be.

All of the wet lab feedback was passed along to my collaborator's lab group. They've been doing some protocol improvement experiments, so hopefully that will all get straightened out and we'll have nice clean gels again.

CK

Mehrbod_Estaki · June 15, 2021, 7:11am

Awesome, things are getting better! Empire plots are actually a great tool for this (great suggestion, @jwdebelius!), and in fact when I was writing the tutorial I used one for exactly this purpose with the Moving Pictures data. Removing mitochondria and Archaea ASVs is a bit trickier because they are technically real targets of your V4 primers, but as long as you can justify that removing them doesn't discard some real biological signal, it should be OK. But who knows, maybe one of your treatments actually does have an effect on archaea? It would be good to be sure.
Good luck and thanks for keeping us posted!