I have a 16S V3-V4 (Klindworth) metabarcoding sequencing run, from a study looking at freeze-thaw cycles across a fish farm impact gradient, that is merging strangely. Essentially, the further from the fish farm, the lower the merge percentage, and the merge percentage rises with increasing freeze-thaw cycles.
I've gone through the denoising stats, and it looks like there might be a pattern in the per-sample read depths that the merge step is amplifying. However, the percentages are reasonably consistent up to the merge step, where the pattern gets exaggerated into something really clear. I can understand the pattern in the abundances carrying through, but not the sudden change in percentage.
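To make the "consistent until the merge" observation concrete, here is a minimal sketch that expresses each denoising step as a fraction of the previous step rather than of the input. The sample names and read counts are hypothetical placeholders, and the step names (`input`, `filtered`, `denoised`, `merged`) are assumed to match the columns of a typical DADA2 stats table; swap in the real values.

```python
# Hypothetical per-sample read counts; NOT data from this run.
# Step names are assumed to mirror a typical DADA2 denoising stats table.
stats = {
    "CE_1": {"input": 50000, "filtered": 42000, "denoised": 41000, "merged": 36000},
    "AZE_1": {"input": 40000, "filtered": 33000, "denoised": 32000, "merged": 22000},
    "REF_1": {"input": 30000, "filtered": 25000, "denoised": 24500, "merged": 12000},
}

STEPS = ["input", "filtered", "denoised", "merged"]


def stepwise_retention(counts):
    """Fraction of reads kept at each step, relative to the previous step."""
    retention = {}
    for prev, cur in zip(STEPS, STEPS[1:]):
        # Guard against empty samples to avoid a ZeroDivisionError.
        retention[cur] = counts[cur] / counts[prev] if counts[prev] else float("nan")
    return retention


if __name__ == "__main__":
    for sample, counts in stats.items():
        kept = stepwise_retention(counts)
        line = ", ".join(f"{step}: {frac:.1%}" for step, frac in kept.items())
        print(f"{sample}: {line}")
```

If the `filtered` and `denoised` fractions are flat across CE, AZE, and REF while only the `merged` fraction tracks the gradient, that isolates the loss to the merge step itself rather than to read depth.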
It happens with different parameters and approaches (different truncation lengths, truncation based on quality score, varying max error). I don't believe it's related to the library prep, given how the plates were set up and the patterns I see. The quality in this run was odd, with the forward reads being really good for the entire length and the reverse reads dropping off at position 217 (I normally see them a bit closer). I've sequenced fish farm gradient 16S before from a similar study and not had this issue, and had better merging rates even with lower quality forward reads.
Does anyone know anything that might be causing this, or anything that can affect merging like this? Is it biological or technical somehow? I can't find anything about things that affect merging unequally!
This is a fantastic first post. The details and graphs are super helpful.
I concur!
Can you tell us more about the 3 cohorts, like what are CE, AZE, and REF?
This may point to a biological source of this change.
Also, how long is the amplicon expected to be from these primers? V3-V4 can be quite long and difficult for Illumina to sequence through.
If you search the forums you will find many people struggling to join V3-V4 data.
This may point to a technical source of this change.
CE is cage edge (very impacted by organic loading from the fish farms), AZE is allowable zone of effect (an intermediate), and REF is reference (unimpacted). I usually find CE has a much higher DNA yield after extraction, though diversity appears to be similar across them all.
It should be around 460 bp according to the Illumina protocol but my longest is 430 bp. I've definitely had issues with merging before but managed to get better results than this with no pattern in it. Some of that was from the same sampling sites too (albeit a year later).
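Since amplicon length and truncation lengths together decide whether a read pair can merge at all, the arithmetic can be sketched as below. The reverse truncation of 217 comes from the quality drop-off mentioned earlier in the thread; the forward truncation of 250 is a hypothetical value, and the minimum overlap of 12 is DADA2's default. Whether the amplicon length includes the primers depends on whether they were trimmed before denoising, so treat the lengths as assumptions too.

```python
MIN_OVERLAP = 12  # DADA2's default minimum overlap for merging read pairs


def overlap(trunc_f, trunc_r, amplicon_len):
    """Bases shared by the truncated forward and reverse reads."""
    return trunc_f + trunc_r - amplicon_len


def can_merge(trunc_f, trunc_r, amplicon_len, min_overlap=MIN_OVERLAP):
    """True if the truncated reads overlap enough to be merged."""
    return overlap(trunc_f, trunc_r, amplicon_len) >= min_overlap


def max_mergeable_length(trunc_f, trunc_r, min_overlap=MIN_OVERLAP):
    """Longest amplicon that can still merge with these truncation lengths."""
    return trunc_f + trunc_r - min_overlap


if __name__ == "__main__":
    trunc_f = 250  # hypothetical forward truncation length
    trunc_r = 217  # reverse reads drop in quality around here (from the thread)
    print("longest mergeable amplicon:", max_mergeable_length(trunc_f, trunc_r))
    for amplicon_len in (400, 430, 460):
        print(
            amplicon_len,
            "bp: overlap",
            overlap(trunc_f, trunc_r, amplicon_len),
            "-> merges" if can_merge(trunc_f, trunc_r, amplicon_len) else "-> fails",
        )
```

Note that the longest merged length observed is capped at `trunc_f + trunc_r - min_overlap`, so it cannot by itself show the true maximum amplicon length. If the communities differ in how many taxa carry longer V3-V4 regions, a cap like this could plausibly produce unequal merge rates between groups, though that is only a hypothesis to check.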
Hi @Tremadorr,
This is definitely odd. We are thinking this is a stochastic issue with V3-V4.
However, I have one last question: all of this data was sequenced on the same sequencing run, correct? That might explain the differences between groups we are seeing.