Patterns in merging using DADA2

Hi there!

I have a 16S V3-V4 (Klindworth primers) metabarcoding run from a study of freeze-thaw cycles across a fish farm impact gradient, and it is merging strangely. Essentially, the further a sample is from the fish farm, the lower its merge percentage, and the merge percentage rises with the number of freeze-thaw cycles.
I've gone through the denoising stats, and there might be a pattern in per-sample read depth that is being amplified. However, the percentages are reasonably consistent up to the merge step, where the differences get exaggerated into a really clear pattern. I can understand a pattern in the abundances carrying through, but not the sudden change in percentage.
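To make the comparison above concrete, here is a minimal sketch of how per-step retention could be computed, so that each step is judged against the step before it rather than against the raw input. The function name `step_retention`, the step names, and the counts are all hypothetical and only illustrate the arithmetic; they are not taken from the actual stats file.

```python
def step_retention(counts):
    """Return the percent of reads kept at each step, relative to the previous step."""
    # Assumed step order; adjust to match the columns in your own stats table.
    steps = ["input", "filtered", "denoised", "merged", "non_chimeric"]
    retention = {}
    for prev, cur in zip(steps, steps[1:]):
        if counts[prev]:
            retention[cur] = 100.0 * counts[cur] / counts[prev]
        else:
            retention[cur] = float("nan")  # no reads left to compare against
    return retention


# Hypothetical read counts for a single sample (illustrative only).
example = {
    "input": 50000,
    "filtered": 40000,
    "denoised": 38000,
    "merged": 19000,
    "non_chimeric": 17100,
}
retention = step_retention(example)
for step, pct in retention.items():
    print(f"{step}: {pct:.1f}% of previous step")
```

Looking at retention per step this way would show whether the drop is confined to merging or is already building up during filtering and denoising.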

It happens with different parameters and approaches (different truncation lengths, truncation based on quality score, varying max error). I don't believe it's related to the library prep, given the setup of the plates and the patterns. The quality in this run was odd: the forward reads are really good for their entire length, while the reverse reads drop off at position 217 (I normally see the two a bit closer). I've sequenced 16S from a fish farm gradient before in a similar study and did not have this issue; merging rates were better even with lower-quality forward reads.

Does anyone know what might be causing this, or what could affect merging in this way? Is it biological or technical? I can't find anything about factors that affect merging unequally across samples!

Thanks for any help or insight!

Hello @Tremadorr,

Welcome to the forums!

This is a fantastic first post. The details and graphs are super helpful.

I concur!

Can you tell us more about the 3 cohorts, like what are CE, AZE, and REF?
This may point to a biological source of this change.

Also, how long is the amplicon expected to be from these primers? V3-V4 can be quite long and difficult for Illumina to sequence through.
If you search the forums you will find many people struggling to join V3-V4 data.
This may point to a technical source of this change.

Hi Colin!

Thank you!

CE is cage edge (very impacted by organic loading from the fish farms), AZE is allowable zone of effect (an intermediate), and REF is reference (unimpacted). I usually find CE has a much higher DNA yield after extraction, though diversity appears to be similar across them all.

It should be around 460 bp according to the Illumina protocol, but my longest is 430 bp. I've definitely had merging issues before, but I got better results than this, with no pattern in them. Some of those samples were from the same sampling sites too (albeit a year later).
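As a rough check on the lengths mentioned above, here is a minimal sketch of the overlap arithmetic behind merging. It assumes DADA2's default minimum overlap of 12 bases. The forward truncation length of 250 is hypothetical, and 217 is used as the reverse truncation length only because that is where the reverse quality drops. Amplicon length should be counted the same way as the reads (with or without primers). The helper names are made up for illustration.

```python
MIN_OVERLAP = 12  # DADA2's default minimum overlap when merging pairs


def expected_overlap(trunc_len_f, trunc_len_r, amplicon_len):
    """Bases shared by the forward and reverse reads after truncation."""
    return trunc_len_f + trunc_len_r - amplicon_len


def can_merge(trunc_len_f, trunc_len_r, amplicon_len, min_overlap=MIN_OVERLAP):
    """True if the truncated reads overlap by at least min_overlap bases."""
    return expected_overlap(trunc_len_f, trunc_len_r, amplicon_len) >= min_overlap


# Hypothetical forward truncation (250); reverse truncation at the quality drop (217).
for amplicon_len in (430, 460):
    overlap = expected_overlap(250, 217, amplicon_len)
    print(amplicon_len, overlap, can_merge(250, 217, amplicon_len))
# 430 -> overlap 37, merges; 460 -> overlap 7, below the 12-base minimum
```

Under these assumed truncation lengths, a 430 bp amplicon merges comfortably but a 460 bp one does not, so samples with more long-amplicon taxa could lose a larger share of reads at the merge step.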

You could try merging the reads with vsearch and see what that does. This will show whether the pattern is specific to DADA2.

I suppose REF is the biological control. Did you sequence any technical controls?

VSEARCH did an awful job of merging, but the result doesn't show the same pattern!

Yep, REF is the biological control. Yes, I sequenced technical controls, and denoising removed almost everything from them.

Hello @Tremadorr,

Could you attach the demux and dada2 stats visualizations?


Hi @colinvwood

Sorry for the delay!

ft_demux_paired_end.qzv (330.4 KB)
ftB_denoise_stats.qzv (1.2 MB)