Patterns in merging using DADA2

Tremadorr · April 15, 2024, 2:05pm

Hi there!

I have a 16S V3-V4 (Klindworth primers) metabarcoding sequencing run from a study looking at freeze-thaw cycles across a fish farm impact gradient, and it is merging strangely. Essentially, the further a sample is from the fish farm, the lower its merge percentage, and the merge percentage rises with an increasing number of freeze-thaw cycles.
I've gone through the denoise stats, and it looks like there might be a pattern in per-sample read depth that the merge step is amplifying. However, the percentages are reasonably consistent right up to the merge, where the pattern suddenly becomes exaggerated and very clear. I can understand a pattern in the abundances carrying through, but not the sudden change in percentage.
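Since the oddity only shows up at the merge, one way to confirm that is to compute retention step by step (merged / denoised) rather than relative to input reads, which is how I understand the QIIME 2 DADA2 stats percentages are reported. A minimal sketch; the sample IDs, cohort labels and read counts are hypothetical placeholders, not values from this run:

```python
# Minimal sketch: isolate the merge step by dividing each step's count by the
# count from the step before it, instead of by the input count.
# HYPOTHETICAL: the sample IDs, cohort labels and read counts below are made-up
# placeholders, not values from this run. Real values would come from the
# exported DADA2 stats table.
from statistics import mean

# (sample_id, cohort, input, filtered, denoised, merged)
stats = [
    ("S1", "CE", 50000, 42000, 40000, 30000),
    ("S2", "CE", 48000, 40000, 38500, 28000),
    ("S3", "REF", 30000, 25500, 24000, 9000),
    ("S4", "REF", 32000, 27000, 25500, 10000),
]


def step_retention(row):
    """Fraction of reads kept at each step, relative to the previous step."""
    _, _, n_input, n_filtered, n_denoised, n_merged = row
    return {
        "filter": n_filtered / n_input,
        "denoise": n_denoised / n_filtered,
        "merge": n_merged / n_denoised,
    }


def by_cohort(rows):
    """Mean per-step retention within each cohort."""
    groups = {}
    for row in rows:
        groups.setdefault(row[1], []).append(step_retention(row))
    return {
        cohort: {step: mean(r[step] for r in rets) for step in rets[0]}
        for cohort, rets in groups.items()
    }


if __name__ == "__main__":
    for cohort, steps in by_cohort(stats).items():
        print(cohort, {step: round(frac, 3) for step, frac in steps.items()})
```

If merge retention differs between cohorts while filter and denoise retention stay flat, the effect sits in the merge step itself rather than being inherited from read depth.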

It happens with different parameters and approaches (different truncation lengths, truncation based on quality score, varying max expected errors). I don't believe it's related to the library prep, given how the plates were set up relative to the pattern. The quality in this run was odd: the forward reads were really good for their entire length, while the reverse reads dropped off at 217 (I normally see the two a bit closer). I've sequenced 16S from a fish farm gradient in a similar study before without this issue, and got better merging rates even with lower-quality forward reads.

Does anyone know what might be causing this, or of anything that can affect merging like this? Is it biological, or technical somehow? I can't find anything about factors that affect merging unequally across samples!

Thanks for any help or insight!

colinbrislawn · April 15, 2024, 4:22pm

Hello @Tremadorr,

Welcome to the forums!

This is a fantastic first post. The details and graphs are super helpful.

I concur!

Can you tell us more about the three cohorts? What are CE, AZE, and REF?
This may point to a biological source of this change.

Also, how long is the amplicon expected to be from these primers? V3-V4 can be quite long and difficult for Illumina to sequence through.
If you search the forums, you will find many people struggling to join V3-V4 data.
This may point to a technical source of this change.

Tremadorr · April 17, 2024, 12:55pm

Hi Colin!

Thank you!

CE is cage edge (very impacted by organic loading from the fish farms), AZE is allowable zone of effect (an intermediate), and REF is reference (unimpacted). I usually find CE has a much higher DNA yield after extraction, though diversity appears to be similar across them all.

It should be around 460 bp according to the Illumina protocol, but my longest is 430 bp. I've definitely had issues with merging before, but I managed to get better results than this, with no pattern in them. Some of that earlier data was from the same sampling sites, too (albeit a year later).
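To get a feel for how much margin there is, the overlap is roughly the sum of the truncated read lengths minus the amplicon length. A minimal sketch, assuming a hypothetical 280 bp forward truncation together with the 217 bp reverse drop-off mentioned earlier, and the 12 bp minimum overlap that I believe DADA2 uses by default:

```python
# Minimal sketch of paired-end overlap arithmetic.
# ASSUMPTIONS: trunc_f = 280 is a hypothetical forward truncation length;
# 217 is where the reverse quality dropped in this run; min_overlap = 12 is,
# to my knowledge, the DADA2 default. Amplicon lengths and read lengths must be
# measured the same way (both with primers, or both after primer removal).


def expected_overlap(amplicon_len, trunc_f, trunc_r):
    """Bases covered by both reads: summed read lengths minus amplicon length.

    Simplification: ignores reads that run past the far end of very short
    amplicons. A negative value means the reads never meet.
    """
    return trunc_f + trunc_r - amplicon_len


def max_mergeable_length(trunc_f, trunc_r, min_overlap=12):
    """Longest amplicon that still gives at least `min_overlap` bases of overlap."""
    return trunc_f + trunc_r - min_overlap


def fraction_mergeable(amplicon_lengths, trunc_f, trunc_r, min_overlap=12):
    """Share of amplicons short enough to reach `min_overlap` bases of overlap."""
    limit = max_mergeable_length(trunc_f, trunc_r, min_overlap)
    return sum(1 for n in amplicon_lengths if n <= limit) / len(amplicon_lengths)


if __name__ == "__main__":
    print(expected_overlap(460, 280, 217))  # 37
    print(expected_overlap(430, 280, 217))  # 67
    print(max_mergeable_length(280, 217))  # 485
    # Hypothetical length mix: one amplicon is too long to reach the minimum overlap.
    print(fraction_mergeable([400, 420, 430, 460, 490], 280, 217))  # 0.8
```

If the margin is thin, length differences between taxa could plausibly shift merge rates between samples whose communities differ; that is a possible mechanism worth checking, not something established in this thread.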

colinbrislawn · April 17, 2024, 4:46pm

You could try merging the reads with vsearch and see what happens. This will confirm whether it's a DADA2 issue:
https://docs.qiime2.org/2024.2/plugins/available/vsearch/merge-pairs/

I suppose REF is the biological control. Did you sequence any technical controls?

Tremadorr · April 22, 2024, 5:01pm

VSEARCH did an awful job of it, but it doesn't show the same pattern!
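To put a number on whether a merger reproduces the gradient, one option is a rank correlation between each sample's merge rate and its position on the gradient, computed separately for DADA2 and vsearch. A minimal stdlib sketch; the gradient coding and all merge rates are hypothetical placeholders, not results from this thread:

```python
# Minimal sketch: Spearman rank correlation between each sample's merge rate
# and its position on the impact gradient, computed separately per merger.
# HYPOTHETICAL: gradient codes (0 = CE, 1 = AZE, 2 = REF) and all merge rates
# below are made-up placeholders, not results from this thread.
from math import sqrt
from statistics import mean


def ranks(values):
    """1-based ranks, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def pearson(x, y):
    """Plain Pearson correlation coefficient."""
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    return num / den


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the tie-averaged ranks."""
    return pearson(ranks(x), ranks(y))


gradient = [0, 0, 1, 1, 2, 2]
dada2_merge = [0.74, 0.72, 0.55, 0.58, 0.38, 0.40]
vsearch_merge = [0.30, 0.22, 0.28, 0.25, 0.26, 0.29]

if __name__ == "__main__":
    print("DADA2 rho:", round(spearman(gradient, dada2_merge), 2))
    print("vsearch rho:", round(spearman(gradient, vsearch_merge), 2))
```

With only a few samples per cohort this is descriptive only. If SciPy is available, `scipy.stats.spearmanr` gives the same coefficient (it also averages tied ranks) plus a p-value.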

Yep, REF is the biological control. And yes, we sequenced technical controls; denoising removed nearly everything from them.

colinvwood · April 23, 2024, 4:58pm

Hello @Tremadorr,

Could you attach the demux and dada2 stats visualizations?

Tremadorr · May 23, 2024, 3:42pm

Hi @colinvwood

Sorry for the delay!

ft_demux_paired_end.qzv (330.4 KB)
ftB_denoise_stats.qzv (1.2 MB)

cherman2 · June 10, 2024, 4:42pm

Hi @Tremadorr,
This is definitely odd. We think this is a stochastic issue with V3-V4.

However, I have one last question: all of this data was sequenced on the same sequencing run, correct? If not, that might explain the differences between groups we are seeing.

Tremadorr · June 13, 2024, 2:06pm

Yes it was all the same sequencing run.

system · July 14, 2024, 8:07pm

This topic was automatically closed 31 days after the last reply. New replies are no longer allowed.