Trimming the same sequences differently creates two distinctly mirrored groupings

I am trying to fully understand this phenomenon. It occurs when viewing a V3-V4 16S sequencing run from fecal samples, and it shows up in Bray-Curtis and Jaccard but not in UniFrac.
If I truncate the same set of sequences two ways, once at 290/250 and once at 280/240, the two trimmed sets come out as essentially mirror images of each other. Within a set, the differences between samples are preserved, but the two sets are placed completely apart from each other (see the image below, where each sample is colored the same in both trims). My initial understanding was that this happens because Bray-Curtis works with ASVs, i.e., exact sequences, whereas weighted and unweighted UniFrac may map OTUs to a database first. Is this what is happening? And if so, why does a small difference in trimming cause such a drastic shift in position on the PCoA? The two reads still overlap to form a single contig, so I wouldn't think that trimming off a little more low-quality sequence would shift that many base pairs.
You can see that the yellow, blue, and red points at the bottom of the left group are crudely mirrored at the top of the group on the right.
Also, if I rotate the chart, I can see that the axes are not particularly shared.
I know the solution is to trim the sequences identically, but I wondered whether this is normal, and whether ASV processing versus some other processing into OTUs is why Bray-Curtis and Jaccard are affected when the other methods aren't.
Thanks!

Hi @Dan,
Welcome to the forum!

This is totally normal.

Your intuition is more or less correct. The issue here is that Bray-Curtis and Jaccard distance treat all features as totally unique, independent entities, and do not consider their sequence similarity (as UniFrac does). So if you trim a single nucleotide from an ASV, you suddenly have a new ASV that will be treated as a distinct entity. When you take the same exact samples and trim them two different ways, you now have two different sets of totally unique ASVs, and no ASVs will overlap between these two "datasets"! This is why Jaccard and Bray-Curtis cluster these samples separately.
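To make the "no overlapping ASVs" point concrete, here is a minimal sketch. The counts and feature names (`ASV_a1`, `ASV_b1`, ...) are hypothetical, and the two functions are plain textbook definitions of Bray-Curtis dissimilarity and Jaccard distance, not the code any particular tool uses. When the same sample is processed two ways and none of the feature IDs match, both metrics hit their maximum of 1.0, while two different samples processed the same way can still look fairly similar.

```python
from typing import Dict


def bray_curtis(u: Dict[str, int], v: Dict[str, int]) -> float:
    """Bray-Curtis dissimilarity between two count profiles keyed by feature ID."""
    features = set(u) | set(v)
    # Only features present under the SAME ID in both samples contribute here.
    shared_min = sum(min(u.get(f, 0), v.get(f, 0)) for f in features)
    total = sum(u.values()) + sum(v.values())
    return 1 - 2 * shared_min / total


def jaccard(u: Dict[str, int], v: Dict[str, int]) -> float:
    """Jaccard distance on presence/absence of features."""
    a = {f for f, c in u.items() if c > 0}
    b = {f for f, c in v.items() if c > 0}
    return 1 - len(a & b) / len(a | b)


# Hypothetical: one fecal sample processed with two truncation settings.
# Same organisms, same abundances, but every exact sequence differs slightly,
# so the feature IDs share nothing.
sample_trim_a = {"ASV_a1": 50, "ASV_a2": 30, "ASV_a3": 20}
sample_trim_b = {"ASV_b1": 50, "ASV_b2": 30, "ASV_b3": 20}
# Hypothetical: a different sample processed with the same truncation as trim A.
other_trim_a = {"ASV_a1": 10, "ASV_a2": 60, "ASV_a3": 30}

print(bray_curtis(sample_trim_a, sample_trim_b))  # 1.0
print(jaccard(sample_trim_a, sample_trim_b))      # 1.0
print(bray_curtis(sample_trim_a, other_trim_a))   # roughly 0.4
print(jaccard(sample_trim_a, other_trim_a))       # 0.0
```

Because every cross-trim pair sits at the maximum distance while within-trim pairs keep their usual spread, an ordination of such a matrix tends to separate the two trims cleanly while preserving the structure inside each one, which is consistent with the mirrored groupings described in the question.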

UniFrac is not affected by this because it considers the phylogenetic similarity between features when calculating distances between samples. Even if you trim the reads, they should (ideally) map to more or less the same position on a reference phylogeny. So it is more robust to trimming than the phylogenetically agnostic methods (Bray-Curtis, Jaccard, etc.).
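Here is a toy illustration of why the phylogeny rescues UniFrac. It is a minimal sketch of unweighted UniFrac (branch length unique to one sample divided by branch length observed in either sample) on a made-up tree, not QIIME 2's implementation. The tree, branch lengths, and node names are hypothetical; the assumption is that each pair of ASVs from the two truncations (`ASV_a1`/`ASV_b1`, `ASV_a2`/`ASV_b2`) differs by only a base or two and therefore lands as near-identical sister tips on very short branches.

```python
from typing import Dict, Set, Tuple

# Hypothetical toy phylogeny: node -> (parent, branch length to parent).
# "root" has no entry, which marks the top of the tree.
TREE: Dict[str, Tuple[str, float]] = {
    "cladeX": ("root", 1.0),
    "cladeY": ("root", 1.0),
    "ASV_a1": ("cladeX", 0.01),
    "ASV_b1": ("cladeX", 0.01),
    "ASV_a2": ("cladeY", 0.01),
    "ASV_b2": ("cladeY", 0.01),
}


def branches_covered(tips: Set[str]) -> Set[str]:
    """Nodes whose branch lies on a path from an observed tip to the root."""
    covered = set()
    for tip in tips:
        node = tip
        while node in TREE:  # stops once we reach the root
            covered.add(node)
            node = TREE[node][0]
    return covered


def unweighted_unifrac(tips_u: Set[str], tips_v: Set[str]) -> float:
    bu, bv = branches_covered(tips_u), branches_covered(tips_v)
    unique = sum(TREE[n][1] for n in bu ^ bv)     # covered by only one sample
    observed = sum(TREE[n][1] for n in bu | bv)   # covered by either sample
    return unique / observed


sample_trim_a = {"ASV_a1", "ASV_a2"}
sample_trim_b = {"ASV_b1", "ASV_b2"}
print(round(unweighted_unifrac(sample_trim_a, sample_trim_b), 4))  # 0.0196
```

The feature sets are completely disjoint, so Jaccard would report 1.0 for this pair, but UniFrac only counts the short tip branches as unique (0.04 out of 2.04 total), giving a distance close to zero.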

Good luck!


Hi Nicholas,
Thank you for the clarity on this; that makes complete sense now. I get why trimming would always result in a new ASV, since it changes the length.
Truncating more or less sequence, on the other hand, wouldn't necessarily change the length the way trimming the outer ends would, as long as the reads still overlap.
So the explanation for the change in ASVs is that the alternate truncation likely changes some base calls, and even a single changed base call has a drastic effect because it still creates a new ASV.
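A quick way to see why a single base matters: an ASV is identified by its exact sequence. The sketch below assumes features are labeled by an MD5 hash of their sequence, which is how QIIME 2's DADA2 denoising labels features by default; the sequence itself is a short hypothetical fragment, and `feature_id` is an illustrative helper, not a library function. One changed base call, or one base trimmed off the 5' end, produces an unrelated ID, and nothing downstream knows the three sequences are nearly identical.

```python
import hashlib


def feature_id(seq: str) -> str:
    """Label an exact sequence variant by the MD5 hex digest of its sequence."""
    return hashlib.md5(seq.encode("ascii")).hexdigest()


# Hypothetical merged read (shortened for readability).
read_run1 = "TACGGAGGGTGCAAGCGTTAATCGGAATTACTGGGCGTAAAG"
read_run2 = read_run1[:20] + "T" + read_run1[21:]  # one base call differs (A -> T)
read_trimmed = read_run1[1:]                         # one 5' base trimmed off

ids = {feature_id(s) for s in (read_run1, read_run2, read_trimmed)}
print(len(ids))  # 3 distinct features
```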


Yes, I agree. But you trimmed and truncated (the 5' and 3' ends, if I understood correctly), correct?

Yes, this is also possible, i.e., truncating at different positions could affect the alignment and merging of the read pairs. This is a good reason to keep trimming and truncation parameters consistent when merging data from multiple runs, studies, or other sources.
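To check how much the two truncation settings leave for merging, the read overlap is roughly forward truncation + reverse truncation - amplicon length. This is a minimal sketch: the 460 bp amplicon length is an assumed placeholder (use the real length for your primers, measured after any primer removal), and the 12 bp minimum is DADA2's default minimum overlap for merging read pairs. `overlap` is a hypothetical helper, not a plugin function.

```python
AMPLICON_LEN = 460  # ASSUMED V3-V4 amplicon length in bp; replace with your own
MIN_OVERLAP = 12    # DADA2's default minimum overlap when merging read pairs


def overlap(trunc_f: int, trunc_r: int, amplicon_len: int = AMPLICON_LEN) -> int:
    """Approximate number of bases shared by the truncated forward and reverse reads."""
    return trunc_f + trunc_r - amplicon_len


for trunc_f, trunc_r in [(290, 250), (280, 240)]:
    ov = overlap(trunc_f, trunc_r)
    print(f"{trunc_f}/{trunc_r}: overlap {ov} bp, merges: {ov >= MIN_OVERLAP}")
# 290/250: overlap 80 bp, merges: True
# 280/240: overlap 60 bp, merges: True
```

Under this assumption both settings merge, and the merged contig spans the whole amplicon either way, so its length does not change. What does change is which low-quality tail bases go into the error model and the overlap region, which can shift base calls; and for taxa with longer-than-typical amplicons, the shorter truncation leaves less margin before reads fail to merge.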

