Varying ASV lengths with 100% identity across the overlap after processing paired-end reads

Hi all,

I am currently looking through the rep-seqs data from my output after processing paired-end reads, and I’m noticing that, despite running the raw data through DADA2 for ASV clustering, I am getting sequences that vary in length (which I expected based on the Atacama Soil microbiome tutorial), but some of these ASVs have 100% identity to each other. For example, the ASVs below both appear to belong to the same taxonomic group, but one is 253 bp and the other is 238 bp. Across their 238 bp overlap, they are exactly the same. Is this a result of the parameters I set being too loose, or is it possible that they are actually two different populations of organisms? If not, why were they not clustered together?

Thank you for the help!



Hello @termofilos, :wave:

That’s the expected result of DADA2! The goal of the program is to detect all true sequence variants that are present in the sample, and because that second sequence is longer, it is reported as a separate variant.

DADA2 does not cluster reads to make OTUs; it denoises reads to make ASVs. From the DADA2 paper:

"DADA2 infers sample sequences exactly, without coarse-graining into OTUs, and resolves differences of as little as one nucleotide."

As for whether these could be two different populations of organisms: that’s totally possible! It’s also possible that these two amplicon sequence variants are coming from the same microbe. :man_shrugging:
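
If you want to see how many of your ASVs are related in this way, here is a minimal Python sketch. It is not part of DADA2 or QIIME 2; the function name `length_variant_pairs` and the toy sequences are made up for illustration, and it assumes you have already loaded your representative sequences into a dict of ID to sequence. It only flags pairs where the shorter sequence appears in the longer one with no mismatches.

```python
from itertools import combinations


def length_variant_pairs(asvs):
    """Return (shorter_id, longer_id) pairs where the shorter sequence
    occurs exactly, with no mismatches, inside the longer one."""
    pairs = []
    for item_a, item_b in combinations(asvs.items(), 2):
        # Order the pair by length so we always search short-in-long.
        (short_id, short_seq), (long_id, long_seq) = sorted(
            [item_a, item_b], key=lambda item: len(item[1])
        )
        # Equal-length sequences are skipped: they are either identical
        # or differ by at least one nucleotide.
        if len(short_seq) < len(long_seq) and short_seq in long_seq:
            pairs.append((short_id, long_id))
    return pairs


# Toy sequences, hypothetical (not the ASVs from the post above).
toy_asvs = {
    "asv_long": "ACGTACGTTGCAAGGCT",
    "asv_short": "ACGTACGTTGCA",
    "asv_other": "ACGTTCGTTGCAAGGCT",
}
print(length_variant_pairs(toy_asvs))
```

A flagged pair only tells you the sequences match across their overlap. It cannot tell you whether they come from the same microbe, so whether to merge them is still a judgment call.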


Great, thank you so much for this info @colinbrislawn !