The value of mean length after dada2 is too different from the expected value

Hello again @LiyingXie,

Just as I had feared. Because the overlapping region is low quality, a longer overlap does not cause more reads to join...

OK, using only the forward reads fixed the problems with quality and joining! :+1:

That is a problem, but luckily I have found the cause :point_down:

It looks like your reads start with 6 random-looking bases, then an adapter, then the true read. I'm willing to bet that these 6 bp are barcodes, which would cause your ASVs to separate by sample.

(This paper lists actcctacgggaggcagcag as a 16S primer.)
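If you want to double-check where the primer sits in your own reads, here is a minimal sketch in Python. It assumes exact primer matches (real reads can carry sequencing errors in the primer, which this will miss), and the example reads, barcodes, and the `trim_length` helper are all made up for illustration:

```python
# The 16S primer from the paper above (20 nt).
PRIMER = "ACTCCTACGGGAGGCAGCAG"

def trim_length(read: str, primer: str = PRIMER) -> int:
    """Hypothetical helper: number of leading bases to drop so the read
    starts right after the primer. Uses exact matching only."""
    start = read.upper().find(primer.upper())
    if start == -1:
        raise ValueError("primer not found in read")
    return start + len(primer)

# Made-up reads: 6 bp barcode + primer + made-up biological sequence.
reads = [
    "GATCGA" + PRIMER + "TGGGGAATATTGCACAATGG",
    "TTAGCC" + PRIMER + "TGGGGAATATTGCACAATGG",
]

for r in reads:
    print(trim_length(r))  # 6 bp barcode + 20 bp primer
```

If most of your reads report the same number, that is a good sign the barcode and primer have a fixed length you can trim.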

If you trim off the barcode and adapter, your ASVs should appear across samples!

Try running with `--p-trim-left 26` (the 6 bp barcode plus the 20 bp primer) and see how it goes!
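To see why trimming should let ASVs appear across samples, here is a toy sketch. It uses simple exact-sequence dereplication (not DADA2's actual denoising algorithm), and the sample names and reads are made up: two samples with different barcodes but the same biology look like two different sequences until the first 26 bases are removed.

```python
from collections import defaultdict

PRIMER = "ACTCCTACGGGAGGCAGCAG"   # 20 nt primer
INSERT = "TGGGGAATATTGCACAATGG"   # made-up biological sequence
TRIM_LEFT = 26                    # 6 bp barcode + 20 bp primer

# Made-up reads: each sample has its own (hypothetical) 6 bp barcode.
samples = {
    "sample_A": ["GATCGA" + PRIMER + INSERT],
    "sample_B": ["TTAGCC" + PRIMER + INSERT],
}

def samples_per_sequence(samples, trim_left=0):
    """Map each (optionally trimmed) read sequence to the samples it occurs in."""
    seen = defaultdict(set)
    for name, reads in samples.items():
        for read in reads:
            seen[read[trim_left:]].add(name)
    return seen

before = samples_per_sequence(samples)
after = samples_per_sequence(samples, TRIM_LEFT)
print(len(before), len(after))  # unique sequences before vs. after trimming
```

Before trimming, each sample gets its own unique sequence; after trimming, both samples share one.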


Yes. Removing low quality data should make your diversity analysis more accurate! :sparkles:

Those 6 starting base pairs and adapters look pretty abiotic to me! :robot: :face_with_monocle:

Even better: DADA2 resolves exact sequence variants, so your ASVs can be told apart even when they differ by as little as 1 bp! The DADA2 paper explains this more. :bookmark:
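For a sense of how small a 1 bp difference is, here is a quick sketch comparing two made-up 250 bp sequences that differ at a single position. The sequences and the 250 bp length are assumptions for illustration; the 97% figure is the identity cutoff commonly used for OTU clustering, shown here only for contrast.

```python
def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be the same length")
    return sum(x != y for x, y in zip(a, b))

def percent_identity(a: str, b: str) -> float:
    """Percent of positions that match between two equal-length sequences."""
    return 100.0 * (len(a) - hamming(a, b)) / len(a)

# Made-up 250 bp sequences differing only at the last base.
asv1 = "ACGT" * 62 + "AC"
asv2 = asv1[:-1] + "G"

print(hamming(asv1, asv2))           # 1 mismatch
print(percent_identity(asv1, asv2))  # 99.6% identical
print(percent_identity(asv1, asv2) >= 97.0)  # True: a 97% OTU cutoff would lump these together
```

So two sequences that a 97% OTU cutoff would treat as the same can still come out of DADA2 as two separate ASVs.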

Keep in touch,
Colin