The value of mean length after dada2 is too different from the expected value

colinbrislawn · March 30, 2021, 2:51pm

Just as I had feared. Because the overlapping region is low quality, a longer overlap does not cause more reads to join...

OK, using only the forward read fixed the problems with quality and joining!

That is a problem, but luckily I have found the cause.

It looks like your reads start with some random bases, then an adapter, then the true read. I'm willing to bet that these 6 bp are barcodes, causing your ASVs to separate by sample.

(This paper lists actcctacgggaggcagcag as a 16S primer.)

If you trim off the barcode and adapter, your ASVs should appear across samples!
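Here is a toy illustration of why that happens. The barcodes and the "true read" below are made up for the example (they are not taken from your data); only the primer is the one quoted above. Because ASVs are compared as exact sequences, the same organism looks like a different sequence in every sample while the barcode is still attached:

```python
# Primer quoted above, uppercased. Everything else here is hypothetical.
PRIMER = "ACTCCTACGGGAGGCAGCAG"
TRUE_READ = "TGGGGAATATTGCACAATGG"  # made-up biological sequence

# Made-up 6 bp barcodes, one per sample.
barcodes = {"sampleA": "ACGTAC", "sampleB": "TTGCAA", "sampleC": "GGATCC"}

# Each raw read is barcode + primer + the same true read.
reads = {sample: bc + PRIMER + TRUE_READ for sample, bc in barcodes.items()}

# Before trimming: every sample carries its own "unique" sequence.
unique_before = set(reads.values())

# After trimming barcode + primer: all samples share one sequence.
trim = 6 + len(PRIMER)
unique_after = {read[trim:] for read in reads.values()}

print(len(unique_before), len(unique_after))
```

With the barcode attached there are three distinct sequences (one per sample); after trimming there is only one, shared by all three samples.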

Try running with --p-trim-left 26 (the 6 bp barcode plus the 20 bp primer) and see how it goes!
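If you want to double-check that number against your own reads, a minimal sketch like the one below finds the primer in a read and reports how many leading bases to remove. The function name `suggest_trim_left` and the example read are hypothetical, and the search is an exact match only, so it will miss reads with sequencing errors in the primer:

```python
PRIMER = "ACTCCTACGGGAGGCAGCAG"  # primer quoted above, uppercased


def suggest_trim_left(read, primer=PRIMER):
    """Return the number of leading bases up to and including the primer.

    Returns None if the primer is not found as an exact match.
    """
    start = read.upper().find(primer)
    if start == -1:
        return None
    return start + len(primer)


# Hypothetical read: 6 unknown barcode bases, the primer, then some sequence.
example_read = "NNNNNN" + PRIMER + "TGGGGAATATTG"
print(suggest_trim_left(example_read))
```

If most of your reads give the same value, that value is a reasonable choice for --p-trim-left; if the values vary from read to read, a fixed trim length will not remove the primer cleanly.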

Yes. Removing low quality data should make your diversity analysis more accurate!

Those 6 starting basepairs and adapters look pretty abiotic to me!

Even better: the ASVs created by DADA2 are each a unique exact sequence, so two of them can differ by as little as 1 bp! The DADA2 paper explains this more.

Keep in touch,
Colin