Hi @nounou,
As @ebolyen mentioned, the most likely culprit here is that the non-biological portion of your primers is still intact. That is to say, the overhang portion of the primer that is used to bind to barcodes/spacers etc. The portion of the primer that is part of the actual read shouldn't cause a problem.
You mentioned that you had 27K representative sequences. I'm not sure what your expected community diversity is, but that is pretty high! In fact, suspiciously high… which again might just reflect those non-biological sequences. Remember that rep-seqs is just the set of unique features identified in your samples; it doesn't hold any information regarding their abundances.
I think the difference between your previous experience and DADA2 comes down to how OTUs vs ASVs are selected.
DADA2 first builds an error model based on a portion of all your samples, regardless of what you have trimmed or left in. Then it denoises your reads based on that error model, again not caring what you left in. So if you have some reads that are from the same taxon but with different primers on the 5' end, or in your case without the primers, for example:
no-primer-AAACCCGGG
primerA-AAACCCGGG
custom_primer-AAACCCGGG
These will end up as 3 separate unique features in your rep-seqs, even though they should be the same feature. This is because the non-biological primers you left in make them differ from each other. This would be true even if only a single bp differed between them. That's why they are referred to as exact sequence variants.
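To make that concrete, here is a minimal Python sketch. It is not DADA2 itself: the primer strings and the `strip_prefix` helper are made up for illustration. It shows how exact matching counts the three reads above as three features until the non-biological prefix is removed:

```python
# Toy illustration only: these strings are placeholders, not real primers.
BIOLOGICAL = "AAACCCGGG"
reads = [
    BIOLOGICAL,             # no primer
    "TTTT" + BIOLOGICAL,    # hypothetical primerA
    "GAGAGA" + BIOLOGICAL,  # hypothetical custom primer
]
NON_BIOLOGICAL_PREFIXES = ["GAGAGA", "TTTT"]  # hypothetical

def strip_prefix(read, prefixes):
    """Remove the first matching non-biological prefix, if any."""
    for prefix in prefixes:
        if read.startswith(prefix):
            return read[len(prefix):]
    return read

# Exact matching: any difference at all makes a new feature.
features_before = set(reads)
features_after = {strip_prefix(r, NON_BIOLOGICAL_PREFIXES) for r in reads}

print(len(features_before))  # 3
print(len(features_after))   # 1
```

On real data you would do the trimming with a dedicated tool rather than by hand; the point here is only that exact matching has no tolerance for leftover primer bases.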
In contrast, OTU picking might have clustered them together into 1 OTU if they were within, let's say, 97% similarity of each other.
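A rough sketch of that difference is below. It assumes a very simplified greedy clustering with position-by-position identity on equal-length toy sequences. Real OTU pickers use proper alignments and different heuristics, so treat `identity` and `greedy_cluster` as hypothetical stand-ins:

```python
def identity(a, b):
    """Fraction of matching positions between two equal-length sequences."""
    assert len(a) == len(b)
    return sum(x == y for x, y in zip(a, b)) / len(a)

def greedy_cluster(seqs, threshold=0.97):
    """Keep a sequence as a new centroid only if no existing centroid is close enough."""
    centroids = []
    for s in seqs:
        if not any(identity(s, c) >= threshold for c in centroids):
            centroids.append(s)
    return centroids

base = "ACGT" * 25          # 100 bp toy sequence
one_off = "T" + base[1:]    # 1 mismatch  -> 99% identity to base
two_off = "TT" + base[2:]   # 2 mismatches -> 98% identity to base
seqs = [base, one_off, two_off]

n_exact = len(set(seqs))            # exact variants: every difference counts
n_otus = len(greedy_cluster(seqs))  # 97% clustering: small differences are absorbed

print(n_exact)  # 3
print(n_otus)   # 1
```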
When it comes to assigning taxonomy, since those non-biological sequences don't exist in the reference databases, the classifier can't assign them to any single species and instead just leaves them as Bacteria, since that is the closest assignment it could get. With OTU picking this wasn't an issue, since that approach is much more lenient: up to 3% divergence was still good enough to assign a read to something.
To be clear, DADA2 (and other denoising methods) are much more accurate and should replace OTU methods in most cases, including yours.
So, as per @ebolyen's suggestion, trim those non-biological sequences out, re-run DADA2, and try assigning taxonomy again. You should see fewer ASVs in your rep-seqs and your taxonomy should be much more accurate. We hope… if not, we'll have to start troubleshooting elsewhere :P
The second important issue here (again, as @ebolyen mentioned) is that these samples are not from the same run. DADA2's error model building step is specific to a single run, so you should denoise the samples from each run together and then merge the results later. Now, merging samples generated with different primers is a whole other topic that has been discussed on this forum quite a bit. I recommend you do some searching and reading on the forum on that topic before merging your samples, to make sure it is in fact appropriate in your case.
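For intuition about the "denoise per run, then merge" step, here is a small Python sketch. The tables, sample IDs, and the `merge_tables` helper are all hypothetical. In QIIME 2 you would use the merge actions of the feature-table plugin rather than code like this:

```python
from collections import Counter

# Hypothetical per-run feature tables, each produced by a separate denoising run:
# {sample_id: {feature_sequence: count}}
run1_table = {"sampleA": {"AAACCCGGG": 10, "AAACCCGGT": 2}}
run2_table = {"sampleB": {"AAACCCGGG": 7}}

def merge_tables(*tables):
    """Combine per-run tables; counts are summed if a sample ID repeats."""
    merged = {}
    for table in tables:
        for sample_id, counts in table.items():
            merged.setdefault(sample_id, Counter()).update(counts)
    return merged

merged = merge_tables(run1_table, run2_table)
features = sorted({f for counts in merged.values() for f in counts})

print(sorted(merged))  # ['sampleA', 'sampleB']
print(features)        # ['AAACCCGGG', 'AAACCCGGT']
```

Note that the shared feature only lines up across runs because the sequences are identical. With different primers the reads may not cover the same region at all, which is why it is worth reading up on that first.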
Keep us posted!