I apologize if this is not the appropriate place to ask this, but this forum is the only place I've found to get reliable information about DADA2.
I have sequenced a pooled plasmid library using Illumina NGS. As an initial step in analyzing the data, I'd like to merge and dereplicate the reads to make a sequence table, both of which I know DADA2 can do. My question, though, is whether DADA2 is specialized for metagenome amplicon analysis (and if so, whether it might introduce errors or inaccuracies when used outside this context), or whether it can be applied just as well to any NGS amplicon sequencing sample.
DADA2 is designed for the study of amplicon sequencing libraries. It starts from the assumption that all reads begin at the same position, because the PCR amplification steps use a common primer sequence (although it naturally handles a mixture of a few different amplicons in one library as well). If library preparation involves a "shotgun" approach, where DNA is more randomly sheared and reads therefore start from random places along the genome, plasmid, etc., then DADA2 is not an appropriate tool for analyzing the data.
I'm not 100% sure from quickly skimming the pooled plasmid library link you provided whether the amplification steps here are targeted by specific primers or randomly amplify DNA from all across a larger plasmid, but that is the distinction that determines whether DADA2 is applicable to your data.
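If it helps, one rough way to check this distinction on your own data is to measure how many reads begin with the expected primer. The sketch below is plain Python for illustration only and is not part of DADA2. The function names, the mismatch tolerance, and the example sequences are all hypothetical, and it assumes a primer without degenerate (IUPAC) bases.

```python
def prefix_mismatches(read, primer):
    """Count mismatches between the primer and the first len(primer) bases of the read."""
    return sum(1 for a, b in zip(read, primer) if a != b)


def fraction_starting_with_primer(reads, primer, max_mismatches=1):
    """Fraction of reads whose start matches the primer within max_mismatches.

    Reads shorter than the primer are counted as non-matching.
    """
    if not reads:
        return 0.0
    hits = 0
    for read in reads:
        if len(read) < len(primer):
            continue
        if prefix_mismatches(read, primer) <= max_mismatches:
            hits += 1
    return hits / len(reads)


# Hypothetical toy data: a made-up primer and two made-up read sets.
primer = "ACGTAC"
amplicon_like = ["ACGTACGGA", "ACGTACTTA", "ACGAACGGA", "ACGTACCCA"]
shotgun_like = ["TTGACCGTA", "ACGTACGGA", "GGCATTACG", "CCATGGTAC"]

print(fraction_starting_with_primer(amplicon_like, primer))
print(fraction_starting_with_primer(shotgun_like, primer))
```

A fraction close to 1 would be consistent with primer-targeted amplification, while a low fraction would suggest that reads start at scattered positions. The threshold for "close to 1" is a judgment call and depends on read quality.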
Hi Ben!
Thank you so much for addressing my question! Yes, the amplification is from a single site, using specific primers. Is that all that is necessary for DADA2 to be applicable?
That's the main thing. The other expectations are that technical bases (like primers) are removed, and that the underlying collection of sequences being amplified is not so hyper-diverse that every single read comes from a different biological sequence.
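As a quick sanity check on the diversity point, you could dereplicate a sample of reads and see what fraction of them are unique. This is a minimal sketch in plain Python, not DADA2's own dereplication code. The function names and the toy reads are hypothetical, and it assumes primers have already been trimmed and the reads are plain strings.

```python
from collections import Counter


def dereplicate(reads):
    """Collapse identical reads into (sequence, abundance) pairs, most abundant first."""
    counts = Counter(reads)
    # Sort by descending abundance, then alphabetically for a stable order.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


def singleton_fraction(reads):
    """Fraction of all reads whose exact sequence occurs only once."""
    if not reads:
        return 0.0
    counts = Counter(reads)
    singletons = sum(1 for n in counts.values() if n == 1)
    return singletons / len(reads)


# Hypothetical toy reads, already primer-trimmed.
reads = ["AAC", "AAC", "AAC", "GGT", "GGT", "TTA"]

print(dereplicate(reads))
print(singleton_fraction(reads))
```

If nearly every read is a singleton, the library may be too diverse for the abundance information that denoising relies on. Sequencing errors also create singletons, so this number is only a rough indicator.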