I’ve run about 320 samples across three MiSeq cartridges, targeting the V3V4 region with 300 bp PE sequencing. I then processed each sequencing run separately through DADA2, and finally merged the per-run table and rep-seq files into one.
For the three runs, I got 24k, 11k and 21k representative sequences, which is already more than I expected. When I merged those three runs, I got 45k rep sequences. This is a lot higher than in other analyses I’ve done in the past.
In contrast, another analysis (three runs, V3 only, 300 bp SE sequencing) of someone else’s data yielded 5k, 5k and 10k representative sequences, which merged into 10k. That study had slightly fewer samples overall, but not enough to explain this difference.
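To put rough numbers on that comparison, here is a quick back-of-the-envelope sketch in Python. It only uses the rounded counts quoted above; the `merge_summary` helper and the "merged/sum" ratio are my own illustration, not something QIIME 2 or DADA2 reports.

```python
# Rounded per-run and merged representative-sequence counts quoted above.
datasets = {
    "V3V4 paired-end": {"per_run": [24_000, 11_000, 21_000], "merged": 45_000},
    "V3 single-end": {"per_run": [5_000, 5_000, 10_000], "merged": 10_000},
}


def merge_summary(per_run, merged):
    """Return (sum of per-run counts, count collapsed on merge, merged/sum ratio).

    Hypothetical helper for illustration only.
    """
    total = sum(per_run)
    collapsed = total - merged  # sequences that coincided with one from another run
    return total, collapsed, merged / total


for name, d in datasets.items():
    total, collapsed, ratio = merge_summary(d["per_run"], d["merged"])
    print(f"{name}: sum={total}, merged={d['merged']}, "
          f"collapsed={collapsed}, ratio={ratio:.2f}")
```

With these rounded figures, the merged set keeps about 80% of the summed per-run sequences in my data versus 50% in the other study. I read that as my runs sharing relatively few identical sequences with each other, though that is only my interpretation.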
Does anyone have any idea what the problem could be and how to fix it? Please let me know if I should provide any additional information.
Here is the command I used for the dada2 denoising:
qiime dada2 denoise-paired