I have over 55 million reads generated by three MiSeq chips. The DADA2 command is expected to run for a long time on a Mac, so I would like to confirm that I am using the right parameters before running it.
Your trimming/truncation parameters look fine to me based on your quality scores, but I did want to confirm a few things since you mentioned you have 3 separate runs.
First, make sure all non-biological sequences (primers, barcodes, adaptors, etc.) have been removed, and that no other QC has been done prior to DADA2. I bring this up because we typically see quality plots with 250 or 300 cycles, whereas here I see it's at 260-something. The plots are also a bit more U-shaped than we typically see with Illumina runs, though I may be overthinking it here.
This is just the quality plot from one of the 3 runs and not a combined version, right? For DADA2 you want to denoise the runs separately and then merge them afterwards. You’ll also want to use the same parameters for all 3 runs, so make sure the quality plots of the other 2 runs can support the same trimming and truncation values.
If all these items are accounted for, have a go and good luck!
Aha, that would explain the odd-looking pattern. Yes, there will be important differences in the error model that DADA2 builds, which is going to be specific to each run, so you’ll want to make sure you denoise them separately and combine the feature tables after denoising. One additional consideration: do all 3 runs have the same target region with the same primers?
That’s good; that’s one less thing to worry about. So at this point you should go back one step, to before the 3 runs were merged, denoise each run separately with the same parameters, and merge the resulting tables afterwards. Let us know if you run into any issues!
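In case it helps, here is a rough sketch of that workflow as a small Python script that only builds and prints the QIIME 2 commands, without running them. It assumes paired-end data (swap in `denoise-single` and its flags if your reads are single-end). The run names, file names, and trim/truncation values are all placeholders I made up for illustration, so substitute your own. The `merge-seqs` step is not something discussed above; I included it on the assumption that you will also want the representative sequences combined alongside the tables.

```python
from typing import List

# Hypothetical run names; each is assumed to have its own demultiplexed
# artifact named "<run>-demux.qza" (placeholder file names).
RUNS = ["run1", "run2", "run3"]

# Placeholder values only. Choose real values from your own quality plots,
# and make sure they are acceptable for every run, not just one of them.
SHARED_PARAMS = {
    "--p-trim-left-f": 0,
    "--p-trim-left-r": 0,
    "--p-trunc-len-f": 240,
    "--p-trunc-len-r": 200,
}


def denoise_command(run: str) -> List[str]:
    """Build the denoising command for one run using the shared parameters."""
    cmd = ["qiime", "dada2", "denoise-paired",
           "--i-demultiplexed-seqs", f"{run}-demux.qza"]
    for flag, value in SHARED_PARAMS.items():
        cmd += [flag, str(value)]
    cmd += ["--o-table", f"{run}-table.qza",
            "--o-representative-sequences", f"{run}-rep-seqs.qza",
            "--o-denoising-stats", f"{run}-stats.qza"]
    return cmd


def merge_commands(runs: List[str]) -> List[List[str]]:
    """Build the commands that combine the per-run outputs after denoising."""
    merge_tables = ["qiime", "feature-table", "merge"]
    merge_seqs = ["qiime", "feature-table", "merge-seqs"]
    for run in runs:
        merge_tables += ["--i-tables", f"{run}-table.qza"]
        merge_seqs += ["--i-data", f"{run}-rep-seqs.qza"]
    merge_tables += ["--o-merged-table", "merged-table.qza"]
    merge_seqs += ["--o-merged-data", "merged-rep-seqs.qza"]
    return [merge_tables, merge_seqs]


# Print the plan: one denoising command per run, then the two merge commands.
for command in [denoise_command(r) for r in RUNS] + merge_commands(RUNS):
    print(" ".join(command))
```

Keeping the parameters in a single dictionary is just one way to guarantee that every run is denoised with identical settings; you could equally copy the printed commands into a shell script and run them one at a time.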