I have 180 samples that were analyzed by a bioinformatics center using QIIME 1.9 about a year and a half ago (V3-V4, MiSeq, 2x300). I am interested in learning to analyze the data myself, comparing QIIME 2 results to those from the older version, and potentially developing an undergraduate course around this data set, so I am working through QIIME 2 in a VirtualBox virtual machine. I can get the complete data set through importing and primer trimming, up to the denoising step, with no issues, but my limited RAM can’t handle that many samples in DADA2. I have found that I can process 12 samples at a time in DADA2 and then merge the results for downstream analysis.
My question is how I should determine my truncation values (--p-trunc-len-f and --p-trunc-len-r) for each DADA2 run. Would it be better to look at the quality plots from the entire pooled data set (180 samples) to determine the forward and reverse truncation lengths, or to look at the quality plots for just the 12 samples that will be grouped together in a given DADA2 run?
From the few runs of 12 samples I have completed, the truncation values I would choose for each run are about the same, which makes me think there is little practical difference between the pooled and per-group approaches.
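For concreteness, the batching workflow described above could be sketched roughly as follows. This is only a minimal Python sketch under stated assumptions: the sample IDs (S001 to S180), the per-batch file names, and the truncation lengths (270 and 210) are hypothetical placeholders, not recommendations, and the commands are only built as argument lists rather than executed. It illustrates one of the two options in question, namely applying a single shared pair of truncation values to every batch.

```python
from typing import List


def make_batches(sample_ids: List[str], batch_size: int = 12) -> List[List[str]]:
    """Split sample IDs into consecutive batches of at most batch_size."""
    return [
        sample_ids[i:i + batch_size]
        for i in range(0, len(sample_ids), batch_size)
    ]


def dada2_command(batch_index: int, trunc_len_f: int, trunc_len_r: int) -> List[str]:
    """Build (but do not run) a denoise-paired command for one batch.

    The file names are hypothetical; each batch is assumed to have already
    been exported to its own demultiplexed artifact.
    """
    prefix = f"batch{batch_index:02d}"
    return [
        "qiime", "dada2", "denoise-paired",
        "--i-demultiplexed-seqs", f"{prefix}-demux.qza",
        "--p-trunc-len-f", str(trunc_len_f),
        "--p-trunc-len-r", str(trunc_len_r),
        "--o-table", f"{prefix}-table.qza",
        "--o-representative-sequences", f"{prefix}-rep-seqs.qza",
        "--o-denoising-stats", f"{prefix}-stats.qza",
    ]


# Hypothetical sample IDs standing in for the 180 real samples.
sample_ids = [f"S{n:03d}" for n in range(1, 181)]
batches = make_batches(sample_ids, batch_size=12)

# Placeholder truncation lengths, shared by every batch in this sketch.
TRUNC_LEN_F = 270
TRUNC_LEN_R = 210

commands = [
    dada2_command(index, TRUNC_LEN_F, TRUNC_LEN_R)
    for index, _ in enumerate(batches, start=1)
]

print(f"{len(batches)} batches of up to 12 samples")
print(" ".join(commands[0]))
```

Each argument list could then be passed to `subprocess.run` inside an environment where QIIME 2 is installed, with the resulting per-batch tables and representative sequences merged afterwards.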